Produce Color Diversity
The color module provides functions for constructing produce color diversity
indicators used in the Nutrition-Sensitive Food Environment Index (NFEI).
In food environment analysis, diversity is defined not only by the number of food groups available but also by the variety of fruits and vegetables offered. Produce color diversity serves as a practical proxy for variation in micronutrient-rich foods, because different color groups are often associated with different vitamins, minerals, and bioactive compounds.
This module operationalizes produce diversity by scanning vendor-level data for predefined color groups and summarizing their presence into a single indicator. The approach is consistent with the NFEI workflow, where color diversity is used to complement food group diversity and provide additional insight into the nutritional quality of the food environment.
The module supports data structures where produce color information is stored as:
- single color values (e.g., "Red"), or
- comma-separated values (e.g., "Red, Yellow_Orange"), reflecting multiple produce types available at a vendor.
Two core functions are provided:
- `add_produce_color_diversity`, which constructs the main color diversity indicator by identifying the presence of predefined color groups and counting how many are available per observation.
- `count_unique_colors_in_columns`, which counts the number of distinct color values across one or more columns without enforcing predefined color groups.
Together, these functions enable flexible construction of produce color indicators, supporting both strict NFEI-aligned scoring and custom exploratory analyses of fruit and vegetable diversity.
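As a minimal sketch of how the two storage forms described above (single values and comma-separated values) can be reduced to one representation, the helper below turns a cell into a list of trimmed labels. The name `split_color_cell` is hypothetical and not part of the module; the sketch assumes commas are the only separator and treats non-string cells as empty.

```python
def split_color_cell(cell):
    """Return the color labels stored in one cell as a list of trimmed strings."""
    # Non-string cells (None, NaN) carry no color information in this sketch.
    if not isinstance(cell, str):
        return []
    # Split on commas, trim whitespace, and drop empty fragments.
    return [part.strip() for part in cell.split(",") if part.strip()]


single = split_color_cell("Red")
multiple = split_color_cell("Red, Yellow_Orange")
print(single)    # ['Red']
print(multiple)  # ['Red', 'Yellow_Orange']
```

Both forms end up as a list of labels, so downstream logic does not need to distinguish between them.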
Add produce color diversity indicators
add_produce_color_diversity(df: DataFrame, color_cols: list[str], color_groups: list[str] | None = None, output_col: str = 'overall_color', fillna_value: int | float = 0, include_color_flags: bool = True, flag_col_suffix: str = '_color') -> pd.DataFrame
This function computes the produce color diversity component used in the NFEI workflow. It scans user-specified produce color columns for defined color groups and creates a score representing the number of color groups available for each observation.
By default, the function uses the six NFEI produce color groups:
- White_Brown
- Yellow_Orange
- Green_other
- Dark_leafy_green
- Red
- Purple_Blue
These color groups are used to summarize fruit and vegetable diversity beyond food-group counts alone. In the NFEI workflow, produce color diversity complements healthy food diversity by capturing variation in fruit and vegetable availability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input dataframe. | required |
| `color_cols` | `list[str]` | List of columns containing produce color values. Values may be single color names or comma-separated color names. | required |
| `color_groups` | `list[str] \| None` | Optional list of color groups to search for. If None, the default NFEI color groups are used. | `None` |
| `output_col` | `str` | Name of the output produce color diversity score. | `'overall_color'` |
| `fillna_value` | `int \| float` | Value used to replace missing output values before converting the final score to integer. | `0` |
| `include_color_flags` | `bool` | If True, binary flag columns are retained for each color group. If False, only the final color diversity score is retained. | `True` |
| `flag_col_suffix` | `str` | Suffix added to each color group name when creating binary color flag columns. | `'_color'` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Copy of the input dataframe with the produce color diversity score added. If `include_color_flags` is True, the binary color flag columns are also retained. |
Raises:
| Type | Description |
|---|---|
| `KeyError` | If any column listed in `color_cols` is not present in `df`. |
| `ValueError` | If |
Notes
Input color values may be stored as comma-separated strings, for example
"Red, Yellow_Orange".
Color matching is case-insensitive because the internal helper function
`color_exists` lowercases values before comparison. However, output
flag column names preserve the spelling of the color group labels supplied
in `color_groups`.
The final score counts how many color groups are present at least once across the selected columns. It does not count the number of individual food items.
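The scoring rules in these notes can be illustrated with a row-level sketch in plain Python. This is not the library's implementation: `score_row` is a hypothetical helper, a plain dict stands in for a dataframe row, and exact matching of comma-separated tokens after lowercasing is an assumption about how `color_exists` compares values.

```python
# The six default NFEI color groups listed earlier in this section.
DEFAULT_COLOR_GROUPS = [
    "White_Brown", "Yellow_Orange", "Green_other",
    "Dark_leafy_green", "Red", "Purple_Blue",
]


def score_row(row, color_cols, color_groups=None, flag_col_suffix="_color"):
    """Hypothetical sketch: return (flags, score) for one observation."""
    groups = DEFAULT_COLOR_GROUPS if color_groups is None else color_groups
    # Collect every lowercased label found in the selected columns.
    found = set()
    for col in color_cols:
        cell = row.get(col)
        if isinstance(cell, str):
            found.update(p.strip().lower() for p in cell.split(",") if p.strip())
    # One binary flag per group; flag names keep the group's original spelling.
    flags = {g + flag_col_suffix: int(g.lower() in found) for g in groups}
    return flags, sum(flags.values())


row = {"fruit_colors": "red, Yellow_Orange", "vegetable_colors": "Green_other, Red"}
flags, score = score_row(row, ["fruit_colors", "vegetable_colors"])
print(score)  # 3: Red is counted once even though it appears in both columns
```

The lowercase "red" still sets the `Red_color` flag, and the score reflects the number of groups present rather than the number of items recorded.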
Examples:
Standard use with the default NFEI color groups:
>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "fruit_colors": ["Red, Yellow_Orange", "Purple_Blue"],
...         "vegetable_colors": ["Green_other", "Dark_leafy_green, Red"],
...     }
... )
>>> result = nfei.add_produce_color_diversity(
...     df,
...     color_cols=["fruit_colors", "vegetable_colors"],
...     output_col="overall_color",
... )
Return only the final color diversity score:
>>> result = nfei.add_produce_color_diversity(
...     df,
...     color_cols=["fruit_colors", "vegetable_colors"],
...     include_color_flags=False,
... )
Use a custom set of color groups:
>>> result = nfei.add_produce_color_diversity(
...     df,
...     color_cols=["fruit_colors", "vegetable_colors"],
...     color_groups=["Red", "Green_other", "Purple_Blue"],
...     output_col="custom_color_score",
... )
Count unique produce colors across selected columns
count_unique_colors_in_columns(df: DataFrame, columns: list[str], output_col: str = 'unique_color_count') -> pd.Series
This helper function counts the number of distinct produce color groups recorded for each observation across one or more dataframe columns. It is useful when produce color information is spread across multiple columns, such as separate fruit and vegetable color fields.
The function supports comma-separated color values within each cell, which reflects the format used in the original NFEI notebook workflows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input dataframe. | required |
| `columns` | `list[str]` | List of columns containing produce color values. Values may be single color names or comma-separated color names. | required |
| `output_col` | `str` | Name assigned to the returned pandas Series. | `'unique_color_count'` |
Returns:
| Type | Description |
|---|---|
| `Series` | Series containing the number of unique color groups found across the selected columns for each row. |
Raises:
| Type | Description |
|---|---|
| `KeyError` | If any column listed in `columns` is not present in `df`. |
Notes
Missing values are ignored. Empty strings are also ignored after splitting comma-separated values.
This function does not create binary flags for specific color groups. For
NFEI-style produce color diversity scoring, use
`add_produce_color_diversity`.
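The handling of missing values and empty fragments described in these notes can be sketched at the row level as follows. `count_unique_colors` is a hypothetical stand-in, not the library function; it uses plain dicts in place of dataframe rows and assumes labels are compared as written (case-sensitive), which the source does not specify.

```python
def count_unique_colors(row, columns):
    """Hypothetical sketch: count distinct color labels across the given columns."""
    seen = set()
    for col in columns:
        cell = row.get(col)
        # Missing values (None or NaN) are not strings, so they are skipped.
        if not isinstance(cell, str):
            continue
        for part in cell.split(","):
            label = part.strip()
            # Empty fragments, e.g. from a trailing comma, are ignored.
            if label:
                seen.add(label)
    return len(seen)


rows = [
    {"fruit_colors": "Red, Yellow_Orange", "vegetable_colors": "Green_other"},
    {"fruit_colors": "Purple_Blue,", "vegetable_colors": None},
]
counts = [count_unique_colors(r, ["fruit_colors", "vegetable_colors"]) for r in rows]
print(counts)  # [3, 1]
```

Unlike the NFEI-style score, this count is not restricted to a predefined list of groups, so any label present in the data contributes to it.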
Examples:
Count unique colors across fruit and vegetable color columns:
>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "fruit_colors": ["Red, Yellow_Orange", "Purple_Blue"],
...         "vegetable_colors": ["Green_other", "Red, Green_other"],
...     }
... )
>>> result = nfei.count_unique_colors_in_columns(
...     df,
...     columns=["fruit_colors", "vegetable_colors"],
... )