ProColor Diversity

The color module provides functions for constructing produce color diversity indicators used in the Nutrition-Sensitive Food Environment Index (NFEI).

In food environment analysis, diversity is defined not only by the number of food groups available but also by the variety of fruits and vegetables offered. Produce color diversity serves as a practical proxy for variation in micronutrient-rich foods, because different color groups are often associated with different vitamins, minerals, and bioactive compounds.

This module operationalizes produce diversity by scanning vendor-level data for predefined color groups and summarizing their presence into a single indicator. The approach is consistent with the NFEI workflow, where color diversity is used to complement food group diversity and provide additional insight into the nutritional quality of the food environment.

The module supports data structures where produce color information is stored as:

  • single color values (e.g., "Red"), or
  • comma-separated values (e.g., "Red, Yellow_Orange"), reflecting multiple produce types available at a vendor.
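As a rough illustration of the two storage formats above, the sketch below splits one cell into individual color tokens. The helper name split_color_cell is hypothetical and is not part of the nfei package; treating any non-string value as "no colors" is also an assumption made here for illustration.

```python
def split_color_cell(value):
    """Hypothetical helper: turn one cell into a list of color tokens."""
    # Assumption: anything that is not a string (None, NaN) holds no colors.
    if not isinstance(value, str):
        return []
    # Split on commas, trim whitespace, and drop empty fragments.
    return [token.strip() for token in value.split(",") if token.strip()]


print(split_color_cell("Red"))                 # ['Red']
print(split_color_cell("Red, Yellow_Orange"))  # ['Red', 'Yellow_Orange']
print(split_color_cell(None))                  # []
```

A single value and a comma-separated value go through the same code path, so downstream logic does not need to distinguish between them.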

Two core functions are provided:

  • add_produce_color_diversity, which constructs the main color diversity indicator by identifying the presence of predefined color groups and counting how many are available per observation.
  • count_unique_colors_in_columns, which counts the number of distinct color values across one or more columns without enforcing predefined color groups.

Together, these functions enable flexible construction of produce color indicators, supporting both strict NFEI-aligned scoring and custom exploratory analyses of fruit and vegetable diversity.

Add produce color diversity indicators

add_produce_color_diversity(df: DataFrame, color_cols: list[str], color_groups: list[str] | None = None, output_col: str = 'overall_color', fillna_value: int | float = 0, include_color_flags: bool = True, flag_col_suffix: str = '_color') -> pd.DataFrame

This function computes the produce color diversity component used in the NFEI workflow. It scans user-specified produce color columns for defined color groups and creates a score representing the number of color groups available for each observation.

By default, the function uses the six NFEI produce color groups:

  • White_Brown
  • Yellow_Orange
  • Green_other
  • Dark_leafy_green
  • Red
  • Purple_Blue

These color groups are used to summarize fruit and vegetable diversity beyond food-group counts alone. In the NFEI workflow, produce color diversity complements healthy food diversity by capturing variation in fruit and vegetable availability.
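The scoring idea described above (one binary flag per color group, then a row-wise sum) can be sketched in plain pandas as follows. This is a minimal illustration, not the package's implementation: the function sketch_color_score, the constant DEFAULT_GROUPS, and the exact token-matching rule are assumptions introduced here.

```python
import pandas as pd

# The six default group labels listed above.
DEFAULT_GROUPS = (
    "White_Brown", "Yellow_Orange", "Green_other",
    "Dark_leafy_green", "Red", "Purple_Blue",
)


def sketch_color_score(df, color_cols, color_groups=None,
                       output_col="overall_color", fillna_value=0,
                       include_color_flags=True, flag_col_suffix="_color"):
    """Illustrative only: flag each color group per row, then sum the flags."""
    groups = list(DEFAULT_GROUPS if color_groups is None else color_groups)
    out = df.copy()

    def row_has(row, color):
        # Assumption: a group is present when it equals one comma-separated
        # token in any selected cell, compared case-insensitively.
        for cell in row:
            if isinstance(cell, str):
                tokens = {t.strip().lower() for t in cell.split(",")}
                if color.lower() in tokens:
                    return 1
        return 0

    flag_cols = []
    for color in groups:
        flag = f"{color}{flag_col_suffix}"
        out[flag] = df[color_cols].apply(lambda row: row_has(row, color), axis=1)
        flag_cols.append(flag)

    # The score is the number of groups present, not the number of items.
    out[output_col] = out[flag_cols].sum(axis=1).fillna(fillna_value).astype(int)
    if not include_color_flags:
        out = out.drop(columns=flag_cols)
    return out


df = pd.DataFrame({
    "fruit_colors": ["Red, Yellow_Orange", "Purple_Blue"],
    "vegetable_colors": ["Green_other", "Dark_leafy_green, Red"],
})
result = sketch_color_score(df, ["fruit_colors", "vegetable_colors"])
print(result["overall_color"].tolist())  # [3, 3]
```

Each row in this toy dataframe contains three distinct groups, so both scores are 3 even though the groups differ between rows.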

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Input dataframe. | required |
| color_cols | list[str] | List of columns containing produce color values. Values may be single color names or comma-separated color names. | required |
| color_groups | list[str] \| None | Optional list of color groups to search for. If None, the default NFEI color groups are used. | None |
| output_col | str | Name of the output produce color diversity score. The default is "overall_color". | 'overall_color' |
| fillna_value | int \| float | Value used to replace missing output values before converting the final score to integer. | 0 |
| include_color_flags | bool | If True, binary flag columns are retained for each color group. If False, only the final color diversity score is retained. | True |
| flag_col_suffix | str | Suffix added to each color group name when creating binary color flag columns. The default is "_color". | '_color' |

Returns:

| Type | Description |
| --- | --- |
| DataFrame | Copy of the input dataframe with the produce color diversity score added. If include_color_flags=True, binary color flag columns are also included. |

Raises:

| Type | Description |
| --- | --- |
| KeyError | If any column listed in color_cols is not found in the dataframe. |
| ValueError | If color_groups is an empty list. |

Notes

Input color values may be stored as comma-separated strings, for example "Red, Yellow_Orange".

Color matching is case-insensitive because the internal helper function color_exists lowercases values before comparison. However, output flag column names preserve the spelling of the color group labels supplied in color_groups.

The final score counts how many color groups are present at least once across the selected columns. It does not count the number of individual food items.
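The two points in these notes (case-insensitive matching, and counting groups rather than items) can be illustrated with a small pure-Python sketch. The function groups_present is hypothetical; the real color_exists helper may match values differently, for example by substring rather than by whole token.

```python
def groups_present(cells, color_groups):
    """Hypothetical sketch: return the groups found at least once in cells."""
    present = set()
    for cell in cells:
        if not isinstance(cell, str):
            continue
        # Lowercase both sides so "red", "RED", and "Red" all match "Red".
        tokens = {t.strip().lower() for t in cell.split(",")}
        for group in color_groups:
            if group.lower() in tokens:
                present.add(group)  # keep the caller's spelling of the label
    return present


cells = ["red, RED, Red", "yellow_orange"]
found = groups_present(cells, ["Red", "Yellow_Orange", "Purple_Blue"])
print(sorted(found))  # ['Red', 'Yellow_Orange']
print(len(found))     # 2
```

Three spellings of the same color contribute a single group, so the count is 2 rather than 4.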

Examples:

Standard use with the default NFEI color groups:

>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "fruit_colors": ["Red, Yellow_Orange", "Purple_Blue"],
...         "vegetable_colors": ["Green_other", "Dark_leafy_green, Red"],
...     }
... )
>>> result = nfei.add_produce_color_diversity(
...     df,
...     color_cols=["fruit_colors", "vegetable_colors"],
...     output_col="overall_color",
... )

Return only the final color diversity score:

>>> result = nfei.add_produce_color_diversity(
...     df,
...     color_cols=["fruit_colors", "vegetable_colors"],
...     include_color_flags=False,
... )

Use a custom set of color groups:

>>> result = nfei.add_produce_color_diversity(
...     df,
...     color_cols=["fruit_colors", "vegetable_colors"],
...     color_groups=["Red", "Green_other", "Purple_Blue"],
...     output_col="custom_color_score",
... )

Count unique produce colors across selected columns

count_unique_colors_in_columns(df: DataFrame, columns: list[str], output_col: str = 'unique_color_count') -> pd.Series

This helper function counts the number of distinct produce color groups recorded for each observation across one or more dataframe columns. It is useful when produce color information is spread across multiple columns, such as separate fruit and vegetable color fields.

The function supports comma-separated color values within each cell, which reflects the format used in the original NFEI notebook workflows.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| df | DataFrame | Input dataframe. | required |
| columns | list[str] | List of columns containing produce color values. Values may be single color names or comma-separated color names. | required |
| output_col | str | Name assigned to the returned pandas Series. The default is "unique_color_count". | 'unique_color_count' |

Returns:

| Type | Description |
| --- | --- |
| Series | Series containing the number of unique color groups found across the selected columns for each row. |

Raises:

| Type | Description |
| --- | --- |
| KeyError | If any column listed in columns is not found in the dataframe. |

Notes

Missing values are ignored. Empty strings are also ignored after splitting comma-separated values.

This function does not create binary flags for specific color groups. For NFEI-style produce color diversity scoring, use add_produce_color_diversity.
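The handling of missing values and empty strings described in these notes can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the package code: the name sketch_unique_color_count is hypothetical, and tokens are compared exactly as written (whether the real function normalizes case is not stated in this document).

```python
import pandas as pd


def sketch_unique_color_count(df, columns, output_col="unique_color_count"):
    """Illustrative only: count distinct color tokens per row across columns."""
    def count_row(row):
        seen = set()
        for cell in row:
            if not isinstance(cell, str):  # skips None and NaN
                continue
            for token in cell.split(","):
                token = token.strip()
                if token:                  # skips empty fragments
                    seen.add(token)
        return len(seen)

    # Passing a string to rename sets the name of the returned Series.
    return df[columns].apply(count_row, axis=1).rename(output_col)


df = pd.DataFrame({
    "fruit_colors": ["Red, Yellow_Orange", None, "Red,,"],
    "vegetable_colors": ["Green_other", "Red, Green_other", ""],
})
counts = sketch_unique_color_count(df, ["fruit_colors", "vegetable_colors"])
print(counts.tolist())  # [3, 2, 1]
```

The second row has a missing fruit value and the third row has only empty fragments besides "Red"; neither inflates the count.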

Examples:

Count unique colors across fruit and vegetable color columns:

>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "fruit_colors": ["Red, Yellow_Orange", "Purple_Blue"],
...         "vegetable_colors": ["Green_other", "Red, Green_other"],
...     }
... )
>>> result = nfei.count_unique_colors_in_columns(
...     df,
...     columns=["fruit_colors", "vegetable_colors"],
... )