Skip to content

Diversity

The diversity module provides functions for constructing food availability and food diversity indicators used in the Nutrition-Sensitive Food Environment Index (NFEI).

In food environment analysis, diversity is a central dimension of nutritional quality. A food environment that offers a wider range of food groups is more likely to support adequate and balanced diets, while limited diversity may constrain dietary choices and increase the risk of nutrient deficiencies.

This module operationalizes diversity through two complementary approaches:

  • Healthy food diversity, measured using the Market-Level Diversity Score (MLDS), which counts the availability of key food groups aligned with the Minimum Dietary Diversity for Women framework.
  • Unhealthy food exposure, measured by counting the availability of selected unhealthy beverages and snacks, providing a counterpoint to healthy food diversity within the same environment.

These indicators are designed to work with vendor- or market-level survey data where food availability is recorded as binary or numeric variables across multiple food items.

The module also includes a general-purpose utility function for counting item availability across selected columns. This supports flexible construction of custom indicators beyond the predefined NFEI measures.

Together, these functions enable users to construct interpretable food availability indicators that capture both the presence of diverse, nutrient-rich foods and the exposure to less healthy options, forming a key component of the NFEI framework.

Add Market-Level Diversity Score (MLDS)

add_market_level_diversity_score(df: DataFrame, food_group_cols: dict[str, str | list[str]], output_col: str = 'mlds', fillna_value: int | float = 0) -> pd.DataFrame

Add a Market-Level Diversity Score.

This function computes the Market-Level Diversity Score (MLDS), a vendor-level food diversity indicator used in the NFEI workflow. It counts how many of the 10 required food groups are available for each observation.

The food group structure follows the logic of the Minimum Dietary Diversity for Women framework, but is applied to market or vendor-level food availability data rather than individual dietary intake data.

The required food groups are:

  • grains_roots_tubers
  • legumes_pulses
  • nuts_seeds
  • dairy
  • meat_poultry_fish
  • eggs
  • dark_green_leafy_vegetables
  • vitamin_a_rich_fruits
  • other_vegetables
  • other_fruits

Users must explicitly map each required food group to the corresponding column or columns in their dataframe. If a food group is mapped to multiple columns, the group is scored as 1 when at least one mapped column has a value greater than zero.

Parameters:

Name Type Description Default
df DataFrame

Input dataframe.

required
food_group_cols dict[str, str | list[str]]

Dictionary mapping each required food group to one dataframe column or a list of dataframe columns. The dictionary must contain exactly the 10 required MLDS food group keys.

required
output_col str

Name of the output MLDS column. The default is "mlds".

'mlds'
fillna_value int | float

Value used to replace missing values in mapped food availability columns before calculating the score.

0

Returns:

Type Description
DataFrame

Copy of the input dataframe with the MLDS column added.

Raises:

Type Description
KeyError

If any required MLDS food group is missing, if unknown food group keys are provided, or if any mapped dataframe column is not found.

ValueError

If a food group is mapped to an empty list.

TypeError

If a food group is not mapped to either a column name or a list of column names.

Notes

The output score ranges from 0 to 10, where higher values indicate that a vendor or market observation offers more of the required food groups.

Missing values in mapped food columns are filled using fillna_value before scoring. By default, missing values are treated as 0.

This function does not perform spatial aggregation. To construct environment-level diversity indicators, compute relevant binary food group columns first and then use a spatial aggregation function such as :func:nfei.features_proximity_agg.

Examples:

Standard use with one or more columns mapped to each food group:

>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "grains": [1, 1],
...         "roots_tubers": [0, 1],
...         "legumes_pulses": [1, 0],
...         "nuts_seeds": [0, 1],
...         "dairy": [1, 0],
...         "flesh_meat": [1, 0],
...         "organ_meat": [0, 0],
...         "fish": [0, 1],
...         "egg": [1, 0],
...         "dark_green_veg": [0, 1],
...         "vita_rich_fruits": [1, 0],
...         "other_veg": [1, 1],
...         "other_fruits": [0, 1],
...     }
... )
>>> food_group_cols = {
...     "grains_roots_tubers": ["grains", "roots_tubers"],
...     "legumes_pulses": "legumes_pulses",
...     "nuts_seeds": "nuts_seeds",
...     "dairy": "dairy",
...     "meat_poultry_fish": ["flesh_meat", "organ_meat", "fish"],
...     "eggs": "egg",
...     "dark_green_leafy_vegetables": "dark_green_veg",
...     "vitamin_a_rich_fruits": "vita_rich_fruits",
...     "other_vegetables": "other_veg",
...     "other_fruits": "other_fruits",
... }
>>> result = nfei.add_market_level_diversity_score(
...     df,
...     food_group_cols=food_group_cols,
... )

Use a custom output column name:

>>> result = nfei.add_market_level_diversity_score(
...     df,
...     food_group_cols=food_group_cols,
...     output_col="vendor_healthy_food_diversity",
... )

Create unhealthy beverage, snack, and total unhealthy food counts

add_unhealthy_food_count(beverage_df: DataFrame, snack_df: DataFrame, beverage_cols: list[str], snack_cols: list[str], id_col: str = 'survey_id', beverage_count_col: str = 'unhealthy_bev_count', snack_count_col: str = 'unhealthy_snack_count', output_col: str = 'unhealthy_food_count') -> pd.DataFrame

This function constructs the unhealthy food exposure indicator used in the NFEI workflow. It counts selected unhealthy beverage items and selected unhealthy snack items separately, merges both counts by an identifier column, and then creates a total unhealthy food count.

The function is designed for survey structures where beverage and snack availability may be stored in separate dataframes, such as separate repeat groups or separately cleaned item tables.

Parameters:

Name Type Description Default
beverage_df DataFrame

Dataframe containing the identifier column and unhealthy beverage item columns.

required
snack_df DataFrame

Dataframe containing the identifier column and unhealthy snack item columns.

required
beverage_cols list[str]

List of binary or numeric beverage columns to count.

required
snack_cols list[str]

List of binary or numeric snack columns to count.

required
id_col str

Identifier column used to merge beverage and snack counts. The default is "survey_id".

'survey_id'
beverage_count_col str

Name of the output beverage count column. The default is "unhealthy_bev_count".

'unhealthy_bev_count'
snack_count_col str

Name of the output snack count column. The default is "unhealthy_snack_count".

'unhealthy_snack_count'
output_col str

Name of the total unhealthy food count column. The default is "unhealthy_food_count".

'unhealthy_food_count'

Returns:

Type Description
DataFrame

Dataframe containing the identifier column, unhealthy beverage count, unhealthy snack count, and total unhealthy food count.

Raises:

Type Description
KeyError

If id_col is not found in either beverage_df or snack_df, or if any column listed in beverage_cols or snack_cols is missing.

Notes

Beverage and snack counts are merged using an outer join. This preserves identifiers that appear in only one of the two input dataframes. Missing counts after the merge are filled with 0 before calculating the total count.

The total unhealthy food count is calculated as:

unhealthy beverage count + unhealthy snack count

Higher values represent greater availability of selected unhealthy foods. In an NFEI-style composite score, this indicator is typically inverted during scaling so that higher scaled values represent healthier food environments.

Examples:

Count unhealthy beverages and snacks from separate item tables:

>>> import pandas as pd
>>> import nfei
>>>
>>> beverage_df = pd.DataFrame(
...     {
...         "survey_id": [1, 2],
...         "alcoholic_beverages": [1, 0],
...         "energy_drinks": [1, 0],
...         "sweetened_beverages_fruit_juices": [0, 1],
...     }
... )
>>> snack_df = pd.DataFrame(
...     {
...         "survey_id": [1, 2],
...         "biscuits": [1, 0],
...         "cakes_pastries": [1, 0],
...         "candies": [0, 1],
...         "cookies": [1, 0],
...         "sweets": [1, 1],
...     }
... )
>>> result = nfei.add_unhealthy_food_count(
...     beverage_df=beverage_df,
...     snack_df=snack_df,
...     beverage_cols=[
...         "alcoholic_beverages",
...         "energy_drinks",
...         "sweetened_beverages_fruit_juices",
...     ],
...     snack_cols=[
...         "biscuits",
...         "cakes_pastries",
...         "candies",
...         "cookies",
...         "sweets",
...     ],
...     id_col="survey_id",
... )

Use custom output column names:

>>> result = nfei.add_unhealthy_food_count(
...     beverage_df=beverage_df,
...     snack_df=snack_df,
...     beverage_cols=[
...         "alcoholic_beverages",
...         "energy_drinks",
...         "sweetened_beverages_fruit_juices",
...     ],
...     snack_cols=[
...         "biscuits",
...         "cakes_pastries",
...         "candies",
...         "cookies",
...         "sweets",
...     ],
...     beverage_count_col="bev_count",
...     snack_count_col="snack_count",
...     output_col="total_unhealthy_items",
... )

Count available items across selected columns

count_available_items(df: DataFrame, cols: list[str], output_col: str, fillna_value: int | float = 0) -> pd.DataFrame

This utility function counts the number of available items across a list of binary or numeric columns. It is useful when several item-level variables need to be summarized into a single count indicator.

In the NFEI workflow, this function is used internally by :func:add_unhealthy_food_count to count unhealthy beverage and snack availability before combining both categories into a total unhealthy food count.

Parameters:

Name Type Description Default
df DataFrame

Input dataframe.

required
cols list[str]

List of binary or numeric columns to count across each row.

required
output_col str

Name of the output count column.

required
fillna_value int | float

Value used to replace missing values in cols before counting. The default is 0.

0

Returns:

Type Description
DataFrame

Copy of the input dataframe with the item count column added.

Raises:

Type Description
KeyError

If any column listed in cols is not found in the dataframe.

Notes

The function sums values across cols row-wise. For binary columns, the result represents the number of available items. For non-binary numeric columns, the result is the row-wise sum of the supplied values.

Missing values are filled before summation using fillna_value.

Examples:

Count available snack items from binary columns:

>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "biscuits": [1, 0],
...         "cookies": [1, 1],
...         "ice_cream": [0, 1],
...     }
... )
>>> result = nfei.count_available_items(
...     df,
...     cols=["biscuits", "cookies", "ice_cream"],
...     output_col="snack_count",
... )