Scaling

The scaling module provides functions for transforming raw indicator values into standardized scores used in the Nutrition-Sensitive Food Environment Index (NFEI).

In food environment analysis, different indicators are measured in different units and ranges. For example, vendor density may be expressed as counts per population, while availability is measured as a percentage, and diversity is measured as a count of food groups. These indicators are not directly comparable in their raw form.

This module addresses this challenge by applying linear scaling to convert indicators into a common interpretation range, typically from 0 to 10. This ensures that multiple indicators can be meaningfully compared and combined into a composite index.
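As a rough illustration of why a shared range helps, the sketch below applies plain Min-Max scaling to three indicators measured in different units, then averages them. The column names and values are hypothetical, and the unweighted row mean is only a stand-in for a composite. It is not presented as the NFEI aggregation rule.

```python
import pandas as pd

# Hypothetical indicators for three areas, each in its own unit.
df = pd.DataFrame(
    {
        "vendor_density": [2.0, 5.0, 14.0],        # vendors per 10,000 residents
        "availability_pct": [40.0, 100.0, 70.0],   # percent of outlets stocking an item
        "diversity_groups": [3, 6, 9],             # number of food groups available
    }
)


def min_max_to_range(s: pd.Series, lo: float = 0.0, hi: float = 10.0) -> pd.Series:
    """Map a numeric series linearly onto [lo, hi] using its observed min and max."""
    normalized = (s - s.min()) / (s.max() - s.min())
    return lo + normalized * (hi - lo)


# df.apply passes each column to the function, so each indicator is scaled on its own range.
scaled = df.apply(min_max_to_range)

# Once every indicator shares the 0-10 range, averaging across columns is meaningful.
scaled["composite"] = scaled.mean(axis=1)
print(scaled)
```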

The module also supports inversion of indicators where higher raw values represent less desirable conditions. For example, unhealthy food counts can be inverted so that higher scaled scores consistently represent healthier food environments.
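To show why the direction matters, the sketch below scores a hypothetical count of unhealthy outlets both ways. Without inversion, the area with the most unhealthy outlets receives the top score. Applying `1 - normalized` before rescaling flips the ordering, so a higher score means a healthier environment. The area labels and counts are illustrative only.

```python
import pandas as pd

# Hypothetical unhealthy-outlet counts for three areas.
unhealthy = pd.Series([0, 5, 10], index=["area_a", "area_b", "area_c"])

# Min-Max normalization to the 0-1 range.
normalized = (unhealthy - unhealthy.min()) / (unhealthy.max() - unhealthy.min())

raw_direction = 10 * normalized   # more unhealthy outlets -> higher score
inverted = 10 * (1 - normalized)  # more unhealthy outlets -> lower score

print(pd.DataFrame({"count": unhealthy, "not_inverted": raw_direction, "inverted": inverted}))
```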

The scaling approach used in this module mirrors the NFEI workflow, ensuring that final indicator scores are interpretable, comparable, and suitable for aggregation into composite measures.

Linear scaling

create_linear_scale(df: DataFrame, col: str, expected_max: float | None = None, min_scale: float = 0, max_scale: float = 10, invert: bool = False, var_title: str | None = None, drop_intermediate: bool = True) -> pd.DataFrame

Create a linearly scaled indicator.

This function transforms a numeric column into a standardized score using linear scaling. It is used in the NFEI workflow to align indicators with different units and ranges onto a common interpretation scale, typically from 0 to 10.

The function first normalizes the selected column to a 0–1 range using Min-Max scaling. It can optionally adjust this normalization using an expected maximum value, invert the resulting scores, and then rescale them to a user-defined range.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	Input dataframe.	required
`col`	`str`	Name of the numeric column to be scaled.	required
`expected_max`	`float | None`	Optional theoretical or expected maximum value for the indicator. If provided, the normalized values are adjusted so that scores remain interpretable relative to this expected maximum, rather than relative only to the observed maximum in the data.	`None`
`min_scale`	`float`	Lower bound of the final scale. The default is 0.	`0`
`max_scale`	`float`	Upper bound of the final scale. The default is 10.	`10`
`invert`	`bool`	If True, higher original values receive lower scaled scores. This is typically used for indicators where higher values represent less desirable conditions, such as unhealthy food exposure.	`False`
`var_title`	`str | None`	Name of the output scaled column. If `None`, a default name is generated as `"{col}_scaled"`.	`None`
`drop_intermediate`	`bool`	If True, intermediate normalization columns are removed from the dataframe. If False, intermediate columns are retained for inspection. Intermediate columns are named using the input column name: `"_{col}_normalized"` and `"_{col}_final_normalized"`.	`True`

Returns:

Type	Description
`DataFrame`	Copy of the input dataframe with the scaled indicator column added.

Raises:

Type	Description
`KeyError`	If `col` is not found in the dataframe.
`ValueError`	If `max_scale` is less than or equal to `min_scale`.
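The sketch below is one possible way to implement the behaviour described by the parameters and exceptions above. It is not the nfei source, and `linear_scale_sketch` is a hypothetical name. It relies on two assumptions that the documentation does not confirm. First, `expected_max` replaces the observed maximum as the upper bound of the Min-Max range, with values clipped to the 0 to 1 range. Second, a column with no spread maps to 0.

```python
from __future__ import annotations

import pandas as pd


def linear_scale_sketch(
    df: pd.DataFrame,
    col: str,
    expected_max: float | None = None,
    min_scale: float = 0,
    max_scale: float = 10,
    invert: bool = False,
    var_title: str | None = None,
    drop_intermediate: bool = True,
) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented behaviour; not the nfei source."""
    if col not in df.columns:
        raise KeyError(f"Column {col!r} not found in dataframe")
    if max_scale <= min_scale:
        raise ValueError("max_scale must be greater than min_scale")

    out = df.copy()
    values = out[col].astype(float)
    lower = values.min()
    # ASSUMPTION: expected_max, when given, replaces the observed maximum.
    upper = values.max() if expected_max is None else float(expected_max)
    span = upper - lower

    # Intermediate column names follow the convention documented for drop_intermediate.
    norm_col = f"_{col}_normalized"
    final_col = f"_{col}_final_normalized"

    # Min-Max normalization to 0-1 (using expected_max as the upper bound if given).
    if span > 0:
        out[norm_col] = ((values - lower) / span).clip(0.0, 1.0)
    else:
        # ASSUMPTION: a column with no spread maps every row to 0.
        out[norm_col] = 0.0

    # Optional inversion, applied before rescaling.
    out[final_col] = 1 - out[norm_col] if invert else out[norm_col]

    # Rescale to [min_scale, max_scale].
    out[var_title or f"{col}_scaled"] = min_scale + out[final_col] * (max_scale - min_scale)

    if drop_intermediate:
        out = out.drop(columns=[norm_col, final_col])
    return out


df = pd.DataFrame({"unhealthy_food_count": [0, 5, 10]})
result = linear_scale_sketch(df, col="unhealthy_food_count", invert=True)
print(result["unhealthy_food_count_scaled"].tolist())  # [10.0, 5.0, 0.0]
```

Keeping the intermediate columns until the end makes `drop_intermediate=False` trivial to support, because nothing extra has to be recomputed for inspection.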

Notes

The scaling process follows three steps:

1. Normalize the input column to a 0–1 range using Min-Max scaling.
2. Optionally adjust the normalized values using `expected_max`.
3. Rescale the values to the specified range as `min_scale + normalized * (max_scale - min_scale)`.

If `invert=True`, the normalized values are transformed as `1 - normalized` before rescaling.
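The following worked trace walks through these steps in plain Python with no library calls. It uses hypothetical values [2, 4, 10], a custom range of 1 to 5, and inversion enabled. No `expected_max` adjustment is applied.

```python
values = [2, 4, 10]
min_scale, max_scale = 1, 5

lo, hi = min(values), max(values)                      # observed range: 2 to 10
normalized = [(v - lo) / (hi - lo) for v in values]    # [0.0, 0.25, 1.0]
inverted = [1 - n for n in normalized]                 # [1.0, 0.75, 0.0]
scaled = [min_scale + n * (max_scale - min_scale) for n in inverted]
print(scaled)                                          # [5.0, 4.0, 1.0]
```

The smallest raw value ends up at `max_scale` and the largest at `min_scale`. The middle value keeps its relative position, 75% of the way up the inverted range.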

The `expected_max` parameter is particularly important for indicators that have a known theoretical upper bound. Under plain Min-Max scaling, the observed maximum always maps to the top of the normalized range. This can produce artificially high scores when the observed maximum in the dataset is well below what is theoretically possible.
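The effect of `expected_max` can be seen in a small numeric comparison using the same counts as the examples below. The comparison assumes that `expected_max` simply replaces the observed maximum in the Min-Max denominator. The documentation does not spell out the exact adjustment, so treat the numbers as illustrative.

```python
observed = [0, 5, 10]  # hypothetical counts; theoretical ceiling assumed to be 15
expected_max = 15

lo = min(observed)
by_observed = [10 * (v - lo) / (max(observed) - lo) for v in observed]
# ASSUMPTION: expected_max replaces the observed maximum as the upper bound.
by_expected = [10 * (v - lo) / (expected_max - lo) for v in observed]

print(by_observed)  # the area with 10 reaches the top score of 10
print(by_expected)  # the same area scores about 6.67 out of 10
```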

Examples:

Scale an indicator to the default 0–10 range:

>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "unhealthy_food_count": [0, 5, 10]
...     }
... )
>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
... )

Invert the scale so that higher scores represent better conditions:

>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     invert=True,
... )

Score values relative to an expected maximum of 15 instead of the observed maximum of 10:

>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     expected_max=15,
... )

Retain intermediate columns for debugging:

>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     drop_intermediate=False,
... )