
Scaling

The scaling module provides functions for transforming raw indicator values into standardized scores used in the Nutrition-Sensitive Food Environment Index (NFEI).

In food environment analysis, indicators are measured in different units and over different ranges. For example, vendor density may be expressed as a count per unit of population, availability as a percentage, and diversity as a count of food groups. In raw form, these indicators are not directly comparable.

The module addresses this by applying linear scaling, which maps each indicator onto a common range, typically 0 to 10, so that multiple indicators can be meaningfully compared and combined into a composite index.
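As a rough illustration of the idea, the sketch below maps two indicators with different units onto the same 0 to 10 range using Min-Max scaling. It uses plain Python lists to stay dependency-free; the helper name min_max_scale, the sample values, and the handling of a constant column are assumptions made for this example and are not part of the nfei API.

```python
def min_max_scale(values, min_scale=0.0, max_scale=10.0):
    """Linearly map values onto [min_scale, max_scale] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: with no spread in the data, every value gets min_scale.
        return [float(min_scale) for _ in values]
    width = max_scale - min_scale
    return [min_scale + (v - lo) / (hi - lo) * width for v in values]


availability_pct = [20.0, 50.0, 80.0]  # measured as a percentage
food_group_count = [2, 5, 8]           # measured as a count of food groups

availability_score = min_max_scale(availability_pct)
diversity_score = min_max_scale(food_group_count)

print(availability_score)  # [0.0, 5.0, 10.0]
print(diversity_score)     # [0.0, 5.0, 10.0]
```

Although the raw units differ, both indicators end up on the same scale, which is what makes them comparable and combinable.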

The module also supports inversion of indicators where higher raw values represent less desirable conditions. For example, unhealthy food counts can be inverted so that higher scaled scores consistently represent healthier food environments.

The scaling approach used in this module mirrors the NFEI workflow, ensuring that final indicator scores are interpretable, comparable, and suitable for aggregation into composite measures.

Linear scaling

create_linear_scale(df: DataFrame, col: str, expected_max: float | None = None, min_scale: float = 0, max_scale: float = 10, invert: bool = False, var_title: str | None = None, drop_intermediate: bool = True) -> pd.DataFrame

Create a linearly scaled indicator.

This function transforms a numeric column into a standardized score using linear scaling. It is used in the NFEI workflow to align indicators with different units and ranges onto a common interpretation scale, typically from 0 to 10.

The function first normalizes the selected column to a 0–1 range using Min-Max scaling. It can optionally adjust this normalization using an expected maximum value, invert the resulting scores, and then rescale them to a user-defined range.

Parameters:

df (DataFrame, required)
    Input dataframe.

col (str, required)
    Name of the numeric column to be scaled.

expected_max (float | None, default None)
    Optional theoretical or expected maximum value for the indicator. If provided, the normalized values are adjusted so that scores remain interpretable relative to this expected maximum, rather than only the observed maximum in the data.

min_scale (float, default 0)
    Lower bound of the final scale.

max_scale (float, default 10)
    Upper bound of the final scale.

invert (bool, default False)
    If True, higher original values receive lower scaled scores. This is typically used for indicators where higher values represent less desirable conditions, such as unhealthy food exposure.

var_title (str | None, default None)
    Name of the output scaled column. If None, a default name is generated as "{col}_scaled".

drop_intermediate (bool, default True)
    If True, intermediate normalization columns are removed from the dataframe. If False, intermediate columns are retained for inspection. Intermediate columns are named using the input column name: "_{col}_normalized" and "_{col}_final_normalized".

Returns:

DataFrame
    Copy of the input dataframe with the scaled indicator column added.

Raises:

KeyError
    If col is not found in the dataframe.

ValueError
    If max_scale is less than or equal to min_scale.

Notes

The scaling process follows three steps:

  1. Normalize the input column to a 0–1 range using Min-Max scaling.
  2. Optionally adjust normalized values using expected_max.
  3. Rescale the values to the specified range:

min_scale + normalized * (max_scale - min_scale)

If invert=True, the normalized values (after any expected_max adjustment) are transformed before the final rescaling as:

1 - normalized
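The steps above can be sketched in plain Python as follows. This is a hypothetical restatement for illustration, not the nfei source. In particular, the documentation does not state the exact expected_max formula, so the adjustment used here (multiplying the normalized values by observed maximum / expected maximum) is an assumption, as are the function name scale_steps and the treatment of a constant column.

```python
def scale_steps(values, expected_max=None, min_scale=0.0, max_scale=10.0,
                invert=False):
    """Hypothetical walk-through of the documented steps (not the nfei code)."""
    if max_scale <= min_scale:
        raise ValueError("max_scale must be greater than min_scale")

    lo, hi = min(values), max(values)
    span = hi - lo

    # Step 1: Min-Max normalization to the 0-1 range.
    # Assumption: a constant column maps to 0.0 in this sketch.
    normalized = [(v - lo) / span if span else 0.0 for v in values]

    # Step 2 (ASSUMED formula): shrink the normalized values so that the
    # observed maximum is scored relative to the expected maximum.
    if expected_max is not None:
        factor = hi / expected_max
        normalized = [n * factor for n in normalized]

    # Inversion is applied to the 0-1 values, before rescaling.
    if invert:
        normalized = [1 - n for n in normalized]

    # Step 3: rescale to [min_scale, max_scale].
    return [min_scale + n * (max_scale - min_scale) for n in normalized]


raw = [0, 5, 10]
plain = scale_steps(raw)                    # [0.0, 5.0, 10.0]
inverted = scale_steps(raw, invert=True)    # [10.0, 5.0, 0.0]
capped = scale_steps(raw, expected_max=15)  # top score stays below 10
```

Under the assumed adjustment, the largest observed value (10) is scored at roughly 6.67 rather than 10 when expected_max=15, because it is only two thirds of the expected maximum.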

The expected_max parameter is particularly important when working with indicators that have a known theoretical upper bound. It prevents artificially high scores when the observed maximum in the dataset is lower than what is theoretically possible.
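To see why this matters, the sketch below compares two hypothetical datasets scored against their own observed maximum and against a shared expected maximum. It is deliberately simplified: it assumes a lower bound of 0 and a direct value / upper bound ratio, which may differ from how nfei applies expected_max. The helper score_against and the district data are invented for the example.

```python
def score_against(value, upper, max_scale=10.0):
    """Score a raw value against an upper bound, assuming a lower bound of 0."""
    return value / upper * max_scale


district_a = [0, 2, 4]   # observed maximum is 4
district_b = [0, 5, 10]  # observed maximum is 10

# Scored against each dataset's own observed maximum:
a_top_observed = score_against(4, max(district_a))  # 10.0
b_mid_observed = score_against(5, max(district_b))  # 5.0

# Scored against a shared expected maximum of 15:
a_top_expected = score_against(4, 15)
b_mid_expected = score_against(5, 15)

# With observed maxima, a raw value of 4 outscores a raw value of 5.
print(a_top_observed > b_mid_observed)  # True
# With a shared expected maximum, the raw ordering is preserved.
print(a_top_expected < b_mid_expected)  # True
```

Scaling against the observed maximum gives the top value in district_a a perfect 10 even though it is far below the theoretical ceiling; a shared expected maximum keeps scores comparable across datasets.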

Examples:

Scale an indicator to the default 0–10 range:

>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "unhealthy_food_count": [0, 5, 10]
...     }
... )
>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
... )

Invert the scale so that higher scores represent healthier conditions (that is, lower unhealthy food counts):

>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     invert=True,
... )

Use an expected maximum so that scores are interpreted relative to a theoretical upper bound rather than only the observed maximum:

>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     expected_max=15,
... )

Retain intermediate columns for debugging:

>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     drop_intermediate=False,
... )