Scaling
The scaling module provides functions for transforming raw indicator values into standardized scores used in the Nutrition-Sensitive Food Environment Index (NFEI).
In food environment analysis, different indicators are measured in different units and ranges. For example, vendor density may be expressed as counts per population, while availability is measured as a percentage, and diversity is measured as a count of food groups. These indicators are not directly comparable in their raw form.
This module addresses this challenge by applying linear scaling to convert indicators into a common interpretation range, typically from 0 to 10. This ensures that multiple indicators can be meaningfully compared and combined into a composite index.
The module also supports inversion of indicators where higher raw values represent less desirable conditions. For example, unhealthy food counts can be inverted so that higher scaled scores consistently represent healthier food environments.
The scaling approach used in this module mirrors the NFEI workflow, ensuring that final indicator scores are interpretable, comparable, and suitable for aggregation into composite measures.
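As a rough illustration of why a common range helps, the sketch below scales three made-up indicators with plain pandas and averages them. The helper `min_max_to_range`, the sample values, and the unweighted mean used as the composite are all assumptions for illustration; they are not taken from the NFEI package.

```python
import pandas as pd

# Hypothetical raw indicators, each measured in a different unit.
raw = pd.DataFrame(
    {
        "vendor_density": [2.0, 4.0, 10.0],      # vendors per 1,000 people
        "availability_pct": [20.0, 50.0, 80.0],  # percent of outlets
        "diversity_count": [3, 6, 7],            # number of food groups
    }
)


def min_max_to_range(series, low=0.0, high=10.0):
    """Linearly map a series onto [low, high] using its observed min and max."""
    normalized = (series - series.min()) / (series.max() - series.min())
    return low + normalized * (high - low)


# Every column now shares the same 0-10 interpretation range.
scaled = raw.apply(min_max_to_range)

# ASSUMPTION: an unweighted row mean stands in for the composite index.
scaled["composite"] = scaled.mean(axis=1)
print(scaled)
```

Once all columns share the same range, a simple aggregate such as the row mean becomes meaningful, which would not be the case for the raw counts and percentages.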
Linear scaling
create_linear_scale(df: DataFrame, col: str, expected_max: float | None = None, min_scale: float = 0, max_scale: float = 10, invert: bool = False, var_title: str | None = None, drop_intermediate: bool = True) -> pd.DataFrame
Create a linearly scaled indicator.
This function transforms a numeric column into a standardized score using linear scaling. It is used in the NFEI workflow to align indicators with different units and ranges onto a common interpretation scale, typically from 0 to 10.
The function first normalizes the selected column to a 0–1 range using Min-Max scaling. It can optionally adjust this normalization using an expected maximum value, invert the resulting scores, and then rescale them to a user-defined range.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input dataframe. | required |
| col | str | Name of the numeric column to be scaled. | required |
| expected_max | float \| None | Optional theoretical or expected maximum value for the indicator. If provided, the normalized values are adjusted so that scores remain interpretable relative to this expected maximum, rather than only the observed maximum in the data. | None |
| min_scale | float | Lower bound of the final scale. | 0 |
| max_scale | float | Upper bound of the final scale. | 10 |
| invert | bool | If True, higher original values receive lower scaled scores. This is typically used for indicators where higher values represent less desirable conditions, such as unhealthy food exposure. | False |
| var_title | str \| None | Name of the output scaled column. If None, a default name is generated. | None |
| drop_intermediate | bool | If True, intermediate normalization columns are removed from the dataframe. If False, intermediate columns are retained for inspection. Intermediate columns are named using the input column name. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | Copy of the input dataframe with the scaled indicator column added. |
Raises:
| Type | Description |
|---|---|
| KeyError | If |
| ValueError | If |
Notes
The scaling process follows three steps:
- Normalize the input column to a 0–1 range using Min-Max scaling.
- Optionally adjust normalized values using expected_max.
- Rescale the values to the specified range: min_scale + normalized * (max_scale - min_scale)

If invert=True, the normalized values are transformed as 1 - normalized before rescaling.
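The steps above can be sketched in a few lines of pandas. This is a minimal illustration, not the library's implementation: the function name `linear_scale_sketch` is hypothetical, the optional expected_max adjustment is left out, and the guard against constant columns is a choice made for this sketch only.

```python
import pandas as pd


def linear_scale_sketch(values, min_scale=0.0, max_scale=10.0, invert=False):
    """Illustrative min-max scaling with optional inversion (hypothetical helper)."""
    # Step 1: normalize to 0-1 using the observed minimum and maximum.
    span = values.max() - values.min()
    if span == 0:
        # Guard chosen for this sketch; the library's own error
        # conditions are not documented here.
        raise ValueError("cannot min-max scale a constant column")
    normalized = (values - values.min()) / span

    # Inversion happens on the 0-1 values, before rescaling.
    if invert:
        normalized = 1 - normalized

    # Final step: rescale to the requested range.
    return min_scale + normalized * (max_scale - min_scale)


counts = pd.Series([0, 5, 10], name="unhealthy_food_count")
plain = linear_scale_sketch(counts)
inverted = linear_scale_sketch(counts, invert=True)
print(pd.DataFrame({"raw": counts, "plain": plain, "inverted": inverted}))
```

Because inversion is applied to the 0–1 values, the inverted scores stay within the same min_scale to max_scale range as the non-inverted ones.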
The expected_max parameter is particularly important when working with
indicators that have a known theoretical upper bound. It prevents artificially
high scores when the observed maximum in the dataset is lower than what is
theoretically possible.
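The exact adjustment formula is not given here, so the sketch below uses one plausible reading: shrinking the normalized values by the ratio of the observed maximum to the expected maximum. That formula is an assumption for illustration and may differ from what create_linear_scale actually computes.

```python
import pandas as pd

counts = pd.Series([0, 5, 10], name="unhealthy_food_count")
expected_max = 15  # hypothetical theoretical upper bound for this indicator

# Min-max normalization against the observed range only.
normalized = (counts - counts.min()) / (counts.max() - counts.min())

# ASSUMPTION: adjust by observed_max / expected_max so the top observed
# value no longer reaches 1.0 when it falls short of the expected maximum.
adjusted = normalized * (counts.max() / expected_max)

# Rescale both versions to the default 0-10 range.
observed_only = 0 + normalized * (10 - 0)
with_expected = 0 + adjusted * (10 - 0)
print(pd.DataFrame({"observed_only": observed_only, "with_expected": with_expected}))
```

Under this reading, the highest observed count scores below the top of the scale whenever it falls short of the expected maximum, which matches the intent described above.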
Examples:
Scale an indicator to the default 0–10 range:
>>> import pandas as pd
>>> import nfei
>>>
>>> df = pd.DataFrame(
...     {
...         "unhealthy_food_count": [0, 5, 10]
...     }
... )
>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
... )
Invert the scale so that higher raw counts receive lower scores and higher scaled scores represent better conditions:
>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     invert=True,
... )
Use an expected maximum to stabilize interpretation:
>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     expected_max=15,
... )
Retain intermediate columns for debugging:
>>> result = nfei.create_linear_scale(
...     df,
...     col="unhealthy_food_count",
...     drop_intermediate=False,
... )