Estimate Usual Nutrient Intake using NRC method with adaptive repeater policy — estimate_usual_nutrient

Estimates usual nutrient intake distributions from 24-hour recall data following the NRC/IOM methodology (see doi:10.17226/10666). The method adjusts observed intakes for within-person variability using variance-component shrinkage based on respondents with repeated recalls. The repeater_policy argument controls how the function behaves when replicate information is limited.

Usage

estimate_usual_nutrient_intake(
  recall_data,
  id_col,
  nutrient_cols,
  transform = c("cuberoot", "log", "sqrt", "none"),
  jitter = FALSE,
  warn_negative_between = TRUE,
  repeater_policy = c("auto", "strict", "lenient"),
  detailed = FALSE
)

Arguments

recall_data

A data frame containing repeated 24-hour recall data, with one row per observation (respondent-day).

id_col

Character scalar. The name of the column identifying respondents. Each unique ID represents one participant who may have one or more recall days.

nutrient_cols

Character vector of one or more column names containing nutrient intake values to be processed (e.g. "Energy.kcal_intake", "Protein.g_intake"). All must be numeric and non-negative.

transform

Transformation applied prior to variance estimation to improve normality. Options are "cuberoot" (default), "log", "sqrt", or "none".
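As a rough illustration of what the four options mean, the sketch below pairs each transformation with its inverse. The helper names forward_transform and inverse_transform are hypothetical and not part of the package, and the handling of "log" as a plain natural logarithm is an assumption; the function itself may treat zeros differently.

```r
# Hypothetical helpers (not from the package): forward and inverse
# transforms for the four documented options.
forward_transform <- function(x, transform = c("cuberoot", "log", "sqrt", "none")) {
  transform <- match.arg(transform)
  switch(transform,
    cuberoot = x^(1 / 3),
    log      = log(x),   # assumption: plain natural log, so zeros map to -Inf
    sqrt     = sqrt(x),
    none     = x
  )
}

inverse_transform <- function(z, transform = c("cuberoot", "log", "sqrt", "none")) {
  transform <- match.arg(transform)
  switch(transform,
    cuberoot = z^3,
    log      = exp(z),
    sqrt     = z^2,
    none     = z
  )
}

# Round trip on a few non-negative intakes
intake <- c(1800, 2200, 1500, 1600, 2000)
z <- forward_transform(intake, "cuberoot")
all.equal(inverse_transform(z, "cuberoot"), intake)
```

The round trip matters because usual intakes are estimated on the transformed scale and only reported after back-transformation.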

jitter

Logical; if TRUE, adds a small deterministic numeric offset after transformation to break ties (useful when values are identical after rounding).

warn_negative_between

Logical; if TRUE, issues warnings when the estimated between-person variance component is negative before flooring to zero.

repeater_policy

Character scalar specifying how strictly to enforce the minimum amount of replicate information:

"auto" – chooses a balanced adaptive rule based on available replicate information (default).
"strict" – enforces higher thresholds for replicate data before adjusting.
"lenient" – proceeds with adjustment even when replicate information is limited.

detailed

Logical; if TRUE, includes diagnostic columns such as observed mean, between-person and observed standard deviations, degrees of freedom, replicate count, and shrinkage ratio.

Value

A tibble containing one row per respondent and estimated usual intakes for each nutrient. If detailed = TRUE, additional columns include:

*_observed_mean – back-transformed observed mean intake.
*_sd_between, *_sd_observed – between-person and observed standard deviations.
*_df_resid, *_R – residual degrees of freedom and total replicate info.
*_shrink_ratio – the shrinkage factor applied.

Details

This function implements the NRC (1986) / IOM (2003) recommended approach for adjusting observed 24-hour recall data to estimate the distribution of usual nutrient intakes within a population. The workflow is:

Apply the chosen transformation (transform).
Identify individuals with >=2 recall days (repeaters).
Estimate within- and between-person variance using ANOVA among repeaters.
Derive shrinkage ratio = SD(between) / SD(observed).
Shrink each individual's mean intake toward the population mean by multiplying its deviation from that mean by the shrinkage ratio.
Back-transform to original units.
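The workflow above can be sketched in base R for a single nutrient. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name nrc_adjust_sketch and the toy intake values are made up, the cube-root transform is hard-coded, the observed variance is formed with the mean number of recall days per respondent, and the population mean is taken over person-level means.

```r
# Illustrative only: a simplified NRC-style adjustment for one nutrient.
nrc_adjust_sketch <- function(id, intake) {
  z <- intake^(1 / 3)                      # step 1: cube-root transform
  n_i <- tapply(z, id, length)             # recall days per respondent
  mean_i <- tapply(z, id, mean)            # person-level means (transformed)
  grand_mean <- mean(mean_i)               # assumption: mean of person means

  # Step 2: keep respondents with at least two recall days
  rep_ids <- names(n_i)[n_i >= 2]
  keep <- as.character(id) %in% rep_ids
  z_rep <- z[keep]
  g_rep <- factor(as.character(id)[keep])

  # Step 3: one-way ANOVA mean squares among repeaters
  ms <- anova(lm(z_rep ~ g_rep))[["Mean Sq"]]   # c(between, within)
  n_rep <- as.numeric(table(g_rep))
  n0 <- (sum(n_rep) - sum(n_rep^2) / sum(n_rep)) / (length(n_rep) - 1)
  var_within <- ms[2]
  var_between <- (ms[1] - var_within) / n0

  # Step 4: shrinkage ratio = SD(between) / SD(observed).
  # Assumption: observed variance uses the mean number of recall days.
  var_observed <- var_between + var_within / mean(n_i)
  ratio <- if (var_between > 0) sqrt(var_between / var_observed) else 1

  # Steps 5 and 6: shrink toward the population mean, then back-transform
  adjusted <- grand_mean + (mean_i - grand_mean) * ratio
  data.frame(
    id = names(mean_i),
    observed_mean = as.numeric(mean_i^3),
    usual = as.numeric(adjusted^3),
    shrink_ratio = ratio
  )
}

# Made-up data: four repeaters and one single-day respondent
toy <- data.frame(
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  intake = c(1800, 2200, 1500, 1600, 2400, 2600, 1900, 2100, 2000)
)
res <- nrc_adjust_sketch(toy$id, toy$intake)
res
```

In this sketch a ratio of 1 leaves the observed means unchanged, which mirrors the documented fallback when the between-person variance is not positive. With a ratio below 1, the usual intakes are less spread out than the observed means.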

When no repeaters are available, observed means are returned unchanged. If insufficient replicate information exists, the behaviour depends on repeater_policy.

When the estimated between-person variance is non-identifiable (<= 0), the NRC adjustment is skipped and observed mean intakes are returned with a warning.
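A non-positive between-person estimate can arise whenever day-to-day variation within respondents dominates the differences between them. The sketch below uses a standard method-of-moments estimate from one-way ANOVA to show this; the helper name between_variance_raw and the intake values are hypothetical, and the package's exact estimator may differ.

```r
# Hypothetical helper: method-of-moments between-person variance,
# (MS_between - MS_within) / n0, with no flooring applied.
between_variance_raw <- function(id, z) {
  g <- factor(id)
  ms <- anova(lm(z ~ g))[["Mean Sq"]]      # c(between, within)
  n_i <- as.numeric(table(g))
  n0 <- (sum(n_i) - sum(n_i^2) / sum(n_i)) / (nlevels(g) - 1)
  (ms[1] - ms[2]) / n0
}

# Made-up repeaters: person means are nearly equal, but the two recall
# days of each person differ a lot.
id <- c(1, 1, 2, 2, 3, 3)
z <- c(1000, 3000, 1200, 2900, 2800, 1100)

raw_between <- between_variance_raw(id, z)
raw_between < 0   # TRUE: within-person noise exceeds between-person spread
```

In a case like this there is no usable between-person signal, so shrinking toward the population mean cannot be calibrated, which is why returning observed means with a warning is a reasonable fallback.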

References

Institute of Medicine (2003). Dietary Reference Intakes: Applications in Dietary Planning. Washington (DC): National Academies Press. Appendix E. (https://www.ncbi.nlm.nih.gov/books/NBK221370/)

Examples

# Example with Energy and Protein
df <- tibble::tibble(
  id = c(1, 1, 2, 2, 3),
  Energy.kcal_intake = c(1800, 2200, 1500, 1600, 2000),
  Protein.g_intake = c(55, 65, 40, 42, 50)
)

estimate_usual_nutrient_intake(
  recall_data = df,
  id_col = "id",
  nutrient_cols = c("Energy.kcal_intake", "Protein.g_intake"),
  transform = "cuberoot"
)
#> Warning: Very limited replicate information for Energy.kcal_intake (df_resid = 2, R = 2). Skipping adjustment and returning observed means.
#> Warning: Very limited replicate information for Protein.g_intake (df_resid = 2, R = 2). Skipping adjustment and returning observed means.
#> # A tibble: 3 × 3
#>      id Energy.kcal_intake_usual Protein.g_intake_usual
#>   <dbl>                    <dbl>                  <dbl>
#> 1     1                    1993.                   59.9
#> 2     2                    1549.                   41.0
#> 3     3                    2000                    50
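Because the adjustment was skipped in the example above, the returned values should simply be the observed means. The base R check below suggests they are means taken on the cube-root scale and then back-transformed, which is why respondent 1 shows about 1993 rather than the arithmetic mean of 2000. This is a reading of the printed output, not a statement about the package internals.

```r
# Recompute the observed means for Energy by hand (base R only).
df <- data.frame(
  id = c(1, 1, 2, 2, 3),
  Energy.kcal_intake = c(1800, 2200, 1500, 1600, 2000)
)

# Mean on the cube-root scale per respondent, then cube to back-transform
observed <- tapply(df$Energy.kcal_intake^(1 / 3), df$id, mean)^3
round(observed)   # roughly 1993, 1549 and 2000, as in the printed column
```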