Due to the health and economic costs of childhood obesity, coupled with studies suggesting the benefits of comprehensive (dietary, physical activity and behavioral counseling) intervention, the United States Preventive Services Task Force recently recommended childhood screening and intervention for obesity beginning at age six. Using a longitudinal data set consisting of the body mass index of 3164 children up to age 18 and another longitudinal data set containing the body mass index at ages 18 and 40 and the presence or absence of disease (hypertension and diabetes) at age 40 for 747 people, we formulate and numerically solve – separately for boys and girls – a dynamic programming problem for the optimal biennial (i.e., at ages 2, 4, …, 16) obesity screening thresholds. Unlike most screening problem formulations, we take a societal viewpoint, where the state of the system at each age is the population-wide probability density function of the body mass index. Compared to the biennial version of the task force’s recommendation, the screening thresholds derived from the dynamic program achieve a relative reduction in disease prevalence of 3% at the same screening (and treatment) cost, or – due to the flatness of the disease vs. screening tradeoff curve – achieve the same disease prevalence at a 28% relative reduction in cost. Compared to the task force’s policy, which uses the 95th percentile of body mass index (from cross-sectional growth charts tabulated by the Centers for Disease Control and Prevention) as the screening threshold for each age, the dynamic programming policy treats mostly 16-year-olds (including many who are not obese) and very few males under 14 years old.
While our results suggest that adult hypertension and diabetes are minimized by focusing childhood obesity screening and treatment on older adolescents, the shortcomings in the available data and the narrowness of the medical outcomes considered prevent us from making a recommendation about childhood obesity screening policies.
The prevalence of obesity in U.S. school-aged children and adolescents tripled during 1980–1999 and has plateaued since 1999 at ≈ 17%, with 31.7% of U.S. children ages 2–19 overweight or obese (Ogden et al. 2010). Obese children tend to have elevated lipid concentrations and blood pressure (Freedman et al. 2007), and often become obese adults (Singh et al. 2008), with an increased risk of diseases such as hypertension and type 2 diabetes (e.g., Hubert et al. 1983, Barrett-Connor 1989). This is not only a public health problem, but an economic (the cost of adult obesity has been estimated at $147B/yr, Finkelstein et al. 2009) and national security (> 25% of Americans ages 17–24 are unqualified for the military due to their weight, Mission Readiness 2010) issue as well.
Two complementary approaches have been put forth to address this problem: a universal approach and a targeted approach. The universal approach focuses on better nutrition and more physical activity (as in First Lady Michelle Obama’s “Let’s Move” campaign, White House Task Force on Childhood Obesity 2010), and involves schools, families, communities, and the food industry. Some school-based initiatives have generated reductions in obesity (Gortmaker et al. 1999, Coleman et al. 2005), though others have not shown statistically significant effects (The HEALTHY Study Group 2010). A recent meta-analysis found that obesity reduction programs provide modest benefits, at least over the short run, including for children ages 0–5 (Waters et al. 2011).
The targeted approach requires measuring children’s body mass index (BMI), which is the weight divided by the square of the height, measured in kg/m² and rounded to one decimal place. BMI is a practical and reasonably accurate (Reilly et al. 2000) measure of a person’s true body fat, and obesity and overweight are defined as being at or above the 95th and 85th percentiles, respectively, of the gender- and age-based BMI distributions tabulated by the Centers for Disease Control and Prevention (Kuczmarski et al. 2000). These tabulated percentiles are based on historical distributions, which explains how 17% of U.S. children can exceed the 95th percentile. Many school districts and entire states require measuring a child’s BMI every several years, but for surveillance – not screening – purposes (Nihiser et al. 2007).
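The BMI definition above can be sketched in a few lines. The function name and the input check are illustrative assumptions rather than part of any survey's processing pipeline.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2, rounded to one decimal place."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return round(weight_kg / height_m ** 2, 1)
```

Whether a given BMI counts as overweight or obese then depends on the gender- and age-specific percentile cutoffs in the CDC growth charts, which are not reproduced here.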
Although an Institute of Medicine report states that health-care professionals should routinely track children’s BMI and offer evidence-based counseling and guidance (Institute of Medicine 2005), clinical trials for treating childhood obesity have been hampered until recently by small sample sizes, poor generalizability and variable followup, leading to insufficient evidence for effectiveness of intervention in primary care settings (e.g., U.S. Preventive Services Task Force 2005). However, a targeted systematic review to support the updated U.S. Preventive Services Task Force (USPSTF) recommendations concludes that research into obesity treatment has improved in the past several years, leading to at least short-term benefits of comprehensive medium- to high-intensity (> 25 hr over six months) interventions that include dietary, physical activity and behavioral counseling components (Whitlock et al. 2010). Consequently, the USPSTF recommends childhood BMI screening and intervention for obesity starting at age six (USPSTF 2010).
This state of affairs highlights the need to assess possible intervention strategies. To our knowledge, the only study to do this compares screening and treatment at a single age (5, 10 or 15) to universal intervention, and suggests that, should effective universal interventions be available, this approach could lead to greater health benefits than the three single-age screening strategies, and that screening at age 15 is more beneficial than screening at ages 5 or 10 because some children who are obese at these younger ages are not obese at age 18 in the absence of treatment (Goldhaber-Fiebert et al. 2012).
In this paper, we derive optimal screening thresholds for children under 18 years old, but do not prove that threshold policies are optimal among all policies. Using biennial BMI data from the National Longitudinal Survey of Youth (NLSY) samples (Bureau of Labor Statistics 2010), we model the Markovian evolution of the probability density function (pdf) of BMI for all U.S. children at ages t = 0, 2, …, 18, assuming there is no obesity treatment. We optimize over the class of biennial threshold policies, where every child has his BMI measured every two years and receives treatment whenever his BMI exceeds a gender- and age-based threshold. The impact of treatment is estimated using the pooled analysis of three comprehensive moderate- or high-intensity behavioral interventions (Whitlock et al. 2010). The objective function of our model is to minimize the disease prevalence of these children when they reach age 40, which is estimated using Panel Study of Income Dynamics (PSID) data (Institute for Social Research 2010). The optimal screening problem is formulated as a deterministic dynamic program with an infinite-dimensional state (the pdf of the BMI in the population at age t). We then use a simulation model, which has much less restrictive assumptions than the dynamic program, to compare the performance of the USPSTF policy and the policy derived from the dynamic program. When we refer to an “optimal policy” we mean the solution to the dynamic program; due to shortcomings in the data and the problem formulation (which are discussed in §6), we are not suggesting that the dynamic programming policy is also the optimal approach to the actual problem. Our goal here is to present an initial mathematical formulation of this problem that makes the best use of available data and identifies key data needs and modeling assumptions that might improve future studies.
The problem considered here shares features with other screening and treatment problems, such as the screening of cholesterol (Garber et al. 1994), HIV viral load (D’Amato, D’Aquila and Wein 2000), CD4 count (Shechter et al. 2008), estimated glomerular filtration rate (Lee, Chertow and Zenios 2008), or prostate specific antigen level (Lavieri et al. 2009). While the latter four papers consider decisions of when to initiate or switch treatment from an individual’s point of view (i.e., the state of the system measures a person’s health status), our paper is perhaps closest in spirit to Garber et al. (1994) because both studies consider the time evolution of the population-wide pdf of a health measure (cholesterol in their case, BMI in our case). While Garber et al.’s goal is forecasting future cholesterol levels in the absence of treatment, we embed the dynamics in the presence of screening and treatment into a dynamic programming framework and derive optimal screening thresholds.
Exploratory data analyses in §2 guide our problem formulation in §3. Parameter estimation is undertaken in §4 and a computational study appears in §5. The results are discussed in §6.
The NLSY data set has a longitudinal population cohort of children born to women who participated in the NLSY between 1970 and 1988. Each of 3164 children (1566 boys and 1598 girls) has biennial BMI measurements from ages 0 to 18, which are partially self-reported or mother-reported. For children measured at odd-numbered ages, we linearly interpolate to obtain their BMI at even-numbered ages, and the analyses in this section consider BMI at ages 0, 2, …, 18. Each child in this data set has an attached weight such that the weighted sample of the 3164 children represents all children born between 1970 and 1988 to women ages 21 to 31 who were present in the U.S. in 1979, and all of our analyses are on the weighted sample.
We denote the BMI of a random child at age t by Bt and let B be the stochastic process (B0, B2, …, B18). We explore the Markov property of B in §2.1, the distribution of Bt in §2.2, and the distribution of the BMI increments (Bt+2 − Bt) in §2.3.
The mean and the standard deviation of Bt decrease during ages 0–4, plateau at ages 4–6, and increase during ages 6–18 (Fig. 1 of the Electronic Companion), and hence we dismiss the possibility that B has stationary transition probabilities. We use the approach in Garber et al. (1994), which tests the hypothesis (all tests in this study are performed at the 95% confidence level) that B is a first-order Markov chain via a direct test of the Chapman-Kolmogorov equations. The significance levels of the chi-square statistics, which measure the difference between the observed and expected two-step (i.e., from t to t + 4) transitions, are derived by a bootstrap technique (see §1 of the Electronic Companion for details). Of the 16 tests (ages t = 0, 2, …, 14 for each gender), we reject the null hypothesis that BMI levels follow a first-order Markov chain in four cases: males at age 12 and females at ages 0, 6 and 14. Under a Bonferroni correction, which considers the 1 − 0.05/8 = 0.99375 fractile of the chi-square distributions because there are eight age groups being tested, the Markov hypothesis is not rejected for any age-gender combination. However, because we are trying to retain the null hypothesis, this Bonferroni approach is not conservative.
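As a rough illustration of the test just described, the sketch below compares observed two-step transition counts with those implied by the product of two one-step transition matrices, and computes the Bonferroni-adjusted fractile. The function names and the discretization of BMI into a finite set of states are assumptions made for illustration; the bootstrap used to obtain significance levels is not reproduced.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chapman_kolmogorov_chi2(p_first, p_second, observed_two_step):
    """Chi-square distance between observed two-step transition counts and the
    counts expected under the Markov hypothesis.

    p_first, p_second: one-step transition matrices (t -> t+2, t+2 -> t+4).
    observed_two_step: counts of observed transitions from t to t+4.
    """
    expected_p = matmul(p_first, p_second)
    stat = 0.0
    for i, row in enumerate(observed_two_step):
        n_i = sum(row)  # number of children starting in state i
        for j, obs in enumerate(row):
            exp = n_i * expected_p[i][j]
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def bonferroni_fractile(alpha, n_tests):
    """Fractile of the reference distribution after a Bonferroni correction."""
    return 1.0 - alpha / n_tests
```

With a significance level of 0.05 and eight age groups, `bonferroni_fractile(0.05, 8)` evaluates 1 − 0.05/8.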
Initial exploratory analysis shows that Bt follows neither a normal nor a lognormal distribution for any t, although it is much closer to the latter (data not shown). Consequently, to increase the normality of the data, we employ a Box-Cox transformation of ln Bt, which we denote by $X_t = [(\ln B_t)^{\lambda} - 1]/\lambda$ (Box and Cox 1964). The optimal value (i.e., the value that makes the distribution closest to normal) of the Box-Cox parameter is λ = −2.15 for males and λ = −2.02 for females. The resulting Q-Q plots in Fig. 2 of the Electronic Companion suggest that Xt is reasonably approximated by a normal distribution, particularly for the higher age groups. The Kolmogorov-Smirnov goodness-of-fit test for a normal distribution shows that at the 95% confidence level, Xt passes four of the 20 tests: girls at ages 0, 8, 10 and 12. Further analysis shows that the majority of the deviation in the failed tests occurs in the left tail of the distribution (i.e., unusually low BMI levels), which is not important for the purposes of obesity screening. We also estimated the normal parameters from just the right half of Xt (i.e., only the data larger than the median), and obtained qualitatively similar results to those in Table 1 (data not shown).
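A minimal sketch of the transformation and its inverse, assuming the standard one-parameter Box-Cox form applied to ln Bt. The function names are hypothetical; the λ values are those reported above.

```python
import math

# Box-Cox parameters reported in the text.
LAMBDA = {"male": -2.15, "female": -2.02}

def bmi_to_x(b: float, lam: float) -> float:
    """Box-Cox transform of ln(B): X = ((ln B)^lam - 1) / lam (requires B > 1)."""
    y = math.log(b)
    if y <= 0:
        raise ValueError("ln(B) must be positive for a non-integer power")
    return (y ** lam - 1.0) / lam

def x_to_bmi(x: float, lam: float) -> float:
    """Inverse transform: B = exp((1 + lam * X)^(1 / lam))."""
    return math.exp((1.0 + lam * x) ** (1.0 / lam))
```

Because the map is strictly increasing in B, a threshold on BMI corresponds to a single threshold on the transformed scale. With a negative λ the transformed values are bounded above by −1/λ, so differences between large BMI values are compressed.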
Because the increment Xt+2 − Xt depends upon the value of Xt, we assess the normality of the increments using a 95% Kolmogorov-Smirnov test on the increment from age t to t + 2 for t = 0, 2, …, 16, where the (untransformed) BMI values are segregated into different buckets. The results are given in Tables 6–7 in the Electronic Companion when BMI is divided into 25 buckets (< 12, 12 – 13, 13 – 14, …, 34 – 35, 35+). A cruder system with 6 buckets (< 16, 16 – 18, 18 – 20, 20 – 22, 22 – 24, > 24) generates the results in Tables 8–the Electronic Companion. With 25 buckets, only 13 of the 362 tests (88 of the 450 possible tests had no data) failed the normality test, whereas 36 of the 108 normality tests failed in the six-bucket system.
By our analysis in §2 we assume that X = (X0, X2, …, X18) is a Markov process, where $X_t = [(\ln B_t)^{\lambda} - 1]/\lambda$ is the Box-Cox transformation of ln Bt. Because we are considering this screening problem from a societal – rather than individual – point of view, the system state at age t is given by ft(x), which is the pdf of Xt. If treatment is given at age t, it is given after the measurement Xt = x. In the absence of treatment at age t, the increment Xt+2 − Xt given Xt = x has pdf gxt(w), where w is the dummy variable for the value of the increment. By the analysis in §2.3, we assume that gxt(w) is the pdf of a normal distribution with mean μgxt and standard deviation σgxt. If treatment is given at age t, we assume that Xt+2 − Xt given Xt = x has a normal distribution with pdf hxt(w), mean μhxt and standard deviation σhxt; note that the impact of treatment is assumed to be independent of the impact of treatment received at previous ages. We consider biennial threshold policies with thresholds θt, where a person with Bt > θt (i.e., $X_t > [(\ln \theta_t)^{\lambda} - 1]/\lambda$) receives treatment at age t, for t = 0, 2, …, 16 (while we define θ0 for notational convenience, we do not allow treatment at birth and set θ0 = ∞). The system state equation is
$$f_{t+2}(x) = \int_{-\infty}^{[(\ln \theta_t)^{\lambda} - 1]/\lambda} f_t(y)\, g_{yt}(x - y)\, dy + \int_{[(\ln \theta_t)^{\lambda} - 1]/\lambda}^{\infty} f_t(y)\, h_{yt}(x - y)\, dy. \tag{1}$$
In the absence of any treatment (i.e., θt = ∞∀t), equation (1) implies that if the distribution of X0 is normal then the distribution of Xt is normal for t = 2, 4, …, 18, which is consistent with our analysis in §2.2–2.3 that concluded that both Xt and Xt+2 − Xt are roughly normal.
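The state transition can also be viewed at the level of individual sample paths. The sketch below advances a simulated population by two years under a threshold policy: children above the threshold draw their increment from the treated distribution, the others from the untreated one. The function name and the two callables that supply (mean, standard deviation) pairs are hypothetical stand-ins for the estimated parameters.

```python
import random

def step_population(xs, threshold_x, untreated, treated, rng):
    """Advance each child's transformed BMI by two years.

    xs:          transformed BMI values X_t for the simulated children
    threshold_x: screening threshold expressed on the transformed scale
    untreated:   function x -> (mean, sd) of the increment without treatment
    treated:     function x -> (mean, sd) of the increment with treatment
    Returns the new values X_{t+2} and the fraction of children treated.
    """
    next_xs = []
    n_treated = 0
    for x in xs:
        if x > threshold_x:
            mu, sd = treated(x)
            n_treated += 1
        else:
            mu, sd = untreated(x)
        next_xs.append(x + rng.gauss(mu, sd))
    return next_xs, n_treated / len(xs)
```

Setting `threshold_x` to `float("inf")` reproduces the no-treatment case, in which every increment comes from the untreated distribution.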
We assume that the initial system state, f0(x), is a skew-normal pdf with location parameter μf0, scale parameter σf0 and shape parameter αf0, which has pdf
$$f_0(x) = \frac{2}{\sigma_{f_0}}\, \phi\!\left(\frac{x - \mu_{f_0}}{\sigma_{f_0}}\right) \Phi\!\left(\alpha_{f_0}\, \frac{x - \mu_{f_0}}{\sigma_{f_0}}\right),$$
where ϕ(z) and Φ(z) are the pdf and cdf of the standard normal distribution (O’Hagan and Leonard 1976). The shape parameter αf0 indicates the amount of skewness, and f0(x) reduces to a normal pdf when αf0 = 0.
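The skew-normal density can be evaluated with the standard library alone. This is a sketch with hypothetical function names, using the usual location-scale-shape parameterization.

```python
import math

def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def skew_normal_pdf(x: float, loc: float, scale: float, shape: float) -> float:
    """Skew-normal density: (2 / scale) * phi(z) * Phi(shape * z), z = (x - loc) / scale."""
    z = (x - loc) / scale
    return 2.0 / scale * std_normal_pdf(z) * std_normal_cdf(shape * z)
```

With `shape=0` the second factor equals 1/2 and the expression collapses to the ordinary normal density.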
For t = 0, 2, …, 18, we define the disease function ct(x) to be the contribution to the disease prevalence at age 40 due to having Xt = x at age t. The objective is to minimize the disease prevalence at age 40, which in §4.4 is computed using a nonseparable function of (f0(x), f2(x), …, f18(x)). We use a surrogate objective function,
$$\min_{\theta_2, \theta_4, \ldots, \theta_{16}} \; \sum_{t \in \{0, 2, \ldots, 18\}} \int_{-\infty}^{\infty} c_t(x)\, f_t(x)\, dx, \tag{2}$$
which is separable, and hence fits into the dynamic programming framework. This issue is discussed further at the end of §4.4.
In §4.4, we estimate the disease functions (c0(x), c2(x), …, c18(x)) by assuming that the Markov property of X continues until age 40 (i.e., X40 depends on (X0,X2, …, X18) only via X18) and by assuming that the prevalence of disease at age 40 is a function of the average BMI throughout the first 40 years; this latter assumption is relaxed in the sensitivity analyses in §5.3.
Because the USPSTF found no evidence of adverse effects from treatment on growth, eating-disorder pathology or mental health (USPSTF 2010), the main costs associated with screening and treatment are financial. We simply impose a constraint on the fraction of children ages 2, 4, …, 16 that screen positive (i.e., have a BMI exceeding their gender- and age-specific BMI threshold) each year, assuming there are the same number of children of each age in the population:
$$\frac{1}{8} \sum_{t \in \{2, 4, \ldots, 16\}} \int_{[(\ln \theta_t)^{\lambda} - 1]/\lambda}^{\infty} f_t(x)\, dx \;\le\; p. \tag{3}$$
We refer to p as the treatment prevalence. Because the screening cost is negligible compared to the treatment cost, the left side of (3) is essentially proportional to the annual treatment cost. By varying p on the right side of (3), we can generate a range of policies and easily compare derived policies to recommended policies (e.g., USPSTF 2010).
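A small sketch of how a candidate policy could be checked against the treatment-prevalence constraint. It assumes that the left side of (3) is the simple average, over the eight screening ages, of the fraction of children who screen positive at each age; the function names and the dictionary layout are hypothetical.

```python
SCREENING_AGES = (2, 4, 6, 8, 10, 12, 14, 16)

def treatment_prevalence(positive_fraction_by_age):
    """Average fraction screening positive across the eight screening ages.

    positive_fraction_by_age maps age -> fraction of that age group whose BMI
    exceeds the threshold; ages missing from the mapping count as zero
    (i.e., an infinite threshold at that age).
    """
    total = sum(positive_fraction_by_age.get(t, 0.0) for t in SCREENING_AGES)
    return total / len(SCREENING_AGES)

def satisfies_budget(positive_fraction_by_age, p, tol=1e-12):
    """True if the policy's treatment prevalence does not exceed p."""
    return treatment_prevalence(positive_fraction_by_age) <= p + tol
```

For example, a policy that treats 20% of 14-year-olds and 60% of 16-year-olds, and nobody else, has a treatment prevalence of 0.8/8 = 0.1 under this convention.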
Notice that although our system state is a pdf, the state transition is deterministic, and hence our problem, which is given by (1)–(3), is a deterministic constrained dynamic program. We do not derive any structural results about this dynamic program; e.g., we do not show that the optimal policy is a threshold policy.
The initial BMI distribution, the BMI increments in the absence of treatment, the BMI increments in the presence of treatment, and the disease functions are estimated in §4.1– §4.4.
Using the BMI data at birth, the MLE parameter values for the skew-normal pdf f0(x) are μf0 = 0.42, σf0 = 0.009 and αf0 = −1.00 for boys, and μf0 = 0.43, σf0 = 0.006 and αf0 = −1.30 for girls. As expected from §2.2, these two distributions have very little skewness (figures not shown).
In the absence of treatment, we assume that Xt+2 −Xt given Xt = x is normal with pdf gxt(w). We denote its mean and standard deviation by μgxt and σgxt, and estimate the values of these parameters using the NLSY data set. These parameter values are plotted as functions of x for t = 0, 2, …, 16 for boys and girls in Fig. 3 of the Electronic Companion.
If treatment is given at age t, we assume that Xt+2 − Xt given Xt = x is normal with pdf hxt(w), mean μhxt and standard deviation σhxt, where this increment captures the natural progression in the absence of treatment plus the effect of treatment. Although there are a number of meta-analyses that assess the impact of treatment for childhood obesity, we use data from the three comprehensive moderate- or high-intensity programs considered in Table 1 of Whitlock et al. (2010) (Savoye et al. 2007, Reinehr et al. 2006, and Nemet et al. 2005), which is a targeted systematic review to support the updated USPSTF recommendations. These data provide aggregate results on the effect of treatment and do not allow us to reliably estimate the dependence of treatment on the gender, age or pre-treatment BMI level; the latter two factors are interrelated because younger children tend to have smaller BMI values (e.g., the CDC obesity threshold at 16 years old is approximately 50% larger than at 6 years old).
The approach used to model the effect of treatment in the face of limited data is both important (i.e., it will affect the qualitative conclusions of our study) and nonobvious. Given that we cannot develop detailed functions that depend on age or pre-treatment BMI level, three modeling approaches come to mind: let the treatment-induced change in (i) BMI, Bt; (ii) transformed BMI, Xt; or (iii) z-BMI, be independent of gender, age and pre-treatment BMI level. We denote z-BMI by $Z_t = [(B_t/M_t)^{L_t} - 1]/(L_t S_t)$, where Mt is the median, St is the generalized coefficient of variation, and Lt is the parameter of the Box-Cox transformation (these three parameters also depend on gender and are given in Kuczmarski et al. (2000)); Zt can be converted into percentiles using standard normal distribution tables. We pursue option (ii), and in §3 of the Electronic Companion, we show that our treatment model accurately captures the dependence on pre-treatment BMI (albeit in a somewhat narrow range of pre-treatment levels), and successfully predicts that the change in z-BMI is larger for younger children than for older children (as shown in clinical studies by Epstein et al. 2007).
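The z-BMI calculation follows the LMS form used in the CDC growth charts. The sketch below uses hypothetical function names, and the L, M and S values must be looked up for the child's gender and age; none are supplied here.

```python
import math

def z_bmi(b: float, L: float, M: float, S: float) -> float:
    """LMS z-score: ((B / M)^L - 1) / (L * S), with the log form when L = 0."""
    if L == 0:
        return math.log(b / M) / S
    return ((b / M) ** L - 1.0) / (L * S)

def z_to_percentile(z: float) -> float:
    """Percentile (0-100) of a z-score under the standard normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A child whose BMI equals the median M has a z-score of zero and sits at the 50th percentile, regardless of L and S.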
We estimate the impact of treatment in three steps; details are in §2 of the Electronic Companion. First, we transform the BMI data in Table 1 of Whitlock et al. (2010) from B to X using a Taylor series expansion around the mean of B, which gives us the mean and standard deviation of the change in X due to treatment for each of the three studies. In the second step, we compute the mean and standard deviation of the net treatment (i.e., treatment minus control) effect, pooled over the three studies. In the final step, we combine the natural progression and the net treatment effect to obtain μhxt and σhxt.
The most difficult task in this section is to estimate the four sets of disease functions (c0(x0), c2(x2), …, c18(x18)), which allow us to predict the disease (hypertension or diabetes) prevalence at age 40 in terms of (B0, B2, …, B18) for each gender. We use three different data sets, which measure BMI from three different sets of individuals, to estimate these functions: the NLSY data that contain (B0, …, B18), the PSID data (Institute for Social Research 2010), which have self-reported BMI at ages 18 and 40 and the presence or absence of hypertension and of type 2 diabetes by age 40, and the National Health and Nutrition Examination Survey (NHANES) data (National Center for Health Statistics 2006), which contain annual BMI data for ages [18,40]. The NHANES data set is cross-sectional, and to use it to infer patterns of BMI changes between ages 18 and 40 in the PSID data requires the assumption that there are no substantial cohort-specific trends.
This subsection begins with a literature review, which motivates our key assumption that determines the independent variable in our logistic regression for the prevalence of disease at age 40. Then we use the NHANES database to estimate the parameter γ, which tells us how to weight B18 and B40 in the PSID data so as to succinctly capture how BMI changes between ages 18 and 40. With γ in hand, we re-express the independent variable in terms of Xt rather than Bt to obtain the disease functions, and run four logistic regressions (for each combination of gender and disease) by stringing together sample paths from the NLSY and PSID data sets. Finally, we use the regression results to calculate the disease prevalence at age 40 for a generic set of BMI thresholds.
We briefly review the literature on how BMI throughout various stages of life impacts adult disease. A key issue – whether, after controlling for adult BMI, childhood BMI impacts adult health – has not been answered conclusively. A 55-year follow-up study found that, after controlling for adult BMI and other factors such as smoking and socioeconomic status, the risk of heart disease mortality among men increased with adolescent BMI, with many of the associated deaths occurring in their 40s (Must et al. 1992). While another study (Mahoney et al. 1996) obtained similar results, four studies found no influence on adult disease from childhood BMI after controlling for adult BMI. However, these four studies do not necessarily conflict with the finding in Must et al. (1992): Colditz et al. (1990) considered diabetes in women, Willett et al. (1995) considered heart disease in women, Freedman et al. (2001) followed up at ages 18–37, and Raitakari et al. (2003) followed up at ages 24–39. On a related issue, some studies have found that weight gain or BMI increase in adulthood is a strong predictor of adult disease (Wilsgaard and Arnesen 2007 and references therein).
Several studies have shown that adult disease is a j-shaped function of childhood weight, where low-weight children were undernourished, leading to adult obesity (Abraham, Collins and Nordsieck 1971, Gunnell et al. 1998). Another study showed that death by age 32 is a j-shaped function of B18, with lung cancer and cardiovascular disease being the main causes of death for low- and high-BMI individuals, respectively.
Obesity at a fixed age in childhood and young adulthood is associated with risk factors for heart disease at that age (e.g., Freedman et al. 2007). Similarly, childhood obesity is a strong predictor of young-onset type 2 diabetes (Franks et al. 2007). Moreover, childhood obesity is associated with fibrous plaques (Berenson et al. 1998) and coronary artery calcification (Mahoney et al. 1996) in young adults, suggesting that “the development of coronary heart disease may depend on the cumulative lifetime effects of obesity” (pg 715, Freedman et al. 2001). On a different note, low BMI up to two years of age followed by an early adiposity rebound is associated with diabetes in young adulthood (Eriksson et al. 2003, Bhargava et al. 2004); we do not model this effect, but discuss it further in §6.
Due to the conceptual simplicity of the quote from Freedman et al. (2001) and for lack of definitive data, we estimate the disease functions under the assumption that the prevalence of disease at age 40 depends on the average (or, equivalently, cumulative) amount of obesity throughout the first 40 years of life. We emphasize that we cannot test this assumption directly, but rather we assume that it is true for the purposes of analysis; as discussed in §6, the lack of an evidence base for this assumption – or any competing assumptions – prevents us from making policy recommendations. To operationalize this assumption, we take as our measure of obesity at age t the BMI Bt. Before carrying out the remainder of these calculations, we note that the z-BMI Zt may be a more appropriate measure of obesity than Bt due to the increase with age of the population-wide mean BMI. Consequently, we consider a variant of the key assumption where our measure of obesity is Zt instead of Bt in §5.3, where we find that Bt is a more powerful predictor of adult disease incidence (as measured by McFadden pseudo-R², McFadden 1973) than Zt.
Recall that the NLSY data set has Bt for t = 0, 2, …, 18, and the PSID data set has B18, B40 and D40 (which equals 1 or 0 in the presence or absence of disease) for a different set of individuals. While it would be natural to use B2t as a measure of the cumulative amount of obesity up to age 40, even if we string together the sample paths from the NLSY and PSID data sets, we still only have BMI data for (B0, B2, …, B18, B40). Hence, we use the NHANES data to approximate the BMI evolution over ages [18,40] by a convex combination of B18 and B40. If we define p40 = P(D40 = 1), then – based on our key assumption – we can run a logistic regression where the log-odds $\ln[p_{40}/(1 - p_{40})]$ is the dependent variable and the average BMI throughout ages [0,40] is the independent variable. This latter quantity is given by
where the subscripts N and P are used to denote the NLSY and PSID data sets, respectively, because we have B18 values from both data sets (note that in both the dependent and independent variables, we are omitting a subscript for each individual person).
To find γ in (4), we use the NHANES data set (National Center for Health Statistics 2006), which contains annual BMI data for ages [18,40] for 3721 males and 4245 females. The mean BMI for each age is plotted in Fig. 5 in the Electronic Companion, along with the best-fitting quadratic curve, which has endpoints f,18 and f,40. We compute γ by equating an estimate of the average BMI during [18,40], which is the area under the quadratic curve divided by 22, to the weighted average [γf,18 + (1 − γ)f,40]. This procedure yields γ = 0.49 for females and γ = 0.23 for males. Note that γ ≈ 0 if the additional weight during [18,40] is gained soon after age 18, and γ ≈ 1 if the additional weight is gained just before age 40. Hence, our estimates suggest that females tend to gain weight uniformly throughout this age range (we do not exclude pregnant women from this analysis), while males tend to increase their BMI in the earlier years of this age range.
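The calculation of γ can be sketched as follows: integrate the fitted quadratic over [18,40], divide by 22 to get the mean BMI, and solve mean = γ·f(18) + (1 − γ)·f(40) for γ, which gives γ = (f(40) − mean)/(f(40) − f(18)). The function name is hypothetical and the coefficients in the usage note are made up for illustration; the fitted NHANES coefficients are not given in the text.

```python
def gamma_from_quadratic(c2, c1, c0, age_lo=18.0, age_hi=40.0):
    """Weight gamma such that gamma*f(age_lo) + (1-gamma)*f(age_hi) equals the
    mean of f(a) = c2*a^2 + c1*a + c0 over [age_lo, age_hi]."""
    def f(a):
        return c2 * a * a + c1 * a + c0

    def antiderivative(a):
        return c2 * a ** 3 / 3.0 + c1 * a ** 2 / 2.0 + c0 * a

    mean_bmi = (antiderivative(age_hi) - antiderivative(age_lo)) / (age_hi - age_lo)
    f_lo, f_hi = f(age_lo), f(age_hi)
    if f_hi == f_lo:
        raise ValueError("endpoints coincide, so gamma is not identified")
    return (f_hi - mean_bmi) / (f_hi - f_lo)
```

A straight line (`c2 = 0`) gives γ = 1/2, matching the uniform-gain case. A concave curve that rises quickly after age 18 and then flattens, such as `gamma_from_quadratic(-0.01, 0.8, 12.0)`, gives a value below 1/2.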
Note that for the purposes of solving the dynamic program, it suffices to minimize the independent variable (i.e., the independent variable in (4) can be taken as the cost function); the regression coefficients are needed to estimate D40 under the optimal policy, but are not needed to derive the optimal screening and treatment policy. Now that we have the independent variable in terms of Bt, we need to convert it into functions of Xt for t = 0, 2, …, 18. By (4), the cost function can be given by
To express (5) solely in terms of (X0, X2, …, X18), we use the PSID data set to write B40 as a quadratic function of B18, which yields $B_{40} = a_2 B_{18}^2 + a_1 B_{18} + a_0$, where (a2, a1, a0) = (−0.0198, 1.94, −7.27) for males and (−0.0210, 2.19, −11.19) for females.
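As a quick check of what the fitted coefficients imply, the sketch below evaluates the quadratic for a given age-18 BMI. The function and dictionary names are hypothetical; the coefficients are those reported above.

```python
# (a2, a1, a0) from the PSID fit reported in the text.
QUADRATIC_COEFFS = {
    "male": (-0.0198, 1.94, -7.27),
    "female": (-0.0210, 2.19, -11.19),
}

def predict_b40(b18: float, sex: str) -> float:
    """Fitted age-40 BMI as a quadratic function of age-18 BMI."""
    a2, a1, a0 = QUADRATIC_COEFFS[sex]
    return a2 * b18 ** 2 + a1 * b18 + a0
```

For an age-18 BMI of 22, the fit gives roughly 25.8 for males and 26.8 for females.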
By (5) and by factoring out and dropping the data set subscript from X18, the disease functions are
where (λ, γ, a2, a1, a0) = (−2.15, 0.23, −0.0198, 1.94, −7.27) for males and (−2.02,0.49,−0.0210,2.19,−11.19) for females.
We now run the logistic regression, where the dependent variable is the log-odds $\ln[p_{40}/(1 - p_{40})]$ and the independent variable is (4). To review, the BMI values in the independent variable are (B0, B2, …, B16, B18,N, B18,P, B40), where we have (B0, B2, …, B16, B18,N) for 1566 boys and 1598 girls from the NLSY data set, and we have (B18,P, B40) for 279 men and 468 women from the PSID data set. Using the Markov assumption that B40 depends on (B0, B2, …, B18) only via B18, we string the data sets together by randomly assigning sample paths (B0, B2, …, B18,N) from the NLSY data based on the value of B18,P in the PSID data, although we employ bootstrapping to overcome the limited sample size of the latter data set. More specifically, the PSID data for 747 people are sampled with replacement with sampling probability proportional to the sample weights (which describe how representative each of the 747 people is of the general population) to create a bootstrapped data set of 747,000 people. For each bootstrap sample (B18,P, B40, D40), one NLSY sample path (B0, B2, …, B16, B18,N) of the same gender is selected to link to it with a probability that is proportional to where ϕ(x) is the standard normal pdf and the standard deviation of 0.03 induces a strong distance effect while maintaining a reasonable number of NLSY samples in each selection. The results of the four logistic regressions are given in Table 12 of the Electronic Companion.
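The two sampling steps can be sketched with the standard library. The weighted bootstrap follows the description directly. The linking step is only illustrative: the exact argument of ϕ is not recoverable here, so the sketch assumes a Gaussian kernel applied to the raw difference between the two age-18 values, and all function names are hypothetical.

```python
import math
import random

def weighted_bootstrap(records, weights, n_draws, rng):
    """Sample records with replacement, with probability proportional to weights."""
    return rng.choices(records, weights=weights, k=n_draws)

def gaussian_kernel(distance, sd):
    """Unnormalized normal density; only relative sizes matter for sampling."""
    return math.exp(-0.5 * (distance / sd) ** 2)

def link_nlsy_path(value18_psid, nlsy_paths, sd, rng):
    """Pick one NLSY path with probability proportional to a Gaussian kernel of
    the gap between its last (age-18) value and the PSID age-18 value.
    ASSUMPTION: the distance is the plain difference of the two values."""
    weights = [gaussian_kernel(path[-1] - value18_psid, sd) for path in nlsy_paths]
    return rng.choices(nlsy_paths, weights=weights, k=1)[0]
```

A small `sd` concentrates the selection on paths whose age-18 value is very close to the PSID value, which is the "strong distance effect" mentioned above. Paths of the opposite gender should be filtered out before calling `link_nlsy_path`.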
Using equation (1), we simulate 10⁶ sample paths of BMI values (x0, x2, …, x18) under a generic threshold policy (θ0, θ2, …, θ16). By construction, note that the independent variable in (4) can also be expressed as $\sum_{t \in \{0, 2, \ldots, 18\}} c_t(x_t)$. For each individual, we calculate his value of the independent variable using (6)–(7), and use the regression coefficients in Table 12 of the Electronic Companion to determine his value of p40, which is
We compute the disease prevalence by taking the average of p40 over all 10⁶ individuals (where each individual has a BMI sample path). From a computational viewpoint, this approach is much easier than integrating (8) over the ten-dimensional joint pdf of ft(x).
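The Monte Carlo average can be sketched as below. The intercept and slope arguments are placeholders for the fitted coefficients in Table 12 of the Electronic Companion, which are not reproduced here, and the function names are hypothetical.

```python
import math

def logistic_probability(value, intercept, slope):
    """Inverse of the log-odds link: p = 1 / (1 + exp(-(intercept + slope * value)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * value)))

def simulated_prevalence(independent_values, intercept, slope):
    """Average individual disease probability over the simulated sample paths."""
    probs = [logistic_probability(v, intercept, slope) for v in independent_values]
    return sum(probs) / len(probs)
```

The average is taken over individual probabilities rather than applying the logistic function to the population-average value. Because the logistic function is nonlinear, these two quantities generally differ, which is one way to see why the prevalence is not a separable function of the age-specific densities.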
Note that our disease prevalence is the expected value of the quantity in (8), which is a nonseparable function of (f0(x), f2(x), …, f18(x)), and hence cannot be used in the dynamic programming formulation. Instead, we minimize the separable cost function in equation (2). Although we have not been able to show analytically that minimizing (2) leads to minimizing the simulated disease prevalence in (8), the simulated disease prevalence is indeed monotonic in the value of (2) in our numerical computations in §5.2.
We briefly discuss the computational aspects of the dynamic program in §5.1. The base-case results and the sensitivity analyses are presented in §5.2 and §5.3, respectively.
To deal with constraint (3), we use a “finite-fuel” approach (e.g., Benes, Shepp and Witsenhausen 1980), where yt is the slack in the treatment prevalence from age t to age 16, and augment the system state to be (ft(x), yt). If we define Vt(ft(x), yt) to be the minimum remaining cost until age 18 when the state at age t is (ft(x), yt), then the optimality equation for t = 2, 4, …, 16 is
where the minimization is subject to the finite-fuel constraint y18 ≥ 0, and the state transitions for ft(x) are defined in (1). Note that the optimal thresholds are independent of disease (i.e., hypertension vs. diabetes) because the disease prevalence for both diseases is an increasing function of the objective function in equation (2).
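The finite-fuel idea, carrying the remaining budget as an extra state variable so that the constraint is enforced inside the recursion, can be illustrated on a toy problem. The sketch below is not the paper's dynamic program: the state is a single integer rather than a pdf, and the drift, treatment effects, and cost are hypothetical numbers with no empirical meaning.

```python
from functools import lru_cache

DRIFT = (2, 2, 2)    # hypothetical per-stage increase of the scalar state
EFFECT = (1, 1, 4)   # hypothetical reduction per unit of fuel spent at each stage


def solve(x0, fuel):
    """Return (minimum total cost, optimal fuel plan) for the toy problem."""
    n_stages = len(DRIFT)

    @lru_cache(maxsize=None)
    def value(t, x, y):
        # y is the remaining fuel; choosing u <= y keeps the slack nonnegative.
        if t == n_stages:
            return 0, ()
        best = None
        for u in range(y + 1):
            x_next = x + DRIFT[t] - EFFECT[t] * u
            tail_cost, tail_plan = value(t + 1, x_next, y - u)
            candidate = (x_next + tail_cost, (u,) + tail_plan)
            if best is None or candidate < best:
                best = candidate
        return best

    return value(0, x0, fuel)


cost, plan = solve(10, 2)   # → (34, (0, 0, 2))
```

Because the augmented state (x, y) summarizes everything the remaining stages need, the budget constraint never has to be checked globally; it is implied by the range of the inner loop.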
Although our dynamic program is deterministic, its continuous state (i.e., ft(x)) makes for unwieldy calculations, and we have been unable to solve for the optimal thresholds for more than three ages (i.e., we can only solve a three time-period dynamic program). Consequently, we also use an approximate dynamic programming algorithm, more specifically an extension of the rollout algorithm that is specifically tailored to our setting of discrete-time constrained deterministic dynamic programming problems (Bertsekas 2005). The algorithm, specified in §2 of Bertsekas (2005), relies on a suboptimal policy called the base heuristic, upon which it improves, and we use the optimal (14,16) policy (i.e., the optimal policy among those that screen only at ages 14 and 16) as the base heuristic.
In this subsection, we solve the dynamic program for a wide variety of treatment prevalences, and compare these policies to a variant of the USPSTF recommendation. Recall that the USPSTF recommends that children be screened and treated for obesity starting at age six (USPSTF 2010). Although the USPSTF does not explicitly recommend a screening frequency, they suggest that screening be done during routine pediatric assessments, which typically occur every year or two. We adapt their recommendation to biennial screening, and refer to the biennial USPSTF policy as the one that sets the BMI thresholds θt equal to the 95th percentile of the BMI distributions in the CDC growth charts (Kuczmarski et al. 2000) for ages t = 6, 8, …, 16 (and sets θt = ∞ for t = 0, 2, 4). It is important to remember that the CDC BMI distributions do not reflect the BMI distributions in the NLSY data set: obesity has tripled in recent decades and approximately 15% of children in the NLSY data set exceed the 95th percentile of the CDC BMI distributions.
Because we have been able to solve the dynamic program for up to only three time periods (this takes several days on a personal computer, and incorporating each additional time period increases the running time by approximately a factor of 800, which is the size of our discretized state space), we present results that progressively restrict the number of ages that are tested to one, two and three, respectively. The main results are presented as tradeoff curves in Fig. 1 as well as Figs. 6 and 7 in the Electronic Companion, respectively, with selected numerical results appearing in Table 1.
Fig. 6 of the Electronic Companion has eight tradeoff curves that test at a single age: 2, 4,…,16. For each gender and disease, performance improves with increasing age (except for males at age 8). Fig. 7 of the Electronic Companion displays tradeoff curves for three two-age screening policies, at ages (6,16), (10,16) and (14,16). Screening at ages (14,16) is the best among these three policies. Fig. 1 reveals that the three-age policy using ages (12,14,16) performs only marginally better than the optimal two-age policy.
The optimal (12,14,16) policy in Table 1 has two potential sources of suboptimality: it uses the sum of BMIs as its separable objective function, rather than the disease prevalence computed by simulation via equation (8), and the dynamic programming solution only screens at three ages. We perform two other calculations to assess these suboptimalities. First, an exhaustive search of the (14,16) policy using the simulation model yields exactly the same solution as the optimal (14,16) policy in Table 1 (however, an exhaustive search for the optimal three-stage policy is extremely onerous from a computational standpoint). Second, while the approximate dynamic programming procedure based on Bertsekas (2005) generates a policy that treats at more than three ages, it generates nearly the same disease prevalences as the optimal (12,14,16) policy (Table 1). Taken together, all of these results strongly suggest that the optimal (12,14,16) policy is nearly optimal using the simulation in equation (8) among the class of all threshold policies (i.e., allowing screening at ages 2,4,…,16) for our set of parameter values.
Overall, the optimal tradeoff curves are only slightly convex and are rather flat. The flatness of the tradeoff curves leads to different qualitative conclusions regarding the magnitude of the performance gap between the optimal three-age policy and the biennial USPSTF policy (Fig. 1 and Table 1): the optimal three-age policy provides a 3% mean relative reduction in disease prevalence (2.0% for male hypertension, 3.2% for male diabetes, 4.4% for female hypertension, and 4.3% for female diabetes) at the same treatment prevalence, or a 28% mean relative reduction in treatment prevalence (28.5% for male hypertension, 29.1% for male diabetes, 28.0% for female hypertension, and 27.3% for female diabetes) at the same disease prevalence.
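As a quick check, the two headline figures are consistent with unweighted means of the four gender-disease values quoted above; the unweighted averaging is our reading of "mean", and the variable names are illustrative.

```python
from statistics import mean

# Relative reductions (%) quoted in the text, by gender and disease.
disease_reduction = {"male_htn": 2.0, "male_dm": 3.2, "female_htn": 4.4, "female_dm": 4.3}
treatment_reduction = {"male_htn": 28.5, "male_dm": 29.1, "female_htn": 28.0, "female_dm": 27.3}

mean_disease = mean(disease_reduction.values())      # about 3.5, reported as 3%
mean_treatment = mean(treatment_reduction.values())  # about 28.2, reported as 28%
```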
Relative to no obesity screening, the optimal three-age policy with a treatment prevalence equal to that of the biennial USPSTF policy achieves a relative reduction in disease prevalence of 8.1% for male hypertension, 13.8% for male diabetes, 17.8% for female hypertension, and 21.3% for female diabetes. To calibrate the treatment prevalence, we note that if the thresholds were set at the overweight level (i.e., 85th percentile of the CDC distributions) for ages 6, 8, … 16, then the treatment prevalence is 0.2158 for boys and 0.1930 for girls. As an upper bound, the optimal three-age policy with a treatment prevalence equal to 0.25 reduces disease prevalence (compared to no screening) by 15.6% for male hypertension, 24.5% for male diabetes, 35.0% for female hypertension, and 37.0% for female diabetes.
The optimal multi-age policies achieve their improvements over the biennial USPSTF policy by being less selective – as measured by the threshold’s corresponding percentile in the CDC BMI distributions – with increasing age; the same is not always true for the actual thresholds, e.g., the optimal BMI threshold is smaller at age 14 than age 16 in the optimal (14,16) female policy. The majority (≈ 51% for females and ≈ 70% for males) of treatment occurs at 16 years old (including both males and females who fall under the 85th CDC percentile, which is the overweight threshold), and very few (≈ 1.7%) males under 14 years old are treated (Fig. 2); these percentages are calculated using the system state pdfs ft(x) under the optimal solution, the values in Fig. 2, and the CDC growth curves (i.e., we use these three pieces of information to compute the fraction of children of each age and gender that receive treatment).
In this subsection, we carry out four sensitivity analyses: we allow treatment efficacy to be dependent on age and pre-treatment BMI, and we consider two variants of the key assumption: disease prevalence depends on average z-BMI rather than on average BMI, and disease prevalence at age 40 depends on childhood BMI only via B18.
To assess the robustness of our results with respect to age-dependent treatment efficacy, we assume that treatment is more effective for children ages 6,8,10 than for children ages 12,14,16, and we consider the ratio of the change in untransformed BMI due to treatment for children ages 6,8,10 divided by the change in untransformed BMI due to treatment for children ages 12,14,16. This ratio is computed by replacing the mean net treatment effect, μT (−0.00183 for males and −0.00214 for females in §4.3), by μT −d for ages 6,8,10 and μT +d for ages 12,14,16, and choosing d to achieve the desired ratio, after converting from Xt to Bt and averaging over the top 15% of NLSY BMI values for each age, which is the subset of children who are apt to receive treatment. Under this construction, the mean and variance of the treatment effect for all children is independent of d. Note that when d = 0, which corresponds to our base case, this ratio is 0.61 for females and 0.62 for males (Fig. 4 of the Electronic Companion).
We now consider two policies, the optimal two-stage (10,16) policy and the optimal two-stage (14,16) policy, and compute the breakeven value of the ratio in the previous paragraph that equates the performance of these two policies. The breakeven ratio is 0.92 for males and 0.85 for females. If instead we compare the biennial USPSTF policy and the optimal (14,16) policy, the breakeven values are 1.03 for males and 0.94 for females, which are approximately 60% higher than the base-case values (i.e., the young-to-old ratio of the reduction in BMI needs to be 60% higher than in the base case for these two policies to achieve the same performance).
We perform a similar sensitivity analysis as above, but assume that bigger treatment effects occur for children with higher pre-treatment BMI levels rather than for younger children. We let the net treatment effect be μT −0.441d for males (μT −0.856d for females) with pre-treatment BMI levels above the 97th CDC percentile, and be μT + d for those below the 97th percentile (0.441 and 0.856 maintain the mean treatment effect at 0 for the top 15% of NLSY BMI values), where there is a one-to-one correspondence between d and the ratio of the change in untransformed BMI due to treatment for children above the 97th percentile divided by the change in untransformed BMI due to treatment for children between the 85th NLSY percentile and the 97th CDC percentile. When d = 0, this ratio is 1.54 for males and females (Fig. 4 of the Electronic Companion). We find that the breakeven ratio that equates the performance of the biennial USPSTF policy and the optimal (14,16) policy is 5.86 for males and 5.14 for females, which are much larger than the corresponding ratios in the sensitivity analysis for age-dependent treatment efficacy.
Here, we re-estimate the disease functions in §4.4, but assume that obesity at age t is measured by the z-BMI Zt rather than Bt. We set γ = 0.5 (which generates higher pseudo-R2 values than the γ values used earlier) and replace Bt by Zt in equation (4). Because the CDC charts are not tabulated through age 40, to compute the values of M40, S40 and L40, we estimate these parameters from the PSID age-40 data, and then modify them by the ratios of the age-18 parameters derived from the CDC charts divided by the age-18 parameters derived from the PSID data. The new McFadden pseudo-R2 values are (0.020,0.032,0.100,0.080), which are smaller than the values in Table 11 of the Electronic Companion. Overall, the results are qualitatively similar to the base-case results, in that the optimal (12,14,16) policy provides a 3% reduction in disease prevalence relative to the biennial USPSTF policy. The main difference is that the optimal single-age policy (i.e., test only at age 16) is nearly optimal compared to the policy that screens at ages 12,14,16.
Finally, we forgo the key assumption that disease prevalence at age 40 depends on the cumulative lifetime obesity, and consider the extreme case in which the BMI at age 18 is the sole childhood factor (aside from gender) that affects the disease prevalence at age 40. More specifically, the objective function of our dynamic program is changed from (2) to minimizing . Not surprisingly, this change causes a shift towards more treatment at age 16 and less treatment at earlier ages. For example, in Table 1 the CDC threshold percentiles for the optimal (14,16) policy for males are 90.8 for age 14 and 74.1 for age 16, and they change to 96.1 for age 14 and 62.1 for age 16 under the new objective function. For females, the (14,16) percentiles change from (85.7,81.6) to (93.9,72.8).
Our primary result is that, relative to a biennial version of the USPSTF policy, the optimal policy is predicted to achieve a 3% reduction in hypertension or diabetes prevalence at age 40 at the same cost, or achieve the same disease prevalence at a 28% reduction in cost. While the USPSTF biennial policy sets thresholds at the 95th percentile of the CDC BMI distributions for ages 6,8,…,16, the optimal policy treats mostly 16 year olds and few children under age 14. This is largely due to the high false positive rate associated with treating younger children (Goldhaber-Fiebert et al. 2012). More specifically, additional calculations (averaged over both genders) with our simulation model show that, in the absence of treatment, 26.8% of obese six-year olds, 18.8% of non-obese six-year olds, 39.4% of obese 16-year olds and 16.9% of non-obese 16-year olds develop hypertension by age 40. The corresponding percentages for diabetes at age 40 are 7.6%, 4.8%, 12.2% and 4.2%, respectively. If obese six-year olds are treated at age six, 25.1% have hypertension and 6.9% have diabetes at age 40, and if obese 16-year olds are treated at age 16, 34.9% have hypertension and 10.2% have diabetes at age 40. These calculations allow for three observations: obese 16-year olds are much more likely than obese six-year olds to have disease at age 40 in the absence of treatment, treatment of an obese 16-year old at age 16 reduces the disease prevalence at age 40 more than the treatment of an obese six-year old at age six, and non-obese six-year olds are more likely to have disease at age 40 than non-obese 16-year olds in the absence of treatment.
In our view, the total amount of screening to perform (i.e., setting the treatment prevalence p in equation (3)) is not clear cut because the tradeoff curve is only modestly convex: the absolute value of the slope of the optimal tradeoff curve in Fig. 1, which is a measure of cost-effectiveness, decreases by an average of 72% as the treatment prevalence increases from 0 to 0.25. Even with perfect compliance, the tradeoff curve in Fig. 1 is somewhat flat: it takes roughly 20 childhood treatments (each taking > 25 hr over six months) to avert a single case of disease by age 40 (referring to Table 1 and conservatively assuming that no adults get both hypertension and diabetes, the number of treatments per averted case is (8×0.1027)/[(0.2214−0.2035)+(0.0636−0.0548)]=30.8 for boys and (8×0.085)/[(0.2350−0.1930)+(0.0568−0.0446)]=12.5 for girls). At the treatment prevalence of the biennial USPSTF policy, the optimal (12,14,16) policy treats 51.1% of 16-year-old males in the NLSY data set. If this magnitude of intervention were desired, universal intervention may make more practical sense than screening over half of 16-year-old males, particularly given the questionable compliance of this age group. Taken together, although our model has many shortcomings, at the very least our results raise the possibility that universal intervention is preferable to intervention that is targeted via screening. See Cecchini et al. (2010) for a study that compares the cost-effectiveness of various obesity interventions.
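The treatments-per-averted-case figures can be reproduced from the Table 1 values quoted in the text by dividing the number of treatments (eight times the treatment prevalence) by the total reduction in disease prevalence relative to no screening. The function name is illustrative, the multiplier of eight is taken directly from the text (we read it as the number of biennial screening ages), and the female diabetes prevalence under the optimal policy is taken as 0.0446.

```python
def treatments_per_averted_case(treatment_prevalence, prevalence_pairs, n_ages=8):
    """Treatments divided by disease cases averted.

    prevalence_pairs holds (no-screening prevalence, optimal-policy prevalence)
    for each disease; summing the reductions assumes no adult has both diseases.
    """
    averted = sum(before - after for before, after in prevalence_pairs)
    return n_ages * treatment_prevalence / averted


# (hypertension, diabetes) prevalence pairs from the text.
boys = treatments_per_averted_case(0.1027, [(0.2214, 0.2035), (0.0636, 0.0548)])
girls = treatments_per_averted_case(0.085, [(0.2350, 0.1930), (0.0568, 0.0446)])
# boys is about 30.8 and girls about 12.5, matching the figures in the text.
```

The simple average of the two values, roughly 21.7, is in line with the "roughly 20 childhood treatments" stated above.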
Childhood obesity screening and treatment is a prime candidate for mathematical modeling and analysis because of the impracticality of running 50-year clinical trials that link childhood BMI to adult disease. The USPSTF policy has not been previously analyzed with respect to suboptimality or supply-side feasibility (over 5M children per year would be referred for treatment under the USPSTF policy), and our study is intended to inform the debate on the appropriate screening policy for childhood obesity. However, our results do not imply that the USPSTF policy should be overturned for two reasons: our model captures only some of the aspects of this very complex issue, and the lack of available data leads to some modeling assumptions that are not based on strong evidence.
Our model does not include a number of issues that – with a stronger evidence base – could conceivably provide arguments to justify the policy recommendations of the USPSTF, even if our numerical results are accurate: e.g., (i) the social and psychological effects of childhood obesity, (ii) treatment also instills other good lifestyle choices (e.g., no smoking) that provide health benefits later in life (Magnussen et al. 2011), (iii) screening earlier would prevent type 2 diabetes (the prevalence of type 2 diabetes in childhood is on the order of 0.1%, and is concentrated in Native American populations and in minority children with a family history of diabetes, Fagot-Campagna et al. 2000) or the possibly irreversible effects of nonalcoholic fatty liver disease (Alisi et al. 2009) in childhood, (iv) the effect of pre-pregnancy weight on birth outcome and obesity risk in the next generation (this factor may lead to screening girls a few years earlier than in Fig. 2, but not at age 6), or (v) the tangential health benefits accrued by other family members as a result of family involvement in intervention might be larger when the treated children are younger. Although some of these effects (e.g., issue (i)) are difficult to quantify, better data on health outcomes related to these issues would help to inform the debate and could possibly be incorporated into future models.
Aside from the omission of the issues listed above, most of the limitations of our study are due to the paucity of data. The data include BMI evolution, the effect of treatment on BMI, and the effect of BMI on disease, which we consider in turn. The stochastic model for the evolution of the population-wide BMI pdf captures the salient features of the NLSY data and appears to be at the appropriate level of complexity for the strategic issues addressed here. The BMI evolution data up to age 18 are plentiful, but somewhat dated (and also partially self-reported, which may understate BMI levels): the NLSY data cover children born between 1970 and 1988, and the PSID data used for this analysis cover adults born in the late 1960s and early 1970s, who were not children during the obesity epidemic. Childhood obesity has tripled since 1980, but has leveled off in the last decade with the exception of the far right tail of the BMI distribution for boys (Ogden et al. 2010). Hence, our results underestimate the fraction of children who would screen positive, and our analysis should be repeated when more recent data become available; nonetheless, the nature of the optimal policy depends on the temporal manner in which obese children’s weight fluctuates over time in the absence of treatment, not on how many children are obese, and so our qualitative conclusions are unlikely to change if we use more recent data. In addition, although our model looks at each gender separately, the NLSY data set is not sufficiently large to look at other (e.g., racial or socioeconomic) subgroups. Finally, better longitudinal data beyond age 18 – both more frequent measurements and data beyond age 40 – would strengthen the analysis (the tradeoff curve would likely be steeper if we minimized disease prevalence at age 50 or 60).
Turning to the effect of treatment on BMI, while the NLSY BMI data by and large follow the Markov property in the absence of treatment, our model also assumes that the Markov property holds after treatment. That is, if a child receives treatment at age t and consequently has BMI bt+2 (a realization of the random variable Bt+2) two years later, then his future BMI trajectory is statistically identical to a child who has never received treatment but who also has BMI bt+2 at age t + 2. This Markov assumption can cut both ways. On the one hand, data suggest that some treatment effects are transient (Oude Luttikhuis et al. 2009), while others are not (Epstein et al. 2007). On the other hand, treatment could have a long-term impact on healthy behaviors (e.g., food choices and exercise patterns). This Markov assumption deserves more attention in the future. In addition, we may be overestimating the impact of multi-age policies if an individual’s efficacies across different treatments are positively correlated: we treat them as iid, and people with lower efficacy are more apt to exceed the threshold in subsequent years. In this regard, it is worth noting that the optimal single-age policy outperforms the biennial USPSTF policy (except for diabetes in women) and is nearly optimal if obesity is measured by z-BMI.
Perhaps the most unrealistic aspect of the model formulation is that it implicitly assumes 100% treatment compliance for all children who screen positive. In practice, many families will refuse treatment for their children, even if the treatment is free. Moreover, compliance among those who initiate treatment may be lower than the compliance achieved in clinical trials. However, if compliance is independent of pre-treatment BMI level and age, then the model (in particular, equation (1)) is easily modified to incorporate imperfect compliance. Extensive numerical results in Wein, Yang and Goldhaber-Fiebert (2012) suggest that the relative reduction in disease prevalence (at the same treatment prevalence) under imperfect compliance is slightly less than the compliance rate times the relative reduction in the 100% compliance case, and the relative reduction in treatment prevalence (at the same disease prevalence) is 15–30% less than it is under 100% compliance. Hence, the overall qualitative conclusion about the optimal policy versus the USPSTF policy should be unaffected by the compliance rate as long as compliance is not age-dependent. As noted in the next paragraph, although the compliance rate may vary by age, there are not sufficient data to quantify this effect.
Given that it is only very recently that treatment has been shown to be moderately effective at reducing BMI (Whitlock et al. 2010), it is not surprising that the data on the effect of treatment on BMI are sparse. Indeed, while our model can accommodate efficacy that varies by gender, age and pre-treatment BMI, the existing data do not allow us to take full advantage of this. In the near term, this – along with compliance data (which may be indirectly quantified by the effect of treatment on BMI) – probably represents the most pressing data requirement. Despite this lack of data, we assume that the effect of treatment on the transformed BMI, Xt, is independent of age and pre-treatment BMI level. This assumption forces the reduction in untransformed BMI due to treatment to increase with pre-treatment BMI, which fits the data from the high-intensity programs in Table 1 of Whitlock et al. (2010) (Fig. 4 of the Electronic Companion). This assumption also leads to a larger z-BMI reduction for ages 6,8,10 than for ages 12,14,16, which is consistent with the studies summarized in Epstein et al. (2007). Our sensitivity analysis in §5.3 provides a crude assessment of this issue: the ratio of the absolute value of BMI reduction due to treatment for ages 6,8,10 divided by the absolute value of BMI reduction due to treatment for ages 12,14,16 needs to increase by 60% relative to our base-case assumptions in order for the optimal biennial policy to achieve the same performance as the optimal (14,16) policy. Given that adolescents are typically under less parental control than younger children and that a 60% increase does not appear to be beyond the realm of feasibility, the effect of age on treatment compliance and efficacy deserves further investigation. 
In contrast, our sensitivity analysis in §5.3 shows that our qualitative conclusions are insensitive to the possibility that children above the 97th percentile of the CDC BMI distributions achieve significantly larger BMI reductions than those below the 97th percentile, or if, in our key assumption, obesity is measured by z-BMI rather than BMI.
Finally, the data on the lifetime effects of BMI on adult disease are by far the most difficult to obtain, perhaps requiring a longitudinal study of at least five decades. In the absence of additional studies on this topic, we believe that it is prudent to employ our key assumption that adult disease depends on the average lifetime obesity. Our sensitivity analysis in §5.3 on this issue shows that our key assumption is conservative with respect to the argument that the optimal policy outperforms the biennial USPSTF policy. That said, one effect that our model ignores (because we use separate data sets for ages [2,18] and [18,40]) is that infants who are very small until age two and then experience a catch-up increase in BMI soon thereafter have an increased risk of type 2 diabetes as an adult (Eriksson et al. 2003, Bhargava et al. 2004), perhaps due to changes in metabolic factors. Although this phenomenon may be more prevalent in the developing world (indeed, our Markov test results suggest that this effect is not prevalent in the NLSY data), it does raise the possibility that more complicated policies that screen for this phenomenon (screening, although not treatment, would need to start by age 2) could enhance screening performance. We leave this as a topic for future research.
While additional treatment efficacy and compliance data and more recent longitudinal BMI data would allow us to refine our analysis, and the lifetime effects of obesity on adult disease are not well understood, our results – which consequently are quite fragile – suggest that the most effective way to reduce adult disease via childhood obesity screening and treatment may be to focus on older adolescents. The fragility of our results, coupled with the fact that our model captures only some of the aspects of this complex issue, precludes us from making policy recommendations regarding childhood obesity and screening policies. Hence, although our numerical results are at odds with the USPSTF recommendations, we feel that our main contribution is not in the policy realm, but in the problem framing and modeling: viewing the issue as a constrained optimization problem of minimizing adult disease prevalence subject to a treatment budget, modeling the evolution of the population-wide distribution of BMI, investigating the Markov behavior of BMI and the normality of BMI and its increments, modeling treatment with a discussion of the three modeling options, and formulating the key assumption underlying the calculation of the adult disease prevalence. More specifically, we find that – for our set of parameter values – the solution to the dynamic program, which assumes a Markov BMI process, Box-Cox normality of the BMI increments, and a separable objective function, appears to be very similar to the optimal solution for the more complex simulation model. While our modeling choices may not all stand the test of time, they do provide a basis for discussion and a framework on which to build. Our analysis also highlights the need for treatment efficacy data that are broken down by age, gender, pre-treatment BMI level and treatment history, and – more ambitiously – longitudinal data on the effect of BMI levels throughout an individual’s lifetime on morbidity and mortality.
Until these data become available, policymakers will need to assess various policies in the face of incomplete information.