U.S. national growth charts, consisting of a series of percentile curves of various measures of body size (e.g., height and body weight) in children and adolescents, were first constructed in 1976 by NCHS [2]. These charts provided reference values that are an important clinical tool for health professionals assessing the development of children in the U.S. and throughout the world. In 2000, CDC revised the growth charts for U.S. boys and girls and added charts of BMI for ages 2 to 20 years [16]. These charts were created to replace the weight-for-age charts. The data used in the revision come from the National Health Examination Survey (NHES) II (1963–65) and III (1966–70), and the National Health and Nutrition Examination Survey (NHANES) I (1971–74), II (1976–80), and III (1988–94). The sample designs of these surveys are stratified, multistage probability samples of the civilian, noninstitutionalized population in the 48 contiguous states (NHES II, NHES III, NHANES I) or all 50 states (NHANES II, NHANES III). Each survey has sample weights that combine the inverse of the sample selection rates with nonresponse and poststratification adjustments.
The construction of the most recent CDC percentile curves of BMI-for-age involved a three-step procedure [3]. In the first step, BMI values were grouped by age into 6-month intervals for ages 2 to 20 years, and sample-weighted empirical percentiles were computed for each age group for the percentiles of interest (3^{rd}, 5^{th}, 10^{th}, 25^{th}, 50^{th}, 75^{th}, 85^{th}, 90^{th}, and 97^{th}). In the second step, each of these empirical percentiles was smoothed across age using local weighted regression [17] with a tricubic kernel weight function. The bandwidths for the local weighted regression varied by age and sex; for details see Kuczmarski et al. [3]. Differential population sizes among the age groups were not accounted for in this smoothing. In the third step, “The smoothed percentile curves obtained through LWR [local weighted regression] were then fit by a 4-degree polynomial to achieve parametric percentiles” [3].
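The three steps above can be sketched in a few lines. This is only an illustrative sketch on synthetic data, not the CDC implementation: the function names, the weighted-CDF definition of the percentile, the local linear form of the regression, and the single fixed bandwidth of 24 months are all assumptions made here (the CDC bandwidths varied by age and sex).

```python
import numpy as np

def weighted_percentile(values, weights, q):
    """Sample-weighted empirical percentile (q in 0-100) from the weighted CDF."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    # Smallest observed value whose weighted CDF reaches q/100.
    idx = np.searchsorted(cdf, q / 100.0, side="left")
    return v[min(idx, len(v) - 1)]

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = np.abs(u)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def local_linear_smooth(x, y, x_eval, bandwidth):
    """Local weighted linear regression of y on x with tricube weights."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    out = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        k = tricube((x - x0) / bandwidth)
        X = np.column_stack([np.ones_like(x), x - x0])
        beta = np.linalg.solve(X.T @ (k[:, None] * X), X.T @ (k * y))
        out[j] = beta[0]  # fitted value at x0
    return out

# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
age = rng.uniform(24, 240, size=5000)                 # age in months
bmi = 16 + 0.03 * (age - 24) + rng.normal(0, 2, size=5000)
wts = rng.uniform(0.5, 2.0, size=5000)                # sample weights

# Step 1: weighted empirical 85th percentile in 6-month age groups.
edges = np.arange(24, 241, 6)
mids = 0.5 * (edges[:-1] + edges[1:])
group = np.digitize(age, edges[1:-1])
p85 = np.array([weighted_percentile(bmi[group == g], wts[group == g], 85)
                for g in range(len(mids))])

# Step 2: smooth the empirical percentiles across age.
smooth = local_linear_smooth(mids, p85, mids, bandwidth=24.0)

# Step 3: fit a 4-degree polynomial (age in years) to the smoothed curve.
coef = np.polyfit(mids / 12.0, smooth, deg=4)
```

Note that step 2 treats every age group equally, which mirrors the remark above that differential population sizes among the age groups were not accounted for in the smoothing.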
A public use data file called NHANES_GROWTHDATA was provided to us by NCHS. This data file corresponds to the data used to create the 2000 CDC Growth Charts. The file includes the BMI values along with the sample weights of 18,592 boys and 18,779 girls aged 18–305 months. The curves of the conditional percentiles corresponding to the single- and double-kernel methods with the median correction, with bandwidths that incorporate the sample weights in the bandwidth selection along with a scale adjustment, are presented along with the 2000 CDC BMI-for-age curves. (The kernel percentile curves are plotted beyond age 20 to age 23.) The design effect due to the sample weights was the same for boys and girls, 1.79, and resulted in effective sample sizes of n^{*} = 10,387 (=18,592/1.79) boys and 10,491 (=18,779/1.79) girls and in bandwidths for the conditional mean curves of 8.48 and 9.27 months for boys and girls, respectively.
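The effective sample sizes quoted above follow from dividing the actual sample size by the design effect. The short sketch below reproduces that arithmetic. The text does not state how the design effect was computed, so the `kish_design_effect` helper is an assumption shown only as one common approximation for the effect of unequal weights; it is not claimed to be the authors' formula.

```python
import numpy as np

def kish_design_effect(weights):
    """Kish's approximation for unequal weighting: n * sum(w^2) / (sum w)^2.

    Assumption: shown for illustration; the source does not give its formula.
    """
    w = np.asarray(weights, dtype=float)
    return len(w) * np.sum(w**2) / np.sum(w) ** 2

def effective_sample_size(n, deff):
    """Effective sample size n* = n / deff."""
    return n / deff

# Reproduce the values reported in the text (design effect 1.79).
n_boys_eff = round(effective_sample_size(18592, 1.79))
n_girls_eff = round(effective_sample_size(18779, 1.79))
```

With equal weights the approximation returns a design effect of 1, so the effective and actual sample sizes coincide.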
In general, the shape of the single-kernel curves is similar to that of the double-kernel curves, but the expected greater roughness of the single-kernel curves is evident. The CDC curves and the single- and double-kernel curves for the conditional percentiles generally track together. However, there appear to be important systematic differences between the kernel and CDC curves. For example, the nadirs of the CDC curves, where BMI decreases until approximately 4.5 to 6.5 years depending on the percentile, tend to be lower than those of the kernel curves. The nadir discrepancy between the CDC curves and the kernel curves appears to be larger for girls than for boys. Among the boys, for ages beyond the nadir, the percentiles increase approximately linearly for the CDC growth curves, whereas the kernel curves show nonlinear trajectories, particularly in the higher and lower percentiles. Among the girls, beginning at about 11 to 13 years, the kernel curves tend to be higher than the CDC curves across the different percentiles. Another major difference between the kernel and CDC curves appears near the oldest ages, between 19 and 20 years, and suggests that the CDC curves may be oversmoothed. Among the boys, the 2000 CDC growth curves show an approximately linear increase in BMI at the older ages, whereas the kernel curves show a plateauing of BMI starting at about 18 to 19 years. For the girls, the 2000 CDC growth curves for percentiles above the 50^{th} percentile show an approximately linear increase in BMI at the older ages, whereas the kernel curves show a plateauing of BMI starting at about 16 to 17 years. To further evaluate the differences in the patterns of BMI growth between the curves, we plotted BMI-for-age using the kernel methods for ages 20–23 years. As expected, BMI increases more slowly at older ages than at earlier ages, especially for the extreme percentiles.
Because of the upward trajectory of the CDC BMI-for-age curves up to age 20, these curves give the impression that the children at older ages will increase their BMI as they age beyond 20 years and do not show the leveling off of the curves at ages 18–20.
The standard errors for the double-kernel percentile curve were estimated using 200 random half-sample replications, where one PSU was randomly selected from the two (sample) PSUs in each (sample) stratum from each survey to form a half-sample replicate [14]. The five national surveys used to develop the growth curves have two sample PSUs for each sample stratum, except for the first ten strata in NHANES I, where the entire stratum was a PSU (called a certainty stratum). To form two (pseudo-)PSUs in these strata, we followed a standard survey research approach and randomly divided the next smaller sample units (segments) in these strata into two groups per stratum [14]. The strata were combined across the surveys by concatenating them to form a total of 156 strata, each now with two PSUs. Each half-sample was used to estimate a replicate double-kernel smoothed percentile curve. Following the recommendation of Korn and Graubard [14], the bandwidth used for the replicate double-kernel estimates is the same as the bandwidth used for the kernel estimates for the original data. The variances were estimated from these 200 half-sample replicates [14]. The standard errors of the 85^{th} and 95^{th} double-kernel percentile estimates by age among U.S. children generally increase with age, and are less than 0.4 for the 85^{th} percentile and less than 1 for the 95^{th} percentile.
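A minimal sketch of random half-sample replication is given below, assuming a design with exactly two PSUs (coded 0 and 1) per stratum. Several details are assumptions made for illustration: the replicate weights are doubled, the variance is taken as the mean squared deviation of the replicate estimates from the full-sample estimate, and a weighted mean stands in for the double-kernel percentile estimator, which is not reproduced here. The function and argument names are hypothetical.

```python
import numpy as np

def weighted_mean(y, w):
    """Stand-in estimator; the paper's estimator is a double-kernel percentile."""
    return np.sum(w * y) / np.sum(w)

def half_sample_se(estimator, y, w, stratum, psu, n_rep=200, seed=0):
    """Standard error from random half-sample replicates.

    Assumes each stratum contains exactly two PSUs coded 0 and 1.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)
    strata, s_idx = np.unique(stratum, return_inverse=True)

    full = estimator(y, w)                       # full-sample estimate
    reps = np.empty(n_rep)
    for r in range(n_rep):
        # Randomly pick one of the two PSUs in every stratum.
        pick = rng.integers(0, 2, size=len(strata))
        keep = psu == pick[s_idx]
        # Doubling the weights compensates for dropping half the PSUs.
        reps[r] = estimator(y[keep], 2.0 * w[keep])
    # Assumed variance formula: mean squared deviation from the full estimate.
    return np.sqrt(np.mean((reps - full) ** 2))
```

To mimic the procedure described above, `estimator` would be replaced by a function that re-estimates the percentile curve at a given age using the same bandwidth as for the original data.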
To test statistically the difference between the CDC and the double-kernel growth curves, we would need to estimate the variances of the differences between corresponding points on the two sets of curves. As described below, this requires re-estimating both sets of curves from half-sample replicates of the data. We were unable to do this because some of the exact details of the construction of the CDC growth curves were not described and the CDC computer programs for estimating the curves are not available. Therefore, we provide estimates of the standard errors of the points on the 85^{th} and 95^{th} percentile curves for the double-kernel curves only. Because the CDC and the double-kernel curves have similar shapes and are estimated using the same data, the covariance between the two sets of estimates should be large, resulting in quite small standard errors for the differences between the curves.
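The role of the covariance can be made explicit with the standard identity for the variance of a difference. The notation is introduced here for illustration only: \hat{q}_{\mathrm{CDC}}(a) and \hat{q}_{\mathrm{DK}}(a) denote the CDC and double-kernel estimates of a given percentile at age a.

```latex
\operatorname{Var}\!\left(\hat{q}_{\mathrm{CDC}}(a) - \hat{q}_{\mathrm{DK}}(a)\right)
  = \operatorname{Var}\!\left(\hat{q}_{\mathrm{CDC}}(a)\right)
  + \operatorname{Var}\!\left(\hat{q}_{\mathrm{DK}}(a)\right)
  - 2\,\operatorname{Cov}\!\left(\hat{q}_{\mathrm{CDC}}(a),\, \hat{q}_{\mathrm{DK}}(a)\right)
```

A large positive covariance therefore shrinks the variance of the difference, and estimating that covariance requires both curves to be recomputed on the same half-sample replicates.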
Recently, the 2000 CDC percentile curves were used as reference curves with the NHANES 2003–2006 sample to obtain the most current U.S. prevalence estimates of children and adolescents with a high BMI, i.e., a BMI at or above the 85^{th} or 95^{th} percentile for age [9]. In this application we compare the prevalence estimates using our double-kernel percentile curves to those using the 2000 CDC percentile curves (Table 3). Prevalence estimates are presented for the same sex, racial/ethnic, and age groups as in Ogden et al. [9], except that the last age group (12–19 years) was further partitioned into 12–15 years and 16–19 years so that the sample sizes were approximately equal across age groups. Design-based standard errors of the prevalence estimates (computed using Taylor linearization variance estimation in SUDAAN [18]) and two-sided p-values for differences in prevalence based on the two percentile curves are presented; they take into account the variability from the NHANES 2003–2006 data but assume no variability in either the 2000 CDC or the double-kernel percentile curves. This was the same assumption used by Ogden et al. [9] in their estimates of standard errors and is the assumption used whenever the 2000 CDC percentile curves are used as reference curves.
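The calculation described above, with the reference curve treated as fixed, can be sketched as follows. This is a simplified illustration and not SUDAAN's implementation: it assumes a stratified design with PSUs sampled with replacement, linear interpolation of the reference curve by age, and hypothetical function and argument names.

```python
import numpy as np

def flag_high_bmi(bmi, age_months, ref_age_months, ref_percentile_bmi):
    """Flag BMI at or above a fixed reference percentile curve.

    The reference curve is interpolated linearly by age (an assumption).
    """
    cutoff = np.interp(age_months, ref_age_months, ref_percentile_bmi)
    return np.asarray(bmi, dtype=float) >= cutoff

def prevalence_with_se(high, w, stratum, psu):
    """Weighted prevalence and a Taylor-linearized standard error.

    Uses the with-replacement PSU approximation within strata.
    """
    high = np.asarray(high, dtype=float)
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    psu = np.asarray(psu)

    p = np.sum(w * high) / np.sum(w)
    z = w * (high - p) / np.sum(w)      # linearized values of the ratio
    var = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        # PSU-level totals of the linearized values within stratum h.
        totals = np.array([z[in_h & (psu == c)].sum()
                           for c in np.unique(psu[in_h])])
        n_h = len(totals)
        if n_h > 1:
            var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return p, np.sqrt(var)
```

Because the cutoff is computed from a fixed curve, the standard error reflects only the sampling variability of the survey to which the curve is applied, which matches the assumption stated above.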
Table 3^{1}. Comparison of Percentage of Children with High BMI by Age, Gender and Race Based on Double-Kernel versus 2000 CDC Percentile Curves among US Children, 2003–2006 (SEs are in parentheses)
For 2–5 year olds, the CDC prevalence estimates for children at or above the 85^{th} and 95^{th} percentiles tend to be larger than the double-kernel estimates. For 6–11 years, the double-kernel estimates are close to the CDC estimates, and most differences between the two are within 1%. For 12–15 and 16–19 years, the boys and girls show different patterns. For boys, the two sets of estimates are similar. For girls aged 12–15 years, the CDC curves tend to give higher prevalence estimates than the double-kernel curves, particularly for the 85^{th} percentile, whereas for girls aged 16–19 years the CDC estimates are consistently smaller than the double-kernel estimates. These findings are consistent with the appearance of the curves.