Ann Appl Stat. Author manuscript; available in PMC Jul 28, 2011.
Published in final edited form as:
Ann Appl Stat. Jun 1, 2011; 5(2B): 1456–1487.
doi:  10.1214/10-AOAS446
PMCID: PMC3145332
NIHMSID: NIHMS269689
A NEW MULTIVARIATE MEASUREMENT ERROR MODEL WITH ZERO-INFLATED DIETARY DATA, AND ITS APPLICATION TO DIETARY ASSESSMENT
Saijuan Zhang,* Douglas Midthune, Patricia M. Guenther, Susan M. Krebs-Smith, Victor Kipnis, Kevin W. Dodd, Dennis W. Buckman, Janet A. Tooze, Laurence Freedman, and Raymond J. Carroll*
Saijuan Zhang, Department of Statistics Texas A&M University 3143 TAMU College Station, Texas 77843-3143 U.S.A.
Texas A&M University, National Cancer Institute, U.S. Department of Agriculture, National Cancer Institute, National Cancer Institute, National Cancer Institute, Information Management Services, Inc., Wake Forest University, Sheba Medical Center and Texas A&M University
*This paper forms part of Zhang's Ph.D. dissertation at Texas A&M University. Zhang and Carroll's research was supported by a grant from the National Cancer Institute (CA57030). This work was also supported by National Science Foundation Instrumentation grant number 0922866.
Corresponding Author.
sjzhang/at/stat.tamu.edu carroll/at/stat.tamu.edu midthund/at/mail.nih.gov kipnisv/at/mail.nih.gov doddk/at/mail.nih.gov
In the United States the preferred method of obtaining dietary intake data is the 24-hour dietary recall, yet the measure of most interest is usual or long-term average daily intake, which is impossible to measure. Thus, usual dietary intake is assessed with considerable measurement error. Also, diet represents numerous foods, nutrients and other components, each of which has distinctive attributes. Sometimes, it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture overall dietary patterns. Consumption of these components varies widely: some are consumed daily by almost everyone on every day, while others are episodically consumed so that 24-hour recall data are zero-inflated. In addition, they are often correlated with each other. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in a diet because dietary recommendations often vary with energy level. The quest to understand overall dietary patterns of usual intake has to this point reached a standstill. There are no statistical methods or models available to model such complex multivariate data with its measurement error and zero inflation. This paper proposes the first such model, and it proposes the first workable solution to fit such a model. After describing the model, we use survey-weighted MCMC computations to fit the model, with uncertainty estimation coming from balanced repeated replication.
The methodology is illustrated through an application to estimating the population distribution of the Healthy Eating Index-2005 (HEI-2005), a multi-component dietary quality index involving ratios of interrelated dietary components to energy, among children aged 2-8 in the United States. We pose a number of interesting questions about the HEI-2005 and provide answers that were not previously within the realm of possibility, and we indicate ways that our approach can be used to answer other questions of importance to nutritional science and public health.
Keywords: Bayesian methods, Dietary assessment, Latent variables, Measurement error, Mixed models, Nutritional epidemiology, Nutritional surveillance, Zero-Inflated Data
This paper presents statistical models and methodology to overcome a major stumbling block in the field of dietary assessment. More nutritional background is provided in Section 2: a summary of the key conceptual issues follows.
  • Nutritional surveys conducted in the United States typically use 24-hour (24hr) dietary recalls to obtain intake data, i.e., an assessment of what was consumed in the past 24 hours.
  • Because dietary recommendations are intended to be met over time, nutritionists are interested in “usual” or long-term average daily intake.
  • Dietary intake is thus assessed with considerable measurement error.
  • Consumption patterns of dietary components vary widely; some are consumed daily by almost everyone, while others are episodically consumed so that 24-hour recall data are zero-inflated. Further, these components are correlated with one another.
  • Nutritionists are interested in dietary components collectively to capture patterns of usual dietary intake, and thus need multivariate models for usual intake.
  • These multivariate models for usual intakes, taking into account episodically consumed foods, do not exist, nor do methods exist for fitting them.
One way to capture dietary patterns is by scores, although our work is not limited to scores. The Healthy Eating Index-2005 (HEI-2005), described in detail in Section 2, is a scoring system based on a priori knowledge of dietary recommendations, and is on a scale of 0 to 100. Ideally, it consists of the usual intake of 6 episodically consumed and thus 24hr-zero inflated foods, 6 daily-consumed dietary components, adjusts these for energy (caloric) intake, and gives a score to each component. The total score is the sum of the individual component scores. Higher scores indicate greater compliance with dietary guidelines and, therefore, a healthier diet. Here are a few questions that nutritionists have not been able to answer, and that our approach can address.
  • What is the distribution of the HEI-2005 total score, and what % of Americans are eating a healthier diet defined for example, by a total score exceeding 80?
  • What is the correlation between the individual score on each dietary component and the scores of all other dietary components?
  • Among those whose total HEI-2005 score is > 50 or ≤ 50, what is the distribution of usual intake of whole grains, whole fruits, dark green and orange vegetables and legumes (DOL) and calories from solid fats, alcoholic beverages and added sugars (SoFAAS)?
  • What % of Americans exceed the median score on all 12 HEI-2005 components?
In this paper, to answer public health questions such as these that can have policy implications, we build a novel multivariate measurement error model for estimating the distributions of usual intakes, one that accounts for measurement error and zero-inflation, and has a special structure associated with the zero-inflation. Previous attempts to fit even simple versions of this model, using nonlinear mixed effects software, failed because of the complexity and dimensionality of the model. We use survey-weighted Monte Carlo computations to fit the model with uncertainty estimation coming from balanced repeated replication. The methodology is illustrated using the HEI-2005 to assess the diets of children aged 2-8 in the United States. This work represents the first analysis of joint distributions of usual intakes for multiple food groups and nutrients.
The paper is outlined as follows. In Section 2 we give the background for the data we observe. In particular, we provide more information about the HEI-2005. Section 3 describes our model which is a highly nonlinear, zero-inflated, repeated measures model with multiple latent variables. The model also has a patterned covariance matrix with structural zeros and ones. We derive a parameterization that allows estimated covariance matrices to be actual covariance matrices. We also define technically what we mean by usual intake, and illustrate the use of simulation methods used to answer the questions posed above, as well as many others.
Section 4 describes our estimation procedure. Previous attempts using nonlinear mixed effects models to estimate the distribution of episodically consumed food groups (Tooze, et al., 2006; Kipnis, et al., 2009) do not work here because of the high dimensionality of the problem. We instead develop a Monte Carlo strategy based on the idea of Gibbs sampling; although because of sampling weights, we treat the method as a frequentist (non-Bayesian) one. This section describes some of the basics of the methodology; the full technical details of implementation are given in an appendix.
Section 5 describes the analysis of the HEI-2005 components using the 2001-2004 National Health and Nutrition Examination Survey (NHANES) for children ages 2-8. Important contextual points arise because of the nature of the data. For example, if whole grains are consumed, then necessarily total grains are consumed with probability one, a restriction that a naive use of our model cannot handle. We develop a simple novel device to uncouple consumption variables that are tightly linked in this way. Finally in this section, we provide the first answers to the four questions we have posed. In Section 6, we discuss various additional aspects of the problem and the data analysis. Concluding remarks and a policy application are given in Section 7.
There are a number of general reviews of the measurement error field (Fuller, 1987; Gustafson, 2003; Carroll, et al., 2006; Buonaccorsi, 2010). Recent papers that focus on estimating the density function of a univariate continuous random variable subject to measurement error include Delaigle (2008), Delaigle and Hall (2008, 2010), Delaigle and Meister (2008), Delaigle, et al. (2008), Staudenmayer, et al. (2008) and Wand (1998). The field of measurement error in regression continues to expand rapidly, with some recent contributions including Küchenhoff, et al (2006), Guolo (2008), Liang, et al. (2008), Messer and Natarajan (2008) and Natarajan (2009). There is also a large statistical literature on measurement error as it relates to public health nutrition: some recent papers relevant to our work include Carriquiry (1999, 2003), Ferrari, et al. (2009), Fraser and Shavlik (2004), Kott, et al. (2009), Nusser, et al. (1996, 1997), Prentice (1996, 2003), and Tooze, et al. (2003, 2006).
Here we give more detail about the nutrition context that motivates this work.
In surveys conducted in the United States, the preferred method of obtaining intake data is the 24-hour dietary recall because it limits respondent burden and facilitates accurate reporting; yet the measure of greatest interest is “usual” or long-term average daily intake. Thus dietary intake is assessed with considerable measurement error. Also, diets are composed of numerous foods, nutrients, and other components, each of which may have distinctive attributes and effects on nutritional health. Sometimes, it is useful to examine intake of these components separately, but increasingly nutritionists are interested in exploring them collectively to capture patterns of dietary intake. Consumption patterns of these components vary widely; some are consumed daily by almost everyone while others are episodically consumed so that 24-hour recall data are zero-inflated. In addition, these various components are often correlated with one another. Finally, it is often preferable to analyze the amount of a dietary component relative to the amount of energy (calories) in a diet because dietary recommendations often vary with energy level, and this approach provides a way of standardizing dietary assessments.
One of the US Department of Agriculture's (USDA's) strategic objectives is “to promote healthy diets” and it has developed an associated performance measure, the Healthy Eating Index-2005 (HEI-2005, http://www.cnpp.usda.gov/HealthyEatingIndex.htm). The HEI-2005 is based on the key recommendations of the 2005 Dietary Guidelines for Americans (http://www.health.gov/dietaryguidelines/dga2005/document/default.htm). The index includes ratios of interrelated dietary components to energy. The HEI-2005 comprises 12 distinct component scores and a total summary score. See Table 1 for a list of these components and the standards for scoring, and see Guenther et al. (2008) for details. Intakes of each food or nutrient, represented by one of the 12 components, are expressed as a ratio to energy intake, assessed, and ascribed a score.
Table 1
Description of the HEI-2005 scoring system. Except for saturated fat and SoFAAS, density is obtained by multiplying usual intake by 1000 and dividing by usual intake of kilo-calories. For saturated fat, density is 9 × 100 usual saturated fat (grams) (more ...)
The HEI-2005 is used to evaluate the diets of Americans to assess compliance with the 2005 Dietary Guidelines, yet use of the HEI-2005 is limited by the challenges described above. Until recently, there have been no solutions to these challenges, so published evaluations have been limited to analyses of mean scores for the population and various subgroups. Freedman, et al. (2010) have described a method of estimating the population distribution of a single component of HEI-2005, and the prevalence of high or low scores on that component; but there has been to date no satisfactory way to determine the prevalence of high or low total HEI-2005 scores, considering all of its interrelated components simultaneously. In addition, answers to the complex questions posed in the Introduction remain unavailable. This paper aims to provide a means to do these crucial evaluations.
The 12 HEI-2005 components represent 6 episodically consumed food groups (total fruit, whole fruit, total vegetables, dark green and orange vegetables and legumes or DOL, whole grains and milk), 3 daily-consumed food groups (total grains, meat and beans and oils), and 3 other daily-consumed dietary components (saturated fat; sodium; and calories from solid fats, alcoholic beverages and added sugars, or SoFAAS). The classification of food groups as “episodically” and “daily” consumed is based on the number of individuals who report them on 24hr recalls. If there are only a few zeros for a component, we treat that as a daily-consumed food, and replace all zeros with 1/2 the minimum value of the non-zeros for that food. However, the crucial statistical aspect of the data is that six of the food groups are zero-inflated. The percentages of reported non-consumption of total fruit, whole fruit, whole grains, total vegetables, DOL, and milk on any single day are 17%, 40%, 42%, 3%, 50% and 12%, respectively.
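The zero-replacement rule for daily-consumed components can be sketched as follows. This is a minimal illustration, assuming the reported amounts for one component are held in a numeric array; the function name `replace_rare_zeros` is hypothetical and not from the paper.

```python
import numpy as np

def replace_rare_zeros(amounts):
    """Replace zeros with half the smallest non-zero reported amount.

    Intended only for components treated as daily-consumed, i.e. those
    with just a few reported zeros (an assumption of this sketch).
    """
    amounts = np.asarray(amounts, dtype=float)
    nonzero = amounts[amounts > 0]
    if nonzero.size == 0:
        raise ValueError("no positive values to derive a replacement from")
    out = amounts.copy()
    out[out == 0] = 0.5 * nonzero.min()
    return out

# Toy reported amounts for one daily-consumed component (made-up numbers).
example = replace_rare_zeros([2.0, 0.0, 4.0, 1.0])
```

The replacement keeps every value strictly positive, which a Box-Cox type transformation of the amounts requires.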
We are interested in the usual intake of foods for children aged 2-8. The data available to us, described in more detail in Section 5, came from the National Health and Nutrition Examination Survey, 2001-2004 (NHANES). The data used here consisted of n = 2,638 children, each of whom had a survey weight wi for i = 1, ..., n. In addition, one or two 24hr dietary recalls were available for each individual. Along with the dietary variables, there are covariates such as age, gender, ethnicity, family income and dummy variables that indicate a weekday or a weekend day, and whether the recall was the first or second reported for that individual.
Using the 24hr recall data reported, for each of the episodically consumed food groups, two variables are defined: (a) whether a food from that group was consumed; and (b) the amount of the food that was reported on the 24hr recall. For the 6 daily-consumed food groups and nutrients, only one variable indicating the consumption amount is defined. In addition, the amount of energy that is calculated from the 24hr recall is of interest. The number of dietary variables for each 24hr recall is thus 12+6+1 = 19. The observed data are Yijk for the ith person, the jth variable and the kth replicate, j = 1, . . . , 19 and k = 1, . . . , mi. In the data set, at most two 24hr recalls were observed, so that mi ≤ 2. Set Ỹik = (Yi1k, ..., Yi,19,k)T, where
  • Yi,2ℓ−1,k = Indicator of whether dietary component ℓ is consumed, with ℓ = 1, ..., 6.
  • Yi,2ℓ,k = Amount of food ℓ consumed. This equals zero, of course, if none of food ℓ is consumed, with ℓ = 1, ..., 6.
  • Yi,ℓ+6,k = Amount of non-episodically consumed food or nutrient ℓ, with ℓ = 7, ..., 12.
  • Yi,19,k = Amount of energy consumed as reported by the 24hr recall.
3.1. Basic Model Description
Our model is a generalization of work by Tooze et al. (2006) and Kipnis, et al. (2009) for a single food and Kipnis, et al. (2010) and Zhang, et al. (2010) for a single food and nutrient. Observed data will be denoted as Y, and covariates in the model will be denoted as X. As is usual in measurement error problems, there will also be latent variables, which will be denoted by W.
We use a probit threshold model. Each of the 6 episodically consumed foods will have 2 sets of latent variables, one for consumption and one for amount, while the 6 daily-consumed foods and nutrients as well as energy will have 1 set of latent variables, for a total of 19. The latent random variables are εijk and Uij, where (Ui1, . . . , Ui,19) = Normal(0, Σu) and (εi1k, . . . , εi,19,k) = Normal(0, Σε) are mutually independent. In this model, food equation M11 being consumed on day k is equivalent to observing the binary equation M12, where
equation M13
(3.1)
If the food is consumed we model the amount reported equation M14 as
equation M15
(3.2)
where equation M16, g(y, λ) is the usual Box-Cox transformation with transformation parameter λ, and {μ(λ), σ(λ)} are the sample mean and standard deviation of g(y, λ), computed from the non-zero food data. This standardization is simply a convenient device to improve the numerical performance of our algorithm without affecting the conclusions of our analysis.
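The Box-Cox transformation and the standardization by the sample mean and standard deviation of the non-zero data can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and the standardized form used here is simply (g − μ)/σ; any additional constant scale factor in the paper's own definition is not recoverable from the text above and is not included.

```python
import numpy as np

def box_cox(y, lam):
    """Usual Box-Cox transform g(y, lam) for positive y."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def standardized_box_cox(y, lam):
    """Transform the non-zero values, then center and scale them.

    Returns the standardized values together with the sample mean and
    standard deviation computed from the non-zero data.
    """
    y = np.asarray(y, dtype=float)
    g = box_cox(y[y > 0], lam)
    mu, sigma = g.mean(), g.std(ddof=1)
    return (g - mu) / sigma, mu, sigma

def inverse_standardized_box_cox(z, lam, mu, sigma):
    """Map standardized values back to the original intake scale."""
    g = np.asarray(z, dtype=float) * sigma + mu
    if lam == 0:
        return np.exp(g)
    return (lam * g + 1.0) ** (1.0 / lam)

# Round trip on toy positive amounts (made-up numbers).
y = np.array([1.0, 2.0, 4.0, 8.0])
z, mu, sigma = standardized_box_cox(y, 0.5)
back = inverse_standardized_box_cox(z, 0.5, mu, sigma)
```

Because the standardization is an invertible affine map on the transformed scale, it changes the numerical conditioning of the fit but not the model on the original scale.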
The reported consumption of daily consumed foods or nutrients equation M17 are modeled as
equation M18
(3.3)
Finally, energy is modeled as
equation M19
(3.4)
As seen in (3.2)-(3.4), different transformations (λ1, ..., λ13) are allowed to be used for the different types of dietary components; see Section A.12.
In summary, there are latent variables equation M20, latent random effects Ũi = (Ui1, ..., Ui,19)T, fixed effects (β1, ..., β19), and design matrices (Xi1k, ..., Xi,19,k). Define equation M21. The latent variable model is
equation M22
(3.5)
where Ũi = Normal(0, Σu) and equation M23 are mutually independent.
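One reading of the latent structure that is consistent with the verbal description in this section is sketched below. It is not a copy of the displayed equations: the index convention (consumption indicator of episodically consumed food ℓ in position 2ℓ−1) is an assumption, as is the use of Φ for the standard normal distribution function.

```latex
\begin{align*}
W_{ijk} &= X_{ijk}^{T}\beta_j + U_{ij} + \epsilon_{ijk},
  \qquad j = 1,\dots,19,\\
Y_{i,2\ell-1,k} = 1 &\iff W_{i,2\ell-1,k} > 0,
  \qquad \ell = 1,\dots,6,\\
\Pr\bigl(Y_{i,2\ell-1,k} = 1 \mid \tilde U_i\bigr)
  &= \Phi\bigl(X_{i,2\ell-1,k}^{T}\beta_{2\ell-1} + U_{i,2\ell-1}\bigr)
  \quad \text{when } \operatorname{var}(\epsilon_{i,2\ell-1,k}) = 1.
\end{align*}
```

The last line is the standard probit consequence of a unit-variance normal error: the probability that the latent variable exceeds zero is the normal distribution function evaluated at its conditional mean.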
3.2. Restriction on the Covariance Matrix
Two necessary restrictions are set on Σε. First, following Kipnis, et al. (2009, 2010), equation M24 and equation M25 are set to be independent. Second, in order to technically identify equation M26 and the distribution of equation M27, we require that equation M28, because otherwise the marginal probability of consumption of dietary component equation M29 would be equation M30, and thus components of β and Σu would be identified only up to the scale equation M31.
So that we can handle any number of episodically consumed dietary components and any number of daily consumed components, suppose that there are J episodically consumed dietary components, and K daily consumed dietary components, and in addition there is energy. Then the restrictions defined above lead to the covariance matrix
equation M32
(3.6)
The difficulty with parameterizations of (3.6) is that the cells that are not constrained to be 0 or 1 cannot be left unconstrained, otherwise (3.6) need not be a covariance matrix, i.e., positive semidefinite.
We have developed an unconstrained parameterization that results in the structure (3.6). Consider an unconstrained lower triangular matrix V and define Σε = VVT. This is positive semidefinite and therefore qualifies Σε as a proper covariance matrix. The form of V is
equation M33
To achieve the desired pattern (3.6), we derive the following four restrictions:
equation M34
The third restriction can be ensured by the further parameterization
equation M35
where q = 2, 3, . . . , J – 1; |rt| ≤ 1, t = 1, . . . , J – 1, and |θs| ≤ π, s = 1, . . . , (J – 1)2.
Similarly, the fourth restriction can be further expressed by setting
equation M36
where q = 3, 5, . . . , 2J – 1. Note that equation M37.
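The reason for working with Σε = VVT can be checked numerically. The sketch below is an illustration only, not the paper's parameterization: it builds a random lower triangular V, rescales two arbitrarily chosen rows to unit length (hypothetical positions, standing in for the unit-variance restriction), and confirms that the product is symmetric, positive semidefinite and has ones in those diagonal cells.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Unconstrained lower triangular factor.
V = np.tril(rng.normal(size=(dim, dim)))

# A diagonal entry of V V^T is the squared length of the matching row
# of V, so a unit-length row gives a variance of exactly one.
# Rows 0 and 2 are arbitrary positions chosen for this illustration.
for q in (0, 2):
    V[q] /= np.linalg.norm(V[q])

sigma_eps = V @ V.T
eigenvalues = np.linalg.eigvalsh(sigma_eps)
```

Any real V gives a positive semidefinite product, so an optimizer or sampler can move freely in the entries of V without ever producing an invalid covariance matrix.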
3.3. The Use of Sampling Weights
As described in the Appendix, we used the survey sample weights from NHANES both in the model fitting procedure and, after having fit the model, in estimating the distributions of usual intake.
While not displayed here, we redid the model fitting calculations without weighting, because the covariates we use are major players in determining the sampling weights, hence it is reasonable to believe that the model in Section 3 holds both in the sample and in the population. When we did this, the parameter estimates were essentially unchanged.
Thus, we use the sampling weights only for estimation of the population distributions. We actually did this for the purpose of handling the clustering in the sample design. For such a complex statistical procedure as ours, we knew we could not do theoretical standard errors, so we thought about the bootstrap, and realized that putting together a bootstrap for the complex survey would be nearly impossible. However, we already had developed a set of Balanced Repeated Replication (BRR) weights (Wolter, 1995), see Section 5.7 for details. These BRR weights have the property that, in the frequentist survey sampling sense, they appropriately reflect the clustering in the standard error calculations.
Of course, the use of sampling weights in the modeling provides unbiased estimates of the (super) population parameters of interest. In addition, the use of sampling weights in the distribution estimation provides an estimated distribution that is representative of the US population, not just the sample.
3.4. Distribution of Usual Intake and the HEI-2005 Scores
We assume here that estimates of Σu, Σε and βj for j = 1, ..., 19 have been constructed, see Section 4. Here we discuss what we mean by usual intake for an individual, how to estimate the distribution of usual intakes, how to convert usual intakes into HEI-2005 scores, and how to assess uncertainty.
Consider the first episodically consumed dietary component, a food group, with reporting being done on a weekend. Set Xi1,wkend and Xi2,wkend to be the versions of Xi1k and Xi2k where the dummy variable has the indicator of the weekend and that the recall is the first one. Following Kipnis, et al. (2009), we define the usual intake for an individual on the weekend to be the expectation of the reported intake conditional on the person's random effects Ũi. Let the (q, p) element of Σε be denoted as Σε,q,p. As in Kipnis, et al. (2009), define
equation M38
(3.7)
Detailed formulas for this are given in Appendix A.11. Then, following the convention of Kipnis, et al. (2009), the person's usual intake of the first episodically consumed dietary component on the weekend is defined as
equation M39
Similarly, let Xi1,wkday and Xi2,wkday be as above but the dummy variable is appropriate for a weekday. Then the person's usual intake of the first episodically consumed food group on weekdays is defined as
equation M40
Finally, the usual intake of the first episodically consumed food for the individual is
equation M41
since Fridays, Saturdays and Sundays are considered to be weekend days. Usual intake for the other episodically consumed food groups is defined similarly.
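The combination of weekday and weekend usual intakes can be sketched as a weighted average over a seven-day week. The weights 4/7 and 3/7 used here are an assumption of this sketch, inferred from the statement that Fridays, Saturdays and Sundays count as weekend days; the function name is hypothetical.

```python
def combine_weekday_weekend(weekday_intake, weekend_intake):
    """Average usual intake over a week with 4 weekdays and 3 weekend days."""
    return (4.0 * weekday_intake + 3.0 * weekend_intake) / 7.0

# Toy values: nothing on weekdays, 7 units per weekend day.
weekly_average = combine_weekday_weekend(0.0, 7.0)
```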
A person's usual intake of a daily-consumed food group/nutrient and energy on the original scale is defined similarly. Consider, for example, energy, which is the 13th dietary component and the 19th set of terms in the model. Let Xi,19,wkend and Xi,19,wkday be the versions of Xi,19,k where the dummy variable has the indicator of the weekend or weekday, respectively, and that the recall is the first one. Then
equation M42
Similar formulae are used for the other daily-consumed foods and nutrients.
Finally, the energy-adjusted usual intakes and the HEI-2005 scores are then obtained as in Table 1, using the estimated usual intakes of the dietary components.
To find the joint distribution of usual intakes of the HEI-2005 scores, it is convenient to use Monte-Carlo methods. Recall that wi is the sampling weight for individual i. Let B be a large number: we set B = 5,000. Generate b = 1, ..., B observations Ũbi = Normal(0, Σu) and then obtain equation M43 by replacing Uij in their formulae by Ubij. With appropriate sample weighting, the Tbi can be used to estimate joint and marginal distributions. Thus, for example, consider the total HEI-2005 score, which is a deterministic function of the usual intakes, say G(Ti). Its cumulative distribution function is estimated as
equation M44
(3.8)
Frequentist standard errors of derived quantities such as the mean, median and quantiles can be estimated using the Balanced Repeated Replication (BRR) method (Wolter, 1995); see Section 5.7 for details.
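The survey-weighted Monte Carlo estimate of a distribution function described for (3.8) can be sketched as below. This is a sketch under assumptions, not the authors' code: it takes as given an n × B array holding G(Tbi) for person i and draw b, and the normalization by B times the sum of the weights is the natural choice rather than a quoted formula. The function names are hypothetical.

```python
import numpy as np

def weighted_mc_cdf(scores, weights, x):
    """Weighted share of simulated scores that are <= x.

    scores  : (n, B) array, row i holds the B simulated scores of person i
    weights : (n,) array of survey weights
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    per_person = (scores <= x).mean(axis=1)  # average over the B draws
    return float(np.sum(weights * per_person) / np.sum(weights))

def weighted_mc_quantile(scores, weights, prob):
    """Smallest simulated score whose weighted CDF reaches prob."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    flat = scores.ravel()                          # person-major order
    flat_w = np.repeat(weights, scores.shape[1])   # same order as flat
    order = np.argsort(flat)
    cum = np.cumsum(flat_w[order]) / flat_w.sum()
    idx = min(int(np.searchsorted(cum, prob)), flat.size - 1)
    return float(flat[order][idx])

# Toy input: two people, two draws each, weights 1 and 3 (made-up numbers).
toy_scores = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_weights = np.array([1.0, 3.0])
cdf_at_2 = weighted_mc_cdf(toy_scores, toy_weights, 2.0)
median = weighted_mc_quantile(toy_scores, toy_weights, 0.5)
```

Every draw for a given person carries that person's survey weight, so the estimate describes the weighted population rather than the sample.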
Our model (3.1)-(3.4) is a highly nonlinear, mixed effects model with many latent variables and nonlinear restrictions on the covariance matrix Σε. As seen in Section 3.4, we can estimate relevant distributions of usual intake in the population if we can estimate Σu, Σε and βj for j = 1, ..., 19. We have found that working within a pseudo-likelihood Bayesian paradigm is a convenient way to do this computation. We emphasize, however, that we are doing this only to get frequentist parameter estimates based on the well-known asymptotic equivalence of frequentist likelihood estimators and Bayesian posterior means, and especially the consistency of both (Lehmann and Casella, 1998). We are specifically not doing Bayesian posterior inference, since valid Bayesian inference in a complex survey such as NHANES is an immensely challenging task, and because frequentist estimation and inference are the standard in the nutrition community.
Kipnis, et al. (2009) were able to get estimates of parameters separately for each food group using the nonlinear mixed effects program NLMIXED in SAS with sampling weights. While this gives estimates of βj for j = 1, ..., 19, it only gives us parts of the covariance matrices Σu and Σε, and not all the entries. Using the 2001-2004 NHANES data, we have verified that our estimates and the subset of the parameters that can be estimated by one food group at a time using NLMIXED are in close agreement, and that estimates of the distributions of usual intake and HEI-2005 component scores are also in close agreement. We expect this because of the rather large sample size in our data set. Zhang, et al. (2010) have shown that even considering a single food group plus energy is a challenge for the NLMIXED procedure, both in time and in convergence, and using this method for the entire HEI-2005 constellation of dietary components is impossible.
Full technical details of the model fitting procedure are given in Appendices A.1-A.10.
Of course, our model has assumptions, e.g., additivity and homoscedasticity on a transformed scale for observed and latent variables, normality of person-specific random effects and normality of day-to-day variability on the transformed scale. These assumptions are clearly not exactly correct, although our marginal model-checking suggests to us that they are mostly not disastrously wrong. Some reasons for this conclusion include the facts that we reproduce the marginal distributions of the components, that comparison with 24hr recalls shows differences that decrease when moving from one 24hr recall to two 24hr recalls, that q-q plots of the data are fairly satisfactory, etc. Thinking, as we do, of our work as a first step, and not a last step, it would be extremely interesting to make the model more general, e.g., skew-normal, skew-t or Dirichlet process distributions after transformation, and possibly directly modeling heteroscedasticity. Such generalizations will require effort to implement, but will speak to the robustness of the results and would be a useful future step.
5.1. Basic Analysis
We analyzed data from the 2001-2004 National Health and Nutrition Examination Survey (NHANES) for children age 2-8. The study sample consisted of 2,638 children, among whom 1,103 children have two 24hr recalls and the rest have only one. We used the dietary intake data to calculate the 12 HEI-2005 components plus energy. In addition, besides age, gender, race and interaction terms, two covariates were employed, along with an intercept. The first was a dummy variable indicating whether or not the recall was for a weekend day (Friday, Saturday, or Sunday) because food intakes are known to differ systematically on weekends and weekdays. The second was a dummy variable indicating whether the 24hr recall was the first or second such recall, the idea being that there may be systematic differences attributable to the repeated administration of the instrument.
5.2. Contextual Information
When we ran our program based on the variables in Table 1, the results were disastrous. Mixing of the MCMC sampler was very poor, with long sojourns in different regions.
The reason for this failure to converge depends on the context of the dietary variables. For example, whole grains are a subset of total grains. Thus, if someone consumes any whole grains, then necessarily, with probability 1.0, that person also consumes total grains. Such a restriction cannot be handled by our model, because it would force one of the random effects U to equal infinity. A similar thing happens for energy. Calories coming from saturated fat are a subset of total calories as are calories from SoFAAS, so there is a restriction that total calories must be greater than calories from saturated fat and also greater than calories from SoFAAS. Since the latter sum makes up a significant portion of calories, this restriction is not something that our model can handle well.
Luckily, there is an easy and natural context-based solution. Instead of using total grains in the model, we used grains that are not whole grains, i.e., refined grains, thus decoupling whole grains and total grains, and removing the restriction mentioned above. Similarly, instead of using total fruit, we use fruit that is not whole fruits, i.e., fruit juices. Additionally, instead of using total vegetables, we use total vegetables excluding dark green and orange vegetables and legumes. Finally, instead of total energy, we use total energy minus the sum of energy from saturated fat (11% of mean energy) and from SoFAAS (35% of mean energy). We recognize that there is overlap of energy from saturated fat and energy from solid fat, but this has no impact on our analysis since total energy has sources other than these two. An alternative, of course, would have been to simply use total energy minus energy from SoFAAS.
This is sufficient to estimate the distributions of interest. If, for example, in the new data set Ti1 represents usual intake of non-whole fruits, and Ti2 is usual intake of whole fruits, then the usual intake of total fruits is Ti1 + Ti2. Similar remarks apply for total grains and total vegetables.
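Recovering a total from its decoupled parts, and converting it to an energy-adjusted density, can be sketched as follows. The density rule (multiply usual intake by 1000 and divide by usual kilocalories) is the one stated in the Table 1 caption for components other than saturated fat and SoFAAS; the function names and the numbers are hypothetical.

```python
def total_from_parts(*parts):
    """Sum usual intakes of non-overlapping parts, e.g. fruit juice + whole fruit."""
    return sum(parts)

def density_per_1000_kcal(usual_intake, usual_kcal):
    """Usual intake expressed per 1000 kilocalories of usual energy intake."""
    return 1000.0 * usual_intake / usual_kcal

# Toy usual intakes for one child (made-up numbers).
total_fruit = total_from_parts(0.5, 1.0)   # non-whole fruit + whole fruit
fruit_density = density_per_1000_kcal(total_fruit, 1500.0)
```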
With these new variables, our model mixed well and gave reasonable looking answers that, as mentioned in Section 4, give similar results to other methods employed with smaller parts of the data set.
5.3. Estimation of the HEI-2005 Scores
In the introduction, we posed 4 questions to which answers had not been possible previously. The first open question concerned the distribution of the HEI total score. Along the way towards this, Table 2 presents the energy-adjusted distributions of the dietary components used in the HEI-2005. Table 3 presents the distributions of the HEI-2005 individual component scores and the total score, with a graphical view given in Figure 1.
Table 2
Estimated distributions of energy-adjusted usual intakes for children aged 2-8; NHANES, 2001-2004. For each dietary component, the first line = estimate from our model, while the second line is its BRR-estimated standard error. Here, “DOL” (more ...)
Table 3
Estimated distributions of the usual intake HEI-2005 scores. For each component score, the first line is the estimate from our model, while the second line is its BRR-estimated standard error. The total score is the sum of the individual scores. Here, “DOL” is dark green and orange vegetables and legumes.
Fig 1
The estimated percentiles of the HEI-2005 total score. The horizontal axis is the percentile of interest, e.g., 0.5 refers to the median, while the vertical axis gives the corresponding percentile of the HEI-2005 total score. Standard error estimates are given in Table 3.
Table 3 presents the first estimates of the distribution of HEI-2005 scores for a vulnerable subgroup of the population, namely children aged 2-8 years. A previous analysis of 2003-04 NHANES data, looking separately at 2-5 year olds and 6-11 year olds, was limited to estimates of mean usual HEI-2005 scores (59.6 and 54.7, respectively, see Fungwe, et al., 2009). The mean scores noted here are comparable to those and reinforce the notion that children's diets, on average, are far from ideal. However, this analysis provides a more complete picture of the state of US children's diets. By including the scores at various percentiles, we estimate that only 5% of children have a score of 69 or greater and that 10% have scores of 41 or lower. While not in the Table, we also estimate that the 99th percentile is 74. This analysis suggests that virtually all children in the US have suboptimal diets and that a sizeable fraction (10%) have alarmingly low scores (41 or lower).
We have also considered whether our multivariate model fitting procedure gives reasonable marginal answers. To check this, we note that it is possible to use the SAS procedure NLMIXED separately for each component to fit a model with one episodically consumed food group or daily consumed dietary component together with energy. The marginal distributions of each such component fitted separately are quite close to what we have reported in Table 3, as is our mean, which is 53.50 compared to the mean of 53.25 based on analyzing one HEI-2005 component at a time with the NLMIXED procedure. The only case where there is a mild discrepancy is in the estimated variability of the energy-adjusted usual intake of oils, for which the NLMIXED procedure gives an estimated variance 9 times greater than ours; this discrepancy is likely caused by the NLMIXED procedure itself.
Of course, it is the distribution of the HEI-2005 total score that cannot be estimated by analysis of one component at a time.
There are other things that have not been computed previously that are simple by-products of our analysis. For example, the correlations among energy-adjusted usual intakes involving episodically consumed foods have not been estimated previously, but this is easy for us, see Table 4. The estimated correlation of –0.64 between energy-adjusted total fruit and energy-adjusted SoFAAS, and the –0.47 correlation between DOL and SoFAAS are surprisingly high.
Table 4
Estimated correlation matrix for energy-adjusted usual intakes. Here TF = Total Fruits, WF = Whole Fruits, TV = Total Vegetables, WG = Whole Grains, TG = Total Grains, SatFat = Saturated Fat. Here, “DOL” is dark green and orange vegetables and legumes.
5.4. Component Scores and Other Scores
As described in the introduction, an open problem has been to estimate the correlation between the individual score on each dietary component and the sum of the scores on all other dietary components. In their Table 3, Guenther, et al. (2008b) consider this problem, but of course they did not have a model for usual energy-adjusted intakes, and instead they used a single 24hr recall. In Table 5, we show the resulting correlations using (a) a single 24hr recall; (b) the mean of two 24hr recalls for those who have two 24hr recalls; and (c) our model for usual intake. The numbers for (a) differ from those of Guenther, et al. (2008b) because we are considering a different population than they do. A striking and not unexpected aspect of this table is that for those components with non-trivial correlations, the correlations all increase as one moves from a single 24hr recall to the mean of two 24hr recalls and then finally to estimated usual intake. Thus, for example, the correlation between the HEI-2005 score for total fruit and its difference with the total score is 0.38 for a single 24hr recall, 0.44 for the mean of two 24hr recalls and then finally 0.62 for usual intake.
Table 5
Estimated correlations between each individual HEI-2005 component score and the sum of the other HEI component scores, i.e., the difference of the total score and each individual component. The column labeled “Two 24hr” is the naive analysis that uses the mean of two 24hr recalls.
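The quantity tabulated in Table 5 can be sketched in a few lines. This is a minimal illustration and not the authors' Matlab code: it assumes a matrix of component scores is already available (from a single recall, from the mean of two recalls, or from model-based usual intakes), and the function name and the optional survey weights argument are hypothetical.

```python
import numpy as np

def component_vs_rest_correlations(scores, weights=None):
    """Correlate each component score with the sum of the other scores.

    scores  : array of shape (n_people, n_components)
    weights : optional survey weights of length n_people (assumed input)
    """
    scores = np.asarray(scores, dtype=float)
    n, p = scores.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    total = scores.sum(axis=1)
    out = np.empty(p)
    for j in range(p):
        x = scores[:, j]
        rest = total - x                      # total score minus component j
        mx, mr = np.sum(w * x), np.sum(w * rest)
        cov = np.sum(w * (x - mx) * (rest - mr))
        var_x = np.sum(w * (x - mx) ** 2)
        var_r = np.sum(w * (rest - mr) ** 2)
        out[j] = cov / np.sqrt(var_x * var_r)
    return out
```

A component whose score is constant across people has zero variance, so its correlation is undefined and the function would return `nan` for that column.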
5.5. Distributions of Intakes for Subsets of HEI Total Scores
A third open question is: among those whose total HEI-2005 score is > 50 or ≤ 50, what is the distribution of energy-adjusted usual intake of whole grains, whole fruits, dark green and orange vegetables and legumes (DOL) and calories from solid fats, alcoholic beverages and added sugars (SoFAAS)? This follows naturally from our method. Following (3.8), let G1(Tbi) be energy adjusted usual intake and let G2(Tbi) be the HEI total score. Then the distributions in question for when the total HEI-2005 score is > 50 can be estimated as equation M45.
The results are provided in Table 6, with a graphical view in Figure 2. The results show that those who have poorer diets with usual HEI-2005 total score ≤ 50 are consistently eating poorer diets, i.e., less whole fruits, less whole grains and less DOL, but higher SoFAAS.
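The conditioning step can be illustrated as follows. This is a sketch under stated assumptions: it supposes that draws of energy-adjusted usual intake and of the HEI-2005 total score have already been generated from the fitted model, each carrying a survey weight, and it uses a simple weighted empirical quantile. The function names are hypothetical, and the estimator in (3.8) is not reproduced here.

```python
import numpy as np

def weighted_quantile(values, weights, probs):
    """Smallest value whose weighted empirical CDF reaches each probability."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cdf, np.asarray(probs, dtype=float), side="left")
    return v[np.minimum(idx, len(v) - 1)]

def conditional_quantiles(intake, total_score, weights, probs, cutoff=50.0):
    """Quantiles of intake among those with total score > cutoff and <= cutoff."""
    intake = np.asarray(intake, dtype=float)
    total_score = np.asarray(total_score, dtype=float)
    weights = np.asarray(weights, dtype=float)
    high = total_score > cutoff
    return {
        "gt": weighted_quantile(intake[high], weights[high], probs),
        "le": weighted_quantile(intake[~high], weights[~high], probs),
    }
```

Comparing the `"gt"` and `"le"` quantiles for whole fruits, whole grains, DOL and SoFAAS gives the kind of contrast summarized in Table 6.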
Table 6
Estimated distributions of energy-adjusted usual intake for those whose HEI-2005 total scores are ≤ 50 and > 50. Here, “DOL” is dark green and orange vegetables and legumes. Also, “SoFAAS” is calories from solid fats, alcoholic beverages and added sugars.
Fig 2
The estimated percentiles of the energy-adjusted usual intakes for Whole fruits (top left) in cups/(1000 kcal), Whole grains (top right) in ounces/(1000 kcal), DOL (bottom left) in cups/(1000 kcal) and calories from SoFAAS (bottom right) in % of Energy.
5.6. Dietary Consistency
We stated in the introduction that it is interesting to understand the percentage of children whose usual intake HEI score exceeds the median HEI score on all 12 HEI components. Those median scores, say (κ1, ..., κ12), are estimated in Table 3. If Gj(Tbi) is the HEI component score for episodically consumed food j, then following (3.8) the quantity in question can be estimated as equation M46. We estimate that the percentage is 6%, woefully small. The percentage of children whose usual intake HEI score exceeds the median HEI score on all 12 HEI components is 0.24%. Figure 3 gives the estimated probabilities of exceeding the κth percentile on all 12 HEI components simultaneously, for κ = 1, 2, ..., 99.
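A joint exceedance probability of this kind is a weighted proportion over simulated usual-intake scores. The sketch below assumes that a matrix of model-based component scores and the corresponding survey weights are available; the function name is hypothetical and the code is illustrative only.

```python
import numpy as np

def joint_exceedance(scores, weights, thresholds):
    """Weighted fraction of people whose score exceeds the threshold
    on every component simultaneously.

    scores     : array (n_people, n_components) of component scores
    weights    : survey weights, length n_people
    thresholds : one threshold per component, e.g. the estimated medians
    """
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    above_all = np.all(scores > np.asarray(thresholds, dtype=float), axis=1)
    return np.sum(w * above_all) / np.sum(w)
```

As a point of reference, if 12 continuously distributed scores were mutually independent, the probability of exceeding the median on all of them would be 0.5^12, or roughly 0.02%.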
Fig 3
The Y-axis gives the estimated probabilities of exceeding the κth (X-axis) percentile on all 12 HEI components, for κ = 1, 2, ..., 99, see Section 5.6.
5.7. Uncertainty Quantification
The BRR standard errors of the HEI-2005 components' adjusted usual intakes and scores are shown in Tables 2 and 3. The BRR weights are only used in variance calculations. Once we have estimated some quantity, say equation M47, from the sample using the sample weights, we compute the same quantity using, in succession, each of the 32 sets of BRR weights. This gives us 32 estimates equation M48. The BRR estimate for the variance of equation M49 is equation M50. The 32 in the denominator is for the 32 different estimates from the 32 different sets of weights, and the 0.49 is the square of the perturbation factor used to construct the BRR weight sets (Wolter, 1995).
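The variance calculation described above can be sketched as follows. The code assumes, based on the verbal description, that the BRR variance is the sum of squared deviations of the 32 replicate estimates from the full-sample estimate, divided by 32 × 0.49, where 0.49 = 0.7² is the square of the perturbation factor. The function name is hypothetical.

```python
import numpy as np

def brr_variance(theta_hat, theta_reps, perturbation=0.7):
    """BRR variance estimate from replicate estimates.

    theta_hat    : estimate computed with the full-sample weights
    theta_reps   : estimates computed with each set of BRR weights
    perturbation : perturbation factor; its square (0.49) appears
                   in the denominator
    """
    reps = np.asarray(theta_reps, dtype=float)
    n_reps = len(reps)                       # 32 in the analysis above
    return np.sum((reps - theta_hat) ** 2) / (n_reps * perturbation ** 2)

def brr_standard_error(theta_hat, theta_reps, perturbation=0.7):
    return np.sqrt(brr_variance(theta_hat, theta_reps, perturbation))
```

Each replicate estimate requires refitting the whole model with a different weight set, so the cost of the standard errors is roughly 32 times the cost of the point estimate.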
6.1. Never Consumers
An aspect of the modeling that we have not discussed is the possibility that some people never, ever consume an episodically consumed dietary component. Our model does not allow for this, for general reasons and for reasons that are specific to our data analysis.
It is in principle possible to add an additional modeling step for non-consumers, via fixed effects probit regression, but we do not think this is a practical issue in our case, for two reasons.
  • The first is that the HEI-2005 is based on 6 episodically consumed dietary components, namely total fruit, whole fruit, whole grains, total vegetables, DOL, and milk, the latter of which includes cheese, yogurt and soy beverages. None of these are “lifestyle adverse”, unlike say alcohol. While 40% of the responses for whole fruits, for example, equal zero, the percentage of children who never eat any whole fruits at all is likely to be minuscule.
  • Even if one disputes the claim that very few individuals never consume one of the dietary components, the consequence is that we have overestimated the HEI-2005 total scores, and hence that our estimates of the proportion of individuals with alarmingly low HEI scores are deflated, not inflated. The reason is that our model implies that everyone has a positive usual intake of the 6 episodically consumed dietary components. Since the HEI-2005 score components are nondecreasing functions of usual intake of the episodically consumed dietary components, this would mean that we overestimate the HEI-2005 total score.
6.2. Computing and Data
Our programs were written in Matlab. The programs, along with the NHANES data we used, are available in the Annals of Applied Statistics online archive. Although a much smaller amount of computing effort yields similar results, using 70,000 MCMC steps with a burn-in of 20,000 takes approximately 10 hours on a Linux server.
We also estimated the Monte Carlo standard error, which is defined by Flegal, et al. (2008) as equation M51, where n is the total number of iterations, with n = ab, where a is the number of blocks and b is the block size, and where
equation M52
The batch means estimate of equation M53 is
equation M54
The ratio of the Monte Carlo standard error to the estimated standard deviation of the estimated parameters averages 3.4% for Σu and 1.7% for β.
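For readers unfamiliar with batch means, the following is a minimal sketch of the standard non-overlapping batch means estimator discussed by Flegal, et al. (2008). It assumes a scalar post-burn-in chain split into a batches of size b, and it is not taken from the authors' programs; the function name is hypothetical.

```python
import numpy as np

def batch_means_mcse(chain, n_batches):
    """Monte Carlo standard error of a chain mean via batch means.

    chain     : 1-D array of post-burn-in MCMC draws of one quantity
    n_batches : number of blocks a; block size is b = floor(len/a)
    """
    chain = np.asarray(chain, dtype=float)
    b = len(chain) // n_batches              # block size
    n = n_batches * b                        # draws actually used, n = a*b
    x = chain[len(chain) - n:]               # drop any leading remainder
    batch_means = x.reshape(n_batches, b).mean(axis=1)
    # sigma^2 estimate: b/(a-1) * sum of squared deviations of batch means
    sigma2_hat = b * np.sum((batch_means - x.mean()) ** 2) / (n_batches - 1)
    return np.sqrt(sigma2_hat / n)
```

Dividing this quantity by the estimated standard deviation of a parameter gives a ratio of the kind reported above.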
Because of the public health importance of the problem, the National Cancer Institute has contracted for the creation of a SAS program that performs our analysis. It will allow any number of episodically and daily consumed dietary components. The first draft of this program, written independently in a different programming language, gives almost identical results to what we have obtained, at least suggesting that our results are not the product of a programming error.
7.1. Transformations
In Section A.12, we describe how we estimated the transformation parameters as a separate component-wise calculation. We have done some analyses where we simultaneously transform each component, and found very little difference from our results. However, the computing time to implement this is extremely high: because different transformations put the data on different scales, we have to compute the usual intakes at each step in the MCMC, and not just at the end.
7.2. What Have We Learned That Is New
There are many important questions in dietary assessment that could not previously be answered because of a lack of multivariate models for complex, zero-inflated data with measurement errors and a lack of ability to fit such multivariate models. Nutrients and foods are not consumed in isolation, but rather as part of a broader pattern of eating. There is reason to believe that these various dietary components interact with one another in their effect on health, sometimes working synergistically and sometimes in opposition. Nonetheless, simply characterizing various patterns of eating has presented an enormous statistical challenge. Until now, descriptive statistics on the HEI-2005 have been limited to examination of either the total scores or only a single energy-adjusted component at a time. This has precluded characterization of various patterns of dietary quality as well as any subsequent analyses of how such patterns might relate to health.
The methodology presented in this paper provides a workable solution to these problems, and it has already proven valuable. In May 2010, just as we were submitting the paper, a White House Task Force on Childhood Obesity created a report. They had wanted to set a goal of all children having a total HEI score of 80 or more by 2030, but when they learned we estimated that only 10% of children aged 2-8 had a score of 66 or higher, they decided to set a more realistic target. The facility to estimate distributions of the multiple component scores simultaneously will be important in tracking progress toward that goal.
7.3. In What Other Arenas Will Our Work Have Impact?
There are many other important problems where multivariate models such as ours will be important. One such problem arises when studying the relationship between multiple dietary components or dietary patterns and health outcomes. Traditionally, for cost reasons, large cohort studies have used a food frequency questionnaire (FFQ) to measure dietary intake, sometimes with a small calibration study including short-term measures such as 24hr recalls. However, there is a new web-based instrument called the Automated Self-administered 24-hour Dietary Recall (ASA24™), see http://riskfactor.cancer.gov/tools/instruments/asa24, which has been proposed to replace or at least supplement the FFQ and which is currently undergoing extensive testing. The dietary data we will see then is what we have called Yijk, i.e., 24hr recall data. In order to correct relative risk estimates for the measurement error inherent in the ASA24™, regression calibration (Carroll, et al., 2006) will almost certainly be the method of choice, as it is in most of nutritional epidemiology. This method attempts to produce an estimate of the regression of usual intake on the observed intakes, and then to use these estimates in Cox and logistic regression for the health outcome. In order to perform this regression, a multivariate measurement error model will be required, since the regression is on all the observed dietary intake components in the regression model measured by the ASA24™, and not on each individual component. Our methodology is easily extended to address this problem.
ACKNOWLEDGMENTS
This paper forms part of Zhang's Ph.D. dissertation at Texas A&M University. Zhang and Carroll's research was supported by a grant from the National Cancer Institute (CA57030). This work was also supported by National Science Foundation Instrumentation grant number 0922866.
APPENDIX A: DETAILS OF THE FITTING PROCEDURE
In this Appendix we give the full details of the model fitting procedure.
A.1. Notational Convention
In our example, age was standardized to have mean 0.0 and variance 1.0, to improve numerical stability.
As described in Section 3.1, the observed, transformed non-zero 24hr recalls were standardized to have mean 0.0 and variance 2.0. More precisely, for equation M55, we first transformed the non-zero food group data as equation M56, and then we standardized these data as equation M57, where equation M58 are the mean and standard deviation of the non-zero food intakes equation M59. Similarly, for non-episodically consumed dietary components and energy we transformed to equation M60 for equation M61, and then standardized to equation M62. Of course, whether the food group is consumed or not is equation M63 for equation M64. Collected, the data are equation M65. The terms equation M66 are not random variables but are merely constants used for standardization, and we need not consider inference for them. Back-transformation is discussed in Appendix A.11.
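The two-stage preprocessing described above can be sketched as follows. This is a minimal illustration that assumes a Box-Cox transformation, g(y, λ) = (y^λ − 1)/λ with g(y, 0) = log y, followed by centering and scaling of the transformed non-zero data to mean 0 and variance 2. The function names are hypothetical and are not taken from the authors' Matlab programs.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform of positive data (assumed form of the transformation)."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def inverse_box_cox(z, lam):
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (1.0 + lam * z) ** (1.0 / lam)

def standardize(y_nonzero, lam, target_var=2.0):
    """Transform non-zero recalls, then scale to mean 0 and variance target_var.

    Returns the standardized data and the constants (mu, sigma), which are
    treated as fixed and are needed later for back-transformation.
    """
    g = box_cox(y_nonzero, lam)
    mu, sigma = g.mean(), g.std(ddof=1)
    q = np.sqrt(target_var) * (g - mu) / sigma
    return q, mu, sigma

def destandardize(q, lam, mu, sigma, target_var=2.0):
    """Invert standardize(): back to the original intake scale."""
    g = mu + sigma * np.asarray(q, dtype=float) / np.sqrt(target_var)
    return inverse_box_cox(g, lam)
```

Note that `destandardize` only inverts the data transformation. It is not the usual-intake back-transformation of Appendix A.11, which must also account for within-person variation.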
A.2. Prior Distributions
Because the data were standardized, we used the following conventions.
  • The prior for all βj were normal with mean zero and variance 100.
  • The prior for Σu was exchangeable with diagonal entries all equal to 1.0 and correlations all equal to 0.50. There were 21 degrees of freedom in the inverse Wishart prior, i.e., mu = 21. Thus, the prior is IW{(mu – 19 – 1)Σu,prior, mu}. We experimented with this prior by using zero correlation, and the results were essentially unchanged.
  • The prior for rk is Uniform[-1, 1]. Set the initial value: rk = 0, k = 1, . . . , 5.
  • The prior for θk is Uniform[–π, π]. Set the initial value: θk = 0, k = 1, . . . , 25.
  • The priors for v22, v44, . . . , v12,12 and v13,13, . . . , v19,19 were Uniform[-3,3]. Set the initial values: v22 = v44 = . . . = v12,12 = v13,13 = . . . = v19,19 = 1.
  • For the rest of the non-diagonal vij's which could not be determined by the restrictions, we used Uniform[-3,3] priors. Set the initial values to be 0.
The constraints on Σε are nonlinear, and our parameterization enforces them easily without having to have prior distributions for the original parameterization that satisfy the nonlinear constraints.
The key to making the other components of the matrix V, with Σε = VVT, work well is that we have standardized the data as described in Section A.1. With this standardization, prior specification becomes much simpler. For example, the variance of the ε's for energy is equation M67. However, since the sample variance for energy is standardized to equal 2.0, we simply need to make the priors for v19,j uniform on a modest range to have real flexibility.
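The benefit of working with V instead of Σε directly can be seen in a small numerical sketch. The example below assumes a lower-triangular V of dimension 3 purely for illustration; the paper's specific restrictions on V (through the rk and θk parameters) are not reproduced, and the helper name is hypothetical.

```python
import numpy as np

def covariance_from_factor(V):
    """Build a covariance matrix as V V^T; the result is always symmetric
    and positive semi-definite, whatever real values V contains."""
    V = np.asarray(V, dtype=float)
    return V @ V.T

# Illustrative 3 x 3 factor (assumed values, not estimates from the paper).
V = np.tril(np.array([[1.0, 0.0, 0.0],
                      [0.3, 0.9, 0.0],
                      [-0.2, 0.4, 1.1]]))
Sigma = covariance_from_factor(V)

# The variance of the last component is the sum of squares of the last row
# of V, so bounded uniform priors on that row still allow a wide range of
# variances.
last_variance = np.sum(V[-1] ** 2)
```

Because every entry of V can vary freely within its prior range without breaking positive semi-definiteness, simple uniform priors on the entries are sufficient.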
A.3. Generating Starting Values for the Latent Variables
While we observe Qik, in the MCMC we need to generate starting values for the latent variables equation M68 to initiate the MCMC.
  • For nutrients and energy, Qijk = Wijk, no data need be generated, j = 13, . . . , 19.
  • For the amounts, Qi2k, Qi4k, Qi6k, Qi8k, Qi,10,k and Qi,12,k, we set Wi2k = Qi2k, Wi4k = Qi4k, Wi6k = Qi6k, Wi8k = Qi8k, Wi,10,k = Qi,10,k and Wi,12,k = Qi,12,k.
  • For consumption, we generate Ũi as normally distributed with mean zero and covariance matrix given as the prior covariance matrix for Σu. For equation M69, we also compute equation M70, where equation M71 are generated independently. We then set equation M72.
  • Finally, we then updated Wik by a single application of the updates given in Appendix A.9.
A.4. Complete Data Loglikelihood
Let J = 19. The complete data include the indicators of whether a food was consumed, the W variables, and the random effect U variables. The loglikelihood of the complete data is
equation M73
We used Gibbs sampling to update this complete data loglikelihood, the details for which are given in subsequent appendices. The weights wi are integers and are used here in a pseudo-likelihood fashion. One can also think of this as expanding each individual into wi individuals, each with the same observed data but different latent variables. For computational convenience, since we are only asking for a frequentist estimator and not doing full Bayesian inference, the latent variables in the process are generated once for each individual. Estimates of Σu, Σε and βj for j = 1, ..., J were computed as the means from the Gibbs samples. Once again, we emphasize that we are not doing a proper Bayesian analysis, but only using MCMC techniques to obtain a frequentist estimate, with uncertainty assessed using the frequentist BRR method.
A.5. Complete Conditionals for rq, θq and vpq
Except for irrelevant constants, the complete conditional for rq (q = 1, . . . , 5) is
equation M74
Except for irrelevant constants, the complete conditionals for vqq (q = 2, 4, 6, 8, 10, 12, 13, . . . , 19) are
equation M75
Except for irrelevant constants, the complete conditionals for θq, (q = 1, . . . , 25) and non-diagonal free parameters vpq are
equation M76
The full conditionals do not have an explicit form, so we use Metropolis-Hastings steps within the Gibbs sampler to generate draws from them.
  • rq (q = 1, . . . , 5)
    • We discretize the values of rq to the set {–0.99 + 2 × 0.99(j – 1)/(M – 1)}, where j = 1, ..., M and we choose M = 41.
    • Proposal: The current value is rq,t. The proposed value of rq,t+1 is selected randomly from the current value and the two nearest neighbors of rq,t. Then rq,t+1 is accepted with probability min{1, g(rq,t+1)/g(rq,t)}, where
      equation M77
      where here and in what follows, for any A, equation M78.
  • θq (q = 1, . . . , 25)
    • We discretize similarly as above.
    • Proposal: The current value is θq,t. The proposed value θq,t+1 is selected randomly from the current value and the two nearest neighbors of θq,t. Then θq,t+1 is accepted with probability min{1, g(θq,t+1)/g(θq,t)}, where
      equation M79
  • vqq (q = 2, 4, 6, 8, 10, 12, 13, . . . , 19)
    • Proposal: The current value is vqq,t. A candidate vqq,t+1 is generated from the Uniform distribution of length 0.4 with mean vqq,t. The candidate value vqq,t+1 is accepted with probability min{1, g(vqq,t+1)/g(vqq,t)}, where
      equation M80
  • non-diagonal free parameters vpq
    • Proposal: The current value is vpq,t. The candidate value vpq,t+1 is generated from the Uniform distribution of length 0.4 with mean vpq,t. The candidate value is accepted with probability min{1, g(vpq,t+1)/g(vpq,t)}, where
      equation M81
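The two kinds of proposal described in the list above can be sketched generically. In this illustration `log_g` stands for the logarithm of whichever function g applies to the parameter being updated (equations M77 to M81 are not reproduced), and the treatment of proposals that fall off the end of the grid, which are simply rejected here, is an assumption. All function names are hypothetical.

```python
import numpy as np

def make_grid(M=41, bound=0.99):
    """Grid {-bound + 2*bound*(j-1)/(M-1), j = 1..M} used for r_q."""
    j = np.arange(1, M + 1)
    return -bound + 2.0 * bound * (j - 1) / (M - 1)

def grid_mh_step(idx, log_g, grid, rng):
    """One update on a discretized parameter: propose the current grid point
    or one of its two nearest neighbors with equal probability."""
    prop = idx + rng.integers(-1, 2)          # step is -1, 0 or +1
    if prop < 0 or prop >= len(grid):
        return idx                            # off the grid: reject (assumed)
    log_ratio = log_g(grid[prop]) - log_g(grid[idx])
    return prop if np.log(rng.uniform()) < log_ratio else idx

def window_mh_step(v, log_g, rng, length=0.4):
    """One update for a continuous parameter: uniform proposal of total
    length 0.4 centered at the current value."""
    cand = v + rng.uniform(-length / 2.0, length / 2.0)
    log_ratio = log_g(cand) - log_g(v)
    return cand if np.log(rng.uniform()) < log_ratio else v
```

Both proposals are symmetric, so the acceptance probability reduces to min{1, g(new)/g(old)}. A bounded prior such as Uniform[-3, 3] can be enforced by letting `log_g` return `-np.inf` outside the allowed range, which makes such candidates always rejected.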
A.6. Complete Conditionals for Σu
The dimension of the covariance matrices is J = 19. By inspection, the complete conditional for Σu is
equation M82
where here IW = the Inverse-Wishart distribution. The density of IW(Ω, m) for a J × J random variable is
equation M83
This has expectation Ω/(m – J – 1).
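Drawing from an inverse Wishart distribution with this parameterization can be sketched with NumPy alone. The sketch assumes an integer degrees-of-freedom parameter m ≥ J and uses the standard fact that the inverse of a Wishart(Ω⁻¹, m) matrix is IW(Ω, m). The function name is hypothetical, and the particular Ω and m of the complete conditional (equation M82) are not reproduced.

```python
import numpy as np

def sample_inverse_wishart(omega, m, rng):
    """Draw one J x J matrix from IW(omega, m), for integer m >= J."""
    omega = np.asarray(omega, dtype=float)
    J = omega.shape[0]
    # Rows of Z are independent N(0, omega^{-1}) vectors.
    L = np.linalg.cholesky(np.linalg.inv(omega))
    Z = rng.standard_normal((m, J)) @ L.T
    W = Z.T @ Z                  # W ~ Wishart(omega^{-1}, m)
    return np.linalg.inv(W)      # W^{-1} ~ IW(omega, m)
```

With the prior of Section A.2, mu = 21 and J = 19 give mu – J – 1 = 1, so the prior scale matrix equals Σu,prior and the prior expectation of Σu is Σu,prior itself.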
A.7. Complete Conditionals for β
Let the elements of equation M84 be equation M85. For any j, except for irrelevant constants,
equation M86
which implies equation M87, where
equation M88
A.8. Complete Conditionals for Ũi
The NHANES 2001-2004 weights are integers, representing the number of children that each sampled child represents. Thus, as described therein, the loglikelihood in Section A.4 could also be rewritten equivalently by developing wi pseudo-children, each with the same observed data values. It thus does not make sense to use the weights to generate an individual Ũi. Instead, as described in Section A.4, for computational convenience for generating a Ũi to represent wi children, we set the weight for that child temporarily = 1.0. Then, except for irrelevant constants,
equation M89
Remembering that for purposes of this section we are setting wi = 1.0, this implies that equation M90, where
equation M91
A.9. Complete Conditional for equation M92, equation M93, 3, 5, 7, 9, 11
Here we do the complete conditional for equation M94 with equation M95, 3, 5, 7, 9, 11. Except for irrelevant constants,
equation M96
where, using the convention of Section A.8,
equation M97
If we use the notation TN+(μ, σ, c) for a normal random variable with mean μ and standard deviation σ that is truncated from the left at c, and similarly use TN–(μ, σ, c) when truncation is from the right at c, then it follows that with equation M98 and equation M99,
equation M100
Generating TN+(0, 1, c) is easy: if c < 0, simply do rejection sampling of a Normal(0, 1) until you get one that is > c. If c > 0, there is an adaptive rejection scheme (Robert, 1995).
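The sampling scheme just described can be sketched as follows. For c > 0 the code uses a rejection sampler with a translated exponential proposal, which is our reading of the scheme in Robert (1995) and should be treated as an assumption. The function names are hypothetical.

```python
import numpy as np

def sample_tn_plus_standard(c, rng):
    """Draw Z ~ Normal(0, 1) conditioned on Z > c."""
    if c <= 0:
        # Plain rejection sampling: acceptance probability is at least 1/2.
        while True:
            z = rng.standard_normal()
            if z > c:
                return z
    # c > 0: propose from c + Exponential(rate alpha) and accept with
    # probability exp(-(z - alpha)^2 / 2), using the optimal rate alpha.
    alpha = (c + np.sqrt(c * c + 4.0)) / 2.0
    while True:
        z = c + rng.exponential(1.0 / alpha)   # NumPy takes scale = 1/rate
        if rng.uniform() <= np.exp(-0.5 * (z - alpha) ** 2):
            return z

def sample_tn_plus(mu, sigma, c, rng):
    """Normal(mu, sigma) truncated from the left at c."""
    return mu + sigma * sample_tn_plus_standard((c - mu) / sigma, rng)

def sample_tn_minus(mu, sigma, c, rng):
    """Normal(mu, sigma) truncated from the right at c, by reflection."""
    return -sample_tn_plus(-mu, sigma, -c, rng)
```

The reflection in `sample_tn_minus` means only the left-truncated case needs its own sampler.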
A.10. Complete Conditionals for Wi2k, Wi4k, Wi6k, Wi8k, Wi,10,k and Wi,12,k When Not Observed
For p = 2, 4, 6, 8, 10, 12, the variable Wipk is not observed when Qi,p–1,k = 0, or, equivalently, when Wi,p–1,k < 0. Except for irrelevant constants,
equation M101
where, using the convention of Section A.8,
equation M102
Therefore,
equation M103
A.11. Usual Intake, Standardization and Transformation
Here we present detailed formulas for functions defined in Section 3.4. When λ = 0, the back-transformation is
equation M104
When λ ≠ 0, the back-transformation is
equation M105
A.12. Transformation Estimation
As part of an earlier project (Freedman, et al., 2009), we estimated the transformations for one food/nutrient at a time using the method of Kipnis, et al. (2009), both for the data and also for each BRR weighted data set. To facilitate comparison with the one food/nutrient at a time analysis, in our analysis of all HEI-2005 components, we used these transformations as well. Of course, our methods can be generalized to allow for estimation of the transformations as well. By allowing a different transformation for each BRR weighted data set, we have captured the variation due to estimation of the transformations.
SUPPLEMENTARY MATERIAL
Included in the supplementary materials (Zhang, et al., 2011) are (a) additional tables in a pdf file; (b) data files of the NHANES data used in the analysis; and (c) Matlab programs for the data analysis.
Contributor Information
Saijuan Zhang, Department of Statistics Texas A&M University 3143 TAMU College Station, Texas 77843-3143 U.S.A.
Douglas Midthune, Biometry Research Group Division of Cancer Prevention National Cancer Institute 6130 Executive Boulevard EPN-3131 Bethesda, Maryland 20892-7354 U.S.A.
Patricia M. Guenther, Center for Nutrition Policy and Promotion U.S. Department of Agriculture 3101 Park Center Drive, Ste. 1034 Alexandria, Virginia 22302 U.S.A. Patricia.Guenther/at/cnpp.usda.gov.
Susan M. Krebs-Smith, Applied Research Program Division of Cancer Control and Population Sciences National Cancer Institute 6130 Executive Boulevard, EPN-4005 Bethesda, Maryland 20892, U.S.A. krebssms/at/mail.nih.gov.
Victor Kipnis, Biometry Research Group Division of Cancer Prevention National Cancer Institute 6130 Executive Boulevard EPN-3131 Bethesda, Maryland 20892-7354 U.S.A.
Kevin W. Dodd, Biometry Research Group Division of Cancer Prevention National Cancer Institute 6130 Executive Boulevard EPN-3131 Bethesda, Maryland 20892-7354 U.S.A.
Dennis W. Buckman, Information Management Services, Inc. 12501 Prosperity Drive Silver Spring, Maryland 20904, U.S.A. BuckmanD/at/imsweb.com.
Janet A. Tooze, Department of Biostatistical Sciences Wake Forest University, School of Medicine Medical Center Boulevard Winston-Salem, North Carolina 27157, U.S.A. jtooze/at/wfubmc.edu.
Laurence Freedman, Gertner Institute for Epidemiology and Health Policy Research Sheba Medical Center Tel Hashomer 52161, Israel ; lsf/at/actcom.co.il.
Raymond J. Carroll, Department of Statistics Texas A&M University 3143 TAMU College Station, Texas 77843-3143 U.S.A.
  • Buonaccorsi J. Measurement Error: Models, Methods and Applications. Chapman and Hall/CRC Press; 2010.
  • Carriquiry AL. Assessing the prevalence of nutrient inadequacy. Public Health Nutrition. 1999;2:23–33.
  • Carriquiry AL. Estimation of usual intake distributions of nutrients and foods. Journal of Nutrition. 2003;133:601–608.
  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective. Second Edition. Chapman and Hall/CRC Press; 2006.
  • Delaigle A. An alternative view of the deconvolution problem. Statistica Sinica. 2008;18:1025–1045.
  • Delaigle A, Hall P. Estimation of observation-error variance in errors-in-variables regression. Statistica Sinica. 2010 to appear.
  • Delaigle A, Hall P. Using SIMEX for smoothing-parameter choice in errors-in-variables problems. Journal of the American Statistical Association. 2008;103:280–287.
  • Delaigle A, Hall P, Meister A. On deconvolution with repeated measurements. Annals of Statistics. 2008;36:665–685.
  • Delaigle A, Meister A. Density estimation with heteroscedastic error. Bernoulli. 2008;14:562–579.
  • Ferrari P, Roddam A, Fahey MT, Jenab M, Bamia C, Ocké M, Amiano P, Hjartåker A, Biessy C, Rinaldi S, Huybrechts I, Tjønneland A, Dethlefsen C, Niravong M, Clavel-Chapelon F, Linseisen J, Boeing H, Oikonomou E, Orfanos P, Palli D, Santucci de Magistris M, Bueno-de-Mesquita HB, Peeters PH, Parr CL, Braaten T, Dorronsoro M, Berenguer T, Gullberg B, Johansson I, Welch AA, Riboli E, Bingham S, Slimani N. A bivariate measurement error model for nitrogen and potassium intakes to evaluate the performance of regression calibration in the European Prospective Investigation into Cancer and Nutrition study. European Journal of Clinical Nutrition. 2009;63(Supplement 4):S179–187.
  • Flegal JM, Haran M, Jones GL. Markov Chain Monte Carlo: can we trust the third significant figure? Statistical Science. 2008;23:250–260.
  • Fraser GE, Shavlik DJ. Correlations between estimated and true dietary intakes. Annals of Epidemiology. 2004;14:287–95.
  • Freedman LS, Guenther PM, Krebs-Smith SM, Dodd KW, Midthune D. A population's distribution of Healthy Eating Index-2005 component scores can be estimated when more than one 24-hour recall is available. Journal of Nutrition. 2010;140:1529–1534.
  • Fuller WA. Measurement Error Models. Wiley; New York: 1987.
  • Fungwe T, Guenther PM, Juan WY, Hiza H, Lino M. Nutrition Insight. Vol. 43. USDA Center for Nutrition Policy and Promotion; 2009. The quality of children's diets in 2003-04 as measured by the Healthy Eating Index-2005.
  • Guolo A. A flexible approach to measurement error correction in case-control studies. Biometrics. 2008;64:1207–1214.
  • Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Chapman and Hall/CRC Press; 2003.
  • Guenther PM, Reedy J, Krebs-Smith SM. Development of the Healthy Eating Index-2005. Journal of the American Dietetic Association. 2008a;108:1896–1901.
  • Guenther PM, Reedy J, Krebs-Smith SM, Reeve BB. Evaluation of the Healthy Eating Index-2005. Journal of the American Dietetic Association. 2008b;108:1854–1864.
  • Kipnis V, Midthune D, Buckman DW, Dodd KW, Guenther PM, Krebs-Smith SM, Subar AF, Tooze JA, Carroll RJ, Freedman LS. Modeling data with excess zeros and measurement error: application to evaluating relationships between episodically consumed foods and health outcomes. Biometrics. 2009;65:1003–1010.
  • Kipnis V, Freedman LS, Carroll RJ, Midthune D. A measurement error model for episodically consumed foods and energy. 2010. Preprint.
  • Kott PS, Guenther PM, Wagstaff DA, Juan WY, Kranz S. Fitting a linear model to survey data when the long-term average daily intake of a dietary component is an explanatory variable. Survey Research Methods. 2009;3(3):157–165.
  • Küchenhoff H, Mwalili SM, Lesaffre E. A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics. 2006;62:85–96.
  • Lehmann EL, Casella G. Theory of Point Estimation. Springer; New York: 1998.
  • Liang H, Thurston S, Ruppert D, Apanasovich T, Hauser R. Additive partial linear models with measurement errors. Biometrika. 2008;95:667–678.
  • Messer K, Natarajan L. Maximum likelihood, multiple imputation and regression calibration for measurement error adjustment. Statistics in Medicine. 2008;27:6332–6350.
  • Natarajan L. Regression Calibration for Dichotomized Mismeasured Predictors. International Journal of Biostatistics. 2009;5:nihpa121098.
  • Nusser SM, Carriquiry AL, Dodd KW, Fuller WA. A semiparametric approach to estimating usual intake distributions. Journal of the American Statistical Association. 1996;91:1440–1449.
  • Nusser SM, Fuller WA, Guenther PM. Estimating usual dietary intake distributions: Adjusting for measurement error and non-normality in 24-hour food intake data. In: Lyberg L, Biemer P, Collins M, Deleeuw E, Dippo C, Schwartz N, Trewin D, editors. Survey Measurement and Process Quality. Wiley; New York: 1997. pp. 670–689.
  • Prentice RL. Measurement error and results from analytic epidemiology: dietary fat and breast cancer. Journal of the National Cancer Institute. 1996;88:1738–47.
  • Prentice RL. Dietary assessment and the reliability of nutritional epidemiology reports. Lancet. 2003;362:182–183.
  • Staudenmayer J, Ruppert D, Buonaccorsi JP. Density estimation in the presence of heteroskedastic measurement error. Journal of the American Statistical Association. 2008;103:726–736.
  • Tooze JA, Grunwald GK, Jones RH. Analysis of repeated measures data clumping at zero. Statistical Methods in Medical Research. 2002;11:341–355.
  • Tooze JA, Midthune D, Dodd KW, Freedman LS, Krebs-Smith SM, Subar AF, Guenther PM, Carroll RJ, Kipnis V. A new statistical method for estimating the distribution of usual intake of episodically consumed foods. Journal of the American Dietetic Association. 2006;106:1575–1587.
  • Wand MP. Finite sample performance of deconvolving kernel density estimators. Statistics and Probability Letters. 1998;37:131–139.
  • Wolter KM. Introduction to Variance Estimation. Springer-Verlag; New York: 1995.
  • Zhang S, Midthune D, Pérez A, Buckman DW, Kipnis V, Freedman LS, Dodd KW, Krebs-Smith SM, Carroll RJ. A bivariate measurement error model for episodically consumed dietary components. 2010.
  • Zhang S, Midthune D, Guenther PM, Krebs-Smith SM, Kipnis V, Dodd KW, Buckman DW, Tooze JA, Freedman LS, Carroll RJ. Supplement to “A new multivariate measurement error model with zero-inflated dietary data, and its application to dietary assessment”. 2011.