Am J Epidemiol. 2016 September 15; 184(6): 465–476.
Published online 2016 September 9. doi:  10.1093/aje/kww010
PMCID: PMC5023789

Instrumental-Variables Simultaneous Equations Model of Physical Activity and Body Mass Index

The Coronary Artery Risk Development in Young Adults (CARDIA) Study


We used full-system-estimation instrumental-variables simultaneous equations modeling (IV-SEM) to examine physical activity relative to body mass index (BMI; weight (kg)/height (m)²) using 25 years of data (1985/1986 to 2010/2011) from the Coronary Artery Risk Development in Young Adults (CARDIA) Study (n = 5,115; ages 18–30 years at enrollment). Neighborhood environment and sociodemographic instruments were used to characterize physical activity, fast-food consumption, smoking, alcohol consumption, marriage, and childbearing (women) and to predict BMI using semiparametric full-information maximum likelihood estimation to control for unobserved time-invariant and time-varying residual confounding and differential measurement error through model-derived discrete random effects. Comparing robust-variance ordinary least squares, random-effects regression, fixed-effects regression, single-equation-estimation IV-SEM, and full-system-estimation IV-SEM, estimates from random- and fixed-effects models and the full-system-estimation IV-SEM were unexpectedly similar, despite the lack of control for residual confounding with the random-effects estimator. Ordinary least squares tended to overstate the significance of health behaviors in BMI, while results from single-equation-estimation IV-SEM were notably different, revealing the impact of weak instruments in standard instrumental-variable methods. Our robust findings for fixed effects (which does not require instruments but has a high cost in lost degrees of freedom) and full-system-estimation IV-SEM (vs. standard IV-SEM) demonstrate potential for a full-system-estimation IV-SEM method even with weak instruments.

Keywords: body mass index, endogeneity, epidemiologic methods, fixed effects, health behaviors, instrumental variables, semiparametric methods, simultaneous equations

A critical challenge in observational studies of the influence of health behaviors, such as physical activity, on body weight (1, 2) is the potential for residual confounding due to omitted variables or differential measurement error (referred to in econometrics as endogeneity) (3–8). The assessment of diet and physical activity is known to be susceptible to error, which may be differential by body weight (9, 10); more generally, confounding may reflect difficult-to-measure innate characteristics, such as underlying health consciousness (3, 11).

Standard epidemiologic analyses of health behaviors and body mass index (BMI) are susceptible to residual confounding due to unobserved variables and differential measurement error. In the current study, our goal was to estimate the effect of physical activity on BMI, while accounting for these biases. We addressed the potential for residual confounding and differential measurement error in our analysis with an instrumental-variables simultaneous equations modeling (IV-SEM) approach commonly used in econometric studies using longitudinal data (8, 11–13).

Econometric approaches to causal inference have been employed in the epidemiologic literature (14–18) but have typically been limited to single-equation systems (11), such as the role of a specific genetic variant in an outcome (19), or systems models fitted using single-equation methods (11), such as 2-stage least squares (20). Our approach has several enhancements over standard instrumental-variable (IV) methods, including joint estimation of an entire system of equations that accounts for unmeasured confounding and differential measurement error in the analysis of multiple BMI risk factors, and the use of semiparametric and nonlinear estimation. Our method has been shown to perform better than linear IV methods even in the presence of weak instruments, a well-known challenge for IV methods (21, 22). For these reasons, we consider our approach a valuable addition to the IV methodological framework considered by epidemiologists.

We estimated the effect of physical activity on BMI using 25 years of data from an established prospective cohort study, the Coronary Artery Risk Development in Young Adults (CARDIA) Study, with clinic-assessed, time-varying measures of weight and height, weight-related health behaviors, and other relevant variables, as well as an extensive set of community-level variables hypothesized to influence BMI-related risk factors but not BMI directly, which served as IVs (11, 23). These data provided the necessary components for an IV-SEM with which to estimate the effect of physical activity on BMI while accounting for other health behaviors, including diet, smoking, alcohol consumption, and marital status, as well as residual confounding and differential measurement error. As a secondary aim, we examined the extent to which our estimates differed from those obtained via other modeling approaches. We therefore compared estimates from our model with those from standard ordinary least squares (OLS) with robust variance, longitudinal random-effects and fixed-effects regression models, and single-equation-estimation IV-SEM.


CARDIA sample

CARDIA is a multicenter, longitudinal study of cardiometabolic risk factors (24). The study began in 1985–1986 with 5,115 black and white adults aged 18–30 years sampled from 4 US metropolitan areas (Birmingham, Alabama; Chicago, Illinois; Minneapolis, Minnesota; and Oakland, California). Participant home addresses were geocoded at baseline (year 0) and at years 7, 10, 15, 20, and 25 of follow-up (respective retention in survivors: 81%, 79%, 74%, 72%, and 72%). The CARDIA protocol was approved by the institutional review board at each field center, and every participant provided informed written consent.

Individual-level measures

In standardized surveys, participants provided extensive demographic and socioeconomic information, including age, sex, race, education, income, marital status, the ages of any children, and, for women, pregnancy status. Participants reported their engagement in 13 physical activities, including walking, running, and cycling, from which activity-specific and total activity intensity scores were created (25). Fast-food consumption, smoking status, and alcohol consumption were assessed in all years; an interviewer-administered diet history was included at years 0 (baseline), 7, and 20 (26). At each examination, height and weight were measured by trained study staff to the nearest 0.5 cm and 0.2 kg, respectively. BMI was calculated as weight (kg)/height (m)².
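As a small illustration of the BMI definition above, the following sketch converts a height recorded in centimeters and a weight in kilograms to BMI. The function name and the example values are hypothetical and are not taken from the CARDIA data.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0  # heights are recorded in cm; BMI uses meters
    return weight_kg / height_m ** 2


# Hypothetical participant: 70.0 kg and 175.0 cm.
example_bmi = bmi(70.0, 175.0)
print(round(example_bmi, 1))
```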

Neighborhood-level measures

Neighborhood measures temporally and geographically linked to participant home addresses served as instruments for identification of estimated effects of health behaviors on BMI. Briefly, neighborhood measures included the presence of food stores and restaurants, physical activity facilities, and parks; consumer price data; and features of the road network (details are provided in the Web Appendix). US Census data included population-level educational attainment and income for the census tract corresponding to participants’ home addresses at the time of the examination.

Statistical analysis

Figure 1 is an abbreviated causal diagram of our statistical model. We sought to estimate the effects of physical activity and other health behaviors (the vector HB) on BMI. We considered the potential for time-invariant and time-varying residual confounding and differential measurement error, µ, to cause bias in effect estimates, and we used a full-system-estimation IV-SEM to estimate the effects of health behaviors on BMI by using a set of IVs, V, to first identify variation in health behaviors. A complete list of model variables is shown in the Web Appendix and Web Table 1.

Figure 1.
Causal diagram for a full-system-estimation instrumental-variables simultaneous equations model created to examine physical activity in relation to body mass index (BMI). Time-varying BMI (BMIi,t) was predicted from time-varying physical activity and ...

Econometricians refer to the differential error that concerns us, µ, such as residual confounding, as “endogeneity due to unobserved heterogeneity,” where “unobserved heterogeneity” reflects individual heterogeneity in the outcome (here, BMI) that 1) is not explained by independent variables (predictors already included in the regression model) and 2) is correlated with independent variables. “Endogeneity” refers to variables that are determined within the model (i.e., are recipients of an inward-pointing arrow on the causal diagram); in contrast, “exogenous” variables are not determined by other variables in the system (i.e., have only outward-pointing arrows). Variables V and X in Figure 1 are exogenous. Both endogenous and exogenous variables influence other system variables. Here, endogenous health behaviors influence BMI; exogenous IVs, V, influence health behaviors but not BMI directly. There may be many instances where formal endogeneity will not fit within our causal framework, such as reverse causation, whereby BMI influences health behaviors. We acknowledge that the exact sources of unobserved heterogeneity quantified by our model-based approach cannot be distinguished.

Previous publications provide more details of our approach (27–29). Additional details on system equations can be found in the Web Appendix. Briefly, in a series of first-stage regression equations, we used a set of IVs (and other exogenous variables, X (in Figure 1), distinguished from IVs by their possible direct influence on BMI) to predict physical activity and other health behaviors, after which we estimated the effect of health behaviors on BMI, accounting for endogeneity due to unobserved heterogeneity (reflecting residual confounding and differential measurement error).

In addition to the estimated effect of total physical activity on BMI, we estimated the effects of specific types of physical activities most relevant for our neighborhood environment variables: 1) walking (controlling for all 12 nonwalking physical activities) and 2) walking, running, and cycling (controlling for all 10 other physical activities). Physical activity component variables were obtained by summing intensity scores over the relevant activities. We hypothesized that walking, along with running and cycling, would be most affected by neighborhood features—such as the road network—as opposed to other activities, such as swimming, which may require dedicated facilities.
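The construction of the physical activity component variables can be sketched as follows. This is only an illustration of summing intensity scores over a chosen subset of activities and over the remainder; the activity names and scores are hypothetical, and the actual CARDIA scoring algorithm (25) is not reproduced here.

```python
from typing import Dict, Iterable, Tuple


def split_activity_score(scores: Dict[str, float],
                         focus: Iterable[str]) -> Tuple[float, float]:
    """Return (sum over the focus activities, sum over all other activities)."""
    focus_set = set(focus)
    missing = focus_set - scores.keys()
    if missing:
        raise KeyError(f"activities not found in scores: {sorted(missing)}")
    focus_total = sum(v for k, v in scores.items() if k in focus_set)
    other_total = sum(v for k, v in scores.items() if k not in focus_set)
    return focus_total, other_total


# Hypothetical intensity scores for one participant at one examination.
scores = {"walking": 40.0, "running": 60.0, "cycling": 20.0, "swimming": 30.0}

walk, non_walk = split_activity_score(scores, ["walking"])
wrc, other = split_activity_score(scores, ["walking", "running", "cycling"])
```

Under this sketch, the total score is always the sum of the two components, so each specification partitions the same total differently.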

We modeled other BMI risk factors, including fast-food consumption, smoking, and alcohol consumption, as well as marital status and, among women, childbearing, that we considered endogenous variables both associated with physical activity and predictive of BMI. Our full-system-estimation IV-SEM allowed us to account for confounding due to unobserved selectivity of individuals engaging in these activities, as well as residual confounding and differential measurement error in endogenous covariates themselves. Models were fitted separately for men and women to allow us to control for the selectivity of women who were pregnant at the time of each examination and control for childbearing history.

We considered, but did not explicitly control for, overall diet quality in our final models because diet quality (modeled as a score reflecting comprehensive food consumption) was not predictive of BMI in multivariable-adjusting models. Further, diet was assessed in only 3 examination periods, and this would have limited the analysis to a fraction of the study data. Any unobserved heterogeneity due to diet will be controlled in IV-SEM models and, to the extent that diet is stable within-person, in fixed-effects regression.

The general specification for the set of first-stage equations is shown below.
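The displayed equation did not survive in this copy. A minimal reconstruction, assuming only the terms named in the paragraph that follows (instruments V with coefficients γ, other exogenous variables X with coefficients δ, the unobserved-heterogeneity component μ, and random error ϵ), might read as below. The subscripts i (participant) and t (examination), the superscript k indexing the K health behaviors, and the use of a single μ term to stand for both the time-invariant and time-varying components are notational assumptions; the authors' exact form is given in their Web Appendix.

```latex
HB^{k}_{i,t} = \gamma^{k} V_{i,t} + \delta^{k} X_{i,t} + \mu^{k}_{i,t} + \epsilon^{k}_{i,t},
\qquad k = 1, \dots, K
```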


Health behavior (HB) was modeled as a function of strictly exogenous explanatory variables, not influenced by other components within our model system, including a vector of IVs (V)—which influence BMI through health behaviors, but do not directly influence BMI—and a vector of other exogenous variables (X)—which may influence health behaviors as well as BMI. Health behaviors, exogenous variables, and random error (ϵ) were time-varying. Estimated effects of exogenous variables, V and X, are reflected by the regression coefficients γ and δ, respectively, for the difference in health behavior per unit change in V or X. In addition, our model included time-invariant and time-varying error components reflective of residual confounding and differential measurement error (μ). Each health behavior (endogenous BMI predictor) was modeled in a set of first-stage equations, with the full set repeated for each of the 3 physical activity specifications among men and women. Each first-stage equation was “overidentified,” meaning the number of instruments exceeded the number of health behaviors, and comprised the same set of independent variables (11).

The final equation in the model estimated BMI from physical activity and other health behaviors (HB) and a set of exogenous covariates (X) that were considered to be associated with BMI (e.g., age, race, education, and income) beyond that accounted for by our set of modeled health behaviors. Note that this is the same set of X exogenous variables included in the first-stage equations.
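The displayed BMI equation is also missing from this copy. A sketch consistent with the surrounding description, in which β multiplies the health behaviors and γ multiplies the exogenous covariates X, is given below. The superscript BMI on γ is added here only to distinguish it from the first-stage coefficient on V, and the error terms μ and ϵ are assumed to parallel those of the first-stage equations; neither choice is confirmed by the source.

```latex
BMI_{i,t} = \beta \, HB_{i,t} + \gamma^{BMI} X_{i,t} + \mu^{BMI}_{i,t} + \epsilon^{BMI}_{i,t}
```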


Estimated effects of endogenous health behaviors (HB) and exogenous (X) independent variables are reflected by the regression coefficients β and γ, respectively, for the difference in BMI per unit change in the independent variable. The complete set of system equations is shown in the Web Appendix.

As with all IV approaches, our method relies on having a set of IVs that are 1) substantively relevant, 2) predictive of endogenous health behaviors, 3) exogenous (not influenced by other system variables), and 4) not directly related to BMI (and can be excluded from the BMI model). We based our causal model on substantive considerations. In particular, the selection of IVs was guided by published findings on the role of the built environment in health behaviors (30–34). We included an assessment of the strength of our full set of IVs to predict endogenous health behaviors with F tests of first-stage models. The set of IVs, noninstrument exogenous variables, and the endogenous health behaviors included in the model are further described in the Web material (Web Appendix and Web Table 1).

We compared our full-system-estimation IV-SEM with several other model-based approaches available in Stata software (StataCorp LP, College Station, Texas; see Table 1), including OLS regression with robust variance estimation (Stata's -regress- command), random-effects regression (-xtreg- command, “re” option), fixed-effects regression (-xtreg-, “fe” option), and single-equation-estimation IV-SEM (-ivregress-, “gmm” option). OLS and random-effects regression assume that there is no residual confounding or differential measurement error, although the random effects allow for individual variability in random errors. In contrast, differential unobserved heterogeneity is controlled in fixed-effects regression (time-invariant sources only) and single-equation-estimation IV-SEM (both time-invariant and time-varying sources). Fixed-effects regression can suffer from low statistical power due to loss of degrees of freedom (individual-level differencing) and a lack of within-person variability in the exposure of interest, and it does not allow estimation of time-invariant predictors.
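The contrast between pooled OLS and fixed-effects regression described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: the data are simulated, the coefficients (0.8, 2.0, and a true effect of -0.5) are arbitrary, and the estimators are hand-coded with NumPy rather than run through Stata's -regress- or -xtreg-.

```python
import numpy as np


def simulate_panel(n_people=2000, n_waves=6, beta=-0.5, seed=0):
    """Simulate an exposure and outcome that share an unobserved, time-invariant trait."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=(n_people, 1))                 # unobserved person-level trait
    x = 0.8 * mu + rng.normal(size=(n_people, n_waves))             # exposure depends on mu
    y = beta * x + 2.0 * mu + rng.normal(size=(n_people, n_waves))  # and so does the outcome
    return x, y


def pooled_ols_slope(x, y):
    """Slope from regressing y on x with an intercept, ignoring the panel structure."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())


def fixed_effects_slope(x, y):
    """Within estimator: subtract each person's own mean before regressing."""
    xw = x - x.mean(axis=1, keepdims=True)
    yw = y - y.mean(axis=1, keepdims=True)
    return float((xw * yw).sum() / (xw ** 2).sum())


x, y = simulate_panel()
ols_slope = pooled_ols_slope(x, y)
fe_slope = fixed_effects_slope(x, y)
print(f"pooled OLS: {ols_slope:.2f}, fixed effects: {fe_slope:.2f} (true effect: -0.50)")
```

Because the simulated trait is constant within person, person-level demeaning removes it and the within estimator should land near the true effect, whereas the pooled slope absorbs the trait's influence. A time-varying confounder would not be removed this way, which is the limitation of fixed effects noted in the text.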

Table 1.
Key Attributes That Differentiate Regression Approaches

Single-equation-estimation IV-SEM methods are less efficient than full-information IV-SEM methods. In addition, single-equation linear estimators (as in -ivregress-) are particularly susceptible to weak instruments; our method allows nonlinear estimation and has been shown to be robust in the presence of weak instruments (21, 22). We use 2 specifications of our full-system-estimation IV-SEM: 1) accounting for endogeneity due to time-invariant unobserved heterogeneity only and 2) accounting for time-varying unobserved heterogeneity as well. Our model is more flexible parametrically than other IV approaches, in terms of the availability of nonlinear functional forms for model equations and the use of the discrete factor method for modeling endogeneity due to unobserved heterogeneity. (The discrete factor method is further described in the Web Appendix.) In the present analysis, we assumed constant effect estimates over time and with respect to participant characteristics (no interaction).
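For readers unfamiliar with single-equation linear IV estimation, the following sketch hand-codes 2-stage least squares on simulated data. It is not Stata's -ivregress- GMM estimator and not the authors' full-system method; the function names, the data-generating coefficients, and the single simulated instrument are all assumptions made for illustration.

```python
import numpy as np


def two_stage_least_squares(y, endog, instruments):
    """2SLS coefficients (intercept first) for y on endog, using the given instruments."""
    n = y.shape[0]
    ones = np.ones((n, 1))
    z = np.hstack([ones, instruments])   # first-stage design matrix
    x = np.hstack([ones, endog])         # second-stage design matrix
    first_stage_coef, *_ = np.linalg.lstsq(z, x, rcond=None)
    x_hat = z @ first_stage_coef         # exposure predicted from the instruments
    coef, *_ = np.linalg.lstsq(x_hat, y, rcond=None)
    return coef


def simulate(instrument_strength, n=5000, beta=-0.5, seed=0):
    """Exposure x is confounded by u; z affects y only through x."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, 1))
    u = rng.normal(size=n)               # unobserved confounder
    x = instrument_strength * z[:, 0] + u + rng.normal(size=n)
    y = beta * x + 2.0 * u + rng.normal(size=n)
    return y, x.reshape(-1, 1), z


y, x, z = simulate(instrument_strength=1.0)
iv_slope = float(two_stage_least_squares(y, x, z)[1])
design = np.hstack([np.ones((len(y), 1)), x])
ols_slope = float(np.linalg.lstsq(design, y, rcond=None)[0][1])
print(f"2SLS: {iv_slope:.2f}, OLS: {ols_slope:.2f} (true effect: -0.50)")
```

With a strong simulated instrument, the 2SLS slope should fall close to the true effect while the OLS slope is pulled toward the confounder. Rerunning `simulate` with a very small `instrument_strength` and different seeds tends to give erratic 2SLS estimates, which is the weak-instrument sensitivity the text describes.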

We conducted analyses on 5,112 participants with data collected over 6 examination periods. Three participants were excluded from the original sample of 5,115 (1 dropped out of the study and 2 changed sex). Because of follow-up losses, there were 4,010 participants at year 7; 3,947 at year 10; 3,670 at year 15; 3,548 at year 20; and 3,497 at year 25 (23,858 observations in total). We tested the influence of loss to follow-up with sensitivity analysis using inverse probability weighting by examination participation; these models are not presented, as results did not differ from those of unadjusted models. There were complete data on community-level indicators and participant age, sex, and race. We used regression prediction models to fill in missing individual-level data (described in the Web Appendix and Web Table 2). We used an α level of 0.05 for statistical significance. We used Stata (version 13) and Fortran (Intel Fortran Compiler; Intel Corporation, Santa Clara, California) for all analyses.


Over the 25-year study period, smoking and fast-food consumption declined among both men and women, and engagement in physical activity declined among men, with the exception of walking, which remained stable (Table 2). Changes in physical activity were less clear among women.

Table 2.
Sex-Specific Descriptive Statistics for Study Participants Over the Course of the Study Period, CARDIA Study, 1985–1986 to 2010–2011

Table 3 presents results from sex-specific model F tests for each of the 3 physical activity specifications, reflecting the strength of the full set of IVs in predicting endogenous health behaviors. F test statistics were small and well below recommended thresholds (e.g., ≥10) (35) for considering the models strongly identified; that is, our models suffered from “weak” instruments. Even in the presence of weak instruments, P values for first-stage model F tests were statistically significant for all models except smoking among men.
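A conventional way to gauge instrument strength is the first-stage F statistic for the joint exclusion of the instruments. The sketch below computes that statistic for a linear first stage on simulated data. It is an assumption that this matches the form of the F tests reported in Table 3, and all variable names and coefficients are hypothetical.

```python
import numpy as np


def first_stage_f(hb, instruments, exog):
    """F statistic testing that all instrument coefficients are zero in the first stage."""
    n = hb.shape[0]
    restricted = np.hstack([np.ones((n, 1)), exog])   # model without the instruments
    full = np.hstack([restricted, instruments])       # model with the instruments

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, hb, rcond=None)
        resid = hb - design @ coef
        return float(resid @ resid)

    q = instruments.shape[1]                          # number of excluded instruments
    df_resid = n - full.shape[1]
    rss_full = rss(full)
    return ((rss(restricted) - rss_full) / q) / (rss_full / df_resid)


rng = np.random.default_rng(1)
n = 2000
exog = rng.normal(size=(n, 2))
instruments = rng.normal(size=(n, 3))
noise = rng.normal(size=n)
base = exog @ np.array([1.0, -1.0]) + noise

hb_strong = base + instruments @ np.array([0.5, 0.5, 0.5])   # instruments matter
hb_weak = base + instruments @ np.array([0.02, 0.0, 0.0])    # instruments barely matter

f_strong = first_stage_f(hb_strong, instruments, exog)
f_weak = first_stage_f(hb_weak, instruments, exog)
print(f"F (strong): {f_strong:.1f}, F (weak): {f_weak:.1f}")
```

In this sketch the strongly related instruments should yield an F statistic far above the threshold of 10 cited in the text, while the nearly irrelevant instruments yield a much smaller one.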

Table 3.
Fit Statistics for Each of the 3 Sex-Specific Physical Activity Specifications of the Instrumental-Variables Simultaneous Equations Model, CARDIA Study, 1985–1986 to 2010–2011

With the exception of generalized method of moments (GMM) IV-SEM, model estimates were surprisingly similar across the 6 modeling approaches (Table 4). The similarity of point estimates from approaches that correct (fixed effects and full-information maximum likelihood (FIML) IV-SEM) and do not correct (OLS and random effects) for endogeneity due to unobserved heterogeneity supports a lack of residual confounding or differential measurement error in multivariable-adjusted models, which is contrary to our expectation and unlikely to hold in general. Of the 2 FIML IV-SEMs, those that accounted for time-invariant unobserved heterogeneity only generally had a higher value for the log-likelihood function.

Table 4.
Regression Coefficients (β) for the Effects of Physical Activity and Other Health Behaviors on Body Mass Index, CARDIA Study, 1985–1986 to 2010–2011

Point estimates from the GMM IV-SEM were frequently substantively different from those of other models, as well as much less precise (Table 4). As an example, in both men and women, all models supported a negative association between smoking and BMI, with the exception of the GMM IV-SEM, which yielded a positive (albeit statistically nonsignificant) estimate. Current smokers had a lower BMI than nonsmokers (men: −0.88 (95% confidence interval (CI): −1.49, −0.27); women: −0.61 (95% CI: −1.18, −0.05)), based on the FIML IV-SEM accounting for time-invariant unobserved heterogeneity (10th column of Table 4). Among men, some GMM IV-SEM estimates were consistent with models that did not account for unobserved heterogeneity; for example, there was an apparent negative association between alcohol consumption and BMI based on OLS, random effects, and GMM IV-SEM, which was not supported in fixed-effects regression and FIML IV-SEM.

Aside from GMM IV-SEM, model estimates were generally consistent with expectation in both men and women (Table 4). Physical activity and smoking were negatively associated with BMI, while fast-food consumption and marriage were positively associated with BMI. Among women, but less consistently among men, alcohol consumption was negatively associated with BMI. Although we did not formally examine effect measure modification by sex, the variables alcohol consumption, physical activity, and fast-food consumption appeared to be more strongly predictive of BMI in women than in men. For example, among women, the BMI differences associated with a 100-unit increase in total physical activity were −0.18 (95% CI: −0.23, −0.12) and −0.27 (95% CI: −0.39, −0.16) in the FIML IV-SEM models accounting for, respectively, time-invariant unobserved heterogeneity and time-invariant and time-varying unobserved heterogeneity, as compared with −0.09 (95% CI: −0.16, −0.02) and −0.09 (95% CI: −0.17, −0.01), respectively, among men.


We have presented results from a semiparametric full-system-estimation approach to IV-SEM analysis (FIML IV-SEM) of BMI over 25 years of follow-up in the CARDIA cohort. Using a system of equations to predict BMI as a function of weight-related behaviors, we corrected for residual confounding or differential measurement error (endogeneity due to unobserved heterogeneity). Substantively, our findings were consistent with expectation, with BMI being negatively associated with physical activity and smoking and positively associated with fast-food consumption. In addition, marriage was positively associated with BMI, and alcohol consumption was negatively associated with BMI. Estimates from models that adjusted and did not adjust for unobserved heterogeneity were generally similar, indicating a relative lack of endogeneity after controlling for observed covariates, with the exception of OLS regression, which tended to overstate the significance of health behaviors in influencing BMI. Effect estimates from a single-equation-estimation approach to IV-SEM analysis (GMM IV-SEM) were notably different from other estimates, reflecting the challenge of using standard linear IV methods in the presence of weak instruments.

We hypothesized greater differences across modeling approaches, and in particular we did not anticipate marked similarity between estimates that did not account for residual confounding and differential measurement error (OLS and random effects) and those that did (fixed effects and FIML IV-SEM). The largest differences were observed for GMM IV-SEM regression estimates, which were also the least precise. The differences between GMM and FIML IV-SEM approaches are especially noteworthy, as GMM IV-SEM is available in standard statistical software and more accessible to researchers. Differences between GMM and FIML IV-SEM estimates probably reflect FIML's allowance of nonlinear estimators and our use of the discrete factor method (21, 22). GMM IV-SEM is a single-equation-estimation approach (fitting within the 2-stage least squares framework) and lacks the efficiency of full-system estimators, which not only account for unmeasured confounding and differential measurement error (unobserved heterogeneity) but use that information for more precise parameter estimation.

IV methods are known to be sensitive to weak identification. Our comparison illustrates the potential for severe bias of IV approaches in the absence of strong instruments. In contrast, the consistency of results from our full-system-estimation IV-SEM and other models, particularly fixed-effects regression, is illustrative of the robust nature of our approach in the presence of weak instruments. As was shown in previous work, our FIML IV-SEM has substantially stronger estimation performance in the presence of weak instruments (21, 22). Our results illustrate that, in the presence of weak instruments, standard IV approaches can be less preferable than a non-IV method, such as random- or fixed-effects regression.

In practice, several considerations are likely to drive the decision about which modeling approach to adopt, as outlined in Table 1. Assuming that unmeasured confounding or differential measurement error is considered a threat to validity, fixed-effects models or IV methods can be used to account for such nonrandom unobserved heterogeneity. Fixed-effects models are limited to studies with more than 1 observation period, relatively large samples, and estimation of effects for exposures that change sufficiently over the observation period; in the absence of one or more of these features, IV methods can be considered. IV approaches require that valid instruments can be identified. In the presence of strong instruments, IV methods are generally robust to many parametric assumptions; in the presence of weak instruments, more robust methods (such as the FIML IV-SEM) may be necessary. Furthermore, as in any modeling exercise, results may be more or less sensitive to specification assumptions relating to the linearity of outcome variables or multivariate normality of random effects, and approaches that limit these assumptions will be preferable.

Our FIML IV-SEM estimates that accounted only for time-invariant unobserved heterogeneity were generally similar to those that accounted for both time-invariant and time-varying unobserved heterogeneity. Although it is conceptually preferable to account for time-varying unobserved heterogeneity, for several reasons we prefer the models that accounted for time-invariant unobserved heterogeneity only. First, when estimates differed meaningfully (such as all other physical activity in the walking/running/biking model among men), the results from the model that accounted for time-invariant unobserved heterogeneity only were more consistent with substantive expectation (in the exception noted, physical activity was inversely associated with BMI). In addition, the models with time-invariant unobserved heterogeneity accommodated an equal or greater number of levels for unobserved heterogeneity, which we would expect to improve our control for endogeneity. Finally, results from the time-invariant-only models were generally more precise.

Substantively, our results confirm the importance of physical activity, smoking, and fast-food consumption in shaping body mass (36–38). The association between alcohol consumption and BMI has been equivocal in epidemiologic studies (39). In our heterogeneity-corrected analysis, alcohol consumption was negatively associated with BMI among women, as supported by prior work (40, 41), but not among men, suggesting possible sex differences. Our findings and those of others (42, 43) indicate that marriage is positively associated with BMI. The roles of alcohol consumption and marriage in BMI merit further study.

Although our method accounted for confounding due to omitted variables and differential measurement error, it may have residual bias due to residential selection. Community-level indicators served as IVs; however, the potential for informative residential selection is a recognized challenge in studies of neighborhood exposures and health (44, 45). We considered including community dummy variables as model covariates, but this proved infeasible due to the very large number of communities with small numbers of participants. Instead, we adjusted for baseline study center, hypothesizing that participant residence was independent of error components conditional on baseline study center. The similarity of estimates derived from our model and from fixed-effects regression indicates that selective migration due to unobserved time-invariant individual characteristics was not an appreciable source of bias.

The use of community-level indicators as IVs is common in the econometrics and epidemiologic literature but may have contributed to the weak identification. Only 1 community indicator, cigarette price, was directly related to cigarette smoking, which was the most poorly identified health behavior. IVs are assumed to be exogenous and, ideally, are strongly predictive of endogenous predictors but not directly predictive of the system outcome (BMI). Achieving this balance is the greatest challenge in IV analysis, and one's success in doing so cannot be directly tested. A major contribution of our method is the ability to obtain consistent estimates even in the presence of weak instruments, which allows us to focus on IVs less likely to be within the model system, such as community indicators. We note that weak identification may also reflect nondifferential measurement errors in our IVs.

Our model does not eliminate the possibility of bias, such as bias from model misspecification, but the robust findings across multiple regression approaches (aside from GMM IV-SEM) are supportive of causal estimates. In addition to our statistical modeling approach, a strength of our study was our extensive set of candidates for IVs from a comprehensive set of community-level data. Our finding of multiple levels of time-invariant and time-varying unobserved heterogeneity is a testament to our rich data set and highlights our ability to capture significant unobserved heterogeneity.

In conclusion, we used 25 years of CARDIA data to jointly model health behaviors and BMI. Our analysis yielded consistent estimates of the influences of diet, physical activity, smoking, and other variables on BMI, and it accounted for endogeneity due to unobserved heterogeneity stemming, we hypothesized, from residual confounding and differential measurement error. Our results confirm the importance of physical inactivity and fast-food consumption in relation to weight gain, and they support the development of policy and intervention efforts in these areas. In addition, our findings indicate that marital status and alcohol consumption may play underappreciated roles in body mass. The large differences between our full-system-estimation IV-SEM, as well as fixed- and random-effects regression, and the single-equation-estimation method that used standard statistical software (Stata) illustrate that, in the presence of weak instruments, a non-IV approach may be preferable. The potential for large bias when using simple IV models in the presence of weak instruments is added support for our method—for which estimates were robust across estimation approaches. This work contributes to the growing body of literature related to the use of IV methods in epidemiologic practice.


Author affiliations: Department of Nutrition, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina (Katie A. Meyer, Barry M. Popkin, Penny Gordon-Larsen); Department of Economics, College of Arts and Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina (David K. Guilkey); Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina (David K. Guilkey, Hsiao-Chuen Tien, Barry M. Popkin, Penny Gordon-Larsen); and Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, Massachusetts (Catarina I. Kiefe).

This work was funded by the National Heart, Lung, and Blood Institute (NHLBI) (grant R01HL104580). The Coronary Artery Risk Development in Young Adults (CARDIA) Study is supported by contracts HHSN268201300025C, HHSN268201300026C, HHSN268201300027C, HHSN268201300028C, HHSN268201300029C, and HHSN268200900041C from the NHLBI, the Intramural Research Program of the National Institute on Aging (NIA), and an intraagency agreement between the NIA and the NHLBI (grant AG0005). For general support, we are grateful to the Carolina Population Center, University of North Carolina at Chapel Hill (grant R24HD050924 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development), the Nutrition Obesity Research Center, University of North Carolina (grant P30DK56350 from the National Institute of Diabetes and Digestive and Kidney Diseases), and the UNC Center for Environmental Health and Susceptibility, University of North Carolina (grant P30ES010126 from the National Institute of Environmental Health Sciences).

The National Institutes of Health had no role in the design or conduct of the study; the collection, management, analysis, or interpretation of the data; or the preparation, review, or approval of the manuscript.

Conflict of interest: none declared.



Articles from American Journal of Epidemiology are provided here courtesy of Oxford University Press