We will use data from the Coronary Artery Risk Development in Young Adults (CARDIA) Study, a longitudinal study of the antecedents and risk factors for cardiovascular disease in an ethnicity-, age-, and sex-balanced cohort of 5,115 black and white young adults aged 18–30 years at baseline (1985). Using a geographic information system (GIS), we linked time-varying respondent residential addresses from four CARDIA study years (1985, 1992, 1995, and 2001) with contemporaneous data on environmental factors derived from a series of federal and commercial databases. Recognizing the limitations of existing methods for adjusting for residential selection, we will use a behavioral model that focuses on the sequencing of career and residential choice as a major component, given the changes central to this lifecycle stage.
In our models, physical activity will be treated as an endogenous factor in this decision-making process (see the review by Bartik & Smith, 1987). We acknowledge the possible correlation between built environment variables and the error term that may arise through endogenous sorting of households across communities. We assume that individuals choose their place of residence by comparing the indirect utility that they would receive from alternative locations; the indirect utility will be partly a function of the neighborhood environment “prices,” such as recreational facilities, interconnected street networks, and cost of living. We will develop a series of equations for location choice, for choice variables relevant to young and middle-aged adults such as selection of career and marriage, and for physical activity. We assume that the error terms across these equations are correlated with each other. Thus, standard estimation methods that treat each equation separately will yield biased and inconsistent parameter estimates.
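To illustrate why correlated error terms matter, the following is a minimal simulation sketch rather than part of the proposed analysis. It assumes a single unobserved taste for activity (`mu`) that enters both a binary location choice and a continuous physical activity outcome. All variable names, coefficient values, and the exogenous shifter `z` are hypothetical and chosen only for illustration.

```python
import numpy as np


def naive_environment_effect(n=50_000, true_effect=0.5, sorting=0.8, seed=0):
    """Simulate endogenous sorting and return the naive OLS estimate.

    Illustrative assumptions (not taken from the study):
      mu      unobserved taste for activity, shared by both equations
      z       exogenous factor that shifts location choice only
      sorting strength with which mu enters the location choice
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n)
    z = rng.normal(size=n)

    # Location choice: live in an activity-friendly neighborhood (d = 1)
    # when the latent utility is positive.
    latent_utility = 0.8 * z + sorting * mu + rng.normal(size=n)
    d = (latent_utility > 0).astype(float)

    # Physical activity depends on the neighborhood and on the same mu.
    y = 1.0 + true_effect * d + mu + rng.normal(size=n)

    # Single-equation OLS of activity on neighborhood type, ignoring mu.
    X = np.column_stack([np.ones(n), d])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta[1]


if __name__ == "__main__":
    print("true effect:            0.5")
    print("naive OLS, sorting on:  ", round(naive_environment_effect(sorting=0.8), 3))
    print("naive OLS, sorting off: ", round(naive_environment_effect(sorting=0.0), 3))
```

Under these assumptions, the single-equation estimate is close to the true effect only when `mu` does not enter the location choice. When it does, the estimate absorbs part of the unobserved taste for activity, which is the bias that joint estimation is meant to address.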
Our solution to the estimation problem is joint estimation of the entire system by full information maximum likelihood (FIML), similar to the econometric method developed by Bhat and Guo (2007) in their study of the relationship between household residential choice and auto ownership levels. Distinguishing features of our model system include its incorporation of longitudinal data and simultaneous estimation of physical activity, diet, and downstream health outcomes including obesity and clinical cardiovascular disease risk factor measures, whereas Bhat and Guo focused on the cross-sectional relationship between land use and one transportation measure. Additionally, FIML methods typically require assumptions about the distribution of the unobservables in the equations, with multivariate normality being a common assumption. Bhat and Guo’s method accommodates multinomial ordered response data, and we plan to use a semi-parametric method that provides more flexibility by not requiring the assumption of a specific distribution. This discrete factor method is a variant of the Heckman and Singer method (Heckman & Singer, 1984) and has been shown to work well in models such as these (Mroz, 1999; Mroz & Guilkey, 1995). In particular, we will approximate the distribution of the unobserved variables associated with the respondent’s original residential location as well as time-varying unobserved individual-level characteristics; we will specify a discrete distribution whose parameters will be estimated jointly with the coefficients of interest. The method has been used with similar modeling problems (Blau, 1994; Guilkey & Riphahn, 1998).
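The core idea of the discrete factor approach can be sketched in a few lines. The example below is a deliberately simplified, cross-sectional, two-equation illustration (a logit for a binary location choice and a normal model for a continuous activity outcome), not the longitudinal system described here. The function name, parameter names, and functional forms are assumptions made for illustration. The unobserved factor takes one of K mass-point values, each with an estimated probability, and each individual's likelihood contribution is the probability-weighted sum over those points.

```python
import numpy as np


def discrete_factor_loglik(d, y, z, points, logits, a, b, lam_d, lam_y, sigma):
    """Joint log-likelihood with a discrete approximation to the
    distribution of a shared unobserved factor (illustrative sketch).

    d, y, z       binary location indicator, continuous activity outcome,
                  and an exogenous location shifter, each of shape (n,)
    points        locations of the K mass points
    logits        unnormalized log-probabilities of the K mass points
    a, b          (intercept, slope) pairs for the two equations
    lam_d, lam_y  factor loadings in the location and activity equations
    sigma         residual standard deviation of the activity equation
    """
    d, y, z = (np.asarray(v, dtype=float) for v in (d, y, z))
    points = np.asarray(points, dtype=float)
    logits = np.asarray(logits, dtype=float)

    # Softmax keeps the mass-point probabilities positive and summing to one.
    w = np.exp(logits - logits.max())
    probs = w / w.sum()

    # Logit location equation evaluated at every mass point: shape (n, K).
    index = a[0] + a[1] * z[:, None] + lam_d * points[None, :]
    log_p_d = d[:, None] * index - np.logaddexp(0.0, index)

    # Normal activity equation evaluated at every mass point: shape (n, K).
    resid = y[:, None] - (b[0] + b[1] * d[:, None] + lam_y * points[None, :])
    log_f_y = -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * (resid / sigma) ** 2

    # Mix over mass points with a numerically stable log-sum-exp.
    joint = np.log(probs)[None, :] + log_p_d + log_f_y
    m = joint.max(axis=1, keepdims=True)
    log_li = m[:, 0] + np.log(np.exp(joint - m).sum(axis=1))
    return log_li.sum()
```

In an actual application this function would be passed to a numerical optimizer so that the mass points, their probabilities, the factor loadings, and the structural coefficients are estimated together. With a single mass point at zero, the expression reduces to the sum of two independently estimated equations, which makes clear that the additional mass points are what allow the error terms to be correlated across equations.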
Of course, the use of FIML methods requires statistical identification of the residential location (and career and marriage choice) and physical activity equations (e.g., using family background variables such as state-level employment-related variables and other similar state, county, and Standard Metropolitan Statistical Area measures). Exogenous variables unique to the career choice equation may be state-level employment-related variables. Exogenous variables unique to the residential choice equations may be other characteristics of the locations (e.g., access to larger cities, cultural activities, cost of living), which may not directly affect physical activity. The importance of obtaining more than just “technical” identification has been well documented in recent literature (Bound, Jaeger, & Baker, 1995). It is important that the identifying variables have a significant impact on the dependent variables, and the explanatory power of the set of identifying variables is particularly crucial. Our structural model will employ the tests of Bound et al. to gauge the strength of identification. Because the model is over-identified, we will also be able to test some identifying restrictions (e.g., (Davidson & MacKinnon, 1993