Study Population and Data Sources
The CARDIA Study is a population-based prospective epidemiologic study of the determinants and evolution of cardiovascular risk factors among black and white young adults. At baseline (1985–6), 5,115 eligible subjects, aged 18–30 years, were enrolled with balance according to race (black, white), gender, education (≤ and >high school) and age (18–24 and 25–30 years) from four U.S. communities: Birmingham, Alabama; Chicago, Illinois; Minneapolis, Minnesota; and Oakland, California. Specific recruitment procedures are described elsewhere (Hughes, Cutter, Donahue, Friedman, Hulley, Hunkeler et al., 1987). Study data were collected under protocols approved by Institutional Review Boards at each study center and the University of North Carolina at Chapel Hill. Follow-up examinations conducted in 1987–1988 (Year 2), 1990–1991 (Year 5), 1992–1993 (Year 7), 1995–1996 (Year 10), and 2000–2001 (Year 15) had retention rates of 90%, 86%, 81%, 79%, and 74% of the surviving cohort, respectively.
Using a Geographic Information System, we linked time-varying, community-level U.S. census data to CARDIA respondent residential locations in exam years 0, 7, 10, and 15 from geocoded home addresses. Between years 0 and 7, 7 and 10, and 10 and 15, 48.2%, 68.8%, and 33.0% of participants, respectively, changed residential location.
Of the possible 20,460 observations for 5,115 baseline participants across 4 examinations, 4,400 observations were missing due to loss to follow-up (including mortality): 80%, 77%, and 72% of the initial participants were observed at years 7, 10, and 15, respectively. Of the remaining observations, we excluded those for women who were pregnant at the time of examination (n=114 observations) and those with missing PA (n=126 observations), neighborhood SES variables (n=86 observations), or covariate data (n=274 additional observations). Those lost to follow-up or with missing data were generally more likely to be black, male, younger, and of lower baseline education (p<0.05). However, attrition (except at year 7, p=0.02) and missing data were unrelated to baseline PA. Moreover, to the extent that attrition and missing data are related to unobserved fixed characteristics of the individuals, our fixed effects models may mitigate selection bias. The final analytical sample totaled 15,460 observations for 4,179 individuals.
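The sample accounting above can be checked with simple arithmetic. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Sample-size accounting using only the counts reported in the text.
participants = 5115
exams = 4  # exam years 0, 7, 10, and 15

possible = participants * exams  # all person-by-exam observations
lost_to_follow_up = 4400         # includes mortality

# Exclusions applied to the observations that remained.
exclusions = {
    "pregnant at examination": 114,
    "missing PA": 126,
    "missing neighborhood SES": 86,
    "missing covariates": 274,
}

remaining = possible - lost_to_follow_up
analytic_sample = remaining - sum(exclusions.values())

print(possible, remaining, analytic_sample)
```

The final value matches the 15,460 observations reported for the analytical sample.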
Neighborhood socioeconomic measures
Several commonly used neighborhood socioeconomic measures were approximately time-matched to each examination period (CARDIA year, Census: Year 0, 1980; Years 7 and 10, 1990; Year 15, 2000). Census tracts were used to define neighborhoods because their use is consistent with prior research, block groups were not universally implemented until the 1990 census, and counties were theorized to be too large to capture the neighborhood environment.
Measures of socioeconomic disadvantage included percent of persons with income less than 150% of the federal poverty level (Krieger, Zierler, Hogan, Waterman, Chen, Lemieux et al., 2003; U.S. Census Bureau, 2009) and percent of persons aged 25 years and over with less than a high school education. Because nearly the entire study population (>96%) resided in metropolitan areas, we used the 150% poverty cutpoint to account for the higher cost of living in urban areas; results using the 100% poverty cutpoint were similar but slightly weaker. Other SES measures included percent of persons ≥25 years with college degrees and median household income, which, for comparability across exam periods, was adjusted to year-2000 U.S. dollars using the Consumer Price Index.
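A minimal sketch of a Consumer Price Index adjustment to year-2000 dollars follows, assuming the standard ratio method (amount multiplied by the target-year index over the source-year index). The function name and the index values are hypothetical placeholders, not actual CPI figures or the authors' code.

```python
def to_year_2000_dollars(amount, cpi_source_year, cpi_2000):
    """Rescale a dollar amount by the ratio of two CPI values."""
    if cpi_source_year <= 0 or cpi_2000 <= 0:
        raise ValueError("CPI values must be positive")
    return amount * cpi_2000 / cpi_source_year


# Placeholder index values for illustration only (not real CPI data):
# prices doubled between the source year and 2000.
adjusted = to_year_2000_dollars(20000.0, cpi_source_year=80.0, cpi_2000=160.0)
print(adjusted)  # 40000.0
```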
Exploratory factor analysis was used to summarize neighborhood SES exposures and indicated that the four measures represented a single construct (factor) with similar factor loadings across exam years (electronic appendix 1, Table A1.1, available with the online version of paper). Therefore, factor analysis was performed in data pooled across years. Higher factor scores represent higher neighborhood deprivation, indicated by higher neighborhood poverty and proportion with <high school education, and by lower median household income and proportion with a college education. Each respondent’s score on this factor was used as a composite neighborhood deprivation measure.
Outcome: physical activity index
At each examination, frequency of participation in 13 categories of moderate and vigorous recreational sports, exercise, leisure, and occupational activities (electronic appendix 1, Table A1.2, available with the online version of paper) over the previous 12 months was ascertained by an interviewer-administered questionnaire designed for CARDIA. As described elsewhere (Jacobs Jr., Hahn, Haskell, Pirie, & Sidney, 1989), PA scores were calculated in exercise units based on frequency and intensity of each activity. Reliability and validity of the instrument are comparable to those of other activity questionnaires (Jacobs Jr., Ainsworth, Hartman, & Leon, 1993; Jacobs Jr., Hahn, Haskell, Pirie, & Sidney, 1989). We excluded occupational and household PA from our PA score because they were not theorized to be influenced by the neighborhood environment.
Individual-level covariates
Individual-level baseline characteristics included age (mean centered), race (white, black), and study center (Birmingham, Chicago, Minneapolis, Oakland). Education at Year 7, after most individuals attained their highest education level, was examined as a time invariant variable; Year 0 education was used if Year 7 education was missing. Time-varying individual-level characteristics included income, marital status (married, not married), and children or stepchildren 18 years or younger living in the household (any, none). Income was examined as a categorical variable (approximate tertiles: <$25,000, $25,000–49,900, >$49,900) because of a non-linear relationship with PA. Income was not collected in year 0 or 2, so the closest measurement (year 5) was analyzed. To avoid over-adjustment and induction of selection bias, we did not control for BMI, a theorized outcome of PA.
Statistical analysis
Effects of neighborhood deprivation on PA score throughout young to middle adulthood were estimated in a series of longitudinal random and fixed effects linear models. Because they condition on the individual, fixed effects models account for time-invariant unmeasured variables (e.g., motivation to exercise that remains constant over time) which may be related to both PA and neighborhood deprivation (Boone-Heinonen et al.). Because these models analyze within-person variation observed in repeated measures over time, each individual in essence serves as his/her own control. In contrast, random effects models (random person-level intercept) (Rabe-Hesketh & Skrondal, 2008) analyze variation both within and between individuals; they do not control for possible correlation between observed and unmeasured characteristics and are therefore most comparable to cross-sectional associations reported in prior research. Random slopes were not estimated in order to maintain comparability with fixed effects estimates. See electronic appendix 2 (available with the online version of paper) for a detailed discussion of random and fixed effects models. All models were fit using the Stata 10.1 xtreg command, with the “fe” option for fixed effects models (StataCorp, 2005). The Hausman specification test was used to formally compare fixed and random effects estimates.
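To make the "own control" idea concrete, the sketch below computes a fixed effects (within-person) slope for a single exposure by subtracting each person's mean from their repeated measures before regressing. It is a toy illustration on made-up data, not the Stata xtreg code used in the analysis; the function name and values are hypothetical.

```python
from collections import defaultdict


def within_estimator(person_ids, x, y):
    """Slope from regressing person-demeaned y on person-demeaned x."""
    groups = defaultdict(list)
    for pid, xi, yi in zip(person_ids, x, y):
        groups[pid].append((xi, yi))

    sxy = 0.0  # sum of within-person cross-products
    sxx = 0.0  # sum of within-person squared deviations in x
    for obs in groups.values():
        x_bar = sum(xi for xi, _ in obs) / len(obs)
        y_bar = sum(yi for _, yi in obs) / len(obs)
        for xi, yi in obs:
            sxy += (xi - x_bar) * (yi - y_bar)
            sxx += (xi - x_bar) ** 2
    if sxx == 0:
        raise ValueError("no within-person variation in x")
    return sxy / sxx


# Made-up data: each person has their own intercept (10 and 4),
# and y falls by 0.5 for every 1-unit rise in x within a person.
ids = [1, 1, 1, 2, 2, 2]
x = [0.0, 1.0, 2.0, 1.0, 2.0, 3.0]
y = [10.0, 9.5, 9.0, 3.5, 3.0, 2.5]

slope = within_estimator(ids, x, y)
print(slope)
```

Demeaning removes the person-specific intercepts, so any time-invariant characteristic, measured or not, drops out of the estimate. This is also why coefficients for time-invariant covariates cannot be estimated in such a model.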
In race-stratified random and fixed effects models, PA scores were modeled as a function of three cumulative sets of confounders that were theorized to influence residential selection: Model 1 included concurrent neighborhood deprivation, age, and study center; Model 2 added individual-level education and time-varying income; and Model 3 further added time-varying marital status and children in household.
The built environment – such as PA facilities (Diez Roux, Evenson, McGinn, Brown, Moore, Brines et al., 2007; Gordon-Larsen et al., 2006), parks (Bedimo-Rung, Mowen, & Cohen, 2005), and pedestrian infrastructure (Krizek & Johnson, 2006) – and crime (Foster & Giles-Corti, 2008) were theorized to mediate the relationship between neighborhood deprivation and PA and thus were not included in our models. Concurrent (versus time-lagged) neighborhood deprivation was examined because the neighborhood socioeconomic environment was theorized to represent relatively immediate environmental PA influences. Coefficients for time-invariant individual-level variables (study center, education, sex) were estimated in random effects models but cannot be estimated in fixed effects models.
PA scores were natural log-transformed to address skewness, so model coefficients (multiplied by 100) were interpreted as the approximate percent change in PA score expected from a 1-unit change in the corresponding independent variable; analysis of the continuous outcome variable allowed examination of changes across the full distribution of PA. To examine non-linear relationships between neighborhood deprivation and PA (assessed graphically and through testing of higher order terms), neighborhood deprivation was modeled as quartiles (in observations pooled over time). Quartiles were race-specific because, due to residential patterning by race in the U.S. (Sampson & Sharkey, 2008), there was limited overlap in neighborhood deprivation between whites and blacks. Sex interactions with each independent variable were tested using backward elimination; for comparability, interaction terms were retained if significant (likelihood ratio test, p<0.10) in Models 1, 2, or 3 in random or fixed effects models. Time interactions with neighborhood deprivation in Model 1 (using baseline instead of time-varying age to avoid collinearity) were not significant and thus were excluded; time main effects were not included because the sequencing of observations was not important for our study objective.
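As a brief illustration of the percent-change interpretation of a log-transformed outcome, the sketch below converts a coefficient on ln(PA score) into the exact implied percent change, 100 × (exp(β) − 1), alongside the common approximation 100 × β. The coefficient values are hypothetical and are not results from this study.

```python
import math


def percent_change(beta):
    """Exact percent change in the outcome implied by a coefficient on ln(outcome)."""
    return 100.0 * (math.exp(beta) - 1.0)


# Hypothetical coefficients, chosen only to show how the exact value
# and the 100 * beta approximation diverge as the magnitude grows.
for beta in (-0.05, -0.20):
    exact = percent_change(beta)
    approx = 100.0 * beta
    print(f"beta={beta:+.2f}  exact={exact:.1f}%  approx={approx:.1f}%")
```

For small coefficients the two values are close, so reading a coefficient directly as a percent change is a reasonable shorthand; the gap widens for larger coefficients.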
Because CARDIA respondents lived in different census tracts over time, our data were not nested across three levels (e.g., multiple people per census tract, multiple time periods per person) as required for multi-level models. Therefore, neighborhood deprivation was treated as an individual-level exposure. Our data were sparse (few individuals on average within census tracts) and unbalanced (variable numbers of individuals), with the following number of individuals per census tract, by study center [mean (range)]: Birmingham [10.1 (1, 46)], Chicago [6.3 (1, 47)], Minneapolis [6.3 (1, 173)], Oakland [5.1 (1, 28)] in Year 0, declining to [2.7 (1, 17)], [1.4 (1, 13)], [1.8 (1, 27)] and [1.6 (1, 11)] by Year 15, respectively. Intraclass correlations were relatively small (0.08 in pooled data).