Eval Rev. Author manuscript; available in PMC 2010 August 1.
Published in final edited form as:
PMCID: PMC2714913

Design and Analysis of the Community Youth Development Study Longitudinal Cohort Sample


Communities That Care (CTC) is a prevention system designed to reduce adolescent substance use and delinquency through the selection of effective preventive interventions tailored to a community’s specific profile of risk and protection. A community-randomized trial of CTC, the Community Youth Development Study, is currently being conducted in 24 communities across the United States. This paper describes the rationale, multilevel analyses, and baseline comparability for the study’s longitudinal cohort design. The cohort sample consists of 4,407 fifth- and sixth-grade students recruited in 2004 and 2005, and surveyed annually through ninth grade. Results of mixed-model ANOVAs indicated that students in CTC and control communities exhibited no significant differences (ps > .05) in baseline levels of student outcomes.

Keywords: Community Youth Development Study, Communities That Care, longitudinal design, group randomized trial, prevention


Adolescent health and behavior problems, such as underage drinking, illicit drug use, and delinquency, continue to pose threats to the healthy development of youth populations (Johnston et al. 2008; National Institute on Alcohol Abuse and Alcoholism 2003; Snyder, Howard, and Sickmund 2006). Consequently, the development and implementation of service delivery systems for preventing these problems in communities has become a priority for prevention researchers and practitioners (Chinman et al. 2005; Spoth and Greenberg 2005; Wandersman 2003; Weissberg, Kumpfer, and Seligman 2003). Despite increased knowledge of risk and protective factors for adolescent health and behavior problems (Hawkins, Catalano, and Miller 1992), many communities continue to select and implement prevention programs and strategies that show little or no evidence of effectiveness (Ennett et al. 2003; Gottfredson and Gottfredson 2002; Hallfors and Godette 2002; Wandersman and Florin 2003). A current challenge for prevention science, therefore, is to mobilize communities to link what is known about community risk and protection with preventive interventions that have proven effective (Mitchell, Florin, and Stevenson 2002; Roussos and Fawcett 2000; Wandersman 2001).

Communities That Care (CTC) (Hawkins and Catalano 2002; Hawkins, Catalano, and Arthur 2002) is a manualized prevention service delivery system that mobilizes communities to adopt a science-based framework that focuses on empirically identified risk and protective factors to prevent adolescent health and behavior problems. Because differences across communities in levels of risk and protection are related to differences in community levels of substance use and delinquency (Hawkins, Van Horn, and Arthur 2004), the theory of change underlying CTC posits that selection of effective prevention programs and strategies should be tailored to the specific epidemiology of risk and protection found in communities. This is operationalized in communities via repeated epidemiologic assessments of adolescent risk and protective factors used for strategic prevention planning and ongoing evaluation of communities’ prevention service delivery systems (Arthur and Blitz 2000). Additionally, social development strategies (Catalano and Hawkins 1996; Hawkins and Weis 1985) are incorporated in CTC training activities and technical assistance to provide specific guidelines for implementation (Fagan, Hawkins, and Catalano 2008; Quinby et al. 2008). As a result, through the implementation of CTC in communities, use of these prevention programs is hypothesized to prevent the onset and to decrease the prevalence of adolescent health and behavior problems.

The Community Youth Development Study (CYDS) (Hawkins et al. 2008) is a community-randomized trial of CTC currently being conducted in 24 communities across the United States. The study is a collaborative effort between the Social Development Research Group at the University of Washington and the Illinois Division of Community Health and Prevention; the Kansas Department of Prevention and Early Intervention; the state substance abuse agencies of Colorado, Maine, Oregon, Utah, and Washington; and 24 communities in these states. Four distinct aims are addressed in the CYDS. In Aim 1, the study seeks to examine the efficacy of the CTC system in impacting levels and trends in risk and protection and reducing the incidence and prevalence of adolescent substance use and delinquency in students and communities. Analyses planned to address this aim utilize two distinct, yet complementary, longitudinal designs: an extended nested cross-sectional design (Murray 1998) intended to examine change within the entire population of students in communities, and an extended nested cohort design (Murray 1998) intended to examine change within a specific panel of students. The second aim of the CYDS assesses levels of collaboration among various sectors within communities (e.g., civic, business, youth recreation, religious) as a system-level mediator between adoption of the science-based CTC approach to prevention programming and changes in levels of risk and protection in communities. Aim 3 examines the use of epidemiologic data to prioritize community-specific risk and protective factors and the selection of appropriate and effective prevention programs that address those factors. Aim 4 examines whether the use of these selected prevention programs is related to changes in community levels of risk, protection, substance use, and delinquency.

The purpose of this paper is to describe the design of, and planned analyses for, the CYDS longitudinal cohort sample. Specifically, we describe the strategy used to recruit the student panel and planned statistical analyses, including latent growth and discrete-time survival models. We also test the equivalence of intervention and control communities at baseline on measures of risk, protection, substance use, and delinquency.



The overall design of the CYDS includes multiple assessments of student outcomes, risk and protective factors, and prevention service system functioning. In addition to the cohort design described in this paper, other design elements of the CYDS include: (a) a cross-sectional design assessing community levels and trends in adolescent substance use, delinquency, risk, and protection using repeated anonymous biennial population-based surveys of 6th-, 8th-, 10th-, and 12th-grade students in both intervention and control communities (Murray et al. 2006); (b) repeated pre-post measures of community-level indicators of prevention service planning and delivery (e.g., prevention collaboration (Brown et al. 2008)) by representative samples of community leaders (Arthur, Glaser, and Hawkins 2005); (c) repeated pre-post documentation of community prevention resources and program exposure via surveys with prevention service providers (e.g., teachers, principals, and prevention program directors); (d) annual assessments of CTC functioning via surveys of CTC board members; and (e) ongoing assessment of prevention program implementation fidelity (Fagan et al. 2008).


The CYDS cohort design provides an assessment of the efficacy of the CTC prevention system at reducing adolescent substance use and delinquency outcomes that is separate from the cross-sectional design. The cohort design allows for examination of within-individual change in a panel of students whom the implemented prevention programs in CTC communities were likely to reach given CYDS’s focus on students in Grades 5 through 9. This design component reduces the susceptibility of the overall CYDS design to random heterogeneity due to secular changes in student populations. Secular changes have been identified as potentially limiting the ability to identify intervention effects in community trials (Bauman, Suchindran, and Murray 1999).

The cohort design used in the CYDS calls for annual repeated measurements of a panel of students who were in the fifth grade during the 2003–2004 school year. Wave 1 data collection took place in the spring of 2004 and represents a pre-intervention baseline assessment of students’ substance use, delinquency, levels of risk and protective factors, and demographic characteristics. Following the CTC training and implementation schedule, prevention programs and strategies began to be implemented in intervention communities in the summer of 2004. Wave 2 of data collection was conducted in the spring of 2005 and included an effort to recruit additional students in the cohort who were not recruited in the Wave 1 administration.


Communities in the CYDS were selected from a larger pool of 19 matched pairs, and one matched triad, of communities in seven states (i.e., Colorado, Illinois, Kansas, Maine, Oregon, Utah, and Washington) that participated in a naturalistic study of the diffusion of science-based prevention strategies (Arthur, Glaser, and Hawkins 2005). Communities were matched within state by total population, poverty, racial/ethnic diversity, and unemployment and crime indices. Eligibility criteria for inclusion in the CYDS consisted of: (a) having not selected tested, effective prevention programs to address prioritized community risk factors according to community leaders interviewed in 2001 and (b) securing letters of support from the superintendent of schools, the mayor or city manager, and the lead law enforcement officer in each community agreeing to random assignment of communities and to all ensuing CYDS data collection activities. Consequently, one community from within each of 12 matched pairs of communities that met eligibility requirements was assigned by coin toss to either the CTC intervention or the control (i.e., prevention services as usual) condition.
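The within-pair assignment described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the study's procedure as carried out: the pair labels, the `assign_pairs` helper, and the seed are hypothetical, and a pseudo-random draw stands in for the physical coin toss.

```python
import random

def assign_pairs(pairs, seed=None):
    """Assign one community per matched pair to CTC and the other to control.

    `pairs` is a list of (community_a, community_b) tuples with hypothetical
    labels. A pseudo-random draw stands in for the coin toss.
    """
    rng = random.Random(seed)
    assignment = {}
    for a, b in pairs:
        # "Heads": the first community receives CTC; "tails": the second does.
        ctc, control = (a, b) if rng.random() < 0.5 else (b, a)
        assignment[ctc] = "CTC"
        assignment[control] = "control"
    return assignment

# Twelve hypothetical matched pairs, labeled by pair number.
pairs = [(f"pair{k}_A", f"pair{k}_B") for k in range(1, 13)]
assignment = assign_pairs(pairs, seed=2003)
```

Randomizing within pairs guarantees that each condition receives exactly one community from every pair, so the conditions are balanced on the matching variables by construction.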

The 24 CYDS communities are small and medium-sized incorporated towns with an average population of 14,646 (range = 1,578 to 40,787; U.S. Census Bureau 2001). These towns are geographically distinct communities (i.e., at least 60 miles apart) with clear community boundaries (i.e., were not suburbs of larger cities) and separate governmental, educational, and law enforcement structures. On average, 89% of the population members are European American (range = 64% to 98%), 3% are African American (range = 0% to 21%), 10% are of Hispanic origin (range = 1% to 65%), 12% are between the ages of 10 and 17 (range = 9% to 16%), and 37% of students are eligible for free or reduced price school lunch (range = 21% to 66%).

All data collection procedures were designed and implemented consistently across all communities. Prevention resources and prevention program exposure in both CTC and control communities were documented by CYDS researchers to monitor the potential for experimental contamination. The CYDS includes analyses of community prevention resources and program exposure to assess the degree to which elements, if any, of the CTC system have been implemented in control communities. Although exposure of the control group to elements of the CTC prevention system could decrease the likelihood of observing effects of CTC in this trial, we conduct the CYDS trial under the assumption of noninterference among communities (i.e., the stable unit-treatment value assumption [SUTVA]; Rosenbaum 2007).


Measures of substance use, delinquency, risk and protective factors, and demographic characteristics are obtained from the CYDS Youth Development Survey (YDS) (Social Development Research Group 2005). Patterned after the Communities That Care Youth Survey (Arthur et al. 2007; Glaser et al. 2005; Arthur et al. 2002), the YDS is a self-administered, paper-and-pencil questionnaire designed to be completed in a 50-minute classroom period. The survey includes questions on student demographic characteristics (i.e., age, gender, race/ethnicity, family composition, and parental education); lifetime and 30-day measures of alcohol, marijuana, cigarette, and other drug use; heavy episodic drinking (i.e., five or more drinks in a row); past-year delinquency; and risk and protective factors in community, school, family, and peer/individual domains. Additional items (e.g., more severe forms of delinquency, sexual behavior) were added to the YDS as deemed developmentally appropriate. Table 1 lists the 28 risk and protective factor scales measured in the YDS, the number of items, and reliability coefficient alphas for each scale based on data from the Wave 1 administration. Items used to measure risk and protective factors were standardized and then averaged within each scale.
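As an illustration of the scoring rule just described (standardize each item, then average within the scale) and of the reliability coefficient reported in Table 1, the following sketch computes scale scores and Cronbach's alpha on toy data. The item values are invented, the helper names are hypothetical, and several details are assumptions: items are standardized with the population standard deviation, alpha is computed on the raw items, and missing responses are not handled.

```python
from statistics import mean, pstdev, variance

def standardize(values):
    """Convert one item's responses to z-scores across students."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def scale_scores(items):
    """Average the standardized items for each student.

    `items` is a list of item columns; each column holds one response
    per student, in the same student order.
    """
    z_items = [standardize(col) for col in items]
    return [mean(row) for row in zip(*z_items)]

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    item_vars = sum(variance(col) for col in items)
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - item_vars / variance(totals))

# Toy data: three hypothetical items answered by five students.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
scores = scale_scores(items)
alpha = cronbach_alpha(items)
```

Because every item is centered before averaging, the resulting scale scores have a mean of zero across students, which puts scales with different response formats on a common metric.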

Table 1
Risk and Protective Factors (by Domain), Number of Items, and Reliability Coefficient Alphas


Recruitment for the cohort sample began in the fall of 2003 by mailing information packets and making in-person calls to each school district superintendent and school principal within the 24 CYDS communities, asking for their continued commitment to participate in the study and outlining the requirements of involvement in the coming year. As a result, 28 of 29 school districts, comprising 88 schools, agreed to participate. In participating school districts, a school-based, teacher-coordinated approach was used to secure informed parental consent and student assent for participation in the study. School principals identified a lead teacher to coordinate the distribution of consent materials to teachers of fifth-grade classes. Lead teachers worked closely with classroom teachers to help them distribute the consent forms to students for parental consent, provide instructions to their students, and collect the consent forms in two weeks, indicating whether or not each eligible student’s parents gave informed consent for their child to participate in the study. Lead teachers served as the point of contact for CYDS staff and received a $20 cash incentive for their assistance. To encourage high return rates, teachers of fifth-grade classes were offered $100 for classroom supplies if at least 90% of the eligible students returned their consent forms and $150 if at least 95% of students returned their forms. Each school received an additional $150 for its overall participation in the study. In the one community whose schools declined to participate, a community-based recruitment method was used whereby families with fifth graders were solicited via newspaper advertisements and flyers distributed at child-centered locations.

Recruitment efforts for Wave 1 data collection yielded a return of 92.5% of the consent forms to the schools. Parents of 63.1% (n = 3,682) of eligible students consented to their participation in the study at Wave 1 (community range = 24.7% to 72.9%). To increase the overall participation rate of students across the 24 CYDS communities, a second recruitment effort was initiated in Wave 2. Beginning in the fall of 2004, trained CYDS recruiters were sent into study schools to conduct 5-minute classroom presentations to interest students in the study, answer questions, and directly distribute new consent brochures to eligible students whose families had not previously consented or otherwise had not been recruited in Wave 1. In addition to working with lead teachers, CYDS recruiters directly contacted sixth-grade teachers in Wave 2 to explain the study. A new incentive plan was implemented for Wave 2, setting recruitment goals at the school level instead of at the classroom level. In addition to $150 given to each participating school, an incentive of $150 was distributed to every participating classroom if at least 85% of eligible sixth-grade students from the entire school returned their consent forms within a 2-week period and $75 if 85% of eligible students returned their consent forms by the targeted survey date, regardless of whether the form granted or denied permission for the student to participate in the survey. This recruitment effort resulted in an additional 1,146 sixth-grade students being consented to the study.

Eleven percent (n = 404) of the students consented in Wave 1 were ineligible for participation in Wave 2 because they moved out of the school district (n = 388), did not remain in their grade cohort (i.e., skipped or were held back a grade; n = 4), were in foster care and did not have consent from state authorities to participate (n = 7), or were unable to complete the survey on their own due to severe learning disabilities (n = 5). Excluding ineligible students and including the newly recruited students resulted in a total of 4,420 students whose parents consented to their participation in the study (76.4% of the eligible population). Final consent rates did not differ significantly by intervention condition (i.e., rates were 76.1% for students in intervention communities and 76.7% for students in control communities). Overall, 3,585 students completed a Wave 1 survey, 4,390 students completed a Wave 2 survey, and 4,407 students completed either a Wave 1 or Wave 2 survey.


In Wave 1, trained CYDS interviewers read the survey aloud to classrooms of fifth graders who followed along and marked their answers in the survey booklet. Two interviewers were present in each classroom; one to read the survey aloud to the class and the other to assist students individually, as needed. In Wave 2, CYDS interviewers introduced the YDS to classrooms of sixth-grade students who then read and completed the survey on their own. Again, interviewers were available to assist students with questions or special requests as needed. In both waves, make-up sessions with students who needed extra time or required special attention were conducted by CYDS staff. For students who were not surveyed by the time CYDS staff left the community (e.g., continued absence or suspension from school), teacher-proctor survey packets were left with lead teachers with explicit instructions on how to administer the survey.

To ensure confidentiality, no names or other identifying information were included on any of the surveys. Identification numbers were printed on the survey booklets to allow tracking across data collection waves. Students read and signed assent statements indicating that they were fully informed of their rights as research participants and agreed to participate in the study. At the end of the classroom period, CYDS interviewers collected the survey booklets and assent statements from students, separated them, and sealed them in separate secured envelopes for return to the University of Washington. Upon completion of the survey, students received small incentive gifts worth approximately $5 to $8. Wave 1 participants recruited through community-based methods were surveyed in the local library and received $50 cash for their participation.

Strategies for collecting future waves of data from the longitudinal cohort include maintaining continuous locating information on all students in the cohort through contacts with CYDS schools and tracking all cohort students, even if they move to locations other than the original 24 CYDS communities. Students who are not present at the time of school data collection have separate data collection dates scheduled with CYDS interviewers. Student attrition is monitored closely, and ongoing analyses will assess whether rates of attrition differ significantly by intervention condition or other student and community characteristics and outcomes. Especially valuable in this context will be the use of longitudinal diagnostics suggested by Graham (2009) following Hedeker and Gibbons (1997). Variables found to be related to student attrition will be incorporated in outcome analyses as covariates or in multiple imputation analyses as auxiliary variables (Collins, Schafer, and Kam 2001; Rubin 1987; Schafer 1997, 2000).


In the longitudinal cohort design, the repeatedly measured outcomes (Time, t) are nested within students (i) who, in turn, are nested within communities (j), with communities being nested within matched pairs of communities (k). To address the statistical dependencies that can occur in such nesting, we will rely on the General Linear Mixed Model (McCulloch and Searle 2001; Raudenbush and Bryk 2002) for Gaussian distributed outcomes and the Generalized Linear Mixed Model (Breslow and Clayton 1993; Liang and Zeger 1986) with logit link transformation for Bernoulli distributed outcomes. Three sets of statistical analyses are planned for the cohort sample. First, beginning with the Grade 7 wave of student data collection, we will examine differences in the prevalence rates of substance use and delinquent behaviors using a mixed-model ANCOVA (Murray 1998), controlling for baseline levels of substance use or delinquent behaviors. Second, we will use multilevel discrete-time survival models to assess the efficacy of CTC to prevent the onset of substance use and delinquency during successive waves of data collection. The third set of analyses will employ latent growth models (Laird and Ware 1982; Raudenbush 2001) to examine intervention effects in long-term trajectories of substance use, delinquency, and risk/protective factors. All analytic strategies assess the effects of the intervention at the appropriate unit of randomization (i.e., communities) with appropriate estimates of standard errors and degrees of freedom and allow for regression adjustment of potential covariate effects. Student- and community-level covariates will be added as linear predictors of targeted outcomes to improve the precision of estimated intervention effects (Murray 1998; Schafer and Kang 2008). 
Although communities in the CYDS were matched into pairs with one community from each matched pair assigned randomly to experimental condition, random assignment does not guarantee that student populations within each community pair will be equivalent with regard to their respective distributions of demographic and individual characteristics; nor does it guarantee that community pairs will remain similar over time with regard to population and economic growth.

Mixed-Model ANCOVA

Using the Bernoulli-distributed 30-day alcohol use outcome at Grade 7 as an example (i.e., 1 = alcohol use during the previous 30 days, 0 = no alcohol use during the previous 30 days), the Generalized Linear Model for the mixed-model ANCOVA can be expressed (in multilevel equation format; see Raudenbush and Bryk 2002) as:

  • Level 1 (Student i)
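    $\eta_{ijk} = \beta_{00jk} + \beta_{01jk}(\mathrm{AGE}_{ijk}) + \beta_{02jk}(\mathrm{SEX}_{ijk}) + \beta_{03jk}(\mathrm{WHITE}_{ijk}) + \beta_{04jk}(\mathrm{HISP}_{ijk}) + \beta_{05jk}(\mathrm{PARED}_{ijk}) + \beta_{06jk}(\mathrm{RELIG}_{ijk}) + \beta_{07jk}(\mathrm{REBEL}_{ijk}) + \beta_{08jk}(\mathrm{G5ALC30}_{ijk})$
    where $\eta_{ijk} = \log\left[ P(\mathrm{G7ALC30}_{ijk} = 1) \,/\, \left(1 - P(\mathrm{G7ALC30}_{ijk} = 1)\right) \right]$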
  • Level 2 (Community j)
  • Level 3 (Community-matched pair k)
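Based on the description of the model in the accompanying text, the community and matched-pair levels can be sketched as follows. Only $\gamma_{001}$, $u_{00jk}$, and $v_{000k}$ are named in the text; the labels $\gamma_{000k}$, $\gamma_{002}$, $\gamma_{003}$, and $\delta_{0000}$ are assumed here for illustration, as is the simplification that the student-level slopes are fixed across communities.

```latex
\begin{align*}
\text{Level 2 (Community } j\text{):}\quad
  \beta_{00jk} &= \gamma_{000k} + \gamma_{001}(\mathrm{CTC}_{jk})
                + \gamma_{002}(\mathrm{POP}_{jk})
                + \gamma_{003}(\mathrm{PCTFRL}_{jk}) + u_{00jk} \\
\text{Level 3 (Community-matched pair } k\text{):}\quad
  \gamma_{000k} &= \delta_{0000} + v_{000k}
\end{align*}
```

Under this sketch, three fixed effects enter at the community level (the intervention indicator and two covariates), which is the count used in the degrees-of-freedom calculation for the intervention effect.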

This model statistically controls for student baseline characteristics: age (AGE), gender (SEX), race/ethnicity (White vs. Nonwhite [WHITE] and Hispanic vs. Nonhispanic [HISP]), parental education (PARED), religious attendance (RELIG), and rebelliousness (REBEL); and includes a baseline measure of the dependent variable (G5ALC30) as an additional adjustment for any potential baseline differences. These covariates were selected on the basis of having putative zero-order linear relationships with targeted outcomes, as suggested by previous research (e.g., Hawkins, Catalano, and Arthur 2002; Johnston et al. 2008). Characteristics of students’ communities, population size (POP) and percentage of students receiving free or reduced price school lunch (PCTFRL) are included as community-level covariates. In the absence of a priori theory regarding the functional form of these covariates, we will include them as linear predictors of community-level intercepts. We make the assumption of linear additive covariate effects as a matter of convenience, but will consider modeling interactions among covariates should they be warranted.

The intervention effect (γ001) for the community-level dichotomous indicator of intervention status (CTC; 0 = control community, 1 = CTC community) is estimated as the mean difference in adjusted community-level prevalence rates between intervention and control communities as tested against the average variation among the intervention condition-specific adjusted community-level prevalence rates, with degrees of freedom equal to the number of community-matched pairs (12) minus the number of community-level covariates and intervention effect (3), minus one (i.e., df = 8; Murray 1998). We note that the variance for the level 1 random effect is a function of the proportion of students responding affirmatively to the outcome in question and, therefore, is not an estimated parameter; however, random effects for variability in intercepts (the mean log odds of 30-day alcohol use at Grade 7) across communities and community-matched pairs (u00jk and v000k, respectively) are included.

Multilevel Discrete-time Survival Analyses

Multilevel discrete-time survival analysis (ML-DTSA; Barber et al. 2000; Hedeker, Siddiqui, and Hu 2000; Yau 2001) will be used to assess the effects of the CTC intervention in delaying the onset of alcohol, marijuana, and cigarette use and of delinquency. This model can also be specified as a Generalized Linear Mixed Model (Reardon, Brennan, and Buka 2002); however, as a longitudinal model, it incorporates an additional level of nesting to model the time-specific hazard of initiation as a function of a student’s grade level at the time of the first self-reported occasion of substance use or delinquency (coded 1 = initiated during the time interval, 0 = did not initiate during the time interval). In line with specifications for survival models (Singer and Willett 2003), observations for individuals following the first reported event will be coded as missing data because they are no longer at risk for the indexed event; similarly, students who do not experience the target event before the conclusion of the study period will be treated as right-censored observations. Using first annual use of alcohol (FIRSTALC) as an example, the statistical model for the ML-DTSA is expressed (in multilevel equation format) as:

  • Level 1 (Time t)
    where $\eta_{tijk} = \log\left[ P(\mathrm{FIRSTALC}_{tijk} = 1) \,/\, \left(1 - P(\mathrm{FIRSTALC}_{tijk} = 1)\right) \right]$
  • Level 2 (Student i)
  • Level 3 (Community j)
  • Level 4 (Community-matched pair k)
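One way to write the four levels, consistent with the description in the surrounding text, is sketched below. This is an assumption-laden reconstruction rather than the authors' exact specification: the text names only $\gamma_{001}$, $r_{0ijk}$, $u_{00jk}$, and $v_{000k}$, so the labels $\pi$, $\gamma_{000k}$, $\gamma_{002}$, $\gamma_{003}$, and $\delta$ are illustrative, Time is entered as a single linear term, and $X_{qijk}$ stands generically for the $Q$ student-level covariates.

```latex
\begin{align*}
\text{Level 1 (Time } t\text{):}\quad
  \eta_{tijk} &= \pi_{0ijk} + \pi_{1ijk}(\mathrm{TIME}_{tijk}) \\
\text{Level 2 (Student } i\text{):}\quad
  \pi_{0ijk} &= \beta_{00jk} + \sum_{q=1}^{Q} \beta_{0qjk} X_{qijk} + r_{0ijk},
  \qquad \pi_{1ijk} = \beta_{10jk} \\
\text{Level 3 (Community } j\text{):}\quad
  \beta_{00jk} &= \gamma_{000k} + \gamma_{001}(\mathrm{CTC}_{jk})
                + \gamma_{002}(\mathrm{POP}_{jk})
                + \gamma_{003}(\mathrm{PCTFRL}_{jk}) + u_{00jk},
  \qquad \beta_{10jk} = \gamma_{100} \\
\text{Level 4 (Community-matched pair } k\text{):}\quad
  \gamma_{000k} &= \delta_{0000} + v_{000k}
\end{align*}
```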

This model is similar to that of the mixed-model ANCOVA except that an additional level of nesting (Time) is introduced with Time (coded “0” for Grade 5, “1” for Grade 6, and so on) modeled as a fixed effect. An additional random effect (r0ijk) is included to model the variability in the log odds of alcohol use initiation across students. Random effects u00jk and v000k are retained to model variation in the log odds of alcohol use initiation across community and community-matched pairs, respectively. The intervention effect (γ001) is assessed as the mean difference in adjusted community-level hazard rates between intervention and control communities and is tested against the average variation among the intervention condition-specific adjusted community-level hazard rates, with the same number of degrees of freedom as the mixed-model ANCOVA.
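Discrete-time survival models of this kind are usually fit to data arranged in person-period format, with one record per student per wave at risk. The sketch below shows that restructuring under stated assumptions: the `person_period` helper and the student identifiers are hypothetical, time is coded 0 for Grade 5 as in the text, and records after the first reported event are simply dropped, which has the same effect on estimation as coding them missing. Students whose onset preceded the first survey (left-censored cases) are not handled.

```python
def person_period(student_id, grades_observed, onset_grade=None):
    """Expand one student's history into person-period records.

    grades_observed: ordered grades with a completed survey, e.g. [5, 6, 7].
    onset_grade: grade at first reported use, or None if no use was
                 reported during follow-up (right-censored).
    """
    records = []
    for grade in grades_observed:
        event = 1 if grade == onset_grade else 0
        records.append({"id": student_id, "time": grade - 5, "event": event})
        if event:
            break  # the student is no longer at risk after the first event
    return records

# A student who first reports use in Grade 7 contributes three records.
rows = person_period("s1", [5, 6, 7, 8, 9], onset_grade=7)

# A student who never reports use contributes one record per wave, all event = 0.
censored = person_period("s2", [5, 6, 7, 8, 9])
```

The `event` column then serves as the Bernoulli outcome in the logit model, and `time` as the fixed effect for grade.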

Latent Growth (Hierarchical Linear) Models

We will use latent growth models, also known as hierarchical linear models, to examine intervention effects on the change in the frequency of substance use and delinquency, and levels of risk and protective factors, over time. Similar to the ML-DTSA shown above, the latent growth/hierarchical linear model consists of four levels of nesting and explicitly models the outcome as a function of data collection wave. However, unlike the ML-DTSA, no conditionality is imposed on the values of the outcome and, as the dependent variables are considered to be distributed as Gaussian, no logit link transformation is required. Thus, the latent growth/hierarchical linear model, using the frequency of 30-day alcohol use (ALC30) as an example, can be depicted as:

  • Level 1 (Time t):
  • Level 2 (Student i):
  • Level 3 (Community j):
  • Level 4 (Community-Matched Pair k):
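A sketch of the four levels that agrees with the random effects and intervention effect named in the text ($e_{tijk}$, $r_{0ijk}$, $r_{1ijk}$, $u_{00jk}$, $u_{10jk}$, $v_{000k}$, and $\gamma_{101}$) is given below. The remaining labels ($\pi$, $\gamma_{000k}$, $\gamma_{001}$ through $\gamma_{003}$, $\gamma_{100}$, and $\delta_{0000}$) are assumptions made for illustration, a linear function of time is assumed, and $X_{qijk}$ again stands generically for the $Q$ student-level covariates.

```latex
\begin{align*}
\text{Level 1 (Time } t\text{):}\quad
  \mathrm{ALC30}_{tijk} &= \pi_{0ijk} + \pi_{1ijk}(\mathrm{TIME}_{tijk}) + e_{tijk} \\
\text{Level 2 (Student } i\text{):}\quad
  \pi_{0ijk} &= \beta_{00jk} + \sum_{q=1}^{Q} \beta_{0qjk} X_{qijk} + r_{0ijk} \\
  \pi_{1ijk} &= \beta_{10jk} + r_{1ijk} \\
\text{Level 3 (Community } j\text{):}\quad
  \beta_{00jk} &= \gamma_{000k} + \gamma_{001}(\mathrm{CTC}_{jk})
                + \gamma_{002}(\mathrm{POP}_{jk})
                + \gamma_{003}(\mathrm{PCTFRL}_{jk}) + u_{00jk} \\
  \beta_{10jk} &= \gamma_{100} + \gamma_{101}(\mathrm{CTC}_{jk}) + u_{10jk} \\
\text{Level 4 (Community-Matched Pair } k\text{):}\quad
  \gamma_{000k} &= \delta_{0000} + v_{000k}
\end{align*}
```

In this form, $\gamma_{101}$ is the difference in average growth rates between CTC and control communities, which is the quantity the intervention test targets.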

This model includes random effects to capture deviations in intercepts (i.e., predicted levels of 30-day alcohol use at Grade 5) between (a) a student’s observed and predicted levels at each wave of data collection, conditional upon being in a specific community and community-matched pair (etijk); (b) a student’s predicted level and his/her respective community’s predicted level of use (r0ijk); (c) each community’s predicted level and the predicted level for that community’s matched pair (u00jk); and (d) observed and predicted levels for each matched pair of communities (v000k). The model additionally includes random effects to capture variation in growth rates (slopes) among (a) each student’s predicted rate of growth in the frequency of 30-day alcohol use relative to his/her respective community’s average rate of growth (r1ijk), and (b) a community’s predicted rate of growth in the frequency of 30-day alcohol use at Grade 5 and the predicted growth rate for that community’s matched pair (u10jk). Random effects are assumed to be Gaussian with a mean of 0 and variance σ², and uncorrelated with model covariates, which are reasonable assumptions given analysis of existing baseline data from the longitudinal cohort and cross-sectional designs (Murray et al. 2006). Variation in random effects is assumed to be homogeneous over time, among students, and between CTC and control communities; however, these a priori assumptions will be assessed during the course of the analyses and violations of model assumptions will be addressed (e.g., explicitly modeling heteroskedastic random effects). In all latent growth/hierarchical linear models, the intervention effect (γ101) will be estimated as the difference between the intervention condition-specific growth rates and will be tested against the average variation among the intervention condition-specific growth rates across communities, with appropriate degrees of freedom (Murray 1998). 
To assure that intervention effects are not unduly influenced by model covariates, we will conduct the analyses by first assessing fully unconditional models (i.e., no predictor variables), then examining models of unadjusted intent–to–treat intervention effects, and finally adding model covariates in fully conditional models with regression adjustment for covariates.



Of the 4,407 students recruited into the study, three students were excluded from the analysis for reporting that they were honest only “some of the time” or for reporting use of a fictitious drug included in the YDS as a validity screen. The resulting analysis sample of 4,404 students was split evenly between male (50%) and female (50%) students. Seventy percent of students were European American, 9% were Native American, 4% were African American, 25% were of another racial/ethnic group, and 20% were of Hispanic origin (students could select more than one race/ethnicity category). In Wave 1, students were an average of 11.1 years of age (SD = 0.4). The mean number of students per community was 184 (SD = 122, range = 20 to 454). Forty-five percent of the analysis sample was in control communities and 55% was in intervention communities.


Mixed-model ANOVAs, as implemented by the General and Generalized Linear Mixed Models, were conducted to validate the randomization process and determine whether intervention and control groups exhibited comparable levels of substance use, delinquency, and risk and protection at baseline. Because the cohort sample was augmented in Wave 2, missing baseline data for the students recruited in Wave 2 were imputed using NORM version 2.03 (Schafer 1997, 2000) under the assumption of data missing at random (MAR). In total, 40 data sets (Graham, Olchowski, and Gilreath 2007) were imputed, separately by intervention condition, to preserve within-condition mean and covariance structures. Imputation models included student and community characteristics, student outcomes, risk and protective factors, and dummy-coded indicators of community membership (to preserve the nested structure of the data). After imputation, data sets were combined to include both intervention and control groups for analysis. Each combined data set was analyzed using a General or Generalized Linear Mixed Model with PROC MIXED or PROC GLIMMIX, respectively, in SAS version 9.1 (SAS Institute Inc. 2001). The mixed models included random effects for intercepts to model variability in baseline outcome measures across students and communities. Results among the 40 imputed data sets were then combined into a single set of parameter estimates and standard errors using Rubin’s (1987) rules via PROC MIANALYZE in SAS.
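The combining step can be illustrated with a short sketch of Rubin's (1987) rules for a single parameter. The analyses themselves were run in SAS; this Python version is only a stand-in for the pooling arithmetic, the `pool_rubin` helper is hypothetical, and the estimates and standard errors are made-up numbers, not study results.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    Returns the pooled estimate, its standard error, and the degrees of freedom.
    """
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance
    total = within + (1 + 1 / m) * between        # total variance
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = float("inf")  # no between-imputation variability
    return q_bar, sqrt(total), df

# Hypothetical estimates of one coefficient from three imputed data sets.
estimate, se, df = pool_rubin([0.10, 0.12, 0.08], [0.05, 0.05, 0.05])
```

The pooled standard error exceeds the average within-imputation standard error whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters the final inference.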

Table 2 shows the unconditional prevalence rates (after missing data imputation) for the 16 substance use and delinquent behavior outcome variables measured in Grade 5. Correct prevalence rates were estimated by conducting separate EM algorithm analyses on dichotomous versions of the variables, because prevalence rates based on normal-model imputation are known to be incorrect unless the data are normally distributed (e.g., see Graham 2009). Intraclass correlation coefficients (ICCs) measuring between-community variability in substance use and delinquent behavior outcomes averaged .013, ranging from a low of .002 for lifetime marijuana use to a high of .039 for lifetime alcohol use. For the 28 risk and protective factors, the average ICC was .030, ranging from .010 for family conflict to .076 for religious attendance. Results of the random-coefficients analyses indicated that none of the substance use outcomes, delinquent behaviors, or risk and protective factors demonstrated significant differences (i.e., all ps > .05) in Grade 5 prevalence rates or means between students in intervention communities and students in control communities.
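As a reading aid for the ICCs reported above, the following sketch shows the standard variance-components definition of the intraclass correlation: the share of total outcome variance that lies between communities. The function name and the variance values are hypothetical, and the sketch does not reproduce how the study derived its variance components (for dichotomous outcomes in particular, the level-1 variance depends on the model and scale used).

```python
def intraclass_correlation(between_var, within_var):
    """Return the ICC: between-community variance over total variance.

    between_var -- variance of the community-level random intercepts
    within_var  -- residual (student-level) variance
    """
    if between_var < 0 or within_var <= 0:
        raise ValueError("between_var must be >= 0 and within_var must be > 0")
    return between_var / (between_var + within_var)


# Hypothetical variance components chosen so that the ICC is about .039,
# the largest value reported for the behavior outcomes.
icc = intraclass_correlation(between_var=0.039, within_var=0.961)
```

An ICC near zero indicates that students in the same community are only slightly more alike than students drawn from different communities.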

Table 2
Prevalence Rates (percentages) for Grade 5 Substance Use and Delinquent Behavior Outcomes by Intervention Condition


Communities That Care (CTC) is a prevention system designed to reduce levels of adolescent substance use and delinquency through the selection of effective preventive interventions tailored to a community’s specific profile of risk and protection. The Community Youth Development Study (CYDS) is a community-randomized trial that includes both repeated cross-sectional and longitudinal cohort designs to assess the ability of CTC to impact adolescent health and behavior problems in communities. Whereas the cross-sectional design will be used to examine community-level change in successive populations of 6th-, 8th-, 10th- and 12th-grade students, the cohort design, which is the focus of this paper, will be used to examine individual-level change in a panel of students during a 5-year period. Because the preventive interventions implemented through the CTC system in the CYDS were selected on the basis of being appropriate for adolescents between fifth and ninth grades, analysis of the cohort sample will assess the efficacy of the intervention in this targeted subpopulation of youth in communities. Additionally, the inclusion of the cohort design in the CYDS reduces the threat of secular trends abating intervention effects, which can be present in repeated cross-sectional designs.

Because the cohort sample recruitment strategy augmented the original Wave 1 sample in Wave 2, multiple imputation analyses were conducted to account for the missing data. Multiple imputation analysis has become an increasingly widely used method to account for both planned (Graham et al. 2006) and unplanned (Shaffer and Chinchilli 2007) missing data, and has been shown to provide acceptable results even when the cause of missingness is not fully captured in the imputation model (Collins, Schafer, and Kam 2001; Graham 2009; Graham et al. 1997). In the CYDS longitudinal cohort design, continued multiple imputation analyses are planned as new waves of data are made available.

As a check on the random assignment of communities to intervention or control condition, we examined differences in the imputed Wave 1 baseline data for 15 substance use and delinquent behavior outcomes, and 28 risk and protective factors using random-coefficients analyses. Results of these analyses supported the baseline equivalency of intervention and control groups across all examined measures. We regard the lack of differences in baseline measures as evidence of the successful a priori matching of communities and their subsequent random assignment to condition. The equivalency of students between intervention and control arms of the study guards against biases that may result from differential history or selection and reduces the plausibility of alternative explanations for hypothesized intervention effects (Murray 1998; Shadish, Cook, and Campbell 2002; Graham 2009; Hedeker and Gibbons 1997). However, as noted by Freedman (2008a, 2008b), randomization alone does not justify the use of multivariate regression models such as those described in this paper. He notes that adjustment for model covariates may improve the precision of the estimated intervention effect, or may make it worse, and that standard errors can be biased, sometimes severely. Freedman points to results of simulations that suggest that the bias in regression estimates becomes negligible with large sample sizes (e.g., greater than 1,000). Caution is recommended when interpreting results of these models, and comparisons among fully unconditional models, models of unadjusted intent-to-treat intervention effects, and conditional models with regression adjustment for covariates will be made as diagnostics for the proposed analyses.

Design decisions regarding the sample recruitment and statistical analysis of the longitudinal cohort sample reflect the theory of change underlying the CTC system. This theory posits that implementing social development strategies for mobilizing communities to adopt a science-based approach to identifying community-specific risk and protective factors will result in better selection and implementation of effective preventive interventions appropriate for a community’s specific needs. In turn, use of these interventions is hypothesized to result in delayed and decreased substance use and delinquency during early adolescence. In the context of the longitudinal cohort design, these outcomes are operationalized as growth trajectories and hazard rates. Early results of the CYDS trial, showing fidelity of CTC implementation in intervention communities (Quinby et al. 2008), significant pre-post change in prevention service system characteristics (Brown et al. 2007), and fidelity of prevention program implementation in intervention communities (Fagan et al. 2008) support the system-level components of the CTC theory of change. Analysis of the cohort sample will determine whether installation of CTC affects exposure to risk and protection, use of substances, and delinquent behavior of the young people growing up in these communities.


We gratefully acknowledge Charles B. Fleming and W. Alex Mason for helpful comments on the manuscript; and the CYDS data collection team and participating communities for their contributions to this study.

Richard F. Catalano is a board member of Channing Bete Company, distributor of Supporting School Success® and Guiding Good Choices®. These programs were used in some communities in the study that produced the data set used in this paper.

This work was supported by a research grant from the National Institute on Drug Abuse (R01 DA015183-04) with co-funding from the National Cancer Institute, the National Institute of Child Health and Human Development, the National Institute of Mental Health, and the Center for Substance Abuse Prevention, and does not necessarily represent the official views of the funding agencies.


  • Arthur MW, Blitz C. Bridging the gap between science and practice in drug abuse prevention through needs assessment and strategic community planning. Journal of Community Psychology. 2000;28:241–255.
  • Arthur MW, Briney JS, Hawkins JD, Abbott RD, Brooke-Weiss BL, Catalano RF. Measuring risk and protection in communities using the Communities That Care Youth Survey. Evaluation and Program Planning. 2007;30:197–211. [PubMed]
  • Arthur MW, Glaser RR, Hawkins JD. Steps towards community-level resilience: Community adoption of science-based prevention programming. In: Peters RD, Leadbeater B, McMahon RJ, editors. Resilience in children, families, and communities: Linking context to practice and policy. New York: Kluwer Academic/Plenum; 2005. pp. 177–194.
  • Arthur MW, Hawkins JD, Pollard JA, Catalano RF, Baglioni AJ., Jr Measuring risk and protective factors for substance use, delinquency, and other adolescent problem behaviors: The Communities That Care Youth Survey. Evaluation Review. 2002;26:575–601. [PubMed]
  • Barber JS, Murphy SA, Axinn WG, Maples J. Discrete-time multilevel hazard analysis. Sociological Methodology. 2000;30:201–235.
  • Bauman KE, Suchindran CM, Murray DM. The paucity of effects in community trials: is secular trends the culprit? Preventive Medicine. 1999;28:426–429. [PubMed]
  • Breslow N, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  • Brown EC, Hawkins JD, Arthur MW, Abbott RD, Van Horn ML. Multilevel analysis of a measure of community prevention collaboration. American Journal of Community Psychology. 2008;41:115–126. [PubMed]
  • Brown EC, Hawkins JD, Arthur MW, Briney JS, Abbott RD. Effects of Communities That Care on prevention services systems: Findings from the Community Youth Development Study at 1.5 years. Prevention Science. 2007;8:180–191. [PubMed]
  • Catalano RF, Hawkins JD. The social development model: A theory of antisocial behavior. In: Hawkins JD, editor. Delinquency and crime: Current theories. New York: Cambridge University Press; 1996. pp. 149–197.
  • Chinman M, Hannah G, Wandersman A, Ebener P, Hunter SB, Imm P, Sheldon J. Developing a community science research agenda for building community capacity for effective preventive interventions. American Journal of Community Psychology. 2005;35:143–157. [PubMed]
  • Collins LM, Schafer JL, Kam CM. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods. 2001;6:330–351. [PubMed]
  • Ennett ST, Ringwalt CL, Thorne J, Rohrbach LA, Vincus A, Simons-Rudolph A, Jones S. A comparison of current practice in school-based substance use prevention programs with meta-analysis findings. Prevention Science. 2003;4:1–14. [PubMed]
  • Fagan AA, Hanson K, Hawkins JD, Arthur MW. Bridging science to practice: Achieving prevention program implementation fidelity in the Community Youth Development Study. American Journal of Community Psychology. 2008;41:235–249. [PubMed]
  • Fagan AA, Hawkins JD, Catalano RF. Using community epidemiologic data to improve social settings: The Communities That Care prevention system. In: Shinn M, Yoshikawa H, editors. Toward positive youth development: Transforming schools and community programs. Oxford; New York: Oxford University Press; 2008. pp. 292–312.
  • Freedman DA. On regression adjustments in experiments with several treatments. Annals of Applied Statistics. 2008;2:176–196.
  • Freedman DA. On regression adjustments to experimental data. Advances in Applied Mathematics. 2008;40:180–193.
  • Glaser RR, Van Horn ML, Arthur MW, Hawkins JD, Catalano RF. Measurement properties of the Communities That Care Youth Survey across demographic groups. Journal of Quantitative Criminology. 2005;21:73–102.
  • Gottfredson DC, Gottfredson GD. Quality of school-based prevention programs: Results from a national survey. Journal of Research in Crime and Delinquency. 2002;39:3–35.
  • Graham JW. Missing data analysis: making it work in the real world. Annual Review of Psychology. 2009;60:549–576. [PubMed]
  • Graham JW, Hofer SM, Donaldson SI, MacKinnon DP, Schafer JL. Analysis with missing data in prevention research. In: Bryant KJ, Windle MT, West SG, editors. The science of prevention: Methodological advances from alcohol and substance abuse research. Washington, DC: American Psychological Association; 1997. pp. 325–366.
  • Graham JW, Olchowski AE, Gilreath TD. How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science. 2007;8:206–213. [PubMed]
  • Graham JW, Taylor BJ, Olchowski AE, Cumsille PE. Planned missing data designs in psychological research. Psychological Methods. 2006;11:323–343. [PubMed]
  • Hallfors D, Godette D. Will the 'Principles of Effectiveness' improve prevention practice? Early findings from a diffusion study. Health Education Research. 2002;17:461–470. [PubMed]
  • Hawkins JD, Catalano RF. Investing in your community's youth: An introduction to the Communities That Care system. South Deerfield, MA: Channing Bete; 2002.
  • Hawkins JD, Catalano RF, Arthur MW. Promoting science-based prevention in communities. Addictive Behaviors. 2002;27:951–976. [PubMed]
  • Hawkins JD, Catalano RF, Arthur MW, Egan E, Brown EC, Abbott RD, Murray DM. Testing Communities That Care: The rationale, design and behavioral baseline equivalence of the Community Youth Development Study. Prevention Science. 2008;9:178–190. [PMC free article] [PubMed]
  • Hawkins JD, Catalano RF, Miller JY. Risk and protective factors for alcohol and other drug problems in adolescence and early adulthood: implications for substance-abuse prevention. Psychological Bulletin. 1992;112:64–105. [PubMed]
  • Hawkins JD, Van Horn ML, Arthur MW. Community variation in risk and protective factors and substance use outcomes. Prevention Science. 2004;5:213–220. [PubMed]
  • Hawkins JD, Weis JG. The social development model: An integrated approach to delinquency prevention. Journal of Primary Prevention. 1985;6:73–97. [PubMed]
  • Hedeker D, Gibbons RD. Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods. 1997;2:64–78.
  • Hedeker D, Siddiqui O, Hu FB. Random-effects regression analysis of correlated grouped-time survival data. Statistical Methods In Medical Research. 2000;9:161–179. [PubMed]
  • Johnston LD, O'Malley PM, Bachman JG, Schulenberg JE. Monitoring the Future national results on adolescent drug use: Overview of key findings 2007. (NIH Publication No. 08-6418) National Institute on Drug Abuse; 2008. [accessed February 14, 2009]. Available from
  • Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
  • Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  • McCulloch CE, Searle SR. Generalized, linear and mixed models. New York: John Wiley & Sons, Inc; 2001.
  • Mitchell RE, Florin P, Stevenson JF. Supporting community-based prevention and health promotion initiatives: Developing effective technical assistance systems. Health Education and Behavior. 2002;29:620–639. [PubMed]
  • Murray DM. Design and analysis of group-randomized trials. New York: Oxford University Press; 1998. Monographs in Epidemiology and Biostatistics: Vol. 27.
  • Murray DM, Van Horn ML, Hawkins JD, Arthur MW. Analysis strategies for a community trial to reduce adolescent ATOD use: A comparison of random coefficient and ANOVA/ANCOVA models. Contemporary Clinical Trials. 2006;27:188–206. [PubMed]
  • National Institute on Alcohol Abuse and Alcoholism. Underage drinking: A major public health challenge. 2003. [accessed April 23, 2007]. Alcohol Alert, No. 59 Available from
  • Quinby RK, Fagan AA, Hanson K, Brooke-Weiss B, Arthur MW, Hawkins JD. Installing the Communities That Care prevention system: Implementation progress and fidelity in a randomized controlled trial. Journal of Community Psychology. 2008;36:313–332.
  • Raudenbush SW. Toward a coherent framework for comparing trajectories of individual change. In: Collins LM, Sayer AG, editors. New methods for the analysis of change. Washington, DC: American Psychological Association; 2001. pp. 35–64.
  • Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. 2nd ed. Newbury Park, CA: Sage; 2002.
  • Reardon SF, Brennan R, Buka SL. Estimating multi-level discrete-time hazard models using cross-sectional data: Neighborhood effects on the onset of adolescent cigarette use. Multivariate Behavioral Research. 2002;37:297–330.
  • Rosenbaum PR. Interference between units in randomized experiments. Journal of the American Statistical Association. 2007;102:191–200.
  • Roussos ST, Fawcett SB. A review of collaborative partnerships as a strategy for improving community health. Annual Review of Public Health. 2000;21:369–402. [PubMed]
  • Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987. Wiley series in probability and mathematical statistics: Applied probability and statistics.
  • SAS Institute Inc. The SAS system for Windows (Version 9.1) Cary, NC: SAS Inc; 2001.
  • Schafer JL. Analysis of incomplete multivariate data. New York: Chapman and Hall; 1997.
  • Schafer JL. NORM for Windows 95/98/NT: Multiple imputation of incomplete multivariate data under a normal model [computer software] University Park: Pennsylvania State University, Department of Statistics; 2000.
  • Schafer JL, Kang J. Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods. 2008;13:279–313. [PubMed]
  • Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin Company; 2002.
  • Shaffer ML, Chinchilli VM. Including multiple imputation in a sensitivity analysis for clinical trials with treatment failures. Contemporary Clinical Trials. 2007;28:130–137. [PubMed]
  • Singer JD, Willett JB. Applied longitudinal data analysis. New York: Oxford University Press; 2003.
  • Snyder HN, Howard N, Sickmund M. Juvenile offenders and victims: 2006 national report. U.S. Department of Justice, Office of Justice Programs, Office of Juvenile Justice and Delinquency Prevention; 2006. [accessed May 2, 2007]. Available from
  • Social Development Research Group. Community Youth Development Study, Youth Development Survey 2005–2006. Seattle: Social Development Research Group, School of Social Work, University of Washington; 2005.
  • Spoth RL, Greenberg MT. Toward a comprehensive strategy for effective practitioner-scientist partnerships and larger-scale community health and well-being. American Journal of Community Psychology. 2005;35:107–126. [PMC free article] [PubMed]
  • US Census Bureau. Census 2000 Summary File 1 United States. Census Bureau American FactFinder; 2001. [accessed February 13, 2007]. Available from
  • Wandersman A. Community mobilization for prevention and health promotion CAN work. In: Schneiderman N, Speers MA, Silva JM, Tomes H, Gentry JH, editors. Integrating behavioral and social sciences with public health. Chapter 11. Washington, DC: American Psychological Association; 2001.
  • Wandersman A. Community science: Bridging the gap between science and practice with community-centered models. American Journal of Community Psychology. 2003;31:227–242. [PubMed]
  • Wandersman A, Florin P. Community interventions and effective prevention. American Psychologist. 2003;58:441–448. [PubMed]
  • Weissberg RP, Kumpfer KL, Seligman ME. Prevention that works for children and youth: An introduction. American Psychologist. 2003;58:425–432. [PubMed]
  • Yau KKW. Multilevel models for survival analysis with random effects. Biometrics. 2001;57:96–102. [PubMed]