Cluster randomized trials have become increasingly common in studies that aim to evaluate non-pharmaceutical, behavioral, and quality-improvement interventions [1,2]. Examples of such interventions include community-improvement projects, lifestyle interventions such as smoking cessation and increased physical activity, education intervention trials, and office-based child-care intervention programs. The cluster design is often used to minimize contamination between intervention and control groups.
Under a cluster design, individuals within a cluster may be more alike than individuals across clusters. A widely used parameter for measuring the extent to which individuals within a cluster resemble one another is the intraclass correlation (ICC). Suppose the ICC for patients within the same clinic was 0.04, which is not an uncommon level of correlation, and that there were 100 patients in a clinic. Because the design effect is 1 + (m − 1) × ICC = 1 + 99 × 0.04 ≈ 5, a study using the clinic as the unit of randomization requires five times as many patients as a study using the patient as the unit of randomization. This demonstrates how varying ICC estimates can dramatically change the sample size required to test a hypothesis, and thus ICC estimation has important implications for study design, study cost, and implementation [2,7].
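The arithmetic behind the five-fold figure is the standard design effect, 1 + (m − 1) × ICC, where m is the cluster size. A minimal sketch (the function name is ours, not from the study):

```python
# Design effect for cluster randomization: Deff = 1 + (m - 1) * icc.
# Values mirror the worked example in the text (icc = 0.04, m = 100).

def design_effect(icc: float, cluster_size: int) -> float:
    """Inflation factor for the sample size of a cluster randomized trial."""
    return 1.0 + (cluster_size - 1) * icc

deff = design_effect(0.04, 100)
print(round(deff, 2))  # -> 4.96, i.e. roughly five times as many patients
```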
Consequently, to conduct a power analysis for a cluster randomized trial, reliable estimates of the magnitude of the ICC are often required. At least two challenges exist for accomplishing such a power analysis. First, existing ICC estimates usually come from small pilot studies that use cluster sampling or from other intervention studies of a similar kind. Unfortunately, ICC values from pilot studies can be rather unstable, and published ICC values from large studies are rarely available. Murray et al. reviewed a large number of cluster randomized trials in terms of study design and data analysis and concluded that the lack of published ICC estimates may lead to inaccurate power calculations and compromise the ability to test the hypotheses articulated. This is not a new problem [5,9,10]. Second, even for published ICC values, the standard errors are often too large for practical use. For example, an ICC value of 0.02 with a standard error of 0.01 implies 95 percent confidence bounds of 0.00 and 0.04. For the clinic example above, the sample sizes corresponding to these bounds would range from one third to 1.7 times that implied by the estimate of 0.02.
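The quoted range can be checked with the same design-effect arithmetic, under the clinic example's assumptions (m = 100):

```python
# Sample-size implications of ICC uncertainty (clinic example, m = 100).
# An ICC of 0.02 with SE 0.01 gives approximate 95% bounds of 0.00 and 0.04.

def design_effect(icc, m):
    return 1.0 + (m - 1) * icc

m = 100
point = design_effect(0.02, m)          # 2.98
lower = design_effect(0.00, m) / point  # ~0.34, about one third
upper = design_effect(0.04, m) / point  # ~1.66, about 1.7 times
print(round(lower, 2), round(upper, 2))
```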
This paper has two objectives. First, we report ICC estimates from a national study in the United States, the Randomized Controlled Trial to Prevent Child Violence (Safety Check, NIH-R01HD042260), a multi-level, cluster randomized trial that used the clinical office to deliver the intervention. Second, beyond reporting ICC values, we show how data collected from repeated measures in cluster randomized trials can be pooled to improve the precision of ICC estimates. Because studies with repeated measures over participants are becoming more common, the method of using repeated measurements to estimate ICC has important implications for the design of future randomized trials and the analysis of clustered data.
Data from the Safety Check study were derived from families seeing providers in the Pediatric Research in Office Settings (PROS) Network, the practice-based research network of the American Academy of Pediatrics (AAP). All 677 practices belonging to PROS were invited to participate. The analytic sample included the 137 pediatric practices across the country that underwent randomization. The unit of randomization was the pediatric office, and each unit was assigned to an office-based violence-prevention intervention or a control group. In the intervention group, practitioners were trained to (1) review the parent pre-visit survey covering patient-family behavior, parental concern about media use, discipline strategies, and children's exposure to firearms; (2) counsel using brief principles of motivational interviewing; (3) identify and provide local agency resources for anger and behavior management when indicated; and (4) instruct patient-families to use tangible tools (minute timers to monitor media time and time-outs; firearm cable locks to store firearms more safely where children live or play). The control group received usual care enhanced by a literacy-promotion attention-placebo handout. Randomization was performed at the practice level with computerized random numbers by the study biostatistician, who was blinded to the identity of the practices. After allocation, practices were alerted to their group assignment via a letter included with the training materials. Practices were blinded to the study hypotheses. Each practice had 1–10 providers, with a harmonic mean of 2 providers per practice. Because most practices had either one or two providers, the design at the practice level was not balanced. Each provider recruited 30 consecutive families that presented for a well-child visit for children ages 2–11 years. As a result of the cluster design, patients were clustered within provider, and providers were clustered within practice.
Parents were surveyed at baseline, 1 month, and 6 months. The main outcomes were change over time in self-reported media use < 120 minutes per day, use of timeouts for a disciplinary purpose, and the use of firearm cable locks. The study was approved by the Wake Forest University School of Medicine IRB, and the work was carried out in accordance with The Code of Ethics of the World Medical Association for experiments involving humans.
A subset of the entire sample that included all families recruited between 2002 and 2006 was used to calculate the ICC. There were 3,294 families in the selected sample. While the protocol required each provider to recruit 30 families, some providers stalled and recruited fewer. To avoid including providers who might have recruited a non-representative sample of families, we included only providers who recruited more than 25 families. Follow-up surveys were conducted by telephone by a professional survey center 1 and 6 months after the baseline survey, and we included in the final sample only families who responded to the follow-up surveys. The total number of families in the final sample was 2,649 (80.4% of the baseline sample), the number of providers was 90, and the number of practices was 68.
Parents/legal guardians completed surveys that gathered data such as demographics (e.g., age of child, number of children in the home, race/ethnicity of the child, parental home structure, maternal education), reported parental behaviors (e.g., media use and discipline practices), and parental attitudes toward these same topics. We first examined both the content of the survey questions and the distributions of the responses. We then selected a total of 10 questions from parental behaviors and attitudes for reporting ICCs. These questions all showed bell-shaped distributions in response patterns, and they also reflected the general scope of the survey questions.
Media-related behavior questions included: “When this child is at home, how many hours per day do they watch TV/videos or play computer games/gameboy?” This was broken down by hours on an average weekday and average weekend day. For the purposes of our analysis, we collapsed these data to reflect average media use time per day (including television, video, computer games, and electronic handheld devices). The survey presented three items assessing parental strategies for managing their child's media use. These questions asked how often, in the past month, the parent restricted use, explained content, or allowed unlimited media use. The response scale for each question ranged from "never" to "always" on a 4-point scale. Three media-related items assessed parental awareness of the potential negative outcomes of watching violent media: "watching violent TV programs makes children more afraid," "children's behavior is not influenced by what they see on TV," and "children who watch violent TV think real-life violence is normal behavior." These responses were rated on a 5-point scale from “strongly disagree” to “strongly agree.” We developed an awareness scale (range: 3 to 15; Cronbach’s alpha = 0.6) derived from the above three items. A higher value on the awareness scale indicated better understanding of the effects of exposure to violent media.
Questions about current discipline practices included: In the past month, “How often did you use time-outs or cool-down periods?" "How often have you yelled at this child?" "How often did you take away privileges (something this child enjoys)?" and "How often did you spank this child?" Response categories varied on a 4-point scale from "never" to "always."
The intraclass correlation is mathematically defined as the ratio of the between-cluster variance to the total variance for an individual family. When the sample is relatively homogeneous and there is little clustering effect, the between-cluster variance tends to be small, leading to a small ICC. At the practice level, which is the unit of randomization, ICC is defined as:
\[
\rho_{\text{pract}} = \frac{\sigma^2_{\text{pract}}}{\sigma^2_{\text{pract}} + \sigma^2_{\text{prov}} + \sigma^2_{\text{fam}}} \tag{1}
\]
where \(\sigma^2_{\text{pract}}\) is the component of variance between practices, \(\sigma^2_{\text{prov}}\) is the component of variance between providers within the same practice, and \(\sigma^2_{\text{fam}}\) is the variance between families within the same provider. Equation 1 represents the ICC for individuals within practices but belonging to different providers and different families. The standard error of the ICC estimate, which indicates the accuracy of the estimate \(\hat\rho_{\text{pract}}\) of the quantity in Eq. 1, is given by the following equation, originally formulated by Fisher and discussed in :
\[
SE(\hat\rho_{\text{pract}}) = (1-\rho_{\text{pract}})\bigl[1+(mk-1)\rho_{\text{pract}}\bigr]\sqrt{\frac{2}{mk(mk-1)(N-1)}} \tag{2}
\]
where m is the number of providers per practice, k is the number of families per provider, and N is the number of practices. When m and k are not constant, we use their harmonic means in the equation. Because only baseline values are used here, this approach to producing ICC estimates will be called baseline only.
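Given variance-component estimates, Eq. 1 and a Fisher/Swiger-type standard error can be computed directly. The sketch below is an illustration, not the study's code: the variance components are hypothetical, and the cluster sizes only roughly echo the study's (about 2 providers per practice, 26 or more families per provider, 68 practices).

```python
import math

# Baseline-only practice-level ICC (Eq. 1) and a Fisher/Swiger-type
# standard error (Eq. 2). All variance components are hypothetical.

def icc_practice(var_pract, var_prov, var_fam):
    """Ratio of between-practice variance to total variance."""
    return var_pract / (var_pract + var_prov + var_fam)

def se_icc(rho, m, k, n_practices):
    """Large-sample SE approximation; m*k families observed per practice."""
    mk = m * k
    return (1 - rho) * (1 + (mk - 1) * rho) * math.sqrt(
        2.0 / (mk * (mk - 1) * (n_practices - 1))
    )

rho = icc_practice(0.05, 0.10, 0.85)
print(round(rho, 3), round(se_icc(rho, m=2, k=26, n_practices=68), 4))
```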
When repeated measurements of an outcome variable are available, the variance between families within the same provider can further be decomposed into two separate components—the variance component of families within provider, and the variance component from repeated measurements over the same family. With repeated measurements, ICC is defined as:
\[
\rho_{\text{pract}} = \frac{\sigma^2_{\text{pract}}}{\sigma^2_{\text{pract}} + \sigma^2_{\text{prov}} + \sigma^2_{\text{fam}} + \sigma^2_{\text{rep}}} \tag{3}
\]
with the standard error associated with the estimate given by:
\[
SE(\hat\rho_{\text{pract}}) = (1-\rho_{\text{pract}})\bigl[1+(mkr-1)\rho_{\text{pract}}\bigr]\sqrt{\frac{2}{mkr(mkr-1)(N-1)}} \tag{4}
\]
where r is the number of occasions on which a family is measured. In Equation 3, \(\sigma^2_{\text{rep}}\) denotes the component of variance across repeated measurements. In the language of multi-level analysis, Eq. 3 contains four levels of data: practice, provider, family, and repeated measurements over family.
The ICCs for various levels of clustering are defined similarly. For example, with a single occasion of measurement for each family, the ICC at the provider level is given by:
\[
\rho_{\text{prov}} = \frac{\sigma^2_{\text{pract}} + \sigma^2_{\text{prov}}}{\sigma^2_{\text{pract}} + \sigma^2_{\text{prov}} + \sigma^2_{\text{fam}}} \tag{5}
\]
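The repeated-measurement ICC (Eq. 3) and the provider-level ICC differ only in which variance components enter the numerator and denominator; two families of the same provider share both the practice and the provider effects. A sketch with hypothetical variance components:

```python
# ICCs from the variance decomposition described above. The component
# values (practice, provider, family, repeated-measurement) are
# hypothetical numbers chosen for illustration.

def icc_pract_repeated(v_pract, v_prov, v_fam, v_rep):
    """Practice-level ICC under the four-level decomposition (Eq. 3)."""
    return v_pract / (v_pract + v_prov + v_fam + v_rep)

def icc_provider(v_pract, v_prov, v_fam):
    """Provider-level ICC: same-provider families share practice and
    provider effects, so both appear in the numerator."""
    return (v_pract + v_prov) / (v_pract + v_prov + v_fam)

print(round(icc_pract_repeated(0.05, 0.10, 0.55, 0.30), 3))
print(round(icc_provider(0.05, 0.10, 0.85), 3))
```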
The variance components for both the baseline-only and repeated-measurement ICCs were estimated using multi-level linear models, implemented through PROC MIXED in SAS (SAS Institute, Inc.). The program code for both approaches is included in Appendix A to illustrate the multi-level linear model. To accommodate possible differences in outcome measures over time, the estimation of the variance components under repeated measurement needs to be adjusted. The adjustment was made by adding time as a fixed effect in PROC MIXED. In effect, the adjustment “de-means” observations across time points to derive proper variance components. The ICCs can therefore be viewed as measures of cluster strength defined within the framework of a hierarchical model with covariates. Although PROC MIXED is computationally less efficient than PROC NESTED, another SAS procedure for estimating variance components from nested designs, it has the advantage of handling fixed effects and unbalanced designs. PROC MIXED also subsumes the variance-component estimation procedure PROC VARCOMP, which estimates only simple random effects. We used PROC MIXED for both the three-level (practice, provider, and family) and four-level (practice, provider, family, and repeated measurements) variance-component analyses, such as the one entailed by Eq. 1.
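The “de-meaning” interpretation of the fixed time effect can be illustrated directly: subtracting each occasion's mean leaves occasion-centered residuals, from which the remaining variance components are estimated. A toy sketch with hypothetical values:

```python
from statistics import mean

# Toy data: obs[t] = measurements collected at occasion t (months 0, 1, 6).
# All values are hypothetical; three families measured at each occasion.
obs = {0: [3.0, 4.0, 5.0], 1: [5.0, 6.0, 7.0], 6: [2.0, 3.0, 4.0]}

# Subtract each occasion's mean ("de-meaning" across time points).
demeaned = {t: [y - mean(ys) for y in ys] for t, ys in obs.items()}

for t, ys in demeaned.items():
    assert abs(mean(ys)) < 1e-12  # each occasion now has mean zero
```

With balanced data, this centering is equivalent to including time as a fixed effect before estimating the random-effect variances.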
The commonly used confidence interval derived from Eq. 4 rests on several statistical assumptions: that the sample is sufficiently large, that the ICC is not close to one, and that the cluster randomized trial has a balanced design. To validate Eq. 4 and its counterpart Eq. 2 against these assumptions, we also report bias-corrected bootstrap confidence intervals for the ICCs. The bias-corrected bootstrap first resamples from the data set, with replacement, to create a bootstrap sample. Each bootstrap sample is then analyzed exactly as one would analyze the original data set. This analysis was applied to 200 bootstrap samples to generate an empirical distribution of the estimates. The differences between the 5th and 50th percentiles of the bootstrap estimates, and between the 50th and 95th percentiles, were used to estimate the left and right portions of the 95-percent confidence interval, respectively. Because of the unbalanced nature of the data, a bootstrap sampling scheme that samples according to the nested hierarchy (e.g., first from practices, then from providers within practice) would produce samples of different sizes. Therefore, a fixed sample size of N = 2,649 was drawn directly for each of the 200 bootstrap samples. Simulation experiments on ICC estimation have reported that the two sampling schemes produce similar results when ICC values are not too high, which is the case in the present study. Besides providing nonparametric confidence intervals, the bootstrap method also served as a descriptive tool for assessing the stability and replicability of the results across repeated samplings. Prior work on applications of bootstrap methods to ICC analysis suggests that the nonparametric method is reliable provided the sample size is not too small.
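The resampling scheme can be sketched as follows. The statistic here is a plain mean standing in for the ICC estimator, and the data values are hypothetical; the point is the fixed-size resampling-with-replacement loop and the percentile read-off.

```python
import random
from statistics import mean

# Percentile-bootstrap sketch: draw a fixed-size sample with replacement,
# recompute the statistic, repeat, then read percentiles of the bootstrap
# distribution. A bias-corrected version would additionally shift the
# percentiles by the estimated median bias.

def bootstrap_ci(data, stat, n_boot=200, lo_pct=5, hi_pct=95, seed=42):
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = reps[int(n_boot * lo_pct / 100)]
    hi = reps[min(int(n_boot * hi_pct / 100), n_boot - 1)]
    return lo, hi

data = [0.5, 1.2, 0.8, 1.9, 1.1, 0.7, 1.4, 1.0, 0.9, 1.3]
lo, hi = bootstrap_ci(data, mean)
print(round(lo, 3), round(hi, 3))
```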
Table 1 shows the characteristics of the sample by intervention status. Tables 2 and 3 show the ICCs, the asymptotic confidence intervals, and the bias-corrected bootstrap confidence intervals for repeated measurements at the practice and provider levels, respectively. ICC values range from 0.012 to 0.110 at the practice level and from 0.018 to 0.110 at the provider level. Tables 4 and 5 show the ICCs, the asymptotic confidence intervals, and the bias-corrected bootstrap confidence intervals for baseline-only data at the practice and provider levels, respectively. For the baseline-only approach, ICC values range from 0.0007 to 0.141 at the practice level and from 0.017 to 0.160 at the provider level.
We compared both the ICC estimates and their associated standard error estimates across the items for the repeated-measurement approach and the baseline-only approach, examining them at three levels: patient-family, provider, and practice. Because we pooled together intervention and control groups for estimating ICCs, it is important to verify that the characteristics of the two groups are similar. Table 1 shows that the summary statistics are comparable between the intervention and control groups, even though some variables have significant p-values, which are due to the relatively large sample sizes.
Except in the case of media use, the ICC values for the repeated-measurement and baseline-only approaches generally agreed both at the practice level (Tables 2 and 4) and at the provider level (Tables 3 and 5). We noticed that the variance component from repeated measurements for media use is quite large (12.2 vs. 0.53 for practice) because of a substantial amount of variation in reported media use within the family over the 6-month period of the study. Also, with the exception of media use, all of the repeated-measurement ICC estimates, at both the provider and practice levels, fall within the 95% confidence intervals calculated from the baseline-only approach using Eq. 2. If we assume that the repeated-measurement ICC values are close to the population ICC values, this would indicate that Eq. 2 is a good approximation of the true standard error.
Compared with the baseline-only approach, the confidence intervals for the ICCs were narrower, indicating greater precision, when repeated measurements were used. At the practice level, the mean width of the ICC confidence interval using repeated measurements is 0.020 (Table 2), while that from baseline-only data is 0.048 (Table 4), a 58% reduction in width. At the provider level, the reduction is 42% (mean widths of 0.027 and 0.048, respectively, from Tables 3 and 5). In terms of the ratio of standard error to ICC value, which indicates the relative margin of error (smaller implies higher precision), the mean values (SD) for repeated measurements and baseline only are, respectively, 0.189 (0.03) and 0.260 (0.09) for provider-level ICCs. For practice-level ICCs, the values are, respectively, 0.223 (0.178) and 0.328 (0.147).
The results from the bootstrap 95% confidence intervals lend some support to the validity of Eqs. 2 and 4 for calculating standard errors. For example, for repeated-measurement practice-level ICCs, the average absolute difference (SD) between the lower confidence limits for the asymptotic and bootstrap methods is 0.0046 (0.0027). For the upper confidence limits, the average absolute difference is 0.0044 (0.0030). Generally, the bootstrap method produces slightly wider confidence intervals. The bootstrap method also shows that, with the exception of media use, the results from the repeated-measurement and baseline-only approaches are both rather stable. While the bootstrap confidence intervals agree reasonably well with the asymptotic calculation, especially for the repeated-measurement ICCs, the bootstrap methods tend to produce asymmetric confidence intervals. Figure 1 shows the asymmetry of the widths of the left and right bootstrap confidence intervals for practice-level ICCs from repeated measurements. The corresponding bootstrap confidence intervals for the baseline-only method are shown in Fig. 2. Not surprisingly, the right confidence intervals tend to be wider because ICCs are bounded by zero on the left. It is unexpected, however, that the symmetrical baseline-only asymptotic confidence intervals contain more repeated-measurement ICCs than the asymmetrical bootstrap confidence intervals do. The bootstrap confidence intervals contain 8 of the 10 repeated-measurement ICCs at the practice level and 6 of the 10 at the provider level, while the asymptotic confidence intervals contain 9 of 10 at each level.
Because of attrition, follow-up measurements may be missing. Nonresponse to individual items was also present. To produce comparable estimates for the baseline-only and repeated-measurement approaches, in this paper we report results only for families that had data at all three time points. Within this “complete case” sample, nonresponse rates varied across items, and the average rate was approximately 8%. The computational procedure in PROC MIXED does not delete subjects with partially missing values; it handles missing values under the assumption that data are missing at random. This flexibility preserves partially missing cases across repeated measurements and increases the power of the proposed procedure. For data in which a not-missing-at-random mechanism might be at work, the estimation of ICC will require further investigation.
A limitation of this study is that some of the questions reported in this paper contain ordinal, not continuous, response categories. Although we included only questions whose distributions are bell-shaped, the procedure should be viewed as an approximate estimation method. To treat ordinal data rigorously, one possible solution is to develop a generalized nonlinear model framework and use programs such as PROC NLMIXED. For ICC estimation with binary data, alternative procedures are available for cross-sectional data and could be extended to repeated measurement.
With the growth of cluster randomized trials conducted in practice-based research networks, the estimation and reporting of intraclass correlations will play an increasingly important role in study design and data analysis. From an evidence-based approach that aims to improve the quality of reporting, the Consolidated Standards of Reporting Trials (CONSORT) statement recommends standard reporting protocols, and the ICC should clearly be part of such trial reports. Even more importantly, designing a group-randomized controlled trial with sufficient power to test hypotheses requires a reasonable estimate of the ICC [8,23]. We report here ICCs generated from national data derived from an office-based clinical trial in the United States. Previously published ICC estimates for pediatric office-based visits are, to our knowledge, rare. In addition, except in limited settings [24,25], the use of repeated measurements to improve ICC estimates has not been fully studied. Based on the analysis of a relatively large U.S. national sample with repeated measurements, we expect that both the ICC estimates and the methods for improving their precision reported in this paper are generalizable.
In conclusion, the repeated measurement approach provides ICC estimates with substantially narrower confidence intervals. The asymptotic standard error estimates for both repeated measurement and baseline-only tend to be slightly smaller, as evaluated against the nonparametric bootstrap procedure. The ICC values derived from baseline-only and repeated measurement estimation methods generally agree well. For variables that have a large variation over repeated measurements, the ICC estimates using repeated-measurement and baseline-only data can be very different.
This study was supported by a grant from the National Institute of Child Health and Human Development (NICHD) (R01 HD 42260), the Agency for Healthcare Research and Quality (AHRQ), The Robert Wood Johnson Generalist Faculty Scholars Program (RWJGFS), and the American Academy of Pediatrics’ Friends of Children Fund.
We especially appreciate the efforts of the PROS practices and practitioners. The pediatric practices or individual practitioners who enrolled participants in this study are listed here by AAP Chapter: Alabama: Pediatric Care Group (Montgomery); Alaska: Anchorage Pediatric Group, LLC (Anchorage), Practice of Joy Neyhart, MD (Juneau); Arizona: Orange Grove Pediatrics (Tucson), Tanque Verde Pediatrics (Tucson); California-1: Practice of Arthur S Dover, MD (Freedom), Sierra Park Pediatrics (Mammoth Lakes), Palo Alto Medical Foundation (Palo Alto), Pediatric & Adolescent Medical Associates of the Pacific Coast, Inc (Salinas); California-3: East County Community Clinic – Lakeside (Lakeside), Pediatric Medical Associates of Tri-City, Inc (Vista), La Jolla Pediatrics (La Jolla); California-4: Edinger Medical Group & Research Center, Inc (Fountain Valley); Colorado: Community Health Services (Commerce City), Lamar Pediatrics (Lamar), Rocky Mountain Health Centers, North (Denver); Connecticut: Mauks Koepke Medical, LLC (Danbury), Jeff Cersonsky, MD (Southbury), Pediatric Associates of Connecticut, PC (Waterbury); Florida: Atlantic Coast Pediatrics (Merritt Island), Family Health Center East & Oviedo Children's Health Center (Orlando), Heartland Pediatrics of Lake Placid (Lake Placid); Georgia: The Pediatric Center (Stone Mountain), Practice of Victor Lui, MD (Chamblee), Practice of Nandlal Chainani, MD (Ocilla), Snapfinger Woods Pediatric Associates, PC (Decatur), Gwinnett Pediatrics & Adolescent Medicine (Lawrenceville); Hawaii: Island Youth Heart and Health Center (Hilo), Children's Medical Association Inc (Aiea), Medicine Pediatrics Associates (Honolulu); Iowa: Children's Hospital Physicians (Des Moines); Illinois: Yacktman Children's Pavillion (Park Ridge), S.W. 
Pediatrics (Orland Park), Stroger Hospital of Cook County (Chicago), Macomb Pediatrics, SC (Macomb); Indiana: Georgetown Pediatrics (Indianapolis), Jeffersonville Pediatrics (Jeffersonville); Kansas: Ashley Clinic (Chanute); Louisiana: Carousel Pediatrics (Metairie), Shalom Clinic for Children (Natchitoches), The Baton Rouge Clinic, AMC (Baton Rouge); Massachusetts: Burlington Pediatrics (Burlington), Holyoke Pediatric Associates (Holyoke), Pediatric Associates of Norwood (Franklin), Mary Lane Pediatric Associates (Ware); Maryland: Practice of Steven E Caplan, MD, PA (Baltimore), Dundalk Pediatric Associates (Baltimore), Practice of Ralph Brown, MD (Baltimore); Maine: Maine Coast Memorial Hospital (Ellsworth); Michigan: Pediatric & Adolescent Medicine (Bay City), Pediatric Health Care (Sterling Heights); Minnesota: Brainerd Medical Center, PA (Brainerd), Lakeview Clinic - Watertown Pediatrics (Watertown); Missouri: Children's Mercy Hospital Pediatric Care Center (Kansas City), Tenney Pediatric and Adolescent LLC (Kansas City); North Carolina: Guilford Child Health, Inc – Greensboro (Greensboro), Goldsboro Pediatrics, PA (Goldsboro), Aegis Family Health Center - Winston East Pediatrics (Winston-Salem), Guilford Child Health, Inc - High Point (High Point); North Dakota: Altru Clinic (Grand Forks); New Hampshire: Foundation Pediatrics (Nashna); New Jersey: Lourdes Pediatric Associates (Camden), Chestnut Ridge Pediatric Associates (Woodcliff Lake); New York-1: Elmwood Pediatric Group (Rochester), Lewis Pediatrics (Rochester), United Medical Associates Pediatrics (Binghamton), Wayne Medical Group (Williamson); New York-3: Pediatric Primary Care-Montefiore Medical Center (Bronx), Cardinal Mc Closkey Services (Bronx), Pediatric Practice Bronx-Lebanon Hospital (Bronx), Westchester Avenue Medical and Dental Center (Bronx), Bronx Lebanon Pediatric Clinic - Third Avenue (Bronx); New Mexico: Presbyterian Family Healthcare - Rio Bravo (Albuquerque), Santa Fe Pediatric 
Associate, PC (Santa Fe); Ohio: Oxford Pediatrics & Adolescents (Oxford), Pediatric Associates of Lancaster (Lancaster); Oklahoma: Pediatric & Adolescent Care, LLP (Tulsa), OK State University (OSU) - Center for Health Sciences (Tulsa); Oregon: NBMC (Coos Bay); Ontario: Richard J MacDonald, MD (Oakville, Ontario); Pennsylvania: Pennridge Pediatric Associates (Sellersville), Buckingham Pediatrics (Buckingham), Pediatric Practices of Northeastern Pennsylvania (Honesdale), Laurel Health Center – Blossburg (Blossburg); Quebec: Clinique Enfant-Medic (Dollard des Ormeaux); Puerto Rico: Practice of Dra Ethel Lamela, MD (Isabela). Rhode Island: Northstar Pediatrics (Providence); South Carolina: Edisto Pediatrics (Walterboro), Oakbrook Pediatrics (Summerville), Palmetto Pediatrics & Adolescent Clinic, PA (Columbia), Barnwell Pediatrics, PA (Barnwell); Tennessee: ETSU Physicians & Associates (Johnson City), Pediatric Consultant, PC (Memphis), Memphis and Shelby County Pediatric Group (Memphis); Texas: The Pediatric Clinic (Greenville), Winnsboro Pediatrics (Winnsboro), Su Clinica Familiar (Harlingen), Parkland Health & Hospital System (Dallas), Child Wellness Center (Horizon City), Danette Elliott-Mullens, DO, PA (New Braunfels); Utah: University of Utah Hospitals & Clinics (Park City), Utah Valley Pediatrics, LC (American Fork), University of Utah Health Sciences Center (Salt Lake City), Willow Creek Pediatrics – Draper (Draper), IHC Health Center – Memorial (Salt Lake City); Virginia: Tidewater Pediatric Consultants, PC (Virginia Beach), Hampton Roads Pediatrics/CMG (Hampton), Alexandria Lake Ridge Pediatrics (Alexandria), Pediatrics of Arlington, PLC (Arlington), Fishing Bay Family Practice (Deltaville); Vermont: University Pediatrics, UHC Campus (Burlington), Pediatric Medicine (South Burlington), Brattleboro Primary Care (Brattleboro), University Pediatrics (Williston), Practice of Rebecca Collman, MD (Colchester), Springfield Pediatric Network (Springfield); 
Washington: Harbor Pediatrics (Gig Harbor), Central Washington Family Medicine (Yakima); Wisconsin: 16th Street Community Health Center (Milwaukee), Beloit Clinic SC (Beloit), Gundersen Clinic – Whitehall (Whitehall), Ministry Medical Group – Woodruff (Woodruff); West Virginia: Grant Memorial Pediatrics (Petersburg); Wyoming: Jackson Pediatrics, PC (Jackson).