The majority of research examining college drinking utilizes self-report data, and collateral reports have been used to verify participants' self-reported alcohol use.
This meta-analytic integration examined the correspondence of 970 collateral and participant dyads in the college setting.
Results indicated that there is little bias (mean difference) between collateral estimates of participant drinking and participants' self-reports. A cumulative meta-analysis revealed that this (null) effect was stable and unlikely to be altered by subsequent research or the existence of unpublished studies. Analysis of the agreement between collateral and participant estimates (measured by intraclass correlation coefficients; ICCs) revealed moderate levels of agreement (mean ICC = 0.501). Examination of predictors of both bias and agreement in collateral and participant reports indicates possible intentional and protective underreporting on the part of the collaterals. Ways to reduce this bias are discussed, along with the value of using collaterals to verify participant self-report in the college setting.
IT WAS ONCE assumed that adults would underreport their drinking out of concern for social or legal consequences, and therefore their self-reports could not be trusted (Babor et al., 1987; Watson et al., 1984). As a result, collateral informants were employed to verify the individuals' self-reports of drinking. Research participants nominate collateral informants, usually friends or family members, to provide corroboration for self-reported alcohol use and/or problems. Ideally, the collateral has frequent contact with the participant, especially in drinking situations. Statistically significant positive correlations have been found between collateral estimates and participants' self-reported substance use (Babor et al., 1987; O'Farrell and Maisto, 1987), although poor collateral and participant agreement has been found in those with comorbid mental disorders (Stasiewicz et al., 2008). As more research indicates that self-reported drinking is often accurate, the use of collaterals has been discouraged as a way to obtain accurate reports of participant alcohol use (Babor and Del Boca, 1992; Babor et al., 1987, 2000; Maisto et al., 1990).
Self-report is the most common method of data collection with college students, and researchers continue to use collateral informants to confirm self-reports of drinking behavior of research participants. As in the adult literature, collaterals are most often used in the context of randomized clinical trials to assess the veracity of the self-report of participants who have received treatment. The primary downside of the use of collaterals is the significant time and effort it takes to recruit, track, and contact them in order to obtain information about the participants' drinking. Therefore it is important to determine the extent to which estimates of drinking provided by collaterals in college drinking research correspond with the self-reported alcohol use of participants. This correspondence can be expressed in a number of ways. In the present effort, we define bias as the mean difference between participant and collateral reports, and we define agreement as the reliability of the estimates that are provided by collaterals and participants (how much their estimates agree). It is important to evaluate both as a sample could exhibit no bias (no mean differences) with little or no agreement between collateral and participant reports.
College students are a unique population from which to collect collateral reports, as there may be little motivation for college students to misrepresent their alcohol use. Heavy drinking is perceived by many students to be an integral part of the college culture (Borsari and Carey, 2001, 2006; Perkins, 2002), and is often highly salient at social functions and sporting events. Therefore it is unlikely that students view their alcohol use as socially undesirable or embarrassing. This is much different for adults and adolescents, for whom heavy drinking is more likely to be seen as atypical and problematic. As a result, adults and adolescents may be more motivated to underreport their use (especially in treatment samples) than college students. Indeed, previous research with college collaterals (Laforge et al., 2005) found no evidence of systematic underreporting of alcohol use. Furthermore, there was no evidence that the use of a “bogus pipeline” (where the student was told that a collateral would verify his/her self-report, but no verification actually was carried out) enhanced the accuracy of college students' self-reports. There may, however, be situations in which collateral informants would be useful to corroborate self-report in the college setting. For example, students who have received judicial sanctions (Barnett and Read, 2005) or who are going to receive unwanted treatment as a result of their responses (as in stepped care; Borsari and O'Leary Tevyaw, 2005; Sobell and Sobell, 2000) may intentionally underreport their alcohol use. Recent research indicates that mandated students intentionally underreported their alcohol use prior to receiving an intervention due to concerns about how the information would be used (Walker and Cosden, 2007).
Given the amount of research conducted with college students, we examined the research literature to further our understanding of the factors associated with collateral and participant correspondence and to determine the situations in which collateral reports can be a valuable option.
Several predictors in the adult collateral literature have been hypothesized to influence the correspondence between collateral and participant reports. Therefore, we selected 7 predictors that may moderate the magnitude of collateral and participant correspondence (reflected in smaller bias effect sizes and larger agreement effect sizes).
The frequency of contact the collateral has with the participant is an important factor in enhancing the rates of correspondence in adults (Maisto et al., 1979; O'Farrell et al., 1984) and college students (Laforge et al., 2005). Therefore, studies with collaterals who report frequent contact with the participants should produce smaller bias and greater agreement in collateral and participant correspondence.
The collateral's relationship with the participant may also influence the accuracy of their estimates. Previous research with college students revealed that collaterals who were close friends of the participant demonstrated greater levels of correspondence than collaterals who were not close friends (Laforge et al., 2005). Therefore, studies that used collaterals who were close friends should exhibit smaller bias and greater agreement.
The confidence of collaterals in their estimates of participant drinking has also been linked to collateral/participant correspondence in adults (O'Farrell et al., 1984) and college students (Laforge et al., 2005). Collaterals may have frequent contact with the participant but not in drinking situations, or the collateral may not pay attention to the participant's alcohol use. Therefore, studies wherein collaterals expressed greater confidence in their reports should exhibit smaller bias and greater agreement.
The studies included in this meta-analysis included a wide variety of drinking behaviors, and the type of drinking behavior the collateral is asked to report may also influence his/her accuracy in corroborating the participant's drinking. Therefore, studies that used more concise questions (e.g., the highest number of drinks on one specific occasion) should report smaller bias and greater agreement than studies that used more vague questions (e.g., frequency of drinking alcohol over the past 30 days).1
The drinking behaviors reported by the participants also fall into 3 commonly assessed categories: typical quantity of alcohol consumed, the frequency of alcohol consumption, and heavy drinking episodes (e.g., peak drinks in past month, binge drinking). These categories may differ in their salience for the collateral, which would influence his/her accuracy in estimating the participant's drinking. Therefore, we hypothesized that heavy drinking would be the most salient of these categories and would exhibit smaller bias and greater agreement.
Whether the participant is motivated to misrepresent his or her use has been recognized as affecting the validity of participant self-report (Babor and Del Boca, 1992). Two of the eight studies in this meta-analytic integration recruited mandated students who had been referred to the administration for violating campus alcohol policy (Barnett et al., 2007; Borsari and Carey, 2005). It is possible that mandated students may be more motivated to underreport their alcohol use as they had already been under the scrutiny of their school administration for their drinking (e.g., Walker and Cosden, 2007). Therefore studies that included mandated students should exhibit greater bias and less agreement.
Men and women may use alcohol in the college drinking environment differently. Specifically, alcohol use may play a more central role in men's social interactions than in women's (Borsari and Carey, 2006). The more men interact with peers, the more drinking they report (e.g., Dorsey et al., 1999). In contrast, women may not drink as frequently or as heavily in social situations (Dowdall et al., 1998). As a result, women may have fewer opportunities to observe their female friends' alcohol use, making it more difficult for collaterals to estimate the drinking of their female friends. There is also a lesser degree of social acceptability of alcohol use (especially heavy drinking) for women (e.g., Ashmore et al., 2002; Young et al., 2005), which may lead to conscious (protective) or sub-conscious bias in collateral reports. Therefore, we predict that there will be greater bias and less agreement for collateral estimates of female participants' drinking.
The present effort is a meta-analytic integration of collateral–participant correspondence in their estimates of alcohol use in the college setting. Using traditional meta-analytic techniques (Mullen, 1989; Rosenthal, 1991), we examined the extent to which collateral and participant self-reports agreed. We also present a cumulative meta-analysis (Mullen et al., 2001) to determine whether the set of effect sizes gathered from research conducted to date are a sufficient and stable representation of collateral reports, which may help determine the extent to which additional resources should be devoted in collecting data from collateral informants. Finally, using an expanded dataset, we evaluate 7 factors that may moderate correspondence between participants and collaterals.
Standard literature search techniques were utilized to conduct an exhaustive search for studies that utilized collateral informants for drinking behaviors in a college setting. Online computer searches of databases such as PubMed and PsycINFO used combinations of the words college, collateral, alcohol, substance use, and drinking. We also used ancestry and descendancy approaches, together with the invisible college [researchers active in the domain, see Mullen (1989)]. Data available as of September 2008 were eligible for inclusion.
Studies were included if they met the following criteria: first, participants had to be college students. Second, the study had to utilize collateral informants who provided information on the participants' drinking. Third, studies and/or primary authors had to provide sufficient information from which a test of the collateral and participant discrepancy in alcohol use could be derived.2 The effect sizes for bias in this analysis were based on paired sample t-tests, where an effect of zero indicated that there was no difference between the participants' and the collaterals' reports. A positive test indicated that the collaterals provided lower estimates of the participants' drinking than the participants did. A negative test indicated that the collaterals provided higher estimates of the participants' drinking than the participants did. The effect sizes for agreement were based on intraclass correlation coefficients (ICC) calculated and provided by the primary researchers for the purpose of this meta-analytic integration.
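The sign-preserving conversion from a paired-sample t-test to an r-type effect size can be sketched as follows. This is a generic illustration with hypothetical values, not the authors' actual code:

```python
import math

def paired_t_to_r(t: float, df: int) -> float:
    """Convert a paired-sample t statistic to an r-type effect size.

    The sign of t is preserved: a positive value indicates the
    collateral reported *less* drinking than the participant did,
    and a negative value indicates the collateral reported more.
    """
    return math.copysign(math.sqrt(t**2 / (t**2 + df)), t)

# Hypothetical example: t = 1.5 with 29 degrees of freedom
r = paired_t_to_r(1.5, 29)
```

An effect of zero (t = 0) maps to r = 0, matching the "no difference" case described above.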
In the course of conducting this literature search, 11 published and unpublished articles, reports, and theses were considered for inclusion (Barnett et al., 2007; Borsari and Carey, 2005; Carey et al., 2006; Curtin et al., 2001; Greaves, 1996; Hagman et al., 2007; Laforge et al., 2005; Marlatt et al., 1998; O'Leary Tevyaw et al., 2007; Stacy et al., 1985; Tevyaw et al., 2005). One study was not included in this meta-analysis because the document did not provide enough information to derive an effect size, and the original data files had been lost (Greaves, 1996). A second study (O'Leary Tevyaw et al., 2007) did not collect participant and collateral information on the same metric, making it impossible to extract an effect size. The last study that was omitted from this analysis (Tevyaw et al., 2005) recruited collaterals who were required to attend a two-session brief motivational intervention or standard education session with the participant. Because this study was qualitatively different from the others in that it involved collaterals in the therapeutic context, we excluded it from the meta-analysis.
The remaining 8 studies were first analyzed as wholly independent tests of the bias hypothesis, coded separately by gender where possible. These efforts yielded 14 separate tests of collateral and participant correspondence, representing the responses of 970 collateral–participant dyads. The primary authors were contacted in order to get separate hypothesis tests of bias and agreement by participant gender for each study. All authors provided this information with 2 exceptions: one was unable to recover the data to provide these estimates (Stacy et al., 1985); and another could only recover a portion of the data from the original report (Marlatt et al., 1998). Thus, the meta-analysis of wholly independent samples was compiled by aggregating across the individual nonindependent hypothesis tests using the most conservative degrees of freedom from each nonindependent test for the aggregate statistic.
A cumulative meta-analysis (Muellerleile and Mullen, 2006) was conducted on the same data in order to determine whether additional research would be necessary to demonstrate bias; that is, whether additional research would be necessary to demonstrate that participant–collateral reports differ from one another. Following the procedures outlined in Mullen and colleagues (2001), we performed new meta-analyses each year (or wave) that new hypothesis tests were added to the database in order to demonstrate the sufficiency and stability of the database. In this case, sufficiency would indicate that no additional evidence is required to establish the existence of an effect, and stability indicates that the effect size is unlikely to change with the addition of new studies.3
The hypothesis tests for the cumulative meta-analysis on bias are described in Table 1, which represents the individual hypothesis tests of participant–collateral bias and agreement collapsed across nonindependent tests of the hypothesis. Therefore, the effect size statistics are presented as r (the weighted average of the individual nonindependent tests presented in Table 2). All hypothesis tests were coded for direction of effect (where + = the participant's self-reported use was higher than that of the collateral, and − = the participant's self-reported use was lower than that of the collateral) and using the smallest number of collateral/participant pairs (by gender) within each study.
All analyses were conducted using Mullen's Advanced BASIC meta-analytic database management system (Mullen, 1989). The significance level of an effect is provided by Z, or standard normal deviate, and its associated p-value; ZFisher is used as an indicator of effect size; and relationships between predictors and effect sizes are provided by the correlation coefficient r. BASIC provides fixed-effects analysis of the data, the underlying assumption of which is that the studies sample from a single distribution such that there is one true effect underlying the analysis. This is in contrast with a random-effects meta-analysis which assumes that the effects represent a random selection of possible effects (Schulze et al., 2003). We restricted our analysis in such a way that the studies are relatively uniform in the means by which the participant–collateral differences were collected; if we had been more inclusive in these criteria, a random-effects analysis may have been more appropriate.
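A minimal sketch of the fixed-effects aggregation described above (Fisher transform, n − 3 weights, the combined standard normal deviate Z, and the diffuse comparison statistic) is shown below. This is an illustrative reconstruction of the standard formulas, not the BASIC program itself:

```python
import math

def fisher_z(r):
    """Fisher r-to-z transform."""
    return 0.5 * math.log((1 + r) / (1 - r))

def fixed_effects_meta(rs, ns):
    """Fixed-effects aggregation of correlations, weighting each
    Fisher-transformed effect by n - 3 (its reciprocal variance)."""
    zs = [fisher_z(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # Standard normal deviate (Z) for the combined effect
    z_stat = z_bar * math.sqrt(sum(ws))
    # Diffuse comparison (homogeneity): chi-square on k - 1 df
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return z_bar, z_stat, q
```

When all studies share a single true effect (the fixed-effects assumption), q is expected to stay near its chi-square reference distribution.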
The primary level data were collected at nine 4-year institutions (5 private, 4 public) with participants aged 17 to 25 years (mean age: 19.52 years). Nearly all (95%) collaterals were college students who were the same gender as the participant. Six of the studies delivered brief alcohol interventions to the participants (Barnett et al., 2007; Borsari and Carey, 2005; Carey et al., 2006; Curtin et al., 2001; Laforge et al., 2005; Marlatt et al., 1998), whereas two of the studies explicitly assessed collateral–participant bias (Hagman et al., 2007; Stacy et al., 1985).4,5
In order to examine the predictors of participant–collateral bias and agreement, it was necessary to re-analyze the data from these 8 studies such that each study could contribute multiple tests of the bias hypothesis (Table 2) and the agreement hypothesis (Table 3), a procedure consistent with recommended meta-analytic methods (Lipsey and Wilson, 2001; Mullen, 1989; Schulze et al., 2003).6 The meta-analytic conversions of the bias effect sizes involved standard weighting procedures (i.e., ni − 3) (Mullen, 1989; Rosenthal, 1991), but the conversions of ICC to Fisher's Z were somewhat different (McGraw and Wong, 1996). Although the ICC to ZFisher conversion for the particular type of ICC required to assess agreement in this case rendered an equation that was equivalent to an r to ZFisher conversion, the meta-analytic aggregation required weighting by the reciprocal of the appropriate variance (in this case, the appropriate weight was ni − 2).
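The ICC aggregation can be sketched in the same way, using the reciprocal-variance weight of n − 2 noted above. Again, this is an illustrative reconstruction rather than the authors' code:

```python
import math

def icc_to_fisher_z(icc):
    """For the agreement ICC used here, the transform reduces to the
    ordinary r-to-z equation (McGraw and Wong, 1996)."""
    return 0.5 * math.log((1 + icc) / (1 - icc))

def aggregate_iccs(iccs, ns):
    """Weight each transformed ICC by the reciprocal of its variance,
    which in this case corresponds to n - 2, then back-transform."""
    zs = [icc_to_fisher_z(i) for i in iccs]
    ws = [n - 2 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return math.tanh(z_bar)  # inverse Fisher transform
```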
When possible, each hypothesis test was coded for each of the 6 predictors (with tests conducted separately for male and female participants). Contact was defined as the percentage of the collaterals in the study who reported seeing the participant “every day” or “nearly every day.” The relationship between the collateral and participant was the percentage of participants who reported being a “close” or “best” friend. This predictor was not significantly correlated with contact. The collaterals' confidence was obtained by calculating the percentage of participants in each study who reported being “very” or “mostly” confident in the estimates they provided. Drinking specificity addressed methodological features of the primary studies' examination of drinking behavior. Across the 8 studies, collaterals were asked to report on 7 different drinking behaviors: drinks per week; frequency of drinking alcohol; heavy drinking episodes (defined as 5 or more drinks in one sitting for men, 4 or more for women; Wechsler et al., 1995); typical number of drinks per occasion; highest number of drinks on one occasion; frequency of drinking game participation; and drinking days per week. An additional 6 drinking questions were asked of collaterals in 2 studies (Hagman et al., 2007; Stacy et al., 1985): the frequency and typical quantity of beer, wine, and hard liquor consumed by the participants. Three judges were asked to rate all 13 drinking behaviors on a scale of 0 (concise: defined as “specific, precise, explicit”) to 100 (vague: defined as “not precisely defined, determined, or distinguished”). Reliability of the judges' ratings was good (intraclass correlation coefficient = 0.79). The drinking specificity predictor was the mean of the judges' ratings.
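The judges' reliability figure is an intraclass correlation for absolute agreement. A minimal sketch of the single-rater, two-way random-effects form, ICC(A,1) of McGraw and Wong (1996), follows; it assumes a complete targets × raters matrix and is an illustration, not the authors' code:

```python
import numpy as np

def icc_agreement(ratings):
    """ICC(A,1): two-way random effects, absolute agreement, single
    rater (McGraw and Wong, 1996).  `ratings` is an
    n_targets x k_raters array with no missing cells."""
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)   # one mean per rated target
    col_means = ratings.mean(axis=0)   # one mean per judge
    # Mean squares from a two-way ANOVA without replication
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)
    resid = ratings - row_means[:, None] - col_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Unlike a consistency ICC, this form penalizes a judge who rates everything systematically higher than the others, which is the appropriate behavior when absolute agreement is the question.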
The drinking behaviors predictor was created by categorizing each of the drinking behaviors into those that assessed quantity (e.g., typical number of drinks per occasion), frequency (e.g., frequency of drinking alcohol), or heavy drinking (e.g., binge drinking). Finally, studies that included mandated students were dummy coded (mandated = 1).
Table 1 displays the results of the 14 wholly independent hypothesis tests of bias obtained by averaging the individual effect sizes presented in Table 2. With each hypothesis test weighted by the sample size, there was a nonsignificant, practically null effect (p = 0.290) of difference between collateral and participant reports. Therefore, there appeared to be no practical discrepancy between participant and collateral estimates of alcohol use. A diffuse comparison of these effect sizes (or homogeneity test; p = 0.164) indicates that the effect sizes reported in Table 1 are not significantly different from one another. Table 1 also displays the results of the 14 wholly independent hypothesis tests of agreement obtained by averaging the individual effect sizes presented in Table 3. With each hypothesis test weighted by the reciprocal variance for the ICC, there was a significant, moderate effect (average ICC = 0.537, ZFisher = 0.601 ± 0.065, p < 0.001) of agreement between collateral and participant reports. A diffuse comparison of these effect sizes (p = 0.830) indicates that the effect sizes reported in Table 1 are not significantly different from one another. For both bias and agreement, a funnel plot of the effect sizes against the sample sizes, though sparse, revealed no evidence of any publication bias that would produce results different from those reported above.
The combined results of the 102 tests of the magnitude of collateral/participant bias, with each hypothesis test weighted by its reciprocal variance estimate (ni − 3), revealed a significant,7 practically null effect (p = 0.035). Therefore, there appeared to be little bias between participant and collateral estimates of alcohol use. A diffuse comparison of these effect sizes (p < 0.001) indicates that the effect sizes reported in Table 2 are significantly different from one another. The independent and nonindependent databases yielded the same conclusion: there was little bias between participant and collateral reports of drinking behavior.
Finally, Table 3 presents 102 tests of the magnitude of collateral/participant agreement, which revealed a significant, moderate effect (average ICC = 0.501; ZFisher = 0.551 ± 0.026, p < 0.001). This result does not reach the accepted standard reliability estimate for standardized measures by trained raters (Nunnally and Bernstein, 1994). However, in a context where participants and collaterals are not trained to produce reliable estimates on a variety of drinking behaviors during a recall period of at least 30 days, this level of agreement may be considered good (Wong, personal communication, 12/1/2008). A diffuse comparison of these effect sizes (p < 0.001) indicates that the effect sizes reported in Table 3 are significantly different from one another. The results of this analysis indicated considerable variability among the effect sizes of the ICCs.
We performed a cumulative meta-analysis (Muellerleile and Mullen, 2006; Mullen et al., 2001) on the wholly independent bias dataset to examine the evidence that participant self-reports of drinking behavior were significantly different than collateral reports of the participant's drinking behavior. Figure 1 displays the results of the cumulative meta-analysis, combining the average effect, the fail-safe ratio, and the cumulative slope. The average effect size or mean effect across the 7 waves of the database hovered under and then settled just above zero.
Because the fail-safe ratio did not rise above 1 over the course of the database, there is insufficient evidence for bias; in other words, there appears to be no evidence for a discrepancy between participants' and collaterals' reports. The fail-safe ratio developed by Mullen and colleagues (2001) was intended to address the “file drawer” problem, meaning that there are likely to be some number of studies with null results that were never published, languishing in researchers' file drawers. Rosenthal (1979) developed a means of calculating a “fail-safe” number of unavailable studies with null results that would be required to bring the significance level down to just significant at p = 0.05. Rosenthal also suggested (Rosenthal, 1979, 1984; Rosenthal and Hall, 1981) that a reasonable benchmark against which the fail-safe number could be compared was 5k + 10, where k refers to the number of retrieved hypothesis tests. That is, there were unlikely to be 5 times the number of retrieved studies remaining unpublished in file drawers, and the value of 10 was added to ensure that the smallest database must have a fail-safe number of at least 15. The fail-safe ratio is the ratio of Rosenthal's fail-safe number to the benchmark value, and when it exceeds 1, it has reached a level that will tolerate unpublished null results. In this case, the indicator did not rise above 1, so there was no evidence that this database was sufficient to show an effect different from zero across the waves in the database. Thus, there was no evidence that a bias exists between participant and collateral reports of drinking.
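Rosenthal's fail-safe N and the fail-safe ratio described above can be sketched as follows (assuming a Stouffer-style combination of k tests; illustrative only):

```python
def fail_safe_n(z_combined, k):
    """Rosenthal's (1979) fail-safe N: how many unretrieved
    null-result studies would pull the combined result down to
    p = .05 one-tailed (Z = 1.645), assuming the k tests were
    combined Stouffer-style."""
    return k * (z_combined ** 2 / 1.645 ** 2 - 1)

def fail_safe_ratio(z_combined, k):
    """Fail-safe N relative to Rosenthal's 5k + 10 benchmark; a
    ratio above 1 indicates the finding tolerates unpublished
    null results."""
    return fail_safe_n(z_combined, k) / (5 * k + 10)
```

A combined Z at exactly 1.645 yields a fail-safe N of zero: a single additional null study would render the result nonsignificant.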
The cumulative slope indicates that the database is stable. The cumulative slope is the absolute value of the slope of the line derived from regressing effect size upon the number of hypothesis tests in the database. As the cumulative slope approaches 0, the effect becomes stable or less likely to change with the addition of more tests of the hypothesis. In the case of collateral reports of participants' drinking behavior, the cumulative slope indicates stability in the database by wave four, when the eighth hypothesis test was conducted in 2005. At this point, there was enough evidence to suggest that this practically null discrepancy between collateral and participant reports of participant drinking behavior would not change over time as additional tests of the hypothesis were added. This was the case in our cumulative analysis, and there is no reason to believe that additional tests of the hypothesis will suddenly reveal discrepancies between collateral and participant reports.
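The cumulative slope can be illustrated with a simplified, unweighted sketch. The published analyses weight each hypothesis test; this illustration uses plain running means of the effect sizes:

```python
def cumulative_slope(effect_sizes):
    """Absolute slope of the regression of the running mean effect
    size on the number of hypothesis tests accumulated so far.
    Values near 0 indicate a stable cumulative estimate."""
    xs = list(range(1, len(effect_sizes) + 1))
    ys, running = [], 0.0
    for i, es in enumerate(effect_sizes, start=1):
        running += es
        ys.append(running / i)  # cumulative (running) mean
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return abs(num / den)
```

A series of identical effect sizes produces a slope of exactly zero, while a database whose later studies shift the estimate produces a positive slope.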
To better understand the observed variation in the correspondence between collateral and participant responses, we examined the influence of the 7 predictors on bias and agreement.
A significant, positive relationship (r = 0.315, p < 0.001) emerged between the frequency of the collateral's contact with the participant and the bias effect size. Therefore the more contact between the collateral and the participant, the greater the positive discrepancy in their estimates, such that the collaterals underestimated the participants' alcohol use. A nonsignificant, negative relationship (r = −0.067, p = 0.092) emerged between the frequency of the collateral's contact with the participant and the collateral/participant agreement. Therefore contact between the collateral and participant did not influence the reliability of their estimates.
A significant, positive relationship (r = 0.449, p < 0.001) emerged between the relationship the collateral had with the participant and collateral/participant bias effect size. Therefore the closer the relationship between the collaterals and participants, the greater the positive discrepancies in their estimates, indicating that the collaterals underestimated the participants' alcohol use. A nonsignificant negative relationship (r = −0.072, p = 0.081) emerged between the relationship the collateral had with the participant and collateral/participant agreement.
A significant, positive relationship (r = 0.384, p < 0.001) was observed between the collateral's confidence in their ratings and participant/collateral bias. Therefore the more confident the collateral was in his or her estimate of the participant's drinking, the greater the positive discrepancy in their estimates, such that the collaterals underestimated the participants' alcohol use. A positive, nonsignificant relationship (r = 0.012, p = 0.491) emerged between the collaterals' confidence in their ratings and the collateral/participant agreement.
A significant, positive relationship (r = 0.145, p = 0.007) emerged between drinking specificity and participant/collateral bias. Therefore the more vague the question assessing the alcohol behaviors, the more likely the collateral was to underestimate the participants' alcohol use. A negative, nonsignificant relationship (r = −0.034, p = 0.213) emerged between the specificity of the drinking behavior and collateral/participant agreement.
A nonsignificant, practically null effect (p = 0.578) was obtained for the 33 bias hypothesis tests that assessed quantity, indicating that the collateral reports of drinking quantity did not differ from the participants' reports. A similarly null effect (p = 0.704) was obtained for the 41 bias hypothesis tests that assessed frequency, indicating that the collateral reports of drinking frequency did not differ from the participants' reports. Finally, a nonsignificant, small, positive effect (p = 0.220) was obtained for the 28 bias hypothesis tests that assessed heavy drinking, suggesting that collaterals in these studies may have underreported participants' heavy drinking. Although there was no difference in the magnitude of the effect size between quantity and frequency, the effect size for heavy drinking was significantly different from both quantity and frequency. These results suggested that collaterals were more accurate in their estimates of drinking quantities or frequencies, but less accurate in their estimates regarding participants' heavy drinking episodes.
Regarding collateral and participant agreement on different types of drinking behaviors, significant and moderate effects were found for quantity (average ICC = 0.500, p < 0.001), frequency (average ICC = 0.491, p < 0.001), and heavy drinking (average ICC = 0.521, p < 0.001). The agreement effect sizes for each drinking behavior were not significantly different.
A significant, small, positive effect (p < 0.001) was obtained for the 52 hypothesis tests that assessed collateral/participant bias in mandated students, indicating that collaterals in these studies underreported participant drinking. A nonsignificant, negative effect (p = 0.320) was obtained for the 50 hypothesis tests that included nonmandated participants. The difference between the magnitudes of the 2 effects was highly significant (p < 0.001), indicating that the collateral and participant discrepancies in mandated students were significantly greater than those in nonmandated samples.
A significant effect (average ICC = 0.419, p < 0.001) was obtained for the 52 hypothesis tests that assessed collateral/participant agreement in mandated students, indicating reasonable participant/collateral agreement. Similarly, a significant, moderate effect (average ICC = 0.516, p < 0.001) was obtained for the 50 hypothesis tests that included nonmandated participants. The difference between the magnitudes of the 2 effects was significant (p = 0.044), indicating that agreement in nonmandated samples was somewhat better than agreement in mandated samples.
Because the primary authors provided us with separate tests of bias by gender, we were able to examine this variable in the independent database (Table 1). A nonsignificant, positive effect (p = 0.243) was obtained for the 6 independent hypothesis tests that assessed participant/collateral bias in male students. A nonsignificant, negative effect (p = 0.670) was obtained for the 7 independent hypothesis tests that included female participants. The difference between the magnitudes of the 2 effects was not significant (p = 0.356), indicating that the collateral and participant discrepancies for male and female participants were not significantly different. Similar results (both magnitude and direction) were obtained in the larger, nonindependent database (see Table 2).
To test gender differences in agreement, we examined the independent database (Table 1). A significant effect (average ICC = 0.505, p < 0.001) was obtained for the 6 independent hypothesis tests that assessed participant/collateral agreement in male students. A diffuse comparison of these 6 independent hypothesis tests of agreement indicated that they were not significantly different from one another [χ²(5) = 0.832, p = 0.975]. A significant effect (average ICC = 0.552, p < 0.001) also was obtained for the 7 independent hypothesis tests that included female participants, and a diffuse comparison of these effect sizes revealed that they were not significantly different [χ²(6) = 6.608, p = 0.359]. The difference between the magnitudes of the 2 effects was not significant (p = 0.390), indicating that collateral and participant agreement did not differ significantly between male and female participants. As was the case with bias, similar results (both magnitude and direction) were obtained in the larger, nonindependent database (see Table 3).
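A diffuse comparison of this kind is typically computed as a chi-square homogeneity (Q) statistic on Fisher-transformed effects, with df = k − 1. The sketch below shows the general calculation; the per-study ICCs and sample sizes are hypothetical, not the actual study data.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def homogeneity_q(effects, ns):
    """Diffuse comparison: chi-square homogeneity (Q) statistic across k
    correlation-type effects, weighted by n - 3; df = k - 1."""
    zs = [fisher_z(r) for r in effects]
    ws = [n - 3 for n in ns]
    zbar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - zbar) ** 2 for w, z in zip(ws, zs))
    return q, len(effects) - 1

# Hypothetical per-study ICCs and sample sizes (invented for illustration).
q, df = homogeneity_q([0.45, 0.52, 0.48, 0.55, 0.50, 0.53],
                      [40, 55, 35, 60, 45, 50])
print(round(q, 2), df)
```

A Q value that is small relative to its df (as here, where the invented effects cluster tightly) indicates that the effects are homogeneous, matching the pattern of the nonsignificant diffuse comparisons reported above.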
Findings from this meta-analytic integration did not provide evidence of significant bias in collaterals' estimates of participant alcohol use, and indicated good agreement between collateral and participant reports. The small bias that was detected was consistent with previous research (Laforge et al., 2005), in which the majority of collaterals (73%) underreported participant alcohol use. The cumulative meta-analysis indicated that the (null) bias between collaterals and participants was sufficiently established and stable, unlikely to be altered by subsequent research using collaterals in the college setting. Although the agreement between collateral and participant reports was quite variable (ICCs ranging from 0.0 to 1.0), the average ICC of 0.51 indicated that the reliability of collateral estimates was good, especially given the circumstances under which the reliability estimates were produced. To better understand these findings, we examined 7 predictors of bias and agreement.
Regarding bias, 6 of the 7 predictors we examined were significantly related to the discrepancy between collateral and participant reports. First, inconsistent with previous research (Laforge et al., 2005; Maisto et al., 1979; O'Farrell et al., 1984), frequency of contact was not associated with less bias between collateral and participant reports. Instead, collaterals who saw the participant more frequently reported less alcohol use than the participant did. Second, the closer the relationship between the collaterals and the participants, the larger the positive discrepancies in the estimates. This finding was unexpected, as closer relationships had been associated with greater agreement in previous research with college students (Laforge et al., 2005). However, there are several reasons why having a close relationship with the participant may not lead to less bias in the college setting: the two may not consistently drink together, or may drink together so frequently that the different drinking experiences come together in memory as a kind of weighted average that is accurate overall but inaccurate for any one drinking occasion. In addition, collaterals who are close friends of the participant may be less comfortable reporting on the participant's alcohol use, leading to lower, more conservative estimates. Third, collaterals who reported greater confidence in their estimates were more likely to provide estimates that were lower than the participants' self-reported use. This discrepancy between confidence and bias is consistent with the literature on eyewitness testimony, which has found little connection between confidence and accuracy (Brewer et al., 2002; Smith et al., 1989). Fourth, consistent with the adult literature on the validity of self-report (Babor and Del Boca, 1992; Babor et al., 1987), the more specific the drinking behavior, the greater the agreement between the collateral and participant estimates.
That is, the more specific the drinking behavior reported, the more likely it was that the collateral estimates matched the participants' self-reported alcohol use. Fifth, collaterals and participants exhibited less discrepancy on general drinking behaviors, frequency, and quantity than on heavy drinking. This was unexpected, as we thought that heavy drinking episodes would be more salient and memorable to both the collateral and the participant, resulting in greater agreement (e.g., Perkins, 2002). However, it is possible that the participant and collateral reported on different heavy drinking episodes, resulting in the discrepancy. Finally, mandated participants and their collaterals demonstrated larger discrepancies than participants who had not received an alcohol infraction. Specifically, collaterals for mandated students reported that the participants were drinking less than the participants themselves reported, whereas collaterals for nonmandated students demonstrated practically no discrepancy.
Taken together, this pattern of findings suggests a possible protective bias on the part of the collateral. Specifically, there appears to be a small yet significant tendency for collaterals who are close friends to underestimate the participants' use, a trend also evident for collaterals reporting on the use of participants who have received alcohol violations. This bias may be the result of demand characteristics created by the method of collecting collateral data. Participants in this research underwent a detailed consenting process, in which it would have been made clear that their responses would be kept confidential and not used against them in any way. If participants had wanted to underreport their use, a negative effect would have been found: the participants' self-reported use would have been lower than that of the collateral. Instead, the opposite occurred. Collaterals were most often contacted by phone, given a verbal description of the study, and then asked to provide information. As a result, the collaterals may have been concerned that their responses would be used against the participants, often a close friend. Thus, collaterals may have underestimated the drinking of the participants in order to protect them from consequences of their alcohol use. The observed trend to underestimate the use of the participants may also have resulted from an intentional bias on the part of the participant. That is, the participant may have underreported his or her use to the collateral, who in turn reported this inaccurate perception of drinking to the researchers. There are a number of reasons why the participant would want to underreport his or her use to the collateral, including embarrassment about continued drinking after a referral incident (for mandated students), concern over how the information would be used (e.g., Walker and Cosden, 2007), or the collateral having expressed concern about the participant's drinking (for a close friend).
These social pressures may have led to an underreporting of alcohol use to peers that was not replicated in the context of a confidential survey.
In general, the results of the agreement analysis indicate that the estimates of reliability do not vary systematically with most of the predictors we examined. In other words, the average ICC obtained in this analysis did not change depending on the amount of contact or type of relationship between the participants and collaterals, the degree of confidence that collaterals had in their estimates, the specificity of the drinking assessments, the type of drinking behavior, or participant gender. The only significant predictor of agreement was mandated status: collaterals and mandated students demonstrated significantly poorer agreement (average ICC = 0.419) than collaterals and nonmandated participants (average ICC = 0.516). This discrepancy may have been the result of intentional misrepresentation of alcohol use, especially protective behavior by the collateral of the mandated student. It is also possible that the observed differences were partially due to unintentional error resulting from being asked to report a friend's drinking behavior over an extended period of time.
These findings have direct implications for researchers in the college setting. If the purpose of using collaterals is to verify participant self-report, it appears that they will do so with little bias and relatively good agreement. However, to limit bias, specific questions should be asked regarding the drinking behavior of the participant. Furthermore, clear, concise instructions should be provided to the collateral, including how the information will be used, especially if the participants are close friends or mandated students. It may be preferable to email the collateral detailed instructions and assurances, or to refer them to a project webpage that contains this information. More generally, it is important to implement proper procedures for obtaining accurate self-reports. These include consideration of the sensitivity of the information sought, the specificity of the information the participant is asked to report, the personal characteristics of the participant, the time period covered by the measures, and the demand characteristics of the situation (Babor and Del Boca, 1992; Babor et al., 1987; Midanik, 1988). In addition, care should be taken when using collaterals who were actively involved in the treatment of the participant (e.g., Tevyaw et al., 2005).
The findings of the study should be considered in the context of some limitations. First, this meta-analysis examined the agreement between collateral estimates of the alcohol use of participants and the participants' self-report. Thus, it is impossible to determine from these data what the actual alcohol use of the participants may have been. There is no “gold standard” of alcohol consumption, and many college students are unaware of standard serving sizes and tend to underestimate the amount that they consume (White et al., 2005). Therefore, it is possible that both participants' and collaterals' estimates of actual drinking may be inaccurate. Second, we were not able to examine dyad-specific characteristics of the collaterals and participants. Instead, given the nature of the data we reconstructed from the available studies, we could only examine sample-specific predictors. The examination of personal factors that may contribute to correspondence would also be valuable (e.g., hostility, degree of protectiveness of the participant). Third, the homogeneity of college students may have contributed to the lack of large effects for the predictors. It may be that these predictors play more of a role in the accuracy of collaterals in more heterogeneous samples (e.g., adult alcoholics), in which there is a wide variety of drinking contexts and consequences. Finally, although these predictors were significantly related to the collateral and participant bias, it should be noted that their overall effect was rather small. Therefore, it is possible that the significance achieved by some predictors may have resulted from Type I error (inflated significance due to re-sampling from the same study) rather than from an effect with real-world implications.
Perhaps the best way to summarize the findings of this meta-analytic integration is that, in the general college context, collaterals may slightly underreport the drinking of participants. The agreement between participants and collaterals is reasonably good, although less so with mandated students. Although this pattern of findings may be the result of intentional underreporting on the part of the collaterals and/or participants, it is also possible that they reflect the difficulty of estimating a variety of drinking behaviors over an extended period of time across different contexts. As a result, there does not appear to be a significant return on the considerable investment of time and money required to collect collateral data with college student drinkers. Therefore, researchers should consider carefully the purpose and design of their studies before implementing self-report verification procedures via collaterals with college student samples.
This work was supported by National Institute on Alcohol Abuse and Alcoholism Grant R01-AA015518 to B. Borsari. The authors thank all of the researchers who graciously and promptly contributed their data, and also recognize the importance of the guidance of Alice Eagly and Seok P. Wong in converting ICCs to effect sizes for meta-analytic integration. Finally, the authors would also like to acknowledge the late Brian Mullen for his thoughtful comments and guidance in the conceptualization of this project.
1We acknowledge that the length of time the collateral is asked to recall may also influence collateral and participant agreement (Babor and Del Boca, 1992). Specifically, it may be easier to recall the participant's drinking over a shorter period (e.g., over the past month) than a longer period of time (e.g., over the past year). Therefore we expected that collateral and participant agreement would be best for shorter recall periods. However, the recall period was 30 days except in 2 studies (Hagman et al., 2007; Stacy et al., 1985), precluding a robust test of this relationship.
2The suggestions of the reviewers and action editor prompted us to request additional data and analyses from the primary study authors. As a result, in some cases the values reported in the present analysis differ from the data that were published in the original reports.
3We decided not to conduct a cumulative meta-analysis for agreement for 2 reasons. First, because an ICC is bound by the context and question, we believed that asking researchers to collapse across these contexts and questions to provide one estimate would render a value that would be difficult to interpret. Second, because reliability is contextual, we felt that the over-time implication of cumulative analysis was secondary to the values of the ICCs within contexts. In other words, conducting a cumulative meta-analysis might establish that an average (and rather unreliable) effect is both sufficiently established and stable. Such a result does not seem to have any practical utility if the context of the measure is what is important.
4Examination of the effect sizes in intervention and survey studies revealed a nonsignificant (p = 0.183), negligible effect for the 3 hypothesis tests from survey studies (Hagman et al., 2007; Stacy et al., 1985) and a nonsignificant (p = 0.548), null effect for the 11 hypothesis tests from intervention studies. The difference between the magnitudes of these 2 effects was not significant (p = 0.377), indicating that the type of study did not appear to systematically influence collateral and participant agreement.
5Some studies in the meta-analytic database recorded collateral estimates of alcohol-related problems (Barnett et al., 2007; Laforge et al., 2005; Marlatt et al., 1998; Stacy et al., 1985). However, we did not analyze the participant and collateral estimates on problems for 2 reasons. First, correspondence between participant and collateral reports of problems in adult samples has ranged widely, with the highest rates of agreement being found for summary scores of measures of severity and impairment of alcohol use that are more observable and concrete (O'Farrell and Maisto, 1987; Polich, 1982). Lower agreement has been found on measures of more minor consequences (e.g., minor withdrawal symptoms) that may not be observable such as having a hangover (Connors and Maisto, 2003). This tendency is especially problematic in a college sample in which a wide variety of minor problems can be endorsed (Kahler et al., 2004). Each study used different measures of problems, all of which included consequences that were difficult for collaterals to observe (e.g., felt that she/he needed more alcohol than she/he used to in order to get the same effect). Indeed, correlations between participant and collateral estimates of problems were consistently lower than those of alcohol use in these studies, ranging from 0.29 (Marlatt et al., 1998) to 0.52 (Laforge et al., 2005). Second, the studies used summary scores for their problem measures, making it impossible to determine agreement on specific items. Furthermore, many of the problem measures combined both concrete behaviors (e.g., “got sick”) and behaviors that were difficult for the collateral to observe (e.g., “had a bad time”). As a result, we questioned the utility of evaluating the agreement of summary scores as identical scores could be derived by endorsing items that were very different in terms of severity of alcohol-related problems.
6We recognize that the assumption that hypothesis tests from the same study are independent is patently false. For example, the 20 hypothesis tests reported by Borsari and Carey (2005) were derived from the same participants at the same time. Without making this assumption of independence, 2 options are available to avoid nonindependent hypothesis tests. One option is to choose the “best” hypothesis test from each study. Given the variety of hypothesis tests within each study in the type of alcohol use assessed, confidence in the collateral report, and so on, it would have been difficult to determine which hypothesis test was the “best” one. The second option is to pool the results from all the hypothesis tests to create a single test of participant–collateral agreement. However, both of these alternatives create more problems with assumptions and arbitrariness than the present assumption of independence, and they fail to allow a nuanced examination of the predictors of participant–collateral agreement. Further, our results indicate that the degree of distortion engendered by the assumption of independence in the nonindependent hypothesis tests in the main database is (at worst) tolerable.
7All confidence intervals and significance levels are computed based on nonindependent data. The nonindependent nature of these tests is likely to inflate the Type I error rate, so these tests should be interpreted with caution.