We found that the sensitivity of the CIDI varies by race/ethnicity for four of the ten diagnoses considered here (ADHD, agoraphobia, panic disorder, and PTSD) and that the specificity of the CIDI varies by race/ethnicity for one diagnosis (agoraphobia). Although it is not clear why these four disorders rather than the other six were particularly discrepant in their concordance with clinical diagnoses, it is noteworthy that all four have a component of physiological hyperarousal and reactivity, which previous research suggests may be particularly sensitive to the cultural context of racial/ethnic minority youth (Pina and Silverman, 2004; Varela et al., 2007).
In the case of PTSD, which has previously been identified as problematic for the CIDI to diagnose accurately in racial/ethnic minority groups (Alegria et al., 2009), it has been suggested that fully structured instruments like the CIDI are differentially biased for minorities because they are less able than semi-structured clinical interviews to interpret the cultural context of trauma and trauma-related symptoms (Alarcon, 1995). However, we were unable to investigate this possibility in the NCS-A clinical reappraisal study because the number of minority youths with PTSD was too small to allow modifications of diagnostic criteria to be evaluated with adequate precision. The same was true for panic disorder.
The situation was different for agoraphobia and ADHD, where we were able to make modifications to improve diagnostic criteria. In the case of agoraphobia, we found that tightening the diagnostic algorithm for Latino and Non-Latino Black adolescents reduced the inflated CIDI prevalence estimates in these subsamples. In particular, we added the requirement that racial/ethnic minority youth indicate feeling badly about or disappointed in themselves because of their fear or avoidance. We speculate that this item may tap into perceptions of the cultural acceptability of symptoms. However, this change in diagnostic criteria also substantially decreased CIDI sensitivity to detect agoraphobia among racial/ethnic minority youths, leading to a decrease in individual-level concordance between diagnoses based on the CIDI and clinical diagnoses. The end result was that we did not implement any changes in the CIDI diagnosis of agoraphobia. A general absence of data on the accuracy of assessments of agoraphobia for racial/ethnic minority youths (Lewis-Fernandez et al., 2010) suggests that this is an important area for future research.
Modifications to the diagnostic algorithm for ADHD were more successful. In an earlier paper, we modified CIDI diagnostic criteria in the total sample to account for parent over-estimation of ADHD symptoms (Green et al., 2010). Here, the finding that diagnostic criteria needed to be loosened to improve concordance for Latino and Non-Latino Black parents suggests that parents of racial/ethnic minority youth are less likely to over-endorse ADHD symptoms. This finding is consistent with prior research indicating that, given comparable levels of hyperactivity, parents of racial/ethnic minority youth less often endorse symptoms than parents of Non-Latino White children (Hillemeier et al., 2007). By modifying CIDI criteria to allow more flexibility in parent symptom report, we were able to perfectly identify Latino youths with ADHD and improve the accuracy of prevalence estimates for Non-Latino Black adolescents, although these benefits occurred at the expense of a slight decrease in the specificity of diagnoses for Non-Latino Black adolescents.
Several limitations in the design of the NCS-A and of the clinical reappraisal study may have influenced our results. First, the NCS-A sample excluded school dropouts, the homeless, and non-English speakers, all of which are groups in which racial/ethnic minority youth are disproportionately represented. Second, there were high rates of individual non-response and school non-response, although analysis of the effects of non-response in the NCS-A found little evidence of bias (Kessler et al., 2009a). No data were collected on the race/ethnicity of non-respondents, so we do not know whether non-response rates differ across racial/ethnic groups. Third, the K-SADS was administered by telephone (in contrast to the face-to-face CIDI administration). There is strong evidence that telephone interviews are a valid method for clinical assessment (Aneshensel et al., 1982; Rohde et al., 1997; Sobin et al., 1993) and, in the case of the NCS-A, telephone administration provided the only feasible method for this type of large-scale data collection. However, the comparison of in-person CIDI interviews with telephone K-SADS interviews likely makes concordance estimates more conservative. Fourth, the design of the clinical reappraisal study, which provided K-SADS clinical interviewers with information about responses to diagnostic stem questions in the CIDI interview, may have influenced racial/ethnic differences. Fifth, all the analyses reported here were based on the untested assumption that diagnoses based on the K-SADS are equally valid for minority and nonminority youth. Sixth, the clinical reappraisal study was not specifically designed to study CIDI validity by race/ethnicity. As a result, the number of racial/ethnic minority youths in the reappraisal study sample was smaller than we would have desired, limiting statistical power to study modifications to diagnostic criteria for the least common disorders.
These findings underscore the importance of testing measurement validity by race and ethnicity. They suggest that, although CIDI diagnostic algorithms appear to function similarly across racial/ethnic groups for some disorders, there are four for which CIDI classifications have lower validity for racial/ethnic minority youth. In these cases, applying a universal framework to assessment may mask racial/ethnic differences, resulting in misleading prevalence estimates and disorder misclassification (Alegria and McGuire, 2003). In some cases we were able to adjust disorder-specific CIDI diagnostic algorithms across racial/ethnic groups to improve estimated prevalence for racial/ethnic minorities, but these benefits were offset by a diminished ability to classify individuals with disorders. In deciding whether to use these modified diagnostic algorithms, researchers should be guided by the specific purposes of diagnostic assessment and, in particular, by whether they are emphasizing prevalence estimation or individual classification.
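The trade-off between prevalence estimation and individual classification can be illustrated with a minimal sketch. The counts below are entirely hypothetical (they are not NCS-A data), and the function name `diagnostic_summary` is our own; the sketch only shows how a tightened algorithm might bring the instrument-based prevalence closer to the clinical prevalence while lowering sensitivity.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Summarize a 2x2 table of instrument diagnosis vs. clinical diagnosis.

    tp: instrument positive, clinical positive
    fp: instrument positive, clinical negative
    fn: instrument negative, clinical positive
    tn: instrument negative, clinical negative
    """
    n = tp + fp + fn + tn
    return {
        # Share of clinical cases that the instrument detects
        "sensitivity": tp / (tp + fn),
        # Share of clinical non-cases that the instrument rules out
        "specificity": tn / (tn + fp),
        # Prevalence implied by the instrument
        "instrument_prevalence": (tp + fp) / n,
        # Prevalence implied by the clinical interview
        "clinical_prevalence": (tp + fn) / n,
    }


# Hypothetical counts for one subgroup (assumed for illustration only).
# Original algorithm: detects most cases but over-diagnoses.
original = diagnostic_summary(tp=16, fp=24, fn=4, tn=156)
# Tightened algorithm: fewer false positives, but more missed cases.
tightened = diagnostic_summary(tp=10, fp=10, fn=10, tn=170)

for label, result in (("original", original), ("tightened", tightened)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

In this made-up example, tightening removes the gap between instrument and clinical prevalence (0.20 versus 0.10 before, 0.10 versus 0.10 after) but reduces sensitivity from 0.80 to 0.50, which is the kind of pattern that would favor the modified algorithm for prevalence estimation and the original one for identifying individual cases.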
We recommend that future studies of diagnostic validity similarly attend to the potential for differential validity across racial/ethnic groups. Results have implications for interpreting subgroup comparisons and, further, may suggest qualitative distinctions between groups in the phenomenology of disorders (Alegria et al., 2009