Before turning to substantive interpretation, a number of methodological limitations must be acknowledged. First, K-SADS interviews were carried out by telephone and CIDI interviews were carried out face-to-face. Even though telephone interviews constitute a valid mode of clinical assessment in both adults
29, 30 and adolescents,
31, 32 we do not know what would have happened if the same mode of administration had been used in both interviews. Second, the design of the clinical reappraisal study, in which clinical interviewers were provided information about respondent reports to diagnostic stem questions in the initial interview, might have biased results. Third, findings may be biased by the tendency for respondents to report more symptoms in first interviews than subsequent interviews.
9 Such bias might have been minimized by counterbalancing the order of the CIDI and K-SADS interviews, but that was not feasible in our design. We tried to reduce this bias by leaving a substantial period of time between the two interviews, but some bias might nonetheless remain. In addition, the comparatively long time lag between CIDI and K-SADS interviews might have resulted in some first onsets occurring in the interval, introducing a conservative bias into estimates of concordance between CIDI and K-SADS diagnoses.
It should be noted that we focused only on prevalence rather than severity.
46 The high prevalence of mental disorders in the community makes it more relevant for policy purposes to study disorders with higher-than-average clinical severity.
47 It has also been argued that the clinical relevance of epidemiological studies would be improved by considering dimensional measures of clinical severity.
48, 49 Criticism along these lines might contend that good diagnostic concordance such as that documented here is less relevant than information about diagnostic concordance in distinguishing severe cases from mild cases and about consistency of dimensional clinical severity ratings. It is noteworthy in this regard that the CIDI includes fully structured versions of standard clinical severity scales to assess the severity of individual disorders, such as the Quick Inventory of Depressive Symptoms Self-Report Version
50 to assess the severity of major depression and the Panic Disorder Severity Scale Self-Report Form
51 to assess the severity of panic disorder. In addition, the WHO Disability Assessment Schedule
52 is included in the CIDI to assess the severity of overall psychopathology. Although these dimensional measures were not considered in the current report, they are available in the NCS-A to consider disorder severity rather than only disorder prevalence.
Despite the focus of the current study only on diagnoses rather than also on severity, information about diagnostic concordance is useful in determining whether DSM-IV diagnostic thresholds and criteria are defined consistently in the CIDI versus the K-SADS. Our results show that the CIDI diagnostic thresholds are generally consistent with K-SADS thresholds, with the two exceptions of specific phobia and oppositional-defiant disorder. In these two cases, the CIDI thresholds are well below the K-SADS thresholds, resulting in proportionally much higher prevalence estimates in the CIDI than in the K-SADS (51.2% for specific phobia; 38.7% for oppositional-defiant disorder). The problem with the CIDI assessments of these two diagnoses is that both evaluated core symptoms with a yes-no checklist that failed to distinguish symptoms in terms of persistence or severity. We suspect that the use of dimensional rather than dichotomous ratings in future versions of the CIDI would help resolve these problems. The other cases where CIDI prevalence estimates were significantly (in a statistical, rather than substantive, sense of that term) higher than K-SADS estimates all involved either a substantively small proportional difference for a disorder with very high prevalence (major depression/dysthymia) or substantively small absolute differences for disorders with low prevalence (GAD, agoraphobia, alcohol and drug dependence).
We found that biased prevalence estimates in the CIDI could be corrected by using predicted probabilities of K-SADS diagnoses instead of CIDI diagnoses as outcome measures. As discussed in more detail elsewhere
21 and illustrated in a series of recent disorder-specific analyses of adult disorders,
53–55 it is practical to use predicted probabilities of clinical diagnoses in substantive analyses of CIDI surveys by imputing these predicted probabilities to all survey respondents based on the prediction equations generated in the clinical reappraisal sub-sample. These predicted probabilities can then either be treated as outcomes in substantive analyses or can be used as input to more complex analyses that use the method of multiple imputation (MI)
56 to make estimates of the prevalence and correlates of clinical diagnoses. Comparison with parallel estimates of the prevalence and correlates of CIDI diagnoses can be used in such cases to carry out much more fine-grained analyses of consistency with clinical diagnoses than conventional analyses of diagnostic concordance. We consequently plan to make use of predicted probabilities in substantive analyses of the NCS-A data to correct problems with diagnoses where CIDI and K-SADS prevalence estimates differed substantially (most notably, specific phobia and oppositional-defiant disorder).
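The imputation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NCS-A analysis code: the intercept, coefficients, and symptom variables below are hypothetical placeholders standing in for a prediction equation estimated in the clinical reappraisal sub-sample, and the function names are ours.

```python
import math

def predicted_probability(symptoms, intercept, coefs):
    """Apply a logistic prediction equation to one respondent's CIDI symptom data."""
    linear = intercept + sum(coefs[name] * value for name, value in symptoms.items())
    return 1.0 / (1.0 + math.exp(-linear))

def estimated_prevalence(respondents, intercept, coefs, weights=None):
    """Weighted mean of predicted probabilities across all survey respondents."""
    if weights is None:
        weights = [1.0] * len(respondents)
    probs = [predicted_probability(r, intercept, coefs) for r in respondents]
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

# Hypothetical equation; real values would come from a model fitted in the
# clinical reappraisal sub-sample (K-SADS diagnosis regressed on CIDI symptoms).
INTERCEPT = -3.0
COEFS = {"symptom_count": 0.8, "impairment": 1.2}

# Hypothetical full-sample respondents who completed only the CIDI.
respondents = [
    {"symptom_count": 0, "impairment": 0},
    {"symptom_count": 2, "impairment": 0},
    {"symptom_count": 5, "impairment": 1},
]

prevalence = estimated_prevalence(respondents, INTERCEPT, COEFS)
print(f"Estimated prevalence of the clinical diagnosis: {prevalence:.3f}")
```

Averaging predicted probabilities rather than dichotomized diagnoses is what allows the prevalence estimate to track the clinical standard even when individual-level classification is imperfect. A multiple imputation analysis would go one step further and draw repeated 0/1 diagnoses from these probabilities.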
Individual-level concordance between diagnoses based on the CIDI and the K-SADS was generally good. In the one case where individual-level concordance was slight, involving alcohol dependence, much higher concordance was found for the broader diagnosis of alcohol abuse. Similarly, although the assessment of illicit drug dependence suffers from low PPV, this problem was addressed by considering the broader diagnosis of illicit drug abuse. We consequently plan to focus on the diagnoses of alcohol and illicit drug abuse rather than on dependence in our substantive analyses. A less extreme version of the same situation occurred in distinguishing bipolar I from bipolar II, where we found that concordance of diagnoses based on the CIDI and K-SADS improved when we combined both diagnoses. We therefore plan to combine BP-I and BP-II in our substantive analyses. In all these cases (i.e., substance dependence vs. abuse and BP-I vs. BP-II), the severe form of the disorder is comparatively rare among children and adolescents and the CIDI severity questions are too coarse to make powerful distinctions between the severe and less severe forms. Future versions of the CIDI should modify these sections to increase the ability to make these distinctions.
We also documented that the few cases in which individual-level diagnostic concordance is less than substantial can be corrected by developing dimensional probability-of-disorder measures based on CIDI symptom data. Three cases of this sort exist: PTSD, ADHD, and alcohol dependence. This means that although we are not able to reproduce K-SADS diagnoses of these disorders with high accuracy at the individual level, we can generate predicted probabilities of these diagnoses that have excellent concordance with K-SADS distributions, allowing us to estimate prevalence and correlates of these disorders with good accuracy using statistical methods appropriate to the analysis of predicted probabilities.
55 The one exception is our inability to make accurate distinctions between bipolar I and bipolar II disorders. Given the rarity of threshold bipolar disorder among adolescents, we failed in our attempts to develop a logistic regression equation that had high AUC in distinguishing between these two disorders. As a result, all NCS-A analyses of threshold bipolar disorder will combine bipolar I and bipolar II cases into a single category.
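For readers less familiar with AUC, the statistic can be read as the probability that a randomly chosen clinical case receives a higher predicted probability than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison. The scores are invented for illustration and the function name is our own; it is not taken from the NCS-A analyses.

```python
def auc(case_scores, noncase_scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))

# Hypothetical predicted probabilities for K-SADS cases and non-cases.
case_scores = [0.9, 0.7, 0.4]
noncase_scores = [0.5, 0.3, 0.2, 0.1]

example_auc = auc(case_scores, noncase_scores)
print(f"AUC = {example_auc:.3f}")
```

An AUC of .5 indicates prediction no better than chance and 1.0 indicates perfect separation. With very few cases, as with threshold bipolar disorder here, the number of case/non-case pairs is small and the estimate becomes unstable.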
As noted in the introduction, previous validation studies of lay-administered diagnostic interviews with clinician-administered gold standard interviews administered to adolescents generally found relatively low concordance,
5 particularly for disruptive behaviour disorders,
33 although concordance increased when informant reports were obtained from parents and/or teachers.
5, 9 Those studies generally found concordance in the range κ = .3–.6. The aggregate κ estimates documented in our study are generally above this range. This might be because numerous features of CIDI 3.0 improve on earlier fully-structured research diagnostic interviews in interviewer training, quality control, question wording, and interview flow. The inclusion of a separate section to review lifetime diagnostic stem questions for all disorders might have played an especially important part in this respect, as previous research has shown that this approach leads to a substantial increase in the endorsement of lifetime stem questions.
16 The modification of CIDI 3.0 questions based on cognitive interviewing might also have been involved.
14 Our clinical reappraisal study design, most notably the separation of the clinical reappraisal interview by two months to minimize respondent fatigue, and the un-blinding of clinical interviewers to CIDI diagnostic stem questions to encourage reluctant respondents who reported episodes in the CIDI to discuss those episodes rather than conceal them, might also have contributed to the good concordance, although, as noted above, these design features can be seen as limiting external validity.
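As a reminder of what the κ values discussed above measure, the following sketch computes Cohen's κ from a 2 × 2 cross-classification of CIDI and K-SADS diagnoses. The cell counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not NCS-A data.

```python
def cohens_kappa(both_pos, cidi_only, ksads_only, both_neg):
    """Cohen's kappa for two dichotomous diagnoses from 2 x 2 cell counts."""
    n = both_pos + cidi_only + ksads_only + both_neg
    observed = (both_pos + both_neg) / n      # raw agreement
    p_cidi = (both_pos + cidi_only) / n       # CIDI prevalence
    p_ksads = (both_pos + ksads_only) / n     # K-SADS prevalence
    # Agreement expected by chance if the two diagnoses were independent.
    expected = p_cidi * p_ksads + (1 - p_cidi) * (1 - p_ksads)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)

# Hypothetical counts: 40 positive on both, 10 CIDI only,
# 10 K-SADS only, 140 negative on both.
kappa = cohens_kappa(40, 10, 10, 140)
print(f"kappa = {kappa:.2f}")
```

In this example raw agreement is .90, but chance agreement is .625, so κ is about .73. Because chance agreement depends on prevalence, κ for rare disorders can be low even when raw agreement is high.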
Although the word validation is often used to characterize the kinds of results reported here, this term is not entirely accurate because the K-SADS diagnoses cannot be taken as perfect representations of true DSM disorders. This is true both because K-SADS test-retest reliability is imperfect
15 and because some respondents in community surveys consciously hide information about their mental disorders from clinical interviewers.
57 This imperfect validity, which characterizes not only the K-SADS but all “gold standard interviews,”
15, 58 presumably attenuates associations with diagnoses based on fully-structured diagnostic interviews. Consistent with this thinking, the application of external criteria of validity, such as measures of impairment and service use, generally yields evidence of stronger associations than those found with independent diagnostic interviews.
59 Based on these considerations, the estimates of concordance reported here should be considered lower bound estimates of CIDI validity. A good empirical illustration of this thinking can be found in the work of Booth et al.,
60 who compared lifetime diagnoses of major depression based on an earlier version of CIDI administered to an adult sample with diagnoses based on SCID clinical reappraisal interviews, where κ was .53. However, when CIDI diagnoses were compared with more accurate LEAD standard diagnoses (longitudinal, expert, and all data)
61 that used not only the SCID, but also all the clinical information available, to arrive at an improved estimate of clinical diagnoses, κ increased to .67.
In conclusion, the results reported here demonstrate that lifetime DSM-IV diagnoses based on the CIDI as implemented in the NCS-A have good individual-level concordance with diagnoses based on blinded clinical reappraisal interviews using the K-SADS, that prevalence estimates based on the two instruments are fairly similar in substantive terms for most disorders, and that symptom-level modifications can be used to correct prevalence estimates in most cases where between-instrument differences in prevalence estimates are substantively meaningful. As noted in the first paper in this series
11 there is considerable need for national data on the prevalence and correlates of psychiatric disorders in adolescents.
62 The practical utility of such data relies on accurate classification of disorders, a complex task given inconsistencies in diagnostic decision-making by clinicians.
63, 64 Substantial efforts were made to ensure that the CIDI provided clinically meaningful diagnoses of adolescents, including the use of cognitive interviewing strategies described elsewhere
11 that built on earlier iterative CIDI revisions and refinements.
14, 16, 17, 65 The results of the current study show that the CIDI has good concordance with clinician diagnoses, providing a solid foundation for later substantive analyses of the NCS-A data.