The AUDADIS-IV demonstrated good test-retest and internal consistency reliability for the new diagnostic modules introduced in Wave 2 of the NESARC. Reliabilities of PTSD and ADHD dichotomous diagnoses were good, with slightly higher ICC and alpha coefficients generally observed for their dimensional scale counterparts. The reliability of lifetime PTSD was lower than that for the corresponding 12-month diagnosis. This result, also found in past research (Grant et al., 2003), implicates recall bias as one factor that may lead to diminutions in reliability for diagnostic measures in general population test-retest studies. By contrast, reliability of childhood ADHD was higher than that for the manifestation of the disorder in adulthood. The prevalence of childhood ADHD in the U.S. general population has recently been estimated at 8.1%, with only 36.3% of individuals who had ADHD as children also experiencing the disorder as adults (Kessler et al., 2005). Thus, in this case, it may be that the lower prevalence of the disorder in adulthood compared with childhood resulted in the diminution of reliability observed in this study.
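The dependence of agreement statistics on prevalence can be illustrated with a short calculation. The sketch below is illustrative only and is not the analysis performed in this study: it uses Cohen's kappa as the agreement statistic and assumes that each interview classifies respondents with the same sensitivity and specificity and that errors at test and retest are independent given true status. The sensitivity and specificity values are hypothetical, and the adult prevalence is a rough figure obtained by applying the 36.3% persistence rate to the 8.1% childhood prevalence.

```python
def expected_kappa(prevalence, sensitivity=0.85, specificity=0.95):
    """Expected Cohen's kappa between two interviews of equal accuracy.

    Assumes (hypothetically) that classification errors at test and retest
    are independent once the respondent's true status is fixed.
    """
    # Probability that both interviews classify a respondent as positive
    both_pos = (prevalence * sensitivity ** 2
                + (1 - prevalence) * (1 - specificity) ** 2)
    # Probability that both interviews classify a respondent as negative
    both_neg = (prevalence * (1 - sensitivity) ** 2
                + (1 - prevalence) * specificity ** 2)
    # Marginal probability of a positive classification at either interview
    pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)

    observed_agreement = both_pos + both_neg
    chance_agreement = pos ** 2 + (1 - pos) ** 2
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


childhood_prevalence = 0.081                  # estimate cited in the text
adult_prevalence = 0.081 * 0.363              # rough approximation, see lead-in

print("kappa, childhood prevalence:", round(expected_kappa(childhood_prevalence), 3))
print("kappa, adult prevalence:   ", round(expected_kappa(adult_prevalence), 3))
```

Under these assumptions the interview's accuracy is held constant, yet expected kappa is lower at the lower prevalence, which is the pattern the prevalence explanation would predict.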
To our knowledge, this is the first study to examine the reliability of DSM-IV borderline, narcissistic, and schizotypal PD diagnoses in the general population. The reliabilities of these PDs were as good as or better than the corresponding reliabilities found in short-term test-retest studies of the same diagnoses in clinical samples (Zimmerman, 1994). This is notable because test-retest reliability should be greater in clinical than in general population surveys, since more severe cases of PDs are found in treatment, whereas milder cases tend to be found among community respondents. These results suggest that our efforts to reduce the sources of unreliability often associated with fully structured diagnostic interviews were successful. Specifically, the high degree of standardization of the AUDADIS-IV and its training module appears to have reduced three major sources of unreliability in instruments of this type: (1) the questions that assess psychiatric symptoms; (2) the symptom information provided by the respondent; and (3) the interpretation of the information provided by the respondent. The AUDADIS-IV is also unique in its requirement to assess stringently the DSM-IV clinical significance criterion, i.e., the requirement that each PD lead to distress and/or social or occupational impairment. Requiring individuals classified as having a particular PD to endorse the DSM-specified number of symptoms of the disorder in addition to meeting the clinical significance criterion may, in part, have been responsible for the good reliability observed for borderline, narcissistic, and schizotypal PDs in this study.
For most diagnoses assessed in this study, an order effect was observed. That is, prevalences generally decreased from test to retest interview. Although the present study cannot offer a definitive explanation for the observed order effect, the decline in prevalences of alcohol, drug, and psychiatric disorders from test to retest has been observed in tests of a variety of other psychiatric assessment instruments (Helzer, 1981; Bromet et al., 1986). The decline in prevalences has been attributed either to a reduction in reporting of symptoms or to inconsistency in positive responses to screening questions that were often used to route respondents past sections of the interview that were not relevant. However, the AUDADIS-IV does not have screening questions associated with ADHD or the three PDs assessed in this study. That is, all respondents answered the symptom items and associated diagnostic questions in these modules. Although the AUDADIS-IV PTSD module does have screening questions, there was little discrepancy in responses to these questions at test and retest. The gross error rate for these screening questions was 2.0%, a rate that was too small to account for the observed order effect. Taken together, these findings suggest that the decreases in prevalences of the disorders between test and retest are more likely the result of a decline in symptom reporting than of an increase in negative responses to screening questions. Further methodologic research should focus on this important but insufficiently understood phenomenon.
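The reasoning about the screening questions can be made concrete with a small sketch. It assumes the gross error rate is defined as the proportion of respondents whose screening response differed between test and retest, which is the usual survey-methodology definition but is not spelled out in the text. The function names and the data are hypothetical. The point it illustrates is arithmetic: the net change in the screen-positive rate can never exceed the gross error rate in absolute value, so a small gross error rate caps how much of a prevalence decline the screeners could explain.

```python
def gross_error_rate(test, retest):
    """Proportion of respondents whose 0/1 screening answers differ."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and equal in length")
    discordant = sum(1 for a, b in zip(test, retest) if a != b)
    return discordant / len(test)


def net_change(test, retest):
    """Change in the screen-positive rate from test to retest (0/1 coding)."""
    if len(test) != len(retest) or not test:
        raise ValueError("test and retest must be non-empty and equal in length")
    return (sum(retest) - sum(test)) / len(test)


# Hypothetical data: 100 respondents, 30 screen-positive at test,
# 2 of whom switch to negative at retest (no switches in the other direction).
test_answers = [1] * 30 + [0] * 70
retest_answers = [1] * 28 + [0] * 72

gross = gross_error_rate(test_answers, retest_answers)
net = net_change(test_answers, retest_answers)
print("gross error rate:", gross)
print("net change in screen-positive rate:", net)
```

In this made-up example every discordant response moves in the same direction, which is the worst case: the net decline equals the gross error rate and cannot be larger.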
Reliability was also found to be somewhat greater when dimensional scales were examined, compared with their categorical counterparts. Consistent with the internal consistency results, ICC values were good to excellent for all DSM-IV psychiatric diagnoses assessed in this study. These results are consistent with prior research (Grant et al., 2003) and were expected. Continuous measures are more statistically informative than categorical measures and therefore should be more reliable. Further, less severe cases of disorder will necessarily have a greater adverse impact on the reliability of categorical measures than on their continuous counterparts.
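The expectation that continuous measures are more reliable than their dichotomized counterparts can be illustrated by simulation. The sketch below is a toy model and not the study's analysis: it assumes a normally distributed latent symptom level measured twice with independent normal error, uses the Pearson correlation as a simple stand-in for the ICC, and applies an arbitrary diagnostic threshold. All parameter values and names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_reliability(n=20000, threshold=1.5, error_sd=0.5, seed=1):
    """Return (continuous, dichotomous) test-retest correlations.

    Each simulated respondent has a latent symptom level; the test and
    retest scores add independent measurement error to that level.
    """
    rng = random.Random(seed)
    test, retest = [], []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)
        test.append(latent + rng.gauss(0.0, error_sd))
        retest.append(latent + rng.gauss(0.0, error_sd))

    # Dichotomize: a "diagnosis" is assigned when the score exceeds the threshold
    test_dx = [1 if score > threshold else 0 for score in test]
    retest_dx = [1 if score > threshold else 0 for score in retest]

    return pearson(test, retest), pearson(test_dx, retest_dx)


r_continuous, r_dichotomous = simulate_reliability()
print("continuous scale:", round(r_continuous, 3))
print("dichotomous diagnosis:", round(r_dichotomous, 3))
```

In this toy model the same measurement error produces lower agreement for the dichotomous diagnosis, because respondents whose scores lie near the threshold can change classification after only a small change in score.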
The psychometric evaluation of major risk factors associated with alcohol, drug, and other psychiatric disorders is rare in the substance use disorder or psychiatric epidemiology literature. This study showed fair to good test-retest and internal consistency reliability for most risk factors, with acculturation, race-ethnic identification, sexual orientation discrimination, perceived stress, stressful life events, alcoholism stigma, and adverse childhood experience scales demonstrating reliabilities in the good to excellent range. These reliability results were not surprising since only measures of risk factors that had been found in prior psychometric studies to demonstrate satisfactory reliability were selected for inclusion in the AUDADIS-IV. However, our slight modifications to those measures necessitated a re-evaluation of their reliabilities, especially in general population samples, where many of these measures had not been assessed.
The excellent reliability of the ACE scales was not expected, given the sensitivity of these measures. It is possible that the objective and behavioral nature of these scales outweighed their sensitivity to yield excellent reliability. Alternatively, ACEs are extremely memorable, albeit painful, which may have reduced recall biases related to the events and consequently increased their reliability. Another possibility, not mutually exclusive with the others, is that the rapport established between respondents and interviewers highly trained to ask sensitive questions may have contributed to the high reliability observed for these measures.
The high level of reliability found in this study for PTSD, ADHD, and borderline, narcissistic, and schizotypal PDs suggests that the AUDADIS-IV can be a useful diagnostic tool in research settings. The Wave 2 NESARC survey, from which the data from this study were derived, queries a wide range of clinical symptomatology that cuts across numerous DSM-IV Axis I and II disorders. The finding that dimensional symptom scales for the diagnoses examined in this study were highly reliable supports the need for continued and sustained research to construct and evaluate dimensionally based assessment instruments to improve upon the purely categorical approach to diagnoses that underlies the DSM-IV. Incorporating dimensional components in future DSM revisions promises to address concerns commonly cited with respect to the categorical model of diagnosis, that is, excessive comorbidity, heterogeneity among persons with the same disorder, and inconsistent, unstable, and arbitrary diagnostic boundaries between disordered and normal functioning (Oldham et al., 2005). Further, studies (e.g., Markon and Krueger, 2005; Krueger et al., 2006) that explicitly compare continuous and dichotomous models of DSM-IV disorders using sophisticated latent class and trait models appear warranted in helping to define better phenotypic targets for etiologic and treatment research.
Importantly, this study has also provided the research community with a reliable battery of risk factors for use in future epidemiologic research on alcohol, drug, and psychiatric disorders. The availability of such a broad range of reliable risk factor assessments promises to contribute to the reliability of future research and the conclusions drawn from it, and to sharpen the direction of inquiry it defines.