Before turning to a discussion of proposed future directions of ESEMeD analyses, it is important to comment on the validity of the ESEMeD assessment of mental disorders. There are two key issues in comparing results of the ESEMeD surveys with those of previous surveys. The first is the issue of diagnostic validity: whether the instruments used to operationalize DSM or ICD diagnoses in these surveys are valid. The second is the issue of practical validity: whether these diagnoses, even if valid in a narrow technical sense of operationalizing the intended DSM or ICD criteria, are more broadly valid in identifying the range of people in need of treatment for mental disorders.
Diagnostic validity has been the focus of methodological research since the early 1980s.52–54
Wittchen reviewed the literature in this area as of the early 1990s,17
and a number of related reports have subsequently been published that focus largely on the reliability or validity of revised versions of the CIDI.34, 35, 55–57
Three results emerge consistently from this literature. The first is that the concordance between diagnoses based on the DIS or CIDI compared to diagnoses based on blind clinical re-interviews is far from perfect, with concordance for most diagnoses in the adequate-to-good range using conventional standards to define these characterizations.58
This lack of concordance is often due to one or two criteria for a particular diagnosis that are inaccurately assessed in the DIS or CIDI,55
resulting in substantially improved concordance when these criteria are suppressed.
The second consistent result is that substantial downward bias exists in the DIS and early versions of the CIDI. For example, a comparison of results collected at baseline and one-year follow-up of the ECA study showed that a substantial number of early-onset lifetime disorders reported in the follow-up were not reported at baseline.59
If we accept the well-known finding that embarrassing behaviors are much more likely to be under-reported than over-reported, this result is most plausibly interpreted as evidence of under-reporting in the baseline interview. This bias was substantial, with up to 40% of true lifetime cases in the combined two-wave study missed in the baseline interview. At least two plausible causes exist for this under-reporting: the tendency for people to recall past experiences more easily when their current mood matches the mood they were in during the experience being recalled; and the tendency for respondents to vary in the effort they put into active memory search in response to recall questions depending on their engagement with the interviewer and the interview. An extensive experimental literature shows that these two processes have substantial effects on the accuracy of responses to recall questions.60, 61
The third consistent result is that the under-reporting biases described in the last paragraph can be at least partially overcome by the use of strategies developed by survey methodologists to increase motivation for active memory search and to facilitate this sort of active memory search when motivation exists.32, 34, 62
Six such strategies were used in developing the WMH-CIDI. First, the CIDI diagnostic stem questions for all disorders were moved up to the front of the interview in a separate lifetime review section rather than appearing at the beginning of each separate diagnostic section. This consolidation of diagnostic stem questions allowed memory motivation and facilitation strategies to be focused on these critical entry questions to each diagnostic section and to be administered at a point in the interview when respondents were still cognitively fresh. It also allowed all stem questions to be administered before respondents became aware that endorsement of a stem question would result in the administration of many follow-up questions.
Second, an explanation was included at the beginning of the WMH-CIDI lifetime review section aimed at increasing respondent understanding that serious memory search was required to answer the lifetime stem questions. Third, motivational components were included in this introduction to encourage active memory search. Specifically, respondents were told: “It is very important for the research to get complete and accurate answers to this next set of questions, so please take your time and think carefully before answering.” Respondents were then administered a commitment probe that asked them, with this injunction as background, to confirm that they were ready to begin. Methodological research has shown that commitment probes of this sort, which require respondents to acknowledge their understanding that active memory search is needed and their willingness to engage in it, significantly improve the accuracy of responses to survey questions that require recall.63
Fourth, interviewers were trained to read the stem questions slowly and deliberately. The aims here were to make sure respondents heard all the elements of the questions, to convey to respondents the importance of the questions, and to give respondents time to begin their memory search before the questions were finished. Fifth, interviewers were trained to use feedback probes aimed at encouraging active memory search. A nondirective reinforcing feedback probe such as “thanks, that’s very useful,” for example, was periodically used when the respondent appeared to be taking his or her time to think before answering. This sort of probe was used regardless of whether the respondent answered yes or no to the question. A corrective probe such as “You answered that one awfully quickly. Are you sure there’s not something you forgot?” was used if the respondent appeared to be giving a superficial answer. Sixth, the stem questions were presented to respondents as a set on a card as a visual aid aimed at improving question comprehension and focus.
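The consolidation described in the first strategy can be illustrated schematically. The sketch below is hypothetical, not the actual WMH-CIDI software: question wording, disorder names, and function names are all illustrative. It shows the key design point, namely that every stem question is asked up front in a lifetime review pass, and the long diagnostic follow-up sections run only afterwards, for the endorsed stems.

```python
# Hypothetical sketch of the stem-consolidation logic described above.
# All diagnostic stem questions are asked first, in one lifetime review
# section; only endorsed disorders trigger their follow-up sections.
# Question texts and disorder labels are illustrative, not the real CIDI items.

STEM_QUESTIONS = {
    "depression": "Have you ever had two weeks or more of feeling sad or depressed?",
    "panic": "Have you ever had a sudden attack of intense fear or panic?",
    "gad": "Have you ever had a period of months of excessive worry?",
}

def lifetime_review(ask):
    """Ask every stem question up front, while the respondent is still fresh."""
    return {dx: ask(q) for dx, q in STEM_QUESTIONS.items()}

def administer(ask, follow_up_sections):
    """Run follow-up sections only for endorsed stems.

    Because ALL stems are answered before any follow-up begins, a
    respondent cannot learn mid-review that endorsing a stem leads to
    many additional questions.
    """
    endorsed = lifetime_review(ask)
    return {dx: follow_up_sections[dx]() for dx, yes in endorsed.items() if yes}
```

Under this structure the incentive to deny stems in order to shorten the interview is removed for respondents, since the consequences of endorsement are not visible until the review is complete.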
A field experiment carried out in conjunction with the NCS showed that this set of six strategies, when combined, leads to a dramatic decrease in diagnostic under-reporting bias as well as to an associated increase in concordance with diagnoses based on clinical reappraisal interviews.34
Indeed, there were no statistically significant differences in this experiment between prevalence estimates based on the version of the CIDI that used these methodological strategies and prevalence estimates based on blind clinical interviews for twelve of fourteen DSM-III-R diagnoses. Furthermore, an independent investigation showed that the prevalence estimates obtained in a single interview that used these methodological strategies very closely reproduced prevalence estimates obtained by combining data over two waves of a similar fully structured interview that did not use these strategies.59
Based on these results, one would expect, all else equal, that the ESEMeD prevalence estimates would be more strongly concordant with independent clinical diagnoses and higher than the estimates in previous surveys based on the DIS and CIDI. An ESEMeD clinical reappraisal sub-study will eventually allow us to make a definitive evaluation of the first of these two expectations. However, the data from this sub-study are still being cleaned and coded and probably will not be thoroughly analyzed for another year.
The second expectation, that the ESEMeD prevalence estimates should be higher than in previous surveys, all else being equal, can be evaluated now based on the results in and . Prevalence estimates in the ESEMeD surveys are roughly comparable with those of earlier surveys. In considering this result, we must recognize that the ESEMeD prevalence estimates differ in at least one important respect from those obtained in earlier surveys: the ESEMeD surveys, as a component of the larger WHO World Mental Health Survey Initiative, are among the first large-scale international community epidemiological surveys to base prevalence estimates on DSM-IV criteria. Previous DIS and CIDI surveys primarily used DSM-III and DSM-III-R criteria. This is important because DSM-IV criteria are stricter than those in previous versions of the DSM as a result of the more prominent emphasis on the requirement that a syndrome must be associated with clinically significant distress or impairment to qualify as a disorder. Controversy exists as to the wisdom of this requirement, based on the observation that many serious medical conditions such as hypertension and hypercholesterolemia do not cause meaningful impairment until many years after they begin.64
In any event, this difference means that we would expect a downward drift in ESEMeD prevalence estimates compared with earlier surveys.
Another methodological issue that may also have played an important part in creating under-estimation of prevalence in the ESEMeD surveys involves fundamental survey conditions. Four fundamental survey conditions of the ESEMeD surveys are especially relevant here. First, the ESEMeD interviews were both longer (an average administration time of about two hours) and much more variable in length (ranging from forty-five minutes for a respondent who denied all diagnostic stem questions to as much as four hours for a respondent who endorsed all the stem questions and had a complex psychiatric history) than the typical market research interviews that the professional interviewers who conducted the ESEMeD surveys were used to carrying out. Second, the interviewers were paid by the interview rather than by the hour and generally did not receive any additional compensation for a long interview. Third, the stem-branch structure of the CIDI creates an opportunity for interviewers to guarantee that the interview will be short merely by entering negative responses to the small number of diagnostic stem questions that guide the interview skip logic. Bias of this sort can occur either by interviewers consciously entering negative responses even when respondents answer the stem questions affirmatively, or by more subtle means, such as using voice tone, speed of reading the questions, or incorrect use of the feedback probes to induce negative responses to the stem questions. Fourth, the interviewer quality control procedures used in the ESEMeD surveys did not adequately guard against this type of downward recording bias.
The required controls include supervisor monitoring of the clock in the computerized software used to administer the ESEMeD interviews, to make sure that diagnostic stem questions were not rushed through, and supervisor telephone recontact of a high proportion of respondents shortly after the completion of their interviews, to repeat diagnostic stem questions and make sure that positive responses were correctly elicited and accurately recorded. It should be noted that the ESEMeD investigators specifically contracted survey firms that had carried out surveys with similar quality control requirements, but the firms did not implement these procedures with the rigor required to prevent bias.
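The first of these controls, monitoring the software clock for rushed stem questions, can be sketched in simplified form. This is an illustrative sketch only, not the actual ESEMeD quality-control software: the timing threshold, field names, and flagging rule are all assumptions made for the example.

```python
# Illustrative sketch (not the actual ESEMeD software) of a timing check:
# using per-item timestamps recorded by computer-assisted interviewing
# software, flag interviews in which most diagnostic stem questions were
# answered implausibly fast, suggesting they were rushed through.
# The threshold and field names below are assumptions, not ESEMeD values.

MIN_SECONDS_PER_STEM = 8.0  # assumed minimum plausible reading/answer time

def flag_rushed_interviews(interviews, stem_ids, min_rushed_fraction=0.5):
    """Return IDs of interviews where most stem items were answered too fast.

    `interviews` maps interview_id -> {item_id: seconds_spent_on_item}.
    Flagged interviews would be candidates for supervisor telephone
    recontact of the respondent to re-ask the stem questions.
    """
    flagged = []
    for interview_id, timings in interviews.items():
        stem_times = [timings[s] for s in stem_ids if s in timings]
        if not stem_times:
            continue
        rushed = sum(1 for t in stem_times if t < MIN_SECONDS_PER_STEM)
        if rushed / len(stem_times) >= min_rushed_fraction:
            flagged.append(interview_id)
    return flagged
```

A check of this sort cannot prove falsification, but it concentrates supervisor recontact effort on the interviews where downward recording bias is most plausible.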
Preliminary data files only became available to the ESEMeD coordinating center in Spain midway through data collection. Staff of the coordinating center immediately detected problems of the sort described in the last paragraph when they were able to inspect the data. The ESEMeD committee made truly heroic efforts to correct the most egregious of these problems, including discarding a nearly complete survey in one country, based on evidence that an interviewer was cheating, and conducting the survey afresh with new fundamental survey conditions designed to prevent a repeat of the same problems. However, it must be acknowledged that there is no way of detecting the subtlest forms of these problems when interviewers have financial disincentives for long interviews. The only truly foolproof remedy is to pay interviewers in a way that removes any incentive to decrease the duration of the interview. Paying interviewers by the hour or providing a financial bonus for long interviews over and above a per-interview payment are the two most reasonable approaches of this sort. ESEMeD used the second of these methods in some countries once it became clear that downward bias was a serious problem. It is almost certainly the case, though, that some residual downward bias remains in the data, although we have no way of knowing how large this bias might be.