Economy is, of course, a matter of degree. The most economical procedure is the self-administered questionnaire (SAQ). SAQs do away with the expense of training interviewers and paying them to conduct interviews about life events, and they may lose nothing in reliability compared with interviewer-administered versions of the same questionnaire (e.g., Kubany et al., 2000). Moreover, some evidence exists that SAQs have the particular virtue of eliciting more reports of sensitive events (e.g., events involving matters such as child abuse, rape, and abortion) than personal interviews with the same checklist (Schaeffer, 2000). The results are not, however, entirely consistent, and Schaeffer has speculated that the underlying benefit of more complete reporting derives from privacy, which may be ensured in other ways: for example, by computer-assisted interviewing that does not require the presence of an interviewer or a certain level of reading skill on the part of the respondent (Turner et al., 1998), by emphasis on the privacy of the data collection situation and on the importance and legitimacy of the research (Schaeffer, 2000), or by both.
It is hard to imagine how self-administered questionnaires would ever be useful in the exploratory stages of research on stressful life events. Moreover, for investigators who need to be able to retrospectively date the occurrence of events in relation to the occurrence of episodes of disorder over substantial periods of time, it is difficult to see how procedures to aid autobiographical memory (e.g., Bradburn, 2000; Tourangeau, 2000), such as the use of life calendar methods (e.g., Belli, 1998; Caspi et al., 1996; Lyketsos, Nestadt, Cwi, Heithoff, & Eaton, 1994), can be implemented without the active presence of an interviewer.
Narrative-rating procedures that require the presence of highly trained interviewers and judges sit at the other extreme from SAQs on the economy continuum. The benefits in reliability and validity of narrative-rating procedures are now widely recognized, along with their lack of economy, and, as would be expected under these circumstances, there have been and continue to be attempts to retain at least some of the benefits of narrative-rating methods with more economical procedures.
At least two attempts are underway to approximate contextual threat ratings of narrative material from intensive semistructured interviews with measures based on more structured interview approaches. The latter place greater reliance on relatively economical closed questions and closed follow-up probes than the more open-ended LEDS approach does. One of these more structured substitutes for LEDS is an interview called the Structured Life Events Inventory (SLI), which makes more extensive use of closed questions and closed probes and requires less interviewer training than the more open-ended questioning procedures of LEDS (Wethington et al., 1997). The SLI was investigated in a study of 243 community respondents, half of whom were interviewed with the SLI and half with LEDS. Raters of the SLI interviews were reported to reliably distinguish between events representing severe contextual threat and more minor events, and to identify recent (previous 3 months) severe events and difficulties that showed positive associations with onset of depressive episodes similar to those shown by the ratings based on LEDS. Moreover, use of the SLI for these purposes reduced interview and rating time to an average of 9 hours per respondent, compared with the 16-hour average for LEDS. However, only 41 respondents were interviewed with both methods, which must have greatly limited the opportunity to examine the extent to which the SLI and LEDS agreed in the measurement of contextual threat and other characteristics of relatively recent stressful events. Moreover, the SLI, though more economical than LEDS, still involves a fairly intensive interview and narrative-rating process.
The other approach, now in its initial stage of development, is an attempt to go a step beyond the SLI in economy by substituting fully structured questions, closed probes, and mechanical scoring for intensive semistructured interviews and narrative ratings altogether (Grant et al., 2004). In addition, by contrast with the SLI, the focus is on children and adolescents rather than adults. The procedure involves analyzing contextual threat ratings based on narratives elicited by intensive semistructured interviews; the purpose of the analysis is to identify the specific items of information in the narratives on which the ratings are based. Once that information is identified, the next step would be to develop structured, closed questions whose answers would provide direct indicators of the nature and severity of the threat posed by the different types of events reported.
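The analytic step just described, searching coded narratives for the items of information most strongly associated with contextual threat ratings, can be sketched in a few lines of code. Everything here is hypothetical: the feature names, the ratings, and the simple mean-difference measure of association are invented for illustration and are not drawn from Grant et al.'s procedure.

```python
# Illustrative sketch: which binary narrative features are most strongly
# associated with contextual threat ratings? All data are hypothetical.
def feature_associations(narratives, ratings):
    """For each binary narrative feature, return the difference in mean
    threat rating between narratives with and without that feature."""
    features = sorted({f for n in narratives for f in n})
    out = {}
    for f in features:
        with_f = [r for n, r in zip(narratives, ratings) if f in n]
        without = [r for n, r in zip(narratives, ratings) if f not in n]
        if with_f and without:  # feature must vary to be informative
            out[f] = sum(with_f) / len(with_f) - sum(without) / len(without)
    return out

# Four hypothetical coded narratives and their 1-4 threat ratings
narratives = [{"loss_of_caregiver"}, {"peer_conflict"},
              {"loss_of_caregiver", "peer_conflict"}, set()]
ratings = [4, 1, 4, 1]
print(feature_associations(narratives, ratings))
```

A feature with a large mean difference (here, the invented "loss_of_caregiver") would be a candidate for conversion into a structured, closed question; one with no difference would not.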
In contrast to the previously discussed attempts to supplant labor-intensive procedures directly with more economical methods of doing the same thing, there have also been attempts to increase economy by developing screening procedures designed to reduce the number of events requiring labor-intensive investigation. In this approach, minor events are assumed to be relatively unimportant and can be economically screened out, so that larger events can be investigated with more intensive interview procedures (Brugha & Cragg, 1990; Costello & Devins, 1988; Goodman et al., 1998; Kubany et al., 2000; Miller & Salter, 1984; Wittchen et al., 1989).
The results of most of the screening studies that have actually conducted intensive follow-up interviews suggest that, if the screening instrument includes a few mandatory open-ended or closed-question probes to elicit more information about context, then the occurrence of the large majority of the events that are positive on the screening instrument will later be verified by more intensive interviews with free probing for details of what actually occurred. This seems to hold both for traditional checklist screens composed of broad event categories (Brugha & Cragg, 1990; Miller & Salter, 1984) and for screens composed of both broad and more specifically defined checklist items focused on traumatic events (Goodman et al., 1998).
The most serious limitation of most of these tests of event screening instruments is that only positive responses on the screening instruments have been followed up with intensive interviews. Such studies therefore cannot address the question of how many important events are being missed by the screens. The exception is the study by Goodman et al. (1998), in which a subsample of respondents that included both screen negatives and screen positives was followed 2 weeks later with an intensive interview by a clinician blind to the results of the earlier screen. This 40-minute interview, which drew questions from a variety of previous studies but is not otherwise described, was designed to cover more intensively the same topics as those addressed with the screening instrument. The test-retest correlation for total events was .77; the median kappa for individual events was .64, with kappas for six of the items falling below .60. The interview elicited more events than the screening instrument, especially for the six low-kappa events. These results suggest that screening instruments of this type, despite showing substantial agreement with intensive interviews, will err on the side of underinclusiveness.
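The kappa statistics reported above measure screen-interview agreement on individual events corrected for chance agreement. A minimal sketch of the computation for one binary event item follows; the response data are hypothetical, not Goodman et al.'s.

```python
def cohens_kappa(screen, interview):
    """Cohen's kappa for agreement between two binary measures of the
    same event (e.g., a screening checklist vs. an intensive interview).
    Inputs are parallel lists of 0/1 reports, one pair per respondent."""
    n = len(screen)
    observed = sum(a == b for a, b in zip(screen, interview)) / n
    p_screen = sum(screen) / n
    p_int = sum(interview) / n
    # chance agreement: both positive or both negative by independence
    expected = p_screen * p_int + (1 - p_screen) * (1 - p_int)
    return (observed - expected) / (1 - expected)

# Hypothetical reports of one event type from 10 respondents; the
# intensive interview elicits two events the screen missed.
screen    = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
interview = [1, 1, 0, 1, 1, 0, 0, 1, 1, 0]
print(round(cohens_kappa(screen, interview), 2))
```

Note how the pattern of disagreement matters: in this made-up example every disagreement is a screen-negative/interview-positive pair, which is exactly the underinclusiveness the text describes.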
As Paykel (1983) pointed out many years ago, all detailed interview approaches, including LEDS and SEPRATE, start with lists of life event categories. The difference between checklists and intensive interview approaches is that the latter probe positive responses to the listed topics for details and develop narratives of what occurred, with the goal of reducing intracategory variability by permitting identification, in the narratives, of the events and event characteristics of interest. The resulting measures of the events and their characteristics are obtained by investigators' ratings of the narratives. How much detail is needed, however, and how best to obtain it are matters for investigation, as is the question of whether the resulting measures must be obtained by trained raters or whether more economical mechanical scoring can be used. Here is how such investigation might proceed.
Checklist measures would be redesigned to reduce the indeterminate mix of major and minor events within each category. Clues in the research reviewed so far suggest that this could be done economically in one of two ways: (1) adding a few closed-question probes after a positive response to a broad checklist category, as was done by Goodman et al. (1998); or (2) building inclusion or exclusion criteria (or both) into the checklist category itself (Grant et al., 2004; Kubany et al., 2000). In either case, scoring of the responses to the closed-question probes could be done mechanically.
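As a rough sketch of what such mechanical scoring might look like, the snippet below classifies a reported event as major or minor from yes/no answers to closed-question probes. The probe names and the two-of-three scoring rule are invented for illustration; they are not taken from any published instrument.

```python
# Hypothetical closed-question probes following a positive response to a
# broad checklist item such as "serious illness or injury". The scoring
# rule (two or more "yes" answers = major) is purely illustrative.
PROBES = ("life_threatening", "hospitalized_overnight", "lasted_over_a_month")

def score_event(answers):
    """Mechanically classify a reported event as 'major' or 'minor'
    from a dict of yes/no probe answers (True = yes; missing = no)."""
    yes_count = sum(bool(answers.get(p, False)) for p in PROBES)
    return "major" if yes_count >= 2 else "minor"

print(score_event({"life_threatening": True, "hospitalized_overnight": True}))
print(score_event({"lasted_over_a_month": True}))
```

The point of the sketch is simply that once the probes are fully structured, no trained rater is needed: the classification follows deterministically from the response pattern.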
Reducing intracategory variability economically by either of these procedures, that is, (a) developing structured follow-up probes of broad event categories or (b) providing detailed definitions with which to narrow event categories, requires a great deal of information about the nature of each type of event of interest. The procedures being investigated by Wethington et al. (1997) in the SLI and by Grant et al. (2003) involve searching narratives for information that is strongly associated with contextual threat ratings made according to LEDS-type inventorying procedures in particular studies. Other, more generalizable sources can be found in the vast literature reporting investigations of individual events, such as human-made and natural disasters (e.g., Giel, 1998), bereavement (e.g., Clayton, 1998), marital separation and divorce (e.g., Bruce, 1998), rape (e.g., Kilpatrick et al., 1998), and unemployment (e.g., Kasl, Rodriguez, & Lasch, 1998). Kubany et al. (2000), for example, seem to have drawn on studies of various types of sexual abuse to specify inclusion and exclusion criteria for their screening items for these types of events.
How much can be accomplished with economical closed questions and closed probes to identify stressful events and measure their important characteristics needs to be investigated empirically. This could be done by comparing and contrasting types of checklist approaches on their ability to screen for the events of interest and to measure their important characteristics in samples of respondents from relevant populations. If the interest is in major stressful events, then the focus could be on the occurrence of such events over the full life course. For life course coverage, personal interviews would be indicated, because dating, preferably with the aid of life calendar procedures, would be required. A field experiment designed to test the contrasting screening measures could involve data collection in three phases of increasingly labor-intensive methods, as shown in the figure below.
Figure: Three-phase design for comparing the reliability and validity of checklist and stem-question screening instruments for measuring major stressful events.
Two checklist approaches with different strengths and weaknesses would be used as screening instruments in the first phase of the comparison. One screening instrument, Check I, would be a traditional checklist consisting of broad, overinclusive event categories that would be made more specific by closed-question probes of positive responses to determine, for example, whether the events being reported were major or minor (see the earlier examples of the probes used for this purpose by Goodman et al., 1998). The other would be an underinclusive second-generation checklist made up of fully structured questions that narrow the event categories by spelling out, in more detail than most second-generation checklists, inclusion and exclusion criteria in the checklist items themselves (see the previous examples from Kubany et al., 2000). To guard against missing major events that do not clearly meet these inclusion and exclusion criteria, this procedure would make liberal use of "other" categories to gain information about events similar in some but not all ways to the types of events described by the detailed stem questions. This instrument would be called Stem I. Responses to the fully structured probes in Check I and to the closed questions in Stem I would be mechanically scored for each event and its important characteristics. The contrasting screening instruments would be administered in counterbalanced order to the two groups of respondents shown in the figure. This would permit an investigation of how each instrument related to the other, and tests of which instrument more closely approximated the criterion Phase 3 measures of major negative events and their important characteristics. The Phase 3 measures would be based on full semistructured interview and narrative-rating procedures.
About 400 respondents would be needed to ensure sufficient statistical power to detect carry-over effects and possible interactions with demographic differences such as gender. All respondents would be interviewed with both the Check I and Stem I instruments in Phase 1 of this experimental test, using a cross-over design with a 1-week delay between administrations of the two instruments to reduce order, or carry-over, effects. In Phase 2 of the experiment, designed to investigate test-retest reliability, the respondents would be reinterviewed about 2 weeks later. At the end of the retest, each respondent would be asked two brief open-ended questions about each of his or her positive responses on either Check I or Stem I (i.e., What happened? and What led up to it?). The purpose of these questions would be to investigate whether this relatively modest addition would yield a large increase in accuracy by providing additional detail for measuring important characteristics such as the source, centrality, and magnitude of the event. The Phase 3 criterion measures would be applied to all positive responses elicited by Check I and Stem I about 1 month earlier.
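The counterbalancing step of such a cross-over design can be sketched concretely: randomly split the sample into two equal groups, one receiving Check I before Stem I and the other the reverse. The group labels and the use of integer respondent IDs are illustrative assumptions, not details of any actual study.

```python
import random

def assign_orders(respondent_ids, seed=0):
    """Randomly counterbalance instrument order for a two-period
    cross-over design: half the sample gets Check I then Stem I,
    the other half Stem I then Check I."""
    rng = random.Random(seed)  # fixed seed for a reproducible assignment
    ids = list(respondent_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return (
        {r: ("Check I", "Stem I") for r in ids[:half]},
        {r: ("Stem I", "Check I") for r in ids[half:]},
    )

# Illustrative sample of 400 respondents, as in the design described above
group_a, group_b = assign_orders(range(400))
print(len(group_a), len(group_b))
```

With roughly 200 respondents per order, an order-by-instrument interaction (the carry-over effect the design is powered to detect) can be tested by comparing event reports across the two groups at each administration.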
The field experiment summarized in the figure is designed (a) to compare the major stressful events and their characteristics identified by each screening instrument with those identified by the other, and (b) to test the ability of each screening instrument to identify the major stressful events and important characteristics elicited and measured by the full intensive interview and narrative ratings of Phase 3. The results would be relevant for retrospective research over the life course or long periods of it. As noted earlier, such research must rely on personal interviews and life calendar aids to recall. Additional tests of the two types of screening instruments could be developed for different purposes: for example, the longitudinal study of minor and major events in relation to the onset and recurrent course of major depression. If particularly sensitive events were to be investigated, especially with children or adolescents (Turner et al., 1998), then it would be well to test the relative accuracy of computer-assisted interview, self-administered questionnaire, and personal interview versions of Check I and Stem I.