The present study examined the psychometric properties and diagnostic efficiency of the Davidson Trauma Scale (DTS), a self-report measure of posttraumatic stress disorder (PTSD) symptoms. Participants included 158 U.S. military veterans who have served since September 11, 2001 (post-9/11). Results support the DTS as a valid self-report measure of PTSD symptoms. The DTS demonstrated good internal consistency, concurrent validity, and convergent and divergent validity. Diagnostic efficiency was excellent when discriminating between veterans with PTSD and veterans with no Axis I diagnosis. However, although satisfactory by conventional standards, efficiency was substantially attenuated when discriminating between PTSD and other Axis I diagnoses. Thus, results illustrate that potency of the DTS as a diagnostic aid was highly dependent on the comparison group used for analyses. Results are discussed in terms of applications to clinical practice and research.
Self-report measures are valuable tools for clinicians and researchers, as they are quick and cost-effective methods for assessing symptoms associated with mental illness. In the last two decades, several self-report measures of posttraumatic stress disorder (PTSD) have been developed (see Brewin, 2005; Norris & Hamblen, 2004; for reviews). Concurrently, there has been growing recognition that a measure is useful only if support for its validity has been demonstrated (Hunsley & Mash, 2005). As evidence accumulates for the negative impact of PTSD on overall health (Beckham et al., 1998; Dohrenwend et al., 2007; Taft, Stern, King, & King, 1999), family adjustment (Jordan et al., 1992), and health care costs (Walker et al., 2003), the need for brief and valid measures of PTSD symptoms has become clear.
Prior studies have validated various PTSD symptom questionnaires for use with several targeted groups, including breast cancer patients (Andrykowski, Cordova, Studts, & Miller, 1998), crime victims (Wohlfarth, van den Brink, Winkel, & ter Smitten, 2003), Vietnam-era combat veterans (Forbes, Creamer, & Biddle, 2001), female veterans in primary care (Dobie et al., 2002; Lang, Laffaye, Satz, Dresselhaus, & Stein, 2003) and older adults in primary care (Cook, Elhai, & Areán, 2005). Replications such as these are important, as a symptom scale may take on different properties in different populations (Bossuyt et al., 2003; Brewin, 2005). For example, Blanchard et al. (1996) found that a score of 44 or higher on the PTSD Checklist (PCL) was most effective at identifying PTSD while minimizing false positives in a sample of motor vehicle accident and sexual assault victims. In contrast, Lang et al. (2003), using the same measure, found that a score in the range of 28–30 was most effective in detecting PTSD in female veterans who visited a primary care clinic.
Unfortunately, many populations that are at high risk for trauma exposure do not have adequately validated measures available. For example, in spite of the increasing need for valid PTSD screening instruments for returning military service personnel (Hoge et al., 2004), no self-report measure of PTSD has yet been validated with veterans who have served since September 11th, 2001 (post-9/11). It is estimated that 35% of Operation Iraqi Freedom (OIF) veterans will access mental health services in the year after returning home, and 5–20% will meet criteria for PTSD (Hoge, Auchterlonie, & Milliken, 2006; Hoge et al., 2004). As there are now several empirically supported treatments for PTSD (American Psychiatric Association, 2004; Bisson & Andrew, 2005; Department of Veterans Affairs & Department of Defense, 2004), there is compelling incentive to validate screening tools for the identification of PTSD in high-risk groups, such as military personnel and veterans.
Diagnostic tests are most efficient when a group identified as having the condition is compared to an equal number of those who exhibit none of the clinical characteristics of the condition (e.g., healthy controls). It is important to consider that clinical characteristics of the comparison group may have significant effects on the efficiency of the test in question, and thus on its generalizability (Coyne & Thompson, 2007; Streiner, 2003). For example, individuals with Major Depressive Disorder endorse many of the symptoms that are found in PTSD (e.g., poor concentration, sleep difficulties, and anhedonia), and thus tend to score higher on PTSD symptom questionnaires than healthy controls (Shalev et al., 1998). In effect, a score that is very efficient when discriminating between PTSD and healthy controls may be less efficient in discriminating between PTSD and individuals with other presenting problems. The latter scenario more closely approximates conditions in a mental health clinic, in which most patients will be in distress and the clinician is faced with the often challenging task of differential diagnosis (cf. Hankin, Spiro, Miller, & Kazis, 1999). Unfortunately, prior studies have rarely described or assessed the clinical characteristics of their comparison groups. Thus, the literature provides little evidence for the diagnostic efficiency of PTSD symptom questionnaires in a mental health setting.
The purpose of the current study was to examine the validity and diagnostic efficiency of the Davidson Trauma Scale (DTS; Davidson, Book, et al., 1997) in a group of veterans who served after September 11th, 2001. The DTS is a self-report measure of the 17 PTSD symptoms as described in the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV-TR; American Psychiatric Association, 2000). As a diagnostic tool, Davidson, Book, et al. (1997) demonstrated that the DTS performed well at discriminating 67 individuals with PTSD from 62 without PTSD (area under the curve [AUC] = 0.88, S.E. = 0.02) using a semi-structured interview (SCID; Spitzer, Williams, Gibbon, & First, 1990) as the reference standard. A DTS score of 40 was recommended as the optimal cut-point for accurate classification of those with or without PTSD (efficiency = 0.83). This cut-point correctly classified 69% of individuals with PTSD (sensitivity = 0.69) and 95% of those who did not have PTSD (specificity = 0.95).
No previous studies have examined the psychometric properties of the DTS in veterans who have served post-9/11. Like most PTSD screening measures, the ability of the DTS to discriminate between those with PTSD and other psychiatric disorders is unknown. Therefore, the current study tested the ability of the DTS to discriminate between veterans with PTSD and two comparison groups: (1) veterans with no Axis I diagnosis and (2) veterans without PTSD but with a current diagnosis of another Axis I disorder.
The sample consisted of 226 volunteer participants in the Mid-Atlantic Mental Illness Research, Education and Clinical Center (MIRECC) Recruitment Database for the Study of Post-Deployment Mental Health. Participants were veterans who have served in the United States Armed Forces since September 11, 2001. About half (53%) of the participants had been stationed in a region of conflict in support of Operation Enduring Freedom or Operation Iraqi Freedom. Participants were recruited from four VISN-6 Veterans Affairs medical centers through mailings, advertisements, and clinician referrals. Informed consent was obtained after explaining procedures. The veterans were administered a battery of questionnaires related to post-deployment mental illness, including psychiatric symptoms, mental-health service utilization, health, and health-related behaviors. Diagnosis was established by the Structured Clinical Interview for DSM-IV-TR Axis I Disorders (SCID-I/P; First, Spitzer, Gibbon, & Williams, 1994), a semi-structured interview administered by trained master's- or Ph.D.-level clinicians. Participants in this sample were among those used for an evaluation of the factor structure of the DTS, which is presented elsewhere (McDonald et al., 2008).
Of the initial pool of 226 participants, 158 veterans recorded a traumatic event on the DTS that clearly met DSM-IV Criterion A1 (i.e., “involved actual or threatened death or serious injury, or a threat to the physical integrity of self or others”; APA, 2000, p. 467). Of those 158 veterans, 75 (47%) recorded war zone-related traumas (e.g., “my truck hit by an improvised explosive device”) and 83 (53%) recorded traumas occurring outside of their deployment (e.g., “near drowning of my son”). The remaining 68 participants recorded a traumatic event on the DTS that was either not a trauma as defined by the DSM-IV (n = 11; e.g., “back pain,” “not being able to talk to my kids”), included multiple, discrete traumatic events in the narrative (n = 3), was too vague to determine the nature of the trauma (n = 32; e.g., “the way I was treated”), or reported no lifetime exposure to trauma (n = 21). As a description of a bona fide traumatic event is necessary for the DTS to be considered valid (Davidson, 1996), these 68 participants were excluded. Thus, data from a total of 158 veterans were retained for analyses.
The DTS is a 17-item self-report questionnaire of posttraumatic stress symptoms, developed for use with trauma survivors. For the current study, the original Davidson, Book, et al. (1997) version was used. Respondents are first asked to record “the trauma that is most disturbing to you.” Next, respondents are asked to read each of the 17 items, and “consider how often in the last week the symptom troubled you and how severe it was.” The first five items specifically refer to reexperiencing or avoiding the disturbing event. The frequency and severity of the symptoms are recorded using 5-point, Likert-type scales. Frequency and severity scores were summed for each symptom, resulting in a total of 17 variables used in analyses (Elhai et al., 2006). The DTS total score was computed by adding all item responses together, with a possible range of 0–136. The three DTS subscales (reexperiencing, avoidance/numbing, and hyperarousal) and separate subscales for avoidance and for numbing (McDonald et al., 2008) were computed by adding all subscale items together and dividing by the total number used in the scale, resulting in a possible range of 0–5.
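The total-score rule described above can be sketched in a few lines. This is a minimal illustration, assuming each frequency and severity rating is coded 0–4 (the coding implied by the stated 0–136 range); the function name and input layout are hypothetical and are not taken from the DTS manual.

```python
def dts_total(frequency, severity):
    """Return (item_scores, total) for one respondent.

    frequency, severity: sequences of 17 ratings, assumed to be coded 0-4.
    """
    if len(frequency) != 17 or len(severity) != 17:
        raise ValueError("expected 17 frequency and 17 severity ratings")
    for rating in list(frequency) + list(severity):
        if not 0 <= rating <= 4:
            raise ValueError("ratings are assumed to lie in the range 0-4")
    # One combined score per symptom (0-8), giving 17 item-level variables.
    item_scores = [f + s for f, s in zip(frequency, severity)]
    # The total is the sum over all items, so it ranges from 0 to 136.
    return item_scores, sum(item_scores)
```

Under this coding, a respondent who rates every symptom at the maximum on both scales obtains the ceiling score of 136.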
In an early validation study, the DTS demonstrated good internal consistency (alpha = 0.99), convergent validity (CAPS, R = 0.78), divergent validity (extroversion, R = 0.04), and concurrent validity, as well as strong test–retest reliability (0.86; Davidson, Book, et al., 1997). A later study (Davidson, Tharwani, & Connor, 2002) demonstrated that the DTS is sensitive to treatment effects of selective serotonin reuptake inhibitors (SSRIs) for PTSD symptoms. Furthermore, the treatment effect size for the DTS was larger than the effect size for the Impact of Events Scale (Horowitz, Wilner, & Alvarez, 1979) and equal to those observed for the Clinician Administered PTSD Scale (CAPS; Blake et al., 1990) and Structured Interview for PTSD (SIP; Davidson, Malik, & Travers, 1997). Two studies have examined the factorial validity of the DTS. Davidson, Book, et al. (1997) conducted exploratory factor analyses on data from 67 individuals with PTSD, and reported a six-factor structure that roughly corresponded to the three DSM-IV symptom clusters. More recently, using confirmatory factor analysis (CFA), McDonald et al. (2008) found that a four-factor structure for the DTS (reexperiencing, avoidance, numbing, and hyperarousal) was invariant across three veteran samples.
The SCID-I/P is a semi-structured interview used to diagnose DSM-IV Axis I disorders, and was used as the reference standard for a diagnosis of PTSD in this study. The SCID-I/P was administered by trained Masters or Doctorate-level research personnel. The reliability of interviewers on scoring a series of seven SCID training videos (full SCID I/P, mean Fleiss’ kappa across 18 Axis I diagnoses) was excellent (Fleiss’ kappa = 0.95).
The SCL-90-R is a 90-item self-report questionnaire used to evaluate a wide range of symptoms of psychopathology. Ratings are made on a 5-point scale. It has nine symptom scales and three global indices of distress. Internal consistency (alpha) for the subscales ranged from 0.77 (psychoticism) to 0.90 (depression) across two validation studies (Derogatis, 1994). The SCL-90-R subscales have demonstrated acceptable convergent and discriminant validity when compared to other multidimensional measures of psychopathology (e.g., MMPI-2, Middlesex Hospital Questionnaire) and when used to discriminate between anxiety and depressive disorders (Derogatis, 1994).
Descriptive data for the sample, item- and scale-level characteristics (e.g., internal consistency), correlations, and mean group differences were computed using SPSS 15.0. Concurrent validity was examined by comparing the DTS total scores across the three veteran groups using analysis of variance (ANOVA). Pearson correlations between the DTS and the subscales of the SCL-90-R were conducted to examine convergent and divergent validity. In addition, standard multiple regression was employed to examine the unique relationship between SCL-90-R subscales and the DTS when all subscales are simultaneously modeled.
Receiver operating characteristic (ROC) curve analysis was used to examine the utility of the DTS compared to a reference standard of PTSD (i.e., SCID-I/P). The ROC curve, based upon signal detection theory, provides a visual and quantitative way of illustrating how well a continuous measure, or “test,” properly classifies cases according to their known membership in one of two groups (Streiner, 2003). The accuracy of the test is graphically represented on an ROC curve, illustrating the change in sensitivity vs. (1 – specificity) as the cut-point on the test is varied. An ROC curve can be used to determine the optimal cut-point which maximizes true positives (i.e., sensitivity) while minimizing false positives (i.e., 1 – specificity) for a given application. The further the curve is positioned away from the diagonal (and toward high true positive rate and low false positive rate), the more accurate the test. An additional advantage of ROC analysis is that it provides a common metric, the “area under the curve” (AUC), with which to quantify the information value of assessment data that is independent of changes in the cutting-scores used for diagnostic decisions and prevalence rates (McFall & Treat, 1999). The AUC is a summary statistic that represents the ability of the variable to correctly classify cases into two groups across the range of scale values. More specifically, the AUC is the probability that, of two cases randomly drawn from the sample, one from each group, the case with the condition will have the higher test score. A test that provides no information concerning group membership will have an AUC of 0.50, and is depicted on an ROC curve as a diagonal line. The AUC of a test increases with the accuracy of the test, to a maximum of 1.00.
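To make the ROC procedure concrete, the sketch below traces ROC points by sweeping the cut-point and computes the AUC in its rank-based (Mann-Whitney) form, that is, the proportion of PTSD/non-PTSD pairs in which the PTSD case has the higher score, with ties counted as one half. It is an illustration only, not the SPSS routine used in the study; the function names are hypothetical, and a case is assumed to test positive when its score is at or above the cut-point.

```python
def roc_points(pos_scores, neg_scores):
    """Return (cut, sensitivity, 1 - specificity) for every observed cut-point.

    pos_scores: test scores of cases with the condition.
    neg_scores: test scores of cases without the condition.
    """
    points = []
    for cut in sorted(set(pos_scores) | set(neg_scores)):
        sensitivity = sum(s >= cut for s in pos_scores) / len(pos_scores)
        false_pos_rate = sum(s >= cut for s in neg_scores) / len(neg_scores)
        points.append((cut, sensitivity, false_pos_rate))
    return points


def auc(pos_scores, neg_scores):
    """Proportion of positive/negative pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

With perfectly separated groups the function returns 1.00, and with identical score distributions it returns 0.50, matching the two reference values described above.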
Sensitivity (proportion of true positives), specificity (proportion of true negatives), positive predictive power (PPP; proportion of those testing positive who actually have the condition), negative predictive power (NPP; proportion of those testing negative who actually do not have the condition), and efficiency (proportion of individuals correctly classified) were calculated in addition to AUC statistics. As a diagnostic test is most efficient when 50% of the sample being tested has the condition of interest (Streiner, 2003), similar numbers of participants were included in each group to be compared to provide an optimal test of diagnostic efficiency. While sensitivity and specificity are considered fixed properties of a test (Streiner, 2003), the predictive power of a test is in part determined by the prevalence of a disorder in a given sample. Thus, PPP and NPP were computed using formulae provided by Streiner (2003) for several PTSD prevalence rates that approximate clinical conditions within the Department of Veterans Affairs health care system: 13% (Operations Enduring Freedom and Iraqi Freedom [OEF/OIF] veterans receiving care in DVA facilities; Seal, Bertenthal, Miner, Sen, & Marmar, 2007), 50% (OEF/OIF veterans with any mental health diagnosis; Seal et al., 2007), and 90% (treatment-seeking veterans at intake in a specialty PTSD clinic).
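The indices defined above can be sketched as follows. The prevalence adjustment shown is the standard Bayes-rule form; it is assumed, not confirmed, to be equivalent to the formulae in Streiner (2003), and the function names are hypothetical.

```python
def classification_summary(tp, fn, tn, fp):
    """Sensitivity, specificity, and efficiency from a 2 x 2 table of counts."""
    sensitivity = tp / (tp + fn)            # true positives / all with condition
    specificity = tn / (tn + fp)            # true negatives / all without it
    efficiency = (tp + tn) / (tp + fn + tn + fp)  # overall hit rate
    return sensitivity, specificity, efficiency


def predictive_power(sensitivity, specificity, prevalence):
    """Positive and negative predictive power at a given prevalence (Bayes rule)."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    # Guard against a test that never returns a positive (or negative) result.
    ppp = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else float("nan")
    npp = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else float("nan")
    return ppp, npp
```

For instance, `classification_summary(36, 3, 29, 3)` uses the symptom cluster counts reported in the Results (36 of 39 veterans with PTSD and 29 of 32 with no diagnosis correctly classified) and reproduces the reported sensitivity, specificity, and efficiency of 0.92, 0.91, and 0.92 after rounding. Calling `predictive_power` with the same sensitivity and specificity at prevalences of 0.13, 0.50, and 0.90 shows how PPP rises and NPP falls as the condition becomes more common.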
In addition to the standard scoring of the DTS, the utility of the symptom cluster method for testing diagnostic efficiency was also employed (e.g., Foa, Cashman, Jaycox, & Perry, 1997). The symptom cluster method is based on the DSM-IV criteria for PTSD, requiring the presence of one reexperiencing symptom, three avoidance/numbing symptoms, and two hyperarousal symptoms. Such a method may provide a more intuitive diagnostic procedure for those familiar with DSM-IV criteria. The present study is the first to test the utility of the symptom cluster method using the DTS.
Gender, race, and other demographic data for the participants are presented in Table 1. Diagnostic interviews using the SCID-I/P revealed that 71 participants (45%) met criteria for an Axis I disorder, with about half of those (39, 55%) meeting criteria for PTSD. Of those with PTSD, 25 (64%) met criteria for a secondary Axis I disorder, consistent with the high levels of comorbidity described in other studies (Magruder et al., 2005). The most common diagnoses accompanying PTSD were Major Depressive Disorder (16, 41%) and Social Phobia (3, 8%). Of those meeting criteria for an Axis I disorder other than PTSD (alone or in combination with another Axis I disorder), the most common diagnoses were Major Depressive Disorder (13, 41%), Social Phobia (7, 22%), alcohol or substance abuse (7, 22%), and Anxiety, NOS (3, 9%). Eighty-seven participants (55%) did not meet criteria for any Axis I disorder. No one in the sample met criteria for a psychotic disorder.
Table 2 presents means, standard deviations, skew, kurtosis, corrected item-total correlations, and squared multiple correlations for items on the DTS. Internal consistency was good for the DTS total score (alpha = 0.97; Nunnally & Bernstein, 1994). Deletion of any one of the items would not substantially change the internal consistency of the DTS total score. However, evaluation of the corrected item-total correlation (corrected rtotal) and the squared multiple correlation for each item suggested that item 7, “found yourself unable to recall important parts of the event,” was a relatively poor fit to the scale.
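For readers who wish to reproduce this kind of item analysis outside SPSS, the sketch below computes Cronbach's alpha and the "alpha if item deleted" values from a respondents-by-items table. It is a minimal illustration using the standard formula; the function names and the list-of-lists input layout are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([list(row[:i]) + list(row[i + 1:]) for row in rows])
        for i in range(k)
    ]
```

An item whose removal raises alpha noticeably would be a candidate for the kind of scrutiny applied to item 7 above.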
Table 3 presents means, standard deviations, skew, kurtosis, Cronbach’s alphas, and Pearson correlations for the DTS total score, three subscale scores (reexperiencing, avoidance/numbing, and hyperarousal), and two separate subscale scores for avoidance and numbing items (McDonald et al., 2008). Internal consistency was good for the three conventional subscales (alpha = 0.92–0.94). Internal consistency of the subscale with the five numbing items (alpha = 0.89) and the subscale for the two avoidance items (alpha = 0.84) was adequate for group research, but was lower than acceptable for analysis of individual responses (Nunnally & Bernstein, 1994). The distribution of the DTS total score and subscale scores was generally positively skewed, with zero as the modal response. When only data from individuals with PTSD were assessed, skewness fell within normal limits for all (−0.12 to −0.43, S.E. = 0.38) but the arousal subscale (−0.80, S.E. = 0.38).
Two-way analyses of variance (ANOVA) were conducted to examine whether DTS total score differed for PTSD status (PTSD, no diagnosis) by gender (male, female) or race (White, Black). While there was a main effect of PTSD status (F[1,121] = 89.9, p < 0.001), there was neither a main effect for gender (F[1,121] < 0.1, ns), nor an interaction (F[1,121] = 0.8, ns). Similarly for race, there was a main effect of PTSD status (F[1,108] = 127.3, p < 0.001), but no main effect of race (F[1,108] < 0.1, ns) or an interaction (F[1,108] = 0.1, ns). For both gender and race, similar results were found with DTS subscale scores serving as the dependent variables.
Analyses of variance demonstrated diagnostic group differences on the DTS total score and subscales, supporting the concurrent validity of the DTS (Table 4). Participants diagnosed with PTSD (M = 79.6) scored significantly higher than participants with no diagnosis (M = 14.7) and those with a non-PTSD diagnosis (M = 37.6). Similar findings were observed for the DTS subscales.
The DTS should demonstrate a modest relationship with measures of anxiety (i.e., convergent validity), as PTSD is considered an anxiety disorder. In contrast, the DTS should demonstrate a smaller relationship with measures of other symptoms, such as depression, psychosis, and hostility (i.e., divergent or discriminant validity). Table 5 displays Pearson correlations between the DTS and subscales from the SCL-90-R. Although correlations were moderate to strong between the DTS and all SCL-90-R subscales, correlations were strongest with two anxiety-related subscales (Obsessive-Compulsive and Anxiety), demonstrating convergent validity.
However, correlations between the DTS and other subscales were moderate to strong as well. To examine the independent ability of the subscales to predict the DTS, a standard multiple regression analysis was conducted with all SCL-90-R subscales placed into one step. The SCL-90-R subscales accounted for 62% of the variance in the DTS, F(9,139) = 27.8, p < 0.001. Semi-partial correlations are presented in Table 5. One anxiety-related subscale, “Obsessive-Compulsive,” was a significant predictor of DTS total scores after accounting for variance contributed by other subscales. The “Anxiety” subscale accounted for the second-largest amount of variance in the DTS, although it was just beyond the threshold of significance (p = 0.06). These findings suggest that the DTS shares a unique relationship with other measures of anxiety that it does not share with other measures of psychopathology (e.g., depression, psychoticism), providing evidence for divergent validity.
ROC curves and AUC statistics were generated for two comparisons. The first examined the ability of the DTS to discriminate between veterans with PTSD (n = 39) and a random subsample of those with no Axis I diagnosis (n = 32; M = 11.8, S.D. = 21.4). A second analysis was conducted to examine the ability of the DTS to discriminate between veterans with PTSD and those with other Axis I diagnoses (n = 32).
As shown in Fig. 1, the DTS was excellent at discriminating between veterans with PTSD and those with no diagnosis (AUC = 0.95, S.E. = 0.03, p < 0.001, asymptotic 95% CI: lower bound = 0.89, upper bound = 1.00). A score of 32 provided the most efficient tradeoff of false positive to false negative classification (efficiency = 0.94; Table 6). Although its performance was still considered good by conventional standards, the DTS was less capable at discriminating between veterans with PTSD and those with mood or anxiety disorders but no PTSD (AUC = 0.83, S.E. = 0.05, p < 0.001, asymptotic 95% CI: lower bound = 0.74, upper bound = 0.92). In this comparison, efficiency was comparable across a wide range of DTS scores, with the highest efficiency at cut-points of 35 (efficiency = 0.77) and 75 (efficiency = 0.76; Table 6). Table 7 provides PPP and NPP values across a range of prevalence values for both comparisons.
The symptom cluster method provides an alternative to the cut-point method for evaluating diagnostic efficiency (Foa et al., 1997; Weathers, Ruscio, & Keane, 1999). This method requires endorsement of one reexperiencing symptom, three avoidance/numbing symptoms, and two hyperarousal symptoms. Rates of endorsement for each symptom per diagnostic group are presented in Table 2. A symptom was scored as “present” if the respondent endorsed both the presence of the symptom (i.e., a “frequency” score of one or greater) and distress associated with the symptom (i.e., a “severity” score of one or greater). By this method, 36 of 39 veterans with PTSD were correctly classified (sensitivity = 0.92). Twenty-nine of the 32 veterans with no diagnosis (specificity = 0.91, efficiency = 0.92) and only half of those with mood or anxiety disorders but not PTSD (16 of 32, specificity = 0.50, efficiency = 0.73) were correctly classified. To examine whether the test’s efficiency could be improved by increasing the symptom severity criteria, the analysis was repeated, this time requiring symptoms to be rated at least “moderately distressing” (i.e., a “severity” score of two or greater). In this case, only 30 of 39 veterans with PTSD were correctly classified (sensitivity = 0.77). Thirty of 32 veterans with no diagnosis (specificity = 0.94, efficiency = 0.85) and 23 of 32 veterans with mood or anxiety disorders but not PTSD (specificity = 0.72, efficiency = 0.75) were correctly classified. Although increasing symptom severity criteria to “moderately distressing” resulted in improved specificity, sensitivity was attenuated. Thus, although these two symptom cluster methods demonstrated adequate efficiency in discriminating between veterans with PTSD and those with no Axis I diagnosis, they did not perform as well as the cut-score method.
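The symptom cluster rule lends itself to a short sketch. The version below applies the 1/3/2 requirement and the "present" definition given above, with the severity threshold as a parameter so that both variants can be run. The item-to-cluster assignment in `EXAMPLE_CLUSTERS` is an assumption made for illustration and should be checked against the DTS manual; all names are hypothetical.

```python
# Assumed item-to-cluster assignment (0-based indices); illustrative only.
EXAMPLE_CLUSTERS = {
    "reexperiencing": [0, 1, 2, 3, 16],
    "avoidance_numbing": [4, 5, 6, 7, 8, 9, 10],
    "hyperarousal": [11, 12, 13, 14, 15],
}

# Number of symptoms required per cluster under the DSM-IV-based rule.
REQUIRED = {"reexperiencing": 1, "avoidance_numbing": 3, "hyperarousal": 2}


def symptom_present(freq, sev, min_severity=1):
    """A symptom counts as present if it occurred and was distressing enough."""
    return freq >= 1 and sev >= min_severity


def meets_cluster_rule(frequency, severity, clusters=EXAMPLE_CLUSTERS, min_severity=1):
    """True if the respondent meets the required count in every cluster."""
    for name, needed in REQUIRED.items():
        count = sum(
            symptom_present(frequency[i], severity[i], min_severity)
            for i in clusters[name]
        )
        if count < needed:
            return False
    return True
```

Calling `meets_cluster_rule(freq, sev, min_severity=2)` corresponds to the stricter "moderately distressing" variant, which by construction can only classify the same or fewer respondents as positive.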
The present study examined the validity and diagnostic efficiency of the DTS (Davidson, Book, et al., 1997) in a sample of veterans who have served since September 11th, 2001. Posttraumatic stress disorder is a relatively prevalent condition among these veterans, and there is a strong need for valid assessment tools. Brief self-report questionnaires, such as the DTS, can provide a considerable aid to the clinician and researcher in identifying those who are experiencing symptoms of PTSD. However, an invalid or misused diagnostic aid can cause more harm than good, considering the major impact that a diagnosis or positive screening can have in a patient’s treatment and subsequently his or her quality of life (Streiner, 2003). Thus, it is critical that self-report measures of PTSD are valid in order to provide accurate aid to assessment, conduct quality research, and offer the best care to patients.
Results of the current study provide support for the DTS as a valid self-report measure of PTSD symptoms for veterans serving since 9/11, particularly in comparison to individuals without a psychiatric diagnosis. Concurrent validity was supported, as veterans with PTSD scored higher on the DTS than those with other Axis I diagnoses or no diagnosis. Convergent and divergent validity were supported, as the DTS had a stronger relationship with anxiety-related symptom scales compared to those tapping other psychopathology. The DTS demonstrated good internal consistency (alpha = 0.97), matching or exceeding alpha reported for other PTSD symptom questionnaires (Norris & Hamblen, 2004). However, item 7, regarding the respondent’s memory of the traumatic event, was endorsed by less than half of the participants with PTSD, and demonstrated relatively low inter-correlations with other DTS items. This finding is consistent with prior research indicating that loss of memory for the event is not a reliable predictor of PTSD (Davidson, Book, et al., 1997; Foa, Riggs, & Gershuny, 1995), and suggests that this item would benefit from rewording or perhaps closer scrutiny as a core symptom of PTSD.
Concerning diagnostic efficiency, potency of the DTS was dependent on the comparison group used for analyses: whereas diagnostic efficiency was excellent when discriminating between veterans with PTSD and veterans with no diagnosis (AUC = 0.95), efficiency was attenuated, although still acceptable, when discriminating between PTSD and other Axis I diagnoses (AUC = 0.83). When attempting to discriminate between veterans with PTSD and those with no Axis I diagnosis, the diagnostic efficiency of the DTS in the current study (AUC = 0.95) was superior to that reported by Davidson, Book, et al. (1997; AUC = 0.88). Whereas Davidson and colleagues reported that a DTS score of 40 provided the best efficiency (0.83), the current study found that a lower cut-point of 32 was more efficient in this sample of post-9/11 veterans (efficiency = 0.94).
Although the efficiency statistic is a common indicator of a measure’s performance, the cut-point employed by a researcher or clinician should also be informed by the intended application. One should note that efficiency reflects the overall hit rate and thus places equal value on obtaining false positives and false negatives. In practice, other utility functions that place more or less emphasis on obtaining false positives may be preferred. For example, if the DTS was used as a PTSD screening tool to identify individuals who may benefit from more time and resource-intensive assessment, a clinician would likely want to use a lower cut-point to identify as many individuals with PTSD as possible (i.e., maximize sensitivity) while accepting the risk of increasing false positives. In another circumstance, a researcher with a limited budget may want to use a higher cut-point to reduce the number of false positives enrolled in the study (i.e., maximize specificity).
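The dependence of the preferred cut-point on the intended application can be illustrated with a small sketch. It chooses the cut-point that minimizes a weighted count of misclassifications: with equal weights this is the same as maximizing efficiency, whereas a larger false-negative weight favors lower cut-points and a larger false-positive weight favors higher ones. The weighting scheme, function name, and scores are hypothetical and were not part of the study's analyses.

```python
def best_cut(pos_scores, neg_scores, fn_cost=1.0, fp_cost=1.0):
    """Return the observed cut-point with the lowest weighted error count.

    A case tests positive when its score is at or above the cut-point.
    Ties are resolved in favor of the lowest cut-point.
    """
    def cost(cut):
        false_neg = sum(s < cut for s in pos_scores)   # missed cases
        false_pos = sum(s >= cut for s in neg_scores)  # false alarms
        return fn_cost * false_neg + fp_cost * false_pos

    candidates = sorted(set(pos_scores) | set(neg_scores))
    return min(candidates, key=cost)


# Made-up scores for illustration only.
ptsd = [30, 50, 60, 80]
comparison = [5, 10, 35, 45]
equal_weight_cut = best_cut(ptsd, comparison)             # maximizes efficiency
screening_cut = best_cut(ptsd, comparison, fn_cost=5.0)   # penalizes missed cases
```

In this toy example the screening-oriented weighting selects a lower cut-point than the equal-weight rule, mirroring the trade-off described above.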
In addition to testing the ability of the DTS to discriminate between PTSD and healthy veterans, this study also examined its capability to discern PTSD from other Axis I disorders. This scenario is likely of more interest to clinicians and researchers in mental health clinics and psychiatric research, who may use PTSD symptom questionnaires to aid differential diagnosis. The DTS performed adequately in this circumstance (AUC = 0.83), albeit not as well as when discriminating PTSD from healthy veterans. A cut-point of 35 provided the optimal balance of identifying as many veterans with PTSD as possible, while maximizing efficiency (0.77): 37 of 39 veterans with PTSD were correctly classified in this study, although specificity was poor, with 14 of 32 veterans with other Axis I disorders mistakenly identified as having PTSD. If a clinician or researcher wishes to minimize false positives (while maximizing efficiency), a cut-point of 75 was best: in the current study, only three false positives (of 32, or 9%) were returned.
Our study examined the range of values for positive and negative predictive power that correspond to PTSD prevalence rates in several scenarios (post-9/11 veterans receiving care, post-9/11 veterans with any mental health diagnosis, and treatment-seeking veterans in a specialty PTSD clinic), and these data illustrate that PPP and NPP vary depending on the prevalence of the disorder in the clinic’s population. When prevalence of a condition is low (e.g., primary care), a test is best used to rule out a condition but not to rule it in. For example, when applying the cutting score that maximized hit-rates between those with PTSD and without a psychiatric disorder in a primary care clinic setting, a positive test result would be correct 61% of the time. Thus, 39% of the time positive test results would be wrong. Conversely, when the prevalence of a condition is high, such as in a specialty PTSD clinic, a test is best used to rule in a condition but not to rule it out (Streiner, 2003).
Concerning the optimal scoring method for testing the diagnostic efficiency of the DTS, results of this study support the conventional cut-point method over a DSM-IV-based, symptom cluster method. Although the authors of the DTS utilized the cut-point method in an early validation study (Davidson, Book, et al., 1997), other PTSD symptom questionnaires have been scored using variations of both methods (e.g., Foa et al., 1997; Ruggiero, Del Ben, Scotti, & Rabalais, 2003). Results of the current study indicated that although the symptom cluster method was effective at correctly classifying veterans with PTSD from those with no Axis I disorders, the cut-point method was more efficient. These findings are consistent with a recent review that found little benefit in utilizing more complex scoring methods instead of the conventional cut-point strategy (Brewin, 2005).
An important issue concerning PTSD symptom questionnaires that needs further research is the potential impact of anchoring responses to one particular traumatic event (Norris & Hamblen, 2004). Clarifying the nature of the disturbing event would appear especially important, given evidence that persons with other psychiatric disturbances, such as major depression, can report symptoms consistent with PTSD even though they have not been exposed to a Criterion A traumatic event (Bodkin, Pope, Detke, & Hudson, 2007). The DTS attempts to capture the index trauma by requiring respondents to write a brief description of the Criterion A traumatic event “that is most disturbing to you” (Davidson, 1996). For this reason, the DTS ostensibly has an advantage over other PTSD measures that do not document the index trauma, such as the PCL-C (Blanchard, Jones-Alexander, Buckley, & Forneris, 1996) and the PC-PTSD (Prins et al., 2003) when used as a PTSD screening tool. However, in this study, 20% of the participants recorded a disturbing event on the DTS that could not be clearly identified as a discrete DSM-IV PTSD Criterion A1 “trauma.” Certainly, many of the participants had a bona fide trauma in mind, as evinced by the 16 of the 47 excluded participants who met SCID I/P diagnosis for PTSD. Perhaps some participants refrained from providing detailed trauma descriptions due to fearful avoidance of reminders, or other motivational factors such as guilt, shame, or fatigue. For whatever reasons, these cases demonstrate the limitation of using an open-ended response format to record traumatic stressors on self-report questionnaires. Although rarely done in practice and not generally reported in previous studies using self-report PTSD instruments, it would seem important to assess whether the event meets criteria for a “traumatic event” prior to the respondent completing the symptom inventory. 
In this regard, it may be helpful to anchor the DTS symptom ratings to the outcome of a trauma rating scale, such as the Traumatic Life Events Questionnaire (Kubany et al., 2000; cf. PDS, Foa et al., 1997).
It is notable that this sample consisted entirely of veterans who have served since September 11, 2001. Although this specificity will be useful to those working with veterans who have served in the post-9/11 era, the extent to which these results generalize to other populations is unclear. In addition, although the participant group employed for this study was diverse, the sample was not large enough to fully examine ethnic or racial differences in diagnostic efficiency for the DTS. Further research is needed to determine generalizability of these findings across demographic groups (e.g., race, gender, and first language), the range of symptom severity and comorbidity, and a variety of settings (e.g., primary care).
Our results suggest that when the DTS is used as a diagnostic screen in settings where rates of other Axis I disorders are high (e.g., a mental health clinic), particularly at the usual recommended screening cutoff scores, substantial diagnostic errors could result. Several symptoms associated with PTSD are also common in other Axis I mental disorders (e.g., concentration problems in Major Depressive Disorder). Thus, it is not surprising that those with an Axis I diagnosis other than PTSD had elevated DTS scores relative to those with no diagnosis, and consequently were far more difficult to distinguish from those with PTSD. This result is also consistent with other research suggesting that PTSD symptom instruments overlap substantially with other psychological distress symptoms (Lauterbach, Vrana, King, & King, 1997). Results are also consistent with a taxometric approach to psychopathology which acknowledges the nonspecific symptoms of distress reported by patients with PTSD, depressive disorders, and Generalized Anxiety Disorder (e.g., Bodkin et al., 2007; O’Donnell, Creamer, & Pattison, 2004; Watson, Gamez, & Simms, 2005). In addition, these findings provide practical support for the view that only symptoms with diagnostic specificity should be retained in future revisions of the DSM (McHugh & Treisman, 2007; Spitzer, First, & Wakefield, 2007).
Unfortunately, no reports are available concerning the ability of the PTSD Checklist (PCL) or the Posttraumatic Diagnostic Scale (Foa et al., 1997) to discriminate between PTSD and other Axis I disorders. Our results suggest that this would be an important avenue for future studies evaluating self-report instruments for PTSD. In addition, we urge clinicians and researchers who utilize a measure’s diagnostic efficiency statistics as part of their diagnostic assessment to consider whether the comparison group used in the validation study is an appropriate match to the population they serve. Furthermore, we reiterate the consensus of the authors of the DTS that it should not be used alone to make a diagnosis of PTSD. Instead, the DTS is foremost a measure of symptom severity, and secondly an ally in comprehensive clinical assessment.
Preparation of this manuscript was supported in part by Office of Research and Development Clinical Science, Department of Veterans Affairs, K24DA016388, K23MH073091, 2R01CA081595, R21DA019704, and HL54780. The views expressed in this presentation are those of the authors and do not necessarily represent the views of the Department of Veterans Affairs or the National Institutes of Health. We would like to acknowledge the contributions of the OIF/OEF Registry Workgroup (Ms. Kimberly Green and Drs. Larry Tupler, Ryan Wagner and Christine Marx), and principal investigators for the OEF/OIF Registry at sites of data collection other than the Durham VAMC (Drs. Marinell Miller-Mumford, Antony Fernandez, Katherine Taber, and Ruth Yoash-Gantz).