Over two decades ago, clinicians were challenged to demonstrate that they were not superfluous as diagnosticians (Spitzer 1983). Since then, studies have compared clinical diagnostic assessments with standardized interview schedules to determine their level of agreement. These studies have focused on children, adolescents and adult outpatients (Basco et al 2000; Komiti et al 2001; Kranzler et al 1995; Shear et al 2000; Zimmerman and Mattia 1999; Ezpeleta et al 1997; Jensen and Weisz 2002; Lewczyk et al 2003; Thienemann 2004), inpatients (Fennig et al 1996; Kranzler et al 1995; Miller et al 2001; Rosenman et al 1997; Aronen et al 1993; Steiner et al 1995), adults transferred from emergency departments to inpatient units (Miller 2001; Taggart et al 2006) and adult epidemiologic samples (Anthony et al 1985; Eaton et al 2000). Studies have assessed psychiatric diagnostic agreement in children and adults across a broad diagnostic range (Aronen et al 1993; Ezpeleta et al 1997; Jensen and Weisz 2002; Lewczyk et al 2003; Steiner et al 1995; Weinstein et al 1989; Zimmerman and Mattia 1999) and across a restricted number of diagnoses (Basco et al 2000; Fennig et al 1996; Komiti et al 2001; Kranzler et al 1995; Miller 2001; Miller et al 2001; Rosenman et al 1997; Shear et al 2000; Taggart et al 2006; Thienemann 2004). Some findings indicate moderate agreement (Anthony et al 1985; Ezpeleta et al 1997; Fennig et al 1996; Komiti et al 2001; Kranzler et al 1995; Miller et al 2001; Taggart et al 2006), but most indicate poor agreement (Aronen et al 1993; Ezpeleta et al 1997; Jensen et al 2002; Komiti et al 2001; Lewczyk et al 2003; Miller et al 2001; Rosenman et al 1997; Shear et al 2000) between diagnoses obtained by clinical and research assessments.
A more critical task is suicide risk assessment. Prospective studies have identified risk factors for suicidal behavior (Oquendo et al 2006), but no standard clinical suicide assessment exists. Few studies have assessed the utility and accuracy of suicide-related rating scales in psychiatric inpatients and outpatients (Beck et al 1988; Beck et al 1989; Beck et al 1979; Brown et al 2000; Holden and DeLisle 2005; Pinninti et al 2002; Steer et al 1993; Steer et al 1993), and the literature comparing standardized rating scales for suicidality with clinical assessments is sparse. Moreover, clinicians appear to fail to document suicidal behaviors that patients disclose on self-report measures or that research ratings identify (Healy et al 2006; Malone et al 1995).
Thus, the accuracy of clinical diagnostic assessment and suicide risk evaluation, which is essential to providing quality and safe care, is suboptimal. To address this issue, we determined the agreement between clinical and research assessments of diagnosis and suicidal behaviors in inpatients admitted to a research unit. If clinical assessment is less likely than research assessment to identify high-risk patients, or arrives at different diagnoses, then incorporating standardized scales into routine care may be useful.
Adult inpatients (N=201) with a major depressive episode (MDE) in the context of major depressive disorder or bipolar disorder, diagnosed with the Structured Clinical Interview for DSM-III-R (Spitzer and Williams 1985), gave written informed consent as approved by the institutional review board (IRB). Postgraduate year II resident physicians (PGYIIs), under attending physician supervision, made the clinical diagnostic and suicide assessments. Master’s- or Ph.D.-level clinicians independently performed structured diagnostic interviews and suicide assessments; the clinical and research assessments took place within 1 to 5 days of one another. Clinical data were obtained from a retrospective chart review of consecutively admitted patients (October 2002 to August 2006).
Women (n=120) and men (n=81), aged 18 to 72 years, underwent a physical examination and routine laboratory tests, including urine toxicology. Exclusion criteria were current substance or alcohol abuse and active medical conditions that could confound diagnosis.
The inpatient unit is in a tertiary care, university-affiliated medical center. Attending psychiatrists, PGYIIs, nurses, social workers, recreational therapists and mental health therapy aides provide patient care.
On admission, patients had a thorough clinical assessment by a PGYII covering chief complaint, history of present illness, current medications, past psychiatric, substance use, physical and sexual abuse, family psychiatric, past medical, family medical, and psychosocial histories, allergies, mental status examination (MSE), multi-axial diagnosis, immediate needs and plan. The standard of care includes an unstructured assessment of current and past suicidal ideation, intent or plan. Attending psychiatrists evaluated patients within 24 hours and either concurred with or amended the PGYIIs’ diagnostic and suicide assessments.
Charts were reviewed for documented suicide risk in the “alerts” section. When a suicide alert was present, the MSE was reviewed for current suicidal ideation, intent or plan; when no alert was documented, the chart was not reviewed further for suicidality. Charts were also reviewed for admission and discharge diagnoses.
The International Personality Disorders Examination (Loranger et al 1994), the SCID-II (Spitzer et al 1990), the 17-item Hamilton Depression Rating Scale (HDRS-17; Hamilton 1960), the Beck Depression Inventory (BDI; Beck et al 1961) and the Brief Psychiatric Rating Scale (Overall and Gorham 1962) were used to assess psychopathology. Suicide attempt was defined as a self-destructive act with intent to end one’s life. The number, method, and degree of medical damage of suicide attempts were characterized using the Columbia Suicide History Form (CSHF; Oquendo et al 2003). Suicidal ideation was assessed using the Scale for Suicidal Ideation (SSI; Beck et al 1979).
For comparison, PGYIIs provided one set of ratings and research interviewers provided a second. Inter-rater reliability among research interviewers is robust (ICC 0.80 to 0.95). Comparable data are not available for clinicians; treating all PGYII ratings as a single rater type was a necessary approximation because 50 PGYIIs each assessed between 1 and 5 patients. Attending staff remained unchanged during the study. SSI scores from the research interview were dichotomized to classify ideation as present or absent for comparison with the clinical assessment. Percent agreement and Cohen’s kappa coefficients (Cohen 1960) were calculated and interpreted according to standard criteria (Landis and Koch 1977).
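The agreement statistics described above can be illustrated with a short Python sketch. This is not the authors' analysis code: it assumes a square table of counts whose rows are clinical (PGYII) ratings and whose columns are research ratings, with categories in the same order, and it labels kappa with the conventional Landis and Koch (1977) cut points of 0.20, 0.40, 0.60 and 0.80. The `ssi_present` helper and its cut-off (any SSI total above 0 counts as ideation present) are hypothetical, because the threshold used to dichotomize the SSI is not stated here.

```python
from typing import Sequence

Table = Sequence[Sequence[int]]


def _check_square(table: Table) -> int:
    """Return the number of categories, raising if the table is not square."""
    k = len(table)
    if k == 0 or any(len(row) != k for row in table):
        raise ValueError("table must be a non-empty square matrix of counts")
    return k


def percent_agreement(table: Table) -> float:
    """Observed agreement p_o: the share of cases on the diagonal."""
    k = _check_square(table)
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(k)) / n


def cohens_kappa(table: Table) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), with p_e from the marginal totals."""
    k = _check_square(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[r][c] for r in range(k)) for c in range(k)]
    p_o = sum(table[i][i] for i in range(k)) / n
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label for kappa using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


def ssi_present(total_score: int, threshold: int = 0) -> bool:
    """HYPOTHETICAL dichotomization: ideation present if SSI total > threshold."""
    return total_score > threshold


# Illustrative counts only (not study data): rows = clinician, columns = research.
example = [[20, 5],
           [10, 15]]
print(round(percent_agreement(example), 3),
      round(cohens_kappa(example), 3),
      landis_koch(cohens_kappa(example)))
```

Run as written, this prints `0.7 0.4 fair`: 70% raw agreement on these made-up counts corresponds to a kappa of 0.40 because half of that agreement would be expected by chance from the marginal totals. The same functions accept larger square tables, such as a cross-tabulation of several diagnostic categories.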
The proximity of the most recent suicide attempt was classified as recent (within one year of assessment), remote (more than one year before assessment) or none. Associations between the proximity of the most recent suicide attempt and agreement between raters were tested using chi-squared tests.
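The proximity classification and the chi-squared comparison can be sketched as follows. This is an illustrative sketch, not the authors' code, and the function names are hypothetical. It assumes that time since the most recent attempt is recorded in months (`None` meaning no attempt), that an attempt exactly 12 months before assessment counts as recent, and that the test compares rater agreement (agree versus disagree) between recent and remote attempters in a 2×2 table using Pearson's chi-squared without a continuity correction; whether a correction was applied is not stated.

```python
import math
from typing import Optional, Sequence, Tuple


def attempt_proximity(months_since_attempt: Optional[float]) -> str:
    """Classify the most recent attempt as 'none', 'recent' or 'remote'.

    ASSUMPTION: time is measured in months and exactly 12 months counts as recent.
    """
    if months_since_attempt is None:
        return "none"
    return "recent" if months_since_attempt <= 12 else "remote"


def pearson_chi2_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-squared (no continuity correction) and p-value for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one case")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts only (not study data):
# rows = recent vs remote attempters; columns = raters agree vs disagree.
chi2, p = pearson_chi2_2x2([[30, 10], [10, 30]])
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```

Patients with no attempt ("none") fall outside this 2×2 comparison; including them would require a 3×2 table with two degrees of freedom, for which a library routine such as SciPy's contingency-table test is a more practical choice than the closed-form p-value used here.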
Clinical and demographic characteristics of the sample are shown in Table I. Agreement between admission diagnoses made by PGYIIs and research diagnoses was 67.7%, a moderate level of agreement (Cohen’s kappa = 0.407). Agreement between discharge diagnoses and research diagnoses was also moderate (68.3%; kappa = 0.432). Cross-tabulation of discharge diagnoses showed that more than half of the patients identified by structured interview as having major depressive disorder (MDD) were also diagnosed with MDD by PGYIIs (Table II).
Agreement on suicide attempt history was moderate (79.2%; kappa = 0.595). Notably, 18.8% of patients identified by the research schedule as past suicide attempters were not identified as such by PGYIIs (Table III). Agreement on suicidal ideation was fair (66.5%; kappa = 0.250). All patients identified by PGYIIs as having suicidal ideation (54.1%) were also identified as such by the research assessment. The converse did not hold: 29.7% of patients assessed by structured interview as having suicidal ideation were not identified as such by PGYIIs.
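Raw agreement and kappa can diverge because kappa discounts agreement expected by chance. Rearranging kappa = (p_o - p_e) / (1 - p_e) gives the chance-expected agreement p_e = (p_o - kappa) / (1 - kappa). The sketch below back-calculates p_e from the rounded percentages and kappas reported above. The function name is hypothetical, and because the inputs are rounded, the results are approximate values derived for illustration rather than figures reported by the study.

```python
def implied_chance_agreement(p_o: float, kappa: float) -> float:
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    if kappa == 1:
        raise ValueError("p_e is not identifiable when kappa == 1")
    return (p_o - kappa) / (1 - kappa)


# Observed agreement (as a proportion) and kappa, as reported in the text.
reported = {
    "admission diagnosis": (0.677, 0.407),
    "discharge diagnosis": (0.683, 0.432),
    "suicide attempt history": (0.792, 0.595),
    "suicidal ideation": (0.665, 0.250),
}

for measure, (p_o, kappa) in reported.items():
    print(f"{measure}: p_e ~ {implied_chance_agreement(p_o, kappa):.3f}")
```

With these inputs, the implied chance agreement is highest for suicidal ideation (about 0.55, versus roughly 0.44 to 0.49 for the other three measures). This is consistent with ideation reaching only a fair kappa even though its raw agreement (66.5%) is close to that for admission diagnosis (67.7%).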
Agreement between clinical and research assessments was 74.6% for suicide attempts within the past year (Table IV) but dropped to 56.5% for attempts more than a year earlier (p < 0.001). Agreement on suicidal ideation did not differ significantly by the proximity of the most recent suicide attempt.
We extend the findings of Malone et al (1995) by using a larger sample to compare clinical diagnostic and suicide assessments with standardized diagnostic and suicide assessments in depressed inpatients. We found moderate agreement between clinical and research diagnoses, as well as moderate agreement on suicide attempt history obtained by clinical and standardized interviews. Finally, we found only fair agreement in the assessment of suicidal ideation by clinical and standardized interviews.
Moderate diagnostic agreement has been reported previously using similar methodology (Anthony et al 1985; Ezpeleta et al 1997; Fennig et al 1996; Komiti et al 2001; Kranzler et al 1995; Miller et al 2001; Taggart et al 2006). Limitations of prior studies include: (1) a mismatch in the experience of clinicians performing structured and unstructured interviews (Ezpeleta et al 1997; Fennig et al 1996; Jensen et al 2002; Kranzler et al 1995; Komiti et al 2001; Lewczyk et al 2003; Rosenman et al 1997; Steiner et al 1995); (2) intervals between evaluations ranging from days to a month (Fennig et al 1996; Jensen et al 2002; Kranzler et al 1995; Rosenman et al 1997; Steiner et al 1995); (3) in two studies, evaluation of different patient groups with similar demographics (Thienemann 2004; Zimmerman and Mattia 1999); and (4) the absence of a “gold standard” for the assessment of psychopathology (Brugha et al 1999). Our study has several strengths: clinical and research teams evaluated the same patients, within days of each other, and the evaluating clinicians were similarly experienced (though not identically trained). We are still left to contend with the lack of a “gold standard” for assessing psychopathology. Assessment requires knowledge of abnormal mental states, the skill to elicit them, and judgment regarding their presence and significance (Brugha et al 1999). Baca-Garcia and colleagues (2007) showed that the consistency of psychiatric diagnoses ranged from 29% for personality disorders to 70% for schizophrenia, with the greatest stability for inpatient diagnoses and the least for outpatient diagnoses. Longitudinal data demonstrating significant fluctuation of psychiatric diagnoses in clinical settings are important reminders of the inherent weaknesses of our current nosology. As diagnostic assessment moves from clinical impressions to semi-structured schedules, reliability may improve, but the issue of validity remains unaddressed.
Our findings regarding agreement between clinical and research assessments of suicide attempts and ideation are sobering. Agreement was moderate for suicide attempt history and dropped to only fair for suicidal ideation. These findings are remarkable given that clinicians were aware of the research team’s focus on suicidal behavior. Most worrisome were the patients whom the semi-structured interview found to have either a suicide attempt history (18.7%) or suicidal ideation (29.7%) but whom PGYIIs did not identify as suicidal. Other investigators have reported similar findings (Beck et al 1988; Levine et al 1989; Steer et al 1993), noting that patients reveal more information about suicide risk during computer-assisted assessment than during clinical interview. Consonant with our findings, Malone et al (1995) demonstrated that discharge summaries did not document recent suicidal ideation or planning behavior in 38% of patients identified with suicidal behavior on research assessment. Our study has a similar design and setting but a 400% increase in sample size. Healy and colleagues (2006) reported that 90% of 735 patients presenting to an emergency department had suicidal ideation; only 37% were rated as suicidal by clinicians, although 62% scored positive on the BSSI. Although that sample was large, the measure of suicidal ideation compared with clinician ratings was patient self-report rather than a clinician-administered assessment.
Our study has limitations. First, neither PGYIIs nor research raters were blinded to the goal of the study. Second, this was a retrospective chart review in which only predetermined parts of the chart were surveyed for diagnosis and suicide alerts. Third, patients may have disclosed suicidal behaviors differently to “researchers” than to “clinicians”, fearing that the latter would place them on restrictive observation status. Fourth, suicidal behaviors may have changed during the 1 to 5 days between assessments, which could explain some discrepancies. Fifth, clinical material was generated by PGYIIs on one unit of a single training program and may not reflect the clinical skills of PGYIIs at other programs or of more experienced clinicians.
Despite these caveats, this study is unique: we compared independent evaluations of diagnosis, suicide attempt history and suicidal ideation in the same patients over a brief interval using different assessment tools, which allows us to make recommendations for care. Using semi-structured interviews and standardized suicide assessments could improve clinical assessment by capturing almost 20% of patients clinically misidentified as not having a past suicide attempt and close to 30% of patients clinically misidentified as not having suicidal ideation. User-friendly instruments may aid clinical assessment by enhancing the reliability and validity of diagnostic and suicide risk assessment.