Diagnostic shifts have been prospectively examined in the short term, but the long-term stability of initial and follow-up diagnoses has rarely been evaluated.
A cohort of 470 first-admission patients with psychotic disorders was systematically assessed at baseline, 6-month, 2-year, and 10-year follow-up. Longitudinal best-estimate consensus diagnoses were formulated after each assessment.
At baseline, the diagnostic distribution was: schizophrenia spectrum disorders 29.6%, bipolar disorder with psychotic features 21.1%, major depression with psychotic features 17.0%, substance-induced psychosis 4.5%, and other psychoses 27.9%. At year 10, the distribution changed to 49.8%, 24.0%, 11.1%, 7.0%, and 8.1%, respectively. Overall, 50.7% changed diagnoses at some point during the study. Most participants who were initially diagnosed with schizophrenia or bipolar disorder retained the diagnosis at year 10 (89.2% and 77.8%, respectively). However, 32.0% of participants (N=98) originally given a non-schizophrenia diagnosis gradually shifted into schizophrenia by year 10. The second biggest shift was to bipolar disorder (10.7% of those not given this diagnosis at baseline). Changes in the clinical picture explained many diagnostic shifts. In particular, poorer functioning and greater negative and psychotic symptoms predicted a subsequent shift to schizophrenia. Better functioning and lower negative and depressive symptoms predicted the shift to bipolar disorder.
First-admission patients run the risk of being misclassified at early stages in the illness course, including more than 2 years after first hospitalization. Diagnosis should be reassessed at all follow-up points.
In their classic paper on diagnostic validity, Robins and Guze (1) commented that a change in diagnosis in follow-up studies provides compelling evidence about diagnostic heterogeneity. Shifts in diagnosis mean that reliance on the initial categorization could lead to biased estimates of risk factors, familial aggregation, and prognosis (1–4), as well as misjudgments about optimal treatment (5). Prospective studies of systematically diagnosed patients with first-episode psychosis have found that by 5-year follow-up, initial diagnoses of schizophrenia or bipolar disorder were retained by 80–90% of patients (3, 6–7), but other diagnoses, such as major depression with psychosis, drug-induced psychosis, and schizophreniform disorder, were frequently revised, suggesting substantial misclassification (7, 8–17). Little is known about diagnostic stability of psychotic disorders beyond five years or the temporal stability of the follow-up diagnoses. The World Health Organization (WHO) Determinants of Outcome of Severe Mental Disorders (DOSMeD) first-contact study suggested that agreement continues to erode with increasing time from initial diagnosis. In that cohort, 12.8% shifted out of the ICD-10 schizophrenia spectrum 13 years later, and 7.5% shifted in, for an overall kappa (k) of 0.43 (18). The Nottingham DOSMeD site found better agreement for DSM-III-R diagnoses (k=0.60), with all subjects initially diagnosed with schizophrenia retaining the diagnosis (19). We found a similar level of agreement between baseline and 10-year follow-up diagnosis of schizophrenia in preliminary analyses of the Suffolk County Mental Health Project (k=0.52) (20). Long-term studies of mood disorders also reported substantial shifting from major depression to bipolar disorder and to schizophrenia (21–25). Since these studies compared diagnoses at two points in time, it is unknown how often the diagnosis may have shifted before becoming stable.
Information on the determinants of short- and long-term diagnostic shifts is limited. The key determinant involves the evolution of the disorder since diagnosis is based on the presence or duration of specific symptoms and/or decline in functioning (7, 18, 26).
This paper uses a heterogeneous, first-admission sample with psychotic disorders reassessed over a period of 10 years to examine the stability of five broad diagnostic categories: schizophrenia spectrum disorders, bipolar disorder with psychotic features, major depression (MDD) with psychotic features, substance-induced psychosis, and other psychotic conditions (primarily Psychosis NOS). Diagnoses were formulated by consensus at baseline and at the 6-month, 2-year, and 10-year follow-ups (20). We specifically evaluated (a) the distributions, stability, and trajectories of these diagnostic categories; (b) the associations of changes in symptom severity and treatment with changes in diagnosis; and (c) the ability of early clinical features to forecast the diagnostic changes.
The research reported here is from the Suffolk County Mental Health Project, a naturalistic study of the course and outcome of psychotic disorders. The sampling frame consisted of consecutive first admissions with psychosis to the 12 psychiatric inpatient facilities in Suffolk County, New York, from 1989 to 1995. Inclusion criteria were first admission current or within six months, clinical evidence of psychosis (any positive symptoms or antipsychotic medication), ages 15–60, IQ >70, proficiency with English, and no apparent general medical etiology.
The study was approved annually by the Committees on Research Involving Human Subjects at Stony Brook University and the institutional review boards of participating hospitals. Treating physicians determined capacity to provide consent. The head nurse or social worker referred potentially eligible patients to the study. Written consent was obtained from adult participants and from parents of patients <18 years old.
Face-to-face assessments by master’s level mental health professionals were conducted at baseline, 6-month, 2-year, and 10-year follow-up. Medical records and interviews with informants, usually a family member, were obtained at each assessment.
We initially interviewed 675 participants (72% of referrals); N=628 met the eligibility criteria. Forty-two participants died during the 10 years. Of the remaining 586 participants, N=470 (80.2%) were successfully contacted at 10-year follow-up and comprise the analysis sample. Reasons for non-participation were refusal (N=61), untraceable (N=36, including 9 who left the country), uncooperative relatives (N=10), and lacking capacity to provide consent (N=9).
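The sample flow described above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the counts reported in this paragraph; the variable names are illustrative and are not taken from the study's data files.

```python
# Counts as reported in the text; variable names are illustrative only.
eligible = 628             # interviewed participants who met eligibility criteria
died = 42                  # deaths during the 10 years
followed_at_year_10 = 470  # successfully contacted at 10-year follow-up

# Reported reasons for non-participation among survivors
non_participation = {
    "refusal": 61,
    "untraceable": 36,
    "uncooperative relatives": 10,
    "lacking capacity to consent": 9,
}

alive = eligible - died                    # survivors eligible for follow-up
lost = alive - followed_at_year_10         # survivors not followed at year 10
retention_pct = round(100 * followed_at_year_10 / alive, 1)

print(alive, lost, sum(non_participation.values()), retention_pct)
```

The listed reasons for non-participation account for every surviving participant who was not followed, and the retention rate matches the reported 80.2%.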
At baseline, month 6, and year 2, we administered the Structured Clinical Interview for DSM-III-R (SCID; 27); at year 10, we administered sections of the SCID for DSM-IV. Follow-up SCIDs covered the interval from last assessment. The interviewers were aware of previous SCID information. The depression module was administered without skip-outs. We inserted items about severity of suicide attempts and aggression. SCID symptom ratings integrated interview, medical record, and significant other sources. The SCID trainer observed 5–10% of interviews. Agreement (intraclass correlation) for the baseline to 2-year waves was 0.75 for psychotic symptoms, 0.78 for negative symptoms, and k=0.73 for depressive symptoms; at 10 years, the corresponding statistics were 0.81, 0.87, and 0.79 (28–30).
The primary study diagnosis was determined by consensus. At baseline, two psychiatrists independently completed the SCID diagnosis module; inconsistent diagnoses, occurring for <10% of participants, were reviewed by a third psychiatrist (13). At follow-up, at least four psychiatrists formulated best-estimate longitudinal consensus diagnoses from information accumulated over time (except prior research diagnoses), including the interviewers’ narratives (7, 20). If consensus was not reached or the diagnosis did not fit a DSM category, the diagnosis was coded as ‘unknown.’ At baseline, 12.8% of diagnoses were unknown (60/470). At follow-up, the figures were: 4.3% at 6 months (19/438); 1.7% at 2 years (8/459); and 0.9% at 10 years (4/470). ‘Unknown’ was included in the Other category.
Baseline diagnoses used DSM-III-R; follow-up diagnoses used DSM-IV. Although DSM-III-R and IV criteria varied somewhat, a review of 6-month diagnoses using both criteria sets indicated that for the broad categories considered here, the differences were negligible (i.e., four 6-month DSM-III-R cases were revised under DSM-IV).
Eight clinical ratings were obtained at each assessment: (1) negative symptoms (sum of 18 items from the Scale for the Assessment of Negative Symptoms [SANS], excluding inattentiveness during mental status testing) (31); (2) psychotic symptoms (16 items on delusions and hallucinations from the Scale for the Assessment of Positive Symptoms [SAPS]) (30, 32); (3) disorganized symptoms (13 SAPS items on bizarre behavior and thought disorder); (4) depressive symptoms (sum of 9 SCID past-month depression symptoms) (30); (5) mania severity (excitement item of the Brief Psychiatric Rating Scale [BPRS]) (33); (6) suicide attempts (lifetime at baseline; past interval at follow-up); (7) aggression, rated 1=never to 5=frequent violence toward people or property; and (8) Global Assessment of Functioning (GAF) for the best month in the year before baseline and year 10 and in the interval between assessments at month 6 and year 2.
Treatment variables included rehospitalization during follow-up intervals; antipsychotic, antidepressant, and anti-manic medication use at each contact; and substance abuse treatment in the prior 6 months. There was good agreement between self-report and medication information in outpatient records (34).
Agreement of earlier diagnoses with 10-year diagnosis was examined using kappa, positive predictive value, negative predictive value, sensitivity and specificity.
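For readers less familiar with these agreement indices, the sketch below shows how each is computed from a 2×2 cross-classification of an earlier diagnosis (present/absent) against the 10-year diagnosis treated as the reference. The function name and the example counts are hypothetical and chosen only for illustration; they are not study data, and the study's own computations may have differed in detail.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    kappa: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def agreement_stats(tp: int, fp: int, fn: int, tn: int) -> Agreement:
    """Agreement of an earlier diagnosis with the 10-year (reference) diagnosis.

    tp: diagnosis present earlier and at year 10
    fp: present earlier, absent at year 10
    fn: absent earlier, present at year 10
    tn: absent at both assessments
    """
    n = tp + fp + fn + tn
    if min(tp + fp, fn + tn, tp + fn, fp + tn) == 0:
        raise ValueError("every row and column margin must be non-zero")

    observed = (tp + tn) / n
    # Chance agreement expected from the marginal totals (Cohen's kappa)
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return Agreement(
        kappa=(observed - expected) / (1 - expected),
        sensitivity=tp / (tp + fn),   # year-10 cases already identified earlier
        specificity=tn / (tn + fp),   # year-10 non-cases correctly excluded earlier
        ppv=tp / (tp + fp),           # earlier diagnoses retained at year 10
        npv=tn / (tn + fn),           # earlier exclusions still excluded at year 10
    )


# Hypothetical counts, for illustration only (not study data)
example = agreement_stats(tp=40, fp=10, fn=20, tn=30)
print(example)
```

In this framing, a high positive predictive value combined with low sensitivity corresponds to few false positives but many false negatives at the earlier assessment.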
Symptom and treatment determinants of shifts in diagnosis were examined using mixed-effects logistic regression (35) estimated in SAS version 9.1 with PROC NLMIXED. The time-varying symptom composites and treatment variables were entered simultaneously into separate regression models examining changes in each diagnostic category (coded 1=present, 0=absent). Continuous variables were standardized with respect to their grand means and standard deviations (across all subjects and follow-up points) to facilitate interpretation. Slopes of independent variables and the intercepts were random terms in order to model associations for each participant. Time was modeled as a categorical variable to control for average changes in the dependent and independent variables across assessments. The random effects covariance structure was specified as an unstructured covariance matrix.
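One way to write the model described above is sketched here. The notation is ours and is an assumption rather than the fitted specification: $Y_{it}$ is the indicator for a given diagnostic category for participant $i$ at assessment $t$, $x_{kit}$ is the $k$-th time-varying symptom or treatment variable, $\bar{x}_k$ and $s_k$ are its grand mean and standard deviation across all subjects and follow-up points (standardization applies to the continuous variables only), and the random effects are assumed to be multivariate normal.

```latex
z_{kit} = \frac{x_{kit} - \bar{x}_k}{s_k},
\qquad
\operatorname{logit}\,\Pr(Y_{it} = 1)
  = (\beta_0 + b_{0i})
  + \sum_{k} (\beta_k + b_{ki})\, z_{kit}
  + \sum_{t' \neq t_{\mathrm{ref}}} \gamma_{t'}\, \mathbf{1}\{t = t'\},
\qquad
(b_{0i}, b_{1i}, \dots)^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

Here $\beta_k$ is the average association of variable $k$ with the diagnosis, $b_{0i}$ and $b_{ki}$ are the participant-specific random intercept and slopes, $\gamma_{t'}$ are the categorical time effects relative to a reference assessment $t_{\mathrm{ref}}$, and $\boldsymbol{\Sigma}$ is the unstructured covariance matrix of the random effects.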
We then tested whether the variables that were significant in the mixed-effect logistic regression models predicted subsequent shifts in diagnosis. Using structural equation modeling (Mplus version 5.1), we specified cross-lagged models in which the follow-up diagnostic status was predicted jointly by diagnostic status and participant characteristic from the preceding assessment point (online data supplement, Figure 1). In evaluating the models, we examined the Comparative Fit Index, the Tucker Lewis Index, and the Root Mean Square Error of Approximation (36).
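The cross-lagged structure can be summarized in a single equation. This is a sketch in our own notation, not the exact Mplus specification: $D_{i,t}$ is diagnostic status (1=present, 0=absent) for participant $i$ at assessment $t$, $X_{i,t-1}$ is the participant characteristic at the preceding assessment, and $g$ is a link function for the binary outcome whose form (e.g., logit or probit) is not stated here and is left unspecified.

```latex
g\bigl(\Pr(D_{i,t} = 1)\bigr)
  = \alpha_t + \beta_t\, D_{i,t-1} + \gamma_t\, X_{i,t-1},
\qquad
t \in \{\text{month 6},\ \text{year 2},\ \text{year 10}\}
```

Under this formulation, $\beta_t$ captures the stability of the diagnosis across the interval, and $\gamma_t$ captures the extent to which the earlier characteristic forecasts a shift in diagnosis over and above prior diagnostic status.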
Missing data were addressed in structural equation modeling using the full information maximum likelihood method (37), which estimates models from all available data, thus minimizing attrition-related biases. An analogous approach was employed in mixed-effects logistic regression so that data from each participant were included in the analysis. The longitudinal analyses were based on 1837 observations from 470 participants.
About half of the 10-year follow-up sample was male (57.2%), under age 28 at baseline (50.4%), and from blue collar households (47.4%) (Table 1). Three-quarters (74.3%) were Caucasian. At baseline, about half (46.4%) had lifetime episodes of MDD. One-fifth (21.3%) had a history of frequent/serious aggression.
Compared to non-participants, the 10-year sample had poorer baseline SANS and GAF ratings, and more participants came from blue collar households (Table 1). No other significant differences were found, including the baseline research diagnosis.
The proportion diagnosed with schizophrenia spectrum disorders increased progressively from 29.6% of the sample at baseline to 49.8% at year 10 (Figure 1). Schizophrenia increased from 20.9% at baseline to 38.1% at year 10, and schizoaffective disorder from 3.4% at baseline to 11.5% at year 10. In contrast, schizophreniform disorder decreased from 5.3% of the sample at baseline to 0.2% at year 10. Eighty percent of participants with baseline schizophreniform disorder were rediagnosed with schizophrenia or schizoaffective disorder. Similar proportions were diagnosed each time with bipolar (21.1% at baseline; 24.0% at year 10) and substance-induced (4.5% at baseline; 7.0% at year 10) disorders, while MDD fell from 17.0% at baseline to 11.1% at year 10, and Other disorders decreased from 27.9% at baseline to 8.1% at year 10.
Agreement of baseline with 10-year diagnoses was low (kappa ranging from 0.13 to 0.65), but reasons for inconsistency differed. Schizophrenia showed relatively low negative predictive value and sensitivity but high positive predictive value, indicating low false positive and high false negative rates. In contrast, bipolar, MDD, and substance-induced disorders showed relatively weak sensitivity and positive predictive values (prospective consistency), indicating high false negative and false positive rates. The same was true for Other psychoses except that the false negative and false positive rates were higher. Agreement improved over time, with kappa for 2-year to 10-year comparisons ranging from 0.69 to 0.76, except for Other psychoses (k=0.45).
To examine the patterns of diagnostic shifts, we focused on the 432 participants given a research diagnosis at all waves (Figure 2). For each baseline category, we traced the number receiving the same diagnosis each time (straight line), the number with the same baseline and 10-year diagnosis but different at 6-months and/or 2-years (squiggly line), and the number with a different diagnosis at year 10 (broken line). Only 49.3% (213/432) retained their original diagnosis each time. Participants initially diagnosed with schizophrenia were most likely to retain the diagnosis throughout (78.6%, 99/126), followed by bipolar disorder (69.4%, 66/95), substance-induced psychosis (56.3%, 9/16), and MDD (42.9%, 33/77). Only a small proportion (8.5%) stayed in the Other category.
The largest proportion of diagnostic shifts was to schizophrenia. Among 306 participants with a non-schizophrenia diagnosis at baseline, 98 (32.0%) were eventually diagnosed with schizophrenia, with one-third of these shifts (36/98) occurring after year 2. Shifts from mood disorders were primarily to schizoaffective disorder (15/23 from MDD, 8/14 from bipolar disorder). The second largest shift was to psychotic bipolar disorder, involving 10.7% of participants with a non-bipolar diagnosis at baseline (36/337); one-third of them (12/36) occurred after year 2. Eleven participants with baseline MDD (14.3%) switched to bipolar disorder, half (5/11) after year 2.
The right half of Figure 2 shows the composition of the 10-year diagnostic groups relative to these trajectories. Looking retrospectively, 68.8% with MDD at 10-year follow-up received the same diagnosis since baseline, followed by bipolar disorder (60.0%), schizophrenia (47.1%), substance-induced psychosis (31.0%), and other disorders (28.6%).
Mixed-effects logistic regression was used to examine the changes in the clinical picture and treatment exposures that contributed to changes in diagnosis. Given the number of comparisons, we focus on findings with p<0.01 (Table 3).
The shift to schizophrenia was more likely to occur when there was a decrease in GAF and depression symptoms, an increase in negative and psychotic symptoms, and initiation/reinstatement of antipsychotic medications. Shifts to psychotic mood disorders were associated with improvement on the GAF, increased depressive symptoms, and decreased negative and psychotic symptoms. Improvement on the GAF was particularly pronounced for a shift to bipolar disorder, while the increase in depressive symptoms was especially important for a shift to MDD. The change to bipolar disorder was also associated with an increase in excitement ratings and initiation/reinstatement of mood stabilizers. A change to MDD was preceded by initiation/reinstatement of antidepressants and discontinuation of mood stabilizers. Rediagnosis to substance-induced psychosis followed initiation/reinstatement of substance abuse treatment.
To determine whether we could forecast changes in diagnosis, we selected the significant variables from the mixed-effects logistic regression models and constructed 18 models using structural equation modeling. These models showed reasonably good fit (Table 4). For schizophrenia, poorer GAF, SANS, and SAPS predicted shifts into this category from baseline to month 6 and from year 2 to year 10, but none predicted a shift from month 6 to year 2. For bipolar disorder, better GAF, lower SANS, lower depression, greater excitement, and anti-manic medication antedated shifts from baseline to month 6. The first three also predicted the shift from year 2 to year 10, but only treatment with anti-manic medication forecasted the shift from month 6 to year 2. For MDD, increased depressive symptoms, lower SAPS, use of antidepressants, and not using anti-manic medication predicted a shift from baseline to month 6. None of the selected variables predicted later shifts. For substance-induced psychosis, substance abuse treatment predicted a shift from baseline to month 6, but not at later intervals.
This report examined diagnostic stability in a first-admission cohort across 4 assessments over a 10-year period. We had earlier found considerable shifting between baseline, 6-month, and 2-year follow-up, most notably from MDD and Other psychoses to schizophrenia (7, 13). In this paper, we again found a substantial number of revisions at the 10-year follow-up, including 20.7% whose diagnosis changed from year 2 to year 10. Only half of the cohort retained the same diagnosis throughout the study. Changes in clinical symptoms and treatment were important determinants of shifts in diagnosis. The observed effects were consistent with expectations, except that disorganized symptoms and rehospitalization were largely unrelated to change in diagnosis. Furthermore, some diagnostic changes could be anticipated in advance. Participants who did not meet criteria for schizophrenia but exhibited poor functioning and greater negative and psychotic symptoms were likely to shift into that category, whereas better functioning and lower negative and depressive symptoms predicted a later shift to bipolar disorder.
Our findings must be viewed within the context of the limitations of the study. First, the sample included patients hospitalized with psychotic symptoms, and the results may not generalize to patients who were never hospitalized or did not have co-occurring psychosis. Second, the substance-induced and MDD groups were small, and thus the modeling analyses were able to detect antecedents only for the baseline to 6-month shifts, when changes were more common. Third, our measure of mania severity (BPRS excitement) was crude. Our study began with a focus on schizophrenia and was designed to minimize exclusion of false negatives. We later realized that many participants had a primary mood disorder. Starting at 2-year follow-up, we added a mania rating scale. Fourth, DSM-IV was published as we were completing the 6-month diagnoses. We updated all of the 6-month diagnoses but were unable to recheck the baseline diagnoses. However, only four 6-month diagnoses warranted a change. The fact that the shifts occurred across the 10 year period and were not limited to the baseline to 6-month period also suggests that this is not the primary explanation for our findings. Fifth, our diagnoses were formulated by consensus, which precluded examining inter-rater (psychiatrist) agreement. Sixth, we do not know precisely when, during the 2- to 10-year follow-up, the diagnostic team would have concluded that a change in diagnosis was warranted. Lastly, the interviewers and psychiatrists had access to multiple longitudinal sources of information and were blind only to prior research diagnoses. Diagnoses established in this fashion are not comparable to diagnoses determined by clinical judgment or cross-sectional SCID ratings. However, longitudinal information is essential to most diagnoses, and we wished to improve the accuracy of the consensus diagnosis with the best possible chronological record of the evolution of the disorder. 
If anything, having access to longitudinal information should have led the research psychiatrists to maintain the same diagnosis in the absence of clear evidence to the contrary. All in all, the diagnostic instability reported here should be regarded as a “best case scenario.”
We could not locate prior studies that considered serial research diagnoses in samples with psychosis. However, studies have examined serial clinical diagnoses in treatment samples (e.g., 39–40). These studies also report temporal variability in diagnosis, with schizophrenia having the best and personality disorders the worst agreement.
Changes in diagnosis may have a number of explanations. By definition, some diagnoses require specific temporal patterns of symptoms (e.g., schizoaffective disorders). Other diagnoses, such as bipolar disorder, include episodes with different polarities that take time to unfold. Many symptoms, such as social withdrawal or agitation, may be present in more than one disorder. In terms of psychotic symptoms, none is pathognomonic of a specific diagnosis. At the time of initial presentation, there are often gaps or ambiguities in the information available to establish a diagnosis. In addition, we included participants with significant substance use histories to construct a generalizable sample, and this too may have confounded the clinical presentation and ultimately the diagnoses. Events occurring just before an acute decompensation may be “red herrings” when viewed in the context of the overall illness course and take on undue weight in the initial diagnosis. Thus, it is not surprising that first-episode patients run the greatest risk of a shift in diagnosis with longitudinal assessment.
Our results demonstrate that diagnostic reconsideration is often linked to changes in functioning or symptoms. In addition, we found that changes in medication regimens administered by community physicians forecasted shifts in diagnosis. Although at first glance, this may seem tautological, the treatments administered by community physicians could also reflect their awareness of diagnostically important symptoms, even when they are subsyndromal or at low to moderate levels of severity. This raises the possibility that strict adherence to the diagnostic criteria may have led us to miss clues utilized by practitioners and misjudge the illness initially. There is obviously a tension between strict implementation of diagnostic criteria, in which equal weight is given to all of the components, and community diagnoses that consider some symptoms and behaviors as more salient than others. Nevertheless, if we regard our 10-year diagnoses as the gold standard, then half of the study population was misclassified based on our initial rigorous application of DSM criteria. This is a very concerning finding given that patients have treatments (with their associated side effects) recommended on a long-term basis based upon presumptive diagnoses that our data suggest have a 50–50 chance of being revised. Misclassification also has serious implications for research by promoting non-reproducible results and potentially erroneous conclusions across a broad range of studies (e.g., therapeutic indications and outcome predictors; biomarkers; genetics; etiologic factors of specific disorders).
The DSM-III and subsequent revisions have been celebrated for introducing a reliable system of classification. Yet most assessments of diagnostic reliability have focused on inter-rater reliability at a given point in time rather than temporal reliability of initial and subsequent diagnoses determined from prospective research. Our results make clear that a reliable cross-sectional diagnosis may still have poor reliability over time. Conceivably, representative samples, as opposed to participants in clinical trials, include many patients who do not have ‘classic’ clinical presentations. Development of future criteria for psychiatric diagnosis will need to give greater consideration to temporal reliability and predictive validity, rather than cross-sectional reliability of diagnosis.
Robins and Guze (1), writing before the publication of DSM-III, were prescient in drawing attention to the fundamental importance of longitudinal diagnosis for both research and clinical care. Our results, along with a recent study of mood disorder (26), reinforce the importance of reassessing diagnosis over the long term. Because this was a naturalistic study, our findings highlight the complexity of formulating a diagnosis in the face of multiple comorbidities (e.g., mood symptoms, psychotic symptoms, substance use). They also emphasize the clinical significance of judiciously integrating longitudinal information from multiple sources. Finally, these findings underscore the need to periodically re-evaluate clinical diagnoses to ensure that patients are receiving appropriate interventions.
We gratefully acknowledge the support of the participants and mental health community of Suffolk County for graciously contributing their time and energy to this project. We also are indebted to the interviewers for their careful assessments, and to the psychiatrists who contributed to the consensus diagnoses: Alan Brown, Eduardo Constantino, Thomas Craig, Frank Dowling, Shmuel Fennig, Silvana Fennig, Beatrice Kovasznay, Alan Miller, Ramin Mojtabai, Bushra Naz, Joan Rubinstein, Carlos Pato, Michele Pato, Ranganathan Ram, Charles Rich, and Ezra Susser. Special thanks to Janet Lavelle and Al Hamdy for coordinating the study.
FUNDING: National Institutes of Health (MH-44801 to EJB)