The contemporary diagnoses of schizophrenia (sz) in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) and the International Classification of Diseases, 10th Revision (ICD-10) are widely considered important scientific achievements. However, these algorithms were not a product of explicit conceptual analyses and empirical studies but were defined through consensus with the purpose of improving reliability. The validity status of current definitions and of their predecessors remains unclear. The so-called “polydiagnostic approach” applies different definitions of a disorder to the same patient sample in order to compare these definitions on potential validity indicators.
We reviewed 92 polydiagnostic sz studies published since the early 1970s. Different sz definitions show a considerable variation concerning frequency, concordance, reliability, outcome, and other validity measures. The DSM-IV and the ICD-10 show moderate reliability but both definitions appear weak in terms of concurrent validity, eg, with respect to an aggregation of a priori important features. The first-rank symptoms of Schneider are not associated with family history of sz or with prediction of poor outcome. The introduction of long duration criteria and exclusion of affective syndromes tend to restrict the diagnosis to chronic stable patients. Patients fulfilling the majority of definitions (core sz patients) do not seem to constitute a strongly valid subgroup but rather a severely ill subgroup. Paradoxically, it seems that a century after the introduction of the sz concept, research is still badly needed, concerning conceptual and construct validity of sz, its essential psychopathological features, and phenotypic boundaries.
Schizophrenia (sz) remains an elusive entity, and the history of psychiatric research is replete with attempts to formalize its definition, and hence to distinguish it from other disorders, as well as attempts at various internal subdivisions (eg, acute vs chronic or poor premorbid vs good premorbid subtypes). In fact, since the introduction of the concept, psychiatry has produced no fewer than 40 definitions of sz.
These historical permutations naturally sink gradually into oblivion with the most recent algorithms (such as Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition [DSM-IV] and International Classification of Diseases, 10th Revision [ICD-10]) acquiring the aura of important epistemological achievements with solid empirical foundations and insidiously reified into truly existing natural entities.1, 2 Yet, it is important to realize that the operational diagnoses of today owe their shape not so much to their scientific foundations but to pragmatic needs and ensuing decisions to increase international consensus.
One possible investigative approach to the reliability and validity of sz definitions is to compare these definitions between themselves and with their historical predecessors. For example, to say that ICD-10 is superior to ICD-8/ICD-9 requires comparing these 2 algorithms with respect to some validating data of interest. The purpose of this study is to provide a review of such a polydiagnostic approach in sz research. This goal gains in urgency, given the ongoing contemplation of yet another change in the diagnostic systems.
The polydiagnostic approach3–5 consists of applying different sets of criteria for a given diagnostic category to the same group of patients in order to assess the degree of concordance between the diagnoses and/or to compare their validity indicators.
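As a minimal illustration of this approach, the following Python sketch applies three definitions to the same 10 patients and derives the frequency of each diagnosis, the pairwise concordance (Cohen's kappa), and the cases fulfilling all definitions. The ratings are invented for illustration and are not data from any reviewed study; the names `ratings`, `cohen_kappa`, `frequency`, `concordance`, and `core_cases` are assumptions of this sketch.

```python
from itertools import combinations

# Hypothetical ratings: 1 = patient fulfils the definition, 0 = does not.
# The definition names are real; the values are invented for illustration.
ratings = {
    "ICD-10":   [1, 1, 1, 0, 1, 0, 1, 0, 0, 1],
    "DSM-IV":   [1, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    "Feighner": [1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
}


def cohen_kappa(a, b):
    """Cohen's kappa for two binary rating vectors of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance, given each definition's marginal rate.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return 1.0  # both definitions label every patient identically
    return (observed - expected) / (1 - expected)


n_patients = len(next(iter(ratings.values())))

# Frequency: proportion of the sample diagnosed by each definition.
frequency = {name: sum(v) / n_patients for name, v in ratings.items()}

# Concordance: kappa for every pair of definitions.
concordance = {
    (d1, d2): cohen_kappa(ratings[d1], ratings[d2])
    for d1, d2 in combinations(ratings, 2)
}

# Patients (by index) who fulfil every definition.
core_cases = [i for i in range(n_patients) if all(v[i] for v in ratings.values())]
```

With these invented ratings, ICD-10 labels 6 of the 10 patients, DSM-IV 4, and Feighner 2, and only patients 0 and 4 fulfil all three definitions.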
The Medline searches were performed for all clinical and epidemiological studies published since 1970 comparing at least 2 definitions of sz. The Medline search was supplemented by screening references of the individual articles. Studies that did not indicate the numbers of patients with a given diagnosis were not included.
A preestablished scheme was used to record which and how many definitions were used, number of patients, the inclusion criteria, rating setting, the interrater reliability, the diagnostic concordance, follow-up assessments and their results, and other types of validation. Because the studies and hence the data were too heterogeneous, it was not possible to perform a systematic review, where the individual studies could enter into a meta-analytic approach.
We have identified more than 100 articles published between 1972 and 2005, referring to 92 polydiagnostic studies. Twenty-six of these were follow-up studies. An overview of all studies appears in table 1.
The polydiagnostic studies used approximately 40 different diagnostic definitions of sz and related disorders (2–23 in each study, median = 4). An overview of the definitions is shown in table 2. The formal criteria of these definitions differ; table 3 compares the criteria of some selected definitions.
Fifty-eight studies (63%) dealt primarily with psychosis, 11 of which (12%) dealt with first-admission or recent-onset psychosis. Thirty-four studies (37%) included broad groups of patients and population subjects.
The information about the details of psychopathological rating procedures was typically inadequate, except for listing the rating scales. 45% of the studies explicitly mentioned psychiatrists as raters, a further 13% used groups of raters with varying professional backgrounds, and 42% gave no information on the education of the interviewers.
In 26% of the cases, the rating was performed solely on the basis of hospital charts, in 39% exclusively on the basis of patient interviews, and in the remainder based on composite sources of information.
The expectation of increased diagnostic reliability was what justified the introduction of operational definitions, and the DSM-III field studies did indeed present a high reliability level for sz (81%6), but the methodology was loosely structured, and no further field studies were presented for the later DSM revisions to clarify this issue. However, the diagnostic interrater reliability was assessed in less than half of the polydiagnostic studies, usually in the form of Cohen's kappa coefficients. Not surprisingly, these were somewhat better for the more recent operational definitions (from Research Diagnostic Criteria [RDC] onward) than for older definitions (Modestin et al,7 Kirk and Kutchins,8 cf. Kety et al9 vs Kendler et al10) and were generally labeled “good” or even “excellent.” Other forms of reliability checks (eg, test-retest) and other expressions of reliability (eg, symptom agreement) were rarely presented.
Before exploring the question of reliability, one should realize that there are 2 major, overlapping sources of diagnostic disagreement: (1) criterion variance, which refers to the differences in the raters' use and interpretation of the diagnostic criteria, and (2) information variance, which refers to the quality and quantity of the originally collected psychopathological information. The significance of information variance is illustrated by higher kappas found in rating live or videotaped interviews than in rating hospital charts11 and by the fact that the reliability of rating case records remained only moderate even when using structured checklists.12 Brockington13 suggested that low interrater reliability for Feighner's and New Haven definitions in the Camberwell sample was caused by their complexity, which can be seen as an effect of criterion variance. As a rule, a diagnosis based on a few simple items is more easily made reliable than a diagnostic algorithm defined by many interacting features.
Unfortunately, the structure of reliability was rarely discussed, and only a few studies allowed a more detailed reliability examination. In a unique study, Strakowski14 showed that a lack of reliability between the clinical diagnoses and those generated by the SCID-P (Structured Clinical Interview for DSM-III-R, Patient Version) could be partitioned into 58% caused by the information variance and 42% caused by the criterion variance. Such distinctions and explorations of the sources of variance are typically neither performed nor discussed. Yet, if a creation or a revision of diagnostic criteria is motivated by reliability concerns, the emphasis should be on the criterion variance because the information variance is basically related to the comprehensiveness of the assessment.
Reliability is not an intrinsic property of the diagnostic definition. Unreliability may be related to multiple factors, including the skill and education of the interviewer. Reliability is higher in research settings, but reliability achieved there does not ensure reliability in clinical practice. Furthermore, reliability acquired through training on clinical samples cannot be unproblematically extrapolated to population studies where the majority of subjects do not suffer from any mental illness, or suffer from specific psychopathology unaccompanied by dysfunction or distress, or where the subjects are prone to hide their symptoms. Moreover, the exact significance of quantifying reliability is not unequivocal. Thus, the magnitude of the kappa coefficient may reflect differences in prevalence rates.15 Kirk and Kutchins8 demonstrated that a kappa of the same magnitude may be described by different adjectives (eg, good or excellent), depending on the agenda of the individual study. Finally, the conventional wisdom of low reliability precluding validity16 is not invariably true. Some authors demonstrated that diagnostic validity is possible even in the case of low reliability, if the sensitivity is low and specificity is high.17, 18
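The dependence of kappa on prevalence can be made concrete with a short Python sketch. It compares two hypothetical 2 × 2 interrater tables with identical raw agreement (90%), one from a sample in which the diagnosis is common and one in which it is rare. All counts are invented for illustration, and `kappa_from_table` is a helper name assumed here, not a function from any cited study.

```python
def kappa_from_table(both_pos, only_a, only_b, both_neg):
    """Return (raw agreement, Cohen's kappa) for a 2 x 2 interrater table.

    both_pos: both raters diagnose sz; only_a / only_b: only one rater does;
    both_neg: neither rater does.
    """
    n = both_pos + only_a + only_b + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n
    b_pos = (both_pos + only_b) / n
    # Chance agreement from each rater's marginal rate of positive diagnoses.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical counts for 100 subjects each; raw agreement is 90% in both.
clinical = kappa_from_table(45, 5, 5, 45)    # diagnosis common (50% per rater)
population = kappa_from_table(2, 5, 5, 88)   # diagnosis rare (7% per rater)

print(f"clinical:   agreement={clinical[0]:.2f}, kappa={clinical[1]:.2f}")
print(f"population: agreement={population[0]:.2f}, kappa={population[1]:.2f}")
```

Under these assumed counts, the same 90% raw agreement corresponds to a kappa of 0.80 in the clinical sample but only about 0.23 in the population sample, because chance agreement on the absence of the diagnosis is much higher when the diagnosis is rare.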
The studies demonstrated a wide range in the proportions of patients fulfilling the criteria for the individual definitions of sz (eg, Strauss and Gift19: 1–25%; Lewine et al20: 2–60%). Such differences in the frequency and hence in the inclusiveness of the definitions reflect the variation in the diagnostic criteria. The influence of the duration criteria and of the exclusion of affective syndromes was illustrated by the shift from DSM-II (having no such criteria) to the criteria of RDC and DSM-III.21 DSM-II sz was often repartitioned as affective, schizoaffective, and schizophreniform disorders.
A DSM-IV and ICD-10 reanalysis of the Burghölzli sz sample, originally diagnosed by Eugen and Manfred Bleuler, showed that the sz diagnosis was retained in nearly all cases as one of the contemporary spectrum diagnoses (sz, schizoaffective disorder, schizotypal personality disorder).7
There was a striking interstudy variation in the proportion of patients fulfilling a given diagnosis of sz. Differences in study design and inclusion criteria were primarily responsible for this variation. The number of studies allowing an assessment of frequencies of the contemporary definitions was limited. Across 12 studies, the proportion of DSM-III-R sz varied from 24 to 100%, lowest in a group of patients with “functional psychosis22” and highest in patients with “narrowly defined schizophrenia.23” Corresponding figures are found for ICD-10 sz.
The samples composed of the patients selected because of their sz diagnosis were (tautologically) frequently diagnosed as having sz by all applied definitions.24, 25 Selection of chronic sz patients resulted in frequent sz diagnosis even by Feighner's conservative definition.11 In fact, a comparison of different samples of patients demonstrated that the proportion of Feighner sz increased with chronicity, whereas this was not the case for the frequencies of Schneider's first-rank symptoms (FRS) (Berner et al26 and Lenz et al27 vs Cernovsky et al28 and Landmark et al29, 30).
Substantial differences in concordance between the sz definitions were demonstrated in studies comparing the various preoperational definitions.31–34 Yet, between related systems, there was a considerable concordance.7, 31 Some diagnostic concentricity was seen between related definitions. Thus, in one study, almost all Feighner cases also fulfilled DSM-III criteria.33 Cases fulfilling most of the definitions of sz and, consequently, yielding the highest concordance were often named “core schizophrenia” cases. In one study,35 such cases were found to suffer from paranoid sz. The concordant group of patients in the International Pilot Study of Schizophrenia (IPSS) was characterized by a higher percentage of males and of single patients; a psychopathological profile with more hallucinations, delusions, and flatness of affect; and fewer depressive symptoms, precipitating factors, and previous inpatient treatments.36 Making the patient sample more uniform tended to increase the concordance between the definitions.
Restricting the sample to a group of patients with illness duration longer than 6 months increased the concordance kappa between definitions having different duration criteria.35 In one study, the concordance was increased by widening the sample to all first admissions and by eliminating the 3 strictest definitions.13 Definitions excluding affective symptoms were demonstrated to form a cluster with a higher kappa than the cluster formed by the definitions that permit them.37 In a sample of chronic psychotic patients, the elimination of the OPCRIT item 52, “co-occurrence of psychotic and affective symptoms,” increased the agreement of the sz spectrum disorders.38 Among all studies of the present review comparing diagnostic concordance kappas (N = 34), values above 0.80 were found exclusively in those that included chronic psychotic patients but not in first-onset psychotic patients and mixed groups of patients (Fisher exact test: P < .005).
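For readers unfamiliar with the test named above, a two-sided Fisher exact test on a 2 × 2 table can be sketched in Python as follows. The cell counts are hypothetical: only N = 34 and P < .005 are reported here, not the table itself, so the split into 10 chronic-sample and 24 other studies is an assumption made purely for illustration. The function name `fisher_exact_two_sided` is likewise introduced for this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical table (rows: chronic samples, other samples;
# columns: kappa above 0.80, kappa at or below 0.80). Total = 34 studies.
p_value = fisher_exact_two_sided(6, 4, 0, 24)
print(f"P = {p_value:.6f}")
```

The point of the sketch is only the mechanics of the test: when every high-kappa study falls in one row of the table, the exact P value becomes very small even with few studies.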
Seventy-eight studies (85%) presented validation data. The most frequently occurring measure of validation was the predictive power of diagnostic definitions. However, true concurrent validation was rare, be it through neurobiological markers or other relevant measures that do not enter into the diagnostic definition, such as family history of mental illness, psychometric measures of formal thought disorder, or subjective sense of self-dissolution.39
Twenty-four studies (28%) compared the outcome of different sz definitions. The majority of the outcome periods were longer than 5 years. The outcome variables investigated were the prediction of the course of illness, the number of readmissions, symptomatology levels, diagnostic stability, and social and functional outcome.
Diagnostic stability as a measure of outcome (6 studies) was usually calculated as positive predictive value. Several studies showed high stability of the operational definitions, such as DSM-III-R and ICD-10.40
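Diagnostic stability expressed as positive predictive value is simply the proportion of patients diagnosed at baseline who still receive the diagnosis at follow-up. A minimal Python sketch, using invented baseline and follow-up diagnoses (the function name `diagnostic_stability` and the data are assumptions, not taken from any reviewed study):

```python
def diagnostic_stability(baseline, follow_up):
    """Positive predictive value of the baseline diagnosis.

    Both arguments are equal-length sequences of 0/1 flags, one per patient,
    indicating whether the definition was fulfilled at that assessment.
    """
    baseline_pos = sum(1 for b in baseline if b)
    if baseline_pos == 0:
        raise ValueError("no patient fulfilled the definition at baseline")
    retained = sum(1 for b, f in zip(baseline, follow_up) if b and f)
    return retained / baseline_pos


# Hypothetical data for 10 patients.
baseline_dx = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
follow_up_dx = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

stability = diagnostic_stability(baseline_dx, follow_up_dx)
print(f"stability (PPV) = {stability:.2f}")
```

In this invented sample, 4 of the 5 baseline cases retain the diagnosis, giving a stability of 0.80. The patient who acquires the diagnosis only at follow-up does not enter the calculation, which is one reason this measure favors definitions that select chronic cases.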
Conservative definitions were found to be predictors of poor outcome, but tautologically, the notion of conservatism is often dependent on the chronicity of course. This applied first of all to Feighner's criteria.41–44 Broad definitions such as The New Haven Schizophrenia Index, on the other hand, did not predict the outcome.13, 31, 42, 45, 46 Such diagnoses embrace favorable as well as poor outcome cases; conservative diagnoses only include the latter group. The duration criteria of the diagnostic algorithm influence the predictive validity. Thus, the 6-month duration criterion has been demonstrated to increase predictive validity in terms of diagnostic stability.12, 47, 48 Elimination of affective components in sz tended to result in an aggregation of chronic, nonepisodic, and therefore stable forms of illness.49, 50
Schneider's FRS, playing a central part in the contemporary sz definitions, resulted in a relatively inclusive sz concept that did not predict the outcome.13, 23, 27, 31, 45, 46 In comparing DSM-II and DSM-III, the former was found to be more inclusive and indicative of a more favorable outcome. The DSM-III appeared to exclude many females with favorable outcome.51
In a few studies, concurrent validity was established by relating sz definitions with traditional sz symptoms or traits such as Bleuler's fundamental symptoms, Schneider's FRS, Huber's basic symptoms, and premorbid adjustment.
ICD-9 sz when compared with ICD-10 was associated with formal thought disorder39 and with self-disorders and basic symptoms (L.B.J and J.P, unpublished data from the same study).
In a comparison of 6 definitions of sz, Bleulerian fundamental symptoms were found to be more important for the diagnosis than Schneiderian FRS.26 In one study, Schneider sz was associated with better premorbid adjustment than non-Schneider sz.32 The significance of basic symptoms assessed by the Frankfurt Complaint Questionnaire (FCQ)52 seemed more ambiguous,34, 50, 53 probably because of the methodological shortcomings of the FCQ.
In the IPSS,36 a McKeon cluster analysis of the Present State Examination (PSE) data resulted in 10 clusters. Some ICD-8 sz subtypes tended to be concentrated in certain clusters. Some clusters were common to all centers; others were found in only a small number of them. Three clusters were selected to make up a sz definition for further analyses together with the ICD-8 and Catego-S diagnoses.
Latent class analysis54 was carried out in a handful of studies. In an attempt to explain test-retest reliability findings, Faraone18 estimated the sensitivity and specificity of RDC and DSM-III-R diagnoses with respect to latent classes. Sz according to both systems had high kappas and excellent sensitivity and specificity. Kendler55 compared classes generated by a handful of OPCRIT items collected in the Roscommon Family Study with DSM-III-R diagnoses. The classes that emerged resembled well-known diagnostic categories such as classic (Kraepelinian) sz, hebephrenia, and schizophreniform disorder. Eighty-four percent of cases classified as classic sz were also so diagnosed by the DSM-III-R. The classes were validated against the familial risk of illness. The risk for sz and sz spectrum was significantly increased in relatives of all proband classes except major depression and was especially marked in the relatives of hebephrenia-class patients (sz 16.1%, sz spectrum 45.5%).
Factor analysis of diagnostic variables of 23 sz definitions applied by Peralta56 to 660 psychotic patients yielded 3 interpretable factors (a general sz factor, a Schneiderian factor, and a Bleulerian factor) explaining 58% of the variance, which was found to support a dimensional approach to sz.
Only a few studies related biological findings to multiple diagnoses. Assuming that the prolactin-releasing potency of a drug corresponds to its antipsychotic potency, Keks57, 58 found prolactin concentration to be lower in patients fulfilling criteria precluding affective syndromes.
In measuring the growth hormone response to the injection of clonidine as an expression of α2-adrenergic receptor sensitivity, Keks59 found that most of the definitions associated with blunted response did not preclude affective symptomatology.
Heritability served as a measure of validation in a few studies.
Gottesman and Shields, examining twin concordance as an expression of heritability, found both monozygotic (MZ) and dizygotic (DZ) concordance highest using the broadest definitions (among nonoperational diagnoses of 6 clinicians) but the best MZ:DZ discrimination using “middle-of-the-road” criteria.60, 61 However, the emphasis on maximizing MZ:DZ concordance ratio is only meaningful on the prior assumption of polyfactorial transmission.
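The distinction between absolute concordance and MZ:DZ discrimination can be illustrated with a small Python sketch. The pair counts are invented so as to reproduce the qualitative pattern described above (highest concordance for the broadest definition, best discrimination for the intermediate one) and are not Gottesman and Shields' data. Pairwise concordance is assumed here for simplicity, and the labels `narrow`, `middle`, and `broad` are illustrative.

```python
# Hypothetical counts: (concordant pairs, total pairs) for each breadth of definition.
twin_pairs = {
    "narrow": {"MZ": (4, 20), "DZ": (2, 30)},
    "middle": {"MZ": (10, 20), "DZ": (3, 30)},
    "broad": {"MZ": (14, 20), "DZ": (9, 30)},
}


def pairwise_concordance(concordant, total):
    """Proportion of twin pairs in which both twins receive the diagnosis."""
    return concordant / total


ratios = {}
for name, groups in twin_pairs.items():
    mz = pairwise_concordance(*groups["MZ"])
    dz = pairwise_concordance(*groups["DZ"])
    ratios[name] = mz / dz  # higher ratio = better MZ:DZ discrimination
    print(f"{name:>6}: MZ={mz:.2f}  DZ={dz:.2f}  MZ:DZ={ratios[name]:.2f}")

best_discrimination = max(ratios, key=ratios.get)
```

In this invented example the broad definition yields the highest MZ and DZ concordance (0.70 and 0.30), yet the intermediate definition gives the highest MZ:DZ ratio (5.0), so the two criteria for a "best" definition need not coincide.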
Conservative definitions such as Feighner's were among those with the highest MZ twin concordance whereas FRS were among those with the lowest.62 MZ twins diagnosed by the operational definitions had higher concordance and correlation in liability compared with FRS-diagnosed twins.62–64
In a sample of biological and adoptive relatives of index adoptees with sz and of control adoptees, significant differences were found in the prevalence of sz spectrum disorders in biological vs control relatives of index probands both by a Kraepelin-Bleuler-DSM-II definition9 and by DSM-III.10 The percentage of spectrum disorders was higher, though not significantly so, among the relatives of the former than of the latter.
Few polydiagnostic studies compared the familial rates of sz. Comparing 4 definitions, Asnis65 failed to find significant differences between the familial rates of sz spectrum disorders. In a first-admission sample, ICD-9 sz was found to be significantly associated with family history of sz, whereas ICD-10 was not associated at all.39 Moreover, partitioning of ICD-10 sz39 revealed that sz selectively aggregated in the relatives of probands diagnosed by criterion 2 (an assortment of Bleulerian and second-rank symptoms). Kendler's latent class analysis study,55 mentioned above, showed a dramatically increased risk for sz in the relatives of the hebephrenia-class probands.
Four studies calculated the incidence rates of sz to be within a range from 6 to 32 per 100 000 inhabitants.66–69 The rates varied within each study between the diagnostic definitions. Thus, ICD-9 sz was found to be broader than DSM-III and DSM-III-R, and Feighner's definition was the most restrictive.
Examining the alleged decline in the incidence of sz, Allardyce70 found a falling rate of clinical diagnosis over time (20 years) but no such decline in OPCRIT-generated ICD-10 and DSM-IV sz, suggesting that changes in diagnostic habits have biased the reported rates.
Lindstrom71 calculated the 1-year prevalence of sz by 4 contemporary diagnostic definitions to be within the range of 40–47 per 10 000. The prevalence found by Harvey72 was 29–31 per 10 000. The 1-year prevalence of the PSE S-class estimated by Ni Nuallain73 was as low as 10 per 10 000, as compared with 73 for ICD-8, because of the failure of the S-class to identify patients who presented with exclusively negative symptoms. The combination of PSE and lifetime syndrome checklist data increased the PSE S-class prevalence to 39 per 10 000. Among the Iban of Sarawak, Barrett74 found rates of treated sz between 18 and 35 per 10 000 (between 42 and 83 per 10 000 when age corrected to age 55), and in rural Botswana, Ben-Tovim75 found the age-adjusted 1-year prevalence of DSM-III sz to be 43 per 10 000 and of ICD-9 sz to be 53 per 10 000.
Forty studies reported the gender distribution. The mean numbers of male and female patients in these particular studies were 95 and 86 (nonsignificant). Some studies allowed for a comparison of incidence rates, frequencies, and lifetime courses. The highest ratio of male to female incidence rate was produced by the narrow Feighner definition.67, 68 Other studies failed to demonstrate differences in the incidence sex ratio between broad and narrow definitions.66, 76 Conservative definitions yielded a significantly greater male to female prevalence ratio.20, 51, 67, 77 Patients excluded by the narrow definition were typically favorable-outcome females.51 Castle67 found the male-to-female ratio to be higher than 1 in patients with onset below age 45 and lower than 1 above age 45 in sz definitions requiring a 6-month duration.
The polydiagnostic studies of the past 4 decades reflect an evolution away from prototypically anchored diagnostic concepts of sz to polythetically oriented definitions, based on the so-called operational criteria. It is, however, necessary to point out that all studies reviewed here—as polydiagnostic comparisons—necessitated a certain operationalization of the examined definitions.
The principal finding of our review is that the degree of concordance between different definitions of sz varies considerably, depending, of course, on the similarity of the criteria. The number of sz cases in a given sample may vary by more than a factor of 3 when diagnosed by 2 different systems. This is far from trivial, and not only because of psychopathological considerations. In fact, etiological research is very frequently performed through comparisons of “schizophrenias” with “nonschizophrenias,” ie, the sample is simply dichotomized into sz cases and the remainder of the sample. Such a procedure may attenuate or otherwise obscure differences of interest because the “nonschizophrenia” group may contain spectrum cases as well as sz cases defined so by other sets of criteria.
The polydiagnostic studies do not provide sufficient validity data to justify claiming a clear superiority of any particular definition over others. In many studies, the percentage of sz cases so diagnosed by all diagnostic algorithms is remarkably low. This subgroup—usually called “core schizophrenia”—appears to us more as a product of severity and impenetrable interactions between the single criteria rather than as being reflective of a class with a particularly strong validity.
What is conspicuously lacking in the polydiagnostic studies is a serious and systematic reflection on the conceptual validity of sz, ie, what we take this illness to be in the very first place.78 Empirical phases of validation do not happen in a void but are preceded and constrained by the original typifications of what we take sz to be.78–83 There are several possibilities: eg, is it an illness mainly defined by trait-like intersubjective displacement, subjective orientation with changes of the worldview (as described by Bleuler's generic term of autism84, 81), compromised unity of consciousness and self-dissolution (Kraepelin85, 86), characteristic psychotic symptoms (a view unjustly ascribed to Schneider87), a deteriorating or unremitting course (Feighner88), simply a multidimensional construct,56, 89 or something else (eg, schizotaxia90, 91)?
The issue of affective symptoms represents a special concern in the discussions of conceptual and construct validity. The exclusion of affective components from the picture of sz, despite their clinical reality as ubiquitous symptoms in all stages of sz, has also necessitated the creation of a rather convoluted category of schizoaffective psychosis.92 This evacuation of affective symptoms from sz appears quite arbitrary, and yet, as shown by Keks,57–59 a stratification of sz by presence or absence of affective symptoms may be biologically meaningful. The subdivisions of sz on the basis of biological findings obtained in polydiagnostic studies are in agreement with Bleuler's claim that we deal with a group of szs rather than a single disease.84 Such a view currently gains provisional support from genetic studies. Thus, in a family study by Hallmayer et al, a mathematically identified subtype of sz, characterized by pervasive neurocognitive deficit, had a distinct genetic profile.93
Empirical validity is a multidimensional concept comprising pathogenetic and etiologic knowledge (or hypotheses), course, treatment response, etc. Although we have knowledge of a variety of etiologically relevant risk factors in sz, this knowledge has no substantive form, which could permit assessment of causal validity in a polydiagnostic context. Genetic data39 suggest that it is the Bleulerian dimension of fundamental symptoms that is associated with familial aggregation of sz. No molecular genetic studies have so far been included in the polydiagnostic designs.
Predictive validity, which explores outcome and stability of course, is examined in approximately half of the studies. Unfortunately, it is a rather equivocal type of validity. Prediction of course may serve as a validity criterion given an independent a priori assumption that, say, an unremitting course or chronic social dysfunction is constitutive of a given diagnostic entity. The recent duration criteria lead to an automatic exclusion of favorable-outcome, acute psychosis. Diagnostic stability in the sense of a basically unchanged psychopathological picture as a measure of validity is at odds with the well-replicated findings that 20–30% of patients with sz recover from psychosis (cf. Modestin et al,7 Hafner and an der Heiden,94 Ciompi and Muller,95 Huber et al,96 and Bleuler97). Psychopathological stability would be relevant as a validating criterion if one were interested in the persistence of the trait features of the illness, indicating structural alterations of consciousness.81 Therefore, definitions based on trait-like features (eg, Bleuler's fundamental symptoms) appear to be more stable than those based on fluctuating psychotic features (eg, FRS). In the latter case, diagnostic stability means chronic, productive psychosis. The FRS are particularly poor predictors of outcome.13, 23, 27, 31, 45, 46 Conservative definitions with inbuilt chronicity (deviant preonset personality) such as Feighner's are more likely to predict uniformly poor outcome. Unfortunately, only a few studies made an attempt to examine differential validity of sz by means other than outcome prediction.
A dominating concern of contemporary psychiatry is the quest for reliability of diagnostic categories. The very rise of “operational” definitions in the 1970s was stimulated by the demonstration of alarming US-UK diagnostic disagreements.98, 99
The operational definitions seem to have modestly increased the interrater reliability (eg, Gruenberg et al100; Kety et al9 vs Kendler et al10). However, reliability is easy to achieve but “it becomes vacuous when it is a primary goal, un-associated with other concerns.101” In the quest for reliability, many domains of psychopathology of sz, once considered as taxonomically and pathogenetically crucial (eg, the notion of autism or formal thought disorder) have been either strongly simplified (converting the “fundamental” schizophrenic symptoms into behaviorally defined “negative symptoms86, 102”) or deleted altogether from the psychiatric idiom (eg, the notion of self or subjectivity103).
In conclusion, this review highlights certain steps that seem to us urgently needed in sz research. There is a need for integrating the rapidly expanding technological means with explicit reflection constrained by phenomenological familiarity with sz. Empirical studies should increasingly lose their exploratory nature and become instead designed to answer more specific and explicit questions.