A primary challenge in psychiatric genetics is the lack of a completely validated system of classification for mental disorders. Appropriate statistical methods are needed to empirically derive more homogenous disorder subtypes.
Using the framework of Robins & Guze’s (1970) five phases, latent variable models to derive and validate diagnostic groups are described. A process of iterative validation is proposed through which refined phenotypes would facilitate research on genetics, pathogenesis, and treatment, which would in turn aid further refinement of disorder definitions.
Latent variable methods are useful tools for defining and validating psychiatric phenotypes. Further methodological research should address sample size issues and application to iterative validation.
A primary challenge in psychiatric genetics is the lack of a completely validated system of classification for mental disorders (Merikangas & Risch, 2003). Without a well-defined phenotype, the establishment of a relationship between a gene and a disorder is difficult, since heterogeneity in the sample with respect to underlying disease process may dilute any existing effects. For example, if a gene were associated with a certain type of depression, the estimated odds ratio for the association would be biased toward one if individuals without depression, or with a different type of depression, were misclassified as diseased. It is therefore not surprising that relatively few genetic findings have been replicated (Burmeister et al., 2008). This problem is not limited to genetics; heterogeneity within samples complicates most areas of psychiatric research, including neuroimaging, pharmacological response, and studies of patient outcomes.
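The attenuation described above can be illustrated numerically. The following is a minimal sketch using hypothetical values (a true odds ratio of 3.0 and a 20% genotype frequency among controls), none of which come from the cited studies. It assumes that misclassified "cases" carry the risk genotype at the same rate as controls.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def observed_odds_ratio(p_exposed_controls, true_or, fraction_misclassified):
    """Odds ratio observed when a fraction of labelled cases are really
    unaffected (or have an unrelated subtype) and therefore carry the
    risk genotype at the control rate."""
    # Genotype frequency among true cases implied by the true odds ratio.
    case_odds = true_or * odds(p_exposed_controls)
    p_exposed_true_cases = case_odds / (1.0 + case_odds)
    # The labelled case group is a mixture of true and misclassified cases.
    p_exposed_labelled = ((1.0 - fraction_misclassified) * p_exposed_true_cases
                          + fraction_misclassified * p_exposed_controls)
    return odds(p_exposed_labelled) / odds(p_exposed_controls)

# Hypothetical inputs: the observed odds ratio shrinks toward 1
# as the misclassified fraction grows.
for fraction in (0.0, 0.25, 0.5, 0.75):
    print(fraction, round(observed_odds_ratio(0.20, 3.0, fraction), 2))
```

With no misclassification the function returns the true odds ratio, and with complete misclassification it returns exactly one, which is the dilution the paragraph describes.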
A validated system of classification has eluded psychiatry for two reasons. First, with few exceptions, psychiatric disorders lack gold standards for diagnosis. Unlike a disorder such as diabetes, we are not able to say, “If the patient has a plasma level of X greater than Y mg/dL, they have disorder Z”. It might be argued that diagnostic criteria for diabetes have also changed over time (WHO, 1999), but these have been adjustments to cutoff values, not changes to what was being measured. Psychiatry, on the other hand, lacks comparable biomarkers (Charney et al., 2002). Second, we know comparatively little about the mechanisms underlying psychiatric disorders, which is why such biomarkers are lacking. Without addressing this second issue, the first is intractable. Yet, how does one study the mechanisms underlying a disorder, if the definition of that disorder is itself in question?
Typically, psychiatry has relied on syndromes (sets of symptoms which occur together more frequently than would be expected by chance alone). No two individuals, even with the same underlying disorder, would be expected to have exactly the same symptoms. Therefore, based on clinical observations, diagnostic criteria (e.g., DSM) requiring subsets of symptoms have been developed which are used to determine if the particular syndrome is present. With this approach to case definition, several large population-based surveys have been undertaken, such as the Epidemiologic Catchment Area Study (ECA) and National Comorbidity Survey (NCS). The large samples in these studies allow researchers to determine how particular symptoms co-occur in the population, and to test the validity of empirically derived diagnostic groups. For the latter objective, in addition to large sample sizes, a “statistical toolbox” is needed. The purpose of this paper is to describe how latent variable methods can be used to empirically derive more homogenous subtypes (diagnoses), and how these diagnoses might be validated, within the framework described by Robins and Guze (1970). Additionally, the paper outlines how a multidisciplinary group of researchers might apply these methods in a system of iterative validation, and highlights potential avenues of methodological research to make these latent variable models more useful to researchers. These methods would also facilitate the proposed DSM-V research agenda for a pathophysiologically-based classification system (Charney et al., 2002).
A key premise is that homogeneity in clinical presentation represents homogeneity in underlying disease process. Obtaining homogenous groups of individuals requires the splitting of a heterogeneous group into some number of smaller groups based on features of their clinical presentation. Assuming that we have measured all of the salient clinical features, two difficulties remain: 1) it is not known how many groups there are, and 2) it is not known what weight should be placed on each feature in order to determine who is likely to be in each group.
Latent class analysis (LCA), a form of latent variable modeling, is a method of elucidating subgroups of individuals that addresses both of these difficulties (McCutcheon, 1987). LCA uses categorical, typically dichotomous, sets of observed indicator variables, such as presence or absence of certain symptoms. Correlations, or dependences, between these variables are assumed to be the result of individuals’ membership in latent classes, such that conditioning upon latent class membership would remove any relationships among the observed variables. This assumption is known as conditional independence.
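The conditional independence assumption can be made concrete with a small sketch. The class probabilities and conditional probabilities below are hypothetical, and the function name is introduced here only for illustration. Under the model, the probability of a symptom pattern is a weighted sum over classes of products of item probabilities; the two indicators are independent within each class by construction, yet associated in the pooled sample.

```python
from itertools import product

def pattern_probability(pattern, class_probs, cond_probs):
    """Marginal probability of a 0/1 symptom pattern under a latent class
    model: sum over classes of P(class) * product of item probabilities."""
    total = 0.0
    for class_prob, item_probs in zip(class_probs, cond_probs):
        within_class = 1.0
        for y, p in zip(pattern, item_probs):
            within_class *= p if y == 1 else (1.0 - p)
        total += class_prob * within_class
    return total

# Hypothetical two-class model with two dichotomous indicators.
class_probs = [0.7, 0.3]
cond_probs = [[0.1, 0.1],   # class 1: symptoms rare
              [0.8, 0.8]]   # class 2: symptoms common

p11 = pattern_probability((1, 1), class_probs, cond_probs)
p1 = sum(pattern_probability((1, y2), class_probs, cond_probs) for y2 in (0, 1))
p2 = sum(pattern_probability((y1, 1), class_probs, cond_probs) for y1 in (0, 1))

# In the pooled sample the joint probability exceeds the product of the
# margins, so the indicators appear correlated even though they are
# independent within each class.
print(p11, p1 * p2)
```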
To address the first question of how many groups exist within the sample, models with different numbers of classes are fit. An iterative algorithm (Bartholomew & Knott, 1999) produces two sets of parameter estimates: latent class probabilities (proportions of individuals in each group) and conditional probabilities (probability of each observed variable, given membership in each class). Along with these parameter estimates, fit statistics, information criteria, and conditional independence are examined for each model, and an appropriate model is chosen (Nylund et al., 2007). LCA entails the estimation of a large number of parameters: (j − 1) + (j × m), where j is the number of latent classes and m is the number of dichotomous indicator variables. Typically one chooses a model with enough classes to suggest that the conditional independence assumption has been satisfied, while fitting as parsimonious a model as possible.
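As a rough illustration of this fitting process, the sketch below implements a plain expectation-maximization (EM) routine for dichotomous indicators and compares models with one, two, and three classes using the Bayesian information criterion (BIC). It is an assumption that EM stands in adequately for the iterative algorithm cited above; the data are simulated, the function names are hypothetical, and a real analysis would use multiple random starts and dedicated software.

```python
import math
import random

def n_parameters(j, m):
    """Free parameters: (j - 1) class probabilities plus j * m conditional probabilities."""
    return (j - 1) + j * m

def fit_lca(data, j, n_iter=100, seed=0):
    """EM for a latent class model with 0/1 indicators.
    Returns (class_probs, cond_probs, log_lik), where log_lik is evaluated
    at the parameters in force at the start of the final iteration."""
    rng = random.Random(seed)
    m = len(data[0])
    class_probs = [1.0 / j] * j
    cond_probs = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(j)]
    log_lik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior class membership for every individual.
        posteriors = []
        log_lik = 0.0
        for row in data:
            joint = []
            for c in range(j):
                like = class_probs[c]
                for y, p in zip(row, cond_probs[c]):
                    like *= p if y == 1 else 1.0 - p
                joint.append(like)
            total = sum(joint)
            log_lik += math.log(total)
            posteriors.append([v / total for v in joint])
        # M-step: re-estimate parameters from posterior-weighted counts.
        for c in range(j):
            weight = sum(post[c] for post in posteriors)
            class_probs[c] = weight / len(data)
            denom = max(weight, 1e-12)  # guard against an emptied class
            for k in range(m):
                num = sum(post[c] * row[k] for post, row in zip(posteriors, data))
                # Keep estimates off the boundary so the likelihood stays positive.
                cond_probs[c][k] = min(max(num / denom, 1e-6), 1.0 - 1e-6)
    return class_probs, cond_probs, log_lik

def bic(log_lik, j, m, n):
    """Bayesian information criterion; smaller values are preferred."""
    return -2.0 * log_lik + n_parameters(j, m) * math.log(n)

# Simulated (hypothetical) data: two true classes, five indicators.
sim = random.Random(1)
true_cond = [[0.1] * 5, [0.8] * 5]
data = []
for _ in range(300):
    c = 0 if sim.random() < 0.7 else 1
    data.append([1 if sim.random() < p else 0 for p in true_cond[c]])

fits = {}
for j in (1, 2, 3):
    fits[j] = fit_lca(data, j)
    print(j, n_parameters(j, 5), round(bic(fits[j][2], j, 5, len(data)), 1))
```

For the three-class, nine-indicator model discussed in the text, `n_parameters(3, 9)` gives 29, which shows how quickly the parameter count grows with additional classes.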
Figure 1 shows simulated results from a three-class model with 9 observed indicators. The bold line shows observed prevalences of each indicator for the full sample, and the three dotted lines show conditional probability estimates for each class. Examination of the conditional probability estimates addresses the second question of how to weight symptoms to determine the class of which an individual is most likely to be a member. Classes 1 and 2 resemble each other in that the symptoms that individuals in those classes are most likely to have are changes in appetite or sleep, and feelings of worthlessness. In class 3, individuals are more likely to have apathy, motor, or cognitive symptoms. One interpretation might be that classes 1 and 2 represent mild and severe forms of one disorder, and class 3 represents a separate disorder. These patterns, however, are merely latent constructs; additional information would be needed to establish their validity.
In 1970, Robins and Guze outlined five phases for establishing diagnostic validity. These phases are familiar and well accepted among psychiatrists, and serve as an ideal framework in which to describe how latent variable methods might be used to validate psychiatric phenotypes. The first phase was clinical description, which included disorder symptoms, demographic variables such as race, age, and sex, and precipitating factors. Those variables that are part of the clinical presentation, but that are not symptoms of the disorder, are grouped under the heading of “risk factors”. It is assumed that individuals who share the same disorder will also share risk factors, and distinct latent classes of, say, depression, if they truly represent separate disorders, will show different relationships with risk factors.
The second phase was laboratory studies, which can be broadly defined as any “test” which produces a reliable and precise measurement. This might include imaging, psychometric testing, or genotyping. As in the case of risk factors, one would expect different latent classes of depression to show different test results. The third phase was delimitation from other disorders, to produce patient groups that are as homogenous as possible. The fourth phase was follow-up studies. Patients who are homogenous with regard to their diagnosis should have similar outcomes. This is a simplification, but marked differences among outcomes may indicate that meaningful subtypes exist within a patient group. The final phase was family studies. If individuals in the same family exhibit similar clinical characteristics, existence of the disorder is validated.
At the heart of these phases is the concept that heterogeneity among patients represents heterogeneity in risk factors, antecedents, and outcomes. Increasing homogeneity within research samples should increase the likelihood of identifying potential causes. The problem of developing valid disorder definitions can then be restated as the identification of the best way to group patients in order to maximize this homogeneity.
Figure 2 displays relationships between parts of the Robins-Guze phases as applied to syndromes of depression. There are three elements: observed variables (rectangles), latent variables (circles), and arrows. Observed variables are variables that are directly measured, such as tearfulness, weight loss, or inability to maintain employment. Latent variables, such as depression diagnoses, are variables that are not observable, but are inferred from the observed variables. In the language of structural equation modeling, the latent variable and its indicators constitute the measurement model. Here, disorder symptoms are used as indicators for the latent classes. Arrows point away from a latent class and toward the symptoms to represent the assumption that the symptoms are caused by the underlying, latent disorder. Reversing the direction of these arrows would produce an index, rather than a latent variable. Elements of phase 1 (risk factors) and phase 2 (test results) are shown here as preceding (causing) the latent class. Although both outcomes and indicators are modeled as consequences of the latent class, outcomes are separate and not included in the estimation of the latent classes, so that they might be used as validators. In other applications, elements of each phase might occupy different places in the model; for example, test results might be conceptualized as outcomes or indicators.
One of the first published latent class analyses was Eaton and colleagues’ (1989) study of depression in Wave I of the ECA. They chose a three class model using nine symptom groups from the DSM-III (dysphoria, appetite and sleep changes, psychomotor changes, apathy, fatigue, feelings of worthlessness, cognitive symptoms, and suicidal ideation) as indicator variables. Although true class membership is latent, Bayes’ theorem can be used to calculate each individual’s probability of membership in each class as a function of the parameter estimates and their own pattern of indicator variables. When the authors assigned individuals to the class of which they were most likely to be a member, strong concordance between assignment to the most severe class (Class C) and diagnosis of DSM-III major depressive disorder (MDD) was observed. In this way, the LCA provided empirical support for the DSM-III diagnostic criteria.
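The Bayes' theorem calculation mentioned above can be sketched as follows. The three-class parameter values are hypothetical and are not the estimates reported by Eaton and colleagues; the function names are introduced here only for illustration.

```python
def posterior_membership(pattern, class_probs, cond_probs):
    """P(class | symptom pattern) by Bayes' theorem, given latent class
    probabilities and conditional item probabilities."""
    joint = []
    for class_prob, item_probs in zip(class_probs, cond_probs):
        like = class_prob
        for y, p in zip(pattern, item_probs):
            like *= p if y == 1 else (1.0 - p)
        joint.append(like)
    total = sum(joint)
    return [v / total for v in joint]

def most_likely_class(pattern, class_probs, cond_probs):
    """Index of the class with the highest posterior probability."""
    post = posterior_membership(pattern, class_probs, cond_probs)
    return max(range(len(post)), key=lambda c: post[c])

# Hypothetical estimates: a large low-symptom class, an intermediate
# class, and a small severe class, each with three indicators.
class_probs = [0.80, 0.15, 0.05]
cond_probs = [[0.05, 0.05, 0.05],
              [0.40, 0.40, 0.40],
              [0.90, 0.90, 0.90]]

for pattern in [(0, 0, 0), (1, 0, 0), (1, 1, 1)]:
    post = posterior_membership(pattern, class_probs, cond_probs)
    print(pattern, [round(p, 3) for p in post],
          most_likely_class(pattern, class_probs, cond_probs))
```

The posterior probabilities, rather than only the most likely class, carry the uncertainty in classification, which matters for the validation steps discussed later.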
The majority of individuals in the ECA were assigned to a class with very low conditional probabilities for all of the symptoms (Class A). Another, smaller set were assigned to a class (Class B) with conditional probabilities intermediate between those of Class A and Class C. Referring back to figure 1, the relationship between Class A and Class B more closely resembled the relationship between that figure’s Class 1 and Class 2, and did not suggest a qualitatively separate disorder. This could be interpreted as a lack of empirical support for the subtyping of depression. However, this lack of differentiation may also have been the result of the authors’ having been obliged to combine opposing symptoms such as weight gain and weight loss into a single dichotomous appetite variable to avoid a scenario where conditional independence could not be attained.
Once potential subtypes have been identified, the next step in this phase is inclusion of risk factors. The simplest approach might be to assign individuals to their most likely class and to treat that assignment as an observed categorical variable, but there are two problems with that approach. First, it disregards the inherent uncertainty of the latent variable. This may not be problematic if class membership is fairly certain, but that is not always the case. The second problem is that this approach, which “fixes” class assignment before adding predictors, assumes that the structure of the latent variable is the same at all levels of the predictor. This non-differential measurement assumption (Bandeen-Roche et al., 1997) may not be a valid one. For example, Gallo and Rabins (1999) report that elderly patients with depression often deny feelings of sadness, reporting instead somatic complaints and anhedonia. Not only does risk of depression change with age (Eaton et al., 2007), but also the nature of the latent construct of depression itself changes as individuals get older. Had Eaton et al. (1989) regressed fixed class membership on age, the results might have been problematic to interpret, as it would have presupposed a single, uniform construct of depression across age groups. In fact, a subsequent latent class analysis of ECA Wave 3 data, as the cohort aged, found evidence of an “anhedonic” class (Chen et al., 2000), which was not apparent in the analysis of Wave 1 data in 1989. Individuals in this class had later ages of onset than those in other depressive classes.
One can test for differential measurement by fitting separate models when the predictor is dichotomous (such as sex), but this would double the number of model parameters. In the case of age, a continuous variable, stratification would entail the estimation of far too many parameters. Alternatively, one can estimate both the measurement of the latent variable (latent class measurement model) and its relationship to its risk factors (structural model) at the same time. This simultaneous latent variable model (latent class regression) was first described by Dayton and Macready (1988). Estimation and model checking were expanded by Bandeen-Roche et al. (1997). This method precludes the need for fixed class assignment, and one can explore the presence of differential measurement by observing whether the estimates from the measurement portion of the model change after predictors are added. Assuming that differential measurement is not present, the model estimates can be examined to see what effect each risk factor has on the probability of membership in each latent class. For example, one might observe that family history of depression significantly increased the likelihood of membership in one latent class of depression that resembled the DSM diagnosis of “melancholic” depression, but had no relationship with other latent classes. This would support the hypothesis that the “melancholic” latent class of depression was a valid, separate entity.
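The structural part of such a model can be sketched as a multinomial logistic function that maps a risk factor to latent class membership probabilities. This is an illustrative assumption about the link function, not a reproduction of the cited estimation procedures; the coefficients are hypothetical, and a full latent class regression would estimate them jointly with the measurement model.

```python
import math

def class_probabilities(x, intercepts, slopes):
    """Latent class membership probabilities given a covariate value x.
    Class 0 is the reference class; intercepts and slopes describe the
    log odds of each remaining class relative to it."""
    scores = [0.0] + [a + b * x for a, b in zip(intercepts, slopes)]
    largest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients: family history (x = 1) has no effect on
# membership in class 1 but quadruples the odds of class 2.
intercepts = [-1.0, -2.5]
slopes = [0.0, math.log(4.0)]

without_history = class_probabilities(0, intercepts, slopes)
with_history = class_probabilities(1, intercepts, slopes)

# Odds of class 2 versus the reference class, with and without the risk factor.
odds_ratio = ((with_history[2] / with_history[0])
              / (without_history[2] / without_history[0]))
print([round(p, 3) for p in without_history])
print([round(p, 3) for p in with_history])
print(round(odds_ratio, 2))
```

A risk factor whose slope is near zero for one class and clearly non-zero for another corresponds to the differential association that the paragraph describes as supporting distinct subtypes.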
If latent classes represent valid, separate disorders, one would also expect to see distinct patterns of relationships with test results. These test results can be included in the latent variable model in the same way that risk factors were added, through latent class regression. One example of latent class regression with a combination of risk factors and test results was a study of the relationship between APO-E4 genotype and late-life depression (Yen et al., 2007). The authors’ goal was to determine if the presence of an E4 allele increased the risk of any type of depression, and whether that relationship still existed after controlling for lipid profile, cognitive impairment, and vascular risk factors. Since the symptomatology of depression may be different among the elderly, it was appropriate to use latent class analysis to define depressive subtypes, as use of DSM criteria alone might have failed to pick up all cases, and would not have distinguished between potentially meaningful subtypes. Using items from the Taiwanese Depression Questionnaire (TDQ) as indicators, the authors fit a latent class model with four classes. Next, the authors fit a series of latent class regression models with APO-E4, vascular risk factors, and demographic variables as predictors of latent class membership. They found that APO-E4 increased the odds of membership in the severely depressed class by almost a factor of 4, and diabetes was a risk factor for membership in the cognitive class, but not the severe depression class. This differential association with risk factors by class supports the hypothesis of distinct subtypes of depression among the elderly.
Though difficulties with the use of fixed class assignment exist, as described in the previous section, it can yield potentially useful results. Todd et al. (2005) found a significant association between a severe-combined ADHD latent class and the DAT VNTR polymorphism, and Fanous et al. (2008) reported evidence of linkage for seven regions when the sample was stratified by latent class. In both of these cases, the use of subtypes identified candidate genes that had not been identified in analyses of the full samples.
Demonstrating distinctiveness of latent classes by showing different patterns of associations with risk factors and test results is one way to show separation between classes. Another is to examine how certain class membership is for each individual. While the latent class model assumes distinct classes, and that each individual is, in truth, a member of one and only one class, model estimates only produce probabilities of class membership. If the latent class model fits the data well, then each individual would be expected to have a high probability (~99%) of being in one class, and very low probabilities of being in any other class. Entropy (Dias & Vermunt, 2006) is a measure of certainty in classification ranging from 0 (chance) to 1 (perfect), and can be used to judge separation among classes. A high entropy value suggests that overall, latent classes are distinct, whereas a low value suggests that class assignment of individuals is uncertain. For example, the latent class regression model of Yen et al. (2007) had an entropy value of 0.928, suggesting that class assignment for individuals in the sample was quite definitive. Lack of delineation between disorders can also be demonstrated through LCA. For example, analyses of symptoms from the Great Britain National Psychiatric Morbidity Survey failed to demonstrate segregation of depression and anxiety by class, suggesting they may represent a single phenomenon (Das-Munshi et al., 2008).
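One common normalized form of this statistic can be computed directly from the posterior membership probabilities, as in the sketch below. It is an assumption that this normalization matches the exact statistic used in the cited work; the posterior values are hypothetical.

```python
import math

def relative_entropy(posteriors):
    """Normalized entropy of classification: 1 minus the summed entropy of
    the posterior membership probabilities, divided by its maximum
    possible value n * ln(K). Requires at least two classes."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four individuals and two classes.
well_separated = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
poorly_separated = [[0.55, 0.45], [0.40, 0.60], [0.50, 0.50], [0.65, 0.35]]

print(round(relative_entropy(well_separated), 3))
print(round(relative_entropy(poorly_separated), 3))
```

Values near 1 arise when nearly every individual has a posterior probability close to 1 for a single class, and values near 0 arise when posterior probabilities are close to uniform.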
In the work of Yen et al. (2007), associations between latent class membership and risk factors were cited as being supportive of the validity/existence of the latent subtypes. This argument could be criticized as being somewhat circular, though, because the latent class measurement model and its relationships with risk factors were estimated simultaneously. It might be considered a type of internal validity or internal consistency. To establish external validity, one needs to show that class membership is differentially associated with variables not included in the model that estimated the latent classes. These associations should also be consistent with current knowledge of the disorder.
As an example, Kaptein et al. (2006) estimated latent classes of longitudinal patterns of depressive symptoms following hospital admission for myocardial infarction (MI). After assigning individuals to the classes of which they were most likely to be members, approximately half (54.6%) fell into a class with few or no symptoms (class 1), a quarter (25.7%) had mild but constant symptoms (class 2), 9.3% had moderate but constant symptoms (class 3), 4.6% had severe but decreasing symptoms (class 4), and 4.0% had severe and increasing symptoms (class 5). Compared to non-depressed individuals in class 1, Cox proportional hazards models for risk of new cardiac events showed a clear trend in hazard ratios from 1.43 for class 2 to 2.73 for class 5. This differential risk of new cardiac events as a function of class membership provides external validity for the concept of subgroups of post-MI depression based on course of symptoms.
There are several related latent variable methods, including latent transition analysis (LTA) (Collins & Wugalter, 1992) and growth mixture modeling (GMM) (Muthen & Muthen, 2000), that can be used to address longitudinal courses of psychiatric disorders more directly. Latent transition analysis estimates latent classes at each of a set of time points, as well as probabilities of transition between latent classes. Witkiewitz (2008) used this model to study post-treatment alcohol use, and found that while most individuals remained in the same drinking class over time, alcohol dependence was associated with increased probability of transition into a heavier drinking class. Additional examples include the Connell et al. (2008) study of changes in children’s emotional problems and the Schumann et al. (2002) analysis of transitions in readiness to quit smoking.
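The role of the transition probabilities in LTA can be illustrated with a short sketch. The class prevalences and transition matrix below are hypothetical and are not taken from any of the cited studies; in practice these quantities are estimated jointly with the measurement model at each time point.

```python
def next_prevalences(prevalences, transition):
    """Latent class prevalences at the next time point, where
    transition[a][b] is P(class b at time t+1 | class a at time t)."""
    k = len(prevalences)
    return [sum(prevalences[a] * transition[a][b] for a in range(k))
            for b in range(k)]

# Hypothetical three-class example (light, moderate, heavy drinking).
prevalences_t1 = [0.5, 0.3, 0.2]
transition = [[0.85, 0.10, 0.05],   # most individuals stay in their class
              [0.10, 0.75, 0.15],
              [0.05, 0.15, 0.80]]

prevalences_t2 = next_prevalences(prevalences_t1, transition)
print([round(p, 3) for p in prevalences_t2])
```

Large diagonal entries correspond to stability of class membership over time, and a covariate effect on transitions would be represented by a different matrix for each covariate level.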
GMM, like LCA and LTA, assumes the existence of underlying classes of individuals. While in LCA classes are defined by their relationships with cross-sectionally measured categorical indicator variables, latent growth classes are defined by longitudinal trajectories of a single, typically continuous, variable. Predictors of growth class membership as well as within-class variations in trajectory as a function of covariates can also be modeled (Muthen et al., 2002). Examples of its utility include identification of trajectory classes of aggressive behavior (Schaeffer et al., 2003), reading difficulty (Boscardin et al., 2008) and criminal behavior (Kreuter & Muthen, 2008).
The fifth Robins-Guze phase suggested that co-occurrence of a disorder among family members can be used as a validator. One can assess familiality by examining class membership concordance among relatives. Sullivan et al. (2002) applied LCA to symptoms of MDD among 2941 members of a twin registry. In contrast to the LCA of Eaton et al. (1989), this analysis included only individuals who had reported at least one depressive symptom in the previous year. Symptom groups were also “disaggregated”: a total of 14 observed indicators, including appetite gain, appetite loss, weight gain, and weight loss, were included separately, rather than being grouped. The authors chose a model with seven classes; these included a “typical” class with high prevalences of depressed mood, appetite decrease, insomnia, and fatigue, and an “atypical” class with high prevalences of appetite and weight increase. The authors assigned individuals to the class of which they were most likely to be a member, and then used kappa statistics to examine class membership concordance among twin-pairs. Several of the kappa statistics were statistically significantly greater than 0, but all were low (0.05–0.08), suggesting only modest familiality. A model with fewer classes, and which included twins without symptoms, might have shown a stronger familiality signal. It would also be preferable to employ methods to account for the clustering introduced by the use of twin pairs.
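A kappa statistic of the kind described above compares observed concordance with the concordance expected by chance. The sketch below computes Cohen's kappa for pairs of assigned classes; the twin-pair data are hypothetical, and it is an assumption that this standard form corresponds to the statistic used in the cited analysis.

```python
from collections import Counter

def cohens_kappa(pairs):
    """Cohen's kappa for a list of (class_of_twin_1, class_of_twin_2) pairs.
    Undefined (division by zero) if chance agreement equals 1."""
    n = len(pairs)
    observed = sum(1 for a, b in pairs if a == b) / n
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    # Chance agreement from the marginal class frequencies of each twin.
    expected = sum(first[c] * second[c] for c in first) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical assigned classes for eight twin pairs.
twin_pairs = [("typical", "typical"), ("typical", "atypical"),
              ("atypical", "atypical"), ("typical", "typical"),
              ("atypical", "typical"), ("typical", "typical"),
              ("atypical", "atypical"), ("typical", "atypical")]

print(round(cohens_kappa(twin_pairs), 3))
```

A kappa near 0 indicates that twins share a class no more often than expected by chance, which is why values of 0.05 to 0.08 are read as only modest familiality.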
After describing the five phases, Robins and Guze (1970) stated that each of the phases was inter-related with the others, and that advances in one might lead to improvements in others, thus creating an iterative process where “continuing self-rectification and increasing refinement” resulted in increasingly homogenous subgroups. This idea was echoed by Craddock and colleagues (2006) who described a cycle whereby identification of a genetic signal aids in the refinement of phenotypic definitions, which then increases power to detect other candidate genes. This process can be expanded further to include multiple disciplines; an example of this process is shown in figure 3. Here, latent class analysis is first conducted in a large, epidemiological sample, and can be viewed as hypothesis generating. This, along with regression of latent classes on risk factors, constitutes phase 1. Heritability of the latent classes can be determined (phase 5), and genetic studies such as genome-wide association scans can identify candidate loci. Next (phase 2), basic science research would identify candidate genes and biomarkers, which would elucidate potential mechanisms and targets for interventions. This would lead to follow-up studies (phase 4) of clinical course and outcomes as well as clinical treatment trials. Such multidisciplinary collaboration is readily accomplished, particularly through the establishment of disease-specific centers with clinical, pathology, imaging, genetic, and biostatistics cores. Existing examples of these centers include the Alzheimer Disease Research Centers (ADRCs) and the National Parkinson Foundation Centers for Excellence. This network of collaborators from each discipline in turn must generate and share data across the variety of studies described in the five phases.
Each of these phases informs phenotypic refinement and delineation of disorders (phase 3). In this way, even if phenotype definitions in early iterations were imprecise or lacked key features, or if those features were measured imperfectly, they would still be marginally better than previous attempts, and the resulting substantive science would also be incrementally improved upon. These phases would also inform the development of new statistical methods to incorporate accumulated scientific knowledge within the measurement model. The time between development of a method and its use by substantive researchers is often measured in decades, and methodological researchers may not be aware of which models would be most useful to substantive researchers. Additionally, it is not enough to develop a method; it must also be “publicized” among applied researchers with accessible software. Commercially available latent variable software packages include MPLUS (www.statmodel.com) and LatentGold (www.latentgold.com). There are also a number of contributed R packages (www.r-project.org/) which are free and perform a variety of latent variable analyses.
This paper describes how discrete latent variable methods might be used to define valid phenotypes within the framework described by Robins and Guze (1970). The latent class measurement model defines the disorder, and predictors of disorders can be accommodated through latent class regression. Methods to study concordance among family members have been developed, and methods for external validation have been described. Cooperation between disciplines will be essential to the iterative validation model, which will serve the dual purpose of furthering research into mechanisms and treatment of psychiatric disorders, as well as validating their latent constructs.
It is important to note that most (if not all) of what has been proposed assumes that the latent constructs (phenotypes) have a physical substrate. It may well be that for some disorders, heterogeneity in presentation is only a reflection of the complex nature of these disorders. There is evidence already that heterogeneity in clinical presentation may not necessarily imply genetic heterogeneity. For example, there is good linkage evidence for chromosome 13q33 for both bipolar disorder and schizophrenia (Jamra et al., 2006). The strongest bipolar findings occurred among those with persecutory delusions (Schulze et al., 2005), suggesting either that those individuals were misdiagnosed schizophrenia cases, or that the genotype produces not the disorders themselves, but an endophenotype which is shared by some individuals with both disorders. In that case, it would be possible for the proposed latent variable model to separate individuals with and without a particular endophenotype and to elucidate its relationship with a genotype. There are two scenarios, though, which would be more complicated. If bipolar disorder and schizophrenia merely represent artificial groupings of symptoms, each with their own causes, then the assumption of an underlying latent phenotype is incorrect, and item-specific analysis is more appropriate. Lastly, while the latent variable model can accommodate multiple genes and interactions, it is not clear how pleiotropy (a single gene by itself expressible as any of a number of phenotypes) would be addressed. In that case, the premise that phenotypic heterogeneity implies genotypic heterogeneity would be incorrect.
Another assumption required for the use of LCA is that psychiatric disorders are discrete and delimitable entities (Kendell & Jablensky, 2003). Models for continuous latent variables (e.g., factor analysis, latent profile analysis) have been developed, and may be more appropriate for conditions typically measured along a continuum, such as intelligence or personality. Diagnoses may also benefit from a multidimensional approach. A scenario such as major depression in the context of a personality disorder or mental retardation could be accommodated by specifying a latent construct of the axis I disorder which varied as a function of the presence or degree of the axis II disorder. Additionally, “hybrid” analyses with categorical latent variables which are also measured along a continuum have recently been developed. For example, factor mixture models (Lubke & Muthen, 2005) incorporate discrete latent classes, but allow for conditional dependence among observed variables which is modeled using a common factor structure. It has been proposed that DSM-V incorporate dimensional measurement within categorical diagnoses (Kraemer, 2007); factor mixture models would lend themselves to that framework. The DSM-V research agenda also calls for developmental epidemiological studies (Pine et al., 2002); growth mixture models would be well-suited to that purpose. The concepts discussed in this paper are readily adapted to these more complex latent variable models.
Collaboration between methodological and substantive researchers has already yielded several avenues for further research. Perhaps the most common difficulty with these methods is the large sample size requirement. Simulation studies have suggested that at least 500 subjects are required to choose the correct number of classes (Yang et al., 2006), and precise estimation of conditional probabilities, such that each class has a distinct profile of symptom prevalences which distinguishes it from the other classes, requires an even larger sample size, perhaps greater than 1,000 (Leoutsakos, 2007). Further simulation studies of scenarios with a range of numbers of latent classes and indicators are needed in order to guide researchers considering using these models. Typically, clinical samples have sizes of 200 or less, which is probably too small. On the other hand, larger epidemiological studies are expensive, and may lack the type of data (e.g., genotyping or imaging) that the proposed process would require. To address this, it may be possible to estimate the measurement model with a large sample, and the structural piece with data from a smaller subsample, as described in Xue & Bandeen-Roche (2002).
Another challenge involves determining how additional scientific knowledge can be incorporated into measurement models. The non-differential measurement assumption of latent class regression implies that subtype definitions should not change after predictors are added to the model. However, knowledge about the salient features of hypothesized subtypes is limited, and it may be that the latent structure arrived at could be “wrong”, but in some way “correctable” by allowing additional scientific knowledge (e.g., presence of a known risk genotype) to influence estimation of the measurement model. Work in this area has already begun (Houseman et al., 2005; Bandeen-Roche et al., 2007; Leoutsakos, 2007), and is a mathematical analog of the proposed inter-disciplinary collaboration.
Preparation of this manuscript was supported by grants from the NIMH (T32-MH019901) and NIA (T32-AG026778, P50AG05146-26). We would like to thank Paul Rosenberg and Marilyn Albert for their comments on earlier manuscript drafts.
The authors have no competing interests.