Vulnerability to anorexia nervosa (AN) and bulimia nervosa (BN) arises from the interplay of genetic and environmental factors. To explore the genetic contribution, we measured over 100 psychiatric, personality and temperament phenotypes of individuals with eating disorders from 154 multiplex families accessed through an AN proband (AN cohort) and 244 multiplex families accessed through a BN proband (BN cohort). To select a parsimonious subset of these attributes for linkage analysis, we subjected the variables to a multilayer decision process based on expert evaluation and statistical analysis. Criteria for trait choice included relevance to eating disorders pathology, published evidence for heritability, and results from our data. Based on these criteria, we chose six traits to analyze for linkage. Obsessionality, Age-at-Menarche, and a composite Anxiety measure displayed features of heritable quantitative traits, such as normal distribution and familial correlation, and thus appeared ideal for quantitative trait locus (QTL) linkage analysis. By contrast, for three variables (lifetime minimum Body Mass Index [lowest BMI attained during the course of illness], concern over mistakes, and food-related obsessions), some families showed highly concordant and extreme values whereas others did not. These distributions are consistent with a mixture of populations, and thus the variables were matched with covariate linkage analysis. Linkage results appear in a subsequent report. Our report lays out a systematic roadmap for utilizing a rich set of phenotypes for genetic analyses, including the selection of linkage methods paired to those phenotypes.
Most common human diseases have a complex etiology involving both genetic and environmental factors. This complexity makes the task of finding relationships between phenotypes and genotypes challenging. Achieving a substantial probability of success requires highly efficient experimental designs and analytic methods. This observation is true whether association, linkage/association or pure linkage designs are employed. Here we focus on pure linkage designs, with special reference to affected sibling/relative pair linkage designs (ASP/ARP). For reasonable models of complex disease and reasonable effects of disease loci, the power of ASP/ARP designs is low (Risch and Merikangas, 1996). Still, these designs are commonly used for studies to find linkage. To enhance their power, the use of covariates has been proposed, and it is common for investigators to measure numerous traits on affected individuals with this goal in mind. In response, covariate methods have been developed (review in Devlin et al., 2002a; Hauser et al., 2004) and employed (Goddard et al., 2001; Olson et al., 2001; Devlin et al., 2002b; Scott et al., 2003).
Because many of these covariates are quantitative, an alternative approach would be to use the covariates or “traits” in QTL linkage analysis (Almasy and Blangero, 1998; Etzel et al., 2003). QTL methodologies for ASP/ARP designs continue to be developed (Szatkiewicz and Feingold, 2004; TCuenco et al., 2003) and are being used to hunt for genetic variation affecting liability to disease (Evans et al., 2004; Loo et al., 2004).
Insofar as we are aware, a substantial void exists between the available QTL and covariate linkage methodologies and their application to ASP/ARP designs, namely there is no roadmap for selecting amongst the potentially numerous variables or for selecting the optimal methodology to analyze the resulting data (see also Rampersaud et al., 2003). We present a novel, structured approach to variable selection and to the teaming of variables with linkage methods, which we apply to data from two eating disorder cohorts.
For quantitative traits, a natural method to test for linkage is QTL linkage analysis. Such analyses typically assume trait values follow a normal distribution, approximately, and are substantially heritable, which implies that trait values in families are correlated (Fig. 1a); we refer to these features as “classic features of quantitative traits.” Whenever a trait is associated with liability to disease, however, trait values for affected individuals can be quite different than that expected from a random sample of the population, and features of the population distribution (e.g., mean, standard deviation) are critical for interpreting the data. It is possible that trait values convey no information not already embodied in diagnosis, in which case QTL linkage analysis is not useful.
QTL linkage analysis is not ideal in related settings as well. Imagine that the disease of interest is etiologically heterogeneous, that a quantitative trait is genetically correlated with disease via a subset of the loci generating vulnerability, and that trait values convey information only on the etiology of disease. This model for the data was formalized by using a mixture model in which a fraction π of families trace a portion of their liability to variation at disease locus l whereas the remaining fraction 1-π do not (Devlin et al., 2002a). Trait value X is informative about membership in these groups. Notice that only the fraction π of families would show linkage for markers proximate to l. Empirical examples of such traits, or covariates, are age-of-onset for breast cancer and Alzheimer disease, and their distributions differ from those of traits amenable to QTL linkage analysis (Fig. 1b).
While some subtypes of eating disorders (e.g., the restricting subtype of anorexia nervosa) have a homogeneous presentation compared with most psychiatric diseases, they still have complex etiology. Many of the traits thought to underlie vulnerability to eating disorders exhibit substantial variation in the population (e.g., perfectionism), reminiscent of classic quantitative traits. Some traits, however, show quite different distributions and instead seem to reveal a mixture of populations within the eating disorders sample (Devlin et al., 2002b). For these traits, a covariate linkage analysis based on mixture models seems more appropriate. Our structured algorithm will, therefore, focus on two methods of analysis, QTL and mixture-model-based covariate linkage analysis.
Supported through funding provided by the Price Foundation, three cohorts of subjects relevant to this study have been recruited. Approximately 200 people with AN and their affected relatives with an eating disorder were recruited for the AN cohort (Kaye et al., 2000). This sample includes psychological assessments and blood samples from 196 probands and 229 affected relatives. For this analysis we focused on 154 families with affected siblings (140 with two affected siblings, 9 with 3 and 5 with 4; ~ 6% male). Approximately 316 people with BN and their affected relatives with an eating disorder were recruited during 1996-99 for the BN cohort. For these analyses, we focused on 244 families with affected siblings (228 with two affected siblings, 15 with 3 and 1 with 4; ~ 2% male). Control women (N=697), who were screened to be free from Axis I and II pathology, were recruited from the same sites as the BN cohort. Data from measured traits of control women approximate a random sample from the population, and were used to evaluate trait distributions from the AN and BN cohorts.
The Structured Interview on Anorexia Nervosa and Bulimic Syndromes (SIAB) was used to assess lifetime history of eating disorder diagnoses. Additional diagnostic information (i.e., course, severity) was obtained from the Structured Clinical Interview for DSM IV Axis I Disorders (SCID I) (First et al., 1997). A battery of standardized instruments (Table 1) was chosen to assess potential traits related to core eating disorder symptoms, mood, temperament and personality.
Over 100 variables were available from the AN and BN cohorts, creating a large, multidimensional space for analysis, on the order of the number of families participating. For this reason, we used our knowledge of eating disorders and related phenotypes to reduce this pool. Our choices were guided by several criteria: must be consistently related to eating pathology; must be heritable (or at least show correlation or clustering in our families); and must be indicators of severity of illness, or enduring traits rather than states resulting from the illness.
Using the pool of selected variables, we studied their multivariate distribution and determined which ones can be aggregated to produce a reduced set of variables. We used clustering methods for these multivariate analyses (Kaufman and Rousseeuw, 1990), specifically the “hclust” statistical procedure in the statistical package R (Dalgaard, 2002).
By these analyses, we attempted to distinguish whether variables cluster families (matching covariate linkage analysis) or whether they show attributes more typical of quantitative traits (matching QTL linkage analysis). Because age is known to impact many of these variables, we regressed out the effect of age and performed analyses on the residuals. For brevity, we refer to these transformed variables as variables or traits. The approach we took for clustering was similar to that described previously (Devlin et al., 2002a), namely to use simple graphical diagnostics to determine whether a selected variable clustered families. (If the clustering is readily visible, then we assume the variable could be biologically meaningful.) Clustering was performed using a per-family summary statistic, which was judged to be the most informative summary for that variable. For example, if young age-of-onset were thought to be the most informative for genetic analysis (e.g., breast cancer), the maximum value of age-of-onset for each family would be used in the analysis, because the maximum extracts more information about the within-family values of the variable than does the mean or sum.
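The age adjustment and per-family summary described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study; the function names, the choice of a simple linear regression on age, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict

def residualize_on_age(ages, values):
    """Residuals from a simple least-squares regression of values on age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_val = sum(values) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_age
    return [v - (intercept + slope * a) for a, v in zip(ages, values)]

def family_summary(family_ids, residuals, summary=max):
    """Collapse individual residuals to one summary value per family."""
    by_family = defaultdict(list)
    for fam, r in zip(family_ids, residuals):
        by_family[fam].append(r)
    return {fam: summary(vals) for fam, vals in by_family.items()}

# Hypothetical data: three sibling pairs with ages and raw trait scores.
family_ids = ["F1", "F1", "F2", "F2", "F3", "F3"]
ages = [20, 24, 30, 34, 40, 44]
scores = [10.0, 13.0, 15.0, 16.0, 20.0, 23.0]

residuals = residualize_on_age(ages, scores)
family_max = family_summary(family_ids, residuals, summary=max)
```

The `summary` argument is left open on purpose: which per-family statistic (maximum, minimum, mean) is most informative depends on the variable, as the text notes.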
To evaluate clustering formally, we used the “mclust” procedure (Banfield and Raftery, 1993) in the statistical package R (Dalgaard, 2002). Mclust assumes the observed data come from one or more populations, and the data from each population follows a normal distribution. Mclust estimates the number of populations K from the data, as well as the mean and variance associated with each population, by means of hierarchical, mixture model-based clustering. For a given K, the estimated mean and variance of each population maximizes the likelihood of the data (approximately). This is a non-standard statistical problem for which it is well known that increasing K always increases the likelihood. Therefore K is chosen on the basis of the Bayesian Information Criterion, which is a penalized likelihood procedure that favors parsimony (smaller K).
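To make the model-selection idea concrete, here is a small sketch of a one-dimensional normal mixture fitted by EM, with K compared by BIC. It is not mclust and does not reproduce its hierarchical initialization; the deterministic starting split, the variance floor, and the toy data are assumptions. This sketch uses the convention that a smaller BIC is better.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_mixture(data, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D normal mixture by EM.

    Returns (loglik, weights, means, variances).
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: split the sorted data into k contiguous chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, var_floor)
    loglik = sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)) or 1e-300)
        for x in xs)
    return loglik, weights, means, variances

def select_k(data, candidates=(1, 2, 3)):
    """Return (best_k, {k: BIC}); smaller BIC is better in this sketch."""
    n = len(data)
    bics = {}
    for k in candidates:
        loglik = fit_mixture(data, k)[0]
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bics[k] = -2.0 * loglik + n_params * math.log(n)
    return min(bics, key=bics.get), bics

# Hypothetical per-family summaries forming two well-separated groups.
data = [-0.9, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 0.9,
        9.1, 9.4, 9.7, 9.9, 10.1, 10.3, 10.6, 10.9]
best_k, bics = select_k(data)
```

The likelihood can only improve as K grows, so the penalty term is what keeps the single-population model competitive when the data are not clearly clustered.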
To evaluate the “within-family” correlation of variables, the intraclass correlation was estimated by using the NESTED procedure of SAS (version 8.1). Specifying family as a class produces an estimate of the variance attributable to family, which can then be used with the estimate of the total variance to obtain the intraclass correlation. To contrast the distribution of variables in the AN/BN cohorts to that for control women, we used regression models and Generalized Estimating Equations methods (Liang and Zeger, 1986), which account for relatedness by adjusting the variance of parameter estimates.
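As an illustration of the variance-component calculation, the sketch below estimates the intraclass correlation from a one-way analysis of variance with family as the class. It is a stand-in for, not a reproduction of, the SAS NESTED output; the function name, the truncation of negative variance estimates at zero, and the toy data are assumptions.

```python
def intraclass_correlation(families):
    """One-way ANOVA estimate of the intraclass correlation.

    `families` maps a family identifier to the list of trait values
    for its members. Family sizes may differ.
    """
    groups = list(families.values())
    sizes = [len(g) for g in groups]
    k = len(groups)
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (total_n - k)

    # Effective family size for unbalanced designs.
    n0 = (total_n - sum(s * s for s in sizes) / total_n) / (k - 1)

    # Variance attributable to family; negative estimates are set to zero.
    var_family = max((ms_between - ms_within) / n0, 0.0)
    total_var = var_family + ms_within
    return var_family / total_var if total_var > 0 else 0.0

# Hypothetical sibling pairs: siblings resemble each other within families.
example = {"F1": [1.0, 2.0], "F2": [4.0, 5.0], "F3": [7.0, 9.0]}
icc = intraclass_correlation(example)
```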
To select a parsimonious subset of variables for linkage analysis, and to choose the type of linkage analysis to be applied, we employed a structured approach to variable selection (Fig. 2). In Stage 1, clinical criteria were used to winnow the number of variables. To be chosen, the variable must be related to eating pathology, be heritable based on published data or at least familial in our data, and be relatively insensitive to state of illness. By these criteria, 26 variables were selected for the AN cohort and 24 for the BN cohort (Table 1). There was substantial overlap between the variables selected, which was purposeful because we hoped the overlapping variables would prove useful for linkage analysis of the combined AN/BN cohorts. All of these selected variables (Table 1) had some degree of support from the literature in terms of heritability and insensitivity to state of illness (i.e., trait-like qualities), as well as support from our clinical experience and from the contrast of women with eating disorders and control women (data not shown). Examples of variables excluded from consideration include Maturity Fears (associated with eating disorders, but not known to be heritable) and Agreeableness (no specific relationship to eating disorders).
In Stage 2, we sought to determine the degree of independence of the variables selected in Stage 1. Most were largely independent for both the AN and BN cohorts (Figs. 3 and 4). Several variables displayed moderate correlation (≥ 0.63). For the AN cohort, Obsessions were substantially correlated with Compulsions, Self-Directedness with Anxiety, Drive-For-Thinness with Body Dissatisfaction, and Concern Over Mistakes with Doubts About Actions; finally SIAB Compulsions (Table 1) showed moderate correlation with Obsessions and Compulsions (Fig. 3). Similar results accrued for the BN cohort, with the exception that Neuroticism was substantially correlated with both Anxiety and Self-Directedness. (Neuroticism was not measured in the AN cohort.)
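A simple way to screen for redundant variables of this kind is to compute all pairwise correlations and flag those at or above the cutoff. The sketch below is illustrative only: the variable labels and values are hypothetical, and applying the 0.63 cutoff to the absolute value of the Pearson correlation is an assumption, since the text does not say how negative correlations were treated.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(variables, threshold=0.63):
    """List (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(list(variables), 2):
        r = pearson(variables[a], variables[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical age-adjusted scores for three traits on five individuals.
variables = {
    "trait_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "trait_b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "trait_c": [1.0, -1.0, 0.0, -1.0, 1.0],
}
flagged = correlated_pairs(variables)
```

Pairs returned by `correlated_pairs` are the candidates for either merging into a composite or reducing to a single representative variable.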
Results from Stage 2 showed that certain variables contribute redundant information for individuals with eating disorders. Deliberations in Stage 3 focused on whether to combine these variables into composite variables by multivariate analysis, or select one of them for further analysis. Without missing data, composite variables would be preferred because they extract information for two or more variables. Missing data for either variable on a particular individual generates missing data for the composite variable for that individual (without imputation). Thus missing data were typically greater for the composite variable than for any of the variables to be combined. This problem was worsened for families, the unit of interest. Therefore, in most cases, we chose to target one of the variables; the exception is described below.
The group of eating-disorder experts believed that a fundamental underlying feature of the pathology of eating disorders is anxiety and that eating disorder pathology can serve an anxiolytic function. They doubted Anxiety, as measured (Table 1), would capture that feature. The first subscale of Harm Avoidance, anticipatory worry, captures a key feature of anxiety seen in individuals with eating disorders (Fassino et al., 2004; Klump et al., 2004). Therefore a composite variable was derived, consisting of the first principal component of Anxiety and the first subscale of Harm Avoidance (PC-Anxiety).
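For two variables, the first principal component has a closed form once both are standardized, which makes the construction of a composite such as PC-Anxiety easy to illustrate. The sketch below assumes the two inputs are standardized before combining (the text does not state this) and uses hypothetical scores; it is not the authors' computation.

```python
import math

def standardize(values):
    """Center and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def first_pc_of_two(x, y):
    """First principal component scores for two standardized variables.

    With correlation r, the leading eigenvector of the 2x2 correlation
    matrix is (1, sign(r)) / sqrt(2) and the leading eigenvalue is 1 + |r|.
    Returns (scores, r).
    """
    zx, zy = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(zx, zy)) / len(zx)
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / math.sqrt(2.0) for a, b in zip(zx, zy)]
    return scores, r

# Hypothetical scores for an anxiety scale and an anticipatory-worry subscale.
anxiety = [1.0, 2.0, 3.0, 4.0, 5.0]
worry = [2.0, 1.0, 4.0, 3.0, 5.0]
pc_scores, r = first_pc_of_two(anxiety, worry)
pc_variance = sum(s ** 2 for s in pc_scores) / len(pc_scores)
```

The variance of the component scores equals 1 + |r|, so the composite captures more of the shared variation the more strongly the two inputs are correlated.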
Stage 4 consisted of analysis of familiality, measured by either the magnitude of the correlation of trait values within families or whether the traits clustered families into distinct and meaningful groups (Fig. 1). Variables showing strong intraclass correlation (≥ 0.20) in both AN and BN cohorts include maximum BMI, Cooperativeness, Age at Menarche, Self Transcendence, and Obsessionality (Table 1). Harm Avoidance also shows substantial intraclass correlations, missing the cutoff by 0.01 for the BN cohort only. Other variables show strong intraclass correlations in one sample only (Table 1).
Concern over Mistakes, Harm Avoidance 2, Organization, and Obsessions over Food appear to cluster families into meaningful groups. Formal analysis for a mixture supports the visual diagnostics. Food Obsessions shows distinctive features of clustering and extreme values in individuals with eating disorders (Fig. 5). Most individuals with eating disorders are 4-6 standard deviations from the mean value for control women (Fig. 5). Moreover, when one sibling is extreme for this trait, the other sibling tends to be extreme as well. There are exceptions, however, creating a strong cluster of ASP who are extreme and concordant for Food Obsessions, and other ASP who are dispersed in other regions of bivariate space (Fig. 5). By contrast Age at Menarche shows substantial intraclass correlation (Table 1), but no evidence for clustering (Fig. 5). Minimum BMI shows clustering of families in the BN cohort, but not for families in the AN cohort. The lack of clustering in the AN cohort is largely structural, however, because by clinical definition individuals affected with AN must achieve and maintain remarkably low BMI. PC-Anxiety shows no clustering, and relatively low intraclass correlation (~ 0.1).
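The two diagnostics used here, displacement from the control distribution in standard deviation units and concordance of extreme values within sibling pairs, can be sketched as follows. The cutoff of 4 control standard deviations, the function names, and the data are hypothetical choices for the example, not values taken from the study.

```python
import statistics

def control_z_scores(values, control_values):
    """Express values in units of the control sample's standard deviation."""
    mu = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    return [(v - mu) / sd for v in values]

def concordant_extreme_pairs(sib_pairs, control_values, threshold=4.0):
    """Families in which both siblings lie at or beyond the threshold."""
    flagged = []
    for family, pair in sib_pairs.items():
        z_a, z_b = control_z_scores(list(pair), control_values)
        if z_a >= threshold and z_b >= threshold:
            flagged.append(family)
    return flagged

# Hypothetical control scores and two affected sibling pairs.
controls = [8.0, 10.0, 12.0]
sib_pairs = {"F1": (20.0, 22.0), "F2": (11.0, 19.0)}

z_f1 = control_z_scores(list(sib_pairs["F1"]), controls)
extreme_families = concordant_extreme_pairs(sib_pairs, controls)
```

In this toy example only the first pair is concordant and extreme; the second pair is the kind of exception that falls elsewhere in the bivariate space.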
Stage 5 required the final selection of variables based on the analyses of Stage 4, relevance to eating disorder pathology, and clinical experience and insight. It is possible that a trait could be familial — in that it either clusters families or shows high intraclass correlation — yet be unrelated to liability to eating disorders. If a trait were related to liability, we expect its distribution in people diagnosed with eating disorders to be displaced (e.g., trait has a different mean) relative to a sample from the population. State of illness can impact values of the traits, so displacement must be evaluated critically.
We selected three traits that seemed most appropriate for QTL linkage analysis, namely Obsessionality, Age at Menarche, and PC-Anx. The first two show substantial intraclass correlations in both the AN and BN cohorts and no evidence for a mixture of populations of multiplex families (clustering). PC-Anx also showed no clustering, but it shows fairly small intraclass correlations for both the AN and BN cohorts; nonetheless, based on the literature (Godart et al., 2002; Strober, 2004) and expert opinion, we opted to include it in the final set of variables. Total Harm Avoidance could be another candidate, but it was ruled out because of its correlation with a component of PC-Anx. Self Transcendence and Maximum BMI were ruled out because their connection to eating disorders was tenuous; there was little or no difference between the control and eating disorder samples for Self Transcendence; and the Maximum BMI, while distinct between controls and eating disorder groups, tended to be rather low in the eating disorder sample.
We also selected three traits for covariate linkage analysis, Minimum BMI, Concern over Mistakes, and Food Obsessions. All three clustered families (e.g., Fig. 5). Concern over Mistakes and Food Obsessions showed similar features in both cohorts. While Minimum BMI showed no evidence of clustering in the AN cohort, that was judged unimportant because low BMI is an essential component for the diagnosis of AN. None of these variables showed substantial intraclass correlations for either data set (Table 1). Organization was ruled out because its values in the eating disorder samples were only weakly differentiated from those of the control sample.
From a field of over 100 phenotypes collected from two multiplex samples of eating disorder families, as well as a sample of control women, we sought to select a parsimonious set of variables that would prove useful to linkage analysis and to match these variables to the kind of analytic method for linkage. To achieve this goal, we performed a structured analysis of the phenotypes (Fig. 2), selecting three variables for QTL linkage analysis: Obsessionality, Age at Menarche, and PC-Anx. Obsessionality is a heritable trait (Jonnal et al., 2000), is a salient personality feature of individuals with AN and BN (Halmi et al., 2000), and family studies report increased prevalence of OCD and OCPD in relatives of individuals with EDs (Lilenfeld et al., 1998). The relation between age at menarche and eating disorders is salient yet incompletely understood. Bulimia nervosa is often associated with oligomenorrhea despite the persistence of normal weight (Bulik et al., 2000), early age at menarche has been associated with the development of binge-eating in the absence of compensatory behaviors (Reichborn-Kjennerud et al., 2004), and age at menarche is heritable (Kirk et al., 2001). Anxiety disorders are common among people with eating disorders (Walters and Kendler, 1995; Godart et al., 2002; Kendler et al., 1995), usually precede onset of AN or BN (Deep et al., 1995; Bulik et al., 1997; Kaye et al., in press), and are heritable, as are related traits (Hettema et al., 2001).
Some variables, such as Food Obsessions, barely show any familial correlation (Table 1), yet they demonstrate strong familiality when viewed on a different scale — the ability to cluster families (Fig. 1). We use this feature in our structured analysis (Fig. 2) to select three other variables for covariate linkage analysis: Minimum BMI, Concern over Mistakes, and Food Obsessions. Covariate linkage analysis is not as straightforward as QTL linkage analysis. We assume the covariate probabilistically identifies a cluster of families that are “linked” at a liability locus while other families are not linked (Devlin et al., 2002a; Devlin et al., 2002b). Therefore our ultimate goal for this analysis is to assign probabilities or weights of membership into the linked and unlinked groups, and biological insights must determine which group is targeted for linkage analysis.
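The membership weights mentioned above follow from Bayes' rule once a two-component mixture has been fitted to the covariate. The sketch below assumes normal components whose parameters are already estimated; the parameter values, the function names, and the labeling of the high-mean component as the "linked" group are hypothetical, since the text leaves that choice to biological insight.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def linked_weight(x, pi_linked, linked, unlinked):
    """Posterior probability that a family with covariate x is in the linked group.

    `linked` and `unlinked` are (mean, sd) tuples for the two components;
    `pi_linked` is the mixing fraction of the linked group.
    """
    num = pi_linked * normal_pdf(x, *linked)
    den = num + (1.0 - pi_linked) * normal_pdf(x, *unlinked)
    return num / den

# Hypothetical fitted mixture: half the families in each group,
# with the linked group centered on extreme covariate values.
params = dict(pi_linked=0.5, linked=(5.0, 1.0), unlinked=(0.0, 1.0))

weights = {x: linked_weight(x, **params) for x in (0.0, 2.5, 5.0)}
```

Families with covariate values deep inside one cluster receive weights near 0 or 1, while those between clusters receive intermediate weights and so contribute less decisively to either group.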
Selection of these three variables is supported by the fact that lifetime minimum BMI is a marker for severe anorexia nervosa and is associated with poor outcome (Lowe et al., 2001). Concern over Mistakes is a heritable component of perfectionism (Tozzi et al., 2004) and a personality feature that appears to be somewhat uniquely associated with the presence of eating disorders (Bulik et al., 2003). Finally, food obsessions combine the highly obsessional nature of individuals with eating disorders with a focus on food and related behaviors (Halmi et al., 2000; Mazure et al., 1994).
In the absence of clearly defined and biologically relevant endophenotypes, the selection of optimal traits for and approaches to linkage poses substantial methodological challenges in psychiatric genetics. On one level, the clinical phenotypes of anorexia nervosa and bulimia nervosa are clear. Viewed more critically, however, no measure exists that captures the essence of “eating disorderedness.” In such instances, novel, systematic approaches to selecting and evaluating the appropriateness of traits from among a pool of many are essential. We provide a roadmap for trait selection that can be applied to genetic research on other disorders for which comprehensive phenotyping has occurred. Our algorithm offers a rational, systematic blueprint for data modeling by parsimoniously selecting traits and then matching each with a linkage approach. By its nature, data modeling involves choices that depend on the properties of the data and available analytic methods, as well as the goals of the analyst. Our goal is to be as parsimonious as possible in terms of the number of tests of linkage performed. We therefore winnow a long list of possible phenotypes to a short list of traits we expect to harbor key information about liability to eating disorders. Moreover, we carry forward our parsimony principle by selecting a priori the kind of linkage method to be used with each trait. In Bacanu et al. (in review), we show that, by using our data modeling, significant and suggestive linkages arise more often than expected by chance.
The authors wish to thank the Price Foundation for the support of the clinical collection of subjects and maintenance of the data. Data analysis was supported by grants MH057881 and MH066117 (to BD), the latter part of a collaborative R01 grant to study the genetic basis of eating disorders. S-AB was also supported by a NARSAD Young Investigator Award. The authors are indebted to the participating families for their contribution of time and effort in support of this study.