To select a parsimonious subset of variables for linkage analysis, and to choose the type of linkage analysis to apply, we employed a structured approach to variable selection (). In Stage 1, clinical criteria were used to winnow the number of variables. To be chosen, a variable had to be related to eating pathology, be heritable based on published data or at least familial in our data, and be relatively insensitive to state of illness. By these criteria, 26 variables were selected for the AN cohort and 24 for the BN cohort (). The substantial overlap between the two sets was purposeful: we hoped the overlapping variables would prove useful for linkage analysis of the combined AN/BN cohorts. All of the selected variables () had some degree of support from the literature in terms of heritability and insensitivity to state of illness (i.e., trait-like qualities), as well as support from our clinical experience and from the contrast of women with eating disorders and control women (data not shown). Examples of variables excluded from consideration include Maturity Fears (associated with eating disorders, but not known to be heritable) and Agreeableness (no specific relationship to eating disorders).
Flow chart for structured approach to variable selection.
In Stage 2, we sought to determine the degree of independence of the variables selected in Stage 1. Most were largely independent for both the AN and BN cohorts (Figs. and ). Several variables displayed moderate correlation (≥ 0.63). For the AN cohort, Obsessions were substantially correlated with Compulsions, Self-Directedness with Anxiety, Drive-For-Thinness with Body Dissatisfaction, and Concern Over Mistakes with Doubts About Actions; finally, SIAB Compulsions () showed moderate correlation with Obsessions and Compulsions (). Similar results were obtained for the BN cohort, with the exception that Neuroticism was substantially correlated with both Anxiety and Self-Directedness. (Neuroticism was not measured in the AN cohort.)
Clustered variables for the AN cohort. Dissimilarities are calculated as one minus the squared correlation for each pair of variables.
Clustered variables for the BN cohort. Dissimilarities are calculated as one minus the squared correlation for each pair of variables.
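As a minimal sketch of the dissimilarity used in these cluster plots (one minus the squared correlation for each pair of variables), the following Python computes the pairwise matrix. The variable names and scores are hypothetical placeholders, not study data, and the clustering algorithm applied to the matrix is not specified here, so it is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dissimilarity_matrix(data):
    """Map each ordered pair of variable names to 1 - r^2."""
    names = list(data)
    return {
        (a, b): 1.0 - pearson(data[a], data[b]) ** 2
        for a in names
        for b in names
    }


# Hypothetical scores for five individuals (illustration only).
scores = {
    "obsessions": [1, 2, 3, 4, 5],
    "compulsions": [2, 4, 6, 8, 10],  # perfectly correlated with obsessions
    "age_at_menarche": [13, 11, 14, 12, 13],
}
d = dissimilarity_matrix(scores)
```

Under this definition, highly correlated variables have dissimilarity near 0 and uncorrelated variables near 1; a correlation of 0.63 corresponds to a dissimilarity of about 1 - 0.63² ≈ 0.60.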
Results from Stage 2 showed that certain variables contribute redundant information for individuals with eating disorders. Deliberations in Stage 3 focused on whether to combine these variables into composite variables by multivariate analysis or to select one of them for further analysis. Without missing data, composite variables would be preferred because they extract information from two or more variables. However, a missing value for any constituent variable in a given individual makes the composite variable missing for that individual (absent imputation). Thus missing data were typically greater for the composite variable than for any of the variables to be combined. This problem was compounded for families, the unit of interest. Therefore, in most cases, we chose to target one of the variables; the exception is described below.
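The way missingness accumulates can be illustrated with a small sketch. The values below are invented, the averaging composite is only a placeholder for whatever multivariate combination is used, and the pairing of consecutive entries into sibling pairs is an assumption made for illustration.

```python
def missing_rate(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def composite_values(a, b):
    """Placeholder composite: defined only when both inputs are observed."""
    return [
        None if x is None or y is None else (x + y) / 2.0
        for x, y in zip(a, b)
    ]


def pair_missing_rate(values):
    """Fraction of sibling pairs with at least one missing member.

    Assumes entries (0, 1), (2, 3), ... belong to the same family.
    """
    pairs = list(zip(values[0::2], values[1::2]))
    return sum(x is None or y is None for x, y in pairs) / len(pairs)


# Hypothetical scores for eight individuals in four sibling pairs.
var_a = [3.1, None, 2.4, 4.0, 3.3, None, 2.9, 3.8]
var_b = [1.2, 0.8, None, 1.9, 1.1, 1.4, 1.0, None]
composite = composite_values(var_a, var_b)

individual_rates = (missing_rate(var_a), missing_rate(var_b), missing_rate(composite))
family_rates = (pair_missing_rate(var_a), pair_missing_rate(var_b), pair_missing_rate(composite))
```

In this toy data each variable is missing for a quarter of individuals, the composite for half, and every sibling pair has at least one member without a composite value.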
The group of eating-disorder experts believed that a fundamental underlying feature of the pathology of eating disorders is anxiety and that eating disorder pathology can serve an anxiolytic function. They doubted that Anxiety, as measured (), would capture that feature. The first subscale of Harm Avoidance, anticipatory worry, captures a key feature of anxiety seen in individuals with eating disorders (Fassino et al., 2004; Klump et al., 2004). Therefore a composite variable was derived, consisting of the first principal component of Anxiety and the first subscale of Harm Avoidance (PC-Anxiety).
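For intuition, the first principal component of two variables can be written in closed form. The sketch below assumes both variables are standardized before the analysis (an assumption; the scaling used for PC-Anxiety is not stated here), in which case the component is the scaled sum of the two z-scores and carries a fraction (1 + |r|)/2 of the total variance. The data are hypothetical.

```python
import math


def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_pc_two_variables(x, y):
    """First principal component scores for two standardized variables.

    Returns (scores, r, explained), where r is the Pearson correlation and
    explained is the share of total variance carried by the component.
    """
    zx, zy = standardize(x), standardize(y)
    n = len(zx)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
    # For a 2x2 correlation matrix the leading eigenvector is
    # (1, 1)/sqrt(2) when r >= 0 and (1, -1)/sqrt(2) otherwise.
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / math.sqrt(2.0) for a, b in zip(zx, zy)]
    explained = (1.0 + abs(r)) / 2.0
    return scores, r, explained


# Hypothetical anxiety and anticipatory-worry scores (illustration only).
anxiety = [30, 42, 51, 38, 47, 55]
worry = [4, 6, 9, 5, 7, 8]
pc_scores, r, explained = first_pc_two_variables(anxiety, worry)
```

An individual missing either input has no component score, which is the missing-data cost of a composite noted above.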
Stage 4 consisted of analysis of familiality, measured by either the magnitude of the correlation of trait values within families or whether the traits clustered families into distinct and meaningful groups (). Variables showing strong intraclass correlation (≥ 0.20) in both AN and BN cohorts include maximum BMI, Cooperativeness, Age at Menarche, Self Transcendence, and Obsessionality (). Harm Avoidance also shows substantial intraclass correlations, missing the cutoff by 0.01 for the BN cohort only. Other variables show strong intraclass correlations in one sample only ().
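One common way to quantify within-family correlation of a trait is the one-way ANOVA intraclass correlation. The sketch below assumes families of exactly two siblings and uses invented trait values; the estimator actually used in the analysis is not specified here, so this is an illustration rather than the study's method. Only the 0.20 cutoff is taken from the text.

```python
ICC_CUTOFF = 0.20  # threshold for "strong" intraclass correlation in the text


def intraclass_correlation_pairs(pairs):
    """One-way ANOVA intraclass correlation for families of size two.

    pairs: list of (sib1, sib2) trait values with no missing data.
    Undefined (division by zero) if every value is identical.
    """
    n = len(pairs)
    k = 2  # siblings per family
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / 2.0 for a, b in pairs]
    # Between-family and within-family mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    msw = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical sibling-pair values (illustration only).
example_pairs = [(12.0, 12.5), (13.0, 13.5), (11.0, 11.5), (14.0, 13.0)]
icc = intraclass_correlation_pairs(example_pairs)
passes_cutoff = icc >= ICC_CUTOFF
```

Siblings who resemble each other more than unrelated individuals give values near 1; pairs that differ as much as unrelated individuals give values near 0 or below.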
Concern over Mistakes, Harm Avoidance 2, Organization, and Obsessions over Food appear to cluster families into meaningful groups. Formal analysis for a mixture supports the visual diagnostics. Food Obsessions shows distinctive features of clustering and extreme values in individuals with eating disorders (). Most individuals with eating disorders are 4-6 standard deviations from the mean value for control women (). Moreover, when one sibling is extreme for this trait, the other sibling tends to be extreme as well. There are exceptions, however, creating a strong cluster of ASP who are extreme and concordant for Food Obsessions, and other ASP who are dispersed in other regions of bivariate space (). By contrast Age at Menarche shows substantial intraclass correlation (), but no evidence for clustering (). Minimum BMI shows clustering of families in the BN cohort, but not for families in the AN cohort. The lack of clustering in the AN cohort is largely structural, however, because by clinical definition individuals affected with AN must achieve and maintain remarkably low BMI. PC-Anxiety shows no clustering, and relatively low intraclass correlation (~ 0.1).
ASP values for Food Obsessions (top) and Age at Menarche (bottom) for the AN (left) and BN (right) cohorts.
Stage 5 required the final selection of variables based on the analyses of Stage 4, relevance to eating disorder pathology, and clinical experience and insight. It is possible that a trait could be familial (in that it either clusters families or shows high intraclass correlation) yet be unrelated to liability to eating disorders. If a trait were related to liability, we would expect its distribution in people diagnosed with eating disorders to be displaced (e.g., to have a different mean) relative to a sample from the population. State of illness can affect trait values, so displacement must be evaluated critically.
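Displacement of this kind can be expressed in units of the control standard deviation. The following is a minimal sketch with invented numbers; it is one simple way to express displacement, not necessarily the comparison used in the analysis.

```python
import statistics


def control_z_scores(case_values, control_values):
    """Express each case value in control standard deviations from the control mean."""
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample standard deviation
    return [(v - mean) / sd for v in case_values]


# Hypothetical trait values (illustration only).
controls = [10.0, 12.0, 14.0]
cases = [20.0, 22.0, 24.0]
z = control_z_scores(cases, controls)
mean_displacement = statistics.mean(z)
```

A mean displacement near zero would suggest the trait does not distinguish affected individuals from the population, whatever its familiality.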
We selected three traits that seemed most appropriate for QTL linkage analysis, namely Obsessionality, Age at Menarche, and PC-Anxiety. The first two showed substantial intraclass correlations in both the AN and BN cohorts and no evidence for a mixture of populations of multiplex families (clustering). PC-Anxiety also showed no clustering, and its intraclass correlations were fairly small for both the AN and BN cohorts; nonetheless, based on the literature (Godart et al., 2002; Strober, 2004) and expert opinion, we opted to include it in the final set of variables. Total Harm Avoidance could have been another candidate, but it was ruled out because of its correlation with a component of PC-Anxiety. Self Transcendence and Maximum BMI were ruled out because their connection to eating disorders was tenuous: there was little or no difference between the control and eating disorder samples for Self Transcendence, and Maximum BMI, while distinct between controls and eating disorder groups, tended to be rather low in the eating disorder sample.
We also selected three traits for covariate linkage analysis: Minimum BMI, Concern over Mistakes, and Food Obsessions. All three clustered families (e.g., ). Concern over Mistakes and Food Obsessions showed similar features in both cohorts. While Minimum BMI showed no evidence of clustering in the AN cohort, that was judged unimportant because low BMI is an essential component of the diagnosis of AN. None of these variables showed substantial intraclass correlations for either data set (). Organization was ruled out because its values in the eating disorder samples were only weakly differentiated from those of the control sample.