Multiple substance dependence (MSD) trait comorbidity is common, and MSD patients are often severely affected clinically. While shared genetic risks have been documented, so far there has been no published report using the linkage scan approach to survey risk loci for MSD as a phenotype. A total of 1,758 individuals in 739 families [384 African American (AA) and 355 European American (EA) families] ascertained via affected sib-pairs with cocaine or opioid or alcohol dependence were genotyped using an array-based linkage panel of single-nucleotide polymorphism markers. Fuzzy clustering analysis was conducted on individuals with alcohol, cannabis, cocaine, opioid, and nicotine dependence for AAs and EAs separately, and linkage scans were conducted for the output membership coefficients using Merlin-regression. In EAs, we observed an autosome-wide significant linkage signal on chromosome 4 (peak lod = 3.31 at 68.3 cM; empirical autosome-wide P = 0.038), and a suggestive linkage signal on chromosome 21 (peak lod = 2.37 at 19.4 cM). In AAs, four suggestive linkage peaks were observed: two peaks on chromosome 10 (lod = 2.66 at 96.7 cM and lod = 3.02 at 147.6 cM) and the other two on chromosomes 3 (lod = 2.81 at 145.5 cM) and 9 (lod = 1.93 at 146.8 cM). Three particularly promising candidate genes, GABRA4, GABRB1, and CLOCK, are located within or very close to the autosome-wide significant linkage region for EAs on chromosome 4. This is the first linkage evidence supporting the existence of genetic loci influencing risk for several comorbid disorders simultaneously in two major US populations.
Substance dependence (SD) is a chronic psychiatric illness [Leshner, 1997; McLellan et al., 2000]. This illness has led to substantial injury, loss of life, and disability in societies worldwide. Developing effective treatments for SD is particularly crucial because of the exceptionally high rate of relapse [McGovern et al., 2005] and the tremendous disease burden to society. Dependence on multiple substances, which is frequently observed clinically, complicates the development of effective treatments. Multiple substance use is common among alcohol and drug users, and the average percentage of individuals using multiple substances is around 56% [SAMSHA, 2009]. Among treatment seekers, individuals with comorbid DSM-IV alcohol and drug use disorders outnumber patients with a drug use or alcohol use disorder alone [Stinson et al., 2005]. This treatment-seeking pattern may be a reflection of disease severity and the damage that multiple substance dependence (MSD) inflicts on an individual’s health and well-being. Indeed, MSD patients are often severely affected.
Research from adoption, twin and family studies has shown that there is a genetic diathesis in the development of SD [Cloninger, 1987; Mirin et al., 1991; Pickens et al., 1991; Rounsaville et al., 1991; Luthar and Rounsaville, 1993; Merikangas et al., 1998; Tsuang et al., 1998; Kendler et al., 2003; Agrawal and Lynskey, 2008]. Investigations on the genetic liability for SD, in particular MSD, suggest that there are common as well as substance-specific genetic factors [Metten and Crabbe, 1994, 1999; Pickens et al., 1995; Bierut et al., 1998; Tsuang et al., 1998; Grucza and Bierut, 2006; Belknap et al., 2008; Huizink et al., 2010]. Indeed, there are numerous examples of specific genes known to affect multiple substance use disorders [Gelernter et al., 2006b; Luo et al., 2006, 2007; Yang et al., 2008; Sherva et al., 2010].
Current evidence suggests that copy number variants (CNVs) and rare single-nucleotide polymorphisms (SNPs) might be implicated in the risk for many complex diseases [Walsh et al., 2008; Johansen et al., 2010]. Genomewide association studies are unable to capture most of these rare genetic variants. An adequately powered linkage study design has the advantage of detecting diverse inherited genetic effects that segregate in families, including common variants, rare variants and CNVs. Since the current cost of conducting a large-scale whole genome sequencing study is still high, genetic linkage approaches remain valuable and informative to identify candidate regions for targeted resequencing studies.
Prior linkage studies have identified genomic susceptibility regions for single SD diagnoses [Gelernter et al., 2004, 2005, 2006a, 2007; Han et al., 2010; Panhuysen et al., 2010]. In this study, we proceed from the well-supported hypothesis that substance dependence traits share common genetic liability to a varying extent. Given the potential for shared genetic liability, investigation of dependence on multiple substances makes it possible to gain substantial phenotypic information and statistical power to map shared susceptibility regions. In this autosomal linkage scan, we used dependence on three illicit drugs (cannabis, cocaine, and opioids) and two legal substances (alcohol and nicotine) as a phenotype. We used fuzzy clustering analysis [Kaufman and Rousseeuw, 1990] to reduce the dimensions and derive a more homogeneous quantitative trait to identify a genetic component common to multiple SDs. Fuzzy clustering yields a membership coefficient that contains more information than the standard hard clustering approaches.
Study subjects were recruited at four sites: University of Connecticut Health Center (UConn, Farmington, CT), Yale University School of Medicine (APT Foundation, New Haven, CT), Medical University of South Carolina (MUSC, Charleston, SC), and McLean Hospital (Harvard Medical School; Belmont, MA). The number of families recruited at each site is: for AA, Yale 153, UConn 140, McLean 43, MUSC 48, and for EA, Yale 101, UConn 152, McLean 45, and MUSC 52. We screened the families based on the expectation that at least two siblings in each family would meet diagnostic criteria for cocaine dependence (CD) or opioid dependence (OD) or alcohol dependence (AD) for inclusion in small nuclear family linkage studies. OD families were recruited primarily at the UConn and Yale sites, while CD families were recruited at all sites. Probands with an axis I clinical diagnosis of a major psychotic disorder (e.g., schizophrenia or schizoaffective disorder) were excluded from the study. We also recruited other siblings and parents whenever available regardless of their affection status to increase the power to detect linkage. Each subject provided written informed consent prior to participation. We complied with the Code of Ethics of the World Medical Association (Declaration of Helsinki). The institutional review board at each site approved the study, and certificates of confidentiality for the work were issued by the National Institute on Drug Abuse and the National Institute on Alcohol Abuse and Alcoholism.
We interviewed and assessed subjects using the Semi-structured Assessment for Drug Dependence and Alcoholism (SSADDA) for psychiatric diagnosis as described previously [Gelernter et al., 2005; Pierucci-Lagha et al., 2005; Pierucci-Lagha et al., 2007]. The five substance dependence (SD) traits included in this study are dependence on alcohol (AD), cocaine (CD), cannabis (CanD), opioids (OD), and nicotine (ND). The diagnosis of each SD trait was based on DSM-IV diagnostic criteria [APA, 1994]. Our study sample was approximately equally divided by sex and subjects ranged in age from 17 to 66 (mean ± SD = 38.9 ± 7.8). Within each SD trait, there were more males among the EA affecteds, who were also younger than AA affecteds. Complete demographic data on study subjects are presented in Table I.
For most subjects, we obtained DNA from immortalized cell lines, but for a small proportion of the subjects, DNA was obtained directly from blood or saliva. A total of 6,008 SNPs were genotyped by the Center for Inherited Disease Research (CIDR) using the Illumina Linkage IVb Marker Panel (http://www.cidr.jhmi.edu). An additional 266 individuals were genotyped at Yale (Keck Center) using the 6,090 SNP Illumina Infinium-12 Human Linkage Marker Panel. The analyses were limited to the autosomal SNPs, and 5,636 and 5,735 SNPs were used from the first and second panels, respectively. The two panels had 4,518 SNPs in common, and these common SNPs were used for the following quality control and analyses. Allele frequencies were calculated and Hardy–Weinberg equilibrium (HWE) was examined in each population by means of PLINK software [Purcell et al., 2007] using a set of unrelated subjects (355 EA and 384 AA subjects were randomly selected, one per family). Any SNP that had a genotyping rate ≤0.95, a minor allele frequency (MAF) ≤0.1, or was not in HWE (P≤0.01) was excluded from the analysis. We used PedCheck [O’Connell and Weeks, 1998] and Merlin [Abecasis et al., 2002] to identify Mendelian inconsistencies. We used the Merlin “--error” option to detect error-prone genotypes based on the estimated probability of double-crossover events. The Pedigree Relationship Statistical Test (PREST) [McPeek and Sun, 2000] was applied to verify family relationships.
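The three per-SNP exclusion rules can be illustrated with a short Python sketch. It is not the PLINK implementation: the function name `snp_qc`, the 0/1/2 genotype coding with `None` for a missing call, and the use of a 1-df chi-square goodness-of-fit test for HWE (rather than the exact test that PLINK provides) are all assumptions made for illustration.

```python
import math

def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.1, min_hwe_p=0.01):
    """Apply call-rate, MAF and HWE filters to one SNP.

    genotypes: list of allele counts (0, 1 or 2 copies of allele B),
               with None marking a missing call (hypothetical coding).
    A SNP is excluded when any statistic is <= its threshold.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return {"call_rate": 0.0, "maf": 0.0, "hwe_p": float("nan"), "passes": False}

    call_rate = n / len(genotypes)
    n_aa, n_ab, n_bb = (called.count(k) for k in (0, 1, 2))
    p = (2 * n_bb + n_ab) / (2 * n)          # frequency of allele B
    maf = min(p, 1 - p)

    # Chi-square goodness of fit against Hardy-Weinberg proportions (1 df).
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df

    passes = call_rate > min_call_rate and maf > min_maf and hwe_p > min_hwe_p
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "passes": passes}

# Hypothetical genotype vector in exact Hardy-Weinberg proportions.
print(snp_qc([0] * 25 + [1] * 50 + [2] * 25))
```

The default thresholds mirror the cut-offs stated in the text; in practice the filters would be run separately in the AA and EA sets of unrelated subjects.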
In total, 1,366 (1360), 322 (70), and 46 (40) SNPs failed to pass the quality control including the criteria of genotyping rate, MAF and HWE in AAs (EAs). We limited our analyses to the 4,133 (4,395) remaining autosomal markers in AAs (EAs). In the linkage analysis, we set these problematic genotypes as missing. Pedigree errors were detected in five EA families and two AA families by PREST. Among these families, the relationships in one AA family and five EA families were corrected based on the shared IBD patterns. The re-assigned family relationships were verified by running PREST again. An AA family with an unresolved relationship problem was excluded from further analysis.
Populations with cryptic genetic clusters could contribute differential linkage signals [Lewis et al., 2008; Li et al., 2008; Myles et al., 2008; Lei et al., 2009; Rebbeck et al., 2009] (although not false positive linkage results). We used ancestry informative markers (AIMs) and analyzed them in STRUCTURE, which implements a Bayesian clustering method, to infer genetic ancestry for each subject [Falush et al., 2003, 2007; Pritchard et al., 2000]. A total of 1,574 AIMs were selected from the SNP linkage panel. The selection of AIMs was based on the marker characteristics inferred from the HapMap CEU (European) and YRI (African) samples. The following criteria were used to select AIMs: (i) the absolute allele-frequency difference (d) between the two HapMap populations > 0.2, (ii) pair-wise SNP r² < 0.1 within each population, and (iii) HWE testing with P-value > 0.01 within each population. A combination of 10,000 burn-ins and 10,000 collected iterations was implemented in the STRUCTURE analysis. Each individual's inferred population was based on >50% estimated ancestry in that population. We identified families as either EA or AA based on the predominant classification for each family.
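The selection and assignment rules in this paragraph can be sketched in Python as below. The function names, the input layout, and the handling of ties (an individual at exactly 50% ancestry, or a family split evenly) are hypothetical choices for illustration; the pairwise r² pruning in criterion (ii) is omitted for brevity, and the ancestry proportions would come from STRUCTURE output rather than being computed here.

```python
from collections import Counter

def is_candidate_aim(freq_ceu, freq_yri, hwe_p_ceu, hwe_p_yri,
                     min_delta=0.2, min_hwe_p=0.01):
    """Criteria (i) and (iii): large allele-frequency difference, HWE in both groups."""
    return (abs(freq_ceu - freq_yri) > min_delta
            and hwe_p_ceu > min_hwe_p
            and hwe_p_yri > min_hwe_p)

def assign_individual(q_european):
    """Label a subject by the population with >50% estimated ancestry."""
    if q_european > 0.5:
        return "EA"
    if q_european < 0.5:
        return "AA"
    return None  # exactly 50/50: left unassigned in this sketch

def assign_family(member_labels):
    """Label a family by the predominant classification of its members."""
    counts = Counter(label for label in member_labels if label is not None)
    if not counts:
        return None
    # On a tie, Counter.most_common returns the label seen first.
    return counts.most_common(1)[0][0]

# Hypothetical family of four with estimated European ancestry proportions.
labels = [assign_individual(q) for q in (0.92, 0.88, 0.35, 0.97)]
print(labels, assign_family(labels))
```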
We used fuzzy clustering analysis [Kaufman and Rousseeuw, 1990] to reduce the phenotype dimensions and to obtain more detailed information of data structures, rather than use each categorical phenotype alone, or typical “hard” clustering methods. The five SD traits, including AD, CD, CanD, OD and ND, for the 1,758 study subjects were the input data for the fuzzy clustering analysis. In a typical (non-fuzzy) partition, each subject is assigned to only one cluster. As a result, this approach is referred to as “hard clustering” because a clear-cut decision is made on each subject’s cluster membership. In contrast, fuzzy clustering allows for some ambiguity of belonging (i.e., cluster membership). In this approach, coefficients (probabilities ranging from 0 to 1) of cluster membership are derived for each subject. Given the substantial comorbidity of SDs and the possible shared genetic liability, fuzzy cluster analysis has the advantage of providing correlated data structures that are drawn from inherent comorbidity and/or underlying shared genetic liability, and provides a way to reduce and mine data. The membership coefficients can be considered more homogeneous quantitative traits for linkage analysis. The fuzzy cluster analysis was implemented using an R clustering algorithm known as the “fanny” algorithm [Kaufman and Rousseeuw, 1990]. The result of fuzzy clustering by the fanny algorithm can be summarized by a silhouette value for each subject [Rousseeuw, 1987]. The silhouette value (“Ŝ”, which ranges from −1 to 1) for each subject is a measure of how similar that subject is to subjects in the same cluster compared to subjects in other clusters. Subjects with a large value of “Ŝ” (almost 1) are very well clustered, a small “Ŝ” (around 0) means that the subject lies between two clusters, and subjects with a negative “Ŝ” are probably placed in the wrong cluster. 
Our study of anxiety disorders was the first to successfully implement fuzzy clustering in a genome scan [Kaabi et al., 2006].
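To make the idea of membership coefficients and silhouette values concrete, the following Python sketch clusters a toy matrix of five binary diagnoses. It uses the fuzzy c-means algorithm as a stand-in: this is not the R `fanny` routine used in the study, which minimizes a different objective function, and the function names, the fuzzifier `m = 2`, the Euclidean distance, and the toy data are all assumptions for illustration.

```python
import numpy as np

def fuzzy_c_means(X, k=2, m=2.0, n_iter=200, tol=1e-8, seed=0):
    """Return (U, centers); U[i, j] is the membership of subject i in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(k), size=X.shape[0])    # random start, rows sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)                # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # rows sum to 1 again
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

def silhouette_values(X, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for a hard partition."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                               # exclude the subject itself
        others = [c for c in np.unique(labels) if c != labels[i]]
        if not same.any() or not others:
            continue                                  # s(i) = 0 by convention
        a = D[i, same].mean()                         # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in others)
        if max(a, b) > 0:
            s[i] = (b - a) / max(a, b)
    return s

# Hypothetical toy data: columns are AD, CD, CanD, OD, ND (1 = affected).
X = np.array([[1, 1, 1, 1, 1], [1, 1, 1, 1, 0], [1, 1, 0, 1, 1],
              [0, 1, 0, 0, 0], [0, 1, 0, 0, 1], [1, 0, 0, 0, 0]], dtype=float)
U, _ = fuzzy_c_means(X, k=2)
labels = U.argmax(axis=1)                             # nearest "hard" partition
print(U[:, 0])                                        # quantitative trait candidate
print(silhouette_values(X, labels).mean())            # average silhouette width
```

With two clusters each row of `U` sums to one, so a single column carries all of the information and can serve as the quantitative trait.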
Linkage analyses were carried out using the model-free regression method implemented in Merlin-regression [Sham et al., 2002]. This method is based on a modified Haseman–Elston method that regresses the estimated identity-by-descent sharing between relative pairs on the squared sums and squared differences of trait values of the relative pairs. Given that marker–marker linkage disequilibrium (LD) inflates multi-point linkage signals [Abecasis and Wigginton, 2005], SNPs were grouped by LD into clusters using the Merlin “--rsq” option with a pairwise r² threshold of 0.10 to reduce linkage bias. We repeated the analyses with r² thresholds of 0.05, 0.2, and 0.3 to evaluate the robustness of the linkage results.
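The core regression idea can be illustrated for sib pairs with an ordinary least-squares fit in Python. This is a deliberately simplified, unweighted sketch and not Merlin-regression itself; the function name, the assumption that trait values are already standardized, and the null expectation of 0.5 for the proportion of alleles shared identical by descent between siblings are stated assumptions of the example.

```python
import numpy as np

def regress_ibd_on_traits(pi_hat, x1, x2, expected_ibd=0.5):
    """Regress centred IBD sharing on centred squared sums and squared differences.

    pi_hat: estimated proportion of alleles shared IBD at one locus, per sib pair.
    x1, x2: standardized trait values of the two siblings in each pair.
    Returns (coefficient for squared sums, coefficient for squared differences).
    """
    sq_sum = (x1 + x2) ** 2
    sq_diff = (x1 - x2) ** 2
    design = np.column_stack([sq_sum - sq_sum.mean(), sq_diff - sq_diff.mean()])
    response = pi_hat - expected_ibd
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[0], coef[1]

# Hypothetical placeholder inputs for 200 sib pairs (no linkage is simulated here).
rng = np.random.default_rng(1)
x1, x2 = rng.standard_normal(200), rng.standard_normal(200)
pi_hat = rng.choice([0.0, 0.5, 1.0], size=200, p=[0.25, 0.5, 0.25])
print(regress_ibd_on_traits(pi_hat, x1, x2))
```

Under linkage, pairs with concordant trait values (large squared sum, small squared difference) tend to share more alleles identical by descent, so a positive coefficient on the squared sum and a negative one on the squared difference would be expected.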
Empirical thresholds for autosome-wide suggestive and significant linkages were determined by Monte Carlo simulations under the null hypothesis of no linkage via a gene-dropping algorithm in Merlin. We simulated 1,000 data sets conditional on the observed family structure, marker spacing, allele frequencies, and missing data pattern [Sawcer et al., 1997; Kruglyak and Daly, 1998]. To reduce the computational burden, a marker set with low LD (r² < 0.1 for each pair of markers) was selected (3,675 SNPs in AAs and 3,760 SNPs in EAs). Regression-based linkage analysis, the same approach used for the observed data, was then carried out for each simulated data set and the highest lod score for each chromosome was retained. The autosome-wide suggestive linkage threshold is defined as the maximum lod score expected once by chance per genome scan [Lander and Kruglyak, 1995], and is set as the 1,000th highest lod score out of the 22,000 retained lod scores (one for each of the 22 autosomes in each of the 1,000 simulations). The autosome-wide significant threshold is set as the 95th percentile of the 22,000 lod scores. The empirical “suggestive” and “significant” thresholds for AA (EA) are 1.77 (1.74) and 3.22 (3.19), respectively (Table II). The autosome-wide empirical significance of an observed lod score was determined by counting how often, among the 1,000 simulated scans, the maximum lod score across all autosomes was greater than or equal to the observed lod score.
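Given a table of per-chromosome maximum lod scores from the simulated scans, the suggestive threshold and the empirical autosome-wide P-value described above reduce to a few array operations. The Python sketch below assumes a hypothetical array `max_lods` of shape (simulations × chromosomes); the function names are illustrative, and the placeholder values are random numbers rather than output of any gene-dropping simulation.

```python
import numpy as np

def suggestive_threshold(max_lods):
    """Lod score exceeded once per scan on average.

    max_lods: array of shape (n_sims, n_chromosomes) holding the highest
    lod score on each chromosome in each simulated scan. The threshold is
    the n_sims-th highest value among all retained scores.
    """
    n_sims = max_lods.shape[0]
    ranked = np.sort(max_lods.ravel())[::-1]   # descending order
    return ranked[n_sims - 1]

def empirical_autosome_wide_p(max_lods, observed_lod):
    """Fraction of simulated scans whose overall maximum reaches the observed lod."""
    scan_max = max_lods.max(axis=1)            # one maximum per simulated scan
    return float(np.mean(scan_max >= observed_lod))

# Placeholder input only: 1,000 scans x 22 autosomes of random values.
rng = np.random.default_rng(0)
max_lods = rng.exponential(scale=0.4, size=(1000, 22))
print(suggestive_threshold(max_lods))
print(empirical_autosome_wide_p(max_lods, observed_lod=3.31))
```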
The rate of affection with multiple forms of substance dependence was high among the family members recruited for our genetic studies of CD and OD. The rate of comorbidity (i.e., at least two SD disorders) was 80.1% and 88.5% in AAs and EAs, respectively. The break-down by the number of SD disorders is shown at the bottom of Table I. In our sample, the rate of OD was much higher in EAs than AAs (65.3% vs. 27.4%); the difference was not as great for AD (62.6% vs. 55.9%). CD was less common in EAs (81.4% vs. 85.7%). There were no other significant differences between the AA and EA samples on the characteristics listed in Table I. The complex comorbidity of SD disorders is depicted in a 3-D plot for dependence on the three illegal drugs, cannabis, cocaine, and opioids, regardless of alcohol and nicotine (Fig. 1). The complete dissection of all five studied SD disorders is shown in 3-D plots as well (Supplementary Fig. S1).
The fanny algorithm, with the number of clusters pre-selected to be two, resulted in two average silhouette values of 0.6259 and 0.6257 for the AAs and EAs, respectively, implying that the clustering patterns were very similar in the two population groups; subjects were clustered with modest similarity, on average, within the cluster each subject belonged to, compared to the other cluster. Mean (SD) of the cluster membership coefficients are 0.5014 (0.3370) in AAs and 0.5001 (0.3364) in EAs. The means are close to the theoretical 0.5 due to the constraint of summation to one for two clusters. For AAs, there are 475 subjects in one cluster, and 477 in the other. For EAs, subjects were equally clustered with 403 subjects in each cluster. The two clusters that incorporated the first two principal components of the five SDs explained 60.8% and 60.5% of the total phenotypic variation for AAs and EAs, respectively. The membership coefficient was used as a quantitative trait in the subsequent linkage analysis (Fig. 2).
Results are summarized in Table II. In EAs, we detected an autosome-wide significant linkage signal on chromosome 4 with a peak lod = 3.31 at 68.3 cM (point-wise P = 0.00005, empirical autosome-wide P = 0.038). The 1-lod score support interval around this linkage peak extends from 66.6 to 74.02 cM. A suggestive linkage signal on chromosome 21 was also observed with a peak lod = 2.37 at 19.4 cM. In AAs, two suggestive linkage peaks were observed on chromosome 10 with a peak lod = 2.66 at 96.7 cM and a peak lod = 3.02 at 147.6 cM; these flank our previously reported linkage peak near 117.2 cM for alcohol dependence in AAs. In addition, two further suggestive linkage regions were identified, on chromosome 3 with a peak lod = 2.81 at 145.5 cM and on chromosome 9 with a peak lod = 1.93 at 146.8 cM.
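For readers less familiar with these quantities, the Python sketch below shows how a lod score maps to a point-wise P-value and how a 1-lod support interval can be read off a lod curve. The conversion uses the usual asymptotic one-sided relationship (2 ln(10) × lod compared with a 1-df chi-square, with the tail probability halved); whether the reported point-wise P was obtained in exactly this way is an assumption, and the function names and the example curve are hypothetical.

```python
import math

def pointwise_p_from_lod(lod):
    """Asymptotic one-sided P-value for a lod score (1-df chi-square, halved)."""
    # 2*ln(10)*lod is the chi-square statistic; erfc gives its upper tail for 1 df.
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10)))

def one_lod_support_interval(positions, lods, drop=1.0):
    """Contiguous region around the peak where lod >= peak lod - drop."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]

# For lod = 3.31 this gives roughly 5e-5, in line with the reported point-wise P.
print(pointwise_p_from_lod(3.31))

# Hypothetical lod curve on a coarse map (positions in cM).
print(one_lod_support_interval([60, 64, 68, 72, 76], [0.5, 2.5, 3.0, 2.1, 0.2]))
```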
In this linkage scan we identified several loci predisposing to comorbid dependence on multiple substances using a fuzzy clustering approach to derive a measure of common factors among the SD disorder phenotypes. This general measure of substance dependence was derived from five substance dependence traits including alcohol, cocaine, cannabis, opioid and nicotine, and explained about 60% of the total variability among the MSDs in the two US populations under study. We identified an autosome-wide significant linkage peak in EAs on chromosome 4q12 and obtained suggestive evidence for linkage with loci on chromosomes 3, 9 and 10 in AAs and on chromosome 21 in EAs. The two suggestive linkage peaks (peak locations at 147.6 and 96.7 cM) identified on chromosome 10 for the common component of MSD in AAs in the current study approximate our previously reported linkage signals for alcohol dependence on chromosome 10 at 117.2 cM in AAs [Gelernter et al., 2009] and at 137.7 cM in EAs [Panhuysen et al., 2010].
Linkage analysis using the derived measure of MSD as the phenotype could increase power to detect shared risk loci with pleiotropic effects, compared to analysis of an individual SD disorder, because each single SD does not fully reflect the clinical manifestation of these patients. The derived measure of MSD, which extracts the common component of the multiple phenotypes within each individual, reflects a more homogeneous trait corresponding to the underlying shared genetic risk loci. This measure was derived by fuzzy clustering. In comparison to hard clustering, fuzzy clustering preserves much more of the data structure and allows for diagnostic complexities often observed in real data. We pre-selected a solution with two clusters for the study based on the following considerations. First, if the two clusters could explain 100% of the five substance dependence traits, then the new clustering traits would be better phenotypes. For that reason, the key to selecting an appropriate number of clusters was the percentage of variation that the clusters could explain. In the exploratory stage, we observed that over 60% of the variation in the five substance dependence traits could be explained by these two clusters. Second, the goal of implementing fuzzy clustering is to reduce the phenotypic dimensions such that the subsequent linkage analysis could be carried out in a conventional software package. We used the coefficients of fuzzy cluster membership as the trait for the subsequent linkage analysis. The membership coefficients of all clusters sum to one. With two pre-selected clusters there are two membership coefficients, and because they sum to one, either coefficient determines the other; this single coefficient was used for the subsequent linkage analysis.
Simulation studies showed that fuzzy clustering is more powerful than the principal component approach for multivariate continuous traits and more powerful than the joint linkage method [Mangin et al., 1998] in the presence of pleiotropy [Kaabi and Elston, 2003]. On the other hand, fuzzy clustering could be disadvantageous in studies of very large numbers of clusters because the amount of information may be overwhelming.
Three particularly promising candidate genes, GABRA4, GABRB1 and CLOCK, are located within or very close to the autosome-wide significant linkage region on chromosome 4. The two GABA receptor-encoding genes, GABRA4 and GABRB1, are located ~5 Mb upstream of the 1-lod support interval, and are part of a gene cluster encoding alpha 4, alpha 2 and gamma 1 subunits of the gamma-aminobutyric acid type A (GABAA) receptors, which are ligand-gated chloride channels that mediate inhibition in synaptic signal transduction in the mammalian central nervous system. Functional evidence shows that α4-containing GABAA receptors in the nucleus accumbens mediate alcohol intake [Rewal et al., 2009].
The CLOCK gene, located within the 1-lod support interval of the linkage peak on chromosome 4, encodes one of many proteins that regulate circadian rhythm. Addiction to various drugs has been found to disrupt circadian rhythms [Johanson et al., 1999; Schierenbeck et al., 2008; Acuna-Goycolea et al., 2010]. The CLOCK gene also plays an important role in regulating dopaminergic transmission and cocaine reward [McClung et al., 2005], and circadian-related genes are implicated in cocaine-induced locomotor sensitization in the day/night cycle [Abarca et al., 2002].
The autosome-wide significant linkage peak on chromosome 4 was observed in EAs only. Notably, association of alcoholism with GABRA2, about 10.7 Mb from the linkage peak we identified on chromosome 4 for the current study, has been documented and replicated in both EAs and AAs [Edenberg et al., 2004; Lappalainen et al., 2005; Covault et al., 2008; Enoch et al., 2009; Enoch et al., 2010; Ittiwut et al., 2011]. A significant association of alcohol dependence with a haplotype block that extends from the intergenic region between GABRA2 and GABRG1 to GABRG1 intron 3 was observed in EAs but not in AAs [Covault et al., 2008]. We also observed different comorbid MSD patterns in this study sample between AAs and EAs; for example, the percentage of subjects affected with OD is much higher in EAs than in AAs (65.3% vs. 27.4%, Table I). This difference is likely due to the ascertainment scheme and not due to inherent differences between populations. Thus, the distinct comorbid MSD pattern or ascertainment differences in EAs and AAs might contribute to the linkage peak being identified only in EAs.
In summary, we identified an autosome-wide significant linkage peak for MSD in EAs and several suggestive linkage peaks for the two major US populations based on a dense SNP map. Replication of the linkage findings in other samples, including those from other populations, is warranted, as is a focused analysis of the genes located in the linkage regions implicated here. These novel linkage regions can provide useful information in the search for genes involved in susceptibility to MSD, perhaps for targeted sequencing studies to find both common and rare disease-associated variants.
The authors thank the volunteer families and individuals who participated in this study. The Center for Inherited Disease Research (CIDR) genotyped most of the array-based linkage SNPs. CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University (contract number N01-HG-65403). Additional genotyping was supported in part by a Yale CTSA and NIH Neuroscience Microarray Consortium award, U24 NS051869-02S1. We are grateful to Ann Marie Lacobelle and Greg Kay for their excellent technical assistance. This study was supported by National Institute on Drug Abuse (NIDA) grants K01 DA24758, R01 DA12690, R01 DA12849, R01 DA18432, R01 AA11330, and the VA Connecticut REAP center, a VA MERIT grant, and the VA Connecticut MIRECC Center.
Additional supporting information may be found in the online version of this article.
Conflict of interest: Drs. Yang, Han, and Gelernter report no competing interests. Dr. Kranzler reports consulting arrangements with Alkermes Inc, Ortho-McNeil Pharmaceuticals, Elbion Pharmaceuticals, Solvay Pharmaceuticals, Sanofi-Aventis Pharmaceuticals, and Gilead Sciences, Inc. and has received research support from Merck and Company, Bristol-Myers Squibb Company, and Ortho-McNeil Pharmaceuticals. Dr. Farrer received a research grant from Eisai Pharmaceuticals and consultant fees from Novartis Pharmaceuticals.