Cocaine and opioid dependence are common, complex disorders with high heritability that commonly co-occur with other substance dependence disorders. Improved insight into the genetic basis of substance dependence would help elucidate its etiology and could inform its prevention and treatment. To generate new hypotheses about the genetics of substance dependence, we genotyped 5633 tagging single nucleotide polymorphism (SNP) markers in 1699 subjects from 339 African American (AA) families and 334 European American (EA) families ascertained through a sib pair meeting DSM-IV criteria for either cocaine or opioid dependence. The associations between genetic markers and five substance dependence traits (cocaine dependence, opioid dependence, cocaine-induced paranoia, alcohol dependence, and nicotine dependence) were assessed by family-based association tests (FBAT). Results were ranked according to several criteria including statistical significance, concordance of results across population samples, and potential biological relevance of the implicated gene. The top-ranked result was an association of SNP rs1133503 in the MANEA gene with cocaine-induced paranoia (CIP). Our study provides an initial substance dependence trait-specific blueprint of associated regions for future candidate gene studies.
Chronic use of psychoactive drugs can result in a dependence syndrome, the central element of which is impaired control over substance use, but which may also include tolerance and physical dependence symptoms (APA 1994). Substance dependence poses serious medical, legal, and social risks to those affected (and sometimes to others), and is thus an important public health issue. In one study, the overall global burden of disease (GBD) due to the use of all psychoactive substances was estimated to be 8.9% of disability-adjusted life years (DALYs), to which tobacco and alcohol contributed 4.1% and 4.0%, respectively, while illicit substances including cocaine and opioids were responsible for 0.8% (WHO 2004).
Although the etiology of substance dependence is poorly understood, evidence from adoption and twin studies implicates a moderate-to-strong role for genetic factors (Tsuang et al. 1996; Kendler and Prescott 1998; True et al. 1999; Karkowski et al. 2000; Kendler et al. 2000; Kendler et al. 2003). A genome-wide linkage scan detected regions harboring genes for cocaine dependence (CD) on chromosomes 3 and 10, and for cocaine-induced paranoia (CIP) on chromosome 9 (Gelernter et al. 2005). Regions on chromosome 17 showed significant evidence of linkage to opioid dependence (OD) in a recent study (Gelernter et al. 2006a), and chromosome 14q showed suggestive linkage for OD in an ethnically mixed population (Lachman et al. 2007). By comparison, a larger number of studies focused on alcohol dependence (AD) have identified linkage to several regions on chromosomes 1, 2, 4, 7, and 11 (reviewed by Dick and Bierut 2006). Recent linkage studies showed several chromosomal regions to be potentially linked to nicotine dependence (ND) or smoking behavior traits, including 10q, 7q and 11p in the Finland twins cohort (Loukola et al. 2007) and locations on chromosomes 1–14 and 16–21 in European and African American samples (Li 2008; Li et al. 2007; Li et al. 2006; Gelernter et al. 2004; Gelernter et al. 2007). A recent genome-wide linkage scan of quantitative traits (QT) for AD and illicit drug (including cannabis) dependence conducted in informative pedigrees from the Collaborative Study on the Genetics of Alcoholism (COGA) identified several novel linkages, the most significant of which (multipoint LOD score of 3.7 for an AD QT) did not exceed the threshold for genome-wide significance (empirical p-value = 0.06) (Agrawal et al. 2007). Case-control and family-based studies targeting candidate genes selected because of known or hypothesized biological roles have reported and confirmed associations with GABRA2 (Covault et al. 2004; Lappalainen et al. 2005; Agrawal et al. 2006; Fehr et al. 2006; Soyka et al. 2008), ADH4 (Edenberg et al. 2006; Guindalini et al. 2005; Luo et al. 2006; Luo et al. 2005a) and CHRM2 (Luo et al. 2005b; Wang et al. 2004) for AD, with DDC (Ma et al. 2005; Yu et al. 2006; Zhang et al. 2006a), the TTC12-ANKK1-DRD2 gene cluster (Gelernter et al. 2006b; Yang et al. 2007), and several cholinergic nicotinic receptor genes (most notably CHRNB3 and CHRNA5) (Saccone et al. 2007; Berrettini et al. 2008) for ND and related traits, and with OPRM1 (Kranzler et al. 1998; Bart et al. 2004; Luo et al. 2003; Zhang et al. 2006b) for dependence on various substances.
Undoubtedly, the few robust associations identified thus far represent only a small portion of the genes involved in the etiology of substance dependence. With the completion of the human genome sequence, SNP consortium and HapMap projects, and advances in genotyping arrays, genome-wide association (GWA) studies have the potential to pinpoint many more genes for substance dependence traits. Recently, a GWA study using sample pooling and 2.4 million SNPs in a population-based sample suggested several novel genes possibly associated with ND, including NRXN1 and VPS13A (Bierut et al. 2007). However, high-density scans are still expensive, and the results from a single moderately powered sample are not routinely confirmed, especially in samples from other populations. In this study, we carried out GWA scans using a low-density SNP chip array for four major substance dependence disorders (CD, OD, ND and AD) in a family-based cohort composed of two distinct population samples, with the goal of generating a trait-specific blueprint of associated regions for future candidate gene studies.
A total of 673 families were recruited at four clinical sites in the United States: Yale University School of Medicine (APT Foundation; New Haven, CT), the University of Connecticut Health Center (Farmington, CT), McLean Hospital (Harvard Medical School; Belmont, MA), and the Medical University of South Carolina (Charleston, SC). Subjects gave informed consent as approved by the institutional review board at each clinical site, and a certificate of confidentiality for the work was obtained from the National Institute on Drug Abuse. Families were ascertained through affected sibling pairs (ASPs) that met DSM-IV criteria for cocaine or opioid dependence (APA 1994), as previously described (Gelernter et al. 2005; 2006a). Probands were excluded from further study if diagnosed with a major psychotic illness (e.g., schizophrenia or schizoaffective disorder). Other family members of the ASPs were recruited if available, regardless of affection status. Subjects were classified as African American (AA) or European American (EA) based on a Bayesian model-based clustering method using genetic marker information as previously described (Gelernter et al. 2007). For all primary analyses, the population groups were treated as independent samples. Characteristics of the sample are described in Table 1.
Subjects were interviewed using a computerized version of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA), an instrument eliciting detailed clinical and behavioral information to which scoring algorithms can be applied to derive DSM-IV diagnoses for substance dependence and other psychiatric traits (Gelernter et al. 2005; Pierucci-Lagha et al. 2005; Pierucci-Lagha et al. 2007). Accordingly, diagnoses of CD, OD, ND and AD were established based on the subject meeting three or more of the following seven criteria during a 12-month period: tolerance; withdrawal; taking the substance in larger amounts or over a longer period than was intended; persistent desire, or unsuccessful efforts to cut down or control substance use; a great deal of time spent in activities necessary to obtain, use, or recover from substance use; giving up or reducing important social, occupational, or recreational activities; continuing substance use despite persistent, or recurrent physical or psychological problems. Subjects who met criteria for substance abuse were considered as unknown and hence excluded from analysis of that trait. Subjects who reported symptoms of paranoia during cocaine intoxication were diagnosed as affected with CIP. An ordinal score between 0–10 for the Fagerstrom Test for Nicotine Dependence (FTND) (Heatherton et al. 1991; Fagerstrom 1978) was calculated based on the subject’s responses to 6 questions about smoking behavior. Although the majority of subjects in this study were dependent on two or more substances, the correlations for comorbidity for all pairs of traits are modest (see Table 2). Clinical information was unavailable for 137 subjects (108 parents and 29 sibs, 71 EAs and 66 AAs) who provided a blood specimen. Although these individuals were classified as missing for all phenotypic traits, they were included in the analyses because they provide information about transmission of marker alleles.
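The diagnostic rule described above (three or more of seven DSM-IV criteria, with abuse-only subjects set to unknown) can be sketched in a few lines. This is an illustration only and not the SSADDA scoring algorithm: the criterion labels, the function name `classify_dependence`, and the treatment of the abuse flag are assumptions made for the example, and the requirement that criteria cluster within a 12-month period is not modelled.

```python
from __future__ import annotations

# Hypothetical short labels for the seven DSM-IV dependence criteria.
DSM_IV_CRITERIA = frozenset({
    "tolerance",
    "withdrawal",
    "larger_or_longer_use",
    "unsuccessful_cut_down",
    "time_spent",
    "activities_given_up",
    "use_despite_problems",
})


def classify_dependence(criteria_met: set[str],
                        meets_abuse: bool = False,
                        threshold: int = 3) -> str:
    """Return 'affected', 'unknown', or 'unaffected' for one substance.

    Assumes the criteria were already restricted to one 12-month window.
    """
    unrecognized = set(criteria_met) - DSM_IV_CRITERIA
    if unrecognized:
        raise ValueError(f"unrecognized criteria: {sorted(unrecognized)}")
    if len(set(criteria_met)) >= threshold:
        return "affected"
    # Assumption: abuse without dependence is coded as unknown, so the
    # subject is excluded from the analysis of that trait.
    if meets_abuse:
        return "unknown"
    return "unaffected"
```

For example, a subject endorsing tolerance, withdrawal, and time spent would be coded as affected, whereas a subject endorsing only tolerance but meeting abuse criteria would be coded as unknown for that trait.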
A total of 6,008 single nucleotide polymorphisms (SNPs) were genotyped at the Center for Inherited Disease Research (CIDR) using the Illumina Linkage IVb Marker Panel (http://www.cidr.jhmi.edu). Data for 36 markers (0.6% of the total) were not provided by CIDR because they displayed excessive replicate or Mendelian errors, had more than 50% missing data, or were monomorphic (i.e., all individuals homozygous for the same allele). The average rate of missing data among the remaining markers was 0.10%. We limited our analysis to 5,633 autosomal markers because the power for detecting association with X-linked markers in a family-based design without parental genotype information is comparatively low (Ding et al. 2006; Chung et al. 2007). On average, these markers were spaced 518 kb apart and had a heterozygosity of 0.405.
Mendelian inconsistencies were detected by the PedCheck program (O’Connell and Weeks 1998). A total of 248 genotyping inconsistencies were identified out of 9,570,467 assays for all DNA samples from 1699 subjects (i.e., <0.003% of all assays) and these results were excluded from all subsequent analyses. Three hundred forty-seven SNPs were excluded from further analysis because they were not informative in either population sample (i.e., minor allele frequency [MAF] less than 0.1). Consistency with Hardy–Weinberg equilibrium (HWE) expectations for each SNP was tested using a χ² test in each ethnic group using a set of 673 unrelated subjects (one random subject from each family). Another 46 SNPs with significant evidence of deviation from HWE (p-value < 0.001) were then excluded from further analysis. Association of the remaining 5,240 SNPs with each trait was evaluated using the Family Based Association Test (FBAT) program (Horvath et al. 2001) assuming an additive model under the null hypothesis of no linkage and no association. FTND score was adjusted for potential confounding factors (i.e., age and sex) in each population sample by computing standardized residuals using SAS (version 9.0).
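The two marker-level quality filters described above (MAF and HWE) can be illustrated with a short sketch. It assumes a biallelic SNP summarized by genotype counts in unrelated subjects and uses the standard one-degree-of-freedom Pearson χ² test for HWE. The function names and the example counts are hypothetical, and the software actually used for this step is not specified in the text.

```python
from __future__ import annotations

import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB) of a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1.0 - p)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 df) for HWE."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_qc(n_aa: int, n_ab: int, n_bb: int,
              maf_min: float = 0.1, hwe_alpha: float = 0.001) -> bool:
    """Apply the MAF >= 0.1 and HWE p >= 0.001 thresholds given in the text."""
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min and p_value >= hwe_alpha
```

With hypothetical counts of 25, 50, and 25, the genotypes match HWE expectations exactly and the marker passes. Counts of 50, 0, and 50 give a statistic of 100 and the marker is excluded.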
Two criteria were applied to the test results to screen for potential candidate genomic regions associated with each trait. First, the SNP had to show significant evidence of association with the same trait (marginal p-value <0.05) in both the AA and EA samples. Second, the pattern of association of the SNP with the trait had to be the same in both samples. FBAT was performed on the combined group of AA and EA families only for those findings that were corroborated in the two independent samples.
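The two screening criteria can be expressed as a simple filter over per-sample test results. The sketch below is hedged in two ways. First, "same pattern of association" is interpreted here as the same sign of the test statistic for the same reference allele, which is an assumption about what the criterion means. Second, the record type `FbatResult`, the function `concordant_snps`, and the marker labels and values are invented for illustration and are not output of the FBAT program.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FbatResult:
    snp: str        # marker identifier
    p_value: float  # marginal p-value in one population sample
    z_score: float  # sign gives direction of transmission for the reference allele


def concordant_snps(aa: list[FbatResult], ea: list[FbatResult],
                    alpha: float = 0.05) -> list[str]:
    """Markers with p < alpha in both samples and the same direction of effect."""
    ea_by_snp = {r.snp: r for r in ea}
    hits = []
    for r in aa:
        other = ea_by_snp.get(r.snp)
        if other is None:
            continue  # marker not tested in the second sample
        both_significant = r.p_value < alpha and other.p_value < alpha
        same_direction = r.z_score * other.z_score > 0
        if both_significant and same_direction:
            hits.append(r.snp)
    return hits


# Hypothetical results for three placeholder markers.
aa_results = [FbatResult("snp_a", 0.01, 2.6),
              FbatResult("snp_b", 0.03, 2.2),
              FbatResult("snp_c", 0.20, 1.3)]
ea_results = [FbatResult("snp_a", 0.04, 2.1),
              FbatResult("snp_b", 0.02, -2.3),
              FbatResult("snp_c", 0.01, 2.6)]
selected = concordant_snps(aa_results, ea_results)
```

In this example only `snp_a` is retained: `snp_b` is significant in both samples but in opposite directions, and `snp_c` is significant in only one sample.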
Figure 1 shows the distribution of the MAF to be nearly uniform between 0.1 and 0.5 in the AA sample, whereas more than two-thirds of the SNPs had a MAF of at least 0.4 in the EA sample, an observation consistent with the fact that the SNP array was designed primarily based on information content in EAs. Each SNP was assessed for association with each of the six substance dependence traits. Complete results (i.e., each SNP and each trait in each ethnic group) are available at http://genetics.bumc.bu.edu/study_results.
A total of 13 SNPs (0.25% of the total) showed significant evidence for and similar patterns of association with CD in both AA (0.0009 < p < 0.05) and EA (0.007 < p < 0.05) families (Table 3). In the pooled sample of families, the strength of association for each of these SNPs achieved a level of significance of 0.005 or lower. Six of these 13 SNPs are located within known gene sequences or 5′ promoter regions, and the most significant of these was rs1381355, which is located in the MGC48628 gene on chromosome 4 (p = 0.0002 in the combined sample).
Eight SNPs were significant in both groups for OD (Table 4). Three of these SNPs on chromosome 2 and one SNP on chromosome 20 are far from any known gene sequences. The most significant result among the gene-based SNPs was rs770124 in NAV3 on chromosome 12 (p = 0.0003 in the combined sample).
A slightly larger number of families were informative for ND than for CD or OD (Table 5). Six SNPs met our two selection criteria, but only two (rs1886040 and rs1062935) are located within genes. A completely separate set of six SNPs met significance criteria for the smoking-related quantitative trait FTND (Table 6). Three of these SNPs are located in gene sequences on chromosomes 5, 6 and 18. Of note, the genes tagged by the SNPs on chromosomes 6 (PLEKHG1) and 18 (PLEKHE1) encode pleckstrin homology domain proteins.
Only three SNPs were significantly associated with AD in both groups, and all of them are located in genes either on chromosome 1 (PDE4B and TBX19) or chromosome 9 (CCRK) (Table 7).
The most significant result in the entire study (p = 0.00005 in the combined sample) was the association of CIP with rs1133503, which is located in MANEA (Table 8). Three of the other eight SNPs significantly associated with CIP are also located in genes (DNAH8, NARG2, TYK2).
To our knowledge, this is the first GWA study for multiple substance dependence traits performed simultaneously in two cohorts from distinct populations. Recognizing the limited power of our sample ascertained through and comprised primarily of sib pairs affected with CD or OD, our goal was to take advantage of the combined sample to generate new hypotheses about the genetic basis of substance dependence by identifying and prioritizing candidate genes or chromosome regions implicated in both samples as targets for further investigation.
The most significant finding in this study was the association of CIP with rs1133503 (p = 0.00005 in the combined sample), which is located in the 3′UTR of the α-endomannosidase (MANEA) gene. Absence or defective function of lysosomal α-endomannosidase could result in accumulation of undegraded mannose-rich oligosaccharides that can induce progressive neurologic deterioration and premature death (Crawley and Walkley 2007). From a biological perspective, the most noteworthy finding among the significant results for CD was with rs8929 (p = 0.003 in the combined sample). This SNP is located in the 3′UTR of the gene encoding synaptotagmin XIII (SYT13), which belongs to a family of proteins serving as calcium sensors in facilitation and asynchronous neurotransmitter release (Saraswati et al. 2007). These calcium sensors regulate baseline synaptic transmission and short-term synaptic plasticity, and may play a key role in the etiology of substance dependence. We also detected significant association (p = 0.0003 in the combined sample) of OD with a SNP (rs770124) in the neuron navigator 3 (NAV3) gene; neuron navigators are expressed predominantly in the nervous system and are involved in axon guidance (Maes et al. 2002). A SNP (rs8688) in the TTC9 gene on chromosome 14 was significantly associated with OD in our study (p = 0.003 in the combined sample) and is located approximately 5 cM from a linkage peak for OD in an ethnically mixed sample from New York City (Lachman et al. 2007). TTC9 encodes tetratricopeptide repeat domain 9, a hormonally regulated protein whose function is not yet clear. We have shown previously that another TTC gene, TTC12, is associated with nicotine (Gelernter et al. 2006b) and alcohol (Yang et al. 2007) dependence.
Our GWA study identified association of FTND with two SNPs from unlinked genes, PLEKHG1 on chromosome 6 and PHLPP (a.k.a. PLEKHE1) on chromosome 18, which encode homologous proteins involved in cell signaling. Each of these genes contains a pleckstrin homology (PH) domain, which plays a key role in cell signaling and cytoskeletal regulation by binding to phosphoinositides (Harlan et al. 1994). PHLPP is involved in the selective termination of PI3K/Akt signaling pathways (Brognard et al. 2007), which could be activated by nicotine via the nicotinic acetylcholine receptors (Carlisle et al. 2007). PHLPP is located under a broad linkage peak for a smoking-related quantitative trait in an independent sample of EA and AA families (Li et al, 2008). A recent study showed that a mutation in the PH domain of PLEKHG5, another member of the PLEKHG family, causes lower motor neuron disease (Maystadt et al. 2007). According to the UniGene expression profile database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=unigene), PLEKHG1 and PHLPP are both expressed in the brain and peripheral nervous system. It is possible that variants or isoforms of these PH–domain-containing proteins have an impact on the cell signaling pathway that regulates neuronal plasticity, and thus could influence predisposition to ND.
The use of GWA is increasingly recognized as a promising approach to identify common genetic variants that contribute substantially to the risk of human disease (Risch and Merikangas 1996; Kruglyak 1999; Hirschhorn and Daly 2005; Christensen and Murray 2007), and there is an impressive list of robust associations for several complex disorders (The Wellcome Trust Case Control Consortium 2007). As discussed above, the results from our study that were strongest statistically also make sense biologically, which is encouraging. Nonetheless, highly significant genetic association findings for complex traits are not often replicated, and thus must be interpreted cautiously (Colhoun et al. 2003; Tabor et al. 2002). In response to this problem, an expert panel (Chanock et al. 2007) suggested several criteria for establishing replication of genetic associations, including: (1) replication studies should be conducted in independent data sets of sufficient sample size to distinguish convincingly the proposed effect from no effect; (2) the same or a very similar phenotype should be analyzed; (3) similar magnitude of effect and significance should be demonstrated with the same SNP or a SNP in high linkage disequilibrium with the prior SNP; and (4) a joint or combined analysis should lead to a smaller p-value than that seen in the original report. Two aspects of our study address these guidelines. First, because our results were obtained from family-based samples and by comparing allele transmission rates, they are unlikely to be caused by stratification within a population group. Second, our criteria for selecting SNPs or genes for further consideration included significant results in both population samples with the same pattern of association. Consistent results from independent samples of distinctive genetic background not only lessen the concern that the results are due to chance, but also increase the likelihood that the association is generalizable.
In addition, we took advantage of a rich dataset containing detailed information on dependence on several psychoactive drugs (for which diagnosis has been shown to be reliable), conducting a simultaneous search for potential candidate genes influencing several substance dependence traits. The benefit of a single large and well-characterized population was recently demonstrated in a GWA study of seven common diseases in a British population (The Wellcome Trust Case Control Consortium 2007). Similarly, our findings offer a set of candidates for future genetic studies of substance dependence traits.
To avoid high genotyping cost and multiple testing problems, GWA studies often follow a staged design, in which a large number of markers are genotyped in a portion of the sample in the first stage, and a relatively small number of markers showing association in the discovery dataset are genotyped in the remainder of the sample in the second stage. Association test findings in the second stage are usually considered to be a replication. However, in spite of the recommendations for stringent significance levels in the discovery sample, Skol et al. (2006) demonstrated that analysis of a single undivided dataset often has greater power to detect association than the two-stage design. Although our GWA study included two datasets derived from a single study population and thus appears to conform to the staged design, we treated the datasets as independent discovery samples since they are genetically distinct and thus may have some unique genetic associations with substance dependence (which we had to forgo identifying owing to our requirement for significance in each dataset individually). We capitalized instead on the opportunity to replicate findings within the discovery sample.
Our results should be interpreted cautiously in light of several limitations of our study design. First, we analyzed 5,240 SNPs, a number that is much smaller than in contemporary high-density GWA studies and insufficient to cover most gene regions. Many genes influencing risk for substance dependence traits were probably not detected because the SNP array panel in our study included SNPs from fewer than 10 percent of all known genes. Second, the FBAT approach is one of the most conservative methods for genetic association analysis and is less powerful than methods used in population-based designs due in part to families that are uninformative for the transmission component of the association test (Van Steen et al. 2005). Third, none of the results in our study would be considered significant after adjustment for multiple comparisons using a Bonferroni correction (threshold p = 0.05/5240 ≈ 0.00001). However, since all of the results proposed for follow-up required evidence for association in each data set, this correction is probably overly conservative. Moreover, given our requirement that a result attain a p-value of 0.05 in both population samples to be considered significant, the expected number of findings for a trait surpassing this threshold by chance would be approximately 6.5 (i.e., (0.05)² × 5240 × 0.5 ≈ 6.55) assuming a one-tailed test. Seven or more significant results were obtained with CD, OD and CIP. Finally, our selection criteria ignored potential true associations that are evident in only one population. Population-specific associations may account for lack of correspondence in the same dataset of the association signals reported here with linkage peaks for these traits, each of which was found in only one population (Gelernter et al. 2005; Gelernter et al. 2006a; Gelernter et al. 2007).
Given that the purpose of this study was hypothesis generation rather than hypothesis testing, the latter two concerns would be lessened by follow-up studies involving more detailed analysis of candidate genes and testing in additional populations.
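The multiple-testing arithmetic in the limitations above can be reproduced directly. The sketch below computes the Bonferroni threshold and the number of SNPs per trait expected by chance to reach p < 0.05 in both samples with the same direction of effect, and compares that expectation with the per-trait counts reported in the Results. It assumes independent markers and independent samples. The variable names are illustrative, and the CIP count of nine is inferred from the top SNP plus the "other eight" SNPs.

```python
# Values taken from the text.
n_snps = 5240
alpha = 0.05

# Bonferroni-corrected threshold for a single trait.
bonferroni_threshold = alpha / n_snps

# Chance expectation: p < alpha in both samples (alpha squared), multiplied by
# 0.5 for the requirement that the direction of effect agree.
expected_by_chance = alpha ** 2 * n_snps * 0.5

# SNPs per trait meeting both screening criteria, as reported in the Results.
observed = {"CD": 13, "OD": 8, "ND": 6, "FTND": 6, "AD": 3, "CIP": 9}

# Traits with more concordant SNPs than expected by chance.
above_expectation = sorted(t for t, n in observed.items() if n > expected_by_chance)

print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"Expected by chance per trait: {expected_by_chance:.2f}")
print(f"Traits above expectation: {above_expectation}")
```

Under these assumptions the expectation is about 6.55 SNPs per trait, so only CD, OD and CIP exceed it, and none by a wide margin. This is consistent with treating the results as hypotheses for follow-up rather than confirmed associations.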
In summary, our GWA study identified several novel candidate genes for six substance dependence traits in sets of families from two distinct populations. This illustrates the merits of a GWA approach using distinct population samples in the discovery (i.e., hypothesis generating) stage. The results of this approach will encourage future investigations of the identified associations using this and other datasets.
This work was supported by NIH (NIDA) grants R01 DA12690, R01 DA12849, K24 DA15105, K24 DA022288, and K24 AA013736. Dr. Kathleen Brady (Medical University of South Carolina) provided valuable patient recruitment and supervision at the MUSC site. We thank John Farrell for database management assistance. Genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number N01-HG-65403.