

HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Am J Med Genet B Neuropsychiatr Genet. Author manuscript; available in PMC 2014 February 12.
Published in final edited form as:
PMCID: PMC3922200

Pooled association genome scanning for alcohol dependence using 104,268 SNPs: Validation and use to identify alcoholism vulnerability loci in unrelated individuals from the Collaborative Study on the Genetics of Alcoholism


Association genome scanning can identify markers for the allelic variants that contribute to vulnerability to complex disorders, including alcohol dependence. To improve the power and feasibility of this approach, we report validation of “100k” microarray-based allelic frequency assessments in pooled DNA samples. We then use this approach with unrelated alcohol dependent vs control individuals sampled from pedigrees collected by the Collaborative Study on the Genetics of Alcoholism (COGA). Allele frequency differences between alcohol-dependent and control individuals are assessed in quadruplicate at 104,268 autosomal SNPs in pooled samples. One hundred eighty-eight SNPs provide 1) the largest allele frequency differences between dependent vs control individuals, 2) t values ≥ 3 for these differences and 3) clustering, so that 51 relatively small chromosomal regions contain at least three SNPs that satisfy criteria 1 and 2 above (Monte Carlo p=0.00034). These positive SNP clusters nominate interesting genes whose products are implicated in cellular signaling, gene regulation, development, “cell adhesion” and Mendelian disorders. The results converge with linkage and association results for alcohol and other addictive phenotypes. The data support polygenic contributions to vulnerability to alcohol dependence. These SNPs provide new tools to aid the understanding, prevention and treatment of alcohol abuse and dependence.


Substance abuse vulnerabilities are complex traits with strong genetic influences documented by family and twin studies (Cadoret and others 1986; Cadoret and others 1995; Goldberg and others 1993; Grove and others 1990; Gynther and others 1995; Kaprio and others 1982; Karkowski and others 2000; Kendler and others 1999; Kendler and Prescott 1998; Maes and others 1999; Merikangas and others 1998; Tsuang and others 1996; Tsuang and others 1998; Uhl 1999; Uhl and others 1995; Uhl and others 1997; Woodward and others 1996). Much of the genetic vulnerability to abuse of different legal and illegal addictive substances is shared; many abusers use multiple addictive substances (Karkowski and others 2000; Kendler and others 1999; Kendler and Prescott 1998; Tsuang and others 1999; Tsuang and others 1998). Identifying the allelic variants that contribute to vulnerability to alcohol dependence and comparing them to the variants that predispose to other addictions can improve understanding of human addictions and assist efforts to match vulnerable individuals with the prevention and treatment strategies most likely to work for them.

Association genome scanning can help determine which chromosomal regions and genes contain allelic variants that predispose to dependence on alcohol and other substances. This approach does not require family member participation, gains power as genomic marker densities increase (Cervino and Hill 2000; Risch and Merikangas 1996; Schork and others 2000; Sham and others 2000), identifies smaller chromosomal regions than linkage-based approaches, fosters pooling strategies that preserve confidentiality and reduce costs (Barcellos and others 1997; Germer and others 2000; Hacia and others 1999; Uhl and others 2001) and provides ample genomic controls that can minimize the chances of unintended ethnic mismatches between disease and control samples. We have used these approaches to assess allelic frequencies at 1494 and then 11,522 SNPs in unrelated control vs polysubstance abusing individuals who report dependence on at least one illegal substance (Liu and others 2005; Uhl and others 2001). SNPs that displayed nominally “reproducibly-positive” allele frequency differences between abusers and controls in both European- and African-American samples (Uhl and others 2001) cluster closer to each other and to positive markers from linkage studies of addictions than anticipated by chance (Uhl 2004; Uhl and others 2002). However, this density of SNP markers provides information about possible associations with addiction for only a modest number of the blocks of restricted haplotype diversity found in these subjects’ genomes. To our knowledge, association genome scanning has not yet been employed to study alcohol dependence.

Unrelated individuals sampled from pedigrees collected by the Collaborative Study on the Genetics of Alcoholism provide an interesting sample for this approach for several reasons. Dependence on alcohol and other substances has been carefully characterized in these individuals using validated instruments. Unrelated control individuals free from substance abuse or dependence diagnoses, largely individuals who marry into these pedigrees, are available. Linkage work with these pedigrees has identified a number of interesting loci (Bierut and others 2002; Reich and others 1998). In addition, Genetics Analysis Workshop data provides individual genotypes for over 14,000 SNPs for a subset of COGA individuals (

“100k” SNP microarrays (Centurion™, Affymetrix) use size-selected PCR products of genomic restriction fragments that have been ligated to universal linker sequences and amplified using single PCR primer pairs. Early access versions of these arrays allow assessment of 104,268 SNPs that can be localized to autosomes and display minor allele frequencies ≥ 2%. These arrays thus allow studies of many more SNP markers for more unrelated individuals than previously available. The data also overlap with genotypes obtained in some of these same individuals as part of the Genetics Analysis Workshop, providing a rich set of comparisons of individual vs pooled genotypes that help validate use of pooling with these samples.

We thus now report validation and use of pooled association genome scanning using 100k arrays hybridized with size-selected amplicons from end-ligated Xba I and Hind III genomic DNA restriction fragments of pooled genomic DNAs. DNAs come from unrelated COGA individuals who report 1) dependence on alcohol vs 2) control individuals free from any alcohol dependence, largely those who have married into these pedigrees. We use this approach to generate more than 29 million person/genotype equivalents, determined in quadruplicate. We discuss the convergence that these results provide with association and linkage studies for alcohol and other addictive phenotypes, the genetic architecture for alcohol dependence that the results support, the classes of candidate genes that they nominate for roles in human alcohol dependence and the implications of these findings for pooled association genome scanning approaches to complex genetic disorders.

Materials and Methods

Research volunteers

We searched COGA pedigrees to identify unrelated individuals who displayed phenotypes 2 – 3 (“pure unaffected” or “unaffected with some symptoms”) or 4 (“affected”, i.e. alcohol dependent). We identified 120 unrelated alcohol dependent individuals and 160 unrelated unaffected controls who self-reported European-American ethnicities. Information was available for Genetics Analysis Workshop (GAW) genotypes for 120 of these individuals. DNAs from these 120 individuals were placed into four of the control pools and two of the pools of alcohol dependent individuals. DNAs from other COGA subjects who were not included in GAW formed eight additional pools.

Genomic DNA

was prepared from lymphoblastoid cell lines (Coriell Institute), requantitated by spectrophotometry, PicoGreen and Hoechst dye fluorescence and diluted to 10 ng/μl. Validation studies compared: 1a) allelic determinations from individual CEPH DNAs vs 1b) results from pools (n = 2) of the same DNAs and 2a) allelic determinations from individual COGA DNAs vs 2b) results from pools (n = 20) of the same DNAs. Other validation studies examined pool-to-pool variation and test-retest variation for each pool tested on four different sets of microarrays.

Allelic frequencies in polysubstance abusers and controls

were compared using pools made by carefully combining equal amounts of DNA from 20 individuals of the same phenotypes. We used hybridization probes prepared from genomic DNA as described (Affymetrix Genechip Mapping Assay Manual) with precautions to avoid contamination. 50 ng of pooled genomic DNA was digested by Xba I or by Hind III, ligated to appropriate adaptors and amplified by PCR using 3 min 95°C hot start, 35 cycles of 20 sec 95°C/15 sec 59°C/15 sec 72°C and a final 7 min 72°C extension. PCR products were purified (MinElute 96 UF kits, Qiagen, Valencia, CA), digested for 30 min with 0.04 unit/μl DNase I to produce 30-200 bp fragments, end-labeled using terminal deoxynucleotidyl transferase and biotinylated dideoxynucleotides and hybridized to 100k arrays (Centurion, Affymetrix), which were stained and washed as described (Affymetrix Genechip Mapping Assay Manual) using immunopure streptavidin (Pierce, Milwaukee, WI), biotinylated antistreptavidin antibody (Vector Labs, Burlingame, CA) and R-phycoerythrin streptavidin (Molecular Probes, Eugene, OR). Arrays were scanned and fluorescence intensities quantitated using an Affymetrix array scanner as described (Uhl and others 2001).

Chromosomal positions for each SNP were sought using NCBI and NETAFFYX (Affymetrix) data. Allele frequencies for each SNP in each DNA pool were assessed based on hybridization intensity signals from four arrays, allowing assessment of hybridization to the 20 “perfect match” cells on each array that are complementary to the PCR products from alleles “A” and “B” for each diallelic SNP. Each array was analyzed as follows: 1) “Background” values, the average fluorescence intensity from the 5% of cells with the lowest values, were subtracted from the fluorescence intensity of every cell. 2) Background-subtracted values were normalized by division by the average value obtained from the 5% of cells with the highest values. 3) Normalized hybridization intensities from the 20 array cells that corresponded to the perfect match “A” and “B” cells for each SNP were averaged. 4) “A/B ratios” were determined by dividing average normalized A values by average normalized B values. 5) Arctangent transformations were applied to each ratio to aid combination of data from arrays hybridized and scanned on different days. 6) Average arctan values from the 4 replicates of each experiment were determined. 7) Means and standard deviations of average arctan values for each diagnostic group were calculated. 8) SNPs that met any of three criteria were eliminated from further analyses: i) SNPs with minor allele frequencies < 0.02, determined using Affymetrix data from analyses of European-American chromosomes; ii) SNPs on sex chromosomes and iii) SNPs whose chromosomal positions could not be adequately determined. 9) For the remaining 104,268 SNPs, mean arctan A/B ratios for abusers were divided by mean arctan A/B ratios for controls to form abuser/control ratios. 10) A “t” statistic for the differences between abusers and controls was generated using the formula:


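$$t = \frac{X_{abuser} - X_{control}}{\sqrt{\dfrac{\sigma^2_{abuser}}{n_{abuser}} + \dfrac{\sigma^2_{control}}{n_{control}}}}$$

where $X_{abuser}$ and $X_{control}$ are means of “arctan A/B” values for pools of the same diagnostic group, $n_{abuser}$ and $n_{control}$ are the numbers of pools in the corresponding diagnostic group and $\sigma^2$ is the variance of the mean of arctan A/B values for pools of the same diagnostic group.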
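The per-array steps and the t statistic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used for the analyses reported here: the function names are hypothetical, the "5% of cells" is rounded down with a floor of one cell, the sample variance (n − 1 denominator) stands in for σ², and the t statistic is assumed to take the usual form of a difference in pool means divided by the square root of the summed variance/n terms.

```python
import math
from statistics import mean, variance


def normalize_array(cells):
    """Steps 1-2: subtract background, then scale by the brightest cells."""
    k = max(1, len(cells) // 20)              # 5% of cells, at least one (assumption)
    background = mean(sorted(cells)[:k])      # mean of the lowest 5% of cells
    subtracted = [c - background for c in cells]
    top = mean(sorted(subtracted)[-k:])       # mean of the highest 5% of cells
    return [s / top for s in subtracted]


def arctan_ab(a_cells, b_cells):
    """Steps 3-5: average normalized A and B cells, form A/B, apply arctan."""
    # Assumes the mean normalized B intensity is nonzero.
    return math.atan(mean(a_cells) / mean(b_cells))


def t_statistic(abuser_pools, control_pools):
    """Step 10 (assumed form): difference in pool means over its standard error."""
    se = math.sqrt(variance(abuser_pools) / len(abuser_pools)
                   + variance(control_pools) / len(control_pools))
    return (mean(abuser_pools) - mean(control_pools)) / se
```

In this sketch, each value passed to `t_statistic` would be one pool's arctan A/B value for a given SNP, already averaged over that pool's four replicate arrays (step 6).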

Although there is no universally-accepted method for analyzing association genome scanning data, we used a preplanned analysis (for favorable mention of similar approaches see (Bansal 2001)). We identified SNPs with abuser/control ratios in the top or bottom 2.5% of all abuser vs control comparisons that also displayed t statistics ≥3 for the abuser vs control differences. We then sought evidence for clustering of these SNPs by focusing on chromosomal regions in which at least three of these outlier SNPs lay within 1 Mb of each other; we note that this somewhat arbitrary distance may or may not reflect the entire extent of long range linkage disequilibrium, which varies from chromosomal region to chromosomal region. We term these clustered, nominally-positive SNPs “clustered positive SNPs”, and focus our analyses on regions in which they lie (Table I).
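The selection and clustering rules in this paragraph can be illustrated with a short sketch. Several details are assumptions made for illustration only: the dictionary keys and function names are hypothetical, the t criterion is applied to the absolute value of t (so that SNPs in the bottom 2.5% of ratios can qualify), and "within 1 Mb of each other" is read as chaining neighboring SNPs on the same chromosome whose gaps do not exceed 1 Mb.

```python
def candidate_positive_snps(snps, tail=0.025, t_min=3.0):
    """Keep SNPs in the top or bottom `tail` of abuser/control ratios with |t| >= t_min.

    `snps` is a list of dicts with hypothetical keys 'ratio' and 't'.
    """
    ranked = sorted(snps, key=lambda s: s["ratio"])
    k = int(len(ranked) * tail)                     # SNPs in each tail
    outliers = ranked[:k] + ranked[len(ranked) - k:] if k else []
    return [s for s in outliers if abs(s["t"]) >= t_min]


def cluster_snps(positives, max_gap=1_000_000, min_size=3):
    """Group SNPs on the same chromosome into chains with gaps <= max_gap.

    `positives` is a list of dicts with hypothetical keys 'chrom' and 'pos'.
    Only chains with at least `min_size` members are returned.
    """
    by_chrom = {}
    for s in positives:
        by_chrom.setdefault(s["chrom"], []).append(s)

    clusters = []
    for chrom in sorted(by_chrom):
        current = []
        for s in sorted(by_chrom[chrom], key=lambda s: s["pos"]):
            if current and s["pos"] - current[-1]["pos"] > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []                        # start a new chain
            current.append(s)
        if len(current) >= min_size:
            clusters.append(current)
    return clusters
```

A stricter reading, in which every pair of SNPs in a cluster must lie within 1 Mb, would give smaller clusters; the chaining rule is used here only because it is the simplest to state.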


To seek convergence between current and other association data, we compared the locations of the current clustered positive SNPs with SNPs that met criteria for reproducibly-positive association in analyses of 1) European-American and African-American NIDA samples (Uhl and others 2001; Liu and others 2005; Liu, Uhl and others, in preparation) and 2) Japanese unrelated methamphetamine dependent and control individuals sampled from the Japanese Genetics Initiative on Drug Abuse (JGIDA) (Drgon et al, submitted).

We assessed the statistical power of our analyses using the program PS v2.1.31 (Dupont and Plummer 1990) with 1) the observed control or abuser pool-to-pool standard deviations from the current datasets, 2) the mean abuser/control differences for the SNPs that provided the largest abuser/control differences from the current datasets, 3) α = 0.05, 4) sample sizes from the current datasets and 5) abuser/control ratios from the current dataset.

Observed results were compared to those expected by chance using 100,000 Monte Carlo simulation trials that sampled from a Microsoft SQL Server database that contained the results from the current study: 14 pools × 4 arrays/pool × 20 perfect match cells/array/SNP × 104,268 SNPs = 116,780,160 cells (1120 cells/SNP) (see also Uhl and others 2001). For each of 100,000 simulation trials, a randomly-selected set of SNPs was chosen and the same procedure that had been followed for the actual data was run. The number of trials for which the results from the randomly-selected set of SNPs matched or exceeded the results actually observed from the SNPs identified in the current study was tabulated. Empirical p values were calculated by dividing the number of trials for which the observed results were matched or exceeded by the total number of Monte Carlo simulation trials performed. Since this method examines the properties of the SNPs in the current dataset, it should be relatively robust in the face of a number of features that include the uneven distribution of Affymetrix SNP markers across the genome.
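The empirical p value calculation described here can be sketched as follows. This is a generic illustration rather than the database-backed procedure used in this study: `empirical_p`, its parameters and the fixed random seed are hypothetical, and `statistic` stands for whatever summary (for example, the number of sampled SNPs that meet the criteria or fall in clusters) is being compared with the observed value.

```python
import random

# Size of the intensity dataset stated in the text:
# 14 pools x 4 arrays/pool x 20 perfect match cells/array/SNP x 104,268 SNPs.
TOTAL_CELLS = 14 * 4 * 20 * 104_268   # 116,780,160 cells (1120 per SNP)


def empirical_p(observed, statistic, population, sample_size,
                trials=100_000, seed=0):
    """Fraction of random trials whose statistic matches or exceeds `observed`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, sample_size)   # random set of SNPs
        if statistic(sample) >= observed:
            hits += 1
    return hits / trials
```

With 100,000 trials, the smallest nonzero p value this procedure can return is 0.00001, so a result in which no trial matches the observed value is reported as p < 0.00001.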

To provide insights into some of the genes that we nominate for further study because they might harbor variants that contribute to individual differences in addiction vulnerability, we sought identifiable candidate gene(s) for each cluster of positive SNPs. We selected candidate gene nominees when multiple clustered-positive SNPs lay 1) within the gene or 2) in 3’ or 5’ flanking sequences that were contained on a block of high restricted haplotype diversity along with exon sequences from that gene. We defined the blocks of high restricted haplotype diversity using Haploviewer and data from CEPH individuals. Clusters that did not identify genes that meet these criteria are annotated as “intergene” in Table 1. To assess the nominal false discovery rates for these genes, we obtained the joint false discovery rates for the clustered positive SNPs based on their individual q values, derived from the 104,268 t values and QVALUE software (Storey 2002; Storey and Tibshirani 2003). To provide one of several possible controls for the possibility that observed abuser-control differences might reflect occult stratification and correspondingly different allelic frequencies at the SNPs that display these abuser/control differences, we note the SNPs for which European-American vs African-American ethnicity difference scores from MNB/NIDA control individuals (Liu et al, submitted) lie in the outlying 2.5% of all such differences (Table 1).


Results

These arrays assessed 122,828 SNPs that were assigned to chromosomes 1-22. Of these, 104,268 SNPs displayed minor allele frequencies ≥ 2% in European-American samples (NETAFFYX), could be assigned reasonably accurate chromosomal locations and were thus used for subsequent analyses.

Pooled genotyping using 100k arrays displays features that support the validity of our results. Regression analyses examined the relationships between a) “observed allele ratios”, background-subtracted, normalized, arctangent transformed hybridization intensity ratio values obtained from six pools of COGA DNAs and b) “expected allele ratios” the fraction of A and B alleles obtained from individual genotypes obtained for 7393 SNPs from these same individuals in work performed for the Genetics Analysis Workshop using Affymetrix “10k” arrays. Pearson correlation coefficients were 0.91 – 0.92 for each of these pools (p<0.001 for each). There were 31 “quality control” SNPs that were assessed using both XbaI and HindIII arrays. Correlations between the intensity ratios for these SNPs yielded a Pearson correlation coefficient of 0.94.
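The comparison of pooled and individual genotype data can be illustrated with a small sketch. It assumes, for illustration only, that individual genotypes are coded as the strings "AA", "AB" and "BB", and the function names are hypothetical. The expected A-allele fraction for a pool is computed from the genotypes of its members, and a Pearson coefficient is then computed across SNPs between the observed pooled values and the expected fractions.

```python
import math


def expected_a_fraction(genotypes):
    """Fraction of A alleles among the individuals that make up one pool."""
    a_count = sum(g.count("A") for g in genotypes)
    return a_count / (2 * len(genotypes))       # two alleles per individual


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold one pool's observed arctan A/B values and `y` the expected A-allele fractions for the same SNPs. Because arctan(A/B) is a monotonic but nonlinear function of allele fraction, a coefficient somewhat below 1 would be expected even for noise-free pooled measurements.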

Abuser/control hybridization ratios for the 104,268 SNPs examined here fell into nearly-Gaussian distributions with mean values close to one (Fig. 1). There was modest variability of these assessments. Mean arctan A/B allele hybridization ratios for all SNPs assessed here +/− standard errors of the mean (SEM) for pool-to-pool differences were 0.79 +/− 0.028 for abusers and 0.79 +/− 0.024 for controls. SEMs for the four replicate arrays that assessed each sample were 0.036 and 0.034 for abusers and controls, respectively.

Figure 1
Main axes: Abuser/control ratios plotted against the chromosomal position of each SNP for COGA alcohol dependent and control individuals. The positions of the SNPs whose data yield outlier abuser/control values are indicated by larger symbols.

For analyses, we selected 1) the 5216 candidate positive markers that represented the 2.5% of SNPs with the greatest and the 2.5% of SNPs with the smallest abuser/control ratios and 2) the 1474 SNPs for which abuser/control differences yielded t values ≥ 3. Six hundred sixty-seven SNPs satisfied both of these criteria; we note that these two criteria are neither totally dependent on nor totally independent of each other, and we term the SNPs that satisfy both criteria “candidate positive SNPs”. Chance findings of 667 SNPs that satisfy both criteria are rare. We performed 100,000 Monte Carlo simulations, each of which sampled a random set of 5216 SNPs. None of these simulation trials identified as many as 667 randomly-selected SNPs that shared the properties (abuser/control differences and t values for these abuser/control differences) found in the true results of these experiments, yielding Monte Carlo p < 0.00001.

These candidate positive SNPs clustered together in ways that would also not be expected by chance, although such clustering would be expected if they identified loci that contain allelic variants that distinguished alcohol-dependent subjects from control subjects. Three hundred sixty two of these 667 candidate positive SNPs lay in 138 clusters in which they were positioned within 1Mb of at least one other candidate positive SNP (p=0.03715). One hundred eighty-eight of these candidate positive SNPs lay in 51 clusters in which at least three positive SNPs met the same criterion (p=0.00034). Thus when we performed 100,000 Monte Carlo simulation trials in each of which a random set of 667 SNPs was sampled, only 34 such trials involved 188 or more SNPs in such clusters.

We focus our subsequent analyses on these 51 clusters of candidate positive SNPs (Table I). We identify candidate gene nominees for many of these 51 clusters of positive SNPs based on information from Mapviewer, HapMap and Unigene and the criteria noted in Methods.

The clustered-positive results from this dataset can be compared with results from other association and linkage results for addictions. Fourteen of the two-SNP clusters and four of the three-SNP clusters from the present work lie within 1 Mb of at least one of the clustered positive results obtained from both NIDA European-American and NIDA African-American polysubstance abusers who report dependence on at least one illegal addictive substance (p≤0.00001 for both comparisons). These results provide additional support for positive SNP clusters 9, 11, 32 and 38 in the current work. Ten of the two-SNP clusters and five of the three-SNP clusters from a study of methamphetamine-dependent Japanese individuals vs controls also lie within 1 Mb of at least one of the current clusters (p<0.00001), providing additional support for clusters 11, 20, 22, 31 and 49 from the current work. Of the 26 genes that are identified here by multiple clustered positive SNPs, LRP1B, AIP1, CDH13, LRRTM4, CSMD1, PCSK5, CSMD2, GPR154 and DGKB also contain clustered positive results from at least one other addiction association genome scanning sample. NMUR2 is also identified by clustered positive results from other samples. This level of replication is especially remarkable since these convergences were sought for samples from different ethnic backgrounds and different addictions. Such a level of replication is consistent with false discovery rate calculations for the 51 loci, which range from 0.06 for joint false discovery rates for the eight positive SNP cluster to ca 0.33 for the 3 positive SNP clusters. Such a level of replication is also consistent with simulation-based Monte Carlo p values for each of these loci. Each of 100,000 trials selected a genomic segment that started with the beginning of a randomly-selected annotated gene, continued 3’ for the same number of bases as that identified by the positive cluster and added an additional 1 Mb at either end of the segment. For each trial, the genomic segment was assessed to identify whether a cluster of positive SNPs with the same properties identified in the true dataset lay in the region. These studies appear to yield p values that range from 0.001 to 0.03, uncorrected for the 51 multiple comparisons (CJ, GRU et al, in preparation).

Based on the number of pools assessed here and the pool-to-pool variability actually observed in these experiments (SEMs for four replicate arrays 0.03; mean +/− SEMs for pool-to-pool variation 0.62 +/− 0.02), we calculate 0.9 power to detect abuser/control allele frequency differences of 0.05.
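The general logic of such a power calculation can be sketched with a normal approximation to a two-sided, two-sample comparison of pool means. This is a swapped-in simplification, not the method implemented in the PS program, and it is not intended to reproduce the power value reported here: with only a handful of pools per group, a normal approximation will overstate power relative to a t-based calculation. The function name and all parameters are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(delta, sd_abuser, sd_control, n_abuser, n_control, alpha=0.05):
    """Approximate two-sided power to detect a true mean difference `delta`.

    sd_* are pool-to-pool standard deviations; n_* are numbers of pools.
    """
    se = math.sqrt(sd_abuser ** 2 / n_abuser + sd_control ** 2 / n_control)
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)          # two-sided critical value
    z = abs(delta) / se                         # standardized true difference
    # Probability that the test statistic falls in either rejection region.
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)
```

When `delta` is zero the function returns `alpha`, as expected for a test with the stated type I error rate, and power rises toward 1 as the difference grows relative to its standard error.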


Discussion

The results of this study support the idea that array-based pooled association genome scanning approaches can identify chromosomal regions likely to contain allelic variants that differ in frequencies between alcohol dependent and control individuals. This current identification of such frequency differences in alcohol dependent individuals provides the first genome-wide association-based assessment for genomic loci likely to contain variants that contribute to dependence on alcohol. We discuss the strengths and possible limitations of these results, the ways in which they converge with results of previous association- and linkage-based studies and the classes of genes that they nominate to play roles in human vulnerabilities to alcohol dependence.

The reliability and validity of the current approach are supported by data that document the reliability of clinical assessments made by multiple observers and the extent to which the markers that display nominally-positive differences between abusers and controls cluster together in specific chromosomal regions in these samples. We and others have also provided extensive evidence for the reliability and validity of pooling approaches using related microarray-based assays (Butcher and others 2004; Liu and others 2005; Drgon et al, in preparation). Correlations between the current data and preliminary data obtained using the same samples and “10k” Affymetrix arrays were 0.98 for the overlapping SNPs that displayed outlier abuser/control values (CJ, QRL, TD, GRU et al, unpublished observations, 2004).

Modeling studies support significant power for the current methods and also support the likelihood of both false-positive and false-negative results. Power calculations support 0.9 power to detect 5% allele frequency differences in the current experiments. Despite the relatively high marker density used in this report, however, there are still likely to be haplotype blocks that contain vulnerability-modifying alleles but do not contain three SNPs that are assessed in this report. Such blocks could thus provide false negative results in these studies. False positives are also likely, since we make many comparisons in this study. Simulation studies suggest a very low likelihood that all of the clustered positive results displayed here represent false positives. However, false positive results are still likely even among the clustered positive SNPs. False discovery rates lie between 0.06 and ca 0.33 for the different clusters. Many, but not all, of these findings are supported by positive results from association genome scans of different addictions studied in different populations.

Since the COGA sample was not collected for association studies, it is possible that there might be occult stratification within these sample sets. We have examined this by comparing mean differences in arctangent transformed A/B allelic ratios between European- and African-American samples collected in Baltimore, Maryland (QR Liu, GRU et al, in preparation) for the SNPs that are 5% outliers among the Affymetrix SNPs from the current dataset with the corresponding data for all Affymetrix SNPs. The average normalized allele frequency ethnic difference for the SNPs that displayed outlier alcoholic/control values was 0.154. The average normalized ethnic difference for all of the SNPs represented in the current dataset was 0.147. There was thus no evidence for overall stratification.

These results support the possibility that careful evaluation of associations within unrelated members of samples collected for linkage may be possible with the relatively high marker densities provided by SNP methods, given the genomic controls that these high SNP densities can also provide. These data and their convergence with prior results continue to provide support for the idea that common allelic variants contribute to human vulnerability to abuse of addictive substances. Finding a number of SNPs with substantial abuser-control differences near markers previously linked to alcoholism in this same dataset supports the idea that further fine mapping studies using association approaches in these samples, as well as others, might help to better define the specific genes, haplotypes and gene variants that contribute to previously-observed linkage signals in these datasets.

When we assess the extent to which the SNPs that display outlier abuser/control values also display outlier t values, the observed results are found rarely by chance in simulation studies. When we examine the degree to which these nominally-positive SNPs cluster together in groups of three or more on modest-sized chromosomal regions we also observe striking departures from chance values. Forty two of the 51 clusters identified here contain positive SNPs from both Xba I and Hind III arrays. Twenty-two of the 51 clusters identified here receive at least some support from another linkage or association study; others also receive support from candidate gene studies (see below).

Assessing convergence of the current data with results of linkage analyses identifies sixteen simple sequence length polymorphism (SSLP) markers that were previously linked to alcohol phenotypes with nominal statistical significance and that lie within +/− 5 Mb of clustered positive results from the current study. Seven of these markers lie near linked markers from analyses of alcohol dependence in COGA pedigrees (Reich and others 1998); four from linkage analyses of alcohol dependence in Southwest Indians (Long and others 1998) and five from linkage analyses of alcohol quantity/frequency phenotypes in data from the Framingham study (Bergen and others 2003; Ma and others 2003). Each of these observations provides an additional level of support for the validity of the observed clustering.
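A proximity check of this kind can be sketched as a simple interval test. The representation below is hypothetical: markers are given as (chromosome, position) pairs, clusters as (chromosome, start, end) triples, and the 5 Mb window is applied to both ends of each cluster. The same function with a 1 Mb window would correspond to the comparisons with other association datasets.

```python
def markers_near_clusters(markers, clusters, window=5_000_000):
    """Return the markers that lie within `window` bases of any cluster.

    markers:  iterable of (chrom, pos)
    clusters: iterable of (chrom, start, end), with start <= end
    """
    near = []
    for chrom, pos in markers:
        if any(c == chrom and start - window <= pos <= end + window
               for c, start, end in clusters):
            near.append((chrom, pos))
    return near
```

Whether the number of nearby markers exceeds chance expectation would still need to be judged against randomly placed segments, as in the Monte Carlo comparisons described elsewhere in this report.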

Interesting candidate gene nominees lie near many of the clustered positive markers identified in this work. Cell signaling molecule genes that lie near reproducibly-positive SNPs (Table 1) include those that signal within cells and between cells. Peptide signaling is implicated. Clustered positive SNPs lie just in the 5′ flank of the GPR154 G protein-coupled receptor 154 that has been characterized as the receptor for neuropeptide S (Xu and others 2004). Several SNPs lie in 5’ and 3’ flanks of the AGTR1 angiotensin II receptor, type 1 gene. Clustered positive SNPs also flank and/or lie within enzymes that function to convert propeptides to biologically active peptides, including CPE carboxypeptidase E and proprotein convertase subtilisin/kexin type 5 (PCSK5).

Intracellular signaling with several different second messenger systems is implicated. Phospholipid signaling pathways could be altered by variations in several genes that lie near clustered positive SNPs. Positive SNPs cluster in the 5’ flank and within the DGKB diacylglycerol kinase, beta 90kDa gene, the gene that encodes the ITPR2 inositol 1,4,5-triphosphate receptor, type 2 and the MAP3K7 mitogen-activated protein kinase kinase kinase 7 gene. Other phosphorylation patterns could well be altered by differences in the activities of the genes that encode the WW and PDZ domain containing BAIAP1/MAGI1 membrane associated guanylate kinase and the anchor protein for AKAP1A kinase (PRKA).

Channels are implicated by these results. The KCNK2 potassium channel, subfamily K, member 2 is implicated by multiple positive SNPs.

Gene regulatory and/or developmental genes lie near reproducibly-positive SNPs. The ephrin EFNA5 gene’s 5’ flank contains positive SNPs that support roles for variations in this single transmembrane domain receptor protein kinase in addiction vulnerability. The DAB1 disabled homolog 1, DOCK2 dedicator of cytokinesis 2, CSMD1 CUB and Sushi multiple domains 1, SESTD1 SEC14 and spectrin domains 1, ZNF533 zinc finger protein 533 and the MSH3 mutS homolog 3 (E. coli) genes each contain multiple clustered positive SNPs. The 3’ flank of the MSI2 musashi homolog 2 contains multiple positive SNPs. Each of these genes’ products could alter brain developmental and/or adult form and function with consequences for addiction vulnerability.

The atrophin-1 interacting protein 1 (AIP1) gene is a disease-related gene that lies near clustered positive SNPs from the current dataset and reproducibly positive SNPs in studies of African- and European-American polysubstance abusers vs controls (see Liu and others 2005). This gene (Wood and others 1998) is expressed largely in brain where it interacts with proteins including atrophin, the protein in which trinucleotide repeat expansions cause dentatorubral and pallidoluysian atrophy.

We have identified clustered positive SNPs near the genes that encode documented or suspected cell adhesion molecules and their possible ligands. These genes include the LRP1B low density lipoprotein-related protein, cadherin 11 and cadherin 13 genes. Cadherin 13 is expressed in neurons and is abundant in interesting brain regions including amygdala. We have previously identified clustered positive SNPs in the cadherin 13 gene in comparisons of methamphetamine abusers with controls (Drgon et al, submitted). A number of SNPs are 3′ to the sequences currently annotated as the LRRTM4 leucine rich repeat transmembrane neuronal 4 gene. These SNPs lie near ESTs that derive from brain and seem likely to signal previously-unelucidated, more 3′ portions of this gene. These data add to previous nomination and/or confirmation of addiction-associated variants in cell adhesion molecules including neurexin 3 (Liu and others 2005), NrCAM (Ishiguro and others 2005), PTPRB (Ishiguro et al, submitted), the minor histocompatibility antigen HB-1 (Liu and others 2005), multimerin 1 (Drgon et al, submitted), ADAM23 (Drgon et al, submitted), the FAT tumor suppressor homolog 3 and the Down syndrome cell adhesion molecule (Drgon et al, submitted).

Clustered positive SNPs also lie near genes with other diverse cellular functions. DLAD DNase II-like acid DNase, MYR8 myosin heavy chain Myr 8 and the C14orf31 that encodes the FRMD6 FERM domain containing 6 protein each contain multiple clustered positive SNPs. In addition, clustered positive SNPs lie near genes that encode proteins of unknown function, including a number of hypothetical proteins (Table 1).

While these data nominate interesting genes, only confirmation in multiple datasets, in ongoing and future studies, will link each of them securely to addiction vulnerability. In preliminary results from higher density genome scanning studies of at least three additional samples, several of these genes receive substantial support (Table 1, TD, QRL, CJ, GRU and others in preparation). Nevertheless, the current data provide support for loci nominated in prior SNP association and linkage-based studies and identify new chromosomal regions with clustered positive SNPs and interesting genes. They provide a set of genomic markers in these 51 chromosomal regions that should be useful in subsequent studies of alcohol abusers. As we identify more and more of the allelic variants that contribute to vulnerability to abuse of alcohol and other substances, we will be better able to understand addictions themselves.
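
The clustering criterion that underlies these chromosomal regions (at least three positive SNPs lying close together) can be illustrated with a minimal Python sketch. It is not the analysis code used in this study. The function name `find_clusters`, the `max_gap` distance of 100 kb and the toy positions are hypothetical values chosen only for illustration; the input is assumed to contain only SNPs that have already passed the allele frequency difference and t ≥ 3 filters.

```python
from collections import defaultdict

def find_clusters(positive_snps, max_gap=100_000, min_snps=3):
    """Group already-filtered positive SNPs into positional clusters.

    positive_snps: iterable of (chromosome, position_bp) pairs.
    max_gap: hypothetical maximum distance in bp between neighboring
             positive SNPs within one cluster (an assumption, not a
             value taken from the paper).
    min_snps: minimum number of positive SNPs required per cluster.
    Returns a list of (chromosome, start_bp, end_bp, n_snps) tuples.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in positive_snps:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                # Close enough to the previous SNP: extend the run.
                current.append(pos)
            else:
                # Gap too large: close the run, keep it if big enough.
                if len(current) >= min_snps:
                    clusters.append((chrom, current[0], current[-1], len(current)))
                current = [pos]
        if len(current) >= min_snps:
            clusters.append((chrom, current[0], current[-1], len(current)))
    return clusters

# Toy positions for illustration only; these are not study data.
toy_snps = [
    ("5", 1_000_000), ("5", 1_040_000), ("5", 1_090_000),  # three nearby SNPs
    ("5", 9_000_000),                                      # isolated SNP
    ("16", 500_000), ("16", 560_000),                      # only two SNPs
]
print(find_clusters(toy_snps))
```

With these toy inputs only the three nearby chromosome 5 SNPs form a cluster; the isolated SNP and the pair on chromosome 16 do not meet the three-SNP minimum. A gap-based rule is one simple choice; a fixed-width sliding window would be another.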

Supplementary axis (right of main axis): SNPs for which abuser/control differences display t values ≥ 3. Red dots designate clustered positive SNPs that display outlier abuser/control differences and t values. Scale: chromosomal positions based on NCBI Map Viewer coordinates and supplemental data from NETAFFYX. The vertical bar represents 25 Mb.


We acknowledge financial support from NIDA-IRP and passionate statistical discussions and help from Dr. Daniel Naiman, Department of Mathematical Sciences, Johns Hopkins University. For assistance in obtaining these datasets and samples, we are especially grateful to the Genetics Analysis Workshop, NIAAA and COGA investigators, including PI: H. Begleiter, co-PIs L. Bierut, H. Edenberg, V. Hesselbrock and B. Porjesz; University of Connecticut (V. Hesselbrock); Indiana University (H. Edenberg, J. Nurnberger Jr., P.M. Conneally, T. Foroud); University of Iowa (S. Kuperman, R. Crowe); SUNY Downstate Medical Center (B. Porjesz, H. Begleiter); Washington University in St. Louis (L. Bierut, A. Goate, J. Rice); University of California at San Diego (M. Schuckit); Howard University (R. Taylor); Rutgers University (J. Tischfield); and Southwest Foundation (L. Almasy) and Zhaoxia Ren as NIAAA staff collaborator. We acknowledge support for sample and data collection and storage from U10AA008401 (NIAAA and NIDA). COGA investigators especially acknowledge the fundamental scientific contributions of the late Theodore Reich, M.D., Co-Principal Investigator of COGA from its inception and a founder of modern psychiatric genetics.

The authors acknowledge financial support from the intramural research program of the NIH, NIDA, DHHS and assistance from NIAAA, the Genetics Analysis Workgroup and members of the Collaborative Study on the Genetics of Alcoholism.


  • Bansal A. Trends in reporting of SNP associations. Lancet. 2001;358(9298):2016. [PubMed]
  • Barcellos LF, Klitz W, Field LL, Tobias R, Bowcock AM, Wilson R, Nelson MP, Nagatomi J, Thomson G. Association mapping of disease loci, by use of a pooled DNA genomic screen. Am J Hum Genet. 1997;61(3):734–47. [PubMed]
  • Bergen AW, Yang XR, Bai Y, Beerman MB, Goldstein AM, Goldin LR. Genomic regions linked to alcohol consumption in the Framingham Heart Study. BMC Genet. 2003;4(Suppl 1):S101. [PMC free article] [PubMed]
  • Bierut LJ, Saccone NL, Rice JP, Goate A, Foroud T, Edenberg H, Almasy L, Conneally PM, Crowe R, Hesselbrock V, and others. Defining alcohol-related phenotypes in humans. The Collaborative Study on the Genetics of Alcoholism. Alcohol Res Health. 2002;26(3):208–13. [PubMed]
  • Butcher LM, Meaburn E, Liu L, Fernandes C, Hill L, Al-Chalabi A, Plomin R, Schalkwyk L, Craig IW. Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav Genet. 2004;34(5):549–55. [PubMed]
  • Cadoret RJ, Troughton E, O’Gorman TW, Heywood E. An adoption study of genetic and environmental factors in drug abuse. Arch Gen Psychiatry. 1986;43(12):1131–6. [PubMed]
  • Cadoret RJ, Yates WR, Troughton E, Woodworth G, Stewart MA. Adoption study demonstrating two genetic pathways to drug abuse. Arch Gen Psychiatry. 1995;52(1):42–52. [PubMed]
  • Cervino AC, Hill AV. Comparison of tests for association and linkage in incomplete families. Am J Hum Genet. 2000;67(1):120–32. [PubMed]
  • Dupont WD, Plummer WD Jr. Power and sample size calculations. A review and computer program. Control Clin Trials. 1990;11(2):116–28. [PubMed]
  • Germer S, Holland MJ, Higuchi R. High-throughput SNP allele-frequency determination in pooled DNA samples by kinetic PCR. Genome Res. 2000;10(2):258–66. [PubMed]
  • Goldberg J, Henderson WG, Eisen SA, True W, Ramakrishnan V, Lyons MJ, Tsuang MT. A strategy for assembling samples of adult twin pairs in the United States. Stat Med. 1993;12(18):1693–702. [PubMed]
  • Grove WM, Eckert ED, Heston L, Bouchard TJ Jr, Segal N, Lykken DT. Heritability of substance abuse and antisocial behavior: a study of monozygotic twins reared apart. Biol Psychiatry. 1990;27(12):1293–304. [PubMed]
  • Gynther LM, Carey G, Gottesman II, Vogler GP. A twin study of non-alcohol substance abuse. Psychiatry Res. 1995;56(3):213–20. [PubMed]
  • Hacia JG, Fan JB, Ryder O, Jin L, Edgemon K, Ghandour G, Mayer RA, Sun B, Hsie L, Robbins CM, and others. Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays. Nat Genet. 1999;22(2):164–7. [PubMed]
  • Ishiguro H, Liu QR, Gong JP, Hall FS, Ujike H, Morales M, Sakurai T, Grumet M, Uhl GR. NrCAM in Addiction Vulnerability: Positional Cloning, Drug-Regulation, Haplotype-Specific Expression, and Altered Drug Reward in Knockout Mice. Neuropsychopharmacology. 2005. [PubMed]
  • Kaprio J, Hammar N, Koskenvuo M, Floderus-Myrhed B, Langinvainio H, Sarna S. Cigarette smoking and alcohol use in Finland and Sweden: a cross-national twin study. Int J Epidemiol. 1982;11(4):378–86. [PubMed]
  • Karkowski LM, Prescott CA, Kendler KS. Multivariate assessment of factors influencing illicit substance use in twins from female-female pairs. Am J Med Genet. 2000;96(5):665–70. [PubMed]
  • Kendler KS, Karkowski LM, Corey LA, Prescott CA, Neale MC. Genetic and environmental risk factors in the aetiology of illicit drug initiation and subsequent misuse in women. Br J Psychiatry. 1999;175:351–6. [PubMed]
  • Kendler KS, Prescott CA. Cocaine use, abuse and dependence in a population-based sample of female twins. Br J Psychiatry. 1998;173:345–50. [PubMed]
  • Liu QR, Drgon T, Walther D, Johnson C, Poleskaya O, Hess J, Uhl GR. Pooled association genome scanning: Validation and use to identify addiction vulnerability loci in two samples. Proc Natl Acad Sci U S A. 2005;102(33):11864–9. [PubMed]
  • Long JC, Knowler WC, Hanson RL, Robin RW, Urbanek M, Moore E, Bennett PH, Goldman D. Evidence for genetic linkage to alcohol dependence on chromosomes 4 and 11 from an autosome-wide scan in an American Indian population. Am J Med Genet. 1998;81(3):216–21. [PubMed]
  • Ma JZ, Zhang D, Dupont RT, Dockter M, Elston RC, Li MD. Mapping susceptibility loci for alcohol consumption using number of grams of alcohol consumed per day as a phenotype measure. BMC Genet. 2003;4(Suppl 1):S104. [PMC free article] [PubMed]
  • Maes HH, Woodard CE, Murrelle L, Meyer JM, Silberg JL, Hewitt JK, Rutter M, Simonoff E, Pickles A, Carbonneau R, and others. Tobacco, alcohol and drug use in eight- to sixteen-year-old twins: the Virginia Twin Study of Adolescent Behavioral Development. J Stud Alcohol. 1999;60(3):293–305. [PubMed]
  • Merikangas KR, Stolar M, Stevens DE, Goulet J, Preisig MA, Fenton B, Zhang H, O’Malley SS, Rounsaville BJ. Familial transmission of substance use disorders. Arch Gen Psychiatry. 1998;55(11):973–9. [PubMed]
  • Reich T, Edenberg HJ, Goate A, Williams JT, Rice JP, Van Eerdewegh P, Foroud T, Hesselbrock V, Schuckit MA, Bucholz K, and others. Genome-wide search for genes affecting the risk for alcohol dependence. Am J Med Genet. 1998;81(3):207–15. [PubMed]
  • Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273(5281):1516–7. [PubMed]
  • Schork NJ, Nath SK, Fallin D, Chakravarti A. Linkage disequilibrium analysis of biallelic DNA markers, human quantitative trait loci, and threshold-defined case and control subjects. Am J Hum Genet. 2000;67(5):1208–18. [PubMed]
  • Sham PC, Cherny SS, Purcell S, Hewitt JK. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet. 2000;66(5):1616–30. [PubMed]
  • Storey JD. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B. 2002;64:479–98.
  • Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100(16):9440–5. [PubMed]
  • Tsuang MT, Lyons MJ, Eisen SA, Goldberg J, True W, Lin N, Meyer JM, Toomey R, Faraone SV, Eaves L. Genetic influences on DSM-III-R drug abuse and dependence: a study of 3,372 twin pairs. Am J Med Genet. 1996;67(5):473–7. [PubMed]
  • Tsuang MT, Lyons MJ, Harley RM, Xian H, Eisen S, Goldberg J, True WR, Faraone SV. Genetic and environmental influences on transitions in drug use. Behav Genet. 1999;29(6):473–9. [PubMed]
  • Tsuang MT, Lyons MJ, Meyer JM, Doyle T, Eisen SA, Goldberg J, True W, Lin N, Toomey R, Eaves L. Co-occurrence of abuse of different drugs in men: the role of drug-specific and shared vulnerabilities. Arch Gen Psychiatry. 1998;55(11):967–72. [PubMed]
  • Uhl GR. Molecular genetics of substance abuse vulnerability: a current approach. Neuropsychopharmacology. 1999;20(1):3–9. [PubMed]
  • Uhl GR. Molecular genetic underpinnings of human substance abuse vulnerability: likely contributions to understanding addiction as a mnemonic process. Neuropharmacology. 2004;47(Suppl 1):140–7. [PubMed]
  • Uhl GR, Elmer GI, Labuda MC, Pickens RW. Genetic influences in drug abuse. In: Bloom FE, Kupfer DJ, editors. Psychopharmacology: The Fourth Generation of Progress. Raven Press; New York: 1995. pp. 1793–2783.
  • Uhl GR, Gold LH, Risch N. Genetic analyses of complex behavioral disorders. Proc Natl Acad Sci U S A. 1997;94(7):2785–6. [PubMed]
  • Uhl GR, Liu QR, Naiman D. Substance abuse vulnerability loci: converging genome scanning data. Trends Genet. 2002;18(8):420–5. [PubMed]
  • Uhl GR, Liu QR, Walther D, Hess J, Naiman D. Polysubstance abuse-vulnerability genes: genome scans for association, using 1,004 subjects and 1,494 single-nucleotide polymorphisms. Am J Hum Genet. 2001;69(6):1290–300. [PubMed]
  • Wood JD, Yuan J, Margolis RL, Colomer V, Duan K, Kushi J, Kaminsky Z, Kleiderlein JJ, Sharp AH, Ross CA. Atrophin-1, the DRPLA gene product, interacts with two families of WW domain-containing proteins. Mol Cell Neurosci. 1998;11(3):149–60. [PubMed]
  • Woodward CE, Maes HH, Silberg JL, Meyer JM, Eaves LJ. Tobacco, alcohol and drug use in 8-16 year old twins. NIDA Res Monograph. 1996;162:309.
  • Xu YL, Reinscheid RK, Huitron-Resendiz S, Clark SD, Wang Z, Lin SH, Brucher FA, Zeng J, Ly NK, Henriksen SJ, and others. Neuropeptide S: a neuropeptide promoting arousal and anxiolytic-like effects. Neuron. 2004;43(4):487–97. [PubMed]