Results 1-25 (28)

1.  BioQ: tracing experimental origins in public genomic databases using a novel data provenance model 
Bioinformatics  2012;28(8):1189-1191.
Motivation: Public genomic databases, which are often used to guide genetic studies of human disease, are now being applied to genomic medicine through in silico integrative genomics. These databases, however, often lack tools for systematically determining the experimental origins of the data.
Results: We introduce a new data provenance model that we have implemented in a public web application, BioQ, for assessing the reliability of the data by systematically tracing its experimental origins to the original subjects and biologics. BioQ allows investigators to both visualize data provenance as well as explore individual elements of experimental process flow using precise tools for detailed data exploration and documentation. It includes a number of human genetic variation databases such as the HapMap and 1000 Genomes projects.
Availability and implementation: BioQ is freely available to the public at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3324523  PMID: 22426342
2.  Chromosome 20 Shows Linkage With DSM-IV Nicotine Dependence in Finnish Adult Smokers 
Nicotine & Tobacco Research  2011;14(2):153-160.
Chromosome 20 has previously been associated with nicotine dependence (ND) and smoking cessation. Our aim was to replicate and extend these findings.
First, a total of 759 subjects belonging to 206 Finnish families were genotyped with 18 microsatellite markers residing on chromosome 20, in order to replicate previous linkage findings. Then, the replication data were combined with existing whole-genome linkage data, resulting in a total of 1,302 genotyped subjects from 357 families. ND diagnosed by DSM-IV criteria, the Fagerström Test for Nicotine Dependence (FTND) score, and the lifetime maximum number of cigarettes smoked within a 24-hr period (MaxCigs24) were used as phenotypes in the nonparametric linkage analyses.
We replicated previously reported linkage to DSM-IV ND, with a maximum logarithm of odds (LOD) score of 3.8 on 20p11, with females contributing more (maximum LOD [MLOD] score 3.4 on 20q11) than males (MLOD score 2.6 on 20p11). With the combined sample, a suggestive LOD score of 2.3 was observed for DSM-IV ND on 20p11. Sex-specific analyses revealed that the signal was driven by females with a maximum LOD score of 3.3 (on 20q11) versus LOD score of 1.3 in males (on 20q13) in the combined sample. No significant linkage signals were obtained for FTND or MaxCigs24.
Our results provide further evidence that chromosome 20 harbors genetic variants influencing ND in adult smokers.
PMCID: PMC3265743  PMID: 22039074
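To give a rough sense of scale for the LOD scores reported in the record above, the sketch below converts a LOD score into an approximate pointwise p-value. It assumes the usual large-sample approximation in which 2·ln(10)·LOD behaves like a chi-square statistic with 1 degree of freedom tested one-sidedly; the function name is hypothetical, and the authors' own significance assessment may differ (for example, it may rely on simulation).

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumption: 2 * ln(10) * LOD is asymptotically chi-square with 1 df,
    and the test is one-sided, so the upper tail is halved.
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative for this approximation")
    chi_square = 2.0 * math.log(10.0) * lod
    # Upper tail of a chi-square(1 df) variable: P(X > x) = erfc(sqrt(x / 2))
    upper_tail = math.erfc(math.sqrt(chi_square / 2.0))
    return 0.5 * upper_tail


if __name__ == "__main__":
    # LOD values taken from the abstract; the p-values are only approximations.
    for lod in (1.3, 2.3, 3.3, 3.8):
        print(f"LOD {lod:.1f} -> pointwise p ~ {lod_to_pointwise_p(lod):.2e}")
```

These are pointwise values only; genome-wide significance additionally has to account for the number of positions tested.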
3.  A ν-support vector regression based approach for predicting imputation quality 
BMC Proceedings  2012;6(Suppl 7):S3.
Decades of genome-wide association studies (GWAS) have accumulated large volumes of genomic data that can potentially be reused to increase the statistical power of new studies, but different genotyping platforms with different marker sets have been used as biotechnology has evolved, preventing pooling and comparability of old and new data. For example, to pool data collected by 550K chips with newer data collected by 900K chips, we need to impute missing loci. Many imputation algorithms have been developed, but the posterior probabilities estimated by those algorithms are not a reliable measure of the quality of the imputation. Recently, many studies have used an imputation quality score (IQS) to measure the quality of imputation. Estimating the IQS requires knowing the true alleles. Only when the population and the imputation loci are identical can we reuse the estimated IQS when the true alleles are unknown.
Here, we present a regression model to estimate IQS that learns from imputation of loci with known alleles. We designed a small set of features, such as minor allele frequencies, distance to the nearest known cross-over hotspot, etc., for the prediction of IQS. We evaluated our regression models by estimating IQS of imputations by BEAGLE for a set of GWAS data from the NCBI GEO database collected from samples from different ethnic populations.
We construct a ν-SVR based approach as our regression model. Our evaluation shows that this regression model can accomplish mean square errors of less than 0.02 and a correlation coefficient close to 0.75 in different imputation scenarios. We also show how the regression results can help remove false positives in association studies.
Reliable estimation of IQS will facilitate integration and reuse of existing genomic data for meta-analysis and secondary analysis. Experiments show that it is possible to use a small number of features to regress the IQS by learning from different training examples of imputation and IQS pairs.
PMCID: PMC3504919  PMID: 23173775
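The evaluation described in the record above reports two summary numbers for predicted versus true IQS values: a mean square error and a correlation coefficient. The following minimal sketch shows how such metrics can be computed in plain Python. The function names and the toy values are hypothetical, and the correlation is assumed to be Pearson's r, which the abstract does not state explicitly.

```python
import math


def mean_squared_error(true_values, predicted_values):
    """Average of squared differences between true and predicted values."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(true_values, predicted_values)) / len(true_values)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed to be the statistic reported)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up IQS values for illustration only; not data from the study.
    true_iqs = [0.95, 0.80, 0.60, 0.99, 0.40]
    predicted_iqs = [0.90, 0.85, 0.50, 0.97, 0.55]
    print("MSE:", mean_squared_error(true_iqs, predicted_iqs))
    print("r:  ", pearson_r(true_iqs, predicted_iqs))
```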
4.  A 3p26-3p25 genetic linkage finding for DSM-IV major depression in heavy smoking families 
The American journal of psychiatry  2011;168(8):848-852.
The authors tested for genetic linkage of DSM-IV-diagnosed major depressive disorder in families that were ascertained for cigarette smoking.
Within a study that targeted families characterized by a history of smoking, analyses derived a subset of 91 Australian families with two or more offspring with a history of DSM-IV major depressive disorder (affected sibling pairs, N=187) and 25 Finnish families (affected sibling pairs, N=33). Within this affected sibling pair design, the authors conducted nonparametric linkage analysis.
In the Australian heavy smoking families, the authors found a genome-wide significant multipoint LOD score of 4.14 for major depressive disorder on chromosome 3 at 24.9 cM (3p26-3p25).
Genome-wide significant linkage was detected for major depressive disorder on chromosome 3p in a sample ascertained for smoking. A linkage peak at this location was also observed in an independent study of major depressive disorder.
PMCID: PMC3433250  PMID: 21572167
5.  A Genomewide Association Study of DSM-IV Cannabis Dependence 
Addiction biology  2010;16(3):514-518.
Despite twin studies showing that 50–70% of variation in DSM-IV cannabis dependence is attributable to heritable influences, little is known of specific genotypes that influence vulnerability to cannabis dependence. We conducted a genomewide association study of DSM-IV cannabis dependence. Association analysis of 708 DSM-IV cannabis dependent cases with 2,346 cannabis exposed nondependent controls was conducted using logistic regression in PLINK. None of the 948,142 SNPs met genomewide significance (p < E−8). The lowest p-values were obtained for polymorphisms on chromosome 17 (rs1019238 and rs1431318, p-values at E−7) in the ANKFN1 gene. While replication is required, this study represents an important first step towards clarifying the biological underpinnings of cannabis dependence.
PMCID: PMC3117436  PMID: 21668797
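As an illustration of why genomewide significance in the record above demands such small p-values, the sketch below computes a Bonferroni-corrected threshold for the 948,142 SNPs tested. This is a generic multiple-testing calculation under an assumed family-wise error rate of 0.05; it is not necessarily the threshold the authors applied, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float, threshold: float) -> bool:
    """True if a single-test p-value passes the corrected threshold."""
    return p_value < threshold


if __name__ == "__main__":
    n_snps = 948_142  # number of SNPs reported in the abstract
    threshold = bonferroni_threshold(0.05, n_snps)  # assumed alpha of 0.05
    print(f"Bonferroni threshold: {threshold:.2e}")
    # A p-value on the order of 1e-7 would not pass this threshold.
    print("1e-7 significant?", is_significant(1e-7, threshold))
```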
6.  An investigation of candidate regions for association with Bipolar disorder 
American Journal of Medical Genetics  2010;153B(7):1292-1297.
We performed a case-control study of 1,000 cases and 1,028 controls on 1,509 markers, 1,139 of which were located in an 8 Mb region on chromosome 6 (105-113 Mb). This region has shown evidence of involvement in bipolar disorder (BP) in a number of other studies. We find association between BP and two SNPs in the gene LACE1, rs9486880 and rs11153113 (both have p-values of 2 × 10−5). Both p-values are in the top 5% of the distribution derived from null simulations (p=0.02 and 0.01 respectively). LACE1 is a good candidate for BP; it is an ATPase. We genotyped 173 other markers in 17 other positional and/or functional loci but found no further evidence of association with BP.
PMCID: PMC3321541  PMID: 20872768
7.  Genetic Linkage Findings for DSM-IV Nicotine Withdrawal in Two Populations 
Nicotine withdrawal (NW) is both an important contributor to difficulty quitting cigarettes and, because of mood-related withdrawal symptoms, a problem of particular relevance to psychiatry. Twin studies suggest that genetic factors influence NW (heritability = 45%). Only one previous linkage study has published findings on NW (Swan et al., 2006; LOD=2.7; Chr. 6 at 159 cM). As part of an international consortium, genome-wide scans (using 381 autosomal microsatellite markers) and telephone diagnostic interviews were conducted on 289 Australian (AUS) and 161 Finnish (FIN) families (combined (COMB) N=450 families) ascertained from twin registries through index-cases with a lifetime history of cigarette smoking. The statistical approach used an affected-sib pair design (at least two adult full siblings reported a history of DSM-IV NW), and the linkage analyses were conducted using MERLIN. Linkage signals with LOD scores greater than 1.5 were found on two chromosomes: 6 (FIN: LOD= 1.93 at 75 cM) and 11 at two different locations (FIN: LOD= 3.55 at 17 cM, and AUS: LOD= 1.68 with a COMB: LOD= 2.30 at 123 cM). The multipoint LOD score of 3.55 on chromosome 11p15 in FIN met genomewide significance (p = .013 with 1000 simulations). At least four strong candidate genes lie within or near this peak on chromosome 11: DRD4, TPH, TH, and CHRNA10. Other studies have reported that chromosome 11 may harbor genes associated with various aspects of smoking behavior. This study adds to that literature by highlighting evidence for nicotine withdrawal.
PMCID: PMC2995916  PMID: 19180564
genetics; linkage; nicotine; withdrawal
8.  A genetic network model of cellular responses to lithium treatment and cocaine abuse in bipolar disorder 
BMC Systems Biology  2010;4:158.
Lithium is an effective treatment for Bipolar Disorder (BD) and significantly reduces suicide risk, though the molecular basis of lithium's effectiveness is not well understood. We seek to improve our understanding of this effectiveness by posing hypotheses based on new experimental data as well as published data, testing these hypotheses in silico, and posing new hypotheses for validation in future studies. We initially hypothesized a gene-by-environment interaction where lithium, acting as an environmental influence, impacts signal transduction pathways leading to differential expression of genes important in the etiology of BD mania.
Using microarray and rt-QPCR assays, we identified candidate genes that are differentially expressed with lithium treatment. We used a systems biology approach to identify interactions among these candidate genes and develop a network of genes that interact with the differentially expressed candidates. Notably, we also identified cocaine as having a potential influence on the network, consistent with the observed high rate of comorbidity for BD and cocaine abuse. The resulting network represents a novel hypothesis on how multiple genetic influences on bipolar disorder are impacted by both lithium treatment and cocaine use. Testing this network for association with BD and related phenotypes, we find that it is significantly over-represented for genes that participate in signal transduction, consistent with our hypothesized gene-by-environment interaction. In addition, it models related pharmacogenomic, psychiatric, and chemical dependence phenotypes.
We offer a network model of gene-by-environment interaction associated with lithium's effectiveness in treating BD mania, as well as the observed high rate of comorbidity of BD and cocaine abuse. We identified drug targets within this network that represent immediate candidates for therapeutic drug testing. Posing novel hypotheses for validation in future work, we prioritized SNPs near genes in the network based on functional annotation. We also developed a "concept signature" for the genes in the network and identified additional candidate genes that may influence the system because they are significantly associated with the signature.
PMCID: PMC3212423  PMID: 21092101
9.  New tools and methods for direct programmatic access to the dbSNP relational database 
Nucleic Acids Research  2010;39(Database issue):D901-D907.
Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
PMCID: PMC3013662  PMID: 21037260
10.  The CHRNA5-CHRNA3-CHRNB4 nicotinic receptor subunit gene cluster affects risk for nicotine dependence in African-Americans and in European-Americans 
Cancer research  2009;69(17):6848-6856.
Genetic association studies have demonstrated the importance of variants in the CHRNA5-CHRNA3-CHRNB4 cholinergic nicotinic receptor subunit gene cluster on chromosome 15q24-25.1 in risk for nicotine dependence, smoking, and lung cancer in populations of European descent. We have now carried out a detailed study of this region using dense genotyping in both European- and African-Americans.
We genotyped 75 known single-nucleotide-polymorphisms (SNPs) and one sequencing-discovered SNP in an African-American (AA) sample (N = 710) and European-American (EA) sample (N = 2062). Cases were nicotine-dependent and controls were non-dependent smokers.
The non-synonymous CHRNA5 SNP rs16969968 is the most significant SNP associated with nicotine dependence in the full sample of 2772 subjects (p = 4.49×10−8, OR 1.42 (1.25–1.61)) as well as in AAs only (p = 0.015, OR = 2.04 (1.15–3.62)) and EAs only (p = 4.14×10−7, OR = 1.40 (1.23–1.59)). Other SNPs that have been shown to affect mRNA levels of CHRNA5 in EAs are associated with nicotine dependence in AAs but not in EAs. The CHRNA3 SNP rs578776, which has low correlation with rs16969968, is associated with nicotine dependence in EAs but not in AAs. Less common SNPs (frequency ≤ 5%) also are associated with nicotine dependence.
In summary, multiple variants in this gene cluster contribute to nicotine dependence risk, and some are also associated with functional effects on CHRNA5. The non-synonymous SNP rs16969968, a known risk variant in European-descent populations, is also significantly associated with risk in African-Americans. Additional SNPs contribute in distinct ways to risk in these two populations.
PMCID: PMC2874321  PMID: 19706762
genetic association; smoking; cholinergic nicotinic receptors; nicotinic acetylcholine receptors
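The results in the record above are reported as odds ratios with confidence intervals, for example OR 1.42 (1.25–1.61). The sketch below shows one common way such a figure can be derived from a 2×2 table of counts, using the log odds ratio and its standard error (Woolf's method). The counts are hypothetical, and this is not claimed to be the authors' model, which may have been a regression with covariates.

```python
import math


def odds_ratio_with_ci(case_exposed, case_unexposed,
                       control_exposed, control_unexposed, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table (Woolf's logit method).

    All four counts must be positive; z=1.96 corresponds to a 95% interval.
    """
    counts = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if any(c <= 0 for c in counts):
        raise ValueError("all cell counts must be positive")
    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    # Standard error of ln(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts, for illustration only.
    or_, lo, hi = odds_ratio_with_ci(400, 600, 300, 700)
    print(f"OR = {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level.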
11.  Risk for nicotine dependence and lung cancer is conferred by mRNA expression levels and amino acid change in CHRNA5 
Human Molecular Genetics  2009;18(16):3125-3135.
Nicotine dependence risk and lung cancer risk are associated with variants in a region of chromosome 15 encompassing genes encoding the nicotinic receptor subunits CHRNA5, CHRNA3 and CHRNB4. To identify potential biological mechanisms that underlie this risk, we tested for cis-acting eQTLs for CHRNA5, CHRNA3 and CHRNB4 in human brain. Using gene expression and disease association studies, we provide evidence that both nicotine-dependence risk and lung cancer risk are influenced by functional variation in CHRNA5. We demonstrated that the risk allele of rs16969968 primarily occurs on the low mRNA expression allele of CHRNA5. The non-risk allele at rs16969968 occurs on both high and low expression alleles tagged by rs588765 within CHRNA5. When the non-risk allele occurs on the background of low mRNA expression of CHRNA5, the risk for nicotine dependence and lung cancer is significantly lower compared to those with the higher mRNA expression. Together, these variants identify three levels of risk associated with CHRNA5. We conclude that there are at least two distinct mechanisms conferring risk for nicotine dependence and lung cancer: altered receptor function caused by a D398N amino acid variant in CHRNA5 (rs16969968) and variability in CHRNA5 mRNA expression.
PMCID: PMC2714722  PMID: 19443489
12.  Multiple Distinct Risk Loci for Nicotine Dependence Identified by Dense Coverage of the Complete Family of Nicotinic Receptor Subunit (CHRN) Genes 
Tobacco smoking continues to be a leading cause of preventable death. Recent research has underscored the important role of specific cholinergic nicotinic receptor subunit (CHRN) genes in risk for nicotine dependence and smoking. To detect and characterize the influence of genetic variation on vulnerability to nicotine dependence, we analyzed 226 SNPs covering the complete family of 16 CHRN genes, which encode the nicotinic acetylcholine receptor (nAChR) subunits, in a sample of 1050 nicotine-dependent cases and 879 non-dependent controls of European descent. This expanded SNP coverage has extended and refined the findings of our previous large scale genome-wide association and candidate gene study. After correcting for the multiple tests across this gene family, we found significant association for two distinct loci in the CHRNA5-CHRNA3-CHRNB4 gene cluster, one locus in the CHRNB3-CHRNA6 gene cluster, and a fourth, novel locus in the CHRND-CHRNG gene cluster. The two distinct loci in CHRNA5-CHRNA3-CHRNB4 are represented by the non-synonymous SNP rs16969968 in CHRNA5 and by rs578776 in CHRNA3, respectively, and joint analyses show that the associations at these two SNPs are statistically independent. Nominally significant single-SNP association was detected in CHRNA4 and CHRNB1. In summary, this is the most comprehensive study of the CHRN genes for involvement with nicotine dependence to date. Our analysis reveals significant evidence for at least four distinct loci in the nicotinic receptor subunit genes that each influence the transition from smoking to nicotine dependence and may inform the development of improved smoking cessation treatments and prevention initiatives.
PMCID: PMC2693307  PMID: 19259974
cholinergic nicotinic receptors; nicotinic acetylcholine receptors; smoking; genetic association
13.  SPOT: a web-based tool for using biological databases to prioritize SNPs after a genome-wide association study 
Nucleic Acids Research  2010;38(Web Server issue):W201-W209.
SPOT, the SNP prioritization online tool, is a web site for integrating biological databases into the prioritization of single nucleotide polymorphisms (SNPs) for further study after a genome-wide association study (GWAS). Typically, the next step after a GWAS is to genotype the top signals in an independent replication sample. Investigators will often incorporate information from biological databases so that biologically relevant SNPs, such as those in genes related to the phenotype or with potentially non-neutral effects on gene expression such as splice sites, are given higher priority. We recently introduced the genomic information network (GIN) method for systematically implementing this kind of strategy. The SPOT web site allows users to upload a list of SNPs and GWAS P-values and returns a prioritized list of SNPs using the GIN method. Users can specify candidate genes or genomic regions with custom levels of prioritization. The results can be downloaded or viewed in the browser where users can interactively explore the details of each SNP, including graphical representations of the GIN method. For investigators interested in incorporating biological databases into a post-GWAS SNP selection strategy, the SPOT web tool is an easily implemented and flexible solution.
PMCID: PMC2896195  PMID: 20529875
14.  A New Statistic to Evaluate Imputation Reliability 
PLoS ONE  2010;5(3):e9697.
As the amount of data from genome wide association studies grows dramatically, many interesting scientific questions require imputation to combine or expand datasets. However, there are two situations for which imputation has been problematic: (1) polymorphisms with low minor allele frequency (MAF), and (2) datasets where subjects are genotyped on different platforms. Traditional measures of imputation cannot effectively address these problems.
Methodology/Principal Findings
We introduce a new statistic, the imputation quality score (IQS). In order to differentiate between well-imputed and poorly-imputed single nucleotide polymorphisms (SNPs), IQS adjusts the concordance between imputed and genotyped SNPs for chance. We first evaluated IQS in relation to minor allele frequency. Using a sample of subjects genotyped on the Illumina 1 M array, we extracted those SNPs that were also on the Illumina 550 K array and imputed them to the full set of the 1 M SNPs. As expected, the average IQS value drops dramatically with a decrease in minor allele frequency, indicating that IQS appropriately adjusts for minor allele frequency. We then evaluated whether IQS can filter poorly-imputed SNPs in situations where cases and controls are genotyped on different platforms. Randomly dividing the data into “cases” and “controls”, we extracted the Illumina 550 K SNPs from the cases and imputed the remaining Illumina 1 M SNPs. The initial Q-Q plot for the test of association between cases and controls was grossly distorted (λ = 1.15) and had 4016 false positives, reflecting imputation error. After filtering out SNPs with IQS<0.9, the Q-Q plot was acceptable and there were no longer false positives. We then evaluated the robustness of IQS computed independently on the two halves of the data. In both European Americans and African Americans the correlation was >0.99 demonstrating that a database of IQS values from common imputations could be used as an effective filter to combine data genotyped on different platforms.
IQS effectively differentiates well-imputed and poorly-imputed SNPs. It is particularly useful for SNPs with low minor allele frequency and when datasets are genotyped on different platforms.
PMCID: PMC2837741  PMID: 20300623
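The record above describes the IQS as concordance between imputed and genotyped SNPs, adjusted for chance. A minimal sketch of that idea follows, assuming the adjustment takes the form of a Cohen's-kappa-style statistic, (observed − chance) / (1 − chance), computed on best-guess genotype calls. The function name and toy data are hypothetical, and the published definition may differ in detail (for instance, in how imputation probabilities are used).

```python
from collections import Counter


def chance_adjusted_concordance(pairs):
    """Kappa-style agreement between true and imputed genotypes for one SNP.

    pairs: list of (true_genotype, imputed_genotype) tuples, one per subject.
    Returns NaN when chance agreement is exactly 1 (no variation to assess).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one genotype pair is required")
    observed = sum(1 for true_gt, imputed_gt in pairs if true_gt == imputed_gt) / n
    true_counts = Counter(true_gt for true_gt, _ in pairs)
    imputed_counts = Counter(imputed_gt for _, imputed_gt in pairs)
    # Agreement expected if true and imputed calls were independent.
    chance = sum((true_counts[g] / n) * (imputed_counts[g] / n) for g in true_counts)
    if chance == 1.0:
        return float("nan")
    return (observed - chance) / (1.0 - chance)


if __name__ == "__main__":
    # Rare variant: imputation calls everyone "AA" and misses both carriers.
    rare = [("AA", "AA")] * 98 + [("AB", "AA")] * 2
    raw = sum(t == i for t, i in rare) / len(rare)
    print("raw concordance:", raw)
    print("chance-adjusted:", chance_adjusted_concordance(rare))
```

The toy example illustrates the motivation given in the abstract: at low minor allele frequency, raw concordance can look high even when the imputation carries no information about the rare allele.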
15.  Further evidence for an association between the GABAA genes on chromosome 4 and FTND-based nicotine dependence 
Addiction (Abingdon, England)  2009;104(3):471-477.
A previous association analysis identified polymorphisms in GABRA4 and GABRA2 to be associated with nicotine dependence, as assessed by a score of 4 or more on the Fagerström Test for Nicotine Dependence (FTND). In the present report, we extend the previous study by significantly expanding our genotyping efforts for these two genes.
In 1,049 cases (FTND of 4 or more) and 872 controls (smokers with FTND of 0), from the U.S. and Australia, we examine the association between 23 GABRA4 and 39 GABRA2 recently genotyped single nucleotide polymorphisms (SNPs) and nicotine dependence using logistic regression-based association analyses in PLINK.
Two and 18 additional SNPs in GABRA4 and GABRA2 respectively were associated with nicotine dependence. The SNPs identified in GABRA4 (p value = 0.002) were restricted to introns 1 and 2, exon 1 and the 5’ end of the gene, while those in GABRA2 localized to the 3’ end of the gene and spanned introns 9 through 3, and were in moderate to high linkage disequilibrium (as measured by r²) with each other and with previously studied polymorphisms.
Our findings consistently demonstrate the role of GABRA4 and GABRA2 in nicotine dependence. However, further research is needed to identify the biological influence of these intronic variations and to isolate functionally relevant polymorphisms neighboring them.
PMCID: PMC2653081  PMID: 19207358
Association; nicotine dependence; GABRA2; NICSNP
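The record above measures linkage disequilibrium between SNPs by r². For reference, the sketch below computes r² for two biallelic loci from haplotype and allele frequencies, using the standard definition D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The function name and the example frequencies are hypothetical and are not taken from the study.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r-squared between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    for freq in (p_a, p_b):
        if not 0.0 < freq < 1.0:
            raise ValueError("allele frequencies must be strictly between 0 and 1")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return (d * d) / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    # Hypothetical frequencies for illustration.
    print("complete LD:", ld_r_squared(0.30, 0.30, 0.30))
    print("independent:", ld_r_squared(0.06, 0.20, 0.30))
    print("intermediate:", ld_r_squared(0.20, 0.30, 0.40))
```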
16.  A Comparison of Association Statistics between Pooled and Individual Genotypes 
Human Heredity  2009;67(4):219-225.
Markers for individual genotyping can be selected using quantitative genotyping of pooled DNA. This strategy saves time and money.
To determine the efficacy of this approach, we investigated the bivariate distribution of association test statistics from pooled and individual genotypes. We used a set of approximately 1,000 samples with individual and pooled genotyping on 40,000 SNPs.
Results and Conclusions
We found that the distribution of the joint test statistics can be modelled as a mixture of two bivariate normal distributions. One distribution has a correlation of zero, and is probably due to SNPs whose pooled genotyping was unsuccessful. The other distribution has a correlation of approximately 0.65 in our data. This latter distribution is probably accounted for by SNPs whose pooled genotyping accurately predicts the underlying allele frequency. Approximately 87% of the data belongs to this distribution. We also derived a method to investigate the effect of both the correlation and selection cut-off on the relative power of pooling studies. We demonstrate that pooled genotyping has good power to detect SNPs that are truly associated with disease-causing variants for SNPs showing good correlation between pooled and individual genotyping. Therefore, this approach is a cost effective tool for association studies.
PMCID: PMC2880720  PMID: 19172081
Pooling; Power; Association
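The record above models the joint test statistics as a mixture of two bivariate normal distributions, one with correlation zero and one with correlation of approximately 0.65 that accounts for about 87% of the data. The sketch below evaluates such a mixture density and the posterior probability that a given pair of statistics belongs to the correlated component. It assumes zero means and unit variances for both components, which the abstract does not specify; all names are hypothetical.

```python
import math


def bivariate_normal_pdf(x: float, y: float, rho: float) -> float:
    """Density of a standard bivariate normal (zero means, unit variances)."""
    one_minus_rho2 = 1.0 - rho * rho
    exponent = -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * one_minus_rho2)
    return math.exp(exponent) / (2.0 * math.pi * math.sqrt(one_minus_rho2))


def mixture_pdf(x, y, weight_correlated=0.87, rho_correlated=0.65):
    """Two-component mixture: a correlated component plus an uncorrelated one."""
    return (weight_correlated * bivariate_normal_pdf(x, y, rho_correlated)
            + (1.0 - weight_correlated) * bivariate_normal_pdf(x, y, 0.0))


def posterior_correlated(x, y, weight_correlated=0.87, rho_correlated=0.65):
    """Probability that (x, y) came from the correlated component (Bayes' rule)."""
    numerator = weight_correlated * bivariate_normal_pdf(x, y, rho_correlated)
    return numerator / mixture_pdf(x, y, weight_correlated, rho_correlated)


if __name__ == "__main__":
    # Pooled and individual statistics that agree vs. ones that disagree.
    print("agreeing pair   (2, 2): ", posterior_correlated(2.0, 2.0))
    print("disagreeing pair (2, -2):", posterior_correlated(2.0, -2.0))
```

Pairs in which the pooled and individual statistics agree receive a higher posterior weight for the correlated component than pairs in which they disagree.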
17.  Modeling complex genetic and environmental influences on comorbid bipolar disorder with tobacco use disorder 
BMC Medical Genetics  2010;11:14.
Comorbidity of psychiatric and substance use disorders represents a significant complication in the clinical course of both disorders. Bipolar Disorder (BD) is a psychiatric disorder characterized by severe mood swings, ranging from mania to depression, and up to a 70% rate of comorbid Tobacco Use Disorder (TUD). We found epidemiological evidence consistent with a common underlying etiology for BD and TUD, as well as evidence of both genetic and environmental influences on BD and TUD. Therefore, we hypothesized a common underlying genetic etiology, interacting with nicotine exposure, influencing susceptibility to both BD and TUD.
Using meta-analysis, we compared TUD rates for BD patients and the general population. We identified candidate genes showing statistically significant, replicated, evidence of association with both BD and TUD. We assessed commonality among these candidate genes and hypothesized broader, multi-gene network influences on the comorbidity. Using Fisher Exact tests we tested our hypothesized genetic networks for association with the comorbidity, then compared the inferences drawn with those derived from the commonality assessment. Finally, we prioritized candidate SNPs for validation.
We estimate risk for TUD among BD patients at 2.4 times that of the general population. We found three candidate genes associated with both BD and TUD (COMT, SLC6A3, and SLC6A4) and commonality analysis suggests that these genes interact in predisposing psychiatric and substance use disorders. We identified a 69 gene network that influences neurotransmitter signaling and shows significant over-representation of genes associated with BD and TUD, as well as genes differentially expressed with exposure to tobacco smoke. Twenty four of these genes are known drug targets.
This work highlights novel bioinformatics resources and demonstrates the effectiveness of using an integrated bioinformatics approach to improve our understanding of complex disease etiology. We illustrate the development and testing of hypotheses for a comorbidity predisposed by both genetic and environmental influences. Consistent with our hypothesis, the selected network models multiple interacting genetic influences on comorbid BD with TUD, as well as the environmental influence of nicotine. This network nominates candidate genes for validation and drug testing, and we offer a panel of SNPs prioritized for follow-up.
PMCID: PMC2823619  PMID: 20102619
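The record above tests hypothesized gene networks for over-representation of disease-associated genes using Fisher Exact tests. A minimal sketch of a one-sided version of that test is given below, computed as the upper tail of the hypergeometric distribution. The 69-gene network size comes from the abstract; every other count is hypothetical, and the authors' exact test configuration is not stated.

```python
from math import comb


def fisher_exact_enrichment(overlap: int, network_size: int,
                            associated_total: int, background_total: int) -> float:
    """One-sided Fisher exact p-value for over-representation.

    Probability of seeing at least `overlap` associated genes in a network of
    `network_size` genes drawn from `background_total` genes, of which
    `associated_total` are associated with the phenotype.
    """
    if not (0 <= overlap <= network_size <= background_total):
        raise ValueError("require 0 <= overlap <= network_size <= background_total")
    if not (0 <= associated_total <= background_total):
        raise ValueError("associated_total must lie within the background")
    denominator = comb(background_total, network_size)
    max_overlap = min(network_size, associated_total)
    tail = sum(
        comb(associated_total, i)
        * comb(background_total - associated_total, network_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # 69-gene network (from the abstract); remaining counts are made up.
    p = fisher_exact_enrichment(overlap=12, network_size=69,
                                associated_total=500, background_total=20000)
    print(f"enrichment p-value: {p:.3e}")
```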
18.  Systematic biological prioritization after a genome-wide association study 
Bioinformatics (Oxford, England)  2008;24(16):1805-1811.
A challenging problem after a genome-wide association study (GWAS) is to balance the statistical evidence of genotype-phenotype correlation with a priori evidence of biological relevance.
We introduce a method for systematically prioritizing single nucleotide polymorphisms (SNPs) for further study after a GWAS. The method combines evidence across multiple domains, including statistical evidence of genotype-phenotype correlation, known pathways in the pathologic development of disease, SNP/gene functional properties, comparative genomics, prior evidence of genetic linkage, and linkage disequilibrium. We apply this method to a GWAS of nicotine dependence, and use simulated data to test it on several commercial SNP microarrays.
PMCID: PMC2610477  PMID: 18565990
19.  Systematic biological prioritization after a genome-wide association study: an application to nicotine dependence 
Bioinformatics  2008;24(16):1805-1811.
Motivation: A challenging problem after a genome-wide association study (GWAS) is to balance the statistical evidence of genotype–phenotype correlation with a priori evidence of biological relevance.
Results: We introduce a method for systematically prioritizing single nucleotide polymorphisms (SNPs) for further study after a GWAS. The method combines evidence across multiple domains including statistical evidence of genotype–phenotype correlation, known pathways in the pathologic development of disease, SNP/gene functional properties, comparative genomics, prior evidence of genetic linkage, and linkage disequilibrium. We apply this method to a GWAS of nicotine dependence, and use simulated data to test it on several commercial SNP microarrays.
Availability: A comprehensive database of biological prioritization scores for all known SNPs is available at This can be used to prioritize nicotine dependence association studies through a straightforward mathematical formula—no special software is necessary.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2610477  PMID: 18565990
20.  Supplementing High-Density SNP Microarrays for Additional Coverage of Disease-Related Genes: Addiction as a Paradigm 
PLoS ONE  2009;4(4):e5225.
Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.
PMCID: PMC2668711  PMID: 19381300
21.  Linkage scan for quantitative traits identifies new regions of interest for substance dependence in the Collaborative Study on the Genetics of Alcoholism (COGA) Sample 
Drug and alcohol dependence  2007;93(1-2):12-20.
Dependence on alcohol and dependence on illicit drugs frequently co-occur. Results from a number of twin studies suggest that heritable influences on alcohol dependence and drug dependence may substantially overlap. Using large, genetically informative pedigrees from the Collaborative Study on the Genetics of Alcoholism (COGA), we performed quantitative linkage analyses using a panel of 1717 SNPs. Genome-wide linkage analyses were conducted for quantitative measures of DSM-IV alcohol dependence criteria, cannabis dependence criteria and dependence criteria across any illicit drug (including cannabis) individually and in combination as an average score across alcohol and illicit drug dependence criteria. For alcohol dependence, LOD scores exceeding 2.0 were noted on chromosome 1 (2.0 at 213 cM), 2 (3.4 at 234 cM) and 10 (3.7 at 60 cM). For cannabis dependence, a maximum LOD of 1.9 was noted at 95 cM on chromosome 14. For any illicit drug dependence, LODs of 2.0 and 2.4 were observed on chromosome 10 (116 cM) and 13 (64 cM) respectively. Finally, the combined alcohol and/or drug dependence symptoms yielded LODs > 2.0 on chromosome 2 (3.2, 234 cM), 10 (2.4 and 2.6 at 60 cM and 116 cM) and 13 (2.1 at 64 cM). These regions may harbor genes that contribute to the biological basis of alcohol and drug dependence.
PMCID: PMC2266629  PMID: 17942244
Linkage; alcohol; cannabis; illicit drugs; dependence; COGA
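As a reading aid for the LOD scores reported in entry 21, a LOD score can be converted to an approximate pointwise p-value. The sketch below assumes the common large-sample approximation in which 2 ln(10) × LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. That assumption and the function name `lod_to_pointwise_p` are illustrative and are not taken from the study.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate pointwise p-value for a LOD score.

    Assumption (not from the source): 2*ln(10)*LOD is distributed as a
    50:50 mixture of 0 and chi-square with 1 degree of freedom.
    """
    if lod <= 0:
        return 0.5  # half of the null mass sits at zero
    chi2 = 2.0 * math.log(10.0) * lod
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # LOD values quoted in the abstract for alcohol dependence.
    for lod in (2.0, 3.4, 3.7):
        print(f"LOD {lod:.1f} -> pointwise p ~ {lod_to_pointwise_p(lod):.2e}")
```

Under this approximation a LOD of 3 corresponds to a pointwise p-value of roughly 1 in 10,000. These are pointwise values only; they do not account for testing many positions across the genome.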
22.  A Risk Allele for Nicotine Dependence in CHRNA5 Is a Protective Allele for Cocaine Dependence 
Biological psychiatry  2008;64(11):922-929.
A non-synonymous coding polymorphism, rs16969968, of the CHRNA5 gene which encodes the alpha-5 subunit of the nicotinic acetylcholine receptor (nAChR) has been found to be associated with nicotine dependence (20). The goal of the present study is to examine the association of this variant with cocaine dependence.
Genetic association analysis was performed in two independent samples of unrelated cases and controls: (1) 504 European Americans participating in the Family Study on Cocaine Dependence (FSCD); and (2) 814 European Americans participating in the Collaborative Study on the Genetics of Alcoholism (COGA).
In the FSCD, there was a significant association between the CHRNA5 variant and cocaine dependence (OR = 0.67 per allele, p = 0.0045, assuming an additive genetic model), but in the reverse direction compared to that previously observed for nicotine dependence. In multivariate analyses that controlled for the effects of nicotine dependence, both the protective effect for cocaine dependence and the previously documented risk effect for nicotine dependence were statistically significant. The protective effect for cocaine dependence was replicated in the COGA sample. In COGA, effect sizes for habitual smoking, a proxy phenotype for nicotine dependence, were consistent with those observed in FSCD.
The minor (A) allele of rs16969968, relative to the major G allele, appears to be both a risk factor for nicotine dependence and a protective factor for cocaine dependence. The biological plausibility of such a bidirectional association stems from the involvement of nAChRs with both excitatory and inhibitory modulation of dopamine-mediated reward pathways.
PMCID: PMC2582594  PMID: 18519132
Smoking; Nicotine dependence; Addiction; Substance-use disorders; Genetics; Receptors, nicotinic; Cocaine
23.  Variants in the Nicotinic Receptors Alter the Risk for Nicotine Dependence 
The American journal of psychiatry  2008;165(9):1163-1171.
A recent study provisionally identified numerous genetic variants as risk factors for the transition from smoking to the development of nicotine dependence, including an amino acid change in the α5 nicotinic cholinergic receptor (CHRNA5). The purpose of this study is to replicate these findings in an independent dataset and more thoroughly investigate the role of genetic variation in the cluster of physically linked nicotinic receptors, CHRNA5-CHRNA3-CHRNB4, and the risk of smoking.
Individuals from 219 European American families (N=2,284) were genotyped across this gene cluster to test the genetic association with smoking. The frequency of the amino acid variant (rs16969968) was studied in 995 individuals from diverse ethnic populations. In vitro studies were performed to directly test whether the amino acid variant in the CHRNA5 influenced receptor function.
A genetic variant marking an amino acid change showed association with the smoking phenotype (p=0.007). This variant is within a highly conserved region across non-human species, but its frequency varied across human populations (0% in African populations to 37% in European populations). Furthermore, functional studies demonstrated that the risk allele decreased response to a nicotine agonist. A second independent finding was seen at rs578776 (p=0.003), and the functional significance of this association remains unknown.
This study confirms that at least two independent variants in this nicotinic receptor gene cluster contribute to the development of habitual smoking in some populations, and it underscores the importance of multiple genetic variants contributing to the development of common diseases in various populations.
PMCID: PMC2574742  PMID: 18519524
24.  In search of causal variants: refining disease association signals using cross-population contrasts 
BMC Genetics  2008;9:58.
Genome-wide association (GWA) using large numbers of single nucleotide polymorphisms (SNPs) is now a powerful, state-of-the-art approach to mapping human disease genes. When a GWA study detects association between a SNP and the disease, this signal usually represents association with a set of several highly correlated SNPs in strong linkage disequilibrium. The challenge we address is to distinguish among these correlated loci to highlight potential functional variants and prioritize them for follow-up.
We implemented a systematic method for testing association across diverse population samples having differing histories and LD patterns, using a logistic regression framework. The hypothesis is that important underlying biological mechanisms are shared across human populations, and we can filter correlated variants by testing for heterogeneity of genetic effects in different population samples. This approach formalizes the descriptive comparison of p-values that has typified similar cross-population fine-mapping studies to date. We applied this method to correlated SNPs in the cholinergic nicotinic receptor gene cluster CHRNA5-CHRNA3-CHRNB4, in a case-control study of cocaine dependence composed of 504 European-American and 583 African-American samples. Of the 10 SNPs genotyped in the r² ≥ 0.8 bin for rs16969968, three demonstrated significant cross-population heterogeneity and are filtered from priority follow-up; the remaining SNPs include rs16969968 (heterogeneity p = 0.75). Though the power to filter out rs16969968 is reduced due to the difference in allele frequency in the two groups, the results nevertheless focus attention on a smaller group of SNPs that includes the non-synonymous SNP rs16969968, which retains a similar effect size (odds ratio) across both population samples.
Filtering out SNPs that demonstrate cross-population heterogeneity enriches for variants more likely to be important and causative. Our approach provides an important and effective tool to help interpret results from the many GWA studies now underway.
PMCID: PMC2556340  PMID: 18759969
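The heterogeneity filter described in entry 24 can be illustrated with a minimal sketch. The abstract states that the authors used a logistic regression framework; the code below swaps in a simpler stand-in, a Wald test for the difference between two log odds ratios computed from 2x2 allele-count tables. The function names and all counts are hypothetical and are not data from the study.

```python
import math


def log_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (log odds ratio, standard error) for a 2x2 allele-count table."""
    log_or = math.log((case_alt * ctrl_ref) / (case_ref * ctrl_alt))
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    return log_or, se


def heterogeneity_test(table_a, table_b):
    """Wald test that the allelic odds ratio differs between two populations.

    Each table is (case_alt, case_ref, ctrl_alt, ctrl_ref).
    Returns (z statistic, two-sided p-value).
    """
    lo_a, se_a = log_odds_ratio(*table_a)
    lo_b, se_b = log_odds_ratio(*table_b)
    z = (lo_a - lo_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP in two population samples.
    population_1 = (300, 700, 380, 620)
    population_2 = (60, 1100, 75, 1090)
    z, p = heterogeneity_test(population_1, population_2)
    # A SNP with a small heterogeneity p-value would be filtered from
    # priority follow-up; a large p-value keeps it in the candidate set.
    print(f"z = {z:.2f}, heterogeneity p = {p:.3f}")
```

A SNP whose effect is similar in both samples gives a z statistic near zero and a p-value near 1, which matches the behaviour the abstract reports for rs16969968 (heterogeneity p = 0.75).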
25.  Novel Genes Identified in a High Density Genome Wide Association Study for Nicotine Dependence 
Human molecular genetics  2006;16(1):24-35.
Tobacco use is a leading contributor to disability and death worldwide, and genetic factors contribute in part to the development of nicotine dependence. To identify novel genes for which natural variation contributes to the development of nicotine dependence, we performed a comprehensive genome wide association study using nicotine dependent smokers as cases and non-dependent smokers as controls. To allow the efficient, rapid, and cost effective screen of the genome, the study was carried out using a two-stage design. In the first stage, genotyping of over 2.4 million SNPs was completed in case and control pools. In the second stage, we selected SNPs for individual genotyping based on the most significant allele frequency differences between cases and controls from the pooled results. Individual genotyping was performed in 1050 cases and 879 controls using 31,960 selected SNPs. The primary analysis, a logistic regression model with covariates of age, gender, genotype and gender by genotype interaction, identified 35 SNPs with p-values less than 10⁻⁴ (minimum p-value 1.53 × 10⁻⁶). Although none of the individual findings is statistically significant after correcting for multiple tests, additional statistical analyses support the existence of true findings in this group. Our study nominates several novel genes, such as Neurexin 1 (NRXN1), in the development of nicotine dependence while also identifying a known candidate gene, the β3 nicotinic cholinergic receptor. This work anticipates the future directions of large-scale genome wide association studies with state-of-the-art methodological approaches and sharing of data with the scientific community.
PMCID: PMC2278047  PMID: 17158188
