1.  Rare, low frequency, and common coding variants in CHRNA5 and their contribution to nicotine dependence in European and African Americans 
Molecular psychiatry  2015;21(5):601-607.
The common nonsynonymous variant rs16969968 in the α5 nicotinic receptor subunit gene (CHRNA5) is the strongest genetic risk factor for nicotine dependence in European Americans and contributes to risk in African Americans. To comprehensively examine whether other CHRNA5 coding variation influences nicotine dependence risk, we performed targeted sequencing on 1582 nicotine dependent cases (Fagerström Test for Nicotine Dependence score≥4) and 1238 non-dependent controls, with independent replication of common and low frequency variants using 12 studies with exome chip data. Nicotine dependence was examined using logistic regression with individual common variants (MAF≥0.05), aggregate low frequency variants (0.05>MAF≥0.005), and aggregate rare variants (MAF<0.005). Meta-analysis of primary results was performed with replication studies containing 12 174 heavy and 11 290 light smokers. Next-generation sequencing with 180X coverage identified 24 nonsynonymous variants and 2 frameshift deletions in CHRNA5, including 9 novel variants in the 2820 subjects. Meta-analysis confirmed the risk effect of the only common variant (rs16969968, European ancestry: OR=1.3, p=3.5×10⁻¹¹; African ancestry: OR=1.3, p=0.01) and demonstrated that 3 low frequency variants contributed an independent risk (aggregate term, European ancestry: OR=1.3, p=0.005; African ancestry: OR=1.4, p=0.0006). The remaining 22 rare coding variants were associated with increased risk of nicotine dependence in the European American primary sample (OR=12.9, p=0.01) and in the same risk direction in African Americans (OR=1.5, p=0.37). Our results indicate that common, low frequency and rare CHRNA5 coding variants are independently associated with nicotine dependence risk. These newly identified variants likely influence risk for smoking-related diseases such as lung cancer.
PMCID: PMC4740321  PMID: 26239294
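The frequency classes used in the abstract above (common: MAF≥0.05; low frequency: 0.05>MAF≥0.005; rare: MAF<0.005) can be written down as a small classifier. The following is only an illustrative sketch: the function names and the 0/1/2 allele-count coding are assumptions made for this example and are not taken from the study's analysis code.

```python
def minor_allele_frequency(allele_counts):
    """Return the minor allele frequency for one variant.

    allele_counts: one entry per subject, the number of copies (0, 1 or 2)
    of a chosen allele. This coding is an assumption of the sketch.
    """
    if not allele_counts:
        raise ValueError("need at least one genotype")
    freq = sum(allele_counts) / (2 * len(allele_counts))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1 - freq)


def maf_bin(maf):
    """Classify a variant using the cut-offs quoted in the abstract."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low frequency"
    return "rare"
```

Under this scheme a variant with MAF exactly 0.05 counts as common and one with MAF exactly 0.005 counts as low frequency, matching the inequalities in the abstract.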
2.  Genetic Risk Can Be Decreased: Quitting Smoking Decreases and Delays Lung Cancer for Smokers With High and Low CHRNA5 Risk Genotypes — A Meta-Analysis 
EBioMedicine  2016;11:219-226.
Recent meta-analyses show that individuals with high risk variants in CHRNA5 on chromosome 15q25 are likely to develop lung cancer earlier than those with low-risk genotypes. The same high-risk genetic variants also predict nicotine dependence and delayed smoking cessation. It is unclear whether smoking cessation confers the same benefits in terms of lung cancer risk reduction for those who possess CHRNA5 risk variants versus those who do not.
Meta-analyses examined the association between smoking cessation and lung cancer risk in 15 studies of individuals with European ancestry who possessed varying rs16969968 genotypes (N = 12,690 ever smokers, including 6988 cases of lung cancer and 5702 controls) in the International Lung Cancer Consortium.
Smoking cessation (former vs. current smokers) was associated with a lower likelihood of lung cancer (OR = 0.48, 95%CI = 0.30–0.75, p = 0.0015). Among lung cancer patients, smoking cessation was associated with a 7-year delay in median age of lung cancer diagnosis (HR = 0.68, 95%CI = 0.61–0.77, p = 4.9 × 10⁻¹⁰). The CHRNA5 rs16969968 risk genotype (AA) was associated with increased risk and earlier diagnosis for lung cancer, but the beneficial effects of smoking cessation were very similar in those with and without the risk genotype.
We demonstrate that quitting smoking is highly beneficial in reducing lung cancer risks for smokers regardless of their CHRNA5 rs16969968 genetic risk status. Smokers with high-risk CHRNA5 genotypes, on average, can largely eliminate their elevated genetic risk for lung cancer by quitting smoking, cutting their risk of lung cancer in half and delaying its onset by 7 years for those who develop it. These results: 1) underscore the potential value of smoking cessation for all smokers, 2) suggest that CHRNA5 rs16969968 genotype affects lung cancer diagnosis through its effects on smoking, and 3) have potential value for framing preventive interventions for those who smoke.
• CHRNA5 rs16969968 confers risk for earlier lung cancer diagnosis, but quitting produces benefit regardless of genotype.
• Smokers can cut their risk of lung cancer in half and delay its onset by 7 years among those diagnosed.
• Precision prevention allows clinicians to provide personalized health benefits of smoking cessation.
This is a report on whether smoking cessation confers the same benefits in terms of lung cancer risk reduction for those who possess CHRNA5 risk variants versus those who do not. We determined that quitting smoking is highly beneficial in reducing lung cancer risk levels for smokers regardless of their CHRNA5 rs16969968 genetic risk status. Although CHRNA5 rs16969968 increases risk for earlier lung cancer by 4 years, quitting produces essentially the same benefit for smokers with either high or low genetic risks. Smokers can cut their risk of lung cancer in half and delay its onset by 7 years among those diagnosed. These results are important for smokers to prevent cancer. On average, smokers at all genetic risk levels can largely eliminate their elevated risk for lung cancer by quitting smoking.
PMCID: PMC5049934  PMID: 27543155
Smoking cessation; Genetics; Meta-analysis; Lung cancer
3.  Return of individual genetic results in a high-risk sample: enthusiasm and positive behavioral change 
The goal of this study is to examine participant responses to disclosure of genetic results in a minority population at high risk for depression and anxiety.
82 subjects in a genetic study of nicotine dependence were offered personalized genetic results: all were nicotine dependent and 64% self-identified as African American. Pathway Genomics was used to evaluate genetic risks for 5 complex diseases. Participants returned 4–8 weeks following enrollment for in-person genetic counseling interviews and evaluation of baseline measures. A telephone follow-up was performed 4–8 weeks later to assess responses to results.
50 of the 82 subjects (61%) were interested in receiving genetic results. These participants had multiple risk factors, including high baseline measures of depression (66%) and anxiety (32%), as well as low rates of employment (46%), adequate health literacy (46%), and health insurance (45%). Pathway Genomics reported “increased risk” for at least one disease in 77% of subjects. 95% of participants reported that they appreciated the genetic results, and receiving these results was not associated with changes in symptoms of depression or anxiety. Furthermore, after return of genetic results, smoking cessation attempts increased (p=0.003).
Even in an underserved population at high risk for adverse psychological reactions, subjects responded positively to personalized genetic results.
PMCID: PMC4344933  PMID: 25166427
return of results; minorities; genomics; participants
4.  When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments? 
PLoS ONE  2015;10(10):e0137601.
Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen’s kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation was conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, compared to IQS, the other statistics tend to be more liberal in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants.
PMCID: PMC4601794  PMID: 26458263
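The contrast the abstract above draws between concordance rate and a kappa-based score can be illustrated with a short sketch. It assumes the usual Cohen's kappa form, (observed agreement − chance agreement) / (1 − chance agreement), applied to a 3×3 table of imputed genotype probabilities against true genotypes; the function names and data layout are hypothetical and this is not the authors' implementation.

```python
def agreement_table(probs, truth):
    """Build a 3x3 table: rows = imputed genotype, columns = true genotype.

    probs: per-subject tuples (P(0 copies), P(1 copy), P(2 copies)).
    truth: per-subject true genotype coded 0, 1 or 2.
    """
    table = [[0.0] * 3 for _ in range(3)]
    for p, t in zip(probs, truth):
        for g in range(3):
            table[g][t] += p[g]
    return table


def concordance(probs, truth):
    """Fraction of subjects whose most likely imputed genotype is correct."""
    hits = sum(
        1 for p, t in zip(probs, truth)
        if max(range(3), key=lambda g: p[g]) == t
    )
    return hits / len(truth)


def iqs(probs, truth):
    """Kappa-style score: agreement beyond what the marginals give by chance."""
    n = len(truth)
    table = agreement_table(probs, truth)
    p_obs = sum(table[g][g] for g in range(3)) / n
    p_chance = sum(
        sum(table[g]) * sum(row[g] for row in table) for g in range(3)
    ) / n ** 2
    if p_chance == 1.0:
        return float("nan")  # undefined when no variation is present at all
    return (p_obs - p_chance) / (1 - p_chance)
```

As a made-up example, take 100 subjects of whom 2 are true heterozygotes, and an imputation that calls everyone homozygous for the major allele with probability 1. Concordance is 0.98, yet the kappa-style score is 0, because the same agreement would be expected by chance from the marginal frequencies alone. This is the sense in which concordance can look inflated for rare variants.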
5.  Multiple distinct CHRNB3-CHRNA6 variants are genetic risk factors for nicotine dependence in African Americans and European Americans 
Addiction (Abingdon, England)  2014;109(5):814-822.
Studies have shown association between common variants in the α6–β3 nicotinic receptor subunit gene cluster and nicotine dependence in European Ancestry populations. We investigate whether this generalizes to African Americans, whether the association is specific to nicotine dependence, and whether this region contains additional genetic contributors to nicotine dependence.
We examined consistency of association across studies and race between the α6β3 nicotinic receptor subunit locus and nicotine, alcohol, marijuana, and cocaine dependence in three independent studies.
United States of America
European Americans and African Americans from three case control studies of substance dependence.
Subjects were evaluated using the Semi-Structured Assessment for the Genetics of Alcoholism. Nicotine dependence was determined using the Fagerström Test for Nicotine Dependence.
rs13273442 was significantly associated with nicotine dependence across all three studies in both ancestry groups (OR=0.75, p=5.8 × 10⁻⁴ European Americans; OR=0.80, p=0.05 African Americans). No other substance dependence was consistently associated with this variant in either group. Another SNP in the region, rs4952, remains modestly associated with nicotine dependence in the combined data after conditioning on rs13273442.
The common variant rs13273442 in the CHRNB3-CHRNA6 region is significantly associated with nicotine dependence in European Americans and African Americans across studies recruited for nicotine, alcohol, and cocaine dependence. Although these data are modestly powered for other substances, our results provide no evidence that correlates of rs13273442 represent a general substance dependence liability. Additional variants likely account for some of the association of this region with nicotine dependence.
PMCID: PMC3984604  PMID: 24401102
6.  Genotypic discrepancies arising from imputation 
BMC Proceedings  2014;8(Suppl 1):S17.
The ideal genetic analysis of family data would include whole genome sequence on all family members. A strategy of combining sequence data from a subset of key individuals with inexpensive, genome-wide association study (GWAS) chip genotypes on all individuals to infer sequence level genotypes throughout the families has been suggested as a highly accurate alternative. This strategy was followed by the Genetic Analysis Workshop 18 data providers. We examined the quality of the imputation to identify potential consequences of this strategy by comparing discrepancies between GWAS genotype calls and imputed calls for the same variants. Overall, the inference and imputation process worked very well. However, we find that discrepancies occurred at an increased rate when imputation was used to infer missing data in sequenced individuals. Although this may be an artifact of this particular instantiation of these analytic methods, there may be general genetic or algorithmic reasons to avoid trying to fill in missing sequence data. This is especially true given the risk of false positives and reduction in power for family-based transmission tests when founders are incorrectly imputed as heterozygotes. Finally, we note a higher rate of discrepancies when unsequenced individuals are inferred using sequenced individuals from other pedigrees drawn from the same admixed population.
PMCID: PMC4143754  PMID: 25519370
7.  Identifying cryptic population structure in multigenerational pedigrees in a Mexican American sample 
BMC Proceedings  2014;8(Suppl 1):S4.
Cryptic population structure can increase both type I and type II errors. This is particularly problematic in case-control association studies of unrelated individuals. Some researchers believe that these problems are obviated in families. We argue here that this may not be the case, especially if families are drawn from a known admixed population such as Mexican Americans. We use a principal component approach to evaluate and visualize the results of three different approaches to searching for cryptic structure in the 20 multigenerational families of the Genetic Analysis Workshop 18 (GAW18). Approach 1 uses all family members in the sample to identify what might be considered "outlier" kindreds. Because families are likely to differ in size (in the GAW18 families, there is about a 4-fold difference in the number of typed individuals), approach 2 uses a weighting system that equalizes pedigree size. Approach 3 concentrates on the founders and the "marry-ins" because, in principle, the entire pedigree can be reconstructed with knowledge of the sequence of these unrelated individuals and genome-wide association study (GWAS) data on everyone else (to identify the position of recombinations). We demonstrate that these three approaches can yield very different insights about cryptic structure in a sample of families.
PMCID: PMC4143674  PMID: 25519323
8.  Genetic Analysis Workshop 18: Methods and strategies for analyzing human sequence and phenotype data in members of extended pedigrees 
BMC Proceedings  2014;8(Suppl 1):S1.
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
PMCID: PMC4143625  PMID: 25519310
9.  Interpreting joint SNP analysis results: when are two distinct signals really two distinct signals? 
Genetic epidemiology  2013;37(3):301-309.
In genetic association studies, much effort has focused on moving beyond the initial single nucleotide polymorphism (SNP)-by-SNP analysis. One approach is to re-analyze a chromosomal region where an association has been detected, jointly analyzing the SNP thought to best represent that association with each additional SNP in the region. Such joint analyses may help identify additional, statistically independent association signals. However, it is possible for a single genetic effect to produce joint SNP results that would typically be interpreted as two distinct effects (e.g. both SNPs are significant in the joint model). We present a general approach that can (1) identify conditions under which a single variant could produce a given joint SNP result, and (2) use these conditions to identify variants from a list of known SNPs (e.g. 1000 Genomes) as candidates that could produce the observed signal. We apply this method to our previously reported joint result for smoking involving rs16969968 and rs588765 in CHRNA5. We demonstrate that it is theoretically possible for a joint SNP result suggestive of two independent signals to be produced by a single causal variant. Furthermore, this variant need not be highly correlated with the two tested SNPs nor must it have a large odds ratio. Our method aids in interpretation of joint SNP results by identifying new candidate variants for biological causation that would be missed by traditional approaches. Also, it can connect association findings that may seem disparate due to lack of high correlations among the associated SNPs.
PMCID: PMC3743534  PMID: 23404318
genetic association; gametic disequilibrium; multi SNP analysis; candidate gene; smoking; nicotine dependence
10.  Protocol for a collaborative meta-analysis of 5-HTTLPR, stress, and depression 
BMC Psychiatry  2013;13:304.
Debate is ongoing about what role, if any, variation in the serotonin transporter linked polymorphic region (5-HTTLPR) plays in depression. Some studies report an interaction between 5-HTTLPR variation and stressful life events affecting the risk for depression, others report a main effect of 5-HTTLPR variation on depression, while others find no evidence for either a main or interaction effect. Meta-analyses of multiple studies have also reached differing conclusions.
To improve understanding of the combined roles of 5-HTTLPR variation and stress in the development of depression, we are conducting a meta-analysis of multiple independent datasets. This coordinated approach utilizes new analyses performed with centrally-developed, standardized scripts. This publication documents the protocol for this collaborative, consortium-based meta-analysis of 5-HTTLPR variation, stress, and depression.
Study eligibility criteria: Our goal is to invite all datasets, published or unpublished, with 5-HTTLPR genotype and assessments of stress and depression for at least 300 subjects. This inclusive approach is to minimize potential impact from publication bias.
Data sources: This project currently includes investigators from 35 independent groups, providing data on at least N = 33,761 participants.
The analytic plan was determined prior to starting data analysis. Analyses of individual study datasets will be performed by the investigators who collected the data using centrally-developed standardized analysis scripts to ensure a consistent analytical approach across sites. The consortium as a group will review and interpret the meta-analysis results.
Variation in 5-HTTLPR is hypothesized to moderate the response to stress on depression. To test specific hypotheses about the role of 5-HTTLPR variation on depression, we will perform coordinated meta-analyses of de novo results obtained from all available data, using variables and analyses determined a priori. Primary analyses, based on the original 2003 report by Caspi and colleagues of a GxE interaction will be supplemented by secondary analyses to help interpret and clarify issues ranging from the mechanism of effect to heterogeneity among the contributing studies. Publication of this protocol serves to protect this project from biased reporting and to improve the ability of readers to interpret the results of this specific meta-analysis upon its completion.
PMCID: PMC3840571  PMID: 24219410
11.  Increased Genetic Vulnerability to Smoking at CHRNA5 in Early-Onset Smokers 
Hartz, Sarah M. | Short, Susan E. | Saccone, Nancy L. | Culverhouse, Robert | Chen, LiShiun | Schwantes-An, Tae-Hwi | Coon, Hilary | Han, Younghun | Stephens, Sarah H. | Sun, Juzhong | Chen, Xiangning | Ducci, Francesca | Dueker, Nicole | Franceschini, Nora | Frank, Josef | Geller, Frank | Guđbjartsson, Daniel | Hansel, Nadia N. | Jiang, Chenhui | Keskitalo-Vuokko, Kaisu | Liu, Zhen | Lyytikäinen, Leo-Pekka | Michel, Martha | Rawal, Rajesh | Rosenberger, Albert | Scheet, Paul | Shaffer, John R. | Teumer, Alexander | Thompson, John R. | Vink, Jacqueline M. | Vogelzangs, Nicole | Wenzlaff, Angela S. | Wheeler, William | Xiao, Xiangjun | Yang, Bao-Zhu | Aggen, Steven H. | Balmforth, Anthony J. | Baumeister, Sebastian E. | Beaty, Terri | Bennett, Siiri | Bergen, Andrew W. | Boyd, Heather A. | Broms, Ulla | Campbell, Harry | Chatterjee, Nilanjan | Chen, Jingchun | Cheng, Yu-Ching | Cichon, Sven | Couper, David | Cucca, Francesco | Dick, Danielle M. | Foroud, Tatiana | Furberg, Helena | Giegling, Ina | Gu, Fangyi | Hall, Alistair S. | Hällfors, Jenni | Han, Shizhong | Hartmann, Annette M. | Hayward, Caroline | Heikkilä, Kauko | Hewitt, John K. | Hottenga, Jouke Jan | Jensen, Majken K. | Jousilahti, Pekka | Kaakinen, Marika | Kittner, Steven J. | Konte, Bettina | Korhonen, Tellervo | Landi, Maria-Teresa | Laatikainen, Tiina | Leppert, Mark | Levy, Steven M. | Mathias, Rasika A. | McNeil, Daniel W. | Medland, Sarah E. | Montgomery, Grant W. | Muley, Thomas | Murray, Tanda | Nauck, Matthias | North, Kari | Pergadia, Michele | Polasek, Ozren | Ramos, Erin M. | Ripatti, Samuli | Risch, Angela | Ruczinski, Ingo | Rudan, Igor | Salomaa, Veikko | Schlessinger, David | Styrkársdóttir, Unnur | Terracciano, Antonio | Uda, Manuela | Willemsen, Gonneke | Wu, Xifeng | Abecasis, Goncalo | Barnes, Kathleen | Bickeböller, Heike | Boerwinkle, Eric | Boomsma, Dorret I. | Caporaso, Neil | Duan, Jubao | Edenberg, Howard J. | Francks, Clyde | Gejman, Pablo V. | Gelernter, Joel | Grabe, Hans Jörgen | Hops, Hyman | Jarvelin, Marjo-Riitta | Viikari, Jorma | Kähönen, Mika | Kendler, Kenneth S. | Lehtimäki, Terho | Levinson, Douglas F. | Marazita, Mary L. | Marchini, Jonathan | Melbye, Mads | Mitchell, Braxton D. | Murray, Jeffrey C. | Nöthen, Markus M. | Penninx, Brenda W. | Raitakari, Olli | Rietschel, Marcella | Rujescu, Dan | Samani, Nilesh J. | Sanders, Alan R. | Schwartz, Ann G. | Shete, Sanjay | Shi, Jianxin | Spitz, Margaret | Stefansson, Kari | Swan, Gary E. | Thorgeirsson, Thorgeir | Völzke, Henry | Wei, Qingyi | Wichmann, H.-Erich | Amos, Christopher I. | Breslau, Naomi | Cannon, Dale S. | Ehringer, Marissa | Grucza, Richard | Hatsukami, Dorothy | Heath, Andrew | Johnson, Eric O. | Kaprio, Jaakko | Madden, Pamela | Martin, Nicholas G. | Stevens, Victoria L. | Stitzel, Jerry A. | Weiss, Robert B. | Kraft, Peter | Bierut, Laura J.
Archives of general psychiatry  2012;69(8):854-860.
Recent studies have shown an association between cigarettes per day (CPD) and a nonsynonymous single-nucleotide polymorphism in CHRNA5, rs16969968.
To determine whether the association between rs16969968 and smoking is modified by age at onset of regular smoking.
Data Sources
Primary data.
Study Selection
Available genetic studies containing measures of CPD and the genotype of rs16969968 or its proxy.
Data Extraction
Uniform statistical analysis scripts were run locally. Starting with 94 050 ever-smokers from 43 studies, we extracted the heavy smokers (CPD >20) and light smokers (CPD ≤10) with age-at-onset information, reducing the sample size to 33 348. Each study was stratified into early-onset smokers (age at onset ≤16 years) and late-onset smokers (age at onset >16 years), and a logistic regression of heavy vs light smoking with the rs16969968 genotype was computed for each stratum. Meta-analysis was performed within each age-at-onset stratum.
Data Synthesis
Individuals with 1 risk allele at rs16969968 who were early-onset smokers were significantly more likely to be heavy smokers in adulthood (odds ratio [OR]=1.45; 95% CI, 1.36–1.55; n=13 843) than were carriers of the risk allele who were late-onset smokers (OR = 1.27; 95% CI, 1.21–1.33, n = 19 505) (P = .01).
These results highlight an increased genetic vulnerability to smoking in early-onset smokers.
PMCID: PMC3482121  PMID: 22868939
12.  The Challenge of Detecting Epistasis (G×G Interactions): Genetic Analysis Workshop 16 
Genetic epidemiology  2009;33(Suppl 1):S58-S67.
Interest is increasing in epistasis as a possible source of the unexplained variance missed by genome-wide association studies. The Genetic Analysis Workshop 16 Group 9 participants evaluated a wide variety of classical and novel analytical methods for detecting epistasis, in both the statistical and machine learning paradigms, applied to both real and simulated data. Because the magnitude of epistasis is clearly relative to scale of penetrance, and therefore to some extent, to the choice of model framework, it is not surprising that strong interactions under one model might be minimized or even disappear entirely under a different modeling framework.
PMCID: PMC3692280  PMID: 19924703
generalized linear model; machine learning methods
13.  Smoking and Genetic Risk Variation across Populations of European, Asian, and African-American Ancestry - A Meta-analysis of Chromosome 15q25 
Genetic epidemiology  2012;36(4):340-351.
Recent meta-analyses of European ancestry subjects show strong evidence for association between smoking quantity and multiple genetic variants on chromosome 15q25. This meta-analysis extends the examination of association between distinct genes in the CHRNA5-CHRNA3-CHRNB4 region and smoking quantity to Asian and African American populations to confirm and refine specific reported associations.
Association results for a dichotomized cigarettes smoked per day (CPD) phenotype in 27 datasets (European ancestry (N=14,786), Asian (N=6,889), and African American (N=10,912) for a total of 32,587 smokers) were meta-analyzed by population and results were compared across all three populations.
We demonstrate association between smoking quantity and markers in the chromosome 15q25 region across all three populations, and narrow the region of association. Of the variants tested, only rs16969968 is associated with smoking (p < 0.01) in each of these three populations (OR=1.33, 95% CI=1.25–1.42, p=1.1×10⁻¹⁷ in meta-analysis across all population samples). Additional variants displayed a consistent signal in both European ancestry and Asian datasets, but not in African Americans.
The observed consistent association of rs16969968 with heavy smoking across multiple populations, combined with its known biological significance, suggests rs16969968 is most likely a functional variant that alters risk for heavy smoking. We interpret additional association results that differ across populations as providing evidence for additional functional variants, but we are unable to further localize the source of this association. Using the cross-population study paradigm provides valuable insights to narrow regions of interest and inform future biological experiments.
PMCID: PMC3387741  PMID: 22539395
smoking; genetics; meta-analysis; cross-population
14.  A Comparison of Methods Sensitive to Interactions with Small Main Effects 
Genetic Epidemiology  2012;36(4):303-311.
Numerous genetic variants have been successfully identified for complex traits, yet these genetic factors only account for a modest portion of the predicted variance due to genetic factors. This has led to increased interest in other approaches to account for the “missing” genetic contributions to phenotype, including joint gene-gene or gene-environment analysis.
A variety of methods for such analysis have been advocated. However, they have seldom been compared systematically. To facilitate such comparisons, the developers of Multifactor Dimensionality Reduction (MDR) simulated 100 data replicates for each of 96 two-locus models displaying negligible marginal effects from either locus (16 variations on each of 6 basic genetic models). The genetic models, based on a dichotomous phenotype, had varying minor allele frequencies and from 2 to 8 distinct risk levels associated with genotype. The basic models were modified to include “noise” from combinations of missing data, genotyping error, genetic heterogeneity, and phenocopies. This study compares the performance of three methods designed to be sensitive to joint effects (MDR, Support Vector Machines (SVM), and the Restricted Partition Method (RPM)) on these simulated data.
In these tests, the RPM consistently outperformed the other two methods for each of the 6 classes of genetic models. In contrast, the comparison between the other two methods had mixed results: the MDR outperformed the SVM when the true model had only a few, well-separated risk classes, while the SVM outperformed the MDR on more complicated models. Of these methods, only MDR has a well-developed user interface.
PMCID: PMC3357917  PMID: 22460684
epistasis; missing heritability; simulated data; Multifactor Dimensionality Reduction (MDR); Support Vector Machine (SVM); Restricted Partition Method (RPM)
15.  Regression and Data Mining Methods for Analyses of Multiple Rare Variants in the Genetic Analysis Workshop 17 Mini-Exome Data 
Genetic Epidemiology  2011;35(Suppl 1):S92-100.
Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.
PMCID: PMC3360949  PMID: 22128066
rare variants; LASSO; machine learning; random forests; logic regression; binary trees; Poisson regression; ISIS; classification trees; meta-analysis; extreme sampling
16.  Uncovering hidden variance: pair-wise SNP analysis accounts for additional variance in nicotine dependence 
Human genetics  2010;129(2):177-188.
Results from genome-wide association studies of complex traits account for only a modest proportion of the trait variance predicted to be due to genetics. We hypothesize that joint analysis of polymorphisms may account for more variance. We evaluated this hypothesis on a case–control smoking phenotype by examining pairs of nicotinic receptor single-nucleotide polymorphisms (SNPs) using the Restricted Partition Method (RPM) on data from the Collaborative Genetic Study of Nicotine Dependence (COGEND). We found evidence of joint effects that increase explained variance. Four signals identified in COGEND were testable in independent American Cancer Society (ACS) data, and three of the four signals replicated. Our results highlight two important lessons: joint effects that increase the explained variance are not limited to loci displaying substantial main effects, and joint effects need not display a significant interaction term in a logistic regression model. These results suggest that the joint analyses of variants may indeed account for part of the genetic variance left unexplained by single SNP analyses. Methodologies that limit analyses of joint effects to variants that demonstrate association in single SNP analyses, or require a significant interaction term, will likely miss important joint effects.
PMCID: PMC3030551  PMID: 21079997
17.  The Age of Necrotizing Enterocolitis Onset: An Application of Sartwell’s Incubation Period Model 
Objective
To model the age of necrotizing enterocolitis (NEC) onset by applying Sartwell’s model of incubation periods, and to examine its relationship to gestational age (GA).
Study design
Retrospective chart review of St. Louis Children’s Hospital neonates diagnosed with NEC (≥ Bell’s stage II) from 2004 to 2008, inclusive.
Results
The relationship between age of NEC onset (N=84 cases) and GA was best fit by a non-linear model, which explained 50.3% of the variability in age of NEC onset; infants ≤ 28 weeks GA had a disproportionately longer time to onset than older GA groups. Additional clinical variables provided no improvement in explaining age of NEC onset. Sartwell’s model gave a good fit to age of NEC onset when birth is taken as the common exposure episode and age as the equivalent of the incubation period.
Conclusion
The relationship between day of NEC diagnosis and GA is non-linear, with lower GA infants having a disproportionately longer time to onset. Despite these GA differences, the fit to Sartwell’s incubation period model is consistent with NEC being a consequence of an event that occurs at or soon after birth.
PMCID: PMC3145821  PMID: 21273988
premature morbidity; intestinal injury; newborn
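Sartwell's model treats incubation periods as log-normally distributed, summarized by a median and a dispersion factor. The sketch below illustrates that idea on made-up onset ages; the function name `sartwell_fit` and the sample values are assumptions for illustration and are not taken from the study.

```python
import math
import statistics


def sartwell_fit(onset_days):
    """Fit a log-normal distribution to positive onset times.

    Returns (median, dispersion_factor): the exponentiated mean and
    the exponentiated sample standard deviation of the log times.
    """
    logs = [math.log(d) for d in onset_days]
    mu = statistics.fmean(logs)
    sigma = statistics.stdev(logs)
    return math.exp(mu), math.exp(sigma)


# Hypothetical ages at onset in days (illustrative only, not study data).
ages = [4, 7, 9, 12, 15, 21, 30, 45]
median, dispersion = sartwell_fit(ages)
print(f"estimated median = {median:.1f} days, dispersion factor = {dispersion:.2f}")
```

Under this parameterization, roughly two thirds of onsets are expected between median/dispersion and median×dispersion, which is how a fitted curve can be compared against observed onset ages.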
18.  Identifying rare variants from exome scans: the GAW17 experience 
BMC Proceedings  2011;5(Suppl 9):S1.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
PMCID: PMC3287821  PMID: 22373325
19.  Stratify or adjust? Dealing with multiple populations when evaluating rare variants 
BMC Proceedings  2011;5(Suppl 9):S101.
The unrelated individuals sample from Genetic Analysis Workshop 17 consists of a small number of subjects from eight population samples and genetic data composed mostly of rare variants. We compare two simple approaches to collapsing rare variants within genes for their utility in identifying genes that affect phenotype. We also compare results from stratified analyses to those from a pooled analysis that uses ethnicity as a covariate. We found that the two collapsing approaches were similarly effective in identifying genes that contain causative variants in these data. However, including population as a covariate was not an effective substitute for analyzing the subpopulations separately when only one subpopulation contained a rare variant linked to the phenotype.
PMCID: PMC3287824  PMID: 22373399
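The abstract above does not say which two collapsing approaches were compared. As a rough sketch, the code below uses two common ones (a carrier indicator and a rare-allele count) and contrasts a pooled comparison with a per-population one. All function names, the MAF cutoff, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def rare_indices(mafs, cutoff=0.01):
    """Column indices of variants whose minor allele frequency is below cutoff."""
    return [j for j, f in enumerate(mafs) if f < cutoff]


def collapse_indicator(genotypes, mafs, cutoff=0.01):
    """1 if a subject carries any rare minor allele in the gene, else 0."""
    rare = rare_indices(mafs, cutoff)
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def collapse_count(genotypes, mafs, cutoff=0.01):
    """Total number of rare minor alleles a subject carries in the gene."""
    rare = rare_indices(mafs, cutoff)
    return [sum(row[j] for j in rare) for row in genotypes]


def carrier_effect(scores, phenotypes):
    """Mean phenotype of carriers minus non-carriers; None if a group is empty."""
    carriers = [y for s, y in zip(scores, phenotypes) if s > 0]
    others = [y for s, y in zip(scores, phenotypes) if s == 0]
    if not carriers or not others:
        return None
    return fmean(carriers) - fmean(others)


def stratified_effects(scores, phenotypes, populations):
    """Carrier effect computed separately within each population label."""
    groups = defaultdict(lambda: ([], []))
    for s, y, p in zip(scores, phenotypes, populations):
        groups[p][0].append(s)
        groups[p][1].append(y)
    return {p: carrier_effect(s, y) for p, (s, y) in groups.items()}


# Hypothetical data: rows are subjects, columns are variants in one gene.
genotypes = [[0, 1, 0], [0, 0, 1], [2, 0, 1], [0, 2, 0]]
mafs = [0.005, 0.20, 0.003]
phenotypes = [1.0, 2.5, 3.0, 1.2]
populations = ["A", "A", "A", "B"]

scores = collapse_indicator(genotypes, mafs)
print("pooled:", carrier_effect(scores, phenotypes))
print("stratified:", stratified_effects(scores, phenotypes, populations))
```

A population with no carriers returns `None` in the stratified output, which makes a variant private to one subpopulation visible rather than diluting it in the pooled comparison.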
20.  Linkage analysis merging replicate phenotypes: an application to three quantitative phenotypes in two African samples 
BMC Proceedings  2011;5(Suppl 9):S81.
We report two approaches for linkage analysis of data consisting of replicate phenotypes. The first approach is specifically designed for the unusual (in human data) replicate structure of the Genetic Analysis Workshop 17 pedigree data. The second approach consists of a standard linkage analysis that, although not specifically tailored to data consisting of replicate phenotypes, was envisioned as providing a sounding board against which our novel approach could be assessed. Both approaches are applied to the analysis of three quantitative phenotypes (Q1, Q2, and Q4) in two sets of African families. All analyses were carried out blind to the generating model (i.e., the “answers”). Using both methods, we found numerous significant linkage signals for Q1, although population colocalization was absent for most of these signals. The linkage analysis of Q2 and Q4 failed to reveal any strong linkage signals.
PMCID: PMC3287922  PMID: 22373343
21.  The CHRNA5-CHRNA3-CHRNB4 nicotinic receptor subunit gene cluster affects risk for nicotine dependence in African-Americans and in European-Americans 
Cancer research  2009;69(17):6848-6856.
Genetic association studies have demonstrated the importance of variants in the CHRNA5-CHRNA3-CHRNB4 cholinergic nicotinic receptor subunit gene cluster on chromosome 15q24-25.1 in risk for nicotine dependence, smoking, and lung cancer in populations of European descent. We have now carried out a detailed study of this region using dense genotyping in both European- and African-Americans.
We genotyped 75 known single-nucleotide-polymorphisms (SNPs) and one sequencing-discovered SNP in an African-American (AA) sample (N = 710) and European-American (EA) sample (N = 2062). Cases were nicotine-dependent and controls were non-dependent smokers.
The non-synonymous CHRNA5 SNP rs16969968 is the most significant SNP associated with nicotine dependence in the full sample of 2772 subjects (p = 4.49×10^−8, OR = 1.42 (1.25–1.61)) as well as in AAs only (p = 0.015, OR = 2.04 (1.15–3.62)) and EAs only (p = 4.14×10^−7, OR = 1.40 (1.23–1.59)). Other SNPs that have been shown to affect mRNA levels of CHRNA5 in EAs are associated with nicotine dependence in AAs but not in EAs. The CHRNA3 SNP rs578776, which has low correlation with rs16969968, is associated with nicotine dependence in EAs but not in AAs. Less common SNPs (frequency ≤ 5%) also are associated with nicotine dependence.
In summary, multiple variants in this gene cluster contribute to nicotine dependence risk, and some are also associated with functional effects on CHRNA5. The non-synonymous SNP rs16969968, a known risk variant in European-descent populations, is also significantly associated with risk in African-Americans. Additional SNPs contribute in distinct ways to risk in these two populations.
PMCID: PMC2874321  PMID: 19706762
genetic association; smoking; cholinergic nicotinic receptors; nicotinic acetylcholine receptors
22.  Multiple Independent Loci at Chromosome 15q25.1 Affect Smoking Quantity: a Meta-Analysis and Comparison with Lung Cancer and COPD 
PLoS Genetics  2010;6(8):e1001053.
Recently, genetic association findings for nicotine dependence, smoking behavior, and smoking-related diseases converged to implicate the chromosome 15q25.1 region, which includes the CHRNA5-CHRNA3-CHRNB4 cholinergic nicotinic receptor subunit genes. In particular, association with the nonsynonymous CHRNA5 SNP rs16969968 and correlates has been replicated in several independent studies. Extensive genotyping of this region has suggested additional statistically distinct signals for nicotine dependence, tagged by rs578776 and rs588765. One goal of the Consortium for the Genetic Analysis of Smoking Phenotypes (CGASP) is to elucidate the associations among these markers and dichotomous smoking quantity (heavy versus light smoking), lung cancer, and chronic obstructive pulmonary disease (COPD). We performed a meta-analysis across 34 datasets of European-ancestry subjects, including 38,617 smokers who were assessed for cigarettes-per-day, 7,700 lung cancer cases and 5,914 lung-cancer-free controls (all smokers), and 2,614 COPD cases and 3,568 COPD-free controls (all smokers). We demonstrate statistically independent associations of rs16969968 and rs588765 with smoking (mutually adjusted p-values < 10^−35 and < 10^−8, respectively). Because the risk alleles at these loci are negatively correlated, their association with smoking is stronger in the joint model than when each SNP is analyzed alone. Rs578776 also demonstrates association with smoking after adjustment for rs16969968 (p < 10^−6). In models adjusting for cigarettes-per-day, we confirm the association between rs16969968 and lung cancer (p < 10^−20) and observe a nominally significant association with COPD (p = 0.01); the other loci are not significantly associated with either lung cancer or COPD after adjusting for rs16969968. This study provides strong evidence that multiple statistically distinct loci in this region affect smoking behavior. This study is also the first report of association between rs588765 (and correlates) and smoking that achieves genome-wide significance; these SNPs have previously been associated with mRNA levels of CHRNA5 in brain and lung tissue.
Author Summary
Nicotine binds to cholinergic nicotinic receptors, which are composed of a variety of subunits. Genetic studies for smoking behavior and smoking-related diseases have implicated a genomic region that encodes the alpha5, alpha3, and beta4 subunits. We examined genetic data across this region for over 38,000 smokers, a subset of which had been assessed for lung cancer or chronic obstructive pulmonary disease. We demonstrate strong evidence that there are at least two statistically independent loci in this region that affect risk for heavy smoking. One of these loci represents a change in the protein structure of the alpha5 subunit. This work is also the first to report strong evidence of association between smoking and a group of genetic variants that are of biological interest because of their links to expression of the alpha5 cholinergic nicotinic receptor subunit gene. These advances in understanding the genetic influences on smoking behavior are important because of the profound public health burdens caused by smoking and nicotine addiction.
PMCID: PMC2916847  PMID: 20700436
23.  Power and false-positive rates for the restricted partition method (RPM) in a large candidate gene data set 
BMC Proceedings  2009;3(Suppl 7):S74.
Many phenotypes of public health importance (e.g., diabetes, coronary artery disease, major depression, obesity, and addictions to alcohol and nicotine) involve complex pathways of action. Interactions between genetic variants or between genetic variants and environmental factors likely play important roles in the functioning of these pathways. Unfortunately, complex interacting systems are likely to have important interacting factors that may not readily reveal themselves to univariate analyses. Instead, detecting the role of some of these factors may require analyses that are sensitive to interaction effects.
In this study, we evaluate the sensitivity and specificity of the restricted partition method (RPM) to detect signals related to coronary artery disease in the Genetic Analysis Workshop 16 Problem 3 data using the 50 k candidate gene single-nucleotide polymorphism set. Power and false-positive rates were evaluated using the first 100 replicate datasets. This included an exploration of the utility of using all genotyped family members compared with selecting one member per family.
PMCID: PMC2795976  PMID: 20018069
24.  The Genetic Analysis Workshop 16 Problem 3: simulation of heritable longitudinal cardiovascular phenotypes based on actual genome-wide single-nucleotide polymorphisms in the Framingham Heart Study 
BMC Proceedings  2009;3(Suppl 7):S4.
The Genetic Analysis Workshop (GAW) 16 Problem 3 comprises simulated phenotypes emulating the lipid domain and its contribution to cardiovascular disease risk. For each replication there were 6,476 subjects in families from the Framingham Heart Study (FHS), with their actual genotypes for Affymetrix 550 k single-nucleotide polymorphisms (SNPs) and simulated phenotypes. Phenotypes are simulated at three visits, 10 years apart. There are up to 6 "major" genes influencing variation in high- and low-density lipoprotein cholesterol (HDL, LDL), and triglycerides (TG), and 1,000 "polygenes" simulated for each trait. Some polygenes have pleiotropic effects. The locus-specific heritabilities of the major genes range from 0.1 to 1.0%, under additive, dominant, or overdominant modes of inheritance. The locus-specific effects of the polygenes range from 0.002 to 0.15%, with effect sizes selected from negative exponential distributions. All polygenes act independently and have additive effects. Individuals in the LDL upper tail were designated medicated. The proportion of medicated subjects increased across visits (2%, 5%, and 15%). Coronary artery calcification (CAC) was simulated using age, lipid levels, and CAC-specific polymorphisms. The risk of myocardial infarction before each visit was determined by CAC and its interactions with smoking and two genetic loci. Smoking was simulated to be commensurate with rates reported by the Centers for Disease Control. Two hundred replications were simulated.
PMCID: PMC2795938  PMID: 20018031
25.  A search for non-chromosome 6 susceptibility loci contributing to rheumatoid arthritis 
BMC Proceedings  2009;3(Suppl 7):S15.
We conducted a search for non-chromosome 6 genes that may increase risk for rheumatoid arthritis (RA). Our approach was to retrospectively ascertain three "extreme" subsamples from the North American Rheumatoid Arthritis Consortium. The three subsamples are: 1) RA cases who have two low-risk HLA-DRB1 alleles (N = 18), 2) RA cases who have two high-risk HLA-DRB1 alleles (N = 163), and 3) controls who have two low-risk HLA-DRB1 alleles (N = 652). We hypothesized that since Group 1's RA was likely due to non-HLA related risk factors, and because Group 3, by definition, is unaffected, comparing Group 1 with Group 2 and Group 1 with Group 3 would result in the identification of candidate susceptibility loci located outside of the MHC region. Accordingly, we restricted our search to the 21 non-chromosome 6 autosomes. The case-case comparison of Groups 1 and 2 resulted in the identification of 17 SNPs with allele frequencies that differed at p < 0.0001. The case-control comparison of Groups 1 and 3 identified 23 SNPs that differed in allele frequency at p < 0.0001. Eight of these SNPs (rs10498105, rs2398966, rs7664880, rs7447161, rs2793471, rs2611279, rs7967594, and rs742605) were common to both lists.
PMCID: PMC2795911  PMID: 20018004
