1.  A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness 
PLoS Genetics  2014;10(4):e1004234.
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally ‘unrelated’ individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results, we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First, SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors and the detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.
Author Summary
Every individual carries two copies of each chromosome (haplotypes), one from each of their parents, that consist of a long sequence of alleles. Modern genotyping technologies do not measure haplotypes directly, but the combined sum (or genotype) of alleles at each site. Statistical methods are needed to infer (or phase) the haplotypes from the observed genotypes. Haplotype estimation is a key first step of many disease and population genetic studies. Much recent work in this area has focused on phasing in cohorts of nominally unrelated individuals. So-called ‘long range phasing’ is a relatively recent concept for phasing individuals with intermediate levels of relatedness, such as cohorts taken from population isolates. Methods also exist for phasing genotypes for individuals within explicit pedigrees. Whilst high quality phasing techniques are available for each of these demographic scenarios, to date, no single method is applicable to all three. In this paper, we present a general approach for phasing cohorts that contain any level of relatedness between the study individuals. We demonstrate high levels of accuracy in all demographic scenarios, as well as the ability to detect (Mendelian consistent) genotyping errors and recombination events in duos and trios; this is the first method with such a capability.
doi:10.1371/journal.pgen.1004234
PMCID: PMC3990520  PMID: 24743097
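The switch error rate used above to compare phasing methods can be illustrated with a small sketch. This is not code from SHAPEIT2 or duoHMM; the function name `switch_error_rate` and the toy data are assumptions made for illustration. It counts how often the inferred phase flips relative to the truth between consecutive heterozygous sites, since only heterozygous sites carry phase information.

```python
def switch_error_rate(true_hap, inferred_hap, genotypes):
    """Fraction of consecutive heterozygous-site pairs whose phase is flipped.

    true_hap / inferred_hap: alleles (0 or 1) on the first haplotype of the pair.
    genotypes: allele counts per site (0, 1 or 2); only sites with genotype 1
    (heterozygous) are informative about phase.
    """
    het = [i for i, g in enumerate(genotypes) if g == 1]
    if len(het) < 2:
        return 0.0  # fewer than two heterozygous sites: no switch is possible
    # At each heterozygous site, does the inferred first haplotype match the truth?
    match = [true_hap[i] == inferred_hap[i] for i in het]
    # A switch occurs wherever that agreement changes between neighbouring sites.
    switches = sum(1 for a, b in zip(match, match[1:]) if a != b)
    return switches / (len(het) - 1)


# Toy example (made-up data): one switch among three consecutive het-site pairs.
genotypes = [1, 1, 0, 1, 2, 1]
truth = [0, 1, 0, 1, 1, 0]
inferred = [0, 1, 0, 0, 1, 1]
print(switch_error_rate(truth, inferred, genotypes))
```

Note that an inferred haplotype that is the exact complement of the truth at every heterozygous site scores zero switches, because the two haplotypes of an individual are unordered.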
2.  Distinct Loci in the CHRNA5/CHRNA3/CHRNB4 Gene Cluster Are Associated With Onset of Regular Smoking 
Stephens, Sarah H. | Hartz, Sarah M. | Hoft, Nicole R. | Saccone, Nancy L. | Corley, Robin C. | Hewitt, John K. | Hopfer, Christian J. | Breslau, Naomi | Coon, Hilary | Chen, Xiangning | Ducci, Francesca | Dueker, Nicole | Franceschini, Nora | Frank, Josef | Han, Younghun | Hansel, Nadia N. | Jiang, Chenhui | Korhonen, Tellervo | Lind, Penelope A. | Liu, Jason | Lyytikäinen, Leo-Pekka | Michel, Martha | Shaffer, John R. | Short, Susan E. | Sun, Juzhong | Teumer, Alexander | Thompson, John R. | Vogelzangs, Nicole | Vink, Jacqueline M. | Wenzlaff, Angela | Wheeler, William | Yang, Bao-Zhu | Aggen, Steven H. | Balmforth, Anthony J. | Baumeister, Sebastian E. | Beaty, Terri H. | Benjamin, Daniel J. | Bergen, Andrew W. | Broms, Ulla | Cesarini, David | Chatterjee, Nilanjan | Chen, Jingchun | Cheng, Yu-Ching | Cichon, Sven | Couper, David | Cucca, Francesco | Dick, Danielle | Foroud, Tatiana | Furberg, Helena | Giegling, Ina | Gillespie, Nathan A. | Gu, Fangyi | Hall, Alistair S. | Hällfors, Jenni | Han, Shizhong | Hartmann, Annette M. | Heikkilä, Kauko | Hickie, Ian B. | Hottenga, Jouke Jan | Jousilahti, Pekka | Kaakinen, Marika | Kähönen, Mika | Koellinger, Philipp D. | Kittner, Stephen | Konte, Bettina | Landi, Maria-Teresa | Laatikainen, Tiina | Leppert, Mark | Levy, Steven M. | Mathias, Rasika A. | McNeil, Daniel W. | Medland, Sarah E. | Montgomery, Grant W. | Murray, Tanda | Nauck, Matthias | North, Kari E. | Paré, Peter D. | Pergadia, Michele | Ruczinski, Ingo | Salomaa, Veikko | Viikari, Jorma | Willemsen, Gonneke | Barnes, Kathleen C. | Boerwinkle, Eric | Boomsma, Dorret I. | Caporaso, Neil | Edenberg, Howard J. | Francks, Clyde | Gelernter, Joel | Grabe, Hans Jörgen | Hops, Hyman | Jarvelin, Marjo-Riitta | Johannesson, Magnus | Kendler, Kenneth S. | Lehtimäki, Terho | Magnusson, Patrik K.E. | Marazita, Mary L. | Marchini, Jonathan | Mitchell, Braxton D. | Nöthen, Markus M. | Penninx, Brenda W. 
| Raitakari, Olli | Rietschel, Marcella | Rujescu, Dan | Samani, Nilesh J. | Schwartz, Ann G. | Shete, Sanjay | Spitz, Margaret | Swan, Gary E. | Völzke, Henry | Veijola, Juha | Wei, Qingyi | Amos, Chris | Cannon, Dale S. | Grucza, Richard | Hatsukami, Dorothy | Heath, Andrew | Johnson, Eric O. | Kaprio, Jaakko | Madden, Pamela | Martin, Nicholas G. | Stevens, Victoria L. | Weiss, Robert B. | Kraft, Peter | Bierut, Laura J. | Ehringer, Marissa A.
Genetic epidemiology  2013;37(8):846-859.
Neuronal nicotinic acetylcholine receptor (nAChR) genes (CHRNA5/CHRNA3/CHRNB4) have been reproducibly associated with nicotine dependence, smoking behaviors, and lung cancer risk. Of the few reports that have focused on early smoking behaviors, association results have been mixed. This meta-analysis examines early smoking phenotypes and SNPs in the gene cluster to determine: (1) whether the most robust association signal in this region (rs16969968) for other smoking behaviors is also associated with early behaviors, and/or (2) if additional statistically independent signals are important in early smoking. We focused on two phenotypes: age of tobacco initiation (AOI) and age of first regular tobacco use (AOS). This study included 56,034 subjects (41 groups) spanning nine countries and evaluated five SNPs including rs1948, rs16969968, rs578776, rs588765, and rs684513. Each dataset was analyzed using a centrally generated script. Meta-analyses were conducted from summary statistics. AOS yielded significant associations with SNPs rs578776 (beta = 0.02, P = 0.004), rs1948 (beta = 0.023, P = 0.018), and rs684513 (beta = 0.032, P = 0.017), indicating protective effects. There were no significant associations for the AOI phenotype. Importantly, rs16969968, the most replicated signal in this region for nicotine dependence, cigarettes per day, and cotinine levels, was not associated with AOI (P = 0.59) or AOS (P = 0.92). These results provide important insight into the complexity of smoking behavior phenotypes, and suggest that association signals in the CHRNA5/A3/B4 gene cluster affecting early smoking behaviors may be different from those affecting the mature nicotine dependence phenotype.
doi:10.1002/gepi.21760
PMCID: PMC3947535  PMID: 24186853
CHRNA5; CHRNA3; CHRNB4; meta-analysis; nicotine; smoke
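Combining per-dataset summary statistics, as described in the abstract above, can be sketched as follows. The abstract does not say which meta-analysis model was used, so the fixed-effect inverse-variance weighting here is an assumption, as are the function name `fixed_effect_meta` and the input numbers, which are made up and are not results from the study.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates; ses: their standard errors.
    Returns the pooled estimate, its standard error, the z score and a
    two-sided p-value from the normal distribution.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, z, p


# Hypothetical per-study estimates for a single SNP.
beta, se, z, p = fixed_effect_meta([0.02, 0.04], [0.01, 0.01])
print(beta, se, z, p)
```

Studies with smaller standard errors receive larger weights, so large cohorts dominate the pooled estimate.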
3.  Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples 
Science (New York, N.Y.)  2007;316(5829):1336-1341.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
doi:10.1126/science.1142364
PMCID: PMC3772310  PMID: 17463249
4.  Increased Genetic Vulnerability to Smoking at CHRNA5 in Early-Onset Smokers 
Hartz, Sarah M. | Short, Susan E. | Saccone, Nancy L. | Culverhouse, Robert | Chen, LiShiun | Schwantes-An, Tae-Hwi | Coon, Hilary | Han, Younghun | Stephens, Sarah H. | Sun, Juzhong | Chen, Xiangning | Ducci, Francesca | Dueker, Nicole | Franceschini, Nora | Frank, Josef | Geller, Frank | Guđbjartsson, Daniel | Hansel, Nadia N. | Jiang, Chenhui | Keskitalo-Vuokko, Kaisu | Liu, Zhen | Lyytikäinen, Leo-Pekka | Michel, Martha | Rawal, Rajesh | Hum, Sc | Rosenberger, Albert | Scheet, Paul | Shaffer, John R. | Teumer, Alexander | Thompson, John R. | Vink, Jacqueline M. | Vogelzangs, Nicole | Wenzlaff, Angela S. | Wheeler, William | Xiao, Xiangjun | Yang, Bao-Zhu | Aggen, Steven H. | Balmforth, Anthony J. | Baumeister, Sebastian E. | Beaty, Terri | Bennett, Siiri | Bergen, Andrew W. | Boyd, Heather A. | Broms, Ulla | Campbell, Harry | Chatterjee, Nilanjan | Chen, Jingchun | Cheng, Yu-Ching | Cichon, Sven | Couper, David | Cucca, Francesco | Dick, Danielle M. | Foroud, Tatiana | Furberg, Helena | Giegling, Ina | Gu, Fangyi | Hall, Alistair S. | Hällfors, Jenni | Han, Shizhong | Hartmann, Annette M. | Hayward, Caroline | Heikkilä, Kauko | Lic, Phil | Hewitt, John K. | Hottenga, Jouke Jan | Jensen, Majken K. | Jousilahti, Pekka | Kaakinen, Marika | Kittner, Steven J. | Konte, Bettina | Korhonen, Tellervo | Landi, Maria-Teresa | Laatikainen, Tiina | Leppert, Mark | Levy, Steven M. | Mathias, Rasika A. | McNeil, Daniel W. | Medland, Sarah E. | Montgomery, Grant W. | Muley, Thomas | Murray, Tanda | Nauck, Matthias | North, Kari | Pergadia, Michele | Polasek, Ozren | Ramos, Erin M. | Ripatti, Samuli | Risch, Angela | Ruczinski, Ingo | Rudan, Igor | Salomaa, Veikko | Schlessinger, David | Styrkársdóttir, Unnur | Terracciano, Antonio | Uda, Manuela | Willemsen, Gonneke | Wu, Xifeng | Abecasis, Goncalo | Barnes, Kathleen | Bickeböller, Heike | Boerwinkle, Eric | Boomsma, Dorret I. | Caporaso, Neil | Duan, Jubao | Edenberg, Howard J. | Francks, Clyde | Gejman, Pablo V. 
| Gelernter, Joel | Grabe, Hans Jörgen | Hops, Hyman | Jarvelin, Marjo-Riitta | Viikari, Jorma | Kähönen, Mika | Kendler, Kenneth S. | Lehtimäki, Terho | Levinson, Douglas F. | Marazita, Mary L. | Marchini, Jonathan | Melbye, Mads | Mitchell, Braxton D. | Murray, Jeffrey C. | Nöthen, Markus M. | Penninx, Brenda W. | Raitakari, Olli | Rietschel, Marcella | Rujescu, Dan | Samani, Nilesh J. | Sanders, Alan R. | Schwartz, Ann G. | Shete, Sanjay | Shi, Jianxin | Spitz, Margaret | Stefansson, Kari | Swan, Gary E. | Thorgeirsson, Thorgeir | Völzke, Henry | Wei, Qingyi | Wichmann, H.-Erich | Amos, Christopher I. | Breslau, Naomi | Cannon, Dale S. | Ehringer, Marissa | Grucza, Richard | Hatsukami, Dorothy | Heath, Andrew | Johnson, Eric O. | Kaprio, Jaakko | Madden, Pamela | Martin, Nicholas G. | Stevens, Victoria L. | Stitzel, Jerry A. | Weiss, Robert B. | Kraft, Peter | Bierut, Laura J.
Archives of general psychiatry  2012;69(8):854-860.
Context
Recent studies have shown an association between cigarettes per day (CPD) and a nonsynonymous single-nucleotide polymorphism in CHRNA5, rs16969968.
Objective
To determine whether the association between rs16969968 and smoking is modified by age at onset of regular smoking.
Data Sources
Primary data.
Study Selection
Available genetic studies containing measures of CPD and the genotype of rs16969968 or its proxy.
Data Extraction
Uniform statistical analysis scripts were run locally. Starting with 94 050 ever-smokers from 43 studies, we extracted the heavy smokers (CPD >20) and light smokers (CPD ≤10) with age-at-onset information, reducing the sample size to 33 348. Each study was stratified into early-onset smokers (age at onset ≤16 years) and late-onset smokers (age at onset >16 years), and a logistic regression of heavy vs light smoking with the rs16969968 genotype was computed for each stratum. Meta-analysis was performed within each age-at-onset stratum.
Data Synthesis
Individuals with 1 risk allele at rs16969968 who were early-onset smokers were significantly more likely to be heavy smokers in adulthood (odds ratio [OR]=1.45; 95% CI, 1.36–1.55; n=13 843) than were carriers of the risk allele who were late-onset smokers (OR = 1.27; 95% CI, 1.21–1.33, n = 19 505) (P = .01).
Conclusion
These results highlight an increased genetic vulnerability to smoking in early-onset smokers.
doi:10.1001/archgenpsychiatry.2012.124
PMCID: PMC3482121  PMID: 22868939
5.  Fast and accurate genotype imputation in genome-wide association studies through pre-phasing 
Nature genetics  2012;44(8):955-959.
Sequencing efforts, including the 1000 Genomes Project and disease-specific efforts, are producing large collections of haplotypes that can be used for genotype imputation in genome-wide association studies (GWAS). Imputing from these reference panels can help identify new risk alleles, but the use of large panels with existing methods imposes a high computational burden. To keep imputation broadly accessible, we introduce a strategy called “pre-phasing” that maintains the accuracy of leading methods while cutting computational costs by orders of magnitude. In brief, we first statistically estimate the haplotypes for each GWAS individual (“pre-phasing”) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because: (i) the GWAS samples must be phased only once, whereas standard methods would implicitly re-phase with each reference panel update; (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match unphased GWAS genotypes to a pair of reference haplotypes. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
doi:10.1038/ng.2354
PMCID: PMC3696580  PMID: 22820512
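Point (ii) of the abstract above can be made concrete with a rough count. This sketch only counts candidate matches per site under a simplifying assumption: a phased haplotype is compared with each of N reference haplotypes, while an unphased genotype is compared with each ordered pair of them. The function name `candidate_matches` is hypothetical, and real imputation software uses further approximations, so the numbers indicate scaling only, not measured run time.

```python
def candidate_matches(n_ref, prephased):
    """Candidate reference matches per site for one diploid individual.

    prephased=True: each of the two estimated haplotypes is matched against
    one reference haplotype at a time (2 * n_ref candidates).
    prephased=False: the unphased genotype is matched against an ordered
    pair of reference haplotypes (n_ref ** 2 candidates).
    """
    if n_ref < 1:
        raise ValueError("n_ref must be a positive number of reference haplotypes")
    return 2 * n_ref if prephased else n_ref ** 2


for n_ref in (100, 1000, 10000):
    with_prephasing = candidate_matches(n_ref, prephased=True)
    without = candidate_matches(n_ref, prephased=False)
    print(n_ref, with_prephasing, without, without / with_prephasing)
```

Under this assumption the gap grows linearly with the panel size, which is consistent with the abstract's point that pre-phasing matters more as reference panels grow.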
6.  Meta-analysis and imputation refines the association of 15q25 with smoking quantity 
Liu, Jason Z. | Tozzi, Federica | Waterworth, Dawn M. | Pillai, Sreekumar G. | Muglia, Pierandrea | Middleton, Lefkos | Berrettini, Wade | Knouff, Christopher W. | Yuan, Xin | Waeber, Gérard | Vollenweider, Peter | Preisig, Martin | Wareham, Nicholas J | Zhao, Jing Hua | Loos, Ruth J.F. | Barroso, Inês | Khaw, Kay-Tee | Grundy, Scott | Barter, Philip | Mahley, Robert | Kesaniemi, Antero | McPherson, Ruth | Vincent, John B. | Strauss, John | Kennedy, James L. | Farmer, Anne | McGuffin, Peter | Day, Richard | Matthews, Keith | Bakke, Per | Gulsvik, Amund | Lucae, Susanne | Ising, Marcus | Brueckl, Tanja | Horstmann, Sonja | Wichmann, H.-Erich | Rawal, Rajesh | Dahmen, Norbert | Lamina, Claudia | Polasek, Ozren | Zgaga, Lina | Huffman, Jennifer | Campbell, Susan | Kooner, Jaspal | Chambers, John C | Burnett, Mary Susan | Devaney, Joseph M. | Pichard, Augusto D. | Kent, Kenneth M. | Satler, Lowell | Lindsay, Joseph M. | Waksman, Ron | Epstein, Stephen | Wilson, James F. | Wild, Sarah H. | Campbell, Harry | Vitart, Veronique | Reilly, Muredach P. | Li, Mingyao | Qu, Liming | Wilensky, Robert | Matthai, William | Hakonarson, Hakon H. | Rader, Daniel J. | Franke, Andre | Wittig, Michael | Schäfer, Arne | Uda, Manuela | Terracciano, Antonio | Xiao, Xiangjun | Busonero, Fabio | Scheet, Paul | Schlessinger, David | St Clair, David | Rujescu, Dan | Abecasis, Gonçalo R. | Grabe, Hans Jörgen | Teumer, Alexander | Völzke, Henry | Petersmann, Astrid | John, Ulrich | Rudan, Igor | Hayward, Caroline | Wright, Alan F. | Kolcic, Ivana | Wright, Benjamin J | Thompson, John R | Balmforth, Anthony J. | Hall, Alistair S. | Samani, Nilesh J. | Anderson, Carl A. | Ahmad, Tariq | Mathew, Christopher G. | Parkes, Miles | Satsangi, Jack | Caulfield, Mark | Munroe, Patricia B. | Farrall, Martin | Dominiczak, Anna | Worthington, Jane | Thomson, Wendy | Eyre, Steve | Barton, Anne | Mooser, Vincent | Francks, Clyde | Marchini, Jonathan
Nature genetics  2010;42(5):436-440.
Smoking is a leading global cause of disease and mortality [1]. We performed a genome-wide meta-analytic association study of smoking-related behavioral traits in a total sample of 41,150 individuals drawn from 20 disease, population, and control cohorts. Our analysis confirmed an effect on smoking quantity (SQ) at a locus on 15q25 (P = 9.45×10^−19) that includes three genes encoding neuronal nicotinic acetylcholine receptor subunits (CHRNA5, CHRNA3, CHRNB4). We used data from the 1000 Genomes project to investigate the region using imputation, which allowed analysis of virtually all common variants in the region and offered a five-fold increase in coverage over the HapMap. This increased the spectrum of potentially causal single nucleotide polymorphisms (SNPs), which included a novel SNP that showed the highest significance, rs55853698, located within the promoter region of CHRNA5. Conditional analysis also identified a secondary locus (rs6495308) in CHRNA3.
doi:10.1038/ng.572
PMCID: PMC3612983  PMID: 20418889
7.  Genome-Wide Joint Meta-Analysis of SNP and SNP-by-Smoking Interaction Identifies Novel Loci for Pulmonary Function 
Hancock, Dana B. | Artigas, María Soler | Gharib, Sina A. | Henry, Amanda | Manichaikul, Ani | Ramasamy, Adaikalavan | Loth, Daan W. | Imboden, Medea | Koch, Beate | McArdle, Wendy L. | Smith, Albert V. | Smolonska, Joanna | Sood, Akshay | Tang, Wenbo | Wilk, Jemma B. | Zhai, Guangju | Zhao, Jing Hua | Aschard, Hugues | Burkart, Kristin M. | Curjuric, Ivan | Eijgelsheim, Mark | Elliott, Paul | Gu, Xiangjun | Harris, Tamara B. | Janson, Christer | Homuth, Georg | Hysi, Pirro G. | Liu, Jason Z. | Loehr, Laura R. | Lohman, Kurt | Loos, Ruth J. F. | Manning, Alisa K. | Marciante, Kristin D. | Obeidat, Ma'en | Postma, Dirkje S. | Aldrich, Melinda C. | Brusselle, Guy G. | Chen, Ting-hsu | Eiriksdottir, Gudny | Franceschini, Nora | Heinrich, Joachim | Rotter, Jerome I. | Wijmenga, Cisca | Williams, O. Dale | Bentley, Amy R. | Hofman, Albert | Laurie, Cathy C. | Lumley, Thomas | Morrison, Alanna C. | Joubert, Bonnie R. | Rivadeneira, Fernando | Couper, David J. | Kritchevsky, Stephen B. | Liu, Yongmei | Wjst, Matthias | Wain, Louise V. | Vonk, Judith M. | Uitterlinden, André G. | Rochat, Thierry | Rich, Stephen S. | Psaty, Bruce M. | O'Connor, George T. | North, Kari E. | Mirel, Daniel B. | Meibohm, Bernd | Launer, Lenore J. | Khaw, Kay-Tee | Hartikainen, Anna-Liisa | Hammond, Christopher J. | Gläser, Sven | Marchini, Jonathan | Kraft, Peter | Wareham, Nicholas J. | Völzke, Henry | Stricker, Bruno H. C. | Spector, Timothy D. | Probst-Hensch, Nicole M. | Jarvis, Deborah | Jarvelin, Marjo-Riitta | Heckbert, Susan R. | Gudnason, Vilmundur | Boezen, H. Marike | Barr, R. Graham | Cassano, Patricia A. | Strachan, David P. | Fornage, Myriam | Hall, Ian P. | Dupuis, Josée | Tobin, Martin D. | London, Stephanie J.
PLoS Genetics  2012;8(12):e1003098.
Genome-wide association studies have identified numerous genetic loci for spirometric measures of pulmonary function, forced expiratory volume in one second (FEV1), and its ratio to forced vital capacity (FEV1/FVC). Given that cigarette smoking adversely affects pulmonary function, we conducted genome-wide joint meta-analyses (JMA) of single nucleotide polymorphism (SNP) and SNP-by-smoking (ever-smoking or pack-years) associations on FEV1 and FEV1/FVC across 19 studies (total N = 50,047). We identified three novel loci not previously associated with pulmonary function. SNPs in or near DNER (smallest P_JMA = 5.00×10^−11), HLA-DQB1 and HLA-DQA2 (smallest P_JMA = 4.35×10^−9), and KCNJ2 and SOX9 (smallest P_JMA = 1.28×10^−8) were associated with FEV1/FVC or FEV1 in meta-analysis models including SNP main effects, smoking main effects, and SNP-by-smoking (ever-smoking or pack-years) interaction. The HLA region has been widely implicated for autoimmune and lung phenotypes, unlike the other novel loci, which have not been widely implicated. We evaluated DNER, KCNJ2, and SOX9 and found them to be expressed in human lung tissue. DNER and SOX9 further showed evidence of differential expression in human airway epithelium in smokers compared to non-smokers. Our findings demonstrated that joint testing of SNP and SNP-by-environment interaction identified novel loci associated with complex traits that are missed when considering only the genetic main effects.
Author Summary
Measures of pulmonary function provide important clinical tools for evaluating lung disease and its progression. Genome-wide association studies have identified numerous genetic risk factors for pulmonary function but have not considered interaction with cigarette smoking, which has consistently been shown to adversely impact pulmonary function. In over 50,000 study participants of European descent, we applied a recently developed joint meta-analysis method to simultaneously test associations of gene and gene-by-smoking interactions in relation to two major clinical measures of pulmonary function. Using this joint method to incorporate genetic main effects plus gene-by-smoking interaction, we identified three novel gene regions not previously related to pulmonary function: (1) DNER, (2) HLA-DQB1 and HLA-DQA2, and (3) KCNJ2 and SOX9. Expression analyses in human lung tissue from ours or prior studies indicate that these regions contain genes that are plausibly involved in pulmonary function. This work highlights the utility of employing novel methods for incorporating environmental interaction in genome-wide association studies to identify novel genetic regions.
doi:10.1371/journal.pgen.1003098
PMCID: PMC3527213  PMID: 23284291
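Testing a SNP main effect and a SNP-by-smoking interaction jointly can be sketched as a two-degree-of-freedom Wald test. The abstract does not give the exact statistic used in the joint meta-analysis, so this form is an assumption, as are the function name `joint_test_2df` and the example numbers. The inputs are the two coefficient estimates plus their variances and covariance, for example taken from a fitted regression model.

```python
import math


def joint_test_2df(b_snp, b_int, var_snp, var_int, cov):
    """Two-degree-of-freedom Wald test of (SNP effect, SNP-by-smoking effect).

    Computes b' V^{-1} b for b = (b_snp, b_int) and the 2x2 covariance
    matrix V = [[var_snp, cov], [cov, var_int]], then the chi-square p-value.
    """
    det = var_snp * var_int - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form quadratic form using the inverse of a 2x2 matrix.
    stat = (b_snp ** 2 * var_int - 2 * b_snp * b_int * cov + b_int ** 2 * var_snp) / det
    # For a chi-square distribution with 2 df, the survival function is exp(-x / 2).
    p = math.exp(-stat / 2.0)
    return stat, p


# Hypothetical estimates: neither effect alone is extreme, but jointly they are.
stat, p = joint_test_2df(b_snp=0.3, b_int=0.4, var_snp=0.01, var_int=0.01, cov=0.0)
print(stat, p)
```

A joint test of this kind can pick up loci whose signal is split between the main effect and the interaction, which is the situation the abstract describes as being missed by main-effect-only scans.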
8.  Bayesian Hierarchical Mixture Modelling to Assign Copy Number from a targeted CNV array 
Genetic epidemiology  2011;35(6):536-548.
Accurate assignment of copy number at known copy number variant (CNV) loci is important both for increasing understanding of the structural evolution of genomes and for carrying out association studies of copy number with disease. As with calling SNP genotypes, the task can be framed as a clustering problem, but for a number of reasons assigning copy number is much more challenging. CNV assays have lower signal-to-noise ratios than SNP assays, often display heavy-tailed and asymmetric intensity distributions, contain outlying observations and may exhibit systematic technical differences among different cohorts. In addition, the number of copy-number classes at a CNV in the population may be unknown a priori. Due to these complications, automatic and robust assignment of copy number from array data remains a challenging problem. We have developed a copy number assignment algorithm, CNVCALL, for a targeted CNV array, such as that used by the Wellcome Trust Case Control Consortium’s recent CNV association study. We use a Bayesian hierarchical mixture model that robustly identifies both the number of different copy number classes at a specific locus and the relative copy number for each individual in the sample. This approach is fully automated, which is a critical requirement when analysing large numbers of CNVs. We illustrate the method's performance using real data from the WTCCC’s CNV association study and using simulated data.
doi:10.1002/gepi.20604
PMCID: PMC3159791  PMID: 21769931
9.  HAPGEN2: simulation of multiple disease SNPs 
Bioinformatics  2011;27(16):2304-2305.
Motivation: Performing experiments with simulated data is an inexpensive approach to evaluating competing experimental designs and analysis methods in genome-wide association studies. Simulation based on resampling known haplotypes is fast and efficient and can produce samples with patterns of linkage disequilibrium (LD), which mimic those in real data. However, the inability of current methods to simulate multiple nearby disease SNPs on the same chromosome can limit their application.
Results: We introduce a new simulation algorithm based on a successful resampling method, HAPGEN, that can simulate multiple nearby disease SNPs on the same chromosome. The new method, HAPGEN2, retains many advantages of resampling methods and expands the range of disease models that current simulators offer.
Availability: HAPGEN2 is freely available from http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html.
Contact: zhan@well.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr341
PMCID: PMC3150040  PMID: 21653516
10.  Genome-wide association and large scale follow-up identifies 16 new loci influencing lung function 
Artigas, María Soler | Loth, Daan W | Wain, Louise V | Gharib, Sina A | Obeidat, Ma’en | Tang, Wenbo | Zhai, Guangju | Zhao, Jing Hua | Smith, Albert Vernon | Huffman, Jennifer E | Albrecht, Eva | Jackson, Catherine M | Evans, David M | Cadby, Gemma | Fornage, Myriam | Manichaikul, Ani | Lopez, Lorna M | Johnson, Toby | Aldrich, Melinda C | Aspelund, Thor | Barroso, Inês | Campbell, Harry | Cassano, Patricia A | Couper, David J | Eiriksdottir, Gudny | Franceschini, Nora | Garcia, Melissa | Gieger, Christian | Gislason, Gauti Kjartan | Grkovic, Ivica | Hammond, Christopher J | Hancock, Dana B | Harris, Tamara B | Ramasamy, Adaikalavan | Heckbert, Susan R | Heliövaara, Markku | Homuth, Georg | Hysi, Pirro G | James, Alan L | Jankovic, Stipan | Joubert, Bonnie R | Karrasch, Stefan | Klopp, Norman | Koch, Beate | Kritchevsky, Stephen B | Launer, Lenore J | Liu, Yongmei | Loehr, Laura R | Lohman, Kurt | Loos, Ruth JF | Lumley, Thomas | Al Balushi, Khalid A | Ang, Wei Q | Barr, R Graham | Beilby, John | Blakey, John D | Boban, Mladen | Boraska, Vesna | Brisman, Jonas | Britton, John R | Brusselle, Guy G | Cooper, Cyrus | Curjuric, Ivan | Dahgam, Santosh | Deary, Ian J | Ebrahim, Shah | Eijgelsheim, Mark | Francks, Clyde | Gaysina, Darya | Granell, Raquel | Gu, Xiangjun | Hankinson, John L | Hardy, Rebecca | Harris, Sarah E | Henderson, John | Henry, Amanda | Hingorani, Aroon D | Hofman, Albert | Holt, Patrick G | Hui, Jennie | Hunter, Michael L | Imboden, Medea | Jameson, Karen A | Kerr, Shona M | Kolcic, Ivana | Kronenberg, Florian | Liu, Jason Z | Marchini, Jonathan | McKeever, Tricia | Morris, Andrew D | Olin, Anna-Carin | Porteous, David J | Postma, Dirkje S | Rich, Stephen S | Ring, Susan M | Rivadeneira, Fernando | Rochat, Thierry | Sayer, Avan Aihie | Sayers, Ian | Sly, Peter D | Smith, George Davey | Sood, Akshay | Starr, John M | Uitterlinden, André G | Vonk, Judith M | Wannamethee, S Goya | Whincup, Peter H | Wijmenga, Cisca | Williams, O Dale | Wong, Andrew | 
Mangino, Massimo | Marciante, Kristin D | McArdle, Wendy L | Meibohm, Bernd | Morrison, Alanna C | North, Kari E | Omenaas, Ernst | Palmer, Lyle J | Pietiläinen, Kirsi H | Pin, Isabelle | Polašek, Ozren | Pouta, Anneli | Psaty, Bruce M | Hartikainen, Anna-Liisa | Rantanen, Taina | Ripatti, Samuli | Rotter, Jerome I | Rudan, Igor | Rudnicka, Alicja R | Schulz, Holger | Shin, So-Youn | Spector, Tim D | Surakka, Ida | Vitart, Veronique | Völzke, Henry | Wareham, Nicholas J | Warrington, Nicole M | Wichmann, H-Erich | Wild, Sarah H | Wilk, Jemma B | Wjst, Matthias | Wright, Alan F | Zgaga, Lina | Zemunik, Tatijana | Pennell, Craig E | Nyberg, Fredrik | Kuh, Diana | Holloway, John W | Boezen, H Marike | Lawlor, Debbie A | Morris, Richard W | Probst-Hensch, Nicole | Kaprio, Jaakko | Wilson, James F | Hayward, Caroline | Kähönen, Mika | Heinrich, Joachim | Musk, Arthur W | Jarvis, Deborah L | Gläser, Sven | Järvelin, Marjo-Riitta | Stricker, Bruno H Ch | Elliott, Paul | O’Connor, George T | Strachan, David P | London, Stephanie J | Hall, Ian P | Gudnason, Vilmundur | Tobin, Martin D
Nature Genetics  2011;43(11):1082-1090.
Pulmonary function measures reflect respiratory health and predict mortality, and are used in the diagnosis of chronic obstructive pulmonary disease (COPD). We tested genome-wide association with the forced expiratory volume in 1 second (FEV1) and the ratio of FEV1 to forced vital capacity (FVC) in 48,201 individuals of European ancestry, with follow-up of top associations in up to an additional 46,411 individuals. We identified new regions showing association (combined P < 5×10^−8) with pulmonary function, in or near MFAP2, TGFB2, HDAC4, RARB, MECOM (EVI1), SPATA9, ARMC2, NCR3, ZKSCAN3, CDC123, C10orf11, LRP1, CCDC38, MMP15, CFDP1, and KCNE2. Identification of these 16 new loci may provide insight into the molecular mechanisms regulating pulmonary function and into molecular targets for future therapy to alleviate reduced lung function.
doi:10.1038/ng.941
PMCID: PMC3267376  PMID: 21946350
11.  Effect of Five Genetic Variants Associated with Lung Function on the Risk of Chronic Obstructive Lung Disease, and Their Joint Effects on Lung Function 
Rationale: Genomic loci are associated with FEV1 or the ratio of FEV1 to FVC in population samples, but their association with chronic obstructive pulmonary disease (COPD) has not yet been proven, nor have their combined effects on lung function and COPD been studied.
Objectives: To test association with COPD of variants at five loci (TNS1, GSTCD, HTR4, AGER, and THSD4) and to evaluate joint effects on lung function and COPD of these single-nucleotide polymorphisms (SNPs), and variants at the previously reported locus near HHIP.
Methods: By sampling from 12 population-based studies (n = 31,422), we obtained genotype data on 3,284 COPD case subjects and 17,538 control subjects for sentinel SNPs in TNS1, GSTCD, HTR4, AGER, and THSD4. In 24,648 individuals (including 2,890 COPD case subjects and 13,862 control subjects), we additionally obtained genotypes for rs12504628 near HHIP. Each allele associated with lung function decline at these six SNPs contributed to a risk score. We studied the association of the risk score to lung function and COPD.
Measurements and Main Results: Association with COPD was significant for three loci (TNS1, GSTCD, and HTR4) and the previously reported HHIP locus, and suggestive and directionally consistent for AGER and THSD4. Compared with the baseline group (7 risk alleles), carrying 10–12 risk alleles was associated with a reduction in FEV1 (β = –72.21 ml, P = 3.90 × 10^−4) and FEV1/FVC (β = –1.53%, P = 6.35 × 10^−6), and with COPD (odds ratio = 1.63, P = 1.46 × 10^−5).
Conclusions: Variants in TNS1, GSTCD, and HTR4 are associated with COPD. Our highest risk score category was associated with a 1.6-fold higher COPD risk than the population average score.
doi:10.1164/rccm.201102-0192OC
PMCID: PMC3398416  PMID: 21965014
FEV1; FVC; genome-wide association study; modeling risk
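The risk score described in the Methods above, in which each lung-function-lowering allele at the six SNPs contributes to the score, can be sketched as an unweighted allele count. The abstract does not state whether any weighting was applied, so the simple sum is an assumption, as are the names `RISK_LOCI` and `risk_score` and the example individual.

```python
# The six loci named in the abstract; one sentinel SNP per locus is assumed.
RISK_LOCI = ["TNS1", "GSTCD", "HTR4", "AGER", "THSD4", "HHIP"]


def risk_score(risk_allele_counts):
    """Sum the number of risk alleles (0, 1 or 2) carried at each locus.

    risk_allele_counts: dict mapping locus name to risk-allele count.
    With six biallelic SNPs the score ranges from 0 to 12.
    """
    total = 0
    for locus in RISK_LOCI:
        count = risk_allele_counts[locus]
        if count not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count at {locus}: {count}")
        total += count
    return total


# Hypothetical individual carrying 7 risk alleles (the baseline group in the abstract).
person = {"TNS1": 2, "GSTCD": 1, "HTR4": 1, "AGER": 0, "THSD4": 2, "HHIP": 1}
print(risk_score(person))  # prints 7
```

Individuals can then be grouped by score (for example 10 to 12 risk alleles versus the 7-allele baseline) before comparing lung function or COPD status between groups.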
12.  Genotype Imputation with Thousands of Genomes 
G3: Genes|Genomes|Genetics  2011;1(6):457-470.
Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package.
doi:10.1534/g3.111.001198
PMCID: PMC3276165  PMID: 22384356
GWAS; reference panel; haplotype; linkage disequilibrium; human
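As a rough illustration of the idea in the abstract above (choosing a custom reference panel for each study haplotype by local sequence similarity), the following minimal sketch ranks reference haplotypes by Hamming distance within one region and keeps the closest k. The function names, the toy data, and the use of Hamming distance as the similarity measure are assumptions made for this example; this is not the IMPUTE2 implementation.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same set of sites")
    return sum(x != y for x, y in zip(a, b))


def select_local_panel(study_hap: Sequence[int],
                       reference_haps: List[Sequence[int]],
                       k: int) -> List[int]:
    """Return indices of the k reference haplotypes closest to study_hap.

    Similarity is measured by Hamming distance (an assumption of this
    sketch); ties are broken by the lower reference index.
    """
    order = sorted(
        range(len(reference_haps)),
        key=lambda i: (hamming(study_hap, reference_haps[i]), i),
    )
    return order[:k]


# Hypothetical toy data: alleles coded 0/1 at six sites in one region.
reference = [
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
]
study = [0, 0, 1, 1, 0, 1]

panel = select_local_panel(study, reference, k=2)
print(panel)
```

In a real analysis the selection would presumably be repeated per region and per study haplotype, so that each haplotype gets its own locally chosen panel.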
13.  Multiple Common Susceptibility Variants near BMP Pathway Loci GREM1, BMP4, and BMP2 Explain Part of the Missing Heritability of Colorectal Cancer 
PLoS Genetics  2011;7(6):e1002105.
Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP) pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk. To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1 (15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P = 3.93×10⁻¹⁰) and BMP2 (rs4813802, P = 4.65×10⁻¹¹). Near GREM1, we found using fine-mapping that the previously identified association between tagSNP rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P = 5.33×10⁻⁸) and rs11632715 (P = 2.30×10⁻¹⁰). As low-penetrance predisposition variants become harder to identify (owing to small effect sizes and/or low risk allele frequencies), approaches based on informed candidate gene selection may become increasingly attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing heritability of common diseases.
Author Summary
Genome-wide association studies (GWAS) have identified several colorectal cancer (CRC) susceptibility polymorphisms near genes that encode proteins in the bone morphogenetic protein (BMP) pathway. However, most of the inherited susceptibility to CRC remains unexplained. We investigated three of the best candidate BMP genes (GREM1, BMP4, and BMP2) for additional polymorphisms associated with CRC. By extensive validation of polymorphisms with only modest evidence of association in the initial phases of the GWAS, we identified new, independent CRC predisposition polymorphisms close to BMP4 (rs1957636) and BMP2 (rs4813802). Near GREM1, we used additional genotyping around the GWAS-identified polymorphism rs4779584 to demonstrate two independent signals represented by rs16969681 and rs11632715. Common genes with modest effects on disease risk are becoming harder to identify, and approaches based on informed candidate gene selection may become increasingly attractive. In addition, genetic fine mapping around polymorphisms identified in GWAS can deconvolute associations which have arisen owing to two independent functional variants. These types of study can identify some of the apparently missing heritability of common disease.
doi:10.1371/journal.pgen.1002105
PMCID: PMC3107194  PMID: 21655089
14.  The effect of genome-wide association scan quality control on imputation outcome for common variants 
Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Directly typed SNP quality control (QC) is thought to affect imputation quality. It is, therefore, common practice to use quality-controlled (QCed) data as an input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset consisting of 3177 samples with Illumina 610k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps balancing elimination of poorly imputed SNPs and information loss. We found that the difference between the unQCed data and the fully QCed data on imputation outcome was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low frequency and rare variants.
doi:10.1038/ejhg.2010.242
PMCID: PMC3083623  PMID: 21267008
genome-wide association study; imputation; quality control; single nucleotide polymorphism
15.  Genome-wide and fine-resolution association analysis of malaria in West Africa 
Jallow, Muminatou | Teo, Yik Ying | Small, Kerrin S | Rockett, Kirk A | Deloukas, Panos | Clark, Taane G | Kivinen, Katja | Bojang, Kalifa A | Conway, David J | Pinder, Margaret | Sirugo, Giorgio | Sisay-Joof, Fatou | Usen, Stanley | Auburn, Sarah | Bumpstead, Suzannah J | Campino, Susana | Coffey, Alison | Dunham, Andrew | Fry, Andrew E | Green, Angela | Gwilliam, Rhian | Hunt, Sarah E | Inouye, Michael | Jeffreys, Anna E | Mendy, Alieu | Palotie, Aarno | Potter, Simon | Ragoussis, Jiannis | Rogers, Jane | Rowlands, Kate | Somaskantharajah, Elilan | Whittaker, Pamela | Widden, Claire | Donnelly, Peter | Howie, Bryan | Marchini, Jonathan | Morris, Andrew | SanJoaquin, Miguel | Achidi, Eric Akum | Agbenyega, Tsiri | Allen, Angela | Amodu, Olukemi | Corran, Patrick | Djimde, Abdoulaye | Dolo, Amagana | Doumbo, Ogobara K | Drakeley, Chris | Dunstan, Sarah | Evans, Jennifer | Farrar, Jeremy | Fernando, Deepika | Hien, Tran Tinh | Horstmann, Rolf D | Ibrahim, Muntaser | Karunaweera, Nadira | Kokwaro, Gilbert | Koram, Kwadwo A | Lemnge, Martha | Makani, Julie | Marsh, Kevin | Michon, Pascal | Modiano, David | Molyneux, Malcolm E | Mueller, Ivo | Parker, Michael | Peshu, Norbert | Plowe, Christopher V | Puijalon, Odile | Reeder, John | Reyburn, Hugh | Riley, Eleanor M | Sakuntabhai, Anavaj | Singhasivanon, Pratap | Sirima, Sodiomon | Tall, Adama | Taylor, Terrie E | Thera, Mahamadou | Troye-Blomberg, Marita | Williams, Thomas N | Wilson, Michael | Kwiatkowski, Dominic P
Nature genetics  2009;41(6):657-665.
We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.
doi:10.1038/ng.388
PMCID: PMC2889040  PMID: 19465909
16.  A robust statistical method for case-control association testing with copy number variation 
Nature genetics  2008;40(10):1245-1252.
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.
doi:10.1038/ng.206
PMCID: PMC2784596  PMID: 18776912
17.  A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies 
PLoS Genetics  2009;5(6):e1000529.
Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.
Author Summary
Large association studies have proven to be effective tools for identifying parts of the genome that influence disease risk and other heritable traits. So-called “genotype imputation” methods form a cornerstone of modern association studies: by extrapolating genetic correlations from a densely characterized reference panel to a sparsely typed study sample, such methods can estimate unobserved genotypes with high accuracy, thereby increasing the chances of finding true associations. To date, most genome-wide imputation analyses have used reference data from the International HapMap Project. While this strategy has been successful, association studies in the near future will also have access to additional reference information, such as control sets genotyped on multiple SNP chips and dense genome-wide haplotypes from the 1,000 Genomes Project. These new reference panels should improve the quality and scope of imputation, but they also present new methodological challenges. We describe a genotype imputation method, IMPUTE version 2, that is designed to address these challenges in next-generation association studies. We show that our method can use a reference panel containing thousands of chromosomes to attain higher accuracy than is possible with the HapMap alone, and that our approach is more accurate than competing methods on both current and next-generation datasets. We also highlight the modeling issues that arise in imputation datasets.
doi:10.1371/journal.pgen.1000529
PMCID: PMC2689936  PMID: 19543373
18.  Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip 
PLoS Genetics  2009;5(5):e1000477.
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
Author Summary
Genome-wide association studies are a powerful and now widely-used method for finding genetic variants that increase the risk of developing particular diseases. These studies are complex and must be planned carefully in order to maximize the probability of finding novel associations. The main design choices to be made relate to sample sizes and choice of commercially available genotyping chip and are often constrained by cost, which can currently be as much as several million dollars. No comprehensive comparisons of chips based on their power for different sample sizes or for fixed study cost are currently available. We describe in detail a method for simulating large genome-wide association samples that accounts for the complex correlations between SNPs due to LD, and we used this method to assess the power of current genotyping chips. Our results highlight the differences between the chips under a range of plausible scenarios, and we demonstrate how our results can be used to design a study with a budget constraint. We also show how genotype imputation can be used to boost the power of each chip and that this method decreases the differences between the chips. Our simulation method and software for comparing power are being made available so that future association studies can be designed in a principled fashion.
doi:10.1371/journal.pgen.1000477
PMCID: PMC2688469  PMID: 19492015
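The abstract above argues that power should be assessed by simulation. The sketch below shows the general shape of such a calculation for a single directly typed SNP, using only the Python standard library. It is a deliberately simplified assumption-laden example: it ignores linkage disequilibrium entirely (the central feature of the published method), treats the control allele frequency as the population frequency, and uses a 1-df allelic chi-square test. All function names and parameter values are hypothetical.

```python
import math
import random


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def allelic_test_pvalue(case_alt: int, case_total: int,
                        ctrl_alt: int, ctrl_total: int) -> float:
    """Pearson chi-square test on the 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # degenerate table: no evidence either way
        return 1.0
    return chi2_1df_pvalue(n * (a * d - b * c) ** 2 / denom)


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def simulate_power(n_cases: int, n_controls: int, control_freq: float,
                   odds_ratio: float, alpha: float, n_sims: int,
                   seed: int = 0) -> float:
    """Fraction of simulated studies in which the allelic test has p < alpha."""
    rng = random.Random(seed)
    p_case = case_allele_freq(control_freq, odds_ratio)
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two independently drawn alleles.
        case_alt = sum(rng.random() < p_case for _ in range(2 * n_cases))
        ctrl_alt = sum(rng.random() < control_freq for _ in range(2 * n_controls))
        p = allelic_test_pvalue(case_alt, 2 * n_cases, ctrl_alt, 2 * n_controls)
        hits += p < alpha
    return hits / n_sims


if __name__ == "__main__":
    for odds_ratio in (1.0, 1.2, 1.5):
        power = simulate_power(500, 500, 0.3, odds_ratio, alpha=0.05, n_sims=200)
        print(f"OR={odds_ratio}: estimated power {power:.2f}")
```

A realistic comparison of genotyping chips would also need a genome-wide significance threshold and simulated haplotypes that reproduce LD between the causal variant and the SNPs actually on each chip.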
19.  Genome-wide association defines more than thirty distinct susceptibility loci for Crohn's disease 
Nature genetics  2008;40(8):955-962.
Several new risk factors for Crohn's disease have been identified in recent genome-wide association studies. To advance gene discovery further we have combined the data from three studies (a total of 3,230 cases and 4,829 controls) and performed replication in 3,664 independent cases with a mixture of population-based and family-based controls. The results strongly confirm 11 previously reported loci and provide genome-wide significant evidence for 21 new loci, including the regions containing STAT3, JAK2, ICOSLG, CDKAL1, and ITLN1. The expanded molecular understanding of the basis of disease offers promise for informed therapeutic development.
doi:10.1038/ng.175
PMCID: PMC2574810  PMID: 18587394
20.  Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes 
Zeggini, Eleftheria | Scott, Laura J. | Saxena, Richa | Voight, Benjamin F. | Marchini, Jonathan L | Hu, Tainle | de Bakker, Paul IW | Abecasis, Gonçalo R | Almgren, Peter | Andersen, Gitte | Ardlie, Kristin | Boström, Kristina Bengtsson | Bergman, Richard N | Bonnycastle, Lori L | Borch-Johnsen, Knut | Burtt, Noël P | Chen, Hong | Chines, Peter S | Daly, Mark J | Deodhar, Parimal | Ding, Charles | Doney, Alex S F | Duren, William L | Elliott, Katherine S | Erdos, Michael R | Frayling, Timothy M | Freathy, Rachel M | Gianniny, Lauren | Grallert, Harald | Grarup, Niels | Groves, Christopher J | Guiducci, Candace | Hansen, Torben | Herder, Christian | Hitman, Graham A | Hughes, Thomas E | Isomaa, Bo | Jackson, Anne U | Jørgensen, Torben | Kong, Augustine | Kubalanza, Kari | Kuruvilla, Finny G | Kuusisto, Johanna | Langenberg, Claudia | Lango, Hana | Lauritzen, Torsten | Li, Yun | Lindgren, Cecilia M | Lyssenko, Valeriya | Marvelle, Amanda F | Meisinger, Christa | Midthjell, Kristian | Mohlke, Karen L | Morken, Mario A | Morris, Andrew D | Narisu, Narisu | Nilsson, Peter | Owen, Katharine R | Palmer, Colin NA | Payne, Felicity | Perry, John RB | Pettersen, Elin | Platou, Carl | Prokopenko, Inga | Qi, Lu | Qin, Li | Rayner, Nigel W | Rees, Matthew | Roix, Jeffrey J | Sandbæk, Anelli | Shields, Beverley | Sjögren, Marketa | Steinthorsdottir, Valgerdur | Stringham, Heather M | Swift, Amy J | Thorleifsson, Gudmar | Thorsteinsdottir, Unnur | Timpson, Nicholas J | Tuomi, Tiinamaija | Tuomilehto, Jaakko | Walker, Mark | Watanabe, Richard M | Weedon, Michael N | Willer, Cristen J | Illig, Thomas | Hveem, Kristian | Hu, Frank B | Laakso, Markku | Stefansson, Kari | Pedersen, Oluf | Wareham, Nicholas J | Barroso, Inês | Hattersley, Andrew T | Collins, Francis S | Groop, Leif | McCarthy, Mark I | Boehnke, Michael | Altshuler, David
Nature genetics  2008;40(5):638-645.
Genome-wide association (GWA) studies have identified multiple new genomic loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1-11]. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to discover loci at which common alleles have modest effects, we performed meta-analysis of three T2D GWA scans encompassing 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed). Replication testing was performed in an independent sample with an effective sample size of up to 53,975. At least six new loci with robust evidence for association were detected, including the JAZF1 (p=5.0×10⁻¹⁴), CDC123/CAMK1D (p=1.2×10⁻¹⁰), TSPAN8/LGR5 (p=1.1×10⁻⁹), THADA (p=1.1×10⁻⁹), ADAMTS9 (p=1.2×10⁻⁸), and NOTCH2 (p=4.1×10⁻⁸) gene regions. The large number of loci with relatively small effects indicates the value of large discovery and follow-up samples in identifying additional clues about the inherited basis of T2D.
doi:10.1038/ng.120
PMCID: PMC2672416  PMID: 18372903
21.  A high resolution HLA and SNP haplotype map for disease association studies in the extended human MHC 
Nature genetics  2006;38(10):1166-1172.
The proteins encoded by the classical HLA class I and class II genes in the major histocompatibility complex (MHC) are highly polymorphic and play an essential role in self/non-self immune recognition. HLA variation is a crucial determinant of transplant rejection and susceptibility to a large number of infectious and autoimmune diseases [1]. Yet identification of causal variants is problematic due to linkage disequilibrium (LD) that extends across multiple HLA and non-HLA genes in the MHC [2,3]. We therefore set out to characterize the LD patterns between the highly polymorphic HLA genes and background variation by typing the classical HLA genes and >7,500 common single nucleotide polymorphisms (SNPs) and deletion/insertion polymorphisms (DIPs) across four population samples. The analysis provides informative tag SNPs that capture some of the variation in the MHC region and that could be used in initial disease association studies, and provides new insight into the evolutionary dynamics and ancestral origins of the HLA loci and their haplotypes.
doi:10.1038/ng1885
PMCID: PMC2670196  PMID: 16998491
22.  Common variants near MC4R are associated with fat mass, weight and risk of obesity 
Loos, Ruth J F | Lindgren, Cecilia M | Li, Shengxu | Wheeler, Eleanor | Zhao, Jing Hua | Prokopenko, Inga | Inouye, Michael | Freathy, Rachel M | Attwood, Antony P | Beckmann, Jacques S | Berndt, Sonja I | Bergmann, Sven | Bennett, Amanda J | Bingham, Sheila A | Bochud, Murielle | Brown, Morris | Cauchi, Stéphane | Connell, John M | Cooper, Cyrus | Smith, George Davey | Day, Ian | Dina, Christian | De, Subhajyoti | Dermitzakis, Emmanouil T | Doney, Alex S F | Elliott, Katherine S | Elliott, Paul | Evans, David M | Farooqi, I Sadaf | Froguel, Philippe | Ghori, Jilur | Groves, Christopher J | Gwilliam, Rhian | Hadley, David | Hall, Alistair S | Hattersley, Andrew T | Hebebrand, Johannes | Heid, Iris M | Herrera, Blanca | Hinney, Anke | Hunt, Sarah E | Jarvelin, Marjo-Riitta | Johnson, Toby | Jolley, Jennifer D M | Karpe, Fredrik | Keniry, Andrew | Khaw, Kay-Tee | Luben, Robert N | Mangino, Massimo | Marchini, Jonathan | McArdle, Wendy L | McGinnis, Ralph | Meyre, David | Munroe, Patricia B | Morris, Andrew D | Ness, Andrew R | Neville, Matthew J | Nica, Alexandra C | Ong, Ken K | O'Rahilly, Stephen | Owen, Katharine R | Palmer, Colin N A | Papadakis, Konstantinos | Potter, Simon | Pouta, Anneli | Qi, Lu | Randall, Joshua C | Rayner, Nigel W | Ring, Susan M | Sandhu, Manjinder S | Scherag, André | Sims, Matthew A | Song, Kijoung | Soranzo, Nicole | Speliotes, Elizabeth K | Syddall, Holly E | Teichmann, Sarah A | Timpson, Nicholas J | Tobias, Jonathan H | Uda, Manuela | Vogel, Carla I Ganz | Wallace, Chris | Waterworth, Dawn M | Weedon, Michael N | Willer, Cristen J | Wraight, Vicki L | Yuan, Xin | Zeggini, Eleftheria | Hirschhorn, Joel N | Strachan, David P | Ouwehand, Willem H | Caulfield, Mark J | Samani, Nilesh J | Frayling, Timothy M | Vollenweider, Peter | Waeber, Gerard | Mooser, Vincent | Deloukas, Panos | McCarthy, Mark I | Wareham, Nicholas J | Barroso, Inês | Jacobs, Kevin B | Chanock, Stephen J | Hayes, Richard B | Lamina, Claudia | Gieger, Christian | Illig, Thomas | Meitinger, Thomas | Wichmann, H-Erich | Kraft, Peter | Hankinson, Susan E | Hunter, David J | Hu, Frank B | Lyon, Helen N | Voight, Benjamin F | Ridderstrale, Martin | Groop, Leif | Scheet, Paul | Sanna, Serena | Abecasis, Goncalo R | Albai, Giuseppe | Nagaraja, Ramaiah | Schlessinger, David | Jackson, Anne U | Tuomilehto, Jaakko | Collins, Francis S | Boehnke, Michael | Mohlke, Karen L
Nature genetics  2008;40(6):768-775.
To identify common variants influencing body mass index (BMI), we analyzed genome-wide association data from 16,876 individuals of European descent. After previously reported variants in FTO, the strongest association signal (rs17782313, P = 2.9 × 10⁻⁶) mapped 188 kb downstream of MC4R (melanocortin-4 receptor), mutations of which are the leading cause of monogenic severe childhood-onset obesity. We confirmed the BMI association in 60,352 adults (per-allele effect = 0.05 Z-score units; P = 2.8 × 10⁻¹⁵) and 5,988 children aged 7–11 (0.13 Z-score units; P = 1.5 × 10⁻⁸). In case-control analyses (n = 10,583), the odds for severe childhood obesity reached 1.30 (P = 8.0 × 10⁻¹¹). Furthermore, we observed overtransmission of the risk allele to obese offspring in 660 families (P (pedigree disequilibrium test average; PDT-avg) = 2.4 × 10⁻⁴). The SNP location and patterns of phenotypic associations are consistent with effects mediated through altered MC4R function. Our findings establish that common variants near MC4R influence fat mass, weight and obesity risk at the population level and reinforce the need for large-scale data integration to identify variants influencing continuous biomedical traits.
doi:10.1038/ng.140
PMCID: PMC2669167  PMID: 18454148
23.  Two-Stage Two-Locus Models in Genome-Wide Association 
PLoS Genetics  2006;2(9):e157.
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
Synopsis
Although there is growing appreciation that attempting to map genetic interactions in humans may be a fruitful endeavor, there is no consensus as to the best strategy for their detection, particularly in the case of genome-wide association where the number of potential comparisons is enormous. In this article, the authors compare the performance of four different search strategies to detect loci which interact in genome-wide association: a single-locus search, an exhaustive two-locus search, and two two-stage procedures in which a subset of loci initially identified with single-locus tests are analyzed using a full two-locus model. Their results show that when loci interact, an exhaustive two-locus search across the genome is superior to a two-stage strategy, and in many situations can identify loci which would not have been identified solely using a single-locus search. Their findings suggest that an exhaustive search involving all pairwise combinations of markers across the genome may provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
doi:10.1371/journal.pgen.0020157
PMCID: PMC1570380  PMID: 17002500
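To give a sense of the computational and multiple-testing burden discussed in the abstract above, the following small worked example counts the pairwise tests in an exhaustive two-locus search and in a two-stage search restricted to pairs among loci carried forward from stage one. The marker counts, the restriction to pairs within the stage-one subset, and the use of a Bonferroni correction are assumptions chosen for illustration; they are not taken from the paper.

```python
from math import comb


def n_pairwise_tests(n_markers: int) -> int:
    """Number of unordered marker pairs, i.e. n choose 2."""
    return comb(n_markers, 2)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Hypothetical study sizes, for illustration only.
n_markers = 300_000   # markers typed genome-wide
n_stage_one = 3_000   # markers passing the single-locus screen

exhaustive = n_pairwise_tests(n_markers)
two_stage = n_pairwise_tests(n_stage_one)

print(f"exhaustive search: {exhaustive:,} tests, "
      f"threshold {bonferroni_threshold(0.05, exhaustive):.2e}")
print(f"two-stage search:  {two_stage:,} tests, "
      f"threshold {bonferroni_threshold(0.05, two_stage):.2e}")
```

The two-stage search runs far fewer tests, but as the abstract notes it can miss interacting loci whose marginal effects are too small to pass the first stage.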
