1.  Integrated Enrichment Analysis of Variants and Pathways in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease 
PLoS Genetics  2013;9(10):e1003770.
Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study.
Author Summary
Genome-wide association studies have helped locate gene variants that affect our susceptibility to diseases. The analysis of these studies is typically straightforward: test whether each genetic variant is correlated with predisposition to disease. This approach often works well for identifying commonly occurring variants with moderate effects on disease risk. However, the effects of many variants are so small they fail to register statistically significant correlations. This is a concern because many diseases are modulated by many genetic factors with small effects on disease risk. An alternative is to examine groups of variants, such as variants sharing a common pathway, and assess whether these groups are “enriched” for correlations with disease. This can be a more effective approach to identifying genetic factors relevant to disease. However, it does not tell us which genes are associated with disease. To address this limitation, we describe an approach that integrates enrichment analysis with tests for disease-variant correlations within a single framework. We illustrate this approach in genome-wide studies of seven complex diseases. We show that our approach supports enriched pathways in several diseases, and uncovers disease-susceptibility genes in these pathways not identified in conventional analyses of the same data.
doi:10.1371/journal.pgen.1003770
PMCID: PMC3789883  PMID: 24098138
2.  Multiple type 2 diabetes susceptibility genes following genome-wide association scan in UK samples 
Science (New York, N.Y.)  2007;316(5829):1336-1341.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
doi:10.1126/science.1142364
PMCID: PMC3772310  PMID: 17463249
3.  Fast and accurate genotype imputation in genome-wide association studies through pre-phasing 
Nature genetics  2012;44(8):955-959.
Sequencing efforts, including the 1000 Genomes Project and disease-specific efforts, are producing large collections of haplotypes that can be used for genotype imputation in genome-wide association studies (GWAS). Imputing from these reference panels can help identify new risk alleles, but the use of large panels with existing methods imposes a high computational burden. To keep imputation broadly accessible, we introduce a strategy called “pre-phasing” that maintains the accuracy of leading methods while cutting computational costs by orders of magnitude. In brief, we first statistically estimate the haplotypes for each GWAS individual (“pre-phasing”) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because: (i) the GWAS samples must be phased only once, whereas standard methods would implicitly re-phase with each reference panel update; (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match unphased GWAS genotypes to a pair of reference haplotypes. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
doi:10.1038/ng.2354
PMCID: PMC3696580  PMID: 22820512
4.  Bayesian Hierarchical Mixture Modelling to Assign Copy Number from a targeted CNV array 
Genetic epidemiology  2011;35(6):536-548.
Accurate assignment of copy number at known copy number variant (CNV) loci is important both for increasing understanding of the structural evolution of genomes and for carrying out association studies of copy number with disease. As with calling SNP genotypes, the task can be framed as a clustering problem, but for a number of reasons assigning copy number is much more challenging. CNV assays have lower signal-to-noise ratios than SNP assays, often display heavy-tailed and asymmetric intensity distributions, contain outlying observations and may exhibit systematic technical differences among different cohorts. In addition, the number of copy-number classes at a CNV in the population may be unknown a priori. Due to these complications, automatic and robust assignment of copy number from array data remains a challenging problem. We have developed a copy number assignment algorithm, CNVCALL, for a targeted CNV array, such as that used by the Wellcome Trust Case Control Consortium’s recent CNV association study. We use a Bayesian hierarchical mixture model that robustly identifies both the number of different copy number classes at a specific locus and the relative copy number for each individual in the sample. This approach is fully automated, which is a critical requirement when analysing large numbers of CNVs. We illustrate the method’s performance using real data from the WTCCC’s CNV association study and using simulated data.
doi:10.1002/gepi.20604
PMCID: PMC3159791  PMID: 21769931
5.  HAPGEN2: simulation of multiple disease SNPs 
Bioinformatics  2011;27(16):2304-2305.
Motivation: Performing experiments with simulated data is an inexpensive approach to evaluating competing experimental designs and analysis methods in genome-wide association studies. Simulation based on resampling known haplotypes is fast and efficient and can produce samples with patterns of linkage disequilibrium (LD), which mimic those in real data. However, the inability of current methods to simulate multiple nearby disease SNPs on the same chromosome can limit their application.
Results: We introduce a new simulation algorithm based on a successful resampling method, HAPGEN, that can simulate multiple nearby disease SNPs on the same chromosome. The new method, HAPGEN2, retains many advantages of resampling methods and expands the range of disease models that current simulators offer.
Availability: HAPGEN2 is freely available from http://www.stats.ox.ac.uk/~marchini/software/gwas/gwas.html.
Contact: zhan@well.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr341
PMCID: PMC3150040  PMID: 21653516
6.  Effect of Five Genetic Variants Associated with Lung Function on the Risk of Chronic Obstructive Lung Disease, and Their Joint Effects on Lung Function 
Rationale: Genomic loci are associated with FEV1 or the ratio of FEV1 to FVC in population samples, but their association with chronic obstructive pulmonary disease (COPD) has not yet been proven, nor have their combined effects on lung function and COPD been studied.
Objectives: To test association with COPD of variants at five loci (TNS1, GSTCD, HTR4, AGER, and THSD4) and to evaluate joint effects on lung function and COPD of these single-nucleotide polymorphisms (SNPs), and variants at the previously reported locus near HHIP.
Methods: By sampling from 12 population-based studies (n = 31,422), we obtained genotype data on 3,284 COPD case subjects and 17,538 control subjects for sentinel SNPs in TNS1, GSTCD, HTR4, AGER, and THSD4. In 24,648 individuals (including 2,890 COPD case subjects and 13,862 control subjects), we additionally obtained genotypes for rs12504628 near HHIP. Each allele associated with lung function decline at these six SNPs contributed to a risk score. We studied the association of the risk score to lung function and COPD.
Measurements and Main Results: Association with COPD was significant for three loci (TNS1, GSTCD, and HTR4) and the previously reported HHIP locus, and suggestive and directionally consistent for AGER and THSD4. Compared with the baseline group (7 risk alleles), carrying 10–12 risk alleles was associated with a reduction in FEV1 (β = –72.21 ml, P = 3.90 × 10⁻⁴) and FEV1/FVC (β = –1.53%, P = 6.35 × 10⁻⁶), and with COPD (odds ratio = 1.63, P = 1.46 × 10⁻⁵).
Conclusions: Variants in TNS1, GSTCD, and HTR4 are associated with COPD. Our highest risk score category was associated with a 1.6-fold higher COPD risk than the population average score.
doi:10.1164/rccm.201102-0192OC
PMCID: PMC3398416  PMID: 21965014
FEV1; FVC; genome-wide association study; modeling risk
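The risk score described in the Methods of the entry above can be sketched as a plain count of risk alleles across the six loci. This is a minimal illustration, not the authors' code: it assumes each genotype is coded as the number of risk alleles (0, 1, or 2) at the sentinel SNP of each locus, and the function names and the example individual are hypothetical. Only the 10–12 allele group is taken from the abstract.

```python
# The six loci named in the abstract; the 0/1/2 risk-allele coding is an assumption.
LOCI = ["TNS1", "GSTCD", "HTR4", "AGER", "THSD4", "HHIP"]


def risk_score(genotype):
    """Sum risk-allele counts (0, 1, or 2 per locus) over the six loci."""
    for locus in LOCI:
        if genotype[locus] not in (0, 1, 2):
            raise ValueError(f"{locus}: expected 0, 1, or 2 risk alleles")
    return sum(genotype[locus] for locus in LOCI)


def in_high_risk_group(score):
    """True for the 10-12 risk-allele group that the abstract compares with baseline."""
    return 10 <= score <= 12


# Hypothetical individual, not data from the study.
example = {"TNS1": 2, "GSTCD": 2, "HTR4": 2, "AGER": 1, "THSD4": 2, "HHIP": 1}
score = risk_score(example)
print(score, in_high_risk_group(score))
```

With six biallelic SNPs the score ranges from 0 to 12, which is why the top category in the abstract ends at 12 risk alleles.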
7.  Genotype Imputation with Thousands of Genomes 
G3: Genes|Genomes|Genetics  2011;1(6):457-470.
Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package.
doi:10.1534/g3.111.001198
PMCID: PMC3276165  PMID: 22384356
GWAS; reference panel; haplotype; linkage disequilibrium; human
8.  Multiple Common Susceptibility Variants near BMP Pathway Loci GREM1, BMP4, and BMP2 Explain Part of the Missing Heritability of Colorectal Cancer 
PLoS Genetics  2011;7(6):e1002105.
Genome-wide association studies (GWAS) have identified 14 tagging single nucleotide polymorphisms (tagSNPs) that are associated with the risk of colorectal cancer (CRC), and several of these tagSNPs are near bone morphogenetic protein (BMP) pathway loci. The penalty of multiple testing implicit in GWAS increases the attraction of complementary approaches for disease gene discovery, including candidate gene- or pathway-based analyses. The strongest candidate loci for additional predisposition SNPs are arguably those already known both to have functional relevance and to be involved in disease risk. To investigate this proposition, we searched for novel CRC susceptibility variants close to the BMP pathway genes GREM1 (15q13.3), BMP4 (14q22.2), and BMP2 (20p12.3) using sample sets totalling 24,910 CRC cases and 26,275 controls. We identified new, independent CRC predisposition SNPs close to BMP4 (rs1957636, P = 3.93×10⁻¹⁰) and BMP2 (rs4813802, P = 4.65×10⁻¹¹). Near GREM1, we found using fine-mapping that the previously-identified association between tagSNP rs4779584 and CRC actually resulted from two independent signals represented by rs16969681 (P = 5.33×10⁻⁸) and rs11632715 (P = 2.30×10⁻¹⁰). As low-penetrance predisposition variants become harder to identify (owing to small effect sizes and/or low risk allele frequencies), approaches based on informed candidate gene selection may become increasingly attractive. Our data emphasise that genetic fine-mapping studies can deconvolute associations that have arisen owing to independent correlation of a tagSNP with more than one functional SNP, thus explaining some of the apparently missing heritability of common diseases.
Author Summary
Genome-wide association studies (GWAS) have identified several colorectal cancer (CRC) susceptibility polymorphisms near genes that encode proteins in the bone morphogenetic protein (BMP) pathway. However, most of the inherited susceptibility to CRC remains unexplained. We investigated three of the best candidate BMP genes (GREM1, BMP4, and BMP2) for additional polymorphisms associated with CRC. By extensive validation of polymorphisms with only modest evidence of association in the initial phases of the GWAS, we identified new, independent CRC predisposition polymorphisms close to BMP4 (rs1957636) and BMP2 (rs4813802). Near GREM1, we used additional genotyping around the GWAS-identified polymorphism rs4779584 to demonstrate two independent signals represented by rs16969681 and rs11632715. Common genes with modest effects on disease risk are becoming harder to identify, and approaches based on informed candidate gene selection may become increasingly attractive. In addition, genetic fine mapping around polymorphisms identified in GWAS can deconvolute associations which have arisen owing to two independent functional variants. These types of study can identify some of the apparently missing heritability of common disease.
doi:10.1371/journal.pgen.1002105
PMCID: PMC3107194  PMID: 21655089
9.  The effect of genome-wide association scan quality control on imputation outcome for common variants 
Imputation is an extremely valuable tool in conducting and synthesising genome-wide association studies (GWASs). Directly typed SNP quality control (QC) is thought to affect imputation quality. It is, therefore, common practice to use quality-controlled (QCed) data as an input for imputing genotypes. This study aims to determine the effect of commonly applied QC steps on imputation outcomes. We performed several iterations of imputing SNPs across chromosome 22 in a dataset consisting of 3177 samples with Illumina 610k (Illumina, San Diego, CA, USA) GWAS data, applying different QC steps each time. The imputed genotypes were compared with the directly typed genotypes. In addition, we investigated the correlation between alternatively QCed data. We also applied a series of post-imputation QC steps balancing elimination of poorly imputed SNPs and information loss. We found that the difference between the unQCed data and the fully QCed data on imputation outcome was minimal. Our study shows that imputation of common variants is generally very accurate and robust to GWAS QC, which is not a major factor affecting imputation outcome. A minority of common-frequency SNPs with particular properties cannot be accurately imputed regardless of QC stringency. These findings may not generalise to the imputation of low frequency and rare variants.
doi:10.1038/ejhg.2010.242
PMCID: PMC3083623  PMID: 21267008
genome-wide association study; imputation; quality control; single nucleotide polymorphism
10.  An Evolutionary Framework for Association Testing in Resequencing Studies 
PLoS Genetics  2010;6(11):e1001202.
Sequencing technologies are becoming cheap enough to apply to large numbers of study participants and promise to provide new insights into human phenotypes by bringing to light rare and previously unknown genetic variants. We develop a new framework for the analysis of sequence data that incorporates all of the major features of previously proposed approaches, including those focused on allele counts and allele burden, but is both more general and more powerful. We harness population genetic theory to provide prior information on effect sizes and to create a pooling strategy for information from rare variants. Our method, EMMPAT (Evolutionary Mixed Model for Pooled Association Testing), generates a single test per gene (substantially reducing multiple testing concerns), facilitates graphical summaries, and improves the interpretation of results by allowing calculation of attributable variance. Simulations show that, relative to previously used approaches, our method increases the power to detect genes that affect phenotype when natural selection has kept alleles with large effect sizes rare. We demonstrate our approach on a population-based re-sequencing study of association between serum triglycerides and variation in ANGPTL4.
Author Summary
Studies correlating genetic variation to disease and other human traits have examined mostly common mutations, partly because of technological restrictions. However, recent advances have resulted in dramatically declining costs of obtaining genomic sequence data, which provides the opportunity to detect rare genetic variation. Existing methods of analysis designed for an earlier era of technology are not optimal for discovering links to rare mutations. We take advantage of 1) the advanced theoretical understanding of evolutionary mechanics and 2) genome-wide evidence about evolutionary forces on the human genome to suggest a framework for understanding observed correlations between rare genetic variation and modern traits. The model leads to a powerful test for genetic association and to an improved interpretation of results. We demonstrate the new method on previously confirmed results in a gene related to high blood cholesterol levels.
doi:10.1371/journal.pgen.1001202
PMCID: PMC2978703  PMID: 21085648
11.  Genome-wide and fine-resolution association analysis of malaria in West Africa 
Jallow, Muminatou | Teo, Yik Ying | Small, Kerrin S | Rockett, Kirk A | Deloukas, Panos | Clark, Taane G | Kivinen, Katja | Bojang, Kalifa A | Conway, David J | Pinder, Margaret | Sirugo, Giorgio | Sisay-Joof, Fatou | Usen, Stanley | Auburn, Sarah | Bumpstead, Suzannah J | Campino, Susana | Coffey, Alison | Dunham, Andrew | Fry, Andrew E | Green, Angela | Gwilliam, Rhian | Hunt, Sarah E | Inouye, Michael | Jeffreys, Anna E | Mendy, Alieu | Palotie, Aarno | Potter, Simon | Ragoussis, Jiannis | Rogers, Jane | Rowlands, Kate | Somaskantharajah, Elilan | Whittaker, Pamela | Widden, Claire | Donnelly, Peter | Howie, Bryan | Marchini, Jonathan | Morris, Andrew | SanJoaquin, Miguel | Achidi, Eric Akum | Agbenyega, Tsiri | Allen, Angela | Amodu, Olukemi | Corran, Patrick | Djimde, Abdoulaye | Dolo, Amagana | Doumbo, Ogobara K | Drakeley, Chris | Dunstan, Sarah | Evans, Jennifer | Farrar, Jeremy | Fernando, Deepika | Hien, Tran Tinh | Horstmann, Rolf D | Ibrahim, Muntaser | Karunaweera, Nadira | Kokwaro, Gilbert | Koram, Kwadwo A | Lemnge, Martha | Makani, Julie | Marsh, Kevin | Michon, Pascal | Modiano, David | Molyneux, Malcolm E | Mueller, Ivo | Parker, Michael | Peshu, Norbert | Plowe, Christopher V | Puijalon, Odile | Reeder, John | Reyburn, Hugh | Riley, Eleanor M | Sakuntabhai, Anavaj | Singhasivanon, Pratap | Sirima, Sodiomon | Tall, Adama | Taylor, Terrie E | Thera, Mahamadou | Troye-Blomberg, Marita | Williams, Thomas N | Wilson, Michael | Kwiatkowski, Dominic P
Nature genetics  2009;41(6):657-665.
We report a genome-wide association (GWA) study of severe malaria in The Gambia. The initial GWA scan included 2,500 children genotyped on the Affymetrix 500K GeneChip, and a replication study included 3,400 children. We used this to examine the performance of GWA methods in Africa. We found considerable population stratification, and also that signals of association at known malaria resistance loci were greatly attenuated owing to weak linkage disequilibrium (LD). To investigate possible solutions to the problem of low LD, we focused on the HbS locus, sequencing this region of the genome in 62 Gambian individuals and then using these data to conduct multipoint imputation in the GWA samples. This increased the signal of association, from P = 4 × 10⁻⁷ to P = 4 × 10⁻¹⁴, with the peak of the signal located precisely at the HbS causal variant. Our findings provide proof of principle that fine-resolution multipoint imputation, based on population-specific sequencing data, can substantially boost authentic GWA signals and enable fine mapping of causal variants in African populations.
doi:10.1038/ng.388
PMCID: PMC2889040  PMID: 19465909
12.  A robust statistical method for case-control association testing with copy number variation 
Nature genetics  2008;40(10):1245-1252.
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.
doi:10.1038/ng.206
PMCID: PMC2784596  PMID: 18776912
13.  A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies 
PLoS Genetics  2009;5(6):e1000529.
Genotype imputation methods are now being widely used in the analysis of genome-wide association studies. Most imputation analyses to date have used the HapMap as a reference dataset, but new reference panels (such as controls genotyped on multiple SNP chips and densely typed samples from the 1,000 Genomes Project) will soon allow a broader range of SNPs to be imputed with higher accuracy, thereby increasing power. We describe a genotype imputation method (IMPUTE version 2) that is designed to address the challenges presented by these new datasets. The main innovation of our approach is a flexible modelling framework that increases accuracy and combines information across multiple reference panels while remaining computationally feasible. We find that IMPUTE v2 attains higher accuracy than other methods when the HapMap provides the sole reference panel, but that the size of the panel constrains the improvements that can be made. We also find that imputation accuracy can be greatly enhanced by expanding the reference panel to contain thousands of chromosomes and that IMPUTE v2 outperforms other methods in this setting at both rare and common SNPs, with overall error rates that are 15%–20% lower than those of the closest competing method. One particularly challenging aspect of next-generation association studies is to integrate information across multiple reference panels genotyped on different sets of SNPs; we show that our approach to this problem has practical advantages over other suggested solutions.
Author Summary
Large association studies have proven to be effective tools for identifying parts of the genome that influence disease risk and other heritable traits. So-called “genotype imputation” methods form a cornerstone of modern association studies: by extrapolating genetic correlations from a densely characterized reference panel to a sparsely typed study sample, such methods can estimate unobserved genotypes with high accuracy, thereby increasing the chances of finding true associations. To date, most genome-wide imputation analyses have used reference data from the International HapMap Project. While this strategy has been successful, association studies in the near future will also have access to additional reference information, such as control sets genotyped on multiple SNP chips and dense genome-wide haplotypes from the 1,000 Genomes Project. These new reference panels should improve the quality and scope of imputation, but they also present new methodological challenges. We describe a genotype imputation method, IMPUTE version 2, that is designed to address these challenges in next-generation association studies. We show that our method can use a reference panel containing thousands of chromosomes to attain higher accuracy than is possible with the HapMap alone, and that our approach is more accurate than competing methods on both current and next-generation datasets. We also highlight the modeling issues that arise in imputation datasets.
doi:10.1371/journal.pgen.1000529
PMCID: PMC2689936  PMID: 19543373
14.  Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip 
PLoS Genetics  2009;5(5):e1000477.
Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common method of comparing different chips is using a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we used it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips to detect variants with different effect sizes and allele frequencies, look at how power changes with sample size in different populations or when using multi-marker tags and genotype imputation approaches, and how performance compares to a hypothetical chip that contains every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when taking budgetary considerations into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical “complete” chip containing all the SNPs in HapMap. 
Our results have been encapsulated into an R software package that allows users to design future association studies and our methods provide a framework with which new chip sets can be evaluated.
Author Summary
Genome-wide association studies are a powerful and now widely-used method for finding genetic variants that increase the risk of developing particular diseases. These studies are complex and must be planned carefully in order to maximize the probability of finding novel associations. The main design choices to be made relate to sample sizes and choice of commercially available genotyping chip and are often constrained by cost, which can currently be as much as several million dollars. No comprehensive comparisons of chips based on their power for different sample sizes or for fixed study cost are currently available. We describe in detail a method for simulating large genome-wide association samples that accounts for the complex correlations between SNPs due to LD, and we used this method to assess the power of current genotyping chips. Our results highlight the differences between the chips under a range of plausible scenarios, and we demonstrate how our results can be used to design a study with a budget constraint. We also show how genotype imputation can be used to boost the power of each chip and that this method decreases the differences between the chips. Our simulation method and software for comparing power are being made available so that future association studies can be designed in a principled fashion.
doi:10.1371/journal.pgen.1000477
PMCID: PMC2688469  PMID: 19492015
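The argument in the entry above, that power should be assessed by simulation, can be illustrated with a deliberately simplified sketch. It simulates a single directly typed SNP with independently drawn alleles and applies a 1-df allelic chi-square test. It does not reproduce the authors' LD-aware simulation method or their R package, and every function name, parameter, default threshold, and number below is an assumption chosen for illustration.

```python
import math
import random


def allelic_chi2_pvalue(case_alt, case_n, ctrl_alt, ctrl_n):
    """P-value of the 1-df Pearson chi-square test on a 2x2 allele-count table."""
    total = case_n + ctrl_n
    alt = case_alt + ctrl_alt
    ref = total - alt
    if alt == 0 or ref == 0:
        return 1.0  # monomorphic sample: no evidence of association
    a, b = case_alt, case_n - case_alt
    c, d = ctrl_alt, ctrl_n - ctrl_alt
    chi2 = total * (a * d - b * c) ** 2 / (case_n * ctrl_n * alt * ref)
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def estimate_power(n_cases, n_controls, ctrl_freq, odds_ratio,
                   alpha=5e-7, n_sims=200, seed=1):
    """Fraction of simulated studies whose allelic test gives p < alpha."""
    rng = random.Random(seed)
    # Case allele frequency implied by a per-allele odds ratio (assumed model).
    case_freq = odds_ratio * ctrl_freq / (1.0 - ctrl_freq + odds_ratio * ctrl_freq)
    hits = 0
    for _ in range(n_sims):
        # Each individual contributes two alleles.
        case_alt = sum(rng.random() < case_freq for _ in range(2 * n_cases))
        ctrl_alt = sum(rng.random() < ctrl_freq for _ in range(2 * n_controls))
        p = allelic_chi2_pvalue(case_alt, 2 * n_cases, ctrl_alt, 2 * n_controls)
        if p < alpha:
            hits += 1
    return hits / n_sims


# Hypothetical design: 2,000 cases, 2,000 controls, allele frequency 0.3, OR 1.3.
print(estimate_power(2000, 2000, ctrl_freq=0.3, odds_ratio=1.3))
```

Because the sketch ignores LD and assumes the causal variant is typed directly, it says nothing about chip coverage; comparing chips, as the abstract describes, would require simulating correlated SNPs.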
15.  Evaluation of Haplotype Inference Using Definitive Haplotype Data Obtained from Complete Hydatidiform Moles, and Its Significance for the Analyses of Positively Selected Regions 
PLoS Genetics  2009;5(5):e1000468.
The haplotype map constructed by the HapMap Project is a valuable resource in the genetic studies of disease genes, population structure, and evolution. In the Project, Caucasian and African haplotypes are fairly accurately inferred, based mainly on the rules of Mendelian inheritance using the genotypes of trios. However, the Asian haplotypes are inferred from the genotypes of unrelated individuals based on population genetics, and are less accurate. Thus, the effects of this inaccuracy on downstream analyses need to be assessed. We determined true Japanese haplotypes by genotyping 100 complete hydatidiform moles (CHM), each carrying a genome derived from a single sperm, using Affymetrix 500 K Arrays. We then assessed how inferred haplotypes can differ from true haplotypes, by phasing pseudo-individualized true haplotypes using the programs PHASE, fastPHASE, and Beagle. We found that, at various genomic regions, especially the MHC locus, the expansion of extended haplotype homozygosity (EHH), which is a measure of positive selection, is obscured when inferred Asian haplotype data is used to detect the expansion. We then mapped the genome using a new statistic, XDiHH, which directly detects the difference between the true and inferred haplotypes, in the determination of EHH expansion. We also show that the true haplotype data presented here is useful to assess and improve the accuracy of phasing of Asian genotypes.
Author Summary
Precise haplotype maps are preferred for the performance of a variety of genetic studies including identification of disease-associated loci and dissection of evolutionary mechanisms such as selection and recombination. For diploid organisms, the haplotype information appears as the genotypes when we obtain the information using widely used high-throughput techniques. The process of extracting haplotype information from genotypes is called phasing, which can be accurately done if the genotypes are from related individuals, such as parent–child trios, by considering the constraints imposed by the rules of Mendelian inheritance. For the genotype data without family information, phasing is done by one of the methods that are based on haplotype clustering, and the inferred haplotypes are known to be less accurate. Here, we experimentally determined genome-wide definitive haplotypes using a collection of Japanese complete hydatidiform moles (CHM), each of which carries a genome derived from a single sperm. Using these resources, we asked if the definitive haplotype data can detect long-distance information that has been obscured when we rely solely on the haplotypes inferred by clustering. We also show that by introducing definitive haplotypes as references, inference of haplotypes of unrelated individuals is significantly improved.
doi:10.1371/journal.pgen.1000468
PMCID: PMC2670534  PMID: 19424418
16.  Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes 
Zeggini, Eleftheria | Scott, Laura J. | Saxena, Richa | Voight, Benjamin F. | Marchini, Jonathan L | Hu, Tainle | de Bakker, Paul IW | Abecasis, Gonçalo R | Almgren, Peter | Andersen, Gitte | Ardlie, Kristin | Boström, Kristina Bengtsson | Bergman, Richard N | Bonnycastle, Lori L | Borch-Johnsen, Knut | Burtt, Noël P | Chen, Hong | Chines, Peter S | Daly, Mark J | Deodhar, Parimal | Ding, Charles | Doney, Alex S F | Duren, William L | Elliott, Katherine S | Erdos, Michael R | Frayling, Timothy M | Freathy, Rachel M | Gianniny, Lauren | Grallert, Harald | Grarup, Niels | Groves, Christopher J | Guiducci, Candace | Hansen, Torben | Herder, Christian | Hitman, Graham A | Hughes, Thomas E | Isomaa, Bo | Jackson, Anne U | Jørgensen, Torben | Kong, Augustine | Kubalanza, Kari | Kuruvilla, Finny G | Kuusisto, Johanna | Langenberg, Claudia | Lango, Hana | Lauritzen, Torsten | Li, Yun | Lindgren, Cecilia M | Lyssenko, Valeriya | Marvelle, Amanda F | Meisinger, Christa | Midthjell, Kristian | Mohlke, Karen L | Morken, Mario A | Morris, Andrew D | Narisu, Narisu | Nilsson, Peter | Owen, Katharine R | Palmer, Colin NA | Payne, Felicity | Perry, John RB | Pettersen, Elin | Platou, Carl | Prokopenko, Inga | Qi, Lu | Qin, Li | Rayner, Nigel W | Rees, Matthew | Roix, Jeffrey J | Sandbæk, Anelli | Shields, Beverley | Sjögren, Marketa | Steinthorsdottir, Valgerdur | Stringham, Heather M | Swift, Amy J | Thorleifsson, Gudmar | Thorsteinsdottir, Unnur | Timpson, Nicholas J | Tuomi, Tiinamaija | Tuomilehto, Jaakko | Walker, Mark | Watanabe, Richard M | Weedon, Michael N | Willer, Cristen J | Illig, Thomas | Hveem, Kristian | Hu, Frank B | Laakso, Markku | Stefansson, Kari | Pedersen, Oluf | Wareham, Nicholas J | Barroso, Inês | Hattersley, Andrew T | Collins, Francis S | Groop, Leif | McCarthy, Mark I | Boehnke, Michael | Altshuler, David
Nature genetics  2008;40(5):638-645.
Genome-wide association (GWA) studies have identified multiple new genomic loci at which common variants modestly but reproducibly influence risk of type 2 diabetes (T2D) [1-11]. Established associations to common and rare variants explain only a small proportion of the heritability of T2D. As previously published analyses had limited power to discover loci at which common alleles have modest effects, we performed meta-analysis of three T2D GWA scans encompassing 10,128 individuals of European descent and ~2.2 million SNPs (directly genotyped and imputed). Replication testing was performed in an independent sample with an effective sample size of up to 53,975. At least six new loci with robust evidence for association were detected, including the JAZF1 (p = 5.0×10^−14), CDC123/CAMK1D (p = 1.2×10^−10), TSPAN8/LGR5 (p = 1.1×10^−9), THADA (p = 1.1×10^−9), ADAMTS9 (p = 1.2×10^−8), and NOTCH2 (p = 4.1×10^−8) gene regions. The large number of loci with relatively small effects indicates the value of large discovery and follow-up samples in identifying additional clues about the inherited basis of T2D.
doi:10.1038/ng.120
PMCID: PMC2672416  PMID: 18372903
17.  A high resolution HLA and SNP haplotype map for disease association studies in the extended human MHC 
Nature genetics  2006;38(10):1166-1172.
The proteins encoded by the classical HLA class I and class II genes in the major histocompatibility complex (MHC) are highly polymorphic and play an essential role in self/non-self immune recognition. HLA variation is a crucial determinant of transplant rejection and susceptibility to a large number of infectious and autoimmune diseases [1]. Yet identification of causal variants is problematic due to linkage disequilibrium (LD) that extends across multiple HLA and non-HLA genes in the MHC [2,3]. We therefore set out to characterize the LD patterns between the highly polymorphic HLA genes and background variation by typing the classical HLA genes and >7,500 common single nucleotide polymorphisms (SNPs) and deletion/insertion polymorphisms (DIPs) across four population samples. The analysis provides informative tag SNPs that capture some of the variation in the MHC region and that could be used in initial disease association studies, and provides new insight into the evolutionary dynamics and ancestral origins of the HLA loci and their haplotypes.
doi:10.1038/ng1885
PMCID: PMC2670196  PMID: 16998491
18.  An African Ancestry-Specific Allele of CTLA4 Confers Protection against Rheumatoid Arthritis in African Americans 
PLoS Genetics  2009;5(3):e1000424.
Cytotoxic T-lymphocyte associated protein 4 (CTLA4) is a negative regulator of T-cell proliferation. Polymorphisms in CTLA4 have been inconsistently associated with susceptibility to rheumatoid arthritis (RA) in populations of European ancestry but have not been examined in African Americans. The prevalence of RA in most populations of European and Asian ancestry is ∼1.0%; RA is purportedly less common in black Africans, with little known about its prevalence in African Americans. We sought to determine if CTLA4 polymorphisms are associated with RA in African Americans. We performed a 2-stage analysis of 12 haplotype tagging single nucleotide polymorphisms (SNPs) across CTLA4 in a total of 505 African American RA patients and 712 African American controls using Illumina and TaqMan platforms. The frequency of the minor allele (G) of the rs231778 SNP was 0.054 in RA patients, compared to 0.209 in controls (p = 4.462×10^−26, Fisher's exact). The presence of the G allele was associated with substantially reduced odds of having RA (AG+GG genotypes vs. AA genotype, odds ratio [OR] 0.19, 95% CI: 0.13–0.26, p = 2.4×10^−28, Fisher's exact), suggesting a protective effect. This SNP is polymorphic in the African population (minor allele frequency [MAF] 0.09 in the Yoruba population), but is very rare in other groups (MAF = 0.002 in 530 Caucasians genotyped for this study). Markers associated with RA in populations of European ancestry (rs3087243 [+60C/T] and rs231775 [+49A/G]) were not replicated in African Americans. We found no confounding of association for rs231778 after stratifying for the HLA-DRB1 shared epitope, presence of anti-cyclic citrullinated peptide antibody, or degree of admixture from the European population. An African ancestry-specific genetic variant of CTLA4 appears to be associated with protection from RA in African Americans. This finding may explain, in part, the relatively low prevalence of RA in black African populations.
Author Summary
Rheumatoid arthritis (RA) is a systemic autoimmune condition affecting the synovial membranes of diarthrodial joints. The etiology of RA is unclear but is thought to result from an environmental trigger in the context of genetic predisposition. We report that a single nucleotide polymorphism (SNP) (rs231778) in CTLA4, which encodes a negative regulator of T cell activation, is associated (p = 2.4×10^−28) with protection from developing RA among African Americans. rs231778 is only polymorphic in populations of African ancestry. Protective alleles such as this one may contribute to the purported lower prevalence of RA in African Americans. Our finding appears to be independent from confounding by linkage with the HLA-DRB1 shared epitope or by genetic admixture. Furthermore, we did not replicate associations of CTLA4 SNPs with RA or other autoimmune diseases previously reported in Asians and Caucasians, such as rs3087243 (+60C/T) and rs231775 (+49A/G). The associations of different SNPs with RA susceptibility specific to different populations highlight the importance of CTLA4 in the pathogenesis of RA and demonstrate the ethnic-specific genetic background that contributes to its susceptibility.
doi:10.1371/journal.pgen.1000424
PMCID: PMC2652071  PMID: 19300490
19.  Identification of a Shared Genetic Susceptibility Locus for Coronary Heart Disease and Periodontitis 
PLoS Genetics  2009;5(2):e1000378.
Recent studies indicate a mutual epidemiological relationship between coronary heart disease (CHD) and periodontitis. Both diseases are associated with similar risk factors and are characterized by a chronic inflammatory process. In a candidate-gene association study, we identify a genetic susceptibility locus shared by both diseases. We confirm the known association of two neighboring linkage disequilibrium regions on human chromosome 9p21.3 with CHD and show the additional strong association of these loci with the risk of aggressive periodontitis. For the lead SNP of the main associated linkage disequilibrium region, rs1333048, the odds ratio of the autosomal-recessive mode of inheritance is 1.99 (95% confidence interval 1.33–2.94; P = 6.9×10^−4) for generalized aggressive periodontitis, and 1.72 (1.06–2.76; P = 2.6×10^−2) for localized aggressive periodontitis. The two associated linkage disequilibrium regions map to the sequence of the large antisense noncoding RNA ANRIL, which partly overlaps regulatory and coding sequences of CDKN2A/CDKN2B. A closely located diabetes-associated variant was independent of the CHD and periodontitis risk haplotypes. Our study demonstrates that CHD and periodontitis are genetically related by at least one susceptibility locus, which is possibly involved in ANRIL activity and independent of diabetes-associated risk variants within this region. Elucidation of the interplay of ANRIL transcript variants and their involvement in increased susceptibility to the interactive diseases CHD and periodontitis promises new insight into the underlying shared pathogenic mechanisms of these complex common diseases.
Author Summary
Coronary heart disease (CHD) and periodontitis are the most widespread diseases in the Western industrialized world and pose a substantial health threat to populations worldwide. CHD is a leading cause for premature death, and periodontitis is the major cause for tooth loss in adults over 40 years. Both diseases are associated with similar risk factors such as smoking, diabetes, and gender, and both diseases are further characterized by a chronic inflammatory process. In the last year, several genome studies have identified a region of the human genome near the CDKN2A and CDKN2B genes as having an influence on CHD. We show that this genetic region, being the most important susceptibility locus for CHD to date, is also associated with a substantial risk increase of aggressive periodontitis. The associated genetic region maps to a genomic region that codes for an “antisense RNA,” which partly overlaps regulatory and coding sequences of genes CDKN2A/CDKN2B. The interplay between these common inflammatory complex diseases could be partially due to the shared genetic risk variants of this antisense RNA.
doi:10.1371/journal.pgen.1000378
PMCID: PMC2632758  PMID: 19214202
20.  A Large-Scale Rheumatoid Arthritis Genetic Study Identifies Association at Chromosome 9q33.2 
PLoS Genetics  2008;4(6):e1000107.
Rheumatoid arthritis (RA) is a chronic, systemic autoimmune disease affecting both joints and extra-articular tissues. Although some genetic risk factors for RA are well-established, most notably HLA-DRB1 and PTPN22, these markers do not fully account for the observed heritability. To identify additional susceptibility loci, we carried out a multi-tiered, case-control association study, genotyping 25,966 putative functional SNPs in 475 white North American RA patients and 475 matched controls. Significant markers were genotyped in two additional, independent, white case-control sample sets (661 cases/1322 controls from North America and 596 cases/705 controls from The Netherlands) identifying a SNP, rs1953126, on chromosome 9q33.2 that was significantly associated with RA (ORcommon = 1.28, trend Pcomb = 1.45E-06). Through a comprehensive fine-scale-mapping SNP-selection procedure, 137 additional SNPs in a 668 kb region from MEGF9 to STOM on 9q33.2 were chosen for follow-up genotyping in a staged-approach. Significant single marker results (Pcomb<0.01) spanned a large 525 kb region from FBXW2 to GSN. However, a variety of analyses identified SNPs in a 70 kb region extending from the third intron of PHF19 across TRAF1 into the TRAF1-C5 intergenic region, but excluding the C5 coding region, as the most interesting (trend Pcomb: 1.45E-06 → 5.41E-09). The observed association patterns for these SNPs had heightened statistical significance and a higher degree of consistency across sample sets. In addition, the allele frequencies for these SNPs displayed reduced variability between control groups when compared to other SNPs. Lastly, in combination with the other two known genetic risk factors, HLA-DRB1 and PTPN22, the variants reported here generate more than a 45-fold RA-risk differential.
Author Summary
Rheumatoid arthritis (RA), a chronic autoimmune disorder affecting ∼1% of the population, is characterized by immune-cell–mediated destruction of the joint architecture. Gene–environment interactions are thought to underlie RA etiology. Variants within HLA-DRB1 and the hematopoietic-specific phosphatase, PTPN22, are well established RA-susceptibility loci, and although other markers have been identified, they do not fully account for the disease heritability. To identify additional susceptibility alleles, we carried out a multi-tiered, case-control association study genotyping >25,000 putative functional SNPs; here we report our finding of RA-associated variants in chromosome 9q33.2. A detailed genetic analysis of this region, incorporating HapMap information, localizes the RA-susceptibility effects to a 70 kb region that includes a portion of PHF19, all of TRAF1, and the majority of the TRAF1-C5 intergenic region, but excludes the C5 coding region. In addition to providing new insights into underlying mechanism(s) of disease and suggesting novel therapeutic targets, these data provide the underpinnings of a genetic signature that may predict individuals at increased risk for developing RA. Indeed, initial analyses of three known genetic risk factors, HLA, PTPN22, and the chromosome 9q33.2 variants described here, suggest a >45-fold difference in RA risk depending on an individual's three-locus genotype.
doi:10.1371/journal.pgen.1000107
PMCID: PMC2481282  PMID: 18648537
21.  Linkage Analysis of a Model Quantitative Trait in Humans: Finger Ridge Count Shows Significant Multivariate Linkage to 5q14.1 
PLoS Genetics  2007;3(9):e165.
The finger ridge count (a measure of pattern size) is one of the most heritable complex traits studied in humans and has been considered a model human polygenic trait in quantitative genetic analysis. Here, we report the results of the first genome-wide linkage scan for finger ridge count in a sample of 2,114 offspring from 922 nuclear families. Both univariate linkage to the absolute ridge count (a sum of all the ridge counts on all ten fingers), and multivariate linkage analyses of the counts on individual fingers, were conducted. The multivariate analyses yielded significant linkage to 5q14.1 (Logarithm of odds [LOD] = 3.34, pointwise-empirical p-value = 0.00025) that was predominantly driven by linkage to the ring, index, and middle fingers. The strongest univariate linkage was to 1q42.2 (LOD = 2.04, point-wise p-value = 0.002, genome-wide p-value = 0.29). In summary, the combination of univariate and multivariate results was more informative than simple univariate analyses alone. Patterns of quantitative trait loci factor loadings consistent with developmental fields were observed, and the simple pleiotropic model underlying the absolute ridge count was not sufficient to characterize the interrelationships between the ridge counts of individual fingers.
Author Summary
Finger ridge count (an index of the size of the fingerprint pattern) has been used as a model trait for the study of human quantitative genetics for over 80 years. Here, we present the first genome-wide linkage scan for finger ridge count in a large sample of 2,114 offspring from 922 nuclear families. Our results illustrate the increase in power and information that can be gained from a multivariate linkage analysis of ridge counts of individual fingers as compared to a univariate analysis of a summary measure (absolute ridge count). The strongest evidence for linkage was seen at 5q14.1, and the pattern of loadings was consistent with a developmental field factor whose influence is greatest on the ring finger, falling off to either side, which is consistent with previous findings that heritability for ridge count is higher for the middle three fingers. We feel that the paper will be of specific methodological interest to those conducting linkage and association analyses with summary measures. In addition, given the frequency with which this phenotype is used as a didactic example in genetics courses we feel that this paper will be of interest to the general scientific community.
doi:10.1371/journal.pgen.0030165
PMCID: PMC1994711  PMID: 17907812
22.  Two-Stage Two-Locus Models in Genome-Wide Association 
PLoS Genetics  2006;2(9):e157.
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
Synopsis
Although there is growing appreciation that attempting to map genetic interactions in humans may be a fruitful endeavor, there is no consensus as to the best strategy for their detection, particularly in the case of genome-wide association where the number of potential comparisons is enormous. In this article, the authors compare the performance of four different search strategies to detect loci which interact in genome-wide association—a single-locus search, an exhaustive two-locus search, and two, two-stage procedures in which a subset of loci initially identified with single-locus tests are analyzed using a full two-locus model. Their results show that when loci interact, an exhaustive two-locus search across the genome is superior to a two-stage strategy, and in many situations can identify loci which would not have been identified solely using a single-locus search. Their findings suggest that an exhaustive search involving all pairwise combinations of markers across the genome may provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
doi:10.1371/journal.pgen.0020157
PMCID: PMC1570380  PMID: 17002500
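To give a sense of the computational and multiple-testing burden that the abstract above refers to, the following minimal sketch counts the pairwise tests in an exhaustive two-locus search and in a search restricted to loci retained from a first stage. The marker counts, the function names, and the use of a Bonferroni correction are illustrative assumptions, not values or methods taken from the paper, and testing all pairs among retained loci is only one possible two-stage design.

```python
from math import comb


def pair_tests(n_loci: int) -> int:
    """Number of distinct unordered locus pairs, n * (n - 1) / 2."""
    return comb(n_loci, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    n_markers = 300_000  # hypothetical genome-wide marker count
    n_retained = 3_000   # hypothetical number of loci passing stage one

    for label, n in (("exhaustive", n_markers), ("two-stage", n_retained)):
        tests = pair_tests(n)
        print(f"{label}: {tests:,} pair tests, "
              f"threshold {bonferroni_threshold(tests):.2e}")
```

Under these assumed numbers the exhaustive search involves roughly ten thousand times as many tests as the second stage of the two-stage search, which is the trade-off between computational cost and power that the abstract describes.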
