1.  A genome-wide association study and biological pathway analysis of epilepsy prognosis in a prospective cohort of newly treated epilepsy 
Human Molecular Genetics  2013;23(1):247-258.
We present the analysis of a prospective multicentre study to investigate genetic effects on the prognosis of newly treated epilepsy. Patients with a new clinical diagnosis of epilepsy requiring medication were recruited and followed up prospectively. The clinical outcome was defined as freedom from seizures for a minimum of 12 months in accordance with the consensus statement from the International League Against Epilepsy (ILAE). Genetic effects on remission of seizures after starting treatment were analysed with and without adjustment for significant clinical prognostic factors, and the results from each cohort were combined using a fixed-effects meta-analysis. After quality control (QC), we analysed 889 newly treated epilepsy patients using 472 450 genotyped and 6.9 × 10^6 imputed single-nucleotide polymorphisms. Suggestive evidence for association (defined as Pmeta < 5.0 × 10^-7) with remission of seizures after starting treatment was observed at three loci: 6p12.2 (rs492146, Pmeta = 2.1 × 10^-7, OR[G] = 0.57), 9p23 (rs72700966, Pmeta = 3.1 × 10^-7, OR[C] = 2.70) and 15q13.2 (rs143536437, Pmeta = 3.2 × 10^-7, OR[C] = 1.92). Genes of biological interest at these loci include PTPRD and ARHGAP11B (encoding functions implicated in neuronal development) and GSTA4 (a phase II biotransformation enzyme). Pathway analysis using two independent methods implicated a number of pathways in the prognosis of epilepsy, including KEGG categories ‘calcium signaling pathway’ and ‘phosphatidylinositol signaling pathway’. Through a series of power curves, we conclude that it is unlikely any single common variant explains >4.4% of the variation in the outcome of newly treated epilepsy.
PMCID: PMC3857947  PMID: 23962720
2.  Novel Approach Identifies SNPs in SLC2A10 and KCNK9 with Evidence for Parent-of-Origin Effect on Body Mass Index 
Hoggart, Clive J. | Venturini, Giulia | Mangino, Massimo | Gomez, Felicia | Ascari, Giulia | Zhao, Jing Hua | Teumer, Alexander | Winkler, Thomas W. | Tšernikova, Natalia | Luan, Jian'an | Mihailov, Evelin | Ehret, Georg B. | Zhang, Weihua | Lamparter, David | Esko, Tõnu | Macé, Aurelien | Rüeger, Sina | Bochud, Pierre-Yves | Barcella, Matteo | Dauvilliers, Yves | Benyamin, Beben | Evans, David M. | Hayward, Caroline | Lopez, Mary F. | Franke, Lude | Russo, Alessia | Heid, Iris M. | Salvi, Erika | Vendantam, Sailaja | Arking, Dan E. | Boerwinkle, Eric | Chambers, John C. | Fiorito, Giovanni | Grallert, Harald | Guarrera, Simonetta | Homuth, Georg | Huffman, Jennifer E. | Porteous, David | Moradpour, Darius | Iranzo, Alex | Hebebrand, Johannes | Kemp, John P. | Lammers, Gert J. | Aubert, Vincent | Heim, Markus H. | Martin, Nicholas G. | Montgomery, Grant W. | Peraita-Adrados, Rosa | Santamaria, Joan | Negro, Francesco | Schmidt, Carsten O. | Scott, Robert A. | Spector, Tim D. | Strauch, Konstantin | Völzke, Henry | Wareham, Nicholas J. | Yuan, Wei | Bell, Jordana T. | Chakravarti, Aravinda | Kooner, Jaspal S. | Peters, Annette | Matullo, Giuseppe | Wallaschofski, Henri | Whitfield, John B. | Paccaud, Fred | Vollenweider, Peter | Bergmann, Sven | Beckmann, Jacques S. | Tafti, Mehdi | Hastie, Nicholas D. | Cusi, Daniele | Bochud, Murielle | Frayling, Timothy M. | Metspalu, Andres | Jarvelin, Marjo-Riitta | Scherag, André | Smith, George Davey | Borecki, Ingrid B. | Rousson, Valentin | Hirschhorn, Joel N. | Rivolta, Carlo | Loos, Ruth J. F. | Kutalik, Zoltán
PLoS Genetics  2014;10(7):e1004508.
The phenotypic effect of some single nucleotide polymorphisms (SNPs) depends on their parental origin. We present a novel approach to detect parent-of-origin effects (POEs) in genome-wide genotype data of unrelated individuals. The method exploits increased phenotypic variance in the heterozygous genotype group relative to the homozygous groups. We applied the method to >56,000 unrelated individuals to search for POEs influencing body mass index (BMI). Six lead SNPs were carried forward for replication in five family-based studies (of ∼4,000 trios). Two SNPs replicated: the paternal rs2471083-C allele (located near the imprinted KCNK9 gene) and the paternal rs3091869-T allele (located near the SLC2A10 gene) increased BMI equally (beta = 0.11 (SD), P<0.0027) compared to the respective maternal alleles. Real-time PCR experiments of lymphoblastoid cell lines from the CEPH families showed that expression of both genes was dependent on the parental origin of the SNP alleles (P<0.01). Our scheme opens new opportunities to exploit GWAS data of unrelated individuals to identify POEs and demonstrates that they play an important role in adult obesity.
Author Summary
Large genetic association studies have revealed many genetic factors influencing common traits, such as body mass index (BMI). These studies assume that the effect of genetic variants is the same regardless of whether they are inherited from the mother or the father. In our study, we have developed a new approach that allows us to investigate variants whose impact depends on their parental origin (parent-of-origin effects) in unrelated samples, where the parental origin cannot be inferred. This is feasible because, at genetic markers where such effects occur, there is increased variability of the trait among individuals who inherited different genetic codes from their mother and their father compared to individuals who inherited the same genetic code from both parents. We applied this methodology to discover genetic markers with parent-of-origin effects (POEs) on BMI. This resulted in six candidate markers showing strong POE association. We then attempted to replicate the POEs of these markers in family studies (where one can infer the parental origin of the inherited variants). Two of our candidates showed significant association in the family studies; the paternal and maternal effects of these markers were in opposite directions.
PMCID: PMC4117451  PMID: 25078964
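The variance argument in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' test: the function names, the allele frequency, the effect size and the "pure POE" model (paternal and maternal copies shifting the trait by equal and opposite amounts) are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_trait(n, freq, poe_beta, seed=1):
    """Simulate a trait under a hypothetical pure parent-of-origin effect.

    The paternal copy of the effect allele adds poe_beta and the maternal copy
    subtracts it, so the two effects cancel in homozygotes and only
    heterozygotes are shifted (up or down, depending on parental origin).
    Returns (heterozygote traits, homozygote traits).
    """
    rng = random.Random(seed)
    het, hom = [], []
    for _ in range(n):
        paternal = rng.random() < freq  # True if the father transmitted the effect allele
        maternal = rng.random() < freq  # True if the mother transmitted the effect allele
        trait = rng.gauss(0.0, 1.0) + poe_beta * (paternal - maternal)
        (het if paternal != maternal else hom).append(trait)
    return het, hom


def variance_ratio(het, hom):
    """Sample variance in heterozygotes divided by that in homozygotes."""
    return statistics.variance(het) / statistics.variance(hom)


if __name__ == "__main__":
    het, hom = simulate_trait(20000, 0.5, 0.5)
    print("variance ratio with POE:", round(variance_ratio(het, hom), 3))
    het0, hom0 = simulate_trait(20000, 0.5, 0.0)
    print("variance ratio without POE:", round(variance_ratio(het0, hom0), 3))
```

Note that parental origin is used only to generate the data; the ratio itself needs nothing beyond unphased genotype groups, which is why the idea applies to unrelated individuals. A real analysis would need a formal variance test and safeguards against other sources of variance heterogeneity.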
3.  Natural and Orthogonal Interaction framework for modeling gene-environment interactions with application to lung cancer 
Human heredity  2012;73(4):185-194.
We aimed at extending the natural and orthogonal interaction (NOIA) framework, developed for modeling gene-gene interactions in the analysis of quantitative traits, to allow for reduced genetic models, dichotomous traits, and gene-environment interactions. We evaluate the performance of the NOIA statistical models using simulated data and lung cancer data.
The NOIA statistical models are developed for the additive, dominant, recessive genetic models, and a binary environmental exposure. Using the Kronecker product rule, a NOIA statistical model is built to model gene-environment interactions. By treating the genotypic values as the logarithm of odds, the NOIA statistical models are extended to the analysis of case-control data.
Our simulations showed that power for testing associations while allowing for interaction using the statistical model is much higher than using functional models for most of the scenarios we simulated. When applied to the lung cancer data, much smaller P-values were obtained using the NOIA statistical model for either the main effects or the SNP-smoking interactions for some of the SNPs tested.
The NOIA statistical models are usually more powerful than the functional models in detecting main effects and interaction effects for both quantitative traits and binary traits.
PMCID: PMC3534768  PMID: 22889990
Statistical power; Genetic association studies; Case-control association analysis; Gene-environment interaction; Environmental risk factor; Association mapping; Orthogonal modeling
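The Kronecker-product construction mentioned in the abstract above can be sketched in a simplified form. The code below is not the NOIA parameterisation from the paper: it assumes a mean-centred additive genotype coding, a mean-centred binary exposure, and independence of genotype and exposure in the population. All function names and frequencies are hypothetical. Under those assumptions the four resulting columns (intercept, genetic effect, environmental effect, interaction) are mutually orthogonal.

```python
def kron_row(e_row, g_row):
    """Kronecker product of two row vectors, in pure Python."""
    return [e * g for e in e_row for g in g_row]


def centred_design(geno_freqs, exposure_freq):
    """Build one design row per (genotype, exposure) cell.

    geno_freqs maps allele counts (0, 1, 2) to population frequencies.
    Returns {(genotype, exposure): (design_row, joint_frequency)}, where the
    joint frequency assumes genotype and exposure are independent.
    """
    mean_g = sum(g * f for g, f in geno_freqs.items())
    rows = {}
    for g, fg in geno_freqs.items():
        for e, fe in ((0, 1.0 - exposure_freq), (1, exposure_freq)):
            g_row = [1.0, g - mean_g]          # intercept, centred additive term
            e_row = [1.0, e - exposure_freq]   # intercept, centred exposure term
            rows[(g, e)] = (kron_row(e_row, g_row), fg * fe)
    return rows


def weighted_inner(rows, i, j):
    """Frequency-weighted inner product of design columns i and j."""
    return sum(w * r[i] * r[j] for r, w in rows.values())


if __name__ == "__main__":
    design = centred_design({0: 0.49, 1: 0.42, 2: 0.09}, exposure_freq=0.3)
    for i in range(4):
        for j in range(i + 1, 4):
            print(i, j, round(weighted_inner(design, i, j), 12))
```

Orthogonality is what lets main effects be estimated without being confounded by the interaction term, which is the property the abstract credits for the gain in power.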
4.  Genome-wide association study of primary tooth eruption identifies pleiotropic loci associated with height and craniofacial distances 
Human Molecular Genetics  2013;22(18):3807-3817.
Twin and family studies indicate that the timing of primary tooth eruption is highly heritable, with estimates typically exceeding 80%. To identify variants involved in primary tooth eruption, we performed a population-based genome-wide association study of ‘age at first tooth’ and ‘number of teeth’ using 5998 and 6609 individuals, respectively, from the Avon Longitudinal Study of Parents and Children (ALSPAC) and 5403 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966). We tested 2 446 724 SNPs imputed in both studies. Analyses were controlled for the effect of gestational age, sex and age of measurement. Results from the two studies were combined using fixed effects inverse variance meta-analysis. We identified a total of 15 independent loci, with 10 loci reaching genome-wide significance (P < 5 × 10^-8) for ‘age at first tooth’ and 11 loci for ‘number of teeth’. Together, these associations explain 6.06% of the variation in ‘age of first tooth’ and 4.76% of the variation in ‘number of teeth’. The identified loci included eight previously unidentified loci, some containing genes known to play a role in tooth and other developmental pathways, including an SNP in the protein-coding region of BMP4 (rs17563, P = 9.080 × 10^-17). Three of these loci, containing the genes HMGA2, AJUBA and ADK, also showed evidence of association with craniofacial distances, particularly those indexing facial width. Our results suggest that the genome-wide association approach is a powerful strategy for detecting variants involved in tooth eruption, and potentially craniofacial growth and more generally organ development.
PMCID: PMC3749866  PMID: 23704328
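The fixed-effects inverse-variance meta-analysis used to combine the two cohorts in the abstract above follows a standard recipe, sketched below. The function name and the example effect sizes are hypothetical; the study's own software and per-cohort estimates are not given in the abstract.

```python
import math


def fixed_effects_meta(betas, ses):
    """Combine per-study effect estimates by inverse-variance weighting.

    betas: per-study effect estimates (e.g. regression coefficients).
    ses:   matching standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical estimates for one SNP in two cohorts.
    print(fixed_effects_meta([0.10, 0.20], [0.05, 0.05]))
```

Each study is weighted by the reciprocal of its squared standard error, so larger or more precise cohorts contribute more to the pooled estimate. The fixed-effects model assumes one shared true effect across studies.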
5.  Genome-wide association study using a high-density SNP-array and case-control design identifies a novel essential hypertension susceptibility locus in the promoter region of eNOS 
Hypertension  2011;59(2):248-255.
Essential hypertension is a multi-factorial disorder and is the main risk factor for renal and cardiovascular complications. The research on the genetics of hypertension has been frustrated by the small predictive value of the discovered genetic variants. The HYPERGENES Project investigated associations between genetic variants and essential hypertension pursuing a two-stage study by recruiting cases and controls from extensively characterized cohorts recruited over many years in different European regions.
The discovery phase consisted of 1,865 cases and 1,750 controls genotyped with a 1M Illumina array. Best hits were followed up in a validation panel of 1,385 cases and 1,246 controls that were genotyped with a custom array of 14,055 markers. We identified a new hypertension susceptibility locus (rs3918226) in the promoter region of the endothelial nitric oxide synthase (eNOS) gene (odds ratio 1.54; 95% CI 1.37-1.73; combined p = 2.58 × 10^-13). A meta-analysis, using other in-silico/de novo genotyping data for a total of 21,714 subjects, resulted in an overall odds ratio of 1.34 (95% CI 1.25-1.44, p = 1.032 × 10^-14). The quantitative analysis on a population-based sample revealed an effect size of 1.91 (95% CI 0.16-3.66) for systolic and 1.40 (95% CI 0.25-2.55) for diastolic blood pressure. We identified in-silico a potential binding site for ETS transcription-factors directly next to rs3918226, suggesting a potential modulation of eNOS expression. Biological evidence links eNOS with hypertension, as it is a critical mediator of cardiovascular homeostasis and blood pressure control via vascular tone regulation. This finding supports the hypothesis that there may be a causal genetic variation at this locus.
PMCID: PMC3272453  PMID: 22184326
genetic epidemiology; risk factors; genetics-association studies; nitric oxide; Essential Hypertension
6.  MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS 
PLoS ONE  2012;7(5):e34861.
The genome-wide association study (GWAS) approach has discovered hundreds of genetic variants associated with diseases and quantitative traits. However, despite clinical overlap and statistical correlation between many phenotypes, GWAS are generally performed one-phenotype-at-a-time. Here we compare the performance of modelling multiple phenotypes jointly with that of the standard univariate approach. We introduce a new method and software, MultiPhen, that models multiple phenotypes simultaneously in a fast and interpretable way. By performing ordinal regression, MultiPhen tests the linear combination of phenotypes most associated with the genotypes at each SNP, and thus potentially captures effects hidden to single phenotype GWAS. We demonstrate via simulation that this approach provides a dramatic increase in power in many scenarios. There is a boost in power for variants that affect multiple phenotypes and for those that affect only one phenotype. While other multivariate methods have similar power gains, we describe several benefits of MultiPhen over these. In particular, we demonstrate that other multivariate methods that assume the genotypes are normally distributed, such as canonical correlation analysis (CCA) and MANOVA, can have highly inflated type-1 error rates when testing case-control or non-normal continuous phenotypes, while MultiPhen produces no such inflation. To test the performance of MultiPhen on real data we applied it to lipid traits in the Northern Finland Birth Cohort 1966 (NFBC1966). In these data MultiPhen discovers 21% more independent SNPs with known associations than the standard univariate GWAS approach, while applying MultiPhen in addition to the standard approach provides 37% increased discovery. 
The most associated linear combinations of the lipids estimated by MultiPhen at the leading SNPs accurately reflect the Friedewald Formula, suggesting that MultiPhen could be used to refine the definition of existing phenotypes or uncover novel heritable phenotypes.
PMCID: PMC3342314  PMID: 22567092
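For reference, the Friedewald formula that the abstract above says the estimated lipid combinations resemble is itself a fixed linear combination of three measured lipids. The sketch below encodes the standard clinical formula, which is not restated in the abstract; the function name is hypothetical.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, units="mg/dL"):
    """Estimate LDL cholesterol with the Friedewald formula.

    LDL = total cholesterol - HDL - triglycerides / 5    (values in mg/dL)
    LDL = total cholesterol - HDL - triglycerides / 2.2  (values in mmol/L)
    The approximation is usually considered unreliable at very high
    triglyceride levels (around 400 mg/dL and above).
    """
    divisor = {"mg/dL": 5.0, "mmol/L": 2.2}[units]
    return total_chol - hdl - triglycerides / divisor


if __name__ == "__main__":
    print(friedewald_ldl(200.0, 50.0, 150.0))
```

Because LDL computed this way is a linear function of the other lipid measurements, a method that searches for the linear combination of phenotypes most associated with a SNP can in principle rediscover the same weights.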
7.  TTC12-ANKK1-DRD2 and CHRNA5-CHRNA3-CHRNB4 influence different pathways leading to smoking behaviour from adolescence to mid-adulthood 
Biological psychiatry  2010;69(7):650-660.
CHRNA5-CHRNA3-CHRNB4 and TTC12-ANKK1-DRD2 gene-clusters influence smoking behavior. Our aim was to test developmental changes in their effects as well as the interplays between them and with non-genetic factors.
Participants included 4,762 subjects from a general population based prospective Northern Finland 1966 Birth Cohort (NFBC 1966). Smoking behavior was collected at age 14 and 31 years (y). Information on maternal smoking, socio-economic status, and novelty seeking were also collected. Structural equation modeling was used to construct an integrative etiological model including genetic and non-genetic factors.
Several SNPs in both gene-clusters were significantly associated with smoking. The most significant were in CHRNA3 (rs1051730, P = 1.1 × 10^-5) and in TTC12 (rs10502172, P = 9.1 × 10^-6). CHRNA3-rs1051730[A] was more common among heavy/regular smokers than non-smokers with similar effect-sizes at age 14y [OR(95%CI):1.27(1.06–1.52)] and 31y [1.28(1.13–1.44)]. TTC12-rs10502172[G] was more common among smokers than non-smokers with stronger association at 14y [1.33(1.11–1.60)] than 31y [1.14(1.02–1.28)]. In adolescence, carriers of three to four risk alleles at either CHRNA3-rs1051730 or TTC12-rs10502172 had almost 3-fold higher odds of smoking regularly than subjects with no risk alleles. The effect of TTC12-rs10502172 on smoking in adulthood was mediated by its effect on smoking in adolescence and via novelty seeking. The effect of CHRNA3-rs1051730 on smoking in adulthood was direct.
TTC12-ANKK1-DRD2 seemed to influence smoking behavior mainly in adolescence, and its effect is partially mediated by personality characteristics promoting drug-seeking behavior. In contrast, CHRNA5-CHRNA3-CHRNB4 is involved in the transition towards heavy smoking in mid-adulthood and in smoking persistence. Factors related to familial and social disadvantages were strong independent predictors of smoking.
PMCID: PMC3058144  PMID: 21168125
TTC12; ANKK1; DRD2; CHRNA5; CHRNA3; CHRNB4; nicotine; gene; adolescence; smoking
8.  Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels 
Nature genetics  2009;41(11):1170-1172.
We carried out a genome-wide association study of hemoglobin levels in 16,001 individuals of European and Indian Asian ancestry. The most closely associated SNP (rs855791) results in a nonsynonymous (V736A) change in the serine protease domain of TMPRSS6 and a blood hemoglobin concentration 0.13 (95% CI 0.09–0.17) g/dl lower per copy of allele A (P = 1.6 × 10^-13). Our findings suggest that TMPRSS6, a regulator of hepcidin synthesis and iron handling, is crucial in hemoglobin level maintenance.
PMCID: PMC3178047  PMID: 19820698
10.  Genome-Wide Association Study Reveals Multiple Loci Associated with Primary Tooth Development during Infancy 
PLoS Genetics  2010;6(2):e1000856.
Tooth development is a highly heritable process which relates to other growth and developmental processes, and which interacts with the development of the entire craniofacial complex. Abnormalities of tooth development are common, with tooth agenesis being the most common developmental anomaly in humans. We performed a genome-wide association study of time to first tooth eruption and number of teeth at one year in 4,564 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966) and 1,518 individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC). We identified 5 loci at P < 5 × 10^-8, and 5 with suggestive association (P < 5 × 10^-6). The loci included several genes with links to tooth and other organ development (KCNJ2, EDA, HOXB2, RAD51L1, IGF2BP1, HMGA2, MSRB3). Genes at four of the identified loci are implicated in the development of cancer. A variant within the HOXB gene cluster associated with occlusion defects requiring orthodontic treatment by age 31 years.
Author Summary
Genome-wide association studies have been used to identify genetic variants conferring susceptibility to diseases, intermediate phenotypes, and physiological traits such as height, hair color, and age at menarche. Here we analyze the NFBC1966 and ALSPAC birth cohorts to investigate the genetic determinants of a key developmental process: primary tooth development. The prospective nature of our studies allows us to exploit accurate measurements of age at first tooth eruption and number of teeth at one year, and also provides the opportunity to assess whether genetic variants affecting these traits are associated with dental problems later in the life course. Of the genes that we find to be associated with primary tooth development, several have established roles in tooth development and growth, and almost half have proposed links with the development of cancer. We find that one of the variants is also associated with occlusion defects requiring orthodontic treatment later in life. Our findings should provide a strong foundation for the study of the genetic architecture of tooth development, which as well as its relevance to medicine and dentistry, may have implications in evolutionary biology since teeth represent important markers of evolution.
PMCID: PMC2829062  PMID: 20195514
11.  Pathway Analysis of GWAS Provides New Insights into Genetic Susceptibility to 3 Inflammatory Diseases 
PLoS ONE  2009;4(11):e8068.
Although the introduction of genome-wide association studies (GWAS) has greatly increased the number of genes associated with common diseases, only a small proportion of the predicted genetic contribution has so far been elucidated. Studying the cumulative variation of polymorphisms in multiple genes acting in functional pathways may provide a complementary approach to the more common single SNP association approach in understanding genetic determinants of common disease. We developed a novel pathway-based method to assess the combined contribution of multiple genetic variants acting within canonical biological pathways and applied it to data from 14,000 UK individuals with 7 common diseases. We tested inflammatory pathways for association with Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D) with 4 non-inflammatory diseases as controls. Using a variable selection algorithm, we identified variants responsible for the pathway association and evaluated their use for disease prediction using a 10-fold cross-validation framework in order to calculate the out-of-sample area under the receiver operating characteristic curve (AUC). The generalisability of these predictive models was tested on an independent birth cohort from Northern Finland. Multiple canonical inflammatory pathways showed highly significant associations (p-values from 10^-3 to 10^-20) with CD, T1D and RA. Variable selection identified on average a set of 205 SNPs (149 genes) for T1D, 350 SNPs (189 genes) for RA and 493 SNPs (277 genes) for CD. The pattern of polymorphisms at these SNPs was found to be highly predictive of T1D (91% AUC) and RA (85% AUC), and weakly predictive of CD (60% AUC). The T1D model (without any parameter refitting) had good predictive ability (79% AUC) in the Finnish cohort. Our analysis suggests that genetic contribution to common inflammatory diseases operates through multiple genes interacting in functional pathways.
PMCID: PMC2778995  PMID: 19956648
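The out-of-sample AUC from 10-fold cross-validation described in the abstract above can be sketched generically. This is an illustration only: the interleaved fold assignment, the pooling of held-out scores into a single AUC, and the toy one-feature model `fit_sign` are all assumptions made here, not details taken from the study.

```python
def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validated_auc(features, labels, fit, k=10):
    """Score every sample with a model trained on the other folds only."""
    n = len(labels)
    scores = [0.0] * n
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            scores[i] = model(features[i])
    return auc(scores, labels)


def fit_sign(train_x, train_y):
    """Toy model: orient a single feature so that cases tend to score higher."""
    cases = [x for x, y in zip(train_x, train_y) if y == 1]
    controls = [x for x, y in zip(train_x, train_y) if y == 0]
    sign = 1.0 if sum(cases) / len(cases) >= sum(controls) / len(controls) else -1.0
    return lambda x: sign * x


if __name__ == "__main__":
    xs = list(range(20))
    ys = [0] * 10 + [1] * 10
    print(cross_validated_auc(xs, ys, fit_sign, k=10))
```

The key property is that no sample is scored by a model that saw it during fitting. For the estimate to be honest, any variable selection step has to be repeated inside each training fold as well.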
12.  Genome-wide association analysis of metabolic traits in a birth cohort from a founder population 
Nature genetics  2008;41(1):35-46.
Genome-wide association studies (GWAS) of longitudinal birth cohorts enable joint investigation of environmental and genetic influences on complex traits. We report GWAS results for nine quantitative metabolic traits (triglycerides, high-density lipoprotein, low-density lipoprotein, glucose, insulin, C-reactive protein, body mass index, and systolic and diastolic blood pressure) in the Northern Finland Birth Cohort 1966 (NFBC1966), drawn from the most genetically isolated Finnish regions. We replicate most previously reported associations for these traits and identify nine new associations, several of which highlight genes with metabolic functions: high-density lipoprotein with NR1H3 (LXRA), low-density lipoprotein with AR and FADS1-FADS2, glucose with MTNR1B, and insulin with PANK1. Two of these new associations emerged after adjustment of results for body mass index. Gene-environment interaction analyses suggested additional associations, which will require validation in larger samples. The currently identified loci, together with quantified environmental exposures, explain little of the trait variation in NFBC1966. The association observed between low-density lipoprotein and an infrequent variant in AR suggests the potential of such a cohort for identifying associations with both common, low-impact and rarer, high-impact quantitative trait loci.
PMCID: PMC2687077  PMID: 19060910
13.  Genetic Determinants of Height Growth Assessed Longitudinally from Infancy to Adulthood in the Northern Finland Birth Cohort 1966 
PLoS Genetics  2009;5(3):e1000409.
Recent genome-wide association (GWA) studies have identified dozens of common variants associated with adult height. However, it is unknown how these variants influence height growth during childhood. We derived peak height velocity in infancy (PHV1) and puberty (PHV2) and timing of pubertal height growth spurt from parametric growth curves fitted to longitudinal height growth data to test their association with known height variants. The study consisted of N = 3,538 singletons from the prospective Northern Finland Birth Cohort 1966 with genotype data and frequent height measurements (on average 20 measurements per person) from 0–20 years. Twenty-six of the 48 variants tested associated with adult height (p<0.05, adjusted for sex and principal components) in this sample, all in the same direction as in previous GWA scans. Seven SNPs in or near the genes HHIP, DLEU7, UQCC, SF3B4/SV2A, LCORL, and HIST1H1D associated with PHV1 and five SNPs in or near SOCS2, SF3B4/SV2A, C17orf67, CABLES1, and DOT1L with PHV2 (p<0.05). We formally tested variants for interaction with age (infancy versus puberty) and found biologically meaningful evidence for an age-dependent effect for the SNP in SOCS2 (p = 0.0030) and for the SNP in HHIP (p = 0.045). We did not have similar prior evidence for the association between height variants and timing of pubertal height growth spurt as we had for PHVs, and none of the associations were statistically significant after correction for multiple testing. The fact that in this sample, less than half of the variants associated with adult height had a measurable effect on PHV1 or PHV2 is likely to reflect limited power to detect these associations in this dataset. Our study is the first genetic association analysis on longitudinal height growth in a prospective cohort from birth to adulthood and gives grounding for future research on the genetic regulation of human height during different periods of growth.
Author Summary
Family studies have shown that adult height is largely genetically determined. Identification of common genetic factors has been expedited with recent advances in genotyping techniques. However, factors regulating childhood height growth remain unclear. We investigated genetic variants of adult height for associations with peak height velocity in infancy (PHV1) and puberty (PHV2) and timing of pubertal growth spurt in a population based sample of 3,538 Finns born in 1966. Most variants studied associated with adult height in this sample. Of the 48 genetic variants tested, seven of them associated with PHV1 and five with PHV2. However, only one of these associated with both, and we found suggestive evidence for differential effects at different stages of growth for some of the variants. In this sample, less than half of the variants associated with adult height had a measurable effect on PHV1 or PHV2. However, these differences may reflect lower statistical power to detect associations with height velocities compared to adult height. This study provides a foundation for further biological investigation into the genes acting at each stage of height growth.
PMCID: PMC2646138  PMID: 19266077
14.  Fregene: Simulation of realistic sequence-level data in populations and ascertained samples 
BMC Bioinformatics  2008;9:364.
FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. We describe here the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets.
We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation, and one with a complex pattern of selection.
FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. Its principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended.
PMCID: PMC2542380  PMID: 18778480
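To make the contrast with coalescent simulators concrete, the sketch below shows the simplest forward-in-time scheme: a single-locus Wright-Fisher model with genic selection. It is a textbook toy and not FREGENE's algorithm, which handles sequence-level data, recombination and demography; the function name and parameters are hypothetical.

```python
import random


def wright_fisher(pop_size, start_freq, selection, generations, seed=0):
    """Track one allele's frequency forwards through time.

    pop_size:   number of diploid individuals (2 * pop_size gene copies).
    start_freq: initial frequency of the allele.
    selection:  selection coefficient s; carriers of the allele have relative
                fitness 1 + s per copy (genic selection).
    Returns the list of frequencies, one per generation, including the start.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Selection shifts the expected frequency before sampling.
        expected = freq * (1.0 + selection) / (1.0 + selection * freq)
        # Genetic drift: binomial sampling of the next generation's gene copies.
        count = sum(rng.random() < expected for _ in range(copies))
        freq = count / copies
        trajectory.append(freq)
    return trajectory


if __name__ == "__main__":
    path = wright_fisher(pop_size=500, start_freq=0.1, selection=0.05, generations=100)
    print("final frequency:", path[-1])
```

Because each generation is produced explicitly from the previous one, selection (and, in a fuller model, migration or changing population size) can be applied at every step, which is the flexibility that forward simulation offers at the cost of more computation.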
15.  Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies 
PLoS Genetics  2008;4(7):e1000130.
Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
Author Summary
Tests of association with disease status are normally conducted one SNP at a time, ignoring the effects of all other genotyped SNPs. We developed a computationally efficient method to simultaneously analyse all SNPs, either in a genome-wide association (GWA) study, or a fine-mapping study based on re-sequencing and/or imputation. The method selects a subset of SNPs that best predicts disease status, while controlling the type-I error of the selected SNPs. This brings many advantages over standard single-SNP approaches, because the signal from a particular SNP can be more clearly assessed when other SNPs associated with disease status are already included in the model. Thus, in comparison with single-SNP analyses, power is increased and the false positive rate is reduced because of reduced residual variation. Localisation is also greatly improved. We demonstrate these advantages over the widely used single-SNP Armitage Trend Test using GWA simulation studies, a real GWA dataset, and a sequence-based fine-mapping simulation study.
PMCID: PMC2464715  PMID: 18654633
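The single-SNP Armitage trend test that the summary above uses as its baseline can be written in a few lines. The sketch below uses the standard Cochran-Armitage statistic with additive weights (0, 1, 2) and a 1-degree-of-freedom chi-square p-value; the function name and the genotype counts are hypothetical.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    case_counts / control_counts: counts of individuals carrying 0, 1 and 2
    copies of the tested allele. Returns (chi_square, p_value) with 1 df.
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n = sum(totals)
    n_cases = sum(case_counts)
    n_controls = n - n_cases
    sum_wr = sum(w * r for w, r in zip(weights, case_counts))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    numerator = n * (n * sum_wr - n_cases * sum_wn) ** 2
    denominator = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


if __name__ == "__main__":
    print(armitage_trend_test([50, 100, 150], [150, 100, 50]))
```

Each SNP is tested in isolation here, which is exactly the limitation the summary describes: signal from other associated SNPs stays in the residual variation instead of being modelled.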

