Associations between multilocus heterozygosity and fitness traits, also termed heterozygosity-fitness correlations (HFCs), have been reported in numerous organisms. These studies, in general, indicate a positive relationship between heterozygosity and fitness traits. We studied the association between genome-wide heterozygosity at 706 non-synonymous and synonymous SNPs and 19 quantitative traits, including morphological, biochemical and fitness traits, in the Framingham Heart Study. Statistically significant association was found between heterozygosity and systolic and diastolic blood pressures as well as left ventricular diameter and wall thickness. These results suggest that heterozygosity may be associated with traits, such as blood pressure, that closely track environmental variation. Balancing selection may be operating in the maintenance of heterozygosity and the major components of blood pressure and hypertension. Genome-wide SNP heterozygosity may be used to understand the phenomenon of dominance as well as the evolutionary basis of many quantitative traits in humans.
Genome wide heterozygosity; Single Nucleotide Polymorphisms; Balancing selection; inbreeding; association; plastic traits; dominance
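The abstract above does not state how per-individual heterozygosity was computed. A common summary is the proportion of typed loci at which an individual is heterozygous; the following is a minimal sketch under that assumption. The function name `multilocus_heterozygosity` and the 0/1/2 genotype coding (count of minor alleles, with `None` for a missing call) are illustrative choices, not taken from the study.

```python
from typing import Optional, Sequence


def multilocus_heterozygosity(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Proportion of typed SNPs at which one individual is heterozygous.

    Assumed coding (hypothetical): 0 or 2 = homozygous, 1 = heterozygous,
    None = missing genotype call.
    """
    typed = [g for g in genotypes if g is not None]
    if not typed:
        return None  # no usable genotypes for this individual
    n_het = sum(1 for g in typed if g == 1)
    return n_het / len(typed)


# Example: four typed SNPs, two of them heterozygous.
print(multilocus_heterozygosity([0, 1, 2, 1]))
```

The resulting score could then be used as a predictor of each quantitative trait, for instance in a regression adjusted for covariates.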
Next generation sequencing holds great promise for detecting rare variants underlying complex human traits. Due to their extremely low allele frequencies, the normality approximation for a proportion no longer works well. The Fisher’s exact method appears to be suitable but it is conservative. We investigate the utility of various variance-stabilizing transformations in single marker association analysis on rare variants. Unlike a proportion itself, the variance of the transformed proportions no longer depends on the proportion, making application of such transformations to rare variant association analysis extremely appealing. Simulation studies demonstrate that tests based on such transformations are more powerful than the Fisher’s exact test while controlling for type I error rate. Based on theoretical considerations and results from simulation studies, we recommend the test based on the Anscombe transformation over tests with other transformations.
rare variants; sequencing; variance-stabilizing transformation; association
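As an illustration of the variance-stabilizing idea, the sketch below applies Anscombe's arcsine transformation for a binomial count, arcsin(sqrt((x + 3/8) / (n + 3/4))), whose variance is approximately 1/(4n + 2) regardless of the underlying proportion. The two-sample z statistic built on it is an assumed, simplified form of a single-marker test, not necessarily the exact statistic used in the paper; the function names and the choice of n as the number of sampled chromosomes are hypothetical.

```python
import math


def anscombe(x: int, n: int) -> float:
    """Anscombe's arcsine transform of a binomial count x out of n trials."""
    return math.asin(math.sqrt((x + 0.375) / (n + 0.75)))


def anscombe_z(x_case: int, n_case: int, x_ctrl: int, n_ctrl: int) -> float:
    """Two-sample z statistic on the transformed scale.

    x_* are minor-allele counts and n_* are numbers of chromosomes
    (an assumption made for this sketch). The approximate variance of
    each transformed proportion is 1 / (4n + 2).
    """
    diff = anscombe(x_case, n_case) - anscombe(x_ctrl, n_ctrl)
    se = math.sqrt(1.0 / (4 * n_case + 2) + 1.0 / (4 * n_ctrl + 2))
    return diff / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Example: 10 vs 2 copies of a rare allele among 2,000 chromosomes each.
z = anscombe_z(10, 2000, 2, 2000)
print(z, two_sided_p(z))
```

Because the standard error depends only on the sample sizes, it stays well defined even when one group carries no copies of the allele, which is where the untransformed normal approximation breaks down.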
To identify novel late-onset Alzheimer disease (LOAD) risk genes, we have analyzed Amish populations of Ohio and Indiana. We performed genome-wide SNP linkage and association studies on 798 individuals (109 with LOAD). We tested association using the Modified Quasi-Likelihood Score (MQLS) test and also performed two-point and multipoint linkage analyses. We found that LOAD was significantly associated with APOE (P=9.0×10−6) in all our ascertainment regions except for the Adams County, Indiana, community (P=0.55). Genome-wide, the most strongly associated SNP was rs12361953 (P=7.92×10−7). A very strong, genome-wide significant multipoint peak (recessive HLOD=6.14, dominant HLOD=6.05) was detected on 2p12. Three additional loci with multipoint HLOD scores >3 were detected on 3q26, 9q31, and 18p11. Combining the linkage and association results, the most significantly associated SNP under the 2p12 peak was rs2974151 (P=1.29×10−4). This SNP is located in CTNNA2, which encodes catenin alpha 2, a neuronal-specific catenin known to function in the developing brain. These results identify CTNNA2 as a novel candidate LOAD gene, and implicate three other regions of the genome as novel LOAD loci. These results underscore the utility of using family-based linkage and association analysis in isolated populations to identify novel loci for traits with complex genetic architecture.
GWAS; Linkage; founder population; Amish; Alzheimer
Successful aging (SA) is a multi-dimensional phenotype involving preservation of cognitive ability, physical function, and social engagement throughout life. Multiple components of SA are heritable, supporting a genetic component. The Old Order Amish are genetically and socially isolated with homogeneous lifestyles, making them a suitable population for studying the genetics of SA. DNA and measures of SA were collected on 214 cognitively intact Amish individuals over age 80. Individuals were grouped into a 13-generation pedigree using the Anabaptist Genealogy Database. A linkage screen of 5,944 single nucleotide polymorphisms (SNPs) was performed using 12 informative sub-pedigrees with affected-only 2-point and multipoint linkage analyses. Eleven SNPs produced 2-point LOD scores >2, suggestive of linkage. Multipoint linkage analyses, allowing for heterogeneity, detected significant LOD scores on chromosomes 6 (HLOD = 4.50), 7 (LOD* = 3.11), and 14 (HLOD = 4.17), suggesting multiple new loci underlying SA.
Amish; longevity; genetic epidemiology; family-based study; population isolate
Dizygotic (DZ) twinning has a genetic component and is common among sub-Saharan Africans; in The Gambia its frequency is up to 3% of live births. Variation in Pentraxin 3 (PTX3), a soluble pattern recognition receptor that plays an important role both in humoral innate immunity and in female fertility, has been associated with resistance to M. tuberculosis infection and to P. aeruginosa infection in cystic fibrosis patients. We tested whether PTX3 variants in Gambian women associate with DZ twinning by genotyping five PTX3 single nucleotide polymorphisms (SNPs) in 130 sister pairs (96 full sibs and 34 half sibs) who had DZ twins. We found that two-, three- and five-SNP haplotypes differed in frequency between twinning mothers and those without a history of twinning (from p = 0.006 to 3.03e-06 for two-SNP and three-SNP haplotypes, respectively). Twinning mothers and West African tuberculosis controls from a previous study shared several frequent haplotypes. Most importantly, our data are consistent with the previously reported association of PTX3 and female fertility in a West African sample from Ghana. Taken together, these results indicate that selective pressure on PTX3 variants that affect the innate immune response to infectious agents could also produce the observed high incidence of DZ twinning in Gambians.
dizygotic twinning; fertility; innate immunity; Pentraxin 3; The Gambia; Africa
A variant allele, ADH1B*48His, also known as ADH1B*2, at the human Alcohol Dehydrogenase 1B gene (ADH1B) is strongly associated with alcoholism in some populations and has an unusual geographic distribution. Strong evidence implies selection has increased the frequency of this allele in some East Asian populations but does not fully explain its geographic pattern. We have studied haplotypes of ten single nucleotide polymorphisms (SNPs) and two short tandem repeat polymorphisms (STRPs) in the ADH1B region in 2,206 individuals from a worldwide set of populations. These SNPs and STRPs define nine common haplogroups most of which have distinct geographic patterns. The haplogroups H5 and H6, both with the derived ADH1B*48His allele, appear restricted to the Middle East and East Asia, respectively. The positively selected H7 is derived from H6 by a new regulatory region variant defining SNP rs3811801 restricted to East Asia. Age estimates of the haplogroups based on the STRPs also agree with the time of the migration events estimated by other studies. H7 is estimated to have expanded recently, around 2,800 years ago, and ancient DNA samples from North China confirm its presence about that time. The dating of the H7 expansion may help understand the selective force on the ADH1B gene.
ADH1B; Haplotype evolution; Recent expansion; Geographic distribution
In the analysis of case-control genetic association, the trend test and Pearson’s test are the two most commonly used tests. In genome-wide association studies (GWAS), Bayes factor is a useful tool to support significant p-values, and a better measure than p-value when results are compared across studies with different sample sizes. When reporting the p-value of the trend test, we propose a Bayes factor directly based on the trend test. To improve the power to detect association under recessive or dominant genetic models, we propose a Bayes factor based on the trend test and incorporating Hardy-Weinberg disequilibrium in cases. When the true model is unknown, or both the trend test and Pearson’s test or other robust tests are applied in genome-wide scans, we propose a joint Bayes factor, combining the previous two Bayes factors. All three Bayes factors studied in this paper have closed forms and are easy to compute without integrations, so they can be reported along with p-values, especially in GWAS. We discuss how to use each of them and how to specify priors. Simulation studies and applications to three GWAS are provided to illustrate their usefulness to detect non-additive gene susceptibility in practice.
Bayes factor; Comparing association studies; Genome-wide association studies; Hardy-Weinberg disequilibrium; Pearson’s test; Trend test
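The closed-form Bayes factors proposed in the abstract above are not reproduced here, since the abstract does not give them. As background only, the sketch below computes the Cochran-Armitage trend test statistic on which they build, using additive genotype scores (0, 1, 2) for a 2×3 case-control table. The function names and the example counts are hypothetical.

```python
import math
from typing import Sequence


def trend_chi2(cases: Sequence[int], controls: Sequence[int],
               weights: Sequence[float] = (0.0, 1.0, 2.0)) -> float:
    """Cochran-Armitage trend statistic (1 df) for a 2 x 3 genotype table.

    cases / controls hold genotype counts ordered by number of risk alleles.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]  # genotype column totals

    # Numerator: weighted difference between case and control counts.
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))

    # Variance of T under the null hypothesis of no association.
    quad = sum(w * w * c * (N - c) for w, c in zip(weights, col))
    cross = sum(weights[i] * weights[j] * col[i] * col[j]
                for i in range(len(col)) for j in range(i + 1, len(col)))
    var = (R * S / N) * (quad - 2.0 * cross)
    return T * T / var


def chi2_1df_p(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Made-up counts: cases enriched for the risk allele.
stat = trend_chi2((10, 20, 30), (30, 20, 10))
print(stat, chi2_1df_p(stat))
```

With additive scores the statistic equals N times the squared correlation between genotype score and case status, which is one way to sanity-check an implementation.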
Testing multiple markers simultaneously not only can capture the linkage disequilibrium patterns but also can decrease the number of tests and thus alleviate the multiple-testing penalty. If a gene is associated with a phenotype, subjects with similar genotypes in this gene should also have similar phenotypes. Based on this concept, we have developed a general framework that is applicable to continuous traits. Two similarity-based tests (namely, SIMc and SIMp tests) were derived as special cases of the general framework. In our simulation study, we compared the power of the two tests with that of the single-marker analysis, a standard haplotype regression, and a popular and powerful kernel machine regression. Our SIMc test outperforms other tests when the average r-square (a measure of linkage disequilibrium) between the causal variant and the surrounding markers is larger than 0.3 or when the causal allele is common (say, frequency = 0.3). Our SIMp test outperforms other tests when the causal variant was introduced at common haplotypes (the maximum frequency of risk haplotypes > 0.4). We also applied our two tests to an adiposity data set to show their utility.
Haplotype; Similarity; Genomic distance; Linkage disequilibrium; Multi-marker test; Body-mass index; CPE gene
It is now well established that population stratification can result in spurious association findings in genetic case-control studies. However, very few studies have addressed similar issues for mapping quantitative traits. Since quantitative phenotypes are often precursors of clinical end-point traits and carry more information on within-genotype trait variability, it has been argued that studying these quantitative traits may be a more powerful strategy to map genes than the binary clinical end-points. Thus, it is of interest to evaluate the adverse effects of population stratification on the analyses of quantitative traits. The popular statistical tests of association for quantitative traits using population level data are ANOVA, linear regression with an additive allelic effect and Kruskal-Wallis. We have theoretically studied the marginal effects of genetic heterogeneity and phenotypic heterogeneity as well as their joint effects on the false positive rate of the three tests mentioned above. We have carried out extensive simulations under different genetic models and probability distributions of quantitative traits to assess the rate of false positives in the presence of population stratification. We find that the rate of false positives increases at a very fast rate with simultaneous increase in differences in the standardized phenotypic means and marker allele frequencies in the subpopulations.
allelic additivity; ANOVA; genetic heterogeneity; Kruskal-Wallis test; phenotypic heterogeneity
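The mechanism behind these false positives can be illustrated with a small deterministic toy example: two subpopulations that differ in both allele frequency and phenotypic mean, with no genotype effect inside either one. The sketch below fits the additive linear regression slope within each subpopulation and in the pooled sample. All data values and names are invented for illustration and are not from the study.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (additive allelic effect estimate)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Subpopulation A: allele is rare, phenotype mean is 2 for every genotype.
geno_a = [0, 0, 0, 0, 1, 1]
pheno_a = [1, 3, 1, 3, 1, 3]

# Subpopulation B: allele is common, phenotype mean is 5 for every genotype.
geno_b = [1, 1, 2, 2, 2, 2]
pheno_b = [4, 6, 4, 6, 4, 6]

slope_a = ols_slope(geno_a, pheno_a)                      # no effect within A
slope_b = ols_slope(geno_b, pheno_b)                      # no effect within B
slope_pooled = ols_slope(geno_a + geno_b, pheno_a + pheno_b)

print(slope_a, slope_b, slope_pooled)
```

The pooled slope is non-zero only because allele frequency and phenotypic mean differ together between the two groups, which matches the abstract's finding that false positives grow as both differences increase.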
Orofacial clefts are common birth defects with strong evidence for both genetic and environmental causal factors. Candidate-gene studies combined with exposures known to influence the outcome provide a highly targeted approach to detecting GxE interactions. We developed a new statistical approach that combines the case-control and offspring-parent triad designs into a “hybrid design” to search for GxE interactions among 334 autosomal cleft candidate genes and maternal first-trimester exposure to smoking, alcohol, coffee, folic acid supplements, dietary folate, and vitamin A. The study population comprised 425 case-parent triads of isolated clefts and 562 control-parent triads derived from a nationwide study of orofacial clefts in Norway (1996-2001). A full maximum-likelihood model was used in combination with a Wald test statistic to screen for statistically significant GxE interaction between strata of exposed and unexposed mothers. In addition, we performed pathway-based analyses on 28 detoxification genes and 21 genes involved in folic acid metabolism. With the possible exception of the T-box 4 gene (TBX4) and dietary folate interaction in isolated CPO, there was little evidence overall of GxE interaction in our data. This study is the largest to date aimed at detecting interactions between orofacial clefts candidate genes and well-established risk exposures.
Birth defects; orofacial cleft; cleft lip; cleft palate; genetic epidemiology
Fragile X Syndrome (FXS) is characterized by moderate to severe intellectual disability which is accompanied by macroorchidism and distinct facial morphology. FXS is caused by the expansion of the CGG trinucleotide repeat in the 5′ untranslated region of the Fragile X mental retardation 1 (FMR1) gene. The syndrome has been studied in ethnically diverse populations around the world and has been extensively characterized in several populations. Similar to other trinucleotide expansion disorders, the gene specific instability of FMR1 is not accompanied by genomic instability. Currently we do not have a comprehensive understanding of the molecular underpinnings of gene specific instability associated with tandem repeats. Molecular evidence from in vitro experiments and animal models supports several pathways for gene specific trinucleotide repeat expansion. However, whether the mechanisms reported from other systems contribute to trinucleotide repeat expansion in humans is not clear. To understand how repeat instability in humans could occur, the CGG repeat expansion is explored through molecular analysis and population studies which characterized CGG repeat alleles of FMR1. Finally, the review discusses the relevance of these studies in understanding the mechanism of trinucleotide repeat expansion in FXS.
FMR1 gene; fragile x mutation; prevalence
A genome-wide association study of serum uric acid levels was performed in a relatively isolated population of European descent from an island off the Adriatic coast of Croatia. The study sample included 532 unrelated and 768 related individuals from 235 pedigrees. Inflation due to relatedness was controlled by using genomic control. Genetic association was assessed with 2,241,249 SNPs in 1300 samples after adjusting for age and gender. Our study replicated four previously reported serum uric acid loci (SLC2A9, ABCG2, RREB1, and SLC22A12). The strongest association was found with a SNP in SLC2A9 (rs13129697, P=2.33×10−19), which exhibited significant gender-specific effects: 35.76 μmol/L (P=2.11×10−19) in females and 19.58 μmol/L (P=5.40×10−5) in males. Within this region of high linkage disequilibrium, we also detected a strong association with a non-synonymous SNP, rs16890979 (P=2.24×10−17), a putative causal variant for serum uric acid variation. In addition, we identified several novel loci suggestive of association with uric acid levels (SEMA5A, TMEM18, SLC28A2, and ODZ2), although the P-values (P<5×10−6) did not reach the threshold of genome-wide significance. Together, these findings provide further confirmation of previously reported uric acid-related genetic variants and highlight suggestive new loci for additional investigation.
Serum uric acid; genome-wide association; Adriatic island population
Primary ciliary dyskinesia (PCD) is a genetic disorder, usually autosomal recessive, causing early respiratory disease and later subfertility. Whole exome sequencing may enable efficient analysis for locus heterogeneous disorders such as PCD. We whole exome sequenced one consanguineous Saudi Arabian with clinically diagnosed PCD and normal laterality, to attempt ab initio molecular diagnosis.
We reviewed thirteen known PCD genes and potentially autozygous regions (extended homozygosity) for homozygous exon deletions, non-dbSNP codon, splice-site base variants or small indels. Homozygous non-dbSNP changes were also reviewed exome-wide.
One single molecular read representing RSPH9 p.Lys268del was observed, with no wildtype reads, and a notable deficiency of mapped reads at this location. Among all observations, RSPH9 was the strongest candidate for causality. Searching unmapped reads revealed seven more mutant reads. Direct assay for p.Lys268del (MboII digest) confirmed homozygosity in the affected individual, then confirmed homozygosity in three siblings with bronchiectasis. Our finding in southwest Saudi Arabia indicates that p.Lys268del, previously observed in two Bedouin families (Israel, UAE) is geographically widespread in the Arabian Peninsula. Analogous with cystic fibrosis CFTR p.Phe508del, screening for RSPH9 p.Lys268del (which lacks sentinel dextrocardia) in those at risk would help in early diagnosis, tailored clinical management, genetic counselling and primary prevention.
high-throughput nucleotide sequencing; primary ciliary dyskinesia; screening
We narrowed chromosome 15q21-23 linkage to plasma high density lipoprotein cholesterol (HDL-C) levels in atherogenic dyslipidemic Turkish families by fine mapping, then focused on glucuronic acid epimerase (GLCE), a heparan sulfate proteoglycan (HSPG) biosynthesis enzyme. HSPGs participate in lipid metabolism along with apolipoprotein (apo) E. Of 31 SNPs in the GLCE locus, nine analyzed by haplotype were associated with plasma HDL-C and triglyceride levels (permuted p = 0.006 and 0.013, respectively) in families. Of five tagging GLCE SNPs in two cohorts of unrelated subjects, three (rs16952868, rs11631403, rs3865014) were associated with triglyceride and HDL-C levels in males (non-permuted p < 0.05). The association was stronger in APOE 2/3 subjects (apoE2 has reduced binding to HSPGs) and reached multiple-testing significance (p < 0.05) in both males and females (n = 2612). Similar results were obtained in the second cohort (n = 1164). Interestingly, at the GLCE locus, bounded by recombination hotspots, Turks had a minor allele frequency of SNPs resembling Chinese more than European ancestry; adjoining regions on chromosome 15 resembled the European pattern. Studies of glce+/–apoe–/– mice fed a chow or high-fat diet supported a role for GLCE in lipid metabolism. Thus, SNPs in GLCE are associated with triglyceride and HDL-C levels in Turks, and mouse studies support a role for glce in lipid metabolism.
Uncovering the genetic basis of complex diseases caused by rare variants has become a research focus. As the vast majority of genomic variants represent background variation, highlighting potentially causal mutations through a weighting scheme is critical to the success of association studies aimed at rare variants. In this study, we propose a novel Bayesian marker selection approach to perform a weighting-based association test. In this approach, the individual association signal of each variant and its direction are used to weight variants. In addition, the predicted biological function of variants is taken as prior information to direct the selection of likely causal variants. Simulation studies show that the proposed method has improved power over several existing methods in certain conditions. Analyses of two empirical datasets demonstrate its applicability.
weighting; Bayesian marker selection; rare variants; association
A large number of linkage and association studies of complex diseases focus on analysis of a more common or more easily measured disease endophenotype. The motivation for this approach is that there is a pleiotropic locus common to both the disease and the endophenotype and that this locus is a major genetic determinant of the endophenotype. In this paper we determine the conditions under which the risk of the endophenotype in siblings of affected probands with disease equals the risk of the endophenotype in the offspring (parents) of affected parents (offspring) with disease. In doing so we prove that this equality holds if and only if the penetrance of either the endophenotype or the disease (but not necessarily both) is additive.
linkage; sibpair; endophenotype; recurrence risk; additivity
Studies have shown that interactions among single nucleotide polymorphisms (SNPs) may play an important role in understanding the causes of complex disease. Machine learning approaches provide useful features to explore interactions more effectively and efficiently. We have proposed an integrated method that combines two machine learning methods, Random Forests (RF) and Multivariate Adaptive Regression Splines (MARS), to identify a subset of important SNPs and detect interaction patterns. In this two-stage RF-MARS (TRM) approach, RF is first applied to detect a predictive subset of SNPs, and then MARS is used to identify the interaction patterns among the selected SNPs. We evaluated the performance of TRM in four models: three causal models with one two-way interaction and one null model. RF variable selection was based on out-of-bag classification error rate (OOB) and variable importance spectrum (IS). First, we compared the selection of important variables by RF and MARS. Our results indicate that RFOOB performed better than MARS and RFIS in detecting important variables. We also evaluated the true positive rate and false positive rate of identifying interaction patterns in TRM and MARS. This study demonstrates that TRMOOB, which is RFOOB plus MARS, combines the strengths of RF and MARS in identifying SNP-SNP interaction patterns in a scenario of 100 candidate SNPs. TRMOOB had a greater true positive rate and a lower false positive rate than MARS, particularly when searching for interactions with a strong association with the outcome. Therefore the use of TRMOOB is favored for exploring SNP-SNP interactions in a large-scale genetic variation study.
polymorphism; interaction; machine learning
The association between obesity and the fat mass and obesity associated (FTO) gene has been widely replicated among Caucasian populations. The limited number of studies assessing its significance in Asian populations have been somewhat conflicting. We performed a genetic association study of 51 tagging, GWAS, and imputed single nucleotide polymorphisms with twelve measures of adiposity and skeletal robustness in two Samoan populations of Polynesia. We included 465 and 624 unrelated American Samoan and Samoan individuals, respectively; these populations derive from a single genetic background traced to Southeast Asia and represent one socio-cultural unit, although they are economically disparate with distinct environmental exposures. American Samoans were significantly larger than Samoans in all measures of obesity and most measures of skeletal robustness. In separate analyses of American Samoa and Samoa, we found a total of 36 nominal associations between FTO variants and skeletal and obesity measures. The preponderance of these nominal associations (32 of 36) was observed in the Samoan population, and predominantly with skeletal rather than fat mass measures (28 of 36). All significance disappeared, however, following corrections for multiple testing. Based on these findings, it could be surmised that FTO is not likely a major obesity locus in Polynesian populations.
obesity; FTO; association analysis; Samoa
It has been shown that parametric analysis of linkage disequilibrium conditional on linkage using an overly deterministic model can be optimal for family-based association analysis. However, if one applies this strategy carelessly there is a risk of false inference. We analyze properties of such likelihood ratio tests (LRTs) when the assumed disease mode-of-inheritance is inaccurate. Under some conditions problems result if one is not careful to consider what null hypothesis is being tested. We show that: (a) tests for which the null hypothesis assumes absence of both linkage and association are independent of the true mode-of-inheritance; (b) LRTs assuming either linkage or association under the null hypothesis may depend on the true mode-of-inheritance and lead to inconsistent parameter estimates, in particular under extremely deterministic models; (c) this problem cannot be eliminated by increasing sample size or adding population controls: as sample size increases, the chance of false positive inference goes to 100%; (d) this issue can lead to systematic false positive inference of association in regions of linkage. This is important because highly deterministic models are often used intentionally in model-based analyses because they can have more power than the true model, and are implicit in many model-free analysis methods.
Likelihood methods; Family-based association; Linkage disequilibrium; Type I error; Bias
Surfactant protein D (SP-D) is a mucosal collectin that functions in the innate immune response to microorganisms in the pulmonary tract, and possibly the gastrointestinal tract. We studied the genetic association of the two non-synonymous SP-D SNPs rs721917 (C/T Met11Thr) and rs2243639 (G/A Ala160Thr) in 256 IBD cases (123 CD and 133 UC) and 376 unrelated healthy individuals from an inflammatory bowel disease (IBD) population from Central Pennsylvania. Case-control analysis revealed a significant association of rs2243639 with susceptibility to Crohn’s disease (CD) (p=0.0036), but not ulcerative colitis (UC) (p=0.883), and no association of rs721917 with CD (p=0.328) or UC (p=0.218). Using intestinal tissues from 19 individuals heterozygous for each SNP, we compared allele expression of these two SNPs between diseased and matched normal tissues. rs2243639 exhibited balanced biallelic expression, while rs721917 exhibited differential allele expression (balanced biallelic 37%, imbalanced biallelic 45%, and dominant monoallelic 18%). When allelic expression patterns were compared between diseased and matched normal tissues, 13 out of 19 individuals (14 UC, 5 CD) showed a similar pattern. The six patients exhibiting a different pattern were all UC patients. The results suggest that differential allelic expression may affect penetrance of the SNP rs721917 disease-susceptibility allele in IBD. The potential impact of SP-D monoallelic expression on incomplete penetrance is discussed.
surfactant protein-D; inflammatory bowel disease; allele specific expression; genetic association
In order to identify novel genetic variants that influence plasma lipid concentrations, we performed a genome-wide association study (GWAS) comprised of 411 children under 18 years of age, ascertained at St. Jude Children’s Research Hospital, all of whom were of European, African, or Mexican-descent. Promising associations (p<10−5) were subsequently examined in 1,040 additional youths and 3,508 adults from the Third National Health and Nutrition Examination Survey (NHANES III), a diverse population-based study. Three genotype-phenotype associations replicated in NHANES III youths and three associated in NHANES III adults at p<0.05; however, no single association was significant in both youths and adults. The most significant association (p=0.009) in NHANES III youths was between low-density lipoprotein cholesterol (LDL-C) and intronic rs2429917 among participants of African-descent. Given the known age-dependency of lipid levels, we also tested for gene-age interactions in NHANES III participants across all ages. We identified a significant (p=0.024) age-dependent association between SGSM2 rs2429917 and LDL-C. This finding illustrates the utility of using children to discover novel variants associated with complex phenotypes and the importance of considering age-dependent genetic effects in association studies of lipid levels.
Many lines of evidence suggest that mitochondrial DNA (mtDNA) variants are involved in the pathogenesis of human complex diseases, especially age-related disorders. Osteoporosis is a typical age-related complex disease. However, the role of mtDNA variants in susceptibility to osteoporosis is largely unknown. In this study, we performed a mitochondria-wide association study for osteoporosis in Caucasians. A total of 445 mitochondrial single nucleotide polymorphisms (mtSNPs) were genotyped in a large sample of 2,286 unrelated Caucasian subjects by using the Affymetrix Genome-Wide SNP Array 6.0, and 72 mtSNPs survived quality control. We first tested for association between single mtSNPs and bone mineral density (BMD), and found that an mtSNP within the NADH dehydrogenase 2 gene (ND2), the mt4823 C/A polymorphism, was strongly associated with hip BMD (P = 2.05 × 10−4), even after conservative Bonferroni correction. The C allele of mt4823 was associated with reduced hip BMD and the effect size (β) was estimated to be ~0.044. Another SNP, mt15885, within the Cytochrome b gene (Cytb), was found to be associated with both spine (P = 1.66×10−3) and hip BMD (P = 0.023). The T allele of mt15885 had a protective effect on spine (β = 0.064) and hip BMD (β = 0.038). Next, we classified subjects into the nine common European haplogroups and conducted association analyses. Subjects classified as haplogroup X had significantly lower mean hip BMD values than others (P = 0.040). Our results highlight the importance of mtDNA variants in influencing BMD variation and risk of osteoporosis.
mtSNP; haplogroup; osteoporosis; BMD; association
In the presence of epistasis multilocus association tests of human complex traits can provide powerful methods to detect susceptibility variants. We undertook multilocus analyses in 1924 type 2 diabetes cases and 2938 controls from the Wellcome Trust Case Control Consortium (WTCCC). We performed a two-dimensional genome-wide association (GWA) scan using joint two-locus tests of association including main and epistatic effects in 70,236 markers tagging common variants. We found two-locus association at 79 SNP-pairs at a Bonferroni-corrected P-value = 0.05 (uncorrected P-value = 2.14 × 10−11). The 79 pair-wise results always contained rs11196205 in TCF7L2 paired with 79 variants including confirmed variants in FTO, TSPAN8, and CDKAL1, which are associated in the absence of epistasis. However, the majority (82%) of the 79 variants did not have compelling single-locus association signals (P-value = 5 × 10−4). Analyses conditional on the single-locus effects at TCF7L2 established that the joint two-locus results could be attributed to single-locus association at TCF7L2 alone. Interaction analyses among the peak 80 regions and among 23 previously established diabetes candidate genes identified five SNP-pairs with case-control and case-only epistatic signals. Our results demonstrate the feasibility of systematic scans in GWA data, but confirm that single-locus association can underlie and obscure multilocus findings.
Epistasis; simultaneous search; joint effects; genome-wide association
Compositional epistasis is said to be present when the effect of a genetic factor at one locus is masked by a variant at another locus. Although such compositional epistasis is not equivalent to the presence of an interaction in a statistical model, non-standard tests can sometimes be used to detect compositional epistasis. In this paper we consider empirical tests for compositional epistasis under models for the joint effect of two genetic factors which place no restrictions on the main effects of each factor but constrain the interactive effects of the two factors so as to be captured by a single parameter in the model. We describe the implications of these tests for cohort, case-control, case-only and family-based study designs and we illustrate the methods using an example of gene-gene interaction already reported in the literature.