The human leukocyte antigen (HLA) class II genes HLA-DRB1, -DQA1 and -DQB1 are the strongest genetic factors for type 1 diabetes (T1D). Additional loci in the major histocompatibility complex (MHC) are difficult to identify due to the region’s high gene density and complex linkage disequilibrium (LD). To facilitate the association analysis, two novel algorithms were implemented in this study: one for phasing the multi-allelic HLA genotypes in trio families, and one for partitioning the HLA strata in conditional testing. Screening and replication were performed on two large and independent datasets: the Wellcome Trust Case–Control Consortium (WTCCC) dataset of 2,000 cases and 1,504 controls, and the T1D Genetics Consortium (T1DGC) dataset of 2,300 nuclear families. After imputation, the two datasets have 1,941 common SNPs in the MHC, of which 22 were successfully tested and replicated based on the statistical testing stratifying on the detailed DRB1 and DQB1 genotypes. Further conditional tests using the combined dataset confirmed eight novel SNP associations around 31.3 Mb on chromosome 6 (rs3094663, p = 1.66 × 10−11 and rs2523619, p = 2.77 × 10−10 conditional on the DR/DQ genotypes). A subsequent LD analysis established TCF19, POU5F1, CCHCR1 and PSORS1C1 as potential causal genes for the observed association.
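The idea of phasing a multi-allelic genotype within a trio can be illustrated with a minimal sketch. This is not the algorithm implemented in the study, which is not described here; the function name `phase_child` and the allele labels are hypothetical, and the sketch handles a single locus only.

```python
def phase_child(father, mother, child):
    """Return (paternal_allele, maternal_allele) for the child at one
    multi-allelic locus, or None when the trio is ambiguous or shows a
    Mendelian inconsistency. Each genotype is an unordered pair of labels."""
    candidates = set()
    # Try both assignments of the child's two alleles to the two parents.
    for paternal, maternal in ((child[0], child[1]), (child[1], child[0])):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    # Exactly one consistent assignment means the phase is resolved.
    return candidates.pop() if len(candidates) == 1 else None


# Hypothetical DRB1-style allele labels, for illustration only.
resolved = phase_child(("03:01", "04:01"), ("15:01", "04:01"), ("04:01", "15:01"))
ambiguous = phase_child(("03:01", "04:01"), ("03:01", "04:01"), ("03:01", "04:01"))
```

When both parents and the child carry the same heterozygous genotype, the single-locus phase cannot be resolved from the trio alone, which is why the sketch returns `None` in that case.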
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-010-0908-2) contains supplementary material, which is available to authorized users.
Although they have demonstrated success in searching for common variants for complex diseases, genome-wide association (GWA) studies are less successful in detecting rare genetic variants because of the poor statistical power of most current methods. We developed a two-stage method that can be applied to GWA studies for detecting rare variants. Here we report the results of applying this two-stage method to the Wellcome Trust Case Control Consortium (WTCCC) dataset, which includes seven complex diseases: bipolar disorder, cardiovascular disease, hypertension, rheumatoid arthritis, Crohn’s disease, type 1 diabetes and type 2 diabetes. We identified 24 genes or regions that reach genome-wide significance. Eight of them are novel and were not reported in the WTCCC study. The cumulative risk (or protective) haplotype frequency for each of the eight genes or regions is small, being at most 11%. For each of the novel genes, the risk (or protective) haplotype set cannot be tagged by the common SNPs available in chips (r² < 0.32). The gene identified in hypertension was further replicated in the Framingham Heart Study (FHS), and is also significantly associated with type 2 diabetes. Our analysis suggests that searching for rare genetic variants is feasible in current genome-wide association studies and candidate gene studies, and that the results can serve as guides to future resequencing studies to identify the underlying rare functional variants.
We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using the 1000 Genomes data as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two ‘missed’ disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other is in proximity to the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10−16. The new lead SNP at each of the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, ‘missed’ because they were detected and replicated by studies after 2007) and refined signals. We anticipate that the findings of this study will provide timely information as individual groups and consortia begin to engage in 1000 Genomes-based imputation.
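For readers unfamiliar with the term, an imputed dosage is conventionally the expected count of one allele given the posterior genotype probabilities produced by the imputation software. The sketch below shows that standard definition only; the function name and the example probabilities are hypothetical, and it does not reproduce the pipeline used in the study.

```python
def allele_dosage(p_aa, p_ab, p_bb, tol=1e-6):
    """Expected number of B alleles (between 0 and 2) from the posterior
    probabilities of the three genotypes AA, AB and BB."""
    total = p_aa + p_ab + p_bb
    if abs(total - 1.0) > tol:
        raise ValueError("genotype probabilities must sum to 1")
    # AA carries 0 copies of B, AB carries 1, BB carries 2.
    return p_ab + 2.0 * p_bb


# Illustrative probabilities for one individual at one imputed SNP.
example_dosage = allele_dosage(0.1, 0.3, 0.6)
```

Using the dosage as the predictor in a regression, rather than the single most likely genotype, lets the association test carry the imputation uncertainty forward.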
genome-wide association study; the 1000 Genomes project; imputation
The Type I Diabetes Genetics Consortium (T1DGC) is an international, multicenter research program with two primary goals. The first goal is to identify genomic regions and candidate genes whose variants modify an individual’s risk of type I diabetes (T1D) and help explain the clustering of the disease in families. The second goal is to make research data available to the research community and to establish resources that can be used by, and that are fully accessible to, the research community. To facilitate the access to these resources, the T1DGC has developed a Consortium Agreement (http://www.t1dgc.org) that specifies the rights and responsibilities of investigators who participate in Consortium activities. The T1DGC has assembled a resource of affected sib-pair families, parent–child trios, and case–control collections with banks of DNA, serum, plasma, and EBV-transformed cell lines. In addition, both candidate gene and genome-wide (linkage and association) studies have been performed and displayed in T1DBase (http://www.t1dbase.org) for all researchers to use in their own investigations. In this supplement, a subset of the T1DGC collection has been used to investigate earlier published candidate genes for T1D, to confirm the results from a genome-wide association scan for T1D, and to determine associations with candidate genes for other autoimmune diseases or with type II diabetes that may be involved with β-cell function.
type I diabetes; autoantibodies; HLA; families; linkage; association
The Wellcome Trust Case Control Consortium (WTCCC) primary genome-wide association (GWA) scan¹ on seven diseases, including the multifactorial autoimmune disease type 1 diabetes (T1D), shows significant association (P < 5 × 10−7) between T1D and six chromosome regions: 12q24, 12q13, 16p13, 18p11, 12p13 and 4q27. Here, we attempted to validate these and six other top findings in 4,000 individuals with T1D, 5,000 controls and 2,997 family trios that were independent of the WTCCC study. We confirmed unequivocally the associations of 12q24, 12q13, 16p13 and 18p11 (Pfollow-up ≤ 1.35 × 10−9; Poverall ≤ 1.15 × 10−14), leaving eight regions with small effects or false-positive associations with T1D. We also obtained evidence for chromosome 18q22 (Poverall = 1.38 × 10−8) from a genome-wide association study of nonsynonymous SNPs. Several regions, including 18q22 and 18p11, showed association with autoimmune thyroid disease. This study increases the number of T1D loci with compelling evidence from six to at least ten.
The Type I Diabetes Genetics Consortium (T1DGC) is an international collaboration whose primary goal is to identify genes whose variants modify an individual’s risk of type I diabetes (T1D). An integral part of the T1DGC’s mission is the establishment of clinical and data resources that can be used by, and that are fully accessible to, the T1D research community (http://www.t1dgc.org). The T1DGC has organized the collection and analyses of study samples and conducted several major research projects focused on T1D gene discovery: a genome-wide linkage scan, an intensive evaluation of the human major histocompatibility complex, a detailed examination of published candidate genes, and a genome-wide association scan. These studies have provided important information to the scientific community regarding the function of specific genes or chromosomal regions on T1D risk. The results are continually being updated and displayed (http://www.t1dbase.org). The T1DGC welcomes all investigators interested in using these data for scientific endeavors on T1D. The T1DGC resources provide a framework for future research projects, including examination of structural variation, re-sequencing of candidate regions in a search for T1D-associated genes and causal variants, correlation of T1D risk genotypes with biomarkers obtained from T1DGC serum and plasma samples, and in-depth bioinformatics analyses.
type I diabetes; sequence analysis; HLA; structural variants; expression
OBJECTIVE— The Type 1 Diabetes Genetics Consortium (T1DGC) has assembled and genotyped a large collection of multiplex families for the purpose of mapping genomic regions linked to type 1 diabetes. In the current study, we tested for evidence of loci associated with type 1 diabetes utilizing genome-wide linkage scan data and family-based association methods.
RESEARCH DESIGN AND METHODS— A total of 2,496 multiplex families with type 1 diabetes were genotyped with a panel of 6,090 single nucleotide polymorphisms (SNPs). Evidence of association to disease was evaluated by the pedigree disequilibrium test. Significant results were followed up by genotyping and analyses in two independent sets of samples: 2,214 parent-affected child trio families and a panel of 7,721 case and 9,679 control subjects.
RESULTS— Three of the SNPs most strongly associated with type 1 diabetes localized to previously identified type 1 diabetes risk loci: INS, IFIH1, and KIAA0350. A fourth strongly associated SNP, rs876498 (P = 1.0 × 10−4), occurred in the sixth intron of the UBASH3A locus at chromosome 21q22.3. Support for this disease association was obtained in two additional independent sample sets: families with type 1 diabetes (odds ratio [OR] 1.06 [95% CI 1.00–1.11]; P = 0.023) and case and control subjects (1.14 [1.09–1.19]; P = 7.5 × 10−8).
CONCLUSIONS— The T1DGC 6K SNP scan and follow-up studies reported here confirm previously reported type 1 diabetes associations at INS, IFIH1, and KIAA0350 and identify an additional disease association on chromosome 21q22.3 in the UBASH3A locus (OR 1.10 [95% CI 1.07–1.13]; P = 4.4 × 10−12). This gene and its flanking regions are now validated targets for further resequencing, genotyping, and functional studies in type 1 diabetes.
Genome-wide association studies (GWAS) have emerged as a powerful approach for identifying susceptibility loci associated with polygenic diseases such as type 2 diabetes mellitus (T2DM). However, it is still a daunting task to prioritize single nucleotide polymorphisms (SNPs) from GWAS for further replication in different populations. Several recent studies have shown that genetic variation often affects gene expression at proximal (cis) as well as distal (trans) genomic locations by different mechanisms, such as altering the rate of transcription, splicing or transcript stability.
To prioritize SNPs from GWAS, we combined results from two GWAS related to T2DM, the Diabetes Genetics Initiative (DGI) and the Wellcome Trust Case Control Consortium (WTCCC), with genome-wide expression data from pancreas, adipose tissue, liver and skeletal muscle of individuals with or without T2DM or animal models thereof to identify T2DM susceptibility loci.
We identified 1,170 SNPs associated with T2DM with P < 0.05 in both GWAS and 243 genes that were located in the vicinity of these SNPs. Out of these 243 genes, we identified 115 that were differentially expressed in publicly available gene expression profiling data. Notably, five of them, IGF2BP2, KCNJ11, NOTCH2, TCF7L2 and TSPAN8, have subsequently been shown to be associated with T2DM in different populations. To provide further validation of our approach, we reversed it and started with 26 known SNPs associated with T2DM and related traits. We could show that 12 (57%) of the 21 genes located in the vicinity of these SNPs (HHEX, HNF1B, IGF2BP2, IRS1, KCNJ11, KCNQ1, NOTCH2, PPARG, TCF7L2, THADA, TSPAN8 and WFS1) showed aberrant expression in T2DM in the gene expression profiling studies.
Utilizing gene expression profiling data from different tissues of individuals with or without T2DM, or animal models thereof, is a powerful tool for prioritizing SNPs from GWAS for further replication studies.
In the presence of epistasis, multilocus association tests of human complex traits can provide powerful methods to detect susceptibility variants. We undertook multilocus analyses in 1924 type 2 diabetes cases and 2938 controls from the Wellcome Trust Case Control Consortium (WTCCC). We performed a two-dimensional genome-wide association (GWA) scan using joint two-locus tests of association including main and epistatic effects in 70,236 markers tagging common variants. We found two-locus association at 79 SNP-pairs at a Bonferroni-corrected P-value = 0.05 (uncorrected P-value = 2.14 × 10−11). The 79 pair-wise results always contained rs11196205 in TCF7L2 paired with 79 variants including confirmed variants in FTO, TSPAN8, and CDKAL1, which are associated in the absence of epistasis. However, the majority (82%) of the 79 variants did not have compelling single-locus association signals (P-value = 5 × 10−4). Analyses conditional on the single-locus effects at TCF7L2 established that the joint two-locus results could be attributed to single-locus association at TCF7L2 alone. Interaction analyses among the peak 80 regions and among 23 previously established diabetes candidate genes identified five SNP-pairs with case-control and case-only epistatic signals. Our results demonstrate the feasibility of systematic scans in GWA data, but confirm that single-locus association can underlie and obscure multilocus findings.
Epistasis; simultaneous search; joint effects; genome-wide association
Recent genome-wide association studies have resulted in a dramatic increase in our knowledge of the genetic loci involved in type 2 diabetes. In a complementary approach to these single-marker studies, we attempted to identify biological pathways associated with type 2 diabetes. This approach could allow us to identify additional risk loci.
RESEARCH DESIGN AND METHODS
We used individual level genotype data generated from the Wellcome Trust Case Control Consortium (WTCCC) type 2 diabetes study, consisting of 393,143 autosomal SNPs, genotyped across 1,924 case subjects and 2,938 control subjects. We sought additional evidence from summary level data available from the Diabetes Genetics Initiative (DGI) and the Finland-United States Investigation of NIDDM Genetics (FUSION) studies. Statistical analysis of pathways was performed using a modification of the Gene Set Enrichment Algorithm (GSEA). A total of 439 pathways were analyzed from the Kyoto Encyclopedia of Genes and Genomes, Gene Ontology, and BioCarta databases.
After correcting for the number of pathways tested, we found no strong evidence for any pathway showing association with type 2 diabetes (top Padj = 0.31). The candidate WNT-signaling pathway ranked top (nominal P = 0.0007, excluding TCF7L2; P = 0.002), containing a number of promising single gene associations. These include CCND2 (rs11833537; P = 0.003), SMAD3 (rs7178347; P = 0.0006), and PRICKLE1 (rs1796390; P = 0.001), all expressed in the pancreas.
Common variants involved in type 2 diabetes risk are likely to occur in or near genes in multiple pathways. Pathway-based approaches to genome-wide association data may be more successful for some complex traits than others, depending on the nature of the underlying disease physiology.
In addition to the HLA-locus, six genetic risk factors for primary biliary cirrhosis (PBC) have been identified in recent genome-wide association studies (GWAS). To identify additional loci, we carried out a GWAS using 1,840 cases from the UK PBC Consortium and 5,163 UK population controls as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3). Twenty-eight loci were followed up in an additional UK cohort of 620 PBC cases and 2,514 population controls. We identified 12 novel risk loci (P<5×10−8) and replicated all previously associated loci. Three further novel loci were identified by meta-analysis of data from our study and previously published GWAS results. New candidate genes include STAT4, DENND1B, CD80, IL7R, CXCR5, TNFRSF1A, CLEC16A, and NFKB1. This study has considerably expanded our knowledge of the genetic architecture of PBC.
Type 1 diabetes arises from the actions of multiple genetic and environmental risk factors. Considerable success at identifying common genetic variants that contribute to type 1 diabetes risk has come from genetic association (primarily case-control) studies. However, such studies have limited power to detect genes containing multiple rare variants that contribute significantly to disease risk.
RESEARCH DESIGN AND METHODS
The Type 1 Diabetes Genetics Consortium (T1DGC) has assembled a collection of 2,496 multiplex type 1 diabetic families from nine geographical regions containing 2,658 affected sib-pairs (ASPs). We describe the results of a genome-wide scan for linkage to type 1 diabetes in the T1DGC family collection.
Significant evidence of linkage to type 1 diabetes was confirmed at the HLA region on chromosome 6p21.3 (logarithm of odds [LOD] = 213.2). There was further evidence of linkage to type 1 diabetes on 6q that could not be accounted for by the major linkage signal at the HLA class II loci on chromosome 6p21. Suggestive evidence of linkage (LOD ≥2.2) was observed near CTLA4 on chromosome 2q32.3 (LOD = 3.28) and near INS (LOD = 3.16) on chromosome 11p15.5. Some evidence for linkage was also detected at two regions on chromosome 19 (LOD = 2.84 and 2.54).
Five non–HLA chromosome regions showed some evidence of linkage to type 1 diabetes. A number of previously proposed type 1 diabetes susceptibility loci, based on smaller ASP numbers, showed limited or no evidence of linkage to disease. Low-frequency susceptibility variants or clusters of loci with common alleles could contribute to the linkage signals observed.
Phenotypic misclassification (between cases) has been shown to reduce the power to detect association in genetic studies. However, it is conceivable that complex traits are heterogeneous with respect to individual genetic susceptibility and disease pathophysiology, and that the effect of heterogeneity has a larger magnitude than the effect of phenotyping errors. Although an intuitively clear concept, the effect of heterogeneity on genetic studies of common diseases has received little attention. Here we investigate the impact of phenotypic and genetic heterogeneity on the statistical power of genome wide association studies (GWAS). We first performed a study of simulated genotypic and phenotypic data. Next, we analyzed the Wellcome Trust Case-Control Consortium (WTCCC) data for diabetes mellitus (DM) type 1 (T1D) and type 2 (T2D), using varying proportions of each type of diabetes in order to examine the impact of heterogeneity on the strength and statistical significance of association previously found in the WTCCC data. In both simulated and real data, heterogeneity (presence of “non-cases”) reduced the statistical power to detect genetic association and greatly decreased the estimates of risk attributed to genetic variation. This finding was also supported by the analysis of loci validated in subsequent large-scale meta-analyses. For example, heterogeneity of 50% increases the required sample size by approximately three times. These results suggest that accurate phenotype delineation may be more important for detecting true genetic associations than increase in sample size.
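The attenuation of risk estimates under heterogeneity can be illustrated with a deliberately simple mixing model. The assumption here, which is ours and not necessarily the simulation design used in the study, is that a fraction `h` of the nominal cases are "non-cases" whose risk-allele frequency equals that of controls. The function name and example values are hypothetical, and the sketch is not meant to reproduce the sample-size figure quoted above.

```python
def observed_odds_ratio(p_control, true_or, h):
    """Allelic odds ratio observed when a fraction h of the nominal cases
    are non-cases with the control allele frequency (simple mixing model)."""
    if not 0.0 < p_control < 1.0:
        raise ValueError("p_control must lie strictly between 0 and 1")
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie between 0 and 1")
    odds_control = p_control / (1.0 - p_control)
    # Risk-allele frequency among true cases implied by the true odds ratio.
    odds_case = odds_control * true_or
    p_case = odds_case / (1.0 + odds_case)
    # Frequency in the contaminated case group is a weighted average.
    p_mixed = (1.0 - h) * p_case + h * p_control
    return (p_mixed / (1.0 - p_mixed)) / odds_control


# Illustrative values: control frequency 0.3, true OR 1.5, 50% non-cases.
diluted = observed_odds_ratio(0.3, 1.5, 0.5)
```

Under this model the observed odds ratio moves toward 1 as `h` grows, which is the direction of the effect reported in the abstract.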
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS), which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
Over 50 regions of the genome have been associated with type 1 diabetes risk, mainly using large case/control collections. In a recent genome-wide association (GWA) study, 18 novel susceptibility loci were identified and replicated, including replication evidence from 2,319 families. Here, we, the Type 1 Diabetes Genetics Consortium (T1DGC), aimed to exclude the possibility that any of the 18 loci were false-positives due to population stratification by significantly increasing the statistical power of our family study.
We genotyped the most disease-predicting single-nucleotide polymorphisms at the 18 susceptibility loci in 3,108 families and used existing genotype data for 2,319 families from the original study, providing 7,013 parent–child trios for analysis. We tested for association using the transmission disequilibrium test.
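As a reminder of how the transmission disequilibrium test works, the classical statistic compares how often heterozygous parents transmit versus do not transmit a given allele to an affected child. The sketch below implements that textbook form with hypothetical counts; it is not the analysis code used in the study.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT: McNemar-type chi-square statistic (1 degree of freedom)
    and its p-value, from counts of heterozygous parents who did and did not
    transmit the allele of interest to an affected child."""
    n = transmitted + untransmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    statistic = (transmitted - untransmitted) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat, p = tdt(60, 40)
```

Because the comparison is made within families, the test is robust to population stratification, which is why it suits the aim stated above.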
Seventeen of the 18 susceptibility loci reached nominal levels of significance (p < 0.05) in the expanded family collection, with 14q24.1 just falling short (p = 0.055). When we allowed for multiple testing, ten of the 17 nominally significant loci reached the required level of significance (p < 2.8 × 10−3). All susceptibility loci had consistent direction of effects with the original study.
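The multiple-testing threshold quoted above is consistent with a Bonferroni correction of a 0.05 significance level over the 18 loci tested. A minimal sketch of that arithmetic follows; the variable names are ours, and the correction method is inferred from the numbers rather than stated in the abstract.

```python
ALPHA = 0.05   # nominal significance level
N_LOCI = 18    # number of susceptibility loci tested

# Bonferroni: divide the nominal level by the number of tests.
threshold = ALPHA / N_LOCI


def passes_bonferroni(p_value, cutoff=threshold):
    """True when a p-value survives the Bonferroni-corrected threshold."""
    return p_value < cutoff


# The 14q24.1 result (p = 0.055) misses even the nominal level.
nominal_14q24 = 0.055 < ALPHA
corrected_14q24 = passes_bonferroni(0.055)
```

The threshold evaluates to about 2.8 × 10−3, matching the value reported in the text.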
The results for the novel GWA study-identified loci are genuine and not due to population stratification. The next step, namely correlation of the most disease-associated genotypes with phenotypes, such as RNA and protein expression analyses for the candidate genes within or near each of the susceptibility regions, can now proceed.
Electronic supplementary material
The online version of this article (doi:10.1007/s00125-012-2450-3) contains peer-reviewed but unedited supplementary material, including a full list of members of the Type 1 Diabetes Genetics Consortium, which is available to authorised users.
Families; Population stratification bias; Power; Replication; Susceptibility; Type 1 diabetes
The Type I Diabetes Genetics Consortium (T1DGC) Rapid Response Workshop was established to evaluate published candidate gene associations in a large collection of affected sib-pair (ASP) families. We report on our quality control (QC) and preliminary family-based association analyses. A random sample of blind duplicates was analyzed for QC. Quality checks, including examination of plate-panel yield, marker yield, Hardy–Weinberg equilibrium, mismatch error rate, Mendelian error rate, and allele distribution across plates, were performed. Genotypes from 2324 families within nine cohorts were obtained for a panel of 21 candidate genes, comprising 384 single-nucleotide polymorphisms, on two genotyping platforms at the Broad Institute Center for Genotyping and Analysis (Cambridge, MA, USA). After rigorous QC procedures, the T1DGC Rapid Response project yielded a database of 2297 families and 9688 genotyped individuals on a single candidate-gene panel. The available data include 9005 individuals with genotype data from both platforms and 683 individuals genotyped on only one platform (276 in Illumina; 407 in Sequenom).
type I diabetes; candidate gene; SNP; quality control; association
Recent genome-wide association studies (GWAS) have identified novel loci associated with sudden cardiac death (SCD). Despite this progress, identified DNA variants account for a relatively small portion of overall SCD risk, suggesting that additional loci contributing to SCD susceptibility await discovery. The objective of this study was to identify novel DNA variation associated with SCD in the context of coronary artery disease (CAD).
Methods and Findings
Using the MetaboChip custom array we conducted a case-control association analysis of 119,117 SNPs in 948 SCD cases (with underlying CAD) from the Oregon Sudden Unexpected Death Study (Oregon-SUDS) and 3,050 controls with CAD from the Wellcome Trust Case-Control Consortium (WTCCC). Two newly identified loci were significantly associated with increased risk of SCD after correction for multiple comparisons at: rs6730157 in the RAB3GAP1 gene on chromosome 2 (P = 4.93×10−12, OR = 1.60) and rs2077316 in the ZNF365 gene on chromosome 10 (P = 3.64×10−8, OR = 2.41).
Our findings suggest that RAB3GAP1 and ZNF365 are relevant candidate genes for SCD and will contribute to the mechanistic understanding of SCD susceptibility.
Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and “Measles” pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study.
Genome-wide association studies have helped locate gene variants that affect our susceptibility to diseases. The analysis of these studies is typically straightforward: test each genetic variant whether it is correlated with predisposition to disease. This approach often works well for identifying commonly occurring variants with moderate effects on disease risk. However, the effects of many variants are so small they fail to register statistically significant correlations. This is a concern because many diseases are modulated by many genetic factors with small effects on disease risk. An alternative is to examine groups of variants, such as variants sharing a common pathway, and assess whether these groups are “enriched” for correlations with disease. This can be a more effective approach to identifying genetic factors relevant to disease. However, it does not tell us which genes are associated with disease. To address this limitation, we describe an approach that integrates enrichment analysis with tests for disease-variant correlations within a single framework. We illustrate this approach in genome-wide studies of seven complex diseases. We show that our approach supports enriched pathways in several diseases, and uncovers disease-susceptibility genes in these pathways not identified in conventional analyses of the same data.
Rheumatoid arthritis (RA) is an archetypal, common, complex autoimmune disease with both genetic and environmental contributions to disease aetiology. Two novel RA susceptibility loci have been reported from recent genome-wide and candidate gene association studies. We, therefore, investigated the evidence for association of the STAT4 and TRAF1/C5 loci with RA using imputed data from the Wellcome Trust Case Control Consortium (WTCCC). No evidence for association of variants mapping to the TRAF1/C5 gene was detected in the 1860 RA cases and 2930 control samples tested in that study. Variants mapping to the STAT4 gene did show evidence for association (rs7574865, P = 0.04). Given the association of the TRAF1/C5 locus in two previous large case–control series from populations of European descent and the evidence for association of the STAT4 locus in the WTCCC study, single nucleotide polymorphisms mapping to these loci were tested for association with RA in an independent UK series comprising DNA from >3000 cases with disease and >3000 controls and a combined analysis including the WTCCC data was undertaken. We confirm association of the STAT4 and the TRAF1/C5 loci with RA bringing to 5 the number of confirmed susceptibility loci. The effect sizes are less than those reported previously but are likely to be a more accurate reflection of the true effect size given the larger size of the cohort investigated in the current study.
Most pathway and gene-set enrichment methods prioritize genes by their main effect and do not account for variation due to interactions in the pathway. A portion of the presumed missing heritability in genome-wide association studies (GWAS) may be accounted for through gene–gene interactions and additive genetic variability. In this study, we prioritize genes for pathway enrichment in GWAS of bipolar disorder (BD) by aggregating gene–gene interaction information with main effect associations through a machine learning (evaporative cooling) feature selection and epistasis network centrality analysis. We validate this approach in a two-stage (discovery/replication) pathway analysis of GWAS of BD. The discovery cohort comes from the Wellcome Trust Case Control Consortium (WTCCC) GWAS of BD, and the replication cohort comes from the National Institute of Mental Health (NIMH) GWAS of BD in European Ancestry individuals. Epistasis network centrality yields replicated enrichment of Cadherin signaling pathway, whose genes have been hypothesized to have an important role in BD pathophysiology but have not demonstrated enrichment in previous analysis. Other enriched pathways include Wnt signaling, circadian rhythm pathway, axon guidance and neuroactive ligand-receptor interaction. In addition to pathway enrichment, the collective network approach elevates the importance of ANK3, DGKH and ODZ4 for BD susceptibility in the WTCCC GWAS, despite their weak single-locus effect in the data. These results provide evidence that numerous small interactions among common alleles may contribute to the diathesis for BD and demonstrate the importance of including information from the network of gene–gene interactions as well as main effects when prioritizing genes for pathway analysis.
eigenvector centrality; epistasis network; evaporative cooling machine learning feature selection; pathway enrichment analysis; regression-based genetic association interaction network (reGAIN); SNPrank
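As a rough illustration of the network-centrality idea in the abstract above, the following sketch ranks genes by eigenvector centrality in a small symmetric matrix whose diagonal holds main-effect scores and whose off-diagonal entries hold pairwise interaction scores. The matrix values, the gene labels and the function name `eigenvector_centrality` are hypothetical, and plain power iteration is used as a stand-in; this is not the reGAIN or SNPrank implementation used in the study.

```python
def eigenvector_centrality(matrix, iterations=200, tol=1e-12):
    """Leading eigenvector of a non-negative square matrix via power iteration.

    The returned scores are normalised to sum to 1.
    """
    n = len(matrix)
    vec = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        if total == 0:
            raise ValueError("matrix has no positive entries")
        nxt = [x / total for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, vec)) < tol:
            return nxt
        vec = nxt
    return vec


# Hypothetical toy network (not data from the study).
# Diagonal: main-effect score; off-diagonal: pairwise interaction score.
genes = ["geneA", "geneB", "geneC", "geneD"]
scores = [
    [0.05, 0.8, 0.7, 0.6],  # geneA: weak main effect, strong interactions
    [0.8, 0.5, 0.0, 0.0],
    [0.7, 0.0, 0.3, 0.0],
    [0.6, 0.0, 0.0, 0.3],
]

centrality = eigenvector_centrality(scores)
ranking = sorted(zip(genes, centrality), key=lambda pair: pair[1], reverse=True)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy matrix the gene with the weakest main effect still ranks first because of its interactions, which is the kind of behaviour the abstract describes for genes with weak single-locus effects.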
The advent of genome-wide association (GWA) studies has revolutionized the detection of disease loci and provided abundant evidence for previously undetected disease loci that can be pooled together in meta-analysis studies or used to design follow-up studies. A total of 1715 SNPs from the Wellcome Trust Case Control Consortium GWA study of type I diabetes (T1D) were selected and a follow-up study was conducted in 1410 affected sib-pair families assembled by the Type I Diabetes Genetics Consortium. In addition to the support for previously identified loci (PTPN22/1p13; ERBB3/12q13; SH2B3/12q24; CLEC16A/16p13; UBASH3A/21q22), evidence supporting two new and distinct chromosome locations associated with T1D was observed: FHOD3/18q12 (rs2644261, P=5.9×10−4) and Xp22 (rs5979785, P=6.8×10−3; http://www.T1DBase.org). There was independent support for both SNPs in a GWA meta-analysis of 7514 cases and 9045 controls (P values=5.0×10−3 and 6.7×10−6, respectively). The chromosome 18q12 region contains four genes, none of which are obvious functional candidate genes. In contrast, the Xp22 SNP is located 30 kb centromeric of TLR8 and TLR7. Both TLR8 and TLR7 are functional candidate genes owing to their key roles as pathogen recognition receptors and, in the case of TLR7, overexpression has been associated directly with murine autoimmune disease.
genome-wide association; type I diabetes; follow-up study; T1DGC
Candidate gene studies have long been the principal method for identification of susceptibility genes for type I diabetes (T1D), resulting in the discovery of HLA, INS, PTPN22, CTLA4, and IL2RA. However, many of the initial studies that relied on this strategy were largely underpowered, because of the limitations in genomic information and genotyping technology, as well as the limited size of available cohorts. The Type I Diabetes Genetics Consortium (T1DGC) has established resources to reevaluate earlier reported genes associated with T1D, using its collection of 2298 Caucasian affected sib-pair families (with 11 159 individuals). A total of 382 single-nucleotide polymorphisms (SNPs) located in 21 T1D candidate genes were selected for this study and genotyped in duplicate on two platforms, Illumina and Sequenom. The genes were chosen based on published literature as having been either ‘confirmed’ (replicated) or not (candidates). This study showed several important features of genetic association studies. First, it showed the major impact of small rates of genotyping errors on association statistics. Second, it confirmed associations at INS, PTPN22, IL2RA, IFIH1 (earlier confirmed genes), and CTLA4 (earlier confirmed, with distinct SNPs) loci. Third, it did not find evidence for an association with T1D at SUMO4, despite confirmed association in Asian populations, suggesting the potential for population-specific gene effects. Fourth, at PTPN22, there was evidence for a novel contribution to T1D risk, independent of the replicated effect of the R620W variant. Fifth, among the candidate genes selected for replication, the association of TCF7-P19T with T1D was newly replicated in this study. In summary, this study was able to replicate some genetic effects, reject others, and provide suggestions of association with several of the other candidate genes in stratified analyses (age at onset, HLA status, population of origin). These results have generated additional interesting functional hypotheses that will require further replication in independent cohorts.
type I diabetes; candidate genes; T1DGC; SNP selection
Psychiatric phenotypes are currently defined according to sets of
descriptive criteria. Although many of these phenotypes are heritable, it
would be useful to know whether any of the various diagnostic categories in
current use identify cases that are particularly helpful for
To use genome-wide genetic association data to explore the relative genetic
utility of seven different descriptive operational diagnostic categories
relevant to bipolar illness within a large UK case–control bipolar
We analysed our previously published Wellcome Trust Case Control Consortium
(WTCCC) bipolar disorder genome-wide association data-set, comprising 1868
individuals with bipolar disorder and 2938 controls genotyped for 276 122
single nucleotide polymorphisms (SNPs) that met stringent criteria for
genotype quality. For each SNP we performed a test of association (bipolar
disorder group v. control group) and used the number of associated
independent SNPs statistically significant at P<0.00001 as a
metric for the overall genetic signal in the sample. We next compared this
metric with that obtained using each of seven diagnostic subsets of the group
with bipolar disorder: Research Diagnostic Criteria (RDC): bipolar I disorder;
manic disorder; bipolar II disorder; schizoaffective disorder, bipolar type;
DSM–IV: bipolar I disorder; bipolar II disorder; schizoaffective
disorder, bipolar type.
The RDC schizoaffective disorder, bipolar type (v. controls) stood
out from the other diagnostic subsets as having a significant excess of
independent association signals (P<0.003) compared with that
expected in samples of the same size selected randomly from the total bipolar
disorder group data-set. The strongest association in this subset of
participants with bipolar disorder was at rs4818065 (P =
2.42×10–7). Biological systems implicated included
gamma-aminobutyric acid (GABA)A receptors. Genes having at least
one associated polymorphism at P<10–4 included
B3GALTS, A2BP1, GABRB1, AUTS2, BSN, PTPRG, GIRK2 and
Our findings show that individuals with broadly defined bipolar
schizoaffective features have either a particularly strong genetic
contribution or that, as a group, are genetically more homogeneous than the
other phenotypes tested. The results point to the importance of using
diagnostic approaches that recognise this group of individuals. Our approach
can be applied to similar data-sets for other psychiatric and non-psychiatric
OBJECTIVE—This study examined how differences in the BMI distribution of type 2 diabetic case subjects affected genome-wide patterns of type 2 diabetes association and considered the implications for the etiological heterogeneity of type 2 diabetes.
RESEARCH DESIGN AND METHODS—We reanalyzed data from the Wellcome Trust Case Control Consortium genome-wide association scan (1,924 case subjects, 2,938 control subjects: 393,453 single-nucleotide polymorphisms [SNPs]) after stratifying case subjects (into “obese” and “nonobese”) according to median BMI (30.2 kg/m2). Replication of signals in which alternative case-ascertainment strategies generated marked effect size heterogeneity in type 2 diabetes association signal was sought in additional samples.
RESULTS—In the “obese-type 2 diabetes” scan, FTO variants had the strongest type 2 diabetes effect (rs8050136: relative risk [RR] 1.49 [95% CI 1.34–1.66], P = 1.3 × 10−13), with only weak evidence for TCF7L2 (rs7901695 RR 1.21 [1.09–1.35], P = 0.001). This situation was reversed in the “nonobese” scan, with FTO association undetectable (RR 1.07 [0.97–1.19], P = 0.19) and TCF7L2 predominant (RR 1.53 [1.37–1.71], P = 1.3 × 10−14). These patterns, confirmed by replication, generated strong combined evidence for between-stratum effect size heterogeneity (FTO: PDIFF = 1.4 × 10−7; TCF7L2: PDIFF = 4.0 × 10−6). Other signals displaying evidence of effect size heterogeneity in the genome-wide analyses (on chromosomes 3, 12, 15, and 18) did not replicate. Analysis of the current list of type 2 diabetes susceptibility variants revealed nominal evidence for effect size heterogeneity for the SLC30A8 locus alone (RRobese 1.08 [1.01–1.15]; RRnonobese 1.18 [1.10–1.27]: PDIFF = 0.04).
CONCLUSIONS—This study demonstrates the impact of differences in case ascertainment on the power to detect and replicate genetic associations in genome-wide association studies. These data reinforce the notion that there is substantial etiological heterogeneity within type 2 diabetes.
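The between-stratum comparison reported in the RESULTS can be approximated from the published summary statistics alone. The sketch below recovers a standard error for each log relative risk from its 95% CI and forms a z-test on the difference between strata. It assumes the two stratum estimates are independent, which is not strictly true here because both case strata were compared with the same controls, and it uses only the scan estimates, so it is not expected to reproduce the reported PDIFF values, which also drew on replication data. The function names are illustrative.

```python
import math


def log_rr_and_se(rr, ci_low, ci_high, z_crit=1.96):
    """Log relative risk and its standard error, recovered from a 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return math.log(rr), se


def heterogeneity_test(stratum_a, stratum_b):
    """z statistic and two-sided p-value for a difference in log RR.

    Assumes the two estimates are independent (an approximation here).
    """
    beta_a, se_a = log_rr_and_se(*stratum_a)
    beta_b, se_b = log_rr_and_se(*stratum_b)
    z = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# (RR, lower 95% limit, upper 95% limit) as quoted in the RESULTS.
fto_obese = (1.49, 1.34, 1.66)
fto_nonobese = (1.07, 0.97, 1.19)
tcf7l2_obese = (1.21, 1.09, 1.35)
tcf7l2_nonobese = (1.53, 1.37, 1.71)

z_fto, p_fto = heterogeneity_test(fto_obese, fto_nonobese)
z_tcf, p_tcf = heterogeneity_test(tcf7l2_obese, tcf7l2_nonobese)
print(f"FTO:    z = {z_fto:.2f}, p = {p_fto:.2g}")
print(f"TCF7L2: z = {z_tcf:.2f}, p = {p_tcf:.2g}")
```

The sign of z indicates the direction of the difference: positive when the effect is larger in the obese stratum, negative when it is larger in the nonobese stratum.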
We recently reported an association between type 1 diabetes and the telomeric MHC SNP rs1233478. As further families have been analyzed in the Type 1 Diabetes Genetics Consortium (T1DGC), we sought to test replication of the association and, with more data, to analyze haplotypic associations.
Research Design and Methods
We have since analyzed an additional 2,717 case and 1,315 control chromosomes from the T1DGC, with HLA-typing and data for 2,837 SNPs across the MHC region.
We confirmed the association of rs1233478 (new data only: p=2.2E-5, OR=1.4). We also found two additional nearby SNPs that were significantly associated with type 1 diabetes (new data only: rs3131020, p=8.3E-9, OR=0.65; rs1592410, p=2.2E-8, OR=1.5). For studies of type 1 diabetes in the MHC region it is critical to account for linkage disequilibrium with the HLA genes. Logistic regression analysis of these new data indicated that the effects of rs3131020 and rs1592410 on type 1 diabetes risk are independent of HLA alleles (rs3131020: p=2.3E-3, OR=0.73; rs1592410: p=2.1E-3, OR=1.4). Haplotypes of 12 SNPs (including the three highly significant SNPs) stratify diabetes risk (high risk, protective, and neutral), with high-risk haplotypes limited to approximately 20,000 base pairs in length. The 20,000 base pair region is telomeric of the UBD gene and contains LOC729653, a hypothetical gene.
We believe that polymorphisms of the telomeric MHC locus LOC729653 may confer risk for type 1 diabetes.
genetic association studies; major histocompatibility complex; type 1 diabetes