Article first published online 7 January 2015.
Supplemental Digital Content is Available in the Text.
Many genetic risk loci have been identified for inflammatory bowel disease and colorectal cancer; however, identifying the causal genes for each association signal remains a challenge. Expression quantitative trait loci (eQTL) studies have identified common variants that induce differential gene expression and eQTLs can be cross-referenced with disease association signals for gene prioritization. However, the genetics of gene expression are highly tissue-specific, and further eQTL datasets from primary tissues are needed.
We have conducted an eQTL discovery study using tissue extracted endoscopically from the terminal ileum and 4 colonic locations of non-inflamed bowel from 65 controls and patients with quiescent inflammatory bowel disease. A genome-wide cis-eQTL analysis was performed on >3,600,000 variants and 13,558 expressed probes.
We identified 1312 independent eQTLs associated with the differential expression of 1222 genes in rectal mucosa. One hundred seventy-one, 211, 168, and 102 independent eQTLs were identified in the sigmoid, descending colon, ascending colon, and terminal ileum, respectively. Twenty-six percent of genes with rectal eQTLs were novel and unique compared with 7 published eQTL datasets. Rectal eQTLs were significantly enriched for genes expressed in the colon. Examining 163 inflammatory bowel disease risk loci identified 11 tag single-nucleotide polymorphisms that were rectal eQTLs. A colorectal cancer locus at 11q23 contained a rectal eQTL for COLCA2, a protein implicated in colon cancer pathogenesis.
This study defines a catalog of ileal and colonic eQTLs. Our data reaffirm the tissue specificity of eQTLs and support the notion that identification of functional variants in relevant tissue can be effective in fine-mapping genetic risk loci.
gene expression; genetic association; eQTL
Genome-wide association studies of the related chronic inflammatory bowel diseases (IBD) known as Crohn’s disease and ulcerative colitis have shown strong evidence of association to the major histocompatibility complex (MHC). This region encodes a large number of immunological candidates, including the antigen-presenting classical HLA molecules1. Studies in IBD have indicated that multiple independent associations exist at HLA and non-HLA genes, but lacked the statistical power to define the architecture of association and causal alleles2,3. To address this, we performed high-density SNP typing of the MHC in >32,000 patients with IBD, implicating multiple HLA alleles, with a primary role for HLA-DRB1*01:03 in both Crohn’s disease and ulcerative colitis. Significant differences were observed between these diseases, including a predominant role of class II HLA variants and heterozygous advantage observed in ulcerative colitis, suggesting an important role of the adaptive immune response to the colonic environment in the pathogenesis of IBD.
The genetic architecture of autism spectrum disorder involves the interplay of common and rare variation and their impact on hundreds of genes. Using exome sequencing, analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, and a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic, transcriptional, and chromatin remodeling pathways. These include voltage-gated ion channels regulating propagation of action potentials, pacemaking, and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodelers, prominently histone post-translational modifications involving lysine methylation/demethylation.
Human genome sequencing has transformed our understanding of genomic variation and its relevance to health and disease, and is now starting to enter clinical practice for the diagnosis of rare diseases. The question of whether and how some categories of genomic findings should be shared with individual research participants is currently a topic of international debate, and development of robust analytical workflows to identify and communicate clinically relevant variants is paramount.
The Deciphering Developmental Disorders (DDD) study has developed a UK-wide patient recruitment network involving over 180 clinicians across all 24 regional genetics services, and has performed genome-wide microarray and whole exome sequencing on children with undiagnosed developmental disorders and their parents. After data analysis, pertinent genomic variants were returned to individual research participants via their local clinical genetics team.
Around 80 000 genomic variants were identified from exome sequencing and microarray analysis in each individual, of which on average 400 were rare and predicted to be protein altering. By focusing only on de novo and segregating variants in known developmental disorder genes, we achieved a diagnostic yield of 27% among 1133 previously investigated yet undiagnosed children with developmental disorders, whilst minimising incidental findings. In families with developmentally normal parents, whole exome sequencing of the child and both parents resulted in a 10-fold reduction in the number of potential causal variants that needed clinical evaluation compared to sequencing only the child. Most diagnostic variants identified in known genes were novel and not present in current databases of known disease variation.
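The filtering logic described above (rare, protein-altering, de novo, in a known developmental disorder gene) can be illustrated with a minimal sketch. It is not the DDD pipeline: the `Variant` fields, the `candidate_de_novo` function, the placeholder gene names, and the 1% allele-frequency cut-off are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str               # gene symbol (placeholder names below)
    child_alt: bool         # alternate allele seen in the child
    mother_alt: bool        # alternate allele seen in the mother
    father_alt: bool        # alternate allele seen in the father
    population_af: float    # population allele frequency
    protein_altering: bool  # predicted to change the protein

def candidate_de_novo(variants, known_genes, max_af=0.01):
    """Keep rare, protein-altering variants that are present in the child,
    absent from both parents, and located in a known disorder gene.
    max_af is an assumed threshold, not a value taken from the study."""
    return [
        v for v in variants
        if v.child_alt and not v.mother_alt and not v.father_alt
        and v.protein_altering
        and v.population_af <= max_af
        and v.gene in known_genes
    ]

# Hypothetical trio calls
calls = [
    Variant("GENE_A", True, False, False, 0.0, True),    # de novo, kept
    Variant("GENE_A", True, True, False, 0.0, True),     # inherited, dropped
    Variant("GENE_B", True, False, False, 0.0, True),    # gene not in list
    Variant("GENE_A", True, False, False, 0.20, True),   # too common
]
kept = candidate_de_novo(calls, known_genes={"GENE_A"})
```

Sequencing both parents is what makes the `mother_alt` and `father_alt` checks possible, which is why trio data removes so many candidates compared with sequencing the child alone.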
Implementation of a robust translational genomics workflow is achievable within a large-scale rare disease research study to allow feedback of potentially diagnostic findings to clinicians and research participants. Systematic recording of relevant clinical data, curation of a gene–phenotype knowledge base, and development of clinical decision support software are needed in addition to automated exclusion of almost all variants, which is crucial for scalable prioritisation and review of possible diagnostic variants. However, the resource requirements of development and maintenance of a clinical reporting system within a research setting are substantial.
Health Innovation Challenge Fund, a parallel funding partnership between the Wellcome Trust and the UK Department of Health.
The contribution of rare coding sequence variants to genetic susceptibility in complex disorders is an important but unresolved question. Most studies thus far have investigated a limited number of genes from regions which contain common disease-associated variants. Here we investigate this in inflammatory bowel disease by sequencing the exons and proximal promoters of 531 genes selected from both genome-wide association studies and pathway analysis in pooled DNA panels from 474 cases of Crohn’s disease and 480 controls. Eighty variants with evidence of association in the sequencing experiment or with potential functional significance were selected for follow-up genotyping in 6,507 IBD cases and 3,064 population controls. The top 5 disease-associated variants were genotyped in an extension panel of 3,662 IBD cases and 3,639 controls, and tested for association in a combined analysis of 10,147 IBD cases and 7,008 controls. A rare coding variant p.G454C in the BTNL2 gene within the major histocompatibility complex was significantly associated with increased risk for IBD (p = 9.65 × 10⁻¹⁰, OR = 2.3 [95% CI = 1.75–3.04]), but was independent of the known common associated CD and UC variants at this locus. Rare (<1%) and low frequency (1–5%) variants in 3 additional genes showed suggestive association (p < 0.005) with either an increased risk (ARIH2 c.338-6C>T) or decreased risk (IL12B p.V298F, and NICN p.H191R) of IBD. These results provide additional insights into the involvement of the inhibition of T cell activation in the development of both sub-phenotypes of inflammatory bowel disease. We suggest that although rare coding variants may make a modest overall contribution to complex disease susceptibility, they can inform our understanding of the molecular pathways that contribute to pathogenesis.
Crohn’s disease and ulcerative colitis are two forms of inflammatory bowel disease which cause chronic inflammation of the gastrointestinal tract. Common genetic variants in more than 160 regions of the human genome have been associated with an altered risk of these disorders, but leave much of the estimated genetic contribution to disease risk unexplained. We sought to establish whether rare genetic variants which alter the structure or function of the proteins encoded by genes also contribute to disease susceptibility. We used high throughput DNA sequencing to screen over 500 genes for such variants in nearly 500 patients and controls, and validated interesting variants in about 10,000 patients and 7,000 controls. We detected association of a limited number of rare variants from coding regions with disease, suggesting that they do not account for a large proportion of genetic susceptibility. However, they highlight the involvement of genes of potential importance in the development of inflammatory bowel disease, including those involved in the activation of immune cells, the regulation of immune response genes, and the degradation of proteins in cells.
Common genetic variants residing near upstream regulatory elements for MYB, the gene encoding transcription factor cMYB, promote the persistence of fetal hemoglobin (HbF) into adulthood. While they have no consequences in healthy individuals, high HbF levels have major clinical benefits in patients with sickle cell disease (SCD) or β thalassemia. Here, we present our detailed investigation of HBS1L-MYB intergenic polymorphism block 2 (HMIP-2), the central component of the complex quantitative-trait locus upstream of MYB, in 1,022 individuals with SCD in Tanzania.
We studied 1022 individuals with HbSS or HbS/β0 in Tanzania. In order to achieve a detailed analysis of HMIP-2, we performed targeted genotyping for a total of 10 SNPs and extracted information on an additional 528 SNPs from a genome-wide scan involving the same population. Using MACH, we utilized the existing YRI data from the 1000 Genomes Project to impute 54 SNPs situated within HMIP-2.
Seven HbF-increasing, low-frequency variants (β > 0.3, p < 10⁻⁵, f ≤ 0.05) were located in two partially-independent sub-loci, HMIP-2A and HMIP-2B. The spectrum of haplotypes carrying such alleles was diverse when compared to European and West African reference populations: we detected one such haplotype at sub-locus HMIP-2A, two at HMIP-2B, and a fourth including high-HbF alleles at both sub-loci (‘Eurasian’ haplotype clade). In the region of HMIP-2A a putative functional variant (a 3-bp indel) has been described previously, but no such candidate causative variant exists at HMIP-2B. Extending our dataset through imputation with 1000 Genomes whole-genome-sequence data, we have mapped peak association at HMIP-2B to an 11-kb region around rs9494145 and rs9483788, flanked by two conserved regulatory elements for MYB.
Studies in populations from the African continent provide distinct opportunities for mapping disease-modifying genetic loci, especially for conditions that are highly prevalent there, such as SCD. Population-genetic characteristics of our cohort, such as ethnic diversity and the predominance of shorter, African-type haplotypes, can add to the power of such studies.
Mutations in nucleotide-binding oligomerisation domain-containing protein 2 (NOD2) remain the strongest genetic determinants for Crohn’s disease (CD). Having previously identified vimentin as a novel NOD2-interacting protein, we aimed to investigate the regulatory effects of vimentin on NOD2 function and the association of variants in Vim to CD susceptibility.
Co-immunoprecipitation, fluorescent microscopy and fractionation were used to confirm the interaction between NOD2 and vimentin. HEK293 cells stably expressing wild-type NOD2 or NOD2-frameshift variant (L1007fs) and SW480 colonic epithelial cells were used alongside the vimentin inhibitor Withaferin-A (WFA) to assess effects on NOD2 function using nuclear factor-kappaB (NF-κB) reporter gene, GFP-LC3-based autophagy, and bacterial gentamicin protection assays. International GWAS meta-analysis data were used to test for association of SNPs in Vim to CD susceptibility.
The leucine-rich repeat (LRR) domain of NOD2 contained the elements required for vimentin binding; CD-associated polymorphisms disrupted this interaction. NOD2 and vimentin co-localised at the cell plasma membrane, and cytosolic mislocalisation of the L1007fs and R702W variants correlated with an inability to interact with vimentin. Use of WFA demonstrated that vimentin was required for NOD2-dependent NF-κB activation and MDP-induced autophagy induction, and that NOD2 and vimentin regulated the invasion and survival properties of a CD-associated adherent-invasive E. coli strain. Genetic analysis revealed an association signal across the haplotype block containing Vim.
Vimentin is an important regulator of NOD2 function and a potential novel therapeutic target in the treatment of CD. Additionally, Vim is a candidate susceptibility gene for CD, supporting the functional data.
inflammatory bowel disease; Crohn’s disease; NOD2; vimentin; E.coli; autophagy; genetic association studies
Fetal hemoglobin (HbF) is an important modulator of sickle cell disease (SCD). HbF has previously been shown to be affected by variants at three loci on chromosomes 2, 6 and 11, but it is likely that additional loci remain to be discovered.
Methods and Findings
We conducted a genome-wide association study (GWAS) in 1,213 SCA (HbSS/HbSβ0) patients in Tanzania. Genotyping was done with the Illumina Omni2.5 array and imputation using 1000 Genomes Phase I release data. Association with HbF was analysed using a linear mixed model to control for complex population structure within our study. We successfully replicated known associations for HbF near BCL11A and the HBS1L-MYB intergenic polymorphisms (HMIP), including multiple independent effects near BCL11A, consistent with previous reports. We observed eight additional associations with P < 10⁻⁶. These associations could not be replicated in a SCA population in the UK.
This is the largest GWAS study in SCA in Africa. We have confirmed known associations and identified new genetic associations with HbF that require further replication in SCA populations in Africa.
Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5–5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p < 5 × 10⁻⁸) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5 × 10⁻¹¹⁷). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3 × 10⁻⁴), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.
We compared the coding regions of 3,000 Finnish individuals with those of 3,000 non-Finnish Europeans (NFEs) using whole-exome sequence data, in order to understand how an individual from a bottlenecked population might differ from an individual from an out-bred population. We provide empirical evidence that there are more rare and low-frequency deleterious alleles in Finns compared to NFEs, such that an average Finn has almost twice as many low-frequency complete knockouts of a gene. As such, we hypothesized that some of these low-frequency loss-of-function variants might have important medical consequences in humans and genotyped 83 of these variants in 36,000 Finns. In doing so, we discovered that completely knocking out the TSFM gene might result in inviability or a very severe phenotype in humans and that knocking out the LPA gene might confer protection against coronary heart diseases, suggesting that LPA is likely to be a good potential therapeutic target.
Genetic mutations cause primary immunodeficiencies (PIDs), which predispose to infections. Here we describe Activated PI3K-δ Syndrome (APDS), a PID associated with a dominant gain-of-function mutation E1021K in the p110δ protein, the catalytic subunit of phosphoinositide 3-kinase δ (PI3Kδ), encoded by the PIK3CD gene. We found E1021K in 17 patients from seven unrelated families, but not among 3,346 healthy subjects. APDS was characterized by recurrent respiratory infections, progressive airway damage, lymphopenia, increased circulating transitional B cells, increased IgM and reduced IgG2 levels in serum and impaired vaccine responses. The E1021K mutation enhanced membrane association and kinase activity of p110δ. Patient-derived lymphocytes had increased levels of phosphatidylinositol 3,4,5-trisphosphate and phosphorylated AKT protein and were prone to activation-induced cell death. Selective p110δ inhibitors IC87114 and GS-1101 reduced the activity of the mutant enzyme in vitro, suggesting a therapeutic approach for patients with APDS.
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
A central focus of complex disease genetics after genome-wide association studies (GWAS) is to identify low frequency and rare risk variants, which may account for an important fraction of disease heritability unexplained by GWAS. A profusion of studies using next-generation sequencing are seeking such risk alleles. We describe how already-known complex trait loci (largely from GWAS) can be used to guide the design of these new studies by selecting cases, controls, or families who are most likely to harbor undiscovered risk alleles. We show that genetic risk prediction can select unrelated cases from large cohorts who are enriched for unknown risk factors, or multiply-affected families that are more likely to harbor high-penetrance risk alleles. We derive the frequency of an undiscovered risk allele in selected cases and controls, and show how this relates to the variance explained by the risk score, the disease prevalence and the population frequency of the risk allele. We also describe a new method for informing the design of sequencing studies using genetic risk prediction in large partially-genotyped families using an extension of the Inside-Outside algorithm for inference on trees. We explore several study design scenarios using both simulated and real data, and show that in many cases genetic risk prediction can provide significant increases in power to detect low-frequency and rare risk alleles. The same approach can also be used to aid discovery of non-genetic risk factors, suggesting possible future utility of genetic risk prediction in conventional epidemiology. Software implementing the methods in this paper is available in the R package Mangrove.
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.
Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic diversity, both within and between populations, is far more extensive. We analyse genome-wide data from approximately 5,000 individuals with severe malaria and 7,000 population controls from three different locations in Africa. Our results show that the standard approach is well powered to detect known malaria susceptibility loci when sample sizes are large, and that modern methods for association analysis can control the potential confounding effects of population structure. We show that the pattern of association around the haemoglobin S allele differs substantially across populations due to differences in haplotype structure. Motivated by these observations we consider new approaches to association analysis that might prove valuable for multicentre GWAS in Africa: we relax the assumptions of SNP-based fixed-effect analysis; we apply Bayesian approaches to allow for heterogeneity in the effect of an allele on risk across studies; and we introduce a region-based test to allow for heterogeneity in the location of causal alleles.
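The standard fixed-effects approach mentioned above can be sketched as an inverse-variance-weighted average of per-study effect estimates, with Cochran's Q as a simple check for between-study heterogeneity. This is a generic textbook illustration, not the software used in the study; the function name and the three per-site effect sizes are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (pooled_beta, pooled_se, z, cochran_q).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # large values suggest the single-shared-effect assumption is violated.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, z, q

# Hypothetical log odds ratios for one SNP at three study sites
beta, se, z, q = fixed_effects_meta([0.20, 0.10, 0.30], [0.10, 0.10, 0.20])
```

The model assumes one shared effect size across studies, which is the assumption the abstract proposes to relax when haplotype structure differs between populations.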
Malaria kills nearly a million people every year, most of whom are young children in Africa. The risk of developing severe malaria is known to be affected by genetics, but so far only a handful of genetic risk factors for malaria have been identified. We studied over a million DNA variants in over 5,000 individuals with severe malaria from the Gambia, Malawi, and Kenya, and about 7,000 healthy individuals from the same countries. Because the populations of Africa are far more genetically diverse than those in Europe, it is necessary to use statistical models that can account for both broad differences between countries and subtler differences between ethnic groups within the same community. We identified known associations at the genes ABO (which affects blood type) and HBB (which causes sickle cell disease), and showed that the latter is heterogeneous across populations. We used these findings to guide the development of statistical tests for association that take this heterogeneity into account, by modelling differences in the strength and genomic location of effect across and within African populations.
Crohn’s disease (CD) and ulcerative colitis (UC), the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry with rising prevalence in other populations1. Genome-wide association studies (GWAS) and subsequent meta-analyses of CD and UC2,3 as separate phenotypes implicated previously unsuspected mechanisms, such as autophagy4, in pathogenesis and showed that some IBD loci are shared with other inflammatory diseases5. Here we expand knowledge of relevant pathways by undertaking a meta-analysis of CD and UC genome-wide association scans, with validation of significant findings in more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional and balancing selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably with ankylosing spondylitis and psoriasis. We also observe striking overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.
We genotyped 2,861 cases from the UK PBC consortium and 8,514 UK population controls across 196,524 variants within 186 known autoimmune risk loci. We identified three loci newly associated with primary biliary cirrhosis (PBC) (with P < 5 × 10⁻⁸), increasing the number of known susceptibility loci to 25. The most associated variant at 19p12 is a low-frequency non-synonymous SNP in TYK2, further implicating JAK/STAT and cytokine signalling in disease pathogenesis. A further five loci contained non-synonymous variants in high linkage disequilibrium (LD) (r² > 0.8) with the most associated variant at the locus. We found multiple independent common, low-frequency and rare variant association signals at five loci. Of the 26 independent non-HLA signals tagged on Immunochip, 15 have SNPs in B-lymphoblastoid open-chromatin regions in high LD (r² > 0.8) with the most associated variant. This study demonstrates how dense fine-mapping arrays coupled with functional genomic data can be utilized to identify candidate causal variants for functional follow-up.
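The r² > 0.8 criterion used above refers to the squared correlation between alleles at two sites. A minimal sketch of the calculation, assuming phased haplotypes coded as 0/1 pairs (the function name and the example haplotypes are hypothetical; the study itself may have estimated LD differently):

```python
def ld_r2(haplotypes):
    """Squared allelic correlation between two biallelic sites.

    haplotypes: list of (a, b) tuples, each 0 or 1, giving the allele
    carried at site A and site B on one phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n        # allele frequency at site A
    p_b = sum(b for _, b in haplotypes) / n        # allele frequency at site B
    p_ab = sum(a * b for a, b in haplotypes) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                           # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either site is monomorphic")
    return d * d / denom

# Hypothetical sample of ten phased haplotypes
sample = [(1, 1)] * 4 + [(0, 0)] * 5 + [(1, 0)] * 1
r2 = ld_r2(sample)
in_high_ld = r2 > 0.8
```

With these made-up haplotypes r² is 2/3, so the pair would fall below the 0.8 threshold.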
Motivation: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks, of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally identify family-specific, high-penetrance mutations.
Results: Olorin is a tool that integrates gene flow within families with next-generation sequencing data to enable the analysis of complex disease pedigrees. Users can interactively filter and prioritize variants based on haplotype sharing across selected individuals and other measures of importance, including predicted functional consequence and population frequency.
Genome sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2,951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in non-essential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes, and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
We densely genotyped, using 1000 Genomes Project pilot CEU and additional re-sequencing study variants, 183 reported immune-mediated disease non-HLA risk loci in 12,041 celiac disease cases and 12,228 controls. We identified 13 new celiac disease risk loci at genome wide significance, bringing the total number of known loci (including HLA) to 40. Multiple independent association signals are found at over a third of these loci, attributable to a combination of common, low frequency, and rare genetic variants. In comparison with previously available data such as HapMap3, our dense genotyping in a large sample size provided increased resolution of the pattern of linkage disequilibrium, and suggested localization of many signals to finer scale regions. In particular, 29 of 54 fine-mapped signals appeared localized to specific single genes - and in some instances to gene regulatory elements. We define a complex genetic architecture of risk regions, and refine risk signals, providing a next step towards elucidating causal disease mechanisms.
Imputation allows the inference of unobserved genotypes in low-density data sets, and is often used to test for disease association at variants that are poorly captured by standard genotyping chips (such as low-frequency variants). Although much effort has gone into developing the best imputation algorithms, less is known about the effects of reference set choice on imputation accuracy. We assess the improvements afforded by increases in reference size and diversity, specifically comparing the HapMap2 data set, which has been used to date for imputation, and the new HapMap3 data set, which contains more samples from a more diverse range of populations. We find that, for imputation into Western European samples, the HapMap3 reference provides more accurate imputation with better-calibrated quality scores than HapMap2, and that increasing the number of HapMap3 populations included in the reference set grants further improvements. Improvements are most pronounced for low-frequency variants (frequency <5%), with the largest and most diverse reference sets bringing the accuracy of imputation of low-frequency variants close to that of common ones. For low-frequency variants, reference set diversity can improve the accuracy of imputation, independent of reference sample size. HapMap3 reference sets provide significant increases in imputation accuracy relative to HapMap2, and are of particular use if highly accurate imputation of low-frequency variants is required. Our results suggest that, although the sample sizes from the 1000 Genomes Pilot Project will not allow reliable imputation of low-frequency variants, the larger sample sizes of the main project will.
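One common way to quantify imputation accuracy at a variant is the squared Pearson correlation between true genotypes (0, 1 or 2 copies of an allele) and imputed allele dosages. The sketch below illustrates that measure only; it is an assumption that it resembles the accuracy metric used in the study, and the function name and example values are hypothetical.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotype counts
    and imputed dosages for one variant across samples."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_i = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosages)
    if var_t == 0 or var_i == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov * cov / (var_t * var_i)

# Hypothetical true genotypes and imputed dosages for four samples
accuracy = dosage_r2([0, 1, 2, 1], [0.1, 0.9, 1.8, 1.2])
```

For low-frequency variants most samples carry zero copies, so the variance terms are small and a few miscalled carriers can lower this measure sharply.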
imputation; reference sets; rare variants
The Fc receptor-like 3 (FCRL3) molecule, involved in controlling B cell signalling, may contribute to the autoimmune disease process. Recently a genome-wide screen detected association of the neighbouring gene FCRL5 with Graves’ disease (GD). To determine whether FCRL5 represents a further independent B cell signalling GD susceptibility locus, we screened 12 tag SNPs, capturing all known common variation within FCRL5, in 5192 UK Caucasian GD index cases and controls.
A case control association study investigating twelve tag SNPs within FCRL5 which captured the majority of known common variation within this gene region.
A dataset comprising 2504 UK Caucasian GD patients and 2688 geographically matched controls taken from the 1958 British Birth cohort.
We used the chi-squared test and haplotype analysis to investigate association between the tag SNPs and GD before performing regression analysis to determine if association at FCRL5 was independent of the known FCRL3 association.
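As an illustration of the chi-squared test named above, the following minimal sketch tests a 2×2 table of allele counts (cases versus controls) with one degree of freedom and reports the allelic odds ratio. The function name and the allele counts are hypothetical and are not data from the study.

```python
import math

def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-squared test (1 df, no continuity correction) on a
    2x2 table of allele counts. Returns (chi2, p_value, odds_ratio)."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts: 500 cases and 500 controls (1000 alleles each)
chi2, p_value, odds_ratio = allelic_chi2(300, 700, 250, 750)
```

Testing whether a signal is independent of a neighbouring association, as described above, requires a regression model that conditions on the known SNPs; a single 2×2 test cannot address that.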
Three of the FCRL5 tag SNPs, rs6667109, rs3811035 and rs6692977, showed association with GD (P = 0.015–0.001, OR = 1.15–1.16). Logistic regression performed on all FCRL5 and previously screened FCRL3 tag SNPs revealed that association with FCRL5 was secondary to linkage disequilibrium with the FCRL3 SNPs rs11264798 and rs10489678.
FCRL5 does not appear to be exerting an independent effect on the development of GD in the UK. Fine mapping of the entire FCRL region is required to determine the exact location of the etiological variant/s present.
Linkage disequilibrium; FCRL3; FCRL5; Graves’ disease; genome wide screening