Mycobacterium tuberculosis (M. tuberculosis) infections cause 9.0 million new tuberculosis (TB) cases and 1.5 million deaths annually1. To search for sequence variants that confer risk of TB we tested 28.3 million variants identified through whole-genome sequencing of 2,636 Icelanders for association with TB (8,162 cases and 277,643 controls), pulmonary TB (PTB), and M. tuberculosis infection. We found association of three sequence variants in the HLA class II region: rs557011[T] (MAF=40.2%) with M. tuberculosis infection (OR=1.14, P=3.1×10−13) and PTB (OR=1.25, P=5.8×10−12) and rs9271378[G] (MAF=32.5%) with PTB (OR=0.78, P=2.5×10−12), both located between HLA-DQA1 and HLA-DRB1. Finally, a missense variant p.Ala210Thr in HLA-DQA1 (MAF=19.1%, rs9272785) shows association with M. tuberculosis infection (P=9.3×10−9, OR=1.14). The association of these variants with PTB was replicated in large samples of European ancestry from Russia and Croatia (P<5.9×10−4). These findings demonstrate that the HLA class II region contributes to the complex genetic risk of tuberculosis, possibly through reduced presentation of protective M. tuberculosis antigens to T cells.
Crohn's disease and ulcerative colitis are the two major forms of inflammatory bowel disease; treatment strategies have historically been determined by this binary categorisation. Genetic studies have identified 163 susceptibility loci for inflammatory bowel disease, mostly shared between Crohn's disease and ulcerative colitis. We undertook the largest genotype association study, to date, in widely used clinical subphenotypes of inflammatory bowel disease with the goal of further understanding the biological relations between diseases.
This study included patients from 49 centres in 16 countries in Europe, North America, and Australasia. We applied the Montreal classification system of inflammatory bowel disease subphenotypes to 34 819 patients (19 713 with Crohn's disease, 14 683 with ulcerative colitis) genotyped on the Immunochip array. We tested for genotype–phenotype associations across 156 154 genetic variants. We generated genetic risk scores by combining information from all known inflammatory bowel disease associations to summarise the total load of genetic risk for a particular phenotype. We used these risk scores to test the hypothesis that colonic Crohn's disease, ileal Crohn's disease, and ulcerative colitis are all genetically distinct from each other, and to attempt to identify patients with a mismatch between clinical diagnosis and genetic risk profile.
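The idea of a genetic risk score that summarises the total load of risk alleles can be illustrated with a minimal sketch. It assumes an additive score in which each risk-allele dosage (0, 1 or 2) is weighted by the log odds ratio of that variant; the abstract does not state the weighting scheme used, and the function name, odds ratios and dosages below are hypothetical values chosen only for illustration.

```python
import math


def genetic_risk_score(dosages, odds_ratios):
    """Additive risk score: sum of risk-allele dosage times log(odds ratio).

    Assumption: log-odds weighting, a common convention; not necessarily
    the scheme used in the study described above.
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("dosages and odds_ratios must have the same length")
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical per-variant odds ratios and one individual's allele dosages.
example_odds_ratios = [1.25, 0.80, 1.10]
example_dosages = [2, 1, 0]

score = genetic_risk_score(example_dosages, example_odds_ratios)
print(f"risk score: {score:.4f}")
```

Scores computed this way for many individuals could then be compared between phenotype groups, for example ileal versus colonic Crohn's disease.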
After quality control, the primary analysis included 29 838 patients (16 902 with Crohn's disease, 12 597 with ulcerative colitis). Three loci (NOD2, MHC, and MST1 3p21) were associated with subphenotypes of inflammatory bowel disease, mainly disease location (essentially fixed over time; median follow-up of 10·5 years). Little or no genetic association with disease behaviour (which changed dramatically over time) remained after conditioning on disease location and age at onset. The genetic risk score representing all known risk alleles for inflammatory bowel disease showed strong association with disease subphenotype (p=1·65 × 10−78), even after exclusion of NOD2, MHC, and 3p21 (p=9·23 × 10−18). Predictive models based on the genetic risk score strongly distinguished colonic from ileal Crohn's disease. Our genetic risk score could also identify a small number of patients with discrepant genetic risk profiles who were significantly more likely to have a revised diagnosis after follow-up (p=6·8 × 10−4).
Our data support a continuum of disorders within inflammatory bowel disease, much better explained by three groups (ileal Crohn's disease, colonic Crohn's disease, and ulcerative colitis) than by Crohn's disease and ulcerative colitis as currently defined. Disease location is an intrinsic aspect of a patient's disease, in part genetically determined, and the major driver to changes in disease behaviour over time.
International Inflammatory Bowel Disease Genetics Consortium members funding sources (see Acknowledgments for full list).
We simultaneously investigated the genetic landscape of ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis to investigate pleiotropy and the relationship between these clinically related diseases. Using high-density genotype data from more than 86,000 individuals of European-ancestry we identified 244 independent multi-disease signals including 27 novel genome-wide significant susceptibility loci and 3 unreported shared risk loci. Complex pleiotropy was supported when contrasting multi-disease signals with expression data sets from human, rat and mouse, and epigenetic and expressed enhancer profiles. The comorbidities among the five immune diseases were best explained by biological pleiotropy rather than heterogeneity (a subgroup of cases that is genetically identical to another disease, possibly due to diagnostic misclassification, molecular subtypes, or excessive comorbidity). In particular, the strong comorbidity between primary sclerosing cholangitis and inflammatory bowel disease is likely the result of a unique disease, which is genetically distinct from classical inflammatory bowel disease phenotypes.
Ulcerative colitis and Crohn’s disease are the two main forms of inflammatory bowel disease (IBD). Here, we report the first trans-ethnic association study of IBD, with genome-wide or Immunochip genotype data from an extended cohort of 86,640 European individuals and Immunochip data from 9,846 individuals of East-Asian, Indian or Iranian descent. We implicate 38 loci in IBD risk for the first time. For the majority of IBD risk loci, the direction and magnitude of effect is consistent in European and non-European cohorts. Nevertheless, we observe genetic heterogeneity between divergent populations at several established risk loci driven by a combination of differences in allele frequencies (NOD2), effect sizes (TNFSF15, ATG16L1) or a combination of both (IL23R, IRGM). Our results provide biological insights into the pathogenesis of IBD, and demonstrate the utility of trans-ethnic association studies for mapping complex disease loci and understanding genetic architecture across diverse populations.
Article first published online 7 January 2015.
Many genetic risk loci have been identified for inflammatory bowel disease and colorectal cancer; however, identifying the causal genes for each association signal remains a challenge. Expression quantitative trait loci (eQTL) studies have identified common variants that induce differential gene expression and eQTLs can be cross-referenced with disease association signals for gene prioritization. However, the genetics of gene expression are highly tissue-specific, and further eQTL datasets from primary tissues are needed.
We have conducted an eQTL discovery study using tissue extracted endoscopically from the terminal ileum and 4 colonic locations of non-inflamed bowel from 65 controls and patients with quiescent inflammatory bowel disease. A genome-wide cis-eQTL analysis was performed on >3,600,000 variants and 13,558 expressed probes.
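A single cis-eQTL test of the kind described here can be sketched as a regression of a probe's expression level on genotype dosage. The following is a minimal illustration using ordinary least squares in plain Python; the abstract does not specify the software or model used, and the function name and the toy dosage and expression values are hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression on allele dosage.

    Assumption: a simple linear model without covariates, shown only to
    illustrate the per-variant, per-probe test.
    """
    n = len(dosages)
    if n != len(expression) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 samples")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept


# Hypothetical dosages (0/1/2) and expression values for six samples.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]

slope, intercept = eqtl_slope(toy_dosages, toy_expression)
print(slope, intercept)
```

In a genome-wide analysis this test would be repeated for every variant within a chosen window around each expressed probe, followed by multiple-testing correction.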
We identified 1312 independent eQTLs associated with the differential expression of 1222 genes in rectal mucosa. One hundred seventy-one, 211, 168, and 102 independent eQTLs were identified in the sigmoid, descending colon, ascending colon, and terminal ileum, respectively. Twenty-six percent of genes with rectal eQTLs were novel and unique compared with 7 published eQTL datasets. Rectal eQTLs were significantly enriched for genes expressed in the colon. Examining 163 inflammatory bowel disease risk loci identified 11 tag single-nucleotide polymorphisms that were rectal eQTLs. A colorectal cancer locus at 11q23 contained a rectal eQTL for COLCA2, a protein implicated in colon cancer pathogenesis.
This study defines a catalog of ileal and colonic eQTLs. Our data reaffirm the tissue specificity of eQTLs and support the notion that identification of functional variants in relevant tissue can be effective in fine-mapping genetic risk loci.
gene expression; genetic association; eQTL
Human genetic factors predispose to tuberculosis (TB). We studied 7.6 million genetic variants in 5,530 pulmonary TB patients and 5,607 healthy controls. In the combined analysis of these subjects and the follow-up cohort (15,087 TB patients and controls altogether), we found association between TB and variants located in introns of the ASAP1 gene on chromosome 8q24 (P = 2.6 × 10−11 for rs4733781; P = 1.0 × 10−10 for rs10956514). Dendritic cells (DCs) showed a high level of ASAP1 expression, which was reduced after M. tuberculosis infection, and rs10956514 was associated with the level of reduction of ASAP1 expression. The ASAP1 protein is involved in actin and membrane remodeling and has been associated with podosomes. The ASAP1-depleted DCs showed impaired matrix degradation and migration. Therefore, genetically determined excessive reduction of ASAP1 expression in M. tuberculosis-infected DCs may lead to their impaired migration, suggesting a potential novel mechanism that predisposes to TB.
Genetic studies of type 1 diabetes (T1D) have identified 50 susceptibility regions1,2 (www.T1DBase.org) revealing major pathways contributing to risk3, with some loci shared across immune disorders4–6. In order to make genetic comparisons across autoimmune disorders as informative as possible a dense genotyping array, the ImmunoChip, was developed, from which four novel T1D regions were identified (P < 5 × 10−8). A comparative analysis with 15 immune diseases (www.ImmunoBase.org) revealed that T1D is more similar genetically to other autoantibody-positive diseases, most significantly to juvenile idiopathic arthritis and least to ulcerative colitis, and provided support for three additional novel T1D loci. Using a Bayesian approach, we defined credible sets for the T1D SNPs. These T1D SNPs localized to enhancer sequences active in thymus, T and B cells, and CD34+ stem cells. Enhancer-promoter interactions can now be analyzed in these cell types to identify which particular genes and regulatory sequences are causal.
We present a genome-wide messenger RNA (mRNA) sequencing technique that converts small amounts of RNA from many samples into molecular phenotypes. It encompasses all steps from sample preparation to sequence analysis and is applicable to baseline profiling or perturbation measurements.
Multiplex sequencing of transcript 3′ ends identifies differential transcript abundance independent of gene annotation. We show that increasing biological replicate number while maintaining the total amount of sequencing identifies more differentially abundant transcripts.
This method can be implemented on polyadenylated RNA from any organism with an annotated reference genome and in any laboratory with access to Illumina sequencing.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1788-6) contains supplementary material, which is available to authorized users.
mRNA transcript profiling; RNA-seq; Molecular phenotype
Genome-wide association studies of the related chronic inflammatory bowel diseases (IBD) known as Crohn’s disease and ulcerative colitis have shown strong evidence of association to the major histocompatibility complex (MHC). This region encodes a large number of immunological candidates, including the antigen-presenting classical HLA molecules1. Studies in IBD have indicated that multiple independent associations exist at HLA and non-HLA genes, but lacked the statistical power to define the architecture of association and causal alleles2,3. To address this, we performed high-density SNP typing of the MHC in >32,000 patients with IBD, implicating multiple HLA alleles, with a primary role for HLA-DRB1*01:03 in both Crohn’s disease and ulcerative colitis. Significant differences were observed between these diseases, including a predominant role of class II HLA variants and heterozygous advantage observed in ulcerative colitis, suggesting an important role of the adaptive immune response to the colonic environment in the pathogenesis of IBD.
Genome-wide association studies (GWAS) have identified thousands of robust and replicable genetic associations for complex disease. However, the identification of the causal variants that underlie these associations has been more difficult. This problem of fine-mapping association signals predates GWAS, but the last few years have seen a surge of studies aimed at pinpointing causal variants using both statistical evidence from large association data sets and functional annotations of genetic variants. Combining these two approaches can often determine not only the causal variant but also the target gene. Recent contributions include analyses of custom genotyping arrays, such as the Immunochip, statistical methods to identify credible sets of causal variants and the addition of functional genomic annotations for coding and non-coding variation to help prioritize variants and discern functional consequence and hence the biological basis of disease risk.
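The notion of a credible set of causal variants can be made concrete with a short sketch. It assumes exactly one causal variant per association signal and equal prior probability for every variant, so that each variant's posterior probability is its Bayes factor divided by the sum over all variants; these assumptions, the function name and the Bayes factors below are illustrative and are not taken from any specific study.

```python
def credible_set(bayes_factors, coverage=0.95):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    Assumptions: one causal variant in the region and flat priors, so the
    posterior for each variant is its Bayes factor over the regional total.
    """
    total = sum(bayes_factors.values())
    posteriors = {v: bf / total for v, bf in bayes_factors.items()}
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, posterior in ranked:
        chosen.append(variant)
        cumulative += posterior
        if cumulative >= coverage:
            break
    return chosen, cumulative


# Hypothetical Bayes factors for four variants in one associated region.
example_bfs = {"snp_a": 90.0, "snp_b": 6.0, "snp_c": 3.0, "snp_d": 1.0}

chosen, cumulative = credible_set(example_bfs)
print(chosen, round(cumulative, 2))
```

A small credible set indicates that the association data alone narrow the signal to a few candidates, which can then be prioritised further using functional annotation.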
The genetic architecture of autism spectrum disorder involves the interplay of common and rare variation and their impact on hundreds of genes. Using exome sequencing, analysis of rare coding variation in 3,871 autism cases and 9,937 ancestry-matched or parental controls implicates 22 autosomal genes at a false discovery rate (FDR) < 0.05, and a set of 107 autosomal genes strongly enriched for those likely to affect risk (FDR < 0.30). These 107 genes, which show unusual evolutionary constraint against mutations, incur de novo loss-of-function mutations in over 5% of autistic subjects. Many of the genes implicated encode proteins for synaptic, transcriptional, and chromatin remodeling pathways. These include voltage-gated ion channels regulating propagation of action potentials, pacemaking, and excitability-transcription coupling, as well as histone-modifying enzymes and chromatin remodelers, prominently histone post-translational modifications involving lysine methylation/demethylation.
Human genome sequencing has transformed our understanding of genomic variation and its relevance to health and disease, and is now starting to enter clinical practice for the diagnosis of rare diseases. The question of whether and how some categories of genomic findings should be shared with individual research participants is currently a topic of international debate, and development of robust analytical workflows to identify and communicate clinically relevant variants is paramount.
The Deciphering Developmental Disorders (DDD) study has developed a UK-wide patient recruitment network involving over 180 clinicians across all 24 regional genetics services, and has performed genome-wide microarray and whole exome sequencing on children with undiagnosed developmental disorders and their parents. After data analysis, pertinent genomic variants were returned to individual research participants via their local clinical genetics team.
Around 80 000 genomic variants were identified from exome sequencing and microarray analysis in each individual, of which on average 400 were rare and predicted to be protein altering. By focusing only on de novo and segregating variants in known developmental disorder genes, we achieved a diagnostic yield of 27% among 1133 previously investigated yet undiagnosed children with developmental disorders, whilst minimising incidental findings. In families with developmentally normal parents, whole exome sequencing of the child and both parents resulted in a 10-fold reduction in the number of potential causal variants that needed clinical evaluation compared to sequencing only the child. Most diagnostic variants identified in known genes were novel and not present in current databases of known disease variation.
Implementation of a robust translational genomics workflow is achievable within a large-scale rare disease research study to allow feedback of potentially diagnostic findings to clinicians and research participants. Systematic recording of relevant clinical data, curation of a gene–phenotype knowledge base, and development of clinical decision support software are needed in addition to automated exclusion of almost all variants, which is crucial for scalable prioritisation and review of possible diagnostic variants. However, the resource requirements of development and maintenance of a clinical reporting system within a research setting are substantial.
Health Innovation Challenge Fund, a parallel funding partnership between the Wellcome Trust and the UK Department of Health.
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome wide association study (GWAS) have yielded significant and replicated results. We performed a GWAS in 2,907 cases with AN from 14 countries (15 sites) and 14,860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery datasets. Seventy-six (72 independent) SNPs were taken forward for in silico (two datasets) or de novo (13 datasets) replication genotyping in 2,677 independent AN cases and 8,629 European ancestry controls along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication datasets comprised 5,551 AN cases and 21,080 controls. AN subtype analyses (1,606 AN restricting; 1,445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01×10−7) in SOX2OT and rs17030795 (P=5.84×10−6) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76×10−6) between CUL3 and FAM124B and rs1886797 (P=8.05×10−6) near SPATA13. Comparing discovery to replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4×10−6), strongly suggesting that true findings exist but that our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
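The reasoning behind the direction-of-effect comparison can be sketched as a sign test: if no true associations exist, each replication effect should agree in direction with its discovery effect with probability 0.5. The following minimal example computes a one-sided exact binomial tail probability. The abstract does not state which test or exact counts were used, so the function name and the counts below are hypothetical.

```python
from math import comb


def sign_test_p(n_same, n_total):
    """One-sided exact binomial P(X >= n_same) for X ~ Binomial(n_total, 0.5).

    Assumption: effects are independent and, under the null, equally likely
    to replicate in either direction.
    """
    if not 0 <= n_same <= n_total:
        raise ValueError("n_same must lie between 0 and n_total")
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical counts: 15 of 20 effects share direction across stages.
p_value = sign_test_p(15, 20)
print(f"one-sided sign-test P = {p_value:.4f}")
```

An excess of concordant directions that is unlikely under this null suggests that some of the tested variants carry true but small effects.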
anorexia nervosa; eating disorders; GWAS; genome-wide association study; body mass index; metabolic
The contribution of rare coding sequence variants to genetic susceptibility in complex disorders is an important but unresolved question. Most studies thus far have investigated a limited number of genes from regions which contain common disease associated variants. Here we investigate this in inflammatory bowel disease by sequencing the exons and proximal promoters of 531 genes selected from both genome-wide association studies and pathway analysis in pooled DNA panels from 474 cases of Crohn’s disease and 480 controls. 80 variants with evidence of association in the sequencing experiment or with potential functional significance were selected for follow up genotyping in 6,507 IBD cases and 3,064 population controls. The top 5 disease associated variants were genotyped in an extension panel of 3,662 IBD cases and 3,639 controls, and tested for association in a combined analysis of 10,147 IBD cases and 7,008 controls. A rare coding variant p.G454C in the BTNL2 gene within the major histocompatibility complex was significantly associated with increased risk for IBD (p = 9.65×10−10, OR = 2.3 [95% CI = 1.75–3.04]), but was independent of the known common associated CD and UC variants at this locus. Rare (<1%) and low frequency (1–5%) variants in 3 additional genes showed suggestive association (p<0.005) with either an increased risk (ARIH2 c.338-6C>T) or decreased risk (IL12B p.V298F, and NICN p.H191R) of IBD. These results provide additional insights into the involvement of the inhibition of T cell activation in the development of both sub-phenotypes of inflammatory bowel disease. We suggest that although rare coding variants may make a modest overall contribution to complex disease susceptibility, they can inform our understanding of the molecular pathways that contribute to pathogenesis.
Crohn’s disease and ulcerative colitis are two forms of inflammatory bowel disease which cause chronic inflammation of the gastrointestinal tract. Common genetic variants in more than 160 regions of the human genome have been associated with an altered risk of these disorders, but leave much of the estimated genetic contribution to disease risk unexplained. We sought to establish whether rare genetic variants which alter the structure or function of the proteins encoded by genes also contribute to disease susceptibility. We used high throughput DNA sequencing to screen over 500 genes for such variants in nearly 500 patients and controls, and validated interesting variants in about 10,000 patients and 7,000 controls. We detected association of a limited number of rare variants from coding regions with disease, suggesting that they do not account for a large proportion of genetic susceptibility. However, they highlight the involvement of genes of potential importance in the development of inflammatory bowel disease, including those involved in the activation of immune cells, the regulation of immune response genes, and the degradation of proteins in cells.
Common genetic variants residing near upstream regulatory elements for MYB, the gene encoding transcription factor cMYB, promote the persistence of fetal hemoglobin (HbF) into adulthood. While they have no consequences in healthy individuals, high HbF levels have major clinical benefits in patients with sickle cell disease (SCD) or β thalassemia. Here, we present our detailed investigation of HBS1L-MYB intergenic polymorphism block 2 (HMIP-2), the central component of the complex quantitative-trait locus upstream of MYB, in 1,022 individuals with SCD in Tanzania.
We studied 1,022 individuals with HbSS or HbS/β0 in Tanzania. For a detailed analysis of HMIP-2, we performed targeted genotyping of 10 SNPs and extracted information on an additional 528 SNPs from a genome-wide scan of the same population. Using MACH and the existing YRI data from the 1000 Genomes Project, we imputed 54 SNPs situated within HMIP-2.
Seven HbF-increasing, low-frequency variants (β > 0.3, p < 10−5, f ≤ 0.05) were located in two partially-independent sub-loci, HMIP-2A and HMIP-2B. The spectrum of haplotypes carrying such alleles was diverse when compared to European and West African reference populations: we detected one such haplotype at sub-locus HMIP-2A, two at HMIP-2B, and a fourth including high-HbF alleles at both sub-loci (‘Eurasian’ haplotype clade). In the region of HMIP-2A a putative functional variant (a 3-bp indel) has been described previously, but no such candidate causative variant exists at HMIP-2B. Extending our dataset through imputation with 1000 Genomes whole-genome-sequence data, we have mapped peak association at HMIP-2B to an 11-kb region around rs9494145 and rs9483788, flanked by two conserved regulatory elements for MYB.
Studies in populations from the African continent provide distinct opportunities for mapping disease-modifying genetic loci, especially for conditions that are highly prevalent there, such as SCD. Population-genetic characteristics of our cohort, such as ethnic diversity and the predominance of shorter, African-type haplotypes, can add to the power of such studies.
Mutations in nucleotide-binding oligomerisation domain-containing protein 2 (NOD2) remain the strongest genetic determinants for Crohn’s disease (CD). Having previously identified vimentin as a novel NOD2-interacting protein, we aimed to investigate the regulatory effects of vimentin on NOD2 function and the association of variants in Vim to CD susceptibility.
Co-immunoprecipitation, fluorescent microscopy and fractionation were used to confirm the interaction between NOD2 and vimentin. HEK293 cells stably expressing wild-type NOD2 or NOD2-frameshift variant (L1007fs) and SW480 colonic epithelial cells were used alongside the vimentin inhibitor Withaferin-A (WFA) to assess effects on NOD2 function using nuclear factor-kappaB (NF-κB) reporter gene, GFP-LC3-based autophagy, and bacterial gentamicin protection assays. International GWAS meta-analysis data were used to test for association of SNPs in Vim to CD susceptibility.
The leucine rich repeat (LRR) domain of NOD2 contained the elements required for vimentin binding; CD-associated polymorphisms disrupted this interaction. NOD2 and vimentin co-localised at the cell plasma membrane and cytosolic mislocalisation of the L1007fs and R702W variants correlated with an inability to interact with vimentin. Use of WFA demonstrated that vimentin was required for NOD2-dependent NF-κB activation and MDP-induced autophagy induction, and that NOD2 and vimentin regulated the invasion and survival properties of a CD-associated adherent-invasive E. coli strain. Genetic analysis revealed an association signal across the haplotype block containing Vim.
Vimentin is an important regulator of NOD2 function and a potential novel therapeutic target in the treatment of CD. Additionally, Vim is a candidate susceptibility gene for CD, supporting the functional data.
inflammatory bowel disease; Crohn’s disease; NOD2; vimentin; E.coli; autophagy; genetic association studies
Fetal hemoglobin (HbF) is an important modulator of sickle cell disease (SCD). HbF has previously been shown to be affected by variants at three loci on chromosomes 2, 6 and 11, but it is likely that additional loci remain to be discovered.
Methods and Findings
We conducted a genome-wide association study (GWAS) in 1,213 SCA (HbSS/HbSβ0) patients in Tanzania. Genotyping was done with Illumina Omni2.5 array and imputation using 1000 Genomes Phase I release data. Association with HbF was analysed using a linear mixed model to control for complex population structure within our study. We successfully replicated known associations for HbF near BCL11A and the HBS1L-MYB intergenic polymorphisms (HMIP), including multiple independent effects near BCL11A, consistent with previous reports. We observed eight additional associations with P<10−6. These associations could not be replicated in a SCA population in the UK.
This is the largest GWAS in SCA in Africa. We have confirmed known associations and identified new genetic associations with HbF that require further replication in SCA populations in Africa.
Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5–5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p<5×10−8) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P = 1.5×10−117). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR = 0.84, P = 3×10−4), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.
We compared the coding regions of 3,000 Finnish individuals with those of 3,000 non-Finnish Europeans (NFEs) using whole-exome sequence data, in order to understand how an individual from a bottlenecked population might differ from an individual from an out-bred population. We provide empirical evidence that there are more rare and low-frequency deleterious alleles in Finns compared to NFEs, such that an average Finn has almost twice as many low-frequency complete knockouts of a gene. As such, we hypothesized that some of these low-frequency loss-of-function variants might have important medical consequences in humans and genotyped 83 of these variants in 36,000 Finns. In doing so, we discovered that completely knocking out the TSFM gene might result in inviability or a very severe phenotype in humans and that knocking out the LPA gene might confer protection against coronary heart diseases, suggesting that LPA is likely to be a good potential therapeutic target.
Genetic mutations cause primary immunodeficiencies (PIDs), which predispose to infections. Here we describe Activated PI3K-δ Syndrome (APDS), a PID associated with a dominant gain-of-function mutation E1021K in the p110δ protein, the catalytic subunit of phosphoinositide 3-kinase δ (PI3Kδ), encoded by the PIK3CD gene. We found E1021K in 17 patients from seven unrelated families, but not among 3,346 healthy subjects. APDS was characterized by recurrent respiratory infections, progressive airway damage, lymphopenia, increased circulating transitional B cells, increased IgM and reduced IgG2 levels in serum and impaired vaccine responses. The E1021K mutation enhanced membrane association and kinase activity of p110δ. Patient-derived lymphocytes had increased levels of phosphatidylinositol 3,4,5-trisphosphate and phosphorylated AKT protein and were prone to activation-induced cell death. Selective p110δ inhibitors IC87114 and GS-1101 reduced the activity of the mutant enzyme in vitro, suggesting a therapeutic approach for patients with APDS.
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
Anorexia nervosa (AN) is a complex and heritable eating disorder characterized by dangerously low body weight. Neither candidate gene studies nor an initial genome-wide association study (GWAS) has yielded significant and replicated results. We performed a GWAS in 2,907 cases with AN from 14 countries (15 sites) and 14,860 ancestrally matched controls as part of the Genetic Consortium for AN (GCAN) and the Wellcome Trust Case Control Consortium 3 (WTCCC3). Individual association analyses were conducted in each stratum and meta-analyzed across all 15 discovery datasets. Seventy-six (72 independent) SNPs were taken forward for in silico (two datasets) or de novo (13 datasets) replication genotyping in 2,677 independent AN cases and 8,629 European-ancestry controls, along with 458 AN cases and 421 controls from Japan. The final global meta-analysis across discovery and replication datasets comprised 5,551 AN cases and 21,080 controls. AN subtype analyses (1,606 AN restricting; 1,445 AN binge-purge) were performed. No findings reached genome-wide significance. Two intronic variants were suggestively associated: rs9839776 (P=3.01×10−7) in SOX2OT and rs17030795 (P=5.84×10−6) in PPP3CA. Two additional signals were specific to Europeans: rs1523921 (P=5.76×10−6) between CUL3 and FAM124B and rs1886797 (P=8.05×10−6) near SPATA13. Comparing discovery to replication results, 76% of the effects were in the same direction, an observation highly unlikely to be due to chance (P=4×10−6), strongly suggesting that true findings exist but that our sample, the largest yet reported, was underpowered for their detection. The accrual of large genotyped AN case-control samples should be an immediate priority for the field.
Keywords: anorexia nervosa; eating disorders; GWAS; genome-wide association study; body mass index; metabolic
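The direction-of-effect comparison in the AN abstract above is the kind of result a sign test produces. The following is a minimal sketch, not the authors' analysis: it assumes the test was a one-sided exact binomial test over the 72 independent SNPs, with 76% rounded to a whole count. The abstract does not state either detail, and the function name `sign_test_p` is hypothetical.

```python
from math import comb


def sign_test_p(n_same: int, n_total: int) -> float:
    """One-sided exact binomial tail P(X >= n_same) for X ~ Binomial(n_total, 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_same, n_total + 1))
    return tail / 2 ** n_total


# Assumption: the 72 independent SNPs are the unit of analysis.
n_total = 72
# Assumption: "76%" corresponds to the nearest whole number of SNPs.
n_same = round(0.76 * n_total)

p_value = sign_test_p(n_same, n_total)
print(f"{n_same}/{n_total} effects in the same direction, one-sided P = {p_value:.2e}")
```

Under the null hypothesis of no true associations, each replication effect is equally likely to point in either direction, so the count of concordant SNPs follows a Binomial(n, 0.5) distribution. A small tail probability suggests that at least some of the tested variants carry real signal even though none is individually significant.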
A central focus of complex disease genetics after genome-wide association studies (GWAS) is to identify low frequency and rare risk variants, which may account for an important fraction of disease heritability unexplained by GWAS. A profusion of studies using next-generation sequencing are seeking such risk alleles. We describe how already-known complex trait loci (largely from GWAS) can be used to guide the design of these new studies by selecting cases, controls, or families who are most likely to harbor undiscovered risk alleles. We show that genetic risk prediction can select unrelated cases from large cohorts who are enriched for unknown risk factors, or multiply-affected families that are more likely to harbor high-penetrance risk alleles. We derive the frequency of an undiscovered risk allele in selected cases and controls, and show how this relates to the variance explained by the risk score, the disease prevalence and the population frequency of the risk allele. We also describe a new method for informing the design of sequencing studies using genetic risk prediction in large partially-genotyped families using an extension of the Inside-Outside algorithm for inference on trees. We explore several study design scenarios using both simulated and real data, and show that in many cases genetic risk prediction can provide significant increases in power to detect low-frequency and rare risk alleles. The same approach can also be used to aid discovery of non-genetic risk factors, suggesting possible future utility of genetic risk prediction in conventional epidemiology. Software implementing the methods in this paper is available in the R package Mangrove.
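The idea that cases with low known genetic risk are enriched for undiscovered risk alleles can be illustrated with a small simulation. This is a sketch under a simple liability threshold model, not the derivation from the paper and not the Mangrove API. The function `simulate` and all parameter values (prevalence, variance explained by the known score, allele frequency, effect size) are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def simulate(n=200_000, prevalence=0.05, h2_known=0.2,
             freq=0.01, beta=1.0, seed=1):
    """Return the unknown-allele frequency in all cases and in low-score cases.

    Liability = known risk score + beta * (unknown allele count) + residual.
    An individual is a case when liability exceeds the threshold implied by
    the prevalence. The allele's own variance is ignored when setting the
    threshold, so realised prevalence is slightly above the nominal value.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    sd_known = h2_known ** 0.5
    sd_resid = (1.0 - h2_known) ** 0.5

    cases = []  # (known risk score, unknown allele count) for each case
    for _ in range(n):
        score = rng.gauss(0.0, sd_known)
        alleles = (rng.random() < freq) + (rng.random() < freq)
        liability = score + beta * alleles + rng.gauss(0.0, sd_resid)
        if liability > threshold:
            cases.append((score, alleles))

    # Select the quarter of cases with the lowest known risk score.
    cases.sort()
    low_score_cases = cases[: len(cases) // 4]

    def allele_freq(group):
        return sum(a for _, a in group) / (2 * len(group))

    return allele_freq(cases), allele_freq(low_score_cases)


all_cases_freq, low_score_freq = simulate()
print(f"unknown allele frequency in all cases:       {all_cases_freq:.4f}")
print(f"unknown allele frequency in low-score cases: {low_score_freq:.4f}")
```

Cases whose known risk score is low must have crossed the disease threshold for some other reason, so they carry the unknown allele more often than cases in general. Sequencing those individuals first is the study design choice the abstract describes.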
The molecular mechanisms involved in the development of type 2 diabetes are poorly understood. Starting from genome-wide genotype data for 1,924 diabetic cases and 2,938 population controls generated by the Wellcome Trust Case Control Consortium, we set out to detect replicated diabetes association signals through analysis of 3,757 additional cases and 5,346 controls, and by integration of our findings with equivalent data from other international consortia. We detected diabetes susceptibility loci in and around the genes CDKAL1, CDKN2A/CDKN2B and IGF2BP2 and confirmed the recently described associations at HHEX/IDE and SLC30A8. Our findings provide insights into the genetic architecture of type 2 diabetes, emphasizing the contribution of multiple variants of modest effect. The regions identified underscore the importance of pathways influencing pancreatic beta cell development and function in the etiology of type 2 diabetes.