Genome-wide association studies of the related chronic inflammatory bowel diseases (IBD) known as Crohn’s disease and ulcerative colitis have shown strong evidence of association to the major histocompatibility complex (MHC). This region encodes a large number of immunological candidates, including the antigen-presenting classical HLA molecules1. Studies in IBD have indicated that multiple independent associations exist at HLA and non-HLA genes, but lacked the statistical power to define the architecture of association and causal alleles2,3. To address this, we performed high-density SNP typing of the MHC in >32,000 patients with IBD, implicating multiple HLA alleles, with a primary role for HLA-DRB1*01:03 in both Crohn’s disease and ulcerative colitis. Significant differences were observed between these diseases, including a predominant role of class II HLA variants and heterozygous advantage observed in ulcerative colitis, suggesting an important role of the adaptive immune response to the colonic environment in the pathogenesis of IBD.
Multiple immune-related genes are encoded in the HLA complex on chromosome 6p21. The 8.1 ancestral haplotype (AH8.1) includes the classical HLA alleles HLA-B*08:01 and HLA-DRB1*03:01 and has been associated with a large number of autoimmune diseases, but the underlying mechanisms for this association are largely unknown. Given the recently established links between the gut microbiota and inflammatory diseases, we hypothesized that AH8.1 influences the host gut microbial community composition. To study this further, healthy individuals were selected from the Norwegian Bone Marrow Donor Registry and categorized as either I. AH8.1 homozygote (n=34), II. AH8.1 heterozygote (n=38), III. Non AH8.1 heterozygote or IV. HLA-DRB1 homozygote but non AH8.1 (n=15). Bacterial DNA from stool samples was subjected to sequencing of the V3–V5 region of the 16S rRNA gene on the 454 Life Sciences platform, and the data were analyzed using Mothur and QIIME. The results showed that the abundances of different taxa were highly variable within all pre-defined AH8.1 genotype groups. Using univariate non-parametric statistics, there were no differences in alpha or beta diversity between AH8.1 carriers (categories I and II) and non-carriers (categories III and IV); however, four taxa (Prevotellaceae, Clostridium XVIII, Coprococcus, Enterorhabdus) had nominally significantly lower abundances in AH8.1 carriers than in non-carriers. After including possible confounders in a multivariate linear regression, only the two latter genera remained significantly associated. In conclusion, the overall contribution of the AH8.1 haplotype to the variation in gut microbiota profile of stool in the present study was small.
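As an illustration of what an alpha diversity comparison involves, the following minimal sketch computes the Shannon index from per-taxon read counts. The abstract does not say which alpha diversity metric was used, so the choice of Shannon diversity, the function name and the example counts are all assumptions made for illustration only.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical read counts per taxon for two stool samples (not study data).
even_sample = [25, 25, 25, 25]    # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]     # one dominant taxon

print(shannon_index(even_sample))    # equals ln(4) for four equally abundant taxa
print(shannon_index(skewed_sample))  # lower, because one taxon dominates
```

Per-sample values of this kind could then be compared between carrier and non-carrier groups with a non-parametric test, as the abstract describes.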
Several pathogenic viruses such as hepatitis B and human immunodeficiency viruses may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the low number of expected true virus integrations may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline termed Vy-PER (Virus integration detection bY Paired End Reads) outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by the recently reported virus integrations at genomic rearrangement sites and association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage) and only our method detects singleton human-phiX174-chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.
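The basic idea behind paired-end chimera detection can be sketched as follows: a read pair is a candidate when one mate aligns confidently to the human genome and the other to a viral genome. This is not the Vy-PER filter itself, whose details are not given in the abstract. The `MateAlignment` class, the reference labels and the `min_mapq` threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class MateAlignment:
    reference: str  # hypothetical label: "human", "virus" or "unmapped"
    mapq: int       # mapping quality reported by the aligner

def is_chimera_candidate(mate1, mate2, min_mapq=30):
    """True when both mates map confidently, one to human and one to a virus."""
    both_confident = mate1.mapq >= min_mapq and mate2.mapq >= min_mapq
    return both_confident and {mate1.reference, mate2.reference} == {"human", "virus"}

# Hypothetical read pairs (not study data).
pairs = [
    (MateAlignment("human", 60), MateAlignment("virus", 55)),  # candidate
    (MateAlignment("human", 60), MateAlignment("human", 60)),  # ordinary pair
    (MateAlignment("human", 12), MateAlignment("virus", 58)),  # low-confidence mate
]
candidates = [pair for pair in pairs if is_chimera_candidate(*pair)]
print(len(candidates))
```

A real pipeline would add further filters on top of this first pass, since the abstract stresses that most raw candidates are false positives.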
Determining the prerequisites for healthy aging is a major task in a modern world characterized by longer individual lifespans. Besides lifestyle and environmental influences, genetic factors are involved, as shown by several genome-wide association studies. Older individuals are known to have an impaired immune response, a condition recently termed “inflamm-aging”. We hypothesize that the induction of this condition in the elderly is influenced by the sensitivity of the innate immune system. Therefore, we investigated genetic variants of the Toll-like receptor (TLR) family, one of the major families of innate immune receptors, for association with age in two cohorts of healthy, disease-free subjects.
Stratified by sex, we found a positive association of loss-of-function variants of TLR-1 and −6 with healthy aging, with odds ratios of 1.54 in males for TLR-6 (249 S/S), and 1.41, 1.66, and 1.64 in females for TLR-1 prom., TLR-1 (248 S/S), and TLR-1 (602 S/S), respectively. Thus, the presence of these variants increases the probability of achieving healthy old age and indicates that reduced TLR activity may be beneficial in the elderly.
This is the first report showing an association of TLR variants with age. While a loss of function of an important immune receptor may be a risk factor for acute infections as has been shown previously, in the setting of healthy ageing it appears to be protective, which may relate to “inflamm-aging”. These first results should be reproduced in larger trials to confirm this hypothesis.
Inflamm-aging; Innate immunity; Toll-like receptors; Polymorphisms; Healthy aging
Genetic generalised epilepsy (GGE) is the most common form of genetic epilepsy, accounting for 20% of all epilepsies. Genomic copy number variations (CNVs) constitute important genetic risk factors of common GGE syndromes. In our present genome-wide burden analysis, large (≥ 400 kb) and rare (< 1%) autosomal microdeletions with high calling confidence (≥ 200 markers) were assessed by the Affymetrix SNP 6.0 array in European case-control cohorts of 1,366 GGE patients and 5,234 ancestry-matched controls. We aimed to: 1) assess the microdeletion burden in common GGE syndromes, 2) estimate the relative contribution of recurrent microdeletions at genomic rearrangement hotspots and non-recurrent microdeletions, and 3) identify potential candidate genes for GGE. We found a significant excess of microdeletions in 7.3% of GGE patients compared to 4.0% in controls (P = 1.8 × 10−7; OR = 1.9). Recurrent microdeletions at seven known genomic hotspots accounted for 36.9% of all microdeletions identified in the GGE cohort and showed a 7.5-fold increased burden (P = 2.6 × 10−17) relative to controls. Microdeletions affecting either a gene previously implicated in neurodevelopmental disorders (P = 8.0 × 10−18, OR = 4.6) or an evolutionarily conserved brain-expressed gene related to autism spectrum disorder (P = 1.3 × 10−12, OR = 4.1) were significantly enriched in the GGE patients. Microdeletions found only in GGE patients harboured a high proportion of genes previously associated with epilepsy and neuropsychiatric disorders (NRXN1, RBFOX1, PCDH7, KCNA2, EPM2A, RORB, PLCB1). Our results demonstrate that the significantly increased burden of large and rare microdeletions in GGE patients is largely confined to recurrent hotspot microdeletions and microdeletions affecting neurodevelopmental genes, suggesting a strong impact of fundamental neurodevelopmental processes in the pathogenesis of common GGE syndromes.
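The reported overall odds ratio can be checked from the two carrier frequencies given above. The sketch below uses only the rounded percentages from the abstract (7.3% and 4.0%), so it reproduces the odds ratio to one decimal place but not the P value. The function name is an illustrative choice.

```python
def odds_ratio(p_cases, p_controls):
    """Odds ratio from the carrier proportion in cases and in controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls

# Rounded carrier frequencies taken from the abstract.
or_microdeletions = odds_ratio(0.073, 0.040)
print(round(or_microdeletions, 1))  # 1.9
```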
Epilepsy affects about 4% of the general population during their lifetime. The genetic generalised epilepsies (GGEs) represent the most common group of epilepsies with predominant genetic aetiology, accounting for 20% of all epilepsies. Despite their strong heritability, the genetic basis of the majority of patients with GGE remains elusive. Genomic microdeletions constitute a significant source of genetic risk factors for epilepsies. The present genome-wide burden analysis in 1,366 European patients with GGE and 5,234 ancestry-matched controls explored the role of large and rare microdeletions (size ≥ 400 kb, frequency < 1%) in the complex genetic architecture of common GGE syndromes. Our results revealed 1) a 2-fold excess of microdeletions in GGE patients relative to the population controls, 2) a 7-fold increased burden for known hotspot microdeletions (15q11.2, 15q13.3, 16p13.11, 22q11.2) previously associated with a wide range of neurodevelopmental disorders, and 3) a more than 4-fold enrichment of microdeletions carrying a gene implicated in neurodevelopmental disorders. Our findings reinforce emerging evidence that genes affected by microdeletions in GGE patients have a strong impact on fundamental neurodevelopmental processes and dissect novel candidate genes involved in epileptogenesis.
Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here we present the largest trans-ethnic genome-wide meta-analysis (GWMA) of psoriasis in 15,369 cases and 19,517 controls of Caucasian and Chinese ancestries. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. Fine-mapping analysis of the MHC region demonstrates an important role for all three HLA class I genes and a complex and heterogeneous pattern of HLA associations between Caucasian and Chinese populations. Further, trans-ethnic comparison suggests population-specific effects or allelic heterogeneity for 11 loci. These population-specific effects contribute significantly to the ethnic diversity of psoriasis prevalence. This study not only provides novel biological insights into the involvement of immune and keratinocyte development mechanisms, but also demonstrates a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations.
Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here Yin et al. conduct a large trans-ethnic genome-wide meta-analysis and identify novel loci that contribute to population-specific susceptibility.
Epidemiological studies suggest a relationship between blood lipids and immune-mediated diseases, but the nature of these associations is not well understood. We used genome-wide association studies (GWAS) to investigate shared single nucleotide polymorphisms (SNPs) between blood lipids and immune-mediated diseases. We analyzed data from GWAS (n~200,000 individuals), applying new False Discovery Rate (FDR) methods, to investigate genetic overlap between blood lipid levels [triglycerides (TG), low density lipoproteins (LDL), high density lipoproteins (HDL)] and a selection of archetypal immune-mediated diseases (Crohn’s disease, ulcerative colitis, rheumatoid arthritis, type 1 diabetes, celiac disease, psoriasis and sarcoidosis). We found significant polygenic pleiotropy between the blood lipids and all the investigated immune-mediated diseases. We discovered several shared risk loci between the immune-mediated diseases and TG (n = 88), LDL (n = 87) and HDL (n = 52). Three-way analyses differentiated the pattern of pleiotropy among the immune-mediated diseases. The new pleiotropic loci increased the number of functional gene network nodes representing blood lipid loci by 40%. Pathway analyses implicated several novel shared mechanisms for immune pathogenesis and lipid biology, including glycosphingolipid synthesis (e.g. FUT2) and intestinal host-microbe interactions (e.g. ATG16L1). We demonstrate a shared genetic basis for blood lipids and immune-mediated diseases independent of environmental factors. Our findings provide novel mechanistic insights into dyslipidemia and immune-mediated diseases and may have implications for therapeutic trials involving lipid-lowering and anti-inflammatory agents.
Recent evidence suggests that natural selection operating on hosts to maintain their microbiome contributes to the emergence of new species, that is, the ‘hologenomic basis of speciation’. Here we analyse the gut microbiota of two house mice subspecies, Mus musculus musculus and M. m. domesticus, across their Central European hybrid zone, in addition to hybrids generated in the lab. Hybrid mice display widespread transgressive phenotypes (that is, exceed or fall short of parental values) in a variety of measures of bacterial community structure, which reveals the importance of stabilizing selection operating on the intestinal microbiome within species. Further genetic and immunological analyses reveal genetic incompatibilities, aberrant immune gene expression and increased intestinal pathology associated with altered community structure among hybrids. These results provide unique insight into the consequences of evolutionary divergence in a vertebrate ‘hologenome’, which may be an unrecognized contributing factor to reproductive isolation in this taxonomic group.
Animal hosts and their associated microbes are largely the outcome of coevolution. Here, the authors show differences in the intestinal microbiome of hybrids compared with pure species of house mice, which suggests that host–microbiome interactions contribute to the evolution of host species.
Glioblastoma multiforme (GBM) is the most aggressive and malignant subtype of human brain tumors. While familial clustering of GBM has long been acknowledged, the relevant hereditary factors remain elusive. Exome sequencing of families offers the option to discover such genetic factors.
We sequenced blood samples of one of the rare affected families: while both parents were healthy, both children were diagnosed with GBM. We report 85 homozygous non-synonymous single nucleotide variations (SNVs) in both siblings that were heterozygous in the parents. Beyond known key players for GBM such as ERBB2, PMS2, or CHI3L1, we identified over 50 genes that have not been associated with GBM so far. We also discovered three accumulative effects potentially adding to the tumorigenesis in the siblings: a clustering of multiple variants in single genes (e.g. PTPRB, CROCC), the aggregation of affected genes on specific molecular pathways (e.g. Focal adhesion or ECM receptor interaction) and genomic proximity (e.g. chr22.q12.2, chr1.p36.33). We found a striking accumulation of SNVs in specific genes for the daughter, who developed not only a GBM at the age of 12 years but was subsequently diagnosed with a pilocytic astrocytoma, a common acute lymphatic leukemia and a diffuse pontine glioma.
The reported variants underline the relevance of genetic predisposition and cancer development in this family and demonstrate that GBM has a complex and heterogeneous genetic background. Sequencing of other affected families will help to further narrow down the driving genetic causes for this disease.
glioblastoma multiforme; next generation sequencing; bioinformatics
Primary sclerosing cholangitis (PSC) is a chronic bile duct disease affecting 2.4–7.5% of individuals with inflammatory bowel disease. We performed a genome-wide association analysis of 2,466,182 SNPs in 715 individuals with PSC and 2,962 controls, followed by replication in 1,025 PSC cases and 2,174 controls. We detected non-HLA associations at rs3197999 in MST1 and rs6720394 near BCL2L11 (combined P = 1.1 × 10−16 and P = 4.1 × 10−8, respectively).
The human leukocyte antigen (HLA) complex contains the most polymorphic genes in the human genome. The classical HLA class I and II genes define the specificity of adaptive immune responses. Genetic variation at the HLA genes is associated with susceptibility to autoimmune and infectious diseases and plays a major role in transplantation medicine and immunology. Currently, the HLA genes are characterized using Sanger- or next-generation sequencing (NGS) of a limited amplicon repertoire or labeled oligonucleotides for allele-specific sequences. High-quality NGS-based methods are in proprietary use and not publicly available. Here, we introduce the first highly automated open-kit/open-source HLA-typing method for NGS. The method employs in-solution targeted capturing of the classical class I (HLA-A, HLA-B, HLA-C) and class II HLA genes (HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). The calling algorithm allows for highly confident allele-calling to three-field resolution (cDNA nucleotide variants). The method was validated on 357 commercially available DNA samples with known HLA alleles obtained by classical typing. Our results showed on average an accurate allele call rate of 0.99 in a fully automated manner, identifying also errors in the reference data. Finally, our method provides the flexibility to add further enrichment target regions.
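To make the notion of field resolution concrete, the sketch below parses an HLA allele name in the standard colon-delimited nomenclature and truncates it to a requested number of fields. It is not part of the typing method described above. The function name, the regular expression and the decision to drop any trailing expression suffix are assumptions made for illustration.

```python
import re

# Gene name, "*", one to four colon-separated numeric fields, optional suffix letter.
_HLA_PATTERN = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){0,3})([A-Z]?)$")

def truncate_allele(name, fields):
    """Return the allele name reduced to at most `fields` fields (suffix dropped)."""
    match = _HLA_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognisable HLA allele name: {name!r}")
    gene, digits, _suffix = match.groups()
    kept = digits.split(":")[:fields]
    return f"HLA-{gene}*{':'.join(kept)}"

print(truncate_allele("HLA-DRB1*01:03:01", 3))  # HLA-DRB1*01:03:01
print(truncate_allele("HLA-DRB1*01:03:01", 2))  # HLA-DRB1*01:03
```

Comparing calls at a common resolution in this way is one plausible approach when reference typings and NGS-based calls were reported at different field depths.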
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant QT interval loci that collectively explain ∼8-10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
T-cell immunoglobulin domain and mucin domain-3 (TIM-3, also known as HAVCR2) is an activation-induced inhibitory molecule involved in tolerance and shown to induce T-cell exhaustion in chronic viral infection and cancers1–5. Under some conditions, TIM-3 expression has also been shown to be stimulatory. Considering that TIM-3, like cytotoxic T lymphocyte antigen 4 (CTLA-4) and programmed death 1 (PD-1), is being targeted for cancer immunotherapy, it is important to identify the circumstances under which TIM-3 can inhibit and activate T-cell responses. Here we show that TIM-3 is co-expressed and forms a heterodimer with carcinoembryonic antigen cell adhesion molecule 1 (CEACAM1), another well-known molecule expressed on activated T cells and involved in T-cell inhibition6–10. Biochemical, biophysical and X-ray crystallography studies show that the membrane-distal immunoglobulin-variable (IgV)-like amino-terminal domain of each is crucial to these interactions. The presence of CEACAM1 endows TIM-3 with inhibitory function. CEACAM1 facilitates the maturation and cell surface expression of TIM-3 by forming a heterodimeric interaction in cis through the highly related membrane-distal N-terminal domains of each molecule. CEACAM1 and TIM-3 also bind in trans through their N-terminal domains. Both cis and trans interactions between CEACAM1 and TIM-3 determine the tolerance-inducing function of TIM-3. In a mouse adoptive transfer colitis model, CEACAM1-deficient T cells are hyper-inflammatory with reduced cell surface expression of TIM-3 and regulatory cytokines, and this is restored by T-cell-specific CEACAM1 expression. During chronic viral infection and in a tumour environment, CEACAM1 and TIM-3 mark exhausted T cells. Co-blockade of CEACAM1 and TIM-3 leads to enhancement of anti-tumour immune responses with improved elimination of tumours in mouse colorectal cancer models. 
Thus, CEACAM1 serves as a heterophilic ligand for TIM-3 that is required for its ability to mediate T-cell inhibition, and this interaction has a crucial role in regulating autoimmunity and anti-tumour immunity.
Genetic variants within the major histocompatibility complex (MHC) represent the strongest genetic susceptibility factors for primary sclerosing cholangitis (PSC). Identifying the causal variants within this genetic complex represents a major challenge due to strong linkage disequilibrium and an overall high physical density of candidate variants. We aimed to refine the MHC association in a geographically restricted PSC patient panel.
A total of 365 PSC cases and 368 healthy controls of Scandinavian ancestry were included in the study. We incorporated data from HLA typing (HLA-A, -B, -C, -DRB3, -DRB1, -DQB1) and single nucleotide polymorphisms across the MHC (n = 18,644; genotyped and imputed) alongside previously suggested PSC risk determinants in the MHC, i.e. amino acid variation of DRβ, a MICA microsatellite polymorphism and HLA-C and HLA-B according to their ligand properties for killer immunoglobulin-like receptors. Breakdowns of the association signal by unconditional and conditional logistic regression analyses demarcated multiple PSC associated MHC haplotypes, and for eight of these classical HLA class I and II alleles represented the strongest association. A novel independent risk locus was detected near NOTCH4 in the HLA class III region, tagged by rs116212904 (odds ratio [95% confidence interval] = 2.32 [1.80, 3.00], P = 1.35×10−11).
Our study shows that classical HLA class I and II alleles, predominantly at HLA-B and HLA-DRB1, are the main risk factors for PSC in the MHC. In addition, the present assessments demonstrated for the first time an association near NOTCH4 in the HLA class III region.
miRNA profiles are promising biomarker candidates for a manifold of human pathologies, opening new avenues for diagnosis and prognosis. Beyond studies that describe miRNAs frequently as markers for specific traits, we asked whether a general pattern for miRNAs across many diseases exists.
We evaluated genome-wide circulating profiles of 1,049 patients suffering from 19 different cancer and non-cancer diseases as well as unaffected controls. The results were validated on 319 individuals using qRT-PCR.
We discovered 34 miRNAs with strong disease association. Among those, we found substantially decreased levels of hsa-miR-144* and hsa-miR-20b with AUC of 0.751 (95% CI: 0.703–0.799), respectively. We also discovered a set of miRNAs, including hsa-miR-155*, as rather stable markers, offering reasonable control miRNAs for future studies. The strong downregulation of hsa-miR-144* and the less variable pattern of hsa-miR-155* were validated in a cohort of 319 samples in three different centers. Here, breast cancer, a disease phenotype not included in the screening phase, was added as the 20th trait.
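For readers unfamiliar with the AUC figure quoted above, the following sketch shows one standard way to compute it: as the fraction of patient-control pairs in which the patient has the more disease-like score, with ties counted as half. The expression values are hypothetical and the function name is an illustrative choice. Because the marker is decreased in patients, expression is negated so that a higher score means more disease-like.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical normalised expression values (not study data).
patients = [2.1, 3.0, 2.4, 3.6]
controls = [3.2, 3.9, 2.9, 4.1]

# Negate expression because lower levels indicate disease for this marker.
marker_auc = auc([-x for x in patients], [-x for x in controls])
print(marker_auc)  # 0.8125
```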
Our study on 1,368 patients including 1,049 genome-wide miRNA profiles and 319 qRT-PCR validations further underscores the high potential of specific blood-borne miRNA patterns as molecular biomarkers. Importantly, we highlight 34 miRNAs that are generally dysregulated in human pathologies. Although these markers are not specific to certain diseases they may add to the diagnosis in combination with other markers, building a specific signature. Besides these dysregulated miRNAs, we propose a set of constant miRNAs that may be used as control markers.
Electronic supplementary material
The online version of this article (doi:10.1186/s12916-014-0224-0) contains supplementary material, which is available to authorized users.
Bioinformatics; Biomarker; Microarray; miRNA
To perform a genome-wide association study (GWAS) using the Immunochip array in 3,420 cases of ischemic stroke and 6,821 controls, followed by a meta-analysis with data from more than 14,000 additional ischemic stroke cases.
Using the Immunochip, we genotyped 3,420 ischemic stroke cases and 6,821 controls. After imputation we meta-analyzed the results with imputed GWAS data from 3,548 cases and 5,972 controls recruited from the ischemic stroke WTCCC2 study, and with summary statistics from a further 8,480 cases and 56,032 controls in the METASTROKE consortium. A final in silico “look-up” of 2 single nucleotide polymorphisms in 2,522 cases and 1,899 controls was performed. Associations were also examined in 1,088 cases with intracerebral hemorrhage and 1,102 controls.
In an overall analysis of 17,970 cases of ischemic stroke and 70,764 controls, we identified a novel association on chromosome 12q24 (rs10744777, odds ratio [OR] 1.10 [1.07–1.13], p = 7.12 × 10−11) with ischemic stroke. The association was with all ischemic stroke rather than an individual stroke subtype, with similar effect sizes seen in different stroke subtypes. There was no association with intracerebral hemorrhage (OR 1.03 [0.90–1.17], p = 0.695).
Our results show, for the first time, a genetic risk locus associated with ischemic stroke as a whole, rather than in a subtype-specific manner. This finding was not associated with intracerebral hemorrhage.
Background & Aims
Genome-wide association studies (GWASs) have identified 140 Crohn’s disease (CD) susceptibility loci. For most loci, the variants that cause disease are not known and the genes affected by these variants have not been identified. We aimed to identify variants that cause CD through detailed sequencing, genetic association, expression, and functional studies.
We sequenced whole exomes of 42 unrelated subjects with Crohn’s disease (CD) and 5 healthy individuals (controls), and then filtered single-nucleotide variants by incorporating association results from meta-analyses of CD GWASs and in silico mutation effect prediction algorithms. We then genotyped 9348 patients with CD, 2868 with ulcerative colitis, and 14,567 controls, and analyzed associated variants in functional studies using materials from patients and controls and in vitro model systems.
We identified rare missense mutations in PR domain-containing 1 (PRDM1) and associated these with CD. These mutations increased proliferation of T cells and secretion of cytokines upon activation, and increased expression of the adhesion molecule L-selectin. A common CD risk allele, identified in GWASs, correlated with reduced expression of PRDM1 in ileal biopsies and peripheral blood mononuclear cells (combined P=1.6×10−8). We identified an association between CD and a common missense variant, Val248Ala, in nuclear domain 10 protein 52 (NDP52) (P=4.83×10−9). We found that this variant impairs the regulatory functions of NDP52 to inhibit NFκB activation of genes that regulate inflammation and affect stability of proteins in toll-like receptor pathways.
We have extended GWAS results and provide evidence that variants in PRDM1 and NDP52 determine susceptibility to CD. PRDM1 maps adjacent to a CD interval identified in GWASs and encodes a transcription factor expressed by T and B cells. NDP52 is an adaptor protein that functions in selective autophagy of intracellular bacteria and signaling molecules, supporting the role for autophagy in pathogenesis of CD.
inflammatory bowel disease; whole-exome sequencing; complex disease
To advance understanding of the complex genetics of Crohn disease (CD) we sequenced 42 whole exomes of patients with CD and five healthy control individuals, resulting in identification of a missense mutation in the autophagy receptor calcium binding and coiled-coil domain 2 (CALCOCO2/NDP52) gene. Protein domain modeling and functional studies highlight the potential role of this mutation in controlling NFKB signaling downstream of toll-like receptor (TLR) pathways. We summarize our recent findings and discuss the role of autophagy as a major modulator of proinflammatory signaling in the context of chronic inflammation.
Crohn disease; autophagy; CALCOCO2; NDP52; inflammation; NF-kappaB; toll-like receptor; adaptophagy
Genome-wide association studies (GWAS) are applied to identify genetic loci that are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks of genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing are applied.
We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluated them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard, permuting the case–control status, effectively improves specificity of GWAS pathway analysis. Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case–control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates on the required number of permutations for the investigated approaches.
To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the hypergeometric test, permutation tests of case–control data should be carried out. Case–control permutation was the most reasonable approach; if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations. By applying the described statistical framework, we were able to discover axon guidance, focal adhesion and calcium signaling as important DCM-related pathways and Intestinal immune network for IgA production as the most significant UC pathway.
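The two ingredients named above can be sketched in a few lines: an upper-tail hypergeometric test for over-representation, and an empirical P value derived from statistics obtained under permutation. This is a minimal illustration, not the authors' framework. The function names and example numbers are hypothetical, the add-one estimator is one common convention that the abstract does not specify, and the step that generates permuted statistics (re-running the association analysis with shuffled case–control labels) is not shown.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """P(X >= k): k or more pathway genes among n selected genes,
    when K of the N background genes belong to the pathway."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def empirical_pvalue(observed, permuted):
    """Add-one permutation P value: share of permuted statistics >= observed."""
    exceed = sum(1 for stat in permuted if stat >= observed)
    return (1 + exceed) / (1 + len(permuted))

# Hypothetical toy numbers (not study data).
print(hypergeom_pvalue(3, 3, 3, 10))       # 1/120: all three selected genes in the pathway
print(empirical_pvalue(5, [1, 2, 6, 3]))   # 0.4
```

With the add-one estimator the smallest attainable P value is 1 / (number of permutations + 1), which is one reason the number of permutations matters for the significance thresholds discussed above.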
DCM; UC; GWAS; Permutation tests; Pathway analysis
Crohn’s disease (CD) is an inflammatory bowel disease caused by genetic and environmental factors. More than 160 susceptibility loci have been identified for IBD, yet a large part of the genetic variance remains unexplained. Recent studies have demonstrated genetic differences between monozygotic twins, who were long thought to be genetically completely identical.
We aimed to test if somatic mutations play a role in CD etiology by sequencing the genomes and exomes of directly affected tissue from the bowel and blood samples of one and the blood-derived exomes of two further monozygotic discordant twin pairs. Our goal was the identification of mutations present only in the affected twins, pointing to novel candidates for CD susceptibility loci. We present a thorough genetic characterization of the sequenced individuals but detected no consistent differences within the twin pairs. An estimate of the CD susceptibility based on known CD loci however hinted at a higher mutational load in all three twin pairs compared to 1,920 healthy individuals.
Somatic mosaicism does not seem to play a role in the discordance of monozygotic CD twins. Our study constitutes the first to perform whole genome sequencing for CD twins and therefore provides a valuable reference dataset for future studies. We present an example framework for mosaicism detection and point to the challenges in these types of analyses.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-564) contains supplementary material, which is available to authorized users.
Crohn’s disease; Discordant monozygotic twins; Somatic mosaicism; Whole genome sequencing; Exome sequencing
We have previously identified tagSNPs at 8q24.21 influencing glioma risk. We have sought to fine-map the location of the functional basis of this association using data from four genome-wide association studies, comprising a total of 4147 glioma cases and 7435 controls. To improve marker density across the 700 kb region, we imputed genotypes using 1000 Genomes Project data and high-coverage sequencing data generated on 253 individuals. Analysis revealed an imputed low-frequency SNP rs55705857 (P = 2.24 × 10^−38) which was sufficient to fully capture the 8q24.21 association. Analysis by glioma subtype showed the association with rs55705857 confined to non-glioblastoma multiforme (non-GBM) tumours (P = 1.07 × 10^−67). Validation of the non-GBM association was shown in three additional datasets (625 non-GBM cases, 2412 controls; P = 1.41 × 10^−28). In the pooled analysis, the odds ratio for low-grade glioma associated with rs55705857 was 4.3 (P = 2.31 × 10^−94). rs55705857 maps to a highly evolutionarily conserved sequence within the long non-coding RNA CCDC26 raising the possibility of direct functionality. These data provide additional insights into the aetiological basis of glioma development.
Annual fish of the genus Nothobranchius show large variations in lifespan and expression of age-related phenotypes between closely related populations. We studied N. kadleci and its sister species N. furzeri (GRZ strain), and found that N. kadleci is longer-lived than N. furzeri. Lipofuscin and apoptosis measured in the liver increased with age in N. kadleci with different profiles: lipofuscin increased linearly, while apoptosis declined in the oldest animals. More lipofuscin (P < 0.001) and apoptosis (P < 0.001) were observed in N. furzeri than in N. kadleci at 16 weeks of age. Lipofuscin and apoptotic cells were then quantified in hybrids from the mating of N. furzeri to N. kadleci. F1 individuals showed heterosis for lipofuscin but additive effects for apoptosis. These two age-related phenotypes were not correlated in F2 hybrids. Quantitative trait loci analysis of 287 F2 fish using 237 markers identified two QTL accounting for 10% of lipofuscin variance (P < 0.001) with an overdominance effect. Analysis of apoptotic cells revealed three significant and two suggestive QTL explaining 19% of variance (P < 0.001), showing additive and dominance effects, and two interacting loci. Our results show that lipofuscin and apoptosis are markers of different age-dependent biological processes controlled by different genetic mechanisms.
Nothobranchius; lifespan; lipofuscin; apoptosis; quantitative trait loci; aging
Next Generation Sequencing (NGS) of whole exomes or genomes is increasingly being used in human genetic research and diagnostics. Sharing NGS data with third parties can help physicians and researchers to identify causative or predisposing mutations for a specific sample of interest more efficiently. In many cases, however, the exchange of such data may collide with data privacy regulations. GrabBlur is a newly developed tool to aggregate and share NGS-derived single nucleotide variant (SNV) data in a public database, keeping individual samples unidentifiable. In contrast to other currently existing SNV databases, GrabBlur includes phenotypic information and contact details of the submitter of a given database entry. By means of GrabBlur, human geneticists can securely and easily share SNV data from resequencing projects. GrabBlur can ease the interpretation of SNV data by offering basic annotations, genotype frequencies and, in particular, phenotypic information (provided that this information was shared) for the SNV of interest.
GrabBlur facilitates the combination of phenotypic and NGS data (VCF files) via a local interface or command line operations. Data submissions may include HPO (Human Phenotype Ontology) terms, other trait descriptions, NGS technology information and the identity of the submitter. Most of this information is optional and its provision is at the discretion of the submitter. Upon initial intake, GrabBlur merges and aggregates all sample-specific data. If a certain SNV is rare, the sample-specific information is replaced with the submitter identity. Generally, all data in GrabBlur are highly aggregated so that they can be shared with others while ensuring maximum privacy. Thus, it is impossible to reconstruct complete exomes or genomes from the database or to re-identify single individuals. After the individual information has been sufficiently "blurred", the data can be uploaded into a publicly accessible domain where aggregated genotypes are provided alongside phenotypic information. A web interface allows querying the database and the extraction of gene-wise SNV information. If an interesting SNV is found, the interrogator can contact the submitter to exchange further information on the carrier and clarify, for example, whether the carrier's phenotype matches that of their own patient.
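The aggregate-then-blur idea described above can be sketched in a few lines. This is not GrabBlur's actual code or data model: the function name, the record layout, the rarity threshold of three carriers and the placeholder phenotype labels are all hypothetical, chosen only to make the principle concrete.

```python
from collections import Counter, defaultdict

# Hypothetical cut-off: SNVs seen in fewer samples than this count as rare.
RARE_CARRIER_THRESHOLD = 3


def blur_and_aggregate(records, submitter, min_carriers=RARE_CARRIER_THRESHOLD):
    """Aggregate per-sample SNV records into per-SNV summaries.

    `records` is an iterable of (snv_id, genotype, phenotype_terms) tuples,
    one per carrier sample. Sample identities are never stored. For rare
    SNVs the phenotype terms are withheld and only a submitter contact is
    kept, mirroring the behaviour described in the text.
    """
    genotype_counts = defaultdict(Counter)
    phenotype_counts = defaultdict(Counter)
    for snv_id, genotype, phenotype_terms in records:
        genotype_counts[snv_id][genotype] += 1
        phenotype_counts[snv_id].update(phenotype_terms)

    summary = {}
    for snv_id, counts in genotype_counts.items():
        entry = {"genotypes": dict(counts)}
        if sum(counts.values()) < min_carriers:
            entry["contact"] = submitter  # rare: point to the submitter
        else:
            entry["phenotypes"] = dict(phenotype_counts[snv_id])
        summary[snv_id] = entry
    return summary


# Hypothetical toy submission with placeholder variant and phenotype labels.
toy_records = [
    ("snv_A", "0/1", ["term_x"]),
    ("snv_A", "0/1", ["term_x"]),
    ("snv_A", "1/1", ["term_y"]),
    ("snv_B", "0/1", ["term_z"]),
]
toy_summary = blur_and_aggregate(toy_records, "submitter@example.org")
```

In this toy run, `snv_A` is shared with aggregated genotype and phenotype counts, while `snv_B`, seen in a single sample, exposes only its genotype count and the contact address.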
Heritability estimates for body mass index (BMI) variation are high. For mothers and their offspring higher BMI correlations have been described than for fathers. Variation(s) in the exclusively maternally inherited mitochondrial DNA (mtDNA) might contribute to this parental effect. Thirty-two to 40 mtDNA single nucleotide polymorphisms (SNPs) were available from genome-wide association study SNP arrays (Affymetrix 6.0). For discovery, we analyzed association in a case-control (CC) sample of 1,158 extremely obese children and adolescents and 435 lean adult controls. For independent confirmation, 7,014 population-based adults were analyzed as a CC sample of n = 1,697 obese cases (BMI ≥ 30 kg/m²) and n = 2,373 normal weight and lean controls (BMI < 25 kg/m²). SNPs were analyzed as single SNPs and as haplogroups determined by HaploGrep. Fisher's two-sided exact test was used for association testing. Moreover, the D-loop was re-sequenced (Sanger) in 192 extremely obese children and adolescents and 192 lean adult controls. Association testing of detected variants was performed using Fisher's two-sided exact test. For discovery, nominal association with obesity was found for the frequent allele G of m.8994G/A (rs28358887, p = 0.002) located in ATP6. Haplogroup W was nominally overrepresented in the controls (p = 0.039). These findings could not be confirmed independently. For two of the 252 identified D-loop variants nominal association was detected (m.16292C/T, p = 0.007; m.16189T/C, p = 0.048). Only eight controls carried the m.16292T allele, five of whom belonged to haplogroup W, which was initially enriched among these controls. m.16189T/C might create an uninterrupted poly-C tract located near a regulatory element involved in replication of mtDNA.
Though follow-up of some D-loop variants is still conceivable, our hypothesis of a contribution of variation in the exclusively maternally inherited mtDNA to the observed larger correlations for BMI between mothers and their offspring could not be substantiated by the findings of the present study.
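Fisher's two-sided exact test, as used above for allele and haplogroup association, can be written out for a 2×2 table with the standard library alone. The sketch below is an illustration, not the authors' analysis code; it uses the common convention of summing all tables that are no more probable than the observed one, and the example counts are invented for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns allele carriers/non-carriers.
    The p-value sums the hypergeometric probabilities of every table with
    the same margins that is at most as probable as the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    tolerance = 1 + 1e-9  # guard against floating-point ties
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Invented toy counts (not taken from the study):
# 3 of 3 cases and 0 of 3 controls carry the allele.
toy_fisher_p = fisher_exact_two_sided(3, 0, 0, 3)
```

With such small invented counts the test cannot reach conventional significance, which illustrates why low carrier numbers, such as the eight allele carriers mentioned above, limit what a single association test can establish.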