Genetic factors have been estimated to account for about 25% of the variation in an adult's life span. The complement component C4 with the isotypes C4A and C4B is an effector protein of the immune system, and differences in the overall C4 copy number or gene size (long C4L; short C4S) may influence the strength of the immune response and disease susceptibilities. Previously, an association between C4B copy number and life span was reported for Hungarians and Icelanders, where the C4B*Q0 genotype, which is defined by C4B gene deficiency, showed a decrease in frequency with age. Additionally, one of the studies indicated that a low C4B copy number might be a genetic trait that is manifested only in the presence of the environmental risk factor “smoking”. These observations prompted us to investigate the role of the C4 alleles in our large German longevity sample (∼700 cases; 94–110 years and ∼900 younger controls). No significant differences in the number of C4A, C4B and C4S were detected. Besides, the C4B*Q0 carrier state did not decrease with age, irrespective of smoking as an interacting variable. However, for C4L*Q0 a significantly different carrier frequency was observed in the cases compared with controls (cases: 5.08%; controls: 9.12%; p = 0.003). In a replication sample of 714 German cases (91–108 years) and 890 controls this result was not replicated (p = 0.14) although a similar trend of decreased C4L*Q0 carrier frequency in cases was visible (cases: 7.84%; controls: 10.00%).
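As a rough illustration of the replication comparison above, the carrier frequencies can be turned into a 2×2 table and tested with a Pearson chi-square test. This is a minimal sketch: the carrier counts are back-calculated from the reported percentages and sample sizes (an assumption; the abstract does not give counts or name the test used), and the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f., no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 d.f.: erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Assumed counts, reconstructed from the reported percentages:
# 7.84% of 714 cases is about 56 carriers; 10.00% of 890 controls is 89 carriers.
case_carriers, case_total = 56, 714
ctrl_carriers, ctrl_total = 89, 890

chi2, p_value = chi_square_2x2(
    case_carriers, case_total - case_carriers,
    ctrl_carriers, ctrl_total - ctrl_carriers,
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumed counts the test gives a non-significant p-value of the same order as the reported p = 0.14.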
To further characterize the genetic basis of primary biliary cirrhosis (PBC), we genotyped 2426 PBC patients and 5731 unaffected controls from three independent cohorts using a single nucleotide polymorphism (SNP) array (Immunochip) enriched for autoimmune disease risk loci. Meta-analysis of the genotype data sets identified a novel disease-associated locus near the TNFSF11 gene at 13q14, provided evidence for association at six additional immune-related loci not previously implicated in PBC and confirmed associations at 19 of 22 established risk loci. Results of conditional analyses also provided evidence for multiple independent association signals at four risk loci, with haplotype analyses suggesting independent SNP effects at the 2q32 and 16p13 loci, but complex haplotype driven effects at the 3q25 and 6p21 loci. By imputing classical HLA alleles from this data set, four class II alleles independently contributing to the association signal from this region were identified. Imputation of genotypes at the non-HLA loci also provided additional associations, but none with stronger effects than the genotyped variants. An epistatic interaction between the IL12RB2 risk locus at 1p31 and the IRF5 risk locus at 7q32 was also identified and suggests a complementary effect of these loci in predisposing to disease. These data expand the repertoire of genes with potential roles in PBC pathogenesis that need to be explored by follow-up biological studies.
DMBT1 is an antibacterial pattern recognition and scavenger receptor. In this study, we analyzed the role of DMBT1 single nucleotide polymorphisms (SNPs) regarding inflammatory bowel disease (IBD) susceptibility and examined their functional impact on transcription factor binding and downstream gene expression.
Seven SNPs in the DMBT1 gene region were analyzed in 2073 individuals including 818 Crohn’s disease (CD) patients and 972 healthy controls in two independent case-control panels. Comprehensive epistasis analyses for the known CD susceptibility genes NOD2, IL23R and IL27 were performed. The influence of IL23R variants on DMBT1 expression was analyzed. Functional analysis included siRNA transfection, quantitative PCR, western blot, electrophoretic mobility shift and luciferase assays.
IL-22 induces DMBT1 protein expression in intestinal epithelial cells dependent on STAT3, ATF-2 and CREB1. IL-22 expression-modulating, CD risk-associated IL23R variants influence DMBT1 expression in CD patients and DMBT1 levels are increased in the inflamed intestinal mucosa of CD patients. Several DMBT1 SNPs were associated with CD susceptibility. SNP rs2981804 was most strongly associated with CD in the combined panel (p = 3.0×10−7, OR 1.42; 95% CI 1.24–1.63). All haplotype groups tested showed highly significant associations with CD (including omnibus P-values as low as 6.1×10−18). The most strongly CD risk-associated, non-coding DMBT1 SNP rs2981804 modifies the DNA binding sites for the transcription factors CREB1 and ATF-2 and the respective genomic region comprising rs2981804 is able to act as a transcriptional regulator in vitro. Intestinal DMBT1 expression is decreased in CD patients carrying the rs2981804 CD risk allele.
We identified novel associations of DMBT1 variants with CD susceptibility and discovered a novel functional role of rs2981804 in regulating DMBT1 expression. Our data suggest an important role of DMBT1 in CD pathogenesis.
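The odds ratio and confidence interval reported for rs2981804 (OR 1.42; 95% CI 1.24–1.63) can be related to a p-value through the usual Wald approximation on the log-odds scale. The sketch below assumes a symmetric Wald interval, which the abstract does not state; because the inputs are rounded, the result should only be expected to agree with the reported p-value in order of magnitude. The function name is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Two-sided Wald p-value implied by an odds ratio and its 95% CI."""
    # A Wald interval on the log scale is log(OR) +/- z_crit * SE,
    # so the interval width gives the standard error.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided normal tail probability.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

se, z, p = p_from_or_ci(1.42, 1.24, 1.63)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.1e}")
```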
Genome-wide association studies and follow-up meta-analyses in Crohn's disease (CD) and ulcerative colitis (UC) have recently identified 163 disease-associated loci that meet genome-wide significance for these two inflammatory bowel diseases (IBD). These discoveries have already had a tremendous impact on our understanding of the genetic architecture of these diseases and have directed functional studies that have revealed some of the biological functions that are important to IBD (e.g. autophagy). Nonetheless, these loci can only explain a small proportion of disease variance (∼14% in CD and 7.5% in UC), suggesting that not only are additional loci to be found but that the known loci may contain high effect rare risk variants that have gone undetected by GWAS. To test this, we have used a targeted sequencing approach in 200 UC cases and 150 healthy controls (HC), all of French Canadian descent, to study 55 genes in regions associated with UC. We performed follow-up genotyping of 42 rare non-synonymous variants in independent case-control cohorts (totaling 14,435 UC cases and 20,204 HC). Our results confirmed significant association to rare non-synonymous coding variants in both IL23R and CARD9, previously identified from sequencing of CD loci, and identified a novel association in RNF186. With the exception of CARD9 (OR = 0.39), the rare non-synonymous variants identified were of moderate effect (OR = 1.49 for RNF186 and OR = 0.79 for IL23R). RNF186 encodes a protein with a RING domain having predicted E3 ubiquitin-protein ligase activity and two transmembrane domains. Importantly, the disease-associated coding variant is located in the ubiquitin ligase domain. Finally, our results suggest that rare variants in genes identified by genome-wide association in UC are unlikely to contribute significantly to the overall variance for the disease. Rather, these are expected to help focus functional studies of the corresponding disease loci.
Genetic studies of common diseases have seen tremendous progress in the last half-decade primarily due to recent technologies that enable a systematic examination of genetic markers across the entire genome in large numbers of patients and healthy controls. The studies, while identifying genomic regions that influence a person's risk for developing disease, often do not pinpoint the actual gene or gene variants that account for this risk (called a causal gene/variant). A prime example of this can be seen with the 163 genetic risk factors that have recently been associated with the chronic inflammatory bowel diseases known as Crohn's disease and ulcerative colitis. For less than a handful of these 163 is the causative change in the genetic code known. The current study used an approach to directly look at the genetic code for a subset of these and identified a causative change in the genetic code for eight risk factors for ulcerative colitis. This finding is particularly important because it directs biological studies to understand the mechanisms that lead to this chronic life-long inflammatory disease.
Atopic dermatitis (AD) is the most common dermatological disease of childhood. Many children with AD have asthma and AD shares regions of genetic linkage with psoriasis, another chronic inflammatory skin disease. We present here a genome-wide association study (GWAS) of childhood-onset AD in 1563 European cases with known asthma status and 4054 European controls. Using Illumina genotyping followed by imputation, we generated 268 034 consensus genotypes and in excess of 2 million single nucleotide polymorphisms (SNPs) for analysis. Association signals were assessed for replication in a second panel of 2286 European cases and 3160 European controls. Four loci achieved genome-wide significance for AD and replicated consistently across all cohorts. These included the epidermal differentiation complex (EDC) on chromosome 1, the genomic region proximal to LRRC32 on chromosome 11, the RAD50/IL13 locus on chromosome 5 and the major histocompatibility complex (MHC) on chromosome 6, the latter reflecting the action of classical HLA alleles. We observed variation in the contribution towards co-morbid asthma for these regions of association. We further explored the genetic relationship between AD, asthma and psoriasis by examining previously identified susceptibility SNPs for these diseases. We found considerable overlap between AD and psoriasis together with variable coincidence between allergic rhinitis (AR) and asthma. Our results indicate that the pathogenesis of AD incorporates immune and epidermal barrier defects with combinations of specific and overlapping effects at individual loci.
We hypothesize that imputation based on data from the 1000 Genomes Project can identify novel association signals on a genome-wide scale due to the dense marker map and the large number of haplotypes. To test the hypothesis, the Wellcome Trust Case Control Consortium (WTCCC) Phase I genotype data were imputed using 1000 Genomes as reference (20100804 EUR), and seven case/control association studies were performed using imputed dosages. We observed two ‘missed’ disease-associated variants that were undetectable by the original WTCCC analysis, but were reported by later studies after the 2007 WTCCC publication. One is within the IL2RA gene for association with type 1 diabetes and the other in proximity with the CDKN2B gene for association with type 2 diabetes. We also identified two refined associations. One is SNP rs11209026 in exon 9 of IL23R for association with Crohn's disease, which is predicted to be probably damaging by PolyPhen2. The other refined variant is in the CUX2 gene region for association with type 1 diabetes, where the newly identified top SNP rs1265564 has an association P-value of 1.68 × 10−16. The new lead SNP for each of the two refined loci provides a more plausible explanation for the disease association. We demonstrated that 1000 Genomes-based imputation could indeed identify both novel (in our case, ‘missed’ because they were detected and replicated by studies after 2007) and refined signals. We anticipate the findings derived from this study to provide timely information when individual groups and consortia are beginning to engage in 1000 Genomes-based imputation.
genome-wide association study; the 1000 Genomes project; imputation
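For readers unfamiliar with the term, an imputed dosage is the expected number of copies of an allele given the genotype posterior probabilities produced by imputation. The sketch below shows that calculation and a simple comparison of dosage-based allele frequencies; all probabilities are hypothetical, the function names are illustrative, and this is not the association model used in the study.

```python
def expected_dosage(p_ref_hom, p_het, p_alt_hom):
    """Expected alternate-allele count (0 to 2) from genotype posterior probabilities."""
    total = p_ref_hom + p_het + p_alt_hom
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_het + 2.0 * p_alt_hom

def mean_dosage(samples):
    """Average dosage over a list of (ref/ref, ref/alt, alt/alt) probability triples."""
    return sum(expected_dosage(*s) for s in samples) / len(samples)

# Hypothetical posterior probabilities for three cases and three controls.
cases = [(0.05, 0.90, 0.05), (0.00, 0.20, 0.80), (0.70, 0.30, 0.00)]
controls = [(0.95, 0.05, 0.00), (0.60, 0.40, 0.00), (0.10, 0.85, 0.05)]

# Dividing the mean dosage by 2 gives the estimated allele frequency.
case_freq = mean_dosage(cases) / 2.0
control_freq = mean_dosage(controls) / 2.0
print(f"alt allele frequency: cases {case_freq:.3f}, controls {control_freq:.3f}")
```

Using dosages rather than hard genotype calls carries imputation uncertainty into the downstream test instead of discarding it.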
Several studies examined the fine-scale structure of human genetic variation in Europe. However, the European sets analyzed represent mainly northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia, and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrast groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results obtained demonstrated genetic heterogeneity of populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) exhibited similarities with populations from central–eastern Europe, and were distant from the Russian sample from northern Russia (Mezen district, Archangelsk region). Komi samples, especially Izhemski Komi, were significantly different from all other populations studied. These can be considered as a second pole of genetic diversity in northern Europe (in addition to the pole occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles, but differed from each other in the proportions of Komi and Finnic ancestries. In general, our data provide a more complete genetic map of Europe accounting for the diversity in its most eastern (northeastern) populations.
Susceptibility to primary biliary cirrhosis (PBC) is strongly associated with HLA region polymorphisms. To determine if associations can be explained by classical HLA determinants, we studied 676 Italian cases and 1440 controls genotyped with dense single nucleotide polymorphisms (SNPs) for which classical HLA alleles and amino acids were imputed. Although previous genome-wide association studies and our results show stronger SNP associations near DQB1, we demonstrate that the HLA signals can be attributed to classical DRB1 and DPB1 genes. Strong support for the predominant role of DRB1 is provided by our conditional analyses. We also demonstrate an independent association of DPB1. Specific HLA-DRB1 genes (*08, *11 and *14) account for most of the DRB1 association signal. Consistent with previous studies, DRB1*08 (p = 1.59 × 10−11) was the strongest predisposing allele, whereas DRB1*11 (p = 1.42 × 10−10) was protective. Additionally, DRB1*14 and the DPB1 association (DPB1*03:01) (p = 9.18 × 10−7) were predisposing risk alleles. No signal was observed in the HLA class 1 or class 3 regions. These findings better define the association of PBC with HLA and specifically support the role of classical HLA-DRB1 and DPB1 genes and alleles in susceptibility to PBC.
genetic risk; risk allele; imputation; antigen binding pocket; autoimmune disease
Genome-wide association studies identified a PTGER4 expression-modulating region on chromosome 5p13.1 as Crohn's disease (CD) susceptibility region. The study aim was to test this association in a large cohort of patients with inflammatory bowel disease (IBD) and to elucidate genotypic and phenotypic interactions with other IBD genes.
A total of 7073 patients and controls were genotyped: 844 CD and 471 patients with ulcerative colitis and 1488 controls were analyzed for the single nucleotide polymorphisms (SNPs) rs4495224 and rs7720838 on chromosome 5p13.1. The study included two replication cohorts of North American (CD: n = 684; controls: n = 1440) and of German origin (CD: n = 1098; controls: n = 1048). Genotype-phenotype, epistasis and transcription factor binding analyses were performed. In the discovery cohort, an association of rs4495224 (p = 4.10×10−5; 0.76 [0.67–0.87]) and of rs7720838 (p = 6.91×10−4; 0.81 [0.71–0.91]) with susceptibility to CD was demonstrated. These associations were confirmed in both replication cohorts. In silico analysis predicted rs4495224 and rs7720838 as essential parts of binding sites for the transcription factors NF-κB and XBP1 with higher binding scores for carriers of the CD risk alleles, providing an explanation of how these SNPs might contribute to increased PTGER4 expression. There was no association of the PTGER4 SNPs with IBD phenotypes. Epistasis detected between 5p13.1 and ATG16L1 for CD susceptibility in the discovery cohort (p = 5.99×10−7 for rs7720838 and rs2241880) could not be replicated in either replication cohort, arguing against a major role of this gene-gene interaction in the susceptibility to CD.
We confirmed 5p13.1 as a major CD susceptibility locus and demonstrate by in silico analysis rs4495224 and rs7720838 as part of binding sites for NF-κB and XBP1. Further functional studies are necessary to confirm the results of our in silico analysis and to analyze if changes in PTGER4 expression modulate CD susceptibility.
Atopic dermatitis (AD) is a common chronic inflammatory skin disorder where epidermal barrier dysfunction is a major factor in the pathogenesis. The identification of AD susceptibility genes related to barrier dysfunction is therefore of importance. The epidermal transglutaminase genes (TGM1, TGM3 and TGM5) encode essential cross-linking enzymes in the epidermis.
To determine whether genetic variability in the epidermal transglutaminases contributes to AD susceptibility.
Forty-seven single nucleotide polymorphisms (SNPs) in the TGM1, TGM3 and TGM5 gene region were tested for genetic association with AD, independently and in relation to FLG genotype, using a pedigree disequilibrium test (PDT) in a Swedish material consisting of 1753 individuals from 539 families. In addition, a German case-control material, consisting of 533 AD cases and 1996 controls, was used for in silico analysis of the epidermal TGM regions. Gene expression of the TGM1, TGM3 and TGM5 genes was investigated by relative quantification with Real Time PCR (qRT-PCR). Immunohistochemical (IHC) analysis was performed to detect TG1, TG3 and TG5 protein expression in the skin of patients and healthy controls.
PDT analysis identified a significant association between the TGM1 SNP rs941505 and AD with allergen-specific IgE in the Swedish AD family material. However, the association was not replicated in the German case-control material. No significant association was detected for analyzed SNPs in relation to FLG genotype. TG1, TG3 and TG5 protein expression was detected in AD skin and a significantly increased TGM3 mRNA expression was observed in lesional skin by qRT-PCR.
Although TGM1 and TGM3 may be differentially expressed in AD skin, the results from the genetic analysis suggest that genetic variation in the epidermal transglutaminases is not an important factor in AD susceptibility.
In this review, we discuss the latest targeted enrichment methods and aspects of their utilization along with second-generation sequencing for complex genome analysis. In doing so, we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a powerful tool. We explain how targeted enrichment for next-generation sequencing has made great progress in terms of methodology, ease of use and applicability, but emphasize the remaining challenges such as the lack of even coverage across targeted regions. Costs are also considered versus the alternative of whole-genome sequencing which is becoming ever more affordable. We conclude that targeted enrichment is likely to be the most economical option for many years to come in a range of settings.
targeted enrichment; next-generation sequencing; genome partitioning; exome; genetic variation
Exposure to microbes during early childhood is associated with protection from immune-mediated diseases such as inflammatory bowel disease (IBD) and asthma. Here, we show that in germ-free (GF) mice, invariant natural killer T (iNKT) cells accumulate in the colonic lamina propria and lung, resulting in increased morbidity in models of IBD and allergic asthma as compared with that of specific pathogen-free mice. This was associated with increased intestinal and pulmonary expression of the chemokine ligand CXCL16, which was associated with increased mucosal iNKT cells. Colonization of neonatal—but not adult—GF mice with a conventional microbiota protected the animals from mucosal iNKT accumulation and related pathology. These results indicate that age-sensitive contact with commensal microbes is critical for establishing mucosal iNKT cell tolerance to later environmental exposures.
Objectives: Using a novel candidate SNP approach, we aimed to identify a possible genetic basis for the higher glioma incidence in Whites relative to East Asians and African-Americans. Methods: We hypothesized that genetic regions containing SNPs with extreme differences in allele frequencies across ethnicities are most likely to harbor susceptibility variants. We used International HapMap Project data to identify 3,961 candidate SNPs with the largest allele frequency differences in Whites compared to East Asians and Africans and tested these SNPs for association with glioma risk in a set of White cases and controls. Top SNPs identified in the discovery dataset were tested for association with glioma in five independent replication datasets. Results: No SNP achieved statistical significance in either the discovery or replication datasets after accounting for multiple testing or conducting meta-analysis. However, the most strongly associated SNP, rs879471, was found to be in linkage disequilibrium with a previously identified risk SNP, rs6010620, in RTEL1. We estimate rs6010620 to account for a glioma incidence rate ratio of 1.34 for Whites relative to East Asians. Conclusion: We explored genetic susceptibility to glioma using a novel candidate SNP method which may be applicable to other diseases with appropriate epidemiologic patterns.
glioma; candidate SNP association study; ancestry informative markers; admixture; race; ethnicity; brain cancer
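The candidate selection step described above, picking SNPs whose allele frequencies differ most between one population and the others, can be sketched as a simple ranking. The scoring rule used here (the smaller of the two absolute frequency differences) is an assumption, since the abstract does not define how differences were scored; the SNP identifiers and frequencies are made up.

```python
def rank_by_frequency_gap(freqs, focal, others, top_n):
    """Rank SNPs by the smallest absolute allele-frequency gap between the
    focal population and each comparison population (assumed scoring rule)."""
    scored = []
    for snp, by_pop in freqs.items():
        gap = min(abs(by_pop[focal] - by_pop[o]) for o in others)
        scored.append((gap, snp))
    # Largest gap first.
    scored.sort(reverse=True)
    return [snp for _, snp in scored[:top_n]]

# Hypothetical allele frequencies per population.
freqs = {
    "snpA": {"EUR": 0.80, "EAS": 0.10, "AFR": 0.15},
    "snpB": {"EUR": 0.50, "EAS": 0.45, "AFR": 0.05},
    "snpC": {"EUR": 0.30, "EAS": 0.60, "AFR": 0.70},
}
candidates = rank_by_frequency_gap(freqs, "EUR", ["EAS", "AFR"], top_n=2)
print(candidates)
```

Taking the minimum over comparison populations favors SNPs that distinguish the focal population from both others, rather than from only one of them.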
Psoriatic arthritis (PsA) is a chronic inflammatory musculoskeletal disease affecting up to 30% of psoriasis vulgaris (PsV) cases and approximately 0.25% to 1% of the general population. To identify common susceptibility loci, we performed a meta-analysis of three imputed genome-wide association studies (GWAS) on psoriasis, stratified for PsA. A total of 1,160,703 SNPs were analyzed in the discovery set consisting of 535 PsA cases and 3,432 controls from Germany, the United States and Canada. We followed up two SNPs in 1,931 PsA cases and 6,785 controls comprising six independent replication panels from Germany, Estonia, the United States and Canada. In the combined analysis, a genome-wide significant association was detected at 2p16 near the REL locus encoding c-Rel (rs13017599, P=1.18×10−8, OR=1.27, 95% CI=1.18–1.35). The rs13017599 polymorphism is known to associate with rheumatoid arthritis (RA), and another SNP near REL (rs702873) was recently implicated in PsV susceptibility. However, conditional analysis indicated that rs13017599, rather than rs702873, accounts for the PsA association at REL. We hypothesize that c-Rel, as a member of the Rel/NF-κB family, is associated with PsA in the context of disease pathways that involve other identified PsA and PsV susceptibility genes including TNIP1, TNFAIP3 and NFκBIA.
Genome-wide association studies of two main forms of inflammatory bowel diseases (IBD), Crohn’s disease (CD) and ulcerative colitis (UC), have identified 99 susceptibility loci, but these explain only ∼23% of the genetic risk. Part of the ‘hidden heritability’ could be in transmissible genetic effects in which mRNA expression in the offspring depends on the parental origin of the allele (genomic imprinting), since children whose mothers have CD are more often affected than children with affected fathers. We analyzed parent-of-origin (POO) effects in Dutch and Indian cohorts of IBD patients.
We selected 28 genetic loci associated with both CD and UC, and tested them for POO effects in 181 Dutch IBD case-parent trios. Three susceptibility variants in NOD2 were tested in 111 CD trios and a significant finding was re-evaluated in 598 German trios. The UC-associated gene, BTNL2, reportedly imprinted, was tested in 70 Dutch UC trios. Finally, we used 62 independent Indian UC trios to test POO effects of five established Indian UC risk loci.
We identified POO effects for NOD2 (L1007fs; OR = 21.0, P-value = 0.013) for CD; these results could not be replicated in an independent cohort (OR = 0.97, P-value = 0.95). A POO effect in IBD was observed for IL12B (OR = 3.2, P-value = 0.019) and PRDM1 (OR = 5.6, P-value = 0.04). In the Indian trios the IL10 locus showed a POO effect (OR = 0.2, P-value = 0.03).
Little is known about the effect of genomic imprinting in complex diseases such as IBD. We present limited evidence for POO effects for the tested IBD loci. POO effects explain part of the hidden heritability for complex genetic diseases but need to be investigated further.
Many hypothesis-driven genetic studies require the ability to comprehensively and efficiently target specific regions of the genome to detect sequence variations. Often, sample availability is limited, requiring the use of whole genome amplification (WGA). We evaluated a high-throughput microdroplet-based PCR approach in combination with next generation sequencing (NGS) to target 384 discrete exons from 373 genes involved in cancer. In our evaluation, we compared the performance of six non-amplified gDNA samples from two HapMap family trios. Three of these samples were also preamplified by WGA and evaluated. We tested sample pooling or multiplexing strategies at different stages of the tested targeted NGS (T-NGS) workflow.
The results demonstrated comparable sequence performance between non-amplified and preamplified samples and between different indexing strategies [sequence specificity of 66.0% ± 3.4%, uniformity (coverage at 0.2× of the mean) of 85.6% ± 0.6%]. The average genotype concordance maintained across all the samples was 99.5% ± 0.4%, regardless of sample type or pooling strategy. We did not detect any errors in the Mendelian patterns of inheritance of genotypes between the parents and offspring within each trio. We also demonstrated the ability to detect minor allele frequencies within the pooled samples that conform to predicted models.
Our described PCR-based sample multiplex approach and the ability to use WGA material for NGS may enable researchers to perform deep resequencing studies and explore variants at very low frequencies and cost.
High-throughput targeted next-generation resequencing; Microdroplet-based multiplex PCR; Sample pooling or multiplexing; Whole-genome amplified DNA samples; Cost reduction
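Two of the quality measures mentioned above lend themselves to a short sketch: uniformity, defined in the abstract as coverage at 0.2× of the mean, and the check for Mendelian consistency within a trio. The functions below are illustrative only; the coverage values and genotypes are hypothetical, and the exact computation used in the study is not given in the abstract.

```python
def uniformity(coverage, fraction=0.2):
    """Share of target positions covered at >= fraction * mean coverage."""
    mean_cov = sum(coverage) / len(coverage)
    threshold = fraction * mean_cov
    return sum(1 for c in coverage if c >= threshold) / len(coverage)

def mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal and one paternal allele.

    Genotypes are two-character strings such as "AG" (unordered alleles).
    """
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

# Hypothetical per-base coverage over a small target region.
cov = [100, 80, 120, 5, 95, 0, 110, 90, 15, 85]
print(uniformity(cov))

# Hypothetical trio genotypes at a single site.
print(mendelian_consistent("AG", "AA", "GG"))  # consistent
print(mendelian_consistent("GG", "AA", "AG"))  # inconsistent: mother carries no G
```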
Genome-wide association studies (GWAS) have provided a large set of genetic loci influencing the risk for many common diseases. Association studies typically analyze one specific trait in single populations in an isolated fashion without taking into account the potential phenotypic and genetic correlation between traits. However, GWA data can be efficiently used to identify overlapping loci with analogous or contrasting effects on different diseases.
Here, we describe a new approach to systematically prioritize and interpret available GWA data. We focus on the analysis of joint and disjoint genetic determinants across diseases. Using network analysis, we show that variant-based approaches are superior to locus-based analyses. In addition, we provide a prioritization of disease loci based on network properties and discuss the roles of hub loci across several diseases. We demonstrate that, in general, agonistic associations appear to reflect current disease classifications, and present the potential use of effect sizes in refining and revising these agonistic signals. We further identify potential branching points in disease etiologies based on antagonistic variants and describe plausible small-scale models of the underlying
The observation that a surprisingly high fraction (>15%) of the SNPs considered in our study are associated both agonistically and antagonistically with related as well as unrelated disorders indicates that the molecular mechanisms influencing causes and progress of human diseases are in part interrelated. Genetic overlaps between two diseases also suggest the importance of the affected entities in the specific pathogenic pathways and should be investigated further.
Genome-wide association study; Genetic overlap; Shared variant network; Disease comorbidity
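The distinction between agonistic and antagonistic associations can be made concrete with a small sketch. It assumes that a shared variant is called agonistic for a pair of diseases when the same allele shifts risk in the same direction for both, and antagonistic when the directions are opposite; the odds ratios and disease names are hypothetical, and the function is not taken from the study.

```python
import math
from itertools import combinations

def classify_shared_variant(effects):
    """Label each disease pair for one SNP.

    effects maps disease name -> odds ratio for the same reference allele.
    An odds ratio of exactly 1 has no direction and is labeled antagonistic
    by this simple rule; real data would need an explicit null category.
    """
    labels = {}
    for d1, d2 in combinations(sorted(effects), 2):
        same_direction = math.log(effects[d1]) * math.log(effects[d2]) > 0
        labels[(d1, d2)] = "agonistic" if same_direction else "antagonistic"
    return labels

# Hypothetical odds ratios for one SNP across three diseases.
snp_effects = {"disease_A": 1.30, "disease_B": 1.15, "disease_C": 0.85}
pair_labels = classify_shared_variant(snp_effects)
for pair, label in pair_labels.items():
    print(pair, label)
```

A SNP such as this hypothetical one, agonistic for one pair and antagonistic for another, is the kind of variant the abstract counts among the >15% associated in both ways.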
Scientists working with single-nucleotide variants (SNVs), inferred by next-generation sequencing software, often need further information regarding true variants, artifacts and sequence coverage gaps. In clinical diagnostics, for example, SNVs must usually be validated by visual inspection or several independent SNV-callers. We here demonstrate that 0.5–60% of relevant SNVs might not be detected due to coverage gaps, or might be misidentified. Even low error rates can overwhelm the true biological signal, especially in clinical diagnostics, in research comparing healthy with affected cells, in archaeogenetic dating or in forensics. For these reasons, we have developed a package called pibase, which is applicable to diploid and haploid genome, exome or targeted enrichment data. pibase extracts details on nucleotides from alignment files at user-specified coordinates and identifies reproducible genotypes, if present. In test cases pibase identifies genotypes at 99.98% specificity, 10-fold better than other tools. pibase also provides pair-wise comparisons between healthy and affected cells using nucleotide signals (10-fold more accurately than a genotype-based approach, as we show in our case study of monozygotic twins). This comparison tool also solves the problem of detecting allelic imbalance within heterozygous SNVs in copy number variation loci, or in heterogeneous tumor sequences.
Background & Aims
A limited number of genetic risk factors have been reported in primary sclerosing cholangitis (PSC). To discover further genetic susceptibility factors for PSC, we followed up on a second tier of single nucleotide polymorphisms (SNPs) from a genome-wide association study (GWAS).
We analyzed 45 SNPs in 1221 PSC cases and 3508 controls. The association results from the replication analysis and the original GWAS (715 PSC cases and 2962 controls) were combined in a meta-analysis comprising 1936 PSC cases and 6470 controls. We performed an analysis of bile microbial community composition in 39 PSC patients by 16S rRNA sequencing.
Seventeen SNPs representing 12 distinct genetic loci achieved nominal significance (P_replication < 0.05) in the replication. The most robust novel association was detected at chromosome 1p36 (rs3748816; P_combined = 2.1×10−8) where the MMEL1 and TNFRSF14 genes represent potential disease genes. Eight additional novel loci showed suggestive evidence of association (P_repl < 0.05). FUT2 at chromosome 19q13 (rs602662, P_comb = 1.9×10−6; rs281377, P_comb = 2.1×10−6; and rs601338, P_comb = 2.7×10−6) is notable due to its implication in altered susceptibility to infectious agents. We found that FUT2 secretor status and genotype defined by rs601338 significantly influence biliary microbial community composition in PSC patients.
We identify multiple new PSC risk loci by extended analysis of a PSC GWAS. FUT2 genotype needs to be taken into account when assessing the influence from microbiota on biliary pathology in PSC.
primary sclerosing cholangitis; genome-wide association study; single nucleotide polymorphism; immunogenetics
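The abstract does not say which model was used to combine the replication results with the original GWAS. One common choice is a fixed-effect, inverse-variance weighted meta-analysis on the log-odds scale, sketched below with hypothetical effect sizes and standard errors; the function name is illustrative.

```python
import math

def fixed_effect_meta(estimates):
    """Inverse-variance weighted meta-analysis.

    estimates is a list of (beta, se) pairs on the log-odds scale.
    Returns the pooled beta, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p

# Hypothetical per-study results for one SNP: (log odds ratio, standard error).
gwas = (math.log(1.25), 0.07)
replication = (math.log(1.18), 0.05)

beta, se, p = fixed_effect_meta([gwas, replication])
print(f"pooled OR = {math.exp(beta):.3f}, SE = {se:.4f}, p = {p:.1e}")
```

The pooled standard error is smaller than either input, which is how a SNP that is only nominally significant in each stage can reach a much smaller combined p-value.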
Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions).
We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
We recommend applying our general ‘two-step’ mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results.
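One simple way to quantify inter-sample SNP concordance is sketched below. The definition used here (the fraction of sites called in both samples at which the genotype calls agree) is an assumption made for illustration and may differ from the measure used in the study; the function name `snp_concordance` and the genotype strings are hypothetical.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (contig, position)


def snp_concordance(calls_a: Dict[Site, str], calls_b: Dict[Site, str]) -> float:
    """Fraction of sites called in both samples where the genotypes agree."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("no sites were called in both samples")
    agree = sum(1 for site in shared if calls_a[site] == calls_b[site])
    return agree / len(shared)


# Made-up calls: two shared sites, one of which agrees.
sample_a = {("chr13", 100): "A/G", ("chr13", 200): "C/C", ("chr17", 50): "T/T"}
sample_b = {("chr13", 100): "A/G", ("chr13", 200): "C/T", ("chr17", 60): "G/G"}
concordance = snp_concordance(sample_a, sample_b)  # 0.5
```

Sites called in only one sample are ignored by this definition, so a low call rate would not lower the score; a stricter variant could count them as disagreements.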
Two-stage mapping; Read-backmapping; Software performance; SNP discovery; Multiplexed targeted next-generation sequencing
Atopic dermatitis (AD) is a common chronic skin disease with high heritability. Apart from filaggrin (FLG), the genes influencing AD are largely unknown. We conducted a genome-wide association meta-analysis of 5,606 cases and 20,565 controls from 16 population-based cohorts and followed up the ten most strongly associated novel markers in a further 5,419 cases and 19,833 controls from 14 studies. Three SNPs met genome-wide significance in the discovery and replication cohorts combined: rs479844 upstream of OVOL1 (OR=0.88, p=1.1×10−13) and rs2164983 near ACTL9 (OR=1.16, p=7.1×10−9), genes which have been implicated in epidermal proliferation and differentiation, as well as rs2897442 in KIF3A within the cytokine cluster on 5q31.1 (OR=1.11, p=3.8×10−8). We also replicated the FLG locus and two recently identified association signals at 11q13.5 (rs7927894, p=0.008) and 20q13.3 (rs6010620, p=0.002). Our results underline the importance of both epidermal barrier function and immune dysregulation in AD pathogenesis.
While gliomas are the most common primary brain tumors, their etiology is largely unknown. To identify novel risk loci for glioma, we conducted genome-wide association (GWA) analysis of two case–control series from France and Germany (2269 cases and 2500 controls). Pooling these data with previously reported UK and US GWA studies provided data on 4147 glioma cases and 7435 controls genotyped for 424 460 common tagging single-nucleotide polymorphisms. Using these data, we demonstrate two statistically independent associations between glioma and rs11979158 and rs2252586, at 7p11.2 which encompasses the EGFR gene (population-corrected statistics, Pc = 7.72 × 10−8 and 2.09 × 10−8, respectively). Both associations were independent of tumor subtype, and were independent of EGFR amplification, p16INK4a deletion and IDH1 mutation status in tumors; compatible with driver effects of the variants on glioma development. These findings show that variation in 7p11.2 is a determinant of inherited glioma risk.
More than a thousand disease susceptibility loci have been identified via genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings generally remain to be defined. We utilize pooled next-generation sequencing to study 56 genes in regions associated with Crohn’s disease (CD) in 350 cases and 350 controls. Follow-up genotyping of 70 rare and low-frequency protein-altering variants (MAF ~0.001–0.05) in nine independent case-control series (16054 CD patients, 12153 UC patients, 17575 healthy controls) identifies four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a novel, protective splice variant in CARD9 (p < 1e-16, OR ~ 0.29), as well as additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by providing novel, rare, and likely functional variants that will empower functional experiments and predictive models.
Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease characterized by inflammation and fibrosis of the bile ducts. Both environmental and genetic factors contribute to its pathogenesis. To further clarify its genetic background, we investigated susceptibility loci recently identified for ulcerative colitis (UC) in a large cohort of 1186 PSC patients and 1748 controls. Single nucleotide polymorphisms (SNPs) tagging 13 UC susceptibility loci were initially genotyped in 854 PSC patients and 1491 controls from the Benelux (331 cases, 735 controls), Germany (265 cases, 368 controls) and Scandinavia (258 cases, 388 controls). Subsequently, a joint analysis was performed with an independent second Scandinavian cohort (332 cases, 257 controls). SNPs at chromosomes 2p16 (p value 4.12×10−4), 4q27 (p value 4.10×10−5) and 9q34 (p value 8.41×10−4) were associated with PSC in the joint analysis after correcting for multiple testing. In PSC patients without inflammatory bowel disease (IBD), SNPs at 4q27 and 9q34 were nominally associated (p<0.05). We applied additional in silico analyses to identify likely candidate genes at PSC susceptibility loci. To identify non-random, evidence-based links, we used GRAIL analysis, which showed interconnectivity between genes in six of the nine PSC-associated regions. Expression quantitative trait analysis from 1469 Dutch and UK individuals demonstrated that five of the nine SNPs had an effect on cis-gene expression. These analyses prioritized IL2, CARD9 and REL as novel candidates.
We have identified three UC susceptibility loci to be associated with PSC, harboring the putative candidate genes REL, IL2 and CARD9. These results add to the scarce knowledge on the genetic background of PSC and imply an important role for both innate and adaptive immunological factors.
chronic liver disease; auto-immunity; inflammation; complex disease; genetics
Genetic variation in NOD2 and cigarette smoking are well-established risk factors for the development of Crohn's disease (CD). However, little is known about a potential interaction between these risk factors. We investigated gene-environment interactions between CD-associated NOD2 alleles and cigarette smoking in a large sample of patients with CD.
Three previously reported CD-associated variants in NOD2 (R702W, G908R, 1007fs) were genotyped in 1636 patients with CD continuously recruited between 1995 and 2010 based on physician referral. Data on history of smoking behaviour was obtained for all participants through a written questionnaire. Using a case-only design, we performed logistic regression analyses to investigate statistical interactions between NOD2 risk alleles and smoking status.
We detected a significant negative interaction between carriership of at least one of the NOD2 risk alleles and history of ever having smoked (OR = 0.71; p = 0.005) as well as smoking at the time of CD diagnosis (OR = 0.68; p = 0.005). Subsequent separate analyses of the three variants revealed a significant negative interaction between the 1007fs variant and history of ever having smoked (OR = 0.64; p = 9 × 10−4) and smoking at the time of CD diagnosis (OR = 0.53; p = 7 × 10−5).
The observed significant negative gene-environment interaction suggests that the risk increase for CD conferred simultaneously by cigarette smoking and the 1007fs NOD2 polymorphism is smaller than expected and may point to a biological interaction. Our findings warrant further investigation in epidemiological and functional studies to elucidate pathophysiology as well as to aid in the development of recommendations for disease prevention.
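The logic of the case-only design can be made concrete with a small worked example. Among cases only, the odds ratio relating genotype to exposure estimates the multiplicative gene-environment interaction, provided genotype and exposure are independent in the source population. With a single binary predictor, the logistic-regression odds ratio equals the cross-product ratio of the 2×2 table, which is what the sketch below computes together with a Wald confidence interval. The counts are invented for illustration and are not data from the study; the function name `case_only_interaction` is hypothetical.

```python
import math
from typing import Tuple


def case_only_interaction(a: int, b: int, c: int, d: int,
                          z: float = 1.96) -> Tuple[float, float, float]:
    """Case-only interaction odds ratio with a Wald confidence interval.

    a: carriers who smoked        b: carriers who never smoked
    c: non-carriers who smoked    d: non-carriers who never smoked
    (all counts are cases; no controls are used in this design)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, chosen only so that the arithmetic is easy to follow.
odds_ratio, lower, upper = case_only_interaction(a=200, b=400, c=500, d=500)
# odds_ratio is (200 * 500) / (400 * 500) = 0.5
```

An odds ratio below 1, as in this invented table, corresponds to a negative interaction: the joint effect of genotype and exposure is smaller than the product of their separate effects.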
Gene-environment interaction; Case-only; Crohn's disease; NOD2/CARD15; Cigarette smoking