Genome-wide association studies have revealed several common genetic risk variants for ulcerative colitis (UC). However, little is known about the contribution of rare, large-effect genetic variants to UC susceptibility. In this study, we performed deep targeted re-sequencing of 122 genes in Dutch UC patients to investigate the contribution of rare variants to the genetic susceptibility to UC. The selected genes comprise 111 established human UC susceptibility genes and 11 genes whose knockout in mice leads to spontaneous colitis. In addition, we sequenced the promoter regions of 45 genes where known variants exert cis-eQTL effects. Targeted pooled re-sequencing was performed on DNA of 790 Dutch UC cases. The Genome of the Netherlands project provided sequence data of 500 healthy controls. After quality control and prioritization based on allele frequency and pathogenicity probability, follow-up genotyping of 171 rare variants was performed on 1021 Dutch UC cases and 1166 Dutch controls. Single-variant association and gene-based analyses identified an association of rare variants in the MUC2 gene with UC. The associated variants in the Dutch population could not be replicated in a German replication cohort (1026 UC cases, 3532 controls). In conclusion, this study has identified a putative role for MUC2 in UC susceptibility in the Dutch population and suggests a population-specific contribution of rare variants to UC.
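The prioritization and gene-based analysis steps can be illustrated with a simple collapsing (burden) test. The sketch below is an assumption for illustration, not the study's actual pipeline: variants qualify if their minor allele frequency is below a hypothetical `max_maf` and their pathogenicity score is at least a hypothetical `min_score`; carriers of at least one qualifying variant are then compared between cases and controls with a two-sided Fisher's exact test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def qualifying_variants(variants, max_maf=0.01, min_score=0.5):
    """Hypothetical filter: rare (MAF < max_maf) and likely pathogenic (score >= min_score)."""
    return {v["id"] for v in variants if v["maf"] < max_maf and v["score"] >= min_score}


def count_carriers(samples, qualifying):
    """Number of samples carrying at least one qualifying variant (collapsing)."""
    return sum(1 for carried in samples if qualifying & set(carried))


def burden_test(case_carriers, n_cases, control_carriers, n_controls):
    """Compare carrier counts in cases versus controls."""
    return fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                                  control_carriers, n_controls - control_carriers)
```

Collapsing variants per gene trades single-variant resolution for power, which is why gene-based tests are commonly paired with single-variant tests when variants are individually too rare to test alone.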
Ulcerative colitis and Crohn’s disease are the two main forms of inflammatory bowel disease (IBD). Here, we report the first trans-ethnic association study of IBD, with genome-wide or Immunochip genotype data from an extended cohort of 86,640 European individuals and Immunochip data from 9,846 individuals of East-Asian, Indian or Iranian descent. We implicate 38 loci in IBD risk for the first time. For the majority of IBD risk loci, the direction and magnitude of effect are consistent in European and non-European cohorts. Nevertheless, we observe genetic heterogeneity between divergent populations at several established risk loci, driven by differences in allele frequency (NOD2), in effect size (TNFSF15, ATG16L1) or in both (IL23R, IRGM). Our results provide biological insights into the pathogenesis of IBD, and demonstrate the utility of trans-ethnic association studies for mapping complex disease loci and understanding genetic architecture across diverse populations.
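One common way to quantify whether a variant's effect size differs between populations is Cochran's Q with the derived I² statistic, computed from per-population effect estimates (for example log odds ratios) and their standard errors. The sketch below shows only that calculation; it is an assumption for illustration and not necessarily the heterogeneity method used in this study.

```python
def cochran_q(betas, ses):
    """Fixed-effect pooled estimate, Cochran's Q and I^2 for per-population effects.

    betas: effect estimates (e.g. log odds ratios), one per population.
    ses:   their standard errors.
    """
    weights = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Weighted squared deviations of each population from the pooled effect.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to between-population heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i2
```

Q alone only captures effect-size differences; allele-frequency differences (as reported for NOD2) would have to be examined separately.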
AIM: To validate the Montreal classification system for Crohn’s disease (CD) and ulcerative colitis (UC) within the Netherlands.
METHODS: A selection of 20 de-identified medical records with an appropriate representation of the inflammatory bowel disease (IBD) subphenotypes was scored by 30 observers with different professions (gastroenterologists specialised in IBD, gastroenterologists in training and IBD nurses) and different levels of experience in IBD patient care. Patients were classified according to the Montreal classification. In addition, participants were asked to score extra-intestinal manifestations (EIM) and disease severity in CD based on their clinical judgment. Inter-observer agreement was calculated as the percentage of correct answers (answers identical to the “expert evaluation”) and as Fleiss' kappa (κ). Kappa cut-offs: < 0.40, poor; 0.41-0.60, moderate; 0.61-0.80, good; > 0.80, excellent.
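Fleiss' kappa compares the observed agreement among a fixed number of raters per subject with the agreement expected by chance from the overall category frequencies. The sketch below is a minimal implementation of the standard formula, together with a helper that applies the cut-offs stated above; values between 0.40 and 0.41 are assigned to "poor" here, which is an assumption since the stated cut-offs leave that gap open.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    # Observed agreement for each subject: proportion of agreeing rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall proportion of ratings in each category.
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:  # only one category ever used: agreement is trivially perfect
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


def interpret_kappa(kappa):
    """Map kappa onto the cut-offs used in this study."""
    if kappa <= 0.40:
        return "poor"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "excellent"
```

In this study's setting, each of the 20 records would be one row, the Montreal categories of one item would be the columns, and each row would sum to the 30 observers.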
RESULTS: The inter-observer agreement was excellent for diagnosis (κ = 0.96), perianal disease (κ = 0.92) and disease location in CD (κ = 0.82) and good for age of onset (κ = 0.67), upper gastrointestinal disease (κ = 0.62), disease behaviour in CD (κ = 0.79) and disease extent in UC (κ = 0.65). Agreement on disease severity in UC was poor (κ = 0.23). For the additional items, agreement was good for EIM (κ = 0.68) and moderate for disease severity in CD (κ = 0.44). The percentage of correct answers reflected the inter-observer agreement well for all Montreal items (> 80%) except disease severity (48%-74%). IBD nurses were significantly worse at scoring upper gastrointestinal disease in CD than gastroenterologists (P = 0.008) and gastroenterologists in training (P = 0.040). Observers with less than 10 years of experience in IBD patient care were significantly better at scoring UC severity than observers with 10-20 years (P = 0.003) and more than 20 years (P = 0.003) of experience. Observers with 10-20 years of experience in IBD patient care were significantly better at scoring upper gastrointestinal disease in CD than observers with less than 10 years (P = 0.007) and more than 20 years (P = 0.007) of experience.
CONCLUSION: We found a good to excellent inter-observer agreement for all Montreal items except for disease severity in UC (poor).
Crohn’s disease; Ulcerative colitis; Montreal classification; Phenotypes; Inter-observer agreement
The progression of liver fibrosis in response to chronic injury varies considerably among individual patients. The underlying genetics is highly complex because of the large numbers of potential genes, environmental factors and cell types involved. Here, we provide the first toxicogenomic analysis of liver fibrosis induced by carbon tetrachloride in the murine ‘genetic reference panel’ of recombinant inbred BXD lines. Our aim was to define the core set of risk genes and gene interaction networks that control fibrosis progression. Liver fibrosis phenotypes and gene expression profiles were determined in 35 BXD lines. Quantitative trait locus (QTL) analysis identified seven genomic loci influencing fibrosis phenotypes (pQTLs) with genome-wide significance on chromosomes 4, 5, 7, 12, and 17. Stepwise refinement based on expression QTL mapping with stringent selection criteria reduced the 1,351 candidate genes located in the pQTLs to a final list of 11 cis-regulated genes. Our findings demonstrate that the BXD reference population represents a powerful experimental resource for shortlisting the genes within a regulatory network that determine the liver's vulnerability to chronic injury.
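The stepwise refinement from 1,351 positional candidates to 11 cis-regulated genes can be pictured as a chain of filters. The sketch below assumes hypothetical per-gene fields (`cis_lod`, `max_expr`) and thresholds (`min_cis_lod`, `min_expr`); the study's actual selection criteria are not given here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    symbol: str
    chrom: str
    pos_mb: float
    cis_lod: float   # LOD of the eQTL at the gene's own position (assumed precomputed)
    max_expr: float  # highest log2 expression across strains (assumed)


def in_intervals(gene, intervals):
    """True if the gene lies inside any (chrom, start_mb, end_mb) pQTL interval."""
    return any(gene.chrom == c and lo <= gene.pos_mb <= hi for c, lo, hi in intervals)


def shortlist(genes, pqtl_intervals, min_cis_lod=3.0, min_expr=7.0):
    """Hypothetical three-step filter: position, expression level, cis-eQTL strength."""
    step1 = [g for g in genes if in_intervals(g, pqtl_intervals)]  # positional candidates
    step2 = [g for g in step1 if g.max_expr >= min_expr]           # expressed in liver
    step3 = [g for g in step2 if g.cis_lod >= min_cis_lod]         # cis-regulated
    return [g.symbol for g in step3]
```

Ordering the cheap positional filter first keeps the later, data-heavy checks to a small subset of genes.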
Obesity-associated organ-specific pathological states can result from dysregulated function of adipose tissue, liver and muscle. However, the influence of the genetic differences underlying gross-compositional differences in these tissues is largely unknown. In the present study, ATR-FTIR spectroscopy has been combined with a genetic approach to identify genetic differences responsible for phenotypic alterations in adipose, liver and muscle tissues.
Mice from 29 BXD recombinant inbred strains were fed a high-fat diet, and gross-compositional changes in adipose, liver and muscle tissues were measured by ATR-FTIR spectroscopy. The analysis of genotype-phenotype correlations revealed significant quantitative trait loci (QTL) on chromosome 12 for fat and collagen content, collagen integrity, and the lipid to protein ratio in adipose tissue, and on chromosome 17 for the lipid to protein ratio in liver. Using gene expression and sequence information, we suggest Rsad2 (viperin) and Colec11 (collectin-11) on chromosome 12 as potential quantitative trait candidate genes. Rsad2 may act as a modulator of lipid droplet contents and lipid biosynthesis; Colec11 might play a role in apoptotic cell clearance and maintenance of adipose tissue. An increased level of Rsad2 transcripts in adipose tissue of DBA/2J compared to C57BL/6J mice suggests a cis-acting genetic variant leading to differential gene activation.
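The genotype-phenotype correlation behind a QTL scan can be reduced to its simplest form, single-marker regression: at each marker, strains are split by parental allele (B for C57BL/6J, D for DBA/2J) and the fit is compared with a no-QTL model. The sketch below computes the LOD score as (n/2)·log10(RSS0/RSS1); it is an illustration only, since tools such as GeneNetwork or R/qtl use interval mapping, permutation thresholds and other refinements not shown here.

```python
import math


def marker_lod(phenotypes, genotypes):
    """LOD for one biallelic marker in inbred strains (genotypes 'B' or 'D')."""
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)  # null model: one common mean
    if rss0 == 0:
        return 0.0
    rss1 = 0.0  # QTL model: a separate mean per allele
    for allele in set(genotypes):
        group = [y for y, g in zip(phenotypes, genotypes) if g == allele]
        m = sum(group) / len(group)
        rss1 += sum((y - m) ** 2 for y in group)
    if rss1 == 0:
        return math.inf
    return (n / 2) * math.log10(rss0 / rss1)


def scan(phenotypes, marker_genotypes):
    """LOD at every marker; marker_genotypes maps marker name -> genotype per strain."""
    return {marker: marker_lod(phenotypes, g) for marker, g in marker_genotypes.items()}
```

Because BXD strains are fully inbred, each strain is homozygous at every marker, so a two-group comparison per marker is sufficient.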
The results demonstrate that ATR-FTIR spectroscopy helped to resolve the macromolecular composition of fat-accumulating tissues and to link this information with genetic determinants. The candidate genes in the QTL regions may contribute to obesity-related diseases in humans, particularly if the results can be verified in a larger BXD cohort.
Collagen; Endoplasmic reticulum; Apoptosis; Remodeling; Liver steatosis; Viperin; Collectin-11
There is strong but mostly circumstantial evidence that genetic factors modulate the severity of influenza infection in humans. Using genetically diverse but fully inbred strains of mice it has been shown that host sequence variants have a strong influence on the severity of influenza A disease progression. In particular, C57BL/6J, the most widely used mouse strain in biomedical research, is comparatively resistant. In contrast, DBA/2J is highly susceptible.
To map regions of the genome responsible for differences in influenza susceptibility, we infected a family of 53 BXD-type lines derived from a cross between C57BL/6J and DBA/2J strains with influenza A virus (PR8, H1N1). We monitored body weight, survival, and mean time to death for 13 days after infection. Qivr5 (quantitative trait for influenza virus resistance on chromosome 5) was the largest and most significant QTL for weight loss. The effect of Qivr5 was detectable on day 2 post infection, but was most pronounced on days 5 and 6. Survival rate mapped to Qivr5, but additionally revealed a second significant locus on chromosome 19 (Qivr19). Analysis of mean time to death affirmed both Qivr5 and Qivr19. In addition, we observed several regions of the genome with suggestive linkage. There are potentially complex combinatorial interactions of the parental alleles among loci. Analysis of multiple gene expression data sets and sequence variants in these strains highlights about 30 strong candidate genes across all loci that may control influenza A susceptibility and resistance.
We have mapped influenza susceptibility loci to chromosomes 2, 5, 16, 17, and 19. Body weight and survival loci have a time-dependent profile that presumably reflects the temporal dynamics of the response to infection. We highlight candidate genes in the respective intervals and review their possible biological functions during infection.
Regulatory T cells (Tregs) play an essential role in the control of the immune response and represent important targets for therapeutic intervention in the immune system. It is therefore important to understand in more detail which genes are specifically activated in Treg cells versus T helper (Th) cells, and which gene regulatory circuits may be involved in specifying and maintaining Treg cell homeostasis.
We isolated Treg and Th cells from a genetically diverse family of 31 BXD-type recombinant inbred strains and from the fully inbred parental strains of this family, C57BL/6J and DBA/2J. Genome-wide gene expression studies were then performed on the isolated Treg and Th cells. A comparative analysis of the transcriptomes of these cell populations allowed us to identify many novel differentially expressed genes. Analysis of cis- and trans-expression quantitative trait loci (eQTLs) highlighted common and unique regulatory mechanisms that are active in the two cell types. Trans-eQTL regions were found for the Treg functional genes Nrp1, Stat3 and Ikzf4. Analyses of the respective QTL intervals suggested several candidate genes that may be involved in regulating these genes in Treg cells. Similarly, possible candidate genes were found that may regulate the expression of F2rl1, Ctla4 and Klrb1f. In addition, we identified a focused group of candidate genes that may be important for the maintenance of self-tolerance and the prevention of allergy.
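Whether an eQTL is called cis or trans usually depends on the distance between the gene's own location and the peak marker of its eQTL. The sketch below uses a hypothetical fixed window (`window_mb`, assumed to be 5 Mb); published studies use varying windows and sometimes additional criteria, so this only illustrates the idea.

```python
def classify_eqtl(gene_chrom, gene_pos_mb, peak_chrom, peak_pos_mb, window_mb=5.0):
    """Label an eQTL 'cis' if its peak marker lies within window_mb of the gene, else 'trans'."""
    same_chrom = gene_chrom == peak_chrom
    if same_chrom and abs(gene_pos_mb - peak_pos_mb) <= window_mb:
        return "cis"
    return "trans"


def classify_all(eqtls, window_mb=5.0):
    """eqtls: list of (gene, gene_chrom, gene_pos_mb, peak_chrom, peak_pos_mb) tuples."""
    return {gene: classify_eqtl(gc, gp, pc, pp, window_mb)
            for gene, gc, gp, pc, pp in eqtls}
```

A trans-eQTL, such as those reported for Nrp1, Stat3 and Ikzf4, points to a regulator encoded elsewhere in the genome, which is why the corresponding QTL interval, not the gene itself, is then searched for candidates.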
Variation of expression across the strains allowed us to find many novel gene-interaction networks in both T cell subsets. In addition, these two data sets enabled us to identify many differentially expressed genes and to nominate candidate genes that may have important functions for the maintenance of self-tolerance and the prevention of allergy.
During a meeting of the SYSGENET working group ‘Bioinformatics’, currently available software tools and databases for systems genetics in mice were reviewed and the needs for future developments discussed. The group evaluated interoperability and performed initial feasibility studies. To aid future compatibility of software and exchange of already developed software modules, a strong recommendation was made by the group to integrate HAPPY and R/qtl analysis toolboxes, GeneNetwork and XGAP database platforms, and TIQS and xQTL processing platforms. R should be used as the principal computer language for QTL data analysis in all platforms and a ‘cloud’ should be used for software dissemination to the community. Furthermore, the working group recommended that all data models and software source code should be made visible in public repositories to allow a coordinated effort on the use of common data structures and file formats.
QTL mapping; database; mouse; systems genetics
The lung is critical in surveillance and initial defense against pathogens. In humans, as in mice, individual genetic differences strongly modulate pulmonary responses to infectious agents, severity of lung disease, and potential allergic reactions. As a first step towards understanding the genetic predisposition and pulmonary molecular networks that underlie individual differences in disease vulnerability, we performed a global analysis of normative lung gene expression levels in inbred mouse strains and a large family of BXD strains that are widely used for systems genetics. Our goal is to provide a key community resource on the genetics of the normative lung transcriptome that can serve as a foundation for experimental analysis and help predict genetic predisposition and responses to pathogens, allergens, and xenobiotics.
Steady-state polyA+ mRNA levels were assayed across a diverse and fully genotyped panel of 57 isogenic strains using the Affymetrix M430 2.0 array. Correlations of expression levels between genes were determined. Global expression QTL (eQTL) analysis and network covariance analysis were performed using tools and resources in GeneNetwork http://www.genenetwork.org.
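The gene-gene correlations used to define molecular networks can be sketched offline as Pearson correlations of strain-mean expression values, keeping pairs above a threshold as network edges. The threshold `min_abs_r` is a hypothetical choice, and this is not GeneNetwork's own implementation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long expression vectors (one value per strain)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # a gene with no variation across strains has no correlation
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, min_abs_r=0.7):
    """expr: gene -> list of strain means. Returns (gene1, gene2, r) for strongly correlated pairs."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= min_abs_r:
            edges.append((g1, g2, round(r, 3)))
    return edges
```

Negative correlations are kept as edges too, since inverse co-variation across strains can be as informative as positive co-variation.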
Expression values were highly variable across strains and in many cases showed high heritability. Several genes with expression restricted to lung tissue were identified. Using correlations between gene expression values across all strains, we defined and extended the memberships of several important molecular networks in the lung. Furthermore, we were able to extract signatures of immune cell subpopulations and characterize their co-variation and shared genetic modulation. Known QTL regions for respiratory infection susceptibility were investigated and several cis-eQTL genes were identified. Numerous cis- and trans-regulated transcripts and chromosomal intervals with strong regulatory activity were mapped. The Cyp1a1 P450 transcript had a strong trans-acting eQTL (LOD 11.8) on Chr 12 at 36 ± 1 Mb. This interval contains Ahr, which encodes the aryl hydrocarbon receptor (AhR) transcription factor; the DBA/2J haplotype carries a critical missense allele of Ahr that evidently modulates transcriptional activation by AhR.
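Because isogenic strains are genetically identical within a strain, heritability can be estimated from replicate animals: variance between strain means reflects genetic differences, variance within strains reflects environment and measurement noise. The sketch below estimates broad-sense heritability from a one-way ANOVA with the standard n0 correction for unequal replicate numbers; it assumes at least two strains and more animals than strains, and is not necessarily the estimator used in this study.

```python
def broad_sense_heritability(groups):
    """groups: strain -> list of replicate expression values (>= 2 strains, N > k)."""
    k = len(groups)
    values = [v for vals in groups.values() for v in vals]
    n_total = len(values)
    grand = sum(values) / n_total
    means = {s: sum(v) / len(v) for s, v in groups.items()}
    # Between-strain and within-strain sums of squares.
    ssb = sum(len(v) * (means[s] - grand) ** 2 for s, v in groups.items())
    ssw = sum(sum((x - means[s]) ** 2 for x in v) for s, v in groups.items())
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    # Effective replicates per strain (equals n for a balanced design).
    n0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / (k - 1)
    vg = max(0.0, (msb - msw) / n0)  # genetic variance component
    return vg / (vg + msw) if vg + msw > 0 else 0.0
```

A value near 1 means almost all variation lies between strains, which is what makes a transcript a promising target for eQTL mapping.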
Large-scale gene expression analyses in genetic reference populations revealed lung-specific and immune-cell gene expression profiles and suggested specific gene regulatory interactions.
Quantitative trait locus (QTL) mapping identifies genomic regions that likely contain genes regulating a quantitative trait. However, QTL regions may encompass tens to hundreds of genes. To find the most promising candidate genes that regulate the trait, the biologist typically collects information about the genes in the QTL interval from multiple resources. This process is very laborious and time-consuming.
QTLminer is a bioinformatics tool that automatically performs QTL region analysis. It is available in GeneNetwork and it integrates information such as gene annotation, gene expression and sequence polymorphisms for all the genes within a given genomic interval.
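The kind of integration QTLminer automates can be sketched as an interval query followed by ranking on the available evidence. Everything below is hypothetical: the record fields (`has_cis_eqtl`, `missense_snps`, `go_terms`) and the additive one-point-per-evidence score are illustrative assumptions, not QTLminer's actual data model or scoring.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    symbol: str
    chrom: str
    start_mb: float
    end_mb: float
    has_cis_eqtl: bool   # expression maps back to the gene itself (assumed precomputed)
    missense_snps: int   # coding variants differing between parental strains (assumed)
    go_terms: frozenset  # functional annotation terms (assumed)


def genes_in_interval(genes, chrom, start_mb, end_mb):
    """Keep genes that overlap the QTL interval at all."""
    return [g for g in genes
            if g.chrom == chrom and g.start_mb <= end_mb and g.end_mb >= start_mb]


def rank_candidates(genes, chrom, start_mb, end_mb, keywords=frozenset()):
    """Order interval genes by a hypothetical count of supporting evidence types."""
    def score(g):
        return (int(g.has_cis_eqtl)
                + int(g.missense_snps > 0)
                + int(bool(g.go_terms & keywords)))
    hits = genes_in_interval(genes, chrom, start_mb, end_mb)
    return sorted(hits, key=lambda g: (-score(g), g.start_mb))
```

Ties are broken by position so the output stays stable and easy to compare with a genome browser view of the interval.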
QTLminer substantially speeds up discovery of the most promising candidate genes within a QTL region.
The analysis of expression quantitative trait loci (eQTL) is a potentially powerful way to detect transcriptional regulatory relationships at the genomic scale. However, eQTL data sets often go underexploited because legacy QTL methods are used to map the relationship between the expression trait and genotype. Often these methods are inappropriate for complex traits such as gene expression, particularly in the case of epistasis.
Here we compare legacy QTL mapping methods with several modern multi-locus methods and evaluate their ability to produce eQTL that agree with independent external data in a systematic way. We found that the modern multi-locus methods (Random Forests, sparse partial least squares, lasso, and elastic net) clearly outperformed the legacy QTL methods (Haley-Knott regression and composite interval mapping) in terms of biological relevance of the mapped eQTL. In particular, we found that our new approach, based on Random Forests, showed superior performance among the multi-locus methods.
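To show what a multi-locus method does differently from single-locus scans, the sketch below fits a lasso model of one expression trait on all markers jointly, using cyclic coordinate descent with soft-thresholding. It is a minimal pure-Python illustration of the lasso only; the penalty `lam` is a hypothetical value, and in practice it would be chosen by cross-validation with an established package rather than this code.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma (the lasso proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by coordinate descent.

    X: list of rows (strains) of marker genotypes; y: expression trait.
    Columns and response are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xb, with b starting at zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:  # monomorphic marker: cannot explain anything
                continue
            # Correlation of marker j with the partial residual (j's own fit added back).
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Markers with a zero coefficient are dropped from the model, so the non-zero entries of `beta` are the mapped eQTL; elastic net adds an L2 term to this update, and Random Forests replace the linear model altogether.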
Benchmarks based on the recapitulation of experimental findings provide valuable insight when selecting the appropriate eQTL mapping method. Our battery of tests suggests that Random Forests map eQTL that are more likely to be validated by independent data, when compared to competing multi-locus and legacy eQTL mapping methods.
XGAP, a software platform for the integration and analysis of genotype and phenotype data.
We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.
The integration of information present in many disparate biological databases represents a major challenge in biomedical research. To define the problems and needs, and to explore strategies for database integration in mouse functional genomics, we consulted the biologist user community and implemented solutions to two user-defined use-cases.
We organised workshops and meetings and used a questionnaire to identify the needs of biologist database users in mouse functional genomics. As a result, two use-cases were developed that can be used to drive future designs or extensions of mouse databases. Here, we present the use-cases and describe some initial computational solutions for them. The application for the gene-centric use-case, "MUSIG-Gen", starts from a list of gene names and collects a wide range of data types from several distributed databases in a "shopping cart"-like manner. This iterative, user-driven approach responds to strongly articulated requests from users, especially those without computational biology backgrounds. The application for the phenotype-centric use-case, "MUSIG-Phen", is based on a similar concept: starting from phenotype descriptions, it retrieves information on the associated genes.
The use-cases created, and their prototype software implementations should help to better define biologists' needs for database integration and may serve as a starting point for future bioinformatics solutions aimed at end-user biologists.
Many investigations have reported the successful mapping of quantitative trait loci (QTLs) for gene expression phenotypes (eQTLs). Local eQTLs, where expression phenotypes map to the genes themselves, are of especially great interest, because they are direct candidates for previously mapped physiological QTLs. Here we show that many mapped local eQTLs in genetical genomics experiments do not reflect actual expression differences caused by sequence polymorphisms in cis-acting factors changing mRNA levels. Instead they indicate hybridization differences caused by sequence polymorphisms in the mRNA region that is targeted by the microarray probes. Many such polymorphisms can be detected by a sensitive and novel statistical approach that takes the individual probe signals into account. Applying this approach to recent mouse and human eQTL data, we demonstrate that indeed many local eQTLs are falsely reported as “cis-acting” or “cis” and can be successfully detected and eliminated with this approach.
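The core idea of using individual probe signals can be illustrated with a simple heuristic: if a local eQTL is caused by a polymorphism in one probe's target sequence, the genotype effect will be concentrated in that probe rather than shared by all probes of the probe set. The sketch below flags probes whose allele effect deviates strongly from the median effect of the probe set; the threshold `min_gap` is hypothetical, and this heuristic stands in for, and is not, the statistical approach described above.

```python
from statistics import mean, median


def probe_effects(probe_signals, genotypes):
    """Allele effect per probe: mean log2 signal in 'B' strains minus mean in 'D' strains."""
    effects = {}
    for probe, values in probe_signals.items():
        b = [v for v, g in zip(values, genotypes) if g == "B"]
        d = [v for v, g in zip(values, genotypes) if g == "D"]
        effects[probe] = mean(b) - mean(d)
    return effects


def flag_probe_artifact(probe_signals, genotypes, min_gap=1.0):
    """Return probes whose effect differs from the probe-set median by at least min_gap."""
    effects = probe_effects(probe_signals, genotypes)
    typical = median(effects.values())
    return [p for p, e in effects.items() if abs(e - typical) >= min_gap]
```

In the test data, averaging all four probes would suggest a genotype effect of about 0.75 for the whole probe set, although three of the four probes show essentially none; that is the pattern of a hybridization artifact rather than a true cis-acting expression difference.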
The Affymetrix GeneChip technology uses multiple probes per gene to measure its expression level. Individual probe signals can vary widely, which hampers proper interpretation. This variation can be caused by probes that do not properly match their target gene or that match multiple genes. To determine the accuracy of Affymetrix arrays, we developed an extensive verification protocol for mouse arrays, incorporating the NCBI RefSeq, NCBI UniGene Unique, NIA Mouse Gene Index, and UCSC mouse genome databases.
Applying this protocol to Affymetrix Mouse Genome arrays (the earlier U74Av2 and the newer 430 2.0 array), the proportion of sequence-verified probes with perfect matches was at least 85% and 95%, respectively, and for 74% and 85% of the probe sets all probes were sequence-verified. The latter percentages increased to 80% and 94% after discarding one or two unverifiable probes per probe set, and further to 84% and 97% when, in addition, one or two mismatches between probe and target gene were allowed. Similar results were obtained for other mouse arrays, as well as for human and rat arrays. Based on these data, refined chip definition files for all arrays are provided online. Researchers can choose the version appropriate for their study to (re)analyze expression data.
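The verification step, with or without tolerated mismatches, can be sketched as a sliding-window comparison of each probe against reference transcript sequences. The sketch assumes probe and transcript sequences are given on the same strand and uses brute-force matching for clarity; a real protocol over whole databases would use an indexed aligner.

```python
def best_mismatches(probe, target):
    """Fewest mismatches of probe against any same-length window of target (None if target is shorter)."""
    k = len(probe)
    best = None
    for start in range(len(target) - k + 1):
        mm = sum(1 for a, b in zip(probe, target[start:start + k]) if a != b)
        if best is None or mm < best:
            best = mm
            if best == 0:  # perfect match found; no need to scan further
                break
    return best


def verified_fraction(probes, transcripts, max_mismatches=0):
    """Fraction of probes matching at least one transcript with <= max_mismatches mismatches."""
    ok = 0
    for probe in probes:
        hits = [best_mismatches(probe, t) for t in transcripts]
        hits = [h for h in hits if h is not None]
        if hits and min(hits) <= max_mismatches:
            ok += 1
    return ok / len(probes) if probes else 0.0
```

Raising `max_mismatches` from 0 to 1 or 2 mirrors the relaxed criterion described above, under which the share of fully verified probe sets increased.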
The accuracy of Affymetrix probe sequences is higher than previously reported, particularly on newer arrays. Yet, refined probe set definitions have clear effects on the detection of differentially expressed genes. We demonstrate that the interpretation of the results of Affymetrix arrays is improved when the new chip definition files are used.