The rapid decline in the cost of dense genotyping is paving the way for new DNA sequence-based laboratory tests to move quickly into clinical practice, and to ultimately help realize the promise of ‘personalized’ therapies. These advances are based on the growing appreciation of genetics as an important dimension in science and the practice of investigative pharmacology and toxicology. On the clinical side, both the regulators and the pharmaceutical industry hope that the early identification of individuals prone to adverse drug effects will keep advantageous medicines on the market for the benefit of the vast majority of prospective patients. On the environmental health protection side, there is a clear need for better science to define the range and causes of susceptibility to adverse effects of chemicals in the population, so that the appropriate regulatory limits are established. In both cases, most of the research effort is focused on genome-wide association studies in humans where de novo genotyping of each subject is required. At the same time, the power of population-based preclinical safety testing in rodent models (e.g., mouse) remains to be fully exploited. Here, we highlight the approaches available to utilize the knowledge of DNA sequence and genetic diversity of the mouse as a species in mechanistic toxicology research. We posit that appropriate genetically defined mouse models may be combined with the limited data from human studies to not only discover the genetic determinants of susceptibility, but to also understand the molecular underpinnings of toxicity.
One of the goals of mechanistic toxicology is the discovery of the biochemical mechanisms underlying toxicity responses in humans. This research leads to the discovery of genes and polymorphisms responsible for toxicity and an understanding of how molecular pathways are altered in response to xenobiotic exposure. Recent advances in high-throughput methods of gathering biological data and in computational power to manipulate these datasets have changed the manner in which this research can be conducted. While toxicology will always rely upon systematic in vitro and in vivo investigation of model systems, the generation of new hypotheses regarding the genetic causes of toxicity has benefited greatly from the sequencing of mammalian genomes and the invention of gene-expression microarrays. The sequencing of several mammalian genomes, including mouse and human, has opened up new methods for investigating the genetic basis of toxicity responses. Gene-expression microarrays have produced a global view of transcriptional changes in response to xenobiotic exposure and have supported the discovery of gene clusters, which may be part of the same biological pathway, that are related to injury. The combination of these two developments has opened up new approaches to the understanding of toxicity as it is affected by genetic variability.
Recent advances in technologies that permit human genome-wide association studies (GWAS) with more than 1 million single nucleotide polymorphisms (SNPs) have enabled the identification of genetic variants associated with important diseases. In the past few years, more than 400 GWAS have been published, establishing a knowledge base that links hundreds of genetic variants to complex human diseases, as well as providing valuable insights into the complexities of their genetic architecture. These studies have not been as successful as hoped in predicting individual risk of developing a disease, but they have succeeded in identifying plausible molecular causes underlying polygenic diseases and traits. It is noteworthy that GWAS have ‘found’ many genes that were already known to be important in the pathogenesis of the relevant diseases.
With regard to adverse drug reactions, a number of important advances have also been made over the past decade with the help of GWAS approaches. GWAS have identified gene targets for approved drugs, including thiazolidinediones and sulfonylureas, statins and estrogens [8,9]. The association between MHC alleles, especially HLA-B*5701, and susceptibility to adverse drug reactions manifesting through a diverse set of clinical phenotypes is one of the most intriguing stories in pharmacogenomics [10-12]. Genetic screening is now recommended for a number of drugs on the market with known adverse drug reactions, and the introduction of new clinical tests is likely to intensify as the ongoing trials make their way to peer review.
Although GWAS are increasingly popular and affordable, they present a number of formidable logistical and technical challenges to the conduct of the studies and in the interpretation of the results . These include the challenge of selecting a well-defined disease or trait suitable for analysis, the requirement for sufficiently large sample sizes, and the fact that most common variants, individually or in combination, confer relatively small increments in risk (up to 1.5-fold) and explain only a small proportion of heritability [5,14]. Human-only GWAS are also likely to remain expensive in terms of recruitment and characterization (both phenotyping and genotyping) of the study cohorts.
The limitations of human GWAS studies may be alleviated, at least partially, by the use of appropriate genetically-defined model systems. Cell-based models have been extensively used in preclinical drug development for years as a means to evaluate drug-induced toxicity or to identify interactions of target compounds with drug-metabolizing enzymes and transporter proteins. Importantly, the availability of a large bank of commercially available and densely genotyped lymphoblastoid cell lines from the Centre d’Etude du Polymorphisme Humain (Paris, France) and Coriell Institute for Medical Research (NJ, USA) shows promise for in vitro pharmacogenetics research . Rats are commonly used in drug safety evaluation experiments, and the genetically diversified inbred rat strains are being developed  and used for population-based toxicity testing . However, among the mammalian organism-based laboratory models, the mouse offers an unparalleled wealth of genetic knowledge and resources, among which is a high-density SNP database encompassing more than 8 million polymorphic loci across hundreds of inbred strains [18,19].
Mouse genetic studies can compensate for many shortcomings of both in vitro and human-only approaches in pharmacogenomics. For example, collection of tissues from a wide variety of anatomical sites (e.g., brain and heart) or developmental stages is problematic in humans, as are many experimental interventions. A successful GWAS analysis is more likely when the phenotype of interest can be sensitively and specifically defined and measured, which is usually not a limitation in a mouse system. With regard to the environmental health sciences and toxicology, controlled exposure of people to environmental toxicants is often ethically unacceptable, which makes it challenging to interpret genetic associations produced in human cohorts exposed in the occupational or environmental setting without validation in animal studies.
In vivo toxicity screens and mechanistic studies are often carried out in a single strain of mouse [20-22]. This is done in order to fix as many variables as possible and has the benefit of standardizing the genotype across multiple chemicals. While this approach provides mechanistic information regarding toxicant activity in a single genetic background, the reality of human toxicity is more complex, including both diverse genetic backgrounds and uncontrolled environmental effects. The interpretation of the data with respect to the population-wide effects is plagued by the largely inaccurate generalizations from a single genome; inability to distinguish small and biologically important changes from background variation; ineffective exploitation of reproducible genetic variation to dissect differential response to chemical exposure; and inefficient use of defined genetic backgrounds to model particular phenotypic profiles observed in human populations. To address these important limitations, various animal models are being used by toxicologists to assess gene–environment interactions and determine genetic causes of interindividual variability in toxicity. Indeed, genetic background is an important component of toxicity responses [23,24], and a successful in vivo approach to modeling the effects of genetic diversity on toxicity will improve both the prediction of toxicity in humans as well as the identification of sensitive subpopulations.
Panels of genetically defined animals that provide a fixed genotype within a particular strain but encompass great genetic diversity across strains are being used in biomedical research. Both standard intercrosses between inbred lines of mice and large populations of inbred strains have been used as powerful tools for mapping quantitative trait loci (QTL). Inbred mouse strains represent fixed, renewable genotypes that are ideally suited for systems biology approaches to improve our understanding of the mechanisms of toxicity and to discover new biomarkers associated with biological responses to toxicant exposure. Panels of inbred mouse strains are also well suited for identifying whole-genome response signatures indicative of chemical exposure because of the large knowledge base on the genetic lineage for hundreds of strains, and because the number and distribution of genetic polymorphisms among mouse strains equal or exceed those in the human population [26,27]. This approach also has the added advantage of ‘repeat testing’ in genetically identical individuals within a given strain, yielding important information regarding reproducibility of the response.
When a research study into the genetic basis of toxicity is initiated, the responsible genes being sought could lie anywhere in the genome. A forward genetics approach, in which one starts from the toxicity phenotype and searches for the genes responsible, is a reasonable approach to this problem. The first step is to search for evidence that the responsible genes lie in certain chromosomal regions by detecting correlations between the toxicity phenotype and genotype at loci throughout the genome. This is often carried out using QTL mapping [28,29]. This involves selecting or breeding a genetically segregating population, such as a backcross or intercross between two inbred parental strains that demonstrate quantitative variation in the toxicity phenotype of interest. The quantitative phenotype, such as liver histology score or serum alanine aminotransferase levels, is measured in each individual, which is then genotyped at a number of genetic markers across the genome. A statistic of association, such as a likelihood ratio statistic or a test statistic from a linear model, is then calculated between the phenotypic values and each marker.
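As an illustration only, the marker-by-marker scan described above can be sketched in a few lines of Python. The sketch assumes a backcross in which each genotype is coded 0 or 1, regresses the phenotype on the genotype code at each marker, and converts the fit to a LOD score as LOD = (n/2) log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the marker. The function names, marker names and phenotype values are hypothetical and are not taken from any study cited here.

```python
import math


def lod_score(genotypes, phenotypes):
    """LOD score for one marker from a simple linear regression.

    genotypes: 0/1 genotype codes, one per animal (backcross assumed).
    phenotypes: quantitative trait values in the same animal order.
    """
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)  # null model: mean only
    if sxx == 0 or rss0 == 0:
        return 0.0  # marker or trait does not vary; no evidence either way
    rss1 = max(rss0 - sxy ** 2 / sxx, 1e-12)  # model with the marker effect
    return (n / 2) * math.log10(rss0 / rss1)


def genome_scan(marker_genotypes, phenotypes):
    """Return {marker name: LOD score} for every marker."""
    return {name: lod_score(g, phenotypes) for name, g in marker_genotypes.items()}


# Hypothetical data: eight backcross animals, two markers, one trait.
phenotypes = [10, 12, 11, 9, 20, 22, 21, 19]
markers = {
    "m1": [0, 0, 0, 0, 1, 1, 1, 1],  # genotype tracks the trait
    "m2": [0, 1, 0, 1, 0, 1, 0, 1],  # genotype unrelated to the trait
}

scan = genome_scan(markers, phenotypes)
best_marker = max(scan, key=scan.get)
```

In this toy example the scan points to marker m1. Dedicated QTL mapping software additionally handles missing genotypes, positions between markers and covariates, all of which this sketch ignores.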
The diversity of the genotypes archived in different inbred mouse strains is ideally suited to identify and dissect genetic susceptibility in responses to toxicant exposure. To advance the understanding of health risks posed by toxicants, and the role that genetic diversity plays in determining the variability of responses between individuals and species, panels of inbred mouse strains can be used to demonstrate the benefits afforded by combining mechanistic toxicology with genetics.
A translational study, whereby candidate genes for susceptibility to toxicant-induced liver injury were discovered in a mouse population and subsequently validated in two independent human cohorts, was recently demonstrated using a well-known liver toxicant, acetaminophen . A traditional ‘human-only’ approach to a genome-wide pharmacogenetic investigation into the genetic factors linked to liver toxicity of acetaminophen would require a much larger number of individuals to overcome statistical power limitations . Conversely, a so-called ‘candidate gene’ analysis  is equally challenging owing to the complexity of the mechanism of toxicity and the inherent biases in candidate gene selection . By utilizing a mouse model for acetaminophen-induced liver toxicity, whole-genome association analysis and targeted sequencing, polymorphisms in Ly86, Cd44, Cd59a and Capn8 were identified that correlate strongly with liver injury . Furthermore, these candidates were validated in two independent human cohorts where volunteers were exposed to the maximum recommended doses of acetaminophen. This study demonstrated that variation in the orthologous human gene, CD44, is associated with susceptibility to acetaminophen. Interestingly, well-characterized genes known to be essential for acetaminophen toxicity did not correlate with liver injury in the panel of mouse strains. This finding suggests that while a priori knowledge of the toxicant’s mode of action can be useful in the selection of genes for follow-up analysis, validation of susceptibility-modulating genes in the laboratory is essential. This finding also illustrates the important difference between genes involved in mechanistic pathways leading to toxicant-induced injury and genes whose variants contribute to interindividual differences in susceptibility to toxicity, two areas that have potentially different gene sets.
It is noteworthy that the top candidate genes derived from the analysis of the mouse population were related to the immune response, and not to metabolism and detoxification of acetaminophen. Interestingly, in several cytokine knockout mouse studies of acetaminophen toxicity, the sensitivity to liver necrosis was also largely independent of glutathione depletion. The traditional view on the mechanisms of toxicity, the approach widely utilized to predict individual responses to toxicants, would imply that the metabolism of acetaminophen to the reactive electrophile and/or its subsequent detoxification by glutathione conjugation should explain, at least to a considerable degree, the variability in responses. However, no apparent correlation between the levels of major metabolizing enzymes, glutathione, or acetaminophen plasma levels and liver injury was observed in the mouse population. Similarly, no correlation was found between sensitivity and polymorphisms in the genes encoding catalase or CYP2E1, implying that variation at these key mediators of acetaminophen toxicity probably does not contribute to differential susceptibility.
It is worth noting that the variations in drug exposure and metabolism profile have been shown to be common causes of difference in adverse effects of chemicals across species and strains. For example, the well-described human interindividual variability in the metabolism of warfarin, specifically the generation of 7-hydroxywarfarin, was reproduced in a panel of inbred mouse strains , and it was determined that the phenotypic differences were associated with the polymorphisms in the Cyp2c locus. In addition, a study that used liver microsomes isolated from the panel of mouse strains demonstrated that genetic variation in Cyp2b9 and Ugt1a loci played a role in the oxidative metabolism of α-hydroxytestosterone and glucuronidation of irinotecan, respectively . Thus, it should be emphasized that exposure levels should always be assessed when using populations of strains for genetic biomarker identification.
Not every adverse drug effect in humans may be genotype dependent; thus, a multistrain approach may also prove useful for understanding genotype-independent toxicity responses and facilitate the identification of novel targets of therapeutic intervention that will be effective in the entire population. When liver gene-expression levels were assessed across strains exposed to acetaminophen, it was determined that the genes associated with the level of liver necrosis, independent of the genetic background, were involved in cell death pathways and form a closely linked molecular network . This finding confirms a central role for cell death-inducing intracellular cell signaling in acetaminophen-induced liver toxicity [32,36].
The power of mouse-to-human translation studies that use mouse genetic tools has also been shown through the QTL analysis of pulmonary responses to the air pollutant ozone. Ozone causes highly reproducible changes in pulmonary function in humans, and significant interindividual variation in the responses has suggested that genetic background is an important determinant of susceptibility to ozone-induced toxicity. Similarly, significant variation in ozone-induced pulmonary injury and inflammatory responses has been found among inbred strains of mice [38-40]. Both F2 and backcross studies utilizing differentially responsive strains were used to discover a number of candidate QTLs for responsiveness to ozone [38,39]. These QTLs guided the selection of candidate genes and loci for validation in subsequent mouse studies [41-43]; in addition, two homologs of the mouse susceptibility genes (TNF and HLA-DR) have been associated with response to ozone in humans [44,45]. Similar QTL mapping approaches have been used to investigate many clinical phenotypes, including alcohol-related behavior, alcohol metabolism and iron transport.
These studies indicate that the use of an inbred mouse strain panel may be a useful tool for understanding the mode of action of toxic agents and for identifying nodes in the complex molecular events that may confer susceptibility to adverse events. When genotype-dependent chemical-induced toxicity in the human population is identified or suspected, this approach has several benefits. First, potential genetic biomarkers may be developed to prescreen individuals prior to therapeutic drug treatment when potential adverse drug events are suspected. If the genes associated with differential susceptibility to toxicity are identified in a preclinical phase, the subsequent pharmacogenetics research may be focused on a few candidates to help overcome the challenge of small cohort size in human studies and to shorten the validation period. The data acquired with this model could therefore be influential in the analysis of individual risk from chemicals and may facilitate both drug development and human safety endeavors. At the same time, such an approach may not be fruitful in safety assessment of experimental drug candidates for which the risk in humans has yet to be determined.
Second, the genetic variation among individuals is reflected in variation in gene-expression levels [49,50], which introduces additional challenges into toxicology research on biomarkers of effect. While major research efforts are seeking genetic and genomic markers that could identify individuals susceptible to toxicity, less attention is given to the fact that genetic control of gene expression may present a challenge for finding robust population-wide expression biomarkers of toxicity responses . Indeed, it is seldom appreciated in the analysis of gene-expression data that the genetic difference between individuals is by far the strongest effect on global gene expression at both basal levels and even when a considerable amount of tissue damage is present . Thus, a careful evaluation of gene-expression-based biomarkers of response through multistrain experiments can avert the risk of mistakenly identifying large genotype effects in a particular strain of animals used for toxicity testing as the effects of treatment.
Determining which of the multitude of variants carried by an individual are responsible for a given phenotype represents a massive task, especially if the causal alleles are relatively anonymous in terms of known functional consequences. The best approaches for combining functional credibility and statistical support in the evaluation of such variants remain to be determined. GWAS tend to focus almost exclusively on statistical evidence and give lesser weight to considerations of biological plausibility, but the challenges of finding causal associations among the large number of rare variants may prompt a more careful examination of the underlying biology .
Toxicogenomics has been used at all stages of chemical risk assessment, and it is thought that gene-expression changes may be utilized as biomarkers of adverse effects . Current approaches often attempt to classify compounds with the goals of predicting adverse responses to specific chemical classes , understanding the underlying biological mechanism of toxicity , or identifying key nodes in the toxicity pathway that may serve as effect biomarkers . Extensive proprietary [56,57] and public  databases containing gene-expression and pathological end points derived from rodent and human tissues exposed to a variety of chemicals have been developed, thereby allowing the scientific community to mine the data for biomarkers.
Gene expression QTL (eQTL) mapping is one of the modern tools that support the evaluation of associations between transcript expression and genotype in order to find genomic locations that are likely to regulate transcript expression. The availability of gene-expression and high-density genotype data has enabled eQTL mapping in animal and human populations. These analyses have contributed significantly to our understanding of the effects that genetic polymorphisms may have on interindividual variability in normal physiological processes, in multiple tissues, and in both animals and humans [50-61]. Furthermore, these studies have shown that genetic regulation of gene expression is a key contributor to population diversity, and is being realized not only through transcription factors and subtle variations in sequence of their response elements, but also through previously unknown mechanisms. While eQTL mapping is clearly an important new frontier in the application of ’omics technologies to biomedical research, no current approaches are available for the evaluation of the potential role of eQTLs in the response to environmental exposures or the pathogenesis of common diseases.
Early eQTL studies surveyed natural variation in crosses of model organisms such as budding yeast [62,63] and mouse. In the mouse, two inbred parental strains were selected and bred in either a backcross or intercross design. All progeny mice were genotyped at a density sufficient to distinguish all recombination blocks, and microarrays were used to measure transcript expression. Previous studies reported significant numbers of eQTL (~9% of the transcripts surveyed) and demonstrated that there are genomic loci that contain more eQTL than expected by chance. These eQTL ‘hotspots’ are thought to regulate the expression levels of dozens of transcripts and have been observed in several tissues in the mouse [49,50,59]. By examining the genes that lie beneath the eQTL hotspot, investigators can propose regulatory candidates for the transcripts with eQTL at the hotspot. eQTL studies also commonly identify both cis-acting eQTL, for which a transcript’s eQTL is located near the transcript itself in the genome, and trans-acting eQTL, for which a transcript’s eQTL is located far from the transcript. It has been hypothesized that cis-eQTL are caused by polymorphisms in regulatory regions close to the transcript itself, whereas trans-eQTL are caused by polymorphisms in distant genes that affect transcript expression, either directly or indirectly, in an allele-specific manner.
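The distinction between cis- and trans-acting eQTL, and the idea of counting trans-eQTL per genomic interval to look for hotspots, can be illustrated with a minimal Python sketch. The 5-Mb cis window, the 10-Mb bin size, the record layout and the transcript names are all assumptions made for illustration; the studies discussed here do not prescribe these values.

```python
from collections import Counter

CIS_WINDOW_BP = 5_000_000    # hypothetical cutoff for calling an eQTL 'cis'
HOTSPOT_BIN_BP = 10_000_000  # hypothetical bin size for counting trans-eQTL


def classify_eqtl(transcript_chr, transcript_pos, eqtl_chr, eqtl_pos,
                  window_bp=CIS_WINDOW_BP):
    """Label an eQTL 'cis' if it lies on the same chromosome as the
    transcript and within window_bp of it; otherwise label it 'trans'."""
    same_chromosome = transcript_chr == eqtl_chr
    if same_chromosome and abs(transcript_pos - eqtl_pos) <= window_bp:
        return "cis"
    return "trans"


def trans_hotspot_counts(eqtls, bin_bp=HOTSPOT_BIN_BP):
    """Count trans-eQTL per (chromosome, bin index) of the eQTL location."""
    counts = Counter()
    for e in eqtls:
        label = classify_eqtl(e["transcript_chr"], e["transcript_pos"],
                              e["eqtl_chr"], e["eqtl_pos"])
        if label == "trans":
            counts[(e["eqtl_chr"], e["eqtl_pos"] // bin_bp)] += 1
    return counts


# Hypothetical eQTL records (positions in base pairs).
example_eqtls = [
    {"transcript": "T1", "transcript_chr": "chr1", "transcript_pos": 10_000_000,
     "eqtl_chr": "chr1", "eqtl_pos": 12_000_000},
    {"transcript": "T2", "transcript_chr": "chr3", "transcript_pos": 50_000_000,
     "eqtl_chr": "chr7", "eqtl_pos": 45_000_000},
    {"transcript": "T3", "transcript_chr": "chr5", "transcript_pos": 20_000_000,
     "eqtl_chr": "chr7", "eqtl_pos": 48_000_000},
    {"transcript": "T4", "transcript_chr": "chr1", "transcript_pos": 10_000_000,
     "eqtl_chr": "chr1", "eqtl_pos": 90_000_000},
]

counts = trans_hotspot_counts(example_eqtls)
```

Here transcripts T2 and T3 both map to the same bin on chromosome 7, which is the kind of clustering that would be examined as a candidate hotspot. Whether such a cluster exceeds what is expected by chance is a separate question that requires a significance test.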
Indeed, while the study of individual genes is informative and can improve our understanding of the causes of differential toxicity in populations, a broader approach that focuses on gene networks and biological pathways may produce more interpretable results. eQTL mapping can be used to generate hypotheses regarding transcriptional regulation and can be integrated with gene coexpression data to discover gene networks or pathways that are associated with a clinical trait. For example, transcript expression data in the livers of a panel of C57BL/6J × DBA/2J F2 mice was combined with obesity data, and eQTL mapping was used to identify causal gene-expression networks [65,66]. eQTL data may also be combined with estimates of transcription factor activity to infer causal relationships between transcription factors and clusters of eQTL genes . Other methods to infer causality between regulatory candidate genes under eQTL hotspots and the trans-regulated genes that map to the eQTL locus have also been proposed to assist in narrowing the list of candidate genes for further biological investigation [64,68].
Network-based approaches have been used in research into Type I diabetes and heart disease, and have shown the power of integrating human data with data derived from mouse models. A GWAS in a large human population proposed the receptor tyrosine kinase ERBB3 as the best candidate gene near a QTL for Type I diabetes. Separate work that examined liver gene expression in a smaller cohort of human samples with and without Type I diabetes found that ERBB3 did not have a cis-eQTL but that a flanking gene, RPS26, did. Since the disease phenotype and RPS26 both had QTLs in the same location, this suggested that RPS26 was a stronger candidate than ERBB3. The authors then used mouse liver and adipose expression data from several mouse crosses to construct causal expression networks for the ERBB3 and RPS26 orthologs in the mouse. They were able to show that ERBB3 is not associated with any known Type I diabetes genes, whereas RPS26 is associated with a network of several genes that are part of the Kyoto Encyclopedia of Genes and Genomes (KEGG) Type I diabetes pathway. This type of analysis demonstrates the power of combining human and mouse data with a network-based approach that has been proposed for use in drug discovery and may prove useful in toxicology studies.
It should be noted that the accuracy of cis-eQTL detection has been called into question owing to the possibility of SNPs residing within the sequence queried by microarray probes [64,71,72]. Microarray probes for mice are designed based on the reference sequence of C57BL/6J. Transcripts in other strains with polymorphisms in the probe sequence will bind with lower affinity than the C57BL/6J transcript, giving the false appearance of allele-specific expression levels associated with the transcript location, which is the defining characteristic of a cis-eQTL. Studies in which shorter, 25-nucleotide microarray probes are used appear to be more strongly affected than studies that use longer, 50- to 60-nucleotide probes [64,72]. This is consistent with the hypothesis that a SNP within a probe sequence will affect shorter probes more strongly than longer probes. The validity of eQTL hotspots has also been questioned [73,74] owing to the possibility that sets of highly correlated genes will naturally cluster over the same genomic marker. Furthermore, sets of highly correlated genes are likely to be part of the same gene-ontology category, and so when gene-ontology category enrichment is conducted on eQTL hotspots, they are likely to appear (falsely) biologically coherent. A permutation strategy, in which the sample labels are permuted and the expression labels are held fixed, has been suggested to address this problem [74,75]. False eQTL hotspots may also arise owing to intersample correlation, and a mixed-model approach has been shown to eliminate spurious eQTL hotspots in mouse data.
Another matter of concern in eQTL mapping studies is how to control for the massive multiple testing involved. There are two levels of multiple testing in an eQTL study: testing across correlated SNPs and testing across multiple correlated transcripts. Multiple testing across SNPs may be addressed by permuting the sample labels in the genotype data while holding the expression data fixed. Multiple testing across genes may be addressed using approaches based on the false-discovery rate.
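The two levels of correction can be sketched as follows, as one possible reading of the strategy rather than the exact procedure of the cited studies. For the SNP level, the sample order of the genotype data is shuffled jointly across all markers while expression is held fixed, and the genome-wide maximum of an association statistic is recorded for each permutation; the squared correlation used here is a stand-in chosen for brevity. For the transcript level, the Benjamini-Hochberg step-up procedure is used as one common false-discovery-rate method. All names, data values and default parameters are hypothetical.

```python
import math
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return min(1.0, sxy ** 2 / (sxx * syy))


def permutation_threshold(marker_genotypes, expression, n_perm=200,
                          alpha=0.05, seed=0):
    """Genome-wide threshold for one transcript: the (1 - alpha) quantile of
    the maximum statistic over markers, with genotype sample labels permuted
    jointly (preserving correlation between markers) and expression fixed."""
    rng = random.Random(seed)
    n = len(expression)
    maxima = []
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        maxima.append(max(r_squared([g[i] for i in order], expression)
                          for g in marker_genotypes.values()))
    maxima.sort()
    index = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    at false-discovery rate q by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical data: eight animals, two markers, one transcript.
expression = [10, 12, 11, 9, 20, 22, 21, 19]
markers = {
    "m1": [0, 0, 0, 0, 1, 1, 1, 1],
    "m2": [0, 1, 0, 1, 0, 1, 0, 1],
}
threshold = permutation_threshold(markers, expression, n_perm=200, seed=1)
```

An observed statistic for a transcript would be declared significant at the SNP level only if it exceeds the permutation threshold; the resulting per-transcript p-values could then be passed to the false-discovery-rate step. With only eight animals the permutation distribution is coarse, so a realistic analysis would need far more samples and permutations.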
Recognition of the challenges of currently available laboratory animal-based genetics resources led to the realization that a new general-purpose mouse population was needed to model complex human diseases, with particular emphasis on traits relevant to human health in its broadest aspects. Open discussion among members of the genetics community resulted in the conception and design of the ‘Collaborative Cross’ (CC) [77,78]. Establishment of this new mouse-based resource will considerably expedite gene discovery and characterization and serve as a powerful complement to ongoing studies in human genetics.
The CC provides a translational platform for systems genetics that integrates classical genetics and systems biology tools to identify genetic networks that underlie complex phenotypes. A pre-requisite for systems genetics is a realistic experimental population structure, which is essential to unravel the complex biological processes that may differ from one individual to another, such as cancer susceptibility or the response to an environmental exposure. The CC population was created through a community effort by the Complex Trait Community . This resource will obviate the need for researchers to produce ephemeral backcross or F2 populations; it will need to be genotyped only once; it will archive thousands of recombination events and millions of genetic polymorphisms; and it will facilitate international and intergenerational comparisons of genetic effects.
The CC satisfies four essential criteria for an optimal experimental platform that can support systems genetic studies :
The overall design of the CC consists of eight founder strains (A/J, C57BL/6J, 129S1/SvImJ, NOD/LtJ, NZO/HiLtJ, CAST/Ei, PWK/PhJ and WSB/EiJ) bred through an eight-way ‘funnel’ breeding design established to randomly mix the variation present in the founder strains before inbreeding by brother–sister mating. The founder strains were selected from a set of over 100 strains in order to maximize genetic diversity and utility for studying traits of widespread interest. The eight CC founder strains capture approximately 90% of the known allelic diversity across all 1-Mb intervals spanning the entire mouse genome; compare this with AXB/BXA and BXD, the two most commonly used mouse recombinant inbred panels, which each capture only approximately 13% of the known variation, with much of it overlapping. Furthermore, the population of emergent CC lines has a much more random distribution of genetic variation than existing panels of inbred strains such as the mouse phenome panel [80,81], which has considerable genetic linkage across chromosomes that results in high rates of false-positive associations. Since the CC strains have a population structure that randomizes existing genetic variation, this resource will provide unparalleled power to assign causality and to understand the intricacies of biological networks underlying disease and toxicant response. The types, distribution and frequency of genetic polymorphisms are close to those in human populations, and the fraction of genetic diversity captured in CC lines is far greater than in other commonly used mouse populations. Importantly, preliminary phenotypic characterization of pre-CC strains indicates that a very large variability exists within the CC population following changes in environmental conditions (e.g., diet and exercise). The recombination, inbreeding rates and statistical power of this novel cross have been examined by others and found to be optimal for systems genetics applications [82,83].
The authors were supported by research grants (P42 ES005948, R01 ES015241), and the Intramural Research Program of the National Institute of Environmental Health Sciences.
Financial & competing interests disclosure: The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.