This white paper by eighty members of the Complex Trait Consortium presents a community’s view on the approaches and statistical analyses that are needed for the identification of genetic loci that determine quantitative traits. Quantitative trait loci (QTLs) can be identified in several ways, but is there a definitive test of whether a candidate locus actually corresponds to a specific QTL?
Much of the genetic variation that underlies disease susceptibility and morphology is complex and is governed by loci that have quantitative effects on the phenotype. Gene-gene and gene-environment interactions are common and make these loci difficult to analyse. Here, we present a community’s view on the steps that are necessary to identify genetic loci that govern quantitative traits, along with a set of interpretive guidelines. This community mostly represents interests in the analyses of rodent quantitative trait loci (QTLs), although many of the same principles apply to other species. With the development of new genetic techniques and with more information about the mammalian genome, we are confident that QTLs will become easier to identify and will provide valuable information about normal development and disease processes.
At the first international meeting of the Complex Trait Consortium (CTC) (Box 1) in Memphis, Tennessee, United States (May 2002), the attendees decided that a document should be written to reflect the view of the community on the definition, mapping and identification of QTLs as a means to identify the molecular players that underlie complex phenotypes. Several distinct views have been presented in the literature on the definition and mapping of QTLs1-7. In light of the controversies raised by some of these publications, the CTC held an open discussion of these issues through e-mail over an eight-month period (see links in online links box). This ‘white paper’ is an attempt to form a consensus view and aims to provide the larger scientific community with a realistic set of standards that can be applied to studies that involve QTLs. We intend these criteria to be sufficiently flexible and pragmatic to accommodate studies with a range of different scopes and objectives. Although other papers have been written on this subject and similar views to those expressed here have been voiced, this is the first attempt to develop these ideas from the point of view of a community.
The Complex Trait Consortium (CTC) is an international group of investigators who study the genetics of complex traits in model organisms such as rodents. The following authors are members of the CTC who have contributed to the writing of this document and have agreed with its content (members are listed in alphabetical order and author affiliations are detailed in Online box 1): Oduola Abiola, Joe M. Angel, Philip Avner, Alexander A. Bachmanov, John K. Belknap, Beth Bennett, Elizabeth P. Blankenhorn, David A. Blizard, Valerie Bolivar, Gudrun A. Brockmann, Kari J. Buck, Jean-Francois Bureau, William L. Casley, Elissa J. Chesler, James M. Cheverud, Gary A. Churchill, Melloni Cook, John C. Crabbe, Wim E. Crusio, Ariel Darvasi, Gerald de Haan, Peter Demant, R. W. Doerge, Rosemary W. Elliott, Charles R. Farber, Lorraine Flaherty, Jonathan Flint, Howard Gershenfeld, John P. Gibson, Jing Gu, Weikuan Gu, Heinz Himmelbauer, Robert Hitzemann, Hui-Chen Hsu, Kent Hunter, Fuad A. Iraqi, Ritsert C. Jansen, Thomas E. Johnson, Byron C. Jones, Gerd Kempermann, Frank Lammert, Lu Lu, Kenneth F. Manly, Douglas B. Matthews, Juan F. Medrano, Margarete Mehrabian, Guy Mittleman, Beverly A. Mock, Jeffrey S. Mogil, Xavier Montagutelli, Grant Morahan, John D. Mountz, Hiroki Nagase, Richard S. Nowakowski, Bruce F. O’Hara, Alexander V. Osadchuk, Beverly Paigen, Abraham A. Palmer, Jeremy L. Peirce, Daniel Pomp, Michael Rosemann, Glenn D. Rosen, Leonard C. Schalkwyk, Ze’ev Seltzer, Stephen Settle, Kazuhiro Shimomura, Siming Shou, James M. Sikela, Linda D. Siracusa, Jimmy L. Spearow, Cory Teuscher, David W. Threadgill, Linda A. Toth, Ayo A. Toye, Csaba Vadasz, Gary Van Zant, Edward Wakeland, Robert W. Williams, Huang-Ge Zhang and Fei Zou.
The importance of QTLs to our understanding of disease processes should not be underestimated. Even though QTLs present challenges of discovery and analysis, they represent a fascinating biological phenomenon that is fundamental to our understanding of human variation. Without a doubt, they are responsible for most of the genetic diversity in human disease susceptibility and severity. Now that the human genome sequence is available8,9, we are entering an era in which the analysis of QTLs can be approached both experimentally and mathematically. For this reason, it is important to clearly state our goals and methods, as they have the potential to lead to exciting outcomes.
A quantitative trait is one that has measurable phenotypic variation owing to genetic and/or environmental influences. This variation can consist of discrete values, such as the number of separate tumours in the intestine of a cancer-prone mouse, or can be continuous, such as measurements of height, weight and blood pressure. Sometimes a threshold must be crossed for the quantitative trait to be expressed; this is common among complex diseases.
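The threshold case can be made concrete with the classical liability-threshold model, which is our illustrative choice rather than a method prescribed in this paper. The Python sketch below assumes an underlying, normally distributed liability and a fixed threshold above which the trait is expressed; the function name `incidence` and all parameter values are hypothetical.

```python
from statistics import NormalDist

def incidence(liability_mean: float, threshold: float, liability_sd: float = 1.0) -> float:
    """Fraction of individuals whose (normally distributed) liability exceeds the threshold."""
    return 1.0 - NormalDist(liability_mean, liability_sd).cdf(threshold)

# Hypothetical example: a threshold set so that about 5% of baseline animals are affected,
# and a QTL allele that shifts mean liability upward by 0.5 standard deviations.
threshold = NormalDist().inv_cdf(0.95)
baseline = incidence(0.0, threshold)
with_allele = incidence(0.5, threshold)
print(f"baseline incidence {baseline:.3f}, with allele {with_allele:.3f}")
```

Under these assumptions, a modest shift in the continuous liability produces a marked change in the frequency of an all-or-none phenotype.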
A QTL is a genetic locus, the alleles of which affect this variation. Generally, quantitative traits are multifactorial and are influenced by several polymorphic genes and environmental conditions, so one or many QTLs can influence a trait or a phenotype. It is important to remember that phenotypic variation can also be caused by environmental factors that are independent of genotype or through gene-environment interactions. Sometimes a cluster of closely linked polymorphic genes is responsible for the quantitative variation of a trait. These are difficult to separate by recombination events and therefore might be detected as one QTL. However, if distinct QTLs can be separated by genetic or functional means, each should be considered to be a separate QTL.
Two classic examples of quantitative traits are height and weight; loci that modulate these traits are therefore called QTLs. These traits can also be influenced by loci that have large discrete effects (often called Mendelian loci); for example, genes that cause dwarfism also affect height, but in a qualitative ‘all-or-none’ way. Moreover, the same locus might be considered to be both a QTL and a Mendelian locus depending on the alleles that are examined: some alleles might cause quantitative effects whereas others might cause all-or-none effects. Modifier loci that modulate the effects of a Mendelian locus can also be described as QTLs. For example, Mtap1a, which is a modifier of the mouse gene tubby (Tub), is considered to be a QTL6,10,11. This modifier alters the hearing of tub/tub mice, as detected by the auditory brainstem response (ABR) threshold. In an F2 population, the ABR measurements are distributed continuously and therefore this modifier qualifies as a QTL6,10,11.
The distinction between Mendelian loci and QTLs is artificial, as the same mapping techniques can be applied to both. In fact, the classification of genetic (and allelic) effects should be considered as a continuum. At one end of the spectrum is the dichotomous Mendelian trait with only two detectable and distinct phenotypes, which are governed by a single gene. At the other end are traits, such as growth, that are likely to be affected by many genes, each contributing a small portion of the overall phenotype. Between these two extremes are traits that are regulated by more than one genetic locus (and are possibly also influenced by environmental factors), which show several intermediate phenotypes. Generally, the more loci that are involved in determining a quantitative trait, the more difficult it is to map and identify all of the causative QTLs. When more than one QTL affects a particular trait, their effect sizes typically differ, ranging from strong to weak. The size and nature of these effects can also be influenced by the genetic background (the total genotype of the individual), and interactions between QTLs are common.
Coarse mapping is mapping to a chromosomal segment, usually within a range of 10-30 centimorgans (cM). The likelihood of success in QTL mapping depends on the heritability of the trait, its genetic nature (dominant, recessive or additive) and the number of genes that affect it. For a given QTL, the mapping resolution depends on the number of recombination events in the mapping population. Mapping QTLs with smaller effect sizes requires larger mapping populations. Mouse-breeding strategies that have led to successfully mapping QTLs include backcrosses and intercrosses, and have used advanced intercross lines12, recombinant inbred strains13,14, recombinant inbred-strain crosses7, heterogeneous stocks15, congenic strains13,16, recombinant congenic strains17, consomic strains18, near-isogenic lines19, recombinant QTL introgression strains20 and knockout/congenic strains21 (for recent reviews see refs 4,22). In general, strategies that increase the number of breakpoints in a mapping population provide higher mapping resolution but also require a larger number of animals to achieve significance for a given size of a QTL effect.
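To see why resolution tracks the number of recombination events, the sketch below estimates how many informative meioses fall within an interval of a given size. It is a simplification under stated assumptions: Haldane's map function (no crossover interference), perfect detection of every crossover, and one informative meiosis per backcross animal versus two per F2 animal. The function names are hypothetical.

```python
import math

def haldane_r(d_cm: float) -> float:
    """Recombination fraction for a map distance in cM under Haldane's map function
    (assumes no crossover interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def expected_recombinants(d_cm: float, n_animals: int, meioses_per_animal: int) -> float:
    """Expected number of meioses with a crossover between the markers flanking an interval.

    meioses_per_animal: 1 for backcross progeny, 2 for F2 intercross progeny
    (each F2 animal carries one gamete from each F1 parent).
    """
    return n_animals * meioses_per_animal * haldane_r(d_cm)

# Hypothetical comparison for a 500-animal F2 intercross.
for interval_cm in (20.0, 5.0, 1.0):
    n_rec = expected_recombinants(interval_cm, n_animals=500, meioses_per_animal=2)
    print(f"{interval_cm:>5.1f} cM interval: ~{n_rec:.1f} recombinant meioses")
```

Under these assumptions, a 500-animal F2 yields roughly ten recombinants within a 1 cM interval, which illustrates why fine mapping calls for crosses that accumulate many more breakpoints.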
There has been some divergence of opinion in the mouse genetics community about the level of significance that is appropriate to establish credible linkage. With the advent of comprehensive genome-wide maps, Lander and Kruglyak3 formulated a set of criteria for reporting the significance of a linkage relationship based on standard genome scans in intercross and backcross populations. These criteria for statistical significance have now become standard practice: ‘highly significant’ refers to p<0.001, ‘significant’ refers to p<0.05 and ‘suggestive’ refers to p<0.63 after correcting for multiple testing in a genome-wide scan3. Several statistical approaches can be used to arrive at these criteria. Strong control of type I error was originally proposed in this context by Lander and Botstein23 to ensure, with high confidence, that few false-positive QTLs would be reported in a genome-wide QTL search. Lander and Kruglyak3 later suggested a series of LOD threshold values that could be used for this purpose. Although specific LOD threshold values have been useful as general guidelines, they are based on conservative approximations that are valid only for certain types of genome scan in specific kinds of crosses, such as a mouse intercross.
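The arithmetic behind these thresholds can be sketched as follows. The Python code below relies on two standard facts rather than on anything specific to this paper: the likelihood-ratio statistic equals 2 ln(10) × LOD, and, if chance peaks arise approximately as a Poisson process, the genome-wide probability of at least one is 1 − exp(−μ) for an expected count μ. The ‘suggestive’ level of 0.63 corresponds to μ = 1, that is, one false peak expected per genome scan. The 1-df case fits a backcross; the 2-df case fits an F2 test of additive and dominance effects. Function names are hypothetical.

```python
import math

def lod_to_chi2(lod: float) -> float:
    """Likelihood-ratio (chi-square) statistic corresponding to a LOD score."""
    return 2.0 * math.log(10.0) * lod

def pointwise_p_1df(lod: float) -> float:
    """Upper-tail p value of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(lod_to_chi2(lod) / 2.0))

def pointwise_p_2df(lod: float) -> float:
    """Upper-tail p value of a 2-df chi-square, exp(-x / 2), which simplifies to 10**(-LOD)."""
    return math.exp(-lod_to_chi2(lod) / 2.0)

def genomewide_p(expected_false_peaks: float) -> float:
    """Probability of at least one chance peak if chance peaks are Poisson with mean mu."""
    return 1.0 - math.exp(-expected_false_peaks)

print(f"LOD 3, 1 df: pointwise p = {pointwise_p_1df(3.0):.2e}")
print(f"LOD 3, 2 df: pointwise p = {pointwise_p_2df(3.0):.2e}")
print(f"one expected false peak per scan -> genome-wide p = {genomewide_p(1.0):.3f}")
```

The pointwise p values are far smaller than the genome-wide levels because a genome scan performs many correlated tests; the genome-wide criteria are what control false positives across the whole scan.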
Permutation analysis is a more general approach for obtaining threshold values that are adjusted for multiple testing24. In a permutation test, genome scans are repeatedly carried out on shuffled versions of the data to estimate a LOD threshold that is appropriate for the given data set. Overall, these permutation-based thresholds compare well with the Lander and Kruglyak threshold values, although the former tend to be less conservative. So, the use of permutation-based thresholds is likely to yield more QTLs without jeopardizing statistical significance. Also, permutation analysis can provide valid thresholds for non-standard situations, such as when the phenotype or trait does not follow a normal distribution. The state of available software tools for QTL analysis is constantly changing and being improved, and links to some of the more popular tools are provided in the online links box.
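A permutation threshold of this kind can be sketched in a few dozen lines. The Python code below is an illustration, not any particular package's implementation: it scores each marker by single-marker analysis of variance, LOD = (n/2) log10(RSS0/RSS1), shuffles phenotypes relative to genotypes to break any genotype-phenotype association, and takes the 95th percentile of the genome-wide maximum LOD across permutations. The simulated backcross data, the independence of markers and all names are hypothetical simplifications; interval mapping on a real linkage map would be used in practice.

```python
import math
import random
from statistics import fmean

def marker_lod(genotypes, phenotypes):
    """Single-marker ANOVA LOD: (n / 2) * log10(RSS_null / RSS_genotype_groups)."""
    grand_mean = fmean(phenotypes)
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    rss1 = 0.0
    for ys in groups.values():
        group_mean = fmean(ys)
        rss1 += sum((y - group_mean) ** 2 for y in ys)
    if rss0 == 0.0:
        return 0.0
    if rss1 == 0.0:
        return math.inf
    return (len(phenotypes) / 2.0) * math.log10(rss0 / rss1)

def genome_scan(geno_matrix, phenotypes):
    """LOD at every marker; geno_matrix[m][i] is the genotype of animal i at marker m."""
    return [marker_lod(markers, phenotypes) for markers in geno_matrix]

def permutation_threshold(geno_matrix, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the genome-wide maximum LOD across phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    max_lods = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_lods.append(max(genome_scan(geno_matrix, shuffled)))
    max_lods.sort()
    index = min(n_perm - 1, max(0, math.ceil((1.0 - alpha) * n_perm) - 1))
    return max_lods[index]

# Simulated backcross: 100 animals, 20 unlinked markers, a QTL at marker 5 (all hypothetical).
rng = random.Random(42)
n_animals, n_markers, qtl_marker = 100, 20, 5
geno = [[rng.randint(0, 1) for _ in range(n_animals)] for _ in range(n_markers)]
pheno = [1.5 * geno[qtl_marker][i] + rng.gauss(0.0, 1.0) for i in range(n_animals)]

lods = genome_scan(geno, pheno)
threshold = permutation_threshold(geno, pheno, n_perm=200)
peak = max(range(n_markers), key=lods.__getitem__)
print(f"5% genome-wide threshold ~ LOD {threshold:.2f}; peak at marker {peak}, LOD {lods[peak]:.2f}")
```

More permutations (for example, 1,000 or more) give a more stable estimate of the threshold; 200 are used here only to keep the example fast.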
When reporting QTL map positions, the LOD score, the peak position and an estimate of the confidence interval should be given. This allows the reader to compare the map position with that for other QTLs that control potentially related traits. It is, however, possible that a QTL might fall outside a calculated confidence-interval region owing to problems in marker order, genotyping errors and/or model misspecification. In such cases, QTLs are difficult to identify and overlaps between QTLs are difficult to determine.
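One common way to attach an interval to a peak, offered here as an assumption rather than as the method required above, is the LOD-drop support interval: the region around the peak in which the LOD stays within a fixed drop of its maximum. R/qtl's `lodint` function uses a 1.5-LOD drop by default, and the Python sketch below mirrors that idea on a grid of positions; the function name and example values are hypothetical.

```python
def lod_support_interval(positions_cm, lods, drop=1.5):
    """Return (left, peak, right) positions in cM for the contiguous region around the
    highest LOD in which LOD >= max LOD - drop."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions_cm[left], positions_cm[peak], positions_cm[right]

# Hypothetical scan along one chromosome at 10 cM spacing.
positions = [0, 10, 20, 30, 40, 50]
scan = [0.5, 2.0, 4.0, 5.0, 3.8, 1.0]
print(lod_support_interval(positions, scan))
```

Such an interval inherits any errors in marker order, genotypes or the model, so it is a guide rather than a guarantee; on a coarse grid the true bounds lie between the reported positions and the next grid points outward.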
There has been some controversy about when a QTL should be given an official locus designation. It is our opinion that QTLs that have been mapped to regions with only suggestive significance should not be given a locus name. However, we recommend reporting such regions to facilitate possible confirmation in future studies, as was originally suggested by Lander and Kruglyak3. A locus name would be appropriate if repeat observations or other kinds of evidence confirm the linkage of a QTL. Meta-analysis, such as that carried out by Belknap and Atkins25 using Fisher’s method (based on the additive nature of independent chi-squared values), can be used to calculate new p values and LOD scores from combined data. In this way, information from two or more independent studies can be used to increase the statistical power and accuracy of the QTL linkage relationships. Candidate locus names should follow approved nomenclature; for the mouse, candidate names should be submitted to the Mouse Nomenclature Committee.
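Fisher’s method can be sketched directly: for k independent studies, X = −2 Σ ln p_i follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom its upper tail has a closed form. The helper that converts the combined p value back to a LOD score assumes a 1-df test (as in a backcross); that conversion, like the function names, is our illustrative assumption and not necessarily the procedure used by Belknap and Atkins.

```python
import math
from statistics import NormalDist

def fisher_combined_p(p_values):
    """Combine independent p values: X = -2 * sum(ln p) ~ chi-square(2k) under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Survival function of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def p_to_lod_1df(p):
    """LOD score whose 1-df chi-square likelihood-ratio test has p value p."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z / (2.0 * math.log(10.0))

# Hypothetical: the same region reaches p = 0.01 in two independent crosses.
combined = fisher_combined_p([0.01, 0.01])
print(f"combined p = {combined:.2e}, LOD-equivalent (1 df) = {p_to_lod_1df(combined):.2f}")
```

Two studies that are each only modestly significant on their own thus combine into stronger joint evidence, provided that they are genuinely independent.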
It is wise to confirm any significant linkages by further studies before proceeding to finer mapping, to avoid unnecessary effort. Several methods are available for this purpose. First, further independent crosses can be carried out. A reconfirmation of significance only requires a simple test of the proposed chromosomal interval for linkage (usually ~20 cM)3. In this case, statistical corrections for genome-wide scanning are not necessary.
In a second method, a congenic strain can be made in which the QTL interval or the critical region (defined as the region that must contain the candidate locus on the basis of recombination breakpoints on either side of the candidate locus) has been captured in the ‘differential segment’, which is the segment that has been introgressed into an inbred strain. This congenic strain should then show phenotypic differences from the inbred strain in the quantitative trait being monitored. Moreover, this congenic strain will be a useful tool for fine mapping the QTL. In certain instances, however, even when the QTL is known to be in the interval region, the congenic strain might not confirm the original observations. This might happen if the new genetic background of the congenic strain does not support the full expression of the quantitative trait. In this case, the effect of the QTL might not be detected. Even with Mendelian traits, such as cystic fibrosis, a change in the genetic background can cause a given phenotype to become undetectable26.
One important advantage of a congenic strain is that it allows the assessment of a phenotype in many genetically identical individuals. In such a case, statistical significance can be reached more easily and a QTL with a small effect size can be confirmed using a more manageable number of animals. For example, Fehr et al. used only 20-40 mice from their newly developed congenic strains to confirm the location of a QTL for alcohol withdrawal27.
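The advantage can be made quantitative with a standard sample-size calculation. In a congenic strain the within-strain variance is largely environmental, because the genetic background is fixed, so a given QTL effect is larger when expressed in standard-deviation units than it is in a segregating cross. The Python sketch below uses the usual normal approximation for a two-sided comparison of two group means; it is an illustration under that assumption, with hypothetical effect sizes, and not the calculation used by Fehr et al.

```python
import math
from statistics import NormalDist

def mice_per_group(effect_sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Animals per group for a two-sided, two-sample comparison of means (normal approximation).

    effect_sd: congenic-minus-background difference in units of the within-strain SD.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * (z_alpha + z_beta) ** 2 / effect_sd ** 2)

# Hypothetical effect sizes, in within-strain SD units.
for d in (1.0, 0.75, 0.5):
    print(f"effect {d:.2f} SD: about {mice_per_group(d)} mice per strain")
```

Because the required number scales with 1/d², halving the standardized effect roughly quadruples the number of mice, which is why reducing within-group variance with genetically identical animals pays off.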
A third method uses short-term selective breeding, in which mice are selected for the phenotypic trait for three to five generations28,29. DNA markers are also scored at every generation. If marker alleles in the QTL region are co-selected with the trait, this constitutes further evidence that the QTL has been mapped to the correct general location.
Fine mapping (to <1-5 cM) is difficult as it requires more recombination events to separate the genes that govern the quantitative trait from closely linked markers. It might also require more sensitive phenotyping procedures if there are several linked QTLs, because each individual QTL will probably have a smaller effect on the phenotype. Crosses that involve many recombination events are the most successful. Also, the production of subcongenic strains is an efficient way of accomplishing this task. Subcongenics have a shorter differential segment (arising by recombination in the differential segment) than their congenic parent. A set of these subcongenic strains can be made that subdivide this critical interval into several segments that can be individually tested for the QTL27. Therefore, subcongenics are powerful tools for fine mapping as they allow multiple tests for phenotypic effects on genetically identical mice. If other types of cross are used to narrow the critical region, progeny testing is often necessary to confirm the phenotype of the recombinant mice.
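The logic of subdividing a critical interval with subcongenic strains can be expressed as simple interval arithmetic. The sketch below assumes a single QTL whose effect is detected in every strain that carries it, and treats each differential segment as an exact (start, end) interval in cM, although real breakpoints are known only to lie between markers. The QTL must then lie within every segment that shows an effect and outside every segment that does not; the function name and strain data are hypothetical.

```python
def critical_region(segments):
    """segments: (start_cm, end_cm, shows_effect) for each subcongenic's differential segment.

    Returns the list of (start_cm, end_cm) intervals still compatible with a single QTL.
    """
    positives = [(s, e) for s, e, hit in segments if hit]
    negatives = [(s, e) for s, e, hit in segments if not hit]
    if not positives:
        return []
    # The QTL must lie in the overlap of all segments that show the effect...
    lo = max(s for s, _ in positives)
    hi = min(e for _, e in positives)
    region = [(lo, hi)] if lo < hi else []
    # ...and outside every segment that does not.
    for ns, ne in negatives:
        remaining = []
        for s, e in region:
            if ne <= s or ns >= e:  # no overlap: keep the interval unchanged
                remaining.append((s, e))
                continue
            if s < ns:
                remaining.append((s, ns))
            if ne < e:
                remaining.append((ne, e))
        region = remaining
    return region

# Hypothetical subcongenic panel derived from a congenic spanning 10-35 cM.
panel = [
    (10, 30, True),
    (10, 20, False),
    (18, 30, True),
    (25, 35, False),
]
print(critical_region(panel))
```

An empty result under these assumptions would itself be informative, suggesting more than one linked QTL or a background-dependent effect, which is one reason progeny testing and replicate phenotyping remain important.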
Before the availability of genome sequences, it was difficult to identify a suitable candidate gene even when the critical region was 0.5 cM. With the completion of genome sequencing for several model organisms, it is now feasible to identify the gene (or other functional element) that is responsible for a quantitative trait, even when the critical region is relatively large. The availability of known polymorphisms among strains also facilitates the identification of candidate genes in larger critical regions. The size of the critical region that is required for successful results varies depending on whether the region is gene-rich or gene-poor. Clearly, it will be easier to identify a candidate QTL if there are only 20 genes in the critical region compared with 200 genes. With more than 200 genes, there is a much higher probability of the region containing more than one gene with the characteristics of a viable candidate, such as appropriate tissue expression, genetic polymorphisms and suspected pathway functions (see below for criteria). To test all of these candidate genes might be prohibitively time-consuming and expensive.
Although several genes have been confirmed as underlying quantitative traits, their identification still requires a great deal of effort. From the set of QTLs that have been identified, it is clear that only those that have a strong effect on the phenotype are readily amenable to positional cloning techniques (for recent reviews, see refs 6,11). As procedures become more refined and the genome becomes better characterized, QTLs with weaker effects will also be identified. The set of criteria for identification of a gene that determines a quantitative trait should be no more (or less) stringent than it is for the identification of a gene that determines a non-quantitative trait.
There is no single ‘gold standard’ for the identification of a QTL. Rather, there should be a predominance of evidence that supports its identity. Generally, more than one of the conditions discussed below should be applied, some of which are more important than others. A similar list of criteria has been compiled by Glazier et al.11, who state that the most conclusive evidence for a QTL resides in the ability to replace one allele with another and test for function, for example, by making a knock-in mouse. However, Glazier et al. also state that circumstantial evidence might provide sufficient and reasonable proof of gene identity. As a community of investigators who are directly involved in QTL analyses, we agree with this proposal but also emphasize that the use of allelic replacement or allelic addition through knock-ins and/or transgenics should not be a necessary requirement for QTL identification. If there is a predominance of circumstantial evidence in support of the identity of a QTL, as judged by peer review, then the research community should accept this as sufficient for publication, with the assumption that these findings will inevitably be subject to further testing and refinement.
Below, we list some of the methods that can be used to identify genes that determine a quantitative trait. The list does not give priorities as to which combinations of evidence would be sufficient for the identification of a QTL, because such priorities depend on the genetics and function of the gene to be identified; however, it does represent the view of the community on the available sources of evidence that can be used for this purpose.
Sequence differences that lead to changes in either the structure or regulation (or quantity) of a gene product should be detected between the strains that are used for mapping and are known to differ in the quantitative trait. It is difficult to predict what type of sequence variant will most commonly underlie quantitative traits. Of the QTLs identified so far in the mouse and rat, most have allelic variations in the coding-region sequence11. All of these identified rodent QTLs have large phenotypic effects, and it is not known whether the same principle will apply to QTLs with weaker effects.
As well as affecting either the structure or regulation of a gene product, some evidence should support a link between the function of the gene and the expression of the quantitative trait being analysed, either by involvement in an appropriate pathway and/or by expression in the appropriate target tissue(s) or cell type(s).
Conceivably, in vitro tests can substitute for in vivo tests. If an in vitro functional test, such as a tissue-culture system that displays the quantitative trait, can be designed, then transfection experiments can be used to test the effects of the alternative alleles on relevant cellular phenotypes. Transfections with alternative alleles should be definitive in identifying the effects of the candidate gene. These in vitro tests should reflect the in vivo phenotype that is influenced by the QTL.
Transgenesis with bacterial artificial chromosomes (BACs) or other large chromosomal segments can also be used to confirm the identity of the candidate gene. For example, BACs that contain the candidate gene can be transfected into zygotes and the resulting mice can be tested for the quantitative trait. For this system to be applicable, BAC libraries must be available that contain the appropriate alternative alleles at the relevant QTL. Also, the success of this experiment will depend on the ‘dominance’ of the transfected allele(s), as the two resident copies of the host allele might distort the effect of the transgene on the quantitative trait. Moreover, genetic background effects could complicate interpretation. If there are several genes on the BAC, rescue by a BAC clone might require further experiments to confirm which gene is responsible. Development of new techniques for the manipulation of BAC sequences will aid these BAC transgenesis experiments30.
Knock-ins can also be used to confirm candidate genes, as replacement of one allele with another at the candidate QTL should alter the quantitative trait. As there might be several polymorphic genes in the critical region, this method will test the effects of one gene at a time. New recombineering techniques will also allow easier construction of appropriate vectors for this purpose31. A limitation of this technique is that embryonic stem cell lines are not available for many of the inbred strains that are used in quantitative trait analyses. The interpretation of results might also be complicated by genetic background effects that do not allow the full expression of the alternative alleles.
If a knockout (or a null allele) of the candidate QTL is available, then complementation tests between the knockout (or mutant strain) and the strain that contains the QTL variant allele could be used as evidence of gene identity. This technique has been successfully used in Drosophila and mice32,33. The quantitative trait should change depending on the presence of the alternative allele. Genetic background effects can be minimized by using control strains and intercrosses.
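One common way to analyse such a test, assumed here for illustration rather than taken from the cited studies, is a 2 × 2 design: the strain carrying the QTL variant allele and a reference strain are each crossed both to the knockout and to a wild-type control, and failure to complement appears as an interaction, that is, the allelic difference over the null allele differs from the allelic difference over the wild-type allele. The Python sketch below computes that interaction contrast and its standard error; the genotype labels and phenotype values are hypothetical.

```python
import math
from statistics import fmean, variance

def complementation_interaction(groups):
    """groups maps the hypothetical labels 'qtl/null', 'ref/null', 'qtl/wt' and 'ref/wt'
    to lists of phenotype values for the four F1 genotypes.

    Returns (interaction, standard_error), where interaction is the QTL-versus-reference
    difference over the null allele minus the same difference over the wild-type allele.
    """
    diff_null = fmean(groups["qtl/null"]) - fmean(groups["ref/null"])
    diff_wt = fmean(groups["qtl/wt"]) - fmean(groups["ref/wt"])
    # Independent groups: the variance of the contrast is the sum of var(mean) over groups.
    se = math.sqrt(sum(variance(v) / len(v) for v in groups.values()))
    return diff_null - diff_wt, se

example = {
    "qtl/null": [10.0, 12.0, 11.0],
    "ref/null": [5.0, 6.0, 7.0],
    "qtl/wt": [8.0, 9.0, 10.0],
    "ref/wt": [7.0, 8.0, 9.0],
}
interaction, se = complementation_interaction(example)
print(f"interaction = {interaction:.2f} (SE {se:.2f})")
```

A contrast that is large relative to its standard error supports the candidate gene; a formal test would also account for the degrees of freedom and for any background differences between the knockout and control strains.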
With the advent of more convenient mutational techniques, such as the chemical mutagenesis of embryonic stem cells, it is now possible to carry out gene-specific mutational analyses; that is, to collect an allelic series of mutations in a specific gene34. Two or more induced or spontaneous mutations at the candidate QTL should change the quantitative trait in a predictable fashion.
The mouse and human genomes are notably homologous in regions of functional importance (see Mouse-Human Homologies in online links box). In some cases, it might be possible to identify QTLs taking advantage of these homologies. When a QTL has been identified in one species (for example, in humans) and is subsequently mapped in another species (for example, in mice) to the homologous location, this is strong evidence that the candidate gene governs the particular quantitative trait.
Identification of the genes that underlie quantitative traits is becoming easier, which has led to a new wave of optimism for accomplishing this task. Newly developed animal and genomic tools have become available to facilitate QTL mapping. For example, there are now several expanded sets of recombinant inbred mouse strains (see Complex Trait Consortium 2003 Meeting in online links box) that allow more powerful mapping studies, so that QTLs with weaker effects can be mapped successfully. New sets of congenic mouse lines and consomic strains will also soon be available and can be used to pinpoint QTLs to particular chromosomes and chromosomal regions. The development of further recombinant inbred strains is also being discussed in the mouse genetics community7,35.
However, the most difficult part of identifying QTLs is still the ‘endgame’ in which the gene and the relevant variant that determines the quantitative trait must be identified among many genes in the region. Newly available genomic tools (and more are being developed) have made winning this endgame a realistic venture. New comparative SNP maps (see SNPview and Mouse SNP Database in online links box) between inbred strains will allow the investigator to identify all of the nucleotide changes in any given region of the genome and will yield a list of genes that are polymorphic either in their coding or regulatory regions. Recent improvements in software programs that are designed to predict structure-function relationships should also help to distinguish which of these polymorphisms might be important. This list will substantially narrow the search for the candidate gene. Moreover, with further annotation of the genome and more knowledge of the motifs that are important in specific biochemical pathways, it should be possible to prioritize the remaining genes into probable candidates. Finally, transgenic techniques and gene-specific mutagenesis procedures are becoming easier and more suitable for testing candidate genes. All these advances should provide the necessary tools to make the final QTL identification considerably more efficient. Perhaps in the near future we will be able to use exclusively genomic techniques and databases to identify the genetic basis of at least a subset of quantitative traits without resorting to more complicated biological proofs, such as the creation and functional characterization of knock-in mouse strains.
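The narrowing step described above amounts to filtering and ranking a gene list. The Python sketch below assumes hypothetical per-gene annotations (counts of coding and regulatory differences between the mapping strains, expression in the target tissue and pathway membership) and weights every criterion equally, which is an arbitrary simplification; the gene names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class CandidateGene:
    name: str
    coding_snps: int           # amino-acid-changing differences between the mapping strains
    regulatory_snps: int       # differences in putative regulatory regions
    expressed_in_target: bool  # expressed in the tissue or cell type relevant to the trait
    in_pathway: bool           # annotated to a pathway plausibly linked to the trait

def evidence_score(gene: CandidateGene) -> int:
    """Number of criteria met; every criterion is weighted equally (an arbitrary choice)."""
    return (int(gene.coding_snps > 0) + int(gene.regulatory_snps > 0)
            + int(gene.expressed_in_target) + int(gene.in_pathway))

def prioritize(genes):
    """Drop genes with no polymorphism between the strains, then rank by evidence score."""
    polymorphic = [g for g in genes if g.coding_snps + g.regulatory_snps > 0]
    return sorted(polymorphic, key=evidence_score, reverse=True)

critical_region_genes = [
    CandidateGene("GeneA", 0, 0, True, True),  # not polymorphic: excluded
    CandidateGene("GeneB", 1, 0, False, False),
    CandidateGene("GeneC", 2, 1, True, True),
    CandidateGene("GeneD", 0, 3, True, False),
]
for gene in prioritize(critical_region_genes):
    print(gene.name, evidence_score(gene))
```

Genes with no sequence difference between the strains drop out at the first step, which is where comprehensive strain SNP data save the most effort.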
One of the goals of this white paper is to voice the heightened optimism of the CTC community about the eventual identification of many of the genes that underlie quantitative traits. With new genomic and statistical techniques, we are able to map these genes with more confidence and efficiency. At the same time, we feel that we must remain vigilant and require standards for their mapping and identification. But the bar should not be set so high as to prevent QTL information from being published in a timely fashion. As our knowledge of the genome expands, more difficult and formal proofs of QTL identity will become unnecessary. New mutational and genetic-engineering tools will allow us to identify these genes more rapidly and to show their importance.
The following terms in this article are linked online to:
LocusLink: http://www.ncbi.nlm.nih.gov/LocusLink (Mtap1a | Tub)
OMIM: http://www.ncbi.nlm.nih.gov/omim (cystic fibrosis)
Complex Trait Consortium: http://www.complextrait.org
Complex Trait Consortium 2003 Meeting: http://www.well.ox.ac.uk/~rmott/CTC
Genetic Mapping Software: http://mapmgr.roswellpark.org/qtsoftware.html
Genomics: a global resource: http://genomics.phrma.org/lexicon/r.html
Mouse-Human Homologies: http://www.informatics.jax.org/reports/homologymap/mouse_human.shtml
Mouse Nomenclature: http://www.informatics.jax.org/mgihome/nomen
Mouse Phenome Database: http://www.jax.org/phenome
Mouse SNP Database: http://mousesnp.roche.com/cgi-bin/msnp.pl
Neurogenetics at University of Tennessee Health Science Center: http://www.nervenet.org
R/qtl: a QTL mapping environment: http://www.biostat.jhsph.edu/~kbroman/qtl
Rat Genome Database: http://rat.lab.nig.ac.jp/qtls
SNPview: SNPs, SSLPs, alleles and haplotypes: http://www.gnf.org/SNP
Software for QTL data analysis: http://www.stat.wisc.edu/~yandell/qtl/software
The Mouse Brain Library: http://www.mbl.org
The WebQTL Project: http://www.webqtl.org/search.html
University of Wisconsin-Madison Department of Statistics: http://www.stat.wisc.edu