The silkworm is the basis of the sericultural industry and a model organism in insect genetics. Mapping quantitative trait loci (QTLs) underlying economically important traits of silkworm is of high significance for promoting silkworm molecular breeding and advancing our knowledge of the genetic architecture of the Lepidoptera. Yet the mapping methods currently in use are not well suited to silkworm, because they ignore the difference in meiotic recombination between the two sexes.
A mixed linear model including QTL main effects, epistatic effects, and QTL × sex interaction effects was proposed for mapping QTLs in an F2 population of silkworm. The number and positions of QTLs were determined by F-test and model selection. The Markov chain Monte Carlo (MCMC) algorithm was employed to estimate and test genetic effects of QTLs and QTL × sex interaction effects. The effectiveness of the model and statistical method was validated by a series of simulations. The results indicate that when markers are distributed sparsely on chromosomes, our method will substantially improve estimation accuracy as compared to the normal chiasmate F2 model. We also found that a sample size of hundreds was sufficiently large to unbiasedly estimate all the four types of epistases (i.e., additive-additive, additive-dominance, dominance-additive, and dominance-dominance) when the paired QTLs reside on different chromosomes in silkworm.
The proposed method could accurately estimate not only the additive, dominance and digenic epistatic effects but also their interaction effects with sex, correcting the potential bias and precision loss in the current QTL mapping practice of silkworm and thus representing an important addition to the arsenal of QTL mapping tools.
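The effect terms named above (additive, dominance, the four digenic epistases, and their interactions with sex) can be sketched as columns of a design matrix. This is only an illustration: the genotype coding, the function name `f2_design`, the simulated effect sizes, and the use of ordinary least squares in place of the MCMC estimation are all assumptions, and the sex-specific recombination handling that motivates the model is not shown.

```python
import numpy as np

def f2_design(g1, g2, sex):
    """Design matrix for two unlinked QTLs in an F2, with QTL x sex terms.

    g1, g2: genotype codes 0, 1, 2 (copies of one parental allele).
    sex:    0/1 indicator.
    Assumed coding: additive = g - 1, dominance = 1 for heterozygotes, else 0.
    """
    g1 = np.asarray(g1)
    g2 = np.asarray(g2)
    s = np.asarray(sex, dtype=float)
    a1, a2 = g1 - 1.0, g2 - 1.0
    d1, d2 = (g1 == 1).astype(float), (g2 == 1).astype(float)
    main = [a1, d1, a2, d2]
    # additive-additive, additive-dominance, dominance-additive, dominance-dominance
    epistasis = [a1 * a2, a1 * d2, d1 * a2, d1 * d2]
    genetic = main + epistasis
    # intercept, sex, 8 genetic effects, 8 genetic-by-sex interactions
    columns = [np.ones_like(s), s] + genetic + [c * s for c in genetic]
    return np.column_stack(columns)

# Simulated F2 with two QTLs on different chromosomes (independent genotypes).
rng = np.random.default_rng(1)
n = 400
g1 = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
g2 = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
sex = rng.integers(0, 2, size=n)
X = f2_design(g1, g2, sex)

true_beta = np.zeros(X.shape[1])
true_beta[2] = 1.0    # additive effect of QTL 1 (hypothetical value)
true_beta[6] = 0.5    # additive-additive epistasis (hypothetical value)
true_beta[10] = 0.3   # additive effect of QTL 1 x sex (hypothetical value)
y = X @ true_beta + rng.normal(0.0, 0.5, size=n)

# Least-squares estimates of all 18 coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With both sexes and all nine two-locus genotypes present, the 18 columns are linearly independent, so every effect in this toy model is estimable.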
Functional mapping is a powerful approach for mapping quantitative trait loci (QTLs) that control biological processes. Functional mapping incorporates mathematical aspects of growth and development into a general QTL mapping framework and has been recently integrated with composite interval mapping to build up a so-called composite functional mapping model, aimed to separate multiple linked QTLs on the same chromosomal region.
This article reports the principle of using composite functional mapping to estimate the effects of QTL-environment interactions on growth trajectories by parametrically modeling the tested QTL in a marker interval and nonparametrically modeling the markers outside the interval as co-factors. With this new model, we can characterize the dynamic patterns of the genetic effects of QTLs governing growth trajectories, estimate the global effects of the underlying QTLs during the course of growth and development, and test the differentiation in the shapes of QTL genotype-specific growth curves between different environments. By analyzing a real example from a soybean genome project, our model detects several QTLs that cause significant genotype-environment interactions for plant height growth processes.
The model provides a basis for deciphering the genetic architecture of trait expression adjusted to different biotic and abiotic environments for any organism.
The Bayesian shrinkage technique has been applied to multiple quantitative trait loci (QTLs) mapping to estimate the genetic effects of QTLs on quantitative traits from a very large set of possible effects including the main and epistatic effects of QTLs. Although the recently developed empirical Bayes (EB) method significantly reduced computation compared with the fully Bayesian approach, its speed and accuracy are limited because numerical optimization is required to estimate the variance components in the QTL model.
We developed a fast empirical Bayesian LASSO (EBLASSO) method for multiple QTL mapping. The fact that the EBLASSO can estimate the variance components in closed form, along with other algorithmic techniques, makes the EBLASSO method more efficient and accurate. Compared with the EB method, our simulation study demonstrated that the EBLASSO method could substantially improve the computational speed and detect more QTL effects without increasing the false positive rate. In particular, the EBLASSO algorithm running on a personal computer could easily handle a linear QTL model with more than 100,000 variables in our simulation study. Real data analysis also demonstrated that the EBLASSO method detected more reasonable effects than the EB method. Compared with the LASSO, our simulation showed that the current version of the EBLASSO implemented in Matlab was similar in speed to the LASSO implemented in Fortran, and that the EBLASSO detected the same number of true effects as the LASSO but a much smaller number of false positive effects.
The EBLASSO method can handle a large number of effects possibly including both the main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple QTL mapping.
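For orientation, the plain LASSO used as a comparison baseline above can be sketched with cyclic coordinate descent. This is not the EBLASSO algorithm: there are no hierarchical priors and no closed-form variance-component updates, and the function names, penalty value and toy data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0 inside [-threshold, threshold]."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    b = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    residual = y.copy()  # equals y - X @ b throughout
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Covariance of column j with the residual after removing effect j.
            rho = X[:, j] @ (residual + X[:, j] * b[j]) / n
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            residual += X[:, j] * (b[j] - new_bj)
            b[j] = new_bj
    return b

# Toy check on an orthogonal design, where the solution is a soft-thresholded y.
X_toy = np.eye(4)
y_toy = np.array([3.0, -2.0, 0.5, 0.0])
b_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.25)
```

In a QTL setting the columns of `X` would hold coded marker main effects and, optionally, pairwise products for epistatic effects; small effects are shrunk exactly to zero, which is the behaviour the shrinkage methods above refine.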
The determination of expression quantitative trait loci (eQTL) epistasis, a form of functional interaction between genetic loci that affect gene expression, is an important step toward a thorough understanding of gene regulation. Since gene expression has emerged as an “intermediate” molecular phenotype, eQTL epistasis might help to explain the relationship between genotype and higher-level organismal phenotypes such as diseases. A characteristic feature of eQTL analysis is the large number of tests required to identify associations between gene expression and variability at genetic loci. This problem is aggravated when epistatic effects between eQTLs are analyzed. In this review, we discuss recent algorithmic approaches for the detection of eQTL epistasis and highlight lessons that can be learned from current methods.
eQTL; epistasis; genetic association; genetic crosses; network modules
Mapping multiple quantitative trait loci (QTL) is commonly viewed as a problem of model selection. Various model selection criteria have been proposed, primarily in the non-Bayesian framework. The deviance information criterion (DIC) is the most popular criterion for Bayesian model selection and model comparison but has not been applied to Bayesian multiple QTL mapping. A derivation of the DIC is presented for multiple interacting QTL models and calculation of the DIC is demonstrated using posterior samples generated by Markov chain Monte Carlo (MCMC) algorithms. The DIC measures posterior predictive error by penalizing the fit of a model (deviance) by its complexity, determined by the effective number of parameters. The effective number of parameters simultaneously accounts for the sample size, the cross design, the number and lengths of chromosomes, covariates, the number of QTL, the type of QTL effects, and QTL effect sizes. The DIC provides a computationally efficient way to perform sensitivity analysis and can be used to quantitatively evaluate if including environmental effects, gene-gene interactions, and/or gene-environment interactions in the prior specification is worth the extra parameterization. The DIC has been implemented in the freely available package R/qtlbim, which greatly facilitates the general usage of Bayesian methodology for genome-wide interacting QTL analysis.
complex trait; deviance; DIC; model selection and comparison; quantitative trait loci
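The DIC calculation from posterior samples can be sketched on a deliberately simple model. The example below uses a normal mean with known variance rather than a multiple-QTL model, and draws directly from the known posterior as a stand-in for MCMC output; the function names and simulated data are assumptions. It uses the standard definitions: the effective number of parameters is the posterior mean deviance minus the deviance at the posterior mean, and the DIC is the posterior mean deviance plus that quantity.

```python
import numpy as np

def deviance(theta, y, sigma=1.0):
    """Deviance, -2 * log-likelihood, for y ~ Normal(theta, sigma^2)."""
    n = y.size
    return n * np.log(2.0 * np.pi * sigma ** 2) + np.sum((y - theta) ** 2) / sigma ** 2

def dic(samples, y, sigma=1.0):
    """Return (DIC, effective number of parameters) from posterior draws of theta."""
    d_bar = np.mean([deviance(t, y, sigma) for t in samples])  # posterior mean deviance
    d_hat = deviance(np.mean(samples), y, sigma)               # deviance at posterior mean
    p_d = d_bar - d_hat                                        # complexity penalty
    return d_bar + p_d, p_d

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)
# Stand-in for MCMC output: the posterior of the mean under a flat prior.
samples = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=20000)
dic_value, p_d = dic(samples, y)
```

Here the model has one free parameter, and the estimated effective number of parameters comes out close to one. In a QTL analysis the same two averages would be taken over the saved MCMC iterations of the full model.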
Industrial production of the edible basidiomycete Pleurotus ostreatus (oyster mushroom) is based on a solid fermentation process in which a limited number of selected strains are used. Optimization of industrial mushroom production depends on improving the culture process and breeding new strains with higher yields and productivities. Traditionally, fungal breeding has been carried out by an empirical trial and error process. In this study, we used a different approach by mapping quantitative trait loci (QTLs) controlling culture production and quality within the framework of the genetic linkage map of P. ostreatus. Ten production traits and four quality traits were studied and mapped. The production QTLs identified explain nearly one-half of the production variation. More interestingly, a single QTL mapping to the highly polymorphic chromosome VII appears to be involved in control of all the productivity traits studied. Quality QTLs appear to be scattered across the genome and to have less effect on the variation of the corresponding traits. Moreover, some of the new hybrid strains constructed in the course of our experiments had production or quality values higher than those of the parents or other commercial strains. This approach opens the possibility of marker-assisted selection and breeding of new industrial strains of this fungus.
Gramene is a comparative information resource for plants that integrates data across diverse data domains. In this article, we describe the development of a quantitative trait loci (QTL) database and illustrate how it can be used to facilitate both forward and reverse genetics research. The QTL database contains the largest online collection of rice QTL data in the world. Using flanking markers as anchors, QTLs originally reported on individual genetic maps have been systematically aligned to the rice sequence where they can be searched as standard genomic features. Researchers can determine whether a QTL co-localizes with other QTLs detected in independent experiments and can combine data from multiple studies to improve the resolution of a QTL position. Candidate genes falling within a QTL interval can be identified and their relationship to particular phenotypes can be inferred based on functional annotations provided by ontology terms. Mutations identified in functional genomics populations and association mapping panels can be aligned with QTL regions to facilitate fine mapping and validation of gene–phenotype associations. By assembling and integrating diverse types of data and information across species and levels of biological complexity, the QTL database enhances the potential to understand and utilize QTL information in biological research.
Complex binary traits are influenced by many factors including the main effects of many quantitative trait loci (QTLs), the epistatic effects involving more than one QTL, environmental effects and the effects of gene-environment interactions. Although a number of QTL mapping methods for binary traits have been developed, an efficient and powerful method that can handle both main and epistatic effects of a relatively large number of possible QTLs is still lacking.
In this paper, we use a Bayesian logistic regression model as the QTL model for binary traits that includes both main and epistatic effects. Our logistic regression model employs hierarchical priors for regression coefficients similar to the ones used in the Bayesian LASSO linear model for multiple QTL mapping for continuous traits. We develop efficient empirical Bayesian algorithms to infer the logistic regression model. Our simulation study shows that our algorithms can easily handle a QTL model with a large number of main and epistatic effects on a personal computer, and outperform five other methods examined including the LASSO, HyperLasso, BhGLM, RVM and the single-QTL mapping method based on logistic regression in terms of power of detection and false positive rate. The utility of our algorithms is also demonstrated through analysis of a real data set. A software package implementing the empirical Bayesian algorithms in this paper is freely available upon request.
The EBLASSO logistic regression method can handle a large number of effects possibly including the main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple QTL mapping for complex binary traits.
QTL mapping; Binary traits; Epistatic effects; Bayesian shrinkage; Logistic regression
Genetic mapping has proven to be powerful for studying the genetic architecture of complex traits by characterizing a network of the underlying interacting quantitative trait loci (QTLs). Current statistical models for genetic mapping are mostly founded on the biallelic epistasis of QTLs and are incapable of analyzing multiallelic QTLs and their interactions, which are widespread in outcrossing populations.
Here we have formulated a general framework to model and define the epistasis between multiallelic QTLs. Based on this framework, we have derived a statistical algorithm for the estimation and test of multiallelic epistasis between different QTLs in a full-sib family of outcrossing species. We used this algorithm to scan the genome for the distribution of multiallelic epistasis for a rooting ability trait in an outbred cross derived from two heterozygous poplar trees. The results from simulation studies indicate that the positions and effects of multiallelic QTLs can be estimated well with a modest sample size and heritability.
The model and algorithm developed provide a useful tool for better characterizing the genetic control of complex traits in a heterozygous family derived from outcrossing species, such as forest trees, and thus fill a gap that occurs in genetic mapping of this group of important but underrepresented species.
The study of quantitative trait loci (QTL) in cotton (Gossypium spp.) is focused on traits of agricultural significance. Previous studies have identified a plethora of QTL attributed to fiber quality, disease and pest resistance, branch number, seed quality and yield and yield related traits, drought tolerance, and morphological traits. However, results among these studies differed due to the use of different genetic populations, markers and marker densities, and testing environments. Since two previous meta-QTL analyses were performed on fiber traits, a number of papers on QTL mapping of fiber quality, yield traits, morphological traits, and disease resistance have been published. To obtain a better insight into the genome-wide distribution of QTL and to identify consistent QTL for marker assisted breeding in cotton, an updated comparative QTL analysis is needed.
In this study, a total of 1,223 QTL from 42 different QTL studies in Gossypium were surveyed and mapped using Biomercator V3 based on the Gossypium consensus map from the Cotton Marker Database. A meta-analysis was first performed using manual inference and confirmed by Biomercator V3 to identify possible QTL clusters and hotspots. QTL clusters are composed of QTL of various traits which are concentrated in a specific region on a chromosome, whereas hotspots are composed of only one trait type. QTL were not evenly distributed along the cotton genome and were concentrated in specific regions on each chromosome. QTL hotspots for fiber quality traits were found in the same regions as the clusters, indicating that clusters may also form hotspots.
Putative QTL clusters were identified via meta-analysis and will be useful for breeding programs and future studies involving Gossypium QTL. The presence of QTL clusters and hotspots indicates consensus regions across cultivated tetraploid Gossypium species, environments, and populations that contain large numbers of QTL and, in some cases, multiple QTL associated with the same trait (termed a hotspot). This study combines two previous meta-analysis studies and adds all other currently available QTL studies, making it the most comprehensive meta-analysis study in cotton to date.
Bone mineral density (BMD) is a heritable trait, and in mice, over 100 quantitative trait loci (QTLs) have been reported, but candidate genes have been identified for only a small percentage. Persistent errors in the mouse genetic map have negatively affected QTL localization, spurring the development of a new, corrected map. In this study, QTLs for BMD were remapped in 11 archival mouse data sets using this new genetic map. Since these QTLs all were mapped in a comparable way, direct comparisons of QTLs for concordance would be valid. We then compared human genome-wide association study (GWAS) BMD loci with the mouse QTLs. We found that 26 of the 28 human GWAS loci examined were located within the confidence interval of a mouse QTL. Furthermore, 14 of the GWAS loci mapped to within 3 cM of a mouse QTL peak. Lastly, we demonstrated that these newly remapped mouse QTLs can substantiate a candidate gene for a human GWAS locus, for which the peak single-nucleotide polymorphism (SNP) fell in an intergenic region. Specifically, we suggest that MEF2C (human chromosome 5, mouse chromosome 13) should be considered a candidate gene for the genetic regulation of BMD. In conclusion, use of the new mouse genetic map has improved the localization of mouse BMD QTLs, and these remapped QTLs show high concordance with human GWAS loci. We believe that this is an opportune time for a renewed effort by the genetics community to identify the causal variants regulating BMD using a synergistic mouse-human approach. © 2010 American Society for Bone and Mineral Research.
genetic linkage; quantitative trait loci; mouse; human
A mathematical approach was developed to model and optimize selection on multiple known quantitative trait loci (QTL) and polygenic estimated breeding values in order to maximize a weighted sum of responses to selection over multiple generations. The model allows for linkage between QTL with multiple alleles and arbitrary genetic effects, including dominance, epistasis, and gametic imprinting. Gametic phase disequilibrium between the QTL and between the QTL and polygenes is modeled but polygenic variance is assumed constant. Breeding programs with discrete generations, differential selection of males and females and random mating of selected parents are modeled. Polygenic EBV obtained from best linear unbiased prediction models can be accommodated. The problem was formulated as a multiple-stage optimal control problem and an iterative approach was developed for its solution. The method can be used to develop and evaluate optimal strategies for selection on multiple QTL for a wide range of situations and genetic models.
selection; quantitative trait loci; optimization; marker assisted selection
We present a new computational scheme that enables efficient and reliable quantitative trait loci (QTL) scans for experimental populations. A standard brute-force exhaustive search effectively prevents accurate QTL scans involving more than two loci from being performed in practice, at least if permutation testing is used to determine significance. More elaborate global optimization approaches, for example DIRECT, have previously been applied to QTL search problems, and dramatic speedups have been reported for high-dimensional scans. However, since a heuristic termination criterion must be used in these types of algorithms, the accuracy of the optimization process cannot be guaranteed. Indeed, earlier results show that a small bias in the significance thresholds is sometimes introduced.
Our new optimization scheme, PruneDIRECT, is based on an analysis leading to a computable (Lipschitz) bound on the slope of a transformed objective function. The bound is derived for both infinite- and finite-size populations. Introducing a Lipschitz bound in DIRECT leads to an algorithm related to classical Lipschitz optimization. Regions in the search space can be permanently excluded (pruned) during the optimization process. Heuristic termination criteria can thus be avoided. Hence, PruneDIRECT has a well-defined error bound and can in practice be guaranteed to be equivalent to a corresponding exhaustive search. We present simulation results that show that for simultaneous mapping of three QTLs using permutation testing, PruneDIRECT is typically more than 50 times faster than exhaustive search. The speedup is higher for stronger QTL. This could be used to quickly detect strong candidate eQTL networks.
algorithms; branch-and-bound; genetic mapping; genomics; statistical models; statistics
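The pruning idea can be illustrated with a one-dimensional sketch of classical Lipschitz branch-and-bound. This is not PruneDIRECT itself: the objective, the interval, the Lipschitz constant and the tolerance are all assumptions, and the function is written as a minimisation (a QTL scan that maximises a test statistic would negate the objective). An interval is discarded for good once its lower bound, the midpoint value minus the Lipschitz constant times the half-width, cannot beat the best value found so far by more than the tolerance.

```python
def lipschitz_minimize(f, lo, hi, lipschitz, tol=1e-3):
    """Minimise f on [lo, hi] to within tol, given |f(x) - f(y)| <= lipschitz * |x - y|."""
    best_x = (lo + hi) / 2.0
    best_f = f(best_x)
    evaluations = 1
    stack = [(lo, hi, best_f)]  # (left end, right end, value at midpoint)
    while stack:
        a, b, f_mid = stack.pop()
        half_width = (b - a) / 2.0
        # No point in [a, b] can be lower than f_mid - lipschitz * half_width.
        if f_mid - lipschitz * half_width > best_f - tol:
            continue  # pruned permanently
        mid = (a + b) / 2.0
        for c, d in ((a, mid), (mid, b)):
            x = (c + d) / 2.0
            fx = f(x)
            evaluations += 1
            if fx < best_f:
                best_x, best_f = x, fx
            stack.append((c, d, fx))
    return best_x, best_f, evaluations

# Toy objective with its minimum at 0.3; |f'| <= 1.4 on [0, 1], so 2.0 is a valid bound.
best_x, best_f, evaluations = lipschitz_minimize(
    lambda x: (x - 0.3) ** 2, 0.0, 1.0, lipschitz=2.0
)
```

Because every discarded interval is provably unable to improve on the final answer by more than `tol`, the search stops on its own with a known error bound and needs no heuristic stopping rule, which is the property the abstract emphasises.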
Quantitative phenotypic variation of agronomic characters in crop plants is controlled by environmental and genetic factors (quantitative trait loci = QTL). To understand the molecular basis of such QTL, the identification of the underlying genes is of primary interest and DNA sequence analysis of the genomic regions harboring QTL is a prerequisite for that. QTL mapping in potato (Solanum tuberosum) has identified a region on chromosome V tagged by DNA markers GP21 and GP179, which contains a number of important QTL, among others QTL for resistance to late blight caused by the oomycete Phytophthora infestans and to root cyst nematodes.
To obtain genomic sequence for the targeted region on chromosome V, two local BAC (bacterial artificial chromosome) contigs were constructed and sequenced, which corresponded to parts of the homologous chromosomes of the diploid, heterozygous genotype P6/210. Two contiguous sequences of 417,445 and 202,781 base pairs were assembled and annotated. Gene-by-gene co-linearity was disrupted by non-allelic insertions of retrotransposon elements, stretches of diverged intergenic sequences, differences in gene content and gene order. The latter was caused by inversion of a 70 kbp genomic fragment. These features were also found in comparison to orthologous sequence contigs from three homeologous chromosomes of Solanum demissum, a wild tuber bearing species. Functional annotation of the sequence identified 48 putative open reading frames (ORF) in one contig and 22 in the other, with an average of one ORF every 9 kbp. Ten ORFs were classified as resistance-gene-like, 11 as F-box-containing genes, 13 as transposable elements and three as transcription factors. Comparing potato to Arabidopsis thaliana annotated proteins revealed five micro-syntenic blocks of three to seven ORFs with A. thaliana chromosomes 1, 3 and 5.
Comparative sequence analysis revealed highly conserved collinear regions that flank regions showing high variability and tandem duplicated genes. Sequence annotation revealed that the majority of the ORFs were members of multiple gene families. Comparing potato to Arabidopsis thaliana annotated proteins suggested fragmented structural conservation between these distantly related plant species.
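The reported ORF density can be checked directly from the contig lengths and ORF counts given above. The variable names are illustrative only.

```python
# Contig lengths (base pairs) and putative ORF counts as reported in the abstract.
contig_lengths_bp = [417_445, 202_781]
orfs_per_contig = [48, 22]

total_bp = sum(contig_lengths_bp)    # combined sequence length
total_orfs = sum(orfs_per_contig)    # combined ORF count
bp_per_orf = total_bp / total_orfs   # average spacing between ORFs
```

The two contigs total 620,226 bp and 70 ORFs, or roughly 8.9 kbp per ORF, which agrees with the stated average of one ORF every 9 kbp.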
A number of different quantitative trait loci (QTL) for various phenotypic traits, including milk production, functional, and conformation traits in dairy cattle as well as growth and body composition traits in meat cattle, have been mapped consistently in the middle region of bovine chromosome 6 (BTA6). Dense genetic and physical maps and, ultimately, a fully annotated genome sequence as well as their mutual connections are required to efficiently identify genes and gene variants responsible for genetic variation of phenotypic traits. A comprehensive high-resolution gene-rich map linking densely spaced bovine markers and genes to the annotated human genome sequence is required as a framework to facilitate this approach for the region on BTA6 carrying the QTL.
Therefore, we constructed a high-resolution radiation hybrid (RH) map for the QTL-containing chromosomal region of BTA6. This new RH map with a total of 234 loci including 115 genes and ESTs displays a substantial increase in loci density compared to existing physical BTA6 maps. Screening the available bovine genome sequence resources, a total of 73 loci could be assigned to sequence contigs, which were already identified as specific for BTA6. For 43 loci, corresponding sequence contigs, which were not yet placed on the bovine genome assembly, were identified. In addition, the improved potential of this high-resolution RH map for BTA6 with respect to comparative mapping was demonstrated. Mapping a large number of genes on BTA6 and cross-referencing them with map locations in corresponding syntenic multi-species chromosome segments (human, mouse, rat, dog, chicken) achieved a refined, accurate alignment of conserved segments and evolutionary breakpoints across the species included.
The gene-anchored high-resolution RH map (1 locus/300 kb) for the targeted region of BTA6 presented here will provide a valuable platform to guide high-quality assembly and annotation of the currently existing bovine genome sequence draft to establish the final architecture of BTA6. Hence, a sequence-based map will provide a key resource to facilitate continued efforts in the selection and validation of relevant positional and functional candidates underlying QTL for milk production and growth-related traits mapped on BTA6 and on similar chromosomal regions from evolutionarily closely related species like sheep and goat. Furthermore, the high-resolution sequence-referenced BTA6 map will enable precise identification of multi-species conserved chromosome segments and evolutionary breakpoints in mammalian phylogenetic studies.
Higher seed yield is one of the objectives of jatropha breeding. However, genetic analysis of the yield traits has not been done in jatropha. Quantitative trait loci (QTL) mapping was conducted to identify genetic factors controlling growth and seed yield in jatropha, a promising biofuel crop.
A linkage map was constructed consisting of 105 SSR (simple sequence repeat) markers converged into 11 linkage groups. With this map, we identified a total of 28 QTLs for 11 growth and seed traits using a population of 296 backcrossing jatropha trees. Two QTLs qTSW-5 and qTSW-7 controlling seed yield were mapped on LGs 5 and 7 respectively, where two QTL clusters controlling yield related traits were detected harboring five and four QTLs respectively. These two QTL clusters were critical with pleiotropic roles in regulating plant growth and seed yield. Positive additive effects of the two QTLs indicated higher values for the traits conferred by the alleles from J. curcas, while negative additive effects of the five QTLs on LG6, controlling plant height, branch number (in the 4th and 10th months post seed germination), female flower number and fruit number respectively, indicated higher values conferred by the alleles from J. integerrima. Therefore favored alleles from both the parents could be expected to be integrated into elite jatropha plant by further backcrossing and marker assisted selection. Efficient ways to improve the seed yield by applying the two QTL clusters are discussed.
This study is the first report on genetic analysis of growth and seed traits with molecular markers in jatropha. An approach for jatropha improvement using pleiotropic QTLs is discussed, which is likely to lead to the initiation of molecular breeding in jatropha by integrating more markers in the QTL regions.
How the power required for bird flight varies as a function of forward speed can be used to predict the flight style and behavioral strategy of a bird for feeding and migration. A U-shaped relationship between power and flight velocity has been observed in many birds, which is consistent with the theoretical prediction of aerodynamic models. In this article, we present a general genetic model for fine mapping of quantitative trait loci (QTL) responsible for power curves in a sample of birds drawn from a natural population. This model is developed within the maximum likelihood context, implemented with the EM algorithm for estimating the population genetic parameters of QTL and the simplex algorithm for estimating the QTL genotype-specific parameters of power curves. Using Monte Carlo simulation derived from empirical observations of power curves in the European starling (Sturnus vulgaris), we demonstrate how the underlying QTL for power curves can be detected from molecular markers and how the QTL detected affect the most appropriate flight speeds used to design an optimal migration strategy. The results from our model can be directly integrated into a conceptual framework for understanding flight origin and evolution.
QTL; Linkage Disequilibrium; Power Curve; Bird
The identity-by-descent (IBD) based variance component analysis is an important method for mapping quantitative trait loci (QTL) in outbred populations. The interval-mapping approach and various modified versions of it may have limited use in evaluating the genetic variances of the entire genome because they require evaluation of multiple models and model selection. In this study, we developed a multiple variance component model for genome-wide evaluation using both the maximum likelihood (ML) method and the MCMC implemented Bayesian method. We placed one QTL in every few cM on the entire genome and estimated the QTL variances and positions simultaneously in a single model. Genomic regions that have no QTL usually showed no evidence of QTL while regions with large QTL always showed strong evidence of QTL. While the Bayesian method produced the optimal result, the ML method is computationally more efficient than the Bayesian method. Simulation experiments were conducted to demonstrate the efficacy of the new methods.
Bayesian analysis; Genome selection; Markov chain Monte Carlo; Maximum likelihood
The dissection of the genetic architecture of quantitative traits, including the number and locations of quantitative trait loci (QTL) and their main and epistatic effects, has been an important topic in current QTL mapping. We extend the Bayesian model selection framework for mapping multiple epistatic QTL affecting continuous traits to dynamic traits in experimental crosses. The extension inherits the efficiency of Bayesian model selection and the flexibility of Legendre polynomials in fitting the change in genetic and environmental effects with time. We illustrate the proposed method by simultaneously detecting the main and epistatic QTLs for the growth of leaf age in a doubled-haploid population of rice. The behavior and performance of the method are also shown by computer simulation experiments. The results show that our method can more quickly identify interacting QTLs for dynamic traits in models with a large number of genetic effects, enhancing our understanding of the genetic architecture of dynamic traits. Our proposed method can be treated as a general form of mapping QTL for continuous quantitative traits, and is easier to extend to multiple traits and to a single trait with repeated records.
Bayesian model selection; dynamic trait; QTL; epistatic; Legendre polynomial
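How a Legendre polynomial describes an effect that changes with time can be sketched as follows. This shows only the basis-fitting step by least squares, not the Bayesian model selection; the measurement times, polynomial order and coefficients are assumptions chosen for illustration.

```python
import numpy as np
from numpy.polynomial import legendre

def rescale_time(t, t_min, t_max):
    """Map measurement times onto [-1, 1], the domain of Legendre polynomials."""
    return 2.0 * (np.asarray(t, dtype=float) - t_min) / (t_max - t_min) - 1.0

def fit_legendre(t, y, order):
    """Least-squares Legendre coefficients (degree 0..order) for a trajectory y(t)."""
    x = rescale_time(t, np.min(t), np.max(t))
    basis = legendre.legvander(x, order)  # shape (len(t), order + 1)
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coef

# Hypothetical time-varying genetic effect measured at 10 time points.
t = np.arange(1, 11)
true_coef = np.array([1.0, 0.5, 0.25])
effect = legendre.legval(rescale_time(t, t.min(), t.max()), true_coef)
coef_hat = fit_legendre(t, effect, order=2)
```

Each QTL effect is thus summarised by a handful of coefficients instead of one value per time point, which keeps the number of parameters manageable when many main and epistatic effects are modelled at once.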
Meta-analysis of information from quantitative trait loci (QTL) mapping experiments was used to derive distributions of the effects of genes affecting quantitative traits. The two limitations of such information, that QTL effects as reported include experimental error, and that mapping experiments can only detect QTL above a certain size, were accounted for. Data from pig and dairy mapping experiments were used. Gamma distributions of QTL effects were fitted with maximum likelihood. The derived distributions were moderately leptokurtic, consistent with many genes of small effect and few of large effect. Seventeen percent and 35% of the leading QTL explained 90% of the genetic variance for the dairy and pig distributions respectively. The number of segregating genes affecting a quantitative trait in dairy populations was predicted assuming genes affecting a quantitative trait were neutral with respect to fitness. Between 50 and 100 genes were predicted, depending on the effective population size assumed. As data for the analysis included no QTL of small effect, the ability to estimate the number of QTL of small effect must inevitably be weak. It may be that there are more QTL of small effect than predicted by our gamma distributions. Nevertheless, the distributions have important implications for QTL mapping experiments and Marker Assisted Selection (MAS). Powerful mapping experiments, able to detect QTL of 0.1σp, will be required to detect enough QTL to explain 90% of the genetic variance for a quantitative trait.
distribution of gene effects; quantitative trait loci; genetic variance; marker assisted selection
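The statement that a small fraction of the leading QTL explains most of the genetic variance can be illustrated with a short simulation. This is only a sketch under stated assumptions: the variance contributed by each QTL is taken to be proportional to its squared effect (equal allele frequencies), and the gamma shape and scale are hypothetical, not the fitted values from the analysis. The function name `fraction_of_qtl_for_variance` is likewise illustrative.

```python
import numpy as np


def fraction_of_qtl_for_variance(effects, target=0.9):
    """Smallest fraction of QTL, largest first, whose squared effects reach `target` of the total."""
    var = np.sort(np.square(np.asarray(effects, dtype=float)))[::-1]
    cumulative = np.cumsum(var) / var.sum()
    # Index of the first QTL at which the cumulative share reaches the target.
    n_needed = min(int(np.searchsorted(cumulative, target)) + 1, var.size)
    return n_needed / var.size


# Hypothetical leptokurtic distribution of QTL effects (shape < 1 gives many small effects).
rng = np.random.default_rng(0)
effects = rng.gamma(shape=0.5, scale=1.0, size=1000)
frac = fraction_of_qtl_for_variance(effects, target=0.9)
```

Lowering the shape parameter makes the distribution more leptokurtic, so fewer leading QTL are needed to reach the same share of variance.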
The ultimate goal of genetic mapping of quantitative trait loci (QTL) is the positional cloning of genes involved in any agriculturally or medically important phenotype. However, only a small portion (≤ 1%) of the QTL detected have been characterized at the molecular level, despite the report of hundreds of thousands of QTL for different traits and populations.
We develop a statistical model for detecting and characterizing the nucleotide structure and organization of haplotypes that underlie QTL responsible for a quantitative trait in an F2 pedigree. The discovery of such haplotypes by the new model will facilitate the molecular cloning of a QTL. Our model is founded on the population genetic properties of genes segregating in a pedigree, constructed within a mixture-based maximum likelihood framework, and implemented with the EM algorithm. Closed-form solutions have been derived to estimate the linkage and linkage disequilibria among different molecular markers, such as single nucleotide polymorphisms, and the quantitative genetic effects of haplotypes constructed from non-alleles of these markers. Results from the analysis of a real example in mouse have validated the usefulness of the proposed model.
The model can be readily extended to a complex network of genetic regulation that includes interactions between different haplotypes and between haplotypes and environments.
Many biological traits are discretely distributed in phenotype but continuously distributed in genetics because they are controlled by multiple genes and environmental variants. Due to the quantitative nature of the genetic background, these multiple genes are called quantitative trait loci (QTL). When the QTL effects are treated as random, they can be estimated in a single generalized linear mixed model (GLMM), even if the number of QTL is larger than the sample size. The GLMM in its original form cannot be applied to QTL mapping for discrete traits if there are missing genotypes. We examined two alternative methods for handling missing genotypes: the expectation method and the overdispersion method. Simulation studies show that the two methods are efficient for multiple QTL mapping (MQM) under the GLMM framework. The overdispersion method showed slight advantages over the expectation method in terms of smaller mean-squared errors of the estimated QTL effects. The two GLMM-based methods were applied to MQM for the female fertility trait of wheat. Multiple QTL controlling variation in the number of seeded spikelets were detected.
binary trait; binomial trait; mixed model; overdispersion; QTL
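One plausible reading of the expectation method is that a missing genotype code is replaced by its expected value given the genotype probabilities at that locus. The sketch below shows that idea only; the additive coding (+1, 0, -1) for three F2-type genotypes, the probabilities, and the function name `fill_missing_by_expectation` are assumptions for illustration and are not taken from the study.

```python
import numpy as np

# Assumed additive coding for three F2-type genotypes (AA, Aa, aa).
ADDITIVE_CODES = np.array([1.0, 0.0, -1.0])


def fill_missing_by_expectation(codes, genotype_probs):
    """Replace NaN genotype codes with their expected value under the given probabilities."""
    codes = np.asarray(codes, dtype=float)
    probs = np.asarray(genotype_probs, dtype=float)
    expected = probs @ ADDITIVE_CODES          # one expected code per individual
    return np.where(np.isnan(codes), expected, codes)


# Hypothetical data: the third individual has a missing genotype.
codes = [1.0, 0.0, np.nan, -1.0]
probs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.3, 0.1],   # assumed probabilities, e.g. inferred from flanking markers
    [0.0, 0.0, 1.0],
]
filled = fill_missing_by_expectation(codes, probs)
```

The filled codes can then enter the mixed model as ordinary covariates, at the cost of ignoring the uncertainty in the imputed value.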
An ultimate goal of genetic research is to understand the connection between genotype and phenotype in order to improve the diagnosis and treatment of diseases. The quantitative genetics field has developed a suite of statistical methods to associate genetic loci with diseases and phenotypes, including quantitative trait loci (QTL) linkage mapping and genome-wide association studies (GWAS). However, each of these approaches has technical and biological shortcomings. For example, the amount of heritable variation explained by GWAS is often surprisingly small and the resolution of many QTL linkage mapping studies is poor. The predictive power and interpretation of QTL and GWAS results are consequently limited. In this study, we propose a complementary approach to quantitative genetics by interrogating the vast amount of high-throughput genomic data in model organisms to functionally associate genes with phenotypes and diseases. Our algorithm combines the genome-wide functional relationship network for the laboratory mouse and a state-of-the-art machine learning method. We demonstrate the superior accuracy of this algorithm through predicting genes associated with each of 1157 diverse phenotype ontology terms. Comparison between our prediction results and a meta-analysis of quantitative genetic studies reveals both overlapping candidates and distinct, accurate predictions uniquely identified by our approach. Focusing on bone mineral density (BMD), a phenotype related to osteoporotic fracture, we experimentally validated two of our novel predictions (not observed in any previous GWAS/QTL studies) and found significant bone density defects for both Timp2 and Abcg8 deficient mice. Our results suggest that the integration of functional genomics data into networks, which itself is informative of protein function and interactions, can successfully be utilized as a complementary approach to quantitative genetics to predict disease risks.
All supplementary material is available at http://cbfg.jax.org/phenotype.
Many recent efforts to understand the genetic origins of complex diseases utilize statistical approaches to analyze phenotypic traits measured in genetically well-characterized populations. While these quantitative genetics methods are powerful, their success is limited by sampling biases and other confounding factors, and the biological interpretation of results can be challenging since these methods are not based on any functional information for candidate loci. On the other hand, the functional genomics field has greatly expanded in past years, both in terms of experimental approaches and analytical algorithms. However, functional approaches have been applied to understanding phenotypes in only the most basic ways. In this study, we demonstrate that functional genomics can complement traditional quantitative genetics by analytically extracting protein function information from large collections of high throughput data, which can then be used to predict genotype-phenotype associations. We applied our prediction methodology to the laboratory mouse, and we experimentally confirmed a role in osteoporosis for two of our predictions that were not candidates from any previous quantitative genetics study. The ability of our approach to produce accurate and unique predictions implies that functional genomics can complement quantitative genetics and can help address previous limitations in identifying disease genes.
Identifying genes that underlie quantitative trait loci (QTL) is a challenging task. Here, we present a new QTL software system, named QTL MatchMaker. The system is designed to integrate and mine QTL information across human, mouse and rat genomes and to annotate functional genomic data. It combines and organizes information from relevant public databases and publications and integrates QTL, physical, genetic and cytogenetic maps across human, mouse and rat. To make this application available to the research community we have developed a website for high-throughput mapping of expressed sequences to QTL and for selection of candidate genes in the physiological genomics context of complex traits. QTL MatchMaker is accessible at
Conventional multiple-trait quantitative trait locus (QTL) mapping methods must discard cases (individuals) with incomplete phenotypic data, thereby sacrificing other phenotypic and genotypic information contained in the discarded cases. Under standard assumptions about the missing-data mechanism, it is possible to exploit these cases.
We present an expectation-maximization (EM) algorithm, derived for recombinant inbred and F2 genetic models but extensible to any mating design, that supports conventional hypothesis tests for QTL main effect, pleiotropy, and QTL-by-environment interaction in multiple-trait analyses with missing phenotypic data. We evaluate its performance by simulations and illustrate with a real-data example.
The EM method affords improved QTL detection power and precision of QTL location and effect estimation in comparison with case deletion or imputation methods. It may be incorporated into any least-squares or likelihood-maximization QTL-mapping approach.
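To show how cases with incomplete phenotypes can still contribute information, the sketch below runs EM for a plain bivariate normal model in which the second trait is missing for some individuals. It is a simplified stand-in, not the authors' algorithm: it has no QTL genotype component, it assumes the first trait is always observed and the data are missing at random, and the function name `em_bivariate_normal` and the simulated data are hypothetical.

```python
import numpy as np


def em_bivariate_normal(y, n_iter=50):
    """EM estimates of mean and covariance for two traits; NaN in column 1 marks a missing value."""
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y[:, 1])
    n = len(y)
    # Start from the complete cases.
    mu = np.nanmean(y, axis=0)
    cov = np.cov(y[~miss], rowvar=False)
    for _ in range(n_iter):
        # E-step: conditional mean and variance of trait 2 given trait 1.
        beta = cov[0, 1] / cov[0, 0]
        resid_var = cov[1, 1] - beta * cov[0, 1]
        filled = y.copy()
        filled[miss, 1] = mu[1] + beta * (y[miss, 0] - mu[0])
        # M-step: update the mean and covariance from the expected sufficient statistics.
        mu = filled.mean(axis=0)
        centered = filled - mu
        cov = centered.T @ centered / n
        cov[1, 1] += miss.sum() * resid_var / n   # uncertainty of the missing values
    return mu, cov


# Hypothetical data: 200 individuals, trait 2 missing for the first 60.
rng = np.random.default_rng(1)
trait1 = rng.normal(size=200)
trait2 = 0.8 * trait1 + rng.normal(scale=0.6, size=200)
trait2[:60] = np.nan
mu_hat, cov_hat = em_bivariate_normal(np.column_stack([trait1, trait2]))
```

Unlike case deletion, all 200 values of the first trait inform the estimates, and the residual-variance term in the M-step prevents the filled-in values from being treated as if they had been observed.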