Related Articles

1.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
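The two-step idea described above (regress expression on promoter affinity to get one TF activity per segregant, then test that activity against marker genotype) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy affinities, expression values and genotypes are invented, the function names are hypothetical, and a Welch t statistic stands in for whatever linkage statistic the study actually used.

```python
from math import sqrt
from statistics import mean, variance


def ols_slope(x, y):
    """Slope of the least-squares line y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def tf_activity(affinity, expression_by_segregant):
    """One activity estimate per segregant: the slope of that segregant's
    differential expression on promoter affinity, taken across genes."""
    return [ols_slope(affinity, expr) for expr in expression_by_segregant]


def welch_t(activity, genotype):
    """Welch t statistic comparing inferred activity between the two
    parental alleles (coded 0 and 1) at a single marker."""
    a = [v for v, g in zip(activity, genotype) if g == 0]
    b = [v for v, g in zip(activity, genotype) if g == 1]
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


# Hypothetical toy data: 4 genes, 6 segregants, 1 marker.
affinity = [0.0, 1.0, 2.0, 3.0]
true_activity = [0.5, 0.75, 0.25, -0.5, -0.25, -0.75]
expression = [[act * aff for aff in affinity] for act in true_activity]
genotype = [0, 0, 0, 1, 1, 1]

activity = tf_activity(affinity, expression)
t_stat = welch_t(activity, genotype)
print(activity, t_stat)
```

Because each activity is a slope fitted across many genes, it averages out gene-level noise, which is the first of the two sources of increased power the abstract mentions.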
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
2.  Different sets of QTLs influence fitness variation in yeast 
We have carried out a combination of in-lab-evolution (ILE) and congenic crosses to identify the gene sets that contribute to the ability of yeast cells to survive under alkali stress. Each selected line acquired a different set of mutations, all resulting in the same phenotype. We identified a total of 15 genes in ILE and 17 candidates in the congenic approach, and studied their individual contribution to the phenotype. The total additive effect of the QTLs was much larger than the difference between the ancestor and the evolved strains, suggesting epistatic interactions between the QTLs. None of the genes identified encode structural components of the pH machinery. Instead, most encode regulatory functions, such as ubiquitin ligases, chromatin remodelers, GPI anchoring and copper/iron sensing transcription factors.
The majority of phenotypes in nature are complex traits affected by multiple genes [usually called quantitative trait loci (QTLs)], as well as by environmental factors. Many traits with practical importance such as crop yield in plants and susceptibility to various diseases in humans fall under this category. Understanding the architecture of complex traits has become the new frontier of genetic research, and many studies have greatly contributed to this field. However, to date, the genetic basis of only a few of these traits has been identified, and many questions regarding the architecture of complex traits and the accumulation of QTLs during evolution still remain unanswered. Among them are: How many QTLs affect complex phenotypes? What is the effect of each QTL? How do complex traits change during evolution? Is the adaptation process repeatable?, etc. In order to identify the QTLs that affect one of the important components of fitness variability in yeast, and to answer some of the questions above, we combined in-lab evolution (ILE) with the construction of congenic lines to isolate and map several gene sets that contribute to the ability of yeast cells to survive under alkali stress.
We carried out an ILE experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. This process was followed by hybridizations to tiling arrays to identify the mutations acquired during the laboratory selective process. The ILE procedure revealed mutations in 15 genes, thus defining the QTLs and mechanisms that affect, in a quantitative fashion, the ability to cope with alkali stress. Our results indicate that during ILE several populations acquired different sets of QTLs that conferred the same phenotype. We identified each individual mutation in these strains, and validated and estimated their contribution to the phenotype. The total additive effect of the QTLs was much larger than the difference between the ancestor and the evolved strains, suggesting epistatic interactions between the QTLs.
In addition to the ILE, we have studied the mechanisms regulating fitness under alkali stress at natural habitats. We used a clinically isolated strain able to grow at high pH and a standard laboratory strain with a limited ability to sustain high pH as the parents of series of backcrosses to construct congenic lines up to the 8th generation. Seventeen genomic intervals that are candidates to contain QTLs were thus identified. In order to detect the contributing QTL in each interval, a predictive algorithm was applied, which scored the candidate genes in each genomic interval based on their interactions and similarity to the ILE genes. The algorithm was validated by testing the effect of the predicted candidate gene's deletions on the phenotype. Twelve out of 29 deletions were found to affect the trait (P-value 0.023).
Interestingly, our results show that almost all beneficial mutations affected regulatory genes, and not structural components of the pH homeostasis machinery (such as proton pumps, which control the cell's pH). The genes identified affect global regulators, such as ubiquitin ligases, proteins involved in GPI anchoring, copper sensing and chromatin remodelers. Thus, we show that adaptive changes tend to occur in genes with wide influence, rather than in genes narrowly affecting the phenotype selected for.
One example of genes identified both in the ILE and in the congenic lines is the copper-sensing transcription factor MAC1, and its downstream targets CTR1 and CTR3, which encode copper transporters. Different mutations at the same residue (Cys 271) were found in four out of five independent ILE lines. These mutations inactivate a copper-sensing region of Mac1 and cause up-regulation of its target genes. The CTR1 and CTR3 genes were identified in the congenic lines. Moreover, we found that a Ty transposable element is responsible for the decreased expression of CTR3 in some strains, and its excision caused transcriptional activation, affecting the ability to thrive at high pH.
This work provides insights on both evolutionary and genetic issues (such as the appearance of adaptive mutations and the architecture of complex traits), while at the same time providing information about the mechanisms that contribute to growth at high pH, a subject with ramifications for cell physiology, pathogenicity, and stress response.
Most of the phenotypes in nature are complex and are determined by many quantitative trait loci (QTLs). In this study we identify gene sets that contribute to one important complex trait: the ability of yeast cells to survive under alkali stress. We carried out an in-lab evolution (ILE) experiment, in which we grew yeast populations under increasing alkali stress to enrich for beneficial mutations. The populations acquired different sets of affecting alleles, showing that evolution can provide alternative solutions to the same challenge. We measured the contribution of each allele to the phenotype. The sum of the effects of the QTLs was larger than the difference between the ancestor phenotype and the evolved strains, suggesting epistatic interactions between the QTLs. In parallel, a clinically isolated strain was used to map natural QTLs affecting growth at high pH. In all, 17 candidate regions were found. Using a predictive algorithm based on the distances in protein-interaction networks, candidate genes were defined and validated by gene disruption. Many of the QTLs found by both methods are not directly implicated in pH homeostasis but have more general, and often regulatory, roles.
PMCID: PMC2835564  PMID: 20160707
congenic lines; growth on alkali; in-lab evolution; QTL mapping; Saccharomyces cerevisiae
3.  Impact of Natural Genetic Variation on Gene Expression Dynamics 
PLoS Genetics  2013;9(6):e1003514.
DNA sequence variation causes changes in gene expression, which in turn has profound effects on cellular states. These variations affect tissue development and may ultimately lead to pathological phenotypes. A genetic locus containing a sequence variation that affects gene expression is called an “expression quantitative trait locus” (eQTL). Whereas the impact of cellular context on expression levels in general is well established, a lot less is known about the cell-state specificity of eQTL. Previous studies differed with respect to how “dynamic eQTL” were defined. Here, we propose a unified framework distinguishing static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. Further, we introduce a new approach to simultaneously infer eQTL from different cell types. By using murine mRNA expression data from four stages of hematopoiesis and 14 related cellular traits, we demonstrate that static, conditional and dynamic eQTL, although derived from the same expression data, represent functionally distinct types of eQTL. While static eQTL affect generic cellular processes, non-static eQTL are more often involved in hematopoiesis and immune response. Our analysis revealed substantial effects of individual genetic variation on cell type-specific expression regulation. Among a total number of 3,941 eQTL we detected 2,729 static eQTL, 1,187 eQTL were conditionally active in one or several cell types, and 70 eQTL affected expression changes during cell type transitions. We also found evidence for feedback control mechanisms reverting the effect of an eQTL specifically in certain cell types. Loci correlated with hematological traits were enriched for conditional eQTL, thus, demonstrating the importance of conditional eQTL for understanding molecular mechanisms underlying physiological trait variation. 
The classification proposed here has the potential to streamline and unify future analysis of conditional and dynamic eQTL as well as many other kinds of QTL data.
Author Summary
Complex physiological traits are affected through subtle changes of molecular traits like gene expression in the relevant tissues, which in turn are caused by genetic variation. A genetic locus containing a sequence variation affecting gene expression is called an expression quantitative trait locus (eQTL). Understanding the tissue and cell type specificity of eQTL effects is essential for revealing the molecular mechanisms underlying disease phenotypes. However, so far the cell-state dependence of eQTL is poorly understood. In order to systematically assess the importance of cell state-specific eQTL, we propose to distinguish static, conditional and dynamic eQTL and suggest strategies for mapping these eQTL classes. We applied our framework to mouse gene expression data from four hematopoietic stages and related cellular traits. The different eQTL classes, although derived from the same expression data, represent functionally distinct types of eQTL. Importantly, conditional eQTL are well correlated with relevant hematological traits. These findings emphasize the condition specificity of many regulatory relationships, even if the conditions under study are related. This calls for due caution when transferring conclusions about regulatory mechanisms across cell types or tissues. The proposed classification will also help to unravel dynamic behaviors in many other kinds of QTL data.
PMCID: PMC3674999  PMID: 23754949
4.  A general and efficient method for estimating continuous IBD functions for use in genome scans for QTL 
BMC Bioinformatics  2007;8:440.
Identity by descent (IBD) matrix estimation is a central component in mapping of Quantitative Trait Loci (QTL) using variance component models. A large number of algorithms have been developed for estimation of IBD between individuals in populations at discrete locations in the genome for use in genome scans to detect QTL affecting various traits of interest in experimental animal, human and agricultural pedigrees. Here, we propose a new approach to estimate IBD as continuous functions rather than as discrete values.
Estimation of IBD functions improved the computational efficiency and memory usage in genome scanning for QTL. We have explored two approaches to obtain continuous marker-bracket IBD-functions. By re-implementing an existing and fast deterministic IBD-estimation method, we show that this approach results in IBD functions that produce the exact same IBD as the original algorithm, but with a greater than 2-fold improvement of the computational efficiency and a considerably lower memory requirement for storing the resulting genome-wide IBD. By developing a general IBD function approximation algorithm, we show that it is possible to estimate marker-bracket IBD functions from IBD matrices estimated at marker locations by any existing IBD estimation algorithm. The general algorithm provides approximations that lead to QTL variance component estimates that even in worst-case scenarios are very similar to the true values. The approach of storing IBD as polynomial IBD functions was also shown to reduce the amount of memory required in genome scans for QTL.
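To illustrate what storing IBD as a continuous function can look like, the sketch below interpolates a polynomial through a few IBD estimates for one pair of individuals inside a marker bracket and evaluates it at an arbitrary position. The abstract does not describe the authors' algorithms in detail, so this uses plain Lagrange interpolation as a stand-in; the positions (in cM) and IBD values are invented for illustration.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical IBD estimates for one pair of individuals at three
# positions (cM) spanning a single marker bracket.
positions = [0.0, 5.0, 10.0]
ibd_values = [0.5, 0.3, 0.5]

# IBD at any position in the bracket is recovered from the stored nodes,
# so no dense grid of IBD matrices has to be kept in memory.
ibd_at_2_5 = lagrange_eval(positions, ibd_values, 2.5)
print(ibd_at_2_5)
```

Only a handful of numbers per pair and bracket are stored, which is the kind of memory saving the abstract reports for polynomial IBD functions.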
In addition to direct improvements in computational and memory efficiency, estimation of IBD-functions is a fundamental step needed to develop and implement new efficient optimization algorithms for high precision localization of QTL. Here, we discuss and test two approaches for estimating IBD functions based on existing IBD estimation algorithms. Our approaches provide immediately useful techniques for use in single QTL analyses in the variance component QTL mapping framework. They will, however, be particularly useful in genome scans for multiple interacting QTL, where the improvements in both computational and memory efficiency are the key for successful development of efficient optimization algorithms to allow widespread use of this methodology.
PMCID: PMC2194736  PMID: 17999749
5.  Fast and Accurate Detection of Multiple Quantitative Trait Loci 
Journal of Computational Biology  2013;20(9):687-702.
We present a new computational scheme that enables efficient and reliable quantitative trait loci (QTL) scans for experimental populations. Using a standard brute-force exhaustive search effectively prevents accurate QTL scans involving more than two loci from being performed in practice, at least if permutation testing is used to determine significance. Some more elaborate global optimization approaches, for example DIRECT, have previously been adapted to QTL search problems. Dramatic speedups have been reported for high-dimensional scans. However, since a heuristic termination criterion must be used in these types of algorithms, the accuracy of the optimization process cannot be guaranteed. Indeed, earlier results show that a small bias in the significance thresholds is sometimes introduced.
Our new optimization scheme, PruneDIRECT, is based on an analysis leading to a computable (Lipschitz) bound on the slope of a transformed objective function. The bound is derived for both infinite- and finite-size populations. Introducing a Lipschitz bound in DIRECT leads to an algorithm related to classical Lipschitz optimization. Regions in the search space can be permanently excluded (pruned) during the optimization process. Heuristic termination criteria can thus be avoided. Hence, PruneDIRECT has a well-defined error bound and can in practice be guaranteed to be equivalent to a corresponding exhaustive search. We present simulation results that show that for simultaneous mapping of three QTLs using permutation testing, PruneDIRECT is typically more than 50 times faster than exhaustive search. The speedup is higher for stronger QTL. This could be used to quickly detect strong candidate eQTL networks.
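The pruning principle can be shown with a one-dimensional sketch of classical Lipschitz branch-and-bound. This is not PruneDIRECT itself: the objective function, the Lipschitz constant K and the tolerance eps below are assumptions chosen for illustration. An interval is discarded permanently once the bound shows it cannot beat the best value found so far by more than eps, so the search ends without a heuristic stopping rule.

```python
def lipschitz_maximize(f, lo, hi, K, eps=1e-3):
    """Maximize f on [lo, hi], assuming |f(x) - f(y)| <= K * |x - y|.

    Returns (best_x, best_val, n_evaluations). The global maximum is
    guaranteed to be at most best_val + eps.
    """
    best_x, best_val = None, float("-inf")
    evaluations = 0
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        mid = (a + b) / 2
        val = f(mid)
        evaluations += 1
        if val > best_val:
            best_x, best_val = mid, val
        # No point in [a, b] can exceed the midpoint value by more than
        # K times the half-width of the interval.
        upper = val + K * (b - a) / 2
        if upper <= best_val + eps:
            continue  # pruned: this interval cannot improve the result
        stack.append((a, mid))
        stack.append((mid, b))
    return best_x, best_val, evaluations


def objective(x):
    """Hypothetical objective with its maximum (value 0) at x = 0.3."""
    return -(x - 0.3) ** 2


# On [0, 1] the slope of the objective is at most 1.4, so K = 2 is valid.
best_x, best_val, n_eval = lipschitz_maximize(objective, 0.0, 1.0, K=2.0)
print(best_x, best_val, n_eval)
```

The search terminates because any interval narrower than 2 * eps / K is always pruned, and intervals far from the optimum are discarded early, which is where the savings over exhaustive search come from.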
PMCID: PMC3761440  PMID: 23919387
algorithms; branch-and-bound; genetic mapping; genomics; statistical models; statistics
6.  Integrating genome annotation and QTL position to identify candidate genes for productivity, architecture and water-use efficiency in Populus spp 
BMC Plant Biology  2012;12:173.
Hybrid poplar species are candidates for biomass production but breeding efforts are needed to combine productivity and water use efficiency in improved cultivars. The understanding of the genetic architecture of growth in poplar by a Quantitative Trait Loci (QTL) approach can help us to elucidate the molecular basis of such integrative traits but identifying candidate genes underlying these QTLs remains difficult. Nevertheless, the increase of genomic information together with the accessibility of a reference genome sequence (Populus trichocarpa Nisqually-1) makes it possible to bridge QTL information on genetic maps and physical location of candidate genes on the genome. The objective of the study is to identify QTLs controlling productivity, architecture and leaf traits in a P. deltoides x P. trichocarpa F1 progeny and to identify candidate genes underlying QTLs based on the anchoring of genetic maps on the genome and the gene ontology information linked to genome annotation. The strategy to explore genome annotation was to use Gene Ontology enrichment tools to test if some functional categories are statistically over-represented in QTL regions.
Four leaf traits and 7 growth traits were measured on 330 F1 P. deltoides x P. trichocarpa progeny. A total of 77 QTLs controlling 11 traits were identified explaining from 1.8 to 17.2% of the variation of traits. For 58 QTLs, confidence intervals could be projected on the genome. An extended functional annotation was built based on data retrieved from the plant genome database Phytozome and from an inference of function using homology between Populus and the model plant Arabidopsis. Genes located within QTL confidence intervals were retrieved and enrichments in gene ontology (GO) terms were determined using different methods. Significant enrichments were found for all traits. Particularly relevant biological processes GO terms were identified for QTLs controlling number of sylleptic branches: intervals were enriched in GO terms of biological process like ‘ripening’ and ‘adventitious roots development’.
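A common way to test whether a GO term is over-represented among the genes in a QTL confidence interval is a one-sided hypergeometric test. The abstract says only that "different methods" were used, so the sketch below is one standard option rather than the study's procedure, and the gene counts are invented.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_annotated, n_interval, n_hits):
    """P(X >= n_hits) when n_interval genes are drawn without replacement
    from n_genome genes, of which n_annotated carry the GO term."""
    total = comb(n_genome, n_interval)
    upper = min(n_annotated, n_interval)
    tail = sum(
        comb(n_annotated, i) * comb(n_genome - n_annotated, n_interval - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 genes in total, 5 annotated with the term,
# 5 genes inside the QTL interval, all 5 of them annotated.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
print(p_value)  # 1/252
```

In a real analysis many GO terms are tested per interval, so the resulting p-values would also need a multiple-testing correction.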
Beyond the simple identification of QTLs, this study is the first to use a global approach of GO terms enrichment analysis to fully explore gene function under QTLs confidence intervals in plants. This global approach may lead to identification of new candidate genes for traits of interest.
PMCID: PMC3520807  PMID: 23013168
7.  Detection of quantitative trait loci affecting serum cholesterol, LDL, HDL, and triglyceride in pigs 
BMC Genetics  2011;12:62.
Serum lipids are associated with many serious cardiovascular diseases and obesity problems. Many quantitative trait loci (QTL) have been reported in the pig mostly for performance traits but very few for the serum lipid traits. In contrast, remarkable numbers of QTL are mapped for serum lipids in humans and mice. Therefore, the objective of this research was to investigate the chromosomal regions influencing the serum level of the total cholesterol (CT), triglyceride (TG), high-density lipoprotein cholesterol (HDL) and low-density lipoprotein cholesterol (LDL) in pigs. For this purpose, a total of 330 animals from a Duroc × Pietrain F2 resource population were phenotyped for serum lipids using ELISA and were genotyped by using 122 microsatellite markers covering all porcine autosomes for QTL study in QTL Express. Blood sampling was performed at approximately 175 days before slaughter of the pig.
Most of the traits were correlated with each other and were influenced by average daily gain, slaughter date and age. A total of 18 QTL including three QTL with imprinting effect were identified on 11 different porcine autosomes. Most of the QTL reached the 5% chromosome-wide (CW) level of significance, including a QTL at the 5% experiment-wide (GW) and a QTL at the 1% GW level of significance. Of these QTL four were identified for both the CT and LDL and two QTL were identified for both the TG and LDL. Moreover, three chromosomal regions were detected for the HDL/LDL ratio in this study. One QTL for HDL on SSC2 and two QTL for TG on SSC11 and 17 were detected with imprinting effect. The highly significant QTL (1% GW) was detected for LDL at 82 cM on SSC1, whereas significant QTL (5% GW) was identified for HDL/LDL on SSC1 at 87 cM. Chromosomal regions with pleiotropic effects were detected for correlated traits on SSC1, 7 and 12. Most of the QTL identified for serum lipid traits correspond with the previously reported QTL for similar traits in other mammals. Two novel QTL on SSC16 for HDL and HDL/LDL ratio and an imprinted QTL on SSC17 for TG were detected in the pig for the first time.
The newly identified QTL are potentially involved in lipid metabolism. The results of this work shed new light on the genetic background of serum lipid concentrations and these findings will be helpful to identify candidate genes in these QTL regions related to lipid metabolism and serum lipid concentrations in pigs.
PMCID: PMC3146427  PMID: 21752294
pig; QTL; serum lipids; F2 population
8.  Statistical properties of interval mapping methods on quantitative trait loci location: impact on QTL/eQTL analyses 
BMC Genetics  2012;13:29.
Quantitative trait loci (QTL) detection on a huge amount of phenotypes, like eQTL detection on transcriptomic data, can be dramatically impaired by the statistical properties of interval mapping methods. One of these major outcomes is the high number of QTL detected at marker locations. The present study aims at identifying and specifying the sources of this bias, in particular in the case of analysis of data issued from outbred populations. Analytical developments were carried out in a backcross situation in order to specify the bias and to propose an algorithm to control it. The outbred population context was studied through simulated data sets in a wide range of situations.
The likelihood ratio test was firstly analyzed under the "one QTL" hypothesis in a backcross population. Designs of sib families were then simulated and analyzed using the QTL Map software. On the basis of the theoretical results in backcross, parameters such as the population size, the density of the genetic map, the QTL effect and the true location of the QTL, were taken into account under the "no QTL" and the "one QTL" hypotheses. A combination of two non parametric tests - the Kolmogorov-Smirnov test and the Mann-Whitney-Wilcoxon test - was used in order to identify the parameters that affected the bias and to specify how much they influenced the estimation of QTL location.
A theoretical expression of the bias of the estimated QTL location was obtained for a backcross type population. We demonstrated a common source of bias under the "no QTL" and the "one QTL" hypotheses and qualified the possible influence of several parameters. Simulation studies confirmed that the bias exists in outbred populations under both the hypotheses of "no QTL" and "one QTL" on a linkage group. The QTL location was systematically closer to marker locations than expected, particularly in the case of low QTL effect, small population size or low density of markers, i.e. designs with low power. Practical recommendations for experimental designs for QTL detection in outbred populations are given on the basis of this bias quantification. Furthermore, an original algorithm is proposed to adjust the location of a QTL, obtained with interval mapping, when it co-locates with a marker.
Therefore, one should be attentive when one QTL is mapped at the location of one marker, especially under low power conditions.
PMCID: PMC3386024  PMID: 22520935
QTL; linkage analysis; QTL location; bias
9.  MetaQTL: a package of new computational methods for the meta-analysis of QTL mapping experiments 
BMC Bioinformatics  2007;8:49.
Integration of multiple results from Quantitative Trait Loci (QTL) studies is a key point to understand the genetic determinism of complex traits. Up to now many efforts have been made by public database developers to facilitate the storage, compilation and visualization of multiple QTL mapping experiment results. However, studying the congruency between these results still remains a complex task. Presently, the few computational and statistical frameworks to do so are mainly based on empirical methods (e.g. consensus genetic maps are generally built by iterative projection).
In this article, we present a new computational and statistical package, called MetaQTL, for carrying out whole-genome meta-analysis of QTL mapping experiments. Contrary to existing methods, MetaQTL offers a complete statistical process to establish a consensus model for both the marker and the QTL positions on the whole genome. First, MetaQTL implements a new statistical approach to merge multiple distinct genetic maps into a single consensus map which is optimal in terms of weighted least squares and can be used to investigate recombination rate heterogeneity between studies. Secondly, assuming that QTL can be projected on the consensus map, MetaQTL offers a new clustering approach based on a Gaussian mixture model to decide how many QTL underlie the distribution of the observed QTL.
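As a much-reduced illustration of the weighted least squares idea, the sketch below combines several studies' estimates of a single position into one consensus value, weighting each estimate by the inverse of its variance. MetaQTL's consensus-map procedure works on whole maps and is not reproduced here; the positions and variances are invented.

```python
def wls_consensus(positions, variances):
    """Inverse-variance weighted mean, the weighted least squares estimate
    of one common position from independent noisy estimates.

    Returns (consensus_position, variance_of_consensus).
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    consensus = sum(w * p for w, p in zip(weights, positions)) / total_weight
    return consensus, 1.0 / total_weight


# Hypothetical estimates (cM) of the same locus from two studies; the
# first study is more precise and therefore gets the larger weight.
position, variance = wls_consensus([10.0, 14.0], [1.0, 4.0])
print(position, variance)  # about 10.8 and 0.8
```

The consensus variance is smaller than either input variance, which mirrors the abstract's point that pooling enough observed QTL shortens the confidence interval for QTL location.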
We demonstrate using simulations that the usual model choice criteria from mixture model literature perform relatively well in this context. As expected, simulations also show that this new clustering algorithm leads to a reduction in the length of the confidence interval of QTL location provided that across studies there are enough observed QTL for each underlying true QTL location. The usefulness of our approach is illustrated on published QTL detection results of flowering time in maize. Finally, MetaQTL is freely available at .
PMCID: PMC1808479  PMID: 17288608
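The abstract above notes that clustering observed QTL shortens the confidence interval of the QTL location when enough studies report the same underlying QTL. A minimal sketch of why, assuming the observed QTL have already been assigned to a single cluster and that each carries a known standard error: the pooled position is the inverse-variance weighted mean, and its standard error is smaller than that of any single study. The function names and the example numbers are hypothetical, and this is not MetaQTL's own implementation.

```python
import math


def pool_qtl_estimates(positions_cm, std_errors_cm):
    """Pool QTL positions assumed to share one true location.

    Each study is weighted by the inverse of its variance, so precise
    studies count more. Returns (consensus position, consensus std error).
    """
    weights = [1.0 / (se * se) for se in std_errors_cm]
    total_weight = sum(weights)
    consensus = sum(w * x for w, x in zip(weights, positions_cm)) / total_weight
    consensus_se = math.sqrt(1.0 / total_weight)
    return consensus, consensus_se


def ci95(position_cm, se_cm):
    """Approximate 95% confidence interval under a Gaussian assumption."""
    half_width = 1.96 * se_cm
    return position_cm - half_width, position_cm + half_width


# Hypothetical QTL positions (cM on a consensus map) from three studies.
positions = [42.0, 45.0, 47.5]
std_errors = [4.0, 5.0, 6.0]

consensus, consensus_se = pool_qtl_estimates(positions, std_errors)
low, high = ci95(consensus, consensus_se)
print(f"consensus = {consensus:.2f} cM, 95% CI = [{low:.2f}, {high:.2f}]")
```

Deciding how many clusters exist is the separate model-choice step the abstract mentions; this sketch covers only the pooling within one cluster.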
10.  Confirmation and fine-mapping of a major QTL for resistance to infectious pancreatic necrosis in Atlantic salmon (Salmo salar): population-level associations between markers and trait 
BMC Genomics  2009;10:368.
Infectious pancreatic necrosis (IPN) is one of the most prevalent and economically devastating diseases in Atlantic salmon (Salmo salar) farming worldwide. The disease causes large mortalities at both the fry- and post-smolt stages. Family selection for increased IPN resistance is performed through the use of controlled challenge tests, where survival rates of sib-groups are recorded. However, since challenge-tested animals cannot be used as breeding candidates, within-family selection is not performed and only half of the genetic variation for IPN resistance is being exploited. DNA markers linked to quantitative trait loci (QTL) affecting IPN resistance would therefore be a powerful selection tool. The aim of this study was to identify and fine-map QTL for IPN-resistance in Atlantic salmon, for use in marker-assisted selection to increase the rate of genetic improvement for this trait.
A genome scan was carried out using 10 large full-sib families of challenge-tested Atlantic salmon post-smolts and microsatellite markers distributed across the genome. One major QTL for IPN-resistance was detected, explaining 29% and 83% of the phenotypic and genetic variances, respectively. This QTL mapped to the same location as a QTL recently detected in a Scottish Atlantic salmon population. The QTL was found to be segregating in 10 out of 20 mapping parents, and subsequent fine-mapping with additional markers narrowed the QTL peak to a 4 cM region on linkage group 21. Challenge-tested fry were used to show that the QTL had the same effect on fry as on post-smolt, with the confidence interval for QTL position in fry overlapping the confidence interval found in post-smolts. A total of 178 parents were tested for segregation of the QTL, identifying 72 QTL-heterozygous parents. Genotypes at QTL-heterozygous parents were used to determine linkage phases between alleles at the underlying DNA polymorphism and alleles at single markers or multi-marker haplotypes. One four-marker haplotype was found to be the best predictor of QTL alleles, and was successfully used to deduce genotypes of the underlying polymorphism in 72% of the parents of the next generation within a breeding nucleus. A highly significant population-level correlation was found between deduced alleles at the underlying polymorphism and survival of offspring groups in the fry challenge test, parents with the three deduced genotypes (QQ, Qq, qq) having mean offspring mortality rates of 0.13, 0.32, and 0.49, respectively. The frequency of the high-resistance allele (Q) in the population was estimated to be 0.30. Apart from this major QTL, one other experiment-wise significant QTL for IPN-resistance was detected, located on linkage group 4.
The QTL confirmed in this study represents a case of a major gene explaining the bulk of genetic variation for a presumed complex trait. QTL genotypes were deduced within most parents of the 2005 generation of a major breeding company, providing a solid framework for linkage-based MAS within the whole population in subsequent generations. Since haplotype-trait associations valid at the population level were found, there is also a potential for MAS based on linkage disequilibrium (LD). However, in order to use MAS across many generations without reassessment of linkage phases between markers and the underlying polymorphism, the QTL needs to be positioned with even greater accuracy. This will require higher marker densities than are currently available.
PMCID: PMC2728743  PMID: 19664221
11.  Mapping epistatic quantitative trait loci 
BMC Genetics  2014;15(1):112.
How to map quantitative trait loci (QTL) with epistasis efficiently and reliably has been a persistent problem for QTL mapping analysis. There are a number of difficulties for studying epistatic QTL. Linkage can impose a significant challenge for finding epistatic QTL reliably. If multiple QTL are in linkage and have interactions, searching for QTL can become a very delicate issue. A commonly used strategy that performs a two-dimensional genome scan to search for a pair of QTL with epistasis can suffer from low statistical power and also may lead to false identification due to complex linkage disequilibrium and interaction patterns.
To tackle the problem of complex interaction of multiple QTL with linkage, we developed a three-stage search strategy. In the first stage, main-effect QTL are searched for and mapped. In the second stage, epistatic QTL that interact significantly with other identified QTL are searched for. In the third stage, new epistatic QTL are searched for in pairs. This strategy is based on the consideration that most genetic variance is due to the main effects of QTL. Thus, by first mapping those main-effect QTL, the statistical power for the second and third stages of analysis for mapping epistatic QTL can be maximized. The search for main-effect QTL is robust and does not bias the search for epistatic QTL, because under the orthogonal genetic model the additive and additive-by-additive variances are independent despite linkage. The model search criterion is empirically and dynamically evaluated by using a score-statistic based resampling procedure. We demonstrate through simulations that the method has good power and a low false positive rate in the identification of QTL and epistasis.
This method provides an effective and powerful solution to map multiple QTL with complex epistatic pattern. The method has been implemented in the user-friendly computer software Windows QTL Cartographer. This will greatly facilitate the application of the method for QTL mapping data analysis.
Electronic supplementary material
The online version of this article (doi:10.1186/s12863-014-0112-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4226885  PMID: 25367219
Quantitative trait loci; Epistasis; Model selection; Sequential search
12.  A Multiparent Advanced Generation Inter-Cross to Fine-Map Quantitative Traits in Arabidopsis thaliana 
PLoS Genetics  2009;5(7):e1000551.
Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations, therefore alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.
Author Summary
Most traits of economic and evolutionary interest vary quantitatively and have multiple genes affecting their expression. Dissecting the genetic basis of such traits is crucial for the improvement of crops and management of diseases. Here, we develop a new resource to identify genes underlying such quantitative traits in Arabidopsis thaliana, a genetic model organism in plants. We show that using a large population of inbred lines derived from intercrossing 19 parents, we can localize the genes underlying quantitative traits better than with existing methods. Using these lines, we were able to replicate the identification of previously known genes that affect developmental traits in A. thaliana and identify some new ones. This paper also presents all the biological and computational material necessary for the scientific community to use these lines in their own research. Our results suggest that the use of lines derived from a multiparent advanced generation inter-cross (MAGIC lines) should be very useful in other organisms.
PMCID: PMC2700969  PMID: 19593375
13.  Effect of advanced intercrossing on genome structure and on the power to detect linked quantitative trait loci in a multi-parent population: a simulation study in rice 
BMC Genetics  2014;15:50.
In genetic analysis of agronomic traits, quantitative trait loci (QTLs) that control the same phenotype are often closely linked. Furthermore, many QTLs are localized in specific genomic regions (QTL clusters) that include naturally occurring allelic variations in different genes. Therefore, linkage among QTLs may complicate the detection of each individual QTL. This problem can be resolved by using populations that include many potential recombination sites. Recently, multi-parent populations have been developed and used for QTL analysis. However, their efficiency for detection of linked QTLs has not received attention. By using information on rice, we simulated the construction of a multi-parent population followed by cycles of recurrent crossing and inbreeding, and we investigated the resulting genome structure and its usefulness for detecting linked QTLs as a function of the number of cycles of recurrent crossing.
The number of non-recombinant genome segments increased linearly with an increasing number of cycles. The mean and median lengths of the non-recombinant genome segments decreased dramatically during the first five to six cycles, then decreased more slowly during subsequent cycles. Without recurrent crossing, we found that there is a risk of missing QTLs that are linked in a repulsion phase, and a risk of identifying linked QTLs in a coupling phase as a single QTL, even when the population was derived from eight parental lines. In our simulation results, using fewer than two cycles of recurrent crossing produced results that differed little from the results with zero cycles, whereas using more than six cycles dramatically improved the power under most of the conditions that we simulated.
Our results indicated that even with a population derived from eight parental lines, fewer than two cycles of crossing does not improve the power to detect linked QTLs. However, using six cycles dramatically improved the power, suggesting that advanced intercrossing can help to resolve the problems that result from linkage among QTLs.
PMCID: PMC4101851  PMID: 24767139
QTL; Rice; Simulation; Advanced intercrossing
14.  A Bayesian Partition Method for Detecting Pleiotropic and Epistatic eQTL Modules 
PLoS Computational Biology  2010;6(1):e1000642.
Studies of the relationship between DNA variation and gene expression variation, often referred to as “expression quantitative trait loci (eQTL) mapping”, have been conducted in many species and resulted in many significant findings. Because of the large number of genes and genetic markers in such analyses, it is extremely challenging to discover how a small number of eQTLs interact with each other to affect mRNA expression levels for a set of co-regulated genes. We present a Bayesian method to facilitate the task, in which co-expressed genes mapped to a common set of markers are treated as a module characterized by latent indicator variables. A Markov chain Monte Carlo algorithm is designed to search simultaneously for the module genes and their linked markers. We show by simulations that this method is more powerful for detecting true eQTLs and their target genes than traditional QTL mapping methods. We applied the procedure to a data set consisting of gene expression and genotypes for 112 segregants of S. cerevisiae. Our method identified modules containing genes mapped to previously reported eQTL hot spots, and dissected these large eQTL hot spots into several modules corresponding to possibly different biological functions or primary and secondary responses to regulatory perturbations. In addition, we identified nine modules associated with pairs of eQTLs, of which two have been previously reported. We demonstrated that one of the novel modules containing many daughter-cell expressed genes is regulated by AMN1 and BPH1. In conclusion, the Bayesian partition method which simultaneously considers all traits and all markers is more powerful for detecting both pleiotropic and epistatic effects based on both simulated and empirical data.
Author Summary
Genome-wide association studies (GWAS) have yielded several causal genes for many human diseases. However, the mechanisms underlying how DNA variations affect disease phenotypes have not been well understood in many cases. Gene expression is intermediate between DNA and clinical endpoints. Linking DNA variation and gene expression variation, often referred to as “expression quantitative trait loci (eQTL) mapping”, has yielded clues of mechanisms and pathways by which DNA variations impact phenotypes. Because of the large number of genes and genetic markers in such analyses, it is extremely challenging to discover how a small number of eQTLs interact with each other to affect mRNA expression levels for a set of co-regulated genes. We present a Bayesian method to identify genetic interactions and more eQTLs by treating co-expressed genes as a module. Our method provides a tool to study genetic interactions in human disease models.
PMCID: PMC2797600  PMID: 20090830
15.  Growth-related quantitative trait loci in domestic and wild rainbow trout (Oncorhynchus mykiss) 
BMC Genetics  2010;11:63.
Somatic growth is a complex process that involves the action and interaction of genes and environment. A number of quantitative trait loci (QTL) previously identified for body weight and condition factor in rainbow trout (Oncorhynchus mykiss), and two other salmonid species, were used to further investigate the genetic architecture of growth-influencing genes in this species. Relationships among previously mapped candidate genes for growth and their co-localization to identified QTL regions are reported. Furthermore, using a comparative genomic analysis of syntenic rainbow trout linkage group clusters to their homologous regions within model teleost species such as zebrafish, stickleback and medaka, inferences were made regarding additional possible candidate genes underlying identified QTL regions.
Body weight (BW) QTL were detected on the majority of rainbow trout linkage groups across 10 parents from 3 strains. However, only 10 linkage groups (i.e., RT-3, -6, -8, -9, -10, -12, -13, -22, -24, -27) possessed QTL regions with chromosome-wide or genome-wide effects across multiple parents. Fewer QTL for condition factor (K) were identified and only six instances of co-localization across families were detected (i.e. RT-9, -15, -16, -23, -27, -31 and RT-2/9 homeologs). Of note, both BW and K QTL co-localize on RT-9 and RT-27. The incidence of epistatic interaction across genomic regions within different female backgrounds was also examined, and although evidence for interaction effects within certain QTL regions were evident, these interactions were few in number and statistically weak. Of interest, however, was the fact that these predominantly occurred within K QTL regions. Currently mapped growth candidate genes are largely congruent with the identified QTL regions. More QTL were detected in male, compared to female parents, with the greatest number evident in an F1 male parent derived from an intercross between domesticated and wild strain of rainbow trout which differed strongly in growth rate.
Strain background influences the degree to which QTL effects are evident for growth-related genes. The process of domestication (which primarily selects faster growing fish) may largely reduce the genetic influences on growth-specific phenotypic variation. Although heritabilities have been reported to be relatively high for both BW and K growth traits, the genetic architecture of K phenotypic variation appears less defined (i.e., fewer major contributing QTL regions were identified compared with BW QTL regions).
PMCID: PMC2914766  PMID: 20609225
16.  A microsatellite-based consensus linkage map for species of Eucalyptus and a novel set of 230 microsatellite markers for the genus 
BMC Plant Biology  2006;6:20.
Eucalypts are the most widely planted hardwood trees in the world occupying globally more than 18 million hectares as an important source of carbon neutral renewable energy and raw material for pulp, paper and solid wood. Quantitative Trait Loci (QTLs) in Eucalyptus have been localized on pedigree-specific RAPD or AFLP maps seriously limiting the value of such QTL mapping efforts for molecular breeding. The availability of a genus-wide genetic map with transferable microsatellite markers has become a must for the effective advancement of genomic undertakings. This report describes the development of a novel set of 230 EMBRA microsatellites, the construction of the first comprehensive microsatellite-based consensus linkage map for Eucalyptus and the consolidation of existing linkage information for other microsatellites and candidate genes mapped in other species of the genus.
The consensus map covers ~90% of the recombining genome of Eucalyptus, involves 234 mapped EMBRA loci on 11 linkage groups, an observed length of 1,568 cM and a mean distance between markers of 8.4 cM. A compilation of all microsatellite linkage information published in Eucalyptus allowed us to establish the homology among linkage groups between this consensus map and other maps published for E. globulus. Comparative mapping analyses also resulted in the linkage group assignment of other 41 microsatellites derived from other Eucalyptus species as well as candidate genes and QTLs for wood and flowering traits published in the literature. This report significantly increases the availability of microsatellite markers and mapping information for species of Eucalyptus and corroborates the high conservation of microsatellite flanking sequences and locus ordering between species of the genus.
This work represents an important step forward for Eucalyptus comparative genomics, opening stimulating perspectives for evolutionary studies and molecular breeding applications. The generalized use of an increasingly larger set of interspecific transferable markers and consensus mapping information, will allow faster and more detailed investigations of QTL synteny among species, validation of expression-QTL across variable genetic backgrounds and positioning of a growing number of candidate genes co-localized with QTLs, to be tested in association mapping experiments.
PMCID: PMC1599733  PMID: 16995939
17.  Simultaneous inferences based on empirical Bayes methods and false discovery rates in eQTL data analysis 
BMC Genomics  2013;14(Suppl 8):S8.
Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with complex human diseases, clinical conditions and traits. Genetic mapping of expression quantitative trait loci (eQTLs) is providing us with novel functional effects of thousands of single nucleotide polymorphisms (SNPs). In a classical quantitative trait loci (QTL) mapping problem multiple tests are done to assess whether one trait is associated with a number of loci. In contrast to QTL studies, thousands of traits are measured along with thousands of gene expressions in an eQTL study. For such a study, a huge number of tests have to be performed (~10^6). This extreme multiplicity gives rise to many computational and statistical problems. In this paper we have tried to address these issues using two closely related inferential approaches: an empirical Bayes method that bears the Bayesian flavor without having much a priori knowledge and the frequentist method of false discovery rates. A three-component t-mixture model has been used for the parametric empirical Bayes (PEB) method. Inferences have been obtained using the Expectation/Conditional Maximization Either (ECME) algorithm. A simulation study has also been performed and has been compared with a nonparametric empirical Bayes (NPEB) alternative.
The results show that PEB has an edge over NPEB. The proposed methodology has been applied to human liver cohort (LHC) data. Our method enables the discovery of more significant SNPs with FDR<10% compared to the previous study done by Yang et al. (Genome Research, 2010).
In contrast to previously available methods based on p-values, the empirical Bayes method uses the local false discovery rate (lfdr) as the threshold. This method controls the false positive rate.
PMCID: PMC4042241  PMID: 24564682
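The abstract above contrasts its local-fdr approach with frequentist false discovery rate control over roughly a million tests. As a point of reference, the standard Benjamini-Hochberg step-up procedure can be sketched as follows. This is a swapped-in textbook technique, not the paper's t-mixture empirical Bayes method, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of tests rejected by the Benjamini-Hochberg procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= k * fdr / m,
    and reject the k tests with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten SNP-expression tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(p_values, fdr=0.10)
print(significant)
```

Because the procedure is step-up, a test can be rejected even when its own p-value exceeds its rank threshold, provided a larger p-value further down the sorted list passes.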
18.  High-density genetic linkage map construction and identification of fruit-related QTLs in pear using SNP and SSR markers 
Journal of Experimental Botany  2014;65(20):5771-5781.
A highly saturated pear genetic map was constructed using over 3000 SNP markers developed by RADseq integrated with anchored SSR markers. The reliable QTLs of several fruit traits were also identified.
Pear (Pyrus spp) is an important fruit crop, grown in all temperate regions of the world, with global production ranked after grape and apple among deciduous tree crops. A high-density linkage map is a valuable tool for fine mapping quantitative trait loci (QTL) and map-based gene cloning. In this study, we first constructed a high-density linkage map of pear using SNPs integrated with SSRs, developed by the rapid and robust technology of restriction-associated DNA sequencing (RADseq). The linkage map consists of 3143 SNP markers and 98 SSRs, 3241 markers in total, spanning 2243.4 cM, with an average marker distance of 0.70 cM. Anchoring SSRs were able to anchor seventeen linkage groups to their corresponding chromosomes. Based on this high-density integrated pear linkage map and two years of fruit phenotyping, a total of 32 potential QTLs for 11 traits, including length of pedicel (LFP), single fruit weight (SFW), soluble solid content (SSC), transverse diameter (TD), vertical diameter (VD), calyx status (CS), flesh colour (FC), juice content (JC), number of seeds (NS), skin colour (SC), and skin smoothness (SS), were identified and positioned on the genetic map. Among them, QTLs for some important fruit-related traits, such as calyx status, length of pedicel, and flesh colour, have been identified for the first time, and the localization of QTLs was verified to be repeatable. This high-density linkage map of pear is a worthy reference for mapping important fruit traits, QTL identification, and comparison and combination of different genetic maps.
PMCID: PMC4203118  PMID: 25129128
Genetic linkage map; pear; QTL; RADseq; SNP; SSR.
19.  Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping 
BMC Bioinformatics  2011;12:211.
The Bayesian shrinkage technique has been applied to multiple quantitative trait loci (QTLs) mapping to estimate the genetic effects of QTLs on quantitative traits from a very large set of possible effects including the main and epistatic effects of QTLs. Although the recently developed empirical Bayes (EB) method significantly reduced computation compared with the fully Bayesian approach, its speed and accuracy are limited by the fact that numerical optimization is required to estimate the variance components in the QTL model.
We developed a fast empirical Bayesian LASSO (EBLASSO) method for multiple QTL mapping. The fact that the EBLASSO can estimate the variance components in a closed form, along with other algorithmic techniques, renders the EBLASSO method more efficient and accurate. Compared with the EB method, our simulation study demonstrated that the EBLASSO method could substantially improve the computational speed and detect more QTL effects without increasing the false positive rate. In particular, the EBLASSO algorithm running on a personal computer could easily handle a linear QTL model with more than 100,000 variables in our simulation study. Real data analysis also demonstrated that the EBLASSO method detected more reasonable effects than the EB method. Compared with the LASSO, our simulation showed that the current version of the EBLASSO implemented in Matlab had similar speed to the LASSO implemented in Fortran, and that the EBLASSO detected the same number of true effects as the LASSO but a much smaller number of false positive effects.
The EBLASSO method can handle a large number of effects possibly including both the main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple QTL mapping.
PMCID: PMC3125263  PMID: 21615941
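The abstract above compares EBLASSO with the ordinary LASSO, both of which shrink most candidate QTL effects to exactly zero. A minimal sketch of that shrinkage for the ordinary LASSO, assuming an orthonormal design so that the solution reduces to soft-thresholding of the least-squares estimates. This is not the EBLASSO algorithm, and the effect sizes and penalty below are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares effect estimates for five candidate markers.
ols_effects = [2.5, -0.3, 0.1, -1.8, 0.4]
lam = 0.5  # hypothetical penalty weight

# With an orthonormal design, minimising 0.5*||y - Xb||^2 + lam*||b||_1
# gives b_j = soft_threshold(ols_j, lam) for each marker j.
lasso_effects = [soft_threshold(b, lam) for b in ols_effects]
selected = [j for j, b in enumerate(lasso_effects) if b != 0.0]
print(lasso_effects, selected)
```

Only the two markers whose estimates exceed the penalty in absolute value keep a nonzero effect, which is the sparsity that makes models with very many candidate variables tractable.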
20.  Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules 
BMC Genomics  2013;14:196.
Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant.
While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, but in the case of the Ribi group we also provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso.
Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
PMCID: PMC3616858  PMID: 23514438
21.  Genome-wide genetic dissection of germplasm resources and implications for breeding by design in soybean 
Breeding Science  2012;61(5):495-510.
“Breeding by Design” as a concept described by Peleman and van der Voort aims to bring together superior alleles for all genes of agronomic importance from potential genetic resources. This might be achievable through high-resolution allele detection based on precise QTL (quantitative trait locus/loci) mapping of potential parental resources. The present paper reviews the works at the Chinese National Center for Soybean Improvement (NCSI) on exploration of QTL and their superior alleles of agronomic traits for genetic dissection of germplasm resources in soybeans towards practicing “Breeding by Design”. Among the major germplasm resources, i.e. released commercial cultivar (RC), farmers’ landrace (LR) and annual wild soybean accession (WS), the RC was recognized as the primary potential adapted parental sources, with a great number of new alleles (45.9%) having emerged and accumulated during the 90 years’ scientific breeding processes. A mapping strategy, i.e. a full model procedure (including additive (A), epistasis (AA), A × environment (E) and AA × E effects), scanning with QTLNetwork2.0 and followed by verification with other procedures, was suggested and used for the experimental data when the underlying genetic model was usually unknown. In total, 110 data sets of 81 agronomically important traits were analyzed for their QTL, with 14.5% of the data sets showing major QTL (contribution rate more than 10.0% for each QTL), 55.5% showing a few major QTL but more small QTL, and 30.0% having only small QTL. In addition to the detected QTL, the collective unmapped minor QTL sometimes accounted for more than 50% of the genetic variation in a number of traits. Integrated with linkage mapping, association mappings were conducted on germplasm populations and validated to be able to provide complete information on multiple QTL and their multiple alleles. 
Accordingly, the QTL and their alleles of agronomic traits for large samples of RC, LR and WS were identified and then the QTL-allele matrices were established. On this basis, parental materials can be chosen for complementary recombination among loci and alleles so that crossing plans are genetically optimized. This approach has provided a way towards breeding by design, but the accuracy will depend on the precision of the loci and allele matrices.
PMCID: PMC3406800  PMID: 23136489
soybean; Breeding by Design; germplasm resources; QTL mapping; type of QTL constitution; association mapping; germplasm genomics
22.  solQTL: a tool for QTL analysis, visualization and linking to genomes at SGN database 
BMC Bioinformatics  2010;11:525.
A common approach to understanding the genetic basis of complex traits is through identification of associated quantitative trait loci (QTL). Fine mapping QTLs requires several generations of backcrosses and analysis of large populations, which is a time-consuming and costly effort. Furthermore, as entire genomes are being sequenced and an increasing amount of genetic and expression data are being generated, a challenge remains: linking phenotypic variation to the underlying genomic variation. To identify candidate genes and understand the molecular basis underlying the phenotypic variation of traits, bioinformatic approaches are needed to exploit information such as genetic map, expression and whole genome sequence data of organisms in biological databases.
The Sol Genomics Network (SGN) is a primary repository for phenotypic, genetic, genomic, expression and metabolic data for the Solanaceae family and other related Asterids species and houses a variety of bioinformatics tools. SGN has implemented a new approach to QTL data organization, storage, analysis, and cross-links with other relevant data in internal and external databases. The new QTL module, solQTL, employs a user-friendly web interface for uploading raw phenotype and genotype data to the database, R/QTL mapping software for on-the-fly QTL analysis and algorithms for online visualization and cross-referencing of QTLs to relevant datasets and tools such as the SGN Comparative Map Viewer and Genome Browser. Here, we describe the development of the solQTL module and demonstrate its application.
solQTL allows Solanaceae researchers to upload raw genotype and phenotype data to SGN, perform QTL analysis and dynamically cross-link to relevant genetic, expression and genome annotations. Exploration and synthesis of the relevant data is expected to help facilitate identification of candidate genes underlying phenotypic variation and markers more closely linked to QTLs. solQTL is freely available on SGN and can be used in private or public mode.
PMCID: PMC2984588  PMID: 20964836
23.  QTLs for Seed Vigor-Related Traits Identified in Maize Seeds Germinated under Artificial Aging Conditions 
PLoS ONE  2014;9(3):e92535.
High seed vigor is important for agricultural production due to the associated potential for increased growth and productivity. However, a better understanding of the underlying molecular mechanisms is required because the genetic basis for seed vigor remains unknown. We used single-nucleotide polymorphism (SNP) markers to map quantitative trait loci (QTLs) for four seed vigor traits in two connected recombinant inbred line (RIL) maize populations under four treatment conditions during seed germination. Sixty-five QTLs distributed between the two populations were identified and a meta-analysis was used to integrate genetic maps. Sixty-one initially identified QTLs were integrated into 18 meta-QTLs (mQTLs). Initial QTLs with contribution to phenotypic variation values of R² > 10% were integrated into mQTLs. Twenty-three candidate genes for association with seed vigor traits coincided with 13 mQTLs. The candidate genes had functions in the glycolytic pathway and in protein metabolism. QTLs with major effects (R² > 10%) were identified under at least one treatment condition for mQTL2, mQTL3-2, and mQTL3-4. Candidate genes included a calcium-dependent protein kinase gene (302810918) involved in signal transduction that mapped in the mQTL3-2 interval associated with germination energy (GE) and germination percentage (GP), and an hsp20/alpha crystallin family protein gene (At5g51440) that mapped in the mQTL3-4 interval associated with GE and GP. Two initial QTLs with a major effect under at least two treatment conditions were identified for mQTL5-2. A cucumisin-like Ser protease gene (At5g67360) mapped in the mQTL5-2 interval associated with GP. The chromosome regions for mQTL2, mQTL3-2, mQTL3-4, and mQTL5-2 may be hot spots for QTLs related to seed vigor traits. The mQTLs and candidate genes identified in this study provide valuable information for the identification of additional quantitative trait genes.
PMCID: PMC3961396  PMID: 24651614
24.  A comprehensive meta QTL analysis for fiber quality, yield, yield related and morphological traits, drought tolerance, and disease resistance in tetraploid cotton 
BMC Genomics  2013;14:776.
The study of quantitative trait loci (QTL) in cotton (Gossypium spp.) is focused on traits of agricultural significance. Previous studies have identified a plethora of QTL attributed to fiber quality, disease and pest resistance, branch number, seed quality and yield and yield related traits, drought tolerance, and morphological traits. However, results among these studies differed due to the use of different genetic populations, markers and marker densities, and testing environments. Since two previous meta-QTL analyses were performed on fiber traits, a number of papers on QTL mapping of fiber quality, yield traits, morphological traits, and disease resistance have been published. To obtain a better insight into the genome-wide distribution of QTL and to identify consistent QTL for marker assisted breeding in cotton, an updated comparative QTL analysis is needed.
In this study, a total of 1,223 QTL from 42 different QTL studies in Gossypium were surveyed and mapped using Biomercator V3 based on the Gossypium consensus map from the Cotton Marker Database. A meta-analysis was first performed using manual inference and confirmed by Biomercator V3 to identify possible QTL clusters and hotspots. QTL clusters are composed of QTL of various traits which are concentrated in a specific region on a chromosome, whereas hotspots are composed of only one trait type. QTL were not evenly distributed along the cotton genome and were concentrated in specific regions on each chromosome. QTL hotspots for fiber quality traits were found in the same regions as the clusters, indicating that clusters may also form hotspots.
Putative QTL clusters were identified via meta-analysis and will be useful for breeding programs and future studies involving Gossypium QTL. The presence of QTL clusters and hotspots indicates consensus regions across cultivated tetraploid Gossypium species, environments, and populations which contain large numbers of QTL, and in some cases multiple QTL associated with the same trait termed a hotspot. This study combines two previous meta-analysis studies and adds all other currently available QTL studies, making it the most comprehensive meta-analysis study in cotton to date.
PMCID: PMC3830114  PMID: 24215677
25.  A Statistical Framework for Joint eQTL Analysis in Multiple Tissues 
PLoS Genetics  2013;9(5):e1003486.
Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues or cell types (or, more generally, multiple subgroups). The framework explicitly models the potential for each eQTL to be active in some tissues and inactive in others. By modeling the sharing of active eQTLs among tissues, this framework increases power to detect eQTLs that are present in more than one tissue compared with “tissue-by-tissue” analyses that examine each tissue separately. Conversely, by modeling the inactivity of eQTLs in some tissues, the framework allows the proportion of eQTLs shared across different tissues to be formally estimated as parameters of a model, addressing the difficulties of accounting for incomplete power when comparing overlaps of eQTLs identified by tissue-by-tissue analyses. Applying our framework to re-analyze data from transformed B cells, T cells, and fibroblasts, we find that it substantially increases power compared with tissue-by-tissue analysis, identifying 63% more genes with eQTLs (at FDR = 0.05). Further, the results suggest that, in contrast to previous analyses of the same data, the majority of eQTLs detectable in these data are shared among all three tissues.
Author Summary
Genetic variants that are associated with gene expression are known as expression Quantitative Trait Loci, or eQTLs. Many studies have been conducted to identify eQTLs, and they have proven an effective tool for identifying putative regulatory variants and linking them to specific genes. Up to now most studies have been conducted in a single tissue or cell type, but moving forward this is changing, and ongoing studies are collecting data aimed at mapping eQTLs in dozens of tissues. Current statistical methods are not able to fully exploit the richness of these kinds of data, taking account of both the sharing and differences in eQTLs among tissues. In this paper we develop a statistical framework to address this problem, to improve power to detect eQTLs when they are shared among multiple tissues, and to allow for differences among tissues to be estimated. Applying these methods to data from three tissues suggests that sharing of eQTLs among tissues may be substantially more common than it appeared in previous analyses of the same data.
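The idea of an eQTL being active in some tissues and inactive in others can be illustrated by enumerating the possible activity patterns. The sketch below is only an illustration of that combinatorial idea, not the authors' statistical framework. The function name and the 0/1 encoding are assumptions, and the tissue labels are taken from the three cell types named in the abstract.

```python
from itertools import product


def activity_configurations(tissues):
    """Return every pattern in which an eQTL is active in at least one tissue.

    Each pattern is a dict mapping tissue name to 1 (active) or 0 (inactive).
    The all-inactive pattern is excluded because it corresponds to no eQTL.
    """
    configs = []
    for pattern in product([0, 1], repeat=len(tissues)):
        if any(pattern):
            configs.append(dict(zip(tissues, pattern)))
    return configs


# Cell types named in the abstract; used here purely as labels.
tissues = ["B cells", "T cells", "fibroblasts"]
configs = activity_configurations(tissues)
for config in configs:
    active = [t for t, on in config.items() if on]
    print(", ".join(active))
```

With three tissues this gives 2³ − 1 = 7 non-null patterns, ranging from activity in a single tissue to activity shared among all three.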
PMCID: PMC3649995  PMID: 23671422
