Search tips
Search criteria

Results 1-25 (954199)

Clipboard (0)

Related Articles

1.  Genome-wide association studies for agronomical traits in a world wide spring barley collection 
BMC Plant Biology  2012;12:16.
Genome-wide association studies (GWAS) based on linkage disequilibrium (LD) provide a promising tool for the detection and fine mapping of quantitative trait loci (QTL) underlying complex agronomic traits. In this study we explored the genetic basis of variation for the traits heading date, plant height, thousand grain weight, starch content and crude protein content in a diverse collection of 224 spring barleys of worldwide origin. The whole panel was genotyped with a customized oligonucleotide pool assay containing 1536 SNPs using Illumina's GoldenGate technology resulting in 957 successful SNPs covering all chromosomes. The morphological trait "row type" (two-rowed spike vs. six-rowed spike) was used to confirm the high level of selectivity and sensitivity of the approach. This study describes the detection of QTL for the above mentioned agronomic traits by GWAS.
Population structure in the panel was investigated by various methods and six subgroups that are mainly based on their spike morphology and region of origin. We explored the patterns of linkage disequilibrium (LD) among the whole panel for all seven barley chromosomes. Average LD was observed to decay below a critical level (r2-value 0.2) within a map distance of 5-10 cM. Phenotypic variation within the panel was reasonably large for all the traits. The heritabilities calculated for each trait over multi-environment experiments ranged between 0.90-0.95. Different statistical models were tested to control spurious LD caused by population structure and to calculate the P-value of marker-trait associations. Using a mixed linear model with kinship for controlling spurious LD effects, we found a total of 171 significant marker trait associations, which delineate into 107 QTL regions. Across all traits these can be grouped into 57 novel QTL and 50 QTL that are congruent with previously mapped QTL positions.
Our results demonstrate that the described diverse barley panel can be efficiently used for GWAS of various quantitative traits, provided that population structure is appropriately taken into account. The observed significant marker trait associations provide a refined insight into the genetic architecture of important agronomic traits in barley. However, individual QTL account only for a small portion of phenotypic variation, which may be due to insufficient marker coverage and/or the elimination of rare alleles prior to analysis. The fact that the combined SNP effects fall short of explaining the complete phenotypic variance may support the hypothesis that the expression of a quantitative trait is caused by a large number of very small effects that escape detection. Notwithstanding these limitations, the integration of GWAS with biparental linkage mapping and an ever increasing body of genomic sequence information will facilitate the systematic isolation of agronomically important genes and subsequent analysis of their allelic diversity.
PMCID: PMC3349577  PMID: 22284310
2.  Localising Loci underlying Complex Trait Variation Using Regional Genomic Relationship Mapping 
PLoS ONE  2012;7(10):e46501.
The limited proportion of complex trait variance identified in genome-wide association studies may reflect the limited power of single SNP analyses to detect either rare causative alleles or those of small effect. Motivated by studies that demonstrate that loci contributing to trait variation may contain a number of different alleles, we have developed an analytical approach termed Regional Genomic Relationship Mapping that, like linkage-based family methods, integrates variance contributed by founder gametes within a pedigree. This approach takes advantage of very distant (and unrecorded) relationships, and this greatly increases the power of the method, compared with traditional pedigree-based linkage analyses. By integrating variance contributed by founder gametes in the population, our approach provides an estimate of the Regional Heritability attributable to a small genomic region (e.g. 100 SNP window covering ca. 1 Mb of DNA in a 300000 SNP GWAS) and has the power to detect regions containing multiple alleles that individually contribute too little variance to be detectable by GWAS as well as regions with single common GWAS-detectable SNPs. We use genome-wide SNP array data to obtain both a genome-wide relationship matrix and regional relationship (“identity by state" or IBS) matrices for sequential regions across the genome. We then estimate a heritability for each region sequentially in our genome-wide scan. We demonstrate by simulation and with real data that, when compared to traditional (“individual SNP") GWAS, our method uncovers new loci that explain additional trait variation. We analysed data from three Southern European populations and from Orkney for exemplar traits – serum uric acid concentration and height. We show that regional heritability estimates are correlated with results from genome-wide association analysis but can capture more of the genetic variance segregating in the population and identify additional trait loci.
PMCID: PMC3471913  PMID: 23077511
3.  GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm 
PLoS Genetics  2013;9(8):e1003657.
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
Author Summary
Nowadays, the availability of cheaper and accurate assays to quantify multiple (endo)phenotypes in large population cohorts allows multi-trait studies. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi SNPs-traits analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation which exploits parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal and replicate in independent cohorts new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.
PMCID: PMC3738451  PMID: 23950726
4.  Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel 
PLoS Genetics  2014;10(9):e1004573.
Association mapping is a powerful approach for dissecting the genetic architecture of complex quantitative traits using high-density SNP markers in maize. Here, we expanded our association panel size from 368 to 513 inbred lines with 0.5 million high quality SNPs using a two-step data-imputation method which combines identity by descent (IBD) based projection and k-nearest neighbor (KNN) algorithm. Genome-wide association studies (GWAS) were carried out for 17 agronomic traits with a panel of 513 inbred lines applying both mixed linear model (MLM) and a new method, the Anderson-Darling (A-D) test. Ten loci for five traits were identified using the MLM method at the Bonferroni-corrected threshold −log10 (P) >5.74 (α = 1). Many loci ranging from one to 34 loci (107 loci for plant height) were identified for 17 traits using the A-D test at the Bonferroni-corrected threshold −log10 (P) >7.05 (α = 0.05) using 556809 SNPs. Many known loci and new candidate loci were only observed by the A-D test, a few of which were also detected in independent linkage analysis. This study indicates that combining IBD based projection and KNN algorithm is an efficient imputation method for inferring large missing genotype segments. In addition, we showed that the A-D test is a useful complement for GWAS analysis of complex quantitative traits. Especially for traits with abnormal phenotype distribution, controlled by moderate effect loci or rare variations, the A-D test balances false positives and statistical power. The candidate SNPs and associated genes also provide a rich resource for maize genetics and breeding.
Author Summary
Genotype imputation has been used widely in the analysis of genome-wide association studies (GWAS) to boost power and fine-map associations. We developed a two-step data imputation method to meet the challenge of large proportion missing genotypes. GWAS have uncovered an extensive genetic architecture of complex quantitative traits using high-density SNP markers in maize in the past few years. Here, GWAS were carried out for 17 agronomic traits with a panel of 513 inbred lines applying both mixed linear model and a new method, the Anderson-Darling (A-D) test. We intend to show that the A-D test is a complement to current GWAS methods, especially for complex quantitative traits controlled by moderate effect loci or rare variations and with abnormal phenotype distribution. In addition, the traits associated QTL identified here provide a rich resource for maize genetics and breeding.
PMCID: PMC4161304  PMID: 25211220
5.  Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor 
PLoS Genetics  2013;9(7):e1003608.
Despite important advances from Genome Wide Association Studies (GWAS), for most complex human traits and diseases, a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models where phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Prediction (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R2) of G-BLUP reaches trait-heritability, asymptotically. However, under imperfect LD between markers and QTL, prediction R2 based on G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)2, where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, due to within-family disequilibrium, the patterns of realized genomic similarity are similar across the genome; therefore b is close to one inducing small decrease in R2. However, with distantly related individuals b reaches very low values imposing a very low upper bound on prediction R2. Our simulations suggest that for the analysis of data from unrelated individuals, the asymptotic upper bound on R2 may be of the order of 20% of the trait heritability. We show how PA can be enhanced with use of variable selection or differential shrinkage of estimates of marker effects.
Author Summary
Despite great advances in genotyping technologies, the ability to predict complex traits and diseases remains limited. Increasing evidence suggests that many of these traits may be affected by a large number of small-effect genes that are difficult to detect in single-variant association studies. Whole-Genome Regression (WGR) methods can be used to confront this challenge and have exhibited good predictive power when applied to animal and plant breeding populations. WGR is receiving increased attention in the field of human genetics. However, human and breeding populations differ greatly in factors that can affect the performance of WGRs. Using theory, simulation and real data analysis, we study the predictive performance of the Genomic Best Linear Unbiased Predictor (G-BLUP), one of the most commonly used WGR methods. We derive upper bounds for the prediction accuracy of G-BLUP under perfect and imperfect LD between markers and genotypes at causal loci and validate such upper bounds using simulation and real data analysis. Imperfect LD between markers and causal loci can impose a very low upper bound on the prediction accuracy of G-BLUP, especially when data involve unrelated individuals. In this context, we propose and evaluate avenues for improving the predictive performance of G-BLUP.
PMCID: PMC3708840  PMID: 23874214
6.  Regulatory and coding genome regions are enriched for trait associated variants in dairy and beef cattle 
BMC Genomics  2014;15(1):436.
In livestock, as in humans, the number of genetic variants that can be tested for association with complex quantitative traits, or used in genomic predictions, is increasing exponentially as whole genome sequencing becomes more common. The power to identify variants associated with traits, particularly those of small effects, could be increased if certain regions of the genome were known a priori to be enriched for associations. Here, we investigate whether twelve genomic annotation classes were enriched or depleted for significant associations in genome wide association studies for complex traits in beef and dairy cattle. We also describe a variance component approach to determine the proportion of genetic variance captured by each annotation class.
P-values from large GWAS using 700K SNP in both dairy and beef cattle were available for 11 and 10 traits respectively. We found significant enrichment for trait associated variants (SNP significant in the GWAS) in the missense class along with regions 5 kilobases upstream and downstream of coding genes. We found that the non-coding conserved regions (across mammals) were not enriched for trait associated variants. The results from the enrichment or depletion analysis were not in complete agreement with the results from variance component analysis, where the missense and synonymous classes gave the greatest increase in variance explained, while the upstream and downstream classes showed a more modest increase in the variance explained.
Our results indicate that functional annotations could assist in prioritization of variants to a subset more likely to be associated with complex traits; including missense variants, and upstream and downstream regions. The differences in two sets of results (GWAS enrichment depletion versus variance component approaches) might be explained by the fact that the variance component approach has greater power to capture the cumulative effect of mutations of small effect, while the enrichment or depletion approach only captures the variants that are significant in GWAS, which is restricted to a limited number of common variants of moderate effects.
Electronic supplementary material
The online version of this article (doi: 10.1186/1471-2164-15-436) contains supplementary material, which is available to authorized users.
PMCID: PMC4070550  PMID: 24903263
Variants component analysis; Regulatory genome; GWAS prioritization; Enrichment depletion
7.  Linkage and Association Mapping of Arabidopsis thaliana Flowering Time in Nature 
PLoS Genetics  2010;6(5):e1000940.
Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study about the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Based on a photothermal time model, the flowering time variation scored in our field experiment was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. Furthermore, we identified new genomic regions lacking obvious candidate genes.
Author Summary
Dissecting the genetic bases of adaptive traits is of primary importance in evolutionary biology. In this study, we combined a genome-wide association (GWA) study with traditional linkage mapping in order to detect the genetic bases underlying natural variation in flowering time in ecologically realistic conditions in the plant Arabidopsis thaliana. Our study involved phenotyping nearly 20,000 plants over 2 winters under field conditions in a temperate climate. We show that combined linkage and association mapping clearly outperforms each method alone when it comes to identifying true associations. This highlights the utility of combining different methods to localize genes involved in complex trait natural variation. Most candidate genes found in this study are involved in the regulation of the plant circadian clock and, surprisingly, were not associated with flowering time scored under greenhouse conditions. While rapid advances have been made in high-throughput genotyping and sequencing, high-throughput phenotyping of complex traits under natural conditions will be the next challenge for dissecting the genetic bases of adaptive variation in “laboratory” model organisms.
PMCID: PMC2865524  PMID: 20463887
8.  Combining Genome-Wide Association Mapping and Transcriptional Networks to Identify Novel Genes Controlling Glucosinolates in Arabidopsis thaliana 
PLoS Biology  2011;9(8):e1001125.
Genome-wide association mapping is highly sensitive to environmental changes, but network analysis allows rapid causal gene identification.
Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant lending it to be an excellent model for studying conditional GWA.
Methodology/Principal Findings
To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.
Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.
Author Summary
Understanding how genetic variation can control phenotypic variation is a fundamental goal of modern biology. A major push has been made using genome-wide association mapping in all organisms to attempt and rapidly identify the genes contributing to phenotypes such as disease and nutritional disorders. But a number of fundamental questions have not been answered about the use of genome-wide association: for example, how does the internal or external environment influence the genes found? Furthermore, the simple question of how many genes may influence a trait is unknown. Finally, a number of studies have identified significant false-positive and -negative issues within genome-wide association studies that are not solvable by direct statistical approaches. We have used genome-wide association mapping in the plant Arabidopsis thaliana to begin exploring these questions. We show that both external and internal environments significantly alter the identified genes, such that using different tissues can lead to the identification of nearly completely different gene sets. Given the large number of potential false-positives, we developed an orthogonal approach to filtering the possible genes, by identifying co-functioning networks using the nominal candidate gene list derived from genome-wide association studies. This allowed us to rapidly identify and validate a large number of novel and unexpected genes that affect Arabidopsis thaliana defense metabolism within phenotypic ranges that have been shown to be selectable within the field. These genes and the associated networks suggest that Arabidopsis thaliana defense metabolism is more readily similar to the infinite gene hypothesis, according to which there is a vast number of causative genes controlling natural variation in this phenotype. It remains to be seen how frequently this is true for other organisms and other phenotypes.
PMCID: PMC3156686  PMID: 21857804
9.  Statistical Power to Detect Genetic (Co)Variance of Complex Traits Using SNP Data in Unrelated Samples 
PLoS Genetics  2014;10(4):e1004269.
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
Author Summary
Genome-wide association studies (GWAS) have identified thousands of genetic variants for hundreds of traits and diseases. However, the genetic variants discovered from GWAS only explained a small fraction of the heritability, resulting in the question of “missing heritability”. We have recently developed approaches (called GREML) to estimate the overall contribution of all SNPs to the phenotypic variance of a trait (disease) and the proportion of genetic overlap between traits (diseases). A frequently asked question is that how many samples are required to estimate the proportion of variance attributable to all SNPs and the proportion of genetic overlap with useful precision. In this study, we derive the standard errors of the estimated parameters from theory and find that they are highly consistent with those observed values from published results and those obtained from simulation. The theory together with an online application tool will be helpful to plan experimental design to quantify the missing heritability, and to estimate the genetic overlap between traits (diseases) especially when it is unfeasible to have the traits (diseases) measured on the same individuals.
PMCID: PMC3983037  PMID: 24721987
10.  A Hierarchical Bayesian Approach to Multi-Trait Clinical Quantitative Trait Locus Modeling 
Recent advances in high-throughput genotyping and transcript profiling technologies have enabled the inexpensive production of genome-wide dense marker maps in tandem with huge amounts of expression profiles. These large-scale data encompass valuable information about the genetic architecture of important phenotypic traits. Comprehensive models that combine molecular markers and gene transcript levels are increasingly advocated as an effective approach to dissecting the genetic architecture of complex phenotypic traits. The simultaneous utilization of marker and gene expression data to explain the variation in clinical quantitative trait, known as clinical quantitative trait locus (cQTL) mapping, poses challenges that are both conceptual and computational. Nonetheless, the hierarchical Bayesian (HB) modeling approach, in combination with modern computational tools such as Markov chain Monte Carlo (MCMC) simulation techniques, provides much versatility for cQTL analysis. Sillanpää and Noykova (2008) developed a HB model for single-trait cQTL analysis in inbred line cross-data using molecular markers, gene expressions, and marker-gene expression pairs. However, clinical traits generally relate to one another through environmental correlations and/or pleiotropy. A multi-trait approach can improve on the power to detect genetic effects and on their estimation precision. A multi-trait model also provides a framework for examining a number of biologically interesting hypotheses. In this paper we extend the HB cQTL model for inbred line crosses proposed by Sillanpää and Noykova to a multi-trait setting. We illustrate the implementation of our new model with simulated data, and evaluate the multi-trait model performance with regard to its single-trait counterpart. The data simulation process was based on the multi-trait cQTL model, assuming three traits with uncorrelated and correlated cQTL residuals, with the simulated data under uncorrelated cQTL residuals serving as our test set for comparing the performances of the multi-trait and single-trait models. The simulated data under correlated cQTL residuals were essentially used to assess how well our new model can estimate the cQTL residual covariance structure. The model fitting to the data was carried out by MCMC simulation through OpenBUGS. The multi-trait model outperformed its single-trait counterpart in identifying cQTLs, with a consistently lower false discovery rate. Moreover, the covariance matrix of cQTL residuals was typically estimated to an appreciable degree of precision under the multi-trait cQTL model, making our new model a promising approach to addressing a wide range of issues facing the analysis of correlated clinical traits.
PMCID: PMC3368303  PMID: 22685451
Bayesian multilevel modeling; genetic architecture; linked marker-expression pairs; pleiotropy
11.  Limits on the reproducibility of marker associations with southern leaf blight resistance in the maize nested association mapping population 
BMC Genomics  2014;15(1):1068.
A previous study reported a comprehensive quantitative trait locus (QTL) and genome wide association study (GWAS) of southern leaf blight (SLB) resistance in the maize Nested Association Mapping (NAM) panel. Since that time, the genomic resources available for such analyses have improved substantially. An updated NAM genetic linkage map has a nearly six-fold greater marker density than the previous map and the combined SNPs and read-depth variants (RDVs) from maize HapMaps 1 and 2 provided 28.5 M genomic variants for association analysis, 17 fold more than HapMap 1. In addition, phenotypic values of the NAM RILs were re-estimated to account for environment-specific flowering time covariates and a small proportion of lines were dropped due to genotypic data quality problems. Comparisons of original and updated QTL and GWAS results confound the effects of linkage map density, GWAS marker density, population sample size, and phenotype estimates. Therefore, we evaluated the effects of changing each of these parameters individually and in combination to determine their relative impact on marker-trait associations in original and updated analyses.
Of the four parameters varied, map density caused the largest changes in QTL and GWAS results. The updated QTL model had better cross-validation prediction accuracy than the previous model. Whereas joint linkage QTL positions were relatively stable to input changes, the residual values derived from those QTL models (used as inputs to GWAS) were more sensitive, resulting in substantial differences between GWAS results. The updated NAM GWAS identified several candidate genes consistent with previous QTL fine-mapping results.
The highly polygenic nature of resistance to SLB complicates the identification of causal genes. Joint linkage QTL are relatively stable to perturbations of data inputs, but their resolution is generally on the order of tens or more Mbp. GWAS associations have higher resolution, but lower power due to stringent thresholds designed to minimize false positive associations, resulting in variability of detection across studies. The updated higher density linkage map improves QTL estimation and, along with a much denser SNP HapMap, greatly increases the likelihood of detecting SNPs in linkage with causal variants. We recommend use of the updated genetic resources and results but emphasize the limited repeatability of small-effect associations.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1068) contains supplementary material, which is available to authorized users.
PMCID: PMC4300987  PMID: 25475173
Quantitative trait loci; Nested association mapping; Disease resistance; Genome wide association study; Zea mays
12.  Genome Wide Association Identifies Novel Loci Involved in Fungal Communication 
PLoS Genetics  2013;9(8):e1003669.
Understanding how genomes encode complex cellular and organismal behaviors has become the outstanding challenge of modern genetics. Unlike classical screening methods, analysis of genetic variation that occurs naturally in wild populations can enable rapid, genome-scale mapping of genotype to phenotype with a medium-throughput experimental design. Here we describe the results of the first genome-wide association study (GWAS) used to identify novel loci underlying trait variation in a microbial eukaryote, harnessing wild isolates of the filamentous fungus Neurospora crassa. We genotyped each of a population of wild Louisiana strains at 1 million genetic loci genome-wide, and we used these genotypes to map genetic determinants of microbial communication. In N. crassa, germinated asexual spores (germlings) sense the presence of other germlings, grow toward them in a coordinated fashion, and fuse. We evaluated germlings of each strain for their ability to chemically sense, chemotropically seek, and undergo cell fusion, and we subjected these trait measurements to GWAS. This analysis identified one gene, NCU04379 (cse-1, encoding a homolog of a neuronal calcium sensor), at which inheritance was strongly associated with the efficiency of germling communication. Deletion of cse-1 significantly impaired germling communication and fusion, and two genes encoding predicted interaction partners of CSE1 were also required for the communication trait. Additionally, mining our association results for signaling and secretion genes with a potential role in germling communication, we validated six more previously unknown molecular players, including a secreted protease and two other genes whose deletion conferred a novel phenotype of increased communication and multi-germling fusion. Our results establish protein secretion as a linchpin of germling communication in N. crassa and shed light on the regulation of communication molecules in this fungus. Our study demonstrates the power of population-genetic analyses for the rapid identification of genes contributing to complex traits in microbial species.
Author Summary
Many phenotypes of interest are controlled by multiple loci, and in biological systems identifying determinants of such complex traits is challenging. Here, we genotyped 112 wild isolates of Neurospora crassa and used this resource to identify genes that mediate a fundamental but poorly-understood attribute of this filamentous fungus: the ability of germinating spores to sense each other at a distance, extend projections toward one another, and fuse. Inheritance at a secretion gene, cse-1, was associated strongly with germling communication across wild strains; this association was validated in experiments showing reduced communication in a cse-1 deletion strain. By testing interacting partners of CSE1, and by assessing additional secretion and signaling factors whose inheritance associated more modestly with germling communication in wild strains, we identified eight other novel determinants of this phenotype. Our population of genotyped wild isolates provides a flexible and powerful community resource for the rapid identification of any varying, complex phenotype in N. crassa. The success of our approach, which used a phenotyping scheme far more tractable than would be required in a screen of the entire N. crassa gene deletion collection, serves as a proof of concept for association studies of wild populations for any organism.
PMCID: PMC3731230  PMID: 23935534
13.  A genome-wide association study for blood lipid phenotypes in the Framingham Heart Study 
BMC Medical Genetics  2007;8(Suppl 1):S17.
Blood lipid levels including low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) are highly heritable. Genome-wide association is a promising approach to map genetic loci related to these heritable phenotypes.
In 1087 Framingham Heart Study Offspring cohort participants (mean age 47 years, 52% women), we conducted genome-wide analyses (Affymetrix 100K GeneChip) for fasting blood lipid traits. Total cholesterol, HDL-C, and TG were measured by standard enzymatic methods and LDL-C was calculated using the Friedewald formula. The long-term averages of up to seven measurements of LDL-C, HDL-C, and TG over a ~30 year span were the primary phenotypes. We used generalized estimating equations (GEE), family-based association tests (FBAT) and variance components linkage to investigate the relationships between SNPs (on autosomes, with minor allele frequency ≥10%, genotypic call rate ≥80%, and Hardy-Weinberg equilibrium p ≥ 0.001) and multivariable-adjusted residuals. We pursued a three-stage replication strategy of the GEE association results with 287 SNPs (P < 0.001 in Stage I) tested in Stage II (n ~1450 individuals) and 40 SNPs (P < 0.001 in joint analysis of Stages I and II) tested in Stage III (n~6650 individuals).
Long-term averages of LDL-C, HDL-C, and TG were highly heritable (h2 = 0.66, 0.69, 0.58, respectively; each P < 0.0001). Of 70,987 tests for each of the phenotypes, two SNPs had p < 10-5 in GEE results for LDL-C, four for HDL-C, and one for TG. For each multivariable-adjusted phenotype, the number of SNPs with association p < 10-4 ranged from 13 to 18 and with p < 10-3, from 94 to 149. Some results confirmed previously reported associations with candidate genes including variation in the lipoprotein lipase gene (LPL) and HDL-C and TG (rs7007797; P = 0.0005 for HDL-C and 0.002 for TG). The full set of GEE, FBAT and linkage results are posted at the database of Genotype and Phenotype (dbGaP). After three stages of replication, there was no convincing statistical evidence for association (i.e., combined P < 10-5 across all three stages) between any of the tested SNPs and lipid phenotypes.
Using a 100K genome-wide scan, we have generated a set of putative associations for common sequence variants and lipid phenotypes. Validation of selected hypotheses in additional samples did not identify any new loci underlying variability in blood lipids. Lack of replication may be due to inadequate statistical power to detect modest quantitative trait locus effects (i.e., <1% of trait variance explained) or reduced genomic coverage of the 100K array. GWAS in FHS using a denser genome-wide genotyping platform and a better-powered replication strategy may identify novel loci underlying blood lipids.
PMCID: PMC1995614  PMID: 17903299
14.  Properties and Modeling of GWAS when Complex Disease Risk Is Due to Non-Complementing, Deleterious Mutations in Genes of Large Effect 
PLoS Genetics  2013;9(2):e1003258.
Current genome-wide association studies (GWAS) have high power to detect intermediate frequency SNPs making modest contributions to complex disease, but they are underpowered to detect rare alleles of large effect (RALE). This has led to speculation that the bulk of variation for most complex diseases is due to RALE. One concern with existing models of RALE is that they do not make explicit assumptions about the evolution of a phenotype and its molecular basis. Rather, much of the existing literature relies on arbitrary mapping of phenotypes onto genotypes obtained either from standard population-genetic simulation tools or from non-genetic models. We introduce a novel simulation of a 100-kilobase gene region, based on the standard definition of a gene, in which mutations are unconditionally deleterious, are continuously arising, have partially recessive and non-complementing effects on phenotype (analogous to what is widely observed for most Mendelian disorders), and are interspersed with neutral markers that can be genotyped. Genes evolving according to this model exhibit a characteristic GWAS signature consisting of an excess of marginally significant markers. Existing tests for an excess burden of rare alleles in cases have low power while a simple new statistic has high power to identify disease genes evolving under our model. The structure of linkage disequilibrium between causative mutations and significantly associated markers under our model differs fundamentally from that seen when rare causative markers are assumed to be neutral. Rather than tagging single haplotypes bearing a large number of rare causative alleles, we find that significant SNPs in a GWAS tend to tag single causative mutations of small effect relative to other mutations in the same gene. Our results emphasize the importance of evaluating the power to detect associations under models that are genetically and evolutionarily motivated.
Author Summary
Current GWA studies typically only explain a small fraction of heritable variation in complex traits, resulting in speculation that a large fraction of variation in such traits may be due to rare alleles of large effect (RALE). The most parsimonious evolutionary mechanism that results in an inverse relationship between the frequency and effect size of causative alleles is an equilibrium between newly arising deleterious mutations and selection eliminating those mutations, resulting in an inverse relation between effect size and average frequency. This assumption is not built into many current models of RALE and, as a result, power calculations may be misleading. We use forward population genetic simulations to explore the ability of GWAS to detect genes in which unconditionally deleterious, partially recessive mutations arise each generation. Our model is based on the standard definition of a gene as a region within which loss-of-function mutations fail to complement, consistent with the multi-allelic basis for Mendelian disorders. Our model predicts that it may not be uncommon for single genes evolving under our model to contribute upwards of 5% to variation in a complex trait, and that such genes could be routinely detected via modified GWAS approaches.
PMCID: PMC3578756  PMID: 23437004
15.  A Multi-Trait, Meta-analysis for Detecting Pleiotropic Polymorphisms for Stature, Fatness and Reproduction in Beef Cattle 
PLoS Genetics  2014;10(3):e1004198.
Polymorphisms that affect complex traits or quantitative trait loci (QTL) often affect multiple traits. We describe two novel methods (1) for finding single nucleotide polymorphisms (SNPs) significantly associated with one or more traits using a multi-trait, meta-analysis, and (2) for distinguishing between a single pleiotropic QTL and multiple linked QTL. The meta-analysis uses the effect of each SNP on each of n traits, estimated in single trait genome wide association studies (GWAS). These effects are expressed as a vector of signed t-values (t) and the error covariance matrix of these t values is approximated by the correlation matrix of t-values among the traits calculated across the SNP (V). Consequently, t'V−1t is approximately distributed as a chi-squared with n degrees of freedom. An attractive feature of the meta-analysis is that it uses estimated effects of SNPs from single trait GWAS, so it can be applied to published data where individual records are not available. We demonstrate that the multi-trait method can be used to increase the power (numbers of SNPs validated in an independent population) of GWAS in a beef cattle data set including 10,191 animals genotyped for 729,068 SNPs with 32 traits recorded, including growth and reproduction traits. We can distinguish between a single pleiotropic QTL and multiple linked QTL because multiple SNPs tagging the same QTL show the same pattern of effects across traits. We confirm this finding by demonstrating that when one SNP is included in the statistical model the other SNPs have a non-significant effect. In the beef cattle data set, cluster analysis yielded four groups of QTL with similar patterns of effects across traits within a group. A linear index was used to validate SNPs having effects on multiple traits and to identify additional SNPs belonging to these four groups.
Author Summary
We describe novel methods for finding significant associations between a genome wide panel of SNPs and multiple complex traits, and further for distinguishing between genes with effects on multiple traits and multiple linked genes affecting different traits. The method uses a meta-analysis based on estimates of SNP effects from independent single trait genome wide association studies (GWAS). The method could therefore be widely used to combine already published GWAS results. The method was applied to 32 traits that describe growth, body composition, feed intake and reproduction in 10,191 beef cattle genotyped for approximately 700,000 SNP. The genes found to be associated with these traits can be arranged into 4 groups that differ in their pattern of effects and hence presumably in their physiological mechanism of action. For instance, one group of genes affects weight and fatness in the opposite direction and can be described as a group of genes affecting mature size, while another group affects weight and fatness in the same direction.
PMCID: PMC3967938  PMID: 24675618
16.  Four Linked Genes Participate in Controlling Sporulation Efficiency in Budding Yeast 
PLoS Genetics  2006;2(11):e195.
Quantitative traits are conditioned by several genetic determinants. Since such genes influence many important complex traits in various organisms, the identification of quantitative trait loci (QTLs) is of major interest, but still encounters serious difficulties. We detected four linked genes within one QTL, which participate in controlling sporulation efficiency in Saccharomyces cerevisiae. Following the identification of single nucleotide polymorphisms by comparing the sequences of 145 genes between the parental strains SK1 and S288c, we analyzed the segregating progeny of the cross between them. Through reciprocal hemizygosity analysis, four genes, RAS2, PMS1, SWS2, and FKH2, located in a region of 60 kilobases on Chromosome 14, were found to be associated with sporulation efficiency. Three of the four “high” sporulation alleles are derived from the “low” sporulating strain. Two of these sporulation-related genes were verified through allele replacements. For RAS2, the causative variation was suggested to be a single nucleotide difference in the upstream region of the gene. This quantitative trait nucleotide accounts for sporulation variability among a set of ten closely related winery yeast strains. Our results provide a detailed view of genetic complexity in one “QTL region” that controls a quantitative trait and reports a single nucleotide polymorphism-trait association in wild strains. Moreover, these findings have implications on QTL identification in higher eukaryotes.
Genes controlling many medically and agriculturally important complex traits in various organisms and their organization as quantitative trait loci (QTLs) are of major interest. To identify QTLs responsible for such a quantitative trait, the authors employed a two-step strategy: First, single-nucleotide markers (called SNPs) distributed throughout the genome were screened for prevalence among progeny with extreme characteristics, thus identifying three candidate genomic regions. Next, in one of these regions, manipulation of individual genes revealed four tightly linked genes that affected the trait, sporulation efficiency. A fifth gene that affects sporulation was recently and independently identified in the same region. This 60-kilobase region has a complex and interesting architecture: One strain, which sporulates efficiently, has sporulation-promoting alleles (alternative forms) at two major genes and inhibiting alleles at the three less important ones, whereas another strain, with inefficient sporulation, has the opposite alleles at the five genes. Moreover, one causative SNP for this trait, in the promoter region of the gene RAS2, explains sporulation differences among a set of ten winery yeast strains. These results provide a detailed view of genetic complexity in one “QTL region” and an SNP-trait association example among wild strains.
PMCID: PMC1636695  PMID: 17112318
17.  Genome-Wide Association Studies Using Haplotypes and Individual SNPs in Simmental Cattle 
PLoS ONE  2014;9(10):e109330.
Recent advances in high-throughput genotyping technologies have provided the opportunity to map genes using associations between complex traits and markers. Genome-wide association studies (GWAS) based on either a single marker or haplotype have identified genetic variants and underlying genetic mechanisms of quantitative traits. Prompted by the achievements of studies examining economic traits in cattle and to verify the consistency of these two methods using real data, the current study was conducted to construct the haplotype structure in the bovine genome and to detect relevant genes genuinely affecting a carcass trait and a meat quality trait. Using the Illumina BovineHD BeadChip, 942 young bulls with genotyping data were introduced as a reference population to identify the genes in the beef cattle genome significantly associated with foreshank weight and triglyceride levels. In total, 92,553 haplotype blocks were detected in the genome. The regions of high linkage disequilibrium extended up to approximately 200 kb, and the size of haplotype blocks ranged from 22 bp to 199,266 bp. Additionally, the individual SNP analysis and the haplotype-based analysis detected similar regions and common SNPs for these two representative traits. A total of 12 and 7 SNPs in the bovine genome were significantly associated with foreshank weight and triglyceride levels, respectively. By comparison, 4 and 5 haplotype blocks containing the majority of significant SNPs were strongly associated with foreshank weight and triglyceride levels, respectively. In addition, 36 SNPs with high linkage disequilibrium were detected in the GNAQ gene, a potential hotspot that may play a crucial role for regulating carcass trait components.
PMCID: PMC4203724  PMID: 25330174
18.  Identification, Replication, and Fine-Mapping of Loci Associated with Adult Height in Individuals of African Ancestry 
N'Diaye, Amidou | Chen, Gary K. | Palmer, Cameron D. | Ge, Bing | Tayo, Bamidele | Mathias, Rasika A. | Ding, Jingzhong | Nalls, Michael A. | Adeyemo, Adebowale | Adoue, Véronique | Ambrosone, Christine B. | Atwood, Larry | Bandera, Elisa V. | Becker, Lewis C. | Berndt, Sonja I. | Bernstein, Leslie | Blot, William J. | Boerwinkle, Eric | Britton, Angela | Casey, Graham | Chanock, Stephen J. | Demerath, Ellen | Deming, Sandra L. | Diver, W. Ryan | Fox, Caroline | Harris, Tamara B. | Hernandez, Dena G. | Hu, Jennifer J. | Ingles, Sue A. | John, Esther M. | Johnson, Craig | Keating, Brendan | Kittles, Rick A. | Kolonel, Laurence N. | Kritchevsky, Stephen B. | Le Marchand, Loic | Lohman, Kurt | Liu, Jiankang | Millikan, Robert C. | Murphy, Adam | Musani, Solomon | Neslund-Dudas, Christine | North, Kari E. | Nyante, Sarah | Ogunniyi, Adesola | Ostrander, Elaine A. | Papanicolaou, George | Patel, Sanjay | Pettaway, Curtis A. | Press, Michael F. | Redline, Susan | Rodriguez-Gil, Jorge L. | Rotimi, Charles | Rybicki, Benjamin A. | Salako, Babatunde | Schreiner, Pamela J. | Signorello, Lisa B. | Singleton, Andrew B. | Stanford, Janet L. | Stram, Alex H. | Stram, Daniel O. | Strom, Sara S. | Suktitipat, Bhoom | Thun, Michael J. | Witte, John S. | Yanek, Lisa R. | Ziegler, Regina G. | Zheng, Wei | Zhu, Xiaofeng | Zmuda, Joseph M. | Zonderman, Alan B. | Evans, Michele K. | Liu, Yongmei | Becker, Diane M. | Cooper, Richard S. | Pastinen, Tomi | Henderson, Brian E. | Hirschhorn, Joel N. | Lettre, Guillaume | Haiman, Christopher A.
PLoS Genetics  2011;7(10):e1002298.
Adult height is a classic polygenic trait of high heritability (h2 ∼0.8). More than 180 single nucleotide polymorphisms (SNPs), identified mostly in populations of European descent, are associated with height. These variants convey modest effects and explain ∼10% of the variance in height. Discovery efforts in other populations, while limited, have revealed loci for height not previously implicated in individuals of European ancestry. Here, we performed a meta-analysis of genome-wide association (GWA) results for adult height in 20,427 individuals of African ancestry with replication in up to 16,436 African Americans. We found two novel height loci (Xp22-rs12393627, P = 3.4×10−12 and 2p14-rs4315565, P = 1.2×10−8). As a group, height associations discovered in European-ancestry samples replicate in individuals of African ancestry (P = 1.7×10−4 for overall replication). Fine-mapping of the European height loci in African-ancestry individuals showed an enrichment of SNPs that are associated with expression of nearby genes when compared to the index European height SNPs (P<0.01). Our results highlight the utility of genetic studies in non-European populations to understand the etiology of complex human diseases and traits.
Author Summary
Adult height is an ideal phenotype to improve our understanding of the genetic architecture of complex diseases and traits: it is easily measured and usually available in large cohorts, relatively stable, and mostly influenced by genetics (narrow-sense heritability of height h2∼0.8). Genome-wide association (GWA) studies in individuals of European ancestry have identified >180 single nucleotide polymorphisms (SNPs) associated with height. In the current study, we continued to use height as a model polygenic trait and explored the genetic influence in populations of African ancestry through a meta-analysis of GWA height results from 20,809 individuals of African descent. We identified two novel height loci not previously found in Europeans. We also replicated the European height signals, suggesting that many of the genetic variants that are associated with height are shared between individuals of European and African descent. Finally, in fine-mapping the European height loci in African-ancestry individuals, we found SNPs more likely to be associated with the expression of nearby genes than the SNPs originally found in Europeans. Thus, our results support the utility of performing genetic studies in non-European populations to gain insights into complex human diseases and traits.
PMCID: PMC3188544  PMID: 21998595
19.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood.The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity.Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF.Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
20.  Entering the second century of maize quantitative genetics 
Heredity  2013;112(1):30-38.
Maize is the most widely grown cereal in the world. In addition to its role in global agriculture, it has also long served as a model organism for genetic research. Maize stands at a genetic crossroads, as it has access to all the tools available for plant genetics but exhibits a genetic architecture more similar to other outcrossing organisms than to self-pollinating crops and model plants. In this review, we summarize recent advances in maize genetics, including the development of powerful populations for genetic mapping and genome-wide association studies (GWAS), and the insights these studies yield on the mechanisms underlying complex maize traits. Most maize traits are controlled by a large number of genes, and linkage analysis of several traits implicates a ‘common gene, rare allele' model of genetic variation where some genes have many individually rare alleles contributing. Most natural alleles exhibit small effect sizes with little-to-no detectable pleiotropy or epistasis. Additionally, many of these genes are locked away in low-recombination regions that encourage the formation of multi-gene blocks that may underlie maize's strong heterotic effect. Domestication left strong marks on the maize genome, and some of the differences in trait architectures may be due to different selective pressures over time. Overall, maize's advantages as a model system make it highly desirable for studying the genetics of outcrossing species, and results from it can provide insight into other such species, including humans.
PMCID: PMC3860165  PMID: 23462502
maize; nested association mapping; quantitative traits; genome-wide association; heterosis
21.  Genome-Wide Associations of Gene Expression Variation in Humans 
PLoS Genetics  2005;1(6):e78.
The exploration of quantitative variation in human populations has become one of the major priorities for medical genetics. The successful identification of variants that contribute to complex traits is highly dependent on reliable assays and genetic maps. We have performed a genome-wide quantitative trait analysis of 630 genes in 60 unrelated Utah residents with ancestry from Northern and Western Europe using the publicly available phase I data of the International HapMap project. The genes are located in regions of the human genome with elevated functional annotation and disease interest including the ENCODE regions spanning 1% of the genome, Chromosome 21 and Chromosome 20q12–13.2. We apply three different methods of multiple test correction, including Bonferroni, false discovery rate, and permutations. For the 374 expressed genes, we find many regions with statistically significant association of single nucleotide polymorphisms (SNPs) with expression variation in lymphoblastoid cell lines after correcting for multiple tests. Based on our analyses, the signal proximal (cis-) to the genes of interest is more abundant and more stable than distal and trans across statistical methodologies. Our results suggest that regulatory polymorphism is widespread in the human genome and show that the 5-kb (phase I) HapMap has sufficient density to enable linkage disequilibrium mapping in humans. Such studies will significantly enhance our ability to annotate the non-coding part of the genome and interpret functional variation. In addition, we demonstrate that the HapMap cell lines themselves may serve as a useful resource for quantitative measurements at the cellular level.
With the finished reference sequence of the human genome now available, focus has shifted towards trying to identify all of the functional elements within the sequence. Although quite a lot of progress has been made towards identifying some classes of genomic elements, in particular protein-coding sequences, the characterization of regulatory elements remains a challenge. The authors describe the genetic mapping of regions of the genome that have functional effects on quantitative levels of gene expression. Gene expression of 630 genes was measured in cell lines derived from 60 unrelated human individuals, the same Utah residents of Northern and Western European ancestry that have been genetically well-characterized by The International HapMap Project. This paper reports significant variation among individuals with respect to levels of gene expression, and demonstrates that this quantitative trait has a genetic basis. For some genes, the genetic signal was localized to specific locations in the human genome sequence; in most cases the genomic region associated with expression variation was physically close to the gene whose expression it regulated. The authors demonstrate the feasibility of performing whole-genome association scans to map quantitative traits, and highlight statistical issues that are increasingly important for whole-genome disease mapping studies.
PMCID: PMC1315281  PMID: 16362079
22.  An Integration of Genome-Wide Association Study and Gene Expression Profiling to Prioritize the Discovery of Novel Susceptibility Loci for Osteoporosis-Related Traits 
PLoS Genetics  2010;6(6):e1000977.
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men, revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6×10−8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 out of these 3 genes (GPR177, p = 2.6×10−13; SOX6, p = 6.4×10−10) associated with BMD in women have been successfully replicated in a large-scale meta-analysis of BMD, but none of the non-prioritized candidates (associated with BMD) did. Our results support the concept of our prioritization strategy. In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
Author Summary
BMD and hip geometry are two major predictors of osteoporotic fractures, the most severe consequence of osteoporosis in elderly persons. We performed sex-specific genome-wide association studies (GWAS) for BMD at the lumbar spine and femor neck skeletal sites as well as hip geometric indices (NSA, NL, and NW) in the Framingham Osteoporosis Study and then replicated the top findings in two independent studies. Three novel loci were significant: in women, including chromosome 1p13.2 (RAP1A) for NW; in men, 2q11.2 (TBC1D8) for NSA and 18q11.2 (OSBPL1A) for NW. We confirmed a previously reported region on 8q24.12 (TNFRSF11B/OPG) for lumbar spine BMD in women. In addition, we integrated GWAS signals with eQTL in several tissues and publicly available expression signature profiling in cellular and whole-animal models, and prioritized 16 candidate genes/loci based on their potential involvement in skeletal metabolism. Among three prioritized loci (GPR177, SOX6, and CASR genes) associated with BMD in women, GPR177 and SOX6 have been successfully replicated later in a large-scale meta-analysis, but none of the non-prioritized candidates (associated with BMD) did. Our results support the concept of using expression profiling to support the candidacy of suggestive GWAS signals that may contain important genes of interest.
PMCID: PMC2883588  PMID: 20548944
23.  A comparison of different linkage statistics in small to moderate sized pedigrees with complex diseases 
BMC Research Notes  2012;5:411.
In the last years GWA studies have successfully identified common SNPs associated with complex diseases. However, most of the variants found this way account for only a small portion of the trait variance. This fact leads researchers to focus on rare-variant mapping with large scale sequencing, which can be facilitated by using linkage information. The question arises why linkage analysis often fails to identify genes when analyzing complex diseases. Using simulations we have investigated the power of parametric and nonparametric linkage statistics (KC-LOD, NPL, LOD and MOD scores), to detect the effect of genes responsible for complex diseases using different pedigree structures.
As expected, a small number of pedigrees with less than three affected individuals has low power to map disease genes with modest effect. Interestingly, the power decreases when unaffected individuals are included in the analysis, irrespective of the true mode of inheritance. Furthermore, we found that the best performing statistic depends not only on the type of pedigrees but also on the true mode of inheritance.
When applied in a sensible way linkage is an appropriate and robust technique to map genes for complex disease. Unlike association analysis, linkage analysis is not hampered by allelic heterogeneity. So, why does linkage analysis often fail with complex diseases? Evidently, when using an insufficient number of small pedigrees, one might miss a true genetic linkage when actually a real effect exists. Furthermore, we show that the test statistic has an important effect on the power to detect linkage as well. Therefore, a linkage analysis might fail if an inadequate test statistic is employed. We provide recommendations regarding the most favorable test statistics, in terms of power, for a given mode of inheritance and type of pedigrees under study, in order to reduce the probability to miss a true linkage.
PMCID: PMC3475142  PMID: 22862841
Linkage; Parametric analysis; Nonparametric analysis; NPL score; LOD score; MOD score; Complex diseases; Rare variants
24.  Regional Heritability Mapping to identify loci underlying genetic variation of complex traits 
BMC Proceedings  2014;8(Suppl 5):S3.
Genome-wide association studies can have limited power to identify QTL, partly due to the stringent correction for multiple testing and low linkage-disequilibrium between SNPs and QTL. Regional Heritability Mapping (RHM) has been advanced as an alternative approach to capture underlying genetic effects. In this study, RHM was used to identify loci underlying variation in the 16th QTLMAS workshop simulated traits.
The method was implemented by fitting a mixed model where a genomic region and the overall genetic background were added as random effects. Heritabilities for the genetic regional effects were estimated, and the presence of a QTL in the region was tested using a likelihood ratio test (LRT). Several region sizes were considered (100, 50 and 20 adjacent SNPs). Bonferroni correction was used to calculate the LRT thresholds for genome-wide (p < 0.05) and suggestive (i.e., one false positive per genome scan) significance.
Genomic heritabilities (0.31, 0.32 and 0.48, respectively) and genetic correlations (0.80, -0.42 and 0.19, between trait-pairs 1&2, 1&3 and 2&3) were similar to the simulated ones. RHM identified 7 QTL (4 at genome-wide and 3 at suggestive level) for Trait1; 4 (2 genome-wide and 2 suggestive) for Trait2; and 7 (6 genome-wide and 1 suggestive) for Trait3. Only one of the identified suggestive QTL was a false-positive. The position of these QTL tended to coincide with the position where the largest QTL (or several of them) were simulated. Several signals were detected for the simulated QTL with smaller effect. A combined analysis including all significant regions showed that they explain more than half of the total genetic variance of the traits. However, this might be overestimated, due to Beavis effect. All QTL affecting traits 1&2 and 2&3 had positive correlations, following the trend of the overall correlation of both trait-pairs. All but one QTL affecting traits 1&3 were negatively correlated, in agreement with the simulated situation. Moreover, RHM identified extra loci that were not found by association and linkage analysis, highlighting the improved power of this approach.
RHM identified the largest QTL among the simulated ones, with some signals for the ones with small effect. Moreover, RHM performed better than association and linkage analysis, in terms of both power and resolution.
PMCID: PMC4195407  PMID: 25519517
25.  Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network 
PLoS Genetics  2009;5(8):e1000587.
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits such as clinical metrics and gene expressions in association analysis. Our approach represents correlation information explicitly among the quantitative traits as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. The result is that the genetic markers that jointly influence subgroups of highly correlated traits can be detected jointly with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on the HapMap consortium and an asthma dataset, we compared the performance of our method with other methods based on single-marker analysis and regression-based methods that do not use any of the relational information in the traits. We found that our method showed an increased power in detecting causal variants affecting correlated traits. Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
Author Summary
An association study examines a phenotype against genotypic variations over a large set of individuals in order to find the genetic variant that gives rise to the variation in the phenotype. Many complex disease syndromes consist of a large number of highly related clinical phenotypes, and the patient cohorts are routinely surveyed with a large number of traits, such as hundreds of clinical phenotypes and genome-wide profiling of thousands of gene expressions, many of which are correlated. However, most of the conventional approaches for association mapping or eQTL analysis consider a single phenotype at a time instead of taking advantage of the relatedness of traits by analyzing them jointly. Assuming that a group of tightly correlated traits may share a common genetic basis, in this paper, we present a new framework for association analysis that searches for genetic variations influencing a group of correlated traits. We explicitly represent the correlation information in multiple quantitative traits as a quantitative trait network and directly incorporate this network information to scan the genome for association. Our results on simulated and asthma data show that our approach has a significant advantage in detecting associations when a genetic marker perturbs synergistically a group of traits.
PMCID: PMC2719086  PMID: 19680538

Results 1-25 (954199)