Individuals within species show heritable variation for many traits of biological and medical interest. Most heritable traits follow complex inheritance patterns, with multiple underlying genetic factors2,3
. Finding these factors has been a central focus of modern genetic research in humans, as well as in model organisms and agriculturally important species4–7
. Recent work, most notably genome-wide association studies (GWAS) in humans, has underscored the problem of “missing heritability”: although many genetic loci have been identified for a wide range of traits, these typically explain only a minority of the heritability of each trait, implying the existence of other, undiscovered genetic factors1.
Multiple non-mutually-exclusive explanations have been proposed for missing heritability1
. One possibility is that the undiscovered factors could have effects that are too small to be detected with current sample sizes, or even too small to ever be individually detected with statistical significance8,9
. The existence of many small-effect variants is supported by studies showing that a large proportion of heritable trait variation is tagged when all GWAS markers are considered simultaneously10,11
. Because GWAS can only detect variants that are common in the population, another possibility is that the undiscovered variants are too rare in the population to be captured by GWAS12
. One recent proposal13
highlights the fact that non-additive interactions among loci (sometimes termed “epistasis”) may inflate heritability measures14
. Other proposed contributions to missing heritability include inflated heritability estimates, structural variation, gene-environment interactions, parent-of-origin effects, heritable epigenetic factors, and “entirely unforeseen sources”1,15
. A better understanding of the sources of missing heritability is crucial for designing studies to find the missing components.
We set out to investigate these questions in the yeast Saccharomyces cerevisiae
. We and others have previously used a cross between a lab strain and a wine strain to investigate the genetic basis of many complex traits, including global gene expression, protein abundance, telomere length, cell shape, gene expression noise, and drug sensitivity, and we have demonstrated “missing heritability” in this system16,17
. Recently, we used extreme quantitative trait locus (QTL) mapping (X-QTL), a bulk segregant approach that uses pools of millions of cross progeny (segregants), to detect many loci underlying heritable trait variation18
. We showed that for one trait, loci detected by X-QTL explained most of the heritability. However, pooled approaches do not allow direct estimates of heritability, the contribution of gene-gene interactions or locus effect sizes. Here we use a large panel of individually genotyped and phenotyped yeast segregants to accurately measure the heritable components of many quantitative traits, discover the underlying loci, and examine the sources of missing heritability.
In order to estimate heritability and detect the underlying loci with high statistical power, we constructed a panel of 1008 prototrophic haploid segregants from a cross between a lab strain and a wine strain (Supplementary Fig. 1
and Methods). These strains differ by 0.5% at the sequence level19
. We sequenced the parent strains to high coverage and compared the sequences to define 30,594 high-confidence single-nucleotide polymorphisms (SNPs) that distinguish the strains and densely cover the genome. We obtained comprehensive individual genotype information for each of the 1008 segregants by highly multiplexed short-read sequencing (Supplementary Fig. 1d).
We sought to accurately measure a large number of quantitative traits in the segregant panel. To do so, we implemented a high-throughput end-point colony size assay and measured growth in multiple conditions, including different temperatures, pHs, and carbon sources, as well as addition of metal ions and small molecules16,18
(Supplementary Fig. 1c
). We defined each trait as end-point colony size normalized relative to growth on control medium (Methods), and obtained reproducible measurements with a strong heritable component for 46 traits (Supplementary Table 1
). Most of the traits were only weakly correlated with each other (Supplementary Fig. 2).
Phenotypic variation in the segregant panel can be partitioned into the contribution of heritable genetic factors (broad-sense heritability) and measurement errors or other random environmental effects. Broad-sense heritability can in turn be partitioned into the contribution of additive genetic factors (narrow-sense heritability), dominance effects, gene-gene interactions, and gene-environment interactions14
. In our experiment, dominance effects are absent because the segregants are haploid, and gene-environment interactions for a given trait should also be absent as all the segregants are grown simultaneously under uniform conditions. Thus our estimates of broad-sense heritability include additive and gene-gene interaction components, while our estimates of narrow-sense heritability include only the additive component. The difference between the two heritability measures therefore provides an estimate of the contribution of gene-gene interactions.
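The partition described above can be written compactly. The symbols are notation introduced here for illustration and do not appear in the original text: \sigma^2_P is the total phenotypic variance, \sigma^2_A the additive genetic variance, \sigma^2_I the gene-gene interaction variance, and \sigma^2_E the variance due to measurement error and random environmental effects.

```latex
\begin{aligned}
\sigma^2_P &= \sigma^2_A + \sigma^2_I + \sigma^2_E
  && \text{(haploid segregants, uniform conditions: no dominance or gene-environment terms)} \\
H^2 &= \frac{\sigma^2_A + \sigma^2_I}{\sigma^2_P}
  && \text{(broad-sense heritability)} \\
h^2 &= \frac{\sigma^2_A}{\sigma^2_P}
  && \text{(narrow-sense heritability)} \\
H^2 - h^2 &= \frac{\sigma^2_I}{\sigma^2_P},
\qquad
\frac{H^2 - h^2}{H^2} = \frac{\sigma^2_I}{\sigma^2_A + \sigma^2_I}
\end{aligned}
```

The last ratio expresses the interaction component as a share of the genetic variance rather than of the total phenotypic variance.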
We estimated broad-sense heritability from repeatability of trait measurements (Methods). Estimating narrow-sense heritability usually involves measuring phenotypic similarity for different degrees of relatedness. We took advantage of a recently developed genomic approach in which narrow-sense heritability is estimated by comparing phenotypic similarity among individuals with their actual genetic relatedness, computed from dense genotype data (Methods)20
. Among the 46 traits, broad-sense heritability estimates ranged from 0.40 to 0.96, with a median of 0.77. Narrow-sense heritability estimates ranged from 0.21 to 0.84, with a median of 0.52. An analysis that partitions additive genetic variation among chromosomes produced similar results (Supplementary Table 2
). We used the difference between broad-sense and narrow-sense heritability to estimate the fraction of genetic variance due to gene-gene interactions, which ranged from 0.02 to 0.54, with a median of 0.30. Thus, the genetic basis for variation in some traits is almost entirely due to additive effects, while for others approximately half of the heritable component is due to gene-gene interactions.
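The two estimates can be illustrated on simulated data. The sketch below is not the procedure from the Methods, which is not reproduced in this section. It substitutes two simple stand-ins: one-way ANOVA variance components for the repeatability estimate, and Haseman-Elston regression (phenotype cross-products regressed on genomic relatedness) for the narrow-sense estimate. The function names, the unlinked ±1 markers, and the simulated variance components (0.50 additive, 0.25 interaction, 0.25 error) are all assumptions made for illustration.

```python
import numpy as np

def broad_sense_h2(replicates):
    """Repeatability from an (n_strains, n_reps) array via one-way ANOVA."""
    n, r = replicates.shape
    strain_means = replicates.mean(axis=1)
    ms_between = r * np.sum((strain_means - replicates.mean()) ** 2) / (n - 1)
    ms_within = np.sum((replicates - strain_means[:, None]) ** 2) / (n * (r - 1))
    var_genetic = max((ms_between - ms_within) / r, 0.0)
    return var_genetic / (var_genetic + ms_within)

def narrow_sense_h2(genotypes, phenotype):
    """Haseman-Elston regression of phenotype cross-products on relatedness."""
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    relatedness = z @ z.T / z.shape[1]           # genomic relationship matrix
    y = (phenotype - phenotype.mean()) / phenotype.std()
    upper = np.triu_indices(len(y), k=1)         # each pair of distinct strains once
    slope, _intercept = np.polyfit(relatedness[upper], np.outer(y, y)[upper], 1)
    return float(slope)

def scaled(x, variance):
    """Rescale a vector to mean zero and the requested variance."""
    return (x - x.mean()) / x.std() * np.sqrt(variance)

# Simulated panel (hypothetical data): haploid genotypes coded -1/+1 at
# unlinked markers, 20 additive loci and 10 two-locus interactions.
rng = np.random.default_rng(0)
n_strains, n_markers = 1000, 200
genotypes = rng.choice([-1.0, 1.0], size=(n_strains, n_markers))
additive = genotypes[:, :20] @ rng.normal(size=20)
interaction = (genotypes[:, 20:30] * genotypes[:, 30:40]) @ rng.normal(size=10)
genetic = scaled(additive, 0.50) + scaled(interaction, 0.25)

# Two replicate measurements per strain, error variance 0.25 each.
replicates = genetic[:, None] + rng.normal(scale=0.5, size=(n_strains, 2))

H2 = broad_sense_h2(replicates)
# One replicate is used so that both estimates refer to a single measurement.
h2 = narrow_sense_h2(genotypes, replicates[:, 0])
print(f"H2 = {H2:.2f}, h2 = {h2:.2f}, interaction share = {(H2 - h2) / H2:.2f}")
```

Under these simulation settings the estimates should fall near 0.75 and 0.50, so the gap between them recovers the interaction component that a purely additive analysis misses.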
Heritability for 46 yeast traits
Next, we sought to map the additive heritable variation to specific QTL. Simple linkage analysis of one marker at a time revealed multiple QTL per trait (Methods). To more accurately capture the effects of each QTL while controlling for the other QTL affecting the same trait, we used a step-wise forward-search approach to detect QTL and build a multiple-regression model (Methods). With this approach, we detected a total of 591 QTL for 46 traits at an empirical false-discovery rate (FDR) of 5% (Supplementary Table 3
). We observed varying degrees of trait complexity, with a minimum of 5, a maximum of 29, and a median of 12 QTL per trait. These numbers of QTL are comparable to those previously seen for a smaller set of traits by X-QTL18
. Consistent with theoretical predictions21
and previous observations, we detected many more QTL of small effect than of large effect (Supplementary Fig. 3
). Some traits showed a distribution of QTL effect sizes roughly consistent with Orr’s evolutionary model21
, whereas others showed one or more larger-than-expected QTL (Supplementary Fig. 4).
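The step-wise forward search can be sketched as follows. This is a minimal illustration, not the implementation from the Methods. It assumes unlinked markers coded -1/+1, uses the standard marker-regression LOD score, LOD = -(n/2) log10(1 - r^2), and replaces the empirical FDR procedure with a fixed LOD threshold of 5. The function names and all simulated values are hypothetical.

```python
import numpy as np

def lod_scores(genotypes, residual):
    """Single-marker LOD score, -(n/2) * log10(1 - r^2), for every marker."""
    n = len(residual)
    g = genotypes - genotypes.mean(axis=0)
    e = residual - residual.mean()
    r = (g.T @ e) / np.sqrt((g ** 2).sum(axis=0) * (e ** 2).sum())
    return -n / 2.0 * np.log10(1.0 - r ** 2)

def forward_search(genotypes, phenotype, lod_threshold=5.0, max_qtl=50):
    """Add the best-scoring marker, refit the joint model, rescan residuals."""
    selected = []
    residual = phenotype - phenotype.mean()
    while len(selected) < max_qtl:
        lod = lod_scores(genotypes, residual)
        if selected:
            lod[selected] = 0.0                  # never pick a marker twice
        best = int(np.argmax(lod))
        if lod[best] < lod_threshold:
            break
        selected.append(best)
        # Multiple regression on all QTL found so far controls for their effects.
        X = np.column_stack([np.ones(len(phenotype)), genotypes[:, selected]])
        beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
        residual = phenotype - X @ beta
    return selected

# Simulated trait (hypothetical): three QTL of decreasing effect size.
rng = np.random.default_rng(0)
n, m = 1000, 300
G = rng.choice([-1.0, 1.0], size=(n, m))
true_qtl = [10, 120, 250]
y = G[:, true_qtl] @ np.array([0.5, -0.4, 0.3]) + rng.normal(size=n)

detected = forward_search(G, y)
print("detected markers:", sorted(detected))
```

Rescanning the residuals after each refit is what lets smaller QTL emerge once the variance due to larger ones has been removed. In a real cross, linked markers would also require a rule for treating nearby peaks as one locus.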
Having identified QTL, we next measured the fraction of additive heritability explained by our model of detected QTL for each trait. To obtain unbiased estimates, we performed 10-fold cross-validation by detecting QTL in a subset of the segregant panel and estimating the effects in the rest of the panel. Across the traits, the detected loci explained between 72% and 100% of the narrow-sense heritability, with a median of 88%. Thus, the high statistical power provided by the large segregant panel allowed us to detect QTL that jointly explain most of the additive heritability for the traits studied here. By analyzing subsets of the data, we showed that “missing” narrow-sense heritability can be explained by insufficient sample sizes. For instance, in a panel of 1000 segregants we detected 16 significant QTL, which jointly explain 78% of narrow-sense heritability for growth in E6-berbamine, but only 2 of these, explaining 21% of narrow-sense heritability, also reached statistical significance in a smaller panel of 100 segregants. For traits with mostly additive genetics, the high fraction of variance explained by the detected QTL allowed us to accurately predict individual trait values from QTL genotypes.
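The cross-validation logic and the effect of panel size can be illustrated with a short simulation. This is a sketch under stated assumptions, not the analysis from the Methods. QTL are detected with a simple single-marker scan at a fixed LOD threshold of 3.5 (standing in for the step-wise search and the empirical FDR), markers are unlinked, and variance explained in each held-out fold is summarized by adjusted R^2. All names and simulated effect sizes are hypothetical.

```python
import numpy as np

def detect(G, y, lod_threshold=3.5):
    """Indices of markers whose single-marker LOD exceeds the threshold."""
    n = len(y)
    g = G - G.mean(axis=0)
    e = y - y.mean()
    r = (g.T @ e) / np.sqrt((g ** 2).sum(axis=0) * (e ** 2).sum())
    return np.flatnonzero(-n / 2.0 * np.log10(1.0 - r ** 2) > lod_threshold)

def adjusted_r2(G, y, markers):
    """Adjusted R^2 of a multiple regression of y on the given markers."""
    n, p = len(y), len(markers)
    if p == 0:
        return 0.0
    X = np.column_stack([np.ones(n), G[:, markers]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))

def cv_variance_explained(G, y, k=10, seed=0):
    """Detect QTL in k-1 folds, estimate variance explained in the held-out fold."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, held_out in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        markers = detect(G[train], y[train])
        scores.append(adjusted_r2(G[held_out], y[held_out], markers))
    return float(np.mean(scores))

# Simulated additive trait (hypothetical): 12 QTL with graded effect sizes.
rng = np.random.default_rng(1)
n, m = 1000, 300
G = rng.choice([-1.0, 1.0], size=(n, m))
causal = np.arange(0, 300, 25)
effects = np.linspace(0.5, 0.12, 12)
y = G[:, causal] @ effects + rng.normal(size=n)
h2_true = np.sum(effects ** 2) / (np.sum(effects ** 2) + 1.0)

explained = cv_variance_explained(G, y)
subset = rng.choice(n, size=100, replace=False)
n_full = len(detect(G, y))
n_small = len(detect(G[subset], y[subset]))
print(f"fraction of additive heritability explained: {explained / h2_true:.2f}")
print(f"QTL detected with 1000 segregants: {n_full}, with 100: {n_small}")
```

Because detection and estimation use disjoint sets of segregants, the variance explained is not inflated by selecting markers on the same data used to measure their effects. The comparison between the full panel and the 100-segregant subset shows the same qualitative drop in detected QTL as described above.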
Most additive heritability is explained by detected QTL
QTL detection for a complex trait
Prediction of segregant trait values from QTL genotypes
Differences between the estimates of broad-sense and narrow-sense heritability for many traits imply the presence of genetic interactions. We next sought to identify specific two-locus interactions. For each trait, we first performed an exhaustive two-dimensional (2D) scan for pairwise interactions. At a LOD score of 6.2, corresponding to an empirical FDR of 10%, we detected significant QTL-QTL interactions for 17 of the 46 traits, with a total of 23 interacting locus pairs. A 2D scan has low statistical power due to the large search space. Power can be increased, at the cost of missing interactions between loci with no main effects, by testing only for interactions between each locus with significant additive effects and the rest of the genome22
. Using this approach, we detected interactions for 24 of the 46 traits, with a total of 78 QTL-QTL interactions at an FDR of 10%. Among these traits, we observed a minimum of 1 and a maximum of 16 pairwise interactions per trait. These 78 pairs included 20 of the 23 locus pairs detected in the exhaustive 2D scan, suggesting that two-locus interactions in which neither locus has a detectable main effect are uncommon. For 47 of the 78 pairs, both loci were detected as significant in the single-locus search for additive effects. In the remaining 31 cases, the additive effect of the second locus was too small to reach genome-wide significance, although it was nominally significant in 10 of these cases. These observations are broadly consistent with our previous work on genetic interactions that affect gene expression traits22,23.
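The restricted scan, in which each locus with a significant additive effect is tested against the rest of the genome, can be sketched as below. This is an illustration under stated assumptions, not the code used in the study. The interaction LOD compares a model with both main effects and their product against the additive-only model, markers are unlinked and coded -1/+1, and the simulated effect sizes and function names are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def interaction_lod(x1, x2, y):
    """LOD for the x1*x2 term: full model versus additive-only model."""
    n = len(y)
    ones = np.ones(n)
    additive = np.column_stack([ones, x1, x2])
    full = np.column_stack([ones, x1, x2, x1 * x2])
    return n / 2.0 * np.log10(rss(additive, y) / rss(full, y))

def scan_partners(G, y, focal):
    """Test one focal QTL for interaction with every other marker."""
    lods = np.zeros(G.shape[1])
    for j in range(G.shape[1]):
        if j != focal:
            lods[j] = interaction_lod(G[:, focal], G[:, j], y)
    return lods

# Simulated trait (hypothetical): marker 5 has a main effect and interacts
# with marker 60, which has no main effect of its own.
rng = np.random.default_rng(2)
n, m = 1000, 100
G = rng.choice([-1.0, 1.0], size=(n, m))
y = 0.4 * G[:, 5] + 0.4 * G[:, 5] * G[:, 60] + rng.normal(size=n)

lods = scan_partners(G, y, focal=5)
best_partner = int(np.argmax(lods))
print("best interaction partner of marker 5:", best_partner)
print("tests in exhaustive 2D scan:", m * (m - 1) // 2)
print("tests in restricted scan for one focal QTL:", m - 1)
```

The last two lines show why the restricted scan has more power: with far fewer tests, a lower significance threshold suffices for the same error rate. The cost is that pairs in which neither locus has a main effect are never tested.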
For most of the traits with a sizeable difference between broad-sense and narrow-sense heritability, pairwise interactions were either not detected or explained little of the difference. The detected interaction effects were typically small (a median of 1.1% of genetic variance per interaction or a median of 3% of genetic variance per trait). Only in a few cases did detected genetic interactions explain much of the difference between broad-sense and narrow-sense heritability. Most notably, in the case of growth on maltose, one strong interaction explained 14% of the genetic variance and 71% of the difference between broad-sense and narrow-sense heritability (figure inset).
Non-additive genetic variance explained by QTL-QTL interactions
We have used a large panel of segregants from a cross between two yeast strains to investigate the genetic architecture of 46 quantitative traits. We measured both the total and the additive contributions of genetic factors to trait variation, and showed that these often differ. The observed differences between total and additive heritability estimates suggest that the contribution of genetic interactions to broad-sense heritability ranges from near zero to 54%. However, with a few exceptions, the specific combinations of loci that account for these interactions remain elusive. There are several possible explanations for this result. First, the statistical power to detect interactions is lower than the power to detect main effects. Second, individual interaction effects are expected to be smaller than additive effects, and hence their detection requires even larger sample sizes13
. Finally, higher-order interactions among more than two loci could also contribute24
. Our estimates of the contribution of interactions in a cross may overestimate their contribution to trait heritability in a population, because a higher proportion of variance is expected to be additive as allele frequencies depart from one-half25.
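The dependence of the additive share on allele frequency can be seen in a toy model. The sketch below is an illustrative assumption, not an analysis from the study. It takes two unlinked haploid loci coded -1/+1, a trait equal to the product of the two genotypes (a pure interaction when both alleles are at frequency one-half), and defines additive variance as the variance captured by the best-fitting additive model under the population genotype frequencies.

```python
import numpy as np
from itertools import product

def variance_components(p, interaction=1.0):
    """Additive and interaction variance for trait = interaction * x1 * x2.

    Both loci carry the +1 allele at frequency p and are assumed unlinked.
    """
    genos = np.array(list(product([-1.0, 1.0], repeat=2)))
    freq = np.where(genos == 1.0, p, 1.0 - p).prod(axis=1)   # genotype frequencies
    y = interaction * genos[:, 0] * genos[:, 1]
    # Frequency-weighted least-squares fit of the additive model 1 + x1 + x2.
    X = np.column_stack([np.ones(4), genos])
    W = np.diag(freq)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    fitted = X @ beta
    mean = freq @ y
    var_total = freq @ (y - mean) ** 2
    var_additive = freq @ (fitted - mean) ** 2
    return float(var_additive), float(var_total - var_additive)

for p in (0.5, 0.7, 0.9):
    var_a, var_i = variance_components(p)
    print(f"p = {p:.1f}: additive share of genetic variance = {var_a / (var_a + var_i):.2f}")
```

In this toy model the additive share is zero at a frequency of one-half, as in a cross, and rises to about 0.78 at a frequency of 0.9, even though the underlying biology is unchanged.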
The large size of the panel allowed us to detect specific loci that jointly account for the great majority of the additive (narrow-sense) heritability of each trait (72–100%). Human traits examined by GWAS vary in their genetic complexity1
, ranging from macular degeneration, for which 5 variants in 3 genes explain roughly half of the genetic risk26
, to height, for which 180 loci explain about 13% of heritability, implying the existence of a much larger number of undetected loci27
. Compared to our results in a yeast cross, GWAS typically detect a larger number of loci explaining a smaller proportion of trait heritability. One obvious difference is that the number of variants segregating in a cross between two strains is smaller than the number of common variants segregating in a population sample. The difference is roughly a factor of three for a neutral allele frequency spectrum, and potentially much larger if functional variants are deleterious and hence shifted toward lower frequency (Methods). The human genome also offers a larger target size, perhaps by a factor of five, for variants affecting a trait (Methods). These very rough estimates suggest that we might expect at least 15 times more loci to be found by GWAS than the median of 12 loci per trait we observe in the yeast cross. Because of the resolution of linkage analysis in our cross, some QTL may contain multiple linked variants, further increasing the true number of loci. Several additional factors could lead to a larger “missing heritability” in humans: the fraction of heritability due to genetic interactions could be higher13
, rare variants may account for a disproportionately large contribution of heritable variation28–30
, and some human traits might be inherently more complex than yeast traits in that they integrate over physiological processes involving a larger number of underlying gene pathways. Within-locus dominance effects represent an additional source of genetic complexity in diploid organisms.
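The rough scaling argument in this paragraph amounts to the following arithmetic, using only the factors stated in the text. The symbol N_GWAS is introduced here for illustration and denotes the number of loci per trait one might expect a GWAS to find.

```latex
N_{\mathrm{GWAS}} \;\gtrsim\;
\underbrace{3}_{\text{more segregating variants}}
\times
\underbrace{5}_{\text{larger target size}}
\times
\underbrace{12}_{\text{median QTL per trait in the cross}}
\;=\; 15 \times 12 \;=\; 180
```

This is a lower bound under the neutral frequency spectrum. It grows further if functional variants are shifted toward lower frequency or if individual QTL in the cross contain several linked variants.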
Our results are consistent with the suggestion that missing additive (narrow-sense) heritability arises primarily from many loci with small but not infinitesimal effects. These loci can be discovered in studies with sufficiently large sample sizes, although the optimal study designs will depend on the population frequency spectra of the causative alleles. Because all alleles are present at a frequency of one-half in a cross, we cannot yet delineate the contributions of common and rare variants to inherited variation, but we plan to do so in future studies.