Empirical studies of quantitative genetic variation have revealed robust patterns that are observed both across traits and across species. However, these patterns have no compelling explanation, and some of the observations even appear to be mutually incompatible. We review and extend a major class of theoretical models, ‘mutation–selection models’, that have been proposed to explain quantitative genetic variation. We also briefly review an alternative class of ‘balancing selection models’. We consider to what extent the models are compatible with the general observations, and argue that a key issue is understanding and modelling pleiotropy. We discuss some of the thorny issues that arise when formulating models that describe many traits simultaneously.
Many quantitative traits show substantial heritable variation and yet appear to be subject to stabilizing selection. This is a paradox because stabilizing selection is expected to eliminate variation. A major outstanding problem is therefore to deduce the true nature of selection acting on such traits, and on the genes that influence them (Barton & Turelli 1989; Barton & Keightley 2002). The biological reality will probably be messy. An organism's phenotype can be described by an effectively infinite number of traits, most of which are under some (maybe weak) selection. These traits are influenced by only a finite number of genetic factors, and because almost every gene has some (maybe small) effect on every trait, pleiotropy is ubiquitous. One can therefore take the view that the problem is really one of estimation rather than of hypothesis testing. What is the joint distribution of allele frequencies and effects on traits and fitness? However, the estimation problem is very hard. Weak selection is difficult to measure, the causes of selection can rarely be determined without experimental manipulations, and we need to estimate whole distributions of effects and not just those of a few major factors. It therefore seems that more progress will be made by using and testing models that are much simpler than reality. Simple models allow us to draw digestible and generalizable conclusions. The models can be purely statistical, in the tradition of the regression models of the turn-of-the-century biometricians, or mechanistic and based on Mendelian principles. Fisher (1918) laid the foundations for understanding the link between the two.
Pearson (1903, p. 18) and Robertson (1967) recognized that two kinds of simplified model for how selection can act on a given trait can be distinguished; in the present context, these are now known as true and apparent stabilizing selection. In both cases, individuals with intermediate trait values have higher fitness. Under true stabilizing selection, intermediate trait values cause higher fitness, so selection acts on genes contributing to variation directly, via the trait of interest. Under apparent stabilizing selection, the trait of interest is selectively neutral, but mutations affecting that trait also affect traits that are under selection in such a way that individuals with extreme values of the focal trait have lower fitness. A simple example of apparent stabilizing selection is a single locus with two alleles and heterozygote advantage for fitness, where the alleles have additive effects ±(1/2)a on a neutral trait.
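The single-locus example can be made concrete with a short numerical sketch. It assumes Hardy–Weinberg proportions and a symmetric form of heterozygote advantage in which both homozygotes have fitness 1 − s; the parameter s, the function name and the default values are illustrative assumptions, not taken from the text. The covariance between fitness and squared trait deviation is negative, so the neutral trait appears to be under stabilizing selection.

```python
def apparent_selection_cov(a=1.0, s=0.1, p=0.5):
    """Covariance between fitness and squared trait deviation at one locus.

    Assumptions (illustrative only): Hardy-Weinberg proportions, allelic
    effects of +a/2 and -a/2 on a neutral trait, heterozygote fitness 1,
    and both homozygotes with fitness 1 - s.
    """
    q = 1.0 - p
    # (frequency, trait value, fitness) for genotypes A1A1, A1A2, A2A2
    genotypes = [
        (p * p, +a, 1.0 - s),
        (2.0 * p * q, 0.0, 1.0),
        (q * q, -a, 1.0 - s),
    ]
    mean_z = sum(f * z for f, z, _ in genotypes)
    mean_w = sum(f * w for f, _, w in genotypes)
    # Squared deviation of each genotype's trait value from the mean
    sq_dev = [(f, (z - mean_z) ** 2, w) for f, z, w in genotypes]
    mean_d = sum(f * d for f, d, _ in sq_dev)
    return sum(f * (d - mean_d) * (w - mean_w) for f, d, w in sq_dev)


# With p = 0.5 the covariance equals -a**2 * s / 4, i.e. it is negative.
cov = apparent_selection_cov(a=1.0, s=0.1, p=0.5)
```

A negative value here says only that extreme trait values are associated with lower fitness; by construction the trait itself has no causal effect on fitness.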
Putting aside the biological issue of which traits cause selection, the distinction between real and apparent stabilizing selection is essentially a semantic one. A valid population genetic description of the genetic basis of any trait consists of only the effects of alleles on the trait and on fitness, and their frequencies (and perhaps map positions). The difference between real and apparent stabilizing selection is whether we model the other (hidden or not measured) traits in order to determine the net fitness effect of a given allele, or simply assume a convenient distribution for those fitness effects.
Although purely statistical models have enjoyed great success in predicting the short-to-medium-term effects of selection on populations, they can say nothing about the mechanistic genetic basis of the traits they describe. The genetic details are important in themselves, and are crucial for predicting medium-to-long-term evolution. This is important for assessing the likely success and implications of attempts to determine the genetic basis of complex traits such as blood pressure, susceptibility to many diseases, or the basis and probable future evolution of drug or pesticide resistance.
One approach is to extrapolate from the known genetic bases of traits that have been studied in detail. In some areas, this approach is a good one; for example, much is known about the genetic basis of Mendelian diseases (Hirschhorn et al. 2002). However, much less is known about the basis of complex genetic diseases. To extrapolate from what is known is to risk serious bias because the few factors found so far are probably those easiest to detect, having uncharacteristically high penetrances and simple allelic architectures (Pritchard & Cox 2002). One might hope that a fruitful approach would be to identify a suite of observations (including, but not limited to, those of the type just described) that seem to apply to quantitative traits in general, and to inquire what models are consistent with them. To make inferences and predictions, we must generalize, but should do so with due caution; our observations are always of ‘the variation that we observe in a particular measurement made in a certain way on a particular population,’ and we should be wary of ‘pretend[ing] to ourselves that we are dealing with a general property of the individual rather than a very specific observation of that property’ (Robertson 1967).
The layout of the paper is as follows. In §2, we describe some observations that seem to apply to ‘quantitative traits in general’. In §3, we discuss two types of models of selection on quantitative traits. Following the literature, we concentrate in §3 on ‘mutation–selection balance’ models in which selection acts solely to eliminate variation, which is opposed by continual generation of variation by mutation. However, in §3d, we also discuss ‘balancing selection models’ in which selection acts partly to preserve variation. Although mutation is the ultimate source of variation, it need not feature explicitly in such models. In §4, we discuss to what extent the models are compatible with our general observations, and consider some thorny issues relevant to models that describe many traits simultaneously.
The total variation observed in a quantitative trait is called the phenotypic variance, VP. Ignoring linkage disequilibrium, this can be decomposed as
$$V_P = V_G + V_E + 2\,\mathrm{Cov}(G, E),$$
where VG is the genetic variance, VE is the environmental variance, and 2Cov(G, E) includes effects of genotype×environment interactions (G×E). The genetic variance is made up of additive, dominance and epistatic (interaction) variances, VA, VD and VI, respectively. Broad sense heritability is VG/VP and measures to what extent a trait is heritable. This broad sense heritability can be estimated from comparison of outbred and inbred lines (given that VG=0 in the latter). In many applications, it is the ordinary or narrow sense heritability, defined as h²=VA/VP, that is more important. This can be used to predict (and can therefore be measured by) the response to artificial selection, or regressions between phenotypes of related individuals.
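As a small illustration of these definitions, the sketch below computes the broad and narrow sense heritabilities from a set of variance components, using the decomposition VP = VG + VE + 2Cov(G, E) with VG = VA + VD + VI. The function names and example values are hypothetical. The response prediction uses the standard breeder's equation R = h²S, which is not stated in the text and is included here as an assumption about what "predicting the response" means.

```python
def heritabilities(VA, VD, VI, VE, cov_GE=0.0):
    """Return variance totals and heritabilities from variance components."""
    VG = VA + VD + VI                 # total genetic variance
    VP = VG + VE + 2.0 * cov_GE       # phenotypic variance
    return {"VG": VG, "VP": VP, "H2": VG / VP, "h2": VA / VP}


def predicted_response(h2, S):
    """Breeder's equation (an assumption here): response R = h2 * S,
    where S is the selection differential."""
    return h2 * S


# Hypothetical components, in arbitrary units
parts = heritabilities(VA=0.4, VD=0.1, VI=0.0, VE=0.5)
R = predicted_response(parts["h2"], S=2.0)
```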
One striking observation is that, although almost any level of heritability can be found for some trait in some population, heritabilities for the majority of traits in either wild or random bred laboratory populations are typically between 0.2 and 0.6 (Roff 1997; Lynch & Walsh 1998). There are a few patterns. Traits more closely related to fitness show lower heritabilities, but when they are scaled appropriately, we see that this is because they have much greater VE, and in fact have greater VG too (Houle 1992). It is puzzling that levels of heritability are so pervasive, so high and roughly constant.
The common observation that the great majority of morphological traits have high heritabilities applies even to small populations; for example, in the various species of Galapagos finches (Grant 1986). In such cases, occasional immigration may compensate for loss of variation through drift in subpopulations (Grant & Grant 1992; Keller 1998). Nevertheless, there is no obvious relation between heritability and population size. Reed & Frankham (2001) find little correlation between measures of (primarily) electrophoretic variation and quantitative variability in wild populations, which is surprising if both kinds of variation increase with Ne. However, because electrophoretic variation also shows only a weak increase with population size (Gillespie 1991, ch. 1), and a weaker increase than putatively neutral synonymous sequence variation (Gillespie 2001 and references therein), the weak relation between electrophoretic and quantitative variation becomes less remarkable.
Clearly, the root source of quantitative genetic variation is mutation. The rate of input of new variation can be measured by the increase in variance in an inbred line; because spontaneous mutation is typically measured in this way, such estimates reflect only homozygous effects. This type of experiment gives reliable estimates of Vm, the (steady-state) per generation increase in variance owing to mutation. For many traits, Vm falls in the range 10⁻³ VE to 10⁻² VE (reviewed by Lynch 1988; Lynch & Walsh 1998, ch. 12). Again, this is surprisingly constant. However, it is difficult to disentangle whether this mutational variance is contributed by a few mutations of large effect, or many mutations of small effect (Keightley & Eyre-Walker 1999; Lynch et al. 1999). So far, no coherent picture has emerged. For example, in Drosophila melanogaster, the net effect of spontaneous mutations is almost always to reduce fitness components (e.g. Keightley 1996; Fry et al. 1999; Keightley & Eyre-Walker 1999 and references therein), whereas in the plant Arabidopsis thaliana, the average net effect of spontaneous mutations neither increases nor decreases fitness components (Shaw et al. 2000). Moreover, when a gamma distribution of effects was fitted to data for life-history traits in Caenorhabditis elegans, the highest support was found for a distribution with mostly equal effects and a coefficient of variation (CV) close to 0, and distributions with CV>1 were rejected (Keightley & Caballero 1997). On the other hand, with similar data for D. melanogaster, the highest support was found for CV≥1, and distributions with very small CV were rejected (Fry et al. 1999). Although manipulations of mutation accumulation lines can be used to infer average dominance coefficients, such approaches have limitations and tend to yield averages weighted in a way that may be of little evolutionary relevance (Caballero et al. 1997).
More is known about distribution of effects of artificially induced mutations, primarily those generated by transposable element insertions. Because the number of mutations is controlled, the means, variances, covariances and higher moments of mutational effects on traits (such as bristle number) and on fitness components can be estimated. For example, a study of P element-induced mutagenesis on bristle number traits in D. melanogaster (Mackay et al. 1990) suggests that mutations have leptokurtic distributions of effects (which for a gamma distribution implies a high CV) on both fitness and quantitative traits.
Many traits appear to be under stabilizing selection; that is, selection favouring an intermediate value of the trait. Evidence that extreme phenotypes cause reduced fitness comes from experimental manipulations, and from the observed constancy of form over evolutionary time. Measuring stabilizing selection directly is difficult because it is only meaningful if measured under natural conditions. It is usually measured by the (standardized) quadratic selection gradient, γ, defined as the regression of fitness on squared deviation of trait value from the mean (after normalizing trait values so that VP=Var[P]=1). If there is no directional selection, then
Observing γ<0 is usually interpreted as implying stabilizing selection, although, strictly, it only implies a fitness function that is on average convex (Schluter & Nychka 1994). Many measurements of γ are reviewed and analysed by Endler (1986, ch. 7) and Kingsolver et al. (2001). A summary of 465 estimates is reproduced as figure 1. Studies of this type are intrinsically unable to distinguish between true and apparent stabilizing selection. It seems that (apparent) stabilizing selection and (apparent) disruptive selection are equally common. Although the strength of stabilizing selection now seems much weaker than had previously been thought (e.g. as reviewed by Endler 1986, ch. 7), it is nevertheless much stronger than assumed in most theoretical analyses. The median γ=−0.1 for stabilizing selection corresponds to a value of Vs≈5VP, or Vs/VE≈5/(1−h²) (taking VP=VE/(1−h²)), when real stabilizing selection is modelled using a Gaussian or ‘nor-optimal’ fitness function
$$W(P) = \exp\!\left[-\frac{(P - P_{\mathrm{opt}})^2}{2V_s}\right], \qquad (2.4)$$
where P_opt denotes the optimal trait value.
The parameter Vs has a concrete interpretation; the reduction in fitness owing to variation around the optimum is VP/(2Vs), which has a median of ca 10%. Unless heritability is extremely high, the estimated strengths of stabilizing selection are mostly nowhere near as weak as the value Vs/VE=20 or range of 10–100 used in much theoretical work (Lande 1975; Turelli 1984; Bürger 2000, ch. 7). It is clear that traits under statistically significant stabilizing selection are under much stronger selection than has been assumed in theoretical work. Taking the non-significant estimates at face value suggests many other traits are under stabilizing selection that would be considered strong, Vs/VE<10, by theoreticians. However, the distribution shown in figure 1 includes sampling errors and the true absolute values of γ could be much smaller if these errors were large. Kingsolver et al.'s (2001) meta-analysis does not attempt to estimate the true distribution of γ.
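The correspondence between γ, Vs and the roughly 10% fitness reduction can be sketched as follows. This is a weak-selection approximation under assumptions that are not spelled out in the text: the population mean sits at the optimum (written here as θ, a symbol introduced for illustration), and γ is taken literally as the regression of fitness on squared deviation. Some conventions include a factor of 1/2 in the quadratic term, which would change the numerical correspondence.

```latex
% Gaussian fitness function, expanded for weak selection (V_s \gg V_P)
W(P) = \exp\!\left[-\frac{(P-\theta)^2}{2V_s}\right]
     \approx 1 - \frac{(P-\theta)^2}{2V_s}

% Slope of fitness on squared deviation, in units where V_P = 1
\gamma \approx -\frac{V_P}{2V_s}
\quad\Longrightarrow\quad
\gamma = -0.1 \;\text{ gives }\; V_s \approx 5V_P

% Mean reduction in fitness caused by variation around the optimum
1 - \bar{W} \approx \frac{\mathbb{E}\!\left[(P-\theta)^2\right]}{2V_s}
            = \frac{V_P}{2V_s} \approx 0.1
```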
There is at least one classic documented example of apparent stabilizing selection. Kearsey & Barnes (1970) showed that the strength of stabilizing selection on Drosophila bristle number depends on the level of crowding experienced by the larvae at a life stage before bristle traits are expressed. In this case, at least, selection on bristle traits must be mediated by other traits.
When a population is subjected to artificial selection, the response of the trait mean at any time depends only on VA at that time for that trait. Prolonged artificial selection changes allele frequencies at the underlying loci, which, in turn, change VA, and therefore the medium and long-term dynamics of the trait mean contain information about the underlying genetic basis. (The dynamics of the trait variance are often noisy and it is not clear what information they add when VA has already been inferred from the selection response.)
For practical reasons, the great majority of selection experiments have been on small populations, numbering tens or hundreds. It is remarkable, therefore, that sustained responses have been seen in such experiments, the classic example being the Illinois corn experiment (Dudley & Lambert 2004). In general, we see a response that is sustained at a roughly constant rate for a period of many generations (say, 10–100, depending on the experiment), after which the rate of response declines. We know of no example of an accelerating response to artificial selection on an outbred base population (although this is the norm for an inbred base population).
The sustained steady response can be explained by a mixture of (i) many loci segregating for alleles with small effects, so that allele frequencies change slowly; and/or (ii) a large and steady input of mutational variance (Barton & Keightley 2002, fig. 2b,c). The eventual decline in rate and plateau of response may be a result of exhaustion of genetic variation present at the start of the experiment (fixation of alleles by either selection or drift), and/or increasing strength of natural selection opposing artificial selection or its pleiotropic side effects.
Weber & Diggins (1990) showed that selection on much larger populations of Drosophila led to a significantly greater response. Reviewing several long-term experiments, they found that the ratio of the response after 50 generations to that in the first generation fitted that expected under the infinitesimal model remarkably well (figure 2), with little effect of alleles being moved to fixation by selection. The estimated contribution of new mutations over this time is negligible under the infinitesimal model, but could be substantial if alleles of larger effect are involved (Barton & Keightley 2002). However, the responses surveyed by Weber & Diggins (1990) were lower than expected from standing variation, which suggests that mutation is not a significant contributor. In addition, Keightley (2004) surveyed several selection experiments on inbred populations and found that the selection response at 50 generations is an order of magnitude lower than seen in selection on outbreds. Although mutation must dominate over long time-scales, it does not make a substantial contribution at 50 generations.
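The kind of infinitesimal-model expectation referred to here can be sketched numerically. The version below is a simplified assumption rather than the exact calculation used by Weber & Diggins (1990): additive variance is taken to decay by a factor (1 − 1/(2Ne)) per generation through drift alone, selection intensity and phenotypic variance are held constant, and mutation is ignored. Under those assumptions, the cumulative response after t generations is a geometric sum of first-generation responses.

```python
def response_ratio(t, Ne):
    """Cumulative response after t generations divided by the response
    in the first generation, under drift-only decay of additive variance.

    Assumptions (illustrative): VA(k) = VA(0) * (1 - 1/(2*Ne))**k,
    constant selection intensity, no mutation, and no change in
    allele frequencies caused by selection itself.
    """
    decay = 1.0 - 1.0 / (2.0 * Ne)
    # Geometric sum of decay**k for k = 0 .. t-1
    return 2.0 * Ne * (1.0 - decay ** t)


# Ratio after 50 generations for a few effective population sizes
ratios = {Ne: response_ratio(50, Ne) for Ne in (10, 40, 160, 640)}
```

The ratio approaches 2Ne as t becomes large and approaches t as Ne becomes large, which is why larger populations are expected to sustain a response for longer.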
In contrast, Keightley et al. (1996) detected sharp changes in frequency of neutral markers during 21 generations of artificial selection, which must have been caused by hitchhiking with quantitative trait loci (QTL) that were experiencing sharp changes in allele frequency. However, this experiment began with a cross between inbred lines, so blocks of genome may have acted as QTL with large effects, and recombination may have released new variation during the experiment.
For pairs of lines selected for the same trait at the same intensity but in opposite directions, the response (after the first generation) is often asymmetric. For traits thought to be positive components of fitness (e.g. many aspects of size), the response is greater in the downward direction (Frankham 1990). In the longer term, the ‘low’ line often plateaus at some minimum (e.g. oil content in maize; Dudley & Lambert 2004), although the response may continue if the trait is measured on some other scale.
The variance between replicate experiments (the repeatability of the response) is potentially informative. The relatively low variance between replicates, especially when started with small populations, suggests that the response is not based on alleles at extreme frequencies in the base population (James 1971; Frankham 1980; Hill & Caballero 1992), but to our knowledge, this restriction has not been quantified. This type of experiment is very noisy, however, and a very large number of replicates would be needed for firm conclusions to be drawn (Whitlock & Fowler 1999).
QTL mapping experiments attempt to determine the genetic basis of a trait directly. They are analogous to gene mapping in humans, but have much greater power and resolution when large controlled crosses can be made. When QTL are searched for by crossing high and low lines, typically a small number of QTL (ca 10 or fewer per chromosome), each of large effect, are found (Lynch & Walsh 1998, ch. 15). Although it will be this type of work that ultimately determines the genetic basis of any trait (exactly where the genes are and what their effects are), such experiments are not a good way to infer the overall distribution of effects for all genes influencing a trait. This is because of both ascertainment and statistical biases; even in large experiments, only the QTL with relatively large effect are detected, and when their effects are then estimated those estimates are upwardly biased (Lande & Thompson 1990; Göring et al. 2001). It is only possible to correct such biases if independent experiments are carried out (Lande & Thompson 1990) or if quite strong assumptions are made about the distribution of effects of undetected QTL (Otto & Jones 2000). Further, many experimental designs cannot give any information about the frequencies of alleles in natural populations.
In a study of QTL on chromosome 3 affecting an index of wing shape in Drosophila melanogaster (Weber et al. 1999), the authors observed that two contrasting models fitted their (large) dataset almost equally well. The first model was built by QTL mapping, and the best model with 11 QTL and 9 pairwise epistatic interactions could be made to fit the data closely (r²=0.96). The second model was effectively an infinitesimal model, assuming many loci of individually small effect and no epistasis. The density along the chromosome of loci with effects on the trait was fitted to the data and achieved almost as good a fit (r²=0.93). This suggests that there is little power, even in a large F2 and backcross QTL experiment, to distinguish these two alternatives.
Understanding the nature and extent of pleiotropy is fundamental to understanding the evolution of quantitative genetic variation. In this paper, we emphasize the distinction between the maintenance of variation involving direct selection on the trait of interest, and indirect explanations in which selection arises from the pleiotropic effects of the alleles that affect the trait. Pleiotropy is also central to understanding constraints on evolutionary change (an issue that we take up below), and to arguments about the sizes of effects of adaptive substitutions.
Since the beginning of evolutionary biology, there has been a widespread belief that pleiotropy is ubiquitous. Darwin emphasized the importance of ‘correlated growth’, while both Darwin and Fisher's emphasis on the importance of slight variations was based on the argument that major changes would be eliminated through their deleterious side effects. (Fisher's 1930 geometric model of pleiotropy has been extended by Orr (1998) to show that the distribution of factors fixed during adaptive evolution follows an exponential distribution, with a mean that decreases as the square root of the number of pleiotropic side effects.) However, the widely held belief in the importance of pleiotropy has been based on little systematic empirical evidence (Barton & Turelli 1989; Orr & Coyne 1992; Stern 2000); indeed, Dobzhansky's classic work, which has usually been taken as support for widespread pleiotropy, often shows quite subtle side effects of Drosophila visible mutations (Dobzhansky 1937; see Stern 2000). More recent work has shown extensive pleiotropic effects of major mutations. For example, Thaker & Kankel (1992) used mitotic recombination to make small patches of Drosophila tissue homozygous for recessive lethals; they showed that 40% are cell lethal and about a third disrupt development of the visual system.
Large-scale surveys of the effects of gene knockouts in organisms such as yeast (e.g. Giaever et al. 2002) give the opportunity for investigations of pleiotropy, but the technique has not yet been applied to this issue. However, as Stern (2000) emphasizes, such studies would not tell us about the extent of pleiotropy for alleles of small effect, which probably contribute the bulk of quantitative genetic variation. In particular, changes in regulatory sequence that bind (say) just one transcription factor may have much more specific effects than deletions of the whole gene, or changes in its amino acid sequence. Stern (2000) gives the example of the gene decapentaplegic (dpp), whose loss disrupts many developmental processes, and kills the embryo. However, changes in the adjacent noncoding sequence cause specific phenotypes. For example, a 2.7 kb deletion causes the wings to be held out from the body, and also reduces the numbers of sensilla on the dorsal radius wing, a 0.9 kb deletion causes small gaps at the distal ends of two wing veins and some extra venation, and so on.
The common observation that traits can respond independently to selection shows that there is no absolute pleiotropic connection between them, but otherwise tells us little about the nature of pleiotropy for the individual alleles involved. This is important, because (as we explain below) pleiotropy may determine the genetic variance, even when it has no net effect on changes in the mean. The genetic covariance between traits is a sum over the covariances contributed by each allele, which may largely cancel. Conversely, even if the same genes influence each trait, different alleles may affect those traits, and so there may be no genetic covariance between them. (This appears to be the case, for example, for abdominal and sternopleural bristles in Drosophila; association studies show significant effects of the candidate loci Delta and achaete-scute, but the associations are with different variants for the two traits; Long et al. 1998, 2000.) Weber (1992) directly addressed the question of whether pleiotropic connections between genes for closely related traits prevent selection from separating them. He selected on the ratio between two nearby vein characters, and obtained a large response, despite a strong allometric relation between the traits. Again, however, this does not imply that individual alleles show weak pleiotropy.
Finally, a strong argument for widespread pleiotropy comes from the remarkably high rate of mutation to quantitative traits. As we said above, the mutational heritability, Vm/VE, is in the range 10⁻³–10⁻² (Lynch 1988; Lynch & Walsh 1998, ch. 12). If we make the reasonable assumption that the alleles involved have effects of one environmental standard deviation or less, then the total mutation rate for any given trait must be at least as large as Vm/VE. Indeed, the few estimates of the per trait mutation rate we have are around 0.1 (from maize and mice; Lynch & Walsh 1998, p. 337). The total mutation rate to deleterious mutations is also uncertain, but is unlikely to be much greater than 1 for mammals or flowering plants (Drake et al. 1998; Eyre-Walker & Keightley 1999; Keightley & Eyre-Walker 2000). Thus, even these rough figures imply that there cannot be many sets of traits each with an independent genetic basis. It is most plausible, of course, that each allele has a distribution of effects on the (very large) number of traits, but that this distribution is concentrated on some subset of traits.
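The bound used in this argument can be written out explicitly. The following is a sketch under simplifying assumptions that are not in the text: U_T denotes the total per-generation rate of mutations affecting the focal trait, E[a²] is their mean squared effect, U is the total genomic rate of deleterious mutation, and any factors for ploidy or dominance are absorbed into U_T.

```latex
% Mutational variance as rate times mean squared effect
V_m = U_T \,\mathbb{E}[a^2]

% If every mutation has a^2 \le V_E, then \mathbb{E}[a^2] \le V_E, so
U_T = \frac{V_m}{\mathbb{E}[a^2]} \;\ge\; \frac{V_m}{V_E}
    \approx 10^{-3}\text{--}10^{-2}

% With U_T \approx 0.1 per trait and U \lesssim 1 in total, the number of
% trait sets with non-overlapping genetic bases is roughly bounded by
\frac{U}{U_T} \lesssim \frac{1}{0.1} = 10
```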
The simple argument of mutation–selection balance has attracted much attention, because of both its intuitive and mathematical simplicity. If many genes contribute, then the total mutation rate could be large enough for significant variation to be maintained. Regardless of the mechanism of selection, we can say generally that selection coefficients must be of the same order of magnitude as Vm/VG if the steady-state increase in variance owing to mutation is to be eliminated at the same rate by selection (Barton & Turelli 1989). Of all Vm/VG estimates, 90% are in the range 0.001–0.04 (Houle et al. 1996).
Theoretical predictions of genetic variance at mutation–selection balance can be classified according to the type of selection they assume, as follows:
Many models of quantitative trait variation are multilocus generalizations of Crow & Kimura's (1964) continuum of alleles model. These assume that, at each of n diploid loci, infinitely many alleles are possible. At each locus, alleles are described by their effect on the trait, x, so there is a distribution f(x) that describes the population frequencies of alleles with each effect. An individual's trait value is $P=\sum_{i=1}^{n}(x_{i,1}+x_{i,2})+E$, where $x_{i,1}$ and $x_{i,2}$ are the effects of the two alleles carried at locus i, assuming that effects are additive within and between loci and that there is an independent random environmental component E~N(0, VE). It is also usual to assume that stabilizing selection can be modelled by a nor-optimal or Gaussian function (equation (2.4)) and that there is a stepwise Gaussian mutation scheme with rate μ at each locus. For this mutation scheme, when an allele with effect x mutates, the new allele has effect x′, where x′~N(x, α²) is centred on the previous value x. Here, α² is the variance in heterozygous effects of mutations.
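The ingredients of this model can be sketched in a few lines. The representation below (a genotype as a list of allelic-effect pairs, one pair per locus) and the function names are illustrative assumptions, and the code covers only the additive phenotype and the stepwise Gaussian mutation step, not selection or reproduction.

```python
import random


def phenotype(genotype, VE, rng):
    """Additive trait value: sum of both allelic effects at every locus,
    plus an independent environmental deviation E ~ N(0, VE)."""
    genetic_value = sum(x1 + x2 for x1, x2 in genotype)
    return genetic_value + rng.gauss(0.0, VE ** 0.5)


def mutate_stepwise(genotype, mu, alpha2, rng):
    """Stepwise Gaussian mutation: each allele mutates with probability mu,
    and the new effect is drawn from N(x, alpha2), centred on the old x."""
    sd = alpha2 ** 0.5

    def step(x):
        return rng.gauss(x, sd) if rng.random() < mu else x

    return [(step(x1), step(x2)) for x1, x2 in genotype]


rng = random.Random(1)                    # seeded for reproducibility
founder = [(0.0, 0.0)] * 5                # n = 5 diploid loci, no variation
offspring = mutate_stepwise(founder, mu=1e-4, alpha2=1.0, rng=rng)
value = phenotype(offspring, VE=1.0, rng=rng)
```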
Early analyses of this model (Kimura 1965; Lande 1975) assumed that f(x) was Gaussian. (This is an approximation that is never exactly true (Turelli 1984), and there is no empirical justification; although the distribution of phenotype values is approximately Gaussian for many traits, there are almost no data about the shape of f(x) at each locus.) Under this assumption, at mutation–selection equilibrium
$$V_G = 2n\sqrt{\mu\alpha^2 V_s} = \sqrt{2nV_mV_s}$$
(because Vm=2nμα²; Lande 1975). Typical data imply that the number of loci must be small, e.g. h²=0.5, Vm/VE=10⁻³ and Vs/VP=50 imply n=5. Although this is in itself reasonable, assuming the per locus mutation rate μ≤10⁻⁴ then implies that the average mutational effects must be very large relative to the phenotypic range, α²=Vm/(2nμ)≥VE. If this were so, the Gaussian approximation for f(x) would fail, because it relies on both high mutation rates and relatively small mutational effects, and therefore this model under the Gaussian approximation does not fit the data.
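The arithmetic behind these figures can be checked with a short sketch. It assumes the Gaussian-approximation equilibrium VG = √(2nVmVs) with Vm = 2nμα², and further assumes that all genetic variance is additive so that VG = h²VP and VP = VE/(1 − h²); the function names are hypothetical. All quantities are expressed in units of VE.

```python
def gaussian_approx_loci(h2, Vm_over_VE, Vs_over_VP):
    """Number of loci n implied by VG = sqrt(2 * n * Vm * Vs).

    Works in units where VE = 1 and assumes VG = VA (purely additive),
    so that VP = VE / (1 - h2) and VG = h2 * VP.
    """
    VE = 1.0
    VP = VE / (1.0 - h2)
    VG = h2 * VP
    Vm = Vm_over_VE * VE
    Vs = Vs_over_VP * VP
    return VG ** 2 / (2.0 * Vm * Vs)


def min_alpha2(n, Vm_over_VE, mu_max):
    """Smallest mutational variance alpha^2 (in units of VE) compatible
    with Vm = 2 * n * mu * alpha^2 when mu is at most mu_max."""
    return Vm_over_VE / (2.0 * n * mu_max)


n = gaussian_approx_loci(h2=0.5, Vm_over_VE=1e-3, Vs_over_VP=50.0)
alpha2 = min_alpha2(n, Vm_over_VE=1e-3, mu_max=1e-4)
```

With the typical values quoted in the text, n comes out at 5 and α² at about one VE, that is, mutational effects with a standard deviation of roughly 0.7 phenotypic standard deviations.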
Later analyses of this model (Turelli 1984) used a ‘house of cards’ approximation. Under a house of cards mutational scheme, when an allele with effect x mutates, the new allele has effect x′ where x′~N(0, α²) is independent of x. This can be motivated on biological grounds (Kingman 1978) as an alternative to the stepwise mutation scheme, but, more usually, it is viewed as an approximation to a variety of mutation schemes. The house of cards approximation is good when most individuals carry alleles of small effect and the variance is contributed by rare individuals carrying alleles of large effect, that is when f(x) is highly leptokurtic. In this case, at equilibrium
$$V_G = 4n\mu V_s$$
(Turelli 1984). For the same typical data as above, the total rate of mutation in loci affecting the trait must be 2nμ=5×10⁻³ (again implying that the average mutational effects must be very large, α²=Vm/(2nμ)=0.2VE). This is plausible if many loci affect each trait, but then the total genomic mutation rate sets a limit on how many traits there can be that have an independent genetic basis (see below).
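The house of cards figures can be checked in the same way. The sketch assumes the equilibrium VG = 4nμVs and the relation Vm = 2nμα², again with all genetic variance additive and quantities expressed in units of VE; the function names are hypothetical.

```python
def hoc_total_mutation_rate(h2, Vs_over_VP):
    """Total mutation rate 2*n*mu implied by VG = 4 * n * mu * Vs.

    Works in units where VE = 1 and assumes VG = VA (purely additive),
    so that VP = VE / (1 - h2) and VG = h2 * VP.
    """
    VE = 1.0
    VP = VE / (1.0 - h2)
    VG = h2 * VP
    Vs = Vs_over_VP * VP
    return VG / (2.0 * Vs)


def hoc_alpha2(Vm_over_VE, total_rate):
    """Mutational variance alpha^2 (units of VE) from Vm = (2*n*mu) * alpha^2."""
    return Vm_over_VE / total_rate


rate = hoc_total_mutation_rate(h2=0.5, Vs_over_VP=50.0)
alpha2 = hoc_alpha2(Vm_over_VE=1e-3, total_rate=rate)
```

With h²=0.5 and Vs/VP=50 the total rate is 5×10⁻³, and with Vm/VE=10⁻³ the implied α² is 0.2 VE. Note that, unlike the Gaussian result, the predicted VG here does not depend on α.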
Although Turelli (1984) derived this approximation for the continuum of alleles model, the same result is obtained for a model of many loci with finite numbers of alleles (typically two or five; Barton 1986; Slatkin 1987 and references therein). Under the house of cards approximation, the distribution of effects of alleles segregating at each locus is much more leptokurtic than Gaussian: there are common alleles of tiny effect and almost all the variance is contributed by rare alleles of large effect. This is well approximated by a ‘rare alleles’ model in which a single allele of zero effect is at high frequency and one or several rare alleles of large effect segregate independently (at the same locus, because they are rare). The genetic variance under such approximations does not depend on α because these rare alleles are independent and are held at frequencies inversely proportional to their effects.
The natural response to the preceding arguments is to study the multi-trait generalization of the real stabilizing selection model. Early work (Lande 1980; Slatkin & Frank 1990) concluded that real stabilizing selection on any given trait does not affect apparent stabilizing selection on other traits. This was an artefact of assuming that at each locus multivariate normality of allelic effects on all traits held (Zhang & Hill 2003). This assumption implies that all loci (not just all traits) can respond to selection in an arbitrary direction, which is considered extremely unlikely; there cannot be enough alleles at each locus (Turelli 1984).
For parameter values that allow a house of cards approximation to be made, Waxman & Peck (1998) show that for ≥3 traits, there is a spike in the equilibrium density function (i.e. a non-zero fraction of the population have exactly the optimum phenotype). (They also show that this behaviour will occur when there are a large number of traits, even if the house of cards approximation does not apply.) This suggests a possible inadequacy of the model: it predicts a phenomenon that seems implausible. Wingreen et al. (2003) show that this behaviour arises from an unrealistic modelling assumption, that there is no correlation between the effects of a given mutation on the different traits (pointed out by Turelli 1985). Thus, as the number of traits grows, the probability of a mutation having a small overall effect vanishes. When such a correlation is allowed (Wingreen et al. 2003), the model is no longer inadequate.
Zhang & Hill (2003) applied the rare alleles approximation to a model of many traits, allowing correlations in mutational effects and multivariate Gaussian real stabilizing selection applying to all traits. They show that for real weak stabilizing selection on many traits, there can be strong apparent stabilizing selection on any given trait. When considering a segregating allele with an effect (say a) on a focal trait, the pleiotropic effects of that allele on all other traits cause it to have a net fitness effect (say s). Zhang & Hill (2003) found that, under reasonable conditions, the distribution of s becomes normal with a variance that tends to zero as the number of traits in the model increases. Thus, in this limit, their multivariate model becomes like a pure pleiotropy model or the Zhang et al. (Zhang & Hill 2002; Zhang et al. 2004) extension of it (§3c below). However, as Zhang & Hill (2003) point out, there is no empirical support for this behaviour of their multivariate model, and indeed, there is substantial evidence to the contrary (e.g. Mackay et al. 1990, see §2b). Zhang & Hill (2003) conclude that the observed strong apparent stabilizing selection cannot be caused by only weak real stabilizing selection on many traits. Although any organism has many traits under apparent stabilizing selection, many of those traits could be correlated and perhaps real stabilizing selection acts on just a few, in which case the limiting behaviour of the Zhang & Hill (2003) model need not apply (see below).
There was a need to analyse models in which each mutation affected several traits, and selection acts simultaneously on many traits, as described above. However, explicit multitrait models have often made unrealistic assumptions and/or have proved hard to draw general conclusions from. An early key insight of Hill & Keightley (1988) was that it is not necessary to model the multivariate distribution of all these trait values. Instead, one can focus on a single trait of interest, and subsume the effects of mutations on all other traits into their effects on a composite trait, fitness. When it is assumed that any real stabilizing selection on the focal trait is negligible in comparison to selection on these pleiotropic side effects, we have a ‘pure pleiotropy’ model.
The simplest assumption to make is that all mutations have an equal pleiotropic effect on fitness. Each allele has a random effect on the trait; less fit individuals carry more such alleles and so tend to have more extreme phenotypes, giving rise to apparent stabilizing selection (Robertson 1967; Barton 1990; Kondrashov & Turelli 1992).
More realistic models are parameterized by a bivariate distribution m(a, s) describing the effects of mutations on the focal trait (a) and on fitness (s), and the effective population size Ne. In fact, the choice of the functional form for m(a, s) is the main distinguishing feature between the many studies (Hill & Keightley 1988; Keightley & Hill 1989; Barton 1990; Keightley & Hill 1990; Kondrashov & Turelli 1992; Caballero & Keightley 1994; Tanaka 1996; Zhang et al. 2002). However, as we illustrate in appendix A, it is possible to derive some results without making any specific assumptions about m(a, s).
Studies of this model have assumed that all mutations (at either one or several loci) segregate independently (or that only two alleles ever segregate at a given locus) and that all mutations lower fitness, using a diffusion approximation from Kimura (1969) that is a slight generalization of the rare alleles model. Most studies assume that allelic effects are codominant and combine across loci additively for the trait and multiplicatively for fitness, but in a few studies, these assumptions have been relaxed to allow dominance coefficients that covary with the effects (Caballero & Keightley 1994; Zhang et al. 2004). Zhang et al. (2002) claim that a pure pleiotropy model can reproduce the observed VG and observed strength of (apparent) stabilizing selection. However, our interpretation of the data of Kingsolver et al. (2001; see also figure 1) and the analysis in appendix A suggest that if Vm/VE<10⁻², then Vs/VP>50 and thus γ>−0.01, so this claim cannot be true.
The pure pleiotropy model has two specific weaknesses. First, its behaviour is sensitive to Ne, and for many choices of m(a, s), it has the unfortunate property that VG→∞ as Ne→∞, and so holding VE and Vm fixed means that h² tends to 1 as Ne increases (Keightley & Hill 1990; Caballero & Keightley 1994). (However, the slope becomes very weak for leptokurtic distributions.) Although infinite populations do not exist, there is no reported correlation between heritability and population size. The cause of this behaviour is segregation of effectively neutral mutations with substantial effects on the trait. Figure 3 and appendix A show that controlling the behaviour of a continuous m(a, s) in the neighbourhood of s=0 prevents the VG→∞ behaviour. (Zhang & Hill 2002 use a discontinuous m(a, s) with a cutoff at smin as an alternative remedy.) This highlights a serious problem: model behaviour can be an artefact of using readily available modelling distributions (such as the multivariate gamma). These distributions are indexed by a few parameters such as their moments, and fitting these to values estimated from data causes spurious changes in the behaviour of the density near s=0 for which there is no empirical basis. Using a richer class of distributions can totally decouple the moments from the behaviour of the density near the origin, but this does not avoid the problem that the model behaviour depends on an essentially arbitrary assumption about the form of the distribution m(a, s). There are few data on mutations of small effect; yet these critically determine the behaviour of the model.
The second weakness of the pure pleiotropy model is that it can only explain very weak apparent stabilizing selection, much weaker than what is observed (as noted above). We show in appendix A (see also Zhang et al. 2002) that the pure pleiotropy model with an infinite population predicts a lower bound on Vs/VP that is typically greater than 50 (implying γ>−(1/100)); this is not true for most traits (figure 1).
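The conversion between Vs/VP and γ used in this paragraph can be checked with a few lines of Python. This is a minimal sketch that assumes the weak-selection relation γ ≈ −VP/(2Vs), which is our reading of the numbers quoted in the text (Vs/VP = 50 corresponds to γ = −0.01); the function names are hypothetical.

```python
def gamma_from_vs_ratio(vs_over_vp):
    """Approximate quadratic selection gradient from Vs/VP.

    Assumes gamma ~= -VP / (2 * Vs), i.e. weak Gaussian stabilizing
    selection on a trait scaled to unit phenotypic variance.
    """
    if vs_over_vp <= 0:
        raise ValueError("Vs/VP must be positive")
    return -1.0 / (2.0 * vs_over_vp)


def vs_ratio_from_gamma(gamma):
    """Invert the same approximation: Vs/VP ~= -1 / (2 * gamma)."""
    if gamma >= 0:
        raise ValueError("gamma must be negative for stabilizing selection")
    return -1.0 / (2.0 * gamma)


if __name__ == "__main__":
    # The bound discussed in the text: Vs/VP = 50.
    print(gamma_from_vs_ratio(50.0))
    # Under the same assumption, a gradient of -0.1 would need Vs/VP = 5.
    print(vs_ratio_from_gamma(-0.1))
```

Under this approximation, any trait with an estimated γ more negative than −0.01 lies outside what the bound allows.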
Stronger correlations between fitness and trait value (and hence stronger apparent stabilizing selection) could be generated if there is epistasis (Kondrashov & Turelli 1992; Gavrilets & de Jong 1993), but this has unfortunately been neglected in most models.
The pure pleiotropy model has recently been extended to include real stabilizing selection on the focal trait (Zhang & Hill 2002; Zhang et al. 2004). One property of this model seems to be that if VG is high, VS,T≈VS,R (where T denotes total and R denotes real). This illustrates that pleiotropic effects on fitness cannot give the appearance of much stronger stabilizing selection than the real stabilizing selection acting on a trait. In combination with the arguments reviewed here that it is unrealistic to assume independent real stabilizing selection on many traits, this causes quite serious difficulties for mutation–selection models.
Balancing selection can maintain variation in several ways. The best known is by heterozygote advantage, but this cannot be invoked as a general explanation for either molecular or quantitative variation: haploids and habitual selfers show substantial variation (e.g. Charlesworth & Mayer 1995; Podolsky 2001). Selection that favours rare alleles provides a more general mechanism; frequency dependence can be direct (e.g. at plant self-incompatibility loci), or indirect, for example, being mediated by interactions between host and parasite. Fluctuating selection alone eliminates variation (Haldane & Jayakar 1963), but when combined with a low rate of mutation, sustains a succession of selective substitutions that maintain variability (e.g. Kondrashov & Yampolsky 1996; Bürger 1999, 2000, p. 344). Finally, variation can be sustained by migration between local populations that experience different selection (Felsenstein 1979; Barton 1999).
All these models can operate either via direct selection on a quantitative trait, or indirectly when trait variation results from the pleiotropic effects of balanced polymorphisms (Robertson 1956; Gillespie 1984; Barton 1990). Heterozygotes will be fitter if they tend to be closer to the trait optimum (Wright 1935; Hastings & Hom 1989), or are less sensitive to a fluctuating environment (Gillespie & Turelli 1989; Turelli & Barton 2004). Frequency-dependent selection can arise if individuals with similar trait values compete for resources (e.g. Roughgarden 1972; Slatkin 1979; Bürger & Gimelfarb 2004), and migration along a cline in trait optimum can maintain variation (Felsenstein 1979; Barton 1999). However, we do not have clear examples of any of these direct mechanisms, and the arguments above make indirect pleiotropic explanations more plausible. If substantial numbers of balanced polymorphisms are maintained by selection, then we expect them to contribute to trait variance.
We are in the somewhat embarrassing position of observing some remarkably robust patterns, that are consistent across traits and species, and yet seeing no compelling explanation for them. Our models are for the most part sensitive to parameters such as population size and selection strength, and worse, some observations appear incompatible—for example, strong stabilizing selection and high heritability, or small numbers of identified QTL, and sustained and replicable selection response. The key observations are:
For traits that appear to be under stabilizing selection, mutation–selection balance models have difficulty explaining the measured strength of selection without assuming that it is mostly contributed by real rather than apparent stabilizing selection. Yet there cannot be real stabilizing selection on an indefinitely large number of independent traits. Models of stabilizing selection on multiple traits must face the question of just how many independently evolvable traits there are. Plainly, the phenotype as a whole is described by an infinite number of traits—not just the infinite number of measurements needed to describe adult shape, but also the change in morphology and behaviour through time, and across different environments.
One problem is that relatively high total mutation rates must be invoked to explain observed levels of variation, and indeed available estimates of per-trait mutation rates are high, about 0.1 (Lynch & Walsh 1998). As discussed in §2f, the total mutation rate then sets quite a low limit on the number of traits that can have completely disjoint genetic bases. If most mutations affect several traits, then it is not adequate to model each trait in isolation.
A second problem with assuming that a very large number of independent traits are highly heritable, and also subject to strong stabilizing selection, is that the reduction in fitness owing to deviation from the optimal phenotype is ~nVG/Vs, and so at most ten or so independent traits could have Vs<10VG. This argument is a purely phenotypic one, and does not depend on how variation is maintained. If deviations from each of a large number of traits reduce fitness independently, and if each of those traits has high variance, then net fitness must be low.
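The load argument above is simple arithmetic, sketched here in Python. The sketch takes the approximate fitness reduction n·VG/Vs from the text at face value and assumes, as our own illustrative reading of "at most ten or so", that the total reduction cannot exceed 1; the function names and the `max_load` parameter are hypothetical.

```python
import math


def approx_load(n_traits, vg, vs):
    """Approximate fitness reduction n * VG / Vs for n independent traits."""
    return n_traits * vg / vs


def max_independent_traits(vs_over_vg, max_load=1.0):
    """Largest n with n * VG / Vs <= max_load.

    max_load = 1.0 is an assumed cap on the total fitness reduction.
    """
    return math.floor(max_load * vs_over_vg)


if __name__ == "__main__":
    # Ten traits, each with Vs = 10 * VG, already use up the whole budget.
    print(approx_load(10, vg=1.0, vs=10.0))
    print(max_independent_traits(10.0))
```

A stricter cap on the tolerable load lowers the number of strongly selected traits in proportion.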
The basic difficulty we face comes from the apparently high heritability of every measured trait. If the stabilizing selection we observe is ‘apparent’, then there is no problem; individuals extreme for one trait will tend to be extreme for others, breaking the assumption of independence. If we keep to the basic model of stabilizing selection on many traits, then strong selection can act on only a few of them. We can define directions in which selection acts independently by taking the eigenvectors of the covariance matrix that generalizes Vs. Then, the strength of stabilizing selection along the great majority of directions must be weak (VG/Vs<1/n). However, selection can act strongly in a few directions (including key traits such as body size); if we make measurements on some arbitrary trait, it is likely to include components from the few strongly selected traits, and we will observe strong stabilizing selection. Although, in principle, the Pearson–Lande–Arnold (Pearson 1903; Lande & Arnold 1983) approach to measuring multivariate selection gradients could demonstrate such a pattern, the statistical difficulties seem daunting.
As well as having an infinite dimensional phenotype, it is also clear that organisms can evolve in a large but finite number of dimensions within this infinite-dimensional phenotype space. A naive view would associate each gene with a single dimension. Such a simple relation might be justified for a structural protein or metabolic enzyme, where all that matters is the amount produced or the flux catalysed—though even this ignores the interaction of even such simple genetic functions with other genes and with the environment. As Stern's (2000) example of dpp discussed in §2f above makes clear, genes involved in development may be influenced by multiple regulatory sequences, and different variants may show qualitatively different phenotypes. We can think of each allelic variant as causing a particular phenotypic change, corresponding to a particular direction in phenotype space.
The number of possible regulatory sequence variants is enormous, potentially 4³⁰⁰⁰ for a 3 kb region that influences gene expression. However, what is relevant is the number of variants that is available to an evolving population. In the short term, this is the number of haplotypes segregating in the population, which might be small. However, in a reasonably large population, recombinants between these will be available within a few generations, as will all single-nucleotide mutations. (In extremely large populations, and with a high mutation rate, multiple mutations will also be available. For example, Lehman & Joyce (1993) selected on a mutagenised population of ca 10¹³ RNA molecules, and estimated that all possible four-step mutations were available in the base population (Lehman, personal communication). However, for moderately sized populations (e.g. <10⁹) with low mutation rates, we need consider only single-step mutations.)
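The contrast between the size of sequence space and the number of variants reachable by a single mutation can be made concrete. The following is a minimal sketch; the function names are hypothetical, and the only inputs are the 3 kb region length and the four-letter alphabet given in the text.

```python
import math


def log10_sequence_space(length, alphabet=4):
    """log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet)


def single_step_mutants(length, alphabet=4):
    """Number of sequences one substitution away from a given sequence."""
    return length * (alphabet - 1)


if __name__ == "__main__":
    region = 3000  # a 3 kb region
    # All possible sequences: 4**3000, reported as a power of ten.
    print(f"about 10^{log10_sequence_space(region):.0f} possible sequences")
    # Sequences reachable by one single-nucleotide substitution.
    print(single_step_mutants(region), "single-step mutants")
```

The full space has more than 1800 decimal digits, whereas only 9000 variants of a given 3 kb sequence are one substitution away.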
The argument is complicated by sequence variability within populations. Gavrilets (2004) has emphasized that many interconnected sequences can satisfy the same phenotypic constraints, so that populations can spread across large ‘nearly neutral networks’. Thus, relatively few mutations may be needed to cross from one high-fitness network to another (Schultes & Bartel 2000 give an intriguing example, involving two different ribozymes). If a population is spread across diverse sequences, then single-step mutations can generate many more alleles. However, it is hard to see how to quantify this argument, because it depends on epistatic effects of genetic background.
If we assume that each variant specifies a unique direction of change in phenotype space, then we can find a rough upper bound on the number of dimensions through which a population can evolve. On this argument, the number of available alleles corresponds to the number of dimensions; for an organism with 20000 genes, each with 3 kb of coding sequence each site of which can mutate to three alternative bases, we have ca 2×10⁸ dimensions available. This ignores complex rearrangements such as insertions and deletions, and ignores the (smaller) contribution from variation in amino acid sequence. However, it is a gross overestimate, in that gene function might naturally fall into a small number of dimensions, for example, those determined by the strength of binding to a few transcription factors. Nevertheless, even if one guesses that 100 independent dimensions are available for each gene, there are still ca 2×10⁶ dimensions available to short-term evolution.
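The two rough counts in this paragraph follow directly from the figures quoted. The sketch below reproduces them; the function names are hypothetical, and the inputs (20000 genes, 3 kb per gene, three alternative bases per site, 100 dimensions per gene) are taken from the text.

```python
def dimensions_from_sites(n_genes, sites_per_gene, alternatives_per_site=3):
    """Upper bound: one dimension per single-nucleotide variant."""
    return n_genes * sites_per_gene * alternatives_per_site


def dimensions_from_modules(n_genes, dims_per_gene):
    """More conservative count: a fixed number of dimensions per gene."""
    return n_genes * dims_per_gene


if __name__ == "__main__":
    upper = dimensions_from_sites(20_000, 3_000)
    conservative = dimensions_from_modules(20_000, 100)
    print(f"{upper:.1e} dimensions from single-site variants")
    print(f"{conservative:.1e} dimensions at 100 per gene")
```

The first count is 1.8×10⁸, which the text rounds to ca 2×10⁸; the second is exactly 2×10⁶.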
The potentially large number of dimensions through which a population can evolve has consequences for the way we think of stabilizing selection, and for the likelihood of pleiotropic side effects. If we think of individual genes or nucleotide sites, then there is no great difficulty in accommodating the mutation load associated with a large number of allelic variants: the mutation rate per site is extremely low, and so the total mutation rate need not be unacceptably high.
As discussed above, the idea that organisms evolve in a space of very high dimension has motivated emphasis on micromutations as the basis for adaptive change. Fisher's (1930) geometric model encapsulates this idea, in terms of the model of multivariate stabilizing selection that underlies the models reviewed here. Orr (1998, 2000) has developed this model to describe ‘adaptive walks’, in which populations evolve by substituting successive mutations; this can be seen as the low heritability limit of a model of selection response where variation is maintained by mutation. This quantifies the advantage to modularity, which has recently been much discussed in qualitative terms (Wagner & Altenberg 1996; Carroll 2001; Hansen 2003). Essentially, when mutations have random effects on many traits, they will probably disrupt the majority even when causing an advantageous change to one of them. Hence, restricting the effects of mutations to a few dimensions (‘modularity’) increases evolvability by reducing deleterious pleiotropy. However, as Hansen (2003) points out, modularity also reduces the variety of changes that can be made, and so it is not obvious what the optimal dimensionality is.
How can we distinguish whether heritable variation is predominantly due to mutation, rather than balancing selection? This issue has been hard to resolve for variation in individual genes, despite the much greater effort expended on the problem, and the much greater information available from sequence data. However, one clear prediction is that the alleles responsible for trait variation should be at high frequency if they are maintained by balancing selection, but probably rare if maintained by mutation. Associations between traits and rare (presumably deleterious) transposable elements (e.g. Aquadro et al. 1986) give good evidence in favour of mutation–selection balance. Conversely, associations with common molecular variants have been taken as evidence for balancing selection (e.g. Long et al. 2000). However, unless one can identify the variants that actually cause trait variation, neutral divergence within allelic classes obscures these kinds of test.
Because mutation–selection models predict that most variation is contributed by rare alleles with large effects on traits, they thus predict that allele frequencies and thus VG will increase substantially under artificial selection, which could lead to an accelerating response over time. This has never been observed. However, such patterns may not be detectable above the noise introduced by random drift if the population is small (Ne≈100) as in most experiments (see Bürger 2000, p. 337). Moreover, when we say ‘large effects’, we mean large relative to the standing variation at each locus. If enough genes contribute, these effects could be small compared with the distribution of phenotypes, and so allele frequencies would change only slowly during artificial selection (the infinitesimal model). If this model is supplemented by alleles of large phenotypic effect that arise by mutation during the selection experiment, then it still predicts a steady selection response (Barton & Keightley 2002) and can also explain the observation that some allele frequencies do change quickly.
Inference from the relation between declining selection response and Ne is not straightforward. Although the data surveyed by Weber & Diggins (1990) are consistent with an infinitesimal model, negative responses when selection was relaxed (e.g. Yoo 1980) show that the decline must be attributed partly to countervailing natural selection. In larger populations, new alleles are generated by mutation and so those with smaller deleterious pleiotropic effects will be selected. We are not aware of any explicit predictions from this alternative model that could be compared with the data of Weber & Diggins (1990; but see Otto 2004 for general results for weak selection). It is not clear to us whether more detailed observations of the effects of relaxed selection could in principle distinguish balancing selection from pleiotropic mutation–selection balance.
Whether quantitative genetic variation is maintained by balancing selection or by mutation–selection balance, one expects that in small populations (Nes<1), genetic variance will be reduced. Thus, the lack of reduction even in populations with effective size perhaps Ne≈1000 suggests that selection coefficients on the alleles are of the order 0.001 or greater. This is consistent with the argument that if variation is maintained by any kind of mutation–selection balance, then selection coefficients must be of the same order as the mutational heritability, Vm/VG≈0.001–0.04. Systematic experiments on the effect of drift on genetic variance could help narrow these rough bounds.
We are grateful to Reinhard Bürger, Peter Keightley and an anonymous referee for helpful comments. T.J. is supported by BBSRC grant number 206/D16977.
In this appendix, we study the infinite population version of the pure pleiotropy model (Hill & Keightley 1988; Barton 1990; Zhang et al. 2002 and references therein). The model parameters include a bivariate distribution of mutational effects, m(s, z), where s is the effect on fitness and z is the effect on the focal trait. Many studies have focused on the finite population version of this model because, for the (continuous) m(s, z) they choose, the genetic variance VG tends to infinity (and so h²→1) as Ne→∞.
Rather than choose any particular parametric family for m(s, z), we allow an arbitrary distribution. However, we do assume infinite sites, rare alleles and multiplicative effects on fitness across sites. One purpose of our analysis is to identify conditions under which an infinite population model has finite VG (and perhaps Vs and other observable quantities). This is motivated by the lack of any observed correlation between h² and Ne. If we believe that a mutation–selection model can explain observed roughly constant heritabilities over a wide range of Ne, then we may wish to focus on models that are well behaved in the limit Ne→∞. On the basis of the following analysis, we argue that ill behaviour of some previously studied models is an artefact of considering a limited class of distributions, rather than a property of the pure pleiotropy model with a continuous m(s, z) per se.
We use a haploid model. This applies for diploids either (i) when additive effects are assumed, or (ii) in the rare alleles approximation where mutant homozygotes can be ignored (Zhang et al. 2004). Our model is therefore parameterized by the distribution of heterozygous mutant effects. We follow the distribution f(x, z) in an infinite population of haplotypes, where 0≤X is the negative ln-fitness and −∞<Z<∞ is the trait value of a randomly selected individual. Mutation effects are drawn from the distribution m(x, z). Mutational effects on negative ln-fitness, 0≤X=−ln(1−S), and on the trait, −∞<Z<∞, are all additive across loci. Here, X and Z can each denote a property of either an individual or a mutation. The distributions of interest have moment generating functions
Mf(u, v)=Ef[exp(uX+vZ)] and Mm(u, v)=Em[exp(uX+vZ)], respectively. Because effects on fitness are assumed to be multiplicative across loci, and effects on the trait are neutral, the population stays in linkage equilibrium and it is sufficient to follow an asexual population, for which f(x, z) is dynamically sufficient. The recursion for selection is
and the recursion for mutation as a Poisson process with rate nμ per haplotype is
(see Johnson 1999). For an isogenic initial condition Mf(u, v, t=0)=1, at time τ,
and a stationary distribution can be found by taking the limit τ→∞ (see Johnson 1999). We can therefore write down the a, bth cumulant of the stationary distribution f(x, z), by differentiating the cumulant generating function (which is the logarithm of the moment generating function) and assuming the order of differentiation, summation and integration can be interchanged as follows:
Cases of (A11) for particular (a, b) were found by Zhang & Hill (2002) and Zhang et al. (2002), although, in the latter case, with the denominator replaced by a term representing the combined effects of pleiotropic and real stabilizing selection. To our knowledge, the simple and general relationship between the cumulants of the distribution over individuals and the moments of the distribution over mutations is novel.
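The step from the selection and mutation recursions to a relation between cumulants and mutational moments can be sketched as follows. This is a reconstruction under stated assumptions, not a transcription of the displayed equations: we take the moment generating function to be M(u, v) = E[exp(uX + vZ)], apply selection before mutation within each generation, census the population after mutation, and assume every mutation has X > 0 so that the geometric series converges. The superscript labels "sel" and "mut" are ours.

```latex
% Selection with fitness e^{-x}, then a Poisson(n\mu) number of new mutations:
\begin{align*}
M_f^{\mathrm{sel}}(u,v) &= \frac{M_f(u-1,v)}{M_f(-1,0)}, \\
M_f^{\mathrm{mut}}(u,v) &= M_f^{\mathrm{sel}}(u,v)\,
    \exp\!\bigl\{ n\mu\,[\,M_m(u,v)-1\,] \bigr\}.
\end{align*}
% Iterating from the isogenic start M_f(u,v,t=0)=1:
\begin{align*}
\ln M_f(u,v,\tau) &= n\mu \sum_{t=0}^{\tau-1}
    \bigl[\,M_m(u-t,v) - M_m(-t,0)\,\bigr].
\end{align*}
% Differentiating a times in u and b times in v at u=v=0,
% for (a,b) \neq (0,0), and letting \tau \to \infty:
\begin{align*}
\kappa_{a,b}
  &= n\mu \sum_{t=0}^{\infty} E_m\!\bigl[ X^{a} Z^{b} e^{-tX} \bigr]
   = n\mu\, E_m\!\left[ \frac{X^{a} Z^{b}}{1-e^{-X}} \right]
   = n\mu\, E_m\!\left[ \frac{X^{a} Z^{b}}{S} \right].
\end{align*}
```

The last equality uses S = 1 − exp(−X), which follows from X = −ln(1−S). Setting (a, b) = (0, 2) gives a trait cumulant proportional to Em(Z²/S), in line with the appendix's statement that finite Em(Z²/S) is necessary and sufficient for finite VG.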
Often, we will be interested in m(x, z) that are symmetric about z=0. Therefore, f(x, z) will also be symmetric about z=0 and Ef(Z)=0. Then, some useful relationships between the cumulants κ and the central moments m are
Suppose an individual's breeding value is G=Z+Z′, where Z and Z′ are independent genetic contributions with common distribution f(x, z). Then (when Ef(Z)=0),
A special case is when mutational effects X and Z are independent, so
and we see that VG is finite if, and only if, the distribution of selection coefficients has non-zero harmonic mean, 1/Em(1/S). This was stated less explicitly by Barton (1990).
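The harmonic mean condition can be illustrated with a gamma distribution of selection coefficients, for which the inverse moment has a closed form. This is a minimal sketch: the choice of a gamma distribution is ours for illustration, it is treated as an approximation for small s (a gamma variable is unbounded above, unlike a selection coefficient), and the function names are hypothetical. For a gamma distribution with shape k and scale θ, E(1/S) = 1/(θ(k−1)) when k > 1 and diverges when k ≤ 1.

```python
import math


def inverse_moment_gamma(shape, scale):
    """E(1/S) for S ~ Gamma(shape, scale); infinite when shape <= 1."""
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if shape <= 1.0:
        return math.inf
    return 1.0 / (scale * (shape - 1.0))


def harmonic_mean_gamma(shape, scale):
    """Harmonic mean 1 / E(1/S); zero when E(1/S) diverges."""
    inv = inverse_moment_gamma(shape, scale)
    return 0.0 if math.isinf(inv) else 1.0 / inv


def vg_is_finite(shape, scale):
    """Finite-VG condition for independent effects: non-zero harmonic mean."""
    return harmonic_mean_gamma(shape, scale) > 0.0


if __name__ == "__main__":
    # Shape 2: density vanishes at s = 0, harmonic mean is positive.
    print(harmonic_mean_gamma(2.0, 0.01), vg_is_finite(2.0, 0.01))
    # Shape 0.5: density diverges at s = 0, harmonic mean is zero.
    print(harmonic_mean_gamma(0.5, 0.02), vg_is_finite(0.5, 0.02))
```

The two cases can share the same arithmetic mean (shape × scale = 0.02 and 0.01 here differ, but any pair can be matched), which shows that the outcome is set by the density near s=0 and not by the low-order moments.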
Expressions for VG when mutational effects X and Z are not independent were derived previously for the special case where m(s, z) is a reflected bivariate gamma distribution (Zhang & Hill 2002; Zhang et al. 2002). Zhang et al. (2002) suggest that an ‘arbitrary cutoff’ in the support of m(s) would stop VG→∞ as Ne→∞, and consider a discrete distribution of selection coefficients. Our analysis shows that this is not necessary. Equation (A15) shows that finite Em(Z²/S) is necessary and sufficient for finite VG. Loosely speaking, when mutational effects are small, their effect on the trait z must typically be ‘smaller’ than √s (where s is their effect on fitness). For example, if the conditional random variable
(Z | S=s) = k√s U
for k a finite constant, and some ‘umbrella’ random variable U with distribution independent of s and finite variance, then
Em(Z²/S) = k²E(U²)
is finite. It can be proved that this condition only has to hold in the neighbourhood of the origin, by partitioning Em(·) according to whether s<ϵ or s≥ϵ for any small ϵ>0 and noting that the latter expectation is always finite.
We define as the regression of log-fitness on squared trait value, after normalization so that the trait variance VP is one. This will be approximately equal to γ (the stabilizing selection gradient, or regression of relative fitness on squared normalized trait value; see §2c or Lande & Arnold 1983) when most individuals have fitness close to one. when Gaussian stabilizing selection is assumed.
An individual's phenotype is P=Z+Z′+E where z and z′ are independent genetic contributions with common distribution f(x, z) and E~N(0, VE) is an independent environmental contribution. (More formally, we define a distribution where ϕ(e) is a Gaussian density.) Then, using (2.2), , symmetry between (X, Z) and (X′, Z′), and the fact that , we have
and the equivalent strength of Gaussian stabilizing selection is
where (when mutations have mostly small effects on fitness, X≈S)
when Hz≤1 and when Hz≤1/2 (where Hz is the dominance coefficient, which can covary with Z).
Inequalities that follow from (A23) have been derived before (especially ; see Zhang et al. 2002; Zhang & Hill 2003). Our result applies for an arbitrary distribution of mutational effects, and also is more stringent. Assuming Hz≤1, (A23) can be rewritten
If Vm/VE<10⁻², then this implies Vs/VP>50 for any heritability.
One contribution of 16 to a Theme Issue ‘Population genetics, quantitative genetics and animal improvement: papers in honour of William (Bill) Hill’.