We are in the somewhat embarrassing position of observing some remarkably robust patterns that are consistent across traits and species, yet having no compelling explanation for them. Our models are for the most part sensitive to parameters such as population size and selection strength, and worse, some observations appear incompatible: for example, strong stabilizing selection together with high heritability, or small numbers of identified QTL together with a sustained and replicable selection response. The key observations are:
For traits that appear to be under stabilizing selection, mutation–selection balance models have difficulty explaining the measured strength of selection without assuming that it is mostly contributed by real rather than apparent stabilizing selection. Yet there cannot be real stabilizing selection on an indefinitely large number of independent traits. Models of stabilizing selection on multiple traits must face the question of just how many independently evolvable traits there are. Plainly, the phenotype as a whole is described by an infinite number of traits—not just the infinite number of measurements needed to describe adult shape, but also the change in morphology and behaviour through time, and across different environments.
One problem is that relatively high total mutation rates must be invoked to explain observed levels of variation, and indeed available estimates of per-trait mutation rates are high, about 0.1 (Lynch & Walsh 1998). As discussed in §2f, the total mutation rate then sets quite a low limit on the number of traits that can have completely disjoint genetic bases. If most mutations affect several traits, then it is not adequate to model each trait in isolation.
A second problem with assuming that a very large number of independent traits are highly heritable, and also subject to strong stabilizing selection, is that the reduction in fitness owing to deviation from the optimal phenotype is $\sim nV_G/V_s$, and so at most ten or so independent traits could have $V_s < 10V_G$. This argument is a purely phenotypic one, and does not depend on how variation is maintained. If deviations from each of a large number of traits reduce fitness independently, and if each of those traits has high variance, then net fitness must be low.
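To make the phenotypic argument concrete, here is a minimal Python sketch under assumptions the text does not spell out: each trait has a Gaussian fitness function $\exp(-z^2/2V_s)$, traits act multiplicatively on fitness, and phenotypic variance equals $V_G$ (environmental variance ignored). Under these assumptions the load is about $nV_G/(2V_s)$ when $V_G/V_s$ is small, the same order as the $\sim nV_G/V_s$ quoted above. The function name is hypothetical.

```python
import math

def mean_fitness(n_traits, vg_over_vs):
    """Mean fitness when each of n_traits independent traits is under Gaussian
    stabilizing selection, w(z) = exp(-z**2 / (2*Vs)), fitness multiplies across
    traits, and phenotypic variance equals V_G (assumed: no environmental variance)."""
    # Averaging exp(-z^2/(2Vs)) over z ~ N(0, V_G) gives sqrt(Vs / (Vs + V_G)) per trait.
    return (1.0 + vg_over_vs) ** (-n_traits / 2)

for n in (1, 10, 100):
    w = mean_fitness(n, 0.1)          # every trait has Vs = 10 V_G
    print(n, round(w, 3), round(-math.log(w), 3))   # mean fitness, log load
```

With $V_s = 10V_G$ on each of ten traits, mean fitness is already about 0.62; with a hundred such traits it falls below 1%.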
The basic difficulty we face comes from the apparently high heritability of every measured trait. If the stabilizing selection we observe is ‘apparent’, then there is no problem: individuals extreme for one trait will tend to be extreme for others, breaking the assumption of independence. If we keep to the basic model of stabilizing selection on many traits, then strong selection can act on only a few of them. We can define directions in which selection acts independently by taking the eigenvectors of the covariance matrix that generalizes $V_s$. Then, the strength of stabilizing selection along the great majority of directions must be weak ($V_G/V_s < 1/n$). However, selection can act strongly in a few directions (including key traits such as body size); if we make measurements on some arbitrary trait, it is likely to include components from the few strongly selected directions, and we will observe strong stabilizing selection. Although, in principle, the Pearson–Lande–Arnold (Pearson 1903; Lande & Arnold 1983) approach to measuring multivariate selection gradients could demonstrate such a pattern, the statistical difficulties seem daunting.
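The geometric point can be illustrated with a short sketch. Assume a Gaussian fitness surface $w(\mathbf z)=\exp(-\tfrac12\mathbf z^{T}\mathbf S^{-1}\mathbf z)$, where $\mathbf S$ is the matrix that generalizes $V_s$ (our notation). In the eigenbasis of $\mathbf S$, the curvature of log fitness along a unit direction $\mathbf u$ is $\sum_i u_i^2/V_{s,i}$; multiplying by a common $V_G$ (an assumption) expresses it in units of $V_G/V_s$. The function name `selection_along` and the numbers below (one strongly selected axis, 999 weak ones with $V_G/V_s < 1/n$) are hypothetical.

```python
import math

def selection_along(direction, strengths):
    """Stabilizing-selection strength (in units of V_G/V_s) along a phenotypic direction.

    strengths[i] is V_G/V_s along the i-th eigenvector of the selection matrix;
    direction[i] is the component of the measured trait on that eigenvector."""
    norm2 = sum(c * c for c in direction)
    if not math.isclose(norm2, 1.0, rel_tol=1e-9):
        raise ValueError("direction must be a unit vector")
    # Quadratic form u^T diag(strengths) u: curvature of log fitness along u.
    return sum(c * c * s for c, s in zip(direction, strengths))

n = 1000
strengths = [0.5] + [0.0005] * (n - 1)      # one strong axis; the rest weak (< 1/n)
weak_loading = math.sqrt(0.75 / (n - 1))
trait = [0.5] + [weak_loading] * (n - 1)    # measured trait loads 0.5 on the strong axis

print(round(selection_along(trait, strengths), 6))
```

The total across all axes stays near 1, as required, yet an arbitrary trait that loads 0.5 on the strong axis shows $V_G/V_s \approx 0.125$, i.e. apparently strong stabilizing selection.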
Although the phenotype is infinite-dimensional, it is also clear that organisms can evolve in a large but finite number of dimensions within this phenotype space. A naive view would associate each gene with a single dimension. Such a simple relation might be justified for a structural protein or metabolic enzyme, where all that matters is the amount produced or the flux catalysed, though even this ignores the interactions of such simple genetic functions with other genes and with the environment. As Stern's (2000) example of dpp discussed in §2f above makes clear, genes involved in development may be influenced by multiple regulatory sequences, and different variants may show qualitatively different phenotypes. We can think of each allelic variant as causing a particular phenotypic change, corresponding to a particular direction in phenotype space.
The number of possible regulatory sequence variants is enormous: potentially $4^{3000}$ for a 3 kb region that influences gene expression. However, what is relevant is the number of variants that are available to an evolving population. In the short term, this is the number of haplotypes segregating in the population, which might be small. However, in a reasonably large population, recombinants between these will be available within a few generations, as will all single-nucleotide mutations. (In extremely large populations, and with a high mutation rate, multiple mutations will also be available. For example, Lehman & Joyce (1993) selected on a mutagenised population of ca $10^{13}$ RNA molecules, and estimated that all possible four-step mutations were available in the base population (Lehman, personal communication). However, for moderately sized populations (e.g. $<10^{9}$) with low mutation rates, we need consider only single-step mutations.)
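As a rough check on these orders of magnitude, the sketch below counts sequences exactly $k$ point mutations from a reference of length $L$, which is $\binom{L}{k}3^k$, and finds the largest $k$ for which the whole neighbourhood up to $k$ steps could even fit in a population of size $N$. This is only a counting bound, and the helper names are hypothetical; whether multi-step mutants actually arise also depends on the mutation rate, which is why the text restricts moderately sized populations to single-step mutations.

```python
from math import comb

def neighbourhood_size(length, k):
    """Number of distinct sequences exactly k point mutations away from a
    reference of the given length (three alternative bases per site)."""
    return comb(length, k) * 3 ** k

def max_complete_steps(length, pop_size):
    """Largest k such that every sequence within k steps could be represented
    at least once in pop_size individuals (a crude counting upper bound)."""
    total = 1                                   # the reference sequence itself
    for k in range(1, length + 1):
        total += neighbourhood_size(length, k)
        if total > pop_size:
            return k - 1
    return length

L = 3000
print(len(str(4 ** L)))                         # decimal digits in 4^3000
for k in (1, 2, 3):
    print(k, neighbourhood_size(L, k))
print(max_complete_steps(L, 10 ** 9))
```

For a 3 kb region, $4^{3000}$ has 1807 decimal digits; the single-step neighbourhood holds 9000 sequences, the two-step neighbourhood about $4\times10^{7}$, and the three-step neighbourhood about $1.2\times10^{11}$, already more than $10^{9}$ individuals.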
The argument is complicated by sequence variability within populations. Gavrilets (2004) has emphasized that many interconnected sequences can satisfy the same phenotypic constraints, so that populations can spread across large ‘nearly neutral networks’. Thus, relatively few mutations may be needed to cross from one high-fitness network to another (Schultes & Bartel 2000 give an intriguing example involving two different ribozymes). If a population is spread across diverse sequences, then single-step mutations can generate many more alleles. However, it is hard to see how to quantify this argument, because it depends on epistatic interactions with the genetic background.
If we assume that each variant specifies a unique direction of change in phenotype space, then we can find a rough upper bound on the number of dimensions through which a population can evolve. On this argument, the number of available alleles corresponds to the number of dimensions; for an organism with 20 000 genes, each with 3 kb of regulatory sequence, each site of which can mutate to three alternative bases, we have ca $2\times10^{8}$ dimensions available. This ignores complex rearrangements such as insertions and deletions, and ignores the (smaller) contribution from variation in amino acid sequence. However, it is a gross overestimate, in that gene function might naturally fall into a small number of dimensions, for example determined by the strength of binding to a few transcription factors. Nevertheless, even if one guesses that 100 independent dimensions are available for each gene, there are still ca $2\times10^{6}$ dimensions available to short-term evolution.
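The two estimates are simple products; the sketch below just makes the arithmetic explicit. The function and constant names are hypothetical, and the inputs are the figures used in the text.

```python
def available_dimensions(genes, dims_per_gene):
    """Rough upper bound on dimensions open to short-term evolution,
    treating each distinct variant (or gene-level dimension) as one direction."""
    return genes * dims_per_gene

GENES = 20_000
SITES_PER_GENE = 3_000     # 3 kb per gene, the text's figure
ALT_BASES = 3              # each site can mutate to three other bases

print(available_dimensions(GENES, SITES_PER_GENE * ALT_BASES))  # 180000000, i.e. ca 2×10^8
print(available_dimensions(GENES, 100))                           # 2000000, i.e. ca 2×10^6
```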
The potentially large number of dimensions through which a population can evolve has consequences for the way we think of stabilizing selection, and for the likelihood of pleiotropic side effects. If we think of individual genes or nucleotide sites, then there is no great difficulty in accommodating the mutation load associated with a large number of allelic variants: the mutation rate per site is extremely low, and so the total mutation rate need not be unacceptably high.
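One standard way to see why many low-rate sites need not impose an excessive load is the Haldane–Muller principle: with multiplicative fitness and deleterious alleles (not fully recessive) at mutation–selection balance in a large population, equilibrium mean fitness is about $e^{-U}$, where $U$ is the total deleterious mutation rate per genome, independent of the selection coefficients. The sketch below assumes that principle; the per-site rate and site count in the test are hypothetical values chosen only for illustration.

```python
import math

def total_rate(rate_per_site, n_sites, ploidy=2):
    """Total deleterious mutation rate U per genome per generation."""
    return rate_per_site * n_sites * ploidy

def mutation_load(u_total):
    """Haldane-Muller principle (assumed): equilibrium load is 1 - exp(-U),
    whatever the selection coefficients of the individual alleles."""
    return 1.0 - math.exp(-u_total)

# Hypothetical illustration: 6e7 sites, each with a tiny deleterious rate.
u = total_rate(1e-9, 6e7)
print(round(u, 3), round(mutation_load(u), 3))
```

The load depends only on the summed rate $U$, so it stays modest as long as the per-site rates are small enough that their total over all sites is well below one.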
As discussed above, the idea that organisms evolve in a space of very high dimension has motivated emphasis on micromutations as the basis for adaptive change. Fisher's (1930) geometric model encapsulates this idea, in terms of the model of multivariate stabilizing selection that underlies the models reviewed here. Orr (1998, 2000) has developed this model to describe ‘adaptive walks’, in which populations evolve by substituting successive mutations; this can be seen as the low-heritability limit of a model of selection response in which variation is maintained by mutation. This quantifies the advantage of modularity, which has recently been much discussed in qualitative terms (Wagner & Altenberg 1996; Carroll 2001; Hansen 2003). Essentially, when mutations have random effects on many traits, they will probably disrupt most of them even when causing an advantageous change to one. Hence, restricting the effects of mutations to a few dimensions (‘modularity’) increases evolvability by reducing deleterious pleiotropy. However, as Hansen (2003) points out, modularity also reduces the variety of changes that can be made, and so it is not obvious what the optimal dimensionality is.
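The cost of pleiotropy in Fisher's geometric model can be sketched numerically. For a population at distance $z$ from the optimum in $n$ dimensions, a mutation of size $r$ in a random direction is beneficial with probability approximately $1-\Phi(x)$, where $x = r\sqrt n/(2z)$ (Fisher's approximation, valid for large $n$). The sketch below computes that approximation and checks it with a small Monte Carlo simulation; function names and parameter values are hypothetical.

```python
import math
import random

def p_beneficial(r, z, n):
    """Fisher's approximation: probability that a random mutation of size r
    improves fitness for a population at distance z from the optimum in n dims."""
    x = r * math.sqrt(n) / (2.0 * z)
    return 0.5 * math.erfc(x / math.sqrt(2.0))        # 1 - Phi(x)

def p_beneficial_sim(r, z, n, trials=20_000, seed=1):
    """Monte Carlo check: displace the population by r in a random direction
    and count how often it ends up closer to the optimum (at the origin)."""
    rng = random.Random(seed)
    closer = 0
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        scale = r / math.sqrt(sum(c * c for c in v))   # random direction, length r
        new = [c * scale for c in v]
        new[0] += z                                     # start at (z, 0, ..., 0)
        if sum(c * c for c in new) < z * z:
            closer += 1
    return closer / trials

for n in (2, 10, 50):
    print(n, round(p_beneficial(0.1, 1.0, n), 3))
```

For a fixed mutation size, the chance of improvement falls as $n$ grows, which is the sense in which restricting mutational effects to fewer dimensions raises evolvability.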
How can we tell whether heritable variation is due predominantly to mutation rather than to balancing selection? This issue has been hard to resolve for variation in individual genes, despite the much greater effort expended on the problem, and the much greater information available from sequence data. However, one clear prediction is that the alleles responsible for trait variation should be at high frequency if they are maintained by balancing selection, but probably rare if maintained by mutation. Associations between traits and rare (presumably deleterious) transposable elements (e.g. Aquadro et al. 1986) give good evidence in favour of mutation–selection balance. Conversely, associations with common molecular variants have been taken as evidence for balancing selection (e.g. Long et al. 2000). However, unless one can identify the variants that actually cause trait variation, neutral divergence within allelic classes obscures these kinds of test.
Because mutation–selection models predict that most variation is contributed by rare alleles with large effects on traits, they also predict that allele frequencies, and thus $V_G$, will increase substantially under artificial selection, which could lead to an accelerating response over time. This has never been observed. However, such patterns may not be detectable above the noise introduced by random drift if the population is small ($N_e \approx 100$), as in most experiments (see Bürger 2000, p. 337). Moreover, when we say ‘large effects’, we mean large relative to the standing variation at each locus. If enough genes contribute, these effects could be small compared with the distribution of phenotypes, and so allele frequencies would change only slowly during artificial selection (the infinitesimal model). If this model is supplemented by alleles of large phenotypic effect that arise by mutation during the selection experiment, then it still predicts a steady selection response (Barton & Keightley 2002) and can also explain the observation that some allele frequencies do change quickly.
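The prediction that $V_G$ rises as rare alleles are swept up can be seen at a single locus. This deterministic sketch assumes one biallelic locus with additive effect $a$, multiplicative genotype fitnesses $1$, $1+s$, $(1+s)^2$ imposed by artificial selection, and no drift, mutation or linkage; the function names and parameter values are hypothetical.

```python
def additive_variance(p, a):
    """V_G from one biallelic locus with additive effect a (diploid,
    Hardy-Weinberg proportions): 2 p (1 - p) a^2."""
    return 2.0 * p * (1.0 - p) * a * a

def vg_trajectory(p0, s, a=1.0, generations=100):
    """Deterministic selection with genotype fitnesses 1, 1+s, (1+s)^2,
    for which p' = p (1+s) / (1 + p s); returns V_G in each generation."""
    p, out = p0, []
    for _ in range(generations):
        out.append(additive_variance(p, a))
        p = p * (1.0 + s) / (1.0 + p * s)
    return out

vg = vg_trajectory(p0=0.01, s=0.1)
print(round(vg[0], 4), round(max(vg), 4))
```

Starting at $p = 0.01$, the locus contributes $V_G = 0.0198a^2$; as $p$ approaches 0.5 this rises to nearly $0.5a^2$, about 25-fold, before declining as the allele nears fixation. In a population with $N_e \approx 100$, drift would add considerable noise to such a trajectory.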
Inference from the relation between declining selection response and $N_e$ is not straightforward. Although the data surveyed by Weber & Diggins (1990) are consistent with an infinitesimal model, negative responses when selection was relaxed (e.g. Yoo 1980) show that the decline must be attributed partly to countervailing natural selection. In larger populations, more new alleles arise by mutation, and so those with smaller deleterious pleiotropic effects can be selected. We are not aware of any explicit predictions from this alternative model that could be compared with the data of Weber & Diggins (1990; but see Otto 2004 for general results for weak selection). It is not clear to us whether more detailed observations of the effects of relaxed selection could in principle distinguish balancing selection from pleiotropic mutation–selection balance.
Whether quantitative genetic variation is maintained by balancing selection or by mutation–selection balance, one expects that in small populations ($N_e s < 1$) genetic variance will be reduced. Thus, the lack of reduction even in populations with effective size perhaps $N_e \approx 1000$ suggests that selection coefficients on the alleles are of the order 0.001 or greater. This is consistent with the argument that if variation is maintained by any kind of mutation–selection balance, then selection coefficients must be of the same order as the mutational heritability, $V_m/V_G \approx 0.001\text{–}0.04$. Systematic experiments on the effect of drift on genetic variance could help narrow these rough bounds.
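The link between $s$ and $V_m/V_G$ can be sketched with the simplest possible balance, which is our assumption rather than a model from the text: each generation mutation adds variance $V_m$ and selection removes a fraction $s$ of the existing variance, so $V_G \to V_m/s$ and hence $s \approx V_m/V_G$. The drift threshold $N_e s \ge 1$ gives $s \ge 1/N_e$. Names and values are hypothetical.

```python
def vg_equilibrium(vm, s, generations=100_000, vg=0.0):
    """Iterate V_G' = V_G (1 - s) + V_m: mutation adds V_m each generation and
    selection removes a fraction s of the existing variance (assumed model)."""
    for _ in range(generations):
        vg = vg * (1.0 - s) + vm
    return vg

VM = 0.001                          # hypothetical V_m, arbitrary units
for s in (0.001, 0.01, 0.04):
    vg = vg_equilibrium(VM, s)
    print(s, round(VM / vg, 4))     # recovers s, since V_G approaches V_m / s

NE = 1000
print(1.0 / NE)                     # smallest s with N_e s >= 1
```

With $N_e \approx 1000$ the drift threshold $1/N_e = 0.001$ sits at the lower end of the $V_m/V_G$ range, which is the sense in which the two rough bounds agree.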