(a) Genotype and phenotype robustness

To resolve the tension between robustness and evolvability, I first recast definitions of robustness and evolvability in terms of RNA genotypes and phenotypes. The relevant genotype space is the set of all 4

^{n} possible RNA sequences of some given length

*n*. Two sequences are 1-mutant neighbours or simply neighbours in this space if they differ in one nucleotide. The relevant phenotypes are the set of all possible RNA structures, whose number scales approximately as 1.8

^{n} (

Gruner *et al.* 1996;

Schuster 2003). The set of all genotypes forming the same structure is often called a neutral set or, when connected in genotype space, a neutral network (

Schuster *et al.* 1994). A neutral neighbour of a sequence is a neighbour that has the same structure. I define the 1-neighbourhood of a

*structure* or, equivalently, the 1-neighbourhood of a neutral network as follows: it is the set of sequences that differ from the sequences that fold into the structure by exactly one nucleotide. With these notations, I now introduce two different definitions of robustness and evolvability.

- (i)
*Genotype* (*sequence*) *robustness*. The number *R*_{G} (or fraction *r*_{G}) of neutral neighbours of a genotype G. - (ii)
*Phenotype* (*structure*) *robustness*. The number *R*_{P} (or fraction *r*_{P}) of neutral neighbours averaged over *all* genotypes G with a given phenotype. I will refer to this quantity also as the number of neutral neighbours of the structure. - (iii)
*Genotype* (*sequence*) *evolvability*. The number *E*_{G} of different structures found in the 1-neighbourhood of a sequence G. - (iv)
*Phenotype* (*structure*) *evolvability*. The number *E*_{P} of different structures found in the 1-neighbourhood of a structure P.

Below, I will show that the tension between robustness and evolvability disappears with the distinction just introduced. Specifically, genotypic robustness is negatively associated with genotypic evolvability, whereas phenotypic robustness is positively associated with phenotypic evolvability. Note that sequence robustness and structure robustness cannot be meaningfully compared to one another and neither can sequence evolvability and structure evolvability. Also note that the above definitions of evolvability are special cases of the more general definition in the introduction. A sequence that is more evolvable has more structural variants in its 1-neighbourhood. Similarly, a structure that is more evolvable has more structural variants in its 1-neighbourhood.

(b) High sequence robustness means low sequence evolvability

For reasons detailed in the methods (electronic supplementary material), I focus throughout on random RNA sequences of length *n*=30, which strikes a balance between structural richness and computational tractability. Biological RNAs are currently not suitable for this analysis, owing to the necessity of analysing many different structures of the same length and the unsolved problem of estimating the abundance or *frequency f* of a given structure. This frequency is defined as the number of sequences adopting the structure. In protein engineering, it is also known as a structure's designability.

It is well known that the distribution of structure frequencies is highly skewed (

Hofacker *et al.* 1998;

Schuster 2003). That is, there are relatively few structures adopted by many sequences, but many structures adopted by few sequences.

*a* shows the distribution of structure frequencies for a sample of 10

^{6} RNA sequences. Structure frequencies in this sample vary over a factor 1600, from less than 1.06×10

^{−6} (structures found only once) to 1.7×10

^{−3}.

A sequence G's mutational robustness *R*_{G} and the number of sequences in the 1-neighbourhood of G whose structure is different from that of G are trivially and inversely related, i.e. this number is equal to 3*n*−*R*_{G}. But what is the relationship between *R*_{G} and the number of structures in the 1-neighbourhood of a sequence that are *unique*, i.e. different from both the structure of G and from each other? To find out, I determined the number *U* of unique structures that are different from one another in the neighbourhood of G. To obtain this number, I counted all structures in the neighbourhood, but counted structures only once that occurred twice or more. By the above definition of sequence evolvability, *E*_{G}=*U*. To normalize for sequence length, one can also consider the normalized robustness *r*_{G}=*R*_{G}/3*n*, and *e*_{G}, an appropriately normalized analogue of *U*. Starting from *U*, however, there are two different ways of normalization to obtain the *proportion* of unique structures. The first approach is to divide *U* by the number 3*n*(1−*r*_{G}), i.e. by the total number of sequences in the neighbourhood of G that have structures different from the structure of G. The statistical association between *r*_{G} and *U*/3*n*(1−*r*_{G}) is weak, although highly significant in a large sample of sequences (Spearman's *s*=−0.06, *p*<10^{−17}; sample size *n*_{S}=7.5×10^{4}). It is perhaps remarkable that the association is negative despite the fact that robustness exerts a positive influence on *e*_{G} merely through this normalization: the higher *r*_{G}, the smaller (1−*r*_{G}) and therefore the greater *U/*3*n*(1−*r*_{G}).

The second approach to normalize and obtain the proportion of unique structures from *U* is to divide *U* simply by 3*n*, the total number of sequences in the neighbourhood of G. Arguably, this approach is more sensible, because it reflects the likelihood that a new structure is encountered in a blind evolutionary exploration of the neighbourhood of G. *b* shows the association between this measure of evolvability *e*_{G}=*U/3n* and *r*_{G} for 7.5×10^{4} sequences whose structures span three orders of magnitude in frequency. The two quantities are highly negatively correlated with mutational robustness (Spearman's *s*=−0.64, *p*<10^{−17}; *n*_{S}*=*7.5×10^{4}). The dashed line in the figure indicates 1−*r*_{G}, the fraction of sequences with structures different from that of G. The discrepancy between this dashed line and the data indicates that many structures different from that of G are not different from each other, otherwise the data would fall on the dashed line. The data shown in *b* is an average over structures with different frequencies.

Earlier work (

Ancel & Fontana 2000) showed—for a given structure—a negative association between mutational robustness of sequences folding into the structure and the structural repertoire found in a sequence neighbourhood. In contrast, the data shown in

*b* are based on sequences that were inversely folded from many structures with widely varying frequencies. This raises the question of whether structure frequency has any influence on the relationship between

*r*_{G} and

*e*_{G}. I addressed this question by determining a partial correlation coefficient between

*r*_{G} and

*e*_{G} that holds the structure frequency constant. The strongly negative association persists when we control in this way for structure frequency (Partial product–moment correlation coefficient

*r*=−0.65,

*p*<0.01;

*n*_{S}=7.5×10

^{4}). This means that structure frequency does not have a strong influence on the negative association between

*r*_{G} and

*e*_{G}. In sum, the greater a sequence's robustness, the lesser its evolvability.

(c) High structure robustness means high structure evolvability

By the above definition, the robustness *R*_{P} of a phenotype (structure) P is the average number of neutral neighbours of *all* sequences with phenotype P, or their proportion *r*_{P}=*R*_{P}/3*n*. The corresponding measure of evolvability *E*_{P} is the total number of structures different from P in the 1-neighbourhood of P.

Because neutral networks are so vast, neither robustness *R*_{P} nor evolvability *E*_{P} can be determined exactly. For example, in a sample of 10^{6} sequences of length *n*=30 nucleotides, the rarest structures are those that are found only once in the sample. Their frequency in sequence space cannot be estimated accurately from a sample of this size. However, any structure whose unique occurrence in such a sample is a reflection of its true frequency, e.g. a structure where *f*=10^{−6}, would still be adopted by approximately 10^{−6}×4^{n}≈10^{12} sequences. In other words, even the rarest phenotypes in a large sample of sequence may be adopted by many sequences. Although this prevents the determination of robustness *R*_{P} and evolvability *E*_{P} through exhaustive enumeration, a sampling approach can still be used to estimate these quantities. That is, one can inversely fold a sample of sequences with phenotype P and determine *R*_{P} in this sample. Similarly, one can determine for such a sample the number *U* of unique structures; that is, structures different from each other and from P. I emphasize that this is not simply the number of unique structures in the 1-neighbourhood of each sequence, added over all sequences in the sample. The reason is that the 1-neighbourhoods of two different sequences in the sample may contain sequences folding into identical structures. Instead, one needs to count structures that occur in the 1-neighbourhood of two or more sequences only once.

Because structure evolvability depends on the size of a neutral network, the resulting number *U* then still needs to be multiplied by an estimate of the total size of a neutral network, such as the structure frequency *f*. In other words, *e*_{P} will be monotonically increasing with *Uf*, although this relationship is not necessarily linear. *a* shows the relationship between phenotype robustness *r*_{P} and *Uf* (a proxy for *e*_{P}) based on 2.5×10^{4} structures, and on 100 randomly sampled sequences from each structure's neutral network. Structure robustness and evolvability show a strong positive association (Spearman's *s*=0.55, *p*<10^{−17}; *n*_{S}*=*2.5×10^{4}).

To understand this positive association, several observations are germane. First, the higher a structure's frequency, the higher its robustness (Spearman's *s*=0.64, *p*<10^{−17}; *n*_{S}*=*2.5×10^{4}; see figure S1 in the electronic supplementary material). When a neutral network is viewed as a graph, this observation simply states that the average number of neutral neighbours of a typical genotype G on the network increases with the size of the neutral network. It also means that the greater the structure frequency *f*, the more sequences in the 1-neighbourhood of any one sequence G will have the same structure as G. Conversely, the greater the structure frequency *f* is, the smaller the proportion of the neighbours of any one G that will have unique and different structures.

Next, the structures found in the neighbourhoods of two or more different sequences are typically very different from each other. To see this, consider the

*total* number of different structures found in the 1-neighbourhood of a set of sequences (G

_{1},

…,

G

_{k}) sampled from a neutral network with structure frequency

*f*. Denote the set of structures that are different from each other in the 1-neighbourhood of

*G*_{i} as {

*U*_{i}} and the size of this set as |

*U*_{i}|. When one compares two different sets, say {

*U*_{i}} and {

*U*_{j}}, there are two extreme possibilities and a wide spectrum in between: all of these structures could be identical, i.e. {

*U*_{i}}∩{

*U*_{j}}={

*U*_{i}}={

*U*_{j}}, or all of these sequences could be different, i.e. {

*U*_{i}}∩{

*U*_{j}}=

. The truth is closer to the second extreme, as illustrated by the following analysis. Consider the quantity

where

*k* is some number of sequences sampled from the neutral network of a given structure. This quantity is the proportion of unique structures in the 1-neighbourhood of G

_{i}, averaged over all

*k* sampled sequences. Next consider the quantity

which is the

*total* number of structures different from each other found in the neighbourhood of

*all k*-sampled sequences divided by the total number of sequences examined, that is,

*k* times

*3n*, because each of the

*k* sequences has 3

*n* neighbours. Of interest is the ratio of

(3.1) and

(3.2), that is

If the structures found in the neighbourhood of two sequences are very similar, then

*Q*1, because in that case

for every

*i*, which does not depend on

*k*, and

*Q* would thus approach zero with increasing

*k*. For example, for the

*k*=100 sequences sampled here, one would then expect that

*Q*≈0.01. If, however, all the structures found in the neighbourhood of two sequences are different from each other, then

*Q* would be of the order 1.

*b* shows the distribution of

*Q* for 2.5×10

^{4} structures. The median of this distribution is equal to 0.6, or 60-fold greater than the case where the structures found in the neighbourhoods of two sequences are identical. This means that most structures that occur in the neighbourhoods of different sequences along a neutral network are different from each other. For sequences that are not randomly sampled, but encountered along a random walk along a neutral network, this has been shown previously (

Huynen 1996;

Sumedha *et al.* 2007).

Because most structures in the 1-neighbourhoods of different sequences in a neutral network are different from each other, the size of a neutral network has an important influence on structure-based evolvability. Specifically, even though the number of structures found in the 1-neighbourhood of any one sequence decreases modestly with increasing neutral network size (see figure S1 in the electronic supplementary material), this decrease is much smaller than the increase in the number of different structures accessible from a much larger neutral network. In addition, the ratio *Q* increases modestly with structure frequency (*c*; Spearman's *s*=0.11, *p*<10^{−17}; *n*_{S}*=*2.5×10^{4}). This means that the larger a neutral network is, the more distinct the structures found in the 1-neighbourhoods of sequences sampled from the neutral network.

A final observation useful to understand

*a* emerges from a comparison between structure frequencies and sequence robustness. For any sequence of length

*n*, the number of neighbours with the same MFE structure can vary between 0 and 3

*n*. In contrast, the number of sequences folding into a given structure can vary over a much broader range, from zero to a fraction of a per cent of the total number of sequences, i.e. of the order of the number 4

^{n} of sequences itself. This discrepancy appears even in modestly sized samples of sequences. For example,

*d* shows the mutational robustness, normalized to (0,

1), for a sample of 7.5×10

^{4} sequences whose structure frequencies vary over the same range as that in , i.e. by a factor 1600. In contrast, mutational robustness in this sample varies only by a factor 37. In other words, even in this modest sample of sequences, structure frequencies are more than 40 times more variable than mutational robustness.

In sum, the positive association between phenotype robustness and evolvability can be explained as follows. First, the number of genotypes folding into any one structure can vary by many orders of magnitude, whereas mutational robustness among sequences of similar lengths varies more modestly. Next, most structures found near two or more sequences sampled from the same neutral network are different from each other. Thus, even though structure robustness increases modestly with structure frequency, this increase is much smaller than the vast increase in the number of different structures accessible found near a much larger neutral network.

(d) Populations evolving on large neutral networks can access greater amounts of variation

Neutral networks are vast in size and a finite population may take a very long time to explore such networks and all the structures in their neighbourhoods. It is thus important to show that robustness also affects evolvability on short time scales in finite populations. To determine whether this is the case, I first chose two different structures, one with high frequency (*f*≈10^{−3}) and thus high robustness, and the other with low frequency and low robustness (*f*≈10^{−6}). I then inversely folded 20 sequences for each of these structures. For each of the 40(=2×20) sequences, I then established a population of *N*=500 identical sequences. Each population then underwent repeated rounds of mutation (one nucleotide per sequence generation) and selection that confined the population to the neutral network. That is, in each generation, mutants that were no longer on the neutral network were eliminated and replaced by randomly sampled mutants (with replacement) that still resided on the network. After each such round of mutation and selection, I determined the total number of unique structures found in the neighbourhood of the entire population. The results show that the more robust phenotypes can access much more variation in their evolution on a neutral network (*a*). For example, after a mere 10 generations, the neighbourhoods of the population on the large network contain 2118 (±362 s.e.m.) unique structures. In contrast, the neighbourhood of the populations on the small network contains merely 874 different (±148 s.e.m.) structures.

How can an evolving *population* with a robust phenotype access more variation, despite the fact that each *individual* typically has fewer unique structures in its neighbourhood? The answer is that the populations with the highly robust phenotype are more diverse, and this increased diversity is much greater than the decreased diversity around any one sequence. *b* shows the number of different sequences for each population and *c* shows the mean Hamming distance of the sequences from each other. For both measures, the population with the robust phenotype rapidly accumulates greater diversity. But why is this increased sequence diversity observed in the first place? The reason is simply that in each generation, mutations kill fewer sequences with a more robust phenotype. Assume, for example, that in a population sequences have average robustness *r*_{G}. Then, a number of individuals proportional to (1−*r*_{G}) will be eliminated in every generation as a result of mutations. Populations of sequences with greater robustness (lower 1−*r*_{G}) can thus accumulate greater diversity.

These observations are not peculiarities of the two structures I used. I repeated this approach with inversely folded sequences derived from 4×10^{3} different structures, each of which was used to seed an evolving population. *d* shows the number of unique structures in the neighbourhood of these 4×10^{3} populations after 10 generations of mutation and selection. There is a modest but highly significant positive association between the structure frequency and the amount of phenotypic variation accessible to these structures. Populations with more frequent and thus more robust phenotypes thus have access to more phenotypic variation.

I note that these results would be qualitatively the same if I had not used ‘soft’ selection, where population sizes are held constant but ‘hard’ selection, in which population sizes are allowed to fluctuate. The reason is that in this case, populations evolving on small neutral networks would simply shrink faster over time than those on large neutral networks and show lower sequence diversity for this reason.

(e) Even when populations or mutation rates are small, populations with robust phenotypes access more variation

In the above analysis, mutations occurred at a rate of

*μ*=1 per sequence per generation, such that the population mutation rate is

*Nμ*1. The evolutionary dynamics in populations with

*Nμ*1 is fundamentally different from that in populations with

*Nμ*1 (

Kimura 1983;

van Nimwegen *et al.* 1999;

Wagner 2005). Specifically, large populations or populations with high mutation rates are likely to be polymorphic in any given generation, and their members accumulate over time in regions of a neutral network where genotypes show high genotypic robustness. In contrast, in populations where

*Nμ*1, populations are monomorphic most of the time, and they perform a random walk on a neutral network that samples the network uniformly. Given these differences, it is necessary to show that the greater structural diversity accessible to populations with robust phenotypes does not occur only when

*Nμ*1.

When

*Nμ*1, two populations differing in the robustness of their phenotypes cannot have access to dramatically different structural variation

*in any given generation*. The reason is that in most generations, both populations will be monomorphic; that is, all of their members have identical genotypes. However, the likelihood that a mutation is deleterious is smaller in populations with more robust phenotypes. This suggests that such populations can explore the neutral network faster also when

*Nμ*1 and thus encounter—over time—more structural variation.

*e* shows that this is indeed the case. The figure is based on populations that were initialized with a single randomly chosen member from a neutral network, and that evolved via selection and mutation, just like in , but with

*Nμ*1 (

*N*=10;

*μ=*0.01). The

*x*-axis indicates the number of generations. The

*y*-axis indicates the

*cumulative* number of unique structures that the population encountered in its 1-neighbourhood since generation zero. This cumulative number is obtained by recording all the genotypes on the neutral network that have occurred in the population since generation zero, and by determining the number of unique structures in their 1-neighbourhood (counting each structure that occurs in the 1-neighbourhood of two different genotypes only once). Clearly, populations with more robust phenotypes encounter greater structural diversity along their evolutionary path. Also, both the cumulative sequence diversity and the cumulative number of different sequences encountered during an evolutionary trajectory are larger in populations with robust phenotypes (results not shown). And again, these observations are not artefacts of the two phenotypes chosen: there is a positive association between the structure frequency and the structural diversity a small population encounters during its evolution (

*f*).

In sum, in populations with both

*Nμ*1 and

*Nμ*1, an evolving population with a robust phenotype accesses more variation along its evolutionary path on a neutral network. The reasons are fundamentally the same, namely that the mutations are less likely to have deleterious effects for robust phenotypes, and thus allow more rapid exploration of a neutral network.

(f) An evolutionary search's ability to find a target structure is only weakly correlated with robustness

A definition of evolvability that focuses only on the variation directly accessible from a given genotype G or phenotype P may seem limited. It does not address how a blind evolutionary search driven by mutations would find a phenotype (structure) that is not in the immediate neighbourhood of G or P, but an arbitrary distance away from it. Some genotypes G or phenotypes P might be more amenable to finding such arbitrary target phenotypes, and thus be more evolvable in this sense. If so, how is this kind of evolvability related to genotypic or phenotypic robustness?

To address this question, I pursued an approach that started with a set of 7.5×10^{4} random RNA structures that span three orders of magnitude in structure frequency.

I drew pairs of structures (S,

T) at random from this set (with replacement). For each such pair, I inversely folded a sequence G with structure S. From the starting sequence, I then performed a random walk towards the target structure T. Specifically, each step of this random walk consisted in a random change of a single nucleotide. If the mutation had not increased the Hamming distance to T, then the random walk was continued with the mutated sequence; otherwise, the original sequence was mutated again. This process was repeated until a sequence with an MFE structure T was obtained. The number of mutational steps needed to get to the target structure T can be used as a measure of evolvability.

*a* shows that sequence robustness is only marginally associated with evolvability in this sense (Spearman's *s*=0.01, *p*=0.016; *n*_{S}*=*3.7×10^{4}). What other factors might influence the length of this random walk? One candidate factor is the frequency of the starting structure S. This frequency is associated with the size of the sequence space that can be explored while staying on a neutral network. However, it is not associated with the length of this random walk either (Spearman's *s*=0.006, *p*>0.05; *n*_{S}=3.7×10^{4}).

Another candidate factor is the structure distance between the starting and the target structures. It might take longer to reach a given target structure if this structure is very dissimilar from the starting structure. However, two different structure distance measures are only weakly associated with the length of this random walk (Hamming distance: Spearman's *s*=0.03, *p*=8×10^{−7}; *n*_{S}*=*3.7×10^{4}; base pair distance, the number of base pairs that need to be opened or closed to transform one structure into the other: Spearman's *s*=−0.07, *p*<10^{−17}; *n*_{S}*=*3.7×10^{4}). The only variable that is moderately associated with walk length is the frequency of the *target* structure (*b*; Spearman's *s*=−0.21, *p*<10^{−17}; *n*_{S}*=*3.7×10^{4}). This means that regardless of the starting structure S, it is more difficult for a blind evolutionary search to get to a target structure T if this structure is rare.

Evolvability defined as the length of a random walk starting from a given sequence is a form of *sequence* evolvability. An analogous measure of *structure* evolvability can be defined as the *average* length of a random walk starting from a given structure S to a target structure T. This measure of evolvability, however, is also not associated with mutational robustness, when estimated for *k*=100 inversely folded sequences with structure S (Spearman's *s*=−0.04, *p*>0.05; *n*_{S}=910). Also it is not associated with the structure frequency of S (*s*=−0.014, *p*>0.05; *n*_{S}=910). The association between the length of this random walk and the distance between S and T is weak and depends on the distance measure used (Hamming: *s*=−0.096, *p*=0.004; base pair distance: *s*=−0.04, *p*>0.05; *n*_{S}=910). Again, the only feature of some relevance is the frequency of the target structure T (*s*=−0.34, *p*<10^{−17}; *n*_{S}=910).

The reason why the starting sequence may be irrelevant for the length of this random walk becomes obvious if one asks how many different structures the random walk encounters between S and T. A histogram of this distribution is shown in *c*. The median (mean) of the distribution is 63 (121) with a 10th percentile at 19 structures. Thus, an evolutionary search starting at S traverses many other structures before arriving at T. Arguably, during this search, the properties of the starting structure may matter much less than the properties of the structures encountered during the search. Robustness and evolvability are also not associated if one restricts the analysis to random walks that traverse fewer than 10 (Spearman's *s*=0.04, *p*>0.05; *n*_{S}*=*745) or fewer than 5 structures (Spearman's *s*=0.11, *p*>0.05; *n*_{S}*=*84) before arriving at T. This implies that the properties of the starting point are rapidly ‘forgotten’ in an evolutionary search.