PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of ploscompComputational BiologyView this ArticleSubmit to PLoSGet E-mail AlertsContact UsPublic Library of Science (PLoS)
 
PLoS Comput Biol. 2010 May; 6(5): e1000767.
Published online 2010 May 6. doi:  10.1371/journal.pcbi.1000767
PMCID: PMC2865504

Mutation Bias Favors Protein Folding Stability in the Evolution of Small Populations

Raul Mendez,# 1 Miriam Fritsche,# 2 , ¤a Markus Porto, 2 , * , ¤b and Ugo Bastolla 1 , *
Eugene I. Shakhnovich, Editor

Abstract

Mutation bias in prokaryotes varies from extreme adenine and thymine (AT) in obligatory endosymbiotic or parasitic bacteria to extreme guanine and cytosine (GC), for instance in actinobacteria. GC mutation bias deeply influences the folding stability of proteins, making proteins on the average less hydrophobic and therefore less stable with respect to unfolding but also less susceptible to misfolding and aggregation. We study a model where proteins evolve subject to selection for folding stability under given mutation bias, population size, and neutrality. We find a non-neutral regime where, for any given population size, there is an optimal mutation bias that maximizes fitness. Interestingly, this optimal GC usage is small for small populations, large for intermediate populations and around 50% for large populations. This result is robust with respect to the definition of the fitness function and to the protein structures studied. Our model suggests that small populations evolving with small GC usage eventually accumulate a significant selective advantage over populations evolving without this bias. This provides a possible explanation to the observation that most species adopting obligatory intracellular lifestyles with a consequent reduction of effective population size shifted their mutation spectrum towards AT. The model also predicts that large GC usage is optimal for intermediate population size. To test these predictions we estimated the effective population sizes of bacterial species using the optimal codon usage coefficients computed by dos Reis et al. and the synonymous to non-synonymous substitution ratio computed by Daubin and Moran. We found that the population sizes estimated in these ways are significantly smaller for species with small and large GC usage compared to species with no bias, which supports our prediction.

Author Summary

The Guanine plus Cytosine (GC) content of bacterial genomes varies from 20% to 80%. This variation is attributed to the mutation bias produced by replication and repair machinaries. However, the evolutionary forces that act on these very different machinaries have remained elusive. It is known that the GC content of genes strongly influences the resulting proteins' hydrophobicity, which is the main determinant of folding stability. This may lead to expectation that the GC content is strongly selected at its optimal value, since proteins that are too hydrophylic face unfolding problems and proteins that are too hydrophobic face misfolding and aggregation problems. In this work, using a realistic model of genotype (DNA sequence) to phenotype (protein folding stability) to fitness mapping and a standard population genetics model, we find that the optimal GC usage depends on population size. In particular, very small populations prefer small GC usage, intermediate populations prefer large GC usage, and large populations prefer no bias. Our results may explain why most intracellular bacteria, evolving with small effective populations, tend to adopt small GC usage. To test this hypothesis, we estimated the effective population size of several bacterial species, finding that those that evolve with 50% GC usage are characterized by significantly larger populations, although several exceptions exist.

Introduction

The quantitative modeling of molecular evolution is of key importance for reconstructing evolutionary histories, as well as for understanding how the properties of natural macromolecules are influenced by their evolution. Already for a long time population size has been recognized as a crucial factor that influences both the evolutionary process and the stability that macromolecules can attain. On the other hand, even if mutation bias in prokaryotes varies from extreme GC rich to extreme AT rich, its influence on the evolutionary process, the stability of evolving macromolecule, and on the fitness of the population has received much less attention. Here, we simulate an evolutionary model that combines population size, GC mutation bias, and protein folding stability, and we show the deep interplay between these variables.

Kimura's neutral model [1], [2] is still one of the most influential models of molecular evolution. This model considers all viable macromolecules as equally fit and all the others as nonviable. Within this neutral model, the functional properties of the evolving macromolecules, in particular their folding stability, are independent of population size and, by entropy arguments, they are expected to coincide with the minimal properties compatible with viable molecules [3]. If mutations with small fitness effects are included in the model, population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e001.jpg becomes a key variable of the evolutionary process, since slightly deleterious mutations are more likely to be fixed in small populations [4][6]. This study has been pioneered by Ohta, who showed that population size can provide a possible explanation for empirical observations such as the generation time effect [7], [8]. Obligate intracellular lifestyle, such as that of endosymbiotic or parasitic bacteria, implies a strong reduction in effective population size due to bottlenecks upon transmission from one host to another. Inspired by Ohta's theory, computational studies have compared bacterial species displaying an obligate intracellular lifestyle with their free living relatives, suggesting that the genes of intracellular bacteria evolve faster as a result of relaxed selection [9] (but Itoh et al. [10] give a different interpretation) and that their structural RNAs [11] and their proteins [12] are less stable than the orthologous macromolecules of free living bacteria. Evolution experiments with virus and bacteria confirm the influence of small population size, demonstrating fitness loss in populations evolving under repeated bottlenecks [13], [14], and show that such a loss can be partly compensated by over-expressing chaperones that assist protein folding [15]. These findings support the idea that fitness is reduced in small populations as a consequence of the reduction of protein folding stability. Recent theoretical work has shown that, in the appropriate limits, the statistical properties of population genetics are formally equivalent to a statistical mechanical system, so that there is an exact analogy between the reduction of fitness for small populations and the increase of entropy for large temperature [16], [17]. In the present study, we will exploit this correspondence to get analytic insight into non-neutral evolution.

Another key evolutionary variable, which however has received little attention, is the nucleotide spectrum. In prokaryotic genomes, it varies from extreme adenine plus thymine (AT) content in obligatory intracellular bacteria to extreme guanine plus cytosine (GC) content, for instance in actinobacteria. These differences in GC content are prevalently thought to be due to mutation bias [18], [19]. They are strongest at the third codon position, where GC content barely affects the amino acid composition of the protein, but also influence the coding positions [20], [21]. Due to the structure of the genetic code, a mutation bias favoring thymine at the nucleotide level favors the incorporation of hydrophobic amino acids in the translated protein [12], [22]. Hydrophobicity is a key property for protein folding [23]. Proteins that are too hydrophylic tend to be naturally unfolded, whereas proteins that are too hydrophobic tend to misfold and aggregate [24]. This qualitative trade-off between unfolding and misfolding was confirmed by a computational study of the properties of homologous proteins in the proteomes of several bacterial species, using a model of protein folding stability that correlates well with experimentally measured unfolding stabilities [12]. In previous work, two of us and colleagues investigated the relationship between unfolding stability, misfolding stability and mutation bias using a protein evolution model with a realistic genotype (DNA sequence) to phenotype (folding stability) mapping in a neutral fitness landscape in which all proteins with stabilities above thresholds have the same fitness. We found that the mutation bias modulates the trade-off between the two kinds of stability, making proteins evolving under AT mutation bias more stable against unfolding but less stable against misfolding [25].

Interestingly, the two aspects discussed above, small population size and mutation bias towards AT, are strongly correlated in nature. In fact, most bacterial and eukaryotic lineages that adopted an intracellular lifestyle, with consequent reduction of their effective population size, also shifted their mutation spectrum towards AT [26], as indicated by the strong correlation between reduced genome size, which is a signature of intracellularity, and the AT bias [9], [12]. In this work, we investigate the association between population size and mutation bias, studying its consequences through a model that takes into account all of the relevant features of protein evolution discussed above: folding stability with respect to both unfolding and misfolding, population size, mutation bias, and neutrality, i.e. the relationship between folding stability and fitness.

Results

Model

We adopt the Moran model [27], which describes an evolving haploid population with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e002.jpg individuals that reproduce asexually and stochastically under mutation and selection. The model can be easily extended to diploid populations. We assume here that the product of population size times mutation rate is small, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e003.jpg, so that the population is monomorphic, i.e. the time scale for appearance of a new mutant in the population is large and at most one single mutant genotype is competing with the wild-type for fixation each time. This assumption is justified for small and intermediate populations when considering an individual protein coding gene, but not an entire genome (see Discussion). However, for large populations the assumption An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e004.jpg is violated even for an individual gene, and we can not apply the model to this case. In this monomorphic limit, the probability that a mutation arising as a single individual is fixed in the whole population can be exactly computed as [27]

equation image
(1)

where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e006.jpg is the exponential growth rate of the phenotype associated to sequence An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e007.jpg, which will be called fitness in the following. This analytic result enormously simplifies the numeric study of the system allowing the systematic exploration of its parameter space. In our simulations, we randomly generate a mutated sequence, evaluate its fitness with respect to the wild type, and accept the new mutation according to the above probability.

We model mutations at the DNA level through the HKY process [28], whose only parameters are the equilibrium frequencies of the four bases An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e008.jpg in the absence of selection, and the transition/transversion ratio An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e009.jpg, whose influence is very weak and which we set to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e010.jpg [8]. In order to reduce the number of parameters, we assume that Chargaff's second parity rule holds, so that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e011.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e012.jpg. Thus, the mutation model only depends on the GC usage, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e013.jpg. GC usage different from An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e014.jpg determines a mutation bias towards AT or towards GC, therefore we sometimes refer to the GC usage variable as the mutation bias. In our model, the GC usage variable very strongly correlates with the GC content of the evolving gene in the stationary state of the evolutionary dynamics. The same correlation is thought to exist between the GC content of bacterial genomes, in particular at third codon position, and the GC usage of the mutations arising in bacterial replication. Therefore, we will compare the variable GC usage in our model with the variable GC content at third codon position in bacterial genomes.

Folding stability

In our model the fitness of an individual carrying a particular gene depends on the folding properties of the translated protein, which are estimated through a simple protein folding model. This model was used in our previous works [25], [29], [30] and it is similar to those used by others [31][39]. A characteristic of our model that distinguishes it from similar ones is that we consider two types of stability, with respect to misfolding and with respect to unfolding. Stability with respect to unfolding is estimated through the folding free energy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e015.jpg of a protein sequence An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e016.jpg, calculated with a simple contact interaction model (see Methods). Free energies estimated in this way correlate well with experimental measures (correlation coefficient An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e017.jpg over a test set of 20 proteins, UB, unpublished result). Stability with respect to misfolding is estimated through the normalized energy gap An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e018.jpg (see Methods), which is the normalized difference between the effective energy of the native state and the minimum effective energy predicted through a Random Energy Model, representing the energy of compact intermediate structures very different from the native one. These misfolded structures can trap the folding process, and they can expose hydrophobic patches and promote aggregation.

Interestingly, these two kinds of stability respond in an opposite way to an increased mutation pressure towards hydrophobicity: while An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e019.jpg increases for increasing mean hydrophobicity, meaning that proteins become more stable with respect to unfolding, the normalized energy gap decreases. This is due to the fact that the maximum stability of all potential misfolded structures increases more than the stability of the native structure, thus making misfolding and aggregation problems potentially more serious [12]. This trade-off between the two stabilities has a deep influence on the evolutionary dynamics.

Fitness

We adopt a fitness function that depends on the normalized stabilities An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e020.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e021.jpg and on the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e022.jpg,

equation image
(2)

The neutral thresholds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e024.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e025.jpg define the scale of acceptable stabilities and they are kept fixed throughout the simulation. With this definition the fitness takes values between An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e026.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e027.jpg, vanishing if the protein does not fold correctly, which means that it is considered essential. Two plots of fitness versus stability for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e028.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e029.jpg are represented in Fig. 1 for illustration purposes. The fitness becomes a binary variable, either 0 or An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e030.jpg, if the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e031.jpg is either zero (in this case all sequences satisfying An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e032.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e033.jpg are equally fit) or infinite (in this case all sequences overcoming the neutral thresholds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e034.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e035.jpg have fitness 1 and all other sequences are not viable). These limits are equivalent to Kimura's neutral model [2], which we studied previously [25], [29], [30], in which it is assumed that mutations that maintain stabilities above the neutral thresholds have no fitness effect, while all the others are lethal. This motivated us to name the parameter An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e036.jpg the neutrality exponent. Notice that the term neutrality is sometimes defined as the fraction of proteins that retain wild-type structure under mutations [40]. This definition assumes a neutral model where the wild-type structure is either stable (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e037.jpg) or unstable (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e038.jpg). We prefer to call this quantity the fraction of neutral neighbors [29], and to call neutrality exponent the exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e039.jpg that determines the smoothness of the relationship between stability and fitness.

Figure 1
Fitness versus stabilities for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e040.jpg (top) and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e041.jpg (bottom).

We choose the two neutral thresholds proportional to the values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e042.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e043.jpg for the reference protein in the Protein Data Bank (PDB), multiplied with coefficients An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e044.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e045.jpg. In simulations of neutral evolution, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e046.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e047.jpg have to be smaller than one so that the reference protein is viable. We present results with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e048.jpg. We tested the robustness of our results with respect to both changes in the analytical form of the fitness function and the values of parameters, as discussed in the following.

Analytic results

We can analytically predict how the population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e049.jpg and the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e050.jpg influence stability and fitness by exploiting the formal analogy between population genetics and statistical mechanics demonstrated by Berg and coworkers [16] and by Sella and Hirsh [17]. These authors noticed that, in the monomorphic limit An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e051.jpg mentioned above and that we assume throughout this work, the Moran process, as well as other evolutionary processes studied in population genetics, tends to a stationary distribution of the form An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e052.jpg. This distribution is equivalent to a Boltzmann distribution where population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e053.jpg plays the role of inverse temperature and the logarithm of fitness, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e054.jpg plays the role of minus energy. This result implies that the probability to find a protein with stability values An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e055.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e056.jpg in the stationary state of an evolving population is proportional to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e057.jpg multiplied by a factor that depends on the mutation process. The bias arising in the mutation process was treated as a “chemical potentia” by Sella and Hirsh [17] or as a mutational entropy by Berg et al. [16]. These two formalisms are qualitatively equivalent. We find the name mutational entropy more intuitive, and we will use it in the following. We define An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e058.jpg the probability to find stability parameters An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e059.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e060.jpg under mutation alone, and we introduce the quantity An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e061.jpg, which we call the mutational entropy compatible with stabilities An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e062.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e063.jpg under the given mutation process (notice that strictly speaking An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e064.jpg is not an entropy, however we find this name intuitive for indicating the mutational force that opposes protein stability). As discussed above, the mutational entropy depends on the GC usage, which can favor one kind of stability with respect to the other. Taking all this into account, the stationary distribution of stability that results from mutation and selection is

equation image
(3)

The logarithm of the above probability can be interpreted as minus an evolutionary free energy divided by temperature An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e066.jpg, and it is given by

equation image
(4)

where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e068.jpg is called the additive fitness [17]. The distribution Eq. (3) is peaked around the values An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e069.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e070.jpg that maximize the exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e071.jpg, i.e. minimize the evolutionary free energy. The equations that define these most likely values read

equation image
(5)

where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e073.jpg. We call the above the maximum-likelihood (ML) equations. Notice that the maximum likelihood values An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e074.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e075.jpg depend on the parameters An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e076.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e077.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e078.jpg. We can study this dependence analytically, assuming that Eq. (3) is narrowly peaked around these values, so that averages can be calculated as An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e079.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e080.jpg. This approximation is justified by the fact that the mutational entropy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e081.jpg is expected to be proportional to protein length An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e082.jpg, which is of the order of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e083.jpg, and the selective term is proportional to population size, which is also large, so that the exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e084.jpg is large and the distribution very narrow. The condition that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e085.jpg has a maximum at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e086.jpg requires that its Hessian matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e087.jpg, consisting of its second derivatives, is negative definite,

equation image
(6)

This Hessian is the sum of the Hessian of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e089.jpg, which is negative by construction, as it is easy to verify, and the Hessian of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e090.jpg, which is the logarithm of a probability. We assume that the mutational entropy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e091.jpg has a single maximum at stabilities An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e092.jpg, so that its Hessian is negative. The values An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e093.jpg that represent the most likely values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e094.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e095.jpg in the absence of selection depend on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e096.jpg. By definition of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e097.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e098.jpg is always negative, which is not a viable stability (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e099.jpg). However, our numerical results show that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e100.jpg is positive for small GC usage, corresponding to hydrophobic sequences. The mutational entropy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e101.jpg decreases for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e102.jpg and for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e103.jpg, which implies that the corresponding derivatives are negative, as required for the existence of the solution of the ML equations.

We can go beyond the maximum-likelihood approximation writing the exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e104.jpg at second order as An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e105.jpg, which is equivalent to approximating the distribution Eq. (3) as a Gaussian with covariance matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e106.jpg. Therefore, negativity of the Hessian matrix is equivalent to requiring the covariance matrix to be positive.

Influence of population size

We can calculate how An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e107.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e108.jpg depend on population size by taking the derivatives of the ML equations with respect to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e109.jpg (see Text S1). In this way, we find that both stabilities must increase with population size, as expected. The mean fitness An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e110.jpg is therefore an increasing function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e111.jpg, whereas the mutational entropy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e112.jpg is a decreasing function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e113.jpg.

Influence of the neutrality exponent

Stabilities are not monotonic functions of the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e114.jpg. At An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e115.jpg all stabilities above the lethal threshold An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e116.jpg at which fitness drops to zero are selectively equivalent, and the ML equations imply that the stabilities with the largest mutational entropy fulfilling these conditions will prevail. As mentioned above, the most likely value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e117.jpg in the absence of selection is negative for all An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e118.jpg usages, so that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e119.jpg for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e120.jpg. On the other hand, the most likely value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e121.jpg in the absence of selection An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e122.jpg is positive for hydrophobic sequences, corresponding to small GC usage. The ML equations thus predict that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e123.jpg, where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e124.jpg satisfies the equation An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e125.jpg at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e126.jpg. Similarly, in the neutral limit An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e127.jpg, the smaller between An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e128.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e129.jpg tends to the value 1, i.e.the corresponding stability tends to the neutral threshold, and the larger stability satisfies the equation An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e130.jpg at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e131.jpg. For finite An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e132.jpg, it can be shown that both stabilities increase with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e133.jpg when An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e134.jpg is small, they reach a maximum and then decrease towards the neutral values (see Text S1). This behavior of stability arises from the fact that, under neutral or almost neutral evolution, the advantage in fitness provided by a more stable protein is too small to be fixed in the population against the entropic effect of mutations. This mechanism has been proposed as an explanation of the empirical observation that natural proteins are only marginally stable [3].

Similarly, we can show that the fitness has a minimum as a function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e135.jpg: It starts from the value An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e136.jpg at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e137.jpg, then at small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e138.jpg the fitness is reduced because low stability values are penalized, at larger An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e139.jpg more stable sequences are attained, and finally in the neutral limit the fitness tends to the maximum possible value An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e140.jpg while stability decreases (see Text S1). We can therefore distinguish three qualitative behaviors, described in Table 1. We are mainly interested in the parameter range that is far both from the region An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e141.jpg at which the minimum stability is close to the lethal threshold An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e142.jpg, and from the region of large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e143.jpg at which stabilities are close to the neutral thresholds.

Table 1
Qualitative behavior of fitness and stability versus neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e144.jpg at fixed GC and population size.

Influence of the mutation bias

The most interesting feature of the evolutionary model presented here is the dependence of stability and fitness on the mutation bias. Unfortunately, this dependence cannot be predicted analytically, since we do not have a detailed model of how the mutation entropy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e156.jpg depends on GC usage. Numerical results show that, for the folding free energy function that we adopt here, the two stabilities respond differently to the GC usage. This is expected, since small GC usage favors hydrophobic proteins, enhancing unfolding stability (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e157.jpg) at the expenses of misfolding stability (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e158.jpg). Since fitness depends on both An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e159.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e160.jpg, it has to trade-off between the two stabilities, and we expect that there is an optimal GC usage at which the fitness is maximal for given An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e161.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e162.jpg, which satisfies the equation An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e163.jpg

equation image
(7)

where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e165.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e166.jpg are determined by the ML equations (5). The maximum fitness is achieved when the quantity

equation image
(8)

is minimal. Here An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e168.jpg is the smaller value and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e169.jpg the larger value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e170.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e171.jpg. We first discuss the small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e172.jpg regime at which stabilities are small and they are strongly influenced by the GC usage. In this regime, we expect that there is a value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e173.jpg at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e174.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e175.jpg are equal. Therefore, at small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e176.jpg usage it holds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e177.jpg, which increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e178.jpg, whereas at large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e179.jpg usage it holds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e180.jpg, which decreases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e181.jpg. Consequently, the factor An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e182.jpg has a minimum where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e183.jpg. Conversely, the second factor that appears in An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e184.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e185.jpg, has a maximum where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e186.jpg. We expect that the factor An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e187.jpg depends more strongly on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e188.jpg than the factor An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e189.jpg, in particular if An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e190.jpg is large. Therefore, we expect that the minimum An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e191.jpg (i.e. the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e192.jpg) is reached near the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e193.jpg usage at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e194.jpg, and that it approaches this value as An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e195.jpg grows. The An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e196.jpg usage at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e197.jpg has an interesting interpretation. We can define the selective pressure on the variable An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e198.jpg as the derivative of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e199.jpg with respect to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e200.jpg, which expresses how fitness responds to a change in stability. If this derivative is large, a large number of attempted mutations will be discarded because of their negative influence on fitness. The ML equations show that the selective pressure is proportional to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e201.jpg, and it is stronger on the smaller variable An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e202.jpg. Therefore, when the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e203.jpg usage increases, the selective pressure on unfolding increases, and the selective pressure on misfolding decreases, and they balance when An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e204.jpg.

Theoretical considerations and numerical results indicate that there is a second regime at large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e205.jpg. In this limit, the fitness tends to the maximum possible value. Due to the trade-off between unfolding and misfolding stability, it is not possible to maximize An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e206.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e207.jpg simultaneously, since they are inversely related. As An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e208.jpg increases, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e209.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e210.jpg are expected to converge to the optimal fitness point An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e211.jpg and their dependence on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e212.jpg is expected to become weaker and weaker. We find numerically that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e213.jpg is smaller than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e214.jpg, so that for large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e215.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e216.jpg is smaller than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e217.jpg for all An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e218.jpg, and the selective pressure is always stronger on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e219.jpg. In this regime, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e220.jpg always decreases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e221.jpg and its dependence on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e222.jpg gets weaker. Conversely, the term An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e223.jpg always increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e224.jpg, and the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e225.jpg is determined by a balance between these two terms. We now discuss two interesting limiting behaviors of the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e226.jpg.

  1. In the small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e227.jpg regime and for finite An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e228.jpg, so that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e229.jpg is small, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e230.jpg tends to zero and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e231.jpg tends to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e232.jpg independent of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e233.jpg. For small GC usage, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e234.jpg is positive and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e235.jpg is a decreasing function of GC, since An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e236.jpg increases with GC. For large GC usage, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e237.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e238.jpg increases with GC. Therefore, we expect that the minimum of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e239.jpg, i.e. the optimal GC, is attained near the GC usage at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e240.jpg, which is independent of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e241.jpg and of the neutral thresholds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e242.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e243.jpg.
  2. In the neutral limit An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e244.jpg, the selective pressure only affects the smallest stability variable, since An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e245.jpg. This tends to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e246.jpg independent of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e247.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e248.jpg. Therefore, as discussed above, for large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e249.jpg, the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e250.jpg is reached when An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e251.jpg, i.e. when the two selective pressures balance. The ML equations imply that at this point An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e252.jpg, so that the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e253.jpg does not depend on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e254.jpg. The ML equations also imply that, in the large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e255.jpg limit, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e256.jpg (see Text S1), which means that the maximum stability and maximum fitness is attained at the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e257.jpg value at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e258.jpg is minimum. This prediction is confirmed in Fig. 6 in the Text S1).
    Figure 6
    Optimal GC usage An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e359.jpg versus population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e360.jpg for neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e361.jpg and different values of the neutral thresholds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e362.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e363.jpg, where the reference energy gap An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e364.jpg and unfolding free energy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e365.jpg are those measured for the protein in the PDB.

Simulations

All simulations presented here are based on the native structure of some natural protein. When not otherwise stated, we exemplify our numerical results using the protein lysozyme, PDB id. 31zt. In all cases, the starting sequence is the sequence in the PDB. Results are collected after fitness has converged to its stationary value, discarding the first An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e259.jpg accepted substitutions, which are enough for equilibration, as it can be seen in Fig. 2 in the Text S1.

Figure 2
Mean unfolding stability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e260.jpg versus misfolding stability An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e261.jpg for neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e262.jpg (non-neutral regime).

As an illustration of the stationary states of the evolutionary dynamics, we represent in Fig. 2 the mean stability values An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e284.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e285.jpg obtained using the fitness function with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e286.jpg for different population sizes from An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e287.jpg to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e288.jpg and GC usage from An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e289.jpg to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e290.jpg. The distributions An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e291.jpg, Eq. (3), are narrowly peaked around the plotted points An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e292.jpg. Sets of points with the same GC usage are joined with solid lines, and sets of points with the same An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e293.jpg are joined with dashed line. The data are superimposed to a heat map that shows the value of fitness in colour code. We can see from the figure that both stabilities grow with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e294.jpg. On the other hand, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e295.jpg grows and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e296.jpg decreases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e297.jpg, so that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e298.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e299.jpg are negatively correlated for fixed population size. For An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e300.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e301.jpg tends to a finite value when An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e302.jpg tends to zero (corresponding to very small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e303.jpg), i.e. the most likely value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e304.jpg in the absence of selection is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e305.jpg and, for such small GC usage, there is very weak selective pressure on unfolding. One can see from the plot that the GC usage at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e306.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e307.jpg are equal increases with population size, which implies that the selective pressure on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e308.jpg increases more than the selective pressure on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e309.jpg for increasing population size. In the large population limit both An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e310.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e311.jpg tend to finite values independent of GC. We estimated from our numerical results that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e312.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e313.jpg, so that for large populations it is always An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e314.jpg.

Fitness clearly increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e315.jpg. The variation of fitness with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e316.jpg is weaker, but one can nevertheless notice it from the plot. This variation translates into the fact that, for fixed fitness function and population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e317.jpg, there is an optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e318.jpg usage such that fitness is maximal, as predicted in Eq. (7). The existence of this optimal mutation bias is demonstrated in Fig. 3, where we plot the fitness of populations with constant An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e319.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e320.jpg as a function of their An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e321.jpg usage. For each set of parameters, we obtained the optimal GC usage An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e322.jpg by cubic interpolation, as exemplified in Fig. 3, and plotted it versus An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e323.jpg. We found that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e324.jpg is small for very small populations, large for intermediate populations, and the bias is almost absent (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e325.jpg) for very large populations (see Fig. 4). We obtained qualitatively similar results as long as the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e326.jpg is not too large or too small (in that case, the fitness landscape becomes almost neutral). The population size at which the optimal GC usage is highest increases with decreasing An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e327.jpg for small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e328.jpg, while the opposite holds for large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e329.jpg. Our numerical results are consistent with the optimal GC usage becoming less dependent on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e330.jpg in the infinite population limit, see Fig. 3 in the Text S1.

Figure 3
Fitness (in different units for each curve) versus GC usage for neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e331.jpg and three different population sizes.
Figure 4
Optimal GC usage An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e333.jpg at which the fitness is maximum versus population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e334.jpg.

Eq. (4) implies that a trait that confers a selective advantage can only be fixed against the entropic effect of random mutations when the difference in the selection coefficients An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e337.jpg is larger than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e338.jpg. We therefore verified whether the difference of selective coefficients An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e339.jpg between populations adopting different GC usages is large enough so that the optimal one would be eventually selected. We found that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e340.jpg decreases with population size, but more slowly than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e341.jpg, so that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e342.jpg increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e343.jpg, see Fig. 4 in the Text S1. This implies that two populations evolving with different mutation bias (the optimal one and another one) attain a fitness difference large enough so that the optimal GC usage can be selected.

We tested that our results do not change qualitatively when different protein structures are used in the simulation. To this end, we computed the relationship between the optimal GC usage and population size at neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e344.jpg for five proteins of different length and secondary structure (see Methods). All curves, plotted in Fig. 5, have the same shape, although they are shifted in the vertical direction in a way that suggests that shorter proteins are characterized by larger optimal GC usage (but more proteins are needed to confirm this trend). We then combined the five curves. We assumed that a genome composed of these five proteins is evolving with very low mutation rate, so that at most one protein is mutated at each step, consistent with the assumption An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e345.jpg. The global fitness of the organism was obtained through two different ansatz that yielded qualitatively similar results, either as the minimum of the fitness of all proteins An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e346.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e347.jpg or as the product of the fitnesses, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e348.jpg, assuming absence of epistatic interactions. From these An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e349.jpg we then obtained the optimal GC by cubic interpolation. This is represented in Fig. 5, bottom plot for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e350.jpg. One can see that the qualitative behavior of the individual curves is preserved. We expect therefore that this qualitative behavior would be maintained for a large number of proteins as well.

Figure 5
Optimal mutation bias An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e351.jpg at which the fitness is maximum versus population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e352.jpg for different proteins and neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e353.jpg.

To further test the robustness of our results we changed the neutral thresholds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e354.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e355.jpg up to 20%, examining nine combinations of thresholds for neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e356.jpg. The results are shown in Fig 6. One can see that the qualitative behavior is unchanged. As expected, when An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e357.jpg becomes more tolerant the optimal GC usage decreases, and the contrary happens when An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e358.jpg becomes more strict.

Finally, we verified that the results are robust with respect to the energy parameters used. For such a test, we adopted the contact interaction energies determined by Godzik, Kolinsky and Skolnick (GKS) [41]. These parameters have correlation An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e369.jpg with the BVK parameters adopted in the present study, so that their differences are not small. We determined a new parameter for conformation entropy An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e370.jpg by demanding the folding free energies computed with the two sets of energy parameters to coincide on the average. As one can see from the dotted curve in Fig. 7, the qualitative behavior is the same for the two parameter-sets, but the optimal GC usage for GKS parameters is lower than for BVK parameters. This is due to the fact that, for our test protein lysozyme, GKS energy parameters produce a very low normalized energy gap An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e371.jpg instead of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e372.jpg with BVK parameters, which means that the native conformation is closer in energy to random conformations when GKS parameters are used. Consequently, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e373.jpg is very small (we recall that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e374.jpg is proportional to the value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e375.jpg for the native sequence) and the selective pressure on misfolding is very weak. We then increased this selective pressure by setting An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e376.jpg instead of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e377.jpg. The resulting curve can be seen in Fig. 7 as a dashed curve. One finds that the maximum GC usage is now much larger, reaching An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e378.jpg.

Figure 7
Comparison between the optimal GC usages computed with GKS energy parameters (dotted line and dashed line) and the BVK parameters adopted in the present study (solid line).

Finally, we show in Fig. 8 the optimal GC usage versus the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e385.jpg for small (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e386.jpg), intermediate (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e387.jpg) and large (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e388.jpg) populations. For small populations the optimal GC usage increases with the neutrality exponent, from very small values to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e389.jpg. For intermediate and large populations the optimal GC usage has a maximum and then it decreases. The maximum value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e390.jpg increases with population size, and it is reached at smaller neutrality exponent for intermediate populations (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e391.jpg at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e392.jpg) than for large populations (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e393.jpg at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e394.jpg).

Figure 8
Optimal GC usage An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e395.jpg versus neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e396.jpg for three population sizes An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e397.jpg.

We then tested the mean-field prediction that the stability coefficient An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e398.jpg has a maximum and the sequence entropy has a minimum as a function of neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e399.jpg. As expected, maximum stability and minimum entropy occur at the same value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e400.jpg, see Fig. 5 in the Text S1.

Qualitative behavior of the optimal GC

We now discuss the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e401.jpg-dependence of the optimal GC based on the results reported in Fig. 2. As explained above, the existence of the optimal GC usage arises from the trade-off between unfolding stability and misfolding stability in response to changes in the mutation bias. One can observe this trade-off in Fig. 2, from which it appears that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e402.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e403.jpg are negatively correlated for fixed population size. At the optimal GC the derivatives of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e404.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e405.jpg with respect to GC, which have opposite sign, become equal in absolute value, as indicated by Eq. (7). One can see from Fig. 2 that at small GC usage An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e406.jpg responds to GC variation more strongly than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e407.jpg, whereas the opposite happens at large GC usage, so that the optimal is reached at intermediate GC. In Fig. 2, the white thick line represents the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e408.jpg line at which the selective pressures on unfolding and misfolding are equal. One can see from the plot that, for small GC usage and small population sizes, the selective pressure is stronger on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e409.jpg (misfolding). Since An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e410.jpg increases faster than An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e411.jpg with population size, the selective pressure on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e412.jpg increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e413.jpg more than the selective pressure on An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e414.jpg. Consequently, the GC usage at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e415.jpg (white line) increases with population size. As discussed in the section “Influence of the mutation process”, this behaviour qualitatively explains why the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e416.jpg increases with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e417.jpg at small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e418.jpg, since the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e419.jpg is expected to be near the value at which An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e420.jpg. Near An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e421.jpg, the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e422.jpg attains a maximum as a function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e423.jpg. For An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e424.jpg, we see that An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e425.jpg for all An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e426.jpg usages, so that the selective pressure is always stronger on misfolding, and we enter what we called the large An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e427.jpg regime. In this regime, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e428.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e429.jpg tend to the finite values that yield the maximum absolute fitness (numerical results suggest that they are An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e430.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e431.jpg), which are independent of GC, so that the GC dependence of stabilities gets weaker and weaker for large populations. When these limiting values are approached, the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e432.jpg curves that correspond to fixed An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e433.jpg and varying An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e434.jpg in Fig. 2 change their shape, becoming more convex and centered around An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e435.jpg (red squares). This behavior corresponds to the fact that the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e436.jpg decreases towards An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e437.jpg for very large population size.

According to this reasoning, the maximum value of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e438.jpg versus An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e439.jpg is reached at a population size where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e440.jpg approaches its limiting value An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e441.jpg. As discussed above and detailed in the Text S1, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e442.jpg has a maximum as a function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e443.jpg for fixed population size. Therefore, the population size at which a given value An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e444.jpg is reached has a minimum as a function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e445.jpg, which implies that the population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e446.jpg at which the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e447.jpg is largest has a minimum as a function of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e448.jpg. This prediction is in qualitative agreement with Fig. 4, bottom plot, which suggests that the minimum of the largest An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e449.jpg versus An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e450.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e451.jpg, is reached between An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e452.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e453.jpg.

Effective population Size

The results that we have presented suggest that mutation bias towards AT or GC favor protein folding stability for very small and intermediate population sizes, respectively, while very large populations are advantaged in the absence of bias (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e454.jpg). As it will be discussed below, this suggests that species evolving with mutation bias, either towards AT or GC, will have smaller population size than species with no bias. This prediction is consistent with the fact that almost all bacterial species with intracellular lifestyles, implying a reduction of effective population size through bottlenecks, shifted their mutation spectrum to AT, which resulted in small genomic GC content. On the other hand, among bacteria with large GC content some are facultative pathogens, such as Mycobacterium tuberculosis, and some live symbiotically in plant nodules, but there is no general tendency allowing for the deduction of their population size from their lifestyles. Therefore, to test our prediction, we tried to directly estimate their effective population size.

The effective population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e455.jpg depends on the breeding structure and the natural history of a population, and in particular it is influenced by the bottlenecks that the population may undergo if a few individuals periodically colonize new environments. Therefore, the effective population size cannot be measured experimentally, but is estimated by fitting some observed population feature to its expected value under evolution in a population with given An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e456.jpg. Optimal codon usage was used several years ago to estimate the effective population size of Escherichia coli [42]. A recent work supports the existence of a correlation between effective population size and synonymous codon usage [43], and the availability of many complete genomes makes it possible to analyze codon usage on a large scale. Codon usage and mutation bias are intimately correlated. It is commonly believed that the mutation bias, rather than selection for optimal codon usage, ultimately influences the global GC content of a genome [18], [19]. The definition of the optimal codon usage on which the results that we use here are based considers the excess frequency of preferred codons with respect to the frequency expected under mutation alone, and is therefore not expected to depend on the mutation bias in a trivial way. Dos Reis el al. [44] have recently estimated the optimal codon usage in a large number of prokaryotic species. We use their data rather than the analogous data obtained by Sharp et al. [45], since Dos Reis et al. evaluated the optimal codon usage on the entire genome, whereas Sharp et al. concentrated their attention only on ribosomal genes, which can be a biased sample. Fig. 9 shows the average optimal codon usage versus the average GC content at the third codon position, which is not affected by the selection on the amino acid sequence and is expected to be very strongly correlated with the mutation bias. We distinguished species with small (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e457.jpg), intermediate (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e458.jpg to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e459.jpg) and large (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e460.jpg) GC content. Species with intermediate GC content turned out to have significantly larger optimal codon usage, which suggests that they have larger effective population size. The scatter plot and the histogram of the GC content are shown in Fig. 7 and and8)8) in the Text S1. Error bars in the plot represent the standard error of the mean, and show that the mean values are significantly different. However, data prior to the mean are rather broadly distributed, with standard deviations equal to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e461.jpg (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e462.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e463.jpg (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e464.jpg) and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e465.jpg (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e466.jpg).

Figure 9
Estimates of quantities correlating with effective population size obtained from genomic data.

As a second estimate of effective population size, we considered the ratio between non-synonymous and synonymous substitutions An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e475.jpg, which is thought to represent the strength of negative selection [8]. We examined values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e476.jpg computed for pairs of entire genomes, recently published by Daubin and Moran [46]. From their table, we eliminated two pairs of genomes for which the evolutionary divergence, estimated through An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e477.jpg, was very small (An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e478.jpg), corresponding to Bordetella pertussis/parapertussis and two strains of Xylella fastidiosa, since it is known that the amino acid substitution rate is significantly higher at small time separation [47][49] and in fact these two pairs of genomes showed the two largest values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e479.jpg. We also eliminated two pairs for which the two compared species had genomic GC content in different bins: two strains of Prochlorococcus marinus having GC = 36% and 51%, and the pair Synechocystis/Synechococcus having GC = 48% and GC = 65%, respectively. We divided the remaining 19 pairs in 3 bins of low, mean and high GC content and averaged their An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e480.jpg. Results, shown in Fig. 9, clearly show that species evolving with no bias are characterized by lower An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e481.jpg, hence larger effective population size, in agreement with the analysis of the optimal codon usage and with the prediction of our model.

Finally, we reanalysed our data on protein folding stabilities computationally estimated for orthologous proteins in different prokaryotic genomes [12]. Unfolding and misfolding stabilities are negatively correlated, as predicted by our model (see Fig. 10). We found that most of the organisms evolving with mutation bias have proteins whose misfolding stability is lower than what could be expected based on their unfolding stability, see Fig. 11. This further supports the idea that these species are characterized by reduced effective population sizes.

Figure 10
Negative correlation between misfolding and unfolding stability.
Figure 11
Relationship between GC usage and protein folding stability in orthologous proteins in different prokaryotic genomes (data taken from Ref. [12]).

Discussion

Interplay between mutation bias and population size

We studied here a mathematical model of protein evolution where the genotype to phenotype mapping is determined by the stability of the mutated protein against unfolding and misfolding, predicted using a protein folding model that correlates well with experimental measures. As observed in previous work, the two kinds of stability respond in an opposite way to changes in the GC usage of the mutation process. This fact produces a trade-off between the two kinds of stability, and an interesting phenomenology arises from the impossibility to find a mutation process that optimizes both stabilities at the same time, a concept that in the physical literature has received the name of frustration.

We considered three key evolutionary parameters: the effective population size An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e488.jpg, the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e489.jpg, which determines how protein stability influences fitness, and the GC usage that expresses the mutation bias. Despite its importance in shaping the folding properties of proteins, the latter has been rarely considered in evolutionary models. Here we show that, in the non-neutral regime, mutation bias has a very interesting interplay with population size. We suggest that this can explain why some microbial species adopted extreme mutation bias.

At high neutrality exponent, all proteins with stability above the neutral threshold provide the same fitness and evolution is only able to attain the lowest allowed stabilities [3], almost independent of population size. Consistently, our analytic and numerical results indicate that the neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e490.jpg has a non-monotonic influence on protein stability, which reaches a maximum at intermediate An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e491.jpg for given population size. The increase of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e492.jpg in our model has its biological counterpart in the increase of the expression level of chaperones, which make proteins more tolerant to stability losses. Therefore, the decrease of stability for increasing An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e493.jpg predicted by our model would correspond in the real world to the decrease of protein stability when the chaperone expression is increased. This outcome appears rather plausible. However, given the cost of synthesizing chaperones, in real evolution it is to be expected that the increase of the expression level of chaperones is a consequence of the loss of protein stability, as observed in intracellular bacteria with reduced population size, rather than the other way round.

In the neutral regime the GC usage influences the amino acid composition and consequently the folding properties, favoring proteins more stable with respect to misfolding but less stable with respect to unfolding, without modifying the fitness. In contrast, in the non-neutral regime fitness is a continuous function of stability and the outcome of evolution depends non-trivially on mutation in the sense that for fixed population size there is an optimal mutation bias at which fitness and stability are maximal. This is an unexpected result, which implies that mutation and selection are effectively entangled, and that the mutation spectrum constrains the maximum stability and fitness that an evolving population can attain. The possibility that the mutation rate is optimized as a response to evolutionary forces [50] has received considerable attention in experiments (see Ref. [51] for a recent work) and modelling (see for instance Refs. [52], [53]). The main forces influencing mutation rate evolution have been identified as the population size [50], the ruggedness of the fitness landscape [54] and the average negative effect of a mutation [55]. Recently, a theoretical work has established a relation between mutation rate, maximal genome size and thermodynamic response of proteins to point mutations, showing that populations go extinct via lethal mutagenesis when their mutation rate exceeds a few mutations per genome per replication [56]. Simulations of this model confirmed the predicted behaviour, showing that the limiting number of mutations is approximately seven for RNA viruses and about four for DNA-based organisms, with some weak dependence on the number of genes in the organism and the organism's natural death rate [57]. This model predicts that species with high mutation rates tend to have less stable proteins compared to species with low mutation rates. Therefore, the notion that the mutation process can influence protein stability, and that the optimal mutation process is influenced by properties of the selection process is not new, but the extension of this concept to the evolution of the mutation bias is novel to our knowledge.

Quite interestingly, small populations attain higher fitness with AT bias, intermediate populations get an advantage with GC usage, and very large populations attain higher fitness with almost absent bias. This result establishes a deep interplay between population size and mutation bias. The ML equations show that the optimal GC usage depends on how the number of stable sequences decreases with the stability values, i.e. it is an effect of probability in sequence space. For very small population size and stabilities the optimal mutation bias is attained at small GC usage, which makes folding easier. At higher stabilities (intermediate population size) the optimal GC usage increases, therewith improving the stability against misfolding at the optimal GC. Approaching the maximal stabilities the optimal GC usage decreases again towards the value An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e494.jpg, which means absence of bias in the mutation process.

As a speculative remark, we note that it was not obvious that our model would predict An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e495.jpg as the optimal GC usage for very large populations. In this limit the absolute maximum fitness is reached. We have shown numerically (see Text S1) that the optimal GC usage in the infinite population limit is little dependent on the parameters of the fitness function An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e496.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e497.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e498.jpg, as long as the selective pressure affects mostly An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e499.jpg, so that in this limit An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e500.jpg mainly depends on the contact energy parameters and on the genetic code. This conjecture is consistent with our data. Nevertheless, a systematic test requires cumbersome simulations that we did not perform here. We obtained a different result when using the GKS contact energy parameters, which yielded An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e501.jpg for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e502.jpg in the very large population limit. However, we notice that these parameters also produced a very small normalized energy gap, which suggests that they might be less suitable for this kind of study.

Influence of the mutation rate

The model that we adopt here is based on the assumption that the population is genetically homogeneous, i.e. the product An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e503.jpg of population size times mutation rate is small. This allows us to analytically compute the fixation probability of a new mutation through Eq. (1) instead of explicitly simulating population dynamics. This approximation is considered valid if An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e504.jpg measures the mutation rate of a single protein, in particular if population size is small. However, the high mutation rates of RNA viruses may violate this assumption even for a single protein, and in this case several works [58], [59] have shown that the load due to nonviable mutations significantly modifies the evolutionary process even in the case of a neutral fitness landscape, leading to the evolution of mutational robustness and enhanced folding stability [60][62]. This situation can be studied analytically in the framework of the quasi-species theory [63]. We did not consider this theory here, because it assumes that the population size is infinite and therefore it prevents to study the effect of finite populations that is the main focus of the present work. If we considered a whole evolving genome instead of a single protein, the approximation of very small mutation rate would not be justified, since genomic mutation rates are in a range of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e505.jpg to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e506.jpg mutations per genome per generation for DNA-based microbes, including viruses, bacteria, and eukaryotes [55]. In this context, a new interesting effect has to be considered, namely the hitch-hiking effect, which consists in the fixation of mildly disfavoured alleles driven by a positively selected allele present in the same chromosome. However, since treating the hitch-hiking effect would make both the analytic and the numeric study much more complicated, we leave it as a subsequent step.

Robustness of the results

Our model depends on several assumptions and parameters. As evolutionary model, we adopted the Moran process, one of the best studied population genetic models. The theoretical work by Sella and Hirsh [17] shows that other evolutionary processes, such as for instance the Wright-Fisher process, would yield the same qualitative results. The mutation process was modelled using a single parameter, the GC usage. While this parametrization might appear too simplified, it has the merit to focus on a variable whose relevance has been pointed out by a large number of experimental studies, statistical analysis and models.

The ingredients of our model that seem more debatable are the form of the fitness function and its parameters An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e507.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e508.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e509.jpg. To test the robustness of our results, we simulated different functional forms of the fitness function, using exponential functions of stability instead of power laws or letting the fitness depend only on the minimum between the two stabilities An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e510.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e511.jpg. In all cases, we found the same qualitative results: There is an optimal mutation bias at which the fitness is maximal, such that for very small populations the optimal bias is towards AT, and for intermediate populations the optimal bias is towards GC. We then studied in detail the fitness function Eq. (2). Changing the neutrality exponent does not modify the qualitative results as long as the combination of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e512.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e513.jpg is in the non-neutral regime. Experiments on the evolution of small populations [13], [14] and computational studies of protein folding stability [12] suggest that stability does depend on population size for populations subject to repeated bottlenecks, so that for such populations it is justified to assume that the non-neutral regime is the relevant evolutionary regime. We also varied the neutral thresholds An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e514.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e515.jpg by more than 20%, finding that they do not change the qualitative behavior, although they have a quantitative influence on the optimal GC usage. We observed more important quantitative changes when we changed the contact energy parameters, but even in this case the gross qualitative features of the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e516.jpg versus An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e517.jpg relationship remain valid.

Meta-population evolution of the optimal bias

The result that the mutation bias directly influences the fitness that a population can attain in its evolution suggests the intriguing possibility that there may be a feedback between mutation and selection such that a particular mutation bias favors optimal protein folding stability, and selection may favor the replication machinery yielding this optimal mutation bias. Nevertheless, the selective advantage of evolving with the optimal GC usage is only apparent after a sufficiently large number of substitutions in protein coding genes. A mutant for GC usage would have a very low selective advantage during the first generations, and therefore its fixation would be a matter of almost neutral genetic drift. After the mutant is fixed, however, our model predicts that the population evolving with optimal bias will accumulate a sufficiently high selective advantage to take over populations with a less favourable GC usage when they, or their hosts in the important case of endosymbiotic bacteria, come to compete. Therefore, we expect this meta-population selection to almost deterministically favour the selection of the strain with optimal GC usage in contrast to the almost neutral fixation of a mutant with optimal GC usage within a single population. Thus the optimal mutation bias can facilitate the selection of more stable proteins and, on a longer time scale, selection at the meta-population level may favor the replication machinery that is most suitable to protein stability.

The population sizes at which we find the maximum of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e518.jpg in our model are of the order of a few hundreds individuals for An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e519.jpg. These values appear very small compared with real bacterial populations, even if they tend to grow rapidly for very high or very low neutrality exponent An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e520.jpg. We may reconcile our model with biology if we notice that the effective population size is not the same as the total number of individuals of a species. Berg [42] showed that, if a small number of individuals often colonize new habitats with colonization probability almost independent of the founders fitness, the effective population size is given by the number of generations between two colonization events. This is a very small number for obligatory endosymbiotic and parasitic bacteria, and it may also be small for facultative parasites or symbionts, and even for the paradigm of a free living bacterium such as Escherichia coli for which Berg [42] estimated an effective population of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e521.jpg individuals.

The meta-population structure of bacterial species raises the question of whether the molecular evolution properties of a species such as the codon usage bias and the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e522.jpg ratio are primarily determined by the effective size of a local population or by the global size of the meta-population. This is an important question that requires modelling the meta-population dynamics and the different levels of selection that are relevant for it. Our opinion is that both population sizes influence the evolutionary dynamics, and that, despite the losses of stability of small local populations can be in part compensated at the meta-population level, the influence on evolution of the local population size remains important even taking into account these corrections, so that observables such as codon usage bias and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e523.jpg strongly reflect the local structure of the population.

Comparison with observed mutation bias

The distribution of GC content observed in bacterial genomes is remarkably broad. We assume here, as it is widely believed, that these differences in the GC content are mainly determined by different mutation pressures [18], [19]. The third codon position, where a shift from A to G and from C to T does not change the coded amino acid in most cases, is thought to strongly reflect the mutation bias. However, the GC content at third codon position is strongly correlated with the GC content at first and second codon position [20], [21], and through this correlation, the mutation bias influences the properties of the protein sequence, most notably its hydrophobicity [12], [22]. This is surprising, since hydrophobicity is considered the main determinant of folding stability [23], and it is expected to be finely tuned since the protein has to avoid unfolding on one hand, and misfolding and aggregation on the other hand (of course this balance is very different for membrane proteins, which are not considered here). One possible interpretation is that, due to the trade-off between unfolding and misfolding, the hydrophobicity is to some extent neutral so that it is possible to modify it without significantly affecting the global fitness of the protein. Our results suggest a different interpretation: There may be an optimal range of hydrophobicity, but this range may be different for different values of protein stability. So proteins with low stability, as those found in small populations, may tend to be more hydrophobic than proteins with high stability as those found in large populations, hence leading to a preference for a lower GC usage in their evolution.

Our model predicts that species with large population size will tend to evolve without mutation bias (GC usage equal to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e524.jpg), whereas species with small and intermediate populations will tend to present such a bias, either towards AT or towards GC. This prediction is in qualitative agreement with two independent estimates of effective population size based on optimal codon usage and on the ratio between non-synonymous and synonymous substitutions represented in Fig. 9, and with a computational comparison of unfolding and misfolding stabilities in orthologous bacterial proteins, see Fig. 11. Of course bacterial genomes are rather complex, and we do not expect the mechanism proposed here to explain their GC content as the result of a single factor, population size. Another important factor influencing the GC content has been identified in a previous statistical study, which demonstrated that aerobiosis is an important determinant of GC rich genomes [64]. This interesting result is not in contradiction with our model, since many bacteria with small GC content tend to have an intracellular lifestyle, which in turn can make them anaerobic and at the same time reduce their effective population size.

As mentioned above, the proposed relationship between low GC content and small population size is consistent with the known fact that most bacterial species that adopted an intracellular lifestyle shifted their mutation spectrum towards AT with respect to their free living relatives [26]. This AT bias is, in most cases, the consequence of the loss of repair genes. For instance, three out of the four sequenced species of Buchnera lost the gene mutH, which in Escherichia coli is responsible of repairing the replication errors produced by methylation of cytosine that causes C to T mutation [65]. Moran proposed that this loss of repair genes and the consequent mutation bias is a selectively nearly neutral event in the evolution of endosymbionts [9]. Nevertheless, the results presented here suggest that this shift has important consequences on the folding properties of the whole proteome. In fact, a strong AT bias, together with reduced population size, is expected to produce severe misfolding problems, as indicated by the low predicted misfolding stability of proteins of intracellular bacteria with respect to orthologous ones in free living bacteria [12], and by the observed positive selection and over-expression of molecular chaperones in endosymbiotic bacteria [66], which is an expensive but effective strategy to reduce misfolding problems. Interestingly, it has been found that the fitness observed in an experimental population subject to frequent bottlenecks can be in part recovered by over-expressing chaperones [15]. Nevertheless, AT bias also enhances stability with respect to unfolding, and the results presented here suggest that its influence on fitness is globally positive for small populations.

The relationship between small population size and GC richness is even less expected. Only a few out of several prokaryotic species having high GC content are obligatory intracellular bacteria, such as for instance Mycobacterium leprae, and some are facultative pathogens or plants associated symbionts. Our results suggest the intriguing possibility that they tend to have small population size, although larger than for obligatory endosymbionts. To test this prediction, we estimated the population size using optimal codon usage [44], which has often been used to estimate population sizes. There are several caveats: The selective advantage of optimal codon usage strongly varies from one gene to another, and from one species to another. However, it is expected that the average codon usage bias estimated on the whole genome is correlated with population size. The optimal codon usage is computed subtracting the average mutation background, therefore it should not be trivially influenced by mutation bias. We found significantly reduced selection for optimal codon usage in bacteria evolving with large mutation bias compared to those with moderate or no bias, supporting our prediction that the former are characterized by smaller effective population size. Furthermore, we tested the relationship between GC content and effective population size estimating the latter through the ratio between non-synonymous to synonymous substitutions computed by Daubin and Moran [46] for entire bacterial genomes. This analysis presents important caveats. For instance, the non-synonymous substitution rate has been shown to depend on the time separation between two species [47][49]. We tackled this point by eliminating values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e525.jpg estimated at short timescales, which are known to be strongly overestimated. Given the above, it is remarkable that the qualitative picture provided by this measure qualitatively coincides with the one obtained analysing optimal codon usage. Both measures strongly support the prediction of our model that species with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e526.jpg are characterized by larger effective population size. Nevertheless, among species presenting large mutation bias, those with bias towards GC are estimated through the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e527.jpg measure to have smaller effective population than those with bias towards AT, which is in contrast with our prediction. This point is worth further investigation taking into account more carefully the time dependency of the An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e528.jpg estimate [48].

Of course, there exist several exceptions to these predictions, as there are several other factors, some already identified [64], [67] and others still unknown, that influence the differences in GC content of prokaryotic species. One remarkable exception to the association between intracellularity and low GC content is the genome of the endosymbiotic bacterium Hodgkinia cicadicola, very recently sequenced by Moran's group [68]. This genome is extremely reduced (144 kb), as generally observed for endosymbiotic bacteria, but it shows GC content of 58%, which came as a big surprise since it is probably the most serious exception to the association between genome size and GC content. This genome also challenges the association between endosymbiotic bacteria and AT bias. It has been suggested that Hodgkinia belongs to the Rhizobiales division of alpha proteobacteria, characterized by high GC content. Interestingly, the genetic code of Hodgkinia underwent a modification such that UGA codes for Tryptophan instead of Stop. This modification is expected to ease the evolution of proteins that are stable with respect to misfolding. Consistently with this expectation, we found that the optimal GC usage for small populations slightly increases when this alternative genetic code is used in simulations, but this effect is too small to reconcile the GC content of Hodgkinia with its expected small effective population size (data not shown). Further research is needed to identify the origin of the GC content in this genome that lacks any repair gene [68]. Nevertheless, the association between intracellular lifestyle and AT bias, despite not being deterministic as demonstrated by this counterexample, is still strongly significant.

A second exception is represented by Prochlorococcus marinus, a very abundant species of small marine cyanobacteria [69], [70]. It is expected that this species has a very large population size, which is in agreement with a recent estimate of its An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e529.jpg ratio [46]. 11 out of 13 fully sequenced strains of this cyanobacterium present low GC content, in the range between 30 to 38 percent, apparently contradicting the association between large population size and lack of mutation bias. However, the two remaining strains have GC content of 50%, as expected according to our model, and one of these was used to estimate the small An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e530.jpg ratio that supports the large population size. Prochlorococcus has a complex meta-population structure in which the strains with 50% GC content, characterized by large genomes, appear to act as gene reservoirs. These strains are also characterized by a larger cell size than other Prochlorococcus strains, which the authors describe as “a feature that may have led to their lower isolation recovery due to the filtration step most often used to separate Prochlorococcus from Synechococcus. Hence, there are probably more LL-adapted Prochlorococcus strains with cell and genome sizes similar to those of Synechococcus thriving deep in the euphotic zone. This is apparently confirmed by the dominance of this ecotype at the base of the euphotic zone in the Atlantic Ocean, as revealed by quantitative PCR data” [70]. These strains with large genomes and without mutation bias are found at considerable depth in the ocean and thus at low oxygen pressure. There seems to be a positive association between ocean depth and GC content for Prochlorococcus strains, thus a negative association between oxygen pressure and GC content, opposite to the observed general association between oxygen and GC content [64]. Comparative analysis of the sequenced Prochlorococcus strains will be necessary to test the hypothesis that there is an association between the GC content and the population size of these strains. Consistent with this possible association, it was found that in the MED4 strain, characterized by the smallest GC content among all Prochlorococcus strains, translational selection does not shape the codon usage variation among the genes in this organism [71].

Conclusions

We have shown here that the AT mutation bias can increase the fitness associated with essential proteins if the population size is very small. The same happens with GC mutation bias for intermediate population. These results suggest that the mutation bias is not selectively neutral, but it may be the preferred outcome for the evolution of small populations. We found a deep interplay between the estimated effective population size and the GC content that is consistent with the predictions of our model. Of course this association is not deterministic, since many other factors influence the GC content. However, the influence of population size is an intriguing one that we believe is worth further investigation. Thus, we hope that this proposal will be subject to experimental test in the future.

Materials and Methods

Folding stability

As in our previous work, the unfolding free energy of a protein with sequence An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e531.jpg and contact matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e532.jpg if the minimal interatomic distance between residues An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e533.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e534.jpg is below An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e535.jpg, 0 otherwise, is defined as

equation image
(9)

where An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e537.jpg is the contact interaction matrix determined in [72], An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e538.jpg was determined fitting Eq. (9) to a set of experimentally measured unfolding free energy (UB, unpublished) and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e539.jpg is protein length. Although rather simple, this model is accurate enough to allow quantitative predictions of the folding free energy of small proteins that fold with two-state thermodynamics (the correlation coefficient between experimental and predicted free energy is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e540.jpg over a representative test set of 20 proteins, UB, unpublished result) and of the stability effect of mutations (correlation coefficient An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e541.jpg over a set of 195 mutations, UB, unpublished result). This is comparable to state-of-the-art programs such as Fold-X [73]. However, the computational simplicity of the model makes it affordable to use it for simulating very long evolutionary trajectories with a large number of parameters, which would not be possible using other tools.

The normalized energy gap An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e542.jpg measures how alternative compact conformations are higher in energy than the native, and it is defined using the random energy model [74], [75] as

equation image
(10)

with An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e544.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e545.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e546.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e547.jpg, and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e548.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e549.jpg are the mean and standard deviation of the interaction energy of both native and non-native contacts in sequence An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e550.jpg.

Protein structures

We studied five proteins with different size and secondary structures: Phosphocarrier protein of E.Coli (85 amino acids, PDB id. 1opd), Lysozyme of G.Gallus (129 amino acids, PDB id. 3lzt), ATP synthase epsilon chain of E.Coli (135 amino acids, PDB id. 1aqt), Triose Phosphate Isomerase of E.Coli (255 amino acids, PDB id. 1tre) and Tryptophan Synthase alpha chain of S. Typhimurium (260 amino acids, PDB id. 1a50). When not otherwise stated, we exemplify our results with the structure of the protein lysozyme.

Mutation process

Mutations are modelled through the HKY process [28], in which the mutation rate from nucleotide An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e551.jpg to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e552.jpg, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e553.jpg, is An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e554.jpg if An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e555.jpg is a transition, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e556.jpg if it is a transversion. The transition/transversion ratio is fixed at An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e557.jpg. The microscopic rate An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e558.jpg is assumed to be very small and it does not affect the results. We further assume An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e559.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e560.jpg (Chargaff second parity rule), so that the only parameter of the mutation model is the stationary GC content, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e561.jpg, which we call GC usage.

Simulation of the evolutionary process

Simulations were performed starting from the native sequence, which was changed through random mutations subject to the acceptance probability Eq. (1) computed using the estimated folding stabilities. We checked that simulations converged in all cases after a number of accepted substitutions not larger than a few times the protein length An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e562.jpg, and discarded the first An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e563.jpg steps of the trajectory for collecting statistics. The simulations were run until An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e564.jpg accepted substitutions were collected, which makes it rather cumbersome to simulate large populations for which the acceptance rate is small. For each set of parameters we run 10 independent simulations in order to evaluate the statistical error.

At every step, we randomly draw one mutating DNA site An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e565.jpg with probability dependent on the nucleotide An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e566.jpg that occupies it, An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e567.jpg, and we draw a new nucleotide An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e568.jpg with probability proportional to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e569.jpg. The mutation is then translated to the amino acid sequence, whose stability is computed through Eq. (9) and (10) from which we obtain fitness through Eq. (2). The fitness is compared to the one of the current wild type sequence and the mutation is accepted with probability given by Eq. (1).

Optimal mutation bias

For fixed An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e570.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e571.jpg the equilibrium fitness An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e572.jpg is simulated for 9 GC usages from An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e573.jpg to An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e574.jpg and the results are fitted to a cubic function, from which we obtain the optimal An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e575.jpg at the point where the first derivative vanishes. If An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e576.jpg is monotonically increasing or decreasing the maximum (minimum) An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e577.jpg is chosen. To estimate the error, we estimated An external file that holds a picture, illustration, etc.
Object name is pcbi.1000767.e578.jpg from 10 independent simulations, and we computed mean and standard error of the mean.

Supporting Information

Text S1

Supporting figures and analytic developments

(0.23 MB PDF)

Acknowledgments

We acknowledge contributions of Andreas Buhr in early stages of this work.

Footnotes

The authors have declared that no competing interests exist.

UB acknowledges financial support from the Spanish Science and Innovation Ministry through the Ramón y Cajal program and through the projects BIO2008-04384 and CSD2006-00023, and a stay at the Aspen Center for Physics where a first version of this work was written. Our collaboration was facilitated through the program “Acciones Integradas Espana-Alemania” of the Spanish Science and Innovation Ministry, project HA2006-0044, and of the Deutscher Akademischer Austauschdienst project D/06/12848. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624–626. [PubMed]
2. Kimura M. The neutral theory of molecular evolution. Cambridge Univ. Press; 1983.
3. Taverna DM, Goldstein RA. Why are proteins marginally stable? Proteins. 2002;46:105–109. [PubMed]
4. Muller HJ. Some Genetic Aspects of Sex. American Naturalist. 1932;66:118–138.
5. Wright SG. The distribution of gene frequencies in populations of polyploids. Proc Natl Acad Sci USA. 1938;24:372–377. [PubMed]
6. Fisher RA. The genetical theory of natural selection. Dover, New York: 1958.
7. Ohta T. Role of very slightly deleterious mutations in molecular evolution and polymorphism. Theor Pop Biol. 1976;10:254–275. [PubMed]
8. Graur D, Li WH. Fundamentals of molecular evolution. Sunderland: Sinauer; 2000.
9. Moran NA. Accelerated evolution and Muller's ratchet in endosymbiotic bacteria. Proc Natl Acad Sci USA. 1996;95:4458–4462. [PubMed]
10. Itoh T, Martin W, Nei M. Acceleration of genomic evolution caused by enhanced mutation rate in endocellular bacteria. Proc. Natl Acad Sci USA. 2002;99:12944–12948. [PubMed]
11. Lambert DJ, Moran NA. Deleterious mutations destabilize ribosomal RNA in endosymbiotic bacteria. Proc Natl Acad Sci USA. 1998;95:4458–4462. [PubMed]
12. Bastolla U, Moya A, Viguera E, van Ham RCHJ. Genomic determinants of protein folding thermodynamics. J Mol Biol. 2004;343:1451–1466. [PubMed]
13. Duarte E, Clarke D, Moya A, Domingo E, Holland J. Rapid fitness losses in mammalian RNA virus clones due to Muller's ratchet. Proc Natl Acad Sci USA. 1992;89:6015–6019. [PubMed]
14. Novella IS, Dutta RN, Wilke CO. A linear relationship between fitness and the logarithm of the critical bottleneck size in vesicular stomatitis virus populations. J Virol. 2008;82:12589–12590. [PMC free article] [PubMed]
15. Fares MA, Ruiz-Gonzalez MX, Moya A, Elena SF, Barrio E. Endosymbiotic bacteria: GroEL buffers against deleterious mutations. Nature. 2002;417:398. [PubMed]
16. Berg J, Willmann S, Lässig M. Adaptive evolution of transcription factor binding sites. BMC Evol Biol. 2004;4:42. [PMC free article] [PubMed]
17. Sella G, Hirsh AE. The application of statistical physics to evolutionary biology. Proc Natl Acad Sci USA. 2005;102:9541–9546. [PubMed]
18. Muto A, Osawa S. The guanine and cytosine content of genomic DNA and bacterial evolution. Proc Natl Acad Sci USA. 1987;84:166–169. [PubMed]
19. Chen SL, Lee W, Hottes AK, Shapiro L, McAdams H. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci USA. 2004;101:3480–5. [PubMed]
20. Sueoka N. Correlation between base composition of the deoxyribonucleic acid and amino acid composition of proteins. Proc Natl Acad Sci USA. 1961;47:469–478. [PubMed]
21. Bernardi G, Bernardi G. Codon usage and genome composition. J Mol Evol. 1985;24:1–11. [PubMed]
22. D'Onofrio G, Jabbari K, Musto H, Bernardi G. The correlation of protein hydropathy with the base composition of coding sequences. Gene 1999. 1999;238:3–14. [PubMed]
23. Kauzmann W. Some factors in the interpretation of protein denaturation. Adv Protein Chem. 1959;14:1–63. [PubMed]
24. Uversky VN. Protein folding revisited. A polypeptide chain at the folding – misfolding – nonfolding cross-roads: Which way to go? Cell Mol Life Sci. 2003;60:1852–1871. [PubMed]
25. Bastolla U, Porto M, Roman HE, Vendruscolo M. A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol. 2006;6:43. [PMC free article] [PubMed]
26. Silva FLatorre, Gomez-Valero AL, Moya A. Genomic Changes in Bacteria: From Free-Living to Endosymbiotic Life. Structural Approaches to Sequence Evolution. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Springer; 2008.
27. Durrett R. Probability models for DNA sequence evolution. Springer; 2002.
28. Hasegawa M, Kishino H, Yano T. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985;22:160–174. [PubMed]
29. Bastolla U, Porto M, Roman HE, Vendruscolo M. Lack of self-averaging in neutral evolution of proteins. Phys Rev Lett. 2002;89:208101. [PubMed]
30. Bastolla U, Porto M, Roman HE, Vendruscolo M. Statistical properties of neutral evolution. J Mol Evol. 2003;57:S103–S119. [PubMed]
31. Govindarajan S, Goldstein RA. On the thermodynamic hypothesis of protein folding. Proc Natl Acad Sci USA. 1998;95:5545–5549. [PubMed]
32. Bornberg-Bauer E, Chan HS. Modeling evolutionary landscapes: Mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci USA. 1999;96:10689–10694. [PubMed]
33. Babajide A, Hofacker IL, Sippl MJ, Stadler PF. Neutral networks in protein space. Fol Des. 1997;2:261–269. [PubMed]
34. Bussemaker HJ, Thirumalai D, Bhattacharjee JK. Thermodynamic stability of folded proteins against mutations. Phys Rev Lett. 1997;79:3530–3533.
35. Tiana G, Broglia RA, Roman HE, Vigezzi E, Shakhnovich EI. Folding and misfolding of designed proteinlike chains with mutations. J Chem Phys. 1998;108:757–761.
36. Mirny LA, Abkevich VI, Shakhnovich EI. How evolution makes proteins fold quickly. Proc Natl Acad Sci USA. 1998;95:4976–4981. [PubMed]
37. Dokholyan NV, Shakhnovich EI. Understanding hierarchical protein evolution from first principles. J Mol Biol. 2001;312:289–307. [PubMed]
38. Parisi G, Echave J. Structural constraints and emergence of sequence patterns in protein evolution. Mol Biol Evol. 2001;18:750–756. [PubMed]
39. DePristo MA, Weinreich DM, Hartl DL. Missense meanderings in sequence space: a biophysical view of protein evolution. Nat Rev Genet. 2005 Sep;6(9):678–87. [PubMed]
40. Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH. Thermodynamic prediction of protein neutrality. Proc Natl Acad Sci U S A. 2005;102:606–611. [PubMed]
41. Godzik A, Koli ski A, Skolnick J. Are proteins ideal mixtures of amino acids? Analysis of energy parameter sets. Protein Sci. 1995;4:2107–17. [PubMed]
42. Berg OG. Selection Intensity for Codon Bias and the Effective Population Size of Escherichia coli. Genetics. 1996;142:1379–1382. [PubMed]
43. Petit N, Barbadilla A. Selection efficiency and effective population size in Drosophila species. J Evol Biol. 2009;22:515–26. [PubMed]
44. dos Reis M, Savva R, Wernisch L. Solving the riddle of codon usage preferences: A test for translational selection. Nucl Ac Res. 2004;32:5036–5044. [PMC free article] [PubMed]
45. Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucl Ac Res. 2005;33:1141–1153. [PMC free article] [PubMed]
46. Daubin V, Moran NA. Comment on “The Origins of Genome Complexity”. Science. 2004;306:978. [PubMed]
47. Ho SY, Phillips MJ, Cooper A, Drummond AJ. Time dependency of molecular rate estimates and systematic overestimation of recent divergence times. Mol Biol Evol. 2005;22:1561–8. [PubMed]
48. Rocha EP, Smith JM, Hurst LD, Holden MT, Cooper JE, Smith NH, Feil EJ. Comparisons of dN/dS are time dependent for closely related bacterial genomes. J Theor Biol. 2006;239:226–35. [PubMed]
49. Peterson GI, Masel J. Quantitative prediction of molecular clock and Ka/Ks at short timescales. Mol Biol Evol. 2009 doi 10.1093/molbev/msp175. [PMC free article] [PubMed]
50. Denamur E, Matic I. Evolution of mutation rates in bacteria. Mol Microbiol. 60:820–7. [PubMed]
51. Loh E, Salk JJ, Loeb LA. Optimization of DNA polymerase mutation rates during bacterial evolution. Proc Natl Acad Sci U.S.A. 2010 [Epub ahead of print] [PubMed]
52. Nilsson M, Snoad N. Optimal mutation rates in dynamic environments. Bull Math Biol. 64:1033–43. [PubMed]
53. Brumer Y, Shakhnovich EI. Host-parasite coevolution and optimal mutation rates for semiconservative quasispecies. Phys Rev E Stat. 2004;69:061909. [PubMed]
54. Clune J, Misevic D, Ofria C, Lenski RE, Elena SF, Sanjuán R. Natural selection fails to optimize mutation rates for long-term adaptation on rugged fitness landscapes. PLoS Comput Biol. 2008;4:e1000187. [PMC free article] [PubMed]
55. Drake JW. Avoiding dangerous missense: thermophiles display especially low mutation rates. PLoS Genet. 2009;5:e1000520. [PMC free article] [PubMed]
56. Zeldovich KB, Chen P, Shakhnovich EI. Protein stability imposes limits on organism complexity and speed of molecular evolution. Proc Natl Acad Sci USA. 2007;104:16152–16157. [PubMed]
57. Chen P, Shakhnovich EI. Lethal mutagenesis in viruses and bacteria. Genetics. 2009;183:639–50. [PubMed]
58. van Nimwegen E, Crutchfield JP, Huynen M. Neutral evolution of mutational robustness. Proc Natl Acad Sci USA. 1999;96:9716–9720. [PubMed]
59. Wilke CO. Molecular clock in neutral protein evolution. BMC Genetics. 2004;5:25. [PMC free article] [PubMed]
60. Taverna DM, Goldstein RA. Why are proteins so robust to site mutations? J Mol Biol. 2002;315:479–84. [PubMed]
61. Bloom JD, Lu Z, Chen D, Raval A, Venturelli OS, Arnold FA. Evolution favors protein mutational robustness in sufficiently large populations. BMC Biology. 2007;5:29. [PMC free article] [PubMed]
62. Bloom JD, Raval A, Wilke CO. Thermodynamics of neutral protein evolution. Genetics. 2007;175:255–66. [PubMed]
63. Eigen M. Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften. 1971;58:465–523. [PubMed]
64. Naya H, Romero H, Zavala A, Alvarez B, Musto H. Aerobiosis increases the genomic guanine plus cytosine content (GC%) in prokaryotes. J Mol Evol. 2002;55:260–264. [PubMed]
65. van Ham RC, Kamerbeek J, Palacios C, Rausell C, Abascal F, Bastolla U, Fernández JM, Jiménez L, Postigo M, Silva FJ, Tamames J, Viguera E, Latorre A, Valencia A, Morán F, Moya A. Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci USA. 2003;100:581–586. [PubMed]
66. Fares MA, Moya A, Barrio E. GroEL and the maintenance of bacterial endosymbiosis. Trends Genet. 2004;20:413–416. [PubMed]
67. Musto H, Naya H, Zavala A, Romero H, Alvarez-Val n F, Bernardi G. Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem Biophys Res Commun. 2006;347:1–3. [PubMed]
68. McCutcheon JP, McDonald BR, Moran NA. Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genet. 2009;5:e1000565. [PMC free article] [PubMed]
69. Kettler, et al. Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 3:e231. [PubMed]
70. Scanlan, et al. Ecological Genomics of Marine Picocyanobacteria. Microbiology and Molecular Biology Reviews. 73:249–299. [PMC free article] [PubMed]
71. Banerjee T, Ghosh TC. Gene expression level shapes the amino acid usages in Prochlorococcus marinus MED4. J Biomol Struct Dyn. 2006;23:547–54. [PubMed]
72. Bastolla U, Farwer J, Knapp EW, Vendruscolo M. How to guarantee optimal stability for most representative structures in the protein data bank. Proteins. 2001;44:79–96. [PubMed]
73. Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol. 2002;320:369–87. [PubMed]
74. Derrida B. Random Energy Model: an exactly solvable model of disordered systems. Phys Rev B. 1981;24:2613–2626.
75. Shakhnovich EI, Gutin AM. Formation of unique structure in polypeptide chains. Theoretical investigation with the aid of a replica approach. Biophys Chem. 1989;34:187–199. [PubMed]

Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science