It has long been established that DNA composition directly affects the amino acid (aa) composition of proteins. The bulk correlation between GC content and aa composition was demonstrated early on, while more subtle effects, such as positioning on the leading or lagging strand, were detected more recently. Our results demonstrate that, in addition to these effects, the composition of the coding sequence directly influences the future evolution of proteins. Moreover, we show that this principle can easily be exploited to widen the short-term evolutionary perspectives of any given protein.
A single round of directed evolution of the two synonymous sequences aacWT and aacELP led to the isolation of five mutations modifying the resistance spectrum of the encoded AAC(6′)-Ib enzyme. Among these mutations, three were accessible through a single mutation from only one of the two synonymous sequences, and were indeed isolated only from the corresponding mutant libraries. These results, together with the observed substitution pattern sketched in , clearly show that parallel directed evolution of specifically designed synonymous sequences permits a wider exploration of the local protein landscape. In the framework of a serial directed evolution experiment, once a beneficial mutation has been identified in a given sequence, it can easily be introduced into the other synonymous sequences by site-directed mutagenesis before proceeding to the next round of evolution.
We emphasize that an ELP-designed sequence does not, per se, improve the evolvability of the encoded protein. The strategy we propose is rather a hypothesis-free approach to expanding the evolutionary perspectives of existing proteins: parallel directed evolution of the wild-type sequence together with synthetic sequences increases the overall odds of identifying advantageous mutations. Indeed, what matters from a biotechnological point of view is not the evolvability of a given DNA coding sequence, but the ability to explore the corresponding protein sequence space extensively. Conceptually, this dissociates the polypeptide product of interest from the actual nucleic acid sequence from which it originates.
As no aa displays more than four codons with different REP, four synonymous sequences are sufficient to explore all the possibilities allowed by this principle, assuming independence between positions along the sequence (no epistasis). Obviously, a huge number of synonymous sequences would be needed to tackle the combinatorial association of codons between positions, but parallel evolution of four sequences seems a tractable alternative. The ELP software can generate up to three alternative sequences whose evolutionary perspectives are as different as possible from each other and from the initial sequence at each codon. The use of such a set of sequences significantly reduces the number of mutations necessary for extensive landscape exploration (Figure S3), and consequently decreases the required library size by several orders of magnitude (see supporting Text S1). Effective mutational spectra vary markedly between protocols, and in some cases they can be controlled. This can be used to discriminate between otherwise equivalent alternative codons. An improved version of the ELP program will be developed to take such parameters into account.
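To give a feel for why the number of mutations per clone weighs so heavily on library size, the short Python sketch below counts the distinct variants of a sequence that carry exactly k nucleotide substitutions. It is only a back-of-the-envelope illustration and not the calculation of Text S1; the function name `variants_with_k_substitutions` and the 600-nucleotide length are hypothetical choices made for this example.

```python
from math import comb

def variants_with_k_substitutions(length, k):
    """Number of distinct sequences differing from a reference of `length`
    nucleotides at exactly k positions (3 alternative bases per position)."""
    return comb(length, k) * 3 ** k

# Hypothetical gene length, chosen only for illustration.
GENE_LENGTH = 600

singles = variants_with_k_substitutions(GENE_LENGTH, 1)
doubles = variants_with_k_substitutions(GENE_LENGTH, 2)
print("single substitutions:", singles)
print("double substitutions:", doubles)
print("ratio double/single:", doubles / singles)
```

Under these assumptions, each additional substitution required per clone multiplies the number of variants to be sampled by roughly the length of the gene, so any design that makes a target aa accessible in one step rather than two reduces the screening effort considerably.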
Evolvability and robustness are tightly linked, with exploration of neutral networks potentially fuelling adaptive evolution. Hence, methods that improve spreading across a protein's neutral space promote its evolvability. In this study, we chose the most open and straightforward approach to design the neutral alternative sequence. The REP calculation does not rely on any particular assumption about the chemistry of the protein: every aa accessible by a single mutation is counted as one unit (Hamming metric), and only synonymous codons were considered as potential alternatives. However, any specific knowledge of a protein's structure/function relationships can be incorporated into the calculation by applying different metrics to specific residues or regions of the protein. When available, in silico predictions might also be used to include non-synonymous, but nonetheless neutral, mutations as potential alternatives in the REP calculations. This latter strategy, however, is risky because a single mispredicted substitution can impair the enzyme activity and, with it, the derived library. Apart from improving directed evolution of proteins, synonymous codon replacement might alternatively be used to prevent the appearance of previously identified deleterious mutations, thereby favoring protein robustness in specific biotechnological applications.
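The counting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ELP software: the function names (`reachable`, `hamming`, `most_different_synonym`) are hypothetical, the standard genetic code is assumed, and stop codons as well as the encoded aa itself are excluded from the set of accessible amino acids, which is a choice made for this sketch only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def reachable(codon):
    """Amino acids accessible from `codon` by one nucleotide substitution.
    Assumption of this sketch: stops and the encoded aa are not counted."""
    own = CODON_TABLE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                neighbor = codon[:pos] + base + codon[pos + 1:]
                aa = CODON_TABLE[neighbor]
                if aa != own and aa != "*":
                    found.add(aa)
    return frozenset(found)

def hamming(set_a, set_b):
    """Each aa present in one set but not the other counts as one unit."""
    return len(set_a ^ set_b)

def most_different_synonym(codon):
    """Synonymous codon whose accessible aa set differs most from `codon`'s.
    Returns `codon` itself when no synonym exists (Met, Trp)."""
    own = CODON_TABLE[codon]
    reference = reachable(codon)
    synonyms = [c for c, aa in CODON_TABLE.items() if aa == own and c != codon]
    if not synonyms:
        return codon
    return max(synonyms, key=lambda c: hamming(reachable(c), reference))

print(sorted(reachable("CTT")))
print(sorted(reachable("CTA")))
print(most_different_synonym("CTT"))
```

With these definitions, Gln is accessible from the Leu codon CTA but not from CTT, which is the kind of difference the synonymous replacement is meant to exploit. A region-specific or chemistry-aware metric could be substituted for `hamming` without changing the rest of the sketch.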
Formally, the principle presented here relies on the exploration of synonymous sequence space. It is usually assumed that this exploration depends upon mutation rate and chance (neutral drift), in which case the use of synthetic sequences saves the time necessary for these processes to occur. However, some weak sub-functional forces may also structure synonymous space and constrain evolutionary pathways in many species. The ELP strategy makes it possible to circumvent such constraints. As a case study, we focused on the L55Q substitution, which was isolated only from the synthetic sequence aacELP and was not directly accessible from the wild-type sequence aacWT. Strikingly, it is the only mutation identified in this study that is not represented in the 129 different aac6′-Ib homologous sequences deposited in the NCBI database. We identified two possible evolutionary pathways for that substitution. The longer one comprises two synonymous intermediates and can be explored by extensive drift over long time scales. The shorter one comprises only one synonymous intermediate, but the corresponding codon is very weakly used in many of the gene's host genomes.
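Pathways of this kind can be enumerated with a breadth-first search over synonymous codons. The Python sketch below is illustrative only: the function names are hypothetical, the standard genetic code is assumed, and the start codon TTA is an assumption made for the example, since the wild-type codon at position 55 is not stated in this section.

```python
from collections import deque
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def neighbors(codon):
    """All codons one nucleotide substitution away from `codon`."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]

def shortest_path(start, target_aa, avoid=()):
    """Shortest chain of single substitutions from `start` to any codon for
    `target_aa`, passing only through codons synonymous with `start`.
    Codons listed in `avoid` (e.g. weakly used ones) are never visited."""
    own = CODON_TABLE[start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        for nxt in neighbors(path[-1]):
            if CODON_TABLE[nxt] == target_aa:
                return path + [nxt]
        for nxt in neighbors(path[-1]):
            if CODON_TABLE[nxt] == own and nxt not in seen and nxt not in avoid:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical start codon; "Q" is Gln.
print(shortest_path("TTA", "Q"))
print(shortest_path("TTA", "Q", avoid={"CTA"}))
```

With the assumed start codon, the unconstrained search passes through the single intermediate CTA, whereas forbidding CTA yields a route with two synonymous intermediates (TTG, then CTG) ending at CAG. This mirrors the two pathways described above, but it should not be read as a reconstruction of the authors' analysis.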
Over the last 20 years, experimental studies have reported various phenotypic effects associated with modification of codon usage: alteration of mRNA structure, modification of translation efficiency, and protein aggregation due to alteration of the folding route and final tertiary structure. Although a recent report has linked a phenotypic effect to the presence of two weakly used codons combined with a non-synonymous SNP, to the best of our knowledge no significant impact of a single rare codon has ever been described, most likely because it would involve a decrease in protein synthesis too weak to be measured accurately.
As we did not manage to measure any effect of the weakly used L55 CTA codon alone, we performed simulations of the L55Q adaptive landscape exploration in which we assumed various fitness values associated with this codon. Not surprisingly, drift toward Gln CAA requires a substantial amount of time, even when the CTA intermediate is considered neutral. Our results show that fitness decreases that are too faint to be detected in vivo can strongly hinder the passage through weakly used codons. Eventually, longer but neutral pathways can emerge as more probable outcomes over time. The hypothetical absence of the adaptive L55Q substitution in nature would then be consistent with the relatively recent introduction of the antibiotic selective pressure. Nevertheless, we identified distant homologs in GenBank that display Leu TTG or CTG at position 55. These sequences may represent natural intermediates along the longer pathway toward Gln CAG (see bottom).
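The sensitivity of such a passage to very small fitness costs can be illustrated with a standard population-genetics result rather than with the simulations reported here. The Python sketch below uses the exact fixation probability of a single mutant in a Moran process, (1 - 1/r) / (1 - 1/r^N), where r is the relative fitness of the mutant and N the population size. The function name, the population size, and the list of costs are hypothetical, and the sketch assumes that the intermediate codon must reach fixation before the next mutation arises, which ignores other routes such as crossing via rare, unfixed intermediates.

```python
import math

def moran_fixation_probability(r, n):
    """Fixation probability of one mutant with relative fitness `r`
    in a Moran population of constant size `n`."""
    if r == 1.0:
        return 1.0 / n  # neutral case
    log_r = math.log(r)
    # (1 - 1/r) / (1 - 1/r**n), written with expm1 for numerical stability.
    return math.expm1(-log_r) / math.expm1(-n * log_r)

# Hypothetical population size and fitness costs, for illustration only.
POPULATION_SIZE = 100_000
neutral = moran_fixation_probability(1.0, POPULATION_SIZE)

for cost in (0.0, 1e-5, 1e-4, 1e-3):
    p = moran_fixation_probability(1.0 - cost, POPULATION_SIZE)
    print(f"cost={cost:g}  fixation probability relative to neutral={p / neutral:.3g}")
```

In this simplified setting, a cost far below what growth assays can resolve already lowers the chance of fixing the intermediate by many orders of magnitude once the product of cost and population size is large, which is in line with the qualitative conclusion drawn above.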
If the adaptive landscape of proteins is indeed subtly structured by the codon preferences of the host genome, these constraints should be altered by higher-order evolutionary events such as horizontal gene transfer (HGT). At least in bacteria, HGT is a major factor of genome evolution, while phylogenetically distant species usually display markedly different codon usages. The introgression of a gene may compel its codon usage to conform to that of the new host, thus granting access to new adaptive pathways and offering opportunities to produce different mutants. Another intriguing prospect, which should soon become feasible considering the current intensive efforts in synthetic biology, would be to recode the full set of genes encoding a metabolic pathway, or even a whole bacterial genome. By relieving several constraints at the same time, this could unlock access to potentially adaptive solutions and enable the study of evolutionary phenomena at a larger scale.
The codon composition of a coding sequence is the outcome of its history, whether selective or contingent. It has been suggested that natural selection might actively bias the codon usage of some proteins to modulate their robustness to mutation or mistranslation. Although this latter possibility remains unclear, we demonstrated experimentally that a judicious reorganization of synonymous codons can be performed artificially to modify the evolvability of the encoded protein. This strategy allows wider exploration of the protein space while limiting both the library sizes and the amount of time usually required for genetic drift. Hence, it provides an inexpensive and powerful tool to enhance the efficiency of any directed evolution protocol.