Selenocysteine (Sec), the 21st amino acid, is incorporated into proteins through the recoding of a termination codon, an inefficient translational process mediated by a complex molecular machinery. Sec is a rare amino acid in extant proteins, chemically similar to cysteine (Cys), found in homologous position to Cys of nonselenoprotein families. Selenoproteins account for the dependence of vertebrates on environmental selenium (Se) and have an important role in several Se-deficiency diseases. Selenoproteins are poorly characterized enzymes and reports on the functional exchangeability of Sec with Cys are limited and controversial. Whether the unique role of Sec in some selenoenzymes illustrates the broader contribution of Se to protein function is unknown (Gromer S, Johansson L, Bauer H, Arscott LD, Rauch S, Ballou DP, Williams CH Jr, Schirmer RH, Arnér ES. 2003. Active sites of thioredoxin reductases: why selenoproteins? Proc Natl Acad Sci USA. 100:12618–12623). Here, we address this question from an evolutionary perspective by the simultaneous identification of the patterns of divergence in almost half a billion years of vertebrate evolution and diversity within the human lineage for the full complement of enzymatic Sec residues in these proteomes. We complete this analysis with data for the homologous Cys residues in the same genomes. Our results indicate concerted purifying selection across Sec and Cys sites in all selenoproteomes, consistent with a unique role of Sec in protein function, low exchangeability, and an unknown degree of functional divergence with Cys homologs. The distinct biochemical properties of Sec, rather than the geographical distribution of Se, global O2 levels or Sec metabolic cost, appear to play a major role in driving adaptive changes in vertebrate selenoproteomes. A better understanding of the selenoproteomes and neutral evolutionary patterns in other taxa will be necessary to fully assess the generality of this conclusion.
Patterns of amino acid changes in proteins are often interpreted as a measure of the exchangeability between amino acid pairs. Indeed, the propensity for evolutionary change from one amino acid to another has been thoroughly studied. For example, the AAindex database (Kawashima et al. 2008) contains dozens of exchangeability measures between amino acids, from purely physicochemical to purely comparative. A functional interpretation of these measures is common in evolutionary studies. This research, however, applies only to the standard 20 amino acids in proteins. Little is known about the exchangeability of selenocysteine (Sec), the 21st amino acid, in nature.
Sec is a cysteine (Cys) analogue with a selenium (Se)-containing selenol group in place of the sulfur-containing thiol group in Cys (Stadtman 1996). Sec and Cys residues occupy homologous positions, presumably serving an oxidoreductase role, in several proteins not fully functionally characterized. Although the translation of Cys codons (UGU/UGC) in these proteins is typical, a complex translational machinery is necessary to incorporate Sec at an in-frame termination codon (UGA) in selenoproteins (Driscoll and Copeland 2003). In eukaryotes, the Sec insertion sequence (SECIS), an RNA stem-loop located in the 3′ untranslated region of selenoprotein genes, recruits several trans-acting factors to recode UGA from termination to Sec insertion (Driscoll and Copeland 2003). The Sec residue encoded by the recoded UGA is inserted into the growing peptide, and translation of the protein continues until the proper termination codon.
The majority of eukaryotic and prokaryotic selenoproteins have now been found in Cys form, raising questions about the functional exchangeability of Sec with Cys, a long-standing issue in Se biology (Johansson et al. 2005). Among those clades with Se-dependent proteins, the growing knowledge of the number, distribution, and function of vertebrate selenoproteins and their Cys-containing homologs (Gromer et al. 2003; Kryukov et al. 2003; Castellano et al. 2004, 2005; Kim and Gladyshev 2005; Shchedrina et al. 2007), together with the evolutionary depth and quality of vertebrate genomes, provides a first opportunity to gain insight into this question.
The extent of exchangeability between Sec and Cys residues reflects the contribution of Sec to protein function. The long-term exchangeability between these amino acids, however, cannot be fully ascertained from functional studies on extant selenoproteins, as functional differences in present-day sequences are not a measure of fitness in natural populations (Gould and Lewontin 1979; Eyre-Walker and Keightley 2007; Nielsen et al. 2007). Therefore, the best approach to evaluate the functional exchangeability between the two amino acids is to infer the strength and mode of natural selection acting on the reciprocal exchange of residues (Williamson et al. 2005), as amino acid exchangeability is a measure of the neutrality of the substitution of two residues by one another in a protein.
Neutral patterns of selenoproteome divergence and diversity would indicate no fitness advantage or disadvantage of Sec over Cys (e.g., no distinct contribution of Sec to protein activity). Under neutrality, the expected patterns of variation in Sec and Cys sites can be inferred from population genetics theory or simulations of the evolutionary process (Castellano 2009). This constitutes the null (undisturbed by selective forces) model of Sec usage in protein evolution. Departures from neutrality are a signature of natural selection and, under some simplifying assumptions, can be interpreted as 1) selection against deleterious Sec/Cys mutations (purifying selection), which is consistent with low Sec/Cys exchangeability and denotes functional differences between Sec and Cys residues; purifying selection would result in a deficit of variation within populations and differentiation among populations, or 2) selection favoring advantageous Sec/Cys mutations (positive selection), which can be interpreted as evidence for adaptive evolution and, in the case of heterogeneous selective pressures unrelated to protein function, high Sec/Cys functional exchangeability; positive selection in some populations would result in an overall excess of variation.
Environmental, metabolic and biochemical selective pressures have been suggested to shape Sec use in proteins. Among those suggested are 1) the wide differences in dietary Se status among populations due to the worldwide variability of Se content in soils and waters (Shamberger 1981; Levander 1987; Valentine 1997), which may lead to disease due to excess or deficit of Se (Levander 1987); 2) the different Sec sensitivities to oxidation among selenoproteins and selenoproteomes due to variable O2 levels over geologic time (Leinfelder et al. 1988; Jukes 1990; Berner 2006; Berner et al. 2007); 3) the higher anabolic cost and lower translational efficiency of Sec (Berry et al. 1992; Driscoll and Copeland 2003; Mehta et al. 2004; Xu et al. 2007); and 4) the increased reactivity provided by Se (Berry et al. 1992; Rocher et al. 1992; Maiorino et al. 1995; Zhong and Holmgren 2000), which results in a possibly advantageous high catalytic activity in selenoenzymes (100- to 1000-fold more active than their Cys counterparts). Such higher enzymatic efficiency of Sec over Cys has, however, recently been challenged (Kanzok et al. 2001; Gromer et al. 2003; Kim and Gladyshev 2005) and its interpretation in terms of functional exchangeability over evolutionary time is problematic. The importance of these selective factors in selenoprotein evolution is untested, despite their common explanatory role for Sec/Cys replacements in the Se and selenoprotein literature.
Here, we study the exchangeability between Sec and Cys enzymatic residues through the analysis of the patterns of divergence among vertebrates and diversity within humans of all homologous Sec and Cys sites in these genomes. We identify concerted purifying selection across these sites in all selenoproteomes, consistent with a unique role of Sec in protein activity. The low exchangeability observed between Sec and Cys amino acids reveals a previously unappreciated degree of functional divergence between Sec- and Cys-containing enzymes, in which the distinct biochemical properties of Sec, rather than environmental or metabolic factors, may drive adaptive changes in selenoproteomes. Our conclusions represent a strong departure from the recent but prevailing view favoring ecological explanations for Sec evolution.
The human selenoproteome consists of 25 selenoproteins (Kryukov et al. 2003) and 6 paralogous genes with Cys (supplementary table S1, Supplementary Material online). In addition, four genes with Cys, orthologous to vertebrate selenoproteins, exist (supplementary table S1, Supplementary Material online). All 35 proteins were included in the divergence and diversity analyses. However, only the enzymatic Sec residue in the N-terminal domain of SelP was analyzed. Human sequences were obtained from SelenoDB (Castellano et al. 2008) at http://www.selenodb.org. This database provides the correct genomic structure of all human selenoprotein genes, which is essential to our genotyping efforts. Other more general databases only contain the mRNA sequence (e.g., Genbank) or vertebrate selenoprotein gene annotations of variable quality (e.g., Ensembl) and are not appropriate for this study.
Following the phylogeny of Nikolaev et al. (2007), one or more representatives from all major vertebrate taxa were chosen: 1) Supraprimates: Human, Chimpanzee, Macaque, and Mouse; 2) Laurasiatheria: Dog and Hedgehog; 3) Xenarthra: Armadillo; 4) Afrotheria: Elephant; 5) Marsupialia: Opossum; 6) Prototheria (Monotremata): Platypus; 7) Archosauromorpha (crocodiles, dinosaurs, and birds): Chicken; 8) Lepidosauromorpha (snakes and lizards): Lizard; 9) Amphibia (salamanders and frogs): Frog; and 10) Teleostei: Puffer fish and Zebrafish. The phylogeny encompasses 450 ± 36 My of vertebrate evolution (Hedges 2002; fig. 1). See supplementary table S3, Supplementary Material online for species scientific names.
Nonhuman selenoproteins are routinely misannotated in genomic projects as most gene annotation systems consider Sec as a stop codon. The correct gene structures and protein sequences for most selenoproteins in diverged species are not easily available and their annotation will involve extensive manual curation in the future. Therefore, orthologous residues to 25 Sec and 10 homologous Cys amino acids in human were identified as follows: 1) The 35 human proteins, organized into 19 families (supplementary table S1 in Supplementary Material online), were blasted (Gish and States 1993) against the panel of vertebrate genomes. WU Blast 2.0 parameters were E = 0.001, W = 4, and filter = seg in combination with the substitution matrices BLOSUM50, 62, or 80; 2) all hits were automatically filtered for alignments with symmetrical conservation in regions flanking Sec–Sec or Cys–Sec aligned pairs (at least 5 similar residues in both regions of 10 amino acids each); and 3) the target sequences in each alignment were blasted back against the human families. Orthology was assigned by best reciprocal hit. In each step, alignments were manually inspected and extended beyond the Sec codon if necessary. Thioredoxin-like proteins were searched without the symmetrical conservation filter due to the short sequence region beyond Sec. When available, shark, lamprey, and sea urchin sequences were used to polarize Sec/Cys states. Gene orthology and paralogy were identified through gene and species tree reconciliation (Zmasek and Eddy 2001). Orthology assignment for nonhuman or nonmammalian selenoproteins was carried out similarly.
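The symmetrical conservation filter of step 2 can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function name passes_flank_filter is hypothetical, and exact identity is used here as a stand-in for "similar" residues, which BLAST would normally score with a substitution matrix such as BLOSUM62.

```python
def passes_flank_filter(query, target, sec_col, flank=10, min_similar=5):
    """Return True if both flanks of the Sec/Cys column are conserved.

    query, target: equal-length aligned sequences ('-' marks a gap).
    sec_col: 0-based alignment column holding the Sec-Sec or Cys-Sec pair.
    Assumption: identity is used as a proxy for residue similarity.
    """
    if len(query) != len(target):
        raise ValueError("aligned sequences must have equal length")

    def similar(columns):
        # Count aligned columns with identical, non-gap residues.
        return sum(1 for i in columns
                   if query[i] == target[i] and query[i] != '-')

    left = range(max(0, sec_col - flank), sec_col)
    right = range(sec_col + 1, min(len(query), sec_col + 1 + flank))
    return similar(left) >= min_similar and similar(right) >= min_similar
```

Requiring conservation on both sides of the aligned pair is what makes the filter symmetrical: a hit that matches only upstream of the UGA codon, as would happen if the target were annotated as truncated at the stop, is rejected.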
The program Mesquite v1.12 (http://mesquiteproject.org) was used to assign optimal character states (Sec/Cys) to the proteins in the tree internal nodes using the most-parsimonious reconstruction under the Fitch parsimony criterion (Fitch 1971). The overall similarity of selenoproteomes observed among species and the small number and phylogenetic distribution of identified Sec/Cys changes makes the use of Maximum Parsimony a reasonable choice.
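For readers unfamiliar with the Fitch criterion, the bottom-up pass of the algorithm can be sketched as below. This is an illustrative sketch, not Mesquite's implementation; the tree encoding (nested 2-tuples) and the species states in the usage note are toy assumptions, and a full reconstruction would add a top-down pass to fix one state per internal node.

```python
def fitch(tree, states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> observed state ('Sec' or 'Cys').
    Returns (set of candidate states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change is needed.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1
```

With a toy tree ((("human", "mouse"), "chicken"), "zebrafish") in which only human and mouse carry Cys, the pass returns Sec as the root state with a single change, that is, one loss of Sec on the branch leading to the two mammals.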
Twelve reconstructed ancestral selenoproteome sizes throughout the vertebrate phylogeny (fig. 1) were correlated with the estimated levels of atmospheric O2 (Berner 2006; Berner et al. 2007; supplementary table S4, Supplementary Material online). To compute Spearman's rank correlation (no normality in the data assumed) given ties between ranks, we calculated Pearson's correlation coefficient on the ranks with the cor.test function of the R statistics package v2.5.1. We tested significance with a one-sided t-test using the same program.
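The handling of ties can be illustrated with a short sketch: tied observations receive the average of the ranks they span, and the correlation is then Pearson's coefficient computed on those ranks. The code below is a plain Python illustration of that idea and not the R call used in the study; the function names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho as Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here x would hold the reconstructed selenoproteome sizes and y the O2 estimates for the matching nodes; neither data set is reproduced in this sketch.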
The neutrality test is based on a simple divergence statistic D for binary Sec/Cys data. Let DSec→aa be the proportion of ancestral Sec sites that have diverged at least once from the ancestral Sec state in a phylogeny. Similarly, let DCys→aa be the proportion of ancestral Cys sites that have diverged at least once from the ancestral Cys state in a phylogeny. Then, over n ancestral sites of the same class, D is simply computed as:
$$D = \frac{1}{n}\sum_{i=1}^{n} d_i$$
where d_i = 0 if the ith site has not diverged in any species from its ancestral state, and d_i = 1 if the ith site has diverged in one or more species from its ancestral state. This expression of divergence is highly conservative with respect to deviations toward purifying selection because multiple changes at a site (expected under neutrality) are counted as one and therefore do not increase the estimate of D.
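The statistic D defined above can be computed in a few lines. The sketch below is illustrative only; the function name divergence_d and the input layout (one list of observed states per ancestral site) are assumptions made for the example.

```python
def divergence_d(site_states, ancestral):
    """Proportion of ancestral sites that diverged at least once.

    site_states: list with one entry per site; each entry lists the state
                 observed in every species for that site.
    ancestral: the ancestral state shared by all sites of this class.
    """
    if not site_states:
        raise ValueError("need at least one site")
    # d_i is 1 if any species differs from the ancestral state, else 0,
    # so several changes at the same site still count once.
    d = [1 if any(s != ancestral for s in site) else 0 for site in site_states]
    return sum(d) / len(d)
```

As a check on the arithmetic, 6 diverged sites out of 35 ancestral Sec sites give 6/35 ≈ 0.171, which agrees with the value of DSec→aa reported in the Results.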
The neutrality test is carried out comparing the empirically observed DObs with the distribution of D, which was obtained through neutral simulations of the evolution of ancestral Sec or Cys codons along the phylogeny (Nikolaev et al. 2007; fig. 1). We use a continuous time Markov Chain model of sequence evolution and assume independence between sites, a reasonable assumption because there is only one Sec or Cys codon per gene. We performed 10,000 Markov Chain Monte Carlo simulations with a modified version of the program Seq-Gen v1.3.2 (Rambaut and Grassly 1997), in which strongly deleterious mutations (those resulting in TAA or TAG stop codons) are immediately eliminated from the population and do not contribute to sequence divergence (source code and program available from SC). We used the standard Hasegawa-Kishino-Yano model of nucleotide evolution (Hasegawa et al. 1985) in which the instantaneous rate of evolution is comprised of a transition/transversion ratio (TS/TV) set to 1.8 (Rosenberg et al. 2003) and the equilibrium nucleotide frequencies set to A = 0.26, T = 0.26, C = 0.24, and G = 0.24, as estimated from 4-fold degenerate sites in 10 vertebrate species ranging from human to Takifugu (Margulies et al. 2005). Branch lengths were set to a proxy of the mean number of neutral mutations per site, the number of synonymous substitutions per site, as estimated from 4-fold degenerate sites (Margulies et al. 2005) for the set of species analyzed in this work (Margulies EH, personal communication).
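The core of such a simulation, the evolution of one codon along one branch with stop codons removed, can be sketched as follows. This is a simplified stand-in and not the modified Seq-Gen code: it simulates a single branch rather than the whole phylogeny, the function names are hypothetical, and kappa (the HKY transition rate multiplier) is treated as a caller-supplied parameter, since the conversion from the reported TS/TV ratio is not reproduced here.

```python
import random

PI = {'A': 0.26, 'C': 0.24, 'G': 0.24, 'T': 0.26}  # frequencies from the text
PURINES = {'A', 'G'}
FORBIDDEN = {'TAA', 'TAG'}  # stop codons eliminated immediately

def hky_rates(kappa):
    """HKY rates q[i][j], scaled to one expected substitution per unit time."""
    q = {}
    for i in PI:
        q[i] = {}
        for j in PI:
            if i != j:
                transition = (i in PURINES) == (j in PURINES)
                q[i][j] = PI[j] * (kappa if transition else 1.0)
    mean_rate = sum(PI[i] * sum(q[i].values()) for i in PI)
    return {i: {j: r / mean_rate for j, r in q[i].items()} for i in PI}

def evolve_codon(codon, branch_length, q, rng):
    """Gillespie simulation of one codon along one branch."""
    codon = list(codon)
    t = 0.0
    while True:
        site_rates = [sum(q[b].values()) for b in codon]
        t += rng.expovariate(sum(site_rates))  # waiting time to next mutation
        if t >= branch_length:
            return ''.join(codon)
        pos = rng.choices(range(3), weights=site_rates)[0]
        targets = q[codon[pos]]
        new_base = rng.choices(list(targets), weights=list(targets.values()))[0]
        proposal = codon[:pos] + [new_base] + codon[pos + 1:]
        if ''.join(proposal) not in FORBIDDEN:  # discard TAA/TAG mutations
            codon = proposal
```

Starting each replicate from TGA (Sec) or from TGT/TGC (Cys) and recording whether the codon at any tip differs from the ancestral class would yield one simulated value of d_i per site; repeating this over all sites and replicates builds the null distribution of D.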
The exchangeability test is also based on the statistic D. Let DSec→Cys be the proportion of ancestral Sec sites that have diverged to Cys at least once from the ancestral Sec state in a phylogeny. Similarly, let DCys→Sec be the proportion of ancestral Cys sites that have diverged to Sec at least once from the ancestral Cys state in a phylogeny. The exchangeability test is carried out comparing the observed DObs with the neutral distribution of D, which is derived from 10,000 simulations where mutations other than between Sec and Cys codons are effectively removed from the population by strong purifying selection (see above for mutational parameters). In this model, sequences evolve neutrally at a fraction of the mutation rate. Only Sec/Cys neutral substitutions contribute to selenoproteomes divergence and, therefore, this test reflects the overall proportion of deleterious Sec/Cys (probably function-altering) substitutions removed by the action of purifying selection.
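Both tests reduce to comparing an observed D with a simulated null distribution. A minimal sketch of that comparison is given below, under the assumption that the P value is the fraction of simulated D values at or below the observed one (a one-sided test for a deficit of divergence); the helper codon_class shows how codons might be grouped for the exchangeability model. Both function names are hypothetical.

```python
def codon_class(codon):
    """Classify a DNA codon as 'Sec', 'Cys', or 'other'."""
    if codon == 'TGA':
        return 'Sec'  # recoded stop codon in selenoprotein genes
    if codon in ('TGT', 'TGC'):
        return 'Cys'
    return 'other'

def lower_tail_p(d_obs, simulated_d):
    """Fraction of simulated D values that are <= the observed D."""
    if not simulated_d:
        raise ValueError("need at least one simulated value")
    return sum(1 for d in simulated_d if d <= d_obs) / len(simulated_d)
```

Under this convention, an observed value lower than all 10,000 simulated values would be reported as P < 0.0001.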
The distribution of mutation rate heterogeneity at the megabase scale is difficult to estimate, as it depends on chromosomal position, GC content, neighboring bases, efficiency of the repair system, and other factors (Ellegren et al. 2003). We investigated the robustness of our tests to nonuniform mutation rates in selenoprotein genes through simulations of the neutral process with increasing rate heterogeneity. We carried out simulations in which mutation rates among genes obey a gamma distribution with decreasing values of the shape (alpha) parameter so that approximately 10%, 15%, 20%, and 25% of the genes lie in mutation cold spots (see Results and supplementary fig. S2, Supplementary Material online).
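The effect of the shape parameter on the proportion of slowly mutating genes can be explored with a sketch like the one below. It draws relative mutation rates from a gamma distribution with mean 1 and reports the fraction falling below a threshold. The threshold that defines a "cold spot" is an assumption of this example (the text does not state one), as are the function name and default sample size.

```python
import random

def cold_spot_fraction(alpha, threshold, n_genes=100_000, seed=0):
    """Fraction of genes whose relative mutation rate is below threshold.

    Rates follow a gamma distribution with shape alpha and scale 1/alpha,
    so the mean relative rate is 1 for every alpha.
    threshold: hypothetical cut-off defining a mutation cold spot.
    """
    rng = random.Random(seed)
    rates = [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_genes)]
    return sum(r < threshold for r in rates) / n_genes
```

Because the mean is held at 1, lowering alpha increases the variance among genes and moves a larger share of them into the low-rate tail, which is the direction of increasing heterogeneity explored in the robustness analysis.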
The Human Genome Diversity Panel–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) Human Genome Diversity Cell Line panel contains 1,037 samples from a wide range of world populations (Cann et al. 2002; supplementary table S5, Supplementary Material online). In order to assess variability at 25 Sec (TGA) and 10 Cys (TGT/TGC) enzymatic codons in the panel, SNPlex genotyping assays (Applied Biosystems, Foster City, CA) for all sites and KASPar assays (Kbioscience, United Kingdom) for 14 codons (unresolved with the previous method) were designed. For each codon, two possible changes were tested: TGA to TGT/TGC, TGT to TGC/TGA, and TGC to TGT/TGA. SNPlex and KASPar genotyping assays were performed according to the manufacturers' conditions and average call rates of 98.34% and 98.4% were obtained, respectively (supplementary table S6, Supplementary Material online). Unclear genotypes (8 codons in a total of 11 individuals) were confirmed through direct sequencing. Regions flanking each codon were amplified in a 10-μl final volume containing PCRx Amplification Buffer (Invitrogen), 1.5 mM MgSO4, 2× Enhancer Solution (PCRx Enhancer System, Invitrogen), 0.5 μM of each primer (supplementary table S7, Supplementary Material online), 200 μM deoxyribonucleotide triphosphate, 0.5 U Taq DNA polymerase (Roche), and 1 μl of 10–50 ng of DNA. Polymerase chain reaction (PCR) cycling conditions were as follows: 95 °C for 5 min; 32 cycles of 95 °C for 30 s, 46.0 °C (GPx3), 48.0 °C (GPx8), 48.0 °C (MsrA), 52.0 °C (SelH), 51.4 °C (SelM), 50.7 °C (SelN), 48.0 °C (SelT), 48.7 °C (TR1) for 30 s, and 72 °C for 2 min; with a final elongation step at 72 °C for 7 min. Amplification products were purified with Exo-SAP (GE Healthcare Europe GMBH). Sequencing reactions were performed for each strand, using the corresponding forward or reverse amplification primer with the ABI PRISM BigDye Terminator v3.1 Cycle Sequencing kit (Applied Biosystems) following the supplier's instructions. Sequencing products were subsequently purified using the Montage SEQ96 clean up kit (Millipore) and analyzed on a 3730XL DNA Analyzer (Applied Biosystems).
The survey of variability revealed no variation in Sec and homologous Cys sites in humans. The probability of finding, under neutrality, no variation in 35 sites and 2,074 chromosomes given the average level of diversity in humans was assessed based on coalescent theory: first, analytically (Hein et al. 2005), considering the average θ of the human genome (8.25 × 10−4) (Venter et al. 2001) and second, through coalescent simulations. Neutral coalescent simulations simulate the expected diversity of a locus under neutrality given its mutation rate, sample size, and the demographic history of the population. The evolution of the 35 sites was simulated with the program ms (Hudson 2002), considering the above θ in the human genome, under demographic equilibrium and a range of demographic scenarios compatible with the history of human populations (Marth et al. 2004; Williamson et al. 2005). Though all demographic models represent simplified versions of the complex demographic history of human populations, fitting demographic parameters in a model of the demographic history of 51 populations is not practical given the small samples in each. Ignoring the population structure present in the data makes the test conservative, because population differentiation increases variability and reduces our power to reject neutrality. Ten thousand simulations were performed for every model, and the probability of the data was assessed by direct comparison of the number of observed polymorphic sites with the simulated distribution. Analytical and simulation results agree, regardless of the demographic model considered.
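The analytical side of this calculation can be illustrated with the standard result for the neutral coalescent under constant population size and the infinite-sites model: the probability that no mutation occurs while k lineages remain is (k − 1)/(k − 1 + θ), so the probability of observing no segregating sites in a sample of n chromosomes is the product of these terms for k = 2, ..., n. The sketch below implements that product. It is an assumption that this matches the exact calculation in the study; in particular, how per-site θ is scaled to the 35 codons is a choice of this example, so the output need not reproduce the reported P value.

```python
def prob_no_segregating_sites(n_chromosomes, theta_total):
    """P(S = 0) under the constant-size neutral coalescent.

    n_chromosomes: sample size (number of chromosomes).
    theta_total: population mutation parameter summed over all surveyed
                 sites (assumed here to be sites x per-site theta).
    """
    p = 1.0
    for j in range(1, n_chromosomes):  # j = k - 1 for k = 2 .. n
        p *= j / (j + theta_total)
    return p

# Hypothetical scaling: 35 surveyed sites at the genome-wide per-site theta.
p_none = prob_no_segregating_sites(2074, 35 * 8.25e-4)
```

The probability stays high even for a sample of 2,074 chromosomes because the total θ over so few sites is small, which is the source of the limited power noted in the Results.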
Our reconstruction of the ancestral vertebrate selenoproteome resulted in 31 selenoproteins (DI1, DI2, DI3, GPx1, GPx2, GPx3, GPx4, Sel15, SelH, SelI, SelJ, SelK, SelM, SelN, SelL, SelO, SelPa, SelPb, SelR1, SelS, SelT, SelUa, SelUb, SelUc, SelV, SelW1, SelW2a, SPS2, TR1, TR2, and TR3) and 5 Cys homologs (GPx7, GPx8, MsrA, SelR2, and SelR3). See figure 1 and supplementary tables S1 and S2, Supplementary Material online. In addition, six nonancestral selenoproteins (DI4, Fep15, GPx6, SelT2, SelT3, and SelR4) and one Cys homolog (GPx5) exist (supplementary fig. S1, Supplementary Material online). Three nonancestral selenoprotein genes (SelT3, SelR4, and DI4) occur in a single species and provide no divergence information (supplementary fig. S1, Supplementary Material online). Therefore, the divergence of 35 Sec (SelL has two Sec codons) throughout the phylogeny is considered. Only six Sec/Cys replacements were found in the whole vertebrate tree, all leading to the loss of Sec (fig. 1). Four of these changes to Cys occur deep within the vertebrate phylogeny, and their subsequent divergence can be simulated in the corresponding clades. Together with the 6 Cys sites at the root of the tree, the divergence of a total of 10 Cys sites is, then, simulated.
To test whether Sec has an increased sensitivity to rising O2 levels in vertebrate evolution (Berner 2006; Berner et al. 2007), we computed the correlation of the inferred ancestral selenoproteome sizes along the internal nodes in the phylogeny with the estimated levels of atmospheric O2 in the corresponding geological periods (see Materials and Methods). The predicted negative correlation between selenoproteome size and environmental O2 levels is not significant at the 5% level (r = −0.09516, P = 0.38430).
Our neutrality test on the pattern of divergence of all enzymatic Sec (DSec→aa = 0.171, P < 0.0001) and homologous Cys (DCys→aa = 0, P < 0.0001) sites over half a billion years of vertebrate evolution is consistent with strong purifying selection (fig. 2A, DSec→aa in blue, and B, DCys→aa in blue). This experiment, though, informs us of the extent of constraint acting on these sites, not of the functional exchangeability between Sec and Cys residues. To test for this, we introduce an exchangeability test (see Materials and Methods) that takes into account the chemical analogy and evolutionary homology of Sec and Cys residues in the context of pervasive purifying selection for any other amino acid substitution (i.e., other than Sec/Cys replacements). In practice, we derive a null model through simulations of the evolutionary process where strongly deleterious mutations (with strongly negative fitness effects) are immediately eliminated from the population and only Sec/Cys neutral substitutions contribute to selenoproteome divergence. This test shows a deficit of divergence when compared with the neutral expectation, which is consistent with strong purifying selection and indicates very low exchangeability between Sec (DSec→Cys = 0.171, P < 0.0001) and Cys (DCys→Sec = 0, P = 0.0008) residues in vertebrate proteins (fig. 2A, DSec→Cys in red, and B, DCys→Sec in green). Although DSec→aa and DSec→Cys are different evolutionary measures, they have the same value in vertebrates because only changes between Sec and Cys residues were found. Both tests are robust to mutation heterogeneity across a genome (see Discussion and supplementary fig. S2, Supplementary Material online).
We further investigated the role of environmental Se in driving Sec/Cys changes by examining the diversity of Sec/Cys use in humans. A few populations are known to currently inhabit regions of Se deficiency, whereas others are in regions of borderline Se toxicity (Levander 1987), but the question of whether Se geographical distribution has historically shaped human diversity is better served by an unbiased sample of human populations, which provides a cross-section of Se nutritional histories in the world (supplementary table S5, Supplementary Material online). All Sec and homologous Cys sites in the human genome were genotyped in the HGDP–CEPH panel (Cann et al. 2002) and no variation was found. Neutrality cannot be rejected as an explanation for the absence of variants (P = 0.83 by analytical method, P ≥ 0.99 by coalescent simulations) due to power limitations given the population sample size, the small number of Sec and Cys sites, and the low average diversity of the human genome. Nevertheless, the absence of polymorphism observed suggests that natural variation in these sites is rare, if at all present, in human populations. This is consistent with a minor role for Se availability in shaping Sec use in human proteins.
A fundamental question in Se biology is the extent of functional exchangeability between Sec and Cys amino acids, a measure of the distinct contribution of Sec to protein function. Sec is a nonstandard amino acid, and previous evolutionary studies on amino acid exchangeability have not considered this rare residue. To gain insight into this question, we have characterized the evolutionary forces shaping the exchange of Sec/Cys residues in vertebrates, a challenging inference given the small number of Sec sites in vertebrate proteomes. We believe this approach to be superior to physicochemical or experimental measures of exchangeability (Grantham 1974; Miyata et al. 1979; Yampolsky and Stoltzfus 2005) for the question at hand, as it discerns selection from mutational biases and it accounts for different fitness effects due to the use of a Se-dependent amino acid in proteins. The recent characterization of vertebrate selenoproteomes is believed to be quite complete (Kryukov et al. 2003; Castellano et al. 2004, 2005; Shchedrina et al. 2007), and the knowledge of the vertebrate phylogeny and mutation rates enables us, for the first time, to test current hypotheses on the role of Se and Sec in protein activity.
Our results are consistent not only with strong purifying selection acting on both Sec and Cys sites (as expected from functional sites), but also with a low level of functional exchangeability between the two residues over half a billion years of vertebrate evolution. These results underscore the unique role of Sec in protein activity. In interpreting these findings, it is worth noting that, as any evolutionary inference, they depend on the null model adopted and the test statistic used. In our simulations of the neutral divergence of vertebrate selenoproteomes, the expected number of synonymous substitutions per synonymous site is used as a proxy of the neutral mutation rate (see Materials and Methods). Synonymous mutations in mammals and other vertebrates with small population sizes are commonly assumed to be neutral. Although many synonymous mutations are no doubt free from selection, selective pressures related to translational efficiency, mRNA stability, splicing control, and others suggest that weakly purifying selection may act on an unknown fraction of synonymous sites (Chamary et al. 2006). Weakly purifying selection would make us underestimate mutation rates in vertebrate genomes, but would not compromise the tests. On the contrary, a slower neutral rate of evolution would make our tests conservative in the inference of purifying selection, a statistical property shared by our divergence summary statistic (see Materials and Methods).
A more problematic bias would be the underestimation of the extent of mutation rate heterogeneity in a genome, which would result in an overestimation of sequence divergence. Such a biased neutral expectation could result in the false inference of constraint. Several lines of evidence, though, suggest that this is an unlikely explanation for our results. First, synonymous sites in selenoproteins and Cys-homolog genes between humans and chimpanzees are not unusually constrained, suggesting that mutations accumulate at a typical rate in these genes (Castellano S, data not shown). Second, selenoproteins and Cys-homolog genes are located in different chromosomal regions within and between species genomes. That the distribution of a large fraction of these genes consistently overlaps regions of low mutation in most species, as the pervasive purifying selection inferred above would imply, is highly improbable. Third, neutral simulations with increasing levels of mutation rate heterogeneity suggest that our tests are, to a large degree, robust to nonuniform mutation rates. Therefore, the available evidence indicates that vertebrate selenoproteomes are selectively constrained and that such evolutionary conservation can be of functional relevance.
Accordingly, we discuss previously proposed selective pressures on Sec usage in the context of the inferred constraint:
We have derived a global measure of functional exchangeability across vertebrate selenoproteins and selenoproteomes and provided the first evolutionary assessment of several selective pressures proposed to drive Sec use in proteins. The low exchangeability between Sec and Cys residues is better explained by strong natural selection due to Sec/Cys functional differences and, at best, a moderate role of environmental and metabolic forces, suggesting caution in the interpretation of evolutionary trends in Sec usage as ecological adaptations. Although our results only apply to the vertebrate clade, we feel that common claims of ecological adaptations in the Se field may be premature. Despite the difficulties and uncertainties associated with any molecular inference of the past, different selective factors leave different signatures of selection and these adaptive hypotheses can be examined through established evolutionary principles. Strong evidence for selection is most needed for genes of plausible ecological importance, like selenoproteins, as apparent selective factors may discourage considering alternatives to environmental adaptations (Gould and Lewontin 1979; Mitchell-Olds et al. 2007). Furthermore, natural selection is just one of several evolutionary mechanisms responsible for differences at the molecular level (Lynch 2007) and, despite typical assumptions in Se biology regarding the role of natural selection, no Sec to Cys or Cys to Sec substitution has yet been shown to be adaptive. Whether nonneutral evolutionary processes are responsible for some of these amino acid replacements is unknown. Similarly, whether adaptation to local Se levels or other selective factors have driven the evolution of selenoprotein expression, Se intake, metabolism or transport has not been addressed. These are open questions in Se biology.
A better understanding of the selenoproteomes and neutral evolutionary patterns in other taxa will be necessary to fully assess the generality of our conclusions. For example, the recent identification in the Drosophila clade of the first animal without selenoproteins is remarkable (Drosophila 12 Genomes Consortium 2007). Although all other known Drosophila species have three selenoproteins, Drosophila willistoni has none. Indeed, insects seem to have a higher number of Sec/Cys exchanges in proteins than vertebrates (Chapple and Guigó 2008; Lobanov et al. 2008a, 2008b). The evolutionary forces and selective pressures, if any, driving these replacements are still unclear. Beyond the Sec residue, the evolutionary forces targeting selenoprotein genes as a whole are also poorly known. A notable exception is the Glutathione peroxidase 1 gene, which may have been under adaptive evolution in recent human history (Foster et al. 2006). In any case, if the results obtained here are representative of more divergent species, the firm conclusion is the unique role of Sec in protein activity and evolution. Overall, Sec and Cys residues may be less functionally exchangeable than usually thought and, if some instances of Sec/Cys substitutions have been adaptive in vertebrates or other taxa, the distinct biochemical properties of Sec, rather than the geographical distribution of Se, global O2 levels, or metabolic cost, may have played a major role in the evolution of selenoproteomes.
S.C. thanks S.R. Eddy for time and resources to complete this manuscript. We thank M.J. Berry for helpful comments and suggestions; E.H. Margulies for sharing unpublished data on vertebrate rates of neutral evolution; R.A. Berner for providing up-to-date estimates of atmospheric O2 in the Phanerozoic eon; and M. Vallés for technical assistance. This work was supported by grants BIO2006-03380 from the Spanish Ministry of Education and Biosapiens LSHG-CT-2003-503265 from the European Commission (FP6 Program) (to R.G.) and NIH GM065509 (to A.G.C.).