Although much effort has previously been devoted to identifying selenoprotein genes and the Sec insertion machinery, the evolution of selenium utilization traits has remained unclear. Some preliminary considerations concerning the phylogeny of Sec incorporation and the evolution of Sec have been proposed previously [33]. The major use of selenium in nature appears to be the co-translational incorporation of Sec into selenoproteins. In addition, 2-selenouridine, a modified nucleotide in the wobble position of the anticodon of some tRNAs, has been identified as a second selenium utilization trait [13]. A common feature of the two selenium utilization traits is that both use selenophosphate as the selenium donor. Therefore, SelD, which synthesizes selenophosphate, is considered a general signature of selenium utilization.
In the present study we used several methods to scrutinize homologous Sec- and Cys-containing sequences in bacterial genomes, which provided important new insights into the dynamic evolution of selenium utilization in bacteria. The wide taxonomic distribution of selenium utilization traits agrees with the idea that selenium can be used by species in almost all bacterial phyla. However, among all sequenced bacterial genomes, only 21.5% possess the Sec-decoding trait and 25.2% the selenouridine-utilizing trait, suggesting that most organisms have lost the ability to utilize Sec or selenouridine. Notably, many Sec-decoding organisms also possess the selenouridine-utilizing trait and vice versa, suggesting that the two traits might have evolved under similar environmental conditions (for example, selenium supply) or might influence each other's evolution. However, the many organisms that contain only one of these traits indicate that selenium availability is not the sole factor responsible for acquisition or loss of either trait, and suggest a relatively independent and complementary relationship between the two selenium utilization traits. The presence of SelD as the only selenoprotein in several YbbB-containing species reinforces the idea that the traits might be complementary (specifically, the Sec-decoding trait might be maintained for the sake of Sec-containing SelD, which in turn supplies selenophosphate for both Sec and selenouridine synthesis). In addition, the presence of an 'orphan selD' (one that is not associated with either trait) in both bacteria and archaea raises the possibility of a third, currently unknown, selenium utilization trait.
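The trait percentages above come from sorting genomes into trait categories. As a rough illustration only, assuming a genome is called Sec-decoding when selA, selB and selC are all present and selenouridine-utilizing when selD and ybbB are both present (a simplification; the study's actual identification criteria are not reproduced here), the classification, including the 'orphan selD' category, and the per-trait fractions could be computed as follows. The marker sets and gene sets are hypothetical.

```python
from collections import Counter

# Hypothetical, simplified marker sets (an assumption for illustration only).
SEC_DECODING_MARKERS = {"selA", "selB", "selC"}  # Sec synthase, Sec-specific EF, tRNA-Sec gene
SELENOURIDINE_MARKERS = {"selD", "ybbB"}         # selenophosphate synthetase + 2-selenouridine synthase


def classify_selenium_traits(genes: set[str]) -> str:
    """Assign a genome to a selenium-utilization category from its gene set."""
    has_sec = SEC_DECODING_MARKERS <= genes
    has_seu = SELENOURIDINE_MARKERS <= genes
    if has_sec and has_seu:
        return "Sec+SeU"
    if has_sec:
        return "Sec only"
    if has_seu:
        return "SeU only"
    if "selD" in genes:
        # selD present but not associated with either known trait
        return "orphan selD"
    return "none"


# Hypothetical gene sets for five genomes.
genomes = {
    "genome_1": {"selA", "selB", "selC", "selD", "fdhA"},
    "genome_2": {"selD", "ybbB"},
    "genome_3": {"selA", "selB", "selC", "selD", "ybbB"},
    "genome_4": {"selD"},
    "genome_5": {"trxA"},
}
categories = {name: classify_selenium_traits(g) for name, g in genomes.items()}
counts = Counter(categories.values())
sec_fraction = (counts["Sec only"] + counts["Sec+SeU"]) / len(genomes)
seu_fraction = (counts["SeU only"] + counts["Sec+SeU"]) / len(genomes)
print(counts)
print(sec_fraction, seu_fraction)
```

Because a genome can fall into the "Sec+SeU" category, the two per-trait fractions overlap, which is why the percentages for the two traits need not sum to the fraction of genomes using selenium.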
We built phylogenetic trees for both the components of the selenium utilization traits and the selenoproteins using several independent methods. The inferred topologies were supported by most of the individual trees. In addition, phylogenies of SECIS elements in different bacterial selenoprotein genes were consistent with those of the corresponding selenoproteins (data not shown), suggesting that SECIS elements and selenoproteins followed similar evolutionary trajectories.
To test whether the inferred phylogenies of the components of the two selenium utilization traits correspond to the general evolutionary trend, we measured, for pairs of organisms, the correlation between the similarity of orthologous protein pairs and that of their 16S rRNAs (as controls). The correlation coefficients ranged from 0.68 to 0.79 (Figure 5). After removing HGT cases, all correlation coefficients were even higher (≥ 0.9). These data suggest that the inferred phylogenetic trees are consistent with the evolutionary distances derived from 16S rRNAs, and that selenium utilization systems in most bacterial species were inherited from a common ancestor within the same phylogenetic lineage.
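A minimal sketch of this control comparison, assuming each organism pair is summarized by a protein similarity, a 16S rRNA similarity, and a flag marking suspected HGT. The use of the Pearson coefficient is an assumption (the text does not name the coefficient), and the numbers are hypothetical; they only show how excluding HGT pairs can raise the correlation.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_with_16s(pairs, exclude_hgt=False):
    """pairs: (protein_similarity, rrna_similarity, is_hgt) for each organism pair."""
    kept = [(p, r) for p, r, hgt in pairs if not (exclude_hgt and hgt)]
    return pearson_r([p for p, _ in kept], [r for _, r in kept])


# Hypothetical organism pairs; the last one is flagged as a horizontal transfer.
pairs = [
    (0.60, 0.80, False),
    (0.70, 0.85, False),
    (0.80, 0.90, False),
    (0.90, 0.95, False),
    (0.95, 0.82, True),
]
r_all = correlation_with_16s(pairs)
r_no_hgt = correlation_with_16s(pairs, exclude_hgt=True)
print(round(r_all, 2), round(r_no_hgt, 2))
```

An HGT pair shares a highly similar protein despite distant 16S rRNAs, so it falls off the trend line; removing such pairs is what tightens the correlation.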
Figure 5 Evolutionary divergence of components of two selenium utilization traits extracted from the datasets identified in this work. Each graph contains 100 randomly selected organism pairs (points). The protein similarity (sequence divergence [SD]) of each ...
HGT events have contributed to the evolution of the Sec-decoding and selenouridine-utilizing traits. However, HGT of an entire trait is difficult to detect, especially for the Sec-decoding trait, because such events are rare. In addition to the HGT event previously reported for the Sec-decoding trait [13], we found that all Sec-decoding organisms in Alphaproteobacteria, Betaproteobacteria, and Gammaproteobacteria/Pseudomonadales possess similar selA-selB-selC operons and a neighboring fdhA gene, which encodes the only selenoprotein in these organisms (Figure ). Our data support the idea that a Sec-decoding HGT event can occur only if the selA, selB, and selC genes are organized in a cluster and the transfer is accompanied by co-transfer of at least one selenoprotein gene (most often fdhA, or selD when fdhA is absent). In addition, because SelD and YbbB are the only known components of the selenouridine-utilizing trait and their genes almost always form an operon, co-transfer of both traits might also be expected (although we did not detect any example of HGT of both traits). In some phyla, both selenoprotein-containing organisms and sister organisms lacking selenoproteins possess selD; this suggests that the evolution of SelD is relatively independent of the other components of the Sec-decoding trait.
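The clustering condition proposed for HGT of the Sec-decoding trait can be phrased as a simple check on gene order. The sketch below only illustrates that condition and is not the detection procedure used in the study: it assumes genes are given as names in genomic order and uses a hypothetical window size, counted in genes, to decide what counts as a cluster.

```python
SEC_MACHINERY = {"selA", "selB", "selC"}
SELENOPROTEIN_GENES = {"fdhA", "selD"}  # candidate co-transferred selenoprotein genes


def sec_trait_transferable(contig: list[str], window: int = 5) -> bool:
    """True if selA, selB, selC and fdhA or selD fall within `window` consecutive genes.

    contig: gene names in genomic order (hypothetical input format).
    window: hypothetical cluster size limit, counted in genes.
    """
    # If the contig is shorter than the window, examine it as a single block.
    for start in range(max(1, len(contig) - window + 1)):
        block = set(contig[start:start + window])
        if SEC_MACHINERY <= block and block & SELENOPROTEIN_GENES:
            return True
    return False


print(sec_trait_transferable(["selA", "selB", "selC", "fdhA"]))
print(sec_trait_transferable(["selA", "orfX", "selB", "orfY", "orfZ", "selC", "fdhA"]))
```

The first contig satisfies the condition; in the second, the Sec machinery genes are too scattered for any five-gene window to contain all of them together with fdhA.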
The presence of FdhA or SelD in every selenoproteome supports the idea that one or both of these selenoprotein families are largely responsible for maintaining the Sec-decoding trait. Deltaproteobacteria, Firmicutes/Clostridia, and Actinobacteria were the three selenoprotein family-rich phyla; together they contained all 25 selenoprotein families and accounted for 17 of the 18 (94.4%) selenoprotein-rich organisms. Families containing rare selenoproteins (fewer than five selenoproteins) occurred only in Deltaproteobacteria and Firmicutes/Clostridia, suggesting active evolution of new selenoproteins in these two separate phyla. Given the biased distribution of sequenced bacterial genomes, additional selenoprotein-rich phyla or organisms might be identified in the future.
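Identifying which phyla harbor the rare families amounts to a simple tally. A minimal sketch, assuming each selenoprotein is recorded as a (family, phylum) pair and using the "fewer than five selenoproteins" cutoff from the text; the records and the placeholder family names FamilyX and FamilyY are hypothetical.

```python
from collections import defaultdict

RARE_THRESHOLD = 5  # rare family: fewer than five selenoproteins


def rare_family_phyla(selenoproteins):
    """Map each rare family to the set of phyla in which its selenoproteins occur.

    selenoproteins: iterable of (family, phylum) pairs, one per Sec-containing protein.
    """
    counts = defaultdict(int)
    phyla = defaultdict(set)
    for family, phylum in selenoproteins:
        counts[family] += 1
        phyla[family].add(phylum)
    return {fam: phyla[fam] for fam, n in counts.items() if n < RARE_THRESHOLD}


# Hypothetical records; "FamilyX" and "FamilyY" are placeholder family names.
records = (
    [("FdhA", "Alphaproteobacteria")] * 3
    + [("FdhA", "Deltaproteobacteria")] * 3
    + [("FamilyX", "Deltaproteobacteria")] * 2
    + [("FamilyY", "Firmicutes/Clostridia")] * 3
)
rare = rare_family_phyla(records)
phyla_with_rare = set().union(*rare.values())
print(sorted(rare), sorted(phyla_with_rare))
```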
A total of 31 known selenoproteins were found in Syntrophobacter fumaroxidans, the largest selenoproteome reported thus far. This organism has multiple glpC operons. Phylogenetic analyses of the genes in these operons suggested that the hdrA-frhD-frhG-frhA cluster was laterally transferred between Sec-decoding archaea and Deltaproteobacteria (Figure ). Compared with other lateral gene transfers between archaea and bacteria [34], selenoprotein gene transfers should be more difficult because the two domains use different mechanisms to insert Sec into polypeptide chains [9]. We found no remnants of bacterial-type SECIS elements in archaeal selenoprotein genes, nor of archaeal-type SECIS elements in bacterial selenoprotein genes. However, Deltaproteobacteria contained a five-gene operon that included, in addition to the genes present in Sec-decoding archaea, GlpC, another selenoprotein family; moreover, complex evolutionary processes, including gene duplication and gene fusion events involving hdrA, were observed in Deltaproteobacteria. These facts suggest that Deltaproteobacteria might have gained the original four-gene operon from Sec-decoding archaea. The coherent clustering of selenoprotein genes in Sec-decoding archaea and Deltaproteobacteria, together with the absence of the same operon in closely related organisms, indicates that this lateral transfer might have happened only recently.
The analysis of selenoproteins and the complementary sets of Cys-containing homologs offered a model system for analyzing the origin and evolution of various selenoproteins. Although most selenoprotein families have rare selenoproteins and widespread Cys-containing homologs (Additional data files 1 [Table S1] and 2 [Figure S2]), we found that several selenoproteins, including FdhA, SelW-like proteins, and glycine reductase selenoproteins A (GrdA) and B (GrdB), have very few or no Cys-containing homologs in Sec-containing organisms (Additional data file 2 [Figure S3]). This observation suggests that Sec is the original form of these proteins. Moreover, by analyzing the phylogenies of 25 bacterial selenoprotein families, we detected more than twice as many Cys→Sec conversions as Sec→Cys conversions. In addition, Cys→Sec conversions were detected in many thiol-based oxidoreductase families, suggesting that in most selenoprotein families there is a general trend toward Sec acquisition through replacement of catalytic redox-active Cys residues with Sec. Such replacements might be stabilized by vicinal residues in the active sites of these proteins. However, no such events were detected for FdhA, the most widely distributed and abundant selenoprotein family, or for SelD. We hypothesize that evolution of the Sec-decoding trait in most cases parallels the evolution of FdhA. Consistent with this idea, the genes for the Sec-decoding trait and fdhA are often in the same operon in Sec-decoding organisms, particularly in those containing a single selenoprotein gene. Taken together, these data suggest that Sec-containing FdhA is acquired via vertical or lateral inheritance of the Sec-decoding trait. SelD might be a second selenoprotein that helps maintain the trait in organisms that lack FdhA.
The requirement for FdhA or SelD to maintain the Sec-decoding trait and the scattered occurrence of other selenoproteins further illustrate the highly dynamic nature of Sec evolution.
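Counting directed Cys→Sec and Sec→Cys conversions requires assigning ancestral residues on a rooted gene tree. The text does not specify the reconstruction method, so the following is only a sketch using Fitch parsimony on a binary tree with a single arbitrary tie-break (ancestral Cys preferred when the choice is open); a real analysis would consider all equally parsimonious reconstructions or use likelihood methods. Leaves use the one-letter codes "C" (Cys) and "U" (Sec), and the trees are hypothetical.

```python
def fitch_annotate(tree):
    """Bottom-up Fitch pass on a binary tree.

    A leaf is an observed residue ("C" = Cys, "U" = Sec); an internal node is a
    2-tuple of subtrees. Returns nested dicts holding each node's candidate state set.
    """
    if isinstance(tree, str):
        return {"set": {tree}, "children": []}
    left, right = (fitch_annotate(child) for child in tree)
    shared = left["set"] & right["set"]
    return {"set": shared or (left["set"] | right["set"]), "children": [left, right]}


def count_directed_changes(node, parent_state=None, counts=None):
    """Top-down pass: fix one state per node and tally directed changes per edge."""
    if counts is None:
        counts = {"C->U": 0, "U->C": 0}
    if parent_state in node["set"]:
        state = parent_state              # keep the parent's state when allowed
    else:
        state = min(node["set"])          # tie-break: "C" sorts before "U"
    if parent_state is not None and state != parent_state:
        counts[f"{parent_state}->{state}"] += 1
    for child in node["children"]:
        count_directed_changes(child, state, counts)
    return counts


# Hypothetical gene tree with four Cys and two Sec leaves.
tree = ((("C", "C"), ("U", "U")), ("C", "C"))
print(count_directed_changes(fitch_annotate(tree)))
```

Summing such counts over the trees of many families gives the ratio of Cys→Sec to Sec→Cys events; because the direction depends on the root state, the tie-break rule should be reported alongside any result.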
If new selenoproteins frequently evolve from their Cys-containing homologs, why do organisms have only a limited number of selenoproteins, and why do so many organisms lack selenoproteins altogether? One hypothesis is that the Sec insertion trait is unstable, so that the evolution of new selenoproteins is balanced by selenoprotein loss in closely related organisms. To investigate the possibility of phylum-specific selenoprotein losses, we adopted an approach that compares each selenoprotein-containing organism with a sister organism and a relatively distant one. Similar methods have previously been used to analyze a general trend of amino acid gain and loss in proteins [35]. Because the sister species selected for each selenoprotein-containing organism are closely related, the observed results directly reflect only roughly the past 30 million years of evolution. We found that all 38 selenoprotein loss events, including six SelD losses that were accompanied by loss of the entire Sec-decoding trait in sister genomes, occurred in the selenoprotein family-rich phyla Firmicutes/Clostridia. Organisms in these phyla show a balanced pattern of ongoing selenoprotein origin and loss. One plausible explanation for the loss of selenoproteins is a universal, intrinsic, long-term trend that operated in both ancient and extant organisms. Over this period, some ancient selenoprotein families might have been lost in most or all organisms, or ancient organisms that contained such selenoproteins might have become extinct. This hypothesis is consistent with the recently proposed 'balance hypothesis', which suggests that gene gain and loss in prokaryotes are balanced to keep prokaryotic genome size relatively constant [36]. However, the evolutionary forces modulating this balance remain unclear.
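The sister/distant-organism comparison can be illustrated as a parsimony rule applied to triplets of organisms. This is a hypothetical simplification, assuming each organism is scored for a given family as having the Sec form ("U"), the Cys form ("C"), or no homolog ("-"); the study's exact scoring rules are not given in the text, and the triplets below are invented.

```python
from collections import Counter


def polarize(focal: str, sister: str, outgroup: str) -> str:
    """Classify one (focal, sister, outgroup) triplet for a single selenoprotein family.

    States: "U" = Sec form, "C" = Cys form, "-" = no homolog.
    focal: the selenoprotein-containing organism; sister: its closest relative;
    outgroup: a relatively distant organism used to polarize the change.
    """
    if focal != "U" or sister == "U":
        return "no change inferred"
    if outgroup == "U":
        return "loss in sister"       # Sec form ancestral, missing in the sister
    return "origin in focal"          # Sec form appears derived in the focal organism


# Hypothetical triplets (focal, sister, outgroup).
triplets = [("U", "C", "U"), ("U", "-", "U"), ("U", "C", "C"), ("U", "U", "U")]
tally = Counter(polarize(*t) for t in triplets)
print(tally)
```

Because the sister is a close relative, each inferred loss is dated to the short branch separating the pair, which is why such counts speak only to recent evolution.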
To gain insight into the factors that influence the maintenance, acquisition, and loss of selenium utilization traits and Sec/Cys conversions, we analyzed environmental conditions (for example, habitat, oxygen requirement, optimal temperature, and optimal pH) and other factors (such as genome size and GC content) for all 349 bacteria with completely or almost completely sequenced genomes, and compared organisms containing the Sec and/or selenouridine traits with those lacking them. First, we found that organisms possessing the Sec-decoding trait (especially those that have the Sec but not the selenouridine trait) favor anaerobic and high-temperature conditions (Additional data file 1 [Tables S4 and S5] and Figure 6). In contrast, organisms possessing the selenouridine trait (in cases in which the Sec trait has been lost) favor aerobic and mesophilic conditions. Thus, lower oxygen concentration and higher optimal growth temperature appear to preserve, or even stimulate, the use of Sec (Figure 6).
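Comparing trait groups starts with a per-group summary of each environmental variable. A minimal sketch, assuming each organism is a record with two trait flags, an anaerobic flag, and an optimal growth temperature in degrees Celsius; the field names and values are hypothetical, and no significance test is implied.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_trait(organisms):
    """Group organisms by trait combination and summarize two environmental factors.

    organisms: dicts with boolean 'sec', 'seu' and 'anaerobic' keys and
    'opt_temp' (optimal growth temperature, degrees Celsius).
    """
    groups = defaultdict(list)
    for org in organisms:
        key = ("Sec" if org["sec"] else "-") + "/" + ("SeU" if org["seu"] else "-")
        groups[key].append(org)
    return {
        key: {
            "n": len(members),
            "frac_anaerobic": sum(m["anaerobic"] for m in members) / len(members),
            "mean_opt_temp": mean(m["opt_temp"] for m in members),
        }
        for key, members in groups.items()
    }


# Hypothetical records, not data from the study.
organisms = [
    {"sec": True, "seu": False, "anaerobic": True, "opt_temp": 60},
    {"sec": True, "seu": False, "anaerobic": True, "opt_temp": 70},
    {"sec": False, "seu": True, "anaerobic": False, "opt_temp": 30},
    {"sec": False, "seu": True, "anaerobic": True, "opt_temp": 37},
]
for group, stats in sorted(summarize_by_trait(organisms).items()):
    print(group, stats)
```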
Figure 6 Relationship between selenium utilization traits and environmental factors (oxygen concentration and optimal growth temperature). Organisms were classified into four groups, including those containing the following: the Sec trait only, both Sec and selenouridine ...
Second, for individual selenoprotein families, we examined how several environmental factors were distributed among four groups of organisms: those that have selenoproteins (the Sec form) and the Sec trait; those that have Cys-containing homologs of selenoproteins (the Cys form) and the Sec trait; those that have the Cys form but no Sec-decoding trait; and those that have neither the Sec nor the Cys form and no Sec trait. For this analysis, we selected six selenoprotein families that have selenoproteins in at least 10 organisms and widespread Cys-containing homologs. For most of these families, we found a similar trend in which anaerobic conditions correlated with the presence of the Sec form (for instance, when species containing selenoproteins and the Sec trait were compared with those that had Cys-containing homologs and the Sec trait; see examples in Figure 7 and Additional data file 1 [Tables S6 and S7]). These data again suggest that low oxygen levels (anaerobic conditions) promote the use of the Sec form. Possibly, at high oxygen concentrations organisms cannot tolerate the highly reactive Sec residue, which is easily oxidized and could then support the generation of reactive oxygen species. As a result, negative selection at the DNA level (either loss of the whole Sec trait or Sec→Cys conversion) may be promoted under these conditions. Table shows a summary of the observed relationships between environmental factors and conditions and selenium utilization traits. However, we observed no relationship between these factors and the number of selenoproteins in organisms (the size of the selenoproteome), nor between these factors and the presence or absence of particular selenoprotein families. A future challenge will be to discover additional trends that influence selenium utilization in all three domains of life.
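To quantify whether anaerobic organisms are over-represented among those with the Sec form, one option is a one-sided Fisher's exact test on a 2x2 table. This is an assumption: the text does not name the statistical test used. The sketch relies only on the standard library; `scipy.stats.fisher_exact` with `alternative='greater'` computes the same one-sided probability. The counts are hypothetical.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X (the top-left count) is hypergeometric under
    the null hypothesis of independence with the row and column totals fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)) / total


# Rows: Sec form + Sec trait; Cys form + Sec trait.
# Columns: anaerobic; aerobic. Counts are hypothetical.
p = fisher_greater(8, 2, 3, 9)
print(f"one-sided p = {p:.4f}")
```

Comparing Sec-form organisms with Cys-form organisms that still carry the Sec trait, as in the text, controls for the trait itself, so a small p-value points to the Sec form of the family rather than to Sec decoding in general.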
Figure 7 Relationships between two representative bacterial selenoprotein families and oxygen requirement of organisms containing these proteins. Organisms containing a member of a specific selenoprotein family (either Sec or Cys forms) were divided into four ...
General trends and correlations between changes in environmental factors, occurrence of selenium utilization traits, and occurrence of selenoproteins and their Cys-containing homologs in bacteria