Growing databases of completely sequenced genomes allow the exploration of patterns of interoperonic divergence among 16S rRNA sequences and provide critical information for the assessment of microbial diversity and evolution. Among Bacteria, classes of up to seven operons appear to be common, with no clear predominance of a single class of operon numbers (Fig. ). Nonetheless, as previously noted (11), about 40% of bacteria have fewer than two operons. The picture is different for Archaea, among which the majority of strains have been shown to have a single operon and no genomes with more than four operons have been reported to date (Fig. ). A detailed analysis of divergence among the 16S rRNA genes in completely sequenced bacterial genomes revealed that ~40% of operons contain sequences identical to those of other operons (Table ). This number appears to be much smaller for the Archaea; however, few completely sequenced genomes are available (Table ). Overall, the large majority of 16S rRNA sequences from the same genome display very high similarities, with the ranges and averages remaining within a 1% nucleotide difference (Table ). Only five genomes with extreme divergence among operons were detected, so overall, few incidences of HGT between divergent genomes are suggested.
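The two quantities summarized in this paragraph, interoperonic divergence and redundancy, can be illustrated with a minimal sketch. It assumes that the 16S rRNA copies of one genome have already been aligned to equal length. The function names, the toy sequences, and the choice to count a gap character as a difference are illustrative assumptions and not necessarily the procedure used to produce the tables.

```python
from itertools import combinations


def percent_difference(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def summarize_genome(copies):
    """Return (largest pairwise % difference, number of copies that are
    identical to at least one other copy in the same genome)."""
    pairs = combinations(range(len(copies)), 2)
    max_diff = max(
        (percent_difference(copies[i], copies[j]) for i, j in pairs),
        default=0.0,  # a single-operon genome has no pairs to compare
    )
    redundant = sum(
        1
        for i, seq in enumerate(copies)
        if any(seq == other for j, other in enumerate(copies) if j != i)
    )
    return max_diff, redundant


# Hypothetical genome with three aligned 16S rRNA copies (toy data).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
max_diff, redundant = summarize_genome(toy)
print(max_diff, redundant)
```

In this toy genome two of the three copies are identical, and the third differs from them at one of ten positions.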
Based on the level of divergence and redundancy among 16S rRNA sequences between operons of the same genome, more accurate bounds for diversity estimates of bacterial communities can be suggested. The analysis showed that 76 genomes with multiple operons contained 221 sequences (Table ). Thus, if 16S rRNA gene diversity among these genomes were to be analyzed analogously to microbial communities by cloning and sequencing, a roughly threefold overestimation of diversity would result. However, this clearly represents an upper bound since a considerable fraction of genomes contain single operons. The magnitude of this fraction is difficult to estimate since genome sequences are currently derived from cultured strains. Among these, organisms with multiple operons are likely overrepresented since they appear to be more adaptable to changing environmental conditions and grow more readily on culture media (5). This also makes it likely that environments that display more stable conditions overall harbor bacteria with fewer operons, leading to a less severe overestimation of diversity. However, 21 genomes with a single operon are currently available. Adding these to the above estimate provides a lower bound of diversity overestimation of 2.5-fold. Thus, overall we suggest this value as a conservative bound for the correction of bacterial diversity estimates by cloning and sequencing.
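The arithmetic behind the two bounds can be reproduced with a short sketch. It assumes that each genome stands for one organism, that each of the 221 sequences would be scored as a separate taxon in a clone library, and that each single-operon genome contributes exactly one sequence. The function and variable names are illustrative.

```python
def overestimation_factor(n_sequences, n_genomes):
    """Ratio of 16S rRNA sequences recovered to genomes actually present."""
    return n_sequences / n_genomes


multi_genomes = 76      # genomes with multiple operons
multi_sequences = 221   # 16S rRNA sequences found in those genomes
single_genomes = 21     # single-operon genomes, one sequence each (assumption)

# Upper bound: only genomes with multiple operons are considered.
upper = overestimation_factor(multi_sequences, multi_genomes)

# Lower bound: single-operon genomes are added to both counts.
lower = overestimation_factor(
    multi_sequences + single_genomes,
    multi_genomes + single_genomes,
)

print(f"upper bound: {upper:.2f}-fold")  # 221 / 76
print(f"lower bound: {lower:.2f}-fold")  # 242 / 97
```

The ratios come to about 2.91 and 2.49, matching the roughly threefold and 2.5-fold values given in the text.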
Operon numbers appear conserved overall among closely related organisms, but even among strains of the same species small-scale variation is evident (Table ). Among more distantly related organisms, no pattern of high or low operon numbers emerged from the analysis, so correction factors can only be applied to overall estimates of microbial diversity, not to individual phylogenetic groups. However, three notable exceptions were evident. Despite the considerable numbers of strains analyzed, the α-Proteobacteria, and mycoplasma strains appear to contain only low numbers of operons. For example, no α-Proteobacteria with more than four operons have been described to date, and the seven genomes available for α-Proteobacteria show high homogeneity, with only seven 16S rRNA sequences. This suggests that diversity estimates by clone libraries may be more accurate for this phylogenetic group than for others and that the α-Proteobacteria may overall be adapted to relatively stable environmental niches. In this context, it may be predicted that newly isolated representatives of the SAR11 clade, which dominates the open ocean environment (33), also contain few rRNA operons.
The operon comparison revealed the highest divergence to date among 16S rRNA genes within a single genome and showed that four of the five examples of highly divergent 16S rRNA sequences stem from thermophilic organisms. Thermoanaerobacter tengcongensis displayed 11.6% nucleotide divergence due to 188 polymorphic sites among its four 16S rRNA genes. A secondary structure analysis suggested that the rrnC operon arose via HGT since none of the divergent nucleotides appeared to disrupt the functional configuration of the molecule. Indeed, the three insertions in rrnC result in two longer but perfectly matched stems compared to the other operons (Fig. ). Similar length differences have been detected in the thermophilic bacterium D. kuznetsovii. Additional evidence for HGT of the rrnC operon in Thermoanaerobacter tengcongensis is provided by its higher similarity (95%) to other Thermoanaerobacter species (T. subterraneus SL9 and T. keratinophilus).
Whether there is an ecological significance to the occurrence of extreme divergence in thermophiles remains unknown, but for at least some strains it has been confirmed that the divergent rRNAs are transcribed and are thus likely functional (41). However, the pattern may suggest that genomes of thermophiles are prone to HGT. This is supported by the suggestion of HGT in other thermophiles (23). For example, extensive studies of strains of a Thermotoga sp. showed that ~25% of the genes are likely of archaeal origin (29), and a comparison of the genomes of “Pyrococcus abyssi,” Pyrococcus furiosus, and Pyrococcus horikoshii suggested the occurrence of extensive HGT (25).
Despite some extreme cases of 16S rRNA divergence among five genomes, overall a clear dominance of close relationships exists, with the vast majority of interoperonic sequence differences showing <1% divergence (Table ). Thus, 16S rRNAs may primarily diverge due to mutation or HGT between closely related organisms only. This conforms to the complexity hypothesis, which states that successful HGT over large phylogenetic distances should be a rare occurrence for rRNA genes (1). Because the rRNAs are structural molecules, successful interactions with a large number of other gene products are dependent on the primary sequence of the rRNAs and should theoretically limit functionality in a highly heterologous genomic background. On the other hand, the rRNA genes, as members of a multigene family, are subjected to homogenization processes such as gene conversion (15). Were such processes to occur at high rates, they would relatively quickly erase traces of HGT even if it occurred between distantly related organisms. Nonetheless, genome sequences, taken as a snapshot of the incidence of HGT of 16S rRNA genes between phylogenetically distant organisms, currently confirm that the rRNAs provide a relatively solid framework for the estimation of phylogenetic relationships.