This study was conducted to explore the use of large-scale genome sequence analysis to better describe the population genetics, genome content, and biological properties of naturally occurring, uncultivated microbial species. We focused on the description of an abundant but little understood component of marine plankton, the pelagic crenarchaeotes. Our approach facilitated (i) description of the structure and gene content of large genomic regions flanking the rRNA operon from uncultivated marine archaea, (ii) cross comparison of large genomic regions of allopatric marine crenarchaeotes (from Antarctica, deep temperate waters, and a marine sponge), and (iii) assessment of the genomic heterogeneity of sympatric crenarchaeotes that originate from the same population and that share identical or nearly identical rRNA sequences.
The 16S rRNA sequences of all five unique archaeal fosmids from the Antarctic picoplankton library were highly similar. One (15G10) was identical in sequence to a PCR-amplified 16S rRNA gene (ANTARCTIC 12) isolated from the same Antarctic waters in 1993 (3 years prior to the sampling of this report) (14). The sympatric archaeal 16S rRNA genes differed from one another by a maximum of three nucleotide substitutions over 1,418 nucleotide residues. However, the different restriction fragment length polymorphism patterns of the fosmids could not be explained solely by the distance between the rRNA operon and the cloning sites, which suggested significant sequence divergence between these highly related variants. Since it was possible that the protein sequences and gene organization were less conserved than rRNA (as recently observed for natural variants of C. symbiosum), we characterized the GSAT gene from the different clones. Microheterogeneity in the DNA sequence of the GSAT gene was observed for all clones, including 31B2 and 83A10, which were identical over the entire 16S rRNA gene. To our knowledge, this is one of the first studies that directly links 16S rRNA gene variation to heterogeneity in flanking protein coding genes in sympatric free-living microbes. Our study shows that within a single microbial population, considerable genomic variation exists, even among microbes with identical 16S rRNA gene sequences.
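To put the reported 16S rRNA divergence in perspective, the arithmetic can be sketched as follows. This is a minimal illustration, not the procedure used in this study: the function names are hypothetical, and the sequences are assumed to be already aligned, of equal length, and free of gaps.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: gaps are not treated specially; every column is compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_from_substitutions(substitutions: int, length: int) -> float:
    """Percent identity implied by a substitution count over an aligned length."""
    if length <= 0 or not 0 <= substitutions <= length:
        raise ValueError("need 0 <= substitutions <= length and length > 0")
    return 100.0 * (length - substitutions) / length


# Maximum divergence reported in the text:
# three substitutions over 1,418 nucleotide residues.
lowest_identity = identity_from_substitutions(3, 1418)
print(f"{lowest_identity:.2f}% identity")
```

Under these assumptions, even the most divergent pair of sympatric 16S rRNA genes remains above 99.7% identical, which is why the variation in the flanking GSAT gene cannot be inferred from the rRNA gene alone.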
Sequence analysis of the 43- and 42-kb genome fragments derived from marine archaea from two different oceanic provinces showed many features typical of the domain Archaea. The majority of the genes identified, including those whose functions could not be predicted, most closely resembled archaeal protein genes (data not shown). Some features of these genome sequences, including the rRNA gene order, chromosomal organization, and nucleotide sequence of the genes for EF1-α and ribosomal protein S10, resembled those of other Crenarchaeota. The observation that the GSAT gene is located downstream of the ribosomal operon in all planktonic marine crenarchaeotes analyzed to date (41; this study) also suggests some chromosomal organization common to marine group I crenarchaeotes. Specific gene sequences that we recovered might provide further insight into the relationship of marine crenarchaeotes to other cultivated species. For instance, EF1-α (EF-Tu in Bacteria) is a highly conserved protein that is found in all cellular organisms and that has proven extremely useful for global phylogenetic comparisons (2). Both distance and parsimony analyses of the EF1-α amino acid sequence derived from Antarctic fosmid 74A4, however, could not resolve its placement within the Crenarchaeota (data not shown). Other important features of EF1-α are specific insertions and deletions among homologs that provide evidence for a specific evolutionary linkage between eukaryotes and crenarchaeotes. Antarctic crenarchaeotal EF1-α did contain an 11-amino-acid insertion (data not shown) that is characteristic of Eucarya but not Euryarchaeota or bacteria (34). The sequence of the 11-amino-acid insertion of 74A4 most closely resembles the insertion of Pyrobaculum aerophilum, another crenarchaeote (four-amino-acid sequence difference; data not shown). This observation, together with the deep branching of planktonic archaeal 16S rRNA and of the EF2 amino acid sequence of fosmid 4B7 (42) in phylogenetic trees, could reflect a nonthermophilic origin of the crenarchaeotal subdivision. Alternatively, EF1-α homologs from as-yet-uncultured thermophilic relatives of low-temperature crenarchaeotes that have been detected in hot springs (3) may branch more deeply, placing these thermophilic groups basal to cultivated crenarchaeotal lineages.
Small cold shock proteins were believed to be present only in bacteria and eukaryotes (18). To date, no cold shock genes have been found in the archaeal genomes that have been entirely sequenced. It was therefore surprising to find a gene encoding a cold shock protein on fosmid 4B7. Based on amino acid similarity, this putative cold shock protein resembles those of bacteria. The observation that no other sequenced archaeal genomes encode members of the small cold shock protein family raises the possibility of lateral gene transfer of this gene into cold-adapted archaea from bacteria. Several other genes present on the two marine archaeal fosmids may have been acquired from bacteria, for example, the genes for double-stranded beta-helix fold proteins, the SWI/SNF helicase, and peptide methionine sulfoxide reductase. Even more unexpectedly, we identified a gene coding for a C2 Zn finger protein that so far has been found only in eukaryotes.
Analysis of the combined 80 kb of sequence data has identified several genes indicative of metabolic pathways (Tables and ). However, the lack of known transporters on the two fosmids makes it difficult to predict which compounds the archaeal cells take up. Additional data obtained for C. symbiosum and from other techniques, such as stable isotope and natural radiotracer analyses (34) and microautoradiography and fluorescence in situ hybridization (23), should provide more insight into the potential metabolic traits of uncultivated marine archaea.
Within the marine planktonic crenarchaeotal clade in general, considerable genomic divergence exists. The 16S rRNA genes in fosmids 4B7 and 74A4 and in C. symbiosum all share greater than 94% sequence similarity. However, the regions surrounding the rRNA operons vary substantially, indicating extensive genome rearrangements and differences in gene content. These differences are also likely reflected in the phenotypic properties of the crenarchaeotes that occupy the different oceanic regions.
Variation among homologous protein coding genes from microbes that share highly similar (97%) 16S rRNA gene sequences has been reported for Prochlorococcus isolates derived from the same sample. Prochlorococcus isolates MED and SS120 shared 98% sequence similarity in their 16S rRNAs (44) but were only 76% identical based on their RNA polymerase C1 gene sequence (15). To our knowledge, however, genome variation among free-living, sympatric, uncultivated microbes has never been reported. Our data now provide a significant perspective on the extent of genome variation that can exist within a single population of free-living microbial cells that share identical (or nearly identical) rRNA gene sequences. Our data suggest that the observed seasonal maximum of planktonic crenarchaeotes in Antarctic waters (29) is composed of (minimally) four highly related, yet nonidentical, co-occurring strains or variants. Of course, because our procedures are labor and resource intensive, our library screening severely undersamples the actual population. Despite this undersampling, however, we did not recover any one dominant or identical genotype. Rather, we recovered identical or nearly identical rRNA phylotypes with significant differences in flanking genomic regions. Greater variation would be expected with a larger sample size. These data strongly suggest that even within a single population, a very large amount of genomic heterogeneity exists that is undetectable by 16S rRNA sequence variation.
Presumably, genomic microheterogeneity can eventually generate physiological diversity. Even small variations among protein coding genes, such as those found here in sympatric archaeal cells that share identical or nearly identical rRNA gene sequences, could provide a selective advantage to the different genotypes under fluctuating environmental conditions. Such microvariations could confer greater fitness on the population as a whole under various environmental conditions, relative to any individual clonal phenotype. Our data strongly suggest that naturally occurring populations of bacteria and archaea can be viewed as nonclonal populations that harbor tremendous allelic variation.