Planktonic crenarchaeotes are present in high abundance in Antarctic winter surface waters, and they also make up a large proportion of total cell numbers throughout deep ocean waters. To better characterize these uncultivated marine crenarchaeotes, we analyzed large genome fragments from individuals recovered from a single Antarctic picoplankton population and compared them to those from a representative obtained from deeper waters of the temperate North Pacific. Sequencing and analysis of the entire DNA insert from one Antarctic marine archaeon (fosmid 74A4) revealed differences in genome structure and content between Antarctic surface water and temperate deepwater archaea. Analysis of the predicted gene products encoded by the 74A4 sequence and those derived from a temperate, deepwater planktonic crenarchaeote (fosmid 4B7) revealed many typical archaeal proteins but also several proteins that so far have not been detected in archaea. The unique fraction of marine archaeal genes included, among others, those for a predicted RNA-binding protein of the bacterial cold shock family and a eukaryote-type Zn finger protein. Comparison of closely related archaea originating from a single population revealed significant genomic divergence that was not evident from 16S rRNA sequence variation. The data suggest that considerable functional diversity may exist within single populations of coexisting microbial strains, even those with identical 16S rRNA sequences. Our results also demonstrate that genomic approaches can provide high-resolution information relevant to microbial population genetics, ecology, and evolution, even for microbes that have not yet been cultivated.
Molecular phylogenetic surveys of rRNA genes have altered the perspective on naturally occurring microbial diversity and distribution (for a review, see references 11, 19, and 32). Despite a greater understanding of microbial identity, distribution, and abundance, rRNA-based gene surveys have provided little information about the biological properties of many planktonic microbes, especially groups for which there are no cultivated representatives. In addition, although rRNA microheterogeneity (variation within highly related rRNA sequence clusters) has been observed in virtually all taxa and environments sampled via rRNA gene surveys, the significance of this phenomenon remains uncertain. Recent advances in genomic sequencing techniques and the development of methods for cloning large genome fragments into fosmid (35, 41, 42, 45) or bacterial artificial chromosome (5, 6, 38) vectors now provide the means to characterize the gene content (41), metabolic potential (5), and population genetics (41) of uncultivated microorganisms, otherwise known solely by an rRNA sequence. The utility of such genomic approaches and their applicability to diverse questions in microbial ecology have only begun to be explored and exploited.
Members of the Archaea (48) are much more diverse and widespread than previously suspected. Representatives have now been detected in terrestrial environments, marine and lake sediments, and temperate ocean waters and polar seas (for a review, see reference 10). Marine planktonic archaea have been shown to occur in high relative abundance in the oceanic subsurface (13, 26, 27) and to dominate the prokaryotic fraction in the mesopelagic zone of the Pacific Ocean (20). Planktonic archaea also reach a relative seasonal maximum in winter Antarctic waters, approaching 10 to 30% of the total planktonic microbial population (12, 14, 28, 29). To gain additional information on yet-uncultivated Antarctic archaea, we constructed, by use of a fosmid vector (42), a recombinant DNA library that contained inserts of ~40 kb from surface water picoplankton collected near Palmer Station, Antarctica, in late winter. Planktonic crenarchaeotal genome fragments that contained rRNA genes and originated from the same population were isolated and compared. These within-population genome comparisons yielded high-resolution information on genomic variations of uncultivated, sympatric archaeal cells. The entire sequences of one Antarctic crenarchaeotal clone (fosmid 74A4) and one temperate water subsurface crenarchaeotal clone (fosmid 4B7) (42) were also determined to compare the genomes of related archaea inhabiting different oceanic provinces. This analysis provided comparative information on more distantly related crenarchaeotes derived from two different oceanic provinces. Our results demonstrate that microbial population structure can be determined at high resolution by examining genome divergence among highly related but genetically distinct cohorts coexisting in the same population. Insights into genomic variation, as it relates to rRNA sequence variation, also can be derived from comparisons of more distantly related microbial species sampled from different geographic locales.
Coastal waters were collected near Palmer Station, Anvers Island, Antarctic Peninsula, in August 1996 during a period of high archaeal abundance (12). The samples were filtered by tangential flow filtration with an Amicon (Beverly, Mass.) DC-10 unit equipped with a 30,000-Da-cutoff polysulfone hollow-fiber cartridge. A total of 1,500 liters were concentrated to a volume of ~900 ml. The cells were collected by centrifugation (4°C, 38,900 × g, 1 h) as previously described (12). The bacterioplankton pellet was embedded in agarose plugs as previously described (42). DNA extraction, preparation of the fosmid library, and multiplex PCR screening by using archaeon-biased 16S rRNA oligonucleotide primers were carried out as previously described (42). PCR primers used for screening the library were the 16S rRNA oligonucleotide primers Ar20-F (TTC CGG TTG ATC CYG CCR G) (13) and Arch958R (TCC GGC GTT GAM TCC AAT T) (9) and the 23S rRNA oligonucleotide primer LS2445a-R (CCC YGG GGT ARC TTT TCT ST) (13).
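The screening primers listed above contain IUPAC ambiguity codes (Y, R, M, and S), so each primer stands for a small set of concrete sequences. The following is a minimal Python sketch, not the screening procedure used in this study, showing one way to expand such a primer into a regular expression and to count how many sequences it represents. The function names and the template fragment are hypothetical and serve only as an illustration.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_to_regex(primer: str) -> re.Pattern:
    """Turn a degenerate primer (spaces allowed) into a compiled regex."""
    bases = primer.replace(" ", "").upper()
    return re.compile("".join(IUPAC[b] for b in bases))


def degeneracy(primer: str) -> int:
    """Number of distinct sequences that the degenerate primer represents."""
    count = 1
    for b in primer.replace(" ", "").upper():
        count *= len(IUPAC[b].strip("[]"))
    return count


AR20_F = "TTC CGG TTG ATC CYG CCR G"  # primer sequence as given in the text
pattern = primer_to_regex(AR20_F)

# Hypothetical template fragment, made up for illustration only.
template = "AAATTCCGGTTGATCCTGCCAGAGG"
match = pattern.search(template)
print(degeneracy(AR20_F), match.start() if match else None)
```

The sketch searches only the strand as written; a reverse primer such as Arch958R would have to be matched against the reverse complement of the template.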
A subclone library was constructed from fosmid 74A4 with DNA partially digested with RsaI (42). In this library, the content of the fosmid was not randomly represented; therefore, a second library was constructed. This library was prepared with randomly sheared DNA as described by Kawata et al. (21), except that the DNA was sheared by passage through a microemulsifying 25-gauge needle. DNA was cloned by using vector pCR 2.1 and the original TA cloning kit (Invitrogen Corporation, Carlsbad, Calif.). Fosmid subclone plasmids were purified by using a Mini-Prep 24 machine (MacConnell Research Corporation, San Diego, Calif.) according to the manufacturer’s instructions. Nucleotide sequences (800 bp, on average) were determined by the dideoxy termination reaction with fluorescence-labeled M13 forward and reverse primers, a SequiTherm EXCel II sequencing kit (Epicentre, Madison, Wis.), and a model 4200 automated DNA sequencer (LI-COR, Lincoln, Nebr.). The strategy for the construction of the subclone library and for the determination of the sequence of fosmid 4B7 was as previously described (41). Contiguous sequences were assembled by using SEQUENCHER 3.1.1 software (Gene Codes Co., Ann Arbor, Mich.).
In-depth sequence analysis was based primarily on the use of the PSI-BLAST program (1) essentially as previously described (6). tRNAs were searched by using the tRNAscan-SE program (24). The BLAST-derived “e-values” (1) reported here take into account the statistics of database and local alignment size for the similarity scores obtained from local alignments. Signal peptides were predicted by using the SignalP program (30), and transmembrane segments were predicted by using the PHDhtm program (39, 40). Comparisons among 4B7, 74A4, and Cenarchaeum symbiosum (41) fosmids were performed with the BLAST 2 sequences program (43).
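The dependence of an e-value on database and query size can be illustrated with the standard Karlin-Altschul relation for a normalized (bit) score S', E = m × n × 2^(−S'), where m and n are the query and database lengths. The Python sketch below is a simplified illustration that ignores the edge-effect length corrections applied by the BLAST programs; the function name and the numbers in the example are assumptions and are not values from this study.

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S'), without effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers: a 300-residue query against a 10^8-residue database.
small_db = expect_value(bit_score=60.0, query_len=300, db_len=100_000_000)
large_db = expect_value(bit_score=60.0, query_len=300, db_len=200_000_000)
print(small_db, large_db)
```

In this simplified form, doubling the database size doubles the e-value for the same alignment score, which is why e-values from searches against different databases are not directly comparable.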
For distance and parsimony analyses of the inferred amino acid sequence of translation elongation factor 1-α (EF1-α), the program PaupSearch of the Wisconsin Package, version 10.0 (Genetics Computer Group, Madison, Wis.), was used.
Sequences reported in this study have been submitted to GenBank under the following accession numbers: AF393466, U40238, and AF393304 to AF393307.
An environmental fosmid library was constructed from microorganisms collected from late austral winter surface waters (−1.8°C) near Palmer Station, Antarctica, in 1996 (29). A total of 7,200 recombinants, each harboring 40 kb of Antarctic microbial DNA, were screened by using multiplex PCR (42) and archaeal-specific 16S rRNA primers. Six 16S rRNA gene-containing crenarchaeotal genome fragments were recovered in the library.
Five unique archaeal 16S rRNA-containing fosmids (15G10, 19H8, 31B2, 74A4, and 83A10) were identified by restriction fragment length polymorphism analysis (data not shown). Sequence analyses of the archaeal 16S rRNAs showed that all belonged to group I marine planktonic Crenarchaeota (9). The 16S rRNA gene sequences from fosmid clones 83A10 and 31B2 were identical to one another. One clone (15G10) contained an rRNA gene identical in sequence to a PCR-amplified 16S rRNA gene (ANTARCTIC 12) (14) that was recovered at the same site 3 years prior to the sampling reported here. The 16S rRNA gene sequence variation observed among the other Antarctic archaeal fosmids was limited, occurring at a total of four nucleotide residues within the 16S rRNA gene (Fig. 1 and 2). Relative to those in 83A10 and 31B2, 16S rRNAs in the other clones contained only two (74A4 and 15G10) or three (19H8) nucleotide sequence differences (Fig. 1).
To gain more insight into the variation within the coexisting archaeal phylotypes, we PCR amplified and sequenced a region containing an intergenic spacer (593 bp) and a portion (590 bp) of the glutamate semialdehyde aminotransferase (GSAT) gene found adjacent to the 16S rRNA gene in planktonic marine group I crenarchaeotes (41, 42). The data revealed that the GSAT gene and the spacer between the GSAT gene and the 16S rRNA gene were variable between the different sympatric archaeal genomes (Fig. 1). Most of the variation in the GSAT coding region occurred in the third coding position, not changing the amino acid composition of the coded protein (Fig. 2). Nevertheless, variation in the GSAT amino acid sequence was observed among sympatric archaea that differed by only two or three nucleotides in their 16S rRNA genes. Surprisingly, microvariation in the GSAT gene was also found between clones 31B2 and 83A10, clones that had identical 16S rRNA gene sequences (Fig. 1 and 2).
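The distinction drawn here between third-position changes that leave the protein unchanged and changes that alter the amino acid sequence can be made concrete with a short Python sketch. It is an illustration only and not the analysis performed in this study: it assumes two aligned, in-frame, gap-free coding sequences and the standard genetic code, and the two short sequences in the example are invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(seq_a: str, seq_b: str) -> dict:
    """Compare two aligned, in-frame, gap-free coding sequences codon by codon."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must have equal length divisible by 3")
    by_position = [0, 0, 0]          # nucleotide differences per codon position
    synonymous = nonsynonymous = 0   # differing codons, by effect on the protein
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        for k in range(3):
            if codon_a[k] != codon_b[k]:
                by_position[k] += 1
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return {"by_position": by_position,
            "synonymous_codons": synonymous,
            "nonsynonymous_codons": nonsynonymous}


# Invented example sequences (Ala-Gly-Lys-Leu versus Ala-Gly-Arg-Leu).
variant_a = "GCTGGAAAACTG"
variant_b = "GCCGGGAGACTG"
result = classify_differences(variant_a, variant_b)
print(result)
```

Counting differing codons rather than estimating substitution rates keeps the sketch short; a formal comparison would use an established method for synonymous and nonsynonymous rate estimation.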
The entire sequence of the 43.6-kb genome fragment contained on fosmid 74A4 was determined. Forty-nine predicted protein coding frames were identified on fosmid 74A4 by using the National Center for Biotechnology Information BLAST 2.0 program (1), allowing the prediction of protein structural features such as transmembrane segments and signal peptides. Of these predicted proteins, 28 showed significant similarity to the products of genes with known functions, allowing a clear functional prediction, and 7 proteins were homologs of other, uncharacterized proteins. The remaining 14 predicted proteins had no detectable homologs, but some of them were predicted to be either membrane or secreted proteins on the basis of the predicted corresponding structural features (Table 1). The majority of the encoded proteins showed the greatest sequence similarity to homologs from other archaea. No specific affinity was noted with homologs from the only completely sequenced crenarchaeotal genome, that of Aeropyrum pernix, but several proteins showed greatest similarity to homologs from the partially sequenced genome of another crenarchaeote, Sulfolobus (Table 1). The protein sequence of EF1-α that showed the highest similarity to EF1-α proteins from Crenarchaeota was used for phylogenetic analysis. Marine crenarchaeotal EF1-α contained a modified 11-amino-acid insert that is shared between eukaryotes and Crenarchaeota but is not found in Euryarchaeota (36). However, both distance and parsimony analyses of the amino acid sequence of EF1-α could not resolve the placement of marine crenarchaeotal EF1-α (data not shown). Affiliation with either euryarchaeotes or crenarchaeotes was not well supported by either method.
A partial analysis of a genome fragment from a marine group I crenarchaeote recovered from a depth of 200 m near the coast of Oregon (e.g., fosmid 4B7) was previously reported (42). To better compare the genome fragment of this subsurface archaeon to that of Antarctic fosmid 74A4, the entire 42-kb sequence of fosmid 4B7 was determined. Protein and rRNA coding genes from fosmid 4B7 are shown in Table 2. The average G+C contents for fosmids 74A4 and 4B7 were 32 and 34%, respectively, significantly different from the 55.6 and 57.1% reported for C. symbiosum (41). Genes on fosmids 74A4 and 4B7 appeared to be densely packed, as reported for other archaeal genomes, and homologs of archaeal genes were dispersed evenly over each of the inserts, strongly suggesting that the 74A4 and 4B7 inserts are contiguous genomic fragments derived from a marine archaeon. Several genes present on each of the analyzed marine archaeal fosmids showed a clear bacterial affinity (Tables 1 and 2; see also discussion below). We discuss below the predicted genes classified by functional categories.
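Average G+C content, as compared above between the fosmids and C. symbiosum, is a simple base-composition statistic. The Python sketch below shows one way to compute it for a whole insert and in windows along a sequence, which can help show whether composition is uniform across a fragment. The function names are hypothetical, the toy sequences are invented, and ambiguous bases are simply ignored; this is not the software used in this study.

```python
def gc_content(seq: str) -> float:
    """Percent G+C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def windowed_gc(seq: str, window: int, step: int) -> list:
    """G+C percentage of each full-length window along seq."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


# Invented toy sequences, for illustration only.
print(gc_content("ATGCATTTAA"))
print(windowed_gc("GGCCAATT", window=4, step=4))
```

For a 40-kb insert, a window of a few kilobases would be a typical choice, although the appropriate size depends on the question being asked.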
Homologs of well-characterized components of the transcription system were found only on fosmid 4B7. These include a predicted SWI/SNF family ATPase, transcription factor IIB, an RNA-binding cold shock protein, and a predicted AsnC family transcriptional regulator. Three genes from 74A4 encode different types of Zn finger or Zn ribbon proteins that also could function as transcriptional regulators. Of particular interest is a small C2H2 Zn finger protein that belongs to a family that so far has been identified only in eukaryotes. Both fosmids contain several genes involved in translation. These include a single tRNA gene on 4B7 and a typical crenarchaeotal rRNA operon (16S-spacer-23S) on both fosmids. Protein components of the translation apparatus encoded on these clones include a cysteinyl-tRNA synthetase which is highly similar to other archaeal cysteinyl-tRNA synthetases, EF1-α, and ribosomal protein S10 on 74A4 and elongation factor 2 (EF2) on 4B7. The genes for EF1-α and S10 form a cluster that so far has not been detected in any other genomes. Another protein potentially involved in translation is a predicted RNA helicase encoded on 4B7.
One DNA repair enzyme, DinB/UmuC, recently identified as a repair-associated DNA polymerase (46), was identified on 4B7 and is most similar to the Dbh protein found in Sulfolobus solfataricus (22). Two other typical archaeal enzymes involved in replication and repair, DNA ligase (ATP dependent) and tRNA intron endonuclease, were identified on 74A4.
Both marine archaeal fosmids encoded several proteins implicated in energy conversion, particularly fatty acid metabolism. These included 3-hydroxyacyl-coenzyme A (CoA) dehydrogenase, acyl dehydratase, a predicted CoA-binding protein, glucose-1-phosphate dehydrogenase, and some other, poorly characterized oxidoreductases. Only one protein, a periplasmic solute-binding protein homologous to those found in iron(III) ATP-binding cassette transporters, was clearly assigned as a protein involved in transport. Other membrane transporters were tentatively identified on the basis of transmembrane segment predictions.
It was noted previously that moderately thermophilic archaea, such as Methanobacterium thermoautotrophicum or Methanosarcina barkeri, encode classical molecular chaperones of the hsp70 (DnaK) and hsp40 (DnaJ) families, whereas archaeal hyperthermophiles do not have those proteins (25). In agreement with this trend, two predicted marine archaeal proteins, 74A4#31 and 4B7#19, contain the J domain of the hsp40 family of chaperones and parts of the heat shock chaperone DnaJ, which interacts with and stimulates the hydrolysis of ATP by the cognate DnaK proteins (47). Protein 74A4#31 also contains a ferredoxin domain that is predicted to bind iron or possibly other ions and might be functionally analogous to the Zn clusters that are present in bacterial and eukaryotic DnaJ proteins. A J domain-ferredoxin fusion has not been reported so far. In contrast, protein 4B7#19 is predicted to be a type I membrane protein in which the J domain is the C-terminal cytoplasmic portion. Another interesting pair of paralogous proteins are 74A4#15 and 4B7#24, which are predicted membrane-associated, collagenase-like, metal-dependent proteases. A cluster of three genes on 74A4 encodes three predicted enzymes of the double-stranded beta-helix fold that might possess a variety of enzymatic activities, for example, that of sugar-phosphate isomerase.
Protein and RNA gene organizations on fosmids 74A4 and 4B7 were compared to that of the C. symbiosum (variant A) fosmid (41) by using the BLAST 2 Sequences program (43). The 23S and 16S rDNA sequences were used as an alignment point for the sequences (Fig. 3). The typical crenarchaeotal rRNA operon (16S-23S) was shared by all three fosmids, and the operon was adjacent to the GSAT gene in all. Fosmids 74A4 and 4B7 shared two other regions in common. The first region includes an unknown open reading frame, the BirA protein gene, and a hypothetical metalloprotease gene. The second region represents an inversion between the two genomes and includes genes for a putative periplasmic-binding protein of an iron(III) ATP-binding cassette transporter and a neighboring hypothetical protein (Fig. 3). Fosmid 74A4 and the fosmid from C. symbiosum shared a region including hypothetical protein 02 and the product of ORF01, previously reported only for C. symbiosum (41), and a putative glucose dehydrogenase which was also reported only for C. symbiosum (GenBank accession number AAC62698).
This study was conducted to explore the use of large-scale genome sequence analysis to better describe the population genetics, genome content, and biological properties of naturally occurring, uncultivated microbial species. We focused on the description of an abundant but little understood component of marine plankton, the pelagic crenarchaeotes. Our approach facilitated (i) description of the structure and gene content of large genomic regions flanking the rRNA operon from uncultivated marine archaea, (ii) cross comparison of large genomic regions of allopatric marine crenarchaeotes (from Antarctica, deep temperate waters, and a marine sponge), and (iii) assessment of the genomic heterogeneity of sympatric crenarchaeotes that originate from the same population and that share identical or nearly identical rRNA sequences.
The 16S rRNA sequences of all five unique archaeal fosmids from the Antarctic picoplankton library were highly similar. One (15G10) was identical in sequence to a PCR-amplified 16S rRNA gene (ANTARCTIC 12) isolated from the same Antarctic waters in 1993 (3 years prior to the sampling of this report) (14). The sympatric archaeal 16S rRNA genes differed from one another by a maximum of three nucleotide substitutions over 1,418 nucleotide residues. However, the different restriction fragment length polymorphism patterns of the fosmids could not be explained solely by the distance between the rRNA operon and the cloning sites and suggested significant sequence divergence between these highly related variants. Since it was possible that the protein sequences and gene organization were less conserved than rRNA (as recently observed for natural variants of C. symbiosum) (41), we characterized the GSAT gene from the different clones. Microheterogeneity in the DNA sequence of the GSAT gene was observed for all clones, including 31B2 and 83A10, which were identical over the entire 16S rRNA gene. To our knowledge, this is one of the first studies that directly links 16S rRNA gene variation to heterogeneity in flanking protein coding genes in sympatric free-living microbes. Our study shows that within a single microbial population, considerable genomic variation exists, even among microbes with identical 16S rRNA gene sequences.
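To put the 16S rRNA figures in perspective, three substitutions over 1,418 aligned residues correspond to roughly 99.8% sequence identity. The short Python sketch below reproduces this arithmetic and counts mismatches between two aligned sequences; the function names and the toy sequences are hypothetical, and equal-length, gap-free alignments are assumed.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(differences: int, aligned_length: int) -> float:
    """Percent of aligned positions that are identical."""
    return 100.0 * (aligned_length - differences) / aligned_length


# Figures from the text: at most 3 substitutions over 1,418 residues.
print(round(percent_identity(3, 1418), 2))

# Invented toy alignment, for illustration only.
print(count_differences("ACGTACGT", "ACGTTCGA"))
```

At this level of rRNA identity the variants would usually be grouped as a single phylotype, which is why the flanking GSAT gene and spacer were more informative for distinguishing them.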
Sequence analysis of the 43- and 42-kb genome fragments derived from marine archaea from two different oceanic provinces showed many features typical of the domain Archaea. The majority of the genes identified, including those whose functions could not be predicted, most closely resembled archaeal protein genes (data not shown). Some features of these genome sequences, including the rRNA gene order, chromosomal organization, and nucleotide sequence of the genes for EF1-α and ribosomal protein S10, resembled those of other Crenarchaeota (7, 16, 17). The observation that the GSAT gene is located downstream of the ribosomal operon in all planktonic marine crenarchaeotes analyzed to date (41, 42; this study) also suggests some chromosomal organization common to marine group I crenarchaeotes. Specific gene sequences that we recovered might provide further insight into the relationship of marine crenarchaeotes to other cultivated species. For instance, EF1-α (EF-Tu in Bacteria) is a highly conserved protein that is found in all cellular organisms and that has proven extremely useful for global phylogenetic comparisons (2, 8, 37). Both distance and parsimony analyses of the EF1-α amino acid sequence derived from Antarctic fosmid 74A4, however, could not resolve its placement within the Crenarchaeota or Euryarchaeota (data not shown). Other important features of EF1-α are specific insertions and deletions among homologs that provide evidence for a specific evolutionary linkage between eukaryotes and crenarchaeotes. Antarctic crenarchaeotal EF1-α did contain an 11-amino-acid insertion (data not shown) that is characteristic of Eucarya and Crenarchaeota but not Euryarchaeota or bacteria (34). The sequence of the 11-amino-acid insertion of 74A4 most closely resembles the insertion of Pyrobaculum aerophilum, another crenarchaeote (four-amino-acid sequence difference; data not shown). 
This observation, together with the deep branching of planktonic archaeal 16S rRNA and of the EF2 amino acid sequence of fosmid 4B7 (42) in phylogenetic trees, could reflect a nonthermophilic origin of the crenarchaeotal subdivision. Alternatively, EF1-α homologs from as-yet-uncultured thermophilic relatives of low-temperature crenarchaeotes that have been detected in hot springs (3, 4) may branch more deeply, placing these thermophilic groups basal to cultivated crenarchaeotal lineages.
Small cold shock proteins were believed to be present only in bacteria and eukaryotes (18, 33). To date, no cold shock genes have been found in the archaeal genomes that have been entirely sequenced. It was therefore surprising to find a gene encoding a cold shock protein on fosmid 4B7. Based on amino acid similarity, this putative cold shock protein resembles those of bacteria. The observation that no other sequenced archaeal genomes encode members of the small cold shock protein family raises the possibility of lateral gene transfer of this gene into cold-adapted archaea from bacteria. Several other genes present on the two marine archaeal fosmids may have been acquired from bacteria, for example, the genes for double-stranded beta-helix fold proteins, the SWI/SNF helicase, and peptide methionine sulfoxide reductase. Even more unexpectedly, we identified a gene coding for a C2H2 Zn finger protein that so far has been found only in eukaryotes.
Analysis of the combined 80 kb of sequence data has identified several genes indicative of metabolic pathways (Tables 1 and 2). However, the lack of known transporters on the two fosmids makes it difficult to predict possible components being taken up by the archaeal cells. Additional data obtained for C. symbiosum and from other techniques, such as stable isotope and natural radiotracer analyses (34) and microautoradiography and fluorescence in situ hybridization (23, 31), should provide more insight into the potential metabolic traits of uncultivated marine archaea.
With regard to the marine planktonic crenarchaeotal clade in general, there exists considerable divergence and genome evolution. The 16S rRNA genes in fosmids 4B7 and 74A4 and C. symbiosum all share greater than 94% sequence similarity. However, the regions surrounding the rRNA operons vary substantially, indicating extensive genome rearrangements and various genome contents. These differences are also likely reflected in the phenotypic properties of the different crenarchaeotes that occupy the different oceanic regions.
Variation among homologous protein coding genes from microbes that share moderately similar (97%) 16S rRNA gene sequences has been reported for Prochlorococcus isolates derived from the same sample. Prochlorococcus isolates MED and SS120 shared 98% sequence similarity in their 16S rRNAs (44) but were only 76% identical based on their RNA polymerase C1 gene sequence (15). To our knowledge, however, genome variation among free-living, sympatric, uncultivated microbes has never been reported. Our data now provide a significant perspective on the extent of genome variation that can exist within a single population of free-living microbial cells that share identical (or nearly so) rRNA gene sequences. Our data suggest that the observed seasonal maximum of planktonic crenarchaeotes in Antarctic waters (29) is composed of (minimally) four highly related, yet nonidentical, co-occurring strains or variants. Of course, due to the labor and resource intensiveness of our procedures, our library screening procedure severely undersamples the actual population. Despite this undersampling, however, we did not recover any one dominant or identical genotype. Rather, we recovered identical or nearly identical rRNA phylotypes with significant differences in flanking genomic regions. Greater variation would be expected to be observed with larger sample size. These data strongly suggest that even within a single population, a very large amount of genomic heterogeneity exists that is undetectable by 16S rRNA sequence variation.
Presumably, genomic microheterogeneity can generate, eventually, physiological diversity. Even small variations among protein coding genes, such as those found here in sympatric archaeal cells that share identical or nearly identical rRNA gene sequences, could provide a selective advantage to the different genotypes under fluctuating environmental conditions. Such microvariations could confer greater fitness to the population as a whole under various environmental conditions, relative to any individual clonal phenotype. Our data strongly suggest that naturally occurring populations of bacteria and archaea can be viewed as nonclonal populations that harbor tremendous allelic variation.
We thank Chris Preston for advice and helpful discussions.
This work was supported by NSF grants OPP94-18442 and OCE0001619 and the David and Lucile Packard Foundation to E.F.D. O.B. was supported by a fellowship from the European Molecular Biology Organization. D.C.B., R.V.S., R.A.F., and J.L.S. were supported by Diversa Corporation.