Many computational pipelines exist for analyzing the taxonomy and phylogeny of 16S rRNA gene sequence data generated by pyrosequencing and Illumina platforms (Schloss et al., 2009; Caporaso et al., 2010; Giongo et al., 2010). The approach outlined in this study represents a major methodological improvement for characterizing the phylogenetic distribution of unclassified microbial diversity detected by short-read, high-throughput sequencing, a fraction that often overlaps with the rare biosphere. In particular, we report a high success rate (7 of 8 primers) for the specific amplification of putatively novel lineages that contribute less than 1.0 × 10⁻⁴% of all sequences in an environmental DNA extraction. Additionally, the recovery of highly novel clades, particularly UL5, UL9 and UL13, suggests that directed targeting of phylogenetic novelty identified in high-throughput sequencing projects is feasible. This protocol was further validated by the exclusive amplification and recovery of positive control sequences. Investigations specifically targeting novel phylogenetic lineages offer the potential not only to increase the breadth of taxonomic knowledge, but also to provide an additional tool for investigating deep-branching lineages of life in general (Sogin et al., 2006; Wu et al., 2011; Youssef et al., 2012).
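To make the abundance threshold concrete: 1.0 × 10⁻⁴% corresponds to a fraction of 1.0 × 10⁻⁶, or fewer than one sequence per million. The minimal Python sketch below computes relative abundance from read counts; the function names and read counts are hypothetical illustrations, not values from this study.

```python
def relative_abundance_percent(lineage_reads: int, total_reads: int) -> float:
    """Percentage of all reads in a library assigned to one lineage."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * lineage_reads / total_reads


# Threshold quoted in the text, expressed in percent (a fraction of 1e-6).
THRESHOLD_PERCENT = 1.0e-4


def is_below_threshold(lineage_reads: int, total_reads: int) -> bool:
    """True if the lineage is strictly rarer than the quoted threshold."""
    return relative_abundance_percent(lineage_reads, total_reads) < THRESHOLD_PERCENT


# Hypothetical counts: one read per million sits exactly at the threshold,
# one read in two million falls below it.
print(relative_abundance_percent(1, 1_000_000))  # 0.0001
print(is_below_threshold(1, 2_000_000))          # True
```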
The lineages UL5, UL9 and UL13 each represented significant, repeatable and highly novel phylogenetic groups, demonstrating the value of this approach. Each group was monophyletic and was either completely unique to this study or substantially extended the uncultured sequence data available in GenBank. One of the two internally consistent clades of UL5 sequences was classified as BRC1 by the RDP classifier. However, BLASTN analysis was ambiguous, showing only 92% identity with the BRC1 clade. Regardless, because the sequence identity of UL5 to BRC1 sequences falls within the range observed among existing BRC1 lineages (Derakshani et al., 2001) and the two UL5 clades are fully supported as monophyletic, they likely represent two additional species within the phylotype-defined BRC1, significantly adding to its known diversity.
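The 92% figure is a pairwise percent identity. One common definition, sketched below in Python under the assumption that the two sequences are already aligned to equal length, divides identical positions by the number of alignment columns, where a gap in either sequence counts toward the length but never as a match. BLASTN reports identity over its own local alignment, so this sketch will not necessarily reproduce a BLASTN value; the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions divided by alignment columns, as a percentage.

    Columns that are gaps in both sequences (an artifact of multiple
    alignments) are dropped; a gap in only one sequence is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # With gap-gap columns removed, a == b implies a non-gap match.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical 25-column alignment with two substitutions: 23/25 = 92%.
query = "ACGT" * 6 + "A"
subject = "ACGA" + "ACGT" * 4 + "ACCT" + "A"
print(percent_identity(query, subject))  # 92.0
```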
Extreme environments tend to contain unique cyanobacterial populations. Specifically, polar environments harbor species with high tolerance to UV radiation (Quesada et al., 1999; George et al., 2001) and temperature extremes (Tang et al., 1997). UL9 primers amplified 16S rRNA genes from two distinct cyanobacterial species, with strong bootstrap support for the sister relationship between the two sequences and full resolution as sister to all Cyanobacteria. Based on the phylogenetic resolution and BLASTN results, this clade appears to be a novel sister group to Gloeobacter violaceus, significantly adding to our understanding of the early evolution of the Cyanobacteria. Gloeobacter is a monospecific lineage representing an early radiation of Cyanobacteria and has several features highly divergent from other cyanobacterial species, including the absence of thylakoids (Nakamura and Kaneko, 2003; and references therein). At minimum, UL9 sequences indicate the presence of a second clade of Cyanobacteria that, along with Gloeobacter, diverged very early in cyanobacterial evolution. Furthermore, the single observation of a near-identical sequence in Antarctica provides an interesting case of microbial dispersal: the GenBank sequence nearly identical to the larger UL9 clade was recovered from a moss pillar within a freshwater lake in eastern Antarctica. Given the association of moss and lichen in Antarctica (Victoria et al., 2006), this sequence potentially derives from a lichen photobiont. As lichens are primary colonizers of tundra, one possible source of this sequence is a previously unobserved cyanobacterial photobiont of lichens. Such an association would help explain the bipolar distribution of this sequence, as lichen species are easily dispersed and tolerant of environmental stresses such as desiccation. The near-complete absence of this lineage from CM2BL libraries is notable, suggesting that it is not broadly distributed and arguing for animal-mediated (for example, by birds) or anthropogenic dispersal.
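Bootstrap support, as cited for the UL9 sister relationship, is typically estimated by resampling alignment columns with replacement, inferring a tree from each pseudo-replicate and reporting the percentage of replicates that recover a given clade. The Python sketch below covers only the resampling and counting steps; tree inference is assumed to be done by a separate phylogenetics program, and the toy alignment, sequence names and clade sets are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Build one pseudo-replicate by resampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices with replacement; every sequence uses the same draw.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    if not replicate_clades:
        raise ValueError("no replicates supplied")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment with hypothetical sequence names.
alignment = {
    "seq1": "ACGTTGCAAC",
    "seq2": "ACGTTGCTAC",
    "seq3": "ACGATGCAGC",
}
rng = random.Random(42)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]

# Clade sets would normally come from trees inferred on each replicate;
# these three are made up to show the counting step only.
replicate_clades = [
    {frozenset({"seq1", "seq2"})},
    {frozenset({"seq1", "seq2"})},
    {frozenset({"seq1", "seq3"})},
]
print(round(clade_support(replicate_clades, frozenset({"seq1", "seq2"})), 1))  # 66.7
```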
The set of sequences with the highest taxonomic novelty recovered in this study, UL13, was so divergent that ecological inferences are difficult, although intriguing. Although UL13 sequences resolved as sister to the obligate intracellular parasite Rickettsia, phylogenetic support tended to be weak, likely because of the magnitude of sequence divergence. The closest BLASTN matches to mitochondrial sequences are not surprising given the phylogenetic placement near Rickettsia and within algal mitochondria. Related intracellular parasites and the SAR11 clade likely had a role in the evolution of mitochondria (Thrash et al., 2011; Rodríguez-Ezpeleta and Embley, 2012). Relatively few mitochondrial 16S rRNA gene sequences are available in public sequence databases, resulting in large gaps in our knowledge of bacterial 16S rRNA gene sequence data. An analysis relying exclusively on sequence divergence and non-phylogenetic classification schemes, or on poor taxon sampling, would have incorrectly inferred bacterial novelty within the Rickettsiales instead of within unknown mitochondrial diversity. This uncharacterized mitochondrial 16S rRNA gene sequence diversity should be addressed, as microbial diversity studies tend not to correctly account for sequences of organellar origin.
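One practical safeguard against the misinterpretation described above is to flag any query whose closest reference matches include an organellar sequence, so that it receives phylogenetic review before being reported as novel bacterial diversity. The Python sketch below assumes a hypothetical hit record and semicolon-delimited taxonomy strings; it is an illustration rather than the procedure used in this study, and it can only help if the reference database actually contains mitochondrial and chloroplast sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    ref_id: str       # hypothetical reference identifier
    identity: float   # percent identity between query and reference
    ref_lineage: str  # semicolon-delimited taxonomy of the reference (assumed format)


# Terms assumed to mark organellar references in the taxonomy strings.
ORGANELLE_TERMS = ("mitochondria", "chloroplast")


def is_organellar_lineage(lineage: str) -> bool:
    """True if any rank in the lineage string names an organelle."""
    ranks = [rank.strip().lower() for rank in lineage.split(";")]
    return any(term in rank for rank in ranks for term in ORGANELLE_TERMS)


def flag_possible_organelle(hits: list[Hit], top_n: int = 5) -> bool:
    """True if any of the top_n hits, ranked by identity, is organellar."""
    ranked = sorted(hits, key=lambda h: h.identity, reverse=True)[:top_n]
    return any(is_organellar_lineage(h.ref_lineage) for h in ranked)


# Toy example: a divergent query whose best hit is mitochondrial even though
# a Rickettsiales reference scores almost as well.
hits = [
    Hit("ref_A", 78.0, "Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Rickettsiaceae"),
    Hit("ref_B", 80.5, "Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;Mitochondria"),
]
print(flag_possible_organelle(hits))      # True
print(flag_possible_organelle(hits[:1]))  # False
```

A flag here is a prompt for closer inspection, not a classification: a query with only weak organellar hits may still be genuinely novel bacterial diversity.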
It is possible that the rare and highly divergent 16S rRNA gene sequences amplified in this study do not represent bacterial species active in the ecosystem, and instead correspond to pseudogenes, dormant organisms or other such components. The fact that these sequences aligned successfully to valid 16S rRNA gene structures suggests that they are bona fide 16S rRNA genes. Unfortunately, activity or metabolism cannot be inferred from single-gene, DNA-based data. In addition, directed amplification did not necessarily recover members of potentially unique clades in the proportions present in the environment or observed in high-throughput sequencing data. This is not surprising, but it does indicate that this approach should not be used to explore diversity relationships within the rare biosphere; rather, it should be used to further explore the evolutionary history and phylogenetic novelty of species constituting rare or uncharacterized groups.
The majority of ULs amplified here occurred at low relative abundance in Alert soils. This low abundance suggests that their genes would be poorly represented in metagenomic libraries constructed from this and similar soil sites. Given the magnitude of phylogenetic novelty observed, these organisms likely also have highly divergent genomes that would be valuable targets for cell sorting and inclusion in The Microbial Earth Project (http://genome.jgi.doe.gov/programs/bacteria-archaea/MEP/index.jsf). With near full-length 16S rRNA gene sequences available, existing soil libraries can be further probed for these organisms in attempts to isolate and amplify genomic material. This technique would therefore have applications in bioprospecting, specifically by targeting phylogenetically unique lineages. The number of highly divergent lineages observed here, combined with the high proportion of sequences with unknown taxonomy from Alert soils, suggests that polar environments should be further explored for microbial diversity. This will not only improve our understanding of the ecology of these systems as they face an uncertain future, but also increase our knowledge of microbial diversity and organellar evolution.
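Probing existing sequence libraries for these lineages could begin with an in silico screen for primer binding sites. The Python sketch below scans both strands of a target sequence for a degenerate primer written with IUPAC nucleotide codes, allowing a set number of mismatches. The primer and target are hypothetical and are not the UL primers used in this study; reverse-strand positions are reported as indices into the reverse-complemented target.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each code, including degenerate ones (e.g. R <-> Y, B <-> V).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer: str, window: str) -> int:
    """Count positions where the target base is not allowed by the primer code."""
    return sum(1 for p, b in zip(primer, window) if b not in IUPAC[p])


def find_primer_sites(primer: str, target: str, max_mismatches: int = 0) -> list[tuple[str, int]]:
    """Return (strand, start) for every window within max_mismatches of the primer."""
    primer, target = primer.upper(), target.upper()
    k = len(primer)
    sites = []
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for i in range(len(seq) - k + 1):
            if mismatches(primer, seq[i:i + k]) <= max_mismatches:
                sites.append((strand, i))
    return sites


# Hypothetical degenerate primer (Y = C or T) and target fragment.
target = "TTGACGGCTAGCAATCCGTAAGG"
print(find_primer_sites("GGCYAGCAAT", target))  # [('+', 5)]
```

A realistic screen would also check that a forward and reverse primer pair in the correct orientation at a plausible amplicon length; this sketch only locates single primer sites.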