During the last 30 years, technological advances in nucleic acid sequencing have led to revolutionary changes in our perception of the evolutionary relationships among all species as visualized in the tree of life. The first revolution was spawned by the work of Carl Woese and colleagues who, through sequencing and phylogenetic analysis of fragments of rRNA molecules, demonstrated how the diverse kinds of known cellular organisms could be placed on a single tree of life. Most significantly, their analyses revealed the existence of a third major branch on the tree; the Archaea (then referred to as Archaebacteria) took their place along with the Bacteria and the Eukaryota. Several factors make rRNA genes exceptionally powerful for this purpose, perhaps the most important being that highly conserved, homologous rRNA genes are present in all cellular lineages. To this day, analyses of rRNA genes continue to clarify and extend our knowledge of the evolutionary relationships among all life forms.
For microbial organisms, this approach was restricted to the minority that could be grown in pure culture in the laboratory until Norm Pace and colleagues showed that one could sequence rRNAs directly from environmental samples. Initially, the methodology was cumbersome. However, this changed with the development of the polymerase chain reaction (PCR) methodology. PCR generates many copies of a target segment of DNA, which in turn facilitates cloning and sequencing of that segment. However, delineation of the segment to be amplified requires primers, i.e., short segments of DNA whose nucleotide sequence is complementary to the DNA flanking the target. Because rRNA genes contain regions that are very highly conserved, “universal primers” can be used for PCR amplification of those genes even in environmental samples. Thus, in principle, one can use PCR to amplify the rRNA genes from all organisms in a sample in a culture-independent manner.
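The way a primer pair delineates the amplified segment can be illustrated with a toy in-silico sketch. The sequences and the function names (`reverse_complement`, `amplify`) are hypothetical illustrations, not real rRNA or primer sequences, and exact string matching stands in for primer annealing, which in practice tolerates some mismatches.

```python
# Toy in-silico PCR: find the segment of a template bracketed by two primers.
# All sequences used with this sketch are made up for illustration only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplify(template: str, forward: str, reverse: str):
    """Return the amplicon from the top strand, or None if a primer site is absent.

    `template` is the top strand written 5'->3'. The forward primer has the
    same sequence as the top strand at the left flank. The reverse primer
    anneals to the top strand, so its reverse complement is what appears in
    the top strand at the right flank.
    """
    start = template.find(forward)
    if start == -1:
        return None  # forward primer site missing or mutated
    right_site = reverse_complement(reverse)
    end = template.find(right_site, start + len(forward))
    if end == -1:
        return None  # reverse primer site missing or mutated
    return template[start:end + len(right_site)]


if __name__ == "__main__":
    template = "TTGACCGTAAGGCTTACGATCCGGAATT"
    print(amplify(template, forward="GACCGT", reverse="CCGGAT"))
```

In this simplified model, a template whose primer site differs by even one base yields no product, which is a crude picture of why primers designed against conserved regions are needed to recover genes from many lineages at once.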
PCR-based studies have now characterized microbes from diverse habitats and have provided many fundamental new insights into microbial diversity. For example, we now realize that, in most environments, the culturable microbes represent but a small fraction of those present. Furthermore, phylogenetic analysis of the rRNA genes thus found enables one to assign those sequences to groups within the bacterial, archaeal, or eukaryotic domains of life (or to viral groups), a process known as phylotyping. This has revealed the presence of dozens of major, but previously undiscovered, lineages that have no cultured members. With the development of considerably improved sequencing technologies, rRNA PCR surveys have become a routine tool for characterization of microbial communities.
Although rRNA PCR studies have provided a major foundation for today's environmental microbiology, this approach is not without its limitations. Notably, the “universal” primers are not truly universal. Even the best-designed ones fail to amplify the targeted genes in some lineages while preferentially amplifying those in others. Furthermore, phylogenetic trees based on rRNA sequences may not accurately reflect the evolutionary history of the source organisms due to the occurrence of lateral gene transfer, different rates of evolution in different lineages, or similarities produced by the convergent evolution of rRNA sequences from distantly related species. Generating alignments of rRNA genes can also be challenging. In addition, because the copy number of rRNA genes varies among species, the number of sequences observed in an environment cannot be used to directly infer the number of cells of any particular type. For these and other reasons, it is generally considered important to combine conclusions derived from rRNA sequence analysis with other types of information (e.g., microscopy or analysis of other macromolecules). In terms of sequence information, this would mean generating data for other genes. This can be readily achieved for culturable organisms; phylogenetic analysis of protein-coding genes, and even phylogenomic analysis of whole genomes, has become a standard procedure. Unfortunately, despite considerable effort, no one has developed a robust PCR-based method for cloning and sequencing protein-coding genes from unknown uncultured organisms. (When reasonably detailed information about the taxonomy of the targeted uncultured organisms is available, PCR of protein-coding genes can be made to work reasonably well.) A major inherent obstacle in PCR of protein-coding genes from unknown organisms is the degenerate nature of the genetic code. Even if the amino acid sequence of a highly conserved protein domain were identical across species, the primers for PCR amplification would have to be degenerate. Thus, although PCR surveys of protein-coding genes have revealed interesting findings, they are clearly somewhat limited in scope.
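The effect of code degeneracy on primer design can be made concrete with a small sketch. It counts how many distinct nucleotide sequences could encode a given peptide under the standard genetic code, which is the number of sequences a primer set would need to cover to match every possible encoding. The function name `coding_sequence_count` and the example peptides are hypothetical, chosen only for illustration.

```python
# Count the distinct nucleotide sequences that can encode a peptide under the
# standard genetic code. Example peptides are made up for illustration.
from math import prod

# Number of synonymous codons per amino acid (standard genetic code, 61 sense codons).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def coding_sequence_count(peptide: str) -> int:
    """Return how many distinct DNA sequences encode `peptide`.

    Each residue contributes independently, so the total is the product of
    the per-residue codon counts. Raises KeyError on a non-standard letter.
    """
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide.upper())


if __name__ == "__main__":
    for peptide in ("MWDEG", "LSRAG"):
        print(peptide, coding_sequence_count(peptide))
```

Under these assumptions, a five-residue stretch rich in methionine and tryptophan has few possible encodings, whereas one rich in leucine, serine, and arginine has thousands. Primers covering the latter are therefore highly degenerate even when the protein sequence itself is perfectly conserved.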
Due to these factors, the community has faced something of a quandary regarding the characterization of uncultured organisms. Although rRNA analysis is extraordinarily powerful, the window it provides into the microbial world is clearly imperfect. It is possible that additional major branches in the tree of life might exist, branches that have been missed due to the limitations of rRNA PCR. Resolving this quandary required ways to clone rRNA genes without the biases introduced by PCR, as well as unbiased methods for obtaining data on other genes from uncultured species. Fortunately, both are now provided by metagenomic analysis. Metagenomics, broadly defined, is the sequencing of portions of the genomes of all organisms present in an environmental sample. It generates sequence data not only for rRNA genes, but for all sequences from the genomes of all organisms present, in a relatively unbiased manner (or at least with a different bias than that inherent in PCR).
The application of metagenomic analysis has accelerated the already rapid rate of advancement in the study of uncultured microbes that began with the advent of rRNA analysis. Metagenomics has now enabled the phylogenetic characterization of many entire communities. For example, our analysis of the Sargasso Sea metagenomic data effectively used both protein-coding and rRNA sequences for phylotyping, in much the same way as had been done with rRNA PCR data. Furthermore, by including protein-coding genes, metagenomics can more accurately predict the biology of the organisms sampled, thus disclosing not only who is out there, but also what they are doing.
Previous usage of metagenomic data for phylogenetic typing of organisms focused primarily on assigning metagenomic sequences to specific known groups of organisms. Here we report our exploration of the potential use of metagenomic data to answer a simpler, but perhaps more fundamental, question: Can we identify novel rRNAs or protein-coding genes that suggest the existence of additional major branches on the tree of life?
The answer, surprisingly, is yes. We present here our findings, along with some likely explanations—including the possibility that there are indeed other major branches on the tree of life yet to be characterized.