Estimating the diversity of life is a persistent challenge in biology. In microbiology, the task is complicated by the fact that the subjects of the census are not visible to the naked eye or easily differentiated morphologically, and they are estimated to number over 10³⁰ individual bacteria worldwide (30). The properties of microorganisms necessitate the use of indirect analysis, involving culturing or 16S rRNA gene sequence analysis, to conduct a census of prokaryotes. Previous estimates of the number of bacterial species in the world range from 10⁷
). Although it is well accepted that the number of prokaryotic species in the world is immense and that our efforts to sample them have been inadequate, there has been no systematic analysis to assess how well we have sampled the bacterial world.
Estimating microbial phylogenetic diversity is intrinsically interesting to many microbiologists, but it also plays a crucial role in the functional analysis of microbial communities. Knowledge of the extent of phylogenetic diversity can indicate how many functional groups have not yet been accounted for. For example, 16S rRNA diversity surveys of terrestrial and marine ecosystems revealed that gene sequences belonging to the Acidobacterium and the SAR11 clade of the α-Proteobacteria, respectively, represented more than 25% of 16S rRNA sequences. These results have led to the development of improved culturing methods (13). Likewise, Archaea were long thought to exist solely in “extreme” environments, but 16S rRNA gene sequencing analysis indicates that Crenarchaeota live in temperate soils (3) and on the roots of plants (23). Although it is impossible to elucidate function based solely on phylogeny, study of certain groups will be particularly fruitful for the discovery of new examples of certain functions such as antibiotics in the actinobacteria and light-harvesting complexes in the cyanobacteria. It is clear that we are at a relatively early stage in sampling global species richness, only beginning the exploration of ecologically important but unidentified groups of microorganisms.
Since Woese and Fox (31) first proposed the 16S rRNA gene as a phylogenetic tool to describe the evolutionary relationships among organisms and Pace et al. (17) described its use for classifying unculturable microorganisms in the environment, over 78,000 16S rRNA gene sequences have been deposited in GenBank (19). These include sequences isolated from cultured bacteria (29) and those amplified directly from environmental samples without prior culturing (17). Sequences obtained by direct amplification from the environment provide the only information available for 99% of the prokaryotes in most natural communities (1). Recent studies have shown that there are at least 50 bacterial phyla, and half of them are composed entirely of uncultured bacteria (9). An additional three phyla contain less than 10% cultured members and six contain more than 90% cultured members (Fig. 1).
FIG. 1. Phylogenetic tree of the Bacteria showing established phyla (italicized Latinized names) and candidate phyla described previously (9, 10, 19), using the November 2003 ARB database (http://arb-home.de) with 16,964 sequences that are over 1,000 bp.
We sought to answer a pressing question: how complete is the census of prokaryotes as represented by the 16S rRNA sequence database? The answer will indicate which groups have been well sampled and which have not, providing guidance to future studies directed toward discovering new forms of life. We constructed rarefaction curves, which indicate the completeness of sampling for each phylum of Bacteria and for all Bacteria, using the curated 16S rRNA gene accessions in the Ribosomal Database Project-II database (5). We present evidence that the traditional approach of blindly sampling interesting environments is limiting our attempt to census bacterial diversity. Based on the analysis presented here, we suggest complementing blind sampling with a more focused approach predicated on an assessment of which methods, environments, or taxonomic groups are most likely to yield new species in the future.
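
To make the idea of a rarefaction curve concrete, the following is a minimal sketch in Python. It is not the procedure used in this study, whose details are not given in this section. It assumes the classical analytical (hypergeometric) rarefaction formula, and the function name `rarefaction_curve` and the example counts are hypothetical. Given the number of sequences assigned to each taxon, it returns the expected number of distinct taxa observed in a random subsample of n sequences.

```python
from math import comb


def rarefaction_curve(taxon_counts, step=1):
    """Return (n, expected taxa) pairs for subsamples of n sequences.

    Assumes analytical rarefaction: a taxon with c sequences is missed by a
    subsample of size n with probability C(N - c, n) / C(N, n), where N is
    the total number of sequences.
    """
    total = sum(taxon_counts)
    curve = []
    for n in range(0, total + 1, step):
        denom = comb(total, n)
        # comb(total - c, n) is 0 when n > total - c, so the taxon is
        # certain to appear in the subsample in that case.
        expected = sum(1 - comb(total - c, n) / denom for c in taxon_counts)
        curve.append((n, expected))
    return curve


# Hypothetical data: four taxa with 5, 3, 1, and 1 sequences each.
example_counts = [5, 3, 1, 1]
for n, expected in rarefaction_curve(example_counts):
    print(f"{n:2d} sequences -> {expected:.2f} taxa expected")
```

A curve that flattens as n grows suggests that further sampling would add few new taxa, whereas a curve that is still rising at the full sample size suggests that many taxa remain unsampled.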