Diversity is the hard currency of ecologists. Various statistics have been developed for summarizing the diversity of an ecological community. A commonly adopted summary statistic is the Shannon-Wiener index, $H = -\sum_i p_i \ln p_i$, where $p_i$ is the relative frequency of the $i$th species. In addition, species richness (the number of different species) is often reported, and recent work emphasizes the importance of accurate estimates of species richness when describing ecological communities and the processes that affect community composition and ecosystem function (5). The significance of diversity is often inferred by comparing communities characterized from different environments. Typically, such comparisons rely on standard measures of overlap, including the percentage of species shared by two communities or similarity indices. One such index is Sørensen's index, $S = S_{12}/[0.5(S_1 + S_2)]$, where $S_{12}$ is the number of species common to both sites and $S_i$ is the number of species found at site $i$.
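As a concrete illustration of the two indices defined above, the following minimal Python sketch computes the Shannon-Wiener index, species richness, and Sørensen's index. The function names and the species counts are hypothetical and were chosen only for illustration; they are not data from any of the cited studies.

```python
import math

def shannon_index(counts):
    """Shannon-Wiener index H = -sum(p_i * ln p_i), skipping zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def sorensen_index(site1, site2):
    """Sorensen's index S = S12 / (0.5 * (S1 + S2)) from two species lists."""
    a, b = set(site1), set(site2)
    return len(a & b) / (0.5 * (len(a) + len(b)))

# Hypothetical abundance tables (species name -> number of individuals).
site1 = {"sp_A": 10, "sp_B": 10, "sp_C": 10, "sp_D": 10}
site2 = {"sp_C": 5, "sp_D": 5, "sp_E": 30}

h1 = shannon_index(site1.values())   # four equally common species: H = ln 4
h2 = shannon_index(site2.values())   # uneven community: lower H
richness1 = len(site1)               # species richness at site 1
richness2 = len(site2)               # species richness at site 2
s = sorensen_index(site1, site2)     # 2 shared species / (0.5 * (4 + 3))

print(f"H1 = {h1:.3f}, H2 = {h2:.3f}, Sorensen = {s:.3f}")
```

Note that both indices depend entirely on how individuals are assigned to species; neither uses any information about how different the species are from one another.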
A limitation of traditional statistics for describing and comparing diversity is that species (or operational taxonomic units [OTUs]) are defined inconsistently. For instance, Kroes et al. (6) defined an OTU as a 16S ribosomal DNA (rDNA) sequence group in which sequences differed by less than 1%. By contrast, the definition of McCaig et al. (11) included sequences that were less than 3% different, and other studies have used 5% as the magic number. The lack of consensus limits the comparative utility of statistics based solely on identification of species (or OTUs). A second, and perhaps more important, limitation of the standard statistics of diversity is that OTUs are counted equivalently even though some may be highly divergent and phylogenetically unique, whereas others may be part of a closely related group of species and are therefore phylogenetically redundant (4). The contrast can be illustrated by comparing two hypothetical communities in which the numbers of species, the richness profiles of species, and the rarefaction profiles are identical but which differ in the magnitude of phylogenetic diversity (i.e., the degree of divergence among the sampled sequences). Standard ecological statistics of diversity would miss the genetic difference between the two communities, and ecologists would most likely consider the two communities equally diverse when, in fact, one community harbors more genetic diversity (or disparity) than the other. Because genetic variation and phenotypic variance often are positively correlated in populations of animals (12), plants (7), and microbes (15), descriptions of microbial communities based on DNA data should include information about diversity and disparity. This is especially important in light of studies demonstrating an association between ecosystem function and community diversity (14, 28).
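Both limitations can be made concrete with a small Python sketch. It rests on several assumptions that are not taken from the studies cited above: sequences are represented only by a matrix of pairwise proportional differences, OTUs are formed by single-linkage grouping (one possible rule among several), and all distance values are hypothetical.

```python
from itertools import combinations

def count_otus(dist, threshold):
    """Count OTUs under an assumed single-linkage rule: two sequences
    share an OTU if a chain of pairs closer than `threshold` links them."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if dist[i][j] < threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def mean_pairwise_divergence(dist):
    """Average distance over all distinct pairs of sequences."""
    pairs = list(combinations(range(len(dist)), 2))
    return sum(dist[i][j] for i, j in pairs) / len(pairs)

def uniform_matrix(n, d):
    """Hypothetical community in which every pair of sequences differs by d."""
    return [[0.0 if i == j else d for j in range(n)] for i in range(n)]

# First limitation: the OTU count depends on the chosen cutoff.
mixed = [
    [0.00, 0.02, 0.10, 0.10],
    [0.02, 0.00, 0.10, 0.10],
    [0.10, 0.10, 0.00, 0.04],
    [0.10, 0.10, 0.04, 0.00],
]
otus_by_cutoff = {t: count_otus(mixed, t) for t in (0.01, 0.03, 0.05)}

# Second limitation: identical OTU counts, different phylogenetic divergence.
shallow = uniform_matrix(4, 0.06)
deep = uniform_matrix(4, 0.20)
same_richness = count_otus(shallow, 0.03) == count_otus(deep, 0.03)
divergences = (mean_pairwise_divergence(shallow), mean_pairwise_divergence(deep))

print(otus_by_cutoff, same_richness, divergences)
```

In this toy example the same four sequences yield four, three, or two OTUs depending on whether the cutoff is 1%, 3%, or 5%, whereas the two uniform communities have the same OTU count at every cutoff yet differ more than threefold in mean pairwise divergence.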
In this review I introduce statistics borrowed from population genetics and systematics for describing and comparing the diversity evident in samples of gene sequences. I briefly outline the methodological underpinnings of tests for differences between communities, and I apply the methods to well-described microbial communities. I show that information gained from analysis of DNA sequences provides the basis for statistical analysis of communities in ways that advance inferences about the processes that may govern the compositions and functions of microbial communities. Furthermore, the advocated analytical approaches make broad comparisons of ecological communities possible. The methods of analysis explored in this review are meant to complement other methods, such as the robust estimation of richness advocated by Hughes et al. (5) and approaches for estimating functional properties of bacteria from phylogenetic inference (16).