Based on the compositions of the distal gut microbial communities from hosts living in their natural environments, we were able to discriminate among great ape species. The topological concordance between the species-level branching orders obtained for hosts and for their microbiotae shows that, over evolutionary timescales, host phylogeny is the overriding factor determining the microbial composition of the great ape gut microbiota. This recapitulation of species relationships in the frequencies of the microbial constituents of their distal gut communities contrasts with previous notions that diet is the most important factor governing the grouping of gut microbiotae within primates.
This new view of great ape microbiota evolution emerged as a consequence of the sampling depth, which allowed the recovery of large sets of evolutionarily informative phylotypes. This in turn permitted the application of standard parsimony-based phylogenetic approaches based on the frequency of each microbial species shared among hosts. Previous studies of gut microbiotae that surveyed only on the order of 100 sequences per sample could not accurately gauge either the diversity present in complex microbial communities or the relative abundance of the constituent species. Given the species complexity of the distal gut microbiota, more than 10,000 reads per host are needed to accurately assess the relationships among divergent microbial communities. Recent advances in sequencing methodologies now render this number of reads both technically and economically feasible.
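To make the sampling-depth argument concrete, the short Python sketch below computes how many taxa a survey is expected to detect at a given read depth. It assumes, as a simplification, that reads are independent draws from the community, so a taxon with relative abundance p appears at least once among n reads with probability 1 - (1 - p)^n. The community used here (10 common taxa at 5% each plus 500 rare taxa at 0.1% each) is hypothetical and is not derived from the great ape data.

```python
import math

def detection_probability(rel_abundance: float, n_reads: int) -> float:
    """Chance that a taxon with this relative abundance is seen at least once
    among n_reads reads, assuming reads are independent draws."""
    return 1.0 - (1.0 - rel_abundance) ** n_reads

def expected_observed_taxa(abundances, n_reads: int) -> float:
    """Expected number of distinct taxa detected: the sum of the per-taxon
    detection probabilities."""
    return sum(detection_probability(p, n_reads) for p in abundances)

# Hypothetical community: 10 common taxa at 5% and 500 rare taxa at 0.1%.
community = [0.05] * 10 + [0.001] * 500
assert math.isclose(sum(community), 1.0)

for depth in (100, 10_000):
    print(depth, round(expected_observed_taxa(community, depth), 1))
```

Under these assumptions, about 100 reads are expected to detect fewer than 60 of the 510 taxa, with most rare taxa missed, whereas 10,000 reads detect essentially all of them.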
The fact that the gut microbial community phylogeny matches the great ape species phylogeny is not readily attributable to factors other than the evolutionary diversification of hosts. For example, the broad geographic range of chimpanzees, as well as the intercontinental distance separating our sampled humans, establishes that geographic proximity is not a major factor in the clustering of microbial communities by host species. Likewise, chimpanzees and gorillas within the same locale exhibited phylogenetically distinct gut microbial communities. That the composition of gut microbiotae assorts by host species regardless of geographic location suggests that similarities in local factors, such as those related to diet, do not explain the close correspondence between host phylogeny and microbial community composition. To further evaluate whether host species differentiate according to diet, we examined the populations of chloroplast sequences within each fecal sample. Although chloroplast diversity indicates only the plant component of the diet at the time of sampling, there was no clear indication that the great ape species (except for G. beringei) have widely different diets or that the diets of great apes are structured according to host phylogeny (Figure S2).
As the differences in relative branch lengths between the mtDNA () and microbial community () trees show, the degree of genetic differentiation between hosts does not fully account for the variation in great ape gut microbiota. The host phylogeny signal that we uncovered can be masked by factors acting on more proximate timescales (such as diet, geography, or health status), and it can be detected only by phylogenetic analysis of more deeply sampled communities. To assess the degree to which differences in gut microbiota reflect the genetic distance between hosts, we compared the amount of variation assigned to the terminal branches of the tree (i.e., those leading to individual hosts) with that encompassed by the seven internal branches that differentiate the five great ape species (grey branches in ). The species-discriminating branches together represent 73% of the total genetic distance in the mtDNA phylogeny, but only 7% of the total distance in the tree based on microbial communities. The branches leading to individual hosts show the opposite pattern: together they constitute 70% of the distance in the microbial tree but only 11% of the total genetic distance. This disparity reflects the broad variation in microbial communities among members of the same species, as has already been observed in humans. Next, to discount the effects of individual variation, we calculated the correlation coefficient between the relative lengths of the seven internal branches in the microbial community tree and those of the corresponding branches in the mtDNA tree. Despite the congruence in branching orders, the branch lengths in the mtDNA tree explain only about 25% of the variation in the microbial community tree. Thus, gut microbiotae, although diverging in a manner consistent with vertical inheritance, are not changing in a strictly time-dependent fashion that reflects the degree of genetic divergence among hosts. These differences in branch length indicate that individual-level variation in microbial community structure is extensive relative to between-species variation.
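The two branch-length comparisons in this paragraph reduce to simple arithmetic on a tree's branch lengths: the share of total tree length carried by a chosen set of branches, and the squared Pearson correlation (r^2) between matched branch lengths in two trees, where r^2 is the fraction of variation in one set explained by a linear fit to the other. The Python sketch below assumes branch lengths are available as plain numbers keyed by hypothetical branch names; all values are invented for illustration and are not those of the mtDNA or microbial community trees.

```python
import math

def fraction_of_tree_length(branch_lengths, subset):
    """Share of the total tree length carried by the branches named in subset."""
    total = sum(branch_lengths.values())
    return sum(branch_lengths[name] for name in subset) / total

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy tree: two internal (species-discriminating) branches
# and three terminal (individual-host) branches.
toy_tree = {"int_1": 2.0, "int_2": 1.0, "tip_a": 3.0, "tip_b": 2.0, "tip_c": 2.0}
internal = ["int_1", "int_2"]
terminal = ["tip_a", "tip_b", "tip_c"]
print(fraction_of_tree_length(toy_tree, internal))  # share on internal branches
print(fraction_of_tree_length(toy_tree, terminal))  # share on terminal branches

# Hypothetical lengths for the same seven internal branches in two trees.
mtdna_internal = [0.30, 0.12, 0.25, 0.08, 0.20, 0.15, 0.05]
microbial_internal = [0.02, 0.05, 0.01, 0.04, 0.03, 0.01, 0.02]
r = pearson_r(mtdna_internal, microbial_internal)
print(r * r)  # fraction of variation explained by a linear fit
```

On this reading, the statement that mtDNA branch lengths explain about 25% of the variation corresponds to r^2 ≈ 0.25 across the seven species-discriminating branches.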
Our analysis indicates that host phylogeny has a major role in the diversification of distal gut microbial communities in great apes, a conclusion that becomes apparent only when sampling is adequate for robust phylogenetic and evolutionary analyses of microbial species compositions. Numerous studies have applied UniFrac and related approaches to establish the relationships among microbial communities derived from a wide range of hosts and environmental sources. Despite the highly supported tree that we obtained by parsimony analysis, subjecting our dataset to UniFrac did not recover a tree that matches the host-species phylogeny (Figure S3). Unlike parsimony, UniFrac relies on an input tree specifying the evolutionary relationships among bacterial taxa to infer the similarity among microbial communities. However, for a large dataset with nearly 9,000 characters, ensuring the correct inference of tree topology and branch lengths is difficult. Inferring an input tree is all the more problematic because of the relatively short and highly variable sequencing reads generated in most metagenomic studies. The quality of the multiple sequence alignment, which is critical for inferring the guide tree, is greatly affected by the limited read length, the level of sequence variation, and the propensity towards indel sequencing errors. This problem was almost entirely eliminated from our parsimony analysis (of species abundance data) by performing multiple sequence alignments separately on the reads assigned to each taxonomic class rather than on the entire dataset. Furthermore, because pairwise sequence identities are calculated only among reads typed to the same class, indel sequencing errors present in taxonomically different reads do not affect them. Because the V6 region has previously been shown to have low phylogenetic congruence with full-length small subunit ribosomal RNA topologies, the methods described here, which are based on species abundances and community compositions, serve as an alternative and complementary approach for analyzing pyrotag data.
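The dependence of UniFrac on its input tree can be seen with a toy example. The Python sketch below implements unweighted UniFrac in its usual form, the fraction of observed branch length that leads to taxa found in only one of the two communities, for a rooted tree stored as a hypothetical child-to-(parent, branch length) mapping. It is not the software used in this study, and the taxa, tree shapes, and branch lengths are invented; the point is that scoring the same pair of communities against two different input trees gives different distances.

```python
def leaves_below(parent_of):
    """Map every node to the set of leaf names in the subtree rooted at it."""
    children = {}
    for child, (parent, _length) in parent_of.items():
        children.setdefault(parent, []).append(child)
    memo = {}

    def collect(node):
        if node not in memo:
            kids = children.get(node, [])
            memo[node] = {node} if not kids else set().union(*(collect(k) for k in kids))
        return memo[node]

    return {node: collect(node) for node in set(parent_of) | set(children)}

def unweighted_unifrac(parent_of, community_a, community_b):
    """Unique branch length divided by observed branch length."""
    below = leaves_below(parent_of)
    unique = observed = 0.0
    for child, (_parent, length) in parent_of.items():
        in_a = bool(below[child] & community_a)
        in_b = bool(below[child] & community_b)
        if in_a or in_b:        # branch leads to at least one observed taxon
            observed += length
            if in_a != in_b:    # ...but only to taxa from one community
                unique += length
    return unique / observed

# Two hypothetical input trees over the same four taxa, all branches of length 1.
tree_1 = {"t1": ("n1", 1.0), "t2": ("n1", 1.0), "t3": ("n2", 1.0), "t4": ("n2", 1.0),
          "n1": ("root", 1.0), "n2": ("root", 1.0)}  # ((t1,t2),(t3,t4))
tree_2 = {"t1": ("n1", 1.0), "t3": ("n1", 1.0), "t2": ("n2", 1.0), "t4": ("n2", 1.0),
          "n1": ("root", 1.0), "n2": ("root", 1.0)}  # ((t1,t3),(t2,t4))
community_a, community_b = {"t1", "t3"}, {"t2", "t4"}
print(unweighted_unifrac(tree_1, community_a, community_b))
print(unweighted_unifrac(tree_2, community_a, community_b))
```

With the communities {t1, t3} and {t2, t4}, the first tree gives a distance of 4/6 (about 0.67) and the second gives 1.0, although the community memberships are unchanged. Errors in the inferred guide tree therefore propagate directly into the community distances.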
With the availability of methods that allow the scrutiny of microbial diversity and community structure at finer levels, the challenge now is to determine how best to characterize each specific environment in order to extract the relevant biological information about its constituents. In the present study, we found that sampling at depths greater than 10,000 reads per sample, applying stringent cutoffs for species identity, and focusing on parsimony-informative characters helped resolve host phylogeny as the major determinant of distal gut microbial communities in great apes.
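The focus on parsimony-informative characters, and the parsimony scoring of candidate host trees, can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: each phylotype's abundance is assumed to have already been coded as a discrete state per host (here a hypothetical presence/absence matrix), and each candidate tree is scored with Fitch's algorithm for unordered characters on a rooted binary tree written as nested tuples. A character is parsimony-informative when at least two states each occur in at least two hosts; uninformative characters add the same length to every topology, so dropping them does not change which tree is most parsimonious.

```python
from collections import Counter

def is_parsimony_informative(states):
    """At least two states must each occur in at least two taxa."""
    return sum(1 for count in Counter(states).values() if count >= 2) >= 2

def fitch_length(tree, states):
    """Minimum number of state changes for one character (Fitch algorithm).
    tree: nested 2-tuples of leaf names; states: leaf name -> state."""
    changes = 0

    def state_set(node):
        nonlocal changes
        if isinstance(node, str):          # leaf
            return {states[node]}
        left, right = state_set(node[0]), state_set(node[1])
        shared = left & right
        if shared:
            return shared
        changes += 1                       # disjoint sets imply one change
        return left | right

    state_set(tree)
    return changes

def tree_length(tree, matrix):
    """Sum of Fitch lengths over parsimony-informative characters only."""
    hosts = list(matrix)
    n_chars = len(matrix[hosts[0]])
    total = 0
    for i in range(n_chars):
        column = {host: matrix[host][i] for host in hosts}
        if is_parsimony_informative(list(column.values())):
            total += fitch_length(tree, column)
    return total

# Hypothetical presence (1) / absence (0) of five phylotypes in six hosts.
matrix = {"H1": "11011", "H2": "11001", "C1": "01001",
          "C2": "01001", "G1": "00101", "G2": "00101"}
concordant = ((("H1", "H2"), ("C1", "C2")), ("G1", "G2"))
discordant = ((("H1", "C1"), ("H2", "C2")), ("G1", "G2"))
print(tree_length(concordant, matrix), tree_length(discordant, matrix))
```

In this toy matrix the species-concordant tree needs three changes and the discordant tree four, so parsimony prefers the tree that groups hosts by species; the two uninformative columns (one singleton, one constant) are excluded from both totals.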