The large reference population of the HMP has provided, to our knowledge, the first opportunity for a comprehensive description of the human gastrointestinal microbiota. We focused here on the bacterial composition and function of ten independently sampled body habitats throughout the digestive tract. Using taxonomically binned 16S rRNA gene sequences, we characterized the membership and relative abundance of organisms in 2,105 samples. We used LEfSe, a metagenomic biomarker discovery method, to identify clades at all taxonomic levels whose distribution varied among four classes of body habitats. These clades included rare clades not expected as commensals in the human microbiome. We also found that genera containing common pathogenic species were prevalent, though at low abundance, even in this asymptomatic reference population. Finally, a complementary analysis of the metabolic modules and enzymes detected in a subset of these body sites revealed strong variation in sugar and metal utilization among the digestive tract communities.
Four distinct groups were delineated among the microbial communities from the digestive tract sites. The groups were largely defined by the ratio of the relative abundances of the two major phyla, Firmicutes and Bacteroidetes (Figure ), and the differences extended to the genus level. In the absence of disease, these groupings suggest that future studies could sample one representative site from each group to reduce sequencing costs. For example, the buccal mucosa (Group 1), tongue dorsum (Group 2), supragingival plaque (Group 3) and stool (Group 4) could represent all ten sites examined here. Samples from these habitats can be obtained with minimal discomfort and risk to participants, and are likely to provide enough biomass to yield sufficient DNA for community whole-genome shotgun analysis. Because the current study includes only healthy subjects, however, additional validation would be required before this strategy is used to investigate pre-disease and disease states at targeted sites, for both local and systemic diseases.
The oral microbiome revealed in this investigation was generally consistent with earlier studies [11,13,14,22,66,67]. Firmicutes dominated the microbial communities on oral tissue surfaces and in saliva. Dental plaque taxa were more evenly distributed among Firmicutes, Bacteroidetes, Actinobacteria, Proteobacteria and Fusobacteria. The differences between the plaque communities and those of oral tissue sites are likely driven by the ability of microbes to accumulate on the non-shedding tooth surface, and by the physiological conditions that oxygen gradients create within the resulting biofilm. Porphyromonas, Tannerella and Treponema, genera that include recognized periodontal pathogens, were highly prevalent. The presence of these genera in more than 95% of individuals in this non-diseased population provides strong evidence that they are part of the commensal oral microbiome. Rather than a complete absence of pathogenic organisms from the normal microbiota, these data suggest low-level carriage of potential pathogens [68-70].
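The distinction between prevalence (how many people carry a genus) and abundance (how much of each community it makes up) can be made concrete with a small calculation. The sketch below is illustrative only. The subject profiles are invented, not HMP data. The helper names (`prevalence`, `mean_abundance`, `prevalent_low_abundance`) and the 1% low-abundance cutoff are hypothetical choices, and only the 95% prevalence threshold comes from the text.

```python
# Hypothetical relative abundances (fraction of 16S reads) of three genera at a
# single oral site in four subjects; invented values, not HMP data. A genus
# missing from a subject's dict was not detected in that subject.
profiles = {
    "subj01": {"Streptococcus": 0.40, "Porphyromonas": 0.004, "Treponema": 0.002},
    "subj02": {"Streptococcus": 0.35, "Porphyromonas": 0.010, "Treponema": 0.001},
    "subj03": {"Streptococcus": 0.52, "Porphyromonas": 0.002},
    "subj04": {"Streptococcus": 0.28, "Porphyromonas": 0.006, "Treponema": 0.003},
}


def prevalence(profiles, genus):
    """Fraction of subjects in whom the genus was detected (abundance > 0)."""
    hits = sum(1 for p in profiles.values() if p.get(genus, 0.0) > 0.0)
    return hits / len(profiles)


def mean_abundance(profiles, genus):
    """Mean relative abundance across all subjects, counting absences as zero."""
    return sum(p.get(genus, 0.0) for p in profiles.values()) / len(profiles)


def prevalent_low_abundance(profiles, genus, min_prev=0.95, max_mean=0.01):
    """True if the genus is detected in more than min_prev of subjects while its
    mean abundance stays below max_mean (a hypothetical 1% cutoff)."""
    return (prevalence(profiles, genus) > min_prev
            and mean_abundance(profiles, genus) < max_mean)


for genus in ("Streptococcus", "Porphyromonas", "Treponema"):
    print(genus, prevalence(profiles, genus), prevalent_low_abundance(profiles, genus))
```

With these invented values, Porphyromonas is flagged because it appears in all four subjects at a mean abundance of about 0.55%. Streptococcus is equally prevalent but abundant. Treponema, found in three of four subjects, falls below the 95% prevalence threshold.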
As expected, the stool microbiota was distinct from the microbiota of the upper digestive tract sites (Figure ). Stool was set apart by a high abundance of Bacteroidetes. A notable difference between the stool microbiome of the HMP dataset and existing 16S rRNA gene profiles is the higher ratio of Bacteroidetes (>60% of sequences) to Firmicutes (≤30% of sequences). Many previous studies of adult American populations have observed the reverse, a preponderance of Firmicutes [15,71-73]. Similar observations have been reported in geographically diverse populations [74,75] and in studies of infant gut microbiome colonization [76]. Note that all HMP gut communities were assayed from stool samples, which may differ extensively from colonic biopsies. For example, using endoscopic biopsies from just two subjects, Wang et al. [77] reported that 49% of 16S rRNA gene clones were from Firmicutes and 27.7% were from Bacteroidetes. Even this distinction is unclear, however. A study of 16S rRNA sequences from regional gut biopsies and spontaneously passed stool in three subjects likewise found that the majority of phylotypes belonged to Firmicutes (76%), compared with 16% for Bacteroidetes [15]. In a study of stool from 154 adult women (twins and their mothers), Firmicutes had a mean relative abundance of >60% across several different methods for assessing the 16S rRNA gene content of stool [24]. Finally, a recently published study of fecal microbiota in 161 older subjects (≥65 years) corroborates our findings, reporting a Bacteroidetes-dominant distribution (57%) compared with Firmicutes (40%) [26]. Whole genome shotgun data from the same HMP samples confirmed the difference in the Firmicutes:Bacteroidetes ratio observed in stool by 16S rRNA composition [54]. These differences may be linked to geographic location, host genetics or technical procedures. Further study will be critical to explain these apparently dramatic variations in adult gut microbiota composition.
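The Firmicutes:Bacteroidetes ratio can be computed directly from phylum-level read counts. All relative abundances in a sample share the same denominator (total reads), so that denominator cancels in the ratio. The following is a minimal sketch that assumes a hypothetical per-sample dictionary of phylum read counts. The counts are invented to mimic the Bacteroidetes-dominant HMP pattern and are not real data. The function names are hypothetical.

```python
# Hypothetical phylum-level read counts for one stool sample; invented to mimic
# the Bacteroidetes-dominant pattern described above, not actual HMP data.
stool_counts = {"Bacteroidetes": 6500, "Firmicutes": 2800,
                "Proteobacteria": 400, "Other": 300}


def phylum_fractions(counts):
    """Convert raw read counts to relative abundances that sum to 1."""
    total = sum(counts.values())
    return {phylum: n / total for phylum, n in counts.items()}


def firmicutes_bacteroidetes_ratio(counts):
    """Firmicutes:Bacteroidetes ratio. This equals the ratio of their relative
    abundances because the shared denominator (total reads) cancels."""
    bacteroidetes = counts.get("Bacteroidetes", 0)
    if bacteroidetes == 0:
        raise ValueError("no Bacteroidetes reads; ratio is undefined")
    return counts.get("Firmicutes", 0) / bacteroidetes


fractions = phylum_fractions(stool_counts)
ratio = firmicutes_bacteroidetes_ratio(stool_counts)
print(fractions["Bacteroidetes"], fractions["Firmicutes"], round(ratio, 2))
```

For these invented counts, Bacteroidetes make up 65% and Firmicutes 28% of reads, giving a ratio of about 0.43. A value below 1 corresponds to the Bacteroidetes-dominant profile. Studies reporting a Firmicutes preponderance would yield values above 1.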
An estimated 10¹¹ bacterial cells per day flow from the mouth to the stomach [78,79]. Both cultivation and molecular techniques demonstrate an overlap among the oral, pharyngeal, esophageal and intestinal microbiomes [12,27,28,75,80-85]. It has therefore been hypothesized that the oral microbiota might contribute significantly to distal digestive tract populations. Among HMP subjects, the genera Bacteroides, Faecalibacterium, Parabacteroides, Eubacterium, Alistipes, Dialister, Streptococcus, Prevotella, Roseburia, Coprococcus, Veillonella and Oscillibacter were detected in both the oral cavity and stool in more than 45% of subjects. However, the short sequence reads did not permit species-level identification. Two questions therefore remain open: whether species of these common genera are distributed differently along the digestive tract, and whether oral microbes seed distal sites below the stomach.
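The criterion "detected in both the oral cavity and stool" amounts to a per-subject set intersection followed by a prevalence count. The sketch below assumes, hypothetically, that each subject's oral detections have already been pooled across oral sites into one set. The subjects and genus sets are invented, the function names are hypothetical, and only the 45% threshold comes from the text. Because the sketch works at genus level, it shares the limitation noted above and cannot tell whether the same species occur at both sites.

```python
from collections import Counter

# Hypothetical genus detections per subject; "oral" is assumed to be pooled
# across all oral sites sampled for that subject. Invented data, not HMP results.
subjects = {
    "subj01": {"oral": {"Streptococcus", "Veillonella", "Prevotella"},
               "stool": {"Bacteroides", "Streptococcus", "Veillonella"}},
    "subj02": {"oral": {"Streptococcus", "Prevotella"},
               "stool": {"Bacteroides", "Prevotella", "Streptococcus"}},
    "subj03": {"oral": {"Streptococcus", "Veillonella"},
               "stool": {"Bacteroides", "Faecalibacterium"}},
    "subj04": {"oral": {"Streptococcus", "Prevotella", "Veillonella"},
               "stool": {"Streptococcus", "Veillonella", "Bacteroides"}},
}


def shared_genus_prevalence(subjects):
    """Fraction of subjects in whom each genus is detected in both oral cavity and stool."""
    counts = Counter()
    for sites in subjects.values():
        # Set intersection: genera present at both sites in this subject.
        counts.update(sites["oral"] & sites["stool"])
    return {genus: c / len(subjects) for genus, c in counts.items()}


def shared_above(subjects, threshold=0.45):
    """Genera shared between oral cavity and stool in more than `threshold` of subjects."""
    return {g: p for g, p in shared_genus_prevalence(subjects).items() if p > threshold}


print(sorted(shared_above(subjects).items()))
```

Here Streptococcus (shared in 3 of 4 subjects) and Veillonella (2 of 4) pass the threshold. Prevotella is shared in only one subject and does not. Bacteroides never appears in the oral sets, so it is not counted at all.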
Based on the commonality of genera detected in the upper digestive tract, we postulate that saliva is a key driver of microbial composition in the habitats above the stomach, through its effects on pH (as a buffer) and on nutrient availability (high mucin content) [86]. The epithelium is likely another key driver. Most upper gastrointestinal mucosal surfaces share a common nonkeratinized, stratified, squamous epithelium. The exceptions are the gingiva, hard palate and parts of the tongue dorsum, which instead share a keratinized, stratified, squamous epithelium (Additional file 2). The upper digestive tract sites are also constantly exposed to both inhaled and ingested microbes. A substantial portion of the variability observed in the upper digestive tract microbiota might then be explained by interactions among saliva, host cell type and exogenous factors such as oxygen availability and oral intake.
In contrast to these potentially homogenizing effects, the throat is unique among the nine upper digestive tract sites sampled. It receives small particles, including microbes, that are trapped in mucus and propelled by respiratory cilia up from the trachea and down from the nasal cavity en route to the stomach. This might impose an additional selective pressure on the pharyngeal microbiota. However, no such effect was evident in the oropharynx, which clustered clearly into Group 2 with sites not exposed to the constant flow of respiratory tract mucus. Group 2, comprising the tongue, tonsils, throat and saliva, is a reminder of the important overlap between the upper segments of the digestive and respiratory tracts. This shared region is the aerodigestive tract, which consists of the 'lips, mouth, tongue, nose, throat, vocal cords, and part of the esophagus and windpipe' [87]. Evidence suggests that the pool of microbes from Group 2 and other oral sites contributes to colonization of the airways in disease. The polymicrobial airway infections of cystic fibrosis offer several examples. One of the earliest cystic fibrosis pulmonary pathogens is Haemophilus influenzae, a common colonizer of the upper aerodigestive tract [88]. Members of the Streptococcus milleri group, known colonizers of the oral cavity, were recently implicated as cystic fibrosis pathogens [89]. Finally, members of the oropharyngeal microbiome might modulate the virulence of the key cystic fibrosis pathogen Pseudomonas [90]. One might speculatively extend the argument above to explain microbial community structure throughout the aerodigestive tract and airways. In regions not bathed by salivary flow, mucus is the counterpart of saliva. These regions include sites sampled by the HMP but not investigated here (for example, the anterior nares) and habitats that require more invasive sampling methods (for example, the nasal cavity, nasopharynx, esophagus and airways).
Several 'environmental' phyla observed in human microbiota [33,91] appear to be strongly host-associated in this study. For example, the Synergistetes phylum has only recently been described in detail in association with the human oral cavity [36,92]. It is still considered potentially environmental because of its common occurrence in, for example, bioreactors [93,94]. In many individuals it was completely absent from all ten sites. In some samples, however, it comprised up to 10% of the community, and it tended to recur at multiple body habitats within the same individual. This property is a dichotomy of apparent niches that includes specific and potentially stable occupation of human microbiome sites. Based on the HMP oral cavity data, it can now be extended to TM7 and SR1. As sequencing costs drop, deeper shotgun sequencing will provide higher-confidence access to such organisms, most of which are known only through their phylogenetically conserved genes.
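This dichotomy can be tabulated by counting, for each subject, the sites at which a phylum is detected. The pattern to look for is a taxon that is absent in many people yet recurs across several sites in others. The following is a minimal sketch with invented Synergistetes abundances. The four-site subset, subject identifiers and `sites_detected` helper are hypothetical and do not reflect HMP measurements.

```python
# Hypothetical Synergistetes relative abundances (fraction of 16S reads)
# per subject and site; invented values, not HMP measurements.
synergistetes = {
    "subj01": {"subgingival_plaque": 0.08, "supragingival_plaque": 0.03,
               "tongue_dorsum": 0.01, "stool": 0.0},
    "subj02": {"subgingival_plaque": 0.0, "supragingival_plaque": 0.0,
               "tongue_dorsum": 0.0, "stool": 0.0},
    "subj03": {"subgingival_plaque": 0.10, "supragingival_plaque": 0.02,
               "tongue_dorsum": 0.0, "stool": 0.0},
    "subj04": {"subgingival_plaque": 0.0, "supragingival_plaque": 0.0,
               "tongue_dorsum": 0.0, "stool": 0.0},
}


def sites_detected(site_abundances, min_abundance=0.0):
    """Number of sites at which the taxon's abundance exceeds min_abundance."""
    return sum(1 for a in site_abundances.values() if a > min_abundance)


# Per-subject count of sites where the phylum was detected.
site_counts = {subj: sites_detected(sites) for subj, sites in synergistetes.items()}
# Subjects carrying the phylum at one or more sites.
carriers = [subj for subj, n in site_counts.items() if n > 0]
# Among carriers, the share in which the phylum recurs at two or more sites.
multi_site_share = sum(1 for subj in carriers if site_counts[subj] >= 2) / len(carriers)
# Highest single-sample abundance observed.
max_abundance = max(a for sites in synergistetes.values() for a in sites.values())

print(site_counts)
print(multi_site_share, max_abundance)
```

In this toy table, half the subjects carry no Synergistetes at any site. Every carrier has it at two or more sites, with a peak abundance of 10%, which mirrors the pattern described above.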