Study Set
The HMP 16S sequencing includes sequences amplified from the V3–V5 region, and in most cases amplified from the V1–V3 region as well. After filtering sequences using read quality and sample size requirements (see Methods), our data included 16S sequences from 238 subjects for the V3–V5 region and 168 subjects for the V1–V3 region. There were 112 female and 127 male subjects. Samples were collected from 18 body sites, including nine sites within the oral cavity: saliva, supragingival and subgingival plaque (plaque above and below the gum line), tongue dorsum (back of the tongue), hard palate (roof of the mouth), buccal mucosa (inside of the cheek), keratinized gingiva (gums), palatine tonsils (tonsils at the back of the mouth), and throat; four skin sites: left and right antecubital fossae (inner elbows), left and right retroauricular creases (behind the ears); three vaginal sites: mid-vagina, posterior fornix (the back of the vagina) and vaginal introitus (the entrance); one nasal site: the anterior nares (front of the nostrils); and one stool sample [2]. There were a total of ~12 million V1–V3 and ~15 million V3–V5 reads, and between 404 and 9,489 OTUs identified by body site, with an average sequencing depth of 5,709 tags (Table S1). OTUs were created with the bioinformatics package mothur using single-linkage preclustering and average linkage clustering (see Methods).
Defining a Core Microbiome of Healthy Humans
The most stringent definition of a 16S OTU being a member of a core microbiome would require its presence within a body site in all subjects (100%) sampled. Using this definition, we found at least one core phylotype (here represented by a 3% OTU) seen in the V1–V3 sequences in each body site except the anterior nares, saliva and the three vaginal sites (Table S2). For the V3–V5 sequences, an additional three body sites had no core OTUs (left and right antecubital fossae and subgingival plaque), while the anterior nares sample showed two core OTUs present in V3–V5 but none in V1–V3. These differences between V1–V3 and V3–V5 may reflect differences in the rate of evolution among the hypervariable regions of the 16S rRNA gene, as well as the ability of each region to differentiate microbial organisms.
| Table 1. Number of Core OTUs present at different prevalence thresholds. |
As we will explore further, there is a wide variation in abundance across samples for even the most prevalent OTUs. Core OTUs that are dominant in some samples can be rare in others. Coupling this large variation in relative abundance with the fact that the sequence depth within nearly half of our samples was modest (fewer than 5,000 tags), we cannot assume that the lack of an OTU in any particular sample corresponds to its true absence in the subject – it may simply be below the detection level of a small sample. If we use a slightly less stringent threshold and define a core OTU as being present in 95% of all the samples of a body site, we see a more consistent view between primer sets, with all body sites having at least one core OTU except the three vaginal samples using V1–V3. Using an even less stringent definition of 90% prevalence, we find even more OTUs, as would be expected (Table S2), but the differences are minor. For our core OTU analyses, therefore, we define “core OTUs” as those that are present in at least 95% of all samples for a given body site. Table S3 provides the OTU-level taxonomy for each of the 95% core OTUs.
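The prevalence-based definition can be made concrete with a minimal sketch. The function name `core_otus`, the count-table layout (a dictionary mapping each OTU to its per-sample tag counts within one body site), and the toy numbers are illustrative assumptions rather than the pipeline used in this study; detection is taken here to mean a tag count greater than zero.

```python
from typing import Dict, List


def core_otus(counts: Dict[str, List[int]], threshold: float = 0.95) -> List[str]:
    """Return the OTUs detected (count > 0) in at least `threshold` of samples."""
    core = []
    for otu, per_sample in counts.items():
        # Prevalence = fraction of samples in which the OTU was seen at all.
        prevalence = sum(1 for c in per_sample if c > 0) / len(per_sample)
        if prevalence >= threshold:
            core.append(otu)
    return core


# Hypothetical toy table: 20 samples from a single body site.
toy_counts = {
    "otu_a": [5] * 20,           # detected in 20 of 20 samples
    "otu_b": [3] * 19 + [0],     # detected in 19 of 20 samples (95%)
    "otu_c": [7] * 18 + [0, 0],  # detected in 18 of 20 samples (90%)
}

for cutoff in (1.00, 0.95, 0.90):
    print(cutoff, core_otus(toy_counts, cutoff))
```

In this toy table only `otu_a` is core at 100%, `otu_b` joins at 95%, and `otu_c` joins at 90%, which mirrors how relaxing the threshold enlarges the core.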
Examining the V3–V5 sequences, for which we have more samples, we note that the oral cavity sites have a consistently richer core microbiome than other sites, ranging from 7 core V3–V5 OTUs in the keratinized gingiva and subgingival plaque to 22 in the saliva. Here we define core richness as a count of the number of OTUs found in 95% of the samples, rather than an estimate of the projected total richness of the bacterial community. The buccal mucosa, hard palate, palatine tonsils, supragingival plaque, throat, and tongue dorsum all had similar core richnesses of 11 to 16 V3–V5 OTUs. The V1–V3 OTUs showed similar patterns, with the oral sites having a core V1–V3 richness of 10–15 OTUs, except the keratinized gingiva with only 3 OTUs. The stool (5 V3–V5 and 7 V1–V3 OTUs), anterior nares (4 V3–V5 and 3 V1–V3 OTUs), and skin samples (2–3 V3–V5 and 1–2 V1–V3 OTUs) had cores of similar size, noticeably smaller than those of the oral cavity. The three vaginal sites had only one V3–V5 core OTU each but no core OTUs by V1–V3. The Lactobacillus tags predominant in the vaginal sites were split into multiple V1–V3 OTUs, and no one Lactobacillus OTU was seen in even 90% of all vaginal samples, although 95% of the samples did have at least one of the three most common V1–V3 OTUs. In general, the core OTUs of a given body site represent less than 2% of the total number of different OTUs in that body site (results not shown).
Comparing across body site groups, there were 4 V3–V5 95% core OTUs present in all 9 oral sites, representing Fusobacterium, Streptococcus, Pasteurellaceae, and Veillonella. Granulicatella and Gemella were present in samples from all oral sites, but in some cases with a prevalence of only 92–94%. Only 2 OTUs were present in all oral sites in the V1–V3 data, and both were Streptococcus; a third OTU, representing Veillonella, was present in all oral sites at a prevalence of 90% or more. Relaxing the requirement from all 9 to 7 oral sites picks up more core OTUs and shows more in common between the hypervariable regions.
Across all skin samples, a Propionibacterium OTU was core at 99–100% in all four sites in both V1–V3 and V3–V5 data. A Staphylococcus V3–V5 OTU was core in three of the four skin sites, and nearly core at 93% prevalence in the fourth site. A V1–V3 Staphylococcus OTU was present in all 4 sites at greater than 90% prevalence. No additional V1–V3 core OTUs were present in the skin samples. Only one V3–V5 OTU was core in the three vaginal sites, a Lactobacillus, and no OTUs were core with the V1–V3 data. The stool samples contained 7 core OTUs at a 95% prevalence, representing several members of the Lachnospiraceae family as well as Faecalibacterium, Oscillibacter and two separate Bacteroides OTUs. The Bacteroides OTUs were by far the most abundant, comprising on average 21% of the sequences.
We did not find a core microbiome across all subjects and body sites. The most common oral Streptococcus OTU in both the V1–V3 and the V3–V5 data was also the most prevalent OTU across all sites, being found also in the anterior nares, and as core in the two antecubital fossa sites with the V3–V5 sequencing. This lack of cross-body core OTUs is not surprising given that different body sites represent starkly different environments for adaptation, include both internal and external sites, and have varying levels of moisture, acidity, and temperature, to name just a few differences.
Core OTUs – Abundance and Prevalence
Despite their prevalence across subjects, the relative abundances of core OTUs vary dramatically between subjects. To demonstrate the scope of individual variation in the composition and abundance of the microbial communities, we plotted the normalized counts of each V3–V5 OTU in each sample placing the most highly abundant OTUs. We see that even the most abundant core OTUs are highly variable across subjects, with differences in relative abundance that span multiple orders of magnitude. For example, the most abundant OTU in the stool samples has a mean relative abundance of 0.23 (meaning that on average it accounts for 23% of all the sequences in each sample), but this relative abundance varies nearly 5,000-fold across our samples, ranging from 84% down to 0.021% (and not detected in seven samples). This same pattern, in which OTUs display tremendous variation across different subjects, was found repeatedly across the other body sites that were sampled (results not shown).
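As a rough illustration of how relative abundance and its fold range across subjects can be computed, the sketch below uses hypothetical counts; the function names and the choice to exclude samples where the OTU was not detected from the fold range are assumptions made for this example.

```python
from typing import List


def relative_abundances(otu_counts: List[int], sample_totals: List[int]) -> List[float]:
    """Divide one OTU's tag count in each sample by that sample's total tag count."""
    return [count / total for count, total in zip(otu_counts, sample_totals)]


def fold_range(rel: List[float]) -> float:
    """Highest over lowest relative abundance, using only samples where the OTU was detected."""
    detected = [r for r in rel if r > 0]
    return max(detected) / min(detected)


# Hypothetical counts for one OTU across four samples of 5,000 tags each.
otu_counts = [4200, 1000, 50, 0]
sample_totals = [5000, 5000, 5000, 5000]

rel = relative_abundances(otu_counts, sample_totals)
print(rel)              # per-sample relative abundance; the last sample has no detection
print(fold_range(rel))  # roughly 84-fold between the extremes where the OTU was detected
```

Samples with zero counts have to be reported separately, as in the text, because a fold change relative to zero is undefined.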
The prevalence of OTUs trends positively with their abundance. The most abundant OTUs tend to be present in more subjects than the less abundant OTUs overall. Obviously an OTU at any given size will have an increased overall abundance rank if it is present in more body sites and subjects. Presence in many subjects, however, will not substantially increase the rank abundance of an OTU that is only present in low numbers, and an OTU that is highly abundant (many thousand reads) will have a very low rank number (that is, it will rank near the top) even if it is only present in a few subjects. While abundance and prevalence are not fully independent, they clearly correlate in the human microbiome. Panel A shows the strong trend between the OTU rank based on overall abundance and the overall prevalence, as defined by the fraction of samples where the OTU is present. Panel B shows the OTU rank against the cumulative abundance of sequence tags, highlighting that the top 100 OTUs for both sets of tag data account for nearly all of the sequence tags. The lower panels compare the OTU rank against the prevalence rank for the V1–V3 (C) and the V3–V5 (D) tags. A general trend is apparent between the OTU abundance rank and prevalence rank, with the most abundant OTUs tending to be more prevalent as well. Highly prevalent but low abundance taxa were generally not detected by the level of sequencing effort provided by the HMP.
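The quantities compared in these panels (abundance rank, prevalence, and cumulative share of tags) can be sketched as follows. The table layout, the function name `rank_table`, and the toy counts are hypothetical and only illustrate the bookkeeping, not the analysis code used for the figures.

```python
from typing import Dict, List, Tuple


def rank_table(counts: Dict[str, List[int]]) -> List[Tuple[int, str, int, float, float]]:
    """Rank OTUs by total tags; report prevalence and cumulative fraction of all tags."""
    n_samples = len(next(iter(counts.values())))
    grand_total = sum(sum(per_sample) for per_sample in counts.values())
    # Rank 1 is the OTU with the most tags overall.
    ordered = sorted(counts, key=lambda otu: sum(counts[otu]), reverse=True)
    rows, running = [], 0
    for rank, otu in enumerate(ordered, start=1):
        total = sum(counts[otu])
        running += total
        prevalence = sum(1 for c in counts[otu] if c > 0) / n_samples
        rows.append((rank, otu, total, prevalence, running / grand_total))
    return rows


# Hypothetical counts for three OTUs across four samples.
toy_counts = {
    "otu_a": [400, 300, 200, 100],  # abundant and present everywhere
    "otu_b": [0, 0, 900, 0],        # abundant but found in one sample only
    "otu_c": [2, 1, 3, 4],          # present everywhere but rare
}

rows = rank_table(toy_counts)
for row in rows:
    print(row)
```

In this toy case `otu_b` ranks second on abundance despite a prevalence of 0.25, while `otu_c` ranks last despite being present in every sample, which is the asymmetry the paragraph describes.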
Presence of Biome Types
Recent studies by Arumugam et al. [8] and Wu et al. [11] described between two and three biome types (enterotypes) consisting of clusters dominated by Bacteroides, Ruminococcus or Clostridiales, and Prevotella. To determine if clear enteric biome types were also present in the stool samples from the HMP cohort, we first used the RDP taxonomy directly (prior to using the OTUs) and assigned samples to draft biome types defined by their most abundant taxon. For the V3–V5 data, samples were assigned to Bacteroides (n = 192), Prevotella (n = 9), Ruminococcus (n = 2), Alistipes (n = 4), or Oscillibacter (n = 3). With the V1–V3 tags, we found slightly different types: Bacteroides (n = 100), Prevotella (n = 8), Akkermansia (n = 1), Alistipes (n = 1) and multiple Clostridiales (n = 10) including Ruminococcaceae (Faecalibacterium, Hydrogenoanaerobacterium, Subdoligranulum), Lachnospiraceae (Coprococcus, Pseudobutyrivibrio, Catonella) and Veillonellaceae (Dialister).
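Assigning each sample to a draft biome type by its most abundant taxon can be sketched as below. The sample names, genus counts, and function names are hypothetical, and the tie-breaking behaviour (the first-listed genus wins) is an assumption of this example rather than a rule stated in the study.

```python
from collections import Counter
from typing import Dict


def dominant_taxon(genus_counts: Dict[str, int]) -> str:
    """Return the genus with the highest tag count (the first listed wins a tie)."""
    return max(genus_counts, key=genus_counts.get)


def draft_biome_types(samples: Dict[str, Dict[str, int]]) -> Counter:
    """Count how many samples fall into each draft biome type."""
    return Counter(dominant_taxon(genus_counts) for genus_counts in samples.values())


# Hypothetical genus-level tag counts for three stool samples.
toy_samples = {
    "subject_1": {"Bacteroides": 900, "Prevotella": 40, "Alistipes": 60},
    "subject_2": {"Bacteroides": 300, "Prevotella": 650, "Alistipes": 50},
    "subject_3": {"Bacteroides": 510, "Prevotella": 490, "Alistipes": 0},
}

print(draft_biome_types(toy_samples))
```

Note that `subject_3` is labelled Bacteroides although its two leading genera are almost equally abundant, so a label of this kind says nothing about how well separated the types are.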
A PCoA analysis based on the RDP taxonomy (panels A and B) shows that the samples assigned to Bacteroides and Prevotella can be segregated by the first two components, although these components together explain only a small amount of the community differences (8% for V3–V5 and 13% for V1–V3). The segregation is minimal, however, and the two types do not form discrete, well-separated clusters. In the V3–V5 analysis, the Alistipes samples are located between the Ruminococcus and the Bacteroides samples, while the Oscillibacter samples overlap both the Bacteroides and the Prevotella samples. In the V1–V3 samples, the Bacteroides and Clostridiales have some separation but also a clear region of overlap shared by the single Akkermansia- and Alistipes-dominated samples. The data show community gradients rather than community clusters, with a continuous ratio of Prevotella to Bacteroides. The differentiation between biome types occurs simply at the point when the ratio becomes greater than one, not when there is a large separation of types. This is highlighted by comparing the first PCoA axis with the ratio of Prevotella to Bacteroides (Figure S1).
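The point that the type label flips where the Prevotella to Bacteroides ratio crosses one, without any gap in the underlying gradient, can be illustrated with a small sketch. The counts are hypothetical, and the log scale and pseudocount of 1 are assumptions added here so that samples lacking one genus remain defined; they are not taken from the study.

```python
import math


def log10_ratio(prevotella: int, bacteroides: int, pseudocount: float = 1.0) -> float:
    """Log10 of the Prevotella:Bacteroides ratio, with a pseudocount to avoid division by zero."""
    return math.log10((prevotella + pseudocount) / (bacteroides + pseudocount))


def label_by_ratio(prevotella: int, bacteroides: int) -> str:
    """Label a sample by whichever of the two genera has more tags (ratio above one or not)."""
    return "Prevotella" if prevotella > bacteroides else "Bacteroides"


# Hypothetical (Prevotella, Bacteroides) tag counts along a gradient.
gradient = [(50, 950), (490, 510), (510, 490), (950, 50)]

for prevotella, bacteroides in gradient:
    print(prevotella, bacteroides,
          label_by_ratio(prevotella, bacteroides),
          round(log10_ratio(prevotella, bacteroides), 3))
```

The two middle samples receive different labels even though their log ratios differ by only a few hundredths, whereas the two end samples differ by more than two log units.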
If we repeat our PCoA analysis at the OTU level, rather than using the RDP genus assignments, we observe a very different pattern with no separation of the biome types (panels C and D). The OTU analysis shows the gradient of Bacteroides and Clostridiales along the first axis, with most of the Clostridiales falling within the area of overlap. Using the V3–V5 OTUs, the Prevotellaceae samples fell completely within the area of overlap between Bacteroides and Clostridiales, with no segregation at all. Using the V1–V3 data, the Prevotellaceae fall exactly along the edge of the Bacteroides and Clostridiales, but with no clear separation between the two types. In the V3–V5 data there are 70 distinct Bacteroides OTUs and 437 distinct Clostridiales OTUs. When these are counted as separate phylotypes, it may be that the community distances within these biome types are too great and the community distances between the biome types too small; combined with the rich diversity of other OTUs and taxa present in both types, this would preclude adequate PCoA clustering at the OTU level. The OTU data, therefore, appear to report a level of individual discrimination between subjects that confounds stool biome types that are more apparent at a less discriminating level of taxonomic resolution. This observation is consistent with a recent suggestion that the appropriate level of taxonomic resolution be explicitly considered in the analysis of metagenomic data [12] and suggests that there is considerable individual variation between stool samples.
Vaginal samples, by contrast, appeared to have at least three potential V3–V5 biome types and at least four V1–V3 biome types that do not overlap (panels D and E). The three vaginal sampling locations (mid-vagina, vaginal introitus, and posterior fornix) yielded almost exactly the same results. The two rRNA hypervariable regions, V3–V5 and V1–V3, provided somewhat different perspectives. With the V3–V5 tag data, greater than 90% of the samples were dominated by a single Lactobacillus OTU (93% in mid-vagina and 91% in both the vaginal introitus and the posterior fornix). A single Bifidobacteriaceae OTU was the most abundant OTU in about 5% of samples, and the remaining samples had several other OTUs representing different taxa (including Atopobium, Prevotella, Propionibacterium and Clostridiales) as the most abundant. The dominant V3–V5 Lactobacillus OTU includes sequences with perfect BLAST matches to several different species including L. crispatus, L. iners, L. gasseri, as well as L. acidophilus, L. amylovorus, L. kalixensis, L. gallinarum, L. johnsonii, among others.
Sequencing with V1–V3 separates the vaginal Lactobacillus into three separate OTUs likely representing L. crispatus (OTU #3), L. iners (OTU #6), and L. gasseri (OTU #9) (based on perfect match BLAST results of the most abundant sequence to the NCBI nt database). These OTUs correspond to the dominant Lactobacillus species and biome types identified by Zhou et al. [13] and Ravel et al. [14] as Groups I (OTU #3), II (OTU #9), and III (OTU #6), with a further type, Group IV, dominated by non-Lactobacillus OTUs, including Prevotella and Atopobium. Interestingly, the V1–V3 tags, while better at differentiating amongst the Lactobacillus species, were not effective in detecting the Bifidobacteriaceae seen in the V3–V5 tags. About 60% of the subjects were dominated by the Lactobacillus OTU #3 (60% in the mid-vagina, 64% at the posterior fornix and 61% at the vaginal introitus), 20% were dominated by Lactobacillus OTU #6 (25%, 19%, and 21%) and 13% (13%, 12%, 13%) by Lactobacillus OTU #9, with the remaining ~5% dominated by other OTUs and taxa.
Estimated Total Richness
The estimated total richness of the microbial communities, as defined by total number of OTUs expected with complete sampling of the subject population, varies markedly between body sites. The stool has the highest estimated richness at 33,627 expected V3–V5 OTUs (23,665 V1–V3 OTUs), followed by the oral sites with estimates of richness from 3,125–11,501 V3–V5 OTUs (3,793–14,410 V1–V3 OTUs), and then the anterior nares, skin and vagina sites have richness estimates between 1,400 and 2,800 V3–V5 OTUs (1,100–3,700 V1–V3 OTUs). While the richness estimates calculated from the V1–V3 data are on average moderately higher, the body sites maintain a similar ordering of relative richness. The different skin sites have similar richness estimates as do the different vaginal sites. The oral sites, however, have a broader range of estimated richness. By combining samples across subjects and body sites, we can estimate the number of different OTUs that may be expected for the human population sampled. Combining data across all body sites and subjects, we estimate the V3–V5 richness for the female microbiome to be 51,373 (V1–V3: 42,391) and the male V3–V5 richness to be 48,388 (V1–V3: 39,782). These estimates are about 40% of the sum of the individual richness estimates by body site, which implies that more than half of the subgenus OTUs are present in more than one location on the body, although much of this may be due to OTUs appearing in multiple oral sites.
| Table 2. Richness estimates for each body site. |
Patterns in the Healthy Human Microbiome Within Genera
Clustering 16S tags into OTUs at the 3% level often differentiates groups of organisms more specifically than the genus level taxonomy. Current bacterial taxonomy is limited such that even with full length rRNA sequences, most organisms cannot be identified to species, and often not even as far as the genus level. The most commonly used tool for assigning taxonomy, the RDP Classifier [4], does not assign taxonomic names below the genus level. OTUs based on 16S tag sequencing, however, can often distinguish between organisms within a single genus that may represent species or strain level taxonomy or simply subgroups of organisms within a given genus. These subgenus OTUs can reveal patterns not seen at the genus or higher taxonomic levels. While subgenus diversity within rare OTUs may be important, especially when their functions are combined across OTUs [8], the sampling depth of these data is insufficient to support the exploration of rare OTUs, and we only used OTUs containing a minimum of 100 sequences. Although our analyses here focus on the V3–V5 sequencing because of the greater amount of data available, in many cases the V1–V3 OTUs have better discerning power at the subgenus level, as demonstrated by the ability of the V1–V3 tags to highlight three different Lactobacillus OTUs in the vaginal samples (see Presence of Biome Types, above).
Of the most abundant genera in the V3–V5 tag data (Streptococcus, Propionibacterium, Lactobacillus, Prevotella, Bacteroides, Corynebacterium, Fusobacterium, Pasteurella, Veillonella and Neisseria), all but Propionibacterium showed distinct differences in body site preference at the OTU level. This pattern is repeated in many other taxa. Different OTUs within a single genus can have markedly different relative abundances in different body sites. For instance, Streptococcus, the most abundant genus in the dataset, appears to have at least four different V3–V5 OTUs appearing in the oral sites, with OTU #2 and OTU #596 most abundant in the hard palate and palatine tonsils. Both OTU #2 and OTU #596 represent sequences matching many different members of the Streptococcus Mitis group (see Table S4 for OTU species assignments). OTU #60 (S. mutans) was more abundant in the supra- and subgingival plaque and in the buccal mucosa, and OTU #6 (Streptococcus sp.) was most abundant on the tongue but was also found on the hard palate and tonsils. Within the Prevotella genus, 79 V3–V5 OTUs were discovered. It may be that many of these are a finer differentiation of organisms that have a natural 16S variation >3% or that there are multiple copies of the 16S gene within the genome. The three most abundant of these OTUs, #10 (P. melaninogenica), #26 (P. pallens), and #67 (P. nanceiensis), all appear preferentially in the oral cavity but with slightly different abundance patterns between the sites. Only OTU #26 appeared in high numbers in the mid-vagina. All three of them appear more commonly in saliva, tongue dorsum, hard palate, palatine tonsils and the throat, but are distinctly rare in the subgingival and supragingival plaque, keratinized gingiva and the buccal mucosa, as well as the nares, stool and skin. Many of the other Prevotella OTUs followed the same patterns as these three.
Four of the twenty Bacteroides V3–V5 OTUs demonstrated highly specific body site preferences. Despite both having taxonomic best matches to B. vulgatus and B. dorei, OTU #17 appeared almost exclusively in the stool and OTU #707 almost exclusively in the throat. OTU #1004 (Bacteroides sp.) appeared almost exclusively on the tongue, but OTU #45 (B. stercoris) was found in the throat, stool, left antecubital fossa and mid-vagina. The seven subjects with the highest abundance of OTU #45 in the left antecubital fossa, who account for most of the normalized antecubital fossa abundance, did not have samples from the right antecubital fossa with adequate tags to be included in the study (fewer than 1,000 reads); therefore, no conclusions should be drawn about the left vs. right antecubital fossa and the Bacteroides OTU #45.
The genus Corynebacterium had at least 8 V3–V5 OTUs with five different profiles: OTU #15 (C. matruchotii) was present almost exclusively in the supragingival plaque, OTU #12 (Corynebacterium sp.) was predominantly present in the anterior nares, OTU #188 (C. argentoratense) mostly in saliva and to a lesser extent the hard palate, OTU #101 (Corynebacterium sp.) primarily in the skin and the mid-vagina, and OTU #418 (C. glucuronolyticum) in the mid-vagina and posterior fornix. Three of the most abundant Fusobacterium OTUs, OTU #523 (Fusobacterium sp., Filifactor alocis), OTU #738 (Fusobacterium sp., Filifactor alocis), and OTU #9 (F. periodonticum), were present in moderate to high abundance in the tonsils. We also found OTU #523 in the plaque and OTU #738 on the tongue, but OTU #9 was more cosmopolitan, appearing in the plaque, on the tongue, in the throat and to a small extent in the mid-vagina.
OTUs can also be used to differentiate sequences whose taxonomy cannot be ascertained even to the genus level, either because the tags themselves cannot be assigned a genus-level taxonomy, or because the tags within an OTU are assigned to different taxa, confounding the taxonomic assignment of the OTU. The genus Neisseria included at least four OTUs with distinct locational patterns, peaking in saliva (OTU #98, Neisseria sp.), supragingival plaque (OTU #220, N. bacilliformis), subgingival plaque (OTU #21, Neisseria sp. and Morococcus cerebrosus), and the tongue, tonsils and throat (OTU #8, Neisseria sp.). In addition, OTUs classifying only to the family level as Neisseriaceae (which could not be further classified with BLAST) peaked in the buccal mucosa (OTU #843), the tonsils (OTU #1001), and the throat (OTU #918), and two more were present in the retroauricular crease, one of which was also on the hard palate (OTU #40) and the other in the anterior nares (OTU #85). While the genera within the Pasteurellaceae family did not further separate into subgenus V3–V5 OTUs, the OTUs classified only to the family level did, with three OTUs exclusive to a single body site: saliva (OTU #1511), hard palate (OTU #1725), and palatine tonsils (OTU #1185). Two additional OTUs, #16 (Haemophilus parainfluenzae) and #19 (Haemophilus haemolyticus), were each present across most of the oral sites. None of these OTUs were found above trace levels in the nares, stool, skin, or vagina. V3–V5 OTUs assigned to the Prevotellaceae family also showed distinct body site preferences. Five of the most common Prevotellaceae OTUs appeared almost exclusively in a single body site: OTU #214 in the stool, OTU #241 in the throat, OTU #149 in the saliva, OTU #333 (P. melaninogenica) in the mid-vagina, and OTU #457 on the hard palate. OTU #34 was split between the saliva and stool, and a seventh Prevotellaceae OTU, #195, showed more generalization, appearing in the saliva, tonsils and throat, and in lower abundances on the tongue and hard palate. Interestingly, only OTU #333 had best BLAST hits with species level taxonomy assignments. At the order level, V1–V3 OTUs identified only as Actinomycetales included seven OTUs with distinct patterns, with OTUs #65 (C. durum) and #151 (Actinomyces sp.) preferentially colonizing the subgingival and supragingival plaque, OTU #96 (Actinomyces graevenitzii) in several places in the oral cavity, especially the tongue, hard palate, tonsils and throat, OTUs #35 (Actinomycetales), #209 (Corynebacterium kroppenstedtii), and #165 (Actinomycetales) in the anterior nares and skin sites, and OTU #308 (Mycobacterium sp.) on the skin and in the vagina.