The composition of the oral microbiota from 10 individuals with healthy oral tissues was determined using culture-independent techniques. From each individual, 26 specimens, each from different oral sites at a single point in time, were collected and pooled. An eleventh pool was constructed using portions of the subgingival specimens from all 10 individuals. The 16S rRNA gene was amplified using broad-range bacterial primers, and clone libraries from the individual and subgingival pools were constructed. From a total of 11 368 high-quality, non-chimeric, near full-length sequences, 247 species-level phylotypes (using a 99% sequence identity threshold) and 9 bacterial phyla were identified. At least 15 bacterial genera were conserved among all 10 individuals, with significant interindividual differences at the species and strain level. Comparisons of these oral bacterial sequences to near full-length sequences found previously in the large intestines and feces of other healthy individuals suggest that the mouth and intestinal tract harbor distinct sets of bacteria. Co-occurrence analysis demonstrated significant segregation of taxa when community membership was examined at the level of genus, but not at the level of species, suggesting that ecologically significant, competitive interactions are more apparent at a broader taxonomic level than species. This study is one of the more comprehensive, high-resolution analyses of bacterial diversity within the healthy human mouth to date, and highlights the value of tools from macroecology for enhancing our understanding of bacterial ecology in human health.
The human body is home to many indigenous microorganisms, with distinct communities at different anatomical sites (Dethlefsen et al., 2007). Recent studies have demonstrated the importance of the gut microbiota in digestion, fat storage, angiogenesis, immune system development and response, colonization resistance, and epithelial architecture (reviewed in Flint et al., 2007; Tappenden and Deutsch, 2007; Cogen et al., 2008). The oral cavity is also home to microbial communities with important implications for human health and disease. Chronic periodontitis is one of the most common inflammatory conditions worldwide, and is associated with bacterial community structures that are distinct from those of health.
Efforts to characterize microbial diversity increasingly rely on cultivation-independent, molecular techniques (Hugenholtz, 2002; Schloss and Handelsman, 2004), since the vast majority of bacteria have yet to be cultivated. Most of these molecular studies are based on the small subunit (16S) ribosomal RNA (rRNA) gene because of its universal presence in cellular organisms, the presence of conserved regions, and its reliability for phylogenetic analysis (Woese and Fox, 1977). Recent molecular surveys of the human distal gut microbiota have shown that each individual gut is home to 500 – 3000 bacterial species, with a large degree of interindividual variation (Eckburg et al., 2005; Dethlefsen et al., 2007; Dethlefsen et al., 2008). Using rRNA gene-based techniques, it is estimated that the human oral cavity harbors 500–700 different bacterial species (Kroes et al., 1999; Paster et al., 2001; Kazor et al., 2003; Aas et al., 2005; Dewhirst, 2008). A recent study based on 14 115 partial 16S rRNA gene sequences from saliva specimens from 120 healthy individuals from 12 different geographic locations around the world found 101 different bacterial genera, with a high level of interindividual variation (Nasidze et al., 2009). Two recent 16S rRNA gene tag pyrosequencing-based studies have suggested that there are approximately 250–300 species-level phylotypes in the mouth of any given individual, and that they segregate based on mucosal versus dental surfaces (Keijser et al., 2008; Zaura et al., 2009). All three of these recent studies are limited by their dependence on relatively short (<500 nucleotides) sequences, and hence, by limited phylogenetic resolution.
We analyzed approximately 1000 near full-length cloned 16S rRNA gene sequences from each of 10 individuals with healthy oral tissues and gingiva, and examined variation in patterns of diversity between individuals.
Specimens were collected from 10 individuals with healthy oral tissues and gingiva (five women; age range 27 to 61 years; average age 38.1 years; ethnicity: 6 Caucasian, 1 Afro-American, 2 Chinese, 1 from India). Oral health status of all individuals was determined by a dentist who performed a full-mouth clinical examination that included inspection of the teeth, oral mucosa, and periodontal tissues. All participants had normal oral mucous membranes and were free from non-restored carious lesions. At most sites, periodontal tissues showed no clinical signs of inflammation such as redness, swelling, or bleeding on probing (BOP) and were judged to be free of gingivitis or periodontitis. Details of the periodontal data from sites from which plaque specimens were taken are provided in Table 1 and Supplemental Methods. From each individual, 26 oral specimens were collected. Separate dental plaque specimens were taken with sterile curets from supragingival and subgingival surfaces of 7 target teeth (#3, 9, 12, 19, 25, 28, and 30). The 26th sample consisted of whole saliva that was expectorated into a test tube. Healthy human mouths have relatively little bacterial biomass compared to the gastrointestinal tract; therefore, because the ultimate purpose of this project was to obtain community-wide shotgun sequence data, specimens were pooled in order to ensure sufficient DNA. One-third of each of the 26 specimens obtained from each individual was combined to obtain ten “individual-specific” pools, while a separate third of each of the subgingival specimens from all 10 individuals (7 specimens per subject) was pooled to create a single “subgingival pool”. To study the influence of DNA isolation method in the UniFrac analysis (see below), specimens were also collected from three additional healthy individuals. These specimens were not included in downstream analyses, unless otherwise noted. 
Further details about inclusion and exclusion criteria, specimen collection and other procedures are provided in Supplemental Methods.
To extract DNA, pooled specimens were washed twice in 1 ml ice-cold PBS, pelleted by 5 min centrifugation at 16 000 g and 4°C, and resuspended in 100 μl PBS, to reduce the amount of contaminating free human DNA. To this suspension, 10 μl of a 10% Triton-X100 solution, and 2.5 μl of a 20 mg/ml Proteinase K solution (Qiagen) were added, and the suspension was incubated at 60°C for 30 min. 200 μl of a cell lysis buffer (100 mM Tris-HCl pH 7.4, 20 mM EDTA, 5 M guanidine isothiocyanate) was added. In order to obtain maximum bacterial diversity, we split each specimen pool into 2 equal portions. To one specimen portion, three sizes of baked zirconia beads were added, and the mixture was agitated in a FastPrep FP120 machine (Qbiogene, Carlsbad, Calif.) at 4.0 m/s for 30 s. The bead-beaten portion was recombined with the non bead-beaten portion. The DNA was further purified, precipitated, washed, dried, and resuspended in 50 μl of 10 mM Tris, pH 8.0 (details are provided in the Supplemental Methods). Extraction controls were processed in parallel during the DNA extraction procedure to monitor contamination. A second set of pooled oral specimens from three additional healthy mouths was extracted using the QIAamp DNA mini kit (Qiagen).
The 16S rRNA gene was amplified using broad-range bacterial-specific primers 8FM (5′-AGAGTTTGATCMTGGCTCAG-3′) (Edwards et al., 1989; Palmer et al., 2007) and 1391R (5′-GACGGGCGGTGTGTRCA-3′) (Lane et al., 1985; Palmer et al., 2007). These primers amplify approximately 90% of the bacterial 16S rRNA coding sequence. PCR was performed as described previously (Eckburg et al., 2005), except PCRs involved 5 min at 95°C, 20 cycles of 30 s at 94°C, 30 s at 55°C, and 90 s at 72°C, followed by 8 min at 72°C. To obtain sufficient PCR product for cloning, the products of 4 replicate 20-cycle amplification reactions were pooled. No amplification product was observed in the extraction controls and negative PCR controls. Purified PCR products were cloned with the TOPO TA cloning kit (Invitrogen), and plasmid inserts were sequenced on both strands.
A total of 11 447 high-quality, ~1400 bp-length 16S rRNA gene sequences were aligned with the on-line Greengenes NAST aligner (DeSantis et al., 2006) (http://greengenes.lbl.gov), and inserted into the Greengenes version of ARB (Ludwig et al., 2004). The alignment was further refined by manual optimization. Seventy-nine chimeras (0.7%) were manually identified and removed from the analysis, so that 11 368 sequences were included in the final analysis. Operational taxonomic units (OTUs; phylotypes) were defined using a 99% sequence similarity cutoff, by using similarity matrices and a filter of 1253 nucleotide positions, masking out the hypervariable regions. The 99% cutoff in this setting roughly corresponds to species-level groupings. One representative for each of the 247 OTUs found in this study was deposited in GenBank (accession numbers FJ976202 to FJ976448) (Table S1). Sequences with less than 99% similarity to sequences in public databases were considered novel (Table S2). Genus names were assigned based on placement of sequences within defined groups, or on a cutoff of 95% sequence identity in the case of unclassifiable sequences. The DOTUR and mothur packages were used to calculate the number of OTUs at different cutoffs, collector's curves, and the Chao1 species richness estimator (Schloss and Handelsman, 2005; Schloss et al., 2009).
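As a rough illustration of the Chao1 richness estimate mentioned above, the following Python sketch computes the bias-corrected form of the estimator from a list of per-OTU clone counts. The function name and the example counts are hypothetical, and the text does not state which Chao1 variant DOTUR and mothur applied, so the choice of the bias-corrected formula is an assumption.

```python
def chao1(counts):
    """Bias-corrected Chao1 estimate: S_obs + F1*(F1-1) / (2*(F2+1)).

    counts: number of clones observed for each OTU in one library.
    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical library: 7 observed OTUs, 3 singletons, 2 doubletons.
example_counts = [5, 3, 2, 2, 1, 1, 1]
print(chao1(example_counts))
```

When a library contains no singletons, the correction term is zero and the estimate equals the observed OTU count.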
Richness estimates and diversity indices (Simpson and Shannon formulas) were determined with EstimateS (Colwell, 2005). The percentage of coverage was calculated by Good’s method with the formula [1 − (n/N)] × 100, where n is the number of phylotypes in a specimen represented by one clone (singletons) and N is the total number of sequences in that specimen (Good, 1953). The Shannon index of evenness was calculated with the formula E = e^D/N, where D is the Shannon diversity index.
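The coverage and evenness calculations can be sketched in a few lines of Python. This is a minimal illustration, not the EstimateS implementation: the function names and example counts are hypothetical, the Shannon index is computed with natural logarithms, and the denominator of the evenness formula is assumed to be the number of observed phylotypes, which the text does not spell out.

```python
import math


def goods_coverage(counts):
    """Good's coverage in percent: [1 - (n/N)] x 100.

    n = number of phylotypes represented by a single clone,
    N = total number of sequences in the specimen.
    """
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return (1 - singletons / total) * 100


def shannon_index(counts):
    """Shannon diversity index D = -sum(p_i * ln p_i)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def shannon_evenness(counts):
    """Evenness E = e^D / S, with S assumed to be the observed phylotype count."""
    observed = sum(1 for c in counts if c > 0)
    return math.exp(shannon_index(counts)) / observed


# Hypothetical clone library: 100 sequences, 6 phylotypes, 2 singletons.
library = [50, 30, 15, 3, 1, 1]
print(goods_coverage(library), shannon_index(library), shannon_evenness(library))
```

Under these assumptions a perfectly even library gives an evenness of 1, and the value falls toward 0 as a few phylotypes come to dominate.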
After calculating, with an Olsen correction, a neighbor-joining tree containing representatives of all 247 OTUs found in this study, the 11 different oral environments were clustered using principal coordinates analysis (PCA), as enabled in UniFrac (Lozupone et al., 2006), using weighted, normalized abundance data. To compare sequence data from oral specimens in this study against data from other locations in the human body from subjects in previously-published studies, UniFrac PCA analysis was also performed on a second dataset. These combined data included data from the 11 oral pools of this study, 3 additional oral pools from healthy human mouths isolated using a different DNA extraction method (QIAamp DNA minikit from Qiagen; 1034 sequences; unpublished data), 18 colonic biopsy and 3 stool specimens from 3 healthy subjects (Eckburg et al., 2005), and 15 stool specimens from 3 healthy subjects in an antibiotic perturbation study (Dethlefsen et al., 2008).
Community composition was examined in two separate ways. First, the communities were compared using shared species estimators. Second, community assembly was examined using taxon co-occurrence. The Chao-Jaccard abundance-based similarity index is a shared species estimator that measures the probability that two individuals chosen from two different specimens are members of species shared by both specimens (Chao, 2005). This particular test can only be used to examine similarity between two communities at a time. The Chao-Jaccard similarity index was calculated using EstimateS (Colwell, 2005) for all possible pair-wise comparisons of the communities from the ten mouths. Community similarity was compared at two taxonomic levels - OTU and genus. The subgingival pool was not included in this analysis.
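To make the abundance-based index concrete, the sketch below computes the raw (uncorrected) Chao-Jaccard value for two specimens. It assumes the usual definition UV / (U + V − UV), where U and V are the fractions of sequences in each specimen that belong to taxa found in both. The function name and example data are hypothetical, and the estimated index, which additionally corrects for shared taxa that were not sampled, is not reproduced here.

```python
def chao_jaccard_raw(counts_a, counts_b):
    """Raw abundance-based Jaccard index: UV / (U + V - UV).

    counts_a, counts_b: dicts mapping taxon name -> number of clones.
    U (V) is the fraction of clones in specimen A (B) that belong to
    taxa observed in both specimens.
    """
    shared = [t for t in counts_a if counts_a[t] > 0 and counts_b.get(t, 0) > 0]
    if not shared:
        return 0.0
    u = sum(counts_a[t] for t in shared) / sum(counts_a.values())
    v = sum(counts_b[t] for t in shared) / sum(counts_b.values())
    return u * v / (u + v - u * v)


# Hypothetical genus-level counts for two mouths.
mouth_1 = {"Streptococcus": 200, "Neisseria": 50, "Scardovia": 5}
mouth_2 = {"Streptococcus": 120, "Neisseria": 90, "Abiotrophia": 10}
print(chao_jaccard_raw(mouth_1, mouth_2))
```

Because the index weights taxa by abundance, two specimens that differ only in rare taxa still score close to 1, which is consistent with the high genus-level similarities reported for these mouths.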
In addition to community similarity, we tested for non-random patterns of taxon co-occurrence by calculating C-scores for this dataset (Stone and Roberts, 1990). This measure of community structure calculates the number of checkerboard units (specimens in which two taxa are not found together) between all possible taxon pairs in a matrix, and calculates a single score for the entire dataset. The C-score is the average for all of the possible pairs in the matrix. This measure is compared to a null distribution of random matrices of the same size. If the observed C-score is larger than the score for the null hypothesis, it suggests significant segregation between taxa, and if the observed C-score is smaller than the score for the null hypothesis, it suggests significant aggregation between taxa. In this case, we calculated C-scores using an abundance matrix of all taxa, organized by mouth (Table S1), which was then converted to a presence/absence matrix. The subgingival pool was not included in this analysis. These scores were compared to those generated from a null model based on 500 randomly generated matrices of the same size using the program EcoSim (Gotelli and Entsminger, 2004). Co-occurrence patterns were examined at three separate taxonomic levels— OTU level (n=247, approximately species level), genus level (n=53), and phylum level (n=9).
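The C-score calculation can be sketched as follows. For a taxon pair, the number of checkerboard units is taken as (r_i − S)(r_j − S), where r_i and r_j are the numbers of mouths containing each taxon and S is the number of mouths containing both, and the C-score is the mean over all pairs. The null model shown here shuffles each taxon's occurrences across mouths independently, which is a simplifying assumption: the randomization algorithm used in EcoSim is not specified in the text, and the function names and example matrix are hypothetical.

```python
import random
from itertools import combinations


def to_presence_absence(abundance):
    """Convert an abundance matrix (rows = taxa, columns = mouths) to 0/1."""
    return [[1 if count > 0 else 0 for count in row] for row in abundance]


def c_score(matrix):
    """Mean number of checkerboard units over all taxon pairs."""
    pairs = list(combinations(matrix, 2))
    total = 0
    for row_a, row_b in pairs:
        shared = sum(a * b for a, b in zip(row_a, row_b))
        total += (sum(row_a) - shared) * (sum(row_b) - shared)
    return total / len(pairs)


def null_test(matrix, n_random=500, seed=0):
    """Return the observed C-score, the null mean, and a one-sided p-value.

    Each random matrix keeps every taxon's number of occurrences but
    shuffles which mouths it occurs in (an assumed, simplified null model).
    """
    rng = random.Random(seed)
    observed = c_score(matrix)
    null_scores = []
    for _ in range(n_random):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        null_scores.append(c_score(shuffled))
    p_value = sum(s >= observed for s in null_scores) / n_random
    return observed, sum(null_scores) / n_random, p_value


# Hypothetical abundance matrix: 4 taxa (rows) in 6 mouths (columns).
abundance = [
    [12, 0, 7, 0, 3, 0],
    [0, 5, 0, 9, 0, 2],
    [4, 4, 4, 4, 4, 4],
    [0, 0, 1, 0, 0, 8],
]
print(null_test(to_presence_absence(abundance)))
```

An observed score well above the null mean, with a small p-value, would point toward segregation, whereas a score well below it would point toward aggregation.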
From each of 10 individuals with a healthy oral status, 26 specimens from different parts of the mouth were collected. Portions of the specimens were pooled per individual and an 11th pool was constructed with portions of each subgingival specimen from all 10 individuals. Ribosomal RNA gene sequences were amplified using broad-range bacterial primers, cloned and sequenced. The 11 368 near full-length, non-chimeric sequences of the combined dataset were manually assigned to 247 OTUs (phylotypes) using a cutoff of 99% sequence identity (Table S1). DOTUR and mothur analyses revealed a total of 228 OTUs at this cutoff level, with an expected OTU richness of 236 (Figure S1, which also shows the rarefaction curves of each of the 11 clone libraries). A graph displaying the DOTUR-determined number of phylotypes versus the phylogenetic distance displayed the typical “hockey stick shape” that is found in most animal-associated bacterial communities, with an enriched representation of diversity at the tip (Figure S2). Nine bacterial phyla were identified within the combined dataset (Figure 1). Of these, Firmicutes (33.2% of all sequences; mean abundance in 11 pools 32.2 ± 8.1%), Proteobacteria (27.5% in combined set; mean 24.6 ± 8.1%), Bacteroidetes (16.6%; mean 14.6 ± 8.4%) and Actinobacteria (14.5%; mean 11.9 ± 10.3%) were the most abundant. Less abundant phyla included Fusobacteria (6.7%; mean 5.6 ± 4.1%), TM7 (1.3%; mean 0.52 ± 1.3%), as well as Spirochaetes, OD2, and Synergistes (all <1%). Figure 2 displays a phylogenetic tree and relative abundance of all genera found in this study. In the combined dataset, Streptococcus was the most abundant genus (2180 sequences; 19.2% of total). Other abundant genera included Haemophilus (1325; 11.7%), Neisseria (1042; 9.2%), Prevotella (974; 8.6%), Veillonella (973; 8.6%), and Rothia (820; 7.2%). However, the genera and species that dominate the mouth vary among individuals (see below).
Using a 99% sequence identity cutoff, 24 OTUs (10%) were considered novel (Table S2). Of these, 6 had less than 97% sequence identity to published sequences (Table 2). The sequences with the least identity to previously reported sequences were clone 10B928 (phylum Bacteroidetes), which was 92.5% identical to AF371900 (isolated from the intestinal tract of a pig; Leser et al., 2002), and clone 7BB842 (phylum OD2), which displayed 92.5% sequence similarity to its closest neighbor, AB243989 (detected in a Japanese oil well; unpublished).
Observed bacterial richness was highest in subject 4, in whom the highest number of OTUs, singletons and doubletons was found (Table 3). In contrast, both Shannon and Simpson estimators of bacterial diversity were the highest for subject 3. This subject also showed the highest Shannon estimator of evenness. Good’s estimator suggested >95% coverage for each of the 11 libraries, indicating that fewer than 5 additional OTUs would be expected for every 100 additional clones sequenced. UniFrac analysis showed no clustering of the oral communities from the ten individuals based on gender, age, or ethnic background (Figure 3A). Pairwise comparisons of the oral pools showed that all individuals were equally distinct (Bonferroni-corrected p values all > 0.5).
We compared the oral bacterial communities described in this study with those found in previously published studies of the human colon and stool (Figure 3B). Although these specimens were derived from different studies and different individuals (except for certain stool and colonic specimens that were derived from the same 3 individuals), specimens from different anatomical sites clustered in a distinct fashion; the corrected UniFrac significance (all environments together) was <= 0.01, indicating that the environments were significantly different from each other. Three additional oral communities from QIAamp extracted specimens (CDL, unpublished results) clustered with the 11 communities from 11 benzyl-alcohol-extracted specimens described in this study, suggesting that DNA extraction method accounts for less variation in the composition of communities than do differences between individuals.
The different bacterial communities were compared using Chao’s Jaccard abundance-based similarity index. An average of 50.5 (range 29–76) OTUs were found to be shared between any two specimens (Table 4A). Similarity between communities was typically low, averaging 0.671 (range 0.501–0.801) with the raw index and 0.760 (range 0.533–0.969) with the estimated index (Table 4B). When community similarity was examined on the genus level, observed shared genera averaged 25 (range 18–34) (Table 4A), and the Chao-Jaccard abundance similarity averaged 0.942 (range 0.845–0.988; raw) and 0.963 (range 0.845–1; estimated) (Table 4B). A value of 1 indicates that all genera are shared between the two specimens examined. Fifteen bacterial genera were observed in all 10 healthy individuals: Neisseria, Cardiobacterium, Haemophilus, and Campylobacter (Proteobacteria); Streptococcus, Granulicatella, and Veillonella (Firmicutes); Fusobacterium (Fusobacteria); Rothia, Actinomyces, Corynebacterium, and Atopobium (Actinobacteria); and Prevotella, Capnocytophaga, and Bergeyella (Bacteroidetes). Every individual also contained TM7 sequences. All of these bacterial taxa were also present in the pooled subgingival library. Of these shared genera, eight had species present in all 10 individuals, leading to eleven shared bacterial species: Haemophilus parainfluenzae, Streptococcus oralis, Streptococcus sanguinis, Granulicatella adiacens, Veillonella parvula, Veillonella dispar, Rothia aeria, Actinomyces naeslundii, Actinomyces odontolyticus, Prevotella melaninogenica, and Capnocytophaga gingivalis.
Despite conserved oral bacterial community composition at the genus level, there were also interindividual differences. Several different patterns of genus dominance were found in the ten healthy mouths. Five of the ten mouths were dominated by Streptococcus species (#2, 5, 7, 9, 10). Two mouths were dominated by Prevotella (#1, 4), and one each was dominated by Neisseria (#3), Haemophilus (#8) and Veillonella (#6) (Figure S3). In addition, even among the genera present in all 10 healthy individuals, the presence of particular species within that genus was variable between individuals. For example, although every subject had sequences belonging to the genus Neisseria, no single Neisseria species was shared across all subjects. The same was true for species in the genera Fusobacterium and Corynebacterium.
Co-occurrence analysis was performed on the data from the ten individual subjects, using the C-score of Stone and Roberts, which compares the taxon distribution of a data set to a randomized distribution of the same number of taxa (Stone and Roberts, 1990). This method calculates the checkerboard units for each taxon pair (how often one taxon of the pair occurs without the other). When analyzed at the level of OTU, the observed C-score was not significantly different from the null hypothesis (random distribution). When the same data were analyzed at the genus level, the C-score indicated that the communities display co-occurrence patterns significantly different from the null hypothesis (observed C = 0.99184, expected C = 0.95366, p = 0.02860). These scores (higher than expected) suggested segregation or competition among taxa. Examination of the matrix of checkerboard units between each taxon pair can pinpoint taxa that are more or less likely to be found together. Figure 4 displays the taxon pairs as a matrix of C-scores. Taxon pairs with low C-scores (found together frequently) are colored white, whereas those with high C-scores (rarely or never found together) are colored black. Genus pairs in which both genera are found in all mouths, such as Streptococcus, Neisseria, and Haemophilus, have zero checkerboard units, as expected. When examining the genus pairs with high checkerboard units, the genus Abiotrophia was identified as unlikely to be found together with the genera Dialister, Oribacterium, Eubacterium, and Treponema. In addition, the genus Scardovia was unlikely to be found with Eikenella or Dialister. Because it may be inappropriate to compare this broad range of bacterial taxa in a single analysis (members of different phyla may not be in competition), we re-analyzed the OTU-level data, this time comparing the patterns only within a given phylum.
In this case, we also calculated the C-scores based on presence/absence for all OTUs (but only within a given phylum). This was repeated for each phylum, except for OD2 and Synergistes, due to the few observations in each of those two groups. This OTU-level, within-phylum analysis revealed that only the taxa within Firmicutes demonstrated a C-score significantly different from the null hypothesis (Obs = 2.1370, Exp = 2.08243, p = 0.03460), suggesting segregation of species, and evidence of possible competitive species interactions.
The composition of the microbial communities on and within the human body varies between individuals. Inter-individual variation has been demonstrated in a variety of studies for the healthy intestinal tract (Eckburg et al., 2005; Dethlefsen et al., 2006; Ley et al., 2006; Palmer et al., 2007). In contrast, knowledge about the inter-individual differences in the healthy human mouth microbiota and the uniqueness of the oral microbiota compared to other microbial communities in our bodies is still somewhat sparse. Several molecular studies have been conducted regarding the composition of the oral microbiota, but these studies used limited numbers of sequences per individual, or only looked at short regions of the 16S rRNA gene (Kroes et al., 1999; Paster et al., 2001; Kazor et al., 2003; Aas et al., 2005). A study by Diaz et al. in three individuals showed that early colonization of enamel is subject-specific (Diaz et al., 2006). The distinctness of the phylogenetic structure of the human oral microbiota in relation to the microbiota of the skin and feces in nine individuals was revealed in a recent study (Costello et al., 2009). While other studies have considered the oral microbiota of a larger number of individuals, our study was based on one of the largest sets of near full-length sequences per individual to date for the human oral cavity. The most important contributions of this work are the combination of depth of coverage and degree of phylogenetic resolution for the human mouth, the features of a human oral core microbiota, and previously-unrecognized patterns of taxon co-occurrence.
In this study, we amplified and analyzed an average of 1029 near full-length, well-aligned oral 16S rRNA gene sequences (range 931–1070) per subject from each of 10 healthy individuals, as well as an additional 1083 clones from the pooled subgingival specimens, bringing the total number of sequences analyzed in this study to 11 368. The advantage of near full-length 16S rRNA gene sequences in providing greater phylogenetic resolution than hypervariable region “tag” sequences was highlighted in a comparative analysis of these two types of sequence data (Huse et al., 2008). In this dataset, we identified a total of 247 different OTUs at the level of species, of which 24 were less than 99% identical to previously published sequences. Approximately 10% of the OTUs found in this study were thus previously uncharacterized.
The abundant bacterial groups found in our study are similar to those found in most other studies. For example, 20% of our sequences belonged to the genus Streptococcus, confirming the preponderance of Streptococcus species within the healthy mouth by microscopy and culture (Socransky, 1963) and by molecular methods (Kroes et al., 1999). In a recent molecular study, the most predominant bacterial genera in the oral cavity were Streptococcus, Gemella, Abiotrophia, Granulicatella, Rothia, Neisseria, and Prevotella (Aas et al., 2005). We found those same groups to be prevalent as well, but, in addition, we found many Proteobacteria (e.g., Haemophilus, Lautropia) to be abundant. This difference may be the result of a deeper sequencing effort per individual in the current study (average 57.5 clones per subject in the Aas et al. study for a total of 2,589 clones, in contrast to an average 1029 clones per individual in this study). In addition, different DNA extraction methods and different broad-range PCR primers could also explain the divergent results.
Despite the evidence for a conserved healthy oral community at the genus level in all 10 healthy mouths, there was also evidence in this study for large inter-individual differences. Our study confirms results by Nasidze et al. suggesting high variability in the oral microbiome between individuals, although in the latter study, saliva was the only specimen type examined (Nasidze et al., 2009). In addition to Streptococcus, which was the most abundant genus in the combined dataset and in five of the individual mouths, we identified four additional genera that may dominate the oral ecosystem of a healthy subject. Our data indicate that there is a variety of alternative oral bacterial community structures, and a greater degree of variation in patterns of diversity, associated with oral health than previously thought. It remains to be seen what factors, e.g., human genetics or lifestyle, correlate with oral bacterial community structure. The concept of a core oral microbiome may be better defined with measurements of community function, rather than community membership. Such analyses will need to include community-wide assessments of gene content, gene transcript abundance, and protein products.
The role of bacteria in periodontal disease is complex, and likely involves polymicrobial consortia (Lepp et al., 2004). Socransky and Haffajee have proposed that the presence of a high proportion of so-called “red complex” bacteria, i.e., Porphyromonas gingivalis, Tannerella forsythia, and Treponema denticola, is associated with periodontal disease (Haffajee et al., 2008; Socransky et al., 1998). In a survey of five healthy mouths, Aas et al. (Aas et al., 2005) did not find any representatives of the “red complex”. Other studies have, however, identified members of this complex in healthy mouths (Ximenez-Fyvie et al., 2000). In our study, all three species were found in subjects with healthy gingival tissues, albeit in low numbers, and limited to subjects 1, 4, and 9. Taken together with previous studies, this study confirms that the ‘red complex’ group may be found in small numbers in healthy individuals. Other bacterial species such as Filifactor alocis, Selenomonas species, and Dialister species have been associated with a worsening periodontal status (Kumar et al., 2005). A bacterial species previously shown to be associated with periodontal health (Veillonella parvula, Veillonella X042, Genbank accession number AF287781) (Kumar et al., 2005) was found in all specimens in this study, and was the third most abundant OTU in our combined sequence dataset.
UniFrac PCA analysis showed no apparent clustering of oral microbial communities based on gender, age, or ethnicity. In addition, UniFrac analysis showed no apparent effect of DNA extraction method of oral specimens. No individual pool was found to be more significantly different than others in pairwise comparisons, and the subgingival library was not significantly different from the individual pools. This may be indicative of the fact that (1) despite the many different habitats in the human mouth, many bacterial species are shared among those habitats, or (2) that the individual pools are dominated by the subgingival specimens. However, the number of subjects in this study was relatively small, and inter-individual differences associated with gender, age, or ethnicity might become apparent when larger numbers of subjects are studied. Because specimens from multiple sites within an individual were pooled, bacterial community differences between anatomical sites could not be examined.
When the oral sequence libraries were compared to similar sequence libraries from the human colon and stool, a clear clustering according to anatomical site was observed. These results need to be interpreted with caution, since data were obtained from different individuals, and differences between study groups might drive some of the findings. Nevertheless, it is appealing to assume that each anatomical location within a healthy human has specific physicochemical conditions that shape the composition of a microbial community specifically adapted to that site. Our finding of human habitat-specific microbial community structure is supported by recently published data (Costello et al., 2009).
Tests for significant segregation patterns of taxa were originally developed as a means of assessing whether competition between taxa is a driving force behind community assembly. C-scores higher than expected are consistent with inter-species competition, as well as with habitat differences that cross over the sampling scheme, and historical processes. We feel that habitat differences (other than host genotype) were minimized in our study due to the fact that the pools presumably represented multiple intra-oral sites in a consistent manner across individuals. However, successional or early historical differences between subjects cannot be eliminated as a possible explanation of the observed segregation patterns. It has been previously suggested that as taxonomic level is refined, C-scores become more statistically significant (Horner-Devine et al., 2007). The fact that significant segregation was found at the genus level in our study but not at a level equivalent to species has several possible interpretations. One possibility is that taxonomic levels are not the relevant biological units of measure. Another possibility is that the level of ecological interest and interaction in the mouth is the level that humans have chosen to label as genus, rather than species.
Co-occurrence analysis not only addresses the forces structuring a community, but also draws attention to specific taxa that have apparent interactions and may be worthy of further investigation. For instance, in this study, Abiotrophia was found to have a high number of checkerboard units with the genera Dialister, Oribacterium, Eubacterium and Treponema, and the genus Scardovia had a high number of checkerboard units with Eikenella and Dialister. Interactions among these genera have not been the focus of research so far, but such research may lead us to understand whether and why these taxa compete. Each of these genera (except Treponema) is represented in this dataset by a single species, each of which has been implicated in human disease; recognition of competitive partners may prove useful in preventive medicine. For instance, it has been suggested that known competitive interactions between Streptococcus mutans and other species may be exploited in order to develop preventive treatments for dental caries, by encouraging growth of species with lower cariogenicity (Kreth et al., 2005).
This study demonstrates that each person’s mouth harbors a unique community of bacterial species, but that these communities tend to be more similar when classified at the level of genus. Ecological tools initially developed for larger organisms, such as co-occurrence analysis, will greatly facilitate the analysis of complex bacterial communities such as those found in the human body, and will enhance our understanding of the role of the microbiota in health and disease.
A. Number of observed species (Sobs) was plotted as a function of the number of clones that were sequenced in the combined dataset of 11 368 clones, sorted by library (first individual pools #1 through 10, followed by the subgingival pool). The number of observed species (Operational Taxonomic Units, OTUs) was calculated using DOTUR and mothur, using different cutoffs (99%, 98%, 97%, 95%, and 90%, respectively). The total number of Sobs was 228 using the 99% cutoff, 172 (98%), 144 (97%), 114 (95%), and 69 (90%), respectively. Data were grouped per library, showing that the addition of each library causes a new increase (“bump”) in the number of observed OTUs, except at the 99% cutoff threshold*, which shows the data in randomized order. In addition, the Chao1 estimator (estimating the total bacterial diversity) is plotted, with the grey bars showing the standard deviation. B. Rarefaction curves (1000x randomized) for each of the 10 individual clone libraries, as well as the subgingival pool, calculated using mothur and a 99% OTU cutoff. The first 1100 clones of the combined dataset (“total”) are also shown.
Number of OTUs was calculated by DOTUR on the combined dataset presented in this study in steps of 0.001. The calculated number of OTUs was 228 at 0.01 (1%) distance, with 5804 unique sequences (genetic distance 0).
Genus abundance is plotted against genus rank, for the combined dataset, as well as for each of the individual oral pools (subjects 1 to 10). In all plots, genera are sorted according to their rank (most abundant first) in the combined dataset.
For each of the 247 OTUs (in rows), the taxonomic assignment is given, as well as the numbers of clones in each of the 10 individual pool 16S rRNA gene clone libraries (labeled 1 through 10), and in the subgingival pool clone library (S).
Using a cut-off of 99% sequence identity to published sequences longer than 1000 nucleotides, a total of 24 novel OTUs was found in this study. This table lists the clone number of the sequence that was used as a representative, its assigned accession number, the closest published relative and percentage homology, as well as the number of clones in each of the 11 rRNA gene clone libraries in this study.
We thank Karla Lightfield for technical assistance, and Katie Shelef for help with pooling and extracting the oral specimens. This work was funded by NIH R01-DE014868 (CFL), NIH R01-DE13541 (DAR), and NIH Pioneer Award DP1-OD000964 (DAR). DAR is supported by the Thomas C. and Joan M. Merigan Endowment at Stanford University.