Human Oral Microbiome Database.
The backbone of the HOMD is its set of reference 16S rRNA gene sequences, which are used to define individual human oral taxa and to create the phylogenetic and taxonomic structures of the database. The initial reference set of 16S rRNA gene sequences for the HOMD consisted of over 800 full-length sequences (each of which is greater than 1,500 bases) of named oral bacteria from the oral microbiological literature and strain and clone phylotypes generated from our sequencing and cloning studies. After entering and aligning these sequences in our RNA database, neighbor-joining trees were generated. A total of 78 chimeras were identified and removed from the database. Using a 98.5% sequence similarity cutoff for defining a phylotype, the approximately 800 sequences were placed in 619 taxa. Each taxon was given a human oral taxon (HOT) number (arbitrary number starting at 001). All of the human oral taxa were placed in a full taxonomic classification, including domain, phylum, class, order, family, genus, and species, which can be viewed in Table S4 in the supplemental material. Placement of species in higher taxa was based solely on tree position. For example, Eubacterium saburreum was placed in the family Lachnospiraceae and not Eubacteriaceae. Eubacterium saburreum is not closely related to the type strain of the genus Eubacterium, Eubacterium limosum, and will need to be placed in a new genus and subsequently renamed. The HOMD taxonomy is one of several selectable taxonomies available at the Greengenes website (16). When the HOMD is selected as the taxonomy, the operational taxonomic unit (OTU) numbers equal the human oral taxon numbers.
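The 98.5% similarity cutoff can be illustrated with a short sketch. The code below is not the procedure used to build the HOMD, which relied on neighbor-joining trees and tree position; it is a simplified greedy clustering over pre-aligned sequences, and the function names `percent_identity` and `assign_phylotypes` are hypothetical.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def assign_phylotypes(aligned: Dict[str, str], cutoff: float = 98.5) -> Dict[str, int]:
    """Greedy clustering (an assumption, not the HOMD method).

    Each sequence joins the first taxon whose representative it matches at
    or above the cutoff; otherwise it founds a new taxon.
    """
    representatives: List[str] = []
    taxon_of: Dict[str, int] = {}
    for name, seq in aligned.items():
        for taxon_id, rep in enumerate(representatives, start=1):
            if percent_identity(seq, aligned[rep]) >= cutoff:
                taxon_of[name] = taxon_id
                break
        else:
            representatives.append(name)
            taxon_of[name] = len(representatives)
    return taxon_of
```

Because the result of greedy clustering depends on input order, a tree-based review such as the one described above remains necessary for borderline sequences.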
A summary of the phylogenetic distribution of these 619 taxa is presented in Table . It is notable that 65.6% of the taxa have been cultured. This percentage greatly exceeds that in many natural environments, where less than 1% of taxa have been cultivated. The six major phyla, Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria, Spirochaetes, and Fusobacteria, contain 96% of the taxa. The remaining phyla, Euryarchaeota, Chlamydiae, Chloroflexi, SR1, Synergistetes, Tenericutes, and TM7, contain the remaining 4% of the taxa.
Phylogenetic distribution of 619 taxa in HOMD version 10
A detailed presentation of the phylogenies of the 619 oral taxa can be seen in the phylogenetic trees shown in Fig. 1 to 7. The percentage of times a node was present in the resampling is shown only when it was greater than 50%. The name of each taxon is followed by its designated HOT number, clone or strain number, GenBank accession number, number of clones of the taxon observed in this study, and a symbol indicating each taxon's naming and cultivation status.
FIG. 1. Neighbor-joining tree for human oral taxa in the phylum Tenericutes and the classes Bacilli and Erysipelotrichia of the phylum Firmicutes. The name of each taxon is followed by the oral taxon number, clone or strain number, GenBank accession number, the ...
Firmicutes and Tenericutes.
The phylum Tenericutes and the classes Bacilli and Erysipelotrichia of the phylum Firmicutes are shown in Fig. 1. The taxa belonging to the Firmicutes class Clostridia are shown in Fig. 2 and 3.
FIG. 2. Neighbor-joining tree for human oral taxa in the class Clostridia of the phylum Firmicutes. Labeling and methods used are as described in Fig. 1. Major clades are marked as follows: encircled “1,” Collins XI; encircled “2,” ...
FIG. 3. Neighbor-joining tree for human oral taxa in the Veillonellaceae (previously Acidaminococcaceae) family of the class Clostridia of the phylum Firmicutes. Labeling and methods used are as described in Fig. 1.
The class Bacilli (Fig. 1, clade designated with an encircled “1”) contains 86 taxa. It includes the genus Streptococcus, whose members are the most abundant bacterial species in the mouth. The related, but less frequently studied, genera Abiotrophia, and Granulicatella are also extremely common, and three species from these genera were among the 10 taxa most frequently detected in our clone libraries. Abiotrophia require the addition of pyridoxal to grow on blood agar media and were formerly known as the nutritionally variant streptococci (12).
The class Erysipelotrichia (Fig. 1, clade designated with an encircled “3”) contains the following four oral organisms: Bulleidia extructa, Solobacterium moorei, Erysipelothrix tonsillarum, and Lactobacillus [XVII] catenaformis. The meaning of the roman numeral in square brackets is explained below.
The class Clostridia is shown in Fig. 2 and 3. For the taxonomy of the Clostridia, a widely used classification is that described by Collins et al. (13). Therefore, in the Clostridia trees, Eubacterium infirmum is written Eubacterium [XI][G-1] infirmum to indicate that it is a misnamed Eubacterium sp. falling in Collins cluster XI and that it and neighboring taxa represent a novel genus, as yet unnamed, designated [G-1]. NCBI, and RDP taxonomies, for the most part, remain based on historic names rather than phylogenetic position, even when genera are known to be polyphyletic, thus creating illogical placements at the family and higher taxonomic levels. The major oral clades, corresponding roughly to family-level divisions, are marked in Fig. 2 as follows: Collins cluster XI (encircled “1”); the family Peptostreptococcaceae (encircled “2”); Collins cluster XIII (encircled “3”); a novel family-level cluster, with no named species, designated F-1 (encircled “4”); Collins cluster XV, the Eubacteriaceae sensu stricto (encircled “5”); Collins cluster XIVa, the Lachnospiraceae (encircled “6”); the Peptococcaceae (encircled “7”); Collins cluster VIII, the Syntrophomonadaceae (encircled “8”); and a novel family-level cluster, with no named species, designated F-2 (encircled “9”). The largest family of the Clostridia is the Veillonellaceae (previously called Acidaminococcaceae), as shown in Fig. 3. The vast majority of human oral Clostridia fall in the families Lachnospiraceae, and Veillonellaceae.
All members of the Veillonellaceae, shown in Fig. 3, are Gram negative and include the genera Anaeroglobus, Centipeda, Dialister, Megasphaera, Selenomonas, and Veillonella. It is striking that a clade of Gram-negative organisms occurs in an otherwise Gram-positive phylum.
The phylum Tenericutes (Fig. 1, designated with an encircled “4”) has recently been created and was previously the class Mollicutes within the Firmicutes. Mycoplasma species have been detected in the saliva of 97% of individuals (68). Relatively few mycoplasmas have been detected in the clone libraries reported here, but representatives of Mycoplasma hominis, Mycoplasma salivarium, and Mycoplasma faucium were found. Tenericutes [G-1] sp. oral taxon 504 is very deeply branching and has tentatively been placed with the Tenericutes awaiting further information.
The phyla Actinobacteria and Fusobacteria are shown in Fig. 4, and their clades are marked by an encircled “1” and an encircled “5,” respectively. Out of eight named Actinobacteria orders, oral taxa have thus far been found only in the orders Actinomycetales (Fig. 4, encircled “2”), Bifidobacteriales (encircled “3”), and Coriobacteriales (encircled “4”).
FIG. 4. Neighbor-joining tree for human oral taxa in the phyla Actinobacteria and Fusobacteria. Labeling and methods used are as described in Fig. 1. Major clades are marked as follows: encircled “1,” phylum Actinobacteria; encircled ...
Actinomycetales (Fig. 4, encircled “2”) include the genera Actinomyces, and Corynebacterium. Bifidobacteriales (Fig. 4, encircled “3”) include the genera Bifidobacterium, and Parascardovia. These genera are found primarily in dental caries and in denture plaque (40), except for Gardnerella, which is isolated primarily from the vagina. Coriobacteriales (Fig. 4, encircled “4”) include the following genera with oral members: Atopobium, and Slackia.
The phylum Fusobacteria includes the following two genera frequently detected in the mouth: Fusobacterium and Leptotrichia. An unnamed and uncultivated genus, Fusobacteria [G-1], contains two taxa, with Fusobacteria [G-1] sp. oral taxon 220 being relatively common, with 47 clones detected.
The 107 Bacteroidetes taxa, shown in Fig. 5, fall into the genera Prevotella, Bacteroides, Porphyromonas, Tannerella, Bergeyella, Capnocytophaga, and eight that are unnamed.
FIG. 5. Neighbor-joining tree for human oral taxa in the phylum Bacteroidetes. Labeling and methods used are as described in Fig. 1. Encircled “0” through “8” symbols refer to clades discussed in the text.
Prevotella is the largest genus, with approximately 50 species. The majority have been cultivated. The clade marked with an encircled “0” in Fig. 5, including Prevotella tannerae through Prevotella sp. oral taxon 308, warrants separate genus status, as it shares less than 80% 16S rRNA similarity with the main cluster of Prevotella (oral taxa 313 through 304).
In Fig. 5, eight currently unnamed genera (each comprising 1 to 5 phylotypes) are marked by encircled “1” through encircled “8.” These taxa are deeply branching and have no closely related named species. Bacteroidaceae [G-1] sp. oral taxon 272 (encircled “1”) is both cultivable and common, with 210 clones observed. Three strains in the Moores' collection labeled Prevotella zoogleoformans have been identified as belonging to oral taxon 272. Bacteroidales [G-2] sp. oral taxon 274 (encircled “2”) is also cultivable and common, with 51 clones observed and an isolate designated “Bacteroides D59” identified from the Moores' collection. The six remaining unnamed genera, marked encircled “3” through encircled “8,” have, as yet, no cultivated isolates, but Bacteroidetes [G-3] sp. oral taxon 281 is common, with 54 clones in our collection.
Bergeyella zoohelcum, the only named species of the genus, is a member of the canine oral microbiome and a human pathogen from dog bites (65). The two human oral Bergeyella taxa are uncultivated.
All five classes of Proteobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, and Epsilonproteobacteria, contain taxa detected in the human oral cavity. These taxa are shown in Fig. 6, where the classes are labeled with their corresponding Greek letters.
FIG. 6. Neighbor-joining tree for human oral taxa in the phylum Proteobacteria. Labeling and methods used are as described in Fig. 1. The Proteobacteria classes are marked with the corresponding Greek letters.
Betaproteobacteria include the genera Neisseria, and Leptothrix.
Most of these genera are aerobes. Gammaproteobacteria include oral taxa in the following six families: Xanthomonadaceae (Fig. 6, encircled “1”), Cardiobacteriaceae (encircled “2”), Pseudomonadaceae (encircled “3”), Moraxellaceae (encircled “4”), Enterobacteriaceae (encircled “5”), and Pasteurellaceae (encircled “6”). The Xanthomonadaceae family (Fig. 6, encircled “1”) includes Stenotrophomonas maltophilia, a ubiquitous environmental organism and opportunistic pathogen that can cause nosocomial infections (57) but can also contaminate laboratory reagents and appear in clone libraries as an artifact, as can many members of the Proteobacteria. BLAST interrogation of the GenBank database with Haemophilus parainfluenzae sequences frequently results in a match with accession no. AB098612, labeled “Terrahaemophilus aromaticivorans,” an isolate from petroleum sludge. This species has not been formally proposed and, in our opinion, was likely a human oral H. parainfluenzae contaminant in the sample.
Deltaproteobacteria include species in the genera Desulfovibrio, and Bdellovibrio. Epsilonproteobacteria include the genera Campylobacter, and the misnamed “Bacteroides ureolyticus,” which falls just inside or adjacent to the genus Campylobacter.
The Spirochaetes phylum is shown as the clade marked with an encircled “1” in Fig. 7. All human oral taxa identified to date in the phylum Spirochaetes are members of the genus Treponema. Over 70% of the 49 identified taxa have thus far resisted cultivation. The number of proposed taxa has decreased from 57 to 49 relative to our initial description of the oral treponemes (18), as some of the taxa with greater than 98.5% 16S rRNA sequence similarity were combined.
FIG. 7. Neighbor-joining tree for human oral taxa in the phyla Spirochaetes, Chlamydiae, Chloroflexi, Synergistetes, TM7, and SR1. Labeling and methods used are as described in Fig. 1. The phyla are labeled as follows: encircled “1,” ...
The Chlamydiae phylum is marked with an encircled “2” in Fig. 7. Chlamydophila pneumoniae has been detected in dental plaque (39a) and is a recognized lung pathogen. Chlamydia were not observed in our cloning studies, perhaps because the common 9 to 27 (E. coli numbering) 16S rRNA PCR forward primer contains two mismatches with Chlamydia. Use of the YM+3 forward primer set (24) may remedy this problem in future studies.
The Chloroflexi phylum is marked with an encircled “3” in Fig. 7. A single Chloroflexi phylotype has been identified. Chloroflexi are abundant in studies of wastewater treatment sludge (8). A closely related phylotype (>95% similarity) has been found in the canine oral cavity (F. E. Dewhirst, unpublished observation), suggesting that there are multiple mammalian host-associated species in the phylum Chloroflexi.
Synergistetes taxa are shown in the clade marked with an encircled “4” in Fig. 7. Ten taxa in the phylum Synergistetes have been identified (34). This recently described phylum includes a number of genera and phylotypes that were previously misclassified as belonging to other phyla, including the Deferribacteres. Oral strains and clones fall into two main groups, one of which is readily cultivable and includes the named species Jonquetella anthropi and Pyramidobacter piscolens. Most oral sequences fall into the second group (oral taxon 363 through 359), which until recently had no cultivable representatives. However, Vartoukian et al. (67) have successfully cultured a member of this group by a combination of coculture and colony hybridization-directed enrichment. Members of this phylum are selectively amplified and appear in significant numbers in 16S rRNA gene libraries generated with “Spirochaeta” selective primer pair F24/M98 (see Table S2 in the supplemental material).
The TM7 phylum is marked with an encircled “5” in Fig. 7, which shows 12 TM7 phylotypes. The name TM7 comes from Torf, mittlere Schicht, clone 7 (peat, middle layer, clone 7), from a study of organisms in a German peat bog (53). Organisms in this phylum are frequently detected in many environments (29), but despite the efforts of several laboratories, no members of this phylum have been cultivated, except as microcolonies (23).
The SR1 phylum is marked with an encircled “6” in Fig. 7. The Sulfur River 1 lineage is now recognized as a phylum distinct from OP11 (Obsidian Pool, candidate division 11), with which it was previously grouped (27). Earlier publications referred to SR1 sp. oral taxon 345 as clone X112 in phylum OP11 (47). Clones of oral taxon 345 have been obtained from several distinct individuals. This species is enriched in 16S rRNA gene libraries generated with “Bacteroidetes/TM7/SR1” selective primer pair F24/F01 (see Table S1 in the supplemental material). A closely related species (>95% similarity) has been found in the oral cavities of cats and dogs (F. E. Dewhirst, unpublished observation), indicating that there are multiple mammalian host-associated species in the SR1 phylum.
Web-based Human Oral Microbiome Database.
The taxonomic scheme and sequence database described above formed the basis for creating a publicly accessible web-based Human Oral Microbiome Database (http://www.homd.org). It is a resource for phylogenetic, taxonomic, genomic, phenotypic, and bibliographic data related to the human oral microbiome and is supported by a contract from the National Institute of Dental and Craniofacial Research to facilitate research by the oral microbiome community. The database was formally launched on 1 March 2008. The HOMD hardware, program architecture, and website navigation have been described in a separate publication (9). Briefly, the HOMD contains a taxon description page for each taxon with full taxonomy, its status as a named species, an unnamed isolate, or an uncultivated phylotype, its type or reference strain number, and links to the literature. As a taxon moves from a phylotype to an isolate and then to a named species, the database tracks the name and status of the bacterium and provides a stable link to the literature. The community can provide input to taxon descriptions and statuses using the HOMD interface. The HOMD contains a BLASTN tool for identifying the 16S rRNA sequences of isolates or clones. Hundreds of sequences can be submitted at one time, and the results can be downloaded as an Excel file showing the top four hits in the database for each query sequence. For some groups of taxa, 16S rRNA gene sequence analysis of partial sequences does not allow unequivocal identification at the species level. The HOMD BLASTN tool output allows easy detection of ambiguous identifications. The HOMD also contains visualization and analysis tools for examining the partial and completed genomes for any taxa of the human oral microbiome. It is likely that genomes for over 300 oral taxa will be available on the HOMD by the end of the Human Microbiome Project (51).
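How a top-four hit list exposes an ambiguous identification can be sketched as follows. This is an illustration only and does not reproduce the HOMD tool or its output format: the function name `classify_query`, the 98.5% minimum identity, and the 0.5-point margin are assumptions chosen for the example.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (taxon label, percent identity); layout is hypothetical


def classify_query(hits: List[Hit], min_identity: float = 98.5,
                   margin: float = 0.5) -> str:
    """Return 'unmatched', 'ambiguous', or the label of the single best taxon.

    A query is called ambiguous when a different taxon among the top four
    hits scores within `margin` percentage points of the best hit.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)[:4]
    if not ranked or ranked[0][1] < min_identity:
        return "unmatched"
    best_label, best_identity = ranked[0]
    rivals = {
        label
        for label, identity in ranked[1:]
        if label != best_label and best_identity - identity <= margin
    }
    return "ambiguous" if rivals else best_label
```

For partial sequences, a wider margin would flag more queries for manual review; the appropriate value depends on the group of taxa being examined.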
To validate and expand the species included in the HOMD and to better understand the diversity and taxon distribution of organisms in the human oral microbiome, 36,043 clones from 633 oral 16S rRNA gene libraries constructed and sequenced in our laboratories were analyzed. Following vector removal, screening for lengths of >300 bases, and chimera checking using multiple programs, 1,290 sequences were rejected, leaving 34,753 for analysis. The clones identified as potentially chimeric by the program Chimera Slayer are listed in Table S3 in the supplemental material. A total of 125 were validated as chimeras. Those not validated include intragenus and intraspecies potential chimeras. The majority of these nonvalidated chimeras were found multiple times, which also suggests that they are not chimeras. BLASTN analyses of the remaining clones against the HOMD Reference Sequence Set version 10 yielded identification of 89.1% of the clones in 525 oral taxa. Those not identified in the HOMD were analyzed by BLASTN comparison to the more than 70,000 16S rRNA sequences in RDP (11) and 302,066 sequences in the Greengenes prokMSA unaligned set (16). Using the RDP and Greengenes databases, an additional 8.7% of clones were matched to nonself sequences. The 2.2% of the clones which remained unidentified were clustered into 325 novel taxa, 220 of which were singletons. The clone analysis yielded a total of 654 additional taxa not included in the HOMD version 10. Because novel taxa based on singletons are suspect compared to those seen multiple times, a more conservative figure excluding the 220 singletons is 434 novel taxa. These additional taxa will be added to the HOMD when they meet the strict criteria described below.
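The tiered identification and the counts reported above can be checked with a few lines of code. The sketch assumes that each clone carries a best percent identity against the HOMD set and against the external sets; the function name `identify_clone` is hypothetical, and the 98% cutoff is taken from the legend to the rank abundance figure rather than from a stated pipeline parameter.

```python
from typing import Optional

CUTOFF = 98.0  # percent BLASTN identity (assumed to apply to every tier)


def identify_clone(homd_identity: Optional[float],
                   external_identity: Optional[float],
                   cutoff: float = CUTOFF) -> str:
    """Assign a clone to the first reference tier that matches it."""
    if homd_identity is not None and homd_identity >= cutoff:
        return "HOMD"
    if external_identity is not None and external_identity >= cutoff:
        return "RDP/Greengenes"
    return "novel"


# Bookkeeping with the counts reported in the text.
clones_sequenced = 36_043
clones_rejected = 1_290
clones_analyzed = clones_sequenced - clones_rejected           # 34,753

additional_taxa = 654
novel_singletons = 220
conservative_additional = additional_taxa - novel_singletons  # 434

homd_taxa_with_clones = 525
taxa_with_clones = homd_taxa_with_clones + additional_taxa     # 1,179
```

The three percentages (89.1%, 8.7%, and 2.2%) sum to 100%, so every analyzed clone falls into exactly one tier.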
A total of 94 of 619 taxa in the HOMD version 10 were not represented by any clone. The list of taxa not found is provided in Table S5 in the supplemental material. This group included most of the extrinsic pathogens we had added to the HOMD for completeness, such as Bordetella pertussis, Corynebacterium diphtheriae, Mycobacterium tuberculosis, and Neisseria gonorrhoeae. Other taxa not seen included those which would not have been expected to have been amplified with the “universal” primers used, such as Methanobrevibacter oralis (an archaeon) and Chlamydophila pneumoniae.
The phylogenetic distribution of clones in various phyla or divisions is shown in Table . The clones were found in 11 of 13 phyla included in the HOMD version 10. As the primers used do not amplify Chlamydiae or Archaea, the absence of clones representing these taxa was expected. Clones for the phyla Deinococcus (3 clones), Acidobacteria (1), and Cyanobacteria (1) were found. The clones for these phyla may represent transient exogenous bacteria. Nineteen clones representing four plants (chloroplast) were detected. As virtually all humans eat plants, it is not surprising that plant chloroplast 16S rRNA sequences should be detected in the oral cavity. Species identified included Triticum aestivum (wheat) and Manihot esculenta (cassava, source of tapioca). For each phylum, the percentage of taxa and clones that represent named species, unnamed taxa with cultivable isolates, and uncultivated phylotypes is presented in Table . When analyzed by clone distribution, 82% fall into cultivable named and unnamed taxa. When analyzed by taxa distribution, the percent cultivated drops to 32% (40% if singleton clone taxa are removed). This divergence in percent cultivated by clone or taxa count indicates that we have had great success culturing the most common species but have not yet identified isolates for the rarer taxa. The microbiologic community has cultured between 29% and 50% of the oral Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, and Fusobacteria taxa. Less than 12% of taxa in the Spirochaetes and Synergistetes, however, have been cultured. TM7, SR1, and Chloroflexi have, as yet, no cultivated members (the phylum Chloroflexi has cultivated members but no cultivated human oral species). The rarer taxa, such as SR1, are present in the clone pool at a level of only 1/5,000. This implies that even if the SR1 taxon is only moderately difficult to cultivate, it will still be challenging to find an isolate because of its rarity.
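The effect of rarity alone can be made concrete with a short calculation. It rests on a strong simplifying assumption that is not made in the text: that colonies arise in proportion to clone abundance, so that each colony screened is an independent draw with probability 1/5,000 of being the rare taxon. The function name `colonies_needed` is hypothetical.

```python
import math


def colonies_needed(frequency: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - frequency)**n >= confidence."""
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


# Colonies to screen for a 95% chance of seeing a 1/5,000 taxon at least once.
n_for_sr1 = colonies_needed(1 / 5000)
```

Under this assumption, roughly 15,000 colonies would have to be screened, and any difficulty in cultivation would raise that number further.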
Phylogenetic distribution of 34,753 oral clones
A plot of the relative abundance of clones in each of the 1,179 oral taxa is shown in Fig. 8. Veillonella parvula was the first ranked taxon, with 2,304 clones or 6.6% of the total clones observed. The complete listing of the rank abundance and relative abundance of each taxon is presented in Table S6 in the supplemental material. Because the 34,753 clones are from studies of different disease and health states, different oral sites, and different primer pairs, the data set cannot be used to infer the underlying population structure of the “oral cavity.” We therefore do not attempt to estimate the number of species in the oral cavity with these data. Furthermore, the question of “how many species are present in the human oral cavity?” is tricky, because the oral cavity is an open system where exogenous microorganisms from the environment are continually introduced by eating, drinking, and breathing. One answer to the question of how many microbial species would be found by exhaustive sampling of the oral cavities of the human population over time is all microbial species present in the earth's biosphere. A more straightforward question which can be answered for this data set is, “How many taxa are needed to account for the 90%, 95%, 98%, or 99% of the 34,753 clones sampled?” The answers are 259, 413, 655, and 875 taxa, respectively.
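The coverage question posed above reduces to a cumulative sum over the rank abundance list. The sketch below shows the calculation on a list of per-taxon clone counts; the function name `taxa_needed` is hypothetical, and the actual per-taxon counts are those listed in Table S6 in the supplemental material, which are not reproduced here.

```python
from typing import Iterable


def taxa_needed(clone_counts: Iterable[int], fraction: float) -> int:
    """Number of top-ranked taxa whose clones reach `fraction` of the total."""
    counts = sorted(clone_counts, reverse=True)  # rank 1 = most abundant taxon
    target = fraction * sum(counts)
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return rank
    return len(counts)
```

Applied to the 1,179 counts in Table S6 with fractions of 0.90, 0.95, 0.98, and 0.99, the same loop would be expected to return the figures quoted in the text.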
FIG. 8. Rank abundance graph for 34,753 16S rRNA clones obtained from oral samples in 1,179 taxa. Clones were placed in taxa on the basis of 98% BLASTN identities. The first ranked taxon was Veillonella parvula, with 2,304 clones. Ranks 769 to 1,179 were ...
Transient versus endogenous species.
Because there is a constant stream of organisms introduced into the oral cavity from the environment, one needs to distinguish transient species from endogenous species, those that are part of what Theodor Rosebury called the indigenous microbes or “normal flora” (54). Unfortunately, the distinction between transient species and endogenous species cannot be deduced directly from human sampling studies. Rather, it has to come from comparing the human studies with environmental studies to determine the frequency with which clones of a particular genus are recovered as host associated or as environment associated. For example, species in the genus Prevotella have been found only in the microbiota of mammals, whereas species in the genus Sphingomonas are generally free-living species found in the environment in cloning studies of lake sediments, soils, etc. Table S7 in the supplemental material presents the categorization of the 169 genera currently included in the HOMD as strongly host associated, weakly host associated, or environmental based on a qualitative analysis of clone sequence sources. The degree of host association for various genera differs widely by phylogenetic position, as essentially all genera from the phylum Bacteroidetes are host associated, while nearly all genera from the Alphaproteobacteria are thought to be environmental transients.
Extended Reference Set for BLAST analyses.
Reference sequences for the 654 additional taxa identified in the clone analysis have been added to those for the 619 HOMD taxa to generate the Extended Reference Set (version 1), which is available for download or use in BLAST analysis at the HOMD website. Many sequences from the clone analysis are only 500 bases long, and thus, the Extended Reference Set, unlike the HOMD Reference Set 10, is composed of both full and partial sequences. The Extended Reference Set is useful for BLAST analysis of clone and other 5′ 500-base sequences but must be used with caution for analysis of full-length sequences.
Addition of taxa to the Human Oral Microbiome Database.
The 434 additional taxa identified in the clone analysis (excluding singletons) described above are candidates for addition to the HOMD. To be added, however, each taxon must meet the following criteria. (i) It must have been seen more than once, hence the exclusion of novel singletons. (ii) A near-full-length (>1,450-base) 16S rRNA gene sequence must be available for the taxon. (iii) The full sequence must be less than 98.5% similar to previously defined HOMD taxa. (iv) The full sequence must pass chimera and sequence quality screens. (v) If the sequence is for a named species, the sequence must be greater than 98.5% similar to that of the type strain of the species. (vi) If there are multiple sequences in GenBank for a species, the longest and most accurate must be identified to represent the species. We expect that the majority of the taxa identified in the clone analysis will be validated and added in updates to the HOMD by January 2011.
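Criteria (i) through (v) are mechanical enough to be written as a checklist. The sketch below is one possible encoding and not part of the HOMD software; the class `Candidate`, its field names, and the function `unmet_criteria` are hypothetical. Criterion (vi), choosing the best of several GenBank sequences, is a curation decision and is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    times_observed: int
    sequence_length: int                  # bases of 16S rRNA gene sequence
    max_identity_to_homd: float           # percent, versus existing HOMD taxa
    passes_quality_screens: bool          # chimera and sequence quality checks
    identity_to_type_strain: Optional[float] = None  # None for unnamed taxa


def unmet_criteria(c: Candidate) -> List[str]:
    """List the criteria a candidate fails; an empty list means it qualifies."""
    problems: List[str] = []
    if c.times_observed <= 1:
        problems.append("(i) seen only once")
    if c.sequence_length <= 1450:
        problems.append("(ii) no near-full-length sequence")
    if c.max_identity_to_homd >= 98.5:
        problems.append("(iii) too similar to an existing HOMD taxon")
    if not c.passes_quality_screens:
        problems.append("(iv) failed chimera or quality screen")
    if c.identity_to_type_strain is not None and c.identity_to_type_strain <= 98.5:
        problems.append("(v) does not match the type strain")
    return problems
```

Returning the list of unmet criteria, rather than a single true or false value, shows what additional work (for example, obtaining a longer sequence) a candidate taxon still requires.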
The human oral microbiome has been extensively studied, is being examined as part of the Human Microbiome Project, and will continue to be examined in the future using ever more powerful sequencing technologies. We have created the Human Oral Microbiome Database taxonomic framework, with oral taxon numbers, to facilitate communication between investigators exploring the diversity of the oral microbiome, as seen by 16S rRNA gene-based methods. It is critical that investigators can point to curated, stably designated taxa rather than taxonomically undefined clone sequences in disseminating research findings. Investigators with oral strain or clone 16S rRNA sequences can now definitively identify the vast majority of them by BLASTN analysis against the reference sets available at the HOMD website (www.homd.org). Analysis of approximately 35,000 clone sequences has allowed validation of the initial 619 species in the HOMD and identified more than 434 additional named and unnamed taxa to be added to the HOMD once full 16S rRNA sequences are obtained and the other criteria discussed above are met. The HOMD is the first example of a curated human body site-specific microbiome resource. Predominant members of the oral microbiome can also be found in fewer numbers at other body sites, such as the skin, gut, and vagina, and thus, the HOMD can be useful to the entire Human Microbiome Project and infectious disease communities. One can foresee the development of additional body site-specific curated microbiome resources based on the HOMD model or the expansion of the HOMD framework to include the entire human microbiome.