Rapidly developing sequencing methods and analytical techniques are enhancing our ability to understand the human microbiome, and, indeed, how we define the microbiome and its constituents. In this review we highlight recent research that expands our ability to understand the human microbiome on different spatial and temporal scales, including daily timeseries datasets spanning months. Furthermore, we discuss emerging concepts related to defining operational taxonomic units, diversity indices, core versus transient microbiomes and the possibility of enterotypes. Additional advances in sequencing technology and in our understanding of the microbiome will provide exciting prospects for exploiting the microbiota for personalized medicine.
The human microbiota consists of the 10-100 trillion symbiotic microbial cells harbored by each person, primarily bacteria in the gut; the human microbiome consists of the genes these cells harbor. Microbiome projects worldwide have been launched with the goal of understanding the roles that these symbionts play and their impacts on human health [2, 3]. Just as the question, “what is it to be human?”, has troubled humans from the beginning of recorded history, the question, “what is the human microbiome?” has troubled researchers since the term was coined by Joshua Lederberg in 2001. Specifying the definition of the human microbiome has been complicated by confusion about terminology: for example, “microbiota” (the microbial taxa associated with humans) and “microbiome” (the catalog of these microbes and their genes) are often used interchangeably. In addition, the term “metagenomics” originally referred to shotgun characterization of total DNA, although now it is increasingly being applied to studies of marker genes such as the 16S rRNA gene. More fundamentally, however, new findings are leading us to question the concepts that are central to establishing the definition of the human microbiome, such as the stability of an individual's microbiome, the definition of the OTUs (Operational Taxonomic Units) that make up the microbiota, and whether a person has one microbiome or many. In this review, we cover progress towards defining the human microbiome in these different respects.
Studies of the diversity of the human microbiome started with Antonie van Leeuwenhoek, who, as early as the 1680s, had compared his oral and fecal microbiota. He noted the striking differences in microbes between these two habitats and also between samples from individuals in states of health and disease in both of these sites [5, 6]. Thus, studies of the profound differences in microbes at different body sites, and between health and disease, are as old as microbiology itself. What is new today is not the ability to observe these obvious differences, but rather the ability to use powerful molecular techniques to gain insight into why these differences exist, and to understand how we can affect transformations from one state to another.
Culture-independent methods for characterizing the microbiota, together with a molecular phylogenetic approach to organizing life's diversity, provided a fundamental breakthrough in allowing researchers to compare microbial communities across environments within a unified phylogenetic context (reviewed in ). Although host-associated microbes are presumably acquired from the environment, the composition of the mammalian microbiota, especially in the gut, is surprisingly different from free-living microbial communities. In fact, an analysis of bacterial diversity from free-living communities in terrestrial, marine, and freshwater environments as well as communities associated with animals suggests that the vertebrate gut is an extreme. In contrast, bacterial communities from environments typically considered extreme, such as acidic hot springs and hydrothermal vents, are similar to communities in many other environments. This suggests that coevolution between vertebrates and their microbial consortia over hundreds of millions of years has selected for a specialized community of microbes that thrive in the gut's warm, eutrophic, and stable environment. In the human gut and across human-associated habitats, bacteria comprise the bulk of the biomass and diversity, though archaea, eukaryotes, and viruses are also present in smaller numbers and should not be neglected [10, 11].
Interestingly, estimates of the human gene catalog and the diversity of the human genome pale in comparison to estimates of the diversity of the microbiome. For example, the Meta-HIT consortium reported a gene catalog of 3.3 million non-redundant genes in the human gut microbiome alone, as compared to the ~22,000 genes present in the entire human genome. Similarly, the diversity among the microbiome of individuals is immense compared to genomic variation: individual humans are about 99.9% identical to one another in terms of their host genome, but can be 80-90% different from one another in terms of the microbiome of their hand or gut. These findings suggest that employing the variation contained within the microbiome will be much more fruitful in personalized medicine, the use of an individual patient's genetic data to inform healthcare decisions, than approaches that target the relatively constant host genome.
Many fundamental questions about the human microbiome were difficult or impossible to address until recently. Some questions, such as the perennially popular “how many species live in a given body site?”, are still hard to answer, due to problems with definitions of bacterial species and with the rate of sequencing error. Other questions, such as “how does the diversity within a person over time compare to the diversity between people?”, or “how does the diversity between sites on the same person's body compare to the diversity between different people at the same site?”, or “is there a core set of microbial species that we all share?”, can now be answered conclusively. In the next section, we discuss some of the tools that have allowed these long-standing questions to be answered.
The drastic reduction in sequencing costs experienced over the past few years has made it possible to identify specific microbial taxa found within the human gut that are difficult or impossible to culture. Researchers are now able to generate millions of sequences per sample in order to assess differences in microbial communities between body sites and individuals. Our increased sequencing power has required the development of equally powerful computational tools to handle the burgeoning amount of sequence data produced by modern technologies. There are several pipelines for the analysis of microbial community data, such as mothur, W.A.T.E.R.S., the RDP pyrosequencing tools, and QIIME (pronounced “chime”). QIIME is a free, open-source platform for the analysis of high-throughput sequencing data that enables users to import raw sequence data and readily produce measures of inter- and intra-sample diversity. Consistency in the identification of operational taxonomic units (OTUs) and agreed-upon measures of diversity within and between samples are crucial for the comparison of results across studies, although the concept of the OTU is increasingly problematic as sequence data accumulate and explicitly phylogenetic approaches gain in popularity.
Beta diversity refers to the degree of difference in community membership or structure between two samples. A recent review of taxon-based measurements of beta diversity found that some metrics, including Canberra and Gower distances, have increased power for discriminating clusters, while other metrics, such as chi-squared and Pearson correlation distances, are more appropriate for elucidating the effects of environmental gradients on communities. A robust method for comparing the differences between microbial communities is UniFrac, which measures the fraction of branch length on a phylogenetic tree that is unique to one sample or the other rather than shared by both. Highly similar microbial communities result in UniFrac scores near 0, while two completely independent communities that do not share any branch length (i.e. they have a different evolutionary history) would result in a UniFrac score of 1. Principal coordinates analysis (PCoA) can then visualize the UniFrac distances between samples in two-dimensional or three-dimensional space, so that clusters of similar communities and the separation of distinct communities can be seen at a glance.
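To make the UniFrac calculation concrete, the following is a minimal sketch of the unweighted form on a toy tree. The tree, its branch lengths, the taxon names, and the function names are all hypothetical and chosen for illustration; this is not the implementation used by QIIME or any other pipeline, and it assumes a rooted tree that contains only the taxa being compared.

```python
# Toy rooted tree stored as child -> (parent, branch_length).
# Node names and branch lengths are hypothetical.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 1.0),
    "D": ("n2", 1.0),
    "n1": ("root", 0.5),
    "n2": ("root", 0.5),
}


def branches_for(taxa, tree):
    """Return the branches (named by their child node) that lie on the
    paths from the observed tips up to the root."""
    covered = set()
    for node in taxa:
        while node in tree:  # stop once the root is reached
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(sample_a, sample_b, tree):
    """Fraction of the observed branch length that leads to taxa from
    only one of the two samples (0 = identical, 1 = nothing shared)."""
    a = branches_for(sample_a, tree)
    b = branches_for(sample_b, tree)
    total = sum(tree[n][1] for n in a | b)
    unique = sum(tree[n][1] for n in a ^ b)
    return unique / total if total else 0.0


print(unweighted_unifrac({"A", "B"}, {"A", "B"}, TREE))  # identical samples
print(unweighted_unifrac({"A", "B"}, {"C", "D"}, TREE))  # no shared branches
print(unweighted_unifrac({"A", "B"}, {"A", "C"}, TREE))  # partial overlap
```

In this toy case the three comparisons give 0, 1, and an intermediate value, mirroring the range described above. A matrix of such pairwise distances is what PCoA would take as input.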
UniFrac as a measure of beta diversity, coupled to PCoA, can distinguish differences between communities using as few as 10 sequences per sample. It is important to recognize that increased sequencing depth is not always necessary to recover biologically meaningful results when those results are obvious. Thus, by choosing diversity measurements that are appropriate for a study design, researchers utilizing modern sequencing methods are able to characterize differences between samples at relatively low sequence coverage. This enables researchers to assess fine-grained spatial and temporal patterns by characterizing hundreds to thousands of samples, such as timeseries across multiple patients or environments. UniFrac, along with a multitude of other diversity measurements, is available in QIIME, where the measurements can be readily compared.
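The point about shallow coverage can be illustrated with a small sketch that subsamples two communities down to 10 reads each and then compares them. The taxon labels and read counts are invented, the function names are hypothetical, and Bray-Curtis dissimilarity is used here in place of UniFrac purely to keep the example short.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from a
    dict of taxon -> read count."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the requested depth")
    picked = random.Random(seed).sample(pool, depth)
    subsample = {}
    for taxon in picked:
        subsample[taxon] = subsample.get(taxon, 0) + 1
    return subsample


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count dicts (0 = same, 1 = disjoint)."""
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


# Hypothetical communities with no taxa in common.
gut = {"gut_taxon_1": 600, "gut_taxon_2": 400}
skin = {"skin_taxon_1": 700, "skin_taxon_2": 300}

gut_10 = rarefy(gut, depth=10, seed=1)
skin_10 = rarefy(skin, depth=10, seed=2)
print(bray_curtis(gut_10, skin_10))  # still maximally dissimilar at 10 reads
```

When two communities differ this strongly, the difference survives subsampling to a handful of reads; subtler differences would be expected to require more depth.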
In general, pipelines for analyzing 16S rRNA and shotgun metagenomic data have separate workflows. Some initial steps, such as demultiplexing (removing barcodes from and separating pooled samples) and quality filtering, are common to both pipelines. However, for 16S rRNA data, sequences must be grouped into OTUs, chimeric sequences generated by incomplete template extension must be removed, and phylogenetic trees must be constructed. In contrast, in the metagenomic pipeline, sequences must be assigned to functions as well as to taxonomy (either as whole reads or after assembly). Once taxon or gene function tables are constructed, the pipelines begin to converge, at least conceptually: the interest is then in 1) the composition of each sample, 2) finding the taxa or functions that discriminate among groups of samples (e.g. according to clinical parameters), and 3) in asking whether the samples cluster according to any measured clinical states (or according to time). One exciting emerging direction is comparing metagenomic and 16S rRNA clustering directly using a technique called Procrustes analysis that allows the PCoA plots to be combined. Another powerful tool is the use of machine learning and statistical techniques to build predictive models of taxa or functions that discriminate between groups of samples.
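The OTU-grouping step in the 16S rRNA workflow can be sketched as a greedy, seed-based clustering. This is an illustrative simplification and not the algorithm of any particular tool: it assumes sequences that are already aligned and of equal length, and it uses a 97% identity threshold, a common convention that is treated here as an assumption. All function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length,
    pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose seed it matches at or
    above the threshold; otherwise it becomes the seed of a new OTU."""
    otus = []  # list of (seed_sequence, member_sequences)
    for seq in seqs:
        for seed, members in otus:
            if identity(seq, seed) >= threshold:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))
    return otus


# Hypothetical 100-base reads.
base = "ACGT" * 25
near = "TT" + base[2:]        # 2 mismatches: 98% identical to base
far = "T" * 10 + base[10:]    # 8 mismatches: 92% identical to base

clusters = greedy_otus([base, near, far])
print(len(clusters))                     # number of OTUs
print([len(m) for _, m in clusters])     # members per OTU
```

Because the outcome depends on the input order and on the threshold, a sketch like this also shows why OTU counts are sensitive to methodological choices.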
A unique advantage of QIIME relative to other pipelines is its ability to exploit “sample metadata”, e.g. clinical information about subjects, to produce visualizations that make the main patterns in the data immediately apparent. Of particular interest, QIIME supports the MIMARKS (Minimum Information about a MARKer Sequence) standard developed by the Genomic Standards Consortium, which is increasingly popular with other tools for microbial and community analysis such as MG-RAST, and has been adopted by the INSDC (International Nucleotide Sequence Database Consortium, which includes GenBank, EBI, and DDBJ) as the standard for metadata.
With these tools in hand, characterizing basic patterns of similarities and differences in the microbiota is now routine. The key challenge now is to extend analyses to include longitudinal studies and to understand the role of specific host and environmental factors in the development and maintenance of the microbiome.
The gastrointestinal (GI) tract of a human infant provides a brand new environment for microbial colonization. Indeed, the microbiota that an infant begins to acquire depends strongly on mode of delivery. Twenty minutes after birth, the microbiota of vaginally delivered infants resembles the microbiota of their mother's vagina, while infants delivered via Cesarean section harbor microbial communities typically found on human skin. The acquisition of microbiota continues over the first few years of life, and an infant's GI tract microbiome begins to resemble that of an adult as early as 1 year of age. In one case study following an infant's microbiota over the first 2.5 years of life, phylogenetic diversity increased significantly and linearly with time. Additionally, significant changes in gut microbiota composition were apparent at five time points: starting a diet of breast milk, development of fever at day 92, introduction of rice cereal at day 134, introduction of formula and table foods at day 161, and antibiotic treatment and adult diet at day 371. Interestingly, each dietary change was accompanied by changes in gut microbiota and the enrichment of corresponding genes. For example, as the infant began to receive a full adult diet, genes in the microbiome associated with vitamin biosynthesis and polysaccharide digestion became enriched.
The interaction between the human microbiota and the environment is dynamic, with human microbes flowing freely onto the surfaces we interact with every day. Fierer et al. showed that human fingertips can transfer signature communities of microbes onto keyboards and that these communities strongly differentiate individuals. PCoA plots showed that it was possible to determine which fingers were typing on which keys, and which individuals were using which keyboards: it was even possible to link a person's hand to the computer mouse they use with up to 95% accuracy when compared to a database of other hands. Overall, this study showed that microbial communities are constantly being transferred between surfaces, and that a dynamic interaction exists between environmental microbiota and different human body sites.
Another interesting question that we are just beginning to answer is how stable the microbiome within an individual is over time. By defining what constitutes normal temporal variation in an individual, we will be better able to quantify and understand changes in microbial communities that result from dietary and pharmaceutical interventions. In the longest timeseries study to date, Caporaso et al. sampled two individuals' microbial communities in the gut, oral cavity, and left and right palms over 396 time points spanning 15 months. Communities at different body sites were readily distinguishable from one another using 3-D PCoA plots over a one-year time span, even though the community structure within a given site was highly variable. The level of diversity also differs between body sites, with the mouth and gut harboring the most diverse communities. Taken together, these studies show that an individual's microbiota represents a highly variable and compartmentalized ecosystem.
Overall, it has yet to be conclusively proven that individuals, or even body sites, harbor a “core” set of specific bacterial taxa. For example, the Meta-HIT consortium defined a “core” set of lineages as those that were present in half of the subjects studied, although essentially no genes were present in all subjects studied. Of course, it is important to recognize that sampling depth may be critical for distinguishing taxa that are absent from those that are merely very rare; the dynamic range of microbial abundance is also quite large, and even within the Meta-HIT “core” genes, 2000-fold ranges of abundance were not uncommon. Proving that a taxon is completely absent in the gut is not possible with these types of studies, so core calculations should always carry with them a caveat about sequencing depth. Another factor to consider when defining diversity and a core is that methodological artifacts can greatly increase the apparent numbers of OTUs in a sample (and hence reduce the apparent fraction that is shared). Both sequencing error[38, 39] and issues related to alignment, especially multiple sequence alignment[40-43], can inflate the number of OTUs immensely. It is important to ensure that the same methodological procedures were used when performing estimates of the core in terms of the fraction of individuals the core must be represented in, the minimum abundance, and the procedure for deciding which sequences count as “the same”. Finally, there is a key question about whether variation around a core is structured so that humans harbor only a few general types of microbiota profiles in a given body site: this is well established for the vagina but more controversial in the gut. In general, extreme caution must be applied when performing clustering procedures, as many will break up continuous variation into clusters where none exist. 
Robust model selection procedures that allow for the possibility that only continuous variation exists, rather than discrete clusters, remain to be developed within the context of microbial community analysis.
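The dependence of any “core” on the criteria discussed above (the fraction of individuals required and the minimum abundance) can be made explicit with a short sketch. The subjects, taxa, and abundances below are invented for illustration, and the function name and its parameters are hypothetical rather than taken from any published pipeline.

```python
def core_taxa(table, min_prevalence=0.5, min_abundance=0.0):
    """Return taxa whose relative abundance exceeds `min_abundance` in at
    least `min_prevalence` (a fraction) of the subjects.

    table: dict of subject -> dict of taxon -> relative abundance.
    """
    n_subjects = len(table)
    all_taxa = set().union(*(profile.keys() for profile in table.values()))
    core = set()
    for taxon in all_taxa:
        present = sum(
            1 for profile in table.values()
            if profile.get(taxon, 0.0) > min_abundance
        )
        if present / n_subjects >= min_prevalence:
            core.add(taxon)
    return core


# Hypothetical relative-abundance profiles for four subjects.
subjects = {
    "s1": {"t1": 0.5, "t2": 0.5},
    "s2": {"t1": 0.9, "t3": 0.1},
    "s3": {"t1": 0.6, "t2": 0.001, "t4": 0.399},
    "s4": {"t1": 1.0},
}

print(sorted(core_taxa(subjects, min_prevalence=0.5)))
print(sorted(core_taxa(subjects, min_prevalence=1.0)))
print(sorted(core_taxa(subjects, min_prevalence=0.5, min_abundance=0.01)))
```

In this toy table, t2 counts as core under a 50% prevalence rule but drops out when presence in every subject is required, or when a 1% abundance floor is applied. Estimates of the core are therefore only comparable when the same thresholds are used.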
There is increasing evidence that individuals actually share a “core microbiome” rather than “core microbiota”. In a study of monozygotic and dizygotic twin pairs concordant for obesity or leanness, a subset of identifiable microbial genes, but not species, were shared between all individuals. Remarkably, vastly different sets of microbial species yielded very similar functional KEGG pathways. However, deviations from this core microbiome were apparent in obese subjects, suggesting that it will be important to utilize metagenomic data in addition to determining microbial community composition with 16S marker gene studies when assessing differences between disease states. Understanding whether this principle holds true for other body sites will be fascinating; cross-biome metagenomic comparisons have been exceedingly rare to date[46, 47].
The evidence is mounting for the inextricable link between a host's microbiota, digestion, and metabolism. In an analysis of humans and 59 additional mammalian species, 16S rRNA sequences grouped carnivores, omnivores, and herbivores into separate clusters in principal coordinates space, showing that community structures differ depending on diet. Dietary changes in mice can also lead to significant changes in bacterial metabolism, especially of short-chain fatty acids and amino acids, in as little as one week, and can lead to large changes after only one day. Importantly, the genetic diversity found within our gut microbiota allows us to digest compounds via metabolic pathways not explicitly coded for in the mammalian genome, greatly increasing our ability to extract energy from our diverse diets [51, 52].
Gut microbiota also seem to play an important role in obesity. Germ-free mice that receive a transplant of gut microbiota from conventional mice show an increase in adiposity without increasing food intake, due to increased energy extraction from the diet and increased energy deposition into host adipocytes. The two major microbial divisions, Firmicutes and Bacteroidetes, show different abundances depending on phenotype. Decreased Bacteroidetes and increased Firmicutes have been found in genetically obese mice (ob/ob) when compared to their lean counterparts, and the obesity phenotype can even be transferred to a germ-free but genetically wild-type mouse by way of the microbiota. The phenotype is due to energy balance: bomb calorimetry of the fecal pellets reveals that the ob/ob mice extract more energy from their diet and leave less behind in the feces. Fascinatingly, a transmissible obesity phenotype is also seen in another mouse model, the TLR5 knockout mice, which become obese in some mouse facilities (but develop colitis in others, presumably due to differences in the background microbiota). In the TLR5 knockout mice, however, no difference in the efficiency of energy harvest is involved. Instead, the altered microbiota somehow makes the mice hungrier, and their microbe-induced obesity can be cured by restricting the amount of food in their cages to that consumed by wild-type mice, as well as by antibiotics. The correlation between microbes and obesity is perhaps best illustrated through weight loss. As different groups of human subjects were placed on either a fat-restricted or carbohydrate-restricted diet, their abundance of Bacteroidetes increased as their body weight decreased, transitioning from the signature ‘obese’ microbial community to a ‘lean’ community. Thus, the modulation of a patient's microbiota might be a therapeutic option for promoting weight loss in obese patients or promoting weight gain in underweight children.
Surprisingly, the microbes that we ingest with our food might be providing our individual microbiome with new genes to digest new foods. Hehemann et al. found that a new class of glycoside hydrolases used to digest porphyran, a polysaccharide common in red algae, was also present in human stool samples as a gene in Bacteroides plebeius. A closer examination of the stool metadata revealed that the porphyran-digesting gene was present only in samples from Japanese individuals; the gene was not found in the gut microbiomes of individuals from the United States. Why would a marine gene be found in the human gut? The authors concluded that the seaweed common to the Japanese, but not the American, diet carried the microorganism that transferred the genes to the gut microbiome. Thus, microbes have the ability to greatly increase the number of metabolic tools of the human gut, allowing us to digest an array of substrates.
Given the relative stability of the human gut microbiota, one key question is whether it is sufficiently plastic to allow well-defined interventions to improve health. As described above, the gut microbiota is fairly stable over time once established, at least compared to the differences between individuals. However, a number of studies demonstrate that external forces can alter the community of microbes located in the GI tract and antibiotics are an important example.
Antibiotics are mainly used to combat pathogenic bacterial species that reside within or have invaded a host; however, the current generation of antibiotics is broad spectrum and targets broad swaths of the normal microbiota as well. Thus, antibiotics significantly affect the host's innate gut microbiota. Three to four days after treatment with the broad-spectrum antibiotic ciprofloxacin, the gut microbiota experiences a decrease in taxonomic richness, diversity, and evenness [58, 59]. The magnitude of these changes varied considerably between individuals. While the gut microbiota began to resemble its pre-treatment state a week after treatment, individuals differed in how closely the post-treatment community resembled the pre-treatment community, and some taxa failed to return to the community [59, 60]. Indeed, the reestablishment of some species can be affected for up to four years following antibiotic treatment. Yet the overall recovery of the gut microbiota following antibiotic treatments suggests that there are factors within the community, biotic or abiotic, that promote community resilience, although these have yet to be elucidated.
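The three quantities mentioned above (richness, diversity, and evenness) can be computed from a table of taxon counts. The sketch below uses the Shannon index for diversity and Pielou's index for evenness; these particular indices are assumptions made for illustration, since the studies cited may have used other measures, and the counts are invented rather than taken from any dataset.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon diversity index (natural log) from taxon counts."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in counts.values() if n > 0
    )


def pielou_evenness(counts):
    """Shannon diversity divided by its maximum for the observed richness."""
    observed = richness(counts)
    return shannon(counts) / math.log(observed) if observed > 1 else 0.0


# Hypothetical counts before and after a perturbation.
before = {"t1": 25, "t2": 25, "t3": 25, "t4": 25}
after = {"t1": 90, "t2": 10, "t3": 0, "t4": 0}

for label, sample in (("before", before), ("after", after)):
    print(label, richness(sample),
          round(shannon(sample), 3), round(pielou_evenness(sample), 3))
```

In this invented example all three measures fall after the perturbation: two taxa disappear, and the remaining community is dominated by a single taxon.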
Other antibiotics also tend to produce results that differ substantially between subjects[62, 63] and even body sites. Because larger populations have not yet been studied, in part due to ethical issues with administration of antibiotics to healthy human subjects, the basis for these underlying differences has not yet been elucidated. Understanding the factors that determine the ability of a microbiota to resist and recover from perturbation, as well as understanding the factors that determine its current state, will be key to developing tools to assist in microbiome manipulation. For example, counter-intuitively, in rats the administration of antibiotics prior to cecal transplant actually reduces the chance that new microbes will establish.
One fascinating hint that the microbiota may be more plastic than imagined is the recent success of treating persistent Clostridium difficile infections via stool transplant, which has been reported in a number of studies [66-72]. In general, the depauperate gut community produced during the C. difficile infection is replaced by the donor community [67, 73]. The success of this technique is remarkable, especially considering how little is known about the best community to supply. For example, is it better to receive the fecal community of a close relative or of a cohabiting individual, or perhaps to bank one's own stool before beginning antibiotic treatment so that it can be restored later? Is the same stool good for everyone, or do the vast differences in the microbiota imply that each person's microbes are specifically adapted relative to those they might receive from a donor? As with blood types, are there “universal donors” and “universal recipients”? These and many other questions remain to be answered.
As in every year since the initial sequencing of DNA, this year has seen unprecedented growth in the amount of sequence data collected, at an unprecedentedly low cost. Increasingly powerful tools for extracting meaningful patterns from this wealth of data have been developed or updated as well. Emerging technologies such as stool transplantation, 16S rRNA and whole-genome sequencing on the Illumina platform, the ability to transplant human microbial communities into mice with high efficiency even from frozen samples, and the creation of personalized culture collections raise the prospect of a future in which therapies for individual humans are piloted in a battery of mice that are subjected to different treatments, and in which leave-one-out experiments that reveal the effects of deleting individual species, or individual genes from within a species, allow insight into mechanism. Although the tools we have available are still imperfect (for example, the limited read length of today's high-throughput sequencing technologies limits the ability to detect bacterial species and strains, and analyses of viruses and eukaryotes are still very much an emerging frontier), the prospects for developing a mechanistic understanding of the factors that underlie the plasticity of the microbiome and then for manipulating the microbiome to improve health seem increasingly bright.
Work cited in this review from the author's lab was supported in part by grants from the National Institutes of Health [HG4872 (RK)], the Crohn's and Colitis Foundation of America, and the Howard Hughes Medical Institute [RK].