For investigators who have not conducted or even considered performing microbiome research, the variety of analytic methods that are available can appear daunting (Robinson et al., 2010). Many of the culture-independent methods for studying microbial ecology have benefited from advances in next-generation sequencing platforms (Andersson et al., 2008; Huse et al., 2008). Additionally, advances in mass spectroscopy and other means for large-scale analysis of complex mixtures of proteins and metabolites have been applied to microbiome research. To make sense of which methods might be appropriate for specific biomedical questions, investigators can start by understanding what types of questions each method is best suited to address. One way to consider the suite of methods available for microbiome analysis is to divide them into groups based on the types of information they provide about a given microbial community. There are several useful reviews that describe the available technologies (Zoetendal et al., 2008; Simon and Daniel, 2011). The first type of information concerns the structure of a specific consortium of microbes. This can be thought of as a census of microbes, both in terms of the number of different types of microbes and their relative abundance. The next type of information goes beyond community structure and provides a catalog of the functional capacity of the entire community. The final type of information, which can be gained by certain analytic methods, concerns the in situ activity of the given microbial community. We will discuss each of these types of information in more detail and describe in general terms the analytic methods that can provide each type of data.
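The idea of a community census can be made concrete with a small calculation. The following is a minimal sketch, assuming each sequence has already been assigned a taxon label; the labels and counts are invented for illustration, and the Shannon index is an addition here, a common way to summarize richness and evenness in one number that the text above does not name.

```python
from collections import Counter
import math

def census(assignments):
    """Summarize a list of per-sequence taxon labels as a community census."""
    counts = Counter(assignments)
    total = sum(counts.values())
    # Relative abundance: fraction of all sequences assigned to each taxon.
    rel_abund = {taxon: n / total for taxon, n in counts.items()}
    # Richness: number of distinct taxa observed.
    richness = len(counts)
    # Shannon index (assumption: not named in the text) combines richness and evenness.
    shannon = -sum(p * math.log(p) for p in rel_abund.values())
    return richness, rel_abund, shannon

# Hypothetical labels, one per sequence read.
reads = ["Bacteroides"] * 6 + ["Faecalibacterium"] * 3 + ["Escherichia"]
richness, rel_abund, shannon = census(reads)
print(richness, rel_abund, round(shannon, 3))
```

Richness answers "how many types of microbes" and the relative abundances answer "in what proportions", which together make up the structural information described above.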
16S sequence retrieval: choosing the appropriate platform
Landmark ideas and research from Woese et al. (1990) and Pace (2009) established a common metric for identifying microbes: the nucleotide sequence of the small subunit (SSU) ribosomal RNA. The SSU rRNA of bacteria and archaea has a sedimentation coefficient of 16S, which distinguishes it from the 18S SSU rRNA of eukaryotic cells, including those of the human host. Initially, SSU sequences were obtained by amplification and sequencing of SSU genes from complex microbial communities and then compared to databases containing more than 2 million aligned rRNA gene sequences (DeSantis et al., 2006; Pruesse et al., 2007; Cole et al., 2009) to provide a census of microbes in each sample. More recently, the application of “next-generation” sequencing platforms has increased the number of sequences that can be obtained while lowering the cost of analysis (Sogin et al., 2006; Huse et al., 2008).
Analysis of the data obtained by SSU sequence analysis continues to evolve, but two general approaches are used to bin or classify the sequences into microbial populations. Sequences can be compared to reference taxonomic outlines and binned based on similarity to reference sequences (“phylotyping”), or they can be assigned to operational taxonomic units (OTUs) based on similarity to other sequences within a given dataset. Each method has relative advantages and disadvantages (see Schloss and Westcott, 2011 for a discussion). It should be noted, however, that the desire to “name” a given community member based on SSU analysis can be complicated by the fact that most existing taxonomies for bacteria are based solely on cultured organisms.
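The difference between the two binning approaches can be illustrated with a toy example. This is a minimal sketch, assuming sequences that are already aligned to equal length; the greedy seed-based clustering, the identity thresholds, and the names TaxonA and TaxonB are illustrative assumptions, not the procedure of any particular tool or of the studies cited above.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def phylotype(seq, references, min_identity=0.9):
    """Reference-based binning: return the best-matching reference name."""
    best_name, best_score = "unclassified", 0.0
    for name, ref in references.items():
        score = identity(seq, ref)
        if score >= min_identity and score > best_score:
            best_name, best_score = name, score
    return best_name

def greedy_otus(seqs, threshold=0.97):
    """Reference-free binning: join the first OTU whose seed is similar enough."""
    seeds = []    # one representative sequence per OTU
    members = []  # indices of the sequences in each OTU
    for i, s in enumerate(seqs):
        for k, seed in enumerate(seeds):
            if identity(s, seed) >= threshold:
                members[k].append(i)
                break
        else:
            # No existing seed is close enough, so this sequence founds a new OTU.
            seeds.append(s)
            members.append([i])
    return members

# Hypothetical 10-base "sequences"; real SSU reads are far longer.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTTTGGGGCC"]
references = {"TaxonA": "ACGTACGTAC", "TaxonB": "TTTTGGGGCC"}

otus = greedy_otus(seqs, threshold=0.9)
names = [phylotype(s, references) for s in seqs]
print(otus, names)
```

Phylotyping can only return names that exist in the reference set, whereas OTU assignment groups sequences without naming them, which mirrors the trade-off noted above for communities dominated by uncultured organisms.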
The number of sequences required to assess microbial communities depends both on the questions being asked and on the spatial and temporal variability in a community. Deeper sequencing will uncover less common members of a community, which may be necessary to enumerate a particular pathogen, but shifts in overall community structure can be detected with many fewer sequences (Young and Schmidt, 2004; Antonopoulos et al., 2009). A critical step in determining the depth of sequencing required to address a question is to assess variability within replicate samples and determine whether that variability is less than the variability found in treatment-level comparisons. Pilot studies with either clone libraries or high-throughput sequencing methods are essential to identify the degree of variability and will establish the extent of sequencing required in a full-scale experimental design.
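The comparison of within-replicate and between-treatment variability might be sketched as follows for pilot data. This is an illustration under stated assumptions: the Bray-Curtis dissimilarity is one commonly used measure and is not prescribed by the text, and the count tables are invented.

```python
from itertools import combinations

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(u) + sum(v)
    return num / den if den else 0.0

def mean_pairwise(samples_a, samples_b=None):
    """Mean dissimilarity within one group, or between two groups."""
    if samples_b is None:
        pairs = list(combinations(samples_a, 2))
    else:
        pairs = [(a, b) for a in samples_a for b in samples_b]
    return sum(bray_curtis(a, b) for a, b in pairs) / len(pairs)

# Hypothetical pilot data: rows are replicate samples, columns are taxa.
control = [[50, 30, 20], [48, 32, 20], [52, 28, 20]]
treated = [[20, 30, 50], [22, 28, 50], [18, 32, 50]]

within = (mean_pairwise(control) + mean_pairwise(treated)) / 2
between = mean_pairwise(control, treated)
print(f"within replicates: {within:.3f}, between treatments: {between:.3f}")
```

If the within-replicate value is not clearly smaller than the between-treatment value at the pilot sequencing depth, deeper sequencing or more replicates may be needed before a treatment effect can be distinguished from sample-to-sample noise. A formal analysis would add a statistical test, which this sketch omits.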
The next step is to consider how the structure of the microbial community might relate to its function. 16S sequences on their own do not provide specific functional information. However, if a genome sequence is available for a bacterium with a matching 16S sequence and a known function, it may be possible to infer functional capacity. It should be noted that inference of the metabolic potential of an organism based on its SSU rRNA gene sequence may be complicated by the lateral transfer of genes between microbes.
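One way to picture this kind of inference is an abundance-weighted lookup of gene content from reference genomes. The sketch below is purely illustrative: the taxon names, gene families, and abundances are hypothetical, and the approach assumes that each 16S-defined taxon carries the same genes as its reference genome, which is exactly the assumption that lateral gene transfer can violate.

```python
def infer_function(abundance, genome_content):
    """Abundance-weighted gene-family profile for taxa that have a reference genome."""
    profile = {}
    unmapped = 0.0  # community fraction with no reference genome available
    for taxon, frac in abundance.items():
        genes = genome_content.get(taxon)
        if genes is None:
            unmapped += frac
            continue
        for gene in genes:
            profile[gene] = profile.get(gene, 0.0) + frac
    return profile, unmapped

# Hypothetical 16S-based relative abundances.
abundance = {"TaxonA": 0.5, "TaxonB": 0.3, "TaxonC": 0.2}
# Hypothetical gene content taken from sequenced reference genomes.
genome_content = {
    "TaxonA": {"butyrate_kinase", "urease"},
    "TaxonB": {"urease"},
}

profile, unmapped = infer_function(abundance, genome_content)
print(profile, unmapped)
```

The unmapped fraction makes explicit how much of the community has no sequenced relative, and therefore how incomplete the inferred profile may be.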
Looking at the “big picture”: metagenomes, metatranscriptomes, and in situ analysis
Rather than inferring metabolic potential from 16S rRNA gene sequences, the genetic diversity of the microbiome can be accessed directly through shotgun metagenomes (Handelsman, 2004; Riesenfeld et al., 2004; Streit and Schmitz, 2004; Gill et al., 2006). In this approach, DNA extracted from a sample of the microbiome is sequenced directly, rather than following amplification of a specific gene (e.g., 16S rRNA). The absence of a specific amplification step to recover microbial genes often means that suitable amounts of DNA from microbial communities are difficult to obtain, particularly without interference from host DNA. Physical methods for separating microbial communities from host tissue, including the use of lasers to remove attached microbes from epithelial cells in the GI tract (Wang et al., 2010a), can be effective, but typically provide insufficient DNA for direct sequencing. Fortunately, there are approaches for whole genome amplification that can be employed to produce sufficient DNA for metagenomic sequencing (Binga et al., 2008). Understanding the biases and variability introduced by each of these steps is essential for a meaningful analysis of the resulting sequences.
When sequences derived from metagenomes are compared to previously characterized genes, using platforms such as MG-RAST (Glass et al., 2010), a picture of the metabolic potential of a community emerges. Millions of sequences from shotgun metagenomes from the human GI tract (Qin et al., 2010) have been generated in an effort to identify those that are consistent with health and various disease states. It has been suggested that while the taxonomic structure of microbiomes can fluctuate considerably, the composition of metabolic genes remains consistent (Turnbaugh et al., 2009). The definition of OTUs for both rRNA genes and protein-encoding genes will certainly influence this interpretation of the data: defining the appropriate level of resolution in sequence analysis is central to future analysis of microbiome sequences.
A logical extension of the metabolic potential suggested by community metagenomic sequencing is insight into the actual activity of a community gained through metatranscriptomic sequencing (Gilbert and Hughes, 2011; Gosalbes et al., 2011). In this case, total RNA is isolated and structural RNAs are removed to enrich for mRNA, which is then reverse transcribed into cDNA for sequence analysis. Rather than just revealing the potential activity, this indicates which of the potential metabolic pathways are actually being used, on the basis of their transcription within the community. To move even closer to actual function, metaproteomics employs high-throughput, high-resolution mass spectroscopy to determine which proteins are actually present in a given community (Verberkmoes et al., 2009). This approach generally requires some knowledge of the coding potential of a community in order to make predictions about potential proteins based on mass/charge ratios, and thus is often combined with metagenomic sequencing. A final approach used to assess in situ function, often via mass spectroscopy, is to measure the complement of metabolites (e.g., short chain fatty acids, lipids, small molecules) associated with a community. This so-called metabolomics or metabonomics approach assesses function based on the presence of metabolites, many of which will be produced by specific members of the community (Martin et al., 2007; Kinross et al., 2011).
The three-dimensional structure of microbial communities in the GI tract, particularly those in close proximity to epithelial cells, may also provide useful information about the function of the community, including cell–cell interactions among microbes and between microbes and their host. Extraction and purification of DNA for microbiome analysis obliterates the architecture of microbial communities, but fortunately the sequence data gathered as part of an SSU microbial census can be used to design fluorescently labeled probes that permit visualization of the structural organization of microbes in preserved samples. The recent application of combinatorial labeling of probes and spectral imaging (Valm et al., 2011) offers the potential to visualize dozens of microbes in a community and holds considerable promise for microbiome studies.
Selecting the appropriate methodology: an argument for the team approach
With this immense armamentarium of tools for microbiome analysis, the decision as to which method to employ must return to the most basic consideration, namely, what scientific and/or clinical questions are to be addressed? In some cases, associations with disease based on 16S sequence retrieval are an appropriate first step, an exercise that, as we discussed earlier, can be thought of as hypothesis generating. However, in order to test a given hypothesis or to monitor the physiologic effects of specific microbiome alterations, functional assessments via metagenomics or metabolomics might be more appropriate. To help in such decision making, a “team science” approach is often necessary, bringing together clinicians with expertise in IBD with microbial ecologists, bioinformatics specialists, statisticians, and microbial physiologists. As demonstrated by the NIH Human Microbiome Project (HMP) and the European MetaHIT project, collaborative teams of scientists from a broad range of disciplines working together to address questions of the microbiome in health and disease are an important and effective approach. Similarly, the study of IBD using a “systems science” approach, with interdisciplinary teams and expertise, will be essential for discovering the etiopathogenesis of these diseases, novel therapies, and potentially a cure.