Related Articles

1.  PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data 
PLoS Computational Biology  2011;7(1):e1001061.
Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
Author Summary
Microorganisms comprise the majority of the biodiversity on the planet. Because the overwhelming majority of microbes are not readily cultured in the laboratory, researchers often rely on PCR-based investigations of genomic sequence to characterize microbial diversity. These analyses have dramatically expanded our understanding of biodiversity, but due to methodological biases PCR-based approaches may only reveal part of the microbial biosphere. Shotgun sequencing of environmental DNA, known as metagenomics, avoids the biases associated with targeted amplification of genomic sequence and can provide insight into the diversity hidden from traditional investigations. However, the fragmentary, non-overlapping nature of shotgun sequence data makes it intractable to analyze with existing tools. Here, we present PhylOTU, a novel computational method that enables accurate characterization of microbial diversity from metagenomic data. We process over 10 million metagenomic sequences obtained from the global open ocean to identify novel Bacterial taxa and reveal the presence of microorganisms overlooked by investigation of PCR-based sequences from the same samples. These results suggest that fully characterizing microbial biodiversity requires a novel bioinformatics toolbox for analysis of shotgun metagenomic data.
PMCID: PMC3024254  PMID: 21283775
2.  Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data 
Using metagenomic ‘parts lists’ to study microbial ecology remains a significant challenge. This work proposes a molecular trait-based approach to biogeography by integrating metagenomic data with external metadata and using functional community composition as readout.
Climatic factors drive functional and phylogenetic composition of ocean microbial communities. Function dispersal is controlled by environmental conditions. Functional richness has a clear latitudinal gradient and correlates with primary production. Metagenomic data can be used as a predictor for ecosystem processes. To understand the relationship between community composition and environment, functional readouts are the most direct. Metagenomic data enable such trait-based ecology at the molecular level.
Metagenomics (shotgun sequencing of pooled DNA of complete microbial communities) is widely used to investigate ecosystem functioning of environmental and clinical samples. However, the nature of this data (usually a gigantic collection of gene fragments of 1000s of organisms) makes it very hard to infer global patterns on microbial ecology of the environment at hand. To address important ecological questions such as ‘How do microbial communities adapt to the environmental conditions?’, ‘What drives the functional variation across the globe and to what extent do genes disperse?’ and ‘What drives variation of CO2 uptake across different locations and communities?’, we integrated 25 ocean metagenomes from the Global Ocean Sampling project with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the functional and phylogenetic composition of an environment and the main limiting factor on whether functions disperse across the planet. We find a distinct latitudinal gradient in the size and diversity of the functional repertoire of ocean microbial communities, peaking at 20°N, and which correlates with oceanic CO2 uptake. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes can be used as a quantitative predictor for molecular trait-based biogeography and ecology.
Using metagenomic ‘parts lists’ to infer global patterns on microbial ecology remains a significant challenge. To deduce important ecological indicators such as environmental adaptation, molecular trait dispersal, diversity variation and primary production from the gene pool of an ecosystem, we integrated 25 ocean metagenomes with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the biomolecular repertoire of each sample and the main limiting factor on functional trait dispersal (absence of biogeographic provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20°N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology.
PMCID: PMC3094067  PMID: 21407210
ecosystems biology; environmental genomics; metagenomics; microbiology; molecular trait-based ecology
3.  The Phylogenetic Diversity of Metagenomes 
PLoS ONE  2011;6(8):e23214.
Phylogenetic diversity—patterns of phylogenetic relatedness among organisms in ecological communities—provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context.
PMCID: PMC3166145  PMID: 21912589
4.  The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific 
PLoS Biology  2007;5(3):e77.
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
Author Summary
Marine microbes remain elusive and mysterious, even though they are the most abundant life form in the ocean, form the base of the marine food web, and drive energy and nutrient cycling. We know so little about the vast majority of microbes because only a small percentage can be cultivated and studied in the lab. Here we report on the Global Ocean Sampling expedition, an environmental metagenomics project that aims to shed light on the role of marine microbes by sequencing their DNA without first needing to isolate individual organisms. A total of 41 different samples were taken from a wide variety of aquatic habitats collected over 8,000 km. The resulting 7.7 million sequencing reads provide an unprecedented look at the incredible diversity and heterogeneity in naturally occurring microbial populations. We have developed new bioinformatic methods to reconstitute large portions of both cultured and uncultured microbial genomes. Organism diversity is analyzed in relation to sampling locations and environmental pressures. Taken together, these data and analyses serve as a foundation for greatly expanding our understanding of individual microbial lineages and their evolution, the nature of marine microbial communities, and how they are impacted by and impact our world.
The Sorcerer II GOS expedition, data sampling, and analysis are described. The immense diversity in the sequence data required novel comparative genomic assembly methods, which uncovered genomic differences that marker-based methods could not.
PMCID: PMC1821060  PMID: 17355176
5.  Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance 
PLoS Computational Biology  2012;8(10):e1002743.
The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly – from one in many species to up to 15 in some bacteria and to hundreds in some microbial eukaryotes. As a result of this variation the relative abundance of 16S genes in environmental samples can be attributed both to variation in the relative abundance of different organisms, and to variation in genomic 16S copy number among those organisms. Despite this fact, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes along with phylogenetic placement and ancestral state estimation to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that the use of gene abundance versus estimated organismal abundance can lead to different inferences about community diversity and structure and the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
Author Summary
Microbial ecologists cannot observe their study organisms directly, so they use molecular sequencing to measure the abundance of different microbes living in the wild. The most commonly used method for measuring the abundance of different microbes is to collect a DNA sample from an environment and sequence a particular gene, the 16S SSU rRNA gene (“16S”) from those samples. The abundance of 16S sequences from different microbes is then used as a surrogate measure of the abundance of the microbial taxa in the community. One problem with the use of the 16S gene as a measure of microbial abundance is that many microbes have multiple copies of the gene in their genome. Thus, variation in 16S gene abundances can be caused by both genomic copy number variation and variation in the abundance of organisms. In this study we present a computational method that allows estimation of the abundance and genomic 16S copy number of microbes based on environmental sequencing of the 16S gene. We use simulations and analysis of microbial community data sets to demonstrate that estimating the abundance of organisms from 16S data improves our ability to accurately measure the diversity and abundance of microbial communities.
PMCID: PMC3486904  PMID: 23133348
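The core idea in the abstract above, that 16S gene abundance conflates organismal abundance with genomic copy number, can be illustrated with a minimal sketch. This is not the authors' method, which also uses phylogenetic placement and ancestral state estimation to estimate copy numbers; here the copy numbers are simply assumed to be known, and the function name and taxon labels are hypothetical.

```python
def copy_number_corrected_abundance(gene_counts, copy_numbers):
    """Convert 16S gene counts into relative organismal abundances.

    gene_counts: mapping taxon -> number of 16S sequences observed.
    copy_numbers: mapping taxon -> assumed 16S copies per genome.
    Both mappings are illustrative inputs, not data from the paper.
    """
    # Dividing by copy number turns gene counts into estimated genome counts.
    genomes = {
        taxon: gene_counts[taxon] / copy_numbers[taxon]
        for taxon in gene_counts
    }
    total = sum(genomes.values())
    # Renormalize so that the estimated abundances sum to 1.
    return {taxon: count / total for taxon, count in genomes.items()}


# Two hypothetical taxa with equal gene counts but different copy numbers.
gene_counts = {"taxon_a": 100, "taxon_b": 100}
copy_numbers = {"taxon_a": 1, "taxon_b": 10}
corrected = copy_number_corrected_abundance(gene_counts, copy_numbers)
```

In this toy case both taxa contribute half of the 16S sequences, yet after correction taxon_a accounts for roughly 91% of the estimated organisms, which is the kind of shift in apparent dominance the abstract describes.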
6.  Metabolic Reconstruction for Metagenomic Data and Its Application to the Human Microbiome 
PLoS Computational Biology  2012;8(6):e1002358.
Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. 
An implementation of our methodology is available online. This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.
Author Summary
The human body is inhabited by trillions of bacteria and other microbes, which have recently been studied in many different habitats (including gut, mouth, skin, and urogenital) by the Human Microbiome Project (HMP). These microbial communities were assayed using high-throughput DNA sequencing, but it can be challenging to determine their biological functions based solely on the resulting short sequences. To reconstruct the metabolic activities of such communities, we have developed HUMAnN, a method to accurately infer community function directly from short DNA reads. The method's accuracy was validated using a collection of synthetic microbial communities. Applying HUMAnN to data from the HMP, we showed that, unlike individual microbial species, many metabolic processes were present among all body habitats. However, the frequencies of these processes varied dramatically, and some were highly enriched within individual habitats to provide niche specialization (e.g. in the gut, which is abundant in food matter but low in oxygen). Other community functions were linked specifically to properties of the human host, such as biochemical processes only present in vaginal habitats with particularly high or low pH. Studying additional environmental or disease-associated communities using HUMAnN will further improve our understanding of how the microbial organisms in a community are linked to the biological processes they carry out.
PMCID: PMC3374609  PMID: 22719234
7.  Community transcriptomic assembly reveals microbes that contribute to deep-sea carbon and nitrogen cycling 
The ISME Journal  2013;7(10):1962-1973.
The deep ocean is an important component of global biogeochemical cycles because it contains one of the largest pools of reactive carbon and nitrogen on earth. However, the microbial communities that drive deep-sea geochemistry are vastly unexplored. Metatranscriptomics offers new windows into these communities, but it has been hampered by reliance on genome databases for interpretation. We reconstructed the transcriptomes of microbial populations from Guaymas Basin, in the deep Gulf of California, through shotgun sequencing and de novo assembly of total community RNA. Many of the resulting messenger RNA (mRNA) contiguous sequences contain multiple genes, reflecting co-transcription of operons, including those from dominant members. Also prevalent were transcripts with only limited representation (2.8 times coverage) in a corresponding metagenome, including a considerable portion (1.2 Mb total assembled mRNA sequence) with similarity (96%) to a marine heterotroph, Alteromonas macleodii. This Alteromonas and euryarchaeal marine group II populations displayed abundant transcripts from amino-acid transporters, suggesting recycling of organic carbon and nitrogen from amino acids. Also among the most abundant mRNAs were catalytic subunits of the nitrite oxidoreductase complex and electron transfer components involved in nitrite oxidation. These and other novel genes are related to novel Nitrospirae and have limited representation in accompanying metagenomic data. High throughput sequencing of 16S ribosomal RNA (rRNA) genes and rRNA read counts confirmed that Nitrospirae are minor yet widespread members of deep-sea communities. These results implicate a novel bacterial group in deep-sea nitrite oxidation, the second step of nitrification. This study highlights metatranscriptomic assembly as a valuable approach to study microbial communities.
PMCID: PMC3965313  PMID: 23702516
Archaea; deep sea; transcriptomics; nitrification; Alteromonas; Nitrospirae
8.  Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients 
The ISME Journal  2011;6(5):1007-1017.
Terrestrial ecosystems are receiving elevated inputs of nitrogen (N) from anthropogenic sources and understanding how these increases in N availability affect soil microbial communities is critical for predicting the associated effects on belowground ecosystems. We used a suite of approaches to analyze the structure and functional characteristics of soil microbial communities from replicated plots in two long-term N fertilization experiments located in contrasting systems. Pyrosequencing-based analyses of 16S rRNA genes revealed no significant effects of N fertilization on bacterial diversity, but significant effects on community composition at both sites; copiotrophic taxa (including members of the Proteobacteria and Bacteroidetes phyla) typically increased in relative abundance in the high N plots, with oligotrophic taxa (mainly Acidobacteria) exhibiting the opposite pattern. Consistent with the phylogenetic shifts under N fertilization, shotgun metagenomic sequencing revealed increases in the relative abundances of genes associated with DNA/RNA replication, electron transport and protein metabolism, increases that could be resolved even with the shallow shotgun metagenomic sequencing conducted here (average of 75 000 reads per sample). We also observed shifts in the catabolic capabilities of the communities across the N gradients that were significantly correlated with the phylogenetic and metagenomic responses, indicating possible linkages between the structure and functioning of soil microbial communities. Overall, our results suggest that N fertilization may, directly or indirectly, induce a shift in the predominant microbial life-history strategies, favoring a more active, copiotrophic microbial community, a pattern that parallels the often observed replacement of K-selected with r-selected plant species with elevated N.
PMCID: PMC3329107  PMID: 22134642
shotgun metagenomics; pyrosequencing; soil bacteria; nitrogen fertilization; soil carbon dynamics
9.  Strengths and Limitations of 16S rRNA Gene Amplicon Sequencing in Revealing Temporal Microbial Community Dynamics 
PLoS ONE  2014;9(4):e93827.
This study explored the short-term planktonic microbial community structure and resilience in Lake Lanier (GA, USA) while simultaneously evaluating the technical aspects of identifying taxa via 16S rRNA gene amplicon and metagenomic sequence data. 16S rRNA gene amplicons generated from four temporally discrete samples were sequenced with 454 GS-FLX-Ti yielding ∼40,000 rRNA gene sequences from each sample and representing ∼300 observed OTUs. Replicates obtained from the same biological sample clustered together but several biases were observed, linked to either the PCR or sequencing-preparation steps. In comparisons with companion whole-community shotgun metagenome datasets, the estimated number of OTUs at each timepoint was concordant, but 1.5 times and ∼10 times as many phyla and genera, respectively, were identified in the metagenomes. Our analyses showed that the 16S rRNA gene captures broad shifts in community diversity over time, but with limited resolution and lower sensitivity compared to metagenomic data. We also identified OTUs that showed marked shifts in abundance over four close timepoints separated by perturbations and tracked these taxa in the metagenome vs. 16S rRNA amplicon data. A strong summer storm had less of an effect on community composition than did seasonal mixing, which revealed a distinct succession of organisms. This study provides insights into freshwater microbial communities and advances the approaches for assessing community diversity and dynamics in situ.
PMCID: PMC3979728  PMID: 24714158
10.  Quantitative Metagenomic Analyses Based on Average Genome Size Normalization
Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. With both environments, to determine what proportion of each community they make up and how differences in environment affect their abundances, we characterize three different types of autotrophic organisms: aerobic, photosynthetic carbon fixers (the Cyanobacteria); anaerobic, photosynthetic carbon fixers (the Chlorobi); and anaerobic, nonphotosynthetic carbon fixers (the Desulfobacteraceae). These analyses demonstrate how genome proportionality compares to SSU rRNA gene relative abundance and how factors such as average genome size and SSU rRNA gene copy number affect sampling probability and therefore both types of community analysis.
PMCID: PMC3067418  PMID: 21317268
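The role of average genome size in the abstract above can be sketched with a simplified sampling argument. This is an illustration under stated assumptions, not the authors' implementation: it assumes reads are sampled uniformly along genomes, that the trait is marked by a single-copy gene of known length, and it ignores read-length edge effects. The function name and all numbers are hypothetical.

```python
def trait_genome_proportion(marker_reads, total_reads,
                            average_genome_size, marker_gene_length):
    """Estimate the proportion of genomes carrying a single-copy marker gene.

    If every genome carried the gene, the expected fraction of reads that
    hit it would be marker_gene_length / average_genome_size. The observed
    fraction divided by that expectation gives the estimated proportion.
    """
    observed_fraction = marker_reads / total_reads
    expected_fraction_if_universal = marker_gene_length / average_genome_size
    return observed_fraction / expected_fraction_if_universal


# Hypothetical sample: 1,000,000 reads, 4 Mb average genome, 1 kb marker gene.
# If all genomes carried the gene we would expect about 250 marker reads,
# so 125 observed reads suggest about half of the genomes carry it.
proportion = trait_genome_proportion(
    marker_reads=125,
    total_reads=1_000_000,
    average_genome_size=4_000_000,
    marker_gene_length=1_000,
)
```

The same read count would imply a different proportion in a community with a larger or smaller average genome size, which is why comparing raw hit counts between metagenomes can be misleading.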
11.  Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing 
PLoS Genetics  2008;4(11):e1000255.
Massively parallel pyrosequencing of hypervariable regions from small subunit ribosomal RNA (SSU rRNA) genes can sample a microbial community two or three orders of magnitude more deeply per dollar and per hour than capillary sequencing of full-length SSU rRNA. As with full-length rRNA surveys, each sequence read is a tag surrogate for a single microbe. However, rather than assigning taxonomy by creating gene trees de novo that include all experimental sequences and certain reference taxa, we compare the hypervariable region tags to an extensive database of rRNA sequences and assign taxonomy based on the best match in a Global Alignment for Sequence Taxonomy (GAST) process. The resulting taxonomic census provides information on both composition and diversity of the microbial community. To determine the effectiveness of using only hypervariable region tags for assessing microbial community membership, we compared the taxonomy assigned to the V3 and V6 hypervariable regions with the taxonomy assigned to full-length SSU rRNA sequences isolated from both the human gut and a deep-sea hydrothermal vent. The hypervariable region tags and full-length rRNA sequences provided equivalent taxonomy and measures of relative abundance of microbial communities, even for tags up to 15% divergent from their nearest reference match. The greater sampling depth per dollar afforded by massively parallel pyrosequencing reveals many more members of the “rare biosphere” than does capillary sequencing of the full-length gene. In addition, tag sequencing eliminates cloning bias and the sequences are short enough to be completely sequenced in a single read, maximizing the number of organisms sampled in a run while minimizing chimera formation. This technique allows the cost-effective exploration of changes in microbial community structure, including the rare biosphere, over space and time and can be applied immediately to initiatives, such as the Human Microbiome Project.
Author Summary
Microbes play a critical role in both human and environmental health. The more we explore microbial populations, the more complexity and diversity we find. Phylogenetic trees based on 16S ribosomal RNA genes have been used with great success to identify microbial taxonomy from DNA alone. New DNA sequencing technologies, such as massively parallel pyrosequencing, can provide orders of magnitude more DNA sequences than ever before; however, the sequences are much shorter, so new methods are necessary to identify the microbes from short DNA tags. We demonstrate the effectiveness of identifying microbial taxa by comparing short tags from 16S hypervariable regions against a large database of known 16S genes. Using this technique, hypervariable region tags provide taxonomy and relative abundances of microbial communities equivalent to those from full-length rRNA sequences. The greater sampling depth afforded by tag pyrosequencing uncovers not only the dominant microbial species, but many more members of the “rare biosphere” than does capillary sequencing of the full-length gene. Tag pyrosequencing greatly enhances projects exploring composition, diversity, and distribution of microbial populations, such as the Human Microbiome Initiative. A companion paper in PLoS Biology (see Dethlefsen et al., doi:10.1371/journal.pbio.0060280) successfully uses this technique to characterize the effects of antibiotics on the human gut microbiota.
PMCID: PMC2577301  PMID: 19023400
12.  Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences 
PLoS Computational Biology  2013;9(3):e1002981.
Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online.
Author Summary
We present a probabilistic sampling approach to profiling metabolic reactions in a microbial community from metagenomic shotgun reads, in an attempt to understand the metabolism within a microbial community and compare them across multiple communities. Different from the conventional pathway reconstruction approaches that aim at a definitive set of reactions, our method estimates how likely each annotated reaction can occur in the metabolism of the microbial community, given the shotgun sequencing data. This probabilistic measure improves our prediction of the actual metabolism in the microbial communities and can be used in the comparative functional analysis of metagenomic data.
PMCID: PMC3605055  PMID: 23555216
13.  The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families 
PLoS Biology  2007;5(3):e16.
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
Author Summary
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the Sorcerer II Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature.
The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature.
PMCID: PMC1821046  PMID: 17355171
14.  Comparative metagenomics of bathypelagic plankton and bottom sediment from the Sea of Marmara 
The ISME journal  2010;5(2):285-304.
To extend comparative metagenomic analyses of the deep-sea, we produced metagenomic data by direct 454 pyrosequencing from bathypelagic plankton (1000 m depth) and bottom sediment of the Sea of Marmara, the gateway between the Eastern Mediterranean and the Black Seas. Data from small subunit ribosomal RNA (SSU rRNA) gene libraries and direct pyrosequencing of the same samples indicated that Gamma- and Alpha-proteobacteria, followed by Bacteroidetes, dominated the bacterial fraction in Marmara deep-sea plankton, whereas Planctomycetes, Delta- and Gamma-proteobacteria were the most abundant groups in high bacterial-diversity sediment. Group I Crenarchaeota/Thaumarchaeota dominated the archaeal plankton fraction, although group II and III Euryarchaeota were also present. Eukaryotes were highly diverse in SSU rRNA gene libraries, with group I (Duboscquellida) and II (Syndiniales) alveolates and Radiozoa dominating plankton, and Opisthokonta and Alveolates, sediment. However, eukaryotic sequences were scarce in pyrosequence data. Archaeal amo genes were abundant in plankton, suggesting that Marmara planktonic Thaumarchaeota are ammonia oxidizers. Genes involved in sulfate reduction, carbon monoxide oxidation, anammox and sulfatases were over-represented in sediment. Genome recruitment analyses showed that Alteromonas macleodii ‘surface ecotype’, Pelagibacter ubique and Nitrosopumilus maritimus were highly represented in 1000 m-deep plankton. A comparative analysis of Marmara metagenomes with ALOHA deep-sea and surface plankton, whale carcasses, Peru subsurface sediment and soil metagenomes clustered deep-sea Marmara plankton with deep-ALOHA plankton and whale carcasses, likely because of the suboxic conditions in the deep Marmara water column. The Marmara sediment clustered with the soil metagenome, highlighting the common ecological role of both types of microbial communities in the degradation of organic matter and the completion of biogeochemical cycles.
PMCID: PMC3105693  PMID: 20668488
deep-sea; anaerobic respiration; carbon fixation; carbon cycle; sulfate reduction; ammonia oxidation
15.  Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution 
PLoS Computational Biology  2013;9(10):e1003292.
Metagenomics has transformed our understanding of the microbial world, allowing researchers to bypass the need to isolate and culture individual taxa and to directly characterize both the taxonomic and gene compositions of environmental samples. However, associating the genes found in a metagenomic sample with the specific taxa of origin remains a critical challenge. Existing binning methods, based on nucleotide composition or alignment to reference genomes allow only a coarse-grained classification and rely heavily on the availability of sequenced genomes from closely related taxa. Here, we introduce a novel computational framework, integrating variation in gene abundances across multiple samples with taxonomic abundance data to deconvolve metagenomic samples into taxa-specific gene profiles and to reconstruct the genomic content of community members. This assembly-free method is not bounded by various factors limiting previously described methods of metagenomic binning or metagenomic assembly and represents a fundamentally different approach to metagenomic-based genome reconstruction. An implementation of this framework is available at We first describe the mathematical foundations of our framework and discuss considerations for implementing its various components. We demonstrate the ability of this framework to accurately deconvolve a set of metagenomic samples and to recover the gene content of individual taxa using synthetic metagenomic samples. We specifically characterize determinants of prediction accuracy and examine the impact of annotation errors on the reconstructed genomes. We finally apply metagenomic deconvolution to samples from the Human Microbiome Project, successfully reconstructing genus-level genomic content of various microbial genera, based solely on variation in gene count. These reconstructed genera are shown to correctly capture genus-specific properties. 
With the accumulation of metagenomic data, this deconvolution framework provides an essential tool for characterizing microbial taxa never before seen, laying the foundation for addressing fundamental questions concerning the taxa comprising diverse microbial communities.
Author Summary
Most microorganisms inhabit complex, diverse, and largely uncharacterized communities. Metagenomic technologies allow us to determine the taxonomic and gene compositions of these communities and to obtain insights into their function as a whole but usually do not enable the characterization of individual member taxa. Here, we introduce a novel computational framework for decomposing metagenomic community-level gene content data into taxa-specific gene profiles. Specifically, by analyzing the way taxonomic and gene abundances co-vary across a set of metagenomic samples, we are able to associate genes with their taxa of origin. We first demonstrate the ability of this approach to decompose metagenomes and to reconstruct the genomes of member taxa using simulated datasets. We further identify the factors that contribute to the accuracy of our method. We then apply our framework to samples from the human microbiome – the set of microorganisms that inhabit the human body – and show that it can be used to successfully reconstruct the typical genomes of various microbiome genera. Notably, our framework is based solely on variation in gene composition and does not rely on sequence composition signatures, assembly, or available reference genomes. It is therefore especially suited to studying the many microbial habitats yet to be extensively characterized.
PMCID: PMC3798274  PMID: 24146609
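The deconvolution idea summarized in entry 15 can be illustrated with a small sketch. It assumes a simple linear model that the abstract does not spell out: the community-level abundance of a gene in each sample is the sum, over taxa, of the taxon's relative abundance times the (unknown) copy number of that gene in the taxon. Under that assumption, the copy numbers can be estimated by ordinary least squares across samples. The function names `solve_linear` and `deconvolve_gene` and the toy numbers are hypothetical and are not taken from the published implementation.

```python
def solve_linear(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    # Build the augmented matrix [matrix | vector] without mutating the inputs.
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: taxon abundances do not vary enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def deconvolve_gene(taxa_abundance, gene_abundance):
    """Least-squares estimate of one gene's copy number in each taxon.

    taxa_abundance: one row per sample, one column per taxon (assumed model input).
    gene_abundance: community-level abundance of the gene in each sample.
    """
    n_taxa = len(taxa_abundance[0])
    # Normal equations: (A^T A) g = A^T e
    ata = [[sum(row[p] * row[q] for row in taxa_abundance) for q in range(n_taxa)]
           for p in range(n_taxa)]
    ate = [sum(row[p] * e for row, e in zip(taxa_abundance, gene_abundance))
           for p in range(n_taxa)]
    return solve_linear(ata, ate)


# Toy data (hypothetical): four samples, two taxa; only taxon 0 carries the gene.
taxa_abundance = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9], [0.3, 0.7]]
true_copies = [2.0, 0.0]
gene_abundance = [sum(a * c for a, c in zip(row, true_copies)) for row in taxa_abundance]

estimate = deconvolve_gene(taxa_abundance, gene_abundance)
print(estimate)  # close to [2.0, 0.0], up to floating-point error
```

The sketch only works when taxon abundances vary across samples, which mirrors the abstract's reliance on variation in gene and taxonomic abundances. With real, noisy data a non-negativity constraint on the copy numbers would be a natural refinement; it is omitted here to keep the example short.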
16.  Metagenomic and Metabolic Profiling of Nonlithifying and Lithifying Stromatolitic Mats of Highborne Cay, The Bahamas 
PLoS ONE  2012;7(5):e38229.
Stromatolites are laminated carbonate build-ups formed by the metabolic activity of microbial mats and represent one of the oldest known ecosystems on Earth. In this study, we examined a living stromatolite located within the Exuma Sound, The Bahamas and profiled the metagenome and metabolic potential underlying these complex microbial communities.
Methodology/Principal Findings
The metagenomes of the two dominant stromatolitic mat types, a nonlithifying (Type 1) and lithifying (Type 3) microbial mat, were partially sequenced and compared. This deep-sequencing approach was complemented by profiling the substrate utilization patterns of the mats using metabolic microarrays. Taxonomic assessment of the protein-encoding genes confirmed previous SSU rRNA analyses that bacteria dominate the metagenome of both mat types. Eukaryotes comprised less than 13% of the metagenomes and were rich in sequences associated with nematodes and heterotrophic protists. Comparative genomic analyses of the functional genes revealed extensive similarities in most of the subsystems between the nonlithifying and lithifying mat types. The one exception was an increase in the relative abundance of certain genes associated with carbohydrate metabolism in the lithifying Type 3 mats. Specifically, genes associated with the degradation of carbohydrates commonly found in exopolymeric substances, such as hexoses, deoxy- and acidic sugars were found. The genetic differences in carbohydrate metabolisms between the two mat types were confirmed using metabolic microarrays. Lithifying mats had a significant increase in diversity and utilization of carbon, nitrogen, phosphorus and sulfur substrates.
The two stromatolitic mat types retained similar microbial communities, functional diversity and many genetic components within their metagenomes. However, there were major differences detected in the activity and genetic pathways of organic carbon utilization. These differences provide a strong link between the metagenome and the physiology of the mats, as well as new insights into the biological processes associated with carbonate precipitation in modern marine stromatolites.
PMCID: PMC3360630  PMID: 22662280
17.  The Systemic Imprint of Growth and Its Uses in Ecological (Meta)Genomics 
PLoS Genetics  2010;6(1):e1000808.
Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast-degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast-growers, that a toxic environment, such as the acid mine biofilm, selects for low growth rates, whereas a diverse environment, like the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of the infant gut grow faster than the stabilized gut communities of human adults. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times.
Our predictor allows inferring growth rates in the vast majority of uncultivable prokaryotes and paves the way to the understanding of community dynamics from metagenomic data.
Author Summary
Microbial minimal generation times vary from a few minutes to several weeks. The reasons for this disparity have been thought to lie in different life-history strategies: fast-growing microbes grow extremely fast in rich media, but are less capable of dealing with stress and/or poor nutrient conditions. Prokaryotes have evolved a set of genomic traits to grow fast, including biased codon usage and transient or permanent gene multiplication for dosage effects. Here, we studied the relative role of these traits and show they can be used to predict minimal generation times from the genomic data of the vast majority of microbes that cannot be cultivated. We show that this inference can also be made with incomplete genomes and thus be applied to metagenomic data to test hypotheses about the biomass productivity of biotopes and the evolution of microbiota in the human gut after birth. Our results also allow a better understanding of the co-evolution between growth rates and genomic traits and how they can be manipulated in synthetic biology. Growth rates have been a key variable in microbial physiology studies in the last century, and we show how intimately they are linked with genome organization and prokaryotic ecology.
PMCID: PMC2797632  PMID: 20090831
18.  An in vitro biofilm model system maintaining a highly reproducible species and metabolic diversity approaching that of the human oral microbiome 
Microbiome  2013;1:25.
Our knowledge of microbial diversity in the human oral cavity has vastly expanded during the last two decades of research. However, much of what is known about the behavior of oral species to date derives from pure culture approaches and the studies combining several cultivated species, which likely does not fully reflect their function in complex microbial communities. It has been shown in studies with a limited number of cultivated species that early oral biofilm development occurs in a successional manner and that continuous low pH can lead to an enrichment of aciduric species. Observations that in vitro grown plaque biofilm microcosms can maintain similar pH profiles in response to carbohydrate addition as plaque in vivo suggests a complex microbial community can be established in the laboratory. In light of this, our primary goal was to develop a robust in vitro biofilm-model system from a pooled saliva inoculum in order to study the stability, reproducibility, and development of the oral microbiome, and its dynamic response to environmental changes from the community to the molecular level.
Comparative metagenomic analyses confirmed a high similarity of metabolic potential in biofilms to recently available oral metagenomes from healthy subjects as part of the Human Microbiome Project. A time-series metagenomic analysis of the taxonomic community composition in biofilms revealed that the proportions of major species at 3 hours of growth are maintained during 48 hours of biofilm development. By employing deep pyrosequencing of the 16S rRNA gene to investigate this biofilm model with regard to bacterial taxonomic diversity, we show a high reproducibility of the taxonomic carriage and proportions between: 1) individual biofilm samples; 2) biofilm batches grown at different dates; 3) DNA extraction techniques and 4) research laboratories.
Our study demonstrates that we now have the capability to grow stable oral microbial in vitro biofilms containing more than one hundred operational taxonomic units (OTUs), which represent 60-80% of the original inoculum OTU richness. Previously uncultivated Human Oral Taxa (HOT) were identified in the biofilms and contributed approximately one-third of the total captured 16S rRNA gene diversity. To our knowledge, this represents the highest oral bacterial diversity reported for an in vitro model system so far. This robust model will help investigate currently uncultivated species and the known virulence properties for many oral pathogens not solely restricted to pure culture systems, but within multi-species biofilms.
PMCID: PMC3971625  PMID: 24451062
In vitro model; Biofilm; Oral microbiome; Saliva; Streptococcus; Lactobacillus; Uncultivated bacteria
19.  Microbial Co-occurrence Relationships in the Human Microbiome 
PLoS Computational Biology  2012;8(7):e1002606.
The healthy microbiota show remarkable variability within and among individuals. In addition to external exposures, ecological relationships (both oppositional and symbiotic) between microbial inhabitants are important contributors to this variation. It is thus of interest to assess what relationships might exist among microbes and determine their underlying reasons. The initial Human Microbiome Project (HMP) cohort, comprising 239 individuals and 18 different microbial habitats, provides an unprecedented resource to detect, catalog, and analyze such relationships. Here, we applied an ensemble method based on multiple similarity measures in combination with generalized boosted linear models (GBLMs) to taxonomic marker (16S rRNA gene) profiles of this cohort, resulting in a global network of 3,005 significant co-occurrence and co-exclusion relationships between 197 clades occurring throughout the human microbiome. This network revealed strong niche specialization, with most microbial associations occurring within body sites and a number of accompanying inter-body site relationships. Microbial communities within the oropharynx grouped into three distinct habitats, which themselves showed no direct influence on the composition of the gut microbiota. Conversely, niches such as the vagina demonstrated little to no decomposition into region-specific interactions. Diverse mechanisms underlay individual interactions, with some such as the co-exclusion of Porphyromonaceae family members and Streptococcus in the subgingival plaque supported by known biochemical dependencies. These differences varied among broad phylogenetic groups as well, with the Bacilli and Fusobacteria, for example, both enriched for exclusion of taxa from other clades. Comparing phylogenetic versus functional similarities among bacteria, we show that dominant commensal taxa (such as Prevotellaceae and Bacteroides in the gut) often compete, while potential pathogens (e.g. Treponema and Prevotella in the dental plaque) are more likely to co-occur in complementary niches. This approach thus serves to open new opportunities for future targeted mechanistic studies of the microbial ecology of the human microbiome.
Author Summary
The human body is a complex ecosystem where microbes compete, and cooperate. These interactions can support health or promote disease, e.g. in dental plaque formation. The Human Microbiome Project collected and sequenced ca. 5,000 samples from 18 different body sites, including the airways, gut, skin, oral cavity and vagina. These data allowed the first assessment of significant patterns of co-presence and exclusion among human-associated bacteria. We combined sparse regression with an ensemble of similarity measures to predict microbial relationships within and between body sites. This captured known relationships in the dental plaque, vagina, and gut, and also predicted novel interactions involving members of under-characterized phyla such as TM7. We detected relationships necessary for plaque formation and differences in community composition among dominant members of the gut and vaginal microbiomes. Most relationships were strongly niche-specific, with only a few hub microorganisms forming links across multiple body areas. We also found that phylogenetic distance had a strong impact on the interaction type: closely related microorganisms co-occurred within the same niche, whereas most exclusive relationships occurred between more distantly related microorganisms. This establishes both the specific organisms and general principles by which microbial communities associated with healthy humans are assembled and maintained.
PMCID: PMC3395616  PMID: 22807668
20.  An introduction to the analysis of shotgun metagenomic data 
Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities.
PMCID: PMC4059276  PMID: 24982662
metagenome; bioinformatics; microbiota; microbiome; microbial diversity; host–microbe interactions; review
21.  Analysis of Shotgun Metagenomes with MG-RAST 
The field of metagenomics is transforming our ability to study the enormous biomass and diversity of microbial life around us. Understanding this microbial world will lead to advances and practical applications in a broad range of fields. Metagenomic sequencing provides unprecedented access to the thousands (or even millions) of microbes in an environment. Unlike 16S SSU rRNA amplicon sequencing, metagenomic sequencing (whole shotgun sequencing) provides information on not only who is in a community but what they are doing, extending understanding of community structure towards interactions within an environment. This talk will discuss the MG-RAST analysis pipeline, starting from quality control assessment to annotation, and give an overview of the interactive tools for comparative analysis. MG-RAST has analyzed over 60,000 WGS and amplicon datasets equaling approximately 20 Tbp.
PMCID: PMC3635262
22.  Metagenomic analysis of the medicinal leech gut microbiota 
There are trillions of microbes found throughout the human body and they exceed the number of eukaryotic cells by 10-fold. Metagenomic studies have revealed that the majority of these microbes are found within the gut, playing an important role in the host's digestion and nutrition. The complexity of the animal digestive tract, unculturable microbes, and the lack of genetic tools for most culturable microbes make it challenging to explore the nature of these microbial interactions within this niche. The medicinal leech, Hirudo verbana, has been shown to be a useful tool in overcoming these challenges, due to the simplicity of the microbiome and the availability of genetic tools for one of the two dominant gut symbionts, Aeromonas veronii. In this study, we utilize 16S rRNA gene pyrosequencing to further explore the microbial composition of the leech digestive tract, confirming the dominance of two taxa, the Rikenella-like bacterium and A. veronii. The deep sequencing approach revealed the presence of additional members of the microbial community that suggests the presence of a moderately complex microbial community with a richness of 36 taxa. The presence of a Proteus strain as a newly identified resident in the leech crop was confirmed using fluorescence in situ hybridization (FISH). The metagenome of this community was also pyrosequenced and the contigs were binned into the following taxonomic groups: Rikenella-like (3.1 MB), Aeromonas (4.5 MB), Proteus (2.9 MB), Clostridium (1.8 MB), Erysipelothrix (0.96 MB), Desulfovibrio (0.14 MB), and Fusobacterium (0.27 MB). Functional analyses on the leech gut symbionts were explored using the metagenomic data and MG-RAST.
A comparison of the COG and KEGG categories of the leech gut metagenome to that of other animal digestive-tract microbiomes revealed that the leech digestive tract had a similar metabolic potential to the human digestive tract, supporting the usefulness of this system as a model for studying digestive-tract microbiomes. This study lays the foundation for more detailed metatranscriptomic studies and the investigation of symbiont population dynamics.
PMCID: PMC4029005  PMID: 24860552
high-throughput sequencing; beneficial microbes; symbiosis; medicinal leech
23.  Phylogenetic Molecular Ecological Network of Soil Microbial Communities in Response to Elevated CO2 
mBio  2011;2(4):e00122-11.
Understanding the interactions among different species and their responses to environmental changes, such as elevated atmospheric concentrations of CO2, is a central goal in ecology but is poorly understood in microbial ecology. Here we describe a novel random matrix theory (RMT)-based conceptual framework to discern phylogenetic molecular ecological networks using metagenomic sequencing data of 16S rRNA genes from grassland soil microbial communities, which were sampled from a long-term free-air CO2 enrichment experimental facility at the Cedar Creek Ecosystem Science Reserve in Minnesota. Our experimental results demonstrated that an RMT-based network approach is very useful in delineating phylogenetic molecular ecological networks of microbial communities based on high-throughput metagenomic sequencing data. The structure of the identified networks under ambient and elevated CO2 levels was substantially different in terms of overall network topology, network composition, node overlap, module preservation, module-based higher-order organization, topological roles of individual nodes, and network hubs, suggesting that the network interactions among different phylogenetic groups/populations were markedly changed. Also, the changes in network structure were significantly correlated with soil carbon and nitrogen contents, indicating the potential importance of network interactions in ecosystem functioning. In addition, based on network topology, microbial populations potentially most important to community structure and ecosystem functioning can be discerned. The novel approach described in this study is important not only for research on biodiversity, microbial ecology, and systems microbiology but also for microbial community studies in human health, global change, and environmental management.
The interactions among different microbial populations in a community play critical roles in determining ecosystem functioning, but very little is known about the network interactions in a microbial community, owing to the lack of appropriate experimental data and computational analytic tools. High-throughput metagenomic technologies can rapidly produce a massive amount of data, but one of the greatest difficulties is deciding how to extract, analyze, synthesize, and transform such a vast amount of information into biological knowledge. This study provides a novel conceptual framework to identify microbial interactions and key populations based on high-throughput metagenomic sequencing data. This study is among the first to document that the network interactions among different phylogenetic populations in soil microbial communities were substantially changed by a global change such as an elevated CO2 level. The framework developed will allow microbiologists to address research questions which could not be approached previously, and hence, it could represent a new direction in microbial ecology research.
PMCID: PMC3143843  PMID: 21791581
24.  Diverse CRISPRs Evolving in Human Microbiomes 
PLoS Genetics  2012;8(6):e1002441.
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, together with cas (CRISPR–associated) genes, form the CRISPR/Cas adaptive immune system, a primary defense strategy that eubacteria and archaea mobilize against foreign nucleic acids, including phages and conjugative plasmids. Short spacer sequences separated by the repeats are derived from foreign DNA and direct interference to future infections. The availability of hundreds of shotgun metagenomic datasets from the Human Microbiome Project (HMP) enables us to explore the distribution and diversity of known CRISPRs in human-associated microbial communities and to discover new CRISPRs. We propose a targeted assembly strategy to reconstruct CRISPR arrays, which whole-metagenome assemblies fail to identify. For each known CRISPR type (identified from reference genomes), we use its direct repeat consensus sequence to recruit reads from each HMP dataset and then assemble the recruited reads into CRISPR loci; the unique spacer sequences can then be extracted for analysis. We also identified novel CRISPRs or new CRISPR variants in contigs from whole-metagenome assemblies and used targeted assembly to more comprehensively identify these CRISPRs across samples. We observed that the distributions of CRISPRs (including 64 known and 86 novel ones) are largely body-site specific. We provide detailed analysis of several CRISPR loci, including novel CRISPRs. For example, known streptococcal CRISPRs were identified in most oral microbiomes, totaling ∼8,000 unique spacers: samples resampled from the same individual and oral site shared the most spacers; different oral sites from the same individual shared significantly fewer, while different individuals had almost no common spacers, indicating the impact of subtle niche differences on the evolution of CRISPR defenses. We further demonstrate potential applications of CRISPRs to the tracing of rare species and the virus exposure of individuals. 
This work indicates the importance of effective identification and characterization of CRISPR loci to the study of the dynamic ecology of microbiomes.
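The recruit-then-extract idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes exact string matching against the repeat consensus (the paper recruits reads by similarity and then assembles them), it skips the assembly step entirely, and the function names, length thresholds, and toy sequences are all hypothetical.

```python
# Toy illustration only: exact matching stands in for similarity-based
# recruitment, and no assembly is performed.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def recruit_reads(reads, repeat):
    """Keep reads that contain the repeat on either strand.

    Reads matching on the reverse strand are flipped so that every
    recruited read carries the repeat in the forward orientation.
    """
    rc_repeat = revcomp(repeat)
    recruited = []
    for read in reads:
        if repeat in read:
            recruited.append(read)
        elif rc_repeat in read:
            recruited.append(revcomp(read))
    return recruited


def extract_spacers(seq, repeat, min_len=20, max_len=60):
    """Return the segments lying between consecutive copies of the repeat.

    The first and last pieces are flanking sequence, not spacers.
    The length bounds are assumed values, not taken from the paper.
    """
    pieces = seq.split(repeat)
    return [p for p in pieces[1:-1] if min_len <= len(p) <= max_len]


def spacer_sharing(spacers_a, spacers_b):
    """Jaccard index of two spacer collections (0.0 if both are empty)."""
    a, b = set(spacers_a), set(spacers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Comparing the unique spacer sets of two samples with `spacer_sharing` gives one simple way to express the kind of between-sample spacer sharing the abstract reports; the Jaccard index is an assumption here, not necessarily the measure the authors used.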
Author Summary
Human bodies are complex ecological systems in which various microbial organisms and viruses interact with each other and with the human host. The Human Microbiome Project (HMP) has resulted in >700 datasets of shotgun metagenomic sequences, from which we can learn about the compositions and functions of human-associated microbial communities. CRISPR/Cas systems are a widespread class of adaptive immune systems in bacteria and archaea, providing acquired immunity against foreign nucleic acids: CRISPR/Cas defense pathways involve integration of viral- or plasmid-derived DNA segments into CRISPR arrays (forming spacers between repeated structural sequences), and expression of short crRNAs from these single repeat-spacer units, to generate interference against future invading foreign genomes. Powered by an effective computational approach (the targeted assembly approach for CRISPR), our analysis of CRISPR arrays in the HMP datasets provides the very first global view of bacterial immunity systems in human-associated microbial communities. The great diversity of CRISPR spacers we observed among different body sites, in different individuals, and in single individuals over time indicates the impact of subtle niche differences on the evolution of CRISPR defenses and points to the key role of bacteriophages (and plasmids) in shaping human microbial communities.
PMCID: PMC3374615  PMID: 22719260
25.  Development of the Human Infant Intestinal Microbiota 
PLoS Biology  2007;5(7):e177.
Almost immediately after a human being is born, so too is a new microbial ecosystem, one that resides in that person's gastrointestinal tract. Although it is a universal and integral part of human biology, the temporal progression of this process, the sources of the microbes that make up the ecosystem, how and why it varies from one infant to another, and how the composition of this ecosystem influences human physiology, development, and disease are still poorly understood. As a step toward systematically investigating these questions, we designed a microarray to detect and quantitate the small subunit ribosomal RNA (SSU rRNA) gene sequences of most currently recognized species and taxonomic groups of bacteria. We used this microarray, along with sequencing of cloned libraries of PCR-amplified SSU rDNA, to profile the microbial communities in an average of 26 stool samples each from 14 healthy, full-term human infants, including a pair of dizygotic twins, beginning with the first stool after birth and continuing at defined intervals throughout the first year of life. To investigate possible origins of the infant microbiota, we also profiled vaginal and milk samples from most of the mothers, and stool samples from all of the mothers, most of the fathers, and two siblings. The composition and temporal patterns of the microbial communities varied widely from baby to baby. Despite considerable temporal variation, the distinct features of each baby's microbial community were recognizable for intervals of weeks to months. The strikingly parallel temporal patterns of the twins suggested that incidental environmental exposures play a major role in determining the distinctive characteristics of the microbial community in each baby. By the end of the first year of life, the idiosyncratic microbial ecosystems in each baby, although still distinct, had converged toward a profile characteristic of the adult gastrointestinal tract.
Author Summary
It has been recognized for nearly a century that human beings are inhabited by a remarkably dense and diverse microbial ecosystem, yet we are only just beginning to understand and appreciate the many roles that these microbes play in human health and development. Knowing the composition of this ecosystem is a crucial step toward understanding its roles. In this study, we designed and applied a ribosomal DNA microarray-based approach to trace the development of the intestinal flora in 14 healthy, full-term infants over the first year of life. We found that the composition and temporal patterns of the microbial communities varied widely from baby to baby, supporting a broader definition of healthy colonization than previously recognized. By one year of age, the babies retained their uniqueness but had converged toward a profile characteristic of the adult gastrointestinal tract. The composition and temporal patterns of development of the intestinal microbiota in a pair of fraternal twins were strikingly similar, suggesting that genetic and environmental factors shape our gut microbiota in a reproducible way.
Microarray profiling of the microbial communities of infant guts throughout the first year shows initial variation then convergence on the adult flora, providing new insight into this human ecosystem.
PMCID: PMC1896187  PMID: 17594176
