Related Articles

1.  PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data 
PLoS Computational Biology  2011;7(1):e1001061.
Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
Author Summary
Microorganisms comprise the majority of the biodiversity on the planet. Because the overwhelming majority of microbes are not readily cultured in the laboratory, researchers often rely on PCR-based investigations of genomic sequence to characterize microbial diversity. These analyses have dramatically expanded our understanding of biodiversity, but due to methodological biases PCR-based approaches may only reveal part of the microbial biosphere. Shotgun sequencing of environmental DNA, known as metagenomics, avoids the biases associated with targeted amplification of genomic sequence and can provide insight into the diversity hidden from traditional investigations. However, the fragmentary, non-overlapping nature of shotgun sequence data makes it intractable to analyze with existing tools. Here, we present PhylOTU, a novel computational method that enables accurate characterization of microbial diversity from metagenomic data. We process over 10 million metagenomic sequences obtained from the global open ocean to identify novel Bacterial taxa and reveal the presence of microorganisms overlooked by investigation of PCR-based sequences from the same samples. These results suggest that fully characterizing microbial biodiversity requires a novel bioinformatics toolbox for analysis of shotgun metagenomic data.
PMCID: PMC3024254  PMID: 21283775
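The OTU idea in the abstract above, grouping sequences whose pairwise distance falls under a cutoff, can be sketched as follows. This is a generic single-linkage illustration and not PhylOTU's procedure: it assumes pairwise distances between reads are already available (the abstract notes that non-overlapping shotgun reads cannot be compared directly, which is the problem PhylOTU addresses with phylogenetic profiles). The function name `cluster_otus`, the read names, the distances, and the 0.03 cutoff are all hypothetical.

```python
from itertools import combinations


def cluster_otus(distances, names, cutoff=0.03):
    """Single-linkage clustering: reads within `cutoff` of each other share an OTU."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        d = distances.get((a, b), distances.get((b, a)))
        if d is not None and d <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    otus = {}
    for n in names:
        otus.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in otus.values())


# Hypothetical pairwise distances (assumed precomputed, e.g. from a tree).
names = ["r1", "r2", "r3", "r4"]
distances = {
    ("r1", "r2"): 0.02, ("r1", "r3"): 0.04, ("r1", "r4"): 0.25,
    ("r2", "r3"): 0.025, ("r2", "r4"): 0.26, ("r3", "r4"): 0.24,
}
otus = cluster_otus(distances, names)
print(otus)
```

Note that r1 and r3 end up in the same OTU even though their direct distance exceeds the cutoff, because both are close to r2. This chaining behavior is specific to single linkage; other linkage rules would group the reads differently.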
2.  Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data 
Using metagenomic ‘parts lists' to study microbial ecology remains a significant challenge. This work proposes a molecular trait-based approach to biogeography by integrating metagenomic data with external metadata and using functional community composition as readout.
Climatic factors drive functional and phylogenetic composition of ocean microbial communities. Function dispersal is controlled by environmental conditions. Functional richness has a clear latitudinal gradient and correlates with primary production. Metagenomic data can be used as a predictor for ecosystem processes. To understand the relationship between community composition and environment, functional readouts are the most direct. Metagenomic data enable such trait-based ecology at the molecular level.
Metagenomics (shotgun sequencing of pooled DNA of complete microbial communities) is widely used to investigate ecosystem functioning of environmental and clinical samples. However, the nature of this data (usually a gigantic collection of gene fragments of 1000s of organisms) makes it very hard to infer global patterns on microbial ecology of the environment at hand. To address important ecological questions such as ‘How do microbial communities adapt to the environmental conditions?', ‘What drives the functional variation across the globe and to what extent do genes disperse?' and ‘What drives variation of CO2 uptake across different locations and communities?', we integrated 25 ocean metagenomes from the Global Ocean Sampling project with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the functional and phylogenetic composition of an environment and the main limiting factor on whether functions disperse across the planet. We find a distinct latitudinal gradient in the size and diversity of the functional repertoire of ocean microbial communities, peaking at 20°N, and which correlates with oceanic CO2 uptake. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes can be used as a quantitative predictor for molecular trait-based biogeography and ecology.
Using metagenomic ‘parts lists' to infer global patterns on microbial ecology remains a significant challenge. To deduce important ecological indicators such as environmental adaptation, molecular trait dispersal, diversity variation and primary production from the gene pool of an ecosystem, we integrated 25 ocean metagenomes with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the biomolecular repertoire of each sample and the main limiting factor on functional trait dispersal (absence of biogeographic provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20°N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology.
PMCID: PMC3094067  PMID: 21407210
ecosystems biology; environmental genomics; metagenomics; microbiology; molecular trait-based ecology
3.  The Phylogenetic Diversity of Metagenomes 
PLoS ONE  2011;6(8):e23214.
Phylogenetic diversity—patterns of phylogenetic relatedness among organisms in ecological communities—provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context.
PMCID: PMC3166145  PMID: 21912589
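The contrast drawn above between phylogenetic diversity and taxonomic diversity can be made concrete with a small sketch. It uses Faith's PD (the total branch length connecting the taxa present to the root), which is one common phylogenetic diversity measure; the abstract does not state which metric the authors used, so this choice is an assumption. The toy tree, taxon names, and the function `faith_pd` are hypothetical.

```python
def faith_pd(tree, present):
    """Sum the branch lengths on the paths from each present tip to the root.

    `tree` maps each non-root node to (parent, branch_length); shared
    branches are counted once.
    """
    visited = set()
    total = 0.0
    for tip in present:
        node = tip
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Toy tree: A and B are close relatives, C is a distant lineage.
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 4.0),
}

pd_close = faith_pd(tree, ["A", "B"])  # two closely related taxa
pd_far = faith_pd(tree, ["A", "C"])    # two distantly related taxa
print(pd_close, pd_far)
```

Both toy communities contain two taxa, so a simple taxon count treats them as equally diverse, while the branch-length sum distinguishes them.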
4.  Social networks predict gut microbiome composition in wild baboons 
eLife  2015;4:e05224.
Social relationships have profound effects on health in humans and other primates, but the mechanisms that explain this relationship are not well understood. Using shotgun metagenomic data from wild baboons, we found that social group membership and social network relationships predicted both the taxonomic structure of the gut microbiome and the structure of genes encoded by gut microbial species. Rates of interaction directly explained variation in the gut microbiome, even after controlling for diet, kinship, and shared environments. They therefore strongly implicate direct physical contact among social partners in the transmission of gut microbial species. We identified 51 socially structured taxa, which were significantly enriched for anaerobic and non-spore-forming lifestyles. Our results argue that social interactions are an important determinant of gut microbiome composition in natural animal populations—a relationship with important ramifications for understanding how social relationships influence health, as well as the evolution of group living.
eLife digest
The digestive system is home to a complex community of microbes—known as the gut microbiome—that contributes to our health and wellbeing by digesting food, producing essential vitamins, and preventing the growth of harmful bacteria. The recent development of rapid genome sequencing techniques has made it much easier to identify the species of microbes found in the gut microbiome, and how this microbiome's composition varies between individuals.
Studies in humans and other primates suggest that direct contact during social interactions may alter the composition of the gut microbiome in an individual. This could explain why there is a strong association between social interactions and health in humans and other social animals. However, similarities in the gut microbiomes of individuals within a social group could also be due to a shared diet or a common environment. The information collected during long-term studies of wild primates offers an opportunity to analyze and assess the influence of diet, environment and social interaction on the gut microbiome.
Here, Tung et al. studied the gut microbiomes of 48 wild baboons belonging to two different social groups in Amboseli, Kenya. Using a technique called shotgun metagenomic sequencing, they sequenced DNA extracted from samples of feces collected from individual baboons. The sequence data revealed that an individual's social group and social network can predict the species found in its gut microbiome. This remained the case even when other factors—such as diet, kinship, and shared environments—were taken into account.
Tung et al.'s findings suggest that direct physical contact during social interactions may be important in transmitting gut microbiomes between members of the same social group. However, scientists still don't know whether this exchange is good or bad for the health of the baboons. Future work will try to understand whether baboons benefit from acquiring gut microbes from their group members, and if the gut microbes of some social groups are better than others.
PMCID: PMC4379495  PMID: 25774601
Papio cynocephalus; social behavior; gut microbiome; metagenomics; transmission; social network; other
5.  Incorporating 16S Gene Copy Number Information Improves Estimates of Microbial Diversity and Abundance 
PLoS Computational Biology  2012;8(10):e1002743.
The abundance of different SSU rRNA (“16S”) gene sequences in environmental samples is widely used in studies of microbial ecology as a measure of microbial community structure and diversity. However, the genomic copy number of the 16S gene varies greatly – from one in many species to up to 15 in some bacteria and to hundreds in some microbial eukaryotes. As a result of this variation the relative abundance of 16S genes in environmental samples can be attributed both to variation in the relative abundance of different organisms, and to variation in genomic 16S copy number among those organisms. Despite this fact, many studies assume that the abundance of 16S gene sequences is a surrogate measure of the relative abundance of the organisms containing those sequences. Here we present a method that uses data on sequences and genomic copy number of 16S genes along with phylogenetic placement and ancestral state estimation to estimate organismal abundances from environmental DNA sequence data. We use theory and simulations to demonstrate that 16S genomic copy number can be accurately estimated from the short reads typically obtained from high-throughput environmental sequencing of the 16S gene, and that organismal abundances in microbial communities are more strongly correlated with estimated abundances obtained from our method than with gene abundances. We re-analyze several published empirical data sets and demonstrate that the use of gene abundance versus estimated organismal abundance can lead to different inferences about community diversity and structure and the identity of the dominant taxa in microbial communities. Our approach will allow microbial ecologists to make more accurate inferences about microbial diversity and abundance based on 16S sequence data.
Author Summary
Microbial ecologists cannot observe their study organisms directly, so they use molecular sequencing to measure the abundance of different microbes living in the wild. The most commonly used method for measuring the abundance of different microbes is to collect a DNA sample from an environment and sequence a particular gene, the 16S SSU rRNA gene (“16S”) from those samples. The abundance of 16S sequences from different microbes is then used as a surrogate measure of the abundance of the microbial taxa in the community. One problem with the use of the 16S gene as a measure of microbial abundance is that many microbes have multiple copies of the gene in their genome. Thus, variation in 16S gene abundances can be caused by both genomic copy number variation and variation in the abundance of organisms. In this study we present a computational method that allows estimation of the abundance and genomic 16S copy number of microbes based on environmental sequencing of the 16S gene. We use simulations and analysis of microbial community data sets to demonstrate that estimating the abundance of organisms from 16S data improves our ability to accurately measure the diversity and abundance of microbial communities.
PMCID: PMC3486904  PMID: 23133348
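The core correction described above, dividing 16S gene counts by genomic copy number before computing relative abundance, can be sketched as follows. This is a minimal illustration that assumes the copy number of each taxon is already known; the published method estimates copy numbers through phylogenetic placement and ancestral state estimation, which is not reproduced here. The taxon names, counts, and the function `organism_abundance` are hypothetical.

```python
def organism_abundance(gene_counts, copy_numbers):
    """Convert 16S gene counts into relative organismal abundances."""
    # Estimated number of organisms = gene copies observed / copies per genome.
    cells = {taxon: count / copy_numbers[taxon]
             for taxon, count in gene_counts.items()}
    total = sum(cells.values())
    return {taxon: n / total for taxon, n in cells.items()}


# Two taxa with identical 16S read counts but different copy numbers.
gene_counts = {"taxon_A": 100, "taxon_B": 100}
copy_numbers = {"taxon_A": 1, "taxon_B": 10}

abundance = organism_abundance(gene_counts, copy_numbers)
print(abundance)
```

Taken at face value, the gene counts suggest the two taxa are equally abundant. After correction, taxon_A accounts for 100/110 of the organisms, which illustrates how uncorrected counts can misidentify the dominant taxa.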
6.  The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific 
PLoS Biology  2007;5(3):e77.
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand km transect from the North Atlantic through the Panama Canal and ending in the South Pacific yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preference.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
Author Summary
Marine microbes remain elusive and mysterious, even though they are the most abundant life form in the ocean, form the base of the marine food web, and drive energy and nutrient cycling. We know so little about the vast majority of microbes because only a small percentage can be cultivated and studied in the lab. Here we report on the Global Ocean Sampling expedition, an environmental metagenomics project that aims to shed light on the role of marine microbes by sequencing their DNA without first needing to isolate individual organisms. A total of 41 different samples were taken from a wide variety of aquatic habitats collected over 8,000 km. The resulting 7.7 million sequencing reads provide an unprecedented look at the incredible diversity and heterogeneity in naturally occurring microbial populations. We have developed new bioinformatic methods to reconstitute large portions of both cultured and uncultured microbial genomes. Organism diversity is analyzed in relation to sampling locations and environmental pressures. Taken together, these data and analyses serve as a foundation for greatly expanding our understanding of individual microbial lineages and their evolution, the nature of marine microbial communities, and how they are impacted by and impact our world.
The Sorcerer II GOS expedition, data sampling, and analysis are described. The immense diversity in the sequence data required novel comparative genomic assembly methods, which uncovered genomic differences that marker-based methods could not.
PMCID: PMC1821060  PMID: 17355176
7.  The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families 
PLoS Biology  2007;5(3):e16.
Metagenomics projects based on shotgun sequencing of populations of micro-organisms yield insight into protein families. We used sequence similarity clustering to explore proteins with a comprehensive dataset consisting of sequences from available databases together with 6.12 million proteins predicted from an assembly of 7.7 million Global Ocean Sampling (GOS) sequences. The GOS dataset covers nearly all known prokaryotic protein families. A total of 3,995 medium- and large-sized clusters consisting of only GOS sequences are identified, out of which 1,700 have no detectable homology to known families. The GOS-only clusters contain a higher than expected proportion of sequences of viral origin, thus reflecting a poor sampling of viral diversity until now. Protein domain distributions in the GOS dataset and current protein databases show distinct biases. Several protein domains that were previously categorized as kingdom specific are shown to have GOS examples in other kingdoms. About 6,000 sequences (ORFans) from the literature that heretofore lacked similarity to known proteins have matches in the GOS data. The GOS dataset is also used to improve remote homology detection. Overall, besides nearly doubling the number of current proteins, the predicted GOS proteins also add a great deal of diversity to known protein families and shed light on their evolution. These observations are illustrated using several protein families, including phosphatases, proteases, ultraviolet-irradiation DNA damage repair enzymes, glutamine synthetase, and RuBisCO. The diversity added by GOS data has implications for choosing targets for experimental structure characterization as part of structural genomics efforts. Our analysis indicates that new families are being discovered at a rate that is linear or almost linear with the addition of new sequences, implying that we are still far from discovering all protein families in nature.
Author Summary
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. Given the wide-ranging roles microbes play in many ecosystems, metagenomics studies of microbial communities will reveal insights into protein families and their evolution. Because most microbes will not grow in the laboratory using current cultivation techniques, scientists have turned to cultivation-independent techniques to study microbial diversity. One such technique—shotgun sequencing—allows random sampling of DNA sequences to examine the genomic material present in a microbial community. We used shotgun sequencing to examine microbial communities in water samples collected by the Sorcerer II Global Ocean Sampling (GOS) expedition. Our analysis predicted more than six million proteins in the GOS data—nearly twice the number of proteins present in current databases. These predictions add tremendous diversity to known protein families and cover nearly all known prokaryotic protein families. Some of the predicted proteins had no similarity to any currently known proteins and therefore represent new families. A higher than expected fraction of these novel families is predicted to be of viral origin. We also found that several protein domains that were previously thought to be kingdom specific have GOS examples in other kingdoms. Our analysis opens the door for a multitude of follow-up protein family analyses and indicates that we are a long way from sampling all the protein families that exist in nature.
The GOS data identified 6.12 million predicted proteins covering nearly all known prokaryotic protein families, and several new families. This almost doubles the number of known proteins and shows that we are far from identifying all the proteins in nature.
PMCID: PMC1821046  PMID: 17355171
8.  Metabolic Reconstruction for Metagenomic Data and Its Application to the Human Microbiome 
PLoS Computational Biology  2012;8(6):e1002358.
Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. 
An implementation of our methodology is available at This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies.
Author Summary
The human body is inhabited by trillions of bacteria and other microbes, which have recently been studied in many different habitats (including gut, mouth, skin, and urogenital) by the Human Microbiome Project (HMP). These microbial communities were assayed using high-throughput DNA sequencing, but it can be challenging to determine their biological functions based solely on the resulting short sequences. To reconstruct the metabolic activities of such communities, we have developed HUMAnN, a method to accurately infer community function directly from short DNA reads. The method's accuracy was validated using a collection of synthetic microbial communities. Applying HUMAnN to data from the HMP, we showed that, unlike individual microbial species, many metabolic processes were present among all body habitats. However, the frequencies of these processes varied dramatically, and some were highly enriched within individual habitats to provide niche specialization (e.g. in the gut, which is abundant in food matter but low in oxygen). Other community functions were linked specifically to properties of the human host, such as biochemical processes only present in vaginal habitats with particularly high or low pH. Studying additional environmental or disease-associated communities using HUMAnN will further improve our understanding of how the microbial organisms in a community are linked to the biological processes they carry out.
PMCID: PMC3374609  PMID: 22719234
9.  Community transcriptomic assembly reveals microbes that contribute to deep-sea carbon and nitrogen cycling 
The ISME Journal  2013;7(10):1962-1973.
The deep ocean is an important component of global biogeochemical cycles because it contains one of the largest pools of reactive carbon and nitrogen on earth. However, the microbial communities that drive deep-sea geochemistry are vastly unexplored. Metatranscriptomics offers new windows into these communities, but it has been hampered by reliance on genome databases for interpretation. We reconstructed the transcriptomes of microbial populations from Guaymas Basin, in the deep Gulf of California, through shotgun sequencing and de novo assembly of total community RNA. Many of the resulting messenger RNA (mRNA) contiguous sequences contain multiple genes, reflecting co-transcription of operons, including those from dominant members. Also prevalent were transcripts with only limited representation (2.8 times coverage) in a corresponding metagenome, including a considerable portion (1.2 Mb total assembled mRNA sequence) with similarity (96%) to a marine heterotroph, Alteromonas macleodii. This Alteromonas and euryarchaeal marine group II populations displayed abundant transcripts from amino-acid transporters, suggesting recycling of organic carbon and nitrogen from amino acids. Also among the most abundant mRNAs were catalytic subunits of the nitrite oxidoreductase complex and electron transfer components involved in nitrite oxidation. These and other novel genes are related to novel Nitrospirae and have limited representation in accompanying metagenomic data. High throughput sequencing of 16S ribosomal RNA (rRNA) genes and rRNA read counts confirmed that Nitrospirae are minor yet widespread members of deep-sea communities. These results implicate a novel bacterial group in deep-sea nitrite oxidation, the second step of nitrification. This study highlights metatranscriptomic assembly as a valuable approach to study microbial communities.
PMCID: PMC3965313  PMID: 23702516
Archaea; deep sea; transcriptomics; nitrification; Alteromonas; Nitrospirae
10.  Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients 
The ISME Journal  2011;6(5):1007-1017.
Terrestrial ecosystems are receiving elevated inputs of nitrogen (N) from anthropogenic sources and understanding how these increases in N availability affect soil microbial communities is critical for predicting the associated effects on belowground ecosystems. We used a suite of approaches to analyze the structure and functional characteristics of soil microbial communities from replicated plots in two long-term N fertilization experiments located in contrasting systems. Pyrosequencing-based analyses of 16S rRNA genes revealed no significant effects of N fertilization on bacterial diversity, but significant effects on community composition at both sites; copiotrophic taxa (including members of the Proteobacteria and Bacteroidetes phyla) typically increased in relative abundance in the high N plots, with oligotrophic taxa (mainly Acidobacteria) exhibiting the opposite pattern. Consistent with the phylogenetic shifts under N fertilization, shotgun metagenomic sequencing revealed increases in the relative abundances of genes associated with DNA/RNA replication, electron transport and protein metabolism, increases that could be resolved even with the shallow shotgun metagenomic sequencing conducted here (average of 75 000 reads per sample). We also observed shifts in the catabolic capabilities of the communities across the N gradients that were significantly correlated with the phylogenetic and metagenomic responses, indicating possible linkages between the structure and functioning of soil microbial communities. Overall, our results suggest that N fertilization may, directly or indirectly, induce a shift in the predominant microbial life-history strategies, favoring a more active, copiotrophic microbial community, a pattern that parallels the often observed replacement of K-selected with r-selected plant species with elevated N.
PMCID: PMC3329107  PMID: 22134642
shotgun metagenomics; pyrosequencing; soil bacteria; nitrogen fertilization; soil carbon dynamics
11.  Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing 
PLoS Genetics  2008;4(11):e1000255.
Massively parallel pyrosequencing of hypervariable regions from small subunit ribosomal RNA (SSU rRNA) genes can sample a microbial community two or three orders of magnitude more deeply per dollar and per hour than capillary sequencing of full-length SSU rRNA. As with full-length rRNA surveys, each sequence read is a tag surrogate for a single microbe. However, rather than assigning taxonomy by creating gene trees de novo that include all experimental sequences and certain reference taxa, we compare the hypervariable region tags to an extensive database of rRNA sequences and assign taxonomy based on the best match in a Global Alignment for Sequence Taxonomy (GAST) process. The resulting taxonomic census provides information on both composition and diversity of the microbial community. To determine the effectiveness of using only hypervariable region tags for assessing microbial community membership, we compared the taxonomy assigned to the V3 and V6 hypervariable regions with the taxonomy assigned to full-length SSU rRNA sequences isolated from both the human gut and a deep-sea hydrothermal vent. The hypervariable region tags and full-length rRNA sequences provided equivalent taxonomy and measures of relative abundance of microbial communities, even for tags up to 15% divergent from their nearest reference match. The greater sampling depth per dollar afforded by massively parallel pyrosequencing reveals many more members of the “rare biosphere” than does capillary sequencing of the full-length gene. In addition, tag sequencing eliminates cloning bias and the sequences are short enough to be completely sequenced in a single read, maximizing the number of organisms sampled in a run while minimizing chimera formation. This technique allows the cost-effective exploration of changes in microbial community structure, including the rare biosphere, over space and time and can be applied immediately to initiatives, such as the Human Microbiome Project.
Author Summary
Microbes play a critical role in both human and environmental health. The more we explore microbial populations, the more complexity and diversity we find. Phylogenetic trees based on 16S ribosomal RNA genes have been used with great success to identify microbial taxonomy from DNA alone. New DNA sequencing technologies, such as massively parallel pyrosequencing, can provide orders of magnitude more DNA sequences than ever before; however, the sequences are much shorter, so new methods are necessary to identify the microbes from short DNA tags. We demonstrate the effectiveness of identifying microbial taxa by comparing short tags from 16S hypervariable regions against a large database of known 16S genes. Using this technique, hypervariable region tags provide taxonomy and relative abundances of microbial communities equivalent to those obtained from full-length rRNA sequences. The greater sampling depth afforded by tag pyrosequencing uncovers not only the dominant microbial species, but many more members of the “rare biosphere” than does capillary sequencing of the full-length gene. Tag pyrosequencing greatly enhances projects exploring composition, diversity, and distribution of microbial populations, such as the Human Microbiome Initiative. A companion paper in PLoS Biology (see Dethlefsen et al., doi:10.1371/journal.pbio.0060280) successfully uses this technique to characterize the effects of antibiotics on the human gut microbiota.
PMCID: PMC2577301  PMID: 19023400
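The best-match assignment described in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GAST implementation: it uses a unit-cost edit distance as a stand-in for global alignment, and the function names (`edit_distance`, `assign_taxonomy`) and the toy reference sequences are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost global alignment distance (Levenshtein) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # gap in b
                           cur[j - 1] + 1,             # gap in a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def assign_taxonomy(tag, references):
    """Return (taxonomy, divergence) of the reference closest to the tag.

    `references` maps a taxonomy string to a reference sequence
    (a hypothetical layout chosen for this sketch).
    Divergence is the edit distance divided by the longer sequence length.
    """
    best_taxonomy, best_divergence = None, None
    for taxonomy, ref_seq in references.items():
        divergence = edit_distance(tag, ref_seq) / max(len(tag), len(ref_seq))
        if best_divergence is None or divergence < best_divergence:
            best_taxonomy, best_divergence = taxonomy, divergence
    return best_taxonomy, best_divergence


# Toy reference database (made-up sequences, for illustration only).
references = {
    "Bacteria;Firmicutes": "ACGTACGTACGT",
    "Bacteria;Proteobacteria": "TTGACCGGTTAA",
}
taxonomy, divergence = assign_taxonomy("ACGTACGAACGT", references)
```

A real pipeline would add a tie-breaking rule for several equally good references and a policy for tags that are far from every reference; both are omitted here.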
12.  Strengths and Limitations of 16S rRNA Gene Amplicon Sequencing in Revealing Temporal Microbial Community Dynamics 
PLoS ONE  2014;9(4):e93827.
This study explored the short-term planktonic microbial community structure and resilience in Lake Lanier (GA, USA) while simultaneously evaluating the technical aspects of identifying taxa via 16S rRNA gene amplicon and metagenomic sequence data. 16S rRNA gene amplicons generated from four temporally discrete samples were sequenced with 454 GS-FLX-Ti yielding ∼40,000 rRNA gene sequences from each sample and representing ∼300 observed OTUs. Replicates obtained from the same biological sample clustered together but several biases were observed, linked to either the PCR or sequencing-preparation steps. In comparisons with companion whole-community shotgun metagenome datasets, the estimated number of OTUs at each timepoint was concordant, but 1.5 times and ∼10 times as many phyla and genera, respectively, were identified in the metagenomes. Our analyses showed that the 16S rRNA gene captures broad shifts in community diversity over time, but with limited resolution and lower sensitivity compared to metagenomic data. We also identified OTUs that showed marked shifts in abundance over four close timepoints separated by perturbations and tracked these taxa in the metagenome vs. 16S rRNA amplicon data. A strong summer storm had less of an effect on community composition than did seasonal mixing, which revealed a distinct succession of organisms. This study provides insights into freshwater microbial communities and advances the approaches for assessing community diversity and dynamics in situ.
PMCID: PMC3979728  PMID: 24714158
13.  Quantitative Metagenomic Analyses Based on Average Genome Size Normalization
Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. With both environments, to determine what proportion of each community they make up and how differences in environment affect their abundances, we characterize three different types of autotrophic organisms: aerobic, photosynthetic carbon fixers (the Cyanobacteria); anaerobic, photosynthetic carbon fixers (the Chlorobi); and anaerobic, nonphotosynthetic carbon fixers (the Desulfobacteraceae). These analyses demonstrate how genome proportionality compares to SSU rRNA gene relative abundance and how factors such as average genome size and SSU rRNA gene copy number affect sampling probability and therefore both types of community analysis.
PMCID: PMC3067418  PMID: 21317268
14.  Probabilistic Inference of Biochemical Reactions in Microbial Communities from Metagenomic Sequences 
PLoS Computational Biology  2013;9(3):e1002981.
Shotgun metagenomics has been applied to the studies of the functionality of various microbial communities. As a critical analysis step in these studies, biological pathways are reconstructed based on the genes predicted from metagenomic shotgun sequences. Pathway reconstruction provides insights into the functionality of a microbial community and can be used for comparing multiple microbial communities. The utilization of pathway reconstruction, however, can be jeopardized because of imperfect functional annotation of genes, and ambiguity in the assignment of predicted enzymes to biochemical reactions (e.g., some enzymes are involved in multiple biochemical reactions). Considering that metabolic functions in a microbial community are carried out by many enzymes in a collaborative manner, we present a probabilistic sampling approach to profiling functional content in a metagenomic dataset, by sampling functions of catalytically promiscuous enzymes within the context of the entire metabolic network defined by the annotated metagenome. We test our approach on metagenomic datasets from environmental and human-associated microbial communities. The results show that our approach provides a more accurate representation of the metabolic activities encoded in a metagenome, and thus improves the comparative analysis of multiple microbial communities. In addition, our approach reports likelihood scores of putative reactions, which can be used to identify important reactions and metabolic pathways that reflect the environmental adaptation of the microbial communities. Source code for sampling metabolic networks is available online at
Author Summary
We present a probabilistic sampling approach to profiling metabolic reactions in a microbial community from metagenomic shotgun reads, in an attempt to understand the metabolism within a microbial community and compare it across multiple communities. Unlike conventional pathway reconstruction approaches, which aim at a definitive set of reactions, our method estimates how likely each annotated reaction is to occur in the metabolism of the microbial community, given the shotgun sequencing data. This probabilistic measure improves our prediction of the actual metabolism in the microbial communities and can be used in the comparative functional analysis of metagenomic data.
PMCID: PMC3605055  PMID: 23555216
15.  Comparative metagenomics of bathypelagic plankton and bottom sediment from the Sea of Marmara 
The ISME journal  2010;5(2):285-304.
To extend comparative metagenomic analyses of the deep-sea, we produced metagenomic data by direct 454 pyrosequencing from bathypelagic plankton (1000 m depth) and bottom sediment of the Sea of Marmara, the gateway between the Eastern Mediterranean and the Black Seas. Data from small subunit ribosomal RNA (SSU rRNA) gene libraries and direct pyrosequencing of the same samples indicated that Gamma- and Alpha-proteobacteria, followed by Bacteroidetes, dominated the bacterial fraction in Marmara deep-sea plankton, whereas Planctomycetes, Delta- and Gamma-proteobacteria were the most abundant groups in high bacterial-diversity sediment. Group I Crenarchaeota/Thaumarchaeota dominated the archaeal plankton fraction, although group II and III Euryarchaeota were also present. Eukaryotes were highly diverse in SSU rRNA gene libraries, with group I (Duboscquellida) and II (Syndiniales) alveolates and Radiozoa dominating plankton, and Opisthokonta and Alveolates, sediment. However, eukaryotic sequences were scarce in pyrosequence data. Archaeal amo genes were abundant in plankton, suggesting that Marmara planktonic Thaumarchaeota are ammonia oxidizers. Genes involved in sulfate reduction, carbon monoxide oxidation, anammox and sulfatases were over-represented in sediment. Genome recruitment analyses showed that Alteromonas macleodii ‘surface ecotype’, Pelagibacter ubique and Nitrosopumilus maritimus were highly represented in 1000 m-deep plankton. A comparative analysis of Marmara metagenomes with ALOHA deep-sea and surface plankton, whale carcasses, Peru subsurface sediment and soil metagenomes clustered deep-sea Marmara plankton with deep-ALOHA plankton and whale carcasses, likely because of the suboxic conditions in the deep Marmara water column. The Marmara sediment clustered with the soil metagenome, highlighting the common ecological role of both types of microbial communities in the degradation of organic matter and the completion of biogeochemical cycles.
PMCID: PMC3105693  PMID: 20668488
deep-sea; anaerobic respiration; carbon fixation; carbon cycle; sulfate reduction; ammonia oxidation
16.  Sparse and Compositionally Robust Inference of Microbial Ecological Networks 
PLoS Computational Biology  2015;11(5):e1004226.
16S ribosomal RNA (rRNA) gene and other environmental sequencing techniques provide snapshots of microbial communities, revealing phylogeny and the abundances of microbial populations across diverse ecosystems. While changes in microbial community structure are demonstrably associated with certain environmental conditions (from metabolic and immunological health in mammals to ecological stability in soils and oceans), identification of underlying mechanisms requires new statistical tools, as these datasets present several technical challenges. First, the abundances of microbial operational taxonomic units (OTUs) from amplicon-based datasets are compositional. Counts are normalized to the total number of counts in the sample. Thus, microbial abundances are not independent, and traditional statistical metrics (e.g., correlation) for the detection of OTU-OTU relationships can lead to spurious results. Secondly, microbial sequencing-based studies typically measure hundreds of OTUs on only tens to hundreds of samples; thus, inference of OTU-OTU association networks is severely under-powered, and additional information (or assumptions) are required for accurate inference. Here, we present SPIEC-EASI (SParse InversE Covariance Estimation for Ecological Association Inference), a statistical method for the inference of microbial ecological networks from amplicon sequencing datasets that addresses both of these issues. SPIEC-EASI combines data transformations developed for compositional data analysis with a graphical model inference framework that assumes the underlying ecological association network is sparse. To reconstruct the network, SPIEC-EASI relies on algorithms for sparse neighborhood and inverse covariance selection. To provide a synthetic benchmark in the absence of an experimentally validated gold-standard network, SPIEC-EASI is accompanied by a set of computational tools to generate OTU count data from a set of diverse underlying network topologies. 
SPIEC-EASI outperforms state-of-the-art methods to recover edges and network properties on synthetic data under a variety of scenarios. SPIEC-EASI also reproducibly predicts previously unknown microbial associations using data from the American Gut project.
Author Summary
Genomic survey of microbes by 16S rRNA gene sequencing and metagenomics has inspired appreciation for the role of complex communities in diverse ecosystems. However, due to the unique properties of community composition data, standard data analysis tools are likely to produce statistical artifacts. For a typical experiment studying microbial ecosystems these artifacts can lead to erroneous conclusions about patterns of associations between microbial taxa. We developed a new procedure that seeks to infer ecological associations between microbial populations, by 1) taking advantage of the proportionality invariance of relative abundance data and 2) making assumptions about the underlying network structure when the number of taxa in the dataset is larger than the number of sampled communities. Additionally, we employed a novel tool to generate biologically plausible synthetic data and objectively benchmark current association inference tools. Finally, we tested our procedures on a large-scale 16S rRNA gene sequencing dataset sampled from the human gut.
PMCID: PMC4423992  PMID: 25950956
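The compositionality problem raised in the abstract above can be illustrated with a centered log-ratio (CLR) transform, one of the standard transformations from compositional data analysis. This is a minimal sketch under stated assumptions, not the SPIEC-EASI code: the abstract does not name a specific transformation, and the function name `clr` and the pseudocount of 1 used to handle zero counts are choices made for this example.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample's OTU counts.

    Each count is shifted by `pseudocount` (an assumption of this sketch,
    needed because log(0) is undefined), log-transformed, and centered on
    the mean log value, so the transformed values sum to zero.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# One sample with three OTUs, including a zero count.
sample = clr([10, 0, 90])

# Without a pseudocount the transform is unchanged when every count is
# multiplied by the same factor, which is the proportionality invariance
# that makes it suitable for relative abundance data.
shallow = clr([1, 2, 3], pseudocount=0.0)
deep = clr([10, 20, 30], pseudocount=0.0)
```

Because the CLR values depend only on ratios between OTUs, sequencing depth drops out, which is why covariance or network estimation is usually run on transformed values rather than on raw proportions.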
17.  Analysis of Shotgun Metagenomes with MG-RAST 
The field of metagenomics is transforming our ability to study the enormous biomass and diversity of microbial life around us. Understanding this microbial world will lead to advances and practical applications in a broad range of fields. Metagenomic sequencing provides unprecedented access to the thousands (or even millions) of microbes in an environment. Unlike 16S SSU rRNA amplicon sequencing, metagenomic sequencing (whole shotgun sequencing) provides information on not only who is in a community but what they are doing, extending understanding of community structure towards interactions within an environment. This talk will discuss the MG-RAST analysis pipeline, from quality control assessment to annotation, and give an overview of the interactive tools for comparative analysis. MG-RAST has analyzed over 60,000 WGS and amplicon datasets equaling approximately 20 Tbp.
PMCID: PMC3635262
18.  Phylogenetic Molecular Ecological Network of Soil Microbial Communities in Response to Elevated CO2 
mBio  2011;2(4):e00122-11.
Understanding the interactions among different species and their responses to environmental changes, such as elevated atmospheric concentrations of CO2, is a central goal in ecology but is poorly understood in microbial ecology. Here we describe a novel random matrix theory (RMT)-based conceptual framework to discern phylogenetic molecular ecological networks using metagenomic sequencing data of 16S rRNA genes from grassland soil microbial communities, which were sampled from a long-term free-air CO2 enrichment experimental facility at the Cedar Creek Ecosystem Science Reserve in Minnesota. Our experimental results demonstrated that an RMT-based network approach is very useful in delineating phylogenetic molecular ecological networks of microbial communities based on high-throughput metagenomic sequencing data. The structure of the identified networks under ambient and elevated CO2 levels was substantially different in terms of overall network topology, network composition, node overlap, module preservation, module-based higher-order organization, topological roles of individual nodes, and network hubs, suggesting that the network interactions among different phylogenetic groups/populations were markedly changed. Also, the changes in network structure were significantly correlated with soil carbon and nitrogen contents, indicating the potential importance of network interactions in ecosystem functioning. In addition, based on network topology, microbial populations potentially most important to community structure and ecosystem functioning can be discerned. The novel approach described in this study is important not only for research on biodiversity, microbial ecology, and systems microbiology but also for microbial community studies in human health, global change, and environmental management.
The interactions among different microbial populations in a community play critical roles in determining ecosystem functioning, but very little is known about the network interactions in a microbial community, owing to the lack of appropriate experimental data and computational analytic tools. High-throughput metagenomic technologies can rapidly produce a massive amount of data, but one of the greatest difficulties is deciding how to extract, analyze, synthesize, and transform such a vast amount of information into biological knowledge. This study provides a novel conceptual framework to identify microbial interactions and key populations based on high-throughput metagenomic sequencing data. This study is among the first to document that the network interactions among different phylogenetic populations in soil microbial communities were substantially changed by a global change such as an elevated CO2 level. The framework developed will allow microbiologists to address research questions which could not be approached previously, and hence, it could represent a new direction in microbial ecology research.
PMCID: PMC3143843  PMID: 21791581
19.  Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale 
PLoS Computational Biology  2014;10(4):e1003594.
Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate ‘true’ microbial taxa. Here, we explore the ecological consistency of OTUs – based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale.
Author Summary
To characterize the composition of microbial communities, researchers often sequence and quantify specific marker genes, particularly the SSU (‘small subunit’) ribosomal RNA gene. One crucial step in such studies is the clustering of sequences into Operational Taxonomic Units (OTUs) of closely related organisms. However, this practice has repeatedly been called into question, arguing that the use of OTUs is not backed by microbial speciation theory. Here, we explore whether OTUs group ecologically similar organisms and show that indeed, OTUs are generally ecologically consistent. Moreover, we show how ecological consistency can be used as a measure of OTU ‘quality’ and compare different widely used OTU clustering methods. Our findings should help in the design and interpretation of SSU-based microbial ecology studies, in a research field that is only beginning to unfold its full potential to help understand life at the smallest scales.
PMCID: PMC3998914  PMID: 24763141
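Hierarchical complete linkage clustering, which the abstract above recommends as the default for OTU picking, can be sketched as follows. This is a naive illustration under stated assumptions rather than any published tool: sequences are assumed to be pre-aligned and of equal length, distance is the plain fraction of mismatched positions, and the function names and the threshold used in the example are hypothetical.

```python
def p_distance(a, b):
    """Fraction of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_otus(seqs, threshold):
    """Greedy agglomerative clustering with complete linkage.

    Two clusters are merged only if the largest pairwise distance between
    their members does not exceed `threshold`. Returns a list of clusters,
    each a list of indices into `seqs`.
    """
    clusters = [[i] for i in range(len(seqs))]

    def cluster_distance(c1, c2):
        # Complete linkage: the farthest pair defines the cluster distance.
        return max(p_distance(seqs[i], seqs[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # no remaining pair of clusters is close enough
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy aligned sequences (made up for illustration).
seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTTTTTTTTT"]
otus = complete_linkage_otus(seqs, threshold=0.15)
```

Complete linkage guarantees that every pair of sequences inside an OTU is within the threshold, unlike single linkage, which can chain distant sequences together through intermediates. The quadratic search over cluster pairs is only practical for small inputs.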
20.  The Systemic Imprint of Growth and Its Uses in Ecological (Meta)Genomics 
PLoS Genetics  2010;6(1):e1000808.
Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast-degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast-growers, that a toxic environment, such as the acid mine biofilm, selects for low growth rates, whereas a diverse environment, like the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of the baby gut grow faster than stabilized adult human gut communities. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times. 
Our predictor allows inferring growth rates in the vast majority of uncultivable prokaryotes and paves the way to the understanding of community dynamics from metagenomic data.
Author Summary
Microbial minimal generation times vary from a few minutes to several weeks. The reasons for this disparity have been thought to lie in different life-history strategies: fast-growing microbes grow extremely fast in rich media, but are less capable of dealing with stress and/or poor nutrient conditions. Prokaryotes have evolved a set of genomic traits to grow fast, including biased codon usage and transient or permanent gene multiplication for dosage effects. Here, we studied the relative role of these traits and show they can be used to predict minimal generation times from the genomic data of the vast majority of microbes that cannot be cultivated. We show that this inference can also be made with incomplete genomes and thus be applied to metagenomic data to test hypotheses about the biomass productivity of biotopes and the evolution of microbiota in the human gut after birth. Our results also allow a better understanding of the co-evolution between growth rates and genomic traits and how they can be manipulated in synthetic biology. Growth rates have been a key variable in microbial physiology studies in the last century, and we show how intimately they are linked with genome organization and prokaryotic ecology.
PMCID: PMC2797632  PMID: 20090831
21.  An in vitro biofilm model system maintaining a highly reproducible species and metabolic diversity approaching that of the human oral microbiome 
Microbiome  2013;1:25.
Our knowledge of microbial diversity in the human oral cavity has vastly expanded during the last two decades of research. However, much of what is known about the behavior of oral species to date derives from pure culture approaches and the studies combining several cultivated species, which likely does not fully reflect their function in complex microbial communities. It has been shown in studies with a limited number of cultivated species that early oral biofilm development occurs in a successional manner and that continuous low pH can lead to an enrichment of aciduric species. Observations that in vitro grown plaque biofilm microcosms can maintain similar pH profiles in response to carbohydrate addition as plaque in vivo suggests a complex microbial community can be established in the laboratory. In light of this, our primary goal was to develop a robust in vitro biofilm-model system from a pooled saliva inoculum in order to study the stability, reproducibility, and development of the oral microbiome, and its dynamic response to environmental changes from the community to the molecular level.
Comparative metagenomic analyses confirmed a high similarity of metabolic potential in biofilms to recently available oral metagenomes from healthy subjects as part of the Human Microbiome Project. A time-series metagenomic analysis of the taxonomic community composition in biofilms revealed that the proportions of major species at 3 hours of growth are maintained during 48 hours of biofilm development. By employing deep pyrosequencing of the 16S rRNA gene to investigate this biofilm model with regards to bacterial taxonomic diversity, we show a high reproducibility of the taxonomic carriage and proportions between: 1) individual biofilm samples; 2) biofilm batches grown at different dates; 3) DNA extraction techniques and 4) research laboratories.
Our study demonstrates that we now have the capability to grow stable oral microbial in vitro biofilms containing more than one hundred operational taxonomic units (OTU) which represent 60-80% of the original inoculum OTU richness. Previously uncultivated Human Oral Taxa (HOT) were identified in the biofilms and contributed to approximately one-third of the totally captured 16S rRNA gene diversity. To our knowledge, this represents the highest oral bacterial diversity reported for an in vitro model system so far. This robust model will help investigate currently uncultivated species and the known virulence properties for many oral pathogens not solely restricted to pure culture systems, but within multi-species biofilms.
PMCID: PMC3971625  PMID: 24451062
In vitro model; Biofilm; Oral microbiome; Saliva; Streptococcus; Lactobacillus; Uncultivated bacteria
22.  Metagenomic and Metabolic Profiling of Nonlithifying and Lithifying Stromatolitic Mats of Highborne Cay, The Bahamas 
PLoS ONE  2012;7(5):e38229.
Stromatolites are laminated carbonate build-ups formed by the metabolic activity of microbial mats and represent one of the oldest known ecosystems on Earth. In this study, we examined a living stromatolite located within the Exuma Sound, The Bahamas and profiled the metagenome and metabolic potential underlying these complex microbial communities.
Methodology/Principal Findings
The metagenomes of the two dominant stromatolitic mat types, a nonlithifying (Type 1) and lithifying (Type 3) microbial mat, were partially sequenced and compared. This deep-sequencing approach was complemented by profiling the substrate utilization patterns of the mats using metabolic microarrays. Taxonomic assessment of the protein-encoding genes confirmed previous SSU rRNA analyses that bacteria dominate the metagenome of both mat types. Eukaryotes comprised less than 13% of the metagenomes and were rich in sequences associated with nematodes and heterotrophic protists. Comparative genomic analyses of the functional genes revealed extensive similarities in most of the subsystems between the nonlithifying and lithifying mat types. The one exception was an increase in the relative abundance of certain genes associated with carbohydrate metabolism in the lithifying Type 3 mats. Specifically, genes associated with the degradation of carbohydrates commonly found in exopolymeric substances, such as hexoses, deoxy- and acidic sugars were found. The genetic differences in carbohydrate metabolisms between the two mat types were confirmed using metabolic microarrays. Lithifying mats had a significant increase in diversity and utilization of carbon, nitrogen, phosphorus and sulfur substrates.
The two stromatolitic mat types retained similar microbial communities, functional diversity and many genetic components within their metagenomes. However, there were major differences detected in the activity and genetic pathways of organic carbon utilization. These differences provide a strong link between the metagenome and the physiology of the mats, as well as new insights into the biological processes associated with carbonate precipitation in modern marine stromatolites.
PMCID: PMC3360630  PMID: 22662280
23.  Microbial Co-occurrence Relationships in the Human Microbiome 
PLoS Computational Biology  2012;8(7):e1002606.
The healthy microbiota show remarkable variability within and among individuals. In addition to external exposures, ecological relationships (both oppositional and symbiotic) between microbial inhabitants are important contributors to this variation. It is thus of interest to assess what relationships might exist among microbes and determine their underlying reasons. The initial Human Microbiome Project (HMP) cohort, comprising 239 individuals and 18 different microbial habitats, provides an unprecedented resource to detect, catalog, and analyze such relationships. Here, we applied an ensemble method based on multiple similarity measures in combination with generalized boosted linear models (GBLMs) to taxonomic marker (16S rRNA gene) profiles of this cohort, resulting in a global network of 3,005 significant co-occurrence and co-exclusion relationships between 197 clades occurring throughout the human microbiome. This network revealed strong niche specialization, with most microbial associations occurring within body sites and a number of accompanying inter-body site relationships. Microbial communities within the oropharynx grouped into three distinct habitats, which themselves showed no direct influence on the composition of the gut microbiota. Conversely, niches such as the vagina demonstrated little to no decomposition into region-specific interactions. Diverse mechanisms underlay individual interactions, with some such as the co-exclusion of Porphyromonaceae family members and Streptococcus in the subgingival plaque supported by known biochemical dependencies. These differences varied among broad phylogenetic groups as well, with the Bacilli and Fusobacteria, for example, both enriched for exclusion of taxa from other clades. Comparing phylogenetic versus functional similarities among bacteria, we show that dominant commensal taxa (such as Prevotellaceae and Bacteroides in the gut) often compete, while potential pathogens (e.g. Treponema and Prevotella in the dental plaque) are more likely to co-occur in complementary niches. This approach thus serves to open new opportunities for future targeted mechanistic studies of the microbial ecology of the human microbiome.
Author Summary
The human body is a complex ecosystem where microbes compete, and cooperate. These interactions can support health or promote disease, e.g. in dental plaque formation. The Human Microbiome Project collected and sequenced ca. 5,000 samples from 18 different body sites, including the airways, gut, skin, oral cavity and vagina. These data allowed the first assessment of significant patterns of co-presence and exclusion among human-associated bacteria. We combined sparse regression with an ensemble of similarity measures to predict microbial relationships within and between body sites. This captured known relationships in the dental plaque, vagina, and gut, and also predicted novel interactions involving members of under-characterized phyla such as TM7. We detected relationships necessary for plaque formation and differences in community composition among dominant members of the gut and vaginal microbiomes. Most relationships were strongly niche-specific, with only a few hub microorganisms forming links across multiple body areas. We also found that phylogenetic distance had a strong impact on the interaction type: closely related microorganisms co-occurred within the same niche, whereas most exclusive relationships occurred between more distantly related microorganisms. This establishes both the specific organisms and general principles by which microbial communities associated with healthy humans are assembled and maintained.
PMCID: PMC3395616  PMID: 22807668
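The ensemble idea summarized in this entry (accepting a relationship between two taxa only when several similarity measures agree) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it uses only two measures (Pearson and Spearman correlation), omits the GBLM/sparse-regression step and any significance testing, and the taxon names, abundance values and the 0.8 threshold are all hypothetical. Correlations on relative abundances are also subject to compositional bias, which this sketch does not correct for.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def ensemble_edges(profiles, threshold=0.8):
    """Keep a taxon pair only if both measures pass the threshold with the same sign."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        scores = [pearson(profiles[a], profiles[b]),
                  spearman(profiles[a], profiles[b])]
        if all(s >= threshold for s in scores):
            edges[(a, b)] = "co-occurrence"
        elif all(s <= -threshold for s in scores):
            edges[(a, b)] = "co-exclusion"
    return edges


# Hypothetical relative abundances of three taxa across five samples.
profiles = {
    "taxon_A": [0.10, 0.20, 0.30, 0.40, 0.50],
    "taxon_B": [0.12, 0.18, 0.33, 0.41, 0.52],
    "taxon_C": [0.50, 0.40, 0.30, 0.20, 0.10],
}
edges = ensemble_edges(profiles)
for pair, kind in edges.items():
    print(pair, kind)
```

Requiring agreement between measures trades sensitivity for robustness: a pair that scores highly under only one measure is dropped rather than reported.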
24.  Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms 
eLife  2014;3:e03318.
Genomic analyses of microbial populations in their natural environment remain limited by the difficulty of assembling full genomes of individual species. Consequently, the chromosome organization of microorganisms has been investigated in a few model species, but the extent to which the features described can be generalized to other taxa remains unknown. Using controlled mixes of bacterial and yeast species, we developed meta3C, a metagenomic chromosome conformation capture approach that allows characterizing individual genomes and their average organization within a mix of organisms. Not only can meta3C be applied to species already sequenced, but a single meta3C library can be used for assembling, scaffolding and characterizing the tridimensional organization of unknown genomes. By applying meta3C to a semi-complex environmental sample, we confirmed its promising potential. Overall, this first meta3C study highlights the remarkable diversity of chromosome organization in microorganisms, while providing an elegant and integrated approach to metagenomic analysis.
eLife digest
Microbial communities play vital roles in the environment and sustain animal and plant life. Marine microbes are part of the ocean's food chain; soil microbes support the turnover of major nutrients and facilitate plant growth; and the microbial communities residing in the human gut support digestion and the immune system, among other roles. These communities are very complex systems, often containing thousands of different species engaged in co-dependent relationships, and are therefore very difficult to study.
The entire DNA sequence of an organism constitutes its genome, and much of this genetic information is stored in large structures called chromosomes. Examining the genome of a species can provide important clues about its lifestyle and how it evolved. To do this, DNA is extracted from cells and is then usually cut into smaller fragments, amplified, and sequenced. The small stretches of sequence obtained, called reads, are finally assembled, yielding ideally the complete genome of the organism under study.
Metagenomics attempts to interpret the combined genome of all the different species in a microbial community and has been instrumental in deciphering how the different species interact with each other. Metagenomics involves sequencing stretches of the community's DNA and matching these pieces to individual species to ultimately assemble whole genomes. While this may be a relatively straightforward task for communities that contain only a handful of members, the metagenomes derived from complex microbial communities are huge, fragmented, and incomplete. This often makes it very difficult or even nearly impossible to match the inferred DNA stretches to individual species.
A method called chromosome conformation capture (or ‘3C’ for short) can reveal the physical contacts between different regions of a chromosome and between the different chromosomes of a cell. How often each of these chromosomal contacts occurs provides a kind of physical signature to each genome and each individual chromosome within it.
Marbouty et al. took advantage of these interactions to develop a technique that combines metagenomics and chromosome conformation capture—called meta3C—that can analyze the DNA of many different species mixed together. Testing meta3C on artificial mixtures of a few species of yeast or bacteria showed that meta3C can separate the genomes of the different species without any prior knowledge of the composition of the mix. In a single experiment, meta3C can identify individual chromosomes, match each of them to its species of origin, and reveal the three-dimensional structure of each genome in the mix. Further tests showed that meta3C can also interpret more complex communities where the number and types of the species present are not known.
Meta3C holds great promise for understanding how microbial communities work and how the genomes of the species within a community are organized. However, further developments of the technique will be required to investigate communities as diverse as those present in most natural environments.
PMCID: PMC4381813  PMID: 25517076
Hi-C; meta3C; metagenomics; plasmid F; meta Hi-C; genome assembly; B. subtilis; E. coli; S. cerevisiae
25.  Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution 
PLoS Computational Biology  2013;9(10):e1003292.
Metagenomics has transformed our understanding of the microbial world, allowing researchers to bypass the need to isolate and culture individual taxa and to directly characterize both the taxonomic and gene compositions of environmental samples. However, associating the genes found in a metagenomic sample with the specific taxa of origin remains a critical challenge. Existing binning methods, based on nucleotide composition or alignment to reference genomes, allow only a coarse-grained classification and rely heavily on the availability of sequenced genomes from closely related taxa. Here, we introduce a novel computational framework, integrating variation in gene abundances across multiple samples with taxonomic abundance data to deconvolve metagenomic samples into taxa-specific gene profiles and to reconstruct the genomic content of community members. This assembly-free method is not bounded by various factors limiting previously described methods of metagenomic binning or metagenomic assembly and represents a fundamentally different approach to metagenomic-based genome reconstruction. An implementation of this framework is available at [URL missing from source]. We first describe the mathematical foundations of our framework and discuss considerations for implementing its various components. We demonstrate the ability of this framework to accurately deconvolve a set of metagenomic samples and to recover the gene content of individual taxa using synthetic metagenomic samples. We specifically characterize determinants of prediction accuracy and examine the impact of annotation errors on the reconstructed genomes. We finally apply metagenomic deconvolution to samples from the Human Microbiome Project, successfully reconstructing genus-level genomic content of various microbial genera, based solely on variation in gene count. These reconstructed genera are shown to correctly capture genus-specific properties. With the accumulation of metagenomic data, this deconvolution framework provides an essential tool for characterizing microbial taxa never before seen, laying the foundation for addressing fundamental questions concerning the taxa comprising diverse microbial communities.
Author Summary
Most microorganisms inhabit complex, diverse, and largely uncharacterized communities. Metagenomic technologies allow us to determine the taxonomic and gene compositions of these communities and to obtain insights into their function as a whole but usually do not enable the characterization of individual member taxa. Here, we introduce a novel computational framework for decomposing metagenomic community-level gene content data into taxa-specific gene profiles. Specifically, by analyzing the way taxonomic and gene abundances co-vary across a set of metagenomic samples, we are able to associate genes with their taxa of origin. We first demonstrate the ability of this approach to decompose metagenomes and to reconstruct the genomes of member taxa using simulated datasets. We further identify the factors that contribute to the accuracy of our method. We then apply our framework to samples from the human microbiome – the set of microorganisms that inhabit the human body – and show that it can be used to successfully reconstruct the typical genomes of various microbiome genera. Notably, our framework is based solely on variation in gene composition and does not rely on sequence composition signatures, assembly, or available reference genomes. It is therefore especially suited to studying the many microbial habitats yet to be extensively characterized.
PMCID: PMC3798274  PMID: 24146609
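The deconvolution idea in this entry (using the way gene and taxon abundances co-vary across samples to assign genes to taxa) can be illustrated with a small linear model. The sketch below assumes that the community-level abundance of each gene in a sample is the sum, over taxa, of taxon abundance times that taxon's copy number of the gene, and recovers the copy numbers by ordinary least squares. This is a simplified stand-in rather than the published implementation: the function names, the toy abundances and the noise-free data are all hypothetical, and a practical method would likely need constraints (e.g. non-negativity) or regularization.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("taxon abundances are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def deconvolve(taxa_abundance, gene_abundance):
    """Least-squares estimate of per-taxon gene copy numbers.

    taxa_abundance: samples x taxa matrix (list of rows).
    gene_abundance: samples x genes matrix (list of rows).
    Returns a taxa x genes matrix of estimated copy numbers.
    """
    n_samples = len(taxa_abundance)
    n_taxa = len(taxa_abundance[0])
    n_genes = len(gene_abundance[0])
    # Normal equations: (A^T A) c_g = A^T e_g, solved once per gene g.
    ata = [[sum(taxa_abundance[s][i] * taxa_abundance[s][j]
                for s in range(n_samples))
            for j in range(n_taxa)]
           for i in range(n_taxa)]
    copies = [[0.0] * n_genes for _ in range(n_taxa)]
    for g in range(n_genes):
        ate = [sum(taxa_abundance[s][i] * gene_abundance[s][g]
                   for s in range(n_samples))
               for i in range(n_taxa)]
        solution = solve_linear(ata, ate)
        for t in range(n_taxa):
            copies[t][g] = solution[t]
    return copies


# Hypothetical toy community: 2 taxa, 3 genes, 3 samples.
true_copies = [[1, 0, 2],   # taxon 0 carries genes 0 and 2 (two copies)
               [0, 1, 1]]   # taxon 1 carries genes 1 and 2
taxa_abundance = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
# Simulate the community-level gene abundances that would be measured.
gene_abundance = [[sum(row[t] * true_copies[t][g] for t in range(2))
                   for g in range(3)]
                  for row in taxa_abundance]

estimated = deconvolve(taxa_abundance, gene_abundance)
for t, row in enumerate(estimated):
    print(f"taxon {t}:", [round(value, 3) for value in row])
```

The estimate is only identifiable when taxon abundances vary independently across samples; if two taxa always rise and fall together, the normal equations become singular and their gene contents cannot be separated.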
