3.1. Samples, sequencing, and gene prediction
The subjects analysed in this study included seven adults (aged 24–45 years), two children (3 and 1.5 years), and four unweaned infants (3–7 months). Seven of the subjects belonged to two unrelated families consisting of three and four members, respectively ().
We isolated the microbial DNA from each fecal sample, constructed shotgun libraries (see ‘Materials and methods’ for details), and produced a total of 1,057,481 shotgun reads (about 80,000 reads per sample) representing about 727 Mb of sequence at a Phred score of >15 (). A relatively large proportion of the total shotgun reads (67.6% on average) was assembled into contigs for each sample (), in contrast to the soil sample, in which less than 1% of the total reads (nearly 150,000 reads) were assembled.39
Although 52–80% (in most cases less than 60%) of the total reads were assembled in the adults and children, 79–89% were assembled in the unweaned infants. Therefore, the lengths of the non-redundant metasequences significantly differed between adult/child and infant microbiomes: 38.9–49.6 Mb for all adults and children except for one adult (29.9 Mb in In-A) and 14.9–28.1 Mb for all infants (). The total length of the contigs and singletons obtained from the 13 samples was 478.8 Mb.
From each non-redundant metasequence, we identified 20,063–67,740 potential protein-coding genes (≥20 amino acids) using the MetaGene program (). We may have overestimated the number of genes in our metasequences through false prediction and/or double counting of fragmented ORFs derived from the same gene. The MetaGene program, however, predicted 1,406,000 ORFs in the Sargasso Sea metagenomic data,26 which is only 8.7% more than the number (1,284,108 ORFs) identified from the same dataset by evidence-based gene finding.
3.2. Composition of human gut microbiota
To compare the overall sequence similarities among the microbiomes from fecal and other-environmental samples,21,26,39 we performed a reciprocal BLASTP analysis of the whole gene set for each microbiome, followed by MDS clustering against the D2 normalized distance matrix (see ‘Materials and methods’). The data indicated that all gut microbiomes from the adults and weaned children form a distinct group (). In contrast, those from the unweaned infants were highly divergent from each other and from the microbiomes of the adults and children, as well as from those of other environments.
Figure 1. Clustering analysis of microbiomes based on cumulative bitscore comparisons. MDS was applied to the distance matrix calculated from reciprocal pairwise BLASTP analysis among all predicted gene products. The dots represent fecal samples from adults and ...
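The MDS step can be illustrated with a minimal sketch of classical (Torgerson) MDS. It assumes that a symmetric distance matrix is already available and that the distances are approximately Euclidean; the D2 normalization described in ‘Materials and methods’ is not reproduced here. The function name `classical_mds`, the power-iteration eigen-solver (chosen only to keep the sketch dependency-free), and the toy distances are illustrative assumptions, not the procedure used in this study.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical (Torgerson) MDS on a symmetric distance matrix.

    Hypothetical helper for illustration. Returns one coordinate list
    per item. Negative eigenvalues (non-Euclidean input) are clipped to 0.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(v):
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration converges to the dominant eigenvector of B.
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = matvec(v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        # Rayleigh quotient gives the matching eigenvalue.
        eigval = sum(x * y for x, y in zip(v, matvec(v)))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Toy distances between three hypothetical samples (a 3-4-5 triangle).
toy = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
points = classical_mds(toy)
```

Samples that lie close together in the returned coordinates have similar gene sets under the chosen distance, which is how a distinct adult/child group would appear in such a plot.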
To determine the microbial composition at the genus level, we next conducted a BLASTP analysis of all the predicted genes against the genes in our ‘in-house extended NR database’ (see ‘Materials and methods’). With a threshold of 90% BLASTP identity, 17–43% of the predicted genes could be assigned to particular genera (35–65 genera, 121 in total) in the adults and children (). A significantly higher proportion of genes (35–55%) was assignable (31–61 genera, 84 in total) in the unweaned infants, but, overall, the data indicated that the majority of gut microbes are as yet uncharacterized. We detected a total of 142 genera from the 13 samples in this analysis.
Figure 2. Compositional view of human intestinal microbiomes. A compositional view of microbiomes based on the taxonomic assignment of protein-coding genes is shown. The stacked bars represent the compositions of each sample estimated from the results of BLASTP ...
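The genus-level assignment described above can be sketched as follows. This is a minimal illustration that assumes the best BLASTP hit of each predicted gene has already been reduced to a (genus, percent identity) pair; the input layout, the function name `assign_genera`, and the choice of an inclusive cut-off (≥90%) are assumptions made for the example.

```python
from collections import Counter


def assign_genera(best_hits, min_identity=90.0):
    """Assign genes to genera from their best BLASTP hits.

    best_hits maps gene id -> (genus, percent identity) of the top hit,
    or None when the gene had no hit (hypothetical input layout).
    Returns per-genus gene counts and the fraction of genes assigned.
    """
    counts = Counter()
    for hit in best_hits.values():
        if hit is None:
            continue
        genus, identity = hit
        # Only hits at or above the identity cut-off count as assignments.
        if identity >= min_identity:
            counts[genus] += 1
    assigned = sum(counts.values())
    fraction = assigned / len(best_hits) if best_hits else 0.0
    return counts, fraction


# Toy input: five predicted genes from one hypothetical sample.
hits = {
    "gene1": ("Bacteroides", 97.2),
    "gene2": ("Bacteroides", 91.0),
    "gene3": ("Eubacterium", 88.5),   # below the cut-off, stays unassigned
    "gene4": None,                    # no hit at all
    "gene5": ("Bifidobacterium", 90.0),
}
counts, fraction = assign_genera(hits)
```

The number of distinct genera detected in a sample is then `len(counts)`, and `fraction` corresponds to the proportion of assignable genes reported per sample.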
Despite the low proportion of assigned genes, their taxonomic distribution indicated a clear compositional change after weaning. In the adults and weaned children, the major constituents were always Bacteroides, followed by several genera belonging to the division Firmicutes, such as Eubacterium, Ruminococcus, and Clostridium, and the genus Bifidobacterium. In the infants, Bifidobacterium and/or a few genera from the family Enterobacteriaceae, such as Escherichia, Raoultella, and Klebsiella, were the major constituents. A significant level of inter-individual variation was also observed among the adults and children, but the variation was much higher among the unweaned infants.
To further evaluate the microbial composition at the species level, we examined the intra-genus diversity of Bacteroides and Bifidobacterium, the most dominant genera in the adults and children and in the infants, respectively. Taking advantage of the number of available genome sequences belonging to these two genera (11 species from Bacteroides, five from Bifidobacterium), we performed a mapping analysis of the shotgun reads to these genomes using the BLASTN program34 with a threshold of ≥95% identity and ≥150 bp aligned length. The analysis demonstrated that, for each infant, more than 80% of the shotgun reads mapped to Bifidobacterium were derived from a single species or genome (Supplementary Table S4). In contrast, the shotgun reads from the adults and children were mapped to multiple Bacteroides species except for In-A, suggesting a compositional complexity of this genus in human gut microbiota.
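The read-mapping filter can be illustrated with a short sketch. It assumes BLASTN hits have been parsed into (read, genome, percent identity, aligned length) tuples and that each read is counted once, on its best passing hit; the tuple layout, the tie-breaking rule, and the name `dominant_genome_share` are assumptions for illustration only.

```python
from collections import Counter


def dominant_genome_share(hits, min_identity=95.0, min_length=150):
    """Share of mapped reads that fall on the most frequently hit genome.

    hits: iterable of (read_id, genome, percent_identity, aligned_length)
    tuples (hypothetical layout). Hits below either threshold are dropped;
    each read keeps its best hit (highest identity, then longest alignment).
    """
    best = {}
    for read_id, genome, identity, length in hits:
        if identity < min_identity or length < min_length:
            continue
        key = (identity, length)
        if read_id not in best or key > best[read_id][0]:
            best[read_id] = (key, genome)
    per_genome = Counter(genome for _, genome in best.values())
    if not per_genome:
        return None, 0.0
    genome, n_reads = per_genome.most_common(1)[0]
    return genome, n_reads / sum(per_genome.values())


# Toy hits against two placeholder reference genomes.
hits = [
    ("r1", "genome_A", 99.0, 400),
    ("r1", "genome_B", 96.0, 400),   # weaker hit for the same read
    ("r2", "genome_A", 98.5, 300),
    ("r3", "genome_B", 97.0, 200),
    ("r4", "genome_A", 99.5, 120),   # alignment too short
    ("r5", "genome_A", 93.0, 500),   # identity too low
    ("r6", "genome_A", 95.0, 150),   # exactly at both thresholds
]
top_genome, share = dominant_genome_share(hits)
```

A share above 0.8 for a single genome would correspond to the situation reported for the infants, whereas reads spread over several genomes would give a lower share.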
Together with the results of the shotgun sequence assembly (), the data from these compositional analyses showed a clear structural difference between the microbiota of the unweaned infants and those of the adults and weaned children. Infant microbiota were dominated by a few microbial species or strains, exhibiting rather simple structures, but showed a remarkable inter-individual variation. In contrast, most microbiota of the adults and children were much more complex in species composition, but exhibited high levels of overall sequence similarity between the samples. In In-A, the shotgun read sequences apparently derived from a few species of Bacteroides and Eubacterium were notably dominant (data not shown), accounting for the significantly shorter non-redundant sequence of In-A (see Subsection 3.1 and ).
It should also be noted that the samples from Japanese and American21 adults differed significantly in composition, particularly in terms of Bacteroides and archaeal species (). The gut microbiomes from the two American samples contained very few sequences and genes assigned to Bacteroides species and a significant number of sequences and genes assigned to an archaeal species, whereas the gut microbiomes from the Japanese samples contained a high ratio of sequences and genes assigned to Bacteroides species and almost no archaeal sequences and genes. Further studies should establish the reasons for these intriguing differences, which could be due to various factors including the genetic background and dietary style of the hosts, but also to differences in the experimental conditions between the two studies.
3.3. Functional assignment of predicted genes
Functional assignment of predicted genes (662,548 in total) was made on the basis of BLASTP analysis against the ‘reference dataset for COG assignment’ (see ‘Materials and methods’). By this analysis, about 48% of the predicted genes were assigned to a total of 3,268 COGs (). The number of COGs identified in the infant microbiomes showed remarkable inter-individual variation (1617–2857 COGs), in contrast to those of the adults and children (2355–2921 COGs). Also, the number of orthologous genes belonging to each COG in the infants was on average about two-thirds of that observed in the adults and children. These results indicate that the gene repertoires in the gut microbiomes are more variable and functionally less redundant in infants than in adults and children.
To explore the functional characteristics of human intestinal microbiota, we looked for significantly over- or under-represented COGs in gut microbiomes when compared with Ref-DB (see ‘Materials and methods’). As shown in Fig. 3, human gut microbiomes showed patterns distinct from those of other environments such as sea and soil. The over-representation of COGs classified into the ‘Carbohydrate transport and metabolism’ category and the under-representation of those for ‘Lipid transport and metabolism’ were observed in all the human gut microbiomes examined in this study. However, a clear difference was observed between the adults/children and the unweaned infants. The gut microbiomes from the adults and children exhibited a uniform pattern, and the over-representation of COGs for ‘Defense mechanisms’ and under-representation of ‘Cell motility’, ‘Secondary metabolites biosynthesis, transport and catabolism’ and ‘Post-translational modification, protein turnover, chaperones’ were remarkable (Fig. 3B). In contrast, the infant microbiomes showed variable patterns (Fig. 3C). The enrichment values of all the COGs in each microbiome are shown in Supplementary Table S5.
Figure 3. Summary of the COG assignment of predicted genes. (A) Comparison of the distribution patterns of COG-assigned genes between each type of microbiome and Ref-DB (for Ref-DB, see ‘Materials and methods’). Fecal samples from Japanese adults ...
Profiling analysis based on the COG enrichment values calculated for each microbiome further demonstrated that, while the gut microbiomes of the adults and children showed similar profiles, those of the infants had distinct and more variable profiles (Fig. 4; also see Supplementary Fig. S1). It may be noteworthy that neither this analysis nor the overall sequence similarity analysis shown in Fig. 1 provided any conclusive evidence that the genomic features of gut microbiomes resemble each other among family members or within the sexes.
Figure 4. Relationship between human intestinal microbiomes and other-environmental microbiomes based on their functional profiles. The result of a clustering analysis of microbiomes based on the enrichment values of each COG calculated for each microbiome is shown.
3.4. A gene set commonly enriched in adult-type gut microbiomes
We searched for COGs that are commonly enriched in the microbiomes of all adults and children and found 237 COGs that met the following criteria: (i) the average enrichment value exceeds 2.0 and (ii) the enrichment value in all subjects exceeds 1.0 ( and Supplementary Table S6). When the samples from two adult Americans were included in the analysis, 79% (188) of the 237 COGs still met the above criteria, even though the American data contained, unusually, only a few sequences derived from Bacteroides species (). In contrast, only 5–10% of these COGs exhibited an enrichment value of >2.0 in the microbiomes of other environments. Therefore, these COGs are specifically enriched in adult-type gut microbiomes, and thus may encode important functions for the gut microbiota itself as well as for its host.
Figure 5. Functional distribution of commonly enriched COGs. The functional distribution of commonly enriched COGs in adult/child microbiomes (‘A-gutCEGs’), in infant microbiomes (‘I-gutCEGs’), or in both types of microbiomes is ...
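The two selection criteria for commonly enriched COGs (average enrichment value above 2.0, and enrichment value above 1.0 in every subject) can be written as a small filter. The sketch below assumes the enrichment values have already been computed as described in ‘Materials and methods’; the function name `commonly_enriched`, the COG labels, and the toy values are hypothetical.

```python
def commonly_enriched(enrichment, mean_cutoff=2.0, min_cutoff=1.0):
    """Return the COG ids that pass both enrichment criteria.

    enrichment maps COG id -> list of enrichment values, one per subject.
    A COG is kept when (i) its mean value exceeds mean_cutoff and
    (ii) its value in every subject exceeds min_cutoff.
    """
    selected = []
    for cog, values in enrichment.items():
        if not values:
            continue
        mean = sum(values) / len(values)
        if mean > mean_cutoff and min(values) > min_cutoff:
            selected.append(cog)
    return sorted(selected)


# Toy enrichment values for three hypothetical subjects.
toy = {
    "COG_A": [2.5, 3.0, 1.2],   # passes both criteria
    "COG_B": [4.0, 5.0, 0.9],   # high mean, but one subject is below 1.0
    "COG_C": [1.5, 1.8, 1.9],   # mean does not exceed 2.0
    "COG_D": [2.0, 2.0, 2.0],   # mean equals 2.0, so it does not exceed it
}
selected = commonly_enriched(toy)
```

Requiring both conditions keeps a COG from being selected on the strength of one or two strongly enriched individuals.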
Pyruvate-formate lyase (COG1882), which catalyzes the non-oxidative conversion of pyruvate to formate and acetyl-coenzyme A, was enriched. Unexpectedly, however, genes for the formate hydrogenlyase system that decomposes formate to CO2 and H2 were rather under-represented. In this regard, of interest is the enrichment of formyltetrahydrofolate synthetase (COG2759), methenyl tetrahydrofolate cyclohydrolase (COG3404), and methionine synthase (COG1410), all of which are enzymes involved in the regulation of the one-carbon pool by folate. Their enrichment may suggest that the one-carbon unit of formate can be utilized effectively by the gut microbiota in the folate-mediated cycle of the one-carbon pool. In contrast to the enrichment of enzymes for anaerobic pyruvate metabolism, the pyruvate dehydrogenase complex (COG2069) was profoundly depleted. All components of the oxidative tricarboxylic acid (TCA) cycle and the membrane respiratory chain (with the exception of NADH:ubiquinone oxidoreductase) were also significantly under-represented in all subjects, but phosphoenolpyruvate carboxykinase (COG1866) and pyruvate carboxyltransferase (COG5016), which generate oxaloacetate, an entry substrate to the TCA cycle in the reductive pathway, were enriched. Together with the striking depletion of most gene families whose products scavenge oxygen radicals, these findings reflect well the fact that the adult gut ecosystem is a kingdom of strict anaerobes.
The enrichment of carbohydrate metabolism genes was also striking: 24% (53 COGs) of the commonly enriched COGs had this function. At least 14 families of glycosyl hydrolases for plant-derived dietary polysaccharides and host tissue-derived proteoglycans or glycoconjugates were enriched in the adults. In addition, many enzymes involved in the metabolism of mono- or disaccharides released by these glycosyl hydrolases, such as L-fucose isomerase (COG2407), L-arabinose isomerase (COG2160), and galactokinase (COG0153), were also over-represented. Several peptidase families (COG1362, COG2195, COG3340, and COG3579) were also enriched, but most genes for fatty-acid metabolism were selectively reduced in number. These findings support the notion that the colonic microbiota utilizes otherwise indigestible polysaccharides and peptides as major resources for energy production and biosynthesis of cellular components.7,8
The enrichment of phosphoenolpyruvate carboxykinase (COG1866), glycogen synthase (COG0297) and ADP-glucose pyrophosphorylase (COG0448) suggests that energy storage is also an important activity of adult-type gut microbiota. This activity may be required for the gut microbiota to cope with intermittent nutrient supply in the adult gut. The enrichment of antimicrobial peptide transporters (COG0577 and COG1132) and a multidrug efflux pump (COG0534) is also of interest. Host intestinal cells produce various cationic antimicrobial peptides (CAMPs), such as beta-defensins.43 Many microorganisms also produce CAMPs to compete with other microbes sharing the same niche. The enrichment of antimicrobial peptide transporters and the multidrug efflux pump may play a primary role in the stable colonization of gut microbes in the adult intestine by conferring resistance to CAMPs.
The enrichment of several enzymes for DNA repair is also noteworthy (Supplementary Table S6). These enzymes may be needed to repair microbial DNA damage caused by genotoxic substances, such as nitrosamines and heterocyclic amines contained in ingested foods, and secondary bile acids and nitroso compounds synthesized in the intestine via gut microbiota-involving processes.44 It is conceivable that not only the host cells but also intestinal microbes are constantly exposed to such genotoxic compounds.
Another distinguishing feature of the adult-type microbiota is the striking depletion of genes for the biosynthesis of flagella and chemotaxis (Supplementary Table S5). This implies that motility and chemotaxis are not required for the intestinal microbes to persist in the gut, where the contents are constantly stirred by peristalsis. Rather, flagellated microbes may be easily eliminated by the host immune system because flagella are highly immunogenic. Abnegation of motility may be another adaptation mechanism of gut microbes to the intestinal environment.
3.5. A gene set commonly enriched in infant-type gut microbiomes
Despite the high inter-individual variation, 136 COGs were found to be commonly enriched in the infant microbiomes ( and Supplementary Table S6). Of these, 58 were also over-represented in the adult/child microbiomes.
Genes for anaerobic energy production were also enriched in infants, but genes for the pyruvate dehydrogenase complex and all components of the oxidative TCA cycle were present in the infants at a frequency similar to that in Ref-DB. These findings may reflect the compositional feature of the infant gut microbiota, which contains considerable numbers of facultative anaerobes ().
In infant microbiomes, about 35% (47) of the 136 enriched COGs were for ‘Carbohydrate transport and metabolism,’ including 12 families of glycosyl hydrolases, nine of which were also enriched in the adult gut microbiomes. Unexpectedly, they included several enzymes that degrade non-digestible polysaccharides of plant origin, such as pullulanase and related glycosidases (COG1523), arabinogalactan endo-1,4-beta-galactosidase (COG3867), and endopolygalacturonase (COG5434). These enzymes may act to degrade oligosaccharides in breast milk or host-derived proteoglycans like mucin to maintain the functional homeostasis of gut epithelia.45 It is also possible that the gut microbiota is ready to utilize plant-derived polysaccharides to some extent before weaning.
The over-representation of various transport systems was also characteristic of infants, with 22% (29) of the 136 enriched COGs being transporters. In particular, the enrichment of phosphotransferase systems that mediate active sugar transport was remarkable. This prokaryote-specific transport system may play a central role in the uptake of lactose and other easily digestible simple sugars that are abundant in breast milk. The over-representation of other transporters may also be advantageous to the microbes in the infant intestine because breast milk contains many other essential nutrients such as amino acids, long-chain fatty acids, nucleotides, vitamins, and minerals in a readily available form. The difference in diet between adults and unweaned infants appears to affect other functional properties as well. For instance, the genes for defense mechanisms and DNA repair that were over-represented in the adults were not over-represented in the infants.
3.6. A CTn family amplified in the intestine
Although this remains largely unproven, the distal colon has been regarded as an ecologically suitable site for horizontal gene transfer (HGT) between microorganisms due to its high microbial cell density.2 We identified many gene families related to transposases and bacteriophages in the metagenomic data, but their over-representation was noted only in certain individuals (Supplementary Table S5). An exception to this was a set of genes homologous to those on Tn1549-like CTns, which was notably enriched in most of the gut microbiomes analysed here (). Tn1549 was originally identified in an E. faecalis [...]; thereafter, its relatives have been identified in another E. faecalis [...], and Streptococcus [...]. It has also been recorded that Tn1549 is transferable between C. symbiosum [...]. The homologues found in the metagenomic data accounted for 0.8% of all the predicted genes (5,325 genes in total) and were also enriched in the two fecal samples from American individuals21 ( and Supplementary Table S7). They were highly divergent in sequence from the corresponding genes on six known Tn1549-like CTns (Fig. 6C), but frequently appeared as gene clusters on contigs. We identified 89 contigs that contained four or more genes related to Tn1549-like CTns. These genes appeared there in the same or a similar gene organization as seen in Tn1549-like CTns (Supplementary Table S7), suggesting that they were derived from divergent members of a Tn1549-like CTn family, which we refer to as ‘CTnRINT’ (CTn rich in intestine). By analysing these contigs, we found that CTnRINT members contain a variety of genes, such as those for ABC-type multidrug transport systems, in the regions corresponding to that for the vancomycin-resistance genes on Tn1549 (data not shown). As shown in Fig. 6B, other known Tn1549-like CTns also contain various accessory genes in this region. These findings strongly suggest that the CTnRINT family is largely involved in the process of HGT in the human intestine. It seems reasonable that conjugal elements, which mediate genetic exchange and transmission through cell–cell contact, are key players in HGT in the colon.
Figure 6. The Tn1549-like CTn family, ‘CTnRINT’, explosively amplified in human gut microbiomes. (A) Numbers of genes homologous to those on six known Tn1549-like CTns. Genes derived from each fecal sample are shown in different colors. Plum: In-A; ...
In addition to the CTnRINT family, we found that integrases/site-specific recombinases belonging to COG4974 were remarkably expanded in the microbiomes of the adults and children. Most of them were apparently derived from several types of integrative mobile genetic elements such as Tn916, suggesting that other types of integrative elements are also richly present in human gut microbiomes.
3.7. Orphan gene families in human gut microbiomes
Of the 662,548 genes predicted in the 13 samples, 162,647 were orphan genes (25% of the total genes). Similarly, 503,115 orphan genes were obtained from other-environmental microbiomes.21,26,39 An all-to-all BLASTP analysis of these 665,762 orphan gene products followed by a clustering analysis (see ‘Materials and methods’) yielded 160,543 clusters and 461,435 singletons. Of the 160,543 clusters, 647 comprised five or more gene products derived only from human fecal samples (Supplementary Table S8). The largest two clusters, ID37 and ID39, containing 48 and 47 members, respectively, were present in the microbiomes of all the Japanese adults and children, and also in those of the American adults. For eight clusters that comprise ≥30 gene members, we performed motif search/extraction analyses by using the HMMER program against the Pfam motif database.41 Only two gene products in cluster ID44 showed a significant similarity to a Pfam motif (PF07508; recombinase); other gene products in cluster ID44 showed no significant similarity in the Pfam database. We could identify the conserved amino-acid sequences (38–50 amino acids) for each cluster by using the MEME program42 (Supplementary Fig. S2). These conserved sequences may represent new motifs specific to human gut microbiomes.
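The grouping of orphan genes into clusters and singletons, followed by the selection of clusters with five or more members derived only from fecal samples, can be sketched as below. The clustering procedure actually used is described in ‘Materials and methods’ and is not reproduced here; this sketch assumes single-linkage clustering over gene pairs that already passed a BLASTP similarity cut-off. The function names, the "gut"/"other" labels, and the toy data are hypothetical.

```python
def cluster_orphans(genes, similar_pairs):
    """Single-linkage clustering of genes with a union-find structure.

    genes: dict mapping gene id -> source label ("gut" or "other").
    similar_pairs: iterable of (gene_a, gene_b) pairs judged similar
    (assumed to have passed a similarity cut-off chosen elsewhere).
    Returns (clusters, singletons).
    """
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    clusters = [sorted(m) for m in groups.values() if len(m) > 1]
    singletons = sorted(m[0] for m in groups.values() if len(m) == 1)
    return clusters, singletons


def gut_specific(clusters, genes, min_size=5):
    """Keep clusters of at least min_size genes that all come from gut samples."""
    return [c for c in clusters
            if len(c) >= min_size and all(genes[g] == "gut" for g in c)]


# Toy data: a five-member gut-only cluster, a mixed pair, an
# environmental pair, and one singleton.
genes = {"g1": "gut", "g2": "gut", "g3": "gut", "g4": "gut", "g5": "gut",
         "g6": "gut", "g7": "gut", "s1": "other", "s2": "other", "s3": "other"}
pairs = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5"),
         ("s1", "s2"), ("g6", "s3")]
clusters, singletons = cluster_orphans(genes, pairs)
gut_only = gut_specific(clusters, genes)
```

With single linkage, a gene joins a cluster if it is similar to any one member, so the cluster count depends strongly on the similarity cut-off applied upstream.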
3.8. Remarks and future perspectives
The present study is the first large-scale comparative metagenomic analysis of human gut microbiomes. The data provided several new lines of insight into the genomic features of gut microbiota. First, our data clearly demonstrated a difference in overall composition and gene repertoire between adult- and infant-type gut microbiomes. The simple and less redundant features of the infant-type gut microbiota are probably linked to its high inter-individual variability (Figs. , 2, and 3C). We suggest that the infant-type can be viewed as unstable, yet dynamic and adaptable. Conversely, the functional uniformity observed in the adult-type microbiota (Figs. , 3B, and 4) may be attributable to its more complex nature ( and Tables and S2), which in turn suggests that the insurance hypothesis50 for the benefit of biodiversity may be relevant to the gut.
Secondly, a comparison of the gene contents between gut microbiota and previously sequenced microbes revealed 237 COGs commonly enriched in adult-type microbiomes and 136 COGs in infant-type microbiomes ( and Supplementary Table S6). The characterization of these genes revealed distinct nutrient acquisition strategies in each type of microbiota, possibly to accommodate very different diets of their hosts. The analysis also revealed several possible strategies through which intestinal microbes adapt to the intestinal environment and establish symbiotic relationships with their host. Thus, these genes, which we refer to as ‘Adult- or Infant-gut Commonly Enriched Genes’ (A-gutCEGs or I-gutCEGs), appear to encode some of the core functions of each type of microbiota. It is noteworthy that the two gene sets contain as many as 104 gene families of unknown functions (). The in vitro and in vivo functions of these uncharacterized genes as well as those of the 647 ‘new’ gene families (Supplementary Table S8) would be important topics of future studies.
Thirdly, a survey of the enriched genes revealed an abundance of mobile genetic elements in the human intestinal gene pool, emphasizing that the human gut microbiota is a “hot spot” for HGT between microbes. Of particular importance is the abundance of conjugal elements including CTnRINT. Considering their high transfer efficiency, the broad range of hosts, and the frequent carriage of drug-resistance genes, it would be prudent to reassess the heavy use of antibiotics in modern medicine.
Finally, the metagenomic datasets presented here will be of great use for understanding the roles of gut microbiota in the etiology of human diseases and also for scientifically evaluating the efficacy of probiotics, prebiotics and other ‘functional foods’ that are widely used for modulating the intestinal microbiota in an effort to improve our health.7