We characterized gut microbial communities in 31 monozygotic (MZ) twin pairs, 23 dizygotic (DZ) twin pairs, and where available their mothers (n=46) (
Supplementary Tables 1–5). MZ and DZ co-twins and parent-offspring pairs provided an attractive paradigm for assessing the impact of genotype and shared early environment exposures on the gut microbiome. Moreover, genetically ‘identical’
9 MZ twin pairs gain weight in response to overfeeding in a more reproducible way than do unrelated individuals
10 and are more concordant for body mass index (BMI) than DZ twin pairs
11.
Twin pairs who had been enrolled in the Missouri Adolescent Female Twin Study (MOAFTS
12) were recruited for this study (mean period of enrollment in MOAFTS, 11.7±1.2 years; range, 4.4–13.0 years). All twins were 25–32 years old, of European or African ancestry (EA and AA, respectively), were generally concordant for obesity (BMI≥30 kg/m
2) or leanness (BMI=18.5–24.9 kg/m
2) [1 twin pair was lean/overweight (overweight defined as BMI ≥25 and <30) and 6 pairs were overweight/obese], and had not taken antibiotics for at least 5.49±0.09 months. Each participant completed a detailed medical, lifestyle, and dietary questionnaire: they were broadly representative of the overall Missouri population with respect to BMI, parity, education, and martial status (see
Supplementary Results). Although all were born in Missouri, they currently live throughout the USA: 29% live in the same house, but some live >800 km apart. Since fecal samples are readily attainable and representative of interpersonal differences in gut microbial ecology
7, they were collected from each individual and frozen immediately. The collection procedure was repeated again with an average interval between sampling of 57±4 days.
To characterize the bacterial lineages present in the fecal microbiotas of these 154 individuals, we performed 16S rRNA sequencing, targeting the full-length gene with an ABI 3730xl capillary sequencer. Additionally, we performed multiplex pyrosequencing with a 454 FLX instrument to survey the gene’s V2 variable region
13 and it’s V6 hypervariable region
14 (
Supplementary Tables 1–3).
Complementary phylogenetic and taxon-based methods were used to compare 16S rRNA sequences among fecal communities (see
Methods). No matter which region of the gene was examined, individuals from the same family (a twin and her co-twin, or twins and their mother) had a more similar bacterial community structure than unrelated individuals ( and
Supplementary Fig. 1A,B), and shared significantly more species-level phylotypes (defined as sharing ≥97% identity in their 16S rRNA sequences) [G=55.2, p<10
−12 (V2); G=12.3, p<0.001 (V6); G=11.3, p<0.001 (full-length)]. No significant correlation was seen between the degree of physical separation of family members’ current homes and the degree of similarity between their microbial communities (defined by UniFrac
15). The observed familial similarity was not due to an indirect effect of the physiologic states of obesity versus leanness; similar results were observed after stratifying twin-pairs and their mothers by BMI category (concordant lean or concordant obese individuals;
Supplementary Fig. 2). Surprisingly, there was no significant difference in the degree of similarity in the gut microbiotas of adult MZ versus DZ twin-pairs (). However, in the present study we could not assess whether MZ and DZ twin pairs had different degrees of similarities at earlier stages of their lives.
Multiplex pyrosequencing of V2 and V6 amplicons allowed higher levels of coverage compared to what was feasible using Sanger sequencing, reaching on average 3,984±232 (V2) and 24,786±1,403 (V6) sequences per sample. To control for differences in coverage, all analyses were performed on an equal number of randomly selected sequences [200 full-length, 1,000 V2, and 10,000 V6]. At this level of coverage, there was little overlap between the sampled fecal communities. Moreover, the number of 16S rRNA gene sequences belonging to each phylotype varied greatly between fecal microbiotas (
Supplementary Tables 6–8).
Since this apparent lack of overlap could reflect the level of coverage (
Supplementary Tables 1–3), we subsequently searched all hosts for bacterial phylotypes present at high abundance using a sampling model based on a combination of standard Poisson and binomial sampling statistics. The analysis allowed us to conclude that no phylotype was present at more than ~0.5% abundance in all of the samples in this study (see
Supplementary Results). Finally, we subsampled our dataset by randomly selecting 50–3,000 sequences/sample; again, no phylotypes were detectable in all individuals sampled within this range of coverage (
Supplementary Fig. 3).
Samples taken from the same individual at the initial collection point and 57±4 days later were consistent with respect to the specific phylotypes found (
Supplementary Figs. 4,5), but showed variations in relative abundance of the major gut bacterial phyla (
Supplementary Fig. 6). There was no significant association between UniFrac distance and the time between sample collections. Overall, fecal samples from the same individual were much more similar to one another than samples from family members or unrelated individuals (), demonstrating that short-term temporal changes in community structure within an individual are minor compared to inter-personal differences.
Analysis of 16S rRNA datasets produced by the three PCR-based methods, plus shotgun sequencing of community DNA (see below), revealed a lower proportion of Bacteroidetes and a higher proportion of Actinobacteria in obese versus lean EA and AA individuals (
Supplementary Table 9). Combining the individual p-values across these independent analyses using Fisher’s method disclosed significantly less Bacteroidetes (p=0.003), more Actinobacteria (p=0.002), but no significant difference in Firmicutes (p=0.09). These findings are in agreement with previous work showing comparable differences in both taxa in mice
2 and a progressive increase the representation of Bacteroidetes when 12 unrelated obese humans lost weight after being placed on one of two reduced calorie diets
6.
Across all methods, obesity was associated with a significant decrease in the level of diversity ( plus
Supplementary Fig. 1C–F). This reduced diversity suggests an analogy: the obese gut microbiota is not like a rainforest or reef, which are adapted to high energy flux and are highly diverse, but rather may be more like a fertilizer runoff where a reduced diversity microbial community blooms with abnormal energy input
16.
We subsequently characterized the microbial lineage and gene content of the fecal microbiomes of 18 individuals representing 6 of the families (3 lean EA MZ twin-pairs and their mothers plus 3 obese EA MZ twin pairs and their mothers) through shotgun pyrosequencing (
Supplementary Tables 4,5) and BLASTX comparisons against a number of databases [KEGG
17 (v44) and STRING
18] plus a custom database of 44 reference human gut microbial genomes (
Supplementary Figs. 7–10 and Supplementary Results). Our analysis parameters were validated using control datasets comprised of randomly fragmented microbial genes with annotations in the KEGG database
17 (
Supplementary Fig. 11 and Supplementary Methods). We also tested how technical advances that produce longer reads might improve these assignments by sequencing fecal community samples from one twin pair using Titanium pyrosequencing methods [average read length of 341±134 nt (SD) versus 208±68 nt for the standard FLX method].
Supplementary Fig. 12 shows that the frequency and quality of sequence assignments is improved as read length increases from 200 to 350 nt.
The 18 microbiomes were searched to identify sequences matching domains from experimentally validated Carbohydrate-Active enZymes (CAZymes). Sequences matching 156 total CAZy families were found within at least one human gut microbiome, including 77 glycoside hydrolase, 21 carbohydrate-binding module, 35 glycosyltransferase, 12 polysaccharide lyase, and 11 carbohydrate-esterase families (
Supplementary Table 10). On average 2.62±0.13% of the sequences in the gut microbiome could be assigned to CAZymes (total of 217,615 sequences), a percentage that is greater than the most abundant KEGG pathway (‘Transporters’; 1.20±0.06% of the filtered sequences generated from each sample), and indicative of the abundant and diverse set of microbial genes directed towards accessing a wide range of polysaccharides.
Category-based clustering of the functions from each microbiome was performed using Principal Components Analysis (PCA) and hierarchical clustering
19. Two distinct clusters of gut microbiomes were identified based on metabolic profile, corresponding to samples with an increased abundance of Firmicutes and Actinobacteria, and samples with a high abundance of Bacteroidetes (). A linear regression of the first principal component (PC1, explaining 20% of the functional variance) and the relative abundance of the Bacteroidetes showed a highly significant correlation (R
2=0.96, p<10
−12; ). Functional profiles stabilized within each individual’s microbiome after ~20,000 sequences had been accumulated (
Supplementary Fig. 13). Family members had more similar profiles than unrelated individuals (), suggesting that shared bacterial community structure (who’s there based on 16S rRNA analyses) also translates into shared community-wide relative abundance of metabolic pathways. Accordingly, a direct comparison of functional and taxonomic similarity (see
Supplementary Methods) disclosed a significant association: individuals with similar taxonomic profiles also share similar metabolic profiles (p<0.001; Mantel test).
Functional clustering of phylum-wide sequence bins representing microbiome reads assigned to 23 human gut Firmicutes and 14 Bacteroidetes reference genomes showed discrete clustering by phylum (
Supplementary Figs. 14A,15). Bootstrap analyses of the relative abundance of metabolic pathways in the microbiome-derived Firmicutes and Bacteroidetes sequence bins, disclosed 26 pathways with a significantly different relative abundance (
Supplementary Fig. 14A). The Bacteroidetes bins were enriched for a number of carbohydrate metabolism pathways, while the Firmicutes bins were enriched for transport systems. The finding is consistent with our CAZyme analysis, which revealed a significantly higher relative abundance of glycoside hydrolases, carbohydrate-binding modules, glycosyltransferases, polysaccharide lyases, and carbohydrate esterases in the Bacteroidetes sequence bins (
Supplementary Fig. 14B).
One of the major goals of the International Human Microbiome Project(s) is to determine whether there is an identifiable ‘core microbiome’ of shared organisms, genes, or functional capabilities found in a given body habitat of all or the vast majority of humans
1. Although all of the 18 gut microbiomes surveyed showed a high level of beta-diversity with respect to the relative abundance of bacterial phyla (), analysis of the relative abundance of broad functional categories of genes (COG) and metabolic pathways (KEGG) revealed a generally consistent pattern regardless of the sample surveyed ( and
Supplementary Table 11): the pattern is also consistent with results we obtained from an meta-analysis of previously published gut microbiome datasets from nine adults
20,21 (
Supplementary Fig. 16). This consistency is not simply due to the broad level of these annotations, as a similar analysis of Bacteroidetes and Firmicutes reference genomes revealed substantial variation in the relative abundance of each category (see
Supplementary Fig. 17). Furthermore, pair-wise comparisons of metabolic profiles obtained from the 18 microbiomes in this study revealed an average R
2 of 0.97±0.002 (), indicating a high level of functional similarity.
Overall functional diversity was compared using the Shannon index
22, a measurement that combines diversity (the number of different types of metabolic pathways) and evenness (the relative abundance of each pathway). The human gut microbiomes surveyed had a stable and high Shannon index value (4.63±0.01), close to the maximum possible level of functional diversity (5.54; see
Supplementary Methods). Despite the presence of a small number of abundant metabolic pathways (listed in
Supplementary Table 11), the overall functional profile of each gut microbiome is quite even (Shannon evenness of 0.84±0.001 on a scale of 0 to 1), demonstrating that most metabolic pathways are found at a similar level of abundance. Interestingly, the level of functional diversity in each microbiome was significantly linked to the relative abundance of the Bacteroidetes (R
2=0.81, p<10
−6); microbiomes enriched for Firmicutes/Actinobacteria had a lower level of functional diversity. This observation is consistent with an analysis of simulated metagenomic reads generated from each of 36 Bacteroidetes and Firmicutes genomes (
Supplementary Fig. 18): on average, the Bacteroidetes genomes have a significantly higher level of both functional diversity and evenness (Mann-Whitney, p<0.01).
At a finer level, 26–53% of ‘enzyme’-level functional groups (KEGG/CAZy/STRING) were shared across all 18 microbiomes, while 8–22% of the groups were unique to a single microbiome (
Supplementary Fig. 19A–C). The ‘core’ functional groups present in all microbiomes were also highly abundant, representing 93–98% of the total sequences. Given the higher relative abundance of these ‘core’ groups, >95% were found after 26.11±2.02 Mb of sequence was collected from a given microbiome, whereas the ‘variable’ groups continue to increase substantially with each additional Mb of sequence. Of course, any estimate of the total size of the core microbiome will be dependent upon sequencing effort, especially for functional groups found at a low abundance. On average, our survey achieved greater than 450,000 sequences per fecal sample, which, assuming an even distribution, would allow us to sample groups found at a relative abundance of 10
−4. To estimate the total size of the core microbiome based on the 18 individuals, we randomly sub-sampled each microbiome in 1,000 sequence intervals (
Supplementary Fig. 19D). Based on this analysis, the core microbiome is approaching a total of 2,142 total orthologous groups (one site binding hyperbola curve fit to the resulting rarefaction curve, R
2=0.9966), indicating that we have identified 93% of functional groups (defined by STRING) found within the core microbiome of the 18 individuals surveyed. Of these core groups, 71% (CAZy), 64% (KEGG), and 56% (STRING) were also found in the 9 previously published but much lower coverage datasets generated by capillary sequencing of adult fecal DNA
20,21 (average of 78,413±2,044 bidirectional reads/sample; see
Supplementary Methods).
Metabolic reconstructions of the ‘core’ microbiome revealed significant enrichment for a number of expected functional categories, including those involved in transcription and translation (). Metabolic profile-based clustering indicated that the representation of ‘core’ functional groups was highly consistent across samples (
Supplementary Fig. 20), and includes a number of pathways likely important for life in the gut, such as those for carbohydrate and amino acid metabolism (e.g. fructose/mannose metabolism, aminosugar metabolism, and N-Glycan degradation). Variably represented pathways and categories include cell motility (only a subset of Firmicutes produce flagella), secretion systems, and membrane transport (e.g. phosphotransferase systems involved in the import of nutrients, including sugars; and
Supplementary Fig. 20).
The distribution of CAZy glycoside hydrolase and glycosyltransferase families was compared between each pair of microbiomes (see
Supplementary Table 10 for CAZy families with a relative abundance >1%). This analysis revealed that all individuals have a similar profile of glycosyltransferases (R
2=0.96±0.003), while the profiles of glycoside hydrolases were significantly more variable, even between family members (R
2=0.80±0.01; p<10
−30, paired Student’s t-test). This suggests that the number and spectrum of glycoside hydrolases is probably affected by ‘external’ factors such as diet more than the glycosyltransferases.
To identify metabolic pathways associated with obesity, only non-core associated (variable) functional groups were included in a comparison of the gut microbiomes of lean versus obese twin pairs. A bootstrap analysis
23 was used to identify metabolic pathways that were enriched or depleted in the variable obese gut microbiome. For example, similar to a mouse model of diet-induced obesity
4, the obese human gut microbiome was enriched for phosphotransferase systems involved in microbial processing of carbohydrates (
Supplementary Table 12). All gut microbiome sequences were compared against the custom database of 44 human gut genomes: an odds ratio analysis revealed 383 genes that were significantly different between the obese and lean gut microbiome (q-value < 0.05; 273 enriched and 110 depleted in the obese microbiome;
Supplementary Tables 13,14). By contrast, only 49 genes were consistently enriched or depleted between all twin-pairs (see
Supplementary Methods).
These obesity-associated genes were representative of the taxonomic differences described above: 75% of the obesity-enriched genes were from Actinobacteria (vs. 0% of lean-enriched genes; the other 25% are from Firmicutes) while 42% of the lean-enriched genes were from Bacteroidetes (vs. 0% of the obesity-enriched genes). Their functional annotation indicated that many are involved in carbohydrate, lipid, and amino acid metabolism (
Supplementary Tables 13,14). Together, they comprise an initial set of microbial biomarkers of the obese gut microbiome.
Our finding that the gut microbial community structures of adult MZ twin pairs had a degree of similarity that was comparable to that of DZ twin pairs, and only slightly more similar compared to their mothers, is consistent with an earlier fingerprinting study of adult twins
24, and with a recent microarray-based analysis, which revealed that gut community assembly during the first year of life followed a more similar pattern in a pair of DZ twins compared to 12 unrelated infants
25. Intriguingly, another fingerprinting study of MZ and DZ twins in childhood showed a slightly reduced similarity profile in DZ twins
26. Thus, comprehensive time-course studies, comparing MZ and DZ twin pairs from birth through adulthood, as well as intergenerational analyses of their families’ microbiotas, will be key to determining the relative contributions of host genotype and environmental exposures to (gut) microbial ecology.
The hypothesis that there is a core human gut microbiome, definable by a set of abundant microbial organismal lineages that we all share, may be incorrect: by adulthood, no single bacterial phylotype was detectable at an abundant frequency in the guts of all 154 sampled humans. Instead, it appears that a core gut microbiome exists at the level of metabolic functions. This conservation suggests a high degree of redundancy in the gut microbiome and supports an ecological view of each individual as an ‘island’ inhabited by unique collections microbial phylotypes: as in actual islands, different species assemblages converge on shared core functions provided by distinctive components. Our findings raise the question of how core functionality is assembled in this body habitat. Understanding the underlying principles should provide insights about microbial adaptation to, and perhaps mutualistic community assembly within, a wide range of environments.