The microbial community of the human gastrointestinal (GI) tract comprises more than 1000 species, whose combined genomes may contain 100 times more genes than the human genome. From this perspective, the gut microbiota can be viewed as an organ that regulates its host’s metabolic and immune systems [1]. GI microorganisms are in fact known to contribute to diverse human processes: preventing colonization and attack by pathogens, regulating the immune system through a number of signal molecules and metabolites, aiding the development of intestinal microvilli, breaking down non-digestible polysaccharides, and ensuring the anaerobic metabolism of peptides and proteins, which recovers metabolic energy for the host [3]. Although definitive proof remains to be provided, growing evidence indicates that the GI microbiota play a crucial role in the progression of human diseases, in particular metabolic syndromes (MS) such as obesity, diabetes and hypertension [5–7].
To better understand the mechanisms of MS development, early research devoted considerable effort to the study of host genomic variation. Now that the GI microbiota appear to be involved to some extent in such diseases, more attention is being paid to disruptions of the gut microbiota. This metagenomic approach to disease challenges researchers in many ways, from a paradigm shift that revises the primacy of the genomic etiology of diseases, to more practical issues related to the complexity and sheer size of GI microbiota data.
The actual connection between variations in the GI microbiota and the onset of obesity and other, more complex MS is still strongly debated among scientists, and is not the subject of this paper. In this study, we concentrate on identifying methodologies able to highlight relationships between variations in GI microbiota composition and obesity (a well-known consequence of fat feeding, related to MS [8–12]), as measured in terms of impaired glucose tolerance and fat mass development, making use of tools that are well developed and validated in other areas of research.
Based on the observation that the complexity and size of GI microbiota data are traits shared with high-throughput transcriptional data, which are the object of study of functional genomics, we sought to adapt some of the well-tested and widely used tools for gene expression analysis of DNA microarray data to metagenomic studies. Figure 1 indicates schematically how the two types of data can be considered from this perspective. To clarify further, we briefly summarize the main areas of research in functional genomics. First, functional genomics is interested in mining genes significantly related to a biological query of interest (e.g. genes differentially expressed in healthy versus diseased patients): this is achieved with methodologies broadly classified as supervised and unsupervised [13]. Such approaches group genes based on their mutual similarity (for a review see ref. [14]), or in terms of their resemblance to some external trait (e.g. significance analysis of microarrays, SAM [15], and gene-set enrichment analysis, GSEA [16]), based on the over- or under-expression of genes across samples. Second, functional genomics is interested in identifying interactions among the selected genes (gene-network inference approaches; for reviews see ref. [17]). Third, through the use of statistical methods, functional genomics infers the function of the selected genes from previous knowledge (a controlled vocabulary of terms describing gene functions, the Gene Ontology [19]). We will show that some of these problems and their solutions can be advantageously adapted to the investigation of the role and activity of GI microorganisms.
Figure 1: Scheme representing the parallelism that can be drawn between functional genomic and metagenomic data. This parallelism is crucial for the understanding of the whole approach, since it has allowed us to adapt the methodologies largely developed in functional ...
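As an illustration of this parallelism, the sketch below treats an OTU abundance table exactly as one would treat an expression matrix (rows as OTUs in place of genes, columns as samples in place of arrays) and scores each OTU with a plain label-permutation test. This is a simplified stand-in written in the spirit of supervised methods such as SAM, not SAM itself and not the procedure used in this paper; the function name `permutation_pvalues`, the table layout and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def permutation_pvalues(table, labels, n_perm=1000, seed=0):
    """Per-OTU permutation p-values for a two-group comparison.

    table  : (n_otus, n_samples) array; rows play the role of genes,
             columns the role of arrays (hypothetical layout).
    labels : boolean array of length n_samples, True for one group.
    """
    rng = np.random.default_rng(seed)
    table = np.asarray(table, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def stat(lab):
        # Absolute difference of group means, one value per OTU.
        return np.abs(table[:, lab].mean(axis=1) - table[:, ~lab].mean(axis=1))

    observed = stat(labels)
    exceed = np.zeros(table.shape[0])
    for _ in range(n_perm):
        # Shuffling the labels breaks any link between group and abundance.
        exceed += stat(rng.permutation(labels)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Synthetic data (not from the study): 5 OTUs, 20 samples, two groups of 10.
rng = np.random.default_rng(1)
table = rng.normal(10.0, 1.0, size=(5, 20))
labels = np.array([True] * 10 + [False] * 10)
table[0, labels] += 5.0  # only OTU 0 is made to differ between the groups

pvals = permutation_pvalues(table, labels)
print(pvals)
```

With many OTUs, the resulting p-values would still need a multiple-testing correction, which is the step that dedicated methods such as SAM address through false discovery rate estimation.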
In particular, we adapt and apply these approaches to data from a recent work [20], which investigated 10 knockout mice (K), a genetic model of insulin resistance leading to impaired glucose tolerance (IGT), and their 10 wild-type counterparts (W), both on normal-chow (N) and high-fat (F) diets. The aim of that work was to characterize the relative contributions of host genetics and diet-disrupted gut microbiota to obesity. Gut microbiota samples were harvested from fecal matter, high-throughput sequence data of the 16S rRNA gene were obtained by barcoded 454 pyrosequencing, and the original sequences were merged into 516 operational taxonomic units (OTUs) based on phylogenetic distance, from which a final set of 65 OTUs was identified as relevant. Overall, that work led to the conclusion that diet contributes more than host genotype to the onset of obesity: because diet is more effective in altering the composition of the GI microbiota, it is, according to these results, statistically more relevant than host genotype in explaining obesity and impaired glucose tolerance in mice.
The aim of the current work is to corroborate the results of ref. [20] with an independent and systematic approach able to compare groups, and combinations of groups (defined by genotype and diet), and to assess their influence on the variation in GI microbiota composition. In particular, the further interpretation of the aforementioned final 65 OTUs and their subgroups represents our gold standard (GS) for comparison. Given the encouraging results, we believe that this work represents an interesting proof of principle that functional genomic approaches can be adapted to metagenomics.
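To make the comparisons between groups and between combinations of groups concrete, the following minimal sketch enumerates the contrasts implied by the two-factor design (genotype W/K crossed with diet N/F). The two-letter group labels such as `WN` and the `contrasts` helper are naming assumptions introduced for illustration only; the sketch lists which samples would be compared, and does not perform any statistical test.

```python
from itertools import combinations

GENOTYPES = ("W", "K")  # wild-type, knockout
DIETS = ("N", "F")      # normal chow, high-fat diet

# The four experimental groups of the two-factor design: WN, WF, KN, KF.
GROUPS = [g + d for g in GENOTYPES for d in DIETS]

def contrasts():
    """Return a dict mapping a contrast name to (groups_a, groups_b)."""
    out = {}
    # Comparisons between single groups, e.g. WN vs KF.
    for a, b in combinations(GROUPS, 2):
        out[f"{a} vs {b}"] = ([a], [b])
    # Comparisons between combinations of groups: pool over the other factor.
    out["diet: N vs F"] = (
        [g + "N" for g in GENOTYPES],
        [g + "F" for g in GENOTYPES],
    )
    out["genotype: W vs K"] = (
        ["W" + d for d in DIETS],
        ["K" + d for d in DIETS],
    )
    return out

all_contrasts = contrasts()
for name, (side_a, side_b) in all_contrasts.items():
    print(f"{name}: {side_a} against {side_b}")
```

Each contrast defines a two-class labelling of the samples, so any supervised two-group method can be applied to it in turn and the pooled diet and genotype contrasts compared for the size of their effect.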