Meta-analysis of gene expression datasets
Our work aims at developing methodological and computational procedures for the study of metabolic pathways and conserved regulatory mechanisms underlying the fundamental biological response to high fat diet, with additional goal to suggest novel candidate genes belonging to PPARα signaling.
We used a cross-species meta-analysis approach for the integration, at the gene level, of sixteen transcriptional datasets from different organisms (human, mouse, rat and yeast) and experimental platforms (Affymetrix, single and double channels spotted oligonucleotides). The datasets included into the analysis were focused on either genetic or dietary perturbations.
First, we evaluated the presence of possible trend in clustering due to organism, platform, tissue and experimental design variability characterising the sixteen datasets selected for the analysis. After inferential and enrichment analysis on each single dataset, functional similarities among studies have been performed through cluster analysis (see Methods for details). We expected that if some biases would be present in our analysis, datasets should be grouped according to the sources of variability. Figure represents the dendrogram resulting from cluster analysis performed on GO and KEGG enriched classes. As shown in Figure , despite some differences, all the datasets seem to be highly similar, strictly comparable and not grouped according to the mentioned possible sources of variability. Thus, we proceeded with the meta-analysis approach in order to identify a set of marker genes highly relevant for the response to high fat diet and PPARα signaling. The clustering results support the evidence that dietary conditions modulating PPAR signaling and in general, high fat-low fat diet affect a coherent and evolutionary conserved core of genes, that overcomes the differences in gene expression associated to cell type, tissue and organ.
Functional enrichment bootstrap support trees of datasets analyzed. Biological Process (BP), KEGG's pathways (KEGG), Cellular Components (CC) and Molecular Functions (MF).
Analyzing the data intra and inter datasets, the meta-analysis approach allows us to find the most frequently deregulated genes in the tested condition.
Comparing the MDEG list with the DEG lists of datasets analyzed individually, we observe that the meta-analysis led us to identify a smaller number of total genes but biologically more strongly correlated to the studied condition, probably reducing the false positive genes and recovering true positive genes eliminated by a strict statistic at a single study level. A cross-species meta-analysis provides an added value to find conserved genes and for this reason is more reliable.
Meta-analysis approach identified 164 differentially expressed genes shared by at least 6 datasets (MDEGs) (for a complete list of MDEG see Additional File 1
). The 164 MDEGs are selected with the consensus of at least 6 datasets regardless of the species. All the species are represented in the MDEG list but not all MDEGs are represented in all species excepted for mouse. This is due to the different number of organism specific experiments and the evolutionary distance between the species and the reference species. The Venn diagram represents the contribution of the four organisms in defining the MDEGs (Figure ). The 100% of MDEGs (164 genes) are found in mice datasets, the 28.6% (47 genes) are found in human, the 25% (41 genes) in yeast and the 55.5% (91 genes) in rat datasets.
Venn diagram of 164 MDEGs sharing among each organism.
Functional characterization of the MDEGs
To study the identified set of 164 genes, we applied a network-based approach to describe the over-representation of GO categories in our pool of genes. The background organism used for this analysis was mouse, because it best represents all the genes highlighted in this study. This allows assessment of the reliability that our gene set reflects in the response to high fat diet and PPARα signaling as it is known by the literature (Figure &). For details on number of genes on each category, hypergeometric and FDR correction see Additional File 2
Figure 3 Graphical representation of enriched biological process Gene Ontology category in MDEGs. The network represented is a hierarchical network, the arrow from the element A to the element B signifies "the element B is part of the element A". The size of each (more ...)
Graphical representation of enriched cellular component Gene Ontology category in MDEGs. Cellular Component (GO) category over-represented in MDEGs: A) endoplasmic reticulum B) peroxisome C) mitochondrion.
Resulting biological process network (Figure ) could be divided into three areas: i) group A, representing categories linked to amino acid metabolism, ii) group B, composed by carbohydrate metabolism, specifically related to monosaccharide metabolism and gluconeogenesis, and iii) group C, with categories associated to lipid metabolism and transport. These results give an overview of how PPARα modulates gene expression in order to regulate energy metabolisms.
The main over-represented cluster (cluster C Figure ) was composed by lipid related categories, biogenesis, catabolism and transport of lipids. Fatty acid metabolism was one of the fundamental category of the network (Hsd17b4, Ehhadh, Dci and Acox1, Acadl and Acadvl), together with biosynthesis of lipids, represented by the enzyme to elongation of fatty acid (Elovl3, Elovl5, Elovl6) and Stearoyl-Coenzyme A desaturase 1 (Scd1) which catalyzes the rate limiting step in biosynthesis from unsaturated to saturated fatty acids. Moreover we found Me1 (Malic enzyme), which catalyzes the generation of NADPH required for fatty acid biosynthesis. Me1 and Scd1 are known as target genes of PPARα responsible for important parts of lipogenesis [30
]. The activation of PPARα and fatty acid metabolism requires mobilization and transport within the cell and the engagement of various compartments of fatty acids. The categories of fatty acid transport in MDEGs were represented by Cpt1b, Cpt2 and Adfp. Cpt1b is a carnitine palmitoyltransferase enzyme, responsible for the oxidation of fatty acid allowing the translocation across the outer mitochondrial membrane and the starting of fatty acid oxidation. The carnitine palmitoyltransferase II (Cpt2) encodes for an enzyme embedded into inner mitochondrial membrane, that favours the reaction condensating coenzyme A with long-chain fatty acids facilitating the release from mitochondria. Cpt1b and Cpt2 are directly regulated by PPARα [33
]. The intracellular transport of fatty acids to the nucleus and the nuclear receptor was represented in MDEGs by Fabp1, regulated by PPARα [39
The representation of amino-acid metabolism pathway in GO network could be well explained by the interplay between anabolism and catabolism. In case of caloric restriction, amino acids are precursors for lipids, carbohydrates, and nucleic acids used as co-substrates and co-enzymes in the production of energy. On the contrary in case of dietary surplus, the potentially toxic nitrogen from amino acids has to be eliminated via transamination, deamination, and urea formation. Kersten et al. observed that in fasted mice the simultaneous increase in ketone body concentration and decrease in urea concentration are due to the action of a single factor, PPARα, which regulates the transcription of genes involved in the relative pathways, up-regulating fatty acid oxidation genes and downregulating the ureagenesis and ammonia detoxification.
PPARα regulates amino-acid metabolism through several genes, two of them, Cytosolic aspartate aminotransferase (Got1) and argininesuccinate lyase (Asl) were included in MDEGs [42
Finally, Kersten et al. demonstrated that PPARα regulates carbohydrate metabolisms in particular by acting on gluconeogenesis [42
] and governing hepatic glycerol metabolism [44
All this evidence demonstrates that PPARα, despite its primary role in regulating the metabolism of fatty acids, acts as a master regulator of the rate of utilization of the various energy substrates in relation to food availability, and the identified network of biological process provides full details of all these aspects.
In addition to biological processes network, the cellular component network (Figure ) highlights that the principal sites of activity during the response to high fat diet and PPARα signaling activation are the endoplasmatic reticulum (cluster A), the peroxisome (cluster B), and the mitochondrion (cluster C). The over-represented cellular components indicate that mitochondrion was the most over-represented scenario (26.8%; 44 genes) of the differential transcription regulation under the studied stimuli.
The results of pathway enrichment analysis are shown in Additional File 2
. Several MDEGs belong to pathways related to high fat diet and 16 out of 164 MDEGs (10%) belonged to KEGG's Mouse PPARs signaling pathway (Acadl, Acaa1a, Acox1, Cpt1b, Cpt2, Cyp4a14, Cyp8b1, Fabp1, Hmgcs2, Me1, Scd1, Sorbs1, Acsl5, Angptl4, Ehhadh, Acsl3). Each of these 16 genes could be specifically connected to PPARα signal and not to the others different type of PPARs. Focusing on PPARα target genes in the whole set of 164 MDEGs, we found that the transcription of 42 genes (25.6%) was regulated by this nuclear receptor (Additional File 2
). We also observed that only 11 of these 42 genes were characterized by functional PPRE and 5 genes have an in silico
Evolutionary conserved markers of high-fat response
Cross-species analysis allows deciphering molecular complexity through evolutionary constrains. As expected mammalian organisms share the largest amount of genes (Figure ), however, it is interesting to note that 41 yeast ORF (25%) were identified among MDEGs, each of these yeast genes has a homolog in mouse.
Looking at functions and processes linked to the yeast MDEGs, we observed that 7 out 41 genes, (17%; CAT2, FEN1, POT1, FOX2, YAT1, CRC1, SPS19) were directly involved in fatty acid metabolism and transport. CRC1, POT1, YAT1, FOX2 and SPS19 are targets of the transcription induced by Oaf1-Pip2 in yeast [45
]. The 41 yeast genes identified by meta-analysis are localized both in mitochondrion (49%; 20 genes) and in peroxisome (12%; 5 genes). This strongly agrees with findings described in literature, stating that S. cerevisiae
adapts to oleic acid as a sole carbon source inducing transcriptional modulation of both peroxisomal and mitochondrial function [46
]. In addition to lipid metabolism, transcriptional reprogramming induced by oleic acid in yeast, as in the mammalian organisms, deregulates the amino acid metabolism (20%; 8 genes; EHD3, ARG4, CAT2, CYS3, CDC60, YAT1, GLN1, AAT2). Interestingly we discovered 5 yeast genes that are homologous of mammalian genes under control of PPARα. The 5 genes are ADP1 homologous of Abcg2 [47
], AAT2 homologous of Got1 [42
], ARG4 homologous of Asl [42
], FOX2 homologous of Hsd17b4 [48
] and YAT1 homologous of human CPT2 [37
]. Interestingly only two genes Hsd17b4/FOX2 and CPT2/YAT1 appears to share the same transcriptional regulator PPARa in the mammal and Pip2p-Oaf1p in the yeast. This suggests FOX2 and YAT1 as central and evolutionary conserved response elements for the high fat diet response. Moreover these findings indicate that, either the regulatory structure remains to be completely elucidated, the other three genes could represent valid candidate genes for future investigations and that they could be used as model study for mammals genes exploiting the awesome benefits of yeast genetics.
Gene signature validation
Further validation of our methodology and analyses came from the comparison of our gene list with several external expression dataset. We performed a validation for yeast, that have a gene signature composed of 41 MDEGs and a validation for mouse using all the 164 genes. The validation on yeast was performed using a dataset published in GEO database by Smith et al. [50
]. Smith et al. (2007) performed the expression profiles of 4 transcription factor deletion strains (delta_OAF1, delta_PIP2, delta_ADR1 or delta_OAF3) in the presence of oleate and the expression profile of wild type strain in oleate versus low glucose diet. Our 41 yeast genes resulted as a subgroup of the pool of Smith, underlining the consistency of our list of genes in relation to the pathway of signal studied. In order to establish if our list is a useful gene signature to understand the activity of oleate response we selected the expression value of the 41 yeast MDEGs in each experiments of the Smith's dataset and we performed a cluster analysis. This analysis allows us to separate the datasets into two main groups. The first group contains the expression profiles of delta_OAF1, delta_PIP2, delta_ADR1 strains where the oleate-inducible transcription factors are deleted and therefore, the response to oleate diet is repressed. The second group is composed by oleate diet versus low glucose diet dataset, delta_OAF3 strain and our dataset used as a reference. The findings exactly mach to our expectations. As OAF3 is a repressor of oleate-induced transcription, delta_OAF3 strain has a behaviour similar to the activation of transcription induced by oleate. (Tree in Figure ). The same procedure was applied to external expression dataset of M. musculus
. The resultant tree of experiments, obtained clustering by similarity of expression the selected 164 mouse MDEGs, perfectly split the 4 experiments in 2 groups. In the first group we find experiment studying activation of PPARα signaling, in the second group experiment in which there is not PPARα. (Tree in Figure ; Complete Matrix with expression values in Additional File 2
Bootstrap support trees of validation datasets selected for MDEGs. A) Yeast datasets B) Mouse datasets.
The correspondence of the datasets separation to our expectations qualifies the identified genes as putative markers of the biological response to high fat diet.
Transcription factor (TF) screening on MDEGs
Mining the literature we can find that Lemay and Hwang [13
] have already performed a genome-wide screening of PPRE on human genome. Accordingly we compared the list of MDEG with the list of genes with PPRE provided by Lemay and Hwang, finding that 12 MDEGs have in silico
predicted PPRE. We calculated the Fisher exact test p-value to study the overlap. As expected, because of the unbalanced numbers of the comparison, the p-value (p = 0.21) was not significant, although giving an indication of non randomness. However we are aware that the only presence of a PPRE does not necessarily result in a change in gene expression and vice versa many of the changes in gene expression we discover might be due to a not-direct interaction with PPARa or a combined action of more TFs.
With the aim of deeply investigate the presence of other TF binding sites, we accomplished a genome-wide screening of the region flanking the transcription starting site of all mouse MDEGs, searching binding site (TFBS) for all TFs contained in the Jaspar database [51
]. The screening was performed using a tool called oPOSSUM [27
]. Confirming and strengthening our previous results, the analysis shown that PPRE was the only TFBS over-represented in our pool of genes (see Additional File 2
for details). The target genes with predicted PPRE were 18 (Table ). As expected the results of the two TF screening on MDEGs were partially different because of two different analysis pipelines [13
]. Lemay and Hwang performed the screening using the whole human genome instead of the mouse one, they used a quite different sequence consensus for the PPAR binding site and different methods to assess a score to the findings. However the two analyses were similar for the purpose and, despite of the two different organisms used as reference, each method is developed taking into account species comparison. As consequence the human-mouse conserved elements have to be found and, in our opinion, no method is better than the other providing two complementary results.
Table of MDEG with predicted Peroxisome Proliferator Response Element.
We believe that, despite the previous not significant p-value, the overlapping genes might represent an important set of response activators. This belief is strengthened by the fact that the overlap between the results of the two methods is composed by 5 genes (Lpcat3, Hmgcs2, Fabp1, Ccnd1, Cpt1b) all already known as target gene of PPARa (as shown in Table ). This makes us confident that the remaining 20 genes, 13 of the oPOSSUM list and the 7 of Lemay and Hwang list, could be new candidate targets potentially interesting for further investigations.
Co-localization of MDEGs across the genome
The regulatory mechanism at the basis of PPARα induction of transcription is not well understood. The presence of PPRE in the promoter and the simultaneous expression of the gene with the activation of the nuclear receptor are the best criteria required to confirm the regulation of a gene by PPARα. However, often we do not have both of these evidences to identify a target genes. Some genes without PPRE show fatty acid responsive changes in transcription and they seem to be under control of fatty acid regulation. In literature we find evidences supporting the idea that genes regulated by the same transcription factors and/or that sharing biological functions are co-localized in the genome [28
Given the validity of our MDEGs, resulting from all the previous findings, for a better comprehension of the molecular mechanism underneath the high fat response, we explored MDEG genome arrangements across a mouse genome.
To increase the power of analysis we added to MDEG list the known PPARα target found in literature and not in our list. The analysis of the arrangement of the genes reveals a non-random chromosomal location, in particular, MDEGs are often co-localized to compose small group consisting of 2 to 5 genes (see Table in Additional File 2
). Some of these clusters are very interesting for two reasons: (i) they contained genes that have been already demonstrated to be regulated by PPARα and in some cases the PPRE is known, and (ii) they contained genes with similar function and strongly correlated to the known activity of PPARα.
In our opinion, this finding is a further confirmation that the mechanism of transcription regulation operated by PPARα involve epigenetic processes. Indeed in literature we can find molecular and in silico
confirmation of this hypothesis. Lemay and Wang, calculating functional enrichment of genes showing PPRE, have found that one of the most over-represented category was chromatin remodelling. In 1999 Xu et al. demonstrated that the recruitment of transcriptional machinery by nuclear receptors can occur directly or in response to chromatin remodelling, elicited by the dismissing of HDAC (Histone Deacetylase Complex), by ligand and by the recruitment of HAT complex (Histone Acetylase) [52
]. Unfortunately this was not demonstrated specifically for PPARα. However, in recent years substantial effort has been invested in studies of chromatin remodelling complexes associated with transcription factors. In particular, Li et al. have shown that SMARCD1 is the molecular link between SWI/SNF chromatin remodelling complex and PPARα transcription factor. The recruitment of SMARCD1 to PPRE, mediated by PGC-1α, leads to a switch in chromatin structure to an active state [53
]. In yeast, the connection between diet and chromatin remodelling is well studied. The nutritional status and chromatin state are correlated to health state and replicative life span, by mechanism involving sirtuin activation that regulates mitochondrial biogenesis through changing of the acetylation state of the transcriptional coactivator PGC-1α [54
]. The fact that PGC-1α plays important role in epigenetic transcription regulation in both yeast and mammals as a physical link between PPRE bounded by PPARα and chromatin remodelling complex, suggest possible presence of the evolutionary conserved epigenetic regulatory mechanisms.
At the light of these findings, in silico analysis suggest that transcription factor induction and chromatin state seem to be the principal factors mediating the response to excess dietary fat. This probably allows PPARα to bind a PPRE and to regulate more than one gene at the same time.