To demonstrate the utility of MAPPFinder, we used the program to analyze a publicly available mouse microarray dataset, the FVB benchmark set for cardiac development, maturation and aging [14]. This dataset measures gene-expression levels in the hearts of 12.5-day embryos and adult mice. We used the 12.5-day embryonic time point to identify those biological processes that show differentially expressed genes between embryonic and adult hearts. We ran the MAPPFinder analysis on this dataset using two criteria: either an increase (fold change > 1.2 and p < 0.05) or a decrease (fold change < -1.2 and p < 0.05) in gene expression in the 12.5-day embryo. We chose this dataset for demonstration because of the large number of gene-expression differences observed in the 12.5-day embryo compared with adult mouse heart tissue.
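The two criteria can be written as simple per-gene predicates. The following is a minimal sketch, assuming each probe set has already been summarized as a signed fold change (decreases reported as negative values, as the "fold change < -1.2" wording implies) and a p value. The probe set IDs and the `classify` helper are hypothetical, not part of MAPPFinder.

```python
from typing import Dict, Set, Tuple

# Per-probe-set summary: probe set ID -> (signed fold change, p value).
# Assumption: decreases are reported as negative fold changes (e.g. -1.5),
# which is how the "fold change < -1.2" decrease criterion is read here.
ProbeStats = Dict[str, Tuple[float, float]]

FOLD_CUTOFF = 1.2   # fold-change threshold from the text
P_CUTOFF = 0.05     # p-value threshold from the text


def meets_increase(fold_change: float, p_value: float) -> bool:
    """Significantly increased in the 12.5-day embryo."""
    return fold_change > FOLD_CUTOFF and p_value < P_CUTOFF


def meets_decrease(fold_change: float, p_value: float) -> bool:
    """Significantly decreased in the 12.5-day embryo."""
    return fold_change < -FOLD_CUTOFF and p_value < P_CUTOFF


def classify(stats: ProbeStats) -> Tuple[Set[str], Set[str]]:
    """Return the probe sets meeting the increase and the decrease criteria."""
    increased = {pid for pid, (fc, p) in stats.items() if meets_increase(fc, p)}
    decreased = {pid for pid, (fc, p) in stats.items() if meets_decrease(fc, p)}
    return increased, decreased


# Hypothetical probe sets, not taken from the FVB dataset.
example = {
    "probe_a": (2.0, 0.01),   # passes the increase criterion
    "probe_b": (-1.5, 0.03),  # passes the decrease criterion
    "probe_c": (1.3, 0.20),   # fold change passes, p value does not
}
up, down = classify(example)
print(sorted(up), sorted(down))  # ['probe_a'] ['probe_b']
```

Both thresholds must hold at once, so a gene with a large fold change but a non-significant p value (like `probe_c`) meets neither criterion.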
MAPPFinder linked the 9,946 probe sets measured in this experiment to the 11,239 GO terms [12] in the hierarchy and calculated, for each GO term, the percentage of genes meeting the criterion and a z score. Table gives an overall summary of the linkages made between the dataset and GO and of the calculations carried out by MAPPFinder. Nearly half of the 9,946 probe sets measured in the FVB benchmark dataset were connected to a GO term, representing approximately 70% of the mouse genes associated with GO terms [15] and covering a good portion of what is currently known about mouse biology. The proportion of genes in the microarray dataset that link to GO terms will increase as more GO terms and gene associations are added by the Mouse Genome Database (MGD) [16].
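The per-term statistics can be illustrated with a short sketch. The exact z score is defined in Materials and methods; here we assume the standardized-difference form based on the hypergeometric distribution, that is, observed minus expected changed genes divided by the standard deviation for sampling without replacement. The symbols `r`, `n`, `R` and `N` and the function names are our own labels for illustration.

```python
import math


def percent_changed(r: int, n: int) -> float:
    """Percentage of the n measured genes in a GO term that meet the criterion."""
    return 100.0 * r / n if n else 0.0


def term_z_score(r: int, n: int, R: int, N: int) -> float:
    """Assumed hypergeometric standardized difference for one GO term.

    r: genes in the term meeting the criterion
    n: genes in the term measured on the array
    R: genes meeting the criterion across the whole array
    N: genes measured across the whole array
    """
    if n == 0 or N < 2:
        return 0.0
    p = R / N                       # overall fraction of measured genes that changed
    expected = n * p                # changed genes expected in the term by chance
    # Hypergeometric variance: binomial variance times the finite-population correction.
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        return 0.0
    return (r - expected) / math.sqrt(variance)


# Hypothetical array totals (N, R); only r = 17 of n = 20 echoes the
# 'spliceosome' example reported later in the text.
print(percent_changed(17, 20), term_z_score(17, 20, R=100, N=1000))
```

A positive z score means the term contains more changed genes than expected by chance, and a negative value means fewer; this is how the same statistic can flag both over- and under-represented terms.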
Numbers of genes used in the MAPPFinder calculations
After MAPPFinder assigns the genes in the microarray dataset to the GO structure, it calculates, for each GO term, the percentage and z score (see Materials and methods) for the genes that meet the user's criterion. These two values can be used to identify GO terms with an over- (or under-) represented number of gene-expression changes. The MAPPFinder results are displayed in two forms. The first is a GO browser that graphically displays the MAPPFinder results in the structure of the GO hierarchy (Figures ,). The second is a text file listing all the GO terms measured, ranked by z score. The number of genes meeting the criterion, the number of genes measured in the experiment, and the number of genes assigned to each GO term by MGD are given, along with the respective percentages and z score, in both the text file and the GO browser (Figure ). Table shows the list of process, component and function terms with a z score greater than 2 for the significantly increased and decreased criteria at the 12.5-day embryonic time point. GO terms with fewer than 5 or more than 100 changed genes were removed from the list because these terms were either too specific or too general for our data analysis. This filter identified the top 108 (8.0%) GO terms for the significantly increased criterion and the top 63 (4.8%) GO terms for the significantly decreased criterion. The stringency of this filter can be adjusted by raising or lowering the z score cutoff, or by widening or narrowing the allowed range of changed genes per term. The filtered list was then pruned by hand to remove related GO terms from over-represented branches of the GO hierarchy (for the complete results, see Additional data files). When both a parent and a child term were present in the list, the parent term was removed if its presence was due entirely to genes meeting the criterion for the child term.
The remaining terms on the list still have a large degree of interrelatedness, but have been retained here for completeness.
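The filter and the parent-pruning rule can be approximated in code. This is a minimal sketch, not the procedure used here (the pruning in the text was done by hand): it assumes each term's changed genes and direct children are known, keeps terms with z > 2 and 5 to 100 changed genes, and drops a parent whose changed genes are all contained in a single listed child. The `TermResult` type and the term names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TermResult:
    term: str
    z: float
    changed: Set[str]                                   # genes in the term meeting the criterion
    children: List[str] = field(default_factory=list)   # direct child terms


def filter_terms(results: List[TermResult], z_cutoff: float = 2.0,
                 min_changed: int = 5, max_changed: int = 100) -> List[TermResult]:
    """Keep terms above the z cutoff whose changed-gene count is within range."""
    return [t for t in results
            if t.z > z_cutoff and min_changed <= len(t.changed) <= max_changed]


def prune_parents(kept: List[TermResult]) -> List[TermResult]:
    """Drop a parent whose changed genes are all explained by one listed child."""
    by_name = {t.term: t for t in kept}
    survivors = [
        t for t in kept
        if not any(c in by_name and t.changed <= by_name[c].changed for c in t.children)
    ]
    return sorted(survivors, key=lambda t: t.z, reverse=True)   # rank by z score


five = {"g1", "g2", "g3", "g4", "g5"}
results = [
    TermResult("parent term", 3.1, set(five), children=["child term"]),
    TermResult("other parent", 2.2, five | {"g9"}, children=["child term"]),
    TermResult("child term", 4.0, set(five)),
    TermResult("broad term", 2.5, {f"g{i}" for i in range(150)}),  # too general
    TermResult("weak term", 1.5, five | {"g6"}),                    # z too low
]
final = prune_parents(filter_terms(results))
print([t.term for t in final])  # ['child term', 'other parent']
```

'other parent' survives because one of its changed genes (`g9`) is not explained by the child, which mirrors the "due entirely to" condition in the text.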
Figure 2 The MAPPFinder browser. (a) The branch of the GO hierarchy rooted at the biological process term 'RNA processing' is shown. The terms are colored with the MAPPFinder results for genes significantly increased in the 12.5-day embryo versus the adult mice.
Figure 3 Linking MAPPFinder to GenMAPP. (a) The MAPPFinder browser displaying the 12.5-day embryo increased results for the GO process term 'glycolysis'. Color-coding of GO terms is the same as in Figure . (b) Clicking on the GO term glycolysis
MAPPFinder results for genes significantly increased and significantly decreased in 12.5-day embryos versus adult mice
The MAPPFinder results present a global picture of the biological processes, cellular components and molecular functions that are increased and decreased in the 12.5-day embryo compared with the adult mouse (Table ). Using the criterion for a significantly increased gene-expression change, MAPPFinder primarily identified GO terms involved in cell division and growth. Notable GO terms include the processes 'mitotic cell cycle' (62.9% of 70 genes, z score of 8.1), 'mRNA splicing' (90.5% of 21 genes, z score of 7.5), and 'protein biosynthesis' (50% of 104 genes, z score of 6.8). The top-ranked component and function terms reflected the same biological processes. For example, the component term 'spliceosome' shows that 17 out of 20 genes (85%, z score of 6.7) were upregulated. The upregulation of these processes is consistent with the fact that cardiomyocytes remain mitotically active throughout embryonic development [17]. Apart from processes involved in cell division and growth, the MAPPFinder results indicate that the processes 'transmembrane receptor protein serine/threonine kinase signaling pathway' and 'induction of apoptosis' are upregulated, with a z score of approximately 2. The presence of the term 'transmembrane receptor protein serine/threonine kinase signaling pathway' is due to the upregulation of genes involved in transforming growth factor-β (TGFβ) receptor signaling, which is thought to regulate the induction of apoptosis required for morphogenesis during heart development [18].
Genes involved in energy metabolism showed the highest levels of downregulation in the 12.5-day embryo heart versus the adult heart. In particular, the process terms 'fatty acid metabolism' (63.3% of 30 genes, z score of 5.9) and 'main pathways of carbohydrate metabolism' (51.3% of 39 genes, z score of 4.8), which is the parent of the terms 'glycolysis' and 'tricarboxylic acid cycle', indicate that metabolic genes as a whole are downregulated in the embryo compared with the adult mouse. In addition, the component term 'mitochondrion' shows that 88 out of 187 genes (47.1%, z score of 9.1) are downregulated. The downregulation of genes involved in fatty-acid metabolism is consistent with research showing that the developing heart, unlike the adult heart, does not derive its energy from fatty acids [20].
Overall, the MAPPFinder results provide a global perspective of the processes that are up- and down-regulated in the 12.5-day embryonic heart compared to an adult heart. The results confirmed what was expected: when compared to the adult heart, the embryonic heart is undergoing increased cell division and growth and has decreased energy metabolism. In addition, the global gene-expression profile presented by MAPPFinder allows the gene-expression changes observed for cell division and growth and energy metabolism to be put in the context of other regulatory and developmental processes such as TGFβ signaling and apoptosis.
The MAPPFinder browser
Viewing the MAPPFinder results as a ranked list is informative, but it does not take full advantage of the fact that GO is arranged in a hierarchy. MAPPFinder also presents the results in the context of the GO hierarchy (Figures ,), showing the entire hierarchy color-coded by the percentage of genes changed. Users can step through the hierarchy, expanding those branches of the tree that show gene-expression changes and moving from broad terms to more specific categories. The ranked list of terms often contains many interrelated terms, and it is necessary to view the results in the hierarchy to identify the relationships among them. For example, the terms 'RNA metabolism', 'RNA processing', 'mRNA processing', and 'mRNA splicing' appear as upregulated in Table . However, the tree view (Figure ) clearly shows that mRNA splicing is a child term of both RNA splicing and mRNA processing, which are in turn child terms of RNA metabolism. Similarly, the terms 'main pathways of carbohydrate metabolism', 'catabolic carbohydrate metabolism', and 'glycolysis' all appear as downregulated in Table . The MAPPFinder browser (Figure ) shows that 'glycolysis' is related to 'main pathways of carbohydrate metabolism' through the hierarchical relationship between these terms.
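Because GO is a directed acyclic graph, a term can have more than one parent, which is why a ranked list hides these relationships. The sketch below encodes only the 'mRNA splicing' fragment described above as a hand-built child-to-parents map (a hypothetical in-memory structure, not the GO file format or MAPPFinder's internal representation) and enumerates every path up to the root.

```python
from typing import Dict, List

# Hand-built fragment of the GO process hierarchy described in the text
# (child term -> list of parent terms). Real GO has far more terms and edges.
PARENTS: Dict[str, List[str]] = {
    "mRNA splicing": ["RNA splicing", "mRNA processing"],
    "RNA splicing": ["RNA metabolism"],
    "mRNA processing": ["RNA metabolism"],
    "RNA metabolism": [],
}


def paths_to_root(term: str) -> List[List[str]]:
    """Enumerate every path from a term up to a root of the DAG."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [[term] + path for parent in parents for path in paths_to_root(parent)]


for path in paths_to_root("mRNA splicing"):
    print(" -> ".join(path))
# mRNA splicing -> RNA splicing -> RNA metabolism
# mRNA splicing -> mRNA processing -> RNA metabolism
```

Both paths converge on 'RNA metabolism', so all four upregulated terms in the ranked list can trace their signal back to the same splicing genes.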
The MAPPFinder browser also provides three search and navigation functions. First, the user can search by a keyword or an exact GO term name. Second, the user can search by a gene identifier to find which GO term(s) the gene is associated with. For example, searching for the gene alpha-myosin heavy chain using its SWISS-PROT identifier MYH6_MOUSE or its MGD identifier MGI:97255 finds the GO process terms 'striated muscle contraction', 'cytoskeleton organization and biogenesis', 'protein modification', and 'muscle development'. Third, the user can expand the GO tree automatically to show all nodes with a minimum number of genes or minimum percentage of genes meeting the criterion or with a minimum z score. The terms meeting the filter are highlighted in yellow to clearly indicate the results of the search.
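The three browser functions amount to simple lookups over the term-to-gene associations and the per-term results. The sketch below is illustrative only: `TERM_GENES`, `ALIASES` and `TERM_STATS` are hypothetical in-memory tables, populated with the alpha-myosin heavy chain associations quoted above and with result values echoing the text (the changed-gene counts of 44 and 19 are derived from the reported percentages of 70 and 21 genes). They are not MAPPFinder's actual data structures.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical annotation index: GO term -> gene identifiers (MGD IDs).
TERM_GENES: Dict[str, Set[str]] = {
    "striated muscle contraction": {"MGI:97255"},
    "cytoskeleton organization and biogenesis": {"MGI:97255"},
    "protein modification": {"MGI:97255"},
    "muscle development": {"MGI:97255"},
}

# Hypothetical cross-reference: other identifiers -> MGD identifier.
ALIASES: Dict[str, str] = {"MYH6_MOUSE": "MGI:97255"}

# Hypothetical per-term results: (genes changed, percent changed, z score).
TERM_STATS: Dict[str, Tuple[int, float, float]] = {
    "mitotic cell cycle": (44, 62.9, 8.1),
    "mRNA splicing": (19, 90.5, 7.5),
    "spliceosome": (17, 85.0, 6.7),
}


def search_keyword(keyword: str) -> List[str]:
    """Terms whose name contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return sorted(t for t in TERM_GENES if kw in t.lower())


def search_gene(identifier: str) -> List[str]:
    """Terms associated with a gene, given an MGD ID or a known alias."""
    mgi = ALIASES.get(identifier, identifier)
    return sorted(t for t, genes in TERM_GENES.items() if mgi in genes)


def highlight(stats: Dict[str, Tuple[int, float, float]],
              metric: str, minimum: float) -> List[str]:
    """Terms whose chosen statistic ('changed', 'percent' or 'z') meets the minimum."""
    index = {"changed": 0, "percent": 1, "z": 2}[metric]
    return sorted(t for t, values in stats.items() if values[index] >= minimum)


print(search_gene("MYH6_MOUSE"))
print(highlight(TERM_STATS, "z", 8.0))  # ['mitotic cell cycle']
```

Resolving aliases to a single identifier before the lookup is what lets the SWISS-PROT and MGD identifiers return the same four terms.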
Once the GO terms of interest have been identified with MAPPFinder, the user will want to know exactly which genes are associated with these terms and exactly which genes are being differentially expressed. This can be accomplished using GenMAPP. Selecting a GO term in the MAPPFinder browser automatically builds a MAPP containing the genes associated with that GO term and all of its children, and opens this MAPP in GenMAPP. Figure shows the MAPP generated by selecting the GO term 'glycolysis' in the MAPPFinder browser. The genes on the MAPP are color-coded with the same criteria used to calculate the MAPPFinder results, significantly increased and decreased at the 12.5-day embryo time point. Clicking on a gene on the MAPP opens a 'back page' containing annotations, gene-expression data and hyperlinks to that gene's page in the public databases. By integrating GenMAPP and MAPPFinder, it is possible to move seamlessly from a global gene-expression profile at the level of all biological processes, components and functions to a detailed description of the gene-expression levels for the specific genes involved. For example, a closer examination of the glycolysis MAPP indicates that hexokinase I is upregulated in the 12.5-day embryo and isoforms II and IV are downregulated, as compared with the adult heart. This is consistent with hexokinase I being the predominant isoform in the embryonic heart [21].
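Collecting the genes for such a MAPP means taking the term's own genes plus those of all its descendants, then coloring each gene by the criterion it meets. The following sketch assumes simple in-memory tables. `CHILDREN`, `DIRECT_GENES`, the child term and `gene_x` are hypothetical placeholders, and the status labels stand in for GenMAPP's actual colors. Only the hexokinase expression changes come from the text.

```python
from typing import Dict, List, Set

# Hypothetical hierarchy and associations; only the hexokinase isoforms and
# their reported expression changes come from the text.
CHILDREN: Dict[str, List[str]] = {
    "glycolysis": ["hypothetical child term"],
    "hypothetical child term": [],
}
DIRECT_GENES: Dict[str, Set[str]] = {
    "glycolysis": {"hexokinase I", "hexokinase II", "hexokinase IV"},
    "hypothetical child term": {"gene_x (hypothetical)"},
}


def genes_for_mapp(term: str) -> Set[str]:
    """Genes annotated to the term or to any of its descendants."""
    genes = set(DIRECT_GENES.get(term, set()))
    for child in CHILDREN.get(term, []):
        genes |= genes_for_mapp(child)
    return genes


def color_genes(genes: Set[str], increased: Set[str],
                decreased: Set[str]) -> Dict[str, str]:
    """Label each gene with the criterion it meets, as a MAPP color key would."""
    colors = {}
    for gene in sorted(genes):
        if gene in increased:
            colors[gene] = "increased"
        elif gene in decreased:
            colors[gene] = "decreased"
        else:
            colors[gene] = "no criterion met"
    return colors


colors = color_genes(genes_for_mapp("glycolysis"),
                     increased={"hexokinase I"},
                     decreased={"hexokinase II", "hexokinase IV"})
for gene, status in colors.items():
    print(gene, status)
```

Using a set for the union means a gene annotated to several descendant terms appears only once on the generated MAPP.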
Expanding MAPPFinder beyond GO
GO is a good starting point for analyzing microarray data in the context of biological pathways, but it is by no means the only way to group related genes. Instead of representing each GO process as an alphabetical list on a MAPP, it would be more useful to represent the relationships between these genes as a fully delineated pathway. As a start in this direction, GenMAPP.org [13] has created over 50 MAPPs depicting metabolic pathways, signaling pathways and gene families. MAPPFinder can incorporate any MAPP file into its analysis to augment the GO hierarchy. For the FVB benchmark developmental dataset, we have run MAPPFinder on an archive of 54 mouse MAPPs available from [13] (see Additional data files for the complete results). These results for the 12.5-day embryonic time point agree with the GO results, showing that the expression of genes involved in the metabolic pathways 'tricarboxylic acid cycle' (83.3% of 12 genes measured, z score of 5.91) and 'fatty acid degradation' (69.2% of 13 genes measured, z score of 4.82) is significantly decreased. In addition, the significantly increased criterion identified genes encoding ribosomal proteins (71.1% of 45 genes, z score of 6.75) and genes involved in the cell cycle (53.3% of 15 genes, z
The archive of MAPPs provided by GenMAPP is in no way comprehensive. The growth of this archive depends on assistance from the entire biological community. Our hope is that, as MAPPFinder users see the added utility of viewing the GO biological processes as fully delineated pathways, they will use GenMAPP to organize the gene lists into more descriptive biological pathways. Figure gives an example of how the genes from the GO term 'glycolysis' can be rearranged using the tools in GenMAPP to depict the full pathway showing the direction of the enzymatic cascade, metabolic intermediates and cellular compartments. GenMAPP.org is currently accepting submissions of new MAPP files. MAPPs contributed by the community will be included in the downloadable MAPP archive.
MAPPFinder is a necessary complement to current analysis tools
By approaching large datasets from a higher level of organization, MAPPFinder helps to ease data analysis and shorten the time needed to gain a biological understanding of microarray data. MAPPFinder greatly extends current pathway-based tools by drawing on the large body of annotation available from GO. This broad analysis will help identify biological processes that have not yet been implicated in a particular experimental condition and begin to make connections between biological processes previously thought to be unrelated.
MAPPFinder is available for yeast, mouse and human data. We plan to extend the program to many of the other species in GO, and updates will be available at [13].