Due to their close relationship with Animalia and their experimental tractability, filamentous fungi are frequently used as model organisms for understanding diverse aspects of basic cellular regulation, including cell cycle progression, gene expression, circadian timing, light sensing, recombination, secretion, multicellular development (Davis 1966; Ballance 1986; Dunlap et al. 2007), and evolution (Bruns et al. 1991; Kohn 2005). For instance, the multicellular filamentous fungus Neurospora crassa (Galagan et al. 2003; Wu et al. 2007) has been studied in research laboratories for decades (Beadle and Tatum 1941; Davis and Perkins 2002; Dong et al. 2008), primarily owing to its interesting and diverse biology, ease of culture, facile genetics, and rapid growth rate (Borkovich et al. 2004; Merrow and Roenneberg 2007). A number of other fungi, such as Aspergillus nidulans (Galagan et al. 2005a), A. fumigatus (Schoberle and May 2007), Fusarium graminearum (Cuomo et al. 2007), and Magnaporthe grisea (Soderlund et al. 2006), have become important fungal models for animal and plant pathogenesis (Gow et al. 2002; Tunlid and Talbot 2002). Increasingly, gene expression data for filamentous fungi provide a rich resource for the identification of candidate genes and the characterization of systemic biological responses to growth conditions and environmental stresses (Soderlund et al. 2006; Breakspear and Momany 2007). However, few resources have been made available for storing and accessing diverse yet interrelated filamentous fungal gene expression studies.
A number of databases have been constructed to provide storage and management of gene expression data (Demeter et al. 2007; Hong et al. 2008; Barrett et al. 2009; Parkinson et al. 2009). The Stanford Microarray Database (SMD) (Demeter et al. 2007) serves as a research tool that is free of charge for Stanford researchers and their collaborators. The Saccharomyces Genome Database (SGD) (Hong et al. 2008) is a central repository for the budding yeast Saccharomyces cerevisiae, allowing the retrieval of microarray expression data for Saccharomyces (Ball et al. 2001).
Among them, the Gene Expression Omnibus (GEO) database (Barrett et al. 2009) at the US National Center for Biotechnology Information (NCBI) and the ArrayExpress repository (Parkinson et al. 2009) at the European Bioinformatics Institute (EBI) are major databases for public gene expression data. However, these databases simply collect user-submitted results and lack a clear classification of gene expression data. Gene expression metadata include a variety of information from wet-lab experiment to in-silico procedure, which may be classified into different sections. Considering the burgeoning collection of gene expression data, a necessary next step is to make full use of classified expression data to automatically correlate relevant experiments and discover some correlations with different experimental conditions. However, such an operation is not well-supported by these generalized unclassified databases. Moreover, none of these databases provide specific resources to facilitate well-annotated deposition or facile comparison of filamentous fungal gene expression studies.
To simplify data submission and management and to facilitate discovery of correlations among diverse gene expression experiments in filamentous fungi, we constructed a novel, comprehensive, free database, the Filamentous Fungal Gene Expression Database (FFGED; http://bioinfo.townsend.yale.edu), to serve as a collective and collaborative platform for filamentous fungi. Unlike existing related databases, FFGED focuses on comparisons of diverse experimental designs and correlations of different experimental results for filamentous fungi. The aim of FFGED is not only to provide users with friendly web interfaces for facilitating data submission and management, but also to rapidly share gene expression data within the scientific community at large, to automatically identify related gene expression experiments to assist users in connecting their experiments with similar experiments, and to effectively ease virtual comparisons of each gene’s expression under multiple experimental designs within different experiments across diverse fungi.
To provide gene annotations to result files from raw or statistical analyses of microarray data, we retrieved annotations for N. crassa from three databases: PEDANT (Protein Extraction, Description and Analysis Tool; http://pedant.gsf.de) (Riley et al. 2005), the Neurospora crassa e-Compendium (http://bioinformatics.leeds.ac.uk/~gen6ar/newgenelist/genes/), and the Broad Institute of MIT and Harvard (http://www.broadinstitute.org/annotation/genome/neurospora/). Annotation data for Aspergillus fumigatus, A. nidulans, Coccidioides immitis, Coprinus cinereus, Fusarium graminearum, F. verticillioides, Histoplasma capsulatum, and Magnaporthe grisea were obtained from the Broad Institute. Additionally, pathway information for several species was extracted from KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa and Goto 2000).
To ease data analysis, FFGED features an online normalization tool that normalizes two-color arrays as described by Townsend et al. (2003). To help evaluate hybridization quality, an illustrative plot of the normalization regression can be constructed and exported as an image in SVG or PNG format.
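As an illustration of what a normalization regression for a two-color array can look like, the sketch below fits an ordinary least-squares line of Cy5 intensities on Cy3 intensities and uses the fitted line to rescale the Cy5 channel. This is only an assumed, simplified stand-in: the function names, the toy intensities, and the choice of a plain linear fit are hypothetical and are not claimed to reproduce the procedure of Townsend et al. (2003) or the FFGED implementation.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length channels with at least 2 spots")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0.0:
        raise ValueError("x has no variance; the regression is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalize_cy5(cy3: Sequence[float], cy5: Sequence[float]) -> List[float]:
    """Rescale Cy5 onto the Cy3 scale by inverting the fitted line.

    Assumes the fitted slope is non-zero (the channels are correlated).
    """
    slope, intercept = fit_line(cy3, cy5)
    return [(value - intercept) / slope for value in cy5]


# Toy, made-up spot intensities: Cy5 is systematically brighter than Cy3.
cy3 = [100.0, 200.0, 400.0, 800.0]
cy5 = [210.0, 410.0, 810.0, 1610.0]

slope, intercept = fit_line(cy3, cy5)
normalized = normalize_cy5(cy3, cy5)
```

Plotting `cy5` against `cy3` together with the fitted line gives the kind of diagnostic picture the text describes: spots that scatter widely around the line suggest a poor hybridization.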
To link related experiments together, FFGED calculates the cosine similarity coefficient for two experiments. The cosine similarity coefficient is widely used in information retrieval and data mining (Salton and McGill 1983; Baeza-Yates and Ribeiro-Neto 1999; Dhillon and Modha 2001). It is the cosine of the angle (θ) between two vectors of n dimensions when each experiment is represented by its experimental design, which can include multiple developmental, genetic, and/or environmental states. Therefore, given two vectors of experiments x(x1, x2, …, xn) and y(y1, y2, …, yn), the cosine similarity coefficient between x and y is obtained by
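$$\cos\theta = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert},$$
where $x \cdot y = \sum_{i=1}^{n} x_i y_i$ and $\lVert x \rVert = \sqrt{\sum_{i=1}^{n} x_i^2}$.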
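The cosine similarity coefficient can be computed in a few lines. The sketch below assumes that each experimental design is encoded as a binary vector over a shared vocabulary of experimental states (1 if the state is present, 0 otherwise); that encoding, the state names, and the zero-vector convention are illustrative assumptions, not a description of FFGED's internal representation.

```python
import math
from typing import Iterable, List, Sequence

# Hypothetical shared vocabulary of experimental states.
STATES = ["wild type", "light", "dark", "heat shock"]


def design_vector(states: Iterable[str]) -> List[float]:
    """Encode a set of experimental states as a binary vector over STATES."""
    present = set(states)
    return [1.0 if s in present else 0.0 for s in STATES]


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumed convention: an empty design is similar to nothing.
        return 0.0
    return dot / (norm_x * norm_y)


exp_a = design_vector({"wild type", "light", "dark"})
exp_b = design_vector({"wild type", "light", "heat shock"})
sim_ab = cosine_similarity(exp_a, exp_b)  # two shared states out of three each
```

With non-negative vectors such as these, the coefficient lies between 0 (no shared states) and 1 (identical designs), so it can be used directly to order candidate related experiments.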
In FFGED, any data can be submitted in the process of an experiment or after its completion. Entry of an experiment is divided into five sections: experimental Metadata (e.g., experimental name), experimental Design (e.g., cyanine-3 and cyanine-5 treatments), Raw data (e.g., GenePix Results), Normalized details (e.g., normalization methods), and analysis Results (e.g., gene expression levels). This division is hereafter denoted as MDRNR (see Figure 2B). MDRNR is also compatible with MIAME (Minimum Information about Microarray Experiment; http://www.mged.org/Workgroups/MIAME/miame.html) (Brazma et al. 2001), further facilitating data exchange with other institutions. By providing resources for archiving raw data through processed results at each of these stages, FFGED facilitates use as an “active experiment” database, increasing the immediate utility to the user and decreasing the burden of post-experimental submission and annotation.
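One way to picture the MDRNR division is as a single record with five optional parts that are filled in as the experiment proceeds. The sketch below is a hypothetical data model: the class name, field names, and completeness rules are assumptions for illustration and do not reflect FFGED's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The five MDRNR sections, in submission order.
SECTION_ORDER = ["Metadata", "Design", "Raw data", "Normalized details", "Results"]


@dataclass
class ExperimentEntry:
    # experimental Metadata (e.g., experimental name)
    name: str = ""
    # experimental Design (e.g., states assigned to the two dyes)
    design_states: List[str] = field(default_factory=list)
    # Raw data (e.g., names of uploaded scanner result files)
    raw_data_files: List[str] = field(default_factory=list)
    # Normalized details (e.g., the normalization method used)
    normalization_method: Optional[str] = None
    # analysis Results: gene -> experimental state -> expression level
    results: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def completed_sections(self) -> List[str]:
        """Return the MDRNR sections that already contain data."""
        filled = {
            "Metadata": bool(self.name),
            "Design": bool(self.design_states),
            "Raw data": bool(self.raw_data_files),
            "Normalized details": self.normalization_method is not None,
            "Results": bool(self.results),
        }
        return [section for section in SECTION_ORDER if filled[section]]


# An experiment that is still in progress: only the first two sections exist.
entry = ExperimentEntry(name="hypothetical light-response experiment")
entry.design_states = ["dark", "light"]
progress = entry.completed_sections()
```

Keeping every section optional mirrors the "active experiment" idea: an entry is valid at any stage, and the list of completed sections can drive a progress indicator.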
Unlike existing related databases (Killion et al. 2003; Demeter et al. 2007; Hong et al. 2008; Barrett et al. 2009; Parkinson et al. 2009), FFGED functions not only as a storage database but also as an analysis tool that facilitates progression and processing of data through the MDRNR classifications. Several online programs have been integrated into FFGED, for example, normalization (Townsend et al. 2003), generation of an input file for BAGEL (Bayesian Analysis of Gene Expression Levels; Townsend and Hartl 2002), annotation of results, and figure plotting (implemented with JFreeChart, an open-source framework for creation of professional charts; http://www.jfree.org). These tools greatly ease data analysis and visualization. In addition, user-friendly web interfaces with clear progress monitoring of experiment submission are provided for submitters of diverse competencies, lowering technological entrance barriers (Ball et al. 2004).
Any user can submit multiple experiments to FFGED. Each experiment can be set to be private or public. Public experiments are freely accessible by users. MDRNR classifies gene expression data into five different sections, with “experimental design” as the core section that shapes the entries for other sections. Although experiments may differ greatly, similar experimental states submitted in experimental design indicate that experiments may be compared very productively. Therefore, FFGED employs experimental states to link each experiment with similar experiments made public by other users.
For each public experiment in FFGED, detailed information is shown in a tab format according to MDRNR (Figure 2), and related experiments are also displayed according to their similarity coefficients (see Materials and Methods). As a result, related experiments are identified, which can lead to synthetic knowledge discovery. Furthermore, users with similar experiments are connected to each other, enhancing communication and collaboration within the scientific community.
Public results are gene expression levels under a variety of experimental states, extracted from analysis results within public experiments. Such public results provide a forum for the study of the expression of individual genes and their correlations under different experimental states. For each gene, FFGED quantifies expression levels under diverse experimental variables within different public experiments (Figure 3), enabling comparative gene expression analysis.
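The per-gene view of public results amounts to gathering one gene's expression levels from every public experiment that measured it. The sketch below shows that lookup over an in-memory structure; the nested-dictionary layout, the experiment names, the gene identifiers, and the expression values are all made-up placeholders rather than FFGED data or its storage format.

```python
from typing import Dict, List, Tuple

# Assumed layout: experiment name -> gene ID -> experimental state -> level.
PublicResults = Dict[str, Dict[str, Dict[str, float]]]


def expression_profile(
    results: PublicResults, gene_id: str
) -> List[Tuple[str, str, float]]:
    """Collect (experiment, state, level) rows for one gene.

    Experiments that did not measure the gene contribute no rows.
    Rows are sorted by experiment name, then by state, for stable output.
    """
    rows: List[Tuple[str, str, float]] = []
    for experiment in sorted(results):
        by_state = results[experiment].get(gene_id, {})
        for state in sorted(by_state):
            rows.append((experiment, state, by_state[state]))
    return rows


# Made-up public results from two hypothetical experiments.
public_results: PublicResults = {
    "exp_light": {"GENE_A": {"dark": 1.0, "light": 2.5}},
    "exp_heat": {
        "GENE_A": {"control": 1.0, "heat shock": 0.4},
        "GENE_B": {"control": 1.0},
    },
}

profile = expression_profile(public_results, "GENE_A")
for experiment, state, level in profile:
    print(f"{experiment}\t{state}\t{level}")
```

A flat list of rows like this is convenient both for tabular display and as input to a chart of one gene's expression across experimental states.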
To share this information with the scientific community at large and to provide content for other institutions, we provide an XML retrieval mode via the Internet for structured data exchange. In addition, FFGED offers a dynamic interface to embed up-to-the-minute experimental results for the expression of a relevant gene into any web site. For instance, the N. crassa database at the Broad Institute has implemented this tool, embedding experimental results for each gene’s expression into its gene-specific web page (see the FAQ page at http://bioinfo.townsend.yale.edu/faq.jsp#sharing).
Our newly developed database, FFGED, is a collective and collaborative platform that provides a useful tool for filamentous fungal gene expression studies. The AJAX-based database has several unique advantages over other currently available databases; the database provides users with friendly and highly interactive web interfaces, lowers technological barriers and emphasizes ease of data submission, retrieval, visualization, and sharing. Furthermore, unlike existing generalized databases, FFGED provides a specific platform for filamentous fungal gene expression studies, and most importantly, facilitates direct comparisons of diverse experimental designs and correlations of different experimental results for filamentous fungi. Additionally, in FFGED, gene expression data can be shared rapidly within the scientific community, related gene expression experiments can be identified easily, and gene expression systems can be compared virtually under multiple experimental designs across diverse fungi.
Regarding gene expression data storage and management, FFGED provides a flexible framework, MDRNR, which clearly classifies gene expression data into five different sections and can be used with diverse designs and platforms (Burgoon 2006). This classification scheme accordingly makes FFGED completely different from other databases. Compared with extant unclassified databases, FFGED also promises to provide correlations among different experiments that may be easily retrieved and interpreted. Moreover, MDRNR separates wet-bench work conducted by biologists (experimental metadata and experimental design) from dry-bench work done by bioinformaticians (raw data, normalized details and analysis results).
In contrast with some databases which are not free of charge for all users (e.g., SMD), FFGED features free private submission and free public access to public data for any user, with the aim of increasing collaboration by maximizing data sharing and facilitating knowledge discovery from collective data. This feature of knowledge discovery through collective public data is a key element for data-intensive bioinformatics studies (Stajich and Lapp 2006; Zhang et al. 2009).
FFGED has served as an in-house database for gene expression data since October 2007, and has been regularly updated and improved. Potential expansions of this database would be the addition of other fungal genome browsers (http://fungal.genome.duke.edu), facilitating genome annotations and comparative sequence analysis. In addition, providing specific formats for submission of other types of gene expression data, such as MPSS (Massively Parallel Signature Sequencing, Reinartz et al. 2002; Torres et al. 2008) and DGE (Digital Gene Expression) profiling (Linsen et al. 2009), into the expression level tables would facilitate further gene expression comparisons across multiple samples, as with DNA microarrays and other gene expression platforms (Nakano et al. 2006). Moreover, integrating more online tools for gene expression analysis, for example, tools for detecting co-expressed genes within and among experiments, would greatly facilitate expression analysis. Lastly, expanding comparative expression analysis across fungi would be essential for studying their vast and diverse biology and morphology (Galagan et al. 2005b).
In summary, we have constructed the Filamentous Fungal Gene Expression Database, which features user-friendly web interfaces, free access, clear classification of expression data (MDRNR), novel identification of related experiments, and correlation of diverse gene expression levels across multiple experiments. It will facilitate in-depth comparative expression studies on diverse filamentous fungi.
We thank Dr. Jay Dunlap, Dr. Louise Glass, Dr. Matthew Sachs, our colleagues at the Broad Institute of MIT and Harvard, and everyone involved in the Neurospora PO1 functional analysis of a model filamentous fungus project for providing feedback and a highly productive collaborative community. We also thank Dr. Takao Kasuga for valuable suggestions, Dr. Namboori B. Raju for providing an image for the database logo, and many users as well as members of the Townsend lab for reporting bugs and sending comments.
This work was supported by program project grant GM068087 from the National Institute of General Medical Sciences at the US National Institutes of Health.
Author Contributions: ZZ designed, programmed, and implemented this database and drafted the manuscript. JPT contributed core design principles, supervised the development, and revised the manuscript. Both authors have read and approved the final manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.