Elucidating the function of all predicted genes in rice remains the ultimate goal in cereal genomics in order to ensure the development of improved varieties that will sustain an expanding world population. We constructed a gene expression database (RiceXPro, URL: http://ricexpro.dna.affrc.go.jp/) to provide an overview of the transcriptional changes throughout the growth of the rice plant in the field. RiceXPro contains two data sets corresponding to spatiotemporal gene expression profiles of various organs and tissues, and continuous gene expression profiles of leaf from transplanting to harvesting. A user-friendly web interface enables the extraction of specific gene expression profiles by keyword and chromosome search, and basic data analysis, thereby providing useful information as to the organ/tissue and developmental stage specificity of expression of a particular gene. Analysis tools such as t-test, calculation of fold change and degree of correlation facilitate the comparison of expression profiles between any two samples and the prediction of function of uncharacterized genes. As a repository of expression data encompassing growth in the field, this database can provide baseline information on genes that underlie various agronomically important traits in rice.
The sequencing of the Oryza sativa L. ssp. japonica cv. Nipponbare genome has brought to light the genome structure of a major cereal crop that provides food for almost half of the world's population (1). A comprehensive annotation of the high-quality genome sequence identified approximately 32000 genes that define the rice plant, more than half of which do not have known biological functions (2–4). Assigning function to every annotated gene has become an enormous challenge not only for complete understanding of the biology of rice, but more importantly, for efficient utilization of the most basic genome information in crop improvement. Although worldwide efforts to characterize the function of the rice genome by map-based cloning strategies (5), loss-of-function approaches through insertional mutagenesis (6–9), and gain-of-function approaches (10) have made significant progress in the last few years, there is no doubt that functional characterization of all predicted genes in rice will take much longer than anticipated earlier.
Microarray technology has now become a major strategy for analyzing the genome-wide expression of genes and could provide useful clues for functional characterization of the entire genome (11). Several microarray platforms for rice have been successfully used in characterizing tissue and organ development (12–21) as well as specific treatment conditions (22–25). More recently, global gene expression analyses encompassing different cell types (26) and different organs/tissues representing the entire life cycle of the rice plant (27) have been reported. As a result, there is a proliferation of rice microarray datasets in public repositories based on varying platforms, varying experimental conditions, and plant samples from varying genetic backgrounds. Although intra-platform reproducibility of microarray data can be achieved among different commercial array systems (28), more reliable biological information and interpretation of gene expression profiles can best be obtained using a single platform and a model plant variety. In addition, for rice plants normally grown in the field, the entire cycle of growth and development is constantly subjected to changes in environmental conditions, so that a number of factors that may cause subtle changes in gene expression have to be taken into consideration. This is particularly important in clarifying the dynamic nature of gene regulation that determines the expression of certain phenotypic traits and the changes in transcriptome profile for each organ or tissue as initiated by both intrinsic and extrinsic factors throughout the entire growth period. Several databases of rice expression data are available including the National Science Foundation (NSF) Rice Array Database (24), Yale Rice Atlas Database (26) and Collection of Rice Expression Profiles (CREP) Database (27), thereby providing a wide range of rice gene expression data.
Our goal, however, is to establish a comprehensive gene expression profile database that can be used as a standard for understanding genome-wide expression of rice under normal field conditions.
We describe here the rice expression profile database (RiceXPro) of the model rice cultivar, O. sativa ssp. japonica cv. Nipponbare, with a user-friendly web interface for retrieving any expression data of interest. The overall concept of this database is to provide baseline information in terms of expression in different organs or tissues, and specificity in the stage of growth and development that can be used for elucidating the function of the predicted rice genes.
For microarray analysis, we used a rice 4×44K microarray RAP-DB (Agilent Technologies; G2519F#15241). The microarray was designed based on the manually curated annotation of the rice genome as described in the Rice Annotation Project Database (RAP-DB) (2) and consisted of 60-mer oligo sequences corresponding to 27800 RAP loci with transcripts based on rice full-length cDNA (32325) (29), transcripts based on expressed sequence tag (EST) support (6943), and predicted loci (2612). The oligo sequences for all the full-length cDNAs correspond to the alignment of the 3′-UTR for each gene, a region that often shows sufficient specificity, thereby producing high-quality expression data. For a RAP locus with several mapped full-length cDNAs, multiple 60-mer probes were designed based on the corresponding sequences. Therefore, some of the multiple probes designed for a single locus have the same sequence. These redundant probes provide evidence of the replicability and reliability of the expression data and are indicated in the RiceXPro database as described below. Details of our microarray platform are provided in the ‘RICE 44K MICROARRAY’ section of the RiceXPro database.
RiceXPro currently contains gene expression data sets corresponding to spatiotemporal gene expression profiling based on 48 different tissue and organ types at various developmental stages, and continuous profiling of leaves from transplanting until harvesting in the japonica rice cultivar Nipponbare, grown under natural field conditions (Figure 1; manuscript submitted). The two data sets comprise 143 and 51 microarray data with three replicates, and are designated as RXP_0001 and RXP_0003, respectively, in the list of data sets.
All expression data were generated by a one-color (Cy3) microarray hybridization protocol. Background correction of the Cy3 raw signals was performed with the Agilent Feature Extraction software (version 22.214.171.124). The processed signal intensities of 40121 probes corresponding to 27800 loci as described in RAP-DB were used for construction of expression profiles in graph format. Each expression data set was subjected to 75th percentile normalization and log2 transformation using the R program (http://www.R-project.org).
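The normalization step can be illustrated with a minimal sketch. The text states only that 75th percentile normalization and log2 transformation were applied in R; the exact procedure is not given, so the version below assumes one common reading: each array's processed signals are divided by that array's own 75th percentile and the ratio is then log2-transformed. The function names and the linear-interpolation percentile are illustrative choices, not the authors' code.

```python
import math


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, by linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def normalize_array(signals, q=75.0):
    """Scale one array by its q-th percentile, then take log2.

    Assumption: all processed signals are positive, so log2 is defined.
    """
    scale = percentile(signals, q)
    return [math.log2(s / scale) for s in signals]


# Toy array of five processed signal intensities (not real data).
example = normalize_array([1.0, 2.0, 4.0, 8.0, 16.0])
```

After this step the 75th percentile of every array sits at zero on the log2 scale, which is what makes signal intensities comparable across arrays.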
A gene expression database must be structured in a way that facilitates both visualization and analysis of a large amount of data. The RiceXPro database can be accessed through a user-friendly web interface (http://ricexpro.dna.affrc.go.jp/) that provides two search options: keyword search and chromosome search, and two analysis tools: correlation analysis and t-test/fold change (FC) analysis to evaluate the expression pattern based on the signal intensity of genes in different organs/tissues under natural field conditions (Figure 2A).
Keyword search can be used for RAP-DB locus ID, MSU Osa1 Rice Loci, accession number, gene name or any word included in the gene description. In the search option, multiple keyword search can be performed by (i) entering two or more keywords with one keyword per line in the search box to initiate an OR search or (ii) entering several keywords per line with a space in between to initiate an AND search. The chromosome search option provides expression data of genes based on the position in the genome. A graphical representation of the 12 chromosomes allows the user to select a particular chromosome. This search system can provide valuable information in assessing the spatiotemporal specificity of a gene of interest to facilitate an efficient approach in forward genetics.
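The OR/AND semantics of the multiple keyword search can be sketched as follows. This is only an illustration of the rule described above (one keyword group per line gives OR; space-separated keywords on one line give AND); it does not reproduce the RiceXPro implementation. Case-insensitive substring matching, the function names and the shortened toy descriptions are assumptions.

```python
def parse_query(text):
    """Split the search box text into OR groups (lines) of AND terms (words)."""
    return [line.split() for line in text.splitlines() if line.strip()]


def matches(description, query):
    """True if at least one query line has all of its terms in the description."""
    haystack = description.lower()
    return any(all(term.lower() in haystack for term in terms) for terms in query)


def search(records, text):
    """Return the IDs of records whose description satisfies the query."""
    query = parse_query(text)
    return [rid for rid, desc in records.items() if matches(desc, query)]


# Toy records; the descriptions are shortened for illustration.
records = {
    "Os03g0752800": "OsMADS14 protein",
    "Os04g0179700": "9betaH-pimara-7,15-diene synthase (OsKS4)",
}
or_hits = search(records, "MADS\nsynthase")   # two lines: OR search
and_hits = search(records, "MADS synthase")   # one line: AND search
```

In this toy case the OR query returns both loci, whereas the AND query returns none because no single description contains both keywords.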
The search options provide a tabular list of the gene/genes with locus ID (RAP-DB), ‘feature number’ representing the ID number of the probes in the Agilent microarray platform, accession number, probe sequence ID, gene description and MSU Osa1 Rice Loci (Figure 2B). For example, the list of genes generated by the keyword search with ‘MADS’ consisted of 39 loci represented by 97 probes. Mousing over the locus ID opens a pop-up window with direct links to RAP-DB (4), SALAD database (30) and Rice TOGO Browser (http://agri-trait.dna.affrc.go.jp/) to obtain detailed information on annotation, a genome-wide comparative analysis of motifs and genetic mapping data (Figure 2C). In addition, the MSU Osa1 Rice Loci provides direct links to the Rice Genome Annotation Project database (http://rice.plantbiology.msu.edu/). A probe sequence ID is assigned based on the alignment of the probe sequence, so that probes with the same sequence have the same identifier. For example, two probes with feature numbers 8448 and 44218 representing Os01g0883100 have the same sequence (indicated as ‘non-unique’ in probe sequence ID) (Figure 2C). The expression profiles for each gene are shown as two graphs with the raw signal intensity for each sample in triplicate and the normalized signal in log2 scale (Figure 2D). A line graph representing normalized data shows the median value of the triplicate with error bars to indicate the maximum and minimum values. These graphical data can be accessed from the tabular list generated from the search via the feature number column. Placing the mouse pointer on a feature number gives an overview of the expression profile for each gene through a pop-up window (Figure 2C). In association with multiple keyword search and chromosome search, this function allows the user to identify and select a specific gene of interest.
Furthermore, an image of each tissue and organ represented in the graphical data helps in understanding the expression signature of a particular gene. Figure 2D shows the gene expression data of Os03g0752800 that encodes OsMADS14 protein based on spatiotemporal gene expression profiling of organs and tissues at various developmental stages (RXP_0001). OsMADS14 is homologous to an Arabidopsis floral identity gene APETALA1, and has been reported to be induced by Hd3a and RFT1, rice orthologs of Arabidopsis florigen gene FLOWERING LOCUS T (FT) (31,32). The data provided in RiceXPro indicate that OsMADS14 is expressed in reproductive organs as well as vegetative organs after the reproductive phase. This is further confirmed by the expression data of OsMADS14 based on continuous profiling (RXP_0003) as shown in Figure 2E. High-resolution gene expression data as provided in RiceXPro could, therefore, be useful in characterizing genes in rice based on the combined expression pattern in different tissues and organs, and at different stages of growth and development under natural field conditions.
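As a small illustration of how each point of the normalized line graph is summarized, the sketch below reduces one triplicate to its median and to the lengths of the error bars reaching the minimum and maximum values. The function name and the toy values are hypothetical; only the median/minimum/maximum convention comes from the description of the graphs.

```python
from statistics import median


def summarize_triplicate(values):
    """Median of the replicates plus error bar lengths down to min and up to max."""
    mid = median(values)
    return {
        "median": mid,
        "lower_bar": mid - min(values),   # distance from median to minimum
        "upper_bar": max(values) - mid,   # distance from median to maximum
    }


# Toy normalized log2 signals for one sample measured in triplicate.
point = summarize_triplicate([7.5, 8.0, 9.0])
```

Because the bars reach the extreme replicates rather than one standard deviation, they are generally asymmetric around the median.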
In RiceXPro, the graph images for expression data of multiple locus IDs or feature IDs can be downloaded from the tabular list generated by the two search options (keyword search and chromosome search) and the two analysis tools (correlation analysis and t-test/FC analysis). The user is also provided with an option to display the expression data in a heatmap format for a maximum of 100 genes. For these two options, the user must first specify the genes by checking ‘Select all’, ‘Locus Select’ or ‘Feature Select’ buttons, and then select either ‘Download graph’ or ‘Construct a heatmap’ option (Figure 2C). The heatmap can be constructed based on a customized ordering, or alternatively, based on a clustering method of selected genes. For customized ordering, the locus_feature IDs can be rearranged by drag-and-drop (Figure 3A) before initiating the heatmap construction. The clustering method is based on the correlation distance and complete linkage of each gene using heatmap.2 in the ‘gplots’ package of R program (Figure 3A). For the construction of a heatmap, the expression level for each gene across all the data within a data set is normalized by shifting the baseline of median value to zero. A title can be added and the heatmap can be downloaded as PDF with the designated title. Figure 3B shows the heatmap obtained from clustering of 97 probes representing 39 loci generated by keyword search with ‘MADS’. The heatmap option facilitates the visualization and comparison of the expression profiles of multiple genes.
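The two computations behind the clustered heatmap can be sketched in a few lines. RiceXPro performs them in R with heatmap.2; the Python version below is only an illustration, and it assumes that the correlation distance is 1 − r with r the Pearson coefficient, which the text does not state explicitly. The function names and toy profiles are hypothetical, and profiles with zero variance are not handled.

```python
import math
from statistics import median


def median_center(profile):
    """Shift a gene's profile so that its median across samples is zero."""
    m = median(profile)
    return [x - m for x in profile]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed definition: 1 - r, so identical patterns have distance 0."""
    return 1.0 - pearson(x, y)


def complete_linkage_order(profiles):
    """Merge the two closest clusters until one remains; return the leaf order.

    The distance between clusters is the largest pairwise distance
    between their members (complete linkage).
    """
    clusters = [[name] for name in profiles]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(correlation_distance(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Toy profiles over four samples (not real data).
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 3, 2, 1]}
order = complete_linkage_order(toy)
centered = median_center(toy["a"])
```

Median centering changes only the color baseline of each row; it leaves the Pearson correlation, and therefore the clustering, unchanged.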
RiceXPro further provides a retrieval system for the sequences of the genes generated from the search options and the analysis tools. The mRNA sequence, genome sequence, protein sequence and 1 kb or 2 kb sequence upstream of the transcription start site of the genes specified by checking the ‘All Select’ or ‘Locus Select’ button can be downloaded in FASTA format via the ‘Download sequence’ option (Figure 2C). This function helps the user to further examine in detail the function of genes of interest or the regulatory sequences associated with gene expression.
RiceXPro provides information on coexpressed genes based on calculation of Pearson’s correlation coefficients (r-values) of the normalized signal intensities (log2) in each data set (Figure 4A). The coexpression data can be used in speculating on the function of uncharacterized genes of interest and in finding new genes that may be associated with a particular gene function (33–35). In data set RXP_0001 that contains expression profiles derived from a broad range of organs and tissues at various developmental stages, data mining based on coexpression data may facilitate identification of functionally related genes. Both locus ID and accession number can be used as a query in the search function of the coexpression analysis tool (Figure 4A). The search option provides a tabular list of genes in descending order of r-values, with the corresponding locus ID, feature number, accession number, probe sequence ID and description (Figure 4B). The list can be changed to ascending order to identify the genes with negative correlation. Figure 4B shows the gene list generated by search with ‘Os04g0179700’ as a query. The gene corresponding to Os04g0179700 encodes 9βH-pimara-7,15-diene synthase (OsKS4), which is related to the biosynthesis of momilactones, diterpenoid phytoalexins (36,37). The genes coexpressed with OsKS4 include Os04g0180400 (CYP99A2; r=0.9325), Os04g0179200 (OsMAS; r=0.89146), Os04g0178400 (CYP99A3; r=0.88045) and Os04g0178300 (OsCPS4; r=0.87038) (37). Together with OsKS4, these genes are present as a gene cluster on chromosome 4 and are involved in momilactone biosynthesis (37). Although RiceXPro basically provides experimental condition-dependent coexpression data within a data set, the coexpression analysis tool may be used not only to extract genes based on similarity of expression pattern but also to identify various functionally related gene networks in rice.
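The ranking performed by the coexpression tool can be illustrated with a minimal sketch: compute Pearson's r between the query profile and every other profile in the data set, then sort in descending order (or ascending order to bring negatively correlated genes to the top). The identifiers and values below are toy placeholders, not RiceXPro data, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length log2 profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_coexpressed(query_id, profiles, descending=True):
    """List (id, r) for every other profile, sorted by r against the query."""
    query = profiles[query_id]
    scored = [(other, pearson(query, values))
              for other, values in profiles.items() if other != query_id]
    return sorted(scored, key=lambda pair: pair[1], reverse=descending)


# Toy normalized profiles over four samples (placeholder IDs).
profiles = {
    "query": [1.0, 2.0, 3.0, 4.0],
    "up": [2.0, 4.0, 6.0, 8.0],
    "down": [4.0, 3.0, 2.0, 1.0],
    "weak": [1.0, 3.0, 2.0, 2.0],
}
positive_first = rank_coexpressed("query", profiles)
negative_first = rank_coexpressed("query", profiles, descending=False)
```

Because r is computed within a single data set, the ranking reflects only the samples in that data set, which is the condition dependence noted above.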
The t-test (Welch’s t-test) and FC analysis tools in RiceXPro can be used to extract genes with significant differences in expression levels between two samples (Figure 4A). Options are available for P-values at the <0.01 and <0.05 significance levels. For FC analysis, options to identify up- or downregulated genes can be selected from 2- to 10-fold. Any two samples can be compared to obtain a list of genes with the corresponding FC values and P-values in descending order of FC values. Figure 4C shows the gene list generated by t-test and FC analysis with a setting of P-value (<0.01) and FC (>10) between leaf blade_vegetative_12:00 as a control sample and leaf blade_vegetative_00:00 as a test sample. As shown in the list, Os04g0583900 shows a 767-fold higher level of expression at nighttime as compared with daytime. The tabular list generated by this analysis can be downloaded as a text file. This analysis tool can be useful in extracting organ/tissue- or developmental stage-specific expressed genes, finding novel genes that may be associated with organ/tissue development, and searching for candidate promoters that can be used for transgenic approaches.
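For one probe, the comparison of a control and a test sample can be sketched as follows. The code computes Welch's t statistic with the Welch-Satterthwaite degrees of freedom, and a fold change from normalized log2 values. How RiceXPro derives FC (for instance from mean log2 signals, as assumed here, or from raw intensities) is not stated in the text, so that part is an assumption, as are the function names and toy values.

```python
import math
from statistics import mean, variance


def welch_t(control, test):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(control), len(test)
    v1 = variance(control) / n1   # squared standard error of the control mean
    v2 = variance(test) / n2      # squared standard error of the test mean
    t = (mean(test) - mean(control)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def fold_change(control_log2, test_log2):
    """Assumed definition: 2 to the power of the difference of mean log2 signals."""
    return 2.0 ** (mean(test_log2) - mean(control_log2))


# Toy normalized log2 signals, three replicates per sample.
control = [1.0, 2.0, 3.0]
test = [4.0, 5.0, 6.0]
t_value, dof = welch_t(control, test)
fc = fold_change(control, test)
```

Turning the t statistic and degrees of freedom into a P-value requires the Student's t distribution; in Python, `scipy.stats.ttest_ind(test, control, equal_var=False)` performs the whole Welch test if SciPy is available. With only three replicates per sample the degrees of freedom are small, which is one reason to combine the P-value threshold with an FC threshold.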
RiceXPro is designed to provide a comprehensive gene expression profile of all rice genes under different experimental conditions. The current version focusing on growth and development under natural field conditions will be further expanded to include gene expression profiles of various tissues and organs at different intervals during the growth process. Microarray analyses involving rice plants treated with various hormones, and rice plants subjected to biotic and abiotic stresses, are also being undertaken. The field transcriptome data could then be used as a reference for comparative expression data analysis, thereby enhancing our understanding of many biological processes that determine the productivity of an important agricultural crop.
Genomics for Agricultural Innovation (RTR0002), Ministry of Agriculture, Forestry and Fisheries (MAFF) of Japan to Y. Nagamura. Funding for open access charge: National Institute of Agrobiological Sciences.
Conflict of interest statement. None declared.
We thank Hajime Ohyanagi and Hiroshi Ikawa (Mitsubishi Space Software Co. Ltd) for critical reading of the manuscript, Ritsuko Motoyama for microarray analysis, and the members of the Genome Resource Center for preparation of samples.