General description of microarray data
An in-house cDNA array system was used in this study (see Methods). The system consists of 10,642 amplicons of unisequences derived from cDNA libraries from several developmental stages of Brassica. The platform has been deposited in the Gene Expression Omnibus (GEO) database under accession number GPL8090. The quality of the array was previously evaluated with different RNA sources [28]. Approximately 55% of the unisequences in our endosperm EST dataset are represented on this cDNA array. Two-color labeled cDNAs from endosperm at three stages of embryo development (globular-shape embryo, heart-shape embryo and cotyledon) were used for pairwise comparisons. Each comparison was performed with two biological repeats and four technical repeats, including color-swap hybridizations. Details are available in GEO under accession number GSE14766 [29]. The reproducibility of the microarray hybridization and the data quality were assessed through correlation analyses and RT-PCR. The correlation coefficient between duplicate spots within an array ranged from 0.97 to 0.99, and the correlation coefficient of the signal ratio for each tissue pair between technical repeats with the same color labeling ranged from 0.94 to 0.99. Furthermore, 12 unisequences were selected and subjected to RT-PCR. The expression levels of these genes covered a wide spectrum, allowing us to assess the resolution and relative accuracy of the microarray data. There was good correspondence (r² = 0.846) between the RT-PCR log2 ratio and the microarray log2 ratio (Figure and Figure ). These results confirmed the overall reliability of the microarray expression data.
Patterns of 12 unisequences that were identified by microarray and validated with RT-PCR. (G) globular-shape embryo stage; (H) heart-shape embryo stage; (C) Cotyledon stage.
Correlation of gene expression ratios between cDNA microarray and RT-PCR. The gene expression ratios between tissue-pair comparisons were transformed with log2 ratio.
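The comparison between the two platforms can be illustrated with a minimal sketch, assuming a simple Pearson correlation on paired log2 ratios. The function names and the numbers below are hypothetical and are not data from this study; the sketch only shows how an r² value of this kind is typically obtained.

```python
import math


def log2_ratio(numerator, denominator):
    """Return log2(numerator / denominator) for two positive signal values."""
    return math.log2(numerator / denominator)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements for four genes (illustration only).
microarray_log2 = [
    log2_ratio(4.0, 1.0),
    log2_ratio(1.0, 2.0),
    log2_ratio(3.0, 1.0),
    log2_ratio(1.0, 1.0),
]
rtpcr_log2 = [2.4, -0.8, 1.3, 0.2]

r = pearson_r(microarray_log2, rtpcr_log2)
r_squared = r ** 2
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a twofold increase and a twofold decrease sit at +1 and -1 rather than at 2 and 0.5.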
The microarray data were normalized and subjected to rigorous preprocessing and filtering (see Methods). Using Rank Product [30], we identified 1,229 unisequences that were significantly differentially expressed during endosperm development (see Additional file 4). Twenty-four clusters were identified using a pattern-based clustering technique [31] (see Additional file 5). Clusters 3 and 4 (Figure 4) are the two major clusters, collectively including 63% of all differentially expressed unisequences. Their patterns of change are mirror images of each other. The distribution of GO biological process annotations was calculated for each cluster (see Additional file 6).
Figure 4 Schematic description of the 24 pattern-based clusters. The 1,229 differentially expressed unisequences were classified into 24 clusters using a hierarchical algorithm. The mean signal ratio (in log2 scale) for unisequences in each cluster is ...
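The core statistic behind the Rank Product method can be sketched as follows. This is a minimal illustration, assuming the usual definition (the geometric mean of a gene's ranks across replicates); it omits the permutation step normally used to estimate significance, and it is not the implementation used in this study. The function names and example values are hypothetical.

```python
import math


def ranks_descending(values):
    """Rank values so the largest gets rank 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def rank_products(replicates):
    """Geometric mean of per-replicate ranks for each gene.

    `replicates` holds one list of log2 ratios per replicate, with genes
    in the same order in every replicate.
    """
    n_genes = len(replicates[0])
    if any(len(rep) != n_genes for rep in replicates):
        raise ValueError("all replicates must cover the same genes")
    per_rep_ranks = [ranks_descending(rep) for rep in replicates]
    k = len(replicates)
    # Sum of logs divided by k, then exponentiated, gives the geometric mean.
    return [
        math.exp(sum(math.log(rep_ranks[g]) for rep_ranks in per_rep_ranks) / k)
        for g in range(n_genes)
    ]


# Hypothetical log2 ratios for four genes in three replicates (illustration only).
example_replicates = [
    [2.1, 0.1, -1.5, 0.9],
    [1.8, -0.2, -1.1, 1.2],
    [2.5, 0.3, -0.9, 0.4],
]
example_rp = rank_products(example_replicates)
print(example_rp)
```

A small rank product marks a gene that is consistently near the top of the up-regulated list in every replicate. Ranking the negated ratios in the same way identifies consistently down-regulated genes.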
Identification of developmental stage favored genes
Groups favored at particular embryo stages can be identified from the clusters, as detailed in Additional file 7. The 24 clusters can be grouped into four classes: globular-shape embryo stage favored, heart-shape embryo stage favored, cotyledon stage favored, and others that do not fall into any of these groups (Figure ). In general, differential expression was most evident at early stages of endosperm development, that is, in the comparisons between the globular-shape and heart-shape embryo stages and between the globular-shape embryo and cotyledon stages. The difference in gene expression between the heart-shape embryo and cotyledon stages was less pronounced.
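The idea of a stage-favored class can be illustrated with a small sketch that assigns a mean expression profile to the stage at which it peaks. This is only an assumption-laden illustration, not the procedure used to build the classes described here: the function name `favored_stage`, the use of two sequential log2 ratios, and the `tolerance` threshold are all hypothetical.

```python
def favored_stage(log2_h_vs_g, log2_c_vs_h, tolerance=0.5):
    """Return the stage with peak expression, or "other" if no clear peak.

    Expression at the globular stage is taken as the reference (0.0 on the
    log2 scale); the heart and cotyledon levels follow from the two ratios.
    `tolerance` is a hypothetical cutoff on the gap between the two highest
    levels, below which the profile is not assigned to any single stage.
    """
    levels = {
        "globular": 0.0,
        "heart": log2_h_vs_g,
        "cotyledon": log2_h_vs_g + log2_c_vs_h,
    }
    ranked = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    (top_stage, top_level), (_, second_level) = ranked[0], ranked[1]
    if top_level - second_level < tolerance:
        return "other"
    return top_stage


# Hypothetical cluster mean profiles (illustration only).
print(favored_stage(-2.0, -0.5))  # declines after the globular stage
print(favored_stage(1.5, -1.0))   # rises to heart, then falls
print(favored_stage(2.0, 0.1))    # rises, then stays roughly flat
```

Under this scheme a profile that rises sharply from the globular-shape to the heart-shape embryo stage and then changes little is not assigned to a single stage, which matches the description of clusters that do not peak at any particular stage.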
Genes of the globular-shape embryo stage favored class, which includes clusters 4, 5, 9, 19, 24 and 25, show peak expression at the globular-shape embryo stage. Most genes involved in photosynthesis, such as those encoding light-harvesting complex proteins and chlorophyll-binding proteins, were in this group. A large proportion of the genes in this class (55%) still have unknown functions.
Clusters 17, 18, 22 and 23 constitute the heart-shape embryo stage favored class. Genes in these clusters showed an increase in transcript level from the globular-shape embryo to the heart-shape embryo stage, whereas the transcript level in endosperm at the cotyledon stage was generally lower than that at the heart-shape embryo stage. This group includes genes for the biochemical machinery of protein turnover, such as ubiquitin-conjugating enzyme 16 (UBC16), ubiquitin 11 (UBQ11) and cysteine proteinase, as well as heat shock proteins such as HSP70B, HSP101 and DNAJ. Genes encoding pectinesterase family proteins (CN732993, AT5G47500; here and below, the Brassica EST accession is followed by its Arabidopsis homolog) and cellulose synthase (CN727050, AT5G09870) also reached high expression levels in endosperm at the heart-shape embryo and cotyledon stages. Similarly, four genes encoding cellulose synthase (CESA1/CESA2) and α-1,4-glucan-protein synthase (RGP4) were up-regulated at this particular stage. We also identified two genes encoding auxin-responsive proteins and one encoding gibberellin 20-oxidase, whose expression levels increased from the heart-shape embryo to the cotyledon stage. The up-regulation of these genes suggests that these plant growth regulators may play a key role in regulating endosperm development at the heart-shape embryo stage.
The cotyledon stage favored class encompasses clusters 7, 8, 11, 12 and 13, which reached their peak expression levels at the cotyledon stage. Most prominent in this group are transcription factors, some of which are known to play a role in cotyledon development and storage product synthesis. These include LEC1 (CN732092, AT1G21970), basic Leucine Zipper 25 (bZIP25, EE436021, AT3G54620), HIGH MOBILITY GROUP AT-hook (HMGA, ES265203, AT1G14900) and WRI1 (DY013242, AT3G54320), which were up-regulated by 2.5- to 9.4-fold. Genes for myo-inositol-1-phosphate synthase (MI-1-P synthase, CN737485, AT4G39800) and the starch degradation enzymes α-amylase (AMY1, CN727248, AT4G25000) and β-amylase (BMY1, DY011386, AT4G15210) also showed a gradual increase in expression that peaked at this stage.
Nine clusters do not seem to peak at any particular developmental stage and therefore do not fall into any of the above groups. Cluster 3, by far the largest cluster, contains 423 unisequences that show rapid transcript accumulation from the globular-shape embryo to the heart-shape embryo stage, but only minor changes between the heart-shape embryo and cotyledon stages. This cluster includes genes encoding enzymes involved in fatty acid biosynthesis, such as 3-ketoacyl-acyl carrier protein reductase (CN731455, AT1G24360), pyruvate dehydrogenase (CN737211, AT1G01090), acyl-(acyl-carrier protein) desaturase (DY004112, AT2G43710) and β-hydroxyacyl-ACP dehydratase (CN734884, AT5G10160). Several genes that were represented by the most abundant ESTs in our endosperm EST dataset, including those in the lipid transfer protein (LTP) family (CX266460, AT1G62790; EE435630, AT3G08770; EE541123, AT4G30880; EE543887, AT5G38160; EE541128, AT5G38195 and CN732718, AT5G64080) and the putative plastocyanin-like domain-containing proteins (CN737293, AT2G23990; CN730000, AT2G25060; CN731273, AT4G31840; CN736723, AT4G32490; CN735105, AT5G15350 and CN737110, AT5G57920), also exhibited enhanced expression levels from the globular-shape embryo to the heart-shape embryo and cotyledon stages.