The selected winter wheat cultivar, Hereward, was released in the UK in 1991, but is still widely grown and has remained a "gold standard" for breadmaking wheats.
Grain development of cv. Hereward
Approximately 33,000 of the 61,000 probesets on the array showed significant binding to transcripts and, of these, 14,550 showed significant differences in transcript abundance between developmental stages. (Note: Transcriptomics is the measurement of transcript abundance, but here we follow common practice using the terms 'gene expression' and 'transcript abundance' interchangeably, so that both include variation in transcript degradation). The profiles of this latter set during grain development are summarised with the changes in seed dry and fresh weights in Figure . Hierarchical cluster analysis of the whole dataset (Figure ) shows that biological replicate samples cluster together, with successive changes in the patterns from 6 to 42 daa. Furthermore, three broad phases are indicated, with the samples from 6, 8 and 10 days, 12, 14, 17 and 21 days, and 28, 35 and 42 days forming separate clusters.
Figure 1 Data from two biological replicate samples of developing wheat caryopses. Upper panel: grain fresh weight and dry weight (error bars are least significant difference at P < 0.05 from 1-way analysis of variance). Lower panel: transcriptome of samples ...
Hierarchical clustering of samples by gene expression. Gene set is same as that in Figure 1. Co-expression measure was Pearson correlation. All replicates cluster as pairs, for which the stage is shown as days after anthesis.
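The sample clustering described above can be illustrated with a minimal sketch. It assumes a distance of 1 minus the Pearson correlation and average linkage; only the Pearson co-expression measure comes from the text, while the linkage rule, the function names and the toy profiles are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cluster_samples(profiles, n_groups):
    """Agglomerative clustering of samples on the distance 1 - Pearson r.

    profiles: dict mapping sample name -> list of expression values
              (same probeset order in every sample).
    Average linkage is an assumption made for this sketch.
    Returns a list of sets of sample names.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}
    clusters = [{name} for name in names]
    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]   # merge the two closest clusters
        del clusters[j]
    return clusters

# Hypothetical toy data: two replicates at each of three stages, five probesets.
toy = {
    "d06_a": [9.0, 8.0, 2.0, 1.0, 1.0], "d06_b": [9.2, 7.9, 2.1, 1.2, 0.9],
    "d14_a": [2.0, 3.0, 9.0, 8.0, 2.0], "d14_b": [2.1, 2.8, 9.1, 8.2, 1.9],
    "d35_a": [1.0, 1.0, 2.0, 4.0, 9.0], "d35_b": [1.1, 0.9, 2.2, 3.8, 9.1],
}
groups = cluster_samples(toy, 3)
```

With these toy values each replicate pair ends up in its own group, mirroring the pairing of biological replicates reported for the real dataset.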
These phases can also be seen in the highly distinctive pattern of normalised transcript abundance (Figure ) which shows massive changes in expression for many genes in the switch from 10 to 12 days and again from 21 to 28 days. Similar changes can be seen in the smaller set (2,237) of differentially regulated transcripts identified by cDNA arrays [4] and in the profiles of the 250 most abundant SAGE tags [8].
It is useful to aggregate genes with similar expression profiles to find if probesets with particular properties are over-represented in these clusters. The dataset comprised hundreds of statistically significant different gene expression profiles, but for display purposes we chose to aggregate these into the 12 sets shown in Figure using the Self-Organising Map algorithm. This is appropriate for an overview display as it places similar gene clusters next to each other, so each dimension represents a progressive change in the expression profile of the cluster.
Figure 3 Day of anthesis versus normalised expression of probesets grouped into clusters according to their expression profiles by Self-Organising Map algorithm. Each graph represents a different cluster and at the top of the graph the cluster name (which is the ...
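To make the Self-Organising Map step concrete, the following is a minimal sketch of how expression profiles could be assigned to a 3 x 4 grid of clusters. The grid orientation (column_row naming), the learning-rate and neighbourhood schedules, and the toy profiles are all assumptions chosen for illustration; they are not taken from the analysis software used for the figure.

```python
import random
from math import exp

def best_node(nodes, profile):
    """Grid position whose weight vector is closest (squared Euclidean) to the profile."""
    return min(nodes, key=lambda pos: sum((w - x) ** 2
                                          for w, x in zip(nodes[pos], profile)))

def train_som(profiles, n_cols=3, n_rows=4, n_iter=2000, seed=0):
    """Train a small rectangular SOM; returns {(col, row): weight vector}."""
    rng = random.Random(seed)
    # Initialise every node from a randomly chosen input profile.
    nodes = {(c, r): list(rng.choice(profiles))
             for c in range(1, n_cols + 1) for r in range(1, n_rows + 1)}
    for t in range(n_iter):
        frac = 1.0 - t / n_iter              # decays from 1 towards 0
        rate = 0.5 * frac                    # hypothetical learning-rate schedule
        radius = 0.5 + 1.5 * frac            # hypothetical neighbourhood width
        x = rng.choice(profiles)
        bc, br = best_node(nodes, x)
        for (c, r), w in nodes.items():
            grid_d2 = (c - bc) ** 2 + (r - br) ** 2
            pull = rate * exp(-grid_d2 / (2 * radius ** 2))
            for k in range(len(w)):
                w[k] += pull * (x[k] - w[k])  # move node towards the sample
    return nodes

# Hypothetical normalised profiles over four time points.
profiles = [
    [0.0, 1.0, 2.0, 3.0],   # rising
    [0.1, 1.0, 2.0, 3.1],   # rising, near-duplicate
    [3.0, 2.0, 1.0, 0.0],   # falling
    [0.0, 3.0, 3.0, 0.0],   # transient peak
    [1.5, 1.5, 1.5, 1.5],   # constant
]
nodes = train_som(profiles)
cluster_names = ["%d_%d" % best_node(nodes, p) for p in profiles]
```

Because neighbouring nodes are pulled towards the same samples during training, adjacent grid positions end up with similar weight vectors, which is the property that makes the map useful as an overview display.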
In order to associate the gene clusters with biological processes, we assigned probesets to ten process categories chosen to be of most relevance to grain development (see key of Figure ; and Methods). Whereas most probesets could be identified in terms of their molecular function (e.g. transcription factor, protein kinase), only 38% of them could reliably be associated with these processes. The numbers of probesets in each category are summarised for each cluster in the left hand pie charts shown in Figure . Whole developing caryopses comprise three main types of tissue: the endosperm, the embryo and the outer, maternal (mainly pericarp) tissues. We therefore assigned putative tissue locations to 668 of the transcripts based on published biochemical studies of the encoded proteins and on the locations reported in the barley transcriptome study of Sreenivasulu et al
], who analysed separate pericarp, endosperm and embryo tissues. Additional information on tissue locations came from the in situ hybridisation database of Drea et al
]. The putative locations of the transcripts in each set are summarised in the right hand pie charts in Figure . The assignments of probesets to process and tissue classifications are available [see Additional file 1
Based on their putative assignments of function and tissue location it is possible to relate the changes in gene expression profiles to stages of grain development. An overall pattern is immediately apparent with embryo transcripts tending to increase throughout development (clusters shown on top left of map in Fig. , i.e. 1_1, 2_1, 1_2), endosperm transcripts tending to increase to a plateau starting at 14 daa (top right) and some endosperm and pericarp transcripts decreasing through development (bottom right).
The cellularisation of the coenocytic endosperm is usually complete between 6 and 8 days after anthesis and is followed by a period of active cell division, expansion and differentiation to establish the starchy endosperm and aleurone tissues. The embryo develops more slowly than the endosperm during this period while the pericarp remains metabolically active. This phase corresponds to the 6, 8 and 10 day samples in our analysis and many of the transcripts which are expressed most highly during this earliest period (Figure 1_4, 2_4, 3_4) are associated with the endosperm and pericarp and with cell division, photosynthesis and development rather than storage product (starch and protein) synthesis.
Grain filling is initiated at about 10 daa and continues until about 28 days. This is associated with very high abundance of specific transcripts (2_1, 3_1, 3_2) but these are only represented by about 50 distinct probesets. As a result, and because the data shown in Figure are normalised to median gene expression for display purposes, the dominance of these transcripts during grain fill is not apparent. However, this is clear when our data are expressed on an absolute basis (not shown) and confirms results from other transcriptomics approaches (cDNA arrays, SAGE tag and EST counts), which show storage protein transcripts to be the most abundant in developing seeds of wheat [4] and rice [14]. These transcripts tend to reach a maximal level at around 14 daa which is maintained (relative to the total transcriptome) until 42 daa.
Many transcripts associated with the pericarp and photosynthesis decline steadily from the start of the sampling (1_4, 2_4, 3_4); however others maintain a more constant level of expression throughout the developmental series (1_3, 2_3).
In contrast, the majority of embryo transcripts continue to rise until the end of the sampling period (42 days). Transcripts expressed highly during this latter period include many related to defence and stress (1_1, 2_1), in agreement with SAGE results [8]. The stress transcripts may relate to embryo desiccation; for example, dehydrins are exclusively in group 1_1. During this same period (28–42 days) there are decreases in transcripts associated with the endosperm and pericarp (3_3, 3_4).
Several clusters show more subtle, albeit significant, changes in expression throughout development associated with all three tissues (1_2, 1_3, 2_2) or with the endosperm and pericarp (2_3). These four clusters include many transcripts encoding proteins expected to be present in almost every cell type, e.g. mitochondrial proteins, machinery for protein synthesis and degradation, enzymes of primary metabolism.
All clusters contain a small proportion (1–2%) of probesets that are in the antisense orientation when compared to coding rice sequences. These presumably function to down-regulate the sense transcripts in vivo, many of which seem to be involved in protein synthesis and degradation (ribosomal proteins, proteases, ubiquitin, proteasome). A similar fraction of transcripts was identified as being antisense using SAGE technology on developing wheat grain [8].
The clusters identified here can be compared with those reported in other transcriptome analyses of developing wheat [4] and barley [6] grain by identifying similar sequences [see Additional file 2]. The separation into a small number of gene clusters is to some extent arbitrary and dependent on choice of algorithm; nevertheless some trends are clear, e.g. cluster 3_1 is very similar to McIntosh et al. cluster 2j (19 out of 20 matching sequences), Sreenivasulu et al. cluster 5,3 and Laudencia-Chingcuanco et al. cluster 6. Those clusters that have similarities in sequence composition [see Additional file 2] also show similar average expression profiles. This shows some conservation of effects across conditions and between wheat and barley. However, it is noticeable that the size of the changes in apparent transcript abundance observed here is often greater than in these other experiments. Possibly the EST-based platforms tend to integrate across several similar transcripts, thus giving a damped signal compared with the oligonucleotide-based Affymetrix platform [12].
Effects of environmental factors on grain development
Environmental factors are known to have effects on wheat grain development, with impacts on both yield and end-use quality [15]. The effects of heat, drought and heat & drought on the grain transcriptome profile of cv. Hereward were therefore studied, selecting a highest temperature of 28°C. This temperature is sufficient to affect yield and quality [17] but substantially below temperatures which are known to affect wheat storage protein gene expression [18]. The plants were grown in controlled environment (CE) cabinets and subjected to different conditions from 14 daa.
The CE datasets show the same trends as observed in the developmental series but are accelerated, especially in hot and hot & dry conditions, as shown by the average expression of gene clusters [see Additional file 3] or by gene sets where the likely tissue of expression is known (Figure ). (Note: the expression measure shown in Figs. , , , , whilst not normalised to median gene expression, is still normalised to the total transcriptome at each time point; if expression were to be calculated on a per caryopsis basis the values would be much lower later in development as the total amount of RNA decreases.) Thus the abundances of embryo-associated transcripts increase throughout while those of endosperm-associated transcripts increase to a plateau. These changes occur faster in the heat-treated samples and the late decreases in endosperm-associated transcripts are greatly exaggerated in the 28 daa heat-treated samples (Figure ; Additional file 3, cluster 3_2). The pericarp-associated transcripts decreased steeply till 10 daa then more gradually (Figure ).
Figure 4 Geometric average of expression for all probesets likely to be predominantly expressed in embryo, endosperm or pericarp tissues for both developmental series (left panel) and controlled environment experiment with control (c), drought (d), heat (h) and ...
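As an illustration of the tissue-level summary used in Figure 4, the sketch below computes a geometric average of expression across all probesets assigned to a tissue, separately at each time point. The function names, probeset labels and values are hypothetical and serve only to show the arithmetic.

```python
from math import exp, log

def geometric_mean(values):
    """Geometric mean of positive values, computed via the mean of logs."""
    return exp(sum(log(v) for v in values) / len(values))

def tissue_profiles(expression, tissue_of):
    """Geometric-average expression profile for each tissue.

    expression: dict probeset -> list of positive abundances, one per time point.
    tissue_of:  dict probeset -> putative tissue label.
    Returns dict tissue -> list of geometric means, one per time point.
    """
    grouped = {}
    for probeset, series in expression.items():
        grouped.setdefault(tissue_of[probeset], []).append(series)
    return {tissue: [geometric_mean(column) for column in zip(*rows)]
            for tissue, rows in grouped.items()}

# Hypothetical abundances at two time points.
expression = {
    "probe_1": [1.0, 4.0],
    "probe_2": [4.0, 16.0],
    "probe_3": [9.0, 3.0],
}
tissue_of = {"probe_1": "embryo", "probe_2": "embryo", "probe_3": "pericarp"}
summary = tissue_profiles(expression, tissue_of)
```

A geometric rather than arithmetic average keeps a few very abundant transcripts from dominating the tissue profile, since it is equivalent to averaging on a log scale.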
Figure 5 Comparison of expression estimates from wheat Affymetrix array (lines) and quantitative reverse transcription PCR (qRT-PCR; points) for nine transcription factors. The left hand panel displays data from the developmental series which are the average of ...
Figure 6 Comparison between expression of transcription factors (red lines) and their known targets (green lines). A: Transcripts of the TaSPA, TaPBF and TaGAmyb transcription factors and LMW glutenins. B: Transcripts of TaEmBP and TaVP1 transcription factors ...
Expression profiles for probesets matching putative NAC (A), YABBY (B) and ARF (C) transcription factors.
These environmental effects on development are expected since temperature is known to accelerate development and measures such as thermal time have been used in order to quantify this effect. Drought can also accelerate grain development due to stomatal closure, which reduces transpirational cooling, and due to increases in the rate of desiccation. It is possible to accurately quantify these effects on the transcriptome from the array data by estimating the equivalent stage in the developmental series for each of the CE experiment samples. This was done by calculating a distance measure between the CE samples and the interpolated developmental series [see Methods and Additional file 4]. The value of daa which gives the minimum distance was taken as the estimate of the developmental stage for each of the 12 samples (Table ). These estimates are not sensitive to changes in the sample of probesets used, as shown by the results of a bootstrapping procedure.
Table 1 Estimates of equivalent developmental stage, derived from comparison of transcriptomes, expressed as daa in the developmental series for 12 similar samples measured in a different experiment. The estimates are shown and also all the estimates from 200 ...
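The stage-matching idea can be sketched as follows. This is an illustration only: it assumes linear interpolation between sampled time points, a Euclidean distance, a 0.5 day search grid and resampling of probesets with replacement, none of which is confirmed by the text (the actual distance measure is described in Methods). All names and values are hypothetical.

```python
import random
from math import sqrt

def interpolate(series, daa):
    """Linearly interpolate the developmental series at a given daa.

    series: list of (daa, profile) tuples sorted by daa.
    """
    for (d0, p0), (d1, p1) in zip(series, series[1:]):
        if d0 <= daa <= d1:
            f = (daa - d0) / (d1 - d0)
            return [a + f * (b - a) for a, b in zip(p0, p1)]
    raise ValueError("daa outside the sampled range")

def equivalent_daa(series, sample, probes=None, step=0.5):
    """daa on a regular grid that minimises the distance to the sample."""
    probes = range(len(sample)) if probes is None else probes
    first, last = series[0][0], series[-1][0]
    n_steps = int(round((last - first) / step))
    best_daa, best_dist = None, None
    for i in range(n_steps + 1):
        daa = min(first + i * step, last)
        ref = interpolate(series, daa)
        dist = sqrt(sum((ref[k] - sample[k]) ** 2 for k in probes))
        if best_dist is None or dist < best_dist:
            best_daa, best_dist = daa, dist
    return best_daa

def bootstrap_daa(series, sample, n_boot=200, seed=0):
    """Repeat the estimate on probesets resampled with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [equivalent_daa(series, sample,
                           [rng.randrange(n) for _ in range(n)])
            for _ in range(n_boot)]

# Hypothetical developmental series with three probesets.
series = [
    (6, [0.0, 10.0, 5.0]),
    (14, [8.0, 6.0, 5.0]),
    (28, [8.0, 2.0, 1.0]),
    (42, [8.0, 0.0, 0.0]),
]
sample = [8.0, 4.0, 3.0]            # lies midway between 14 and 28 daa
estimate = equivalent_daa(series, sample)
spread = bootstrap_daa(series, sample, n_boot=50)
```

With only three toy probesets the bootstrap estimates can vary widely; with thousands of probesets, as in the array data, a narrow spread would indicate that the estimate does not depend on any particular subset.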
This analysis showed that the developmental stage of CE samples at 14 daa (the start of the imposition of different environmental conditions) is equivalent to about 20–23 daa in the developmental series; this is expected as the temperature regime used was 23°C/15°C day/night compared to 18°C/15°C in the developmental series. The variation in equivalent daa probably reflects differences between the separate cabinets used.
Increasing the temperature to 28°C/20°C day/night or reducing the water to 44% field capacity both accelerated development over the first seven days of treatment: the control samples progressed the equivalent of 12 days, the dry 15 days, the hot 17 days and the hot & dry 17 days. After 14 days of treatment (i.e. at 28 daa) the control sample was equivalent to 41 daa and the other three samples had progressed beyond the final sample taken in the developmental series (at 42 daa). The trends [see Additional file 4, Panel C] show that the treatments continued to accelerate development, with the size of effect being dry < hot < hot & dry.
Altenbach and Kothari [19] showed that the effect of temperature on expression of some selected genes in wheat grain was consistent with the acceleration of physiological markers of development. Our estimates of developmental stage from the transcriptome agree well with those using the moisture content of the grain, if the values for the CE samples [17] are compared with those from the developmental series (not shown). This suggests that water status acts as a major signal for control of wheat caryopsis development (also postulated by McIntosh et al.).
Using this transcriptome analysis, it is also possible to identify genes which are affected by the environmental treatments independently of general developmental effects. The expression values from the CE samples at 21 daa were corrected to the closest developmental stage from Table . Probesets were selected which were more than two-fold changed in this corrected expression for both drought-treated samples or both heat-treated samples relative to corresponding samples lacking these treatments (Table ). Transcripts specifically up-regulated due to drought seem to have a role in non-starch polysaccharide hydrolysis, whereas those down-regulated include an Lt1.1 transcript; homologues of Lt1.1 have been shown to be highly responsive to environmental conditions, being induced by low temperature (e.g. [20]). The most highly up-regulated transcript under heat is that for Rubisco activase, which has been shown to be inducible by heat; this is consistent with its role in maintaining Rubisco integrity at high temperature [21]. Surprisingly, a transcript for a heat shock protein appeared to be down-regulated in the heat-treated samples. Altenbach and Kothari [19] also identified a few transcripts as affected by temperature, independently of developmental effects, but these effects were not seen here, probably because of the more moderate high temperature treatment (28/20 compared to 37/28°C). Overall, acceleration of development explained the great majority of the changes observed at 21 daa, with 93% being within a factor of 1.5 of the predicted value.
Table 2 Probesets identified as being early responsive to environmental treatments, when corrected for general developmental effects. The corrected effects, expressed as transcript abundance ratios, of drought (average of drought/control and heat & drought/heat) ...
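The selection logic described for Table 2 can be sketched in a few lines. This is a simplified illustration under stated assumptions: each treated and reference abundance is divided by the value expected at its estimated developmental stage, and a probeset is kept only if both comparisons exceed two-fold in the same direction. The function names, probeset labels and numbers are hypothetical.

```python
def corrected_ratio(treated, treated_expected, reference, reference_expected):
    """Treatment effect after dividing out the change expected from development alone."""
    return (treated / treated_expected) / (reference / reference_expected)

def consistent_changes(ratios_a, ratios_b, fold=2.0):
    """Probesets changed by more than `fold` in the same direction in both comparisons.

    ratios_a, ratios_b: dicts probeset -> corrected ratio, sharing the same keys
    (e.g. drought/control and heat & drought/heat).
    Returns (up-regulated set, down-regulated set).
    """
    up = {p for p in ratios_a if ratios_a[p] > fold and ratios_b[p] > fold}
    down = {p for p in ratios_a
            if ratios_a[p] < 1.0 / fold and ratios_b[p] < 1.0 / fold}
    return up, down

def fraction_within(observed, predicted, factor=1.5):
    """Fraction of probesets whose observed value is within `factor` of the prediction."""
    hits = sum(1 for p in observed
               if 1.0 / factor <= observed[p] / predicted[p] <= factor)
    return hits / len(observed)

# Hypothetical corrected ratios for three probesets.
drought_vs_control = {"probe_A": 4.0, "probe_B": 1.1, "probe_C": 0.3}
heat_drought_vs_heat = {"probe_A": 2.5, "probe_B": 3.0, "probe_C": 0.4}
up, down = consistent_changes(drought_vs_control, heat_drought_vs_heat)

# Hypothetical observed and stage-predicted abundances.
observed = {"a": 100.0, "b": 140.0, "c": 300.0, "d": 70.0}
predicted = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 100.0}
explained = fraction_within(observed, predicted)
```

Requiring agreement between two independent comparisons is what guards against calling a probeset treatment-responsive on the basis of a single noisy sample.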
Expression of transcription factors
The wheat Affymetrix GeneChip® contains about 2,000 probesets for potential transcription factors (TFs). Relatively few TFs have been characterised in cereals and even fewer of their target genes and/or biological roles determined. Available evidence, however, suggests that TFs and their targets display a many-to-many relationship: thus, multiple TFs bind to a promoter while individual TFs control multiple genes. Transcription is either controlled through the requirement for a set of factors to be present in sufficiently high numbers in the right cells at the right time, or via TF modification to alter their binding to DNA or each other. The first model predicts the differential expression of factors together with the genes they control; the second does not. The expression of around a half of the total TFs interrogated was constant during grain development while the remainder were distributed between the expression profile groups outlined. We selected nine TFs for detailed analysis, using qRT-PCR to confirm the changes in expression levels determined using the GeneChip® arrays (Figure ).
The relative expression levels determined by qRT-PCR showed excellent agreement with those determined using arrays, with correlation coefficients ranging from 0.89 to 0.97 in five out of the nine cases, while the other four ranged from 0.53 to 0.79. This degree of agreement is very close to that observed for other genes by [9].
The expression profiles of the TFs selected showed three distinct patterns, associated with gene expression predicted to be in the endosperm, embryo or pericarp (Figure ).
Four of the TFs (Figure ) displayed an endosperm expression pattern typical of 2_1, 3_1, 3_2 (Fig. ). The heat and heat & drought treatments precipitated the decline of the transcripts presumably in line with accelerated maturation of the endosperm.
TaSPA and TaPBF
Many prolamin storage protein genes (including those encoding α-gliadins and low molecular weight (LMW) subunits) contain the cis-element known as the endosperm box in their promoters. This bipartite element is composed of the GCN4 box, to which the bZIP TFs TaSPA (wheat) and BLZ2 (barley) bind, and the prolamin box, to which the DOF TFs WPBF (TaPBF) and BPBF bind [22]. Previous Northern analysis suggested that TaSPA and TaPBF were endosperm-specific with transcript levels peaking around 15–18 daa. Our data (Figure &) are in good agreement with this as their expression parallels the rise in transcripts for LMW subunit and α-gliadin genes (Figure ), which is consistent with a role in prolamin gene expression. Additional targets are likely to fall into the same grain filling expression profile, such as the trypsin inhibitor BTI-CMe genes (Figure ), known targets of BPBF and BLZ2 [26]. However, not all prolamin genes have an endosperm box [27]. The high molecular weight (HMW) subunit genes, for example, presumably rely on other TFs potentially with similar profiles to TaSPA and TaPBF: 54 other putative TFs from 13 families are represented in the predominantly endosperm 3_1 cluster.
The R2R3 class barley HvGAmyb TF, initially isolated based on its ability to bind to GA responsive α-amylase promoters expressed in the aleurone during germination, also binds to the AACA elements of the B hordein (Hor 2) and BTI-CMe (Itr-1) promoters, and interacts with BPBF [28]. Their Northern data showed moderate levels of HvGAmyb in the endosperm from 10 to 22 daa, while in situ hybridisations at 20 daa indicated expression to be mainly in the aleurone layer and embryo. TaGAmyb has a pattern of expression (Figure ), typical of group 2_1, which would be consistent with roles in endosperm grain filling and in the embryo. α-Amylase genes are not normally expressed in late grain development but in some genotypes and under specific environmental conditions pre-harvest sprouting or premature amylase production can occur [29]. High levels of TaGAmyb late in grain development may contribute to this phenomenon.
The TF encoded by Ta.37139.1 has homology to the NAC class of TFs [see Additional file 5]. Members of this large plant-specific TF family (123 in rice PlnTFDB; [30]) contain a NAM DNA-binding domain that binds to the core sequence CACG. They regulate developmental processes, as well as defence and abiotic stress responses [31]. Although still consistent with endosperm expression, transcripts corresponding to Ta.37139.1.S1 peaked later in grain development than those of TaPBF2 and TaSPA.
Several other NAC TFs were also included in the array (Fig ). Although their expression pattern was not verified by qRT-PCR, those represented by probe sets Ta.11509.1.S1, TaAffx.117676.1.S1 and Ta.25258.1.S1 showed very similar expression patterns to Ta.37139.1.S1 and their encoded proteins share a very high degree of sequence similarity with each other and to several barley endosperm expressed NAC genes [6]. Phylogenetic analysis [see Additional file 6] of the NAC-related sequences from the TIGR gene indices and PlnTFDB database placed the wheat and barley genes in a small subclade of the NAM group [32] together with two TFs from rice and the maize nrp1 genes. Consistent with our data, expression of the maize genes is confined to starchy-endosperm cells [33], while MPSS data for the two rice genes suggest their expression is confined to the developing grain. No functions have been ascribed to these proteins; however, their tissue specificity and the fact that the maize nrp1 gene is a maternally controlled imprinted gene may indicate an important role in endosperm development.
Interestingly, the gene represented by probe set Ta.12286.1.A1 had a distinctly different expression pattern (Figure ) although it was still highly homologous to the others [see Additional file 5], suggesting either that this protein has a different function or that it performs the same function in other tissues of the developing grain.
Three of the other TFs chosen for further analysis showed an embryo like expression pattern similar to group 1_1 and 2_1 transcripts.
TaEmBP and TaVP1
TaEmBP, a bZIP TF [35], and ZmVP1, an ABI3B3 class TF [36], are known to be associated with maturation of the embryo and aleurone layer, and the expression profiles of TaEmBP and TaVP1 both showed an embryo like pattern (Figure ). Known targets of both factors include genes encoding the Em (early methionine) proteins, which are involved in protecting cells against tissue damage during seed desiccation [37]. Expression of the multiple Em genes is induced by abscisic acid (ABA) and involves both EmBP and VP1, binding to the G-box in the abscisic acid response element (ABRE [38]). Expression of the Em genes present in the array (Figure ) was typical of group 1_1 transcripts, and thus consistent with TaEmBP and TaVP1 being responsible for their expression.
Maize VP1 is also known to be involved in expression of the aleurone specific myb TF gene C1 and in the repression of α-amylase gene expression in the aleurone layer late in grain development [39]. Accordingly, ZmVP1 was found to be highly expressed in both developing embryos and the aleurone layer. Since Em mRNA accumulates to high levels in wheat aleurone cells (unpublished), it is likely that this is also true in wheat, which would mean group 1_1 contains transcripts expressed in both embryo and aleurone.
HvMyb3, a myb-related (SHAQKYF R1myb) TF, was reported to be capable of interacting with BPBF and BLZ2 and to bind to the TATC elements in the promoters of the Itr-1 (BTI-CMe), and α-amylase Amy6.4
]. The expression profile of the wheat TaMyb3 (Ta.7266.1.S1) orthologue (Figure ), however, does not reflect those of BTI-CMe (Figure ), or the LMW subunit genes (Figure ), nor TaGAmyb (Figure ), which potentially regulates the same spectrum of genes, and is more consistent with a primary role in the embryo for this factor. In fact HvMyb3 was also shown to be expressed in barley embryos in addition to the developing endosperm, but it is possible that the roles of HvMyb3 and TaMyb3 may have diverged.
Two previously uncharacterised TFs showed a typical pericarp like expression similar to transcripts in groups 2_4 and 3_4 (Figure ).
The protein corresponding to probe set Ta7721.1.S1 (Figure ) has homology to the class of TFs known as C2C2-YABBY, all of which contain a zinc-finger DNA binding domain and a HLH YABBY domain. This small plant-specific TF family contains seven to eight members in rice and six in Arabidopsis, where they have been shown to be involved in establishing abaxial-adaxial polarity in lateral organs and in restricting meristem initiation and growth [41]. Characterisation of the genes in monocots is less advanced, but mutational and expression analyses suggest that their functions have diverged between monocots and dicots, with the monocot TFs lacking a central role in specifying abaxial-adaxial cell fate [42].
Phylogenetic analysis [see Additional file 6] shows that apart from Ta7721.1.S1 the wheat Affymetrix chip also has probesets for wheat homologues to all of the rice genes [43] apart from OsYab1 and OsYab7. The wheat Tayab3, 4, and 5 genes are not expressed in developing grain (which is also true of their counterparts in rice), while Tayab2 (Ta7721.1.S1), TaDL (Ta4352.1.S1), and Tayab6 (Ta.14101.1.S1) all showed broadly similar pericarp like patterns of expression (Figure ), which is consistent with the pericarp expression reported for HvDL and early grain development for Osyab2, 6 and OsDL [6]. Our data are consistent with a role for all these yabby proteins in pericarp development in wheat.
Probeset Ta.7431.1.A1 (Figure ) shows homology to the auxin response factor (ARF) family of TFs that bind specifically to TGTCTC-containing auxin response elements (AuxREs). This relatively small TF family (25 members in rice) plays a pivotal role in auxin-regulated gene expression of primary response genes [44]. The wheat gene sequence is most closely related to the rice OsARF22 and the Arabidopsis AtARF16 genes [see Additional file 7] that are highly expressed in most tissues [45]. The function of these ARFs is unknown but the wheat gene expression profile would be consistent with an auxin-mediated role in pericarp development. A second ARF represented by the probe set Ta2593.2.S1 was also highly expressed in developing grain although in a pattern consistent with roles in both the pericarp and endosperm (Fig. ). This gene is most closely related to OsARF4 of rice and AtARF2 of Arabidopsis. Mutations in AtARF2 result in pleiotropic effects related to its repression of cell division. For example, knockouts of AtARF2 lead to extra cell divisions in the integument, which in turn result in the production of larger seeds [46]. It would be of interest to determine if Ta2593.2.S1 has a similar role in the pericarp and endosperm of wheat.