Before this study, the NCBI GenBank database held 14,246 ESTs and 34.5 million RNA-seq reads from C. sinensis. In this study, we report the identification of 17,458 ESTs from seven cDNA libraries. Within the 5.3-k unigene set developed here, 732 unigenes had no significant matches in BLASTN homology searches against the tea ESTs and the sequences assembled from RNA-seq reads previously deposited in GenBank, indicating that these unigenes are novel mRNA sequences from tea. In the 5.3-k unigene set, 64.1% of the sequences were longer than 500 bp, whereas only 17.9% of the unigenes generated by RNA-seq analysis were longer than 500 bp. In general, EST analysis by Sanger sequencing generates longer reads than RNA-seq on a high-throughput Illumina GA IIx sequencer, so the difference in unigene length distribution is attributed to the difference in sequencing technique.
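The length comparison above comes down to the fraction of sequences in a unigene set that exceed 500 bp. The following is a minimal sketch of that calculation, assuming the unigenes are available as FASTA-formatted text; the function names `parse_fasta` and `fraction_longer_than` are hypothetical and are not part of the pipeline used in this study.

```python
def parse_fasta(text):
    """Return {name: sequence} from FASTA-formatted text (minimal parser)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else "seq%d" % (len(records) + 1)
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def fraction_longer_than(sequences, threshold=500):
    """Fraction of sequences strictly longer than `threshold` bases."""
    lengths = [len(s) for s in sequences]
    if not lengths:
        raise ValueError("no sequences given")
    return sum(1 for n in lengths if n > threshold) / len(lengths)


# Toy input (made-up sequences, not data from the study).
toy = ">u1\n" + "A" * 600 + "\n>u2 example\n" + "C" * 300 + "\n" + "G" * 100 + "\n"
unigenes = parse_fasta(toy)
share = fraction_longer_than(unigenes.values(), threshold=500)
```

Applying the same function to two unigene sets with the same threshold gives directly comparable percentages, which is the comparison made between the Sanger-derived and RNA-seq-derived sets.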
The data presented here are expected to become a useful gene resource for research aimed at understanding physiological processes important for tea cultivation and quality, such as nitrogen assimilation and amino acid metabolism. In Japan, large amounts of nitrogen fertilizers are used in tea plantations, causing pollution of groundwater, rivers and lakes. To improve this situation, it is important to develop tea cultivars with high nitrogen use efficiency (Tanaka and Taniguchi 2007). Therefore, we searched the unigene set for unigenes related to nitrogen assimilation and found several that were homologous to genes such as glutamine synthetase, glutamate dehydrogenase, ammonium transporter and nitrate transporter. In addition, the unigene set contains theanine synthase and several unigenes related to the metabolism of 2-oxoglutarate, a key component of the interaction of nitrogen and carbon metabolism.
In addition to nitrogen compounds such as amino acids, secondary metabolites such as catechins and caffeine are important for tea quality. Among our ESTs, we found several unigenes related to the synthesis of these secondary metabolites. The metabolism of nitrogen compounds and secondary metabolites is regulated by environmental conditions. For example, in young tea leaves, catechins increase under high light intensity (Saijo 1980). In contrast, shading of young tea leaves leads to an increase in total nitrogen content, as well as in theanine content (Anan and Nakagawa 1974, Karasuyama and Matsumoto 1988). In the future, it will be important to decipher the mechanism of photoresponsive regulation of genes related to the metabolism of nitrogen compounds and secondary metabolites to enable improvement of these traits. Two unigenes related to photoresponse were found in our ESTs, providing us with tools to analyze the associated regulatory mechanisms.
Tea is well known as an aluminum-accumulating plant that grows well in very acidic soils containing high levels of Al³⁺; this is of interest because aluminum toxicity limits the growth of many other species in acidic soils (Morita et al. 2004, 2008). The aluminum in the xylem sap of tea is complexed with citrate (Morita et al. 2004). Three unigenes potentially related to aluminum response were found in this study: one citrate synthetase and two aluminum-response proteins. Further analyses, such as expression analysis of the response of tea to aluminum, might reveal whether these genes have roles in aluminum resistance or response.
Using the EST data derived from seven different organs of the tea plant, digital northern analysis was performed to identify unigenes with different expression levels among organs; 67 such unigenes were identified out of a sample of 144. Cluster analysis showed that the groups of unigenes highly expressed in each organ were related to different physiological functions. For example, several photosynthesis-related genes were highly expressed in the YL and ML libraries. Cluster III, which showed high expression in the RT library, was the largest cluster (25 unigenes), indicating that the physiological and developmental status of young root is considerably different from that of other organs. Interestingly, dihydroflavonol 4-reductase (DFR) was highly expressed in tap roots and lateral roots. Although tea roots do not contain catechins (Forrest and Bendall 1969), they do contain leucoanthocyanidin, which is the product of DFR and the precursor of (+)-catechin. Thus, we assume that this DFR in roots is involved not in catechin biosynthesis, but in other metabolic processes such as lignin or anthocyanin biosynthesis. One more unigene encoding DFR was found in the 5.3-k unigene set. This unigene was expressed in young stem, and the sequence similarity between the two DFRs was 52%. We think that the DFR from young stem is involved in catechin biosynthesis.
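Digital northern analysis compares how many ESTs of a unigene were sampled from each library, relative to the size of each library. The sketch below illustrates the idea with a log-likelihood ratio statistic, which is one common way to score such count tables; it is an assumption made for illustration and not necessarily the statistic used in this study. The function name, the EST counts and the library sizes are all hypothetical.

```python
import math


def log_likelihood_ratio(counts, library_sizes):
    """Score how unevenly one unigene's ESTs are spread across libraries.

    counts[i]        : ESTs of this unigene found in library i
    library_sizes[i] : total ESTs sequenced from library i
    Returns sum_i x_i * ln(x_i / (N_i * f)), where f is the pooled
    frequency of the unigene over all libraries. The value is 0 when the
    counts are exactly proportional to library size and grows as the
    expression becomes more library-specific.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have equal length")
    total = sum(counts)
    if total == 0:
        return 0.0
    pooled_freq = total / sum(library_sizes)
    score = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing to the sum
            score += x * math.log(x / (n * pooled_freq))
    return score


# Hypothetical example: three libraries of equal size.
sizes = [1000, 1000, 1000]
even = log_likelihood_ratio([5, 5, 5], sizes)       # evenly expressed
specific = log_likelihood_ratio([10, 0, 0], sizes)  # found in one library only
```

Ranking unigenes by such a score puts organ-specific transcripts first. Deciding which scores are significant needs an additional step, such as comparison against scores from randomized count tables, which is not shown here.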
Ellis and Burke (2007) surveyed EST data from 33 species and showed that the proportion of unigenes containing SSRs ranged from 2.5% to 21.1% (9.0% ± 0.1%, mean ± SEM). The percentage of SSR-containing unigenes in this study (34.9%) exceeds the upper end of that range and is therefore high compared to that in other plant species.
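The proportion of SSR-containing unigenes depends strongly on the search criteria, so percentages from different studies are only comparable when similar motif lengths and minimum repeat numbers are used. The sketch below shows how such a proportion can be computed with a regular-expression search for perfect repeats. The thresholds in `MIN_REPEATS` and the function names are hypothetical and are not the criteria used in this study or by Ellis and Burke (2007).

```python
import re

# Assumed minimum number of repeats for each motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start) for perfect SSRs in `seq`.

    Homopolymer motifs such as 'AA' are skipped. A long run of a short
    motif may also be reported under a longer motif length (for example
    'AGAG'); the sketch does not collapse such redundant hits.
    """
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue
            hits.append((motif, len(m.group(0)) // k, m.start()))
    return hits


def ssr_fraction(sequences, min_repeats=MIN_REPEATS):
    """Fraction of sequences that contain at least one SSR."""
    sequences = list(sequences)
    if not sequences:
        raise ValueError("no sequences given")
    with_ssr = sum(1 for s in sequences if find_ssrs(s, min_repeats))
    return with_ssr / len(sequences)


# Toy sequences (made up): the first carries an (AG)7 repeat, the second none.
toy_unigenes = ["TTT" + "AG" * 7 + "TTT", "ACGTACGGTCA"]
toy_hits = find_ssrs(toy_unigenes[0])
toy_fraction = ssr_fraction(toy_unigenes)
```

Lowering the minimum repeat numbers in `MIN_REPEATS` raises the reported fraction, which is one reason why differences between surveys should be interpreted with the search criteria in mind.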
The proportion of multi-locus markers in this study was higher than that reported by Sharma et al. (2009). We used a capillary sequencer for fragment analysis, whereas Sharma et al. (2009) used autoradiography of PAGE gels, which has lower resolution. Thus, the difference in the proportion of multi-locus markers might have been caused by the difference in the analysis method. Because of the paleopolyploidy of C. sinensis (Shi et al. 2010), it is not surprising that many multi-locus markers are contained in the set of EST-SSRs reported here.
The 16 accessions used in this study include major tea cultivars in Japan, parental cultivars and several foreign germplasms. These materials are representative of the genetic diversity of breeding materials in Japan. The EST-SSRs developed in this study were highly polymorphic among the 16 accessions. They should be very useful for many genetic studies in tea, such as construction of linkage maps, analysis of genetic diversity and cultivar identification. For example, using 377 EST-SSRs and other co-dominant markers, we recently constructed a reference linkage map of tea (Taniguchi et al. 2012).
Most of the EST-SSR markers developed here were applicable to other Camellia species. Species of Camellia other than C. sinensis contain useful traits that have been utilized in tea breeding; for instance, a parental line containing a high level of anthocyanin (Ogino et al. 2005) and a caffeineless tea plant (Ogino et al. 2009) were developed from interspecific crosses. EST-SSR markers will enable genetic analysis of important agronomic traits of various Camellia species, thus expanding the usefulness of these species in tea breeding.
In conclusion, the tea ESTs obtained in this study are valuable resources for analysis of gene function and for development of SSR markers. The 5.3-k tea unigene set contains novel transcripts from tea, and 67 out of 144 unigenes tested showed specific expression patterns among a set of seven organs. The SSR markers developed in this study are highly polymorphic in C. sinensis and many other Camellia species. Further studies using the tea EST dataset are expected to accelerate functional genomics and genetic breeding research in tea.