|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Despite its primary economic importance, genomic information on olive tree is still lacking. 454 pyrosequencing was used to enrich the very few sequence data currently available for the Olea europaea species and to identify genes involved in expression of fruit quality traits.
Fruits of Coratina, a widely cultivated variety characterized by a very high phenolic content, and Tendellone, an oleuropein-lacking natural variant, were used as starting material for monitoring the transcriptome. Four different cDNA libraries were sequenced, respectively at the beginning and at the end of drupe development. A total of 261,485 reads were obtained, for an output of about 58 Mb. Raw sequence data were processed using a four step pipeline procedure and data were stored in a relational database with a web interface.
Massively parallel sequencing of different fruit cDNA collections has provided large scale information about the structure and putative function of gene transcripts accumulated during fruit development. Comparative transcript profiling allowed the identification of differentially expressed genes with potential relevance in regulating the fruit metabolism and phenolic content during ripening.
An improvement of our knowledge on gene composition and expression is essential to investigate the molecular basis of fruit ripening and to define the gene pool involved in lipid and phenol metabolism in an oil crop species as olive, characterized by a peculiar fatty acid and antioxidant composition.
The availability of complete genome sequences and large sets of expressed sequence tags (ESTs) from several plants has recently triggered the development of efficient and informative methods for large-scale and genome-wide analysis of genetic variation and gene expression patterns. The ability to monitor simultaneously the expression of a large set of genes is one of the most important objectives of genome sequencing efforts. In this respect, the 454 pyrosequencing technology  is a rather novel method for high-throughput DNA sequencing, allowing gene discovery and parallel efficient and quantitative analysis of expression patterns in cells, tissues and organs.
In the past few years, several studies based on comparative high throughput sequencing of plant transcriptomes have, indeed, allowed the identification of new gene functions, contaminant sequences from other organisms, alterations of gene expression in response to genotype, tissue or physiological changes, as well as large scale discovery of SNPs (Single Nucleotide Polymorphisms) in a number of model and non model species, such us maize, grapevine and eucalyptus [2-5].
Olive is the sixth most important oil crop in the world, presently spreading from the Mediterranean region of origin to new production areas, due to the beneficial nutritional properties of olive oil and to its high economic value.
It belongs to the family of Oleaceae, order of Lamiales, which includes about 10 families for a total of about 11,000 species. Members of this order are important sources of fragrances, essential oils and phenolics claiming for numerous health benefits, or providing valuable commercial products, such as wood or ornamentals. Information on the genome sequence and transcript profiles of the entire clade are completely lacking.
Olive is a diplod species (2n = 2x = 46), predominantly allogamous, with a genome size of about 1,800 Mb [6,7]. In spite of its economical importance and metabolic peculiarities, very few data are available on gene sequences controlling the main metabolic pathways.
Olive accumulates oil mainly in the drupe mesocarp and its content can reach up to 28–30% of total mesocarp fresh weight. Olive oil shows a peculiar acyl composition, particularly enriched in the monounsaturated fatty acid oleate (C18:1), deriving from the desaturation of stearate. Oleate can reach percentages up to 75–80% of total fatty acids, while linoleate (C18:2), palmitate (C16:0), stearate (C18:0) and linolenate (C18:3) represent minor components. The final acyl composition of olive oil varies enormously among varieties. Environmental factors, such as temperature and light during fruit ripening, can deeply influence the balance between saturated and unsaturated fatty acids .
The chemistry of phenolic oleosides is attracting an increasing interest of pharmacological research and agri-food biotechnology, and the biochemical pathway leading to their biosynthesis and regulation has been recently deeply evaluated , even if the genetic control still remains completely unknown.
Secoiridoids represent the most important class of phenolics and they arise from simple structures, like tyrosol and hydroxytyrosol, to quantitatively more important conjugated forms like oleuropein, demethyloleuropein, 3-4DHPEA-EDA and ligstroside . Oleuropein is the main secoiridoid, representing up to the 82% of total biophenols, known as the bitter principle of olives and responsible for major effects on human health and for releasing phytoalexins against plant pathogens . Another secoiridoid with relevant health functions is oleocanthal (deacetoxy ligstroside aglycone) .
Developing olives contain active chloroplasts capable of photosynthesis, thus representing significant sources of photoassimilates. While chlorophyll is localized mostly in the epicarp, the mesocarp contains significant amounts of other photosynthetic pathway components, such as phosphoenol pyruvate carboxylase .
Olive fruit development and ripening, takes place in about 4–5 months and includes the following phases: i) fruit set after fertilization, ii) seed development, iii) pit hardening, iv) mesocarp development and v) ripening. During the ripening process, fruit tissues undergo physiological and biochemical changes that include cell division and expansion, oil accumulation, metabolite storage, softening, phenol degradation, colour change (due to anthocyanin accumulation in outer mesocarp cells). Oil synthesis starts after pit hardening, reaching a plateau after 75–90 days, while the phenolic fraction is maximum at fruit set and decreases rapidly along fruit development.
This work is aimed at defining the transcriptome of olive drupes and at identifying ESTs involved in phenolic and lipid metabolism during fruit development. Drupes from two cultivars have been used: a widely cultivated variety characterized by a very high phenolic content, and an oleuropein-lacking natural variant; two developmental stages, at completed fruit set and at mesocarp development, representing diverse sets of expressed genes, were analyzed using 454 pyrosequencing.
The starting materials to explore the olive fruit transcriptome were fruit pools from two Olea europaea cultivars, Coratina and Tendellone (C and T), showing striking differences in their biophenol accumulation pattern (Figure (Figure1,1, ,2).2). C is cultivated in the Apulia region and represents the most widely cultivated variety of Italy, while T is a minor cultivar locally spread in Central Italy. A previous SSR analysis reported a very high genetic distance between them . These cultivars also differ markedly in their oleuropein concentration (272.9 mg g-1 dw in C, decreasing strongly during fruit ripening, and 0.3 mg g-1 dw in T, decreasing only slightly during ripening). In contrast, the content of the 3-4DHPEA-EDA intermediate compound was similar between genotypes (data not shown).
Four enriched full-length ds-cDNA collections (see Methods) were obtained and their 454 pyrosequencing provided a total of 261,485 sequence reads, corresponding to 58.08 Mb, with an average read length of 217–224 nt, depending on the cDNA sample (Table (Table1).1). The 4-step procedure adopted by the ParPEST pipeline to process the 454 EST reads is summarised in Figure Figure33.
ESTs were masked to eliminate sequence regions that would cause incorrect clustering. Targets for masking include simple sequence repeats (SSR, also referred to as microsatellites), low complexity sequences (including poly-A tails) and other DNA repeats. The number of ESTs masked for each category, as well as the total nucleotides that were masked, are shown in Table Table22.
The most frequent DNA repeats, identified using RepBase as the filtering database, were ribosomal RNA (both SSU and LSU); LTR retrotransposons from the BEL type family, Gypsy and Copia; non-LTR retrotransposons from the CR1 superfamily and, finally, a batch of retro pseudogenes (CYCLO, L10, L31, L32) (data not shown).
In order to assess EST redundancy in the whole collection and provide a survey of the Olea europaea drupe transcriptome, masked EST sequences were pair-wise compared and grouped into clusters, based on shared sequence similarity. As a consequence, the obtained clusters are ESTs which are most likely products of the same gene. Each cluster was then assembled into one or more tentative consensus sequences (TCs), which were derived from multiple EST alignments. As described in Methods, TCs within a cluster shared at least 90% identity within a window of 100 nucleotides. Therefore, the presence of multiple TCs in a cluster could be due to possible alternative transcripts, to paralogy or to domain sharing. In addition, all the ESTs that, during the clustering/assembling process, did not meet the match criteria to be clustered/assembled with any other EST in the collection, were defined as singleton ESTs. The combination of TCs and singletons are referred to as unique transcripts.
The total number of clusters generated was 22,904. They were assembled into 26,563 TCs comprising 185,913 EST reads. The TC length ranged between 102 (min) and 4,916 (max) nucleotides, while TC average length was 355 nucleotides. 2,406 were the clusters assembled into multiple TCs (ranging from 2 to 14). The total number of singleton ESTs (sESTs) was 75,570, with an average length of 179 nucleotides (Table (Table3,3, ,44).
The analysis of the full EST collection from this work revealed an average GC-content of 42.5%, ranging from less than 16% to more than 63%.
The OLEA EST database consists of a main relational database (MySQL) which collects raw as well as processed data generated by ParPEST. This is supported by three local satellite databases: myENZYME, a local copy of the ENZYME repository which was built by parsing the enzclass.txt and the enzyme.dat files (release 04 Nov 2008) retrieved from the ExPASy FTP site; myGO, a mirror of the Gene Ontology database, which was built by running the seqdblite MySQL script (version 20081102) downloaded from the GO database archives; myKEGG, which was built by parsing XML files of the KEGG pathways (release 21 October 2008), retrieved from the KEGG  FTP site. A PHP-based web application provides user-friendly data querying, browsing and visualization.
The web interface http://454reads.oleadb.it/ includes Java tree-views for easy object navigation as well as the possibility to highlight on-the-fly the enzymes in the pathway image files retrieved from the KEGG FTP site.
In order to identify Olea unigenes coding for proteins with a known function, we used a BlastX-based annotation that provided 12,560 TCs with significant similarities to proteins in the UniProtKB database; the remaining 14,003 (52.7%) had no function assigned. A higher number of sESTs with no function was obtained (58,835), representing 77.85% of the total.
When considering annotated TCs and sESTs with respect to the origin of the protein data source, the bulk of the identifications (73% – 75%), concerned proteins of plant origin, as expected.
TCs and sESTs coding for enzymes with assigned EC number were 5,040 and 5,864, respectively (Figure (Figure4A)4A) following the ENZYME classification scheme http://www.expasy.org/enzyme/. The majority of the enzyme-related unigenes encode for transferases (3,982), hydrolases (2,628) and oxidoreductases (1,895). Of particular relevance for fruit metabolism are those TCs and sESTs involved in the biosynthesis of secondary metabolites (761) and lipids (1,005) (Figure 4B, C). Most frequently, the same enzymatic function is redundantly encoded by several unigenes, this may be the result of different proteins referenced with the same EC number or the effect of different transcripts encoding specific enzyme subunits. Given the limited sequence length typically provided by 454 pyrosequencing, it is also plausible that in some cases different TCs and sESTs cover non-matching fragments of the enzyme transcript coding frame.
In principle, the higher the number of ESTs assembled in a specific TC, the higher the number of mRNA molecules encoding that particular gene in a given tissue sample. However, differences in transcript abundance may reflect sampling errors rather than genuine differences in gene expression. Hence, in order to identify differentially expressed genes in the four sequenced fruit cDNA collections, the statistical R test  was applied, as a measure of the extent to which the observed differences in the gene transcription among samples reflect their actual heterogeneity. Applying this test and further filtering criteria to select differentially expressed TCs among the four sets (see Methods), we selected 2,942 differentially expressed TCs, 1,627 of them with a predicted annotation and 1,315 with no similarity with other sequences in the public databases [see Additional file 1].
Clustering of differentially expressed TCs distinguished gene transcripts differentially expressed during fruit development from those differentially expressed between genotypes, evidencing that the former were more numerous than the latter. C was the genotype showing the highest expression differences between the two stages (Figure (Figure5A).5A). This result was confirmed by PCA analysis, where transcripts from the second stage of C were significantly divergent from the remaining ones (Figure (Figure5B5B).
Transcript differences affect several important physiological processes that promote fruit growth and development. Transcripts identified as differentially expressed between 45 and 135 DAF in both genotypes and between genotypes, grouped in 13 categories on the basis of their predicted annotations, showed that different biological processes are modulated at the molecular level by stringent genetic and developmental signals (Figure (Figure6).6). Transcripts involved in photosynthesis, in biosynthesis of structural proteins (histones, aquaporins, ribosomal proteins, tubulins, pollen allergens), in terpenoid and flavonoid biosynthesis, in cell wall metabolism, in cellular communication (hormone biosynthesis and regulation, cascades of signal transduction) and responses to biotic and abiotic stresses, were mainly expressed at 45 DAF, whereas the majority of gene transcripts related to different primary metabolic pathways (carbohydrate, lipid, amino acid and protein metabolism) as well as transcription factors and regulators and genes involved in vitamin biosynthesis, were more expressed at 135 DAF (Figure (Figure66).
A wide set of genotype-specific TCs, mainly related to hormone biosynthesis and signalling, responses to abiotic and biotic stresses, biosynthesis of terpenes and phenylpropanoids, were also observed (Table (Table55).
Furthermore, TCs encoding structural enzymes synthesizing terpenoids and terpenoid precursors (such as dimethylallyl diphosphate (DMAPP) and isopentenyl diphosphate (IPP)) fluctuated between developmental stages (Figures (Figures77 and and8).8). Transcripts involved in the mevalonate (MVA) pathway for isoprenoid biosynthesis, occurring in the cytoplasm, were predominantly not regulated, while six out of seven genes coding for the main enzymes of the plastidial non-mevalonate (non-MVA) pathway, presented TCs more abundant at 45 DAF.
Finally, transcripts involved in flavonoid biosynthesis were also regulated between developmental stages and genotypes (Figure (Figure99).
This is the first report of a large-scale and comparative EST analysis from olive fruit. Olive is one of the most important oil crops in the world. It belongs to the Asterid clade of angiosperms, that includes thousands of economically important crops for which genomic information is still scarce. The massive EST characterization described here can be considered an initial platform for the functional genomics of Olea europaea and will be a starting point for the establishment of molecular tools for improving the major quality traits in Olea species. Massively parallel EST sequencing provided more than 102,000 unigenes consisting in 26,563 TCs and 75,570 singletons from four fruit libraries. Considering 27 available data on expressed genes of other plant species, such as Arabidopsis , it is possible that the reported unigene set of Olea is an over estimate of the actual number of transcripts expressed in the fruit. This could in part be the result of unassembled segments of TCs and sESTs pertaining to the same transcript unit. A certain amount of incomplete EST assembly is expected as a result of the short reads provided by the 454 pyrosequencing technology.
Despite the fact that cDNA samples were prepared without any normalization process, we only found a moderate degree of redundancy. Clustering of ESTs has indeed reduced the number of sequences by 61% from 261,483 quality passed reads to 26,563 TCs plus 75,570 sESTs. RepBase masking analysis has revealed a surprising amount of short repeats and transposable elements (TEs), which could represent a valuable resource to develop TE-derived molecular markers  and to investigate on Olea genome size evolution. Also, the GC-content of 42.5%, ranging from less than 16% to more than 63%, can provide a contribution to the evolution studies and gene transfer dynamics within the Oleaceae taxon.
The percentage of TCs and singletons with no putative function assigned was considerably elevated, possibly as a result of gene functions specifically evolved in Olea europaea and quite divergent from those of other plant species. The Olea fruit, indeed, presents a number of exclusive traits, like, above all, oil and biophenol accumulation. These traits are encoded at genomic level. On the other hand, the high incidence of unigenes with no assigned function (about 70%), could be due to the poor annotation that still affects protein databases. Also, it is possible that many TCs and sESTs could not be reliably annotated because they did not cover the entire length of the transcript or because they represent untranslated regions (UTRs). This could be particularly the case of our dataset given that the 454 sequencing technology typically provides short sequence reads.
The identification in the Olea genome of transcribed sequences similar to a wide range of phylogenetically distant organisms raises intriguing questions about the evolution of their physiological roles and about whether or not these sequences and the related functions are the result of recent gene transfer or the relic of an ancient past.
It is important to note that about 25% of the annotated enzyme-coding transcripts are involved in biosynthesis of lipids and fruit metabolites. The availability of the genetic information related to these enzyme functions represents, in our view, a fundamental tool for understanding the molecular basis of the expression of traits related to fruit phenotype and for establishing new strategies of metabolic engineering to improve the overall quality of olive fruit.
Large scale random sequencing of different fruit cDNA collections has provided information on relative large scale variation of gene expression. However, it should be noted that no further experimental validation has been performed on differentially expressed TCs passing the R test .
Analysis of differentially expressed gene transcripts evidenced large differences in key genes involved in a number of metabolic pathways that can potentially alter most quality traits in olive fruits. In some cases, different TCs with identical predicted annotation showed a contrasting accumulation pattern between developing stages or between genotypes; this implies that similar, although not identical, proteins and enzymes may undergo different expression patterns, determining a fine regulation of metabolic pathways and the accumulation of alternative metabolites.
It is interesting to note that the C cultivar underwent a larger degree of transcriptional modulation during fruit development. It is possible that this is related to the very high content in phenolic compounds at the beginning of fruit development in this cultivar.
Expression differences were found for transcripts involved in several physiological processes that promote fruit growth and development. Plant cells require sugars to synthesize lipids and acetyl-CoA is the precursor of carbon chain elongation in all fatty acids. Photosynthesis is an important source of sugars for mesocarp development and olive oil biogenesis. Photoassimilates translocated from leaves to fruit mesocarp by phloem are another indispensable source of sugars in developing fruits . At the end of the ripening process, concurrently to the decrease of chlorophyll content in fruit mesocarp and to the gradual color change, the intense mitochondrial respiration of photoassimilates translocated from leaves to fruits through the phloem, becomes the main energy source sustaining fruit ripening . Consistent with this fact, TCs with predicted functions related to photosynthesis (photoreception, Calvin cycle and oxidative phosphorilation) were more represented at 45 DAF, while transcripts associated with carbohydrate metabolism (glycolysis/gluconeogenesis, citrate cycle, and fructose, mannose and galactose metabolism) were more represented at 135 DAF.
Generally, transcript fluctuations were consistent with the physiological status of the fruit. The higher expression of transcripts related to the biosynthesis of structural proteins at 45 DAF may be correlated with the intense and rapid cell divisions during fruit growth, while the higher expression of transcripts putatively associated with fatty acid biosynthesis and with the assembly of storage triacylglycerols (TAGs) at 135 DAF, is in agreement with fatty acid accumulation pattern in olive fruits, starting at about 90 DAF until the end of fruit maturation .
This work has allowed the identification of most TCs related to secoiridoid conjugate (such as oleuropein) biosynthesis, deriving from the conjugation of an esterified phenolic moiety (phenylpropanoid metabolism) and an oleoside moiety (deoxyloganic acid deriving from the mevalonic acid pathway) . TCs related to the mevalonate pathway were not significantly regulated with the exception of MVAPP decarboxylase, leading to the biosynthesis of IPP, a common precursor for all secoiridoid oleosides. Surprisingly, this TC was up-regulated in both cultivars at fruit veraison, when the secoiridoid content is decreasing. The non-MVA pathway, recently discovered in plant chloroplasts, produces various classes of terpenoids, mostly hemiterpenes, monoterpenes, diterpenes and carotenoids . In higher plants, both pathways operate simultaneously and their physical compartmentalization does not preclude exchanges of metabolic intermediates, although the nature of this crosstalk remains to be elucidated [20,21]. It is likely that the specialization of each pathway can play a key role in regulating the biosynthesis of specific end products during olive fruit development.
Given the very high accumulation of secoiridoids (mainly oleuropein) in developing fruits, the terpenoid metabolic pathway must be strongly oriented toward the secoiridoid biosynthesis branch. Unfortunately, the main enzymes and related genes involved in oleuropein biosynthesis are still unknown, hence the information provided by our comparative genomics survey cannot provide direct insight into the molecular basis of secoiridoids accumulation. Since C and T show extremely different oleuropein accumulation patterns, it is likely that transcripts encoding key enzymes for oleuropein biosynthesis show specific differences in their accumulation patterns in these two cultivars. In this respect, mining of TCs differentially expressed in T vs. C, with functional annotations compatible with enzymes for oleuropein biosynthesis, or with no functional annotation, is under way to address this point.
Other classes of phenolic compounds are known to be less represented in olive fruits. Nonetheless, specific alteration of gene transcripts encoding a number of structural enzymes suggests that even metabolites synthesized from secondary branches can show differential expression following developmental and genetic cues.
The common precursor of both secoiridoids and indole alkaloids, is deoxyloganic acid. Also loganin and secologanin, leading to indole alkaloid synthesis, could possibly be involved in biosynthesis of secoiridoids . The fact that three TCs putatively encoding secologanine synthase (EC: 22.214.171.124) are more represented at 45 DAF, while 2 TCs are instead more represented at 145 DAF, deserves further investigation to verify to which extent fine regulation of different secologanine synthases can affect the actual accumulation of specific secoiridoid/indole alkaloid products. Alternate regulation of distinct TCs sharing identical annotations has proven to be a quite common condition in the Olea transcriptome [see Additional file 1], suggesting that the well known plasticity of metabolite accumulation in olive fruits could be the result of the fine modulation of genes encoding different enzyme isoforms.
Although the flavonoid content of olive fruits is relatively low compared to other phenolic classes and the pattern of their accumulation during fruit development is still unknown, transcriptional differences observed between 45 DAF and 135 DAF are in general agreement with data available from other fruit species .
The EST database also contains comparative information between fruits of the C and T genotypes. This can be particularly informative, considering that, as previously reported, the two genotypes have extremely differentiated fruit phenolic content. On the other hand, the lack of genetic information on the biosynthesis of secoiridoids in plants makes it impossible to find orthologs in protein databases, thus precluding the possibility to identify ESTs directly correlated to their accumulation in the olive fruit.
Among genotype-specific transcripts, several TCs putatively involved in the biosynthesis of steroids with nutritional and health benefits, were reported exclusively in C. Two TCs, specific to C, encode R-limonene synthase 1 (EC: 126.96.36.199) and 1,8-cineole synthase (EC: 188.8.131.52), which are related to the biosynthesis of important flavour compounds, such as (+)-R-limonene, one of the most abundant monocyclic monoterpenes in nature  and 1,8-cineole, also known as eucalyptol, a monoterpenic oxide present in many plant essential oils.
A number of other genotype-specific TCs could account for biologically relevant differences between C and T and provide a useful hint for focused biochemical analyses. In general, genotype-specific TCs were prevalent in C, supporting the hypothesis that C fruits may synthesize a wider array of secondary metabolites. The Olea EST database will be a useful tool for unravelling the biochemical diversity of olive fruits.
In this work we describe the first large EST collection of Olea europaea L. It represents a valuable resource to assist a preliminary evaluation of features from the Olea europaea genome (i.e. GC content, SSR, genome annotation). The EST database can be consulted through a user friendly web interface that provides useful tools for data querying, blast services, browsing and visualization.
Comparative sequencing of four fruit cDNA collections has provided information on variation of gene expression during fruit development and between two genotypes with contrasting phenolic accumulation in fruits.
Analysis of differentially expressed gene transcripts evidenced large differences in key genes involved in a number of metabolic pathways that can potentially control most quality traits in olive fruits.
Olive drupes of two cultivars were used: Coratina (C), a widely cultivated variety, characterized by a very high phenolic content, reaching 332,5 mg/g of total fruit dry weight, and Tendellone (T), a low-phenolics natural variant (42,7 mg/g dw). Olive fruits were sampled from plants of the Olive Cultivar Collection held by the CRA-OLI (Collececco, Spoleto). The different olive trees were grown using the same agronomic practices, including irrigation conditions, that can affect phenolic concentration in the fruit. C and T cultivars were selected as the extreme variants in phenolic content among a set of twelve cultivars (including Bianchella, Canino, Dolce d'Andria, Dritta, Frantoio, Leccino, Moraiolo, Nocellara del Belice, Nocellara Etnea, Rosciola) surveyed for two years for oleuropein, demethyloleuropein, and 3–4 DHPEA-EDA content (data not shown). Fruits were harvested at 45 and 135 days after flowering (DAFs). These stages correspond to important physiological phases of fruit development: completed fruit set and mesocarp development, respectively. Only fruit mesocarp and epicarp have been used for RNA extraction.
Total RNAs were extracted from pooled fruits using the RNeasy Plant Mini Kit (Qiagen). Contaminating genomic DNA was removed by DNase I (Qiagen) treatment. The RNA was quantified using a spectrophotometer at a wavelength of 260 nm and its quality was checked by running 5 μg of total RNA on 1.2% agarose gel under denaturing conditions (Figure 10A).
The cDNA was prepared by using SMART PCR cDNA Synthesis protocol (Clontech) optimizing the conditions to obtain high quantity of clean cDNA in a small volume. For this reason numerous tests have been performed applying different reaction conditions and purification protocols.
First strand synthesis was performed using 6 ug of total RNA for each sample, in three independent reactions, with the use of Super Script II (Invitrogen) in a reaction mixture containing 50 mM Tris-HCl pH 8.3, 75 mM KCl, 6 mM MgCl2, 2 mM DTT. The retro-transcription reaction was primed with 3' SMART CDS Primer IIA (Clontech). The SMART II™ A Oligonucleotide (Clontech), which has an oligo(G) sequence at its 3' end, was used to create an extended template useful for the full-length enrichment provided by SMART™ technology. In fact, when reverse transcriptase (RT) reaches the 5' end of the mRNA, the enzyme's terminal transferase activity adds a few additional nucleotides, primarily deoxycytidine, to the 3' end of the cDNA. The SMART™ II A Oligonucleotide base-pairs with the oligo(G) sequence and RT then switches templates and continues replicating to the end of the oligonucleotide . In cases where RT pauses before the end of the template, the addition of deoxycytidine nucleotides is less efficient than with full-length cDNA-RNA hybrids, thus preventing base-pairing with the SMART™ II A Oligonucleotide. The SMART anchor sequences contained in both 5' and 3' ends of cDNA serve as universal priming sites for end-to-end cDNA amplification. In this manner, SMART method is able to preferentially enrich for full-length cDNAs http://www.clontech.com/images/pt/PT3041-1.pdf.
For second strand synthesis, PCR was carried out on a small aliquot (1/10th volume) of the primary template by using Advantage 2 Polymerase Mix (Clontech). The following thermal cycling program was applied: initial denaturation at 95°C for 60 sec, followed by 15 cycles at: 95°C denaturation for 30 sec; 55°C annealing for 30 sec; 68°C extention for 6 min. All the PCR reactions, for each sample, were pooled together and purified by using QIAquick PCR purification kit (Qiagen).
Double stranded cDNA was quantified with a spectrophotometer (NanoDrop 1000, Thermo Scientific) and microplate fluorimeter (Victor 2, Perkin Elmer, Wellesley, MA, USA) and then concentrated by speed vacuum to a concentration of 500 ng/ul. The products were checked on a 2% agarose gel to verify cDNA quality and fragment length. The main size distribution was included between 500 and 4,000 bp (Figure 10B).
Approximately 5 μg of each cDNA sample were sheared via nebulization into small fragments, and sequenced in a single 454 run (the pico-titer plate was divided in four sectors) by using a GS-FLX sequencer (454 Life Sciences, Branford, CT, USA).
Raw unprocessed EST sequences generated from this study have been submitted to Short Read Archive (SRA) division of the Genbank repository. 454 SFF file containing raw sequences and sequence quality information can be access through the SRA web site under accession number SRA008270. Data files can also be directly accessed via FTP ftp://ftp.ncbi.nlm.nih.gov/sra/Submissions/SRA008/SRA008270/.
ESTs were processed using the ParPEST (Parallel Processing of ESTs) pipeline  that has been tweaked in order to properly manage shorter length error prone sequences, such as 454 EST reads.
Masking of simple sequence repeats (SSR) and low complexity sub-sequences was performed using the RepeatMasker tool http://www.repeatmasker.org. In addition, EST reads were screened against RepBase (version 13.06) , a library of repetitive elements, and returned as masked sequences ready for the clustering/assembling and annotation procedures. This process produced a set of unique transcripts divided into singleton ESTs (sESTs) and tentative consensus sequences (TCs), grouped by clusters. Each cluster comprises TCs sharing at least 90% of identity within a 100 nucleotide window.
BLAST similarity searches (e-value 1e10-3) against the UniProtKB database (version 13.3)  were performed to assign a biological function to the unique transcripts.
To assess the relative abundance of gene transcripts among cDNA samples we applied the statistical R test . All TCs with R>8 (true positive rate of ~98%) and with a minimum 3-fold EST number difference in at least one sample out of the four sequence sets, were considered as differentially expressed.
Hierarchical clustering analysis (HCA) and principal component analysis (PCA) of the data were performed using GeneSpring version 7.3 (Agilent, Santa Clara, CA, USA).
FA carried out RNA extractions and cDNA synthesis. ND performed sequence alignment, assembling, annotation and database construction. LT participated in data mining and design of some figures. MS provided samples and contributed in biochemical inferences. RR supervised the work of FA. MP participated in cDNA synthesis. GG participated in design, drafting and editing of the manuscript. MLC supervised data analysis process and database construction. LB conceived of the study, and participated in its design and coordination. GP conceived of the study, and participated in its design and coordination.
All authors read and approved the final manuscript.
List of TCs. the data provided include the list of TCs. For each TC, ESTs contribution per genotype and per developing stage is also provided.
The work has been supported by the Italian Ministry of Research, Project FISR "Improving flavour and nutritional properties of plant food after first and second transformation", Activity 2.1 'Identification of sequences involved in the synthesis and degradation of secoiridoids in olive fruits' and Project FIRB "Parallelomics". The Authors are very grateful to Dr. Giorgio Pannelli for providing the olive fruit samples of the Olive Cultivar Collection held by the CRA-OLI (Collececco, Spoleto).