Generation and assembly of transcript reads
A normalized cDNA sample pool from chickpea cultivar ICC 4958, prepared from 22 tissues representing different developmental stages (embryo, shoots, roots, leaves, apical meristem, buds, flowers, young pods) of the plant, as well as challenged by abiotic stresses such as drought and salinity (details mentioned under ‘ Experimental procedure ’), was sequenced using the Roche/454 platform. A single sequencing run produced a total of 435 018 STRs with an average sequence length of 216 bp ().
Figure 1 Read length distribution of Roche/454 reads and ESTs before and after assembly. Read size of Roche/454 sequences ranged from 50 bp to a maximum of 300 bp, with the highest number of reads having read size between 201 and 250 bp. Read size of high-quality (more ...)
All 435 018 Roche/454 STRs were assembled using CAP3 (Huang and Madan, 1999
). Post-assembly trimming of repetitive and poly-A sequence was performed using custom Perl scripts. Assembly of 435 018 Roche/454 reads produced 44 852 contigs and 87 806 singletons. Around 1704 contigs containing only two reads with zero read coverage variation were categorized as ‘high-confidence singletons’, because they might informatically represent unique genes expressed at low levels similar to singletons. The length of assembled contigs ranged from 159 to 650 bp with an average of 550 bp (). A maximum number of contigs had a size range of 250–450 bp. Overall, the size of the contigs ranged from 100 to 1250 bp.
Assembly of Roche/454 STRs and Sanger ESTs together provided 44 845 contigs and 58 370 singletons, including 1679 ‘high-confidence singletons’. Thus, a total of 103 215 tentative unique sequences (TUSs) have been defined and will be referred to as the ‘chickpea transcriptome assembly’ (CaTA). The size of TUSs ranged from 54 to 3346 bp, with an average read length of 459 bp (). The highest number of contigs (19 901) had a size range of 250–450 bp. Subsequently, this 103 215 TUS dataset was used for analysing the chickpea transcriptome for both gene structures and functions. Based on the CAP3 results it was observed that of the 21 491 Sanger ESTs that were used for analysis, 15 905 (74.1%) had similarity with 454 STRs while 5586 (25.9%) Sanger ESTs did not show any match and hence remained as singletons.
Analysis with chickpea genomic survey sequences (GSSs)
The 103 215 TUSs were compared with 48 796 chickpea GSSs (NCBI, 20 October 2009). However, only 8218 (7.96%) TUSs showed significant matches to 4641 (9.51%) chickpea GSSs (≤1E-10, minimum query length of 70, and 80% identity). This indicates that the matched GSSs may be derived from coding regions (which may also include intronic or other noncoding regions).
Comparison with the Medicago genome
As the closest legume model for chickpea is Medicago truncatula
, we aligned the TUSs with the Medicago
genome (Mt 3.5.1; http://www.medicagohapmap.org/?genome
) to investigate gene coverage and gene structures. Taking into account the estimated time of divergence between chickpea and Medicago
, as well as the error-prone nature of EST data, we used the HMM-based alignment program Exonerate (Slater and Birney, 2005
), with thresholds requiring a minimum per cent identity of 75, a maximum intron length of 5000 bp, and retaining up to 10 alignments with Exonerate scores at least 50% as high as the top-scoring match. Out of 103 215 TUSs, 42 141 (40.8%) of the TUSs aligned with the Medicago
genome, intersecting 14 580 predicted Medicago
genes (Dataset S1
). The alignments were also used to predict 39 281 splice sites in 20 137 of the TUS alignments and, furthermore, to predict intron-spanning primer sets. These alignments and primer sets are visible as a GBrowse track at the Legume Information System (LIS), at http://medtr.comparative-legumes.org/gb2/gbrowse/3.5.1/
. The counts of best TUS alignments to Medicago
chromosomes 1–8, respectively, were 4964, 4829, 5918, 6507, 6684, 1395, 5371 and 4201. The low count of alignments on chromosome 6 is noteworthy though not surprising, as this chromosome is known to be short and unusually repeat-dense (Cannon et al., 2006
Functional annotation, categorization according to Gene Ontology (GO) descriptions
Comparison of the TUSs against the sequences of UniProt database (Uniref50) showed that 60 330 (58.45%) of TUSs had similarity (Dataset S2
). At a threshold of ≤1E−10, functional annotations could be retrieved only for 49 437 TUSs (47.8%). These were functionally categorized based on GO descriptions. As a result, 20 634 (19.9%) TUSs were assigned to three principal categories: molecular function (10 963 TUSs), biological process (8099 TUSs) and cellular component (6662 TUSs). The highest number of TUSs fell into metabolic process (5631 TUSs, 28.19%), followed by cell part (6505 TUSs, 47.12%), binding (7714 TUSs, 46.35%), catalytic activity (6310 TUSs, 37.92%), cellular process (5517 TUSs, 27.62%) and organelle (3889 TUSs, 28.17%) subcategories ().
Figure 2 Functional categorization of chickpea TUSs. Chickpea TUSs (≤1E-10) were categorized hierarchically according to three principal gene ontologies, viz. biological processes, molecular functions and cellular components. Binding (46.35%) and catalytic (more ...)
Gene ontology classifications were also used to identify the genes related to stress responses. A large number of TUSs (1456; 7.29%) was found under the ‘response to stimulus’ subcategory. Additionally, Enzyme Commission (EC) IDs were retrieved for chickpea TUSs with a maximum number belonging to the ‘transferases’ (728) enzyme class, 671 to ‘hydrolases’ and 474 to ‘oxido-reductases’.
Transcription factors (TFs) were identified from 49 437 TUSs based on conserved domains. TFs may be classified based on their (i) mechanistic, (ii) structural and (iii) functional properties. Within the mechanistic class, ubiquitous transcription factors such as TFIIB (six TUSs), TFIID (six TUSs), TFIIA (two TUSs) and TFIIE (one TUS) were identified. Structure-based classification is based on tertiary structures of DNA-binding domains, which are grouped under five super-classes which comprise various TF families (). A total of 498 TFs were identified in the chickpea transcriptome: 44 of basic-helix-loop-helix class, 273 of zinc-coordinating DNA-binding class, 25 of helix-turn-helix class, 57 of β-scaffold factors with minor groove contacts, and 99 belonging to uncharacterized groups or from AP2 and ARF families.
Figure 3 Transcription factors (TFs) identified by conserved domain annotation. Based on conserved domain characteristics, TUSs showing significant annotation to transcription factors were classified. Zinc fingers of alternating composition, MADS box and AP2/ERF (more ...)
Gene expression analysis under drought conditions
Illumina/Solexa sequencing was performed on drought-challenged root tissues of two parental genotypes ICC 4958 and ICC 1882, to identify drought-responsive genes. As a result, 15.66 and 22.09 million tags (36 bp) were generated for ICC 4958 and ICC 1882, respectively. These reads were aligned against the TUSs using the ‘Alpheus’ pipeline of NCGR (Miller et al., 2008
) and used to identify differentially expressed genes. Expression data were available for 60 286 TUSs. Only 44 639 TUSs had a log difference value ranging between −4.5 and +4.3. The remaining 15 647 TUSs had expression values in only one of the libraries; subsequently, fold differences could not be calculated and hence were excluded from the analysis. Of the 44 639 TUSs, nine TUSs had more than a four-fold difference, 347 had three to four-fold difference, 2504 had two to threefold, 10 055 had one to two-fold and 31 724 had less than one-fold difference, while 9199 and 6448 were expressed exclusively in ICC 4958 and ICC 1882, respectively.
With an objective to display differentially expressed genes onto pathways and to obtain an overview of genes affected in response to drought in chickpea, the MapMan 3.0.0 tool was used on 44 639 genes for which differential expression values were available. The annotation tool ‘Mercator’ (http://mapman.gabipd.org/web/guest/app/mercator
) allowed the assignment of 103 200 of 103 215 TUSs that were submitted, into a total of 36 functional classes, referred as BINs (Thimm et al., 2004
; Usadel et al., 2009
). Of these, 77 143 were classified as unknown or not assigned, while 26 057 were identified as belonging to known metabolic pathways or large enzyme families. The mapping file generated by the ‘Mercator’ pipeline was used for assigning differentially expressed chickpea TUSs obtained by comparing against five different databases (described in the Experimental procedure section).
The resulting mapping file was used to map the drought-responsive genes onto various pathways using the Image annotator module of the MapMan application. This allowed us to explore gene categories that are activated during drought response with more emphasis on those related to energy metabolism, secondary metabolism, transcription regulators and stress responses that are well documented to be responsive to wide-array of stresses. A total of 2974 TUSs [2860 TUSs which had greater than or equal to two-fold expression variation and also 116 TUSs with significant differential expression (R
> 6) (Stekel et al., 2000
) excluding two common TUSs (Dataset S3
)] were submitted to MapMan.
The overview map showed that a total of 2823 of 2974 differentially expressed TUSs/genes were mapped under 31 of 36 BINs (, Dataset S4
). While the majority of genes (1926 TUSs) were grouped in BIN 35 (‘not assigned’ category), the remaining 897 genes were assigned to 30 BINs. Of these, 583 (71.7%) genes belonged to six BINs and had higher proportion of genes comparatively, which include protein metabolism (BIN 29, 216 TUSs), RNA metabolism (BIN 27, 110 TUSs), miscellaneous enzyme families (BIN 26, 82 TUSs), transport (BIN 34, 80 TUSs), signalling (BIN 30, 59 TUSs) and cell (BIN 31, 39 TUSs).
Figure 4 Overview of differentially regulated genes involved in different metabolic processes. Gene transcripts that are induced or repressed as a result of drought stress are shown in red and green colours respectively as shown in the colour bar ranging from (more ...)
Based on the assigned genes to different BINs, an attempt was made to understand differentially expressed genes of key metabolic reactions that often modulate normal cellular functioning during stress. As a result, energy metabolism [glycolysis, tri-carboxylic acid (TCA) cycle, electron transport chain (ETC) reactions], secondary metabolism, transcription factors and stress-related categories were analysed in detail, as follows:
Changes in the magnitudes of enzymes and metabolites of carbon and energy cycles have been identified to play crucial roles in cellular metabolism. Variations in glycolysis, TCA and ETC. cycles in response to stress are well documented for plants earlier (Apel and Hirt, 2004
). The possible activation of respiratory activities in mitochondria and ATPs released during these reactions help to initiate tolerance events under hypoxic conditions (Kreuzwieser et al., 2009
). The genes (25 TUSs) related to energy metabolism belonging to BINs 4, 5, 6, 8 and 9 were identified in the chickpea transcriptome (Dataset S4
). Genes coding for phospho fructokinase, pyruvate kinase, lactate dehydrogenase, pyruvate decarboxylase, etc. were induced (Figure S1a
Flavonoids and isoflavonoids are known to play a significant role in plant defence responses to pathogens (Dixon and Steele, 1999
; Uppalapati et al., 2009
). Several genes (BIN 16) related to secondary metabolism such as phenylpropanoids (11 TUSs), terpenoids (three TUSs) and flavanoids (seven TUSs), which are expressed mainly in leguminous roots, were observed in response to stress, and hence, we expected their expression to be affected. Several genes involved in phenylpropanoid metabolism, such as phenyl ammonia lyase (PAL
), coumarate:CoA ligase, cinnamoyl-CoA reductase family, putative/4-coumaroyl-CoA synthase and mannitol dehydrogenase, were observed in the study. Similarly, camelliol C synthase (CAMS1
), terpene cyclase/mutase, beta-amyrin synthase of terpenoid and flavanoid pathways were also identified. All the aforementioned genes showed induction in sensitive ICC 1882 library as compared to the tolerant ICC 4958 library. Many other genes identified in this category are listed in Dataset S4
Expression of stress-responsive genes was shown to be regulated by two or more specific transcription factors present in the cell prior to stress (Srinivas and Swamynathan, 1996
). Many genes (75 TUSs) assigned to transcription factors (BIN 27.3) (of several different TF classes) were identified and mapped. For instance, genes coding for Zinc finger family protein, MYB domain containing family, WRKY, auxin response factor, pentatricopeptide-repeat containing protein, bZIP were identified. A homologue to the early response to dehydration (ERD)-related protein of Arabidopsis
was highly expressed in the drought susceptible library (562/million tags) as compared to the tolerant library (375/million tags). Genes belonging to this category showed highly individual responses in drought sensitive as well as tolerant libraries. However, overall, a clear trend in expression of all TFs together was not observed (Dataset S4
Molecular responses to stress factors such as heat shock, anaerobiosis, plant pathogens, oxygen free radicals, heavy metals, water stress and chilling in plants have been assessed in various plant species (Matters and Scandalios, 1986
). In our study, 278 TUSs with stress-related annotations (either biotic or abiotic) were grouped (in BINs 10, 17, 20, 21, 26, 29 and 30). These included genes involved in red-ox reactions, cell wall breakdown, cell signalling and hormone signalling (, Dataset S4
and Figure S1c
). About 66% of TUSs (17/26) involved in abiotic stress (BIN 20.2) were found to be induced in tolerant ICC 4958 and repressed in sensitive ICC 1882, while the remaining 34% (9/26) TUSs showed the inverse pattern. The induced genes in ICC 4958 include abscisic acid-responsive protein (ABR 17, ABR 18
), DNAJ heat shock protein, responsive to desiccation 22 (RD22
), early ERD-related protein and various heat shock proteins (HSP 70, HSP 91
). Interestingly, all genes encoding to ABRs were specifically induced in sensitive ICC 1882 library. These results are strongly conserved and are evidenced in earlier stress response studies (Swamy and Smith, 1999
Utilization of chickpea TUSs for development of molecular markers
With an objective to facilitate chickpea genetics and breeding, the TUSs were used for identification and development of several kinds of molecular markers, as described latter.
Identification and development of SSR markers
All TUSs (103 215) were mined for the presence of SSRs with the MI
) tool (Thiel et al., 2003
), giving 26 252 SSRs in 23 330 TUSs (). The most frequently occurring di-nucleotide motifs were AG followed by TC and CT, whereas among tri-nucleotides TTC is the highest.
SSR identification using MISA search tool
With an objective to convert the identified SSRs into potential genetic markers, an attempt was made to design the primer pairs for the TUSs containing SSR(s). Primer pairs could be designed for 3172 (12.08%) SSRs. Excluding the primer pairs for mono-nucleotide SSR motifs and for those yielding putative products of <100 bp, 807 primer pairs were considered suitable. All 807 TUSs were compared with the source sequences of SSR markers developed earlier (Hüttel et al., 1999
; Winter et al., 1999
; Lichtenzveig et al., 2005
; Sethy et al., 2006
; Nayak et al., 2010
) using BLASTN (Altschul et al., 1990
) at ≤1E-05, query coverage of ≥30 and per cent identity of >90, giving a set of 728 nonredundant primer pairs (Dataset S5
To validate the newly designed EST-SSRs, a set of 80 primers (i.e. 16 from each informative SSR classes such as di-, tri-, tetra-, penta- and hexa-nucleotide) were randomly selected for synthesis and analysis. Of the 80 primer pairs that were screened, 71 showed amplification on five parental genotypes (ICC 4958, PI 489777, ICC 1882, ICC 283 and ICC 8261) of three chickpea mapping populations. While 42 SSR markers showed ≥2 alleles with a polymorphic information content (PIC) value ranging from 0.20 to 0.67 with a mean of 0.35, the remaining 29 markers amplified only one allele across five genotypes surveyed.
Conserved orthologous set (COS) markers
As mentioned earlier, 638 chickpea TUSs showed significant similarity with ESTs of all the six legume species (≤1E-30). Only 556 had an identical functional annotation, based on BLASTX (Altschul, 1993
) (UniProt database, ≤1E-05) and across the legume species. Of the 556 TUSs, 90 TUSs were identified as potential paralogs and therefore a set of 466 TUSs were considered as putative orthologs. As another set of 1440 COS genes have already been developed at UC-Davis, USA (Douglas R. Cook, personal communication), the identified set of 466 TUSs in this study was analysed with 1440 COS genes. As a result, at ≤1E-05 and query coverage length of ≥25, 79 TUSs showed similarity with COS genes of UC-Davis and were subsequently excluded. Finally, the primer pairs were designed for a total of 387 nonredundant COS genes (Dataset S6
Intron-spanning region (ISR) markers
Using the alignments of chickpea with the Medicago
genome (Mt 3.5.1), ISR candidate markers were designed in silico
for chickpea. These markers were designed from sequences having a single best match to the reference. A total of 2088 ISR primer pairs were designed across whole genome of chickpea (Dataset S7
SNP identification based on Illumina/Solexa sequence reads
The utility of the TUSs was also demonstrated for SNP discovery. For this purpose, Illumina/Solexa sequences of ICC 4958 and ICC 1882 were aligned against TUSs using the ‘Alpheus’ program of NCGR (Miller et al., 2008
). A total of 26 082 potential nucleotide variants (transitions, transversions and indels) were identified between these two genotypes, using requirements of allele frequency (i.e. ratio of alleles at one locus observed among reads from another genotype) >0.1 and read depth ≥3 and <500 (). The number of likely, well-supported SNPs (e.g. 1503 SNPs with allele frequency ≥0.9 and coverage ≥3) was much smaller, consistent with generally low ranges of polymorphism in chickpea.
Number of SNPs classified based on allele frequency and read depth