The relative effects of drought and salinity on growth were observed across all experiments in the study. The drought-tolerant genotype (ICC 4958) grew better than the drought-sensitive genotype (ICC 1882) under all drought stress treatments. Similarly, the salinity-sensitive genotype (ICCV 2) showed more stunted growth than the salinity-tolerant genotype (JG 11) when both were exposed to salinity stress; JG 11 withstood salt stress (80 mM) to a greater extent than ICCV 2. In each case, however, growth of stressed plants was reduced relative to the corresponding control plants. Root tissues from both drought- and salinity-stressed plants were harvested for total RNA extraction and subsequent cDNA library construction.
Generation of drought- and salinity-responsive ESTs
Four genotypes, namely ICC 4958 (drought-tolerant), ICC 1882 (drought-sensitive), JG 11 (salinity-tolerant) and ICCV 2 (salinity-sensitive), were used for generating ESTs. These genotypes are the parents of two mapping populations, ICC 4958 × ICC 1882 and JG 11 × ICCV 2, which segregate for tolerance to drought and salinity, respectively. A total of 10 cDNA libraries were generated, 8 from drought-challenged tissues and 2 from salinity-challenged tissues. Using the Sanger sequencing approach, 5,982 and 5,922 ESTs were generated from the ICC 4958 and ICC 1882 cDNA libraries, respectively. Similarly, 3,798 and 4,460 ESTs were generated from cDNA libraries derived from salinity-stressed root tissues of JG 11 and ICCV 2, respectively. Details of EST generation from the different cDNA libraries are given in Figure . In brief, a total of 20,162 ESTs were generated and, after stringent screening to remove short and poor-quality sequences, 18,435 high-quality ESTs were obtained. The average length of these high-quality ESTs was 569 bp. All EST sequences were deposited in the dbEST division of GenBank (GR390696-GR410171 and GR420430-GR421115).
EST assembly
Assembly analysis was performed on different EST datasets to define unigenes for (a) drought-responsive ESTs, (b) salinity-responsive ESTs, (c) drought- and salinity-responsive ESTs, and (d) the entire set of chickpea ESTs, including those from the public domain. These unigene (UG) sets are referred to as UG-I, UG-II, UG-III and UG-IV, respectively. UG-I comprised 4,558 unigenes (763 contigs and 3,795 singletons) based on cluster analysis of 10,996 high-quality drought-responsive ESTs. Likewise, UG-II included 2,595 unigenes (945 contigs and 1,650 singletons) after cluster analysis of 7,439 high-quality salinity-responsive ESTs. Based on clustering of all 18,435 high-quality ESTs generated in this study, UG-III was defined with 6,404 unigenes (1,590 contigs and 4,814 singletons). Detailed cluster analysis of the 18,435 ESTs identified 1,855 (10.06%) unique to ICC 4958, 1,606 (8.71%) to ICC 1882, 967 (5.24%) to JG 11 and 386 (2.09%) to ICCV 2. After including the 7,097 ESTs available in the public domain at the time of analysis (as of March 2008) together with the 18,435 high-quality ESTs generated in the present study, UG-IV was defined with 9,569 unigenes (2,431 contigs and 7,138 singletons). The assembly size, in terms of the number of ESTs aligned in each contig, varied from 2 EST members (587 contigs) to 874 EST members (1 contig), with an average of 8.56 (Figure ).
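The bookkeeping behind the four unigene sets can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the counts quoted in the text, treating a unigene set as the sum of assembled contigs and unassembled singletons; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts quoted in the text: (contigs, singletons, reported unigenes)
unigene_sets = {
    "UG-I": (763, 3795, 4558),
    "UG-II": (945, 1650, 2595),
    "UG-III": (1590, 4814, 6404),
    "UG-IV": (2431, 7138, 9569),
}


def unigene_count(contigs, singletons):
    """A unigene set is the assembled contigs plus the unassembled singletons."""
    return contigs + singletons


for name, (contigs, singletons, reported) in unigene_sets.items():
    total = unigene_count(contigs, singletons)
    status = "matches reported total" if total == reported else "MISMATCH"
    print(f"{name}: {contigs} + {singletons} = {total} ({status})")
```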
Sequence annotation
Sequence annotation was performed for all four unigene datasets (UG-I, UG-II, UG-III and UG-IV) using standalone BLASTN and BLASTX algorithms. For BLASTN analysis, similarity was considered significant at a threshold E-value of ≤1E-05. The BLASTN similarity search for all four unigene datasets was carried out against ESTs of closely related legume and model plant species. For instance, UG-III unigenes showed the highest similarity to Medicago (64.5%), followed by soybean (62.3%), Lotus (50.6%), poplar (42.8%), Arabidopsis (40.9%) and groundnut (29.7%), and the lowest to rice (27.0%). Across the different plant species, the BLASTN results for UG-III identified 4,654 (72.6%) unigenes with significant similarity to ESTs of at least one analysed legume species, 3,117 (48.6%) unigenes with significant similarity to ESTs of at least one analysed model plant species and, overall, 4,719 (73.6%) unigenes with significant similarity to ESTs of at least one analysed plant species. In contrast, 37 (0.5%) and 36 (0.5%) unigenes did not match ESTs of any legume or model plant species, respectively. Results of the detailed analyses of the four unigene sets are given in Table .
| Table 1. Analysis of chickpea unigenes with related legume and plant ESTs |
BLASTX searches of all four unigene sets against the UniProt database found varying numbers of unigenes with significant similarity at different thresholds. For UG-III (6,404 unigenes), for instance, 2,965 unigenes had significant similarity against the UniProt database at E-value ≤1E-05, 2,538 unigenes at E-value ≤1E-08 and 2,333 unigenes at E-value ≤1E-10. Based on these findings, a threshold E-value of ≤1E-05 was used for further analyses of the BLASTX hits in this study. Using this criterion, 1,912 (41.94%), 1,476 (56.87%), 2,965 (46.29%) and 4,657 (48.66%) unigenes of UG-I, UG-II, UG-III and UG-IV, respectively, showed significant similarity (Figure ). Details of BLASTN and BLASTX analyses against closely related legume and model plant EST databases and the UniProt database for all four unigene sets are provided in Additional files 1, 2, 3 and 4.
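The effect of the E-value threshold on the number of annotated unigenes can be illustrated with a small sketch. It assumes the BLAST hits have already been parsed into (query ID, E-value) pairs; the parsing step, the function names and the toy hits are hypothetical and do not come from the study.

```python
def best_evalue_per_query(rows):
    """Keep the smallest E-value seen for each query.

    rows: iterable of (query_id, evalue) pairs, e.g. parsed from a
    tabular BLAST report (the parsing itself is not shown here).
    """
    best = {}
    for query, evalue in rows:
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best


def count_hits_by_threshold(best_evalues, thresholds=(1e-5, 1e-8, 1e-10)):
    """Count queries whose best hit is at or below each E-value threshold."""
    return {
        t: sum(1 for e in best_evalues.values() if e <= t)
        for t in thresholds
    }


# Hypothetical toy hits, not data from the study
rows = [
    ("UG0001", 3e-12), ("UG0001", 2e-4),
    ("UG0002", 5e-7),
    ("UG0003", 0.02),
    ("UG0004", 1e-9),
]
best = best_evalue_per_query(rows)
counts = count_hits_by_threshold(best)
print(counts)
```

As in the UG-III figures quoted above, stricter thresholds can only retain the same number of unigenes or fewer, because every hit passing ≤1E-10 also passes ≤1E-08 and ≤1E-05.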
Functional categorization
Transcripts with significant BLASTX homology (≤1E-05) to annotated ESTs were further classified into functional categories. As expected, only a small percentage of unigenes (~35.2%) could be classified in this way. Gene Ontology (GO) annotation of the transcripts allowed functional descriptions to be classified into three principal ontologies: molecular function, biological process and cellular component. As in earlier studies of this nature [11], one gene product could be assigned to more than one parental category. Thus, the total number of GO mappings in each of the three ontologies exceeded the number of unigenes analysed. Details of the GO analyses for all four unigene sets are provided in Additional files 5, 6, 7 and 8. As an example, the GO analysis is described below for one unigene set (UG-III).
The GO analysis of 2,965 (46.3%) unigenes from the UG-III set (those with a significant hit in BLASTX analysis) revealed that 2,071 (32.3%) unigenes had GO descriptions for gene products: 1,684 were categorised under biological process, 1,586 under cellular component and 1,662 under molecular function. Of the functionally categorised unigenes, the largest number fell into the cell part category (1,528), followed by cellular process (1,284), nucleotide binding (1,171), metabolic process (1,140), organelle (1,048), catalytic activity (876) and response to stimulus (371). Unigenes with significant similarity that could not be classified into any of the categories were grouped as 'unclassified'. Unigenes coding for housekeeping functions, such as cellular process and metabolic process in the biological process ontology, cell part and organelle part in the cellular component ontology, and binding and catalytic activity in the molecular function ontology, were over-represented in similar proportions in all unigene datasets (Figure ). Enzyme Commission IDs were also retrieved from the UniProt database to obtain an overview of the distribution of transcripts putatively annotated as enzymes. The three largest enzyme classes were transferases, hydrolases and oxidoreductases, with 208 (27.9%), 206 (27.7%) and 183 (24.6%) members, respectively. The distribution pattern of enzymes was similar across all four unigene datasets.
Correlated gene expression pattern analysis
To understand the patterns of gene expression and the correlations between the 10 libraries from which ESTs were generated, the contigs of the UG-III set were analysed using the R statistical test of Stekel [12], implemented in the IDEG.6 tool, to identify contigs with the most significant and largest differences in EST abundance across libraries. Of the 1,590 contigs in this dataset, only 105 returned a true positive significance (R>8) and were used for hierarchical clustering analysis. The expression level of each gene/contig (relative EST counts across all the libraries) is represented graphically as a colour/heat map (Figure ).
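For readers unfamiliar with the R statistic, the sketch below computes it for a single contig from its EST counts per library. It assumes the commonly cited log-likelihood form, R = Σ x_i ln(x_i / (N_i f)), where x_i is the EST count of the contig in library i, N_i is the total number of ESTs in that library and f is the pooled frequency of the contig. The use of the natural logarithm, the function name and the library sizes in the example are assumptions for illustration; this is not the IDEG.6 implementation.

```python
import math


def stekel_r(counts, library_sizes):
    """R statistic for one contig across several cDNA libraries.

    counts: EST counts of the contig in each library (x_i).
    library_sizes: total EST count of each library (N_i).
    Libraries with a zero count contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    pooled_freq = total / sum(library_sizes)  # f under the null hypothesis
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:
            r += x * math.log(x / (n * pooled_freq))
    return r


# Hypothetical example: two libraries of 1,000 ESTs each
print(stekel_r([10, 10], [1000, 1000]))  # evenly expressed contig
print(stekel_r([20, 0], [1000, 1000]))   # contig found in one library only
```

A contig with the same relative abundance in every library gives R = 0, while a contig concentrated in one library gives a large R; the second toy contig gives 20·ln 2, which exceeds the R>8 cut-off mentioned above.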
The expression profiles of the 105 contigs with significant expression, together with their derivative libraries, were classified into four major clusters (I-IV, represented by different colour bars) with a minimum similarity of 0.5 using the HCE version 2.0 beta web tool. On the basis of their high expression level in a specific library, clusters II and III were further divided into subclusters (IIa, IIb, IIc, IIIa, IIIb, IIIc and IIId) that contained from 3 (subcluster IIa) to 23 contigs (subclusters IIIa and IIId) representing different genes (Additional file 9). The cluster analysis showed a higher number of differentially expressed genes in the salinity libraries than in the drought libraries. Furthermore, as suggested by Mantri and colleagues [13], more transcripts were observed in severe stress-challenged libraries. In general, the cluster analysis revealed high expression of genes in the following categories: biotic stress signaling (20.9%), drought response (7.6%), transporter proteins (6.6%), reactive oxygen species (ROS) scavenging (4.7%), transcriptional and translational regulation (6.6%) and uncharacterised proteins (7.6%).
In addition, the clustering of the different libraries was analysed. The grouping of the 10 libraries was found to be consistent with their origin and genotypes. For instance, the libraries clustered into two main clades according to drought and salinity treatments. The ICC 4958_Drought_Field and ICC 1882_Drought_Field libraries were grouped into the first clade, while the remaining libraries were grouped into the second clade. The second clade was further divided into 2 clusters, both consisting of homogeneously segregating drought-related libraries, while the JG 11_Salinity and ICCV 2_Salinity cDNA libraries clustered heterogeneously within the hierarchical cluster. In both clades, libraries generated under similar conditions tended to cluster together, regardless of the genotype from which they were derived, thus reflecting their relationship.
Development of functional markers
In recent years, molecular markers have been developed from genes/ESTs and are popularly referred to as genic molecular markers (GMMs) [14] or functional markers [15], as a putative function can be deduced for the majority of such markers. Functional markers (EST-SSRs and SNPs) were identified using the unigene assembly UG-IV.
Identification of genic SSRs
EST-SSR markers can assay functional genetic variation and also exhibit greater transferability across taxonomic classes than genomic SSRs [16, 17]. A total of 9,569 chickpea unigenes compiled in the present study (UG-IV) were analyzed using the MISA (MIcroSAtellite) tool [18] for the identification of SSRs. As a result, a total of 3,728 SSRs were identified in 2,029 (21.2%) unigenes, at a frequency of 1/707 bp in coding regions. The majority of SSRs, however, were monomeric repeats (1,793). Among the other classes of SSRs, 126 dimeric, 110 trimeric, 7 tetrameric, 8 pentameric and 5 hexameric SSRs were also present (Table ). Out of the 3,728 SSRs, primer pairs were generated for 1,222 SSRs. After excluding the primers for monomeric repeats, a set of 177 primer pairs was considered. After applying minimum repeat number criteria of six for di- and tri-nucleotides and four for tetra-, penta- and hexa-nucleotides, primer pairs were retained for only 77 SSRs.
| Table 2. Features of SSRs identified in the chickpea unigenes |
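The minimum repeat number criteria used to select SSRs for primer design (six repeats for di- and tri-nucleotides, four for tetra-, penta- and hexa-nucleotides) can be illustrated with a small regular-expression scan. This is a simplified sketch for perfect repeats only and is not the MISA implementation; the function name and test sequences are hypothetical.

```python
import re

# Minimum repeat numbers from the text: unit length -> minimum repeats
MIN_REPEATS = {2: 6, 3: 6, 4: 4, 5: 4, 6: 4}


def find_ssrs(seq):
    """Return (start, repeat_unit, repeat_count) for perfect SSRs in seq.

    Monomeric runs are ignored, and units that are themselves repeats of a
    shorter motif (e.g. ATAT) are skipped so each SSR is reported once
    under its shortest unit.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            is_compound_unit = any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            )
            if is_compound_unit:
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical test sequence: (AT)7 and (GAA)6 pass, (AGCT)3 is too short
example = "CCGG" + "AT" * 7 + "CGC" + "GAA" * 6 + "CTC" + "AGCT" * 3 + "G"
print(find_ssrs(example))
```

Because the scan is non-overlapping within each unit length, a real pipeline would need additional handling for compound and imperfect repeats, which this sketch deliberately leaves out.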
The potential of 77 SSR markers for detection of polymorphism was assessed on a set of 24 chickpea genotypes. Out of 77 primer pairs, 50 primer pairs yielded scorable amplicons. These SSR markers provided 1 (ICCeM0004, ICCeM0031, ICCeM0042, ICCeM0059 and ICCeM0073) to 12 (ICCeM0013, ICCeM0054 and ICCeM0055) alleles with an average of 4.6 alleles per marker. Only 45 primer pairs had more than one allele in the genotypes examined. The polymorphic markers showed a PIC value in the range of 0.08 to 0.86 with an average of 0.43 (Table ).
| Table 3. Diversity features of polymorphic EST-SSR markers |
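The text does not state how the PIC values were calculated. A widely used definition is PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of the i-th allele; the sketch below assumes that formula and one allele call per genotype. The function name and the allele labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content from a list of allele calls.

    alleles: one allele label per genotype (an assumption of this sketch).
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq - cross


# Hypothetical marker scored on 24 genotypes
print(pic(["A"] * 24))               # monomorphic marker
print(pic(["A"] * 12 + ["B"] * 12))  # two equally frequent alleles
```

A monomorphic marker gives a PIC of 0, and two equally frequent alleles give 0.375, which is the maximum for a biallelic marker; higher values such as the 0.86 reported above require several alleles.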
Identification of SNPs
As a large number of ESTs were generated from four genotypes, these EST datasets were analysed to identify SNPs. SNP discovery was performed on contigs/multiple sequence alignments (MSA) containing two or more ESTs from more than one genotype. Out of 2,431 contigs (UG-IV), SNPs were detected in 2,047 contigs, while 384 contigs did not contain any SNP. A total of 36,086 SNPs were identified in the 2,047 contigs. While 14,681 (40%) SNPs were identified in 1,305 contigs with 2-4 ESTs, the remaining 21,405 SNPs were identified in 742 contigs composed of 5 or more ESTs.
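The idea of calling SNPs from a multiple sequence alignment can be sketched as a column-by-column scan. The example below assumes that the ESTs of a contig are already aligned to equal length, with '-' for gaps, and it reports every column where more than one base is observed; it applies no quality or redundancy filters, so it should not be read as the SNP discovery pipeline used in the study. The function name and toy reads are hypothetical.

```python
def candidate_snps(aligned_reads):
    """Report alignment columns that carry more than one base.

    aligned_reads: list of (genotype, aligned_sequence) pairs of equal length.
    Returns a list of (column_index, {base: sorted genotypes carrying it}).
    Alignments with fewer than two reads or a single genotype are skipped,
    mirroring the selection criterion described in the text.
    """
    genotypes = {genotype for genotype, _ in aligned_reads}
    if len(aligned_reads) < 2 or len(genotypes) < 2:
        return []
    length = len(aligned_reads[0][1])
    snps = []
    for pos in range(length):
        carriers = {}
        for genotype, seq in aligned_reads:
            base = seq[pos].upper()
            if base in ("A", "C", "G", "T"):  # ignore gaps and ambiguity codes
                carriers.setdefault(base, set()).add(genotype)
        if len(carriers) > 1:
            snps.append((pos, {b: sorted(g) for b, g in carriers.items()}))
    return snps


# Hypothetical toy alignment, not data from the study
reads = [
    ("ICC 4958", "ACGTAC"),
    ("ICC 4958", "ACGTAC"),
    ("ICC 1882", "ACGAAC"),
    ("ICC 1882", "AC-AAC"),
]
print(candidate_snps(reads))
```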
To enable cost-effective and robust genotyping of the 21,405 SNPs detected in 742 contigs, attempts were made to identify restriction enzymes that could be used to assay the SNPs via cleaved amplified polymorphic sequence (CAPS) assays. The analysis suggested that 7,884 SNPs in 240 contigs could be assayed by CAPS methods (Table ).
| Table 4. Identification of SNPs and CAPS based on the entire set of chickpea ESTs |
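A SNP is a CAPS candidate when one allele creates or destroys a restriction site, so that only one allele is cut at that position. The sketch below checks this for a small, illustrative panel of three well-known palindromic recognition sites (EcoRI, HindIII and BamHI). The enzyme panel, function names and sequences are assumptions for illustration, not the enzyme set or software used in the study.

```python
# Illustrative enzyme panel: name -> recognition site (all palindromic,
# so scanning one strand is sufficient)
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def site_overlaps(seq, site, pos):
    """True if an occurrence of site in seq covers position pos."""
    start = max(0, pos - len(site) + 1)
    end = min(len(seq), pos + len(site))
    return site in seq[start:end]


def caps_enzymes(allele_a, allele_b, snp_pos, enzymes=ENZYMES):
    """Enzymes whose site spans the SNP in exactly one of the two alleles."""
    hits = []
    for name, site in enzymes.items():
        cuts_a = site_overlaps(allele_a, site, snp_pos)
        cuts_b = site_overlaps(allele_b, site, snp_pos)
        if cuts_a != cuts_b:
            hits.append(name)
    return hits


# Hypothetical alleles differing at index 5 (A/G); only allele_a has GAATTC
allele_a = "TTTGAATTCAAA"
allele_b = "TTTGAGTTCAAA"
print(caps_enzymes(allele_a, allele_b, 5))
```

A practical design would also check that the enzyme does not cut elsewhere in the amplicon in a way that obscures the diagnostic fragments; that step is omitted here.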