Groundnut (Arachis hypogaea L.), a self-pollinated legume, is an important crop cultivated on 24 million ha worldwide for extraction of edible oil and for food uses. The kernels are rich in oil (48–50%) and protein (25–28%), and are a source of several vitamins, minerals, antioxidants, biologically active polyphenols, flavonoids, and isoflavones. Improved varieties of groundnut with high yield potential have been developed and released for cultivation worldwide. The improved varieties belong to different maturity durations and possess resistance to diseases, tolerance to drought, enhanced oil content, and improved quality traits for food uses. Conventional breeding procedures along with the tools for phenotyping were largely used in groundnut improvement programs. Mutations were used to induce variability and wide hybridization was attempted to tap variability from wild species. Low genetic variability has been a bottleneck for groundnut improvement. The vast potential of wild species, a reservoir of new alleles, remains under-utilized. Development of linkage maps of groundnut during the last decade was followed by identification of markers and quantitative trait loci for the target traits. Consequently, the last decade has witnessed the deployment of molecular breeding approaches to complement the ongoing groundnut improvement programs in the USA, China, India, and Japan. The other potential advantages of molecular breeding are the feasibility of targeting multiple traits for improvement and the provision of tools to tap new alleles from wild species. The first groundnut variety developed through marker-assisted back-crossing is a root-knot nematode-resistant variety, NemaTAM, in the USA. The uptake of molecular breeding approaches in groundnut improvement programs by NARS partners in India and many African countries is slow or needs to be initiated, in part due to inadequate infrastructure, high genotyping costs, and inadequate human capacities. 
The availability of draft genome sequences for diploid (AA and BB) and tetraploid (AABB) genome species of Arachis in coming years is expected to bring low-cost genotyping to the groundnut community. This will facilitate the use of modern genetics and breeding approaches such as genome-wide association studies for trait mapping and genomic selection for crop improvement.
Arachis hypogaea; genetic variability; pedigree; disease resistance; phenotyping; QTLs; molecular breeding; genomic selection
Single-nucleotide polymorphisms (SNPs, >2000) were discovered by using RNA-seq and allele-specific sequencing approaches in pigeonpea (Cajanus cajan). For making the SNP genotyping cost-effective, successful competitive allele-specific polymerase chain reaction (KASPar) assays were developed for 1616 SNPs and referred to as PKAMs (pigeonpea KASPar assay markers). Screening of PKAMs on 24 genotypes [23 from cultivated species and 1 wild species (Cajanus scarabaeoides)] defined a set of 1154 polymorphic markers (77.4%) with a polymorphism information content (PIC) value from 0.04 to 0.38. One thousand and ninety-four PKAMs showed polymorphisms between parental lines of the reference mapping population (C. cajan ICP 28 × C. scarabaeoides ICPW 94). By using high-quality marker genotyping data on 167 F2 lines from the population, a comprehensive genetic map comprising 875 PKAMs with an average inter-marker distance of 1.11 cM was developed. Previously mapped 35 simple sequence repeat markers were integrated into the PKAM map and an integrated genetic map of 996.21 cM was constructed. Mapped PKAMs showed a higher degree of synteny with the genome of Glycine max followed by Medicago truncatula and Lotus japonicus and least with Vigna unguiculata. These PKAMs will be useful for genetics research and breeding applications in pigeonpea and for utilizing genome information from other legume species.
pigeonpea; SNP; linkage map; comparative genomics; molecular breeding
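The PIC range quoted for the PKAMs can be read against the usual definition of polymorphism information content (Botstein et al., 1980). The following is a minimal sketch, not the authors' code: the function name `pic` and the example frequencies are assumptions made for illustration. It shows why a biallelic marker such as a SNP cannot exceed a PIC of 0.375, which agrees with the reported upper value of 0.38.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one marker.

    freqs: allele frequencies that sum to 1 (hypothetical input; the
    abstract does not list per-marker frequencies).
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings between identical heterozygotes,
    # summed over every unordered pair of alleles.
    correction = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - correction


# A SNP with two equally frequent alleles reaches the biallelic maximum.
print(pic([0.5, 0.5]))
```

A monomorphic marker gives a PIC of 0, and values move towards 0.375 as the two allele frequencies approach 0.5.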
Pigeonpea (Cajanus cajan L.) is an important food legume crop of rainfed agriculture. Owing to exposure of the crop to a number of biotic and abiotic stresses, crop productivity has remained stagnant for almost the last five decades at ca. 750 kg/ha. The availability of a cytoplasmic male sterility (CMS) system has facilitated the development and release of hybrids which are expected to enhance the productivity of pigeonpea. Recent advances in genomics and molecular breeding such as marker-assisted selection (MAS) offer the possibility to accelerate hybrid breeding. Molecular markers and genetic maps are pre-requisites for deploying MAS in breeding. However, in the case of pigeonpea, only one inter- and two intra-specific genetic maps are available so far. Here, four new intra-specific genetic maps comprising 59–140 simple sequence repeat (SSR) loci with map lengths ranging from 586.9 to 881.6 cM have been constructed. Using these four genetic maps together with two recently published intra-specific genetic maps, a consensus map was constructed, comprising 339 SSR loci spanning a distance of 1,059 cM. Furthermore, quantitative trait loci (QTL) analysis for fertility restoration (Rf) conducted in three mapping populations identified four major QTLs explaining phenotypic variances of up to 24%. To the best of our knowledge, this is the first report on construction of a consensus genetic map in pigeonpea and on the identification of QTLs for fertility restoration. The developed consensus genetic map should serve as a reference for developing new genetic maps as well as for correlating with the physical map of pigeonpea to be developed in the near future. The availability of more informative markers in the bins harbouring QTLs for sterility mosaic disease (SMD) and Rf will facilitate the selection of the most suitable markers for genetic analysis and molecular breeding applications in pigeonpea.
Electronic supplementary material
The online version of this article (doi:10.1007/s00122-012-1916-5) contains supplementary material, which is available to authorized users.
Drought is one of the most serious production constraints for world agriculture and is projected to worsen with anticipated climate change. Inter-disciplinary scientists have been trying to understand and dissect the mechanisms of plant tolerance to drought stress using a variety of approaches; however, success has been limited. Modern genomics and genetic approaches coupled with advances in precise phenotyping and breeding methodologies are expected to more effectively unravel the genes and metabolic pathways that confer drought tolerance in crops. This article discusses the most recent advances in plant physiology for precision phenotyping of drought response, a vital step before implementing the genetic and molecular-physiological strategies to unravel the complex multilayered drought tolerance mechanism and further exploration using molecular breeding approaches for crop improvement. Emphasis has been given to molecular dissection of drought tolerance by QTL or gene discovery through linkage and association mapping, QTL cloning, candidate gene identification, transcriptomics and functional genomics. It is suggested that molecular breeding approaches such as marker-assisted backcrossing, marker-assisted recurrent selection and genome-wide selection be integrated into crop improvement strategies to develop drought-tolerant cultivars that will enhance food security in the context of a changing and more variable climate.
A comprehensive transcriptome assembly for pigeonpea has been developed by analyzing 128.9 million short Illumina GA IIx single end reads, 2.19 million single end FLX/454 reads, and 18 353 Sanger expressed sequence tags from more than 16 genotypes. The resultant transcriptome assembly, referred to as CcTA v2, comprised 21 434 transcript assembly contigs (TACs) with an N50 of 1510 bp, the largest one being ∼8 kb. Of the 21 434 TACs, 16 622 (77.5%) could be mapped on to the soybean genome build 1.0.9 under fairly stringent alignment parameters. Based on knowledge of intron junctions, 10 009 primer pairs were designed from 5033 TACs for amplifying intron spanning regions (ISRs). By using in silico mapping of BAC-end-derived SSR loci of pigeonpea on the soybean genome as a reference, putative mapping positions at the chromosome level were predicted for 6284 ISR markers, covering all 11 pigeonpea chromosomes. A subset of 128 ISR markers was analyzed on a set of eight genotypes. While 116 markers were validated, 70 markers showed one to three alleles, with an average polymorphism information content (PIC) value of 0.16. In summary, the CcTA v2 transcript assembly and ISR markers will serve as a useful resource to accelerate genetic research and breeding applications in pigeonpea.
Cajanus cajan (L.); second-generation sequencing; transcriptome assembly; intron spanning region (ISR) markers
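The N50 of 1510 bp reported for CcTA v2 is a length-weighted summary: it is the contig length at which the contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are assumptions for illustration and are not taken from the assembly itself.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not from CcTA v2).
toy_lengths = [4000, 3000, 2000, 1000, 1000]
print(n50(toy_lengths))
```

Because the statistic is weighted by length, a few long contigs can raise the N50 well above the median contig length.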
Single feature polymorphisms (SFPs) are microarray-based molecular markers that are detected by hybridization of DNA or cRNA to oligonucleotide probes. With the objective of identifying potential polymorphic markers for drought tolerance in pigeonpea [Cajanus cajan (L.) Millspaugh], an important legume crop for the semi-arid tropics but deficient in genomic resources, Affymetrix Genome Arrays of soybean (Glycine max), a closely related species of pigeonpea, were used on cRNA of six parental genotypes of three mapping populations of pigeonpea segregating for agronomic traits like drought tolerance and pod borer (Helicoverpa armigera) resistance. By using the robustified projection pursuit method on 15 pair-wise comparisons for the six parental genotypes, 5,692 SFPs were identified. The number of SFPs varied from 780 (ICPL 8755 × ICPL 227) to 854 (ICPL 151 × ICPL 87) per parental combination of the mapping populations. A random selection of 179 SFPs was used for validation by Sanger sequencing; good quality sequence data were obtained for 99 genes, of which 75 showed sequence polymorphisms. When the sequence polymorphisms were associated with the SFPs detected, true positives were observed for 52.6% of the SFPs. In terms of parental combinations of the mapping populations, occurrence of true positives was 34.48% for ICPL 151 × ICPL 87, 41.86% for ICPL 8755 × ICPL 227, and 81.58% for ICP 28 × ICPW 94. In addition, a set of 139 candidate genes that may be associated with drought tolerance has been identified based on gene ontology analysis of the pigeonpea genes homologous to the soybean genes that detected SFPs between the parents of the mapping populations segregating for drought tolerance.
The online version of this article (doi:10.1007/s10142-011-0227-2) contains supplementary material, which is available to authorized users.
Single feature polymorphism; Microarray; Robustified projection pursuit; Molecular markers; Legumes
Chickpea (Cicer arietinum L.) is an important grain-legume crop that is mainly grown in rainfed areas, where terminal drought is a major constraint to its productivity. We generated expressed sequence tags (ESTs) by suppression subtractive hybridization (SSH) to identify differentially expressed genes in drought-tolerant and -susceptible genotypes in chickpea.
EST libraries were generated by SSH from root and shoot tissues of ICC 4958 (drought tolerant) and ICC 1882 (drought susceptible) exposed to terminal drought conditions by the dry down method. SSH libraries were also constructed by using 2 sets of bulks prepared from the RNA of root tissues from selected recombinant inbred lines (RILs) (10 each) for the extreme high and low root biomass phenotypes. A total of 3062 unigenes (638 contigs and 2424 singletons), 51.4% of which were novel in chickpea, were derived by cluster assembly and sequence alignment of 5949 ESTs. Only 2185 (71%) unigenes showed significant BLASTX similarity (<1E-06) in the NCBI non-redundant (nr) database. Gene ontology functional classification terms (BLASTX results and GO term) were retrieved for 2006 (92.0%) sequences, and 656 sequences were further annotated with 812 Enzyme Commission (EC) codes and were mapped to 108 different KEGG pathways. In addition, the expression status of 830 unigenes in response to terminal drought stress was evaluated using macro-array (dot blots). The expression of a few selected genes was validated by northern blotting and quantitative real-time PCR assay.
Our study compares genes that are up- and down-regulated not only between a drought-tolerant genotype and a drought-susceptible genotype under terminal drought stress but also between the bulks of the selected RILs exhibiting extreme phenotypes. More than 50% of the genes identified have been shown to be associated with drought stress in chickpea for the first time. This study not only serves as a resource for marker discovery, but can also provide better insight into the selection of candidate genes (both up- and down-regulated) associated with drought tolerance. These results can be used to identify suitable targets for manipulating the drought-tolerance trait in chickpea.
A transcript map has been constructed by the development and integration of genic molecular markers (GMMs) including single nucleotide polymorphism (SNP), genic microsatellite or simple sequence repeat (SSR) and intron spanning region (ISR)-based markers, on an inter-specific mapping population of chickpea, the third food legume crop of the world and the first food legume crop of India. For SNP discovery through allele re-sequencing, primer pairs were designed for 688 genes/expressed sequence tags (ESTs) of chickpea and 657 genes/ESTs of closely related species of chickpea. High-quality sequence data obtained for 220 candidate genic regions on 2–20 genotypes representing 9 Cicer species provided 1,893 SNPs with an average frequency of 1/35.83 bp and 0.34 PIC (polymorphism information content) value. On an average 2.9 haplotypes were present in 220 candidate genic regions with an average haplotype diversity of 0.6326. SNP2CAPS analysis of 220 sequence alignments, as mentioned above, provided a total of 192 CAPS candidates. Experimental analysis of these 192 CAPS candidates together with 87 CAPS candidates identified earlier through in silico mining of ESTs provided scorable amplification in 173 (62.01%) cases of which predicted assays were validated in 143 (82.66%) cases (CGMM). Alignments of chickpea unigenes with Medicago truncatula genome were used to develop 121 intron spanning region (CISR) markers of which 87 yielded scorable products. In addition, optimization of 77 EST-derived SSR (ICCeM) markers provided 51 scorable markers. Screening of easily assayable 281 markers including 143 CGMMs, 87 CISRs and 51 ICCeMs on 5 parental genotypes of three mapping populations identified 104 polymorphic markers including 90 markers on the inter-specific mapping population. Sixty-two of these GMMs together with 218 earlier published markers (including 64 GMM loci) and 20 other unpublished markers could be integrated into this genetic map. 
The genetic map developed here therefore has a total of 300 loci including 126 GMM loci and spans 766.56 cM, with an average inter-marker distance of 2.55 cM. In summary, this is the first report on the development of large-scale genic markers, including easily assayable markers, and of a transcript map of chickpea. These resources should be useful not only for genome analysis and genetics and breeding applications of chickpea, but also for comparative legume genomics.
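The conversion of SNPs into CAPS candidates described above rests on a simple test: a SNP is assayable by restriction digestion when it creates or destroys an enzyme recognition site in one allele. The sketch below illustrates that idea only. It is not the SNP2CAPS implementation; the function names and the five-enzyme table are assumptions chosen for illustration, although the recognition sequences listed are the standard ones for those enzymes.

```python
# Small illustrative enzyme table. All five sites are palindromic,
# so scanning the forward strand alone is sufficient for them.
ENZYMES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
    "TaqI": "TCGA",
    "MspI": "CCGG",
}


def count_sites(seq, site):
    """Count occurrences of a recognition site, allowing overlaps."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def caps_candidates(allele_a, allele_b, enzymes=ENZYMES):
    """Return enzymes whose site count differs between two allele sequences."""
    a, b = allele_a.upper(), allele_b.upper()
    return sorted(
        name for name, site in enzymes.items()
        if count_sites(a, site) != count_sites(b, site)
    )


# Hypothetical alleles differing by one A/G SNP inside an EcoRI site.
allele_a = "ACGTGAATTCTTAG"
allele_b = "ACGTGAGTTCTTAG"
print(caps_candidates(allele_a, allele_b))
```

A candidate found this way still needs experimental validation, as the abstract reports for the 173 scorable assays.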
The online version of this article (doi:10.1007/s00122-011-1556-1) contains supplementary material, which is available to authorized users.
The genus Arachis, which originated in South America, is divided into nine taxonomical sections comprising 80 species. Most of the Arachis species are diploids (2n = 2x = 20) and the tetraploid species (2n = 4x = 40) are found in sections Arachis, Extranervosae and Rhizomatosae. Diploid species have great potential to be used as resistance sources for agronomic traits like pests and diseases, drought related traits and different life cycle spans. Understanding of genetic relationships among wild species and between wild and cultivated species will be useful for enhanced utilization of wild species in improving cultivated germplasm. The present study was undertaken to evaluate genetic relationships among species (96 accessions) belonging to seven sections of Arachis by using simple sequence repeat (SSR) markers developed from an Arachis hypogaea genomic library and gene sequences from related genera of Arachis.
The average transferability rate of the 101 SSR markers tested was 81% to section Arachis and 59% to the six other sections. Five markers (IPAHM 164, IPAHM 165, IPAHM 407a, IPAHM 409, and IPAHM 659) showed 100% transferability. Cluster analysis of allelic data from a subset of 32 SSR markers on 85 wild and 11 cultivated accessions grouped accessions according to their genome composition, sections and species to which they belong. A total of 109 species-specific alleles were detected in different wild species; Arachis pusilla exhibited the largest number of species-specific alleles (15). Based on genetic distance analysis, the A-genome accession ICG 8200 (A. duranensis) and the B-genome accession ICG 8206 (A. ipaënsis) were found to be most closely related to A. hypogaea.
A set of cross species and cross section transferable SSR markers has been identified that will be useful for genetic studies of wild species of Arachis, including comparative genome mapping, germplasm analysis, population genetic structure and phylogenetic inferences among species. The present study provides strong support based on both genomic and genic markers, probably for the first time, on relationships of A. monticola and A. hypogaea as well as on the most probable donor of A and B-genomes of cultivated groundnut.
Chickpea (Cicer arietinum L.), an important grain legume crop of the world, is seriously challenged by terminal drought and salinity stresses. However, only a very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports the generation and analysis of a comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers.
A total of 20,162 (18,435 high quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. Hierarchical clustering of 105 selected contigs provided clues about stress-responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries.
The generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will help to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species.
Drought tolerance is a key trait for increasing and stabilizing barley productivity in dry areas worldwide. Identification of the genes responsible for drought tolerance in barley (Hordeum vulgare L.) will facilitate understanding of the molecular mechanisms of drought tolerance, and also facilitate the genetic improvement of barley through marker-assisted selection or gene transformation. To monitor the changes in gene expression at the transcriptional level in barley leaves during the reproductive stage under drought conditions, the 22K Affymetrix Barley 1 microarray was used to screen two drought-tolerant barley genotypes, Martin and Hordeum spontaneum 41-1 (HS41-1), and one drought-sensitive genotype Moroc9-75. Seventeen genes were expressed exclusively in the two drought-tolerant genotypes under drought stress, and their encoded proteins may play significant roles in enhancing drought tolerance through controlling stomatal closure via carbon metabolism (NADP malic enzyme, NADP-ME, and pyruvate dehydrogenase, PDH), synthesizing the osmoprotectant glycine-betaine (C-4 sterol methyl oxidase, CSMO), generating protectants against reactive-oxygen-species scavenging (aldehyde dehydrogenase, ALDH, ascorbate-dependent oxidoreductase, ADOR), and stabilizing membranes and proteins (heat-shock protein 17.8, HSP17.8, and dehydrin 3, DHN3). Moreover, 17 genes were abundantly expressed in Martin and HS41-1 compared with Moroc9-75 under both drought and control conditions. These genes were possibly constitutively expressed in drought-tolerant genotypes. Among them, seven known annotated genes might enhance drought tolerance through signalling [such as calcium-dependent protein kinase (CDPK) and membrane steroid binding protein (MSBP)], anti-senescence (G2 pea dark accumulated protein, GDA2), and detoxification (glutathione S-transferase, GST) pathways. 
In addition, 18 genes, including those encoding Δ1-pyrroline-5-carboxylate synthetase (P5CS), protein phosphatase 2C-like protein (PP2C), and several chaperones, were differentially expressed in all genotypes under drought; thus they were more likely to be general drought-responsive genes in barley. These results could provide new insights into further understanding of drought-tolerance mechanisms in barley.
Barley; drought stress; drought tolerance; microarray; reproductive stage
There is a need for software scripts and modules for format parsing, data manipulation, statistical analysis and annotation especially for tasks related to marker identification from sequence data and sequence diversity analysis.
Here we present several new Perl scripts and a module for sequence data diversity analysis. To enable the use of this software with other public domain tools, we also make available PISE (Pasteur Institute Software Environment) wrappers for these Perl scripts and the module. Since PISE is a web interface generator for bioinformatics programmes, this enables the user to generate pipelines for automated analysis.
A new set of modules and scripts for diversity statistic calculation, format parsing and data manipulation is available with PISE wrappers that enable pipelining of these scripts with commonly used contig assembly and sequence feature prediction software, to answer specific sequence diversity related questions.
Plant genetic resources (PGR) are the basic raw materials for future genetic progress and an insurance against unforeseen threats to agricultural production. An extensive characterization of PGR provides an opportunity to dissect structure, mine allelic variations, and identify diverse accessions for crop improvement. The Generation Challenge Program conceptualized the development of "composite collections" and extraction of "reference sets" from these for more efficient tapping of global crop-related genetic resources. In this study, we report the genetic structure, diversity and allelic richness in a composite collection of chickpea using SSR markers, and formation of a reference set of 300 accessions.
The 48 SSR markers detected 1683 alleles in 2915 accessions, of which 935 were considered rare, 720 common and 28 most frequent. The alleles per locus ranged from 14 to 67, averaging 35, and the polymorphic information content ranged from 0.467 to 0.974, averaging 0.854. Marker polymorphism varied between groups of accessions in the composite collection and reference set. A number of group-specific alleles were detected: 104 in kabuli, 297 in desi, and 69 in wild Cicer; 114 each in Mediterranean and West Asia (WA), 117 in South and South East Asia (SSEA), and 10 in African region accessions. Desi and kabuli shared 436 alleles, while wild Cicer shared 17 and 16 alleles with desi and kabuli, respectively. The accessions from SSEA and WA shared 74 alleles, while those from the Mediterranean shared 38 and 33 alleles with WA and SSEA, respectively. Desi chickpea contained a higher proportion of rare alleles (53%) than kabuli (46%), while wild Cicer accessions were devoid of rare alleles. A genotype-based reference set captured 1315 (78%) of the 1683 composite collection alleles, of which 463 were rare, 826 common, and 26 the most frequent alleles. The neighbour-joining tree diagram of this reference set represents diversity from all directions of the tree diagram of the composite collection.
The genotype-based reference set, reported here, is an ideal set of germplasm for allele mining, association genetics, mapping and cloning gene(s), and in applied breeding for the development of broad-based elite breeding lines/cultivars with superior yield and enhanced adaptation to diverse environments.
Hordeum chilense, a native South American diploid wild barley, is a potential source of useful genes for cereal breeding. The use of this wild species to increase genetic variation in cereals will be greatly facilitated by marker-assisted selection. Different economically feasible approaches have been undertaken for this wild species with limited direct agricultural use in a search for suitable and cost-effective markers. The availability of Expressed Sequence Tag (EST)-derived microsatellite or simple sequence repeat (SSR) markers, commonly called EST-SSRs, for barley (Hordeum vulgare) represents a promising source to increase the number of genetic markers available for the H. chilense genome.
All of the 82 barley EST-derived SSR primer pairs tested for transferability to H. chilense amplified products of correct size from this species. Of these 82 barley EST-SSRs, 21 (26%) showed polymorphism among H. chilense lines. Identified polymorphic markers were used to test the transferability and polymorphism in other Poaceae family species with the aim of establishing H. chilense phylogenetic relationships. Triticum aestivum-H. chilense addition lines allowed us to determine the chromosomal localizations of EST-SSR markers and confirm conservation of the linkage group.
In the present study, a set of 21 polymorphic EST-SSR markers has been identified as useful for diversity analysis of H. chilense and related wild barleys like H. murinum, and for wheat marker-assisted introgression breeding. Across-genera transferability of the barley EST-SSR markers has allowed phylogenetic inference within the Triticeae complex.
Cultivated peanut or groundnut (Arachis hypogaea L.) is the fourth most important oilseed crop in the world, grown mainly in tropical, subtropical and warm temperate climates. Due to its origin through a single and recent polyploidization event, followed by successive selection during breeding efforts, cultivated groundnut has a limited genetic background. In such species, microsatellite or simple sequence repeat (SSR) markers are very informative and useful for breeding applications. The low level of polymorphism in cultivated germplasm, however, calls for a larger number of polymorphic microsatellite markers for cultivated groundnut.
A microsatellite-enriched library was constructed from the genotype TMV2. Sequencing of 720 putative SSR-positive clones from a total of 3,072 provided 490 SSRs. Of these SSRs, 71.2% were of the perfect type, 13.1% were imperfect and 15.7% were compound. Among these SSRs, the GT/CA repeat motifs were the most common (37.6%) followed by GA/CT repeat motifs (25.9%). Primer pairs could be designed for a total of 170 SSRs and were optimized initially on two genotypes. Of these, 104 (61.2%) primer pairs yielded scorable amplicons and 46 (44.2%) showed polymorphism among 32 cultivated groundnut genotypes. The polymorphic SSR markers detected 2 to 5 alleles with an average of 2.44 per locus. The polymorphic information content (PIC) value for these markers varied from 0.12 to 0.75 with an average of 0.46. Based on 112 alleles obtained by 46 markers, a phenogram was constructed to understand the relationships among the 32 genotypes. The majority of the genotypes representing subspecies hypogaea were grouped together in one cluster, while the genotypes belonging to subspecies fastigiata were grouped mainly under two clusters.
The newly developed set of 104 markers extends the repertoire of SSR markers for cultivated groundnut. These markers showed good PIC values in cultivated germplasm and would therefore be very useful for germplasm analysis, linkage mapping, diversity studies and phylogenetic relationships in cultivated groundnut as well as related Arachis species.
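Reporting motif classes such as GT/CA and GA/CT, as in the results above, requires two steps: locating perfect repeats and collapsing motifs that are the same repeat read in a different frame or on the opposite strand. The following is a minimal sketch of both steps for dinucleotide repeats. The function names and the minimum of five repeat units are assumptions for illustration, since the abstract does not state the search criteria used.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Collapse rotations and the reverse complement into one label.

    GT, TG, CA and AC all map to "AC" (the GT/CA class).
    """
    rev = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, rev) for i in range(len(m))]
    return min(rotations)


def find_dinucleotide_ssrs(seq, min_repeats=5):
    """Return (start, motif_class, repeat_count) for perfect dinucleotide SSRs.

    min_repeats is an assumed threshold, not one given in the source.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    ssrs = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if motif[0] == motif[1]:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        repeats = len(match.group(0)) // 2
        ssrs.append((match.start(), canonical_motif(motif), repeats))
    return ssrs


# Hypothetical clone sequence with one (GT)6 and one (GA)5 repeat.
clone = "CCC" + "GT" * 6 + "CCC" + "GA" * 5 + "TT"
print(find_dinucleotide_ssrs(clone))
```

Imperfect and compound SSRs, which made up the remaining 28.8% of the repeats reported, would need additional rules that this sketch does not attempt.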
The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence
Small heat shock protein 17.8 (HSP17.8) is produced abundantly in plant cells under heat and other stress conditions and may play an important role in plant tolerance to stress environments. However, HSP17.8 may be differentially expressed in different accessions of a crop species exposed to identical stress conditions. The ability of different genotypes to adapt to various stress conditions resides in their genetic diversity. Allelic variations are the most common forms of genetic variation in natural populations. In this study, single nucleotide polymorphisms (SNPs) of the HSP17.8 gene were investigated across 210 barley accessions collected from 30 countries using EcoTILLING technology. Eleven SNPs, including 10 from the coding region of HSP17.8, were detected, which form nine distinguishable haplotypes in the barley collection. Among the 10 SNPs in the coding region, six are missense mutations and four are synonymous nucleotide changes. Five of the six missense changes are predicted to be deleterious to HSP17.8 function. The accessions from Middle East Asia showed higher nucleotide diversity of HSP17.8 than those from other regions, and wild barley (H. spontaneum) accessions exhibited greater diversity than the cultivated barley (H. vulgare) accessions. Four SNPs in HSP17.8 were found to be associated with at least one of the agronomic traits evaluated except for spike length, namely number of grains per spike, thousand kernel weight, plant height, flag leaf area and leaf color. The association between these SNPs and agronomic traits may provide new insight for study of the gene's potential contribution to drought tolerance of barley.
Only a few genetic maps based on recombinant inbred line (RIL) and backcross (BC) populations have been developed for tetraploid groundnut. The marker density, however, is not very satisfactory, especially in the context of the large genome size (2800 Mb/1C) and 20 linkage groups (LGs). Therefore, using marker segregation data for 10 RILs and one BC population from the international groundnut community, with the help of common markers across different populations, a reference consensus genetic map has been developed. This map comprises 897 marker loci, including 895 simple sequence repeat (SSR) and 2 cleaved amplified polymorphic sequence (CAPS) loci, distributed on 20 LGs (a01–a10 and b01–b10) spanning a map distance of 3,863.6 cM with an average map density of 4.4 cM. The highest number of markers (70) was integrated on a01 and the lowest (21) on b09. The marker density was lowest (6.4 cM) on a08 and highest (2.5 cM) on a01. The reference consensus map has been divided into 203 BINs of 20 cM each. These BINs carry 1 (a10_02, a10_08 and a10_09) to 20 (a10_04) loci with an average of 4 marker loci per BIN. Polymorphism information content (PIC) values were available for 526 markers in 190 BINs; 36 and 111 BINs have at least one marker with PIC values >0.70 and >0.50, respectively. This information will be useful for selecting highly informative and uniformly distributed markers for developing new genetic maps, background selection and diversity analysis. Most importantly, this reference consensus map will serve as a reliable reference for aligning new genetic and physical maps, performing QTL analysis in a multi-population design, evaluating the genetic background effect on QTL expression, and serving other genetic and molecular breeding activities in groundnut.
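The use of 20 cM BINs to pick informative, evenly spread markers can be illustrated with a minimal sketch. It assumes that BINs are numbered from 1 along each linkage group and labelled in the style of a10_04; the marker names, positions and PIC values in the usage note are hypothetical, and the authors' actual binning rules may differ.

```python
BIN_SIZE_CM = 20.0  # BIN length stated in the text


def bin_label(linkage_group, position_cm, bin_size=BIN_SIZE_CM):
    """Return a BIN label such as 'a10_04' for a map position (assumed numbering from 1)."""
    index = int(position_cm // bin_size) + 1
    return f"{linkage_group}_{index:02d}"


def best_marker_per_bin(markers, min_pic=0.50):
    """Keep, for each BIN, the marker with the highest PIC above min_pic.

    markers: iterable of (name, linkage_group, position_cm, pic_or_None).
    Markers without a PIC value are ignored.
    """
    best = {}
    for name, lg, pos, pic in markers:
        if pic is None or pic <= min_pic:
            continue
        label = bin_label(lg, pos)
        if label not in best or pic > best[label][1]:
            best[label] = (name, pic)
    return best
```

For example, with hypothetical markers `[("M1", "a01", 3.0, 0.62), ("M2", "a01", 15.5, 0.81), ("M5", "a10", 70.0, 0.55)]`, the function keeps M2 for BIN a01_01 and M5 for BIN a10_04.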
Pigeonpea (Cajanus cajan) is an annual or short-lived perennial food legume of acute regional importance, providing significant protein to the human diet in less developed regions of Asia and Africa. Due to its narrow genetic base, pigeonpea improvement is increasingly reliant on introgression of valuable traits from wild forms, a practice that would benefit from knowledge of its domestication history and relationships to wild species. Here we use 752 single nucleotide polymorphisms (SNPs) derived from 670 low copy orthologous genes to clarify the evolutionary history of pigeonpea (79 accessions) and its wild relatives (31 accessions). We identified three well-supported lineages that are geographically clustered and congruent with previous nuclear and plastid sequence-based phylogenies. Among all species analyzed, Cajanus cajanifolius is the most probable progenitor of cultivated pigeonpea. Multiple lines of evidence suggest recent gene flow between cultivated and non-cultivated forms, as well as historical gene flow between diverged but sympatric species. Evidence supports that primary domestication occurred in India, with a second and more recent nested population bottleneck focused in tropical regions that is the likely consequence of pigeonpea breeding. We find abundant allelic variation and genetic diversity among the wild relatives, with the exception of wild species from Australia, for which we report a third bottleneck unrelated to domestication within India. Domesticated C. cajan possesses 75% less allelic diversity than the progenitor clade of wild Indian species, indicating a severe “domestication bottleneck” during pigeonpea domestication.
Cultivated peanut (Arachis hypogaea L.) is an important crop worldwide, valued for its edible oil and digestible protein. It has a very narrow genetic base that may well derive from a relatively recent single polyploidization event. Accordingly, molecular markers show low levels of polymorphism, and the number of polymorphic molecular markers available for cultivated peanut is still limited.
Here, we report a large set of BAC-end sequences (BES), use them for developing SSR (BES-SSR) markers, and apply them in genetic linkage mapping. The majority of BESs had no detectable homology to known genes (49.5%) followed by sequences with similarity to known genes (44.3%), and miscellaneous sequences (6.2%) such as transposable element, retroelement, and organelle sequences. A total of 1,424 SSRs were identified from 36,435 BESs. Among these identified SSRs, dinucleotide (47.4%) and trinucleotide (37.1%) SSRs were predominant. The new set of 1,152 SSRs as well as about 4,000 published or unpublished SSRs were screened against two parents of a mapping population, generating 385 polymorphic loci. A genetic linkage map was constructed, consisting of 318 loci onto 21 linkage groups and covering a total of 1,674.4 cM, with an average distance of 5.3 cM between adjacent loci. Two markers related to resistance gene homologs (RGH) were mapped to two different groups, thus anchoring 1 RGH-BAC contig and 1 singleton.
The SSRs mined from BESs will be of use in further molecular analysis of the peanut genome, providing a novel set of markers, genetically anchoring BAC clones, and incorporating gene sequences into a linkage map. This will aid in the identification of markers linked to genes of interest and map-based cloning.
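Mining SSRs from BAC-end sequences, as described above, amounts to scanning each sequence for short motifs repeated in tandem. The following is a minimal sketch of that idea using regular expressions; the minimum repeat counts (six for dinucleotides, five for trinucleotides) are illustrative assumptions, not the thresholds used in the study, and dedicated SSR-mining tools handle compound and imperfect repeats that this sketch ignores.

```python
import re

# Assumed minimum number of tandem copies per motif length (hypothetical thresholds).
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, copies) for perfect di-/trinucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            copies = len(match.group(0)) // motif_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)
```

Counting the hits by motif length over a whole sequence set gives the kind of class breakdown reported above (for example, the share of dinucleotide versus trinucleotide SSRs).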
Chickpea (Cicer arietinum L.) is the third most important cool season food legume, cultivated in arid and semi-arid regions of the world. The goal of this study was to develop novel molecular markers such as microsatellite or simple sequence repeat (SSR) markers from bacterial artificial chromosome (BAC)-end sequences (BESs) and diversity arrays technology (DArT) markers, and to construct a high-density genetic map based on recombinant inbred line (RIL) population ICC 4958 (C. arietinum)×PI 489777 (C. reticulatum). A BAC-library comprising 55,680 clones was constructed and 46,270 BESs were generated. Mining of these BESs provided 6,845 SSRs, and primer pairs were designed for 1,344 SSRs. In parallel, DArT arrays with ca. 15,000 clones were developed, and 5,397 clones were found polymorphic among 94 genotypes tested. Screening of newly developed BES-SSR markers and DArT arrays on the parental genotypes of the RIL mapping population showed polymorphism with 253 BES-SSR markers and 675 DArT markers. Segregation data obtained for these polymorphic markers, together with data for 494 markers compiled from published reports or collaborators, were used for constructing the genetic map. As a result, a comprehensive genetic map comprising 1,291 markers on eight linkage groups (LGs) spanning a total of 845.56 cM distance was developed (http://cmap.icrisat.ac.in/cmap/sm/cp/thudi/). The number of markers per linkage group ranged from 68 (LG 8) to 218 (LG 3) with an average inter-marker distance of 0.65 cM. While the developed resource of molecular markers will be useful for genetic diversity, genetic mapping and molecular breeding applications, the comprehensive genetic map with integrated BES-SSR markers will facilitate its anchoring to the physical map (under construction) to accelerate map-based cloning of genes in chickpea and comparative genome evolution studies in legumes.
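The average inter-marker distance quoted above can be checked with simple arithmetic. The sketch below assumes the figure is the total map length divided by the number of mapped markers; an alternative convention divides by the number of intervals (markers minus linkage groups), which gives a slightly larger value. The function name and its optional argument are illustrative only.

```python
def mean_marker_spacing(total_cm, n_loci, n_groups=0):
    """Average spacing in cM.

    With n_groups=0 the map length is divided by the number of loci;
    with n_groups set, it is divided by the number of intervals
    (loci minus linkage groups).
    """
    return total_cm / (n_loci - n_groups)


# Figures taken from the chickpea map described above.
per_locus = mean_marker_spacing(845.56, 1291)
per_interval = mean_marker_spacing(845.56, 1291, n_groups=8)
```

Dividing 845.56 cM by 1,291 markers reproduces the reported 0.65 cM after rounding to two decimals, which suggests the per-marker convention was used here.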
This study reports generation of large-scale genomic resources for pigeonpea, a so-called ‘orphan crop species’ of the semi-arid tropic regions. FLX/454 sequencing carried out on a normalized cDNA pool prepared from 31 tissues produced 494 353 short transcript reads (STRs). Cluster analysis of these STRs, together with 10 817 Sanger ESTs, resulted in a pigeonpea transcriptome assembly (CcTA) comprising 127 754 tentative unique sequences (TUSs). Functional analysis of these TUSs highlights several active pathways and processes in the sampled tissues. Comparison of the CcTA with the soybean genome showed similarity to 10 857 and 16 367 soybean gene models (depending on alignment methods). Additionally, Illumina 1G sequencing was performed on Fusarium wilt (FW)- and sterility mosaic disease (SMD)-challenged root tissues of 10 resistant and susceptible genotypes. More than 160 million sequence tags were used to identify FW- and SMD-responsive genes. Sequence analysis of CcTA and the Illumina tags identified a large new set of markers for use in genetics and breeding, including 8137 simple sequence repeats, 12 141 single-nucleotide polymorphisms and 5845 intron-spanning regions. Genomic resources developed in this study should be useful for basic and applied research, not only for pigeonpea improvement but also for other related, agronomically important legumes.
Cajanus cajan L.; next generation sequencing; transcriptome assembly; molecular markers and gene discovery
Pigeonpea [Cajanus cajan (L.) Millsp.] is an important legume crop of rainfed agriculture. Despite concerted research efforts directed to pigeonpea improvement, the stagnation of pigeonpea productivity over the last several decades may be attributed to the prevalence of various biotic and abiotic constraints, and the situation is exacerbated by the inadequate genomic resources available for undertaking any molecular breeding programme for accelerated crop improvement. With the objective of enhancing genomic resources for pigeonpea, this study reports, for the first time, large-scale development of SSR markers from BAC-end sequences and their subsequent use for genetic mapping and hybridity testing in pigeonpea.
A set of 88,860 BAC (bacterial artificial chromosome)-end sequences (BESs) was generated after constructing two BAC libraries using HindIII (34,560 clones) and BamHI (34,560 clones) restriction enzymes. Clustering based on sequence identity of BESs yielded a set of >52K non-redundant sequences, comprising 35 Mbp or >4% of the pigeonpea genome. These sequences were analyzed to develop annotation lists and subdivide the BESs into genome fractions (e.g., genes, retroelements, transposons and non-annotated sequences). Parallel analysis of BESs for microsatellites or simple sequence repeats (SSRs) identified 18,149 SSRs, from which a set of 6,212 SSRs was selected for further analysis. A total of 3,072 novel SSR primer pairs were synthesized and tested for length polymorphism on a set of 22 parental genotypes of 13 mapping populations segregating for traits of interest. In total, we identified 842 polymorphic SSR markers that will have utility in pigeonpea improvement. Based on these markers, the first SSR-based genetic map, comprising 239 loci, was developed for this previously uncharacterized genome. The utility of the developed SSR markers was also demonstrated by identifying a set of 42 markers each for two hybrids (ICPH 2671 and ICPH 2438) for genetic purity assessment in a commercial hybrid breeding programme.
In summary, while BAC libraries and BESs should be useful for genomics studies, BES-SSR markers, and the genetic map should be very useful for linking the genetic map with a future physical map as well as for molecular breeding in pigeonpea.
Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs).
A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigene sequences showed that 153 (3.3%) genes were assigned to the cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) to molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs.
As an example, a set of 10 contigs was used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. The occurrence of SNPs was confirmed for all 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained for 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, which indicates the possibility of assaying SNPs in 37 genes using the cleaved amplified polymorphic sequences (CAPS) assay.
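A SNP is a candidate for a CAPS assay when one allele completes a restriction enzyme recognition site and the other allele destroys it, so that only one allele is cut. A minimal sketch of this check follows. The three enzymes listed have well-known palindromic recognition sequences, so scanning one strand is sufficient for them; the flanking sequences in the usage note are hypothetical, and a practical assay design would also check for additional sites elsewhere in the amplicon.

```python
# Well-known palindromic recognition sequences; a real screen would use a full enzyme list.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def _site_covers(seq, site, pos):
    """True if an occurrence of site in seq overlaps position pos."""
    start = seq.find(site)
    while start != -1:
        if start <= pos < start + len(site):
            return True
        start = seq.find(site, start + 1)
    return False


def caps_candidates(flank5, allele_a, allele_b, flank3, sites=RECOGNITION_SITES):
    """Return (enzyme, cut_allele) pairs where exactly one SNP allele carries the site."""
    seq_a = (flank5 + allele_a + flank3).upper()
    seq_b = (flank5 + allele_b + flank3).upper()
    snp_pos = len(flank5)  # 0-based index of the SNP within both sequences
    result = []
    for enzyme, site in sites.items():
        cuts_a = _site_covers(seq_a, site, snp_pos)
        cuts_b = _site_covers(seq_b, site, snp_pos)
        if cuts_a != cuts_b:
            result.append((enzyme, allele_a if cuts_a else allele_b))
    return result
```

For the hypothetical T/C SNP `caps_candidates("ACGTGAAT", "T", "C", "CAGGT")`, only the T allele completes an EcoRI site (GAATTC), so EcoRI digestion would distinguish the two alleles.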
The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of the identified biotic stress-responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
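The polymorphic information content (PIC) reported for the SSR markers above summarizes how informative a marker is, given its allele frequencies. The sketch below uses the widely cited formula of Botstein et al. (1980), PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j²; that this exact formula was used in the study is an assumption, and the allele counts in the usage note are hypothetical.

```python
def pic(allele_counts):
    """Polymorphic information content from allele counts or frequencies.

    Inputs are normalized to frequencies, so raw counts are accepted.
    """
    total = sum(allele_counts)
    p = [count / total for count in allele_counts]
    homozygosity = sum(x * x for x in p)
    cross_term = sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1 - homozygosity - cross_term
```

A marker with two equally frequent alleles, `pic([20, 20])`, gives 0.375, while a monomorphic marker gives 0; more alleles at balanced frequencies push the value towards 1.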
This study presents the development and mapping of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers in chickpea. The mapping population is based on an inter-specific cross between domesticated and non-domesticated genotypes of chickpea (Cicer arietinum ICC 4958 × C. reticulatum PI 489777). This same population has been the focus of previous studies, permitting integration of new and legacy genetic markers into a single genetic map. We report a set of 311 novel SSR markers (designated ICCM—ICRISAT chickpea microsatellite), obtained from an SSR-enriched genomic library of ICC 4958. Screening of these SSR markers on a diverse panel of 48 chickpea accessions provided 147 polymorphic markers with 2–21 alleles and polymorphic information content value 0.04–0.92. Fifty-two of these markers were polymorphic between parental genotypes of the inter-specific population. We also analyzed 233 previously published (H-series) SSR markers that provided another set of 52 polymorphic markers. An additional 71 gene-based SNP markers were developed from transcript sequences that are highly conserved between chickpea and its near relative Medicago truncatula. By using these three approaches, 175 new marker loci along with 407 previously reported marker loci were integrated to yield an improved genetic map of chickpea. The integrated map contains 521 loci organized into eight linkage groups that span 2,602 cM, with an average inter-marker distance of 4.99 cM. Gene-based markers provide anchor points for comparing the genomes of Medicago and chickpea, and reveal extended synteny between these two species. The combined set of genetic markers and their integration into an improved genetic map should facilitate chickpea genetics and breeding, as well as translational studies between chickpea and Medicago.
Electronic supplementary material
The online version of this article (doi:10.1007/s00122-010-1265-1) contains supplementary material, which is available to authorized users.