Cultivated peanut (Arachis hypogaea) is one of the most widely grown grain legumes in the world, valued for its high protein and unsaturated oil contents. Worldwide, the major constraints to peanut production are drought and fungal diseases. Wild Arachis species, which are exclusively South American in origin, have high genetic diversity and have been selected during evolution under a range of environments and biotic stresses, constituting a rich source of allele diversity. Arachis stenosperma harbors resistances to a number of pests, including fungal diseases, whilst A. duranensis has shown improved tolerance to water limited stress. In this study, these species were used to create an extensive databank of wild Arachis transcripts under stress, which will constitute a rich source for gene discovery and molecular marker development.
Transcriptome analysis of cDNA collections from A. stenosperma challenged with Cercosporidium personatum (Berk. and M.A. Curtis) Deighton, and A. duranensis submitted to gradual water limited stress was conducted using 454 GS FLX Titanium, generating a total of 7.4 × 10⁵ raw sequence reads covering 211 Mbp of both genomes. High quality reads were assembled to 7,723 contigs for A. stenosperma and 12,792 for A. duranensis, and functional annotation indicated that 95% of the contigs in both species could be assigned to GO annotation categories. A number of transcription factor families and defense related genes were identified in both species. Additionally, the expression of five A. stenosperma Resistance Gene Analogs (RGAs) and four retrotransposon (FIDEL-related) sequences was analyzed by qRT-PCR. This data set was used to design a total of 2,325 EST-SSRs, of which a subset of 584 amplified in both species and 214 were shown to be polymorphic using ePCR.
This study comprises one of the largest unigene datasets for wild Arachis species and will help to elucidate genes involved in responses to biological processes such as fungal diseases and water limited stress. Moreover, it will also facilitate basic and applied research on the genetics of peanut through the development of new molecular markers and the study of adaptive variation across the genus.
Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination.
We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from two cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembly resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutively expressed sequences. To identify resistance-related genes with significantly differential expression, a relative abundance statistic (R) was used to compare the abundance of each gene transcript across the cDNA libraries. Thirty-six and forty-seven unique EST sequences with a threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies.
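The text does not give the formula for R. A minimal sketch, assuming it is the log-likelihood-ratio statistic commonly applied to EST counts from multiple cDNA libraries (Stekel et al., 2000), could look like the following. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math

def r_statistic(counts, library_sizes):
    """Log-likelihood-ratio style R for one gene across several libraries.

    counts[i]        -- number of ESTs of this gene in library i
    library_sizes[i] -- total number of ESTs sequenced in library i
    ASSUMPTION: R = sum_i x_i * ln(x_i / (N_i * f)), with f the pooled
    frequency of the gene over all libraries; the source does not state this.
    """
    pooled_freq = sum(counts) / sum(library_sizes)
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # libraries with zero counts contribute nothing
            r += x * math.log(x / (n * pooled_freq))
    return r

# Hypothetical gene: evenly expressed vs. found in one library only.
even = r_statistic([10, 10], [1000, 1000])
skewed = r_statistic([20, 0], [1000, 1000])
```

Under this assumption, a gene whose ESTs are spread in proportion to library size scores R = 0, and larger values indicate more uneven representation, which is how a cutoff such as R > 4 would be read.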
Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common in both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut EST matched to Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% with the sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species.
The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. The 21,777 ESTs have been deposited to the NCBI GenBank database with accession numbers ES702769 to ES724546.
Lack of sufficient molecular markers hinders current genetic research in peanuts (Arachis hypogaea L.). It is necessary to develop more molecular markers for potential use in peanut genetic research. With the development of peanut EST projects, a vast amount of available EST sequence data has been generated. These data offered an opportunity to identify SSR in ESTs by data mining.
In this study, we investigated 24,238 ESTs for the identification and development of SSR markers. In total, 881 SSRs were identified from 780 SSR-containing unique ESTs. On average, one SSR was found per 7.3 kb of EST sequence, with tri-nucleotide motifs (63.9%) being the most abundant, followed by di- (32.7%), tetra- (1.7%), hexa- (1.0%) and penta-nucleotide (0.7%) repeat types. The top six motifs included AG/TC (27.7%), AAG/TTC (17.4%), AAT/TTA (11.9%), ACC/TGG (7.72%), ACT/TGA (7.26%) and AT/TA (6.3%). Based on the 780 SSR-containing ESTs, a total of 290 primer pairs were successfully designed and used for validation of the amplification and assessment of the polymorphism among 22 genotypes of cultivated peanuts and 16 accessions of wild species. The results showed that 251 primer pairs yielded amplification products, of which 26 and 221 primer pairs exhibited polymorphism among the cultivated and wild species examined, respectively. Two to four alleles were found in cultivated peanuts, while 3–8 alleles were present in wild species. The apparent broad polymorphism was further confirmed by cloning and sequencing of amplified alleles. Sequence analysis of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the microsatellite regions. In addition, a few single base mutations were observed in the microsatellite flanking regions.
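The data-mining step described above can be illustrated with a small sketch that scans sequences for perfect di- to hexa-nucleotide repeats and reports the average sequence length per SSR. The minimum repeat counts, function names and test sequence are assumptions made for illustration; the study's actual search tool and thresholds are not given in the text.

```python
import re

# ASSUMPTION: illustrative minimum repeat counts per motif length,
# not the thresholds used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are a shorter unit repeated (AAA, AGAG, ...).
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)

def kb_per_ssr(seqs):
    """Average kilobases of sequence per SSR found (inf if none found)."""
    total_bp = sum(len(s) for s in seqs)
    n_ssrs = sum(len(find_ssrs(s)) for s in seqs)
    return (total_bp / 1000) / n_ssrs if n_ssrs else float("inf")

# Hypothetical EST fragment carrying an (AG)8 and an (AAG)5 repeat.
example = "GGC" + "AG" * 8 + "TTCA" + "AAG" * 5 + "C"
ssrs = find_ssrs(example)  # [(3, 'AG', 8), (23, 'AAG', 5)]
```

Motif classes such as AG/TC group a motif with its reverse complement; that normalization is left out here to keep the sketch short.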
This study gives an insight into the frequency, type and distribution of peanut EST-SSRs and demonstrates successful development of EST-SSR markers in cultivated peanut. These EST-SSR markers could enrich the current resource of molecular markers for the peanut community and would be useful for qualitative and quantitative trait mapping, marker-assisted selection, and genetic diversity studies in cultivated peanut as well as related Arachis species. All of the 251 working primer pairs with names, motifs, repeat types, primer sequences, and alleles tested in cultivated and wild species are listed in Additional File 1.
Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea.
More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago.
The first high-density, gene-based linkage map for A. duranensis was generated that can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut research community and, more broadly, for the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already had application for mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii.
The genus Arachis is native to a region that includes Central Brazil and neighboring countries. Little is known about the genetic variability of the Brazilian cultivated peanut (Arachis hypogaea, genome AABB) germplasm collection at the DNA level. The understanding of the genetic diversity of cultivated and wild species of peanut (Arachis spp.) is essential to develop strategies of collection, conservation and use of the germplasm in variety development. The identity of the progenitor species of cultivated peanut has also been of great interest. Several species have been suggested as putative AA and BB genome donors to allotetraploid A. hypogaea. Microsatellite or SSR (Simple Sequence Repeat) markers are co-dominant, multiallelic, and highly polymorphic genetic markers, appropriate for genetic diversity studies. Microsatellite markers may also, to some extent, support phylogenetic inferences. Here we report the use of a set of microsatellite markers, including newly developed ones, for phylogenetic inferences and the analysis of genetic variation of accessions of A. hypogaea and its wild relatives.
A total of 67 new microsatellite markers (mainly TTG motif) were developed for Arachis. Only three of these markers, however, were polymorphic in cultivated peanut. These three new markers plus five other markers characterized previously were evaluated for number of alleles per locus and gene diversity using 60 accessions of A. hypogaea. Genetic relationships among these 60 accessions and a sample of 36 wild accessions representative of section Arachis were estimated using allelic variation observed in a selected set of 12 SSR markers. Results showed that the Brazilian peanut germplasm collection has considerable levels of genetic diversity detected by SSR markers. Similarity groups for A. hypogaea accessions were established, which is a useful criterion for selecting parental plants for crop improvement. Microsatellite marker transferability was up to 76% for species of the section Arachis, but only 45% for species from the other eight Arachis sections tested. A new marker (Ah-041) presented 100% transferability and could be used to classify the peanut accessions into AA and non-AA genome carriers.
The level of polymorphism observed among accessions of A. hypogaea analyzed with newly developed microsatellite markers was low, corroborating the accumulated data which show that cultivated peanut presents a relatively reduced variation at the DNA level. A selected panel of SSR markers allowed the classification of A. hypogaea accessions into two major groups. The identification of similarity groups will be useful for the selection of parental plants to be used in breeding programs. Marker transferability is relatively high between accessions of section Arachis. The possibility of using microsatellite markers developed for one species in genetic evaluation of other species greatly reduces the cost of the analysis, since the development of microsatellite markers is still expensive and time consuming. The SSR markers developed in this study could be very useful for genetic analysis of wild species of Arachis, including comparative genome mapping, population genetic structure and phylogenetic inferences among species.
Peanut is vulnerable to a range of foliar diseases such as spotted wilt caused by Tomato spotted wilt virus (TSWV), early (Cercospora arachidicola) and late (Cercosporidium personatum) leaf spots, southern stem rot (Sclerotium rolfsii), and sclerotinia blight (Sclerotinia minor). In this study, we report the generation of 17,376 peanut expressed sequence tags (ESTs) from leaf tissues of a peanut cultivar (Tifrunner, resistant to TSWV and leaf spots) and a breeding line (GT-C20, susceptible to TSWV and leaf spots). After trimming vector and discarding low quality sequences, a total of 14,432 high-quality ESTs were selected for further analysis and deposition to GenBank. Sequence clustering resulted in 6,888 unique ESTs composed of 1,703 tentative consensus (TCs) sequences and 5,185 singletons. A large number of ESTs (5,717) representing genes of unknown functions were also identified. Among the unique sequences, there were 856 EST-SSRs identified. A total of 290 new EST-based SSR markers were developed and examined for amplification and polymorphism in cultivated peanut and wild species. Resequencing information of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the SSR regions. A few INDEL mutations and substitutions were also observed in the regions flanking the microsatellites. Some defense-related transcripts were also identified, such as putative oxalate oxidase (EU024476) and NBS-LRR domains. EST data in this study have provided a new source of information for gene discovery and development of SSR markers in cultivated peanut. A total of 16,931 ESTs have been deposited to the NCBI GenBank database with accession numbers ES751523 to
Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut.
A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B-genome species A. batizocoi. A high degree of macrosynteny was observed when comparing the homoeologous linkage groups between A (A. duranensis) and B (A. batizocoi) genomes. Comparison of the A- and B-genome genetic linkage maps also showed a total of five inversions and one major reciprocal translocation between two pairs of chromosomes under our current mapping resolution.
Our findings will contribute to understanding tetraploid peanut genome origin and evolution and eventually promote its genetic improvement. The newly developed EST-SSR markers will enrich current molecular marker resources in peanut.
Peanut (Arachis hypogaea); SSR; Genetic linkage map; Intraspecific cross; EST
Wild peanut species (Arachis spp.) are a rich source of new alleles for peanut improvement. Plant transcriptome analysis under specific experimental conditions helps the understanding of cellular processes related, for instance, to development, stress response, and crop yield. The validation of these studies has generally been accomplished by quantitative reverse transcription-polymerase chain reaction (qRT-PCR), which requires normalization of mRNA levels among samples. This can be achieved by comparing the expression ratio between a gene of interest and a reference gene which is constitutively expressed. Currently, appropriate reference genes are lacking for both wild and cultivated Arachis. The identification of such genes would allow a consistent analysis of qRT-PCR data and speed up candidate gene validation in peanut.
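The normalization described above is often computed with the comparative Ct (2^-ΔΔCt) approach. The sketch below assumes 100% amplification efficiency for every primer pair and averages the Ct values of several reference genes; the function name and Ct values are hypothetical and do not come from the study.

```python
from statistics import mean

def relative_expression(ct_target_treated, ct_refs_treated,
                        ct_target_control, ct_refs_control):
    """Fold change of a target gene, normalized to reference genes.

    ASSUMPTION: every primer pair amplifies with 100% efficiency, so one
    cycle of Ct difference corresponds to a two-fold difference in template.
    Averaging reference Ct values equals a geometric mean of quantities.
    """
    delta_treated = ct_target_treated - mean(ct_refs_treated)
    delta_control = ct_target_control - mean(ct_refs_control)
    return 2 ** -(delta_treated - delta_control)

# Hypothetical Ct values: the target crosses threshold two cycles earlier
# under stress while both reference genes stay unchanged.
fold = relative_expression(22.0, [18.0, 20.0], 24.0, [18.0, 20.0])  # 4.0
```

If a reference gene itself shifts under stress, the computed fold change is biased by the same factor, which is why reference gene stability has to be checked for each species, organ and treatment.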
A set of ten reference genes were analyzed in four Arachis species (A. magna; A. duranensis; A. stenosperma and A. hypogaea) subjected to biotic (root-knot nematode and leaf spot fungus) and abiotic (drought) stresses, in two distinct plant organs (roots and leaves). By the use of three programs (GeNorm, NormFinder and BestKeeper) and taking into account the entire dataset, five of these ten genes, ACT1 (actin depolymerizing factor-like protein), UBI1 (polyubiquitin), GAPDH (glyceraldehyde-3-phosphate dehydrogenase), 60S (60S ribosomal protein L10) and UBI2 (ubiquitin/ribosomal protein S27a) emerged as top reference genes, with their stability varying in eight subsets. The former three genes were the most stable across all species, organs and treatments studied.
This first in-depth study of reference gene validation in wild Arachis species will allow the use of specific combinations of secure and stable reference genes in qRT-PCR assays. The use of the appropriate references characterized here should improve the accuracy and reliability of gene expression analysis in both wild and cultivated Arachis and contribute to a better understanding of gene expression in, for instance, stress tolerance/resistance mechanisms in plants.
Sequencing of cDNA libraries for the development of expressed sequence tags (ESTs) as well as for the discovery of simple sequence repeats (SSRs) has been a common method of developing microsatellites or SSR-based markers. In this research, our objective was to further sequence and develop common bean microsatellites from leaf and root cDNA libraries derived from the Andean gene pool accession G19833 and the Mesoamerican gene pool accession DOR364, mapping parents of a commonly used reference map. The root libraries were made from high and low phosphorus treated plants.
A total of 3,123 EST sequences from leaf and root cDNA libraries were screened and used for direct simple sequence repeat discovery. From these EST sequences we found 184 microsatellites; the majority containing tri-nucleotide motifs, many of which were GC rich (ACC, AGC and AGG in particular). Di-nucleotide motif microsatellites were about half as common as the tri-nucleotide motif microsatellites but most of these were AGn microsatellites with a moderate number of ATn microsatellites in root ESTs followed by few ACn and no GCn microsatellites. Out of the 184 new SSR loci, 120 new microsatellite markers were developed in the BMc (Bean Microsatellites from cDNAs) series and these were evaluated for their capacity to distinguish bean diversity in a germplasm panel of 18 genotypes. We developed a database with images of the microsatellites and their polymorphism information content (PIC), which averaged 0.310 for polymorphic markers.
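The polymorphism information content reported above is a per-marker statistic computed from allele frequencies. The text does not say which formula was used; the sketch below assumes the widely used definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², and the allele calls are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pic(alleles):
    """PIC of one marker from a list of allele calls, one per genotype.

    ASSUMPTION: Botstein-style PIC; the source does not state the formula.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    sum_sq = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - sum_sq - cross

# Hypothetical marker scored on 18 genotypes with two equally common alleles.
two_allele = pic(["a"] * 9 + ["b"] * 9)  # 0.375
monomorphic = pic(["a"] * 18)            # 0.0
```

A marker with two equally frequent alleles reaches 0.375, the maximum for a biallelic locus, so an average of 0.310 would be consistent with markers that mostly carry few alleles in the panel.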
The present study produced information about microsatellite frequency in root and leaf tissues of two important genotypes for common bean genomics: namely G19833, the Andean genotype selected for whole genome shotgun sequencing from race Peru, and DOR364 a race Mesoamerica subgroup 2 genotype that is a small-red seeded, released variety in Central America. Both race Peru and Mesoamerica subgroup 2 (small red beans) have been understudied in comparison to race Nueva Granada and Mesoamerica subgroup 1 (black beans) both with regards to gene expression and as sources of markers. However, we found few differences between SSR type and frequency between the G19833 leaf and DOR364 root tissue-derived ESTs. Overall, our work adds to the analysis of microsatellite frequency evaluation for common bean and provides a new set of 120 BMc markers which combined with the 248 previously developed BMc markers brings the total in this series to 368 markers. Once we include BMd markers, which are derived from GenBank sequences, the current total of gene-based markers from our laboratory surpasses 500 markers. These markers are basic for studies of the transcriptome of common bean and can form anchor points for genetic mapping studies in the future.
The entomopathogenic nematode Heterorhabditis bacteriophora and its symbiotic bacterium, Photorhabdus luminescens, are important biological control agents of insect pests. This nematode-bacterium-insect association represents an emerging tripartite model for research on mutualistic and parasitic symbioses. Elucidation of mechanisms underlying these biological processes may serve as a foundation for improving the biological control potential of the nematode-bacterium complex. This large-scale expressed sequence tag (EST) analysis effort enables gene discovery and development of microsatellite markers. These ESTs will also aid in the annotation of the upcoming complete genome sequence of H. bacteriophora.
A total of 31,485 high quality ESTs were generated from cDNA libraries of the adult H. bacteriophora TTO1 strain. Cluster analysis revealed the presence of 3,051 contigs and 7,835 singletons, representing 10,886 distinct EST sequences. About 72% of the distinct EST sequences had significant matches (E value < 1e-5) to proteins in GenBank's non-redundant (nr) and Wormpep190 databases. We have identified 12 ESTs corresponding to 8 genes potentially involved in RNA interference, 22 ESTs corresponding to 14 genes potentially involved in dauer-related processes, and 51 ESTs corresponding to 27 genes potentially involved in defense and stress responses. Comparison to ESTs and proteins of free-living nematodes led to the identification of 554 parasitic nematode-specific ESTs in H. bacteriophora, among which are those encoding F-box-like/WD-repeat protein theromacin, Bax inhibitor-1-like protein, and PAZ domain containing protein. Gene Ontology terms were assigned to 6,685 of the 10,886 ESTs. A total of 168 microsatellite loci were identified with primers designable for 141 loci.
A total of 10,886 distinct EST sequences were identified from adult H. bacteriophora cDNA libraries. BLAST searches revealed ESTs potentially involved in parasitism, RNA interference, defense responses, stress responses, and dauer-related processes. The putative microsatellite markers identified in H. bacteriophora ESTs will enable genetic mapping and population genetic studies. These genomic resources provide the material base necessary for genome annotation, microarray development, and in-depth gene functional analysis.
The peanut (Arachis hypogaea L.) is an important oilseed crop in tropical and subtropical regions of the world. However, little about the molecular biology of the peanut is currently known. Recently, next-generation sequencing technology, termed RNA-seq, has provided a powerful approach for analysing the transcriptome, and for shedding light on the molecular biology of peanut.
In this study, we employed RNA-seq to analyse the transcriptomes of the immature seeds of three different peanut varieties with different oil contents. A total of 26.1-27.2 million paired-end reads with lengths of 100 bp were generated from the three varieties and 59,077 unigenes were assembled with N50 of 823 bp. Based on sequence similarity search with known proteins, a total of 40,100 genes were identified. Among these unigenes, only 8,252 unigenes were annotated with 42 gene ontology (GO) functional categories. In addition, 18,028 unigenes were mapped to 125 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG). Furthermore, 3,919 microsatellite markers were developed in the unigene library, and 160 PCR primers of SSR loci were used for validation of the amplification and the polymorphism.
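The N50 quoted above summarizes assembly contiguity: it is the length such that sequences of at least that length contain half of all assembled bases. A minimal sketch with hypothetical unigene lengths:

```python
def n50(lengths):
    """Smallest length L such that sequences >= L hold half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50 requires at least one sequence length")

# Hypothetical unigene lengths in bp (32 bp in total).
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])  # 8
```

Because long sequences dominate the statistic, N50 is usually larger than the mean sequence length of the same assembly.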
We completed a successful global analysis of the peanut transcriptome using RNA-seq: a large number of unigenes were assembled, and almost four thousand SSR primers were developed. These data will facilitate gene discovery and functional genomic studies of the peanut plant. In addition, this study provides insight into the complex transcriptome of the peanut and establishes a biotechnological platform for future research.
Large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed in peanut (Arachis hypogaea L.) to obtain more informative genetic markers. A total of 10,102 potential non-redundant EST sequences, including 3,445 contigs and 6,657 singletons, were generated from cDNA libraries of the gynophore, roots, leaves and seedlings. A total of 3,187 primer pairs were designed on flanking regions of SSRs, some of which allowed one and two base mismatches. Among the 3,187 markers generated, 2,540 (80%) were trinucleotide repeats, 302 (9%) were dinucleotide repeats, and 345 (11%) were tetranucleotide repeats. Pre-polymorphic analyses of 24 Arachis accessions were performed using 10% polyacrylamide gels. A total of 1,571 EST-SSR markers showing clear polymorphisms were selected for further polymorphic analysis with a Fluoro-fragment Analyzer. The 16 Arachis accessions examined included cultivated peanut varieties as well as diploid species with the A or B genome. Altogether 1,281 (81.5%) of the 1,571 markers were polymorphic among the 16 accessions, and 366 (23.3%) were polymorphic among the 12 cultivated varieties. Diversity analysis was performed and the genotypes of all 16 Arachis accessions showed similarity coefficients ranging from 0.37 to 0.97.
Electronic supplementary material
The online version of this article (doi:10.1007/s11032-011-9604-8) contains supplementary material, which is available to authorized users.
Arachis spp.; EST-SSR marker; Polymorphic analysis; Genetic diversity
EST sequencing is one of the most efficient means for gene discovery and molecular marker development, and can be additionally utilized in both comparative genome analysis and evaluation of gene duplications. While much progress has been made in catfish genomics, large-scale EST resources have been lacking. The objectives of this project were to construct primary cDNA libraries, to conduct initial EST sequencing to generate catfish EST resources, and to obtain baseline information about highly expressed genes in various catfish organs to provide a guide for the production of normalized and subtracted cDNA libraries for large-scale transcriptome analysis in catfish.
A total of 17 cDNA libraries were constructed including 12 from channel catfish (Ictalurus punctatus) and 5 from blue catfish (I. furcatus). A total of 31,215 ESTs, with average length of 778 bp, were generated including 20,451 from the channel catfish and 10,764 from blue catfish. Cluster analysis indicated that 73% of channel catfish and 67% of blue catfish ESTs were unique within the project. Over 53% and 50% of the channel catfish and blue catfish ESTs, respectively, had significant similarities to known genes. All ESTs have been deposited in GenBank. Evaluation of the catfish EST resources demonstrated their potential for molecular marker development, comparative genome analysis, and evaluation of ancient and recent gene duplications. Subtraction of abundantly expressed genes in a variety of catfish tissues, identified here, will allow the production of low-redundancy libraries for in-depth sequencing.
The sequencing of 31,215 ESTs from channel catfish and blue catfish has significantly increased the EST resources in catfish. The EST resources should provide the potential for microarray development, polymorphic marker identification, mapping, and comparative genome analysis.
Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be “unclean”. Identification of cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors present in the wet-lab procedures for cDNA library construction.
After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa; and so on. With a close examination of these artifacts, many problematic ESTs that have been deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are “unclean” or abnormal, all of which could be cleaned or filtered by AFST.
cDNA terminal pattern analysis, as implemented in the AFST software tool, can be utilized to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, and therefore greatly benefit downstream EST-based applications.
cDNA terminus; cDNA library construction; Pattern analysis; Restriction enzyme cutting abnormality; Chimeric EST sequences
Worldwide, diseases are a major cause of yield loss in peanut (Arachis hypogaea). Sources of resistance against many diseases are available in cultivated peanut genotypes, although often not in farmer-preferred varieties. Wild species generally harbor greater levels of resistance and even apparent immunity, although the linkage of agronomically unadapted wild alleles with wild disease resistance genes is inevitable. Marker-assisted selection has the potential to facilitate the combination of both cultivated and wild resistance loci with agronomically adapted alleles. However, in peanut there is an almost complete lack of knowledge of the regions of the Arachis genome that control disease resistance.
In this work we identified candidate genome regions that control disease resistance. For this we placed candidate disease resistance genes and QTLs against late leaf spot disease on the genetic map of the A-genome of Arachis, which is based on microsatellite markers and legume anchor markers. These marker types are transferable within the genus Arachis and to other legumes respectively, enabling this map to be aligned to other Arachis maps and to maps of other legume crops including those with sequenced genomes. In total, 34 sequence-confirmed candidate disease resistance genes and five QTLs were mapped.
Candidate genes and QTLs were distributed on all linkage groups except for the smallest, but the distribution was not even. Groupings of candidate genes and QTLs for late leaf spot resistance were apparent on the upper region of linkage group 4 and the lower region of linkage group 2, indicating that these regions are likely to control disease resistance.
Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improved biotic stress resistance. The limited availability of genomic resources, however, is a serious bottleneck for molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to the above-mentioned biotic stresses. With the objective of enhancing genomic resources in pigeonpea, this study reports the generation and analysis of a comprehensive resource of FW- and SMD-responsive expressed sequence tags (ESTs).
A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of the 4,557 unigenes showed significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than to other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigene sequences showed that 153 (3.3%) genes were assigned to the cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) to molecular function. Further, 19 genes were identified as differentially expressed between FW-responsive genotypes and 20 between SMD-responsive genotypes. The generated ESTs were compiled together with 908 ESTs available in the public domain at the time of analysis, and a set of 5,085 unigenes was defined and used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ≥ 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs.
As an example, a set of 10 contigs was used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. The occurrence of SNPs was confirmed for all 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained for 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs, indicating the possibility of assaying SNPs in 37 genes using the cleaved amplified polymorphic sequences (CAPS) assay.
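The link between an SNP and a CAPS assay is that one allele creates or abolishes a restriction enzyme recognition site. A minimal sketch of that check follows. The three enzymes are a small illustrative subset with well-known palindromic sites, the function name and arguments are hypothetical, and the abstract does not state which enzymes or software the authors used.

```python
# Illustrative subset of enzymes; a real screen would use a full enzyme database.
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "BamHI": "GGATCC"}


def caps_candidates(flank5, alleles, flank3, enzymes=ENZYMES):
    """List enzymes whose site count differs between the two SNP alleles.

    flank5/flank3 are the sequences on either side of the SNP and
    alleles is a pair of single bases, e.g. ("T", "C").
    """
    allele_a, allele_b = alleles
    seq_a = (flank5 + allele_a + flank3).upper()
    seq_b = (flank5 + allele_b + flank3).upper()
    hits = []
    for name, site in enzymes.items():
        # A difference in site count means the alleles give different digests.
        if seq_a.count(site) != seq_b.count(site):
            hits.append(name)
    return hits
```

Because the listed sites are palindromic, scanning one strand is sufficient here; non-palindromic sites would also require checking the reverse complement.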
The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have shown conservation of a considerable number of pigeonpea transcripts across the legume and model plant species analysed, as well as some putative pigeonpea-specific genes. Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
Background and Aims
The genus Arachis contains 80 described species. Section Arachis is of particular interest because it includes cultivated peanut, an allotetraploid, and closely related wild species, most of which are diploids. This study aimed to analyse the genetic relationships of multiple accessions of section Arachis species using two complementary methods. Microsatellites allowed the analysis of inter- and intraspecific variability. Intron sequences from single-copy genes allowed phylogenetic analysis including the separation of the allotetraploid genome components.
Intron sequences and microsatellite markers were used to reconstruct phylogenetic relationships in section Arachis through maximum parsimony and genetic distance analyses.
Although high intraspecific variability was evident, there was good support for most species. However, some problems were revealed, notably a probable polyphyletic origin for A. kuhlmannii. The validity of the genome groups was well supported. The F, K and D genomes grouped close to the A genome group. The 2n = 18 species grouped closer to the B genome group. The phylogenetic tree based on the intron data strongly indicated that A. duranensis and A. ipaënsis are the ancestors of A. hypogaea and A. monticola. Intron nucleotide substitutions allowed the ages of divergences of the main genome groups to be estimated at a relatively recent 2·3–2·9 million years ago. This age and the number of species described indicate a much higher speciation rate for section Arachis than for legumes in general.
The analyses revealed relationships between the species and genome groups and showed a generally high level of intraspecific genetic diversity. The improved knowledge of species relationships should facilitate the utilization of wild species for peanut improvement. The estimates of speciation rates in section Arachis are high, but not unprecedented. We suggest these high rates may be linked to the peculiar reproductive biology of Arachis.
Arachis; peanut; groundnut; intron sequences; single-copy genes; molecular phylogeny; microsatellites; genetic relationships; speciation rates; genome donors; molecular dating
Arachis hypogaea (peanut) is an important crop worldwide, being mostly used for edible oil production, direct consumption and animal feed. Cultivated peanut is an allotetraploid species with two different genome components, A and B. Genetic linkage maps can greatly assist molecular breeding and genomic studies. However, the development of linkage maps for A. hypogaea is difficult because it has very low levels of polymorphism. This can be overcome by the utilization of wild species of Arachis, which present the A- and B-genomes in the diploid state, and show high levels of genetic variability.
In this work, we constructed a B-genome linkage map, which will complement the previously published map for the A-genome of Arachis, and produced an entire framework for the tetraploid genome. This map is based on an F2 population of 93 individuals obtained from the cross between the diploid A. ipaënsis (K30076) and the closely related A. magna (K30097), the former species being the most probable B genome donor to cultivated peanut. In spite of being classified as different species, the parents showed high crossability and relatively low polymorphism (22.3%), compared to other interspecific crosses. The map has 10 linkage groups, with 149 loci spanning a total map distance of 1,294 cM. The microsatellite markers utilized, developed for other Arachis species, showed high transferability (81.7%). Segregation distortion was 21.5%. This B-genome map was compared to the A-genome map using 51 common markers, revealing a high degree of synteny between both genomes.
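Segregation distortion at a marker is commonly assessed with a chi-square goodness-of-fit test. The sketch below assumes codominant markers scored in an F2 population with an expected 1:2:1 genotype ratio and a 5% significance level. The abstract does not state which test or threshold was applied, so both are assumptions, and the function names are hypothetical.

```python
# Critical value of the chi-square distribution for 2 degrees of freedom, p = 0.05.
CHI2_CRIT_DF2_P05 = 5.991


def f2_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for observed F2 genotype counts against 1:2:1."""
    n = n_aa + n_ab + n_bb
    expected = (n / 4, n / 2, n / 4)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_aa, n_ab, n_bb, critical=CHI2_CRIT_DF2_P05):
    """True when the marker deviates significantly from 1:2:1."""
    return f2_chi_square(n_aa, n_ab, n_bb) > critical
```

The reported distortion rate would then be the fraction of mapped loci for which such a test is significant; dominant markers would instead be tested against 3:1 with one degree of freedom.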
The development of genetic maps for Arachis diploid wild species with A- and B-genomes effectively provides a genetic map for the tetraploid cultivated peanut in two separate diploid components and is a significant advance towards the construction of a transferable reference map for Arachis. Additionally, we were able to identify affinities of some Arachis linkage groups with Medicago truncatula, which will allow the transfer of information from the nearly-complete genome sequences of this model legume to the peanut crop.
The construction of genetic linkage maps for cultivated peanut (Arachis hypogaea L.) has been and continues to be an important research goal to facilitate quantitative trait locus (QTL) analysis and gene tagging for use in marker-assisted selection in breeding. Even though a few maps have been developed, they were constructed using diploid or interspecific tetraploid populations. The most recently published intraspecific map was constructed from a cross of cultivated peanuts, in which only 135 simple sequence repeat (SSR) markers were sparsely distributed across 22 linkage groups. A more detailed linkage map with sufficient markers is necessary for QTL identification and marker-assisted selection. The objective of this study was to construct a genetic linkage map of cultivated peanut using SSR markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank.
Three recombinant inbred line (RIL) populations were constructed from three crosses with one common female parental line, Yueyou 13, a high-yielding Spanish market type. The four parents were screened with 1044 primer pairs designed to amplify SSRs, and 901 primer pairs produced clear PCR products. Of the 901 primer pairs, 146, 124 and 64 primer pairs (markers) were polymorphic in these populations, respectively, and were used in genotyping these RIL populations. Individual linkage maps were constructed from each of the three populations and a composite map based on 93 common loci was created using JoinMap. The composite linkage map consists of 22 composite linkage groups (LG) with 175 SSR markers (including 47 SSRs on the published AA genome maps), representing the 20 chromosomes of A. hypogaea. The total composite map length is 885.4 cM, with an average marker density of 5.8 cM. Segregation distortion in the 3 populations was 23.0%, 13.5% and 7.8% of the markers, respectively. These distorted loci tended to cluster on LG1, LG3, LG4 and LG5. Only 15 EST-SSR markers were mapped, owing to low polymorphism. Comparison among the maps and with the AA genome indicated potential synteny, collinear order of some markers and conservation of collinear linkage groups, although conservation was not complete.
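The reported density of 5.8 cM can be checked against the other figures in this paragraph. The abstract does not say how the value was derived. One reading that reproduces it, offered here as an assumption, is the mean interval between adjacent markers, where each linkage group with k markers contributes k - 1 intervals.

```python
def mean_interval_cm(total_cm, n_markers, n_groups):
    """Mean distance between adjacent markers.

    Each linkage group with k markers has k - 1 intervals, so the map
    has n_markers - n_groups intervals in total.
    """
    intervals = n_markers - n_groups
    return total_cm / intervals


# Figures from the abstract: 885.4 cM, 175 markers, 22 linkage groups.
per_interval = mean_interval_cm(885.4, 175, 22)
per_marker = 885.4 / 175  # alternative reading: map length per marker
```

Under this reading the per-interval value rounds to the reported 5.8 cM, whereas dividing the map length by the number of markers gives a smaller figure.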
A composite linkage map was constructed from three individual mapping populations with 175 SSR markers in 22 composite linkage groups. This composite genetic linkage map is among the first "true" tetraploid peanut maps produced. The map also includes 47 SSRs that have been used in the published AA genome maps and could be used in comparative mapping studies. The primers described in this study are PCR-based markers, which are easy to share for genetic mapping in peanuts. All 1044 primer pairs are provided as additional files and the three RIL populations will be made available to the public upon request for quantitative trait loci (QTL) analysis and linkage map improvement.
Peanut (Arachis hypogaea L.) ranks fifth among the world oil crops and is widely grown in India and neighbouring countries. Due to its large and unknown genome size, studies on genomics and genetic modification of peanut are still scanty compared to other model crops like Arabidopsis, rice, cotton and soybean. Because peanut is favourably cultivated in semi-arid regions, the study of abiotic stress responsive genes and their regulation in peanut is very important. Therefore, we aim to identify and annotate the abiotic stress responsive candidate genes in peanut ESTs. Expression data of drought stress responsive genes and the corresponding EST sequences were screened from dot blot experiments, shown as heat maps and supplementary tables, respectively, as reported by Govind et al. (2009). Screened genes for which the above-mentioned supplementary tables gave no EST information were retrieved from NCBI. A phylogenetic analysis was performed to find a group of the most similar ESTs for each selected gene. Each EST of this group was then searched against peanut ESTs (178,490 whole EST sequences) using standalone BLAST. For the prediction and annotation of the selected abiotic stress responsive genes, various tools (such as VecScreen, RepeatMasker, EST-Trimmer, DNA Baser, WISE2 and I-TASSER) were used. Here we report the predicted contigs, domains and 3D structures for the HSP 17.3 kDa protein, DnaJ protein and Type 2 Metallothionein protein.
Arachis hypogaea; EST; Gene annotation; Stress; Contigs
The complex, tetraploid genome structure of peanut (Arachis hypogaea) has obstructed advances in genetics and genomics in the species. The aim of this study is to understand the genome structure of Arachis by developing a high-density integrated consensus map. Three recombinant inbred line populations derived from crosses between the A genome diploid species, Arachis duranensis and Arachis stenosperma; the B genome diploid species, Arachis ipaënsis and Arachis magna; and between the AB genome tetraploids, A. hypogaea and an artificial amphidiploid (A. ipaënsis × A. duranensis)4×, were used to construct genetic linkage maps: 10 linkage groups (LGs) of 544 cM with 597 loci for the A genome; 10 LGs of 461 cM with 798 loci for the B genome; and 20 LGs of 1442 cM with 1469 loci for the AB genome. The resultant maps plus 13 published maps were integrated into a consensus map covering 2651 cM with 3693 marker loci, which was anchored to 20 consensus LGs corresponding to the A and B genomes. Comparative genomics with the genome sequences of Cajanus cajan, Glycine max, Lotus japonicus, and Medicago truncatula revealed that the Arachis genome has a segmented synteny relationship to the other legumes. The comparative maps in legumes, integrated tetraploid consensus maps, and genome-specific diploid maps will increase the genetic and genomic understanding of Arachis and should facilitate molecular breeding.
Arachis spp.; comparative genomics; genetic linkage map; integrated consensus map; legume genome
Many plant ESTs have been sequenced as an alternative to whole genome sequences, including those of peanut, because of genome size and complexity. The US peanut research community held the historic 2004 Atlanta Genomics Workshop and named the EST project as a main priority. As of August 2011, the peanut research community had deposited 252,832 ESTs in the public NCBI EST database, and this resource has provided the community with valuable tools and core foundations for various genome-scale experiments ahead of the whole genome sequencing project. These EST resources have been used for marker development, gene cloning, microarray gene expression and genetic map construction. The peanut EST sequence resources have thus shown a wide range of applications and fulfilled an essential role at a time of need. The EST project then contributed to the second historic event, the Peanut Genome Project 2010 Inaugural Meeting, also held in Atlanta, where it was decided to sequence the entire peanut genome. After the completion of peanut whole genome sequencing, ESTs or transcriptome data will continue to play an important role in filling knowledge gaps, identifying particular genes and exploring gene function.
The genus Arachis includes Arachis hypogaea (cultivated peanut) and wild species that are used in peanut breeding or as forage. Molecular markers have been employed in several studies of this genus, but microsatellite markers have been used in only a few investigations. Microsatellites are very informative and are useful for assessing genetic variability, analyzing mating systems and genetic mapping. The objectives of this study were to develop A. hypogaea microsatellite loci and to evaluate the transferability of these markers to other Arachis species.
Thirteen loci were isolated and characterized using 16 accessions of A. hypogaea. The level of variation found in A. hypogaea using microsatellites was higher than with other markers. Cross-transferability of the markers was also high. Sequencing of the fragments amplified using the primer pair Ah11 from 17 wild Arachis species showed that almost all wild species had a repeat sequence similar to the one observed in A. hypogaea. Sequence data suggested that there is no correlation between the taxonomic relationship of a wild species to A. hypogaea and the number of repeats found in its microsatellite loci.
These results show that microsatellite primer pairs from A. hypogaea have multiple uses. A higher level of variation among A. hypogaea accessions can be detected using microsatellite markers in comparison to other markers, such as RFLP, RAPD and AFLP. The microsatellite primers of A. hypogaea showed a very high rate of transferability to other species of the genus. These primer pairs provide important tools to evaluate the genetic variability and to assess the mating system in Arachis species.
EST (expressed sequence tag) sequences and their annotation provide a highly valuable resource for gene discovery, genome sequence annotation, and other genomics studies that can be applied in genetics, breeding and conservation programs for non-model organisms. Conifers are long-lived plants that are ecologically and economically important globally, and have a large genome size. Black spruce (Picea mariana) is a transcontinental species of the North American boreal and temperate forests. However, there are limited transcriptomic and genomic resources for this species. The primary objective of our study was to develop a black spruce transcriptomic resource to facilitate on-going functional genomics projects related to growth and adaptation to climate change.
We conducted bidirectional sequencing of cDNA clones from a standard cDNA library constructed from black spruce needle tissues. We obtained 4,594 high quality (2,455 5' end and 2,139 3' end) sequence reads, with an average read length of 532 bp. Clustering and assembly of ESTs resulted in 2,731 unique sequences, consisting of 2,234 singletons and 497 contigs. Approximately two-thirds (63%) of unique sequences were functionally annotated. Genes involved in 36 molecular functions and 90 biological processes were discovered, including 24 putative transcription factors and 232 genes involved in photosynthesis. The most abundantly expressed transcripts were associated with photosynthesis, growth factors, stress and disease response, and transcription factors. A total of 216 full-length genes were identified. About 18% (493) of the transcripts were novel, representing an important addition to the GenBank EST database (dbEST). Fifty-seven di-, tri-, tetra- and penta-nucleotide simple sequence repeats were identified.
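Identifying di- to penta-nucleotide simple sequence repeats in unique sequences amounts to scanning for tandem copies of a short motif. The sketch below is one minimal way to do this. The minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds, since the abstract does not give the search criteria or the tool used.

```python
# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            copies = 1
            # Extend while the next k bases repeat the motif exactly.
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_rep and is_primitive(motif) and set(motif) <= set("ACGT"):
                hits.append((i, motif, copies))
                i += copies * k  # skip past the repeat just recorded
            else:
                i += 1
    return sorted(hits)
```

The primitive-motif check prevents a homopolymer or a dinucleotide run from being reported again as a longer motif; imperfect and compound repeats are not handled.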
We have developed the first high quality EST resource for black spruce and identified 493 novel transcripts, which may be species-specific related to life history and ecological traits. We have also identified full-length genes and microsatellite-containing ESTs. Based on EST sequence similarities, black spruce showed close evolutionary relationships with congeneric Picea glauca and Picea sitchensis compared to other Pinaceae members and angiosperms. The EST sequences reported here provide an important resource for genome annotation, functional and comparative genomics, molecular breeding, conservation and management studies and applications in black spruce and related conifer species.
Picea mariana; Expressed sequence tag; Gene discovery; Gene expression; Gene ontology; Microsatellites
Peanut (Arachis hypogaea L.) is a crop of economic and social importance, mainly in tropical areas and developing countries. Its molecular breeding has been hindered by a shortage of polymorphic genetic markers due to a very narrow genetic base. Microsatellites (SSRs) are markers of choice in peanut because they are co-dominant, highly transferable between species and easily applicable in the allotetraploid genome. In spite of substantial effort over the last few years by a number of research groups, the number of SSRs that are polymorphic for A. hypogaea is still limiting for routine application, creating the demand for the discovery of more markers polymorphic within cultivated germplasm.
A plasmid genomic library enriched for TC/AG repeats was constructed and 1401 clones sequenced. From the sequences obtained 146 primer pairs flanking mostly TC microsatellites were developed. The average number of repeat motifs amplified was 23. These 146 markers were characterized on 22 genotypes of cultivated peanut. In total 78 of the markers were polymorphic within cultivated germplasm. Most of those 78 markers were highly informative with an average of 5.4 alleles per locus being amplified. Average gene diversity index (GD) was 0.6, and 66 markers showed a GD of more than 0.5. Genetic relationship analysis was performed and corroborated the current taxonomical classification of A. hypogaea subspecies and varieties.
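The gene diversity index can be computed per marker from the alleles scored across genotypes. The sketch below assumes the common definition GD = 1 - sum(p_i^2), where p_i is the frequency of allele i, and treats each genotype as contributing one allele observation. The abstract does not spell out the formula, so both points are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def gene_diversity(alleles):
    """Gene diversity, 1 - sum(p_i^2), for one marker.

    alleles is a list with one allele call per genotype, e.g. fragment
    sizes in bp.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Example: one marker scored on 22 genotypes with two equally common alleles.
example_gd = gene_diversity([180] * 11 + [184] * 11)
```

A marker with two equally frequent alleles sits exactly at 0.5, so a GD above 0.5 requires either more than two alleles or, more precisely, allele frequencies spread across at least three alleles.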
The microsatellite markers described here are a useful resource for genetics and genomics in Arachis. In particular, the 66 markers that are highly polymorphic in cultivated peanut are a significant step towards routine genetic mapping and marker-assisted selection for the crop.