Cultivated peanut (Arachis hypogaea) is one of the most widely grown grain legumes in the world, being valued for its high protein and unsaturated oil contents. Worldwide, the major constraints to peanut production are drought and fungal diseases. Wild Arachis species, which are exclusively South American in origin, have high genetic diversity and have been selected during evolution in a range of environments and biotic stresses, constituting a rich source of allele diversity. Arachis stenosperma harbors resistances to a number of pests, including fungal diseases, whilst A. duranensis has shown improved tolerance to water limited stress. In this study, these species were used for the creation of an extensive databank of wild Arachis transcripts under stress which will constitute a rich source for gene discovery and molecular markers development.
Transcriptome analysis of cDNA collections from A. stenosperma challenged with Cercosporidium personatum (Berk. and M.A. Curtis) Deighton, and A. duranensis submitted to gradual water limited stress was conducted using 454 GS FLX Titanium, generating a total of 7.4 × 10^5 raw sequence reads covering 211 Mbp of both genomes. High quality reads were assembled into 7,723 contigs for A. stenosperma and 12,792 for A. duranensis, and functional annotation indicated that 95% of the contigs in both species could be assigned to GO annotation categories. A number of transcription factor families and defense related genes were identified in both species. Additionally, the expression of five A. stenosperma Resistance Gene Analogs (RGAs) and four retrotransposon (FIDEL-related) sequences was analyzed by qRT-PCR. This data set was used to design a total of 2,325 EST-SSRs, of which a subset of 584 amplified in both species and 214 were shown to be polymorphic using ePCR.
This study comprises one of the largest unigene datasets for wild Arachis species and will help to elucidate genes involved in responses to biological processes such as fungal diseases and water limited stress. Moreover, it will also facilitate basic and applied research on the genetics of peanut through the development of new molecular markers and the study of adaptive variation across the genus.
Large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed in peanut (Arachis hypogaea L.) to obtain more informative genetic markers. A total of 10,102 potential non-redundant EST sequences, including 3,445 contigs and 6,657 singletons, were generated from cDNA libraries of the gynophore, roots, leaves and seedlings. A total of 3,187 primer pairs were designed on flanking regions of SSRs, some of which allowed one and two base mismatches. Among the 3,187 markers generated, 2,540 (80%) were trinucleotide repeats, 302 (9%) were dinucleotide repeats, and 345 (11%) were tetranucleotide repeats. Pre-polymorphic analyses of 24 Arachis accessions were performed using 10% polyacrylamide gels. A total of 1,571 EST-SSR markers showing clear polymorphisms were selected for further polymorphic analysis with a Fluoro-fragment Analyzer. The 16 Arachis accessions examined included cultivated peanut varieties as well as diploid species with the A or B genome. Altogether 1,281 (81.5%) of the 1,571 markers were polymorphic among the 16 accessions, and 366 (23.3%) were polymorphic among the 12 cultivated varieties. Diversity analysis was performed and the genotypes of all 16 Arachis accessions showed similarity coefficients ranging from 0.37 to 0.97.
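The similarity coefficients reported above are not tied to a named formula in this abstract. As a minimal sketch, assuming a Jaccard coefficient over scored alleles (the choice of coefficient is an assumption, and the accession names and fragment sizes are hypothetical), pairwise similarity between accessions could be computed as follows:

```python
from itertools import combinations


def jaccard_similarity(alleles_a, alleles_b):
    """Shared alleles divided by alleles present in either accession."""
    union = alleles_a | alleles_b
    if not union:
        return 1.0  # nothing scored in either accession: treat as identical
    return len(alleles_a & alleles_b) / len(union)


# Hypothetical scores: each allele is (marker name, fragment size in bp).
accessions = {
    "cultivar_1": {("m1", 210), ("m2", 150), ("m3", 305)},
    "cultivar_2": {("m1", 210), ("m2", 150), ("m3", 311)},
    "diploid_A": {("m1", 198), ("m2", 162), ("m3", 305)},
}

# Similarity for every unordered pair of accessions.
similarity = {
    (a, b): jaccard_similarity(accessions[a], accessions[b])
    for a, b in combinations(sorted(accessions), 2)
}

for pair, value in similarity.items():
    print(pair, round(value, 2))
```

A matrix of such values is the usual input to a clustering step; other coefficients (for example simple matching or Dice) would give different numbers on the same data.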
Electronic supplementary material
The online version of this article (doi:10.1007/s11032-011-9604-8) contains supplementary material, which is available to authorized users.
Arachis spp.; EST-SSR marker; Polymorphic analysis; Genetic diversity
Peanut (Arachis hypogaea L.) ranks fifth among the world's oil crops and is widely grown in India and neighbouring countries. Because its genome is large and its size is not precisely known, genomic and genetic modification studies of peanut remain scarce compared with model crops such as Arabidopsis, rice, cotton and soybean. Because peanut is well suited to cultivation in semi-arid regions, the study of abiotic stress responsive genes and their regulation in peanut is particularly important. We therefore aimed to identify and annotate abiotic stress responsive candidate genes in peanut ESTs. Expression data for drought stress responsive genes and the corresponding EST sequences were screened from the dot blot experiments reported by Govind et al. (2009), where they are presented as heat maps and supplementary tables, respectively. For screened genes whose ESTs were not listed in those supplementary tables, sequences were retrieved from NCBI. A phylogenetic analysis was performed to find the group of most similar ESTs for each selected gene. Each EST of that group was then searched against the peanut EST collection (178,490 EST sequences) using stand-alone BLAST. Various tools (such as Vec-Screen, Repeat Masker, EST-Trimmer, DNA Baser, WISE2 and I-TASSER) were used for the prediction and annotation of the selected abiotic stress responsive genes. Here we report the predicted contigs, domains and 3D structures for the HSP 17.3 kDa protein, DnaJ protein and Type 2 Metallothionein protein.
Arachis hypogaea; EST; Gene annotation; Stress; Contigs
The genus Arachis is native to a region that includes Central Brazil and neighboring countries. Little is known about the genetic variability of the Brazilian cultivated peanut (Arachis hypogaea, genome AABB) germplasm collection at the DNA level. Understanding the genetic diversity of cultivated and wild species of peanut (Arachis spp.) is essential to develop strategies for the collection, conservation and use of the germplasm in variety development. The identity of the progenitor species of cultivated peanut has also been of great interest. Several species have been suggested as putative AA and BB genome donors to allotetraploid A. hypogaea. Microsatellite or SSR (Simple Sequence Repeat) markers are co-dominant, multiallelic, and highly polymorphic genetic markers, appropriate for genetic diversity studies. Microsatellite markers may also, to some extent, support phylogenetic inferences. Here we report the use of a set of microsatellite markers, including newly developed ones, for phylogenetic inferences and the analysis of genetic variation of accessions of A. hypogaea and its wild relatives.
A total of 67 new microsatellite markers (mainly TTG motif) were developed for Arachis. Only three of these markers, however, were polymorphic in cultivated peanut. These three new markers plus five other markers characterized previously were evaluated for number of alleles per locus and gene diversity using 60 accessions of A. hypogaea. Genetic relationships among these 60 accessions and a sample of 36 wild accessions representative of section Arachis were estimated using allelic variation observed in a selected set of 12 SSR markers. Results showed that the Brazilian peanut germplasm collection has considerable levels of genetic diversity detected by SSR markers. Similarity groups for A. hypogaea accessions were established, which is a useful criterion for selecting parental plants for crop improvement. Microsatellite marker transferability was up to 76% for species of the section Arachis, but only 45% for species from the other eight Arachis sections tested. A new marker (Ah-041) showed 100% transferability and could be used to classify the peanut accessions into AA and non-AA genome carriers.
The level of polymorphism observed among accessions of A. hypogaea analyzed with newly developed microsatellite markers was low, corroborating the accumulated data which show that cultivated peanut presents a relatively reduced variation at the DNA level. A selected panel of SSR markers allowed the classification of A. hypogaea accessions into two major groups. The identification of similarity groups will be useful for the selection of parental plants to be used in breeding programs. Marker transferability is relatively high between accessions of section Arachis. The possibility of using microsatellite markers developed for one species in genetic evaluation of other species greatly reduces the cost of the analysis, since the development of microsatellite markers is still expensive and time consuming. The SSR markers developed in this study could be very useful for genetic analysis of wild species of Arachis, including comparative genome mapping, population genetic structure and phylogenetic inferences among species.
Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination.
We constructed six different cDNA libraries derived from developing peanut seeds at three reproductive stages (R5, R6 and R7) from two cultivated peanut genotypes: 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from the six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembly resulted in 8,689 unique EST sequences, comprising 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria, and the unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. There were differences in overall expression patterns among libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutively expressed sequences. To identify resistance-related genes with significantly differential expression, a relative abundance statistic (R) was used to compare the abundance of each gene's transcripts across the cDNA libraries. Thirty-six and forty-seven unique EST sequences with a threshold of R > 4 from the libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies.
Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common in both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut EST matched to Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% with the sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species.
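The abstract does not give the formula behind the relative abundance statistic R. A minimal sketch, assuming the log-likelihood ratio statistic of Stekel et al. (2000) that is commonly used to compare EST counts across cDNA libraries (this attribution is an assumption, and the library sizes and counts below are hypothetical), could look like this:

```python
import math


def relative_abundance_r(counts, library_sizes):
    """Log-likelihood ratio R for one gene across several cDNA libraries.

    counts[i] is the number of ESTs of the gene in library i and
    library_sizes[i] is the total number of ESTs sequenced from library i.
    Libraries with a zero count contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    # Expected frequency if the gene were equally abundant in all libraries.
    freq = total / sum(library_sizes)
    return sum(
        x * math.log(x / (n * freq))
        for x, n in zip(counts, library_sizes)
        if x > 0
    )


# Hypothetical example: six libraries of 3,000 ESTs each.
sizes = [3000] * 6
uniform = relative_abundance_r([2, 2, 2, 2, 2, 2], sizes)
skewed = relative_abundance_r([12, 0, 0, 0, 0, 0], sizes)
print(round(uniform, 3), round(skewed, 3))
```

Under this assumed definition a gene whose ESTs are spread evenly scores close to zero, while a gene concentrated in one library scores well above the R > 4 threshold mentioned in the text.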
The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. The 21,777 ESTs have been deposited to the NCBI GenBank database with accession numbers ES702769 to ES724546.
The complex, tetraploid genome structure of peanut (Arachis hypogaea) has obstructed advances in genetics and genomics in the species. The aim of this study is to understand the genome structure of Arachis by developing a high-density integrated consensus map. Three recombinant inbred line populations derived from crosses between the A genome diploid species, Arachis duranensis and Arachis stenosperma; the B genome diploid species, Arachis ipaënsis and Arachis magna; and between the AB genome tetraploids, A. hypogaea and an artificial amphidiploid (A. ipaënsis × A. duranensis)4×, were used to construct genetic linkage maps: 10 linkage groups (LGs) of 544 cM with 597 loci for the A genome; 10 LGs of 461 cM with 798 loci for the B genome; and 20 LGs of 1442 cM with 1469 loci for the AB genome. The resultant maps plus 13 published maps were integrated into a consensus map covering 2651 cM with 3693 marker loci which was anchored to 20 consensus LGs corresponding to the A and B genomes. The comparative genomics with genome sequences of Cajanus cajan, Glycine max, Lotus japonicus, and Medicago truncatula revealed that the Arachis genome has segmented synteny relationship to the other legumes. The comparative maps in legumes, integrated tetraploid consensus maps, and genome-specific diploid maps will increase the genetic and genomic understanding of Arachis and should facilitate molecular breeding.
Arachis spp.; comparative genomics; genetic linkage map; integrated consensus map; legume genome
Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea.
More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago.
The first high-density, gene-based linkage map for A. duranensis was generated that can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut, and more broadly, to the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already had application for mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii.
The genus Arachis includes Arachis hypogaea (cultivated peanut) and wild species that are used in peanut breeding or as forage. Molecular markers have been employed in several studies of this genus, but microsatellite markers have been used in only a few investigations. Microsatellites are very informative and are useful for assessing genetic variability, analyzing mating systems and genetic mapping. The objectives of this study were to develop A. hypogaea microsatellite loci and to evaluate the transferability of these markers to other Arachis species.
Thirteen loci were isolated and characterized using 16 accessions of A. hypogaea. The level of variation found in A. hypogaea using microsatellites was higher than with other markers. Cross-transferability of the markers was also high. Sequencing of the fragments amplified using the primer pair Ah11 from 17 wild Arachis species showed that almost all wild species had similar repeated sequence to the one observed in A. hypogaea. Sequence data suggested that there is no correlation between taxonomic relationship of a wild species to A. hypogaea and the number of repeats found in its microsatellite loci.
These results show that microsatellite primer pairs from A. hypogaea have multiple uses. A higher level of variation among A. hypogaea accessions can be detected using microsatellite markers in comparison to other markers, such as RFLP, RAPD and AFLP. The microsatellite primers of A. hypogaea showed a very high rate of transferability to other species of the genus. These primer pairs provide important tools to evaluate the genetic variability and to assess the mating system in Arachis species.
The genus Arachis, which originated in South America, is divided into nine taxonomic sections comprising 80 species. Most of the Arachis species are diploid (2n = 2x = 20), and the tetraploid species (2n = 4x = 40) are found in sections Arachis, Extranervosae and Rhizomatosae. Diploid species have great potential as sources of agronomic traits such as resistance to pests and diseases, drought related traits and different life cycle spans. Understanding the genetic relationships among wild species and between wild and cultivated species will be useful for enhanced utilization of wild species in improving cultivated germplasm. The present study was undertaken to evaluate genetic relationships among species (96 accessions) belonging to seven sections of Arachis by using simple sequence repeat (SSR) markers developed from an Arachis hypogaea genomic library and gene sequences from genera related to Arachis.
The average transferability rate of the 101 SSR markers tested was 81% to section Arachis and 59% to six other sections. Five markers (IPAHM 164, IPAHM 165, IPAHM 407a, IPAHM 409, and IPAHM 659) showed 100% transferability. Cluster analysis of allelic data from a subset of 32 SSR markers on 85 wild and 11 cultivated accessions grouped accessions according to their genome composition, sections and species to which they belong. A total of 109 species specific alleles were detected in different wild species; Arachis pusilla exhibited the largest number of species specific alleles (15). Based on genetic distance analysis, the A-genome accession ICG 8200 (A. duranensis) and the B-genome accession ICG 8206 (A. ipaënsis) were found most closely related to A. hypogaea.
A set of cross species and cross section transferable SSR markers has been identified that will be useful for genetic studies of wild species of Arachis, including comparative genome mapping, germplasm analysis, population genetic structure and phylogenetic inferences among species. The present study provides strong support based on both genomic and genic markers, probably for the first time, on relationships of A. monticola and A. hypogaea as well as on the most probable donor of A and B-genomes of cultivated groundnut.
Cultivated peanut, Arachis hypogaea, is an allotetraploid of recent origin, with an AABB genome. In common with many other polyploids, it seems that a severe genetic bottleneck was imposed at the species origin, via hybridisation of two wild species and spontaneous chromosome duplication. Therefore, the study of the genome of peanut is hampered both by the crop's low genetic diversity and its polyploidy. In contrast to cultivated peanut, most wild Arachis species are diploid with high genetic diversity. The study of diploid Arachis genomes is therefore attractive, both to simplify the construction of genetic and physical maps, and for the isolation and characterization of wild alleles. The most probable wild ancestors of cultivated peanut are A. duranensis and A. ipaënsis with genome types AA and BB respectively.
We constructed and characterized two large-insert libraries in Bacterial Artificial Chromosome (BAC) vector, one for each of the diploid ancestral species. The libraries (AA and BB) are respectively c. 7.4 and c. 5.3 genome equivalents with low organelle contamination and average insert sizes of 110 and 100 kb. Both libraries were used for the isolation of clones containing genetically mapped legume anchor markers (single copy genes), and resistance gene analogues.
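The genome-equivalent figures above can be read as the chance that a given single-copy locus is present in at least one clone. A minimal sketch, assuming clones sample the genome uniformly at random (a Poisson approximation that is an assumption here and ignores cloning bias), gives:

```python
import math


def probability_of_recovery(genome_equivalents):
    """Chance that a given locus is covered by at least one clone."""
    return 1.0 - math.exp(-genome_equivalents)


# Coverage values taken from the text; the dictionary keys are just labels.
coverage = {"AA library": 7.4, "BB library": 5.3}
recovery = {name: probability_of_recovery(c) for name, c in coverage.items()}

for name, p in recovery.items():
    print(f"{name}: {p:.4f}")
```

Under this approximation both libraries would be expected to contain any particular single-copy sequence with a probability above 99%, which is consistent with their use for isolating clones carrying anchor markers.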
These diploid BAC libraries are important tools for the isolation of wild alleles conferring resistances to biotic stresses, comparisons of orthologous regions of the AA and BB genomes with each other and with other legume species, and will facilitate the construction of a physical map.
Worldwide, diseases are important reducers of peanut (Arachis hypogaea) yield. Sources of resistance against many diseases are available in cultivated peanut genotypes, although often not in farmer preferred varieties. Wild species generally harbor greater levels of resistance and even apparent immunity, although the linkage of agronomically un-adapted wild alleles with wild disease resistance genes is inevitable. Marker-assisted selection has the potential to facilitate the combination of both cultivated and wild resistance loci with agronomically adapted alleles. However, in peanut there is an almost complete lack of knowledge of the regions of the Arachis genome that control disease resistance.
In this work we identified candidate genome regions that control disease resistance. For this we placed candidate disease resistance genes and QTLs against late leaf spot disease on the genetic map of the A-genome of Arachis, which is based on microsatellite markers and legume anchor markers. These marker types are transferable within the genus Arachis and to other legumes respectively, enabling this map to be aligned to other Arachis maps and to maps of other legume crops including those with sequenced genomes. In total, 34 sequence-confirmed candidate disease resistance genes and five QTLs were mapped.
Candidate genes and QTLs were distributed on all linkage groups except for the smallest, but the distribution was not even. Groupings of candidate genes and QTLs for late leaf spot resistance were apparent on the upper region of linkage group 4 and the lower region of linkage group 2, indicating that these regions are likely to control disease resistance.
Lack of sufficient molecular markers hinders current genetic research in peanuts (Arachis hypogaea L.). It is necessary to develop more molecular markers for potential use in peanut genetic research. With the development of peanut EST projects, a vast amount of available EST sequence data has been generated. These data offered an opportunity to identify SSR in ESTs by data mining.
In this study, we investigated 24,238 ESTs for the identification and development of SSR markers. In total, 881 SSRs were identified from 780 SSR-containing unique ESTs. On an average, one SSR was found per 7.3 kb of EST sequence with tri-nucleotide motifs (63.9%) being the most abundant followed by di- (32.7%), tetra- (1.7%), hexa- (1.0%) and penta-nucleotide (0.7%) repeat types. The top six motifs included AG/TC (27.7%), AAG/TTC (17.4%), AAT/TTA (11.9%), ACC/TGG (7.72%), ACT/TGA (7.26%) and AT/TA (6.3%). Based on the 780 SSR-containing ESTs, a total of 290 primer pairs were successfully designed and used for validation of the amplification and assessment of the polymorphism among 22 genotypes of cultivated peanuts and 16 accessions of wild species. The results showed that 251 primer pairs yielded amplification products, of which 26 and 221 primer pairs exhibited polymorphism among the cultivated and wild species examined, respectively. Two to four alleles were found in cultivated peanuts, while 3–8 alleles presented in wild species. The apparent broad polymorphism was further confirmed by cloning and sequencing of amplified alleles. Sequence analysis of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the microsatellite regions. In addition, a few single base mutations were observed in the microsatellite flanking regions.
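The data-mining step described above, scanning ESTs for perfect di- to hexa-nucleotide repeats, can be illustrated with a short scanner. This is a sketch only: the minimum repeat counts in `MIN_REPEATS` and the example sequence are assumptions, since the thresholds used in the study are not stated here.

```python
# Assumed minimum number of repeats per motif length (not from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (AA, ATAT)."""
    for size in range(1, len(motif)):
        if len(motif) % size == 0 and motif[:size] * (len(motif) // size) == motif:
            return False
    return True


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        i = 0
        while i + size <= len(seq):
            motif = seq[i:i + size]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count how many times the motif repeats in tandem from i.
                repeats = 1
                while seq.startswith(motif, i + repeats * size):
                    repeats += 1
                if repeats >= min_rep:
                    hits.append((i, motif, repeats))
                    i += repeats * size  # skip past the repeat just recorded
                    continue
            i += 1
    return sorted(hits)


# Hypothetical EST fragment with one AG and one AAG microsatellite.
est = "GCTA" + "AG" * 8 + "CCGT" + "AAG" * 6 + "TTGCA"
ssrs = find_ssrs(est)
for start, motif, repeats in ssrs:
    print(start, motif, repeats)
```

Requiring a primitive motif keeps a run such as AGAGAGAG from being reported a second time as a tetranucleotide repeat, and primers would then be designed on the flanking sequence of each hit.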
This study gives an insight into the frequency, type and distribution of peanut EST-SSRs and demonstrates successful development of EST-SSR markers in cultivated peanut. These EST-SSR markers could enrich the current resource of molecular markers for the peanut community and would be useful for qualitative and quantitative trait mapping, marker-assisted selection, and genetic diversity studies in cultivated peanut as well as related Arachis species. All of the 251 working primer pairs with names, motifs, repeat types, primer sequences, and alleles tested in cultivated and wild species are listed in Additional File 1.
The peanut (Arachis hypogaea) is an important oil crop. Breeding for high oil content is becoming increasingly important. Wild Arachis species have been reported to harbor genes for many valuable traits that may enable the improvement of cultivated Arachis hypogaea, such as resistance to pests and disease. However, only limited information is available on variation in oil content. In the present study, a collection of 72 wild Arachis accessions representing 19 species and 3 cultivated peanut accessions were genotyped using 136 genome-wide SSR markers and phenotyped for oil content over three growing seasons. The wild Arachis accessions showed abundant diversity across the 19 species. A. duranensis exhibited the highest diversity, with a Shannon-Weaver diversity index of 0.35. A total of 129 unique alleles were detected in the species studied. A. rigonii exhibited the largest number of unique alleles (75), indicating that this species is highly differentiated. AMOVA and genetic distance analyses confirmed the genetic differentiation between the wild Arachis species. The majority of SSR alleles were detected exclusively in the wild species and not in A. hypogaea, indicating that directional selection or the hitchhiking effect has played an important role in the domestication of the cultivated peanut. The 75 accessions were grouped into three clusters based on population structure and phylogenetic analysis, consistent with their taxonomic sections, species and genome types. A. villosa and A. batizocoi were grouped with A. hypogaea, suggesting a close relationship between these two diploid wild species and the cultivated peanut. Considerable phenotypic variation in oil content was observed among different sections and species. Nine alleles were identified as associated with oil content based on association analysis; of these, three alleles were associated with higher oil content but were absent in the cultivated peanut.
The results demonstrated that there is great potential to increase the oil content in A. hypogaea by using the wild Arachis germplasm.
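For readers unfamiliar with the Shannon-Weaver index cited for A. duranensis, it is H = -Σ p_i ln p_i, where p_i is the frequency of allele i. A minimal sketch follows; how the study averaged the index over loci and alleles is not stated, so the single-locus form and the allele calls below are assumptions.

```python
import math
from collections import Counter


def shannon_index(alleles):
    """Shannon-Weaver index H = -sum(p * ln p) over allele frequencies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical allele calls (fragment sizes in bp) at one SSR locus.
monomorphic = shannon_index([150, 150, 150, 150])
two_alleles = shannon_index([150, 150, 156, 156])
print(round(monomorphic, 3), round(two_alleles, 3))
```

A locus with a single allele scores zero, and the index rises as alleles become more numerous and more evenly distributed.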
Arachis hypogaea (peanut) is an important crop worldwide, being mostly used for edible oil production, direct consumption and animal feed. Cultivated peanut is an allotetraploid species with two different genome components, A and B. Genetic linkage maps can greatly assist molecular breeding and genomic studies. However, the development of linkage maps for A. hypogaea is difficult because it has very low levels of polymorphism. This can be overcome by the utilization of wild species of Arachis, which present the A- and B-genomes in the diploid state, and show high levels of genetic variability.
In this work, we constructed a B-genome linkage map, which will complement the previously published map for the A-genome of Arachis, and produced an entire framework for the tetraploid genome. This map is based on an F2 population of 93 individuals obtained from the cross between the diploid A. ipaënsis (K30076) and the closely related A. magna (K30097), the former species being the most probable B genome donor to cultivated peanut. In spite of being classified as different species, the parents showed high crossability and relatively low polymorphism (22.3%), compared to other interspecific crosses. The map has 10 linkage groups, with 149 loci spanning a total map distance of 1,294 cM. The microsatellite markers utilized, developed for other Arachis species, showed high transferability (81.7%). Segregation distortion was 21.5%. This B-genome map was compared to the A-genome map using 51 common markers, revealing a high degree of synteny between both genomes.
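Segregation distortion in an F2 population is usually assessed marker by marker against the expected Mendelian ratio. The abstract does not name the test used, so the following is a sketch under stated assumptions: a chi-square goodness-of-fit test against 1:2:1 for a codominant marker at a 5% significance level, with hypothetical genotype counts for 93 individuals.

```python
# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
CHI2_CRITICAL_DF2_P05 = 5.991


def f2_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for a codominant marker against 1:2:1."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_distorted(n_aa, n_ab, n_bb):
    """True if the marker deviates from 1:2:1 at the assumed 5% level."""
    return f2_chi_square(n_aa, n_ab, n_bb) > CHI2_CRITICAL_DF2_P05


# Hypothetical counts for two markers scored on 93 F2 plants.
balanced = is_distorted(23, 47, 23)
skewed = is_distorted(38, 45, 10)
print(balanced, skewed)
```

The proportion of markers flagged in this way would correspond to a figure such as the 21.5% distortion quoted above, although the exact criterion applied in the study may differ.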
The development of genetic maps for Arachis diploid wild species with A- and B-genomes effectively provides a genetic map for the tetraploid cultivated peanut in two separate diploid components and is a significant advance towards the construction of a transferable reference map for Arachis. Additionally, we were able to identify affinities of some Arachis linkage groups with Medicago truncatula, which will allow the transfer of information from the nearly-complete genome sequences of this model legume to the peanut crop.
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at positions 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of the lipase superfamily. Expression analysis based on the number of ESTs showed that ApKK42-BP was an abundant gene during the diapause stage, suggesting it may also be involved in pupa-diapause termination.
Chinese oak silkworm; Antheraea pernyi; cDNA library; Expressed sequence tag; KK-42-binding protein; diapause termination
Chickpea (Cicer arietinum L.) ranks third in food legume crop production in the world. However, drought poses a serious threat to chickpea production, and development of drought-resistant varieties is a necessity. Unfortunately, cultivated chickpea has a high morphological but narrow genetic diversity, and understanding the genetic processes of this plant is hindered by the fact that the chickpea genome has not yet been sequenced and its EST resources are limited. In this study, two chickpea varieties having contrasting levels of drought tolerance were analyzed for differences in transcript profiles during drought stress treatment by withdrawal of irrigation at different time points. Transcript profiles of ESTs derived from subtractive cDNA libraries constructed with RNA from whole seedlings of both varieties were analyzed at different stages of stress treatment.
A series of comparisons of transcript abundance between the two varieties at different time points was made. The 319 unique ESTs available from the different libraries were categorized into eleven clusters according to their comparative expression profiles. Expression analysis revealed that 70% of the ESTs were more than two-fold more abundant in the tolerant cultivar at any point of the stress treatment, of which 33% were more than two-fold more abundant even under the control condition. Fifty-three ESTs that displayed very high relative expression in the tolerant variety were selected for further analysis. These ESTs were clustered into four groups according to their expression patterns.
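The two-fold criterion described above can be illustrated with a minimal Python sketch. It assumes that "at any point" means at least one time point, that abundances are positive relative values, and that the threshold is a strict ratio of 2.0; the function names and the example values are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# One (tolerant, sensitive) abundance pair per time point; values are illustrative.
Profile = List[Tuple[float, float]]


def is_enriched(tolerant: float, sensitive: float, threshold: float = 2.0) -> bool:
    """True if the tolerant/sensitive abundance ratio exceeds the threshold."""
    if tolerant < 0 or sensitive <= 0:
        raise ValueError("abundances must be non-negative, and sensitive must be > 0")
    return tolerant / sensitive > threshold


def enriched_at_any_timepoint(profile: Profile, threshold: float = 2.0) -> bool:
    """True if the EST passes the threshold at one or more time points."""
    return any(is_enriched(t, s, threshold) for t, s in profile)


def fraction_enriched(profiles: Dict[str, Profile], threshold: float = 2.0) -> float:
    """Fraction of ESTs that pass the threshold at one or more time points."""
    if not profiles:
        raise ValueError("no profiles given")
    hits = sum(enriched_at_any_timepoint(p, threshold) for p in profiles.values())
    return hits / len(profiles)


# Hypothetical data: three ESTs measured at a control and one stress time point.
example_profiles: Dict[str, Profile] = {
    "EST_1": [(1.0, 1.0), (5.0, 2.0)],  # enriched only under stress
    "EST_2": [(1.0, 1.0), (1.5, 1.0)],  # never enriched
    "EST_3": [(4.0, 1.0), (4.0, 1.0)],  # enriched even under control
}

if __name__ == "__main__":
    print(f"{fraction_enriched(example_profiles):.2f}")
```

Whether the comparison should be strict (greater than two-fold) or inclusive is a design choice; the sketch follows the "more than two-fold" wording.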
Annotation of the highly expressed ESTs in the tolerant cultivar predicted that most of them encoded proteins involved in cellular organization, protein metabolism, signal transduction, and transcription. Results from this study may help in targeting useful genes for improving drought tolerance in chickpea.
Epimedium sagittatum (Sieb. et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as a genuine medicinal material. Certain Epimedium species are endangered due to commercial overexploitation, while sustainable application, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium are little studied owing to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) for E. sagittatum and the simple sequence repeats (SSRs) identified in them.
cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs are identified from the Epimedium EST dataset. Trinucleotide SSRs are the dominant repeat type (55.2%), followed by dinucleotide (30.4%), tetranucleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSRs. The dominant repeat motif is AAG/CTT (23.6%), followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized to test transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species, and sixteen of those show high genetic diversity, with an observed heterozygosity (Ho) of 0.35, an expected heterozygosity (He) of 0.65, and a high number of alleles per locus (11.9).
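As an illustration of how perfect SSRs of the kind counted above can be located, the following Python sketch scans a sequence with regular expressions and groups each motif with its rotations and reverse complement (so that AAG and CTT fall into one class, as in the AAG/CTT notation). The minimum repeat numbers are assumptions chosen for the example; the abstract does not state the search criteria or the software used.

```python
import re
from collections import Counter
from typing import List, Tuple

# Minimum number of repeat units per motif length (illustrative assumption).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_motif(motif: str) -> str:
    """Alphabetically smallest rotation of the motif or of its reverse complement."""
    candidates = []
    for m in (motif, reverse_complement(motif)):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for each perfect SSR, sorted by start."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AAA, ATAT).
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


def motif_class_counts(seqs: List[str]) -> Counter:
    """Count SSRs per canonical motif class across a set of sequences."""
    return Counter(canonical_motif(m) for s in seqs for _, m, _ in find_ssrs(s))


# Hypothetical sequence containing (AAG)6 and (CT)7.
example_seq = "TGCC" + "AAG" * 6 + "CCGA" + "CT" * 7 + "GGA"

if __name__ == "__main__":
    print(find_ssrs(example_seq))
    print(motif_class_counts([example_seq]))
```

Mononucleotide and compound repeats are deliberately left out of this sketch.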
A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially the flavonoid pathway, in Epimedium. A total of 2,810 EST-SSRs are identified from the EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and these EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species.
The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has a limited genomic resource.
With the development of next-generation sequencing technologies such as 454 pyrosequencing and Illumina sequencing by synthesis, transcriptomic data for peanut are rapidly accumulating in both public databases and the private sector. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly, which contains 32,619 contigs. We provide EC, KEGG and GO functional annotations for these contigs and detect SSRs, SNPs and other genetic polymorphisms in each contig. Built on both open-source and in-house tools, PeanutDB presents integrated web interfaces that allow users to easily search, filter, navigate and visualize the whole transcript assembly, its annotations, and the detected polymorphisms and simple sequence repeats. For each contig, the sequence alignment is presented both as a bird’s-eye view and at nucleotide-level resolution, with color-highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors.
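The read and base counts quoted above are enough to derive the mean read length contributed by each platform. The short Python sketch below does only that arithmetic; the dictionary layout and function name are illustrative and are not part of PeanutDB.

```python
# (number of reads, number of bases) per platform, as quoted in the text.
PLATFORMS = {
    "Sanger": (187_636, 103_685_419),
    "Roche 454": (1_165_168, 333_862_593),
    "Illumina": (57_135_995, 4_073_740_115),
}


def mean_read_length(reads: int, bases: int) -> float:
    """Average number of bases per read."""
    if reads <= 0:
        raise ValueError("read count must be positive")
    return bases / reads


total_reads = sum(reads for reads, _ in PLATFORMS.values())
total_bases = sum(bases for _, bases in PLATFORMS.values())

if __name__ == "__main__":
    for name, (reads, bases) in PLATFORMS.items():
        print(f"{name}: {mean_read_length(reads, bases):.1f} bases per read")
    print(f"total: {total_reads} reads, {total_bases} bases")
```

The means come out at roughly 553, 287 and 71 bases for Sanger, 454 and Illumina reads respectively, so the Illumina data supply most of the bases while the Sanger reads are the longest.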
As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the peanut research community with an easy-to-use web portal that will facilitate genomics research and molecular breeding in this less-studied crop.
Peanut; Arachis hypogaea; Transcriptome sequencing; Transcriptome assembly; Database; PeanutDB; SNP; SSR; Functional annotation
Analysis of transcriptomes is of great importance in genomic studies. Asian seabass is an important fish species. A number of genomic tools have been developed for it, but large expressed sequence tag (EST) datasets are lacking. We sequenced ESTs from nine normalized cDNA libraries and obtained 11 431 high-quality ESTs. We retrieved 8524 ESTs from the dbEST database and analyzed all 19 975 ESTs using bioinformatics tools. After clustering, we obtained 8837 unique sequences (2838 contigs and 5999 singletons). The average contig length was 574 bp. Annotation of these unique sequences revealed that 48.9% of them showed significant homology to RNA sequences in GenBank. Functional classification of the unique ESTs identified a broad range of genes involved in different functions. We identified 6114 putative single-nucleotide polymorphisms and 634 microsatellites in the ESTs. We discovered different temporal and spatial expression patterns of some immune-related genes in the Asian seabass after challenge with the pathogen Vibrio harveyi. The unique EST sequences are being used to develop a cDNA microarray for examining global gene expression and will also facilitate future whole-genome sequence assembly and annotation of Asian seabass, as well as comparative genomics.
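Putative SNPs are typically called from the ESTs aligned within a contig. The Python sketch below shows one simple way this could be done, under stated assumptions: the reads are already aligned and padded to equal length with "-", and a site is reported only when the second most frequent base is supported by at least two reads. Both the input format and the threshold are hypothetical; the abstract does not describe the method actually used.

```python
from collections import Counter
from typing import List, Tuple


def putative_snps(aligned_reads: List[str],
                  min_minor_count: int = 2) -> List[Tuple[int, str, str]]:
    """Return (position, major_base, minor_base) for each candidate SNP column."""
    if len({len(r) for r in aligned_reads}) > 1:
        raise ValueError("aligned reads must all have the same length")
    snps = []
    for pos, column in enumerate(zip(*aligned_reads)):
        # Ignore gaps and ambiguous bases when counting alleles.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue
        (major, _), (minor, minor_n) = counts.most_common(2)
        # Require redundancy of the minor allele to filter likely sequencing errors.
        if minor_n >= min_minor_count:
            snps.append((pos, major, minor))
    return snps


# Hypothetical alignment of six ESTs from one contig.
example_alignment = [
    "ACGTAC",
    "ACGTAC",
    "ACTTAC",
    "ACTTAC",
    "ACGTAC",
    "ACG-AT",  # the single T at the last position is treated as an error
]

if __name__ == "__main__":
    print(putative_snps(example_alignment))
```

Requiring the minor allele in two or more reads trades sensitivity for fewer false positives, which matters because single-pass EST reads carry sequencing errors.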
Asian seabass; EST; function; expression
Peanut (Arachis hypogaea L.) is a crop of economic and social importance, mainly in tropical areas and developing countries. Its molecular breeding has been hindered by a shortage of polymorphic genetic markers due to a very narrow genetic base. Microsatellites (SSRs) are markers of choice in peanut because they are co-dominant, highly transferable between species and easily applicable in the allotetraploid genome. In spite of substantial effort over the last few years by a number of research groups, the number of SSRs that are polymorphic for A. hypogaea is still limiting for routine application, creating demand for the discovery of more markers that are polymorphic within cultivated germplasm.
A plasmid genomic library enriched for TC/AG repeats was constructed and 1401 clones sequenced. From the sequences obtained 146 primer pairs flanking mostly TC microsatellites were developed. The average number of repeat motifs amplified was 23. These 146 markers were characterized on 22 genotypes of cultivated peanut. In total 78 of the markers were polymorphic within cultivated germplasm. Most of those 78 markers were highly informative with an average of 5.4 alleles per locus being amplified. Average gene diversity index (GD) was 0.6, and 66 markers showed a GD of more than 0.5. Genetic relationship analysis was performed and corroborated the current taxonomical classification of A. hypogaea subspecies and varieties.
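The abstract does not say how the gene diversity index was computed. A common definition is GD = 1 - sum(p_i^2), where p_i is the frequency of allele i at a locus, and the Python sketch below assumes that definition. The allele labels are hypothetical fragment sizes, not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def gene_diversity(alleles: Sequence[Hashable]) -> float:
    """Gene diversity at one locus: 1 minus the sum of squared allele frequencies."""
    if not alleles:
        raise ValueError("at least one allele observation is required")
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Hypothetical allele calls (fragment sizes in bp) for one marker across genotypes.
example_locus = [210, 210, 214, 214, 218, 218, 222, 222, 226, 226]

if __name__ == "__main__":
    print(f"alleles: {len(set(example_locus))}")
    print(f"GD: {gene_diversity(example_locus):.2f}")
```

With five equally frequent alleles the index is 0.8; a locus with a single allele gives 0, which is why a cut-off such as GD > 0.5 singles out the more informative markers.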
The microsatellite markers described here are a useful resource for genetics and genomics in Arachis. In particular, the 66 markers that are highly polymorphic in cultivated peanut are a significant step towards routine genetic mapping and marker-assisted selection for the crop.
Sesame (Sesamum indicum) is one of the most important oilseed crops, with a high oil content and rich nutritional value. However, genetic improvement efforts in sesame have not benefited from molecular biology technologies because DNA and RNA sequence resources are scarce. In this study, we carried out large-scale expressed sequence tag (EST) sequencing from developing sesame seeds and further analyzed genes related to seed storage products.
A normalized, full-length-enriched cDNA library from 5- to 30-day-old immature seeds was constructed and randomly sequenced, generating 41,248 expressed sequence tags (ESTs), which formed 4,713 contigs and 27,708 singletons, with 44.9% of the uniESTs being putative full-length open reading frames. Approximately 26,091 of these uniESTs have significant matches in the Nr database of GenBank, and 21,628 of them were assigned to one or more Gene Ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified, including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL) and WRINKLED1 (WRI1); the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in the biosynthesis of the sesame lignans sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, accounting for one third of the total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs).
This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covers a wide variety of genes in seeds, in particular candidate genes involved in the biosynthesis of sesame oils and lignans. These full-length-enriched EST sequences will contribute to comparative genomic studies on sesame and other oilseed plants and serve as an abundant information platform for functional marker development and functional gene study.
Limited DNA sequence and DNA marker resources have been developed for Iris (Iridaceae), a monocot genus of 200–300 species in the Asparagales, several of which are horticulturally important. We mined an I. brevicaulis-I. fulva EST database for simple sequence repeats (SSRs) and developed ortholog-specific EST-SSR markers for genetic mapping and other genotyping applications in Iris. Here, we describe the abundance and other characteristics of SSRs identified in the transcript assembly (EST database) and the cross-species utility and polymorphisms of I. brevicaulis-I. fulva EST-SSR markers among wild collected ecotypes and horticulturally important cultivars.
Collectively, 6,530 ESTs were produced from normalized leaf and root cDNA libraries of I. brevicaulis (IB72) and I. fulva (IF174), and assembled into 4,917 unigenes (1,066 contigs and 3,851 singletons). We identified 1,447 SSRs in 1,162 unigenes and developed 526 EST-SSR markers, each tracing a different unigene. Three-fourths of the EST-SSR markers (399/526) amplified alleles from IB72 and IF174 and 84% (335/399) were polymorphic between IB25 and IF174, the parents of I. brevicaulis × I. fulva mapping populations. Forty EST-SSR markers were screened for polymorphisms among 39 ecotypes or cultivars of seven species – 100% amplified alleles from wild collected ecotypes of Louisiana Iris (I. brevicaulis, I. fulva, I. nelsonii, and I. hexagona), whereas 42–52% amplified alleles from cultivars of three horticulturally important species (I. pseudacorus, I. germanica, and I. sibirica). Ecotypes and cultivars were genetically diverse – the number of alleles/locus ranged from two to 18 and mean heterozygosity was 0.76.
Nearly 400 ortholog-specific EST-SSR markers were developed for comparative genetic mapping and other genotyping applications in Iris, were highly polymorphic among ecotypes and cultivars, and have broad utility for genotyping applications within the genus.
TomatEST is a secondary database integrating expressed sequence tag (EST)/cDNA sequence information from different libraries of multiple tomato species. Redundant EST collections from each species are organized into clusters (gene indices). A cluster consists of one or multiple contigs. Multiple contigs in a cluster represent alternatively transcribed forms of a gene. The set of stand-alone EST sequences (singletons) and contigs, representing all the computationally defined ‘Transcript Indices’, is annotated according to similarity to protein and RNA family databases. Sequence function descriptions are integrated with Gene Ontology terms and Enzyme Commission identifiers for a standard classification of gene products and for the mapping of the expressed sequences onto metabolic pathways. Information on the origin of the ESTs, on their structural features, on clusters and contigs, as well as on functional annotations, is accessible via a user-friendly web interface. Specific facilities in the database allow Transcript Indices from a query to be automatically classified into enzyme classes and metabolic pathways. The ‘on the fly’ mapping onto the metabolic maps is integrated in the analytical tools. The TomatEST database website is freely available at .
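The cluster, contig and singleton hierarchy described above can be pictured with a small data model. The Python sketch below is only an illustration of those relationships; the class names, fields and identifiers are hypothetical and do not reflect the actual TomatEST schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Contig:
    """An assembled transcript built from overlapping ESTs."""
    contig_id: str
    est_ids: List[str]


@dataclass
class Cluster:
    """A gene index: one or more contigs derived from the same gene."""
    cluster_id: str
    contigs: List[Contig] = field(default_factory=list)

    def has_alternative_transcripts(self) -> bool:
        # Several contigs in one cluster are read as alternative transcript forms.
        return len(self.contigs) > 1


def transcript_indices(clusters: List[Cluster], singletons: List[str]) -> List[str]:
    """All transcript indices: every contig plus every stand-alone EST."""
    contig_ids = [c.contig_id for cluster in clusters for c in cluster.contigs]
    return contig_ids + list(singletons)


# Hypothetical example: one gene with two transcript forms, one with a single form.
example_clusters = [
    Cluster("CL1", [Contig("CL1_ctg1", ["EST_a", "EST_b"]),
                    Contig("CL1_ctg2", ["EST_c", "EST_d"])]),
    Cluster("CL2", [Contig("CL2_ctg1", ["EST_e", "EST_f", "EST_g"])]),
]
example_singletons = ["EST_h", "EST_i"]

if __name__ == "__main__":
    print(transcript_indices(example_clusters, example_singletons))
```

Keeping singletons outside the cluster objects mirrors the text, which treats them as stand-alone sequences alongside the contigs.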
Cultivated strawberry (Fragaria × ananassa) represents one of the most valued fruit crops in the United States. Despite its economic importance, the octoploid genome presents a formidable barrier to efficient study of genome structure and of the molecular mechanisms that underlie agriculturally relevant traits. Many potentially fruitful research avenues, especially large-scale gene expression surveys and development of molecular genetic markers, have been limited by a lack of sequence information in public databases. As a first step to remedy this deficiency, a cDNA library has been developed from salicylate-treated, whole-plant tissues, and over 1800 expressed sequence tags (ESTs) have been sequenced and analyzed.
A putative unigene set of 1304 sequences – 133 contigs and 1171 singlets – has been developed, and the transcripts have been functionally annotated. Homology searches indicate that 89.5% of sequences share significant similarity to known/putative proteins or Rosaceae ESTs. The ESTs have been functionally characterized and genes relevant to specific physiological processes of economic importance have been identified. A set of tools useful for SSR development and mapping is presented.
Sequences derived from this effort may be used to speed gene discovery efforts in Fragaria and the Rosaceae in general and also open avenues of comparative mapping. This report represents a first step in expanding molecular-genetic analyses in strawberry and demonstrates how computational tools can be used to optimally mine a large body of useful information from a relatively small data set.
We generated the PEDE (Pig EST Data Explorer; http://pede.dna.affrc.go.jp/) database using sequences assembled from porcine 5′ ESTs from oligo-capped full-length cDNA libraries. Thus far we have performed EST analysis of various organs (thymus, spleen, uterus, lung, liver, ovary and peripheral blood mononuclear cells) and assembled 68 076 high-quality sequences into 5546 contigs and 28 461 singlets. PEDE provides a search interface for getting results of homology searches and enables users to obtain information on sequence data and cDNA clones of interest. Single-nucleotide polymorphisms detected through comparison of the EST sequences are classified by origin (western and oriental breeds) and are searchable in the database. This database system can accelerate analyses of livestock traits and yields information that can lead to new applications in pigs as model systems for medical research.