Cassava (Manihot esculenta Crantz), a starchy root crop grown in tropical and subtropical climates, is the sixth most important crop in the world after wheat, rice, maize, potato and barley. The repertoire of simple sequence repeat (SSR) markers for cassava is limited, and a larger number of polymorphic SSRs is needed for germplasm characterization and breeding applications.
A total of 846 putative microsatellites were identified in silico from a set of 8,577 cassava unigenes, with an average density of one SSR every 7 kb. One hundred and ninety-two candidate SSRs were screened for polymorphism among a panel of cassava cultivars from Africa, Latin America and Asia, four wild Manihot species as well as two other important taxa in the Euphorbiaceae, leafy spurge (Euphorbia esula) and castor bean (Ricinus communis). Of 168 markers with clean amplification products, 124 (73.8%) displayed polymorphism based on high resolution agarose gels. Of 85 EST-SSR markers screened, 80 (94.1%) amplified alleles from one or more wild species (M. epruinosa, M. glaziovii, M. brachyandra, M. tripartita) whereas 13 (15.3%) amplified alleles from castor bean and 9 (10.6%) amplified alleles from leafy spurge; hence nearly all markers were transferable to wild relatives of M. esculenta while only a fraction was transferable to the more distantly related taxa. In a subset of 20 EST-SSRs assessed by fluorescence-based genotyping the number of alleles per locus ranged from 2 to 10 with an average of 4.55 per locus. These markers had a polymorphism information content (PIC) from 0.19 to 0.75 with an average value of 0.55 and showed genetic relationships consistent with existing information on these genotypes.
A set of 124 new, unique polymorphic EST-SSRs was developed and characterized which extends the repertoire of SSR markers for cultivated cassava and its wild relatives. The markers show high PIC values and therefore will be useful for cultivar identification, taxonomic studies, and genetic mapping. The study further shows that mining ESTs is a highly efficient strategy for polymorphism detection within the cultivated cassava gene pool.
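The PIC values reported above summarize how informative each marker is. The abstract does not state which formula was used; the sketch below assumes the widely used definition of Botstein et al. (1980), PIC = 1 - Σp_i² - ΣΣ 2p_i²p_j² (the double sum taken over allele pairs i < j), where p_i is the frequency of allele i. The function name `pic` and the input layout (a flat list of observed alleles at one locus) are illustrative choices, not taken from the study.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphism information content for one locus.

    `alleles` is a flat list of observed allele calls (e.g. fragment sizes).
    Assumes the Botstein et al. (1980) definition; the source abstract does
    not say which PIC formula its authors applied.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]

    # Sum of squared allele frequencies (expected homozygosity).
    sum_sq = sum(p * p for p in freqs)

    # Correction term over all unordered pairs of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))

    return 1.0 - sum_sq - cross


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) scored at a single EST-SSR locus.
    calls = [152, 152, 158, 158, 164, 164, 170, 152]
    print(f"alleles: {len(set(calls))}, PIC: {pic(calls):.3f}")
```

A locus with a single allele gives a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed, which is why the locus-level range (0.19 to 0.75) tracks the allele counts (2 to 10).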
MicroRNAs (miRNAs) are endogenously encoded small RNAs that post-transcriptionally regulate gene expression and play essential roles in numerous developmental and physiological processes. Currently, little information on the transcriptome and tissue-specific expression of miRNAs is available for castor bean (Ricinus communis L.), a model non-edible oilseed crop and one of the most important cultivated worldwide. Recent advances in sequencing technologies have allowed the identification of conserved and novel miRNAs in many plant species. Here, we used high-throughput sequencing technologies to identify and characterize the miRNAs in castor bean.
Five small RNA libraries were constructed for deep sequencing from root tips, leaves, developing seeds (at the initial stage, seed1; and at the fast oil accumulation stage, seed2) and endosperms in castor bean. High-throughput sequencing generated a large number of small RNA sequence reads. In total, 86 conserved miRNAs were identified, including 63 known and 23 newly identified. Sixteen length variants (isoforms) were found among the conserved miRNAs of castor bean. MiRNAs displayed diverse organ-specific expression levels among the five libraries. By combining criteria for miRNA annotation with an RT-PCR approach, 72 novel miRNAs and their potential precursors were annotated and 20 newly identified miRNAs were validated. In addition, new target candidates were proposed for the miRNAs newly identified in this study.
The current study presents the first high-throughput small RNA sequencing study performed in castor bean to identify its miRNA population. It characterizes and increases the number of miRNAs and their isoforms identified in castor bean. The miRNA expression analysis provides a foundation for understanding castor bean miRNA organ-specific expression patterns. The present study offers an expanded picture of miRNAs for castor bean and other members in the family Euphorbiaceae.
Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes by extracting selected reads from the available whole genome shotgun reads. With the chloroplast genome as a reference, we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers, and phylogenetic analysis identified two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade, and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.
The AP2/ERF transcription factor family, one of the largest gene families in plants, plays a crucial role in the regulation of growth and development, metabolism, and responses to biotic and abiotic stresses. Castor bean (Ricinus communis L., Euphorbiaceae) is one of the most important non-edible oilseed crops and its seed oil is broadly used for industrial applications. The available genome provides an opportunity to identify and characterize the AP2/ERF transcription factors in castor bean on a genome-wide scale, which might provide insights into the molecular basis of the AP2/ERF family in castor bean.
A total of 114 AP2/ERF transcription factors were identified based on the castor bean genome. The 114 identified AP2/ERF transcription factors were characterized according to the number of AP2/ERF domains, the conserved amino acid residues within the AP2/ERF domain, the conserved motifs, gene structure, and phylogenetic analysis. Global expression profiles among different tissues, obtained by high-throughput sequencing of digital gene expression profiles (DGEs), displayed diverse expression patterns that may provide basic information for understanding the function of the AP2/ERF gene family in castor bean.
The current study is the first report on the identification and characterization of AP2/ERF transcription factors based on the genome of castor bean in the family Euphorbiaceae. Results obtained from this study provide valuable information for understanding the molecular basis of the AP2/ERF family in castor bean.
Castor bean and Jatropha contain seed oil of industrial importance and share taxonomical and biochemical similarities, which can be exploited to identify SSRs in the whole genome sequence of castor bean and apply them in Jatropha curcas. Whole genome analysis of castor bean identified 580,986 SSRs with a frequency of 1 per 680 bp. Genomic distribution of SSRs revealed that 27% were present in non-genic regions whereas 73% were present in putative genic regions, with 26% in 5′UTRs, 25% in introns, 16% in 3′UTRs and 6% in exons. Dinucleotide repeats were more frequent in introns, 5′UTRs and 3′UTRs whereas trinucleotide repeats were predominant in exons. The transferability of 302 randomly selected SSRs from castor bean to 49 J. curcas genotypes and 8 Jatropha species other than J. curcas showed that 211 (∼70%) amplified in Jatropha, of which 7.58% showed polymorphisms in J. curcas genotypes and 12.32% in Jatropha species. The high rate of transferability of SSR markers from castor bean to Jatropha, coupled with a good level of polymorphic information content (PIC) (0.2 in J. curcas genotypes and 0.6 in Jatropha species), suggests that these SSRs will be useful in germplasm analysis, linkage mapping, diversity studies and phylogenetic analysis in J. curcas as well as other Jatropha species.
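Genome-wide SSR counts and densities of the kind reported above come from scanning sequence for tandemly repeated short motifs. The following is a minimal sketch of such a scan using a regular-expression backreference. The minimum repeat thresholds (6 for di-, 5 for tri-, 4 for tetranucleotide motifs) and the names `find_ssrs` and `bp_per_ssr` are assumptions made for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem repeats.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in `seq`.

    Motifs that are themselves repeats of a shorter unit (e.g. "AA", "AGAG")
    are skipped so that each locus is reported once under its shortest motif.
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One captured motif followed by at least n-1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # A string built from a shorter repeating unit occurs inside its
            # own doubling with the first and last characters removed.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return hits


def bp_per_ssr(total_bp, n_ssrs):
    """Average spacing, in bp of scanned sequence per SSR found."""
    return total_bp / n_ssrs if n_ssrs else float("inf")


if __name__ == "__main__":
    demo = "TTGC" + "AG" * 8 + "CCGTA" + "AAG" * 5 + "GT"
    found = find_ssrs(demo)
    for start, motif, count in found:
        print(f"pos {start}: ({motif}){count}")
    print(f"one SSR per {bp_per_ssr(len(demo), len(found)):.1f} bp")
```

Counts from a scan like this depend strongly on the thresholds chosen, so densities from different studies are only comparable when the search criteria match.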
Castor bean (Ricinus communis) is an oil crop that belongs to the spurge (Euphorbiaceae) family. Its seeds are the source of castor oil, used for the production of high-quality lubricants due to its high proportion of the unusual fatty acid ricinoleic acid. Castor bean seeds also produce ricin, a highly toxic ribosome inactivating protein, making castor bean relevant for biosafety. We report here the 4.6X draft genome sequence of castor bean, representing the first reported Euphorbiaceae genome sequence. Our analysis shows that most key castor oil metabolism genes are single-copy while the ricin gene family is larger than previously thought. Comparative genomics analysis suggests the presence of an ancient hexaploidization event that is conserved across the dicotyledonous lineage.
Sequencing of cDNA libraries for the development of expressed sequence tags (ESTs) as well as for the discovery of simple sequence repeats (SSRs) has been a common method of developing microsatellites or SSR-based markers. In this research, our objective was to further sequence and develop common bean microsatellites from leaf and root cDNA libraries derived from the Andean gene pool accession G19833 and the Mesoamerican gene pool accession DOR364, mapping parents of a commonly used reference map. The root libraries were made from high and low phosphorus treated plants.
A total of 3,123 EST sequences from leaf and root cDNA libraries were screened and used for direct simple sequence repeat discovery. From these EST sequences we found 184 microsatellites; the majority containing tri-nucleotide motifs, many of which were GC rich (ACC, AGC and AGG in particular). Di-nucleotide motif microsatellites were about half as common as the tri-nucleotide motif microsatellites but most of these were AGn microsatellites with a moderate number of ATn microsatellites in root ESTs followed by few ACn and no GCn microsatellites. Out of the 184 new SSR loci, 120 new microsatellite markers were developed in the BMc (Bean Microsatellites from cDNAs) series and these were evaluated for their capacity to distinguish bean diversity in a germplasm panel of 18 genotypes. We developed a database with images of the microsatellites and their polymorphism information content (PIC), which averaged 0.310 for polymorphic markers.
The present study produced information about microsatellite frequency in root and leaf tissues of two important genotypes for common bean genomics: namely G19833, the Andean genotype from race Peru selected for whole genome shotgun sequencing, and DOR364, a race Mesoamerica subgroup 2 genotype that is a small-red seeded, released variety in Central America. Both race Peru and Mesoamerica subgroup 2 (small red beans) have been understudied in comparison to race Nueva Granada and Mesoamerica subgroup 1 (black beans), both with regard to gene expression and as sources of markers. However, we found few differences in SSR type and frequency between the G19833 leaf and DOR364 root tissue-derived ESTs. Overall, our work adds to the analysis of microsatellite frequency in common bean and provides a new set of 120 BMc markers which, combined with the 248 previously developed BMc markers, brings the total in this series to 368 markers. Once we include BMd markers, which are derived from GenBank sequences, the current total of gene-based markers from our laboratory surpasses 500 markers. These markers are basic for studies of the transcriptome of common bean and can form anchor points for genetic mapping studies in the future.
The LEAFY COTYLEDON2 (LEC2) gene plays critically important regulatory roles during both early and late embryonic development. Here, we report the identification of the LEC2 gene from the castor bean plant (Ricinus communis), and characterize the effects of its overexpression on gene regulation and lipid metabolism in transgenic Arabidopsis plants. LEC2 exists as a single-copy gene in castor bean, is expressed predominantly in embryos, and encodes a protein with a conserved B3 domain, but with N- and C-terminal domains that differ from those found in LEC2 from Arabidopsis. Ectopic overexpression of LEC2 from castor bean under the control of the cauliflower mosaic virus (CaMV) 35S promoter in Arabidopsis plants induces the accumulation of transcripts that encode five major transcription factors (the LEAFY COTYLEDON1 (LEC1), LEAFY COTYLEDON1-LIKE (L1L), FUSCA3 (FUS3), and ABSCISIC ACID INSENSITIVE 3 (ABI3) transcripts for seed maturation, and WRINKLED1 (WRI1) transcripts for fatty acid biosynthesis), as well as OLEOSIN transcripts for the formation of oil bodies in vegetative tissues. Transgenic Arabidopsis plants that express the LEC2 gene from castor bean show a range of dose-dependent morphological phenotypes and effects on the expression of LEC2-regulated genes during seedling establishment and vegetative growth. Expression of castor bean LEC2 in Arabidopsis increased the expression of fatty acid elongase 1 (FAE1) and induced the accumulation of triacylglycerols, especially those containing the seed-specific fatty acid, eicosenoic acid (20:1Δ11), in vegetative tissues.
• Castor bean LEC2 is single copy and shows seed-specific expression.
• Over-expression of castor LEC2 induces genes involved in seed maturation in leaves.
• Castor LEC2 induces the accumulation of triacylglycerols and 20:1 fatty acids in leaves.
• Ectopic expression of castor LEC2 in Arabidopsis affects plant growth.
Castor bean; Eicosenoic acid; LEAFY COTYLEDON2; Seed maturation; Transcription factor; Triacylglycerol; ABI3-VP1, abscisic acid-insensitive 3-viviparous 1; CaMV, cauliflower mosaic virus; cDNA, complementary DNA; DHA, docosahexaenoic acid; DIG, digoxigenin; FAE1, fatty acid elongase 1; GC, gas chromatography; ORF, open reading frame; RT-PCR, reverse transcription polymerase chain reaction; SSC, sodium chloride-sodium citrate; TAG, triacylglycerol; TF, transcription factor; TLC, thin-layer chromatography
The adzuki bean (Vigna angularis (Ohwi) Ohwi and Ohashi) is an important grain legume of Asia. It is cultivated mainly in China, Japan and Korea. Despite its importance, few genomic resources are available for molecular genetic research of adzuki bean. In this study, we developed EST-SSR markers for the adzuki bean through next-generation sequencing. More than 112 million high-quality cDNA sequence reads were obtained from adzuki bean using Illumina paired-end sequencing technology, and the sequences were de novo assembled into 65,950 unigenes. The average length of the unigenes was 1,213 bp. Among the unigenes, 14,547 sequences contained a unique simple sequence repeat (SSR) and 3,350 sequences contained more than one SSR. A total of 7,947 EST-SSRs were identified as potential molecular markers, with mono-nucleotide A/T repeats (99.0%) as the most abundant motif class, followed by AG/CT (68.4%), AAG/CTT (30.0%), AAAG/CTTT (26.2%), AAAAG/CTTTT (16.1%), and AACGGG/CCCGTT (6.0%). A total of 500 SSR markers were randomly selected for validation, of which 296 markers produced reproducible amplicons with 38 polymorphic markers among the 32 adzuki bean genotypes selected from diverse geographical locations across China. The large number of SSR-containing sequences and EST-SSR markers will be valuable for genetic analysis of the adzuki bean and related Vigna species.
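Motif classes written as AG/CT or AAG/CTT above group a motif with its reverse complement, because the same repeat read from the opposite strand, or from a shifted start position, is the same locus type. A minimal sketch of that grouping follows, assuming the common convention of using the alphabetically smallest rotation as the class representative. The function names `canonical_motif` and `motif_class` are illustrative and not taken from the study.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Reverse complement of an uppercase DNA string."""
    return motif.translate(_COMPLEMENT)[::-1]


def _rotations(motif):
    """All cyclic shifts of the motif (AG -> AG, GA)."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def canonical_motif(motif):
    """Alphabetically smallest rotation of the motif or its reverse complement."""
    motif = motif.upper()
    candidates = _rotations(motif) + _rotations(reverse_complement(motif))
    return min(candidates)


def motif_class(motif):
    """Label in the 'AG/CT' style: canonical motif / its reverse complement."""
    canon = canonical_motif(motif)
    return f"{canon}/{reverse_complement(canon)}"


if __name__ == "__main__":
    # GA, TC and CT all describe the same dinucleotide repeat class as AG.
    for m in ["GA", "TC", "CT", "CTT", "CTTT"]:
        print(m, "->", motif_class(m))
```

Collapsing motifs this way before counting avoids splitting one repeat class across several labels when tallying motif frequencies.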
Castor seeds are a major source for ricinoleate, an important industrial raw material. Genomics studies of castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, or for genetically improving castor plants by eliminating toxic and allergic proteins in seeds.
Full-length cDNAs are useful resources in annotating genes and in providing functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm, and obtained 4,720 ESTs from 5'-ends of the cDNA clones representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases, and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase and FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Sequence and functional analyses of the castor FAD2 were carried out since it had not been characterized previously. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds.
Our results suggest that transcriptional regulation of FAD2 and FAH12 genes may be one of the mechanisms that contribute to a high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful to annotate the castor genome, whose whole sequence is being generated by shotgun sequencing at the Institute for Genome Research (TIGR).
Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we report the construction of 454 expressed sequence tag (EST) libraries, comparative studies between L. luteus and model legume species, the development of a comprehensive set of EST-simple sequence repeat (SSR) markers, and the validation of their utility in diversity studies and their transferability to related species.
Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for L1 (young leaves, buds and flowers) and L2 (immature seeds) EST- libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accession’s origin.
L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection.
Lupinus luteus; EST-SSR; Orphan crop; Microsynteny
Castor bean (Ricinus communis) is an agricultural crop and garden ornamental that is widely cultivated and has been introduced worldwide. Understanding population structure and the distribution of castor bean cultivars has been challenging because of limited genetic variability. We analyzed the population genetics of R. communis in a worldwide collection of plants from germplasm and from naturalized populations in Florida, U.S. To assess genetic diversity we conducted survey sequencing of the genomes of seven diverse cultivars and compared the data to a reference genome assembly of a widespread cultivar (Hale). We determined the population genetic structure of 676 samples using single nucleotide polymorphisms (SNPs) at 48 loci.
Bayesian clustering indicated five main groups worldwide and a repeated pattern of mixed genotypes in most countries. High levels of population differentiation occurred between most populations but this structure was not geographically based. Most molecular variance occurred within populations (74%) followed by 22% among populations, and 4% among continents. Samples from naturalized populations in Florida indicated significant population structuring consistent with local demes. There was significant population differentiation for 56 of 78 comparisons in Florida (pairwise population ϕPT values, p < 0.01).
Low levels of genetic diversity and mixing of genotypes have led to minimal geographic structuring of castor bean populations worldwide. Relatively few lineages occur and these are widely distributed. Our approach of determining population genetic structure using SNPs from genome-wide comparisons constitutes a framework for high-throughput analyses of genetic diversity in plants, particularly in species with limited genetic diversity.
Storage triacylglycerols in castor bean seeds are enriched in the hydroxylated fatty acid ricinoleate. Extensive tissue-specific RNA-Seq transcriptome and lipid analysis will help identify components important for its biosynthesis.
Storage triacylglycerols (TAGs) in the endosperm of developing castor (Ricinus communis) seeds are highly enriched in ricinoleic acid (18:1-OH). We have analysed neutral lipid fractions from other castor tissues using TLC, GLC and mass spectrometry. Cotyledons, like the endosperm, contain high levels of 18:1-OH in TAG. Pollen and male developing flowers accumulate TAG but do not contain 18:1-OH and leaves do not contain TAG or 18:1-OH. Analysis of acyl-CoAs in developing endosperm shows that ricinoleoyl-CoA is not the dominant acyl-CoA, indicating that either metabolic channelling or enzyme substrate selectivity are important in the synthesis of tri-ricinolein in this tissue. RNA-Seq transcriptomic analysis, using Illumina sequencing by synthesis technology, has been performed on mRNA isolated from two stages of developing seeds, germinating seeds, leaf and pollen-producing male flowers in order to identify differences in lipid-metabolic pathways and enzyme isoforms which could be important in the biosynthesis of TAG enriched in 18:1-OH. This study gives comprehensive coverage of gene expression in a variety of different castor tissues. The potential role of differentially expressed genes is discussed against a background of proteins identified in the endoplasmic reticulum, which is the site of TAG biosynthesis, and transgenic studies aimed at increasing the ricinoleic acid content of TAG.
Several of the genes identified in this tissue-specific whole transcriptome study have been used in transgenic plant research aimed at increasing the level of ricinoleic acid in TAG. New candidate genes have been identified which might further improve the level of ricinoleic acid in transgenic crops.
Pearl millet [Pennisetum glaucum (L.) R. Br.] is a staple food and fodder crop of marginal agricultural lands of sub-Saharan Africa and the Indian subcontinent. It is also a summer forage crop in the southern USA, Australia and Latin America, and is the preferred mulch in Brazilian no-till soybean production systems. Use of molecular marker technology for pearl millet genetic improvement has been limited. Progress is hampered by insufficient numbers of PCR-compatible co-dominant markers that can be used readily in applied breeding programmes. Therefore, we sought to develop additional SSR markers for the pearl millet research community.
A set of new pearl millet SSR markers was developed using available sequence information from 3520 expressed sequence tags (ESTs). After clustering, unigene sequences (2175 singlets and 317 contigs) were searched for the presence of SSRs. We detected 164 sequences containing SSRs (at least 14 bases in length), with a density of one per 1.75 kb of EST sequence. Di-nucleotide repeats were the most abundant followed by tri-nucleotide repeats. Ninety primer pairs were designed and tested for their ability to detect polymorphism across a panel of 11 pairs of pearl millet mapping population parental lines. Clear amplification products were obtained for 58 primer pairs. Of these, 15 were monomorphic across the panel. A subset of 21 polymorphic EST-SSRs and 6 recently developed genomic SSR markers were mapped using existing mapping populations. Linkage map positions of these EST-SSRs were compared by homology search with mapped rice genomic sequences on the basis of pearl millet-rice synteny. Most new EST-SSR markers mapped to distal regions of linkage groups, often to previous gaps in these linkage maps. These new EST-SSRs are now used by ICRISAT in pearl millet diversity assessment and marker-aided breeding programs.
This study has demonstrated the potential of EST-derived SSR primer pairs in pearl millet. As reported for other crops, EST-derived SSRs provide a cost-saving marker development option in pearl millet. Resources developed in this study have added a sizeable number of useful SSRs to the existing repertoire of circa 100 genomic SSRs that were previously available to pearl millet researchers.
The oil palm (Elaeis guineensis Jacq.) is a perennial monocotyledonous tropical crop species that is now the world's number one source of edible vegetable oil, and the richest dietary source of provitamin A. While new elite genotypes from traditional breeding programs provide steady yield increases, the long selection cycle (10-12 years) and the large areas required to cultivate oil palm make genetic improvement slow and labor intensive. Molecular breeding programs have the potential to make significant impacts on the rate of genetic improvement but the limited molecular resources, in particular the lack of molecular markers for agronomic traits of interest, restrict the application of molecular breeding schemes for oil palm.
In the current study, 6,103 non-redundant ESTs derived from cDNA libraries of developing vegetative and reproductive tissues were annotated and searched for simple sequence repeats (SSRs). Primer pairs from sequences flanking 289 EST-SSRs were tested to detect polymorphisms in elite breeding parents and their crosses. 230 of these amplified PCR products, 88 of which were polymorphic within the breeding material tested. A detailed analysis and annotation of the EST-SSRs revealed the locations of the polymorphisms within the transcripts, and that the main functional category was related to transcription and post-transcriptional regulation. Indeed, SSR polymorphisms were found in sequences encoding AP2-like, bZIP, zinc finger, MADS-box, and NAC-like transcription factors in addition to other transcriptional regulatory proteins and several RNA interacting proteins.
The identification of new EST-SSRs that detect polymorphisms in elite breeding material provides tools for molecular breeding strategies. The identification of SSRs within transcripts, in particular those that encode proteins involved in transcriptional and post-transcriptional regulation, will allow insight into the functional roles of these proteins by studying the phenotypic traits that cosegregate with these markers. Finally, the oil palm EST-SSRs derived from vegetative and reproductive development will be useful for studies on the evolution of the functional diversity within the palm family.
Field pea (Pisum sativum L.) and faba bean (Vicia faba L.) are cool-season grain legume species that provide rich sources of food for humans and fodder for livestock. To date, both species have been relative 'genomic orphans' due to limited availability of genetic and genomic information. A significant enrichment of genomic resources is consequently required in order to understand the genetic architecture of important agronomic traits, and to support germplasm enhancement, genetic diversity, population structure and demographic studies.
cDNA samples obtained from various tissue types of specific field pea and faba bean genotypes were sequenced using 454 Roche GS FLX Titanium technology. A total of 720,324 and 304,680 reads for field pea and faba bean, respectively, were de novo assembled to generate sets of 70,682 and 60,440 unigenes. Consensus sequences were compared against the genome of the model legume species Medicago truncatula Gaertn., as well as that of the more distantly related, but better-characterised genome of Arabidopsis thaliana L. In comparison to M. truncatula coding sequences, 11,737 and 10,179 unique hits were obtained from field pea and faba bean, respectively. Totals of 22,057 field pea and 18,052 faba bean unigenes were subsequently annotated from GenBank. Comparison to the genome of soybean (Glycine max L.) resulted in 19,451 unique hits for field pea and 16,497 unique hits for faba bean, corresponding to c. 35% and 30% of the known gene space, respectively. Simple sequence repeat (SSR)-containing expressed sequence tags (ESTs) were identified from consensus sequences, and totals of 2,397 and 802 primer pairs were designed for field pea and faba bean, respectively. Subsets of 96 EST-SSR markers were screened for validation across modest panels of field pea and faba bean cultivars, as well as related non-domesticated species. For field pea, 86 primer pairs successfully obtained amplification products from one or more template genotypes, of which 59% revealed polymorphism between 6 genotypes. In the case of faba bean, 81 primer pairs displayed successful amplification, of which 48% detected polymorphism.
The generation of EST datasets for field pea and faba bean has permitted effective unigene identification and functional sequence annotation. EST-SSR loci were detected at incidences of 14-17%, permitting design of comprehensive sets of primer pairs. The subsets from these primer pairs proved highly useful for polymorphism detection within Pisum and Vicia germplasm.
The falling cost of and rapid progress in next-generation sequencing techniques, coupled with high performance computational approaches, have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries, but research efforts to understand its genetics and generate genomic information for the crop have been limited. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tag (EST) sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiling were applied to two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) to generate genomic resources for use in its improvement programs. These include a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome.
Our efforts in using different approaches to generate genomic resources provide a non-biased glimpse into the publicly available EST-sequences, yam genome, and GBS profiles, with affirmation that the genomic complexity can be methodically unraveled, and constitute a critical foundation for future studies in linkage mapping, germplasm analysis, and predictive breeding.
Ricinus communis is an industrially important non-edible oil seed crop, native to tropical and subtropical regions of the world. Although the R. communis genome was assembled as a 4X draft by JCVI and is predicted to encode 31,221 proteins, the function of most of the genes remains to be elucidated. A large amount of information on different aspects of the biology of R. communis is available, but most of the data are scattered and not easily accessible. Therefore, a comprehensive resource on castor, CastorDB, is required to facilitate research on this important plant.
CastorDB is a specialized and comprehensive database for the oil seed plant R. communis, integrating information from several diverse resources. As primary data, CastorDB contains information on gene and protein sequences, gene expression and gene ontology annotation of protein sequences obtained from a variety of repositories. In addition, computational analysis was used to predict cellular localization, domains, pathways, protein-protein interactions, sumoylation sites and biochemical properties, and these predictions have been included as derived data. The database has an intuitive user interface that prompts the user to explore the various information resources available on a given gene or protein.
CastorDB provides a user friendly comprehensive resource on castor with particular emphasis on its genome, transcriptome, and proteome and on protein domains, pathways, protein localization, presence of sumoylation sites, expression data and protein interacting partners.
Jatropha curcas is a potential plant species for biodiesel production. However, its seed yield is too low for profitable production of biodiesel. To improve productivity, genetic improvement through breeding is essential. A linkage map is an important component in molecular breeding. We established a first-generation linkage map using a mapping panel containing two backcross populations with 93 progeny. We mapped 506 markers (216 microsatellites and 290 SNPs from ESTs) onto 11 linkage groups. The total length of the map was 1440.9 cM with an average marker spacing of 2.8 cM. BLAST searches of 222 Jatropha ESTs containing polymorphic SSR or SNP markers against EST databases revealed that 91.0%, 86.5% and 79.2% of Jatropha ESTs were homologous to counterparts in castor bean, poplar and Arabidopsis, respectively. Mapping 192 orthologous markers to the assembled whole genome sequence of Arabidopsis thaliana identified 38 syntenic blocks and revealed that small linkage blocks were well conserved, but often shuffled. The first-generation linkage map and the comparative mapping data could lay a solid foundation for QTL mapping of agronomic traits, marker-assisted breeding and cloning of genes responsible for phenotypic variation.
Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested.
From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs were tested, and the results showed that they were 82% transferable across at least one species. The highest amplification rates were observed between the species from the Phaseolus (63.7%), Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%) genera. The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 X Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregant loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregant loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM.
A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species that have great economic and genomic values. Their diversity was comparable to genomic SSRs, and they were incorporated in the common bean reference genetic map, which constitutes an important contribution to and advance in Phaseolus vulgaris genomic research.
Jatropha curcas L. has attracted a great deal of attention worldwide, regarding its potential as a new biodiesel crop. However, the understanding of this crop remains very limited and little genomic research has been done. We used simple sequence repeat (SSR) markers that could be transferred from Manihot esculenta (cassava) to analyze the genetic relationships among 45 accessions of J. curcas from our germplasm collection.
In total, 187 out of 419 expressed sequence tag (EST)-SSR and 54 out of 182 genomic (G)-SSR markers from cassava were polymorphic among the J. curcas accessions. The EST-SSR markers comprised 26.20% dinucleotide repeats, 57.75% trinucleotide repeats, 7.49% tetranucleotide repeats, and 8.56% pentanucleotide repeats, whereas the majority of the G-SSR markers were dinucleotide repeats (62.96%). The 187 EST-SSRs resided in genes that are involved mainly in biological and metabolic processes. Thirty-six EST-SSRs and 20 G-SSRs were chosen to analyze the genetic diversity among 45 J. curcas accessions. A total of 183 polymorphic alleles were detected. On the basis of the distribution of these polymorphic alleles, the 45 accessions were classified into six groups, in which the genotype showed a correlation with geographic origin. The estimated mean genetic diversity index was 0.5572, which suggests that our J. curcas germplasm collection has a high level of genetic diversity. This should facilitate subsequent studies on genetic mapping and molecular breeding.
We identified 241 novel EST-SSR and G-SSR markers in J. curcas, which should be useful for genetic mapping and quantitative trait loci analysis of important agronomic traits. By using these markers, we found that the intergroup gene diversity of J. curcas was greater than the intragroup diversity, and that the domestication of the species probably occurred partly in America and partly in Hainan, China.
Ramie (Boehmeria nivea L. Gaud) is one of the most important natural fiber crops, and improvement of fiber yield and quality is the main goal in efforts to breed superior cultivars. However, efforts aimed at enhancing the understanding of ramie genetics and developing more effective breeding strategies have been hampered by the shortage of simple sequence repeat (SSR) markers. In our previous study, we assembled 43,990 expressed sequence tags (ESTs) de novo. In the present study, we searched these previously assembled ESTs for SSRs and identified 1,685 ESTs (3.83%) containing 1,878 SSRs. Next, we designed 1,827 primer pairs complementary to regions flanking these SSRs, and the amplified regions were designated as SSR markers. Among these markers, dinucleotide and trinucleotide repeat motifs were the most abundant types (36.4% and 36.3%, respectively), whereas tetranucleotide, pentanucleotide, and hexanucleotide motifs represented <10% of the markers. The motif AG/CT was the most abundant, accounting for 28.74% of the markers. One hundred EST-SSR markers (97 SSRs located in genes encoding transcription factors and 3 SSRs in genes encoding cellulose synthases) were tested by polymerase chain reaction on 24 ramie varieties. Of these 100 markers, 98 were successfully amplified and 81 were polymorphic, with 2–6 alleles among the 24 varieties. Analysis of the genetic diversity of all 24 varieties revealed similarity coefficients that ranged from 0.51 to 0.80. The EST-SSRs developed in this study represent the first large-scale development of SSR markers for ramie. These SSR markers could be used for development of genetic and physical maps, quantitative trait loci mapping, genetic diversity studies, association mapping, and cultivar fingerprinting.
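The in silico search for SSRs in assembled ESTs described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name find_ssrs and the minimum repeat thresholds are assumptions chosen for illustration, and only perfect (uninterrupted) repeats of di- to hexanucleotide motifs are reported.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only,
# not taken from the study).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif copy; \1{n,} requires n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. "AGAG" (really AG) or "AAA" (a mononucleotide run).
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)

# Hypothetical EST fragment containing an (AG)8 and a (CTT)5 repeat.
example = "TTGC" + "AG" * 8 + "CCGTA" + "CTT" * 5 + "GGA"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Tallying the motif lengths of the hits over a whole EST set would give the kind of dinucleotide versus trinucleotide breakdown reported above. Primer design for the flanking regions is a separate step that this sketch does not cover.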
Coffee breeding and improvement efforts can be greatly facilitated by the availability of a large repository of simple sequence repeat (SSR) markers, which provide efficiency and high resolution in genetic analyses. This study aimed to improve SSR availability in coffee by developing new genic and genomic SSR markers using in silico bioinformatics and a streptavidin-biotin based enrichment approach, respectively. The expressed sequence tag (EST) based genic microsatellite markers (EST-SSRs) were developed using the publicly available dataset of 13,175 unigene ESTs, which showed a distribution of 1 SSR/3.4 kb of coffee transcriptome. Genomic SSRs, on the other hand, were developed from an SSR-enriched small-insert partial genomic library of robusta coffee. In total, 69 new SSRs (44 EST-SSRs and 25 genomic SSRs) were developed and validated as suitable genetic markers. Diversity analysis of selected coffee genotypes revealed these to be highly informative in terms of allelic diversity and PIC values, and eighteen of these markers (∼27%) could be mapped on a robusta linkage map. Notably, the markers described here also showed very high cross-species transferability. In addition to the validated markers, we have designed primer pairs for 270 putative EST-SSRs, which are expected to provide another ca. 200 useful genetic markers considering the high success rate (88%) of marker conversion of similar pairs tested and validated in this study.
Lettuce (Lactuca sativa L.) is the major crop in the group of leafy vegetables. Several types of molecular markers have been developed that are effectively used in lettuce breeding and genetic studies. However, only a very limited number of microsatellite-based markers are publicly available. We employed the method of enriched microsatellite libraries to develop 97 genomic SSR markers.
Testing of the newly developed markers on a set of 36 Lactuca accessions (33 L. sativa, and one each of L. serriola L., L. saligna L., and L. virosa L.) revealed that both the genetic heterozygosity (UHe = 0.56) and the number of alleles per locus (Na = 5.50) are significantly higher for genomic SSR markers than for previously developed EST-based SSR markers (UHe = 0.32, Na = 3.56). Fifty-four genomic SSR markers were placed on the molecular linkage map of lettuce. Distribution of markers in the genome appeared to be random, with the exception of a possible cluster on linkage group 6. Any combination of 32 genomic SSRs was able to distinguish genotypes of all 36 accessions. Fourteen of the newly developed SSR markers originate from fragments with high sequence similarity to resistance gene candidates (RGCs) and RGC pseudogenes. Analysis of molecular variance (AMOVA) of L. sativa accessions showed that approximately 3% of genetic diversity was within accessions, 79% among accessions, and 18% among horticultural types.
The newly developed genomic SSR markers were added to the pool of previously developed EST-SSR markers. These two types of SSR-based markers provide useful tools for lettuce cultivar fingerprinting, development of integrated molecular linkage maps, and mapping of genes.
Keywords: Data resolution statistics; Genotyping; Lactuca; Linkage map; Marker distribution; Microsatellites
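The fingerprinting claim in the lettuce study, that a panel of markers distinguishes all accessions, amounts to checking that no two accessions share the same multi-locus genotype. The Python sketch below shows one way such a check could be written. The function name indistinguishable_pairs, the data layout and the toy genotypes are hypothetical, and missing data are not handled.

```python
from itertools import combinations

def indistinguishable_pairs(genotypes, markers):
    """List accession pairs with identical genotypes at every marker in `markers`.

    genotypes: dict mapping accession -> dict mapping marker -> genotype call
               (here a tuple of allele sizes in bp; a hypothetical layout).
    """
    profiles = {acc: tuple(calls[m] for m in markers)
                for acc, calls in genotypes.items()}
    return [(a, b) for a, b in combinations(sorted(profiles), 2)
            if profiles[a] == profiles[b]]

# Toy data, invented for illustration only.
toy = {
    "acc1": {"SSR1": (150, 150), "SSR2": (200, 204)},
    "acc2": {"SSR1": (150, 150), "SSR2": (200, 200)},
    "acc3": {"SSR1": (150, 154), "SSR2": (200, 204)},
}
print(indistinguishable_pairs(toy, ["SSR1"]))          # SSR1 alone
print(indistinguishable_pairs(toy, ["SSR1", "SSR2"]))  # both markers
```

An empty result means the chosen panel separates every accession. Testing whether any combination of a given size does so would require repeating the check over marker subsets, which grows quickly with panel size.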
Tibetan annual wild barley is rich in genetic variation. This study aimed to develop new SSRs by data mining for genetic diversity and phylogenetic analysis of wild barley. We developed 49 novel EST-SSRs and confirmed 20 genomic SSRs for 80 Tibetan annual wild barley and 16 cultivated barley accessions. A total of 213 alleles were generated from 69 loci with an average of 3.14 alleles per locus. Trimeric repeats were the most abundant motifs (40.82%) among the EST-SSRs, while the majority of the genomic SSRs were dinucleotide repeats. The polymorphic information content (PIC) ranged from 0.08 to 0.75 with a mean of 0.46. The expected heterozygosity (He) ranged from 0.0854 to 0.7842 with an average of 0.5279. Overall, the polymorphism of genomic SSRs was higher than that of EST-SSRs. Furthermore, the number of alleles and the PIC of wild barley were both higher than those of cultivated barley (3.12 vs 2.59 and 0.44 vs 0.37, respectively), indicating that more polymorphism exists in Tibetan wild barley than in cultivated barley. The 96 accessions were divided into eight subpopulations based on 69 SSR markers, and the cultivated genotypes could be clearly separated from the wild barleys. A total of 47 SSR-containing EST unigenes showed significant similarities to known genes. These EST-SSR markers have potential for application in germplasm appraisal, genetic diversity and population structure analysis, facilitating marker-assisted breeding and crop improvement in barley.
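The PIC and He statistics reported above are both functions of the allele frequencies at a locus. The Python sketch below uses the commonly cited definitions, He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The abstract does not state which estimators or software were used, so treating these formulas as the study's method is an assumption, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations

def allele_frequencies(alleles):
    """Frequencies of each distinct allele in a list of observed alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return [c / total for c in counts.values()]

def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2), often called gene diversity."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross

# Hypothetical locus with four equally frequent alleles (sizes in bp).
freqs = allele_frequencies([150, 152, 154, 156])
print(expected_heterozygosity(freqs), pic(freqs))
```

Under these definitions PIC is never larger than He for the same locus, because the cross term is non-negative.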