The advantages of using molecular markers in modern genebanks are well documented. They are commonly used to understand the distribution of genetic diversity within populations and among species, which is crucial for efficient management and effective utilization of germplasm collections. We describe the development of two types of DArT molecular marker platforms for the new oilseed crop lesquerella (Physaria spp.), a member of the Brassicaceae family, to characterize a collection in the National Plant Germplasm System (NPGS) whose genetic diversity and traits are relatively poorly known. Both platforms were developed using a subset of the germplasm conserved ex situ, consisting of 87 Physaria and 2 Paysonia accessions. The microarray DArT platform revealed a total of 2,833 polymorphic markers with an average genotype call rate of 98.4% and a scoring reproducibility of 99.7%. The DArTseq platform, which generates SNP and DArT markers from short sequence reads, yielded a total of 27,748 high-quality markers. Cluster analysis and principal coordinate analysis indicated that both systems successfully classified the accessions by species, geographical source, and breeding status. In the germplasm set analyzed, which represented more than 80% of the P. fendleri collection, we observed substantial variation within the species. These markers will be valuable in germplasm management studies and lesquerella breeding, and will complement the microsatellite markers previously developed for the taxa.
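Call rate and scoring reproducibility are both simple proportions. The sketch below assumes DArT scores coded as 1 (present), 0 (absent) or None (no call) and a pair of technical-replicate assays of one sample. The coding, the data and the exact definitions are illustrative assumptions, not the formulas used by the DArT service.

```python
from typing import Optional, Sequence

Call = Optional[int]  # 1 = present, 0 = absent, None = no call (assumed coding)

def call_rate(calls: Sequence[Call]) -> float:
    """Fraction of samples with a non-missing call for one marker."""
    return sum(c is not None for c in calls) / len(calls)

def reproducibility(rep_a: Sequence[Call], rep_b: Sequence[Call]) -> float:
    """Concordance between two technical replicates, over loci called in both."""
    pairs = [(a, b) for a, b in zip(rep_a, rep_b) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no loci called in both replicates")
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical data: one marker scored in 8 accessions, and two replicate
# assays of the same accession across 6 markers.
marker = [1, 0, 1, None, 1, 1, 0, 1]
rep1 = [1, 0, 1, 1, 0, None]
rep2 = [1, 0, 1, 0, 0, 1]
print(call_rate(marker))          # 7 of 8 samples called: 0.875
print(reproducibility(rep1, rep2))  # 4 of 5 comparable loci agree: 0.8
```

Averaging these per-marker values over all markers gives platform-level figures of the kind reported above.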
The distinctness of, and overlap between, pea genotypes held in several Pisum germplasm collections have been used to determine their relatedness and to test previous ideas about the genetic diversity of Pisum. Our characterisation of genetic diversity among 4,538 Pisum accessions held in 7 European Genebanks has identified sources of novel genetic variation, and both reinforces and refines previous interpretations of the overall structure of genetic diversity in Pisum. Molecular marker analysis was based on the presence/absence polymorphism of retrotransposon insertions, scored by high-throughput microarray and SSAP approaches. We conclude that the diversity of Pisum constitutes a broad continuum, with graded differentiation into sub-populations that display various degrees of distinctness. The most distinct genetic groups correspond to the named taxa, while the cultivars and landraces of Pisum sativum can be divided into two broad types, one of which is strongly enriched for modern cultivars. Adding germplasm sets from six European Genebanks, chosen to represent high diversity, to a single collection previously studied with these markers resulted in modest additions to the overall diversity observed, suggesting that the great majority of the total genetic diversity collected for the Pisum genus has now been described. Two interesting sources of novel genetic variation have been identified. Finally, we have proposed reference sets of core accessions with a range of sample sizes to represent Pisum diversity for future study and exploitation by researchers and breeders.
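The abstract does not say how the core reference sets were chosen. One common way to build nested core sets of increasing size from marker data is a greedy allele-coverage ("maximisation") heuristic. The sketch below is an illustration of that technique, not the authors' procedure. It assumes each accession is represented by the set of marker alleles it carries, and the accession and allele names are hypothetical.

```python
from typing import Dict, List, Set

def greedy_core(accessions: Dict[str, Set[str]], size: int) -> List[str]:
    """Select up to `size` accessions, each time adding the one that contributes
    the most alleles not yet represented (ties go to the alphabetically first name)."""
    chosen: List[str] = []
    covered: Set[str] = set()
    remaining = dict(accessions)  # copy so the caller's dict is not modified
    for _ in range(min(size, len(remaining))):
        best = max(sorted(remaining), key=lambda a: len(remaining[a] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen

def allele_coverage(accessions: Dict[str, Set[str]], core: List[str]) -> float:
    """Fraction of all observed alleles that are present in the core set."""
    all_alleles = set().union(*accessions.values())
    core_alleles = set().union(*(accessions[a] for a in core))
    return len(core_alleles) / len(all_alleles)

# Hypothetical accessions scored at three loci (m1-m3), alleles suffixed a/b.
panel = {
    "acc_A": {"m1a", "m2a", "m3b"},
    "acc_B": {"m1b", "m2a", "m3a"},
    "acc_C": {"m1a", "m2b", "m3a"},
    "acc_D": {"m1a", "m2a", "m3a"},
}
for k in (1, 2, 3):
    core = greedy_core(panel, k)
    print(k, core, round(allele_coverage(panel, core), 2))
```

Because the selection order is deterministic, each smaller core is a prefix of every larger one. This is one way to obtain reference sets "with a range of sample sizes" that nest inside each other.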
Electronic supplementary material
The online version of this article (doi:10.1007/s00122-012-1839-1) contains supplementary material, which is available to authorized users.
Numerous rye accessions are stored in ex situ genebanks worldwide. Little is known about the extent of genetic diversity contained in any of them or its relation to contemporary varieties, because rye genetic diversity studies to date have had a very limited scope, analyzing few loci and/or few accessions. The development of high-throughput genotyping methods for rye opened the possibility of genome-wide characterization of large accession sets. In this study we used 1054 Diversity Arrays Technology (DArT) markers with defined chromosomal locations to characterize genetic diversity and population structure in a collection of 379 rye accessions including wild species, landraces, cultivated materials, and historical and contemporary rye varieties.
Average genetic similarity (GS) coefficients and average polymorphic information content (PIC) values varied among chromosomes. Comparison of chromosome-specific average GS within and between germplasm sub-groups indicated that regions of chromosomes 1R and 4R are targeted by selection in current breeding programs. Bayesian clustering, principal coordinate analysis and Neighbor-Joining clustering demonstrated that source and improvement status contributed significantly to the structure observed in the analyzed set of Secale germplasm. We revealed relatively limited diversity in improved rye accessions, both historical and contemporary. We also found a lack of correlation between the clustering of improved accessions and their geographic origin, which suggests a common genetic background of rye accessions from diverse geographic regions and extensive germplasm exchange. Moreover, contemporary varieties were distinct from the remaining accessions.
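The abstract does not name its similarity coefficient or PIC formula. For dominant presence/absence markers such as DArTs, common choices are the Jaccard coefficient for GS and PIC = 2f(1 − f), where f is the frequency of the "present" score. The sketch below assumes both definitions, a small hypothetical 0/1 matrix and hypothetical chromosome assignments. It also shows how per-marker PIC values average into chromosome-level values.

```python
from itertools import combinations
from typing import Dict, List, Sequence

def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard similarity for 0/1 scores: shared presences / loci present in either."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    return both / either if either else 1.0

def pic_dominant(column: Sequence[int]) -> float:
    """PIC for a dominant marker: 1 - (f^2 + (1 - f)^2), which equals 2f(1 - f)."""
    f = sum(column) / len(column)
    return 2 * f * (1 - f)

def mean_gs(rows: Sequence[Sequence[int]]) -> float:
    """Average pairwise Jaccard similarity over all pairs of accessions."""
    pairs = list(combinations(rows, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical 0/1 scores: rows are accessions, columns are DArT markers.
scores = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
marker_chrom = ["1R", "1R", "4R", "4R"]  # assumed chromosome assignments

pic_by_chrom: Dict[str, List[float]] = {}
for j, chrom in enumerate(marker_chrom):
    pic_by_chrom.setdefault(chrom, []).append(pic_dominant([row[j] for row in scores]))

print("mean GS:", mean_gs(scores))
for chrom, values in pic_by_chrom.items():
    print(chrom, "mean PIC:", round(sum(values) / len(values), 3))
```

Restricting `scores` to markers from a single chromosome, and to accessions from one germplasm sub-group, gives the chromosome-specific within-group GS comparisons described above.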
Our results point to an influence of reproduction methods on the observed diversity patterns and indicate the potential of ex situ collections for broadening the genetic diversity of rye breeding programs. The data obtained show that DArT markers provide a realistic picture of the genetic diversity and population structure present in the collection of 379 rye accessions, and that they are an effective platform for rye germplasm characterization and association mapping studies.
The National Institute of Agrobiological Sciences (NIAS) is implementing the NIAS Genebank Project for conservation and promotion of agrobiological genetic resources to contribute to the development and utilization of agriculture and agricultural products. The project’s databases (NIASGBdb; http://www.gene.affrc.go.jp/databases_en.php) consist of a genetic resource database and a plant diseases database, linked by a web retrieval database. The genetic resources database has plant and microorganism search systems to provide information on research materials, including passport and evaluation data for genetic resources with the desired properties. To facilitate genetic diversity research, several NIAS Core Collections have been developed. The NIAS Rice (Oryza sativa) Core Collection of Japanese Landraces contains information on simple sequence repeat (SSR) polymorphisms. SSR marker information for azuki bean (Vigna angularis) and black gram (V. mungo) and DNA sequence data from some selected Japanese strains of the genus Fusarium are also available. A database of plant diseases in Japan has been developed based on the listing of common names of plant diseases compiled by the Phytopathological Society of Japan. Relevant plant and microorganism genetic resources are associated with the plant disease names by the web retrieval database and can be obtained from the NIAS Genebank for research or educational purposes.
Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies allow the mass sequencing of genomes and transcriptomes, producing a vast array of genomic information. Bioinformatic analysis of NGS data allows the discovery of new genes and regulatory sequences and their positions, and makes large collections of molecular markers available. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, such as SSRs and SNPs, and for the construction of high-density genetic maps. All these tools and resources facilitate the study of genetic diversity, which is important for germplasm management, enhancement and use. They also allow the identification of markers linked to genes and QTLs using a diversity of techniques such as bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker-assisted selection, including marker-assisted backcross selection, ‘breeding by design’, and new strategies such as genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the ‘superdomestication’ of crops and the genetic dissection and breeding of complex traits.
Bioinformatics; complex traits; genetic maps; marker assisted selection; molecular markers; next-generation sequencing; quantitative trait loci.
Research related to crop domestication has been transformed by technologies and discoveries in the genome sciences, and by information-related sciences that are providing new tools for bioinformatics and systems biology. Rapid progress in archaeobotany and ethnobotany is also contributing new knowledge to the understanding of crop domestication. This sense of rapid progress is captured in this Special Issue, which contains 18 papers on crop domestication by scientists in the botanical, crop and related sciences. One paper focuses on current themes in the genetics of crop domestication across crops, whereas the other papers have a crop or geographic focus. One feature of progress in the sciences related to crop domestication is the availability of well-characterized germplasm resources in the global network of genetic resources centres (genebanks). Germplasm in genebanks is providing research materials for understanding domestication as well as for plant breeding. In this review, we highlight current genetic themes related to crop domestication. Impressive progress in this field in recent years is transforming plant breeding into crop engineering to meet the human need for increased crop yield with minimum environmental impact; we consider this to be ‘super-domestication’. Although domestication, at 10 000 years or less, spans a very short evolutionary time, the emerging details of what has happened and what is happening provide a window onto where domestication might, and can, advance in the future.
Evolution; gene cloning; gene pyramiding; gene duplication; marker assisted selection; QTL; crop wild relatives
The wild relatives of crops represent a major source of valuable traits for crop improvement. These resources are threatened by habitat destruction, land use changes, and other factors, requiring their urgent collection and long-term availability for research and breeding from ex situ collections. We propose a method to identify gaps in ex situ collections (i.e. gap analysis) of crop wild relatives as a means to guide efficient and effective collecting activities.
The methodology prioritizes taxa based on a combination of sampling, geographic, and environmental gaps. We apply the gap analysis methodology to wild taxa of the Phaseolus genepool. Of 85 taxa, 48 (56.5%) are assigned high priority for collecting because they are absent from, or under-represented in, genebanks. A further 17 taxa are given medium priority for collecting, 15 low priority, and 5 species are assessed as adequately represented in ex situ collections. Gap “hotspots”, representing priority target areas for collecting, are concentrated in central Mexico. However, the narrow endemism of a suite of priority species adds a number of specific regions to the spatial collecting priorities.
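The abstract says that priorities combine sampling, geographic and environmental gaps, but it gives no formula. The sketch below is a minimal illustration under assumptions. It expresses each gap as a 0-10 representativeness score (higher means better conserved ex situ), averages the three scores with equal weight and bins the result into the four categories used above. The record counts, the precomputed geographic and environmental scores, the equal weighting and the cut-offs are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaxonGaps:
    name: str
    germplasm_records: int  # occurrence records backed by a genebank accession
    other_records: int      # records known only from herbaria or observations
    grs: float              # geographic representativeness, 0-10 (assumed precomputed)
    ers: float              # environmental representativeness, 0-10 (assumed precomputed)

def sampling_score(t: TaxonGaps) -> float:
    """Share of all records that are conserved ex situ, scaled to 0-10."""
    total = t.germplasm_records + t.other_records
    return 10 * t.germplasm_records / total if total else 0.0

def collecting_priority(t: TaxonGaps) -> str:
    """Average the three gap scores and bin them (illustrative cut-offs)."""
    if t.germplasm_records == 0:
        return "high"  # nothing of this taxon is conserved ex situ
    score = (sampling_score(t) + t.grs + t.ers) / 3
    if score < 3:
        return "high"
    if score < 5:
        return "medium"
    if score < 7.5:
        return "low"
    return "adequately represented"

# Hypothetical taxa, not taken from the Phaseolus analysis.
taxa = [
    TaxonGaps("taxon_1", 0, 40, 0.0, 0.0),
    TaxonGaps("taxon_2", 10, 30, 3.0, 4.0),
    TaxonGaps("taxon_3", 30, 10, 6.0, 5.0),
    TaxonGaps("taxon_4", 45, 5, 8.0, 9.0),
]
for t in taxa:
    print(t.name, collecting_priority(t))
```

Further filters, such as threat factors or relatedness to the crop, could be added as extra terms or weights in `collecting_priority`.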
Results of the gap analysis method align very well with expert opinion on gaps in ex situ collections, with only a few exceptions. A more detailed prioritization of taxa and geographic areas for collection can be achieved by including predictive threat factors in the analysis, such as climate change or habitat destruction. It can also be achieved by adding further prioritization filters, such as the degree of relatedness to cultivated species (i.e. ease of use in crop breeding). Furthermore, results for multiple crop genepools may be overlaid, which would allow a global analysis of gaps in ex situ collections of the world's plant genetic resources.
In the last 30 years, a number of DNA fingerprinting methods, such as RFLP, RAPD, AFLP, SSR and DArT, have been extensively used in marker development for molecular plant breeding. However, identifying highly polymorphic molecular markers closely linked to a target trait for marker-assisted selection remains a daunting task. Next-generation sequencing (NGS) technology is far more powerful than existing generic DNA fingerprinting methods for generating DNA markers. In this study, we used the grain legume crop Lupinus angustifolius (lupin) as a test case. We examined the utility of an NGS-based method, RAD (restriction-site associated DNA) sequencing, as DNA fingerprinting for the rapid, cost-effective development of markers tagging a disease resistance gene for molecular breeding.
Twenty informative plants from an R × S (disease-resistant × susceptible) cross in lupin were subjected to RAD single-end sequencing using multiplex identifiers. All RAD sequencing products were resolved in two of the 16 lanes available per run on the Solexa HiSeq2000 sequencing platform. A total of 185 million raw reads, approximately 17 Gb of sequence data, were collected. Sequence comparison among the 20 test plants identified 8207 SNP markers. Filtering the sequence data with marker identification parameters led to the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis confirmed that all five markers were linked to the R gene, using marker genotyping data and disease resistance phenotyping data for an F8 population of 186 individual plants. Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM). They are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program.
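The abstract does not list its "marker identification parameters". As an illustration only, a filter for candidate linked SNPs could keep loci at which the resistant (R) plants share one allele and the susceptible (S) plants carry the other. A small number of mismatches could be tolerated, since these may reflect recombination or genotyping error. The homozygous "A"/"B" allele coding, the mismatch tolerance, the minimum number of called plants and the data are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[str]  # "A" or "B" homozygous call, None = missing (assumed)

def linked_snps(snps: Dict[str, Sequence[Genotype]], is_resistant: Sequence[bool],
                max_mismatch: int = 1, min_called: int = 16) -> List[str]:
    """Return SNP ids whose alleles co-segregate with the R/S phenotype."""
    keep = []
    for snp_id, calls in snps.items():
        called = [(g, r) for g, r in zip(calls, is_resistant) if g is not None]
        if len(called) < min_called:
            continue  # too much missing data to judge linkage
        # Try both phases: resistance linked to allele "A" or to allele "B".
        for r_allele in ("A", "B"):
            # A plant mismatches if it carries the R-linked allele but is
            # susceptible, or lacks it but is resistant.
            mismatches = sum((g == r_allele) != r for g, r in called)
            if mismatches <= max_mismatch:
                keep.append(snp_id)
                break
    return keep

# Hypothetical data: 10 resistant plants followed by 10 susceptible plants.
phenotype = [True] * 10 + [False] * 10
snp_calls = {
    "snp_linked": ["A"] * 10 + ["B"] * 10,
    "snp_one_recombinant": ["B"] * 9 + ["A"] + ["A"] * 10,
    "snp_unlinked": ["A", "B"] * 10,
    "snp_sparse": ["A"] * 5 + [None] * 10 + ["B"] * 5,
}
print(linked_snps(snp_calls, phenotype))
```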
We demonstrated that more than 30 molecular markers linked to a target gene for an agronomic trait of interest can be identified from a small portion (1/8) of one HiSeq2000 sequencing run by applying NGS-based RAD sequencing to marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers. These can readily be converted into a high-throughput multiplex format or into low-cost, simple PCR-based markers suitable for large-scale marker implementation in plant breeding programs. High-density, closely linked markers associated with a target trait help to overcome a major bottleneck for implementing molecular markers across a wide range of germplasm in breeding programs. We conclude that NGS-based RAD sequencing used as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.
Studies of Hordeum vulgare subsp. spontaneum, the wild progenitor of cultivated barley, have mostly relied on materials collected decades ago and maintained ex situ in germplasm repositories since then. We analyzed spatial genetic variation in relatively recently collected wild barley populations by exploring sequence variation at seven single-copy nuclear loci. We also inferred the relationships among these populations and with the genepool of the crop. The wild barley collection covers the whole natural distribution area from the Mediterranean to Middle Asia. In contrast to earlier studies, Bayesian assignment analyses revealed three population clusters, located in the Levant, in Turkey, and east of Turkey. Genetic diversity was exceptionally high in the Levant, while eastern populations were depleted of private alleles. Species distribution modeling based on climate parameters and extant occurrence points of the taxon inferred suitable habitat conditions during the ice age, particularly in the Levant and Turkey. Together with the ecologically wide range of habitats, these conditions may have contributed to structured but long-term stable populations in this region and to their high genetic diversity. For recently collected individuals, Bayesian assignment to geographic clusters was generally unambiguous. In contrast, genebank materials often included accessions that were not placed according to their assumed geographic origin or that showed traces of introgression from cultivated barley. We attribute this to gene flow among accessions during ex situ maintenance. Depending on the accessions included, evolutionary studies based on such materials might therefore reach wrong conclusions about the history of the species or the origin and mode of domestication of the crop.
phylogeography; genetic diversity; population genetics; species distribution models; population structure; domestication
The EcoTILLING technique allows polymorphisms in target genes of natural populations to be quickly analysed or identified, and it facilitates the screening of genebank collections for desired traits. We have developed an EcoTILLING platform to exploit Capsicum genetic resources. A clear example of the utility of this platform is its application in searching for new virus-resistance alleles in the genus Capsicum. Mutations in translation initiation factors (eIF4E, eIF(iso)4E, eIF4G and eIF(iso)4G) break the cycle of several RNA viruses without affecting the plant life cycle, which makes these genes potential targets when screening for resistant germplasm.
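In EcoTILLING, a sample amplicon is typically pooled with a reference amplicon, heteroduplexes form wherever the two differ, and a mismatch-specific endonuclease (for example CEL I) cuts at the mismatch. The sizes of the two end-labelled fragments then locate the polymorphism. The sketch below predicts those sizes from two aligned sequences of equal length. It simplifies by ignoring indels and by assuming each cut falls immediately 3' of the mismatched base, so real fragment sizes may differ by a base or so. The sequences are hypothetical.

```python
from typing import List, Tuple

def cleavage_fragments(reference: str, sample: str) -> List[Tuple[int, int, int]]:
    """Return (position, forward_fragment_len, reverse_fragment_len) per mismatch.

    Assumes equal-length, gap-free aligned sequences and one cut just 3' of
    each mismatched base; positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    n = len(reference)
    out = []
    for i, (r, s) in enumerate(zip(reference.upper(), sample.upper())):
        if r != s:
            # Forward-labelled fragment runs from base 1 to the cut;
            # reverse-labelled fragment covers the rest of the amplicon.
            out.append((i + 1, i + 1, n - (i + 1)))
    return out

ref = "ATGGCAACGTTCAGGA"
smp = "ATGGCTACGTTCAGCA"
for pos, fwd, rev in cleavage_fragments(ref, smp):
    print(f"mismatch at {pos}: forward {fwd} nt, reverse {rev} nt")
```

The forward and reverse sizes of each cut add up to the amplicon length. Reading fragments from both ends therefore gives two consistent estimates of where the polymorphism lies.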
We developed and tested a cDNA-based EcoTILLING platform with 233 cultivated accessions of the genus Capsicum. High variability in the coding sequences of the eIF4E and eIF(iso)4E genes was detected using the cDNA platform. After sequencing, 36 nucleotide changes were detected in the CDS of eIF4E and 26 in that of eIF(iso)4E. A total of 21 eIF4E haplotypes and 15 eIF(iso)4E haplotypes were identified. To evaluate the functional relevance of this variability, 31 possible eIF4E/eIF(iso)4E combinations were tested against Potato virus Y (PVY). The results showed that five new eIF4E variants (pvr210, pvr211, pvr212, pvr213 and pvr214) were associated with PVY resistance responses.
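Once the sequenced coding regions are aligned, counting nucleotide changes and haplotypes is straightforward. The sketch assumes gap-free aligned CDS strings of equal length and treats each distinct string as one haplotype. The toy sequences are hypothetical, and they are far shorter than real eIF4E coding sequences.

```python
from collections import Counter
from typing import Dict, List

def variable_sites(seqs: List[str]) -> List[int]:
    """1-based alignment columns at which not all sequences carry the same base."""
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def haplotypes(seqs: List[str]) -> Dict[str, int]:
    """Distinct sequences, each with the number of accessions carrying it."""
    return dict(Counter(seqs))

cds = ["ATGCTA", "ATGCTA", "ATGTTA", "ACGTTA", "ATGCTG"]
print("variable sites:", variable_sites(cds))
print("haplotypes:", len(haplotypes(cds)), haplotypes(cds))
```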
EcoTILLING was optimised in different Capsicum species to detect allelic variants of target genes. This work is the first to use cDNA instead of genomic DNA in EcoTILLING, an approach that avoids problems caused by intronic sequences and reduces the number of reactions. A high level of polymorphism was identified in the initiation factor genes. This reflects the high genetic variability present in our collection and its potential use for other traits, such as genes related to biotic or abiotic stresses, quality or production. Moreover, the new eIF4E and eIF(iso)4E alleles constitute an excellent collection in which to search for new resistance to other RNA viruses.
Linkage maps are an integral resource for dissection of complex genetic traits in plant and animal species. Canonical map construction follows a well-established workflow: an initial discovery phase in which genetic markers are mined from a small pool of individuals, followed by genotyping of selected mapping populations using sets of marker panels. A newly developed sequence-based marker technology, Restriction site Associated DNA (RAD), enables simultaneous single nucleotide polymorphism (SNP) marker discovery and genotyping using massively parallel sequencing. The objective of this research was to assess the utility of RAD markers for linkage map construction, employing barley as a model system. We used the published high-density EST-based SNP map in the Oregon Wolfe Barley (OWB) mapping population as a reference. With it, we created a RAD map using a limited set of prior markers to establish linkage group identity, integrated the RAD and prior data, and used both maps for detection of quantitative trait loci (QTL).
Using the RAD protocol in tandem with the Illumina sequencing-by-synthesis platform, a total of 530 SNP markers were identified from initial scans of the OWB parental inbred lines (the "dominant" and "recessive" marker stocks). These markers were scored in a 93-member doubled haploid (DH) mapping population. RAD sequence data from the structured population were converted into allele genotypes, from which a genetic map was constructed. The assembled RAD-only map consists of 445 markers with an average interval length of 5 cM, while an integrated map includes 463 RAD loci and 2383 prior markers. Sequenced RAD markers are distributed across all seven chromosomes, with polymorphic loci originating from both coding and noncoding regions of the Hordeum genome. Total map lengths are comparable, and the order of common markers is identical in both maps. The same large-effect QTL for reproductive fitness traits were detected with both maps. The majority of these QTL were coincident with a dwarfing gene (ZEO) and the VRS1 gene, which determines the two-row and six-row germplasm groups of barley.
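Converting RAD read data into genotypes is simplest in a doubled haploid population, where every line is expected to be homozygous for one parental allele. The sketch below assumes per-line read counts for the two parental alleles at one SNP. A line is called only when it has enough reads and one allele clearly dominates. The depth and purity thresholds, the line names and the counts are hypothetical, not those used in the study.

```python
from typing import Optional, Tuple

def call_dh(counts: Tuple[int, int], min_depth: int = 5,
            min_fraction: float = 0.9) -> Optional[str]:
    """Call a DH line as parent 'A' or 'B' from (reads_A, reads_B), else None."""
    a, b = counts
    depth = a + b
    if depth < min_depth:
        return None  # too few reads to call reliably
    if a / depth >= min_fraction:
        return "A"
    if b / depth >= min_fraction:
        return "B"
    return None  # mixed reads are unexpected in homozygous DH lines

# Hypothetical read counts (parent A allele, parent B allele) for four lines.
lines = {"DH001": (12, 0), "DH002": (1, 19), "DH003": (2, 1), "DH004": (6, 6)}
calls = {name: call_dh(c) for name, c in lines.items()}
print(calls)
```

Calls for all SNPs across the population form the genotype matrix that linkage mapping software takes as input.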
We demonstrate how sequenced RAD markers can be leveraged to produce high-quality linkage maps for detection of single-gene loci and QTLs. Because they combine SNP discovery and genotyping into parallel sequencing events, RAD markers should be a useful molecular breeding tool for a range of crop species. Expected improvements in the cost and throughput of second- and third-generation sequencing technologies will enable more powerful applications of the sequenced RAD marker system. These include improvements in de novo genome assembly, development of ultra-high-density genetic maps and association mapping.
Global environmental change and increasing human population emphasize the urgent need for higher-yielding and better-adapted crop plants. One strategy to achieve this aim is to exploit the wealth of so-called landraces of crop species, which represent diverse traditional domesticated populations of locally adapted genotypes. In this study, we investigated a comprehensive set of 1485 spring barley landraces (Lrc1485), adapted to a wide range of climates and selected from one of the largest genebanks worldwide. The landraces originated from 5° to 62.5° N and 16° to 71° E. The whole collection was genotyped using 42 SSR markers to assess genetic diversity and population structure. With an average allelic richness of 5.74 and 372 alleles, Lrc1485 harbours considerably more genetic diversity than the most polymorphic current GWAS panel for barley. Ten major clusters, based on geographical origin, ear row type and caryopsis type, defined most of the population structure and were assigned to specific climate zones. The legacy core reference set Lrc648 established in this study will provide a long-lasting resource and a very valuable tool for the scientific community. Lrc648 is best suited for multi-environment field testing to identify candidate genes underlying quantitative traits, and also for allele mining approaches.
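Allelic richness is usually rarefied to a common sample size so that groups of different sizes can be compared. This is why an average of 5.74 sits below the raw count of 372 alleles over 42 loci, which is about 8.9 alleles per locus. The abstract does not state its method. The sketch below implements the standard rarefaction estimator r(g) = Σ_i [1 − C(N − N_i, g) / C(N, g)], where N is the number of gene copies sampled at a locus, N_i is the count of allele i and g is the rarefaction size. The allele counts are hypothetical.

```python
from math import comb
from typing import Sequence

def rarefied_richness(allele_counts: Sequence[int], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies."""
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("rarefaction size must be between 1 and the sample size")
    total = comb(n, g)
    # comb(n - ni, g) is 0 when g > n - ni, i.e. the allele is certain to be drawn.
    return sum(1 - comb(n - ni, g) / total for ni in allele_counts)

counts = [6, 3, 1]  # hypothetical allele counts at one SSR locus (N = 10)
for g in (1, 2, 5, 10):
    print(g, round(rarefied_richness(counts, g), 3))
```

Averaging r(g) over loci at a fixed g, typically the smallest group size, gives a per-locus allelic richness that is comparable across collections.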
Phloem-feeding insects are among the most devastating pests worldwide. They cause damage not only by feeding from the phloem, thereby depleting the plant of photoassimilates, but also by vectoring viruses. Until now, the main way to prevent such problems has been the frequent use of insecticides. Using resistant varieties would be a more environmentally friendly and sustainable solution, but resistant sources first need to be identified. So far, however, no methods suitable for high-throughput phenotyping of plant germplasm to identify sources of resistance to phloem-feeding insects have been available.
In this paper we present a high throughput screening system to identify plants with an increased resistance against aphids. Its versatility is demonstrated using an Arabidopsis thaliana activation tag mutant line collection. This system consists of the green peach aphid Myzus persicae (Sulzer) and the circulative virus Turnip yellows virus (TuYV). In an initial screening, with one plant representing one mutant line, 13 virus-free mutant lines were identified by ELISA. Using seeds produced from these lines, the putative candidates were re-evaluated and characterized, resulting in nine lines with increased resistance towards the aphid.
This M. persicae-TuYV screening system is an efficient, reliable and quick procedure to identify among thousands of mutated lines those resistant to aphids. In our study, nine mutant lines with increased resistance against the aphid were selected among 5160 mutant lines in just 5 months by one person. The system can be extended to other phloem-feeding insects and circulative viruses to identify insect resistant sources from several collections, including for example genebanks and artificially prepared mutant collections.
Phloem-feeding insect; Myzus persicae; Turnip yellows virus; Arabidopsis thaliana; Activation tag
Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphism. Their greater availability and stability compared with simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications. These include cultivar identification, construction of genetic maps, assessment of genetic diversity, detection of genotype/phenotype associations, and marker-assisted breeding. In addition, the efficiency of these activities can be improved because SNP genotyping is easily automated. Expressed sequence tag (EST) sequencing projects in grapevine are allowing the in silico detection of multiple putative sequence polymorphisms within and among a small number of cultivars. In parallel, the genome sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still, the general application of these SNPs requires further validation, since their use could be restricted to the specific genotypes in which they were found.
In order to develop a large SNP set of wide application in grapevine, we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we sequenced 230 gene fragments, which represents the analysis of over 1 Mb of grape DNA sequence. This analysis led to the discovery of 1573 SNPs, with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated at six, with three haplotypes representing over 83% of the analyzed sequences. Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies, we used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies.
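Nucleotide diversity π is the average proportion of differing sites between two sequences drawn from the sample. The sketch below computes it by averaging over all pairs, assuming gap-free aligned sequences of equal length. The toy alignment is hypothetical, and its value is not comparable to the grapevine estimate.

```python
from itertools import combinations
from typing import List

def nucleotide_diversity(seqs: List[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print(round(nucleotide_diversity(alignment), 4))  # 7 differences / (6 pairs x 10 sites)
```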
These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping, we have shown the application of a set of the discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high-throughput genotyping technology provides a powerful tool for grapevine genetic analysis.
Simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers, the two most robust marker types for identifying rice varieties, were compared for the assessment of genetic diversity and population structure. A total of 375 rice varieties from various regions of India, archived at the Indian National GeneBank, NBPGR, New Delhi, were analyzed. Thirty-six markers of each type, hypervariable SSR (HvSSR) and SNP, were used, distributed across the 12 rice chromosomes. The SSR markers amplified a total of 80 alleles, an average of 2.22 alleles per locus, whereas the SNP markers amplified 72 alleles. Polymorphic information content (PIC) values for the HvSSRs ranged from 0.04 to 0.5 with an average of 0.25; for the SNP markers, PIC values ranged from 0.03 to 0.37 with an average of 0.23. In an unrooted tree of genetic relatedness among the varieties, all genotypes were grouped into three major clusters with both SSR and SNP markers. Analysis of molecular variance (AMOVA) indicated that most of the diversity was partitioned among and within individuals rather than among populations. Principal coordinate analysis (PCoA) with SSR markers showed that genotypes were uniformly distributed across the first two axes, which together explained 13.33% of the variation. With SNP markers, the varieties fell into three broad groups across two axes explaining 45.20% of the cumulative variation. Population structure was tested for K values from 1 to 20. Because no clear population structure emerged, ΔK derived from Ln P(D) was plotted against K to determine the number of populations. For SSRs, ΔK peaked at K = 5, whereas for SNPs it peaked at K = 15. This suggests that SNP markers resolved populations more finely, while SSRs were more efficient for diversity analysis.
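The abstract does not state which PIC formula was used. The sketch below uses the classic definition, PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i are the allele frequencies at one locus. This definition is an assumption about the study. It also accounts for the SNP values topping out near 0.37: a biallelic locus reaches its maximum PIC of 0.375 when both alleles have frequency 0.5, whereas a locus with three or more alleles, as some hypervariable SSRs have, can exceed that.

```python
from itertools import combinations
from typing import Sequence

def pic(freqs: Sequence[float]) -> float:
    """Polymorphism information content from the allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - homozygosity - pair_term

print(pic([0.5, 0.5]))               # 0.375, the biallelic (SNP) maximum
print(round(pic([0.3, 0.7]), 4))     # a less balanced SNP scores lower
print(round(pic([1/3, 1/3, 1/3]), 4))  # three equifrequent alleles: about 0.593
```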
Single Nucleotide Polymorphisms (SNPs) can be used as genetic markers for applications such as genetic diversity studies or genetic mapping. New technologies now allow genotyping hundreds to thousands of SNPs in a single reaction.
In order to evaluate the potential of these technologies in pea, we selected a custom 384-SNP set using SNPs discovered in Pisum through the resequencing of gene fragments in different genotypes and by compiling genomic sequence data present in databases. We then designed an Illumina GoldenGate assay to genotype both a Pisum germplasm collection and a genetic mapping population with the SNP set.
We obtained clear allelic data for more than 92% of the SNPs (356 out of 384). Interestingly, the technique was successful for all the genotypes present in the germplasm collection, including those from species or subspecies other than the P. sativum ssp. sativum used to generate the sequences. By genotyping the mapping population with the SNP set, we obtained a genetic map and map positions for 37 new gene markers.
Our results show that the Illumina GoldenGate assay can be used successfully for high-throughput SNP genotyping of diverse germplasm in pea. This genotyping approach will simplify genotyping procedures for association mapping or diversity studies and open new perspectives in legume genomics.
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the ‘Golden Delicious’ genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.
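Infinium genotype calls are made from two normalised probe intensities, X and Y. These are usually summarised as a polar angle θ = (2/π)·arctan(Y/X), which is near 0 for AA, about 0.5 for AB and near 1 for BB, and a total intensity R = X + Y. The sketch below uses fixed θ boundaries and a minimum R, which is much cruder than the cluster fitting done by genotyping software. It does show how probe problems surface: when paralogous annealing or probe divergence shifts or splits the clusters, samples fall between the expected bands or below the intensity cut-off and are left uncalled. The boundaries and intensities are hypothetical.

```python
import math
from typing import Optional, Tuple

def theta_r(x: float, y: float) -> Tuple[float, float]:
    """Polar summary of normalised intensities: theta in [0, 1] and R = X + Y."""
    return (2 / math.pi) * math.atan2(y, x), x + y

def call_genotype(x: float, y: float, aa_max: float = 0.2, bb_min: float = 0.8,
                  ab_band: Tuple[float, float] = (0.35, 0.65),
                  min_r: float = 0.2) -> Optional[str]:
    """Assign AA/AB/BB from fixed theta bands; None when ambiguous or too faint."""
    theta, r = theta_r(x, y)
    if r < min_r:
        return None  # too little signal, e.g. failed sample or poorly binding probe
    if theta <= aa_max:
        return "AA"
    if theta >= bb_min:
        return "BB"
    if ab_band[0] <= theta <= ab_band[1]:
        return "AB"
    return None  # between expected clusters, e.g. a shifted cluster

for x, y in [(1.0, 0.05), (0.5, 0.5), (0.05, 1.0), (0.8, 0.3), (0.05, 0.05)]:
    print((x, y), call_genotype(x, y))
```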
Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies make it possible to examine hundreds to thousands of SNPs in a single reaction for a wide range of applications, such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, and marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantations in southwestern Europe.
We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequence Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from the segregation data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map made it possible to align the 12 linkage groups of both species.
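One way to turn the same-gene idea into a number is sketched below. This is an assumption, not necessarily the estimator used in the study. The sketch treats SNPs within one gene as completely linked in the progeny and coded in phase. Under that assumption, each individual whose calls disagree at two such SNPs contributes exactly one error among its two calls. The gene-to-SNP mapping and the 0/1 genotype coding are hypothetical.

```python
from itertools import combinations


def pair_discordance(calls_a, calls_b):
    """Count individuals called at both SNPs (None = missing) and how many disagree."""
    compared = discordant = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            discordant += 1
    return compared, discordant


def error_rate(snps_by_gene):
    """Per-call error estimate from all SNP pairs within the same gene.

    snps_by_gene maps a (hypothetical) gene ID to a list of per-SNP call vectors.
    Assumes no recombination within a gene and in-phase coding, so a disagreement
    means one of the two calls is wrong.
    """
    total_compared = total_discordant = 0
    for snps in snps_by_gene.values():
        for a, b in combinations(snps, 2):
            c, d = pair_discordance(a, b)
            total_compared += c
            total_discordant += d
    # Each comparison involves two calls, one of which is assumed wrong on disagreement.
    return total_discordant / (2 * total_compared) if total_compared else 0.0
```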
Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species whose genome is seven times the size of the human genome. This SNP array will be extended through recent sequencing efforts using next-generation sequencing technologies and will include SNPs from comparative orthologous sequences identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers.
The need for higher-yielding and better-adapted crop plants to feed the world's rapidly growing population has raised the question of how to systematically utilize large genebank collections with their wide range of largely untouched genetic diversity. Phenotypic data recorded over decades during successive rounds of seed multiplication provide a rich source of information. Their usefulness has remained limited, though, owing to various biases introduced by conservation management over time and by changing environmental conditions. Here, we present a powerful procedure that permits an unbiased, trait-based selection of plant samples based on such phenotypic data. Applying this technique to the wheat collection of one of the largest genebanks worldwide, we identified groups of plant samples displaying contrasting phenotypes for selected traits. As a proof of concept for our discovery pipeline, we resequenced the entire Ppd-D1 locus, a major but conserved flowering-time locus, in just a few such selected wheat samples and nearly doubled the number of hitherto known alleles.
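The abstract does not spell out the selection procedure, so the following is only an illustration of the underlying problem, not the authors' method. Historical genebank records are unbalanced: each accession is observed in only a few regeneration years, so its raw mean absorbs the conditions of those years. The sketch fits a simple additive model (trait = accession effect + year effect) by backfitting. Accessions grown in several years connect the years, which allows year-adjusted accession effects to be recovered. The record layout and the iteration count are assumptions.

```python
from collections import defaultdict


def adjusted_means(records, n_iter=50):
    """Year-adjusted accession effects under an additive accession + year model.

    records: list of (accession, year, value) tuples. Backfitting alternates
    between updating accession effects and year effects; year effects are
    centered to zero so accession effects stay on the trait scale.
    """
    year_eff = defaultdict(float)
    acc_eff = {}
    for _ in range(n_iter):
        # Accession effects given the current year effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for acc, yr, val in records:
            sums[acc] += val - year_eff[yr]
            counts[acc] += 1
        acc_eff = {a: sums[a] / counts[a] for a in sums}
        # Year effects given the current accession effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for acc, yr, val in records:
            sums[yr] += val - acc_eff[acc]
            counts[yr] += 1
        year_eff = defaultdict(float, {y: sums[y] / counts[y] for y in sums})
        mean_year = sum(year_eff.values()) / len(year_eff)
        for y in year_eff:
            year_eff[y] -= mean_year
    return acc_eff
```

In the test data, accession A (grown only in a poor year) and accession B (grown only in a good year) have raw means of 0 and 10. Once accession C, grown in both years, links the two environments, both have an adjusted effect of 5.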
Current breeding approaches in potato rely almost entirely on phenotypic evaluations; molecular markers, with the exception of a few linked to disease resistance traits, are not widely used. Large-scale sequence datasets generated primarily through Sanger Expressed Sequence Tag projects are available from a limited number of potato cultivars and access to next generation sequencing technologies permits rapid generation of sequence data for additional cultivars. When coupled with the advent of high throughput genotyping methods, an opportunity now exists for potato breeders to incorporate considerably more genotypic data into their decision-making.
To identify a large number of Single Nucleotide Polymorphisms (SNPs) in elite potato germplasm, we sequenced normalized cDNA prepared from three commercial potato cultivars: 'Atlantic', 'Premier Russet' and 'Snowden'. For each cultivar, we generated 2 Gb of sequence, which was assembled into a representative transcriptome of ~28-29 Mb. Using the Maq SNP filter, which filters on read depth, density, and quality, we identified 575,340 SNPs within these three cultivars. In parallel, 2,358 SNPs were identified within existing Sanger sequences for three additional cultivars, 'Bintje', 'Kennebec', and 'Shepody'. Using a stringent set of filters in conjunction with the potato reference genome, we identified 69,011 high-confidence SNPs from these six cultivars for use in genotyping with the Infinium platform. Ninety-six of these SNPs were used in a BeadXpress assay to assess allelic diversity in a germplasm panel of 248 lines; 82 of the SNPs proved sufficiently informative for subsequent analyses. Within diverse North American germplasm, the chip processing market class was the most distinct, clearly separated from all other market classes. The round white and russet market classes both include fresh market and processing cultivars. Nevertheless, the russet and round white market classes are more distant from each other than processing types are from fresh market types within these two groups.
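The filtering step can be sketched as follows. The sketch follows the spirit of the depth, quality, and density criteria mentioned above but does not reproduce the Maq SNP filter's actual options or defaults. The SnpCandidate record and all thresholds are hypothetical. Dense SNP clusters are dropped because, in transcriptome data, they often come from misaligned paralogous reads.

```python
from dataclasses import dataclass


@dataclass
class SnpCandidate:
    contig: str
    pos: int
    depth: int
    qual: float


def filter_snps(cands, min_depth=4, min_qual=20.0, window=10, max_in_window=2):
    """Keep candidates passing depth/quality, then drop SNP-dense clusters.

    A candidate is discarded if more than max_in_window passing SNPs (itself
    included) lie within `window` bp of it on the same contig.
    """
    passing = [c for c in cands if c.depth >= min_depth and c.qual >= min_qual]
    kept = []
    for c in passing:
        neighbors = sum(
            1 for o in passing
            if o.contig == c.contig and abs(o.pos - c.pos) <= window
        )
        if neighbors <= max_in_window:
            kept.append(c)
    return kept
```

The neighbor count is a quadratic scan, which is fine for an illustration; a real pipeline would sort candidates by position and use a sliding window.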
The genotype data generated in this study, albeit limited in number, has revealed distinct relationships among the market classes of potato. The SNPs identified in this study will enable high-throughput genotyping of germplasm and populations, which in turn will enable more efficient marker-assisted breeding efforts in potato.
Melon (Cucumis melo L.) is a highly diverse species that is cultivated worldwide. Recent advances in massively parallel sequencing have begun to allow the study of nucleotide diversity in this species. The Sanger method combined with medium-throughput 454 technology was used in a previous study to analyze the genetic diversity of germplasm representing 3 botanical varieties, yielding a collection of about 40,000 SNPs distributed in 14,000 unigenes. However, the usefulness of this resource is limited because the sequenced genotypes do not represent the whole diversity of the species, which is divided into two subspecies with many botanical varieties that vary in plant, flowering, and fruit traits, as well as in stress response. As a first step to extensively document levels and patterns of nucleotide variability across the species, we used the high-throughput SOLiD™ system to resequence the transcriptomes of a set of 67 genotypes that had previously been selected from a core collection representing the extant variation of the entire species.
The deep transcriptome resequencing of all the genotypes, grouped into 8 pools (wild African agrestis, Asian agrestis and acidulus, exotic Far Eastern conomon, Indian momordica and Asian dudaim and flexuosus, commercial cantalupensis, subsp. melo Asian and European landraces, Spanish inodorus landraces, and Piel de Sapo breeding lines), yielded about 300 million reads. Short reads were mapped to the recently generated draft genome assembly of the DHL line Piel de Sapo (inodorus) x Songwhan Charmi (conomon) and to a new version of the melon transcriptome. Regions with at least 6X coverage were used for SNV calling, generating a melon collection of 303,883 variants. These SNVs were dispersed across the entire C. melo genome and distributed in 15,064 annotated genes. The number and variability of in silico SNVs differed considerably between pools. Our finding of higher genomic diversity in wild and exotic agrestis melons from India and Africa, compared with commercial cultivars, cultigens and landraces from Eastern Europe, Western Asia and the Mediterranean basin, is consistent with the evolutionary history proposed for the species. Group-specific SNVs that will be useful in introgression programs were also detected. In a sample of 143 selected putative SNPs, we verified 93% of the polymorphisms in a panel of 78 genotypes.
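A minimal per-site caller for pooled reads illustrates the 6X coverage rule. Only the coverage threshold comes from the text. The minimum count and frequency for the alternative allele are hypothetical values meant to reject isolated sequencing errors. The pileup is simplified to a string of the bases observed at one site.

```python
from collections import Counter


def call_snv(ref_base, pileup, min_cov=6, min_alt_count=2, min_alt_frac=0.1):
    """Return the most frequent non-reference base at a site, or None.

    pileup: read bases observed at the site, pooled across one group. Non-ACGT
    symbols (e.g. N) are ignored and do not count toward coverage.
    """
    bases = [b for b in pileup.upper() if b in "ACGT"]
    if len(bases) < min_cov:
        return None  # below the 6X coverage requirement
    counts = Counter(b for b in bases if b != ref_base.upper())
    if not counts:
        return None
    alt, n = counts.most_common(1)[0]
    if n >= min_alt_count and n / len(bases) >= min_alt_frac:
        return alt
    return None
```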
This study provides the first comprehensive resequencing data for wild, exotic, and cultivated (landraces and commercial) melon transcriptomes, yielding the largest melon SNP collection available to date and representing a notable sample of the species diversity. This data provides a valuable resource for creating a catalog of allelic variants of melon genes and it will aid in future in-depth studies of population genetics, marker-assisted breeding, and gene identification aimed at developing improved varieties.
Cultivated peanut, or groundnut (Arachis hypogaea L.), is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). In recent years, many efforts have been made to construct linkage maps in cultivated peanut, but almost all of these maps were constructed using low-throughput molecular markers, and most show a low density, directly influencing the value of their applications. With advances in next-generation sequencing (NGS) technology, the construction of high-density genetic maps has become more achievable in a cost-effective and rapid manner. The objective of this study was to establish a high-density single nucleotide polymorphism (SNP)-based genetic map for cultivated peanut by analyzing next-generation double-digest restriction-site-associated DNA sequencing (ddRADseq) reads.
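In ddRADseq, the genome is reduced to the fragments cut by two different restriction enzymes that also fall within a selected size range. The abstract does not state the enzymes or size window used in this study. The sketch below uses EcoRI (GAATTC), MspI (CCGG), and a 200 to 400 bp window purely as examples. Both recognition sites are palindromic, so scanning the forward strand also finds sites on the reverse strand. For simplicity, cut positions are taken as the start of each recognition site.

```python
import re


def ddrad_fragments(seq, site_a="GAATTC", site_b="CCGG", min_len=200, max_len=400):
    """Return (start, end) of fragments flanked by one site of each enzyme.

    Enzymes and size window are illustrative assumptions. Lookahead patterns
    let overlapping recognition sites be reported.
    """
    seq = seq.upper()
    cuts = [(m.start(), "A") for m in re.finditer(f"(?={re.escape(site_a)})", seq)]
    cuts += [(m.start(), "B") for m in re.finditer(f"(?={re.escape(site_b)})", seq)]
    cuts.sort()
    frags = []
    # Consecutive cut sites delimit fragments; keep only mixed-enzyme ends in range.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            frags.append((p1, p2))
    return frags
```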
We constructed reduced representation libraries (RRLs) for two A. hypogaea lines and 166 of their recombinant inbred line (RIL) progenies using the ddRADseq technique. Approximately 175 gigabases of data containing 952,679,665 paired-end reads were obtained by Solexa sequencing. Mining this dataset, we detected 53,257 SNPs between the parents, of which 14,663 were also detected in the population, and 1,765 of the resulting polymorphic markers met the requirements for use in constructing a genetic map. Of 50 randomly selected in silico SNPs, 47 were successfully validated. One linkage map was constructed, comprising 1,685 marker loci, including 1,621 SNPs and 64 simple sequence repeat (SSR) markers. The markers were distributed across 20 linkage groups (LGs A01–A10 and B01–B10), spanning a distance of 1,446.7 cM. The LGs of this map were compared and aligned with a previously integrated peanut consensus map.
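The abstract does not list the requirements markers had to meet for map construction. A common criterion for RIL populations, shown here as an assumption, is that a marker segregates close to the expected 1:1 ratio of parental alleles. The sketch applies a one-degree-of-freedom chi-square test to one marker's calls across the progeny. Genotypes are coded with the hypothetical labels "A" (parent 1) and "B" (parent 2).

```python
import math


def segregation_test(genotypes):
    """Chi-square test of 1:1 parental-allele segregation for one RIL marker.

    Calls other than "A" or "B" (heterozygous or missing) are ignored, since
    RILs are expected to be nearly homozygous. Returns (chi2, p_value).
    """
    a = genotypes.count("A")
    b = genotypes.count("B")
    n = a + b
    if n == 0:
        return float("nan"), float("nan")
    expected = n / 2
    chi2 = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

A marker with an 83:83 split among 166 RILs gives chi-square = 0 and p = 1. A 100:66 split gives chi-square ≈ 6.96 and p < 0.01, which would typically flag the marker as distorted.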
This study showed that the ddRAD library combined with NGS allowed the rapid discovery of a large number of SNPs in the cultivated peanut. The first high density SNP-based linkage map for A. hypogaea was generated that can serve as a reference map for cultivated Arachis species and will be useful in genetic mapping. Our results contribute to the available molecular marker resources and to the assembly of a reference genome sequence for the peanut.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-351) contains supplementary material, which is available to authorized users.
Cultivated peanut; Linkage map; SNP; ddRADseq
Amorphophallus is a genus of perennial plants widely distributed in the tropics and subtropics of West Africa and South Asia. Its corms contain a high level of water-soluble glucomannan; therefore, it has long been used as a medicinal herb and food source. Genetic studies of Amorphophallus have been hindered by a lack of genetic markers. A large number of molecular markers are required for studying genetic diversity and improving disease resistance in Amorphophallus. Here, we report large-scale transcriptome sequencing of two species, Amorphophallus konjac and Amorphophallus bulbifer, using deep sequencing technology, and the identification of microsatellite (SSR) markers from these transcriptome sequences.
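SSR discovery in assembled unigenes amounts to searching for perfect tandem repeats of 2 to 6 bp motifs. The sketch below does this with a back-referencing regular expression. The minimum repeat counts are hypothetical settings, not the parameters used in the study. Mononucleotide runs are ignored. Motifs that are themselves repeats of a shorter unit (for example AGAG) are skipped, so an (AG)n run is reported as a dinucleotide repeat.

```python
import re

# Hypothetical minimum number of repeat units per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs made of a shorter repeating unit (AA, AGAG, AGGAGG, ...).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```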
cDNAs of A. konjac and A. bulbifer were sequenced using Illumina HiSeq™ 2000 sequencing technology. A total of 135,822 non-redundant unigenes were assembled from about 9.66 gigabases, and 19,596 SSRs were identified in 16,027 non-redundant unigenes. Dinucleotide SSRs were the most abundant motif type (61.6%), followed by tri- (30.3%), tetra- (5.6%), penta- (1.5%), and hexanucleotide (1%) repeats. The most frequent di- and trinucleotide repeat motifs were AG/CT (45.2%) and AGG/CCT (7.1%), respectively. A total of 10,754 primer pairs were designed for marker development. Of these, 320 primer pairs were synthesized and used to validate amplification and assess polymorphism in 25 individual plants. In total, 275 primer pairs yielded PCR amplification products, of which 205 were polymorphic. The number of alleles ranged from 2 to 14, and polymorphism information content (PIC) values ranged from 0.10 to 0.90. Genetic diversity analysis was performed using 177 highly polymorphic SSR markers. A phenogram based on Jaccard's similarity coefficients was constructed, which showed distinct clustering among the 25 Amorphophallus individuals.
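The two summary statistics in this paragraph can be computed as sketched below. The abstract does not say which PIC formula was used. The sketch implements the Botstein et al. (1980) definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, from allele frequencies. Jaccard's coefficient for two band-presence profiles is the number of shared bands divided by the number of bands present in either individual. A phenogram would then be built from the resulting similarity matrix, for example by UPGMA clustering, which is not shown.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content (Botstein et al. definition)."""
    hom = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - hom - pair_term


def jaccard(bands_x, bands_y):
    """Jaccard similarity of two 0/1 band-presence vectors.

    Defined here as 1.0 when neither individual has any band.
    """
    shared = sum(1 for a, b in zip(bands_x, bands_y) if a and b)
    either = sum(1 for a, b in zip(bands_x, bands_y) if a or b)
    return shared / either if either else 1.0
```

A marker with two equally frequent alleles reaches the biallelic maximum PIC of 0.375, while four equally frequent alleles give about 0.70.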
Using transcriptome sequencing, we developed a total of 10,754 SSR markers (designed primer pairs) in Amorphophallus. One hundred and seventy-seven polymorphic markers were successfully validated in 25 individuals. The large number of genetic markers developed in the present study should contribute greatly to research into genetic diversity and germplasm characterization in Amorphophallus.
Amorphophallus; Microsatellite marker; Transcriptome; Genetic diversity
Rice is the world’s most important staple crop and is grown by millions of smallholder farmers. Sustaining rice production relies on the intelligent use of rice diversity. The 3,000 Rice Genomes Project is a giga-dataset of publicly available genome sequences (averaging 14× depth of coverage) derived from 3,000 accessions of rice with global representation of genetic and functional diversity. The seed of these accessions is available from the International Rice Genebank Collection. Together, they are an unprecedented resource for advancing rice science and breeding technology. Our immediate challenge now is to comprehensively and systematically mine this dataset to link genotypic variation to functional variation, with the ultimate goal of creating new and sustainable rice varieties that can support a future world population that will approach 9.6 billion by 2050.
Oryza sativa; Genetic resources; Genome diversity; Phenomics; Sequencing
Genotyping by sequencing, a new low-cost, high-throughput sequencing technology, was used to genotype 2,815 maize inbred accessions, preserved mostly at the National Plant Germplasm System in the USA. The collection includes inbred lines from breeding programs all over the world.
The method produced 681,257 single-nucleotide polymorphism (SNP) markers distributed across the entire genome, with the ability to detect rare alleles at high confidence levels. More than half of the SNPs in the collection are rare. Although most rare alleles have been incorporated into public temperate breeding programs, only a modest amount of the available diversity is present in the commercial germplasm. Analysis of genetic distances shows population stratification, including a small number of large clusters centered on key lines. Nevertheless, an average fixation index of 0.06 indicates moderate differentiation between the three major maize subpopulations. Linkage disequilibrium (LD) decays very rapidly, but the extent of LD is highly dependent on the particular group of germplasm and region of the genome. The utility of these data for performing genome-wide association studies was tested with two simply inherited traits and one complex trait. We identified trait associations at SNPs very close to known candidate genes for kernel color, sweet corn, and flowering time; however, results suggest that more SNPs are needed to better explore the genetic architecture of complex traits.
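The fixation index can be illustrated with Wright's definition F_ST = (H_T - H_S) / H_T. Here H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity of the pooled population. The sketch assumes biallelic SNPs, equally weighted subpopulations, and known allele frequencies. The abstract does not state the estimator behind the reported average of 0.06, and sample-size-corrected estimators such as Weir and Cockerham's would give somewhat different values. Multi-SNP values are combined as a ratio of sums, which is less sensitive to nearly monomorphic SNPs than averaging per-SNP ratios.

```python
def heterozygosities(freqs):
    """Return (H_T, H_S) for one biallelic SNP from per-subpopulation allele frequencies."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return h_t, h_s


def fst_single(freqs):
    """Wright's F_ST for one SNP; 0.0 if the SNP is monomorphic overall."""
    h_t, h_s = heterozygosities(freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def fst_multi(snp_freqs):
    """Genome-wide F_ST as the ratio of summed numerators to summed H_T."""
    num = den = 0.0
    for freqs in snp_freqs:
        h_t, h_s = heterozygosities(freqs)
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0
```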
The genotypic information described here allows this publicly available panel to be exploited by researchers facing the challenges of sustainable agriculture through better knowledge of the nature of genetic diversity.
Diversity; Genotyping by sequencing; Germplasm; Maize; Public