The advantages of using molecular markers in modern genebanks are well documented. They are commonly used to understand the distribution of genetic diversity in populations and among species, which is crucial for efficient management and effective utilization of germplasm collections. We describe the development of two types of DArT molecular marker platforms for the new oilseed crop lesquerella (Physaria spp.), a member of the Brassicaceae family, to characterize a collection in the National Plant Germplasm System (NPGS) whose genetic diversity and traits are relatively poorly known. The two platforms were developed using a subset of the germplasm conserved ex situ, consisting of 87 Physaria and 2 Paysonia accessions. The microarray DArT revealed a total of 2,833 polymorphic markers with an average genotype call rate of 98.4% and a scoring reproducibility of 99.7%. The DArTseq platform, developed for SNP and DArT markers from short sequence reads, yielded a total of 27,748 high-quality markers. Cluster analysis and principal coordinate analysis indicated that both systems successfully classified the accessions by species, geographical source, and breeding status. In the germplasm set analyzed, which represented more than 80% of the P. fendleri collection, we observed that a substantial amount of variation exists in the species collection. These markers will be valuable in germplasm management studies and lesquerella breeding, and will augment the microsatellite markers previously developed for the taxa.
Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes.
Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins.
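The SNP spacing filter described above (at least 50 bp from any intron-exon junction and from any flanking SNP) can be sketched as follows. This is a minimal illustration and not the authors' in-house Perl scripts: the function name, the coordinate lists and the choice to compare each SNP against all candidate SNPs (rather than only those already retained) are assumptions.

```python
def filter_snps(snp_positions, junction_positions, min_distance=50):
    """Keep SNPs that lie at least `min_distance` bp from every junction
    and from every other candidate SNP on the same contig.

    All positions are contig coordinates; the values used below are
    hypothetical. Each SNP is compared against all candidates, so two
    SNPs that are too close to each other are both discarded.
    """
    kept = []
    for pos in snp_positions:
        near_junction = any(abs(pos - j) < min_distance
                            for j in junction_positions)
        near_other_snp = any(other != pos and abs(pos - other) < min_distance
                             for other in snp_positions)
        if not (near_junction or near_other_snp):
            kept.append(pos)
    return kept


# Hypothetical contig with five candidate SNPs and two predicted junctions.
candidate_snps = [10, 100, 120, 300, 500]
junctions = [60, 520]
retained = filter_snps(candidate_snps, junctions)
```

Here the SNP at position 100 is dropped for being 40 bp from a junction, 120 for being 20 bp from another SNP, and 500 for being 20 bp from a junction. The quadratic comparison is acceptable for per-contig SNP counts; a sorted scan would be preferable for genome-scale input.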
Before the pepper genome sequence became available, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project.
Pepper; Capsicum spp; Molecular Markers; EST; Transcriptome; RNAseq; Annotation; SNP; SSR; SPP
Single Nucleotide Polymorphisms (SNPs) can be used as genetic markers for applications such as genetic diversity studies or genetic mapping. New technologies now allow genotyping hundreds to thousands of SNPs in a single reaction.
In order to evaluate the potential of these technologies in pea, we selected a custom 384-SNP set using SNPs discovered in Pisum through the resequencing of gene fragments in different genotypes and by compiling genomic sequence data present in databases. We then designed an Illumina GoldenGate assay to genotype both a Pisum germplasm collection and a genetic mapping population with the SNP set.
We obtained clear allelic data for more than 92% of the SNPs (356 out of 384). Interestingly, the technique was successful for all the genotypes present in the germplasm collection, including those from species or subspecies different from the P. sativum ssp sativum used to generate sequences. By genotyping the mapping population with the SNP set, we obtained a genetic map and map positions for 37 new gene markers.
Our results show that the Illumina GoldenGate assay can be used successfully for high-throughput SNP genotyping of diverse germplasm in pea. This approach will simplify genotyping procedures for association mapping and diversity studies and open new perspectives in legume genomics.
Current breeding approaches in potato rely almost entirely on phenotypic evaluations; molecular markers, with the exception of a few linked to disease resistance traits, are not widely used. Large-scale sequence datasets generated primarily through Sanger Expressed Sequence Tag projects are available from a limited number of potato cultivars and access to next generation sequencing technologies permits rapid generation of sequence data for additional cultivars. When coupled with the advent of high throughput genotyping methods, an opportunity now exists for potato breeders to incorporate considerably more genotypic data into their decision-making.
To identify a large number of Single Nucleotide Polymorphisms (SNPs) in elite potato germplasm, we sequenced normalized cDNA prepared from three commercial potato cultivars: 'Atlantic', 'Premier Russet' and 'Snowden'. For each cultivar, we generated 2 Gb of sequence, which was assembled into a representative transcriptome of ~28-29 Mb. Using the Maq SNP filter, which filters on read depth, density, and quality, 575,340 SNPs were identified within these three cultivars. In parallel, 2,358 SNPs were identified within existing Sanger sequences for three additional cultivars, 'Bintje', 'Kennebec', and 'Shepody'. Using a stringent set of filters in conjunction with the potato reference genome, we identified 69,011 high confidence SNPs from these six cultivars for use in genotyping with the Infinium platform. Ninety-six of these SNPs were used with a BeadXpress assay to assess allelic diversity in a germplasm panel of 248 lines; 82 of the SNPs proved sufficiently informative for subsequent analyses. Within diverse North American germplasm, the chip processing market class was most distinct, clearly separated from all other market classes. The round white and russet market classes both include fresh market and processing cultivars. Nevertheless, the russet and round white market classes are more distant from each other than processing types are from fresh market types within these two groups.
The genotype data generated in this study, albeit limited in number, have revealed distinct relationships among the market classes of potato. The SNPs identified in this study will enable high-throughput genotyping of germplasm and populations, which in turn will enable more efficient marker-assisted breeding efforts in potato.
Research related to crop domestication has been transformed by technologies and discoveries in the genome sciences as well as information-related sciences that are providing new tools for bioinformatics and systems biology. Rapid progress in archaeobotany and ethnobotany is also contributing new knowledge to the understanding of crop domestication. This sense of rapid progress is encapsulated in this Special Issue, which contains 18 papers by scientists in botanical, crop sciences and related disciplines on the topic of crop domestication. One paper focuses on current themes in the genetics of crop domestication across crops, whereas other papers have a crop or geographic focus. One feature of progress in the sciences related to crop domestication is the availability of well-characterized germplasm resources in the global network of genetic resources centres (genebanks). Germplasm in genebanks is providing research materials for understanding domestication as well as for plant breeding. In this review, we highlight current genetic themes related to crop domestication. Impressive progress in this field in recent years is transforming plant breeding into crop engineering to meet the human need for increased crop yield with the minimum environmental impact – we consider this to be ‘super-domestication’. While the time scale of domestication of 10 000 years or less is a very short evolutionary time span, the details emerging of what has happened and what is happening provide a window to see where domestication might – and can – advance in the future.
Evolution; gene cloning; gene pyramiding; gene duplication; marker assisted selection; QTL; crop wild relatives
The distinctness of, and overlap between, pea genotypes held in several Pisum germplasm collections has been used to determine their relatedness and to test previous ideas about the genetic diversity of Pisum. Our characterisation of genetic diversity among 4,538 Pisum accessions held in 7 European Genebanks has identified sources of novel genetic variation, and both reinforces and refines previous interpretations of the overall structure of genetic diversity in Pisum. Molecular marker analysis was based upon the presence/absence of polymorphism of retrotransposon insertions scored by a high-throughput microarray and SSAP approaches. We conclude that the diversity of Pisum constitutes a broad continuum, with graded differentiation into sub-populations which display various degrees of distinctness. The most distinct genetic groups correspond to the named taxa while the cultivars and landraces of Pisum sativum can be divided into two broad types, one of which is strongly enriched for modern cultivars. The addition of germplasm sets from six European Genebanks, chosen to represent high diversity, to a single collection previously studied with these markers resulted in modest additions to the overall diversity observed, suggesting that the great majority of the total genetic diversity collected for the Pisum genus has now been described. Two interesting sources of novel genetic variation have been identified. Finally, we have proposed reference sets of core accessions with a range of sample sizes to represent Pisum diversity for the future study and exploitation by researchers and breeders.
Electronic supplementary material
The online version of this article (doi:10.1007/s00122-012-1839-1) contains supplementary material, which is available to authorized users.
The EcoTILLING technique allows polymorphisms in target genes of natural populations to be quickly analysed or identified and facilitates the screening of genebank collections for desired traits. We have developed an EcoTILLING platform to exploit Capsicum genetic resources. A clear example of the utility of this platform is its application in searching for new virus-resistance alleles in the genus Capsicum. Mutations in translation initiation factors (eIF4E, eIF(iso)4E, eIF4G and eIF(iso)4G) break the cycle of several RNA viruses without affecting the plant life cycle, which makes these genes potential targets to screen for resistant germplasm.
We developed and assayed a cDNA-based EcoTILLING platform with 233 cultivated accessions of the genus Capsicum. High variability in the coding sequences of the eIF4E and eIF(iso)4E genes was detected using the cDNA platform. After sequencing, 36 nucleotide changes were detected in the CDS of eIF4E and 26 in eIF(iso)4E. A total of 21 eIF4E haplotypes and 15 eIF(iso)4E haplotypes were identified. To evaluate the functional relevance of this variability, 31 possible eIF4E/eIF(iso)4E combinations were tested against Potato virus Y. The results showed that five new eIF4E variants (pvr210, pvr211, pvr212, pvr213 and pvr214) were related to PVY-resistance responses.
EcoTILLING was optimised in different Capsicum species to detect allelic variants of target genes. This work is the first to use cDNA instead of genomic DNA in EcoTILLING. This approach avoids intronic sequence problems and reduces the number of reactions. A high level of polymorphism has been identified for initiation factors, showing the high genetic variability present in our collection and its potential use for other traits, such as genes related to biotic or abiotic stresses, quality or production. Moreover, the new eIF4E and eIF(iso)4E alleles are an excellent collection for searching for new resistance against other RNA viruses.
Ex-situ conservation of crop diversity is a global concern, and the development of an efficient and sustainable conservation system is a historic priority recognized in international law and policy. We assess the completeness of the safety duplication collection in the Svalbard Global Seed Vault with respect to data on the world's ex-situ collections as reported by the Food and Agriculture Organization of the United Nations. Currently, 774,601 samples are deposited at Svalbard by 53 genebanks. We estimate that more than one third of the globally distinct accessions of 156 crop genera stored in genebanks as orthodox seeds are conserved in the Seed Vault. The numbers of safety duplicates of Triticum (wheat), Sorghum (sorghum), Pennisetum (pearl millet), Eleusine (finger millet), Cicer (chickpea) and Lens (lentil) exceed 50% of the estimated numbers of distinct accessions in global ex-situ collections. The number of accessions conserved globally generally reflects importance for food production, but there are significant gaps in the safety collection at Svalbard in some genera of high importance for food security in tropical countries, such as Amaranthus (amaranth), Chenopodium (quinoa), Eragrostis (teff) and Abelmoschus (okra). In the 29 food-crop genera with the largest number of accessions stored globally, an average of 5.5 out of the ten largest collections is already represented in the Seed Vault collection or is covered by existing deposit agreements. The high coverage of ITPGRFA Annex 1 crops and of those crops for which there is a CGIAR mandate in the current Seed Vault collection indicates that the existence of international policies and institutions is an important determinant of which accessions are safety duplicated at Svalbard. As a back-up site for the global conservation system, the Seed Vault plays not only a practical but also a symbolic role for enhanced integration and cooperation for conservation of crop diversity.
The National Institute of Agrobiological Sciences (NIAS) is implementing the NIAS Genebank Project for conservation and promotion of agrobiological genetic resources to contribute to the development and utilization of agriculture and agricultural products. The project’s databases (NIASGBdb; http://www.gene.affrc.go.jp/databases_en.php) consist of a genetic resource database and a plant diseases database, linked by a web retrieval database. The genetic resources database has plant and microorganism search systems to provide information on research materials, including passport and evaluation data for genetic resources with the desired properties. To facilitate genetic diversity research, several NIAS Core Collections have been developed. The NIAS Rice (Oryza sativa) Core Collection of Japanese Landraces contains information on simple sequence repeat (SSR) polymorphisms. SSR marker information for azuki bean (Vigna angularis) and black gram (V. mungo) and DNA sequence data from some selected Japanese strains of the genus Fusarium are also available. A database of plant diseases in Japan has been developed based on the listing of common names of plant diseases compiled by the Phytopathological Society of Japan. Relevant plant and microorganism genetic resources are associated with the plant disease names by the web retrieval database and can be obtained from the NIAS Genebank for research or educational purposes.
Mutations in the mitochondrial DNA (mtDNA) have been reported in a wide variety of human neoplasms. A polynucleotide tract extending from nucleotide positions 303 to 315 (D310) within the non-coding region of mtDNA has been identified as a mutational hotspot of primary tumors. This region consists of two polycytosine stretches interrupted by a thymidine nucleotide. The numbers of cytosines in the first and second stretches are 7 and 5, respectively, according to the GenBank sequence. The first stretch exhibits a polymorphic length variation (6-C to 9-C) among individuals and has been investigated in many cancer types. Large-scale studies are needed to clarify the relationship between cytosine number and cancer development/progression. However, time-consuming and costly methods, such as radioactivity-based gel electrophoresis and sequencing, are not appropriate for determining this polymorphism in large case-control studies. In this study, we conducted a rapid RFLP analysis using a restriction enzyme, BsaXI, for the simple single-step determination of 7-C carriers at the first stretch in the D310 region.
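The structure of the D310 tract (two polycytosine stretches separated by a single thymidine, 7 and 5 cytosines in the reference) can be illustrated with a short parsing sketch. This is an in-silico illustration of the polymorphism only, not the BsaXI RFLP assay; the flanking bases in the example sequences are hypothetical, and the function assumes the input has already been trimmed to the tract and its immediate flanks.

```python
import re

# First poly-C stretch, the interrupting T, then the second poly-C stretch.
D310_PATTERN = re.compile(r"(C+)T(C+)")


def d310_cytosine_counts(sequence):
    """Return (first_stretch, second_stretch) cytosine counts.

    Assumes `sequence` covers the D310 tract with flanks that do not
    themselves contain a C...T...C motif upstream of the tract.
    """
    match = D310_PATTERN.search(sequence.upper())
    if match is None:
        raise ValueError("no poly-C / T / poly-C tract found")
    return len(match.group(1)), len(match.group(2))


def is_7c_carrier(sequence):
    """True when the first stretch carries exactly seven cytosines."""
    return d310_cytosine_counts(sequence)[0] == 7


# Hypothetical flanks (AAA / GGG) around a reference-like and an 8-C tract.
reference_like = "AAA" + "C" * 7 + "T" + "C" * 5 + "GGG"
variant_8c = "AAA" + "C" * 8 + "T" + "C" * 5 + "GGG"
```

The reference configuration of 7 + 1 + 5 bases accounts for the 13 positions from 303 to 315. Real reads would need alignment to the reference before the tract can be located reliably, and heteroplasmic samples cannot be represented by a single sequence string.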
Twenty-five colorectal cancer patients, 25 breast cancer patients and 41 healthy individuals were enrolled in the study. PCR amplification followed by restriction enzyme digestion of the D310 region was performed for RFLP analysis. Digestion products were analysed by agarose gel electrophoresis. Samples were also sequenced to confirm the RFLP data.
Samples containing 7-C at the first stretch of the D310 region were successfully identified by the BsaXI RFLP method. Heteroplasmy and homoplasmy for 7-C content were also determined, as evidenced by direct sequencing. Forty-one percent of the studied samples were found to be BsaXI positive. Furthermore, the BsaXI status of colorectal cancer samples was significantly different from that of healthy individuals.
In conclusion, BsaXI RFLP analysis is a simple and rapid approach for the single step determination of D310 polymorphism of mitochondrial DNA. This method allows the evaluation of a significant proportion of samples without the need for sequencing- and/or radioactivity-based techniques.
We present the first set of microsatellite markers developed exclusively for an extinct taxon. Microsatellite data have been analysed in thousands of genetic studies on extant species but the technology can be problematic when applied to low copy number (LCN) DNA. It is therefore rarely used on substrates more than a few decades old. Now, with the primers and protocols presented here, microsatellite markers are available to study the extinct New Zealand moa (Aves: Dinornithiformes) and, as with single nucleotide polymorphism (SNP) technology, the markers represent a means by which the field of ancient DNA can (preservation allowing) move on from its reliance on mitochondrial DNA. Candidate markers were identified using high throughput sequencing technology (GS-FLX) on DNA extracted from fossil moa bone and eggshell. From the ‘shotgun’ reads, >60 primer pairs were designed and tested on DNA from bones of the South Island giant moa (Dinornis robustus). Six polymorphic loci were characterised and used to assess measures of genetic diversity. Because of low template numbers, typical of ancient DNA, allelic dropout was observed in 36–70% of the PCR reactions at each microsatellite marker. However, a comprehensive survey of allelic dropout, combined with supporting quantitative PCR data, allowed us to establish a set of criteria that maximised data fidelity. Finally, we demonstrated the viability of the primers and the protocols, by compiling a full Dinornis microsatellite dataset representing fossils of c. 600–5000 years of age. A multi-locus genotype was obtained from 74 individuals (84% success rate), and the data showed no signs of being compromised by allelic dropout. The methodology presented here provides a framework by which to generate and evaluate microsatellite data from samples of much greater antiquity than attempted before, and opens new opportunities for ancient DNA research.
Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible.
The wild relatives of crops represent a major source of valuable traits for crop improvement. These resources are threatened by habitat destruction, land use changes, and other factors, requiring their urgent collection and long-term availability for research and breeding from ex situ collections. We propose a method to identify gaps in ex situ collections (i.e. gap analysis) of crop wild relatives as a means to guide efficient and effective collecting activities.
The methodology prioritizes among taxa based on a combination of sampling, geographic, and environmental gaps. We apply the gap analysis methodology to wild taxa of the Phaseolus genepool. Of 85 taxa, 48 (56.5%) are assigned high priority for collecting due to lack of, or under-representation, in genebanks, 17 taxa are given medium priority for collecting, 15 low priority, and 5 species are assessed as adequately represented in ex situ collections. Gap “hotspots”, representing priority target areas for collecting, are concentrated in central Mexico, although the narrow endemic nature of a suite of priority species adds a number of specific additional regions to spatial collecting priorities.
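The idea of combining sampling, geographic and environmental gap scores into a collecting priority can be sketched as below. The 0 to 10 scale, the use of a simple mean and the category cut-offs are illustrative assumptions and are not stated in the text; only the category tallies are taken from the results above.

```python
from statistics import mean

# Hypothetical cut-offs on a 0-10 scale, where a lower score means a
# larger gap in ex situ collections. These values are assumptions.
THRESHOLDS = [(3.0, "high"), (5.0, "medium"), (7.5, "low")]


def collecting_priority(sampling, geographic, environmental):
    """Average three gap scores and map the result to a priority class."""
    score = mean([sampling, geographic, environmental])
    for upper, label in THRESHOLDS:
        if score <= upper:
            return label
    return "adequately represented"


# Category tallies reported for the 85 wild Phaseolus taxa.
REPORTED = {"high": 48, "medium": 17, "low": 15, "adequately represented": 5}
high_priority_share = round(100 * REPORTED["high"] / sum(REPORTED.values()), 1)
```

The tallies sum to the 85 taxa assessed, and the high-priority share works out to the 56.5% quoted. Any real application would take the scoring scale and cut-offs from the published method rather than from this sketch.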
Results of the gap analysis method mostly align very well with expert opinion of gaps in ex situ collections, with only a few exceptions. A more detailed prioritization of taxa and geographic areas for collection can be achieved by including in the analysis predictive threat factors, such as climate change or habitat destruction, or by adding additional prioritization filters, such as the degree of relatedness to cultivated species (i.e. ease of use in crop breeding). Furthermore, results for multiple crop genepools may be overlaid, which would allow a global analysis of gaps in ex situ collections of the world's plant genetic resources.
Genotyping by sequencing, a new low-cost, high-throughput sequencing technology, was used to genotype 2,815 maize inbred accessions, preserved mostly at the National Plant Germplasm System in the USA. The collection includes inbred lines from breeding programs all over the world.
The method produced 681,257 single-nucleotide polymorphism (SNP) markers distributed across the entire genome, with the ability to detect rare alleles at high confidence levels. More than half of the SNPs in the collection are rare. Although most rare alleles have been incorporated into public temperate breeding programs, only a modest amount of the available diversity is present in the commercial germplasm. Analysis of genetic distances shows population stratification, including a small number of large clusters centered on key lines. Nevertheless, an average fixation index of 0.06 indicates moderate differentiation between the three major maize subpopulations. Linkage disequilibrium (LD) decays very rapidly, but the extent of LD is highly dependent on the particular group of germplasm and region of the genome. The utility of these data for performing genome-wide association studies was tested with two simply inherited traits and one complex trait. We identified trait associations at SNPs very close to known candidate genes for kernel color, sweet corn, and flowering time; however, results suggest that more SNPs are needed to better explore the genetic architecture of complex traits.
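For readers unfamiliar with the fixation index, the sketch below computes Wright's F_ST for a single biallelic SNP as (H_T - H_S) / H_T, with equal weighting of subpopulations. It is a textbook estimator chosen for illustration and is not necessarily the one used in the study; the allele frequencies are hypothetical.

```python
def fst_biallelic(subpop_freqs):
    """Wright's F_ST for one biallelic locus.

    `subpop_freqs` holds the frequency of one allele in each
    subpopulation; subpopulations are weighted equally (an assumption).
    """
    n = len(subpop_freqs)
    if n < 2:
        raise ValueError("need at least two subpopulations")
    p_bar = sum(subpop_freqs) / n
    # Expected heterozygosity in the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic overall
    # Mean expected heterozygosity within subpopulations.
    h_within = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in three subpopulations.
example_fst = fst_biallelic([0.4, 0.5, 0.6])
```

A genome-wide average such as the 0.06 reported above would be obtained by summarising per-SNP values across loci, typically with sample-size corrections that this sketch omits.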
The genotypic information described here allows this publicly available panel to be exploited by researchers facing the challenges of sustainable agriculture through better knowledge of the nature of genetic diversity.
Diversity; Genotyping by sequencing; Germplasm; Maize; Public
The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay.
A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina's Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274.
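The summary statistics quoted above follow from simple ratios, sketched here. The locus counts are those reported in the text; the genotype counts passed to the minor allele frequency function are hypothetical and only illustrate the calculation for one SNP typed in 158 individuals.

```python
def conversion_rate(polymorphic_loci, scorable_loci):
    """Fraction of reliably scored loci that are polymorphic."""
    return polymorphic_loci / scorable_loci


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF for one biallelic SNP from diploid genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)


# Counts reported for the Beadchip validation.
rate = conversion_rate(58_994, 62_621)

# Hypothetical genotype counts for a single SNP (100 + 40 + 18 = 158).
maf = minor_allele_frequency(100, 40, 18)
```

The conversion rate evaluates to about 0.942, consistent with the 94% reported. The average MAF of 0.274 would be the mean of the per-SNP values over all scorable loci.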
Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs.
Next-generation sequencing technologies are revolutionizing the field of evolutionary biology, opening the possibility for genetic analysis at scales not previously possible. Research in population genetics, quantitative trait mapping, comparative genomics, and phylogeography that was unthinkable even a few years ago is now possible. More importantly, these next-generation sequencing studies can be performed in organisms for which few genomic resources presently exist. To speed this revolution in evolutionary genetics, we have developed Restriction site Associated DNA (RAD) genotyping, a method that uses Illumina next-generation sequencing to simultaneously discover and score tens to hundreds of thousands of single-nucleotide polymorphism (SNP) markers in hundreds of individuals for minimal investment of resources. In this chapter, we describe the core RAD-seq protocol, which can be modified to suit a diversity of evolutionary genetic questions. In addition, we discuss bioinformatic considerations that arise from unique aspects of next-generation sequencing data as compared to traditional marker-based approaches, and we outline some general analytical approaches for RAD-seq and similar data. Despite considerable progress, the development of analytical tools remains in its infancy, and further work is needed to fully quantify sampling variance and biases in these data types.
Genetic mapping; Population genetics; Genomics; Evolution; Genotyping; Single-Nucleotide Polymorphisms; Next-generation sequencing; RAD-seq
The Jinchuan yak is a new yak population identified in Sichuan, China. This population has a special anatomical characteristic: an additional pair of ribs compared with other yak breeds. The genetic structure of this population is unknown. In the present study, we investigated the maternal phylogeny of this special yak population using mitochondrial DNA variation. A total of 23 Jinchuan yaks were sequenced for an 823-bp fragment of the D-loop control region and three individuals were sequenced for the whole mtDNA genome, with a length of 16,371 bp. For comparison with data from other yaks, we extracted sequence data from GenBank, including D-loop sequences of 398 yaks (from 12 breeds) and 55 wild yaks, and whole mitochondrial genomes of 53 yaks (from 12 breeds) and 21 wild yaks. A total of 127 haplotypes were defined based on the D-loop data. Thirteen haplotypes were defined from the 23 mtDNA D-loop sequences of Jinchuan yaks, six of which were found only in Jinchuan yaks, and one of which was shared by Jinchuan and wild yaks. The Jinchuan yaks were found to carry clades A and B from lineage I and clade C of lineage II. The results also suggested that the Jinchuan population has no distinct phylogenetic relationship in maternal inheritance with other breeds of yak. The high haplotype diversity of the Pali breed, Jinchuan population, Maiwa breed and Jiulong breed suggested that the yak was first domesticated from wild yaks in the middle Himalayan region and the northern Hengduan Mountains. The special anatomical characteristic that we found in the Jinchuan population needs further study based on nuclear data.
Bos grunniens; Jinchuan yak population; mtDNA control region; mitochondrial genome; genetic diversity; phylogeny
Plant breeding has been very successful in developing improved varieties using conventional tools and methodologies. Nowadays, the availability of genomic tools and resources is leading to a new revolution in plant breeding, as they facilitate the study of the genotype and its relationship with the phenotype, in particular for complex traits. Next Generation Sequencing (NGS) technologies are allowing the mass sequencing of genomes and transcriptomes, which is producing a vast array of genomic information. The analysis of NGS data by means of bioinformatics developments allows the discovery of new genes and regulatory sequences and their positions, and makes available large collections of molecular markers. Genome-wide expression studies provide breeders with an understanding of the molecular basis of complex traits. Genomic approaches include TILLING and EcoTILLING, which make it possible to screen mutant and germplasm collections for allelic variants in target genes. Re-sequencing of genomes is very useful for the genome-wide discovery of markers amenable to high-throughput genotyping platforms, like SSRs and SNPs, or the construction of high density genetic maps. All these tools and resources facilitate the study of genetic diversity, which is important for germplasm management, enhancement and use. They also allow the identification of markers linked to genes and QTLs, using a diversity of techniques like bulked segregant analysis (BSA), fine genetic mapping, or association mapping. These new markers are used for marker assisted selection, including marker assisted backcross selection, ‘breeding by design’, or new strategies, like genomic selection. In conclusion, advances in genomics are providing breeders with new tools and methodologies that allow a great leap forward in plant breeding, including the ‘superdomestication’ of crops and the genetic dissection and breeding for complex traits.
Bioinformatics; complex traits; genetic maps; marker assisted selection; molecular markers; next-generation sequencing; quantitative trait loci.
Recent genome-wide single nucleotide polymorphism (SNP) association studies (GWAS) have identified a number of SNPs that were significantly associated with coronary artery disease (CAD) and myocardial infarction (MI). However, many independent replication studies in other populations are needed to unequivocally confirm the GWAS associations. To assess these associations, we have established a case-control cohort consisting of 1,231 well-characterized MI patients and 560 controls without detectable coronary stenosis, all selected from the Cleveland Genebank population. The Genebank cohort has sufficient power to detect the association between MI and four GWAS SNPs, including rs17465637 within the MIA3 gene, rs2943634 (intergenic), rs6922269 in MTHFD1L, and rs599839 near SORT1. SNPs were genotyped by TaqMan assays. Follow-up multivariate logistic regression analysis, incorporating significant covariates, showed significant association with MI for MIA3 SNP rs17465637 (P-adj=0.0034) and SORT1 SNP rs599839 (P-adj=0.009). The minor allele G of rs599839 was also associated with a decreased LDL-C level of 5–9 mg/dL per allele, but not with HDL-C or triglyceride levels. No association with MI or lipid levels was found for SNPs rs2943634 and rs6922269 (P-adj>0.05). Our results establish two SNPs, rs17465637 in MIA3 and rs599839 near SORT1, as significant risk factors for MI in the American Genebank Caucasian population.
genome-wide association study (GWAS); single nucleotide polymorphism (SNP); myocardial infarction (MI); coronary artery disease; genetics; LDL; SNP rs17465637; SNP rs599839
Gene-modified cell vaccines are the best way to achieve immunotherapy for all types of acute leukemia. In this study, the recombinant eukaryotic expression vector (pDisplay-HSP70) of heat shock protein 70 (HSP70) of Bacille Calmette-Guérin (BCG) was constructed by amplifying the whole BCG HSP70 gene using polymerase chain reaction (PCR) and sub-cloning it into the multiple cloning site of pDisplay. The HL-60 cell vaccine expressing the protein on the cell surface was then prepared by lipofectamine transfection, and its anti-tumor effect and mechanism were further studied. Results showed that the BCG HSP70 fragment was consistent with the Mycobacterium tuberculosis HSP70 gene published in GenBank. DNA sequencing showed that the recombinant vector, named pDisplay-HSP70, was correctly constructed. After BCG HSP70 gene transfection, yellow-green fluorescence was observed on the HL-60 cell surface under a fluorescence microscope. HSP70-transfected HL-60 cells showed enhanced immunogenicity, with upregulated lymphocyte proliferation, increased cytokine (IFN-γ) secretion and enhanced killing activity. These results suggest that transfection with the BCG HSP70 gene could significantly enhance the immunogenicity of HL-60 cells, which may be a suitable candidate gene-modified cell vaccine for cancer immunotherapy.
BCG; Heat shock protein 70; gene transfection; HL-60; cancer vaccine
Whole genome approaches using single nucleotide polymorphism (SNP) markers have the potential to transform complex disease genetics and expedite pharmacogenetics research. This has led to a requirement for high-throughput SNP genotyping platforms. Development of a successful high-throughput genotyping platform depends on coupling reliable assay chemistry with an appropriate detection system to maximise efficiency with respect to accuracy, speed and cost. Current technology platforms are able to deliver throughputs in excess of 100 000 genotypes per day, with an accuracy of >99%, at a cost of 20–30 cents per genotype. In order to meet the demands of the coming years, however, genotyping platforms need to deliver throughputs in the order of one million genotypes per day at a cost of only a few cents per genotype. In addition, DNA template requirements must be minimised such that hundreds of thousands of SNPs can be interrogated using a relatively small amount of genomic DNA. As such, it is predicted that the next generation of high-throughput genotyping platforms will exploit large-scale multiplex reactions and solid phase assay detection systems.
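To make the throughput and cost figures above concrete, the following minimal sketch converts them into a daily spend. The function name `daily_cost_usd` is hypothetical, and "a few cents" is assumed here to mean 3 cents purely for illustration; the abstract gives no exact value.

```python
def daily_cost_usd(genotypes_per_day: int, cents_per_genotype: float) -> float:
    """Daily genotyping spend in US dollars for a given throughput and unit cost."""
    return genotypes_per_day * cents_per_genotype / 100.0


# Current platforms: 100 000 genotypes per day at 20-30 cents per genotype.
current_low = daily_cost_usd(100_000, 20)
current_high = daily_cost_usd(100_000, 30)

# Target platforms: one million genotypes per day at "a few cents" each.
# ASSUMPTION: 3 cents per genotype is an illustrative stand-in, not a source figure.
target = daily_cost_usd(1_000_000, 3)

print(f"current: ${current_low:,.0f} to ${current_high:,.0f} per day")
print(f"target:  ${target:,.0f} per day for ten times the throughput")
```

Under that assumed unit cost, a tenfold increase in throughput would cost about the same per day as the upper end of current platforms.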
Genome-wide association studies have proven to be a highly successful method for identification of genetic loci for complex phenotypes in both humans and model organisms. These large scale studies rely on the collection of hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome. Standard high-throughput genotyping technologies capture only a fraction of the total genetic variation. Recent efforts have shown that it is possible to “impute” with high accuracy the genotypes of SNPs that are not collected in the study, provided that they are present in a reference data set which contains both SNPs collected in the study and other SNPs. Here we introduce a novel HMM-based technique to solve the imputation problem that addresses several shortcomings of existing methods. First, our method is adaptive, which lets it estimate population genetic parameters from the data and be applied to model organisms that have very different evolutionary histories. Compared to previous methods, our method is up to ten times more accurate on model organisms such as mouse. Second, our algorithm scales in memory usage with the number of collected markers as opposed to the number of known SNPs. This issue is very relevant due to the size of the reference data sets currently being generated. We compare our method to existing methods on mouse and human data sets, and show that it has either comparable or better performance and much lower memory usage. The method is available for download at http://genetics.cs.ucla.edu/eminim.
genetic variation; genetics; genomics; statistics
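As a rough illustration of HMM-based imputation in general, the sketch below uses a simple haplotype-copying model: the hidden state is the reference haplotype being copied, and the forward-backward algorithm gives the posterior allele probability at untyped markers. It is not the adaptive, memory-efficient method described in the abstract. The function name `impute_haplotype`, the fixed `switch_prob` and `error_prob` values, and the toy data are all assumptions.

```python
def impute_haplotype(reference, target, switch_prob=0.05, error_prob=0.01):
    """Return P(allele == 1) at every marker of `target`.

    reference: list of K haplotypes, each a list of M alleles (0 or 1).
    target:    list of M alleles, with None where the marker was not genotyped.
    Both probabilities are fixed, illustrative parameters (assumptions).
    """
    n_ref, n_markers = len(reference), len(reference[0])

    def emission(k, m):
        # Untyped markers carry no information about the copied haplotype.
        if target[m] is None:
            return 1.0
        return 1.0 - error_prob if reference[k][m] == target[m] else error_prob

    def normalise(vec):
        total = sum(vec)
        return [v / total for v in vec]

    # Forward pass: stay on the same haplotype, or switch to a uniformly chosen one.
    forward = [normalise([emission(k, 0) / n_ref for k in range(n_ref)])]
    for m in range(1, n_markers):
        prev = forward[-1]  # sums to 1, so the switch mass is switch_prob / n_ref
        row = [((1 - switch_prob) * prev[k] + switch_prob / n_ref) * emission(k, m)
               for k in range(n_ref)]
        forward.append(normalise(row))

    # Backward pass, rescaled at every marker to avoid numerical underflow.
    backward = [[1.0] * n_ref for _ in range(n_markers)]
    for m in range(n_markers - 2, -1, -1):
        weighted = [backward[m + 1][k] * emission(k, m + 1) for k in range(n_ref)]
        jump = switch_prob * sum(weighted) / n_ref
        backward[m] = normalise([(1 - switch_prob) * w + jump for w in weighted])

    # Posterior over copied haplotypes, collapsed to the probability of allele 1.
    probs = []
    for m in range(n_markers):
        post = normalise([f * b for f, b in zip(forward[m], backward[m])])
        probs.append(sum(p for p, hap in zip(post, reference) if hap[m] == 1))
    return probs


# Toy reference panel and a target typed at markers 0, 2 and 4 only.
panel = [[0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 1, 0, 1, 0]]
typed = [1, None, 1, None, 1]
print(impute_haplotype(panel, typed))
```

In this toy case only the second reference haplotype agrees with the typed markers, so the untyped markers are imputed as allele 1 with high probability. The forward and backward vectors are stored per marker here for clarity, which is exactly the kind of memory cost a production method would want to reduce.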
We present here the first use of DNA barcoding in a new approach to ethnobotany that we have termed "ethnobotany genomics". This new approach is founded on the concept of 'assemblage' of biodiversity knowledge, which includes a coming together of different ways of knowing and valorizing species variation in a novel approach seeking to add value to both traditional knowledge (TK) and scientific knowledge (SK). We employed contemporary genomic technology, DNA barcoding, as an important tool for identifying cryptic species, which were already recognized ethnotaxa using the TK classification systems of local cultures in the Velliangiri Hills of India. This research is based on several case studies in our lab, which define an approach that is poised to evolve quickly with the advent of new ideas and technology. Our results show that DNA barcoding validated several cryptic plant species new to science that were previously recognized by TK classifications of the Irulas and Malasars, and were lumped using SK classification. The contribution of the local aboriginal knowledge concerning plant diversity and utility in India is considerable; our study presents new ethnomedicine to science. Ethnobotany genomics can also be used to determine the distribution of rare species and their ecological requirements, including traditional ecological knowledge, so that conservation strategies can be implemented. This is aligned with the Convention on Biological Diversity that was signed by over 150 nations, and thus the world's complex array of human-natural-technological relationships has effectively been re-organized.
Next-generation sequencing technologies promise to dramatically accelerate the use of genetic information for crop improvement by facilitating the genetic mapping of agriculturally important phenotypes. The first step in optimizing the design of genetic mapping studies involves large-scale polymorphism discovery and a subsequent genome-wide assessment of the population structure and pattern of linkage disequilibrium (LD) in the species of interest. In the present study, we provide such an assessment for the grapevine (genus Vitis), the world's most economically important fruit crop. Reduced representation libraries (RRLs) from 17 grape DNA samples (10 cultivated V. vinifera and 7 wild Vitis species) were sequenced with sequencing-by-synthesis technology. We developed heuristic approaches for SNP calling, identified hundreds of thousands of SNPs and validated a subset of these SNPs on a 9K genotyping array. We demonstrate that the 9K SNP array provides sufficient resolution to distinguish among V. vinifera cultivars, between V. vinifera and wild Vitis species, and even among diverse wild Vitis species. We show that there is substantial sharing of polymorphism between V. vinifera and wild Vitis species and find, using principal components analysis (PCA), that genetic relationships among V. vinifera cultivars agree well with their proposed geographic origins. Levels of LD in the domesticated grapevine are low even at short ranges, but LD persists above background levels to 3 kb. While genotyping arrays are useful for assessing population structure and the decay of LD across large numbers of samples, we suggest that whole-genome sequencing will become the genotyping method of choice for genome-wide genetic mapping studies in high-diversity plant species. This study demonstrates that we can move quickly towards genome-wide studies of crop species using next-generation sequencing.
Our study sets the stage for future work in other high diversity crop species, and provides a significant enhancement to current genetic resources available to the grapevine genetic community.
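The LD levels discussed above are commonly summarised by the pairwise statistic r². The minimal sketch below computes r² between two biallelic SNPs, assuming phased haplotypes coded as 0/1. The function name `ld_r_squared` and the toy data are hypothetical, and the study's own LD estimation procedure is not specified in this abstract.

```python
def ld_r_squared(site_a, site_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    site_a, site_b: equal-length lists with one allele per haplotype.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy haplotypes (assumed data): identical sites versus statistically independent sites.
print(ld_r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

Plotting such r² values against the physical distance between SNP pairs gives the kind of LD decay curve the abstract summarises.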
Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules.
High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E (IgE), resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array.
Conclusion and Significance
The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and supported a common stem-loop structure as the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. They also provide a foundation from which to better understand and manipulate this important class of high affinity biomolecules.