The Department of Energy Joint Genome Institute (JGI) has completed an 8× shotgun draft sequence of the soybean cultivar Williams 82 [1]. For initial assembly of the genome sequence, preliminary 4× and 6.5× scaffold assemblies were produced by the JGI, with the 6.5× assembly released to the public (http://www.phytozome.net). This 6.5× assembly contained 3,118 scaffolds totaling 993.5 Mb of sequence. Using the soybean Consensus Map 4.0, which contains 5,500 markers [2], Schmutz et al. [1] associated 296 of the 6.5× scaffolds with the genetic map. These scaffolds comprised 949 Mb, or 95.6% of the total 6.5× assembly.
The initial assembly anchored a large proportion of the genome to create 20 pseudomolecules corresponding to the 20 soybean chromosomes. However, it subsequently became evident that this initial pseudomolecule build had a significant number of assembly problems [1]. First and foremost was the insufficient resolution afforded by the Consensus Map 4.0, which had been constructed using five separate mapping populations, with most of the markers mapped in fewer than 100 individuals [2]. Second, many of the anchored scaffolds contained just one mapped marker, or contained multiple tightly linked markers whose map order was questionable due to insufficient recombination. Thus, proper orientation of those scaffolds was either impossible or questionable.
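The orientation problem can be illustrated with a minimal sketch. It is not the procedure used for the pseudomolecule build; the function name `orient_scaffold`, the marker representation, and the `min_cm_span` threshold are hypothetical and introduced only for illustration. The sketch assumes each marker carries a physical position on its scaffold (bp) and a genetic map position (cM), and it compares only the two outermost markers.

```python
from typing import List, Optional, Tuple

# Hypothetical marker representation: (scaffold position in bp, map position in cM).
Marker = Tuple[int, float]


def orient_scaffold(markers: List[Marker], min_cm_span: float = 0.0) -> Optional[str]:
    """Return '+' or '-' for a scaffold, or None if orientation is undetermined.

    '+' means genetic position increases with physical position.
    min_cm_span is an assumed threshold below which the outermost markers
    are treated as too tightly linked to order reliably.
    """
    if len(markers) < 2:
        # A single marker anchors a scaffold to the map but cannot orient it.
        return None
    ordered = sorted(markers)  # sort by physical position on the scaffold
    span = ordered[-1][1] - ordered[0][1]  # cM difference between outermost markers
    if abs(span) <= min_cm_span:
        # No usable recombination between the markers: order is unresolved.
        return None
    return "+" if span > 0 else "-"


if __name__ == "__main__":
    print(orient_scaffold([(1_000, 5.0)]))                   # one marker
    print(orient_scaffold([(1_000, 5.0), (900_000, 7.5)]))   # resolved order
    print(orient_scaffold([(1_000, 5.0), (2_000, 5.0)]))     # co-segregating markers
```

A scaffold with one marker, or with markers that co-segregate, returns `None`, which corresponds to the two failure cases described above. A denser map built from more recombination events would move more scaffolds out of that category.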
The ideal marker for anchoring and orienting the soybean genome is the single nucleotide polymorphism (SNP), primarily because SNPs are the most abundant marker available. Cultivated soybean [Glycine max (L.) Merr.] has a nucleotide diversity (θ) of about 0.001 [3], which translates into an average frequency of one SNP per 1,000 bp of contiguous sequence. The wild ancestor Glycine soja (Sieb. and Zucc.) has an estimated nucleotide diversity of θ = 0.00235, which is equivalent to approximately one SNP per 425 bp [5]. Another advantage of SNPs is the wide array of currently available technologies for performing multiplex assays, which range from genotyping a few SNPs at a time to over 1 million SNPs in parallel [6]. One of these technologies is the GoldenGate assay, which can genotype 384 to 1,536 SNPs in 192 DNA samples in just three days. The reliability and rapidity of this assay were recently documented with soybean SNPs [2
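The conversion from nucleotide diversity to SNP spacing used above is a simple reciprocal. The short sketch below reproduces that arithmetic, following the text's reading of θ as an average per-site SNP rate; the function name `bp_per_snp` is hypothetical.

```python
def bp_per_snp(theta: float) -> float:
    """Average number of base pairs per SNP implied by a per-site diversity theta."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    return 1.0 / theta


if __name__ == "__main__":
    # Values of theta quoted in the text for cultivated and wild soybean.
    for label, theta in [("Glycine max", 0.001), ("Glycine soja", 0.00235)]:
        print(f"{label}: about one SNP per {bp_per_snp(theta):.1f} bp")
```

For θ = 0.00235 the reciprocal is about 425.5 bp, consistent with the figure of approximately one SNP per 425 bp.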
New high-throughput re-sequencing technologies have recently become available for generating greater amounts of DNA sequence quickly and inexpensively relative to standard Sanger sequencing [9]. Despite this advantage, large genomes still require a method to reduce genome complexity to a level that ensures accurate SNP discovery. One method uses high-throughput sequencing of the transcriptome through massively parallel pyrosequencing technology [10]. Although this method was successful, SNP discovery with it is restricted to the expressed transcriptome and would likely not discover SNPs that could be used to anchor and orient non-coding stretches of the genome.
The use of reduced representation libraries (RRLs) was first proposed in humans to efficiently find SNPs using Sanger sequencing [11]. Genome complexity is reduced by constructing an RRL through restriction digestion followed by size selection. Using fragments from a size-selected digestion permits a similar subset of fragments to be obtained from different genotypes, which can then be deep-sequenced for accurate SNP discovery. A procedure for high-throughput SNP discovery was recently described in cattle; it combined an RRL with the sequence-by-synthesis (SBS) method on the clonal single molecule array (CSMA) platform manufactured by Illumina, Inc., so that short sequence reads could be compared to a reference genome for SNP discovery [12]. This approach successfully identified 62,042 putative SNPs. A subsequent analysis of 22,865 of these SNPs revealed a 91% validation rate, demonstrating the robustness of this SNP discovery method [12
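The idea behind an RRL, restriction digestion followed by size selection, can be sketched in silico. The example below is an illustration under stated assumptions, not the library construction protocol of any cited study: the recognition site `GGCC`, the toy sequence, the size window, and the function names `digest` and `size_select` are all chosen here for illustration, and the cut is placed at the start of each site for simplicity.

```python
import re
from typing import List


def digest(sequence: str, site: str) -> List[str]:
    """Split a sequence at every occurrence of a recognition site.

    Simplification: the cut is placed immediately before the site, so each
    downstream fragment begins with the site itself.
    """
    # A zero-width lookahead finds every position where the site starts.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    # Skip empty fragments (for example when the sequence starts with the site).
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep only fragments whose length falls inside the selected window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    toy_genome = "AAA" + "GGCC" + "TTTTT" + "GGCC" + "AAAAAAAA" + "GGCC" + "T"
    fragments = digest(toy_genome, "GGCC")
    print([len(f) for f in fragments])
    print(size_select(fragments, min_len=6, max_len=10))
```

Because the cut positions depend only on the recognition site, two genotypes that differ by SNPs outside those sites yield fragments of the same lengths, so the same size window recovers a similar subset of the genome from each genotype.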
Our objective was to use the RRL approach with the SBS method on the CSMA platform from Illumina, Inc. to discover large numbers of soybean SNPs that could be developed into GoldenGate assays to create a new genetic map with higher resolution than Consensus Map 4.0 [2]. This high-resolution genetic map would then help address the challenges faced in assembling and orienting the remainder of the soybean genome [1