We have combined RAD marker isolation with massively parallel, high-throughput Illumina sequencing to develop a genotyping platform that can rapidly and cost-effectively discover novel SNP markers and simultaneously genotype many individuals. We demonstrated the feasibility of this platform by discovering more than 13,000 polymorphic markers in two different organisms and by rapidly mapping three traits in two bulk segregant populations and 96 individuals, using less than half the capacity of one run on an Illumina Genome Analyzer. First-generation RAD marker analysis using microarrays has proven to be an effective method for mapping genomic alterations in various species. However, that technique requires building a species-specific array for each organism and performing a new array hybridization for every desired comparison. Our RAD marker sequencing method adapts the strengths of the RAD array approach to high-throughput Illumina sequencing and allows researchers to perform the equivalent of hundreds of RAD array experiments in a single sequencing run. The major benefits of this approach are a substantial increase in the number and types of markers assayed, lower cost, and faster analysis. In addition, whereas array-based approaches assay only presence/absence variation of RAD tags, Illumina sequencing simultaneously provides data on SNPs located outside the recognition site of the restriction enzyme. The ability to sequence heterogeneous DNA samples on the Illumina platform, combined with our sample barcoding, permits multiplexing, so that many individual mapping projects can be performed in parallel, reducing the cost and effort needed to initiate and complete a mapping project.
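The sample barcoding that makes multiplexing possible can be illustrated with a minimal demultiplexing sketch. It assumes each read begins with a short sample barcode followed by the bases left by an SbfI (CCTGCA^GG) cut, which on the read side are TGCAGG. The barcodes, the sample names, and the function `demultiplex` are hypothetical, and only exact barcode matches are accepted; real pipelines may tolerate sequencing errors in the barcode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical 5-nt barcodes for illustration; not the barcodes used in the study.
BARCODES: Dict[str, str] = {"AAACG": "parent_1", "CCCAT": "parent_2"}

# Bases expected right after the barcode when the adapter is ligated to an SbfI-cut end.
SBFI_REMNANT = "TGCAGG"


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str], remnant: str) -> Dict[str, List[str]]:
    """Assign reads to samples by exact barcode match, keeping only reads that
    continue with the expected restriction-site remnant."""
    barcode_length = len(next(iter(barcodes)))  # assumes all barcodes share one length
    by_sample: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        sample = barcodes.get(read[:barcode_length])
        if sample is None:
            continue  # unknown barcode (e.g. a sequencing error): discard the read
        if not read.startswith(remnant, barcode_length):
            continue  # no restriction-site remnant: probably not a RAD tag
        # Strip barcode and remnant, leaving the genomic sequence to examine for SNPs.
        by_sample[sample].append(read[barcode_length + len(remnant):])
    return dict(by_sample)


reads = [
    "AAACGTGCAGGACGT",  # parent_1 barcode + SbfI remnant
    "CCCATTGCAGGTTTT",  # parent_2 barcode + SbfI remnant
    "GGGGGTGCAGGAAAA",  # unknown barcode
    "AAACGAATTCAAAA",   # known barcode, but not followed by the SbfI remnant
]
print(demultiplex(reads, BARCODES, SBFI_REMNANT))  # {'parent_1': ['ACGT'], 'parent_2': ['TTTT']}
```

Checking for the remnant after the barcode is a cheap way to discard reads that do not start at a restriction site, before any SNP calling is attempted.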
SNP discovery in a mapping cross requires many sequence reads per marker from each parent. However, F2 individuals in organisms with a reference genome can be lightly sequenced and still yield genotypic inferences about a genomic region, because genetic material is inherited in large blocks defined by recombination breakpoints. We demonstrated this by inferring the genotypes of unsequenced markers on LGIV from flanking RAD tag sequences, which allowed us to establish linkage blocks and fine-map the lateral plate breakpoints near Eda. A useful two-step approach is to first undersample a high density of markers to identify individuals with informative recombination breakpoints, and then acquire additional sequence for those individuals. QTL analyses are most effective when the parental origin of genomic regions can be determined in each individual of an F2 mapping family, so this ability to infer genotypes in large, undersequenced populations allows effective mapping of quantitative trait loci (QTLs) in organisms with sequenced genomes.
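The inference from flanking markers can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes markers are already ordered by their position on one chromosome of the reference genome, uses hypothetical genotype codes ("A", "B", "H", or None for a marker with no reads), and assumes no double recombination between two neighbouring called markers. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical genotype codes: "A" = homozygous for parent A, "B" = homozygous for
# parent B, "H" = heterozygous, None = marker received no reads in this individual.
Genotype = Optional[str]


def called_positions(genotypes: List[Genotype]) -> List[int]:
    """Indices of markers (in genome order) that have a genotype call."""
    return [i for i, g in enumerate(genotypes) if g is not None]


def infer_missing(genotypes: List[Genotype]) -> List[Genotype]:
    """Fill uncalled markers whose nearest called neighbours on both sides agree.

    Assumes no double recombination between two neighbouring called markers.
    Markers outside the first/last called marker, or between disagreeing
    neighbours, stay None.
    """
    inferred = list(genotypes)
    called = called_positions(genotypes)
    for left, right in zip(called, called[1:]):
        if genotypes[left] == genotypes[right]:
            for i in range(left + 1, right):
                inferred[i] = genotypes[left]
    return inferred


def find_breakpoints(genotypes: List[Genotype]) -> List[Tuple[int, int]]:
    """Intervals (left, right) between adjacent called markers whose genotypes
    differ, i.e. where a recombination breakpoint must lie."""
    called = called_positions(genotypes)
    return [(l, r) for l, r in zip(called, called[1:]) if genotypes[l] != genotypes[r]]


f2_individual = ["A", None, None, "A", "H", None, "B"]
print(infer_missing(f2_individual))     # ['A', 'A', 'A', 'A', 'H', None, 'B']
print(find_breakpoints(f2_individual))  # [(3, 4), (4, 6)]
```

In the two-step approach, individuals whose breakpoint intervals fall inside a region of interest would be the candidates for additional sequencing, which narrows those intervals.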
Without a sequenced genome, the strategy described above of inferring genotypes from neighboring RAD markers will not work. However, we have shown that bulk segregant analysis can successfully identify completely linked markers. Several routes exist for moving from RAD markers to a genomic location in organisms without a reference genome. The segregation of RAD markers could be analyzed in a mapping cross to produce a genetic linkage map. In addition, paired-end RAD tag sequences could be used to design probes or PCR primers for screening fosmid or BAC libraries to identify genomic regions linked to the phenotype.
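One way to flag completely linked markers from pooled sequencing is sketched below. It is an illustrative assumption, not the analysis used in the study: it supposes a recessive trait in an F2 cross, DNA pooled separately from mutant and wild-type individuals, and alleles already assigned to their parent of origin from the parental reads. Under those assumptions a completely linked marker is expected at a mutant-parent allele fraction of 1 in the mutant bulk and about 1/3 in the wild-type bulk, while an unlinked marker is expected near 1/2 in both. The thresholds `min_depth` and `min_fraction` and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Per-marker read counts: (reads carrying the mutant parent's allele, reads carrying the other allele).
Counts = Dict[str, Tuple[int, int]]


def allele_fraction(mutant_allele_reads: int, other_reads: int) -> float:
    """Fraction of reads at a marker that carry the mutant parent's allele."""
    return mutant_allele_reads / (mutant_allele_reads + other_reads)


def candidate_linked_markers(
    mutant_bulk: Counts,
    wild_type_bulk: Counts,
    min_depth: int = 10,         # hypothetical: minimum reads per marker in each bulk
    min_fraction: float = 0.95,  # hypothetical: tolerates a few sequencing errors
) -> List[Tuple[str, float, float]]:
    """Markers where the mutant bulk is (nearly) fixed for the mutant parent's allele.

    Returns (marker, mutant-bulk fraction, wild-type-bulk fraction); for a recessive
    trait the wild-type-bulk fraction of a completely linked marker should be near 1/3.
    """
    hits = []
    for marker, (mut, other) in mutant_bulk.items():
        if marker not in wild_type_bulk:
            continue  # no data for this marker in the other bulk
        wt_mut, wt_other = wild_type_bulk[marker]
        if mut + other < min_depth or wt_mut + wt_other < min_depth:
            continue  # too few reads to judge
        mutant_fraction = allele_fraction(mut, other)
        if mutant_fraction >= min_fraction:
            hits.append((marker, mutant_fraction, allele_fraction(wt_mut, wt_other)))
    return hits


mutant = {"m1": (40, 0), "m2": (21, 19), "m3": (3, 0)}
wild_type = {"m1": (13, 27), "m2": (20, 20), "m3": (1, 2)}
print(candidate_linked_markers(mutant, wild_type))
```

Requiring near-fixation in the mutant bulk does most of the work; reporting the wild-type fraction alongside lets a reader check that it sits near the 1/3 expected for complete linkage rather than near 1/2.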
The RAD marker approach can assay different numbers of markers depending on the choice of restriction enzyme, as we showed by sequencing markers derived from enzymes with either a 6- or an 8-nucleotide recognition sequence. Matching the GC content of the recognition site to that of the genome can also be used to influence marker number: EcoRI (GAATTC, 33% GC) had fewer sites in the stickleback genome than expected by chance, whereas SbfI (CCTGCAGG, 75% GC) had more sites than expected by chance.
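The expected number of sites "by chance" can be estimated from a simple random-sequence model, sketched below as an assumption rather than the calculation used in the study. It treats bases as independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 − gc)/2, so a site's probability at a given position is the product of its base probabilities; real genomes deviate from this model through local variation in composition. Comparing `count_sites` on an assembly with `expected_sites` is one way to say whether an enzyme cuts more or less often than chance. The genome length and GC values in the loop are hypothetical.

```python
def site_probability(site: str, gc: float) -> float:
    """Probability that a random position starts `site`, assuming independent bases
    with G = C = gc/2 and A = T = (1 - gc)/2."""
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    p = 1.0
    for base in site:
        p *= base_prob[base]
    return p


def expected_sites(site: str, gc: float, genome_length: int) -> float:
    """Expected occurrences on one strand of a random genome of the given length."""
    return (genome_length - len(site) + 1) * site_probability(site, gc)


def count_sites(sequence: str, site: str) -> int:
    """Count (possibly overlapping) occurrences on one strand. EcoRI and SbfI sites
    are palindromic, so the reverse strand contains the same sites."""
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


ECORI = "GAATTC"   # 2 of 6 bases are G/C (33%)
SBFI = "CCTGCAGG"  # 6 of 8 bases are G/C (75%)

for gc in (0.40, 0.50):  # hypothetical genome GC contents
    for name, site in (("EcoRI", ECORI), ("SbfI", SBFI)):
        # hypothetical 400 Mb genome
        print(name, gc, round(expected_sites(site, gc, 400_000_000)))
```

The model makes the trade-off explicit: lowering genome GC content raises the expected frequency of the AT-rich EcoRI site and lowers that of the GC-rich SbfI site, on top of the roughly 16-fold difference that comes from an 8-nucleotide versus a 6-nucleotide recognition sequence at 50% GC.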
With decreasing cost and increasing capacity in sequencing technology, whole-genome sequencing of every individual of interest may become a viable option in the future. However, neighboring SNPs will often carry highly redundant information, so much of the additional data gathered at such density would be wasted. By sequencing only the tags flanking restriction sites, and doing so in multiplexed samples, our approach substantially reduces data complexity and increases throughput. This allows efficient, high-density SNP discovery and genotyping of mapping crosses. Therefore, even as sequencing technology continues to improve, RAD marker sequencing will remain a useful and cost-effective tool for most genetic mapping studies.
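The complexity reduction can be made concrete with some back-of-the-envelope arithmetic, sketched below under stated assumptions. Each restriction site yields two RAD tags, one on each side of the cut, so the sequenced fraction of the genome is roughly 2 × sites × read length ÷ genome length, and the number of barcoded samples a run can hold is the reads available divided by the reads each sample needs. The site count, read length, genome length, depth, and reads per run below are all hypothetical, and the sketch ignores overlapping tags and uneven coverage.

```python
def fraction_of_genome_sequenced(n_sites: int, read_length: int, genome_length: int) -> float:
    """Approximate fraction of the genome covered by RAD tags: two tags per
    restriction site (one on each side of the cut), each read to read_length bases."""
    return 2 * n_sites * read_length / genome_length


def samples_per_run(n_sites: int, mean_depth: int, reads_per_run: int) -> int:
    """Barcoded samples that fit in one run if every tag needs mean_depth reads."""
    reads_per_sample = 2 * n_sites * mean_depth
    return reads_per_run // reads_per_sample


# Hypothetical inputs, for illustration only.
fraction = fraction_of_genome_sequenced(n_sites=20_000, read_length=36, genome_length=400_000_000)
print(f"{fraction:.2%} of the genome sequenced")                                # 0.36% of the genome sequenced
print(samples_per_run(n_sites=20_000, mean_depth=10, reads_per_run=8_000_000))  # 20
```

Because the read budget per sample scales with the number of sites rather than with genome length, choosing the enzyme (and hence the site count) is what sets how many individuals can share a run.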