During meiosis, homologous chromosomes align, and the repair of programmed double-strand breaks in the DNA leads to recombination: either the reciprocal exchange of DNA between homologs (crossovers) or the non-reciprocal modification of one homolog, using the other as a template (non-crossover gene conversion). As a consequence, the genome of each meiotic product, or ‘segregant’, is a mosaic of the two parental genotypes. A recent study in Saccharomyces cerevisiae used the array-based genotyping methodology presented here to create a genome-wide map of crossover and non-crossover gene conversion with the highest resolution to date (Mancera et al.).
Fig. 1. Meiotic recombination genotyping assay. (A) Meiosis was induced in a diploid cross derived from the highly polymorphic S96 and YJM789 strains. Haploid parents and meiotic products (‘segregants’) were used for genotyping. (B) Five probes, four ...
Oligonucleotide microarrays provide an accurate and cost-effective means of identifying and genotyping polymorphic loci. Oligonucleotide microarray probes hybridize more efficiently to targets whose sequence is exactly complementary than to targets that match the probes only partially or imperfectly. Winzeler et al. used this fact to identify several thousand polymorphic positions in the same two yeast strains we consider here. Since then, numerous authors have made use of these so-called ‘single feature polymorphisms’ (SFPs), both in yeast (Brem et al.; Deutschbauer and Davis, 2005; Gresham et al.; Steinmetz et al.; Winzeler et al.) and in other organisms (Albert et al.; Borevitz et al.; Rostoks et al.; Turner et al.). With the exception of Brem et al., these authors have taken a supervised approach to the problem, training a genotyping classifier on samples of known genotype and then applying the classifier to new samples.
Winzeler et al. hybridized parental genomic DNA from each of the two strains to standard yeast expression arrays. Then, after preprocessing, analysis of variance (ANOVA) was used to identify probes whose observed log-scale fluorescence intensities appeared to be better fit by a model with two means than by a model with one. Such probes were deemed to be SFPs. To genotype segregants from a cross, a posterior probability was computed using the estimated Gaussian densities from the parental-array ANOVA, plus a uniform prior on the two genotypes.
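Under the assumptions just stated (Gaussian class densities and a uniform prior), the posterior probability takes the following general form. The notation is assumed here for illustration only: $x$ is a segregant's log-scale intensity at one probe, $a$ and $b$ label the two parental genotypes, $\hat\mu_a$ and $\hat\mu_b$ are the parental means, and a common variance $\hat\sigma^2$ is assumed, as is usual for ANOVA. The symbols and the exact parameterization may differ from those of the original authors.

```latex
\[
P(g = a \mid x)
\;=\;
\frac{\tfrac{1}{2}\,\phi(x;\hat\mu_a,\hat\sigma^2)}
     {\tfrac{1}{2}\,\phi(x;\hat\mu_a,\hat\sigma^2) + \tfrac{1}{2}\,\phi(x;\hat\mu_b,\hat\sigma^2)}
\;=\;
\frac{\phi(x;\hat\mu_a,\hat\sigma^2)}
     {\phi(x;\hat\mu_a,\hat\sigma^2) + \phi(x;\hat\mu_b,\hat\sigma^2)},
\qquad
\phi(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}
\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
\]
```

Because the prior is uniform, the factors of $\tfrac{1}{2}$ cancel and the genotype call depends only on the ratio of the two estimated densities.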
Variants on this procedure soon emerged. The 1- versus 2-mean ANOVA is equivalent to a two-sample t-test for difference in means, and Borevitz et al. proposed an alternative t-test for identification of SFPs, using the ad hoc statistic of SAM (Tusher et al.). Brem et al., whose data included hybridizations from numerous segregants of unknown genotype as well as from parental samples of known genotype, further augmented this approach: using parental data, candidate SFPs were identified on the basis of a high moderated t-statistic. Then, known parental genotype labels were temporarily set aside, and the combined parental and segregant data were subjected to k-means clustering (k = 2). Candidate SFPs were retained only if the parental samples were correctly separated by the resulting clusters. Further, Brem et al. estimated the Gaussian densities required in (1) from all data in the clusters, rather than only from parental observations of known genotype.
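The screening, clustering and genotyping steps described above can be sketched for a single probe as follows. This is a minimal illustration under stated assumptions, not the code of any of the cited authors: a plain pooled-variance t-statistic stands in for the moderated statistic, and the cutoff `T_CUTOFF`, all function names and all intensity values are hypothetical.

```python
import math
from statistics import mean

T_CUTOFF = 5.0  # hypothetical threshold on |t|; not taken from the source


def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance (equivalent to the
    1- versus 2-mean ANOVA for two groups)."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / len(a) + 1 / len(b)))


def two_means(values, iters=50):
    """One-dimensional k-means with k = 2, initialised at the min and max."""
    centers = [min(values), max(values)]
    labels = []
    for _ in range(iters):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        new_centers = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def parents_separated(labels, n_a, n_b):
    """True if all parent-A samples fall in one cluster and all parent-B
    samples fall in the other (parents listed first: A, then B)."""
    la, lb = set(labels[:n_a]), set(labels[n_a:n_a + n_b])
    return len(la) == 1 and len(lb) == 1 and la != lb


def cluster_sigma(values, labels, centers):
    """Pooled within-cluster standard deviation, using all data."""
    ss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return math.sqrt(ss / (len(values) - 2))


def posterior_a(x, mu_a, mu_b, sigma):
    """Posterior probability of genotype A under equal-variance Gaussian
    densities and a uniform prior (computed on the log scale for stability)."""
    la = -0.5 * ((x - mu_a) / sigma) ** 2
    lb = -0.5 * ((x - mu_b) / sigma) ** 2
    m = max(la, lb)
    ea, eb = math.exp(la - m), math.exp(lb - m)
    return ea / (ea + eb)


# Hypothetical log-scale intensities for one probe.
parent_a = [9.8, 10.1, 10.0]
parent_b = [7.9, 8.2, 8.0]
segregants = [10.2, 7.8, 9.9, 8.1, 8.3, 10.0]

# Step 1: screen the probe using parental data only.
t_stat = pooled_t(parent_a, parent_b)

# Step 2: set labels aside and cluster parents and segregants together.
values = parent_a + parent_b + segregants
labels, centers = two_means(values)

# Step 3: retain the probe only if the clusters separate the parents.
keep = abs(t_stat) > T_CUTOFF and parents_separated(
    labels, len(parent_a), len(parent_b))

# Step 4: estimate densities from all clustered data and genotype segregants.
a_cluster = labels[0]
mu_a, mu_b = centers[a_cluster], centers[1 - a_cluster]
sigma = cluster_sigma(values, labels, centers)
post = [posterior_a(x, mu_a, mu_b, sigma) for x in segregants]
```

In this toy example the densities are estimated from every sample in each cluster, parental or segregant, which mirrors the modification attributed to Brem et al. above; restricting the estimate to `parent_a` and `parent_b` would correspond to the purely supervised variant.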
The more recent, multivariate approach of Gresham et al., designed for high-density tiling microarrays, is quite different: the authors considered the set of probes which interrogate a given position, and they modeled the decrease in fluorescence intensity caused by a SNP as a function of (i) the SNP's position within each probe, (ii) the known response of the probes to reference sequence and (iii) various aspects of the probes' base composition. Their algorithm, SNPscanner, was trained on a set of ‘high-quality’ known SNPs to produce two predictions for probe set behavior: one corresponding to reference sequence, and the other to sequence with a variant base at the given position. Observed behavior on new arrays was compared with the two predictions, and genotype was assigned according to which prediction fit best.
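The final comparison step, choosing between a reference prediction and a variant prediction for a probe set, can be illustrated with a short sketch. It is not SNPscanner's actual scoring rule: a residual sum of squares is assumed here as a simple stand-in for goodness of fit, and the function names and all intensity values are hypothetical.

```python
def residual_ss(observed, predicted):
    """Residual sum of squares between observed and predicted intensities."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def assign_genotype(observed, pred_reference, pred_variant):
    """Call the genotype whose predicted probe-set profile fits best.

    Returns the call and the difference in fit (variant RSS minus
    reference RSS); a negative value favours the variant prediction.
    """
    rss_ref = residual_ss(observed, pred_reference)
    rss_var = residual_ss(observed, pred_variant)
    call = "reference" if rss_ref <= rss_var else "variant"
    return call, rss_var - rss_ref


# Hypothetical log-scale intensities for five probes tiling one position.
# The variant prediction dips most for the probe with the SNP near its centre.
pred_reference = [10.0, 10.2, 9.9, 10.1, 10.0]
pred_variant = [9.6, 8.9, 8.1, 8.8, 9.5]
observed = [9.7, 9.0, 8.3, 8.7, 9.4]

call, fit_difference = assign_genotype(observed, pred_reference, pred_variant)
```

The magnitude of `fit_difference` gives a rough sense of how decisively one prediction is preferred; calls with a value near zero would be natural candidates for exclusion.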
In the remainder of this article, we introduce ssGenotyping (ssG) as an alternative to SNPscanner, and show that it provides both more specific and more sensitive genotyping in the context of a meiotic recombination dataset. In addition, we use the comparison between the methods to illustrate two points which are important for successful array-based genotyping in any context: (i) the extent to which probe behavior—cross-hybridization behavior, in particular—is sensitive to genomic background, and (ii) the ability of predictive models to describe probe behavior in a complex setting.