To survey PMS genome-wide we first dissected tetrads obtained from a cross between two diverged yeast strains - a laboratory strain, S288c, and a clinical isolate, YJM789 [
18,
19]. These strains were selected due to their substantial genetic diversity. In wild populations, including those of
S. cerevisiae [
20,
21], most individuals are heterozygous, and the S288c/YJM789 cross may therefore resemble conditions in the wild more closely than homozygous strains do. Although the large number of polymorphisms between the strains allows high-resolution genotyping, heterozygosities could also affect meiotic recombination [
22]. Nevertheless, in the S288c/YJM789 cross, the genomic distribution of recombination events has been shown not to be markedly perturbed [
15,
16]. It has also been observed that certain allelic combinations of the mismatch repair (MMR) genes are incompatible, leading to elevated mitotic mutation rates in segregants of intra-species yeast hybrids. Strains with an S288c allele of
MLH1 in combination with the SK1 (another
S. cerevisiae strain) allele of
PMS1 show an approximately 100-fold higher mutation rate in the
lys2-A14 mutator assay [
23]. This observation is consistent with the central role that
MLH1 and
PMS1 play in MMR. YJM789 carries the ancestral form of both genes and is therefore compatible with S288c and SK1. Thus, we do not expect the progeny of the S288c/YJM789 cross to show elevated mutation rates [
23,
24].
We allowed each of the dissected spores to germinate and divide mitotically, and then separated the two resulting cells under a dissection microscope (Materials and methods). The four pairs of mother and daughter cells arising from each tetrad were genotyped using tiling microarrays and a supervised modality of the
ssGenotyping algorithm [
25], trained on a large set of published data [
16]. A total of four tetrads were analyzed. Markers where PMS occurred (PMS markers) were identified by comparing the genotypes from mother and daughter cells in each pair (Figure ). For each identified PMS event in the two tetrads with the most events, conventional Sanger sequencing was performed as validation, and no false positives were discovered.
Among the four tetrads, we found a total of 52 markers where PMS occurred (18, 6, 17, and 11 per tetrad; Additional file
1). This constitutes 1.2% of the overall number of markers involved in recombination events (Additional files
2 and
3). There were four instances in which PMS occurred in more than one marker in the same recombination event (for example, Figure ). PMS events were present in more than 9% of the overall recombination events: 46 of the total 499 COs and NCOs had at least one marker exhibiting PMS (Additional file
3). Furthermore, COs containing no converted markers presumably correspond to recombination events in which heteroduplex DNA contained no polymorphic positions, and which therefore could not produce gene conversion or PMS. In fact, the inter-marker spacing at the flanks of these COs was considerably larger than a typical inter-marker interval (median inter-marker spacing of 2.1 kb versus 78 bp). If such COs are set aside, the proportion of recombination events with at least one PMS marker increases to 10.6%. The high number of recombination events where PMS occurred across the genome indicates that PMS is a widespread phenomenon in recombination and a significant contributor to allelic diversity during meiosis.
Although the MMR machinery that resolves mismatches during the formation of COs or NCOs is thought to be the same [
4], it has been observed that a fraction of COs presents higher PMS frequencies [
26]. Whether PMS occurs more frequently in COs overall or in NCOs has not been tested. Out of the 46 PMS events, 28 occurred in COs and 18 in NCOs. Notably, this ratio did not significantly differ from the overall genomic CO to NCO ratio observed (336 COs:163 NCOs; Additional file
3; Fisher exact test,
P = 0.33). Thus, our data do not suggest that the efficiency of the MMR machinery depends on whether the heteroduplex is resolved towards a CO or an NCO.
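As an illustration of this comparison, the following sketch implements a two-sided Fisher exact test using only the Python standard library. The layout of the 2×2 table (events with PMS versus events without PMS, split by CO and NCO) is an assumption; the text does not state how the contingency table was constructed, so the resulting P-value need not match the reported one exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts from the text: 28 COs and 18 NCOs with PMS; 336 COs and 163 NCOs overall.
# Assumed table: rows = with / without PMS, columns = CO / NCO.
p_value = fisher_exact_two_sided(28, 18, 336 - 28, 163 - 18)
print(f"P = {p_value:.2f}")
```

Under this assumed table the test gives no evidence that PMS is distributed differently between COs and NCOs.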
Interestingly, we observed that markers where PMS occurred tended to be at the ends of gene conversion tracts (Figure S1 in Additional file
4). Only six PMS events were not at the end of a tract. To test whether this observation statistically deviates from a scenario in which PMS occurs uniformly along conversion tracts, we focused on the 26 tracts containing at least one PMS marker and consisting of three or more markers. (Tracts smaller than three markers have only terminal markers.) Among these 26 events, there were 20 (76.9%) with a terminal PMS marker, and together they contained 32 PMS markers, of which 22 were terminal. If we assign 32 PMS events uniformly at random to this set of events, the probability of seeing such a high fraction of events with a terminal PMS marker is <0.001 (Figure S2 in Additional file
4; Materials and methods). This provides strong evidence that PMS occurred predominantly at terminal markers.
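The logic of such a simulation can be sketched as follows. This is not the procedure from Materials and methods: it assumes that the number of PMS markers per tract is held fixed and that their positions are drawn uniformly without replacement within each tract. The function name and the toy tract sizes below are invented for illustration; the real tract sizes are in the additional files.

```python
import random


def terminal_pms_pvalue(tract_sizes, pms_counts, observed_terminal,
                        n_sim=10000, seed=0):
    """Monte Carlo estimate of the probability that at least
    `observed_terminal` tracts carry a PMS marker at a terminal position
    when PMS markers are placed uniformly at random within each tract."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        terminal_tracts = 0
        for size, k in zip(tract_sizes, pms_counts):
            positions = rng.sample(range(size), k)  # k distinct marker indices
            if 0 in positions or size - 1 in positions:
                terminal_tracts += 1
        if terminal_tracts >= observed_terminal:
            hits += 1
    return hits / n_sim


# Toy input (hypothetical): five tracts of 3 to 20 markers, six PMS markers in total
toy_sizes = [3, 5, 8, 12, 20]
toy_counts = [1, 1, 2, 1, 1]
toy_p = terminal_pms_pvalue(toy_sizes, toy_counts, observed_terminal=4)
print(toy_p)
```

Applied to the observed data, `observed_terminal` would be 20 and the two lists would describe the 26 tracts analyzed here.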
It has been previously shown that neighboring polymorphisms influence the PMS frequency of a given marker [
27,
28]. To investigate the effect of surrounding heterozygosities, we first considered the polymorphisms around PMS markers independently of whether they also showed PMS. We found that 100-bp windows centered on the PMS markers were twice as likely to not contain any other polymorphism as windows centered on markers not showing PMS (Figure , compare top and bottom panels; Fisher exact test,
P = 2.5 × 10⁻¹⁰). A range of other window sizes (50 to 300 bp) gave qualitatively similar results. Since the ends of gene conversion tracts tend to have lower marker density (Figure , compare middle and bottom panels), the preferential position of PMS markers at the end of tracts might have been the cause of the observed relative isolation of PMS markers. This turned out not to be the case: the median distance to the nearest polymorphism for PMS markers was 49 bp larger than for all end-of-interval markers (Figure ; Wilcoxon test,
P = 0.002). Thus, PMS markers appear to be better separated from neighboring polymorphisms than would be expected by chance, even given their positioning at the end of conversion tracts. This suggests that the MMR machinery may be more responsive to heteroduplex regions with a higher density of mismatches.
The MMR machinery repairs mismatches by excising a segment of one of the two single strands, often as large as 900 bp [
27]. Therefore, adjacent mismatches, if present within the excised fragment, can be co-repaired. If MMR takes place over large tracts of heteroduplex DNA - that is, if repair does not take place one mismatch at a time - then it is also conceivable that tracts of heteroduplex DNA that contain multiple mismatches may be left unrepaired. In our data, consecutive PMS markers in the same conversion tract may provide evidence of this. Altogether, one recombination event involved two PMS markers, and two involved three (Figure ; Figure S3 in Additional file
4). Remarkably, markers where PMS occurred in the same conversion tract were always adjacent to each other, with no other polymorphisms in between (Figure ; Figure S3 in Additional file
4). Among these, the shortest distance between neighboring PMS markers was 43 bp, and the longest was 488 bp. All of these events were at the end of a conversion tract. Having established that a high fraction of the observed PMS events occurred in the final marker of a recombination tract, we next asked if the observed end-of-tract multi-marker PMS events were likely to be mechanistically linked or were rather due to chance co-localizations of independent PMS events. Using, as before, the 26 tracts with three or more markers that were observed to contain a PMS marker, we ran a second simulation. This simulation included end-of-event bias: simulated PMS markers were assigned to internal and terminal positions in proportions similar to those observed in the actual data (see Materials and methods). In this second simulation, the probability of seeing three or more recombination events with end-of-tract multi-marker PMS was very low (
P < 0.001). This suggests that the occurrence of PMS in a given marker increases the frequency of PMS in the surrounding markers, at least for terminal PMS events. This finding is consistent with previous observations made at the budding yeast
HIS4 locus [
27].
In our whole dataset, we observed only one instance in which two different spores had PMS in the same marker. Both of these PMS events were located in the two spores involved in a single CO, resulting in 4:4 aberrant segregation (Figure ). Such a pattern of symmetric heteroduplex tracts is expected to be the result of branch migration of a Holliday junction during DSB repair. Aberrant 4:4 segregation resulting from symmetric heteroduplex DNA was one of the original predictions of the Holliday model of recombination. However, since aberrant 4:4 segregation is rarely observed in
S. cerevisiae, Holliday junctions are currently thought to be resolved before branch migration [
6]. The rare cases of observed aberrant 4:4 segregation have been alternatively explained as the result of two independent recombination events involving all four chromatids [
6]. Although the event observed here has a complex topology (Figure ), the fact that only two chromatids show recombinant markers suggests that it resulted from symmetric heteroduplex tracts during the repair of a single DSB.
Having explored the context in which PMS markers are located in terms of other polymorphisms, we next considered the types of polymorphisms where PMS occurred. Insertions or deletions (indels) accounted for 9.4% of the polymorphisms in gene conversion regions, a similar proportion to that of indels present between the whole genomes of S288c and YJM789 (approximately 9.0%) [
29]. Of the markers where PMS occurred, 98.1%, or all but one (a 29-bp indel), were SNPs. If one treats the 52 PMS markers as independent Bernoulli draws from the pool of markers involved in a recombination event, then the chance of drawing 0 or 1 indels is 0.03. However, given the preferential occurrence of PMS at the ends of conversion tracts, if only such positions are considered, the fraction of indels drops to 6.4%, and the probability of observing 0 or 1 indels in 52 events rises to 0.15. Previous work has shown that the MMR machinery binds 1-bp indel mismatches with an affinity similar to that for the most strongly bound SNP mismatch [
30]. Other indel mismatches have been observed to be bound with lower affinity than 1-bp indels [
30]. Furthermore, null mutations in the main MMR proteins have been observed to exert a similar effect on the repair frequency of SNP and small indel mismatches [
4]. From our genome-wide PMS data we cannot conclude - with statistical significance - whether indel mismatches are better repaired than SNP mismatches.
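The Bernoulli calculation in this paragraph can be reproduced approximately with a simple binomial model, sketched below. The function name is illustrative, and the indel fractions are the rounded percentages quoted in the text (9.4% and 6.4%). With these inputs the model gives roughly 0.04 and 0.15; the small gap from the reported 0.03 presumably reflects rounding of the inputs or details of the original calculation.

```python
from math import comb


def prob_at_most(k_max, n, p):
    """P(X <= k_max) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))


n_pms = 52  # PMS markers observed across the four tetrads

# Indel fraction among all markers in gene conversion regions (9.4%)
p_all_markers = prob_at_most(1, n_pms, 0.094)

# Indel fraction when only end-of-tract positions are considered (6.4%)
p_tract_ends = prob_at_most(1, n_pms, 0.064)

print(f"all markers: {p_all_markers:.3f}, tract ends: {p_tract_ends:.3f}")
```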
To gain further insight into the sequence characteristics of PMS events and their evolutionary hallmarks, we focused on SNPs and analyzed the type of bases that are involved in PMS. Any given SNP can give rise to two possible mismatches, depending on which base is resected during recombination. As shown in Figure , at markers where PMS occurred, we observed SNPs that could generate all possible mismatches (Additional file
1). However, the relative frequencies of SNP types at PMS markers differed strongly from those of all SNPs found in recombination events (Figure ; Fisher exact test,
P = 4 × 10⁻⁹). SNPs that generate C/C or G/G and A/A or T/T mismatches are, respectively, 5.0 and 1.8 times more frequent in PMS events than in overall recombination events. On the other hand, SNPs giving rise to A/G or C/T mismatches are approximately as frequent as in recombination events, and SNPs producing A/C or G/T mismatches are only half as frequent. These deviations in the relative frequencies do not seem to be caused by the preferential occurrence of PMS at the end of conversion tracts, since the different SNP classes are uniformly distributed along tracts (Figure S4 in Additional file
4). We thus find clear differences in the genome-wide PMS rates between all four SNP classes.
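The grouping of SNPs into four classes follows directly from base pairing: in heteroduplex DNA, a base from one parental strand faces the complement of the other parent's allele. A minimal sketch of this mapping is given below; the function and variable names are illustrative, and both alleles are assumed to be given on the same reference strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_class(allele1, allele2):
    """Return the pair of heteroduplex mismatches that a SNP can form,
    for example 'A/C or G/T' for an A<->G polymorphism."""
    if allele1 == allele2:
        raise ValueError("a SNP requires two different alleles")
    # Strand carrying allele1 paired with the complementary strand of allele2
    m1 = "/".join(sorted((allele1, COMPLEMENT[allele2])))
    # Strand carrying allele2 paired with the complementary strand of allele1
    m2 = "/".join(sorted((allele2, COMPLEMENT[allele1])))
    return " or ".join(sorted((m1, m2)))


# All twelve ordered allele pairs collapse into the four classes used in the text
classes = {mismatch_class(a, b) for a in "ACGT" for b in "ACGT" if a != b}
print(sorted(classes))
```

Transitions (A<->G, C<->T) map to the A/C or G/T class, which is the only class that pairs a purine with a pyrimidine.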
The enrichment of SNPs generating C/C or G/G mismatches likely reflects the known relative inefficiency of C/C repair [
31,
32]. At the
ARG4 and
HIS4 loci, C/C repair has been reported to be between three- and five-fold less efficient than the repair of other mismatches [
7,
8]. Similar efficiency reductions have been found in other fungi (
Schizosaccharomyces pombe) [
33], in animals [
34] and in prokaryotes [
35]. It has even been proposed that C/C mismatches are repaired by a different molecular machinery than other mismatches [
36]. It is also known that the best-repaired mismatch is G/T. Binding studies
in vitro have revealed that the MSH2-MSH6 complex, a central player in MMR, has the highest affinity for G/T mismatches [
30,
34]. The efficiency with which other mismatches are repaired is less clear, especially
in vivo. A/A and T/T mismatches, for example, have been reported to be repaired less efficiently in mitotic assays [
31], but also as efficiently as other mismatches during meiosis [
7,
9]. Here we find clear differences in the genome-wide PMS rate between all four SNP classes (Figure ), suggesting that each mismatch class is repaired with a different efficiency
in vivo.
Interestingly, the PMS frequency of the mismatch classes observed here was inversely related to the overall frequencies of the associated SNP classes in
S. cerevisiae (Figure ). This was not only true for SNPs between S288c and YJM789, but also for SNPs among several recently sequenced yeast strains [
37]. The distribution of SNP classes in the population reflects, at least in part, the frequency with which the MMR machinery encounters the mismatches caused by such SNPs. The fact that the mismatches associated with the most common SNP classes are also the most efficiently repaired may therefore be a consequence of selective pressure favoring removal of mutation-associated mismatches. MMR protein variants that are better at repairing common mismatches would be selected for. There is support for this hypothesis in
Escherichia coli, where the frequency of different DNA polymerase III errors
in vitro is positively related to the repair efficiency of mismatches in phage genomes [
38]. The same category of SNPs that is most numerous in the budding yeast genomes is also the only one to form purine/pyrimidine mismatches. Therefore, it may indeed be the case that the MMR machinery has evolved to more readily recognize such mismatches.