This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Meiotic recombination alters frequency and distribution of genetic variation, impacting genetics and evolution. In the budding yeast, DNA double strand breaks (DSBs) and D loops form either crossovers (COs) or non-crossovers (NCOs), which occur at many sites in the genome. Differences at the nucleotide level associated with COs and NCOs enable us to detect these recombination events and their distributions.
We used high throughput sequencing to uncover over 46 thousand single nucleotide polymorphisms (SNPs) between two budding yeast strains and investigated meiotic recombinational events. We provided a detailed analysis of CO and NCO events, including number, size range, and distribution on chromosomes. We have detected 91 COs, very close to the average number from previous genetic studies, as well as 21 NCO events and mapped the positions of these events with high resolution. We have obtained DNA sequence-level evidence for a wide range of sizes of chromosomal regions involved in CO and NCO events. We show that a large fraction of the COs are accompanied by gene conversion (GC), indicating that meiotic recombination changes allelic frequencies, in addition to redistributing existing genetic variations.
This work is the first reported study of meiotic recombination using high throughput sequencing technologies. Our results show that high-throughput sequencing is a sensitive method to uncover at single-base resolution details of CO and NCO events, including some complex patterns, providing new clues about the mechanism of this fundamental process.
Meiosis is essential for eukaryotic sexual reproduction and reduces the number of chromosomes by half to generate haploid cells [1-3]. To ensure proper meiotic homolog segregation, the homologs must recognize and pair with each other in early prophase I [1-3]. It is thought that a key pairing mechanism is via DNA heteroduplex formation, which is intimately coupled with the initiation of meiotic recombination. One major type of outcome of meiotic recombination is crossover (CO), which involves the exchange of flanking markers, as well as possible gene conversion (GC) [4,5]. Another result of recombination is GC without exchange of flanking markers (Non-CO, or NCO) [4,5]. Meiosis is also the process that re-distributes the genetic variations in a eukaryotic population. The extent of meiotic recombination directly impacts the frequency of specific combinations of alleles. Because of the effect of meiotic recombination on the distribution of genetic diversity, meiosis is thought to have contributed to the extraordinary diversity and evolutionary success of eukaryotes [6-10].
Meiotic recombination has been studied extensively using model systems, including the budding and fission yeasts, Drosophila melanogaster, Caenorhabditis elegans, mammals, Arabidopsis thaliana, and maize [1-3]. In the budding yeast Saccharomyces cerevisiae, molecular and biochemical studies have identified key intermediates of meiotic recombination, starting with DNA double strand breaks (DSBs) and D-loops [4,5]. A portion of the D-loops proceeds to form double Holliday junctions (DHJs), which are then resolved largely to COs. Some D-loops undertake another pathway to form COs, possibly via single Holliday junctions (SHJs), as seen in the fission yeast. A third option for the D-loops is the repair of DSBs without COs, resulting in NCO/GC events if the two recombining DNAs are not identical.
Because recombination occurs at many sites in the genome, it is important to investigate recombination at the whole-genome level. Genome-wide genetic detection of crossovers has been done in many genetic systems, resulting in the construction of genetic maps, as well as producing other information. However, previous molecular studies usually relied on the use of naturally occurring (such as the one at the HIS4 locus) and artificially generated (such as ones induced by the HO endonuclease) recombination hotspots as substrates; therefore, the molecular details of crossovers are not available on a genome-wide level. In addition, NCO/GC has been investigated using a small number of markers or by inference at a population level. Recently, meiosis between two strains of the budding yeast has been analyzed using microarrays, providing valuable information on the frequency of CO and NCO events on a genome-wide scale.
As an alternative way to analyze meiotic recombination at the DNA level on a whole-genome scale, we have used the recently developed Roche GS20/FLX and Illumina sequencing technologies. To obtain a large number of DNA polymorphisms as markers for recombination, we used two strains of S. cerevisiae that have sequenced genomes: S288C and RM11-1a [15,16], which were estimated to have 0.5-1% sequence divergence distributed throughout the genome. Here we report our results from high-throughput sequencing of both the S288C and RM11-1a (hereafter referred to as RM11 for convenience) strains and four meiotic products. Over 46 thousand single nucleotide polymorphisms (SNPs) were revealed by comparison and further parsing of the two genomic sequences. Armed with these markers, we were able to detect COs, NCOs and other recombination events in meiotic products (spores) from a diploid generated by crossing S288C with RM11.
We compared the S288C and RM11 genomic sequences and recognized 62,324 putative SNPs; however, our preliminary analysis by sequencing PCR products using the conventional dideoxynucleotide method indicated that 101 putative SNPs were actually sequencing errors in the S288C or RM11 sequences (data not shown). Therefore, we re-sequenced the S288C (12× coverage) and RM11 (15× coverage) genomic DNAs using the Illumina technology and obtained > 4.4 and 5.2 million reads, respectively (Table 1; sequence data to be submitted to GenBank). These reads covered 94% and 93% of the respective public genomic sequences and provided independent verification of 46,487 SNPs (available upon request) that were previously detected from the public sequences. In addition, we found 803 and 1104 errors (Table 1) in the public S288C and RM11 sequences, respectively (available upon request), corresponding to previously identified SNPs between these sequences. Because the S288C strain differs only slightly from RM11 (estimated divergence of 0.5-1%), the vast majority of the public sequences are identical. The sequences that agree between the two strains should be more reliable because they are supported by both sequencing projects. However, there is a very low probability that a small number of bases might be wrong. Therefore, we also compared the sequences that are in agreement between S288C and RM11 with our new data. Using our data with consistent results from at least 2 reads, we found that there were indeed only a very small number of errors, 116 and 242 in the previously reported S288C and RM11 sequences, respectively, resulting in the identification of 358 new SNPs (available upon request). Our data provide strong support for over 46 thousand SNPs, which will facilitate further molecular genetic and genomic studies using these two yeast strains.
To obtain a diploid with a large number of sequence polymorphisms, we crossed S288C with RM11; then we induced meiosis in the diploid using a standard protocol, and obtained a number of tetrads (asci) with meiotic spores (not shown). We cultured one set of four spores in a rich medium and isolated DNAs from these four cultures. These DNAs were sequenced using the 454 technology, resulting in approximately 300,000 to 416,000 reads, or 3.6× to 4.9× coverage, of each of the four meiotic products (Table 1).
Because the 454 sequences are relatively long, ranging from ~100 to > 170 bp, we thought it would be informative to assess the feasibility of performing assembly of the new sequence data as a strategy for de novo sequencing of genomes. We assembled the 454 reads from our four meiotic samples separately to test the effect of read length on assembly since the sequences from spore 4 had only shorter reads of ~100 bases, whereas the sequences of the other meiotic products had longer reads. We found that the assembly of data from any of the first three spores with longer reads gave much longer contigs (t test, p << 0.001) and a higher coverage than the short reads from spore 4, using the S288C genome as a reference (Table 2). Next, we pooled the data from two, three or four spores, and performed assembly again; we found that the data from any two of the first three spores allowed the assembly of much larger contigs and greater coverage of the genome than data from single spores (Table 2). With the combination of data of the first three spores, the assembly yielded longer contigs, but little increase in the coverage of the genome. The addition of the short reads from spore 4 primarily resulted in many more short contigs. These results indicated that ~10× coverage from the longer 454 reads provided > 94% coverage of the yeast genome.
Using the > 46 thousand SNP markers that we have verified, we determined chromosomal regions that are primarily of the S288C or RM11 strain backgrounds, respectively (Figure 1A), resulting in the identification of 91 COs (4550 cM, close to the reported map distance of 4884 cM) (Figure 1A, Additional file 1 - Table S1). As shown in Figure 1A, each of the 16 yeast chromosomes had 2 to 11 COs, with larger chromosomes having more COs than smaller chromosomes. This observation is well supported by a recent study based on 4161 COs resulting from 46 meioses (Additional file 1 - Figure S1), in which CO number is nearly linearly proportional to chromosome size (squared correlation coefficient R^2 = 0.985). In addition, the genotypes of the four meiotic products indicated that the four meiotic chromatids had participated in 35, 44, 52, and 51 COs, respectively. Among the 91 CO events, 37 did not show a detectable GC (details not shown), 48 were associated with a single detected GC (see Figure 1B for an example), 5 were associated with two GCs (1:3 and 2:2 [or 3:1 and 2:2]; see below for more details; see Additional file 1 - Figures S2 and S3), and the remaining 1 had a complex GC pattern (see below for additional information; see Additional file 1 - Figures S2 and S4). Three COs even had sequence changes in three of the four meiotic products. A subset of COs was verified by PCR and the results (Additional file 1 - Figure S5) agreed with the high-throughput sequencing results.
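The 4550 cM figure is consistent with the standard convention that one crossover per meiosis corresponds to 50 cM of map distance. A minimal sketch of that bookkeeping follows; it uses only the counts reported in this paragraph, and the variable names are illustrative rather than taken from the original analysis.

```python
# Counts reported in the text for the single sequenced tetrad.
total_cos = 91
cos_per_chromatid = [35, 44, 52, 51]  # COs in which each meiotic product took part
gc_classes = {"no detectable GC": 37, "one GC": 48, "two GCs": 5, "complex GC": 1}

# Convention: one CO in a meiosis contributes 50 cM of map distance.
CM_PER_CO = 50
map_length_cm = total_cos * CM_PER_CO

# Each exchange joins two chromatids, so the per-chromatid counts
# are expected to sum to twice the number of COs.
chromatid_total = sum(cos_per_chromatid)

# The four GC classes should partition the 91 COs.
gc_total = sum(gc_classes.values())

print(map_length_cm, chromatid_total, gc_total)  # 4550 182 91
```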
In addition, by comparing sequences from all four meiotic products, we detected 21 putative GC events not associated with CO (Additional file 1 - Table S2). To verify the reliability of these calls, we analyzed the DNA sequences at all 21 putative GC sites using PCR and conventional Sanger DNA sequencing. The PCR and sequencing results were in complete agreement with the Roche GS20/FLX and Illumina results. The results indicated that the four meiotic chromatids had 7, 6, 4, and 4 detected GCs. Because the two yeast genomes are ~99% identical, the observed GC events were likely fewer than the actual recombination/pairing events. We estimated the possible number of undetected NCOs in a way similar to that in a recent study. Among 91 COs discovered in this analysis, 37 were detected using flanking SNP information, but did not show a detectable GC due to the lack of a SNP. If a similar fraction (37/91 = 0.407) of NCOs was not detected due to the lack of SNPs, the estimated total number of NCOs would be 30 (= 21 × 1.407). Therefore, our genome sequencing results indicated that there were a significant number of NCO (GC) events, resulting in a change of allelic frequency.
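The estimate above can be reproduced in a few lines. This sketch simply mirrors the scaling written in the text (detected NCOs multiplied by one plus the fraction of COs lacking a detectable GC); the variable names are illustrative.

```python
detected_cos = 91
cos_without_gc = 37   # COs found from flanking SNPs but with no SNP in the GC tract
detected_ncos = 21

# Fraction of COs whose conversion tract went undetected for lack of a SNP.
undetected_fraction = cos_without_gc / detected_cos

# Scaling used in the text: 21 x (1 + 0.407).
estimated_ncos = detected_ncos * (1 + undetected_fraction)

print(round(undetected_fraction, 3), round(estimated_ncos))  # 0.407 30
```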
The DNA of spore 4 was analyzed earlier than others and the 454 reads had shorter lengths, resulting in a reduced coverage of the SNPs. One effect of the reduced coverage was that a crossover involving spore 4 probably had more inaccurate border(s); nevertheless, all COs involving spore 4 were still detectable because flanking markers were still observed. Because NCOs were detected using the SNP information for each spore in the chromosomal context, reduced SNP coverage in spore 4 likely caused a decrease in NCO detection, providing another possible explanation for under-estimation of the NCO number.
From the sequence information, we estimated the minimum and maximum sizes of the COs, as illustrated in the example shown in Figure 1B. The maximum possible lengths of crossover regions (defined by the closest detected markers) ranged from 164 bp to 10,637 bp. As this could be an over-estimation due to the limited SNP information, we also estimated the minimum size, as defined by the detected SNPs within the CO regions; as often there was only one SNP in the CO region, the minimum sizes for these were as small as one base pair. Therefore, median sizes of COs, estimated as the average of their minimum and maximum sizes, were used for statistical analysis (Figure 2A). A histogram for over 4000 COs detected from 46 meioses in a recent study is shown in Figure 2B as a comparison (1252 COs without available length were not included). The distributions of distances between adjacent COs are displayed in Figures 2C and 2D for the 91 COs in this analysis and for those from the recent study.
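The size bounds described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name, the coordinate convention, and the choice to report a minimum of 0 when no SNP lies inside the region are all hypothetical.

```python
def co_size_estimates(left_flank, right_flank, internal_snps):
    """Return (min_size, max_size, midpoint) in bp for one CO region.

    left_flank / right_flank: positions of the closest markers with an
    unambiguous parental genotype on either side of the exchange.
    internal_snps: positions of SNPs detected inside the CO region.
    """
    # Upper bound: distance between the closest flanking markers.
    max_size = right_flank - left_flank
    # Lower bound: span of the SNPs inside the region (1 bp for a single SNP).
    if internal_snps:
        min_size = max(internal_snps) - min(internal_snps) + 1
    else:
        min_size = 0  # assumption: no internal SNP means no measurable minimum
    midpoint = (min_size + max_size) / 2
    return min_size, max_size, midpoint

# Toy coordinates, not data from the study.
two_snps = co_size_estimates(1000, 3000, [1500, 2100])
one_snp = co_size_estimates(1000, 3000, [1500])
print(two_snps)  # (601, 2000, 1300.5)
print(one_snp)   # (1, 2000, 1000.5)
```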
Nevertheless, at least 28 COs had minimal sizes of greater than 1.0 kb, with the largest minimum size being over 7 kb (Additional file 1 - Figure S6). Among the NCOs, the maximum sizes ranged from 1,109 bp to 7,575 bp, and the largest minimum size was over 6.5 kb (Additional file 1 - Figure S7). These results indicate that both CO and NCO can involve several kb, suggesting that DNA repair and/or heteroduplex formation can be rather extensive. In budding yeast, most COs are thought to result from double Holliday junctions (DHJs), and a small fraction of COs from single Holliday junctions (SHJs). If all DHJs are initiated with the same size and then each Holliday junction "randomly" expands to a larger size, the length distribution of COs should follow a Normal distribution. However, we found that the observed sizes of COs (Figure 2A) were not consistent with a Normal distribution, supporting a mixture of COs resulting from both DHJs and SHJs, since COs from SHJs might have different ranges of lengths resulting from a different pathway. This distribution is also supported by the same analysis on the data of the recent study using microarrays (Figure 2B).
A recent study mapped 1,306 DSB hot-spots along the chromosomes of a dmc1 mutant. To test whether the CO and NCO events were enriched for DSBs, we obtained the DSB density data in a region from 10-kb upstream to 10-kb downstream of each CO, and calculated the average DSB density in 1-kb intervals centered at every kb in this 20-kb region for the CO and NCO sets in our study, and found that the peak of highest average DSB density was very close to the site of CO or NCO (Figure 3). We also performed a similar analysis for the CO and NCO loci reported by Mancera et al., and found very similar patterns (Figure 3). These findings indicated that the loci of COs and NCOs were enriched for DSB hot-spots, consistent with the idea that DSBs are initiation sites of meiotic recombination.
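The windowed averaging described above can be sketched roughly as follows. The function name, the representation of the DSB data as (position, density) pairs on a single chromosome, and the toy values are all assumptions made for illustration; the original scripts and data layout are not described in the text.

```python
def average_dsb_profile(event_positions, dsb_points, flank=10_000, window=1_000):
    """Average DSB density in 1-kb windows centered every kb around events.

    event_positions: CO or NCO positions (bp) on one chromosome.
    dsb_points: (position, density) pairs on the same chromosome.
    Returns {offset_bp: mean density across events}.
    """
    half = window // 2
    profile = {}
    for offset in range(-flank, flank + 1, window):
        per_event = []
        for event in event_positions:
            center = event + offset
            values = [d for p, d in dsb_points if center - half <= p < center + half]
            if values:
                per_event.append(sum(values) / len(values))
        # Windows with no DSB data in any event are reported as 0.0 here.
        profile[offset] = sum(per_event) / len(per_event) if per_event else 0.0
    return profile

# Toy input: one event with DSB signal concentrated at its position.
events = [50_000]
dsb = [(50_000, 5.0), (50_100, 3.0), (55_000, 1.0)]
profile = average_dsb_profile(events, dsb)
peak_offset = max(profile, key=profile.get)
print(peak_offset, profile[peak_offset])  # 0 4.0
```

A peak at offset 0, as in this toy case, is the pattern the text reports for the real CO and NCO sets.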
In yeast, genetic studies indicated that COs might be distributed according to two models: Poisson and counting models for interference insensitive and sensitive COs, respectively [18,20]. We found that the observed COs (Figure 4A) did not agree with a Poisson model. Further analysis indicates that the distribution of COs is consistent with a mixture of interference insensitive and sensitive events (Figure 4B and 4C). In the budding yeast, plants, and humans, interference-sensitive COs are the majority, accounting for about 80% of the total CO events, with the remaining COs being interference insensitive [18,20-23]. Molecular genetic studies indicate that interference-sensitive COs are generated from the DHJ intermediates, and involve branch migration, resulting in potentially longer tracts of conversion. In the fission yeast, the interference insensitive pathway has been reported to go through a SHJ intermediate, having a distinct mechanism, although the mechanisms for interference insensitive COs in other organisms are not clear.
The maximum possible lengths of CO regions in this study covered a wide range. If interference insensitive COs in the budding yeast also involve a SHJ, it is possible that shorter COs might be generated by the interference insensitive pathway. To test this idea, we analyzed the genomic distribution of the COs that were shorter than 1.5 kb, and found them to be consistent with a Poisson distribution (Figure 4B); on the other hand, the COs that were longer than 1.5 kb did not have a Poisson distribution, consistent with the possibility that they were generated by the interference-sensitive pathway (Figure 4C). Analyses with cutoffs other than 1.5 kb were also performed (data not shown), but the statistical fit of the distribution of shorter COs to a Poisson model was not as good as that of the 1.5 kb cutoff; in addition, the proportion of shorter COs from the 1.5 kb cutoff was consistent with previous observations [18,20-23].
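The text does not state which statistic was used to compare CO distributions with a Poisson model. One simple way to look at the question, shown here purely as an assumed illustration, is to count COs in fixed-size bins and compare the counts with Poisson expectations; the variance-to-mean ratio is 1 for a Poisson process and falls below 1 when events are spaced more evenly, as interference would produce. The function names, bin size, and positions below are hypothetical.

```python
import math
from statistics import mean, pvariance

def counts_per_bin(positions, chrom_length, bin_size):
    """Number of COs falling in each fixed-size bin along one chromosome."""
    n_bins = math.ceil(chrom_length / bin_size)
    counts = [0] * n_bins
    for p in positions:
        counts[min(p // bin_size, n_bins - 1)] += 1
    return counts

def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson, below 1 for regular spacing."""
    m = mean(counts)
    return pvariance(counts) / m if m else float("nan")

def poisson_expected(counts, k_max):
    """Expected number of bins holding k = 0..k_max COs under a Poisson model."""
    lam, n = mean(counts), len(counts)
    return [n * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(k_max + 1)]

# Toy positions (bp) on a 400-kb chromosome, not data from the study.
even = counts_per_bin([50_000, 150_000, 250_000, 350_000], 400_000, 100_000)
clustered = counts_per_bin([10_000, 20_000, 30_000, 40_000], 400_000, 100_000)
print(even, dispersion_index(even))            # evenly spaced: ratio 0
print(clustered, dispersion_index(clustered))  # clustered: ratio above 1
```

With only a handful of COs per chromosome in a single tetrad, any such statistic has limited power, so a sketch like this is better suited to pooled data.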
Grieg et al. reported that sequence divergence between homologs could affect the frequency and distribution of COs. To test whether the sequence differences between S288C and RM11 had an effect on CO and NCO frequency, we divided the yeast genome into 10-kb intervals, and determined the distribution of numbers of SNPs in 10-kb intervals throughout the genome. We then obtained the positions of the COs and NCOs from Mancera et al., and plotted the average number of COs (or NCOs) as a function of the number of SNPs/10-kb (Figure 5). Our analysis showed that the regions with more SNPs did not have a reduced frequency of COs or NCOs. Therefore, the extent of divergence between the S288C and RM11 strains did not seem to adversely affect CO frequencies.
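The binning described above can be sketched as follows for a single chromosome. The function names and toy positions are assumptions for illustration; the original analysis covered all 16 chromosomes and used the event positions from Mancera et al.

```python
from collections import Counter, defaultdict

BIN_SIZE = 10_000  # 10-kb intervals, as in the text

def bin_counts(positions, bin_size=BIN_SIZE):
    """Count positions falling into each fixed-size interval."""
    return Counter(p // bin_size for p in positions)

def mean_events_by_snp_density(snp_positions, event_positions, n_bins, bin_size=BIN_SIZE):
    """Average number of events per interval, grouped by SNPs per interval."""
    snps = bin_counts(snp_positions, bin_size)
    events = bin_counts(event_positions, bin_size)
    grouped = defaultdict(list)
    for b in range(n_bins):
        # Counter returns 0 for intervals with no SNPs or no events.
        grouped[snps[b]].append(events[b])
    return {k: sum(v) / len(v) for k, v in sorted(grouped.items())}

# Toy positions (bp) on a 40-kb stretch, not data from the study.
snp_pos = [100, 200, 10_500, 20_100, 20_200]
co_pos = [150, 20_300, 20_900]
by_density = mean_events_by_snp_density(snp_pos, co_pos, n_bins=4)
print(by_density)  # {0: 0.0, 1: 0.0, 2: 1.5}
```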
As mentioned above, several COs have complex GC patterns (Figure 6 and Additional file 1 - Figures S2, S4 and S8), consistent with the repair of heteroduplex DNA after Holliday junction formation and resolution. In addition, we also found that three CO events had sequence alterations in three of the four meiotic products (Additional file 1 - Figure S5) [26,27]. Further analysis of these three events using PCR and sequencing showed that they indeed involved three chromatids (Additional file 1 - Figure S5), suggesting that DSBs were generated in two chromatids, and that the recombinogenic broken ends from the DSBs interacted with the other two chromatids during the recombination process, in four-chromatid events. In addition, we observed one region that had two adjacent GC regions without exchange of flanking sequences (Additional file 1 - Figure S9); this could most easily be explained by the resolution of a DHJ in a NCO fashion [26,27]. Although this was proposed in the original DSB repair model of recombination as a major pathway for NCOs, more recent models favor a non-Holliday junction pathway for NCOs. Our results suggested that DHJs might still be resolved to form NCOs, although at a frequency much lower than that of CO formation. We have also detected some evidence of post-meiotic segregation (PMS), which was an indication of unrepaired heteroduplexes that subsequently segregated during mitotic growth of the haploid meiotic products (Additional file 1 - Figure S10). From the high-throughput reads, we found one putative PMS event and another 4 PMS candidates with low quality from initial mapping. Sequence analysis of PCR products confirmed the PMS event, and the other candidates were rejected due to mis-alignment of repeat sequences.
A major difference between this study and the microarray studies published recently is that we determined the actual sequences of the meiotic products, rather than inferring the SNP genotypes on the basis of differential hybridization signals. Our approach can detect both SNPs and any other sequence information. It was reported that spontaneous mutation rates at specific loci could be 6-20 fold higher in meiosis than mitosis [28,29]. However, there has been no study of mutations during meiosis at a genome-wide scale. To search for spontaneous mutations, we examined the sequences throughout the genome for base substitution mutations and did not identify any sequences that differed from both parental sequences. Therefore, the mutation rate was below our detection limit of ~8 × 10^-8 per base per cell division. A recent genome-wide analysis of mitotic yeast cells provided an estimated rate of mitotic substitution of 3.3 × 10^-10 per base per cell division, suggesting that a 6-20 fold increase would not be detected by our analysis. Tandem repetitive sequences are known to have high mutation rates to form different copy numbers in cell division. Repeats with a higher copy number usually have higher mutation rates and lower appearance frequency (number of loci). However, the probability of such a mutation appearing is still too low to be observed in one generation of meiosis, as confirmed by our analysis of all 16 chromosomes in the 4 spores.
In summary, our studies have reliably verified over 46 thousand SNPs that were identified by comparison between the public S288C and RM11 genomic sequences and have uncovered errors in both the S288C and RM11 sequences, thereby removing 1907 previously reported SNPs and defining 358 new SNPs. These new sequence results are useful resources for further genomic and genetic studies using the budding yeast. We have uncovered detailed molecular information about meiotic recombination on a whole genome level using high-throughput sequencing. The numbers of CO and NCO events we detected were in very good agreement with previous studies; furthermore, we described complex patterns of COs that involved three chromatids, shedding new light on the process of meiotic recombination. Our studies provide a window into the nature of meiotic recombination at the DNA level throughout the genome and establish a whole-genome foundation for further molecular genetic studies of this fundamental process.
The Saccharomyces cerevisiae strains S288C and RM11 were grown overnight at 30°C on an agar plate with the YPD rich medium, and mixed on a YPD plate to allow mating to form diploid cells. Newly formed zygotes were identified under a light microscope and transferred to a clean area of the YPD plate using a micromanipulator, and grown to a colony at 30°C. The diploid strain was then grown on a YPD plate as a patch, and freshly grown cells were transferred to a sporulation plate. After one week, tetrads with four spores were detected under a light microscope and were partially digested in an aqueous solution of zymolyase. The partially digested tetrads were dissected to separate the spores under a light microscope using a micromanipulator, and the spores were allowed to grow for two days on a YPD plate into colonies. Cells from the colonies were used to inoculate liquid YPD cultures. Also, S288C and RM11 were similarly grown in YPD cultures to late exponential phase. The yeast cells were then harvested from the cultures and used for the isolation of genomic DNAs.
The public whole genome sequences of the S288C and RM11 strains were downloaded from NCBI (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/) and the Broad Institute (http://www.broad.mit.edu/), respectively. The four haploid meiotic products from the same meiosis were sequenced by using Roche GS20/FLX pyrosequencing technology to detect COs, NCOs and other recombination events. The S288C and RM11-1a genomic DNAs were sent to Fasteris (http://www.fasteris.com) for re-sequencing by using Illumina sequencing technology to verify SNPs between these two parental references. The public S288C and RM11-1a genomic sequences were used for BLAST analysis to map the newly obtained sequences from the high throughput shotgun sequencing technologies.
We applied a series of steps to map the high-throughput reads to the S288C and RM11-1a public sequences and to detect SNPs.
First, SNPs between S288C and RM11-1a were initially identified by the global alignment tool MUMMER. Ambiguous differences in repetitive and low complexity regions were ignored (the option "--mum" was used for anchoring matches uniquely on both reference genomes). In total, 62,324 SNPs were detected for all 16 pairs of chromosomes.
Second, some SNPs were false positives that could be attributed to sequencing errors in either the S288C or RM11 public sequence. Each sequencing error in a reference genome could give rise to an artifactual gene conversion. In order to identify and then exclude these pseudo-SNPs from our analysis, S288C and RM11 were re-sequenced by using Illumina sequencing technology. In total, 803 and 1104 nucleotides in the public S288C and RM11 sequences, respectively, were corrected by mapping of their re-sequenced reads. Of the 62,324 SNPs, 46,487 were verified for further analysis. A confirmed SNP in this analysis must have at least 2 Illumina reads from each of S288C and RM11. Those SNPs without coverage by Illumina reads on either S288C or RM11, due to uneven sequencing coverage or matches to repeats, were removed from the analysis. These filtered-out SNPs need to be verified by additional sequencing coverage.
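The coverage rule for a confirmed SNP can be sketched as a simple filter. The record layout and field names below are hypothetical, chosen only to illustrate the "at least 2 Illumina reads from each strain" criterion; the original scripts are not shown in the text.

```python
def split_verified(candidates, min_reads=2):
    """Separate candidate SNPs into verified and set-aside lists.

    A candidate is kept only if at least `min_reads` Illumina reads
    cover the site in both the S288C and the RM11 re-sequencing data.
    """
    verified, set_aside = [], []
    for snp in candidates:
        covered = (snp["s288c_reads"] >= min_reads
                   and snp["rm11_reads"] >= min_reads)
        (verified if covered else set_aside).append(snp)
    return verified, set_aside

# Toy candidates, not data from the study.
candidates = [
    {"chrom": "chr1", "pos": 1200, "s288c_reads": 5, "rm11_reads": 3},
    {"chrom": "chr1", "pos": 8400, "s288c_reads": 1, "rm11_reads": 6},
    {"chrom": "chr2", "pos": 310, "s288c_reads": 4, "rm11_reads": 0},
]
verified, set_aside = split_verified(candidates)
print(len(verified), len(set_aside))  # 1 2
```

Sites that fail the filter are set aside rather than declared errors, matching the note that they await additional sequencing coverage.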
Third, the reads from the four meiotic products were mapped to the public S288C and RM11 sequences by BLASTN to provide primary information of location and identity for further alignment. A global identity cutoff of 80% was applied to all read matches, from which reads with high identity to reference genomes were kept. Then nucleotide sequences of the references near each SNP and the reads of meiotic products nearby were selected for detailed multiple alignment by CLUSTALW.
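The text does not define "global identity" precisely. The sketch below assumes it means identical bases divided by the full read length (rather than by the length of the local alignment), and the tuple layout for hits is hypothetical.

```python
def passes_global_identity(n_identical, read_length, cutoff=0.80):
    """True if identical bases make up at least `cutoff` of the whole read.

    Assumption: identity is computed over the full read length, so a
    short but perfect local match on a long read does not pass.
    """
    return read_length > 0 and n_identical / read_length >= cutoff

# Toy hits as (read name, identical bases, read length); not data from the study.
hits = [("read_a", 98, 100), ("read_b", 70, 100), ("read_c", 120, 150)]
kept = [name for name, ident, length in hits
        if passes_global_identity(ident, length)]
print(kept)  # ['read_a', 'read_c']
```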
Last, whole genome mapping and visualization were applied to all 4 meiotic products near the SNPs. We developed a whole genome visualization tool, named inGAP, to display all homologous exchanges among meiotic products. The manuscript has been submitted (Ji Qi, Fangqing Zhao, Anne Buboltz and Stephan C. Schuster) and the software is available online at http://sites.google.com/site/nextgengenomics/ingap
We have also written an additional set of scripts to perform the bioinformatic analyses in this study. More information will be provided if requested.
JQ carried out the bioinformatics analysis; HM, AJW and YH performed the cell culture, DNA preparation and PCR experiments; LPT prepared DNA library and conducted Roche/454 sequencing; HM and JQ carried out the genomic studies and drafted the manuscript; HM, SCS and JQ produced the final version of the manuscript; HM and SCS designed the study. All authors read and approved the final manuscript.
Supplemental figures and tables. 10 supplemental figures are displayed for selected COs and GCs with PCR results. The positions of all 91 COs and 21 GCs are listed in two tables respectively.
We thank three anonymous reviewers of a previous version of this manuscript for their helpful comments. This sequencing-by-synthesis study was made possible through generous funding from the Department of Biology and the Huck Institutes of the Life Sciences, the Pennsylvania State University. A.J.W. and H.M. were partially supported by funds from Rijk Zwaan, the Netherlands. H.M. was partially supported by funds from Fudan University. J.Q. and S.C.S. were supported in part by the Gordon and Betty Moore Foundation. This project was also supported in part by a grant from the Pennsylvania Department of Health using Tobacco Settlement Funds appropriated by the US legislature. The Pennsylvania Department of Health specifically disclaims responsibility for any analyses, interpretations or conclusions.