Induction of chromosomal rearrangements at the URA3/CAN1 reporter region of chromosome V
Yeast cells harboring chromosome V rearrangements were generated by selective growth on plates containing l-canavanine and 5-FOA, as described in (10). To increase the efficiency of gross chromosomal rearrangement (GCR) formation, we used an RDKY3615-derived strain, RDKY3671, which carries a temperature-sensitive allele (rfa1-t33) of the RFA1 gene and a deletion of XRS2 by insertion of HIS3 (10). Rfa1 is a subunit of the single-strand DNA-binding protein RFA, which is required for several phases of DNA replication and repair, and Xrs2 forms a heterotrimeric endo/exonuclease complex with Mre11 and Rad50 that is required for both homology-dependent double-strand break repair and nonhomologous end-joining. Mutations in each of these genes increase the rate of GCRs. The nonessential arm of chromosome V was used as a reporter region for chromosomal rearrangements. In strain RDKY3615, the
HXT13 gene located distal to the
CAN1 gene on chromosome V was replaced by the
URA3 gene (
Figure S1A). Since
CAN1 expression sensitizes cells to
l-canavanine and
URA3 expression makes cells sensitive to 5-fluoroorotic acid (5-FOA), cells that have inactivated both
CAN1 and
URA3 can be selected for by growth on plates containing these drugs. The resulting colonies usually contain deletions in the
CAN1/URA3 region of chromosome V (
10).
Chromosomal breakpoints induced by this method are expected to localize within the 12.1-kb nonessential region within
CAN1 and between
CAN1 and the first essential gene
PCM1 on the left arm of chromosome V (
Figure S1A). PCR amplification using primer pairs directed to the essential (primers 5A/5B) and nonessential (primers 5C/5D) regions confirmed loss of the
CAN1 locus in our strain RDKY3671GCR (
Figure S1B).
Generation of ChromPET library
MmeI is a class II restriction endonuclease that cuts DNA 20/18 nt away from its recognition site. Genomic DNA fragments of 1.5–2 kb in length were circularized by ligation to an adaptor sequence that carries two outward-facing MmeI recognition sequences. Digestion of the circularized DNA with MmeI therefore leaves two tags from the ends of the genomic fragment, termed Chromosomal Paired End Tags (ChromPETs), linked to the adaptor sequence. A ChromPET library, consisting of pairs of tags from either end of a genomic DNA fragment separated by the T30MmeI adaptor, was thus constructed from strain RDKY3671GCR. Since MmeI occasionally cuts DNA at longer or shorter distances from the recognition sequence, we obtained a distribution of tag lengths of 12–20 bp (see
Supplementary Figure S5A).
Figure S5B shows that only tags 12–20 bp long could be mapped back to the genome reliably. This library was subjected to high-throughput sequencing on the Roche 454 platform, and 617 602 sequencing reads were obtained. Using the protocol shown in B and
Supplementary Data, we first identified reads that contained a perfect linker sequence and then took the flanking sequences as the paired-end tags (PETs) of a ChromPET. Reads that did not contain a perfect linker sequence (49 678 reads, 8.04% of total reads) were discarded. This yielded ~17 Mb of sequence data corresponding to 567 924 ChromPETs. After removing duplicate ChromPETs, i.e. ChromPETs with identical 5′ and 3′ tags, we were left with 489 479 (86.2%) ‘unique’ ChromPETs, which were then mapped back to the yeast reference genome using MegaBLAST (details in ‘Materials and Methods’ section). Of the unique ChromPETs, 380 987 (77.8%) had both ends mapping back to the yeast genome, 84 256 (17.2%) had only one end mapping back and 24 236 (5.0%) had neither end mapping back (Table 1).
| Table 1. Number of reads and identified ChromPETs for each category |
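The read-processing step described above (keep only reads with a perfect linker, take the flanking sequences as the paired tags, then collapse duplicates) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the linker string `LINKER` and the function names are hypothetical, since the actual T30MmeI adaptor sequence is not given in the text.

```python
# Hypothetical linker; the real T30MmeI adaptor sequence is not given in the text.
LINKER = "ACGTTGCAACGT"


def extract_chrompet(read, linker=LINKER):
    """Return (5' tag, 3' tag) flanking a perfect linker match, or None."""
    idx = read.find(linker)
    if idx == -1:
        return None  # no perfect linker: the read is discarded
    five_prime = read[:idx]
    three_prime = read[idx + len(linker):]
    return five_prime, three_prime


def unique_chrompets(reads, linker=LINKER):
    """Extract ChromPETs and drop duplicates with identical 5' and 3' tags."""
    seen = set()
    unique = []
    for read in reads:
        pet = extract_chrompet(read, linker)
        if pet is None or pet in seen:
            continue
        seen.add(pet)
        unique.append(pet)
    return unique
```

Each surviving tag pair would then be aligned to the reference genome (MegaBLAST in the text); that step is outside the scope of this sketch.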
Identification of aberrant ChromPETs
Aberrant ChromPETs are those whose tags (i) link two different chromosomes (inter-chromosomal), (ii) are too close to or too far from each other on the same chromosome (intra-chromosomal insertions or deletions, respectively) or (iii) are inverted in orientation relative to each other on the same chromosome. While aberrant ChromPETs of types (i) and (iii) are easy to call, we had to use a statistical cutoff to call those of type (ii).
Since the ChromPET library was generated using DNA fragments of ~1.5–2 kb in size, any intra-chromosomal ChromPET whose inter-tag distance lies sufficiently far from this range should be classified as aberrant. To set the cutoff, we plotted the minimum inter-tag distance for all intra-chromosomal ChromPETs as a histogram (C). The distribution of inter-tag distances appears Poisson-like (C), with a median of 1118 bp and a median absolute deviation (MAD; Supplementary Data) of 172 bp. ChromPETs with inter-tag distances at least 5 MAD away from the median (≤258 bp or ≥1978 bp) were classified as aberrant ChromPETs.
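The cutoffs quoted above follow directly from the reported median and MAD. The short sketch below reproduces the arithmetic; it assumes the plain (unscaled) median absolute deviation, whereas the exact definition used in the study is given in the Supplementary Data, and the function names are illustrative only.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)| (unscaled, by assumption)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def distance_cutoffs(med, mad_value, k=5):
    """Return (lower, upper) bounds lying k MADs away from the median."""
    return med - k * mad_value, med + k * mad_value


# Median and MAD reported in the text: 1118 bp and 172 bp.
lower, upper = distance_cutoffs(1118, 172)
```

With the reported values, 1118 − 5 × 172 = 258 bp and 1118 + 5 × 172 = 1978 bp, matching the cutoffs in the text.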
Because of the short length of the tags (12–20 nt) and the presence of repeat sequences in the genome, ~30% of the tags map to multiple sites in the genome. Instead of using a heuristic to guess the most probable alignment of a tag, we examined all possible combinations of the 5′ and 3′ tag addresses of a ChromPET. If even one combination reported a normal linkage (i.e. the two tags map with an inter-tag distance between 258 bp and 1978 bp), the ChromPET was classified as normal. Even so, 6.92% of the ChromPETs could not be explained by a normal linkage (Table 2).
| Table 2. Number of uniquely mapped ChromPETs and their distribution into the indicated categories |
Of the aberrant ChromPETs, 27.89% mapped in the reverse orientation (both inter- and intra-chromosomal) and 24.53% mapped to different chromosomes in the correct orientation. A further 29.44% represented direct intra-chromosomal deletions, with tags mapping to the same chromosome but separated by >1978 bp in the reference genome, while 0.47% represented intra-chromosomal insertion events, with tags separated by <258 bp on the same chromosome in the reference genome. The remaining ChromPETs could not be assigned to a single category as defined above because their tag combinations fell into more than one category; these were designated ‘ambiguous’ ChromPETs.
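The classification logic described in this section (examine every combination of 5′ and 3′ tag addresses, call the ChromPET normal if any combination is normal, otherwise assign a category or mark it ambiguous) might be sketched as below. The address representation `(chromosome, position, strand)` and the rule that correctly oriented tags report the same strand are assumptions made for illustration; the text does not specify how orientation was encoded.

```python
from itertools import product

LOWER, UPPER = 258, 1978  # cutoffs from the text (median -/+ 5 MAD)


def classify_pair(a, b):
    """Classify one address pair; each address is (chrom, pos, strand).

    Assumption: tags in the expected orientation report the same strand.
    """
    if a[2] != b[2]:
        return "inverted"          # reverse orientation, inter- or intra-chromosomal
    if a[0] != b[0]:
        return "interchromosomal"  # different chromosomes, correct orientation
    distance = abs(a[1] - b[1])
    if distance >= UPPER:
        return "deletion"          # tags too far apart in the reference
    if distance <= LOWER:
        return "insertion"         # tags too close together in the reference
    return "normal"


def classify_chrompet(addrs5, addrs3):
    """Examine every combination of 5' and 3' tag addresses."""
    categories = {classify_pair(a, b) for a, b in product(addrs5, addrs3)}
    if "normal" in categories:
        return "normal"            # one normal combination is enough
    if len(categories) == 1:
        return categories.pop()
    return "ambiguous"             # combinations fall into more than one category
```

For example, a 5′ tag mapping to two sites is still called normal as long as one of those sites lies within the normal distance range of the 3′ tag.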
Prediction of aberrant linkages representing structural variation of the chromosomes
Some aberrant ChromPETs may result from artifactual intermolecular ligation between genomic fragments during library construction and/or from mis-mapping back to the genome. To reduce such false-positive calls, we required that multiple independent ChromPETs report on an aberrant linkage; rather than using an arbitrary number (e.g. two), we determined the required number of independent ChromPETs statistically. First, we calculated the number of normal and aberrant ChromPETs that cover each base pair in the genome and then determined the ratio of aberrant to normal ChromPETs at each base pair. This removes the bias toward repeat sequences in the genome, since such regions would also have high coverage by normal ChromPETs. D shows the coverage per base pair for aberrant and normal tags and the fold enrichment of aberrant ChromPET tags over normal tags for a 30-kb region of chromosome V. Next, we used a sliding-window analysis (window size 2000 bp, step size 200 bp) to survey the number of aberrant ChromPETs covering each base pair and the fold enrichment of aberrant over normal ChromPETs per base pair across the whole yeast genome. The distribution of these two variables across the entire genome is shown in
Figure S2. We then selected a cutoff of one MAD higher than the median for both aberrant chromPET coverage and fold enrichment over normal to determine areas of the genome with a high density of aberrant tags (the cutoffs are indicated in
Figure S2). This yielded 14 423 high-density windows (HDWs) of 2 kb each for further analysis. Since this is an intermediate step in the analysis pipeline, we chose permissive cutoffs so that even windows that are only moderately enriched in aberrant ChromPETs pass to the next stage.
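A simplified version of the sliding-window screen can be written as follows. It is a sketch under stated assumptions rather than the published procedure: it counts tag positions per window instead of computing per-base coverage, it adds a pseudocount of 1 to the normal count to avoid division by zero, and all function names are hypothetical. The window size (2000 bp), step (200 bp) and the median + 1 MAD cutoff are taken from the text.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation."""
    m = median(values)
    return median(abs(v - m) for v in values)


def window_stats(aberrant_pos, normal_pos, chrom_len, size=2000, step=200):
    """Return (start, aberrant_count, fold_enrichment) for each sliding window."""
    stats = []
    for start in range(0, max(chrom_len - size, 0) + 1, step):
        end = start + size
        n_ab = sum(start <= p < end for p in aberrant_pos)
        n_no = sum(start <= p < end for p in normal_pos)
        fold = n_ab / (n_no + 1)  # +1 pseudocount is an assumption of this sketch
        stats.append((start, n_ab, fold))
    return stats


def high_density_windows(stats, k=1):
    """Keep windows exceeding median + k*MAD on both measures."""
    counts = [s[1] for s in stats]
    folds = [s[2] for s in stats]
    count_cut = median(counts) + k * mad(counts)
    fold_cut = median(folds) + k * mad(folds)
    return [s for s in stats if s[1] > count_cut and s[2] > fold_cut]
```

Requiring both measures to exceed their cutoffs mirrors the rationale in the text: a repeat-rich window with many aberrant tags is not selected unless those tags are also enriched relative to normal tags.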
For each HDW, we identified all the aberrant ChromPETs that mapped to that window and the genomic locations to which the corresponding paired tags mapped. For example, in E, an HDW (W1) contains multiple tags belonging to aberrant ChromPETs whose paired tags reside in various ‘partner’ windows (W2, W3, … , W7). The partner window with the most linkages was designated the most-linked window (MLW).
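Finding the MLW for a given HDW amounts to tallying partner windows. A minimal sketch, assuming the partner window of each aberrant ChromPET in the HDW has already been determined (the labels and the function name are hypothetical):

```python
from collections import Counter


def most_linked_window(partner_windows):
    """Return (window, link_count, fraction_of_total) for the most-linked partner.

    partner_windows holds one partner-window label per aberrant ChromPET in the HDW.
    """
    counts = Counter(partner_windows)
    window, n_links = counts.most_common(1)[0]
    return window, n_links, n_links / len(partner_windows)
```

The returned fraction is the share of the HDW's aberrant ChromPETs that point to the MLW, which is the second quantity tested in the significance step that follows.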
The distribution of ChromPETs linking an HDW with an MLW across the whole genome is shown in Figure S3. To assess the statistical significance of our predicted aberrant linkages (between an HDW and the corresponding MLW), we developed a null model in which all the addresses of the aberrant ChromPETs were randomized, and aberrant linkages in this random population were identified (Figure S3). The linkages between an HDW and an MLW were clearly more frequent (
Figure S3A) and more specific for a single MLW (
Figure S3B) in the experimental (or observed) data set than in the random control. To call a significant aberrant linkage in the experimental data set, we required the number of aberrant ChromPETs linking an HDW to its partner MLW to be at least 3 MAD above the median of the experimental distribution; that is, at least 11 ChromPETs were required to link an HDW with its MLW (
Figure S3A). In addition, the number of ChromPETs linking these two windows was required to represent at least 35% of the total aberrant ChromPETs present in the HDW being interrogated (35% is again 3 MAD away from the median percentage for the experimental distribution shown in
Figure S3B). These cutoffs lie >9 SDs above the mean number of linking ChromPETs in the random model and >12 SDs above the mean percentage in the random model.
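The two criteria for calling a significant aberrant linkage can be expressed as a simple predicate. The thresholds (at least 11 linking ChromPETs, representing at least 35% of the aberrant ChromPETs in the HDW) are those stated in the text; the function name and argument names are illustrative.

```python
def is_significant_linkage(n_links, n_aberrant_in_hdw, min_links=11, min_fraction=0.35):
    """Apply both cutoffs from the text to one HDW-MLW linkage.

    n_links: aberrant ChromPETs linking the HDW to its MLW.
    n_aberrant_in_hdw: all aberrant ChromPETs mapped to the HDW.
    """
    if n_aberrant_in_hdw == 0:
        return False
    fraction = n_links / n_aberrant_in_hdw
    return n_links >= min_links and fraction >= min_fraction
```

A linkage supported by 12 of 20 aberrant ChromPETs passes both tests, whereas 11 of 40 fails the percentage criterion and 10 of 12 fails the count criterion.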
Streamlining of predicted aberrant linkages and summary of predictions
Applying these cutoffs generated 184 aberrant linkages. For a given structural alteration, multiple contiguous windows on each side of the alteration were commonly linked aberrantly to each other. Once we merged such overlapping predictions, we were left with 37 aberrant linkages (shown in Table S2). In addition, when a unique genomic locus is linked to a repeat element (such as a Ty element), the unique locus appears to be linked to multiple sites, one for each site where the repeat element maps. Moreover, such events are reported in both directions, doubling the number of reported linkages. After merging such unique locus-repeat element linkages, we were left with a total of 21 aberrant linkages. These linkages might represent any one of three types of structural variation: inter-chromosomal recombinations, intra-chromosomal insertions and intra-chromosomal deletions.
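The first merging step (collapsing linkages whose windows overlap on both sides of the same alteration) could be sketched as below. This is an illustrative single-pass, greedy approach and not necessarily how the merging was performed in the study; the tuple layout and function names are assumptions, and the sketch does not handle the merging of reciprocal or repeat-element linkages.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] touch or overlap."""
    return a_start <= b_end and b_start <= a_end


def merge_linkages(linkages, size=2000):
    """Merge linkages whose windows overlap on both sides.

    Each linkage is (chrom_a, start_a, chrom_b, start_b), with windows of `size` bp.
    Returns tuples (chrom_a, a_start, a_end, chrom_b, b_start, b_end).
    """
    clusters = []
    for ca, sa, cb, sb in sorted(linkages):
        for c in clusters:
            if (c[0] == ca and c[3] == cb
                    and overlaps(c[1], c[2], sa, sa + size)
                    and overlaps(c[4], c[5], sb, sb + size)):
                # Extend the existing cluster to cover the new windows.
                c[1], c[2] = min(c[1], sa), max(c[2], sa + size)
                c[4], c[5] = min(c[4], sb), max(c[5], sb + size)
                break
        else:
            clusters.append([ca, sa, sa + size, cb, sb, sb + size])
    return [tuple(c) for c in clusters]
```

With a 200-bp step and 2-kb windows, a single breakpoint is typically covered by several adjacent windows, so merging collapses a run of near-identical predictions into one interval pair.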
Of the 11 linkages reporting deletions, 10 pairs of linked sites were <3 kb apart. Because this distance is so close to the upper limit of normal inter-tag spacing (1978 bp), these linkages represent either small deletions in the genome or normal genomic architecture. The one remaining region, which carries a larger deletion, is shown in Table 3. (All the predictions are reported in
Table S2).
| Table 3. Summary of all tested aberrant linkages and their experimental validation results |
The inter-tag distances of all the insertion ChromPETs were found to be very close to the 258-bp inter-tag distance cutoff for defining normal ChromPETs (C) and could therefore represent normal genomic fragments. PCR analysis of one of the candidate ‘insertions’ confirmed the reference genomic architecture (data not shown). Since several normal ChromPETs were obtained that spanned the aberrant linkages, these apparent insertions were considered false positives and were not followed up further.
The most interesting aberrant linkages are the inter-chromosomal rearrangements. However, even in this group it quickly became apparent that several were anchored on one side by unique sequence but were linked on the other side to a repeat element (e.g. a Ty element or telomeric repeat), so that instead of true inter-chromosomal rearrangements they represent the insertion of a repeat element at the unique sequence site. We decided to validate by PCR both the true inter-chromosomal rearrangements and a subset of those in which a Ty element appeared to be inserted in a unique sequence (). Rearrangements in which repeat elements anchored both ends of the aberrant linkage were not validated because of the difficulty of designing unique PCR primers.
Detection of expected chromosomal rearrangements
As mentioned above, RDKY3671GCR lacks a nonessential portion of the left arm of chromosome V. The majority of tags that mapped to region 33 500–35 000 of chromosome V had paired tags that mapped to the ribosomal DNA (rDNA) region of chromosome XII (A). PCR amplification using a unique primer sequence from chromosome V and a primer from the rDNA repeat sequence specifically yielded a product from RDKY3671GCR genomic DNA but not from genomic DNA of other strains (B). A second PCR reaction on the amplified fragment using internal primers successfully amplified DNA (5F/12B. C), and sequencing of this amplicon identified the breakpoint. This breakpoint was flanked by a few base pairs of homology (microhomology) (D), consistent with previous reports that nonhomologous end-joining at sites of microhomology is responsible for most of the translocations obtained in this system (
10).
In the parental strain RDKY3671, XRS2 on chromosome IV was disrupted by insertion of the HIS3 gene, and we therefore expected to detect this aberrant linkage in our study. Indeed, many aberrant ChromPETs linked XRS2 on chromosome IV (region 1 212 600–1 219 000) to the HIS3 locus located on chromosome XV (A), and we did not detect any normal tags that contained XRS2 gene sequence. PCR primers based on the paired-tag sequences confirmed the HIS3 insertion in the XRS2 locus (B).
Detection of Ty element insertions in the URA3 gene and at several sites in chromosome III
Several of the aberrant chromosomal linkages determined computationally were anchored on one side at a unique map position but were computationally linked on the other side to multiple sites in the genome. An examination of these multiple linkage sites revealed that they mapped within Ty elements, raising the possibility that these rearrangements were pointing to Ty element insertions.
The first of these types of anomalous linkages mapped on one side to chromosome V (region 115 400–117 000) near the
URA3 locus and on the other side to Ty element sequences (C). The parental strains, RDKY3671 and RDKY3615, carry the mutant
ura3-52 allele, which is caused by a Ty element insertion within the coding region of the
URA3 gene (
14). PCR amplification across this region of chromosome V from RDKY3671GCR genomic DNA yielded a DNA fragment ~6 kb larger than the predicted size fragment obtained using S288C genomic DNA (D), consistent with the presence of a full-length Ty element insertion in the
URA3 gene.
ChromPETs were identified that linked the chromosome III region 147 000–153 000 (A) to a full-length Ty element sequence. As a Ty element was not reported in the reference genome at this locus, we confirmed this chromosomal rearrangement using PCR. According to the reference genome, the primer pair 3E/3F should generate a PCR product of 3.8 kb; however, we obtained fragments >10 kb (B), supporting the unexpected insertion. To confirm that the PCR product was derived from the correct region, we sequenced both ends of the product. Both ends mapped to the expected sites in the genome, but the internal region of the PCR-amplified fragment was not present in the reference genome and matched Ty1 element sequence perfectly (data not shown). Considering the size of the PCR-amplified fragment, it is possible that there are two copies of the Ty element in this region, which is supported by a previous report (
15). This Ty element is present in the reference strain S288c, but as will be discussed, there is a reason why it was absent in the reference sequence.
The area of chromosome III around 83 000 nt was aberrantly linked to a Ty element sequence (). This observation was also confirmed by PCR (A and B). DNA fragments were amplified with primer pair 3A/T1 using RDKY3671GCR genomic DNA as a template. Interestingly, PCR with S288C genomic DNA also yielded similar DNA fragments (B), suggesting the presence of a Ty element at this locus even in S288c. In order to confirm this observation further, we designed additional primers based on Ty1 sequence and the region upstream of the predicted Ty element insertion site. DNA fragments consistent with a Ty element insertion were obtained with these primer pairs. Primer pairs that flanked the predicted insertion site amplified a fragment of 6 kb, consistent with insertion of a full-length Ty1 element (B) and sequencing confirmed that this is the case.
Because the Ty element at chromosome III, 83 000 nt, is missing in the reference sequence obtained from S288c, we wanted to show that the Ty element is indeed present in the S288c genome by Southern blotting independent of PCR. As shown in C, AseI digestion should yield 2210 bp and 972 bp fragments based on the reference sequence; instead, Southern blots showed a closely migrating doublet of around 2 kb (D). In addition, ClaI digestion yielded a 5-kb fragment instead of the expected 3-kb fragment based on the reference sequence (D). Both of these results are consistent with the sequencing results and indicate the presence of an unannotated Ty element in this region.
Similarly, we found two other ChromPET linkages to Ty elements that did not correspond to annotated Ty elements, on chromosome XII (region 818 200–820 400) and on chromosome III (region 169 800–171 900). Primer pairs 3G/T4 and 12C/T5 yielded amplified products (C and D), consistent with the insertion of a full-length Ty element in these regions, even in the S288c reference strain.
The aberrant linkage reporting a deletion event turned out to link the
MAT locus to the
HMRa locus (
Figure S4), bringing them closer together than expected in the reference sequence. This is consistent with the
HMRa1 cassette being copied into the
MAT locus, as expected in our yeast strain of mating type a.