Identification of Copy Number Polymorphisms between Parental Strains
To identify transposed segments with confidence, we first excluded regions of the genome that differed in copy number or hybridization efficiency between the parental strains. Since a number of deletions in Y101 relative to S288C have been identified previously, these parental hybridizations were also used to assess our technical accuracy in identifying deletions. Genomic DNA from strains S90 and Y101 was separately hybridized to DNA microarrays. In both cases, genomic DNA from strain S288C was used as a hybridization reference. S288C is the original sequenced strain, and the strain from which the microarray probes were derived. The microarrays used in this study represent all coding and non-coding regions in the reference genome with an average probe size of ~750 nucleotides. We were able to identify all ten of the deletions identified previously in Y101, along with additional putative duplications and deletions ( and S1). Some of these apparent deletions may reflect sequence polymorphisms relative to the probe sequence.
The number of microarray probes in each TS confidence class.
Identification of Transposed Segments between Parental Strains
Having identified, and eliminated from further consideration, regions of putative copy number variation between the parents, we aimed to identify putative transposed segments. We comparatively hybridized DNA from each of the four spores of six different tetrads against S288C, and identified putative transposed segments as those genomic regions with copy-number differences among the spores (). We would expect that, some fraction of the time, a transposed segment would result in a non-parental ditype (NPD) segregation pattern, in which two spores harbor a duplication and two spores harbor a deletion, or a tetratype (TT) pattern, in which one spore harbors a duplication, one spore harbors a deletion, and two spores harbor one copy each. The remaining tetrads would show the parental ditype (PD) pattern, in which each spore harbors one copy. The expected frequency of NPD and TT tetrads cannot be predicted in advance, since the expected frequency is a function of whether the loci are linked in the parental strain and the distance of each locus from the centromere.
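The three expected segregation patterns can be stated as a small rule over the copy number of a segment in each of the four spores. The following is a minimal sketch, assuming idealized integer copy-number calls (0, 1, or 2) per spore; the function name and the "unclassified" label are illustrative and not part of the original analysis.

```python
from collections import Counter

def classify_tetrad(copies):
    """Classify one tetrad from the copy number (0, 1, or 2) of a segment
    in each of its four spores. Assumes noise-free integer calls."""
    if len(copies) != 4:
        raise ValueError("a tetrad has exactly four spores")
    counts = Counter(copies)  # missing keys count as 0
    if counts[1] == 4:
        return "PD"   # every spore carries one copy
    if counts[2] == 2 and counts[0] == 2:
        return "NPD"  # two duplications and two deletions
    if counts[2] == 1 and counts[0] == 1 and counts[1] == 2:
        return "TT"   # one duplication, one deletion, two single copies
    return "unclassified"  # any other pattern, e.g. produced by noise
```

For example, `classify_tetrad([2, 1, 0, 1])` falls into the tetratype case, whereas a tetrad with a lone duplication matches none of the three ideal patterns.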
The measurement noise inherent in DNA microarray hybridizations prevented us from relying entirely on the presence of perfect NPD and TT tetrads to identify transposed segments. Therefore, we initially used a more relaxed criterion to identify genomic regions of potential interest. If any duplications or deletions of that region were observed among the spores, it was placed into one of three classes based on the degree of fit to the expected segregation pattern (). In Class 1, a single duplication or deletion was observed in one of the tetrads; this class was the most liberal and included approximately 20% of the genome. In Class 2, at least one duplication and at least one deletion were observed, but in independent tetrads; this class included approximately 4% of the genome. In Class 3, the highest confidence category, at least one individual tetrad contained one or two duplications and one or two deletions; this class included 23 regions that covered 0.2% of the genome ().
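The class definitions can be read as a decision rule applied to the per-spore calls of a region across all tetrads. The sketch below is one possible reading, not the procedure actually used: it assumes calls are encoded as +1 (duplication), -1 (deletion), and 0 (no change), treats Class 1 as any region with a call that does not qualify for Class 2 or 3, and assigns Class 0 to regions with no calls at all. The function name is hypothetical.

```python
def confidence_class(tetrads):
    """Assign a region to Class 0-3.

    tetrads: list of 4-element lists of calls per spore,
             +1 = duplication, -1 = deletion, 0 = no change (assumed encoding).
    """
    any_dup = False
    any_del = False
    for calls in tetrads:
        dups = sum(1 for c in calls if c > 0)
        dels = sum(1 for c in calls if c < 0)
        # Class 3: a single tetrad holds one or two duplications
        # together with one or two deletions.
        if 1 <= dups <= 2 and 1 <= dels <= 2:
            return 3
        any_dup = any_dup or dups > 0
        any_del = any_del or dels > 0
    # Class 2: both kinds of call were seen, but no tetrad met the Class 3 rule.
    if any_dup and any_del:
        return 2
    # Class 1: only duplications or only deletions were seen.
    if any_dup or any_del:
        return 1
    return 0
```

Checking for Class 3 first keeps the classes mutually exclusive, so each region receives only its highest-confidence label.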
Segregation of the six Class 3 putative transposed segments.
Characterization of Putative Transposed Segments
While some of the Class 1 and Class 2 regions may be true transposed segments, we initially focused our attention on the higher-confidence Class 3 regions. Many of the Class 3 regions were adjacent or nearly so, suggesting that they were part of larger transposed segments. Thus, we grouped Class 3 regions that were located within five kilobases of each other, and that showed compatible segregation patterns, into six distinct putative transposed segments (). Each transposed segment (TS) was named based on its chromosomal location in S288C, two on chromosome XV (TS15.1 and TS15.2) and one each on chromosomes I, IV, VII, and XVI (TS1, TS4, TS7, and TS16; ). They ranged in size from about 1.4 kb to 13.5 kb. Five of the six transposed segments were between 3 and 21 kb from a telomere and demonstrated the Class 3 pattern in at least four out of six tetrads. The lone exception, TS4, was also the smallest putative transposed segment and exhibited the Class 3 pattern in just one tetrad. Collectively, the six transposed segments contained a total of 15 annotated genes (seven ‘verified’, six ‘uncharacterized’, and two ‘dubious’) and one transposable element ().
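The grouping step amounts to merging nearby intervals on the same chromosome. The following sketch illustrates only the distance criterion (regions within five kilobases of each other); the additional requirement of compatible segregation patterns is omitted, and the function name and the coordinates used in the example are hypothetical.

```python
def group_regions(regions, max_gap=5000):
    """Merge regions on the same chromosome whose gap is at most max_gap bp.

    regions: iterable of (chrom, start, end) tuples.
    Returns a list of merged (chrom, start, end) segments.
    Segregation-pattern compatibility is NOT checked in this sketch.
    """
    segments = []
    for chrom, start, end in sorted(regions):
        if segments and segments[-1][0] == chrom and start - segments[-1][2] <= max_gap:
            prev = segments[-1]
            # Extend the previous segment to cover this region.
            segments[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            segments.append((chrom, start, end))
    return segments

# Hypothetical coordinates, for illustration only.
example = [("XV", 1000, 2000), ("XV", 4000, 5000), ("XV", 20000, 21000), ("I", 500, 900)]
merged = group_regions(example)
```

In this toy input the first two chromosome XV regions lie 2 kb apart and are merged, while the third lies 15 kb away and remains a separate segment.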
Genomic location and structure of the six Class 3 putative transposed segments.
Table 2. Genes within Class 3 putative transposed segments, based on annotations in the Saccharomyces Genome Database (SGD), November 2008.
Characterization of TS15.1
We sought to determine the endpoints of TS15.1, the largest of the six segments, more precisely by manual inspection of the hybridization data (). The segment was initially identified by eleven closely linked Class 3 regions, corresponding to probes 12–17, 19, 20, and 22–24. Probes 18 and 21 had been excluded from consideration initially because they were not present on the particular batch of microarrays we used.
Detailed characterization of the TS15.1 region.
The positions of the TS15.1 endpoints were ambiguous on the basis of the aCGH data alone. While the most distal duplication (relative to the centromere) was at probe 12, deletions continued all the way to the telomere (probes 0–11) in Y101, as well as in several of the spores. At the proximal end of TS15.1, probe 25 had been excluded due to missing data from Y101, and probes 26 and 27 showed no evidence for copy number variation. While probes 28–31 had duplications in three of the spores, only one of these spores was duplicated in the core regions of TS15.1.
This ambiguity motivated us to further characterize the endpoints of TS15.1 by a PCR assay. We tested for amplification of an appropriately sized product from the parents and the spores using primers that corresponded to the ends of the microarray probes. While detection of duplications requires a more quantitative assay, our methodology could easily identify deletions. Amplification within a transposed segment should fail in a spore harboring a deletion while succeeding in both parents. Amplification could also fail if the primers span a transposed segment endpoint or if the primer sites have diverged between the two strains, but these cases can be distinguished from true deletions by a lack of amplification in Y101, since the primers are designed to match the S288C reference sequence.
Initially, we tested primer pairs corresponding to probes 11 through 32 and probe 36 in the two parental strains, the reference strain, and the four spores from tetrad 27 (, Tables S2). Amplification was obtained from all genotypes using primer pairs from probes 11, 26–30, 32, and 36. Primers corresponding to probes 13 through 25 failed to amplify products only in spore 27A, consistent with the hypothesis that this spore did not inherit TS15.1 from either parent. Primers for probes 12 and 31 failed to amplify in spores 27A and 27D, and also in Y101, indicating that segments 12 and 31 are candidate endpoints for TS15.1. To map the right endpoint more finely, we designed new primers to split probe 31 into two halves, 31L and 31R. The results support 31L as being external to the transposed segment, and identify 31R as containing the endpoint (as illustrated in ).
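The logic used to interpret each primer pair can be summarized as a short decision rule. This is a minimal sketch under stated assumptions: amplification is recorded as a boolean per genotype, failure in the S288C reference is treated as an assay failure, and the function name and return labels are illustrative rather than the authors' terminology.

```python
def interpret_pcr(amplified, spores, reference="S288C", diverged_parent="Y101"):
    """Interpret the amplification pattern of one primer pair.

    amplified: dict mapping genotype name -> bool (True = product obtained).
    spores:    list of spore names present in `amplified`.
    """
    if not amplified[reference]:
        # Primers match the reference sequence, so this suggests a failed assay.
        return "assay failure"
    if not amplified[diverged_parent]:
        # Failure in Y101 points to an endpoint or a diverged primer site.
        return "candidate endpoint or diverged primer site"
    failed = [s for s in spores if not amplified[s]]
    if failed:
        # Both parents amplify but some spores do not: a deletion in those spores.
        return "deleted in " + ", ".join(failed)
    if all(amplified.values()):
        return "amplified in all tested genotypes"
    return "unexpected pattern"

spores_27 = ["27A", "27B", "27C", "27D"]
# Patterns as described in the text for tetrad 27.
probe_14 = {"S288C": True, "S90": True, "Y101": True,
            "27A": False, "27B": True, "27C": True, "27D": True}
probe_12 = {"S288C": True, "S90": True, "Y101": False,
            "27A": False, "27B": True, "27C": True, "27D": False}
```

Applied to the patterns reported for tetrad 27, a probe such as 14 is read as deleted in spore 27A, whereas probe 12, which also fails in Y101, is flagged as a candidate endpoint.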
Results of PCR assays for the boundaries of TS15.1 in tetrad 27.
Model for the position and structure of TS15.1 in the parental strains.
The pattern of amplification in probes 12 and 31R is consistent with the endpoints of the transposed segment occurring within these two intervals. However, since probes 26–30 amplified in all genotypes, they appear to be external to the transposed segment. To confirm this surprising result, we used primers corresponding to probes 11, 14, 16, 19, 21, 27, 30, 32, and 36 in the other five tetrads. In all cases, the amplification results were consistent with the aCGH-predicted duplications and deletions in probes 13–25 and the PCR-predicted endpoints in probes 12 and 31R (data not shown). The presence of an apparent endpoint within probe 31R rather than probe 25 suggests that the TS15.1 region differs not only in genomic location, but also in structure, between the two strains, possibly through an inversion of ~6 kb. Thus, we conclude that TS15.1 comprises the segments of the genome covered by probes 13–25 and portions covered by probes 12 and 31R, for a total of about 15 kb of DNA originating approximately 12 kb from the left end of the chromosome XV reference sequence ().
TS15.1 Resides on Chromosome IX in Y101
We can infer the position of a transposed segment in S90 based on its position in the genome assembly of S288C, but its position in Y101 is unknown. To map the transposed segment in Y101, we identified genomic regions that co-segregated with the transposed segment in F1 spores. We obtained parent-of-origin information for 6,215 open reading frame (ORF)-based probes in each spore using a second microarray-based technique, genomic mismatch scanning, or GMS. With these data, we can also identify meiotic crossover events that may have occurred within the transposed segment. We focused on the two genes within TS15.1, YOL158C (probe 23) and YOL160W (probe 16). In the F1 haploids, alleles of YOL158C derived from Y101 perfectly cosegregated with eleven genes near the left telomere of chromosome IX, while alleles of YOL160W derived from Y101 perfectly cosegregated with three genes immediately adjacent to the same region on chromosome IX ().
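Perfect cosegregation can be expressed as an exact match between two presence/absence vectors across spores. The sketch below assumes that the GMS calls have been reduced to one boolean per spore for each gene (True where a Y101-derived allele was detected); the function name, the locus names, and the toy data are hypothetical and do not reproduce the actual GMS results.

```python
def cosegregating_genes(target, y101_calls):
    """Return genes whose Y101-allele pattern across spores matches the target exactly.

    y101_calls: dict mapping gene -> list of bools, one per spore
                (True = Y101-derived allele detected). Assumed data layout.
    """
    pattern = y101_calls[target]
    return sorted(
        gene for gene, calls in y101_calls.items()
        if gene != target and calls == pattern
    )

# Toy data for eight spores (two tetrads); values are invented for illustration.
toy_calls = {
    "TS_gene": [True, False, True, False, False, True, True, False],
    "locus_A": [True, False, True, False, False, True, True, False],
    "locus_B": [True, False, False, True, False, True, True, False],
}
linked = cosegregating_genes("TS_gene", toy_calls)
```

Here only `locus_A` shares the target's pattern in every spore; `locus_B` differs in two spores, as would be expected after a crossover between it and the target.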
To verify the predicted location of TS15.1 on chromosome IX in Y101, the chromosomes of both S90 and Y101 were separated using pulsed-field gel electrophoresis (PFGE) and subjected to Southern blotting with a probe amplified from within TS15.1. The probe hybridized to the band that includes both chromosomes XV and VII in S288C and S90, while it hybridized to the chromosome IX band in Y101 (). A secondary signal, perhaps the result of cross-hybridization to the rDNA repeats, was observed from chromosome XII in all strains. Thus, the PFGE and GMS data agree that TS15.1 is located on chromosome IX in Y101.
Experimental validation of the transposition of TS15.1 in strain Y101.
Despite the location of TS15.1 on two different chromosomes in the parental strains, and evidence for structural heterogeneity between the two alleles, two meiotic crossovers appear to have occurred very near, possibly even within, the TS. One of the events was observed in tetrad 32 between gene YIL154C and gene YIL155C and one was observed in tetrad 55 between YIL158W and YIL157C (the latter illustrated in ). The segregation patterns lead us to infer that the orientation of the segment in Y101 is opposite to that in S90, i.e. that YOL157C is distal to YOL161C (results not shown).
Evidence That S288C Represents the Ancestral Organization of TS15.1
The TS15.1S288C arrangement is seen not only in S90 but in the genome assemblies of two other sequenced S. cerevisiae strains: the wine strain RM11-1a and YJM789, a strain isolated from the lungs of an AIDS patient with pneumonia. While additional strains have been sequenced by the Saccharomyces Genome Resequencing Project, the genome assemblies have used S288C as a template, and thus are not informative regarding structural differences relative to that template. Instead, we examined the sequence reads that spanned the breakpoints, segments 12 and 31R. We found no evidence for the null allele in any of the strains, suggesting that TS15.1S288C is by far the more common arrangement among present-day strains.
To determine the ancestral state for TS15.1, we examined the genome sequences of S. paradoxus and S. bayanus. In the initial genome assembly, S. paradoxus “contig 539” contains homologs to the genes YOL157C (probe 25), YOL156W (probe 27) and YOL155C (probe 30), which span the proximal endpoint of the transposed segment and are arranged in the same order and orientation as in S288C (). Likewise, S. bayanus contig 223 contains homologs to the genes YOL163W (probe 10), YOL162W (probe 11) and YOL161C (probe 13), which span the distal endpoint of the transposed segment and are also arranged in the same order and orientation (). While it is not possible to compare the genome organization distal to the transposed segment due to the incompleteness and fragmentation of the assemblies in this region for S. paradoxus and S. bayanus, this nonetheless strongly suggests that TS15.1S288C is the ancestral state.
Comparative genomic evidence that S288C harbors the ancestral form of TS15.1.
Cross-Platform Validation Using Paired-End Sequencing
To validate the aCGH results using an independent method, we obtained 28× coverage paired-end Illumina sequencing of Y101. The ends of each mapped genomic fragment were separated by 243±35 (mean±standard deviation) base pairs. To detect transposition events, we looked for instances in which the two ends of a paired-end read mapped to different chromosomes, or cases in which they mapped more than 5 kb apart from each other on the same chromosome, relative to the S288C reference sequence.
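The discordance criterion can be written as a simple test on the mapped coordinates of the two ends. The following is a minimal sketch, assuming each end has already been mapped to a chromosome name and a position on the S288C reference; the function name and argument layout are illustrative, and the requirement of multiple independent supporting reads is not modeled.

```python
def is_discordant(chrom1, pos1, chrom2, pos2, max_distance=5000):
    """Return True if a read pair is discordant relative to the reference.

    A pair is discordant when its ends map to different chromosomes,
    or to the same chromosome more than max_distance bp apart.
    """
    if chrom1 != chrom2:
        return True
    return abs(pos2 - pos1) > max_distance
```

A pair whose ends map about 243 bp apart on the same chromosome is concordant under this rule, whereas a pair with one end on chromosome XV and the other on chromosome IX is flagged.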
We detected 40 such discordant paired-end sequences, each represented by multiple independent sequence reads. The start and end points of these genomic segments are given in Table S4. The end sequences can be used to locate and orient the position of the corresponding segment in Y101, and to examine the location of the breakpoints in both strains at fine resolution. For instance, the location of TS15.1 on chromosome IX in Y101 is confirmed by the paired-end data. Eighteen of the 40 discordant paired-ends map to different chromosomes, and thus may represent transposed segments. Interestingly, the paired-end data do not support the idea, suggested by the Class 3 transposed segments, that such rearrangements occur predominantly within subtelomeric regions. There are 21 regions for which the paired-end sequences occur at two loci on the same chromosome separated by only 5 to 15 kb, indicative of local rearrangements. In only one case are the paired-ends found on the same chromosome at a distance greater than 50 kb. Overall, four of the six Class 3 TS were validated by the paired-end sequencing data: TS7, TS15.1, TS15.2, and TS16. For the remaining two Class 3 predictions (TS1 and TS4), the many independent paired-ends that spanned the junctions mapped to the same locus on the reference genome within the specified tolerance. Therefore, we conclude that TS1 and TS4 are false positives, yielding an overall specificity of 67% for calling TS by their Class 3 status. The paired-end sequencing data provide evidence for rearrangements in ten of the 529 Class 2 array features, yielding a specificity of only 1.9% for these lower-confidence predictions. The fraction of rearranged features identified by paired-end sequencing data among those categorized as Class 0 or Class 1 is 0.09% (10 rearranged features out of a total of 11,442 Class 0 and Class 1 features).
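The proportions quoted above follow directly from the stated counts, and the three categories of discordant paired-ends account for all 40. The short calculation below re-derives them; the variable and function names are illustrative only.

```python
def percent(hits, total):
    """Return hits as a percentage of total."""
    return 100.0 * hits / total

# Counts taken from the text.
class3_rate = percent(4, 6)         # validated Class 3 segments
class2_rate = percent(10, 529)      # rearranged Class 2 features
class01_rate = percent(10, 11442)   # rearranged Class 0 and Class 1 features

# Different-chromosome pairs, 5-15 kb same-chromosome pairs, and the one >50 kb case.
discordant_total = 18 + 21 + 1
```

The rounded values agree with the reported 67%, 1.9%, and 0.09%, which makes the roughly 35-fold drop in validation rate between Class 3 and Class 2 explicit.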