For decades, the budding yeast Saccharomyces cerevisiae has been used as a model organism for studying eukaryotic molecular and cell biology. Its primacy as a model system and its small, compact genome made it the best eukaryotic candidate for early sequencing efforts. In 1996, the strain S288c, first isolated in the 1950s through genetic crosses by Robert Mortimer [1], became the first eukaryote to be completely sequenced [2]. The determination of this reference sequence facilitated the construction of the first whole-genome microarrays [3] and the systematic deletion of all genes [4], as well as the first whole-genome protein and genetic interaction maps [5], [6].
Although most genomic information has been obtained from S288c, this strain is often not the ideal genetic background for studying particular aspects of biology. The S288c background has a number of drawbacks, such as low sporulation efficiency [7], an inability to grow on maltose [8] and failure to initiate filamentous growth upon nitrogen starvation [9]. For these reasons, many laboratories use other genetic backgrounds for physiological [10], genetic and genomic analyses [11]–[13]. Some of these backgrounds are of known close (e.g. A364A and W303) or distant (SK1) genetic relatedness to S288c. However, for some experimentally important strains the relationship is unclear and largely undocumented.
The use of these different strains may contribute to inconsistencies in some biological results, because it is often assumed that the genomic sequence information of S288c can be extrapolated to other strains. Even for strains closely related to S288c, the small number of sequence differences may still have important consequences for different biological pathways and phenotypes. Understanding the genetic differences between strains has therefore become extremely important. Mortimer and Johnston [1] traced a genealogy of the commonly used S. cerevisiae strains from knowledge of their history. More recently, Winzeler et al. [14] determined a subset of DNA sequence variation among a set of 14 S. cerevisiae strains using low-coverage oligonucleotide arrays and discovered 11,115 sites of variation among them. Although this study provided some insight into allelic differences between yeast strains, it was limited by the proportion of the genome covered by the array (~16%) and the 25 bp resolution for localizing these variants.
We recently developed a method for characterizing nucleotide variation in the entire genome using 25mer oligonucleotide microarrays (Affymetrix yeast tiling arrays) that provide complete and redundant coverage of the ~12 Mb S. cerevisiae genome [15]. This design provides multiple measurements of each nucleotide's contribution to hybridization efficiency and can therefore detect the presence and location of single nucleotide polymorphisms (SNPs) and deletion events throughout the entire yeast genome with near-nucleotide precision. We have employed this approach to characterize the nucleotide-level similarity and divergence between S288c and seven commonly used lab strains (A364A, W303, FL100, CEN.PK, Σ1278b, SK1 and BY4716). The analyses revealed which genomic regions of each strain are derived from the reference strain and which regions are highly diverged, indicating a different ancestry. The data are presented in an online database, the Yeast SNPs Browser (YSB; http://gbrowse.princeton.edu/cgi-bin/gbrowse/yeast_strains_snps). YSB represents a valuable tool for the yeast community, enabling the development of genomic resources for other yeast strains and informing conventional molecular biology methods such as PCR primer and Southern blot probe design.
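To illustrate why redundant tiling yields multiple measurements per nucleotide, the following minimal Python sketch lists the overlapping probes that contain a given genomic position, together with the position's offset inside each probe. Only the 25-nucleotide probe length comes from the text; the function name `probes_covering`, the uniform start spacing `step`, and the coordinates used are hypothetical and do not describe the actual array layout or the SNP-calling method of [15].

```python
PROBE_LEN = 25  # 25mer probes, as stated in the text


def probes_covering(pos, step, seq_len, probe_len=PROBE_LEN):
    """Return (probe_start, offset_in_probe) for each tiled probe containing pos.

    Assumption (illustrative only): probes start at 0, step, 2*step, ...
    along a sequence of length seq_len, and a probe is kept only if it
    fits entirely within the sequence. Coordinates are 0-based.
    """
    hits = []
    # Smallest start coordinate whose probe can still reach pos.
    lowest = max(0, pos - probe_len + 1)
    # Round up to the first start that lies on the tiling grid.
    first = ((lowest + step - 1) // step) * step
    for start in range(first, pos + 1, step):
        if start + probe_len <= seq_len:
            hits.append((start, pos - start))
    return hits


if __name__ == "__main__":
    # A hypothetical SNP at position 100 with probes starting every 5 bp.
    for start, offset in probes_covering(100, step=5, seq_len=1000):
        print(f"probe starting at {start}: SNP at offset {offset}")
```

Under these assumptions, a variant at position 100 falls within five probes, at offsets 20, 15, 10, 5 and 0. A single substitution would therefore be expected to perturb hybridization of several overlapping probes at different internal positions, which is the kind of redundancy that allows a variant to be localized more finely than the probe length.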