DBY11331 (referred to as S2c1 in Gresham et al.) was isolated after ~188 generations of a sulfate-limited continuous culture seeded with the prototrophic haploid S. cerevisiae strain DBY10147 (MATα), as previously described [4]. Illumina sequencing libraries were constructed from DBY11331 and DBY10147 genomic DNA following standard procedures and published recommendations [36]. Briefly, 10 μg of yeast genomic DNA were sonicated to fragment sizes below 2000 bp, concentrated and end-repaired using the End-It DNA repair kit (EPICENTRE Biotechnologies). End-repaired DNA was A-tailed with GoTaq DNA polymerase (Promega) and ligated to Illumina adapters (QuickLigase, NEB). Ligation products of 300-400 bp were excised from a 6% polyacrylamide gel, eluted and ethanol precipitated. Fragment libraries were PCR amplified, cleaned following AMPure (Agencourt) and Qiaquick PCR clean-up procedures, and submitted for sequencing. We prepared two such libraries for each strain.
We collected 13,555,852 and 13,901,121 single-end, 36 bp, quality-filtered reads from DBY11331 and DBY10147, respectively, using the Illumina Genome Analyzer II platform. Reads were aligned to the UCSC sacCer1 reference sequence using Maq with default parameters for single-end reads (12,274,183 evolved strain and 10,441,548 parental strain reads), to a coverage of ≥99.8%. We filtered reads with low mapping quality (score <10) and obtained a final coverage of ≥93.5% with an average read-depth of 35× and 28× in the non-gap regions of the evolved and parental genomes, respectively [Additional File 1: Supplementary Table S1].
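As a minimal sketch of the filtering and coverage bookkeeping described above, the following Python computes the fraction of positions covered and the mean depth over covered positions after discarding reads with mapping quality below 10. The function name `coverage_stats`, the `(start, length, mapq)` read tuples, and the toy genome are illustrative assumptions; Maq's own output formats and the handling of gap regions are not modeled here.

```python
def coverage_stats(reads, genome_length, min_mapq=10):
    """Return (fraction of positions covered, mean depth over covered positions).

    reads: iterable of (start, length, mapq) tuples with 0-based starts
    (a hypothetical, simplified representation of aligned reads).
    """
    depth = [0] * genome_length
    for start, length, mapq in reads:
        if mapq < min_mapq:
            continue  # discard reads with low mapping quality (score < 10)
        for pos in range(max(start, 0), min(start + length, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    mean_depth = sum(depth) / covered if covered else 0.0
    return covered / genome_length, mean_depth


# Toy example: three reads on a 10 bp "genome"; the third fails the filter.
toy_reads = [(0, 4, 30), (2, 4, 30), (5, 4, 5)]
fraction, mean_depth = coverage_stats(toy_reads, genome_length=10)
```

In this toy case only the first two reads contribute, so the reported depth is averaged over the six covered positions rather than over the whole sequence.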
For SNP-calling, we settled on an approach that required a nucleotide read depth ≥6× per position, with ≥80% base-calls supporting a SNP in the evolved genome data and ≥5× read depth, with ≥70% base-calls supporting a different base in the parental genome data. These nucleotide read depth thresholds allowed us to examine 90.99% of the mappable genome for SNPs [Additional File 1: Supplementary Table S2]. A parallel analysis relying on consensus base quality, quality of adjacent bases, and read mapping quality filters yielded similar SNP calls, but the fraction of the genome compliant with the analysis criteria was slightly reduced [Additional File 1: Supplementary Table S3].
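The depth and base-call thresholds above can be illustrated with a short Python sketch. It is one plausible reading of the criteria, not the pipeline actually used: the function name `call_snp`, its arguments, and the requirement that the evolved base differ from the reference base are assumptions made for the example.

```python
from collections import Counter


def call_snp(ref_base, evolved_calls, parental_calls,
             min_evolved_depth=6, min_evolved_frac=0.80,
             min_parental_depth=5, min_parental_frac=0.70):
    """Return the evolved-strain SNP base at one position, or None.

    evolved_calls / parental_calls: sequences of base-calls (e.g. "GGGGGA")
    piled up at the position in each strain (hypothetical input format).
    """
    if len(evolved_calls) < min_evolved_depth:
        return None  # insufficient depth in the evolved genome data
    if len(parental_calls) < min_parental_depth:
        return None  # insufficient depth in the parental genome data

    evolved_base, evolved_n = Counter(evolved_calls).most_common(1)[0]
    parental_base, parental_n = Counter(parental_calls).most_common(1)[0]

    # Assumption: a SNP is a non-reference base supported by >=80% of calls.
    if evolved_base == ref_base:
        return None
    if evolved_n / len(evolved_calls) < min_evolved_frac:
        return None
    # The parental data must support a different base with >=70% of calls.
    if parental_base == evolved_base:
        return None
    if parental_n / len(parental_calls) < min_parental_frac:
        return None
    return evolved_base


snp = call_snp("A", "GGGGGGGGGA", "AAAAAAA")
```

Positions that fail either depth threshold return `None` and are excluded from the examined fraction of the genome rather than counted as non-SNP sites.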
We searched for small insertions and deletions by performing gapped alignment (BLAT) of the Maq-unmapped reads to the reference genome and recovering coordinates at which multiple unmapped reads show a bipartite alignment (an alignment to flanking sequences) as the best alignment. Candidate indel coordinates were reduced to sets specific to the evolved or ancestor genome. These strain-specific candidate indels were then refined to retain sites at which wild-type sequences are not observed in the Maq alignment of the corresponding genome sequencing data, but are observed in the comparison strain [Additional File 1: Supplementary Table S4].
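The bipartite-alignment idea can be sketched in Python as follows. This is an illustrative simplification under stated assumptions: each read's best gapped alignment is represented as two hypothetical `(read_start, ref_start, length)` blocks rather than parsed BLAT output, and "multiple reads" is taken to mean at least two (`min_reads=2`), a value the text does not specify.

```python
from collections import Counter


def classify_bipartite(blocks):
    """Classify a two-block alignment as a deletion or insertion, else None.

    blocks: two (read_start, ref_start, length) tuples (hypothetical format).
    """
    (q1, t1, n1), (q2, t2, _n2) = sorted(blocks)
    read_gap = q2 - (q1 + n1)  # unaligned read bases between the blocks
    ref_gap = t2 - (t1 + n1)   # skipped reference bases between the blocks
    if read_gap == 0 and ref_gap > 0:
        return ("deletion", t1 + n1, ref_gap)
    if ref_gap == 0 and read_gap > 0:
        return ("insertion", t1 + n1, read_gap)
    return None


def candidate_indels(read_blocks, min_reads=2):
    """Keep indel coordinates supported by at least min_reads reads."""
    calls = (classify_bipartite(b) for b in read_blocks)
    counts = Counter(c for c in calls if c is not None)
    return sorted(c for c, n in counts.items() if n >= min_reads)


reads = [
    [(0, 100, 18), (18, 123, 18)],  # deletion of 5 bp at reference 118
    [(0, 105, 13), (13, 123, 23)],  # same deletion, different read offset
    [(0, 200, 15), (18, 215, 18)],  # 3 bp insertion, seen in one read only
]
indels = candidate_indels(reads)
```

Strain-specific sets could then be obtained by taking the set difference between the candidates from the evolved and ancestor data, which mirrors the reduction step described above.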
To detect copy-number polymorphisms (CNPs), we averaged the per-nucleotide read depth data in 25 bp bins across the unique nuclear genome and normalized by the total nuclear bases acquired. For each bin, the log2-ratio in read depth between the evolved and parental data was calculated. Circular binary segmentation was applied to the ratios using DNAcopy, available as an R package, to partition the genome into regions of equal copy number. Segments were smoothed by removing changes < 3 standard deviations, and only those spanning ≥1000 bp were considered for further analysis. We used only one lane of data for each strain for copy number analysis (NCBI Sequence Read Archive accessions SRX014130 and SRX014132).
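The binning and normalization step that precedes segmentation can be sketched in Python. The function name `binned_log2_ratios` and the list-of-depths input are assumptions for illustration, normalization by "total nuclear bases acquired" is interpreted here as division by the summed depth of each sample, and the circular binary segmentation performed by DNAcopy is not reproduced.

```python
import math


def binned_log2_ratios(evolved_depth, parental_depth, bin_size=25):
    """Return [(bin_start, log2 ratio or None), ...] for consecutive bins.

    evolved_depth / parental_depth: per-nucleotide read depths over the same
    coordinates (hypothetical input). Bins with zero depth in either sample
    yield None because the log ratio is undefined there.
    """
    total_evolved = sum(evolved_depth)
    total_parental = sum(parental_depth)
    ratios = []
    for start in range(0, len(evolved_depth) - bin_size + 1, bin_size):
        end = start + bin_size
        # Mean depth in the bin, normalized by total bases sequenced.
        e = sum(evolved_depth[start:end]) / bin_size / total_evolved
        p = sum(parental_depth[start:end]) / bin_size / total_parental
        ratios.append((start, math.log2(e / p) if e > 0 and p > 0 else None))
    return ratios


# Toy example: the second bin has twice the evolved depth of the first.
evolved = [2] * 25 + [4] * 25
parental = [2] * 50
ratios = binned_log2_ratios(evolved, parental)
```

Because both samples are normalized by their own totals, a doubling of depth in one bin shows up as a difference of one log2 unit relative to the other bins, independent of overall sequencing depth.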
Breakpoint sequences for the SUL1 amplification were discovered as follows: Unmapped reads from the evolved genome data were assembled into contigs using Velvet, a de novo assembler for short reads [28]. Contigs of unmapped reads were BLAT-aligned against the reference genome sequence requiring ≥90% ungapped sequence identity. Alignments were filtered to remove contigs in which shared identity to a single genomic region spans the length of the contig, and contigs whose ends fall within the unmappable portions of the genome. In addition, we filtered contigs to remove those for which the subsequence alignments do not cover ≥90% of the contig. From this filtered group (11 contigs), seven contigs are composed of mitochondrial subsequences, three of nuclear subsequences, and one shares sequence identity to nuclear and mitochondrial sequences [Additional File 1: Supplementary Table S6]. The two contigs aligned to amplification boundary coordinates were selected as candidate breakpoint sequences for the SUL1 amplification and examined in detail. We estimated the probability of these candidate breakpoint contigs arising independently of the amplification by analyzing contigs of unmapped reads derived from the ancestor genome data. Briefly, contigs of ancestor genome unmapped reads were assembled and aligned to the reference genome sequence. This yielded two contigs of unmapped reads composed of sequences with alignments to the mappable nuclear genome coordinates [Additional File 1: Supplementary Table S7]. The probability of observing such contigs was then calculated per mappable base in the nuclear genome. We limit this estimate to the nuclear genome to account for differences in the read-depth between the nuclear and mitochondrial genomes.
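Two of the contig filters described above (removing contigs explained by a single full-length alignment, and requiring that subsequence alignments together cover ≥90% of the contig) can be sketched in Python. The function name and the representation of alignments as half-open `(contig_start, contig_end)` intervals are assumptions for illustration; the filter on contig ends falling in unmappable regions is omitted because it needs a mappability annotation not described here.

```python
def is_candidate_breakpoint_contig(contig_length, alignments, min_cover=0.90):
    """Return True if a contig passes the two alignment-based filters.

    alignments: list of half-open (contig_start, contig_end) intervals, one
    per aligned subsequence of the contig (hypothetical input format).
    """
    # A single alignment spanning the whole contig means the contig matches
    # one genomic region and therefore does not mark a novel junction.
    if any(s <= 0 and e >= contig_length for s, e in alignments):
        return False
    # Require the union of the subsequence alignments to cover >=90%.
    covered = set()
    for s, e in alignments:
        covered.update(range(max(s, 0), min(e, contig_length)))
    return len(covered) / contig_length >= min_cover


passes = is_candidate_breakpoint_contig(100, [(0, 55), (50, 100)])
```

A contig made of two overlapping subsequence alignments, as in the example call, is the pattern expected at a junction between two normally distant genomic regions.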
For Southern blot analysis, genomic DNA was digested overnight with BamHI, EcoRV or PstI (New England Biolabs). Samples were subjected to electrophoresis through 0.6% w/v agarose in 1× TBE overnight at 33 V, visualized after ethidium bromide staining and transferred to a GeneScreen™ hybridization transfer membrane (PerkinElmer) in 10× SSC. Hybridization was performed at 65°C for approximately 20 h with ³²P-labeled "SUL1" and "BamHI" probes constructed by PCR [Additional File 1: Supplementary Table S5].
Tiling array SNP analysis and ORF array CGH data were obtained as previously described [4]. Data are archived at NCBI Sequence Read Archive (SRA) under accession SRP001478.