Analysis was performed with custom software developed in-house, which used Bowtie (Langmead et al. 2009) to map reads to the S288C reference genome available on SGD, downloaded on May 17, 2010. Python, NumPy, SciPy, and matplotlib were used to further process the data. The software’s source code is available (Saccharomyces Genome Database). Of note, to maximize the information gleaned, reads that failed to map were trimmed by four bases from the 3′ end and remapping was attempted; this was repeated until only 28 bp remained, and a read that still failed to map at that length was considered unmappable. This end trimming typically at least doubled the number of mappable reads.
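The trimming loop described above can be sketched as follows. This is a minimal illustration, not the in-house implementation: `map_with_trimming` and the `try_map` callable are hypothetical names, and `try_map` stands in for a call to the aligner that returns a hit or `None`.

```python
from typing import Callable, Optional, Tuple


def map_with_trimming(
    read: str,
    try_map: Callable[[str], Optional[str]],
    step: int = 4,      # bases removed from the 3' end per round (from the text)
    min_len: int = 28,  # shortest read length that is still attempted (from the text)
) -> Tuple[Optional[str], int]:
    """Return (hit, mapped_length), or (None, 0) if the read never maps.

    `try_map` is a hypothetical stand-in for the aligner call.
    """
    current = read
    while len(current) >= min_len:
        hit = try_map(current)
        if hit is not None:
            return hit, len(current)
        # Drop `step` bases from the 3' end and try again.
        current = current[:-step]
    return None, 0
```

Under these assumptions a 76-base read is attempted at lengths 76, 72, and so on down to 28, which gives 13 mapping attempts at most.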
Expression levels were calculated by adding, for each base pair, the number of times that base pair was encountered in the RNA-Seq data. For example, if 14 reads overlapped a particular base pair, that base pair would have an expression level of 14. The expression level of a gene was calculated as the mean expression level across the annotated ORF; these values were quantile normalized for comparisons between multiple conditions.
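As an illustration of the per-base counting described above, the sketch below computes coverage from read intervals and then the mean over an ORF. The function names and the 0-based, half-open `(start, end)` interval convention are assumptions made for the example; quantile normalization is not shown.

```python
def per_base_coverage(reads, genome_length):
    """Count, for each base pair, how many reads overlap it.

    `reads` is an iterable of (start, end) pairs, 0-based and half-open
    (an assumed convention for this sketch).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    coverage = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage


def gene_expression(coverage, orf_start, orf_end):
    """Mean per-base expression level across the annotated ORF."""
    return sum(coverage[orf_start:orf_end]) / (orf_end - orf_start)
```

For example, two reads covering positions 0 to 3 and 2 to 5 give a level of 2 at positions 2 and 3, where both reads overlap, and 1 at the other covered positions.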
The mappability of genes was calculated because, if a gene lies in or near a region of the genome with high homology to another genomic region, it becomes impossible to assign the genomic origin of an RNA-Seq read stemming from one of the homologous regions. Mappability of genes was determined by creating, for the plus and minus strands, simulated 76-mer and 28-mer reads starting at each base pair in the annotated genome and processing these reads in the pipeline. (Note from above that reads may have been 28, 32, 36, …, 76-mers.) Perfectly mappable genes would thus have an expression level of 104, as all 76 of the 76-mer reads and all 28 of the 28-mer reads would have intersected every base pair of the gene. Genes were considered mappable if the mean expression level across the gene was 90% or more of that value (94 reads).
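The arithmetic behind the value of 104 can be checked with a small simulation. This sketch makes several assumptions: it treats a single strand, the names are illustrative, and `is_mappable(start, length)` is a hypothetical stand-in for running a simulated read through the mapping pipeline.

```python
READ_LENGTHS = (76, 28)           # simulated read lengths from the text
PERFECT_LEVEL = sum(READ_LENGTHS)  # 76 + 28 = 104 for a perfectly mappable base
MAPPABLE_FRACTION = 0.90           # cutoff from the text; 0.9 * 104 = 93.6, about 94


def simulated_coverage(genome_length, is_mappable):
    """Tile one simulated read of each length from every start position.

    `is_mappable(start, length)` is a hypothetical callable that reports
    whether that simulated read maps back uniquely.
    """
    coverage = [0] * genome_length
    for length in READ_LENGTHS:
        for start in range(genome_length - length + 1):
            if is_mappable(start, length):
                for pos in range(start, start + length):
                    coverage[pos] += 1
    return coverage


def gene_is_mappable(coverage, orf_start, orf_end):
    """Apply the 90% cutoff to the mean simulated level across the ORF."""
    mean_level = sum(coverage[orf_start:orf_end]) / (orf_end - orf_start)
    return mean_level >= MAPPABLE_FRACTION * PERFECT_LEVEL
```

Away from the sequence edges, a base is covered by 76 distinct 76-mers and 28 distinct 28-mers when every simulated read maps, which reproduces the level of 104.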
For a full explanation of the end-calling algorithm, refer to the software source code. In summary, for each annotated ORF the log2 expression levels for YPAD exponential growth and the other condition were retrieved and median normalized. The standard deviation of the expression level difference over the annotated ORF was calculated as the measure of signal noise, here called n, since the median-normalized expression of the annotated ORF should have been the same in both conditions. Ends were called as the first region (determined by both a 10-bp and an 80-bp sliding window) whose expression differed from the level in YPAD exponential growth, either above or below, by 3.5 times n or more and by no less than fourfold. This cutoff was chosen by manual inspection and is conservative. Tests were then applied to make sure that the called end was sufficiently expressed, was not an annotated intron, and was in a mappable region of the yeast genome. Each differential end also had to be at least 40 bp long.
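The thresholding step summarized above can be sketched as follows. This is only an illustration under stated assumptions, not the published algorithm: the function names are hypothetical, the population standard deviation is assumed for n, both windows are assumed to have to pass in the same direction, and the later filters (expression level, introns, mappability, 40-bp minimum length) are omitted.

```python
import math
import statistics


def noise_level(ref_log2, cond_log2):
    """n: standard deviation of the per-base log2 difference across the ORF.

    Assumes both inputs are already median normalized; using the
    population standard deviation is an assumption of this sketch.
    """
    diffs = [c - r for r, c in zip(ref_log2, cond_log2)]
    return statistics.pstdev(diffs)


def window_mean(values, start, width):
    return sum(values[start:start + width]) / width


def first_differential_position(flank_diff, n, windows=(10, 80),
                                k=3.5, min_fold=4.0):
    """First position where every window's mean log2 difference passes.

    The cutoff is the larger of k * n and log2(min_fold), so a fourfold
    change corresponds to 2 log2 units.
    """
    threshold = max(k * n, math.log2(min_fold))
    widest = max(windows)
    for start in range(len(flank_diff) - widest + 1):
        means = [window_mean(flank_diff, start, w) for w in windows]
        above = all(m >= threshold for m in means)
        below = all(m <= -threshold for m in means)
        if above or below:
            return start
    return None
```

Requiring both a short and a long window to pass is one way to read "determined by both a 10-bp and 80-bp sliding window"; the source code referred to above remains the authority on how the two windows were actually combined.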
RNA-binding protein (RBP) motifs from Riordan et al. (2011) were called by a standard text search using the consensus motifs in Supporting Information, Table S1; for example, their motif AAACACAW could match either AAACACAA or AAACACAU. No probabilistic weighting of nucleotide combinations was performed.
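A consensus-motif text search of this kind can be sketched with a regular expression built from the standard IUPAC nucleotide codes. The function name is illustrative, and the sketch assumes the sequence is written in the RNA alphabet (U rather than T), as in the example above.

```python
import re

# Standard IUPAC ambiguity codes, written in the RNA alphabet.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def find_motif(motif, rna):
    """Return (position, matched_text) for every occurrence, overlaps included."""
    body = "".join(f"[{IUPAC_RNA[base]}]" for base in motif.upper())
    # A lookahead lets the search report overlapping occurrences too.
    pattern = re.compile(f"(?=({body}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(rna.upper())]
```

With this mapping, W expands to the character class `[AU]`, so AAACACAW matches AAACACAA and AAACACAU but not AAACACAC, and every permitted nucleotide is treated as equally likely.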
3′ RACE on the CDC19 gene was carried out using the RLM-RACE kit from Ambion, following all instructions therein. The 3′ RACE outer primer had sequence CACCGAAACCGTCGCTGCCT and the 3′ RACE inner primer had sequence TTTTCGAACAAAAGGCCAAG.
Luciferase assays were performed using firefly luciferase with Renilla luciferase as a control, as described previously (McNabb and Reed 2005). Instead of being integrated, however, the luciferase constructs were carried on a plasmid, as described in Chu et al. (2011). Plasmid inserts were produced by DNA 2.0, and sequences are provided in Table S8.