Detailed step-by-step protocols for polyA+ RNA purification and double-stranded (ds) cDNA synthesis are presented in the Supplementary Methods.
Yeast strain BY4741 (MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0) was grown in rich medium (YPD; BD Company) at 30°C overnight, diluted to an OD600 of 0.15 and grown until reaching an OD600 of 0.87. The cells were harvested by centrifugation at room temperature, washed once with 1× PBS, and frozen in liquid nitrogen. Total RNA was extracted using the RiboPure™-Yeast kit (Ambion) and analyzed on an Agilent 2100 Bioanalyzer (Agilent Technologies).
Two 11-week-old female C57BL/6J mice were dissected, and the whole brains were taken for RNA preparation. Total RNA was extracted using the TRIzol method.
PolyA+ RNA was purified with the Dynabeads mRNA purification kit (Invitrogen) following the manufacturer's instructions and treated for 30 min at 37°C with 0.2 units of TURBO™ DNase (Ambion) per 1 μg of RNA.
First strand synthesis (FSS)
The FSS reaction was prepared by mixing 0.5 μg of polyA+ RNA, 40 ng of (dN)6 primers (Invitrogen) and 25 pmol of oligo(dT) primer (Invitrogen) in 8.5 μl of 1× reverse transcription buffer (Invitrogen), 0.5 mM dNTPs, 5 mM MgCl2 and 10 mM DTT. The mixture was incubated at 98°C for 1 min to melt RNA secondary structures, then at 70°C for 5 min, and was cooled to 15°C at 0.1°C/s. Slow cooling was used to make the annealing of RNA secondary structures and primers as reproducible as possible. At 15°C, 0.5 μl of actinomycin D solution (120 ng/μl), 0.5 μl of RNase OUT (40 units/μl, Invitrogen) and 0.5 μl of SuperScript III polymerase (200 units/μl, Invitrogen) were added to the reaction. The temperature of the reverse transcription reaction was increased gradually as a compromise between survival of the enzyme, stability of the primers and denaturation of RNA secondary structures: heating from 15 to 25°C at 0.1°C/s; incubation at 25°C for 10 min; heating from 25 to 42°C at 0.1°C/s; incubation at 42°C for 45 min; heating from 42 to 50°C at 0.1°C/s; incubation at 50°C for 25 min. SuperScript III polymerase was finally inactivated at 75°C for 15 min.
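As a rough check of the timing implied by the reverse transcription temperature program, the ramps and holds from 15°C through the 50°C incubation can be summed in a minimal Python sketch. The helper name `ramp_seconds` and the step labels are illustrative only; the ramp to the 75°C inactivation step is not specified in the text and is therefore left out.

```python
# Duration of the reverse transcription temperature program described above.
# Only the steps stated in the text are included; the ramp to 75 C is unspecified.

RAMP_RATE_C_PER_S = 0.1  # heating rate given in the text


def ramp_seconds(start_c, end_c, rate=RAMP_RATE_C_PER_S):
    """Time in seconds needed to move between two temperatures at a fixed rate."""
    return abs(end_c - start_c) / rate


# (label, duration in seconds)
steps = [
    ("ramp 15->25 C", ramp_seconds(15, 25)),
    ("hold 25 C", 10 * 60),
    ("ramp 25->42 C", ramp_seconds(25, 42)),
    ("hold 42 C", 45 * 60),
    ("ramp 42->50 C", ramp_seconds(42, 50)),
    ("hold 50 C", 25 * 60),
]

total_seconds = sum(seconds for _, seconds in steps)
total_minutes = total_seconds / 60

# Cooling from 70 C to 15 C before enzyme addition, at the same rate.
annealing_ramp_seconds = ramp_seconds(70, 15)
```

Under these assumptions the program from 15°C to the end of the 50°C incubation takes about 86 min (5150 s), of which the three ramps contribute under 6 min, and the preceding cooling step from 70 to 15°C takes roughly 9 min.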
Removal of dNTPs
EB (20 μl) (10 mM Tris–Cl, pH 8.5, Qiagen) was added to the reaction. dNTPs were removed by purification of the first strand mixture on a self-made 200 μl G-50 gel filtration spin-column equilibrated with 1 mM Tris–Cl, pH 7.0.
Second strand synthesis (SSS)
Since the Invitrogen kit was used for the SSS, the FSS buffer had to be restored after gel filtration. Water was added to the purified FSS reaction to bring the final volume to 52.5 μl. The mixture was cooled on ice. Then, 22.5 μl of the ‘second strand mixture’ [1 μl of 10× reverse transcription buffer (Invitrogen); 0.5 μl of 100 mM MgCl2; 1 μl of 0.1 M DTT; 2 μl of 10 mM mixture of each: dATP, dGTP, dCTP, dUTP; 15 μl of 5× SSS buffer (Invitrogen); 0.5 μl of Escherichia coli ligase (10 units/μl, NEB); 2 μl of DNA polymerase I (10 units/μl, NEB); and 0.5 μl RNase H (2 units/μl, Invitrogen)] was added. SSS reactions were incubated at 16°C for 2 h. The ds cDNA was purified on QIAquick columns (Qiagen) according to the manufacturer's instructions.
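The volumes listed for the second strand mixture can be checked with a few lines of arithmetic. The Python sketch below uses only the component volumes given in the text; the dictionary keys and variable names are illustrative. The final nucleotide concentration assumes that no dNTPs carry over from the gel filtration step.

```python
# Volume check for the second strand synthesis (SSS) reaction.
# All volumes are in microlitres and are taken from the text.

second_strand_mix_ul = {
    "10x reverse transcription buffer": 1.0,
    "100 mM MgCl2": 0.5,
    "0.1 M DTT": 1.0,
    "10 mM each dATP/dGTP/dCTP/dUTP": 2.0,
    "5x SSS buffer": 15.0,
    "E. coli ligase": 0.5,
    "DNA polymerase I": 2.0,
    "RNase H": 0.5,
}

fss_volume_ul = 52.5  # purified first strand reaction after adding water

mix_volume_ul = sum(second_strand_mix_ul.values())
final_volume_ul = fss_volume_ul + mix_volume_ul

# Final strength of the SSS buffer (added as a 5x stock).
sss_buffer_final_x = 5 * second_strand_mix_ul["5x SSS buffer"] / final_volume_ul

# Final concentration of each nucleotide (mM), assuming none carried over.
each_nucleotide_final_mm = (
    10 * second_strand_mix_ul["10 mM each dATP/dGTP/dCTP/dUTP"] / final_volume_ul
)
```

The components sum to the stated 22.5 μl, the reaction volume is 75 μl, and the 15 μl of 5× SSS buffer therefore ends up at 1×. Each nucleotide is present at about 0.27 mM under the stated assumption.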
About 250 ng of ds cDNA was fragmented by sonication with a UTR200 (Hielscher Ultrasonics GmbH, Germany) under the following conditions: 1 h, 50% pulse, 100% power and continuous cooling by 0°C water flow-through.
Preparation of libraries for Illumina sequencing platform
Libraries were prepared using the DNA sample kit (#FC-102-1002, Illumina), as described previously (4), but with the following modification: just before library amplification, uridine digestion was performed at 37°C for 15 min in 5 μl of 1× TE buffer, pH 7.5, with 1 unit of Uracil-N-Glycosylase (UNG; Applied Biosystems).
The procedure of paired-end sequencing library preparation was the same as for single read libraries except that different ligation adapters and PCR primers were used (#PE-102-1002, Illumina).
Amplified material was loaded onto a flow-cell at a concentration of 4 pM. Sequencing was carried out on the Illumina 1G Genome Analyser by running 36 cycles according to the manufacturer's instructions.
Image deconvolution, quality value calculation and the mapping of exon reads and exon junctions were performed as described previously (4). Sequencing reads were aligned to the Mus musculus (UCSC mm9) or Saccharomyces cerevisiae (UCSC sacCer1) genomes using a modification of the Eland software (Gerald module v.1.27, Illumina). The mapping criteria of Eland are the following: sequencing reads should be uniquely matched to the genome allowing up to two mismatches, without insertions or deletions. We applied the following recursive modification of the Eland procedure: the first 32 bp of reads (trimming the last 4 bp of 36 bp reads due to Eland limitations) were aligned, then reads that do not match according to Eland criteria were trimmed to 31 bp, and aligned again. This 3′-end trimming of unmatched reads was done recursively down to a length of 25 bp. This modified procedure typically increases the number of uniquely aligned fragments by 20–50%, because sequencing errors that prevent successful alignment by the Eland criteria are mostly located at the ends of reads, and these are gradually trimmed off. Under these conditions, ~60% of the reads obtained here were matched to unique locations on the reference genome, whereas ~25% of the reads map to more than one genomic position and ~15% do not map to any location.
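The recursive 3′-end trimming can be illustrated with a minimal Python sketch. This is not the Eland implementation: the function `find_hits` is a hypothetical brute-force stand-in that searches the forward strand only and treats a read as aligned when exactly one position matches with at most two substitutions. The names `align_with_trimming`, `start_len` and `min_len` are likewise illustrative.

```python
import random


def find_hits(read, genome, max_mismatches=2):
    """Toy aligner: start positions matching with <= max_mismatches substitutions."""
    hits = []
    n = len(read)
    for start in range(len(genome) - n + 1):
        mismatches = 0
        for a, b in zip(read, genome[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            # Loop finished without exceeding the mismatch limit.
            hits.append(start)
    return hits


def align_with_trimming(read, genome, start_len=32, min_len=25):
    """Try the first start_len bases, then trim one 3' base at a time down to min_len.

    Returns (position, aligned_length) for the first unique match, else None.
    """
    for length in range(start_len, min_len - 1, -1):
        hits = find_hits(read[:length], genome)
        if len(hits) == 1:
            return hits[0], length
    return None


# Demonstration on a random toy genome (not real data).
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(2000))

# Take a 36-bp read from position 100 and corrupt bases 30-32 (0-based 29-31).
swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
bases = list(genome[100:136])
for i in (29, 30, 31):
    bases[i] = swap[bases[i]]
read = "".join(bases)

result = align_with_trimming(read, genome)
```

The toy read carries three errors within its first 32 bases, so it fails the two-mismatch criterion at 32 bp and is designed to align at position 100 once it is trimmed to 31 bp, which mirrors how trimming recovers reads whose errors cluster at the 3′ end.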
Mapping end tags
Unmapped sequencing reads with 1–11 nt long leading oligo(dT) stretches were used to map the 3′-gene boundaries. Leading oligo(dT) stretches were removed, and the remaining fragment was aligned on a reference genome.
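The removal of leading oligo(dT) stretches can be sketched in Python as follows. The function name and parameters are hypothetical, and the sketch assumes that reads with no leading T or with a run longer than 11 nt are simply not used; the text does not state how such reads were handled.

```python
def strip_leading_oligo_dt(read, min_t=1, max_t=11):
    """Remove a leading run of T's if its length is within [min_t, max_t].

    Returns the remaining sequence to be aligned, or None if the read
    does not qualify (assumption: such reads are skipped).
    """
    run_length = len(read) - len(read.lstrip("T"))
    if min_t <= run_length <= max_t:
        return read[run_length:]
    return None


# Toy reads, not real data.
examples = ["TTTACGTAC", "ACGTACGTA", "T" * 12 + "ACGT"]
trimmed = [strip_leading_oligo_dt(r) for r in examples]
```

Only the first toy read qualifies here; its remainder after the three leading T's would be passed on to the aligner.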
The Eland program does not map reads with multiple hits on a genome. As a result, no sequencing reads were mapped to repetitive genomic regions. To visualize repeat-related gaps in the genome browser the following simulation was performed. The whole reference genome was sliced into 30-bp long fragments with a 10-bp overlap for mouse and a 1 bp overlap for yeast. These fragments were aligned back to the reference sequence using the standard Eland settings. About 80% of the reads for mouse and 90% for yeast were then aligned uniquely. The remaining reads producing multiple hits are shown in the genome browser by gray bars, representing repetitive genomic regions where in general expression levels cannot be resolved unambiguously.
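The slicing step of this simulation might look like the following Python sketch, which cuts a sequence into fixed-length fragments with a given overlap between neighbours. The function name `slice_genome` is illustrative, and the subsequent alignment of the fragments back to the genome is not shown.

```python
def slice_genome(sequence, fragment_len=30, overlap=10):
    """Cut a sequence into fragments of fragment_len with the given overlap.

    Returns a list of (start, fragment) pairs; a trailing piece shorter
    than fragment_len is dropped.
    """
    step = fragment_len - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than fragment_len")
    last_start = len(sequence) - fragment_len
    return [
        (start, sequence[start:start + fragment_len])
        for start in range(0, last_start + 1, step)
    ]


# Toy 70-bp sequence, not real data.
toy = "ACGT" * 17 + "AC"
mouse_like = slice_genome(toy, fragment_len=30, overlap=10)  # step of 20 bp
yeast_like = slice_genome(toy, fragment_len=30, overlap=1)   # step of 29 bp
```

With the values given in the text, a 10-bp overlap corresponds to fragment starts every 20 bp and a 1-bp overlap to starts every 29 bp.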
Search for novel transcribed regions
The whole genome was split into 50 bp windows (non-overlapping). A ‘new transcribed region’ was defined as a joined group of more than two consecutive windows, with at least two sequence reads (in the same direction) mapped per window. The gap between ‘new transcribed regions’ should be at least 50 bp, and the gap between a ‘new transcribed region’ and an annotated gene (with the same transcription direction as the ‘new transcribed region’) at least 100 bp.
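The window-based definition above can be expressed as a short Python sketch. It handles one chromosome and one strand at a time and assigns each read to a window by its start coordinate, which is an assumption; the function names `find_new_regions` and `far_from_genes` are illustrative. Regions are reported as half-open intervals.

```python
from collections import Counter


def find_new_regions(read_starts, genome_len, window=50, min_reads=2, min_windows=3):
    """Return (start, end) of runs of >= min_windows consecutive windows,
    each containing at least min_reads reads (same strand assumed)."""
    counts = Counter(pos // window for pos in read_starts)
    n_windows = -(-genome_len // window)  # ceiling division
    regions = []
    run_start = None
    # One extra iteration flushes a run that reaches the last window.
    for w in range(n_windows + 1):
        covered = w < n_windows and counts[w] >= min_reads
        if covered:
            if run_start is None:
                run_start = w
        else:
            if run_start is not None and w - run_start >= min_windows:
                regions.append((run_start * window, min(w * window, genome_len)))
            run_start = None
    return regions


def far_from_genes(region, same_strand_genes, min_gap=100):
    """True if the region lies at least min_gap bp from every annotated gene."""
    start, end = region
    return all(
        start - gene_end >= min_gap or gene_start - end >= min_gap
        for gene_start, gene_end in same_strand_genes
    )


# Toy read start positions on one strand (not real data).
reads = [10, 20, 60, 70, 110, 120, 300, 310, 360, 361]
regions = find_new_regions(reads, genome_len=500)
kept = [r for r in regions if far_from_genes(r, [(250, 400)])]
```

In this toy input the first three windows each hold two reads and form one region, whereas the two covered windows starting at 300 bp are too few to qualify. Because any two regions are separated by at least one window that fails the read threshold, the 50-bp gap between regions follows from the window size in this sketch.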