Recent transcription profiling studies have revealed an unexpectedly large proportion of antisense transcription across eukaryotic and bacterial genomes. However, the extent and significance of antisense transcripts are controversial, partly because experimental artifacts are suspected. Here, we present a method to generate clean genome-wide transcriptome profiles, using actinomycin D (ActD) during reverse transcription. We show that antisense artifacts appear to be triggered by spurious synthesis of second-strand cDNA during reverse transcription reactions. Strand-specific hybridization signals obtained from Saccharomyces cerevisiae tiling arrays were compared between samples prepared with and without ActD. Use of ActD removed about half of the detectable antisense transcripts, consistent with their being artifacts, while sense expression levels and about 200 antisense transcripts were not affected. Our findings thus facilitate a more accurate assessment of the extent and position of antisense transcription, towards a better understanding of its role in cells.
Non-protein-coding RNAs are postulated to represent an important development in the genetic operating system of higher eukaryotes, where the ratio of non-coding to protein-coding DNA rises as a function of phenotypic complexity (1). Among them, antisense transcripts have been implicated in transcription, processing, stability, transport and translation of their corresponding complementary RNAs (2,3). Their role appears to be regulatory (4), and they have been implicated in human genetic diseases (5).
Ongoing large-scale studies using cDNA sequencing and genome tiling arrays have reported a widespread occurrence of antisense transcripts in the genomes of simple organisms (6–9), as well as higher eukaryotes (10–12). The unexpected extent of antisense transcription over the genomes and the observation of mirrored transcription for several sense–antisense pairs have raised the concern that some of these signals may be experimental artifacts (13). However, these concerns have remained speculative, and no comprehensive independent experimental validation has yet been performed.
One possibility is that artifacts arise during target preparation by reverse transcription, confounding the interpretation of array data when both genomic strands are interrogated (Figure 1). Indeed, reverse transcriptase has a tendency to generate spurious second-strand cDNA based on its DNA-dependent DNA polymerase activity (14). Among several possibilities, priming of first-strand cDNA could occur by either a hairpin loop at its 3′ end or by re-priming, from either RNA fragments formed by degradation of RNA templates or from primers used for the first-strand synthesis (15,16).
ActD is a known inhibitor of transcription, as well as DNA replication. Less well known is the fact that ActD can also selectively prevent second-strand cDNA synthesis during reverse transcription (17) due to the specific inhibition of DNA-dependent, but not RNA-dependent, DNA-synthesis (18) (Figure 1). This inhibitory effect of ActD appears related to its ability to bind deoxyguanosine residues on single- and double-stranded DNA, possibly preventing either the annealing of DNA during priming or the elongation of DNA-dependent DNA-synthesis (19).
Here we show that a standard reverse transcription reaction, used for the synthesis of DNA complementary to RNA templates, is a major source of artifactual antisense transcripts. We demonstrate this using an array that contains 6.5 million probes tiling both strands of the full genomic sequence of S. cerevisiae at an average of 8 nt intervals on each strand and a 4 nt offset of the tile between strands (7). We propose that ActD can be used in cDNA preparations during reverse transcription reactions for hybridization to microarrays, to generate clean transcriptome profiles. We show that antisense artifacts are eliminated in samples prepared with ActD. These findings have implications for other genome-wide approaches involving reverse transcription.
S1003 (MATa/α gal2/gal2 lys5/lys5), an S288c background strain, was grown in 100 mL of rich medium (2% Difco peptone, 1% yeast extract, 2% dextrose) to mid-exponential phase (OD600 ~ 1.0). Total RNA was isolated by a standard hot phenol method. Poly(A) RNA was isolated from 1 mg of total RNA by using the Oligotex mRNA Midi kit (Qiagen). Each sample of poly(A) RNA was treated with RNase-free DNaseI using the Turbo DNA-free kit (Ambion). For first-strand cDNA synthesis, 9 μg of poly(A) RNA was mixed with 4.5 μg of random hexamers and incubated at 70°C for 10 min, then transferred to ice. The synthesis included 2000 U of SuperScript II Reverse Transcriptase, 50 mM Tris–HCl, 75 mM KCl, 3 mM MgCl2, 0.01 M DTT, 0.25 mM dNTPs mix (Invitrogen) in a total volume of 200 μl at 42°C for 1 h. Samples prepared with ActD were treated the same, except that the drug was added after the denaturing step at 70°C to a final concentration of 6 μg/ml. Samples were then subjected to RNase treatment for 20 min at 37°C (30 U RNase H, Epicentre, 60 U of RNase Cocktail, Ambion). First-strand cDNA was purified by standard phenol extraction (1.5 ml Phase lock gel, Eppendorf) and ethanol precipitation. The sample was dissolved in DEPC water and digested by using 0.1 U DNase I (Invitrogen) in 1× One-Phor-All buffer (Amersham Pharmacia) and 1.5 mM CoCl2 (Roche) solution at 37°C, to yield fragments of 50–100 bp in size. Each sample was 3′ end-labeled with 0.07 mM Biotin-N6-ddATP (Enzo Life Sciences) using 400 U of Terminal Transferase (Roche) for 2 h at 37°C.
For genomic DNA hybridizations, the diploid yeast strain (S1003) was grown in rich media to saturation and genomic DNA was purified by using the Genomic tip protocol (Genomic DNA kit, Qiagen) including RNase A treatment. 10 μg of DNA was fragmented with 0.2 U of DNaseI (Invitrogen) for 5 min at 37°C and labeled as described for cDNA.
For genomic DNA and cDNA from poly(A)-RNA, labeled samples of 10 and 4.5 μg respectively were denatured in a solution containing 100 mM Mes, 1 M [Na+], 20 mM EDTA, 0.01% Tween-20, 50 pM control oligonucleotide B2 (Affymetrix), 0.1 mg/ml herring sperm DNA, and 0.5 mg/ml BSA in a total volume of 300 μl, from which 220 μl were hybridized per array. Hybridizations were carried out at 45°C for 16 h with 60 r.p.m. rotation. Our analysis is based on five replicate sample hybridizations of poly(A) RNA without ActD, three replicate sample hybridizations of poly(A) RNA with ActD and three of genomic DNA.
Arrays were normalized using the DNA reference normalization (20) as implemented in the tilingArray package of Bioconductor (21). Only probes that are unique and match the S288c genome exactly were considered further. All arrays were then segmented together using the segmentation algorithm (20) as implemented in the tilingArray package of Bioconductor.
To estimate a common background threshold for both ORFs and segment comparisons, we used the segments that do not overlap with any annotated, transcribed features. Only data from the hybridizations without ActD were used to determine the background threshold. The values of the segments were first sorted and the midpoint of the shorth (the shortest interval that covers half of the values) of the first 99.9% of the data was used to fit a normal distribution to determine the threshold at which the false discovery rate is 0.1%, as performed in David et al. (7). This estimated background threshold was used to rescale the probe intensity data, setting the background threshold to zero for all probes.
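The shorth-based estimate described above can be illustrated with a minimal Python sketch. The function names, the way the normal spread is estimated (from values below the shorth midpoint only) and the use of a simple upper-tail quantile in place of a formal false discovery rate calculation are assumptions made for illustration; they are not taken from David et al. (7) or from the tilingArray package.

```python
from statistics import NormalDist

def shorth_midpoint(values):
    """Midpoint of the shortest interval that covers half of the values."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    h = n // 2 + 1  # number of sorted points the interval must contain
    # Slide a window of h consecutive sorted values and keep the narrowest one.
    start = min(range(n - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    return 0.5 * (xs[start] + xs[start + h - 1])

def background_threshold(values, keep=0.999, tail=0.001):
    """Hypothetical threshold: upper-tail quantile of a normal fitted to background."""
    xs = sorted(values)
    kept = xs[: max(2, int(len(xs) * keep))]  # first 99.9% of the sorted data
    center = shorth_midpoint(kept)
    # Assumption: the spread is estimated only from values at or below the
    # center, which are less likely to contain real expression signal.
    below = [center - x for x in kept if x <= center]
    sd = (sum(d * d for d in below) / len(below)) ** 0.5
    return NormalDist(center, sd).inv_cdf(1 - tail)

# Toy segment levels (made-up numbers, not data from this study).
signals = [-1.0, -0.5, -0.25, 0.0, 0.125, 0.25, 0.75, 1.5, 4.0, 6.5]
threshold = background_threshold(signals)
rescaled = [s - threshold for s in signals]  # background threshold becomes zero
```

Subtracting the threshold from every value mirrors the rescaling step in the text, after which a signal above zero is read as expression above background.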
To compare the signal for both sense and antisense strands of ORFs between hybridizations performed with and without ActD, the probes were mapped to the annotated features (davidTiling Package in Bioconductor, gff data) and to regions with the same boundaries on the strand opposite to the annotated features. The signals of the probes were then averaged among the replicates, and the median value of all the probes that mapped to a feature was used to represent the signal of that feature. An ORF or its antisense region was considered transcribed if the expression signal was above 0 (Supplementary Table 1).
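As a small illustration of this summary step, the following Python sketch averages each probe across replicates and then takes the median over the probes of one feature. The function names and the toy values are hypothetical, and the signals are assumed to have been rescaled already so that the background threshold is zero.

```python
from statistics import mean, median

def feature_signal(replicate_signals):
    """Summarize one feature.

    replicate_signals: one list per replicate, each holding the rescaled
    signal of every probe mapped to the feature, in the same probe order.
    """
    # Average each probe across replicates, then take the median over probes.
    per_probe_mean = [mean(probe) for probe in zip(*replicate_signals)]
    return median(per_probe_mean)

def is_transcribed(signal):
    """A feature counts as transcribed if its signal is above 0."""
    return signal > 0

# Two replicates, three probes (made-up values).
toy = [[0.5, 1.0, -0.5],
       [1.5, 1.0, 0.5]]
signal = feature_signal(toy)
```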
After segmentation, the probe signals within the boundaries of the segments were averaged among replicates and the median value of the averages was used to represent the expression signal of the segment. Segments were then assigned to different categories depending on how they overlapped with annotated features, as described in David et al. (7). Segments from reactions with and without ActD were categorized separately and classified as annotated ORF, annotated ncRNA, novel isolated, novel antisense, excluded or dubious gene. The excluded segments were those with more than 50% non-unique probes. The novel isolated segments were those that do not overlap with any transcribed features on either strand. The novel antisense filtered segments were those that do not overlap with any transcribed features on the same strand but overlap with a transcribed feature on the opposite strand. Antisense and isolated segments had to further fulfill the following two criteria: (i) have a length longer than 48 bp and (ii) be flanked by segments with reduced signal on both sides. Segments that did not fulfill both criteria were classified as unassigned antisense or unassigned isolated, respectively (Supplementary Table 2).
We compared the stringent set of antisense transcripts detected by cDNA sequencing of strain S288c (8) to our data. Comparisons were based on gene names of the sense transcripts. Of 7562 features (e.g. genes) (davidTiling Package in Bioconductor, gff data), 218 features show antisense sequences by cDNA sequencing, 229 features have antisense segments in data generated with the addition of ActD, and 589 features have antisense segments in data generated in the absence of ActD. We found 26 features in common between the cDNA sequencing dataset and the ActD+ array dataset, and 36 features in common with the ActD− array dataset. A Fisher's exact test was performed on the numbers above.
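The overlap test can be sketched in Python with the counts given above. This is an illustrative one-sided Fisher's exact test, computed as the upper tail of the hypergeometric distribution using only the standard library. Whether the reported P values come from a one-sided or two-sided test is not stated here, so the values this sketch produces need not match them exactly.

```python
from fractions import Fraction
from math import comb

def overlap_p_value(total, set_a, set_b, overlap):
    """P(overlap >= observed) when set_b features are drawn from total
    features of which set_a are marked (hypergeometric upper tail)."""
    denominator = comb(total, set_b)
    numerator = sum(
        comb(set_a, k) * comb(total - set_a, set_b - k)
        for k in range(overlap, min(set_a, set_b) + 1)
    )
    # Exact integer arithmetic, converted to float only at the end.
    return float(Fraction(numerator, denominator))

TOTAL_FEATURES = 7562   # annotated features
CDNA_ANTISENSE = 218    # features with antisense by cDNA sequencing

# ActD+ arrays: 229 features with antisense segments, 26 shared with cDNA set.
p_plus = overlap_p_value(TOTAL_FEATURES, CDNA_ANTISENSE, 229, 26)
# ActD- arrays: 589 features with antisense segments, 36 shared with cDNA set.
p_minus = overlap_p_value(TOTAL_FEATURES, CDNA_ANTISENSE, 589, 36)

# Overlap expected by chance, for comparison with the observed 26 and 36.
expected_plus = CDNA_ANTISENSE * 229 / TOTAL_FEATURES
expected_minus = CDNA_ANTISENSE * 589 / TOTAL_FEATURES
```

Comparing the observed overlaps with `expected_plus` and `expected_minus` shows why the smaller ActD+ set gives the stronger enrichment even though it shares fewer features in absolute terms.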
The antisense segments were flagged by the filter if any median signal of a 100 bp sliding region on the opposite strand of the antisense segment was larger than the signal of the antisense segment itself, as applied previously (7). Two comparisons were performed between the two datasets (with and without ActD). In a first comparison the filter was applied to both datasets generated with and without ActD. In a second comparison the dataset on antisense segments detected in the presence of ActD was compared with the dataset on antisense segments detected in the absence of ActD but after filtering. The differences were manually checked. A list of these segments is available in Supplementary Table 4 and the expression profiles are available online on our website (www.ebi.ac.uk/huber-srv/actinomycinD).
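A rough Python sketch of the filter rule follows. The data layout (lists of position and signal pairs), the 1 bp window step and the restriction of windows to the extent of the antisense segment are assumptions made for illustration; the implementation used in (7) may differ in these details.

```python
from statistics import median

def flagged_by_filter(segment_signal, opposite_probes, seg_start, seg_end,
                      window=100, step=1):
    """Return True if any sliding-window median on the opposite strand
    exceeds the signal of the antisense segment.

    opposite_probes: list of (position, signal) pairs on the opposite strand.
    seg_start, seg_end: boundaries of the antisense segment (assumed layout).
    """
    last_left = max(seg_start, seg_end - window)
    for left in range(seg_start, last_left + 1, step):
        in_window = [s for pos, s in opposite_probes
                     if left <= pos < left + window]
        if in_window and median(in_window) > segment_signal:
            return True
    return False

# Toy opposite-strand probes every 8 nt with a constant signal of 2.0.
sense_probes = [(pos, 2.0) for pos in range(1000, 1300, 8)]

weak_antisense = flagged_by_filter(1.0, sense_probes, 1000, 1300)    # weaker than sense
strong_antisense = flagged_by_filter(3.0, sense_probes, 1000, 1300)  # stronger than sense
```

In this toy case the weak antisense segment is flagged as a putative artifact, while the segment that exceeds the opposite-strand signal along its whole length passes the filter.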
Total RNA was isolated from strain S1003 (MATa/α gal2/gal2 lys5/lys5) as described above and was treated with RNase-free DNase I (TURBO DNase, Ambion) for 25 min at 37°C. DNA-free RNA samples were split into +RT and −RT reactions. In +RT reactions, three replicates for each antisense RNA were performed in the presence of reverse transcriptase. In each reaction 2 pmol of primer, designed to hybridize to the antisense transcript (Supplementary Table 5), was added and annealed to 2 μg of total RNA by 5 min incubation at 65°C, followed by cooling on ice. A strand-specific primer targeting ACT1 mRNA was always included as an internal control. First-strand cDNA was synthesized by using 200 U of SuperScript II RT (Invitrogen) for 50 min at 42°C. The enzyme was heat inactivated at 70°C for 15 min. RNA complementary to the cDNA was removed by Escherichia coli RNase H (10 U, Epicentre) and remaining RNAs were digested with 20 U of RNase Cocktail (Ambion). The −RT reactions were treated identically to the +RT reactions, with the exception that reverse transcriptase was omitted.
PCR was performed for the antisense transcripts of interest and for ACT1 individually by using 1 μl of the RT reaction as a template, two gene specific primers (250 nM each) (Supplementary Table 5), 200 μM dNTP and 1 U of Ampli Taq Gold (Applied Biosystems). As a positive control PCR reactions were performed with genomic DNA as a template. The following thermal profile was used for amplification: incubation at 95°C for 10 min followed by 25 cycles of three-step amplification at 95°C for 30 s, 60°C or 52°C (dependent on primer) for 30 s and 72°C for 30 s.
To assess the extent of artifacts generated by reverse transcription, we performed five biological replicates of standard reverse transcription reactions lacking ActD (ActD−) on RNA samples from yeast grown in rich media and analyzed cDNA targets by hybridization to tiling arrays. For sense transcripts, hybridization signals were concordant between all replicates. However, two types of signals were registered on the antisense strands opposite to expressed genomic regions. In one class, the signal intensities were proportional to the intensities of the sense counterparts (Figure 2A, upper panel); furthermore, substantial variability existed across replicates (Figure 2B and C, upper panel). In the other class, antisense signals were highly reproducible across all replicates and did not correlate with sense strand expression levels (Figure 2D, upper panel). These differences suggested that the first class of antisense signals might be artifacts, potentially triggered by spurious second-strand synthesis during reverse transcription as proposed by the models (Figure 1). Besides growth in rich media, we have seen the same pattern for several other conditions (data not shown). We postulated that the putative artifacts could be resolved using ActD. Indeed, three replicates of reverse transcription reactions performed with ActD (ActD+) resulted in expression signals below background for the first class, but not for the second class of antisense regions (Figure 2A–D, lower panels). In addition to the examples shown in Figure 2, profiles for all genomic regions are available online (http://www.ebi.ac.uk/huber-srv/actinomycinD).
A genome-wide comparison of expression signals with or without ActD over all coding genes [as defined by ORFs in the Saccharomyces Genome Database (SGD, http://www.yeastgenome.org)] and their opposite regions supported the interpretation that ActD reduces artifactual antisense signals. The number of antisense regions detected as expressed above background declined from 1046 in ActD− to 325 in ActD+. Moreover, only 25% (260/1046) of the cases observed upon standard first-strand cDNA synthesis are still detectable in ActD+ (Supplementary Table 1). Furthermore, consistent with its specific role in second-strand synthesis, ActD did not affect the number of sense transcripts detected above background. Out of 5703 coding-genes (ORFs), the majority is expressed above background in both conditions: 5214 ORFs in ActD+ and 5186 ORFs in ActD−; in addition, there is a nearly complete overlap between the two gene lists (5172 ORFs) (Supplementary Table 1).
Tiling arrays can be used for de novo identification of transcripts (7). Therefore, the hybridization signals of probes were examined along their chromosomal positions irrespective of previous annotation and separately for each strand. The profiles were partitioned into segments of constant hybridization intensity using a segmentation algorithm (7,20). We analyzed the effect of ActD for sense and antisense transcripts defined by segmentation (Supplementary Table 2). Antisense segments were defined based on three criteria: (i) the segments are expressed and overlap annotated genes located on the opposite strand, but not on the same strand, (ii) the segment lengths are longer than 48 bp and (iii) they are flanked by segments with reduced hybridization signal on both sides. A comparison shows that ActD has no quantitative effect on first-strand cDNA synthesis, since there is concordance between the expression levels of sense segments measured in ActD+ and ActD− reactions (Figure 3). However, the addition of ActD reduces the expression level below background for more than half of the antisense segments: among a total of 553 antisense segments, 347 give signal above background only in ActD− and 14 only in ActD+, while 192 antisense segments are detected in both conditions (Supplementary Table 3). Therefore, strikingly, the number of antisense segments observed by using a standard reverse transcription protocol is decreased by 64% (347/539) with the inclusion of ActD.
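The segment counts in this paragraph can be cross-checked with a few lines of Python; the variable names are illustrative only.

```python
# Antisense segment counts reported in the text.
only_actd_minus = 347   # above background only without ActD
only_actd_plus = 14     # above background only with ActD
both_conditions = 192   # above background in both conditions

total_segments = only_actd_minus + only_actd_plus + both_conditions
detected_minus = only_actd_minus + both_conditions  # all segments seen in ActD-
detected_plus = only_actd_plus + both_conditions    # all segments seen in ActD+

# Share of ActD- antisense segments that disappear when ActD is included.
percent_lost = 100 * only_actd_minus / detected_minus

print(total_segments, detected_minus, detected_plus, round(percent_lost))
```

The derived totals (553 segments overall, 539 in ActD− and 206 in ActD+) agree with the figures quoted in the text, and 347/539 rounds to the stated 64%.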
Three independent lines of evidence are in agreement with the array hybridization results generated in the presence of ActD. First, we compared our array hybridization data to the stringent set of antisense transcripts obtained by full-length cDNA sequencing (8). Despite complementarities between these two methods, as well as the comparison being between different experimental growth conditions (rich media versus minimal media), a better overlap was achieved for the ActD+ than the ActD− dataset (P value of 1.6e−9 versus 2.5e−6, Fisher's exact test). Second, we evaluated the effect of a stringent computational filter on removing putative antisense artifacts from hybridization results. This filter was previously developed to computationally remove putative antisense artifacts by requiring segments to have higher expression signal than seen on the opposite strand for at least part of their length (7). On ActD− data, the computational filter reduced the number of antisense segments by 63% (337/539). In contrast, the filter had only a mild effect on the number of antisense segments detected in ActD+: of 206 antisense segments, only 39 were filtered out. In addition, a comparison of antisense segments detected in ActD− after filtering, and antisense segments detected in ActD+, shows good (but imperfect) concordance (158/250) (Supplementary Table 4). Since computational filters always face the dilemma of compromising between false positives and false negatives, the experimental improvement using ActD is more advantageous. This is clearly the case for antisense detection as the majority of antisense transcripts are weakly expressed (Supplementary Table 3) and thus hard to distinguish from noise. Notably, several antisense segments filtered out in ActD− (48 segments, Figure 4A orange dots) are detectable in ActD+, which suggests that these cases were erroneously filtered (manual inspection confirms this). In addition, ActD+ also resolves artifactual antisense segments that erroneously passed the computational filter (44 segments, Figure 4A green dots). Third, we conducted semi-quantitative strand-specific RT-PCR analysis for 10 antisense segments (Figure 4B): of the three antisense transcripts detected both in ActD+ and ActD− (MBR1, EPL1, MRK1), all three yielded a signal by strand-specific RT-PCR. In contrast, of seven antisense segments detected exclusively in ActD− (CYS4, EMP24, PNC1, MDH3, HAC1, TRR1, PFK1), all seven yielded negative results by strand-specific RT-PCR.
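The numbers quoted for the comparison between the computational filter and the ActD treatment fit together as follows. The sketch uses only counts given in the text; the variable names are illustrative.

```python
# Counts taken from the text.
detected_minus = 539       # antisense segments in ActD- before filtering
filtered_out_minus = 337   # removed from ActD- by the computational filter
detected_plus = 206        # antisense segments in ActD+ (unfiltered)
shared = 158               # found in both filtered ActD- and ActD+

kept_minus = detected_minus - filtered_out_minus   # ActD- segments passing the filter
union = kept_minus + detected_plus - shared        # denominator of the concordance
only_in_plus = detected_plus - shared              # filtered in ActD- yet seen in ActD+
only_in_filtered_minus = kept_minus - shared       # passed the filter yet absent in ActD+

percent_filtered = 100 * filtered_out_minus / detected_minus

print(kept_minus, union, only_in_plus, only_in_filtered_minus, round(percent_filtered))
```

The union of 250 segments reproduces the denominator of the reported concordance (158/250), and the two non-shared groups correspond to the 48 and 44 segments highlighted in Figure 4A.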
Our results indicate that about half of all antisense signals observed in yeast by conventional protocols hybridizing first-strand cDNA to high-density tiling arrays are experimental artifacts. These artifacts appear to be largely triggered by spurious synthesis of second-strand cDNAs during reverse transcription reactions. Although the inhibitory effect of ActD on spurious second-strand synthesis during reverse transcription has been known for over 30 years (17), its application to microarray cDNA hybridization has been largely overlooked. Recent technological advancements in microarray design have allowed profiling transcription at the level of strand-specificity, but when conventional protocols for cDNA generation were used, the data analysis required stringent computational filtering to reduce the artifacts (7). The development of more sensitive and high resolution technologies needs to be complemented by the development of more accurate protocol designs.
We find that ActD enables a substantial experimental improvement to resolve the artifacts early during generation of the raw data. Although the precise molecular mechanism of this inhibition still awaits further elucidation, ActD likely acts through exerting an inhibitory effect on double-stranded DNA synthesis. Its inclusion in reverse transcription reactions increases both the sensitivity and the accuracy of array-detected antisense signals. Thus, we encourage the use of ActD when strand-specific transcription is interrogated by microarrays.
As reverse transcription is used in technologies beyond array hybridization, the use of ActD has broader applications. While widespread antisense transcription remains detectable in the yeast genome, the extent and identity of this transcription over the genomes of several other organisms might be reconsidered in light of these findings.
Supplementary Data are available at NAR Online.
National Institutes of Health (GM068717, HG000205 to L.M.S.) and the Deutsche Forschungsgemeinschaft (STE 1422/2-1 to L.M.S.). We thank Ye Ning and Marina Granovskaia for advice on experimental methods, Eran Segal, Wolfgang Huber, Himanshu Sinha, Lior David and Julien Gagneur for helpful comments on the manuscript, Joern Toedling for website template, Wolfgang Huber for providing the computing resources, and the contributors to the BIOCONDUCTOR (www.bioconductor.org) and R (http://www.R-project.org) projects for their software. Funding to pay the Open Access publication charges for this article was provided by the European Molecular Biology Laboratory.
Conflict of interest statement. None declared.