A comparison of different mappers and assemblers allowed us to evaluate different reference-based strategies for genome annotation from RNA-Seq reads. Surprisingly, we found that using a more accurate mapper such as GSNAP does not improve the accuracy of transcriptome reconstruction. Since GSNAP maps more reads than TopHat, the increased coverage for highly expressed genes can be an obstacle to the correct reconstruction of isoforms, as shown in 
. As the per-run coverage of sequencing technologies is rapidly increasing, solutions will be needed to cope with this issue. One likely advantage of including more reads would be the improved annotation of lowly expressed genes, but we did not test this in our analysis. Instead, our quality assessment relied only on overall performance.
Additionally, merging reads from different samples did not improve the reconstruction, regardless of the method used. This observation is consistent with recent guidelines, which suggest performing transcriptome reconstruction independently for each sample 
. This phenomenon can be attributed to the complexity of disentangling isoforms when the assembler is faced with the computational burden of handling many reads, a pitfall for both assemblers tested here, Cufflinks and Scripture.
We found Cufflinks to be more accurate than Scripture. This observation is in contrast to the results found for simulated data from mouse 
, where the opposite pattern was detected. Further investigation is necessary to understand the nature of this discrepancy.
A striking observation was the low consistency among the three approaches evaluated here. Irrespective of whether we analyzed the number of shared splice junctions or isoforms, we noted that each mapper-assembler combination resulted in a high fraction of private annotations, i.e., annotations specific to one approach (from 45% of the isoforms for TopHat-Cufflinks to 84% for TopHat-Scripture). These private annotations likely represent annotation artifacts. A similar number of private isoforms was also found when Cufflinks and Scripture were applied to RNA-Seq data from human and mouse.
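To make the notion of a private annotation concrete, the following minimal Python sketch computes, for each approach, the fraction of its isoforms that no other approach reports. It is an illustration only, not the procedure used in this study: the function name `private_fraction`, the approach labels, and the representation of an isoform as a (chromosome, strand, intron chain) tuple are all assumptions, and the toy coordinates are invented.

```python
from collections import Counter


def private_fraction(annotations):
    """Return, per approach, the fraction of its isoforms found in no other approach.

    `annotations` maps an approach label to a set of hashable isoform keys.
    Here an isoform key is assumed to be (chromosome, strand, intron_chain);
    any other exact-match key would work equally well.
    """
    # Count in how many approaches each isoform key appears.
    # Each approach holds a set, so it contributes at most once per key.
    support = Counter()
    for isoforms in annotations.values():
        support.update(isoforms)

    fractions = {}
    for name, isoforms in annotations.items():
        if not isoforms:
            fractions[name] = 0.0  # no isoforms, so nothing can be private
            continue
        private = sum(1 for iso in isoforms if support[iso] == 1)
        fractions[name] = private / len(isoforms)
    return fractions


# Invented toy isoforms (hypothetical coordinates, for illustration only).
i1 = ("chr2", "+", ((100, 200), (300, 400)))
i2 = ("chr2", "+", ((100, 200),))
i3 = ("chr3", "-", ((10, 50),))
i4 = ("chr3", "-", ((10, 60),))
i5 = ("chrX", "+", ((500, 700),))
i6 = ("chrX", "+", ((500, 750),))

toy = {
    "approach_A": {i1, i2, i3, i4},
    "approach_B": {i1, i2, i5},
    "approach_C": {i1, i6},
}

fractions = private_fraction(toy)
for name in sorted(fractions):
    print(name, round(fractions[name], 2))
```

The same function applies to splice junctions if each key is a single (chromosome, strand, start, end) tuple instead of a full intron chain. Exact matching is a deliberately strict choice; a tolerance on transcript ends would lower the private fraction.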
A high number of false positives produced by reference-based assemblies has been noticed before for Cufflinks 
and Scripture 
. We tried to reduce these effects by using only transcripts that occurred in at least two samples. Even with this conservative criterion, we identified alternative splicing for more than 30% of the genes. On the other hand, this is still a considerably lower number than has been detected in D. melanogaster
(30% in 
, 60% in 
). The difference can be explained by the fact that we used only adult flies in our study; most likely, the number of alternatively spliced genes will increase dramatically when additional developmental stages are included and isoforms not expressed in adults can be analyzed.
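The recurrence criterion mentioned above (retaining only transcripts that occur in at least two samples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used here: the function name `recurrent_transcripts`, the `min_samples` parameter, and the matching of transcripts across samples by identical (chromosome, strand, intron chain) keys are hypothetical, and the sample data are invented.

```python
from collections import Counter


def recurrent_transcripts(per_sample, min_samples=2):
    """Keep transcripts that were reconstructed in at least `min_samples` samples.

    `per_sample` maps a sample label to an iterable of hashable transcript keys.
    Transcripts are assumed to match across samples only if their keys are identical.
    """
    support = Counter()
    for transcripts in per_sample.values():
        # Deduplicate within a sample so each sample counts once per transcript.
        support.update(set(transcripts))
    return {t for t, n in support.items() if n >= min_samples}


# Invented toy transcripts (hypothetical coordinates, for illustration only).
t1 = ("chr2", "+", ((100, 200), (300, 400)))
t2 = ("chr2", "+", ((100, 200),))
t3 = ("chrX", "-", ((50, 80),))
t4 = ("chrX", "-", ((50, 90),))

per_sample = {
    "sample_1": [t1, t2],
    "sample_2": [t1, t3],
    "sample_3": [t3, t4, t4],  # t4 is listed twice but still counts as one sample
}

kept = recurrent_transcripts(per_sample)
print(len(kept), "of 4 toy transcripts pass the filter")
```

Raising `min_samples` makes the filter more conservative at the cost of discarding isoforms that are genuinely expressed in only a few samples.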
Among the different alternative splicing modes, intron retention seems to be predominant in D. pseudoobscura. The high fraction of frameshifts and premature stop codons and the short length of retained introns suggest that most retention events may represent unprocessed mRNAs. On the other hand, the patterns of exon length, intron length and UTR length are similar to those found in D. melanogaster, suggesting phylogenetic inertia or similar selective pressures maintaining gene length in both species.
Our de novo assembly recovered 99 candidate transcripts potentially belonging to genes not yet included in the current D. pseudoobscura annotation. New-extra genes have reduced GC content compared to new genes located on the scaffolds, most likely related to their very low expression level 
. This suggests that new-extra genes may be located in heterochromatin, which is notoriously difficult to assemble. Moreover, the finding of new-extra genes highlights the incompleteness of the current D. pseudoobscura assembly and raises further questions about the true number of genes in poorly annotated genomes.
We show that the annotation of genomes can be improved by using RNA-Seq data in the presence of a reference genome. Nevertheless, our analyses also demonstrate that RNA-Seq-based annotation is still in its infancy and that more reliable approaches need to be developed. Using RNA-Seq data obtained from adult flies, we considerably improved the annotation of D. pseudoobscura by revealing almost 7000 new isoforms for 30% of the multiple-exon genes, extending more than 40% of the gene boundaries and discovering 768 novel genes. Multiple conditions and different developmental stages will be needed to dissect the alternative splicing landscape at a deeper resolution. This improved annotation will contribute to the understanding of the regulation of alternative splicing in Drosophila. Further studies of putative D. pseudoobscura-specific genes may also shed light on genes contributing to speciation and genes with a novel adaptive role in this species.