Cuffdiff 2 performs differential analysis of RNA-seq experiments at transcript-level resolution and controls for both variability across replicates and uncertainty in abundance estimates caused by ambiguously mapped reads. In the absence of isoform switching or specialization, gene expression estimates reported by Cuffdiff 2 are consistent with those produced by count-based schemes. However, fold change in counts is a poor proxy for change in expression when there is substantial differential regulation of isoforms. Thus, although competing methods may offer higher gene-level sensitivity than Cuffdiff 2, they also report a higher background of false positives. In experiments where few genes are truly differentially expressed, this background could occlude the true positives. Cuffdiff 2 controls for cross-replicate variability and read-mapping ambiguity by using a model for fragment counts based on the beta negative binomial distribution. Experiments with real and simulated data show that Cuffdiff 2 is highly accurate at gene- and transcript-level resolution, even when used with benchtop sequencers.
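The point about fold change in counts can be made concrete with a small worked calculation. The sketch below assumes a hypothetical gene with two isoforms of 1,000 and 4,000 bases, and assumes that the expected number of fragments from a transcript is proportional to its abundance multiplied by its length. The isoform names, lengths, abundances and function names are illustrative only and are not taken from the experiments described here.

```python
# Hypothetical two-isoform gene; every number here is illustrative.
ISOFORM_LENGTHS = {"short": 1000, "long": 4000}  # effective lengths in bases


def expected_fragments(abundances, lengths, rate=1.0):
    """Expected fragment count for a gene.

    Assumes each transcript contributes fragments in proportion to
    (abundance x length), a common simplification of RNA-seq sampling.
    """
    return sum(rate * abundances[name] * lengths[name] for name in abundances)


def gene_expression(abundances):
    """Gene-level expression as the sum of its isoform abundances."""
    return sum(abundances.values())


# Total abundance is 100 in both conditions; only the isoform mix changes.
condition_a = {"short": 90, "long": 10}
condition_b = {"short": 10, "long": 90}

count_fold_change = (
    expected_fragments(condition_b, ISOFORM_LENGTHS)
    / expected_fragments(condition_a, ISOFORM_LENGTHS)
)
expression_fold_change = gene_expression(condition_b) / gene_expression(condition_a)

print(f"fold change in expected counts: {count_fold_change:.2f}")
print(f"fold change in gene expression: {expression_fold_change:.2f}")
```

Under these assumptions the gene's total abundance is unchanged, yet the expected fragment count rises from 130,000 to 370,000 because the longer isoform generates more fragments per molecule. A method that compares raw gene-level counts would see a roughly 2.8-fold change where gene expression has not changed at all.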
Cuffdiff 2 performs integrated differential analysis of genes and transcripts within a single software workflow. Alternate means of performing differential analysis at transcript-level resolution that combine transcript-level fragment count estimates with existing count-based tools for assessing differential expression suffer from several limitations. Workflows combining methods fail to conform to several key requirements imposed by the component tools. For example, DESeq and edgeR expect that the input data are the number of perfectly and unambiguously mapped fragments that originate from each gene or transcript in each library. Failing to account for uncertainties in counts owing to ambiguous reads can result in false differential expression calls of transcripts with similar isoforms within the same gene, especially when sequencing depth is insufficient to accurately resolve the abundance of individual isoforms. Notably, our simulations show that this problem is more severe in genes with many isoforms, and cannot be eliminated by simply adding more replicates or sequencing depth to the experiment. Cuffdiff 2 surmounts this challenge by augmenting the cross-replicate variability modeling strategy used by count-based methods with incorporation of fragment assignment uncertainty computed for each gene. This enables it to dynamically control for uncertainty in highly complex or insufficiently sequenced genes. Recent large-scale transcriptome surveys have found that alternative splicing is extremely prevalent, with about three-quarters of human genes producing multiple abundant isoforms in a given cell type [41]. Moreover, thousands of human genes contain introns that have ‘NAGNAG’ splice sites, where N is any nucleotide and either AG can form an acceptor, generating isoforms that differ only by a single codon [42]. Thus, dealing with fragment assignment ambiguity is likely to be an increasingly important concern in differential analysis of RNA-seq data.
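To illustrate why adding a second source of uncertainty widens the count distribution, the following sketch draws from a textbook beta negative binomial: the success probability p is drawn from a Beta(alpha, beta) distribution, and the count is then drawn from a negative binomial with that p. This is an assumption-laden toy and not the parameterization or fitting procedure used by Cuffdiff 2, which this section does not specify. The function names, the parameter values and the restriction to an integer r are all choices made for the example.

```python
import math
import random
from statistics import mean, variance


def sample_geometric(rng, p):
    """Number of failures before the first success (success probability p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 0
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log1p(-p))


def sample_negative_binomial(rng, r, p):
    """Failures before the r-th success; r is a positive integer in this sketch."""
    return sum(sample_geometric(rng, p) for _ in range(r))


def sample_beta_negative_binomial(rng, r, alpha, beta):
    """Draw p ~ Beta(alpha, beta), then a negative binomial count given p."""
    p = rng.betavariate(alpha, beta)
    return sample_negative_binomial(rng, r, p)


# Illustrative parameters (assumed, not estimated from any data set).
R, ALPHA, BETA, N = 10, 20.0, 20.0, 20000
rng = random.Random(0)

# Negative binomial with p fixed at the mean of the beta distribution.
fixed_p = ALPHA / (ALPHA + BETA)
nb_counts = [sample_negative_binomial(rng, R, fixed_p) for _ in range(N)]

# Beta negative binomial: p itself varies from draw to draw.
bnb_counts = [sample_beta_negative_binomial(rng, R, ALPHA, BETA) for _ in range(N)]

nb_var = variance(nb_counts)
bnb_var = variance(bnb_counts)
print(f"negative binomial:      mean={mean(nb_counts):.2f} variance={nb_var:.2f}")
print(f"beta negative binomial: mean={mean(bnb_counts):.2f} variance={bnb_var:.2f}")
```

With p held fixed, the only spread comes from the negative binomial itself, which stands in here for cross-replicate variability. Letting p vary adds a second layer of spread, loosely analogous to uncertainty in how fragments are assigned among isoforms, so the mixed distribution has the larger variance. A test built on the wider distribution is correspondingly more conservative for genes where that uncertainty is large.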
Commercially available library multiplexing kits have made sequencing-based designs cost-competitive with cDNA microarrays for expression analysis, but sequencing cost depends on overall sequencing depth, read length and number of replicates. Our simulations show that sensitivity is largely a function of depth, and specificity is mostly dependent on replication. However, long, paired reads dramatically aid in transcript and gene discovery, and we caution against the use of single reads in studies aimed at transcriptome assembly.
Cuffdiff 2 has offered a transcript-resolution view of the role of HOXA1, a critical regulator of embryonic development and body patterning, in maintaining adult cells. We have shown in different cell types that HOXA1 knockdown perturbs the expression of thousands of genes, alters the isoform selection of key cell cycle regulators and causes disruption of the cell cycle leading to cell death. Further experiments will be required to determine the nature and mechanism of the disruption and to identify the direct targets of HOXA1.
With Cuffdiff 2, RNA-seq can now be used for robust differential expression analysis at both gene- and isoform-level resolution with a single analysis tool. This creates opportunities for integrated genomic analysis of unprecedented scope and scale and can uncover biological phenomena not observable with other high-throughput technologies. Sequencing is now used to map histone modifications and protein-DNA interactions (ChIP-Seq [43,44]), chromatin accessibility (DNase hypersensitivity [45]) and conformation (ChIA-PET [47]), and processing of RNA by protein (RIP-Seq [48]). Analyses with complementary sequencing assays are becoming increasingly common. For example, a recent study coupled transcript-level resolution RNA-seq with CLIP-Seq to track splicing changes that depend on muscleblind-like RNA binding proteins, which play key roles in development and in myotonic dystrophy [50]. The authors used high-throughput measurements of cellular state to connect sequence features of the genome to the transcriptional and post-transcriptional regulation of its genes. The large genomic footprints and numerous isoforms of many genes can greatly complicate such studies. Transcript-resolution measurements made with RNA-seq could drastically simplify the problem by eliminating unexpressed transcripts and isolating abundant ones. We are confident that the power and resolution offered by Cuffdiff 2 will allow biologists to better disentangle complex cellular circuitry and precisely relate genomic sequence to gene regulation.