RNA-seq is increasingly preferred over microarrays for gene expression profiling. Although the parameters that influence the various aspects of this method are not yet completely understood, RNA-seq is undoubtedly playing a very important role in deciphering the complexity of the transcriptome by opening new directions in the study of isoforms, allelic expression, untranslated regions, splice junctions, antisense regulation and intragenic expression [10]. Several studies have begun to investigate parameters such as sequencing depth, precision, GC bias, length bias, lane effects, and processing artifacts [16]. Microarrays, on the other hand, have been in use for more than two decades, so most of the biases inherent to this method have become apparent [78]. For instance, biases in the hybridization of samples labeled with Cyanine5 (Cy5) and Cyanine3 (Cy3) have been thoroughly explored, and several approaches are currently practiced to minimize such effects [79]. Further, systematic variability, such as the influence of image scanner settings on dye intensity measurements, is now robustly handled by applying various normalization techniques [83]. Despite these developments, some inherent gene-specific biases, such as differential hybridization efficiencies of the labeled target transcript to the same probe, remain unavoidable in microarrays. In both RNA-seq and microarray, all these known and unknown parameters influence the final outcome. Therefore, in this study, we focused on assessing RNA-seq and microarray based on the final outcome, i.e., the statistically significant differentially expressed genes.
The sequence coverage of 97% that we observed for our data set is consistent with, though slightly higher than, the 89.5% to 95% coverage reported in other bacterial RNA-seq studies [87]. In our study, RNA-seq identified more significantly differentially expressed genes (82%) than microarray (63%), as in previous studies [18]. The overall correlation (rs = 0.76) in the magnitudes of FC for the consensus genes between the two methods was similar to or higher than that reported in previous studies [18]. Furthermore, our comparison with qRT-PCR suggested that the expression levels were highly reliable for those genes that were determined to be differentially expressed by both RNA-seq and microarray. Hence, confirming the differential expression of genes by multiple methods reduces false positives and thereby enhances biological discovery.
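The rank correlation (rs) between the FC magnitudes of the two platforms can be illustrated with a short sketch. The code below is a minimal pure-Python Spearman correlation with average ranks for ties; the function names and the log2 FC values are hypothetical and are not taken from our data set, and a statistics library would normally be used instead.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical log2 fold changes for five consensus genes (illustration only).
rnaseq_log2fc = [2.1, 3.4, 1.2, 4.0, 2.8]
microarray_log2fc = [1.8, 2.9, 1.5, 3.1, 3.0]
rs = spearman(rnaseq_log2fc, microarray_log2fc)
print(f"rs = {rs:.2f}")
```

Because the coefficient depends only on ranks, it is insensitive to the different dynamic ranges of the two platforms, which is one reason a rank-based measure suits cross-platform FC comparisons.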
Microarray overall outperformed RNA-seq by detecting more known HrpX target genes from the T3SS in the hrp cluster that satisfied both the FC and FDR cut-off thresholds. In principle, however, RNA-seq also detected the genes hrpB5, hrcS, hpaP, XAC0395, hrpB7, and hrcT in terms of FC, although they failed to pass the FDR threshold. This parameter is influenced more directly by the error model of the statistical method used to infer differential expression than by RNA-seq itself. For the same read counts, one can get slightly different FDR values depending on the statistical method [90]. However, applying every available statistical method to each dataset is not feasible. From the T3SS in the hrp cluster, three genes, namely hrcC, and hpaA, were detected by neither RNA-seq nor microarray, mainly because they failed to pass the FDR threshold. Interestingly, our previous microarray analysis confirmed that all three of these genes are regulated by HrpX, satisfying both the FC and FDR cut-off thresholds, but only at a later stage of the growth phase [33]. This supports the view that some genes are regulated only at later stages of the growth phase. Further, among the Type III effector genes, 8 genes (36.4%) were detected by neither RNA-seq nor microarray within the cut-off thresholds considered. However, among them xopL were found to be regulated by HrpX only at the later stage of the growth phase (OD600 time point 0.5), according to our previous microarray analysis [33]. Further, four genes namely, pthA2 were regulated by another transcription regulator, HrpG, at an early stage of the growth phase (OD600 = 0.25 and 0.4), as observed in our previous study, while another undetected gene, xopE, was also found to be regulated by HrpG, but only at the OD600 = 0.25 time point of the growth phase [33]. This study thereby further validated our previous results. Taken together, the two methods detected 100% of the genes known to be regulated by HrpX (at time point OD600 = 0.4) without any false positives. Among them, 72% were detected by both methods, while, interestingly, 28% of the known target genes were detected by only one of the methods. Hence, the two methods complement each other.
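How a gene can satisfy the FC cut-off yet fail the FDR cut-off can be sketched as follows. This is an illustration only: it assumes the Benjamini-Hochberg procedure for FDR adjustment and hypothetical cut-offs (|log2 FC| >= 1, FDR < 0.05), neither of which is stated in this section, and the gene labels, fold changes and p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_differential(genes, log2_fc, pvalues, fc_cutoff=1.0, fdr_cutoff=0.05):
    """Keep genes passing both the (assumed) FC and FDR cut-offs."""
    fdr = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log2_fc, fdr)
        if abs(fc) >= fc_cutoff and q < fdr_cutoff
    ]


# Hypothetical input (illustration only).
genes = ["geneA", "geneB", "geneC", "geneD"]
log2_fc = [2.5, 1.8, 0.3, -2.0]
pvalues = [0.001, 0.04, 0.0005, 0.2]

fdr = benjamini_hochberg(pvalues)
calls = call_differential(genes, log2_fc, pvalues)
print(calls)
```

In this toy input, geneB has a large fold change and a nominal p-value of 0.04, but its adjusted value rises above 0.05, so it is excluded. A different error model would yield different p-values for the same counts and could move such a borderline gene across the threshold.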
In addition, 55 genes (~51%) were newly identified as differentially expressed by microarray and RNA-seq combined, adding to the existing repertoire of HrpX-regulated genes. Furthermore, 46 (83.6%) of these genes were uniquely identified by only one of the methods. Overall, 21 newly identified genes were found to have a PIP box in their promoter regions, of which 14 (58.3%) were uniquely identified by either RNA-seq or microarray. The presence of the PIP box in the promoter regions of the HrpX-regulated genes uniquely identified by RNA-seq or microarray not only supports direct regulation of these genes by HrpX, but also indicates that these candidates are not false positives. Consequently, 100% of the known HrpX-regulated genes could be detected only by using both methods together, since each method missed some of the known genes; the two methods together therefore enhance the understanding of the HrpX regulome by providing a more comprehensive picture.
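The complementarity argument amounts to simple set arithmetic on the gene lists of the two platforms. The sketch below shows one way such percentages could be computed; the function name and the gene identifiers are hypothetical placeholders, not the gene lists of this study.

```python
def overlap_summary(known_targets, rnaseq_hits, microarray_hits):
    """Summarize how two hit lists cover a set of known target genes."""
    known = set(known_targets)
    rnaseq = set(rnaseq_hits) & known
    microarray = set(microarray_hits) & known

    detected = rnaseq | microarray   # found by at least one method
    by_both = rnaseq & microarray    # found by both methods
    by_one = detected - by_both      # found by exactly one method

    if not known or not detected:
        raise ValueError("need at least one known and one detected gene")

    return {
        "detected_pct_of_known": 100 * len(detected) / len(known),
        "both_pct_of_detected": 100 * len(by_both) / len(detected),
        "one_only_pct_of_detected": 100 * len(by_one) / len(detected),
    }


# Hypothetical identifiers (illustration only).
known = ["g1", "g2", "g3", "g4"]
rnaseq_hits = ["g1", "g2", "g3", "x1"]   # x1 is not a known target
microarray_hits = ["g1", "g2", "g4"]

summary = overlap_summary(known, rnaseq_hits, microarray_hits)
print(summary)
```

In this toy case each method alone recovers three of the four known targets, while their union recovers all four, which mirrors the reasoning that the two methods together give the more complete picture.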