Differential gene expression studies typically use the fold change statistic (the ratio of mRNA quantities between two samples) as input, and have been used to discover previously uncharacterized genes involved in adaptive stress responses (i.e., “novel genes”) [1]. To correct for changes in gene expression induced by influences unrelated to the treatment, fold-change values for time-series data are usually calculated from treatment and control data at every timepoint [1]. One of the major factors causing gene expression to oscillate under control conditions is the molecular circadian clock, which adjusts physiology and metabolism in preparation for predictable changes in light and temperature [2]. However, a wide range of biotic and abiotic stress treatments have been shown to disrupt rhythmic clock patterns through amplitude changes or phase shifts [3], resulting in significant fold changes for genes which are clock-influenced but not directly involved in the stress response. Figure 1 shows that genes can be differentially regulated by direct stress responses (I), indirectly through stress-induced disruption of clock pathways (II), or by a combination of both (III). Additional complications in regulation patterns arise from the complexity of transcription factor pathways, in which targets may be regulated by clock components directly or through interactions with their transcription factors (Figure 1). Consequently, the discovery of novel treatment-response genes is complicated by the disruption of circadian rhythm pathways, but this complexity is not reflected in existing methods, including fold change studies, clustering analysis approaches, and more complex time-series-based algorithms [1].
Figure 1. Biotic and abiotic stresses both directly and indirectly influence target gene expression patterns. Genes found to be differentially expressed may be influenced by (I) only direct treatment influences, (II) only indirect circadian-clock disruption influences, ...
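The effect of clock disruption on per-timepoint fold change can be illustrated with a minimal sketch. The function name `log2_fold_change`, the pseudocount, the use of a log2 ratio, and the synthetic gene below are all assumptions made for illustration; they are not taken from any cited study. The hypothetical gene oscillates once per day under control conditions, and the treatment only delays its phase by 6 hours without changing its mean level.

```python
import numpy as np

def log2_fold_change(treatment, control, pseudocount=1e-9):
    """Per-timepoint log2 ratio of treatment to control expression.

    The pseudocount is an illustrative guard against division by zero.
    """
    treatment = np.asarray(treatment, dtype=float)
    control = np.asarray(control, dtype=float)
    return np.log2((treatment + pseudocount) / (control + pseudocount))

# Hypothetical gene sampled every 4 hours over two days (assumed values).
hours = np.arange(0, 48, 4)
control = 10 + 4 * np.sin(2 * np.pi * hours / 24)
# The stress only delays the rhythm by 6 hours; the mean level is unchanged.
treatment = 10 + 4 * np.sin(2 * np.pi * (hours - 6) / 24)

fc = log2_fold_change(treatment, control)
print(np.round(fc, 2))
```

Although the average expression of this synthetic gene is identical in both conditions, several timepoints show large fold changes, which is the situation described as case (II) above.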
In this paper, we present the PRIISM (Pattern Recomposition for the Isolation of Independent Signals in Microarray data) algorithm, which performs novel stress-response gene discovery analyses that correct for differential gene expression patterns induced by the circadian clock. As described previously [6], although core circadian clock gene patterns undergo significant changes in phase and amplitude as a result of stress, their oscillation frequencies remain similar to each other and close to the circadian frequency of one cycle per day. It has also been shown that stress significantly increases average expression levels for stress-response genes [6]; these increases are reflected in the low-frequency signals (where one oscillation cycle occurs over the course of several days) for these genes. We assume that although circadian clock influences and adaptive stress-response influences can interact with each other (Figure 1), they still cycle at very different rates (and therefore maintain separate dominant frequency ranges) under stress conditions. Based on these observations, we developed PRIISM to project gene expression data into the frequency domain using the Fourier transform, isolate independent signals, and then project them back to the expression domain to reconstruct independent gene expression patterns representing the effects of different genetic influences. PRIISM separates one gene expression pattern into three distinct patterns: (1) the treatment-frequency gene expression pattern, from which much of the complicating circadian influence has been removed, and which can consequently be used to more accurately identify differentially regulated genes involved in direct treatment response; (2) the clock-frequency gene expression pattern, representing rhythmic patterns with a period of approximately one day; and (3) the noise-frequency gene expression pattern (Figure 2). By applying PRIISM to a cold-treatment dataset, we demonstrate that it can identify known treatment-response genes with a much lower false-positive rate than existing methods, and can also identify important regulatory timepoints which are not obvious in the unprocessed data. In addition to improving performance in novel treatment-response gene discovery, PRIISM also provides gene expression data which represent only circadian clock influences, and which may be useful for circadian clock analysis studies.
Figure 2. PRIISM separates gene expression data into three independent gene expression datasets. PRIISM separates (A) the original gene expression patterns under control and treatment conditions (used to calculate the fold change pattern) into (B) treatment-frequency, ...
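The general idea of transforming an expression profile to the frequency domain, isolating frequency bands, and transforming each band back can be sketched as follows. This is not the PRIISM implementation: the function name `split_by_frequency`, the band boundaries `low_cut` and `high_cut`, the treatment of everything above the clock band as noise, and the synthetic profile are all assumptions chosen for illustration.

```python
import numpy as np

def split_by_frequency(signal, dt_hours, low_cut=1 / 36, high_cut=1 / 18):
    """Split an evenly sampled series into three band-limited components.

    low_cut and high_cut are in cycles per hour. They are illustrative
    assumptions, not values taken from the PRIISM paper.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=dt_hours)

    low = freqs < low_cut                             # slow trend, incl. mean
    clock = (freqs >= low_cut) & (freqs <= high_cut)  # roughly one cycle/day
    noise = freqs > high_cut                          # assumed residual band

    def rebuild(mask):
        # Zero every coefficient outside the band, then invert the transform.
        return np.fft.irfft(np.where(mask, spectrum, 0), n=n)

    return rebuild(low), rebuild(clock), rebuild(noise)

# Synthetic profile: slow trend + 24-hour rhythm + faster 12-hour component.
hours = np.arange(0, 96, 4)
signal = (5 + 2 * np.sin(2 * np.pi * hours / 96)
          + 3 * np.sin(2 * np.pi * hours / 24)
          + 0.5 * np.sin(2 * np.pi * hours / 12))

treatment_part, clock_part, noise_part = split_by_frequency(signal, dt_hours=4)
print(np.round(clock_part[:6], 2))
```

Because the three masks partition the spectrum, the three reconstructed components sum back to the original series. In real data the components do not fall exactly on frequency bins, so some leakage between bands should be expected.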
Biological approaches, such as the use of constant light or of knockout mutants of clock component genes, have been applied in an attempt to remove the influence of the circadian clock on target gene expression. However, constant light is an unnatural condition which reduces the applicability of the results, because natural biotic and abiotic stress-response patterns depend on the time of day (the point in the light/dark cycle) at which the treatment is applied [6]. Likewise, the use of knockout mutants of circadian clock genes can reduce disruptions due to circadian input; however, since stress-response genes may be regulated by clock components, the results of such a study are also difficult to interpret [7].
Most existing computational approaches for studying differential gene expression in microarray datasets involve clustering algorithms designed to group genes with similar expression profiles, with the goal of identifying potential annotations for unknown genes [10]. However, the gene distance measures used by all of these clustering methods are unable to distinguish adaptive-response gene expression patterns from circadian clock disruption gene expression patterns, and so may cluster genes with similar clock influences but very different treatment-response influences. The continuous representation model of Bar-Joseph et al. (2003) for finding differentially expressed genes in time-series microarray datasets (which has been used to find more cell-cycle response genes in yeast than conventional clustering methods) is also unable to separate clock influences from treatment-response influences on gene expression patterns [21].
Several studies have shown that between 6% and 31% of the Arabidopsis genome is influenced by circadian clock genetic components [5], while another study suggests that there are significant baseline circadian oscillations for nearly 100% of the genome [24]. A number of approaches have been developed for analyzing the circadian rhythms of genes in time-series datasets [5]. Fourier analysis (which can be used to identify dominant frequencies in time-series data) has been applied successfully to identify periodic genes by treating time-series microarray datasets as time-domain signals [28]. However, these Fourier analysis methods have not been widely used in differential gene expression studies, because (1) existing Fourier analysis applications [28] used a fixed frequency range as a priori knowledge to discover genes with similar oscillations, whereas novel genes may have entirely different frequency patterns under different treatment conditions; and (2) according to the Nyquist sampling theorem, high-resolution time-course gene expression data are essential to accurately capture oscillating rhythms, but such data have not been available until recently.
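As a minimal sketch of how Fourier analysis identifies the dominant frequency of a periodic gene, the example below computes a power spectrum and reports the period of its largest non-constant component. The function name `dominant_period_hours` and the synthetic expression profile are hypothetical and are not drawn from the cited methods.

```python
import numpy as np

def dominant_period_hours(expression, dt_hours):
    """Return the period (in hours) of the strongest oscillating component."""
    expression = np.asarray(expression, dtype=float)
    centered = expression - expression.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(expression.size, d=dt_hours)
    # Skip bin 0, which holds the constant (zero-frequency) term.
    peak = np.argmax(power[1:]) + 1
    return 1.0 / freqs[peak]

# Hypothetical circadian gene sampled every 4 hours for three days.
hours = np.arange(0, 72, 4)
profile = 8 + 3 * np.cos(2 * np.pi * hours / 24)

period = dominant_period_hours(profile, dt_hours=4)
print(period)
```

For this synthetic profile the reported period is approximately 24 hours. With a sampling interval of 4 hours, the shortest period that can be represented is 8 hours, which illustrates why sparse time courses cannot capture faster rhythms.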
As the cost of microarray and RNA-seq experiments continues to fall, high-resolution time-series gene expression datasets that contain enough information to identify and characterize circadian-frequency rhythms for every gene are becoming available [34]. Recently, Espinoza et al. (2010) produced one such microarray dataset, which measured 16 timepoints covering a 58-hour time period with a cold treatment in Arabidopsis. Cold-stress genetic responses in Arabidopsis are particularly well characterized, and cold has been shown to significantly dampen and phase-shift the oscillations of core clock genes, including CCA1, which have regulatory influences over some cold-responsive transcription factors, including CBF1 and CBF2. Disruption of the expression patterns of other circadian output marker genes due to cold treatment has also been reported, including constant overexpression of CAB2 and constant underexpression of CAT3. For these reasons, this is an ideal dataset with which to test whether the PRIISM algorithm can separate the strong circadian-clock influences on cold-response genes from treatment-response influences.
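A short calculation suggests why a dataset of this size is sufficient to resolve circadian rhythms. The sketch assumes that the 16 timepoints are evenly spaced across the 58-hour window; this spacing is an assumption made for illustration, and the actual sampling schedule may differ.

```python
# Values stated in the text above.
n_timepoints = 16
duration_hours = 58.0

# Assumption: even spacing between the first and last timepoint.
dt_hours = duration_hours / (n_timepoints - 1)

# Nyquist limit: the highest frequency that can be represented is
# half the sampling frequency.
nyquist_freq = 1.0 / (2.0 * dt_hours)    # cycles per hour
shortest_period = 1.0 / nyquist_freq     # hours
circadian_freq = 1.0 / 24.0              # cycles per hour

print(round(dt_hours, 2), round(shortest_period, 2))
print(circadian_freq < nyquist_freq)
```

Under this assumption the sampling interval is just under 4 hours, so a 24-hour rhythm lies well below the Nyquist limit, and the 58-hour window spans more than two circadian cycles.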