Glucocorticoids (GC) are a class of steroid hormones present in almost every animal cell, where they play a central role in a wide range of physiological responses [1]. Because of their potent anti-inflammatory and immunosuppressive effects, synthetic glucocorticoids, referred to as corticosteroids (CS) (e.g. methylprednisolone, MPL), have been widely used as a therapeutic option for a wide range of autoimmune and inflammatory diseases [2]. However, their beneficial effects derive from magnifying the physiological actions of endogenous glucocorticoids, so long-term treatment with this class of drugs causes a variety of side effects, e.g. hyperglycemia, dyslipidemia, arteriosclerosis, muscle wasting, and osteoporosis [4]. The physiological and pharmacological effects of corticosteroids are complex and manifest as expression changes in many genes across multiple tissues [8]. It has been observed that, even in a single tissue, different dosing regimens of CS administration can induce different patterns of expression [11]. As such, genes with similar expression profiles under acute CS administration may not exhibit similar expression patterns during continuous infusion, pointing to the possibility of alternative regulatory mechanisms. Therefore, a better understanding of corticosteroid pharmacogenomic effects under multiple dosing regimens is valuable not only for revealing the transcriptional dynamics under different patterns of input perturbation but also for providing insight into the underlying molecular mechanisms of action, for both beneficial and detrimental effects, and thus for the optimization of clinical therapies.
It has been noted that genes affected by CS include both immunosuppressive genes, mostly associated with therapeutic effects, and metabolic genes, often associated with adverse effects, whose regulation is mainly controlled by glucocorticoid receptor gene-mediated pathways [6]. Unbound CS binds to free cytosolic glucocorticoid receptors (GR), releasing the receptor from the heat shock complex and allowing dimerization and translocation into the nucleus, where the complex binds to the glucocorticoid response elements (GRE) of target genes, leading to enhancement or inhibition of target gene expression. As a result, long-term treatment with corticosteroids results in sustained up- or down-regulation of numerous genes, leading to a new steady state which might be the basis for the occurrence of adverse effects. However, it has also been noted that chronic infusion of CS causes a sustained down-regulation of the receptor (mRNA and thus protein) [14]. While several alternative mechanisms have been proposed [16], it is still not understood why drug effects remain strong although GR mRNA is down-regulated to the point of almost being eliminated. A plausible explanation is that, besides direct regulation through GRE binding sites in the 5' regulatory regions of genes, some changes in expression are the indirect result of altered expression of transcription factors (TFs) that act as secondary biosignals, directly or indirectly modulating the transcription of genes [15]. Thus, along with the identification of expression patterns, predicted regulatory control structures are also an essential source of information for understanding corticosteroid effects.
In this study we address the question of whether (1) significant transcriptional modules coupled with complex patterns of mRNA changes across multiple dosing regimens of corticosteroids and (2) their common regulatory controls can be computationally identified. Hypothetically, transcriptional modules that are significantly coexpressed under different dosing regimens will be important gene clusters for further analysis towards a better understanding of both the beneficial and adverse effects of corticosteroids, especially the metabolic side effects, since these patterns persist under long-term corticosteroid treatment. The hypothesis explored here is that if two or more genes have the same temporal expression profiles in response to different dosing regimens, they are more likely to share common regulatory mechanisms. The liver was selected because of its major role in both the efficacious and adverse physiological effects of corticosteroids, e.g. altering the expression of serum proteins that regulate immune/inflammatory responses [21], enhancing the expression of liver enzymes involved in metabolic effects (gluconeogenesis and lipid metabolism) [22].
However, rich in vivo datasets of pharmacological time series across multiple dosing regimens are often obtained from different microarray platforms and time-sets [11], which complicates computational analysis [24]. As an example, in a study comparing normal and chronic lymphocytic leukemia B-cells, Wang et al. [27] identified only 9 differentially expressed genes across all three studies when combining results from three different platforms, whereas at least 1,172 genes were differentially expressed in each individual platform. In general, there are two important issues relevant to the analysis of data derived from different platforms: (i) genes may be present on one platform but not on the other, and (ii) genes present on both platforms may not be represented by the same probes. Since different microarray platforms do not contain the same probesets, and do not even share similar hardware designs and sample processing protocols, standard analyses may not yield comparable expression level quantifications across platforms, leading to many challenges for computational models aiming at the analysis of microarray data from heterogeneous sources [25].
A number of approaches have been proposed, generally falling into two main categories: (1) integrating raw expression profiles from different studies into one dataset so that available computational models can be applied directly, and (2) developing and/or utilizing a unitless statistic as a primary analysis and then combining the results through a meta-level analysis. The former category can be further divided into two sub-classes, namely combining raw data through a normalization and/or transformation procedure [30] and pooling raw information from common probes that can be mapped to the same Unigene clusters or full-length mRNA transcripts [34]. However, these approaches are not general enough to make data from different platforms fully compatible [25]. Since combining data across different platforms remains a serious challenge, meta-analysis, the second category, has become a more popular technique for combining results, and thus data, from a number of independent studies [39]. The assumption here is that while the raw expression levels from different platforms may not be comparable, the results of the primary analysis should be. However, almost all prior studies have focused on the discovery of differentially expressed genes in conjunction with standard models such as effect size models [41], Bayesian models [44].
Consequently, in order to identify significant clusters of genes that share common expression patterns across multiple dosing regimens, we extend our prior study [46] by (i) producing an agreement matrix (AM) that describes the level of agreement on the coexpression of genes across multiple conditions and (ii) successively searching clusterable subsets to infer all such gene clusters. The approach follows the concept of meta-analysis to avoid the limitation of incompatible data across multiple datasets from different platforms (as well as different tissues, time-grids, and lab protocols when applicable). The unitless statistic expressing the confidence level of coexpression is the agreement level of cluster assignments drawn from multiple clustering runs. A number of critical issues associated with a single clustering run remain open (e.g. the input number of clusters [47], the biases and assumptions of distance metrics and/or clustering methods [49], and cluster significance [50]), and thus consensus clustering coupled with examination of the AM distribution has been designed to reduce these limitations [46]. Once the AM is obtained for each condition independently (e.g. each dosing regimen in this case), an average agreement matrix is calculated to estimate the confidence levels of coexpression between genes across multiple conditions, thus combining data from different datasets into a single input for the next analysis. For the analysis at the meta-level, we extend the selection and clustering processes (also proposed in [46]) to identify all possible clusters of highly coexpressed genes, using the average AM as the input. As such, these clusters of genes share common patterns of expression across multiple dosing regimens. Additionally, because all possible patterns of expression are selected, several clusters may have similar expression patterns, and thus we also provide a heuristic, as an optional procedure, to merge such similar clusters based on a criterion of maximizing the total homogeneity and separation of the selected clusters. Subsequently, we analyze the promoter regions of genes in every cluster in order to predict putative transcriptional regulators, aiming to provide another source of information towards a better understanding of these complex patterns of expression and the underlying regulatory mechanisms of corticosteroid effects.
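To make the agreement-matrix idea concrete, the following is a minimal illustrative sketch, not the implementation used in this study or in [46]. It assumes that each clustering run is available as a list of cluster labels, one per gene, and that the same genes appear in the same order in every condition. The function names `agreement_matrix` and `average_agreement` and the toy label lists are hypothetical. The agreement level of a gene pair is taken here as the fraction of runs in which the two genes fall in the same cluster, and the per-condition matrices are then averaged element-wise.

```python
def agreement_matrix(label_runs):
    """Fraction of clustering runs in which each gene pair shares a cluster.

    label_runs: list of runs; each run is a list of cluster labels, one per gene.
    Labels only need to be consistent within a run, not across runs.
    """
    n_genes = len(label_runs[0])
    n_runs = len(label_runs)
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in label_runs:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


def average_agreement(matrices):
    """Element-wise mean of per-condition agreement matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


# Hypothetical toy input: 4 genes, 3 clustering runs per dosing regimen.
acute_runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
chronic_runs = [[1, 1, 0, 0], [0, 0, 1, 2], [0, 0, 1, 1]]

am_acute = agreement_matrix(acute_runs)
am_chronic = agreement_matrix(chronic_runs)
avg_am = average_agreement([am_acute, am_chronic])

# Genes 0 and 1 co-cluster in every run of both regimens.
print(avg_am[0][1])
```

Because only co-membership is counted, the statistic is unitless and does not depend on the expression scale, probe design, or time grid of the platform that produced each clustering, which is what allows the matrices from different datasets to be averaged.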
Our results demonstrate that the proposed computational approach is highly effective on both synthetic and real data. When the approach is applied to real time-series datasets (acute/chronic corticosteroid administration [11]), the selected patterns of transcriptional responses are biologically enriched, with relevant putative regulatory controls and significant metabolic pathways in each pattern. The computational results are further validated against evidence from the literature.