In this study, we presented two new algorithms for analyzing short time series gene expression data: ASTRO
that uses the order preserving matrix
concept and MiMeSR
that uses the minimum mean squared residue
concept. Both these algorithms use linear algebra techniques to identify coherent gene clusters over all time points in linear time. This offers a significant advantage over existing methods that employ greedy approaches or heuristics, since ASTRO
avoid problems related to cost functions or the choice of predefined sets of expression profiles [19
]. Also, the complexity of our methods is smaller than that of existing algorithms, even when it is compared to those that use a greedy approach to speed up their running time [9
identifies all gene clusters with coherent expression patterns, irrespectively of the magnitude of expression change. Genes belonging in such clusters are generally expected to participate in the same biological processes (see, also, Table ). MiMeSR
identifies all gene clusters with coherent expression change both in direction as well as in magnitude. Genes belonging in such clusters are expected to be regulated by the same set of transcription factors (although different transcription factors may act at different time points). In general, MiMeSR identifies clusters that are enriched for genes belonging to the same pathways, while ASTRO identifies clusters with genes that are regulated by the same transcription factors
(Tables , , and Figure ). In terms of overall statistical significance of the identified clusters, MiMeSR
in most cases (Figure ). In a direct comparison of the two algorithms in simulated data (Figure ) we found that ASTRO
when fewer data points are available, whereas MiMeSR
performs better when more data points are available.
Testing ASTRO &MiMeSR in well-characterized short time series gene expression datasets showed that it is robust to noise and to random patterns, and that it can correctly predict the temporal expression profile of relevant functional categories as confirmed by statistical analysis of GO category membership over-representation and analysis of transcription factor occupancy in the promoters of the gene members of the various clusters. Our approaches are shown to outperform existing clustering algorithms, including the popular k-means, as well as FCV and STEM and they were able to distinguish between closely related but biologically distinct patterns. As expected, ASTRO finds more homogeneous clusters than MiMeSR, with respect to the percentage of genes associated with a given transcription factor. This is because it takes into consideration the magnitude of the change in gene expression, which is more closely related to the transcription factors involved.
In principle, ASTRO and MiMeSR can also be applied to long time series gene expression data (more than 10 time points) or gene expression data sampled over different conditions, but in this case the number of genes in each cluster is expected to be low. However, they can be adapted to identify local patterns, thus overcoming this problem.