A transcriptional regulatory module (TRM) is a set of genes that is regulated by a common set of TFs. By organizing the genome into TRMs, a living cell can coordinate the activities of many genes and carry out complex functions. Identifying TRMs is therefore useful for understanding cellular responses to internal and external signals. Advances in high-throughput genomic tools such as DNA microarrays [1] and chromatin immunoprecipitation-DNA chip (ChIP-chip) [3] have made the computational reconstruction of the TRMs of a eukaryotic cell possible.
Genome-wide gene expression analysis has been used to investigate TRMs controlling a variety of cellular processes in yeast [5]. Clustering and motif-discovery algorithms have been applied to gene expression data to find sets of co-regulated genes and have identified plausible binding motifs of their TFs [7]. Such approaches have also been extended to incorporate prior knowledge about the genes, such as cellular functions [12] or promoter sequence motifs [13]. Moreover, some researchers have used model-based approaches such as random Boolean networks [14] and Bayesian networks [15] to infer regulatory network architectures. However, these expression-based approaches provide only indirect evidence of genetic regulatory interactions and do not identify the relevant TFs. The ChIP-chip technique, on the other hand, was developed to identify physical interactions between TFs and DNA regions. Using ChIP-chip data, Simon et al. investigated how the yeast cell-cycle gene-expression program is regulated by each of the nine major transcriptional activators. Lee et al. constructed a network of TF-gene interactions and Harbison et al. constructed an initial map of yeast's transcriptional regulatory code. However, ChIP-chip data alone cannot tell whether a TF is an activator or a repressor. More importantly, ChIP-chip data are noisy and, depending on the chosen p-value cutoff, include many false positive or false negative TF-DNA binding relationships.
Since gene expression and ChIP-chip data provide complementary information, some researchers [20] have integrated both types of data in their studies. However, most previous studies, with the exception of the GRAM algorithm [21], assumed that a gene is regulated by a TF only if the p-value of TF-gene binding in the ChIP-chip data is ≤ 0.001, and thus suffered a false negative rate of ~24% in determining TF-gene binding [19].
To reduce this high false negative rate, we develop a method, called the Temporal Relationship Identification Algorithm (TRIA), that uses the information provided by gene expression data to alleviate the effect of using a stringent threshold in determining TF-gene binding. A TF-gene pair is said to have a positive (negative) temporal relationship if the gene's expression profile is positively (negatively) correlated with the TF's regulatory profile, possibly with time lags (see Methods). TRIA identifies TF-gene pairs with a temporal relationship. We define a TF as binding a specific gene if (1) the p-value for the TF to bind the gene is ≤ 0.001 in the ChIP-chip data or (2) 0.001 < p ≤ 0.01 and the TF-gene pair has a temporal relationship. That is, we allow the p-value cutoff to be relaxed to 0.01 if the TF-gene pair has a temporal relationship. Our approach differs from the GRAM algorithm [21], which relied on sets of co-expressed genes to relax the stringent p-value cutoff.
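
The two-tier binding rule above can be illustrated with a minimal sketch. It is not the TRIA implementation: the precise definition of the TF's regulatory profile and of the temporal-relationship test is given in Methods, so this sketch assumes a plain Pearson correlation maximized over a few time lags. The function names, the correlation cutoff `r_cutoff` and the maximum lag `max_lag` are hypothetical choices made only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_lagged_correlation(tf_profile, gene_profile, max_lag=2):
    """Return (lag, r) with the largest |r|, letting the gene lag the TF by 0..max_lag points.

    Assumption: both profiles have the same length and sampling times.
    """
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = tf_profile[:len(tf_profile) - lag]  # TF profile, trimmed at the end
        y = gene_profile[lag:]                  # gene profile, shifted by `lag`
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


def tf_binds_gene(p_value, tf_profile, gene_profile, r_cutoff=0.8, max_lag=2):
    """Two-tier rule from the text; r_cutoff and max_lag are hypothetical."""
    if p_value <= 0.001:      # condition (1): stringent ChIP-chip cutoff
        return True
    if p_value <= 0.01:       # condition (2): relaxed cutoff, 0.001 < p <= 0.01 ...
        _, r = best_lagged_correlation(tf_profile, gene_profile, max_lag)
        return abs(r) >= r_cutoff  # ... plus a positive or negative temporal relationship
    return False


# Toy profiles: the gene follows the TF with a delay of one time point.
tf = [0, 1, 0, -1, 0, 1, 0, -1]
gene = [-1, 0, 1, 0, -1, 0, 1, 0]
print(tf_binds_gene(0.005, tf, gene))
```

In this toy case the pair would be rejected under the stringent cutoff alone (p = 0.005 > 0.001) but is accepted under the relaxed cutoff, because the lagged correlation is strong. A pair with p > 0.01 is rejected regardless of its expression correlation.
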
From the above procedure, we derive a binding score matrix. We then develop the MOdule Finding Algorithm (MOFA), which combines this binding score matrix with the gene expression matrix to reconstruct TRMs of the yeast cell cycle (see Methods). For each of the five cell cycle phases (M/G1, G1, S, S/G2 and G2/M), MOFA exhaustively searches all possible TF combinations and finds their target genes. Once the set of target genes to which a common set of TFs bind is inferred, MOFA identifies a subset of these target genes whose expression profiles are positively correlated, possibly with time lags. That is, the genes of a module not only share a common set of TFs but also have positively correlated (possibly time-shifted) expression profiles. Our gene module is more general than that of the GRAM algorithm [21], which only searched co-expressed genes to form a module. MOFA reconstructs 87 TRMs. We then validate the biological relevance of each inferred TRM using existing experimental data, enrichment for genes in the same MIPS functional category [23], known DNA-binding motifs [7