In recent years, array-based analysis of chromatin immunoprecipitation (ChIP-chip) has become a powerful technique for identifying the DNA target regions of individual transcription factors. ChIP-chip was first applied to yeast by Ren et al.
) and Iyer et al.
) based on promoter arrays. Nowadays, with the availability of sequenced genomes, ChIP-chip is predominantly based on tiling arrays (Johnson et al.
). The analysis of ChIP-chip data is challenging because the datasets are large, containing thousands of hybridization signals. Most available methods focus on tiling array ChIP-chip data and predict the DNA targets of DNA-binding proteins such as transcription factors or histones. Examples include a moving average method by Keles et al.
), a Hidden Markov Model (HMM) approach by Li et al.
), TileMap by Ji and Wong (2005
), which uses moving averages or an HMM to incorporate information from adjacent probes, and PMT by Chung et al.
), which integrates a physical model to correct for probe-specific behavior. Recently, a new HMM approach was developed by Humburg et al.
), outperforming TileMap in the prediction of histone modifications from tiling array ChIP-chip data. ChIPmix (Martin-Magniette et al.
), which is based on a linear regression mixture model, can also be applied to this kind of analysis.
Here, we study three methods for predicting transcription factor target genes from promoter array ChIP-chip data. We consider (i) a standard log-fold change (LFC) analysis that ignores dependencies between adjacent genes on DNA; (ii) a two-state HMM that models dependencies between adjacent genes on DNA; and (iii) an extension of the two-state HMM to an HMM with scaled transition matrices (SHMM) that specifically models directly adjacent genes in head–head orientation. The three methods are applied to two datasets, one from the yeast Saccharomyces cerevisiae and one from the model plant Arabidopsis thaliana, to directly compare their predicted target genes. For the HMM approach, the two-state architecture follows the proposal of Li et al.
). Our approach extends this proposal in that all HMM parameters are learned directly from the ChIP-chip data using a Bayesian version of the Baum–Welch algorithm described in Seifert et al.
). The concept of SHMMs rests on the key assumption that promoters of directly adjacent genes in head–head orientation on DNA tend to have more similar ChIP-chip measurements than directly adjacent genes in tail–tail, tail–head or head–tail orientation. Such orientation-specific correlations of ChIP-chip measurements are clearly shown in Table 1 for the three transcription factors ACE2, SWI5 and FKH2 studied in Saccharomyces cerevisiae, and in Fig. 1 for the seed-specific transcription factor ABI3 analyzed in Arabidopsis thaliana. The high correlations of ChIP-chip measurements of promoters belonging to adjacent genes in head–head orientation are expected from the design of the promoter array, which contains spotted promoter fragments of each gene. Thus, depending on the distance between two genes in head–head orientation and the length of the hybridized DNA segments, such correlations are to be expected. The SHMM approach exploits this observation by assigning gene pairs in head–head orientation a higher probability than all other gene pair orientations that both genes are targets of the transcription factor or that both are non-targets. In general, good introductions to HMMs are given by Rabiner (1989
) or Durbin et al.
). Extensions of standard HMMs with one transition matrix to HMMs with more than one transition matrix are described in Knab et al.
). Further details on SHMMs can be found in Seifert (2006
), and a concept similar to SHMMs has been developed by Meyer and Durbin (2004
) with an application to gene prediction.
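To make the idea of orientation-dependent transition matrices concrete, the following minimal sketch shows one possible way to derive a head–head transition matrix from a two-state base matrix. It is an illustration only and not necessarily the parameterization used in the SHMM: the function names, the scaling factor `head_head_scale`, and the choice to multiply the self-transition probabilities and renormalize each row are all assumptions made for this example.

```python
# Illustrative sketch only: one hypothetical way to obtain an
# orientation-specific transition matrix for a two-state HMM
# (state 0 = non-target, state 1 = target).

def scale_transition_matrix(base, scale):
    """Multiply each self-transition probability by `scale` and renormalize rows.

    With scale > 1, staying in the same state becomes more likely, i.e. two
    adjacent genes are more likely to be both targets or both non-targets.
    """
    scaled = []
    for i, row in enumerate(base):
        weights = [p * scale if i == j else p for j, p in enumerate(row)]
        total = sum(weights)
        scaled.append([w / total for w in weights])
    return scaled


def transition_matrix_for(orientation, base, head_head_scale):
    """Select the transition matrix used between two directly adjacent genes.

    Assumption: only head-head pairs receive the scaled matrix; all other
    orientations use the base matrix unchanged.
    """
    if orientation == "head-head":
        return scale_transition_matrix(base, head_head_scale)
    return base


if __name__ == "__main__":
    base = [[0.9, 0.1],
            [0.3, 0.7]]  # hypothetical base transition probabilities
    for orientation in ("head-head", "tail-tail"):
        print(orientation, transition_matrix_for(orientation, base, 2.0))
```

In this sketch, a scaling factor greater than one increases the probability that both genes of a head–head pair share the same state, while each row of the matrix remains a valid probability distribution.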
Table 1. Pearson correlations of promoter array ChIP-chip measurements of transcription factors ACE2, SWI5 and FKH2 for the four gene pair orientations head–head, tail–tail, tail–head and head–tail based on all pairs of two directly ...
Fig. 1. Pearson correlations of promoter array ChIP-chip measurements of the transcription factor ABI3 in the context of the four gene pair orientations head–head, tail–tail, tail–head, and head–tail of two directly adjacent genes
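Orientation-specific Pearson correlations of the kind summarized in these captions could be computed along the following lines. This is a minimal sketch under stated assumptions: the input format (a list of strand and log-ratio pairs in chromosomal order), the function names, and in particular the mapping from strand pairs to orientation labels are hypothetical conventions chosen for illustration and are not taken from the original analysis.

```python
from collections import defaultdict
from math import sqrt

# Assumed naming convention (not specified in the text): orientation of a gene
# pair given the strands of the first and second gene in chromosomal order.
ORIENTATION = {
    ("-", "+"): "head-head",  # divergent genes, promoters face each other
    ("+", "-"): "tail-tail",  # convergent genes
    ("+", "+"): "tail-head",
    ("-", "-"): "head-tail",
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for constant input
    return sxy / sqrt(sxx * syy)


def correlations_by_orientation(genes):
    """Correlate measurements of directly adjacent genes per orientation.

    `genes` is a list of (strand, log_ratio) tuples for one chromosome,
    sorted by chromosomal position.
    """
    pairs = defaultdict(lambda: ([], []))
    for (strand1, value1), (strand2, value2) in zip(genes, genes[1:]):
        xs, ys = pairs[ORIENTATION[(strand1, strand2)]]
        xs.append(value1)
        ys.append(value2)
    # At least two pairs are needed for a correlation to be defined.
    return {o: pearson(xs, ys) for o, (xs, ys) in pairs.items() if len(xs) >= 2}


if __name__ == "__main__":
    toy = [("-", 1.0), ("+", 1.1), ("-", 2.0), ("+", 2.2), ("-", 0.5), ("+", 0.4)]
    print(correlations_by_orientation(toy))  # toy values, not real data
```

For real data, the pairs would be collected per chromosome and pooled before computing the correlation, so that genes on different chromosomes are never treated as adjacent.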
In this article, we focus on the analysis of two promoter array ChIP-chip datasets. We start with an initial study in the context of the cell cycle of S. cerevisiae. The three methods LFC, HMM and SHMM are used to predict common target genes bound by the transcription factors ACE2 and SWI5, and by ACE2 and FKH2. We evaluate the common target genes using the Saccharomyces Genome Database by Cherry et al.
). For A. thaliana, ChIP-chip based on promoter arrays was established for the seed-specific transcription factor ABI3 (ARABIDO-SEED, 2008
). ABI3 is one of the fundamental regulators of seed development involved in controlling chlorophyll degradation, storage product accumulation, and desiccation tolerance (Mönke et al.
; Suzuki et al.
; To et al.
; Vicente-Carbajosa and Carbonero, 2005
). In an in-depth study, we use the three methods LFC, HMM and SHMM to identify putative ABI3 target genes, and we evaluate these genes using (i) publicly available expression data from Genevestigator (Hruz et al.
; Zimmermann et al.
) and (ii) transient assays, as described in Reidt et al.
), performed in wet-laboratory experiments to test whether the promoter of a putative target gene is regulated by ABI3.