A major goal of systems biology is to integrate biological functions of individual genes in terms of their interactions. Time course gene expression profiling, which can capture the global transcriptional responses to signals during a biological process of interest, offers a major data source to achieve this goal [1
In network modeling of gene expression data, assessing pair-wise relationships is often a starting point. In early days, correlation coefficient [2
], Euclidean distance, as well as their variations, such as partial correlations, empirical Bayes and bootstrap methods [4
], were used. They are effective for computing direction free linear dependence when the data are independent. Networks constructed this way are essentially co-expression networks. While having the appeal of being simple and intuitive, correlation metrics have limitations when applied to time course data. They assume independence of the order of the data points, while in reality the data at each time step depend on the previous time points. Ignoring the inter-time point dependence not only loses sensitivity toward detecting interactions but could also lead to erroneous predictions [5
Significant phase shift in the timing of expression changes have also been observed for highly associated genes [6
]. Some studies tried to identify the phase lag directly by shifting gene expression time series with respect to each other until the optimal alignment is reached. For instance, Qian et al
proposed a local clustering approach based on optimal pair-wise alignment through dynamic programming [7
]; Schmitt et al
used the Pearson's correlation [8
], Balasubramaniyan et al
used the Spearman rank correlation [9
], Pereda et al
used cross correlation [10
], to compute the maximum time-lagged similarity between two transcript profiles, and utilized the results to identify clusters. The degree of lag varies widely in different gene pairs, and these approaches need multiple runs to find the lag that best aligns each pair. The performance of the alignment depends on whether the lags are close to integer numbers of the sampling steps of the experiment.
More sophisticated methods were also developed. Aach and Church implemented both simple and interpolative time warping based on dynamic programming to identify an optimal alignment of two gene expression time series [11
]. Expanding this approach, Liu and Müller proposed a non-parametric time-synchronized iterative mean updating technique to construct modes of temporal structure in gene expression profiles [12
]. Bar-Joseph et al.
] developed an approach to align temporal data sets using piecewise spline fitting, extracting shift and stretch parameters for each data set. Butte et al.
] utilized digital signal-processing tools, including power spectral densities, coherence, transfer gain, and phase shift, to find pair-wise gene associations based on periodically expressed time-invariant gene profiles. More recently, a hidden Markov model based approach was utilized to infer the timing in gene expression changes under different experimental conditions [15
Linear and non-linear multivariate analysis and signal processing techniques were also introduced to analyze time series microarray data [16
]. Several studies used pair wise mutual information to infer interactions and regulatory relationships between genes [17
]. This method assumes a fixed time delay, which might not be true across different experimental conditions. In frequency domain time series analysis, causality and interrelationship among the components can be studied using coherence and partial coherence. Graphical models based on such analysis have been studied by Butte et al
] and Salvador et al
]. However, Albo et al
] showed that partial coherence-based causality measures are sensitive to measurement noise.
Apparently, more studies are needed to fully utilize the dynamics underlying the temporal gene expression pattern, and to better understand the complex spatial-temporal architecture of transcriptome. Recently, increasing evidence, including those from the advancement of single-cell time course gene expression profiling technologies [21
], suggest that like other complex dynamic systems in nature, synchrony through oscillation and frequency modulation is a general strategy for an organism to coordinate the transcription of multiple target genes in responses to external signals [22
]. Examples include the p53-Mdm2 feedback loop [24
], the NF-κB signaling pathway [27
], and calcium responsive pathways [23
]. These further emphasize the need of new methods to study and utilize the dynamics. The oscillations in gene expression, like other oscillations in biological systems [28
], are most often pulsatile or relaxed oscillations rather than harmonic, thus calling for mathematical methods rooted from phase space analysis [29
In this study, we investigate the potential of network inference using the phase locking analysis technique [31
]. This approach is based on the following concepts originating from nonlinear dynamics [29
]: if two time series interact with each other, there will be a process of rhythmic adjustment resulting from the interaction, leading to phase locking. Phase-locked oscillators progress through their trajectories in phase space at the same pace (1:1 locking), or rational ratios with respect to each other (m
being integers). Conversely, such phase locking phenomenon can be utilized to infer interaction between two dynamic systems, even for weak interactions [31
]. Recently Kim et al
clustered genes of synchronized oscillatory pattern (1:1 phase locking) during yeast cell cycle, and observed that genes in the same cluster were closely associated, as evidenced by the sharing of GO terms and BioGRID interactions [32
]. In this study we will apply the phase locking analysis to the Stanford yeast cell cycle data [33
], and examine the phase locking (including higher order locking) between transcription factors and targets, between gene pairs with prior evidence of other types of interaction, and between cell cycle genes. Based on the results, we will propose a new network inference approach utilizing the phase locking index, and examine the modular structure of the networks constructed and the biological themes shared by genes in the network modules.