Spatial information can be incorporated into a graph that depicts a regulatory signaling network in one of two ways. In cases where different compartments can be physically defined (organelles such as the nucleus, or subcellular compartments such as the cytoplasm, soma, or dendrite), one may modify the network to include the compartment as part of the name of a node. For example, instead of having a single node to represent proteins such as MAP-kinase 1, 2, one can have two separate nodes: one for nuclear MAPK and one for cytoplasmic MAPK. The two nodes may be connected by a link representing the translocation event (19). This approach allows certain reactions to be assigned exclusively to the nuclear MAPK species, such as phosphorylation of the nuclear kinase MSK (20), without affecting the cytoplasmic MAPK, which may have its own specific substrates such as cytoplasmic phospholipase-A2. Applying this approach to the PKA network is shown in . Such modified networks can be analyzed using any of the methods described above.
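The compartment-specific naming described above can be sketched in a few lines of Python. This is only an illustration: the tuple representation of nodes, the link labels, and the helper `targets` are assumptions made for this sketch, not part of the original analysis.

```python
# Each node is a (component, compartment) pair, so nuclear and cytoplasmic
# MAPK are distinct nodes. Links are directed and carry an illustrative label.
links = [
    (("MAPK", "cytoplasm"), ("MAPK", "nucleus"), "translocation"),
    (("MAPK", "nucleus"), ("MSK", "nucleus"), "phosphorylation"),
    (("MAPK", "cytoplasm"), ("PLA2", "cytoplasm"), "phosphorylation"),
]


def targets(node, links):
    """Return the (destination, label) pairs of all links leaving `node`."""
    return [(dst, label) for src, dst, label in links if src == node]


# Nuclear MAPK acts only on its nuclear substrate ...
nuclear_targets = targets(("MAPK", "nucleus"), links)
# ... while cytoplasmic MAPK can translocate or act on a cytoplasmic substrate.
cytoplasmic_targets = targets(("MAPK", "cytoplasm"), links)
```

Because the compartment is part of the node identity, the resulting graph is still an ordinary directed graph and can be passed to any standard graph analysis.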
A different approach is to include detailed spatial information in the definition of nodes. This is a natural extension of the ODE method. Instead of associating each node with a time-dependent concentration, the system is modeled by a set of partial differential equations (PDEs), and each node represents a time- and space-dependent concentration. Here there is no need to define compartments, as exact Cartesian coordinates are included in the concentration definition, and reactions take place only if all the respective reactants are present simultaneously at the same coordinates. A limitation of this type of representation is the loss of topological information that is essential for the identification of motifs. Simulation of coupled PDEs usually requires extensive computational resources and is currently impractical for large-scale systems.
One can use the spatial distribution information to construct a new graph in which spatial information is used to specify the links. There are cases where experiments reveal the spatial distribution and localization of various cellular components. These data can be processed into a network in the following manner: for each pair of components, calculate the correlation coefficient between their respective spatial distributions. The correlation coefficient between concentrations $c_1(\mathbf{r})$ and $c_2(\mathbf{r})$ is given, in the standard (Pearson) form, by

$$C_{1,2} = \frac{1}{V\,\sigma_1 \sigma_2} \int \left(c_1(\mathbf{r}) - \bar{c}_1\right)\left(c_2(\mathbf{r}) - \bar{c}_2\right)\, d\mathbf{r}$$

where $\bar{c}_i$ is the spatial mean of $c_i$, $\sigma_i$ is the standard deviation, $V$ is the volume of the integration domain, and the integration is over the whole space. $C_{1,2}$ is a measure of the overlap between $c_1$ and $c_2$ (Figure 3). The spatial correlation $C_{1,2}$ measures the extent to which the distribution of component 1 provides information about, and serves as a predictor of, the distribution of component 2. High correlation between components 1 and 2 implies similar localization of the two components. Areas with a high concentration of 1 are expected to have a high concentration of 2 as well, and in locations where 1 is absent, 2 cannot have a significant concentration either. Thus, it is enough to measure one of the components in a certain region to get a good estimate of both concentrations at that place (Figure 3A). Low correlation indicates independent distributions (Figure 3). When one of the components (e.g., 1) is localized in a particular region, whereas the other component (2) has a broad distribution, their correlation is low. In such a case, knowing the concentration of 1 at a particular point does not increase our knowledge about 2, and vice versa. It should be noted that such low correlation does not necessarily imply a lack of interaction, since a locally concentrated component may be able to interact with a widely distributed component. $C_{1,2}$ may take negative values as well. Negative correlation (also known as anti-correlation) is a predictive tool just as positive correlation is. However, in cases of negative correlation, a high concentration of 1 at a particular location indicates that 2 is expected to be absent from that region, and a low concentration of 1 indicates a high concentration of 2 (Figure 3). Here again, as in the high-correlation cases, it should be enough to measure one component in order to gain information about the local concentrations of both components.
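As a rough illustration of these three regimes, the following sketch computes the correlation coefficient for sampled profiles in a schematic one-dimensional cell. It assumes the standard Pearson form evaluated on a uniform grid. The Gaussian test profiles, the helper names, and the convention of returning 0 for a perfectly uniform profile are all choices made for this example rather than details from the original study.

```python
import math


def spatial_correlation(c1, c2):
    """Pearson correlation of two concentration profiles on the same grid."""
    if len(c1) != len(c2):
        raise ValueError("profiles must be sampled at the same positions")
    n = len(c1)
    mean1 = sum(c1) / n
    mean2 = sum(c2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(c1, c2)) / n
    sd1 = math.sqrt(sum((a - mean1) ** 2 for a in c1) / n)
    sd2 = math.sqrt(sum((b - mean2) ** 2 for b in c2) / n)
    if sd1 == 0 or sd2 == 0:
        # Convention for this sketch: a flat profile predicts nothing.
        return 0.0
    return cov / (sd1 * sd2)


def gaussian_profile(center, width, n=101):
    """Normalized (peak value 1) profile on positions x = 0, 1, ..., n - 1."""
    return [math.exp(-((x - center) ** 2) / (2 * width ** 2)) for x in range(n)]


peak_a = gaussian_profile(30, 5)
peak_b = gaussian_profile(32, 5)           # almost the same location as peak_a
complement = [1.0 - c for c in peak_a]     # present wherever peak_a is absent
uniform = [0.5] * 101                      # broadly distributed component

high = spatial_correlation(peak_a, peak_b)          # close to +1
negative = spatial_correlation(peak_a, complement)  # close to -1
low = spatial_correlation(peak_a, uniform)          # 0 by the convention above
```

For measured data the integral would be replaced by a sum over whatever spatial bins the experiment provides, as is done here for the grid points.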
Figure 3 Demonstration of spatial correlations in a schematic one-dimensional cell. Concentrations are normalized to the range from 0 to 1, and space is labeled x and varies from 0 to 100. For example, this could represent 1 to 100 microns within a cell. A. High correlation ...
Consider the case where we link any two components whose correlation coefficient is above a user-defined threshold (for example, 0.8). The resultant graph is the spatial co-localization graph (SCG). In the protein kinase A example shown in , the scaffold proteins Yotiao and AKAP450 will not be connected in an SCG although they are closely related in the chemical interaction graph. An SCG can be analyzed using conventional graph theory metrics to find clusters and pathways, which may indicate critical intracellular areas and routes. More importantly, the SCG can be used as a filter for the chemical interaction graph. The two graphs have the same set of nodes (representing intracellular components). In most cases, only pairs of components that are linked in both graphs fulfill both the biochemical and the spatial requirements for interaction. In this way the spatial information filters out interactions that are biochemically possible but do not occur in a particular instance because the interacting components are not co-localized.
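The thresholding and filtering steps can be sketched as follows. The component names, correlation values, and chemical links below are hypothetical placeholders, and representing an undirected link as a `frozenset` is simply a convenient choice for this example.

```python
def pair(a, b):
    """Undirected link between two components."""
    return frozenset((a, b))


def build_scg(correlations, threshold=0.8):
    """Keep the pairs whose spatial correlation exceeds the threshold."""
    return {p for p, c in correlations.items() if c > threshold}


# Hypothetical correlation coefficients for four placeholder components.
correlations = {
    pair("A", "B"): 0.93,
    pair("A", "C"): 0.85,
    pair("B", "C"): 0.40,
    pair("C", "D"): -0.60,
}

# Hypothetical chemical interaction graph on the same set of nodes.
chemical_links = {pair("A", "B"), pair("B", "C"), pair("C", "D")}

scg_links = build_scg(correlations, threshold=0.8)

# Only links present in both graphs satisfy the chemical and spatial criteria.
feasible_links = chemical_links & scg_links
```

In this toy case A and C are co-localized but have no chemical affinity, while B and C can interact chemically but are not co-localized, so only the A-B link survives the filter.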
Current experiments do not yet provide data sets on the localization of intracellular components that would allow us to construct an intracellular SCG. Thus, for the system described in , we know where individual components are localized, but these data come from different studies. To illustrate the spatial co-localization graph, we therefore analyzed the data of Petyuk et al. (22). This study describes the spatial distributions of more than 1000 proteins in the brain. It should be emphasized that this study does not include subcellular localization, but rather tissue-level distribution. Nevertheless, this is the first study that describes such detailed spatial distribution on a large scale. There are ongoing efforts to conduct high-throughput imaging of intracellular proteins (23), but these large-scale datasets are not yet publicly available. From the Petyuk et al. study, we downloaded the distributions of all the available proteins and calculated the correlations between every pair of them. Most of the proteins are well localized; that is, their concentrations are non-zero in a defined region of the brain and zero in the remaining regions. About one third of the proteins had positive concentrations throughout the whole brain, indicating that they are broadly distributed components. Despite their low correlation with all other components, these proteins can, chemical specificity permitting, interact with any other protein irrespective of the spatial distribution. To examine the effect of the broadly distributed components on the SCG, we performed our analysis twice: once with all proteins, including those that have a wide distribution, albeit at varying levels, and again with only the localized proteins. The number of protein pairs (i.e., links) whose correlation is greater than a certain threshold is presented as a function of the threshold in Figure 4. When considering the whole data set (including the widely distributed proteins), the number of links decreases exponentially with the threshold up to a value of 0.8. Beyond that point there is a dramatic drop in the number of protein pairs with higher correlation. This sharp change is not seen in the corresponding plot for the localized proteins only (dashed plot in Figure 4). This difference indicates that within the subset of widely distributed proteins, the typical correlation is in the range of 0.8-0.95. In the very high correlation range ($C_{1,2} > 0.9$) the difference between the two plots gets smaller, and they coincide at the end ($C_{1,2} = 1$). Interestingly, there are about 20 pairs of proteins that are 99% correlated, and these proteins are all well localized.
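The threshold sweep described above can be sketched as follows, run once on all proteins and once on the localized subset. The five region-level profiles are invented toy data, not values from Petyuk et al. The `is_localized` test follows the definition used for this analysis (zero concentration in at least one area), and all function names are assumptions of this sketch.

```python
from bisect import bisect_right
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)


def is_localized(profile):
    """Localized means zero concentration in at least one region."""
    return any(value == 0 for value in profile)


def pair_correlations(profiles):
    """Correlation coefficient for every unordered pair of proteins."""
    names = sorted(profiles)
    return [pearson(profiles[a], profiles[b]) for a, b in combinations(names, 2)]


def links_above(correlation_values, thresholds):
    """Number of pairs whose correlation is greater than each threshold."""
    ordered = sorted(correlation_values)
    return [len(ordered) - bisect_right(ordered, t) for t in thresholds]


# Toy abundances in four regions; P* are localized, W* are widely distributed.
profiles = {
    "P1": [0, 0, 4, 8],
    "P2": [0, 1, 5, 9],
    "P3": [7, 0, 0, 1],
    "W1": [3, 4, 5, 6],
    "W2": [2, 3, 3, 4],
}
localized = {name: p for name, p in profiles.items() if is_localized(p)}

thresholds = [0.5, 0.8, 0.9, 0.99]
all_counts = links_above(pair_correlations(profiles), thresholds)
localized_counts = links_above(pair_correlations(localized), thresholds)
```

Plotting `all_counts` and `localized_counts` against `thresholds` gives the two curves being compared. In this toy data the two counts coincide only at the highest threshold.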
Figure 4 Number of component pairs whose correlation is greater than a given threshold, as a function of the threshold. Localized proteins are defined as proteins with zero concentration in at least one area of the brain, as shown by Petyuk et al. (22). 20 pairs of ...
The SCG provides a new tool for understanding cellular regulation. Because a threshold has to be chosen when constructing the SCG, different threshold values produce different graphs and provide different information. For example, a very high threshold would leave in the graph only component pairs that are localized together in a very small region, such as the pairs with 99% correlation described above. Such high correlation indicates tight co-localization and thus points to either a physical compartment, such as the nucleus, or a common scaffold shared by the two correlated components. Even if there is no evidence for mutual chemical affinity between such tightly correlated components, the spatial correlation can direct us to look for interactions, both direct and indirect, that may be mediated through scaffolds or anchoring components. Spatial correlation can also help in understanding the functional role of a known chemical interaction. As mentioned above, correlation can be either positive or negative (Figure 3). Whereas positive correlation indicates that the two components are co-localized and that, with appropriate chemical specificity, an interaction will occur, negative correlation indicates the presence of one component and the absence of the other. If the components have the chemical ability to interact with one another, defining a negative threshold and keeping only pairs of components whose correlation is below that threshold gives a graph in which each link may predict a regulatory locus, where the movement of a component is used to control a chemical interaction and thus achieve local control of a subcellular process. Such regulation can be either direct or indirect. The exact pathway between the negatively correlated components would be found in the chemical interaction data, which specify binary interaction capabilities, but the functionality of such a pathway would be revealed by the differential spatial distribution.
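The negative-threshold selection can be illustrated with a short sketch. The correlation values, the chemical links, the default threshold of -0.8, and the function name `regulatory_candidates` are all hypothetical and serve only to show the selection rule.

```python
def regulatory_candidates(correlations, chemical_links, negative_threshold=-0.8):
    """Chemically compatible pairs whose spatial correlation is strongly negative."""
    if negative_threshold >= 0:
        raise ValueError("negative_threshold must be below zero")
    return {
        link
        for link in chemical_links
        if link in correlations and correlations[link] < negative_threshold
    }


# Hypothetical spatial correlations between placeholder components.
correlations = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): -0.90,
    frozenset(("B", "D")): -0.85,
    frozenset(("C", "D")): -0.30,
}

# Hypothetical pairs that are able to interact chemically.
chemical_links = {
    frozenset(("A", "B")),
    frozenset(("A", "C")),
    frozenset(("C", "D")),
}

candidates = regulatory_candidates(correlations, chemical_links)
```

Here only the A-C pair is both chemically compatible and strongly anti-correlated. B and D are anti-correlated but have no chemical link, so they are not reported.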
However, considering spatial distribution by itself can result in an erroneous representation of the system. This error arises from the fact that some cellular components may be broadly distributed and nevertheless interact with locally concentrated components. This situation can be seen in the analysis of the data of Petyuk et al. Using a threshold of 0.8 yields a graph consisting of 532 nodes (proteins) and about 44000 links (pairs of proteins with correlation higher than the threshold) (Figure 5A). The high density of links in the major island of this graph arises from the high correlation of the widely distributed components. This overwhelming connectivity obscures any meaningful information that may emerge from this graph. If the SCG in Figure 5A is filtered by considering only co-localized proteins, then the system is reduced to 224 links between 142 proteins (Figure 5). However, from visual inspection of Figure 5 it can readily be seen that the system is no longer a network but a set of isolated islands. This view is also not correct, since it is likely that some of the broadly distributed proteins will interact with some of the local proteins and thus give rise to a better connected network rather than a set of islands. Thus the systems visualized in Figure 5 represent two extremes of the application of the spatial specification criteria, and neither is a realistic representation. Taking the system in Figure 5A, if we eliminate the links where mutual chemical affinity makes the interactions infeasible, then we would obtain a much less densely connected network. Such an analysis is not wholly feasible for the Petyuk et al. data, since these describe tissue rather than cellular localization; however, the framework for mixed graphs, where both localization information and mutual chemical affinity are used to specify links, is described using a toy system.
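Whether a thresholded graph forms one dominant cluster or breaks into isolated islands can be checked by computing its connected components. The sketch below uses a plain breadth-first search on a hypothetical six-node graph. The function name `islands` and the toy links are assumptions of this example.

```python
from collections import defaultdict, deque


def islands(nodes, links):
    """Connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    unvisited = set(nodes)
    components = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical SCG: one three-node cluster, one pair, and one isolated node.
nodes = ["A", "B", "C", "D", "E", "F"]
links = [("A", "B"), ("B", "C"), ("D", "E")]

components = islands(nodes, links)
sizes = [len(component) for component in components]
```

A size list dominated by a single large entry corresponds to the densely connected extreme, while many small entries correspond to the set of isolated islands.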
Figure 5 Spatial correlation graph of the data obtained from Ref. 22, with a threshold of 0.8. The colors indicate the size of clusters. A. SCG of the whole data. 532 nodes and >44000 links are organized in one large cluster of 433 nodes (in red), one cluster ...