Excessive vascularization is a hallmark of many diseases including cancer, rheumatoid arthritis, diabetic nephropathy, pathologic obesity, age-related macular degeneration, and asthma. Compounds that inhibit angiogenesis represent potential therapeutics for many diseases. Karagiannis and Popel (PNAS, 2008) used a bioinformatics approach to identify more than 100 peptides with sequence homology to known angiogenesis inhibitors. The peptides could be grouped into families by the conserved domain of the proteins they were derived from. The families included type IV collagen fibrils, CXC chemokine ligands, and type I thrombospondin domain-containing proteins. The relationships between these families have received relatively little attention. To investigate these relationships, we approached the problem by placing the families of proteins in the context of the human interactome including >120,000 physical interactions among proteins, genes, and transcripts. We built on a graph theoretic approach to identify proteins that may represent conduits of crosstalk between protein families. We validated these findings by statistical analysis and analysis of a time series gene expression dataset taken during angiogenesis. We identified six proteins at the center of the angiogenesis-associated network including three syndecans, MMP9, CD44 and versican. These findings shed light on the complex signaling networks that govern angiogenesis phenomena.
Excessive vascularization is a hallmark of many diseases including cancer, rheumatoid arthritis, diabetic nephropathy, pathologic obesity, age-related macular degeneration, and asthma. Compounds that inhibit angiogenesis represent potential therapeutics for many diseases. Judah Folkman performed pioneering research in the field of angiogenesis;1 his work led to the identification of a number of proteins and polypeptides with anti-angiogenic activity.2
Karagiannis and Popel3 used a bioinformatics approach to group the peptides with anti-angiogenic activity into families by the conserved domain of the proteins they are derived from. The families included type IV collagens, CXC chemokines, and type I thrombospondin domain (TSP1)-containing proteins. Karagiannis and Popel identified conserved domains within each family by performing a multiple sequence alignment. They ran BLAST for each conserved domain against the proteome to identify other peptides with sequence homology. Their work revealed more than 100 peptides derived from over 80 proteins with sequence homology to known angiogenesis inhibitors. We will refer to this set of proteins throughout the rest of the article as angiogenesis-associated proteins. We extended the work of Karagiannis and Popel3 to investigate the collection of interactions surrounding the angiogenesis-associated proteins. In this study, we selected three families (type IV collagens, CXC chemokines, and TSP1-containing proteins) and identified their interactions with other proteins, thus building a protein-protein interaction (PPI) network. Note that the grouping of these angiogenesis-associated proteins into families only indicates that they share one or more conserved domains.
Karagiannis and Popel experimentally validated in vitro inhibition of endothelial cell (EC) proliferation and migration by peptides derived from type IV collagens,4 thrombospondin domain-containing proteins,5,6 and CXC chemokines.7 These studies showed that a large fraction of the peptides had anti-angiogenic potential. Using EC proliferation assays, they also revealed synergy between the peptides derived from the CXC chemokines and TSP1-containing protein families,3 thus suggesting a possible crosstalk between the signaling networks. A greater understanding of the signaling pathways associated with the peptides is an important step in understanding their mechanisms of action. In vivo experiments with selected peptides demonstrated anti-angiogenic activity in tumor xenografts8 and ocular models.9
While the functional relationships between these protein families and angiogenesis have been catalogued by the gene ontology,10 the relationships between pairs of protein families are not well characterized. To better understand the relationships within and between type IV collagens, CXC chemokines, and TSP1-containing proteins, we placed each family of proteins in the context of the human interactome including 126,763 physical protein-protein, protein-DNA, or protein-RNA interactions accumulated in the Michigan Molecular Interactions database (MiMI).11 We used graph diffusion (see Methods) to identify those proteins that are in close topological proximity with multiple angiogenesis-associated protein families. The proteins that are well connected to multiple protein families represent potential mediators of crosstalk. We verified their statistical significance by repeatedly rewiring the human protein-protein interaction network. We found that many of these proteins had perturbed gene expression during time course measurements of VEGF-stimulated angiogenesis in endothelial cells.
The interaction dataset was taken from the Michigan Molecular Interactions database (MiMI)11 (Feb 2009 version). The dataset is composed of 13,491 genes, proteins, and RNAs connected by 126,763 physical interactions. The interaction types include protein-protein, protein-DNA, protein-RNA, and RNA-RNA. As a result, the dataset captures diverse aspects of biomolecular interactions including protein complexation, transcriptional regulation, and RNA interference. The dataset consists of interactions curated from reputable online databases such as Reactome,12 BIND, BioGRID,13 and HPRD.14 This network of physical interactions forms the basis for crosstalk discovery. Gene Ontology (GO)10 annotations were used for verification (6/2010 version). For additional verification, we used a time series gene expression dataset of VEGF-induced capillary endothelial tube formation in a 3D collagen matrix in vitro.15 The dataset included 8 time points: 15 min, and 1, 3, 6, 9, 12, 18, and 24 h of VEGF stimulation.
By treating a biomolecular interaction network as a graph where nodes correspond to biomolecules and edges represent physical interactions between those biomolecules, we can efficiently find topological associations between protein families. Diffusion kernel algorithms have proven to be powerful tools for identifying topological associations between a node and a seed set of nodes. The method can be thought of in terms of repeated random walks originating at the seed nodes. A parameter γ controls the length of the random walks. A lower value for γ results in longer random walks. Nodes are then assigned a diffusion kernel score (DKS) based on the fraction of random walks that pass through the node. While many values of γ will suffice, we selected γ such that all nodes have some non-zero DKS.
Figure 1 illustrates the principle of graph diffusion on a simple network consisting of a chain of 11 nodes connected by 10 edges. The nodes lie along the x-axis. Node set 1 (NS1) consists of nodes 1 through 3. The DKS of all the nodes with respect to NS1 is given by the blue line. Node set 2 (NS2) consists of only node 10 in green. The DKS with respect to NS2 is given by the green line. To identify the crosstalk proteins with respect to NS1 and NS2, we identify the intersection of the minimum of NS1 and NS2 diffusion (shown in red) and a minimum DKS threshold. The curves intersect before node 4 and after node 6. As a result, nodes 4, 5, and 6 would be labelled crosstalk nodes with respect to NS1 and NS2.
High confidence associations are established through multiple short length paths. Even if a single path is found to be incorrect, alternate paths through the network will still support the associations. This aggregation of evidence from multiple paths leads to a more stable result from potentially unreliable data. As the DKS is additive, we normalize the DKS by the number of query nodes. The software for performing this operation is provided through our website (sysbio.bme.jhu.edu).
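For a weighted undirected graph G(V,E) with vertex set V and edge set E, let A be the symmetric adjacency matrix representing G. Let q_i be 1 if node i is in the query set and 0 otherwise. We express the time derivative of the diffusion kernel score s_i(q) for node i ∈ V as
$$\frac{ds_i(q)}{dt} = q_i + \sum_{j \in V} A_{ij}\,\bigl[s_j(q) - s_i(q)\bigr] - \gamma\, s_i(q).$$
Let D be the degree-weighted diagonal matrix of A, with $D_{ii} = \sum_j A_{ij}$. In matrix notation, we have
$$\frac{ds}{dt} = q - (D - A + \gamma I)\,s.$$
Our goal is to identify the values of s at steady state. We set ds/dt = 0 and solve for s:
$$s = (D - A + \gamma I)^{-1}\,q.$$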
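A minimal sketch of the steady-state computation on the 11-node chain of Figure 1. It assumes the diffusion takes the source-sink form in which each query node receives unit input, flow moves along edges in proportion to score differences, and every node leaks at rate γ, so that (D − A + γI)s = q at steady state. The value γ = 0.5, the function names, and the use of Gauss-Seidel iteration in place of a direct matrix solve are illustrative assumptions and do not describe the authors' software.

```python
def diffusion_scores(adj, seeds, gamma, tol=1e-12, max_iter=100_000):
    """Steady-state scores s solving (D - A + gamma*I) s = q by Gauss-Seidel.

    adj   : symmetric adjacency matrix (list of lists, no self-loops assumed)
    seeds : set of 0-based indices of query nodes (q_i = 1 for these)
    gamma : positive loss rate; smaller values let scores spread further
    """
    n = len(adj)
    q = [1.0 if i in seeds else 0.0 for i in range(n)]
    degree = [sum(row) for row in adj]
    s = [0.0] * n
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            # Score arriving from neighbours, weighted by edge weight.
            inflow = sum(adj[i][j] * s[j] for j in range(n))
            updated = (q[i] + inflow) / (degree[i] + gamma)
            largest_change = max(largest_change, abs(updated - s[i]))
            s[i] = updated
        if largest_change < tol:
            break
    return s


def chain_adjacency(n):
    """Adjacency matrix of a path graph with n nodes and n - 1 unit edges."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1.0
    return adj


adj = chain_adjacency(11)
gamma = 0.5          # illustrative value only
ns1 = {0, 1, 2}      # nodes 1-3 of Figure 1 (0-based indices)
ns2 = {9}            # node 10 of Figure 1

# Normalize by the number of query nodes, as described in the text.
dks_ns1 = [x / len(ns1) for x in diffusion_scores(adj, ns1, gamma)]
dks_ns2 = [x / len(ns2) for x in diffusion_scores(adj, ns2, gamma)]

for node, (a, b) in enumerate(zip(dks_ns1, dks_ns2), start=1):
    print(f"node {node:2d}  NS1 {a:.4f}  NS2 {b:.4f}  min {min(a, b):.4f}")
```

The last column is the pointwise minimum of the two curves, which is the quantity compared against the DKS threshold when calling crosstalk nodes. Which nodes pass depends on the chosen γ and threshold.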
We define crosstalk proteins as proteins that are topologically close to multiple node sets. A protein p is a crosstalk protein if its normalized DKS is greater than a threshold for multiple node sets. This maximum-distance (minimum-DKS) threshold marks the annotation boundary for a node set. The threshold is constant across all node sets; in this study, it is set at 0.018. We use statistics to verify the significance of the results found using our fixed parameter values. Formally, p is a crosstalk protein relative to node sets q1, q2, … , qn if its normalized DKS with respect to each of these node sets exceeds the threshold.
We computed the normalized DKS of each protein with respect to each node set, normalizing by the number of proteins in the node set. In this way, DKS values are comparable across node sets of different sizes. DKS results are given in Table 2. For a given protein, we could then identify all potential crosstalk with other node sets.
We computed the statistical significance of crosstalk proteins by permutation testing. We tested the null hypothesis that the DKS of a protein is equal to the DKS of the protein in rewired networks. The alternative hypothesis is that the DKS of a protein is lower in rewired networks. To test these hypotheses, we generated 300 random edge-swapped networks. The p-value is given by the fraction of rewired networks where the DKS of the protein, computed with respect to a fixed set of seed proteins, exceeds its DKS in the real network.
The statistical significance calculation controls for node set size and node degree. Crosstalk proteins can be compared and ranked based on their statistical significance. By computing statistical significance we eliminate a bias towards hub proteins in the network. The computation of statistical significance gives a global measure of the importance of the associations that we identify through crosstalk proteins. We do not evaluate the statistical significance of the seed nodes (i.e. the angiogenesis-annotated proteins). Seed nodes were selected for the study and as such they are inherently biased.
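The normalization and thresholding steps can be sketched as follows. The protein names, family names, scores, and node set sizes are made up for illustration; only the threshold of 0.018 comes from the text. The function treats a protein as a crosstalk protein when its normalized DKS exceeds the threshold for every node set passed in, so pairwise crosstalk is obtained by passing two node sets.

```python
def crosstalk_proteins(raw_scores, set_sizes, threshold):
    """Return proteins whose normalized DKS exceeds threshold for every node set.

    raw_scores : {node_set_name: {protein: unnormalized DKS}}
    set_sizes  : {node_set_name: number of proteins in that node set}
    threshold  : minimum normalized DKS (the annotation boundary)
    """
    # Divide each score by the size of its node set so sets are comparable.
    normalized = {
        name: {p: score / set_sizes[name] for p, score in scores.items()}
        for name, scores in raw_scores.items()
    }
    # Only proteins scored against every node set can be crosstalk proteins.
    shared = set.intersection(*(set(scores) for scores in normalized.values()))
    return sorted(
        p for p in shared
        if min(normalized[name][p] for name in normalized) > threshold
    )


# Hypothetical inputs: two node sets of different sizes, three candidate proteins.
raw_scores = {
    "family_A": {"P1": 0.30, "P2": 0.05, "P3": 0.24},
    "family_B": {"P1": 0.10, "P2": 0.20, "P3": 0.01},
}
set_sizes = {"family_A": 6, "family_B": 4}

result = crosstalk_proteins(raw_scores, set_sizes, threshold=0.018)
print(result)
```

In this toy input only P1 clears the threshold for both families: P2 falls below it for family_A after normalization, and P3 falls below it for family_B.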
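A sketch of the rewiring and p-value steps, assuming the edge swap is the common degree-preserving double-edge swap (the text does not specify the exact swapping scheme). The function names, the toy network, and the number of swaps are illustrative. In a full analysis, the DKS of each protein would be recomputed on every rewired network and passed to `empirical_p_value`.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeated double-edge swaps.

    edges is a list of (u, v) pairs with no self-loops or duplicates.
    Each accepted swap turns (a, b), (c, d) into (a, d), (c, b).
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    accepted, attempts = 0, 0
    while accepted < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:      # choose one of the two possible pairings
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 == new2:
            continue                # self-loop or degenerate swap
        if new1 in edge_set or new2 in edge_set:
            continue                # would create a duplicate edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        accepted += 1
    return edge_list


def degree_counts(edges):
    """Number of edges touching each node."""
    return Counter(node for edge in edges for node in edge)


def empirical_p_value(real_score, rewired_scores):
    """Fraction of rewired networks whose score exceeds the real score."""
    exceed = sum(1 for s in rewired_scores if s > real_score)
    return exceed / len(rewired_scores)


# Toy network: an 8-node ring with two chords.
edges = [(i, (i + 1) % 8) for i in range(8)] + [(0, 4), (2, 6)]
rewired = rewire(edges, n_swaps=20, seed=1)
print(degree_counts(rewired) == degree_counts(edges))
```

Because every node keeps its degree in each rewired network, a hub protein is compared against networks in which it is still a hub, which is how this kind of test controls for node degree.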
We identified enriched functions for sets of crosstalk proteins using Ontologizer 2.0.16 Results are shown using default settings with Parent-Child-Union association and the Benjamini-Hochberg method of multiple hypothesis correction. The background set consisted of all human proteins in the interactome according to MiMI.17 All network images were produced using the Cytoscape18 network visualization software.
We aimed to (i) identify proteins that may be mediators of crosstalk between angiogenesis-associated protein families and (ii) characterize their association with angiogenesis. We accomplished the first aim by application of graph diffusion on the human molecular interaction network followed by verification of statistical significance. We accomplished the second aim using a previously reported time series gene expression experimental dataset taken during angiogenesis.15
We used the human physical interactome as a basis for the analysis. We used a graph theoretic technique called graph diffusion to quantify the topological proximity between proteins in the interactome19 (see Methods). The graph diffusion method, also known as the diffusion kernel, allowed us to quantify the proximity between a single protein and a protein family. We refer to this measure of proximity between a protein and a protein family as the diffusion kernel score (DKS). A protein with a high DKS interacts closely with the protein family. For example, consider the family of type IV collagen fibrils: a protein that physically interacts with all type IV collagens would receive a high DKS, while a protein that only indirectly interacts with type IV collagens would receive a relatively lower DKS. We use the DKS to estimate the association between a single protein and a family of proteins.
To locate those proteins that potentially mediate crosstalk between families, we define crosstalk proteins as those that are highly associated with multiple protein families (i.e. the proteins have a DKS which is greater than a threshold for multiple families). For example, a crosstalk protein for type IV collagens and CXC chemokines would have many direct and indirect interactions with both protein families. We evaluate the statistical significance of a crosstalk protein by considering hundreds of rewired networks. We create each rewired network by repeatedly swapping interactions. The statistical test that we use for crosstalk proteins controls for the size of the protein families and the degree of protein interaction.
Using this approach, we found 126 proteins that were topologically close to the angiogenesis-associated protein families. We evaluated the quality of the protein annotations by their statistical significance and functional enrichment in angiogenesis. To put this network in the context of the known human interactome, it contains less than 1% of the proteins (0.93%) and of the interactions (0.25%). The analysis pointed to many proteins whose role in angiogenesis is well known, which serves as a validation of the approach. There are 194 human proteins that have angiogenesis as part of a GO annotation (as of 6/2010). The likelihood that a protein is annotated with angiogenesis by chance is 0.014. Excluding 31 seed proteins, our analysis of the human protein-protein interaction network identifies 4 proteins that have angiogenesis as part of their GO annotation. The probability that the 95 (i.e. 126 associated − 31 seeds) proteins contained 4 angiogenesis annotated proteins by chance is 0.045. We calculated the p-value using Fisher’s exact test. Our analysis suggests new or understudied modulators of angiogenesis. These centrally located proteins may be attractive targets due to their potential to manipulate multiple protein families.
In Figure 2, we show a Venn diagram to illustrate the associations of the 126 proteins. These proteins are topologically close to the type IV collagens, CXC chemokines or TSP1-containing proteins or some combination of families, as indicated by the figure. The figure gives the putative crosstalk between three angiogenesis-associated protein families: type IV collagens (blue), CXC chemokines (red), and TSP1-containing proteins (green). The crosstalk proteins are shown for CXC chemokines and type IV collagens (purple), CXC chemokines and TSP1-containing proteins (tan), type IV collagens and TSP1-containing proteins (yellow), and between all three (orange). The number of angiogenesis-associated proteins is shown in parentheses. In the results, we focus on the proteins associated with multiple families. First, we discuss crosstalk proteins between type IV collagen and TSP1-containing proteins. Then, we highlight six proteins identified as crosstalk proteins between all three families. These six proteins (three syndecans, MMP9, CD44, and versican) may be important mediators of crosstalk for these angiogenesis-associated protein families.
Crosstalk between pathways is an important concept in biology. There have been both computational20 and experimental21 efforts to identify crosstalk between pathways. Some of these approaches are not suitable in this context because they rely on overlapping pathways to identify crosstalk. Alternate approaches might consider “first neighbors” or “second neighbors” to identify association between pathways or modules. These rigid approaches have the inherent disadvantage of being unable to identify crosstalk between modules of distance 2 for “first neighbors” or distance 3 for “second neighbors.” Other studies used shortest paths to help define crosstalk proteins.20,22 These methods borrow from concepts such as betweenness centrality. Because graph diffusion considers all paths, our method has inherent advantages over those that only consider shortest paths between proteins.
To motivate the use of the graph diffusion method, we compared it head-to-head with three alternative methods based on first neighbors, second neighbors, and betweenness centrality. In Table 1, we show the results of this comparison. We found that graph diffusion identified more statistically significant proteins at both the 0.01 and 0.05 levels. The graph diffusion method also identified a more functionally cohesive set of proteins, as demonstrated by the number of GO term enrichments at the 0.001 and 0.0001 levels.
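The enrichment calculation can be sketched as a one-sided hypergeometric tail, which is what Fisher's exact test reduces to for a single 2×2 table. The counts below are assumptions pieced together from the text (13,491 molecules in the interactome, 194 angiogenesis-annotated proteins, 95 non-seed proteins, 4 annotated hits). The background the authors actually used, for example whether seed proteins were removed from it, is not stated, so the result need not reproduce the reported 0.045 exactly.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` items are sampled without replacement
    from `population` items of which `annotated` carry the label."""
    total = comb(population, drawn)
    upper = min(drawn, annotated)
    favourable = sum(
        comb(annotated, x) * comb(population - annotated, drawn - x)
        for x in range(observed, upper + 1)
    )
    return favourable / total


# Assumed counts, taken from figures quoted in the text.
p = hypergeom_upper_tail(population=13491, annotated=194, drawn=95, observed=4)
print(f"one-sided p = {p:.3f}")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that arise when multiplying many small probabilities in floating point.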
To further validate the role of the crosstalk proteins in angiogenesis we reanalysed a time series gene expression dataset taken during VEGF-induced angiogenesis. We expected that many crosstalk proteins would have perturbed gene expression during angiogenesis. If this proved to be the case, the microarray dataset would provide additional evidence of the role of crosstalk proteins in angiogenesis.
A research team led by Claesson-Welsh took measurements from a gene expression time series of VEGF-induced capillary endothelial tube formation in a 3D collagen matrix in vitro.15 The dataset included 8 time points: 15 min, and 1, 3, 6, 9, 12, 18, and 24 h of VEGF stimulation. We reanalysed these data to identify the transcription profiles that are significantly increasing or decreasing during tube formation (that we refer to as angiogenesis). To accomplish this, we ranked transcripts by the absolute value of the covariance between the transcript measurements and the time points. We tested the null hypothesis that the crosstalk proteins are uniformly distributed among the ranked list of genes. We computed the family-wise error rate (FWER) p-value using gene set enrichment analysis,23 which is based on the Kolmogorov-Smirnov test followed by permutation testing. We found the crosstalk proteins significantly enriched at the head of the ranked list of perturbed genes (p = 3 × 10^-4). In Table 2, we give the trajectory of gene expression during VEGF-induced angiogenesis. We measure the trajectory of gene expression change by the covariance between the gene expression and the time points. The statistical test indicates that many of the crosstalk proteins have either increasing or decreasing gene expression during angiogenesis. This analysis helped confirm the importance of these crosstalk proteins in VEGF-induced angiogenesis and serves as a validation of our bioinformatics analysis.
We studied the association between type IV collagens and TSP1-containing proteins to reveal the mediators of crosstalk between these two families. In Figure 3, the crosstalk proteins between type IV collagens and TSP1-containing proteins are highlighted in yellow. A significant number of these proteins bind collagen and associate with the vesicle lumen (Table 3). CD36 is also known to interact with type IV collagens.24 The identification of CD36 as a crosstalk protein for type IV collagens and TSP1-containing proteins helps confirm our approach. Decorin (DCN) is another proteoglycan that we identify as a crosstalk protein. Decorin interacts with collagens and extracellular matrix (ECM) and promotes angiogenesis.25 Fibronectin 1 (FN1) is an important connective molecule in the extracellular space. FN1 has domains for collagens, fibulin 1, heparin, and syndecan binding.26 We identify FN1 as a crosstalk protein between type IV collagens and TSP1-containing proteins. FN1 connects extracellular collagens with membrane-bound integrins (Figure 3). As such, FN1 has a central role in endothelial cell adhesion to the ECM. Another important conduit of information between TSP1-containing proteins and type IV collagens runs through aggrecan (ACAN) and brevican (BCAN) via fibulin 2 (FBLN2).27,28 The crosstalk between type IV collagens and TSP1-containing proteins through ACAN, BCAN, and FBLN2 has not been reported in the context of angiogenesis, although it is known that FBLN2 inhibits tumor angiogenesis.28 The crosstalk between type IV collagens and TSP1-containing proteins may be significantly influenced by fibronectin 1, aggrecan, brevican, and fibulin 2. The amyloid beta (A4) precursor protein (APP) is also annotated as a crosstalk protein between type IV collagen and TSP1-containing proteins. Figure 3 shows the direct interaction between APP and COL4A1, COL4A2, COL4A5, COL4A6 and TSP1-containing spondin 1 (SPON1).
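The ranking step can be sketched as below. The time points are those of the dataset; the gene names and expression values are made up for illustration, and the use of the sample covariance (dividing by n − 1) is an assumption, since the text does not say which estimator was used. The enrichment test on the ranked list is not reproduced here.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def rank_by_trend(expression, times):
    """Sort transcripts by |cov(expression, time)|, largest first."""
    trend = {gene: covariance(values, times) for gene, values in expression.items()}
    ranked = sorted(trend, key=lambda gene: abs(trend[gene]), reverse=True)
    return ranked, trend


times_h = [0.25, 1, 3, 6, 9, 12, 18, 24]    # hours of VEGF stimulation
expression = {                               # made-up values for illustration
    "gene_flat": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0, 2.0],
    "gene_up":   [1.0, 1.1, 1.3, 1.6, 2.0, 2.3, 3.0, 3.6],
    "gene_down": [5.0, 4.9, 4.8, 4.5, 4.3, 4.0, 3.6, 3.2],
}

ranked, trend = rank_by_trend(expression, times_h)
for gene in ranked:
    print(f"{gene:10s} cov = {trend[gene]:+.2f}")
```

Taking the absolute value places both steadily rising and steadily falling transcripts at the head of the list, while the sign of the covariance records the direction of change.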
APP is known to be associated with Alzheimer’s disease.29 It is also known that Alzheimer’s disease is related to angiogenesis.30 This study suggests angiogenesis might influence Alzheimer’s disease through the association between APP and type IV collagens and TSP1-containing proteins.
We were also interested in identifying the potential avenues of crosstalk between type IV collagens, CXC chemokines, and TSP1-containing proteins. We identified six proteins that are well connected to all three families of angiogenesis-associated proteins. In Figure 3, we show the crosstalk proteins between all three families in orange. A significant number of these proteins bind collagen and are localized on the cell surface (Table 3). MMP9 was identified as a crosstalk protein between the three families of angiogenesis-associated proteins. MMP9 is known to degrade type IV collagens31 and CXC chemokines like PF4.32 Thrombospondins are known to regulate the amount of MMP9.33 These functions outline the pivotal role of MMP9 in association with angiogenesis. Although MMP9 degrades many proteins, the interaction between MMP9 and the angiogenesis-associated protein families is highly significant (Table 2, p=0.004).
Our work highlights syndecan 1 (SDC1), syndecan 2 (SDC2), and syndecan 4 (SDC4) at the centre of crosstalk between type IV collagens, CXC chemokines, and TSP1-containing proteins. Syndecans have been previously implicated in angiogenesis.34 Endothelial CD44 plays an important role in tube formation during angiogenesis.35 Our study suggests that CD44 may operate as a mediator of crosstalk between type IV collagens, CXC chemokines, and TSP1-containing proteins. Note that WISP-1, a TSP1-containing protein, is connected to the type IV collagen family through Bone Morphogenetic Protein 3 (BMP-3). An anti-angiogenic peptide derived from WISP-1, previously identified as having relatively low anti-proliferative and anti-migratory activity in vitro,5 showed significant in vivo activity in corneal and laser-induced choroidal neovascularization mouse models.9
Versican (VCAN) is the last protein in the set of centrally located proteins. VCAN is involved in the attachment of endothelial cells to the extracellular matrix. The importance of VCAN in angiogenesis could easily be missed by other methods that only consider the direct interactions. VCAN has only a few physical protein-protein interactions, and it has only one direct interaction with the angiogenesis-associated proteins (i.e. ADAMTS1). Still, our analysis highlights VCAN as a potential component of crosstalk between type IV collagens, CXC chemokines, and TSP1-containing proteins. Using the quantitative comparison shown in Table 1, we confirmed that local approaches like first neighbors (p=0.046) and second neighbors (p=0.11) would have missed VCAN, while non-local approaches like graph diffusion (p=0.008) and betweenness centrality (p=0.006) would have identified the significance of VCAN at the 0.01 level. We identify six proteins at the center of the type IV collagen, CXC chemokine, and TSP1-containing protein network. These proteins, SDC1, SDC2, SDC4, MMP9, CD44, and VCAN, appear to be important components of angiogenesis, based on their position within the angiogenesis-associated network.
Figure 3 reflects the three families of angiogenesis-associated proteins and the putative crosstalk identified between each family. We identified proteins that either directly or indirectly interact with many of the proteins from individual families. The association of proteins to angiogenesis-associated families was computed using graph diffusion. By identifying proteins that are well connected to multiple protein families, we identified proteins that are likely to represent conduits of crosstalk between these important angiogenesis-associated families. Statistical analysis and the incorporation of a time series gene expression dataset helped confirm the role of these proteins in angiogenesis.
In our study of type IV collagens, CXC chemokines, and TSP1-containing proteins, we identified many classes of proteins that are known to be associated with angiogenesis such as vascular endothelial growth factor A (VEGFA) as well as other families that receive less attention such as proteoglycans decorin (DCN), aggrecan (ACAN), brevican (BCAN), and versican (VCAN). We also identified six proteins that appear to be at the center of the network between type IV collagens, CXC chemokines, and TSP1-containing proteins. Those proteins are syndecan 1 (SDC1), syndecan 2 (SDC2), syndecan 4 (SDC4), versican (VCAN), CD44, and matrix metalloproteinase 9 (MMP9). These proteins may facilitate crosstalk between type IV collagens, CXC chemokines, and TSP1-containing proteins.
We examined protein-protein interactions (PPI) that are related to three angiogenesis-associated protein families: type IV collagen fibrils, CXC chemokine ligands, and TSP1 domain-containing proteins. To our knowledge, this work represents the first integrated network analysis of these angiogenesis-associated protein families. We identified several proteins that appear to be important mediators of crosstalk yet have received relatively little attention, such as the proteoglycans decorin (DCN), aggrecan (ACAN), brevican (BCAN), and versican (VCAN). We identified syndecans at the centre of the network associating type IV collagens, CXC chemokines, and TSP1-containing proteins.
The work was supported by NIH grants R01 HL101200 and R01 CA138264. The authors would like to thank Emmanouil Karagiannis for helpful discussions at the initial stage of the project. We would also like to thank Sofie Mellberg and Lena Claesson-Welsh for use of their time series gene expression dataset.
Competing interests: The authors declare no competing interests.
Authors’ contributions: CGR implemented the method, performed the analysis, generated the images and wrote the paper. ASP and JSB designed the study and edited the paper.