Protein-protein interaction network analysis
The topology of the preeclampsia PPI network (Figure
) resembles the general topology of PPI networks, with a degree distribution that follows a power law
] and therefore shows scale-free properties. In networks of this type, a few nodes are far more highly connected than the others. These highly connected nodes (hubs) generally represent biologically important proteins/genes and are therefore treated with special attention.
PPI network and topology. Left) PPI network. Right) Degree distribution, which follows a power law.
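The degree distribution and hub identification described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the edge list and node labels are toy values, and the least-squares fit on log-log axes is only a rough diagnostic of power-law behavior (the text does not state how the fit was made).

```python
import math
from collections import Counter

def node_degrees(edges):
    """Degree of every node in an undirected graph given as (u, v) pairs."""
    # Deduplicate edges and drop self-loops so each interaction counts once.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    return degree

def degree_distribution(degree):
    """Map each degree value to the number of nodes that have it."""
    return Counter(degree.values())

def loglog_slope(dist):
    """Least-squares slope of log(count) against log(degree).

    Needs at least two distinct degree values; a clearly negative slope
    is compatible with (but does not prove) a power-law distribution.
    """
    xs = [math.log(k) for k in dist]
    ys = [math.log(c) for c in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy network (hypothetical labels, not genes from the study).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("B", "C")]
degrees = node_degrees(toy_edges)
dist = degree_distribution(degrees)
slope = loglog_slope(dist)
hubs = [node for node, _ in degrees.most_common(1)]  # most connected node
```

In the toy network, node "A" has the highest degree and is returned as the hub; on a real PPI edge list the same functions would give the degree counts plotted in the right-hand panel.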
The 50 highest-scoring genes that are also present in the initial set (347 genes) are shown in Table
; other genes also obtained high scores but are not part of the initial gene set. As expected, some of the selected genes, such as FN1, FLT1, F2, VEGFA, PGF, TNF, NOS and INHBA, are well-known preeclampsia-related genes (see Discussion), and several of them are involved in signaling pathways.
Top 50 genes obtained by analysis of the PPI network
A total of 27 communities (k = 3) covering 161 genes were identified by communality analysis. Figure
(Left) shows the communities that are superimposed to form a large connected graph. The genes involved in community overlaps are also highly represented in Table
(as are the members of the large community). The model-based clustering analysis reveals an optimal number of 8 clusters (BIC = 152192.4) under an ellipsoidal model with equal volume and shape and variable orientation. The genes are grouped in the clusters (C1 to C8) as follows: C1 (67), C2 (56), C3 (1806), C4 (59), C5 (133), C6 (23), C7 (95) and C8 (161). C8 and C4 have the highest mean scores, 393.3 and 348.9 respectively, and contain all 100 highest-scoring genes (whether or not they belong to the initial gene set). Furthermore, the 161 genes of C8 are the same genes detected in the communality analysis.
Figure 2 Communality and cluster analysis. Left) Representation of the largest connected community. Red nodes represent the genes involved in community overlaps. White nodes represent the largest community. Right) Representation of the C8 and C4 clusters and ...
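As an illustration of how overlapping communities of this kind can be obtained, the sketch below implements clique percolation for k of 3: every triangle is a 3-clique, two triangles are adjacent when they share an edge, and each community is the union of a chain of adjacent triangles. This assumes that the k value quoted above refers to k-clique percolation, which the text does not state explicitly; the edge list and node labels are toy values.

```python
from collections import Counter, defaultdict
from itertools import combinations

def k3_clique_communities(edges):
    """Clique-percolation communities for k = 3 (node labels must be sortable)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Enumerate all triangles (3-cliques); the set removes duplicates.
    triangles = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triangles.add(frozenset((u, v, w)))
    triangles = list(triangles)

    # Union-find over triangles: triangles sharing an edge are merged.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    by_edge = defaultdict(list)
    for i, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            by_edge[edge].append(i)
    for idxs in by_edge.values():
        for j in idxs[1:]:
            parent[find(j)] = find(idxs[0])

    groups = defaultdict(set)
    for i, tri in enumerate(triangles):
        groups[find(i)] |= tri
    return list(groups.values())

def overlapping_nodes(communities):
    """Nodes that belong to more than one community."""
    counts = Counter(node for com in communities for node in com)
    return {node for node, c in counts.items() if c > 1}

# Toy network (hypothetical labels): two triangles sharing edge B-C,
# a third triangle touching them only at D, and a pendant node G.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F"), ("F", "G")]
communities = k3_clique_communities(toy_edges)
overlap = overlapping_nodes(communities)
```

Here node "D" belongs to both communities, which corresponds to the red (overlapping) nodes in the left-hand panel, while "G" belongs to no triangle and is left out of every community.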
Gene ontology (GO) enrichment analyses were performed on all clusters obtained. However, for simplicity only C4 and C8 are presented (Figure
Right). The GO analysis reveals that C8 comprises several processes related to angiogenesis, apoptosis and cell proliferation, and shares with C4 several processes involved in cell activation and biological adhesion. The relation between these processes, together with the fact that both clusters contain the highest-scoring genes, could indicate that these clusters are particularly relevant to the gene-disease relationship. These processes are also well known to be involved in preeclampsia and are consistent with the pathway enrichment analysis.
Diseases and metabolic pathway enrichment analysis
Several types of diseases were found to be statistically significant in the enrichment analysis; partial results are presented in Table
. As expected, preeclampsia and hypertension are present in the results. The GAD database contains several disease classes and, besides those presented in Table
, others such as hematologic (p-value = 6.1E-06) and renal diseases (p-value = 2.5E-04) were also significant. Although we present only the results obtained with the GAD database, the analysis was also performed with the OMIM database, which confirmed ovarian cancer (p-value = 0.019) and also indicated colorectal cancer as statistically significant (p-value = 0.011). However, the GAD results were better and more consistent with the literature and with the pathway enrichment analysis.
The diseases enrichment analysis
It is important to consider that several genes in the PPI network have no known relation to specific diseases, at least none reported in the GAD or OMIM databases. Only around 30% of the 2400 genes were found in these databases. This limitation means that we have to be cautious about the preeclampsia gene-disease relationships and about the reliability of the statistical p-values, even though some important and significant inferences can be made.
A similar situation occurred with the pathway enrichment analysis (Table
). Although the KEGG database is the most representative of our gene space and its results are highly coherent with the physiopathology of PE, they cover only around 50% of the initial 2400 genes. The same procedure was performed with the Reactome and BioCarta databases, which gave lower coverage (37% and 27% respectively) and results highly coherent with those in Table
. These databases reveal other significant pathways such as NGF, PDGF, BMP, EPO and EGFR signaling, as well as apoptosis and hemostasis pathways (data not shown). Some cancer pathways (e.g. prostate, pancreatic and lung) were also found to be statistically significant in KEGG but were excluded from Table
for simplicity, because many of them share reactions with the general cancer pathway already presented in Table
The KEGG pathway enrichment analysis
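Enrichment p-values such as those reported for the disease and pathway analyses are commonly obtained with a one-sided hypergeometric test. The sketch below shows that calculation under the assumption that such a test applies here; the text does not name the test or the tool that was used, and the numbers in the example are toy values, not results from the study.

```python
from math import comb

def enrichment_pvalue(background, annotated, selected, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    background: number of annotated genes in the reference set
    annotated:  genes in the reference set that carry the term
    selected:   size of the gene list being tested
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

# Toy example: 10 background genes, 3 carry the term, 2 genes selected,
# at least 1 of them carries the term.
p = enrichment_pvalue(background=10, annotated=3, selected=2, hits=1)
```

Only genes that appear in the annotation database can enter the background or the tested list, so the partial coverage noted in this section (around 30% to 50% of the 2400 genes) directly limits what such a test can detect.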
To simplify and clarify the pathways involved and their relationship with the selected hubs, the two were merged into a single representation (Figure
). However, it is important to emphasize that, of the 50 hubs previously selected, only 22 show a significant association with the pathways in Table
. The genes NDRG1, LGALS3BP, BANF1, SGTA, TRIM29, RGS20, PLEC, GRN, ST13, AKAP5, FSTL3, DST, PKIA, QKI, MLF2 and KRT19, for example, were not found in the KEGG database.
Gene-metabolic pathway interactions. The genes and pathways were selected after hub detection (see Table
) and enrichment analysis respectively (see Table