SVD Analysis of miRNA-mRNA Interaction Matrices
Before inferring the miRNA-mRNA network modules for PPC and MPC, we sought a general understanding of the differences between the two correlation matrices calculated from the miRNA/mRNA expression levels of the primary and metastatic tumor samples, respectively (see the Method section for details). However, because of the enormous number of potential miRNA-mRNA regulatory or co-expression relationships, the complicated interplay among them, and the noise introduced during sampling and measurement, a pairwise comparison of the corresponding matrix elements was too fine-grained to support a conclusion. We circumvented this obstacle with a novel approach: we treated each correlation matrix as a pseudo dataset, with the rows (mRNAs) as “observations” and the columns (miRNAs) as “features”, and characterized it by SVD (Singular Value Decomposition) analysis. Because correlation coefficients are bounded in the range [-1, 1], we transformed the matrix entries using Fisher’s r-z method prior to the decomposition so that the results could be interpreted with standard statistical theory.
This analysis led to two interesting findings. First, for both the PPC and MPC correlation matrices, the leading latent factor explains over half of the variance in the data (-A1, -A2), and the scores (in the first left singular vector) across genes (-B1, -B2) show a clear two-peak distribution. Second, the PPC matrix differs from the MPC matrix by a substantial second latent factor that explains ~20% of the variance and has an asymmetric score distribution (in the second left singular vector) (-C1), which deviates from the symmetric counterpart for MPC (-C2). The strength of this analysis is further highlighted by the fact that the observed patterns are absent from a random data set, as shown in -A3, -B3 and -C3, where SVD analysis was conducted on a matrix generated by shuffling the rows and columns of the PPC matrix. While the biological implications need further investigation, these findings provide preliminary support for our initial hypothesis that the dynamic interference of miRNAs differs between these two types of prostate cancer and is worth further exploration.
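The decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the clipping constant `eps`, the absence of column centering, and the synthetic `demo_matrix` are all assumptions made here. The randomized control is also an assumption: permuting whole rows and whole columns leaves the singular values unchanged, so the sketch shuffles individual entries instead.

```python
import numpy as np

def svd_summary(corr, eps=1e-6):
    """Fisher r-to-z transform a correlation matrix, then decompose it by SVD.

    Rows are treated as observations (mRNAs), columns as features (miRNAs).
    Returns the left singular vectors (scores), the singular values, and the
    fraction of the total sum of squares carried by each latent factor.
    """
    # Clip so that arctanh stays finite at r = -1 or r = 1 (assumed safeguard).
    r = np.clip(np.asarray(corr, dtype=float), -1 + eps, 1 - eps)
    z = np.arctanh(r)  # Fisher's r-to-z transformation
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u, s, explained

def shuffle_entries(corr, seed=0):
    """Randomized control: permute all matrix entries (assumed shuffling scheme)."""
    rng = np.random.default_rng(seed)
    arr = np.asarray(corr, dtype=float)
    return rng.permutation(arr.ravel()).reshape(arr.shape)

def demo_matrix(n_mrna=200, n_mirna=20, seed=0):
    """Synthetic stand-in for a correlation matrix with one dominant factor."""
    rng = np.random.default_rng(seed)
    # Two gene groups with opposite sign give a two-peak score distribution.
    gene_sign = np.where(np.arange(n_mrna) < n_mrna // 2, 0.5, -0.5)
    loading = rng.uniform(0.5, 1.0, size=n_mirna)
    noise = rng.normal(0.0, 0.1, size=(n_mrna, n_mirna))
    return np.clip(np.outer(gene_sign, loading) + noise, -0.95, 0.95)
```

Calling `svd_summary(demo_matrix())` and `svd_summary(shuffle_entries(demo_matrix()))` allows the explained-variance profile of a structured matrix to be compared with that of its shuffled counterpart.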
The singular value decomposition (SVD) analysis of miRNA-mRNA transcriptional correlation matrices.
It is worth noting that, in preparing , we used the information on 58 cancer miRNAs collected in . However, the presented results largely held when the SVD analysis was conducted on the entire datasets (matrices). In addition, we recognize that the differences between PPC and MPC may be determined by a core set of miRNAs. This view is based on an additional analysis showing that the patterns demonstrated in also held on the sub-matrices containing only the tumor suppressor miRNAs, but not on the sub-matrices containing only the oncogene miRNAs. We will continue to investigate this problem in future research.
miRNA-mRNA Modules in Different Prostate Cancer Subtypes
Two miRNA-mRNA networks (PN and MN) were generated by a correlation-based method for PPC and MPC, respectively. To focus the analysis on the cancer-related miRNAs, the dimensions of PN and MN were further reduced as described in the Method section. With the condensed networks as inputs, major cancer-related miRNA-mRNA modules were identified by a clustering-based algorithm. Within an individual module, each miRNA or mRNA has at least two connections with the corresponding modular mRNAs or miRNAs.
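The connectivity constraint stated above (every miRNA and mRNA in a module keeps at least two connections) can be illustrated with a small pruning routine. This sketch covers only that constraint and is not the clustering-based module identification algorithm itself; the function name `prune_module` and the edge-list representation are assumptions made for illustration.

```python
def prune_module(edges, min_degree=2):
    """Iteratively drop miRNA-mRNA edges until every remaining node has
    at least `min_degree` connections inside the module.

    `edges` is an iterable of (mirna, mrna) pairs (hypothetical representation).
    Returns the surviving set of edges, which may be empty.
    """
    edges = set(edges)
    while True:
        mirna_degree, mrna_degree = {}, {}
        for mirna, mrna in edges:
            mirna_degree[mirna] = mirna_degree.get(mirna, 0) + 1
            mrna_degree[mrna] = mrna_degree.get(mrna, 0) + 1
        kept = {
            (mirna, mrna)
            for mirna, mrna in edges
            if mirna_degree[mirna] >= min_degree and mrna_degree[mrna] >= min_degree
        }
        if kept == edges:  # stable: no node violates the constraint
            return kept
        edges = kept
```

Because removing one edge can push a neighboring node below the threshold, the pruning is repeated until the edge set stops changing.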
As summarized in , and , we identified 5 miRNA-mRNA module pairs (MPs) each for PPC and MPC. Each MP includes one positive-connection (correlation) module and one negative-connection (correlation) module. The number of miRNAs in each module ranges from 2 to 8, and the number of mRNAs (genes) from 6 to 622. Most of the modules contain sequence-specific DNA binding transcription factor (TF) genes that may act as mediators of the miRNA-mRNA connections. The two members (such as p-modu-1-ps in PPC) of each MP contain the same miRNAs but different mRNAs. Two modules of distinct MPs (such as p-modu-1-ps in PPC) consist of different miRNAs and different (or partially overlapping) mRNA sets. Within a positive- or negative-connection module (with the “-ps” or “-ne” extension in its ID), the correlations between the involved miRNAs and mRNAs at the expression level are consistently positive or negative, respectively. Regardless of the connection type, the mRNA set in each module largely resembles a co-expressed gene cluster. Except for the four modules of the two MPs (p-modu-3-ps (-ne) and m-modu-4-ps (-ne)) in which most miRNAs belong to the let-7 family, the modules for MPC hardly overlap with the modules for PPC in terms of the involved miRNAs () and protein-coding genes (Table S1). Further insights into the differences between PPC and MPC can be inferred by scrutinizing the biological implications of the identified modules.
The summary of miRNA-mRNA correlation-network modules.
Identifying miRNA-mRNA correlation-network modules in primary prostate cancer (PPC) by the hierarchical clustering algorithm.
Identifying miRNA-mRNA correlation-network modules in metastatic prostate cancer (MPC) by the hierarchical clustering algorithm.
Here we need to point out that the identified modules are not necessarily canonical regulatory modules, in which all the connections between miRNAs and genes are determined by causal regulator-target relationships. In fact, because accurately recognizing miRNA target sites remains an unsolved problem, it is challenging to identify a nontrivial and exact canonical regulatory module. Nonetheless, we can expect to find a less strict regulatory module, hereafter called a “semi-canonical” regulatory module, in which relationships potentially determined by the miRNA-mediated mRNA degradation mechanism predominate among the miRNA-mRNA connections. Based on this widely accepted theory, semi-canonical regulatory modules, if they exist, should be among the negative-connection modules and can be identified by target site enrichment analysis. We explored this by establishing the miRNA-mRNA sequence affinity matrix and conducting Fisher’s exact test (see the Method section). As a result, we found that p-modu-5-ne, a module identified for PPC, was the only semi-canonical module: the 3′UTR sequences of its 622 mRNAs were significantly enriched (p<1.0E−4) with the target site motifs of the 7 modular miRNAs (hsa-miR-106b, -200c, -19b, -92a, -92b, -93, and -141) (). This module, therefore, represents the most significant difference between PPC and MPC in terms of the observed miRNA-mRNA correlations. It suggests that post-transcriptional regulation mediated by the documented cancer miRNAs directly contributes to the expression variability of the protein-coding genes across the tumor samples for PPC but not for MPC.
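The enrichment test mentioned above can be sketched as a one-sided Fisher's exact test on a 2×2 table. The layout of the table (module genes versus background genes, with or without a predicted target site) is an assumption made here for illustration; the actual construction from the sequence affinity matrix is described in the Method section, and the function name is hypothetical.

```python
from math import comb

def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

    a: module genes with a target site     b: module genes without one
    c: background genes with a target site d: background genes without one
    Returns P(X >= a) under the hypergeometric null distribution.
    """
    n_module = a + b
    n_site = a + c
    total = a + b + c + d
    denominator = comb(total, n_module)
    upper = min(n_module, n_site)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    tail = sum(
        comb(n_site, k) * comb(total - n_site, n_module - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator
```

A small p-value indicates that target sites occur among the module genes more often than expected from the background.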
miRNA target site enrichment analysis for module p-modu-5-ne.
The implications of the identified modules need to be further inferred through the functional annotations of the modular genes, since there is great biomedical interest in elucidating the relationships between the activity of modular miRNAs, as regulators or biomarkers, and the variability of specific biological processes. Using the DAVID database, we found that over 330 GO terms and KEGG pathways, including the TGF-beta signaling pathway, were over-represented (BH adj.p<0.05) among the 622 genes of the semi-canonical module p-modu-5-ne. The genes within other individual modules also showed significant functional similarity. For example, m-modu-4-ne, a negative-connection module discovered for MPC, was enriched with genes in the GO:0007186~G-protein coupled receptor protein signaling pathway (BH adj.p 1.14E−40). Table S2 summarizes the functional enrichment analysis results for the modular genes.
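The BH-adjusted p-values quoted above follow the Benjamini-Hochberg procedure. The adjustment can be sketched as below; this is a generic illustration of the procedure with a hypothetical function name, not the code used by the annotation database.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term is then reported as over-represented when its adjusted value falls below the chosen threshold, 0.05 in the analysis above.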
We also examined the differences between these two prostate cancer subtypes by studying the distribution of the KEGG pathways over-represented in the individual modular gene sets. As shown in , module p-modu-4-ps for PPC is unique among the identified modules regarding the functions of its genes. About two dozen pathways, such as type I diabetes mellitus, are over-represented in its gene list, and most of them are related to disease processes and/or immune response. Three miRNAs (hsa-miR-146a, -150 and -223) are included in this module. The involvement of miR-146a in the immune response has been widely investigated. One study showed that miR-146a is a modulator of IL-2 and activation-induced cell death in lymphocytes. Another reported that it mediates an inflammatory circuit in Alzheimer’s disease and stressed human brain cells. A further study also demonstrated that HSV-1 infection of human brain cells can induce the expression of miR-146a. These findings not only indicate that miR-146a takes a functional role in immune response as a regulatory factor, but also suggest that its dynamic activity (expression), in some specific contexts, may simply be a passenger of disease and/or immune processes. This is exactly the main message conveyed by module p-modu-4-ps, in which the positive correlations between the modular miRNAs and genes cannot be simply explained by a regulator-target mechanism.
Over-represented KEGG pathways in the individual modular gene sets of 10 major miRNA-mRNA modules discovered for primary prostate cancer (PPC) or metastatic prostate cancer (MPC).
Olfactory receptors (ORs) are expressed not only in the sensory neurons of the olfactory epithelium, but also in various other tissues where their potential functions are largely unknown. A recent publication reported that the activation of an olfactory receptor (PSGR) inhibits the proliferation of prostate cancer cells. The results of our analysis show that the olfactory transduction pathway is over-represented in the gene lists of 4 major modules (m-modu-2-ne) in MPC (). The involved miRNAs include hsa-miR-200a/-200b, hsa-miR-15a/-26a/-29c, hsa-miR-7a/-7e/-7f/-98, and hsa-miR-107/-26b. Although the miRNA set in p-modu-3-ps (-ne) largely overlaps with that in m-modu-4-ne, no module in PPC has a functional relationship with olfactory transduction. Therefore, we speculate that the activation of PSGR and its (direct or indirect) association with miRNAs are confined to MPC.
Focal-adhesion kinase (FAK) is an important mediator of growth-factor signaling, cell proliferation, cell survival and cell migration. Mouse models have shown that FAK expression is increased in human tumors. A recent study demonstrated that focal adhesion controls prostate cancer progression. In this study, we found that miRNAs interfere in the transduction of FAK signaling and thus may play roles in cancer development. As shown in , focal adhesion (together with 6 related KEGG pathways such as ECM-receptor interaction) is over-represented in the gene lists of 3 major modules identified for PPC (p-modu-1-ps). In contrast to olfactory transduction in MPC, the association of FAK signaling with miRNAs seems limited to PPC. These findings indicate another major difference between the two cancer subtypes. Experimental investigation of this issue could be promising for the diagnosis of prostate cancer.
Potential Interference of miRNAs in TGF-beta Signaling Pathway
The transforming growth factor-beta (TGF-beta) maintains tissue homeostasis and plays a crucial role in suppressing the proliferation of cancer cells. As mentioned above, the TGF-beta signaling pathway is over-represented in the gene (mRNA) set of the semi-canonical regulatory module, p-modu-5-ne. Of the 622 modular genes, a dozen encode proteins in the pathway. These 12 genes have 56 negative connections with the 7 modular miRNAs, and nearly one third of the connections are compatible with the potential regulator-target relationships determined by the sequence affinity information (). Based on this finding and the theories described in (pages 198–224), we generated a hypothesized model () to show the interference of the modular miRNAs with TGF-beta signaling and the proliferation of cancer cells. In , the genes negatively correlated with the miRNAs in the modular network are highlighted in red. Gene CDKN1A is also marked in red because of its significant negative correlations (p<0.001) with hsa-miR-93, -106b and -200c at the expression level. Genes E2F5, CMYC (MYC) and CDK4 show an apparent pattern of positive transcriptional correlations with the modular miRNAs and are highlighted in yellow. The relationships presented in suggest that the modular miRNAs interfere with the disease process of primary prostate cancer by an oncogenic mechanism in the measured PPC samples. More specifically, the expression of those miRNAs inversely regulates the transcript intensity of two TGF-beta receptor genes, TGFBR1 and TGFBR2, and a cancer suppression gene, SMAD3. Then, through the activation or inactivation of the Smad2-3/Smad4/E2f complex, this effect is reflected in the expression levels of CDKN1A and CMYC, and ultimately influences cell cycle arrest and the inhibition of cell proliferation.
Modular miRNA-mRNA network for TGF-beta signaling pathway.
The interference of miRNAs in module p-modu-5-ne in the TGF-beta signaling pathway.
Expression Variability of miRNAs in Module p-modu-5-ne
As discussed above, in p-modu-5-ne, the identified miRNAs directly regulate the transcript intensities of the modular mRNAs in the PPC samples. In this regard, it is important to elucidate the etiology of the biological variability (across tumor samples) of the miRNA expression levels that determine the miRNA-mRNA connections. First, we noted that most of the identified miRNAs, including hsa-miR-19b, -92a, -93 and -106b, have been reported as targets of transcription factors of the E2F family. Meanwhile, we also observed positive connections between these miRNAs and E2F5 at the expression level ( and ). Therefore, it is possible that the biological variability of the miRNAs is due to regulation by E2F5. However, such a mechanism needs further investigation, since the actual picture of the regulation and/or mutation of the TF gene itself is still unclear. Next, we asked whether the modular miRNA expression levels were related to the progression of prostate cancer. To investigate this issue, we grouped the 98 PPC samples by the expression levels of the 7 modular miRNAs using a hierarchical clustering algorithm and compared the result with the Gleason score-based classes. No association was found between the two partitions. Finally, we proposed and tested the hypothesis that the biological variability of the miRNAs arises from regulation by protein-coding genes (including cancer genes) in which mutations occur sporadically in individual tumor samples. By clustering analysis, we first generated two partitions of the 98 PPC samples, based respectively on the expression levels of all 7 modular miRNAs and on the transcript intensities of 34 potential cancer-driving genes in the tumor samples, as shown in figure 1 of . Then, using a Chi-square test, we found that the association between the two partitions was highly significant (p<0.001), which supports our hypothesis.
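The comparison of two sample partitions by a Chi-square test can be sketched as follows. The number of clusters in each partition is not stated in the text, so this illustration assumes cluster labels are already available and reports a p-value only for the two-by-two case (one degree of freedom, no continuity correction); the function name is hypothetical.

```python
from collections import Counter
from math import erfc, sqrt

def chi_square_association(labels_a, labels_b):
    """Pearson chi-square test of association between two partitions
    of the same samples.

    Returns (statistic, degrees_of_freedom, p_value). The p-value is
    computed only when df == 1 (assumed 2x2 case) and is None otherwise.
    """
    n = len(labels_a)
    observed = Counter(zip(labels_a, labels_b))  # contingency table cells
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)
    statistic = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # For one degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2)) if df == 1 else None
    return statistic, df, p_value
```

When two partitions assign the samples to clusters in nearly the same way, the statistic is large and the p-value small; unrelated partitions give a statistic near zero.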
The transcriptional correlations of miRNAs in modules p-modu-5-ne (ps) with TGF-beta signaling pathway downstream genes.