|Home | About | Journals | Submit | Contact Us | Français|
MicroRNAs are a class of noncoding RNA molecules that co-regulate the expression of multiple genes via mRNA transcript degradation or translation inhibition. Since they often target entire pathways, they may be better drug targets than genes or proteins. MicroRNAs are known to be dysregulated in many tumours and associated with aggressive or poor prognosis phenotypes. Since they regulate mRNA in a tissue specific manner, their functional mRNA targets are poorly understood. In previous work, we developed a method to identify direct mRNA targets of microRNA using patient matched microRNA/mRNA expression data using an anti-correlation signature. This method, applied to clear cell Renal Cell Carcinoma (ccRCC), revealed many new regulatory pathways compromised in ccRCC. In the present paper, we apply this method to identify dysregulated microRNA/mRNA mechanisms in ovarian cancer using data from The Cancer Genome Atlas (TCGA).
TCGA Microarray data was normalized and samples whose class labels (tumour or normal) were ambiguous with respect to consensus ensemble K-Means clustering were removed. Significantly anti-correlated and correlated genes/microRNA differentially expressed between tumour and normal samples were identified. TargetScan was used to identify gene targets of microRNA.
We identified novel microRNA/mRNA mechanisms in ovarian cancer. For example, the expression level of RAD51AP1 was found to be strongly anti-correlated with the expression of hsa-miR-140-3p, which was significantly down-regulated in the tumour samples. The anti-correlation signature was present separately in the tumour and normal samples, suggesting a direct causal dysregulation of RAD51AP1 by hsa-miR-140-3p in the ovary. Other pairs of potentially biological relevance include: hsa-miR-145/E2F3, hsa-miR-139-5p/TOP2A, and hsa-miR-133a/GCLC. We also identified sets of positively correlated microRNA/mRNA pairs that are most likely result from indirect regulatory mechanisms.
Our findings identify novel microRNA/mRNA relationships that can be verified experimentally. We identify both generic microRNA/mRNA regulation mechanisms in the ovary as well as specific microRNA/mRNA controls which are turned on or off in ovarian tumours. Our results suggest that the disease process uses specific mechanisms which may be significant for their utility as early detection biomarkers or in the development of microRNA therapies in treating ovarian cancers. The positively correlated microRNA/mRNA pairs suggest the existence of novel regulatory mechanisms that proceed via intermediate states (indirect regulation) in ovarian tumorigenesis.
Ovarian cancers have a high mortality rate and few treatment options and the failure rate is high . In spite of significant advances in detection and effort to reduce recurrence rates, the five year survival rate has remained relatively unchanged for over 50 years .
The Cancer Genome Atlas (TCGA)  is a public database which provides multi-modal, patient matched data, including microRNA and mRNA expression levels as well as clinical data (survival, recurrence and treatment), for large cohorts in several cancers, including serous cystadenocarcinoma (the most common type of ovarian cancer). In this paper, we apply a validated method  to identify statistically significant microRNA/mRNA regulations which are disrupted in serous cystadenocarcinoma using patient matched data from TCGA. Although mRNA and microRNA dysregulations in ovarian cancer have been identified in various studies [5-15], regulatory mRNA targets of microRNA have not been established. Our analysis proceeds as follows:
We first identify those mRNA or microRNA which can individually and robustly distinguish ovarian cancer samples from normal ovary samples based on expression level. We then identify putative mRNA potentially regulated by the microRNA by matching microRNA/mRNA using seed sequence complementarity from TargetScan http://www.targetscan.org). Finally, we look for a significant correlation/anti-correlation signature between patient matched microRNA/mRNA in tumor samples or normal samples. This procedure allows us to identify microRNA/mRNA regulations which are common (maintained) between tumor and normal tissue as well as microRNA/mRNA regulations that are disrupted in carcinogenesis .
Using several statistical tests (Students T-test with FDR correction, unpaired t-test without FDR correction, and Mann-Whitney test with FDR correction), K-Means Clustering  and principal component analysis , our analysis found 18 microRNA and 49 mRNA which best distinguish tumour from normal. Using seed sequence matches between all putative pairs between these sets (from TargetScan) and Pearson correlation at significance p < 0.05 and < 0.1 (one tailed test) in the tumour and normal samples, respectively, we found forty microRNA/mRNA pairs of potential interest, including fifteen pairs anti-correlated across all tumour samples and seven pairs anti-correlated across the normal samples. Within the anti-correlated pairs, one pair: hsa-miR-140-3p/RAD51AP1, was anti-correlated in both tumour and normal samples with opposite expression levels in tumor/normal, implying the dysregulation of a direct microRNA/mRNA mechanism. In addition, we also identified microRNA/mRNA pairs that were correlated or anti-correlated in the tumour samples but not in the normal samples, suggesting a de novo gain of microRNA function in tumours. Also present were microRNA/mRNA regulations present in normal samples but not in tumour samples, indicating a de novo loss of microRNA function in tumours. Interestingly, there are also pairs identified which show positive correlation in both tumour and normal samples, implying the potential existence of indirect pathways or intermediate regulatory mechanisms with a possible role in ovarian tumorigenesis.
The Cancer Genome Atlas (TCGA) is a central bank for multidimensional experimental cancer data, including MicroRNA and cDNA microarray data. These data were obtained from the Data Access Matrix within the TCGA data portal (http://cancergenome.nih.gov/dataportal/data/access/). cDNA microarray experiments measuring mRNA expression were run on the Affymetrix HG-U133A platform (22,277 probesets). microRNA experiments were performed on the Agilent 8 × 15 K Human microRNA-specific microarray V2 platform measuring the expression of 821 microRNAs. Of the 386 TCGA microRNA data samples (378 tumours and 8 normals) and 294 mRNA data samples (including one cell line experiment, which was eliminated), filters were applied such that only samples withboth microRNA/mRNA data were retained. When this was done, we were left with 290 samples (282 tumors, 8 normals). Outliers were then removed as described below, leaving a final cohort of 264 samples (258 tumors, 6 normals).
The TCGA dataset consisted of 386 samples with microRNA expression data (378 tumours and 8 normals) and 294 samples with mRNA expression data (including one cell line experiment). Of these, we retained only samples which had data for both microRNA and mRNA levels. This resulted in a reduced dataset with 290 total samples (282 tumours, 8 normals). Further removal of samples with ambiguous class labels with respect to consensus ensemble clustering reduced this to 258 tumour samples and 6 normals (see below).
The mRNA expression data was normalized using MAS5.0 summarization. We then standard normalized the data: The mean of the summarized intensities for each gene across all experiments was subtracted from the individual intensity for that same gene within a given experiment (per-gene mean subtraction). This value was then divided by the standard deviation of the summarized intensities for each gene across all experiments.
We further normalized the data by performing a 75th percentile shift using GeneSpring GX 11.0 (Agilent Technologies, Inc., Santa Clara, CA, USA). In order to compare our data with previous literature , these values were then per-gene mean subtracted and divided by the per-gene standard deviation (similar to the mRNA normalization) to produce normalized measurable microRNA expression values.
An ambiguous tumour/normal sample is defined as one that does not robustly cluster with members of its labelled class (tumour or normal). Such samples were identified and eliminated using both mRNA and microRNA data. This was done by using consensus ensemble K-means clustering, using the public software ConsensusCluster (http://code.google.com/p/consensus-cluster/). mRNA expression data for the 290 samples (282 tumours and 8 normals) was clustered via unsupervised K-Means consensus clustering into two classes . Most of the tumour samples clustered in a manner consistent with their labels (tumours with tumours and normal with normal) across the dataset. However, 20 samples clustered in an ambiguous way (sometimes with normal and sometimes with tumours) and were eliminated (Figure (Figure1),1), leaving 270 samples for to be analysed for their microRNA levels.
We found that microRNA levels varied over a much smaller range than mRNA levels. Hence a similar analysis of the microRNA data required that uninformative microRNA be removed first. We therefore eliminated microRNA whose expression had little or no variation across tumour/normal sample clusters, retaining only those whose expression levels could significantly distinguish tumour samples from normal samples. This was done using three statistical tests:
a) A Students' unpaired t-test with a Benjamini Hochberg FDR multiple testing correction (p < 0.05) between the tumour and normal samples within GeneSpring GX 11.0;
b) A Mann-Whitney test with a Benjamini Hochberg FDR multiple testing correction (p < 0.05) between the tumour and normal samples within GeneSpring GX 11.0;
c) A Students' unpaired T-test without a multiple testing correction (p < 0.05).
Only microRNA that passed all three tests were retained, which resulted in a robust subset of 127 microRNA (Figure (Figure22 and Additional file 1: Table S1) whose expression levels significantly separated tumour samples from normal samples. Consensus ensemble K-means clustering into two clusters using expression levels of these 127 microRNA was performed on the 270 samples that remained after analysis of the mRNA data (see above). This identified six additional samples that did not retain consistent cluster membership across bootstrap samplings of the data. These six samples were also removed.
These two analyses on mRNA and microRNA levels identified our final sample set of 264 samples (258 tumours and 6 normals) which are listed in Additional file 2: Table S2. We note in that our analysis showed that two of the TCGA samples which are labelled as "normal" (TCGA-01-0628-11 and TCGA-01-0631-11) consistently clustered with the tumour samples, suggesting that they might contain a significant contamination from the tumour. These were eliminated from further analysis.
After removal of these ambiguous samples, the ConsensusCluster software was used to perform principal component analysis (PCA) using expression levels of the 127 microRNAs. PCA analysis was performed 43 times, using the six remaining normal samples and six randomly selected tumour samples (without replacement) as input. The first two principal component eigenvectors (PC1 and PC2) from these 43 runs were averaged. Figure Figure33 shows the PCA plot obtained on projecting the samples onto the two averaged eigenvectors. A robust subset of 18 microRNA was identified as those which appeared most often (35 times out of 43) in these datasets as significantly able to distinguish tumour from normal using a signal-to-noise ratio (SNR) test using SNR > 0.5 as a cutoff. These 18 microRNAs with the most significant values of this eigen-score across the 43 datasets were retained for further analysis. A similar procedure was used for the mRNA, once again validating (using PCA) that the normal samples indeed separate from the tumour samples (Figure (Figure4).4). Furthermore, upon iterating 43 times once again, the SNR filtering provided 53 genes that were most informative in separating tumour from normal samples in a robust manner, with each gene appearing in at least 20 of the 43 lists generated by ConsensusCluster.
TargetScan (http://www.targetscan.org) was used to obtain putative mRNA targets for all 18 microRNAs that best distinguished tumour samples from normals. In the 18 × 53 matrix of microRNA/mRNA pairs, we retained those where the microRNA had a seed sequence in TargetScan which was identified as either 'conserved' or 'poorly conserved'. For each such microRNA/mRNA pair, we computed the Pearson rank correlation function separately within the normal and tumour samples. We retained those which had a p-value significance (http://danielsoper.com/statcalc3/calc.aspx?id = 44) < 0.5 for this statistic in tumour samples, and < 0.1 in the normal samples.
At the time of this study, the TCGA ovarian cancer data set consisted of 282 tumour samples and 8 non-matching normal samples with both microRNA and mRNA expression data available. The microRNA data consists of expression levels for 821 microRNA measured on an Agilent 8 × 15 K Human miRNA-specific microarray platform. The gene expression levels were measured on the Affymetrix GeneChip Human Genome HG-U133A array platform consisting of 22,277 probes. Clinical and survival data were also available for all samples, including: age, days to death (if applicable), days to last follow-up, days to tumour progression, days to tumour recurrence, and age at initial pathologic diagnosis. As described in the Methods section (above), after normalizing, removing ambiguous samples and finding the optimum set of microRNA and mRNA, we were left with 258 tumor samples, 6 normals, 18 microRNA and 53 mRNA.
Of the 18 microRNA that robustly distinguished tumours from normals, nine (hsa-miR-183*, hsa-miR-15b*, hsa-miR-15b, hsa-miR-590-5p, hsa-miR-18a, hsa-miR-16, hsa-miR-96, hsa-miR-18b, and dmr_285) were up-regulated and nine (hsa-miR-145*, hsa-miR-143*, hsa-miR-34b*, hsa-miR-140-3p, hsa-miR-145, hsa-miR-139-5p, hsa-miR-34c-3p, hsa-miR-133a, and hsa-miR-34c-5p) were down-regulated in tumour compared to normal. K-means clustering using only these 18 microRNA levels confirmed that these microRNA clearly separate tumour from normal samples (Figure (Figure55).
Of the 53 mRNA probes which robustly separated tumour from normal samples, 44 (BUB1, TPX2, CDC25C, ASPM, C1orf112, KIF23, CENPA, HJURP, CCNA2, TTK, CCNB2, C12orf48, BIRC5, RAD51AP1, RACGAP1, MELK, KIFC1, NCAPG, EXO1, KDM2A, EHMT2, DNA2, E2F3, C8orf30A, FAM64A, CORO1B, HEATR3, NCAPH, PSMB2, ERCC6L, KIF15, ESPL1, RANGAP1, KIF11, SCAMP5, NUSAP1, GINS1, ZWINT, ASF1A, an unannotated probe 217205_at, and two probes each for TOP2A and AURKA) were up-regulated and 9 (DNAH3, C6, ADH6, SERPINA6, GCLC, DNAI1, SPARCL1, and two probes for DNAH9) were down-regulated in tumour compared to normal. K-means clustering of using only data from these 53 mRNA confirmed that these mRNA clearly separate tumour from normal samples (Figure (Figure6).6). We note that the sub-clusters in Figure Figure66 suggests the presence of two (or more) distinct disease subtypes within the tumour samples. We will further explore these potential disease subclasses in a subsequent paper. Figure Figure77 shows a heat map of the data projected on the 53 genes that best separate tumour samples from normal samples, and shows that they define a highly accurate signature for this separation.
Of the 18 × 53 possible microRNA/mRNA pairs, 69 pairs showed seed sequence complementarity and conservation in TargetScan. The Pearson Rank test at p < 0.5 identified 21 significant pairs (Table (Table1)1) in the tumour samples. Of these, fifteen pairs (including nine mRNA) showed a significant anti-correlation signal while six showed a significant positive correlation signal. A similar analysis of the normal samples, at p < 0.1, identified nineteen significant pairs (Table (Table2)2) of which seven (with six distinct mRNA) exhibited anti-correlation and twelve were positively correlated.
We found that of the fifteen microRNA/mRNA pairs anti-correlated in tumour, one (hsa-miR-140-3p/RAD51AP1) was anti-correlated and fourteen either showed no correlation or were positively correlated across the normal samples. Furthermore, of the six positively correlated pairs in tumours, three showed no correlation and three showed positive correlation within the normal samples, implying a potential indirect mechanism of ovarian tumorigenesis. Of the seven pairs anti-correlated in normal samples, all (except miR-140-3p/RAD51AP1) showed no correlation across the tumour samples, implying a potential de novo loss in microRNA function in tumours. Of the twelve pairs which are positively correlated in normal samples, two show anti-correlation within the tumour samples, implying a potential de novo gain in microRNA function in tumours. The remaining ten pairs, which are positively correlated in normal samples, showed either a positive correlation or no correlation in tumour samples. These are of interest because they suggest some novel indirect mechanisms in ovarian tumorigenesis.
The methods we applied find potential functional relationships that are good candidates for experimental validation. Many of these microRNA/mRNA pairs impact critical cellular functions that are frequently dysregulated in cancer. For instance, RAD51AP1, a gene that functions in double-stranded DNA repair, is also positively expressed in cervical cancer  and breast cancer cell lines , cancers that exhibit high expression homogeneity with malignant ovarian carcinomas. Our analysis shows that RAD51AP1 is also up-regulated in ovarian cancers. Furthermore, hsa-mir-140-3p, a potential regulator of RAD51AP1 according to our anti-correlation analysis, has been previously shown to be down-regulated in ovarian cancer , a finding which we confirm. Interestingly, the anti-correlated relationship exhibited with this pair is also present in the normal samples. This suggests that if RAD51AP1 is acting as an oncogene in ovarian tumours then targeting hsa-mir-140-3p may be a way to target RAD51AP1.
The E2F transcription factor 3 (E2F3), a gene that is up-regulated in serous ovarian carcinomas, regulates crucial cell cycle and tumour suppressor genes [21-30]. We find that this gene is over-expressed in ovarian tumours. Moreover, hsa-miR-145, a microRNA that is highly down-regulated in ovarian cancer [5,8,9], shows significant expression anti-correlation with E2F3 in normal samples, indicating potential transcriptional repression by hsa-miR-145 in normal cells but loss of this function in tumours. Another potential regulator of E2F3, hsa-miR-139-5p, is also predicted to target Topoisomerase IIα (TOP2A), a gene encoding an enzyme which is involved in altering DNA topology, including chromosome condensation, chromatid separation and the relief of torsional stress occurring in transcription and replication. TOP2A has also been shown to be overexpressed in ovarian tumours  and is currently a common target in ovarian cancer clinical trials. Glutamate-cysteine ligase (GCLC), an enzyme of glutathione synthesis predicted to be regulated by hsa-miR-133a in tumours (in a de novo gain of microRNA function) and by hsa-miR-140-3p (in a de novo loss of microRNA function), has been shown to be an anti-apoptotic gene that is positively expressed in ovarian cancer cell lines [32,33].
In addition to many of the genes above which have normal microRNA regulation disrupted by a loss of microRNA function (Table (Table1)1) in tumours, several genes regulated by microRNAs with de novo gains in function exist as well. Baculoviral inhibitor of apoptosis repeat-containing 5 (BIRC5), or Survivin, is a protein predicted to be regulated by hsa-miR-96 that inhibits caspase activation  leading to negative regulation of apoptosis . Furthermore, increased presence of this protein has been shown to decrease apoptosis in cisplatin-sensitive ovarian carcinoma cells , a finding that is consistent with the up-regulation of this gene within the TCGA tumour data set. Additionally, another gene found to be up-regulated in tumours in our dataset, Rac GTPase-activating protein 1 (RACGAP1) is an enzyme whose overexpression has also been associated with serous ovarian carcinoma .
We also found six positively correlated microRNA/mRNA pairs with matching seed sequences in tumour and eleven pairs in normal. We reasoned that a likely explanation for this lies in an intermediate regulatory mechanism that is probably anti-correlated with the microRNA of interest. Consequently, it is important to note that some of the positively correlated genes/microRNA pairs also impact important cellular functions frequently dysregulated in cancer. Also noteworthy is that three of the pairs show positive correlation in both the tumour and normal samples. This finding suggests that while regulation of gene expression by these microRNA is indirect, physiological differences between tumour and normal samples is directly dependent on the changes in expression by these microRNA and the downstream effects they have on the target mRNA they are paired with. For instance, nucleolar and spindle associated protein 1 (NUSAP1), is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization  that is positively correlated with hsa-miR-18a in normal but not in tumour. It has also been previously shown to be positively regulated in cervical carcinoma [19,36]. Another gene of potential interest is cell division cycle 25 homolog C (CDC25C), a critical gene involved in cell division that dephosphorylates cyclin B-bound CDC2 (CDK1) and triggers entry into mitosis . It might also have a role in suppressing p53-induced growth arrest and has been shown to be overexpressed in ovarian cancer cell lines [27,38]. Finally, GINS complex subunit 1 (GINS1), which was shown to be significantly positively correlated with hsa-miR-18a and hsa-miR-18b in both tumour and normal samples (and positively correlated with hsa-miR590-5p in normal samples), represents a key component of the GINS complex that is essential for initiation of DNA replication and is positively regulated in serous ovarian carcinoma . Interestingly, hsa-miR-18a, which possesses a significant positive correlation with GINS1 (along with NUSAP1) has shown to be highly up-regulated in serous ovarian carcinoma both previously  and within our analysis.
Hsa-miR-16, mentioned previously to be anti-correlated in tumours with several potential mRNA targets including E2F3, C8ORF30A, and SCAMP5, is significantly positively correlated with CDC25C. Should seed sequence complementarity and conservation between these positively correlated pairs be purely coincidental, the relationships between the pairs of positively correlated entities suggest an elaborate mechanism by which these oncogenes/tumour suppressors operate. Further wet-lab validation may show this to be a significant ovarian cancer biomarker. This finding may further elucidate a broader cancer pathway involving CDC25C and some of its target tumour suppressor and cell cycle genes.
Using microRNA/mRNA expression data for 258 tumor samples and six normal samples from TCGA, we have uncovered forty mRNA/microRNA regulation which meet the following criteria: a) These pairs have seed sequence complementarity b) They are able to clearly differentiate ovarian tumours from normal ovary by their expression levels; c) They exhibit a strong anti-correlation or correlation signature within either the tumour samples, the normal samples, or both.
The novelty of our finding is that we have significantly reduced the space of possible dysregulated microRNA/mRNA pairs (based on seed sequence complementarity alone) to a robust subset which may potential represent functional relationships. Such a possibility can be tested experimentally on ovarian cancer cell lines knock down and/or knock in assays. If in addition, some of these dysregulations are associated with survival pathways for the tumor or with resistance to therapy, it opens up the possibility of patient specific targeted therapy.
All expression data is available for download at The Cancer Genome Atlas Data Portal (http://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp).
The authors declare that they have no competing interests.
GM obtained the data, performed most of the analysis, implemented the methods, interpreted the results, and wrote the manuscript, MS performed portions of the analysis, suggested alternative interpretations/analyses to perform, GB, LR, GR helped design and supervise the study. All authors have read and approved the final manuscript. The authors declare that they have no conflict of interest with any of the contents of this manuscript.
Table S1 MicroRNAs separating tumour from normal samples. 127 microRNAs are significantly differentially expressed and robustly separate tumour samples from normal samples.
Table S2 Samples used for analysis. This table identifies the 258 tumor samples and 6 normal samples with matched microRNA/mRNA data from the TCGA sample set which were retained for analysis ambiguous samples were removed using consensus ensemble clustering (see Methods for details).
We thank NIH/NCI for allowing us access to the TCGA datasets. Without this access, the present work would not have been possible.