Bladder cancer (BC) has two clearly distinct phenotypes. Non-muscle invasive BC has a good prognosis and is treated with tumor resection and intravesical therapy, whereas muscle invasive BC has a poor prognosis and usually requires systemic cisplatin-based chemotherapy either before or after radical cystectomy. Neoadjuvant chemotherapy is not often used for patients undergoing cystectomy. High-throughput omics techniques are now available that allow the identification of individual molecular signatures characterizing the invasive phenotype. However, much of the data produced by omics experiments is not easily accessible, since it is often scattered over many publications or stored in supplementary files.
To develop a novel open-source database, BcCluster (http://www.bccluster.org/), dedicated to the comprehensive molecular characterization of muscle invasive bladder carcinoma.
A database was created containing all reported molecular features significantly associated with invasive BC. The query interface was developed in the Ruby programming language (version 1.9.3) using the Rails web framework (version 4.1.5) (http://rubyonrails.org/).
BcCluster contains data from 112 published references, providing 1,559 statistically significant features related to BC invasion. The database also holds protein-protein interaction data for 435 proteins and 92 molecular pathways significantly associated with BC invasion. The database can be used to retrieve binding partners and pathways for any protein of interest. We illustrate this using survivin, a known BC biomarker.
BcCluster is an online database for retrieving molecular signatures related to BC invasion. It offers a comprehensive view of BC invasiveness at the molecular level and supports the formulation of research hypotheses relevant to this phenotype.
Urothelial bladder carcinoma (BC) is a common malignancy of the urinary tract, with 468,351 new bladder cancer cases and 179,753 deaths predicted for the year 2015 by WHO GLOBOCAN (http://globocan.iarc.fr/Default.aspx) [1, 2]. Although symptoms vary between individual patients, initial symptoms include hematuria and flank pain [3, 4]. Cystoscopy is the gold-standard diagnostic procedure, with a reported sensitivity of 62–84% and specificity of 43–98%. This wide range of sensitivity and specificity indicates considerable inter-operator variability. Non-muscle-invasive BC (CIS, Ta, T1) is treated with resection and intravesical therapy, whereas muscle-invasive tumors (T2, T3, T4) require radical cystectomy, usually combined with systemic cisplatin-based chemotherapy [6–8]. Non-invasive BC comprises two forms, papillary urothelial carcinoma (Ta) and carcinoma in situ (CIS), both of which can progress to invasive disease of the lamina propria (T1) or the detrusor muscle (T2). Muscle invasive tumors have been classified into three distinct molecular subtypes: basal, luminal, and “p53-like” [9, 10]. Another classification of the molecular subtypes of muscle invasive BC, based on data stored in The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/), reported four clusters termed TCGA I, II, III, and IV, as described by Weinstein JN et al. In addition, the Lund classification reports six different BC molecular subtypes. A recent review compares the different classifications of muscle invasive BC subtypes and concludes that muscle invasive BC exhibits significant heterogeneity compared with other tumor types. Moreover, a recent report highlights significant errors in the clinical staging of bladder cancer patients who underwent cystectomy. An accurate staging diagnosis is particularly crucial for selecting patients for surgical treatment (i.e. cystectomy) and for the choice of chemotherapy.
Because cystoscopy is invasive, and to improve the accuracy of phenotype detection, blood or urine biomarkers could support clinical assessment. Biomarkers measured at the DNA, RNA and/or protein level offer the potential to choose optimal surveillance measures and treatment regimens for specific patient populations. A better understanding of muscle invasive BC could be achieved by combining the information obtained from individual biomarkers. Such a comprehensive view could help predict novel molecular targets and improve disease management.
High-throughput experimental platforms, ranging from genomic sequencing to epigenomic, transcriptomic, proteomic and metabolomic profiling, are used to characterize the molecular aspects of clinical phenotypes [17–24]. The advent of these approaches, which generate a comprehensive view of the molecular landscape of a biological sample, has introduced a paradigm shift in the way diseases are perceived [17, 18, 25]. A variety of datasets from such characterizations are available, e.g. in ArrayExpress/Gene Expression Omnibus (GEO) for transcriptomics, Human Proteinpedia for proteomics, the Human Protein Atlas (http://www.proteinatlas.org/) for immunohistochemically validated proteins, and large data consolidation platforms such as GeneCards. For disease-specific omics data, valuable general sources in oncology include TCGA, Oncomine, and Online Mendelian Inheritance in Man (OMIM). The TCGA oncology portal currently lists SNPs, methylation data, mutations, mRNAs, miRNAs and proteins relevant to BC. A recent report presents a systems biology analysis of the muscle invasive BC (MIBC) dataset contained in TCGA. Another BC database, which provides miRNA features identified in the literature (http://bladder.pparser.net/MIRMarkers.php), is also available [29, 30]. In addition, a user-friendly tool called BC-BET allows the evaluation of gene expression profiles determined by microarray studies across bladder cancer patients.
Although omics profiling has provided an abundance of data, technical limitations, including the incompleteness of individual molecular datasets and the static representation of cellular activity, restrict insight into molecular processes and their interaction dynamics [32–34]. A large number of biological pathway analysis tools, including KEGG, PANTHER, Reactome and AmiGO, are described in PathGuide (http://www.pathguide.org/) and allow the detection of significant metabolic and signaling pathways. Previous omics studies report biomarkers associated with bladder cancer, as well as therapeutic targets that could allow the development of personalized therapies [39–43].
Although abundant datasets are readily available, the information gathered from this large number of omics experiments is not fully exploited, because it is scattered across many publications and databases or held in supplementary data files. To allow efficient retrieval of this information and to offer a global view of BC invasion at the molecular level, we developed BcCluster, an open-source bladder cancer database (http://www.bccluster.org/).
The purpose of this publication is to describe the database and to illustrate its use in MIBC research with a specific example (survivin).
To retrieve molecular features associated with muscle invasive bladder cancer, we queried NCBI PubMed, Web of Science, Google Scholar and the omics repositories Gene Expression Omnibus (GEO) and ArrayExpress. The keywords for the literature search were “bladder OR urothelial OR transitional cell” AND “neoplasm OR tumor OR carcinoma” AND “muscle” AND “invas* OR aggress* OR progress* OR inflammation” (database version of June 2015). Publications relevant to muscle invasion in BC were selected from the complete list of retrieved papers. These publications were further screened for adequate sample size (at least 50 samples in the study design), magnitude of differential abundance (>2-fold change for proteomics, transcriptomics, metabolomics and miRNAs), FDR <0.1 for mutations and p-value <0.05 for methylation and omics studies, as well as for the specific phenotypic conditions T2a/b, T3a/b and T4a/b. The MIBC-specific molecular features retrieved from the publications came from various sources: DNA mutations, DNA methylation, mRNAs, miRNAs, proteins (IHC-validated and proteomics) and metabolites. The features were then combined for further systems biology analysis.
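The screening thresholds above can be expressed as a single filter. The sketch below is a minimal illustration, assuming a hypothetical `Feature` record (all field names are invented for this example) and applying the study-level criteria (sample size, stage) per feature for simplicity. It also assumes that ">2-fold" covers both up- and down-regulation and that study types not named in the criteria (such as IHC) use the p < 0.05 cut-off; neither point is specified in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical record for one reported feature; field names are illustrative only.
@dataclass
class Feature:
    symbol: str                 # official gene symbol
    omics: str                  # e.g. "transcriptomics", "mutation", "methylation", "ihc"
    n_samples: int              # number of samples in the source study
    stages: Tuple[str, ...]     # tumour stages covered by the comparison
    fold_change: Optional[float] = None  # abundance ratio, invasive vs. reference
    fdr: Optional[float] = None
    p_value: Optional[float] = None

FOLD_CHANGE_OMICS = {"proteomics", "transcriptomics", "metabolomics", "mirna"}
MIBC_STAGES = {"T2a", "T2b", "T3a", "T3b", "T4a", "T4b"}

def passes_screen(f: Feature) -> bool:
    """Return True if a feature meets the screening thresholds described above."""
    # Study-level criteria: at least 50 samples and at least one MIBC stage.
    if f.n_samples < 50 or not MIBC_STAGES.intersection(f.stages):
        return False
    # Mutations: false discovery rate below 0.1.
    if f.omics == "mutation":
        return f.fdr is not None and f.fdr < 0.1
    significant = f.p_value is not None and f.p_value < 0.05
    if f.omics in FOLD_CHANGE_OMICS:
        # Assumption: ">2-fold" counts changes in either direction.
        fc = f.fold_change
        large_change = fc is not None and fc > 0 and (fc > 2 or fc < 0.5)
        return significant and large_change
    # Methylation (and, as an assumption, IHC and other study types): p < 0.05.
    return significant
```

For example, a transcriptomics hit with a 3-fold change and p = 0.01 from an 80-sample T2a study passes, while the same hit from a 40-sample study does not.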
To integrate molecular features from the various sources for systems biology analysis, the total feature set was filtered in several steps. Although microRNA and metabolomic data are an integral part of the database, they were not considered in the protein-protein interaction and pathway enrichment analyses (i.e. the official gene symbols mapped from metabolomics and miRNA data by querying HMDB and miRBase were excluded). This choice was imposed by the inability to correlate metabolite and microRNA data with specific proteins. Metabolites are part of biochemical pathways composed of several reactions that are catalyzed by many different enzymes, so a metabolite cannot be matched to a single specific enzyme. Likewise, each microRNA has many target genes, and each of these genes is regulated by different microRNAs, so a single microRNA cannot be mapped to a specific gene. From the genomics and epigenetics data, we only incorporated genes for which information on protein/mRNA abundance was available. Finally, the feature set was screened to remove proteins duplicated by combining molecular features from the various omics sources. The resulting, smaller filtered set was then subjected to PPI analysis in the context of muscle invasive bladder cancer.
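A minimal sketch of the filtering for PPI analysis, assuming each feature is reduced to a (gene symbol, source) pair and that the set of genes with protein/mRNA abundance information is supplied as a precomputed input. The function `filter_for_ppi` and the source labels are hypothetical; in the study these steps were partly manual.

```python
# Hypothetical source labels; miRNA and metabolite features cannot be
# mapped one-to-one onto proteins, so they are excluded from PPI analysis.
EXCLUDED_SOURCES = {"mirna", "metabolomics"}
GENOMIC_SOURCES = {"mutation", "methylation"}

def filter_for_ppi(features, symbols_with_abundance):
    """features: iterable of (gene_symbol, source) pairs.
    symbols_with_abundance: set of upper-case gene symbols that have
    protein/mRNA abundance data (stands in for the check described in the text).
    Returns the sorted list of unique gene symbols kept for PPI analysis."""
    kept = set()
    for symbol, source in features:
        symbol = symbol.upper()
        if source in EXCLUDED_SOURCES:
            continue  # not mappable to a single protein
        if source in GENOMIC_SOURCES and symbol not in symbols_with_abundance:
            continue  # genomic/epigenetic hit without abundance information
        kept.add(symbol)  # the set removes cross-omics duplicates
    return sorted(kept)
```

A gene reported by both an IHC and a transcriptomics study therefore appears only once in the output.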
To retrieve protein-protein interaction (PPI) information for the proteins associated with muscle invasive BC, we queried IntAct, BioGRID, STRING and Reactome. All available human proteins, together with their PPI information, were imported into Cytoscape to build a human interactome based on experimental evidence. The MIBC-related proteins were then placed in a separate list, and MIBC proteins with at least one binding partner in that list were retained to generate the muscle invasive BC-specific interactome.
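The interactome construction described above (performed in Cytoscape) amounts to taking the subgraph of the human interactome induced by the MIBC protein list and keeping the proteins that still have at least one edge. The plain-Python sketch below illustrates this, assuming PPIs are given as undirected (protein_a, protein_b) pairs; ignoring self-interactions is our assumption rather than a stated rule.

```python
def mibc_interactome(human_ppis, mibc_proteins):
    """Return (retained proteins, retained interactions) for the MIBC list.

    human_ppis:    iterable of (protein_a, protein_b) pairs, treated as undirected.
    mibc_proteins: iterable of MIBC-associated protein identifiers.
    """
    mibc = set(mibc_proteins)
    edges = set()
    for a, b in human_ppis:
        # Keep an interaction only if both partners are MIBC proteins
        # (self-interactions are skipped by assumption).
        if a != b and a in mibc and b in mibc:
            edges.add(tuple(sorted((a, b))))  # normalise order, drop duplicates
    # Proteins with at least one MIBC binding partner are those on a kept edge.
    retained = {p for edge in edges for p in edge}
    return retained, edges
```

Proteins in the MIBC list that appear on no kept edge are the ones dropped at this step.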
To retrieve molecular pathway information for MIBC, the proteins of the muscle invasive BC interactome were subjected to pathway enrichment analysis using two Cytoscape plug-ins, ClueGO and CluePedia [50–52]. Pathway significance was assessed with a two-sided hypergeometric test. Information from the KEGG and Reactome databases was used to retrieve pathways significantly associated with MIBC (Bonferroni-corrected p-value <0.05). In addition, the list of pathways was inspected manually and redundant pathway terms (names) were merged. The filtered list of pathway terms was then divided into previously known pathways and novel findings in the context of muscle invasive bladder carcinoma.
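For readers unfamiliar with the statistics, the sketch below shows a two-sided hypergeometric test followed by Bonferroni correction. Here N is the number of background proteins, K the number of pathway members among them, n the number of query (MIBC interactome) proteins and k the number of query proteins in the pathway. It uses one common two-sided convention (summing the probabilities of all outcomes no more likely than the observed one); ClueGO's internal implementation may differ in detail.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k pathway members among n query proteins, when the
    pathway has K members in a background of N proteins."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def two_sided_hypergeom_p(k, N, K, n):
    """Two-sided p-value: total probability of all outcomes that are no more
    likely than the observed one (the convention of Fisher's exact test)."""
    p_obs = hypergeom_pmf(k, N, K, n)
    lowest, highest = max(0, n - (N - K)), min(n, K)
    total = sum(p for p in (hypergeom_pmf(x, N, K, n) for x in range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-7))  # small tolerance for float round-off
    return min(1.0, total)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Under this scheme a pathway would be kept when its Bonferroni-corrected p-value is below 0.05.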
The BcCluster database was built on the MySQL database system (http://www.mysql.com/) and stores comprehensive information on published papers, proteins, protein interactions and molecular pathways specifically associated with the muscle invasive phenotype. The query interface for BcCluster is a web application developed in the Ruby programming language using the Rails framework (http://rubyonrails.org/). This framework was chosen because it makes it easy to update the BcCluster database at regular intervals and to incorporate data from newly published articles relevant to BC research. Supplementary Fig. 1, generated with the Dia software, illustrates the architecture of the BcCluster database. The query interface built with the Ruby on Rails framework is detailed in Supplementary Fig. 2.
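BcCluster itself runs on MySQL with a Rails front end; the sketch below instead uses Python's built-in `sqlite3` module so that it is self-contained. The table and column names are hypothetical, chosen to mirror the entities listed above (papers, features, interactions, pathways); the actual schema is the one shown in Supplementary Fig. 1.

```python
import sqlite3

# Hypothetical schema mirroring the stored entities; not BcCluster's real tables.
SCHEMA = """
CREATE TABLE papers (id INTEGER PRIMARY KEY, pubmed_id TEXT UNIQUE, title TEXT);
CREATE TABLE features (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER REFERENCES papers(id),
    gene_symbol TEXT NOT NULL,
    study_type TEXT NOT NULL,      -- e.g. 'IHC', 'transcriptomics'
    comparison TEXT,               -- clinical samples compared
    regulation TEXT                -- relative abundance, e.g. 'up' or 'down'
);
CREATE TABLE interactions (protein_a TEXT, protein_b TEXT,
                           PRIMARY KEY (protein_a, protein_b));
CREATE TABLE pathways (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                       corrected_p REAL, novel INTEGER);
CREATE TABLE pathway_members (pathway_id INTEGER REFERENCES pathways(id),
                              gene_symbol TEXT);
"""

def create_db():
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def features_for(conn, symbol):
    """All literature-derived records for one gene symbol, with PubMed ids."""
    return conn.execute(
        "SELECT p.pubmed_id, f.study_type, f.comparison, f.regulation "
        "FROM features f JOIN papers p ON p.id = f.paper_id "
        "WHERE f.gene_symbol = ? ORDER BY p.pubmed_id",
        (symbol,),
    ).fetchall()
```

Keeping papers in their own table lets one publication back many features, which matches how each of the 112 references contributes several entries.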
The database sources (PubMed, Google Scholar and Web of Science) and omics profiles (GEO and ArrayExpress) provided a total of 112 unique references for manual screening. These references yielded a total of 1,559 molecular features significantly associated with muscle invasive bladder cancer (Fig. 1). The features from individual studies comprised 113 proteins from IHC studies, 424 mRNAs and 280 miRNAs (mapped to official gene symbols by querying miRBase). A further 49 genes were derived from DNA mutations (15 oncogenes and 34 tumor suppressor genes), and metabolomics experiments provided 464 molecular features (mapped to enzyme gene symbols by querying HMDB). In addition, 79 genes were obtained from DNA methylation studies and 150 proteins from proteomics studies. The list of the 112 articles collected in this analysis is provided in Supplementary Table 1.
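As a quick consistency check, the per-source counts reported above add up to the 1,559 features, and the mutation-derived genes split into the stated oncogenes and tumor suppressor genes:

```python
# Per-source feature counts as reported in the text.
feature_counts = {
    "IHC proteins": 113,
    "mRNAs": 424,
    "miRNAs": 280,
    "DNA mutations": 49,
    "metabolomics": 464,
    "DNA methylation": 79,
    "proteomics": 150,
}
mutation_split = {"oncogenes": 15, "tumor suppressor genes": 34}

total = sum(feature_counts.values())
print(total)  # 1559
print(sum(mutation_split.values()) == feature_counts["DNA mutations"])  # True
```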
The miRNA and metabolite datasets were not used for pathway enrichment analysis, because metabolite and microRNA data cannot be correlated with specific proteins. In addition, the available bioinformatics tools (Cytoscape) do not allow efficient integration of these datasets with other omics data. The remaining dataset of significant MIBC features was then processed to eliminate duplicate entries. The filtered set comprised a total of 737 proteins, which were subjected to PPI analysis in the context of muscle invasive bladder cancer. To obtain all available protein interaction information for these 737 muscle invasive bladder cancer-specific proteins, the PPI databases IntAct, BioGRID, STRING and Reactome were queried. Figure 2a illustrates the steps taken to produce the muscle invasive bladder cancer interactome. In the first step, all available PPIs for the human proteome were imported into Cytoscape to form the human interactome. The PPIs involving the 737 MIBC proteins were then retrieved from the human interactome, and only proteins with at least one binding partner among them were retained (Fig. 2a). This step yielded 435 proteins with 4,768 PPIs, which were used in the pathway enrichment analysis. The remaining 302 proteins, which had no binding partner among the 737 MIBC-related proteins, were not retained for further analysis.
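The counts in this section can be reconciled with simple arithmetic. Removing the 280 miRNA and 464 metabolite features leaves 815 entries; assuming no other filtering step, the reduction to 737 unique proteins reflects the genomics filter and duplicate removal together, and the 435 retained plus 302 discarded proteins account for all 737:

```python
total_features = 1559
mirna, metabolite = 280, 464
entering_ppi_filter = total_features - (mirna + metabolite)
print(entering_ppi_filter)  # 815

unique_proteins = 737
# Entries removed by the genomics filter and de-duplication combined.
print(entering_ppi_filter - unique_proteins)  # 78

with_partner, without_partner = 435, 302
print(with_partner + without_partner == unique_proteins)  # True
```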
The 435 proteins of the muscle invasive BC interactome (Fig. 2a) were analyzed with ClueGO and CluePedia to identify significant pathways in the context of MIBC. This analysis retrieved a total of 292 molecular pathways. The list was manually screened to merge redundant pathway names, which reduced it to 92 pathways: 75 previously reported in the literature and 17 novel findings in the context of muscle invasive BC (Fig. 2b).
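The merging of redundant pathway names was done manually. Purely as an illustration, the step can be sketched with a hand-curated synonym map; the function `merge_redundant` and any example term names are hypothetical, and we assume that a merged term keeps the smallest corrected p-value of the terms it absorbs, a choice the text does not specify.

```python
def merge_redundant(pathways, synonyms):
    """Merge redundant pathway terms (illustrative sketch).

    pathways: dict mapping pathway term -> corrected p-value.
    synonyms: hand-curated dict mapping a redundant term to its canonical name;
              terms not listed are kept under their own name.
    Returns a dict canonical name -> smallest p-value among the merged terms.
    """
    merged = {}
    for term, p in pathways.items():
        name = synonyms.get(term, term)
        merged[name] = min(p, merged.get(name, p))
    return merged
```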
To store and efficiently retrieve information on the 112 published articles, 1,559 molecular features, 435 proteins, 4,768 PPIs and 92 molecular pathways associated with muscle invasive bladder carcinoma, we developed a database supported by a web user interface. The BcCluster web portal (http://bccluster.org/biomolecules/study_type), presented in Fig. 3, lists all bladder cancer-associated molecular features by individual study type. Users can browse specific omics traits to obtain information on features associated with bladder cancer at different stages and grades of the disease. The PubMed reference IDs for the retrieved data are also provided, and the application can be queried using the keywords described in Table 1.
To select proteins from literature-mined IHC-validated experiments, each muscle invasive BC-associated protein was checked for membership in the 92 muscle invasive BC enriched pathways. The evidence for the identified pathways and extracted proteins was assessed by the number of individual bladder cancer studies describing these proteins. The quality of the publications obtained for each protein was assessed by manual review, and only papers demonstrating a direct relevance of the protein to bladder cancer muscle invasion were retained. All 92 enriched pathways were ranked by statistical significance (p-value). Pathways that appeared at least once in a bladder cancer study were grouped as literature-known pathways (75 enriched pathways), whereas pathways not previously reported in muscle invasive bladder cancer were categorized as novel findings (17 enriched pathways).
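The ranking and the known/novel split described above can be sketched as follows, assuming the set of pathways reported in at least one BC study has already been curated by hand. The function name and its inputs are hypothetical.

```python
def classify_pathways(pathways, reported_in_bc):
    """Rank pathways by p-value and split them into known and novel.

    pathways:       dict mapping pathway name -> corrected p-value.
    reported_in_bc: set of pathway names found in at least one BC study
                    (a manually curated input in this sketch).
    Returns (known, novel), each a list of names, most significant first.
    """
    ranked = sorted(pathways, key=pathways.get)  # ascending p-value
    known = [name for name in ranked if name in reported_in_bc]
    novel = [name for name in ranked if name not in reported_in_bc]
    return known, novel
```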
Previous studies by Shariat SF et al. report survivin to be a promising biomarker for cancer diagnosis, prognosis and prediction of response to intravesical or systemic therapies. Urinary survivin tests in BC have been shown to outperform urinary cytology, with a sensitivity of 0.77 (95% confidence interval [CI] 0.75–0.80) and a specificity of 0.92 (95% CI 0.90–0.93) [57–60]. Survivin is involved in the inhibition of apoptosis and has been reported to control mitotic progression, in addition to inducing changes in the expression of caspases 3, 7 and 9 that are associated with cell invasion in BC [61–63]. We therefore selected survivin as an example query for BcCluster. The workflow of a simple query to the BcCluster database via the web application is presented in Fig. 4, using the search term BIRC5 (the official gene symbol for survivin). The search results list the data available for this molecular feature in the different published studies, together with other important information such as the study type (e.g. IHC, transcriptomics), the clinical samples compared and the relative amount of the gene/mRNA/protein in these samples. If available, the application also displays the experimentally supported protein-binding partners and the muscle invasive BC pathways significantly enriched for the query term.
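To make the structure of a query result concrete, the sketch below assembles the three parts described above (per-study records, binding partners and enriched pathways) from in-memory data. It is not BcCluster's Rails code; the function `query_feature` and its input formats are assumptions for illustration.

```python
def query_feature(symbol, features, interactions, pathways):
    """Assemble a BcCluster-style result for one gene symbol (sketch).

    features:     list of dicts, each with at least a "symbol" key
                  (plus e.g. "study_type", "pubmed_id", "comparison").
    interactions: iterable of (protein_a, protein_b) pairs, undirected.
    pathways:     dict pathway name -> (corrected p-value, set of member symbols).
    """
    studies = [f for f in features if f["symbol"] == symbol]
    # The other protein of every interaction involving the symbol.
    partners = sorted({b if a == symbol else a
                       for a, b in interactions
                       if symbol in (a, b) and a != b})
    # Pathways containing the symbol, most significant first.
    hits = sorted((p, name) for name, (p, members) in pathways.items()
                  if symbol in members)
    return {"studies": studies,
            "partners": partners,
            "pathways": [name for _, name in hits]}
```

For example, passing the three pathways reported for BIRC5 with their corrected p-values (signal transduction 2.46 × 10⁻¹⁴, Hippo signaling 4.93 × 10⁻⁹, cell cycle 2.55 × 10⁻⁴) returns them in that order, most significant first.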
Survivin was significantly associated with three molecular pathways: cell cycle, Hippo signaling and signal transduction. The three pathways, presented in Table 2, are sorted by corrected p-value. Signal transduction was the most significant pathway (p = 2.46 × 10⁻¹⁴), followed by Hippo signaling (p = 4.93 × 10⁻⁹) and cell cycle (p = 2.55 × 10⁻⁴). Based on this analysis, we predict that survivin is significantly associated with the Hippo signaling pathway in muscle invasive bladder cancer.
The main aim of this study was to develop a database specific to invasive bladder cancer. This was achieved by integrating publicly available information from the literature and sequencing platforms with PPI and pathway enrichment resources in the context of MIBC. The utility of the database is illustrated by extracting information relevant to survivin in BC.
Aggressive MIBC progresses rapidly to metastatic disease and causes high patient mortality. Radical cystectomy combined with systemic cisplatin-based chemotherapy (either before or after cystectomy) is the current standard of care for high-risk MIBC. Treatment selection depends particularly on clinicopathologic features. Unfortunately, current staging systems are not optimal and yield high rates of clinical under-staging, leading to inadequate treatment [9, 14]. Moreover, distinct molecular subtypes of MIBC have been reported, including basal, luminal and, interestingly, a chemotherapy-resistant “p53-like” subtype. Understanding the molecular pathophysiology of muscle-invasive bladder carcinoma and elucidating the network of pathways involved in invasion could lead to targeted therapies. In addition, the discovery of specific deregulated pathways linked to progressive disease holds the promise of an improved, biomarker-based risk assessment followed by appropriate clinical intervention. High-throughput omics platforms have provided a wealth of information describing the molecular status of bladder carcinoma [11, 24, 67–69]. To contribute to the molecular characterization of invasive BC, we integrated omics data into a database called BcCluster. The database was organized at the level of networks and pathways to allow efficient retrieval of information on a single molecular feature (such as mutations, DNA methylation, miRNAs, mRNAs, metabolites and IHC-validated proteins), so that a comprehensive view of the involvement of a feature in BC invasiveness can be obtained. Moreover, researchers can access a list of high-quality data from individual omics platforms (e.g. transcriptomics data) and perform systems biology analyses (protein-protein interactions and pathway enrichment) to characterize bladder cancer at the molecular level.
In addition, users can download the data for each study type relevant to their research. We are willing to provide the complete BcCluster data in compressed format to interested users on request.
One application of the database is the formulation of scientific hypotheses that can be validated experimentally, as illustrated by the example of survivin. There is ample experimental evidence supporting the role of survivin in BC pathology [70–72]. Our approach of integrating PPI and pathway databases yielded three significant pathways for survivin (cell cycle, signal transduction and Hippo signaling). Survivin expression correlates positively with the levels of the signal transduction proteins EGFR (epidermal growth factor receptor) and VEGFA (vascular endothelial growth factor A) in muscle invasive BC. Reactome describes the “signal transduction” pathway as a hierarchy of signaling events that elicit changes in cell state and activity in response to extracellular signals. Examples of signal transduction pathways include signaling by RTKs, PDGF, NGF, TGF-beta, JNK and NF-kB, RAF/MAPK and Rho GTPases. To determine which of these signal transduction pathways are specifically associated with survivin, we manually queried survivin in the Reactome database (http://www.reactome.org/). The most prominent pathway involving survivin was signaling by Rho GTPases. A transcriptomics study found that most Rho family small GTPases (RhoA, RhoB, RhoC, Rac1 and Cdc42) are significantly over-expressed in BC, providing experimental validation of our in silico prediction. The role of survivin in the cell cycle has previously been reported for MIBC, whereas its association with Hippo signaling is a novel finding. Down-regulation of survivin is associated with cell cycle arrest and apoptosis in BC cell lines. This experimental evidence confirms our in silico analysis for the involvement of survivin in the BC pathways reported in the literature. Hippo signaling is a conserved pathway that regulates organ size and is implicated in cancer development.
Survivin expression has been shown to be inversely correlated with Hippo signaling in vascular tumors of the lungs and liver. The available information indicates that survivin is likely involved in the Hippo signaling pathway in BC, and experimental validation of this in silico hypothesis could be attempted.
In summary, data retrieval from the literature and omics studies resulted in a collection of molecular features associated with muscle invasive BC. These features were used to generate the muscle invasive bladder cancer interactome and the pathways most strongly associated with the invasive phenotype. The pathway enrichment analysis yielded significant muscle invasive BC pathways that have previously been reported in the literature, including PI3K signaling, MAPK, focal adhesion, cell cycle, EGFR and ErbB signaling pathways [11, 14, 76–78]. These studies confirm the validity of our systems biology analysis. With the BcCluster application, we aim to support a better understanding of the complex molecular mechanisms of BC invasion. Similar web-based databases are currently available for individual diseases such as lung cancer, breast cancer, liver diseases and kidney disorders, or specifically for personalized medicine. These databases provide information on features extracted from various omics studies (ranging from single nucleotide polymorphisms to transcriptomics, miRNAs, metabolomics, transcription factor binding motifs, PPI networks and pathways). In addition, the features from the individual studies are integrated so that researchers can efficiently retrieve molecular information for a specific gene or protein. For BC, there are limited database resources that allow the molecular characterization of invasive disease. Publicly available BC resources exist, either within the TCGA portal or as databases focused on miRNAs [29–31]. All the molecular features on mutations, mRNAs, miRNAs and proteins from the TCGA-MIBC specific database were incorporated into the BcCluster database.
Comparing the features stored in HLungDb and BcCluster, the lung cancer database contains 2,797 molecular features integrating genes and miRNAs, whereas BcCluster contains 1,559 molecular features covering DNA methylation, DNA mutations, mRNAs, miRNAs, proteins (from proteomics and IHC experiments) and metabolites. In addition, our database holds 4,768 PPIs and 92 BC-significant pathways enriched by our systems biology analysis.
In the future, we plan to expand the BcCluster database further by updating our dataset with published studies and large-scale omics profiles at regular intervals. These updates will allow us to refine the muscle invasive protein interactome, leading to a more comprehensive molecular characterization and the identification of novel modules in the context of muscle invasive BC.
The research leading to these results has received funding from the Marie Curie Actions – BCMolMed under grant agreement no. FP7-PEOPLE-2012-ITN-EID and the European Community’s Seventh Framework Programme under grant agreement no. 306157.
Harald Mischak is the founder and co-owner of Mosaiques Diagnostics, Germany, and Akshay Bhat is employed by Mosaiques Diagnostics. All authors declare that they have no competing interests.
HM, VJ and AV designed and coordinated the study, JZ and MM provided support in data retrieval, and AB performed the analysis and drafted the manuscript. All authors contributed to the interpretation of the results and to drafting the publication, and read and approved the final manuscript.
The supplementary table and figure are available in the electronic version of this article: http://dx.doi.org/10.3233/BLC-150024.