To truly achieve personalized medicine in oncology, it is critical to catalog and curate cancer sequence variants for their clinical relevance. The Somatic Working Group (WG) of the Clinical Genome Resource (ClinGen), in cooperation with ClinVar and multiple cancer variant curation stakeholders, has developed a consensus set of minimal variant level data (MVLD). MVLD is a framework of standardized data elements to curate cancer variants for clinical utility. With implementation of MVLD standards, and in a working partnership with ClinVar, we aim to streamline the somatic variant curation efforts in the community and reduce redundancy and time burden for the interpretation of cancer variants in clinical practice.
We developed MVLD through a consensus approach by i) reviewing clinical actionability interpretations from institutions participating in the WG, ii) conducting an extensive literature search of clinical somatic interpretation schemas, and iii) surveying cancer variant web portals. A forthcoming guideline on cancer variant interpretation, from the Association for Molecular Pathology (AMP), can be incorporated into MVLD.
Along with harmonizing standardized terminology for allele interpretive and descriptive fields that are collected by many databases, the MVLD includes unique fields for cancer variants such as Biomarker Class, Therapeutic Context and Effect. In addition, MVLD includes recommendations for controlled semantics and ontologies. The Somatic WG is collaborating with ClinVar to evaluate MVLD use for somatic variant submissions. ClinVar is an open and centralized repository where sequencing laboratories can report summary-level variant data with clinical significance, and ClinVar accepts cancer variant data.
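A minimal sketch of how a curated record with the cancer-specific fields named above might be represented and checked against a controlled vocabulary. Only Biomarker Class, Therapeutic Context and Effect come from the text; the class name, the other fields, the vocabulary and the placeholder values are assumptions for illustration, not the published MVLD schema.

```python
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical controlled vocabulary, for illustration only.
ALLOWED_BIOMARKER_CLASSES = {"diagnostic", "prognostic", "predictive"}

@dataclass
class VariantRecord:
    gene: str                 # hypothetical descriptive field
    variant: str              # hypothetical descriptive field
    cancer_type: str          # hypothetical descriptive field
    biomarker_class: str      # "Biomarker Class" (named in the text)
    therapeutic_context: str  # "Therapeutic Context" (named in the text)
    effect: str               # "Effect" (named in the text)

def find_problems(record: VariantRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"{name} is empty"
                for name, value in asdict(record).items() if not value.strip()]
    if record.biomarker_class not in ALLOWED_BIOMARKER_CLASSES:
        problems.append("biomarker_class is not in the controlled vocabulary")
    return problems

# Placeholder values only; this is not a real curated assertion.
example = VariantRecord("GENE1", "p.X000Y", "example cancer",
                        "predictive", "DRUG1", "sensitive")
```

Calling `find_problems(example)` returns an empty list; a record with a blank field or an unrecognized biomarker class returns one message per problem.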
We expect the use of the MVLD to streamline clinical interpretation of cancer variants, enhance interoperability among multiple redundant curation efforts, and increase submission of somatic variants to ClinVar, all of which will enhance translation to clinical oncology practice.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-016-0367-z) contains supplementary material, which is available to authorized users.
Cancer genomics; Somatic variant interpretation; Data standard; Somatic variant curation
G-DOC Plus is a data integration and bioinformatics platform that uses cloud computing and other advanced computational tools to handle a variety of biomedical BIG DATA including gene expression arrays, NGS and medical images so that they can be analyzed in the full context of other omics and clinical information.
G-DOC Plus currently holds data from over 10,000 patients selected from private and public resources including Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA) and the recently added datasets from REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT), caArray studies of lung and colon cancer, ImmPort and the 1000 Genomes data sets. The system allows researchers to explore clinical-omic data one sample at a time, as a cohort of samples, or at the level of the population, providing the user with a comprehensive view of the data.
G-DOC Plus tools have been leveraged in cancer and non-cancer studies for hypothesis generation and validation, biomarker discovery, and multi-omics analysis; to explore somatic mutations and cancer MRI images; and for training and graduate education in bioinformatics, data and computational sciences. Several of these use cases are described in this paper to demonstrate its multifaceted usability.
G-DOC Plus can be used to support a variety of user groups in multiple domains to enable hypothesis generation for precision medicine research. The long-term vision of G-DOC Plus is to extend this translational bioinformatics platform to stay current with emerging omics technologies and analysis methods to continue supporting novel hypothesis generation, analysis and validation for integrative biomedical research. By integrating several aspects of the disease and exposing various data elements, such as outpatient lab workup, pathology, radiology, current treatments, molecular signatures and expected outcomes over a web interface, G-DOC Plus will continue to strengthen precision medicine research. G-DOC Plus is available at: https://gdoc.georgetown.edu.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1010-0) contains supplementary material, which is available to authorized users.
Bioinformatics; Translational research; Precision medicine; Cloud computing; Variant analysis; Next generation sequencing; Outcomes research; Genotype-phenotype integration
Serum metabolite profiling in Duchenne muscular dystrophy (DMD) may enable discovery of valuable molecular markers for disease progression and treatment response. Serum samples from 51 DMD patients from a natural history study and 22 age-matched healthy volunteers were profiled using liquid chromatography coupled to mass spectrometry (LC-MS) for discovery of novel circulating serum metabolites associated with DMD. Fourteen metabolites were found significantly altered (1% false discovery rate) in their levels between DMD patients and healthy controls while adjusting for age and study site and allowing for an interaction between disease status and age. Increased metabolites included arginine, creatine and unknown compounds at m/z of 357 and 312 while decreased metabolites included creatinine, androgen derivatives and other unknown yet to be identified compounds. Furthermore, the creatine to creatinine ratio is significantly associated with disease progression in DMD patients. This ratio sharply increased with age in DMD patients while it decreased with age in healthy controls. Overall, this study yielded promising metabolic signatures that could prove useful to monitor DMD disease progression and response to therapies in the future.
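The abstract reports a 1% false discovery rate but does not say which procedure was used. As a sketch under that caveat, the Benjamini-Hochberg step-up procedure is one common way to select features at a chosen FDR; the function name and the p-values below are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return the indices of hypotheses rejected at the given FDR level."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    # Every hypothesis ranked at or below the cutoff is rejected.
    return sorted(order[:cutoff_rank])

# Toy p-values, one per metabolite feature (not data from the study).
toy_p = [0.0001, 0.002, 0.5, 0.004, 0.9]
significant = benjamini_hochberg(toy_p, fdr=0.01)
```

With the toy values, features 0, 1 and 3 pass the 1% threshold.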
Stem cell antigen-1 (Sca-1) is used to isolate and characterize tumor-initiating cell populations from tumors of various murine models. Sca-1-induced disruption of TGF-β signaling is required for in vivo tumorigenesis in breast cancer models [2, 3-5]. The role of the human Ly6 gene family is only beginning to be appreciated in recent literature [6-9]. To study the significance of Ly6 gene family members, we visualized one hundred thirty Gene Expression Omnibus (GEO) datasets using Oncomine (Invitrogen) and the Georgetown Database of Cancer (G-DOC). This analysis showed that four different members, Ly6D, Ly6E, Ly6H and Ly6K, show increased gene expression in bladder, brain and CNS, breast, colorectal, cervical, ovarian, lung, head and neck, pancreatic and prostate cancers compared with their normal counterpart tissues. Increased expression of Ly6D, Ly6E, Ly6H or Ly6K was observed in a subset of cancer types. The increased expression of Ly6D, Ly6E, Ly6H and Ly6K was found to be associated with poor outcome in ovarian, colorectal, gastric, breast, lung, bladder, or brain and CNS cancers, as observed with the KM plotter and PROGgeneV2 platforms. The increased expression of Ly6 family members and its association with poor patient survival in multiple cancer types indicate that Ly6D, Ly6E, Ly6K and Ly6H will be important targets in clinical practice, both as markers of poor prognosis and for developing novel therapeutics in multiple cancer types.
cancer biomarkers; stem cell genes; poor prognosis; lymphocyte antigens 6 complex; Ly6 genes
One of the long-standing challenges in biology is to understand how non-synonymous single nucleotide polymorphisms (nsSNPs) change protein structure and, in turn, affect protein function. While it is impractical to solve all the mutated protein structures experimentally, it is quite feasible to model the mutated structures in silico. Toward this goal, we built a publicly available structure database resource (SNP2Structure, https://apps.icbi.georgetown.edu/snp2structure) focusing on missense mutations (msSNPs). Compared with web portals with similar aims, SNP2Structure has the following major advantages. First, our portal offers direct comparison of two related 3D structures. Second, the protein models include all interacting molecules in the original PDB structures, so users are able to determine regions of potential interaction changes when a protein mutation occurs. Third, the mutated structures are available to download locally for further structural and functional analysis. Fourth, we used the JSmol package to display protein structures, which avoids system compatibility issues. SNP2Structure provides reliable, high quality mapping of nsSNPs to 3D protein structures, enabling researchers to explore the likely functional impact of human disease-causing mutations.
Active site mutations; Protein structure; Molecular modeling; Disease causing SNPs; SNP database
One-third of estrogen (ER+) and/or progesterone receptor-positive (PGR+) breast tumors treated with Tamoxifen (TAM) do not respond to initial treatment, and the remaining 70% are at risk of relapse in the future. Estrogen-related receptor gamma (ESRRG, ERRγ) is an orphan nuclear receptor with broad structural similarities to classical ER that is widely implicated in the transcriptional regulation of energy homeostasis. We have previously demonstrated that ERRγ induces resistance to TAM in ER+ breast cancer models, and that the receptor’s transcriptional activity is modified by activation of the ERK/MAPK pathway. We hypothesize that hyper-activation or over-expression of ERRγ induces a pro-survival transcriptional program that impairs the ability of TAM to inhibit the growth of ER+ breast cancer. The goal of the present study is to determine whether ERRγ target genes are associated with reduced distant metastasis-free survival (DMFS) in ER+ breast cancer treated with TAM.
Raw gene expression data was obtained from 3 publicly available breast cancer clinical studies of women with ER+ breast cancer who received TAM as their sole endocrine therapy. ERRγ target genes were selected from 2 studies that published validated chromatin immunoprecipitation (ChIP) analyses of ERRγ promoter occupancy. Kaplan-Meier estimation was used to determine the association of ERRγ target genes with DMFS, and selected genes were validated in ER+, MCF7 breast cancer cells that express exogenous ERRγ.
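For readers unfamiliar with Kaplan-Meier estimation, the product-limit calculation can be sketched in a few lines. This is a generic illustration, not the study's analysis code; the function name and the toy follow-up times are assumptions.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (here, distant metastasis) was observed,
              0 if the patient was censored at that time
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        at_risk -= leaving
    return curve

# Toy data (years of follow-up), not taken from the study.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

To compare groups, for example tumors with high versus low expression of a target gene, the same function would be applied to each group separately and the curves compared with a log-rank test, which is omitted here.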
Thirty-seven validated receptor target genes were statistically significantly altered in women who experienced a DM within 5 years, and could classify several independent studies into poor vs. good DMFS. Two genes (EEF1A2 and PPIF) could similarly separate ER+, TAM-treated breast tumors by DMFS, and their protein levels were measured in an ER+ breast cancer cell line model with exogenous ERRγ. Finally, expression of ERRγ and these two target genes is elevated in models of ER+ breast cancer with hyperactivation of ERK/MAPK.
ERRγ signaling is associated with poor DMFS in ER+, TAM-treated breast cancer, and ESRRG, EEF1A2, and PPIF comprise a 3-gene signaling node that may contribute to TAM resistance in the context of an active ERK/MAPK pathway.
Estrogen-related receptor gamma; Tamoxifen; ER+ breast cancer; MAPK; Apoptosis
Near universal administration of vaccines mandates intense pharmacovigilance for vaccine safety and a stringently low tolerance for adverse events. Reports of autoimmune diseases (AID) following vaccination have been challenging to evaluate given the high rates of vaccination, background incidence of autoimmunity, and low incidence and variable times for onset of AID after vaccinations. In order to identify biologically plausible pathways to adverse autoimmune events of vaccine-related AID, we used a systems biology approach to create a matrix of innate and adaptive immune mechanisms active in specific diseases, responses to vaccine antigens, adjuvants, preservatives and stabilizers, for the most common vaccine-associated AID found in the Vaccine Adverse Event Reporting System.
This report focuses on Guillain-Barre Syndrome (GBS), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Idiopathic (or immune) Thrombocytopenic Purpura (ITP). Multiple curated databases and automated text mining of PubMed literature identified 667 genes associated with RA, 448 with SLE, 49 with ITP and 73 with GBS. While all data sources provided valuable and unique gene associations, text mining using natural language processing (NLP) algorithms provided the most information but required curation to remove incorrect associations. Six genes were associated with all four AIDs. Thirty-three pathways were shared by the four AIDs. Classification of genes into twelve immune system related categories identified more “Th17 T-cell subtype” genes in RA than the other AIDs, and more “Chemokine plus Receptors” genes associated with RA than SLE. Gene networks were visualized and clustered into interconnected modules with specific gene clusters for each AID, including one in RA with ten C-X-C motif chemokines. The intersection of genes associated with GBS, GBS peptide auto-antigens, influenza A infection, and influenza vaccination created a subnetwork of genes that inferred a possible role for the MAPK signaling pathway in influenza vaccine related GBS.
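The shared-gene comparison described above reduces to set operations once each disease has a gene list. The sketch below uses placeholder gene symbols (G1, G2, ...) rather than the study's gene lists, and the function names are illustrative.

```python
from itertools import combinations

def shared_genes(gene_sets):
    """Genes present in every disease's gene set."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()

def pairwise_overlap(gene_sets):
    """Number of shared genes for each pair of diseases."""
    return {
        (a, b): len(gene_sets[a] & gene_sets[b])
        for a, b in combinations(sorted(gene_sets), 2)
    }

# Placeholder gene symbols; the real lists hold tens to hundreds of genes.
toy_sets = {
    "RA": {"G1", "G2", "G3"},
    "SLE": {"G1", "G2", "G4"},
    "ITP": {"G1", "G5"},
    "GBS": {"G1", "G2"},
}
common = shared_genes(toy_sets)
overlaps = pairwise_overlap(toy_sets)
```

In the toy data only G1 is common to all four diseases, mirroring how a small core set can emerge from much larger disease-specific lists.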
Results showing unique and common gene sets, pathways, immune system categories and functional clusters of genes in four autoimmune diseases suggest it is possible to develop molecular classifications of autoimmune and inflammatory events. Combining this information with cellular and other disease responses should greatly aid in the assessment of potential immune-mediated adverse events following vaccination.
Electronic supplementary material
The online version of this article (doi:10.1186/s12865-014-0061-0) contains supplementary material, which is available to authorized users.
Next generation sequencing (NGS) technologies produce massive amounts of data requiring a powerful computational infrastructure, high quality bioinformatics software, and skilled personnel to operate the tools. We present a case study of a practical solution to this data management and analysis challenge that simplifies terabyte scale data handling and provides advanced tools for NGS data analysis. These capabilities are implemented using the “Globus Genomics” system, which is an enhanced Galaxy workflow system made available as a service that offers users the capability to process and transfer data easily, reliably and quickly to address end-to-end NGS analysis requirements. The Globus Genomics system is built on Amazon's cloud computing infrastructure. The system takes advantage of elastic scaling of compute resources to run multiple workflows in parallel and it also helps meet the scale-out analysis needs of modern translational genomics research.
Next generation sequencing; Galaxy; Cloud computing; Translational research
The National Cancer Institute (NCI) Cancer Imaging Program organized two related workshops on June 26–27, 2013, entitled “Correlating Imaging Phenotypes with Genomics Signatures Research” and “Scalable Computational Resources as Required for Imaging-Genomics Decision Support Systems.” The first workshop focused on clinical and scientific requirements, exploring our knowledge of phenotypic characteristics of cancer biological properties to determine whether the field is sufficiently advanced to correlate with imaging phenotypes that underpin genomics and clinical outcomes, and exploring new scientific methods to extract phenotypic features from medical images and relate them to genomics analyses. The second workshop focused on computational methods that explore informatics and computational requirements to extract phenotypic features from medical images and relate them to genomics analyses and improve the accessibility and speed of dissemination of existing NIH resources. These workshops linked clinical and scientific requirements of currently known phenotypic and genotypic cancer biology characteristics with imaging phenotypes that underpin genomics and clinical outcomes. The group generated a set of recommendations to NCI leadership and the research community that encourage and support development of the emerging radiogenomics research field to address short- and longer-term goals in cancer research.
Supplemental Digital Content is available in the text.
Response to the oncology drug gemcitabine may be variable in part due to genetic differences in the enzymes and transporters responsible for its metabolism and disposition. The aim of our in-silico study was to identify gene variants significantly associated with gemcitabine response that may help to personalize treatment in the clinic.
We analyzed two independent data sets: (a) genotype data from NCI-60 cell lines using the Affymetrix DMET 1.0 platform combined with gemcitabine cytotoxicity data in those cell lines, and (b) genome-wide association studies (GWAS) data from 351 pancreatic cancer patients treated on an NCI-sponsored phase III clinical trial. We also performed a subset analysis on the GWAS data set for 135 patients who were given gemcitabine+placebo. Statistical and systems biology analyses were performed on each individual data set to identify biomarkers significantly associated with gemcitabine response.
Genetic variants in the ABC transporters (ABCC1, ABCC4) and the CYP4 family members CYP4F8 and CYP4F12, CHST3, and PPARD were found to be significant in both the NCI-60 and GWAS data sets. We report significant association between drug response and variants within members of the chondroitin sulfotransferase family (CHST) whose role in gemcitabine response is yet to be delineated.
Biomarkers identified in this integrative analysis may contribute insights into gemcitabine response variability. As genotype data become more readily available, similar studies can be conducted to gain insights into drug response mechanisms and to facilitate clinical trial design and regulatory reviews.
DMET; gemcitabine; NCI-60; pancreatic cancer; probabilistic networks
Systemic treatment of patients with early-stage cancers attempts to eradicate occult metastatic disease to prevent recurrence and increased morbidity. However, prediction of recurrence from an analysis of the primary tumor is limited because disseminated cancer cells only represent a small subset of the primary lesion. Here we analyze the expression of circulating microRNAs (miRs) in serum obtained pre-surgically from patients with early stage colorectal cancers. Groups of five patients with and without disease recurrence were used to identify an informative panel of circulating miRs using quantitative PCR of genome-wide miR expression as well as a set of published candidate miRs. A panel of six informative miRs (miR-15a, miR-103, miR-148a, miR-320a, miR-451, miR-596) was derived from this analysis and evaluated in a separate validation set of thirty patients. Hierarchical clustering of the expression levels of these six circulating miRs and Kaplan-Meier analysis showed that the risk of disease recurrence of early stage colon cancer can be predicted by this panel of miRs that are measurable in the circulation at the time of diagnosis (P = 0.0026; Hazard Ratio 5.4; 95% CI of 1.9 to 15).
The use and benefit of adjuvant chemotherapy to treat stage II colorectal cancer (CRC) patients is not well understood since the majority of these patients are cured by surgery alone. Identification of biological markers of relapse is a critical challenge to effectively target treatments to the ~20% of patients destined to relapse. We have integrated molecular profiling results of several “omics” data types to determine the most reliable prognostic biomarkers for relapse in CRC using data from 40 stage I and II CRC patients. We identified 31 multi-omics features that highly correlate with relapse. The data types were integrated using a multi-step analytical approach with consecutive elimination of redundant molecular features. For each data type, a systems biology analysis was performed to identify the pathways, biological processes and disease categories most affected in relapse. The biomarkers detected in tumors, urine and blood of patients indicated a strong association with immune processes, including aberrant regulation of T-cell and B-cell activation that could lead to overall differences in lymphocyte recruitment for tumor infiltration, and markers indicating likelihood of future relapse. The immune response was the biologically most coherent signature that emerged from our analyses among several other biological processes and corroborates other studies showing a strong immune response in patients less likely to relapse.
colorectal cancer; relapse; variant analysis; integrative analysis; multi-omics; exome sequencing; systems biology; immune response
The most effective way to move from target identification to the clinic is to identify already approved drugs with the potential for activating or inhibiting unintended targets (repurposing or repositioning). This is usually achieved by high throughput chemical screening, transcriptome matching or simple in silico ligand docking. We now describe a novel rapid computational proteo-chemometric method called “Train, Match, Fit, Streamline” (TMFS) to map new drug-target interaction space and predict new uses. The TMFS method combines shape, topology and chemical signatures, including docking score and functional contact points of the ligand, to predict potential drug-target interactions with remarkable accuracy. Using the TMFS method, we performed extensive molecular fit computations on 3,671 FDA approved drugs across 2,335 human protein crystal structures. The TMFS method predicts drug-target associations with 91% accuracy for the majority of drugs. Over 58% of the known best ligands for each target were correctly predicted as top ranked, followed by 66%, 76%, 84% and 91% for agents ranked in the top 10, 20, 30 and 40, respectively, out of all 3,671 drugs. Drugs ranked in the top 1–40 that have not been experimentally validated for a particular target now become candidates for repositioning. Furthermore, we used the TMFS method to discover that mebendazole, an anti-parasitic with recently discovered and unexpected anti-cancer properties, has the structural potential to inhibit VEGFR2. We confirmed experimentally that mebendazole inhibits VEGFR2 kinase activity as well as angiogenesis at doses comparable with its known effects on hookworm. TMFS also predicted, and surface plasmon resonance confirmed, that dimethyl celecoxib and the anti-inflammatory agent celecoxib can bind cadherin-11, an adhesion molecule important in rheumatoid arthritis and poor prognosis malignancies for which no targeted therapies exist.
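The top-ranked and top-10 to top-40 percentages quoted above are top-k accuracies: the fraction of targets whose known best ligand appears among the k highest-ranked drugs. A minimal sketch of that metric follows; it does not reproduce the TMFS scoring itself, and the function name, target names and drug names are placeholders.

```python
def top_k_accuracy(rankings, known_best, k):
    """Fraction of targets whose known best ligand is ranked within the top k.

    rankings   -- {target: [drug names, best predicted first]}
    known_best -- {target: name of the known best ligand}
    """
    hits = sum(
        1 for target, ligand in known_best.items()
        if ligand in rankings[target][:k]
    )
    return hits / len(known_best)

# Placeholder targets and drugs; a real run would rank thousands of drugs.
toy_rankings = {
    "T1": ["A", "B", "C"],
    "T2": ["B", "A", "C"],
    "T3": ["C", "B", "A"],
    "T4": ["A", "C", "B"],
}
toy_known = {"T1": "A", "T2": "A", "T3": "A", "T4": "B"}
accuracy_by_k = {k: top_k_accuracy(toy_rankings, toy_known, k) for k in (1, 2, 3)}
```

Accuracy can only rise as k grows, which is why the reported figures climb from the top-ranked hit rate to the top-40 rate.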
We anticipate that expanding our TMFS method to the >27,000 clinically active agents available worldwide across all targets will be most useful in the repositioning of existing drugs for new therapeutic targets.
Pediatric palliative care is an organized method for delivering effective, compassionate and timely care to children with cancer and their families, but it currently faces many challenges despite advances in technology and health care delivery. A key challenge involves unnecessary suffering from debilitating symptoms, such as pain, resulting from insufficient personalized treatment. Additionally, breakdowns in communication and a paucity of usable patient-centric information impede effective care. Recent advances in informatics for consumer health through eHealth initiatives have begun to be adopted in care coordination and communication, but overall remain under-utilized. Tremendous potential exists in the effective use of health information technology (HIT) to improve areas requiring personalized care such as pain management in pediatric oncology patients.
This article aims first to identify communication challenges and needs in pediatric palliative cancer care from the perspectives of the entire group of individuals around the pediatric oncology patient, and then to describe how adoption and adaptation of these technologies can improve patient-provider communication, behavioral support, pain assessment, and education through integration into existing work flows. The goal of this research is to promote the value of using HIT standards-based technology solutions and stimulate development of interoperable, standardized technologies and delivery of context-sensitive information through user-friendly portals to facilitate communication in an existing pediatric clinical care setting.
health information technology; palliative care; pain management; quality of care and care coordination; health information exchange
Quality control and harmonization of data is a vital and challenging undertaking for any successful data coordination center and a responsibility shared between the multiple sites that produce, integrate, and utilize the data. Here we describe a coordinated effort between scientists and data managers in the Cancer Family Registries to implement a data governance infrastructure consisting of both organizational and technical solutions. The technical solution uses a rule-based validation system that facilitates error detection and correction for data centers submitting data to a central informatics database. Validation rules comprise both standard checks on allowable values and a crosscheck of related database elements for logical and scientific consistency. Evaluation over a 2-year timeframe showed a significant decrease in the number of errors in the database and a concurrent increase in data consistency and accuracy.
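A rule-based validator of the kind described can be sketched as a list of small functions, each returning an error message or nothing. The field names, allowed values and rules below are hypothetical and are not the registry's actual schema or rule set.

```python
def allowed_values(field, allowed):
    """Build a rule that flags values outside an allowed set."""
    def rule(record):
        value = record.get(field)
        if value not in allowed:
            return f"{field}: {value!r} is not an allowable value"
        return None
    return rule

def diagnosis_not_before_birth(record):
    """Cross-check two related elements for logical consistency."""
    birth, diagnosis = record.get("birth_year"), record.get("diagnosis_year")
    if birth is not None and diagnosis is not None and diagnosis < birth:
        return "diagnosis_year precedes birth_year"
    return None

# Hypothetical rule set: one allowable-value check and one cross-check.
RULES = [allowed_values("sex", {"F", "M", "U"}), diagnosis_not_before_birth]

def validate(records, rules=RULES):
    """Map record index -> list of error messages (records with errors only)."""
    report = {}
    for index, record in enumerate(records):
        errors = [msg for msg in (rule(record) for rule in rules) if msg]
        if errors:
            report[index] = errors
    return report

# Toy submission from a hypothetical data center.
submission = [
    {"sex": "F", "birth_year": 1950, "diagnosis_year": 2001},
    {"sex": "X", "birth_year": 1960, "diagnosis_year": 1955},
]
report = validate(submission)
```

Keeping each rule as an independent function makes it straightforward to add checks over time and to return a per-record error report to the submitting site.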
Breast cancer; colon cancer; cancer registry; coordination; data governance; bioinformatics; epidemiology; ESAC; informatics
The aim of this study was to perform comparative analysis of multiple public datasets of gene expression in order to identify common genes as potential prognostic biomarkers. Additionally, the study sought to identify biological processes and pathways that are most significantly associated with early distant metastases (<5 years) in women with estrogen receptor-positive (ER+) breast tumors. Datasets from three published studies were selected for in silico analysis of gene expression profiles of ER+ breast cancer, using time to distant metastasis as the clinical endpoint. A subset of 44 differentially expressed genes (DEGs) was found common to all three studies and characterized by mitotic checkpoint genes and pathways that regulate mitotic spindle and chromosome dynamics. DEG promoter regions were enriched with NFY binding sites. Analysis of miRNA target sites identified significant enrichment of miR-192, miR-193B, and miR-16-1 targets. Aberrant mitotic regulation could drive increased genomic instability leading to a progression towards an early onset metastatic phenotype. The relative importance of mitotic instability may reflect the clinical utility of mitotic poisons in metastatic breast cancer, including poisons such as the taxanes, epothilones, and vinca alkaloids.
estrogen receptor alpha-positive; mitotic checkpoint signaling; mitotic regulation network; microRNA targets; early distant metastasis
Lynch syndrome accounts for 2–5% of endometrial cancer cases. Lynch syndrome prediction models have not been evaluated among endometrial cancer cases.
Area under the receiver operating characteristic curve (AUC), sensitivity and specificity of PREMM1,2,6, MMRpredict, and MMRpro scores were assessed among 563 population-based and 129 clinic-based endometrial cancer cases.
A total of 14 (3%) population-based and 80 (62%) clinic-based subjects had pathogenic mutations. PREMM1,2,6, MMRpredict, and MMRpro were able to distinguish mutation carriers from noncarriers (AUC of 0.77, 0.76, and 0.77, respectively), among population-based cases. All three models had lower discrimination for the clinic-based cohort, with AUCs of 0.67, 0.64, and 0.54, respectively. Using a 5% cutoff, sensitivity and specificity were as follows: PREMM1,2,6, 93% and 5% among population-based cases and 99% and 2% among clinic-based cases; MMRpredict, 71% and 64% for the population-based cohort and 91% and 0% for the clinic-based cohort; and MMRpro, 57% and 85% among population-based cases and 95% and 10% among clinic-based cases.
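The discrimination measures reported above can be computed directly from model scores and carrier status. The sketch below assumes a score at or above the cutoff counts as a positive prediction and that both carriers and noncarriers are present; the function names and toy scores are illustrative, not study data.

```python
def sensitivity_specificity(scores, carriers, cutoff=0.05):
    """Sensitivity and specificity when score >= cutoff predicts a carrier."""
    tp = sum(1 for s, c in zip(scores, carriers) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, carriers) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, carriers) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, carriers) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, carriers):
    """AUC as the probability that a carrier outscores a noncarrier.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    positives = [s for s, c in zip(scores, carriers) if c]
    negatives = [s for s, c in zip(scores, carriers) if not c]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy predicted mutation probabilities and true carrier status (1 = carrier).
toy_scores = [0.9, 0.04, 0.3, 0.02, 0.06, 0.01]
toy_carriers = [1, 1, 0, 0, 0, 0]
sens, spec = sensitivity_specificity(toy_scores, toy_carriers, cutoff=0.05)
toy_auc = auc(toy_scores, toy_carriers)
```

Sensitivity and specificity depend on the chosen cutoff, whereas AUC summarizes ranking quality across all cutoffs, which is why a model can pair a moderate AUC with very high sensitivity and very low specificity at the 5% threshold.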
Currently available prediction models have limited clinical utility in determining which patients with endometrial cancer should undergo genetic testing for Lynch syndrome. Immunohistochemical analysis and microsatellite instability testing may be the best currently available tools to screen for Lynch syndrome in endometrial cancer patients.
endometrial cancer; genetic screening; genetic testing; Lynch syndrome; prediction models
Summary: Differential dependency network (DDN) is a caBIG® (cancer Biomedical Informatics Grid) analytical tool for detecting and visualizing statistically significant topological changes in transcriptional networks representing two biological conditions. Developed under caBIG®'s In Silico Research Centers of Excellence (ISRCE) Program, DDN enables differential network analysis and provides an alternative way for defining network biomarkers predictive of phenotypes. DDN also serves as a useful systems biology tool for users across biomedical research communities to infer how genetic, epigenetic or environmental variables may affect biological networks and clinical phenotypes. Besides the standalone Java application, we have also developed a Cytoscape plug-in, CytoDDN, to integrate network analysis and visualization seamlessly.
Availability: The Java and MATLAB source code can be downloaded at the authors' web site http://www.cbil.ece.vt.edu/software.htm
Supplementary information: Supplementary data are available at Bioinformatics online.
Summary: Phenotypic Up-regulated Gene Support Vector Machine (PUGSVM) is a cancer Biomedical Informatics Grid (caBIG™) analytical tool for multiclass gene selection and classification. PUGSVM addresses the problem of imbalanced class separability, small sample size and high gene space dimensionality, where multiclass gene markers are defined by the union of one-versus-everyone phenotypic upregulated genes, and used by a well-matched one-versus-rest support vector machine. PUGSVM provides a simple yet more accurate strategy to identify statistically reproducible mechanistic marker genes for characterization of heterogeneous diseases.
Supplementary information: Supplementary data are available at Bioinformatics online.
Currently, cancer therapy remains limited by a “one-size-fits-all” approach, whereby treatment decisions are based mainly on the clinical stage of disease, yet fail to reference the individual's underlying biology and its role driving malignancy. Identifying better personalized therapies for cancer treatment is hindered by the lack of high-quality “omics” data of sufficient size to produce meaningful results and the ability to integrate biomedical data from disparate technologies. Resolving these issues will help translation of therapies from research to clinic by helping clinicians develop patient-specific treatments based on the unique signatures of a patient's tumor. Here we describe the Georgetown Database of Cancer (G-DOC), a Web platform that enables basic and clinical research by integrating patient characteristics and clinical outcome data with a variety of high-throughput research data in a unified environment. While several rich data repositories for high-dimensional research data exist in the public domain, most focus on a single data type and do not support integration across multiple technologies. Currently, G-DOC contains data from more than 2500 breast cancer patients and 800 gastrointestinal cancer patients. G-DOC includes a broad collection of bioinformatics and systems biology tools for analysis and visualization of four major “omics” types: DNA, mRNA, microRNA, and metabolites. We believe that G-DOC will help facilitate systems medicine by providing identification of trends and patterns in integrated data sets and hence facilitate the use of better targeted therapies for cancer. A set of representative usage scenarios is provided to highlight the technical capabilities of this resource.
Finding better therapies for the treatment of brain tumors is hampered by the lack of consistently obtained molecular data in a large sample set, and of the ability to integrate biomedical data from disparate sources to enable translation of therapies from bench to bedside. Hence, a critical factor in the advancement of biomedical research and clinical translation is the ease with which data can be integrated, redistributed and analyzed both within and across functional domains. Novel biomedical informatics infrastructure and tools are essential for developing individualized patient treatment based on the specific genomic signatures in each patient’s tumor. Here we present Rembrandt, Repository of Molecular BRAin Neoplasia DaTa, a cancer clinical genomics database and a web-based data mining and analysis platform aimed at facilitating discovery by connecting the dots between clinical information and genomic characterization data. To date, Rembrandt contains data generated through the Glioma Molecular Diagnostic Initiative from 874 glioma specimens comprising nearly 566 gene expression arrays, 834 copy number arrays and 13,472 clinical phenotype data points. Data can be queried and visualized for a selected gene across all data platforms or for multiple genes in a selected platform. Additionally, gene sets can be limited to clinically important annotations including secreted, kinase, membrane, and known gene-anomaly pairs to facilitate the discovery of novel biomarkers and therapeutic targets. We believe that REMBRANDT represents a prototype of how high throughput genomic and clinical data can be integrated in a way that will allow expeditious and efficient translation of laboratory discoveries to the clinic.
Rembrandt; personalized medicine; translational research; clinical genomics; data integration