Despite thousands of reported studies unveiling gene-level signatures for complex diseases, few of these techniques work at the single-sample level with explicit underpinning of biological mechanisms. This presents both a critical dilemma in the field of personalized medicine and a plethora of opportunities for analysis of RNA-seq data. In this study, we hypothesize that the “Functional Analysis of Individual Microarray Expression” (FAIME) method we developed could be smoothly extended to RNA-seq data and unveil intrinsic underlying mechanism signatures across different scales of biological data for the same complex disease. Using publicly available RNA-seq data for gastric cancer, we confirmed the effectiveness of this method (i) to translate each sample transcriptome to pathway-scale scores, (ii) to predict deregulated pathways in gastric cancer against gold standards (FDR<5%, Precision=75%, Recall=92%), and (iii) to predict phenotypes in an independent dataset and expression platform (RNA-seq vs microarrays, Fisher's exact test p<10⁻⁶). Measuring at a single-sample level, FAIME could differentiate cancer samples from normal ones; furthermore, it achieved comparable performance in identifying differentially expressed pathways relative to state-of-the-art cross-sample methods. These results motivate future work on mechanism-level biomarker discovery predictive of diagnoses, treatment, and therapy.
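The precision and recall figures quoted above compare a set of predicted deregulated pathways with a gold-standard set. A minimal sketch of that comparison follows; the function name and the pathway labels are hypothetical and are not taken from the study.

```python
def precision_recall(predicted, gold_standard):
    """Return (precision, recall) of a predicted pathway set against a gold standard."""
    predicted, gold_standard = set(predicted), set(gold_standard)
    if not predicted or not gold_standard:
        raise ValueError("both sets must be non-empty")
    true_positives = len(predicted & gold_standard)
    precision = true_positives / len(predicted)    # share of predictions that are correct
    recall = true_positives / len(gold_standard)   # share of the gold standard recovered
    return precision, recall


# Hypothetical pathway labels, for illustration only.
predicted = {"pathway_a", "pathway_b", "pathway_c", "pathway_d", "pathway_e"}
gold = {"pathway_a", "pathway_b", "pathway_c", "pathway_f"}
precision, recall = precision_recall(predicted, gold)
```

In this toy case three of the five predictions are in the gold standard, and three of the four gold-standard pathways are recovered.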
In recent years, there have been numerous initiatives undertaken to describe critical information needs related to the collection, management, analysis, and dissemination of data in support of biomedical research (J Investig Med 54:327-333, 2006); (J Am Med Inform Assoc 16:316–327, 2009); (Physiol Genomics 39:131-140, 2009); (J Am Med Inform Assoc 18:354–357, 2011). A common theme spanning such reports has been the importance of understanding and optimizing people, organizational, and leadership factors in order to achieve the promise of efficient and timely research (J Am Med Inform Assoc 15:283–289, 2008). With the emergence of clinical and translational science (CTS) as a national priority in the United States, and the corresponding growth in the scale and scope of CTS research programs, the acuity of such information needs continues to increase (JAMA 289:1278–1287, 2003); (N Engl J Med 353:1621–1623, 2005); (Sci Transl Med 3:90, 2011). At the same time, systematic evaluations of optimal people, organizational, and leadership factors that influence the provision of data, information, and knowledge management technologies and methods are notably lacking.
In response to the preceding gap in knowledge, we have conducted both: 1) a structured survey of domain experts at Academic Health Centers (AHCs); and 2) a subsequent thematic analysis of public-domain documentation provided by those same organizations. The results of these approaches were then used to identify critical factors that may influence access to informatics expertise and resources relevant to the CTS domain.
A total of 31 domain experts, spanning the Biomedical Informatics (BMI), Computer Science (CS), Information Science (IS), and Information Technology (IT) disciplines, participated in a structured survey process. At a high level, respondents identified notable differences in the access to BMI, CS, and IT expertise and services depending on the establishment of a formal BMI academic unit and the perceived relationship between BMI, CS, IS, and IT leaders. Subsequent thematic analysis of the aforementioned public domain documents demonstrated a discordance between perceived and reported integration across and between BMI, CS, IS, and IT programs and leaders with relevance to the CTS domain.
Differences in people, organization, and leadership factors do influence the effectiveness of CTS programs, particularly with regard to the ability to access and leverage BMI, CS, IS, and IT expertise and resources. Based on this finding, we believe that the development of a better understanding of how optimal BMI, CS, IS, and IT organizational structures and leadership models are designed and implemented is critical both to the advancement of CTS and, ultimately, to improvements in the quality, safety, and effectiveness of healthcare.
Strategies to stage and treat cancer rely on a presumption of either localized or widespread metastatic disease. An intermediate state of metastasis termed oligometastasis(es) characterized by limited progression has been proposed. Oligometastases are amenable to treatment by surgical resection or radiotherapy.
We analyzed microRNA expression patterns from lung metastasis samples of patients with ≤5 initial metastases resected with curative intent.
Patients were stratified into subgroups based on their rate of metastatic progression. We prioritized microRNAs between patients with the highest and lowest rates of recurrence. We designated these as high rate of progression (HRP) and low rate of progression (LRP); the latter group included patients with no recurrences. The prioritized microRNAs distinguished HRP from LRP and were associated with rate of metastatic progression and survival in an independent validation dataset.
Oligo- and polymetastasis are distinct entities at the clinical and molecular level.
We aim to provide clinically applicable, reproducible, mechanistic interpretations of gene expression changes that lack gene overlap among predictive gene signatures. Using a method we recently developed, Functional Analysis of Individual Microarray Expression (FAIME), we provide evidence that Gene Ontology-anchored signatures (GO-signatures) show reliable prognosis in lung cancer. In order to demonstrate the biological congruence and reproducibility of FAIME-derived mechanism classifiers, we chose a disease where gene expression classifier signatures alone had failed to significantly stratify a larger collection of samples and that exhibited poor or no genetic overlap. For each patient in the two lung adenocarcinoma studies, personalized FAIME-profiles of GO biological processes are generated from genome-wide expression profiles. For both training studies, GO-signatures significantly associated with patient mortality were identified (Prediction Analysis for Microarrays; three-fold cross-validation). These two GO-signatures could effectively stratify patients from an independent validation cohort into sub-groups that show significant differences in disease-free survival (log-rank test P=0.019; P=0.001). Importantly, significant mechanism overlaps assessed by information-theory similarity were detected between the two GO-signatures (Fisher's exact test p=0.001). Hence, together with machine learning technologies, FAIME could be utilized to develop an ontology-driven and expression-anchored prognostic signature that is personalized for an individual patient.
Lung transplantation remains the only viable therapy for patients with end-stage lung disease. However, the full utilization of this strategy is severely compromised by a lack of donor lung availability. The vast majority of donor lungs available for transplantation are from individuals after brain death (BD). Unfortunately, the early autonomic storm that accompanies BD often results in neurogenic pulmonary edema (NPE), producing varying degrees of lung injury or leading to primary graft dysfunction after transplantation. We demonstrated that sphingosine 1–phosphate (S1P)/analogues, which are major barrier-enhancing agents, reduce vascular permeability via the S1P1 receptor, S1PR1. Because primary lung graft dysfunction is induced by lung vascular endothelial cell barrier dysfunction, we hypothesized that the S1PR1 agonist, SEW-2871, may attenuate NPE when administered to the donor shortly after BD. Significant lung injury was observed after BD, with increases of approximately 60% in bronchoalveolar lavage (BAL) total protein, cell counts, and lung tissue wet/dry (W/D) weight ratios. In contrast, rats receiving SEW-2871 (0.1 mg/kg) 15 minutes after BD and assessed after 4 hours exhibited significant lung protection (∼ 50% reduction, P = 0.01), as reflected by reduced BAL protein/albumin, cytokines, cellularity, and lung tissue wet/dry weight ratio. Microarray analysis at 4 hours revealed a global impact of both BD and SEW on lung gene expression, with a differential gene expression of enriched immune-response/inflammation pathways across all groups. Overall, SEW served to attenuate the BD-mediated up-regulation of gene expression. Two potential biomarkers, TNF and chemokine CC motif receptor-like 2, exhibited gene array dysregulation. We conclude that SEW-2871 significantly attenuates BD-induced lung injury, and may serve as a potential candidate to improve human donor availability.
neurogenic pulmonary edema; lung injury; sphingosine 1–phosphate; sphingolipids; lung transplant donors
DNA variants that affect alternative splicing and the relative quantities of different gene transcripts have been shown to be risk alleles for some Mendelian diseases. However, for complex traits characterized by a low odds ratio for any single contributing variant, very few studies have investigated the contribution of splicing variants. The overarching goal of this study is to discover and characterize the role that variants affecting alternative splicing may play in the genetic etiology of complex traits, which include a significant number of the common human diseases. Specifically, we hypothesize that single nucleotide polymorphisms (SNPs) in splicing regulatory elements can be characterized in silico to identify variants affecting splicing, and that these variants may contribute to the etiology of complex diseases as well as the inter-individual variability in the ratios of alternative transcripts. We leverage high-throughput expression profiling to 1) experimentally validate our in silico predictions of skipped exons and 2) characterize the molecular role of intronic genetic variations in alternative splicing events in the context of complex human traits and diseases. We propose that intronic SNPs play a role as genetic regulators within splicing regulatory elements and show that their associated exon skipping events can affect protein domains and structure. We find that SNPs we would predict to affect exon skipping are enriched among the set of SNPs reported to be associated with complex human traits.
Alternative splicing is a common eukaryotic cellular mechanism that allows for the production of multiple proteins from one gene and occurs in 40%–90% of all human genes. Alternative splicing has been shown to be important for many critical biological processes, including development, evolution, and even psychological behavior. Additionally, alternative splicing has been associated with 15%–50% of human genetic diseases, including breast cancer; however, the precise mechanism by which genetic variations regulate this process remains to be fully elucidated. In this study, we develop an integrative approach that utilizes sequence-based analysis and genome-wide expression profiling to identify genetic variations that may affect alternative splicing. We also evaluate their enrichment among established disease-associated variations. Our study provides insights into the functionality of these variations and emphasizes their importance for complex human traits and diseases.
The aberrant activity of developmental pathways in prostate cancer may provide significant insight into predicting tumor initiation and progression, as well as identifying novel therapeutic targets. To this end, despite shared androgen-dependence and functional similarities to the prostate gland, seminal vesicle cancer is exceptionally rare.
We conducted genomic pathway analyses comparing patient-matched normal prostate and seminal vesicle epithelial cells to identify novel pathways for tumor initiation and progression. Derived gene expression profiles were grouped into cancer biomodules using a protein–protein network algorithm to analyze their relationship to known oncogenes. Each resultant biomodule was assayed for its prognostic ability against publicly available prostate cancer patient gene array datasets.
Analyses show that the embryonic developmental biomodule containing four homeobox gene family members (Meis1, Meis2, Pbx1, and HoxA9) detects a survival difference in a set of watchful-waiting patients (n = 172, P = 0.05), identifies men who are more likely to recur biochemically postprostatectomy (n = 78, P = 0.02), correlates with Gleason score (r = 0.98, P = 0.02), and distinguishes between normal prostate, primary tumor, and metastatic disease. In contrast to other cancer types, Meis1, Meis2, and Pbx1 expression is decreased in poor-prognosis tumors, implying that they function as tumor suppressor genes for prostate cancer. Immunohistochemical staining documents nuclear basal-epithelial and stromal Meis2 staining, with loss of Meis2 expression in prostate tumors.
These data implicate deregulation of the Hox protein cofactors Meis1, Meis2, and Pbx1 as serving a critical function to suppress prostate cancer initiation and progression.
Summary: GO-Module is a web-accessible synthesis and visualization tool developed for end-user biologists to greatly simplify the interpretation of prioritized Gene Ontology (GO) terms. GO-Module radically reduces the complexity of raw GO results into compact biomodules in two distinct ways, by (i) constructing biomodules from significant GO terms based on hierarchical knowledge, and (ii) refining the GO terms in each biomodule to contain only true positive results. Altogether, the features (biomodules) of GO-Module outputs are better organized and on average four times smaller than the input GO terms list (P = 0.0005, n = 16).
Supplementary information: Supplementary data are available at Bioinformatics online.
Novel therapies are desperately needed for radiation-induced lung injury (RILI), which, despite aggressive corticosteroid therapy, remains a potentially fatal and dose-limiting complication of thoracic radiotherapy. We assessed the utility of simvastatin, an anti-inflammatory and lung barrier–protective agent, in a dose- and time-dependent murine model of RILI (18–25 Gy). Simvastatin reduced multiple RILI indices, including vascular leak, leukocyte infiltration, and histological evidence of oxidative stress, while reversing RILI-associated dysregulated gene expression, including p53, nuclear factor–erythroid-2–related factor, and sphingolipid metabolic pathway genes. To identify key regulators of simvastatin-mediated RILI protection, we integrated whole-lung gene expression data obtained from radiated and simvastatin-treated mice with protein–protein interaction network analysis (single-network analysis of proteins). Topological analysis of the gene product interaction network identified eight top-prioritized genes (Ccna2a, Cdc2, Fcer1g, Syk, Vav3, Mmp9, Itgam, Cd44) as regulatory nodes within an activated RILI network. These studies identify the involvement of specific genes and gene networks in RILI pathobiology, and confirm that statins represent a novel strategy to limit RILI.
radiation pneumonitis; lung vascular permeability; simvastatin; gene dysregulation; protein–protein interaction
Although trait-associated genes identified as complex versus single-gene inheritance differ substantially in odds ratio, the authors nonetheless posit that their mechanistic concordance can reveal fundamental properties of the genetic architecture, allowing the automated interpretation of unique polymorphisms within a personal genome.
Materials and methods
An analytical method, SPADE-gen, spanning three biological scales was developed to demonstrate the mechanistic concordance between Mendelian and complex inheritance of Alzheimer's disease (AD) genes: biological functions (BP), protein interaction modeling, and protein domain implicated in the disease-associated polymorphism.
Among Gene Ontology (GO) biological processes (BP) enriched at a false discovery rate <5% in 15 AD genes of Mendelian inheritance (Online Mendelian Inheritance in Man) and independently in those of complex inheritance (25 host genes of intragenic AD single-nucleotide polymorphisms confirmed in genome-wide association studies), 16 overlapped (empirical p=0.007) and 45 were similar (empirical p<0.009; information theory). SPAN network modeling extended the canonical pathway of AD (KEGG) with 26 new protein interactions (empirical p<0.0001).
The study prioritized new AD-associated biological mechanisms and focused the analysis on previously unreported interactions associated with the biological processes of polymorphisms that affect specific protein domains within characterized AD genes and their direct interactors using (1) concordant GO-BP and (2) domain interactions within STRING protein–protein interactions corresponding to the genomic location of the AD polymorphism (eg, EPHA1, APOE, and CD2AP).
These results are in line with unique-event polymorphism theory, indicating how disease-associated polymorphisms of Mendelian or complex inheritance relate genetically to those observed as ‘unique personal variants’. They also provide insight for identifying novel targets, for repositioning drugs, and for personal therapeutics.
Personal genomics; protein interaction networks; medicine; translational bioinformatics; complex disease; ontology; protein–protein interactions; alternative splicing; genetics; network; SNP; protein networks; text-mining; bioinformatics; knowledge representations; uncertain reasoning and decision theory; languages; computational methods
Gene expression signatures that are predictive of therapeutic response or prognosis are increasingly useful in clinical care; however, mechanistic (and intuitive) interpretation of expression arrays remains an unmet challenge. Additionally, there is surprisingly little gene overlap among distinct clinically validated expression signatures. These “causality challenges” hinder the adoption of signatures as compared to functionally well-characterized single gene biomarkers. To increase the utility of multi-gene signatures in survival studies, we developed a novel approach to generate “personal mechanism signatures” of molecular pathways and functions from gene expression arrays. FAIME, the Functional Analysis of Individual Microarray Expression, computes mechanism scores using rank-weighted gene expression of an individual sample. By comparing head and neck squamous cell carcinoma (HNSCC) samples with non-tumor control tissues, the precision and recall of deregulated FAIME-derived mechanisms of pathways and molecular functions are comparable to those produced by conventional cohort-wide methods (e.g. GSEA). The overlap of “Oncogenic FAIME Features of HNSCC” (statistically significant and differentially regulated FAIME-derived genesets representing GO functions or KEGG pathways derived from HNSCC tissue) among three distinct HNSCC datasets (pathways:46%, p<0.001) is more significant than the gene overlap (genes:4%). These Oncogenic FAIME Features of HNSCC can accurately discriminate tumors from control tissues in two additional HNSCC datasets (n = 35 and 91, F-accuracy = 100% and 97%, empirical p<0.001, area under the receiver operating characteristic curves = 99% and 92%), and stratify recurrence-free survival in patients from two independent studies (p = 0.0018 and p = 0.032, log-rank). Previous approaches depending on group assignment of individual samples before selecting features or learning a classifier are limited by design to discrete-class prediction. 
In contrast, FAIME calculates mechanism profiles for individual patients without requiring group assignment in validation sets. FAIME is more amenable to clinical deployment since it translates the gene-level measurements of each given sample into pathway and molecular function profiles that can be applied to analyze continuous phenotypes in clinical outcome studies (e.g. survival time, tumor volume).
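The idea of a single-sample, rank-weighted mechanism score can be illustrated with a short sketch. The abstract does not give the FAIME formula, so the exponential rank weight and the score (mean weight of genes inside the gene set minus mean weight of genes outside it) are assumptions made here for illustration only. The function name, gene identifiers, and expression values are hypothetical.

```python
import math


def rank_weighted_score(expression, gene_set):
    """Score one gene set in ONE sample; no other samples or group labels are used."""
    # Rank genes within the sample: rank 1 is the most highly expressed gene.
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    # Assumed weighting (not taken from the abstract): exponential decay with rank.
    weight = {gene: math.exp(-rank / n) for rank, gene in enumerate(ranked, start=1)}
    inside = [weight[g] for g in ranked if g in gene_set]
    outside = [weight[g] for g in ranked if g not in gene_set]
    if not inside or not outside:
        raise ValueError("gene set must cover some, but not all, measured genes")
    # Positive score: the set's genes sit near the top of this sample's ranking.
    return sum(inside) / len(inside) - sum(outside) / len(outside)


# Hypothetical expression values and gene sets for a single sample.
sample = {"G1": 9.1, "G2": 7.4, "G3": 2.2, "G4": 0.8}
high_set_score = rank_weighted_score(sample, {"G1", "G2"})
low_set_score = rank_weighted_score(sample, {"G3", "G4"})
```

Because the score depends only on the ranking within one sample, it can be computed for a new patient without reference to a cohort, which is the property the text emphasizes.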
Clinical utilization of multi-gene expression signatures that are predictive of therapeutic response has been steadily increasing; however, interpretation of such results remains challenging because multi-gene signatures, generated from analyzing different patient cohorts, tend to be equally predictive but contain minimal overlap. Whereas pathway-level analyses of expression arrays show promise for generating clinically meaningful mechanistic signatures, current approaches do not permit single-patient based analyses that are independent of cross-group calculations. To bridge the gap between deterministic biological mechanisms of single-gene biomarkers and the statistical predictive power of multi-gene signatures that are disconnected from mechanisms, we developed FAIME, a novel method that transforms microarray gene expression data into individualized patient profiles of molecular mechanisms. We have validated its capability for predicting clinical outcomes, including cancer patient samples derived from six different clinical trial cohorts of head and neck cancers. This method provides opportunities to harness an untapped resource for personal genomics: clinical evaluation and testing of individually interpretable mechanistic profiles derived from gene expression arrays.
Thousands of complex-disease single-nucleotide polymorphisms (SNPs) have been discovered in genome-wide association studies (GWAS). However, these intragenic SNPs have not been collectively mined to unveil the genetic architecture between complex clinical traits. The authors hypothesize that biological annotations of host genes of trait-associated SNPs may reveal the biomolecular modularity across complex-disease traits and offer insights for drug repositioning.
Trait-to-polymorphism (SNPs) associations confirmed in GWAS were used. A novel method to quantify trait–trait similarity anchored in Gene Ontology annotations of human proteins and information theory was developed. The results were then validated with the shortest paths of physical protein interactions between biologically similar traits.
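One way to picture an information-theoretic, ontology-anchored similarity between two traits is sketched below. The abstract does not name the measure, so a Resnik-style term similarity combined with a best-match average is assumed here. The mini-ontology, term names, and annotation frequencies are hypothetical stand-ins for Gene Ontology data.

```python
import math

# Hypothetical mini-ontology: term -> list of parent terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "lipid_metabolism": ["metabolism"],
    "glucose_metabolism": ["metabolism"],
    "signaling": ["root"],
}
# Hypothetical annotation frequencies: share of proteins annotated to the term
# or to any of its descendants. The root covers everything, so its value is 1.
TERM_PROB = {
    "root": 1.0,
    "metabolism": 0.4,
    "lipid_metabolism": 0.1,
    "glucose_metabolism": 0.2,
    "signaling": 0.3,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content(term):
    """Rarely used terms carry more information: IC = log(1 / p)."""
    return math.log(1.0 / TERM_PROB[term])


def term_similarity(a, b):
    """Resnik-style: information content of the most informative common ancestor."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))


def trait_similarity(terms_a, terms_b):
    """Best-match average between the term sets annotating two traits."""
    best_a = [max(term_similarity(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


related = trait_similarity({"lipid_metabolism"}, {"glucose_metabolism"})
unrelated = trait_similarity({"lipid_metabolism"}, {"signaling"})
```

In this toy ontology the two metabolic traits share the informative ancestor "metabolism", whereas the metabolic and signaling traits share only the root, so their similarity is zero.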
A network was constructed consisting of 280 significant intertrait similarities among 177 disease traits, which covered 1438 well-validated disease-associated SNPs. Thirty-nine percent of intertrait connections were confirmed by curators, and the following additional studies demonstrated the validity of a proportion of the remainder. On a phenotypic trait level, higher Gene Ontology similarity between proteins correlated with smaller ‘shortest distance’ in protein interaction networks of complexly inherited diseases (Spearman p<2.2×10⁻¹⁶). Further, ‘cancer traits’ were similar to one another, as were ‘metabolic syndrome traits’ (Fisher's exact test p=0.001 and 3.5×10⁻⁷, respectively).
An imputed disease network by information-anchored functional similarity from GWAS trait-associated SNPs is reported. It is also demonstrated that small shortest paths of protein interactions correlate with complex-disease function. Taken together, these findings provide the framework for investigating drug targets with unbiased functional biomolecular networks rather than worn-out single-gene and subjective canonical pathway approaches.
Complex disease; SNP; gene ontology; protein-interaction networks; information theory; translational bioinformatics; ontology; bioinformatics; genetics; network; prostate cancer; protein networks; pathway analysis; network modeling; knowledge representations; uncertain reasoning and decision theory; languages and computational methods
We used two-dimensional quantitative trait locus analysis to identify interacting genetic loci that contribute to the native airway constrictor hyperresponsiveness to methacholine that characterizes A/J mice, relative to C57BL/6J mice. We quantified airway responsiveness to intravenous methacholine boluses in eighty-eight (C57BL/6J X A/J) F2 and twenty-seven (A/J X C57BL/6J) F2 mice as well as ten A/J mice and six C57BL/6J mice; all studies were performed in male mice. Mice were genotyped at 384 SNP markers, and from these data two-QTL analyses disclosed one pair of interacting loci on chromosomes 11 and 18; the homozygous A/J genotype at each locus constituted the genetic interaction linked to the hyperresponsive A/J phenotype. Bioinformatic network analysis of potential interactions among proteins encoded by genes in the linked regions disclosed two high priority subnetworks - Myl7, Rock1, Limk2; and Npc1, Npc1l1. Evidence in the literature supports the possibility that either or both networks could contribute to the regulation of airway constrictor responsiveness. Together, these results should stimulate evaluation of the genetic contribution of these networks in the regulation of airway responsiveness in humans.
Acute lung injury (ALI) and mechanical ventilator-induced lung injury (VILI), major causes of acute respiratory failure with elevated morbidity and mortality, are characterized by significant pulmonary inflammation and alveolar/vascular barrier dysfunction. Previous studies highlighted the role of the non–muscle myosin light chain kinase isoform (nmMLCK) as an essential element of the inflammatory response, with variants in the MYLK gene that contribute to ALI susceptibility. To define nmMLCK involvement further in acute inflammatory syndromes, we used two murine models of inflammatory lung injury, induced by either an intratracheal administration of lipopolysaccharide (LPS model) or mechanical ventilation with increased tidal volumes (the VILI model). Intravenous delivery of the membrane-permeant MLC kinase peptide inhibitor, PIK, produced a dose-dependent attenuation of both LPS-induced lung inflammation and VILI (∼50% reductions in alveolar/vascular permeability and leukocyte influx). Intravenous injections of nmMLCK silencing RNA, either directly or as cargo within angiotensin-converting enzyme (ACE) antibody–conjugated liposomes (to target the pulmonary vasculature selectively), decreased nmMLCK lung expression (∼70% reduction) and significantly attenuated LPS-induced and VILI-induced lung inflammation (∼40% reduction in bronchoalveolar lavage protein). Compared with wild-type mice, nmMLCK knockout mice were significantly protected from VILI, with significant reductions in VILI-induced gene expression in biological pathways such as nrf2-mediated oxidative stress, coagulation, p53-signaling, leukocyte extravasation, and IL-6–signaling. These studies validate nmMLCK as an attractive target for ameliorating the adverse effects of dysregulated lung inflammation.
endotoxin/lipopolysaccharide; nmMLCK; mice; lung injury; endothelial barrier
Cancer staging and treatment presumes a division into localized or metastatic disease. We proposed an intermediate state defined by ≤5 cumulative metastasis(es), termed oligometastases. In contrast to widespread polymetastases, oligometastatic patients may benefit from metastasis-directed local treatments. However, many patients who initially present with oligometastases progress to polymetastases. Predictors of progression could improve patient selection for metastasis-directed therapy.
Here, we identified patterns of microRNA expression of tumor samples from oligometastatic patients treated with high-dose radiotherapy.
Patients who failed to develop polymetastases are characterized by unique prioritized features of a microRNA classifier that includes the microRNA-200 family. We created an oligometastatic-polymetastatic xenograft model in which the patient-derived microRNAs discriminated between the two metastatic outcomes. MicroRNA-200c enhancement in an oligometastatic cell line resulted in polymetastatic progression.
These results demonstrate a biological basis for oligometastases and a potential for using microRNA expression to identify patients most likely to remain oligometastatic after metastasis-directed treatment.
MicroRNAs, small non-coding RNAs, may act as tumor suppressors or oncogenes, and each regulates its own transcription and that of hundreds of genes, often in a tissue-dependent manner. This creates a tightly interwoven network regulating and underlying oncogenesis and cancer biology. Although protein-coding gene signatures and single protein pathway markers have proliferated over the past decade, routine adoption of the former has been hampered by interpretability, reproducibility, and dimensionality, whereas the single molecule–phenotype reductionism of the latter is often overly simplistic to account for complex phenotypes. MicroRNA-derived biomarkers offer a powerful alternative; they have both the flexibility of gene expression signature classifiers and the desirable mechanistic transparency of single protein biomarkers. Furthermore, several advances have recently demonstrated the robust detection of microRNAs from various biofluids, thus providing an additional opportunity for obtaining bioinformatically derived biomarkers to accelerate the identification of individual patients for personalized therapy.
MicroRNA signatures; gene expression; biomarkers; bioinformatics; knowledge representations; uncertain reasoning and decision theory; languages and computational methods; prostate cancer; protein networks; pathway analysis; network modeling; machine learning; predictive modeling; statistical learning; privacy technology
Uncovering the dominant molecular deregulation among the multitude of pathways implicated in aggressive prostate cancer is essential to intelligently developing targeted therapies. Paradoxically, published prostate cancer gene expression signatures of poor prognosis share little overlap and thus do not reveal shared mechanisms. The authors hypothesize that, by analyzing gene signatures with quantitative models of protein–protein interactions, key pathways will be elucidated and shown to be shared.
The authors statistically prioritized common interactors between established cancer genes and genes from each prostate cancer signature of poor prognosis independently via a previously validated single protein analysis of network (SPAN) methodology. Additionally, they computationally identified pathways among the aggregated interactors across signatures and validated them using a similarity metric and patient survival.
Using an information-theoretic metric, the authors assessed the mechanistic similarity of the interactor signature. Its prognostic ability was assessed in an independent cohort of 198 patients with high-Gleason prostate cancer using Kaplan–Meier analysis.
Of the 13 prostate cancer signatures that were evaluated, eight interacted significantly with established cancer genes (false discovery rate <5%) and generated a 42-gene interactor signature that showed the highest mechanistic similarity (p<0.0001). Via parameter-free unsupervised classification, the interactor signature dichotomized the independent prostate cancer cohort with a significant survival difference (p=0.009). Interpretation of the network not only recapitulated phosphatidylinositol-3 kinase/NF-κB signaling, but also highlighted less well established relevant pathways such as the Janus kinase 2 cascade.
SPAN methodology provides a robust means of abstracting disparate prostate cancer gene expression signatures into clinically useful, prioritized pathways as well as useful mechanistic pathways.
Prostate cancer; protein networks; systems biology; information theory; network modeling; Simulation of complex systems (at all levels: molecules to work groups to organizations); knowledge representations; Uncertain reasoning and decision theory; languages and computational methods; statistical analysis of large datasets; advanced algorithms; discovery and text and data mining methods; Natural-language processing; Automated learning; Ontologies
Characterizing the biomolecular systems’ properties underpinning prognosis signatures derived from gene expression profiles remains a key clinical and biological challenge. In breast cancer, while different “poor-prognosis” sets of genes have predicted patient survival outcome equally well in independent cohorts, these prognostic signatures have surprisingly little genetic overlap. We examine ten such published expression-based signatures that are predictors of distinct breast cancer phenotypes, uncover their mechanistic interconnectivity through a protein-protein interaction network, and introduce a novel cross-“gene expression signature” analysis method using (i) domain knowledge to constrain multiple comparisons in mechanistically relevant single-gene network interactions, and (ii) scale-free permutation resampling to statistically control for hubness (SPAN - Single Protein Analysis of Network with constant node degree per protein). At adjusted p-values < 5%, 54 genes thus identified have a significantly greater connectivity than those obtained through meticulous permutation resampling of the context-constrained network. More importantly, eight of ten genetically non-overlapping signatures are connected through well-established mechanisms of breast cancer oncogenesis and progression. Gene Ontology enrichment studies demonstrate common markers of cell cycle regulation. Kaplan-Meier analysis of three independent historical gene expression sets confirms this network-signature’s inherent ability to identify “poor outcome” in ER(+) patients without the requirement of machine learning. We provide a novel demonstration that genetically distinct prognosis signatures, developed from independent clinical datasets, occupy overlapping prognostic space of breast cancer via shared mechanisms that are mediated by genetically different yet mechanistically comparable interactions among proteins of differentially expressed genes in the signatures. 
This is the first study employing a network approach to aggregate established gene expression signatures in order to develop a phenotype/pathway-based cancer roadmap with the potential for (i) novel drug development applications and (ii) facilitating the clinical deployment of prognostic gene signatures with improved mechanistic understanding of the biological processes and functions associated with gene expression changes. http://www.lussierlab.org/publication/networksignature/
systems biology; protein-interaction networks; breast cancer; gene signatures; context-constrained networks
Kinase inhibition is an increasingly popular strategy for pharmacotherapy of human diseases. Although many of these agents have been described as “targeted therapy”, they will typically inhibit multiple kinases with varying potency. Pre-clinical model testing has not predicted the numerous significant toxicities identified during clinical development. The purpose of this study was to develop a bioinformatics-based method to predict specific adverse events (AEs) in humans associated with the inhibition of particular kinase targets (KTs).
The AE frequencies of protein kinase inhibitors (PKIs) were curated from three sources (PubMed, Thompson Physician Desk Reference and PharmGKB), and affinities of 38 PKIs for 317 kinases, representing > 50% of the predicted human kinome, were collected from published in vitro assay results. A novel quantitative computational method was developed to predict associations between KTs and AEs; it covered a whole panel of 71 AEs and 20 PKIs targeting 266 distinct kinases with Kd < 10 μM. The method calculated an unbiased, kinome-wide association score via linear algebra on (i) the normalized frequencies of AEs associated with 20 PKIs and (ii) the negative log-transformed dissociation constants of the kinases targeted by these PKIs. Finally, a reference standard was calculated by applying Fisher’s exact test to the co-occurrence of indexed PubMed terms (p≤0.05, and manually verified) for AE and associated kinase target (AE-KT) pairs from standard literature search techniques. We also evaluated the enrichment of predictions between the quantitative method and the literature search using Fisher’s exact test.
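One way to read the "linear algebra" step is as a matrix product between the AE-by-PKI frequency matrix and the PKI-by-kinase affinity matrix. The sketch below illustrates that reading only; the abstract does not give the exact formula, so the product form, the use of log base 10, the zero contribution for unbound kinases, and all names and numbers here are assumptions.

```python
import math

def association_scores(ae_freq, kd_molar):
    """Illustrative AE-by-kinase score matrix (assumed form, not the published formula).

    ae_freq[a][d]  : normalized frequency of adverse event a for drug d
    kd_molar[d][k] : dissociation constant (mol/L) of drug d for kinase k,
                     or None when no binding was measured
    Returns scores[a][k] = sum over drugs d of ae_freq[a][d] * (-log10 Kd[d][k]).
    """
    # Negative log-transformed Kd; unbound pairs contribute 0 (assumption).
    affinity = [
        [0.0 if kd is None else -math.log10(kd) for kd in row]
        for row in kd_molar
    ]
    n_kinases = len(affinity[0])
    return [
        [
            sum(freq_row[d] * affinity[d][k] for d in range(len(affinity)))
            for k in range(n_kinases)
        ]
        for freq_row in ae_freq
    ]

# Toy input: 2 AEs x 3 drugs, and 3 drugs x 2 kinases (made-up values).
toy_ae_freq = [[0.5, 0.0, 0.5],
               [0.0, 1.0, 0.0]]
toy_kd = [[1e-9, None],
          [None, 1e-6],
          [1e-7, 1e-6]]
toy_scores = association_scores(toy_ae_freq, toy_kd)
```

Under this reading, an AE scores highly against a kinase when the drugs that cause the AE often are also the drugs that bind the kinase tightly.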
We identified significant associations among AEs (e.g. diarrhea and rash) and KTs (e.g. EGFR) whose pairing is already empirically well established. The following less well recognized AE-KT pairs had similar association scores: diarrhea-(DDR1; ERBB4), rash-ERBB4, and fatigue-(CSF1R; KIT). With no filtering, the association score identified 41 prioritized associations involving 7 AEs and 19 KTs. Among them, 8 associations were reported in the literature review. Only 78 out of a total of 4,522 AE-KT pairs met the evaluation threshold, indicating a strong association between the predicted and the text-mined AE-KT pairs (p = 3×10⁻⁷). As many of these drugs remain in development, a larger volume of more detailed data on AE-PKI associations is accessible only through non-public databases. These prediction models will be refined with these data and validated through dedicated prospective human studies.
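The enrichment between predicted and text-mined pairs can be illustrated with a one-sided Fisher's exact test, which is the upper tail of a hypergeometric distribution. The sketch below assumes the contingency table is built from the counts quoted above (8 overlapping pairs, 41 predictions, 78 literature pairs, 4,522 pairs in total); the abstract does not state the exact table or sidedness, so treat both as assumptions.

```python
from math import comb

def fisher_exact_greater(overlap, n_predicted, n_reference, n_total):
    """One-sided Fisher's exact test p-value: P(X >= overlap).

    X is the number of reference pairs found among n_predicted pairs drawn
    without replacement from n_total pairs, n_reference of which are in the
    reference standard (hypergeometric upper tail).
    """
    denom = comb(n_total, n_predicted)
    upper = min(n_predicted, n_reference)
    tail = sum(
        comb(n_reference, k) * comb(n_total - n_reference, n_predicted - k)
        for k in range(overlap, upper + 1)
    )
    # Exact integer arithmetic; converted to float only at the end.
    return tail / denom

# Counts taken from the text; how they map onto the table is an assumption.
p_value = fisher_exact_greater(overlap=8, n_predicted=41,
                               n_reference=78, n_total=4522)
print(f"one-sided p = {p_value:.2e}")
```

With fewer than one overlapping pair expected by chance (41 × 78 / 4,522), observing 8 gives a very small p-value.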
Conclusion and future directions
Our in silico method can predict associations between kinase targets and AE frequencies in human patients. Refining this method should lead to improved clinical development of protein kinase inhibitors, a large new class of therapeutics.
Adverse event; toxicity; kinome; kinase inhibitor; computational modeling; translational bioinformatics
Nearly a decade since the completion of the first draft of the human genome, the biomedical community is positioned to usher in a new era of scientific inquiry that links fundamental biological insights with clinical knowledge. Accordingly, holistic approaches are needed to develop and assess hypotheses that incorporate genotypic, phenotypic, and environmental knowledge. This perspective presents translational bioinformatics as a discipline that builds on the successes of bioinformatics and health informatics for the study of complex diseases. The early successes of translational bioinformatics are indicative of the potential to achieve the promise of the Human Genome Project for gaining deeper insights into the genetic underpinnings of disease and for progressing toward the development of a new generation of therapies.
Translational bioinformatics; systems medicine; systems biology; bioinformatics; biomedical informatics; knowledge representation; information retrieval; phylogenetics; modeling physiologic and disease processes; linking the genotype and phenotype; identifying genome and protein structure and function; visualization of data and knowledge; simulation of complex systems (at all levels: molecules to work groups to organizations); knowledge representations; uncertain reasoning and decision theory; languages; computational methods; statistical analysis of large datasets; advanced algorithms; discovery; text and data mining methods; natural-language processing; automated learning; ontologies
The complex regulatory network between microRNAs and gene expression remains an unclear domain of active research. We proposed to address this complex regulation in part with a novel approach for the genome-wide identification of biomodules derived from paired microRNA and mRNA profiles, which could reveal correlations associated with a complex network of de-regulation in human cancer. Two published expression datasets for 68 samples with 11 distinct types of epithelial cancers and 21 samples of normal tissues were used, containing microRNA expression (Lu et al. Nature Letters 2005) and gene expression (Ramaswamy et al. PNAS 2001) profiles, respectively. We found that microRNA expression used jointly with mRNA expression provides better classifiers of epithelial cancers against normal epithelial tissue than either dataset alone (p=1×10⁻¹⁰, F-Test). We identified a combination of six microRNA-mRNA biomodules that optimally classified epithelial cancers from normal epithelial tissue (total accuracy = 93.3%; 95% confidence interval: 86% - 97%), using a penalized logistic regression (PLR) algorithm and three-fold cross-validation. Three of these biomodules are individually sufficient to cluster epithelial cancers from normal tissue using mutual information distance. The biomodules contain 10 distinct microRNAs and 98 distinct genes, including well known tumor markers such as miR-15a, miR-30e, IRAK1, TGFBR2, DUSP16, CDC25B and PDCD2. In addition, there is a significant enrichment (Fisher’s exact test p=3×10⁻¹⁰) between putative microRNA-target gene pairs reported in five microRNA target databases and the inversely correlated microRNA-mRNA pairs in the biomodules. Further, microRNAs and genes in the biomodules were found in abstracts mentioning epithelial cancers (Fisher’s exact test, unadjusted p<0.05).
Taken together, these results strongly suggest that the discovered microRNA-mRNA biomodules correspond to regulatory mechanisms common to human epithelial cancer samples. In conclusion, we developed and evaluated a novel comprehensive method to systematically identify, on a genome scale, microRNA-mRNA expression biomodules common to distinct cancers of the same tissue. These biomodules also comprise novel microRNA and genes as well as an imputed regulatory network, which may accelerate the work of cancer biologists as large regulatory maps of cancers can be drawn efficiently for hypothesis generation.
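The notion of inversely correlated microRNA-mRNA pairs can be illustrated with a small sketch. It uses Pearson correlation and a fixed cutoff purely for illustration; the abstract does not say which correlation measure or threshold was used, and the identifiers and expression values below are placeholders, not data from the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def inverse_pairs(mirna_profiles, mrna_profiles, cutoff=-0.8):
    """List (microRNA, gene, r) for pairs whose correlation is at or below cutoff.

    The cutoff value is a hypothetical choice for this sketch.
    """
    pairs = []
    for mirna, mirna_values in mirna_profiles.items():
        for gene, gene_values in mrna_profiles.items():
            r = pearson(mirna_values, gene_values)
            if r <= cutoff:
                pairs.append((mirna, gene, r))
    return pairs

# Placeholder profiles measured across the same four samples.
toy_mirna = {"miR-A": [1.0, 2.0, 3.0, 4.0]}
toy_mrna = {"GENE1": [4.0, 3.0, 2.0, 1.0],   # falls as miR-A rises
            "GENE2": [1.0, 2.0, 3.0, 5.0]}   # rises with miR-A
toy_pairs = inverse_pairs(toy_mirna, toy_mrna)
```

Pairs retained this way are the ones that would then be compared against target-prediction databases for enrichment.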
biomodule; microRNA expression; gene expression; cancer; diagnosis
We implemented an end-to-end notification system that pushed urgent clinical laboratory results to Blackberry 7510 devices over the Nextel cellular network. We designed our system to use user roles and notification policies to abstract and execute clinical notification procedures. We anticipated some problems with dropped and non-delivered messages when the device was out-of-network; however, we did not expect the same problems in other situations, such as device reconnection to the network. We addressed these problems by creating cascading “fault tolerance” policies to drive notification escalation when messages timed out or delivery failed. This paper describes our experience in providing an adaptable, fault-tolerant pervasive notification system for delivering secure, critical, time-sensitive patient laboratory results.
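A cascading escalation policy of this kind can be sketched as an ordered list of roles, each with a timeout, where a failed or unacknowledged delivery moves on to the next role. This is a minimal illustration, not the system described above: the role names, timeouts, and the `send` callback are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Step:
    role: str        # hypothetical user role to notify
    timeout_s: int   # how long to wait for an acknowledgement

def notify(policy, send):
    """Walk the policy in order and escalate on failure or timeout.

    send(role, timeout_s) is a caller-supplied function that returns True when
    the message was acknowledged in time, returns False on timeout, and may
    raise ConnectionError when delivery fails (e.g. device out of network).
    Returns (acknowledging role or None, list of roles attempted).
    """
    attempted = []
    for step in policy:
        attempted.append(step.role)
        try:
            if send(step.role, step.timeout_s):
                return step.role, attempted
        except ConnectionError:
            pass  # delivery failed: fall through to the next step
    return None, attempted

# Hypothetical policy and a fake transport for demonstration.
policy = [Step("resident", 300), Step("attending", 300), Step("charge_nurse", 600)]

def fake_send(role, timeout_s):
    if role == "resident":
        raise ConnectionError("device out of network")
    return role == "attending"

acknowledged_by, attempted = notify(policy, fake_send)
```

Keeping the policy as data separates the clinical escalation rules from the delivery mechanism, which is the kind of abstraction the role and policy design aims for.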
Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system that leverage cloud-based techniques such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN.
Mouse xenograft models, in which human cancer cells are implanted in immune-suppressed mice, have been popular for studying the mechanisms of novel therapeutic targets, tumor progression and metastasis. We hypothesized that we could exploit the interspecies genetic differences in these experiments. Our purpose is to elucidate stromal microenvironment signals from probes on human arrays unintentionally cross-hybridizing with mouse homologous genes in xenograft tumor models.
By identifying cross-species hybridizing probes on human whole-genome arrays from sequence alignment and cross-species hybridization experiments, deregulated stromal genes can be identified and their biological significance predicted from enrichment studies. Comparing these results with those found by laser capture microdissection of stromal cells from tumor specimens resulted in the discovery of significantly enriched stromal biological processes.
Using this method, in addition to their primary endpoints, researchers can leverage xenograft experiments to better characterize the tumor microenvironment without additional costs. The Xhyb probes and R script are available at http://www.lussierlab.org/publications/Stroma