Over the past 50 years, like molecular cell biology, medicine and pharmacology have been driven by a reductionist approach. The focus on individual genes and cellular components as disease loci and drug targets has been a necessary step in understanding the basic mechanisms underlying tissue/organ physiology and drug action. Recent progress in genomics and proteomics, as well as advances in other technologies that enable large-scale data gathering and computational approaches, is providing new knowledge of both normal and disease states. Systems-biology approaches enable integration of knowledge from different types of data for precision medicine and systems therapeutics. In this review, we describe recent studies that contribute to these emerging fields and discuss how together these fields can lead to a mechanism-based therapy for individual patients.
Disease diagnosis and classification, drug development, and patient treatment during the molecular biology era were based on the implicit assumption that diseases arise from the malfunction of one or a limited set of genes/proteins and that targeting these proteins will be therapeutically useful. Most often, proteins that serve as drug targets are important receptors or enzymes. Such approaches have been highly successful, as demonstrated by the work of James Black. He developed propranolol, a β-adrenergic receptor antagonist, as a drug to treat hypertension, and cimetidine, a histamine-2 receptor antagonist, to treat ulcers.1 In addition, some diseases can be accounted for by a defect in a single protein. For example, Fabry’s disease arises from a defect in the α-galactosidase gene. Replacing this protein “cures” the disease.2 However, as we have developed a greater understanding of the molecular and cell biological processes underlying many physiological processes, it has become clear that many complex physiologies and pathophysiologies arise from a collection of molecular defects in multiple regulatory pathways. Such molecular and regulatory complexities have been best demonstrated in cancers; however, they are likely to occur in many other diseases as well. A recognition of the multiplicity of molecular characteristics underlying the same or very similar pathophysiologies has led to calls for a new taxonomy for disease based on detailed individual characteristics of the patients, at both the molecular and the clinical levels, rather than on a purely symptomatic basis.
A recent Institute of Medicine report postulates that such a taxonomy can lead to precision medicine in which diagnosis is based on individual characteristics that may fall into a limited number of sets of characteristics that result in the same pathophysiology, and therapy—most often with drugs—can take a computationally predictable course with maximal therapeutic efficiency and minimal adverse side effects.3 A good example of precision medicine is a recent study by the Cancer Genome Atlas Network on human breast cancer.4 This network of scientists used multiple types of data, including mutations, copy-number variations, DNA methylation patterns, gene expression patterns, microRNA patterns, and protein arrays for changes in protein and phosphoprotein levels, to classify 507 patients into four types of breast cancer. This ability to bin a large number of patients into a small number of states of disease will enable clinicians to devise therapy that will be more efficacious in the context of the genomic, epigenomic, and other characteristics of individual patients.
For precision medicine to become reality, the new approaches in diagnosing and classifying diseases need to be accompanied by the knowledge of drug action and new drugs that can be used to treat patients based on their genomic characteristics and environmental history in an efficacious manner. Drug usage remains largely empirical based on correlations between clinical characteristics and treatment outcomes. In addition, drug development has not fully kept up with the enormous increase in knowledge at the molecular level. In recent years, the development and approval by the US Food and Drug Administration (FDA) of new drugs has declined steadily. Although the human proteome is based on nearly 25,000 genes, only ~400 gene products are targeted by ~1,200 FDA-approved drugs.5 The drug development process often fails because of poor efficacy in humans or unexpected severe side effects6 discovered during late and expensive phase III trials. Sometimes rare but severe adverse events are found only after the drug is brought to the market. These experiences suggest that the low-hanging fruits in drug discovery and action seem to have been largely collected, and new strategies for drug development and drug usage are needed. The one-drug-for-all approach (i.e., the blockbuster drug model) is likely to be replaced by a more personalized drug treatment approach based on the genomic/epigenomic status of the patient as well as environmental exposure. For this type of treatment to occur, we need tight integration of our mechanistic understanding of disease progression with knowledge of drug action in individuals whose pathophysiology is correlated with genomic and epigenomic status. To obtain such integrated knowledge, the myriad of experimental and computational approaches used in the field of systems biology will be necessary.
Systems pharmacology takes a broad view of drug action,7 wherein targets for drugs are considered as part of a network of molecular interactions. Such networks include protein–protein, protein–nucleic acid, protein–lipid, and protein–sugar interactions. This holistic view allows us to understand how both therapeutic and adverse events arise as physiological consequences of drug targets participating in molecular networks. To understand and predict drug action from a network perspective, the same genomic and proteomic tools used in precision medicine are needed. Therefore, systems pharmacology is likely to be an essential part of precision medicine. Ideally, if precision medicine works as intended, the physician should, on the basis of a molecules-to-physiology (genotype-to-phenotype) understanding of the disease dynamics, be able to appropriately identify the stage of the disease (the precision medicine part) and use this knowledge to computationally predict the dose for the most efficacious drug(s) to maximize the therapeutic effect and minimize adverse events (the systems-therapeutics part). Thus, precision medicine represents a precise multiscale definition of what is now widely called personalized medicine.
The continuing advances and decreasing prices of DNA sequencing technologies have led to a substantial increase in large-scale data gathering focused on whole-genome characterizations of copy-number variations, single-nucleotide polymorphisms (SNPs), mutations and DNA methylation, and microRNA profiling.8–12 In addition, with the increasing usage of mRNA-Seq to profile gene expression patterns in greater depth and accuracy on a genome-wide level, the ability to relate changes in mRNA expression patterns to genomic mechanisms is becoming increasingly feasible. These technologies that produce large comprehensive data sets enabling genome-wide characterization have been reviewed elsewhere,13–15 whereas here we focus on computational approaches. Statistical analysis and graph theory models allow us to use the data obtained from these genome-wide experiments to build the knowledge base for precision medicine and systems pharmacology and therapeutics. Dynamical models that incorporate genomic, epigenomic, and protein expression characteristics of individual patients will enable us to predict drug efficacy, resistance, and adverse events.
From a general perspective, there are several issues that we need to address both at a conceptual level and at the operational level, if systems approaches are to meaningfully affect the treatment of complex diseases:
We address these questions in this review. As with other emerging areas, the picture is incomplete, and the studies we describe are largely independent of each other, so they do not always provide a coherent, connected picture. Indeed, we suspect many of the authors of these studies may not think of themselves as working in either precision medicine or systems therapeutics. Nevertheless, we think we can see the contours of the emergence of these interconnected fields. Computation is the glue that ties the two fields together. Recent studies that use computational and network biology approaches to analyze experimental and clinical data and to develop hypotheses indicate that computational models may be very valuable in drug treatment optimization based on patient-specific clinical variables and molecular signatures. As we provide an overview of the current status of these emerging fields by describing studies that contribute to the formulation of key concepts that are used as section headings, it will become increasingly obvious to the reader that genomics and proteomics, precision medicine, and systems pharmacology represent a continuum within the spectrum of multiscale understanding of disease and therapeutics. In Figure 1, we show a schematic overview of how experimental approaches that provide data on the characteristics of the genome, transcriptome, and proteome, when combined with clinical characteristics and prior knowledge through network analysis, can yield a library of human disease networks that can be used to classify disease states, as was recently done by the TCGA consortium for breast cancer4 and other cancers. Such classification can then be used to generate an individual patient–relevant network that can be used for personalized prediction of therapeutic strategies.
One key concept in systems biology relevant to both precision medicine and systems therapeutics is modularity. A decade of data support the idea that biological mechanisms have moved from molecular to modular.16 Signaling pathways such as the β-adrenergic receptor pathway from receptor to the transcription factor cAMP responsive element binding protein or L-type calcium channel are modules that are most readily recognized. But other types of modules, such as the proteins involved in transcriptional regulation or actin cytoskeletal dynamics, have also been identified. Alterations in the activities of such functional modules are often associated with disease. All of these modules are interconnected and together they form a large network that we call the interactome. Networks are representations of systems of components (nodes in mathematical parlance) and interactions (edges). This type of representation allows us to computationally analyze the system. The branch of mathematics involved in the analysis of networks is called graph theory.
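To make the node-and-edge representation concrete, the following is a minimal sketch of a network stored as an adjacency map. The function names (`build_network`, `degree`) and the toy edge list are illustrative assumptions; the edges loosely follow the β-adrenergic receptor module mentioned above and are not curated interaction data.

```python
from collections import defaultdict


def build_network(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree(adjacency):
    """Count the interaction partners (edges) of every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


# Illustrative toy edges only (assumption, not taken from an interaction database).
toy_edges = [
    ("receptor", "G_protein"),
    ("G_protein", "adenylyl_cyclase"),
    ("adenylyl_cyclase", "PKA"),
    ("PKA", "CREB"),
    ("PKA", "L_type_Ca_channel"),
]

network = build_network(toy_edges)
node_degrees = degree(network)
print(node_degrees)
```

Once a system is stored this way, graph-theoretical quantities such as node degree can be computed directly from the adjacency map.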
When all of the components in such a network are proteins, they are called protein–protein interaction networks. This is a somewhat artificial representation, because in a cell the interactome includes proteins, nucleic acids (both DNA and RNA), lipids, sugars, and ions. Nevertheless, from an operational standpoint, the protein–protein interaction networks are useful because they allow us to track information flow from genotype to phenotypic functions. At the level of protein–protein interactions, the interactome represents the sum of all the known interactions (chemical reactions and protein binding) between proteins. Here, we focus on human proteins. Currently, the human interactome contains some 13,000 nodes (proteins) and 80,000 edges (interactions), but these numbers vary quite a bit based on the included databases. A definitive human protein–protein interaction network is an important future goal.
An interactome consists of many functional modules that are linked to each other, and proteins can function as constitutive members of a functional module, e.g., as part of a protein complex required for function such as the proteasome for regulated proteolysis or the Arp2/3 complex for actin cytoskeleton formation, and/or as nodes mediating the communication between different modules such as E3 ligases for the proteasome or neuronal Wiskott-Aldrich syndrome protein for actin cytoskeleton. Constitutive members of a module often show correlated gene expression with their protein interaction partners, whereas gene expression of communication or regulatory proteins would correlate less with gene expression of their neighbors. The combination of coexpressed genes under a defined pathophysiological condition and placing the differentially expressed protein in the context of its interacting patterns offer a broad approach to characterizing the different proteins in a specific cellular function or pathological process.
Components within such networks fall into two categories: nodes with few interactions and nodes with many interactions; the latter are called hubs, which are further classified as party and date hubs.17,18 Although the names are lighthearted, such classification provides mechanistic clues to how these different types of nodes function. Party hubs can be thought of as nucleating centers that bring a functional module together. Date hubs serve as connectors between modules. The average measure of correlation between a hub and each of its neighbors for coexpression at the mRNA level should be higher for party hubs, given that they are at the center of a functional module, and lower for date hubs, which serve as connectors between functional modules. Consistent with this prediction, the correlation coefficient of yeast17 and human hubs18 showed a bimodal or multimodal distribution, allowing the separation of date or intermodular and party or intramodular hubs. Network connectivity can be more readily disrupted by the removal of date hubs than by the removal of party hubs, supporting the proposed function of date hubs to organize different functional modules.
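The distinction between party and date hubs can be sketched as an average Pearson correlation between a hub's expression profile and those of its neighbors. This is a minimal illustration: the fixed cutoff of 0.5 is an assumption standing in for the separation point of the bimodal distribution, and the expression values are invented toy numbers.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average_hub_correlation(hub, neighbors, expression):
    """Mean correlation between a hub and each of its interaction partners."""
    return mean(pearson(expression[hub], expression[n]) for n in neighbors)


def classify_hub(avg_pcc, threshold=0.5):
    """Label a hub; the threshold is a hypothetical cutoff, not a published value."""
    return "party" if avg_pcc >= threshold else "date"


# Invented expression profiles across five conditions (assumption).
expression = {
    "hub_A": [1, 2, 3, 4, 5],
    "a1": [2, 4, 6, 8, 10],
    "a2": [1.1, 2.0, 3.2, 3.9, 5.1],
    "hub_B": [1, 2, 3, 4, 5],
    "b1": [5, 1, 4, 2, 3],
    "b2": [2, 5, 1, 4, 3],
}

label_a = classify_hub(average_hub_correlation("hub_A", ["a1", "a2"], expression))
label_b = classify_hub(average_hub_correlation("hub_B", ["b1", "b2"], expression))
print(label_a, label_b)
```

In practice the cutoff would be read off the observed distribution of average correlations rather than fixed in advance.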
Understanding this type of organization within the human interactome is important for the identification of disease states. Module change during disease progression occurs in breast cancer.18 In a cohort of patients with sporadic breast cancer,19 hub modularity was investigated in patients who were disease-free after extended follow-up and in those with poor prognosis.18 A total of 256 hubs showed a changed modularity in patients with poor prognosis, indicating a functional disorganization of the network modules. The 256 hubs with differing modularity in patients with poor prognosis form an interconnected network in the interactome, which is enriched for functional groups known to be misregulated in breast cancer. For example, the expression of the hub protein breast cancer susceptibility protein 1 (BRCA1) and its neighbors Meiotic recombination homolog (MRE11) and breast cancer susceptibility protein 2 (BRCA2), all three members of the BRCA1-associated genome surveillance complex, correlated in patients with good prognosis, but not in patients with poor prognosis. Therefore, it appears that the analysis of modularity can be used to characterize disease states. The origins and progression of diseases might not only involve a change in the activity of individual pathways (i.e., modules) but could also involve the re- or disorganization of functional modules. Successful treatment might depend on the ability to go back to a prior network organization or to go to a new organization characteristic of normal physiology.
Even when genome-wide gene expression data associated with disease phenotype are not available, the human interactome offers an opportunity to identify disease modules on a genome-wide basis. These modules are parts of the larger networks. These parts are called subnetworks (or neighborhoods) and are enriched for proteins involved in the disease pathogenesis.20,21 Network expansion methods assume that a functional or disease module consists of a locally dense neighborhood in the interactome, such that its proteins are more linked with each other than with other proteins. Using validated disease proteins from candidate disease gene experiments as starting points (seed nodes), network expansion methods can be used to identify and characterize neighborhoods containing proteins with a high probability of acting as drivers of disease. Two commonly used methods for network expansion are nearest-neighbor expansion and random walk–based approaches.
The first expansion method is based on the assumption that proteins directly interacting with disease-associated proteins are likely to cause a similar disease phenotype. On the basis of a seed list of 70 Alzheimer’s disease–related proteins curated from the OMIM database, a disease-related subnetwork was developed.22 Among the top 20–ranked nodes in this subnetwork, 19 are already known to be involved in Alzheimer’s disease, indicating a high predictive power of the method. Ranked at position 18, β1-catenin (CTNNB1) has not been previously associated with Alzheimer’s disease. CTNNB1 is a part of the WNT signaling pathway, which is a target of amyloid β toxicity,23,24 explaining a possible involvement of CTNNB1 in Alzheimer’s disease.
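A nearest-neighbor expansion can be sketched as follows. The ranking rule used here (number of direct links to seed nodes) is an assumption chosen for simplicity and is not necessarily the scoring used in the cited study; the node names are placeholders.

```python
def nearest_neighbor_expansion(adjacency, seeds):
    """Rank non-seed proteins by how many seed proteins they interact with."""
    seeds = set(seeds)
    scores = {}
    for seed in seeds:
        for neighbor in adjacency.get(seed, ()):
            if neighbor not in seeds:
                scores[neighbor] = scores.get(neighbor, 0) + 1
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder interactome: S1-S3 are seed (disease) proteins, A-D are candidates.
toy_interactome = {
    "S1": {"A", "B"},
    "S2": {"A", "C"},
    "S3": {"A", "B"},
    "A": {"S1", "S2", "S3"},
    "B": {"S1", "S3", "D"},
    "C": {"S2"},
    "D": {"B"},
}

candidates = nearest_neighbor_expansion(toy_interactome, ["S1", "S2", "S3"])
print(candidates)
```

Node D is not returned because it has no direct interaction with a seed, which reflects the first-neighbor restriction of this method.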
The second network expansion method—a random walk–based method called the mean first-passage time—has been used to define a disease neighborhood for the long-QT syndrome (LQTS).25 LQTS is a congenital or drug-induced cardiac pathophysiology characterized by a prolonged QT interval that is associated with an increased risk for the development of torsade de pointes tachycardia and fatal arrhythmias. Twelve genes with known causative mutations for LQTS, most of them coding for ion channels, and one gene annotated with reduced susceptibility have been identified so far. These 13 genes were used as seed nodes to define a subnetwork in the human interactome using an algorithm based on the mean first-passage time, which calculates the functional distance between any two nodes. The mean first-passage time computations showed that the LQTS neighborhood is enriched for protein targets of drugs listed on the Arizona Center for Education and Research on Therapeutics—a database that describes the LQTS adverse-event potential associated with various drugs. Thus, network analysis allows us to identify the common basis of how drugs or mutated genes can cause the same pathophysiology.
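The mean first-passage time between two nodes is the expected number of steps a random walker needs to reach a target node for the first time. The sketch below computes it for an unweighted, connected network by solving the linear system m(i) = 1 + (1/deg i) Σ m(j) over the neighbors j of i, with m(target) = 0. How the cited study combined distances across the 13 seed nodes is not described here, so this example handles a single target only, and the function name is illustrative.

```python
def mean_first_passage_times(adjacency, target):
    """Expected random-walk steps from every node to `target` (connected graph)."""
    nodes = [n for n in adjacency if n != target]
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    # Augmented matrix for m(i) - sum_j P(i, j) * m(j) = 1, with m(target) = 0.
    rows = [[0.0] * (size + 1) for _ in range(size)]
    for node in nodes:
        i = index[node]
        rows[i][i] += 1.0
        rows[i][size] = 1.0
        step = 1.0 / len(adjacency[node])  # uniform choice among neighbors
        for neighbor in adjacency[node]:
            if neighbor != target:
                rows[i][index[neighbor]] -= step
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0.0:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    times = {node: rows[index[node]][size] for node in nodes}
    times[target] = 0.0
    return times


# Three-node path A - B - C; the walker must reach C.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
times = mean_first_passage_times(path, "C")
print(times)
```

Nodes with a short mean first-passage time to a seed are functionally close to it even when they are not direct interaction partners, which is what distinguishes this approach from nearest-neighbor expansion.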
This study also tested the ability of network analysis to predict new drug targets that could be associated with a rare occurrence of LQTS. To identify new drugs associated with LQTS as adverse events, the FDA Adverse Event Reporting System database was screened for drugs that were reported to be associated with LQTS at least once during monotherapy. The targets of these drugs are enriched in the LQTS neighborhood. This type of enrichment analysis suggests that when a drug has a target in the LQTS neighborhood, its usage should be monitored more closely.
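An enrichment of drug targets in a disease neighborhood can be quantified in several ways. One common choice, assumed here and not confirmed as the statistic used in the cited study, is a one-sided hypergeometric test: given the size of the interactome, the size of the neighborhood, and the number of drug targets, how likely is an overlap at least as large as the one observed? The counts in the example are invented.

```python
from math import comb


def hypergeometric_enrichment_p(population, neighborhood, drug_targets, overlap):
    """P(X >= overlap) when `drug_targets` proteins are drawn without replacement."""
    total = comb(population, drug_targets)
    upper = min(neighborhood, drug_targets)
    tail = sum(
        comb(neighborhood, k) * comb(population - neighborhood, drug_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Invented counts: 10 proteins, 4 in the neighborhood, 3 drug targets, 2 overlap.
p_value = hypergeometric_enrichment_p(
    population=10, neighborhood=4, drug_targets=3, overlap=2
)
print(round(p_value, 4))
```

A small p-value would indicate that the targets fall into the neighborhood more often than expected by chance.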
Building such pathophysiology-focused networks can be a powerful tool for understanding nonobvious convergence such as interactions that result in a shared propensity to evoke the same adverse events. For example, dasatinib, a cancer drug (a tyrosine kinase inhibitor), and loperamide, an antidiarrhea drug, are both associated with LQTS in the Adverse Event Reporting System. How might this happen? Dasatinib could be connected to the LQTS disease gene KCNQ1 through protein kinase C and Src, and the antidiarrhea drug loperamide, which is an opiate agonist but also binds to calmodulin, is connected to KCNH2 through calmodulin and protein kinase C. Therefore, building disease modules can help us understand how two drugs used to treat two very different pathophysiologies could cause the same adverse event, in one case through its intended target and in the other through an off-target (Figure 2).
Several recent studies using experimental systems-biology techniques and the concepts of modularity that arise from network analysis provide substantial new insights into drug-resistance mechanisms. We focus on tyrosine kinase inhibitors, as they have become a major class of therapeutics targeted at cancers. Notable successes include the treatment of the BCR-ABL protein kinase translocation–positive chronic myeloid leukemia with the tyrosine kinase inhibitor imatinib.26 However, single protein kinase targeting often causes drug resistance after an initial treatment period. Studies focused on developing a mechanistic understanding of the emergence of drug resistance have helped define new disease states as well as strategies to overcome resistance to drugs. One study by Johnson and co-workers investigated the protein kinase activity profiles in two triple-negative breast cancer27 cell lines before and after inhibition of mitogen-activated protein kinase (MAPK)/extracellular signal-regulated kinase (ERK) kinase (MEK), a protein kinase in the ERK pathway.28 Exposure to MEK inhibitor caused initial MAPK1,2 inhibition, leading to growth reduction. Prolonged treatment resulted in resistance, with the MEK inhibitor no longer suppressing proliferation. Increased activity of multiple receptor tyrosine kinases (RTKs) such as platelet-derived growth factor receptor β, vascular endothelial growth factor receptor 2, and Axl receptor tyrosine kinase emerged and reactivated MEK2, but not MEK1 activity, and also activated the AKT and mechanistic target of rapamycin pathways. The upregulation of RTK activities was shown to be the result of increased proteasomal degradation of the transcriptional repressor c-Myc following decreased ERK-mediated phosphorylation of c-Myc. In consequence, transcriptional repression of the receptor tyrosine kinase genes is relieved. This results in multiple new RTKs and a reconfigured network.
The naive, drug-inhibited, and reconfigured networks are shown in Figure 3. Consequently, the combined exposure with MEK inhibitor and the broad-spectrum tyrosine kinase inhibitor sorafenib significantly increased growth inhibition of these cancer cell lines and a mouse xenograft model, whereas inhibition of the tyrosine kinases alone had no effect.
Circumvention of protein kinase inhibition by the activation of RTKs has also been demonstrated by the ability of growth factors to rescue kinase inhibitor–treated cancer cell lines as a mechanism of pre-existing, i.e., immediate, resistance.29,30 Targeting the appropriate protein kinase in 40 kinase-addicted human cancer cell lines leads to growth reduction that can be overcome by adding four of six tested agonists for receptor tyrosine kinases.30 The most successful in driving resistance even as a single agent are hepatocyte growth factor (HGF), fibroblast growth factor, and neuregulin (NRG1). Another study has shown that innate, i.e., immediate, resistance against cytotoxic but particularly against targeted anticancer agents, is mediated by stromal cells in the tumor microenvironment.29 Innate vemurafenib resistance of v-Raf murine sarcoma viral oncogene homolog B1 (BRAF)V600E-mutant melanoma cells, for example, is driven by fibroblast-secreted HGF causing the sustained activation of AKT and ERK. In agreement with this observation, patients with high-plasma HGF levels30 and patients with stromal cell–secreted HGF as detected by immunohistochemistry of tumor biopsies29 showed a poorer response to vemurafenib treatment. The combined targeting of BRAF by vemurafenib and MET by crizotinib blocked HGF-driven rescue in cancer cell lines as well as in xenograft studies,29 highlighting how subnetwork reorganization can be overcome by polypharmacology. Although these studies did not explicitly use computational analyses, they demonstrate the value of network analysis–based reasoning to identify subnetworks and modules that help uncover the existence of distal unexpected loci that are involved in initial treatment failure, disease progression, and the appearance of drug resistance. A combination of experimental and computational approaches is likely to be needed for identifying such mechanisms, which can be depicted as disease state subnetworks as outlined in Figure 1.
A recent study that combines different types of data such as gene expression patterns and protein–protein interaction subnetworks to understand and predict disease progression and efficacy of drug treatment is a good example of how computational analysis can lead to the discovery of nonobvious mechanisms. Komurov et al.31 have focused on identifying changes in activities of intracellular processes based on changing gene expression patterns that occur during disease progression and onset of drug resistance. The changes in the expression levels of various genes were used as the basis for a network algorithm to identify new modules that emerge as the disease progresses. This random walk–based algorithm was used to identify the increased activity of the glucose deprivation response subnetwork as a drug-resistance mechanism that emerges during long-term treatment of breast cancer cells with lapatinib. This unbiased approach to discovering changes in cellular regulatory modules was useful because more conventional hypotheses for resistance development such as additional mutations, overexpression, or altered responses in the inhibited EGFR/ErbB2/ErbB3 signaling pathway could not be experimentally verified. The disease subnetwork characteristic of the drug-resistant state includes classical processes involved in the compensation of glucose deprivation, such as amino acid catabolism, glucagon signaling, glycogen breakdown, and unfolded protein response in the endoplasmic reticulum that is involved in reducing damage from reactive oxygen species.32 These findings indicate that lapatinib toxicity is likely to be associated with glucose uptake inhibition, causing oxidative stress. The clinical relevance of these observations was demonstrated in patients by correlating the expression of relevant genes with prognosis.
In agreement with the cell culture results, “unfolded protein response” genes are among the most enriched gene ontology processes in the resulting poor prognosis clinical subnetwork. To characterize possible treatment options, the Connectivity Map data set33 was screened for drugs that downregulate the upregulated genes in the glucose deprivation subnetwork and should therefore inhibit the subnetwork. The anthelmintic drug pyrvinium was identified as one such drug and was shown to cause significant growth arrest in resistant cells in comparison with nonresistant cells. It is noteworthy that statistical analysis of the gene expression data alone did not succeed in the identification of any of the four functional modules characterized in this study, including the glucose deprivation response network described here. The success in combining gene expression data with prior knowledge–based networks indicates that prior knowledge in the form of protein–protein interactions, metabolic, signaling, and other modules significantly complements unbiased high-throughput data for the precise molecular identification of disease states (Figure 1).
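The screening step described above can be sketched as a ranking of drug-induced expression signatures by how strongly they lower the genes that are upregulated in the disease subnetwork. The mean log fold change used here is a deliberate simplification and an assumption; the Connectivity Map itself uses its own rank-based scoring. The drug names, gene names, and values are invented placeholders.

```python
def rank_reversing_drugs(drug_signatures, upregulated_genes):
    """Sort drugs by mean expression change over the upregulated subnetwork genes.

    The most negative mean comes first, i.e., the strongest candidate for
    reversing the subnetwork's activity.
    """
    scores = {}
    for drug, signature in drug_signatures.items():
        changes = [signature[g] for g in upregulated_genes if g in signature]
        if changes:  # skip drugs with no measured subnetwork gene
            scores[drug] = sum(changes) / len(changes)
    return sorted(scores.items(), key=lambda item: item[1])


# Invented log fold changes per drug (assumption, not Connectivity Map data).
drug_signatures = {
    "drug_A": {"gene_1": -1.2, "gene_2": -0.8, "gene_3": -1.0},
    "drug_B": {"gene_1": 0.5, "gene_2": -0.1, "gene_3": 0.2},
    "drug_C": {"gene_1": -0.2, "gene_2": 0.1},
}
subnetwork_up = ["gene_1", "gene_2", "gene_3"]

ranking = rank_reversing_drugs(drug_signatures, subnetwork_up)
print(ranking)
```

A top-ranked compound is only a hypothesis at this stage and, as in the study described above, requires experimental confirmation in resistant cells.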
Another approach that has been used to identify disease state–relevant networks is the integration of gene coexpression networks with genotypic data such as SNPs describing quantitative trait loci (QTLs). Assuming that SNPs as part of the genome should be the same in all cells, including white blood cells, in humans, genotypic data not only aid in the characterization of disease networks but, once associated with a disease network, also offer an easily accessible biomarker set for genetic susceptibility toward a defined disease state.
Gene coexpression networks contain genes as nodes, and two genes are connected if their expression values highly correlate with each other under different conditions. Such conditions can be different phases of the cell cycle or different disease phenotypes. Different metabolic disease states as they were investigated in groups of mice or humans are described here. A QTL is a region of a chromosome that is associated with a trait, e.g., height or obesity. A QTL linked to the expression of a gene is called a cis- or trans-expression QTL, depending on whether the locus of the gene of interest is in the vicinity of the QTL. The association between the expression of a gene and a phenotypic trait with a QTL can occur for several reasons.34 In an independent relationship, the QTL affects the gene’s expression and the phenotypic trait independently; in a causal relationship, the QTL affects the gene’s expression, which in turn affects the phenotypic trait; and in a reactive relationship, the QTL affects the phenotypic trait, which then affects the gene’s expression. When the possibility of multiple QTLs and transcripts associated with each other is considered, many complicated relationships can occur. Sophisticated statistical analysis is needed to identify these complex relationships.34
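The construction of a gene coexpression network can be sketched as thresholding pairwise correlations. Both the cutoff of 0.8 and the use of the absolute correlation are assumptions made for illustration; published analyses often use more elaborate weighting and clustering schemes. The expression values are invented.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / spread


def coexpression_edges(expression, threshold=0.8):
    """Connect two genes when |correlation| across conditions reaches the cutoff."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, round(r, 3)))
    return edges


# Invented expression values across five conditions (assumption).
expression = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10],
    "g3": [5, 1, 4, 2, 3],
}

edges = coexpression_edges(expression)
print(edges)
```

Modules would then be obtained by clustering the resulting network, a step that is omitted here.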
Integration of genomic characteristics and gene expression network analyses has been used to characterize disease networks involved in metabolic syndrome and to identify an associated genomic biomarker set35–37 (Figure 4). The disease network was first identified in mice35 and then the mouse network was used to find a similar network in humans36 that was linked to a group of cis-expression QTLs as a predictive biomarker set for obesity, demonstrating the potential of this approach for the identification of biomarkers. Genomic, transcriptomic, and pathophysiological data were collected from 330 mice with various metabolic phenotypes.35 Animals were genotyped for ~1,300 SNPs, clinically characterized with regard to obesity, atherosclerosis, and other diabetes-related phenotypes, and gene expression was determined in the liver and adipose tissue. QTLs for the metabolic traits were identified on the distal part of chromosome 1.37 The gene expression values were used to build one liver and one adipose tissue gene coexpression network. Clustering of each of the two coexpression networks revealed several modules in each network. Each module consisted of genes that show a high pairwise correlation with the other genes in the same module. To determine which of these modules is relevant for metabolic diseases, the measured SNPs describing the QTLs for metabolic traits were integrated into the further analysis. Each module was checked for enrichment of genes with at least one causal relationship between the QTL and the corresponding metabolic trait. One adipose tissue and one liver module with ~50% gene overlap were both enriched for genes causally associated with each of the metabolic traits. These two modules were combined into one supermodule. The supermodule was significantly enriched in genes involved in immune defense and inflammatory responses and in genes associated with bone marrow–derived macrophages and spleen. 
Therefore, it most likely derived from a gene expression network in the liver and adipose tissue macrophages and was named macrophage-enriched metabolic network. The network, which could reflect differential gene expression in macrophages and/or changes in total macrophage numbers,38,39 contained many known genes involved in metabolic disease. Among the novel metabolic disease genes predicted by this approach and confirmed by gene ablation experiments in mice were lipoprotein lipase, β-lactamase, and protein phosphatase-1L.
The mouse macrophage-enriched metabolic network was used to characterize a similar disease module in humans. In a study of more than 600 adults of all ages and with a body mass index distribution matching the body mass index distribution of Western countries’ population, sex-specific gene coexpression modules were identified in subcutaneous fat tissue.36 Explicit comparison of overlapping genes between the human modules and the mouse macrophage-enriched metabolic network identified a human macrophage-enriched metabolic network. It was enriched for the same biological processes related to inflammatory responses and macrophage activation, and for genes associated with bone marrow–derived macrophages and spleen, thymus, and lymphoid tissue. Well-known macrophage surface markers such as EGF-like module containing mucin-like hormone receptor-like 1 and CD68 were also part of the human disease module. The expression of 98% of the human macrophage-enriched metabolic network genes significantly correlated with body mass index, indicating a key role of the module in obesity. To identify genotypic markers that could be determined in easily accessible white blood cells, the vicinity of each gene of this network was searched for that cis-expression SNP which most strongly associates with the expression of the gene. Although individual SNPs did not correlate with obesity, multiple linear regression analysis combined with a randomization approach revealed that the set of identified cis-expression SNPs significantly associated with body mass index.
Therefore, networks built from differentially expressed genes allow the mechanistic identification of genomic characteristics (in this case, SNPs) that are associated with phenotypes. This computational approach, which integrates the analysis of genotypes, gene expression networks, and clinical data, shows promise for the identification of biomarkers or biomarker sets for disease progression and drug response and can become part of a useful toolkit for precision medicine.
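The joint test described above, in which a set of SNPs is regressed against body mass index and significance is assessed by randomization, can be illustrated with a minimal sketch. It is not the analysis pipeline of the cited study: the cohort is synthetic, the effect sizes are arbitrary, and the function names (`r_squared`, `permutation_test`, `make_synthetic_cohort`) are hypothetical. The sketch fits an ordinary least-squares model on all SNPs at once and compares the observed R² with the R² values obtained after shuffling the phenotype.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(genotypes, phenotype):
    """R^2 of a least-squares fit of the phenotype on all SNPs jointly."""
    rows = [[1.0] + [float(g) for g in person] for person in genotypes]  # intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, phenotype)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    mean_y = sum(phenotype) / len(phenotype)
    ss_res = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, y in zip(rows, phenotype))
    ss_tot = sum((y - mean_y) ** 2 for y in phenotype)
    return 1.0 - ss_res / ss_tot


def permutation_test(genotypes, phenotype, n_perm=200, seed=0):
    """Return observed R^2 and the fraction of shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = r_squared(genotypes, phenotype)
    shuffled = list(phenotype)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if r_squared(genotypes, shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


def make_synthetic_cohort(n_people=120, n_snps=8, seed=1):
    """Synthetic data (assumption): additive SNP effects on BMI plus noise."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)]
                 for _ in range(n_people)]
    bmi = [25.0 + 0.8 * sum(person) + rng.gauss(0.0, 2.0) for person in genotypes]
    return genotypes, bmi


genotypes, bmi = make_synthetic_cohort()
r2, p_value = permutation_test(genotypes, bmi)
print(f"R^2 = {r2:.2f}, permutation p = {p_value:.3f}")
```

Shuffling the phenotype keeps the correlation structure among the SNPs intact, so the null distribution reflects the same genotype set that was actually tested.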
Some concepts of precision medicine have already become part of clinical practice, especially in the treatment of cancer. Apart from being treated on the basis of their tissues of origin, an increasing number of cancers are also treated on the basis of the cancer’s molecular signature, which is currently represented by single-molecule markers. Cytotoxic agents are combined with or replaced by targeted agents that specifically target the product of a mutated cancer gene or a normal gene the cancer cell specifically depends on—as summarized in the concepts of oncogene addiction, non-oncogene addiction, and synthetic lethality.40 Single cancer gene mutations are used as predictors for treatment success in the clinic.
Oncogene addiction describes the dependence of cancer growth on a certain pathogenic oncogene. A classical example for oncogene addiction is Human Epidermal Growth Factor Receptor 2 (HER2/ERBB2) as a marker for successful treatment with trastuzumab for metastatic breast cancer.41 Similarly, BCR-ABL rearrangement in chronic myeloid leukemia is a marker for successful treatment with the tyrosine kinase inhibitor imatinib.40 Non-oncogene addiction describes the extended need of the cancer cell for a regular nonpathologically transformed gene. HER2 oncogene expression in breast cancer cells seems to depend on heat shock protein 90 kDa (HSP90), because the treatment of trastuzumab-refractory HER2-positive breast cancer with tanespimycin, inhibiting the heat shock protein, has shown promising results.42 The concept of synthetic lethality indicates a defect of the cancer cell in a certain function so that targeting a compensatory mechanism will cause cancer cell death. It is exemplified by cancer cells harboring breast cancer susceptibility protein 1/2 (BRCA1/2) gene mutations resulting in a malfunction of DNA double-strand break repair by homologous recombination. Treatment with poly(ADP-ribose) polymerase inhibitors impairing repair of DNA single-strand breaks resulted in a 40% response rate in patients with advanced BRCA1/2-mutated ovarian cancer in a phase I trial.43
To identify new single-molecule markers and multifeature signatures of drug sensitivity, two recent experimental studies investigated the connection between drug response rates and multiple genomic and transcriptomic features in many cancer cell lines originating from different tissues, including lung, ovary, brain, gastrointestinal tract, and bone marrow. The Cancer Cell Line Encyclopedia, which consists of ~1,000 human cancer cell lines encompassing 36 tumor types, was used to characterize the relationship between gene mutations, DNA copy-number variations, gene expression levels, and the sensitivity to 24 targeted and cytotoxic agents.44 In a similar approach, 639 human tumor cell lines derived from rare and common cancers of epithelial, mesenchymal, and hematopoietic origin in children and adults were screened for sensitivity toward 130 drugs with various, but also overlapping, targets and mechanisms.45 The data collection for the biomarker analysis in this study included the sequencing of 64 commonly mutated cancer genes, the investigation of genome-wide copy-number variations, microarray gene expression analysis, and the determination of seven commonly rearranged cancer genes and of microsatellite instability. Both approaches identified known markers for drug sensitivity of cancer cells, highlighting the potential of the approaches to identify relevant associations. For example, advanced statistical analyses such as elastic net regression revealed the BRAF V600E mutation as a top-ranked marker for RAF inhibitors; HGF expression and MET amplification as markers for MET/anaplastic lymphoma kinase inhibition; and activating mutations in BRAF and neuroblastoma RAS viral oncogene homolog as markers for MEK inhibition.44 Similarly, statistical analysis of gene mutations and drug sensitivity identified many known predictors and revealed that almost every cancer gene mutation was associated with increased resistance or sensitivity to at least one drug.
This association was independent of whether the drug targeted the biomarker oncogene. As new single-molecule markers, both studies44,45 identified predictors for drug sensitivity in Ewing sarcoma. Schlafen family member 11 expression was identified as the top correlate of sensitivity to irinotecan and topotecan,44 whereas the EWS-FLI1 rearrangement characteristic of Ewing sarcoma strongly predicted sensitivity to the poly(ADP-ribose) polymerase inhibitor olaparib,45 which might offer a new treatment option for this tumor.
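Elastic net regression, mentioned above as one of the methods used to rank candidate markers, can be illustrated with a minimal sketch. It is not the implementation used in either study: the screen is synthetic, feature 0 stands in for a hypothetical driver mutation, feature 1 for a hypothetical expression marker, and the penalty settings (`lam`, `alpha`) are arbitrary. The sketch minimizes (1/2n)‖y − Xβ‖² + λ[α‖β‖₁ + ((1 − α)/2)‖β‖²] by cyclic coordinate descent on standardized features.

```python
import random


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(rows):
    """Center each feature column and scale it to unit variance."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0  # guard constants
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def elastic_net(features, response, lam=0.1, alpha=0.5, n_sweeps=200):
    """Coefficients on standardized features, fitted by coordinate descent."""
    x = standardize(features)
    n, p = len(x), len(x[0])
    mean_y = sum(response) / n
    resid = [y - mean_y for y in response]  # residuals when all coefficients are 0
    beta = [0.0] * p
    col_sq = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(x[i][j] * (resid[i] + beta[j] * x[i][j]) for i in range(n)) / n
            denom = col_sq[j] + lam * (1.0 - alpha)
            new = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * x[i][j]
                beta[j] = new
    return beta


def make_synthetic_screen(n_lines=80, n_features=10, seed=2):
    """Synthetic cell-line screen (assumption): two informative features."""
    rng = random.Random(seed)
    features, sensitivity = [], []
    for i in range(n_lines):
        mutated = 1.0 if i < 24 else 0.0  # feature 0: hypothetical driver mutation
        row = [mutated] + [rng.gauss(0.0, 1.0) for _ in range(n_features - 1)]
        features.append(row)
        sensitivity.append(3.0 * mutated + 0.8 * row[1] + rng.gauss(0.0, 0.5))
    return features, sensitivity


features, sensitivity = make_synthetic_screen()
coefs = elastic_net(features, sensitivity)
ranking = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
print("features ranked by |coefficient|:", ranking)
```

The L1 part of the penalty pushes the coefficients of uninformative features toward exactly zero, while the L2 part stabilizes the fit when features are correlated, which is common among genomic measurements.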
In agreement with standard clinical treatment strategies, which are mainly based on treating cancer with regard to the originating tissue, cell lineage was a major predictive factor for several compounds in both studies. For example, the success rate of the histone deacetylase inhibitor panobinostat was significantly increased in hematopoietic as compared with solid-tumor cell lines,44 and breast cancer cells were sensitive to phosphoinositide 3-kinase pathway inhibitors, renal cell carcinoma cells to Src inhibitors, and glioma cells to a rho-associated coiled-coil containing protein kinase inhibitor.45 However, the response to most drugs did not correlate with a specific tumor tissue type,45 supporting the concept that the molecular signature of a cancer would also have to be taken into account when deciding on treatment strategies.
In addition to single markers, multifeature signatures of drug sensitivity were identified for most of the investigated drugs.45 RAF and MEK1/2 inhibitor sensitivities, for example, were associated not only with BRAF mutation but also with 67 other features, including known regulators of MAPK signaling (Sprouty homolog 2, Dual Specificity Protein Phosphatases 4 and 6). Multifeature signatures of drug sensitivity are likely to complement single-molecule markers for disease states and optimal drug treatment as another step toward precision medicine. They probably describe the susceptibility of one or more subcellular networks in the cancer cells to perturbation by the drug. Multifeature signatures are likely to reflect different disease state networks and can help us build the library of disease state networks outlined in Figure 1. The two studies described here were based on statistical analysis of high-throughput data to identify associations between biomarkers or biomarker sets and drug sensitivity. Further work, including the integration of high-throughput screening data with computational network biology, should enable us to identify the underlying disease state networks and the associated subcellular processes.
A first step in this direction was made by a recent study using gene expression profiles and copy-number variations in the genome to classify 49 breast cancer cell lines into the luminal, basal, and claudin-low subtypes defined in primary tumors and into cell lines with HER2/ERBB2 amplification (ERBB2AMP).46 Screening with 77 drugs showed subtype-specific responses for 23 agents, with a high correlation of responses toward drugs with similar targets or mechanisms. For example, etoposide, docetaxel, and cisplatin sensitivities were preferentially documented in basal and claudin-low cell lines, as had also been observed clinically.47,48 In contrast to the experimental studies described above, computation-based integration of the genotypic and expression profiles to identify pathway-based mechanisms revealed that the susceptibility of basal cancer cell lines toward cisplatin could be explained by the upregulation of a module related to DNA-damage response.46,49 Prospectively, such a transcriptome signature could be used to identify patients who would respond to treatment with cisplatin.
Sensitivity toward the HSP90 inhibitor geldanamycin in ERBB2AMP cell lines was associated with increased activity of an ERBB2-HSP90 module. Clinically, high HSP90 expression is an adverse prognostic marker in HER2+ breast cancer,50 probably because of the dependency of ERBB2 on HSP90.42 As outlined above, this study also reinforces the notion that network analysis based on prior knowledge and genomic data could be valuable both in defining disease state more precisely and in yielding better predictors of response to drug therapy.
In developing precision medicine, an important role of systems therapeutics will be to enable the prediction of dosing regimens for drugs. Predicting the correct dose depends on understanding both the pharmacokinetics and pharmacodynamics in the context of genomic and epigenomic status of the patient. For this, we need to use dynamical models, most often multicompartment ordinary differential equation (ODE) models. Two recent studies provide a glimpse of the potential power of combining genomic information into network models and using these as the basis for ODE-based pharmacokinetics/pharmacodynamics models.
Panetta et al.51 evaluated the pharmacokinetics of methotrexate in patients with acute lymphoblastic leukemia. Their ODE model focused on the intracellular disposition of methotrexate and its polyglutamylated species in leukemic blasts. These metabolites are linked to the folate biochemical pathway, which contains key drug targets: dihydrofolate reductase, thymidylate synthase, glycinamide ribonucleotide transformylase, and aminoimidazole carboxamide ribonucleotide transformylase. The model computes how gene polymorphisms influence methotrexate transport and intracellular metabolism, and it can be used to predict how de novo purine synthesis is affected by different doses or different durations of drug infusion in patients. Such approaches, which use multicompartment ODE models to connect SNPs to gene expression patterns and these patterns, in turn, to drug action, provide an explicit method for connecting patient characteristics between scales of biological organization. One noteworthy feature of this study is that the folate biochemical pathway model was computed using parameters from biochemical reactions measured in several cell types, i.e., canonical parameters, a feature quite consistent with current systems biology approaches.
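The general idea of a multicompartment ODE model in which a genotype modifies a transport parameter can be illustrated with a minimal sketch. It is not the model of Panetta et al.: the three compartments (plasma drug, intracellular drug, intracellular polyglutamated drug), all rate constants, and the `transport_scale` factor that stands in for a hypothetical transporter polymorphism are assumptions chosen only for illustration. The sketch integrates the linear system with a fourth-order Runge-Kutta scheme and reports the exposure (area under the curve) of the polyglutamated species.

```python
def simulate_exposure(dose_mg, infusion_h, transport_scale=1.0,
                      t_end_h=48.0, dt_h=0.01):
    """Return the AUC (mg*h) of the polyglutamated species over t_end_h hours."""
    # Illustrative first-order rate constants in 1/h (assumptions, not fitted).
    k_elim, k_in, k_out, k_glu, k_loss = 0.30, 0.20, 0.10, 0.05, 0.02
    k_in *= transport_scale  # hypothetical genotype effect on cellular uptake
    infusion_rate = dose_mg / infusion_h  # constant-rate infusion, mg/h

    def derivative(t, state):
        plasma, cell, poly = state
        inflow = infusion_rate if t < infusion_h else 0.0
        return (
            inflow - (k_elim + k_in) * plasma + k_out * cell,
            k_in * plasma - (k_out + k_glu) * cell,
            k_glu * cell - k_loss * poly,
        )

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state = (0.0, 0.0, 0.0)
    auc_poly = 0.0
    for step in range(int(round(t_end_h / dt_h))):
        t = step * dt_h
        k1 = derivative(t, state)
        k2 = derivative(t + dt_h / 2, shifted(state, k1, dt_h / 2))
        k3 = derivative(t + dt_h / 2, shifted(state, k2, dt_h / 2))
        k4 = derivative(t + dt_h, shifted(state, k3, dt_h))
        new_state = tuple(s + dt_h / 6 * (a + 2 * b + 2 * c + d)
                          for s, a, b, c, d in zip(state, k1, k2, k3, k4))
        auc_poly += 0.5 * (state[2] + new_state[2]) * dt_h  # trapezoid rule
        state = new_state
    return auc_poly


reference = simulate_exposure(dose_mg=1000.0, infusion_h=4.0)
reduced_uptake = simulate_exposure(dose_mg=1000.0, infusion_h=4.0, transport_scale=0.5)
print(f"relative intracellular exposure with reduced uptake: "
      f"{reduced_uptake / reference:.2f}")
```

Changing `dose_mg` or `infusion_h` in the same function is the toy analogue of asking how different doses or infusion durations alter intracellular drug exposure for a given genotype.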
In a study on patients with colon cancer, Hector et al.52 focused on measuring variations in the levels of apoptotic proteins in individual patients and developed an ODE model–based prediction of the efficacy of chemotherapy based on 5-fluorouracil. Changes in the levels of key apoptotic proteins were measured by western blotting, and the differing levels of these proteins were used to develop individual dynamical models that computed the activity of caspase 3 in the tumor for each patient. The models were binned into two categories: “caspase activation predicted” and “no caspase activation predicted.” The model prediction for each patient was then correlated with chemotherapy outcomes. Eighty-five percent of the individual patient models correctly predicted chemotherapy outcome.
Both these studies provide clear indications of the power of the dynamical systems approach in precision medicine. In both studies, the dynamical models relied on the use of canonical parameters to develop the pharmacokinetic and pharmacodynamic models. Such canonical parameters can be changed to make them patient-specific on the basis of genomic or proteomic information from the patient. The dynamical models can then be used to determine dosing regimens. A futuristic view of the workflow by which dosing regimens for individuals could be arrived at is shown in Figure 5. It is possible that the use of such canonical models may limit the accuracy and predictive capability of these multiscale models. Further research in systems pharmacology is needed to define the conditions under which parameters in canonical models can be used in predictive models for individual patients.
If dynamical models are to become an integral part of the systems approaches in precision medicine and drug therapy, we need computer programs that can take molecular data from patients, incorporate them into the computational models, and produce predictive outputs. Yang et al.53 have developed a computational algorithm to identify such optimal treatment regimens. This algorithm uses a trial-and-error approach to produce optimal treatment predictions on the basis of a dynamical model of a disease network. The parameters, including rate constants and protein and metabolite concentrations, can be canonical or be defined by the user. The authors used an ODE model of arachidonic acid metabolism network to compute the therapeutic vs. adverse effects of nonsteroidal anti-inflammatory drugs and predicted the effect of single drugs and combination therapy. This type of algorithm shows promise in the integration of network analyses and ODE-based simulations for the prediction of drug efficacy in individual patients.
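A trial-and-error search over treatment regimens on top of a dynamical model can be illustrated with a minimal sketch. It is not the algorithm of Yang et al. and not a model of arachidonic acid metabolism: the two-branch pathway, the two hypothetical drugs, every rate and inhibition constant, and the scoring weights are assumptions. The sketch samples random dose pairs, simulates the pathway for each, and keeps the pair with the best balance between lowering an "inflammatory" product (therapeutic effect) and preserving a "protective" product (avoiding an adverse effect).

```python
import random


def simulate_products(dose_a, dose_b, t_end=40.0, dt=0.05):
    """Euler-integrate a toy two-branch pathway; return final product levels."""
    k_supply, k_clear = 1.0, 0.5
    # Each hypothetical drug inhibits one branch strongly and the other weakly.
    v_inflam = 1.0 / (1.0 + dose_a / 0.1 + dose_b / 1.0)
    v_protect = 0.5 / (1.0 + dose_a / 1.0 + dose_b / 0.2)
    substrate = inflam = protect = 0.0
    for _ in range(int(round(t_end / dt))):
        d_sub = k_supply - (v_inflam + v_protect) * substrate
        d_inf = v_inflam * substrate - k_clear * inflam
        d_pro = v_protect * substrate - k_clear * protect
        substrate += dt * d_sub
        inflam += dt * d_inf
        protect += dt * d_pro
    return inflam, protect


def score(dose_a, dose_b, baseline, adverse_weight=2.0):
    """Therapeutic benefit minus a weighted penalty for the adverse effect."""
    inflam, protect = simulate_products(dose_a, dose_b)
    benefit = baseline[0] - inflam           # drop in the inflammatory product
    harm = max(0.0, baseline[1] - protect)   # drop in the protective product
    return benefit - adverse_weight * harm


def trial_and_error_search(n_trials=150, max_dose=1.0, seed=0):
    """Random search over dose pairs, starting from the untreated state."""
    rng = random.Random(seed)
    baseline = simulate_products(0.0, 0.0)
    best_doses, best_score = (0.0, 0.0), score(0.0, 0.0, baseline)
    for _ in range(n_trials):
        candidate = (rng.uniform(0.0, max_dose), rng.uniform(0.0, max_dose))
        candidate_score = score(candidate[0], candidate[1], baseline)
        if candidate_score > best_score:
            best_doses, best_score = candidate, candidate_score
    return best_doses, best_score


best_doses, best_score = trial_and_error_search()
print(f"best doses (a, b) = ({best_doses[0]:.2f}, {best_doses[1]:.2f}), "
      f"score = {best_score:.3f}")
```

Because the rate constants are ordinary arguments of the simulation, canonical values could in principle be replaced with patient-specific ones before the search is run, which is the point at which such a search would become individualized.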
As the fields of precision medicine and systems therapeutics are only in the beginning stages, it is difficult to predict with certainty how these fields will develop for different diseases. Nevertheless, it is clear that both fields will rely on systems approaches and, as has been shown here, that computation is a critical common method for extracting knowledge from the large data sets that the systems-type experiments produce. From such computational analyses, the various studies summarized in this review allow us to articulate several key concepts that answer the questions raised in the Introduction.
In a recent article,54 we proposed that integrating genomic and epigenomic characteristics along with gene expression data to develop networks that can be used to build dynamical models of drug action could be a feasible approach whereby precise diagnosis of disease state can be used to predict drug efficacy and, in the future, the propensity for rare adverse events. Currently, the high-throughput method closest to routine use in clinical laboratories is DNA sequencing technology. As the price of sequencing falls and accuracy increases, we will be able to gather different types of genomic information, including copy-number variations and SNPs as well as mutations and rearrangements. By using the sequencing technology, we can also obtain information on microRNAs and gene expression patterns. Statistical analyses of these rich data sets will most likely allow us to develop networks that are characteristic of disease states as the pathophysiology progresses in individual patients, i.e., to arrive at a precise disease taxonomy. When these networks are converted to systems of differential equations so that drug action can be modeled, systems pharmacology can use the precision in disease taxonomy to predict several facets of drug action. Such integrated algorithms can be used for multiple purposes. From the therapeutic perspective, these include determining whether mono- or polypharmacology is likely to be most effective, selecting doses and the time frame of administration, and deciding whether sequential or simultaneous polypharmacology would work better. From an adverse effect perspective, the algorithm will be able to predict resistance to therapy in individuals as well as the propensity for rare adverse events that could be triggered by the drug or drug combinations.
The transition to electronic medical records, the widespread availability of high-performance computing, and the developing ability of clinical laboratories to produce high-quality genome-wide sequence data make the integration of precision medicine and systems pharmacology likely in the next decade or so.
This study was supported by National Institutes of Health grants P50GM 071558 and GM54508.
CONFLICT OF INTEREST In 2012, R.I. served as a consultant for GlaxoSmithKline and Roche. GlaxoSmithKline supports a research fellow in R.I.’s laboratory. J.H. declared no conflict of interest.