A majority of therapeutic interventions occur late in the pathological process, when treatment outcome can be less predictable and effective, highlighting the need for new precise and preventive therapeutic development strategies that consider genomic and environmental context. Translational bioinformatics is well positioned to contribute to the many challenges inherent in bridging this gap between our current reactive methods of healthcare delivery and the intent of precision medicine, particularly in the area of drug development, which forms the focus of this review.
A variety of powerful informatics methods for organizing and leveraging the vast wealth of molecular measurements available for a broad range of disease contexts have recently emerged. These include methods for data-driven disease classification, drug repositioning, identification of disease biomarkers, and the creation of disease network models, each with significant impacts on drug development approaches.
An important bottleneck in the application of bioinformatics methods in translational research is the lack of investigators who are versed in both biomedical domains and informatics. Efforts to nurture both sets of competencies within individuals and to increase interfield visibility will help to accelerate the adoption and increased application of bioinformatics in translational research.
It is possible to construct predictive, multiscale network models of disease by integrating genotype, gene expression, clinical traits, and other multiscale measures using causal network inference methods. This can enable the identification of the “key drivers” of pathology, which may represent novel therapeutic targets or biomarker candidates that play a more direct role in the etiology of disease.
A range of powerful informatics methods have been developed to organize and capitalize on the deluge of molecular level information available in a range of disease contexts, in areas as diverse as disease classification, drug repositioning, identification of disease biomarkers and the creation of disease network models, each with significant impacts on drug development approaches.
The approaches discussed are relevant to multiple domains of clinical practice, and exist as a diverse array of potential and realized translational endeavors. While some methods (e.g., the use of disease biomarkers to drive treatment choices) already reflect a standard of care within certain areas of medicine, other areas (e.g., the modeling of causal disease networks) which do not yet routinely impact clinical care, are expected to add insight to the identification of high-quality drug targets to inform drug development, as well as guiding the interpretation of individual genetic variation in creating tailored therapeutic strategies.
Healthcare in the 21st century faces a unique set of challenges: rising healthcare costs, and declining research and development productivity in therapeutic discovery and development. Until 2010, costs rose faster than economic growth in many Organization for Economic Cooperation and Development countries,1 positioning healthcare expenditure as a key economic focus. As innovation in new treatments, rising provider costs, and an increased prevalence of expensive, chronic conditions press treatment costs upwards, the expanding, ageing populations of many developed nations further increase the demand for medical care.
An important driver of this cost growth is the partial success of many existing treatments: medical interventions which lead to so-called “chronification” of complex, previously fatal diseases which can now be modified or at least managed symptomatically for many years (e.g., ischemic heart disease, type 2 diabetes, cerebrovascular disease and chronic obstructive pulmonary disease), with the incidence of such diseases increasing with age.2 Despite the benefits of being able to offer such treatments to patients, intervention and maintenance of treatment at this stage of disease progression is inefficient and expensive. Of course, historically there has been little alternative but for a patient to become symptomatic before seeking clinical review, as the combination of clinically observable signs and reported symptoms has been our primary view into a patient's physiology, and it is usually only once a significant system perturbation has occurred that an individual will notice anything unusual. However, in many pathologies, predisease conditions can now be identified (e.g., prehypertension, prediabetes, mild cognitive impairment), states which imply an increased risk of progression to full-blown disease, and in some cases, an accompanying opportunity to engage in early therapeutic strategies (e.g., exercise prescriptions, dietary changes, medications) which can reduce that risk. Some of the key challenges in this paradigm of disease screening are in identifying predisease states which represent significantly increased risk of future adverse health outcomes, identifying reliable markers of these predisease states, and developing useful therapeutic strategies to offer individuals should they receive a predisease diagnosis.
With healthcare costs amounting to US$2.6 trillion in 2010, and three out of four treatment dollars being spent on the management of chronic conditions, the need for new approaches to address these issues has never been more pressing.3
The concept of precision medicine, an approach in which a patient's health traits are interpreted alongside state-of-the-art molecular profiles to develop accurate diagnostic, prognostic and therapeutic strategies tailored to individual physiological context, has been put forth as a model for transitioning towards a safer, more effective, and more efficient healthcare paradigm.4 Precision medicine offers the expectation of an improved power to diagnose disease at earlier stages, facilitating treatment initiation at more stable phases of pathogenesis. Recent advances in methods for gathering and analyzing genomic and other molecular level data across many technology platforms and different biological contexts (e.g., different diseases, populations, tissues, or time periods5) serve as the groundwork for enabling precision medicine.
The development of safe and effective targeted therapeutics is a core component of precision medicine; however, the current industry development pipeline is not well poised to meet this need. The number of Food and Drug Administration applications and subsequent approvals of new therapies has been stable for the past decade (Fig. 1), reflecting the growing costs and increasing failure rates of clinical trials.6 Explanations for this trend are not universally agreed upon, but some cite an exhaustion of figurative “low-hanging fruit” in drug development opportunities, forcing the pursuit of riskier development choices, and the requirement to deepen our understanding of the basic biological pathways which underlie disease processes.7 While target-based screens, cell-based assays, and genetic studies are capable of generating many potential drug target leads, in many cases we aren't yet capable of interpreting the greater biological and physiological context of the target modulatory effects, and are thus limited in our ability to exploit such leads. How then, can we link the expectations of precision medicine to current paradigms in drug discovery? In this review, we discuss approaches in which bioinformatics can be used to address these challenges, with a particular focus on translational bioinformatics, a field which focuses on “the development of storage, analytic and interpretive methods to optimize the transformation of increasingly voluminous biomedical data into proactive, predictive, preventative, and participatory health.”8
First, we outline the manner in which integrating high-dimensional data drawn from multiple levels of biological hierarchy can be used to create new classification systems for disease, and in turn allow the inference of new insights on the basis of shared features between diseases. Next, we discuss the ways in which translational bioinformatics has helped to move beyond population-level evaluation of new therapies, to identify subpopulations of patients who may derive benefit from a therapy while saving others from needless, costly or even harmful interventions. We also describe the ways in which translational bioinformatics can leverage available genomic, molecular, clinical, and other data sources to create integrated frameworks, which can be used to identify new indications for existing therapies. Finally, we review some of the tools that have been developed to integrate matched DNA sequence data, gene expression data, and disease trait data to construct probabilistic causal models of disease, which allow the identification of the true drivers of the molecular networks that underlie disease.
Although the main focus of this review is the applicability of translational bioinformatics to drug development in general, we will conclude by reviewing some of the existing bioinformatic approaches which have been applied to the complex field of wound healing, and discuss some of the ways in which the methods described in this article could be utilized for novel approaches, such as transcriptomic profiling to identify wound subtypes and guide treatment decisions, or to identify new therapeutic leads to inform drug development. The methods described here are also relevant to the extent that they can be used to improve diagnostic and therapeutic strategies for the range of systemic diseases which have an impact on wound healing, such as peripheral vascular disease, type 2 diabetes and venous hypertension.
The value of creating a framework by which to classify diseases was apparent in the time of Hippocrates, though it wasn't developed in a systematic way until the Enlightenment, initially by Sauvages9 and then built upon by Linnaeus.10 The complexity of modern disease taxonomies has increased significantly since then, though the underlying principles of classification on the basis of observable attributes remain similar. Our current disease classification systems reflect the diverse requirements of stakeholders, including classification for processing of reimbursement (e.g., International Classification of Disease [ICD]), research purposes (e.g., Medical Subject Headings), specialist reporting (e.g., Systematized Nomenclature of Medicine-Clinical Terms [SNOMED-CT]), and clinical classification (ICD, Diagnostic and Statistical Manual IV). These taxonomies tend to reflect the attributes of disease that are routinely measured, which is itself a function of convenience and utility. Therefore, current systems classify diseases according to constellations of clinical, anatomical or investigative findings. Such methods have undoubtedly been beneficial, but can also be misleading, by clustering essentially dissimilar pathologies (which may share organ systems or clinical manifestations) or separating similar ones (which may appear distinct at a clinical level). Loscalzo et al.11 discuss examples of diseases which form clinically distinct syndromes yet emerge from the same molecular basis, and conversely, diseases that have multiple genetic bases, yet converge into a common phenotype.
Sickle cell anemia, a monogenic disease with a well-characterized, single genetic basis, can clinically manifest along a spectrum of severity ranging from mild anemia to veno-occlusive stroke, dependent on additional genetic and environmental risks.11 This is contrasted with diseases, such as familial pulmonary arterial hypertension (PAH), which are associated with many different mutations affecting members of the transforming growth factor beta receptor superfamily, yet tend towards a common phenotype.12
There have been multiple efforts to classify diseases according to their molecular-level correlates, including investigation by the National Academy of Science into creating a knowledge network of disease to underpin a new taxonomy of disease.13 At a disease class level, Sirota et al.14 examined Genome Wide Association Study results to determine genetic variation profiles associated with a range of autoimmune diseases, identifying sets of diseases which appeared to cluster together, conferring some protective effects against diseases in other classes. They reported individual single nucleotide polymorphism (SNP) and gene associations that appear to drive the similarities and differences between different clusters of diseases. Interestingly, they found that some disease pairs (e.g., Crohn's disease and rheumatoid arthritis), which share some treatments and clinical features, possess different underlying genetic profiles, suggesting a differing molecular basis. Suthram et al.15 used protein interaction networks to identify functional modules, which were then assessed in the context of gene expression microarray data for 54 different diseases. Each disease was characterized by a signature of “module response scores”, which were used to cluster diseases. Interestingly, sets of modules relating to key biological functions were found to be perturbed in many diseases. This same set of pan-disease modules was also enriched for proteins identified as pluripotent drug targets (i.e., drugs targeting these proteins tended to have many indications), suggesting the relevance of targets within these modules for a wide range of therapeutic uses.
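The idea of a module response score can be illustrated with a minimal sketch. In this toy example (hypothetical genes and fold-change values; the published method uses a more sophisticated statistic), a module's response in a given disease is simply the mean absolute differential expression of its member genes:

```python
# Toy sketch (hypothetical data): scoring a protein-interaction module in one
# disease as the mean absolute differential expression of its member genes.
def module_response(module_genes, fold_changes):
    """Mean |log fold-change| across a module's genes (0.0 if unmeasured)."""
    scores = [abs(fold_changes.get(gene, 0.0)) for gene in module_genes]
    return sum(scores) / len(scores)

# A hypothetical inflammation module and one disease's expression changes.
inflammation_module = ["IL6", "TNF", "NFKB1"]
disease_fold_changes = {"IL6": 2.1, "TNF": 1.5, "NFKB1": -0.9, "ACTB": 0.1}
print(round(module_response(inflammation_module, disease_fold_changes), 2))  # 1.5
```

A vector of such scores, one per module, then serves as the disease's signature for clustering.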
The highlighted works demonstrate an important principle—the potential for data-driven molecular taxonomy approaches to challenge and improve existing models of disease taxonomy. Such studies offer the possibility to share insights between “similar” diseases with the potential of identifying common disease pathways (with associated drug repositioning opportunities), as well as by looking at dissimilar pairs, which can lead to the identification of exclusive pathways, with associated opportunities for further hypothesis-led studies to identify novel treatment targets (Fig. 2).
Despite an increasing ability to compare and contrast different disease entities at a range of biological levels of organization, an important challenge lies in translating this into clinical decision-making tools. While new disease descriptions and definitions may be scientifically informative, their utility is only maximized if they can be used to guide expectation of clinically relevant outcomes (such as disease prognosis or therapeutic response). With current methods, a lower bound on the time to identify such associations is formed by the natural time course of the disease process under investigation.
Along with our ability to measure huge amounts of genetic and other molecular data has come the realization that in many cases, certain molecular traits can be used to predict important variability between patients in clinically relevant areas, such as treatment response, side effect profiles or toxicity risk. The translation of such approaches into panels of diagnostic markers to guide therapeutic strategies is known as companion diagnostics and forms a key component in the growing toolkit of personalized and precision medicine.16 This carries clear advantages for patients who are predicted to benefit from a therapy and, in the context of oncology, where its application is most commonly seen, tangible benefits for those who won't benefit, by saving the time, financial and physical costs of needless therapy (Fig. 3).
The number of companion diagnostics on offer is growing rapidly, occurring in regulated and unregulated contexts. Common examples of companion diagnostics include the use of human epidermal growth factor receptor 2 immunohistochemistry and gene-amplification tests to guide the use of trastuzumab in the treatment of metastatic breast cancer, and the c-kit immunohistochemistry test to guide the use of imatinib in the treatment of gastrointestinal stromal tumors.16
Lung cancer remains the commonest cause of cancer-related death worldwide.17 Since 2004, treatment options for non-small cell lung cancer (NSCLC, the most common histological type) have grown rapidly, largely as a result of increased knowledge of driver mutations which can then be used to guide appropriate treatment. Such approaches have demonstrated the value of evaluating the efficacy of medications in light of molecular subtypes, rather than broad disease class. Early trials of the kinase inhibitors gefitinib and erlotinib initially demonstrated no clear overall benefit when applied to a cohort of patients with the broad diagnosis of NSCLC. However, it was recognized at the time that a subpopulation of ~10% of patients actually experienced dramatic antitumor results, which was a sufficient basis for Food and Drug Administration approval in 2003, despite the fact that the specific underlying mechanism conferring sensitivity wasn't yet known.18 Later, it was found that positive treatment response correlated with activating epidermal growth factor receptor mutations, which could then be used as a distinct biomarker to guide initiation of treatment.19–22 This biomarker enabled a more effective trial design, with optimization of therapeutic indications; thus, reducing unnecessary side effects and costs in patients that wouldn't benefit from the therapy.
A more recent example centers on the story of crizotinib, another kinase inhibitor (with particular activity against MET [mesenchymal epithelial transition factor] and ALK [anaplastic lymphoma kinase]) used in the treatment of NSCLC. Partway through a clinical trial assessing its efficacy on a broad population of lung cancer patients,23 an independent team published findings of a particular chromosomal translocation affecting the ALK gene with an association to tumor growth in a subset of NSCLC patients.24 This allowed modification of the clinical trial to assess treatment efficacy within the frame of this particular variant, with dramatically different results, showing an increase in median survival time to about a year, as opposed to a few months with the standard of care. Such targeted approaches also offer the potential benefit of identifying useful therapeutic responses in smaller cohorts of patients.
It is clear that historically, a level of serendipity has contributed to these discoveries, suggesting that many additional opportunities exist, and have perhaps been missed. The increased use of systematic, bioinformatics approaches to correlate molecular and genomic level patient data with treatment response, adverse outcomes, and drug toxicity is likely to enable more routine and accelerated discovery in relevant patient subpopulations. Such approaches could potentially leverage any accessible, meaningful biological elements, from single gene genotyping,25 to non-coding RNA expression,26 to multi-module disease gene expression signatures.27
The practice of identifying additional therapeutic indications for existing drug compounds is referred to as drug repositioning, and has some key benefits over traditional methods of drug development. Estimating the average cost of bringing a drug to market is a complicated undertaking, though estimates have placed capitalized costs between approximately US$800 million28 and US$1.8 billion29 with an accompanying timeline of 15 years.30 A large portion of these costs are associated with early stages of development and toxicity testing, with over 90% of tested compounds failing to progress beyond this stage.31 The identification of additional indications for existing medications offers clear time and cost benefits by avoiding many of these early hurdles.
Traditional approaches to drug development usually focus on identification of a novel treatment target, followed by a search for a compound capable of appropriately modulating the target. A lengthy and costly target validation process then follows target identification. For a given compound, additional targets are not usually investigated, and additional clinical applications are not routinely explored.32 This represents a huge opportunity for the systematic identification of new indications for existing therapeutics—some of the most commercially successful medications are administered for different indications than envisioned in their initial development. Prominent examples amongst many include sildenafil (an antiangina medication now used for treatment of erectile dysfunction and PAH), minoxidil (antihypertensive, now used for alopecia) and thalidomide (antinausea, now used for multiple myeloma and erythema nodosum leprosum).
Multiple approaches have been used in drug repositioning efforts, ranging from blind screening of libraries of drug compounds against model systems, to data-driven computational approaches that integrate and search across links between drugs and diseases using various forms of biomolecular and clinical data. Here we focus on computational approaches, which offer substantial advantages in cost and speed of discovery compared to experimental screening based approaches.
In general, computational approaches exploit the known links between diseases and drugs, with the accompanying possibility that shared attributes between drugs used to treat the same disease, or diseases which share treatments, imply a degree of meaningful similarity between some aspects of the linked objects, which can be used to generalize existing treatments into new clinical contexts. Dudley et al.32 describe a means to classify computational approaches to drug repositioning according to whether the mode of inference is drug-based or disease-based (Fig. 4). We will briefly outline these approaches, giving examples of each.
Drug similarity approaches leverage known properties of existing drug compounds to compare and classify on the basis of similarity in their chemical properties. Such approaches use a variety of chemical descriptors, such as quantitative structure-activity relationship (QSAR) models and pharmacophores, to allow comparison and clustering of drugs which share similar properties.33 Such clustering can then allow the inference of similarity between drugs, and identification of potential novel therapeutic applications.
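The basic operation underlying such comparisons can be sketched simply. In cheminformatics, compounds are commonly encoded as binary fingerprints of structural features and compared with the Tanimoto coefficient; the following toy example (hypothetical fingerprints, not real descriptors) shows the calculation:

```python
# Illustrative sketch: comparing two compounds by the Tanimoto coefficient of
# hypothetical binary chemical fingerprints (represented as sets of 'on' bits).
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared bits / total distinct bits."""
    a, b = set(fp_a), set(fp_b)
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

# Toy fingerprints: indices of structural features present in each compound.
drug_x = [1, 4, 7, 9, 12]
drug_y = [1, 4, 7, 13]
print(tanimoto(drug_x, drug_y))  # 3 shared bits of 6 distinct -> 0.5
```

Pairwise scores of this kind form the distance matrix from which drug clusters are derived.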
Noeske et al.34 used this conceptual approach to identify new treatment targets for a group of known metabotropic glutamate (mGlu) receptor antagonists by preparing pharmacophore descriptors for the antagonists as well as a range of additional known compounds, and used self-organizing maps to create a compound map. They reported subclusters containing mGlu antagonists, with proximity to additional compounds known to bind to dopamine and histamine receptors. This was then used as the basis for further experiments, which confirmed the predicted interactions between mGlu receptor antagonists and histamine receptor 1 and dopamine receptor 2 targets.
Keiser et al.35 developed a quantitative approach for clustering protein targets on the basis of similarity amongst their associated ligands. Their “Similarity Ensemble Approach” used 65,000 ligands grouped into sets on the basis of known interactions with hundreds of protein targets. A pairwise ligand similarity score was calculated between every member of each set and every member of every other set, to derive the inter-set similarity, which reflected the similarity of the protein targets at the core of each set. In this way, protein targets can be linked even when they don't share any common ligands. Despite utilizing a protein similarity score derived entirely from chemical similarity of ligands, biologically related protein clusters nevertheless emerged, alongside unexpected links, such as methadone antagonizing muscarinic M3 receptors and loperamide antagonizing neurokinin NK2 receptors (these predictions were then confirmed experimentally). With its potential to link proteins which are functionally related, yet share nothing in the way of structural or sequence similarity, this ligand-centric approach represents a valuable tool for deriving knowledge about biological function which can circumvent some of the limitations of approaches which rely solely on protein similarity metrics.
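The set-to-set comparison at the heart of this approach can be sketched as follows. This is a deliberately simplified toy version (hypothetical fingerprints; the published method sums raw Tanimoto scores above a threshold and converts them to expectation values against a random background, which is omitted here):

```python
# Simplified sketch of ensemble (set-to-set) ligand similarity between two
# protein targets: sum pairwise Tanimoto scores above a threshold across the
# targets' ligand sets. Fingerprints and threshold are illustrative only.
def tanimoto(fp_a, fp_b):
    a, b = set(fp_a), set(fp_b)
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def set_similarity(ligands_a, ligands_b, threshold=0.3):
    """Raw ensemble score: sum of above-threshold pairwise similarities."""
    total = 0.0
    for x in ligands_a:
        for y in ligands_b:
            score = tanimoto(x, y)
            if score >= threshold:
                total += score
    return total

target_1 = [[1, 2, 3], [2, 3, 4]]   # toy ligands of protein target 1
target_2 = [[2, 3, 5], [8, 9]]      # toy ligands of protein target 2
print(set_similarity(target_1, target_2))  # two pairs score 0.5 each -> 1.0
```

Note how the two targets are linked through partially overlapping ligands even though no single ligand is shared between the sets.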
Drug similarity approaches are limited by a number of factors, including the requirement for accurate surveys of the chemical properties of any drug under consideration, and restricted access to such information, which is often held closely as part of the intellectual property strategy of a pharmaceutical company which has invested heavily in the development of its compounds. In addition, the effect of a drug is much more complex than its initial chemical properties suggest. Many drugs undergo extensive metabolic transformation into their active components, and such processes aren't always well characterized, or well accounted for in pharmacophore descriptions.36
Another exciting approach to drug repositioning arises by characterizing drug compounds according to their impact on molecular activity. Such an approach frames a compound as a perturbation to the system it is introduced to, which can be used to identify a characteristic signature for that compound. This can then be used to compare many medications, with many opportunities to generalize therapeutic indications between “similar” drugs.
One of the most widely used resources in this area is the Connectivity Map,37,38 a repository for the whole genome transcriptional response to 1,309 small bioactive molecules applied to a range of cultured human cells. Iorio et al.39 used the Connectivity Map to perform pairwise comparisons between all compounds, followed by clustering to identify drug communities, with cluster co-location implying possible repositioning opportunities. By leveraging these associations, they predicted previously unknown autophagy activity for fasudil, a Rho-kinase inhibitor, which was later validated experimentally.
Sirota et al.40 used an integrated approach which compared drug signatures with analogous disease signatures to identify repositioning opportunities. Leveraging publicly available microarray gene expression data, the authors looked at 100 diseases and 164 drug compounds, hypothesizing that if a given disease is characterized by a gene expression signature, then drugs which induce an inverse signature may have therapeutic value, while drugs with similar signatures may exacerbate the disease. Using this approach, the authors rederived many known therapeutic applications, such as the efficacy of prednisolone in treatment of Crohn's disease and ulcerative colitis. In addition, they predicted therapeutic benefit for the over-the-counter H2-receptor antagonist cimetidine (commonly used in gastric reflux and peptic ulcer disease to reduce gastric acid secretion) in the treatment of lung adenocarcinoma. They then validated this prediction in vitro using cell lines, and in vivo using xenograft mouse models.
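The inverse-signature idea can be sketched in a few lines. The following toy example (hypothetical fold-change values, not data from the cited study; the published method uses a rank-based, nonparametric statistic rather than Pearson correlation) scores a drug as a repositioning candidate when its expression signature anticorrelates with a disease signature:

```python
# Toy sketch: a drug whose transcriptional signature opposes a disease
# signature is flagged as a repositioning candidate. Values are hypothetical.
def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Log fold-changes for the same five genes in disease vs. drug-treated cells.
disease_sig = [2.0, 1.5, -1.0, -2.0, 0.5]
drug_sig    = [-1.8, -1.2, 0.9, 1.7, -0.4]
print(pearson(disease_sig, drug_sig) < -0.9)  # strongly anticorrelated -> True
```

A drug with a strongly positive score under the same comparison would instead be flagged as potentially disease-exacerbating.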
It is clear that such approaches aren't constrained to the use of gene expression data. Any method which provides a meaningful, high-dimensional description of a drug compound can be considered. Chen et al.41 used PubChem42 to create signatures on the basis of bioassay results for PubChem compounds. They then constructed a bipartite network that mapped PubChem profiles onto biological networks (protein interaction networks and metabolic pathways) on the basis of sequence similarity between compound protein targets and the proteins represented in the biological networks. This allowed interpretation of compound efficacy or adverse effects to occur in a rich biological context, which can form the basis of further hypothesis-led experiments.
The primary limitations of molecular similarity approaches are their reliance on the quality of data that are used to form the disease and drug signatures, as well as the simplifications which must be made in forming the disease signatures. For example, Connectivity Map reference signatures are derived from specific cell lines, which may be irrelevant to the actions of certain compounds, and which neglect the often complex metabolism that compounds undergo between administration and their arrival in distant cells. Additional simplifications occur in the transcriptional classification of many diseases, which are known to affect multiple tissue types and organ systems (such as peripheral vascular disease), or diseases where the relevant tissue types may not be obvious (hypertension) or amenable to easy sampling (Alzheimer's disease).
Molecular docking is a simulation-based approach that uses 3D structural protein information to characterize the interaction interface between drug compounds and targets. This can then be used as the basis for making predictions about whether drug compounds will interact with novel treatment targets. If a compound is predicted to interact with a novel treatment target, then an inference can be made that the compound may be useful in any diseases in which the treatment target is implicated.
In an interesting molecular docking approach, Kinnings et al.42 started with approved drugs, and identified binding sites from the 3D structural data of their known targets, using these sites as the basis for a similarity search with other proteins. Additional proteins that were found to possess similar sites were highlighted as predicted additional targets for the drug compound, allowing the repositioning of entacapone (most frequently used in Parkinson's disease to inhibit the metabolism of levodopa) as a treatment for multi-drug resistant tuberculosis.
Such approaches rely heavily on having access to 3D structural information for drugs and their targets. Although many such structures are available, and their number is growing rapidly, some very significant omissions exist, including many members of the G-protein coupled receptor superfamily.
Another approach to drug repositioning which has been recently explored is one which leverages known associations between drugs and diseases, on the assumption that drugs which share many disease indications may share mechanisms of action; thus, allowing the generalization of putatively similar drugs into additional clinical contexts.43 As has been noted previously,32 this interesting approach is currently constrained by lack of systematic knowledge of the therapeutic intent which links a particular drug to a particular disease. For example, the treatment of an infectious disease may require concomitant administration of paracetamol for relief of fever and an antibiotic for the underlying infection—two treatments which clearly engage with the underlying pathology at different levels.
Some repositioning strategies proceed on the assumption that diseases that share drugs may do so because of common aspects in their molecular pathophysiology, implying a shared responsiveness to the therapeutic functions of the drug. On this basis, drugs may be repositioned into additional diseases that look quite different at the phenotypic level. The approach is analogous in many ways to methods which make use of shared molecular similarity; in fact, a number of interesting studies have emerged which combine the two approaches to cluster diseases and drugs into communities to identify repositioning opportunities, including some already discussed.15,40 Hu and Agarwal44 used Gene Expression Omnibus (GEO) data to formulate disease signatures, and then clustered these using a correlation-based approach. These signatures were then integrated with Connectivity Map data to cluster disease molecular pathology and drug mechanisms.
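A minimal sketch of correlation-based grouping of disease signatures follows. All gene names and values are hypothetical, and the simple single-linkage grouping (via union-find) stands in for the more elaborate clustering used in published work:

```python
# Toy sketch: diseases whose expression signatures correlate above a cutoff
# fall into the same cluster (single-linkage grouping via union-find).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def cluster(signatures, cutoff=0.8):
    """Group signature names whose pairwise correlation reaches the cutoff."""
    names = list(signatures)
    parent = {name: name for name in names}
    def find(name):                      # follow links to the group's root
        while parent[name] != name:
            name = parent[name]
        return name
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(signatures[a], signatures[b]) >= cutoff:
                parent[find(b)] = find(a)    # merge the two groups
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())

sigs = {  # hypothetical signatures over four shared genes
    "disease_A": [2.0, -1.0, 0.5, 1.5],
    "disease_B": [1.8, -0.9, 0.6, 1.4],   # tracks A -> same cluster
    "disease_C": [-1.0, 2.0, -0.5, -2.0], # opposite pattern -> own cluster
}
print(cluster(sigs))  # [['disease_A', 'disease_B'], ['disease_C']]
```

Diseases landing in the same cluster become candidates for sharing each other's treatments.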
Understandably, such approaches are limited by the technologies that are used to characterize a pathophysiological state (e.g., gene expression microarray). Additional current limitations reside in capturing and meaningfully comparing the multi-tissue, multi-organ and dynamic nature of many diseases over time.
Another novel repositioning approach links drugs together on the basis of their side-effect similarity. On the assumption that such side-effect profiles provide a read-out of the physiological effect of a compound on a host system, Campillos et al.45 extracted medication side effects from package leaflets and mapped the listed terms to the Unified Medical Language System, creating weighted associations between side effects and drugs on the basis of side-effect frequency. Interdrug similarity was then calculated as the sum of common weights, allowing the formation of drug communities from which repositioning opportunities could be inferred.
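The sum-of-common-weights idea can be sketched as follows. The drugs, side-effect terms, and frequencies are hypothetical, and taking the minimum of the two frequencies for each shared term is one plausible weighting choice for illustration, not necessarily the exact scheme of Campillos et al.

```python
# Hypothetical side-effect frequency weights per drug (fraction of
# patients reporting each effect); all values are illustrative.
profiles = {
    "drug_x": {"nausea": 0.30, "headache": 0.10, "rash": 0.05},
    "drug_y": {"nausea": 0.25, "headache": 0.15, "dizziness": 0.08},
    "drug_z": {"insomnia": 0.12, "rash": 0.04},
}

def side_effect_similarity(p1, p2):
    """Sum the weights of shared side-effect terms, using the smaller of
    the two frequencies for each shared term (an assumed convention)."""
    shared = p1.keys() & p2.keys()
    return sum(min(p1[t], p2[t]) for t in shared)

# Drugs with high similarity would be placed in the same community,
# suggesting they engage overlapping physiology.
print(round(side_effect_similarity(profiles["drug_x"], profiles["drug_y"]), 2))
```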
Approaches that rely on the integration of data represented as a structured language are subject to the constraints of codifying such a complex, heterogeneous, and ambiguous domain as healthcare delivery into a formalized system. Some of the more common issues include discrepancies in categorization of concepts and the development of multiple representations of the same concept (e.g., SNOMED-CT possesses a concept for “Apgar score at 1 minute” as an observable entity, as well as a separate concept for “Apgar score at 1 minute” as a finding entity)46 which can lead to ambiguities in interpreting the semantic context of a given term.
Such limitations are also compounded by the limited precision with which side-effect profiles are recorded. Many clinical signs and symptoms are quite non-specific, and the level of granularity with which side effects are elicited could have significant effects on the characterization of a drug. For example, a reported side effect of abdominal pain, if not characterized further, may imply mechanisms as diverse as gastric reflux, hepatotoxicity, or drug-induced diarrhea.
Many promising computational drug repositioning approaches exhibiting quantifiable success have emerged in the past decade. It is also clear from the described approaches that techniques can be devised which exploit any sufficiently high-dimensional description of a disease or its known efficacious treatments. Some of the most promising approaches have been those which integrate multiple strategies to varying degrees: utilizing high-throughput data describing both therapies and diseases to create communities of drug-disease interactions, with the possibility of linking these to other forms of rich biological data, such as protein interaction networks, DNA and protein sequence data, and transcriptional and metabolic networks. Future approaches may also see the use of much higher-level phenotypic data, extracting features from imaging studies, electroencephalography, or wireless monitoring devices as the basis for comparing disease and drug similarity.
An emerging perspective in systems medicine is that understanding biology, and identifying the true drivers of pathology in individual patients, will require the construction of high-dimensional networks representing multiple levels of biological hierarchy.47 In this view, core biological processes are mediated by networks of interacting molecules (DNA, RNA, proteins, metabolites, etc.) in a manner which is dynamically coordinated within and between tissues, and highly context dependent. This systems medicine view frames these molecular networks as key sensors of environmental and genetic perturbation,48 shifting their state in response to a myriad of context-dependent cues and, in the frame of pathogenesis, representing intermediate molecular phenotypes which mediate the flow of risk between disease-associated DNA variants and disease states.49
One of the great challenges implicit in developing such an understanding is the extraction of the relevant networks from an immensely noisy molecular background, to identify the molecular underpinnings of normal and pathogenic processes. Some exciting computational approaches have been developed which integrate matched DNA variation, gene expression, and clinical data to construct probabilistic causal networks50: models which capture functional relationships between key biological objects (such as gene regulatory networks) and phenotypic data (such as disease traits), approaching a mechanistic understanding of disease and allowing the differentiation of reactive elements from the true drivers of disease (Fig. 5). Bayesian network approaches are commonly employed to learn the conditional probability distributions of molecular and clinical variables, which are then used to infer probable causal networks.51 Bayesian networks often infer many equally probable network graphs as solutions,51 which quickly leads to nontrivial computational and methodological challenges in exploring the complete space of all probable network graphs. Other popular approaches for network reconstruction are gene coexpression networks, or methods that first group genes and loci into functionally coherent groups called “modules”.51 Causal inference is then performed to infer networks of functional gene modules.
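The observation that Bayesian structure learning returns many equally probable graphs can be illustrated in the simplest possible case: for two linear-Gaussian variables, the Markov-equivalent graphs A → B and B → A receive identical scores. The BIC-style score below is a generic sketch, not the specific scoring function of the cited work.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
a = rng.normal(size=n)
b = 0.9 * a + rng.normal(scale=0.5, size=n)

def gaussian_bic(child, parent):
    """BIC score of a linear-Gaussian node given one optional parent."""
    if parent is None:
        resid = child - child.mean()
        k = 1                                  # mean only
    else:
        slope, intercept = np.polyfit(parent, child, 1)
        resid = child - (slope * parent + intercept)
        k = 2                                  # slope + intercept
    sigma2 = resid.var()                       # ML estimate of noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * (k + 1) * np.log(n)  # +1 for the variance parameter

# Score the two Markov-equivalent graphs: their totals coincide, so the
# data alone cannot choose between them -- hence whole equivalence
# classes of equally probable networks.
score_a_to_b = gaussian_bic(a, None) + gaussian_bic(b, a)
score_b_to_a = gaussian_bic(b, None) + gaussian_bic(a, b)
print(round(score_a_to_b, 4), round(score_b_to_a, 4))
```

This is why additional information, such as the genetic perturbations discussed below, is needed to orient edges.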
The Macrophage Enriched Metabolic Network (MEMN) is an example of a disease network derived by integrating gene expression, genotype, and clinical data from adipose and liver tissue, in mice and humans, which was found to be causally associated with phenotypic traits of metabolic disease, such as diabetes, heart disease, and obesity.49 The authors combined co-expression network analysis with a statistical procedure called “likelihood-based causality model selection”50 to identify whether patterns of gene expression were causal for, or reactive to, the phenotypic disease traits. The MEMN was shown to be strongly causal for the disease traits, facilitating the identification and validation of three previously unknown type 2 diabetes associated genes.
A key approach to constructing such networks builds on weighted gene co-expression network analysis52 to identify sets of highly co-expressed genes in a biological context of interest. These gene modules can then be converted into a network through consideration of pair-wise gene co-expression thresholds. Schadt et al.50 demonstrated that these correlation-derived association networks could be converted into causal networks (where the direction of an interaction can be known) by integrating them with a systematic form of biologically meaningful perturbation: DNA sequence variation. By utilizing genetic loci that are correlated with mRNA transcript abundance (expression quantitative trait loci [eQTL]), the authors developed a statistical approach that considers an eQTL alongside a pair of genes whose transcript levels correlate with sequence variation at the eQTL. By analyzing conditional correlations between these three variables, it is possible to infer the most likely relationship between the gene pair (gene A regulates gene B, gene B regulates gene A, or genes A and B do not regulate each other).
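A toy simulation of this conditional-correlation reasoning follows. Under the true chain eQTL → gene A → gene B, conditioning on A should abolish the correlation between the eQTL and B, while conditioning on B should not abolish the eQTL-A correlation; comparing the two therefore favors the correct orientation. This is a simplified stand-in for the statistical procedure of the cited work, with all effect sizes invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulate the causal chain eQTL -> gene A -> gene B.
L = rng.integers(0, 3, size=n).astype(float)   # genotype at the eQTL (0/1/2)
A = 1.5 * L + rng.normal(size=n)               # transcript regulated by L
B = 0.8 * A + rng.normal(size=n)               # transcript regulated by A

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

lb_given_a = abs(partial_corr(L, B, A))   # expect near zero under L -> A -> B
la_given_b = abs(partial_corr(L, A, B))   # expect clearly nonzero
print(lb_given_a, la_given_b)
```

In practice the published method compares full likelihoods of the candidate models rather than raw partial correlations, but the underlying logic of using the eQTL as an anchor for causal direction is the same.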
Once such directed networks of gene regulatory relationships are available, it becomes possible to identify the key drivers of that network, and thus of the disease trait it has been constructed to reflect.50 By functionally annotating the network members and identifying the enriched functional pathways with which the network is associated, it becomes possible to work upstream through the network to identify genes which regulate many downstream members implicated in a particular function (local drivers), and ultimately to identify the most upstream genes (global drivers).53 This set of drivers represents a very high-leverage set of targets that can then be investigated for disease diagnosis, monitoring, and drug targeting purposes.
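One simple way to operationalize the local/global driver idea is to rank genes by the size of their downstream (reachable) set in the directed network; the network below is a toy illustration, not a published disease network, and published key-driver analyses use more elaborate enrichment statistics.

```python
# Toy directed regulatory network; edges point from regulator to target.
# Gene names are purely illustrative.
edges = {
    "g1": ["g2", "g3"],
    "g2": ["g4"],
    "g3": ["g4", "g5"],
    "g4": [],
    "g5": ["g6"],
    "g6": [],
}

def downstream(gene, edges):
    """All genes reachable from `gene` by following directed edges."""
    seen, stack = set(), list(edges[gene])
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(edges[g])
    return seen

# Rank genes by downstream set size: genes regulating many network
# members are candidate key drivers of the trait the network reflects.
ranking = sorted(edges, key=lambda g: len(downstream(g, edges)), reverse=True)
print(ranking[0])  # g1 reaches every other gene, making it the global driver
```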
Such networks also offer the opportunity for in-silico screening of drug compounds by simulating the extended effects a given drug may impose on the network. In this way, unfavorable side-effect profiles or worthwhile therapeutic potential can be anticipated. Utilizing the network comprising the MEMN, Chen et al.49 predicted that Ppm1l (at the time, a poorly characterized protein phosphatase) had the potential to improve insulin resistance (highlighting its potential as a treatment target for type 2 diabetes); however, due to its network position, it was also likely to increase weight gain and blood pressure (therefore worsening overall cardiovascular risk). These predictions were then verified in Ppm1l knock-down mice, which demonstrated an improved diabetic profile, but with increased weight gain and blood pressure. This method of prioritizing leads for streamlined experimental validation represents a very promising advance in early drug development.
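A crude sketch of such in-silico perturbation screening: a knock-down is represented as a negative input to one node and propagated to steady state through a signed, weighted adjacency matrix under a linear-response assumption. Both the network and the linear model are illustrative assumptions, not the method of the cited study.

```python
import numpy as np

# Toy signed, weighted regulatory network over 4 genes (illustrative).
# W[i, j] is the direct effect of gene j on gene i.
W = np.array([
    [0.0,  0.0, 0.0, 0.0],
    [0.6,  0.0, 0.0, 0.0],
    [0.0, -0.5, 0.0, 0.0],
    [0.4,  0.0, 0.7, 0.0],
])

# Simulate a knock-down of gene 0 and propagate its effect through the
# network: the steady state of x = W @ x + perturbation is
# x = (I - W)^(-1) @ perturbation.
perturbation = np.array([-1.0, 0.0, 0.0, 0.0])
response = np.linalg.solve(np.eye(4) - W, perturbation)
print(np.round(response, 3))
```

A gene whose knock-down improves one trait-associated node while worsening another (as with Ppm1l above) would show responses of opposite sign at the corresponding network positions.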
Recognizing and accounting for inter-tissue interactions within such disease networks is also critical, particularly for complex diseases that span multiple tissues. The majority of investigations into the relationships between DNA variation and the intermediate phenotypes associated with complex disease have traditionally focused on single-tissue analysis, which does not account for the richly interrelated networks which form between different tissue types. Dobrin et al.54 investigated multi-tissue coexpression networks in obesity, with a focus on inter-tissue networks between the hypothalamus, liver, and adipose tissue (Fig. 6). Their work revealed biologically sensible networks of relevance to obesity which were not present in any single tissue, yet clearly emerged through a systems approach. The insights built upon these types of methods are expected to prove useful in better capturing disease complexity, as a dynamic entity occurring within a richly interconnected network of organs and tissues, with consequences for our ability to understand pathology and ultimately guide drug development.
As treatment costs and demand for healthcare rise, it has become imperative to investigate new means of providing high-quality care to patients. It is increasingly clear that we cannot continue to manage disease as we have in the past. Translational bioinformatics is well poised to make key contributions to this complex set of challenges, in particular to the identification of novel, high-leverage therapeutic targets, the optimization of the allocation of existing treatments to disease, and the formation of biologically meaningful models of physiology and disease.
We have discussed a range of translational bioinformatics approaches which can contribute to these challenges at multiple levels, including the appropriate characterization and classification of diseases on the basis of their underlying molecular features, enabling the transfer of insights between pathologies deemed to be similar, and aiding the search for underlying principles which may separate pathologies deemed to be distant. One implication of this type of disease map is the possibility of identifying specific biomarkers that correlate with disease subtypes, with relevance for choice of therapy, and of predicting and preventing unfavorable side-effect profiles or toxicity.
We have reviewed some of the translational bioinformatics approaches to computational drug repositioning, which has the potential to powerfully leverage pre-existing drug development efforts to expand and reshape the catalog of therapies currently available. Current data-driven methods can be readily adapted to utilize increasingly voluminous, high-quality biological data to identify ever more precise attributes of disease and therapy, which can be used to generalize between therapeutic contexts.
We have also provided a brief overview of the evolving area of probabilistic causal network modeling of disease, which integrates genotype, expression, and clinical data to provide deep insights into disease physiology and facilitate the development of powerful new therapeutic options. We expect that causal network modeling of disease will play a key role in building truly predictive models of natural disease progression in the near future. As the cost of molecular profiling technologies continues to decrease, it will become tractable to score multiscale molecular traits from individuals over time, and thus possible to characterize the dynamic changes driving network states from health to disease.55
A variety of studies have looked at the complex field of wounds and wound healing using bioinformatics approaches, including transcriptional and proteomic profiling of different wound areas,56 different wound types in animal models57 and human subjects.58 These kinds of approaches are valuable steps in elucidating the underlying molecular pathways which may be aberrant, particularly in chronic wounds. Such studies could also lay a foundation for further work placing such molecular wound signatures within the landscape of other diseases, to allow for comparisons between different wound subtypes and other phenotypically different diseases, and to identify opportunities for adapting existing therapies. One hypothesis-led approach explored similarities between chronic wound healing and stromal tissue in cancers;59 however, to our knowledge an approach integrating profiles from a wider range of diseases has not yet been performed. Other interesting bioinformatic wound studies have included the use of transcriptomic profiling to identify wound edge biomarkers which could then be used to guide surgical debridement.60 Such approaches could be adapted to identify topical therapeutics that could help guide enzymatic debridement, or to identify therapeutics with complementary transcriptional profiles. Additional interesting directions may include the use of transcriptomic or proteomic profiling to identify different molecular subtypes of wound, to guide therapy choice, or a data-driven comparison between wound signatures and compound molecular signatures to identify new topical or systemic treatments.
Tables 1–3 provide an example of a bioinformatics approach to wound care. We used publicly available microarray data, collected by Smiley et al.61 (stored in the GEO repository,62 GSE3204) in an investigation of the transcriptome of cultured skin substitute (CSS), an autologous culture of patient keratinocytes and fibroblasts which can facilitate healing in large burns. Although useful, such CSS contain only two cell types, and are thus limited in their capacity to emulate normal skin. We hypothesized that identifying the difference signature between CSS and normal skin may highlight an opportunity for an existing drug compound to act as an adjunctive therapy alongside CSS, to augment its transcriptional response in the direction of normal skin.
We used the web-based tool GEO2R5 to rederive the most differentially expressed genes, reflecting the difference signature between CSS and normal skin cells. We extracted the 100 most differentially expressed genes, comprising 25 upregulated (Table 1) and 75 downregulated genes (Table 2), which we used as the basis of a query signature for use with Connectivity Map.37,38 We present the top 16 compounds (Table 3) found to have an “opposite” transcriptional signature to the CSS signature, comprising a selection of well-characterized, approved medications, as well as a number of experimental compounds. We suggest that these results could form the basis of hypothesis-led studies investigating the role of adjuvant therapies in the application of CSS to large healing wounds.
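The signature-construction step can be sketched with simulated data as follows. GEO2R itself applies limma's moderated statistics, so the plain Welch-style t statistic here is a simplified stand-in, and the expression matrix is simulated rather than drawn from GSE3204.

```python
import numpy as np

rng = np.random.default_rng(2)
n_genes = 500

# Toy expression matrix: 4 CSS samples and 4 normal-skin samples per gene.
# The first 10 genes are simulated as truly differentially expressed.
css = rng.normal(size=(n_genes, 4))
normal = rng.normal(size=(n_genes, 4))
css[:5] += 3.0     # upregulated in CSS
css[5:10] -= 3.0   # downregulated in CSS

# Welch-style t statistic per gene.
diff = css.mean(axis=1) - normal.mean(axis=1)
se = np.sqrt(css.var(axis=1, ddof=1) / 4 + normal.var(axis=1, ddof=1) / 4)
t = diff / se

# Take the 100 most extreme genes and split them into up/down query
# lists, mirroring the signature construction described above.
top = np.argsort(-np.abs(t))[:100]
up = [g for g in top if t[g] > 0]
down = [g for g in top if t[g] < 0]
print(len(up), len(down))
```

The resulting up/down gene lists are exactly the form of query that Connectivity Map accepts when searching for compounds with an opposing transcriptional signature.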
Translational bioinformatic approaches are also being applied to wound care through the development of better treatments for the diseases which predispose to chronic wounds, such as type 2 diabetes,49 venous hypertension, and peripheral arterial disease.63,64
One of the key challenges facing the systematic integration of bioinformatic approaches with clinical medicine is the relative paucity of “bilingual” practitioners: computer scientists with a sense of front-line clinical priorities, and clinicians familiar with a broad range of computational tools and approaches. To remain current in any one field is a full-time undertaking, so dual trainees are still a limited resource. An ideal scenario would be one in which individuals possess enough biological knowledge to ask clinically useful questions, and also possess a command of existing computational and bioinformatic approaches sufficient to address such questions scientifically. Indeed, one of the major challenges presented by large-scale, integrative informatics approaches is that they enable the discovery and evaluation of a large number of hypotheses, and it can require significant clinical domain expertise to reduce or focus the set of hypotheses evaluated by informatics approaches.
Some universities have begun integrating bioinformatic content into their curricula, including units on translational bioinformatics65 and interpretation of whole genome sequencing,66 for bioinformaticists, PhDs, genetic counselors, and medical practitioners alike. An initial approach may also lie in broad, cross-field educational modules, for example, a series of reviews directed towards clinicians to provide outlines of the field of bioinformatics, without necessarily focusing on developing the computational skills to build specific solutions. A value inflection point will occur when clinicians possess some fluency in the capabilities and limits of the available bioinformatics tools, and are able to recognize and articulate some broad applications to their areas of clinical expertise. Likewise, a similar approach could be envisioned in which a survey of key clinical needs within medical domains is made available to bioinformaticians. This may then help catalyze conversations and collaboration between domain experts in both fields, informing the next iteration of tool-building and biological inquiry.
J.D. declares that he owns an equity interest in NuMedii, Inc. and has served as a consultant to GSK and Janssen Pharmaceuticals. B.R. declares that he has no competing financial interests. The content of this article was expressly written by B.R. and J.D. No ghostwriters were used to write this article.
Dr. Ben Readhead, MBBS, is an Australian trained medical practitioner with a long-standing interest in innovation within medicine and the life sciences. This has included the design and exploration of novel technologies for the delivery of advanced wound care in difficult to heal wounds. More recent work has looked at network level features of disease genes to gain biological insight into the mechanisms of pathogenesis in certain disease types. He has recently joined the Institute for Genomics and Multiscale Biology (Icahn School of Medicine at Mount Sinai) as a Biomedical Informatician with a focus on translational bioinformatic approaches to drug repositioning, gene co-expression network analysis and causal disease modeling. Dr. Joel Dudley, PhD, is a veteran bioinformatics and genomics researcher with more than 10 years of professional experience studying the genomic basis of species evolution and human disease. He has published more than 40 peer-reviewed research articles pertaining to personal genomics, genomic medicine, pharmacogenomics, drug discovery, bioinformatics, and evolutionary genomics. Joel is Director of Informatics and Assistant Professor of Genetics and Genomics Sciences at Icahn School of Medicine at Mount Sinai in New York. He earned a BS in Microbiology from Arizona State University and a PhD in Biomedical Informatics from Stanford University.