Purpose of review
The desire for biomarkers for diagnosis and prognosis of diseases has never been greater. With the availability of genome data and an increased availability of proteome data, the discovery of biomarkers has become increasingly feasible. This article reviews some recent applications of the many evolving “omic” technologies to organ transplantation.
With the advancement of many high throughput “omic” techniques such as genomics, metabolomics, antibiomics, peptidomics and proteomics, efforts have been made to understand potential mechanisms of specific graft injuries and develop novel biomarkers for acute rejection, chronic rejection, and operational tolerance.
The translation of potential biomarkers from the lab bench to the clinical bedside is not an easy task and will require the concerted effort of immunologists, molecular biologists, transplantation specialists, geneticists, and experts in bioinformatics. Rigorous prospective validation studies will be needed using large sets of independent patient samples. The appropriate and timely exploitation of evolving “omic” technologies will lay the cornerstone for a new age of translational research in organ transplant monitoring.
genomics; proteomics; organ transplant; biomarker; translational medicine
Advances in biotechnology offer a fast-growing variety of high-throughput data for screening molecular activity at the genomic, transcriptional, post-transcriptional and translational levels. However, to date, most computational and algorithmic efforts have been directed at mining data from each of these molecular levels (genomic, transcriptional, etc.) separately. In view of the rapid advances in technology (next-generation sequencing, high-throughput proteomics), it is important to address the problem of analyzing these data as a whole, i.e. preserving the emergent properties that appear in the cellular system when all molecular levels interact. We analyzed one of the (currently) few datasets that provide both transcriptional and post-transcriptional data for the same samples to investigate the possibility of extracting more information using a joint analysis approach.
We use Factor Analysis coupled with pre-established knowledge as a theoretical base to achieve this goal. Our intention is to identify structures that contain information from both mRNAs and miRNAs and that can explain the complexity of the data. Despite the small sample size available, we show that this approach permits the identification of meaningful structures, in particular two polycistronic miRNA genes related to transcriptional activity and likely to be relevant in discriminating gliosarcomas from other brain tumors.
This suggests the need to develop methodologies to simultaneously mine information from different levels of biological organization, rather than linking separate analyses performed in parallel.
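The joint-analysis idea above can be sketched in a few lines. The snippet below is a minimal illustration on synthetic data, not the authors' knowledge-coupled pipeline: it uses scikit-learn's FactorAnalysis on a concatenated matrix so that each extracted factor carries loadings on both the mRNA and miRNA blocks.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.RandomState(0)
n_samples = 20
# Synthetic data: 50 mRNA and 10 miRNA profiles measured on the same samples
mrna = rng.normal(size=(n_samples, 50))
mirna = rng.normal(size=(n_samples, 10))
# Joint analysis: stack both molecular levels into a single feature matrix
joint = np.hstack([mrna, mirna])

fa = FactorAnalysis(n_components=3, random_state=0).fit(joint)
scores = fa.transform(joint)   # per-sample factor scores
loadings = fa.components_      # factor loadings over all 60 features
# Each factor has loadings on the mRNA columns (0-49) and the miRNA
# columns (50-59), so one structure can mix both molecular levels
print(scores.shape, loadings.shape)  # (20, 3) (3, 60)
```

In a real analysis the loadings of each factor would then be inspected to see which mRNAs and miRNAs co-vary, rather than analyzing the two blocks in parallel.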
Recent advances in high-throughput biotechnologies have led to rapidly growing research interest in reverse engineering of biomolecular systems (REBMS). ‘Data-driven’ approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution, while ‘design-driven’ approaches, i.e. systems modeling, can be used to simulate emergent system properties. Consequently, both data- and design-driven approaches applied to –omic data may lead to novel insights in reverse engineering biological systems that could not be obtained using low-throughput platforms. However, several challenges remain in this fast-growing field: (i) integrating heterogeneous biochemical data for data mining, (ii) combining top–down and bottom–up approaches for systems modeling and (iii) validating system models experimentally. In addition to reviewing progress made by the community and opportunities encountered in addressing these challenges, we explore the emerging field of synthetic biology, which is an exciting approach to validating and analyzing theoretical system models directly through experimental synthesis, i.e. analysis-by-synthesis. The ultimate goal is to address the present and future challenges in reverse engineering biomolecular systems using an integrated workflow of data mining, systems modeling and synthetic biology.
reverse engineering biological systems; high-throughput technology; –omic data; synthetic biology; analysis-by-synthesis
Omics and bioinformatics are essential to understanding the molecular systems that underlie various plant functions. Recent game-changing sequencing technologies have revitalized sequencing approaches in genomics and have produced opportunities for various emerging analytical applications. Driven by technological advances, several new omics layers such as the interactome, epigenome and hormonome have emerged. Furthermore, in several plant species, the development of omics resources has progressed to address particular biological properties of individual species. Integration of knowledge from omics-based research is an emerging issue as researchers seek to identify significance, gain biological insights and promote translational research. From these perspectives, we provide this review of the emerging aspects of plant systems research based on omics and bioinformatics analyses together with their associated resources and technological advances.
Bioinformatics; Data integration; Genome-scale approach; Omics; Systems analysis
The desire for biomarkers for diagnosis and prognosis of diseases has never been greater. With the availability of genome data and an increased availability of proteome data, the discovery of biomarkers has become increasingly feasible. However, the task is daunting and requires collaborations among researchers working in the fields of transplantation, immunology, genetics, molecular biology, biostatistics, and bioinformatics. With the advancement of high throughput omic techniques such as genomics and proteomics (collectively known as proteogenomics), efforts have been made to develop diagnostic tools from new and to-be-discovered biomarkers. Yet biomarker validation, particularly in organ transplantation, remains challenging because of the lack of a true gold standard for diagnostic categories and the analytical bottlenecks that face high-throughput data deconvolution. Even though microarray technology is relatively mature, proteomics is still evolving with regard to data normalization and analysis methods. Study design, sample selection, and rigorous data analysis are the critical issues for biomarker discovery using high-throughput proteogenomic technologies that combine the use and strengths of both genomics and proteomics. In this review, we look into the current status and latest developments in the field of biomarker discovery using genomics and proteomics related to organ transplantation, with an emphasis on the evolution of proteomic technologies.
Biomarker discovery; proteogenomics; genomics; proteomics; microarray; transplantation; acute rejection; peptidomics
High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge that requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.
In the Life Sciences, ‘omics’ data are increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited to the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important only for a subset of samples of the same class. For example, within a class of cancer patients certain SNP combinations may be important for the subset of patients that have a specific subtype of cancer, but not for a different subset of patients. Such conditional relationships can in principle be uncovered from the data with RF, as they are implicitly taken into account by the algorithm during the creation of the classification model. This review details several RF properties that are, to the best of our knowledge, rarely or never used, yet allow maximizing the biological insights that can be extracted from complex omics data sets with RF.
Random Forest; variable importance; local importance; conditional relationships; variable interaction; proximity
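The kind of conditional relationship described above can be recovered with an off-the-shelf RF. The sketch below uses synthetic binary variables (a stand-in for SNPs) in which the class label is the logical AND of two variables; scikit-learn's RandomForestClassifier surfaces both through its global (Gini) variable importance. Note that the local importance and proximity measures discussed in the review are exposed by the original R randomForest implementation rather than by scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
# Synthetic SNP-like data: 100 subjects x 20 binary variables
X = rng.randint(0, 2, size=(100, 20))
# The class label depends on the combination of variables 0 and 1
y = (X[:, 0] & X[:, 1]).astype(int)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Global (Gini) variable importance: variables 0 and 1 dominate,
# even though their effect is conditional on each other
imp = rf.feature_importances_
top2 = {int(i) for i in np.argsort(imp)[-2:]}
print(sorted(top2))  # [0, 1]
```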
Significant research has been devoted to predicting diagnosis, prognosis, and response to treatment using high-throughput assays. Rapid translation into clinical results hinges upon efficient access to up-to-date and high-quality molecular medicine modalities.
We first explain why this goal is inadequately supported by existing databases and portals, and then introduce a novel semantic indexing and information retrieval model for clinical bioinformatics. The formalism provides the means for indexing a variety of relevant objects (e.g. papers, algorithms, signatures, datasets) and includes a model of the research processes that create and validate these objects, in order to support their systematic presentation once retrieved.
We test the applicability of the model by constructing proof-of-concept encodings and visual presentations of evidence and modalities in molecular profiling and prognosis of: (a) diffuse large B-cell lymphoma (DLBCL) and (b) breast cancer.
information retrieval; molecular medicine; semantic model; clinical bioinformatics; predictive computational models
Genomic, proteomic, and other omic-based approaches are now broadly used in biomedical research to facilitate the understanding of disease mechanisms and the identification of molecular targets and biomarkers for therapeutic and diagnostic development. While the Omics technologies and bioinformatics tools for analyzing Omics data are rapidly advancing, the functional analysis and interpretation of the data remain challenging due to the inherently long workflows of Omics experiments. We adopt a strategy that emphasizes the use of curated knowledge resources coupled with expert-guided examination and interpretation of Omics data for the selection of potential molecular targets. We describe a downstream workflow and procedures for functional analysis that focus on biological pathways, from which molecular targets can be derived and proposed for experimental validation.
Proteomics; Genomics; Bioinformatics; Biological pathways; Cell signaling; Databases; Molecular targets; Biomarkers
The identification of novel candidate markers is a key challenge in the development of cancer therapies. This can be facilitated by putting accessible and automated approaches for analysing the current wealth of ‘omic’-scale data in the hands of researchers who are directly addressing biological questions. Data integration techniques and standardized, automated, high-throughput analyses are needed to manage the available data as well as to help narrow down the excessive number of target gene possibilities presented by modern databases and system-level resources. Here we present CancerMA, an online, integrated bioinformatic pipeline for the automated identification of novel candidate cancer markers/targets; it operates by meta-analysing expression profiles of user-defined sets of biologically significant and related genes across a manually curated database of 80 publicly available cancer microarray datasets covering 13 cancer types. A simple-to-use web interface allows bioinformaticians and non-bioinformaticians alike to initiate new analyses as well as to view and retrieve the meta-analysis results. The functionality of CancerMA is shown by means of two validation datasets.
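The core of meta-analysing one gene's expression across several datasets can be illustrated with standard inverse-variance fixed-effect pooling of per-dataset effect sizes. This is a generic sketch with made-up numbers, not CancerMA's actual algorithm.

```python
import numpy as np

def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted fixed-effect meta-analysis.
    Returns the pooled effect and the variance of the pooled estimate."""
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(w * np.asarray(effects, dtype=float)) / np.sum(w)
    return pooled, 1.0 / np.sum(w)

# Hypothetical log2 fold-changes for one gene in three microarray datasets,
# each with its own sampling variance
effects = [1.2, 0.8, 1.0]
variances = [0.04, 0.09, 0.05]
pooled, var = fixed_effect_meta(effects, variances)
print(round(pooled, 3))
```

Datasets measured more precisely (smaller variance) contribute more weight; the pooled variance is smaller than any single study's, which is the point of pooling.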
Over the past two decades, high-throughput (HTP) technologies such as microarrays and mass spectrometry have fundamentally changed clinical cancer research. They have revealed novel molecular markers of cancer subtypes, metastasis, and drug sensitivity and resistance. Some have been translated into the clinic as tools for early disease diagnosis, prognosis, and individualized treatment and response monitoring. Despite these successes, many challenges remain: HTP platforms are often noisy and suffer from false positives and false negatives; optimal analysis and successful validation require complex workflows; and great volumes of data are accumulating at a rapid pace. Here we discuss these challenges, and show how integrative computational biology can help diminish them by creating new software tools, analytical methods, and data standards.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-011-0983-z) contains supplementary material, which is available to authorized users.
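One standard guard against the false positives mentioned above is false-discovery-rate control. The function below is a plain-NumPy sketch of the Benjamini-Hochberg step-up procedure; equivalent routines exist in statsmodels (`multipletests`).

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    m = len(p)
    # Step-up thresholds: alpha * i / m for the i-th smallest p-value
    thresholds = alpha * (np.arange(1, m + 1) / m)
    passed = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest i with p_(i) <= alpha*i/m
        rejected[order[: k + 1]] = True    # reject all smaller p-values too
    return rejected

# Hypothetical p-values from a high-throughput screen
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.32, 0.9]
print(benjamini_hochberg(pvals))
```

Here only the two smallest p-values survive at an FDR of 5%, even though five of them are below 0.05 marginally, which is exactly the multiple-testing correction such platforms require.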
Development of innovative high throughput technologies has enabled a variety of molecular landscapes to be interrogated with an unprecedented degree of detail. Emergence of next generation nucleotide sequencing methods, advanced proteomic techniques, and metabolic profiling approaches continues to produce a wealth of biological data that captures molecular frameworks underlying phenotype. The advent of these novel technologies has significant translational applications, as investigators can now explore molecular underpinnings of developmental states with a high degree of resolution. Application of these leading-edge techniques to patient samples has been successfully used to unmask nuanced molecular details of diseased vs. healthy tissue, which may provide novel targets for palliative intervention. To enhance such approaches, concomitant development of algorithms to reprogram differentiated cells in order to recapitulate pluripotent capacity offers a distinct advantage to advancing diagnostic methodology. Bioinformatic deconvolution of several “-omic” layers extracted from reprogrammed patient cells could, in principle, provide a means by which the evolution of individual pathology can be developmentally monitored. Significant logistic challenges face current implementation of this novel paradigm of patient treatment and care; however, several of these limitations have been successfully addressed through continuous development of cutting-edge in silico archiving and processing methods. Comprehensive elucidation of the genomic, transcriptomic, proteomic, and metabolomic networks that define normal and pathological states, in combination with reprogrammed patient cells, is thus poised to become a high-value resource in modern diagnosis and prognosis of patient disease.
The development of high-throughput experimental technologies has given rise to the "-omics" era, in which terabyte-scale datasets for systems-level measurements of various cellular and molecular phenomena pose considerable challenges in data processing and the extraction of biological meaning. Moreover, it has created an unmet need for the effective integration of these datasets to achieve insights into biological systems. While it has increased the demand for bioinformatics experts who can interface with biologists, it has also raised the requirement for biologists to possess a basic capability in bioinformatics and to communicate seamlessly with these experts. This may be achieved by embedding in their undergraduate and graduate life science education basic training in bioinformatics geared towards acquiring a minimum skill set in computation and informatics.
Based on previous attempts to define curricula suitable for addressing the bioinformatics capability gap, an initiative was taken during the Workshops on Education in Bioinformatics and Computational Biology (WEBCB) in 2008 and 2009 to identify a minimum skill set for the training of future bioinformaticians and molecular biologists with informatics capabilities. The minimum skill set proposed is cross-disciplinary in nature, involving a combination of knowledge and proficiency from the fields of biology, computer science, mathematics and statistics, and can be tailored to the needs of the "-omics" era.
The proposed bioinformatics minimum skill set serves as a guideline for biology curriculum design and development in universities at both the undergraduate and graduate levels.
Protein phosphorylation is one of the most important post-translational modifications (PTMs), as it participates in regulating various cellular processes and biological functions. It is therefore crucial to identify phosphorylated proteins to construct a phospho-relay network and, eventually, to understand the underlying molecular regulatory mechanisms in response to both internal and external stimuli. The changes in phosphorylation status at these novel phosphosites can be accurately measured using a 15N-stable isotopic labeling in Arabidopsis (SILIA) quantitative proteomic approach in a high-throughput manner. One of the unique characteristics of the SILIA quantitative phosphoproteomic approach is the preservation of native PTM status on proteins during the entire peptide preparation procedure. Evolved from SILIA is another quantitative PTM proteomic approach, AQUIP (absolute quantitation of isoforms of post-translationally modified proteins), which was developed by combining the advantages of targeted proteomics with SILIA. Bioinformatics-based phosphorylation site prediction coupled with an MS-based in vitro kinase assay is an additional way to extend the capability of phosphosite identification from the total cellular protein. The combined use of SILIA and AQUIP provides a novel strategy for molecular systems biology studies and for investigation of the in vivo biological functions of these phosphoprotein isoforms and combinatorial codes of PTMs.
SILIA; AQUIP; plant; quantitative proteomics; post-translational modification; cell signaling and regulation; mass spectrometry-based interactomics
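At its core, stable-isotope quantitation compares the MS peak intensities of the light (14N) and heavy (15N) forms of the same peptide. The numbers below are invented for illustration only; real SILIA workflows add reciprocal labeling, normalization and statistical testing.

```python
import numpy as np

# Hypothetical MS peak intensities for one phosphopeptide in a
# 14N/15N SILIA labeling experiment (two biological replicates)
light = np.array([1.8e6, 2.1e6])   # 14N-labeled sample
heavy = np.array([0.9e6, 1.0e6])   # 15N-labeled sample

# Relative change in phosphorylation = median light/heavy intensity ratio
ratio = np.median(light / heavy)
print(round(ratio, 2))  # 2.05
```

A ratio near 1 indicates no change; here the light sample shows roughly twofold higher phosphopeptide abundance.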
Microarray-based gene expression profiling represents a major breakthrough for understanding the molecular complexity of breast cancer. cDNA expression profiles cannot detect changes in activities that arise from post-translational modifications, however, and therefore do not provide a complete picture of all biologically important changes that occur in tumors. Additional opportunities to identify and/or validate molecular signatures of breast carcinomas are provided by proteomic approaches. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) offers high-throughput protein profiling, leading to extraction of protein array data, calling for effective and appropriate use of bioinformatics and statistical tools.
Whole tissue lysates of 105 breast carcinomas were analyzed on IMAC 30 ProteinChip Arrays (Bio-Rad, Hercules, CA, USA) using the ProteinChip Reader Model PBS IIc (Bio-Rad) and Ciphergen ProteinChip software (Bio-Rad, Hercules, CA, USA). Cluster analysis of protein spectra was performed to identify protein patterns potentially related to established clinicopathological variables and/or tumor markers.
Unsupervised hierarchical clustering of 130 peaks detected in spectra from breast cancer tissue lysates provided six clusters of peaks and five groups of patients differing significantly in tumor type, nuclear grade, presence of hormonal receptors, mucin 1 and cytokeratin 5/6 or cytokeratin 14. These tumor groups closely resembled luminal type A and B, basal-like and HER2-like carcinomas.
Our results show clustering of tumors similar to that provided by cDNA expression profiles of breast carcinomas. This attests to the validity of the SELDI-TOF MS proteomic approach for this type of study. As SELDI-TOF MS provides different information from cDNA expression profiles, the results suggest the technique's potential to supplement and expand our knowledge of breast cancer, to identify novel biomarkers and to produce clinically useful classifications of breast carcinomas.
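The unsupervised clustering step described above can be sketched with SciPy's hierarchical clustering. The peak matrix here is simulated (two intensity patterns plus noise) as a stand-in for the 130-peak SELDI-TOF spectra; the real study clustered both peaks and patients.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.RandomState(0)
# Simulated peak-intensity matrix: 30 tumors x 130 peaks, generated from
# two underlying intensity patterns plus measurement noise
group = rng.randint(0, 2, size=30)
spectra = group[:, None] * 2.0 + rng.normal(scale=0.3, size=(30, 130))

# Unsupervised hierarchical clustering of samples (Ward linkage),
# then cutting the dendrogram into two clusters
Z = linkage(spectra, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(len(set(labels)))  # 2
```

With this much separation the two recovered clusters coincide with the simulated patterns; on real spectra the clusters are then compared against clinicopathological variables, as in the study.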
Current microbiome research has generated tremendous amounts of data providing snapshots of molecular activity in a variety of organisms, environments, and cell types. However, turning this knowledge into a whole-system level of understanding of pathways and processes has proven to be a challenging task. In this review we highlight the applicability of bioinformatics and visualization techniques to large collections of data in order to better understand the information they contain on diet-oral microbiome-host mucosal transcriptome interactions. In particular, we focus on the systems biology of Porphyromonas gingivalis in the context of high throughput computational methods tightly integrated with translational systems medicine. These approaches have applications both in basic research, where we can direct specific laboratory experiments in model organisms and cell cultures, and in human disease, where we can validate new mechanisms and biomarkers for the prevention and treatment of chronic disorders.
oral microbiome; in silico modeling; systems biology; systems medicine; Porphyromonas gingivalis; biomarkers; probiotics; vaccines
Constant technological advances have allowed scientists in biology to migrate from conventional single-omics to multi-omics experimental approaches, challenging bioinformatics to bridge this multi-tiered information. Ongoing research in renal biology is no exception. The results of large-scale and/or high-throughput experiments, which present a wealth of information on kidney disease, are scattered across the web. To tackle this problem, we recently presented the KUPKB, a multi-omics data repository for renal diseases.
In this article, we describe KUPNetViz, a biological graph exploration tool allowing the exploration of KUPKB data through the visualization of biomolecule interactions. KUPNetViz enables the integration of multi-layered experimental data across different species, renal locations and renal diseases onto protein-protein interaction networks, and allows association with biological functions, biochemical pathways and other functional elements such as miRNAs. KUPNetViz focuses on simplicity of usage and clarity of the resulting networks by reducing and/or automating advanced functionalities present in other biological network visualization packages. In addition, it allows the extrapolation of biomolecule interactions across different species, leading to the formulation of plausible new hypotheses, adequate experiment design and the suggestion of novel biological mechanisms. We demonstrate the value of KUPNetViz with two usage examples: the integration of calreticulin as a key player in a larger interaction network in renal graft rejection and the novel observation of a strong association of interleukin-6 with polycystic kidney disease.
KUPNetViz is an interactive and flexible biological network visualization and exploration tool. It provides renal biologists with biological network snapshots of the complex integrated data of the KUPKB, allowing the formulation of new hypotheses in a user-friendly manner.
In high-throughput -omics studies, markers identified from the analysis of single data sets often suffer from a lack of reproducibility because of limited sample sizes. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple -omics data sets is challenging because of the high dimensionality of the data and the heterogeneity among studies. In this article, for marker selection in integrative analysis of data from multiple heterogeneous studies, we propose a 2-norm group bridge penalization approach. This approach can effectively identify markers with consistent effects across multiple studies and accommodate the heterogeneity among studies. We propose an efficient computational algorithm and establish the asymptotic consistency property. Simulations and applications in cancer profiling studies show satisfactory performance of the proposed approach.
High-dimensional data; Integrative analysis; 2-norm group bridge
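The penalty named in the title can be written down directly: for coefficients of one marker grouped across studies, it sums each group's Euclidean (2-)norm raised to a bridge power gamma in (0, 1). The function below is a plain-NumPy sketch of that penalty form only; the parameterization is an assumption for illustration and it is not the authors' estimator or fitting algorithm.

```python
import numpy as np

def two_norm_group_bridge(beta, groups, lam=1.0, gamma=0.5):
    """2-norm group bridge penalty: lam * sum_g ||beta_g||_2 ** gamma.
    `groups` maps each coefficient index to a group id (one marker
    measured in several studies forms one group)."""
    beta = np.asarray(beta, dtype=float)
    total = 0.0
    for g in np.unique(groups):
        total += np.linalg.norm(beta[groups == g]) ** gamma
    return lam * total

# Hypothetical coefficients for 2 markers, each measured in 3 studies
beta = np.array([0.5, 0.4, 0.6, 0.0, 0.0, 0.0])
groups = np.array([0, 0, 0, 1, 1, 1])
print(round(two_norm_group_bridge(beta, groups), 4))
```

Because gamma < 1, the penalty favors setting whole groups to zero while letting the within-group 2-norm accommodate effect sizes that differ across studies, which matches the consistent-marker-selection goal stated above.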
Currently, cancer therapy remains limited by a “one-size-fits-all” approach, whereby treatment decisions are based mainly on the clinical stage of disease yet fail to reference the individual's underlying biology and its role in driving malignancy. Identifying better personalized therapies for cancer treatment is hindered by the lack of high-quality “omics” data of sufficient size to produce meaningful results and by the difficulty of integrating biomedical data from disparate technologies. Resolving these issues will help translate therapies from research to clinic by helping clinicians develop patient-specific treatments based on the unique signature of a patient's tumor. Here we describe the Georgetown Database of Cancer (G-DOC), a Web platform that enables basic and clinical research by integrating patient characteristics and clinical outcome data with a variety of high-throughput research data in a unified environment. While several rich data repositories for high-dimensional research data exist in the public domain, most focus on a single data type and do not support integration across multiple technologies. Currently, G-DOC contains data from more than 2,500 breast cancer patients and 800 gastrointestinal cancer patients, and includes a broad collection of bioinformatics and systems biology tools for analysis and visualization of four major “omics” types: DNA, mRNA, microRNA, and metabolites. We believe that G-DOC will help facilitate systems medicine by enabling identification of trends and patterns in integrated data sets and hence facilitate the use of better targeted therapies for cancer. A set of representative usage scenarios is provided to highlight the technical capabilities of this resource.
It is becoming increasingly clear that our current taxonomy of clinical phenotypes masks substantial molecular heterogeneity. Of vital importance for refined clinical practice and improved intervention strategies is to define the hidden, molecularly distinct diseases using modern large-scale genomic approaches. Microarray omics technology has provided a powerful way to dissect the hidden genetic heterogeneity of complex diseases. The aim of this study was thus to develop a bioinformatics approach to seek the transcriptional features leading to hidden subtypes of a complex clinical phenotype. The basic strategy of the proposed method was to iteratively bipartition the sample and feature spaces with the super-paramagnetic clustering technique and to seek hard, robust gene clusters that lead to a natural partition of the disease samples and that have the highest functional consensus as evaluated with Gene Ontology.
We applied the proposed method to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype. A feature subset of 30 genes (38 probes) derived from analysis of the first dataset, consisting of 4026 genes and 42 DLBCL samples, identified three categories of patients with very different five-year overall survival rates (70.59%, 44.44% and 14.29% respectively; p = 0.0017). Analysis of the second dataset, consisting of 7129 genes and 58 DLBCL samples, revealed a feature subset of 13 genes (16 probes) that not only replicated the findings on the important DLBCL genes (e.g. JAW1 and BCL7A), but also identified three clinically similar subtypes (with 5-year overall survival rates of 63.13%, 34.92% and 15.38% respectively; p = 0.0009) to those identified in the first dataset. Finally, we built a multivariate Cox proportional-hazards prediction model for each feature subset and defined JAW1 as one of the most significant predictors (p = 0.005 and 0.014; hazard ratios = 0.02 and 0.03, respectively, for the two datasets) for both DLBCL cohorts under study.
Our results show that the proposed algorithm is a promising computational strategy for uncovering hidden genetic heterogeneity from transcriptional profiles of disease samples, which may lead to improved diagnosis and treatment of cancers.
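The prediction model used above is a Cox proportional-hazards model, which is fitted by maximizing the partial likelihood. The sketch below implements the Breslow partial log-likelihood for a single covariate on simulated survival data and recovers a positive hazard coefficient by grid search; a real analysis would use a dedicated package such as R's survival or Python's lifelines.

```python
import numpy as np

def cox_partial_loglik(beta, x, time, event):
    """Breslow partial log-likelihood for a univariate Cox model.
    x: covariate, time: follow-up time, event: 1 = event observed."""
    order = np.argsort(time)
    x, event = x[order], event[order]
    eta = beta * x
    # Risk set at each event time = all subjects with time >= that time;
    # reversed cumulative sum gives sum_{j >= i} exp(eta_j) in sorted order
    log_risk = np.log(np.cumsum(np.exp(eta)[::-1])[::-1])
    return np.sum(event * (eta - log_risk))

rng = np.random.RandomState(0)
x = rng.normal(size=50)
# Simulate survival so that higher x means shorter survival (true beta = 0.8)
time = rng.exponential(scale=np.exp(-0.8 * x))
event = np.ones(50)  # no censoring, for simplicity

# Crude grid search for the maximum partial-likelihood estimate of beta
grid = np.linspace(-2, 2, 81)
best = grid[np.argmax([cox_partial_loglik(b, x, time, event) for b in grid])]
print(best > 0)  # True
```

The recovered coefficient is positive, i.e. the hazard increases with the covariate, mirroring how gene-expression features such as JAW1 enter the multivariate model as hazard-modifying predictors.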
Bioinformatics is the application of omics science, information technology, mathematics and statistics to the field of biomarker detection. Clinical bioinformatics can be applied to the identification and validation of new biomarkers to improve current methods of monitoring disease activity and to identify new therapeutic targets. Acute lung injury (ALI)/acute respiratory distress syndrome (ARDS) affects a large number of patients and carries a poor prognosis. The present review focuses on progress in understanding disease heterogeneity through the use of evolving biological, genomic, and genetic approaches and on the role of clinical bioinformatics in the pathogenesis and treatment of ALI/ARDS. The remarkable advances in clinical bioinformatics offer a new way of understanding disease pathogenesis, diagnosis and treatment.
Acute lung injury; Acute respiratory distress syndrome; Genomics; Proteomics; Metabolomics; Bioinformatics
Deep sequencing techniques provide a remarkable opportunity for comprehensive understanding of tumorigenesis at the molecular level. As omics studies become popular, integrative approaches need to be developed to move from a simple cataloguing of mutations and changes in gene expression to dissecting the molecular nature of carcinogenesis at the systemic level and understanding the complex networks that lead to cancer development.
Here, we describe a high-throughput, multi-dimensional sequencing study of primary lung adenocarcinoma tumors and adjacent normal tissues of six Korean female never-smoker patients. Our data encompass results from exome-seq, RNA-seq, small RNA-seq, and MeDIP-seq. We identified and validated novel genetic aberrations, including 47 somatic mutations and 19 fusion transcripts. One of the fusions involves the c-RET gene, which was recently reported to form fusion genes that may function as drivers of carcinogenesis in lung cancer patients. We also characterized gene expression profiles, which we integrated with genomic aberrations and gene regulations into functional networks. The most prominent gene network module that emerged indicates that disturbances in G2/M transition and mitotic progression are causally linked to tumorigenesis in these patients. Also, results from the analysis strongly suggest that several novel microRNA-target interactions represent key regulatory elements of the gene network.
Our study not only provides an overview of the alterations occurring in lung adenocarcinoma at multiple levels from genome to transcriptome and epigenome, but also offers a model for integrative genomics analysis and proposes potential target pathways for the control of lung adenocarcinoma.
Personalized medicine promises patient-tailored treatments that enhance patient care and decrease overall treatment costs by using genetics and “-omics” data obtained from patient biospecimens and records to guide therapy choices toward good clinical outcomes. The approach relies on diagnostic and prognostic use of novel biomarkers discovered through combinations of tissue banking, bioinformatics, and electronic medical records (EMRs). The analytical power of bioinformatic platforms combined with patient clinical data from EMRs can reveal potential biomarkers and clinical phenotypes that allow researchers to develop experimental strategies using selected patient biospecimens stored in tissue banks. For cancer, high-quality biospecimens collected at diagnosis, first relapse, and various treatment stages provide crucial resources for study designs. To enlarge biospecimen collections, patient education regarding the value of specimen donation is vital. One approach for increasing consent is to offer publicly available illustrations and game-like engagements demonstrating how wider sample availability facilitates the development of novel therapies. The critical value of tissue bank samples, bioinformatics, and EMRs in the early stages of the biomarker discovery process for personalized medicine is often overlooked. The data obtained also require cross-disciplinary collaborations to translate experimental results into clinical practice and into diagnostic and prognostic use in personalized medicine.
The modern biomedical research and healthcare delivery domains have seen an unparalleled increase in the rate of innovation and novel technologies over the past several decades. Catalyzed by paradigm-shifting public and private programs focusing upon the formation and delivery of genomic and personalized medicine, the need for high-throughput and integrative approaches to the collection, management, and analysis of heterogeneous data sets has become imperative. This need is particularly pressing in the translational bioinformatics domain, where many fundamental research questions require the integration of large-scale, multi-dimensional clinical phenotype and bio-molecular data sets. Modern biomedical informatics theory and practice have demonstrated the distinct benefits associated with the use of knowledge-based systems in such contexts. A knowledge-based system can be defined as an intelligent agent that employs a computationally tractable knowledge base or repository in order to reason upon data in a targeted domain and reproduce expert performance relative to such reasoning operations. The ultimate goal of the design and use of such agents is to increase the reproducibility, scalability, and accessibility of complex reasoning tasks. Examples of the application of knowledge-based systems in biomedicine span a broad spectrum, from the execution of clinical decision support, to epidemiologic surveillance of public data sets for the purposes of detecting emerging infectious diseases, to the discovery of novel hypotheses in large-scale research data sets.
In this chapter, we will review the basic theoretical frameworks that define core knowledge types and reasoning operations with particular emphasis on the applicability of such conceptual models within the biomedical domain, and then go on to introduce a number of prototypical data integration requirements and patterns relevant to the conduct of translational bioinformatics that can be addressed via the design and use of knowledge-based systems.
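A common minimal realization of a knowledge-based system is a forward-chaining rule engine: rules of the form "if these premises hold, conclude this fact" are applied to a fact base until closure. The sketch below is a generic toy illustration of that reasoning pattern, not an example taken from the chapter; the biomedical fact and rule names are invented.

```python
# Minimal forward-chaining inference over a fact base.
# Facts are strings; a rule is (set_of_premises, conclusion).
# Fact/rule names below are hypothetical illustrations.
def forward_chain(facts, rules):
    """Repeatedly fire rules whose premises are all satisfied,
    adding their conclusions, until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"variant_in_gene_G", "gene_G_is_drug_target"}, "candidate_therapy"),
    ({"phenotype_P", "variant_in_gene_G"}, "genotype_phenotype_link"),
    ({"genotype_phenotype_link", "candidate_therapy"}, "report_to_clinician"),
]
derived = forward_chain(
    {"variant_in_gene_G", "gene_G_is_drug_target", "phenotype_P"}, rules)
```

Production systems (and description-logic reasoners over formal ontologies) elaborate this same loop with efficient matching and richer knowledge representations, which is what makes such reasoning reproducible and scalable across large data sets.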
Consortia of microorganisms, commonly known as biofilms, are attracting much attention from the scientific community due to their impact on human activities. As biofilm research grows into a data-intensive discipline, suitable bioinformatics approaches become essential to manage and validate individual experiments and to execute inter-laboratory, large-scale comparisons. However, biofilm data are scattered across ad hoc, non-standardized individual files, so data interchange among researchers, or any attempt at cross-laboratory experimentation or analysis, is rarely possible or even attempted.
This paper presents BiofOmics, the first publicly accessible Web platform specialized in the management and analysis of data derived from biofilm high-throughput studies. The aim is to promote data interchange across laboratories, support collaborative experiments, and enable the development of bioinformatics tools for processing and analyzing the increasing volumes of experimental biofilm data being generated. BiofOmics’ data deposition facility enforces data structuring and standardization, supported by controlled vocabulary. Researchers are responsible for the description of the experiments, their results and conclusions. BiofOmics’ curators interact with submitters only to enforce data structuring and the use of controlled vocabulary. Then, BiofOmics’ search facility makes publicly available the profile and data associated with a submitted study, so that any researcher can profit from these standardization efforts to compare similar studies, generate new hypotheses to be tested, or extend the conditions tested in the study.
BiofOmics’ novelty lies in its support for standardized data deposition, the availability of computer-readable data files, and the free-of-charge dissemination of biofilm studies across the community. Hopefully, this will open promising research possibilities, namely the comparison of results between different laboratories, the reproducibility of methods within and between laboratories, and the development of guidelines and standardized protocols for biofilm formation operating procedures and analytical methods.
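The core of controlled-vocabulary enforcement at deposition time can be sketched as a simple validation pass over each submitted record. The field names and vocabulary terms below are invented for illustration and are not BiofOmics' actual schema; the point is the pattern of rejecting free-text values that fall outside an agreed term list.

```python
# Hypothetical controlled-vocabulary check for a biofilm study submission.
# Field names and allowed terms are illustrative, not BiofOmics' real schema.
CONTROLLED_VOCAB = {
    "organism": {"P. aeruginosa", "S. epidermidis", "C. albicans"},
    "quantification_method": {"crystal violet", "CFU counting", "XTT"},
    "surface_material": {"polystyrene", "silicone", "glass"},
}

def validate_submission(record):
    """Return (field, value) pairs violating the controlled vocabulary;
    an empty list means the record passes and can be deposited."""
    errors = []
    for field, allowed in CONTROLLED_VOCAB.items():
        value = record.get(field)
        if value not in allowed:
            errors.append((field, value))
    return errors

record = {"organism": "P. aeruginosa",
          "quantification_method": "optical density",  # free text, not in vocabulary
          "surface_material": "polystyrene"}
problems = validate_submission(record)
```

Such checks are what make deposited studies comparable across laboratories: once every record uses the same terms, search and cross-study analysis reduce to exact matching rather than fuzzy text reconciliation.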