Understanding complex diseases well enough to develop and assess putative therapies requires moving between the bench and the bedside, a transition often referred to as crossing the ‘T1 translational barrier.’ [25]
As a goal, it is uncomplicated: to ascertain how basic science observations can be applied in clinical contexts, whether as prognostic, diagnostic, or therapeutic approaches to disease. As an endeavor, it represents a grand challenge in modern medicine and potentially a paradigm shift in how a broad set of data points is integrated.
The dimensionality of the full array of biological and clinical data that can now be generated dwarfs that of any previous attempt at heterogeneous data integration. There is therefore a need for a next generation of clinical decision support systems that can combine data from massive biological datasets with relevant disease phenotype information and computable knowledge bases to offer clinically useful suggestions. Perhaps more mundane, but of equal significance, is the need for approaches that can accommodate a dizzying set of file formats and representation standards. These challenges are not, by themselves, completely new to the biomedical informatics community. Nonetheless, they mark a core area where effort is needed to integrate knowledge across clinical genomics, genomic medicine, pharmacogenomics, and genetic epidemiology, given the avalanche of additional genomic and clinical data and the corresponding knowledge of their inter-relationships.
Amidst the challenges of knowledge integration and unprecedented volumes of data, TBI faces the task of developing approaches that can place biological knowledge into a meaningful clinical context. The volume of data can lead to spurious correlations that are artifacts of the data and are neither biologically nor clinically insightful. For example, if a physician had access to a patient's entire genome, how could it be leveraged to provide clinically insightful knowledge that would not have been obtainable solely from data already in the medical chart (eg, family history of a disease)? As shown for the genomic era's ‘Patient 0,’ it is plausible to integrate genomic data with relevant clinical data to develop prognostic approaches. [27]
The potential to provide appropriate care based on predicted disease outcome or therapeutic efficacy offers great incentive for developing TBI approaches that integrate the full complement of biological, clinical, and environmental data. For this reason, phenotypic annotation of samples whose gene expression or single nucleotide polymorphism data are available in genomic data repositories such as GEO [28] is underway in different laboratories [30], involving methodologies that are widely used in health informatics (eg, natural language processing, ontology mapping). Finally, approaches such as those implemented by the Crimson system [32] hold promise for capitalizing on the clinical data that are captured as a by-product of standard clinical care. The extent to which this type of relatively noisy data can be used for research remains an open question under active study by the TBI community.
Projects that involve TBI approaches to integrate biological and clinical data are already underway. The NIH-funded eMERGE (Electronic Medical Records and Genomics) project is a multi-site endeavor exploring issues involved with linking genomic information (from genome-wide association studies) with clinical data for individuals with specific conditions. [33] Other efforts such as the Personal Genome Project [34], the Exome Project [35], the Million Veteran Program [36], and the 1000 Genomes Project [37] reflect the increasing interest of the biomedical research and clinical communities in studying the complexity of genotype–phenotype relationships as well as postulating hypotheses for disease that incorporate genomic data. In addition to human-based genome projects, there are also initiatives such as the Human Microbiome Project (HMP) [38] and Metagenomics of the Human Intestinal Tract (MetaHIT) [39] that strive to provide a census of commensal microbial flora potentially related to disease. [40]