Multi-site consortia have become the preferred setting for team-based translational research programs. Such consortia facilitate increased breadth and depth of basic science and clinical research activities, but also present numerous challenges related to data collection, analysis, storage, and exchange. The Chronic Lymphocytic Leukemia (CLL) Research Consortium (CRC), as a prototypical instance of such a consortium, uses numerous loosely coupled web applications to address its informatics needs. Over a decade of operations has allowed the CRC to identify usability and computational limitations of this information management architecture. In response, the CRC has launched the TRITON project, with the ultimate objective of developing an open-source, extensible, and fully integrative translational research information management platform. In this manuscript, we describe the architecture, design processes, and initial implementation of that platform.
The Chronic Lymphocytic Leukemia Research Consortium (CRC, http://cll.ucsd.edu) is an NCI-funded program/project (P01CA081534) consisting of eight sites. Initially funded in 1999, the CRC coordinates and facilitates an integrated translational research program, with specific emphasis on basic and clinical research targeting the genetic, biochemical and immunologic bases of Chronic Lymphocytic Leukemia (CLL). A critical facility supporting the ability of the CRC to engage in such research is the use of shared data repositories, associated data collection instruments, and data mining and analysis tools. The CRC Integrated Information Management System (CIMS) is the data management system currently used by the consortium, incorporating: 1) multiple task-specific web portal interfaces supporting clinical trial, basic science and tissue bank data management; and 2) a set of shared data repositories. CIMS facilitates the collection and storage of numerous heterogeneous bio-molecular data sources generated by instrumentation and methodological approaches including: quantitative and qualitative immunophenotyping, multiple modalities of gene expression analysis, and Fluorescence In Situ Hybridization (FISH) analyses of cytogenetic abnormalities. CIMS was initially deployed for use by the CRC in 2000, and at the time of this submission, is being used to collect, manage and analyze data for well over 5000 patients involved in multiple clinical trial modalities, as well as hundreds of thousands of CLL-specific tissue samples. Despite the success of CIMS in satisfying the informatics requirements of the CRC over the past ten years, CRC participants have identified numerous usability and computational limitations of CIMS, including:
Motivated by these limitations, the CRC has launched the TRITON (Translational Research Information Technology Omnibus) project, in order to re-engineer the current CIMS platform and develop a highly usable, extensible, standards-based, open source, and integrative translational research information management platform. A primary goal of these efforts is to enable integration between TRITON and the basic science, clinical research and translational science focused data management tools and interchange mediums associated with the NCI’s Cancer Biomedical Informatics Grid (caBIG) initiative, including the caGrid service-oriented middleware (1, 2). In doing so, our objective is to increase the translational capacity of the CRC by enabling consortium investigators to discover, integrate, analyze and disseminate heterogeneous, multi-dimensional data sets. It is anticipated that many of these data sets will be generated by high-throughput bio-molecular technologies or instrumentation, as well as electronic health record (EHR) systems that are currently in use at the majority of CRC sites.
In the following sub-sections, we will describe three complementary and concurrent axes consisting of both technologies and methodologies, which collectively are being used to design and implement the TRITON platform.
The first objective of the TRITON project is to migrate the existing CIMS database management systems and web-based interface applications to a standards-based, open-source software platform. This migration is necessary due to the current reliance of many CIMS components on an operational data repository that is implemented using a proprietary relational database management system, web application platforms, and programming languages. Such dependence significantly reduces the extensibility of CIMS and its adoptability by other, analogous research programs. The specific open-source components we are utilizing for the TRITON project include: 1) the MySQL relational database management system; 2) the LifeRay web portal platform; 3) J2EE (JSR 168) compliant portlets; 4) the caGrid electronic data interchange (EDI) middleware (1, 2) and GAARDS grid-based user authentication and authorization system (3); and 5) the caTissue bio-repository management suite (4). In order to mitigate potential workflow disruptions associated with the migration of an actively utilized information management system to a new software “stack”, we are implementing the above technologies in a phased manner, as summarized in Table 1. Of note, at the time of submission, the TRITON project has completed Phases 1-2 of this technology migration, and is actively involved in Phase 3.
The second objective of the TRITON project is to develop a foundational domain model, derived from CIMS-specific workflows, that maps constituent objects and attributes to NCI EVS-compliant concept definitions, and to utilize that model to build caGrid “wrappers” capable of supporting the interoperability of TRITON with external caBIG electronic data interchange standards and scientific analysis applications. The first part of this objective has been accomplished via a multi-step process consisting of:
The rationale for this approach is that the extensibility afforded by using a terminology management platform, such as LexEVS, and an ISO11179-compliant metadata management system, such as openMDR, will enable: 1) external semantic interoperability with the NCI-EVS and caDSR; and 2) the incorporation of additional terminology or ontology standards or services that already exist, that may evolve during the course of this project, or that are required by other adopters in the future. This is particularly desirable to ensure that TRITON can be generalized beyond the immediate oncology domain. Borlawsky et al. (5) provide a more detailed description of the model-driven architecture and knowledge engineering techniques being utilized by our team.
The second component of this objective, the implementation of caGrid-compliant wrappers that leverage the preceding ontology-anchored domain models, will be accomplished using the caGrid Data Service Framework and Introduce toolkit (1, 6). This will ensure that, where appropriate, TRITON data sets will be caBIG-compliant, and therefore interoperable with other nationwide efforts. Our initial objective in this regard is to implement a bidirectional wrapper that will support the execution of queries against the TRITON participant registry, study calendar and protocol metadata, by enabling the mapping between SQL and CQL (Common Query Language), an axiomatic logical query syntax that is supported by the caGrid middleware. The wrapper will utilize an instance-specific rule base to define the semantics and logic of such mappings. The rules will be defined in terms of both local data type definitions and ontology-anchored concept definitions, maintained in the project’s LexEVS and openMDR instances.
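The rule-driven mapping described above can be illustrated with a minimal sketch covering one direction only (an object-oriented query translated to SQL). It is not the TRITON wrapper and does not use the caGrid or Introduce APIs: the query is represented as a plain dictionary rather than the CQL XML syntax, the predicate names are only modeled loosely on CQL-style comparison predicates, and every class, attribute, table and column name (`Participant`, `site_code`, and so on) is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One rule: where a domain-model attribute lives in the local schema."""
    table: str
    column: str


# Hypothetical rule base keyed by (domain class, attribute). In the design
# described in the text, such rules would be anchored to concept definitions
# held in LexEVS/openMDR; here they are hard-coded for illustration.
RULE_BASE = {
    ("Participant", "id"): MappingRule("participant", "participant_id"),
    ("Participant", "site"): MappingRule("participant", "site_code"),
    ("Participant", "enrollmentDate"): MappingRule("participant", "enrolled_on"),
}

# Hypothetical mapping of domain classes to their primary tables.
TARGET_TABLES = {"Participant": "participant"}

# Assumed predicate vocabulary, translated to SQL comparison operators.
PREDICATES = {
    "EQUAL_TO": "=",
    "NOT_EQUAL_TO": "<>",
    "LESS_THAN": "<",
    "GREATER_THAN": ">",
    "LIKE": "LIKE",
}


def cql_to_sql(query):
    """Translate a simplified CQL-like query dict into (sql, params).

    Values are returned separately as parameters so that they can be bound
    by the database driver instead of being spliced into the SQL string.
    """
    target = query["target"]
    if target not in TARGET_TABLES:
        raise KeyError(f"no table mapping for target class {target!r}")

    clauses, params = [], []
    for attr in query.get("attributes", []):
        key = (target, attr["name"])
        if key not in RULE_BASE:
            raise KeyError(f"no mapping rule for {key!r}")
        operator = PREDICATES[attr["predicate"]]
        clauses.append(f"{RULE_BASE[key].column} {operator} %s")
        params.append(attr["value"])

    sql = f"SELECT * FROM {TARGET_TABLES[target]}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


example_query = {
    "target": "Participant",
    "attributes": [
        {"name": "site", "predicate": "EQUAL_TO", "value": "UCSD"},
        {"name": "enrollmentDate", "predicate": "GREATER_THAN", "value": "2005-01-01"},
    ],
}
sql, params = cql_to_sql(example_query)
```

Keeping the rules in a data structure separate from the translation logic mirrors the instance-specific rule base described in the text: a different deployment would change only the rule base, not the translator.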
The third objective of this project is to extend the existing CIMS operational data repository and web-based interface applications to support the collection, storage and analysis of novel bio-molecular and phenotypic data sets. This objective will be primarily satisfied through the adoption and integration of existent or emergent caBIG-developed data storage and analysis platforms. A primary goal in the context of this objective is to support tissue sample and correlative phenotypic data capture in the context of longitudinal studies. This goal will be accomplished using a two-part approach: 1) an instance of the caGrid-compatible caTissue Suite bio-specimen management system will be deployed to manage CRC tissue core logistics, and integrated with the previously described LifeRay portal interface; and 2) an instance of the open-source Jess production rules system, with an accompanying ontology-anchored rule base, will be deployed and linked to the LifeRay portal interface in order to execute and generate data-driven messages via both the web-portal interface and e-mail, based upon the axiomatic rules defined in the rule base. A primary use of this decision support mechanism will be to employ rule-based alerting and prompts in order to increase compliance with bio-specimen and correlative data collection protocols.
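As an illustration of the rule-based alerting idea, the following minimal sketch evaluates a small set of rules against study visit records and produces alert messages. It is a plain Python stand-in for a production rule engine, not Jess syntax or the Jess API, and the record fields, the seven-day threshold and the rule name are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass
class Visit:
    """Hypothetical fact: one study visit for one participant."""
    participant_id: str
    visit_date: date
    specimen_collected: bool


@dataclass
class Rule:
    """A condition over a fact plus the message to emit when it holds."""
    name: str
    condition: Callable[[Visit, date], bool]
    message: str


def overdue_specimen(visit: Visit, today: date) -> bool:
    # Assumed protocol window: a specimen is overdue if it has not been
    # collected more than 7 days after the visit.
    return (not visit.specimen_collected) and (today - visit.visit_date).days > 7


RULES = [
    Rule(
        name="overdue-specimen",
        condition=overdue_specimen,
        message="Participant {pid}: specimen for visit on {visit_date} is overdue",
    ),
]


def evaluate(visits: List[Visit], today: date, rules: List[Rule] = RULES) -> List[str]:
    """Return one alert message for every (visit, rule) pair whose condition holds."""
    alerts = []
    for visit in visits:
        for rule in rules:
            if rule.condition(visit, today):
                alerts.append(
                    rule.message.format(
                        pid=visit.participant_id,
                        visit_date=visit.visit_date.isoformat(),
                    )
                )
    return alerts


visits = [
    Visit("P001", date(2010, 1, 1), specimen_collected=False),
    Visit("P002", date(2010, 1, 1), specimen_collected=True),
    Visit("P003", date(2010, 1, 18), specimen_collected=False),
]
alerts = evaluate(visits, today=date(2010, 1, 20))
```

In a deployment along the lines described in the text, the resulting messages would be routed to the portal interface or to e-mail; delivery is deliberately left out of this sketch.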
The second goal in the context of the preceding objective is to facilitate on-demand, integrative query and analysis of tissue sample availability and corresponding phenotype and bio-molecular data sets, using a combination of the following components: 1) ontology-anchored data definition and integration schemas; 2) caGrid-based electronic data interchange platforms; and 3) a web portal application intended to support the discovery, integration and interchange of heterogeneous biomedical data sets using the two preceding components. Specifically, we will deploy an instance of the TOKEn conceptual knowledge discovery platform and web portal (10). This portal will be integrated with the overall TRITON LifeRay portal in order to provide an integrative and federated query mechanism that spans the re-factored CIMS data repository, caTissue data repository and associated tissue annotations, and any pertinent and appropriately grid-enabled correlative or external data sources. The ultimate rationale behind this approach is that by leveraging caGrid-compatible portal technologies in conjunction with local data repositories, caGrid-compliant data repositories and potentially data analysis services, CRC investigators and staff will be able to realize the efficiencies associated with the creation of truly translational informatics pipelines, as have been previously described by Kickenger and colleagues (11).
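The federated query pattern described above can be sketched, under strong simplifying assumptions, as a function that sends the same request to several independent sources and merges the answers by participant identifier. This is not the TOKEn or caGrid API: the sources are in-memory stand-ins, and the source names, record fields and identifiers are hypothetical.

```python
from collections import defaultdict

# In-memory stand-ins for independent repositories (hypothetical data).
CLINICAL = {
    "P001": {"diagnosis": "CLL"},
    "P002": {"diagnosis": "CLL"},
}
TISSUE = {
    "P001": {"samples_available": 3},
}


def make_source(store):
    """Wrap a dict as a source: given participant IDs, return matching records."""
    def fetch(participant_ids):
        return {pid: store[pid] for pid in participant_ids if pid in store}
    return fetch


def federated_query(sources, participant_ids):
    """Query every source and merge the results per participant.

    The result maps each participant ID to a dict keyed by source name, so
    the origin of every record remains visible after integration.
    """
    merged = defaultdict(dict)
    for name, fetch in sources.items():
        for pid, record in fetch(participant_ids).items():
            merged[pid][name] = record
    return dict(merged)


sources = {
    "clinical": make_source(CLINICAL),
    "tissue": make_source(TISSUE),
}
result = federated_query(sources, ["P001", "P002", "P999"])
```

Participants known to only some sources appear with partial records, and identifiers unknown to every source are simply absent from the result.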
Table 2 summarizes the current state of the TRITON project, relative to the three axes described in the preceding section. Relative to the user-centered design methodologies introduced in our description of Phase 3 of the project, an iterative series of focus-group sessions has been conducted with members of the CRC in order to identify critical human-factors concerns relative to current CIMS functionality and future TRITON functionality. Thematic analyses of these sessions have identified a number of recurring and high-priority areas, as summarized below:
The TRITON platform, as a successor to the existing CIMS tools and data repositories, will adhere to the overall architectural model illustrated in Figure 2. Fundamental to this architecture are:
We are actively providing open-source access to all TRITON software components via a gForge collaboration and software distribution site (https://project.bmi.ohio-state.edu).
The TRITON project represents a prototypical instance of the use of prevailing open-source and standards-based technologies in order to develop, deploy, and disseminate an extensible and integrative translational research information management platform. The design and implementation approach used in this project should be informative to analogous efforts and programs. Of note, the availability of such informatics platforms is a critical prerequisite to the advancement and success of a wide variety of research registry and dissemination portals, such as those associated with a number of large-scale NIH programs (TCGA, dbGaP, etc.). Furthermore, given the open-source and multi-institutional nature of the TRITON project, there is significant opportunity for the development of a community-based effort to extend and adopt this platform, with demonstrable benefits in terms of clinical and translational research efficiencies and capacity.
This work was supported in part by NCI grants R01CA134232 and P01CA081534.