
Results 1-9 (9)

1.  Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses 
The coming deluge of genome data poses significant challenges for storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. Because data volume varies, computing and storage requirements vary as well, so biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach.
PMCID: PMC4203338  PMID: 24462600
Bioinformatics; Scientific workflow; Sequencing analyses; Cloud computing; Galaxy
2.  Utilisation of a thoracic oncology database to capture radiological and pathological images for evaluation of response to chemotherapy in patients with malignant pleural mesothelioma 
BMJ Open  2012;2(5):e001620.
An area of need in cancer informatics is the ability to store images in a comprehensive database as part of translational cancer research. To meet this need, we have implemented a novel tandem database infrastructure that facilitates image storage and utilisation.
We had previously implemented the Thoracic Oncology Program Database Project (TOPDP) database for our translational cancer research needs. While useful for many research endeavours, it is unable to store images, hence our need to implement an imaging database which could communicate easily with the TOPDP database.
The Thoracic Oncology Research Program (TORP) imaging database was designed using the Research Electronic Data Capture (REDCap) platform, which was developed by Vanderbilt University. To demonstrate proof of principle and evaluate utility, we performed a retrospective investigation into tumour response for malignant pleural mesothelioma (MPM) patients who were treated at the University of Chicago Medical Center (UCMC) with either of two analogous chemotherapy regimens and who consented to at least one of two UCMC IRB protocols, 9571 and 13473A.
A cohort of 22 MPM patients was identified using clinical data in the TOPDP database. After measurements were acquired, two representative CT images and 0–35 histological images per patient were successfully stored in the TORP database, along with clinical and demographic data.
We implemented the TORP imaging database to be used in conjunction with our comprehensive TOPDP database. While it requires an additional effort to use two databases, our database infrastructure facilitates more comprehensive translational research.
The investigation described herein demonstrates the successful implementation of this novel tandem imaging database infrastructure, as well as the potential utility of investigations enabled by it. The data model presented here can be utilised as the basis for further development of other larger, more streamlined databases in the future.
PMCID: PMC3488720  PMID: 23103606
Basic Sciences
3.  Enabling collaborative research using the Biomedical Informatics Research Network (BIRN) 
As biomedical technology becomes increasingly sophisticated, researchers can probe ever more subtle effects, but investigating small effects often requires acquiring large amounts of data. In biomedicine, these data are often acquired at, and later shared between, multiple sites. There are both technological and sociological hurdles to be overcome for data to be passed between researchers and later made accessible to the larger scientific community. The goal of the Biomedical Informatics Research Network (BIRN) is to address the challenges inherent in biomedical data sharing.
Materials and methods
BIRN tools are grouped into ‘capabilities’ and are available in the areas of data management, data security, information integration, and knowledge engineering. BIRN has a user-driven focus and employs a layered architectural approach that promotes reuse of infrastructure. BIRN tools are designed to be modular and therefore can work with pre-existing tools. BIRN users can choose the capabilities most useful for their application, while not having to ensure that their project conforms to a monolithic architecture.
BIRN has implemented a new software-based data-sharing infrastructure that has been put to use in many different domains within biomedicine. BIRN is actively involved in outreach to the broader biomedical community to form working partnerships.
BIRN's mission is to provide capabilities and services related to data sharing to the biomedical research community. It does this by forming partnerships and solving specific, user-driven problems whose solutions are then available for use by other groups.
PMCID: PMC3128398  PMID: 21515543
Genomics; statistical genetics; bioinformatics; complex traits; data; machine learning; data sharing; information integration; data mediation; data security; data management; knowledge engineering
4.  A Roadmap for caGrid, an Enterprise Grid Architecture for Biomedical Research 
caGrid is a middleware system which combines the Grid computing, the service oriented architecture, and the model driven architecture paradigms to support development of interoperable data and analytical resources and federation of such resources in a Grid environment. The functionality provided by caGrid is an essential and integral component of the cancer Biomedical Informatics Grid (caBIG™) program. This program is established by the National Cancer Institute as a nationwide effort to develop enabling informatics technologies for collaborative, multi-institutional biomedical research with the overarching goal of accelerating translational cancer research. Although the main application domain for caGrid is cancer research, the infrastructure provides a generic framework that can be employed in other biomedical research and healthcare domains. The development of caGrid is an ongoing effort, adding new functionality and improvements based on feedback and use cases from the community. This paper provides an overview of potential future architecture and tooling directions and areas of improvement for caGrid and caGrid-like systems. This summary is based on discussions at a roadmap workshop held in February with participants from biomedical research, Grid computing, and high performance computing communities.
PMCID: PMC3292259  PMID: 18560123
5.  A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid 
With the emergence of “service oriented science,” the need arises to orchestrate multiple services to facilitate scientific investigation—that is, to create “science workflows.” We present here our findings in providing a workflow solution for the caGrid service-based grid infrastructure. We choose BPEL and Taverna as candidates, and compare their usability in the lifecycle of a scientific workflow, including workflow composition, execution, and result analysis. Our experience shows that BPEL as an imperative language offers a comprehensive set of modeling primitives for workflows of all flavors; while Taverna offers a dataflow model and a more compact set of primitives that facilitates dataflow modeling and pipelined execution. We hope that this comparison study not only helps researchers select a language or tool that meets their specific needs, but also offers some insight on how a workflow language and tool can fulfill the requirement of the scientific community.
PMCID: PMC2901112  PMID: 20625534
scientific workflow; functional programming; Taverna; BPEL; caGrid
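The contrast drawn in entry 5 between an imperative orchestration style and a dataflow style can be illustrated with a minimal Python sketch. It does not use BPEL, Taverna, or caGrid. The three service functions (`query_samples`, `normalize`, `summarize`) are hypothetical stand-ins for web-service calls, and the generator-based version only approximates pipelined execution.

```python
from typing import Iterable, Iterator, List


# Hypothetical stand-ins for remote service invocations (assumptions,
# not part of any caGrid, BPEL, or Taverna API).
def query_samples() -> List[float]:
    return [2.0, 4.0, 6.0]


def normalize(value: float, scale: float) -> float:
    return value / scale


def summarize(values: Iterable[float]) -> float:
    collected = list(values)
    return sum(collected) / len(collected)


def imperative_workflow() -> float:
    """Imperative style: explicit control flow and intermediate state."""
    samples = query_samples()
    scale = max(samples)
    normalized = []
    for sample in samples:
        normalized.append(normalize(sample, scale))
    return summarize(normalized)


def dataflow_workflow() -> float:
    """Dataflow style: stages are linked by the data they exchange,
    and items stream lazily from one stage to the next."""
    samples = query_samples()
    scale = max(samples)
    stream: Iterator[float] = (normalize(s, scale) for s in samples)
    return summarize(stream)


if __name__ == "__main__":
    print(imperative_workflow(), dataflow_workflow())
```

Both functions compute the same result. The difference lies in how the steps are expressed: the first spells out the loop and the intermediate list, while the second only declares which stage feeds which.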
6.  e-Science, caGrid, and Translational Biomedical Research 
Computer  2008;41(11):58-66.
Translational research projects target a wide variety of diseases, test many different kinds of biomedical hypotheses, and employ a large assortment of experimental methodologies. Diverse data, complex execution environments, and demanding security and reliability requirements make the implementation of these projects extremely challenging and require novel e-Science technologies.
PMCID: PMC3035203  PMID: 21311723
7.  CaGrid Workflow Toolkit: A taverna based workflow tool for cancer grid 
BMC Bioinformatics  2010;11:542.
In the biological and medical domains, the use of web services has made data and computation functionality accessible in a unified manner, which has helped automate data pipelines that were previously run manually. Workflow technology is widely used in the orchestration of multiple services to facilitate in-silico research. The Cancer Biomedical Informatics Grid (caBIG) is an information network enabling the sharing of cancer research related resources, and caGrid is its underlying service-based computation infrastructure. CaBIG requires that services are composed and orchestrated in a given sequence to realize data pipelines, which are often called scientific workflows.
CaGrid selected Taverna as its workflow execution system of choice due to its integration with web service technology and support for a wide range of web services, plug-in architecture to cater for easy integration of third party extensions, etc. The caGrid Workflow Toolkit (or the toolkit for short), an extension to the Taverna workflow system, is designed and implemented to ease building and running caGrid workflows. It provides users with support for various phases in using workflows: service discovery, composition and orchestration, data access, and secure service invocation, which have been identified by the caGrid community as challenging in a multi-institutional and cross-discipline domain.
By extending the Taverna Workbench, the caGrid Workflow Toolkit provides a comprehensive solution to compose and coordinate services in caGrid, which would otherwise remain isolated and disconnected from each other. Using it, users can access more than 140 services and are offered a rich set of features, including discovery of data and analytical services, query and transfer of data, security protections for service invocations, state management in service interactions, and sharing of workflows, experiences and best practices. The proposed solution is general enough to be applicable and reusable within other service-computing infrastructures that leverage a similar technology stack.
PMCID: PMC3098092  PMID: 21044328
8.  caGrid 1.0: An Enterprise Grid Infrastructure for Biomedical Research 
To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research.
An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG™) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study.
The caGrid is built as a Grid software infrastructure and leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications.
The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL:
While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community.
PMCID: PMC2274794  PMID: 18096909
9.  caGrid 1.0: A Grid Enterprise Architecture for Cancer Research 
caGrid is the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG™) program. The current release, caGrid version 1.0, is developed as the production Grid software infrastructure of caBIG™. Based on feedback from adopters of the previous version (caGrid 0.5), it has been significantly enhanced with new features and improvements to existing components. This paper presents an overview of caGrid 1.0, its main components, and enhancements over caGrid 0.5.
PMCID: PMC2655925  PMID: 18693901
