The revolutionary development of massively parallel DNA sequencing has enabled identification of biomedically relevant genomic variants via whole genome [1] and exome resequencing [2]. Information relevant to personalized medicine, such as assessment of longitudinal disease risks and personalized treatment [3], is now within reach.
In a few very recent personal genomic studies, results have directly led to targeted treatment and dramatic improvement in the patient's quality of life [4]. These examples are paving the way for genomic sequencing to become a routine diagnostic procedure and to enable personalized medicine.
Currently, analysis of sequencing data on a genomic scale requires bioinformatic expertise and access to extensive computational resources, presenting a significant barrier. Most cutting-edge genome analysis applications [5] are still limited to a command line interface and require at least moderate informatics expertise to operate. In addition, large-scale genomic data analysis requires routine access to a high-performance compute cluster. Such requirements are unsuitable for the operational models of smaller research and diagnostic laboratories because of the excessive investment they demand in computing infrastructure and personnel.
The deployment of genomic analysis Software as a Service (SaaS) within a cloud computing framework offers a unique solution to these problems. The concept behind cloud computing is to outsource computation to third-party servers or clusters at a remote location. This allows small laboratories to take advantage of external computational resources without having to maintain an in-house compute cluster. The SaaS model removes the upfront investment requirement and any delays associated with building local computing infrastructure. Earlier solutions such as CloudBurst [7] and Crossbow [8] have attempted to tackle the very specific problem of mapping short read data and assembling large genomes using the scalability offered by the map-reduce framework deployed on top of a compute cluster. While this is useful, users would still need considerable bioinformatics skill and familiarity with cluster infrastructure to undertake such an analysis. Other solutions such as CloudMan [9] from the Galaxy Project provide a user interface and remove the need for users to have informatics experience, but are not specifically designed for personal genome analysis.
To this end, we integrated our variant analysis pipeline, the Atlas2 Suite, onto a "local cloud" using the Genboree Workbench (http://www.genboree.org) and onto a "commercial cloud" via Amazon Web Services (http://aws.amazon.com). We performed a case study using the Atlas2 Genboree pipeline as a proof of concept to demonstrate the potential of personal genome analysis on the cloud. We also processed two whole exome capture samples using our Atlas2 Amazon pipeline to outline the cost of running analysis on Amazon. Our cloud analysis pipeline on Genboree has a web browser-based drag-and-drop interface, which allows users to interact with the software through their browser from any location and makes it practical for non-bioinformaticians to use. Our team actively maintains the cloud pipeline, which also removes the need for users to update the software themselves.