PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of amolbioBioMed CentralBiomed Central Web Sitesearchsubmit a manuscriptregisterthis articleAlgorithms for Molecular Biology : AMB
 
Algorithms Mol Biol. 2012; 7: 12.
Published online May 2, 2012. doi:  10.1186/1748-7188-7-12
PMCID: PMC3394223
Fractal MapReduce decomposition of sequence alignment
Jonas S Almeida,corresponding author1 Alexander Grüneberg,1,2 Wolfgang Maass,2,3 and Susana Vinga4,5
1Div Informatics, Dept Pathology, University of Alabama at Birmingham, USA
2Research Center for Intelligent Media, Furtwangen University, Furtwangen, Germany
3Information and Service Systems, Dept of Law and Economics, Saarland University, Germany
4INESC-ID, Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento, Lisboa, Portugal
5FCM-UNL Faculdade Ciências Médicas - Universidade Nova de Lisboa, Portugal
corresponding authorCorresponding author.
Jonas S Almeida: jalmeida/at/uab.edu; Alexander Grüneberg: gruene/at/uab.edu; Wolfgang Maass: wolfgang.maass/at/iss.uni-saarland.de; Susana Vinga: svinga/at/kdbio.inesc-id.pt
Received October 8, 2011; Accepted May 2, 2012.
Abstract
Background
The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required.
Results
In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique, that has been found to provide a "alignment-free" solution to sequence analysis and comparison. That is, a solution that does not require dynamic programming, relying on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units: with no resort to dynamic programming.
Conclusions
The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing.
Availability
Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".
Articles from Algorithms for Molecular Biology : AMB are provided here courtesy of
BioMed Central