Search tips
Search criteria

Results 1-2 (2)

Clipboard (0)

Select a Filter Below

Year of Publication
Document Types
1.  VARiD: A variation detection framework for color-space and letter-space platforms 
Bioinformatics  2010;26(12):i343-i349.
Motivation: High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystem's SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together.
Results: We present VARiD—a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined.
Availability: The toolset is freely available at
PMCID: PMC2881369  PMID: 20529926
2.  SHRiMP: Accurate Mapping of Short Color-space Reads 
PLoS Computational Biology  2009;5(5):e1000386.
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at
Author Summary
Next Generation Sequencing (NGS) technologies are revolutionizing the way biologists acquire and analyze genomic data. NGS machines, such as Illumina/Solexa and AB SOLiD, are able to sequence genomes more cheaply by 200-fold than previous methods. One of the main application areas of NGS technologies is the discovery of genomic variation within a given species. The first step in discovering this variation is the mapping of reads sequenced from a donor individual to a known (“reference”) genome. Differences between the reference and the reads are indicative either of polymorphisms, or of sequencing errors. Since the introduction of NGS technologies, many methods have been devised for mapping reads to reference genomes. However, these algorithms often sacrifice sensitivity for fast running time. While they are successful at mapping reads from organisms that exhibit low polymorphism rates, they do not perform well at mapping reads from highly polymorphic organisms. We present a novel read mapping method, SHRiMP, that can handle much greater amounts of polymorphism. Using Ciona savignyi as our target organism, we demonstrate that our method discovers significantly more variation than other methods. Additionally, we develop color-space extensions to classical alignment algorithms, allowing us to map color-space, or “dibase”, reads generated by AB SOLiD sequencers.
PMCID: PMC2678294  PMID: 19461883

Results 1-2 (2)