Algorithms Mol Biol. 2012; 7: 5.
Published online Apr 5, 2012. doi:  10.1186/1748-7188-7-5
PMCID: PMC3341196
A normalization strategy for comparing tag count data
Koji Kadota (corresponding author),1,2 Tomoaki Nishiyama,3 and Kentaro Shimizu1
1Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan
2Project on Health and Anti-aging, Kanagawa Academy of Science and Technology, 3-2-1 Sakado, Takatsu-ku, Kawasaki, Kanagawa 213-0012, Japan
3Advanced Science Research Center, Kanazawa University, 13-1 Takara-machi, Kanazawa 920-0934, Japan
Corresponding author: Koji Kadota.
Koji Kadota: kadota/at/bi.a.u-tokyo.ac.jp; Tomoaki Nishiyama: tomoakin/at/kenroku.kanazawa-u.ac.jp; Kentaro Shimizu: shimizu/at/bi.a.u-tokyo.ac.jp
Received December 1, 2011; Accepted April 5, 2012.
Abstract
Background
High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data.
Results
We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq), each run with its default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure of both sensitivity and specificity. We found that packages using our strategy in the data normalization step generally performed well. This result was also observed for a real experimental dataset.
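The key concept above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract says only that potential DEGs are removed before the normalization factor is calculated, so everything else here is an assumption made for illustration. Total-count scaling stands in for the packages' normalization methods, a ranking by absolute log fold change stands in for their DEG detection, and the function names and the `fraction` parameter are hypothetical.

```python
import math


def size_factors(counts, keep=None):
    """Per-sample scaling factors from total counts over the kept genes.

    counts: list of rows, one per gene, each a list of per-sample tag counts.
    keep:   indices of genes to use (all genes when None).
    """
    rows = counts if keep is None else [counts[i] for i in keep]
    totals = [sum(col) for col in zip(*rows)]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def potential_degs(counts, factors, group_a, group_b, fraction):
    """Flag the genes with the largest absolute log2 fold change.

    Assumption: this simple ranking is only a stand-in for a real
    DEG detection method.
    """
    scores = []
    for row in counts:
        a = sum(row[j] / factors[j] for j in group_a) / len(group_a)
        b = sum(row[j] / factors[j] for j in group_b) / len(group_b)
        # The +1 pseudo-count avoids taking the log of zero.
        scores.append(abs(math.log2((a + 1) / (b + 1))))
    n_flag = int(len(counts) * fraction)
    ranked = sorted(range(len(counts)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_flag])


def deg_elimination_factors(counts, group_a, group_b, fraction=0.2):
    """Recompute scaling factors after removing potential DEGs."""
    first_pass = size_factors(counts)              # step 1: initial factors
    flagged = potential_degs(counts, first_pass,   # step 2: flag potential DEGs
                             group_a, group_b, fraction)
    kept = [i for i in range(len(counts)) if i not in flagged]
    return size_factors(counts, kept)              # step 3: factors from the rest


# Toy data: 8 unchanged genes and 2 genes that are 10-fold higher in sample 0.
counts = [[100, 100]] * 8 + [[1000, 100]] * 2
naive = size_factors(counts)
refined = deg_elimination_factors(counts, group_a=[0], group_b=[1])
```

In this toy example the two highly expressed genes inflate the total count of sample 0, so the naive factors differ between samples even though most genes are unchanged. Once those two genes are flagged and excluded, both samples receive the same factor.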
Conclusion
Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be applied widely to other types of tag count data and to microarray data.
Articles from Algorithms for Molecular Biology : AMB are provided here courtesy of BioMed Central