BMC Genomics. 2009; 10: 623.
Published online 2009 December 22. doi:  10.1186/1471-2164-10-623
PMCID: PMC2807881
A new measurement of sequence conservation
Xiaohui Cai,1 Haiyan Hu,2 and Xiaoman Li3 (corresponding author)
1Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. MC0446, La Jolla, CA 92093, USA
2School of Electrical Engineering and Computer Science, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
3Burnett School of Biomedical Science, College of Medicine, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, USA
Xiaohui Cai: xcai/at/ucsd.edu; Haiyan Hu: haihu/at/cs.ucf.edu; Xiaoman Li: xiaoman/at/mail.ucf.edu
Received July 15, 2009; Accepted December 22, 2009.
Abstract
Background
Understanding sequence conservation is important for the study of sequence evolution and for the identification of functional regions of the genome. Current studies often measure sequence conservation based on every position in contiguous regions. As a result, a large number of functional regions that contain conserved segments separated by relatively long divergent segments are ignored. Our goal in this paper is to define a new measurement of sequence conservation that can detect both contiguously and discontiguously conserved regions. Here and in the following, conserved regions are regions whose similarity to their homologous regions in other species exceeds a pre-specified threshold. Conserved regions are therefore good candidates for functional regions but are not always functional. Moreover, conserved regions may contain long divergent segments.
Results
To identify both discontiguously and contiguously conserved regions, we proposed a new measurement of sequence conservation, which measures sequence similarity based only on the conserved segments within the regions. Defining conserved segments with the local alignment tool CHAOS, we used the new measurement to analyze the conservation in the mouse genome of 1642 experimentally verified human functional non-coding regions. We found that the conservation in at least 11% of these functional regions could be missed by current conservation analysis methods. We also found that 72% of the mouse homologous regions identified with the new measurement are more similar to the human functional sequences than are the aligned mouse sequences from the UCSC genome browser. We further compared BLAST and discontiguous MegaBLAST with our method and found that our method picks up many more conserved segments in these regions than either tool.
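The abstract does not give the formula behind the new measurement, so the following is only a minimal sketch of the general idea, assuming that similarity is expressed as percent identity and that local alignment segments (for example, from a tool such as CHAOS) are already available. The `Segment` class, the function names, and the 0.7 threshold are hypothetical and are not taken from the paper. The sketch contrasts a score computed over every position in a region with a score computed only over the conserved segments.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A locally aligned segment inside a region (hypothetical structure)."""
    start: int    # 0-based start within the region, inclusive
    end: int      # end within the region, exclusive
    matches: int  # number of identical aligned positions in the segment


def contiguous_identity(region_length: int, segments: List[Segment]) -> float:
    """Identity over every position; positions outside segments count as mismatches."""
    return sum(s.matches for s in segments) / region_length


def segment_identity(segments: List[Segment]) -> float:
    """Identity computed only over the conserved segments (assumed form of the idea)."""
    aligned = sum(s.end - s.start for s in segments)
    if aligned == 0:
        return 0.0
    return sum(s.matches for s in segments) / aligned


def segment_coverage(region_length: int, segments: List[Segment]) -> float:
    """Fraction of the region covered by segments (assumes they do not overlap)."""
    return sum(s.end - s.start for s in segments) / region_length


# A 1000 bp region with two 100 bp conserved segments separated by 800 bp
# of divergent sequence; each segment has 90 identical positions.
region_length = 1000
segments = [Segment(0, 100, 90), Segment(900, 1000, 90)]
threshold = 0.7  # hypothetical similarity threshold

whole = contiguous_identity(region_length, segments)
conserved_only = segment_identity(segments)
coverage = segment_coverage(region_length, segments)

print(f"every-position identity: {whole:.2f}, passes: {whole >= threshold}")
print(f"segment-only identity:   {conserved_only:.2f}, passes: {conserved_only >= threshold}")
print(f"segment coverage:        {coverage:.2f}")
```

In this toy case the every-position score falls well below the threshold while the segment-only score exceeds it, which is the kind of discontiguously conserved region the abstract says contiguous measures can miss. Reporting coverage alongside the segment-only score is a design choice of this sketch, intended to show how much of the region the score actually reflects.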
Conclusions
It is critical to have a new measurement of sequence conservation that is based only on the conserved segments in one region. Such a new measurement can aid the identification of better local "orthologous" regions. It will also shed light on the identification of new types of conserved functional regions in vertebrate genomes.
Articles from BMC Genomics are provided here courtesy of BioMed Central