We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis because the promoter regions of co-expressed genes often show no discernible sequence conservation. In our approach, therefore, we do not compare the nucleotide sequences of promoters directly. Instead, we obtain predictions of transcription factor binding sites, annotate the predicted sites with the labels of the corresponding binding factors, and align the resulting sequences of labels, which we refer to here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm on a small but well-curated collection of human–mouse orthologous gene pairs. Results on this dataset, as well as on an independent and much larger dataset from the CISRED database, indicate that TF-map alignments can uncover conserved regulatory elements that typical sequence alignments cannot detect.
Sequence comparisons and alignments are among the most powerful tools in biological research. Since similar sequences generally perform similar functions, sequence conservation between two or more nucleotide or amino acid sequences is often used to infer common biological functionality. Sequence comparisons, however, have limitations: similar functions are often encoded by higher-order elements that do not hold a univocal relationship to the underlying primary sequence. As a consequence, similar functions are frequently encoded by diverse sequences. Promoter regions are a case in point. Often, promoter sequences of genes with similar expression patterns do not show conservation. This is because, even though their expression may be regulated by a similar arrangement of transcription factors, the binding sites for these factors may exhibit great sequence variability. To overcome this limitation, the authors obtain predictions of transcription factor binding sites on promoter sequences and annotate the predicted sites with the labels of the corresponding transcription factors. They develop an algorithm, inspired by an early algorithm for aligning restriction enzyme maps, to align the resulting sequences of labels, the so-called TF-maps (transcription factor maps). They show that TF-map alignments can uncover conserved regulatory elements common to the promoter regions of co-regulated genes that typical sequence alignments cannot detect.
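The idea of aligning labels rather than nucleotides can be illustrated with a minimal sketch. It is not the algorithm described here, which is adapted from restriction enzyme map alignment. It is a plain Needleman-Wunsch-style global alignment over factor labels, in which two sites may be paired only if they carry the same label. The `Site` representation, the function name `align_tf_maps`, the `match` and `gap` values, and the example sites are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation of a predicted site: (factor label, start position).
Site = Tuple[str, int]


def align_tf_maps(a: List[Site], b: List[Site],
                  match: float = 2.0, gap: float = -1.0):
    """Globally align two TF-maps by label (simplified illustration only).

    Sites are paired only when their labels are identical; every unpaired
    site costs `gap`. Positions are carried along but not scored.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    ptr = [[""] * (m + 1) for _ in range(n + 1)]

    # Borders: aligning a prefix against nothing leaves every site unpaired.
    for i in range(1, n + 1):
        score[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], ptr[0][j] = j * gap, "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best, move = score[i - 1][j] + gap, "up"
            if score[i][j - 1] + gap > best:
                best, move = score[i][j - 1] + gap, "left"
            # A diagonal move is allowed only for identical factor labels.
            if a[i - 1][0] == b[j - 1][0] and score[i - 1][j - 1] + match >= best:
                best, move = score[i - 1][j - 1] + match, "diag"
            score[i][j], ptr[i][j] = best, move

    # Trace back from the bottom-right corner to recover the paired sites.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if ptr[i][j] == "diag":
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif ptr[i][j] == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


# Made-up example maps; labels and positions are illustrative, not real predictions.
human = [("SP1", 10), ("TATA", 40), ("CEBP", 70)]
mouse = [("SP1", 12), ("AP1", 30), ("TATA", 45), ("CEBP", 80)]
total, pairs = align_tf_maps(human, mouse)
```

In this toy example the three shared labels are paired and the extra `AP1` site is left unaligned, even though the two maps could differ completely at the nucleotide level. A more faithful scoring scheme would presumably also take site positions and prediction scores into account.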