Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood.
We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA).
Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence on the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity on the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of repurposed TFos, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest exaptation of some functional regulatory sequences into new function.
Despite TFos repurposing, we did not find substantial changes in their predicted target genes, suggesting that CRMs buffer evolutionary events allowing little or no change in the TFos – target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.
We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed on the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1245-6) contains supplementary material, which is available to authorized users.