Since the completion of the human genome, cataloging transcription factor binding sites (TFBSs) has been critical for understanding gene regulation. The use of comparative genomics (evolutionary conservation across species) is often championed as a method to separate the functional regulatory sequence “wheat” from the nonfunctional “chaff” [1]. As the number of mammalian full genome drafts increases, the integration of TFBS predictions with lists of conserved noncoding regions (CNCs) has emerged as a key step in the TFBS identification process [2]. If TFBS predictions are contiguous to DNA features that coincidentally have a critical structural role, such as maintenance of chromatin organization, the appearance of conservation may be intensified even further.
Although these methods have greatly enhanced our knowledge of the human genome's regulatory repertoire, overreliance on conservation information can exclude genuine binding sites. Because TFBSs are typically small, they can arise by chance in a gene's promoter, which may decrease the selective pressure to maintain already existing sites [5]. Another wrinkle is that the evolutionary forces that created the conservation blocks may no longer be functionally relevant to humans [6]. Additionally, recent scans for natural selection in human gene coding regions have revealed that distinct biological pathways are often subject to widely different evolutionary pressures [7], particularly since mutation rates have been shown to vary across the genome [9]. Genes involved in oncogenesis and tumor suppression have experienced recent selection for mutation in primate lineages [7]. DNA binding sites of transcription factors are also functional components of these pathways and are likely under similar evolutionary pressures. Indeed, we have focused recently on identifying human single nucleotide polymorphisms that alter the function of transcription factors [11]. We have therefore investigated the assumptions behind using mammalian conservation as an obligatory screening step when seeking TFBSs.
The p53 tumor suppressor protein, encoded by the TP53 master regulatory gene, is a transcription factor that coordinates a network of cellular responses to environmental insults. Over half of human cancers have a mutation in the p53 protein or one of its partners [13]. The p53 protein is estimated to have several hundred transregulation target genes that affect pathways including apoptosis, DNA damage repair, and cell-growth arrest [14]. As a result, p53 target genes are highly sought-after drug targets for halting cancer progression. According to in vitro experiments, the p53 protein binds specifically to a palindromic consensus sequence, RRRCWWGYYY(N0–13)RRRCWWGYYY (R = purine, Y = pyrimidine, W = A or T, N = any base), with nearly all response elements (REs) containing at least one mismatch; in vivo results have suggested that the spacer region may be much smaller [14]. The sequence is typically located within 5,000 bases of the target gene's transcriptional start site, and p53 either induces or represses expression upon binding [16]. One feature of p53 that confounds the discovery of novel transregulated genes is that while some binding sites match the expected consensus sequence quite well, others can be consensus poor and yet are both necessary and sufficient to transactivate a gene [18]. A recent study has suggested that the “rules of engagement” for p53 REs may differ based on the activated pathway, particularly in the apoptosis and cell-cycle–related systems [19]. Thus, we have used cross-species conservation to examine whether these groups of elements exhibit distinct conservation profiles.
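As an illustration of how a candidate sequence can be compared against the in vitro p53 consensus, the following minimal Python sketch counts mismatches to two RRRCWWGYYY half-sites separated by a spacer. The IUPAC lookup table, the function names, and the example sequences are assumptions made for illustration and are not taken from this study; the sketch also examines one strand only.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "M": "AC", "K": "GT", "N": "ACGT",
}

# Half-site of the in vitro p53 consensus quoted in the text.
P53_HALF_SITE = "RRRCWWGYYY"


def count_mismatches(consensus, site):
    """Count positions where `site` violates the IUPAC `consensus`."""
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    return sum(
        base not in IUPAC[code]
        for code, base in zip(consensus, site.upper())
    )


def score_p53_re(sequence, spacer_len):
    """Total mismatches of a candidate RE made of two half-sites.

    `sequence` is assumed to be half-site + spacer + half-site, with a
    spacer of `spacer_len` bases (0 to 13 in the in vitro consensus).
    """
    if not 0 <= spacer_len <= 13:
        raise ValueError("spacer length must be between 0 and 13")
    n = len(P53_HALF_SITE)
    if len(sequence) != 2 * n + spacer_len:
        raise ValueError("sequence length does not match the spacer length")
    left = sequence[:n]
    right = sequence[n + spacer_len:]
    return (count_mismatches(P53_HALF_SITE, left)
            + count_mismatches(P53_HALF_SITE, right))


# Hypothetical candidate sequences (invented for this example).
perfect = "GGGCATGTCC" + "AT" + "AGACTTGCCT"
imperfect = "GGGCATGTCC" + "TGGCATGTCA"
print(score_p53_re(perfect, spacer_len=2))
print(score_p53_re(imperfect, spacer_len=0))
```

Reporting a mismatch count rather than a yes/no match reflects the observation that nearly all REs contain at least one mismatch, so a strict consensus match would miss most known sites.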
To evaluate the utility of comparative genomics approaches in the identification of potential p53 target REs, we searched the literature for a high-quality set of bona fide p53 REs to estimate the degree of conservation between humans and other mammals. To relate the TP53 system to other master regulators, we compare its binding site conservation to those of the transcription factors encoded by two other genes: NFκB (nuclear factor of kappa light chain gene enhancer in B-cells), central to inflammation responses, and NFE2L2, which encodes NRF2 (nuclear factor [erythroid-derived 2]-like 2), a regulator of oxidative stress. Their repertoire of interactions is expected to be highly preserved throughout the mammalian lineage. The NFκB transcription factor is a heavily studied biological switch of the inflammation, apoptosis, and immune responses [20]. It binds the consensus sequence GGGRNNYYCC [22], and its signaling system is highly conserved even when examined in invertebrates [21]. NRF2 binds antioxidant REs (consensus sequence = TMANNRTGAYNNNGCRWWWW [25]) that are comparable in size to those of p53, show high levels of conservation [26], and are found in the promoters of genes that confer protection from oxidative stress and chemical carcinogens [27]. Mouse models of the Nrf2-dependent response to oxidative and electrophilic insults have been used to study function [28]. Additionally, the Nrf2 pathway in zebrafish operates similarly to that in humans, which underscores the likelihood of high conservation in regulatory binding sites [30]. Because the NFκB and NRF2 binding sites were determined to be highly conserved, these two sets of TFBSs serve as positive controls in estimates of conservation. Our comparative genome analysis, which includes a coincident evaluation of sampled promoter sequences and coding region sequence, reveals that mammalian conservation does not apply to p53 target REs in general. However, among subgroups of target genes we observe purifying selection acting on a number of p53 binding sites, including those of many cell-cycle–related genes, while rodent-to-human homology is lacking for p53 REs in apoptosis-related genes.
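One simple way to express the degree of conservation of a binding site between human and another mammal is percent identity over an aligned site. The sketch below is a hypothetical illustration under that assumption; this section does not specify the measure used in the study, and the function name and example sequences are invented.

```python
def percent_identity(human, other):
    """Percent of aligned columns where the two sequences share a base.

    Both inputs are assumed to be rows of the same alignment, with "-"
    marking gaps. Columns that are gaps in both rows are skipped; a gap
    in only one row counts as a difference.
    """
    if len(human) != len(other):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for h, o in zip(human.upper(), other.upper()):
        if h == "-" and o == "-":
            continue  # column carries no information for this pair
        compared += 1
        if h == o:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned half-sites (invented for this example).
human_site = "GGGCATGTCC"
mouse_site = "GAGCTTGTCC"
print(percent_identity(human_site, mouse_site))
```

Applying the same measure to the NFκB and NRF2 sites would give the positive-control baseline against which p53 REs could be compared.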