Major points: One of the most interesting findings in this study is based on results showing that there are two classes of p53 binding sites in Alus: a class that appears to be derived from methylation and deamination of CpG sites, as previously reported by Vingron and colleagues [36], and a class of sites that appears to have been present in ancestral Alus, as has been proposed for LTR-derived p53 sites by the Haussler group [14]. I am slightly confused about one observation regarding these two mechanisms. The sites derived from methylation and deamination map to the region of the element that is enriched for zero-length spacers, which in turn are over-represented in young Alu elements. How is it that the youngest elements have experienced more CpG methylation and deamination than the older elements, which have had many more millions of years to accumulate such mutations?
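Because much of this exchange turns on spacer length, the sketch below shows, under simplifying assumptions, how a spacer S is read off a predicted site. It assumes the widely used p53 consensus of two decamer half-sites, 5'-RRRCWWGYYY-3', separated by S bp, and accepts exact consensus matches only; it is not the scoring procedure used in the manuscript. The function name `find_p53_sites`, the toy sequences, and the `max_spacer=20` cutoff (mirroring the 0 to 20 bp range of S used in the genome-wide comparison) are illustrative assumptions.

```python
import re

# IUPAC codes appearing in the p53 half-site consensus RRRCWWGYYY
IUPAC = {"R": "[AG]", "Y": "[CT]", "W": "[AT]", "C": "C", "G": "G"}
HALF_SITE = "".join(IUPAC[c] for c in "RRRCWWGYYY")
# Zero-width lookahead so that overlapping half-sites are all reported
HALF_SITE_RE = re.compile(f"(?=({HALF_SITE}))")
HALF_LEN = 10


def find_p53_sites(seq, max_spacer=20):
    """Return (start, spacer) for every pair of exact consensus half-sites.

    Hypothetical helper: exact matches only, no mismatch scoring. The
    consensus is its own reverse complement, so scanning one strand
    finds every exact site on either strand.
    """
    seq = seq.upper()
    starts = [m.start() for m in HALF_SITE_RE.finditer(seq)]
    hits = []
    for i in starts:
        for j in starts:
            spacer = j - (i + HALF_LEN)  # gap between first and second half-site
            if 0 <= spacer <= max_spacer:
                hits.append((i, spacer))
    return hits


# Toy sequences, not Alu sequences. A CpG in the first half-site
# ("GGGCGTGCCC") blocks the match; deamination of the methylated C on the
# opposite strand reads as CG -> CA on this strand and creates the site.
ancestral = "GGGCGTGCCC" + "GGGCATGCCC"
derived = "GGGCATGCCC" + "GGGCATGCCC"
print(find_p53_sites(ancestral))  # []
print(find_p53_sites(derived))    # [(0, 0)]: a zero-length spacer site
```

Real predictors tolerate mismatches and score candidate sites, so counts from an exact-match scan like this would be a lower bound rather than a reproduction of the published numbers.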
Authors' response: I.K. Jordan made an interesting point by linking our observation of the progressive decrease in spacer length (in the p53 BSs) with the existence of two different mechanisms involved in the generation of p53 sites in Alu elements.
We disagree, however, with the reviewer's assessment that "The sites derived from methylation and deamination map to the region of the element that is enriched for zero length spacers" (that is, to the regions around Boxes A/A'). Below, we show that:
(1) not all sites with S = 0 originated through CpG mutations (in the Boxes A/A');
(2) not all CpG mutations resulted in formation of the p53 sites with S = 0.
(1) To compare different Alu subfamilies in terms of their enrichment for various spacer lengths, in addition to the absolute numbers presented in Table , we need to consider 'relative' values (analogous to GC content). For example, consider the ratio of the number of p53 sites with S = 0 to the number of all Alu elements in the subfamily. This ratio is highest for the Sg1 subfamily (58%), more than twice the ratios for the other Alu subfamilies. In other words, AluSg1 sequences have the highest 'content' of p53 sites with S = 0. This does not mean, however, that members of the Sg1 subfamily experienced the highest CpG mutation rate. As follows from Figure S4-H, the Sg1 subfamily is unique in that nearly all of its p53 sites with spacer S = 0 map to Box B, whereas in the other subfamilies the sites with S = 0 map to Boxes A/A' (Figures and ). The consensus AluSg1 sequence (Figure , Box B) suggests that most of the p53 sites in the Sg1 subfamily originated without CpG mutations (Figure ), so AluSg1 elements have to be excluded from this consideration.
(2) Furthermore, CG:CG-to-CA:TG mutations are required not only for creation of the p53 sites with S = 0, but also for the sites with S = 8 and S = 14 (see consensus sequences in Figure , Box B). If we calculate the cumulative 'content' of p53 sites with S = 0, 8 and 14 bp (Table ), it is highest for the AluJo subfamily (35%), followed by the FLAM-C and AluSp subfamilies (30% each). The youngest subfamilies in Table , AluSc and AluY, have very low contents, 8% and 3%, respectively.
Thus, the old AluJo subfamily has the highest fraction of p53 sites that likely originated through CpG mutations.
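The 'content' measure used in points (1) and (2) is a simple ratio: predicted p53 sites with the chosen spacer lengths divided by the number of elements in the subfamily. The sketch below computes it and ranks subfamilies. The function names `site_content` and `rank_subfamilies` are hypothetical, and the toy counts are invented for illustration; they are not values from the manuscript's tables.

```python
def site_content(site_counts, n_elements, spacers):
    """Pooled p53-site 'content' of one subfamily.

    site_counts: dict mapping spacer length S (bp) -> number of predicted sites
    n_elements:  number of Alu elements in the subfamily
    spacers:     spacer lengths to pool, e.g. (0,) or (0, 8, 14)
    """
    if n_elements <= 0:
        raise ValueError("subfamily must contain at least one element")
    pooled = sum(site_counts.get(s, 0) for s in spacers)
    return pooled / n_elements


def rank_subfamilies(table, spacers):
    """Sort subfamilies by pooled content, highest first.

    table: dict mapping subfamily name -> (site_counts, n_elements)
    """
    scores = {name: site_content(counts, n, spacers)
              for name, (counts, n) in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy numbers only, chosen for illustration
toy = {
    "toy_1": ({0: 10, 8: 20, 14: 5}, 100),
    "toy_2": ({0: 30, 3: 50}, 100),
}
print(rank_subfamilies(toy, (0,)))        # toy_2 first (0.30 vs 0.10)
print(rank_subfamilies(toy, (0, 8, 14)))  # toy_1 first (0.35 vs 0.30)
```

As the toy case shows, the ranking can flip depending on which spacer lengths are pooled, which is why pooling S = 0, 8 and 14 can lead to a different conclusion than S = 0 alone.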
The authors touch on a very interesting issue towards the end of the discussion, where they mention that p53 binding of Alus effectively silences their transcription by disrupting assembly of the pol III machinery. However, this discussion is curiously devoid of biological context with respect to the effects of Alu transcription on cellular function. I would urge the authors to consider speculating further on the biological significance of this idea. The dysregulation of TEs has previously been associated with a number of disorders, including cancer. However, the classic models for the role of p53 in tumorigenesis are related to transcriptional activation and/or repression of host genes, and indeed much of the focus of this manuscript is on the potential effects of Alu-carried p53 binding sites on host genes. But the work reported here suggests the possibility that defects in p53 function could lead to cancer via dysregulation (specifically upregulation) of the Alu elements themselves.
Authors' response: Thank you for the suggestion; we have included more biologically oriented speculation in the last section of the Discussion.
The authors propose a selective model whereby young Alus bearing p53 binding sites transpose near the promoter regions of host genes, where these sites disrupt the expression of the element, thereby mitigating some of its potential deleterious effects, and can also affect the regulation of the host gene. It is difficult to reconcile this model with the observation that younger Alus are enriched for the short-spacer binding sites in the A-box region that appear to have evolved via CpG methylation and deamination. Thus, many of these insertions would not initially carry that p53 binding site, and so selection could not act at that point. This model is also inconsistent with the observation that young Alu elements are depleted near genes relative to older Alu elements.
Authors' response: In our opinion, it is not only young Alus that bear p53 binding sites that could potentially disrupt the expression of the Alu element (through the interaction between the p53 N-terminus and the pol III transcriptional machinery); old Alu elements may play a similar role. As shown in Figure , all Alu subfamilies (from FLAM to AluY) share a highly conserved p53 site in Box A. All Alus except AluSc and AluY bear a site in Box B with spacer S = 3, 8 or 14 bp. It is conceivable that p53 could bind these sites, disrupting the expression of Alu elements and, at the same time, playing a regulatory role in the transcription of nearby host genes.
With respect to the analyses reported here, there are a few rather qualitative conclusions drawn from data that could be quantitatively and statistically analyzed in such a way as to provide more definitive results. The trends that the authors point to do seem to be there, but a more quantitative analysis could provide additional support for their conclusions. I provide a few suggestions to this end below.
First, the method for comparing the p53 binding site spacer length distributions for repetitive and nonrepetitive DNA seems indirect. It appears that the authors compared 1) the entire genome, including repeats and non-repeats (filled circles in Figure ), with 2) only the non-repeat part of the genome (open circles in Figure ). In other words, the non-repetitive fraction analyzed in part 2 is a subset of the entire genome analyzed in part 1. Why not compare the spacer distributions for the repeat and non-repeat parts of the genome directly?
Authors' response: In Figure , we compared the whole genome with the non-repeat part of the genome. The latter is characterized by a nearly constant frequency of occurrence of the spacer S (where S varies from 0 to 20 bp). This means that the peaks in the spacer length distribution arise almost entirely from repeats. Approximate values for the repeat part of the genome can be easily obtained by subtracting 60,000 from the values presented in Figure for the whole genome. (They are approximate because the few p53 BSs that occur at the borders of repeat elements would be eliminated from consideration.) In addition, the spacer length distribution for Alu repeats is given in Figure .
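The subtraction described in this response can be written out explicitly. The sketch below approximates per-spacer counts attributable to repeats either by subtracting the measured non-repeat counts spacer by spacer or, as in the response, by subtracting a constant background of about 60,000; negative differences are clipped to zero. The function name and the toy counts are hypothetical, and, as the authors note, sites straddling repeat borders are not accounted for by this approach.

```python
def approx_repeat_counts(whole_genome, non_repeat=None, baseline=60_000):
    """Approximate per-spacer p53-site counts attributable to repeats.

    whole_genome: dict mapping spacer S (bp) -> site count, whole genome
    non_repeat:   optional dict mapping S -> site count, non-repeat fraction;
                  when omitted, the constant `baseline` is subtracted instead
    """
    repeat_part = {}
    for s, total in whole_genome.items():
        background = baseline if non_repeat is None else non_repeat.get(s, 0)
        # Clip at zero: a count cannot be negative, whatever the noise
        repeat_part[s] = max(total - background, 0)
    return repeat_part


# Toy counts, for illustration only
whole = {0: 90_000, 1: 61_000, 2: 59_000}
print(approx_repeat_counts(whole))
# {0: 30000, 1: 1000, 2: 0}
print(approx_repeat_counts(whole, {0: 58_000, 1: 60_500, 2: 59_500}))
# {0: 32000, 1: 500, 2: 0}
```

The reviewer's alternative, tallying sites separately in repeat-masked and repeat-only sequence, would avoid the constant-background assumption altogether.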
Second, the relationship between the age of Alu elements and the length of the spacer, which has important functional implications since short spacers tend to bind p53 with higher affinity, is quite interesting. The trend the authors highlight in Table does seem somewhat apparent, but this could benefit from a more definitive quantitative analysis. In particular, some of the data fit the trend of decreasing spacer length with element age, but others, such as the youngest family AluY, do not. The relative age of the families could be correlated with the average spacer length for each class, or perhaps simply the length of the most prevalent spacer, to more quantitatively evaluate the trend.
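One way to carry out the quantitative check suggested here is a Spearman rank correlation between the relative age order of the subfamilies and a per-subfamily spacer statistic (mean or modal spacer length). Because it uses ranks only, it needs an age ordering rather than absolute ages. The sketch below is a plain-Python implementation with average ranks for ties; the function names and the example inputs are hypothetical, and any age ordering supplied to it is an assumption (the response that follows explains why published orderings disagree).

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when all values are tied")
    return cov / (sx * sy)


# Hypothetical inputs: age_rank 1 = oldest; the spacer values are made up.
# A negative rho would mean shorter spacers in younger subfamilies.
age_rank = [1, 2, 3, 4, 5]
mean_spacer = [9.0, 7.5, 6.0, 6.0, 3.0]
print(spearman_rho(age_rank, mean_spacer))
```

Where published orderings disagree for a pair of subfamilies (for example, Sx versus Sq), rho could be computed under each ordering to see whether the conclusion depends on it. With few subfamilies, the power of such a test is limited.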
Authors' response: Unfortunately, the estimates of the average age of Alu elements are inconsistent in two respects. First, there is uncertainty regarding the order of the Alu subfamilies. For example, Kapitonov & Jurka [72] proposed that the subfamily Sx is younger than Sq, whereas several other studies, including one from the Pevzner group [73], suggested that Sx is older than Sq. Second, the ages of Alu subfamilies estimated by various groups differ substantially. For instance, Kapitonov & Jurka estimated the age of AluJo to be ~80 Myr, while the corresponding estimate by Pevzner and colleagues is ~60 Myr.
Given the noticeable discrepancy and uncertainty in the age estimates of Alu subfamilies in the literature, we preferred to present the dependence as shown in Table ; at the qualitative level, the trend is clear. A quantitative evaluation of the correlation between the average spacer length and the age of the Alu subfamilies is hardly possible, at least for now.
Minor points: There are a few statements that are not directly supported by the data or the literature cited.
On page 4, Haussler and co-workers [14] are cited as substantiating previous results that TEs (LTRs) contain p53 binding sites. The references cited as providing the original observations that are substantiated by Wang et al. [14] are both abstracts from conference proceedings; thus, it is not clear what they report. Is it not the case that the Haussler paper was the first to show, based on experimental evidence rather than binding site predictions alone, that TE sequences bind p53 genome-wide? It would help to clarify this.
Authors' response: Yes, it is correct that Wang et al. [14] were the first to show that numerous p53 binding sites detected in the p53-ChIP experiments [7] are embedded in TE sequences. We have revised the Background to make this clear. On the other hand, in two short abstracts published earlier [12,13], we showed that "simply" predicted p53 binding sites reside in TEs genome-wide. Unfortunately, we were unable to publish a detailed description of our results in 2003, because at that time the idea of thousands (let alone millions) of p53 sites residing in 'junk DNA' was absolutely unacceptable in the p53 community.
On page 5, the authors mention that they find it 'remarkable' that most of the predicted Alu-derived p53 binding sites are clustered in the same regions of the elements as seen for the well characterized response elements. I don't understand why this is remarkable in light of the fact that one would expect p53 to bind Alu elements in the regions that contain consensus binding site motifs.
Authors' response: In our opinion, this is "remarkable" because numerous predicted p53 BSs behave similarly to the few experimentally validated p53 REs residing in Alu repeats. This point is discussed in detail in the Conclusions (third paragraph). However, to address the Reviewer's objection, we replaced "remarkably" with "importantly."
On page 6, the authors state that "promoter regions are enriched with Alu elements" to underscore the potential of Alu-derived p53 binding sites to influence host gene regulation, and cite Polak and Domany [40] in support of this statement. In fact, this manuscript and several others show that Alus, along with other TEs, are actually substantially depleted in proximal promoter regions adjacent to transcription start sites (TSS) and then steadily increase in frequency moving away from the TSS. Several kb distal from the TSS, Alus are indeed found at slightly higher frequencies than in the genome as a whole, and than in intergenic regions in particular, but this can be attributed simply to the fact that Alus are enriched in and around genes.
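The positional pattern described here (depletion next to the TSS, rising frequency with distance) is the kind of profile obtained by binning element positions by distance from each TSS. The sketch below is a minimal, strand-aware version that counts Alu start coordinates in fixed-width bins upstream of each TSS. The function name, the input layout (TSS as (chromosome, position, strand) tuples, Alu starts listed per chromosome), the toy coordinates, and the default 10 kb window with 1 kb bins are all assumptions for illustration.

```python
from collections import Counter


def upstream_profile(tss_list, alu_starts, window=10_000, bin_size=1_000):
    """Count Alu start positions per distance bin upstream of each TSS.

    tss_list:   iterable of (chrom, tss_position, strand) with strand "+" or "-"
    alu_starts: dict mapping chrom -> list of Alu start coordinates
    Returns a list of counts, bin 0 being the bin closest to the TSS.
    """
    counts = Counter()
    for chrom, tss, strand in tss_list:
        for pos in alu_starts.get(chrom, []):
            # Upstream means lower coordinates on "+" genes, higher on "-" genes
            d = tss - pos if strand == "+" else pos - tss
            if 0 <= d < window:
                counts[d // bin_size] += 1
    return [counts[b] for b in range(window // bin_size)]


# Toy coordinates, for illustration only
tss = [("chr1", 50_000, "+"), ("chr1", 80_000, "-")]
alus = {"chr1": [49_500, 45_200, 81_500, 95_000]}
print(upstream_profile(tss, alus))  # [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
```

Dividing each bin by the number of TSSs and the bin width would turn these counts into densities comparable with a genome-wide average. The nested loop is quadratic, so a genome-scale run would instead sort the positions and use binary search.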
Authors' response: Thank you for the correction. Indeed, it is the upstream regions of the TSS (several kilobases in length) that are enriched with Alu elements, not the promoters themselves as we had written. We changed the end of the Background accordingly.