Reviewer #1, Dr. Guillaume Bourque, McGill University, nominated by Dr. Jerzy Jurka, had the following comments:
This is an interesting paper that reports an over-representation of conserved TF binding motifs embedded in microRNA precursor sequences. Although this observation is not totally novel (see comment #1 below), the analysis is more comprehensive and the simulations designed to test the significance of this observation are non-trivial. One weakness of the paper in its current form is that it uses too many tables (there are 9) when I think that a few figures (there is currently only 1) would drive some of the points much better (see comment #2).
Comments
#1 I didn't see a reference to the paper "Genomic analysis of human microRNA transcripts", Saini et al. PNAS 2007 which should be cited. The Figure of that paper in particular is very similar to the main result of the current paper. You should explain how your work differs and expands on what was done previously.
Response: If you look closely at Figure of the Saini et al paper, you will see that they characterized the regions UPSTREAM (+) and DOWNSTREAM (-) of the pre-miR sequence but they did NOT examine the pre-miR sequence itself! Nowhere in that paper do they demonstrate or even suggest the possibility that TF binding sites may reside within the pre-miR. However, we will add Saini et al to our reference list as providing prior supporting evidence for our own data showing that the regions immediately flanking the pre-miR are also enriched in TF binding sites (albeit to a lesser extent compared to within the pre-miR itself).
#2 There are many tables some with too little information (e.g. Table , Table 8), some with information that would be best represented by a figure (e.g. Table 7) and some with too much information that's not directly relevant to the main point (e.g. Table 9). I believe that many of these tables could be replaced by a few multi-panel figures (e.g. Table , ) that would greatly enhance the readability of the paper.
Response: We have now represented several of the tables by figures. Notably, we simplified the presentation of Table and converted it to a figure (Figure ) to make it more readable. We also reorganized and simplified some of the text throughout the paper to increase the readability.
#3 One of the first questions I had when I read the first section of the result section (e.g. on page 5) was whether the observation made for precursor sequences was restricted to the actual precursor sequences or extended to the flanking regions.
Could you show this directly in Table (now Figure ) or, even better, in a figure? I know that you talk about these things later in a different section on the properties of pre-miRNAs with motifs (page 7, par 2) but to me this goes earlier when you're trying to establish the association. Also, instead of Additional file 2, I think that a figure that shows where the motifs are relative to the precursor sequences and that the enrichment doesn't extend beyond those sequences would probably help significantly.
Response: These comments seem to imply that we are claiming that the TF binding sites are restricted to pre-miR sequences and NOT also enriched in flanking regions. However, as stated above, the enrichment does cover both the pre-miR and to a lesser extent, the flanking regions as well.
#4 Also about Table (now Figure ) and the enrichment, could you also include another control such as gene promoter sequences so that we can see the strength of the enrichment relative to a positive control?
Response: We appreciate the sentiment behind this request, but there are two problems with doing so. First, promoter sequences were used in the construction of the statistical model that defined motif matching and significance, so there is some circularity in using similar sequences for statistical testing. Second, the outcome of such a test is irrelevant to the point of our paper: it does not matter whether the density of TF binding sites within pre-miRs is equal to, greater than, or less than the density within promoters. The fact that they are there AT ALL (much less in the majority of conserved pre-miRs) is surprising, unexpected and deserves to be acknowledged.
#5 Page 6, paragraph 2: Isn't this observation circular? You've looked for pre-miRNA sequences with conserved TFBS and you now observed that they are more conserved on a sequence-level... Wouldn't you have to look for any TFBS (whether conserved or not) and try to make that case?
Response: To some extent, what you are saying is true. However, the pre-miR sequences of highly conserved mature miRNAs do show significant drift in certain regions (e.g. the loop region). Since we showed that the TFBS sites are generally NOT co-located exactly with the mature miRNA sequence (Table 7, now Figure ), there is no reason to assume that the set of conserved pre-miRs [defined by overall similarity across rat, mouse and human] should show the detailed conservation of exact TFBS motifs that it does, nor that it should extend to other vertebrate classes. More importantly, we show in a separate analysis that TFBS are highly enriched in pre-miRs even when the analysis includes all non-conserved sites and non-conserved pre-miRs. This analysis also shows that the prevalence for TFBS is greater in conserved pre-miRs than in primate-specific pre-miRs.
#6 Page 7, paragraph 1: Are the cancer pathways enriched for these miRNAs? If not this is not really a critical observation.
Response: Correct. The point is not that they are enriched in cancer miRs, but that they affect many of the most-studied miRs and pathways that investigators care about.
#7 Page 12, par 1 and Page 21, Table : "TFBS with experimental support", what do you mean here by experimental support? Do you mean that the motifs are experimentally supported? What is the source of the other ones? That wasn't clear to me. Also in that table, what are the two numbers in each cell? Average and St Dev?
This needs to be explained in the table caption. Do you mean 715 sets of 1000 sequences or 1000 sets of 715 sequences (since that's the number of human pre-miRNAs that you use)?
Response: We have simplified Table , changed it to a figure (Figure ), and rewritten the legend so that it is now clear. We removed the separate data for "with experimental support" as not being essential.
#8 Page 22, Table (now Figure ): The enrichment is more subtle based on this test (not even 2 fold). Can you comment on this discrepancy in the discussion?
Response: There is no discrepancy here. In this case, we are examining all pre-miR sequences fully, rather than only conserved regions, so both the true hits and the baseline "noise" level of hits are higher than when only conserved hits were considered. For example, on the top line of Table (now Figure ), the average number of TFBS hits in the randomized set is 4016 with a SD of 97. Stated another way, the null distribution of hits expected by chance has a mean of 4016 and SD of 97. What we actually observed in human pre-miRs is an average of 4721 hits. 4721-4016 = 705, which means the observed value is 7.268 SD away from the mean of the null distribution. This is extremely unlikely to have occurred by chance. What is important is the difference between pre-miRs and randomized pre-miR sequences, in terms of Standard Deviations - not the fold difference in hits.
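The arithmetic in the response above can be reproduced with a minimal sketch. The three input values (4721 observed hits, null mean 4016, null SD 97) are taken from the response itself; the function names are hypothetical, and the tail probability assumes the null distribution of hits is approximately normal, which the response does not state and which is used here only for illustration.

```python
import math


def z_score(observed, null_mean, null_sd):
    """Number of standard deviations separating the observed value from the null mean."""
    return (observed - null_mean) / null_sd


def upper_tail_p(z):
    """One-sided tail probability under a standard normal (an assumption, see lead-in)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values quoted in the response for the top line of the table.
observed_hits = 4721   # TFBS hits observed in human pre-miRs
null_mean = 4016       # mean hits in the randomized sets
null_sd = 97           # SD of hits in the randomized sets

z = z_score(observed_hits, null_mean, null_sd)
p = upper_tail_p(z)

print(f"fold difference = {observed_hits / null_mean:.2f}")
print(f"z = {z:.3f} SD above the null mean")
print(f"one-sided normal tail probability = {p:.2e}")
```

The fold difference is modest because the baseline count is large, while the z-score is large because the spread of the randomized sets is small relative to the excess of 705 hits.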
Small comments
Page 3, par 2, line 1: "track is visible" -> "track is available"
Page 3, par 2, line 3: "398 transcription factor binding sites", this is a bit confusing to me. Do you mean 398 transcription factor binding motifs? The term "binding site" is used to describe a specific instance of a binding motif.
Response: Done.
Page 10, par 2, line 11: "Importantly, since this paper was originally submitted for publication, Zhu et al have reported" -> "Consistent with our findings, Zhu et al. have recently reported"
Response: This erroneously implies that their observations predated ours.
Reviewer #2, Dr. Dmitri Pervouchine, Moscow State University, nominated by Dr. Mikhail Gelfand, had the following comments:
In order to check whether the reported association is indeed present, I sampled 20 human microRNAs and looked them up by eye in the Genome Browser. Of these, 16 cases were not associated with TRANSFAC-predicted binding sites.
Response: Is the reviewer saying that out of 20 human pre-miRs which we claimed to have TFBS, 16 were not supported by eye in the Genome Browser? That would indicate a serious problem with our ms. and we would appreciate clarification of this point. However, it seems that he merely chose 20 in an unsystematic manner. Many human miRs are primate specific and will not show TFBS in the Genome Browser.
hsa-mir-17 belonged to a polycistronic cluster (also containing hsa-mir-18a, hsa-mir-19a, and hsa-mir-20a) residing in a large genomic region highly enriched with TF binding sites; let-7a and let-7f, also likely to be transcriptionally coupled, were also enriched with TFBSs; and mir-7-1 was also found in a large genomic region with a high density of TFBSs. In this regard one should ask whether or not miRNAs tend to occur in genomic loci with higher-than-average TFBS density (this is different from the statement made in the paper).
Response: As discussed above with regard to the comments of reviewer 1, TFBS motifs are indeed enriched in regions flanking pre-miRs [that was previously known] as well as within pre-miRs [our novel observation].
The authors should make a statistical control by using genomic regions with high overall TFBS density to address the possible confounding effect.
Response: We did that. The negative control dataset consists of sequences "most similar" to pre-miRs in conservation and dinucleotide sequence composition (results shown in Figure ).
Another statistical control comparing to hairpins that are similar to microRNAs would be necessary to address whether or not the RNA structure is responsible for the seeming relationship.
Response: We agree that it is likely that the association of TFBS motifs is related somehow to the hairpin structure of pre-miRs. However, were that to be true [and to hold for some other miR-like hairpins in the genome], it would only make our data more interesting and provide more biological context (e.g., it might tie in with the observation that some transcription factors bind double-stranded sequences). It would not imply that our observations are some type of artifact. One might think of snoRNAs as a putative negative set, but we now know that many snoRNAs actually give rise to miRNA-like small RNAs which may be functionally related to miRNAs. Thus, it is not clear whether snoRNAs should be appropriately viewed as NEGATIVE control sequences, or potentially as additional POSITIVE examples! In short, we do not know of any dataset of "hairpins similar to microRNAs" that should definitely be negative and that can be used unambiguously for such a test.
Also, another control would be necessary to address to what extent the observed association is influenced by the cluster organization of miRNAs.
Response: We did that. As shown in Table (now Table ), the phenomenon affects clustered and unclustered miRs equally.
Accordingly, the manuscript "Transcription factor binding sites are highly enriched within microRNA precursor sequences" in its current form is not recommended for publication.
Response: The most important point of our paper is that the MAJORITY of conserved human pre-miRs contain one or more transcription factor binding sites, as defined by the same algorithms and stringent statistical criteria that are used for TFBS within promoters. In our view, this is likely to have BIOLOGICAL significance regardless of the level of statistical significance. The fact that the statistical significance is also extremely high is a bonus. Had we reported the presence of TFBS just upstream of pre-miRs (as Saini et al did), no one would have questioned our observation in the slightest. We believe that reviewers have had such strong objections to our paper only because current knowledge does not provide an obvious expectation that TFBS should be present within pre-miRs. Yet, we feel that one of the major reasons for carrying out bioinformatics analyses is to make surprising observations that can stimulate further mechanistic investigations. The recent Zhu et al paper already lends further independent bioinformatics support to our observations, and we pointed out that the experimental literature offers two tentative biological explanations, namely, that pre-miRs contain promoter elements, and/or that transcription factors bind pri-miRs and pre-miRs directly. Thus, we believe that publication at this point is justified.
Reviewer #3, Dr. Yuriy Gusev, Georgetown University Medical Center, provided no comments for publication.