Nicolas Galtier, CNRS-Université Montpellier II, France
This article revisits the literature about genomic GC-content distribution across bacteria in the light of variations in the structure of the catalytic subunit of DNA polymerase III. Three classes of the dimeric subunit of DNA pol III have been described in bacteria, each influencing the genomic GC-content in a specific way.
This paper confirms/demonstrates that DNA pol III is a major determinant of between-species GC-content variations in bacteria, and pinpoints a couple of previous studies in which inappropriate conclusions were reached by not accounting for this effect.
In my opinion, this manuscript contains two important results, which revive and illuminate long-lasting controversies. The first one is about the relationship between GC-content and aerobiosis. We have known for ten years or so that aerobic bacteria show a higher GC-content than anaerobic ones, on average, and this is paradoxical given that C->T and G->A are generally the most common mutations in oxidative context. This study demonstrates that the relationship is largely, or entirely, explained by the differential usage of DNA pol III subunit between aerobes and anaerobes: aerobes tend to carry the GC-enriching polymerase, and anaerobes the AT-enriching one. The second strong result, in my opinion, is about the relationship between genomic GC-content and optimal growth temperature (OGT), two variables that were found unrelated across prokaryotes [60]. Here it is shown that, within each of the three categories of DNA pol III, GC% and OGT do correlate positively. The reason why this relationship did not come out in all-species analyses is that thermophiles most frequently use the AT-enriching polymerase, and mesophiles or psychrophiles the GC-enriching one. It seems to me that these two results, if confirmed, should have a strong impact on bacterial comparative and environmental genomics, in which GC-content variations are obvious, and so far poorly understood.
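The masking effect described above can be illustrated with a minimal Python sketch. All numbers are synthetic and the three group labels are placeholders, not data or class names from the manuscript. The sketch only shows how a positive GC%/OGT correlation within each polymerase class can disappear, or even reverse, when the classes are pooled and the AT-enriching class is concentrated among thermophiles.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Synthetic values (assumption): OGT in degrees C, then GC%.
# Group names are hypothetical placeholders for the three DNA pol III classes.
groups = {
    "AT-enriching": ([55, 65, 75, 85], [30, 34, 35, 39]),
    "intermediate": ([30, 40, 50, 60], [45, 49, 50, 54]),
    "GC-enriching": ([10, 20, 30, 40], [60, 64, 65, 69]),
}

# Correlation inside each class.
within = {name: pearson(ogt, gc) for name, (ogt, gc) in groups.items()}

# Correlation when all species are pooled, ignoring the class.
pooled_ogt = [x for ogt, _ in groups.values() for x in ogt]
pooled_gc = [y for _, gc in groups.values() for y in gc]
pooled = pearson(pooled_ogt, pooled_gc)

for name, r in within.items():
    print(f"{name}: r = {r:.2f}")
print(f"pooled: r = {pooled:.2f}")
```

In this toy setting every within-class correlation is strongly positive, while the pooled correlation is negative, because class membership is confounded with temperature.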
That said, I have a number of comments/concerns about the form of the paper, the underlying statistics, and its potential implications, which I hope might help improve the manuscript.
- I would suggest introducing the current work as an attempt to account for a confounding factor so far overlooked. Currently the manuscript focuses on the importance of replication genes in GC-content variations, but this very result was previously published (by the same authors), and this study does not add so much to that argument.
Authors' response: We appreciate the reviewer's encouragement and suggestions. We have restructured our manuscript to emphasize the correlations between relevant confounding factors and GC content variation. In this study, we found several lines of solid evidence, which confirmed our previous conclusions, based on large-scale comparative genome analyses.
- Rather, I would suggest developing the two results I outline above: specifically review the relevant bibliography; show the GC%/OGT relationship within DNA pol groups, and globally (similarly to figure ); perform two-way ANOVA of GC% on DNA pol category and OGT (on one hand), and on DNA pol category and aerobiosis (on the other hand), and discuss the percentage of variance of GC% explained by these variables; conclude about misinterpretations in existing literature.
Authors' response: We agree. We re-analyzed the relationship between GC% and OGT (see additional file 1) and have added a new reference referring to a relevant result from one of our early studies. We also performed the corresponding two-way ANOVA analyses and incorporated the results into the revised manuscript.
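To make the notion of "percentage of variance of GC% explained" concrete, here is a minimal sketch of the sum-of-squares partition behind a two-way ANOVA. It assumes a balanced design (the same number of genomes in every cell), uses synthetic GC% values, and uses placeholder labels for the polymerase classes and temperature categories. It is not the analysis performed by the authors. Real genome collections are unbalanced and would need a dedicated statistics package.

```python
from itertools import product


def mean(values):
    return sum(values) / len(values)


def two_way_partition(cells):
    """Partition total variance for a balanced two-factor design.

    `cells` maps (level_a, level_b) to a list of observations; every cell
    must hold the same number of observations (balanced design).
    Returns sums of squares for A, B, the A x B interaction, error, total.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    assert all(len(cells[(a, b)]) == n for a, b in product(a_levels, b_levels))

    grand = mean([x for obs in cells.values() for x in obs])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {key: mean(obs) for key, obs in cells.items()}

    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AxB": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a, b in product(a_levels, b_levels)
        ),
        "error": sum((x - cell_mean[k]) ** 2 for k, obs in cells.items() for x in obs),
    }
    ss["total"] = sum((x - grand) ** 2 for obs in cells.values() for x in obs)
    return ss


# Synthetic GC% values (assumption), two genomes per cell.
# Factor A: hypothetical DNA pol III class; factor B: temperature category.
gc_cells = {
    ("AT-enriching", "mesophile"): [32.0, 34.0],
    ("AT-enriching", "thermophile"): [36.0, 38.0],
    ("intermediate", "mesophile"): [45.0, 49.0],
    ("intermediate", "thermophile"): [50.0, 52.0],
    ("GC-enriching", "mesophile"): [62.0, 64.0],
    ("GC-enriching", "thermophile"): [66.0, 68.0],
}

ss = two_way_partition(gc_cells)
share = {name: 100 * value / ss["total"] for name, value in ss.items() if name != "total"}
for name, pct in share.items():
    print(f"{name}: {pct:.1f}% of total variance")
```

In a balanced design the four components add up exactly to the total sum of squares, so each share can be read directly as a percentage of the variance in GC%.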
- By comparison, it seems to me that the analyses of ecological and metabolic features and of genomic gene content (figure , , , ) add less to existing bibliography. I would suggest shortening these sections, and especially the section about gene number, in which separating species by DNA pol III classes does not appear to change much of the prevailing hypotheses.
Authors' response: After analyzing the contribution of OGT and oxygen requirement to GC content variation, based on our dnaE-based group framework, we think that it is necessary for us to perform analysis on the contribution of other related factors, such as several ecological and metabolic features, to provide evidence for the universality of the dnaE-based grouping scheme. For example, plant- and terrestrial-associated bacteria that are reported to have higher GC content are mostly grouped in the dnaE1|dnaE2 group. Therefore, we think that some of the previously described relationships between GC content and environmental factors may also fall into our scheme, but have not been realized. Indeed, from Tables and , we observe that there are still not enough data for a meaningful statistical analysis. We hope that we can draw a more significant conclusion in the near future, when more bacterial genome sequences become available. As to the analysis performed on gene number, our major conclusion is that the dnaE2 group bacteria that have a higher GC content tend to have larger genomes, in contrast to the opposite situation in the dnaE3 group bacteria. Therefore, we believe that the positive correlation between genome size (or gene number) and GC content is much more pronounced when analyzed under our dnaE-based grouping scheme.
- The manuscript does not explicitly address the problem of phylogenetic independence of the observations. The authors might think of using the Independent Contrast method, or any related method, to check further the significance of the relationships they uncover. At any rate, the authors must give an idea of the phylogenetic distribution of the three classes of DNA pol III: are they scattered throughout the bacterial tree, or clustered by phyla/families? This is partly answered by figure , in which within-genus variations of DNA pol III class are reported, somewhat suggesting that the phylogenetic inertia on this trait is weak. Confirmation welcome.
Authors' response: We fully agree with the reviewer and it would be compelling to analyze the phylogenetic independence of these observations. However, it is not straightforward to illustrate these points in the current manuscript and we believe that it is beyond the scope of this manuscript. We have prepared another manuscript on the evolutionary scenarios of these four different polymerases, as well as analysis of their relationship in a context of both bacterial taxonomy and sequence evolution.
- Figure , figure and many sentences in the manuscript make convincing cases suggesting that changes in DNA pol III affect bacterial GC-content evolution. However, I wonder how representative are these examples: were they specifically selected to illustrate the main pattern reported in this study, or are they more or less random instances? Figure : why choosing just ten thermophilic species, and why these ten?
Authors' response: We thank the reviewer for his constructive comments. We wanted to explain the ambiguous relationship between OGT and GC content based on real data. The reasons we chose these 10 bacteria are as follows. First, we needed to select bacteria that have precise OGT information. Second, to exclude the interference of phylogenetic distance with GC content, we needed to select several bacteria that have close phylogenetic relationships in each phylum. Third, the bacteria had to fall evenly into the three different dnaE-based groups. Fourth, both their GC content and OGT had to vary significantly.
Figure : are Shewanella and Mycobacterium the only genera showing variations in DNA pol III? If not, could you please provide a more global picture, and mention counter-examples if there are some? I have a similar concern about the discussion, in which the focus is presumably put on examples fitting the general theory, not counter-examples.
Authors' response: We analyzed the data currently available in the public domain and have not found a single example that contradicts our grouping scheme and predictions concerning the trend of GC content variation in relationship with other extrinsic factors. Our large-scale comparative screening demonstrated that most closely related bacteria tend to have the same isoforms of dnaE polymerases. We also identified two examples, namely, Shewanella and Mycobacterium, where the rules are not followed but the explanation is apparent.
- Along the same lines, the removal of "outliers" (figure ) does not appear justified to me, even though I agree that horizontal gene transfer presumably perturbs the observed relationship, which is good to mention.
Authors' response: Agreed. We further revised the corresponding description by performing linear regression analysis and removing the "outliers" by more robust upper and lower 90% prediction limits.
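One way such prediction limits can be computed is sketched below. This is only an assumption about the general procedure, not the authors' actual code. The x and y values are synthetic, and the normal quantile is used as a large-sample stand-in for the Student t quantile, which makes the limits slightly too narrow for small samples.

```python
from math import sqrt
from statistics import NormalDist


def prediction_outliers(xs, ys, level=0.90):
    """Fit y = intercept + slope * x by least squares and return the indices
    of points lying outside the two-sided prediction limits at `level`.

    Assumption: a normal quantile replaces the Student t quantile.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom.
    s = sqrt(sum(r * r for r in residuals) / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)

    flagged = []
    for i, (x, r) in enumerate(zip(xs, residuals)):
        # Half-width of the prediction interval for a new observation at x.
        half_width = z * s * sqrt(1 + 1 / n + (x - mx) ** 2 / sxx)
        if abs(r) > half_width:
            flagged.append(i)
    return slope, intercept, flagged


# Synthetic data (assumption): a linear trend with one aberrant point at index 5.
x_vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_vals = [30, 32, 34, 36, 38, 60, 42, 44, 46, 48]

slope, intercept, flagged = prediction_outliers(x_vals, y_vals)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, flagged = {flagged}")
```

Note that the limits here are fitted with the suspect point included, which inflates the residual error. Whether points flagged this way should be excluded remains the judgment call raised in the comment above.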
- It seems to me that the surprising report by Foerstner et al. [26] of very different GC-content distributions between distinct environmental samples (despite comparable representation of the bacterial phyla) could reflect a differential usage of the three DNA pol III across environments. This could perhaps be checked by identifying DNA pol III sequences in the corresponding metagenomic data.
Authors' response: You are right. We also think that DNA polymerase III may be an excellent group of genes for phylogeny and related evolutionary analysis. We are currently working on several metagenomic data and will apply this idea and report the results as soon as we have concrete conclusions.
- Having demonstrated that the DNA pol III subunit plays a major role in GC% variations, it is tempting to ask what determines variations in DNA pol III usage across groups of bacteria. For instance: do aerobic bacteria most frequently use the GC-enriching DNA pol III because it is GC-enriching, or because it is more efficient in aerobic conditions, and incidentally GC-enriching?
Authors' response: The reviewer poses a very interesting and challenging question here. We believe that the four dnaE isoforms diverged at a very early stage of eubacterial evolution and drove the bacteria towards not only different GC contents, but also different evolutionary routes or landscapes, either randomly or under environmental pressures. Over time, bacteria that possess different dnaE isoforms have favored different environments, leading to the current diversity.
- The manuscript would strongly benefit from English corrections.
- Abstract (and introduction, last paragraph):
"The contribution of other environmental or bacteriological factors, such as genome size, temperature, oxygen requirements, and habitats, either indirectly rely on the choice of mutator genes or take the advantage of their fine-tuning effect on the trends determined by other factors." This sentence is unclear to me and probably deserves rephrasing.
Authors' response: We have rephrased this paragraph.
- The Background section introduces codon usage biases and transcription-coupled mutation/repair, but these two aspects are not addressed in this study. The potential role of OGT, aerobiosis, metabolism and environment are not, or very briefly, introduced.
Authors' response: Our previous study confirmed that codon usage biases are driven by GC content changes, but not vice versa, as suggested by Knight et al. Therefore, we did not pay too much attention to this point here. The contribution of transcription-coupled repair was discussed in Gramineae, but we are still uncertain how to analyze this in bacteria. For the convenience of the discussion, we summarized 10 other different hypotheses that have been put forward as potential mechanisms for generating GC content variation (Table ), and we will write a more comprehensive review when the conclusions become clearer.
- Table and figure : I suggest grouping "microaerophilic" with "anaerobic" (or "microaerophilic" with "facultative" if you think it is more appropriate). This is because percentages are meaningless in small groups of species, and percentages are very important in this table.
- Table and figure : I suggest grouping psychrophile with psychrotrophic bacteria, and thermophiles with hyperthermophiles (same reason).
Authors' response: Agreed. We have revised this in related tables and figures.
- Figure and : keep the same order as in table and table , respectively, for categories.
Authors' response: We have made the revisions.
Adam Eyre-Walker, Centre for the Study of Evolution and School of Life Sciences, University of Sussex, Brighton, United Kingdom.
The current paper follows up work the authors have done on the relationship between genomic GC and the presence of various DNA polymerase alpha subunits in eubacterial genomes. They confirm, as in their previous work [28], that species which use a combination of dnaE3 and polC subunits tend to have lower genomic GC contents than those which use dnaE1 subunits, which have much lower genomic GC contents than those which use a combination of dnaE1 and dnaE2. They argue therefore that mutation biases introduced by the alpha polymerase are a major determinant of genomic GC content in bacteria.
Unfortunately, this conclusion is not justified given that there is a high level of phylogenetic non-independence in their data. If we accept their classification of alpha subunits into the four main families (dnaE1-3 and polC) then almost all bacteria that have dnaE3 and polC are firmicutes and almost all bacteria with dnaE1 and dnaE2 are proteobacteria and actinobacteria [29]. Hence it is possible that the association between alpha polymerase subunits and GC content is coincidental, established by a few coincidental evolutionary changes; for example, it might be that the evolution of the dnaE2 subunit happened at the same time as another unrelated evolutionary change which caused a shift towards high genomic GC content. If there have been relatively few instances in which the alpha polymerase has evolved then the association with GC content may be coincidental.
Authors' response: We thank the reviewer for the critical comments. We have overlooked the molecular mechanisms that govern compositional (sequence) variations, but concentrated on sequence variation itself. A minute change in the conformation of these mutator enzymes may alter the GC content in another direction. Clearly, Figure shows that in genera Shewanella and Mycobacterium, bacteria in the dnaE1|dnaE2 group generally have higher GC content (by about 10%) as compared with those in the dnaE1|polV group. In addition, we found that all three newly sequenced (deposited in the public database) bacteria in Firmicutes (the dnaE3 group) have unexpectedly high GC content (>60%) and two of them (Alicyclobacillus acidocaldarius subsp. acidocaldarius DSM 446 and Symbiobacterium thermophilum IAM 14863) correlate well with the presence of dnaE2. One bacterium (Candidatus Desulforudis audaxviator MP104C) has been proven to have lost polC, similar to what we found in Pelotomaculum thermopropionicum SI. Furthermore, analyzing the pattern and distribution of bacterial SSR (simple sequence repeats), we found that one bacterium, Acidiphilium cryptum JF-5, which was previously identified as a dnaE1|polV group bacterium, has SSR patterns similar to those of dnaE1|dnaE2 group bacteria. Our further genome-wide screening led to the discovery of a single copy of dnaE2 in one of its plasmids (manuscript in preparation). Therefore, we think that the correlation between dnaE polymerases and GC content is a rule rather than coincidental and exceptional, albeit lacking direct experimental confirmation. Of course, we do not think that there are no exceptions to the rule, but we predict that they are the minority.
The authors need to conduct a proper comparative analysis by, for example, selecting related pairs of bacteria that differ in their alpha-polymerase subunits. They give some examples at the end of the current paper, but they need to find more examples, and to find these without reference to the genomic GC content. Once they have set the problem within a proper comparative framework they can start to investigate the relative correlation between GC content and alpha polymerase subunits, genome size, lifestyle, etc.
Authors' response: We have conducted a comparative analysis by selecting bacteria that differ in their alpha-polymerase subunits, as shown in Figure . In future investigations, we may be able to show more examples, but what we have now is limited by the availability of the relevant public data.
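The paired design requested above could be evaluated with something as simple as an exact sign test, sketched below in Python. The GC% pairs are entirely hypothetical. In a real analysis each pair would be two closely related genomes that differ in their alpha-polymerase subunits, chosen without reference to GC content. The sign test is used here only as one simple option, not as the method of either the reviewer or the authors.

```python
from math import comb


def sign_test(pairs):
    """Exact two-sided sign test on paired observations.

    `pairs` is a list of (value_a, value_b); tied pairs are dropped.
    Returns (number of pairs with a > b, number of informative pairs, p-value)
    under the null hypothesis that either direction is equally likely.
    """
    diffs = [a - b for a, b in pairs if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    # Probability of a result at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return k, n, min(1.0, 2 * tail)


# Hypothetical GC% values (assumption): first member of each pair carries
# dnaE2, second member is a close relative lacking it.
gc_pairs = [
    (62.1, 52.3), (65.0, 55.4), (58.7, 49.9), (67.2, 60.1),
    (55.3, 57.0), (61.8, 50.2), (64.4, 53.9), (59.6, 48.8),
]

k, n, p_value = sign_test(gc_pairs)
print(f"{k} of {n} pairs show higher GC% with dnaE2; p = {p_value:.4f}")
```

Because each pair contributes only the direction of its difference, the test is insensitive to phylogenetic distance between pairs, at the cost of statistical power when few pairs are available.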
As it stands I do not think there is much evidence to support the authors' hypothesis that GC content evolution is determined by alpha polymerase subunits. Even if this was proven it is evident from their figure that a large proportion of the variance in genomic GC content is not explained by subunits, since there is a large variance in genomic GC content within each subunit category.
Authors' response: We cited our previous related papers and added several lines of evidence to support our hypothesis. It is true that GC content variation in each group varies to different extents. What we are emphasizing here are two concepts. One is the fact that there are boundaries or specific spectra in compositional variability. The dnaE1|polV group is the extreme, which appears to have no limit in GC content variation but is regulated by mutator genes. Other groups have boundaries and they either prefer low-GC or high-GC contents. The other concept is why GC content varies and the complexity required to explain such variability. Large variances within each subunit category reflect the complexity of diverse factors contributing to GC content variation. As exemplified in our manuscript, there are also many other mutator genes (such as mutT, mutY, and mutM), as well as several environmental and bacteriological factors contributing to GC content variations. Horizontal gene transfer is another major factor that often results in broader GC content variability; not only as a mechanism of genetic material exchange, but also the material itself often makes significant contributions.
Quality of written English: Needs some language corrections before being published.
Authors' response: We have carefully checked the wording throughout the manuscript and revised the manuscript for clarity.