The process of genome reduction which has occurred within the
Prochlorococcus radiation has, to our knowledge, never been observed in any other free-living prokaryote. Since
Prochlorococcus sp. MIT9313 has a genome size very similar to that of
Synechococcus sp. WH8102 (2.4 megabase pairs (Mbp)), as well as to that of several other marine
Synechococcus spp. (M. Ostrowski and D. Scanlan, personal communication), it is reasonable to assume that the common ancestor of all
Prochlorococcus species also had a genome size around 2.4 Mbp. Under this hypothesis, the genome reduction which has occurred in MED4 would correspond to around 31%. By comparison, the extent of genome reduction in the insect endosymbiont
Buchnera, as compared to a reconstructed ancestral genome, is around 77% [
27]. The genome of
P. marinus SS120 - and
a fortiori the MED4 genome - is considered to be near minimal for a free-living oxyphototrophic organism [
13]. It would seem that genome reduction in these organisms probably cannot proceed below a certain limit, corresponding to a gene pool containing all the essential genes of biosynthetic pathways and housekeeping functions (probably including most of the 1,306 four-way orthologous genes identified in this study) plus a number of other genes, including genus-specific as well as niche-specific genes. For instance, MED4 encodes a number of photolyase-related proteins and a few specific ABC transporters (for cyanate, for example; [
14] and data not shown). These specific proteins might be critical for survival in the upper water layer, which receives high photon fluxes and UV light and is nutrient-depleted, but less so for life deeper in the water column.
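The reduction percentages quoted above follow from a simple ratio of current to ancestral genome size. The following is a minimal sketch of that arithmetic; the MED4 genome size of about 1.66 Mbp is an assumption brought in for illustration (it is not stated in this section), and the function name is hypothetical.

```python
def percent_reduction(ancestral_mbp: float, current_mbp: float) -> float:
    """Return the genome reduction relative to the ancestral size, in percent."""
    if ancestral_mbp <= 0:
        raise ValueError("ancestral size must be positive")
    return 100.0 * (ancestral_mbp - current_mbp) / ancestral_mbp


ANCESTRAL_MBP = 2.4  # hypothesized ancestral genome size, from the text
MED4_MBP = 1.66      # ASSUMED MED4 genome size; not given in this section

med4_reduction = percent_reduction(ANCESTRAL_MBP, MED4_MBP)
print(f"MED4 reduction relative to a 2.4 Mbp ancestor: {med4_reduction:.0f}%")
```

Under these assumptions the result rounds to the 31% given in the text. The estimate is only as good as the assumed ancestral size.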
If both
Prochlorococcus lineages and host-dependent organisms have undergone genome reduction associated with accelerated substitution rates, these phenomena must have arisen from very different causes, as the resulting gene repertoires of the two types of organisms differ tremendously. Indeed, the genome evolution of endosymbionts and obligate pathogens is driven by two main processes that have mutually reinforcing effects on genome size and evolutionary rates. Being confined inside their host, these bacteria have tiny population sizes and are regularly bottlenecked at each host generation or at each new host infection. Consequently, they experience strong genetic drift [
28] involving an increase in substitution rate. This acceleration results in the accumulation at random of slightly deleterious mutations in protein-coding genes [
8,
29] as well as in rRNA genes [
29,
30]. This genetic drift enhances the downsizing of the genome through inactivation and then elimination of potentially beneficial but dispensable genes. Among these, there have been a number of DNA-repair genes, the disappearance of which could have further increased the mutation rate [
6,
31-
33]. Furthermore, a number of genes may be subject to relaxed purifying selection, which is therefore less effective in maintaining gene function. This relaxation particularly affects genes that have become useless because they are redundant with functions provided by the host genome, such as genes involved in the biosynthesis of amino acids, nucleotides, fatty acids and even ATP [
4-
6,
8,
9,
32]. Selection pressure is also reduced for genes involved in environmental sensing and regulatory systems, such as two-component systems, because of the highly buffered environment offered by the host [
6].
In the free-living genus
Prochlorococcus, the very large size of field populations [
34] means that these populations are subject to much lower genetic drift and their genomes are subject to much stronger purifying selection than are those of endosymbionts and pathogens [
35]. Consequently, the observed accelerated rate of evolution probably results merely from an increase in the mutation rate, which in turn is probably due to the loss of DNA-repair genes, although it should be noted that in
P. marinus SS120 only two such genes are missing (Table ). We observed a similar acceleration of amino-acid substitutions for all functional categories (Figure ). This finding is more consistent with a global increase in the mutation rate than with relaxed selection, the latter being unlikely to occur to the same extent at all loci. We also assume that most amino-acid substitutions that have occurred in
Prochlorococcus proteins are neutral; that is, they have not altered protein function. Indeed, populations of the HL clade which, like MED4, have the most derived protein sequences of all
Prochlorococcus species, appear to be the most abundant photosynthetic organisms in the upper layer of the temperate and inter-tropical oceans [
16]. Such ecological success would hardly be possible for organisms handicapped by a large number of slightly deleterious mutations, especially given that most genes are single copy, so that compensation for a loss of gene function is generally not possible. The effect of sustained strong purifying selection in counteracting deleterious substitutions is particularly obvious in the rRNA genes. In contrast to the results for protein-coding genes, relative rate tests did not show any significant differences in the rates of evolution of the 16S rRNA genes among the four marine picocyanobacterial genomes, and thus there is no evidence that either SS120 or MED4 has accumulated mutations destabilizing the secondary structure of its 16S rRNA molecule. One noteworthy consequence of the acceleration in the rates of evolution of protein-coding genes in
Prochlorococcus is that phylogenetic reconstructions based on protein sequences are biased. Indeed, this acceleration leads to much longer branches for MED4 and SS120 than for MIT9313. The resulting tree topology most often does not agree with that obtained with the 16S rRNA gene, for which the molecular clock hypothesis holds true according to our analyses. Thus, rRNA genes are likely to be among the few genes that will give reliable estimates of the phylogenetic distances between
Prochlorococcus strains.
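The text does not say which relative rate test was applied. As an illustration only, the sketch below implements one simple and widely used variant, Tajima's test, which compares the number of sites changed uniquely in each of two ingroup sequences relative to an outgroup. The function name and the toy sequences are hypothetical and are not data from this study.

```python
import math


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Tajima's relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p_value), where m1 counts sites at which only
    seq1 differs from the other two sequences and m2 counts sites at which
    only seq2 differs. Sites with gaps or ambiguous characters are skipped.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGTU")
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if not {a, b, o} <= valid:
            continue
        if a != b and b == o:
            m1 += 1  # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1  # change unique to lineage 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m1, m2, chi2, p_value


# Toy alignment (made up): lineage 1 carries nine unique changes, lineage 2 one.
toy_outgroup = "AAAAAAAAAAAA"
toy_fast = "ACCCCCCCCCAA"
toy_slow = "CAAAAAAAAAAA"

m1, m2, chi2, p_value = tajima_relative_rate(toy_fast, toy_slow, toy_outgroup)
print(f"m1={m1}, m2={m2}, chi2={chi2:.2f}, p={p_value:.3f}")
```

A small p-value rejects equal rates in the two lineages, which is the pattern reported here for protein-coding genes; a non-significant result, as reported for the 16S rRNA genes, is compatible with clock-like evolution.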
If it is neither the relaxation of purifying selection nor an increase in genetic drift that has been the main factor causing
Prochlorococcus genome reduction, an alternative possibility is that the latter could be the result of a selective process favoring the adaptation of
Prochlorococcus to its environment. The apparently better ecological success in oligotrophic areas of
Prochlorococcus species compared to their close relative
Synechococcus [
16,
34] strongly suggests that the reduction of
Prochlorococcus genome size could provide a competitive advantage to the former. Indeed, extensive comparisons of the gene complements of these two organisms show very few examples - at least among genes for which function is known - of the occurrence of specific genes in MED4 which could explain its better adaptation (data not shown). One noteworthy exception is the presence in
Prochlorococcus, but not
Synechococcus, of flavodoxin and ferritin, two proteins that possibly give
Prochlorococcus a better resistance to iron stress. Apart from that,
Synechococcus appears more like a generalist, in particular with regard to nitrogen or phosphorus uptake and assimilation [
22], and should
a priori be more suited to sustain competition. Hence, we assume that the key to the success of
Prochlorococcus resides less in the development of a specific complex or pathway to cope better with unfavorable conditions than in the simplification of its genome and cell organization, which can allow this organism to make substantial economies in energy and material for cell maintenance.
The mere reduction in genome size
per se is a potential source of substantial economies for the cell, as it reduces the amount of nitrogen and phosphorus required, for instance, for DNA synthesis; both elements are particularly limiting in the upper part of the ocean. Another advantage is that it allows a concomitant reduction in cell volume. It has been previously suggested (see, for example [
36]) that, for a phytoplanktonic organism, a small cell volume confers two selective advantages by reducing self-shading (the package effect) and by increasing the cell surface-to-volume ratio, which can improve nutrient uptake. The first advantage would improve the fitness of the LL strains, whereas the second would offer an advantage to the HL strains living in nutrient-depleted surface waters. Finally, cell division is less costly for a small than for a large cell. On the basis of these observations, we assume that the major driving force for genome reduction within the
Prochlorococcus radiation has been selection for a more economical lifestyle. The bias toward an A+T-rich genome in MED4 and SS120 is also consistent with this hypothesis, as it can be seen as a way to economize on nitrogen. Indeed, an AT base pair contains seven atoms of nitrogen, one fewer than a GC base pair.
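The nitrogen argument can be made concrete by counting the nitrogen atoms in the bases of a genome as a function of its size and G+C content. The sketch below is illustrative only: the genome sizes and G+C fractions in the example are round, assumed values chosen for the comparison and are not figures given in this section.

```python
N_PER_AT_PAIR = 7  # adenine (5 N) + thymine (2 N)
N_PER_GC_PAIR = 8  # guanine (5 N) + cytosine (3 N)


def nitrogen_atoms_in_bases(genome_bp: int, gc_fraction: float) -> float:
    """Nitrogen atoms in the bases of one double-stranded genome copy."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    gc_pairs = genome_bp * gc_fraction
    at_pairs = genome_bp - gc_pairs
    return at_pairs * N_PER_AT_PAIR + gc_pairs * N_PER_GC_PAIR


# ASSUMED illustrative values, not taken from the text:
large_genome = nitrogen_atoms_in_bases(2_400_000, 0.50)  # larger, balanced genome
small_genome = nitrogen_atoms_in_bases(1_660_000, 0.31)  # smaller, A+T-rich genome
same_size_at_rich = nitrogen_atoms_in_bases(2_400_000, 0.31)

print(f"saving from size and composition: {1 - small_genome / large_genome:.1%}")
print(f"saving from composition alone:    {1 - same_size_at_rich / large_genome:.1%}")
```

With these assumed inputs, most of the saving comes from the smaller genome size, while the compositional shift contributes a few percent on top of it.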
With this hypothesis in mind, we propose a possible scenario for the evolution of
Prochlorococcus genomes. Using a rate of 16S rRNA divergence of 1% per 50 million years [
37], one can estimate that the two genera differentiated as recently as 150 million years ago, given that the molecular clock hypothesis holds for this gene in
Prochlorococcus and
Synechococcus. The ancestral
Prochlorococcus cells must have developed in the LL niche, a niche probably left free by other picocyanobacteria. Given the considerable difference in genome size between the LL strains MIT9313 and SS120, it appears that genome reduction itself must have started in one (or possibly several) lineage(s) within the LL niche some time after
Prochlorococcus differentiation from its common ancestor with marine
Synechococcus species. Why selection has affected only one (or a few) and not all
Prochlorococcus lineages remains unclear. Examination of the gene repertoire of
P. marinus SS120 [
13] suggests that this genome reduction must have involved the random loss of dispensable genes from many different pathways. At some point during evolution, some genes involved in DNA repair were affected; these would include the
ada gene, the loss of which may be responsible for the shift in base composition, and possibly several others not necessarily involved in the repair of GC to AT mutations (see Table ). Loss of these genes may have led to an increase in the mutation rate and therefore in the rate of evolution of protein-coding genes, accompanied by more rapid genome shrinkage and a shift of base composition toward AT. It is worth noting that one likely consequence of this genome-wide compositional shift is the absence of adaptive codon bias in the genomes of
Prochlorococcus species MED4 and SS120. AT-rich codons are preferentially used whatever the amino acid (Figure ). Thus, codon usage in these genomes appears to reflect more the local base-composition bias than the selection for a more efficient translation through the use of optimal codons. The same conclusion has been drawn for other small genomes with high A+T content [
28,
38].
Later during evolution (around 80 million years ago, according to the degree of 16S rRNA sequence divergence between MED4 and SS120), one LL population that probably already had a significantly reduced cell and genome size must have progressively adapted to the HL niche and eventually recolonized the upper layer. How this change in ecological niche was possible is still hard to determine. Comparison of the gene sets of the LL-adapted SS120 and the HL-adapted MED4 suggests that changes in very few genes might be sufficient to shift from one niche to the other, including a multiplication of
hli genes [
39] and the differential retention of genes which were present in the common ancestor of
Prochlorococcus and
Synechococcus (such as the photolyases and cyanate transporters mentioned above) and were secondarily lost in the LL-adapted lineages.
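The two dates used in this scenario rest on a linear molecular clock for the 16S rRNA gene. The sketch below shows the conversion under the quoted rate of 1% divergence per 50 million years. The divergence percentages it prints are back-calculated from the dates in the text, not values reported here, and the function names are hypothetical.

```python
MYR_PER_PERCENT = 50.0  # 1% 16S rRNA divergence per 50 million years [37]


def divergence_time_myr(percent_divergence: float) -> float:
    """Convert 16S rRNA sequence divergence (%) into time (million years)."""
    return percent_divergence * MYR_PER_PERCENT


def implied_divergence_percent(time_myr: float) -> float:
    """Inverse conversion: divergence (%) implied by a given time (million years)."""
    return time_myr / MYR_PER_PERCENT


# Dates taken from the text; the implied divergences are derived, not reported.
genus_split = implied_divergence_percent(150.0)   # Prochlorococcus vs Synechococcus
hl_ll_split = implied_divergence_percent(80.0)    # MED4 vs SS120

print(f"implied divergence at the genus split: {genus_split:.1f}%")
print(f"implied divergence between MED4 and SS120: {hl_ll_split:.1f}%")
```

This conversion is only meaningful for genes that evolve in a clock-like manner, which is why it is applied to the 16S rRNA gene rather than to the accelerated protein-coding genes.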