Different strains of P. marinus exhibit a wide range of genomic G+C-content (30.8% to 50.7%) and genome size (1.6 Mb to 2.7 Mb) and are adapted to different ecological niches, a situation rarely encountered in the microbial world and one that demands detailed investigation. We have performed a large-scale comprehensive study to critically analyze the direction and strength of mutational pressure and the genomic/proteomic determinants associated with the adaptation of these strains to oceanic environments subject to different light intensities. From this study it appears that (a) only the low light adapted (LL), free-living Prochlorococcus strains show strand asymmetry in synonymous codon usage, (b) general trends in amino acid usage in LLa, LLb and HL strains differ appreciably, (c) LLa, LLb and HL strains exhibit distinct dinucleotide abundance profiles, (d) a higher number of genes have undergone positive selection between strains with distinct light optima, i.e., between LL and HL strains, and (e) there are definite trends in the variation of different physicochemical and structural features in the core proteomes of different groups of Prochlorococcus strains, which are not solely governed by their genomic G+C-bias. These observations, along with the findings on large-scale genome reduction associated with a gradual increase in genomic A+T-content and extensive chromosomal rearrangements between different strains, strongly suggest a stepwise diversification of Prochlorococcus strains in the course of their adaptive evolution (Figure ).
Among several genome/proteome signatures of
P. marinus strains reported for the first time in this work, the most notable is the impact of pronounced replication-strand-specific asymmetry on synonymous codon usage, observed exclusively in the low light adapted strains of
P. marinus (Figure ). This is noteworthy for two reasons: (i) A pronounced strand-specific mutational bias with detectable influence on codon usage has so far been observed mostly in obligate intracellular microorganisms with reduced genomes [
2,
3,
5]. Interestingly, all 6 LL strains of
P. marinus exhibiting strand-specific synonymous codon usage are free-living, and two of them (LL1 and LL2) have relatively large genomes. On the other hand, the reduced genomes of the 6 HL strains show no perceivable sign of strand asymmetry in their usage of synonymous codons. (ii) In most other microbial genomes with asymmetric mutational bias, genes, especially the highly expressed ones, are present on the leading strands of replication in significantly higher numbers, a phenomenon referred to as replicational-transcriptional selection [
2,
3,
34,
35]. No such significant bias in gene distribution towards either strand of replication is observed in the LL strains of
P. marinus. Strand asymmetry in codon usage of
Prochlorococcus, therefore, may not be causally linked either to genome reduction or to replicational-transcriptional selection.
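The strand comparison discussed above can be illustrated with a minimal Python sketch. It uses the G+T fraction at third codon positions (GT3) as a simple proxy for strand-specific synonymous codon usage; this choice of statistic, the function names and the 'leading'/'lagging' labels are assumptions made for illustration, not necessarily the measure used in this study.

```python
def gt3_fraction(cds):
    """Fraction of G or T among unambiguous bases at third codon positions."""
    third_positions = cds.upper()[2::3]
    counted = [base for base in third_positions if base in "ACGT"]
    if not counted:
        return 0.0
    return sum(base in "GT" for base in counted) / len(counted)


def mean_gt3_by_strand(genes):
    """Average GT3 of genes grouped by replication strand.

    genes: iterable of (cds_sequence, strand) pairs, where strand is the
    hypothetical label 'leading' or 'lagging' (assumed to be assigned
    beforehand, e.g. from the positions of the replication origin and terminus).
    """
    values = {"leading": [], "lagging": []}
    for cds, strand in genes:
        values[strand].append(gt3_fraction(cds))
    return {
        strand: (sum(v) / len(v) if v else None)
        for strand, v in values.items()
    }


# Toy input only; real coding sequences would come from annotated genomes.
toy_genes = [("ATGGCTTTG", "leading"), ("ATGGCATTA", "lagging")]
toy_result = mean_gt3_by_strand(toy_genes)
```

A clear difference between the two averages across many genes would be consistent with strand asymmetry; similar averages, as reported here for the HL strains, would not.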
The homogenization of the strand asymmetric bias in the HL strains may be attributed, at least partially, to the absence of a specific DNA repair enzyme, MutY. In previous studies by Rocap
et al. [
20] and Dufresne
et al. [
25] it has been shown that the enzyme MutY is absent in the strain
P. marinus str. CCMP1986 (HL3), while it is present in
P. marinus str. CCMP1375 (LL3) and
P. marinus str. MIT9313 (LL1). MutY, an A/G-specific DNA glycosylase, acts with MutT (NTP pyrophosphohydrolase) and MutM (formamido-pyrimidine-DNA glycosylase) to avoid misincorporation of oxidized guanine (8-oxoG) in DNA and to repair the base mismatches A:8-oxoG [
37]. Knocking out both
mutM and
mutY in
E. coli results in a 1,000-fold increase of G:C to A:T transversions in comparison to the wild-type strain [
38]. Our analysis reveals (through BLASTP search) that
mutY is present in the LL strains but in none of the 6 HL strains. The excess 'G's present in the leading strands of LL strains might have mutated to 'A's in the HL strains due to the absence of
mutY in the latter, and this, in turn, would have caused a simultaneous increase of 'T's in the lagging strands, eventually leading to homogenization of the G+T and A+C frequencies in the two strands of replication in the HL strains. The existing mutational drift towards A+T-enrichment in the HL strains might also have helped to establish this uniformity. Further insights may be gained in this regard with the availability of more completely sequenced
Prochlorococcus genomes in the future.
In the process of gradual genome reduction, mutations often accumulate in expendable genes, thereby transforming them, by degrees, to pseudogenes, to small fragments, to extinction [
39]. In the reduced genomes of
P. marinus, we have found some putative remnants of coding regions, the A+T-content of which is, in general, higher than that of coding regions but lower than that of other non-coding regions. This is in agreement with the fact that the reduced genomes of
P. marinus (especially those of HL strains) are subject to a strong mutational A+T-drift, which would result in gradual A+T-enrichment of genic remnants released from amino-acid-coding constraints in the recent past. The base composition of such remnants is expected to gradually approach the A+T-content of
bona fide non-coding regions.
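The ordering of A+T-content described above (coding regions, then putative remnants, then other non-coding regions) can be checked with a minimal Python sketch. The function names and the toy sequences are hypothetical and serve only to illustrate the comparison.

```python
def at_content(seq):
    """A+T fraction of a sequence, ignoring ambiguous symbols such as N."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("A") + seq.count("T")) / counted


def remnant_is_intermediate(coding, remnant, noncoding):
    """True if the remnant's A+T-content lies strictly between the other two."""
    return at_content(coding) < at_content(remnant) < at_content(noncoding)


# Toy sequences only; real input would be concatenated regions of one genome.
toy_coding = "ATGGCGCCGA"     # A+T = 3 of 10 bases
toy_remnant = "ATGATCGTAA"    # A+T = 7 of 10 bases
toy_noncoding = "ATTTAATATA"  # A+T = 10 of 10 bases
toy_check = remnant_is_intermediate(toy_coding, toy_remnant, toy_noncoding)
```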
Comparison of orthologous gene synteny among five representative strains with different genome sizes and G+C-contents clearly points to a high level of chromosomal rearrangement during genome shrinkage in
Prochlorococcus. This finding is in agreement with earlier findings on association of chromosomal rearrangement events with higher rates of chromosomal evolution and/or the phenomenon of genome reduction, as in
Arabidopsis thaliana [
40] and different endoparasites/endosymbionts [
41,
42]. Intra-chromosomal recombination at duplicated sequences often results in deletion of intervening sequences, and rearrangement of flanking regions, thereby leading to genome shrinkage [
39].
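One simple way to quantify rearrangement between two strains is to count breakpoints, i.e. pairs of orthologs that are adjacent in one genome but not in the other. The Python sketch below assumes single-copy orthologs with shared identifiers, treats the gene order as linear and ignores gene orientation; it is an illustration of the idea, not the synteny method used in this study.

```python
def count_breakpoints(reference_order, other_order):
    """Count adjacencies of the reference gene order that are lost in the other.

    Both arguments are lists of ortholog identifiers in chromosomal order.
    Genes missing from either genome are dropped before comparison, so gene
    loss alone does not count as a rearrangement.
    """
    shared = set(reference_order) & set(other_order)
    ref = [gene for gene in reference_order if gene in shared]
    oth = [gene for gene in other_order if gene in shared]

    # Unordered neighbour pairs present in the other genome.
    other_adjacencies = {frozenset(pair) for pair in zip(oth, oth[1:])}

    return sum(
        frozenset(pair) not in other_adjacencies
        for pair in zip(ref, ref[1:])
    )


# Toy gene orders with hypothetical ortholog identifiers.
toy_reference = ["a", "b", "c", "d", "e"]
toy_rearranged = ["a", "b", "d", "c", "e"]  # segment c-d inverted
toy_breakpoints = count_breakpoints(toy_reference, toy_rearranged)
```

In the toy example the inversion of the c-d segment breaks the b-c and d-e adjacencies, giving two breakpoints, while an identical gene order gives zero.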
Previous analyses with endosymbiotic or endoparasitic organisms like
Bartonella,
Tropheryma,
Buchnera,
Wigglesworthia etc. [
2,
3,
28,
29] revealed that the phenomenon of genome reduction is normally associated with population bottlenecks or other mechanisms such as selective sweeps. In the case of the hyperthermophile
Nanoarchaeum equitans, extreme genome reduction is a feature of its thermoparasitic adaptation [
1]. Although our knowledge of bacterial populations in open oceans is not exhaustive, it may certainly be assumed that
P. marinus ecotypes, the most abundant free-living marine cyanobacteria and an important contributor to global photosynthesis, are not subject to small population sizes [
13-
15,
30]. More importantly, the HL strains with reduced genomes are apparently biologically superior to their LL counterparts [
21]. It is possible that the bias towards reduced, A+T-rich genomes in HL strains reflects cellular economy in regions near the ocean surface where nitrogen and phosphorus are limited. Scarcity of these elements, which are essential for DNA synthesis, favors the incorporation of AT base pairs, each containing seven atoms of nitrogen, one fewer than a GC base pair. It is worth mentioning at this point that the trends in amino acid usage in different
P. marinus strains, as observed in this study, are quite compatible with the earlier report by Lv
et al. [
43] on the influence of resource availability on the proteome composition of these species. For instance, the increase in overall aromaticity from LLa to LLb and HL strains is in full agreement with the observations by Lv
et al. [
43] on increased carbon-content in the encoded proteins of different HL strains as compared to those of LL strains. The average instability indices of the HL proteins are significantly lower than those of their LL orthologs, suggesting that the HL proteins, in general, may be more stable. Proteins characterized by higher percentages of helix structures experience increased overall packing that imparts more rigidity [
44]; hence, a decrease in regions with helix-forming propensities, with a concomitant increase in coiled structures, probably makes HL proteins more flexible. It is also tempting to presume that higher values of aromaticity and pI in HL proteins, as compared to LL orthologs, might facilitate cation-pi interactions in the former, imparting more stability. The central issue in the adaptation of HL proteins to their environmental niches may, therefore, be the conservation of their functional state, characterized by a well-balanced optimization of stability and flexibility.
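The nitrogen-economy argument made earlier in this section can be put in numbers with a short Python sketch. It uses the nitrogen counts of the bases (seven per AT pair, eight per GC pair) and the extreme G+C-contents and genome sizes quoted above; pairing the lowest G+C-content with the smallest genome and the highest with the largest is an assumption made purely for illustration, and the function names are hypothetical.

```python
N_ATOMS_AT_PAIR = 7  # adenine (5 N) + thymine (2 N)
N_ATOMS_GC_PAIR = 8  # guanine (5 N) + cytosine (3 N)


def nitrogen_per_bp(gc_fraction):
    """Average number of nitrogen atoms per base pair at a given G+C fraction."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must lie between 0 and 1")
    return N_ATOMS_AT_PAIR * (1.0 - gc_fraction) + N_ATOMS_GC_PAIR * gc_fraction


def genome_nitrogen(genome_size_bp, gc_fraction):
    """Nitrogen atoms in the bases of one copy of a double-stranded genome."""
    return genome_size_bp * nitrogen_per_bp(gc_fraction)


# Illustrative extremes (pairing of size and G+C-content is an assumption).
small_at_rich = genome_nitrogen(1_600_000, 0.308)
large_gc_rich = genome_nitrogen(2_700_000, 0.507)
# Fraction of base nitrogen saved per genome copy under these assumptions.
relative_saving = 1.0 - small_at_rich / large_gc_rich
```

Under these assumptions most of the saving comes from the smaller genome size, while the shift in base composition contributes roughly 0.2 nitrogen atoms per base pair.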