The oceans play a key role in global nutrient cycling and climate regulation. The unicellular cyanobacterium Prochlorococcus is an important contributor to these processes, as it accounts for a significant fraction of primary productivity in low- to mid-latitude oceans [1]. Prochlorococcus and its close relative, Synechococcus, are distinguished by their photosynthetic machinery: Prochlorococcus uses chlorophyll-binding proteins instead of phycobilisomes for light harvesting, and divinyl rather than monovinyl chlorophyll pigments. Although Prochlorococcus and Synechococcus coexist throughout much of the world's oceans, Synechococcus extends into more polar regions and is more abundant in nutrient-rich waters, while Prochlorococcus dominates relatively warm, oligotrophic regions and can be found at greater depths [3]. The Prochlorococcus group consists of two major ecotypes, high-light (HL)-adapted and low-light (LL)-adapted, that are genetically and physiologically distinct [4] and are distributed differently in the water column [5]. Given their relatively simple metabolism, well-characterized marine environment, and global abundance, these marine cyanobacteria represent an excellent system for understanding how genetic differences translate to physiological and ecological variation in natural populations.
The first marine cyanobacterial genome sequences suggested progressive genome decay from Synechococcus to LL Prochlorococcus to HL Prochlorococcus, characterized by a reduction in genome size (from 2.4 to 1.7 Mb) and a drop in G + C content from ~59% to ~30% [7]. Notably, genes involved in light acclimation and nutrient assimilation appeared to have been sequentially lost, consistent with the niche differentiation observed for these three groups [7]. This comparison suggested that the major clades of marine cyanobacteria differentiated in a stepwise fashion, leading to patterns of gene content that corresponded to the isolates' 16S rRNA phylogeny.
Recently, however, molecular sequence data and physiological studies have revealed complexity beyond the HL/LL paradigm. Within the LL ecotype, for instance, some but not all isolates can use nitrite as a sole nitrogen source [10], and LL genomes range widely in size [7]. Moreover, the distribution of phosphate acquisition genes among Prochlorococcus genomes does not correlate with their rRNA phylogeny but instead appears related to phosphate availability: strains isolated from low-phosphate environments are genetically better equipped to deal with phosphate limitation than those from high-phosphate environments, regardless of their 16S rRNA phylogeny [11]. Thus, while the HL/LL distinction has held up both phenotypically and genotypically, other differences among isolates are not consistent with their rRNA phylogeny. To understand diversification and adaptation in this globally important group, we must therefore characterize the underlying patterns of genome-wide diversity.
Lateral gene transfer (LGT) is one mechanism that creates complex gene distributions and phylogenies incongruent with the rRNA tree. Whether a robust organismal phylogeny can be inferred despite extensive LGT is still hotly debated [12]. If a core set of genes exists that is resistant to LGT, then gene trees based on these core genes should reflect cell division and vertical descent, as has been argued for the gamma-Proteobacteria. Others argue that genes in a shared taxon core do not necessarily have the same evolutionary histories, making inference of an organismal phylogeny difficult [14]. In spite of this debate, the core genome remains a useful concept for understanding biological similarity within a taxonomic group. Recent comparisons within the lactic acid bacteria, the cyanobacteria, and Streptococcus agalactiae, for instance, have each revealed a core set of genes shared by all members of the group, on top of which is layered the flexible genome [15]. The vast majority of genes in the core genome encode housekeeping functions, while genes in the flexible genome reflect adaptation to specific environments [16] and are often acquired by LGT. Thus the core and flexible genomes are informative not only in a phylogenetic context, for understanding the mechanisms and tempo of genome evolution, but also in an ecological context, for understanding the selective pressures experienced in different environments.
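Once genes have been grouped into families (for example, clusters of orthologs), the core/flexible distinction reduces to simple set operations: the core is the intersection of the family sets across all genomes, and the flexible genome is everything in their union that falls outside the core. The Python sketch below illustrates only that definition. It assumes orthology assignment has already been done, and the function name `core_and_flexible`, the genome labels, and the gene-family identifiers are all hypothetical toy values, not data or code from this study.

```python
def core_and_flexible(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Split gene families into core and flexible sets (illustrative sketch).

    Assumes each genome is already represented as a set of gene-family IDs.
    Core: families present in every genome (intersection).
    Flexible: families present in at least one, but not all, genomes
    (union minus core).
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        # No genomes supplied: both partitions are empty.
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)  # every family seen in any genome
    return core, pan - core


def carriers(genomes: dict[str, set[str]], family: str) -> list[str]:
    """Hypothetical helper: list the genomes that carry a given family."""
    return sorted(name for name, fams in genomes.items() if family in fams)


# Hypothetical toy data: three genomes with made-up family identifiers.
EXAMPLE_GENOMES = {
    "genome_1": {"fam_a", "fam_b", "fam_c", "fam_d"},
    "genome_2": {"fam_a", "fam_b", "fam_e"},
    "genome_3": {"fam_a", "fam_b", "fam_c", "fam_f"},
}

if __name__ == "__main__":
    core, flexible = core_and_flexible(EXAMPLE_GENOMES)
    print("core:", sorted(core))
    print("flexible:", sorted(flexible))
    for fam in sorted(flexible):
        print(fam, "->", carriers(EXAMPLE_GENOMES, fam))
```

With the toy data, the core is `['fam_a', 'fam_b']` and the flexible genome is `['fam_c', 'fam_d', 'fam_e', 'fam_f']`. Because a strict intersection is used, a family missed by gene calling or annotation in even one genome drops out of the core, so real comparative analyses generally need to account for such errors before interpreting these sets.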
To further understand diversification and adaptation in Prochlorococcus, we obtained sequences of eight additional genomes representing diverse lineages, both LL- and HL-adapted, spanning the complete 16S rRNA diversity (97% to 99.93% similarity) of cultured representatives of this group [18] (). By comparing these genomes with available genomes for Prochlorococcus and marine Synechococcus, we aimed to reconstruct the history of vertical transmission, gene acquisition, and gene loss for these marine cyanobacteria. In particular, we identified functions associated with the core and flexible genomes and analyzed the metabolic pathways encoded in each. This analysis not only reveals what differentiates Synechococcus, LL Prochlorococcus, and HL Prochlorococcus from one another but also informs our understanding of how adaptation occurs in the oceans along gradients of light, nutrients, and other environmental factors, providing essential biological context for interpreting rapidly expanding metagenomic datasets.
General Characteristics of the Prochlorococcus and Synechococcus Isolates Used in This Study