Genomic islands have been shown to harbor functional traits that differentiate ecologically distinct populations of environmental bacteria. A comparative analysis of the complete genome sequences of the marine Actinobacteria Salinispora tropica and S. arenicola reveals that 75% of the species-specific genes are located in 21 genomic islands. These islands are enriched in genes associated with secondary metabolite biosynthesis providing evidence that secondary metabolism is linked to functional adaptation. Secondary metabolism accounts for 8.8% and 10.9% of the genes in the S. tropica and S. arenicola genomes, respectively, and represents the major functional category of annotated genes that differentiates the two species. Genomic islands harbor all 25 of the species-specific biosynthetic pathways, the majority of which occur in S. arenicola and may contribute to the cosmopolitan distribution of this species. Genome evolution is dominated by gene duplication and acquisition, which in the case of secondary metabolism provide immediate opportunities for the production of new bioactive products. Evidence that secondary metabolic pathways are exchanged horizontally, coupled with prior evidence for fixation among globally distributed populations, supports a functional role and suggests that the acquisition of natural product biosynthetic gene clusters represents a previously unrecognized force driving bacterial diversification. Species-specific differences observed in CRISPR (clustered regularly interspaced short palindromic repeat) sequences suggest that S. arenicola may possess a higher level of phage immunity, while a highly duplicated family of polymorphic membrane proteins provides evidence of a new mechanism of marine adaptation in Gram-positive bacteria.
Linking functional traits to bacterial phylogeny remains a fundamental but elusive goal of microbial ecology (Hunt et al 2008). Without this information, it becomes difficult to resolve meaningful units of diversity and the mechanisms by which bacteria interact with each other and adapt to environmental change. Most bacterial diversity is delineated among clusters of sequences that share >99% 16S rRNA gene sequence identity (Acinas et al 2004). These sequence clusters are believed to represent fundamental units of diversity, while intra-cluster microdiversity is thought to persist due to weak selective pressures (Acinas et al 2004) suggesting little ecological or taxonomic relevance. Recently, progress has been made in terms of delineating units of diversity that possess the fundamental properties of species by linking genetic diversity with ecology and evolutionary theory (Achtman and Wagner 2008, Fraser et al 2009). Despite these advances, there remains no widely accepted species concept for prokaryotes (Gevers et al 2005), and sequence-based analyses reveal widely varied levels of diversity within assigned species boundaries.
The comparative analysis of bacterial genome sequences has revealed considerable differences among closely related strains (Joyce et al 2002, Thompson et al 2005, Welch et al 2002) and provides a new perspective on genome evolution and prokaryotic species concepts. Genomic differences among closely related strains are concentrated in islands, strain-specific regions of the chromosome that are generally acquired by horizontal gene transfer (HGT) and harbor functionally adaptive traits (Dobrindt et al 2004) that can be linked to niche adaptation. The pelagic cyanobacterium Prochlorococcus is an important model for the study of island genes, which in this case are differentially expressed under low nutrient and high light stress in ecologically distinct populations (Coleman et al 2006). Despite convincing evidence for the adaptive significance of island genes among environmental bacteria, the precise functions of their products have seldom been characterized and their potential role in the evolution of independent bacterial lineages remains poorly understood.
The marine sediment-inhabiting genus Salinispora belongs to the Order Actinomycetales, a group of Actinobacteria commonly referred to as actinomycetes. Actinomycetes are a rich source of structurally diverse secondary metabolites and account for the majority of antibiotics discovered as of 2002 (Berdy 2005). Salinispora spp. have likewise proven to be a rich source of secondary metabolites (Fenical and Jensen 2006) including salinosporamide A, which is currently in clinical trials for the treatment of cancer (Fenical et al 2009). At present, the genus comprises three species that collectively constitute a microdiverse sequence cluster (sensu Acinas et al 2004), i.e., they share ≥99% 16S rRNA gene sequence identity (Jensen and Mafnas 2006). Although the microdiversity within this cluster has been formally delineated into species-level taxa (Maldonado et al 2005), it remains to be determined if these taxa represent ecologically or functionally distinct lineages.
Here we report the comparative analysis of the complete genome sequences of S. tropica (strain CNB-440, the type strain for the species and thus a contribution to the Genomic Encyclopedia of Bacteria and Archaea project), hereafter referred to as ST, and S. arenicola (strain CNS-205), hereafter referred to as SA, the first obligately marine Actinobacteria to be obtained in culture (Mincer et al 2002). The aims of this study were to describe, compare, and contrast the gene content and organization of the two genomes in the context of prevailing species concepts, identify the functional attributes that differentiate the two species, assess the processes that have driven genome evolution, and search for evidence of marine adaptation in this unusual group of Gram-positive marine bacteria.
The sequencing and annotation of the SA genome were as previously reported for ST (Udwary et al 2007). Both genomes were sequenced as part of the Department of Energy, Joint Genome Institute, Community Sequencing Program. Orthologs within the two genomes were predicted using the Reciprocal Smallest Distance (RSD) method (Wall et al 2003), which includes a maximum likelihood estimate of amino acid substitutions. A linear alignment of positional orthologs was created and the positions of rearranged orthologs and species-specific genes identified. Genomic islands were defined as regions >20 kb that are flanked by regions of conservation and within which <40% of the island genes possess a positional ortholog in the reciprocal genome. Paralogs within each genome were identified using the blastclust algorithm (Dondoshansky and Wolf 2000) with a cut-off of 30% identity over 40% of the sequence length. The automated phylogenetic inference system (APIS) was used to identify recent gene duplications (Badger et al 2005).
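The island definition above can be expressed as a simple check on a candidate region. The following is a minimal sketch, not the procedure used in this study: the `Gene` record, the function name, and the way region length is measured (first gene start to last gene end) are illustrative assumptions, and the step that locates the flanking regions of conservation is left out.

```python
from typing import NamedTuple, Sequence


class Gene(NamedTuple):
    """Hypothetical gene record; coordinates are in base pairs."""
    start: int
    end: int
    has_positional_ortholog: bool


def qualifies_as_island(
    region: Sequence[Gene],
    min_length_bp: int = 20_000,
    max_ortholog_fraction: float = 0.40,
) -> bool:
    """Apply the two numeric criteria from the text to a candidate region.

    The region must span more than 20 kb and fewer than 40% of its genes
    may have a positional ortholog in the reciprocal genome. Whether the
    region is flanked by conserved sequence is assumed to be checked
    elsewhere.
    """
    if not region:
        return False
    length_bp = max(g.end for g in region) - min(g.start for g in region)
    ortholog_fraction = sum(g.has_positional_ortholog for g in region) / len(region)
    return length_bp > min_length_bp and ortholog_fraction < max_ortholog_fraction


# Ten contiguous 3 kb genes, three of which have a positional ortholog.
candidate = [Gene(i * 3000, (i + 1) * 3000, i < 3) for i in range(10)]
print(qualifies_as_island(candidate))
```

Both thresholds are strict inequalities, following the wording of the definition, so a region in which exactly 40% of genes have positional orthologs would not qualify.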
All genes were assessed for evidence of HGT based on abnormal DNA composition, phylogenetic, taxonomic, and sequence-based relationships, and comparisons to known Mobile Genetic Elements (MGEs). Genes identified by ≥2 different methodologies were counted as positive for HGT. To reflect confidence in the assignments, genes displaying positive evidence of HGT were color coded from yellow to red corresponding to total scores from 2 to 6. The results were mapped onto the genome to reveal HGT clustering patterns and adjacent clusters were merged (Figure 1a). Four DNA compositional analyses included G+C content (obtained from the JGI annotation), codon adaptive index, calculated with the CAI calculator (Wu et al 2005) using a suite of housekeeping genes as reference, dinucleotide frequency differences (δ*), calculated using IslandPath (Hsiao et al 2003), and DNA composition, calculated using Alien_Hunter (Vernikos and Parkhill 2006). G+C content or codon usage values >1.5 standard deviations from the genomic mean and dinucleotide frequency differences >1 standard deviation from the mean were scored positive for HGT. Taxonomic relationships in the form of lineage probability index (LPI) values for all protein coding genes were assigned using the Darkhorse algorithm (Podell and Gaasterland 2007). Genes with an LPI of <0.5, indicating the orthologs are not in closely related genomes, were scored positive for HGT. A reciprocal Darkhorse analysis (Podell et al 2008) was then performed on the orthologs of all positives, and if these genes had an LPI score >0.5, indicating the match sequence is phylogenetically typical within its own lineage, they were assigned an additional positive score.
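The compositional thresholds described above amount to flagging genes whose value lies more than a set number of standard deviations from the genomic mean. The sketch below illustrates that rule for a single metric such as G+C content. It assumes the mean and standard deviation are computed over per-gene values and uses the population standard deviation; the text does not specify either choice, and the function name and example values are hypothetical.

```python
import statistics
from typing import List, Sequence


def flag_compositional_outliers(values: Sequence[float], n_sd: float = 1.5) -> List[bool]:
    """Return True for each value more than n_sd standard deviations from the mean.

    Assumption: the genomic mean is approximated by the mean of per-gene
    values, and the population standard deviation is used.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # No variation at all, so nothing can be an outlier.
        return [False] * len(values)
    return [abs(v - mean) > n_sd * sd for v in values]


# Hypothetical per-gene G+C fractions: nine typical genes and one low-G+C gene.
gc_content = [0.70] * 9 + [0.50]
print(flag_compositional_outliers(gc_content, n_sd=1.5))
```

The same function could be applied with `n_sd=1.0` to dinucleotide frequency differences, which the text scores at a threshold of one standard deviation.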
A phylogenetic approach using the APIS program (Badger et al 2005) was also employed to assess HGT. Using this program, bootstrapped neighbor-joining trees of all predicted protein coding genes within each genome were created. All genes that formed clades with non-Actinobacterial homologs were binned into their respective taxonomic groups and given a positive HGT score. Evidence of HGT was also inferred from RSD analyses of each genome against a compiled set of 27 finished Actinobacterial genomes that included at least two representatives of each genus for which sequences were available. Genes present in SA and/or ST and not observed among the 27 Actinobacterial genomes were assigned a positive HGT score. Bacteriophage were identified using Prophage (Bose 2006) and Phage Finder (Fouts 2006). Other insertion elements were identified as prophage or transposon in origin through blastX homology searches. Gene annotation based on searches for identity across PFAM, SPTR, KEGG and COG databases was also used to help identify mobile genetic elements (MGEs). Each gene associated with an MGE was assigned a positive HGT score. Test scores were amalgamated and those genes showing evidence of HGT in two or more tests (maximum score 6) were classified as horizontally acquired. The results were mapped onto the genome and genes identified by only one test but associated with clusters of genes that scored in two or more tests were added to the total HGT pool. Adjacent clusters were merged.
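The amalgamation step can be sketched as follows, given one integer score per gene in chromosomal order. This is an illustration under a stated assumption: a gene scoring in only one test is treated as "associated with" an HGT cluster when it is immediately adjacent to a gene scoring two or more. The text does not define the association rule, and the function name is hypothetical.

```python
from typing import List, Sequence


def classify_hgt(scores: Sequence[int], min_score: int = 2) -> List[bool]:
    """Classify genes as horizontally acquired from per-gene test scores.

    Genes with a score of at least min_score are classified directly.
    Genes with a score of exactly 1 are added only if a neighboring gene
    (in chromosomal order) was classified directly. Adjacency is an
    assumed stand-in for cluster association.
    """
    direct = [s >= min_score for s in scores]
    result = list(direct)
    for i, score in enumerate(scores):
        if score != 1:
            continue
        left = i > 0 and direct[i - 1]
        right = i + 1 < len(scores) and direct[i + 1]
        if left or right:
            result[i] = True
    return result


# Hypothetical scores for eight consecutive genes.
print(classify_hgt([0, 1, 3, 2, 1, 0, 1, 0]))
```

Only directly classified genes can recruit a neighbor, so a run of single-test genes does not propagate along the chromosome.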
CRISPRs were identified using CRISPR finder (http://crispr.u-psud.fr/Server/CRISPRfinder.php) while repeats larger than 35 bases were identified using Reputer (Kurtz et al 2001). Secondary metabolite gene clusters were manually annotated as described previously (Udwary et al 2007). Cluster boundaries were predicted using previously reported gene clusters when available, as in the case of rifamycin. For unknown clusters, loss of gene conservation across the Actinobacteria was used to aid boundary predictions. In the future, programs such as “ClustScan” may prove useful for pathway annotation and product prediction (Starcevic et al 2008). However, many biosynthetic genes are large (5-10 kb) and highly repetitive, creating challenges associated with gene calling and assembly (e.g., Udwary et al 2007) and the interpretation of operon structure. The ratio of non-synonymous to synonymous mutations (dN/dS) for all orthologs was calculated using the Perl program SNAP (http://www.hiv.lanl.gov) with the alignments for all values >1 checked manually.
The ST and SA genomes share 3606 orthologs, representing 79.4% and 73.2% of the respective genomes (Table 1). The average nucleotide identity among these orthologs is 87.2%, well below the 94% cut-off that has been suggested to delineate bacterial species (Konstantinidis and Tiedje 2005). Despite differing by only seven nucleotides (99.7% identity) in the 16S rRNA gene, the genome of SA is 603 kb (11.6%) larger and possesses 1505 species-specific genes compared to 987 in ST. Seventy-five percent of these species-specific genes are located in 21 genomic islands (Tables 1, S1), none of which are comprised of genes originating entirely from one genome (Figure 1). The presence of genomic islands in the same location on the chromosomes of closely related bacteria is well recognized (Coleman et al 2006) and facilitated by the presence of tRNAs (Tuanyok et al 2008). Twelve islands in the Salinispora alignment share at least one tRNA between both genomes and of those, four share two or more tRNAs within a single island indicating multiple insertion sites. In addition to tRNAs, direct repeats detected in the same location in both genomes could also act as insertion sites to help create islands. These islands are enriched with large clusters of genes devoted to the biosynthesis of secondary metabolites (Figure 1). They house all 25 of the species-specific secondary metabolic pathways, while eight of the 12 shared pathways occur in the genus-specific core (Tables 2, 3). We have isolated and identified the products of eight of these pathways, which include the highly selective proteasome inhibitor salinosporamide A (Feling et al 2003) as well as sporolide A (Buchanan et al 2005), which is derived from an enediyne polyketide precursor (Udwary et al 2007), one of the most potent classes of biologically active agents discovered to date.
A previous analysis of 46 Salinispora strains revealed that secondary metabolite production is the major phenotypic difference among the three species (Jensen et al 2007), an observation supported by the analysis of the S. tropica genome (Udwary et al 2007).
Of the eight secondary metabolites that have been isolated from the two strains, all but salinosporamide A, sporolide A, and salinilactam have been reported from unrelated taxa (Figure 1), providing strong evidence of HGT. Further evidence for HGT comes from a phylogenetic analysis of the polyketide synthase (PKS) genes associated with the rifamycin biosynthetic gene cluster (rif) in SA and Amycolatopsis mediterranei, the original source of this antibiotic (Yu 1999). This analysis confirms prior observations of HGT in this pathway (Kim et al 2006) and reveals that all 10 of the ketosynthase domains are perfectly interleaved, as would be predicted if the entire PKS gene cluster had been exchanged between the two strains (Figure S1). Evidence of HGT coupled with prior evidence for the fixation of specific pathways such as rif among globally distributed SA populations (Jensen et al 2007) supports vertical inheritance following pathway acquisition (Ochman 2005). This evolutionary history is what might be expected if pathway acquisition fostered ecotype diversification or a selective sweep (Cohan 2002) resulting from strong selection for the acquired pathway, either of which provide compelling evidence that secondary metabolites represent functional traits with important ecological roles. The concept that gene acquisition provides a mechanism for ecological diversification that may ultimately drive the formation of independent bacterial lineages has been previously proposed (Ochman et al 2000). The inclusion of secondary metabolism among the functional categories of acquired genes that may have this effect sheds new light on the functional importance and evolutionary significance of this class of genes. 
Although the ecological functions of secondary metabolites remain largely unknown, and thus it is not clear how these molecules might facilitate ecological diversification, there is mounting evidence that they play important roles in chemical defense (Haeder et al 2009) or as signaling molecules involved in population or community communication (Yim et al 2007).
Differences between the two species also occur in CRISPR sequences, which are non-continuous direct repeats separated by variable (spacer) sequences that have been shown to confer immunity to phage (Barrangou et al 2007). The ST genome carries three intact prophage and three CRISPRs (35 spacers), while only one prophage has been identified in the genome of SA, which possesses eight different CRISPRs (140 spacers). The SA prophage is unprecedented among bacterial genomes in that it occurs in two adjacent copies that share 100% sequence identity. These copies are flanked by tRNA att sites and separated by an identical 45 bp att site, suggesting double integration as opposed to duplication (te Poele et al 2008). Remarkably, four of the SA CRISPRs possess a spacer that shares 100% identity with portions of three different genes found in ST prophage 1 (Figure 2). These spacer sequences have no similar matches to genes in the SA prophage or in any prophage sequences deposited in the NCBI, CAMERA, or the SDSU Center for Universal Microbial Sequencing databases. The detection of these spacer sequences provides evidence that SA has been exposed to a phage related to one that currently infects ST and that SA now maintains acquired immunity to this phage genotype as has been previously reported in other bacteria (Barrangou et al 2007). This is a rare example in which evidence has been obtained for CRISPR-mediated acquired immunity to a prophage that resides in the genome of a closely related environmental bacterium. Given that SA strain CNS-205 was isolated from Palau while ST strain CNB-440 was recovered 15 years earlier from the Bahamas, it appears that actinophage have broad temporal-spatial distributions or that resistance is maintained on temporal scales sufficient for the global distribution of a bacterial species.
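A spacer that shares 100% identity with part of a prophage gene is, computationally, an exact substring match on one of the two DNA strands. The sketch below shows that check. It is not the search used in this study (the method is not stated here), and the function names, identifiers, and sequences are hypothetical.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_exact_spacer_hits(
    spacers: Dict[str, str], genes: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (spacer_id, gene_id, strand) for every exact spacer match.

    A spacer is searched as given ("+") and, if that fails, as its
    reverse complement ("-"). Only perfect, full-length matches count.
    """
    hits = []
    for spacer_id, spacer in spacers.items():
        forward = spacer.upper()
        reverse = reverse_complement(forward)
        for gene_id, gene in genes.items():
            target = gene.upper()
            if forward in target:
                hits.append((spacer_id, gene_id, "+"))
            elif reverse in target:
                hits.append((spacer_id, gene_id, "-"))
    return hits


# Toy sequences, far shorter than real spacers or genes.
spacers = {"spacer_1": "AAACCCGG"}
genes = {
    "gene_a": "TTTAAACCCGGTTT",  # contains the spacer as written
    "gene_b": "GGCCGGGTTTAA",    # contains its reverse complement
    "gene_c": "ACACACAC",        # no match
}
print(find_exact_spacer_hits(spacers, genes))
```

For database-scale searches, an alignment tool would be the practical choice; the substring test is shown only because it makes the 100% identity criterion explicit.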
Enhanced phage immunity, as evidenced by 140 relative to 35 CRISPR spacer sequences, coupled with a larger genome size and a greater number of species-specific secondary metabolic pathways may account for the cosmopolitan distribution of SA relative to ST, which to date has only been recovered from the Caribbean (Jensen and Mafnas 2006). Also included among the SA-specific gene pool is a complete phospho-transferase system (PTS, Sare4844-4850). PTSs are centrally involved in carbon source uptake and regulation (Parche et al 2000) and may provide growth advantages that also factor into the relatively broad distribution of SA. However, additional strains will need to be studied before any of these differences can be firmly linked to species distributions.
The 21 genomic islands are not contiguous regions of species-specific DNA but were instead created by a complex process of gene acquisition, loss, duplication, and inactivation (Figure 3). The overall composition, evolutionary history, and function of the island genes are similar in both strains, with duplication and HGT accounting for the majority of genes and secondary metabolism representing the largest functionally annotated category. Remarkably, 42% of the rearranged island orthologs fall within other islands indicating that inter-island movement or “island hopping” is common, thus providing support for the hypothesis that islands undergo continual rearrangement (Coleman et al 2006). There is dramatic, operon-scale evidence of this process in the shared yersiniabactin pathways (ST sid2 and SA sid1), which occur in islands 15 and 10, respectively, and in the unknown dipeptide pathways (ST nrps1 and SA nrps3), which occur in islands 4 and 15, respectively. In both cases, these pathways remain intact yet are located in different islands in the two strains (Figure 1, Tables 2, 3). There is also evidence of cluster fragmentation in the 10-membered enediyne gene set SA pks3, which contains the core set of genes associated with calicheamicin biosynthesis (Figure S2) (Ahlert et al 2002), yet is split by the introduction of 145 kb of DNA from three different biosynthetic loci (island 10, Figure 1). The conserved fragments appear to encode the biosynthesis of a calicheamicin analog, while flanking genes display a high level of gene duplication and rearrangement indicative of active pathway evolution. Cluster fragmentation is also observed in the 9-membered enediyne PKS cluster SA pks1(A-C), which is scattered across the genome in islands 4, 10, and 21 (Figure 1, Table 3).
The genomic islands are also enriched in mobile genetic elements including prophage, integrases, and actinobacterial integrative and conjugative elements (AICEs) (Burrus et al 2002) (Tables S2, S3), the latter of which are known to play a role in gene acquisition and rearrangement. The Salinispora AICEs possess traB homologs, which promote conjugal plasmid transfer in mycelial streptomycetes (Reuther 2006), suggesting that hyphal tip fusion is a prominent mechanism driving gene exchange in these bacteria. AICEs have been linked to the acquisition of secondary metabolite gene clusters (te Poele et al 2007) and their occurrence in island 7 (SA AICE1), which includes the entire 90 kb rif cluster, and island 10 (SA AICE3), which contains biosynthetic gene clusters for enediyne, siderophore, and amino acid-derived secondary metabolites, provides a mechanism for the acquisition of these pathways (Figure 1). Six additional secondary metabolite gene clusters (ST nrps1, ST spo, SA nrps3, SA pks5, SA cym, and SA pks2) are flanked by direct repeats, providing further support for HGT. In the case of cym (Schultz 2008), which is clearly inserted into a tRNA, the pseudogenes preceding and following it are all related to transposases or integrases providing a mechanism for chromosomal integration.
Despite exhaustive analyses of HGT, only 22% of the 127 genes in the five biosynthetic pathways (rif, sta, des, lym, cym) whose products have also been observed in other bacteria (Figure 1, Table 3) scored positive for HGT. This observation suggests that the pathways either originated in Salinispora or that the exchange of these biosynthetic genes has occurred largely among closely related bacteria and therefore gone undetected with the HGT methods applied in this study. The latter scenario is supported by the observation that all five of the shared biosynthetic pathways were previously reported in other actinomycetes. The acquisition of genes from closely related bacteria likely accounts for many of the species-specific island genes for which no evidence of evolutionary history could be determined (Figure 3b). These genes were poorly conserved among 27 Actinobacterial genomes (Figure 3d) providing additional support that they were acquired, most likely from environmental Actinobacteria that are not well represented among sequenced genomes. Although gene loss was not quantified, this process is also a likely contributor to island formation. In support of an adaptive role for island genes, 7.6% (44/573) of the orthologs show evidence of positive selection (dN/dS >1) compared to 1.6% (49/3027) of the non-island pairs. Given that the majority of island genes display evidence of HGT, the increased dN/dS ratio is in agreement with the observation that acquired genes experience relaxed functional constraints (Hao and Golding 2006).
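The comparison of positive selection between island and non-island orthologs can be reproduced from the counts given above. The sketch below computes the two percentages and their ratio; the variable and function names are illustrative, and no significance test is implied because none is reported here.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Counts taken from the text: ortholog pairs with dN/dS > 1 out of all pairs.
island_positive, island_total = 44, 573
non_island_positive, non_island_total = 49, 3027

island_pct = percent(island_positive, island_total)
non_island_pct = percent(non_island_positive, non_island_total)
fold_enrichment = island_pct / non_island_pct

print(f"island: {island_pct:.2f}%")
print(f"non-island: {non_island_pct:.2f}%")
print(f"fold enrichment: {fold_enrichment:.2f}")
```

Computed from these counts, the island fraction is roughly 7.7% and the non-island fraction roughly 1.6%, so island orthologs are a little under five times as likely to show dN/dS >1.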
Functional differences between related organisms can be obscured when orthologs are taken out of the context of the gene clusters in which they reside. For example, the PKS genes Sare1250 and Stro2768 are orthologous and likely perform similar functions, yet they reside in the rif and slm pathways, respectively, and thus contribute to the biosynthesis of dramatically different secondary metabolites. Likewise, intra-cluster PKS gene duplication (Sare3151 and Sare3152, Figure 1) has an immediate effect on the product of the pathway by the introduction of an additional acyl group into the carbon skeleton of the macrolide, as opposed to the more traditional concept of paralogy facilitating mutation-driven functional divergence (Prince 2002). Sub-genic, modular duplications are also observed (Sare3156 modules 4 and 5, Figure 1), which likewise have an immediate effect on the structure of the secondary metabolite produced by the pathway. While HGT is considered a rapid method for ecological adaptation in bacteria (Ochman et al 2000), PKS gene duplication provides a complementary evolutionary strategy (Fischbach et al 2008) that could lead to the rapid production of new secondary metabolites that subsequently drive the creation of new adaptive radiations.
Salinispora species are the first marine Actinobacteria reported to require seawater for growth (Maldonado et al 2005). Unlike Gram-negative marine bacteria, in which seawater requirements are linked to a specific sodium ion requirement (Kogure 1998), Salinispora strains are capable of growth in osmotically adjusted, sodium-free media (Tsueng and Lam 2008). An analysis of the Salinispora core for evidence of genes associated with this unusual osmotic requirement reveals a highly duplicated family of 29 polymorphic membrane proteins (PMPs) that include homologs associated with polymorphic outer membrane proteins (POMPs). POMPs remain functionally uncharacterized; however, there is strong evidence that they are type V secretory systems (Henderson 2001), making this the first report of type V autotransporters outside of the Proteobacteria (Henderson 2004). Phylogenetic analyses provide evidence that the Salinispora PMPs were acquired from aquatic, Gram-negative bacteria and that they have continued to undergo considerable duplication subsequent to divergence of the two species (Figure S3). The occurrence of this large family of PMP autotransporters in marine Actinobacteria may represent a low nutrient adaptation that renders cells susceptible to lysis in low osmotic environments.
In conclusion, the comparative analysis of two closely related marine Actinobacterial genomes provides new insight into the functional traits associated with genomic islands. It has been possible to assign precise, physiological functions to island genes and link differences in secondary metabolism to fine-scale phylogenetic architecture in two distinct bacterial lineages, which by all available metrics maintain the fundamental characteristics of species-level units of diversity. It is clear that gene clusters devoted to secondary metabolite biosynthesis are dynamic entities that are readily acquired, rearranged, and fragmented in the context of genomic islands, and that the results of these processes create natural product diversity that can have an immediate effect on fitness or niche utilization. The high level of species specificity associated with secondary metabolism suggests that this functional trait may represent a previously unrecognized force driving ecological diversification among closely related, sediment inhabiting bacteria.
This manuscript is dedicated to Professor William Fenical for his pioneering work on the secondary metabolites of marine actinomycetes. PRJ and BSM were funded by the California Sea Grant Program (R/NMP-98), NOAA grant NAO80AR4170669, and the JGI Community Sequencing Program. Additional funding was from NIH grant CA127622 to BSM and a post-doctoral fellowship from the DAAD to MN. EEA thanks the Gordon and Betty Moore Foundation for funding through CAMERA. We acknowledge Dr. Jonathan Badger for assistance with APIS and Prof. Terry Gaasterland for computational assistance.
Genome sequences have been deposited in GenBank under accession numbers CP000667 (S. tropica) and CP000850 (S. arenicola).
Supplementary information is available at the ISME Journal's website.