|Home | About | Journals | Submit | Contact Us | Français|
The uncultured bacterial symbiont “Candidatus Endobugula sertula” is known to produce cytotoxic compounds called bryostatins, which protect the larvae of its host, Bugula neritina. The symbiont has never been successfully cultured, and it was thought that its genome might be significantly reduced. Here, we took a shotgun metagenomics and metatranscriptomics approach to assemble and characterize the genome of “Ca. Endobugula sertula.” We found that it had specific metabolic deficiencies in the biosynthesis of certain amino acids but few other signs of genome degradation, such as small size, abundant pseudogenes, and low coding density. We also identified homologs to genes associated with insect pathogenesis in other gammaproteobacteria, and these genes may be involved in host-symbiont interactions and vertical transmission. Metatranscriptomics revealed that these genes were highly expressed in a reproductive host, along with bry genes for the biosynthesis of bryostatins. We identified two new putative bry genes fragmented from the main bry operon, accounting for previously missing enzymatic functions in the pathway. We also determined that a gene previously assigned to the pathway, bryS, is not expressed in reproductive tissue, suggesting that it is not involved in the production of bryostatins. Our findings suggest that “Ca. Endobugula sertula” may be able to live outside the host if its metabolic deficiencies are alleviated by medium components, which is consistent with recent findings that it may be possible for “Ca. Endobugula sertula” to be transmitted horizontally.
IMPORTANCE The bryostatins are potent protein kinase C activators that have been evaluated in clinical trials for a number of indications, including cancer and Alzheimer's disease. There is, therefore, considerable interest in securing a renewable supply of these compounds, which is currently only possible through aquaculture of Bugula neritina and total chemical synthesis. However, these approaches are labor-intensive and low-yielding and thus preclude the use of bryostatins as a viable therapeutic agent. Our genome assembly and transcriptome analysis for “Ca. Endobugula sertula” shed light on the metabolism of this symbiont, potentially aiding isolation and culturing efforts. Our identification of additional bry genes may also facilitate efforts to express the complete pathway heterologously.
Marine invertebrates, such as tunicates and sponges, are known to harbor symbiotic communities of bacteria. Because these animals are sessile and have limited physical defenses, their microbiomes are thought to serve defensive functions, and bacterial symbionts in these invertebrates have been implicated in the production of many bioactive small molecules (1, 2). Such small molecules can possess therapeutically relevant activities, so there is great interest in studying the biosynthetic potential and symbiotic functions of these defensive microorganisms (1, 2). However, most symbionts (and most environmental bacteria in general [1, 3, 4]) are difficult to culture, which renders their bioactive metabolites challenging to access on an industrial or clinically useful scale. Due to fastidious or cryptic growth requirements, the study of these systems is possible only through culture-independent methods, such as direct sequencing of environment-derived DNA (termed shotgun metagenomics).
In the present work, we used shotgun metagenomics to gain a greater understanding of the uncultured bacterial symbiont “Candidatus Endobugula sertula,” which resides within the marine bryozoan Bugula neritina (Fig. 1A), where it is known to produce defensive compounds called bryostatins (5,–8) (Fig. 1B). Although the adult bryozoan is covered in a protective layer of chitin, the larvae require chemical defense after release (9,–11), and high concentrations of “Ca. Endobugula sertula” have been observed within free-swimming larvae (10).
Despite many unsuccessful attempts to culture the producing organism in laboratory settings, the bryostatins remain a target of therapeutic interest (7), as they are cytotoxic highly potent protein kinase C activators that have been investigated in clinical trials for the treatment of cancer, Alzheimer's disease, and HIV infection (Fig. 1B) (7). The bry pathway for bryostatin production is what is known as a trans-acetyltransferase (AT)-type polyketide synthase (PKS) pathway (8, 12). PKSs are related to fatty acid synthases, and they construct a molecule in a similar fashion from two-carbon units derived from malonate (13), using multiple enzymatic domains in a typical PKS protein. Most PKS pathways include AT domains within PKS proteins that are used to load the activated S-coenzyme A (CoA) thioester form of malonate, but in trans-AT systems, the AT exists on a separate protein. Portions of the bry pathway for the bryostatins were recovered previously using clone library methods (6), but the complete genome sequence, including missing pieces of the bry pathway, has remained elusive due to recalcitrance of the symbiont to culture. A number of factors are thought to contribute to this recalcitrance, including the possibility of substantial genome reduction (7), which has been reported in other invertebrate symbioses (14,–16). Indeed, the divergence of symbionts in genetically isolated hosts (17), as well as restriction to B. neritina and allied species, the vertical mode of symbiont transmission, and the ongoing inability to isolate and culture “Ca. Endobugula sertula” suggested a lifestyle of extreme host dependence that can lead to gene loss and genome reduction (18). However, we recently found that under some circumstances, “Ca. Endobugula sertula” may be horizontally transferred between B. neritina individuals (19), which is not consistent with strict host dependency that is associated with extreme genome reduction (18).
Our objective was to assemble the genome of “Ca. Endobugula sertula” in order to determine whether the symbiosis has evolved to a state of codependency, evidenced by bacterial genome degradation, as in some other defensive symbioses (14,–16). Shotgun DNA and RNA sequencing revealed a genome with few signs of reduction, largely intact mainstream metabolic pathways, and a putative mechanism of vertical transmission of “Ca. Endobugula sertula” to host larvae.
Collection of samples AB1_ovicells and MHD_larvae has been described elsewhere (76). Briefly, an adult individual (AB1) was collected by hand from the sides of a floating dock in November 2013 near Morehead City, NC, at site Atlantic Beach (AB; 34°42′24.527″N 76°44′18.286″W). Mature larval brooding chambers (ovicells) were dissected from AB1 and combined to form sample AB1_ovicells. Additional B. neritina individuals were collected at Morehead City Docks (MHD; 34°43′8.879″N 76°42′49.838″W), also in November 2013. Approximately 20 of these B. neritina colonies were used for combined larval collection. The collected larvae were combined to form sample MHD_larvae. Both samples were genotyped using Linneman et al.'s protocol (19) and found to be the “shallow” (S) genotype. Both samples were preserved in RNAlater and stored at −80°C.
DNA was extracted from RNAlater-preserved tissue samples using a procedure previously optimized for tunicate shotgun metagenomics (20). A portion of AB1_ovicells was ground with a mortar and pestle under liquid nitrogen before being resuspended in 5 ml of 2 mg/ml lysozyme in Tris-EDTA (TE) buffer. A portion of MHD_larvae was added directly to 5 ml of 2 mg/ml lysozyme in TE. In both cases, extractions were incubated at 30°C, with shaking, for 1 h. After this time, 1.2 ml of 0.5 M EDTA was added to each tube along with proteinase K (final concentration, 0.2 mg/ml; Qiagen), and mixtures were incubated at 30°C for 5 min. After the addition of 650 μl of 10% SDS, the mixtures were incubated at 37°C with shaking overnight. NaCl (1.2 ml of 5 M) was then added to each tube, along with 1.0 ml of cetyltrimethylammonium bromide (CTAB)-NaCl solution (10% CTAB in 0.7 M NaCl), and the tubes were incubated at 65°C for 20 min. Mixtures were extracted twice with 1:1 phenol-chloroform, and 1 volume of isopropanol was added to the aqueous fraction, which was then stored at 4°C overnight. Tubes were spun down at 3,220 × g for 30 min at 0°C. Supernatants were carefully removed, and 2 ml of 70% ethanol in water was added to each tube before they were spun down again. The supernatants were removed, and tubes were inverted for 20 min before 500 μl of TE was added. The tubes were left overnight at 4°C to allow DNA to dissolve, before extractions were subjected to repurification by Genomic-tip 100/G (Qiagen), according to the manufacturer's instructions. This procedure yielded 9.4 μg of DNA from AB1_ovicells and 882 ng of DNA from MHD_larvae (both in 500 μl, with the concentration measured by the PicoGreen assay). For metagenomic analysis, TruSeq libraries were prepared with ~300-bp inserts and then subjected to sequencing in 2 × 100-bp runs on a HiSeq 2000 (Illumina). RNA was extracted from approximately 40 mg of AB1_ovicells. After grinding with a mortar and pestle under liquid nitrogen, the material was resuspended in 600 μl of buffer RLT (Qiagen) containing 6 μl of β-mercaptoethanol. The mixture was homogenized by drawing a sterile 20-G needle up and down 15 times before being spun down at 16,800 × g for 3 min. Total RNA was then purified from the crude lysate using the RNeasy minikit (Qiagen), utilizing the optional DNase step. This procedure yielded 1.6 μg of RNA (in 100 μl). The resulting RNA was flash-frozen in liquid nitrogen and stored at −80°C. Prokaryotic and eukaryotic rRNA was depleted with the Ribo-Zero rRNA removal (epidemiology) kit (Epicentre), and eukaryotic polyadenylated transcripts were depleted with poly(T) beads. RNA in the resulting eluate was recovered and purified with Agencourt RNAClean XP beads. Stranded RNA sequencing (RNA-seq) Illumina libraries were prepared with ~300-bp inserts and subjected to two Illumina HiSeq 2000 sequencing runs, one at 2 × 101 bp and one at 2 × 151 bp. Sequence yields are shown in Table S1 in the supplemental material. Reads were quality filtered and then assembled with SPAdes (21), and contigs were taxonomically classified by using MEGAN (22) to process the results of BLASTP (protein-protein Basic Local Alignment Search Tool) searches of all predicted open reading frames (ORFs) against the NCBI nonredundant protein database (NR), as previously described (76). These classifications allowed bacterial contigs to be separated from eukaryotic and “unclassified kingdom” contigs likely originating from the host genome. AB1_ovicells and MHD_larvae metagenomes were compared by aligning raw MHD_larvae sequence reads to bacterial AB1_ovicells contigs of >3 kbp in length. Contigs with >1× read coverage in the resulting alignment were visualized in R (23) as points on a graph of G+C% versus coverage (see Fig. S1 in the supplemental material). Outlier contigs to the single apparent cluster were removed using mclust (24) in R (23). This cluster was determined to be the “Ca. Endobugula sertula” genome, based on phylogeny and inclusion of known “Ca. Endobugula sertula” genes (see Results).
Primers (see Table S3 in the supplemental material) were designed to have an annealing temperature of ~55°C manually and using the Primer3 algorithm (25) to test various aspects of genomic assemblies. For a 10-μl reaction matrix, the following volumes and concentrations of each component were used: 5 μl of 2 KOD buffer, 2 μl of 2 mM dinucleoside triphosphates (dNTPs), 0.2 μl of KOD Xtreme Hot Start DNA polymerase (Merck Group, Darmstadt, Germany), 1 μl of 3 μM forward primer, 1 μl of 3 μM reverse primer, and 0.8 μl of template (1:10 dilution of the AB1_ovicells DNA extraction). Reactions were carried out on a Bio-Rad (Hercules, CA) C1000 Touch thermal cycler in 200-μl 8-strip tubes using a thermocycling program consisting of 94°C for 2 min; 35 cycles of 98°C for 10 s, 55°C for 30 s, and custom extension time (1 min per kbp expected product) at 68°C; and then 68°C for 10 min with an indefinite hold at 12°C upon thermocycling completion.
RNA-seq data were filtered with SeqyClean (https://github.com/ibest/seqyclean), using the parameter “-polyat.” The resulting filtered reads were aligned with Bowtie 2 (26) to contigs in the “Ca. Endobugula sertula” assembly using the end-to-end alignment options “-very-sensitive -no-discordant -no-unal.” Contigs were annotated with the Prokka pipeline (27) and combined with RNA-seq alignment files in Geneious (Biomatters Ltd., San Francisco, CA) to visualize transcript abundance. Reads aligned to each ORF were counted in Geneious, and for each gene, normalized reads per kilobase pair of gene per million (RPKM) reads aligning to annotated ORFs in the “Ca. Endobugula sertula” genome (28) values were calculated.
MEGAN (22) was used to assign functional Kyoto Encyclopedia of Genes and Genomes (KEGG) categories to predicted protein sequences. KEGG trees were uncollapsed two levels in MEGAN, and all assignments except for “organismal systems” and “human diseases” were exported to a .csv file (with the columns “read name” and “KEGG name”). Calculated RPKM values and the MEGAN .csv table were used to calculate proportions of the “Ca. Endobugula sertula” transcriptome that corresponded to each KEGG category. Where multiple KEGG categories were assigned to one predicted gene, that gene's RPKM value was split equally among the assigned categories.
Predicted genes in the draft “Ca. Endobugula sertula” assembly with homologs in other gammaproteobacteria (Fig. 2; see also Fig. S4 and Table S2 in the supplemental material) were identified with a procedure previously used to determine homologs conserved between an intracellular symbiont and its closest free-living relative (47). Predicted protein sequences from the “Ca. Endobugula sertula” assembly were used as queries in BLASTP searches against the NR database, and the ranks of proteins from target bacteria were tabulated with a custom script. Genes were counted as having homologs in the target sequence set if a target sequence had a rank of <100 in the BLAST results. The best BLAST hit from the target sequence set was counted as the homolog of the protein query from the “Ca. Endobugula sertula” assembly.
Genes identified by annotation and homology were referenced to UniProt to associate and confirm Enzyme Commission (EC) numbers with each gene. The identified EC numbers were mapped to biochemical pathways and compared using BioCyc and KEGG pathways. The functional continuity of biochemical pathways was tested using a draft constraint-based model of “Ca. Endobugula sertula” containing 1,774 biochemical reactions using a biomass objective function and flux balance analysis (FBA).
AMPHORA2 (30, 31) was used to scan the “Ca. Endobugula sertula” assembly for 104 phylogenetic marker genes, which were extracted and manually examined to resolve instances of multiple hits. The marker genes were individually aligned to the internal reference database supplied with AMPHORA2. Genes from gammaproteobacterium IMCC1989 (32) were manually added, as this bacterium was not present in the AMPHORA2 database. The marker genes were individually aligned to the internal reference database supplied with AMPHORA2. The set of individual marker alignments was filtered such that only reference genomes with >75% of the marker genes were retained, and then marker genes represented in <75% of these genomes were removed. The resulting 29 marker alignments were concatenated, and residues not aligning to AMPHORA2's hidden Markov models (HMMs), signified by lowercase residues in the resulting alignment file, were removed from the alignment. Trees were then constructed with FastTree 2 (33) using the parameters “-gamma -slow -spr 10 -mlacc 3 -bionj.” After FastTree 2 runs were complete, accession numbers were substituted for strain designations according to entries in the RefSeq database. The resulting tree was rooted arbitrarily at the divergence of the phylum Deinococcus-Thermus and other bacteria, as others have done previously (30). The tree was manipulated using the Interactive Tree of Life server (34).
Raw shotgun metagenome reads for AB1_ovicells and MHD_larvae were deposited in the Sequence Read Archive (SRA). The accession numbers for AB1_ovicells are SRR4020077, SRR4020078, SRR4020081, SRR4020082, SRR4020083, and SRR4020084; the accession numbers for MHD_larvae are SRR4020079, SRR4020080, SRR4020086, SRR4020087, and SRR4020088. Metatranscriptomic RNA-seq reads for AB1 were deposited in the SRA under accession numbers SRR4020085 and SRR4125604. The annotated draft genome for AB1_endobugula assembly is accessible through the NCBI under the accession number MDLC00000000. All these data sets are accessible under BioProject PRJNA322176 and BioSample numbers SAMN05039512 (AB1_ovicells) and SAMN05513359 (MHD_larvae).
DNA from two separate samples was sequenced: free-swimming larvae from a pooled sample of B. neritina colonies (MHD_larvae), and ovicells dissected from a single adult colony (AB1_ovicells). Reference-based assembly of 16S sequences using EMIRGE (35, 36) reconstructed a 16S rRNA sequence with 99% identity to that of “Ca. Endobugula sertula” (type S strain, also termed BnSP; accession number AF006606 ) in both AB1_ovicells and MHD_larvae, but it suggested the presence of multiple bacteria in the AB1_ovicells sample that were not found in MHD_larvae. Through 16S rRNA amplicon sequencing, it was found that this “Ca. Endobugula sertula” sequence accounted for 2.4% and 5.6% of reads in AB1_ovicells and MHD_larvae, respectively, and apart from this symbiont, there was virtually no overlap in the wider microbiome between these two samples (76). Reads from both AB1_ovicells and MHD_larvae were assembled separately using SPAdes (21). Contigs from the AB1_ovicells assembly, which had generally higher-quality sequence assembly characteristics (larger overall assembly size, higher N50, etc.) than the pooled MHD_larvae sample (see Table S1 in the supplemental material), were classified taxonomically based on the homology of their open reading frames (ORFs) to the NCBI NR database. To simplify the overall metagenomic assembly of AB1_ovicells, contigs smaller than 3,000 bp and contigs that were classified as eukaryotic (~135 Mbp in total) or unclassified at the kingdom level based on these taxonomic assignments were removed.
Because the EMIRGE-reconstructed 16S sequences suggested that “Ca. Endobugula” was the only shared bacterium between the AB1_ovicells and MHD_larvae assembly, contigs with read coverage in both samples were identified, which should include sequences belonging to “Ca. Endobugula sertula” and any host genome contigs misclassified as bacteria. AB1_ovicells contigs that had MHD_larvae coverage appeared to form two discrete clusters based on G+C% and coverage (see Fig. S1 in the supplemental material). These two clusters were automatically separated using a technique where data are fitted to a certain number of groups based on discrete distribution patterns, known as normal-mixture modeling (24). The accuracy of this clustering approach was evaluated using the assigned ORF (see Fig. S2 in the supplemental material) and contig-based (see Fig. S3 in the supplemental material) taxonomy of these two clusters, as well as single-copy-marker analysis (Table 1). The contigs belonging to one of these groups (clusters 1 and 2) contained bry pathway components and had ORF taxonomic classifications consistent with previous 16S-based taxonomies of “Ca. Endobugula sertula,” such as known relatives gammaproteobacterium IMCC1989 (32) and Teredinibacter turnerae (38).
The draft genome of “Ca. Endobugula sertula” is 3.4 Mbp in size (112 contigs), has 41.2% G+C content, and is estimated to be 100% complete by single-copy-marker gene analysis (39) (Table 1). The “Ca. Endobugula sertula” 16S rRNA gene sequence reconstructed by EMIRGE was joined to one of the putative “Ca. Endobugula sertula” contigs through PCR and Sanger sequencing. In addition, a whole-genome phylogenetic tree constructed from concatenated bacterial marker genes placed the genome in a clade with known “Ca. Endobugula sertula” relatives gammaproteobacterium IMCC1989, T. turnerae, Saccharophagus degradans, and Cellvibrio japonicus (Fig. 2A) (17, 38, 40). All of these relatives are free living except for T. turnerae, which is an intracellular symbiont of wood-feeding shipworms (38), although it should be noted that T. turnerae can be grown in the laboratory, in contrast to “Ca. Endobugula sertula.”
“Candidatus Endobugula sertula” possesses many proteins with homologs in related gammaproteobacteria (Fig. 2B), but a large portion, including the bry pathway, are also not shared with the closest free-living relatives. Teredinibacter turnerae (38), S. degradans (41), and C. japonicus (42) are all specialized in the degradation of complex polysaccharides, but “Ca. Endobugula sertula” does not appear to possess homologs of the ~100 glycosyl hydrolase (GH) domains found in T. turnerae, which are used to digest wood for its host. “Candidatus Endobugula sertula” also does not appear to possess homologs of the nitrogen fixation genes found in T. turnerae (38), which compensate for a nitrogen-poor wood diet. However, a group of genes were identified that showed homology to proteins found in Photorhabdus and/or Xenorhabdus species (Fig. 2B), which are predominantly insect pathogens that associate with nematode vectors (43). These genes included several toxin complex (Tc) gene clusters (Fig. 2C), which have been associated with both host specificity and pathogenesis (44). One of these loci, on contig NODE28, was found to contain two chitinase genes.
“Candidatus Endobugula sertula” has, to our knowledge, never been cultured in the laboratory, and it has been previously suggested that its genome might be reduced (7), similar to some long-term symbionts of insects (15, 18) and marine tunicates (14, 16). Although the “Ca. Endobugula sertula” genome is the smallest out of several close relatives (Table 2), there is little other evidence to indicate genome degradation or reduction. It is 3.4 Mbp in size, larger than the previously predicted 2 Mbp based on flow cytometry (7), and despite the high A+T content (~70%) of certain regions in the bryostatin pathway (6, 7), the genome as a whole is only 58.4% A+T. Furthermore, the coding density of the “Ca. Endobugula sertula” genome is 85.1%, which falls into the average bacterial coding density range (85 to 90% ). We compared the lengths of genes in T. turnerae and gammaproteobacterium IMCC1989 to homologs in “Ca. Endobugula sertula” in an attempt to identify potential pseudogenes that were <80% of the length of their homologs (46, 47). A low number of genes, 30, were identified as potential pseudogenes, and most of these were annotated with functions associated with motility and regulatory sensors (see Fig. S4 and Table S2 in the supplemental material). Bioinformatic analysis and metabolic modeling, however, indicated potential metabolic insufficiencies in amino acid metabolism, as we could not find complete pathways to synthesize methionine and threonine (Fig. 3).
To gain a functional snapshot of “Ca. Endobugula sertula,” we extracted and sequenced RNA from the AB1_ovicells sample (Fig. 4). In “Ca. Endobugula sertula,” 12.1% of the functionally assigned transcriptome is dedicated collectively to putative symbiotic functions: bryostatin synthesis (3.6%) and the expression of Tc genes (8.5%). Despite apparent insufficiencies in amino acid biosynthesis, the single largest fraction (14.7%), by KEGG category, of the symbiont's transcriptome was dedicated to processes in amino acid metabolism (Fig. 4). A number of enzymes in the amino acid metabolism category are involved in arginine biosynthesis. For instance, argininosuccinate synthase (AB835_04910) and acetylornithine aminotransferase (AB835_00040) are the two most highly expressed enzymes in this amino acid metabolism category. Another highly expressed enzyme involved in amino acid biosynthesis was aspartate transaminase (AB835_06295), which is a multifunctional enzyme that is involved in aspartate, tyrosine, phenylalanine, and tryptophan metabolism. While the pathways for histidine and lysine appeared to be complete, present, and transcribed, lower expression levels of certain enzymes in these pathways (Fig. 3; see also Data Set S1 in the supplemental material) make the actual level of amino acid biosynthesis more uncertain, given that transcript levels often do not accurately predict downstream protein abundance (48).
The bry pathway was previously sequenced as a contiguous locus in the type S genotype of “Ca. Endobugula sertula,” containing five genes encoding PKS proteins (BryBCXDA) and various accessory genes, including the trans-AT bryP and genes involved in β-branching (bryQR, Fig. 5C) (6). The known sequence included several exact repeats, which complicated de novo assembly of this region in the AB1_ovicells metagenome. A procedure devised by Albertsen et al. (49) was used to resolve the repeats by identifying paired reads aligning at the ends of two (or more) contigs (Fig. 5A). Relative read coverage of repeats versus unique regions, as well as contig orientation, was used to construct a connection map and restrict the sequence to two possible structures (Fig. 5B). The correct bry locus structure, as determined by PCR (see Tables S3 and S4 in the supplemental material), was in agreement with the published sequence (accession no. EF032014). The published pathway, however, is missing some key components needed for β-branching to produce the methylated esters seen in the final bryostatin structures. In polyketides made by trans-AT PKS pathways, β-alkyl chains are commonly introduced by a set of proteins minimally including a standalone acyl-carrier protein (ACP), a decarboxylative ketosynthase (KS), a 3-hydroxy-3-methylglutaryl-CoA synthase (HMGS) and 1-2 enoyl-CoA-hydratases (ECH) (50) (Fig. 5C). A previously unknown ECH and an adjacent ACP were found in a separate locus (bryTU), and these are predicted to act in the dehydration of the new side chain and as a carrier for the malonate β-branch, respectively. These genes are homologous to corresponding genes in other trans-AT PKS pathways, with the closest homologs to bryT and bryU being batD (batumin/kalimantacin pathway , 53% amino acid identity) and calX (calyculin pathway , 45% amino acid identity), respectively.
Interestingly, all bry mRNA transcripts were detected through metatranscriptomics, except bryS, a methyltransferase previously thought to carry out the O-methylation of β-branches (6). Unlike other bry genes, bryS has a homolog in gammaproteobacterium IMCC1989 that is annotated as a tRNA-methyltransferase (accession no. WP_040805166) and therefore may not be part of the bry pathway, as previously predicted.
Reductive evolution is known to occur in a number of settings, including intracellular symbiosis (18, 53), intracellular pathogenesis (54), and free-living pelagic microbes (55). In strictly intracellular organisms, reduction is thought to occur due to genetic drift as a consequence of genetic isolation and strict vertical transmission, low effective population sizes, and frequent bottlenecks (18). In free-living examples, it is thought that reduction is driven by selection rather than drift (56), potentially due to a selective advantage in not investing in the production of a metabolite that has become a “public good” in the community (57). In the “Black Queen” model (57), when a metabolic pathway becomes rare enough in the community, further loss is selected against to prevent loss of community fitness as a whole.
Features in the “Ca. Endobugula sertula” genome were inconsistent with extreme reduction due to sequence drift. The early stages of such a process are characterized by a proliferation of pseudogenes and an accompanying low coding density (18). In contrast to the hundreds of pseudogenes identified in Sodalis glossinidius, a tsetse fly symbiont in the early stages of genome degradation (58), only 30 potential pseudogenes were found in the “Ca. Endobugula sertula” genome. Furthermore, the “Ca. Endobugula sertula” genome did not have a particularly low G+C content but did have a coding density within the typical bacterial range (45) and a genome size substantially larger than the previously predicted 2 Mbp (7). However, careful examination of the metabolic capabilities implied by the “Ca. Endobugula sertula” genome suggested that it lacked a number of enzymes required for the synthesis of certain amino acids (Fig. 3), potentially explaining its dependence on its host environment and, perhaps, other microbial constituents of the host microbiome. Metatranscriptomic analysis suggested that the symbiont was actively involved in metabolic processes related to amino acid metabolism, such as arginine biosynthesis. High expression levels of the arginine biosynthesis pathway may help explain an interesting observation made by a previous study that found that B. neritina larvae do not become depleted of nitrogen during the swimming stage prior to settlement, compared to the larvae of other Bugula species that are aposymbiotic (59). Perhaps then, the host's larval nutrition may be augmented through nitrogen recycling by “Ca. Endobugula sertula” in the form of ammonia assimilation as carbamoyl phosphate (60) before storage as arginine (61).
Previously, it was found that different populations and sibling species of Bugula harbor different strains of “Ca. Endobugula sertula” (7), which is a pattern of distribution consistent with a vertical mode of symbiont transmission, where respective populations have been genetically isolated since host divergence. However, we previously found evidence that horizontal transmission is likely also possible (19). We found that two sibling species of B. neritina, northern (N) and shallow (S), previously thought to be allopatrically distributed, actually coexist along the Western Atlantic. Type N populations are divergent from type S animals and are aposymbiotic in their typical northern range. However, in Western Atlantic populations, both type N and type S individuals can be found harboring 100% identical “Ca. Endobugula sertula” strains (as measured by 16S and internal transcribed spacer [ITS] sequences). The most parsimonious explanation for this observation is the horizontal transfer of symbionts from type S to type N individuals.
The noncongruence of host and symbiont phylogenies (17, 38, 40) also argues against a long evolutionary history of strict vertical transmission. Although “Ca. Endobugula sertula” has proven difficult to isolate and culture, the lack of extensive genome reduction suggests that a transient host-free existence may be possible, a plausible explanation for the symbiont acquisition by type N hosts found alongside symbiotic type S hosts at low latitudes (19). A similar situation is found in the tunicate Lissoclinum patella, which harbors a photosynthetic symbiont, Prochloron didemni (62). As with “Ca. Endobugula sertula,” P. didemni has never been cultivated in the laboratory, but its large genome size and the lack of evidence for genetic drift and accelerated evolution suggest that it might not be genetically isolated inside individual hosts (63). “Candidatus Endobugula sertula” appears to have specific metabolic deficiencies, and these deficiencies might be the reason that “Ca. Endobugula sertula” is dependent on the host. If the symbiont is indeed able to withstand short periods of time outside the host (i.e., it is not genetically isolated within individual hosts), these metabolic deficiencies may have come about due to selection for not investing in “public goods” that are supplied by the host (57). However, over evolutionary time scales, further metabolic deficiencies may develop if the symbiont becomes more strictly restricted to only living within the host and undergoes genome degradation and reduction, which would likely prevent horizontal transmission of the symbiont.
Despite few signs of genome degradation, the “Ca. Endobugula sertula” genome shows some potential adaptations to symbiotic life. Several pseudogenes were annotated with functions in flagellar assembly and motility, and similar functions are lost in intracellular Rickettsiales (64). The chitinase-containing Tc locus that was found on NODE28 is related to the Yen-Tc locus from the insect pathogen Yersinia entomophaga (65) and might also be involved in symbiosis. The chitinases in Yen-Tc are thought to associate with the secreted Tc assembly, which exhibits chitinase activity in vitro, to contribute to Y. entomophaga pathogenicity (65). The chitinase-containing Tc locus in “Ca. Endobugula sertula” is highly expressed in the ovicells, consistent with its use in allowing “Ca. Endobugula sertula” to move from the funicular cords within the adult host (5) through the potentially chitinaceous ectocyst (66, 67), thus ensuring vertical transmission to the larvae. The high expression of Tc loci in ovicells is suggestive of the importance of these proteins to the symbiont during the reproductive phase of its host. Other symbiotic systems have been shown to coopt mechanisms used in virulence and immunogenicity of pathogens, such as during the acquisition of the light-producing symbiont Vibrio fischeri by the Hawaiian bobtail squid, where the immunogenic bacterial products lipopolysaccharide and peptidoglycan are central to symbiont-host interactions (68, 69). In a similar fashion, the other Tc loci in the “Ca. Endobugula sertula” genome might also be involved in aspects of recognition and communication with the bryozoan host, and their presence might signify that ancestors of “Ca. Endobugula sertula” were pathogenic.
In the present work, a draft assembly of the “Ca. Endobugula sertula” genome was recovered and estimated to be 100% complete by single-copy-marker analysis (39) using comparative shotgun metagenomic data sets. Adapting a method developed by Albertsen et al. (49), the sequence of the repeat-heavy bry biosynthetic pathway was confirmed to agree with the sequence previously constructed using a clone library method and Sanger sequencing (6). Although additional pathway components were identified, enzymes in the “Ca. Endobugula sertula” genome that could be responsible for the addition of the ester side chains could not be found (Fig. 1B), suggesting that the symbiont employs either a novel mechanism that could not be identified bioinformatically or enzymes annotated with roles in primary metabolism, as others have observed (16, 70). Alternatively, the side chains may be installed by the host, rather than the symbiont. Additionally, because bryS showed no RNA-seq coverage, this gene may not be involved in the biosynthesis of bryostatins. If BryS is not responsible for O-methylation of β-branches, this conversion is most likely to be carried out by methyltransferase domains in BryA and BryB polyketide synthases.
Interestingly, although the majority of bacterial secondary metabolite pathways consist of genes clustered into one chromosome region (71), several symbiotic bacteria have been found to contain fragmented pathways (14,–16, 29), which include the previously known portions of the bry pathway found to be split into two separate loci in the type D genotype of “Ca. Endobugula sertula” (7). The AB1 draft genome of “Ca. Endobugula sertula” does not contain any other PKS pathway, apart from bry, that would explain the presence of newly identified bryTU genes. Identifying these genes likely implicated in bryostatin biosynthesis highlights the importance of recovering complete or near-complete genomes and, thus, the use of unbiased shotgun sequencing in capturing and understanding the fragmented biosynthetic pathways of uncultured microorganisms. Our identification of these two additional bry genes, along with the metabolic analysis of “Ca. Endobugula sertula,” may facilitate further studies into the renewable supply of bryostatins, either through heterologous expression or targeted symbiont culture.
This research was performed in part using the computer resources and assistance of the UW-Madison Center For High Throughput Computing (CHTC) in the Department of Computer Sciences. We thank Niels Lindquist (UNC) for assistance with field collections, Ben Oyserman (UW-Madison) for assistance with RNA-seq analysis, and Ahron Flowers (RMC) for assistance with B. neritina genotyping. We thank the University of Wisconsin Biotechnology Center DNA Sequencing Facility for providing sequencing facilities and library preparation services.
The CHTC is supported by UW-Madison, the Advanced Computing Initiative, the Wisconsin Alumni Research Foundation, the Wisconsin Institutes for Discovery, and the National Science Foundation, and is an active member of the Open Science Grid, which is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science. Additionally, this work utilized computer resources at Future Grid, which is supported by National Science Foundation grant 0910812. This work was supported by grant R21AI121704-01 from NIAID, as well as funding from The Thomas F. and Kate Miller Jeffress Memorial Trust (Bank of America, Trustee) and the American Foundation for Pharmaceutical Education (to I.J.M.), as well as the School of Pharmacy, the Graduate School, and the Institute for Clinical & Translational Research at the University of Wisconsin-Madison.
G.E.L.-F. and J.C.K. collected and prepared samples for analysis; I.J.M. and J.C.K. designed and performed the research; I.J.M., N.V., G.E.L.-F., S.S.F., and J.C.K. analyzed data; and I.J.M. and J.C.K. wrote the paper.
Supplemental material for this article may be found at http://dx.doi.org/10.1128/AEM.01800-16.