M.J.M. designed the experiments, performed all work not described below, and wrote the paper. M.S.W. conducted the phylogenetic studies. K.L.V. constructed plasmids and strains, and imaged biofilm phenotypes. M.J.M. planned and performed the genome assembly and analytics, and M.J.M., E.V.S., and E.G.R. analyzed the bioinformatics results.
Microbial symbioses are essential for the normal development and growth of animals1,2,3. Often, symbionts must be acquired from the environment during each generation, and identification of the relevant symbiotic partner against a myriad of unwanted relationships is a formidable task4. While examples of this specificity are well-documented, the genetic mechanisms governing it are poorly characterized5. Here we show that the two-component sensor kinase RscS is necessary and sufficient for conferring efficient colonization of Euprymna scolopes squid by bioluminescent Vibrio fischeri from the North Pacific Ocean. In the squid symbiont V. fischeri ES114, RscS controls light-organ colonization by inducing the Syp exopolysaccharide, a mediator of biofilm formation during initial infection. A genome-level comparison revealed that rscS, while present in squid symbionts, is absent from the fish symbiont V. fischeri MJ11. We found that heterologous expression of RscS in strain MJ11 conferred the ability to colonize E. scolopes in a manner comparable to that of natural squid isolates. Furthermore, phylogenetic analyses support an important role for rscS in the evolution of the squid symbiosis. Our results demonstrate that a regulatory gene can alter the host range of animal-associated bacteria. We show that, by encoding a regulator and not an effector that interacts directly with the host, a single gene can contribute to the evolution of host specificity by switching “on” pre-existing capabilities for interaction with animal tissue.
Genomic technologies are facilitating major advances in understanding the relationships between metazoans and their bacterial symbionts. The analysis of unculturable endosymbionts has revealed complex genetic interdependence between host and bacteria amid patterns of genome reduction in endosymbiotic lineages3. Similarly, members of the human microbiota are being identified through metagenomic analysis1,2, and the molecular communication between host and microbe has begun to be interpreted through transcriptional profiling6. Despite these advances, the mechanisms by which host-symbiont specificity develops in animal-bacterial interactions are not clear. Many animals, including humans, are born devoid of symbionts and must recruit their microbiota from the environment7. The process by which hosts and symbionts find each other to initiate a mutualism must be sensitive enough to identify the correct partner even when the symbiont is a minority constituent of the microbial community, and specific enough to exclude interlopers from gaining access to the host. The basis of species specificity is also poorly understood for pathogenic interactions, as similar congeneric bacteria often have distinct host ranges8,9,10.
In this study we used a comparative genomics approach to reveal how bacterial-host specificity is established in the Euprymna scolopes (squid)-Vibrio fischeri mutualism. We took advantage of the fact that V. fischeri strain MJ11—which was isolated from the Japanese pinecone fish, Monocentris japonica11—is unable to efficiently colonize E. scolopes. As such, comparison of MJ11 with natural squid symbionts provided a valuable system for examining the genomic basis of host specificity in an animal symbiont.
While the genome sequence of squid-symbiotic V. fischeri strain ES114 is known12,13, we determined here the sequence of the fish symbiont MJ11. Genome assembly of MJ11 was based on the ES114 model using a combination of PCR- and fosmid-based approaches. Genome sequencing also revealed a 179 kb circular plasmid in MJ11 that we term pMJ100, in which 82% of the ORFs are annotated as hypothetical proteins, and that is distinct from the plasmid carried by ES114. Alignment of the assembled chromosomes revealed two circular MJ11 chromosomes that are colinear to those in ES114 (Fig. 1a). Over 90% of ES114 ORFs are shared by MJ11, and the orthologs have a median amino acid identity of 98.8%. One exception to the high level of conservation was significant divergence observed specifically in the LuxR quorum-sensing system (Supplementary Fig. 1 and Supplementary Discussion).
Examination of ES114 genes for those that could facilitate specific recognition identified rscS as a promising candidate because its product acts during symbiotic initiation14,15, and we discovered it to be absent in the MJ11 genome (Fig. 1b). RscS is a membrane-bound two-component sensor kinase that acts upstream of the response-regulator SypG16. SypG, a σ54-dependent transcriptional activator, facilitates transcription of the eighteen-gene exopolysaccharide (EPS) locus sypA-R17. Production of the Syp EPS enables V. fischeri aggregation in squid-derived mucus during colonization of E. scolopes. During growth in culture, syp genes are expressed at low levels but can be induced by the plasmid-borne rscS1 overexpression allele15,17,18, leading to the production of robust biofilms. Because the MJ11 genome revealed an intact syp locus, we asked whether signal transduction downstream of rscS was maintained in MJ11 by introducing rscS1. As shown in Figure 1c–d, rscS1 in MJ11 induced multiple biofilm phenotypes, suggesting that MJ11’s syp locus was functional. We therefore examined whether rscS was sufficient to allow MJ11 to efficiently colonize E. scolopes.
We tested ES114 and MJ11 for their ability to colonize aposymbiotic E. scolopes hatchlings in a 3 h-inoculation assay. ES114 colonized successfully, whereas MJ11 failed to initiate colonization, even if present at a 10-fold higher inoculum concentration (Fig. 1e). However, when provided with rscS+ in trans, MJ11 was competent to colonize E. scolopes to levels comparable with those seen from the natural symbiont ES114 (Fig. 1e). Furthermore, the luminescence emitted by MJ11/rscS+-colonized animals was 100-fold greater than that from animals colonized by ES114. The increased luminescence is consistent with that of the brighter fish symbiont19, and was not influenced by plasmid carriage (Supplementary Fig. 2). This result argues that the carrying capacity of the juvenile squid light organ is specified by symbiont cell number and not by the amount of luminescence emitted, provided that a minimum threshold of light production is achieved20,21.
We next asked whether rscS was present in a collection of V. fischeri squid and fish isolates from the North Pacific Ocean to determine whether the gene’s host distribution was consistent with a functional role in nature. All V. fischeri in the analysis revealed the presence of three representative syp genes (Fig. 2a). In contrast, while all of the squid isolates encoded rscS, regardless of geography, only five of the ten fish isolates encoded rscS; in four of these five the allele was significantly divergent. PCR-amplification of the divergent alleles produced a band that was distinctively larger than the allele in ES114 and the rest of the squid isolates, whereas MJ12 was the only fish isolate that had this smaller squid symbiont-like band (Fig. 2a).
We term the allele encoded by the smaller band rscSA, and that encoded by the larger band rscSB. RscSA was found in all assayed North Pacific squid isolates, and fish isolate MJ12, while RscSB was identified only in four fish isolates. Sequencing revealed that, within each type, the RscS alleles are highly conserved (amino acid identity ≥ 96%), but that divergence between the types was greater (84–86%; Supplementary Fig. 3a). The presence of an identical domain structure in all V. fischeri RscS proteins (Supplementary Fig. 3b) led us to ask whether there was detectable functional significance to this level of divergence.
All of the rscSA strains were competent to colonize E. scolopes squid efficiently (Fig. 2b). In contrast, strains lacking rscS or encoding the divergent rscSB were unable to colonize consistently. The defect appeared to be due to RscS function and not due to the syp locus or other differences; introduction of rscS+ from ES114 (A-type) into rscSB-containing mjapo.8.1 conferred 100% colonization efficiency (Fig. 2b). The only fish-symbiotic strain that was competent to colonize E. scolopes reproducibly (MJ12) was also the only one with the conserved rscSA allele. Interruption of rscS in MJ12 abolished its ability to colonize E. scolopes (Fig. 2b), confirming that rscSA is both sufficient and necessary to colonize the squid host in these populations.
To understand the evolution of rscS and its role in determining specificity in nature, we reconstructed the phylogeny of V. fischeri strains (Fig. 3 and Supplementary Fig. 4) using three well-characterized loci. Strains encoding rscS formed a monophyletic group within V. fischeri that was statistically well-supported. Parametric bootstrapping rejected the alternative hypothesis of non-monophyletic origin for the rscS-encoding strains, at a significance level of P < 0.01.
We propose a model for rscS evolution in the symbioses of North Pacific Ocean squids and fish. This model represents a parsimonious synthesis of the colonization and genomics data, within the phylogenetic framework. Specifically, we hypothesize that an acquisition event introduced rscS into the V. fischeri lineage prior to this species’ expansion into squid hosts in the North Pacific Ocean (Fig. 3, Fig. 4). An initial acquisition, followed by vertical transmission of rscS among V. fischeri, would predict both a similar GC-content among all rscS alleles in the species, and a single conserved genomic location for the gene in all extant V. fischeri genomes. We confirmed these predictions, as the rscS alleles from V. fischeri have similar GC-content (Supplementary Fig. 3a) and are present in the same genomic position (Fig. 2a).
Because the fish isolates that contain rscS fall within the same clade as squid isolates (Fig. 3), we argue that the fish- and squid-symbiotic populations in Japan are, indeed, sympatric and that the rscS-containing fish isolates are descendants of squid symbionts. We hypothesize that rscSA diverged significantly in the fish host to generate the rscSB allele, which is not sufficient to allow these strains to colonize the squid host niche. Further, because RscSB maintained its reading frame and domain structure, despite significant amino acid divergence and loss of function for squid colonization, we hypothesize that RscSB is fish-adapted, and may play a role in activating syp, and/or other targets, under fish-specific conditions. The identification of a fish isolate encoding RscSA provides strong evidence that the rscSA locus does not preclude successful fish colonization by V. fischeri, despite the low frequency of this allele among the fish isolates examined. Unfortunately, M. japonica eggs do not fully develop in the laboratory, so we are unable to test this aspect of our model by investigating V. fischeri colonization of fish22.
There are two formal possibilities for how rscS first entered the V. fischeri lineage. Either a gene duplication/translocation event within V. fischeri led to the initial generation of rscS, or rscS was generated outside of V. fischeri and then acquired by horizontal gene transfer. We have found no DNA sequences paralogous to either of the rscS alleles in the full genomes of ES114 or MJ11. These data are also the most compelling, if indirect, evidence supporting the proposal of horizontal gene transfer. Nonetheless, it is difficult to reconstruct the event that introduced rscS to V. fischeri based on the usual criteria that define larger genomic islands (e.g., direct repeats or insertion elements in flanking DNA, or aberrant codon usage or GC content within rscS)23. Furthermore, the only convincing ortholog of rscS outside of V. fischeri is V. shiloi AK1 VSAK1_16757. If a horizontal transfer event were responsible for rscS transmission into V. fischeri, the V. shiloi ortholog is unlikely to be the source: the GC-content of ES114 rscS is 31.7%, or 6.6% below the ES114 genome average (38.3%). The V. shiloi ortholog has a GC-content (41.0%) that is even higher than this average, and 9.3% higher than that of the ES114 rscS allele.
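The GC-content comparison above is simple arithmetic, and a minimal Python sketch can make it explicit. The function name gc_percent and the constant names are illustrative assumptions, not part of the original analysis; the percentages are the values reported in the text.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Percent GC values as reported in the text.
ES114_RSCS = 31.7        # rscS allele of ES114
ES114_GENOME = 38.3      # ES114 genome average
VSHILOI_ORTHOLOG = 41.0  # V. shiloi AK1 ortholog

# Differences in percentage points, rounded to one decimal place.
below_genome = round(ES114_GENOME - ES114_RSCS, 1)
above_rscs = round(VSHILOI_ORTHOLOG - ES114_RSCS, 1)
```

Under these inputs the two differences correspond to the 6.6 and 9.3 percentage-point gaps quoted above.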
Attempts to understand the molecular basis of host specificity have been unsuccessful in many pathogenic host-animal interactions. Salmonella enterica serovar Typhi can infect only humans, whereas serovar Typhimurium has a broad host range that includes mice9. Although the conserved regions of the genomes of these two strains are over 97% identical, efforts to account for this differential host specificity have not succeeded. Similarly, different Brucella species share over 98% identity across 90% of their genes, yet exhibit strict host specificity; the molecular basis of this specificity remains unclear10. In contrast, the study of mutualisms is providing insight into how specificity develops. In plant-associated bacteria, work from many laboratories has established nitrogen-fixing, nodulating rhizobia as the best-understood system for the development and evolution of host specificity24. Bacteria secrete Nod factors—lipo-chitooligosaccharide signals—to the plant host, and host strain-specific backbone modifications encoded by the bacteria lead to relationship specificity. Recently, in an animal-bacterial mutualism, the nilABC genes of Xenorhabdus nematophila were characterized as sufficient for colonization of Steinernema carpocapsae worms by congeneric Xenorhabdus bacteria25.
In contrast to rhizobia and Xenorhabdus, in which specificity comes either from the modification of a secreted signal or from structural proteins in the cell envelope, respectively, rscS-mediated specificity in V. fischeri is novel because the immediate effect of cytoplasmic RscS is on bacterial gene expression, which only subsequently has an effect on the interaction with the host. Because RscS is a signal transduction protein, the evolutionary consequence of the introduction of rscS appears to be a reprogramming of inherent V. fischeri capabilities to expand the host range into squid populations that V. fischeri could not previously colonize, or could colonize only inefficiently. It remains a mystery as to why the syp genes are conserved in V. fischeri strains that are naïve to rscS (e.g., MJ11). That such syp clusters are functional, and ancestral to rscS in V. fischeri, strongly suggests that regulation of syp in these strains may be achieved in a manner independent of rscS. In support of this, there are V. fischeri isolated from the Mediterranean Sea that lack rscS, yet have syp genes and colonize squid hosts of a different genus through morphological structures that are conserved with those of E. scolopes26.
Our study indicates that a regulatory gene is sufficient to alter host range in an animal-bacterial mutualism. The fundamental biological question of how animal-bacterial partnerships are established has been difficult to access through investigations of pathogenic interactions. In contrast, mutualism evolves to confer joint benefits to its partners, and relies on a strict specificity for this outcome: i.e., entry of only a few appropriate symbionts and exclusion of the large number of nonspecific interlopers. The evolution of developmental mechanisms to winnow the appropriate partner(s) is a hallmark of all horizontally-acquired mutualisms. The binary squid-Vibrio system thus represents a valuable model in which to interrogate the mechanisms that underlie the development of bacteria-host specificity.
The previously-deposited draft genome of V. fischeri MJ11 was assembled into the final scaffold by comparing contigs to the ES114 assembly using Mauve27. Hypotheses were tested by PCR across the contig gaps or by sequencing of fosmids spanning the gap. In a minority of cases, no PCR product was produced, and the model was refined by rearranging contigs and retesting. In this manner, all of the contigs were arranged relative to ES114, and contig gaps that could be spanned by PCR were sequenced to complete the gap sequence. Three gaps on chromosome I contained tandem (≥2) rrn operons; in these cases, sequence flanking the gap was PCR-amplified through the first rRNA gene at each end of the rrn array so that the completed genome is expected to contain all predicted ORFs in V. fischeri MJ11.
We identified and corrected frameshift and nonsense mutations in the genome model12, and the final sequences were annotated by JCVI. To identify ES114-MJ11 orthologs, we performed reciprocal BLASTP searches between the predicted proteomes. Results were scored by percent identity, and alignments were required to cover at least 60% of each protein.
Full methods and associated references are available in the online version of the paper at www.nature.com/nature.
Standard microbial techniques were used to construct strains and plasmids28. Growth of V. fischeri was at 20–28°C with aeration. Media for growth of V. fischeri was LBS29, and for E. coli was LB30 or Brain Heart Infusion (Bacto, Becton Dickinson and Company, Sparks, MD). Antibiotics used include: chloramphenicol (2.5 µg/ml for V. fischeri, 25 µg/ml for E. coli), erythromycin (5 µg/ml for V. fischeri, 150 µg/ml for E. coli), and tetracycline (5 µg/ml for V. fischeri).
Plasmids pKV69, pLMS33, and pKG11 were introduced into V. fischeri by triparental conjugation as described previously28 with E. coli carrying the pEVS104 helper plasmid. Briefly, overnight cultures of the following strains were used for the reaction. One hundred microliters from each of donor and helper E. coli strains were pelleted at 16,000 × g for 2 min in a microfuge tube, and the supernatant was aspirated. One hundred microliters of recipient V. fischeri was added to the same tube, and pelleted as above. Following aspiration of the supernatant, the pellet was resuspended in 10 µl LBS, and the entire 10 µl was spotted onto an LBS agar plate, and incubated at 28°C overnight. The spot was resuspended in 500 µl LBS, and 50 µl were plated onto selective media (LBS-chloramphenicol) to select for plasmid transfer.
The rscS mutagenesis plasmid pKV188 was constructed by subcloning an approximately 700-bp internal rscS fragment from ES114, terminating in the internal PstI site, into the KpnI/PstI sites of pEVS12228. Mutagenesis of rscS in the strains noted (Supplementary Table 1) was as described previously31; briefly, following triparental conjugation as above, integration of the suicide vector was identified by selection of the entire mating spot on LBS-erythromycin.
V. fischeri strain MJ11 (alias MJ101) was isolated by sterile expression of the light-organ sample from a live M. japonica at the Steinhart Aquarium, in February 1991. M. japonica symbiont strains denoted “mjapo.#.#” were kindly shared by Paul Dunlap, University of Michigan.
Sources of other strains are as noted in Supplementary Table 1.
The V. fischeri MJ11 draft genome was sequenced by the J. Craig Venter Institute (JCVI) as part of The Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project. Cloning and shotgun sequencing were performed at JCVI; the draft was obtained at 8.57-fold coverage, and the contigs were previously deposited into GenBank as a whole genome shotgun (WGS) sequencing project, accession no. ABIH00000000, project version 01.
V. fischeri MJ11 contigs from the above project were aligned to the ES114 genome using Mauve27 and Projector2 32. Alignment of all contigs that were from multiple reads (>2 kb), excluding repetitive rDNA sequences, identified strong matches to the ES114 Chromosome I and Chromosome II, with the exception of 179 kb contig #1101159000798. Due to the high level of conservation between the strains, the corresponding ES114 sequences at all contig gaps were used as estimates of the gap lengths (and as guides for sequencing primers across long gaps), and primers were designed to amplify across each contig gap, including at least an additional 200 bp of overlap with each adjacent contig, and extending beyond regions of repetitive DNA at either side of the gap boundary. PCR primers were designed with Primer3Plus33 and sequencing primers were designed at the SGD web site34. In some cases, custom software was utilized to assist in the identification of probable-unique regions for primer-binding sites within extended repeat regions.
Contig #1101159000798 was unique in bearing no homology to ES114 or to any extended sequence in GenBank at the start of this project (Fall 2006). We postulated that this represented a large (circular) plasmid in MJ11, and consistent with this hypothesis primers pointing outward from both ends of the contig together amplified a small fragment (<1 kb).
Three of the contig gaps contain tandem rRNA operons, and PCR across the entire gap was unsuccessful. We initiated PCR-walking into each gap by amplifying from one end of the gap to a conserved region that was distal to the 16S–23S spacer on that end of the gap, using primers listed in Supplementary Table 2. Using this approach, we identified the six unique spacer sequences that comprised the terminal rRNA spacers for the three gaps, leaving only additional rRNA (and potentially tRNA) sequences remaining to be sealed within each of these three regions. These contigs were still assembled into a scaffold and submitted for deposition so that information regarding the relative positions of DNA sequences within the molecule (Chromosome I) was preserved.
We previously published methodologies to identify and correct sequencing errors in microbial genomes12. We applied this technology to the MJ11 genome and identified fifteen high-priority resequencing targets. Nine of these sites were in fact in error, and we corrected these errors for inclusion in the final genome release. The PCR/sequencing primers used to target these regions are listed in Supplementary Table 2.
Following assembly, the sequence was resubmitted to the JCVI Annotation Service and post-processing at JCVI and NCBI, and deposited into GenBank as listed in Supplementary Table 3.
We compared the predicted proteomes from both ES114 chromosomes against both MJ11 chromosomes. Reciprocal exhaustive BLASTP35 searches were performed with an expect cutoff of 10. Results were filtered to demand that the query length and subject length each be a minimum of 60% of their respective total lengths. Among the remaining results for each query protein, best-hits were scored by percent amino acid identity, and additional results were included for analysis if they scored at least 70% of the maximum score for that query. ES114-MJ11 protein pairs included on reciprocal lists were candidate orthologs, and for pairs in which there was a duplicate of query or subject protein, manual assignment of orthology was curated using the parameters of percent amino acid identity, percent of each protein aligned, and the local genomic context (synteny) of the two proteins.
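The filtering and reciprocity steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the Hit record, the function names, and the toy protein identifiers are hypothetical, and the hits are assumed to have been parsed already from BLASTP output. The manual curation of duplicated proteins by synteny is not modeled.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLASTP alignment (hypothetical record layout)."""
    query: str
    subject: str
    identity: float     # percent amino acid identity
    query_cov: float    # fraction of the query length aligned (0-1)
    subject_cov: float  # fraction of the subject length aligned (0-1)


def candidate_lists(hits, min_cov=0.60, rel_score=0.70):
    """Map each query to the subjects that pass the coverage filter and
    score at least rel_score times the best identity for that query."""
    by_query = defaultdict(list)
    for hit in hits:
        if hit.query_cov >= min_cov and hit.subject_cov >= min_cov:
            by_query[hit.query].append(hit)
    lists = {}
    for query, kept in by_query.items():
        best = max(hit.identity for hit in kept)
        lists[query] = {h.subject for h in kept if h.identity >= rel_score * best}
    return lists


def reciprocal_pairs(forward_hits, reverse_hits):
    """Return (a, b) pairs that appear on both reciprocal candidate lists."""
    fwd = candidate_lists(forward_hits)
    rev = candidate_lists(reverse_hits)
    return {(a, b) for a, subjects in fwd.items()
            for b in subjects if a in rev.get(b, set())}


# Toy data with made-up identifiers.
forward = [
    Hit("ES_1", "MJ_1", 98.8, 0.95, 0.97),
    Hit("ES_1", "MJ_2", 40.0, 0.90, 0.90),  # below 70% of the best score
    Hit("ES_2", "MJ_3", 90.0, 0.50, 0.90),  # fails the 60% coverage filter
]
reverse = [
    Hit("MJ_1", "ES_1", 98.8, 0.97, 0.95),
    Hit("MJ_3", "ES_2", 90.0, 0.90, 0.50),
]
pairs = reciprocal_pairs(forward, reverse)
```

Pairs returned here are only candidate orthologs; queries with more than one surviving partner would still need the manual assignment described above.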
The two biofilm phenotypes evaluated were colony morphology and pellicle formation. To assay for the ability to form wrinkled colonies, cells were streaked onto LBS-tetracycline agar and the plates were incubated at room temperature for two days. To assay for the ability to form pellicles, cells were inoculated into HEPES minimal medium36 containing 0.3% casamino acids, 0.2% glucose, and tetracycline at a final concentration of 30 µg/ml. After overnight growth at 28°C with shaking, cells were diluted to an OD600 of 0.1 in fresh HEPES minimal medium. 3 ml of cell suspensions were introduced into the wells of a 12-well microtiter dish and the cells were incubated statically at room temperature for five days. To facilitate visualization and imaging of the pellicles, the media surface was disturbed with a pipet tip, resulting in clumps of cells if a pellicle had formed.
Juvenile E. scolopes hatchlings were collected aposymbiotically, and washed in Instant Ocean (Aquarium Systems Inc., Mentor, OH) that was filter-sterilized through a 0.22-µm pore-sized Nalgene filter (FSIO, filter-sterilized Instant Ocean). Overnight cultures of bacteria in LBS were subcultured 1:40, grown for 70 min, assayed for OD600, and then inoculated into 40 ml of FSIO/squid at a volume equivalent to 1.25 µl per OD600. The inoculum was plated onto LBS plates to confirm that the bacterial concentration was 2–10 × 10³ CFU/ml (CFU, colony-forming units). Squid were washed with fresh FSIO at 3 h and 24 h post-inoculation. Luminescence of individual animals was recorded at 48 h post-inoculation, immediately before the animals were sacrificed by freezing at −80°C. Symbiont CFU/squid were determined by homogenizing thawed animals, and plating the homogenates onto LBS. For experiments involving pKV69-series plasmids, squid hatchlings were maintained in FSIO containing chloramphenicol at a final concentration of 2.5 µg/ml.
Luminescence of animals is reported as RLU = 24 × lum, where lum is the recorded luminescence of a single animal in a TD20/20 luminometer, with recordings performed in 4-ml FSIO in glass scintillation vials at 51.9% sensitivity with 6-sec integration. 1 RLU ≈ 1.98 × 10⁴ quanta/s. Animals with RLU > 25 were scored as colonized. Since the strains that failed to colonize E. scolopes were significantly brighter than ES114, this metric served as a conservative measure of colonization competency for the set of isolates examined in this study.
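The luminescence scoring rule can be written out as a short Python sketch. The function and constant names are illustrative assumptions; the conversion factor, the quanta equivalence, and the threshold are those given above.

```python
RLU_PER_LUM = 24          # RLU = 24 x lum
COLONIZED_RLU = 25        # animals with RLU > 25 are scored as colonized
QUANTA_PER_RLU = 1.98e4   # approximate quanta per second per RLU


def to_rlu(lum: float) -> float:
    """Convert a raw luminometer reading into RLU."""
    return RLU_PER_LUM * lum


def is_colonized(lum: float) -> bool:
    """Apply the strict RLU > 25 threshold to a single animal."""
    return to_rlu(lum) > COLONIZED_RLU


def quanta_per_second(lum: float) -> float:
    """Approximate photon output corresponding to a raw reading."""
    return to_rlu(lum) * QUANTA_PER_RLU
```

For example, a raw reading of 1 gives 24 RLU and is scored as uncolonized, whereas a reading of 2 gives 48 RLU and is scored as colonized.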
PCR amplification was conducted using Platinum Taq DNA Polymerase High-Fidelity (Invitrogen, Carlsbad, CA). Fifty-microliter reactions contained: 50 ng MJ11 genomic DNA, 1X reaction buffer, 0.2 mM of each dNTP, 2 mM MgSO4, 0.25 µM of each primer, and 1 U DNA Polymerase. At least three independent PCR reactions were combined for sequencing to minimize the effect of PCR error. Thermal cycling was conducted in a PTC-200 thermal cycler (MJ Research, Watertown, MA): 95°C for 2:00; then 30 cycles of 95°C for 0:30, 55°C for 0:30, 68°C for 0:30–1:00 per kb amplified; then 68°C for 5:00.
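As a rough planning aid, the cycling program above can be turned into a hold-time estimate. The sketch below is an assumption-laden illustration: the function names are hypothetical, the per-kb extension rate is a parameter chosen within the stated 0:30 to 1:00 range, and ramp times between temperatures are ignored.

```python
def extension_seconds(product_kb: float, sec_per_kb: float = 60.0) -> float:
    """Extension time at 68 C for a product of the given length.

    sec_per_kb must lie in the 30-60 s/kb range stated in the text.
    """
    if not 30.0 <= sec_per_kb <= 60.0:
        raise ValueError("sec_per_kb outside the stated 30-60 s/kb range")
    return product_kb * sec_per_kb


def program_seconds(product_kb: float, sec_per_kb: float = 60.0,
                    cycles: int = 30) -> float:
    """Sum of the hold times for the whole program (ramping ignored)."""
    initial_denaturation = 120.0  # 95 C for 2:00
    denaturation = 30.0           # 95 C for 0:30
    annealing = 30.0              # 55 C for 0:30
    final_extension = 300.0       # 68 C for 5:00
    per_cycle = denaturation + annealing + extension_seconds(product_kb, sec_per_kb)
    return initial_denaturation + cycles * per_cycle + final_extension
```

With a 2-kb product at 60 s/kb, each cycle holds for 180 s and the program totals 5,820 s of hold time, or about 97 min before ramping.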
Products > 5 kb were amplified with Platinum Pfx DNA Polymerase (Invitrogen, Carlsbad, CA) prior to sequencing. Fifty-microliter reactions contained: 50 ng MJ11 genomic DNA, 2X reaction buffer, 0.3 mM of each dNTP, 1 mM MgSO4, 0.30 µM of each primer, and 1 U DNA Polymerase. Thermal cycling was conducted as above.
Primers for MJ11 genome closure are listed in Supplementary Table 2.
Conditions were as described above (MJ11 genome closure) with the following alterations. Template preparation consisted of bacterial strains grown overnight in LBS, diluted 1:100 in dH2O, and then used as template in the PCR reactions at a dilution of 1:10. Annealing temperature for the sypR-internal primer set was 50°C. Primers for diagnostic amplification of rscS and syp locus genes are listed in Supplementary Table 2.
PCR amplification was conducted using GoTaq (Promega, Madison, WI). Bacterial strains were grown overnight in LBS, diluted 1:100 in dH2O, and then used as template in the PCR reactions at a dilution of 1:10. Twenty-five-microliter reactions contained: template preparation (2.5 µl), 1X colorless reaction buffer, 0.2 mM of each dNTP, 0.9 µM of each primer, and 1 U DNA Polymerase. Thermal cycling was conducted in a PTC-200 thermal cycler (MJ Research, Watertown, MA): 95°C for 3:00; then 26 cycles of 94°C for 0:30, 55°C for 0:30, 72°C for 1:00; then 72°C for 10:00. Primers for phylogenetic analyses are listed in Supplementary Table 2.
Sanger-type sequencing of PCR products for MJ11 genome assembly, and for phylogenetic analyses, was performed at the University of Washington High-Throughput Genomics Unit (Seattle, WA) and the University of Wisconsin Biotechnology Center DNA Sequencing Facility (Madison, WI), with the primers listed in Supplementary Table 2.
Sequence data for rscS from V. fischeri ES114 and from V. shiloi AK1 are from GenBank accession nos. AF319618 and EDL55668, respectively. Sequence data for phylogenetic analysis of the following strains are as noted: Aeromonas salmonicida subsp. salmonicida (GenBank accession no. CP000644), Vibrio harveyi BB120 (CP000789, CP000790), Vibrio parahaemolyticus RIMD2210633 (BA000031, BA000032), Photobacterium profundum 3TCK (AAPH00000000), and Vibrio fischeri ES114 (CP000020, CP000021).
Sequences from the three loci (recA, mdh, katA) were aligned using ClustalX 1.8337, and trimmed and concatenated using custom Perl scripts38 and MEGA439. The best-fit model of DNA substitution and parameter estimates used for tree reconstruction was chosen by performing hierarchical likelihood ratio tests on these data, as implemented in PAUP* 4.0b1040 and MODELTEST 3.741; this model was SYM+I+G, a submodel under the GTR+I+G (general time reversible with gamma-rate distribution across sites and a proportion of invariant sites). Phylogenetic trees’ clade topology and confidence were studied using three approaches: i) MrBayes 3.142, implementing the Markov chain Monte Carlo method with an evolutionary model set to GTR with gamma-distributed rate variation across sites and a proportion of invariable sites, was run for 5,000,000 generations using the CIPRES project portal43. Sample frequency was 1,000, creating a posterior probability distribution of 5,000 trees; when summarizing the substitution model parameters and trees, 1,250 trees were discarded as burn-in to address potential chain instability. ii) ML (maximum likelihood) analyses were performed using the genetic algorithm approach of GARLI44 as implemented in the CIPRES portal, with an evolutionary model set to GTR with gamma-distributed rate variation across sites, and a proportion of invariable sites. Bootstrap analysis of 1,000 replications was used to assess the support for internal nodes. iii) Unweighted MP (maximum parsimony) analysis and bootstrap were performed by PAUP* (1,000 replications) using heuristic searches implementing TBR (tree bisection and reconnection) branch-swapping to find the shortest trees and assess the support for internal nodes. For ML and Bayesian approaches, the process was independently repeated three times to ensure arrival at a similar, most-likely tree topology. Resulting trees were rooted with A. salmonicida subsp. salmonicida A449 as the outgroup (Fig. S2).
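The Bayesian sampling figures quoted above follow from simple arithmetic, shown here as a small Python check. The function name is an illustrative assumption, and the calculation follows the text in counting one sampled tree per 1,000 generations.

```python
def mcmc_sample_summary(generations: int, sample_every: int, burn_in_trees: int):
    """Return (sampled trees, retained trees, burn-in fraction)."""
    sampled = generations // sample_every
    if burn_in_trees > sampled:
        raise ValueError("burn-in exceeds the number of sampled trees")
    retained = sampled - burn_in_trees
    return sampled, retained, burn_in_trees / sampled


sampled, retained, burn_in_fraction = mcmc_sample_summary(
    generations=5_000_000, sample_every=1_000, burn_in_trees=1_250
)
```

Under these settings the 1,250 discarded trees amount to a 25% burn-in, leaving 3,750 trees for the summary.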
Parametric bootstrap analysis: From our original data, the difference in likelihood scores between an unconstrained phylogeny and a constrained phylogeny with a non-monophyletic, rscS-containing clade was calculated. One hundred simulations of the dataset were created using the constrained topology; likelihood scores were produced from these 100 simulated datasets both with and without the constraint of the non-monophyly of the rscS-containing clade using the software PAUP*. The null hypothesis that the rscS-containing strains are non-monophyletic was rejected based on analysis of the resulting likelihood ratio distribution (p < 0.01).
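The final step of a parametric bootstrap, comparing the observed likelihood difference with the simulated distribution, can be sketched as follows. This is an illustration under assumptions: the text does not state how the p-value was computed, so the add-one estimator used here is one common convention rather than the authors' procedure, and the function name and the numbers in the example are made up.

```python
def parametric_bootstrap_p(observed_delta: float, simulated_deltas) -> float:
    """Estimate a one-sided p-value for an observed likelihood difference.

    simulated_deltas holds the same statistic computed on datasets
    simulated under the constrained (null) topology. The add-one
    correction is an assumed convention that avoids reporting p = 0.
    """
    simulated = list(simulated_deltas)
    if not simulated:
        raise ValueError("at least one simulated dataset is required")
    as_extreme = sum(1 for delta in simulated if delta >= observed_delta)
    return (as_extreme + 1) / (len(simulated) + 1)


# Made-up example: 100 simulated differences, none as large as the observed one.
example_p = parametric_bootstrap_p(25.0, [float(i % 10) for i in range(100)])
```

With 100 simulations, this estimator falls below 0.01 only when no simulated difference reaches the observed value.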
Whole genome codon usage for ES114 was analyzed by the methods of Karlin and Mrázek45, against a set of reference sequences from ES114 which included ribosomal proteins, chaperones, and transcription/translation factors. Calculations were performed on the Computational Microbiology Laboratory server46, and the codon usage of rscS was predicted to be neither highly expressed (PHX) nor alien (PA). All comparisons of codon bias performed placed rscS in the 95% confidence interval for the ES114 genome.
In addition to software noted above, Lasergene Seqbuilder and Seqman (DNASTAR, Madison, WI) were employed for sequencing and genome assembly. Mauve27 was used extensively during the dynamic process of contig assembly and orientation. Analysis of RscS domain structure was assisted by PFAM47 and Phobius48. Primer design was aided by Primer3Plus33. Treeview 1.6.649 was used to view phylogenetic trees.
We thank P. Dunlap for sharing bacterial strains; N. Perna, J. Glasner, K. Geszvain, D. Baum, J. Johnson, M. Sarmiento, and S. Ferriera for technical assistance; A. Wier, N. Bekiares, R. Gates, and the Hawaii Institute of Marine Biology for animal facilities and care; J. McCosker and the Steinhart Aquarium for access to fish specimens; M. McFall-Ngai, H. Goodrich-Blair, C. Brennan, and J. Troll for critical discussions; and L. Proctor for project support. MJ11 genome sequencing was funded by the Gordon and Betty Moore Foundation Marine Microbial Genome Sequencing Project; E.G.R. and coworkers are funded by the NIH-NCRR and the NSF-IOS; E.V.S. is funded by an NSF CAREER Award; K.L.V. is funded by the NIGMS; M.J.M. is funded by an NIGMS NRSA Postdoctoral Fellowship; and M.S.W. is funded by an NSF Predoctoral Fellowship and an NIH Molecular Biosciences Training Grant to the UW.
Sequence data for recA, mdh, katA, and rscS from the additional strains described in the article have been deposited at GenBank under accession nos. EU907941-EU908017; MJ11 genome data are accession nos. CP001133, CP001134, and CP001139. Details are listed in Supplementary Table 3 linked to the paper at www.nature.com/nature. Reprints and permissions information is available at www.nature.com/reprints.
The authors declare no competing financial interests.