|Home | About | Journals | Submit | Contact Us | Français|
Anthropogenic changes can influence mutualism evolution; however, the genomic regions underpinning mutualism that are most affected by environmental change are generally unknown, even in well-studied model mutualisms like the interaction between legumes and their nitrogen (N)-fixing rhizobia. Such genomic information can shed light on the agents and targets of selection maintaining cooperation in nature. We recently demonstrated that N-fertilization has caused an evolutionary decline in mutualistic partner quality in the rhizobia that form symbiosis with clover. Here, population genomic analyses of N-fertilized versus control rhizobium populations indicate that evolutionary differentiation at a key symbiosis gene region on the symbiotic plasmid (pSym) contributes to partner quality decline. Moreover, patterns of genetic variation at selected loci were consistent with recent positive selection within N-fertilized environments, suggesting that N-rich environments might select for less beneficial rhizobia. By studying the molecular population genomics of a natural bacterial population within a long-term ecological field experiment, we find that: (i) the N environment is indeed a potent selective force mediating mutualism evolution in this symbiosis, (ii) natural variation in rhizobium partner quality is mediated in part by key symbiosis genes on the symbiotic plasmid, and (iii) differentiation at selected genes occurred in the context of otherwise recombining genomes, resembling eukaryotic models of adaptation.
Mutualisms play critical roles in natural and managed ecosystems. Recent work on context-dependency in mutualisms has shown that changes in the economy of benefits, such as a shift in the availability of an important traded resource in the environment, can alter the symbiont community composition and/or the forms of natural selection acting on mutualists, even causing the evolution of decreased partner quality or abandonment of the mutualism [1–5]. Despite these recent advances, however, we generally lack empirical evidence for the selective agents that act on mutualist partner quality and thus how partner quality (co)evolves in natural mutualist populations [6,7]. Moreover, few mutualist systems to date have afforded both the ecological relevance and genomic resolution necessary to study the mechanistic underpinnings of mutualism evolution.
The legume–rhizobium symbiosis, in which leguminous plants house nitrogen (N)-fixing bacteria in root nodules, is responsible for the vast majority of non-anthropogenically fixed N in terrestrial systems and thus plays a major role in the N cycle [8,9]. This symbiosis has become a key model for mutualism evolution, given the relative ease with which rhizobium genetic variants can be isolated and subsequently manipulated in experiments to test key predictions of evolutionary theory [10–14]. This recent boom in studies of phenotypic evolution in rhizobia, a long history of molecular genetic investigation resulting in well-annotated reference genomes (e.g. [15,16]), and most recently the ability to collect high-quality population genomic data with relative ease (e.g. [17–20]), all make N-fixing rhizobia ideal systems for addressing both mechanistic questions of mutualism evolution and more general questions about the mode of bacterial adaptation.
In a recent quantitative genetic study of Rhizobium leguminosarum, rhizobium symbionts of clover, we demonstrated that rhizobia from field soil populations fertilized with N for 22 years were, on average, less beneficial for their host plants, compared to those from control populations . In particular, rhizobium strains isolated from the N-fertilized plots resulted in host plants with 17–30% less biomass, 10–28% fewer leaves, and 6–17% lower leaf chlorophyll (depending on the clover host species), supporting a major prediction of mutualism theory—that changes in the resource environment can lead to the evolutionary decline of mutualism benefits [21–23]. The observed phenotypic differentiation in response to long-term experimental N-fertilization, however, does not address the rhizobium loci targeted by selection or the evolutionary mechanisms driving reduced rhizobium quality. For example, although numerous genes required for symbiosis with legumes have been characterized , whether those same loci underlie naturally occurring partner quality variation remains largely unknown . Similarly, partner quality could decline in high N environments if (i) positive selection has favoured lower quality rhizobium strains (i.e. rhizobia have adapted to sustained high N), or (ii) selection for high-quality strains has been relaxed (i.e. strains have simply become defective mutualists in response to sustained high N). These alternative hypotheses point out a critical distinction between defective mutualists, which are simply less beneficial for their partners, versus true ‘cheaters’, which gain a fitness benefit from being less beneficial [26,27]. The evolutionary importance of low-quality partners to the dynamics of contemporary mutualism dynamics remains controversial [14,26,28–31]. Thus, identifying whether selection favours cheating can shed light on this important issue in mutualism evolution.
Using natural populations of mutualist bacteria that have become phenotypically differentiated in response to a manipulated variable (N fertilization) provides the rare opportunity to address the genetic underpinnings of mutualism evolution in response to long-term changes in the resource environment in a well-replicated manner. Here, we compare the genome sequences of N-fertilized and control strains of R. leguminosarum to investigate patterns of genomic variation between N-fertilization treatments to identify the genomic basis of mutualism decline in rhizobia. Next, we ask whether patterns of nucleotide variation at putatively selected genes suggest adaptation of rhizobium mutualists, versus relaxed selection for mutualism, in high-resource environments. Our study merges long-term ecological field experiments and bacterial population genomics to answer key questions about evolutionary mechanisms in nature.
Here, we studied genomic variation among 63 strains of R. leguminosarum bv. trifolii, N-fixing rhizobial mutualists of clover (Trifolium spp.), that were isolated and characterized phenotypically by Weese et al. ; complete methods detailing the long-term ecological experiment, strain isolation, and phenotypic experiments are contained there. Briefly, rhizobium strains were isolated from both N-fertilized (22 years of fertilization with 12.3 g N m−2 per year granular ammonium nitrate) and control (unfertilized) plots located at the Kellogg Biological Station Long Term Ecological Research Site (KBS LTER; http://lter.kbs.msu.edu/). Weese et al.  found that rhizobium populations from the N-fertilization treatment had evolved to be lower quality partners for their hosts (i.e. resulted in lower plant above-ground biomass and chlorophyll content), compared with rhizobium populations from control plots (see electronic supplementary material, Dataset S1).
Strain isolates were grown on solid tryptone yeast (TY) media plates, incubated at 30°C until bacterial colonies became visible (3–4 days), and isolated strains were grown in 5 ml TY liquid media  in a shaking incubator for 1–2 days at 30°C. Genomic DNA was extracted using the MasterPure™ DNA Purification Kit (Epicentre). We used a NanoDrop™ 2000c (Thermo Scientific, Waltham, MA, USA) to ensure that all samples had A260/280 readings between 1.8 and 2.0. We generated multiplexed libraries (no bead normalization step) using the Nextera XT DNA Sample Prep Kit (Illumina, San Diego, CA, USA). Libraries were quantified with a Qubit fluorometer 2.0 (Life Technologies, Carlsbad, CA, USA) and then diluted with molecular grade water for normalization. Normalized libraries were pooled for paired end sequencing in a single lane on a HiSeq2000 sequencer (Illumina) at the W.M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign. The lane produced 467 million reads, resulting in an average of 720 megabase (MB) and 97× coverage per genome.
We used the A5 pipeline  with default parameters to automate sequence data cleaning, error correction, de novo assembly, and quality control. Genomes were assembled de novo rather than using reference-based assembly to avoid potential biases in synteny and to better assess highly variable (i.e. flexible) genomic content often found in bacteria [20,34,35]. A single control strain (155_C) could not be assembled properly due to poor sequence quality and was excluded from subsequent analyses. A portion of gaps were filled in silico with GapFiller v. 1.11, which re-uses paired reads to fill gaps left in draft assemblies . We annotated the genomes by uploading A5 assemblies to the ‘Rapid Annotation using Subsystem Technology’ server . Genes from annotated assemblies were grouped into recognized homologues using the ITEP toolkit , which uses the MCL program (Markov Chain Clustering) to cluster BLASTp results (0.4 cut-off, 2.0 inflation parameter ). Assembled genomes had a median length of 7.50 Mbp, a median scaffold number of 45, and a median N50 of 465 604 bp (electronic supplementary material, Dataset S2).
We defined the ‘core genome’ as orthologous, single-copy ITEP gene clusters found within all strains and the reference R. leguminosarum bv. trifolii strain WSM1325 . The remaining (non-core) genes comprise the ‘flexible genome’ (genes with variable copy number and/or shared among subsets of strains). Analysis of the flexible genome revealed high variation in gene content among strains, but no candidates strongly associated with partner quality decline (i.e. no genes were completely absent in control strains and present in all N-fertilized strains or vice versa). See electronic supplementary material, Dataset S3 and text S1, for full discussion of presence-absence variation (PAV) results. Using an ITEP pipeline, core gene sequences were aligned and reverse-translated with the auto flag in mafft v. 7.123b  and PAL2NAL v. 14 , respectively. The ITEP GBlocks wrapper script was used for automated alignment curation (-c 24 -f 24 -n 10 -m 5 -g a) for each core gene . There were a total of 2 433 core (i.e. single copy, present in all strains) genes among all 62 strains. For analyses using the 56 strains harbouring symbiosis genes on the symbiotic plasmid (pSym) (see below), there were a total of 3 173 core genes.
To determine evolutionary relationships among the experimental strains, we concatenated aligned core gene sequences on the chromosome and five plasmids, including the pSym. We used the GTR-Γ model with 100 bootstrap replicates in RAxML v. 8.0.19  to generate a core gene phylogeny and visualized the distribution of partner quality on the tree by mapping mean trait values using iTOL v. 2 . To test whether nucleotide variation in the rhizobium genome was structured geographically and/or between N fertilization treatments, we used Arlequin v. 18.104.22.168  to carry out separate hierarchical analyses of molecular variance (AMOVA) for core gene single nucleotide polymorphisms (SNPs) on the chromosome (2 521 single-copy genes present in all strains) and the core plasmid genomes (238, 195, 8, 113, and 98 genes in plasmids pR132501–pR132505, respectively). Strain, plot within treatment (six plots per N treatment), and N treatment were included as factors. The first 100 000 SNP sites were used for AMOVA on the chromosome due to data input limitations of Arlequin. AMOVA on remaining chromosomal SNP sites (data not shown) did not alter the results.
Many genes known to be critical for the formation and maintenance of symbiosis with legumes  exist within a symbiosis island roughly centred at 0.303 Mega base pair (Mbp) on the pSym of the reference strain, R. leguminosarum bv. trifolii WSM1325 , and we were interested in testing whether genes at this region were differentiated between N-fertilized and control rhizobia. We found that six strains (four N-fertilized strains, 40, 538, 643, and 773 and two control strains, 160 and 3) lacked many of the genes in this region and that the average partner quality for these six strains was below the average for all other strains (significantly so for plant biomass; t = −2.048, p = 0.045; electronic supplementary material, Dataset S2). This left a large gap in the core gene alignment at this critical region of interest (‘Results’, and gap on the pSym in electronic supplementary material, figure S1 and figure 1 of text S1); therefore, we used the 56 strains that possessed the region of the pSym in population genetic analyses of the core genome to test for patterns of differentiation at the symbiosis gene region of the pSym.
We expected that genes underlying phenotypic differentiation between N-fertilized and control rhizobia should be genetically differentiated (have elevated FST) between N-fertilized and control groups, and also be associated with symbiotic partner quality (i.e. differentiate low- and high-quality strains). Thus, we calculated FST for core genes in two ways. First, we identified genes showing elevated FST between N-fertilized and control strains (i.e. between the groups of rhizobia originally collected from N-fertilized versus control plots). Second, we ordered strains according to rank partner quality (rather than N treatment) by summing the ranks of all plant growth traits (see ranks in electronic supplementary material, Dataset S2) and split the strains into two groups: ‘high-quality’ (eight N-fertilized and 20 control strains) versus ‘low-quality’ (16 N-fertilized and 12 control strains). We then identified genes showing elevated FST between these two quality groups. In conjunction with the ITEP toolkit, we used a collection of ‘pop_genome’ scripts  (https://github.com/nyoungb2/pop_genome) to carry out population genetic analyses with core gene alignments. Population pairwise FST was calculated with default parameters for each core gene using Arlecore v. 22.214.171.124 . The 99th percentile of FST values was calculated with the ‘quantile’ function in R v. 3.0.3 (R Core Team, 2014; http://www.R-project.org/). We performed a sliding window analysis in R to calculate the mean of pairwise FST values using a window size of 13 genes (to roughly match the size of the region spanning the between-treatment differentiated symbiosis genes; see ‘Results’) and a step-size of seven genes (452 windows total).
Forms of selection acting differently in N-fertilized (versus control) plots should generate distinct patterns of nucleotide variation at candidate genes. If positive directional selection has favoured low-quality strains in the N plots, for example, N-fertilized strains should have reduced nucleotide diversity (π) at candidate genes compared to control strains, relative to the rest of the genome [48–51]. Alternatively, relaxed selection in the N plots should lead to increased π at candidate genes compared to control strains. We calculated π within N-fertilized and control populations for each core gene on the chromosome or pSym using ProSeq v. 3.5  and then performed a sliding window analysis of π as described above (394 windows total). The other four plasmids were excluded due to a relative deficiency of core genes with significantly elevated FST. R v. 3.0.3 was used to plot the regression of sliding window results for N-fertilized and control populations, and to generate the 95% confidence and prediction intervals to identify regions of the genome (sliding windows) with anomalous π.
We found that the vast majority of genetic variation occurred among individual strains (AMOVA; electronic supplementary material, table S1), consistent with other bacterial populations that are quite variable even at fine spatial scales [53,54]. We also detected spatial genetic structure (significant among-plot variance; electronic supplementary material, table S1), suggesting some limits to gene flow at a rather fine geographical scale in these rhizobia [55,56]. We found little evidence of genome-wide differentiation between N-fertilized and control rhizobia, as evidenced first by an intermixed core genome phylogeny (electronic supplementary material, figure S2), and second by the lack of between-treatment variance in the AMOVAs (electronic supplementary material, table S1A). This finding is consistent with a previous analysis of the internal transcribed spacer (ITS) locus  and facilitates the identification of genomic outliers that are particularly structured due to differential selection in N-fertilized versus control plots [57,58]. We posited that, compared to the rest of the genome, rhizobium genes underlying mutualism decline should: (i) be genetically differentiated between N-fertilized and control rhizobia, (ii) be associated with partner quality phenotypes (rhizobium effects on plant growth and chlorophyll content), and (iii) show anomalous patterns of nucleotide diversity consistent with recent selection.
The top 1% of FST values in our comparison of N-fertilized and control rhizobium populations (FST ≥ 0.074) included 20 chromosomal genes, nine pSym genes, and a single gene on both plasmid pR132502 and pR132505 (electronic supplementary material, Dataset S4). Twenty-five of these genes were significantly differentiated (17 chromosomal genes: mean FST = 0.090, s.d. = 0.016; eight pSym genes: mean FST = 0.125, s.d. = 0.037). These outliers stand out against a backdrop of low overall genomic differentiation (mean FST = 0.022, s.d. = 0.019, 99th percentile = 0.074), and the sliding window analysis revealed a particularly strong FST peak at the symbiosis gene region of the pSym (blue vertical line in figure 1a). Genes among the top 1% of FST values were significantly enriched for pSym genes (32% of genes in the top 1% were on the pSym, versus the null expectation of 8.8% from the core genome, two-tailed Fisher's exact test, p = 0.001). Evidence from flexible genome analyses (i.e. genes with variable copy number and/or presence among strains) also revealed that six relatively low-quality strains were missing several symbiosis genes on the pSym (see pSym gap in electronic supplementary material, figure S1), further implicating the symbiosis island as an area of interest.
Consistent with the above findings, genes in the top 1% of FST values comparing low versus high partner quality rhizobia were also enriched for pSym genes (48% on the pSym, p < 0.0001), and sliding window analysis revealed an island of high differentiation at the symbiosis region on the pSym (figure 1b), compared to a low genomic background (mean FST = 0.027, s.d. = 0.022, 99th percentile = 0.086). Specifically, the top 1% of FST values (FST ≥ 0.086; 34 genes) in this high- versus low-quality comparison included 19 on the chromosome (14 significant; mean FST = 0.100, s.d. = 0.017) and 14 on the pSym (13 significant; mean FST = 0.189, s.d. = 0.053), and a single gene on plasmid pR132502 (p > 0.05; electronic supplementary material, Dataset S4).
Putting these two analyses together, only six genes were among the most differentiated (top 1%) in both FST comparisons: nifH, nifA, nifD, fixC, nodB, and an electron transfer flavoprotein subunit (Rleg_4928)—all on the pSym (blue vertical line in figure 1). Together, these results implicate genes in this region as players in the evolution of reduced cooperation in response to N fertilization. A strong history of molecular genetic work using rhizobial mutants and changes in gene expression during various stages of symbiosis has revealed the necessity of many rhizobium genes (particularly those in this region) for the establishment and maintenance of symbiosis with legumes (e.g. [15,59–64]). Ecological genetic studies such as ours, which focus on natural variation and the loci underlying ecologically relevant traits, can complement mutant analyses by helping to resolve the loci targeted by contemporary natural selection [6,7,25,65].
The genome-wide average π was higher among N-fertilized strains compared with control strains, resulting from a few N-fertilized strains (209, 231, and 717) that dominated a clade more distantly related to the majority of other strains (electronic supplementary material, figure S2). Nevertheless, π covaried between the two groups for both the chromosomal and pSym windows (chromosome R = 0.81, p < 0.0001; pSym R = 0.56, p < 0.001; figure 2b,c), suggesting that the relative selective constraints among genes are similar across most of the genome for N-fertilized and control groups. Two chromosomal sliding window regions fell below the 95% prediction interval in the π correlation (figure 2b), suggesting positive selection in N-fertilized environments; however, these regions did not possess genes with FST in the top 1% of either FST analysis. Two pSym sliding window regions also fell below the 95% prediction interval (blue triangles in figure 2c), and these regions contained the six pSym loci from the top 1% of the FST outlier analyses above. Reduced nucleotide diversity at the six pSym genes revealed in the FST outlier analyses suggests positive selection on these genes in N-fertilized environments. In other words, these results are consistent with natural selection favouring less beneficial rhizobia in high N conditions. Manipulative experiments are required to test definitively whether low-quality N strains have higher fitness than control strains in high N environments; these experiments are underway.
One selective agent favouring lower partner quality in high N plots might be a shift towards the saprophytic lifestyle at the expense of symbiosis. Among other ecological changes in the Long Term Ecological Research (LTER) study since N fertilization began, the availability of clover hosts has declined in N-fertilized plots [5,66], likely leading to more frequent/prolonged saprobic phases for R. leguminosarum. Genetic trade-offs between symbiotic and free-living lifestyles could drive positive selection for less beneficial rhizobia in high N environments. For example, selection could favour mutualists that invest more in storage compounds such as poly-3-hydroxybutyurate (PHB), which can increase rhizobium fitness in the soil but likely divert carbon away from N fixation and thus from host benefits . Both molecular and experimental evolution studies suggest the potential for such a trade-off between saprotrophy and mutualism. Rhizobium etli (PHB synthase) and R. tropici (glycogen synthase) mutants have higher rates of nitrogen fixation in symbiosis and hoard less carbon for life outside the nodule [68,69], while Bradyrhizobium japonicum strains that evolved the most significant increase in host-free fitness after 450 generations in culture showed the largest decline in partner quality . We are currently investigating whether the N-fertilized strains in our study are better saprophytes.
Only six symbiosis-related genes on the pSym (nifH, nifA, nifD, fixC, nodB, and an electron transfer flavoprotein subunit) matched all three criteria for identifying the genes underlying the response to N fertilization. Because these genes are in close proximity, it is unclear which particular genes have been targeted by selection; nevertheless, given the large body of genetic work on symbiosis genes in this region , several interesting candidates stand out. For example, nifH, nifD, nifA, nodB, and fixC are among the top 1% of FST values in both FST analyses, and additionally nifK, nifE, nifN, and nodF are in the top 1% of FST values in the second analysis of high- and low-quality groups (a full list of candidate loci is in the electronic supplementary material, Dataset S4). The N-fixing (nif) genes nifH, nifD, and nifK are responsible for the formation of the nitrogenase enzymatic complex (Fe and FeMo proteins), which reduces atmospheric di-nitrogen (N2) to ammonia for plant use . The nifB, nifE, and nifN genes are involved in the synthesis of the FeMo cofactor that is located at the active site of the FeMo protein, where N2 binds and is reduced [72,73]. Additionally, nifA helps regulate the expression of various nif genes , while fixC is part of the fixABCX complex involved in gene transcription regulation in low oxygen conditions (e.g. inside the nodule) and is posited to have a central role in electron transfer to nitrogenase [75,76]. In general, nod genes code for enzymes that generate Nod factors, rhizobium signalling molecules between rhizobia and plant required for nodulation initiation; nodB in particular is part of the nodABC operon, which generates the main Nod factor structure .
Aside from the six symbiosis-related pSym genes discussed above, 12 window regions on the chromosome and one window on the pSym possessed decreased π in control, compared with N-fertilized, strains (and thus fall above the upper bounds of the prediction interval in figure 2b,c). There were no outliers from both FST analyses within these 13 windows; however, they did contain five chromosomal genes that were significantly differentiated between N-fertilized and control rhizobia (electronic supplementary material, Dataset S4), including adenylate/guanylate cyclase and FecR. It is unclear how these genes might govern partner quality traits per se, since they did not differentiate low- and high-quality rhizobia in the second FST comparison; nevertheless, they might have an important role in uncharacterized phenotypic differences between N- and control strains. Adenylate cyclases have recently been implicated in infection thread formation , as well as in the negative regulation of secondary infections after initial infection thread formation , potentially controlling the number of successful nodulation events. N-fixing rhizobia inside the nodule have a high demand for iron (e.g. nitrogenase and cytochrome synthesis), and iron uptake could occur by means of ferric citrate , which requires the cytoplasmic membrane protein FecR in Escherichia coli . A full understanding of how chromosomal and pSym genes together determine mutualist partner quality and other rhizobium phenotypes will require a much larger effort, ideally implementing emerging techniques for association mapping in bacteria [81,82].
Bacterial symbionts are key players in many plant and animal mutualisms [83–85], yet the historical dearth of ecological genetic studies in bacteria has meant that the rules governing adaptation in natural populations of bacteria are still being elucidated . The balance between natural selection and recombination within natural bacterial populations is one important unresolved aspect. With little recombination, selection acting at one or more loci should result in genome-wide sweeps that purge previously existing genetic variation within the population [19,87]. Many recent microbial studies, however, have discovered adaptive regions of reduced genetic diversity amidst high background polymorphism [18,88–91]—implying abundant recombination. Using a population genomic approach, we found a genomic island of differentiation between rhizobium populations that have experienced different fertilization histories and that this region stands out against a genome-wide background of recombination/admixture between the two groups. These findings are congruent with a recent study of a natural R. leguminosarum population from the UK, in which high rates of recombination sufficient to prevent divergence by drift were found . Thus R. leguminosarum in nature appear to form dynamic, diverse populations that are unified by gene flow despite selection acting at one or more loci. In our case, where genomic islands of differentiation (this study) and large phenotypic differences  were detected, selection due to N addition must be exceptionally strong.
Because the rhizobium populations were not assayed when the LTER experiment began, we cannot know with certainty whether particular nucleotide-level differences underlying our phenotypic differentiation have arisen through mutation since that time (22 years ago), or instead pre-dated the establishment of the N plots. These alternatives represent two distinct mechanisms of evolution . On the one hand, new mutations could have arisen and increased in frequency atop a diverse phylogeny of rhizobium strains (selection on new mutations); on the other hand, pre-existing alleles could have increased in frequency (selection on standing genetic variation). Given the levels of nucleotide diversity found throughout the genome, in addition to our observations of substantial standing variation in partner quality variation within the control plots , we suspect that the alleles at the key outlier region on the pSym pre-dated the establishment of the N plots—i.e. that selection has acted on standing genetic variation. Evolution from standing variation should proceed more quickly than evolution from new mutations [93,94], potentially allowing populations to adapt more quickly. Indeed, how quickly N-fertilized populations of rhizobia might return to control levels of partner quality after a return to low N conditions is an interesting open question.
We thank B. Gordon for laboratory assistance, N. Youngblut and M. Friesen for sequencing and bioinformatics advice, and R. Whitaker, Z. Cheviron, P. Tiffin, and the Heath laboratory for comments on the manuscript. This is KBS publication no. 1903.
Illumina reads and de novo genome assemblies are available via DRYAD (http://dx.doi.org/10.5061/dryad.9gn28).
We declare we have no competing interests.
This work was funded by the National Science Foundation (NSF DEB-1257938), the NSF LTER Program at the Kellogg Biological Station (DEB-1027253), and Michigan State University AgBioResearch.