IS abundance within the Synechococcus OS-A and Synechococcus OS-B′ reference genomes.
The Synechococcus OS-A and Synechococcus OS-B′ genomes were examined exhaustively for ISSoc sequences (extending and enhancing the initial annotation) (4). The findings are summarized below (also see Fig. S1, first and second rings, and Tables S1 and S2 in the supplemental material). The Synechococcus OS-A genome has 71 full-length ISs. Of these, 55 (78%) are putatively functional, with intact transposase genes and flanking regions. The other 16 either are interrupted by insertions or have transposase genes with point mutations and/or frameshift mutations. Although they no longer appear to be capable of autonomous transposition, the preservation of their transposition signals leaves open the possibility that they can still be acted upon by an active transposase. We also identified 103 partial IS copies lacking large segments of sequence through truncations at one or both ends or by internal deletion. Synechococcus OS-B′ has 82 full-length ISs, of which 78 (95%) are putatively functional, and 88 partial ISs.
Abundance and structure of ISSoc subfamily insertion sequences in Synechococcus OS-A and Synechococcus OS-B′
Fifteen distinct IS subfamilies have been identified in the two genomes (designated ISSoc families [4]) (see Fig. S2 in the supplemental material). Representative members of each subfamily were classified using the IS database analysis tool (31) and by examining the structure of the ISs. Four of the subfamilies (ISSoc3, ISSoc6, ISSoc10, and ISSoc11) are IS605-like. Six subfamilies are similar to IS1341 (ISSoc1, ISSoc5, ISSoc7, ISSoc9, ISSoc12, and ISSoc15). The subfamily ISSoc2 is part of the IS607 family, ISSoc4 is part of the IS630 family, and ISSoc13 is a member of the IS5 family (as reported in the IS database [31]). ISSoc8 could not be placed in any described IS family, and ISSoc14 had moderate similarity to the IS605
IS family assignments and conserved insertion site targets for ISSoc subfamilies
In addition to the intact ISs, many of the partial ISSoc2s in these genomes have been proposed to be active nonautonomous transposable elements (unpublished data). Novel placements of these elements were observed in the metagenomic data set, suggesting an additional 30 active transposable elements in Synechococcus OS-A and an additional 43 in Synechococcus OS-B′. These and other potentially active but partial sequences of other ISSoc families represent an additional source of mutations that these populations must withstand in order to survive.
Estimations of abundance and distribution of ISs in the natural community.
Multiple representatives of all 15 subfamilies were identified in the metagenome (Fig. 1). Since our focus was on ISSoc activity in Synechococcus OS-A-like and Synechococcus OS-B′-like community members, we screened clones containing ISSoc sequences for those that were derived from Synechococcus OS-A or Synechococcus OS-B′ (see Materials and Methods). Of the 5,025 reads containing ISSoc sequences, 1,527 were identified as Synechococcus OS-A-like (30.4%), 2,167 were Synechococcus OS-B′-like (43.1%), and 523 (10.4%) could not be distinguished between the two based on our criteria. Of the remaining 808 sequences, 530 (10.5% of all reads) have best hits to a Synechococcus species, and an additional 76 sequences have best hits to other cyanobacteria. No database match (E value cutoff for BLASTN, 1e−10; that for BLASTX, 1e−5) was found for 164 reads. The remaining 38 reads have greatest similarity to a variety of organisms, although the similarity is not high enough and does not cover a long enough stretch of the read to make a confident taxonomic designation. Since there is no clear evidence of ISSoc sequences in other organisms and since nearly all (94.5%) of the metagenomic sequences that harbor ISSoc sequences appear to have been derived from Synechococcus species, we conclude that ISSocs are limited to Synechococcus-like organisms.
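The bin percentages quoted above can be rechecked with a few lines of arithmetic. The following is a minimal sketch that uses only the read counts stated in the text; the bin labels and the `percent` helper are illustrative names, not part of the original analysis.

```python
# Read counts per taxonomic bin, copied from the paragraph above.
# The dictionary keys are illustrative labels, not names used by the authors.
bins = {
    "OS-A-like": 1527,
    "OS-B'-like": 2167,
    "ambiguous OS-A/OS-B'": 523,
    "other Synechococcus": 530,
    "other cyanobacteria": 76,
    "no database match": 164,
    "low-confidence other": 38,
}
total_reads = 5025


def percent(count, total):
    """Share of `total` represented by `count`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# The seven bins should account for every ISSoc-containing read.
assert sum(bins.values()) == total_reads

shares = {name: percent(count, total_reads) for name, count in bins.items()}

# Reads attributed to Synechococcus: both reference-like bins, the ambiguous
# bin, and reads whose best hit is another Synechococcus species.
synechococcus_reads = sum(
    bins[name]
    for name in ("OS-A-like", "OS-B'-like", "ambiguous OS-A/OS-B'", "other Synechococcus")
)
synechococcus_share = percent(synechococcus_reads, total_reads)

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"Synechococcus-derived: {synechococcus_share}%")
```

Keeping the counts in one table makes it easy to confirm that the bins are exhaustive and that the Synechococcus-derived share follows from the four Synechococcus bins.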
Fig. 1. ISSoc subfamily abundance in the metagenome. Reads were binned taxonomically by similarity to the reference Synechococcus OS-A and Synechococcus OS-B′ genomes. OS-like, reads met bin criteria for both references; other, reads met bin criteria ...
To compare the ISSoc content of the binned metagenomic reads to those of Synechococcus OS-A and Synechococcus OS-B′, we estimated the expected percentages of ISSoc sequence in the Synechococcus OS-A-like and Synechococcus OS-B′-like bins and compared them to the observed contents. The observed percentage of IS content for the Synechococcus OS-A-like bin was within 1 standard deviation of the expected value, while the Synechococcus OS-B′-like bin had an observed value between 1 and 2 standard deviations from the expected percentage. This suggests that the total ISSoc content of the populations from which the metagenomic data set was derived is not significantly different from that for the genomes of the cultured isolates. The apparent stability in the abundance of ISSocs seen in the Synechococcus OS-A-like and Synechococcus OS-B′-like populations may indicate that these levels represent a “carrying capacity” for ISSocs in these species.
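The comparison described above amounts to expressing the gap between observed and expected IS content in units of standard deviation. The sketch below shows only that arithmetic; the function names are hypothetical, and the example numbers are invented placeholders because the expected values and standard deviations are not given in this section.

```python
def deviation_in_sd(observed_pct, expected_pct, sd_pct):
    """Absolute distance between observed and expected, in standard deviations."""
    if sd_pct <= 0:
        raise ValueError("sd_pct must be positive")
    return abs(observed_pct - expected_pct) / sd_pct


def describe(z):
    """Place a deviation into the ranges used in the text."""
    if z <= 1:
        return "within 1 SD"
    if z <= 2:
        return "between 1 and 2 SD"
    return "more than 2 SD"


# Hypothetical percentages for illustration only; these are NOT values from the study.
example = describe(deviation_in_sd(observed_pct=3.1, expected_pct=2.8, sd_pct=0.2))
print(example)
```

How the expected percentage and its standard deviation were estimated is described in Materials and Methods and is not reproduced here.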
Genomic IS content in environmental populations of Synechococcus
The subfamily distribution observed in the metagenome was similar to that seen in the two reference genomes. The most abundant ISSoc subfamilies (ISSoc1 and ISSoc2) were found in both the Synechococcus OS-A-like and Synechococcus OS-B′-like bins. Subfamilies with lower abundances were found in only one or the other bin: ISSoc3 and ISSoc5, which are present in the Synechococcus OS-A genome but not the Synechococcus OS-B′ genome, were found only in the Synechococcus OS-A-like bin, and ISSoc9 to -13, which are found exclusively in the Synechococcus OS-B′ genome, were found only in the Synechococcus OS-B′-like bin. This may be due to random chance, as the most abundant ISSocs are statistically most likely to be transferred laterally. We cannot rule out, however, that some selective pressure that prohibits the persistence of all ISSoc families in both species is at work.
We did find evidence, however, that some ISSoc families that did not have intact copies in both reference genomes were active in both populations in the environment. The sole copy of ISSoc8 in the Synechococcus OS-A reference genome is interrupted by a gene of unknown function. In the metagenome, an intact copy of ISSoc8 was found on a sequence assigned to the Synechococcus OS-A-like bin (YMAC826TR). Furthermore, the Synechococcus OS-B′ genome has only mutated and partial copies of ISSoc4, but the Synechococcus OS-B′-like bin contains genes with higher similarity to intact copies of ISSoc4 from Synechococcus OS-A than to the compromised copies found in Synechococcus OS-B′, suggesting that ISSoc4 may be intact and active in some Synechococcus OS-B′-like community members.
Evidence for transmission of ISSocs between Synechococcus OS-A and Synechococcus OS-B′.
Genome regions showing unusually high nucleotide sequence conservation between organisms are evidence of a recent lateral gene transfer event. Two lines of evidence from our data set suggest recent lateral gene transfer in Synechococcus. First, NAID between copies of ISSoc1, ISSoc2, ISSoc7, and ISSoc8 from Synechococcus OS-A and Synechococcus OS-B′ is >92%, on average, and in many cases is >99%, whereas the overall average nucleotide identity (ANI) (20) between Synechococcus OS-A and Synechococcus OS-B′ is 82.7%. Second, a phylogenetic tree of the ISSoc1 family does not show genome-specific branching (Fig. 2).
Fig. 2. Phylogenetic analysis of the ISSoc1 subfamily does not show species-specific branching. All full-length copies of ISSoc1 from both Synechococcus OS-A and Synechococcus OS-B′ were used to create a rooted neighbor-joining tree, using the maximum ...
Such equivalent sequence variation is unlikely to appear independently in two lineages. It is also unlikely that these sequences were present in the last common ancestor but avoided the genetic drift observed for the other orthologs. Thus, we conclude that there are ongoing exchanges of DNA between these two species that have led to this distribution.
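The reasoning above rests on comparing the identity of an IS pair with the genome-wide background. As a minimal sketch, the code below computes percent identity over a pairwise alignment and flags pairs that are well above the 82.7% ANI quoted in the text. The `margin` parameter and the short example sequences are assumptions made for illustration; they are not criteria or data from the study.

```python
GENOME_ANI = 82.7  # percent; genome-wide ANI between OS-A and OS-B' quoted in the text


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def exceeds_background(identity, background=GENOME_ANI, margin=5.0):
    """True if identity is at least `margin` points above the background.

    `margin` is a hypothetical threshold chosen for this sketch.
    """
    return identity >= background + margin


# Toy aligned sequences (invented): 9 of 10 ungapped columns match.
identity = percent_identity("ATGCATGCAT", "ATGCATGCAA")
print(identity, exceeds_background(identity))
```

A real comparison would use full-length IS alignments; the point here is only that an IS pair at >92% identity stands out against an 82.7% genome-wide background.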
The mechanism may be a general DNA transfer event. We identified a syntenic region with >95% NAID (Synechococcus OS-A positions 683,000 to 697,900 and Synechococcus OS-B′ positions 2,208,000 to 2,223,800) that contains an intact ISSoc1 (CYA_IS004/CYB_IS002), demonstrating that large genomic segments can move between the two species and carry ISSocs with them (Fig. 3A). It is not known if a specific transfer machinery is required for such large genomic regions to be moved between organisms or whether the natural transformation and competence uptake system present in these and other cyanobacteria (2) can function in this capacity.
Fig. 3. Putative laterally transferred regions contain ISSoc sequences. The syntenic regions displayed (A, B, and C) are >93% identical (nucleotide identity). Numbers indicate genome coordinates. The OS-B′ region in panel C is presented reversed ...
ISSoc activity in natural populations.
Many of the metagenomic sequence reads containing IS sequences were fully syntenic with one or both reference genomes (936 [66%] of the Synechococcus OS-A-like sequences, 861 [43%] of the Synechococcus OS-B′-like sequences, and 222 [45%] of the ambiguous sequences), indicating that many of the ISSoc insertions (i.e., insertion of a specific ISSoc at a particular location) in the reference genomes (particularly for Synechococcus OS-A) are common in the population. The lower percentage of reads syntenic to Synechococcus OS-B′ indicates a larger degree of variation within the Synechococcus OS-B′-like population, an observation consistent with earlier analysis of the metagenomic data set (4).
Metagenomic reads containing ISSoc sequences that were not syntenic with their reference genome were examined further to see if they provided evidence of IS activity in the natural environment. Metagenomic sequence reads were screened for those that had two nonoverlapping alignments to noncontiguous regions of one of the reference genomes. These sequence reads were considered evidence of an IS insertion event if one of the alignment regions consisted of an ISSoc sequence and was not uniquely mappable (Fig. 4A) or if the genomic region between the two alignment regions consisted entirely of IS sequence (Fig. 4B). For some genomic locations, multiple metagenomic reads showed identical IS events. This may indicate the presence of subpopulations with alternate genomic structures within the community. We do not know, however, if the spatial distribution of these subpopulations within the larger community is patchy or even; it is possible that these insertions have caused mutations that are beneficial in specific microniches in the environment.
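The two screening criteria described above can be expressed as a small decision rule. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Alignment` record, the `classify_read` function, the half-open coordinate convention, and the `min_gap` cutoff are all hypothetical, and the sketch assumes that fully syntenic reads have already been removed upstream.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Alignment:
    """One alignment block of a read against a reference (half-open coordinates)."""
    read_start: int
    read_end: int
    ref_start: int
    ref_end: int
    is_issoc: bool  # the aligned reference segment is ISSoc sequence
    unique: bool    # the block maps to a single reference location


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def gap_is_all_is(gap_start, gap_end, is_intervals):
    """True if [gap_start, gap_end) is non-empty and fully covered by IS intervals."""
    if gap_end <= gap_start:
        return False
    pos = gap_start
    for start, end in sorted(is_intervals):
        if start > pos:
            return False  # position `pos` is not covered by any IS interval
        if end > pos:
            pos = end
        if pos >= gap_end:
            return True
    return False


def classify_read(a: Alignment, b: Alignment,
                  is_intervals: List[Tuple[int, int]],
                  min_gap: int = 50) -> Optional[str]:
    """Return 'A', 'B', or None for a read with two alignment blocks.

    'A': one block is non-uniquely mappable ISSoc sequence and the other is a
         unique, non-IS flank (IS present in the read at this flank).
    'B': both blocks are separated on the reference by a gap made up entirely
         of annotated IS sequence (IS present in the reference, absent in the read).
    `min_gap` is a hypothetical cutoff for treating blocks as noncontiguous.
    """
    if overlaps(a.read_start, a.read_end, b.read_start, b.read_end):
        return None
    for block, other in ((a, b), (b, a)):
        if block.is_issoc and not block.unique and other.unique and not other.is_issoc:
            return "A"
    left, right = sorted((a, b), key=lambda blk: blk.ref_start)
    gap_start, gap_end = left.ref_end, right.ref_start
    if gap_end - gap_start < min_gap:
        return None
    if gap_is_all_is(gap_start, gap_end, is_intervals):
        return "B"
    return None
```

For example, a read whose first 400 bases map uniquely to a non-IS flank and whose remaining bases map to a multicopy ISSoc would be scored as case A, whereas a read that joins two flanks separated on the reference only by an annotated IS would be scored as case B.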
Fig. 4. Evidence of ISSoc activity detected by comparative analysis. (A) Incongruous read analysis. Metagenome sequences with two (or more) nonoverlapping regions mapped to nonadjacent areas of the reference genome, and one region consisted entirely of ISSoc ...
All subfamilies of ISSocs found in Synechococcus OS-A were found to show evidence of transposition activity. The number of events for each ISSoc was directly proportional to the abundance of that ISSoc in the genome, suggesting that all ISs in the Synechococcus OS-A-like population have similar rates of activity. In the Synechococcus OS-B′-like bin, a larger proportion of events was observed for ISSoc1, and no activity was observed for ISSoc14 (although this may simply be due to trying to score a rare event). ISSoc1 and ISSoc2 were found in both species, but different activities were observed. ISSoc2 is highly abundant and showed many events in Synechococcus OS-A but is sparse and showed few events in Synechococcus OS-B′. Conversely, ISSoc1 is highly abundant and displayed many insertion events in Synechococcus OS-B′ and was moderately abundant in Synechococcus OS-A, with a directly proportional number of events observed. While it is possible that bursts of activity after the lineages separated led to the observed inequality in IS content, maintaining this difference over long periods would be unlikely if, as suggested by our data, copies of these ISs are passed between the two subpopulations.
Another possible explanation for the observed variation in IS abundance is the difference in the thermal environments of these two organisms. Synechococcus OS-A dominates in mats that experience temperatures of 58 to 65°C, while Synechococcus OS-B′ dominates in mats where the temperature fluctuates from 51 to 61°C. The enzymes within these organisms have likely evolved to be most active in their native temperature range, and it is reasonable to assume that different ISSoc transposases have different optimal temperatures for activity. An ISSoc transferred between the two populations could encounter temperatures at which it has poor or no activity, resulting in a lower abundance in that population. This could also be the mechanism for the subfamily restriction observed between the two populations. Temperature has not yet been shown to be a selective force shaping these populations, but deep sequencing data derived from specific common genes (e.g., those for photosynthesis) across the temperature gradient in the microbial mats may provide evidence for changes in protein structure and activity as a function of temperature.
Detection of deleterious mutations.
Transposition events that compromise a critical genetic locus (be it a coding or regulatory sequence) can result in the death of the individual, making these mutations difficult to study in culture, but the metagenomic approach employed here captured a snapshot of the population, including individuals with potentially lethal mutations that had not yet been selected. We identified a number of interrupted genes by examining the locations of putative IS insertions in the metagenome data set (see Table S3 in the supplemental material). Many of these interrupted genes have poorly characterized functions and are probably not critical to cell survival; however, some have functions that are likely critical or highly advantageous to the cell. These include dgkA (encoding diacylglycerol kinase, involved in phospholipid biosynthesis) and purE (encoding the phosphoribosylaminoimidazole carboxylase catalytic subunit, involved in purine biosynthesis) in the Synechococcus OS-A-like bin and, more dramatically, gyrA (encoding the DNA gyrase subunit, involved in maintenance of DNA topological isomers) and dnaX (encoding the DNA polymerase III subunit, involved in DNA replication) in the Synechococcus OS-B′-like bin. This is evidence that ongoing IS activity in the population is producing selectable mutations and thus that IS-induced mortality is an ongoing process that affects survival in the environment.
Application of theoretical models.
The low abundance of ISs in most bacterial and archaeal genomes suggests that, in general, ISs are deleterious to individuals that acquire them and, furthermore, that continued IS accumulation in a population will cause its extinction. So how do some organisms tolerate a high abundance of ISs without going extinct?
Transmission, the movement of ISs between individuals, has been suggested to be critical for IS persistence within a host population (28). A transposition event can be fatal to the host and therefore also to the resident IS. If all individuals in a population harboring copies of the IS die, a situation most likely when only a few individuals in the population are infected, the IS becomes extinct in the population. Thus, it is beneficial to the IS to have a nonvertical mechanism for transmission; it can then be passed to other members of the population regardless of its effect on its host. However, if the transmission rate is high enough to make the IS ubiquitous in the population, then the population might be susceptible to extinction if the transposition rate causes IS-induced mortality to exceed the growth rate.
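The extinction condition stated above (IS-induced mortality exceeding the growth rate) can be illustrated with a toy calculation. This is a deliberately simplified sketch and not the model from reference 28: it assumes that every IS copy transposes independently at the same per-generation rate, that a fixed fraction of transpositions is lethal, and that the growth rate is expressed as a per-generation fraction. All function names and parameter values are hypothetical.

```python
def is_induced_mortality(copies, transposition_rate, lethal_fraction):
    """Per-generation probability that a cell suffers at least one lethal transposition.

    Assumes independent IS copies with identical rates (a simplifying assumption).
    """
    p_lethal_per_copy = transposition_rate * lethal_fraction
    return 1.0 - (1.0 - p_lethal_per_copy) ** copies


def population_declines(growth_rate, copies, transposition_rate, lethal_fraction):
    """True if IS-induced mortality exceeds the per-generation growth rate."""
    mortality = is_induced_mortality(copies, transposition_rate, lethal_fraction)
    return mortality > growth_rate


# Hypothetical parameters: 71 copies, half of all transpositions lethal.
for rate in (1e-4, 1e-2):
    mortality = is_induced_mortality(71, rate, 0.5)
    print(rate, round(mortality, 4), population_declines(0.1, 71, rate, 0.5))
```

Under these invented parameters, a 100-fold increase in the transposition rate is enough to move the toy population from persistence to decline, which is the qualitative behavior the paragraph describes.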
We identified several syntenic genome regions in Synechococcus OS-A and Synechococcus OS-B′ with unusually high sequence conservation, which is indicative of recent lateral transfer. Some of these laterally transferred regions contain whole and/or partial ISs (for examples, see Fig. 3), demonstrating that ISs can move between species by this mechanism. The phylogenetic analysis of ISSoc1 sequences from the two isolates (Fig. 2) also supports this hypothesis, since the branching pattern indicates that sequences between species can be related more closely than sequences within the species. Since this represents exchange across a species boundary, it is fair to assume that transfer between individuals of the same species occurs at an equivalent, if not higher, rate.
Several lines of evidence suggest that ISs in these populations have a high transmission rate. First, there is a direct positive correlation between the transmission rate and the number of IS families in an organism (32). We identified multiple IS families resident in the Synechococcus OS-A (8 families) and Synechococcus OS-B′ (11 families) genomes. Second, our analysis of the metagenome suggests that the IS content of natural populations closely matches that of the cultured isolates. This suggests an even distribution of these IS families among individuals in the population (i.e., most families are present in most individuals). This type of even, ubiquitous distribution of IS families with various abundances is unlikely without a high transmission rate. In populations with a low transmission rate, low-abundance IS families would be cured from individuals at a certain rate, leading to patchy distribution in the population. A high transmission rate would promote the reintroduction of an IS family to an individual that had been cured of it. Third, several IS families are shared between the two species, suggesting that transmission is not only common but also somewhat promiscuous. The ability of an IS family to have a reservoir in an entirely separate host lessens the likelihood of it being eliminated from the community. Finally, phylogenetic analysis of the ISSoc1 family shows it to be different from the other ISSoc families in that it is quite diverse (see Fig. S2 in the supplemental material), raising the possibility that the high abundance of ISSoc1 in the reference genomes is due to the introduction of many different variants via lateral gene transfer rather than to duplication of resident ISSoc1s.
Any time that an IS transposes, it may cause a mutation that is deleterious to the host cell. Total IS abundance will therefore be greater where there is a lower chance of an insertion event negatively impacting the cell, and natural selection ultimately controls the number of ISs in a cell (32). A simple mechanism for surviving a high IS abundance is for the ISs to have little (or no) activity. For the populations examined in this study, this mechanism would require all 15 resident IS families to have little or no activity. This seems unlikely because the genomic arrangements observed in the metagenomic data set are consistent with recent and ongoing IS activity.
Another mechanism for reducing deleterious effects of IS activity is for the IS to insert preferentially into genomic regions under neutral selection (i.e., intergenic regions or noncritical genes such as other ISs). Nothing is known about insertion site specificities for the ISSocs beyond the insertion site signals we identified; however, we detected many gene interruptions in the metagenomic data set, some of which were likely deleterious to cell survival (see Table S3 in the supplemental material). Thus, there does not appear to be a strong bias against insertion into selectable loci, and the observed intergenic distribution of ISSocs in the genomes is more likely the result of natural selection on the host populations than of insertion site selection by the IS.
The Synechococcus OS-A-like population has fewer unique genomic locations where IS insertions are observed than the Synechococcus OS-B′ population (see Fig. S1 in the supplemental material). One interpretation of this observation is that under the environmental conditions acting upon these two populations, there are fewer sites within the Synechococcus OS-B′ genome where IS insertion is deleterious. With more available sites for insertion in the Synechococcus OS-B′ genome, one might think that the number of ISs in Synechococcus OS-B′ would be higher than that in Synechococcus OS-A; however, Synechococcus OS-A and Synechococcus OS-B′ have similar IS abundances. Insertion site availability is therefore not the only factor controlling abundance in the genomes.
A third mechanism for circumventing deleterious mutations is for the cell to have multiple copies of the genome present in each cell. Experiments have shown that Synechococcus species can maintain multiple genome copies per cell (5). Polyploidy would enhance the ability of the cell to survive IS infection in two ways. First, should a transposition event interrupt a critical gene, it would not be fatal to the host because there would still be an intact copy of the interrupted gene on the other copy or copies of the chromosome. Second, having another copy of the genome in the cell provides a template for repair of genes impacted by IS insertion or other mutation. Further experiments examining the relationship between DNA content and IS abundance are required to test this hypothesis.
Ecological effects of transposition.
There is an ongoing debate over whether mobile genetic elements (MGEs) are purely selfish obstacles to survival of the individual or provide a selective advantage, even if it is sporadic or slight (18). This question takes on greater importance for populations that have a high MGE abundance, such as the thermophilic cyanobacteria studied here. Does the higher abundance mean that populations are more susceptible to extinction, or is the selective advantage magnified? One possible scenario hinges on the extreme environment in which these organisms live. The hot spring microbial mat environment can change quite rapidly. Seismic activity affects the underground hydrology, leading to changes in temperature, flow rate, and chemical content of the effluent water. In addition, weather (rain, hail, etc.) can affect water chemistry and temperature or destroy the mats, necessitating successional reestablishment. Thus, there are many chances for strong selective sweeps to affect the community, which could lead to founder effects and small effective population sizes.
IS transposition could alleviate this situation by establishing many varied mutations in the population over a short period. If a mutation is deleterious under the current environmental conditions and purified from the population, the constrained nature of transposition (i.e., the movement of ISs is not entirely random) allows a similar or identical mutation to occur should conditions change such that it would be beneficial (or at least no longer deleterious). This could, in turn, lead to a rapid drift in the population that would expand the effective population size in a short period. Such a mutational mechanism could play a role in maintaining genetic variability in organisms lacking sexual recombination.