The genomes of the two closely related freshwater thermophilic cyanobacteria Synechococcus sp. strain JA-3-3Ab and Synechococcus sp. strain JA-2-3B′a(2-13) each host several families of insertion sequences (ISSoc families) at various copy numbers, resulting in an overall high abundance of insertion sequences in the genomes. In addition to full-length copies, a large number of internal deletion variants have been identified. ISSoc2 has two variants (ISSoc2-1 and ISSoc2-2) that are observed to have multiple near-exact copies. Comparison of environmental metagenomic sequences to the Synechococcus genomes reveals novel placement of copies of ISSoc2, ISSoc2-1, and ISSoc2-2. Thus, ISSoc2-1 and ISSoc2-2 appear to be active nonautonomous mobile elements derived by internal deletion from ISSoc2. Insertion sites interrupting genes that are likely critical for cell viability were detected; however, most insertions either were intergenic or were within genes of unknown function. Most novel insertions detected in the metagenome were rare, suggesting a stringent selective environment. Evidence for mobility of internal deletion variants of other insertion sequences in these isolates suggests that this is a general mechanism for the formation of miniature insertion sequences.
Transposable elements (TEs), DNA segments that can relocate within a genome, are common in eukaryotes, bacteria, and archaea (10). Transposition events can result in deleterious mutations within the host cell (21, 24, 29, 40). Indeed, the genomes of most prokaryotes sequenced so far show a low transposon abundance, and the number of transposons present in a genome is generally positively correlated with genome size (39). TEs fall into two categories. Autonomous TEs (ATEs) encode a transposase, the enzyme that catalyzes the excision and reinsertion of the sequence elsewhere in the genome. Nonautonomous TEs (NTEs) carry the sequence signals required for transposition but have no coding regions within them, so they rely on the transposase from an ATE for activity (20).
The best-studied class of NTEs is the miniature inverted-repeat transposable elements (MITEs). MITEs were first identified as “transposon-like elements” in Neisseria gonorrhoeae (12), but much of the early investigation was performed in plant species (7–9). There are now dozens of identified MITE families from plant and animal genomes (20). MITEs consist simply of a short central region flanked by terminal inverted repeats (TIRs) that are the recognition site for the cognate transposase. MITEs duplicate their target site upon insertion, so they are flanked by terminal direct repeats (TDRs). Most MITE families are associated with known ATEs through similarity between their TIRs. While MITEs are abundant and well studied in eukaryotes, we are only beginning to understand their distribution and importance in bacterial populations (16, 17).
The recent availability of genome sequences has enabled the study of MITEs in bacteria. Genome comparison between closely related organisms has been used to discover small repeats carrying TIRs in Neisseria (12, 28), Rickettsia (32), Pneumococcus (33), Caulobacter (11), the enterobacteria (15, 22, 35), and the cyanobacteria (27, 40, 41). Bacterial MITEs vary greatly in sequence and structure and have been observed to affect the expression and function of genes in their hosts. MITEs in Neisseria have been shown to affect the stability of cotranscribed RNA (13, 28). Other MITEs have putative transcriptional signals in their TIRs, possibly stimulating transcription of adjacent genes (6). MITEs do not encode any genes, yet some have short open reading frames in their TIRs that can fuse potentially functional motifs to genes into which they have inserted, possibly altering gene function (1, 15, 32). Little is known about the origin of MITE sequences in bacteria, and transposition activity is largely inferred from comparisons between genome sequences of related cultured isolates or even different species, although biologically verified transposition activity of MITEs in Enterobacter cloacae and Pseudomonas syringae has recently been reported (34, 37).
The thermophilic Synechococcus strains Synechococcus sp. strain JA-3-3Ab (Synechococcus OS-A) and Synechococcus sp. strain JA-2-3B′a(2-13) (Synechococcus OS-B′) are members of the photosynthetic microbial mat community at Octopus Spring in Yellowstone National Park (19). Their genomes have been sequenced and shown to have an unusually high abundance of a simple class of TEs known as insertion sequences (ISs) (4, 31). Synechococcus OS-A harbors 71 full-length ISs from 9 IS subfamilies (termed ISSocs), and Synechococcus OS-B′ harbors 82 full-length ISs from 12 ISSoc subfamilies (4). Six of the ISSoc subfamilies have near-identical copies in both populations, whereas most orthologous genes in the two isolates share only ~83% nucleotide identity. This unusual level of sequence conservation, together with evidence of recent lateral gene transfer between these two populations (4, 31), suggests that these ISSocs have been passed between the two species. In addition to the full-length copies, there are numerous partial copies in the genomes. Some of these partial copies are missing one or both termini of the ISSoc sequence, while others are internal deletions that preserve the termini. A metagenomic data set generated from samples taken from the same mats from which the cultured isolates were obtained (4) allows examination of transposition activity in this system. Comparison of the genome sequences against the metagenomic data set has been used to precisely define the ends of the ISs and gauge transposition activity in the environmental population (31).
Further examination of this data set has revealed two novel, small, nonautonomous transposable elements derived by internal deletion of ISSoc2. Structural features distinguish them from other known bacterial NTEs. Further, recent activity of these novel NTEs in the natural environment is demonstrated by the variation in the insertion locations observed in the metagenome relative to the isolate genomes.
The genomes of Synechococcus OS-A and Synechococcus OS-B′ have previously been published (4). They are available from GenBank (accession numbers CP000239 and CP000240). The metagenomic data set consists of 202,329 paired-end sequence reads derived from 105,373 fosmid clones. Its generation has been described (4). The maximum insert size tolerated by the fosmid vector is 10 kb; the average sequence length in the data set is 829 bp. The sequences are available from GenBank (NCBI project_ids 20717, 20719, 20721, 20723, 20725, and 20727).
The metagenomic data set was searched (Megablast) for reads with similarity to ISSoc2 or its internal deletion variants, ISSoc2-1 and ISSoc2-2. Reads were binned according to the ISSoc2 sequence variant to which they had the highest similarity (typically ≥95% nucleic acid identity [NAID]).
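The best-hit binning step can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes the Megablast hits have already been parsed into hypothetical `(read_id, variant, percent_identity, bitscore)` tuples, ranks hits by bitscore, and applies a fixed 95% identity floor, whereas the text describes that cutoff only as typical.

```python
from typing import Dict, List, Tuple

# Hypothetical parsed hit: (read_id, variant, percent_identity, bitscore).
Hit = Tuple[str, str, float, float]


def bin_reads_by_variant(hits: List[Hit], min_identity: float = 95.0) -> Dict[str, str]:
    """Assign each read to the ISSoc2 variant that gives its best-scoring hit.

    Reads whose best hit falls below min_identity are left unbinned.
    Ties keep the first hit seen.
    """
    best: Dict[str, Tuple[float, str, float]] = {}
    for read_id, variant, identity, score in hits:
        current = best.get(read_id)
        if current is None or score > current[0]:
            best[read_id] = (score, variant, identity)
    return {read_id: variant
            for read_id, (score, variant, identity) in best.items()
            if identity >= min_identity}
```

For example, a read hitting ISSoc2 with a higher bitscore than ISSoc2-1 is binned as ISSoc2, and a read whose best hit is only 90% identical is dropped.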
The nucmer alignment program from the MUMmer package (14) was used to align all the metagenome reads against both genomes. Identifying reads as “Synechococcus OS-A-like” or “Synechococcus OS-B′-like” was accomplished in a stepwise manner. The first pass identified sequences that aligned to a reference genome with ≥92% NAID contiguously across ≥95% of the read length. Since the two reference genomes show massive rearrangement relative to each other and the metagenome demonstrates that rearrangements are common in the natural population (data not shown), the second pass identified sequences that had multiple (usually two) alignment regions with ≥92% NAID that were noncontiguous and nonoverlapping and whose lengths summed to ≥95% of the read length. Clone membership was used to draw in reads that did not meet the above criteria: clone mates of binned reads were screened for those having ≥92% NAID across ≥50% of their length (noncontiguous) to the same reference as their mate. Any read whose clone mate was not a member of the same species-specific bin (Synechococcus OS-A-like or Synechococcus OS-B′-like) was removed from the bin. If both clone mates met the criteria for both Synechococcus OS-A and Synechococcus OS-B′ (i.e., appeared to derive from a genome region with an unusually high sequence identity between the two genomes) or if clone mates were members of opposite bins, the sequences were put into a “Synechococcus OS-A/B′-like” bin. If only one or neither mate met the criteria for either Synechococcus OS-A or Synechococcus OS-B′, the sequence reads were classified as “other.” Reads lacking a clone mate were binned based on their own characteristics.
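The stepwise species binning can be sketched as below, under stated assumptions. Alignment blocks are hypothetical `(read_start, read_end, percent_identity)` tuples per reference (`"A"` for OS-A, `"B"` for OS-B′), non-overlapping blocks are chosen greedily longest-first, and the clone-mate rules are one reading of the description above, not the authors' code. Because a single block covering ≥95% of a read is also a valid set of non-overlapping blocks, one coverage test handles both passes.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical alignment block on a read: (read_start, read_end, percent_identity),
# 0-based half-open read coordinates, e.g. parsed from nucmer output.
Block = Tuple[int, int, float]
MIN_IDENTITY = 92.0


def covered_fraction(blocks: List[Block], read_len: int) -> float:
    """Fraction of a read covered by non-overlapping blocks of >=92% identity.

    Blocks are taken longest-first and skipped if they overlap a block already
    kept (a greedy simplification of the 'nonoverlapping' rule).
    """
    good = sorted((b for b in blocks if b[2] >= MIN_IDENTITY),
                  key=lambda b: b[1] - b[0], reverse=True)
    kept: List[Block] = []
    for start, end, ident in good:
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, ident))
    return sum(end - start for start, end, _ in kept) / read_len


def passing_refs(blocks_by_ref: Dict[str, List[Block]], read_len: int,
                 min_frac: float) -> Set[str]:
    """References ("A" and/or "B") for which the read meets the coverage test."""
    return {ref for ref, blocks in blocks_by_ref.items()
            if covered_fraction(blocks, read_len) >= min_frac}


def call(strict: Set[str]) -> Optional[str]:
    """Single-read call from the strict (>=95%) test: "A", "B", "AB", or None."""
    if strict == {"A", "B"}:
        return "AB"
    return next(iter(strict)) if len(strict) == 1 else None


LABELS = {"A": "OS-A-like", "B": "OS-B'-like", "AB": "OS-A/B'-like", None: "other"}

# Each mate is summarized as (strict_refs, relaxed_refs): references passing
# the >=95% and >=50% coverage tests, respectively.
Mate = Tuple[Set[str], Set[str]]


def bin_clone(mate1: Mate, mate2: Optional[Mate] = None) -> str:
    c1 = call(mate1[0])
    if mate2 is None:                 # no clone mate: bin the read on its own
        return LABELS[c1]
    c2 = call(mate2[0])
    if {c1, c2} == {"A", "B"} or c1 == c2 == "AB":
        return LABELS["AB"]           # opposite bins, or both mates shared
    for mine, mate_call, mate_relaxed in ((c1, c2, mate2[1]), (c2, c1, mate1[1])):
        if mine in ("A", "B") and (mate_call == mine or mine in mate_relaxed):
            return LABELS[mine]       # mate agrees, or is drawn in at >=50%
    return LABELS[None]
```

In this reading, a read that passes the strict test for OS-A pulls in a mate that covers only ~60% of its length at ≥92% identity, and both are binned as OS-A-like.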
Metagenome sequence reads were searched against the reference genomes using nucmer (14). Results were screened for reads with multiple alignment regions having >92% NAID that overlapped by less than 67% of the length of either alignment region and were nonadjacent (adjacency being defined as lying within 50 nucleotides [nt] of each other, to allow for short indels). Sequence regions containing IS segments usually aligned to many locations in the genome. If any of the IS alignments in a read met the synteny criteria, the read was not considered further.
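The split-alignment screen can be sketched as a pairwise test over alignment regions. The `Region` fields are hypothetical, and calling two regions "adjacent" only when they lie within 50 nt of each other on both the read and the reference is our assumption about how the short-indel allowance was applied; reverse-strand hits are assumed to have been normalized so that `ref_start < ref_end`.

```python
from itertools import combinations
from typing import List, NamedTuple


class Region(NamedTuple):
    read_start: int   # 0-based, half-open coordinates on the read
    read_end: int
    ref_start: int    # coordinates on the reference genome (ref_start < ref_end)
    ref_end: int
    identity: float   # percent identity of the aligned region


def overlap(a0: int, a1: int, b0: int, b1: int) -> int:
    """Length shared by intervals [a0, a1) and [b0, b1)."""
    return max(0, min(a1, b1) - max(a0, b0))


def gap(a0: int, a1: int, b0: int, b1: int) -> int:
    """Distance between two intervals (0 if they touch or overlap)."""
    return max(0, max(a0, b0) - min(a1, b1))


def has_split_alignment(regions: List[Region], min_identity: float = 92.0,
                        max_overlap_frac: float = 0.67, adjacency_nt: int = 50) -> bool:
    """True if two >92%-identity regions are distinct parts of the read and not adjacent."""
    good = [r for r in regions if r.identity > min_identity]
    for a, b in combinations(good, 2):
        shortest = min(a.read_end - a.read_start, b.read_end - b.read_start)
        if overlap(a.read_start, a.read_end, b.read_start, b.read_end) >= max_overlap_frac * shortest:
            continue  # the two regions cover largely the same part of the read
        read_gap = gap(a.read_start, a.read_end, b.read_start, b.read_end)
        ref_gap = gap(a.ref_start, a.ref_end, b.ref_start, b.ref_end)
        if read_gap <= adjacency_nt and ref_gap <= adjacency_nt:
            continue  # one alignment broken by a short indel
        return True
    return False
```

A read carrying a novel insertion, whose two flanks map next to each other on the reference but are separated on the read by the element, passes this screen; a read split only by a short indel does not.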
Members of the ISSoc2 subfamily of insertion sequences carry two genes, encoding a transposase and a resolvase. Sequence and structural similarity to IS607 from Helicobacter pylori places the ISSoc2s within the IS200/IS605 family in the Chandler-Mahillon nomenclature scheme (36). ISSoc2 is the most abundant IS in Synechococcus OS-A, with 19 intact copies, and there are two intact copies in Synechococcus OS-B′. In addition to these intact copies, there are many partial copies, which we categorized as truncations (lacking a segment of sequence that includes one terminus of the IS), fragments (lacking segments at both termini), or internal deletions (with both ends intact but missing internal sequence) (28).
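The three categories of partial copies can be expressed as a rule on which parts of the full-length element a copy matches. In this sketch the matched segments are hypothetical 0-based, half-open intervals in full-length IS coordinates, and the 10-nt tolerance for calling a terminus "intact" is an illustrative assumption, not a value from this study.

```python
from typing import List, Tuple


def classify_copy(segments: List[Tuple[int, int]], is_length: int,
                  end_tolerance: int = 10) -> str:
    """Classify an IS copy from the intervals of the full-length IS it matches."""
    if not segments:
        raise ValueError("copy must match at least one segment of the IS")
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(segments):          # merge overlapping intervals
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    has_left = merged[0][0] <= end_tolerance
    has_right = merged[-1][1] >= is_length - end_tolerance
    internal_gap = any(b0 - a1 > end_tolerance
                       for (_, a1), (b0, _) in zip(merged, merged[1:]))
    if has_left and has_right:
        return "internal deletion" if internal_gap else "full-length"
    if has_left or has_right:
        return "truncation"      # one terminus missing
    return "fragment"            # both termini missing
```

With a hypothetical 1,500-nt element, a copy matching positions 0 to 300 and 1,200 to 1,500 is an internal deletion, while one matching only 200 to 900 is a fragment.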
While the truncations and fragments show heterogeneity in their length and termini, two internal deletion variants (ISSoc2s) were observed to have multiple near-identical copies. ISSoc2-1 has 21 copies in Synechococcus OS-A and 44 in Synechococcus OS-B′, while ISSoc2-2 has 9 copies in Synechococcus OS-A and 10 in Synechococcus OS-B′. The ISSoc2s have both structural and sequence differences (Fig. 1; see Fig. S1 in the supplemental material); however, all copies of ISSoc2-1 share >97% NAID, as do all copies of ISSoc2-2. The terminal sequences of both variants are identical to those of ISSoc2, but ISSoc2-1 has more internal sequence similar to ISSoc2, whereas ISSoc2-2 has a sequence segment of unknown origin.
We identified 494 sequences in the metagenomic data set that contained ISSoc2-specific sequence, 571 that contained ISSoc2-1-specific sequence, and 223 that contained ISSoc2-2-specific sequence (Table 1). Most of the metagenomic sequences containing ISSoc2, ISSoc2-1, or ISSoc2-2 (1,024 of 1,288) could be confidently assigned as being derived from either Synechococcus OS-A-like or Synechococcus OS-B′-like individuals in the community. An additional 159 derived from regions that appear to have been recently transferred laterally between Synechococcus OS-A and Synechococcus OS-B′ (i.e., syntenic and sharing >95% NAID), making it impossible to determine from which of the two they derived. The remaining reads (109) had their ISs masked and were searched against the NCBI nucleotide database. Most (98) had either Synechococcus OS-A or Synechococcus OS-B′ as their best hit, with high identity (>90% NAID).
The internal deletion variants appear to be less abundant, relative to ISSoc2, in individuals from the natural environment than in the cultured isolates. The ratio of ISSoc2 to ISSoc2-1 to ISSoc2-2 in the Synechococcus OS-A genome is 2:2:1, whereas the ratio for reads identified as Synechococcus OS-A-like in the metagenome is 9:3:1; the ratio in the Synechococcus OS-B′ genome is 1:22:5, whereas the Synechococcus OS-B′-like metagenomic reads have a ratio of 1:5:2.
To search for evidence of ISSoc2 transposition activity in the environmental population, sequences from the metagenomic data set were compared to the Synechococcus OS-A and Synechococcus OS-B′ genomes. For presentation purposes, in this report we define “insertion” and “excision” events relative to the reference genomes; however, these observations cannot distinguish an insertion in the environmental sequence from an excision in the cultured isolate, or vice versa. The metagenomic sequences containing ISSoc2, ISSoc2-1, or ISSoc2-2 sequence were further examined to identify those showing an insertion at a location where the reference genome has none (insertion events) (Fig. 2).
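Because events are defined relative to the reference genomes, classifying a read reduces to comparing what the read contains with what the reference carries between the read's two flanking alignments. The sketch below assumes hypothetical inputs (flank coordinates on the reference, a list of reference element coordinates, and a 50-nt slop); as the paragraph notes, an "insertion event" here could equally be an excision in the cultured isolate.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # reference coordinates of an element copy


def reference_has_element(left_flank_end: int, right_flank_start: int,
                          ref_elements: List[Interval], slop: int = 50) -> bool:
    """True if a reference element copy lies between the read's two flanks."""
    return any(left_flank_end - slop <= start and end <= right_flank_start + slop
               for start, end in ref_elements)


def classify_read(left_flank_end: int, right_flank_start: int, read_has_element: bool,
                  ref_elements: List[Interval], slop: int = 50) -> str:
    in_ref = reference_has_element(left_flank_end, right_flank_start, ref_elements, slop)
    if read_has_element and in_ref:
        return "shared insertion"
    if read_has_element:
        return "insertion event"   # element in the read, not at this reference locus
    if in_ref:
        return "excision event"    # element at this reference locus, absent from the read
    return "no element"
```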
For ISSoc2, 339 sequences were categorized as Synechococcus OS-A-like (see Materials and Methods for details of sequence binning), of which 106 (31%) show insertions at 56 distinct alternate locations (Table 1). In the Synechococcus OS-B′-like bin, there are 69 reads with ISSoc2 sequence, of which 29 (42%) show alternate insertion locations at 17 distinct sites. Many alternate insertion locations were observed only once (29/55 and 10/17); others were found in up to 7 metagenomic sequences. These data serve as a baseline for environmental transposition activity due to the ISSoc2 transposase.
A similar pattern of activity was observed for the internal deletion variants in the Synechococcus OS-A-like bin (Table 1). For ISSoc2-1, 36 of 102 sequence reads (35%) show an alternate insertion location, as do 8 of 36 reads (22%) containing ISSoc2-2. In the Synechococcus OS-B′-like bin, 224 of 346 (65%) ISSoc2-1-containing metagenome sequences and 71 of 132 (54%) ISSoc2-2-containing metagenome sequences show an alternate insertion location. A ratio of ~2:1 insertion reads to distinct insertion sites is observed in all cases. As for intact ISSoc2, approximately half of all insertion sites were observed only once in the metagenomic data set.
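The summary statistics quoted in these two paragraphs (reads per distinct insertion site and the number of sites seen only once) follow from a simple tally of the alternate insertion site observed in each read. The site labels used below are hypothetical placeholders, not data from Table 1.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_sites(site_per_read: Iterable[str]) -> Tuple[int, int, int, float]:
    """Return (reads, distinct sites, sites seen once, reads per site)."""
    counts = Counter(site_per_read)
    n_reads = sum(counts.values())
    n_sites = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return n_reads, n_sites, singletons, n_reads / n_sites
```

For example, `summarize_sites(["s1", "s1", "s2", "s3", "s3", "s3"])` returns `(6, 3, 1, 2.0)`: six reads at three sites, one of them seen once, and a 2:1 ratio of reads to sites.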
Transposition activity was also evaluated by screening for metagenomic sequences that lack an ISSoc2, ISSoc2-1, or ISSoc2-2 insertion at locations where one exists in a reference genome (Fig. 2). We identified 106 metagenomic sequences in the Synechococcus OS-A-like bin derived from regions in Synechococcus OS-A where ISSoc2 is inserted. Of these, 36 (34%) did not contain an ISSoc2 (Table 2). At least one metagenomic read lacking ISSoc2 was identified for 11 of the 19 insertion sites in Synechococcus OS-A; however, since the metagenome provides only ~4.2× coverage of the Synechococcus OS-A genome sequence, our sampling may have missed “excision events” at the other eight sites. In the Synechococcus OS-B′-like bin, we identified only 14 sequences derived from the two regions where ISSoc2 is inserted in Synechococcus OS-B′. Of these, 6 (43%) lack ISSoc2 sequence.
ISSoc2-1 and ISSoc2-2 “excision events” are observed in both the Synechococcus OS-A-like and Synechococcus OS-B′-like bins (Table 2). Most insertion sites had at least one metagenomic sequence showing absence of the variant. Of the 109 Synechococcus OS-A-like metagenome sequences derived from regions containing ISSoc2-1 insertions in Synechococcus OS-A, 43 (39%) lacked an ISSoc2-1 insertion, as did 14 of 34 (41%) metagenome sequences from ISSoc2-2 insertion regions. Higher ratios of absence to presence were detected in the Synechococcus OS-B′-like bin, with 133 of 229 (58%) metagenome reads lacking ISSoc2-1 and 28 of 52 (54%) metagenome reads lacking ISSoc2-2.
Internal deletion variants of other ISSoc subfamilies were identified in our examination of Synechococcus OS-A and Synechococcus OS-B′. ISSoc1, the most abundant IS in Synechococcus OS-B′, has 12 internal deletion variants in Synechococcus OS-B′ (and one in Synechococcus OS-A), ISSoc5 has five in Synechococcus OS-A, ISSoc6 has 19 in Synechococcus OS-A, and ISSoc10 has four in Synechococcus OS-B′. Many of these internal deletion variants share structural similarity with other copies of their class. Sequence conservation between these genomic copies, however, is lower than was observed for the ISSoc2s. We screened the metagenome for copies of these internal deletion mutants to look for evidence of insertion/excision activity (Table 3).
One hundred four metagenomic sequences have regions with best similarity to internal deletion mutants of ISSoc1 (ISSoc1s). Most are syntenic with their cognate reference genome: only three (all in the Synechococcus OS-B′-like bin) show an alternate insertion location for the ISSoc1 sequence.
For ISSoc6, there were 200 metagenomic reads with regions similar to known ISSoc6 internal deletions (ISSoc6s). As expected (since ISSoc6 is found only in Synechococcus OS-A), all were in the Synechococcus OS-A-like bin. Ten showed alternate insertion locations, representing only 2 unique insertion events. Six sequences showed an insertion adjacent to locus CYA_IS00031 (encoding an ISSoc2 transposase). The other four sequences showed an intergenic insertion between CYA_2460 (encoding an aminotransferase) and CYA_2461 (encoding an oxidoreductase).
We have identified two classes of internal deletion variants of the Synechococcus insertion sequence ISSoc2, ISSoc2-1 and ISSoc2-2, members of which are conserved in size, sequence, and structure. Although they lack TIRs and DRs, the terminal regions are conserved. The structure of the transposition complex and mechanism of action have been determined for the IS200/IS605 family insertion sequence IS608 from H. pylori. The left and right recognition signals consist of short hairpins approximately 20 bp from the termini and an adjacent short region complementary to the genome insertion site (3). While the transposase in this system (HpTnpA) is not homologous to the transposase found in ISSoc2, this example does demonstrate that the transposition signals in insertion sequences lacking TIRs do reside in the termini, and thus it is reasonable to propose that ISSoc2s are competent for transposition.
ISSoc2-1 and ISSoc2-2 are not simply different internal deletions of ISSoc2. There are sequence differences and small regions of DNA of unknown origin in their interiors. We have observed only nearly identical copies of these variants; there are no intermediate or “transitional” variants that are more similar to the parental ISSoc2 sequence. Thus, it is likely that ISSoc2-1 and ISSoc2-2 were formed in independent events. The presence of identical copies in Synechococcus OS-A and Synechococcus OS-B′, along with evidence that ISSoc2 and other ISs have been transferred between Synechococcus OS-A and Synechococcus OS-B′ (4, 31), suggests that the ISSoc2s spread by lateral gene transfer.
Several factors distinguish ISSoc2s from MITEs. Based strictly on the definition of MITEs, ISSoc2s do not qualify, as they lack terminal inverted repeats and are not surrounded by direct repeats; functionally, however, ISSoc2s appear to be equivalent to MITEs. The origin of most other known bacterial MITEs is murky because the only sequence shared between described MITEs and their cognate autonomous transposable elements is the TIRs; the core region of known MITEs has no similarity to known TEs (20), the one exception being the mPing family of MITEs in rice, which has homology to the Ping TE (23, 25, 30). Thus, what truly distinguishes ISSoc2s from most known MITEs is the segments of the transposase gene that they contain in their core. MITEs have been categorized into two types (5). Type I MITEs have TIRs precisely identical to those of known ISs and are thought to derive through internal deletion of the intact IS. Type II MITEs have TIRs that are similar but not precisely identical to those of known ISs and thus are thought to have originated through convergent evolution. Our observations of the ISSoc2 elements provide clear evidence that they form through internal deletion of active ISs, and our detection of similarly structured variants of other insertion sequences suggests that this is a general mechanism for MITE formation in Synechococcus populations.
The available metagenome sequence allowed us to examine both the activity and impact of ISSoc2s in the natural population. We observed placement of ISSoc2 elements in novel locations relative to the genomes of Synechococcus OS-A and Synechococcus OS-B′ and observed the absence of these elements at locations where they exist in Synechococcus OS-A and Synechococcus OS-B′. This alternate placement does not appear to be due to general recombination events, because the endpoints of the insertions are precisely the borders of the ISSoc2 sequences. In addition, we do not believe these elements to be products of a senescence process that degrades intact ISSoc2 insertions, because there are conserved sequence differences among ISSoc2-1, ISSoc2-2, and ISSoc2, and we observe only near-identical copies of the internal deletion variants. Thus, the most likely explanation for this distribution is that these elements are functional nonautonomous transposable elements that are active in the natural populations of these organisms.
The ratio of sequences showing novel placement to unique insertion locations was ~2:1. Most locations were represented by a single sequence, while a few were represented by up to 10. That is, we do not observe large subpopulations carrying specific insertions that might represent beneficial (or even neutral) mutations. This suggests that these novel insertions are short-lived, because they are either selected against or excised (to insert elsewhere or to be lost).
In Synechococcus OS-A, ISSoc2-1 and ISSoc2-2 insertion and excision events are detected at rates similar to those of ISSoc2 insertion and excision events, suggesting that the variants and their parent element are acting at similar rates in the natural population. In Synechococcus OS-B′, however, insertion and excision events are detected at higher rates for ISSoc2-1 and ISSoc2-2 than for ISSoc2. Previous studies have described greater diversity in genomic structure in Synechococcus OS-B′ populations (4, 31). Transposable elements such as insertion sequences and the internal deletion variants described here could contribute to this variation by providing sites for homologous recombination within the chromosome. Since we observe IS activity in both populations but greater diversity in the Synechococcus OS-B′ population, we believe that environmental factors, rather than transposition rate, are the stronger determinant of genomic diversity. That is, both populations undergo similar rates of transposition, but fewer of the resulting variants persist in Synechococcus OS-A populations because of more stringent selection.
The stable growth conditions provided by laboratory culture might allow accumulation of ISSoc2s (and other transposable elements) in locations that would be detrimental or fatal to individuals in the natural environment. Analysis of the abundance of the ISSoc2s indicates that they are present in lower numbers in the natural population than in the cultured populations. Most insertion locations in the reference genomes are intergenic; however, some of these insertions may affect regulation of adjacent genes and thus affect viability. Some insertions that interrupt genes in the reference genomes were not observed in the metagenome; examples in Synechococcus OS-A are insertions of ISSoc2-1 into an acyl phosphatase gene (CYA_0362/CYA_0361) and into a DNA methyltransferase gene (CYA_1314/CYA_1313). This would be expected if these functions were required for survival in the natural environment.
While most ISSoc2 insertions are intergenic, interrupted coding genes were observed in both the reference genomes and the metagenome reads (Fig. 2; see Table S1 in the supplemental material). Many of the insertions interrupting coding sequences (CDS), however, are within 30 nucleotides (nt) of the 3′ end of the gene, making it unlikely that gene function is lost, although it may be altered. To gauge the selective pressure on insertions at each location observed, we compared the number of metagenome reads showing an insertion at that location and the number lacking an insertion (see Table S1 in the supplemental material).
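The position-based screen and the read comparison described here can be sketched as follows. The `CDS` record, 0-based half-open coordinates, and treating "within 30 nt" as a distance of at most 30 nt from the gene's last base (measured on the gene's own strand) are assumptions; the `occupancy` helper is one hypothetical way to express the comparison of reads with and without an insertion at a site.

```python
from typing import List, NamedTuple, Optional


class CDS(NamedTuple):
    locus: str
    start: int    # 0-based, half-open genome coordinates
    end: int
    strand: str   # "+" or "-"


def classify_site(pos: int, genes: List[CDS], window: int = 30) -> str:
    """Label an insertion position as intergenic, near a 3' end, or inside a CDS."""
    for gene in genes:
        if gene.start <= pos < gene.end:
            # Distance from the gene's last base, which depends on its strand.
            dist = gene.end - 1 - pos if gene.strand == "+" else pos - gene.start
            if dist <= window:
                return f"{gene.locus}: within {window} nt of 3' end"
            return f"{gene.locus}: interrupts CDS"
    return "intergenic"


def occupancy(n_with: int, n_without: int) -> Optional[float]:
    """Fraction of reads at a site that carry the insertion (None if no reads)."""
    total = n_with + n_without
    return n_with / total if total else None
```

The locus names used with this sketch are placeholders. `occupancy(7, 0)` returns `1.0`, the situation described in the next paragraph for the dnaX insertion, which was present in all 7 reads mapping to that region.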
While many of the insertions that were novel to the metagenome (i.e., not present in the reference genomes) were observed only once, suggesting that most insertion events produce variants that are quickly removed from the population by selection, a handful appear to be prevalent. In the Synechococcus OS-A-like bin, we identified an ISSoc2-1 insertion interrupting dnaX (GenBank locus_id CYA_0563, encoding the DNA polymerase III gamma and tau subunits) 13 nt from the 3′ end of the gene. This insertion was observed in all 7 metagenome reads that mapped to that region. The insertion causes the terminal 3 amino acids (LPF) to be replaced by the amino acid sequence HDSSQ. This change, being short and located at the carboxy terminus of the protein, is unlikely to affect the protein's function but could alter its stability. It is also unlikely that this insertion affects the transcriptional profile of this region, since the downstream CDS is in the opposite orientation. No rho-independent terminators have been predicted downstream of dnaX (26); however, there is a putative hairpin and a putative stem-loop structure that could be termination signals. The insertion separates the end of the dnaX CDS from these features, but ISSoc2-1 contains several putative hairpin structures that also may be transcription control features, including one of 30 bp that is only 30 nt from the insertion sequence terminus. In the Synechococcus OS-B′-like bin, an ISSoc2-1 insertion into the 5′ end of the era gene (CYB_1268, encoding a GTPase that in Escherichia coli is involved in coupling growth to cell division) was observed in all 9 metagenome reads that mapped to the region. In this case, translation is still possible from an open reading frame in the terminal region of ISSoc2-1 that contains an ATG start codon and fuses in frame to the era coding frame. Regulation of expression of this gene is likely altered by this mutation.
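Whether an ATG within an inserted element can drive translation into a downstream coding sequence, as proposed for era, comes down to two checks: the distance from the ATG to a codon of the downstream reading frame must be a multiple of three, and no stop codon may intervene. The sketch below is illustrative only; it uses a toy sequence rather than the ISSoc2-1 or era sequences, and the positions are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuses_in_frame(seq: str, atg_pos: int, target_codon_pos: int) -> bool:
    """True if an ATG at atg_pos reads, with no stop codon, into the codon at target_codon_pos.

    target_codon_pos is the 0-based start of a codon in the downstream gene's frame.
    """
    seq = seq.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        return False
    offset = target_codon_pos - atg_pos
    if offset < 0 or offset % 3 != 0:
        return False                 # out of frame, or target lies upstream
    return all(seq[i:i + 3] not in STOP_CODONS
               for i in range(atg_pos, target_codon_pos, 3))
```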
That these mutations appear to have become fixed (or at least prevalent) in the environmental populations suggests either that there is some beneficial effect to their presence or that they are neutral mutations linked to an advantageous trait.
Intergenic ISSoc2 insertions might also result in selectable phenotypes. Some intergenic insertions found in the reference genomes were not identified in any metagenomic read, whereas others were found in all metagenomic reads that mapped to that location. These biases in the representation of these insertions in the population likely reflect selection for or against them. Insertions at these sites may either disrupt or introduce regulatory elements that affect expression of adjacent genes.
Genome sequencing technology has enabled the discovery of many bacterial MITEs. The availability of complete genome sequences allows inter- and intragenomic comparisons that identify repeated sequences with TIRs and DRs. The simple structure of MITEs has also allowed the development of computational methods that can identify these elements independent of comparative analyses (27). However, our results demonstrate that reliance on the presence of TIRs might underestimate the number of NTEs present.
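The dependence on TIRs can be illustrated with a minimal, ungapped end comparison of the kind a TIR-based screen relies on. This is not the method of any cited tool; the 10-nt window and one-mismatch allowance are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeat(element: str, k: int = 10, max_mismatches: int = 1) -> bool:
    """Compare the first k nt with the reverse complement of the last k nt (ungapped)."""
    if len(element) < 2 * k:
        return False
    left = element[:k].upper()
    right_rc = reverse_complement(element[-k:]).upper()
    mismatches = sum(1 for a, b in zip(left, right_rc) if a != b)
    return mismatches <= max_mismatches
```

An element whose termini are conserved but are not inverted copies of each other, as described here for the ISSoc2 variants, fails this test regardless of whether it is mobile, which is how a TIR-based screen can undercount NTEs.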
These additional mobile elements affect the cell in a manner similar to that of the cognate IS. Transposition activity can result in gene interruptions or other mutations that affect the survivability of the host individual. To fully understand the role of these elements, long-term in vitro evolution studies will need to be performed to track the rates and patterns of transposition of both the IS elements and associated NTEs.
This work was supported by the Frontiers in Integrative Biology Program of the National Science Foundation (grant EF-0328698). D. Bhaya acknowledges support from the Carnegie Institution for Science.
Published ahead of print 4 May 2012
Supplemental material for this article may be found at http://jb.asm.org/.