We have detected associations between REP sequences and IS elements in 6 out of 19 analyzed genomes. Among the genomes without association, some contain no IS elements at all (
Rickettsia conorii), and others contain only a few (
A. tumefaciens). In addition, we have adopted a strict criterion to select the IS elements associated with REP sequences. In several cases, the limited number of IS copies does not allow an IS element to be classified as a REP recognizer (Table ). One such case could be IS1397 in
E. coli. It has been experimentally proven that IS1397 [
11,
12,
17] can insert into REP elements but, within the available genomes of Enterobacteriaceae, there are only two instances of IS1397, which are in
E. coli CFT073 (Table ). In one of these instances, IS1397 is clearly inserted into an
E. coli REP element [see Additional file
13], but does not fulfill the association criteria of our study. The associations with REP elements detected in the analyzed genomes are probably only the tip of the iceberg, and many more IS elements may choose REP sequences as their targets in natural isolates.
While many IS elements display little obvious target site selectivity, some display considerable selectivity [
18]. We have detected a set of elements displaying high target selectivity for REP sequences (elements with type 1 association, Table ). Experimental evidence for IS1397 and ISKpn1 suggests that the transposases themselves are responsible for target specificity [
12]. The results of our study show that REP targeting is not restricted to a single IS family but extends to five different IS families: IS3, IS110, IS4, IS256 and IS5. This is not surprising, since the features of the DNA target and of the transposase domain responsible for target choice are not included in the criteria used to define IS families [
16].
There are two families that include elements with experimental evidence of target selectivity for REP sequences. One is the IS3 family, which includes IS1397 and ISKpn1 [
11,
12,
17]. The other is the IS110 family, to which IS621 belongs [
10]. We have detected additional IS elements belonging to these two families that specifically transpose into REP elements. The IS3 family member for which we have detected a strong type 1 association with
P. syringae REP sequences is ISPsy8 (Tables and ). Members of the IS3 family are similar in many respects and form an extremely coherent and highly related family. Usually, the transposase is encoded by two ORFs that sometimes overlap. The OrfB products, similar to retroviral integrases [
19,
20], carry a DD(35)E motif and are responsible for catalytic activity. The target recognition capability is usually located in the OrfA protein, which in various members of the family exhibits a relatively strong helix-turn-helix motif that could provide sequence-specific binding to DNA [
21,
22]. Many members also carry a putative leucine zipper located at the end of OrfA that could be involved in multimerization [
23]. The N-terminal domain of the OrfA protein of ISPsy8 matches the Pfam hidden Markov model profile PF01527, named Transposase_8 (Table ). The region identified by this profile includes a helix-turn-helix motif at the N terminus followed by a leucine zipper motif that is also present in other IS3 family elements. This HTH motif is probably involved in DNA target choice. Experiments have shown that IS30 needs an H-HTH motif [
24], similar to the H-HTH motif involved in the DNA binding of the response regulator FixJ, in order to bind specific DNA target sequences [
24]. These data suggest that some transposases could recognize palindromic REP sequences in much the same way that some transcriptional regulators recognize their palindromic binding sites.
The IS110 family is the other family containing an element, IS621, with an experimentally proven REP sequence target [
10]. In our genomic analysis, this is the family most represented in the set of IS elements associated with REP sequences (Table ). IS110 is an unusual family of IS elements whose characteristics differ markedly from those of the other families. Most of its members lack inverted repeats flanking the transposase gene, and little overall similarity can be detected between the ends. The mechanism of transposition of these elements is not well understood. However, the target sequences of the members IS117 from
Streptomyces coelicolor and IS900 from
Mycobacterium paratuberculosis exhibit similarities to the circle junction, suggesting an insertion mechanism by site-specific recombination [
16,
25,
26]. The site-specific invertase Piv from
Moraxella lacunata also belongs to the IS110 family, because it exhibits amino acid homology with the transposases of this family. The tertiary structure of the amino-terminal domain of Piv invertase has been modelled [
27], based on crystal structures of catalytic domains of HIV-1 integrase [
28], avian sarcoma virus (ASV) integrase [
29], and Tn5 transposase-related inhibitor protein [
30], and the predicted structure was consistent with mutagenesis studies [
27]. These results led Tobiasson and colleagues to propose that Piv invertase and the IS110 transposases could mediate DNA recombination by a common mechanism involving a catalytic DED or DDD motif [
27]. Our study adds data that relate the IS110 family to site-specific recombination processes. ISPa11, ISPpu9 and ISPpu10 exhibit high selectivity in their target choice and could share mechanisms of target recognition and/or catalytic activity with some site-specific recombinases and viral integrases.
Using pairwise whole genome alignments, it is possible to segment bacterial genomes into a common conserved backbone and strain-specific sequences called loops [
31]. These strain-specific loops include mobile elements, genes adapted to specific ecological environments, genes involved in pathogenicity, and other genes acquired by horizontal gene transfer. Strikingly, whole genome comparative analysis of Escherichia coli strains showed that strain-specific loops are associated with BIMEs (composed of different types of
E. coli REP elements) [
31]. In parallel, the mapping of the IS elements in different
E. coli strains revealed that ISs are associated with deletion of genome fragments and incorporation of horizontally acquired genes [
32]. In addition, some phenotypic features of
E. coli are explained by the inactivation of genes by IS elements. This is the case for the absence of expression of the OmpC porin with the correspondingly elevated expression of the OmpF porin reported for
E. coli B [
32]. Thus, REP elements and IS elements are related to similar genome evolution events. Our detection of REP elements as frequent targets for transposases could explain the involvement of both in common genome plasticity phenomena. Together, these facts suggest that REP-recognizer transposases could contribute to the repertoire of bacterial adaptive mechanisms.
The IS4 family had not previously been related to REP sequence target selectivity, but our genome analysis has detected that ISRm22, a member of this family, has all nine of its copies inserted into REP elements along the
S. meliloti 1021 genome (Figure and and Table ). Data on the Tn5 transposon help to understand this IS4 family. The Tn5 transposon consists of a cluster of antibiotic resistance genes bordered by two IS50 insertion sequences. IS50 belongs to the IS4 family, and a truncated version of the IS50 transposase that contains the catalytic active site, termed Tn5 transposase-related inhibitor protein, has been crystallized [
30]. The structure of its catalytic domain is probably similar to that of the Piv invertase member of the IS110 family of transposases (see above), connecting both families with detected REP-recognizer members. One of the characteristics frequently found for
Tn5 transposition target sites is the palindromic structure of the insertion site; in addition, GC pairs frequently occur at each end of the direct repeats [
33,
34]. The insertion sites that we have detected for ISRm22 fulfill both requirements (Figure ). Another proposed characteristic of
Tn5 transposition is preferential integration into actively transcribed or highly supercoiled DNA regions [
33]. In this respect, REP sequences are frequently located in regions between convergent genes. These DNA fragments are especially prone to being highly supercoiled, since simultaneous transcription of both convergent genes can generate increased positive supercoiling at the end of the genes [
3]. Through testing the frequency of Tn5 insertion into specifically designed synthetic target sequences, it has been found that IS50 recognizes a preferred 9-bp sequence as its target. Moreover, sequences resembling this consensus target function optimally when embedded in a cluster of overlapping similar sequences [
33]. In accordance with these Tn5 data, we have found that the majority of ISRm22 copies are inserted into a cluster of REP sequences.
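The direct-repeat feature described above can be checked computationally. The following is a minimal Python sketch, not the procedure used in this study: it assumes a Tn5-like 9-bp target site duplication, and the function names and flanking sequences are hypothetical illustrations rather than real ISRm22 insertion sites.

```python
from typing import Optional


def target_site_duplication(left_flank: str, right_flank: str,
                            length: int = 9) -> Optional[str]:
    """Return the duplicated target site if the last `length` bases before
    the element equal the first `length` bases after it, otherwise None.
    The default of 9 bp is an assumption for a Tn5-like element."""
    left = left_flank.upper()
    right = right_flank.upper()
    if len(left) < length or len(right) < length:
        return None
    candidate = left[-length:]
    return candidate if right[:length] == candidate else None


def has_gc_ends(direct_repeat: str) -> bool:
    """True if both the first and the last base of the repeat are G or C."""
    dr = direct_repeat.upper()
    return len(dr) > 0 and dr[0] in "GC" and dr[-1] in "GC"


# Hypothetical flanks: the element would sit between left_flank and right_flank.
left_flank = "AATTC" + "GTCAGATGC"
right_flank = "GTCAGATGC" + "TTAAG"

tsd = target_site_duplication(left_flank, right_flank)
if tsd is not None:
    print(tsd, "GC at both ends:", has_gc_ends(tsd))
```

Applied to each annotated IS copy, such a check would only flag candidate sites; imperfect duplications would need a mismatch-tolerant comparison.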
In the type 1 association cases (ISPa11, ISPpu9, ISPpu10, ISRm22 and ISRm19) the conserved sequence encompasses almost the complete REP sequence (Figure ). All consensus sequences share a high GC content, greater conservation at GC than at AT positions, a palindromic structure, and a similar length (with the exception of ISPsy8, which displays a shorter consensus). Despite the differences in their corresponding transposase sequences, ISPpu9 and ISPpu10 show the same point of insertion within the consensus sequence. REP-recognizer ISs could share some features in their target recognition domains. Identifying the transposases belonging to this subset could provide new clues in the search for a common mechanism of DNA target recognition.
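Two of the shared features of the consensus sequences, GC content and palindromic structure, are straightforward to quantify. The sketch below is an illustration only, not the analysis performed in this study; the helper names and the toy sequence are hypothetical and do not correspond to any consensus reported here. Palindromicity is scored as the fraction of positions at which a sequence matches its own reverse complement, so an imperfect palindrome receives a value below 1.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def palindrome_score(seq: str) -> float:
    """Fraction of positions where the sequence equals its reverse complement
    (1.0 for a perfect palindrome in the DNA sense)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    matches = sum(a == b for a, b in zip(seq, rc))
    return matches / len(seq)


# Hypothetical 16-bp site, chosen to be GC-rich and perfectly palindromic.
toy_site = "GCCGGATGCATCCGGC"
print(gc_fraction(toy_site), palindrome_score(toy_site))
```

The position-by-position score ignores gaps and the loop of a stem-loop structure, so it is a rough first screen rather than a structural prediction.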
Target selectivity differs significantly among ISs. While some ISs display high target specificity, other elements exhibit regional preferences that could reflect more global parameters such as local DNA structure [
16]. Thus, regional specificity has been related to GC or AT abundance, degree of supercoiling, DNA bending, replication-related factors, and transcription-related factors [
16]. Transposition activity is frequently modulated by various host factors. These include the histone-like protein IHF, which has been experimentally proven to bind REP sequences. Two other REP-binding proteins, DNA polymerase I [
35,
36] and DNA gyrase [
37-
39], have also been implicated in transposition activity. Clusters of REP sequences could provide an appropriate context to recruit all the elements playing a role in transposition. The detected type 2 associations could reflect a favourable context for transposition provided by REP sequence clusters in combination with a lower stringency for the DNA target.
REP elements have also been related to recombination events. Thus, REP sequences have been found at the recombination junctions of lambda bio transducing phages [
13] and it has been experimentally shown that amplification of plasmid F'128 is initiated by REP-REP recombination [
14]. REP elements are sites especially suitable for transposition or recombination events, because they are frequently located in extragenic spaces bounded by convergent genes [
5]. Their extragenic location would ensure that transposition does not disrupt genes. Their preference for spaces between convergent genes would make it probable that transcriptional regulatory signals remain unaltered, since the region where two genes end is not a site for recruitment of transcriptional regulators. Moreover, taking into account that bacteriophage Mu is excluded from insertion in regions of DNA to which regulatory proteins are bound [
40], spaces between convergent genes would have the additional advantage of being sites always free of bound regulators. Furthermore, spaces bounded by convergent genes usually lie between two independent transcriptional units. Hence, REP sequences could be used as tags, generally positioned at the end of genes, indicating genome points especially advantageous for transposition. Thus, the characterization of some REP elements as hot spots for recombination and transposition suggests that REP elements are probably key elements in adaptive bacterial evolution. REP sequences provide genome points that allow safe recombination and transposition without severe detrimental effects. Moreover, REP sequences are genome elements that can vary in position and number, supplying additional variability to this set of selectable points of insertion. Taking this into consideration, comparative genomics studies between phylogenetically close strains could be especially revealing. Transposition plays a crucial role in horizontal gene transfer in bacteria, including the spread of antibiotic resistance [
41-
43]. In addition, some virulence genes are regulated by transposition [
44] and it has been shown that some insertions, deletions, inversions and chromosome fusions are caused by transposition [
45,
46]. REP sequences could play a role in these important mechanisms.