Integrons are natural expression vectors in which gene cassettes are integrated downstream of a promoter region by a site-specific recombinase. Gene cassettes usually consist of a single gene followed by a recombination site designated attC. A major unanswered question is how a gene becomes associated with an attC site. Here, we investigate the potential role of a specific lineage of group IIC introns, named group IIC-attC, in cassette formation. Group IIC-attC introns preferentially target attC while retaining the ability to target transcriptional terminators. We show using a PCR-based mobility assay with Escherichia coli that the S.ma.I2 intron from the genome of a clinical isolate of Serratia marcescens can target both attC sites and putative terminator motifs of resistance genes. Quantitative results showed that S.ma.I2 is more efficient in targeting various attC sequences than three group IIC-attC introns (54 to 64% sequence identity) from the genomes of environmental isolates. We also show that purified group IIC-attC intron-encoded reverse transcriptases have both RNA-dependent and DNA-dependent DNA polymerase activities in vitro. These data permit us to suggest a new model for gene cassette formation, in which a group IIC-attC intron targets separately a transcriptional terminator adjoining a gene and an isolated attC, joins the gene and the attC by homologous recombination, and then splices and reverse transcribes a gene-attC RNA template, leading to the formation of a cassette.
Integrons and gene cassettes are considered important genetic elements in the evolution of multiresistance plasmids and transposons in gram-negative bacteria (39). Integrons can be categorized as mobile or chromosomal depending on their genomic location (i.e., on plasmids or on chromosomes). Mobile integrons and their cassettes are known to have a role in the dissemination of antibiotic resistance genes, whereas the majority of chromosomal integron cassettes include open reading frames (ORFs), most of whose products have no known function (21). Integrons are natural expression vectors composed of an integrase gene (intI) followed by a cassette promoter region and an integrase-specific recombination site (called an attI site) into which new cassettes are integrated (12). Gene cassettes usually consist of one promoterless gene associated with a distinct integrase-specific recombination site (called an attC site) that is located downstream of the gene. attC sites found in mobile integron cassettes exhibit little sequence similarity but contain a characteristic central palindromic sequence (11). attC sites can vary considerably in length (57 to 141 bp), and their sequence similarities are restricted primarily to the boundaries, which correspond to two pairs of conserved inverted repeats, 1L-2L and 2R-1R (40). Usually, a unique attC site is associated with one gene and is named by reference to the gene (e.g., the aminoglycoside resistance gene aadA1 and the attCaadA1 site). In contrast, attC sequences from chromosomal integrons with large cassette arrays are usually closely related and species specific, despite marked differences in the cassette ORF codon usage (3, 6, 10, 15, 32). These observations suggest that ORFs and attC sites of cassettes have independent origins and are associated by an unknown mechanism that plays a crucial role in integron evolution. 
By taking into account the unique structural characteristics of most integron cassettes (i.e., the absence of promoters, the paucity of noncoding sequence, and the presence of only one gene), it has been suggested that integron cassette genesis involves reverse transcription of mRNA molecules in an organism that encodes a reverse transcriptase (RT) (11). In such a model, the attC site may either be part of the original transcript (such as a Rho-independent transcription terminator) or added by an unknown mechanism from an isolated source of inverse repeat sequences (12, 30). The idea of reverse transcription became pertinent when we first reported the presence of a group IIC intron identified as S.ma.I2 inserted into a mobile integron between a structural gene and its associated attC site in the multiresistant Serratia marcescens SCH909 strain (4). Since then, a specific lineage of several group IIC introns, named the group IIC-attC introns, has been reported to be present in cassettes of mobile or chromosomal integrons (23, 27-29, 41, 46) or adjacent to isolated attC sites (18, 28).
Group II introns are catalytic RNAs and mobile retroelements that self-splice by means of a lariat intermediate (35). They are found in lower eukaryote organelle DNAs and also in several bacterial genomes. Group II introns are separated into several lineages based on their intron-encoded protein (IEP) sequences (36, 48). These mobile elements encode proteins having RT, RNA splicing (maturase), and sometimes DNA endonuclease (En) activities. The mobility of group II introns (known as retrohoming) involves invasion of a DNA target by an intron RNA and is mediated by a ribonucleoprotein complex composed of the excised intron lariat and the IEP (17). In the first step, cleavage of the DNA top strand occurs by reverse splicing of the excised intron RNA at the intron target site. In the next step, bottom strand cleavage is catalyzed by the IEP's En domain. Then the 3′ end of the cleaved bottom strand is used as a primer by the IEP's RT domain for reverse transcription of the intron RNA. In the final step, cellular repair processes complete the insertion of the intron (degradation of the intron RNA template, second-strand DNA synthesis, and ligation) into the new target site (38). Retrohoming is highly site specific because the intron RNA base pairs to a complementary DNA target known as intron binding site 1 (IBS1), IBS2, and δ sequence (or IBS3) to help position the reverse-splice site, while the IEP recognizes a small number of specific bases (24, 37).
Bacterial group IIC introns are a subgroup of group II introns that differ from the other bacterial introns in many RNA secondary structure aspects (42). Importantly for this study, in group IIC introns fewer base pairs are required for target site specificity (5 bp compared to 13 bp for groups IIA and IIB) because of missing IBS2-exon binding site 2 (EBS2) and shorter IBS1-EBS1 as well as IBS3-EBS3 pairing motifs (17). Nevertheless, group IIC introns are site-specific retroelements that are inserted directly after transcriptional terminator motifs or other inverted repeats, such as attC sites (28, 31). It was shown previously that recognition of a secondary structure, using the folded inverted repeats, was implicated in the target site specificity (19, 29, 31). Another important characteristic of group IIC introns is the absence of a C-terminal En domain in the IEPs. It has been shown that group II introns that cannot carry out site-specific second-strand cleavage due to the lack of an En domain are still mobile, but the residual mobility shows a pronounced strand bias and dependence on replication (20, 47).
In the present study, we searched for and found evidence for the potential role of group IIC-attC introns in gene cassette formation. We used an Escherichia coli two-plasmid mobility assay in order to compare the frequencies of mobility of four group IIC-attC introns (having 54 to 64% sequence identity) into attC sites and into putative transcriptional terminator motifs of resistance genes. We also purified group IIC-attC intron-encoded RTs in order to screen for both RNA- and DNA-dependent DNA polymerase activities in vitro. Together, our findings support a new cassette formation model in which a group IIC-attC intron uses both its specificity for DNA targets with stem-loop motifs and its IEP activities for targeting and then joining distant genes and attC sites.
Bacterial strains and plasmids are described in Table 1. S. marcescens SCH909, Shewanella baltica OS155, and E. coli strains were grown in Luria-Bertani (LB) broth at 37°C or at the temperature indicated below. When necessary, antibiotics were used at the following concentrations: ampicillin (Ap), 100 μg/ml; and chloramphenicol (Cm), 68 μg/ml. Nitrosomonas europaea was cultured as described previously (18). Geobacter sulfurreducens genomic DNA and the S. baltica OS155 strain were kindly provided by The Institute for Genomic Research and by the DOE Joint Genome Institute, respectively. Total DNA was isolated using a phenol-chloroform purification method (34).
For PCRs the Phusion DNA polymerase (Finnzymes) was used according to the manufacturer's instructions. PCR primers were synthesized using an ABI-3900 DNA synthesizer from Applied Biosystems Inc. (Foster City, CA) (Table 2).
For each target site (attC or putative transcriptional terminator), secondary structures of the homing strand were determined using the MFOLD 3.2 program (49).
Intron mobility was determined by PCR amplification of the targeted sites using an E. coli two-plasmid assay in which an intron cloned into pUC18 inserts into a target site cloned into pACYC184 (Table 1). The recipient clones were obtained by cloning full-length complementary oligonucleotides (Table 2) or purified PCR amplicons of the target sites tested. For each mobility assay, the intron donor plasmid (Apr) and the recipient plasmid (Cmr) were transformed simultaneously in E. coli DH5α competent cells and subjected to Ap and Cm selection. One colony of each double transformant was grown in 5 ml of LB medium in the presence of Ap and Cm at 30°C overnight. The overnight preculture (100 μl) was pelleted, inoculated into 5 ml of LB medium with both antibiotics, and incubated at 37°C until the optical density at 600 nm was 0.5. Expression of the intron was then induced by addition of 1 mM isopropyl-β-d-thiogalactopyranoside (IPTG), followed by incubation at 37°C for 3 h. Plasmid DNAs were extracted using a QIAprep kit (Qiagen). Mobility was screened using PACYC-5′ and one of the four intron-specific primers (NeI1.for, GsI2.for, SbI1.for, or SmaI2.for) (Table 2).
The homing frequency of S.ma.I2 was determined by colony patch hybridization. Purified plasmids (100 ng) from induced cultures were digested with PvuI. This enzyme cuts only in the intron donor plasmid, outside the intron sequence. E. coli DH5α cells were transformed with the PvuI-digested products and plated on LB medium containing Cm to select for cells containing recipient plasmids that were or were not interrupted by the intron. To calculate the intron mobility efficiency, expressed as the percentage of recipient plasmids that received the intron, 100 isolated Cmr colonies were patched onto plates containing LB medium with Cm. The patches were lifted onto nylon membranes and hybridized with a 5′-end-labeled probe (SmaI2.for) using [γ-33P]ATP (3,000 Ci/mmol; GE Healthcare, Piscataway, NJ). For each target site tested, randomly chosen positive colonies were cultured, and the homing site was confirmed by PCR amplification of the target site (using the PACYC-5′ and PACYC.rev primers) and sequencing.
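The mobility efficiency defined above is a simple proportion, and a minimal sketch of the calculation is given here for clarity. The function name and the example count are hypothetical and are not taken from the study; only the definition (percentage of patched Cmr colonies that hybridize with the intron probe) comes from the text.

```python
def mobility_frequency(positive_patches: int, total_patches: int = 100) -> float:
    """Return the percentage of patched Cm-resistant colonies that
    hybridize with the intron-specific probe.

    total_patches defaults to 100 because 100 colonies were patched
    per target site; the counts passed in are illustrative only.
    """
    if total_patches <= 0:
        raise ValueError("total_patches must be positive")
    if not 0 <= positive_patches <= total_patches:
        raise ValueError("positive_patches must lie between 0 and total_patches")
    return 100.0 * positive_patches / total_patches


# Hypothetical example: 12 probe-positive patches out of 100 patched colonies.
example_frequency = mobility_frequency(12)
```

With 100 patches per site, a single positive colony corresponds to 1.0%, so a site with no positive patch among the 100 can only be reported as <1.0%.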
For expression and purification of soluble IEPs, we used the pMAL protein fusion and purification system of New England Biolabs (NEB). pMAL-c2X clones were expressed in E. coli Rosetta2(DE3)(pLysS) (Novagen) (Table 1). A single colony was inoculated into 5 ml of LB medium with 100 μg/ml Ap and 34 μg/ml Cm and grown at 30°C overnight. This preculture (500 μl) was pelleted, inoculated into 50 ml of LB medium containing 1% glucose and the same antibiotics, and grown at 37°C to an optical density at 600 nm of 0.5, and this was followed by induction with 1 mM IPTG for 2 h at 37°C. Cells were harvested by centrifugation for 20 min at 4,000 × g, resuspended in 10 ml of column buffer (20 mM Tris [pH 7.4], 200 mM NaCl, 1 mM EDTA), and kept at −20°C. Cells were lysed on ice by sonication using four 20-s 25-W bursts interspersed with 20-s rest periods. All subsequent steps were carried out at 4°C. Lysates were cleared by centrifugation for 30 min at 9,000 × g, and the maltose binding protein (MBP)-IEP fusion proteins were purified on 500 μl of amylose resin used according to the manufacturer's instructions (NEB). Fusion proteins were eluted using 2 ml of column buffer with 20 mM maltose, pooled, and concentrated using a Centricon YM-10 (Millipore). Finally, glycerol was added to a final concentration of 20%.
RNA-dependent (RT) and DNA-dependent DNA polymerase assays were carried out with 500 ng of MBP-IEPs, as determined by the Bradford assay (Bio-Rad) using bovine serum albumin (NEB) as the standard. RT assays were performed using 500 ng of the artificial template and primer substrates poly(rA) and oligo(dT)12-18 (Amersham Biosciences) and 5 μCi [α-32P]dTTP (3,000 Ci/mmol; GE Healthcare, Piscataway, NJ) in a 10-μl reaction mixture consisting of 1× RT buffer (50 mM Tris-HCl [pH 8.3], 75 mM KCl, 3 mM MgCl2, 5 mM dithiothreitol). Reverse transcription reactions were initiated by addition of purified MBP-IEPs to the reaction mixtures, which were incubated for 30 min at 42°C. To measure the incorporation of [α-32P]dTTP, 5-μl portions of the reaction mixtures were spotted onto DE81 paper (Whatman, Fairfield, NJ) and washed four times with 2× SSC (1× SSC is 150 mM NaCl plus 15 mM sodium citrate) for 5 min each time. The filters were then dried and quantified by Cerenkov counting using a Beckman LS1801 scintillation counter. RT activities were compared to the activity of the SuperScript III polymerase (100 U; Invitrogen).
DNA-dependent DNA polymerase activity was tested in the presence of activated (nicked with DNase I) calf thymus DNA (100 μg/ml; Amersham Biosciences), dATP (10 μM), dTTP (10 μM), dGTP (10 μM), and 1 μM [3H]dCTP. Reactions were performed with 25-μl (total volume) mixtures at 37°C using the RT reaction buffer. DNA synthesis was stopped after 10 min by addition of 600 μl of cold 5% trichloroacetic acid-1% sodium pyrophosphate, which was followed by nucleic acid precipitation on ice for 10 min. The samples were filtered (Millipore) and washed with 250 μl of 5% trichloroacetic acid-1% sodium pyrophosphate to measure the labeled DNA by scintillation counting (filter-based assay). The background was determined by a parallel incubation with purified MBP. DNA-dependent DNA polymerase activities were compared to the activity of Biotools DNA polymerase (5 U; Biotools).
In recent years, sequencing of the genomes of clinical and environmental bacteria has led to the identification of 71 bacterial genomes with full-length or fragmented group IIC-attC introns (28). Fifteen of the full-length introns are in mobile integrons in antibiotic resistance cassettes conferring resistance to β-lactams, aminoglycosides, or trimethoprim (Fig. 1A) (4, 16, 23, 27, 29, 41). Five of them are in chromosomal integrons in gene cassettes that have no known homologues (except for the N. europaea integron, which contains an ORF possibly related to qacE, which mediates resistance to quaternary ammonium compounds) (Fig. 1B) (18, 28). Three do not have any nearby integrase gene homolog but are inserted into an attC site (Fig. 1C) (18, 28).
In both mobile and chromosomal integrons, a single intron copy is usually present in the last cassette and in the direction opposite that of cassette transcription. Figure 1A shows that various clinical strains isolated in Korea (e.g., Acinetobacter genomospecies and Enterobacter cloacae strains) contain virtually identical intron copies in mobile integrons that contain related resistance gene cassettes (16, 46). Figures 1B and 1C show that in the N. europaea and the G. sulfurreducens genomes, second identical intron copies have attC sites at their 5′ ends but no recognizable ORF at the other end. Another intron attC, without any nearby intI homolog, is present in the metagenome of an uncultured Shewanella sp. strain (accession no. AACY020561240) (45). Sequence analysis revealed two intron copies inserted into distinct attC sites and separated by a 480-bp noncoding sequence (Fig. 1C). Homologs of the 480-bp sequence between the two introns have been found to be cassettes with Vibrio cholerae repetitive DNA sequences (VCRs) in multiple copies in the chromosomal integron of Vibrio cholerae N16961 (accession no. AE003853) that contains 176 cassettes. Non-protein-encoding cassettes occurring in several copies are intriguing features of Vibrio integrons, notably Vibrio vulnificus integrons (5).
Analysis of the homing sites showed that most introns are inserted into attC sites of various sequences in the consensus TTGT/T (IBS1-IBS3) motif that is located downstream of imperfect inverted repeat sequences (28) (see Fig. S1 in the supplemental material). Interestingly, compilation of the target sites for introns in mobile integrons showed that there was a preponderance of the 60-bp attCaadA1 site at the 5′ ends of several introns. However, in some cases a resistance gene other than the expected aadA1 gene was found at the 3′ end of the same introns. Instead, we found the qacF resistance gene in S. marcescens (accession no. AY030343), the putative fosfomycin resistance gene fosX in Pseudomonas putida (accession no. AY065966), and the blaVIM-2 resistance gene in Klebsiella pneumoniae (accession no. DQ153218) and Providencia rettgeri (accession no. AY887109) (Fig. 1A). Lack of correspondence between the 5′ and 3′ exon sequences was also observed in Salmonella enterica serovar Typhimurium (accession no. AY204504) for the attCcmlA site and the aadB resistance gene. Together, these observations suggest that genes and attC sites with independent origins are brought together by group IIC-attC introns and that intron-containing cassettes could represent intermediates in cassette assembly.
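As an illustration of how candidate homing sites of this kind can be located in a sequence, the following sketch scans both strands for the consensus TTGT/T (IBS1-IBS3) motif and reports the position of the IBS1/IBS3 junction. It is not the procedure used in this study; the function names and the 13-nucleotide test sequence are hypothetical, and the sketch checks only the primary sequence, not the upstream inverted repeats.

```python
# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_homing_sites(top_strand: str, ibs1: str = "TTGT", ibs3: str = "T"):
    """List (strand, junction) pairs for every IBS1-IBS3 match.

    junction is the 0-based offset, read 5' to 3' on the named strand,
    of the first base after IBS1, i.e. the position marked by the slash
    in TTGT/T. Overlapping matches are reported.
    """
    motif = (ibs1 + ibs3).upper()
    top = top_strand.upper()
    hits = []
    for strand, seq in (("top", top), ("bottom", reverse_complement(top))):
        start = seq.find(motif)
        while start != -1:
            hits.append((strand, start + len(ibs1)))
            start = seq.find(motif, start + 1)
    return hits


# Hypothetical toy sequence, not a real attC site.
toy_top_strand = "GGCCAACAAGGCC"
toy_hits = find_homing_sites(toy_top_strand)
```

For the toy sequence the motif is absent from the top strand and occurs once on the bottom strand, so the single hit is reported relative to the bottom strand read 5′ to 3′.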
We used a two-plasmid mobility assay with E. coli to compare the levels of mobility of group IIC-attC introns S.ma.I2 (accession no. AF453998), N.e.I1 (accession no. AL954747), G.s.I2 (accession no. AE017180), and S.b.I1 (accession no. CP000563) into various attC sites (Fig. 2A; see Materials and Methods and Fig. S2 in the supplemental material). We used these introns because their IEPs share only 48 to 58% amino acid identity and because they are present either in gene cassette arrays of integrons (S.ma.I2, N.e.I1, and G.s.I2) or in an isolated attC site without a nearby integrase gene (S.b.I1) (28). Six attC sites that were successfully targeted by S.ma.I2 (29) were subcloned into pACYC184 in order to remove unnecessary sequences (i.e., resistance gene and other recombination sites) from the previous recipient plasmids. Additionally, we cloned a VCR site (22), which was not previously tested as a homing site (Table 1).
Intron mobility was screened by PCR and/or colony patch hybridization methods (see Materials and Methods). Figure 2B shows the results of a sensitive PCR assay performed with PACYC-5′ and one of the four intron-specific primers in order to amplify the 3′ intron-exon integration junction. PCR bands corresponding to specific homing events were detected for all introns. At least two of the seven attC sites (attCaadA1 and attCsat) were successfully targeted by all of the introns tested. These two sites are the smallest sites (60 bp) and share 88.1% sequence identity. Sequencing of the PCR products showed that the four introns inserted specifically between the IBS1-IBS3 motifs (5′-TTGT/T) located in the bottom strand sequence of all attC sites tested (data not shown). These results confirmed the results of the previous in vivo mobility assays performed with S.ma.I2 (29). Interestingly, S.ma.I2 is the only intron that targeted all attC sites, including the newly tested VCR site. Analysis of the folded structures of the various target sites showed that the VCR bottom strand folds into a stem-loop structure with a large central loop composed of unpaired bases (see Fig. S2 in the supplemental material). Compared to the three-base loops of most attC sites tested in this study, the large central loop of VCR might affect recognition of either stem or IBS1-IBS3 motifs by the incoming introns. We also included in this assay the pNeI1-EBS1 mutant clone, in which the 5′-ACAU→ACAA EBS1 mutation of N.e.I1 could result in stronger EBS1-IBS1 pairing, and the pSmaI2-YAHH mutant clone, which potentially suppresses the RT activity of the S.ma.I2 IEP. Figure 2B shows that the single-base EBS1 change allowed the N.e.I1 mutant to target two additional attC sites (attCdfrA1 and attCpse1) compared to the wild-type intron. As expected, mobility into attC sites was not detected for the S.ma.I2 YAHH mutant.
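The effect of the ACAU→ACAA change on EBS1-IBS1 pairing can be checked by counting Watson-Crick pairs between the intron RNA (EBS1) and the DNA target (IBS1, 5′-TTGT) in antiparallel orientation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that only canonical RNA-DNA pairs contribute (no wobble or stacking effects are modeled).

```python
# Canonical pairs written as (RNA base, DNA base).
WATSON_CRICK = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_ebs_ibs_pairs(ebs_rna: str, ibs_dna: str) -> int:
    """Count canonical pairs between an EBS (RNA, 5'->3') and an IBS
    (DNA, 5'->3') aligned antiparallel, as in intron-target pairing."""
    if len(ebs_rna) != len(ibs_dna):
        raise ValueError("EBS and IBS must have the same length")
    # Reverse the IBS so that position i of the EBS faces its partner.
    paired = zip(ebs_rna.upper(), reversed(ibs_dna.upper()))
    return sum((rna, dna) in WATSON_CRICK for rna, dna in paired)


wild_type_pairs = count_ebs_ibs_pairs("ACAU", "TTGT")  # wild-type N.e.I1 EBS1
mutant_pairs = count_ebs_ibs_pairs("ACAA", "TTGT")     # EBS1 mutant
```

Under these assumptions the wild-type EBS1 forms three canonical pairs with 5′-TTGT and the mutant forms four, which is consistent with the stronger pairing expected for the pNeI1-EBS1 clone.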
In order to compare the levels of mobility of S.ma.I2 for the distinct attC sites, we used the colony patch hybridization method. Under the experimental conditions used, the S.ma.I2 mobility frequency (defined as the percentage of homing site plasmids containing introns) varied from <1.0% for the attCdfrA1 and VCR sites to 95.5% for the attCpse1 site (Table 3). Results shown in Table 3 were confirmed using the PACYC-5′ and PACYC-rev primers in a semiquantitative PCR assay in which the intensity of the PCR products correlates with the mobility frequency values (see Fig. S3 in the supplemental material). Together, the results shown in Fig. 2B and Fig. S3 in the supplemental material suggest that the N.e.I1, G.s.I2, and S.b.I1 introns have low mobility frequencies into attCs compared to S.ma.I2. Colony patch hybridization assays confirmed that the mobility frequencies of the N.e.I1, G.s.I2, and S.b.I1 introns into attCaadA1 were <1.0% (data not shown).
Assays of the mobility of group IIC-attC introns into mutated attC sites showed that a structural motif, rather than a defined homing site sequence, is involved in target site recognition along with the short IBS1-IBS3 motifs (19, 29). In this study, we suggest that the group IIC-attC introns are involved in the recruitment of distant genes and attC sites. We tested whether S.ma.I2, which showed the highest mobility frequencies for various attC sites, could also target transcriptional terminator sequences, which are the preferred targets of other group IIC introns (31). We searched GenBank databases for potential IBS1-IBS3 and stem-loop motifs in putative transcriptional terminators of antibiotic resistance genes. Candidate sequences (designated Term sites) downstream of the plasmid-borne β-lactamase genes blaCMY-8 (NCBI Entrez gene ID 6383913; in K. pneumoniae plasmid pK29; accession no. EF382672), blaSHV-5 (gene ID 1446571; in E. coli plasmid p1658/97; accession no. AF550679), and blaCTX-M-3 (in Citrobacter freundii plasmid pCTX-M3; accession no. AF550415) were cloned and tested as homing sites for S.ma.I2 in E. coli (Table 1). Figure 3A shows agarose gels containing the PCR products corresponding to the 3′ intron-exon integration junction in the Term sites indicated. PCR bands corresponding to S.ma.I2 mobility were detected with the wild-type Term_blaCMY-8 site in the bottom strand sequence and with the Term_blaCTX-M-3 site in the top strand sequence. Sequencing of the PCR amplicons showed intron mobility into the 5′-CTGT/C motif (IBS1-IBS3), which overlaps the stop codon of blaCMY-8, and into the 5′-TTGT/T motif, which is located on the top strand downstream of a potential stem-loop motif in the Term_blaCTX-M-3 sequence (Fig. 3B; see Fig. S4 in the supplemental material).
Other group IIC introns have been found to be joined to transcriptional terminators in nature (31); however, the orientation and strand specificity of S.ma.I2 insertion appear to be determined by the position of the IBS1 and IBS3 motifs (19).
The importance of complementary EBS1-IBS1 motifs was confirmed by mutation of putative IBS1 motifs for the consensus 5′-TTGT sequence in the Term_blaCMY-8 and Term_blaSHV-5 bottom strand sequences. A two-base mutation of the 5′-TTTC sequence located upstream of the Term_blaCMY-8 insertion site (see above) to 5′-TTGT resulted in a stronger PCR band corresponding to a specific insertion of S.ma.I2 into the new 5′-TTGT/T (IBS1-IBS3) site. Similarly, mutation of the 5′-GGGT sequence that overlapped the stop codon of blaSHV-5 on the bottom strand to 5′-TTGT resulted in a weak PCR band corresponding to a specific insertion of S.ma.I2 into the new 5′-TTGT/T (IBS1-IBS3) site. In contrast to the positive results for S.ma.I2, in similar mobility assays with the N.e.I1, G.s.I2, and S.b.I1 introns negative PCR results were obtained for all of the Term sites (data not shown).
We compared the levels of mobility of S.ma.I2 for the distinct Term sites using the colony patch hybridization method (Table 3). Under the experimental conditions used, the S.ma.I2 mobility frequencies were <1.0% for both wild-type and mutated Term sites.
Based on the structural characteristics of most integron cassettes (i.e., the absence of promoters, the paucity of noncoding sequences, and the presence of only one gene), it has been suggested that cassette formation involves reverse transcription of assembled gene-attC sequences at the RNA level (4, 11, 12, 30). Several studies showed the RT activity of purified group II IEP or ribonucleoprotein preparations (25, 26, 31, 38, 44). Here, we searched for RNA-dependent (RT) and DNA-dependent DNA polymerase activities of purified IEPs from two distinct introns, N.e.I1 and G.s.I2. The IEPs of these introns (Netr and Gstr, respectively; 59.4% amino acid identity) were expressed in E. coli and affinity purified using the pMAL protein fusion and purification system (see Materials and Methods). The purified ~90-kDa MBP-IEP fusions (~40 kDa for the MBP and ~50 kDa for the IEP) were tested for RT and DNA-dependent DNA polymerase activities using poly(rA)-oligo(dT)12-18 and activated calf thymus DNA substrates, respectively (see Materials and Methods). For the RT assays, we used a purified MBP preparation as a negative (or background) control and a commercial RT, SuperScript III, as a positive control. For the DNA-dependent DNA polymerase assays, we used a purified MBP preparation as a negative control and a commercial DNA polymerase, the Biotools DNA polymerase, as a positive control. Figure 4 shows that the MBP-Netr and MBP-Gstr fusions displayed high RT activities (comparable to the SuperScript III activity) with the poly(rA)-oligo(dT)12-18 substrate, whereas no significant activity was detected with MBP alone. As expected, the RT activity of the MBP-Netr protein was abolished by point mutations of the conserved aspartate residues in the YADD motif (changed to YAHH) of the IEP RT domain. Figure 4 shows that the MBP-Netr and MBP-Gstr fusions also displayed significant DNA-dependent DNA polymerase activities with the calf thymus DNA substrate.
The MBP and MBP-Netr YAHH proteins displayed no activity, and the activity of the Biotools DNA polymerase used as a positive control was at least twice that of the wild-type MBP-IEP fusions.
Integrons and gene cassettes are ancient structures that play a major role in bacterial genome evolution by increasing the exchange of exogenous genes. Lateral gene transfer of integrons allows bacteria to adapt to different environmental pressures, such as antibiotic use in human medicine or in agriculture. However, little information is available about the origins of the resistance genes and the way in which they are assembled into gene cassettes. The large cassette arrays of some chromosomal integrons are potential sources of integron cassettes for mobile integrons (7, 13, 33, 43). Analysis of the gene cassette metagenome of environmental bacteria showed that most of the chromosomal integrons include ORFs (whose products have no known function), and occasionally a chromosomal integron includes a single antibiotic resistance gene (2, 14, 21). However, the few antibiotic resistance cassettes found in chromosomal integrons cannot explain the large variety of resistance cassettes found in mobile integrons (8, 9). Interestingly, potential progenitors of the cat (chloramphenicol acetyltransferase) and oxa2 (β-lactamase) cassette-associated genes have been found in the chromosomes of Pseudomonas aeruginosa, Agrobacterium tumefaciens, and Synechocystis (30), but not as cassettes. Despite 63 to 70% nucleotide identity between the gene cassettes and the corresponding chromosomal genes, attC sites have not been found downstream of the ORFs. These examples strongly suggest an independent origin for the genes and the attC sites and a mechanism of cassette formation. Furthermore, analysis of known integron cassettes suggested that their genesis involves an intermediate step involving reverse transcription of mRNA molecules in order to explain the absence of promoters, the paucity of noncoding sequence, and the presence of only one gene (11, 12, 30). However, this previous model did not explain how genes and attC sites are assembled or the origin of the RT. 
Recent studies showed that group IIC introns can target various transcriptional terminators (and that some group IIC introns target attC sequences) and that their ORFs code for a multidomain protein with RT activity (29, 31).
In this study, we investigated whether group IIC-attC introns could be involved in cassette formation by targeting and then joining distant genes and attC sites. Using an experimental system, we showed that the S.ma.I2 intron from a clinical isolate is more efficient in targeting various attC sites than the N.e.I1, G.s.I2, and S.b.I1 introns from environmental isolates are. Analysis of flanking exon sequences of group IIC-attC introns from the mobility assays and their natural genomic contexts showed that these introns recognize a consensus homing site, 5′-TTGT/T (IBS1-IBS3), which is at the 3′ end of most attC sites (bottom strand) (28, 29). We suggest that the homing specificity of these introns may explain the precise juxtaposition of the structural gene stop codon and the inverted core site of the attC site in many integron cassettes (11). Furthermore, we showed that S.ma.I2 targets in vivo at low levels non-attC sequences located downstream of resistance genes. Based on the genomic context, we suggest that the targeted sequences correspond to putative bidirectional transcriptional terminators. The data show that the target sites of group IIC-attC introns are not limited to attC sites but also include putative transcriptional terminator motifs, potentially leading to gene-intron intermediates in cassette formation. Despite the substantial bacterial genomic sequencing that has been done, none of the members of the group IIC-attC lineage has been found to be associated with putative terminators in nature. The low mobility frequencies observed here for the Term sites might be one of the reasons for this. An alternative explanation is that the intermediates are unstable.
The data presented here suggest a new model for integron cassette formation (Fig. 5). In this model, the first step involves homing of identical group IIC-attC introns to a transcriptional terminator adjoining a gene and to an isolated attC site (step 1). The targeted gene can be an ORF or a resistance gene with a putative bidirectional terminator, as shown here in vivo with S.ma.I2 (Fig. 3). Isolated intron-attCs have been found in N. europaea, G. sulfurreducens, and Shewanella sp. genomes without any nearby intI homolog (Fig. 1C). A recent study of the distribution of group IIC-attC intron target sites in complete bacterial genomes showed that single or multiple copies of a particular site (with ≥90% nucleotide identity) are present in diverse organisms (28). Moreover, some targets are not confined to specific clades in bacterial orders. The next step involves recombination, possibly RecA mediated, between the two introns (step 2). Potential ORF-intron-attC cassette intermediates have been found in the Shewanella sp. metagenome (accession no. AACY020561240) (Fig. 1C) and in several mobile integrons (Fig. 1A). These intermediates may represent “frozen” intermediates that are unable to complete the process or that are selected for by the expression of cassettes from the internal (intron) promoter oriented toward exon 1 (G. Léon, C. Quiroga, and P. H. Roy, unpublished data), particularly in K. pneumoniae (accession no. AJ971342), S. marcescens (accession no. AF453998), and E. coli (accession no. AY785243) clinical isolates (Fig. 1A). The next step involves full-length transcription of the cassette intermediate bottom strand sequence (i.e., the IEP coding strand) and splicing of the intron (step 3). This normally takes place in the gene's chromosomal context and is not yet associated with an integron.
Ligated gene-attC RNAs were detected by RT-PCR following the self-splicing of N.e.I1 (data not shown) and S.ma.I2 introns from gene-intron-attC transcripts in vitro (29). The next step involves reverse transcription of the joined gene-attC RNA by the intron-encoded RT using an unknown primer or priming mechanism (step 4). We showed that purified RTs from two group IIC-attC introns (N.e.I1 and G.s.I2) have both RNA- and DNA-dependent DNA polymerase activities in vitro. To our knowledge, this is the first report of a significant DNA-dependent DNA polymerase activity for bacterial group II IEPs (1). The final step involves integration of the cassette by site-specific recombination (step 5). While not all integron-containing strains contain group IIC-attC introns, cassettes formed in strains that do contain them would be exchanged by lateral transfer of their integrons, associated with plasmids or transposons. The unique structural characteristics of gene cassettes and the rapid appearance of new antibiotic resistance cassettes in mobile integrons (e.g., extended-spectrum β-lactamase and carbapenemase cassettes, such as blaIMP and blaVIM), which do not have homologs among chromosomal integron cassettes, suggest that there is an RT-based cassette formation mechanism that is boosted by the selective pressure of antibiotics. As more genome project sequences are assembled and analyzed, it is likely that more group IIC-attC introns will be shown to occur in genera in which chromosomal integrons are present and data will ultimately show the contribution of these introns to the cassette neoformation process.
We thank Melanie Martin for technical assistance with the DNA polymerase assays.
This work was supported by Canadian Institutes of Health Research grant MT-13564 to P.H.R. G.L. was supported by a fellowship from Canadian Institutes of Health Research Strategic Training Initiatives in Health Research grant STP-59324.
Published ahead of print on 24 July 2009.
†Supplemental material for this article may be found at http://jb.asm.org/.