The purpose of this study was to investigate the characteristics of transfer RNA (tRNA) responsible for the association between tRNA genes and genes of apparently foreign origin (genomic islands) in five high-light adapted Prochlorococcus strains. Both bidirectional best BLASTP (basic local alignment search tool for proteins) search and the conservation of gene order against each other were utilized to identify genomic islands, and 7 genomic islands were found to be immediately adjacent to tRNAs in Prochlorococcus marinus AS9601, 11 in P. marinus MIT9515, 8 in P. marinus MED4, 6 in P. marinus MIT9301, and 6 in P. marinus MIT9312. Monte Carlo simulation showed that tRNA genes are hotspots for the integration of genomic islands in Prochlorococcus strains. The tRNA genes associated with genomic islands showed the following characteristics: (1) the association was biased towards a specific subset of all iso-accepting tRNA genes; (2) the codon usages of genes within genomic islands appear to be unrelated to the codons recognized by associated tRNAs; and, (3) the majority of the 3′ ends of associated tRNAs lack CCA ends. These findings contradict previous hypotheses concerning the molecular basis for the frequent use of tRNA as the insertion site for foreign genetic materials. The analysis of a genomic island associated with a tRNA-Asn gene in P. marinus MIT9301 suggests that foreign genetic material is inserted into the host genomes by means of site-specific recombination, with the 3′ end of the tRNA as the target, and during the process, a direct repeat of the 3′ end sequence of a boundary tRNA (namely, a scar from the process of insertion) is formed elsewhere in the genomic island. Through the analysis of the sequences of these targets, it can be concluded that a region characterized by both high GC content and a palindromic structure is the preferred insertion site.
The marine unicellular cyanobacterium Prochlorococcus, possibly the smallest and most abundant photosynthetic organism on the earth, dominates the euphotic zone in tropical and subtropical oligotrophic waters between 40° N and 40° S (Partensky et al., 1999; Hess et al., 2001; Tian et al., 2005; Zinser et al., 2007). Considering its wide distribution and strong adaptability, it is a suitable organism to explore microdiversity. Comparisons of the genome sequences of certain closely related Prochlorococcus strains have revealed the intimate link between their genomic divergence and adaptability to different oceanic niches (Rocap et al., 2003). Horizontal gene transfer (HGT), a process in which one organism transfers genetic material to another organism that is not its offspring (Syvanen, 1994; Koonin, 2009), plays an important role in giving rise to extremely dynamic cyanobacteria genomes (Zhaxybayeva et al., 2006).
HGT is now recognized as a major force shaping the evolutionary histories of prokaryotes (Koonin et al., 2001; Zhaxybayeva et al., 2006; Boto, 2010). In many prokaryotes, horizontally transferred genes (HTGs) account for 1.6%–32.6% of the genes (Nelson et al., 1999; Garcia-Vallvé et al., 2000; Koonin et al., 2001; Nakamura et al., 2004; Choi and Kim, 2007). Recent studies have shown that a new type of mobile element known as a genomic island (GI), a cluster of genes of apparently foreign origin in a prokaryotic genome, is acquired through HGT (Hacker and Carniel, 2001; Hsiao et al., 2003). A large number of GIs are created by site-specific recombination, so this mechanism is central to understanding how GIs form. Mediated by transfer RNA (tRNA) genes and initiated at their 3′ ends, this kind of site-specific recombination creates short direct repeats that are identical or nearly identical to the 3′ ends (Reiter et al., 1989; Cheetham and Katz, 1995; Williams, 2002; Baar et al., 2003; Tuanyok et al., 2008).
Currently, four hypotheses based on the characteristics of tRNA genes have been proposed to explain why tRNA genes are frequently used as recombination sites of GIs. The first holds that the complementarity of the 5′ and 3′ ends of tRNA produces a pair of inverted repeats that are presumably recognizable to an integrase, which, in turn, can integrate foreign genetic materials into genomes (Reiter et al., 1989). However, evidence has shown that in the λ phage, the distance between a pair of inverted repeats is 7 nucleotides (nt), but in tRNA it is at least 50–60 nt, a long separation that does not favor DNA recombination (Hou, 1999). Consequently, the first hypothesis is not appropriate. The second hypothesis assumes that it is the multiple copies of tRNA in bacterial genomes that lead to repeated insertions of GIs into tRNA genes (Cheetham and Katz, 1995). The third hypothesis proposes that the conserved CCA sequence at the 3′ end provides a cleavage site for initial recognition by the integrase, and that after cleavage, the 3′ end of the specific tRNA transcript hybridizes with one of the two disengaged DNA strands to form a stable RNA-DNA hybrid (Hou, 1999). The fourth hypothesis suggests that a specific tRNA gene is associated with a GI because it is preferentially used to read the codons carried by this GI (Ritter et al., 1995).
In this study, we investigated whether tRNA genes are insertional hotspots of GIs in Prochlorococcus strains and what causes such insertions. We identified the GIs by multiple genomic comparisons of the five most closely related Prochlorococcus strains: Prochlorococcus marinus AS9601 (PMB), P. marinus MIT9515 (PMC), P. marinus MED4 (PMED4), P. marinus MIT9301 (PMG), and P. marinus MIT9312 (PMI). Closely related strains were chosen to reduce interference from large rearrangements such as translocation or inversion in the identification of GIs and to maximize the colinearity of the compared genomes. Through the analysis of the tRNA genes associated with GIs in the five strains, we found that some of these tRNA genes show characteristics that are at odds with the hypotheses described above. Therefore, we discuss our observations on the basis of sequence analysis, and present our view of the location of the real insertion site in tRNA and the cause of frequent insertion into tRNA genes as far as the five Prochlorococcus strains are concerned.
BioBIKE (Elhai et al., 2009), a web-based environment, was employed as our work platform. It combines a biological knowledge base, a graphical programming interface, and an extensible set of tools. The genomes used in this study and their sources are listed in Table A1 (Rocap et al., 2003; Coleman et al., 2006; Kettler et al., 2007).
We utilized a pipeline developed in BioLisp within BioBIKE to identify GIs in five Prochlorococcus strains: PMED4, PMC, PMB, PMG, and PMI. The procedure was as follows. First, the candidate orthologs were obtained in BioBIKE through bidirectional best BLASTP (basic local alignment search tool for proteins) search with a threshold E-value of 10^−6 (Altschul et al., 1997). They were then analyzed for conservation of gene order: if two candidates appeared in the same order in closely related genomes, they were assigned to one orthologous group. Second, we identified the alien genes by examining whether each gene of one Prochlorococcus strain had orthologs in the genomes of the other four strains. A gene was operationally defined as alien if there were orthologous genes in at most one of the four other Prochlorococcus genomes, and a region composed of one or more contiguous alien genes formed a GI. In this way, 31 regions (191 genes) in PMED4, 52 regions (297 genes) in PMC, 21 regions (211 genes) in PMB, 27 regions (168 genes) in PMG, and 16 regions (208 genes) in PMI were identified as GIs (Liu and Zhu, 2010).
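The second step of the procedure above (calling alien genes and merging contiguous ones into GIs) can be sketched in a few lines of Python. This is only an illustration, not the BioLisp pipeline used in the study: the function name, the `ortholog_hits` mapping, and the choice to treat the chromosome as linear rather than circular are all assumptions made for the example.

```python
from typing import Dict, List


def call_genomic_islands(
    gene_order: List[str],
    ortholog_hits: Dict[str, int],
    max_hits: int = 1,
) -> List[List[str]]:
    """Group contiguous alien genes into candidate genomic islands.

    gene_order    -- gene IDs in chromosomal order (treated as linear here).
    ortholog_hits -- hypothetical mapping: gene ID -> number of the other four
                     genomes that contain an ortholog of that gene.
    max_hits      -- a gene is alien if it has orthologs in at most this many
                     other genomes (1, following the operational definition).
    """
    islands: List[List[str]] = []
    current: List[str] = []
    for gene in gene_order:
        if ortholog_hits.get(gene, 0) <= max_hits:
            current.append(gene)          # extend the current run of alien genes
        elif current:
            islands.append(current)       # a conserved gene closes the run
            current = []
    if current:                           # close a run that reaches the last gene
        islands.append(current)
    return islands


if __name__ == "__main__":
    order = ["g1", "g2", "g3", "g4", "g5", "g6"]
    hits = {"g1": 4, "g2": 1, "g3": 0, "g4": 4, "g5": 3, "g6": 1}
    print(call_genomic_islands(order, hits))
```

In this toy input, genes g2 and g3 form one island and g6 forms another, because each has orthologs in at most one other genome.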
A program written in BioLisp within BioBIKE was developed to simulate the insertion of foreign genetic materials into a genome. The model for the insertion of a GI adjacent to tRNA is as follows: an insertion (X) is thrown randomly at a genome, and therefore follows a uniform distribution (X~U[1, L], where L is the length of the host genome). In every round of simulation, one insertion was thrown at the host genome, and we examined whether the insertion was inside a gene, next to a tRNA gene, or in the intergenic sequences. Excluding the insertions inside the genes (because the model supposes that they would lead to the death of the organism), we counted the number of remaining insertions that were next to tRNA genes and in the intergenic sequences, and calculated the expected ratio (f) of the former to the latter (f = N_tRNA/N_inter, where N_tRNA is the number of insertions next to tRNA and N_inter is the number of insertions in the intergenic sequences). The chi-square (χ²) test with one degree of freedom (df) was used to assess the significance of the insertions of GIs into tRNA genes. The test is based on N_obs, the observed number of GIs associated with tRNA genes in a given genome, and N_total, the total number of GIs in that genome.
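The simulation model described above can be illustrated with a short Python sketch. Several details are assumptions made only for this example, because the text does not specify them: the `flank` distance that defines "next to tRNA", the decision to count positions inside a tRNA gene as tRNA-associated, and the form of the test statistic, which is taken here as (N_obs − E)²/E with E = f × N_total. The critical value 6.635 is the standard χ² threshold for df = 1 at P = 0.01.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def simulate_ratio(
    genome_len: int,
    coding_genes: List[Interval],
    trna_genes: List[Interval],
    flank: int = 50,          # assumed definition of "next to tRNA"
    n_rounds: int = 100_000,
    seed: int = 0,
) -> float:
    """Estimate f = N_tRNA / N_inter by throwing uniform random insertions."""
    rng = random.Random(seed)
    n_trna = n_inter = 0
    for _ in range(n_rounds):
        x = rng.randint(1, genome_len)                  # X ~ U[1, L]
        if any(s <= x <= e for s, e in coding_genes):
            continue                                    # lethal: discarded
        if any(s - flank <= x <= e + flank for s, e in trna_genes):
            n_trna += 1                                 # in or next to a tRNA gene
        else:
            n_inter += 1                                # other intergenic position
    if n_inter == 0:
        raise ValueError("no intergenic insertions were sampled")
    return n_trna / n_inter


def chi_square_1df(n_obs: int, n_total: int, f: float) -> float:
    """Assumed statistic: (observed - expected)^2 / expected, expected = f * N_total."""
    expected = f * n_total
    return (n_obs - expected) ** 2 / expected


if __name__ == "__main__":
    # Values reported for PMB in the Results: 7 of 21 GIs next to tRNA, f = 0.0661.
    stat = chi_square_1df(n_obs=7, n_total=21, f=0.0661)
    print(f"chi-square = {stat:.2f}; significant at P=0.01: {stat > 6.635}")
```

The toy statistic is meant only to show how an expected ratio from the simulation can be compared with an observed count; it should not be read as the exact formula used in the study.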
To obtain the direct repeat sequences of the tRNA 3′ ends in GIs, we used iterative search, a BioBIKE function, to find all sequences matching an initial query with fewer than four mismatches (or fewer than 12 mismatches in the case of the longer direct repeat sequence in tRNA-Pro). For the other sequences, we used BLAST (Altschul et al., 1997; Wang et al., 2005) to search for all sequences similar to an initial query with a threshold E-value of 10^−6.
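A mismatch-tolerant search of this kind can be sketched as a sliding-window comparison. The code below is not the BioBIKE iterative search function; it is a minimal stand-in that scans one strand only (direct repeats share the orientation of the query) and allows substitutions but no gaps, both of which are simplifying assumptions.

```python
from typing import List, Tuple


def find_approximate_matches(
    query: str,
    sequence: str,
    max_mismatches: int = 3,   # "fewer than four mismatches"
) -> List[Tuple[int, str, int]]:
    """Return (0-based start, matched window, mismatch count) for every hit."""
    query = query.upper()
    sequence = sequence.upper()
    q_len = len(query)
    hits = []
    for start in range(len(sequence) - q_len + 1):
        window = sequence[start:start + q_len]
        # Hamming distance between the query and the current window
        mismatches = sum(a != b for a, b in zip(query, window))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    seq = "TTACGTACGTTTACGAACCTGG"
    for hit in find_approximate_matches("ACGTACGT", seq, max_mismatches=2):
        print(hit)
```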
Using ClustalW under default settings in MEGA version 4.0 (Tamura et al., 2007), we performed multiple sequence alignment of the sequences within GIs and manually removed the unconserved regions of the alignment. In addition, we constructed phylogenetic trees in MEGA version 4.0 using the neighbor-joining (NJ) method and the unweighted pair group method with arithmetic mean (UPGMA), with p-distance as the nucleotide substitution model.
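For reference, the p-distance is simply the proportion of aligned sites at which two sequences differ. The sketch below computes it for a pair of aligned sequences; skipping sites where either sequence has a gap (pairwise deletion) is an assumption made for this example and may differ from the gap treatment chosen in MEGA.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


if __name__ == "__main__":
    print(p_distance("ACGT-A", "ACCTTA"))  # one difference over five compared sites
```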
The secondary structures of single-stranded DNA sequences were determined using the MFOLD 3.2 program (Zuker, 2003).
According to our sequence analysis of the five closely related Prochlorococcus strains, there are 7 GIs immediately adjacent to tRNA genes in PMB, 11 in PMC, 8 in PMED4, 6 in PMG, and 6 in PMI (Table A2). The observed ratios of the GIs inserted into tRNA to those into intergenic sequences are 0.33 in PMB, 0.21 in PMC, 0.26 in PMED4, 0.22 in PMG, and 0.37 in PMI. To assess whether GIs appear adjacent to tRNA genes more frequently than expected by chance alone, we ran 100 000 replications of the simulation, and found that the expected ratios of the number of insertions next to tRNA genes to the number in intergenic sequences are 0.0661, 0.0600, 0.0801, 0.0568, and 0.0828 in PMB, PMC, PMED4, PMG, and PMI, respectively (Table 1). According to the χ² test, the observed insertions are significant at the P=0.01 level, which indicates that tRNA gene loci are insertion hotspots in Prochlorococcus genomes. Our results confirm, from a statistical perspective, many earlier observations in prokaryotes (Reiter et al., 1989; Parreira and Gyles, 2003; van Aartsen, 2008). This previous work revealed that tRNA loci are not only central components in translation, but also commonly serve as insertion sites for mobile elements in bacteria, because there is an attB (bacterial attachment site) within some tRNA genes such as Arg and Pro (Reiter et al., 1989; Semsey et al., 2002). The presence of these tRNA genes therefore gives rise to variable genomic regions and contributes to the observed divergence of Prochlorococcus genomes.
Numbers of insertions into tRNA genes in PMB, PMC, PMED4, PMG, and PMI
When we examined the insertion sites of the GIs in the five Prochlorococcus strains, we found that the GIs associated with specific tRNA genes (tRNA-Ala, tRNA-Arg, tRNA-Pro, and tRNA-Thr) tend to insert into certain iso-accepting tRNA genes rather than all of them, although these genes are homologous (Table 2). We also found that of the 16 tRNA genes associated with GIs, 11 do not have CCA at their 3′ ends. The second and third hypotheses mentioned above can therefore hardly explain these observations. We further computed two codon usages, one for whole genomes and one for GIs, each defined as the ratio of the number of occurrences of a codon corresponding to a GI-associated tRNA to the total number of occurrences of all its synonymous codons (Xu et al., 2008). The results showed a strong positive correlation (R²=0.93) between the codon usages of genomes and GIs (Fig. 1). At the same time, most of the codons corresponding to the tRNAs associated with GIs are rarely used in either GIs or genomes. These findings are inconsistent with the fourth hypothesis, which proposes that the codons corresponding to the tRNA genes associated with GIs tend to be used within GI genes.
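The codon usage measure defined above (occurrences of one codon divided by occurrences of all its synonymous codons) can be sketched as follows. The function names and the toy coding sequence are illustrative assumptions; the codon table is the standard genetic code written in the usual TCAG order, whose sense-codon assignments are shared by the bacterial code.

```python
from collections import Counter
from itertools import product
from typing import Iterable

# Standard genetic code in TCAG order ("*" marks stop codons).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def relative_codon_usage(coding_seqs: Iterable[str], codon: str) -> float:
    """Occurrences of `codon` divided by occurrences of all its synonymous codons."""
    codon = codon.upper()
    synonyms = {c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]}
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):   # read in frame
            counts[seq[i:i + 3]] += 1
    total = sum(counts[c] for c in synonyms)
    return counts[codon] / total if total else 0.0


if __name__ == "__main__":
    toy_cds = ["ATGAACAATAACTAA"]   # hypothetical coding sequence
    print(relative_codon_usage(toy_cds, "AAC"))
```

Running the same function once on all genes of a genome and once on the genes inside GIs would give the two usages that are compared in the text.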
Codon usages corresponding to the tRNA genes associated with the GIs among both GIs and genomes
Total number of GI insertions into the tRNA genes corresponding to the same codons in the five strains
The mechanism of the introgression of foreign genes into host genomes can provide an important clue to the determination of insertion sites. In some cases, the 3′ end sequence of a boundary tRNA is repeated elsewhere in GIs, always as a direct repeat, which helps to probe the underlying mechanisms. Here, we took the GI associated with tRNA-Asn in PMG as an example (Fig. 2a). There are three genes present, including P9301_01431, P9301_01441, and P9301_001451. To obtain the remnant information of the GI insertion into tRNA-Asn, we analyzed the regions flanking the GI (namely seq.1 and seq.2 in Fig. 2a) and found that seq.1, seq.2, and seq.3 in PMG and PMB are homologous. As shown in Fig. 2b, if seq.1 were native, seq.2 should be more similar to it than to seq.3, but in fact, seq.2 has a similarity of 72.4% with seq.1 and 83% with seq.3. That is, the sequence between the 3′ end of tRNA-Asn and its direct repeat comes from some other organism, which has a segment similar to seq.2. This suggests that foreign genetic materials are introduced into the host genomes by site-specific recombination using the 3′ end of tRNA as the target, and that the direct repeats are generated at the time the GIs are formed (Fig. 2c).
Schematic representation of the insertion of foreign genetic materials into tRNA by site-specific recombination exemplified by the GI between tRNA-Asn and P9301_01461 in PMG
We analyzed the tRNA genes (Asn-AAC, Pro-CCA, Ser-TCG, Thr-ACC, Tyr-TAC, and Cys-TGC) in the five Prochlorococcus strains and found a general characteristic: high GC contents at all the 3′ ends inserted by GIs (Fig. 3).
GC contents of the repeated sequences in tRNA genes associated with GIs
We also noticed that, although the direct repeats are identical to the 3′ ends and therefore have high GC contents, they are not preferred in the insertion. According to our observations, the GI associated with tRNA-Asn in PMC is separated by a direct repeat into two regions, which means that it was formed by foreign materials acquired from two insertions (Fig. 4). We determined the time order of the insertions to establish whether the target site is the 3′ end or its direct repeat. In general, because of their low selective pressure, intergenic sequences should reflect the DNA composition of the donor and the host genomes more explicitly than the sequences of coding genes. Our calculations showed that the GC fractions of the intergenic sequences in Region 1, Region 2, and the genome are 29.9%, 18.2%, and 22.7%, respectively. To get an impression of the variability in GC measurements, we also carried out 100 replications of simulation, joining all intergenic sequences in the genome and calculating the GC content of a randomly selected 782-nt segment within them. The mean GC content of the 100 simulations was 22.6%, with a standard deviation of 3.2%. The expected GC fraction in the genome deviates far more from Region 1 than from Region 2. It is well known that the earlier a foreign genetic material introgresses, the more similar its DNA composition is to the host genome, owing to its amelioration in the recipient organism. Therefore, Region 2 was inserted into the recipient genome earlier than Region 1. This time order implies that the 3′ end is the target site; otherwise, Region 1 would have been inserted earlier. Moreover, according to the phylogenetic trees, the direct repeats closer to tRNA-Pro are more similar to the 3′ end sequence of tRNA-Pro (Figs. 5–7).
This pattern also shows that the 3′ ends are preferred in insertions, because the earlier an insertion occurred, the more mutations the direct repeat formed by that insertion has accumulated. In other words, only when a 3′ end is taken as the insertion target will its direct repeats form an ascending order of similarity to it.
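The resampling used to gauge the variability of GC measurements (100 random 782-nt segments drawn from the joined intergenic sequences) can be sketched as below. The function names, the fixed random seed, and the toy input are assumptions of this example; real intergenic sequences would have to be supplied by the reader.

```python
import random
import statistics
from typing import Iterable, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def sample_gc(
    intergenic_seqs: Iterable[str],
    window: int = 782,      # segment length used in the text
    n_rep: int = 100,       # number of replications used in the text
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of GC fraction over random windows."""
    joined = "".join(intergenic_seqs)
    if len(joined) < window:
        raise ValueError("joined sequence is shorter than the window")
    rng = random.Random(seed)
    values = []
    for _ in range(n_rep):
        start = rng.randrange(len(joined) - window + 1)
        values.append(gc_fraction(joined[start:start + window]))
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy AT-rich intergenic sequences (about 23% GC), purely illustrative.
    toy = ["".join(rng.choices("ACGT", weights=[39, 11, 12, 38], k=500)) for _ in range(20)]
    mean_gc, sd_gc = sample_gc(toy)
    print(f"mean GC = {mean_gc:.3f}, SD = {sd_gc:.3f}")
```

A region whose intergenic GC fraction lies several standard deviations from the sampled mean would then stand out as compositionally atypical for the host genome.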
Schematic representation of the GI between tRNA-Asn and P9515_01551 in PMC
Schematic representations of the GIs associated with tRNA-Pro in PMB, PMC, PMED4, PMG, and PMI
Phylogenetic trees deduced from the 3′ ends of tRNA-Pro genes associated with GIs and their direct repeats using the NJ method provided by MEGA 4.0 in PMB (a), PMC (b), PMED4 (c), PMG (d), and PMI (e), respectively
Alignments of the 3′ ends of tRNA-Pro genes associated with GIs and their direct repeats in PMB (a), PMC (b), PMED4 (c), PMG (d), and PMI (e), respectively
Although direct repeats are duplicates of a tRNA 3′ end, they are not preferred in insertions, which suggests that factors other than high GC content affect the process. Therefore, we included in our analysis the sequences immediately upstream of the segments of the 3′ ends that are repeated elsewhere in GIs, and found a second general characteristic: they have palindromic regions (Fig. 8). As a sequence of high GC content can form a stable DNA-DNA hybrid in recombination, and a palindromic structure can bind an integrase, we presume that the palindromic regions whose ends are adjacent to sequences of high GC content are the real insertion sites of GIs in Prochlorococcus.
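As a rough illustration of what a palindromic region means at the sequence level, the sketch below looks for stem-loop candidates: a segment followed, after a short loop, by its own reverse complement. It is a naive exact-match scan and not a substitute for the MFOLD structure prediction used in this study; the stem length, loop range, and example sequence are arbitrary assumptions.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_stem_loops(
    seq: str,
    stem: int = 5,        # assumed arm length
    min_loop: int = 3,    # assumed loop size range
    max_loop: int = 8,
) -> List[Tuple[int, int, int]]:
    """Return (0-based start of left arm, stem length, loop length) for each
    position where the right arm is the exact reverse complement of the left arm."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        left_rc = reverse_complement(seq[start:start + stem])
        for loop in range(min_loop, max_loop + 1):
            right = start + stem + loop
            if right + stem > len(seq):
                break
            if seq[right:right + stem] == left_rc:
                hits.append((start, stem, loop))
    return hits


if __name__ == "__main__":
    example = "AAGGGCCTTTTGGCCCAA"   # hypothetical sequence with a GC-rich hairpin
    print(find_stem_loops(example))
```

Combining such a scan with a GC-content calculation on the adjacent segment would give a simple screen for the two properties proposed here, a palindromic region next to a GC-rich sequence.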
Secondary structures of the 3′ ends of tRNA
To confirm our supposition, we analyzed the GIs that are not flanked by tRNA, including the region between P9601_00511 and P9601_00611 in PMB, and its counterparts in PMC, PMED4, PMG, and PMI (Fig. 9a). In P9601_00511 and its orthologs P9515_00571, PMM0050, P9301_00531, and PMT9312_0051, we found a palindromic structure at each of their 3′ ends (Fig. 9b), but only the 3′ ends of P9601_00511 and P9301_00531 are adjacent to an intergenic sequence of high GC content (“CCCA” and “TCCCA”, respectively) (Fig. 9c). It is in these two genes that the insertion of foreign genetic materials happens. Having both high GC content and a palindromic structure is thus a necessary condition for a real insertion site. Mutations from base “C” to “T” are known to be common. If a sequence of high GC content lies within a gene such as a tRNA gene, it is not likely to mutate from “G or C” to “A or T”, and therefore can undergo repeated insertion. In contrast, an intergenic sequence, owing to its high mutation rate, is unlikely to retain such a site and can hardly be inserted repeatedly.
GIs between the homologs of P9601_00511 and P9601_00611 in five Prochlorococcus strains
GIs that confer fitness on an organism to occupy a particular ecological niche are horizontally transferred sequences. The tRNA loci usually serve as the target site for GI integration. Four different hypotheses have been proposed to elucidate the mechanism of the insertion of GIs into tRNA, but none of them explains it sufficiently. We consequently propose that the real insertion site of GIs is a region characterized by a palindromic structure adjacent to a sequence of high GC content, and that because the 3′ end of a conserved tRNA gene can maintain this property, it can be inserted repeatedly.
We thank Prof. Jeff ELHAI of Virginia Commonwealth University, Richmond, USA, for helpful comments and insights, and Ms. Xin XU of Chengdu University of Information Technology, Chengdu, China, and Dr. Guo-bo CHEN of Zhejiang University, Hangzhou, China, for their help with the manuscript.
| Organism | Abbreviation | Light adaptation | Size (Mb) | Gene number | Source |
The source of each genome sequence is the National Center for Biotechnology Information (NCBI), with the given accession number, or the Joint Genome Institute (JGI). HL denotes high-light-adapted ecotypes
*Project (No. 2006AA10A102) supported by the National High-Tech R & D Program (863) of China