Comparative sequencing of Pseudomonas aeruginosa genes oriC, citS, ampC, oprI, fliC, and pilA in 19 environmental and clinical isolates revealed the sequence diversity to be about 1 order of magnitude lower than in comparable housekeeping genes of Salmonella. In contrast to the low nucleotide substitution rate, the frequency of recombination among different P. aeruginosa genotypes was high, leading to the random association of alleles. The P. aeruginosa population consists of equivalent genotypes that form a net-like population structure. However, each genotype represents a cluster of closely related strains which retain their sequence signature in the conserved gene pool and carry a set of genotype-specific DNA blocks. The codon adaptation index, a quantitative measure of synonymous codon bias of genes, was found to be consistently high in the P. aeruginosa genome irrespective of the metabolic category and the abundance of the encoded gene product. Such uniformly high codon adaptation indices of 0.55 to 0.85 fit the ubiquitous lifestyle of P. aeruginosa.
The γ-subdivision proteobacterium Pseudomonas aeruginosa is capable of thriving in a great number of seemingly dissimilar ecological niches. It is ubiquitously distributed in aquatic habitats and in soil (6) but is also found as part of the normal bacterial flora of the intestine, mouth, and skin of animals (35). Under normal circumstances, colonization is harmless and infection only occurs when local or general defense mechanisms are reduced (6). In susceptible animals, P. aeruginosa may cause infection at any site, particularly wounds and the respiratory tract (6). Moreover, P. aeruginosa is an opportunistic invader of plants (5). P. aeruginosa has become one of the most important nosocomial opportunistic pathogens in humans (6). A peculiar feature is chronic airway infections in patients with cystic fibrosis (CF) (15).
Common approaches for analyzing the structure of natural populations of bacteria are multilocus enzyme electrophoresis (MLEE) (45) and multilocus sequence typing (MLST) (25). Allelic variation is indexed in MLEE by the electrophoretic mobilities of housekeeping enzymes (45) and in MLST by single nucleotide polymorphisms (SNPs) in selected genes (25). By applying either method, isolates within bacterial populations are assigned to specific clones due to their multilocus allelic profiles. Population structures of taxospecies range from the effectively panmictic Neisseria gonorrhoeae to the almost strictly clonal Salmonella (12, 13, 27, 46, 54, 58).
The population structure of P. aeruginosa has so far not been analyzed by MLST. MLEE has been applied to P. aeruginosa to study the association between electrophoretic types and lipopolysaccharide O-antigen serotypes (7) and to detect the nosocomial spread of strains in cancer (16) or CF patients (3, 26). In our study, comparative sequence analysis was applied to a variety of environmental and clinical isolates in order to assess genetic diversity of P. aeruginosa and hence to gain insights into the molecular evolution of this widespread opportunistic pathogen. The sequence diversity of a set of P. aeruginosa genes being highly diverse in function and representative of the conserved gene repertoire of this taxospecies was evaluated. The P. aeruginosa population was found to consist of hierarchically equivalent genotypes whereby all strains of a genotype share identical alleles.
In order to assess the genetic variation within the P. aeruginosa population, 19 P. aeruginosa strains from various clinical and environmental habitats were selected: TB , sputum from a CF patient, Sehnde, Germany, 1984; 892 , CF sputum, Hannover, Germany, 1983; 63741 , burn wound, intensive care unit, Hannover, Germany, 1990; K9  and K10 , sputum isolates with differential phage susceptibility and morphotypes from a CF patient, Husum, Germany, 1985; G7  and G9 , sequential sputum isolates with differential phage susceptibility and morphotypes from a CF patient in a stable clinical state and ten months later during a pulmonary exacerbation, Stade, Germany, 1986; SG1  [clone C (39)], throat swab from a CF patient, Bückeburg, Germany, 1986; SG31  [clone C (39)], river, Mülheim, Germany, 1993; DM, CF sputum, Hamburg, Germany, 1984; HJ2, sputum isolate, Cologne, Germany, 1990; DSM 1128 (ATCC 9027), ear infection, United States, 1980; ATCC 10145, neotype, type strain, Prague, Czech Republic, <1960; ATCC 15691, PAT, wound, Melbourne, Australia, 1952; ATCC 33356, international serotype 9, human faeces, Heidelberg, Germany, 1955; ATCC 33818, mushroom Agaricus bisporus; ATCC 21176, soil, Japan; H2, catheter, ward for infectious disease; and PAO1, genomic and genetic reference strain, burn wound, Melbourne, Australia.
Common numbers in square brackets indicate strains that were defined as clonal variants because they exhibit >70% identities in their SpeI macrorestriction fragment patterns (17).
Their genomic DNA was prepared using a rapid method for gram-negative bacteria (8). Different consensus primer sets (49, 52) enabled the amplification of the minimal origin of replication oriC, the β-lactamase gene ampC, the citrate synthase gene citS, the lipoprotein I gene oprI, both types of flagellin genes, and the type IV pilin genes among all analyzed strains. After purification by ultrafiltration with Ultrafree-MC Filter Units (Millipore), the PCR products were sequenced in both directions by the dideoxy chain terminating method using the Dye Terminator Cycle Sequencing Ready Reaction Kit (Applied Biosystems Inc.) and analyzed on a 373A automatic sequencer (ABI).
P. aeruginosa bacteria grown to the late-exponential phase were encapsulated in agarose blocks and lysed with detergents and proteinase K, and the intact chromosomes were cleaved with SpeI as described previously (38). The SpeI digests were separated by pulsed-field gel electrophoresis (PFGE) in a Bio-Rad DR cell (U = 200 V, 37 h, 10°C, two linear ramps of 5 to 25 s and 5 to 60 s in 1-s increments) and transferred onto nylon membranes by capillary blotting (38). Partial oriC, ampC, citS, oprI, a-type or b-type fliC, or different classes of pilA sequences were amplified from P. aeruginosa DNA by PCR, labeled with digoxigenin-dUTP (38), and hybridized with the pulsed-field blot. Hybridized fragments were detected by chemiluminescence using an alkaline phosphatase-conjugated anti-digoxigenin antibody and subsequently CDP-Star (Tropix) as substrates.
Multiple alignments of DNA sequence data were generated with the CLUSTAL program of the Genetics Computer Group Sequence Analysis Software Package (University of Wisconsin, Madison) (10). Nonrandom clustering of polymorphic sites was tested with a program by T. S. Whittam, Pennsylvania State University, University Park, based on the algorithm of Stephens (53). The whole set of polymorphic sites is split into subsets called phylogenetic partitions. The sites are successively tested for whether they support each partition for spatial clustering. Significant clustering of partition-specific sites indicates intragenic recombination.
Phylogenetic analysis was performed using the PC program MEGA, Version 1.01 (23). The numbers of synonymous nucleotide substitutions per 100 synonymous sites (dS) and nonsynonymous substitutions per 100 nonsynonymous sites (dN) were obtained by applying the Jukes-Cantor formula (21). Genetic distance matrices were calculated as the number of nucleotide differences from pairwise comparisons due to the correction of Jukes-Cantor (21), thereby ignoring all positions which were lacking in any allele. Unrooted evolutionary trees for single P. aeruginosa loci were constructed by the neighbor-joining method (41). The significance of the branching order was evaluated by bootstrap analysis of 500 computer-generated trees. In order to construct the consensus multilocus tree, the genetic distance matrices of each locus were normalized by setting the maximum allelic distance to 1 and averaged. The resulting mean genetic distance matrix enabled the construction of the normalized consensus multilocus tree by the neighbor-joining method. The classification of P. aeruginosa genotypes was performed by cluster analysis (unweighted pair group method using arithmetic averages) of SpeI macrorestriction fragment patterns as described previously (17, 38).
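The distance calculation and the normalization step described above can be illustrated with a minimal Python sketch. It is not the MEGA implementation; the function names are hypothetical, the input is assumed to be equal-length aligned alleles with `-` marking missing positions, and the Jukes-Cantor correction is applied in its usual form d = -3/4 ln(1 - 4p/3).

```python
import math

def strip_incomplete_columns(alleles, gap="-"):
    """Drop every alignment column in which any allele lacks a base."""
    kept = [col for col in zip(*alleles) if gap not in col]
    if not kept:
        raise ValueError("no alignment column is complete in all alleles")
    return ["".join(chars) for chars in zip(*kept)]

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p is too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alleles):
    """Pairwise Jukes-Cantor distances after complete deletion of gapped columns."""
    seqs = strip_incomplete_columns(alleles)
    n, length = len(seqs), len(seqs[0])
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = sum(a != b for a, b in zip(seqs[i], seqs[j])) / length
            matrix[i][j] = matrix[j][i] = jukes_cantor(p)
    return matrix

def normalized_mean_matrix(matrices):
    """Scale each locus matrix to a maximum of 1, then average over loci."""
    n = len(matrices[0])
    mean = [[0.0] * n for _ in range(n)]
    for m in matrices:
        peak = max(max(row) for row in m)
        scale = peak if peak > 0 else 1.0  # a monomorphic locus contributes zeros
        for i in range(n):
            for j in range(n):
                mean[i][j] += m[i][j] / scale / len(matrices)
    return mean
```

Scaling each locus to a maximum distance of 1 before averaging keeps a hypervariable locus such as pilA from dominating the consensus matrix; the averaged matrix would then be passed to a neighbor-joining routine, which is not shown here.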
The population structure of P. aeruginosa was assessed by testing the null hypothesis that the alleles are in linkage equilibrium. The index of association (IA) is defined by the observed variance (VO) of the mean number of loci at which two P. aeruginosa strains differ divided by the expected variance (VE) under assumption of linkage equilibrium, minus 1 (27). IA was calculated as a measure of linkage disequilibrium using a program by B. Spratt, University of Oxford, Oxford, United Kingdom. The significance of IA was estimated with the same software by generation of 1,000 data sets under the assumption of random association of loci. The variances of their mean differences between two strains were compared to the VO of the mean number of loci at which two P. aeruginosa strains differ. If the VO had been greater than that obtained with any of the randomized datasets, the number of loci at which two P. aeruginosa strains differ varied more extensively than maximally attained among individuals in linkage equilibrium, indicating significant nonrandom association of gene loci (linkage disequilibrium). Generation of 1,000 randomized data sets led to a significance level of P ≤ 0.001.
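As an illustration of the IA statistic and the randomization test, the following Python sketch works on allelic profiles (one tuple of allele labels per strain). It is not the program by B. Spratt; the function names are hypothetical, and two details are assumptions of this sketch: VE is taken as the sum over loci of h(1 - h), where h is the fraction of strain pairs that differ at the locus, and randomized data sets are generated by shuffling the alleles of each locus independently among strains.

```python
import random
from itertools import combinations

def pairwise_mismatches(profiles):
    """Number of loci at which each pair of strains differs."""
    return [sum(a != b for a, b in zip(p, q)) for p, q in combinations(profiles, 2)]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def expected_variance(profiles):
    """VE under linkage equilibrium: sum over loci of h * (1 - h)."""
    pairs = list(combinations(profiles, 2))
    ve = 0.0
    for locus in range(len(profiles[0])):
        h = sum(p[locus] != q[locus] for p, q in pairs) / len(pairs)
        ve += h * (1.0 - h)
    return ve

def index_of_association(profiles):
    """Return (IA, VO, VE) with IA = VO / VE - 1."""
    vo = variance(pairwise_mismatches(profiles))
    ve = expected_variance(profiles)
    if ve == 0.0:
        raise ValueError("VE is zero; IA is undefined for this data set")
    return vo / ve - 1.0, vo, ve

def permutation_test(profiles, trials=1000, seed=0):
    """Count randomized data sets whose variance reaches or exceeds VO."""
    rng = random.Random(seed)
    vo = variance(pairwise_mismatches(profiles))
    columns = [list(col) for col in zip(*profiles)]
    exceed = 0
    for _ in range(trials):
        shuffled = []
        for col in columns:
            copy = col[:]
            rng.shuffle(copy)  # break associations between loci
            shuffled.append(copy)
        if variance(pairwise_mismatches(list(zip(*shuffled)))) >= vo:
            exceed += 1
    return vo, exceed
```

If none of the 1,000 randomized data sets reaches VO, the null hypothesis of linkage equilibrium is rejected at the resolution of the test, which corresponds to the P ≤ 0.001 level stated above.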
The MEGA software was also applied to determine the relative synonymous codon usage (RSCU) of analyzed genes, that is, the observed frequency of a particular codon divided by its expected frequency under the assumption of equal usage of the synonymous codons for an amino acid (47):

$$\mathrm{RSCU}_{ij} = \frac{x_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}}$$

where $x_{ij}$ is the number of occurrences of the jth codon for the ith amino acid, and $n_i$ is the number of alternative codons for the ith amino acid. The codon adaptation index (CAI) provides a quantitative measure to assess the synonymous codon bias of various genes (47). It is defined as the geometric mean of the RSCU values corresponding to each of the codons used in that gene, divided by the maximum possible CAI for a gene of the same amino acid composition (47):

$$\mathrm{CAI} = \frac{\left(\prod_{k=1}^{L} \mathrm{RSCU}_{k}\right)^{1/L}}{\left(\prod_{k=1}^{L} \mathrm{RSCU}_{k\max}\right)^{1/L}}$$

where $\mathrm{RSCU}_{k}$ is the RSCU value for the kth codon in the gene, $\mathrm{RSCU}_{k\max}$ is the maximum RSCU value for the amino acid encoded by the kth codon in the gene, and L is the number of codons in the gene. As the RSCU values refer to the mean codon usage in the P. aeruginosa genome, the CAI also indicates the relative adaptiveness of the codon usage of a particular gene to the average codon usage of the P. aeruginosa genome. The complete P. aeruginosa PAO1 genome sequence is accessible on the website http://www.pseudomonas.com. The null hypothesis that the codon usage of the P. aeruginosa genome is exclusively determined by GC content was tested. The observed codon frequencies in the P. aeruginosa genome were compared with the expected codon frequencies calculated from the GC content at the first, second, and third codon positions under the assumption of the same amino acid composition. The significance of the differences was evaluated by χ² statistics.
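The RSCU and CAI definitions given in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the MEGA routine: the function names are hypothetical, the standard genetic code is assumed, stop codons are skipped, and a gene that uses a codon absent from the reference counts raises an error rather than being scored with a pseudocount.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codons(seq):
    """Split a coding sequence into complete codons (RNA input is accepted)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def rscu(reference_counts):
    """RSCU per codon: observed count over the mean count of its synonymous family."""
    values = {}
    for aa in set(CODON_TABLE.values()) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(reference_counts.get(c, 0) for c in family)
        for c in family:
            values[c] = reference_counts.get(c, 0) * len(family) / total if total else 0.0
    return values

def cai(gene, rscu_values):
    """Geometric mean of RSCU_k / RSCU_kmax over the sense codons of a gene."""
    log_sum, length = 0.0, 0
    for codon in codons(gene):
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue  # skip ambiguous and stop codons
        if rscu_values[codon] == 0.0:
            raise ValueError(f"codon {codon} has no reference usage")
        best = max(v for c, v in rscu_values.items() if CODON_TABLE[c] == aa)
        log_sum += math.log(rscu_values[codon] / best)
        length += 1
    if length == 0:
        raise ValueError("gene contains no scorable codons")
    return math.exp(log_sum / length)

# Toy reference: Leu is encoded three times by CTG and once by CTC.
reference = rscu(Counter(codons("CTGCTGCTGCTCGAAGAG")))
print(cai("CTGGAG", reference), cai("CTCGAA", reference))
```

Working with the ratio RSCU_k / RSCU_kmax codon by codon is equivalent to dividing the two geometric means, and summing logarithms avoids numerical underflow for long genes.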
In order to assess the sequence diversity and structure of natural P. aeruginosa populations, a variety of P. aeruginosa isolates from the aquatic environment and human disease habitats was selected (see Materials and Methods). These 19 strains were assigned to 14 genotypes by cluster analysis of their SpeI macrorestriction fragment patterns (see below). Six chromosomal loci of the strains' common gene repertoire that met the following criteria were compared: (i) even distribution on the chromosome; (ii) different functions of the encoded proteins, ranging from housekeeping to accessory functions; and (iii) various cellular localization of the gene products representing all cell compartments. The origin of replication oriC (59), the citrate synthase gene citS (11), the β-lactamase gene ampC (24), the lipoprotein I gene oprI (40), the flagellin gene fliC (50, 55), and the type IV pilin gene pilA (42, 49) were taken as a representative collection of the genetic repertoire of P. aeruginosa. Figure 1 shows a SpeI-DpnI map of the circular PAO chromosome, indicating the location of the selected genes.
The average pairwise differences of nucleotide sequences in individual gene loci ranged from 0.05 to 29.7% and, in the case of different pilin gene classes, even up to 71.3% (Table 1). oriC, citS, ampC, oprI, and a-type and b-type fliC showed only single nucleotide substitutions. Table 2 displays the distribution of different sequence polymorphisms in the 19 isolates. No evidence could be found for nonrandom clustering of polymorphic sites as tested with the algorithm of Stephens (53), i.e., SNPs are more or less randomly dispersed except in two cases (see below). The average rate of sequence polymorphism was 0.3% in the above-mentioned genes, which is about 1 order of magnitude lower than in comparable housekeeping genes of Salmonella (Table 1). In contrast, pilA encoding the subunits of type IV pili was highly polymorphic. The analyzed P. aeruginosa strains harbored several distinct classes of pilin sequences that are less closely related among themselves than with pilins of other species (51). Their Pseudomonas-atypical codon usage and a pilin group-specific sequence insertion downstream of pilA lead to the conclusion that the pilin genes were acquired by repetitive horizontal gene transfer (Fig. 2). Despite similar localization at the bacterial cell surface, the flagellin gene (fliC) was more conserved than the hypervariable pilA. Genetic diversity of a-type and b-type flagellins, respectively, was limited to several nucleotide substitutions and, in the case of strains ATCC 21776 and DSM 1128, to a variable 141-bp central cassette showing 28% nucleotide and 40% amino acid diversity (50, 52). Significant nonrandom clustering of polymorphic sites within this cassette indicated an intragenic recombination event and a mosaic gene structure (Fig. 2). Figure 2 summarizes the detected sequence variations of all analyzed loci, including the mosaic structure of flagellin and pilin genes.
Probes of the indicated genes were hybridized on PFGE-separated SpeI digestions of P. aeruginosa chromosomes in order to elucidate the genomic variability of the respective chromosomal region by restriction fragment length polymorphisms (Fig. 3a). The oriC hybridizing fragments (Fig. 3a [yellow bands]) were consistent in length. Eleven of 14 genotypes belong to a single fragment length class (225 to 248 kb), indicating that the genome organization around the origin of replication is highly conserved. ampC represents another segment of the auxotroph-rich region of the chromosome, i.e., a conserved region with a variety of housekeeping functions (20). However, these ampC hybridizing fragments (Fig. 3a [blue bands]) showed the most pronounced size variation that we found (134 to 585 kb). The ampC locus itself did not exhibit extensive sequence polymorphism (Tables 1 and 2). Whereas the oprI and citS hybridizing fragments (Fig. 3a [white and pink bands, respectively]) were assigned to three and four distinct classes of fragment lengths, the fliC and pilA probes (Fig. 3a [green and red bands, respectively]) detected a broad range of SpeI fragment sizes among genotypes, from 83 to 485 kb and from 52 to 455 kb, respectively, suggesting that both fliC and pilA genes are localized in chromosomal regions of extended interclonal variability. Except for the highly polymorphic pilin genes, sequence diversity in the common gene repertoire (Tables 1 and 2) did not reflect the diversity of the surrounding genome organization in P. aeruginosa. Gene sequences were less polymorphic than the corresponding macrorestriction patterns (Fig. 3a), leading to the impression that mainly insertions, deletions, and rearrangements contribute to the substantial diversity of the P. aeruginosa chromosome (39, 43). In summary, sequence variability of the gene loci did not depend on their chromosomal map position.
Highly polymorphic and more-conserved coding sequences seemed to be scattered throughout the chromosome, thereby creating a mosaic-like structure of sequence diversity.
Different approaches were employed to assess the P. aeruginosa population structure. In order to examine whether adjacent gene loci are genetically linked, evolutionary trees were constructed for each locus on the basis of polymorphic sites by the neighbor-joining method (41). The single-locus trees did not resemble one another in their topologies. Among these single-locus trees, all interior branches with a bootstrap confidence level of >0.95 resulted in significantly different partitioning of strains between the tested loci, indicating that the molecular evolutionary relationships among all strains differed from locus to locus. Only variants of the same genotype showed identical sequences in the six analyzed gene loci, except for one single synonymous nucleotide substitution (G1283A) of the CF isolate G9 in citS (Table 2). The similarity of coding sequences confirmed the close genetic relationships among each genotype that were also detected by SpeI macrorestriction analysis (see below). For comparison of the cumulative sequencing data with macrorestriction fragment pattern data, the molecular evolutionary relationships among all strains at six loci were summarized in a consensus multilocus tree, a normalized hypothetical phylogenetic tree based on sequenced genes (Fig. 4).
The noncongruent topologies of single-locus trees suggested either no or rather weak genetic linkage of adjacent loci. Since trees give some indication but cannot provide definitive evidence for the extent of linkage, the associations between genes at different loci were evaluated by the calculation of the IA (27). IA is a generalized measure of linkage disequilibrium (27) and defined by the VO of the mean number of loci at which two P. aeruginosa strains differ divided by the expected variance VE under assumption of linkage equilibrium, minus one (27). If there is random association between loci, VO approximates VE, i.e., IA has an expected value of zero. The significance of IA was estimated by generation of 1,000 data sets (P ≤ 0.001) under the assumption of random association of loci. Analysis of all strains revealed an IA of 1.057. As the VO (1.67) of the mean number of loci at which two P. aeruginosa strains differ was greater than the maximal variance (Vmax trial = 1.65) obtained in 1,000 trials of randomized data sets, the number of loci at which two P. aeruginosa strains differ varied more extensively than maximally attained among individual strains in linkage equilibrium. This indicated significant nonrandom association of gene loci (linkage disequilibrium, P < 0.001). However, after treatment of each genotype as a unit, the evidence of association disappeared with P < 0.001 (IA = 0.313). In this case the VO of 0.95 was far below the maximal variance obtained in 100 (Vmax trial = 1.52) or in 1,000 (Vmax trial = 1.85) randomized data sets. This quantitative analysis confirms that strains which belong to the same genotype are characterized by nonrandom association of alleles that is not disrupted by recombination. In contrast, the recombination frequency of large chromosomal segments between genotypes was high enough to break up clonal associations and have all genotypes in linkage equilibrium to each other. Hence, the P. aeruginosa genotypes are equivalent biovars that form a net-like population structure. Each genotype represents a cluster of closely related strains (clonal variants) that share identical alleles.
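The relation between the reported values can be made explicit by rearranging the definition of IA. The expected variances below are not reported in the text; they are back-calculated here from the stated IA and VO values and are therefore approximate.

$$I_A = \frac{V_O}{V_E} - 1 \quad\Longrightarrow\quad V_E = \frac{V_O}{I_A + 1}$$

$$\text{all strains: } V_E \approx \frac{1.67}{1.057 + 1} \approx 0.81, \qquad \text{genotypes as units: } V_E \approx \frac{0.95}{0.313 + 1} \approx 0.72$$

Under these figures the observed variance is roughly twice the expected variance when all 19 strains are analyzed, but only about 1.3 times the expected variance when each genotype is counted once, a value that lies within the range reached by the randomized data sets.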
Next, the population data derived from sequenced genes were compared with the results of SpeI macrorestriction fragment analysis (Fig. 3a). Strains that exhibited more identities in their SpeI fragment patterns than expected from random distribution of restriction sites and a similar genome size, i.e., more than 70%, were assigned to the same genotype (17, 38). Figure 3b depicts the similarity of SpeI macrorestriction fragment patterns of analyzed strains. Members of the same genotype formed clusters in correspondence with the sequencing data, but otherwise the tree did not show pronounced hierarchy. With the exception of these genotypic clusters of closely related strains, the UPGMA tree based on fragment patterns did not fit any single-locus tree, nor did it resemble the normalized consensus multilocus tree (Fig. 4) based on sequence analysis.
The CAI provides a quantitative measure to assess the synonymous codon bias of various genes (for exact definition see Materials and Methods and reference 47). Here, the CAI indicates the relative adaptiveness of the codon usage of a particular gene to the average codon usage in the P. aeruginosa genome (http://www.pseudomonas.com). A gene that consists only of the most frequently used codons in the P. aeruginosa chromosome has the maximal possible CAI value of 1.0. The pattern of codon usage was monitored for each of the five sequenced genes. Figure 5 shows that dissimilar P. aeruginosa genotypes did not differ in their CAI values for a particular locus, although a-type and b-type flagellins and different groups of pilins could be distinguished. The average CAI values of ampC (0.684), citS (0.725), and both fliC types (a type, 0.667; b type, 0.619) correspond with the mean chromosomal CAI of 0.654, but pilA and oprI are characterized by lower mean CAI values of 0.339 (group I pilA), 0.235 (group II pilA), and 0.420 (oprI). For a more thorough evaluation, further P. aeruginosa genes were selected and their CAI values were compared to those of homologous Escherichia coli genes (Fig. 6). The selection focused on genes from similar metabolic categories (paralogs: trpE-phnA, trpG-phnB, arcB-argF, plcN-plcSR1,R2) localized in distant chromosomal regions, genes with different levels of gene expression (lowly expressed regulatory genes: amiR, fleR, algR, regA, glpR, anr, and trpI; highly expressed genes: pilA and fliC), and genes adjacent to oprI (pfeA, pyrF, lipA, and lipH) colocalized on SpeI macrorestriction fragment SpVPAO.
Except for pilA and oprI, all analyzed P. aeruginosa genes had consistently high CAI values (Fig. 6). As is evident from Fig. 6, the codon adaptation indices of P. aeruginosa genes apparently did not depend on the level of gene expression, the chromosomal map position, or the function and/or cellular location of the gene product. The null hypothesis that the high CAI values simply reflect the high genomic GC content of 67.2% was tested. The observed codon frequencies in the P. aeruginosa genome were compared with the expected codon frequencies calculated from the GC content at the first, second, and third codon positions under the assumption of the same amino acid composition. The average CAI in the P. aeruginosa genome (CAIobs = 0.654) was significantly lower (χ² = 1,786; df = 63; P < 10⁻⁶) than the CAI (CAIexp = 0.743) predicted from GC content. Amino acids could be classified into three categories according to whether codon usage coincided with the theoretical value calculated from GC content with equal weight for G and C or showed a moderate or strong deviation. The strongest deviation of codon usage from the theoretical value was shown by Glu followed by (from strongest to weakest) Leu, Gly, Arg, Thr (strong, P < 0.001) and Pro, Ala, Ile, His, Ser, Asp, Tyr, and Phe (moderate, P < 0.01), whereby no general trend towards a particular nucleotide at the third position was evident. In contrast, the synonymous codon usage of Val, Gln, Asn, Lys, and Cys was compatible with the frequencies expected from GC content. Hence, the GC content is an important but not exclusive determinant for the high CAI values of P. aeruginosa genes.
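One way to derive codon frequencies expected from positional GC content is sketched below in Python. The details are assumptions of this sketch rather than the procedure actually used: G and C each receive half of the GC fraction at a codon position, A and T each receive half of the remainder, the three positions are treated as independent, and the resulting weights are renormalized within each synonymous family so that the amino acid composition stays fixed. The function names and the GC values in the example are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def base_probability(base, gc):
    """Probability of a base at a position with GC fraction gc (G = C, A = T)."""
    return gc / 2.0 if base in "GC" else (1.0 - gc) / 2.0

def synonymous_families():
    """Map each amino acid to its list of codons, stop codons excluded."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    return families

def expected_fractions(gc_by_position):
    """Expected share of each codon within its synonymous family."""
    fractions = {}
    for family in synonymous_families().values():
        weights = {}
        for codon in family:
            w = 1.0
            for base, gc in zip(codon, gc_by_position):
                w *= base_probability(base, gc)
            weights[codon] = w
        total = sum(weights.values())
        for codon in family:
            fractions[codon] = weights[codon] / total
    return fractions

def chi_square(observed_counts, fractions):
    """Sum of (observed - expected)^2 / expected over all sense codons."""
    statistic = 0.0
    for family in synonymous_families().values():
        n = sum(observed_counts.get(c, 0) for c in family)
        for codon in family:
            expected = n * fractions[codon]
            if expected > 0:
                statistic += (observed_counts.get(codon, 0) - expected) ** 2 / expected
    return statistic

# Hypothetical positional GC fractions, chosen for illustration only.
fractions = expected_fractions((0.6, 0.4, 0.9))
print(round(fractions["GAG"], 3), round(fractions["GAA"], 3))
```

Feeding these expected fractions into the CAI formula yields a GC-only prediction that can be compared with the observed genome-wide value, which is the logic behind the CAIexp versus CAIobs comparison reported above.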
The comparative sequence analysis of P. aeruginosa revealed a strikingly low sequence diversity in the common gene pool irrespective of metabolic category. The mean sequence diversity of 0.3% in conserved genes is about 1 order of magnitude lower than in comparable housekeeping genes of Salmonella (4, 31–34, 46, 57). Although, with the exception of two fliC genes, no intragenic recombination within the selected genes was observed in the strain panel, the MLST analysis indicated a high recombination frequency among different genotypes (IA ≈ 0; P < 0.01) leading to random association of gene loci at the interclonal level (linkage equilibrium). Genotypes are equivalent biovars that form a net-like population structure and cannot be classified into taxonomic groupings. In contrast, the UPGMA tree based on similarity of SpeI macrorestriction fragment patterns showed hierarchical genotypic lineages, suggesting a clonal structure of the P. aeruginosa population. At first glance these results contradict the paradigm of population genetics that the structure of a bacterial population is mainly determined by its recombination frequency (29, 30, 45, 46, 58). It was mainly deduced from population analyses of Enterobacteria and other pathogens (E. coli and Salmonella, Shigella, Legionella, Haemophilus, Bordetella, Streptococcus, and Listeria spp.) (44) by means of MLEE. Their rate of recombination is low enough to permit formation of a hierarchy of clonal lineages. This does not apply to P. aeruginosa: the alleles of the common gene pool are in linkage equilibrium among genotypes, but nevertheless, clones can be discerned by means of MLST and macrorestriction analysis. The macrorestriction fragment pattern analysis classifies isolates in terms of gain or loss of SpeI recognition sites and genome rearrangements such as insertions, deletions, and inversions; i.e., it fingerprints the whole chromosome. Our previous work on P. aeruginosa genome diversity revealed that changes in fragment pattern are caused in 92% of cases by insertions and/or deletions (indels), but in only 8% by point mutations in SpeI recognition sites (39). It is not the SNPs but mainly indels and rearrangements that account for differences in the SpeI macrorestriction fragment patterns which were used to differentiate genotypes (17, 38). The genotypes showed some ranking in the UPGMA tree which is based, as we know from previous studies (39), on specific DNA insertions shared by variants of the same genotype. Since these nonconserved DNA blocks were not included in the MLST approach, the UPGMA tree fit neither each single-locus tree based on sequenced genes nor the consensus multilocus tree that summarizes the molecular evolutionary relationships among all strains at six loci. Moreover, closely related strains which belong to the same genotype are characterized by nonrandom association of alleles that is not disrupted by recombination. Hence, the P. aeruginosa genotypes can be considered clones. The members of a clone (clonal variants) show almost identical sequences in their conserved gene pool and retain a clone-specific set of 1- to 200-kb DNA blocks (39, 43) (clone-specific signature).
The clones of Enterobacteria are specified by adaptation to a particular habitat, so one single electrophoretic type predominates one habitat (Fig. 7) (44, 46, 58). Electrophoretic types are associated with particular pathogenicity islands which result in disease-associated clones (18). In contrast, our work did not reveal any correlation between P. aeruginosa clones and habitats (Fig. 7) or between habitats and alleles of hypervariable loci, like pilA and fliC, that superficially might be considered indicators of adaptive divergence. Dominant clones are ubiquitously distributed in both disease and environmental habitats (14, 37): for example, members of the same clone were recovered from oil shale and from the lungs of patients with CF. Disease and environmental isolates of P. aeruginosa clones are indistinguishable in their genotypic (37, 39) and chemotaxonomic properties (14) and are functionally equivalent in several traits relevant for their virulence and environmental properties (1). The virulence factors of P. aeruginosa exert a broad tropism to both animals and plants (36). Hence, our analysis of population structure drawn from sequencing of genes that are representative for the conserved repertoire of the taxospecies is not biased by the overrepresentation of CF isolates. P. aeruginosa appears to be so versatile that it can colonize a variety of different ecological niches without specialization (Fig. 7). The data of this study suggest that the mode of codon usage is a key feature for such a successful universal lifestyle.
A variety of P. aeruginosa genes were analyzed in order to examine whether the CAI corresponds to the chromosomal localization of a locus (in analogy to eukaryotic isochores), functional features of the gene products, or the phylogenetic origin of DNA segments. Except for pilA and oprI, all tested genes exhibited consistently high codon adaptation indices irrespective of their chromosomal localization, level of gene expression, and protein function (Fig. 6). High CAI values apparently are a species-specific feature of P. aeruginosa. In contrast, the CAIs of E. coli genes support the partitioning of these genes into three classes (9, 22, 28, 48). High CAI values correlate with high levels of gene expression, and average or low CAI values correlate with low levels of gene expression (28). Examples of CAI values of highly and lowly expressed E. coli genes are displayed in Fig. 5 and compared to CAIs of the corresponding P. aeruginosa genes. Genes of the third E. coli class were introduced into the E. coli genome by horizontal gene transfer (28). The exceptionally low CAI values of pilA and oprI might mark them as representatives of an analogous class of genes in P. aeruginosa. At least for the pilin genes, there is evidence that the genes were recently acquired from other taxospecies by horizontal gene transfer, most likely from the Moraxella lineage (51). Although the GC content of pilin genes is still significantly different from that in the bulk genome, the pilin genes tend to adapt to the species-specific codon usage (51).
The null hypothesis that the high average CAI of the P. aeruginosa genome (CAIobs = 0.654) just reflects the high GC content of this taxospecies was discarded. There must be additional factors that exert selective pressure to maintain high CAIs within a narrow range for the majority of genes. In the case of E. coli, codon preferences have been interpreted in terms of translational efficiency and fidelity and substitutional biases operating during DNA transcription, replication, and repair processes (49). Variations in tRNA availabilities are considered the key factor in producing the codon bias of the highly expressed genes (2). In the P. aeruginosa genome, strong disparities of either G, C, A, or U at the third codon position for individual amino acids (Leu, Thr, Glu, Arg, and Gly) demonstrate a compositional asymmetry between the coding and the noncoding strands. Hence, the high and uniform CAI of P. aeruginosa is apparently governed not only by high GC content but also by codon-anticodon interaction, proofreading and codon context.
A high correlation between CAI and protein abundance has been experimentally verified for exponentially growing E. coli and interpreted as matching substrate levels with cellular demands (22, 56). Hence, it is striking that in the case of P. aeruginosa, virtually all genes fall into a narrow range of high CAI, implying an optimal codon usage independent of the encoded metabolic category. Even the weakly expressed regulatory genes and members of paralogous gene families encoding similar, but not identical, functions have CAI values of 0.55 to 0.85 (Fig. 6). The fact that the codon usage of most genes is optimally adapted and not correlated with cellular demands suggests that the translational apparatus of P. aeruginosa handles the recruitment of its genetic repertoire with similar efficacy. In other words, P. aeruginosa can rapidly exploit its broad metabolic potential in order to adapt to changes in supply of nutrients or other environmental factors. A high CAI is ideal for a ubiquitous microorganism that typically lives in aquatic habitats with a low supply of nutrients and metabolizes virtually all carbon sources (35). Correspondingly, the growth rate of P. aeruginosa is not further stimulated by an above-average increase of supply of nutrients. Hence, in nutrient-rich chemostats, other bacterial species can efficiently compete with P. aeruginosa (35). However, the abundance of nutrients is not typical for natural environmental habitats. In summary, the generally high CAI predisposes the organism to metabolic versatility and facilitates adaptation to new habitats. After completion of the P. aeruginosa PAO1 sequencing project, the upcoming methodology of functional genomics to study gene expression at a global level could address this issue, whether or not the uniformly high CAI is a key feature of P. aeruginosa in the colonization of and persistence in virtually all aquatic mesophilic habitats.
We thank B. Spratt, University of Oxford, Oxford, United Kingdom, for kindly providing us with the program to test linkage equilibrium. We also thank M. Achtman, Max-Planck-Institut für molekulare Genetik, Berlin, Germany, for letting us have access to his computer resources to test the mosaic structure of P. aeruginosa genes. The technical assistance by Jutta Boßhammer is gratefully acknowledged. Special thanks go to our colleagues Karen Larbig and Lutz Wiehlmann for helpful discussions.
This work was supported by a grant from the Deutsche Forschungsgemeinschaft.