We present the complete genome sequence of Mycoplasma hyopneumoniae, an important member of the porcine respiratory disease complex. The genome is composed of 892,758 bp and has an average G+C content of 28.6 mol%. There are 692 predicted protein coding sequences, the average protein size is 388 amino acids, and the mean coding density is 91%. Functions have been assigned to 304 (44%) of the predicted protein coding sequences, while 261 (38%) of the proteins are conserved hypothetical proteins and 127 (18%) are unique hypothetical proteins. There is a single 16S-23S rRNA operon, and there are 30 tRNA coding sequences. The cilium adhesin gene has six paralogs in the genome, only one of which contains the cilium binding site. The companion gene, P102, also has six paralogs. Gene families constitute 26.3% of the total coding sequences, and the largest family is the 34-member ABC transporter family. Protein secretion occurs through a truncated pathway consisting of SecA, SecY, SecD, PrsA, DnaK, Tig, and LepA. Some highly conserved eubacterial proteins, such as GroEL and GroES, are notably absent. The DnaK-DnaJ-GrpE complex is intact, providing the only control over protein folding. There are several proteases that might serve as virulence factors, and there are 53 coding sequences with prokaryotic lipoprotein lipid attachment sites. Unlike other mycoplasmas, M. hyopneumoniae contains few genes with tandem repeat sequences that could be involved in phase switching or antigenic variation. Thus, it is not clear how M. hyopneumoniae evades the immune response and establishes a chronic infection.
Slightly over a century ago Edmond Nocard and Emile Roux successfully cultivated the agent of contagious bovine pleuropneumonia, Mycoplasma mycoides (38). Since that time approximately 111 species of the genus Mycoplasma have been identified in animals, and another 102 organisms comprise the remaining portion of the class Mollicutes (http://www.the-icsp.org/subcoms/mollicutes.htm). Most members of this class are pathogenic and colonize a wide variety of hosts, including animals, plants, and insects. They represent a group of cell wall-less, low-G+C-content eubacteria that are phylogenetically related to the Clostridium-Streptococcus-Lactobacillus branch of the phylum (29, 44, 61). Their small genomes (580 to ~1,300 kb) imply a degenerative evolutionary track, and their minimal metabolism has provided a focus for studies of the minimal genome (22, 35). The complete genome sequences of nine mycoplasmas have been reported previously.
Despite the genomic simplicity of the organisms, mycoplasma diseases are complex and relatively unstudied. One hallmark of these diseases is disease chronicity (47, 54), but equally important is the ability to alter or circumvent the immune response and potentiate disease due to other pathogens (5, 37, 54). Mycoplasma hyopneumoniae is the causative agent of porcine enzootic pneumonia, a mild, chronic pneumonia of swine commonly complicated by opportunistic infections with other bacteria (46). Like most other members of the order Mycoplasmatales, M. hyopneumoniae is infective for a single host species, but the mechanisms of host specificity are unknown. This organism is highly infectious, has a worldwide distribution, and occurs in almost every herd. Relative control has been achieved through active vaccination programs, but porcine enzootic pneumonia continues to be one of the major economic problems in the swine industry. While progress has been made in understanding the molecular basis for some mycoplasma diseases (41), M. hyopneumoniae has resisted advances because of its fastidious growth and the lack of genetic tools and transforming protocols. Complicating the studies are active phenotypic switching and antigenic variation mechanisms within mycoplasmas (63) that may also play an important role in M. hyopneumoniae-host interactions but are largely unexplored.
Despite years of study there have been relatively few M. hyopneumoniae studies defining pathogenic mechanisms. The cellular adhesin that mediates specific attachment to swine cilia was not identified until the mid-1990s (66), and only recently has the molecular basis of adherence been studied (18-20, 34). There are many aspects of adherence and interaction with host cells that remain largely unexplored (33, 40, 65), and even the surface architecture of this mycoplasma species has not been fully evaluated (48, 51, 59, 60; T. A. Burnett, S. J. Cordwell, S. J. Geary, M. F. Walker, F. C. Minion, and S. P. Djordjevic, Abstr. 14th Congr. Int. Org. Mycoplasmol., abstr. 169, 2002). As a first step toward gaining new insight into the components that contribute to virulence and the mechanisms by which M. hyopneumoniae causes disease, we sequenced the genome of M. hyopneumoniae strain 232. Through this analysis we identified cilium adhesin homologs, lipoproteins, and other components that might contribute to persistence and virulence.
Strain 232 of M. hyopneumoniae was isolated from a pig infected with strain 11 (4, 30) and has been commonly used to study virulence and vaccination regimens in the United States (49, 54, 55). Chromosomal DNA was isolated by the method of Artiushin et al. (1), and two libraries were constructed. Chromosomal DNA was incompletely digested with Tsp509I and size fractionated on sucrose gradients, and 2- to 4-kb fragments were ligated into EcoRI-digested, dephosphorylated pZERO vector (Invitrogen) with selection for kanamycin resistance (library 1). A second library was constructed from chromosomal DNA that was sheared to an average size of 3 to 5 kb by using a Hydro-Shear apparatus (Genomic Solutions, Ann Arbor, Mich. [formerly GeneMachines, San Carlos, Calif.]). Ends were repaired by using T4 DNA kinase and the Klenow fragment of DNA polymerase (New England Biolabs, Beverly, Mass.), the blunt end fragments were ligated into SmaI-digested, dephosphorylated, gel-purified pBluescript KS(+), and the resulting ligation mixture was transformed into Escherichia coli DH10B [Δ(lac)X74 deoR recA1 endA1 araD139 Δ(ara-leu)7679 galU galK rpsL nupG mcrA Δ(mcrCB-hsdSMR-mrr) (φ80 lacZΔM15)]. Recombinant clones were picked and arrayed into 384-well plates containing SOB medium supplemented with 20% glycerol by using a Q-PIX picking and arraying robot (GENETIX, Hampshire, United Kingdom). These plates were sealed and stored at −70°C following overnight incubation at 37°C.
Cells from stock 384-well plates were inoculated into deep-well 96-well plates containing 400 μl of 2× YT medium supplemented with either 100 μg of ampicillin per ml or 50 μg of kanamycin per ml. The plates were shaken in a HiGRO apparatus (Genomic Solutions) at 250 rpm for 14 h at 37°C and centrifuged, and plasmid DNA was extracted by the alkaline lysis protocol. The entire protocol was performed robotically by using Beckman 96-channel MultiMEK robots. Lysates were cleared by filtration through Millipore lysate clearing plates (Millipore, Billerica, Mass.). Cleared lysates were then added to Millipore plasmid purification plates, and bound plasmid DNA was eluted with Tris-EDTA (pH 8.0). Sequencing reaction mixtures consisted of 2 μl of ABI Big Dye sequencing reaction mixture, 15 pmol of either primer T7 or primer T3, and 300 ng of plasmid DNA. Reactions were performed in MJ Research Tetrads (MJ Research, Waltham, Mass.) for 30 cycles of 95°C for 10 s, 50°C for 5 s, and 60°C for 4 min. The reaction products were precipitated with 2-propanol, washed with 80% ethanol, resuspended in loading dye (formamide, 0.1% blue dextran, EDTA), and loaded onto MegaBACE sequencers equipped with ABI dye-compatible filter sets and mobility correction. Sequence data were analyzed and assembled by using the Phred/Phrap/Consed assembly package (9, 10, 15) and the CAP3 sequence assembly program (21). To close gaps, custom primers were designed near the ends of contigs, and PCRs were performed with chromosomal template DNA. Sequences were obtained from PCR products that spanned the gaps. The sequence coverage of both strands was 10×, and the error rate was less than 1 base per 10 kb.
Annotation of the M. hyopneumoniae genome was performed by utilizing a combination of programs for gene prediction, similarity searching, and functional assignment. The information from these analyses was imported into a relational database accessible through an internal website, which allowed us to view and update individual gene records, refine start sites, and add functional descriptions and notes (17). Basic sequence analysis tools were provided by the Accelrys Genetics Computer Group package of programs (Wisconsin Package, version 10.3).
For determination of potential protein coding sequences we utilized Glimmer (7) to create organism-specific open reading frame (ORF) models that could then be used to search the entire genome for ORFs matching the predictive models. Glimmer was modified to take into account the alternative genetic code of mollicutes which use TGA as a tryptophan codon instead of as a stop codon. The genome was first arranged so that the initial base of the ATG start codon of the putatively identified dnaA gene was base 1 of the forward strand. Each predicted ORF was then assigned an M. hyopneumoniae identification number. Open reading frame mhp001 was assigned to the putative dnaA gene, and each subsequent ORF was then numbered consecutively according to the leftmost base (the start codon for ORFs on the forward strand and the stop codon for genes on the reverse strand).
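The effect of the alternative mollicute genetic code on ORF prediction can be illustrated with a minimal sketch. The code below is not the modified Glimmer described above; it is a hypothetical helper (the names `build_codon_table` and `translate` are assumptions introduced here) that only shows how reading TGA as tryptophan instead of as a stop codon changes the translation of the same DNA.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(tga_is_trp=True):
    """Return a codon -> amino acid map; optionally read TGA as Trp (W)."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AA))
    if tga_is_trp:
        table["TGA"] = "W"  # mollicute usage: TGA encodes tryptophan
    return table


def translate(dna, table):
    """Translate from the first base in frame 1 until the first stop codon."""
    protein = []
    dna = dna.upper()
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mollicute = build_codon_table(tga_is_trp=True)
standard = build_codon_table(tga_is_trp=False)
print(translate("ATGTGAAAATAA", mollicute))  # MWK
print(translate("ATGTGAAAATAA", standard))   # M
```

With the standard code the ORF terminates at the internal TGA, which is why a gene finder that is not adjusted for mollicutes would truncate or split many genes.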
BLAST searches were performed for all predicted ORFs by using a BLASTP search of amino acid similarities to sequences in the GenBank nonredundant protein database. In addition to BLAST similarity searching, we also tentatively identified functional domains within the M. hyopneumoniae ORFs by searching for similarities to the Prosite motif library (11). Programs from the Wisconsin Package provided composition and hydrophobicity analyses along with scanning for potential signal peptide and transmembrane domains. The results of these additional analyses allowed us to refine the gene assignments initially made with BLAST. Furthermore, alignments with known proteins provided assistance with start codon prediction.
The results of all of these searches were used to putatively identify each M. hyopneumoniae ORF when a significant hit between the M. hyopneumoniae sequence and the GenBank sequence was found. Computer-aided gene prediction along with human inspection of each gene record was then used to finalize gene assignments for each M. hyopneumoniae ORF.
To identify genomic sequences that code for tRNAs, the set of programs that constitute the software package tRNAscan-SE (26) was used. rRNAs were identified by similarity to the corresponding genes in the Ribosome Database Project sequence database (28). The sequences for tmRNA (57), the 4.5S signal recognition particle (45), and RNase P (31) were also identified based upon sequence similarity with known representatives of these RNA genes.
To identify structural features of the genome, a set of programs that search for motifs and structural elements was used. The Tandem Repeats Finder program was used to identify tandem repeat sequences throughout the genome (3). These sequences are commonly found in mycoplasma lipoproteins. REPuter (25) was used to identify repeat regions throughout the genome. BLASTN was used to identify runs of adenines and thymidines. Such runs have been identified as gene expression control points in other Mycoplasma species.
The genome sequence of M. hyopneumoniae has been deposited in the GenBank database under accession number AE017332. The annotated genome and supplementary data are available on the World Wide Web at http://mycoplasma.genome.uab.edu/.
Table 1 and Fig. 1 show the general features of the genome of M. hyopneumoniae strain 232. The genome consists of 892,758 bp and 692 predicted protein coding sequences. The overall G+C content is 28.6 mol%. Functions have been assigned to 304 (44%) of the predicted protein coding sequences, while 261 (38%) of the sequences encode conserved hypothetical proteins and 127 (18%) encode unique hypothetical proteins. The genome contains one 16S-23S rRNA operon, one 5S RNA located 100 kb from the 23S rRNA gene, and 30 tRNA coding sequences. The positions of these sequences in the genome are shown in Fig. 1.
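As a quick consistency check on the annotation summary, the three categories can be verified to account for all 692 predicted coding sequences and to round to the percentages quoted. The snippet below is only an arithmetic sketch; the dictionary keys and the function name `percent_breakdown` are labels chosen here for illustration.

```python
TOTAL_CDS = 692

# Counts as reported in the text.
CATEGORIES = {
    "assigned function": 304,
    "conserved hypothetical": 261,
    "unique hypothetical": 127,
}


def percent_breakdown(counts, total):
    """Return each category as a whole-number percentage of the total."""
    return {name: round(100 * n / total) for name, n in counts.items()}


print(sum(CATEGORIES.values()) == TOTAL_CDS)
print(percent_breakdown(CATEGORIES, TOTAL_CDS))
```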
The codon adaptation index for each gene was calculated by the method of Sharp and Li (53). The values obtained represent codon usage patterns in individual genes relative to a reference set of genes. The plot of these values in Fig. 1 identifies regions of the genome with low codon adaptation values that have unusual codon usage patterns, possibly reflecting recent acquisition of genes by horizontal transfer. The higher values are thought to imply that the genes are more highly expressed since they conform more closely to the most abundant codons for the same amino acids.
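The codon adaptation index is the geometric mean of the relative adaptiveness values of the codons in a gene, where the relative adaptiveness of a codon is its frequency in the reference set divided by the frequency of the most used synonymous codon. The sketch below illustrates that calculation under several assumptions that are not taken from this study: the reference set is supplied by the caller, the mollicute code (TGA as tryptophan) defines the synonymous families, and codons absent from the reference set receive a pseudocount of 0.5. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Mollicute genetic code (TGA = W), amino acids in TCAG codon order.
MOLLICUTE_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), MOLLICUTE_AA))


def codons(seq):
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Compute w = count / max synonymous count for each codon."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    w = {}
    for family in families.values():
        if len(family) < 2:
            continue  # single-codon amino acids are uninformative
        # Assumption: unseen codons get a pseudocount so that log(w) is defined.
        adjusted = {c: counts[c] if counts[c] > 0 else pseudocount for c in family}
        top = max(adjusted.values())
        for c in family:
            w[c] = adjusted[c] / top
    return w


def cai(gene, w):
    """Geometric mean of w over the informative codons of one gene."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    if not logs:
        raise ValueError("gene contains no informative codons")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(["AAAAAAAAAAAG"])  # toy reference: AAA x3, AAG x1
print(cai("AAAAAA", weights))
print(cai("AAAAAG", weights))
```

In the toy reference set, AAG is used one-third as often as AAA, so a gene that uses AAG scores lower than one that uses only AAA.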
In order to locate the origin of replication of the M. hyopneumoniae genome, we attempted to identify several features that have been associated with replication origins in other bacterial species, including mollicutes. Bacterial origins of replication are typically located in the vicinity of the dnaA and dnaN genes, which code for proteins involved in DNA replication (62, 64). In addition, multiple DnaA box primary sequence motifs are commonly found within the intergenic regions on either side of the dnaA gene (6). M. hyopneumoniae has 348- and 148-bp intergenic regions between the rpmH and dnaA genes and between the dnaA and dnaN genes, respectively. Within these intergenic regions we searched for the presence of consensus dnaA box motifs using the pattern TTATC(C,A)A(C,A) (12). Only two DnaA box motifs were found in the intergenic region upstream of dnaA, and none were found between dnaA and dnaN. The complete M. hyopneumoniae sequence had 688 sequences matching this consensus motif scattered throughout the genome. When we used two slightly different, more relaxed DnaA box consensus motifs [TT(A,T)C(C,A,T)A(C,A)A and TTA(T,A)(T,C)(C,A)A(C,A)], two sequences matching each of these patterns were found between the dnaA and dnaN genes. However, in each case, well over 2,000 hits located throughout the rest of the genome were also seen. Therefore, the specificity of the pattern used to try to detect dnaA box motifs was very low, which decreased our confidence in the significance of the sequences identified. These findings are in contrast to the multiple DnaA boxes found in the intergenic regions surrounding dnaA in other mollicutes, such as Mycoplasma pulmonis, Mycoplasma genitalium, Mycoplasma capricolum, and Spiroplasma citri (6). 
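The motif search described above can be sketched with regular expressions. The three patterns are transcribed from the consensus sequences given in the text; scanning both strands and counting overlapping matches are assumptions made for this illustration, and the function names are hypothetical rather than the tool actually used.

```python
import re

# Consensus motifs from the text, written as regular expressions.
DNAA_BOX_PATTERNS = {
    "strict": "TTATC[CA]A[CA]",
    "relaxed_1": "TT[AT]C[CAT]A[CA]A",
    "relaxed_2": "TTA[TA][TC][CA]A[CA]",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq, pattern):
    """Return sorted (start, strand) hits; starts are 0-based forward coordinates."""
    seq = seq.upper()
    regex = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    hits = [(m.start(), "+") for m in regex.finditer(seq)]
    n = len(seq)
    for m in regex.finditer(reverse_complement(seq)):
        length = len(m.group(1))
        hits.append((n - m.start() - length, "-"))
    return sorted(hits)


print(find_motif("GGTTATCCACGG", DNAA_BOX_PATTERNS["strict"]))
print(find_motif("CCGTGGATAACC", DNAA_BOX_PATTERNS["strict"]))
```

Comparing the number of hits inside the candidate intergenic regions with the number across the whole genome, as done in the text, gives a direct sense of how little specificity an 8-bp degenerate motif has in an A+T-rich genome.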
In addition to the presence of DnaA box motifs, replication origins can also frequently be identified by looking for biases in strand composition through measures such as GC skew and by looking for asymmetries in transcription polarity (36, 43). For M. hyopneumoniae, there are no significant asymmetries that can be readily detected in either GC skew or transcription polarity (Fig. 1). The lack of such bias in M. hyopneumoniae is even more pronounced than that observed for the mollicutes Mycoplasma pneumoniae and Ureaplasma urealyticum, which exhibit some detectable asymmetry, although it is less pronounced than that observed for the four mollicutes indicated above, which also have multiple DnaA box motifs (14, 16). Therefore, the only significant feature of the M. hyopneumoniae genome that provides any possible indication of the location of the origin of replication is the presence of the dnaA gene. Otherwise, there are no features that allow definitive mapping of the origin to the intergenic region upstream of the dnaA gene, as seen in other bacteria.
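GC skew is conventionally computed as (G − C)/(G + C) in windows along one strand, and a change in sign of the skew, or an extremum in its cumulative sum, is commonly used to suggest the origin or terminus of replication. The following sketch shows the calculation; the window size, the step, and the function names are assumptions for illustration and are not the settings used to produce Fig. 1.

```python
from itertools import accumulate


def gc_skew(seq, window=10000, step=None):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C)."""
    step = step or window
    seq = seq.upper()
    values = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0  # avoid division by zero
        values.append((start, skew))
    return values


def cumulative_skew(values):
    """Running sum of the windowed skew values."""
    return list(accumulate(v for _, v in values))


toy = gc_skew("GGGGCCCC", window=4)
print(toy)
print(cumulative_skew(toy))
```

In a genome with strong strand bias the cumulative curve forms a clear V or inverted V; a flat or noisy curve corresponds to the absence of detectable asymmetry reported here.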
Mycoplasmas have been shown to have unusual mechanisms for controlling gene expression and altering the surface structure. Many of these processes occur during DNA replication. For instance, certain classes of genes, most notably lipoprotein genes, are known to contain tandem repeats that undergo slipped strand mispairing, resulting in either phase switching or size variation (41, 63). The Tandem Repeats Finder program (3) was used to identify repetitive domains in the M. hyopneumoniae genome. Each of these domains was then mapped to the chromosomal region to identify the ORF. Unlike other mycoplasma species studied, M. hyopneumoniae had few significant tandem repeat sequences. One exception, the gene for the cilium adhesin P97, had several tandem repeats, one of which is involved in cilium adherence (18, 34). The paralogs of the P102 gene, the second gene in the P97 operon, also contained tandem repeats (Table 2). The gene encoding one potential lipoprotein, mhp288, contained the repeat CTACCCAAAAAGATCAAA in 10.6 copies 310 bp downstream of its start codon (Table 2).
Sequence duplications, palindromic structures, and degenerate repeats have also been shown to play important roles in bacterial gene regulation and virulence (56). To study this in M. hyopneumoniae, the genome sequence was subjected to REPuter analysis (24). Figure 2 shows the results of a repeat analysis with a minimum size of 25 bp. The most frequent repeats are forward and palindromic (reverse complement). These repeats also are the largest repeats found in the genome. There appear to be no large regions of duplication that account for the apparent clustering of the repeats around the 200,000-, 400,000-, and 700,000-bp regions. Many of these repeats represent paralogous families of genes (Table 3).
Because of their limited genome sizes, mycoplasmas must use a higher proportion of the coding capacity for essential functions than other organisms use. Often these essential functions are represented in paralogous families. In a BLASTCLUST analysis in which proteins with >30% amino acid identity over 70% of the length were defined as paralogs, approximately 26.3% of the proteins (182 of 692 ORFs) could be assigned to a paralogous family containing two or more members. This represented 15.3% of the genome sequence. This compares with the 5.5% of M. genitalium, 19.1% of M. pneumoniae, 12.7% of M. pulmonis, 8.1% of U. urealyticum, and 25.4% of Mycoplasma penetrans ORFs that are members of paralogous families (50). Families containing three or more members are described in Table 3. The largest family represents ABC membrane transporters with ATP binding site motifs. Most of the families are composed of unique genes with no known function, but additional families consist of genes encoding GTP binding proteins, site-specific DNA methyl transferases, trsE transfer complex proteins, and the P97 and P102 families.
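Grouping proteins into paralogous families from pairwise similarity hits can be sketched as single-linkage clustering: any two proteins joined by a hit that passes the thresholds end up in the same family, even if they are linked only through intermediates. The code below is an illustrative stand-in, not BLASTCLUST itself. The input format (tuples of two protein identifiers, percent identity, and alignment coverage) and the function name are assumptions; the thresholds follow the >30% identity over 70% of the length criterion stated above.

```python
from collections import defaultdict


def cluster_paralogs(proteins, hits, min_identity=30.0, min_coverage=0.70):
    """Single-linkage clustering of proteins with a union-find structure.

    hits: iterable of (protein_a, protein_b, percent_identity, coverage_fraction).
    Returns families with two or more members, largest first.
    """
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if a != b and identity > min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    families = defaultdict(list)
    for p in proteins:
        families[find(p)].append(p)
    kept = [sorted(f) for f in families.values() if len(f) >= 2]
    return sorted(kept, key=lambda f: (-len(f), f))


# Hypothetical identifiers and hits, for illustration only.
proteins = ["a", "b", "c", "d", "e"]
hits = [
    ("a", "b", 45.0, 0.90),
    ("b", "c", 35.0, 0.80),
    ("d", "e", 29.0, 0.95),  # fails the identity threshold
    ("a", "d", 60.0, 0.50),  # fails the coverage threshold
]
families = cluster_paralogs(proteins, hits)
print(families)
print(round(100 * 182 / 692, 1))  # fraction of ORFs in families, as in the text
```

The last line reproduces the reported figure: 182 of 692 ORFs corresponds to 26.3%.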
M. hyopneumoniae attaches to the swine respiratory tract epithelium in a site-specific manner. The bacterium attaches only to the cilia of the respiratory epithelium (32) as a result of the interaction of a surface protein, P97, with its cognate receptor (66). Genetic analysis of the adhesin revealed a gene coding for a 126-kDa protein (18) in a two-gene operon (20). The binding epitope of P97 was identified by using transposon mutagenesis in a heterologous expression system (19, 34). The second gene in the operon coded for a 102-kDa protein designated P102. The function of this protein is unknown. A BLASTP analysis of the M. hyopneumoniae genome revealed several paralogs of both the P97 and P102 genes in the genome, although previous DNA hybridization studies showed that there was only a single copy of the P97 gene (18, 58). The same studies also showed that there were several copies of the P102 gene. The failure of the hybridization studies to find homologous P97 sequences was due to the low homology of the P97 paralogs at the DNA level, and thus they were unable to hybridize (Table 3). The locations of these paralogs are shown in Fig. 1. Alignments of the protein sequences are shown in Fig. 3. Most of the paralogs represent gene fusions with unrelated sequences. Almost every paralog is found in a two-gene operon, and most of the operons consist of both P97 and P102 paralogs. The function of these paralogs is unknown, but only P97 contains the R1 region required for binding to swine cilia (34).
Lipoproteins, which are one of the major components of mycoplasma membranes, comprise approximately two-thirds of the total membrane mass in mycoplasmas (42). They provide antigenic diversity in the face of host responses and also possibly have a structural function. A motif analysis of the M. hyopneumoniae genome identified 53 ORFs with prokaryotic lipoprotein lipid attachment sites. This represents 8.5% of the genome coding capacity. Only three of these ORFs (psgA, trsE and ushA) have a functional assignment, and two of them, the P46 ORF (13) and the P65 ORF (23), have been studied previously. P65 (mhp677) has two paralogs at the protein level, mhp539 and mhp069. Interestingly, trsE is part of a three-member paralogous family, and only one of the members (mhp532) contains a lipid attachment site. All other putative lipoproteins are hypothetical proteins that have no known function. In addition to the acylation motif, ORF mhp371 has a high-affinity transport motif and ORF mhp531 has a membrane permease motif, so lipoproteins may have many different functions in the cell.
Maturation of a prolipoprotein to a mature lipoprotein is thought to require three separate events in bacteria, transfer of the diacylglycerol moiety to the sulfhydryl group of the N-terminal cysteine by prolipoprotein diacylglyceryl transferase activity, cleavage of the substrate by lipoprotein signal peptidase, and acylation of the N-terminal cysteine by apolipoprotein transacylase. In M. hyopneumoniae there is a prolipoprotein diacylglyceryl transferase (lgt, mhp282) and there is a putative lipoprotein signal peptidase (LspA, signal peptidase [SPase II], mhp032), but there is no apolipoprotein transacylase. The same is true for Mycoplasma gallisepticum (39) and other sequenced mycoplasmas (2). This suggests that the processing of lipoproteins in M. hyopneumoniae and other mycoplasmas is incomplete compared to that in gram-negative and gram-positive bacteria.
M. hyopneumoniae contains an abbreviated membrane protein secretory system. The pathway consists of secA (mhp295), secY (mhp207), secD (mhp139), prsA (mhp675), dnaK (mhp072), trigger factor (mhp233), and lepA (mhp079). Notably absent are secE, secG, and secF. There is an ORF with a signal peptidase I motif (mhp028), and it likely functions as lepB (SPase I), although the levels of homology with other lepB gene sequences are not high (Table 4). There is also a likely candidate for lspA (SPase II, mhp032) for processing of lipoproteins. The signal recognition particle pathway contains Ffh (mhp060) and FtsY (mhp008). A comparison of the secretory pathway components of the previously published mycoplasma genomes is shown in Table 4.
Interestingly, M. hyopneumoniae contains no plsB, plsC, or plsX homologs that are readily identified (Table 4). Almost all other mycoplasma sequences contain plsX and plsC; a single exception is the M. pneumoniae sequence, which contains plsB in place of plsC. Also, M. hyopneumoniae does not contain an acyl carrier protein or acyl carrier protein synthetase. It does contain, however, two copies of an acyl carrier protein phosphodiesterase, mhp453 and mhp454. This suggests that M. hyopneumoniae has virtually no ability to modify phospholipids.
Motif analysis revealed a family of proteins with the phosphotransferase motif. The ORFs included sgaA (mhp386), sgaB (mhp387), sgaT (mhp571), mtlF (mhp567), mtlA (mhp569), nagE (mhp590), and licA (mhp042). There are 34 genes with ABC transporter family signatures (Table 2).
DNA polymerase III of M. hyopneumoniae contains the alpha subunit (polC, mhp549), alpha subunit 2 (dnaE, mhp599), the beta subunit (dnaN, mhp002), the gamma-tau subunit (dnaX, mhp123), and one additional subunit (mhp119) that is conserved but whose function is unknown. Notably absent are the E. coli components θ, δ, χ, and ψ and the 3′-5′ exonuclease dnaQ subunit. An ORF with the 5′-3′ exonuclease domain of the DNA polymerase I gene (mhp598) is also present, but an RNase H gene is not present, so it is not clear how the Okazaki fragments are removed during DNA replication. This is similar to what is found in other mycoplasma species, however (67). A functional DNA gyrase is present and is composed of subunits A (gyrA, mhp545) and B (gyrB, mhp270). Members of the primosome include the dnaG primase (mhp062) and the dnaC replicative DNA helicase (mhp664). The E. coli homologs PriA, PriB, PriC, DnaB, and DnaT are absent.
Genes encoding topoisomerases are present in the genome to generate single-strand breaks to release superhelical tension. The topoisomerases include DNA topoisomerase I (topA, mhp097) and topoisomerase IV composed of subunits A (parC, mhp034) and B (parE, mhp035).
Also like other mycoplasmas (67), M. hyopneumoniae has a primitive DNA repair and stress response system. The elements of the general recombinational repair and SOS repair system in the genome are recA (mhp041), recR (mhp121), ruvA (mhp421), and ruvB (mhp422). Notably absent are mucB, ruvC, recBC, and recD. The nucleotide excision repair system is composed of activities encoded by uvrA (mhp288), uvrB (mhp669), uvrC (mhp070), and lig (DNA ligase, mhp113). No mismatch repair genes (mutHLS, sbcB, vsr, or recJ) are present. The genes that encode the base repair system are ung (uracil-DNA glycosylase, mhp251), fpg (formamidopyrimidine DNA glycosylase, mhp595), and nfo (endonuclease IV, mhp065).
With the exception of the cilium adhesin P97, virulence factors have not been clearly established in M. hyopneumoniae. Thus, it was necessary to predict potential factors based on motif analysis and known virulence factors of other mycoplasma species or other bacteria. Proteases have been linked to virulence in gram-positive pathogens, and M. hyopneumoniae has five proteins with aminopeptidase signatures (map [mhp209], pepA [mhp462], pepF [mhp520], pepP [mhp680], and gcp [mhp656]). There are also two proteins with serine protease signatures (mhp287 and mhp292), and there is a ClpB homolog (mhp278). One or more of these proteases are thought to be involved in the posttranslational processing of the cilium adhesin P97 (8).
Like other mycoplasmas, M. hyopneumoniae has few obvious ways to control gene expression. There is a single sigma factor (rpoD, mhp063). No PROSITE araC, lysR, gntR, luxR, or sigma 54 interaction domains were found in the genome. A search for helix-turn-helix motifs revealed numerous AraC motifs with low scores, and consequently, these proteins have a low probability of having a regulatory function. A search for two-component regulatory system components revealed weak hits to a sensor histidine kinase (mhp310) and a hybrid sensor and regulator (mhp482).
Short homopolymeric tracts of adenines or thymidines are involved in phase switching of lipoproteins in Mycoplasma hyorhinis, Mycoplasma hominis, Mycoplasma fermentans, and Mycoplasma arthritidis (63). Thus, it was of interest to determine the extent of homopolymeric tracts in the M. hyopneumoniae genome and the potential of these tracts for involvement in phase switching. This analysis entailed searching the genome for polyadenine or polythymidine sequences and identification of their locations relative to ORFs and potential promoter sequences. Table 5 shows the homopolymeric tracts consisting of 15 or more residues identified in regions of the genome that might affect gene expression either by interrupting transcription or by altering the translational reading frame.
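A search for homopolymeric tracts of this kind can be sketched with a regular expression. The example below is a simplified alternative to the BLASTN-based search mentioned in the methods, not a reproduction of it. It reports uninterrupted runs of A or T of at least 15 residues on the given strand; the function name and the return format are assumptions, and mapping the tracts relative to ORFs and promoters is left out.

```python
import re


def find_homopolymer_tracts(seq, min_length=15, bases="AT"):
    """Return (start, end, base) for each run of one base >= min_length.

    Coordinates are 0-based and end-exclusive on the supplied strand.
    """
    pattern = re.compile("|".join(f"{b}{{{min_length},}}" for b in bases))
    return [(m.start(), m.end(), m.group()[0]) for m in pattern.finditer(seq.upper())]


# Toy sequence: a 16-residue poly(A), a 14-residue poly(T) that is too short,
# and a 15-residue poly(T) that meets the threshold.
toy = "GC" + "A" * 16 + "GC" + "T" * 14 + "G" + "T" * 15
print(find_homopolymer_tracts(toy))
```

Because poly(A) on one strand is poly(T) on the other, scanning a single strand for both bases covers tracts on either strand.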
Although they are not directly involved in gene regulation, chaperones nevertheless play important roles in protein structure and functional expression of translated products. M. hyopneumoniae and the other nine mycoplasmas whose genomes have been sequenced contain the members of the DnaK-DnaJ-GrpE chaperone complex, which has the role of controlling folding of nascent polypeptide chains as they emerge from the ribosome (information was obtained from the MolliGen website at http://cbi.labri.fr/outils/molligen/) (2). Notably absent in the strain of M. hyopneumoniae used was Hsp60 (GroEL), although this protein has been reported to be present previously (52). This was confirmed by a BLAST analysis of the genome with the reported sequence (accession no. AJ251765) at an expectation value of 1. Also absent was GroES, suggesting that the DnaK-DnaJ-GrpE complex has the sole responsibility for protein folding and rescue of misfolded proteins in the cytoplasm. Other mycoplasmas that lack GroEL and GroES include M. pulmonis, Mycoplasma mobile, M. penetrans, M. mycoides subsp. mycoides SC, and U. urealyticum (Table 6). A Lon protease (mhp541) is present to degrade misfolded or inactivated proteins. How M. hyopneumoniae maintains secreted proteins in a translocation-competent form for export is not known since there is no SecB homolog in the genome. This function may be served by trigger factor (tig, mhp233), a ribosome-associated chaperone that has been found to be essential for secretion of a cysteine protease from Streptococcus pyogenes (27). Trigger factor is missing in M. mycoides subsp. mycoides SC, but it is present in other mycoplasmas. A homolog of clpB (mhp278) was also found in the genome; its role in the stress response of M. hyopneumoniae is not known. clpB was not annotated in the M. pulmonis, M. gallisepticum, and onion yellows phytoplasma genomes (Table 6).
Finally, other potential chaperones include mhp234 (a member of the HSP33 family of disulfide bond chaperones); mhp278 (ATPase with chaperone activity and clpAB signature 2); and mhp554, mhp656, and mhp673 (metal-dependent proteases and putative molecular chaperones).
The M. hyopneumoniae genome is the tenth genome of a mycoplasma species that has been completely sequenced and the only genome of a porcine mycoplasma that has been completely sequenced. Phylogenetically, the closest relative among the sequenced mycoplasmas is M. pulmonis, a rodent pathogen. A direct comparative analysis of the mycoplasmas is hindered by a high degree of sequence divergence and a relative lack of synteny in all but M. pneumoniae and M. genitalium, two closely related species (16).
In summary, this paper presents the primary features of the M. hyopneumoniae genome and describes a preliminary analysis of its genome structure. As such, it should provide the foundation for future experiments aimed at fully understanding the molecular mechanisms that this pathogen employs in its colonization of the swine lung and the development of disease.
This study was supported in part by funds from the Iowa Healthy Livestock Advisory Council and in part by funds from the Australian Research Council, Bioproperties Pty. Ltd., and the Asia Pacific Centre for Animal Health.
We thank Glenn Browning for his support of this project.