Reannotation and correction of the Rlow genome. De novo sequencing of the Rhigh genome and comparative analyses ultimately allowed updates to be made to the previously deposited sequence of the progenitor strain Rlow. These comprised changes at 52 loci and included 19 single-nucleotide indels and 21 substitutions, with several leading to incorrect disruption of eight genes (see Table S1 in the supplemental material). The changes corrected misassembly in the clustered regularly interspaced short palindromic repeats (CRISPR) region, a noncoding region characterized by 36-bp exact repeats interspersed by 30-bp unique regions (reviewed in reference 67). Added to the Rlow genome sequence was approximately 2.2 kb of sequence originally excluded from within, and adjacent to, an approximately 1.7-kbp direct repeat within the CRISPR. Also added in the vlhA4 locus region of the Rlow genome was an approximately 14.2-kbp tandem repeat sequence containing six vlhA genes (MGA_1336 to MGA_1342) (Fig. , vlhA4.7.1 to vlhA4.7.6). These changes resulted in a revised genome size of 1,012,800 bp for Rlow. Use of standard bacterial start codons altered translational start positions of 419 (58%) previously annotated coding sequences (CDSs), changing their predicted coding and upstream sequences. Finally, reanalysis altered the predicted genetic content of Rlow, with 40 RNA genes, 682 intact CDS gene regions, and 62 potential or likely pseudogenes (784 total genes) now predicted.
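The repeat-spacer layout described above (36-bp exact repeats interspersed by 30-bp unique regions) can be illustrated with a short sketch. This is not the assembly or annotation procedure used in this study; the function name, the minimum repeat count, and all sequences below are hypothetical and synthetic, chosen only to show how such an array might be located in a sequence.

```python
def find_repeat_spacer_array(seq, repeat_len=36, spacer_len=30, min_repeats=3):
    """Return (start, n_repeats) for the first array of exact repeats of
    repeat_len separated by distinct spacers of spacer_len, or None."""
    period = repeat_len + spacer_len
    for start in range(len(seq) - repeat_len + 1):
        repeat = seq[start:start + repeat_len]
        n = 1
        pos = start + period
        # Count how many times the same repeat recurs at the expected spacing.
        while seq[pos:pos + repeat_len] == repeat:
            n += 1
            pos += period
        spacers = [seq[start + repeat_len + i * period:start + (i + 1) * period]
                   for i in range(n - 1)]
        # Require the intervening regions to be unique, as in a CRISPR array.
        if n >= min_repeats and len(set(spacers)) == len(spacers):
            return start, n
    return None


# Synthetic example (not M. gallisepticum sequence).
REPEAT = "GGATCCTTAGGCAATCGTTAACCGGATTCAGTCCAT"          # 36 bp
SPACERS = [
    "CAGGTTCAAGCTTGGAACCTGATCCGATCA",                   # 30 bp each
    "TTGACCGGAATTCCGGTCAAGCTAGCTAGC",
    "AGCTTGCATGCCTGCAGGTCGACTCTAGAG",
    "CCCGGGTACCGAGCTCGAATTCACTGGCCA",
]
array = REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)
genome = "ACGT" * 50 + array + "ACGT" * 50

hit = find_repeat_spacer_array(genome)
print(hit)
```

Because every repeat is identical, a short-read or shotgun assembler can collapse or misjoin such a region, which is one way a misassembly of this kind could arise.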
FIG. 1. Major genomic differences between F and R strain genomes. (A) Dot plot of F and Rlow genome sequences. vlhA loci are indicated with dotted lines and expanded to demonstrate gene structure at vlhA5 loci (B), vlhA1 loci (C), and vlhA4/3 loci (D).
Genome comparisons of Rhigh and Rlow.
The 156-passage Rhigh genome was 1,012,027 bp in length and highly similar to that of its Rlow parent, exhibiting no large-scale insertions, deletions, or genomic rearrangements. Genomic differences were observed at a total of 64 loci, the majority being small-scale changes, including 55 indels, of which 53 were located in small repetitive tracts (Table ). This highlights the variable nature of repeats in M. gallisepticum and is consistent with differences in tandem repeat elements observed between strains of Mycoplasma hyopneumoniae. Twenty-three indels were located in predicted CDSs, and 32 were in noncoding regions, including 29 located in variable GAA repeat sequences involved in phase-variable expression of vlhA. Only nine substitutions were noted (five transitions and four transversions), six of which were found in ORFs and were nonsynonymous. Thus, of the 29 differences observed in potential coding regions, all introduced amino acid substitutions or indels, translational stops, or frameshifts that might indicate functional protein differences between Rhigh and Rlow (Table ). Though some of these genomic changes have been previously described, including those in the cytadherence-related genes gapA and hlp, mutations in additional genes accumulated over 156 in vitro passages likely contribute to the attenuated Rhigh phenotype.
Nucleotide and coding differences identified between genomes of Rlow and Rhigh
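The GAA-repeat indels counted above can be illustrated with a minimal sketch that measures the longest uninterrupted (GAA)n tract in a sequence and compares it between two strains. The function name and both toy sequences are hypothetical; the repeat counts used here are illustrative and are not values reported in this study.

```python
import re


def longest_gaa_tract(seq):
    """Return the number of GAA units in the longest uninterrupted (GAA)n run."""
    runs = re.findall(r"(?:GAA)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical upstream regions differing only in GAA repeat number.
parent_region = "TTTCA" + "GAA" * 12 + "TTGACA"
passaged_region = "TTTCA" + "GAA" * 10 + "TTGACA"

parent_units = longest_gaa_tract(parent_region)
passaged_units = longest_gaa_tract(passaged_region)

# A change of N repeat units corresponds to an indel of 3 * N bp.
indel_bp = 3 * abs(parent_units - passaged_units)
print(parent_units, passaged_units, indel_bp)
```

Since these tracts lie in noncoding regions, an indel of this kind changes repeat length without altering any predicted coding sequence.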
Among the genes truncated in Rhigh, two are involved with sugar metabolism, the regulation of which is thought to play a major role in the survival and virulence of mycoplasmas in vivo (reviewed in reference 21). The glycerol kinase gene glpK (MGA_0644) and the fructose/mannose-specific IIABC component gene fruA of the phosphoenolpyruvate:fructose phosphotransferase system (PTS) are likely inactive, as both lack significant portions of coding sequence (157 and 179 amino acids, respectively). The fruA lesion (a deletion of 581 bp) is the largest indel observed between Rlow and Rhigh. These genes are predicted to be necessary for transport/utilization of what are believed to be the three primary alternative carbon sources (glycerol, fructose, and mannose) used by M. gallisepticum in glycolysis and subsequent ATP production. Glycerol catabolism has also been implicated in mycoplasmal virulence as it yields very high concentrations of H2O2 per mole of O2 and glycerol consumed (68). In Mycoplasma mycoides biotype small colony, glycerol uptake and utilization are strongly correlated with virulence. It is critical for the production of cytotoxic hydrogen peroxide by bacterial l-alpha-glycerophosphate oxidase (GlpO), which catalyzes the oxidation of glycerol-3-phosphate and release of H2O2. M. gallisepticum encodes a homolog (MGA_0646) of the Mycoplasma pneumoniae GlpO, encoded by a glycerol dehydrogenase-like gene (glpD; MPN051), which was recently shown to mediate H2O2 production, thereby affecting host-cell cytotoxicity (22). Thus, glycerol uptake and conversion to H2O2 are potentially significant in M. gallisepticum virulence. The loss of GlpK function could conceivably result in reduced levels of glycerol-3-phosphate available for GlpO-mediated H2O2 production. Notably, one of the observed amino acid substitutions in Rhigh occurs within the ATP binding region of a putative ABC-type, sn-glycerol-3-phosphate transport system ATP-binding protein (MGA_0677), making a functional change in glycerol-3-phosphate transport also a possibility. PTS components have also been implicated as virulence factors. The plant pathogen Spiroplasma citri requires fructose PTS activity for virulence (13), and PTS functions and carbohydrate metabolism have been linked with in vivo survival in extraintestinal pathogenic E. coli. The fruA PTS component may also have a conceivable role in M. gallisepticum virulence.
Disrupted in Rhigh are other genes with limited functional commonality. Two disrupted genes had highly similar paralogs elsewhere in the genome and encoded a potential ABC-type peptide/nickel transporter permease (MGA_0221) and a member of the variable lipoprotein family, vlhA3.02 (MGA_0379). These genes have coding sequences that are frameshifted relative to orthologs in Rlow, with Rhigh MGA_0221 lacking two transmembrane helices and a potential substrate binding domain and Rhigh vlhA3.02 lacking the C-terminal 218 amino acids present in Rlow. Paralogs of these genes conceivably provide equivalent functions and thus directly complement MGA_0221 and MGA_0379 mutations. Conversely, disruption in these genes may indicate loss of, and/or functional shift away from, very specific transport or attachment functions affecting virulence, with paralogs conceivably complementing functions with different specificity.
Also disrupted in Rhigh was a gene with no obvious function, the conserved hypothetical gene MGA_0173. Identified in MGA_0173 was a tlyC domain which consists of a duf21 (domain of unknown function) transmembrane domain and a CBS domain potentially involved in ligand binding. MGA_0173 was also similar to M. pneumoniae tlyC, a gene proposed to encode a hemolysin but noted by other investigators as lacking a hemolysin domain (GenBank accession NP_109847). Despite the lack of obvious function for this gene, its disruption in Rhigh makes it an obvious target for investigation as a potential virulence determinant.
Genomic comparison of strains Rlow and F. (i) General comparison.
The F strain genome was similar to that of R strain, sharing 747 predicted gene orthologs in largely syntenic genomic regions. The F strain was predicted to contain 781 total genes, including 688 intact CDS genes and 53 potential or likely pseudogenes. Despite sharing similar overall genomic content, however, the F strain demonstrated features indicating that it is clearly distinct from the R strain. Overall, F strain genes were 2.7% divergent at the amino acid level relative to orthologs in strain R, indicating significant genetic distance between these two strains of the species M. gallisepticum. The F strain genome was 977,612 bp in length, approximately 35 kbp shorter than that of R strain, largely due to a unique and reduced vlhA gene complement. Notable large-scale genomic differences included a genomic inversion of approximately 500 kbp between the vlhA3 locus and the vlhA4 locus (Fig. ) and approximately 17 kbp of novel sequence present in the F strain and absent in the R strain (Fig. ). Notable in the F strain genome was an insertion of a tandem duplication of approximately 25.7 kbp (duplicated within positions 305536 to 356916) which contained 17 genes corresponding to R strain MGA_0271 to MGA_0312. Other larger-scale indels in non-vlhA and non-CRISPR regions were limited (18 indels of 100 bp or more), with 14 of these present in regions containing only transposons and/or potential pseudogenes in both strains but with others affecting intact genes (Table ). A total of 126 genes exhibited length differences between F and R strains, but most of these resulted in minor internal or terminal changes in affected proteins (52 genes), amino-terminal changes which likely reflected incorrect start codon prediction (17 genes), or changes in genes predicted to be fragmented in both strains (22 genes). Changes did, however, account for at least 25 F strain genes potentially intact or disrupted relative to strain R orthologs (Tables and ). 
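A dot plot of the kind used to compare the F and R strain genomes can be sketched with exact k-mer matches on both strands. This is a generic illustration, not the software used to generate the figure; the function names, the word size k = 12, and the random test sequences are assumptions made for the example. Matches on the forward strand fall along the main diagonal, whereas an inverted segment produces reverse-strand matches along an antidiagonal.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_dots(a, b, k=12):
    """Return (forward, reverse) lists of (i, j) positions where the k-mer
    at a[i] matches the k-mer at b[j] directly or as its reverse complement."""
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    forward, reverse = [], []
    for j in range(len(b) - k + 1):
        word = b[j:j + k]
        forward.extend((i, j) for i in index.get(word, ()))
        reverse.extend((i, j) for i in index.get(revcomp(word), ()))
    return forward, reverse


# Synthetic genomes: the second carries an inversion of the middle segment.
rng = random.Random(0)
left, middle, right = ("".join(rng.choices("ACGT", k=n)) for n in (100, 200, 100))
reference = left + middle + right
inverted = left + revcomp(middle) + right

forward, reverse = kmer_dots(reference, inverted)
print(len(forward), len(reverse))
```

In this toy case the reverse-strand dots inside the inverted segment satisfy i + j = 2 * len(left) + len(middle) - k, which is the antidiagonal signature of an inversion on a dot plot.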
Differences in coding potential were examined for clues as to the genetic basis of M. gallisepticum virulence. Of particular interest were differences in genes absent, disrupted, or highly divergent in strain F relative to strain R (Fig. and Table ; see also Table S2 in the supplemental material).
Genes fragmented or absent in strain F relative to strain R
Genes intact or present in strain F relative to strain R
(ii) vlhA gene regions.
The F strain genomic regions most variable relative to strain R contained vlhA loci, which displayed divergent and nonsyntenic gene complements suggestive of local and inter-vlhA rearrangements and paralogous gene gain/loss, with only the two-gene vlhA2 locus highly conserved between F and R strains (Fig. ). Indeed, the F strain contained 23 vlhA genes (20 intact), 28 fewer than the 51 present (44 intact) in Rlow. Intact VlhA ORFs demonstrated lower average amino acid identity (88%, ranging from 61 to 99%) to R strain homologs than did non-VlhA homologs (98%). One intact F strain vlhA ORF (MGF_4735; predicted 75-kDa protein) was notably divergent from VlhA ORFs in the R strain (59% amino acid identity to VlhA4.11) yet more similar (73%) to the strain S6 ORF pMGA 1.4 (43). A 75-kDa immunodominant protein specific to the F strain has been previously described (30); however, whether MGF_4735 encodes this protein remains to be shown. The nonsyntenic nature of many interstrain VlhA best-matches and the presence of a large genomic inversion, gene duplication, and indels suggested recombination in and around vlhA loci (Fig. ). This included potential vlhA4 locus rearrangement bounding the genomic inversion, where genes present in the 3′ region of each R strain locus appear to have recombined and switched positions, resulting in the vlhA3/4 locus and the vlhA4/3 locus in the F strain (Fig. ). Additional complexity at these and other vlhA loci makes elucidation of discrete rearrangement events speculative. In addition, the unique 17-kbp sequence adjacent to the vlhA5 locus in the F strain is essentially in the same locus as most vlhA5 locus genes present in the R strain (Fig. ).
vlhA genes encode immunodominant lipoproteins and hemagglutinins that undergo phase-variable expression both in vitro and in vivo, and they are thought to be virulence determinants which facilitate establishment of chronic infection through immune evasion. Overall, the divergence between F and R strain loci was extreme relative to the single vlhA gene disruption observed between Rhigh and Rlow and, regardless of mechanism, was consistent with phase-variable gene locus variation observed between strains of other mycoplasma species (62). Extreme variation in vlhA complement, be it phase-variable expression or interstrain genetic heterogeneity, likely reflects significant disruptive selective pressure exerted on these genes as they encode major immune targets, and it likely results in elicitation of different serological specificities in the host. Whether frameshifted vlhA genes are expressed directly or through recombination with other vlhA genes, as seen in Mycoplasma synoviae, is unknown; however, these, too, could conceivably contribute to antigenic variation in the host. Similarly, a recombinatorial effect on gene order and ultimately phase variation is not known in M. gallisepticum, nor is such a mechanism obvious, given the data here. Notably and despite this variation in vlhA complement, F strain continues to induce immune responses that are generally protective against distinct strains.
Transposons are mobile genetic elements encoding transposases and are capable of random genomic integration, disruption of coding sequences, and/or mediating movement of nontransposon sequence within or between organisms. Such transposon-mediated changes likely occurred between F and R strains as their transposase gene loci demonstrated variability. These included two distinct F strain transposase insertions (MGF_2103 and MGF_2868) into intergenic regions upstream of potential lipoproteins (MGF_2102 and MGF_2118) and a ribosomal protein. A third F strain transposase gene (MGF_4139) was intact, and it essentially replaced a 2,441-bp locus which contained both transposase gene fragments (MGA_1108/9) and a conserved protein gene (MGA_1107) present in strain R. MGF_4139 was similar to MGA_0910 transposase, which at a different locus disrupts the MGA_0908/0911 hypothetical transmembrane protein (multigene family) gene in strain R but is absent in strain F, leaving an intact MGA_0908/0911 ortholog (MGF_1196) (Table ). Although a direct transposition between these two loci is possible, similarity to other M. gallisepticum transposases and remnants of MGF_4139 in R strain sequence leave this unclear. Overall, strain F contains 14 transposase genes (3 intact) relative to the 16 genes (2 intact) present in strain R. In addition, transposons may affect indel events in adjacent genes, as six genes with coding potential affected by larger indels in F strain relative to the R strain were adjacent to transposon loci.
(iv) MGA_1107 and virulence assessment.
Transposon-mediated genomic changes ultimately may act to alter virulence, host range, or tissue tropism. Notably, this may be the case for the R strain MGA_1107 gene, which again is adjacent to transposase genes and is absent in the F strain. MGA_1107 contains a domain shared among proteins involved in DNA metabolism, including proteins similar to the putative nuclease RmuC, thought to affect DNA recombination of short inverted repeat sequences in E. coli. MGA_1107 also shares a relatively high level of amino acid identity (92%) with M. synoviae MS53_0172, indicating that the MGA_1107 gene, and perhaps flanking transposon sequences, have involved or mediated a horizontal gene transfer (HGT) event, consistent with previous observation of a likely vlhA HGT between M. synoviae and M. gallisepticum. MGA_1107 genes were identical between Rlow and Rhigh (Table ). Based on these genomic data and on previous data indicating that MGA_1107 is transcriptionally upregulated in Rlow
upon exposure to cells to which M. gallisepticum adheres, an isogenic mutant of MGA_1107 was assessed in a chicken challenge system for virulence in vivo. Though lung lesions and minor airsacculitis were induced by the MGA_1107 mutant, lung and air sac lesions have been previously reported to be significantly variable in this experimental system and thus are precluded from being used for quantitative purposes (54). Tracheal mucosal thickness was reduced compared to that in the wild type, and histopathological lesion scores were similar to those of negative-control birds (P < 0.05, analysis of variance [ANOVA] on ranks and posthoc pairwise comparison) (Fig. ). This mutant was recovered from lung and air sac tissues (albeit to a lesser degree than Rlow) but was unrecoverable from the trachea. These data indicate that MGA_1107 contributes in a yet uncharacterized manner to the generation of tracheal lesions typical of virulent M. gallisepticum infection in vivo and that the loss of this ORF may be a factor in the attenuated phenotype of the F strain. This experimental evidence illustrates the utility of the approach of genomic comparison of virulent and attenuated strains in identifying genetic factors that influence survival in the host or the production of lesions.
FIG. 2. Attenuation of MGA_1107 mutant of R strain in chickens. (A) Histopathological lesion scores in tracheas of chickens infected with Hayflick's medium (Medium), mutant (MGA_1107) organism, and virulent organism (Rlow). Horizontal bars indicate 25th percentile
(v) Subtilisin-like proteases.
Indels occurred within subtilisin-like protease genes, of which there are five paralogs (three intact) encoded in the R strain. F strain lacks 1,693 bp encoding the majority of the MGA_0798 subtilisin-like gene intact in the R strain. A similarly sized deletion (1,686 bp) removed the majority of the nearby MGA_0801 subtilisin-like locus; however, MGA_0801 was a likely pseudogene and thus was predicted to be nonfunctional in both strains. Orthologous, but fragmented, MGF_5102F and MGA_0517/8 subtilisin-like loci also demonstrated coding variation, with two frameshifts restoring all but the likely N terminus to MGF_5102F. Subtilisins are a ubiquitous family of proteases with a range of functions, including roles in bacterial virulence. All five genes in the R strain belong to the D-H-S subgroup of subtilases encoded in other pathogenic bacteria, including Bacillus anthracis (reviewed in reference 63). Other essential subtilases conferring microbial virulence include dentilisin and SufA. Dentilisin is used by the oral spirochete Treponema denticola to degrade host chemokines, cytokines, and fibrinogen (4) and to rearrange the bacterial outer sheath (26). SufA, encoded by the Gram-positive bacterium Finegoldia magna (an opportunistic pathogen of humans), inactivates antimicrobial peptides and chemokines and is believed to aid in bacterial survival in the host (29).
(vi) HAD-like proteins.
Also affected by indels were genes encoding potential hydrolases of the haloacid dehalogenase (HAD) superfamily, of which strain R contains five paralogs. MGF_4199 lacked the PTS-like N-terminal domain of the R strain ortholog (MGA_1083), a fusion reflecting deletion of additional PTS lichenan-specific IIA component gene sequences (present as fragments in F strain MGF_4207f) from strain R. Two HAD hydrolase genes were duplicated in the 24-kbp tandem repeat, yielding seven HAD hydrolase loci in F strain. Though functions of mycoplasmal subtilases and HAD hydrolases are unknown, the presence of, and variability among, multiple copies in M. gallisepticum suggest a role in host interaction and pathogenesis.
(vii) hsd genes.
Disrupted or variable in the F strain relative to the R strain were host specificity of DNA (hsd) genes adjacent to a fragmented transposase (MGF_5343f). These encode protein subunits of a type I restriction-modification system (R-M) complex which mediates methylation (modification subunit, hsdM), sequence-specific recognition of methylation state (specificity, hsdS), and restriction enzyme activity (hsdR). The F strain hsdR gene (MGF_5319f) is prematurely terminated at nucleotide position 2,307 (of 3,198), possibly resulting in a loss of restriction enzyme function. HsdS often contains N- and C-terminal domains of similar structures, but each has discrete sequence specificities (or target recognition domains). Both the R and F strains, however, encode two separate single-domain HsdS units, akin to a single N-terminal domain and similar to the single-domain ORFs present in M. pneumoniae and other bacteria. This is consistent with data indicating that single-domain dimerization confers proper HsdS function in E. coli. Notably, while the first copy of HsdS is identical between R and F strains (MGF_5309/MGA_0539), the central domain of the second copy (MGF_5313/MGA_0540) is highly divergent, with little recognizable nucleotide similarity.
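A premature termination such as that described for hsdR can be located by scanning a coding sequence for its first in-frame stop codon. The sketch below is an illustration under stated assumptions, not the annotation pipeline used in this study: it assumes the mycoplasma genetic code (NCBI translation table 4), in which TGA encodes tryptophan and only TAA and TAG terminate translation, and it uses a hypothetical toy sequence rather than the MGF_5319f sequence.

```python
# Stop codons under translation table 4 (TGA encodes Trp in mycoplasmas).
STOP_CODONS_TABLE4 = frozenset({"TAA", "TAG"})


def first_inframe_stop(cds, stops=STOP_CODONS_TABLE4):
    """Return the 1-based position of the last base of the first in-frame
    stop codon in cds (read from its first base), or None if none occurs."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in stops:
            return i + 3
    return None


# Toy CDS: an internal TGA (Trp under table 4) followed by a premature TAA.
toy_cds = "ATG" + "TGA" + "GCT" * 3 + "TAA" + "GCT" * 2 + "TAG"

stop_table4 = first_inframe_stop(toy_cds)
stop_standard = first_inframe_stop(toy_cds, stops={"TAA", "TAG", "TGA"})
print(stop_table4, stop_standard, len(toy_cds))

# Arithmetic on the positions quoted in the text: roughly 72% of the
# full-length coding sequence precedes the premature termination.
fraction_retained = 2307 / 3198
```

The contrast between the two calls shows why the choice of translation table matters here: treating TGA as a stop, as in the standard bacterial code, would report spurious truncations in mycoplasma genes.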
Hsd systems primarily protect bacteria from large fragments of foreign DNA, such as those encountered during bacteriophage infection, which may interfere with normal cellular processes (reviewed in reference 47). Mycoplasma pulmonis encodes a unique hsd system that undergoes phase-variable gene expression and generates hsdS sequence variation (and likely target sequence specificity) through sequence-specific recombination between two distinct hsd loci. In addition, M. pulmonis hsd expression has been associated with bacterial tissue tropism as expression becomes active in the lower, but not upper, respiratory tissues of rodent hosts in vivo. While the extreme sequence divergence observed here between orthologous MGF_5313 and MGA_0540 genes could conceivably be generated through a recombination process, lack of both a second hsd locus and obvious inverted repeats bounding the divergent hsdS domain makes this unlikely. Though a role for the hsd system in M. gallisepticum host range and/or virulence is speculative, hsd genes do appear to be under different selective pressures in the R and F strains.
(viii) Transport proteins.
Though none appeared to involve glucose metabolism, multiple solute transporter-like genes were variable in F strain relative to R strain. These included two genes fragmented and two genes intact in F strain relative to R strain. Fragmented in F strain were MGF_0748f, a protein with weak similarity to ABC-transport proteins, and MGF_0026f, a prematurely terminated ortholog of the intact MGA_0626 mdlB-like gene in R strain. MGF_0026f contained a stop site located between the ABC transporter transmembrane domain and the ATP binding domain, likely affecting this protein, which is similar to proteins involved in multidrug efflux and transport of lipids and proteins. Conversely, intact in F strain relative to R strain were MGF_0017, a second mdlB-like gene adjacent to the fragmented MGF_0026f paralog, and MGF_3370, an intact ATPase component of an ABC transport protein of unknown specificity. The MGF_3378f locus contained a fragmented ortholog of the MGA_1283 mtlA-like gene in R strain; however, MGA_1283 itself is similar only to the C-terminal EIIB domain of MtlA. MtlA is a PTS transporter for mannitol and, although all genes required for mannitol utilization are present in the members of the pneumoniae clade, this system does not appear to work in M. pneumoniae, and mannitol transport has been reported to be absent in M. gallisepticum strain NCTC 10115 (68). The variability between F and R strains in multiple solute transport proteins indicates that, again, they may affect growth and survival in different hosts or host tissues, likely affecting virulence.
(ix) Other disrupted strain F genes.
Other genes disrupted in F strain relative to R strain have metabolic functions or are of unknown function (Table ). MGF_3677f is a highly fragmented arcA gene encoding arginine deaminase and is intact in R strain (MGA_1220). MGF_4156 encodes a protein orthologous to the MGA_1100 asparaginyl-tRNA synthetase in strain R, including the AspRS/AsnRS core domain; however, it lacks an N-terminal domain conserved in other species. While these changes potentially affect F strain metabolism, paralogs (MGF_2849 and MGF_4297) conceivably provide compensatory functions. Four conserved and six unique hypothetical proteins are fragmented in F strain relative to R strain. One genomic region containing four small hypothetical proteins (MGF_5453 through MGF_5463) is highly variable between F and R strains, suggesting that, although no intact homolog has been observed, these may represent fragments of a novel gene.
(x) Highly divergent genes.
Many genes demonstrated above-average amino acid divergence between F and R strains (see Table S2 in the supplemental material). While 173 intact F ORFs were identical to R strain homologs at the protein level, the rest differed on average by 2.4%. Among the most divergent protein homologs were those similar to known or putative cytadhesins and cytadherence-related proteins (Table S2). Indeed, of the 37 intact ORFs differing between strains by 5% or more, nine (24%) are putatively involved in cytadherence or tip structure formation, including GapA, CrmA, CrmB, MGC2, and several ORFs with similarity to HMW cytadhesin-related proteins. Also divergent is PvpA, a protein that is localized to the terminal tip structure, undergoes phase variation under antibody pressure both in vitro and in vivo, and is potentially involved in antigenic variation and immune evasion in the host (35). Genomic sequences presented here confirm the loss of about 230 nucleotides previously reported in the direct repeat 1 (DR1) and DR2 regions in the F strain relative to the R strain (39).
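The divergence figures above can be made concrete with a small sketch that computes percent amino acid divergence from a pairwise alignment and applies a 5% cutoff. The exact definition used in this study is not stated, so the treatment of gaps here (a gap opposite a residue counts as a difference) is an assumption, and the function name and toy alignment are hypothetical.

```python
def percent_divergence(aln_a, aln_b):
    """Percent of aligned columns at which two pre-aligned sequences differ.

    Columns where both sequences carry a gap are ignored; a gap opposite a
    residue is counted as a difference (an assumption of this sketch).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    differences = sum(1 for x, y in columns if x != y)
    return 100.0 * differences / len(columns)


# Toy alignment: one substitution and one single-residue indel in 10 columns.
divergence = percent_divergence("MKTLLVAG-S", "MKSLLVAGAS")
is_highly_divergent = divergence >= 5.0
print(divergence, is_highly_divergent)

# The proportion quoted in the text: nine of 37 divergent ORFs.
cytadherence_share = round(100 * 9 / 37)
```

With this definition the toy pair differs at two of ten columns, and the nine-of-37 proportion rounds to the 24% given in the text.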
(xi) Genes present or functionally intact in strain F.
In addition to genes absent or disrupted, the F strain genome contains several genes that are absent in the R strain genome or that are intact relative to homologs in strain R (Table ). The most striking of these included the 11 genes present in the approximately 17-kbp sequence adjacent to the F strain vlhA5 locus (Fig. ). This “17-kbp locus” includes genes involved in acquisition, transport, and metabolism of maltose/maltodextrin and other sugars (Table ). The ORFs at the 17-kbp locus shared homology with syntenically conserved or semiconserved loci in other mycoplasmas, in particular, members of the hominis group (72). This includes similarity to Mycoplasma fermentans, in which the entire locus was conserved in content and was highly similar at the amino acid level (27% to 71%), and to M. synoviae, another poultry pathogen which contains three genes of the locus which also are highly similar to those in the F strain (59% to 62% amino acid identity). Notably, homologs of genes in this locus were not obvious in other species of the pneumoniae group to which M. gallisepticum belongs. Thus, the 17-kbp locus may have been the product of HGT from a species from the hominis group, conceivably a common ancestor of M. fermentans and M. synoviae. In addition, two genes bounding the 17-kbp locus in the F strain (MGF_3470 and MGF_3567) are present as remnants in the R strain (MGA_1349 and MGA_1265, respectively), indicating that the locus was lost from the R strain subsequent to its introduction to M. gallisepticum.
Other genes in the F strain represent genes intact relative to fragments present in strain R (Table ). MGF_5077 encodes a conserved hypothetical protein fragmented in the R strain (MGA_0485 and MGA_0487) and is located at the 5′ end of the FoF1 ATP synthase operon, where it conceivably plays a part in energy production. The MGF_4515 unique hypothetical protein contains an additional C-terminal 35 amino acids relative to the R strain MGA_1014, which contains a frameshift. Similarly, the MGF_0872 conserved hypothetical lipoprotein gene contains an additional C-terminal 141 amino acids relative to MGA_0829, potentially affecting a surface-exposed domain and the ability to interact with the host. How these and other intact genes affect the phenotype of strain F relative to strain R is unknown, but they conceivably mediate virulence or host range functions.
Notably, sequence analyses using Rhigh and F strains enabled identification of differences in the genome sequence of Rlow that were likely specific to the particular clone selected for sequencing (Rlow clone 2) (52). These included the 37-bp frameshifting insertion noted in pvpA and mutations in the MGA_0223/4 ABC-transporter permease, MGA_0250 unique hypothetical protein, and MGA_1117/9 cytadherence-related molecule B (crmB)-like protein genes that disrupted coding potential in Rlow clone 2. While the effect of these changes on the phenotype of Rlow clone 2 relative to the Rlow wild-type population is unknown, the comparative genomics approach proved a powerful means to discern them.
CGH of vaccine strains.
Comparative genomic hybridizations of the three commercially available vaccine strains (F, ts-11, and 6/85) were performed, and the fold difference relative to results obtained with Rlow were determined. In an attempt to identify gene divergence/loss associated with M. gallisepticum attenuation, focus was given to features absent in all vaccine strains relative to Rlow. Only seven non-vlhA and non-CRISPR region probes hybridized 4-fold or less in all the vaccine strains compared to Rlow, and these genomic lesions were further probed with PCR and sequencing (Table ).
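The probe selection described above can be sketched as a simple filter over per-probe hybridization signals. This is an illustrative reading of the criterion, assuming that "4-fold or less" means a signal at least 4-fold lower than that of Rlow in every vaccine strain; the function name, probe names, and intensity values are all hypothetical and are not data from this study.

```python
def probes_reduced_in_all(signals, reference="Rlow",
                          strains=("F", "ts-11", "6/85"), fold=4.0):
    """Return sorted probe names whose signal is at least `fold` lower than
    the reference signal in every one of the listed strains."""
    hits = []
    for probe, values in signals.items():
        reference_signal = values[reference]
        if all(values[strain] * fold <= reference_signal for strain in strains):
            hits.append(probe)
    return sorted(hits)


# Hypothetical normalized intensities for three probes.
signals = {
    "probe_A": {"Rlow": 8000, "F": 1500, "ts-11": 1900, "6/85": 2000},
    "probe_B": {"Rlow": 8000, "F": 1000, "ts-11": 7000, "6/85": 900},
    "probe_C": {"Rlow": 5000, "F": 5200, "ts-11": 4800, "6/85": 5100},
}

shared_losses = probes_reduced_in_all(signals)
print(shared_losses)
```

Only a probe that falls below the threshold in all three strains is retained, which is why a feature reduced in just one or two strains (probe_B here) is excluded. A low signal can reflect sequence divergence under the probe as well as true absence, so hits of this kind still need confirmation by PCR and sequencing.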
Of the seven gene features absent in all vaccine strains, five are located in likely gene fragments and verified subtilisin and transposase gene loci affected by indels in F strain genome analysis (Table ). Although divergence or loss of sequence within pseudogenes might be expected, these data verified their absence in vaccine strains. Sequences for a GTPase similar to the tRNA modification protein MnmE (MGA_0604) and a putative Holliday junction resolvase (MGA_0836) gene were confirmed to be present in vaccine strains by sequencing, with SNPs within sequence spanning the 50-mer probe responsible for the lack of hybridization signal. Whether divergence in these enzymes might contribute to an attenuated phenotype remains to be proven. In addition to genes that are missing in all three vaccine strains, multiple features were observed to be divergent or absent in only one or two strains (see Table S3 in the supplemental material). Strain-specific phenotypes may be associated with these mutations, supporting the conclusion that M. gallisepticum virulence is complex and multigenic.
In this study, we used comparative genomic analyses of virulent and attenuated M. gallisepticum strains to identify determinants involved in pathogenesis and survival in the host. Genomes of the attenuated high-passage derivative, Rhigh, and heterologous vaccine strain F were sequenced and compared to the known genome sequence of the virulent, low-passage strain Rlow, revealing mutations in numerous genes and indicating a range of protein functions potentially involved in virulence. While these included suspected or known cytadherence-related functions, which are of primary importance for M. gallisepticum virulence, other novel virulence determinants were indicated. Relative to other genomic changes, those associated with the vlhA major variable lipoprotein genes were highly represented (as promoter region variability in strain Rhigh and as a highly divergent gene complement in strain F), supporting the notion that vlhA gene expression and VlhA phenotypic diversity are important for persistence of M. gallisepticum in the host. Notably, changes in sugar metabolism and solute transport functions were apparent in both attenuated strains, indicating that metabolic substrate utilization may be a significant mechanism by which strains exhibit phenotypic differences in the host. While genes involving metabolism, proteolysis, and restriction-modification were predicted to be compromised in attenuated strains, other potential virulence determinants included genes with no or nonspecific functional prediction and thus of interest for further characterization. We proceeded to characterize one such gene by demonstrating reduced tracheal lesions in chickens infected with an isogenic mutant of the MGA_1107 gene. CGH analysis identified few common genes missing or divergent in vaccine strains F, ts-11, and 6/85, indicating that no single gene was likely responsible for their attenuation. This supports the notion that M. gallisepticum pathogenesis is complex, multifaceted, and multigenic, consistent with these and previous results indicating that independent genes may be essential for virulence. The comparative genomic analysis presented here points to additional factors potentially critical for colonization and virulence in the host, and it provides the framework essential for the rational design of future vaccines.