The M. pneumoniae genome contains four types of large repetitive elements (RepMP1, RepMP2/3, RepMP4 and RepMP5), which together constitute 8% of the M129 genome. RepMP2/3, RepMP4 and RepMP5 have been studied extensively because they are found within genes that encode major virulence factors of M. pneumoniae (adhesin P1 and cytadherence-related proteins). Sequence divergence within RepMP2/3 and RepMP4 allows all strains worldwide to be classified into two groups. Similar repeats have also been detected within adherence-related genes of M. genitalium and M. gallisepticum. It has been assumed, and demonstrated in the case of M. genitalium, that homologous recombination among the numerous copies of these repeats generates sequence variation among strains.
In this study we focused on the M. pneumoniae-specific RepMP1 sequence element and its role in generating sequence divergence among clinical isolates. RepMP1-proteins belong to the largest M. pneumoniae protein family, which is united by a coiled-coil domain (DUF16) within the distal regions of its members ( and S2). RepMP1-proteins form a subset of this family whose members also share varying degrees of homology within their proximal regions. Sequence similarity among the proximal domains results from the RepMP1-core element, which in most cases is located at the 5′-end of the gene (Figure S1). The sequences of the distal domains (DUF16) are not conserved, and the domains differ in both amino acid residues and length. The important common feature of DUF16 domains is the presence of direct tandem 7-aa repeats that mediate their coiled-coil structure.
Through analysis of 31 M. pneumoniae isolates, including the genome sequence of M. pneumoniae strain 309, we demonstrate a major recombination event associated with three RepMP1-genes (mpn130, mpn137 and mpn138). Recombination produced a hybrid gene (mpn138/7) in which the proximal region (RepMP1-core) of mpn138 and the distal region (DUF16 domain) of mpn137 are joined through a 49-nt remnant of a third gene (mpn130).
Since we detected identical sequence rearrangements involving the mpn130, mpn137 and mpn138 genes in all type 2 strains, we investigated the mpn129-mpn140 chromosomal region for short repeats associated with the RepMP1-core. The analysis revealed several copies of all five sReps. Based on the identified sequence differences and on the positions of the short repeats sRepB, sRepD and sRepE, we propose that two successive events occurred. First, homologous recombination led to the exchange of the chromosomal regions enclosed between the sRepD and sRepE copies. As a result, RepMP1-cores and DUF16 domains were rearranged in these three genes. Second, recombination between the direct repeats sRepB deleted the region enclosed between them. Thus, in place of three genes (mpn130, mpn137 and mpn138, as observed in M129), only one gene was retained (mpn138/7, as described for all type 2 strains).
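The logic of the two proposed steps can be illustrated with a toy sketch. The segment labels used here (coreX, midX, tailX, geneA and so on) are hypothetical placeholders and do not reproduce the actual layout of the mpn129-mpn140 region; the function names are likewise ours. The sketch only shows what a reciprocal exchange between two marker repeats, followed by a deletion between two direct repeats, does to an ordered list of segments.

```python
def enclosed(segments, left, right):
    """Return (start, end) slice bounds of the region strictly between
    the first `left` marker and the first `right` marker that follows it."""
    i = segments.index(left)
    j = segments.index(right, i + 1)
    return i + 1, j


def exchange_enclosed(a, b, left, right):
    """Reciprocal exchange of the regions enclosed by `left` ... `right`
    in two segment lists (a toy model of homologous recombination)."""
    ai, aj = enclosed(a, left, right)
    bi, bj = enclosed(b, left, right)
    new_a = a[:ai] + b[bi:bj] + a[aj:]
    new_b = b[:bi] + a[ai:aj] + b[bj:]
    return new_a, new_b


def delete_between_direct_repeats(segments, repeat):
    """Recombination between the first two direct copies of `repeat`:
    the enclosed region is lost and a single repeat copy remains."""
    i = segments.index(repeat)
    j = segments.index(repeat, i + 1)
    return segments[: i + 1] + segments[j + 1 :]


# Hypothetical genes: a conserved core, marker repeats, and a distal tail.
gene_x = ["coreX", "sRepD", "midX", "sRepE", "tailX"]
gene_y = ["coreY", "sRepD", "midY", "sRepE", "tailY"]

# Step 1: exchange of the regions enclosed between sRepD and sRepE.
new_x, new_y = exchange_enclosed(gene_x, gene_y, "sRepD", "sRepE")

# Step 2: deletion of everything between two direct sRepB copies.
locus = ["sRepB", "geneA", "geneB", "sRepB", "geneC"]
reduced = delete_between_direct_repeats(locus, "sRepB")

print(new_x)
print(new_y)
print(reduced)
```

Representing the chromosome as labelled segments rather than nucleotides keeps the sketch independent of any real sequence; it is meant as a reading aid for the model, not as a reconstruction of the event.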
The presence of numerous RepMP1-core elements within M. pneumoniae genomes prompted us to look for and evaluate the short repeats (sReps) within the genome, as they appear to be involved in intergenic recombination of domains and in deletion mechanisms. BLAST analysis of the M129 genome revealed numerous copies of all five sReps (A–E) and their association with RepMP1-genes and DUF16-genes (whose encoded proteins contain the DUF16 domain but not the RepMP1-core) (Table S2). Short repeats A and B were found exclusively within intergenic regions adjacent to these genes. The three remaining sReps are located within coding regions. While copies of sRepD are found within the conserved domain of several genes (3′-end of the RepMP1-core element), sRepC and sRepE are found within the coiled-coil region of several RepMP1-genes (i.e., mpn094
Analyses and cross-comparison of RepMP1-genes/proteins led us to conclude that RepMP1-core elements and sReps provide a network for intergenic domain exchanges. For example, sRepD- and sRepE-mediated recombination among three genes leads to three novel genes/proteins with different combinations of conserved proximal regions and coiled-coil domains. Such domain exchanges are expected to yield proteins with modified function. Currently, the function(s) of both the conserved and the DUF16 domains, as well as of the majority of RepMP1-proteins, remain unknown. So far, it has been shown that transposon insertions within MPN104 and MPN524 resulted in M. pneumoniae mutants with an altered satellite growth phenotype and altered gliding motility, suggesting that these proteins could play a role in cytoskeletal functions. Recombination-mediated protein domain variation has been reported previously for the Arp protein (an immunoglobulin A receptor in the M protein family) of Streptococcus pyogenes. Repeat-associated plasticity in the Helicobacter pylori RD gene family has been analyzed, and a mechanism leading to the exchange of domains was proposed. In eukaryotes, these translocations often involve transcription factors.
Predicted secondary structure of modified RepMP1-proteins.
Variability in the number of tandem repeats within the DUF16 domain is commonly observed in several RepMP1-proteins. Such changes in repeat number likely result from slipped-strand mispairing events combined with unequal crossovers. In contrast to the type 1 strain M129, one of the 21-nt tandem direct repeats is deleted in all type 2 mpn138/7 and mpn524 genes. Because of this deletion, all Mpn138/7 fusion proteins lack one 7-aa tandem repeat when compared with the putative M129 Mpn137 protein sequence (V160EGRLDS). Likewise, the type 2-specific Mpn524 protein lacks seven amino acids (E132KMDKME). Furthermore, the type 2-specific Mpn127 protein contains an additional fourteen residues when compared with M129 Mpn127 (amino acids RLVSMESRLDSMEN inserted after N206). Similarly, three type 2 strains possess an Mpn501 protein with an additional 7-aa repeat (residues VKMDKME inserted after E187). All these changes are found within the coiled-coil regions of the proteins and likely affect their structure and function. For instance, the coiled-coil region of the fused Mpn138/7 protein is not recognized as a leucine zipper (which is found in M129 Mpn138). Insertion of seven additional residues within the coiled-coil region of Mpn501 might lead to loss of the transmembrane domain (TM).
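All of the insertions and deletions listed above are whole multiples of seven residues, so each adds or removes complete heptads and leaves the heptad phase of the downstream residues unchanged. The following minimal sketch makes that arithmetic explicit. The inserted and deleted peptides are taken from the text; the short `toy` sequence, the positions used with it, and the helper names are hypothetical and stand in for the real proteins, whose full sequences are not given here.

```python
HEPTAD = 7  # residues per coiled-coil repeat unit


def insert_after(seq, pos, insert):
    """Insert `insert` after the 1-based residue position `pos`."""
    return seq[:pos] + insert + seq[pos:]


def delete_from(seq, start, length):
    """Delete `length` residues beginning at 1-based position `start`."""
    return seq[: start - 1] + seq[start - 1 + length :]


def keeps_heptad_register(before, after):
    """True if the length change is a whole number of heptads."""
    return (len(after) - len(before)) % HEPTAD == 0


# Peptides reported in the text as deleted or inserted in type 2 strains.
REPORTED_INDELS = {
    "Mpn138/7 (deletion)": "VEGRLDS",
    "Mpn524 (deletion)": "EKMDKME",
    "Mpn127 (insertion)": "RLVSMESRLDSMEN",
    "Mpn501 (insertion)": "VKMDKME",
}

# Hypothetical toy protein: short flanks around two tandem heptads.
toy = "MAS" + "VKMDKME" * 2 + "LQ"
longer = insert_after(toy, 17, "VKMDKME")   # gain of one tandem unit
shorter = delete_from(toy, 4, HEPTAD)       # loss of one tandem unit

for name, peptide in REPORTED_INDELS.items():
    print(name, len(peptide), len(peptide) % HEPTAD == 0)
print(longer, keeps_heptad_register(toy, longer))
print(shorter, keeps_heptad_register(toy, shorter))
```

Preserving the register does not imply that function is preserved; the sketch says nothing about the predicted loss of the leucine zipper or transmembrane domain, which depends on the residues involved rather than on length alone.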
Summary of RepMP1-gene sequence variability in analyzed M. pneumoniae strains.
Recently, the numbers of tandem repeats within MPN501 and MPN524 were evaluated as part of a multi-locus variable-number tandem-repeat analysis (MLVA) of nearly 340 M. pneumoniae strains originating from Tunisia, Japan, Germany, England, Wales, and other European countries. In the analyzed strains, the number of MPN501 repeats varied from four to six, while the number of MPN524 repeats ranged from six to eight, and the variations were not type specific. The data presented indicated that tandem repeat numbers did not change during strain passage in broth culture and, possibly, in the course of persistent infection. Our analysis of MPN501 and MPN524 revealed comparable numbers of tandem repeats. We observed type-specific differences in the numbers of tandem repeats within MPN524, as well as within MPN138/7 and MPN127.
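Counting tandem repeat copies at a locus, as in an MLVA-style comparison, can be sketched as follows. This is a simplified illustration that assumes exact copies of a known repeat unit; real tandem repeats are often imperfect and are usually sized by PCR fragment length or with dedicated repeat finders. The 21-nt unit and the flanking bases below are invented for the example and are not sequences from MPN501 or MPN524.

```python
def max_tandem_copies(seq, unit):
    """Return the largest number of consecutive exact copies of `unit`
    found anywhere in `seq` (0 if the unit does not occur)."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        run = 0
        pos = start
        # Extend the run while another full copy begins at `pos`.
        while seq.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


# Hypothetical 21-nt repeat unit and two hypothetical alleles.
UNIT = "GATTACAGATTACACCGGTTA"
allele_a = "TTT" + UNIT * 5 + "CCC"
allele_b = "TTT" + UNIT * 4 + "CCC"  # one 21-nt unit fewer

profile = {
    "allele_a": max_tandem_copies(allele_a, UNIT),
    "allele_b": max_tandem_copies(allele_b, UNIT),
}
print(profile)
```

Every start position is tested, rather than skipping past a counted run, so that a longer run beginning inside an earlier partial match is not missed when the unit overlaps itself.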
In conclusion, numerous copies of RepMP1-core elements and associated short repeats are spread throughout the M. pneumoniae genome, creating a network for gene rearrangement through homologous recombination. Still, we identified only a single, identical recombination event involving the same three RepMP1-genes in all type 2 isolates. Notably, despite the presence of this intricate network, our data provide further evidence for the existence of two highly conserved groups of M. pneumoniae strains, as demonstrated in the past. Previous experiments clearly indicate that type-specific combinations of the repetitive elements in the P1 and mpn142 genes are not essential for the successful adherence of M. pneumoniae to host cells or for colonization of the respiratory tract of guinea pigs. Therefore, M. pneumoniae virulence does not seem to be considerably influenced by the strictly defined combination of repetitive elements, and further studies are required to explain the reason(s) behind this lack of sequence divergence.