Our understanding of the rhs gene family has been hampered by very incomplete knowledge of global sequence diversity, due in part to the characterisation of rhs in E. coli K12, which, as should now be clear, has a relatively meagre and unrepresentative rhs repertoire. In fact, rhs genes are much more diverse than previously appreciated and comprise six structurally and phylogenetically distinct lineages within the Enterobacteriaceae. The taxonomic distribution of these distinct rhs types is punctate, reflecting frequent and independent gene gains and losses in different genera, as well as occasional lateral gene transfer (LGT). When we look closer within particular species, inter-strain variation exposes some of the mechanisms responsible. In S. enterica, rhs are restricted to mobile elements and are limited to a single functional copy through differential deletion. In E. coli, there has been a major expansion in Clade I rhs through transposition to novel loci, and loss of other clades relative to other Escherichia spp. Comparison of C-terminal tips and dissociated fragments shows that while C-termini vary greatly within a locus, each distinct sequence is conserved in related strains, and indeed in other species and genera. Hence, we conclude that C-termini are structurally conserved rather than hypervariable; that is, C-terminal variability is facilitated by dynamic substitution from a theoretically large pool of structurally diverse (but evolutionarily old) C-terminal sequences rather than by rapid divergence of static sequences under selection. These auxiliary C-termini are not all resident on the same chromosome and so must exist episomally and, since related strains can often have the same arrangement of dissociated rhs fragments, displacement of one C-terminus by another proceeds relatively slowly.
The structural features of rhs loci, namely the combination of conserved and variable domains coinciding with distinct GC signatures and the tandem repetition of gene fragments downstream of an intact gene copy, have analogues in other organisms. In Neisseria meningitidis, the mafB genes occupy three distinct loci and each is arranged in this way: a full-length gene copy is followed by C-terminal fragments of variable length and low GC content, but flanked by conserved domains [31]. Comparative genomics indicates that a given C-terminal tip can be associated with a complete maf gene in some strains and found unattached downstream in others, mirroring the evidence presented here for rhs. There is no evidence that rhs and maf are homologous, and so their structural analogy may reflect a convergence enforced by common mechanistic constraints.
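The contrast in GC signature between conserved core domains and low-GC C-terminal fragments can be illustrated with a simple windowed GC calculation. This is a minimal sketch rather than the method used in this study: the function names, the window and step sizes, and the toy sequence are all hypothetical, chosen only to show how a low-GC tip would stand out against a GC-rich core.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (start, gc_fraction) pairs for successive windows along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy data (an assumption, not real rhs or maf sequence):
# a GC-rich "core" followed by an AT-rich "tip".
toy_core = "GCGCGGCCGC" * 2
toy_tip = "ATTATAATGA" * 2
profile = windowed_gc(toy_core + toy_tip, window=10, step=10)

for start, gc in profile:
    print(start, gc)
```

In real data the windows would be far longer and the transition noisier; the sketch only shows the shape of the comparison.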
The structure of rhs genes and their downstream silent tips has superficial similarities with gene variation mechanisms in several unrelated bacterial pathogens, including the pilus antigenic variation system in Neisseria gonorrhoeae, the haemagglutinin in Mycoplasma synoviae, and antigenic variation in Borrelia. In these systems, variation is introduced into the expressed gene by recombination between it and one of a number of silent copies of that gene located either downstream or elsewhere in the genome. All gene copies are composed of constant and variable regions, where the latter show little conservation but are flanked by conserved sequences that facilitate recombination. As stated above, rhs genes are not hypervariable and are not truly analogous to systems of antigenic variation, but recombination may be similarly employed to introduce structural variability. Our observation of C-terminal displacement is also superficially reminiscent of the 'terminal reassortment' process that might create novel type III secretion systems through the generation of sequence mosaics [40]. However, this is presented as a largely unregulated process of highly promiscuous recombination between unrelated genes, while our observations suggest a process tightly regulated by structural homology and, consequently, with much less scope for introducing novelty.
In proposing a mechanism for how within-genome rhs sequence variability is generated, we need to explain the principal observation that rhs fragments consisting of core domain sequence and variable C-termini are found dissociated downstream of intact genes, often in long strings. Our essential contention is that this reflects previous displacement events by 'incoming' tips, but we also need to explain why the core fragments are of variable length and why tips at a given locus only ever contain core sequences of the same clade (i.e., why there are inherent phylogenetic limits on what can insert). We propose a model that requires two independent recombination events, which is described in Figure 6. First, there is homologous recombination between the conserved core sequence within a dissociated tip and either the core region of the rhs gene or that of another unattached tip. This would result in the production of a small episomal circle carrying a conserved core region that is attached to a variable C-terminal tip and any intervening genes. The length of the core fragment would depend on where the recombination breakpoint occurred. Second, after transfer to another bacterial isolate, homologous recombination would be required between the chromosomal rhs core and identical sequences located on the episomal circle carrying the unattached tip. The phylogenetic limits of displacement would indicate that very high identity is required between core domain sequences to permit recombination. This second cross-over event involving a circular intermediate is required to explain how an attached tip can be displaced by a new tip without deleting or entirely replacing the old tip, but simply shunting it to a silent position downstream of the intact rhs gene.
Figure 6. A hypothetical model of C-terminal tip displacement. Homologous recombination between the conserved core sequence within the downstream unattached tip and either the core region of the rhs gene or that of another unattached tip. This event would result ...
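The second step of the proposed model (integration of an incoming episomal circle at a chromosomal rhs locus) can be expressed as a toy data-structure sketch. It is an illustration under stated assumptions, not an implementation of any analysis in this study: the names `Fragment`, `Locus` and `displace` are hypothetical, clade identity stands in for the "very high identity" between core sequences that the model requires, and the first step (excision of the circle) is not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Fragment:
    """A core-domain fragment joined to a variable C-terminal tip."""
    clade: str  # clade of the core sequence (proxy for core identity)
    tip: str    # label for the C-terminal tip


@dataclass
class Locus:
    """An intact rhs gene plus its string of downstream dissociated fragments."""
    core_clade: str
    attached_tip: str
    downstream: List[Fragment] = field(default_factory=list)


def displace(locus, circle):
    """Integrate an episomal circle at the locus, if the cores match.

    Assumption: recombination succeeds only when the clades are identical.
    The old tip is not deleted; it is shunted to the first downstream position.
    """
    if circle.clade != locus.core_clade:
        return False
    locus.downstream.insert(0, Fragment(locus.core_clade, locus.attached_tip))
    locus.attached_tip = circle.tip
    return True


locus = Locus(core_clade="I", attached_tip="tipA")
accepted = displace(locus, Fragment("I", "tipB"))    # same clade: integrates
rejected = displace(locus, Fragment("III", "tipC"))  # different clade: no change
displace(locus, Fragment("I", "tipD"))               # second displacement event
print(locus.attached_tip, [f.tip for f in locus.downstream])
```

Successive displacements accumulate former tips downstream in order of recency, which is one way to picture the long strings of dissociated fragments, all of the same clade, described above.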
All circumstantial evidence supports existing tips being repeatedly replaced by non-homologous alternatives, yet there is little evidence that these new tips originate from other locations in the same genome. This means that new sequences must either be acquired from other bacteria with related rhs elements or be carried into the genome as cargo on other mobile genetic elements. The former would require that the episomal circle carrying the alternative tip is sufficiently stable to be transferred between bacteria. This could occur by generalised transduction or natural transformation. Precedents for such a stable circular intermediate come both from integron-mediated gene exchange [41], which requires an integrase that is not known to co-occur with rhs, and from the pilus antigenic variation system in Neisseria, in which recombination between pilE and the silent pilS genes does not involve a site-specific integrase but instead utilises the host RecA machinery [42]. Episomal circles carrying pilin tips are sufficiently stable to be used to naturally transform Neisseria. With regard to the introduction of new tips by LGT, we have provided evidence that rhs genes and alternative tips are frequently carried on genomic islands, such as SPI-6 and ROD9 in Salmonella. However, these alternative tips would still need to form stable closed circles to be able to insert at related rhs loci.
This study has shown that the rhs gene family is ancient and a core component of enterobacterial genomes, and the structural diversity of rhs genes suggests that they have multiple roles. While C-terminal displacement engenders structural flexibility, which is itself ancient, rhs do not show any hypervariability that might indicate interaction with host immune systems; indeed, we have shown that pathogenic E. coli generally lack rhs present in commensal strains. While the biological function of Rhs proteins is still rather unclear, the presence of the repeated motif that is found in other surface proteins, such as the wall-associated protein WapA from Bacillus subtilis and the teneurin family of proteins [23], suggests a cell surface-associated binding function.
A small number of recent studies have provided potential clues to the functions of Rhs proteins in E. coli, although none offers conclusive evidence for function. The most recent and detailed study of RhsA formed part of an analysis of the mechanism of secretion of group 2 capsular antigens in E. coli. RhsA was identified as a likely component of a large hetero-oligomeric capsule biosynthesis/export complex, based on crosslinking to the KpsD protein in vivo. In an rhsA mutant, levels of the group 2 capsule were reduced in E. coli and the KpsD and KpsE proteins were no longer localised at the poles of the cells, suggesting a direct function of RhsA in the assembly and/or functioning of the capsular export pathway. Another study presented evidence that an rhsA mutant of E. coli O26:H- has a significant colonisation defect in calves compared to the wild-type strains, suggesting an important role for RhsA in vivo. Finally, a study examining the response of E. coli to the biocide polyhexamethylene biguanide (PHMB) reported increased expression of a number of rhs genes after exposure to PHMB [48]. Despite these rather disparate data, it is clear that Rhs proteins function at the cell surface or cell envelope, and their molecular function may well include a role in carbohydrate binding of some form. However, the role of the alternative tip structures and the functional consequences of tip replacement are yet to be elucidated.