TX16 was the first
E. faecium strain sequenced and has been used in various studies since [
26,
28,
63,
64]. The TX16 genome is characterized by numerous hyper variant loci and a large number of IS elements and transposons. Ortholog analysis as well as core and pan-genome analysis of TX16 and the other 21 sequenced strains revealed that
E. faecium genomes are highly heterogeneous in gene content and possess a large number of dispensable genes. Similar to the findings by van Schaik et al. [
32], pan and core genome analyses predict the pan genome to be open. Phylogenetic analysis using single-copy orthologs of the same length and gene content dissimilarity analysis, in addition to recent studies [
33,
57] looking at core genes, SNPs and 16S rRNA, all indicate a large divergence between CA-clade isolates and HA-clade isolates. Furthermore, our previous analysis [
33,
57] and analyses within this study show that CC17 genogroup isolates cluster more closely together and further away from the CA-clade isolates than the other non-CC17 HA-clade isolates, indicating the CC17 genogroup is a more recently evolved genogroup.
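The pan/core genome and gene content dissimilarity ideas discussed above can be illustrated with a minimal sketch. It assumes each strain is represented as a set of ortholog identifiers and uses Jaccard distance as one common dissimilarity measure; the strain names, gene IDs, and function names are hypothetical, and this is not necessarily the procedure used in this study.

```python
from itertools import combinations


def pan_and_core(gene_sets):
    """Return (pan, core) gene sets for a dict of strain -> set of ortholog IDs."""
    sets = list(gene_sets.values())
    pan = set().union(*sets)          # genes present in at least one strain
    core = set.intersection(*sets)    # genes present in every strain
    return pan, core


def jaccard_dissimilarity(a, b):
    """Gene content dissimilarity: 1 - |shared genes| / |all genes in either strain|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Hypothetical toy data (not taken from the genomes analyzed here).
toy = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g6", "g7"},
}

pan, core = pan_and_core(toy)
pairwise = {
    (s, t): jaccard_dissimilarity(toy[s], toy[t])
    for s, t in combinations(sorted(toy), 2)
}
```

In such a representation, a pan genome whose size keeps growing as strains are added while the core shrinks toward a plateau is what is usually meant by an "open" pan genome.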
Genomic island analysis by codon usage bias and composition variation showed that TX16 has 9 GIs, although TX16 also possesses a large number of hyper variant loci, suggesting that most of the genomic variable loci in TX16 were acquired through lateral gene transfer, possibly through mobile elements such as transposons. In general, strains in the HA clade harbored more transposons and certain IS elements, such as IS16, than the CA strains. These findings are consistent with a previous study using whole genome microarray [
31].
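As an illustration of how composition variation can point to candidate genomic islands, the sketch below flags sliding windows whose GC fraction deviates strongly from the genome-wide window average. It is a simplified stand-in under stated assumptions: the window size, step, z-score cutoff, and function names are hypothetical, and it does not reproduce the codon usage analysis or the specific tool used in this study.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=1000, step=500, z_cutoff=2.0):
    """Return (start, gc) for windows whose GC fraction is an outlier.

    A window is flagged when its GC fraction lies at least `z_cutoff`
    population standard deviations from the mean over all windows.
    """
    starts = range(0, max(len(genome) - window, 0) + 1, step)
    gcs = [(s, gc_fraction(genome[s:s + window])) for s in starts]
    values = [g for _, g in gcs]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(s, g) for s, g in gcs if abs(g - mu) / sigma >= z_cutoff]


# Hypothetical toy genome: low-GC background with one GC-rich insert.
toy_genome = "ATGA" * 2500 + "GCGC" * 250 + "ATGA" * 2500
flagged = atypical_windows(toy_genome)
```

Real island predictors combine several signals (codon usage, dinucleotide bias, flanking mobility genes), so a GC-only scan like this would only serve as a first pass.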
Although IS16 presence has been proposed as an indicator of hospital-associated strains such as those that are part of the CC17 genogroup [
48], IS16 was not found in all HA-clade strains. Of note, however, all HA-clade strains contained the
pbp5-R allele (except for 1,231,501 and D344SRF, which is a spontaneous deletion mutant of
pbp5) which may indicate that this is a reliable marker for hospital-associated isolates. Indeed, the
pbp5-R allele is also found in animal and community isolates that are considered within the HA-clade, but not considered clinically associated [
35,
36]. The exception, 1,231,501, is interesting in that it is an HA-clade isolate from the blood of a hospitalized patient yet carries no resistance genes, possibly supporting the concept that the genomic content of a strain, not just antibiotic resistance, adds to survival in the hospital environment. In the 100-gene analysis by Galloway-Pena et al., 5 of the 92 genes studied for this strain grouped with the community clade, indicating it is a hybrid strain [
33] as also reported in a recent study [
34].
Capsular and other cell envelope polysaccharides of several gram-positive bacteria are known to have important roles in virulence and protective immunity [
65-
67]. Although the majority of studies on enterococcal surface polysaccharides have focused on
E. faecalis, similar molecules have also been identified in
E. faecium and suggested as targets for opsonic antibodies and as potential vaccine candidates [
43,
68], and also implicated in resistance of TX16 to phagocytosis in normal human serum [
63]. Two such gene clusters,
cps and
epa, have been identified in
E. faecalis [
55,
56,
69,
70]. Although a 7- to 9-gene
cps region (
cpsC to
cpsK) was recently determined necessary for the production of an
E. faecalis capsular polysaccharide [
54] and shown to contribute to pathogenesis and evasion of the host innate immune response [
67,
69], TX16 only contains two homologs of the genes in this locus (
cpsA-cpsB) [
54]. In contrast, 15 of the 18
E. faecalis epa polysaccharide genes have homologs in TX16 and the other 21
E. faecium genomes, although their sequences vary between the two species. Therefore, it is likely that
E. faecalis and
E. faecium produce compositionally related, but not identical, Epa surface polysaccharides.
The hyper variable nature of the two polysaccharide loci found in TX16 raises the possibility that they are involved in biosynthesis of antigenically diverse surface polysaccharides which could help protect
E. faecium against host immune responses. Similar to other gram-positive bacteria, various MSCRAMM-like cell wall anchored proteins have been previously identified in
E. faecium; these include the collagen adhesin Acm and biofilm-associated Ebp pili, shown to be important for endocarditis and UTI in animal models [
26,
71], respectively, as well as two other collagen-binding MSCRAMMs, Scm and Fms18 (EcbA) [
21,
72]. Our comparison of 15 previously described MSCRAMM and pilus encoding genes of TX16 [
17,
18,
21] with those of 21
E. faecium draft genomes found them to be common among these strains and the majority of them (12/15) to be enriched among HA clade strains or have a sequence variant mostly/exclusively carried by CA clade strains. Thus, these findings agree with previous hybridization results [
14,
16,
17,
22] and with the presence of two distinct subpopulations of
E. faecium. Furthermore, one of these genes,
acm, was previously found to be expressed more often by clinical versus non-clinical isolates, whereas a pseudogene was often found in isolates from the community [
26,
64]. Taken together, these data indicate a clear difference in the MSCRAMM and pilus gene profiles of the HA and CA clades, suggesting that these genes may have favored the emergence of HA-clade
E. faecium in nosocomial infections.
When we combined our findings with previously published results, four of the 21 E. faecium genomes were found to contain the CRISPR-cas locus. Three of these strains are within the CA clade and lack all antibiotic resistances analyzed in this study. The fourth, 1,231,408, is a unique strain whose genome is a hybrid of CA and HA genes, and it has 8 antibiotic resistance-associated genes, showing that there is not always an inverse relation between the number of antibiotic resistance determinants and the presence of CRISPR loci. More strains containing CRISPR loci will need to be studied in order to determine whether 1,231,408 is just an exception to the rule, or whether the highly recombinant nature of E. faecium makes it different from E. faecalis with respect to the presence of CRISPR loci in relation to antibiotic resistance determinants.
Overall, there seem to be some patterns that point to specific evolutionary events throughout
E. faecium’s history as a species. First and foremost, there is a large ancestral split between the CA- and HA-clade strains which are separated by at least a 3–4% difference in their core genome [
33]. The CA-clade isolates, except one, do not have either polysaccharide synthesis Locus 3 or 4 downstream of the
epa region, antibiotic resistance genes, certain genomic islands, or IS elements. After the HA clade diverged from the CA clade, there was further evolution within the HA clade, and some HA-clade strains studied here may represent phylogenetic transitional lineages (Figure

B and C). Like the CA-clade strains, these transitional lineages are characterized by a lack of IS
16 (E1039; 1,231,501; and E1071) and have neither Locus 3 nor 4 (E1039; 1,231,501; E1071; E1636; E1679) in the
epa extension. Although the data are limited, one scenario that could explain these observations is that Locus 1 replaced Locus 2 in an HA-clade ancestral strain, after the split from the CA clade, which later acquired IS
16 and then, subsequently, Locus 3 or 4 replaced Locus 1 in the
epa extension region. Even if this is not the case, it seems clear that only strains further along in the phylogenetic trees, indicating a division within the HA-clade (Figure

A and B), acquired IS
16 and the polysaccharide biosynthesis Loci 3 and 4. The exception is E980, a strain previously shown to have 8 of 92 genes from the HA-clade, which could have gained Locus 4 via recombination. Also of note, three of the four strains that have Locus 1 downstream of the
epa locus lack the
ebp genes, suggesting there may have been some kind of gain and loss through homologous recombination.
Figure

shows the projected scenarios for the evolution of the two clades of
E. faecium as can be envisioned using our data as well as other previous publications [
31,
33,
34,
57]. The hypothesis is that there was a primordial type of
E. faecium which split many millennia ago and evolved into two early community groups which had homologous genes, e.g., the
pbp5-S or
pbp5-R alleles, the latter representing community sources of ARE (ampicillin-resistant
E. faecium). These lineages could recombine with each other, resulting in hybrid strains (i.e., 1,231,408 and 1,231,501) (scenario 1). The divergence between the two community groups eventually reached a core genomic difference of approximately 3–4%, creating an HA clade, which includes both ampicillin-resistant, community-based isolates, such as those from some canine and feline origins, as well as most of the clinical-, hospital- and outbreak-associated isolates, and a CA clade, which consists mostly of community-derived isolates. Most likely, community and hospital ARE isolates split from the same ancestor, as represented by scenario 2. However, it is also possible that ARE clones evolved from the animal reservoir (scenario 3), or that animal ARE isolates represent evolutionary descendants of hospital ARE transferred from humans to their pets (scenario 4).