A number of previous publications have demonstrated differences between many hospital-associated and community-associated E. faecium strains, including differences in the rates of putative virulence genes, antibiotic resistance determinants, IS elements and transposons. Many of these, however, are part of the accessory genome, presumably acquired through lateral gene transfer. Pyrosequencing and microarray studies also noted genomic differences between such strains, suggesting the existence of two different clades, while still emphasizing the importance of accessory genome differences in distinguishing these subpopulations. In a recent publication, we noted a large intra-species difference (3–10%) in the nucleotide sequences of genes including pbp5 and gls20 between HA clade and CA clade strains. Interestingly, we subsequently noted an ~6% difference (data not shown) between HA clade and CA clade strains in the purK allele (used for MLST), which also separated the strains into two distinct groups using UPGMA. Although not all HA clade strains contained the purK1 allele, we found that these strains were still distinctly different from strains in the CA clade (data not shown). With the advent of numerous draft E. faecium genome sequences and one closed E. faecium genome sequence (manuscript in preparation), and given that extensive analysis of the core genomic differences had yet to be reported, we sought to determine the extent of the differences between the two groups at a more fundamental level.
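As an illustration of how UPGMA separates strains into groups from pairwise allele distances, the following is a minimal pure-Python sketch. The strain labels and distance values are hypothetical placeholders (the cross-group value of 0.06 merely echoes the ~6% purK difference mentioned above), and the function `upgma` is an assumption written for this example, not the software used in the study.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster labels by UPGMA and return the tree as nested tuples.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        # UPGMA: distance to the new cluster is the size-weighted mean.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical distances: small within a group, ~0.06 between groups.
labels = ["HA1", "HA2", "CA1", "CA2"]
example_dist = {
    frozenset(("HA1", "HA2")): 0.010,
    frozenset(("CA1", "CA2")): 0.012,
    frozenset(("HA1", "CA1")): 0.060,
    frozenset(("HA1", "CA2")): 0.060,
    frozenset(("HA2", "CA1")): 0.060,
    frozenset(("HA2", "CA2")): 0.060,
}
example_tree = upgma(labels, example_dist)
```

With these placeholder distances the two HA-like labels join first, then the two CA-like labels, and the two pairs are joined last, which is the two-group pattern described in the text.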
Consistent with our previous study of four genes, and with the division seen in the phylogenomic tree that van Schaik et al. built for 7 of these strains from a concatenation of 649 proteins, the difference in the concatenated sequences of 100 genes between the two clades is approximately 3.5–4.2%, clearly establishing the core genomic differences between these two subpopulations. The fact that >90% of the 100 core genes separated into two distinct groups, and that the associated amino acid changes were found in most of the proteins analyzed and in a wide variety of metabolic and cellular processes (Table S1), shows that there are likely differences between the two clades at a fundamental level. In addition, a relatively large proportion of the sequence changes between the strains (~60%) were clade-specific. Such changes in metabolism and cellular processes could be another reason why some strains adapt better to the hospital environment.
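The percent difference between concatenated gene sequences can be computed along the following lines. This is a sketch under stated assumptions: the sequences are already aligned to equal length, gaps and ambiguous bases are skipped, and the helper names `concatenate` and `percent_difference` are hypothetical rather than taken from the study's pipeline.

```python
def concatenate(genes_by_name, order):
    """Join per-gene aligned sequences in a fixed gene order."""
    return "".join(genes_by_name[gene] for gene in order)


def percent_difference(seq_a, seq_b):
    """Percent of comparable aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes so they do not count as differences.
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


# Toy example with two made-up genes per strain (not real data).
strain_1 = {"geneA": "ACGTACGTAC", "geneB": "TTGACC"}
strain_2 = {"geneA": "ACGTACGTAT", "geneB": "TTGA-C"}
gene_order = ["geneA", "geneB"]
toy_difference = percent_difference(
    concatenate(strain_1, gene_order), concatenate(strain_2, gene_order)
)
```

In the toy example, 15 positions are comparable (one gapped column is skipped) and one differs, so the value is 100/15 percent.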
Not all strains that grouped genetically with the HA or CA clade had a matching hospital or community origin. One blood culture strain from a hospitalized patient, 1141733, always associated with strains in the CA clade, and E1039, isolated from a fecal sample of a healthy volunteer, grouped with the HA clade. The fact that two of the 21 strains did not separate into the CA or HA clade according to origin demonstrates the complex ecology of colonizing and infecting E. faecium. One strain, 1231408, did not fall strictly into either clade, and we call it a hybrid strain. In the SNP analysis, this strain appears to have recombined somewhere around ORF 10683: the part of its concatenated SNP sequence before ORF 10683 showed near identity with the SNP sequence of CA clade strains, whereas the part after that point showed near identity with the concatenated SNP sequence of the HA clade strains. Further evidence of recombination is that a few genes from a strain in one clade frequently grouped with genes of the other clade, although only a limited number per strain.
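One simple way to locate such a switch point in a concatenated SNP string is to find the split that maximizes agreement with one clade before it and with the other clade after it. The sketch below assumes a single breakpoint, one representative (for example consensus) SNP string per clade, and equal-length inputs. The function name and toy strings are hypothetical, and positions are SNP indices that would still need to be mapped back to ORFs.

```python
def best_breakpoint(hybrid, ca_ref, ha_ref):
    """Return (k, score) for the best single CA-to-HA switch point.

    hybrid[:k] is scored against ca_ref and hybrid[k:] against ha_ref;
    score is the total number of matching SNP positions.
    """
    if not (len(hybrid) == len(ca_ref) == len(ha_ref)):
        raise ValueError("SNP strings must have equal length")
    # k = 0: every position is attributed to the HA reference.
    score = sum(h == a for h, a in zip(hybrid, ha_ref))
    best_k, best_score = 0, score
    for k in range(1, len(hybrid) + 1):
        i = k - 1  # position i moves from the HA side to the CA side
        score += (hybrid[i] == ca_ref[i]) - (hybrid[i] == ha_ref[i])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Toy SNP strings (not real data): the hybrid follows CA, then HA.
toy_ca = "AAAAAAAAAA"
toy_ha = "GGGGGGGGGG"
toy_hybrid = "AAAAGGGGGG"
toy_k, toy_score = best_breakpoint(toy_hybrid, toy_ca, toy_ha)
```

Here the best split falls after the fourth SNP, with all ten positions explained. Real data would show some mismatches on either side, and ties are resolved in favor of the earliest split.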
We also sought to estimate the time of separation of the HA and CA clades. A number of publications have tried to infer the molecular evolution of bacteria, with two main strategies having been used: 16S rRNA divergence and synonymous SNPs (sSNPs) across the whole genome. Our 16S rRNA analysis showed a 0.06% to 0.1% difference between the two clades; this is a relatively large percentage within a species (Figure S3) and places the divergence of the clades between 1.5 and 2.5 million years ago. However, since recent studies have questioned whether 16S rRNA is a reliable chronometer for bacterial evolution, we also used sSNPs to calibrate the molecular clock. It has been suggested that the sSNP sites of protein-encoding genes reflect the underlying rate of mutation more reliably because they are not affected by selection or genetic drift and are distributed across genomes. This methodology seems to be especially useful for species with high levels of lateral gene transfer, such as E. faecium. Whole genome SNP phylogenies have been shown to be highly accurate and more robust in defining deeper and higher resolution relationships among closely related individuals. A recent publication determined that a select number of SNPs was sufficient to accurately determine the current phylogenetic position of any B. anthracis strain and could replace a tedious genome-wide SNP analysis, indicating that our approach, using 100 genes present in all strains and spread throughout the chromosome, was justified.
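To make the notion of a synonymous SNP concrete, the following sketch classifies single-base codon differences between two aligned, in-frame coding sequences by whether the encoded amino acid changes. It is an assumption-laden simplification: it uses the standard codon assignments, skips codons that differ at more than one position or contain non-ACGT characters, and the function name `classify_snps` is hypothetical.

```python
from itertools import product

# Standard genetic code in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_snps(cds_a, cds_b):
    """Count single-base codon differences as synonymous or nonsynonymous."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("need in-frame coding sequences of equal length")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        # Multi-hit or ambiguous codons need a path-based method; skip them.
        if n_diff != 1 or codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            counts["skipped"] += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Toy sequences: GCT->GCC keeps alanine; AAA->AGA changes lysine to arginine.
toy_counts = classify_snps("ATGGCTAAAGAT", "ATGGCCAGAGAT")
```

Only the positions counted as synonymous would feed a clock of the kind described above; the nonsynonymous ones are the amino acid changes discussed earlier in this section.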
According to our sSNP molecular clock estimate using E. coli parameters, strains in the HA clade diverged from each other ~100,000 to 300,000 years ago, whereas strains in the CA clade diverged from each other ~300,000 to 900,000 years ago, corroborating the idea that HA clade strains in the available collection stem from a relatively recent ancestor. Both the sSNP analysis and the 16S rRNA analysis place the split between the two clades at roughly 1–3 million years ago. Even using the higher mutation rate reported for B. anthracis, the estimated divergence time was 300,000 years ago. These estimates highlight the fundamental core genomic differences between the two clades that could (in addition to the accessory genomic differences) be a reason why some strains adapt to the hospital environment and become opportunistic pathogens, while other strains do not.
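The arithmetic behind a strict molecular clock of this kind can be sketched as follows. This is a generic formulation, not the exact calculation used here: it assumes divergence accumulates independently along both lineages (hence the factor of 2), and every numeric value in the example is a hypothetical placeholder rather than the E. coli or B. anthracis parameters used in this study.

```python
def yearly_rate(mutations_per_site_per_generation, generations_per_year):
    """Per-site substitution rate per year along a single lineage."""
    return mutations_per_site_per_generation * generations_per_year


def divergence_time_years(pairwise_divergence, rate_per_site_per_year):
    """Years since two lineages split under a strict molecular clock.

    pairwise_divergence: synonymous differences per synonymous site
                         between the two lineages.
    rate_per_site_per_year: rate along ONE lineage; both lineages
                            accumulate changes, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence / (2.0 * rate_per_site_per_year)


# Placeholder inputs for illustration only (not values from this study).
placeholder_divergence = 0.01   # 1% of synonymous sites differ
placeholder_mu = 1e-10          # mutations per site per generation
slow = divergence_time_years(placeholder_divergence, yearly_rate(placeholder_mu, 100))
fast = divergence_time_years(placeholder_divergence, yearly_rate(placeholder_mu, 300))
```

Because the estimate is inversely proportional to the rate, tripling the assumed number of generations per year cuts the estimated age to one third, and a 1000-fold higher mutation rate would make it 1000-fold younger. This is why the choice of reference organism moves the estimates so much.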
The estimates of the time of divergence above presume that the rate of evolution of the HA and CA strains has remained constant over time. However, the transition to a pathogenic role may be associated with an increase in the mutation rate through selection of mutator strains. Thus, an alternative hypothesis to a gradual evolution of these clades from a common ancestor is that a well-adapted strain entered a new niche and then accumulated spontaneous mutations in genes such as mismatch repair genes, which then allowed the strain to go through one or more periods of rapid evolution. Of interest, the mutS gene was one of the 100 genes analyzed, and it also showed prominent differences between the two clades, with 9 amino acid differences, although their effect on function is not known. Nonetheless, although the estimate for the time of divergence is a very crude one (as it is hard to determine the divergence of a species without a fossil record), it suggests, even using mutation rates up to 1000-fold higher than estimated for B. anthracis, that the CA clade and HA clade isolates diverged long before the modern antibiotic era and tertiary care environment.
In summary, a number of studies have previously shown that E. faecium hospital-associated strains differ from many community/commensal strains, and it has been postulated that the driving force behind the recent success of this opportunistic pathogen in hospitals was the gain of mobile genetic elements carrying antibiotic resistance determinants, virulence and/or fitness factors. In this paper, we have shown, using 100 core genes, that E. faecium strains belong to one of two subpopulations, or clades, that differ by ~3.5–4.2% at the DNA level and that the estimated time of divergence between these two clades is at least 300,000 years ago, based on estimates for B. anthracis and/or E. coli rates of mutation and generation times in nature. Furthermore, the HA clade strains are more closely related to each other and diverged from each other more recently than the CA clade strains. These data further clarify the evolutionary history of hospital-associated E. faecium and show the extent of the differences between the two clades at the core genomic, protein, and synonymous SNP levels, providing evidence that acquired elements are not the only factors behind the recent success of this opportunistic organism and suggesting that divergence between and within the clades took place long ago.