Molecular typing of
Staphylococcus aureus is commonly used for identification of putative transmissions among patients as well as for surveillance of both local and international clones. In such a context, sequence analysis of the repeat region of the
spa gene is extensively used for typing
S. aureus isolates (i.e.,
spa typing) (
5). Yet recent studies investigating the evolutionary history of single
S. aureus sequence types (STs) using high-throughput sequencing data highlighted that
spa typing may occasionally reflect homoplasies (
6,
9). Homoplasies are similarities in character states for reasons other than inheritance from a common ancestor and might have serious consequences for interpreting
S. aureus typing data. For example, homoplasies can misleadingly indicate transmission between unrelated patients (
11) or misleadingly suggest the global spread of individual local clones (
9). One way to get around ambiguities created by homoplasies is to add other independent markers to the
spa gene. This approach has, for example, been used in the double-locus sequence typing (DLST) method, for which partial sequences of the repeat regions of both
clfB and
spa genes are combined (
8). In this study, we aimed to investigate the utility of adding a second locus to the
spa gene to overcome the limitations of a single-locus typing method. To this end, we analyzed a collection of 127 international
S. aureus isolates belonging to ST-5 with DLST. These isolates had previously been sorted into at least 14 phylogenetic lineages on the basis of genome-wide single nucleotide polymorphisms (SNPs), and they showed 19 different
spa types (
9). Among the nine
spa types shared by at least two isolates, six were found in multiple unrelated haplotypes and/or lineages, suggesting homoplasies (
9).
To determine the DLST types of the 127 ST-5 isolates, we sequenced approximately 500 bp from each of the
clfB and
spa genes as already described (
2,
8). It is important to note that although
spa typing and DLST-
spa investigate polymorphisms in the same repeat region of the
spa gene, the methods do not analyze exactly the same sequences. Whereas
spa typing analyzes the entire repeat region, DLST-
spa investigates only ca. 500 bp of the same region. Therefore, the
spa alleles of these two methods are not identical. A table of correspondence between the two categories of alleles can be found in reference
2. Thirty-six DLST-
clfB alleles and 25 DLST-
spa alleles were observed for the 127 isolates. In a first step, these alleles were mapped on the minimum spanning tree of these isolates that is based on the 156 SNPs assessed in reference
9 (A and B). As in reference
9, an allele was considered homoplasious when it occurred simultaneously in haplotypes that were unrelated based on the minimum spanning tree, suggesting that it emerged several times independently. This is a valid approach because the SNP-based tree was almost unique, as there were almost no homoplasies among SNPs (homoplasy index, 0.04) (
9). In addition, several methods were used to identify homoplasies on more statistically robust grounds.
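The homoplasy index quoted above can be illustrated with a short sketch. It assumes the usual definition, HI = 1 - CI, where the consistency index CI is the minimum possible number of character changes divided by the number of changes actually required on the tree. The step counts below are hypothetical toy values, not the data from reference 9.

```python
def homoplasy_index(min_steps, observed_steps):
    """Return HI = 1 - CI, with CI = sum(minimum steps) / sum(observed steps).

    min_steps[i]      : fewest changes character i could need (1 for a binary SNP)
    observed_steps[i] : changes character i actually needs on the tree
    """
    total_min = sum(min_steps)
    total_obs = sum(observed_steps)
    if total_obs == 0:
        raise ValueError("no character changes observed")
    return 1.0 - total_min / total_obs


# Hypothetical toy data: 25 binary SNPs, 24 change once, 1 changes twice.
toy_min = [1] * 25
toy_obs = [1] * 24 + [2]
toy_hi = homoplasy_index(toy_min, toy_obs)
print(f"toy homoplasy index: {toy_hi:.3f}")
```

A value close to 0 means that nearly every SNP changed only once on the tree, which is the situation described for the SNP-based tree used here.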
Among the 10 DLST-
clfB alleles and 11 DLST-
spa alleles that occurred in more than one haplotype on the minimum spanning tree, 5 and 6, respectively, occurred in unrelated haplotypes and represented potential homoplasies (asterisks in A and B). Combining both genes into DLST gave a total of 58 DLST types, confirming the higher discriminatory power of this method. Among the 14 DLST types that occurred in at least two haplotypes, 4 occurred in unrelated haplotypes and represented potential homoplasies (asterisks in C). This proportion does not differ demonstrably from those obtained with single-gene typing (4/14 versus 5/10 and 6/11), as the small sample sizes preclude a meaningful statistical comparison of proportions. The potentially homoplasious DLST types were in all cases composed of a homoplasious allele at one locus in combination with the ancestral allele at the other locus (i.e., either
clfB allele 2 and a
spa allele other than
spa allele 2, or a
clfB allele other than
clfB allele 2 and
spa allele 2). The stability of ancestral alleles is supported by the observation that, for both loci, the respective ancestral type was shared by most lineages. A recent study showed that several strains isolated 2 to 3 decades apart in different parts of the world shared identical DLST-
spa alleles (
1).
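The bookkeeping behind these counts can be sketched as follows: a DLST type is the pair of clfB and spa allele numbers (written clfB-spa, as in "4-16"), and the first screening step is to list the types that occur in more than one SNP-based haplotype. The function names and the isolate records are hypothetical illustrations. Deciding whether the haplotypes involved are unrelated still requires the minimum spanning tree and is not attempted here.

```python
from collections import defaultdict


def dlst_type(clfb_allele, spa_allele):
    """Combine the two allele numbers into a DLST type label (clfB-spa)."""
    return f"{clfb_allele}-{spa_allele}"


def types_in_multiple_haplotypes(isolates):
    """Map each DLST type seen in more than one haplotype to those haplotypes.

    isolates: iterable of (haplotype, clfB allele, spa allele) tuples.
    """
    haplotypes_by_type = defaultdict(set)
    for haplotype, clfb, spa in isolates:
        haplotypes_by_type[dlst_type(clfb, spa)].add(haplotype)
    return {
        label: sorted(haps)
        for label, haps in haplotypes_by_type.items()
        if len(haps) > 1
    }


# Hypothetical records, for illustration only.
toy_isolates = [
    ("H1", 2, 2),
    ("H2", 2, 2),
    ("H3", 2, 66),
    ("H7", 2, 66),
    ("H4", 4, 16),
]
candidates = types_in_multiple_haplotypes(toy_isolates)
print(candidates)
```

Types returned by this screen are only candidates: a type shared by neighbouring haplotypes is expected under common descent, whereas a type shared by unrelated haplotypes is a potential homoplasy.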
Maximum-parsimony phylogenetic analysis showed broadly the same picture as the minimum spanning tree, although the support for branching order (i.e., bootstrap values) was relatively low. Bootstrapping (and other resampling methods) generally provides low support for short branches because it draws subsamples from the alignment, so that some SNPs are not represented in some of the resulting alignments and trees.
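The effect of resampling on short branches can be made concrete with a small calculation. The sketch below assumes a standard nonparametric bootstrap, in which n alignment columns are drawn with replacement from an alignment of n columns, and it assumes that the alignment consists of the 156 SNPs mentioned above. The function name is illustrative.

```python
import math


def prob_site_absent(n_sites):
    """Probability that one given site is never drawn in a bootstrap replicate."""
    return (1.0 - 1.0 / n_sites) ** n_sites


n_snps = 156  # assumption: the alignment is made of the 156 SNPs
p_absent = prob_site_absent(n_snps)
max_support = 1.0 - p_absent  # share of replicates that still contain the SNP

print(f"P(SNP absent from a replicate)  = {p_absent:.3f}")
print(f"Share of replicates with the SNP = {max_support:.3f}")
print(f"Large-n limit, exp(-1)           = {math.exp(-1):.3f}")
```

Under these assumptions, a branch defined by a single SNP is missing its only supporting character in roughly a third of the replicates, so its bootstrap value is expected to stay well below 100% even when the branch is real.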
Another method to identify homoplasies is to look for alleles occurring simultaneously in two different haplotypes, as described in reference
13 (i.e., 4-gamete test). In our data set, only DLST-
spa alleles 2 and 66 and alleles 2 and 16 were in this situation, suggesting that these alleles or their haplotypes were homoplasious. In contrast to DLST-
spa, no shared DLST-
clfB alleles or DLST types occurred simultaneously in two different haplotypes, suggesting that adding the
clfB gene might overcome
spa homoplasies. However, this approach is relatively conservative, since it requires having haplotypes with shared alleles, and not all the homoplasies will be identified by this method (
7).
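For readers unfamiliar with the four-gamete test, a generic sketch on two binary characters is given below. It is the textbook form of the test, not necessarily the exact procedure of reference 13, and the way alleles and haplotypes were coded in this study is not specified here. If all four combinations of states (00, 01, 10, 11) are observed, the two characters cannot both have changed only once on a single tree.

```python
def four_gamete_conflict(char_a, char_b):
    """Return True if two binary characters show all four state combinations.

    char_a, char_b: sequences of states, one entry per isolate, same order.
    """
    if len(char_a) != len(char_b):
        raise ValueError("characters must be scored on the same isolates")
    if len(set(char_a)) > 2 or len(set(char_b)) > 2:
        raise ValueError("the test is defined for binary characters")
    return len(set(zip(char_a, char_b))) == 4


# Hypothetical example: presence (1) or absence (0) of one allele, and
# the state of one SNP, scored on the same four isolates.
allele_present = [0, 0, 1, 1]
snp_state = [0, 1, 0, 1]
print(four_gamete_conflict(allele_present, snp_state))
```

The test only fires when the four combinations are actually sampled, which is why it misses homoplasies whose intermediate haplotypes are absent from the collection.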
To further take into account phylogenetic uncertainty, we used a Bayesian Markov chain Monte Carlo (MCMC) approach (
10). We calculated the association index (AI), parsimony score (PS), and maximum monophyletic clade (MC) statistics, which are correlated with the strength of the phylogeny-trait association, for each allele/type of each typing method with BaTS v1.0 (
10). This software provides significance estimation while accounting for uncertainty by the use of posterior sets of trees obtained through earlier Bayesian MCMC analyses. MCMC analyses were performed using BEAST v.1.6.0 (
3) for $10^{8}$ generations, with tree sampling every $10^{5}$ generations. For BaTS analyses, the first 10 of the 1,000 sampled trees were discarded as burn-in and 200 randomizations were performed to estimate the null distributions for the AI, PS, and MC statistics (
10). For each typing method (DLST-
clfB, DLST-
spa, and DLST), the MC analyses identified several alleles without significant association with the SNP-based phylogeny (
P > 0.05) (Table 1), including DLST-
clfB alleles 19, 293, and 417, DLST-
spa alleles 257, 277, and 505, and DLST types 2-66, 4-2, 4-16, 417-2, and 4-277. The proportions of these homoplasious alleles among those occurring in more than one haplotype were 3/10 (30%) for DLST-
clfB, 3/11 (27%) for DLST-
spa, and 5/14 (36%) for DLST. Hence, sequencing a second locus did not reduce the proportion of homoplasious alleles. Moreover, the AI and PS statistics detected a significant association between trait and phylogeny, indicating that the potential homoplasies in each method did not affect the overall association between alleles and phylogeny.
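As an illustration of what a parsimony score measures, the sketch below counts the minimum number of trait changes on a single rooted binary tree with the Fitch algorithm. It is a simplified stand-in, not the BaTS implementation: BaTS evaluates the statistic over a posterior set of trees and compares it with a null distribution obtained by randomizing tip labels. The tree encoding (nested tuples) and the tip names are hypothetical.

```python
def fitch_parsimony_score(tree, tip_states):
    """Minimum number of state changes for a trait on a rooted binary tree.

    tree       : nested 2-tuples whose leaves are tip names (strings)
    tip_states : dict mapping each tip name to its allele or type
    """
    def visit(node):
        if isinstance(node, str):  # leaf: state set is the observed state
            return {tip_states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:  # children agree: no extra change needed
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


toy_tree = (("a", "b"), ("c", "d"))
clustered = {"a": "X", "b": "X", "c": "Y", "d": "Y"}  # trait follows the tree
scattered = {"a": "X", "b": "Y", "c": "X", "d": "Y"}  # trait conflicts with it
print(fitch_parsimony_score(toy_tree, clustered))
print(fitch_parsimony_score(toy_tree, scattered))
```

A trait that follows the phylogeny needs few changes, whereas a homoplasious trait needs more; a score lower than expected under random tip labels indicates a significant phylogeny-trait association.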
Table 1. Values of the allelic MC and overall AI and PS statistics for each DLST-clfB and DLST-spa allele and each DLST type occurring in more than one SNP-based haplotype
The existence of identical
clfB or
spa alleles in unrelated haplotypes is likely explained by the particular mutation patterns of these loci, which mostly diversify through duplication and/or deletion of repeat units (
4,
12). In this situation, it is not surprising to encounter the same configuration of the repeat several times during its evolution. Homoplasies do not seem to be frequent among clonal complexes (CCs), since most of the DLST-
clfB or DLST-
spa alleles are specific to CCs (
1). Although homoplasy seems to be common within ST-5, the extent of this phenomenon remains to be tested for other sequence types. A recent analysis of multiple ST-239 genomes highlighted only one homoplasy with
spa typing (
6), and the analyses of other clonal lineages will have to await the availability of high-resolution phylogenetic reconstructions.
In conclusion, adding a second highly variable locus to the spa gene (DLST) seemed to increase the discrimination of types. However, because of the high proportion of ancestral alleles, sequencing an additional locus was insufficient to allow definitive inference of evolutionary relationships within a single multilocus sequence type.