Attention has recently been drawn to Enterococcus faecium because of an increasing number of nosocomial infections caused by this species and its resistance to multiple antibacterial agents. However, relatively little is known about pathogenic determinants of this organism. We have previously identified a cell wall anchored collagen adhesin, Acm, produced by some isolates of E. faecium, and a secreted antigen, SagA, exhibiting broad spectrum binding to extracellular matrix proteins. Here, we analyzed the draft genome of strain TX0016 for potential MSCRAMMs (microbial surface component recognizing adhesive matrix molecules). Genome-based bioinformatics identified 22 predicted cell wall anchored E. faecium surface proteins (Fms) of which 15 (including Acm) have typical characteristics of MSCRAMMs including predicted folding into a modular architecture with multiple immunoglobulin-like domains. Functional characterization of one (Fms10, redesignated Scm for second collagen adhesin of E. faecium) revealed that recombinant Scm65 (A- and B-domains) and Scm36 (A-domain) bound efficiently to collagen type V in a concentration dependent manner, bound considerably less to collagen type I and fibrinogen, and differed from Acm in their binding specificities to collagen types IV and V. Results from far-UV circular dichroism of recombinant Scm36 and of Acm37 indicated that these proteins are rich in β-sheets, supporting our folding predictions. Whole-cell ELISA and FACS analyses unambiguously demonstrated surface expression of Scm in most E. faecium isolates. Strikingly, 11 of the 15 predicted MSCRAMMs clustered in four loci, each with a class C sortase gene; 9 of these showed similarity to Enterococcus faecalis Ebp pilus subunits and also contained motifs essential for pilus assembly.
Antibodies against one of the predicted major pilus proteins, Fms9 (redesignated as EbpCfm), detected a “ladder” pattern of high-molecular weight protein bands in a Western blot analysis of cell surface extracts from E. faecium, suggesting that EbpCfm is polymerized into a pilus structure. Further analysis of the transcripts of the corresponding gene cluster indicated that fms1 (ebpAfm), fms5 (ebpBfm) and ebpCfm are co-transcribed, consistent with pilus-encoding gene clusters of other gram-positive bacteria. All 15 genes occurred frequently in 30 clinically-derived diverse E. faecium isolates tested. The common occurrence of MSCRAMM and pilus-encoding genes and the presence of a second collagen-binding protein may have important implications for our understanding of this emerging pathogen.
Enterococcus faecium, a member of the normal commensal flora, has recently emerged as a prominent nosocomial pathogen causing serious infections, including infective endocarditis (Murray, 2000). Nosocomial infections due to E. faecium can be a life threatening challenge to physicians because of this organism’s multi-drug resistance (Murray, 2000; Rice, 2001). Furthermore, in addition to the selective advantage conferred by antibiotics, strains that emerge in the hospital setting more often carry putative virulence genes such as esp encoding enterococcal surface protein (Rice et al., 2003; Willems et al., 2001), hyl encoding a putative hyaluronidase (Rice et al., 2003) and a functional copy of the acm gene encoding a collagen adhesin (Nallapareddy et al., 2003; Nallapareddy et al., 2007b).
It is known that pathogenic bacteria have evolved a plethora of proteins to adhere to and invade host tissues and to resist host defenses (Pizarro-Cerda & Cossart, 2006). Among these is a family of surface proteins with well-established roles in host-pathogen adherence which have been termed MSCRAMMs (microbial surface component recognizing adhesive matrix molecules) (Patti et al., 1994). MSCRAMMs share several characteristics including i) an N-terminal signal peptide, ii) a non-repeated A-domain consisting of Ig-like fold(s), iii) a B-domain with a variable number of repeats among different strains, and iv) a C-terminal CWA (cell wall anchor) domain. Some of these MSCRAMMs were recently shown to be tethered to each other by a designated sortase to make up multimeric cell surface structures, named pili (Mora et al., 2005; Nallapareddy et al., 2006). Sortases, encoded by the srtA to srtD classes of genes (Dramsi et al., 2005), were originally described as membrane-bound transpeptidases that cleave the LPXTG-like motif in the CWA domain and covalently link them to the peptidoglycan (Marraffini et al., 2006). While Class A sortases appear to be ubiquitous and involved in cell surface anchoring of a large number of LPXTG containing proteins (Marraffini et al., 2006), the class C (subfamily 3) sortase enzymes, which have a more limited substrate specificity, have recently been shown to be involved in pilus biogenesis, in addition to their role in surface anchoring (Kemp et al., 2007; Scott & Zahner, 2006; Telford et al., 2006).
Studies that have characterized the binding interactions of staphylococcal and enterococcal MSCRAMMs identified that the ligand-binding A-domains consist of two to three subdomains (N1-N2/N3) each adopting an Ig-like fold (Liu et al., 2007; Nallapareddy et al., 2007a; Ponnuraj et al., 2003; Zong et al., 2005). Based on the crystal structures of prototype MSCRAMMs, two slightly different models have been proposed to explain their binding mechanisms to linear peptides of fibrinogen and to triple-helical collagen. In the “dock, lock and latch” binding model, a fibrinogen chain is inserted into a cleft between two Ig-folded subdomains and is then secured by a C-terminal N3 extension (latch) that is reoriented upon ligand binding and complements a β-sheet on the N2 subdomain (Ponnuraj et al., 2003). A variation of this two-subdomain binding model, “the collagen hug”, has been proposed for the collagen-binding MSCRAMMs (Liu et al., 2007; Zong et al., 2005).
Previous in silico analyses have identified a family of genes encoding MSCRAMM-like proteins in genomes of several gram-positive bacteria including our reports of the ebp (endocarditis and biofilm-associated pilus of Enterococcus faecalis) operon, Ace (adhesin of collagen from E. faecalis) and Acm (adhesin of collagen from E. faecium) (Bowden et al., 2005; Nallapareddy et al., 2000; Nallapareddy et al., 2003; Nallapareddy et al., 2006; Roche et al., 2003; Sillanpaa et al., 2004; Xu et al., 2004). Recently, Hendrickx et al. (2007) predicted 22 CWA proteins from the genome of E. faecium TX0016 (formerly DO; Arduino et al., 1994), of which five were found to be enriched in isolates of the hospital-adapted clonal complex 17 (CC17). However, with the exception of the prototype MSCRAMM, Acm (Nallapareddy et al., 2003), there has been no demonstration of other MSCRAMMs in E. faecium. Although our previous report identified an essential and secreted broad-spectrum adhesin, SagA, exhibiting binding to fibrinogen, collagen type I (CI), collagen type IV (CIV), fibronectin, and laminin, this protein lacks a CWA domain and other MSCRAMM characteristics (Teng et al., 2003).
In the present study, we identified 14 genes (in addition to acm) encoding predicted MSCRAMMs in the genome of E. faecium TX0016. Recombinant forms of one of these proteins, designated as Scm (for second collagen adhesin of E. faecium), were characterized to confirm the structural predictions and to identify its ligand. Cell surface expression of Scm by selected E. faecium strains was quantitated using FACS, and antibodies raised against a recombinant form of one of the major pilus proteins showed a high-molecular weight (HMW) ladder pattern characteristic of gram-positive pili. Co-transcription of one of the pilus-encoding gene clusters was demonstrated by Northern hybridization and RT-PCR. In addition, we determined the distribution of the MSCRAMM encoding genes among 30 diverse E. faecium clinical isolates.
Relevant characteristics of bacterial strains, growth conditions, and plasmids used in this study are summarized in Table 1 and in Supplementary Methods. All constructs were given TX numbers and plasmids from these constructs were assigned respective pTEX numbers (Table 1).
Escherichia coli and E. faecium cells were grown in Luria-Bertani (LB) broth/agar and Brain Heart Infusion (BHI) broth/agar (Difco), respectively. For E. coli constructs, antibiotics were used at the following concentrations: kanamycin, 25 μg ml⁻¹ and ampicillin, 100 μg ml⁻¹.
Bioinformatics methods used for genome-wide identification of CWA proteins are described in the Supplementary Methods. Domain architecture as well as fold-recognition analyses were carried out by comparing the protein sequences against domain databases as described in Supplementary Methods.
Genomic DNA from E. faecium strains was isolated as described earlier (Wilson, 1994). DNA regions encoding amino acids 27 to 624 (A-domain and B-repeats) and 27 to 333 (A-domain) of Scm and aa 33 to 590 of EbpCfm (the mature protein without the signal peptide and the CWA domain) were amplified using primers listed in Supplementary Table S1 and cloned into the expression vector pQE30 (Qiagen) to obtain pTEX5432, pTEX5628 and pTEX5630 (Table 1). Fms9 was renamed EbpCfm based on its 74 % aa identity (84 % similarity) with EbpC of E. faecalis over the entire protein. The corresponding expressed protein segments of Scm were designated as rScm65 and rScm36, respectively, based on their calculated molecular masses. Similarly, the cloned segment of EbpCfm was designated as rEbpCfm62. Constructs were confirmed by sequencing.
Recombinant proteins with N-terminal His6-tags were expressed and purified using nickel affinity chromatography followed by anion exchange chromatography (Nallapareddy et al., 2007a; Sillanpaa et al., 2004). Protein concentrations were determined by absorption spectroscopy at 280 nm using calculated molar absorption coefficient values. Molecular masses were determined with MALDI-TOF MS for rScm36 and rScm65.
Far-UV CD spectroscopy data were collected as described previously (Sillanpaa et al., 2004). Secondary structure compositions were quantitated with ContinLL, SELCON3 and CDSSTR (http://www.cryst.bbk.ac.uk/cdweb/html/home.html) (Lobley et al., 2002; Whitmore & Wallace, 2004).
Binding of the recombinant His-tag fusion proteins to components of the ECM (extracellular matrix) was tested using a previously described assay with minor modifications (Nallapareddy et al., 2000). In brief, 1 μg of each ECM protein (for source, see Supplementary Methods) was coated in 100 μl PBS in Immulon 2HB (Thermo Scientific) 96-well microplate wells. Wells were incubated with various concentrations of rScm and bound His-tag proteins were detected with anti-His6 mAb (GE Healthcare) followed by alkaline phosphatase-conjugated anti-mouse antibody (Bio-Rad). p-nitrophenyl phosphate (Sigma) was used for signal detection.
Polyclonal goat antibodies against rScm36 and rEbpCfm62 were raised at Bethyl Laboratories. Scm36- and EbpCfm62-specific Igs were eluted from CnBr-activated Sepharose 4B coupled with the corresponding antigen, according to the manufacturer’s protocol (GE Healthcare). Bound antibodies were eluted with 0.1 M glycine, pH 2.8, immediately neutralized with 1 M Tris/HCl, pH 8.0, and dialyzed extensively against PBS. Antibody concentrations were determined using an estimated IgG molar absorption coefficient value of 210 000 M⁻¹ cm⁻¹ and a molecular mass of 150 000 Da.
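The antibody concentration estimate described above follows the Beer-Lambert law. The following is a minimal sketch of that arithmetic, not the authors' procedure: the coefficient and molecular mass are the values quoted in the text, while the measurement wavelength (280 nm), the 1 cm path length, and the function name `igg_concentration_mg_per_ml` are assumptions made for illustration.

```python
# Beer-Lambert estimate of IgG concentration from an absorbance reading.
# Coefficient and mass are the values quoted in the text; the 280 nm
# wavelength, the 1 cm path length and the function name are assumptions.
IGG_MOLAR_ABSORPTION = 210_000.0  # M^-1 cm^-1
IGG_MOLECULAR_MASS = 150_000.0    # Da, i.e. g mol^-1


def igg_concentration_mg_per_ml(a280, path_cm=1.0):
    """Return the IgG concentration in mg/ml for a measured absorbance."""
    if a280 < 0 or path_cm <= 0:
        raise ValueError("absorbance must be >= 0 and path length > 0")
    molar = a280 / (IGG_MOLAR_ABSORPTION * path_cm)  # mol per litre
    return molar * IGG_MOLECULAR_MASS                # g per litre == mg per ml
```

With these constants, an absorbance of 1.4 in a 1 cm cuvette corresponds to roughly 1 mg/ml of IgG.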
Surface expression of Scm on E. faecium cells was detected by a whole-cell ELISA assay (Nallapareddy et al., 2003) using affinity purified rScm36-specific Igs. Control antiserum against formalin-killed TX0016-whole-cells (Rakita et al., 2000) was used as a positive control.
To quantitate surface expression of Scm by FACS analysis, bacteria grown in BHI for 14 h from an inoculum of OD600nm = 0.01 were labeled with preimmune or affinity-purified anti-Scm-specific antibodies followed by R-phycoerythrin-conjugated donkey anti-goat IgG (F(ab’)2-fragment-specific), as described earlier (Kemp et al., 2007). Cells were then fixed in 1 % paraformaldehyde in PBS and analyzed with a Coulter EPICSXL AB6064 flow cytometer (Beckman Coulter) and System II software.
CWA proteins were extracted with mutanolysin, as described earlier (Nallapareddy et al., 2006), from E. faecium strains grown for 8 h in BHI broth to late-exponential phase (starting inoculum, ~0.01 OD600nm). Equal amounts of concentrated mutanolysin extracts were separated using 4%-15% gradient SDS-PAGE gels (Bio-Rad) under reducing conditions and transferred to PVDF membranes according to the manufacturer’s protocol. Membranes were probed with affinity-purified anti-Scm36 and anti-EbpCfm62 antibodies (see above) followed by HRP-conjugated anti-goat IgG antibodies. As a control, total IgG antibodies purified from preimmune goat sera were used. Signal was detected using SuperSignal West Pico Chemiluminescent detection reagents (Thermo Scientific).
Total RNA was isolated from TX0082 grown in BHI to mid-exponential phase using the RNAprotect Bacteria Reagent and RNeasy Mini Kit (Qiagen). Thirty micrograms of total RNA were separated using a formaldehyde-containing agarose gel under denaturing conditions and transferred to a Hybond-N+ membrane as described by the manufacturer (GE Healthcare). DNA probes obtained with primers listed in Supplementary Table S1 were radiolabeled by using the RadPrime DNA labeling system (Invitrogen). Hybridization was performed under high stringency conditions as detailed earlier for Southern blots (Murray et al., 1993).
Total RNA (isolated as above for Northern hybridization) was treated twice with 20 U RQ1 DNase (Promega) for 30 minutes at 37°C. DNase was removed using the RNeasy Mini Kit and purification protocol (Qiagen). Total RNA was then reverse transcribed with specific primers (Supplementary Table S1) using the SuperScript One-Step RT-PCR with Platinum Taq kit (Invitrogen) according to the manufacturer’s instructions. DNA sequencing verified that the primer regions of TX0082 are 100 % identical to the corresponding sequences in TX0016. As an internal control, a 509-bp fragment of gyrA (encoding gyrase A) was amplified using the FmGyrF and FmGyrR primers (Supplementary Table S1). Reactions without RT were used as controls to verify lack of DNA contamination in the total RNA preparation.
Our search of the nearly completed genome sequence of E. faecium endocarditis-derived strain TX0016 (GW, BEM, et al., unpublished data, http://www.hgsc.bcm.tmc.edu/) yielded a total of 22 ORFs encoding putative CWA proteins (designated Fms, for E. faecium surface proteins) with a tripartite pattern near the C-terminus (an LPXTG motif or variant, followed by a membrane-spanning hydrophobic region and a positively charged tail). This number is in the range of predicted CWA proteins of related gram-positive bacteria (Ponnuraj et al., 2003) which vary from 8 (Streptococcus mutans) to 41 (E. faecalis). While the number of CWA proteins identified here is the same as the number reported in a recent study (Hendrickx et al., 2007), which identified CWA proteins from a partial TX0016 genome sequence (available at NCBI), the published report did not include one of the ORFs here (Fms6) and classified a pseudogene (Fms15) as two ORFs (Supplementary Results).
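The tripartite C-terminal pattern described above can be illustrated with a simple sequence scan. This is only a sketch under stated assumptions and is not the search described in the Supplementary Methods: the function name `find_cwa_signal`, the hydrophobic residue set, and all thresholds (distance from the C-terminus, window length, residue counts) are hypothetical, and only the canonical LPXTG motif is matched, not its variants.

```python
import re

# Illustrative residue classes; the exact sets are assumptions.
HYDROPHOBIC = set("AVLIMFWGC")
POSITIVE = set("KR")


def find_cwa_signal(protein, max_tail=50, window=15,
                    min_hydrophobic=12, min_positive=2):
    """Return the start index of an LPXTG motif that is followed by a
    hydrophobic stretch and a positively charged tail, or None."""
    seq = protein.upper()
    for motif in re.finditer(r"LP.TG", seq):
        tail = seq[motif.end():]
        if len(tail) > max_tail:
            continue  # motif lies too far from the C-terminus
        # Slide a window over the tail looking for a mostly hydrophobic
        # segment with positively charged residues downstream of it.
        for i in range(len(tail) - window + 1):
            segment = tail[i:i + window]
            rest = tail[i + window:]
            hydrophobic = sum(aa in HYDROPHOBIC for aa in segment)
            positive = sum(aa in POSITIVE for aa in rest)
            if hydrophobic >= min_hydrophobic and positive >= min_positive:
                return motif.start()
    return None
```

A real genome-wide screen would also need to accept motif variants and would normally be combined with signal peptide prediction, as the surrounding text describes.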
Among these 22 putative CWA proteins, 18 contained a predicted N-terminal signal peptide sequence required for Sec-dependent secretion. Further analysis of the 5′-regions of the remaining four ORFs (Fms14, Fms15, Fms16, and Fms19) lacking a signal peptide revealed the presence of N-terminal signal peptides in the ORFs immediately upstream to Fms15, Fms16, and Fms19. Careful examination of the junction regions showed the presence of a premature stop codon due to a point mutation or frame-shift in fms15, fms16, and fms19 (Supplementary Results). Sequencing of fms16 and fms19 regions from additional isolates identified intact fms16 and fms19 genes in two of 10 E. faecium clinical isolates tested (Supplementary Results). However, we did not find an E. faecium strain with an intact fms15 gene (Supplementary Results).
Subsequently, using the fold recognition servers 3D-PSSM and PHYRE, we identified 15 of the predicted Fms proteins (Table 2) as containing one or more Ig-like folds enriched with β-sheets and only a small quantity of α-helices; similar secondary structure compositions have been found in the ligand binding A-regions of MSCRAMMs (Deivanayagam et al., 2002). One of these, Fms8, was previously identified in our laboratory as Acm (Nallapareddy et al., 2003). The percentage of E. faecium LPXTG-proteins containing Ig-like folds is greater than the percentages of other gram-positive bacteria (22 % - 45 %) (Ponnuraj et al., 2003). The Ig-like fold containing modules of the A-domain of a number of well characterized S. aureus MSCRAMMs (e.g., the fibronectin-, fibrinogen-, and elastin-binding protein, FnbpA; the fibrinogen-binding proteins, ClfA and ClfB; the collagen-binding protein, Cna) and the prototype MSCRAMM of E. faecium, Acm, have been shown to play a direct role in interactions with their respective ECM protein ligands (Nallapareddy et al., 2007a; Ponnuraj et al., 2003; Zong et al., 2005).
Because of our interest in surface adhesins, we next performed additional analyses on the subset of predicted Ig-like fold-containing proteins and identified that most possess a multi-domain architecture similar to the characterized MSCRAMMs of S. aureus and E. faecalis (Deivanayagam et al., 2002; Perkins et al., 2001; Sillanpaa et al., 2004) (Supplementary Results). A schematic of the predicted primary sequence organization of these proteins is depicted in Fig. 1. The N-terminal signal peptide sequences, found in all but Fms14, ranged from 24-56 aa in length. The majority had perfectly conserved canonical LPXTG motifs, while six had anchor motifs with a broader consensus (Table 2).
Fms10, Fms15, and Fms18, like Acm, are encoded by individual genes and are not clustered with other MSCRAMM-encoding genes. Among these, Fms18 has 94 % similarity (Table 2) to the E. faecalis MSCRAMM EF1896 which is predicted to be mostly Ig-folded and shows binding to a serum protein (J.S., S.R.N., B.E.M., and M. Hook, unpublished data). Scm (Fms10) of TX0016 has thirteen B-repeats of 19 aa in length and a partial repeat of 10 aa, while pseudogene fms15 predicts 8 B-repeats of 91 to 93 aa in length. The primary sequences of Scm and Fms15 each predict a single subdomain with an Ig-like fold in the A-domain, unlike the well-established MSCRAMMs which have two to three subdomains. Hence, we hypothesize that, if these proteins are adhesins, they may adopt a different binding mechanism, and we chose Scm for further characterization.
Total yields of the recombinant protein rScm65 (full mature protein consisting of the A-domain and B-repeats, aa 27-624) were low. Although we estimated that this protein was over 95 % pure immediately after eluting from the chromatographic columns as shown in Supplementary Fig. S1, it was susceptible to partial degradation even after short storage. rScm65 migrated as a larger band than expected from the calculated molecular weight of 64.5 kDa, likely due to the highly acidic nature (pI = 3.9) of the protein. To obtain a stable construct with higher yields, we next expressed rScm36 encompassing the complete A-domain (aa 27-333) (Supplementary Fig. S1). A MALDI-TOF mass spectrometry analysis determined the molecular mass of rScm36 as 35 711 Da, which is in good agreement with the calculated mass of 35 691 Da, and confirmed the absence of posttranslational modifications and homogeneity of the purified protein. As an additional confirmation of the identities of the purified protein segments and evidence of their intact N-termini, a monoclonal anti-His6 antibody detected the N-terminal His6- fusion in both purified protein segments in a Western blot assay (data not shown).
Since the adhesive functions of previously studied MSCRAMM proteins have mostly been ascribed to binding to protein components of the host ECM, we examined the potential ECM-adhesive functions of Scm by testing its binding to a panel of individual ECM proteins in an ELISA-type solid-phase ligand binding assay. As seen in Fig. 2(a), both rScm65 (containing the full mature protein) and rScm36 (containing only the A-domain) exhibited significant binding to CV (35-fold and 82-fold, respectively, higher than the background level seen with immobilized BSA). Interestingly, these rScm proteins (especially rScm36) also bound to fibrinogen and CI, at levels above those seen with CIV, laminin, and fibronectin (Fig. 2a).
Further analysis of the binding interactions of increasing concentrations of rScm36 with CI and CV as well as fibrinogen by dose response assays showed concentration dependent binding of rScm36 to CV which approached saturation with apparent half-maximal binding (KD) reached at 30.8 ± 1.8 μM (Fig. 2b). The apparent affinity of rScm36 for fibrinogen and CI was low and did not reach reliable saturation levels. In spite of partial degradation, rScm65 showed dose dependent binding with similar affinity for CV as rScm36 (data not shown). Although KD values from steady-state assays are estimations, our affinity determination for the Scm A-domain is in the same range as previously shown for the minimal ligand binding domain of Ace to CI (48 ± 7 μM) (Rich et al., 1999), but lower than the affinity of the A-domain of Acm for CI (3.8 μM) or CIV (12.8 μM) (Nallapareddy et al., 2003). Interestingly, Acm did not show appreciable binding to CV (data not shown) and bound efficiently to CI and, to a lesser degree, to CIV, in agreement with our previous report (Nallapareddy et al., 2003). Taken together, our results suggest that CV is a binding ligand for Scm and this binding is mediated by the A-domain.
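An apparent half-maximal binding concentration of the kind reported above can be estimated from a dose response curve in several ways. The text does not state which fitting procedure was used, so the sketch below is an assumption throughout: it assumes a one-site saturation model, B = Bmax·c / (KD + c), and uses the Hanes-Woolf linearization (c/B plotted against c) with an ordinary least-squares line. The function name `fit_one_site` is hypothetical, and nonlinear regression on the untransformed data is generally preferred for real measurements.

```python
def fit_one_site(concentrations, signals):
    """Estimate (KD, Bmax) for B = Bmax * c / (KD + c).

    Uses the Hanes-Woolf form c/B = c/Bmax + KD/Bmax, so a straight-line
    fit of c/B against c has slope 1/Bmax and intercept KD/Bmax.
    All concentrations and signals must be positive.
    """
    xs = list(concentrations)
    bs = list(signals)
    if len(xs) != len(bs) or len(xs) < 2:
        raise ValueError("need at least two paired data points")
    if any(c <= 0 for c in xs) or any(b <= 0 for b in bs):
        raise ValueError("concentrations and signals must be positive")
    ys = [c / b for c, b in zip(xs, bs)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # 1 / Bmax
    intercept = mean_y - slope * mean_x    # KD / Bmax
    return intercept / slope, 1.0 / slope
```

For noise-free synthetic data generated from the model, the function recovers the KD and Bmax that were used to generate it; with experimental data the transformation distorts the error structure, which is one reason the resulting value should be treated as an estimate.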
The three collagen types included in our assays have distinct structures and tissue distributions. CI is the most abundant form and the main component of collagen fibers that are widely distributed in human tissues, while CIV forms structurally different cross-linked networks and is found nearly exclusively in basement membranes. CV, although quantitatively less abundant, has a critical role in formation of the fibrillar collagen matrix and connecting interstitial collagen fibrils with membranous collagen networks (Nicholls et al., 1996; Wenstrup et al., 2004). Considering the diverse tissue localizations and structural differences of the various collagen types, possessing two collagen adhesins with different binding specificities to various collagen types could give E. faecium the ability to fine-tune its adherence phenotype to suit a given tissue. Since collagens, including CV, are also present in the intestinal submucosa (Liang et al., 2006), Scm could alternatively be involved in colonization and persistence in the intestinal tract, a major natural reservoir of both commensal and infection-associated strains of E. faecium in humans, or in facilitating translocation through damaged intestinal epithelium.
To investigate the structural predictions made above, we analyzed the full A-domain of Scm with far-UV CD spectroscopy. The obtained spectra showed a maximum at 197 nm and another at 191 nm, and a minimum at 217 nm (Fig. 3a). A similar overall pattern was observed with the collagen-binding A-domain of the E. faecium prototype MSCRAMM Acm, which we have previously characterized (Nallapareddy et al., 2003; Nallapareddy et al., 2007a), and also with the minimal collagen binding region of Ace from E. faecalis, for which crystal structure studies have recently revealed that it folds into a similar DEv variant of the Ig-fold as previously found in the ligand binding A-domains of staphylococcal MSCRAMMs (Liu et al., 2007).
Deconvolution of the collected data showed a secondary structure composition of 0.15 ± 0.03 of α-helix and 0.34 ± 0.01 of β-sheet for rScm36 (Fig. 3b). Although the helical content of this protein is slightly higher than in the two control proteins, these results generally resemble the overall secondary structure compositions of the ligand binding regions of Acm and Ace and are in good agreement with earlier CD analyses and crystal structure data of other MSCRAMM A-regions (Rich et al., 1999). While fold analyses and multiple alignments predicted a single Ig-folded subunit from aa 125 to 325 in the A-domain of Scm, our CD measurements indicate a high β-sheet composition for the entire Scm A-domain, thus suggesting that an Ig-folded or similar β-sheet-rich structure may extend over the whole A-domain (aa 27 to 333).
Using Scm-specific antibodies affinity purified (with rScm36) from goat anti-serum, we assessed the surface expression of Scm by 8 E. faecium clinical isolates (including the sequenced strain TX0016) using whole-cell ELISA. Two E. faecium community derived strains which lacked scm (identified below) were used as negative controls. Among 8 scm+ isolates, four (TX0074, TX2535, TX2400 and TX2589) were strongly positive, two (TX0068 and TX2081) were positive, one (TX2416) was weakly positive and one (TX0016) was negative in this assay (Supplementary Fig. S2), suggesting that Scm is produced on the cell surface of many E. faecium isolates during in vitro growth. Subsequent quantitation of Scm surface expression in these isolates by FACS (Fig. 4) showed variable levels of surface expression, with the percentage of cells expressing Scm ranging from 2.3 % to 96.9 %. In strains TX0074, TX2535, TX2400 and TX2589, Scm was detected on the surface of 90 to 97 % of cells and with relatively high fluorescence intensities, thus confirming our observations with whole-cell ELISA. Furthermore, these surface detection studies suggest that the majority of cells of scm+ isolates are actively producing Scm on the cell surface during in vitro growth. Consistent with our whole-cell ELISA and FACS results, a Western blot assay using the anti-Scm36 antibody identified a protein band corresponding to the expected size of mature Scm (~ 63 kDa) in mutanolysin cell wall extracts from TX0074 and TX2535, but not from TX0016 (data not shown).
Unlike Acm, Scm (Fms10), Fms15, and Fms18, the remaining 11 putative MSCRAMM genes were found clustered in four genomic loci. Nine of the clustered genes show significant similarity to pilus-associated proteins from other species (Table 2). Analysis of the nucleotide region of the locus spanning fms1, fms5 and ebpCfm (fms9) predicted the presence of four ORFs, all oriented in the same direction, including an srtC1 gene encoding a class C (subfamily 3) sortase (Fig. 1b). We subsequently renamed Fms1, Fms5, and SrtC1 as EbpAfm, EbpBfm, and Bpsfm, respectively, based on their high sequence identity with the E. faecalis Ebp cluster proteins (Nallapareddy et al., 2006) (Table 2). A similar arrangement of predicted pilus-associated proteins with an adjacent class C sortase was found in the fms14-17-13 and the fms11-19-16 clusters. Short or overlapping intergenic regions predict co-transcription of ebpABCfm, fms14-17-13 and fms11-19-16; in addition, we were unable to identify transcriptional terminator-like sequences within these gene clusters. Thus, it is likely that these genes are in three operons (see also below). The genes encoding Fms20 and Fms21 were found to be located close to each other in the genome, separated by two ORFs. One of these ORFs encodes a predicted class C sortase (SrtC4). In addition to srtC4, this locus also contains one class A sortase-encoding gene (srtA) immediately upstream of fms21 (Fig. 1b). Another ORF located between SrtC4 and Fms20 shows 27 % similarity to the EbpB pilus protein and has a signal peptide sequence but no CWA motif. Of note, the E. faecium TX0016 genome also contains a sixth sortase gene (srtC5) predicting a class C sortase. However, none of the CWA protein genes were associated with the srtC5 locus.
Subsequent in silico predictions identified conserved pilin motifs and E-boxes of gram-positive pilus proteins in all four predicted major pilus protein homologues (EbpCfm, Fms13, Fms16, and Fms21) and all five of the predicted accessory protein homologues (EbpAfm, EbpBfm, Fms14, Fms17, and Fms19) associated with the four gene clusters described above (Supplementary Fig. S3). The lysine (K) residue of the pilin motif and the glutamic acid (E) of the E-box that were demonstrated to be essential in polymerization of Corynebacterium diphtheriae pilus (Ton-That & Schneewind, 2003; Ton-That et al., 2004) were found to be 100 % conserved. Two accessory proteins, namely EbpAfm and Fms14, have a von Willebrand factor type A-domain with a MIDAS-motif which is frequently found in pili-associated accessory proteins. Recently, crystal structure analyses of the minor pilin protein GBS52 of Streptococcus agalactiae and the major pilin Spy0128 of S. pyogenes demonstrated that these proteins contain two Ig-like domains (Kang et al., 2007; Krishnan et al., 2007), consistent with our prediction of Ig-like folds in E. faecium pilus proteins. These features, together with the presence of an independent class C sortase in each of the four clusters (class C sortases are used in the assembly of pili of related gram-positive bacteria), indicate that E. faecium may harbor genes for multiple pilus-like structures. Recent studies on pili of C. diphtheriae, E. faecalis, group A and B streptococci, and pneumococci have demonstrated their role in bacterial adherence and biofilm formation, and their contribution to bacterial pathogenesis and modulation of the host immune system (Barocchi et al., 2006; Dramsi et al., 2006; Mandlik et al., 2007; Nallapareddy et al., 2006; Singh et al., 2007; Telford et al., 2006).
To confirm the predicted location of EbpCfm on the surface of E. faecium as part of polymeric pili, we probed mutanolysin cell wall extracts of endocarditis-derived E. faecium strains TX0082 and TX0016 with affinity-purified anti-rEbpCfm62 antibodies. A ladder-like pattern of HMW bands with sizes greater than 200 kDa was detected from strain TX0082 (Fig. 5). In contrast, only a single band was seen with Acm in mutanolysin extracts of TX0082 (Nallapareddy et al., 2003). The HMW banding pattern is consistent with observations of a large number of pilus proteins from other gram-positive bacteria and suggests that EbpCfm is similarly assembled into multimeric pilus structures on the E. faecium cell surface. In contrast to TX0082, no signal was detected from the cell wall extract of TX0016, which is similar to our whole-cell ELISA results (data not shown) and indicates that the EbpCfm-containing pilus is not expressed in this strain under the growth conditions used.
As stated above, bioinformatics analyses indicated that each of the four pilus-encoding gene clusters is likely to be organized as an operon. For the ebpAfm to bpsfm cluster, a 12-bp inverted repeat that could form a stem-loop structure (free energy of -22.55 kcal mol⁻¹), located in the intergenic region between ebpCfm and bpsfm, is likely to function as a ρ-independent transcriptional terminator, thus predicting an ebpAfm to ebpCfm transcript and an independent bpsfm transcript. To validate this, we carried out transcriptional analyses of the ebpAfm to bpsfm locus using Northern hybridization. The ebpAfm, ebpBfm, and ebpCfm probes all hybridized to a single RNA band of ~ 7 kb (Fig. 6a) which corresponds to the expected size of a polycistronic ebpABCfm mRNA transcript, demonstrating that these three genes are co-transcribed. Probing with bpsfm detected a very low intensity band (~ 1 kb), consistent with the expected size of a monocistronic bpsfm mRNA transcript, suggesting that this transcript may either be produced in low levels at this time point or has a relatively short half-life. RT-PCR using three independent sets of internal bpsfm primer pairs amplified fragments of the expected sizes (lane 4 of Fig. 6b) (data not shown for two additional primer pairs), thus confirming the expression of bpsfm. Another faint band of ~8 kb was also observed with the bpsfm-probed Northern blots; while its identity is currently unclear, it might represent a related mRNA sequence such as one transcribed from the other predicted pilus operons.
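The search for a terminator-like inverted repeat in an intergenic region can be sketched as a scan for a stem sequence followed, after a short loop, by its reverse complement. This is an illustrative sketch only and is not the tool used in the study: the function names, the perfect-match requirement, and the loop-length limits are assumptions, and the code does not compute a folding free energy, which would require a dedicated RNA folding program.

```python
# Complement table for DNA; only A, C, G and T are handled in this sketch.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(dna, stem=12, min_loop=3, max_loop=10):
    """Return (start, loop_length) pairs for perfect inverted repeats.

    A hit at `start` means dna[start:start + stem] is followed, after
    `loop_length` bases, by its exact reverse complement.
    """
    seq = dna.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits
```

A ρ-independent terminator is typically also followed by a run of T residues, so a fuller screen would check the sequence downstream of each hit; that check is omitted here.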
RT-PCR experiments of the ebpfm operon with primers designed to amplify intergenic regions of co-transcribed genes confirmed the expected amplification between ebpAfm and ebpBfm as well as between ebpBfm and ebpCfm, but not between ebpCfm and bpsfm (Fig. 6b). Taken together, our data unambiguously demonstrate that the E. faecium Ebpfm pilus-encoding gene cluster represents a three-gene operon, unlike the E. faecalis locus, which produces a four-gene ebp-bps transcript in addition to an independent bps transcript (Nallapareddy et al., 2006).
We next examined 30 diverse (source/year/geographical location) human clinical isolates by colony hybridization using PCR-generated DNA probes. E. faecium strain TX0016 and the acm gene served as controls for the hybridization studies. As shown in Table 2, the percentage of isolates showing hybridization to the individual gene probes varied from 73 % (fms20) to 100 %. The number of probes hybridizing per isolate ranged from 11 genes in one isolate (that lacked fms15, fms18, fms20, and fms21) to all 15 genes in 16 isolates (53 %). One isolate lacked the ebpABCfm (fms1-5-9) operon and another lacked the fms11-19-16 operon. The scm (fms10), fms14, fms17, and fms13 genes were present in all isolates tested, while fms15 was present in all but one. Among the endocarditis isolates included in this analysis, the multilocus sequence type 18 (ST18) isolate TX0068, belonging to the global epidemic hospital-associated clonal complex CC17 (Leavis et al., 2003), had all 15 genes, as did the sequenced ST18 isolate, TX0016. Another endocarditis isolate, TX2535 (ST17), which also belongs to CC17, lacked fms18. A fourth endocarditis isolate, TX0074 (non-CC17, ST337), had 14 genes and lacked fms20. Thus, our results indicate broad but variable distribution of MSCRAMM-encoding genes in clinical isolates, consistent with a recent study of 22 E. faecium CWA protein-encoding genes in which four of the genes, fms11-19-16 (orf903-905-907) and fms18 (orf2430), were shown to be specifically enriched in CC17 isolates, while the remaining 9 genes were considered widespread in both CC17 and non-CC17 nosocomial isolates (Hendrickx et al., 2007). Of note, our preliminary data from screening several hundred E. faecium isolates belonging to clinical and non-clinical groups for scm indicate that scm is absent only rarely from clinical isolates but is frequently absent from stool isolates.
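The two summary statistics reported above (the percentage of isolates positive for each probe, and the number of probes hybridizing per isolate) can be derived from a simple presence/absence record. The Python sketch below shows one way to tally them. The function name, the isolate labels, and the four-isolate toy data are hypothetical and do not reproduce Table 2.

```python
def summarize_hybridization(presence):
    """Summarize colony hybridization results.

    presence maps an isolate name to the set of gene probes that
    hybridized to it. Returns (per_gene, per_isolate), where per_gene
    gives the percentage of isolates positive for each probe and
    per_isolate gives the number of positive probes for each isolate.
    """
    n_isolates = len(presence)
    genes = sorted(set().union(*presence.values()))
    per_gene = {
        gene: 100.0 * sum(gene in hits for hits in presence.values()) / n_isolates
        for gene in genes
    }
    per_isolate = {isolate: len(hits) for isolate, hits in presence.items()}
    return per_gene, per_isolate


# Hypothetical toy data: four made-up isolates and three probes.
toy = {
    "isolate_A": {"scm", "fms18", "fms20"},
    "isolate_B": {"scm", "fms18"},
    "isolate_C": {"scm", "fms20"},
    "isolate_D": {"scm", "fms18", "fms20"},
}

per_gene, per_isolate = summarize_hybridization(toy)
for gene, pct in per_gene.items():
    print(f"{gene}: {pct:.0f} % of isolates")
for isolate, count in per_isolate.items():
    print(f"{isolate}: {count} probes positive")
```

The same arithmetic underlies the figures in the text: 16 of 30 isolates carrying all 15 genes corresponds to 53 %, and the 73 % reported for fms20 corresponds to 22 of 30 isolates.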
In summary, this study identified genes for 14 new E. faecium CWA proteins with MSCRAMM-like features. We characterized the function of one of these proteins, Scm, and demonstrated that it is a second collagen-binding protein of E. faecium with specificity for collagen type V. We further showed that the identified minimal ligand-binding region has a β-sheet-rich secondary structure composition characteristic of the A-domains of MSCRAMMs. Scm was shown to be expressed on the surface of many E. faecium clinical isolates during in vitro growth. Eleven of the 14 MSCRAMM-like protein-encoding genes were clustered in four loci, and 9 are predicted to encode the components needed to form multiple, distinct pilus-like structures. We characterized one of these clusters (ebpABCfm-bpsfm) and showed that the four genes of this cluster are expressed in TX0082 as two transcripts, ebpABCfm and bpsfm. Subsequent detection of a HMW ladder pattern with anti-EbpCfm in Western blots of cell surface extracts from TX0082 provides experimental evidence for the assembly of EbpCfm into polymeric structures, such as pili. Our hybridization results for all 14 genes indicate a broad distribution of MSCRAMM and pilus-encoding genes among clinical isolates; a future study will assess their distribution in natural E. faecium populations from a variety of sources. The ability of E. faecium to produce two surface-anchored collagen adhesins with different collagen type specificities and the common occurrence of the scm gene as well as the 13 other newly identified MSCRAMM- and pilus-encoding genes among clinical E. faecium isolates may have important implications for colonization and infection by this opportunistic pathogen.
This work was supported in part by NIH grant R01 AI067861 from the Division of Microbiology and Infectious Diseases, NIAID, to Barbara E. Murray. Additional support was provided by NIH grant R01 AI20624 from the NIAID to Magnus Hook.
Publisher's Disclaimer: This is an author manuscript that has been accepted for publication in Microbiology, copyright Society for General Microbiology, but has not been copy-edited, formatted or proofed. Cite this article as appearing in Microbiology. This version of the manuscript may not be duplicated or reproduced, other than for personal use or within the rule of ‘Fair Use of Copyrighted Materials’ (section 17, Title 17, US Code), without permission from the copyright owner, Society for General Microbiology. The Society for General Microbiology disclaims any responsibility or liability for errors or omissions in this version of the manuscript or in any version derived from it by any other parties. The final copy-edited, published article, which is the version of record, can be found at http://mic.sgmjournals.org, and is freely available without a subscription.