We describe the development and application of a pooled Suppression Subtractive Hybridization (PSSH) method to characterize differences between the genomic content of a pool of clinical Staphylococcus aureus isolates and a sequenced reference strain. In comparative bacterial genomics, Suppression Subtractive Hybridization (SSH) is normally used to compare the genomic features or expression profiles of one strain against another, which limits its ability to analyze communities of isolates. A PSSH approach, however, theoretically enables the user to characterize the entirety of gene content unique to a related group of isolates in a single reaction. These unique fragments may then be linked to individual isolates through standard PCR. This method was applied to examine the genomic diversity found in pools of Staphylococcus aureus isolates associated with complicated bacteremia infections leading to endocarditis and osteomyelitis. Across four pools of 10 isolates each, four hundred and twenty nine fragments not found in or significantly divergent from the S. aureus NCTC 8325 reference genome were detected. These fragments could be linked to individual strains within their pools by PCR. This is the first use of PSSH to examine the S. aureus pangenome. We propose that PSSH is a powerful tool for researchers interested in rapidly comparing the genomic content of multiple unstudied isolates.
Staphylococcus aureus is an emerging bacterial pathogen and a leading cause of hospital- and community-acquired infections. In the United States alone, S. aureus is responsible for hundreds of thousands of infections and thousands of deaths each year (Klein, 2007). The severity of disease caused by S. aureus depends upon both host susceptibility and the genomic background of the infecting S. aureus strain. The S. aureus genome encodes multiple virulence factors, including numerous toxins, surface adhesion proteins and proteolytic enzymes that synergistically contribute to virulence. Comparative genomic analysis of sequenced S. aureus genomes suggests that the number of virulence factors, as well as the presence and expression of specialized factors such as the phenol soluble modulins (Diep and Otto, 2008) and ACME (Diep et al., 2006), can be linked to disease outcome. Many of the virulence factors are carried on mobile genetic elements (MGE), including pathogenicity islands, bacteriophages and plasmids, that are responsible for horizontal gene transfer between individual strains. Acquisition of known or novel virulence factors by mobilization may lead to the emergence of hypervirulent strains or strains linked to a specific disease outcome. However, the enhancement of virulence need not be solely due to novel sequence. Sequenced S. aureus genomes still have relatively large numbers of Open Reading Frames (ORFs) without functions attributed to them. Within the thirteen sequenced S. aureus genomes, over 40% of the chromosomal ORFs are annotated as Hypothetical, Conserved Hypothetical or Unknown Function. Thus, it is quite likely that many of these ORFs contribute to virulence, despite having no clearly assigned function.
Several techniques, including whole genome sequencing and microarray-based methods (Dunman et al., 2004), have been used to associate genomic content with a pathogenic phenotype in S. aureus. While whole genome sequencing is the preferred method for identifying DNA polymorphisms and genomic rearrangements across strains, alternative array-based methods such as array comparative genome hybridization (aCGH) are often used to detect gene- or operon-sized differences that may contribute to the pathogenicity or fitness of the bacterium. However, a significant drawback of aCGH and other microarray-based methods is their reliance on probes representing known genes. As a result, they cannot discriminate between genes with significant sequence divergence, nor can they identify genes that are novel or were acquired by horizontal transfer from other strains or species (Snipen et al., 2006).
A more effective method for novel virulence ORF discovery would use a high-throughput DNA sequencing approach that surveys multiple genomes in a single experiment and targets novel regions of the genome or those potentially associated with virulence. One DNA sequencing based method, Suppression Subtractive Hybridization (SSH), has been used in the past as an effective method of comparing two bacterial genomes (Diatchenko et al., 1996) and identifying sequence-level differences between a control or reference strain and a test strain (Agron et al., 2002; Dordet-Frisoni et al., 2007; Guo et al., 2006). Adaptation of SSH to examine an environmental metagenome by other investigators (Galbraith et al., 2004) led us to develop an approach in which genomic DNA from multiple S. aureus isolates was combined into a single pool that was hybridized to a reference strain. This approach, which we have named Pooled Suppression Subtractive Hybridization (PSSH), increases the power and utility of SSH and has enabled us to identify multiple candidate novel virulence factors among a collection of clinical S. aureus isolates.
We applied PSSH (Figure) to survey a collection of 40 clinical S. aureus bacteremia isolates that were members of clonal complexes (CC) 5 and 30 associated with hematogenous seeding causing complicated endocarditis and bone and joint infections (Fowler et al., 2007). These isolates were pooled into four groups of ten isolates, based on CC (5 or 30) and complication type (endocarditis or bone and joint), and screened by PSSH using S. aureus NCTC 8325, a laboratory strain (Gillaspy, 2006), as the reference. We found that SSH can be scaled up to include pools of ten closely related microbial genomes. The application of PSSH to these pools quickly generates a library that allows for the detection of genomic differences of various sizes. We found that PSSH greatly increases the power to detect genetic polymorphisms while retaining the efficiency and cost of a single-strain comparison by SSH.
Overall, we describe a novel application, PSSH, which can be used in high-throughput screens of any bacterial genus/species to identify genomic level differences responsible for unique virulence phenotypes.
Strains (Table 1) were kindly provided by Vance G. Fowler from the Staphylococcus aureus Bacteremia Group (SABG) and were collected from patients at Duke University (Fowler et al., 2007). Genomic DNA was isolated as previously described (Fowler et al., 2007).
Isolates were combined into one of four pools based on Clonal Complex (5 or 30) and type of infection (infective endocarditis or bone/joint infection). Each of the four pools was assembled from 10 isolates, for a total of 40 isolates. The process for each pool was as follows: 1.5 μg of genomic DNA from each clinical isolate was combined into a final mixture of 15 μg of genomic DNA (final volume of 50 μL). The genomic DNA was then diluted to a final volume of 1.6 mL in sterile nebulization buffer (53.1% glycerol, 37 mM Tris-HCl, 5.5 mM EDTA, pH 7.5) (Margulies et al., 2005), mixed gently and transferred to a sealed AeroMist Downdraft Nebulizer (MedEx, Carlsbad, CA), which was placed on ice for 10 minutes. The genomic DNA was then nebulized to fragments smaller than 1000 bp with N2 gas for 5 minutes at 60 psi, and then recovered and concentrated using the Qiagen PCR Cleanup Kit (Qiagen, Valencia, CA) to remove fragments smaller than 100 bp. Each pool was then blunt-end polished and phosphorylated with the DNA Terminator End Repair Kit (Lucigen, Middleton, WI) according to the manufacturer's specifications. Agarose gel electrophoresis was conducted to ensure that the final size range of the pool was between 100 and 1000 bp.
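The pooling quantities above can be checked with a short calculation. The following is only a sketch of the arithmetic using the values stated in the text; the function and constant names are illustrative and not part of the original protocol.

```python
# Pooling arithmetic for one PSSH pool, using the quantities stated in the text.
# Function and constant names are illustrative only.
ISOLATES_PER_POOL = 10
DNA_PER_ISOLATE_UG = 1.5
NEBULIZATION_VOLUME_ML = 1.6


def pool_mass_ug(n_isolates, ug_per_isolate):
    """Total genomic DNA in the pool, in micrograms."""
    return n_isolates * ug_per_isolate


def concentration_ng_per_ul(mass_ug, volume_ml):
    """ug/mL is numerically equal to ng/uL (both scale by 1000)."""
    return mass_ug / volume_ml


total_ug = pool_mass_ug(ISOLATES_PER_POOL, DNA_PER_ISOLATE_UG)
conc = concentration_ng_per_ul(total_ug, NEBULIZATION_VOLUME_ML)
print(f"Pool mass: {total_ug:.1f} ug; concentration in nebulizer: {conc:.3f} ng/uL")
```

Because every isolate contributes the same mass, each isolate makes up one tenth of the pooled DNA before nebulization.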
Procedures were carried out as described previously (Diatchenko et al., 1996) using the Clontech PCR-Select Bacterial Genome Subtraction Kit (Clontech, Mountain View, CA), except as noted below. Briefly, Adaptor 1 (5'-CTAATACGACTCACTATAGGGCTCGAGCGGCCGCCCGGGCAGGT-3' & 5'-ACCTGCCCGG-3') and Adaptor 2R (5'-CTAATACGACTCACTATAGGGCAGCGTGGTCGCGGCCGAGGT-3' & 5'-ACCTCGGCCG-3') complexes were ligated to 120 ng of pooled blunt-ended DNA. Parallel reactions were set up with 1 μL 10× Ligase Buffer, 1 μL T4 Ligase (2,000,000 U/mL) (New England Biolabs, Ipswich, MA) and 2 μL of either Adaptor 1 or Adaptor 2R, in a final volume of 10 μL. All ligation reactions were incubated at 16°C for 24 hours and then stopped by the addition of 1 μL of 0.2μM EDTA and heat inactivation for 5 minutes at 72°C.
One μL of hybridization product was used as template in a primary PCR to enrich for fragments not found in the genome of S. aureus 8325. Conditions were as follows: 2.5 μL of 10× PCR Buffer (BD Biosciences, Franklin Lakes, CA), 0.5 μL 10 mM dNTP mixture, 1 μL of PCR Primer 1 (5'-CTAATACGACTCACTATAGGGC-3'), 0.5 μL 50× BD Advantage 2 Polymerase Mix (BD Biosciences, Franklin Lakes, CA), adjusted to a final volume of 25 μL with sterile H2O. The adaptors were extended by incubation at 72°C for 2 minutes, followed by 25 cycles of 94°C × 30 seconds, 66°C × 30 seconds and 72°C × 90 seconds. One μL of this product was then diluted in 39 μL of sterile H2O, and 1 μL of this dilution was used as template in a secondary round of amplification. Conditions were as follows: 2.5 μL of 10× PCR Buffer (BD Biosciences), 0.5 μL 10 mM dNTP mixture, 1 μL of Nested PCR Primer 1 (5'-TCGAGCGGCCGCCCGGGCAGGT-3'), 1 μL of Nested PCR Primer 2R (5'-AGCGTGGTCGCGGCCGAGGT-3'), 0.5 μL 50× BD Advantage 2 Polymerase Mix (BD Biosciences), adjusted to a final volume of 25 μL with sterile H2O. Enrichment of the library for sequences not found in S. aureus strain NCTC 8325 was completed by fifteen cycles of 94°C × 30 seconds, 68°C × 30 seconds, and 72°C × 90 seconds.
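For readers who want to compare the two amplification rounds, the cycling conditions above can be written down as data. This is a minimal sketch: the class and variable names are illustrative, and the totals count only the programmed hold times stated in the text, not instrument ramp times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


@dataclass(frozen=True)
class Program:
    initial: tuple   # steps run once before cycling
    cycle: tuple     # steps repeated n_cycles times
    n_cycles: int

    def hold_seconds(self):
        """Sum of programmed hold times only (ramp times are not modeled)."""
        once = sum(s.seconds for s in self.initial)
        per_cycle = sum(s.seconds for s in self.cycle)
        return once + self.n_cycles * per_cycle


# Primary PCR: adaptor extension at 72 C, then 25 cycles annealing at 66 C.
primary = Program(
    initial=(Step(72, 120),),
    cycle=(Step(94, 30), Step(66, 30), Step(72, 90)),
    n_cycles=25,
)
# Secondary (nested) PCR: 15 cycles annealing at 68 C.
secondary = Program(
    initial=(),
    cycle=(Step(94, 30), Step(68, 30), Step(72, 90)),
    n_cycles=15,
)


def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution when sample_ul is mixed with diluent_ul."""
    return (sample_ul + diluent_ul) / sample_ul


print(primary.hold_seconds() / 60, "min primary")
print(secondary.hold_seconds() / 60, "min secondary")
print(dilution_factor(1, 39), "fold dilution between rounds")
```

Under these assumptions, the primary product is diluted 40-fold before the nested round.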
Oligos, dNTPs, and short sequences were removed with the Qiagen PCR Cleanup Kit (Qiagen), and the enriched library was cloned into pCR-Blunt II-TOPO (Invitrogen, Carlsbad, CA) and transformed into One Shot TOP10 electrocompetent cells (Invitrogen) according to the manufacturer's protocol. Cells were grown under kanamycin selection, and the plasmid template DNA from individual colonies was extracted using the R.E.A.L. Prep 96 Plasmid Kit system (Qiagen). Plasmid templates were sequenced using BigDye reagents (Applied Biosystems, Foster City, CA) and analyzed on an AB 3130xl capillary DNA sequencer.
Sequences generated from each library were trimmed to remove extraneous vector and adaptor sequence. The trimmed sequences were then queried against the non-redundant nucleotide database using the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1997) with an expect value cutoff of 1e−25. Sequences were considered for further analysis if their BLAST record showed either no similarity to S. aureus strain 8325 or regions of at least 10% divergence in identity. Sequences were deposited in NCBI's GenBank under accession numbers GS883538 through GS883961.
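The selection rule above can be expressed as a small filter. The sketch below is a simplification under stated assumptions: it treats each read as having at most one best hit to the reference and applies the 90% cutoff to the identity of that whole alignment, whereas the text refers to divergent regions. The `ReferenceHit` class, the function name, and the example values are hypothetical and do not describe the authors' actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoffs taken from the text; everything else here is a hypothetical illustration.
IDENTITY_CUTOFF = 90.0   # percent identity to NCTC 8325
EVALUE_CUTOFF = 1e-25    # expect value cutoff used for the BLAST search


@dataclass
class ReferenceHit:
    """Best alignment of one read against the reference genome (assumed format)."""
    percent_identity: float
    evalue: float


def is_read_of_interest(hit: Optional[ReferenceHit]) -> bool:
    """Keep reads with no significant reference hit, or with a divergent hit."""
    if hit is None or hit.evalue > EVALUE_CUTOFF:
        return True  # no significant similarity to the reference
    return hit.percent_identity <= IDENTITY_CUTOFF  # at least 10% divergent


# Made-up example reads (not data from the study).
examples = {
    "read_no_hit": None,
    "read_divergent": ReferenceHit(percent_identity=86.0, evalue=1e-80),
    "read_conserved": ReferenceHit(percent_identity=99.5, evalue=1e-120),
}
kept = [name for name, hit in examples.items() if is_read_of_interest(hit)]
print(kept)
```

In practice the per-read hits would come from parsed BLAST output, but the parsing step is omitted here because the source does not describe it.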
A total of 1,042 sequence reads were obtained from the four PSSH libraries. Filtering out sequences that matched the NCTC 8325 genome with more than 90% identity yielded 427 reads of interest, suggesting that under these conditions PSSH has an efficiency of 41%. This is comparable to the efficiency previously reported for single strain SSH. Examination of these 427 reads revealed 16 sequences (3.7% of the reads of interest) that did not match the non-redundant nucleotide database at our cutoff levels. The breakdown of reads of interest by pool and type is given in Table 2. The majority of these reads of interest had no homology to NCTC 8325 across their entire length. However, 55 sequences (12.9%) matched NCTC 8325 but were at least 10% divergent. Taken together, these 427 reads represented 190,943 bp of sequence. Each published S. aureus genome had some homologous sequence to at least one read in these pools.
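The percentages quoted above follow directly from the read counts. As a quick check, the sketch below recomputes them from the numbers in the text; the variable names are illustrative, and the mean read length is a derived figure that the text does not state.

```python
# Read counts as reported in the text.
TOTAL_READS = 1042
READS_OF_INTEREST = 427
NO_DATABASE_MATCH = 16
DIVERGENT_MATCHES = 55
TOTAL_BP = 190_943


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


efficiency = percent(READS_OF_INTEREST, TOTAL_READS)
novel = percent(NO_DATABASE_MATCH, READS_OF_INTEREST)
divergent = percent(DIVERGENT_MATCHES, READS_OF_INTEREST)
mean_read_bp = round(TOTAL_BP / READS_OF_INTEREST)

print(f"efficiency {efficiency}%, novel {novel}%, divergent {divergent}%")
print(f"mean length of a read of interest: about {mean_read_bp} bp")
```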
When developing the PSSH approach, we first compared the efficiency of our PSSH assay with that of single strain SSH. We examined gene discovery efficiency using pool sizes ranging from 2 to 8 isolates per pool. There was no significant difference in efficiency of even the 8 or 10 strain pool when compared to single strain SSH (data not shown). Similar efficiency was previously reported for SSH experiments on Helicobacter pylori (Agron et al., 2002), Staphylococcus xylosus (Dordet-Frisoni et al., 2007) and Streptococcus mutans (Guo et al., 2006). This suggests PSSH is a powerful technique able to rapidly screen large libraries for novel ORFs or those potentially associated with virulence (or unique phenotypes).
We determined that PSSH is effective at detecting a wide range of genetic polymorphisms. These include multi-gene operons, smaller insertions of one open reading frame or less, homologs or orthologs that show some sequence diversity between comparable open reading frames, and extrachromosomal elements such as plasmids, which may be a factor in pathogenesis. One example of the power of this technique is the detection of SAR0158, SAR0159, SAR0160 and SAR0161 across three reads in the two Clonal Complex 30 pools. These genes (cap8HIJK) are members of a 16 open reading frame cluster that is co-transcribed and involved in type 8 capsule biosynthesis. In further agreement with our results, cap8HIJK has been reported as not detectable in NCTC 8325 by hybridization (Sau et al., 1997). It is not surprising that this operon is found in our pools, as 50% of clinical isolates are capsule type 8 by serology (O'Riordan and Lee, 2004). By BLAST search (Altschul et al., 1997), cap8HIJK is found in four S. aureus genomes (MRSA252, MW2, RF122 and MSSA476).
PSSH also allows for the detection of reads which have detectable homology to the driver strain but are significantly divergent. We considered a fragment unique from NCTC 8325 if it was either not found in that genome or had 90% or less identity to that sequence. Even with an average read accuracy of 99.4% (Margulies et al., 2005), Sanger sequencing results in occasional base call errors. Therefore, it is important to set an identity cutoff that is stringent enough to prevent false positives from being entered into a library of unique fragments, but loose enough to allow detection of divergent sequence. Our use of 90% identity appeared to satisfy both conditions. As an example, a 427nt open reading frame, SAR2564, annotated to encode a putative membrane protein, was detected in the Clonal Complex 30 Endocarditis pool. SAR2564's detection serves to highlight the ability of PSSH to detect smaller polymorphisms. This locus has no homology to the chromosome of NCTC 8325 in its final 118nts. The flanking ORFs, SAR2563 and SAR2565, are conserved on NCTC 8325's genome with 92 and 90% identity, respectively.
PSSH is also able to detect sequence that sharply diverges from a well conserved ORF. Clone 1F05 (homologous to SAR2779 of MRSA252, an unstudied putative N-acetyltransferase) from the Clonal Complex 30 osteomyelitis pool had 85 to 86% identity to a homologue on every published S. aureus genome; however, it matched only MRSA252 with 100% identity. SAR2779 is strikingly different from its homologs, displaying 13% divergence in identity across its entire 801 bp compared to its counterpart on each S. aureus genome contained in GenBank. This suggests that phenotypic differences between distantly related clonal complexes may be due to the slow accumulation of point mutations over time, in addition to the sudden uptake of horizontally transferred genes.
We also detected plasmid-like sequence in at least 60 reads of interest (14.1%). Given their multi-copy nature we were originally fearful that our libraries might be saturated with extrachromosomal elements. However, it appears that PSSH is effective in removing extreme imbalances in copy number and ensuring that no unique fragment is grossly overrepresented.
Analysis of sequence obtained by PSSH also provides insights into horizontal gene transfer between distant genera. For example, clone 1C08 from the Clonal Complex 30 endocarditis pool was homologous to SAR0720, an unstudied putative cation-exporting ATPase protein. Matching sequence was not found in any genome other than S. aureus MRSA252 and Macrococcus caseolyticus JCSC5402 (Identity = 93%, expect = 4e−154, 100% query coverage). The predicted amino acid sequence of SAR0720 matched MCCL_0243 (Identity = 96%, expect = 0.0, 100% query coverage), a putative M. caseolyticus JCSC5402 cation-transporting ATPase. Macrococcus is believed to be an ancestor of S. aureus, possibly donating the methicillin resistance complex to create MRSA (Baba et al., 2009). Another example is the detection of SAR0261, a putative nitric oxide reductase that is found in only one published S. aureus genome, MRSA252 (Holden et al., 2004). Its predicted protein has significant homology to the nitric oxide reductases of many microbes, among them the norB of Neisseria meningitidis (Householder et al., 2000; Rock et al., 2007) (e value = 3e−117) and that of the Gram positive dental pathogen Lactobacillus fermentum (e value = 0). These results demonstrate the power of PSSH to efficiently detect horizontal gene transfer and to identify environmental donors of virulence factors.
Several additional trends were noticed. Reads obtained from Clonal Complex 5 were more likely to be novel sequence not found in the non-redundant nucleotide database (7.8%) than reads from Clonal Complex 30 (0.4%). These results suggest that the genomic content of S. aureus strains in this collection is divergent and that similarities are likely to be found based on Clonal Complex rather than infection site. If a Clonal Complex 30 sequence matched only a single S. aureus genome, it was likely to be MRSA252, possibly because MRSA252 is the only published Clonal Complex 30 genome available. There is yet to be a published representative genome for Clonal Complex 5 S. aureus. Had one been available, we suspect that the number of novel sequences detected in Clonal Complex 5 would have decreased. Clonal Complex 5 also had a high level of plasmid content compared to Clonal Complex 30. We also observed that while some reads overlapped the same open reading frame, we did not see the significant level of repetition that would be expected if our sequencing power had saturated the PSSH libraries. Therefore, we suspect that there are other ORFs in these libraries, associated with endocarditis, osteomyelitis and/or Clonal Complexes 5 and 30, that were not detected due to our limited data set.
Unlike other hybridization based methods that rely on a solid support matrix and/or foreknowledge of target genes (Gerrish et al., 2007; Herron-Olson et al., 2007), PSSH allows the user to detect previously unknown sequences without the time and expense of whole genome sequencing. PSSH allows the investigator to rapidly probe the genomes of numerous clinical isolates to determine which fragments are associated with a given phenotype, genetic background, or clinical outcome. We utilized PSSH to create enriched libraries of DNA fragments found in pools of ten strains but not found in a less virulent strain. This is the first description of PSSH and its first use to study the pangenome of Clonal Complex 5 and 30 clinical isolates.
In the study of bacterial genomes, SSH has mainly been used to detect differences between two genomes. An SSH approach could have been applied to identify unique sequence fragments found in our collection of S. aureus clinical isolates, but it would have been much more expensive and inefficient. A PSSH approach allows the investigator to probe large pools of strains for potential targets related to a phenotype, and then later tie these factors to individual strains. This methodology not only allows the researcher to sample entire populations present in a pangenome for novel factors contributing to a phenotype of interest but also confers significant economic savings. As of this writing, the most popular commercially available SSH kit, the Clontech PCR-Select™ Bacterial Genome Subtraction Kit, has a per reaction cost of approximately $130 plus traditional Sanger sequencing costs. Using SSH to analyze an entire microbial pangenome would quickly become prohibitively expensive and consume hours of labor on highly repetitive tasks. Data analysis would be complicated by the detection of similar unique fragments across many isolates in the pangenome. PSSH, however, permits significant time and cost savings by analyzing numerous representatives from a given pangenome in parallel, with the same efficiency and reliability as found in single strain SSH. Unique fragments are contained in the same library and can later be tied back to individual strains by PCR. Extremes in copy number due to plasmids or phage are reduced, and relatively rare chromosomal polymorphisms can be detected with regularity.
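To make the cost argument concrete, the kit cost can be compared for the 40 isolates studied here. This is a rough sketch under explicit assumptions: one kit reaction per isolate for single strain SSH, one reaction per pool of ten for PSSH, and the approximate $130 per-reaction price quoted above. Sequencing and follow-up PCR costs are excluded because the text does not give figures for them, and all names in the code are illustrative.

```python
import math

KIT_COST_PER_REACTION_USD = 130  # approximate figure quoted in the text
N_ISOLATES = 40
POOL_SIZE = 10


def ssh_kit_cost(n_isolates, cost_per_reaction):
    """Assumes one subtraction reaction per isolate."""
    return n_isolates * cost_per_reaction


def pssh_kit_cost(n_isolates, pool_size, cost_per_reaction):
    """Assumes one subtraction reaction per pool."""
    n_pools = math.ceil(n_isolates / pool_size)
    return n_pools * cost_per_reaction


ssh = ssh_kit_cost(N_ISOLATES, KIT_COST_PER_REACTION_USD)
pssh = pssh_kit_cost(N_ISOLATES, POOL_SIZE, KIT_COST_PER_REACTION_USD)
print(f"SSH kit cost: ${ssh}; PSSH kit cost: ${pssh}; ratio {ssh / pssh:.0f}x")
```

Under these assumptions the kit cost scales with the number of pools rather than the number of isolates, so the saving is proportional to the pool size.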
Staphylococcus aureus is the causative agent of a diverse group of ailments. The creation of a library of previously unstudied factors associated with discrete types of illness would be an initial step in understanding pathogenesis and proposing new treatment strategies. The strategy discussed in this communication produces targets for further study of the molecular basis of S. aureus disease. These results may enhance understanding of which bacterial factors are potentially responsible for pathogenesis and clinical outcome. PSSH may also be useful beyond the study of S. aureus pathogenesis. We propose the use of PSSH for the pangenomic analysis of any bacterial species.
This work was partially funded by start-up funds from the SUNY-Buffalo Center for Excellence in Bioinformatics and Life Sciences (S.R.G.), T32 DE007034 (R.S.G.) and R01-AI 59111 (V.G.F.)