Development of the TMDH procedure
The TMDH procedure is outlined in Figure 1. Genomic DNA from the TMDH mutant library is digested using an appropriate restriction enzyme and amplified using linker PCR. Transcription is induced from the transposon T7 promoters in the presence of fluorescently labelled NTPs, and DNaseI is used to remove DNA, leaving labelled RNA run-offs. The labelled RNA is hybridised to a microarray, and the retrieved data are used to screen for the presence of transposons as detailed below. Following the screening process, candidate essential genes are further interrogated for the presence of transposon inserts using PCR and DNA sequencing.
Figure 1 Diagrammatic representation of the TMDH process. a) Illustration of the mariner transposon. b) A saturated genome-wide transposon library is produced. Cells with a transposon disrupting an essential gene will not survive. Genomic DNA from the library is ...
The restriction digest is critical for the TMDH procedure as it provides boundaries to limit the extension of the RNA run-offs, preventing transcripts produced from transposons within non-essential regions from extending into adjacent essential genes. However, for any particular restriction enzyme the distribution of restriction sites may be sub-optimal for some genes, limiting the number of "informative probes" that can be used to assess gene essentiality (see below). To minimise this problem two TMDH experiments were performed, one using the restriction enzyme AluI (AG^CT), the other using RsaI (GT^AC). Even using two enzymes, some genes were still potentially problematic, but sufficiently few that an additional enzyme was not deemed necessary; the affected genes were investigated using PCR (see below).
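The way the two digests partition a sequence can be illustrated with a minimal Python sketch. Only the recognition sites (AG^CT and GT^AC) come from the text; the function names and the toy sequence are hypothetical, and a complete digest is assumed.

```python
import re

# Recognition sequences from the text; both enzymes cut in the middle
# of a 4-base palindromic site, so only one strand needs to be searched.
ENZYMES = {"AluI": ("AGCT", 2), "RsaI": ("GTAC", 2)}

def cut_positions(seq, enzyme):
    """Return 0-based indices at which `enzyme` cuts `seq` (the cut falls before that index)."""
    site, offset = ENZYMES[enzyme]
    # A lookahead is used so that every occurrence of the site is reported.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]

def fragments(seq, enzyme):
    """Split `seq` into the restriction fragments of a complete digest."""
    cuts = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]

toy = "TTAGCTAAGTACGGAGCTCC"  # made-up sequence, not from S. aureus
print(cut_positions(toy, "AluI"))  # AluI sites
print(cut_positions(toy, "RsaI"))  # RsaI sites
print(fragments(toy, "AluI"))
```

Comparing the cut positions of the two enzymes within a gene shows whether either digest places a boundary inside it, which is what determines how many probes can be informative for that gene.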
For each TMDH experiment, chromosomal DNA from the library was digested using the appropriate restriction enzyme, and oligonucleotide linkers were ligated to the restriction fragments. Linker PCR was performed using a linker-specific primer and a transposon-specific primer. In vitro transcription was performed from the T7 promoters, with direct incorporation of Cy5-UTP. The reaction was treated with DNaseI and the resultant labelled RNA was purified and used for hybridisation to the microarray.
Construction of a TMDH transposon library in S. aureus
Full details are presented in the Methods section. Briefly, a TMDH transposon, with an outward-facing T7 promoter, was initially developed for use in E. coli based on the transposon Tn5 (EZ:Tn R6k ori Kan transposon, Epicentre) [12]. The construct was adapted for use in S. aureus by the addition of mariner mosaic ends (ME) and an erythromycin resistance gene. It was incorporated into a temperature-sensitive (ts) plasmid that contains a chloramphenicol resistance gene and is stable for replication in S. aureus at 30°C and below.
A large transposon library of around a million mutants was generated in S. aureus SH1000, essentially using the procedure described by Bae et al. A serial dilution of the resulting culture was used to determine library size (~10^6 mutants), and DNA sequencing and a preliminary TMDH analysis were used to demonstrate that the transposon had integrated throughout the S. aureus chromosome (see Figure 2).
Figure 2 Cumulative distribution of transposons identified in preliminary experiment. Transposons were located using either the PCR/sequencing method or an initial microarray-based screen. From these experiments it was concluded that the mariner transposon inserted ...
The TMDH method is designed to work for strains for which sequence information can be obtained. This includes non-sequenced strains that are related to strains with a genome sequence. Our experiments were carried out in the S. aureus strain SH1000. This was derived from the genome-sequenced strain NCTC 8325 (GenBank accession: CP000253) through phage curing followed by reconstruction of the rsbU gene. The microarray probes that were relevant to TMDH in strain SH1000, and their positions relative to the NCTC 8325 genome sequence, were determined using BLAST (see Methods).
A set of 60-mer oligonucleotide probes was designed based upon the S. aureus MW2 genome sequence (GenBank accession: BA000033). The probes were spaced approximately every 100 bases on both strands across the whole genome, with the exception of repetitive regions. In-house inkjet printers capable of generating around 22,000 features per slide were used to produce microarrays [17]. As the total number of probes to span the genome exceeded this capacity, each experiment was performed using three separate slides. Following hybridisation of the labelled RNA and washing, the arrays were scanned using an Agilent G2500A scanner, and the images analysed using the Agilent Feature Extractor software. Full details of the microarray procedure are available in the Methods section.
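The need for three slides follows from simple arithmetic, sketched below in Python. The probe length, spacing and slide capacity are taken from the text; the chromosome length of roughly 2.82 Mb for MW2 is an assumed approximate value, and the sketch ignores the exclusion of repetitive regions and any control features.

```python
import math

def tile_probe_starts(genome_length, probe_len=60, spacing=100):
    """0-based start coordinates of probes tiled along one strand."""
    return list(range(0, genome_length - probe_len + 1, spacing))

def slides_needed(n_probes, features_per_slide=22_000):
    """Number of slides required to hold `n_probes` features."""
    return math.ceil(n_probes / features_per_slide)

genome_length = 2_820_000  # assumption: approximate size of the MW2 chromosome
per_strand = len(tile_probe_starts(genome_length))
total = 2 * per_strand     # probes are placed on both strands
print(total, slides_needed(total))
```

Under these assumptions the tiling yields somewhat more than 56,000 probes, which exceeds the capacity of two slides but fits on three.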
Development of software to determine location of transposon insertions
For TMDH the primary interest is in a discrete binary property, i.e. the presence or absence of transposons. Microarray data are continuously variable, so it is necessary to adopt a strategy for scoring transposons as present or absent. The log2-transformed spot intensities measured from each microarray show a mixed distribution (see Figure 3). In the absence of any transposons, low signal intensities with an approximately normal distribution would be expected ("off" signals). In contrast, "on" signals would be expected to have higher intensities but follow an irregular distribution, reflecting the multiple factors that influence the signal if transposons are present. These include the number of transposons present, their distance from probes, and local sequence elements that may affect transcription from the promoter located in the transposon.
Figure 3 Illustration of the TMDH microarray scoring system (see text). a) Histogram of log2 of the raw microarray signals. b) As a), but overlaid with histogram of log2 of the microarray signals for the probes which do not hybridise to the S. aureus NCTC 8325 ...
To test our hypothesis that the low intensity signals represent the "off" distribution, we exploited the differences between the genomes of the S. aureus strains SH1000, in which the library was constructed, and MW2, the genome-sequenced strain used to design the microarray probes. Probes designed to hybridise to MW2-specific genomic islands will not hybridise to any target from the SH1000 genome, so act as negative controls. Plotting the data from these probes demonstrates that the lower region of the full distribution corresponds to the signal produced in the absence of any specific hybridisation (Figure 3b). The presence of a small number of higher signals within the negative control dataset suggests some non-specific hybridisation. This issue is addressed in the microarray analysis method detailed below.
For the analysis of TMDH data we developed a method for determining cut-off values to distinguish the "on" and "off" signals. The method is analogous to one commonly used in the analysis of microarray data derived from comparative genome hybridisation [CGH, also referred to as genomotyping; [18]]. The "off" distribution is modelled using a normal distribution fit empirically to the microarray data. A low cut-off is defined at the point where the number of probes predicted by the fitted distribution to show a particular signal drops below the observed number. A high cut-off is defined at the point where the fitted distribution explains close to 0% of the observed data at that intensity. Probes that gave a signal above the high cut-off were assigned a score of +1, since they were likely to be influenced by the presence of transposons. Probes with a signal below the low cut-off were given a score of -1, as it is likely that their signal represents the background level without any influence from the RNA produced from the transposon promoters. Probes with an intermediate signal were assigned a score of 0, since it was not possible to infer unambiguously the presence or absence of transposons. This method accounts for between-array signal variation, since the curve-fitting procedure is carried out independently for each array. It should also be noted that no account is taken of within-array variation. Standard normalisation approaches cannot be applied due to the high proportion of "on" signals within the dataset.
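The cut-off logic can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' R implementation: the parameters of the fitted normal are supplied rather than estimated, the threshold `near_zero` standing for "close to 0%" is an assumed value, and the binned counts are invented.

```python
import math

def expected_counts(bin_centres, mu, sd, n_off, bin_width):
    """Counts per bin predicted by a normal 'off' distribution of n_off probes."""
    norm = n_off * bin_width / (sd * math.sqrt(2 * math.pi))
    return [norm * math.exp(-0.5 * ((c - mu) / sd) ** 2) for c in bin_centres]

def find_cutoffs(bin_centres, observed, expected, mu, near_zero=0.01):
    """Scan upwards from the fitted mean and return (low, high) cut-offs.

    low : first bin where the fitted count drops below the observed count
    high: first bin where the fit explains at most `near_zero` of the observed count
    """
    low = high = None
    for c, obs, exp in zip(bin_centres, observed, expected):
        if c <= mu or obs == 0:
            continue
        if low is None and exp < obs:
            low = c
        if low is not None and exp <= near_zero * obs:
            high = c
            break
    return low, high

def score(log_signal, low, high):
    """+1 above the high cut-off, -1 below the low cut-off, 0 in between."""
    if log_signal > high:
        return 1
    if log_signal < low:
        return -1
    return 0

# Invented log2 histogram: an 'off' peak near 7 plus a tail of 'on' signals.
centres = [6, 7, 8, 9, 10, 11, 12]
observed = [240, 400, 240, 80, 60, 70, 50]
fitted = expected_counts(centres, mu=7.0, sd=1.0, n_off=1000, bin_width=1.0)
low, high = find_cutoffs(centres, observed, fitted, mu=7.0)
print(low, high, [score(x, low, high) for x in (8, 10, 12)])
```

With real histograms the bin counts are noisy, so a practical implementation would probably need smoothing or a tolerance before declaring that the fitted count has dropped below the observed count.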
In TMDH, a probe that hybridises within an essential gene may still give an "on" signal if it is downstream of a transposon that has integrated outside the gene, since RNA run-offs are defined by restriction enzyme sites, and not gene boundaries. Therefore the position of the restriction sites is critical. A method was developed to score only "informative" probes for each restriction fragment (see Figure 4). Probes were only considered informative if they could not be influenced by a transposon outside of the gene. These included probes downstream of an intragenic restriction site (shown in Figure 4 as vertical black lines). If no transposon was evident anywhere within a particular restriction fragment, then all probes from within that fragment were considered informative, since there could be no interference from outside the gene.
Figure 4 Procedure for identifying "informative probes" for the automated scoring system. a) Probes are only informative for a gene if they are downstream of an intragenic restriction site. Other probes may be influenced by transposons located outside the gene, ...
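The two rules for selecting informative probes can be expressed as a short Python sketch. Several simplifications are assumed here and are not taken from the source: probes are reduced to a single coordinate, each probe carries the direction of the RNA it detects ("+" for left-to-right transcription), and a transposon is treated as "evident" in a fragment when any probe in that fragment scores +1. All names and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right

def downstream_of_intragenic_site(pos, direction, gene_start, gene_end, cut_sites):
    """True if the nearest restriction site upstream of the probe (relative to
    the direction of transcription) lies inside the gene. `cut_sites` is sorted."""
    if direction == "+":
        i = bisect_right(cut_sites, pos) - 1      # nearest site at or left of the probe
        return i >= 0 and cut_sites[i] >= gene_start
    i = bisect_left(cut_sites, pos)               # nearest site at or right of the probe
    return i < len(cut_sites) and cut_sites[i] <= gene_end

def select_informative(probes, gene_start, gene_end, cut_sites):
    """probes: list of (position, direction, score) tuples covering the region."""
    def fragment(pos):
        return bisect_right(cut_sites, pos)       # index of the restriction fragment
    # Fragments in which at least one probe indicates a transposon.
    hot = {fragment(p) for p, _, s in probes if s == 1}
    keep = []
    for pos, direction, s in probes:
        if not gene_start <= pos <= gene_end:
            continue
        if fragment(pos) not in hot or downstream_of_intragenic_site(
                pos, direction, gene_start, gene_end, cut_sites):
            keep.append((pos, direction, s))
    return keep

cut_sites = [800, 1500, 2300]                     # invented coordinates
probes = [(900, "+", 1), (1100, "+", 1), (1200, "-", 0),
          (1600, "+", -1), (1700, "-", -1)]
print(select_informative(probes, 1000, 2000, cut_sites))
```

In this toy example the probe at 1100 is discarded because its signal could originate from a transposon upstream of the gene, whereas the probe at 1700 is retained only because no transposon is evident anywhere in its fragment.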
To determine a list of candidate essential genes in an automated manner, we examined the score of informative probes overlapping each gene, across all arrays using both restriction enzymes. A simple sum of the scores was found to be the most robust indicator of the presence/absence of transposons. This exploits the presence of multiple probes per gene to minimize the impact of any aberrant signals due to non-specific hybridisation or the lack of normalisation. Genes with a total score of -4 or lower were automatically classified as essential. This cut-off was chosen empirically to minimize the number of false positive genes that were considered unlikely to be essential based on their annotation or prior experimental evidence, whilst retaining most known essential genes. However, the cut-off also results in the omission of genes that had fewer than 4 informative probes, due to their short length or the distribution of restriction sites. The R scripts used to implement the TMDH scoring system are available upon request.
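The summing step is simple enough to show directly. The following Python sketch uses the cut-off of -4 from the text; the gene names and probe scores are invented for illustration.

```python
def classify_genes(probe_scores_by_gene, cutoff=-4):
    """Sum informative-probe scores per gene; a total at or below `cutoff`
    marks the gene as a candidate essential gene."""
    result = {}
    for gene, scores in probe_scores_by_gene.items():
        total = sum(scores)
        result[gene] = (total, total <= cutoff)
    return result

# Hypothetical genes with scores pooled over all arrays and both enzymes.
toy = {
    "geneA": [-1, -1, -1, -1, -1, 0],   # consistently "off": candidate essential
    "geneB": [1, 1, -1, 0, 1],          # transposons present
    "geneC": [-1, -1, -1],              # too few informative probes to reach -4
}
for gene, (total, essential) in classify_genes(toy).items():
    print(gene, total, essential)
```

The third gene illustrates the limitation noted above: with only three informative probes the total cannot fall below -3, so the gene is never called essential by the automated method regardless of its true status.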
PCR-based identification of essential genes
Since the automated analysis of microarray data was not expected to be comprehensive, we devised a complementary PCR-based footprinting approach to generate a robust list of essential genes (see Figure 5). A PCR primer was designed between 50 and 300 bases upstream of the target gene. An outward-facing primer was also designed based on the mariner transposon sequence. These primers can be used to amplify a range of products, each corresponding to one of the transposon inserts in the library. The size of the products can be determined on an agarose gel, and from each of these the location of the insert can be determined. If a gene is essential, no transposons should be found within the boundaries of the gene, so no PCR products should be seen within the corresponding size range. Most of these genes were investigated further by DNA sequencing, and the gene was considered non-essential if any of the PCR products was confirmed to be derived from an intragenic transposon. An exception to this rule was made if a single PCR product was identified in the region corresponding to the C-terminal portion of the protein where a transposon insertion was considered unlikely to disrupt functionality. If a PCR product could not be obtained, the ability of the gene-specific primer to generate a PCR product was assessed by linker PCR (see Methods).
Figure 5 PCR footprinting strategy used to confirm or reject putative essential genes. a) Diagrammatic representation of the strategy. PCR is performed using a gene specific primer 50–300 bases upstream of the start codon, and a primer corresponding to ...
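The relationship between product size and insert position can be made explicit with a small Python sketch. The parameter names are hypothetical: `primer_upstream` is taken to be the distance from the 5' end of the gene-specific primer to the start codon, and `tn_primer_offset` the distance from the 5' end of the transposon primer to the end of the transposon. The product sizes are invented, and the resolution of an agarose gel means that real estimates would be approximate.

```python
def insert_offset(product_size, primer_upstream, tn_primer_offset):
    """Estimated position of the transposon relative to the start codon (0 = first base)."""
    return product_size - primer_upstream - tn_primer_offset

def intragenic_products(product_sizes, gene_length, primer_upstream, tn_primer_offset):
    """Return (product size, offset) for products whose insert falls inside the gene."""
    hits = []
    for size in product_sizes:
        offset = insert_offset(size, primer_upstream, tn_primer_offset)
        if 0 <= offset < gene_length:
            hits.append((size, offset))
    return hits

# Hypothetical 900 bp gene, primer 200 bases upstream, 50 bases of transposon sequence.
hits = intragenic_products([180, 400, 1100, 1300], gene_length=900,
                           primer_upstream=200, tn_primer_offset=50)
print(hits)
```

An empty result is consistent with an essential gene, whereas any hit would be a candidate for confirmation by sequencing. The exception for a single insert near the 3' end of the coding sequence is not modelled here.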
Manual inspection of the raw microarray data using an online genome browser (http://www-tmdh.vet.cam.ac.uk) allowed us to identify a number of candidate essential genes (mostly small essential genes that are not detectable by the automated analysis). These genes were investigated using the PCR method. To validate the microarray analysis we also performed the PCR confirmation step on all genes identified as essential by the automated method, and also on all genes not identified as essential in S. aureus, but which had an essential orthologue in B. subtilis.
S. aureus Essential Gene List
Following the preliminary microarray screen and automated analysis, 274 candidate essential S. aureus genes were identified. These, together with a further 235 candidates chosen as potentially essential by manual inspection of the microarray data, or because of their presence in the B. subtilis essential gene list [4], were further investigated by PCR and sequencing. Following this process, 351 S. aureus genes were identified that were not disrupted by transposons and constitute our putative S. aureus essential gene list. The genes were classified into the same functional categories as used for B. subtilis, and a summary of the findings is presented in Table 1. Additional file 1 contains a full list of the 351 essential genes in S. aureus, together with a comparison with the essential gene list from B. subtilis and a number of previously published S. aureus "essential gene" studies [13]. The results are also presented in a more compact form in Additional file 2. The full results of the PCR/sequencing analysis are available in Additional file 3. Unless otherwise stated, gene names below are those from the B. subtilis GenBank entry, with the S. aureus gene name in parentheses if different. Genes that are unnamed and do not have a B. subtilis orthologue are referred to by the S. aureus NCTC 8325 systematic nomenclature, with the prefix "SAOUHSC_".
Table 1 Tabulation of S. aureus essential genes by category.
Influence of experimental conditions on the essential gene list
Several studies have been carried out to determine the minimal set of genes that is essential for growth and replication of a bacterial cell [22]. However, any attempt to determine this experimentally will inevitably be influenced by the conditions under which the experiment is performed. A gene may be scored as essential in a particular assay because it is required for survival following exposure to a particular stress inherent in the method, or because it is involved in the uptake or metabolism of the particular nutrients provided in the growth media. An example of this in our method is the requirement for extended incubation of S. aureus at 44°C to remove temperature-sensitive replicons. Consequently, genes required for high temperature survival will be scored as putatively essential. To test this, we examined defined mutations in mrpF (classified as essential from the TMDH score) and all were shown to confer heat sensitivity in S. aureus (data not shown). Furthermore, mutations in clpP can cause a growth defect at high temperatures [25].
Transposon insertions may not be tolerated within a particular gene even if it is not truly "essential", since they may have a polar effect on the function of a downstream essential gene within the same operon. Examples of potential polarity issues include rimM and recU (possibly polar on trmD and pbp2, respectively). Under our experimental conditions, false positives may also be identified if transposon mutagenesis results in a reduced growth rate, since such mutants may be out-competed within the pool. Conversely, false negatives are possible since it may be possible to insert a transposon close to the 3' end of some essential genes without impairing gene function. Insertions close to the 5' end may also be possible if a suitable alternative initiation codon is available. This can also be considered an advantage of the technique, since the transposon screen effectively acts as a truncation assay that gives an indication of the minimal portion of the gene sequence required for function.
Evaluation of the microarray method as a screen
The microarray procedure alone was not expected to comprehensively identify the complete S. aureus essential gene list, but was applied as a screen to reduce the workload for the more robust but laborious PCR-based analysis. However, for some applications an exhaustive essential gene list may not be necessary, so it is useful to evaluate the efficacy of the microarray screen. The robust S. aureus essential gene list obtained using PCR footprinting allows the sensitivity (percentage of essential genes identified as such in the screen) and specificity (percentage of non-essential genes correctly identified by the screen) of the microarray procedure to be evaluated. Figure 6 shows a ROC curve of sensitivity against false positive rate (1-specificity) for cut-off values from -1 to -10. The curve suggests that a cut-off value of -4 would give an optimal balance between false positives and false negatives in future studies. This gives a specificity of 80.3%, with a sensitivity of 99.1%.
Figure 6 ROC curves showing sensitivity against false positive rate (1-specificity) for the TMDH microarray screens using individual restriction enzymes, and the combined data from both, for cut-off values from -1 to -10.
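The points of such a ROC curve can be computed as in the following Python sketch, which applies the definitions of sensitivity and specificity given above to each cut-off from -1 to -10. The gene scores and the reference set of essential genes are invented, and the function name is hypothetical.

```python
def roc_points(gene_scores, essential, cutoffs=range(-1, -11, -1)):
    """Return (cut-off, sensitivity, false positive rate) for each cut-off.

    gene_scores: dict of gene -> summed probe score
    essential  : set of genes in the reference (PCR-confirmed) essential list
    """
    genes = set(gene_scores)
    positives = essential & genes
    negatives = genes - essential
    points = []
    for c in cutoffs:
        called = {g for g, s in gene_scores.items() if s <= c}
        tp = len(called & positives)
        fp = len(called & negatives)
        sensitivity = tp / len(positives)
        false_positive_rate = fp / len(negatives)   # equals 1 - specificity
        points.append((c, sensitivity, false_positive_rate))
    return points

# Invented example: three reference essential genes among eight.
scores = {"a": -8, "b": -5, "c": -2, "d": -6, "e": 0, "f": -1, "g": 3, "h": -4}
reference = {"a", "b", "c"}
for cutoff, sens, fpr in roc_points(scores, reference):
    print(cutoff, round(sens, 3), round(fpr, 3))
```

Running the same function on the scores from each restriction enzyme separately, and on the combined scores, would give the three curves that are compared in the figure.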
The use of multiple restriction enzymes for TMDH has theoretical advantages, but increases the expense of the method since it involves an additional set of microarrays. Knowledge of the robust essential gene list allows us to evaluate our use of two restriction enzymes, by investigating the data from each microarray screen separately. The analysis of the individual microarray screens is also shown in Figure 6. In these analyses, a cut-off value of -3 is optimal. The AluI experiment performed better than the experiment using RsaI, with specificities of 74.1% and 70.1%, respectively, but both are inferior to the combined dataset. The choice of whether to use a single or multiple enzymes for future studies will depend on the application and available resources.