Corynebacterium diphtheriae is a Gram-positive, non-spore forming, non-motile, pleomorphic rod belonging to the genus Corynebacterium and the actinomycete group of organisms. The organism produces a potent bacteriophage-encoded protein exotoxin, diphtheria toxin (DT), which causes the symptoms of diphtheria. This potentially fatal infectious disease is controlled in many developed countries by an effective immunisation programme. However, the disease has made a dramatic return in recent years, in particular within the Eastern European region. The largest, and still on-going, outbreak since the advent of mass immunisation started within Russia and the newly independent states of the former Soviet Union in the 1990s. We have sequenced the genome of a UK clinical isolate (biotype gravis strain NCTC13129), representative of the clone responsible for this outbreak. The genome consists of a single circular chromosome of 2 488 635 bp, with no plasmids. It provides evidence that recent acquisition of pathogenicity factors goes beyond the toxin itself, and includes iron-uptake systems, adhesins and fimbrial proteins. This is in contrast to Corynebacterium’s nearest sequenced pathogenic relative, Mycobacterium tuberculosis, where there is little evidence of recent horizontal DNA acquisition. The genome itself shows an unusually extreme large-scale compositional bias, being noticeably higher in G+C near the origin than at the terminus.
Corynebacterium diphtheriae was shown to be the cause of the acute, communicable disease diphtheria after being isolated from diphtheritic pseudomembranes in the late 19th century; shortly after this, the diphtheria toxin (DT) was purified by filtration from C.diphtheriae cultures. Following this discovery, antitoxin prepared from an experimental animal was successfully used by von Behring to treat a case of diphtheria, leading to the introduction of antitoxin therapy. His contribution was acknowledged with the award of the Nobel Prize in Physiology or Medicine in 1901 (1). Despite this history, surprisingly little is known about the biology of this microorganism, although there have been extensive studies published on its clinical pathology and epidemiology (reviewed in 1).
After infection (by direct contact, sneezing or coughing), C.diphtheriae can colonise the skin and/or the upper respiratory tract where it releases DT, causing the symptoms of the disease. The toxin can also be absorbed by the circulatory system and distributed to distant organs, such as the heart (myocardium) or peripheral nervous system. In respiratory diphtheria the disease develops in the posterior structures of the mouth and the proximal pharynx, producing a membrane on one or both tonsils. The microorganism then multiplies on the surface of this membrane, resulting in the formation of the pseudomembrane, which is initially white and becomes grey later on in the infection. The coating of the trachea by the pseudomembrane can reduce the air flow and may eventually result in complete blockage, causing suffocation and death (2).
The lack of molecular investigation of this organism means that there is a limited knowledge of the factors involved in the colonisation of mucosal sites or indeed any other virulence factors that might be associated with invasion, carriage or proliferation. The main reason for this has been the poorly developed genetic systems for C.diphtheriae, making it difficult to identify and characterise these factors.
In contrast to the lack of general knowledge of the organism, the toxin itself, its transcriptional activation, and mechanisms of action have been very well studied. It is known to be encoded on a mobile temperate bacteriophage (corynephage) (3) and is exported via the general secretory pathway. On encountering a host cell it translocates into the cytoplasm and inhibits cellular protein synthesis by ADP ribosylation of elongation factor 2 (1).
Until very recently (4) there has also been a lack of a good animal model for respiratory diphtheria, due to the fact that mice and rats are naturally resistant to DT. Bitransgenic mice have been developed whose cells efficiently express the DT receptor, making them sensitive to the toxin and therefore the first animal model in which this disease and newly developed antidotes can be thoroughly studied.
This project was initiated to generate the genomic sequence and analysis of a pathogenic Corynebacterium in order to help develop and make the most of these new genetic systems so as to gain a better understanding of the biology and virulence of this microorganism. Diphtheria cases are still being reported from every single region of the World Health Organisation and new epidemics occur regularly, as for example in South East Asia and South America. The disease is also still endemic in many parts of the world, which has severe implications for the developed countries that have successful immunisation programmes.
In this work we describe the complete sequence and analysis of C.diphtheriae biotype gravis strain NCTC13129, a clinical toxigenic isolate from the current Eastern European outbreak.
Corynebacterium diphtheriae biotype gravis, NCTC13129 was isolated in 1997 from the pharyngeal membrane of a 72-year-old female with clinical diphtheria who had returned to the UK from a Baltic cruise (5). The organism was cultured onto Columbia blood agar (Oxoid, Basingstoke, UK) and characterised by standard microbiological methods (6). Chromosomal DNA was extracted directly from the plate culture using previously described methods (7).
The genome sequence was obtained from 60 750 end sequences (giving 9.2× coverage) derived from two pUC18 genomic shotgun libraries (with insert sizes of 1.4–2 and 2–4 kb) using dye terminator chemistry on ABI377 automated sequencers. End sequences from a large insert BAC library (pBACe3.6; 1.1× clone coverage, 6–9 kb insert size) were used as a scaffold. All identified repeats were bridged by read-pairs or end-sequenced PCR products. The sequence was assembled, finished and annotated as described previously (8).
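The coverage figure above can be checked with simple arithmetic. The sketch below assumes the standard relation coverage = (number of reads × mean read length) / genome size; the mean usable read length is not stated in the text, so it is back-calculated here rather than quoted, and the function names are illustrative only.

```python
GENOME_SIZE_BP = 2_488_635  # chromosome length reported in the text
N_READS = 60_750            # shotgun end sequences reported in the text
COVERAGE = 9.2              # fold coverage reported in the text


def fold_coverage(genome_size_bp, n_reads, mean_read_len_bp):
    """Fold coverage = total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def implied_mean_read_length(genome_size_bp, n_reads, coverage):
    """Mean read length implied by a stated coverage (an inference, not a reported value)."""
    return coverage * genome_size_bp / n_reads


if __name__ == "__main__":
    read_len = implied_mean_read_length(GENOME_SIZE_BP, N_READS, COVERAGE)
    print(f"Implied mean usable read length: {read_len:.0f} bp")
```

Under these assumptions the reported numbers imply a mean usable read length of a little under 380 bp.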
The final chromosome sequence was searched with Orpheus (9) and Glimmer2 (10) in order to identify possible coding sequences (CDSs) and the results were curated manually. Predicted proteins were searched against the public databases using FASTA (11) and BLASTP (12), and protein domains were identified using Pfam (13) and Prosite (14). The results of all searches were collated using Artemis (15) to facilitate annotation. Orthologous proteins between C.diphtheriae and M.tuberculosis (16) were identified as reciprocal best matches using FASTA with subsequent manual curation. Pseudogenes had one or more mutations that would prevent full translation; each of the inactivating mutations was subsequently checked against the original sequencing data.
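The reciprocal-best-match criterion used for orthologue assignment can be sketched as follows. This is a minimal illustration, assuming the pairwise search results have already been reduced to a similarity score per (query, subject) pair; the input format, function names and identifiers in the usage example are hypothetical, and the manual curation step described above is not modelled.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query_id, subject_id) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)


if __name__ == "__main__":
    # Toy scores with made-up identifiers, for illustration only.
    a_vs_b = {("cd_1", "mt_1"): 900, ("cd_1", "mt_2"): 300, ("cd_2", "mt_2"): 450}
    b_vs_a = {("mt_1", "cd_1"): 880, ("mt_2", "cd_1"): 310, ("mt_2", "cd_2"): 200}
    print(reciprocal_best_matches(a_vs_b, b_vs_a))
```

In the toy data, cd_2's best hit is mt_2, but mt_2's best hit is cd_1, so only the cd_1/mt_1 pair is reported.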
The sequence and annotation of the genome have been submitted to the DDBJ/EMBL/GenBank databases with the accession no. BX248353.
The general features of the C.diphtheriae genome are shown in Figure 1 and Table 1. Metabolic analysis revealed a complete set of enzymes for the glycolysis, gluconeogenesis and pentose-phosphate pathways. The citrate cycle (TCA cycle) appears to be complete except for the conversion between succinate and succinyl-CoA. The usual bacterial enzyme catalysing this step, succinyl-CoA synthetase [encoded by sucC and sucD, which are both present in Corynebacterium efficiens (17)], is absent; instead, C.diphtheriae may utilise the product of DIP1902, a homologue of the Clostridium kluyveri cat1 gene, which has been shown to act as a succinyl-CoA:coenzyme A transferase (18). As expected, both aerobic and anaerobic respiration genes are present. All the de novo amino acid biosynthesis pathways are present, as is the purine nucleotide biosynthetic pathway. Conversely, the pyrimidine pathway seems to lack the final cytidine triphosphate synthetase (PyrG), which is present in M.tuberculosis, Corynebacterium glutamicum (19,20) and C.efficiens, although the pathway leading to the biosynthesis of thymidine nucleotide seems complete. Pantothenate, CoA and biotin production pathways are complete, although that for folic acid is apparently not.
Corynebacteria are characterised by their high G+C content as well as irregular cell morphology; however, within the genus Corynebacterium the G+C content is broad (51–70%) and this reflects its genetic diversity. The almost universal bacterial bias towards guanine (G) on the leading strand of the bidirectional replication fork (21) is conserved, allowing the designation of an origin of replication near base 1, and a terminus around base 1 249 000. However, a highly unusual feature of the C.diphtheriae genome is that the G+C content itself is not constant across the genome (Fig. 2a). Strikingly, there is a region of ~740 kb (approximate bases 981 700–1 720 500), which encompasses the terminus of replication, that has a significantly lower G+C content (49.99% overall, reaching a trough of 48.08%) than the remainder of the genome (54.96%). This region does not have clearly defined boundaries; indeed there seems to be a gradual transition from a higher G+C content in the region near the origin to a lower content in the region around the terminus. This change in G+C content is not due to recent acquisition of genes, or to a bias in the position of certain types or classes of gene, as a comparison with the genome of M.tuberculosis (16), a distantly related actinomycete, demonstrates that orthologous genes between the two genomes are spread across both sections of the genome (Fig. 1 and Supplementary Material Fig. 1). This is also evident in closer comparisons; the entire length of the C.diphtheriae genome is co-linear with the backbone of C.glutamicum (19,20) and that of C.efficiens (17), both non-pathogenic bacteria that show no such large genomic changes in their G+C content (Fig. 3). Further investigation of the C.diphtheriae genome indicated that almost all of the variation is due to changes in the third codon position within CDSs (Fig. 2b), and within non-protein-coding regions (Fig. 2c).
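The two measurements discussed above, G+C content along the chromosome and G+C content at third codon positions, can be illustrated with a short sketch. The window and step sizes are left as parameters because the values used for the figures are not given in the text; the function names are illustrative, and the third-position function assumes it is handed an in-frame coding sequence.

```python
def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window, step):
    """Return (start, G+C fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def gc3(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    return gc_fraction(cds[2::3])


if __name__ == "__main__":
    # Toy sequence: a G+C-rich half followed by an A+T-rich half.
    toy = "GGCCGGCC" + "AATTAATT"
    print(windowed_gc(toy, window=8, step=8))
    print(gc3("ATGAAA"))
```

Plotting the windowed values against chromosome position, separately for whole windows, third codon positions and non-coding sequence, is one way to reproduce the kind of comparison described for the figure panels.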
A recent analysis (22) has shown that many bacterial genomes are structured in this way, with a lower G+C content near the terminus. In most cases, however, this was only detectable by measuring cumulative changes in third-position GC content in all genes across the genome; clearly the bias in C.diphtheriae is considerably stronger than these other genomes. It was hypothesised that this change could be due to structural constraints around the terminus, or to differential mutational pressures around the terminus leading to an increase in GC to AT changes. This would suggest that there is some temporal or physical compartmentalisation of the genome at some stage; the most obvious candidate being chromosomal replication [it has been shown in Escherichia coli that the pre- and post- replication sections of the chromosome occupy different areas of the cell (23)]. It is intriguing in this context that this extreme bias is only apparent in C.diphtheriae, not C.glutamicum or C.efficiens; this may reflect different environmental mutational pressures for the pathogenic versus the environmental species.
The occurrence of Rag (RGNAGGGS) motifs within the C.diphtheriae genome agrees with studies performed by Lobry and Louarn (24). These motifs correspond to a family of G-rich octamers whose skew strongly shifts near the origin and the terminus of replication, and this is maintained in the C.diphtheriae genome even in the low G+C region surrounding the terminus of replication. The point at which the skew shifts near the terminus is marked by dif, a site devoted to chromosome dimer resolution. It has been proposed that these polarised Rag motifs are involved in facilitating the attachment of the septum-anchored protein FtsK to the chromosome, so preventing the capture of this region by the septum and facilitating dimer resolution.
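A strand skew for a degenerate motif such as RGNAGGGS can be computed by counting matches on each strand. The sketch below expands the IUPAC codes (R = A or G, N = any base, S = C or G) into a regular expression and reports (forward − reverse) / (forward + reverse); this normalisation and the function names are assumptions made for illustration, not necessarily the measure used in the cited study.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_regex(motif):
    """Compile an IUPAC motif; the lookahead lets overlapping sites be counted."""
    return re.compile("(?=" + "".join(IUPAC[code] for code in motif) + ")")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def strand_counts(seq, motif="RGNAGGGS"):
    """Count motif occurrences on the given strand and on its reverse complement."""
    pattern = motif_regex(motif)
    seq = seq.upper()
    forward = sum(1 for _ in pattern.finditer(seq))
    reverse = sum(1 for _ in pattern.finditer(reverse_complement(seq)))
    return forward, reverse


def motif_skew(seq, motif="RGNAGGGS"):
    """(forward - reverse) / (forward + reverse); 0.0 when the motif is absent."""
    forward, reverse = strand_counts(seq, motif)
    total = forward + reverse
    return (forward - reverse) / total if total else 0.0
```

Applying `motif_skew` to consecutive windows along the chromosome would be expected to show the sign change at the origin and terminus that the text describes.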
Local anomalies in nucleotide composition, such as G+C content, GC skew [(G–C) / (G+C)] and/or dinucleotide frequency of the DNA can potentially be indicative of recent acquisition of DNA. In the C.diphtheriae NCTC13129 genome we have identified 13 regions (including the corynephage) matching some or all of these criteria, many of which are flanked by tRNAs (Fig. 1 and Table 2). Subsequent comparisons with C.glutamicum and C.efficiens have shown that none of these regions are present in the genomes of these two environmental strains. It is also likely that there are other, smaller, regions that may have been horizontally acquired. Many genes that could contribute to the pathogenicity of C.diphtheriae are found within these putative pathogenicity islands (PAIs). These PAIs encode the vast majority of the fimbrial and fimbria-related genes, as well as iron-uptake systems, a potential siderophore biosynthesis system and a lantibiotic biosynthesis system. This is reminiscent of the situation in the Enterobacteriaceae, where alternative (often mobile) pathogenicity determinants, allowing different host interactions and pathogenic lifestyles, are superimposed on a stable backbone encoding core functions (25), and similar to that of another Gram-positive pathogen, Staphylococcus aureus, in which some pathogenic determinants such as staphylococcal superantigen determinants are carried on staphylococcal PAIs (SaPIs) (26). However, this is in strong contrast to the situation described in the closest sequenced pathogenic relative of C.diphtheriae, M.tuberculosis, where pathogenicity appears to be a function of diverse factors encoded throughout the genome, and PAIs seem to be absent (16).
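The GC skew formula quoted above, and the idea of flagging compositionally atypical regions, can be sketched as follows. The z-score rule and its cutoff are hypothetical choices made for this illustration; the text does not state how the 13 regions were delimited, and dinucleotide frequency is not modelled here.

```python
from statistics import mean, pstdev


def gc_skew(seq):
    """GC skew as defined in the text: (G - C) / (G + C)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def gc_content(seq):
    """Fraction of G+C in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(seq, window, step, z_cutoff=2.0):
    """Flag windows whose G+C content deviates from the mean of all windows.

    z_cutoff is an arbitrary illustrative threshold, not a value from the text.
    """
    starts = list(range(0, len(seq) - window + 1, step))
    values = [gc_content(seq[s:s + window]) for s in starts]
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(s, v) for s, v in zip(starts, values) if abs(v - mu) / sigma > z_cutoff]
```

Candidate regions found this way would still need the kind of supporting evidence mentioned in the text, such as flanking tRNA genes and absence from related genomes.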
DT is one of the most widely studied bacterial toxins (reviewed in 1). In NCTC13129 the tox gene (DIP0222) encoding DT is situated within the right-hand end of an integrated corynephage (bases 154 153–190 718), just inside the att site, within a discrete region of low G+C content (42.54%) (Fig. 4). This arrangement, suggesting recent acquisition of the tox gene by the phage, is similar to that in several pathogenicity determinant-encoding phages in Streptococcus pyogenes (27). DtxR is an iron-dependent negative-regulatory protein in C.diphtheriae that has been shown, under high iron conditions, to transcriptionally repress the tox gene, the corynebacterial siderophore and some other components of the high-affinity iron-uptake system (28). Iron limitation is a common mechanism by which hosts can suppress bacterial growth, and thus low iron is a common environmental cue for pathogenic bacteria, to which the expression of DT has been coupled. Pathogenic bacteria need specialised mechanisms for acquiring iron, often by the manufacture of secreted high-affinity iron sequestration molecules termed siderophores. Only one ferrisiderophore receptor has been described before in C.diphtheriae, Irp6A. It is situated upstream of a putative iron-uptake system (DIP0109–DIP0111) under the control of a DtxR-recognised promoter (29). Siderophores are often manufactured by complex polyketide or non-ribosomal peptide synthases, and two large candidate genes for siderophore biosynthesis are DIP2160 (7.9 kb), a predicted modular polyketide synthase with similarity to the Streptomyces verticillus bleomycin biosynthesis polyketide synthase BlmVIII (30), and DIP2161 (5.2 kb), a predicted non-ribosomal peptide synthase with similarities to the Pseudomonas aeruginosa pyochelin synthetase PchF (31), which are situated together in a potential PAI (Table 2).
Downstream of these genes are a pair of ABC transporters with similarity to the Yersinia pestis ATP-binding protein YbtP required for iron transport, itself situated just downstream of the Yersiniabactin biosynthesis cluster (32). In all, seven putative iron-uptake systems have been found in the C.diphtheriae genome, two of which have been previously described: the siderophore receptor Irp6A and the hemin utilisation gene cluster hmu (Supplementary Material Table 1). Of these seven systems, only two are present in C.glutamicum (hmu and that encoded by DIP1059–DIP1063) and none in C.efficiens.
Fimbriae (or pili) in C.diphtheriae have been previously described (33) although not molecularly characterised. The only actinomycetes in which fimbriae have been fully characterised are Actinomyces naeslundii and Actinomyces viscosus (34). These are the dominant commensal Actinomyces spp. on dental and mucosal surfaces of numerous animal hosts, although some have been implicated in infection. They present type 1 and type 2 fimbriae that bind to a number of host proteins. These fimbrial systems are completely unlike any described in Gram-negative systems, but their components instead show similarity to sortases and sortase-processed proteins. Sortases are membrane-bound transpeptidases that covalently link surface proteins to the cell wall peptidoglycan. This is achieved through recognition of one of the conserved motifs LP/AXTG or NPQTG just upstream of a C-terminal hydrophobic signal peptide sequence and subsequent cleavage at, and linkage of, the threonine via an amide bond to an amino group of the peptidoglycan. Such systems are often used by Gram-positive pathogens to attach host-interacting proteins to the bacterial surface (35). The mechanisms of production and polymerisation of the Actinomyces fimbriae are unknown, but the obvious possibility is that the sortase-like proteins are involved in anchoring or even polymerising the fimbrial subunits. The C.diphtheriae genome contains six genes encoding putative sortases: DIP0233, DIP0236, DIP2012, DIP2224, DIP2225 and DIP2272 (Supplementary Material Table 2). The last of these appears to be part of the backbone of the chromosome, as it is present in both C.glutamicum and C.efficiens, and hence may be the ‘housekeeping’ sortase, while the other five (which are more closely related to each other than to DIP2272) are located in potential PAIs and are not present in the two non-pathogenic corynebacterial genomes. 
It seems possible, therefore, that C.diphtheriae has recently acquired a sortase-related fimbrial system similar to that in Actinomyces that would aid the bacterium in its early stages of invasion and adherence to the host cell surfaces. A total of 18 CDSs were found with correctly situated potential sortase anchor sites; many of these are associated with fimbrial genes, and are similar to fimbria-related proteins from Actinomyces (Supplementary Material Table 3). Other significant sortase-anchored proteins include DIP2093, which shows weak similarity to members of the Ser–Asp repeat (Sdr) family of adhesins from staphylococci (36).
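A search for "correctly situated" sortase anchor sites of the kind described above can be sketched as a motif match near the C-terminus followed by a hydrophobic stretch. Several things here are assumptions made for illustration: "LP/AXTG" is read as L, then P or A, then any residue, then TG; the hydrophobic alphabet, the 50-residue tail and the minimum run length are arbitrary; and this is not the procedure used to obtain the 18 CDSs reported in the text.

```python
import re

# Assumed reading of the motifs in the text: L[PA]xTG or NPQTG.
ANCHOR_MOTIF = re.compile(r"L[PA].TG|NPQTG")
# Hypothetical hydrophobic alphabet for the C-terminal segment.
HYDROPHOBIC = set("AVLIMFWGC")


def longest_hydrophobic_run(peptide):
    """Length of the longest uninterrupted run of hydrophobic residues."""
    longest = current = 0
    for residue in peptide:
        current = current + 1 if residue in HYDROPHOBIC else 0
        longest = max(longest, current)
    return longest


def has_sortase_anchor(protein, tail=50, min_hydrophobic=10):
    """True if an anchor motif in the last `tail` residues precedes a hydrophobic run."""
    protein = protein.upper()
    region = protein[max(0, len(protein) - tail):]
    for match in ANCHOR_MOTIF.finditer(region):
        if longest_hydrophobic_run(region[match.end():]) >= min_hydrophobic:
            return True
    return False
```

Hits from a pattern search like this are only candidates; the text's assignments also rest on similarity to known fimbria-related proteins and on genomic context.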
In addition to such apparently recently acquired elements, some other genes important for the pathogenic lifestyle of C.diphtheriae are found in what could be considered to be core regions. For example, the cell wall of actinomycetes is considered to be an important pathogenicity factor. The cell walls of C.diphtheriae and M.tuberculosis share several common features; both contain an arabinogalactan polymer that anchors an outer lipid-rich domain to the murein sacculus of the cell. The detailed structure of the corynebacterial arabinogalactan, however, remains to be defined. Although corynomycolic acids are significantly smaller than their mycobacterial counterparts, their basic construction is similar. Like the mycolic acids, they are alpha-alkyl, beta-hydroxy fatty acids produced via a Claisen-like condensation of two fatty acyl chains (37). Fatty acid synthesis in plants and mammals occurs via a multifunctional polypeptide (FAS-I type system encoded by fas) that carries all of the necessary enzymatic functions. In most bacteria, however, de novo synthesis occurs via a dissociated multi-enzyme FAS-II system. In mycobacteria de novo fatty acid synthesis is carried out through FAS-I, with further extensions performed by FAS-II leading to the long-chain mycolic acids (37). Consistent with the fact that direct condensation of FAS-I products should be sufficient for the synthesis of corynomycolates, our analysis of the C.diphtheriae genome has revealed that only a fas homologue is present. Moreover, this suggests that FAS-I may be a common means of de novo fatty acid synthesis in actinomycetes. However, FAS-II systems capable of de novo synthesis do occur in Streptomyces (38).
The esterification of mycolates to the arabinogalactan is catalysed by a family of mycolyltransferases known as the antigen 85 complex (FbpA,B,C1,C2) (39). A similar function has been ascribed to the product of csp1 in C.glutamicum (40). In both M.tuberculosis and C.diphtheriae the genes encoding these enzymes are situated in a highly conserved region of the genome that carries several cell wall-related functions. In mycobacteria and C.diphtheriae two ‘mycolyltransferase’ genes lie downstream of the galactosyltransferase implicated in cell wall galactan polymerisation. In M.tuberculosis fbpA is followed by a homologue, fbpC1, which lacks the catalytic triad found in other members of the antigen 85 complex; sequence alignments suggest that both C.diphtheriae enzymes (encoded by csp1 and DIP2194) possess this catalytic triad, which is necessary for corynomycolyltransferase activity.
In mycobacteria the complete embCAB cluster is required to produce the arabinan domain of arabinogalactan, and these three membrane proteins have been implicated as arabinosyltransferases. Only one putative arabinosyltransferase gene, emb (DIP0159), is apparent in the C.diphtheriae genome. In M.tuberculosis, arabinogalactan is attached to peptidoglycan via a rhamnose-N-acetylglucosamine disaccharide linker unit. All the necessary enzymes appear to be conserved in C.diphtheriae to form this linker. The genes required for the synthesis of the activated sugar donor dTDP-rhamnose (rmlABCD) in M.tuberculosis have recently been characterised (41). In C.diphtheriae NCTC13129 the orthologues of rmlCD appear to be fused (DIP0361) to form a bifunctional protein. The orthologue of rfbE, which encodes the putative ligase implicated in attachment of arabinogalactan to the peptidoglycan, is also located close to emb in a similar genetic context to that in M.tuberculosis. The major difference in the organisation of the genes involved in mycolylarabinogalactan synthesis in these bacteria is that emb and rfbE are located 468.8 kb distant from the glfT homologue in C.diphtheriae rather than in a single cluster as in M.tuberculosis.
It is known that some strains of C.diphtheriae exhibit sialidase (neuraminidase) and trans-sialidase activity (42). As these activities have been linked to virulence in several other microbial pathogens, and may enhance fimbrial-mediated adhesion in actinomycetes by unmasking receptors on mammalian cells (43), they represent potential virulence factors in C.diphtheriae. Furthermore, sialidases and trans-sialidases have proven attractive drug and vaccine targets in some pathogens. The NCTC13129 genome encodes two putative sialidases. DIP0330 is encoded within a four-gene insertion relative to the C.glutamicum genome but appears to lack a signal peptide or trans-membrane domain, and DIP0543 is encoded by a single-gene insertion and possesses a signal peptide, a coiled-coil domain and a C-terminal trans-membrane domain.
The recent diphtheria epidemics have emphasised that continuous expansion in the depth of knowledge of basic biological and genetic mechanisms, which could affect the organism’s adaptability and pathogenicity, will remain one of our most powerful tools in the fight against diphtheria. The data from the genome sequence have allowed us to characterise a number of putative virulence factors, such as adhesins or fimbrial-related proteins, which could be used as targets for diagnostic reagents and antimicrobials, and as potential vaccine candidates against invasive diphtheria. Unlike its closest sequenced pathogenic relative, M.tuberculosis (16), C.diphtheriae appears to have recently acquired many genes necessary for survival, attachment and virulence in the host. This difference may be a reflection of the different environments of the two organisms; M.tuberculosis is a predominantly intracellular pathogen and thus has less opportunity for genetic exchange than does the extracellular C.diphtheriae.
Supplementary Material, including a linear gene map and functional classification of identified genes, is available at NAR Online.
We are very grateful to David Hopwood for his critical reading of the manuscript. We would like to acknowledge the support of the Wellcome Trust Sanger Institute core sequencing and informatics groups. This work was supported by the Wellcome Trust through its Beowulf Genomics initiative.
DDBJ/EMBL/GenBank accession no. BX248353