The Gram-negative bacterium Yersinia pestis is the causative agent of the bubonic plague. Efficient iron acquisition systems are critical to the ability of Y. pestis to infect, spread and grow in mammalian hosts, because iron is sequestered and is considered part of the innate host immune defence against invading pathogens. We used a proteomic approach to determine expression changes of iron uptake systems and intracellular consequences of iron deficiency in the Y. pestis strain KIM6+ at two physiologically relevant temperatures (26°C and 37°C).
Differential protein display was performed for three Y. pestis subcellular fractions. Five characterized Y. pestis iron/siderophore acquisition systems (Ybt, Yfe, Yfu, Yiu and Hmu) and a putative iron/chelate outer membrane receptor (Y0850) were increased in abundance in iron-starved cells. The iron-sulfur (Fe-S) cluster assembly system Suf, adapted to oxidative stress and iron starvation in E. coli, was also more abundant, suggesting functional activity of Suf in Y. pestis under iron-limiting conditions. Metabolic and reactive oxygen-deactivating enzymes dependent on Fe-S clusters or other iron cofactors were decreased in abundance in iron-depleted cells. These data were consistent with lower activities of aconitase and catalase in iron-starved vs. iron-rich cells. In contrast, pyruvate oxidase B, which metabolizes pyruvate via electron transfer to ubiquinone-8 for direct utilization in the respiratory chain, was strongly increased in abundance and activity in iron-depleted cells.
Many protein abundance differences were indicative of the important regulatory role of the ferric uptake regulator Fur. Iron deficiency seems to result in a coordinated shift from iron-utilizing to iron-independent biochemical pathways in the cytoplasm of Y. pestis. With growth temperature as an additional variable in proteomic comparisons of the Y. pestis fractions (26°C and 37°C), there was little evidence for temperature-specific adaptation processes to iron starvation.
Yersinia pestis proteins were sequentially extracted from crude membranes with a high salt buffer (2.5 M NaBr), an alkaline solution (180 mM Na2CO3, pH 11.3) and membrane denaturants (8 M urea, 2 M thiourea and 1% amidosulfobetaine-14). Separation of proteins by 2D gel electrophoresis was followed by identification of more than 600 gene products by MS. Data from differential 2D gel display experiments, comparing protein abundances in cytoplasmic, periplasmic and all three membrane fractions, were used to assign proteins found in the membrane fractions to three protein categories: (i) integral membrane proteins and peripheral membrane proteins with low solubility in aqueous solutions (220 entries); (ii) peripheral membrane proteins with moderate to high solubility in aqueous solutions (127 entries); (iii) cytoplasmic or ribosomal membrane-contaminating proteins (80 entries). Thirty-one proteins were experimentally associated with the outer membrane (OM). Circa 50 proteins thought to be part of membrane-localized, multi-subunit complexes were identified in high Mr fractions of membrane extracts via size exclusion chromatography. These data supported biologically meaningful assignments of many proteins to the membrane periphery. Since only 32 inner membrane (IM) proteins with two or more predicted transmembrane domains (TMDs) were profiled in 2D gels, we resorted to a proteomic analysis by 2D-LC-MS/MS. Ninety-four additional IM proteins with two or more TMDs were identified. The total number of proteins associated with Y. pestis membranes increased to 456 and included representatives of all six β-barrel OM protein families and 25 distinct IM transporter families.
Here we report the use of a multi-genome DNA microarray to investigate the genome diversity of Bacillus cereus group members and elucidate the events associated with the emergence of B. anthracis, the causative agent of anthrax, a lethal zoonotic disease. We initially performed directed genome sequencing of seven diverse B. cereus strains to identify novel sequences encoded in those genomes. The novel genes identified, combined with those publicly available, allowed the design of a “species” DNA microarray. Comparative genomic hybridization analyses of 41 strains indicate that substantial heterogeneity exists with respect to the genes comprising functional role categories. While the acquisition of the plasmid-encoded pathogenicity island (pXO1) and capsule genes (pXO2) represents a crucial landmark dictating the emergence of B. anthracis, the evolution of this species and its close relatives was associated with an overall shift in the fraction of genes devoted to energy metabolism, cellular processes, transport, as well as virulence.
The pathogenic mold Aspergillus fumigatus is the most frequent infectious cause of death in severely immunocompromised individuals such as leukemia and bone marrow transplant patients. Germination of inhaled conidia (asexual spores) in the host is critical for the initiation of infection, but little is known about the underlying mechanisms of this process.
To gain insights into early germination events and facilitate the identification of potential stage-specific biomarkers and vaccine candidates, we have used quantitative shotgun proteomics to elucidate patterns of protein abundance changes during early fungal development. Four different stages were examined: dormant conidia, isotropically expanding conidia, hyphae in which germ tube emergence has just begun, and pre-septation hyphae. To enrich for glycan-linked cell wall proteins we used an alkaline cell extraction method. Shotgun proteomics resulted in the identification of 375 unique gene products with high confidence, with no evidence for enrichment of cell wall-immobilized and secreted proteins. The most interesting discovery was the identification of 52 proteins enriched in dormant conidia, including 28 proteins that have never been detected in the A. fumigatus conidial proteome, such as signaling protein Pil1, chaperones BipA and calnexin, and transcription factor HapB. Additionally, we found many small, Aspergillus-specific proteins of unknown function, including 17 hypothetical proteins. Notably, the most abundant protein, Grg1 (AFUA_5G14210), was also one of the smallest proteins detected in this study (M.W. 7,367). Among previously characterized proteins were melanin pigment and pseurotin A biosynthesis enzymes, histones H3 and H4.1, and other proteins involved in conidiation and response to oxidative or hypoxic stress. In contrast, expanding conidia, hyphae with early germ tubes, and pre-septation hyphae samples were enriched for proteins responsible for housekeeping functions, particularly translation, respiratory metabolism, amino acid and carbohydrate biosynthesis, and the tricarboxylic acid cycle.
The observed temporal expression patterns suggest that the A. fumigatus conidia are dominated by small, lineage-specific proteins. Some of them may play key roles in host-pathogen interactions, signal transduction during conidial germination, or survival in hostile environments.
Mass spectrometry; LC-MS/MS; APEX; Shotgun proteomics; Aspergillus fumigatus; Germination; Spore; Conidia; Fungi; Hypothetical proteins
Here we report the use of a multi-genome DNA microarray to elucidate the genomic events associated with the emergence of the clonal variants of H. influenzae biogroup aegyptius causing Brazilian Purpuric Fever (BPF), an important pediatric disease with a high mortality rate. We performed directed genome sequencing of strain HK1212 unique loci to construct a species DNA microarray. Comparative genome hybridization using this microarray enabled us to determine and compare gene complements, and infer reliable phylogenomic relationships among members of the species. The higher genomic variability observed in the genomes of BPF-related strains (clones) and their close relatives may be characterized by significant gene flux related to a subset of functional role categories. We found the acquisition of a large number of virulence determinants featuring numerous cell membrane proteins, coupled to the loss of genes involved in transport, central biosynthetic pathways and, in particular, energy production pathways, to be characteristic of the BPF genomic variants.
Haemophilus; Brazilian Purpuric Fever; pathogen emergence; virulence; comparative genomics; microarray
Shigella dysenteriae serotype 1 (SD1) causes the most severe form of epidemic bacillary dysentery. Quantitative proteome profiling of SD1 in vitro (derived from LB cell cultures) and in vivo (derived from gnotobiotic piglets) was performed by 2D-LC-MS/MS and APEX, a label-free, computationally modified spectral counting methodology.
Overall, 1761 proteins were quantitated at a 5% FDR (false discovery rate), including 1480 and 1505 from in vitro and in vivo samples, respectively. Identification of 350 cytoplasmic membrane and outer membrane (OM) proteins (38% of in silico predicted SD1 membrane proteome) contributed to the most extensive survey of the Shigella membrane proteome reported so far. Differential protein abundance analysis using statistical tests revealed that SD1 cells switched to an anaerobic energy metabolism under in vivo conditions, resulting in an increase in fermentative, propanoate, butanoate and nitrate metabolism. Abundance increases of transcription activators FNR and Nar supported the notion of a switch from aerobic to anaerobic respiration in the host gut environment. High in vivo abundances of proteins involved in acid resistance (GadB, AdiA) and mixed acid fermentation (PflA/PflB) indicated bacterial survival responses to acid stress, while increased abundance of oxidative stress proteins (YfiD/YfiF/SodB) implied that defense mechanisms against oxygen radicals were mobilized. Proteins involved in peptidoglycan turnover (MurB) were increased, while β-barrel OM proteins (OmpA), OM lipoproteins (NlpD), chaperones involved in OM protein folding pathways (YraP, NlpB) and lipopolysaccharide biosynthesis (Imp) were decreased, suggesting unexpected modulations of the outer membrane/peptidoglycan layers in vivo. Several virulence proteins of the Mxi-Spa type III secretion system and invasion plasmid antigens (Ipa proteins) required for invasion of colonic epithelial cells, and release of bacteria into the host cell cytosol were increased in vivo.
Global proteomic profiling of SD1 comparing in vivo vs. in vitro proteomes revealed differential expression of proteins geared towards survival of the pathogen in the host gut environment, including increased abundance of proteins involved in anaerobic energy respiration, acid resistance and virulence. The immunogenic OspC2, OspC3 and IpgA virulence proteins were detected solely under in vivo conditions, lending credence to their candidacy as potential vaccine targets.
The Staphylococcus aureus surface protein G (SasG) is an important mediator of biofilm formation in virulent S. aureus strains. A detailed analysis of its primary sequence has not been reported to date. SasG is highly abundant in the cell wall of the vancomycin-intermediate S. aureus strain HIP5827, and was purified and subjected to sequence analysis by MS. Data from MALDI-TOF and LC-MS/MS experiments confirmed the predicted N-terminal signal peptide cleavage site at residue A51 and the C-terminal cell wall anchor site at residue T1086. The protein was also derivatized with N-succinimidyloxycarbonyl-methyl-tris(2,4,6-trimethoxyphenyl) phosphonium bromide (TMPP-Ac-OSu) to assess the presence of additional N-terminal sites of mature SasG. TMPP-derivatized SasG peptides featured m/z peaks with a 572 Da mass increase over the equivalent underivatized peptides. Multiple N-terminal peptides, all of which were observed in the 150 amino acid segment following the signal peptide cleavage at the residue A51, were characterized from MS and MS/MS data, suggesting a series of successive N-terminal truncations of SasG. A strategy combining TMPP derivatization, multiple enzyme digestions to generate overlapping peptides and detailed MS analysis will be useful to determine and understand functional implications of PTMs in bacterial cell wall-anchored proteins, which are frequently involved in the modulation of virulence-associated bacterial surface properties.
N-terminal truncation; TMPP labeling; multiple enzyme digestion; SasG; mass spectrometry; post-translational modifications
Uncharacterized proteases naturally expressed by bacterial pathogens represent an important topic in infectious disease research, because these enzymes may have critical roles in pathogenicity and cell physiology. It has been observed that cloning, expression and purification of proteases often fail due to their catalytic functions, which, in turn, cause toxicity in the E. coli heterologous host.
To address this problem systematically, a modified pipeline of our high-throughput protein expression and purification platform was developed. This included the use of a specific E. coli strain, BL21(DE3) pLysS, to tightly control the expression of recombinant proteins and various expression vectors encoding fusion proteins to enhance recombinant protein solubility. Proteases fused to large fusion protein domains, maltose-binding protein (MBP), SP-MBP which contains a signal peptide at the N-terminus of MBP, disulfide oxidoreductase (DsbA) and glutathione S-transferase (GST), showed improved expression and solubility. Overall, 86.1% of selected protease genes, including hypothetical proteins, were expressed and purified using a combination of five different expression vectors. To detect novel proteolytic activities, zymography and fluorescence-based assays were performed, and the protease activities of more than 46% of purified proteases and 40% of hypothetical proteins that were predicted to be proteases were confirmed.
Multiple expression vectors, employing distinct fusion tags in a high throughput pipeline increased overall success rates in expression, solubility and purification of proteases. The combinatorial functional analysis of the purified proteases using fluorescence assays and zymography confirmed their function.
While the pneumococcal protein conjugate vaccines reduce the incidence of invasive pneumococcal disease (IPD), serotype replacement remains a major concern. Thus, serotype-independent protection with vaccines targeting virulence genes, such as PspA, has been pursued. PspA comprises diverse clades that arose through recombination. Therefore, multi-locus sequence typing (MLST)-defined clones could conceivably include strains from multiple PspA clades. As a result, a method is needed which can both monitor the long-term epidemiology of the pneumococcus among a large number of isolates, and analyze vaccine-candidate genes, such as pspA, for mutations and recombination events that could result in ‘vaccine escape’ strains.
We developed a resequencing array consisting of five conserved and six variable genes to characterize 72 pneumococcal strains. The phylogenetic analysis of the 11 concatenated genes was performed with the MrBayes program, the single nucleotide polymorphism (SNP) analysis with the DNA Sequence Polymorphism program (DnaSP), and the recombination event analysis with the recombination detection package (RDP).
The phylogenetic analysis correlated with MLST, and identified clonal strains with unique PspA clades. The DnaSP analysis correlated with the serotype-specific diversity detected using MLST. Serotypes associated with more than one ST complex had a larger degree of sequence polymorphism than a serotype associated with one ST complex. The RDP analysis confirmed the high frequency of recombination events in the pspA gene.
The phylogenetic tree correlated with MLST, and detected multiple PspA clades among clonal strains. The genetic diversity of the strains and the frequency of recombination events in the mosaic gene, pspA were accurately assessed using the DnaSP and RDP programs, respectively. These data provide proof-of-concept that resequencing arrays could play an important role within research and clinical laboratories in both monitoring the molecular epidemiology of the pneumococcus and detecting ‘vaccine escape’ strains among vaccine-candidate genes.
Shigella dysenteriae serotype 1 (SD1) causes the most severe form of epidemic bacillary dysentery. We present the first comprehensive proteome analysis of this pathogen, profiling proteins from bacteria cultured in vitro and bacterial isolates from the large bowel of infected gnotobiotic piglets (in vivo). Overall, 1061 distinct gene products were identified. Differential display analysis revealed that SD1 cells switched to an anaerobic energy metabolism in vivo. High in vivo abundances of amino acid decarboxylases (GadB and AdiA) which enhance pH homeostasis in the cytoplasm and protein disaggregation chaperones (HdeA, HdeB and ClpB) were indicative of a coordinated bacterial survival response to acid stress. Several type III secretion system (T3SS) effectors were increased in abundance in vivo, including OspF, IpaC and IpaD. These proteins are implicated in invasion of colonocytes and subversion of the host immune response in S. flexneri. These observations likely reflect an adaptive response of SD1 to the hostile host environment. Seven proteins, among them the T3SS effectors OspC2 and IpaB, were detected as antigens in western blots using piglet antisera. The outer membrane protein OmpA, the heat shock protein HtpG and OspC2 represent novel SD1 subunit vaccine candidates and drug targets.
acid stress; bacillary dysentery; proteome analysis; Shigella dysenteriae
Mutations within codon 306 of the Mycobacterium tuberculosis embB gene modestly increase ethambutol (EMB) MICs. To identify other causes of EMB resistance and to identify causes of high-level resistance, we generated EMB-resistant M. tuberculosis isolates in vitro and performed allelic exchange studies of embB codon 406 (embB406) and embB497 mutations. In vitro selection produced mutations already identified clinically in embB306, embB397, embB497, embB1024, and embC13, which result in EMB MICs of 8 or 14 μg/ml, 5 μg/ml, 12 μg/ml, 3 μg/ml, and 4 μg/ml, respectively, and mutations at embB320, embB324, and embB445, which have not been identified in clinical M. tuberculosis isolates and which result in EMB MICs of 8 μg/ml, 8 μg/ml, and 2 to 8 μg/ml, respectively. To definitively identify the effect of the common clinical embB497 and embB406 mutations on EMB susceptibility, we created a series of isogenic mutants, exchanging the wild-type embB497 CAG codon in EMB-susceptible M. tuberculosis strain 210 for the embB497 CGG codon and the wild-type embB406 GGC codon for either the embB406 GCC, embB406 TGC, embB406 TCC, or embB406 GAC codon. These new mutants showed 6-fold and 3- to 3.5-fold increases in the EMB MICs, respectively. In contrast to the embB306 mutants, the isogenic embB497 and embB406 mutants did not have preferential growth in the presence of isoniazid or rifampin (rifampicin) at their MICs. These results demonstrate that individual embCAB mutations confer low to moderate increases in EMB MICs. Discrepancies between the EMB MICs of laboratory mutants and clinical M. tuberculosis strains with identical mutations suggest that clinical EMB resistance is multigenic and that high-level EMB resistance requires mutations in currently unknown loci.
Extraction of crude membrane fractions with alkaline solutions, such as 100–200 mM Na2CO3 (pH ~11), is often used to solubilize peripheral membrane proteins. Integral membrane proteins are largely retained in membrane pellets. We applied this method to the fractionation of membrane proteins of the plague bacterium Yersinia pestis. Extensive horizontal spot trains were observed in 2-DE gels. The pI values of the most basic spots in such protein spot trains usually matched the computationally predicted pI values. Regular patterns of decreasing spot pI values and in silico analysis with the software ProMoST suggested 'n-1' deamidations of asparagine (N) and/or glutamine (Q) side chains for 'n' observed spots of a protein in a given spot train. MALDI-MS analysis confirmed the occurrence of deamidations, particularly in N side chains that were part of NG dipeptide motifs. In more than ten cases, tandem MS data for tryptic peptides provided strong evidence for deamidations, with y- and b-ion series increased by 1 Da following N-to-D substitutions. Horizontal spot trains in 2-DE gels were rare when alkaline extraction was omitted during membrane protein sample preparation. This study strongly supports the notion that exposure to alkaline pH solutions is a dominant cause of extensive N and Q side chain deamidations in proteins during sample preparation of membrane extracts. The modifications are of non-enzymatic nature and not physiologically relevant. Therefore, quantitative spot differences within spot trains in differential protein display experiments following the aforementioned sample preparation steps need to be interpreted cautiously.
Alkaline membrane extraction; deamidation; membrane proteome; spot train; two-dimensional gel electrophoresis
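The spot-train reasoning in the deamidation abstract above can be illustrated with a minimal Python sketch. It assumes the stated rule of 'n-1' deamidations for 'n' spots, scans a sequence for NG dipeptide motifs, and applies the standard monoisotopic mass shift of a single deamidation (about +0.984 Da, consistent with the roughly 1 Da ion shifts reported). The function names and the example sequence are hypothetical and are not taken from the study or from ProMoST.

```python
# Monoisotopic mass change for one deamidation (N -> D or Q -> E): -NH2 +OH.
DEAMIDATION_SHIFT_DA = 0.984016


def inferred_deamidations(n_spots: int) -> int:
    """Number of deamidations implied by a spot train of n_spots ('n-1' rule)."""
    if n_spots < 1:
        raise ValueError("a spot train needs at least one spot")
    return n_spots - 1


def ng_sites(protein: str) -> list[int]:
    """0-based positions of asparagine residues followed by glycine (NG motifs)."""
    protein = protein.upper()
    return [i for i in range(len(protein) - 1) if protein[i:i + 2] == "NG"]


def deamidated_mass(monoisotopic_mass: float, n_deamidations: int) -> float:
    """Peptide mass after the given number of deamidation events."""
    return monoisotopic_mass + n_deamidations * DEAMIDATION_SHIFT_DA


if __name__ == "__main__":
    example = "MKNGLANGQ"  # hypothetical sequence, for illustration only
    print(inferred_deamidations(4))   # spots observed -> deamidations inferred
    print(ng_sites(example))          # candidate NG deamidation sites
    print(deamidated_mass(1000.0, 2))
```

The sketch only flags candidate sites; whether a given NG motif is actually modified still has to be confirmed from MS/MS data, as in the abstract.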
Whole genome amplification (WGA) offers new possibilities for genome-wide association studies where limited DNA samples have been collected. This study provides a realistic and high-precision assessment of WGA DNA genotyping performance from 20-year old archived serum samples using the Affymetrix Genome-Wide Human SNP Array 6.0 (SNP6.0) platform.
Whole-genome amplified (WGA) DNA samples from 45 archived serum replicates and 5 fresh sera paired with non-amplified genomic DNA were genotyped in duplicate. All genotyped samples passed the imposed QC thresholds for quantity and quality. In general, WGA serum DNA samples produced low call rates (45.00 +/- 2.69%), although reproducibility for successfully called markers was favorable (concordance = 95.61 +/- 4.39%). Heterozygote dropouts explained the majority (>85% in technical replicates, 50% in paired genomic/serum samples) of discordant results. Genotyping performance on WGA serum DNA samples was improved by implementation of Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) algorithm but at the loss of many samples which failed to pass its quality threshold. Poor genotype clustering was evident in the samples that failed the CRLMM confidence threshold.
We conclude that while it is possible to extract genomic DNA and subsequently perform whole-genome amplification from archived serum samples, WGA serum DNA did not perform well and appeared unsuitable for high-resolution genotyping on these arrays.
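The genotyping metrics in the WGA abstract above (call rate, concordance among called markers, and heterozygote dropouts) can be sketched in a few lines of Python. This is an illustrative sketch only: the genotype encoding ("AA", "AB", "BB", with None for a no-call) and all function names are assumptions, not the output format of the SNP6.0 platform or of CRLMM.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # assumed encoding: "AA", "AB", "BB", or None for a no-call


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of markers that received a genotype call."""
    if not calls:
        raise ValueError("no markers supplied")
    return sum(c is not None for c in calls) / len(calls)


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of identical genotypes among markers called in both replicates."""
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not both:
        raise ValueError("no marker was called in both replicates")
    return sum(x == y for x, y in both) / len(both)


def heterozygote_dropouts(a: Sequence[Genotype], b: Sequence[Genotype]) -> int:
    """Count discordant pairs where one call is heterozygous and the other is
    homozygous for one of the heterozygote's alleles."""
    def is_het(g: str) -> bool:
        return g[0] != g[1]

    count = 0
    for x, y in zip(a, b):
        if x is None or y is None or x == y:
            continue
        het, hom = (x, y) if is_het(x) else (y, x)
        if is_het(het) and not is_het(hom) and hom[0] in het:
            count += 1
    return count


if __name__ == "__main__":
    rep1 = ["AA", "AB", None, "BB", "AB"]   # made-up duplicate calls
    rep2 = ["AA", "AA", "AB", "BB", None]
    print(call_rate(rep1), concordance(rep1, rep2), heterozygote_dropouts(rep1, rep2))
```

Concordance is computed only over markers called in both replicates, which is why a low call rate and a high concordance can coexist, as reported above.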
In the postgenomic era, high throughput protein expression and protein microarray technologies have progressed markedly permitting screening of therapeutic reagents and discovery of novel protein functions. Hexa-histidine is one of the most commonly used fusion tags for protein expression due to its small size and convenient purification via immobilized metal ion affinity chromatography (IMAC). This purification process has been adapted to the protein microarray format, but the quality of in situ His-tagged protein purification on slides has not been systematically evaluated. We established methods to determine the level of purification of such proteins on metal chelate-modified slide surfaces. Optimized in situ purification of His-tagged recombinant proteins has the potential to become the new gold standard for cost-effective generation of high-quality and high-density protein microarrays.
Two slide surfaces were examined, chelated Cu2+ slides suspended on a polyethylene glycol (PEG) coating and chelated Ni2+ slides immobilized on a support without PEG coating. Using PEG-coated chelated Cu2+ slides, consistently higher purities of recombinant proteins were measured. An optimized wash buffer (PBST) composed of 10 mM phosphate buffer, 2.7 mM KCl, 140 mM NaCl and 0.05% Tween 20, pH 7.4, further improved protein purity levels. Using Escherichia coli cell lysates expressing 90 recombinant Streptococcus pneumoniae proteins, 73 proteins were successfully immobilized, and 66 proteins were in situ purified with greater than 90% purity. We identified several antigens among the in situ-purified proteins via assays with anti-S. pneumoniae rabbit antibodies and a human patient antiserum, as a demonstration project of large scale microarray-based immunoproteomics profiling. The methodology is compatible with higher throughput formats of in vivo protein expression, eliminates the need for resin-based purification and circumvents protein solubility and denaturation problems caused by buffer exchange steps and freeze-thaw cycles, which are associated with resin-based purification, intermittent protein storage and deposition on microarrays.
An optimized platform for in situ protein purification on microarray slides using His-tagged recombinant proteins is a desirable tool for the screening of novel protein functions and protein-protein interactions. In the context of immunoproteomics, such protein microarrays are complementary to approaches using non-recombinant methods to discover and characterize bacterial antigens.
DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. A set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence were developed. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.
Mycobacterium tuberculosis strains contain different genomic insertions or deletions called large sequence polymorphisms (LSPs). Distinguishing between LSPs that occur one time versus ones that occur repeatedly in a genomic region may provide insights into the biological roles of LSPs and identify useful phylogenetic markers. We analyzed 163 clinical M. tuberculosis isolates for 17 LSPs identified in a genomic comparison of M. tuberculosis strains H37Rv and CDC1551. LSPs were mapped onto a single-nucleotide polymorphism (SNP)-based phylogenetic tree created using nine novel SNP markers that were found to reproduce a 212-SNP-based phylogeny. Four LSPs (group A) mapped to a single SNP tree segment. Two LSPs (group B) and 11 LSPs (group C) were inferred to have arisen independently in the same genomic region either two or more than two times, respectively. None of the group A LSPs but one group B LSP and five group C LSPs were flanked by IS6110 sequences in the reference strains. Genes encoding members of the proline-glutamic acid or proline-proline-glutamic acid protein families were present only in group B or C LSPs. SNP- versus LSP-based phylogenies were also compared. We classified each isolate into 58 LSP types by using a separate LSP-based phylogenetic analysis and mapped the LSP types onto the SNP tree. LSPs often assigned isolates to the correct phylogenetic lineage; however, significant mistakes occurred for 6/58 (10%) of the LSP types. In conclusion, most LSPs occur in genomic regions that are prone to repeated insertion/deletion events and were responsible for an unexpectedly high degree of genomic variation in clinical M. tuberculosis. Group B and C LSPs may represent polymorphisms that occur due to selective pressure and affect the phenotype of the organism, while group A LSPs are preferable phylogenetic markers.
We used Porphyromonas gingivalis gene microarrays to compare the total gene contents of the virulent strain W83 and the avirulent type strain, ATCC 33277. Signal ratios and scatter plots indicated that the chromosomes were very similar, with approximately 93% of the predicted genes in common, while at least 7% of them showed very low or no signals in ATCC 33277. Verification of the array results by PCR indicated that several of the disparate genes were either absent from or variant in ATCC 33277. Divergent features included already reported insertion sequences and ragB, as well as additional hypothetical and functionally assigned genes. Several of the latter were organized in a putative operon in W83 and encoded enzymes involved in capsular polysaccharide synthesis. Another cluster was associated with two paralogous regions of the chromosome with a low G+C content, at 41%, compared to that of the whole genome, at 48%. These regions also contained conserved and species-specific hypothetical genes, transposons, insertion sequences, and integrases and were located adjacent to tRNA genes; thus, they had several characteristics of pathogenicity islands. While this global comparative analysis showed the close relationship between W83 and ATCC 33277, the clustering of genes that are present in W83 but divergent in or absent from ATCC 33277 is suggestive of chromosomal islands that may have been acquired by lateral gene transfer.
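The low-G+C observation in the P. gingivalis abstract above (regions at 41% against a genome average of 48%) rests on a simple base-composition calculation. The following Python sketch shows one way to compute G+C content and flag low-G+C windows. The window size, step, threshold and function names are illustrative assumptions, not parameters from the study.

```python
def gc_content(seq: str) -> float:
    """G+C fraction of a sequence, ignoring ambiguous bases such as N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


def low_gc_windows(seq: str, window: int, step: int, threshold: float):
    """Return (start, gc) for each window whose G+C fraction is below threshold.

    Windows consisting only of ambiguous bases raise ValueError via gc_content.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if gc < threshold:
            hits.append((start, gc))
    return hits


if __name__ == "__main__":
    toy = "GCGCGCGCATATATAT"  # made-up sequence, for illustration only
    print(gc_content("GGCCAATT"))
    print(low_gc_windows(toy, window=8, step=8, threshold=0.45))
```

On a real chromosome one would typically use windows of several kilobases and compare each window against the genome-wide average rather than a fixed cutoff.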
The Mycobacterium tuberculosis alternate sigma factor, SigF, is expressed during stationary growth phase and under stress conditions in vitro. To better understand the function of SigF we studied the phenotype of the M. tuberculosis ΔsigF mutant in vivo during mouse infection, tested the mutant as a vaccine in rabbits, and evaluated the mutant's microarray expression profile in comparison with the wild type. In mice the growth rates of the ΔsigF mutant and wild-type strains were nearly identical during the first 8 weeks after infection. At 8 weeks, the ΔsigF mutant persisted in the lung, while the wild type continued growing through 20 weeks. Histopathological analysis showed that both wild-type and mutant strains had similar degrees of interstitial and granulomatous inflammation during the first 12 weeks of infection. However, from 12 to 20 weeks the mutant strain showed smaller and fewer lesions and less inflammation in the lungs and spleen. Intradermal vaccination of rabbits with the M. tuberculosis ΔsigF strain, followed by aerosol challenge, resulted in fewer tubercles than did intradermal M. bovis BCG vaccination. Complete genomic microarray analysis revealed that 187 genes were relatively underexpressed in the absence of SigF in early stationary phase, 277 in late stationary phase, and only 38 genes in exponential growth phase. Numerous regulatory genes and those involved in cell envelope synthesis were down-regulated in the absence of SigF; moreover, the ΔsigF mutant strain lacked neutral red staining, suggesting a reduction in the expression of envelope-associated sulfolipids. Examination of 5′-untranslated sequences among the downregulated genes revealed multiple instances of a putative SigF consensus recognition sequence: GGTTTCX18GGGTAT. These results indicate that in the mouse the M. tuberculosis ΔsigF mutant strain persists in the lung but at lower bacterial burdens than wild type and is attenuated by histopathologic assessment. 
Microarray analysis has identified SigF-dependent genes and a putative SigF consensus recognition site.
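A search for the putative SigF recognition sequence reported in the abstract above can be sketched with a regular expression. The sketch assumes that "X18" in GGTTTCX18GGGTAT denotes a spacer of 18 arbitrary nucleotides, which is an interpretation rather than something the abstract states. It scans only the given strand, and the function name and test sequence are hypothetical.

```python
import re

# Assumption: X18 is read as a spacer of exactly 18 arbitrary nucleotides.
# The lookahead lets overlapping candidate sites be reported as well.
SIGF_LIKE = re.compile(r"(?=(GGTTTC[ACGT]{18}GGGTAT))")


def find_sigf_like_sites(upstream: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each candidate site
    on the given strand of an upstream (5'-untranslated) sequence."""
    upstream = upstream.upper()
    return [(m.start(), m.group(1)) for m in SIGF_LIKE.finditer(upstream)]


if __name__ == "__main__":
    # Made-up upstream region containing one site at position 3.
    region = "AAA" + "GGTTTC" + "A" * 18 + "GGGTAT" + "CC"
    for start, site in find_sigf_like_sites(region):
        print(start, site)
```

A fuller analysis would also scan the reverse complement and might allow mismatches, since consensus sequences are rarely matched exactly.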
The complete 2,343,479-bp genome sequence of the gram-negative, pathogenic oral bacterium Porphyromonas gingivalis strain W83, a major contributor to periodontal disease, was determined. Whole-genome comparative analysis with other available complete genome sequences confirms the close relationship between the Cytophaga-Flavobacteria-Bacteroides (CFB) phylum and the green-sulfur bacteria. Within the CFB phylum, the genomes most similar to that of P. gingivalis are those of Bacteroides thetaiotaomicron and B. fragilis. Outside the CFB phylum, the genome most similar to that of P. gingivalis is that of Chlorobium tepidum, supporting previous phylogenetic studies indicating that the Chlorobia and CFB phyla are related, albeit distantly. Genome analysis of strain W83 reveals a range of pathways and virulence determinants that relate to the novel biology of this oral pathogen. Among these determinants are at least six putative hemagglutinin-like genes and 36 previously unidentified peptidases. Genome analysis also reveals that P. gingivalis can metabolize a range of amino acids and generate a number of metabolic end products that are toxic to the human host or human gingival tissue and contribute to the development of periodontal disease.
The comparative-genomic sequencing of two Mycobacterium tuberculosis strains enabled us to identify single nucleotide polymorphism (SNP) markers for studies of evolution, pathogenesis, and epidemiology in clinical M. tuberculosis. Phylogenetic analysis using these “comparative-genome markers” (CGMs) produced a highly unusual phylogeny with a complete absence of secondary branches. To investigate CGM-based phylogenies, we devised computer models to simulate sequence evolution and calculate new phylogenies based on an SNP format. We found that CGMs represent a distinct class of phylogenetic markers that depend critically on the genetic distances between compared “reference strains.” Properly distanced reference strains generate CGMs that accurately depict evolutionary relationships, distorted only by branch collapse. Improperly distanced reference strains generate CGMs that distort and reroot outgroups. Applying this understanding to the CGM-based phylogeny of M. tuberculosis, we found evidence to suggest that this species is highly clonal without detectable lateral gene exchange. We noted indications of evolutionary bottlenecks, including one at the level of the PHRI “C” strain previously associated with particular virulence characteristics. Our evidence also suggests that loss of IS6110 to fewer than seven elements per genome is uncommon. Finally, we present population-based evidence that KasA, an important component of mycolic acid biosynthesis, develops G312S polymorphisms under selective pressure.
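The branch collapse described above can be illustrated with a toy example. The sketch below is not the authors' simulation model; it only shows, under simplified assumptions, why markers ascertained from two reference strains cannot reveal SNPs private to a third strain. All sequences and function names are hypothetical.

```python
def discover_cgms(ref_a, ref_b):
    """Positions where the two reference genomes differ (the ascertained markers)."""
    return [i for i, (a, b) in enumerate(zip(ref_a, ref_b)) if a != b]


def marker_distance(s1, s2, markers):
    """Number of ascertained marker positions at which two strains differ."""
    return sum(1 for i in markers if s1[i] != s2[i])


def full_distance(s1, s2):
    """Number of differing positions across the whole alignment."""
    return sum(1 for a, b in zip(s1, s2) if a != b)


# Toy aligned genomes (hypothetical, 10 positions each).
ref_a = "AAAAAAAAAA"
ref_b = "AAGAAGAAAA"     # differs from ref_a at positions 2 and 5
strain_c = "AAGAAAATTA"  # shares position 2 with ref_b; private SNPs at 7 and 8

markers = discover_cgms(ref_a, ref_b)
print("markers:", markers)
print("true distance A-C:", full_distance(ref_a, strain_c))
print("marker distance A-C:", marker_distance(ref_a, strain_c, markers))
print("marker distance C-B:", marker_distance(strain_c, ref_b, markers))
```

On the ascertained markers, strain C falls exactly on the path between the two references: its private branch is invisible, which is the kind of collapse that yields a tree without secondary branches.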
Porphyromonas gingivalis, a black-pigmented, gram-negative anaerobe, is found in periodontitis lesions, and its presence in subgingival plaque significantly increases the risk for periodontitis. In contrast to many bacterial pathogens, P. gingivalis strains display considerable variability, which is likely due to genetic exchange and intragenomic changes. To explore the latter possibility, we have studied the occurrence of insertion sequence (IS)-like elements in P. gingivalis W83 by utilizing a convenient and rapid method of capturing IS-like sequences and through analysis of the genome sequence of P. gingivalis strain W83. We adapted the method of Matsutani et al. (S. Matsutani, H. Ohtsubo, Y. Maeda, and E. Ohtsubo, J. Mol. Biol. 196:445–455, 1987) to isolate and clone rapidly annealing DNA sequences characteristic of repetitive regions within a genome. We show that in P. gingivalis strain W83, such sequences include (i) nucleotide sequence with homology to tRNA genes, (ii) a previously described IS element, and (iii) a novel IS-like element. Analysis of the P. gingivalis genome sequence for the distribution of the least used tetranucleotide, CTAG, identified regions in many of the initial 218 contigs which contained CTAG clusters. Examination of these CTAG clusters led to the discovery of 11 copies of the same novel IS-like element identified by the repeated sequence capture method of Matsutani et al. This new 1,512-bp IS-like element, designated ISPg5, has features of the IS3 family of IS elements. When a recombinant plasmid containing much of ISPg5 was used in Southern analysis of several P. gingivalis strains, including clinical isolates, diversity among strains was apparent. This suggests that ISPg5 and other IS elements may contribute to strain diversity and can be used for strain fingerprinting.
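The CTAG-cluster scan described above might be sketched as follows. The abstract does not define what counts as a cluster, so the maximum gap between neighbouring sites (`max_gap`) and the minimum number of sites (`min_count`) are illustrative assumptions, as are the function names and the toy contig.

```python
def motif_positions(seq, motif="CTAG"):
    """Return the 0-based start positions of every occurrence of motif in seq."""
    seq = seq.upper()
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def find_clusters(positions, max_gap=200, min_count=3):
    """Group sorted positions into runs whose neighbouring sites lie within max_gap.

    Returns (first_position, last_position, site_count) for each run that
    contains at least min_count sites. Both thresholds are assumptions.
    """
    clusters, current = [], []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_count:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_count:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Toy contig (hypothetical): three closely spaced CTAG sites, then an isolated one.
contig = "A" * 300 + "CTAG" + "A" * 20 + "CTAG" + "A" * 30 + "CTAG" + "A" * 1000 + "CTAG"
positions = motif_positions(contig)
clusters = find_clusters(positions)
print(positions)
print(clusters)
```

Because CTAG is rare in the host genome, a local excess of sites is a simple signal that a region may have a foreign or mobile origin, which is what makes the scan useful for locating IS-like elements.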
The Porphyromonas gingivalis genome contains multiple copies of insertion element IS1126. When chromosomal DNA digests of different strains were probed with IS1126, between 25 and 35 hybridizing fragments per genome were detected, depending on the strain. Unrelated strains had very different restriction fragment length polymorphism (RFLP) patterns. When different laboratory copies of a specific strain were examined, the IS1126 RFLP patterns were very similar but small differences were observed, indicating that element-associated changes had occurred during laboratory passage. Within the next year, genome sequencing, assembly, and annotation for P. gingivalis W83 will be completed. Because repetitive elements complicate the assembly of randomly sequenced DNA fragments, we isolated and sequenced the flanking regions of IS1126 copies in strain W83. We also isolated and sequenced the flanking regions of IS1126 copies in strain ATCC 33277 in order to compare insertion sites in phylogenetically divergent strains. We identified 37 new sequences flanking IS1126 from strain ATCC 33277 and 30 from strain W83. The insertion element was found between genes except where it transposed into another insertion element. Examination of identifiable flanking genes or open reading frames indicated that the insertion sites were different in the two strains, except that both strains possess an insertion adjacent to the Lys-gingipain gene (J. P. Lewis and F. L. Macrina, Infect. Immun. 66:3035–3042, 1998). Most of the genes or sequences flanking IS1126 in ATCC 33277 were present in W83 but were contiguous and not insertion element associated. Thus, where genes were identified in both strains, their order was maintained, indicating that the two genomes are organized similarly, but the loci of IS1126 are different. In both strains, insertion element-associated duplicated target sites were lost from several copies of IS1126, providing evidence of homologous recombination between elements. 
Larger organizational differences between the genomes, such as deletions and inversions, may result from insertion element-mediated recombination events.
The recent outbreak of severe infections with Shiga toxin (Stx) producing Escherichia coli (STEC) serotype O104:H4 highlights the need to understand horizontal gene transfer among E. coli strains, identify novel virulence factors and elucidate their pathogenesis. Quantitative shotgun proteomics can contribute to such objectives, allowing insights into the part of the genome translated into proteins and the connectivity of biochemical pathways and higher order assemblies of proteins at the subcellular level.
We examined protein profiles in cell lysate fractions of STEC strain 86-24 (serotype O157:H7), following growth in cell culture or bacterial isolation from intestines of infected piglets, in the context of functionally and structurally characterized biochemical pathways of E. coli. Protein solubilization in the presence of Triton X-100, EDTA and high salt was followed by size exclusion chromatography into the approximate Mr ranges greater than 280 kDa, 280-80 kDa and 80-10 kDa. Peptide mixtures resulting from these and the insoluble fraction were analyzed by quantitative 2D-LC-nESI-MS/MS. Of the 2521 proteins identified at a 1% false discovery rate, representing 47% of all predicted E. coli O157:H7 gene products, the majority of integral membrane proteins were enriched in the high Mr fraction. Hundreds of proteins were enriched in a Mr range higher than that predicted for a monomer supporting their participation in protein complexes. The insoluble STEC fraction revealed enrichment of aggregation-prone proteins, including many that are part of large structure/function entities such as the ribosome, cytoskeleton and O-antigen biosynthesis cluster.
Nearly all E. coli O157:H7 proteins encoded by prophage regions were expressed at low abundance levels or not detected. Comparative quantitative analyses of proteins from distinct cell lysate fractions allowed us to associate uncharacterized proteins with membrane attachment, potential participation in stable protein complexes, and susceptibility to aggregation as part of larger structural assemblies.
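The reasoning that enrichment in a higher Mr fraction than the monomer mass predicts suggests participation in a complex can be written as a small check. This is a sketch only: the fraction boundaries come from the abstract, but the handling of boundary values, the function names, and the example masses are assumptions, and real data would need quantitative enrichment criteria.

```python
# Size-exclusion fractions from the abstract, ordered from highest to lowest Mr.
# Each entry is (label, lower bound in kDa).
FRACTIONS = [(">280", 280.0), ("280-80", 80.0), ("80-10", 10.0)]


def expected_rank(monomer_kda):
    """Rank of the fraction a free monomer should elute in (0 = highest Mr).

    Proteins below 10 kDa get a rank past the last collected fraction.
    """
    for rank, (_label, lower) in enumerate(FRACTIONS):
        if monomer_kda >= lower:
            return rank
    return len(FRACTIONS)


def candidate_complex_member(monomer_kda, enriched_fraction):
    """True if the protein is enriched in a higher-Mr fraction than its monomer predicts."""
    labels = [label for label, _lower in FRACTIONS]
    observed = labels.index(enriched_fraction)
    return observed < expected_rank(monomer_kda)


# Hypothetical examples: a 35 kDa protein found in the >280 kDa fraction.
print(candidate_complex_member(35.0, ">280"))   # eluting far above monomer size
print(candidate_complex_member(35.0, "80-10"))  # eluting where a monomer would
```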
Low genetic diversity has been documented in Francisella tularensis. Current DNA-based genotyping methods for F. tularensis offer limited and varying power to discriminate at the subspecies, clade and strain levels. Whole-genome sequencing is the most accurate and reliable method to identify, type and determine phylogenetic relationships among strains of a species. However, lower-cost typing schemes are needed to enable typing of hundreds or even thousands of isolates.
We have generated a high-resolution phylogenetic tree from 40 Francisella isolates, including 13 F. tularensis subsp. holarctica (type B) strains, 26 F. tularensis subsp. tularensis (type A) strains and a single F. novicida strain. The tree was generated from global multi-strain single nucleotide polymorphism (SNP) data collected using a set of six Affymetrix GeneChip® resequencing arrays with the non-repetitive portion of LVS (type B) as the reference sequence, complemented with unique sequences of SCHU S4 (type A). Global SNP-based phylogenetic clustering was able to resolve all unrelated strains. The phylogenetic tree was used to guide the selection of informative SNPs specific to major nodes in the tree for development of a genotyping assay for identification of F. tularensis subspecies and clades. We designed and validated an assay that uses these SNPs to accurately genotype 39 additional F. tularensis strains as type A (A1, A2, A1a or A1b) or type B (B1 or B2).
Whole-genome SNP-based clustering was shown to accurately identify SNPs for differentiation of F. tularensis subspecies and clades, emphasizing the potential power and utility of this methodology for selecting SNPs for typing of F. tularensis to the strain level. Additionally, whole-genome sequence-based SNP information gained from a representative population of strains may be used to perform evolutionary or phylogenetic comparisons of strains, or to select unique strains for whole-genome sequencing projects.
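A node-specific SNP assay of the kind described above can be thought of as a walk down the phylogenetic tree. The sketch below is illustrative only: the SNP positions and alleles are invented placeholders, and the nesting of A1a and A1b under A1 is an assumption, not something stated in the abstract.

```python
# Hypothetical node-specific SNPs: node -> (genome position, derived allele).
# These positions and alleles are placeholders, not the published assay.
NODE_SNPS = {
    "A": (101, "T"), "A1": (202, "G"), "A2": (303, "C"),
    "A1a": (404, "A"), "A1b": (505, "T"),
    "B": (606, "G"), "B1": (707, "C"), "B2": (808, "A"),
}

# Assumed tree topology linking each node to its child nodes.
CHILDREN = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "A1": ["A1a", "A1b"],
    "B": ["B1", "B2"],
}


def genotype(calls):
    """Assign the deepest unambiguous node for a strain.

    calls maps genome position -> observed allele. At each node the walk
    descends into the single child whose marker SNP shows the derived allele,
    and stops when no child, or more than one child, matches.
    """
    node = "root"
    while True:
        matches = [
            child for child in CHILDREN.get(node, [])
            if calls.get(NODE_SNPS[child][0]) == NODE_SNPS[child][1]
        ]
        if len(matches) != 1:
            return node if node != "root" else None
        node = matches[0]


# Hypothetical allele calls for two strains.
print(genotype({101: "T", 202: "G", 505: "T"}))
print(genotype({606: "G", 707: "C"}))
```

Stopping at the deepest unambiguous node means a strain with missing or conflicting calls is still typed to subspecies or clade level rather than being rejected outright.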