The Gram-negative bacterium Yersinia pestis is the causative agent of the bubonic plague. Efficient iron acquisition systems are critical to the ability of Y. pestis to infect, spread and grow in mammalian hosts, because iron is sequestered and is considered part of the innate host immune defence against invading pathogens. We used a proteomic approach to determine expression changes of iron uptake systems and intracellular consequences of iron deficiency in the Y. pestis strain KIM6+ at two physiologically relevant temperatures (26°C and 37°C).
Differential protein display was performed for three Y. pestis subcellular fractions. Five characterized Y. pestis iron/siderophore acquisition systems (Ybt, Yfe, Yfu, Yiu and Hmu) and a putative iron/chelate outer membrane receptor (Y0850) were increased in abundance in iron-starved cells. The iron-sulfur (Fe-S) cluster assembly system Suf, adapted to oxidative stress and iron starvation in E. coli, was also more abundant, suggesting functional activity of Suf in Y. pestis under iron-limiting conditions. Metabolic and reactive oxygen-deactivating enzymes dependent on Fe-S clusters or other iron cofactors were decreased in abundance in iron-depleted cells. These data were consistent with lower activities of aconitase and catalase in iron-starved vs. iron-rich cells. In contrast, pyruvate oxidase B, which metabolizes pyruvate via electron transfer to ubiquinone-8 for direct utilization in the respiratory chain, was strongly increased in abundance and activity in iron-depleted cells.
Many protein abundance differences were indicative of the important regulatory role of the ferric uptake regulator Fur. Iron deficiency seems to result in a coordinated shift from iron-utilizing to iron-independent biochemical pathways in the cytoplasm of Y. pestis. With growth temperature as an additional variable in proteomic comparisons of the Y. pestis fractions (26°C and 37°C), there was little evidence for temperature-specific adaptation processes to iron starvation.
Yersinia pestis proteins were sequentially extracted from crude membranes with a high salt buffer (2.5 M NaBr), an alkaline solution (180 mM Na2CO3, pH 11.3) and membrane denaturants (8 M urea, 2 M thiourea and 1% amidosulfobetaine-14). Separation of proteins by 2D gel electrophoresis was followed by identification of more than 600 gene products by MS. Data from differential 2D gel display experiments, comparing protein abundances in cytoplasmic, periplasmic and all three membrane fractions, were used to assign proteins found in the membrane fractions to three protein categories: (i) integral membrane proteins and peripheral membrane proteins with low solubility in aqueous solutions (220 entries); (ii) peripheral membrane proteins with moderate to high solubility in aqueous solutions (127 entries); (iii) cytoplasmic or ribosomal membrane-contaminating proteins (80 entries). Thirty-one proteins were experimentally associated with the outer membrane (OM). Circa 50 proteins thought to be part of membrane-localized, multi-subunit complexes were identified in high Mr fractions of membrane extracts via size exclusion chromatography. These data supported biologically meaningful assignments of many proteins to the membrane periphery. Since only 32 inner membrane (IM) proteins with two or more predicted transmembrane domains (TMDs) were profiled in 2D gels, we resorted to a proteomic analysis by 2D-LC-MS/MS. Ninety-four additional IM proteins with two or more TMDs were identified. The total number of proteins associated with Y. pestis membranes increased to 456 and included representatives of all six β-barrel OM protein families and 25 distinct IM transporter families.
The complete genome sequence of the radiation resistant bacterium Deinococcus radiodurans R1 is composed of two chromosomes (2,648,615 and 412,340 basepairs), a megaplasmid (177,466 basepairs), and a small plasmid (45,702 basepairs) yielding a total genome of 3,284,123 basepairs. Multiple components distributed on the chromosomes and megaplasmid that contribute to the ability of D. radiodurans to survive under conditions of starvation, oxidative stress, and high levels of DNA-damage have been identified. D. radiodurans represents an organism in which all systems for DNA repair, DNA damage export, desiccation and starvation recovery, and genetic redundancy are present in one cell.
Here we report the use of a multi-genome DNA microarray to investigate the genome diversity of Bacillus cereus group members and elucidate the events associated with the emergence of B. anthracis, the causative agent of anthrax, a lethal zoonotic disease. We initially performed directed genome sequencing of seven diverse B. cereus strains to identify novel sequences encoded in those genomes. The novel genes identified, combined with those publicly available, allowed the design of a “species” DNA microarray. Comparative genomic hybridization analyses of 41 strains indicate that substantial heterogeneity exists with respect to the genes comprising functional role categories. While the acquisition of the plasmid-encoded pathogenicity island (pXO1) and capsule genes (pXO2) represents a crucial landmark dictating the emergence of B. anthracis, the evolution of this species and its close relatives was associated with an overall shift in the fraction of genes devoted to energy metabolism, cellular processes, transport, as well as virulence.
The pathogenic mold Aspergillus fumigatus is the most frequent infectious cause of death in severely immunocompromised individuals such as leukemia and bone marrow transplant patients. Germination of inhaled conidia (asexual spores) in the host is critical for the initiation of infection, but little is known about the underlying mechanisms of this process.
To gain insights into early germination events and facilitate the identification of potential stage-specific biomarkers and vaccine candidates, we have used quantitative shotgun proteomics to elucidate patterns of protein abundance changes during early fungal development. Four different stages were examined: dormant conidia, isotropically expanding conidia, hyphae in which germ tube emergence has just begun, and pre-septation hyphae. To enrich for glycan-linked cell wall proteins we used an alkaline cell extraction method. Shotgun proteomics resulted in the identification of 375 unique gene products with high confidence, with no evidence for enrichment of cell wall-immobilized and secreted proteins. The most interesting discovery was the identification of 52 proteins enriched in dormant conidia, including 28 proteins that have never been detected in the A. fumigatus conidial proteome, such as signaling protein Pil1, chaperones BipA and calnexin, and transcription factor HapB. Additionally, we found many small, Aspergillus-specific proteins of unknown function, including 17 hypothetical proteins. For instance, the most abundant protein, Grg1 (AFUA_5G14210), was also one of the smallest proteins detected in this study (M.W. 7,367). Among previously characterized proteins were melanin pigment and pseurotin A biosynthesis enzymes, histones H3 and H4.1, and other proteins involved in conidiation and response to oxidative or hypoxic stress. In contrast, expanding conidia, hyphae with early germ tubes, and pre-septation hyphae samples were enriched for proteins responsible for housekeeping functions, particularly translation, respiratory metabolism, amino acid and carbohydrate biosynthesis, and the tricarboxylic acid cycle.
The observed temporal expression patterns suggest that the A. fumigatus conidia are dominated by small, lineage-specific proteins. Some of them may play key roles in host-pathogen interactions, signal transduction during conidial germination, or survival in hostile environments.
Mass spectrometry; LC-MS/MS; APEX; Shotgun proteomics; Aspergillus fumigatus; Germination; Spore; Conidia; Fungi; Hypothetical proteins
The recent outbreak of severe infections with Shiga toxin (Stx) producing Escherichia coli (STEC) serotype O104:H4 highlights the need to understand horizontal gene transfer among E. coli strains, identify novel virulence factors and elucidate their pathogenesis. Quantitative shotgun proteomics can contribute to such objectives, allowing insights into the part of the genome translated into proteins and the connectivity of biochemical pathways and higher order assemblies of proteins at the subcellular level.
We examined protein profiles in cell lysate fractions of STEC strain 86-24 (serotype O157:H7), following growth in cell culture or bacterial isolation from intestines of infected piglets, in the context of functionally and structurally characterized biochemical pathways of E. coli. Protein solubilization in the presence of Triton X-100, EDTA and high salt was followed by size exclusion chromatography into the approximate Mr ranges greater than 280 kDa, 280-80 kDa and 80-10 kDa. Peptide mixtures resulting from these and the insoluble fraction were analyzed by quantitative 2D-LC-nESI-MS/MS. Of the 2521 proteins identified at a 1% false discovery rate, representing 47% of all predicted E. coli O157:H7 gene products, the majority of integral membrane proteins were enriched in the high Mr fraction. Hundreds of proteins were enriched in a Mr range higher than that predicted for a monomer supporting their participation in protein complexes. The insoluble STEC fraction revealed enrichment of aggregation-prone proteins, including many that are part of large structure/function entities such as the ribosome, cytoskeleton and O-antigen biosynthesis cluster.
Nearly all E. coli O157:H7 proteins encoded by prophage regions were expressed at low abundance levels or not detected. Comparative quantitative analyses of proteins from distinct cell lysate fractions allowed us to associate uncharacterized proteins with membrane attachment, potential participation in stable protein complexes, and susceptibility to aggregation as part of larger structural assemblies.
Here we report the use of a multi-genome DNA microarray to elucidate the genomic events associated with the emergence of the clonal variants of H. influenzae biogroup aegyptius causing Brazilian Purpuric Fever (BPF), an important pediatric disease with a high mortality rate. We performed directed genome sequencing of strain HK1212 unique loci to construct a species DNA microarray. Comparative genome hybridization using this microarray enabled us to determine and compare gene complements, and infer reliable phylogenomic relationships among members of the species. The higher genomic variability observed in the genomes of BPF-related strains (clones) and their close relatives may be characterized by significant gene flux related to a subset of functional role categories. We found that the acquisition of a large number of virulence determinants featuring numerous cell membrane proteins, coupled to the loss of genes involved in transport, central biosynthetic pathways and, in particular, energy production pathways, is characteristic of the BPF genomic variants.
Haemophilus; Brazilian Purpuric Fever; pathogen emergence; virulence; comparative genomics; microarray
Shigella dysenteriae serotype 1 (SD1) causes the most severe form of epidemic bacillary dysentery. Quantitative proteome profiling of SD1 in vitro (derived from LB cell cultures) and in vivo (derived from gnotobiotic piglets) was performed by 2D-LC-MS/MS and APEX, a label-free computationally modified spectral counting methodology.
Overall, 1761 proteins were quantitated at a 5% FDR (false discovery rate), including 1480 and 1505 from in vitro and in vivo samples, respectively. Identification of 350 cytoplasmic membrane and outer membrane (OM) proteins (38% of in silico predicted SD1 membrane proteome) contributed to the most extensive survey of the Shigella membrane proteome reported so far. Differential protein abundance analysis using statistical tests revealed that SD1 cells switched to an anaerobic energy metabolism under in vivo conditions, resulting in an increase in fermentative, propanoate, butanoate and nitrate metabolism. Abundance increases of transcription activators FNR and Nar supported the notion of a switch from aerobic to anaerobic respiration in the host gut environment. High in vivo abundances of proteins involved in acid resistance (GadB, AdiA) and mixed acid fermentation (PflA/PflB) indicated bacterial survival responses to acid stress, while increased abundance of oxidative stress proteins (YfiD/YfiF/SodB) implied that defense mechanisms against oxygen radicals were mobilized. Proteins involved in peptidoglycan turnover (MurB) were increased, while β-barrel OM proteins (OmpA), OM lipoproteins (NlpD), chaperones involved in OM protein folding pathways (YraP, NlpB) and lipopolysaccharide biosynthesis (Imp) were decreased, suggesting unexpected modulations of the outer membrane/peptidoglycan layers in vivo. Several virulence proteins of the Mxi-Spa type III secretion system and invasion plasmid antigens (Ipa proteins) required for invasion of colonic epithelial cells, and release of bacteria into the host cell cytosol were increased in vivo.
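The overlap between the two sample types can be derived from the three counts above by inclusion-exclusion. The sketch below assumes that each of the 1761 proteins was quantitated in at least one of the two conditions; the function name is illustrative and not taken from the study.

```python
def shared_and_unique(total, in_vitro, in_vivo):
    """Inclusion-exclusion for two sets whose union has `total` members."""
    shared = in_vitro + in_vivo - total
    return {
        "shared": shared,
        "in_vitro_only": in_vitro - shared,
        "in_vivo_only": in_vivo - shared,
    }


# Counts reported in the text: 1761 overall, 1480 in vitro, 1505 in vivo.
counts = shared_and_unique(total=1761, in_vitro=1480, in_vivo=1505)
print(counts)
```

Under that assumption, 1224 proteins were quantitated in both conditions, 256 only in vitro and 281 only in vivo.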
Global proteomic profiling of SD1 comparing in vivo vs. in vitro proteomes revealed differential expression of proteins geared towards survival of the pathogen in the host gut environment, including increased abundance of proteins involved in anaerobic energy respiration, acid resistance and virulence. The immunogenic OspC2, OspC3 and IpgA virulence proteins were detected solely under in vivo conditions, lending credence to their candidacy as potential vaccine targets.
The Staphylococcus aureus surface protein G (SasG) is an important mediator of biofilm formation in virulent S. aureus strains. A detailed analysis of its primary sequence has not been reported to date. SasG is highly abundant in the cell wall of the vancomycin-intermediate S. aureus strain HIP5827, and was purified and subjected to sequence analysis by MS. Data from MALDI-TOF and LC-MS/MS experiments confirmed the predicted N-terminal signal peptide cleavage site at residue A51 and the C-terminal cell wall anchor site at residue T1086. The protein was also derivatized with N-succinimidyloxycarbonyl-methyl-tris(2,4,6-trimethoxyphenyl) phosphonium bromide (TMPP-Ac-OSu) to assess the presence of additional N-terminal sites of mature SasG. TMPP-derivatized SasG peptides featured m/z peaks with a 572 Da mass increase over the equivalent underivatized peptides. Multiple N-terminal peptides, all of which were observed in the 150 amino acid segment following the signal peptide cleavage at the residue A51, were characterized from MS and MS/MS data, suggesting a series of successive N-terminal truncations of SasG. A strategy combining TMPP derivatization, multiple enzyme digestions to generate overlapping peptides and detailed MS analysis will be useful to determine and understand functional implications of PTMs in bacterial cell wall-anchored proteins, which are frequently involved in the modulation of virulence-associated bacterial surface properties.
N-terminal truncation; TMPP labeling; multiple enzyme digestion; SasG; mass spectrometry; post-translational modifications
Uncharacterized proteases naturally expressed by bacterial pathogens represent an important topic in infectious disease research, because these enzymes may have critical roles in pathogenicity and cell physiology. Cloning, expression and purification of proteases often fail because their catalytic activity is toxic to the heterologous E. coli host.
To address this problem systematically, a modified pipeline of our high-throughput protein expression and purification platform was developed. This included the use of a specific E. coli strain, BL21(DE3) pLysS, to tightly control the expression of recombinant proteins, and various expression vectors encoding fusion proteins to enhance recombinant protein solubility. Fusing proteases to large fusion protein domains, namely maltose-binding protein (MBP), SP-MBP (which contains a signal peptide at the N-terminus of MBP), disulfide oxidoreductase (DsbA) and glutathione S-transferase (GST), improved their expression and solubility. Overall, 86.1% of selected protease genes, including hypothetical proteins, were expressed and purified using a combination of five different expression vectors. To detect novel proteolytic activities, zymography and fluorescence-based assays were performed, and the protease activities of more than 46% of purified proteases and 40% of hypothetical proteins that were predicted to be proteases were confirmed.
Multiple expression vectors, employing distinct fusion tags in a high throughput pipeline increased overall success rates in expression, solubility and purification of proteases. The combinatorial functional analysis of the purified proteases using fluorescence assays and zymography confirmed their function.
While the pneumococcal protein conjugate vaccines reduce the incidence of invasive pneumococcal disease (IPD), serotype replacement remains a major concern. Thus, serotype-independent protection with vaccines targeting virulence genes, such as PspA, has been pursued. PspA comprises diverse clades that arose through recombination. Therefore, multi-locus sequence typing (MLST)-defined clones could conceivably include strains from multiple PspA clades. As a result, a method is needed which can both monitor the long-term epidemiology of the pneumococcus among a large number of isolates, and analyze vaccine-candidate genes, such as pspA, for mutations and recombination events that could result in ‘vaccine escape’ strains.
We developed a resequencing array consisting of five conserved and six variable genes to characterize 72 pneumococcal strains. The phylogenetic analysis of the 11 concatenated genes was performed with the MrBayes program, the single nucleotide polymorphism (SNP) analysis with the DNA Sequence Polymorphism program (DnaSP), and the recombination event analysis with the recombination detection package (RDP).
The phylogenetic analysis correlated with MLST, and identified clonal strains with unique PspA clades. The DnaSP analysis correlated with the serotype-specific diversity detected using MLST. Serotypes associated with more than one ST complex had a larger degree of sequence polymorphism than a serotype associated with one ST complex. The RDP analysis confirmed the high frequency of recombination events in the pspA gene.
The phylogenetic tree correlated with MLST, and detected multiple PspA clades among clonal strains. The genetic diversity of the strains and the frequency of recombination events in the mosaic gene, pspA were accurately assessed using the DnaSP and RDP programs, respectively. These data provide proof-of-concept that resequencing arrays could play an important role within research and clinical laboratories in both monitoring the molecular epidemiology of the pneumococcus and detecting ‘vaccine escape’ strains among vaccine-candidate genes.
Shigella dysenteriae serotype 1 (SD1) causes the most severe form of epidemic bacillary dysentery. We present the first comprehensive proteome analysis of this pathogen, profiling proteins from bacteria cultured in vitro and bacterial isolates from the large bowel of infected gnotobiotic piglets (in vivo). Overall, 1061 distinct gene products were identified. Differential display analysis revealed that SD1 cells switched to an anaerobic energy metabolism in vivo. High in vivo abundances of amino acid decarboxylases (GadB and AdiA) which enhance pH homeostasis in the cytoplasm and protein disaggregation chaperones (HdeA, HdeB and ClpB) were indicative of a coordinated bacterial survival response to acid stress. Several type III secretion system (T3SS) effectors were increased in abundance in vivo, including OspF, IpaC and IpaD. These proteins are implicated in invasion of colonocytes and subversion of the host immune response in S. flexneri. These observations likely reflect an adaptive response of SD1 to the hostile host environment. Seven proteins, among them the T3SS effectors OspC2 and IpaB, were detected as antigens in western blots using piglet antisera. The outer membrane protein OmpA, the heat shock protein HtpG and OspC2 represent novel SD1 subunit vaccine candidates and drug targets.
acid stress; bacillary dysentery; proteome analysis; Shigella dysenteriae
Mutations within codon 306 of the Mycobacterium tuberculosis embB gene modestly increase ethambutol (EMB) MICs. To identify other causes of EMB resistance and to identify causes of high-level resistance, we generated EMB-resistant M. tuberculosis isolates in vitro and performed allelic exchange studies of embB codon 406 (embB406) and embB497 mutations. In vitro selection produced mutations already identified clinically in embB306, embB397, embB497, embB1024, and embC13, which result in EMB MICs of 8 or 14 μg/ml, 5 μg/ml, 12 μg/ml, 3 μg/ml, and 4 μg/ml, respectively, and mutations at embB320, embB324, and embB445, which have not been identified in clinical M. tuberculosis isolates and which result in EMB MICs of 8 μg/ml, 8 μg/ml, and 2 to 8 μg/ml, respectively. To definitively identify the effect of the common clinical embB497 and embB406 mutations on EMB susceptibility, we created a series of isogenic mutants, exchanging the wild-type embB497 CAG codon in EMB-susceptible M. tuberculosis strain 210 for the embB497 CGG codon and the wild-type embB406 GGC codon for either the embB406 GCC, embB406 TGC, embB406 TCC, or embB406 GAC codon. These new mutants showed 6-fold and 3- to 3.5-fold increases in the EMB MICs, respectively. In contrast to the embB306 mutants, the isogenic embB497 and embB406 mutants did not have preferential growth in the presence of isoniazid or rifampin (rifampicin) at their MICs. These results demonstrate that individual embCAB mutations confer low to moderate increases in EMB MICs. Discrepancies between the EMB MICs of laboratory mutants and clinical M. tuberculosis strains with identical mutations suggest that clinical EMB resistance is multigenic and that high-level EMB resistance requires mutations in currently unknown loci.
Extraction of crude membrane fractions with alkaline solutions, such as 100–200 mM Na2CO3 (pH ~11), is often used to solubilize peripheral membrane proteins. Integral membrane proteins are largely retained in membrane pellets. We applied this method to the fractionation of membrane proteins of the plague bacterium Yersinia pestis. Extensive horizontal spot trains were observed in 2-DE gels. The pI values of the most basic spots in such protein spot trains usually matched the computationally predicted pI values. Regular patterns of decreasing spot pI values and in silico analysis with the software ProMoST suggested 'n-1' deamidations of asparagine (N) and/or glutamine (Q) side chains for 'n' observed spots of a protein in a given spot train. MALDI-MS analysis confirmed the occurrence of deamidations, particularly in N side chains that are part of NG dipeptide motifs. In more than ten cases, tandem MS data for tryptic peptides provided strong evidence for deamidations, with y- and b-ion series increased by 1 Da following N-to-D substitutions. Horizontal spot trains in 2-DE gels were rare when alkaline extraction was omitted during membrane protein sample preparation. This study strongly supports the notion that exposure to alkaline pH solutions is a dominant cause of extensive N and Q side chain deamidations in proteins during sample preparation of membrane extracts. The modifications are of non-enzymatic nature and not physiologically relevant. Therefore, quantitative spot differences within spot trains in differential protein display experiments following the aforementioned sample preparation steps need to be interpreted cautiously.
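The relationship between spot count, deamidation count and mass shift can be illustrated with a short sketch. It assumes the standard monoisotopic mass change for deamidation (+0.984016 Da, observed as roughly 1 Da in the ion series), and it assumes that spot k of a train carries exactly k deamidations. The peptide sequence and function names are hypothetical and are not taken from the study.

```python
import re

# Monoisotopic mass change for deamidation (side-chain amide -> carboxylic acid):
# +O (15.994915) - N (14.003074) - H (1.007825) = +0.984016 Da.
DEAMIDATION_SHIFT_DA = 0.984016


def ng_sites(sequence):
    """Return 1-based positions of N residues that are followed by G (NG motifs)."""
    return [m.start() + 1 for m in re.finditer(r"N(?=G)", sequence)]


def spot_train_mass_shifts(n_spots):
    """Mass shifts for a train of n spots, assuming spot k carries k deamidations.

    The most basic spot (k = 0) is unmodified, so n spots imply n-1 deamidations.
    """
    return [round(k * DEAMIDATION_SHIFT_DA, 6) for k in range(n_spots)]


peptide = "AVNGTLQNGK"  # hypothetical tryptic peptide, for illustration only
sites = ng_sites(peptide)
shifts = spot_train_mass_shifts(4)
print(sites)
print(shifts)
```

Each deamidation also replaces a neutral amide with an acidic group, which is why successive spots in a train move toward lower pI.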
Alkaline membrane extraction; deamidation; membrane proteome; spot train; two-dimensional gel electrophoresis
Whole genome amplification (WGA) offers new possibilities for genome-wide association studies where limited DNA samples have been collected. This study provides a realistic and high-precision assessment of WGA DNA genotyping performance from 20-year-old archived serum samples using the Affymetrix Genome-Wide Human SNP Array 6.0 (SNP6.0) platform.
Whole-genome amplified (WGA) DNA samples from 45 archived serum replicates and 5 fresh sera paired with non-amplified genomic DNA were genotyped in duplicate. All genotyped samples passed the imposed QC thresholds for quantity and quality. In general, WGA serum DNA samples produced low call rates (45.00 +/- 2.69%), although reproducibility for successfully called markers was favorable (concordance = 95.61 +/- 4.39%). Heterozygote dropouts explained the majority (>85% in technical replicates, 50% in paired genomic/serum samples) of discordant results. Genotyping performance on WGA serum DNA samples was improved by implementation of Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) algorithm but at the loss of many samples which failed to pass its quality threshold. Poor genotype clustering was evident in the samples that failed the CRLMM confidence threshold.
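The three metrics reported above (call rate, concordance over successfully called markers, and the share of discordances explained by heterozygote dropout) can be computed as in the following sketch. The genotype encoding (two-letter strings such as "AB", with None for a no-call), the toy data and all function names are assumptions for illustration; this is not the study's analysis code.

```python
NO_CALL = None  # marker for a genotype the caller declined to assign


def call_rate(calls):
    """Fraction of markers that received a genotype call."""
    return sum(c is not NO_CALL for c in calls) / len(calls)


def is_het(genotype):
    """True for a heterozygous two-allele genotype string such as 'AB'."""
    return genotype[0] != genotype[1]


def compare_replicates(rep_a, rep_b):
    """Concordance and heterozygote-dropout share for two replicate call sets."""
    called = [(a, b) for a, b in zip(rep_a, rep_b)
              if a is not NO_CALL and b is not NO_CALL]
    if not called:
        raise ValueError("no marker was called in both replicates")
    discordant = [(a, b) for a, b in called if a != b]
    # A heterozygote dropout: heterozygous in one replicate, homozygous in the other.
    dropouts = [(a, b) for a, b in discordant if is_het(a) != is_het(b)]
    return {
        "concordance": (len(called) - len(discordant)) / len(called),
        "het_dropout_fraction": len(dropouts) / len(discordant) if discordant else 0.0,
    }


# Toy data for six markers (hypothetical).
rep1 = ["AA", "AB", "BB", None, "AB", "AA"]
rep2 = ["AA", "AA", "BB", "AB", None, "BB"]
stats = compare_replicates(rep1, rep2)
print(call_rate(rep1), stats)
```

Concordance is computed only over markers called in both replicates, so a low call rate and a high concordance can coexist, as in the results above.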
We conclude that while it is possible to extract genomic DNA and subsequently perform whole-genome amplification from archived serum samples, WGA serum DNA did not perform well and appeared unsuitable for high-resolution genotyping on these arrays.
A low genetic diversity in Francisella tularensis has been documented. Current DNA based genotyping methods for typing F. tularensis offer a limited and varying degree of subspecies, clade and strain level discrimination power. Whole genome sequencing is the most accurate and reliable method to identify, type and determine phylogenetic relationships among strains of a species. However, lower cost typing schemes are necessary in order to enable typing of hundreds or even thousands of isolates.
We have generated a high-resolution phylogenetic tree from 40 Francisella isolates, including 13 F. tularensis subspecies holarctica (type B) strains, 26 F. tularensis subsp. tularensis (type A) strains and a single F. novicida strain. The tree was generated from global multi-strain single nucleotide polymorphism (SNP) data collected using a set of six Affymetrix GeneChip® resequencing arrays with the non-repetitive portion of LVS (type B) as the reference sequence complemented with unique sequences of SCHU S4 (type A). Global SNP based phylogenetic clustering was able to resolve all non-related strains. The phylogenetic tree was used to guide the selection of informative SNPs specific to major nodes in the tree for development of a genotyping assay for identification of F. tularensis subspecies and clades. We designed and validated an assay that uses these SNPs to accurately genotype 39 additional F. tularensis strains as type A (A1, A2, A1a or A1b) or type B (B1 or B2).
Whole-genome SNP based clustering was shown to accurately identify SNPs for differentiation of F. tularensis subspecies and clades, emphasizing the potential power and utility of this methodology for selecting SNPs for typing of F. tularensis to the strain level. Additionally, whole genome sequence based SNP information gained from a representative population of strains may be used to perform evolutionary or phylogenetic comparisons of strains, or selection of unique strains for whole-genome sequencing projects.
In the postgenomic era, high throughput protein expression and protein microarray technologies have progressed markedly permitting screening of therapeutic reagents and discovery of novel protein functions. Hexa-histidine is one of the most commonly used fusion tags for protein expression due to its small size and convenient purification via immobilized metal ion affinity chromatography (IMAC). This purification process has been adapted to the protein microarray format, but the quality of in situ His-tagged protein purification on slides has not been systematically evaluated. We established methods to determine the level of purification of such proteins on metal chelate-modified slide surfaces. Optimized in situ purification of His-tagged recombinant proteins has the potential to become the new gold standard for cost-effective generation of high-quality and high-density protein microarrays.
Two slide surfaces were examined, chelated Cu2+ slides suspended on a polyethylene glycol (PEG) coating and chelated Ni2+ slides immobilized on a support without PEG coating. Using PEG-coated chelated Cu2+ slides, consistently higher purities of recombinant proteins were measured. An optimized wash buffer (PBST) composed of 10 mM phosphate buffer, 2.7 mM KCl, 140 mM NaCl and 0.05% Tween 20, pH 7.4, further improved protein purity levels. Using Escherichia coli cell lysates expressing 90 recombinant Streptococcus pneumoniae proteins, 73 proteins were successfully immobilized, and 66 proteins were in situ purified with greater than 90% purity. We identified several antigens among the in situ-purified proteins via assays with anti-S. pneumoniae rabbit antibodies and a human patient antiserum, as a demonstration project of large scale microarray-based immunoproteomics profiling. The methodology is compatible with higher throughput formats of in vivo protein expression, eliminates the need for resin-based purification and circumvents protein solubility and denaturation problems caused by buffer exchange steps and freeze-thaw cycles, which are associated with resin-based purification, intermittent protein storage and deposition on microarrays.
An optimized platform for in situ protein purification on microarray slides using His-tagged recombinant proteins is a desirable tool for the screening of novel protein functions and protein-protein interactions. In the context of immunoproteomics, such protein microarrays are complementary to approaches using non-recombinant methods to discover and characterize bacterial antigens.
The in vitro stationary phase proteome of the human pathogen Shigella dysenteriae serotype 1 (SD1) was quantitatively analyzed in Coomassie Blue G250 (CBB)-stained 2D gels. More than four hundred and fifty proteins, of which 271 were associated with distinct gel spots, were identified. In parallel, we employed 2D-LC-MS/MS followed by the label-free computationally modified spectral counting method APEX for absolute protein expression measurements. Of the 4502 genome-predicted SD1 proteins, 1148 proteins were identified with a false positive discovery rate of 5% and quantitated using 2D-LC-MS/MS and APEX. The dynamic range of the APEX method was approximately one order of magnitude higher than that of CBB-stained spot intensity quantitation. A squared Pearson correlation analysis revealed a reasonably good correlation (R2 = 0.67) for protein quantities surveyed by both methods. The correlation was decreased for protein subsets with specific physicochemical properties, such as low Mr values and high hydropathy scores. Stoichiometric ratios of subunits of protein complexes characterized in E. coli were compared with APEX quantitative ratios of orthologous SD1 protein complexes. A high correlation was observed for subunits of soluble cellular protein complexes in several cases, demonstrating versatile applications of the APEX method in quantitative proteomics.
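A squared Pearson correlation of the kind reported above can be computed as in the following sketch. The input values are toy numbers, and the text does not state whether abundances were transformed (for example, to a log scale) before the comparison, so any such step is left to the caller.

```python
import math


def squared_pearson(x, y):
    """Squared Pearson correlation coefficient (R^2) of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * r


# Hypothetical per-protein quantities from two methods (e.g. spot intensity vs. APEX).
gel_values = [1.0, 2.0, 3.0, 4.0]
apex_values = [1.0, 3.0, 2.0, 4.0]
r2 = squared_pearson(gel_values, apex_values)
print(round(r2, 2))
```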
Mass spectrometry (MS) based label-free protein quantitation has mainly focused on analysis of ion peak heights and peptide spectral counts. Most analyses of tandem mass spectrometry (MS/MS) data begin with an enzymatic digestion of a complex protein mixture to generate smaller peptides that can be separated and identified by an MS/MS instrument. Peptide spectral counting techniques attempt to quantify protein abundance by counting the number of detected tryptic peptides and their corresponding MS spectra. However, spectral counting is confounded by the fact that peptide physicochemical properties severely affect MS detection resulting in each peptide having a different detection probability. Lu et al. (2007) described a modified spectral counting technique, Absolute Protein Expression (APEX), which improves on basic spectral counting methods by including a correction factor for each protein (called Oi value) that accounts for variable peptide detection by MS techniques. The technique uses machine learning classification to derive peptide detection probabilities that are used to predict the number of tryptic peptides expected to be detected for one molecule of a particular protein (Oi). This predicted spectral count is compared to the protein's observed MS total spectral count during APEX computation of protein abundances.
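The correction described above can be sketched as follows: each protein's observed spectral count is divided by its expected count per molecule (Oi), and the corrected values are normalized across all proteins. The optional identification-probability weight and the scaling constant follow our reading of the formulation in Lu et al. (2007); they are not described in the text above and should be treated as assumptions, as should all names and input values. This is not the APEX tool's code.

```python
def apex_abundances(observed_counts, expected_counts, id_probs=None, scale=1.0):
    """Oi-corrected, normalized spectral counts.

    observed_counts: protein -> observed total MS/MS spectral count (n_i)
    expected_counts: protein -> expected count per molecule (O_i), must be > 0
    id_probs:        protein -> identification probability (assumed weight; 1.0 if omitted)
    scale:           constant converting fractions to absolute units (assumption)
    """
    if id_probs is None:
        id_probs = {name: 1.0 for name in observed_counts}
    corrected = {
        name: id_probs[name] * observed_counts[name] / expected_counts[name]
        for name in observed_counts
    }
    total = sum(corrected.values())
    return {name: scale * value / total for name, value in corrected.items()}


# Two hypothetical proteins with equal observed counts but different detectability.
observed = {"protA": 30, "protB": 30}
expected = {"protA": 3.0, "protB": 1.0}
abundances = apex_abundances(observed, expected)
print(abundances)
```

In this toy case both proteins yield 30 spectra, but protA is expected to produce three times as many detectable peptides per molecule, so its estimated abundance is one third that of protB.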
The APEX Quantitative Proteomics Tool, introduced here, is a free, open-source Java application that supports the APEX protein quantitation technique. The APEX tool uses data from standard tandem mass spectrometry proteomics experiments and provides computational support for APEX protein abundance quantitation through a set of graphical user interfaces that partition the parameter controls for the various processing tasks. The tool also provides a Z-score analysis for identification of significant differential protein expression, a utility to assess APEX classifier performance via cross validation, and a utility to merge multiple APEX results into a standardized format in preparation for further statistical analysis.
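One common way to compute a Z-score for differential expression from count data is a two-proportion test, sketched below. Whether the tool computes its Z-score in exactly this way is not stated here, so this is an assumption for illustration. The function name and the input numbers are hypothetical.

```python
import math


def two_proportion_z(count_a, total_a, count_b, total_b):
    """Z-score for the difference between a protein's share in two samples.

    count_*: counts attributed to the protein in each sample
    total_*: total counts in each sample
    """
    f_a = count_a / total_a
    f_b = count_b / total_b
    # Pooled fraction under the null hypothesis of no difference
    f_0 = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(f_0 * (1.0 - f_0) * (1.0 / total_a + 1.0 / total_b))
    if std_err == 0:
        raise ValueError("Z-score is undefined when the pooled fraction is 0 or 1")
    return (f_a - f_b) / std_err


# Hypothetical counts for one protein in two conditions
z = two_proportion_z(60, 1000, 40, 1000)
```

A protein would then be flagged as differentially expressed when the absolute Z-score exceeds a chosen threshold, for example the value corresponding to the desired significance level.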
The APEX Quantitative Proteomics Tool provides a simple means to quickly derive hundreds to thousands of protein abundance values from standard liquid chromatography-tandem mass spectrometry proteomics datasets. The APEX tool provides a straightforward intuitive interface design overlaying a highly customizable computational workflow to produce protein abundance values from LC-MS/MS datasets.
DNA resequencing arrays enable rapid acquisition of high-quality sequence data. This technology represents a promising platform for rapid high-resolution genotyping of microorganisms. Traditional array-based resequencing methods have relied on the use of specific PCR-amplified fragments from the query samples as hybridization targets. While this specificity in the target DNA population reduces the potential for artifacts caused by cross-hybridization, the subsampling of the query genome limits the sequence coverage that can be obtained and therefore reduces the technique's resolution as a genotyping method. We have developed and validated an Affymetrix Inc. GeneChip® array-based, whole-genome resequencing platform for Francisella tularensis, the causative agent of tularemia. We developed a set of bioinformatic filters that targeted systematic base-calling errors caused by cross-hybridization between the whole-genome sample and the array probes and by deletions in the sample DNA relative to the chip reference sequence. Our approach eliminated 91% of the false-positive single-nucleotide polymorphism calls identified in the SCHU S4 query sample, at the cost of 10.7% of the true positives, yielding a total base-calling accuracy of 99.992%.
Multi-drug tolerance is a key phenotypic property that complicates the sterilization of mammals infected with Mycobacterium tuberculosis. Previous studies have established that iniBAC, an operon that confers multi-drug tolerance to M. bovis BCG through an associated pump-like activity, is induced by the antibiotics isoniazid (INH) and ethambutol (EMB). An improved understanding of the functional role of antibiotic-induced genes and the regulation of drug tolerance may be gained by studying the factors that regulate antibiotic-mediated gene expression. An M. smegmatis strain containing a lacZ gene fused to the promoter of M. tuberculosis iniBAC (PiniBAC) was subjected to transposon mutagenesis. Mutants with constitutive expression and increased EMB-mediated induction of PiniBAC::lacZ mapped to the lsr2 gene (MSMEG6065), which encodes a small basic protein of unknown function that is highly conserved among mycobacteria. These mutants had a marked change in colony morphology and generated a new polar lipid. Complementation with multi-copy M. tuberculosis lsr2 (Rv3597c) returned PiniBAC expression to baseline, reversed the observed morphological and lipid changes, and repressed PiniBAC induction by EMB to below that of the control M. smegmatis strain. Microarray analysis of an lsr2 knockout confirmed upregulation of M. smegmatis iniA and demonstrated upregulation of genes involved in cell wall and metabolic functions. Fully 121 of 584 genes induced by EMB treatment in wild-type M. smegmatis were upregulated (“hyperinduced”) to even higher levels by EMB in the M. smegmatis lsr2 knockout. The most highly upregulated genes and gene clusters had adenine-thymine (AT)–rich 5-prime untranslated regions. In M. tuberculosis, overexpression of lsr2 repressed INH-mediated induction of all three iniBAC genes, as well as another annotated pump, efpA.
The low molecular weight and basic properties of Lsr2 (pI 10.69) suggested that it was a histone-like protein, although it did not exhibit sequence homology with other proteins in this class. Consistent with other histone-like proteins, Lsr2 bound DNA with a preference for circular DNA, forming large oligomers, inhibited DNase I activity, and introduced a modest degree of supercoiling into relaxed plasmids. Lsr2 also inhibited in vitro transcription and topoisomerase I activity. Lsr2 represents a novel class of histone-like proteins that inhibit a wide variety of DNA-interacting enzymes. Lsr2 appears to regulate several important pathways in mycobacteria by preferentially binding to AT-rich sequences, including genes induced by antibiotics and those associated with inducible multi-drug tolerance. An improved understanding of the role of lsr2 may provide important insights into the mechanisms of action of antibiotics and the way that mycobacteria adapt to stresses such as antibiotic treatment.
Understanding the cellular processes stimulated when Mycobacterium tuberculosis is treated with antibiotics may provide clues as to why months of therapy and use of several drugs simultaneously are required to prevent antibiotic resistance. Antibiotic treatment “turns on” or induces certain M. tuberculosis genes. These genes are of special interest because they appear to help M. tuberculosis survive the stress of antibiotic treatment. Our study of the regulation of antibiotic-induced genes, including iniBAC, in two mycobacterial species revealed that a small protein called Lsr2 controls iniBAC and other antibiotic-induced genes, especially ones related to the cell wall. Lsr2 binds to DNA in a relatively non-specific manner and appears to inhibit certain enzymes that must interact with DNA as part of their function. These properties differentiate Lsr2 from classical regulators of gene expression that bind to specific DNA sequences, and suggest that Lsr2 is a novel histone-like protein. These proteins regulate genes by changing the way DNA is shaped, and, indeed, we found that Lsr2 can change the shape of DNA by introducing a small number of coils into its structure. Our results suggest that Lsr2 is a major regulator of antibiotic-induced responses in mycobacteria.
Mycobacterium tuberculosis strains contain different genomic insertions or deletions called large sequence polymorphisms (LSPs). Distinguishing between LSPs that occur one time versus ones that occur repeatedly in a genomic region may provide insights into the biological roles of LSPs and identify useful phylogenetic markers. We analyzed 163 clinical M. tuberculosis isolates for 17 LSPs identified in a genomic comparison of M. tuberculosis strains H37Rv and CDC1551. LSPs were mapped onto a single-nucleotide polymorphism (SNP)-based phylogenetic tree created using nine novel SNP markers that were found to reproduce a 212-SNP-based phylogeny. Four LSPs (group A) mapped to a single SNP tree segment. Two LSPs (group B) and 11 LSPs (group C) were inferred to have arisen independently in the same genomic region either two or more than two times, respectively. None of the group A LSPs but one group B LSP and five group C LSPs were flanked by IS6110 sequences in the reference strains. Genes encoding members of the proline-glutamic acid or proline-proline-glutamic acid protein families were present only in group B or C LSPs. SNP- versus LSP-based phylogenies were also compared. We classified each isolate into 58 LSP types by using a separate LSP-based phylogenetic analysis and mapped the LSP types onto the SNP tree. LSPs often assigned isolates to the correct phylogenetic lineage; however, significant mistakes occurred for 6/58 (10%) of the LSP types. In conclusion, most LSPs occur in genomic regions that are prone to repeated insertion/deletion events and were responsible for an unexpectedly high degree of genomic variation in clinical M. tuberculosis. Group B and C LSPs may represent polymorphisms that occur due to selective pressure and affect the phenotype of the organism, while group A LSPs are preferable phylogenetic markers.
We used Porphyromonas gingivalis gene microarrays to compare the total gene contents of the virulent strain W83 and the avirulent type strain, ATCC 33277. Signal ratios and scatter plots indicated that the chromosomes were very similar, with approximately 93% of the predicted genes in common, while at least 7% of them showed very low or no signals in ATCC 33277. Verification of the array results by PCR indicated that several of the disparate genes were either absent from or variant in ATCC 33277. Divergent features included already reported insertion sequences and ragB, as well as additional hypothetical and functionally assigned genes. Several of the latter were organized in a putative operon in W83 and encoded enzymes involved in capsular polysaccharide synthesis. Another cluster was associated with two paralogous regions of the chromosome with a low G+C content, at 41%, compared to that of the whole genome, at 48%. These regions also contained conserved and species-specific hypothetical genes, transposons, insertion sequences, and integrases and were located adjacent to tRNA genes; thus, they had several characteristics of pathogenicity islands. While this global comparative analysis showed the close relationship between W83 and ATCC 33277, the clustering of genes that are present in W83 but divergent in or absent from ATCC 33277 is suggestive of chromosomal islands that may have been acquired by lateral gene transfer.
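Regions whose G+C content departs from the genome average, such as the 41% versus 48% contrast noted above, are often located with a sliding-window calculation. The following is a minimal sketch of that idea and not the analysis performed in the study. The function names, window size, and example sequence are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in bases) / len(bases)


def windowed_gc(seq, window, step):
    """Return (start, G+C fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical sequence: a GC-rich stretch followed by an AT-rich stretch
profile = windowed_gc("GGGGAAAA", window=4, step=4)
```

Windows whose value falls well below the whole-genome figure would be candidates for closer inspection as possible islands.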
The Mycobacterium tuberculosis alternate sigma factor, SigF, is expressed during stationary growth phase and under stress conditions in vitro. To better understand the function of SigF we studied the phenotype of the M. tuberculosis ΔsigF mutant in vivo during mouse infection, tested the mutant as a vaccine in rabbits, and evaluated the mutant's microarray expression profile in comparison with the wild type. In mice the growth rates of the ΔsigF mutant and wild-type strains were nearly identical during the first 8 weeks after infection. At 8 weeks, the ΔsigF mutant persisted in the lung, while the wild type continued growing through 20 weeks. Histopathological analysis showed that both wild-type and mutant strains had similar degrees of interstitial and granulomatous inflammation during the first 12 weeks of infection. However, from 12 to 20 weeks the mutant strain showed smaller and fewer lesions and less inflammation in the lungs and spleen. Intradermal vaccination of rabbits with the M. tuberculosis ΔsigF strain, followed by aerosol challenge, resulted in fewer tubercles than did intradermal M. bovis BCG vaccination. Complete genomic microarray analysis revealed that 187 genes were relatively underexpressed in the absence of SigF in early stationary phase, 277 in late stationary phase, and only 38 genes in exponential growth phase. Numerous regulatory genes and those involved in cell envelope synthesis were down-regulated in the absence of SigF; moreover, the ΔsigF mutant strain lacked neutral red staining, suggesting a reduction in the expression of envelope-associated sulfolipids. Examination of 5′-untranslated sequences among the downregulated genes revealed multiple instances of a putative SigF consensus recognition sequence: GGTTTCX18GGGTAT. These results indicate that in the mouse the M. tuberculosis ΔsigF mutant strain persists in the lung but at lower bacterial burdens than wild type and is attenuated by histopathologic assessment. 
Microarray analysis has identified SigF-dependent genes and a putative SigF consensus recognition site.
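A search of upstream sequences for the putative consensus GGTTTCX18GGGTAT can be sketched with a regular expression. The sketch assumes that X18 denotes a spacer of exactly 18 unspecified nucleotides. It scans only the strand it is given, and the function name and test sequence are hypothetical.

```python
import re

# GGTTTC, then 18 arbitrary nucleotides (assumed meaning of X18), then GGGTAT.
# The lookahead lets overlapping candidate sites be reported as well.
SIGF_MOTIF = re.compile(r"(?=(GGTTTC[ACGT]{18}GGGTAT))")


def find_sigf_sites(sequence):
    """Return (0-based start, matched site) for each motif occurrence."""
    seq = sequence.upper()
    return [(m.start(1), m.group(1)) for m in SIGF_MOTIF.finditer(seq)]


# Hypothetical upstream region containing one site at offset 4
upstream = "AAAA" + "GGTTTC" + "A" * 18 + "GGGTAT" + "CC"
sites = find_sigf_sites(upstream)
```

To cover both strands, the same function could be applied to the reverse complement of each upstream region. Allowing mismatches in the two conserved hexamers would require a scoring approach rather than an exact pattern.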