Mass spectrometry-based proteomics is a powerful analytical tool for investigating pathogens and their interactions within a host. The sensitivity of such analyses provides broad proteome characterization, but the sample-handling procedures must first be optimized to ensure compatibility with the technique and to maximize the dynamic range of detection. The decision-making process for determining optimal growth conditions, preparation methods, sample analysis methods, and data analysis techniques in our laboratory is discussed herein with consideration of the balance in sensitivity, specificity, and biomass losses during analysis of host-pathogen systems.
The elucidation of critical functional pathways employed by pathogens and hosts during an infectious cycle is both challenging and central to our understanding of infectious diseases. In recent years, mass spectrometry–based proteomics has been used as a powerful tool to identify key pathogenesis-related proteins and pathways. Despite the analytical power of mass spectrometry–based technologies, samples must be appropriately prepared to broadly characterize the functions of interest (e.g., host response to a pathogen or pathogen response to a host). The preparation of these protein samples often requires consideration of what aspect of infection is being studied, and may require the isolation of host cellular material, pathogen cellular material, or both. This is especially true of viruses and bacteria that have no host-free model culturing conditions. The decision-making process for determining growth conditions, preparation methods, sample analysis methods, and data analysis techniques is discussed in the context of sample quantities and recovery. Specifically, we discuss strategies we have employed recently to extract proteins from different pathogen-host systems including Salmonella enterica serovars Typhimurium and Typhi, Vaccinia virus, Monkeypox virus, Anaplasma phagocytophilum, and Ehrlichia chaffeensis. Using different cellular preparations, partitioning of proteins, protein digestion protocols, and fractionation of resultant peptides, we were able to tailor our methods to different experimental goals so as to identify as many proteins and peptides as possible even from limited sample sizes. Tradeoffs in sensitivity, specificity, and potential loss of context must be considered during analysis of host-pathogen systems, and have implications for the discovery of novel treatments and/or therapeutics for existing and emerging pathogens.
This article is focused on obtaining proteomics information regarding host-pathogen interactions and systems biology information regarding those interactions utilizing MS-based proteomics.
Unless otherwise stated, all reagents were purchased from Sigma-Aldrich (St. Louis, MO) or one of its subsidiaries. Reagents used: urea; thiourea; 3-[(3-cholamidopropyl) dimethylammonio]-1-propanesulfonate (CHAPS); DL-dithiothreitol (DTT); CaCl2; Rapigest SF (Waters, Milford, MA); ammonium bicarbonate; ammonium formate; acetonitrile and methanol (ThermoFisher Scientific, Fair Lawn, NJ); trifluoroacetic acid (TFA); sequencing grade modified trypsin (Promega, Madison, WI); and formic acid. The bicinchoninic acid (BCA) and Coomassie Plus protein assays (Pierce Biotechnology, Rockford, IL) were used to quantify protein and peptide concentrations. Deionized water was purified through a Nanopure water system (Barnstead International, Dubuque, IA).
The monkeypox and vaccinia viruses were extracted from HeLa cells as described in the literature.1 In brief, monkeypox and vaccinia viruses were grown in HeLa cells and purified using sucrose gradient ultracentrifugation.
E. chaffeensis Arkansas2 and A. phagocytophilum HZ3 strains were cultivated in the human leukemia cell line HL-60 in RPMI-1640 medium containing 10% heat-inactivated fetal bovine serum (US Biotechnologies, Parker Ford, PA) and 2% L-glutamine (Invitrogen, Carlsbad, CA) at 37°C in a humidified atmosphere of 5% CO2/95% air.4 No antibiotic was used throughout the study. The degree of bacterial infection in host cells was assessed by Diff-Quik staining of cytocentrifuged preparations (Baxter Scientific Products, Obetz, OH). Host cell-free bacteria were prepared from heavily infected host cells (>95% infected cells) by sonication for 10 s with an ultrasonic processor W-380 (Heat Systems, Farmington, NY) at an output setting of 2. After low-speed centrifugation at 700 × g to remove nuclei and unbroken cells, the supernatant was filtered through 5-μm then 0.8-μm filters (Millipore, Billerica, MA) to remove cellular debris. The filtrate was then centrifuged at 10,000 × g for 10 min, and the pellet enriched with host cell-free bacteria was collected.
Salmonella enterica serovar Typhimurium (S. Typhimurium) were isolated from RAW 264.7 macrophage cells as described in the literature.5 In brief, RAW 264.7 macrophage cells were infected with S. Typhimurium and purified using one of two methods. One is a gentler, mechanical method (no detergent) that is focused on obtaining organelles as well as the Salmonella-containing vacuoles (SCV). The second method utilizes detergents and differential solubilization, taking advantage of the detergent resistance of S. Typhimurium cells in comparison to the macrophages.
A global digest was performed on the whole cell lysate for each sample of interest. The sample material was resuspended in an equal volume of 100 mM NH4HCO3 buffer. The resulting suspension (in 200-μL aliquots) was then transferred to a 2.0-mL cryovial (with O-ring in cap), and 0.1-mm zirconia/silica disruption beads (BioSpec Products, Bartlesville, OK) were added to equal approximately half of the total volume in the tube. The tube was then vigorously vortexed for 30 sec. The tube was then cooled for 1 min at 4°C in a cold-block. The vortexing step was repeated five times, with the final cooling time being 5 min to reduce any possible aerosols that might contain pathogens. The solution was drawn off of the top of the beads and transferred to a 2.0-mL low-binding microcentrifuge tube. A 200-μL aliquot of buffer was added to the beads as a rinse; the tube was briefly vortexed and cooled for 1 min. The rinse was drawn off of the beads and transferred to the microcentrifuge tube containing the original lysate. The rinse step was performed four to six times until the rinse solution was clear. A protein assay (either BCA or Coomassie Plus) was performed on the resultant lysate, and the volume was noted. Urea and thiourea were added to the sample to obtain a solute concentration of 7 M and 2 M, respectively. A 50-mM solution of DTT was used to obtain a 5 mM concentration in the sample. The sample was incubated at 60°C for 30 min to assist with the denaturation and reduction of the proteins. The sample was then diluted 10-fold with 100 mM NH4HCO3 buffer, trypsin was added in a 1:50 (w:w) enzyme:protein ratio, and CaCl2 was added to a final concentration of 1 mM. The sample was then incubated for 3 h at 37°C, and then was quick frozen to stop the digestion. The sample was thawed and solid phase extraction (SPE) cleanup was performed to prepare the sample for mass spectrometry analysis. A Discovery C-18 SPE column (Supelco, Bellefonte, PA) was used for each sample. 
The column was conditioned with 2 mL methanol and 3 mL of 0.1% TFA in water. After the sample was introduced onto the column, it was rinsed with 4 mL of 95:5 water:acetonitrile with 0.1% TFA. The sample was eluted with 80:20 acetonitrile:water with 0.1% TFA, concentrated in a Savant Speed-vac (ThermoFisher, Milford, MA) to approximately 100 μL, and a BCA protein assay was performed to determine the final sample concentration.
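The reagent amounts in the global digest follow from simple dilution arithmetic. The sketch below is only an illustration of that arithmetic, not part of the published protocol. The function names are hypothetical, the DTT calculation assumes the target concentration refers to the volume after the stock is added, and the urea/thiourea masses ignore the volume increase on dissolution, which is not negligible at these concentrations.

```python
# Illustrative bench arithmetic for the global digest (hypothetical helper names).
# Molar masses are standard values in g/mol.
UREA_G_PER_MOL = 60.06
THIOUREA_G_PER_MOL = 76.12


def stock_volume_ul(sample_ul, stock_mm, final_mm):
    """Volume of stock (uL) to add so the mixture reaches final_mm.

    Assumes the final concentration refers to the combined volume:
    stock_mm * v = final_mm * (sample_ul + v).
    """
    if final_mm >= stock_mm:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_ul * final_mm / (stock_mm - final_mm)


def solid_mass_mg(volume_ul, molar, g_per_mol):
    """Approximate mass (mg) of solid needed for a target molarity.

    Assumption: the volume change caused by dissolving the solid is ignored.
    """
    return volume_ul * 1e-6 * molar * g_per_mol * 1e3


def trypsin_ug(protein_ug, protein_per_enzyme=50):
    """Trypsin mass (ug) for a 1:protein_per_enzyme (w:w) enzyme:protein ratio."""
    return protein_ug / protein_per_enzyme


if __name__ == "__main__":
    lysate_ul = 500.0  # hypothetical lysate volume
    print("DTT stock (uL):", round(stock_volume_ul(lysate_ul, 50, 5), 1))
    print("Urea (mg):", round(solid_mass_mg(lysate_ul, 7, UREA_G_PER_MOL), 1))
    print("Thiourea (mg):", round(solid_mass_mg(lysate_ul, 2, THIOUREA_G_PER_MOL), 1))
    print("Trypsin (ug):", trypsin_ug(1000.0))
```

The same helpers apply to the insoluble preparation by substituting the 100 mM DTT stock and 10 mM target.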
A global protein lysate can be partitioned into two parts: a soluble and an insoluble protein fraction. To this end, the lysates were prepared as for the global digest, with the exception that 50 mM NH4HCO3 buffer was added during the final rinse step. The samples were centrifuged at 355,000 × g at 4°C for 10 min using a Beckman Optima TL ultracentrifuge (Beckman Coulter, Fullerton, CA). The supernatant was removed and saved for the soluble digest (see below). A 100- to 200-μL aliquot of 50 mM NH4HCO3 buffer was added to the pellet, and the pellet was resuspended with vigorous pipetting. Another centrifugation at 355,000 × g (at 4°C for 10 min) was performed, and the supernatant was added to the previously collected soluble fraction. The pellet was resuspended in a 100–200-μL aliquot of 50 mM NH4HCO3 buffer with vigorous pipetting. A BCA protein assay was performed on the suspension, and the sample was centrifuged again as above. The supernatant was discarded and 100–200 μL of a denaturing solution (7 M urea, 2 M thiourea, 1% CHAPS in 50 mM NH4HCO3) was added to the pellet. A 100-mM solution of DTT was used to obtain a 10-mM concentration in the sample. The sample was then incubated at 60°C for 30 min to assist with the denaturation and reduction of the proteins. The sample was then diluted and enzymatically digested in the same manner as the global digest, with the exception that the buffer used for dilution was 50 mM NH4HCO3. The sample was thawed and an SPE cleanup was performed to prepare the sample for mass spectrometric analysis. A Discovery SCX SPE column (Supelco, Bellefonte, PA) was used for each sample containing CHAPS. Solvents are referred to as follows: Solvent A was 10 mM ammonium formate in 25% acetonitrile, pH 3.0 and Solvent B was 500 mM ammonium formate in 25% acetonitrile, pH 6.8. The column was conditioned with 2 mL methanol, 2 mL Solvent A, 2 mL Solvent B, 2 mL Solvent A, 2 mL water, and 4 mL Solvent A. 
After the sample was acidified to a pH of ~3 with 20% formic acid and centrifuged at 16,000 × g for 5 min, the supernatant was introduced onto the column and rinsed with 4 mL of Solvent A. The sample was eluted with 80:15:5 MeOH:H2O:NH4OH and concentrated in a Savant Speed-vac (Thermo Fisher, Milford, MA) to approximately 100 μL, and a BCA protein assay was performed to estimate the final peptide concentration.
The soluble protein fraction as acquired above was assayed for protein concentration (either BCA or Coomassie Plus) and the final volume was noted. The sample was then reduced, denatured, diluted, tryptically digested, and cleaned up using C-18 SPE in the same fashion as in the global digest method.
The global protein fraction tends to be depleted of insoluble and membrane proteins. One method to offer improved insoluble and membrane protein coverage in a proteome analysis or to solubilize viral particles is through the use of an acid-labile surfactant, such as Rapigest SF. To perform a digestion using Rapigest, the sample material was resuspended in 2× its volume with 100 mM NH4HCO3, transferred to a 2.0-mL cryovial (with O-ring in cap), and the final volume noted. Rapigest SF surfactant was diluted to 1% using 100 mM NH4HCO3, then added so that the final concentration in the sample was 0.1%. The sample was then placed in boiling water for 5 min, followed by immediate quenching in ice for 5 min. A BCA protein assay was performed on the lysate, and volume was noted. Trypsin was added at a 1:50 enzyme:protein (w:w) ratio, and the sample was incubated for 1 h at 37°C to digest. Two percent TFA was added to the sample in small aliquots until the pH of the sample was ~3.0. The sample was incubated at 37°C for 1 h, followed by flash freezing in liquid nitrogen to assist with precipitation of the now insoluble surfactant. The sample was thawed, centrifuged for 5 min at 16,000 × g, and the resultant supernatant was transferred to a microcentrifuge tube. The pH of the sample was raised to ~7 using 6% NH4OH, and a BCA protein assay was performed to estimate the final peptide concentration.
A description of the instrumentation and specifics of the high-performance liquid chromatography–tandem MS (HPLC-MS/MS) and HPLC-MS instrumental arrangements and associated methods for each biological system has been published elsewhere1,5–7 and is consistent for all experiments presented here. In brief, samples were loaded onto an in-house developed chromatography system that uses a 20-cm × 75-μm C18 reverse-phase column and were ionized by electrospray ionization as they eluted from the column into the mass spectrometer. The liquid chromatography gradient was generated linearly from aqueous to organic over 100 min in acidic conditions. Typically, MS was performed in a linear trap quadrupole (LTQ; Thermo Fisher Scientific) ion trap mass spectrometer. Tandem MS (MS/MS) spectra were collected using data-dependent settings on the top 10 ions from the precursor scan.
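The data-dependent selection of the top 10 precursor ions can be pictured with a short sketch. This is a conceptual illustration only, not the instrument control logic: the function name and the peak representation are hypothetical, and features of real acquisition methods such as dynamic exclusion and charge-state filtering are deliberately left out.

```python
# Conceptual sketch of top-N data-dependent precursor selection.
# A precursor scan is represented here as a list of (m/z, intensity) pairs.


def select_top_n_precursors(precursor_scan, n=10):
    """Return the m/z values of the n most intense peaks, most intense first."""
    ranked = sorted(precursor_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _intensity in ranked[:n]]


if __name__ == "__main__":
    scan = [(445.12, 1.0e4), (522.78, 8.5e5), (630.31, 2.2e5), (811.40, 4.0e3)]
    # With n=2 only the two most intense ions would be sent for MS/MS.
    print(select_top_n_precursors(scan, n=2))
```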
Tandem MS spectra (MS/MS) were matched to protein sequence files for the organism or host-pathogen system studied (Table 1) using SEQUEST.8 Filters for peptide identification scores were defined according to the acceptable false discovery rates in each system, based on a reversed database search1,6,7 or were consistent with those filters considered acceptable when using ion trap instruments.9,10
Well-designed proteomics studies can quickly result in novel insights into host-pathogen interactions. An influence diagram (Figure 1) is one way to consider the interrelated decisions of host-pathogen experiments. This tool helps identify the steps that may influence or impact others and to what degree. This balancing of the various pros and cons at each step of the experimental design for host-pathogen studies will be discussed, including the influences of those decisions. We will follow the diagram on the left side of Figure 1 as headers for the discussion below.
A thorough knowledge of the types of sample preparation methods available is essential for a successful experimental design. For instance, when comparing a global protein preparation versus a paired soluble/insoluble protein preparation, it should be noted that the global digest requires fewer sample handling steps. However, the soluble and insoluble preparations offer improved results with regards to possible cellular location and hydrophobicity of the proteins. If the goal of the proposed study is to understand the membrane associated protein changes during infection, then an insoluble preparation method should be chosen, keeping in mind that there will be an increase in preparation steps and possible sample losses. Another important consideration for sample preparation methods is the quantity of initial biological sample and total protein content. If the sample is limited, due to growth/extraction procedures, then a simpler preparation method may be necessary, such as an in-solution preparation without a cleanup step (e.g., organic denaturants, cleavable surfactants).1,11–13 If one is looking for specific proteins of interest, then isolation of those proteins by molecular weight (1D- or 2D-gel electrophoresis), followed by in-gel digestion14 or gel comparison,15 would be one possible path to follow.
To discover new proteins involved in host-pathogen interactions, experiments will typically be designed to compare the protein complement between different growth conditions, host types, change-in-function mutants, or changes over a time course. These comparisons are used to differentiate between proteins of higher importance and those of lesser importance. However, the replicate analyses needed for good statistical determination of changes observed in comparison-based experiments can quickly overwhelm available resources (both physical and financial) and often result in limited returns. These experiments must therefore be planned extensively to avoid costly errors and to obtain quality data.
The items noted above have influenced the experimental design in several studies performed by our group. Figure 2 highlights the general growth conditions, the pros and cons for each condition, the challenges we experienced for the preparation of samples for each condition, and the organisms processed as well as the preparative methods utilized for each condition. Determining an experimental design to obtain the expected biological insight for an organism of interest is a challenge at many steps, and we discuss these in greater detail below.
The obvious first step of experimental design is determining which proteins are of interest and where they are likely to be located. For example, if the research question is to elucidate the host’s response to a pathogen, the proteins of primary interest are those isolated from the host cells during infection. However, analysis of the host’s interaction with the pathogen may actually suggest that a gentle isolation of the pathogen from host cells may be more valuable to identify which host proteins are actually interacting with the pathogen’s surface.16,17 If the sole interest is the pathogen, then the growth conditions of the pathogen must be considered. Many pathogens cannot be grown in free-living batch culture; they must be grown in a host cell (e.g., vaccinia virus, E. chaffeensis, A. phagocytophilum) or even, in the case of some uncultivable bacteria such as Treponema pallidum and Mycobacterium leprae, from a host itself. If the pathogen can be grown in culture, large amounts of sample can be generated, but one drawback is that the proteins expressed may not represent an infectious condition. Some pathogens have growth conditions that are thought to mimic at least some aspects of a host infection.6,7,18–20 Even though these growth conditions can offer useful insights, it must be recognized that they are unlikely to completely explain the pathogen’s response to infecting a host. Once the subset of proteins to be studied is determined, logical decisions can be made regarding how the proteins are to be isolated. Often, the cells or virions of interest must be isolated, then alternate separations or fractionations (e.g., strong cation exchange) performed to isolate those proteins.1,5,12,21 Alternatively, methods such as gel electrophoresis or isoelectric focusing can be utilized to isolate the proteins as Rosenfeld, et al. and Han, et al. have demonstrated.14,15
Depending upon whether the host, pathogen, or host-pathogen pair is to be emphasized during analysis, there are multiple choices for growth conditions. The most difficult biomaterial, from a homogeneity and access aspect, is infected tissue, because it must be extracted from a larger organism and the organs/cells of interest (where the pathogen would be infecting) may not be isolatable. While the use of tissue may be among the most biologically relevant models to study, the difficulty of obtaining the sample, lack of biological homogeneity at the organism and cellular level, and masking effects of surrounding healthy or uninfected tissues suggest this approach requires especially careful planning.12,22,23 When possible, co-cultured pathogen and host cells can offer a more controlled environment to study the effects of the pathogen on clonal host cells. While this gives a slightly better “narrowing of focus” (tissue vs cell type), it does not perfectly reflect the infectious conditions in a live host. Additionally, while one might not expect to observe the pathogen proteins from tissue, it becomes an important goal to observe pathogen proteins from co-cultured pathogen and host cells. If the proteins expressed in the pathogen are the primary goal, then isolation of the pathogen from the co-cultured samples is a good approach for analysis of the samples, but the host proteins are likely to be major contaminants that provide challenges to complete coverage of the pathogen’s host-associated proteome. As an example, analysis of Salmonella isolated from macrophage cells demonstrated that despite enrichment of the bacterial pathogens a large number of abundant and less abundant host cell proteins were observed.5 In this multi-organism system, the bacterial cells were of similar size and shape to the host cell’s mitochondria (Figure 3), requiring harsher treatment for effective enrichment of Salmonella.
When available, cultured organisms provide a means of isolating pathogenic agents and measuring the response of the organism to stressors such as nutrient limitations and potential antimicrobial drugs, while obtaining a large biomass that allows for multiple experiments, including the possible addition of metabolomics and lipidomics analyses. However, culturing pathogens free of host proteins does require additional efforts to follow up observations in more biologically relevant host systems because growth in culture may not be the same as that found in vivo. If the organism is well characterized, this can be a method for identifying protein pathways by performing mutant studies (e.g., knockouts) under controlled culture conditions. Some systems, such as E. chaffeensis and A. phagocytophilum, cannot be cultured in pure media and, thus, either human cell lines (here, HL-60) or a mouse model must be used.
Additional considerations when deciding which growth condition to utilize can be sample size, complexity of preparation method, time-course studies to be performed, host specificity, and downstream data analysis complications. Tissue samples can often require extremely complicated sample preparation methods, causing loss of proteins of interest, and might be extremely limited if acquired from, for example, a human biopsy or mouse model. Eukaryotic cell culture and growth of a pathogen with host cells can alleviate some of the sample preparation steps, but the biomass may still be limited in quantity, requiring a preparation method that may not be friendly to the analysis method preferred (e.g., detergents with LC-MS). Another consideration for studying pathogen and host together is the complexity of the downstream data analysis. When HeLa cells infected with orthopoxviruses were studied, the potential host proteins outnumbered the viral proteins by 49,161 to 218 (ratio of 225 to 1). Despite these odds, we identified 3 proteins not previously detected (unpublished) from isolated virus preparations using the sample preparation procedures and data analyses described here.1,21 For large batch culture, sample losses associated with more elaborate sample preparation steps do not hinder downstream analysis because of the larger quantity of initial sample. Furthermore, data analysis and interpretation are simplified in a culture system because the number of protein candidates is reduced by an order of magnitude (Table 1, Salmonella in culture versus Salmonella and host macrophage). Unfortunately, not all pathogens (e.g., viruses and obligate pathogens) can be grown in host-free batch cultures to generate the amount of biomass needed to perform a thorough method comparison. In this instance, it is best to perform a more targeted comparison with more modest experimental goals. 
Regardless, once the most appropriate sample(s) to collect is determined, a sample preparation method must be developed to optimize the available sample preparation, MS-based proteomics platform and data analysis tools.
Once the biomass (sample) has been generated, preparation of the sample for instrumental analysis is the next crucial step. The decision of which preparation method to use is based upon factors from the growth conditions as well as the sample analysis method and data analysis to be performed. For example, restricted sample size could limit the options to in-solution preparations (organic denaturants, such as methanol, acetonitrile or trifluoroethanol, or acid-labile surfactants such as Rapigest SF) without cleanup steps (SCX, SPE, or desalting column) or gel electrophoresis (extracting proteins from the gel bands or comparing 2D gels). Location of the protein(s) of interest will influence which preparation method is utilized as well (subcellular fractionation, organelle isolation). Often, the resolution needed in the downstream sample analysis steps requires a preparation method that simplifies the complexity of the proteins/peptides in the system (e.g., 2D HPLC fractionation,24 offgel electrophoresis, isolation of cysteine-containing peptides25). The instrumentation used downstream can often limit the types of chemicals utilized in the preparation methods. For instance, one rarely chooses to use sodium dodecyl sulfate (SDS) or other ionic detergents to prepare samples for LC-MS analysis because the detergent can precipitate either inside the separation column or on the electrospray tip or heated capillary, clogging the system. Sample preparation steps can be sensitive to changes in buffer, temperature, etc.; therefore, prior knowledge of “critical points” in the method(s) to be used is essential before making major changes. Often, methodology changes can be tested on a non-pathogenic system similar to the desired target sample to determine the outcome of modifications.
While one might hope to discover a “perfect” single preparation and analysis method, the truth is that to truly understand the protein complement of the system of interest, multiple methodologies will be necessary. When S. Typhimurium were analyzed from batch cultures, we were able to perform comparisons between global, soluble, insoluble, and Rapigest digests as noted in the methods and materials section.7 This allowed us to develop a comprehensive database of peptides observed. Unfortunately, many facilities may not have the resources for high-throughput analyses; therefore, generation of large quantities of samples may not be feasible. In contrast, limited sample quantities and operations within a Biosafety Level (BSL) 3 facility required selecting a single preparation method meant to produce a broad set of protein observations when analyzing viral samples (Orthopoxvirus).1 When E. chaffeensis and A. phagocytophilum were initially studied in our laboratory, global, soluble, and insoluble preparations were compared on pathogens isolated from host cells (the only way the pathogen can be grown). Those processing methods were not successful when the pathogens and host cells together were studied. Therefore, we experimented by adding various detergents into the global digestion method (normally devoid of detergents). Using a 1%–4% addition of CHAPS (a zwitterionic detergent that can be successfully removed before MS analysis) into the denaturing solution enabled successful digestion of the pathogen + host cell samples. The addition of 1% CHAPS assisted the tryptic digestion of proteins, but not as well as addition of 2% or 4% CHAPS (which gave very similar results when compared on an SDS-PAGE gel to determine the amount of undigested protein). 
Major factors to consider when comparing methods for different systems are ease of use, amount of sample handling, conversion for biosafety procedure use (as most pathogens are BSL2 or higher), highest protein/peptide recovery, and type of proteins recovered. The level of sample handling and ease of use usually went hand-in-hand: the fewer steps used in a procedure, the fewer sample transfers occurred, and the less sample loss was observed. It is also safer when working with BSL2 or higher organisms to limit sample handling steps and reduce possible aerosolization of pathogen lysates. Traditional bead beating methods (used to lyse prokaryotes) can produce heated aerosols of the pathogens due to the high speed of shaking. We converted the traditional method to a vortexing step with beads in solution, followed by chilling to precipitate aerosols. Fewer sample preparation steps can also lead to higher processing rates when faced with larger batches of samples. However, the simplest procedure that requires the fewest sample handling steps may result in a poor protein yield. It is for this reason that, when sample is available, a trial systematic sample preparation comparison is advised, especially when planning a series of experiments that may build upon each other.
Localization of specific proteins within a cell can be extremely helpful with identifying protein generation or transport mechanisms.26,27 In bacterial systems, membrane proteins are often separated from other soluble proteins via centrifugation in order to isolate transport proteins. When studying viruses, it is often difficult to collect specific proteins due to the growth state of the virion within the host cell (e.g., intracellular mature virus, intracellular-, cell-associated-, and extracellular-enveloped virus). In those cases, the virus must often be collected and purified at specific growth stages, then analyzed as a whole.1,28 Overall, these types of purifications can involve extremely complex sample preparation steps that will reduce the sample material by factors of 10 or more.29 The same can be said for studying posttranslational modifications as recoveries from these preparations are often in the single-digit percentile (e.g., phospho- and glyco- proteins).30 However, the improvement of quality of information for specific proteins may outweigh the loss of overall material, depending upon the research question. Reducing complexity can sometimes result in an enhanced “signal to noise” ratio in the experiment. Orthogonal separations of proteins or peptides from pathogens can result in identification of proteins that might be lost in the noise of more prevalent proteins/peptides. Looking at sample location and posttranslational modifications can narrow the focus to identifying the mechanisms either the host or pathogen uses for recognition or signaling, but again the researcher has to weigh the loss of information from sample preparation losses as well.
Sample analysis platforms are as varied as sample preparation methods. However, due to the complexity of the instrumentation and systems, specialized staff are required to keep the equipment running in an accurate and efficient manner. Often, when a researcher is deciding what type of preparation method to use, they have to first consider to which instruments they have access. Often the sensitivity and specificity of the platform are two of the most important factors in deciding how much sample needs to be collected or generated and then prepared for analysis. The selection of instrumentation may therefore be dependent upon the sample size, complexity, and associated contaminants, and if there is limited sample, it may limit the researcher to one platform.
When analyzing any pathogenic system in our facility using the accurate mass and time tag approach,31–33 systems of differing sensitivity are utilized. Often, two-dimensional separations are performed, with strong-cation exchange being performed offline, and a high-resolution reversed-phase separation coupled to the MS instrument. The samples may be grossly characterized using LC34 coupled to linear ion-trap systems (e.g., linear trap quadrupole (LTQ) systems, Thermo Fisher Scientific), then subsequently in finer detail and with replicates on a high-resolution Fourier-transform ion cyclotron resonance (FTICR) (Bruker and Thermo Fisher Scientific) or LTQ-Orbitrap (Thermo Fisher Scientific) system.1,5,19,21
In the case of cultured bacteria in rich or minimal media, the protein sequence file generally used for identifying peptides, and ultimately proteins, is the latest genome annotation for the organism and one that ideally includes references to other information such as protein function and location. Unfortunately, not every bacterium subject to study has been sequenced. Connecting to databases of functional information is also difficult when a protein requires cross-referencing to a different protein identifier. However, once an annotation is found, searching MS/MS spectra is relatively well developed and can be automated using tools such as SEQUEST, X!Tandem, MASCOT, and others.8,35,36 The proteomic analysis can also be critical to the identification of correctly annotated proteins by considering all open reading frames as potentially translated proteins.37
For pathogens grown in or extracted from host cells, a protein search data file that combines both the host proteins and the pathogen proteins should be used. A protein data file that contains only the host or only the pathogen can lead to misidentifications: a spectrum may pass the scoring thresholds of the software tool with an incorrect sequence, even though it would have received a much better “score” had the true sequence been present in a complete protein data file. The larger combined data files do result in longer search times and introduce a different source of potential false identifications, because more sequences match a particular mass range (determined from the mass of the precursor ion used for fragmentation) than in single-organism files. This “distraction” makes distinguishing a true sequence from closely scoring matches more challenging. Peptide score filter sets may need to be altered to reduce false discovery, and reversed or decoy searches can be used to define these filters.38–41
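The decoy-based filtering described above can be sketched as follows. This is a minimal illustration, not the procedure used by any particular search tool: the function names, the `REV_` prefix, the example sequences, and the simple decoys/targets estimate of false discovery are all assumptions made for the example.

```python
def add_reversed_decoys(proteins):
    """Return the target entries plus one reversed-sequence decoy per entry.

    `proteins` maps a protein name to its amino acid sequence. The "REV_"
    prefix is a hypothetical naming convention chosen for this sketch.
    """
    combined = dict(proteins)
    for name, sequence in proteins.items():
        combined["REV_" + name] = sequence[::-1]
    return combined


def score_threshold_for_fdr(psms, max_fdr=0.05):
    """Find the lowest score cutoff whose estimated FDR stays within max_fdr.

    `psms` is a list of (score, is_decoy) pairs, higher scores being better.
    FDR is estimated here as decoy hits divided by target hits at or above
    the cutoff. Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical host and pathogen entries merged into one search file.
host = {"HOST_1": "MKTAYIAK"}
pathogen = {"STM_1": "MSLNDK"}
search_db = add_reversed_decoys({**host, **pathogen})

# Hypothetical peptide-spectrum matches: (score, matched a decoy?).
example_psms = [(5.0, False), (4.5, False), (4.0, False),
                (3.5, True), (3.0, False), (2.5, True)]
cutoff = score_threshold_for_fdr(example_psms, max_fdr=0.25)
```

Searching the host and pathogen sequences together, with decoys appended, lets a single score cutoff be chosen for the combined file rather than reusing thresholds tuned on a smaller single-organism file.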
Mammalian host genomes pose the additional challenge of high peptide sequence redundancy, with a peptide often having multiple possible parent proteins. This redundancy results from different proteins, alleles, or splice variants that share regions of identical or highly similar sequence. The peptides that result from a protein digestion may then be identical, making it difficult if not impossible to determine whether a peptide arises from one or another variant, orthologue, or paralogue. Programs such as ProteinProphet42 attempt to assign group identifiers to families of proteins that have multiple redundant and some unique peptide sequences.
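The idea of grouping proteins that cannot be told apart by their peptides can be illustrated with a deliberately simplified sketch. It is not the ProteinProphet algorithm, which is probabilistic; here proteins are placed in the same group only when their sets of identified peptides are identical, and all peptide and protein names are hypothetical.

```python
from collections import defaultdict


def group_indistinguishable_proteins(peptide_to_proteins):
    """Group proteins that are supported by exactly the same peptides.

    `peptide_to_proteins` maps each identified peptide to the list of
    proteins whose sequences contain it.
    """
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


def unique_peptides(peptide_to_proteins):
    """Return peptides that map to a single protein, with that protein."""
    return {peptide: proteins[0]
            for peptide, proteins in peptide_to_proteins.items()
            if len(proteins) == 1}


# Hypothetical identifications: P1 and P2 share all of their peptides.
example_map = {
    "PEPA": ["P1", "P2"],
    "PEPB": ["P1", "P2"],
    "PEPC": ["P3"],
    "PEPD": ["P3", "P4"],
}
protein_groups = group_indistinguishable_proteins(example_map)
distinguishing = unique_peptides(example_map)
```

In this example P1 and P2 fall into one group because no peptide separates them, while P3 remains distinguishable through its unique peptide.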
Multiple methods have been developed to perform relative quantification across proteomics samples. Two examples that do not require isotopic labeling of peptides are discussed here. The first method uses only MS/MS information, taking the number of assigned peptide spectra (spectral counting) as an approximate relative abundance value for a protein. It is assumed that a protein of high abundance will have a greater number of spectra that match its peptides. An alternative approach uses high-resolution, high mass measurement accuracy MS information for an identified peptide, taking either the apex abundance or the area under the elution profile as the abundance value for the identified sequence.
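Both label-free measures can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of any specific software: the input layouts are hypothetical, and the trapezoidal rule is simply one straightforward way to approximate the area under an elution profile.

```python
from collections import Counter


def spectral_counts(assigned_proteins):
    """Count the assigned MS/MS spectra per protein.

    `assigned_proteins` holds one protein name per identified spectrum.
    """
    return Counter(assigned_proteins)


def apex_and_area(times, intensities):
    """Return the apex intensity and the trapezoidal area of an elution profile.

    `times` and `intensities` are equal-length sequences sampled across the
    elution of one identified peptide.
    """
    apex = max(intensities)
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2
    return apex, area


# Hypothetical assignments: one entry per identified spectrum.
counts = spectral_counts(["P1", "P2", "P1", "P1", "P3", "P2"])

# Hypothetical elution profile for one peptide.
apex, area = apex_and_area([0, 1, 2, 3], [0, 10, 20, 0])
```

Spectral counts are integers tied to how often a protein's peptides were selected for fragmentation, whereas apex and area values are continuous, so the two measures generally call for different scaling before conditions are compared.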
Either method requires some normalization or scaling in order to compare results across different biological conditions. The approach used depends on the type of information (spectral counts or abundances), the range of values, the number of conditions being compared, and the number of missing values (conditions where the peptide or protein was not observed).
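As one possible illustration of such scaling, the sketch below median-centers log2-transformed abundances within each condition and leaves missing values as missing. Median centering is an assumption chosen for simplicity, not necessarily the approach used for the studies cited here, and the data layout and names are hypothetical.

```python
import math
from statistics import median


def median_center_log2(abundances):
    """Median-center log2 abundances within each condition.

    `abundances` maps a condition name to a dict of peptide -> abundance,
    where None marks a peptide not observed in that condition. Missing
    values are kept as None rather than imputed.
    """
    normalized = {}
    for condition, values in abundances.items():
        logs = {peptide: math.log2(value)
                for peptide, value in values.items()
                if value is not None and value > 0}
        center = median(logs.values()) if logs else 0.0
        normalized[condition] = {
            peptide: (logs[peptide] - center if peptide in logs else None)
            for peptide in values
        }
    return normalized


# Hypothetical abundances for three peptides in two conditions.
example = {
    "A": {"p1": 2.0, "p2": 8.0, "p3": 32.0},
    "B": {"p1": 4.0, "p2": None, "p3": 64.0},
}
scaled = median_center_log2(example)
```

Here condition B was measured at twice the overall intensity of condition A, and after centering the shared peptides p1 and p3 carry the same values in both conditions, while the unobserved p2 stays missing instead of being treated as zero.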
The data obtained through this multistep process can then be used in systems biology applications to help identify how the organism survives and is transmitted in the host-pathogen system. The data can also inform further experimental design. Often, the data are used to identify potential biomarkers for detection of the pathogens (e.g., for bioterrorism prevention or for use in clinical applications).19,43–45 The ultimate goal for many researchers studying pathogens is to find a potential cure or novel immunization method to stop the spread of the disease. In our laboratory, the goal was to identify proteins from the pathogenicity islands and the pathways used to generate them. The next step is to share these results with collaborators and the community in order to identify and study methods for disrupting those pathogenic functions.
Investigating host-pathogen interactions is vital to developing treatments that target either the pathogen itself or the host mechanisms that permit infection.46 Proteomic studies involving cultured and isolated pathogens have already provided valuable insight into the lifestyle of intracellular pathogens.43,47–50 However, to fully realize the potential of mass spectrometry-based proteomics methods, such as shotgun proteomics, protein recovery should be maximized, and a combination of pure and mixed cultures may be used to target proteins involved in particular pathways of infection or host-pathogen interactions.
Because sample quantity was limited, we minimized the compromises in sensitivity, specificity, and potential loss of context by combining techniques to obtain the highest quality data from the smallest amount of sample. Comparing different preparation methods with peptide fractionation on samples from monkeypox virus (both purified virion and host + pathogen lysate), we were able to identify three viral proteins not previously detected in the purified viral particles (unpublished). These identifications were made from the host-pathogen sample, in which fewer viral proteins were identified than in a purified virion sample. When sample quantity is not limiting, however, valuable additional information is often gained about the most appropriate preparation method, background database production, and analysis methods (instrumental and data).7,19,44
Well-executed experimental designs are key to successful proteomic analysis of pathogens. Identification of the proteins of interest, planning of appropriate growth conditions, optimal sample preparation techniques, a fitting sample analysis platform, and applicable data analysis will lead to improved understanding of the studied organism. Rigorous application of well-designed experiments will drive the knowledge base for the development of treatments and/or therapeutics. This is illustrated in our laboratory by the identification of a novel protein that contributes to S. Typhimurium replication inside macrophages.5 This protein has a predicted function in biosynthesis and modification of the peptidoglycan layer of the cell wall. When a mutant deficient in this protein was generated, a dramatic reduction in the ability of the bacteria to infect the macrophage was observed, leading to the conclusion that this protein is essential to the infectivity of the pathogen. Encouraging results such as this lead us to emphasize the importance of careful experimental design in all of our research. Finally, we note that all the methods and technologies applied in this research are active fields of investigation, and significant future improvements in all aspects of such proteomics applications are to be anticipated.
We would like to thank Kim Hixson for critical reading of this manuscript. Portions of the research described in this paper were supported by the Environmental Molecular Sciences Laboratory, the National Institute of Allergy and Infectious Diseases NIH/DHHS through interagency agreement Y1-AI-4894-01, the NIH National Center for Research Resources (RR018522), and the U.S. Department of Energy Office of Biological and Environmental Research. The Environmental Molecular Sciences Laboratory is a U.S. Department of Energy national scientific user facility located at Pacific Northwest National Laboratory.