A major scientific challenge at the present time for cancer research is the determination of the underlying biological basis for cancer development. It is further complicated by the heterogeneity of cancer's origin. Understanding the molecular basis of cancer requires studying the dynamic and spatial interactions among proteins in cells, signaling events among cancer cells, and interactions between the cancer cells and the tumor microenvironment. Recently, it has been proposed that large‐scale protein expression analysis of cancer cell proteomes promises to be valuable for investigating mechanisms of cancer transformation. Advances in mass spectrometry technologies and bioinformatics tools provide a tremendous opportunity to qualitatively and quantitatively interrogate dynamic protein–protein interactions and differential regulation of cellular signaling pathways associated with tumor development. In this review, progress in shotgun proteomics technologies for examining the molecular basis of cancer development is presented and discussed.
Despite recent advances in the diagnosis and treatment of cancer, tumor cell progression and metastasis continue to be a major cause of morbidity and mortality in cancer patients. Understanding the molecular and cellular mechanisms that contribute to tumor formation, progression and metastasis has been a major challenge in cancer research. For many years, progress in cancer research has been dominated by the concepts and methods of molecular genetics, but advances in technologies for protein analysis over the last ten years have accelerated studies of cancer biology at the protein level. It is clear from molecular genetics that mutations alter the “command and control” programs of the cell to accelerate growth and inhibit cell death. Genomic instability further contributes to functional plasticity that allows cancer cells to adapt to environments, increase cell motility, and metastasize. To understand the molecular changes that create these phenotypic changes, proteomic methods are now being used to study alterations in protein expression, modifications, and enzyme activity (Olsen et al., 2006; Liu et al., 1999) that accompany malignant changes. By identifying key proteins and the regulatory changes that accompany them, further insight will be gained into the process by which malignant cells evolve new functions and phenotypes.
A cancer cell, like other cells, is complex and has a range of protein abundances. Cancer cells have altered both transcriptional programming and post‐translational protein modifications to gain advantage in both growth and motility (Albertson et al., 2003; Birchmeier et al., 2003). Determining changes in both protein expression and post‐translational modifications on a global or pathway level requires sophisticated technology. Proteomic technologies have been rapidly advancing to meet the challenges posed in biology by increasing the scale of protein identification and protein quantification (both relative and absolute), and of enrichment and identification of post‐translational modifications. In particular, precise quantitative measurements are key to understanding relationships between normal cellular biology and the aberrant biology observed in cancer cells. Discovery of modification changes is being broadened beyond phosphorylation to better understand the roles of ubiquitination, sumoylation, acetylation and methylation in loss of growth control.
The ability to identify proteins and changes in their abundance that correlate with disease progression is fundamental for exploitation of proteomics for studying cancer biology. However, a major limitation to molecular profiling of cancer cells continues to be the challenge of biological complexity. During the last decade, the field of proteomics has evolved rapidly to overcome technical limitations to improve the dynamic range and reproducibility of large‐scale proteomic analyses. In particular, advances in mass spectrometry instrumentation and bioinformatics for “bottom‐up” or “shotgun” proteomic approaches have been at the core of new developments to meet the challenge of biological complexity. These and other methods are striving to provide a more comprehensive view of the multi‐factorial processes occurring during cancer progression.
Several sources have been explored to generate valid comparisons for studying the cancer proteome: cancer cell lines, human tissues, and body fluids such as plasma (Table 1).
Table 1. A comparison of biological samples available for studying the cancer proteome.
A variety of cancer cell lines have been established to mimic different stages of cancer progression. They are widely used in many aspects of cancer research, particularly as in vitro models. There are several advantages to using the cell culture system to study cancer progression. First, cell lines are easy to handle and represent an unlimited, self‐replicating source that yields a large amount of biological material for proteomic studies. Second, cell lines exhibit a relatively high degree of homogeneity and can be manipulated genetically for mechanistic studies. However, there are disadvantages. Cell lines are prone to genotypic and phenotypic drift during cell culture. Subpopulations may arise and cause phenotypic changes over time through the selection of specific, more rapidly growing clones within the population (Osborne et al., 1987).
Proteomic technologies have been used in clinical investigations of human tissues. Comparisons between normal and cancer tissues can be performed using the entire section of the tissue (Chaurand et al., 2006, 1999, 2006, 2005) or laser capture microdissection of tissues (Baker et al., 2005; Zang et al., 2004; Xu et al., 2002). The ability to simultaneously and comprehensively examine changes in large numbers of proteins in the context of disease or other changes in physiological conditions allows a better understanding of disease progression and supports novel therapeutic discoveries. However, the amount and availability of human tissues can be limiting factors for clinical proteomic studies. Also, inherent variation among individual patients requires a large number of biological and technical replicates for reproducibility. Therefore, studies using human tissues must be carefully designed to differentiate true clinical differences in protein expression from variation in sample collection, variation in experimental conditions, and normal biological variability (Clarke et al., 2003).
Using proteomic approaches to define protein profiles or patterns of easily accessible body fluids such as urine, saliva, and plasma has been explored as a new avenue to distinguish cancer from non‐cancer patients (Roelofsen et al., 2007; Hu et al., 2007; de Noo et al., 2006; Anderson et al., 2004; Anderson and Anderson, 2002). Plasma, in particular, represents an attractive source for disease biomarker discovery and early disease prediction. However, characterizing the proteome of biological fluids presents significant challenges due to extreme complexity and the large dynamic range of protein concentrations. For example, peptides derived from the most abundant proteins that have mass/charge ratios and retention times similar to those of lower‐abundance peptides can cause significant ion suppression and mask ion signals from low‐abundance species. Therefore, considerably improved immunoaffinity depletion and various fractionation strategies, in combination with the new LC/MS platforms, have been employed to increase the coverage and sensitivity for characterizing the plasma proteome (Pieper et al., 2003; Qian et al., 2006; Liu et al., 2006; Lee et al., 2006).
Two fundamental strategies have begun to evolve for the analysis of complex protein mixtures. The first strategy, referred to as differential proteomics, is best illustrated by two‐dimensional gel electrophoresis (2‐DGE). Proteins from two or more different states are separated by 2‐DGE and the differences determined. Only those proteins that are unique to one state or have increased or decreased in abundance are then pursued for identification. Identification often involves removing the protein or “spot” from the gel, proteolytic digestion and mass spectrometry analysis (Hamdan and Righetti, 2003; Gorg et al., 2004). Clearly this approach benefits from high‐resolution separations, a large dynamic range, and reproducibility. Improvements in the technology have resulted from the use of different “color” fluorescent dyes to allow a single gel comparison of different states (Unlu et al., 1997; Righetti et al., 2004; Marouga et al., 2005). Two‐dimensional difference gel electrophoresis (DIGE) is similar in concept to a microarray analysis and creates substantial improvements in reproducibility because both states are subjected to the same separation conditions. A non‐gel approach to this type of comparison has been developed by Yoo et al. (2007). Rather than using a gel to separate proteins, it uses two stages of liquid chromatography, chromatofocusing followed by reversed‐phase separation, that mimic the 2‐DGE separation (Wall et al., 2000). Fractions are collected in 96‐well microtiter plates that are more convenient for downstream analytical methods. Again, two states are compared and only those fractions representing differences (by UV absorption) are further analyzed by mass spectrometry (Buchanan et al., 2005). In both systems the process works best when high‐resolution separation is achieved, which improves dynamic range and allows differences to be compared more reliably.
In contrast to differential proteomics, the second strategy, shotgun proteomics, digests proteins to peptides and then compares identities or patterns of peptides. Peptides are then used as a surrogate to measure the presence or abundance of proteins. Technologically, peptides are easier to identify using mass spectrometry based techniques than intact proteins, although the concept of “top down” protein analysis using mass spectrometry is improving rapidly (Kelleher and Hicks, 2005; Du et al., 2006). A comparison between the gel‐based (1‐DGE and 2‐DGE) and LC–MS/MS‐based (shotgun proteomics) approaches is illustrated in Figure 1.
Figure 1. Comparison between gel‐based proteomics and LC–MS/MS‐based proteomics.
Shotgun proteomics consists of four distinct steps. The first step, and perhaps one of the most critical, is preparation of the sample. The second step is separation of peptides, the third is analysis by mass spectrometry, and the final step is informatics to analyze the mass spectrometry data and assess the results.
Inefficient digestion of the protein mixture can result in poor and irreproducible data collection, and thus efficient and complete denaturation is essential to ensure comprehensive digestion. Furthermore, proteins that are typically difficult to solubilize in aqueous buffers, such as membrane proteins, can also be a challenge to digest. Proteins are often solubilized and denatured using chaotropes such as urea or detergents. Trypsin or a two‐step digestion process with endoproteinase LysC followed by trypsin is frequently used (Levander et al., 2004; Link et al., 1999; Washburn et al., 2001). An advantage of the shotgun proteomic method is the ability to better identify and analyze membrane proteins (Abe et al., 2004). Two basic strategies have been employed for membrane proteins. The first maintains the membrane proteins in the lipid bilayer and digests the exposed regions using either a specific protease such as trypsin or a non‐specific protease like Proteinase K. When Proteinase K is used, a high pH is employed to linearize the lipid bilayer (Howell and Palade, 1982) to improve exposure of the membrane protein to the protease. The second uses organic solvents to solubilize membrane proteins in the absence of the lipid bilayer (Blonder et al., 2002). The use of traditional detergents such as SDS has been a long‐standing problem because of their poor compatibility with liquid chromatography and mass spectrometry, but new mass spectrometry compatible detergents have alleviated many of the problems (Klammer and MacCoss, 2006). Each type of sample (e.g. cells, tissues) may require some optimization of the digestion process to ensure the digestion is complete.
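The effect of tryptic digestion on mixture complexity can be illustrated with a small in silico digest. The following is a minimal sketch, not a description of any published tool: it assumes the commonly used rule that trypsin cleaves C‐terminal to lysine (K) or arginine (R) except when the next residue is proline (P), and the function name and the example sequence are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from an idealized tryptic digest of a protein sequence.

    Assumed rule: cleave after K or R unless the next residue is P.
    missed_cleavages allows peptides spanning that many skipped sites.
    """
    # Step 1: cut the sequence at every allowed cleavage site.
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal peptide without K/R

    # Step 2: join neighbouring pieces to model incomplete digestion.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R before P is not cleaved.
    print(tryptic_digest("MKAARPLKGG"))
    print(tryptic_digest("MKAARPLKGG", missed_cleavages=1))
```

Allowing missed cleavages increases the number of candidate peptides, which is one reason incomplete digestion makes the resulting data harder to interpret.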
Shotgun proteomic strategies convert a complex protein mixture to an even more complicated peptide mixture. In general the complexity conversion factor is about 40×, that is, on average a protein will digest to yield ~40 peptides. It is important, however, to maintain sensitivity at the ionization source of the mass spectrometer, and this often requires minimizing the elution volume because electrospray ionization is a concentration dependent process. A simple rule of thumb is to keep flow rates low and peptide peak widths narrow. To resolve complex peptide mixtures, high‐resolution HPLC separations are necessary to maximize peptide separation for acquisition of tandem mass spectra. Advances in HPLC technology such as ultra high pressure pumps allow the use of small diameter supports to effect higher resolution separation at higher speed, and these advances improve our ability to resolve the increased complexity of digested protein mixtures (Motoyama et al., 2006). Improvements in resolution can also be obtained by using multi‐dimensional separations (LC/LC) (Link et al., 1999; Larmann et al., 1993; Opiteck and Jorgenson, 1997; Wei et al., 2005; McDonald et al., 2002; Wolters et al., 2001). These can be performed either on‐line or off‐line of the mass spectrometer and frequently involve the use of strong cation exchange in combination with reversed‐phase chromatography (Lee et al., 2002). Multi‐phase peptide separation utilizes different properties of peptides such as charge and hydrophobicity to increase resolution (Liu et al., 2002). Alternate ion exchange strategies are emerging that use mixed bed ion exchange phases, such as a mixture of strong cation exchange and weak anion exchange (Motoyama et al., 2007). Thousands of peptides can be resolved by LC/LC/MS/MS, resulting in hundreds or thousands of protein identifications.
Additional improvements are still needed but often there is a trade‐off between the time required to achieve greater resolution and the need for throughput. If more time is required for an experiment, then fewer experiments can be performed.
The mass spectrometer is the key component for converting peptides into knowledge about the peptide sequences present. A variety of mass spectrometry technologies are in use to analyze peptides, but tandem mass spectrometers are often used for peptide sequence identification. Two trends have been appearing in tandem mass spectrometers. The first trend has been the development of rapid scanning instruments such as the linear ion trap mass spectrometer. This capability has been useful for the analysis of complex peptide mixtures, and the increased scan speeds have been important for taking advantage of the higher resolving separations possible with ultra high pressure HPLC. Linear ion traps have greater ion capacities than the older three‐dimensional ion traps and thus sensitivity and data quality are better (Schwartz et al., 2002). These instruments have been very effective for proteomic studies.
The second area is the rise of the hybrid instrument to create improved capabilities for proteomics. A linear ion trap has been combined with quadrupole mass filters (Q1 and Q2) to provide a versatile instrument capable of tandem mass spectrometry in the ion trap (tandem MS in time) and MS/MS using the Q1 to select ions, a collision cell (Q2) to dissociate ions and the linear trap for analysis of m/z values (tandem MS in space) (Hager and Yves Le Blanc, 2003). In this configuration the instrument can also perform single ion monitoring for hypothesis driven experiments, where you know what proteins might be present in a mixture and the instrument is set up to pass only that m/z value, fragment the ion and monitor the signal for one of the fragment ions. This strategy is capable of very precise measurements and is used when global measurements are not needed. Hybrids incorporating high resolution, high mass accuracy analyzers such as the Fourier transform ion cyclotron resonance (FTMS) and Orbitrap (also a form of FTMS) have also become incredibly useful for proteomics. A major limitation for FTMS has been the ability to perform routine and general tandem mass spectrometry and the ability to acquire high quality m/z data on a chromatography time scale. FTMS instruments are ion‐trapping devices that are sensitive to the size of the ion population injected into the ion trap. Too few ions and signal to noise is poor, too many ions and space charging (ions of like charge perturbing the natural motion of ions in the external field, e.g. magnetic or electric) is a factor. By creating a hybrid instrument with a device to initially store ions before injection in the FT ion trap, the capability to perform MS/MS or to count ions is enabled. Two devices, the LTQ‐FTMS and LTQ‐Orbitrap, use a linear ion trap to store, manipulate and count ions before injection into the FTMS (Syka et al., 2004; Yates et al., 2006). 
Tandem mass spectrometry can be performed in the linear ion trap and fragment ions measured in this analyzer, or the fragment ions can be transferred into the FTMS for analysis. Thus, fragment ions can be measured with low or high resolution. Because of throughput issues most users operate the instrument with high‐resolution precursor ion measurement and low‐resolution acquisition of tandem mass spectra. An advantage of this technology is the ability to measure ions with a resolution of 60,000–100,000 and a mass accuracy of 1–3 ppm on a chromatographic time scale (Syka et al., 2004; Yates et al., 2006). Mass spectrometry technology continues to evolve, creating more capability and opportunity in proteomics. An in‐depth review of different MS instrumentation and applications can be found in Domon and Aebersold (2006).
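To make the resolution and mass accuracy figures concrete, the short sketch below converts them into absolute m/z widths. It assumes the conventional definitions (mass error in ppm as the relative deviation times one million, and resolution as m/z divided by peak width); the function names and the example m/z of 1000 are illustrative only.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_tolerance(mz, ppm):
    """Absolute m/z window that corresponds to a given ppm accuracy."""
    return mz * ppm / 1e6


def peak_width(mz, resolution):
    """Peak width implied by a resolution defined as m/z over peak width."""
    return mz / resolution


if __name__ == "__main__":
    mz = 1000.0  # illustrative precursor m/z
    print("3 ppm window at m/z 1000:", mass_tolerance(mz, 3))
    print("peak width at resolution 100,000:", peak_width(mz, 100_000))
    print("error of a 500.001 reading for a 500.000 ion (ppm):",
          ppm_error(500.001, 500.0))
```

Under these definitions a 3 ppm accuracy at m/z 1000 corresponds to a window of about 0.003 m/z units, which is what allows a database search to discard most candidate peptides on precursor mass alone.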
Once mass spectrometry data is collected, the data must be analyzed in some manner to extract information. In shotgun proteomics, tandem mass spectra are searched through sequence databases and matched to amino acid sequences. Several algorithms exist to perform this operation and all use some type of mathematical closeness of fit measure to determine how well the amino acid sequence fits the tandem mass spectrum (Sadygov et al., 2004). The seminal approach to match tandem mass spectra to sequences is the SEQUEST algorithm that uses a two step scoring function with a cross‐correlation analysis providing the final measure of closeness of fit (Eng et al., 1994). Once a database search has been performed, the data needs to be filtered, assembled and statistically analyzed. A rising standard for statistical analysis of data is the use of a “decoy” database, which consists of a reversed, randomized, or shuffled copy of the database used in the search (Peng et al., 2003, 2003). By monitoring the number of tandem mass spectra that match to the decoy sequences the false positive rate can be determined. Scoring threshold values can then be set based on a desired false positive rate which is usually 5% or less. Some care has to be taken to make sure the decoy database does not contain a sequence found in the database and this can often be accomplished by appending the decoy database to the end of the sequence database so the first match observed during the search is from the original database. Computer programs have emerged to automate post search analysis of data including the calculation of statistics for the dataset (Tabb et al., 2002; Keller et al., 2002). For example, the program DTASelect is used to filter and assemble the results from an LC/LC/MS/MS analysis (Tabb et al., 2002). This type of data is more complicated than that of LC/MS/MS data as it includes analyses from multiple fractions and occasionally peptides appear in multiple fractions. 
These programs often regroup peptides according to the protein of origin, sort by isoforms when possible, and determine the amount of protein sequence coverage obtained by peptides (Tabb et al., 2002; Nesvizhskii et al., 2003). When quantitative information is needed more specific experimental parameters must be followed in addition to the use of specific software to analyze the data. An in‐depth review of different spectra searching algorithms can be found in Nesvizhskii and Aebersold (2004).
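The decoy database strategy described above can be sketched in a few lines. This is a simplified illustration rather than the procedure of any particular search engine: it assumes a reversed decoy database, a single score per spectrum match where higher is better, and an error estimate equal to the number of decoy matches divided by the number of target matches above the threshold. The `DECOY_` prefix and the function names are hypothetical.

```python
def reverse_decoys(proteins):
    """Build a reversed-sequence decoy entry for every target protein."""
    return {"DECOY_" + name: seq[::-1] for name, seq in proteins.items()}


def score_threshold_for_fdr(matches, max_fdr=0.05):
    """Find the lowest score cutoff whose estimated error rate is acceptable.

    matches: list of (score, is_decoy) pairs, one per tandem mass spectrum.
    Returns the cutoff score, or None if no cutoff meets max_fdr.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Assumed estimate: decoy matches / target matches above this score.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    print(reverse_decoys({"P1": "MKVLAAGR"}))
    example = [(5.0, False), (4.5, False), (4.0, False), (3.5, True), (3.0, False)]
    print(score_threshold_for_fdr(example, max_fdr=0.05))
```

Other conventions for the estimate exist (for example, counting decoy matches twice when target and decoy databases are searched together), so the threshold obtained depends on which convention is adopted.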
Both qualitative and quantitative measurements of protein content in biological samples are important for comprehensive proteomic analysis. A crucial component of LC(LC)–MS/MS approaches is the assessment of the abundance of detected peptides for complex biological systems in relation to a control state. Mass spectrometry is an excellent quantification device, but the ionization process that creates peptide ions can introduce variability in the number of ions created. Additionally the m/z values of peptides being compared between two states will be the same if they represent the same proteins. To differentiate between peptides from different states and correct for ionization differences and other potential sources of variability, stable isotope labeling of peptides is often used. In one state the naturally occurring isotopomer is used (e.g. 12C, 1H, 16O, 14N) and in the other state a stable and non‐radioactive heavy isotope is used. A variety of methods have been used including tagging proteins and peptides with covalent labels (Gygi et al., 1999; Zhou et al., 2004; Qiu et al., 2002; Smolka et al., 2001; Goshe and Smith, 2003), digesting proteins in the presence of 18O water (Bantscheff et al., 2004), and metabolic labeling of proteins using the translational machinery of the cell to incorporate labels (Ishihama et al., 2005, 2005, 2003, 2002, 2004, 2002). In all cases the peptides of one state have a different molecular weight from peptides of the other state but have nearly identical chemical properties so they behave chromatographically the same and ionize with the same efficiency. During mass analysis the m/z difference between two peptides will be resolved and based on the peak areas of the two different ions, the amount of each peptide can be ascertained. 
When the experiment is designed to both collect tandem mass spectra and determine the amount of stable isotope labeled peptide present, a high variance can be observed (~15–25%) (Zybailov et al., 2005; Venable et al., 2004). A comparison of different stable isotope labeling methods of proteomic quantification is summarized in Table 2. A number of computer algorithms have been developed to use the results of database searching algorithms to calculate the peak areas for each peptide. Quantitative measurements of peptides are obtained using quantification software, such as XPRESS (Moulder et al., 2005), ASAPRatio (Li et al., 2003), RelEx (Venable et al., 2004; MacCoss et al., 2003), QUIL (Wang et al., 2006), ProRata (Pan et al., 2006), Multi‐Q (Lin et al., 2006), or more recently CenSus (manuscript submitted), which uses the relative MS signal intensities of the different isotopic forms to calculate the relative abundance of each identified peptide (Figure 2).
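The peak area comparison that underlies these quantification programs can be sketched as follows. This is a minimal illustration and not the algorithm of any of the programs named above: it assumes the light and heavy ion chromatograms have already been extracted on a shared time axis and integrates them with the trapezoidal rule. The function names and the intensity values are made up.

```python
def peak_area(times, intensities):
    """Integrate an extracted ion chromatogram with the trapezoidal rule."""
    area = 0.0
    for k in range(1, len(times)):
        mean_height = 0.5 * (intensities[k] + intensities[k - 1])
        area += mean_height * (times[k] - times[k - 1])
    return area


def heavy_to_light_ratio(times, light, heavy):
    """Relative abundance of a peptide as heavy peak area over light peak area."""
    light_area = peak_area(times, light)
    if light_area == 0:
        raise ValueError("light peptide peak has zero area")
    return peak_area(times, heavy) / light_area


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0]    # retention time, arbitrary units
    light = [0.0, 10.0, 0.0]   # unlabeled peptide signal
    heavy = [0.0, 20.0, 0.0]   # isotope-labeled peptide signal
    print(heavy_to_light_ratio(times, light, heavy))
```

Real software additionally has to locate the peak boundaries, subtract background, and decide whether the light and heavy profiles correlate well enough to be trusted.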
Figure 2. Protein quantification using the stable isotope labeling strategy.
Table 2. A comparison of stable isotope labeling methods for proteomic quantification.
Quantitative proteomic approaches utilizing differential labeling techniques such as isotope‐coded affinity tagging (ICAT) have been developed for rapid and sensitive measurements of protein abundance in biological samples. ICAT reagents can be used to perform quantitative analysis of paired protein samples followed by separation and identification of proteins within these complex mixtures by LC–MS/MS (Bottari et al., 2004; Lu et al., 2004). The ICAT reagents generally consist of three components: a thiol‐reactive group, a linker group (existing in a light or heavy isotopic form), and an affinity handle (i.e. biotin, for purification of tagged peptides). A presumptive advantage of ICAT over other isotope‐labeling systems is that it should also reduce the complexity of peptide mixtures from biological samples. Therefore, it has been utilized to perform comparative and quantitative analysis of differential protein expression in complex biological samples such as cancer cells and tissues of human cancer patients (Meehan and Sadar, 2004; Hara et al., 2007; Chen et al., 2006; Pawlik et al., 2006; Stewart et al., 2006). Besides the ICAT reagents, a new isobaric multiplexing tagging reagent, iTRAQ, has been introduced recently to perform quantitative global protein expression analysis (Aggarwal et al., 2006; Zieske, 2006). Instead of labeling a single amino acid, iTRAQ technology makes use of amine‐specific, stable isotope reagents that can label all peptides from up to four different biological samples simultaneously. Examples of the utility of the iTRAQ technology coupled with LC/LC–MS/MS include studying cancer cell invasion (Keshamouni et al., 2006), differential protein expression in cancer tissues (DeSouza et al., 2005), and novel therapeutic targets for advanced cancer (Comuzzi and Sadar, 2006).
A disadvantage of the iTRAQ approach is that the labeling process occurs after proteolytic digestion and therefore does not control as well for variance introduced during early stages of the sample preparation process.
Techniques such as ICAT and iTRAQ can be applied only to non‐living samples. Alternatively, a variety of in vivo stable isotope amino acid labeling strategies for LC–MS/MS‐based quantitative proteomics have also been reported. Most of these approaches allow the labeling at the protein level before digestion, and the samples are combined after the isotope‐coding step. For instance, cancer cells in culture can be metabolically labeled by the SILAC (stable isotope labeling by amino acids in cell culture) method to incorporate specific amino acids into proteins (Ong et al., 2003; Yocum et al., 2006; Gronborg et al., 2006; Everley et al., 2004). Cells are grown in the media with normal amino acids (light) or in the media with non‐radioactive, isotopically labeled form of specified amino acids (heavy). Using this metabolic labeling method, proteins from cancer cells at different stages or engineered genetically can be labeled during growth and high enrichment of an isotopic labeled amino acid can usually be achieved in 4–5 doublings of the cell population. The additional constraint of one or two incorporated amino acids can aid in database searching and allow observations of PTMs by comparing the differences between light and heavy versions of the peptide (Amanchy et al., 2005). Relative quantification of changes in protein expression by SILAC is obtained at the peptide level comparing the MS intensities of light (non‐labeled) and heavy peptides (isotope labeled). Since no chemical labeling or affinity enrichment steps are required, the SILAC method can be widely used to study cancer cells in culture.
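The statement that high enrichment is usually reached in 4–5 doublings can be checked with simple arithmetic. The sketch below rests on a deliberately idealized assumption that is not taken from the source: unlabeled protein present at the start is only diluted by cell division (no protein turnover, no recycling of light amino acids, and fully labeled medium), so the unlabeled fraction halves with each doubling. The function name is hypothetical.

```python
def labeled_fraction(doublings):
    """Fraction of protein carrying the heavy label after n population doublings.

    Idealized model: pre-existing light protein is diluted twofold per doubling
    and all newly synthesized protein is heavy.
    """
    return 1.0 - 0.5 ** doublings


if __name__ == "__main__":
    for n in range(0, 7):
        print(n, "doublings:", round(labeled_fraction(n) * 100, 1), "% labeled")
```

Under this model, four doublings give about 94% and five doublings about 97% incorporation; protein turnover in real cells would be expected to speed incorporation for rapidly degraded proteins.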
Proteins can also be metabolically labeled by growing cells in either 14N minimal or 15N‐enriched media and combined with affinity enrichment analogous to the ICAT approach (Oda et al., 1999, 2001). Conrads et al. (2001) carried out quantitative analysis of bacterial and mammalian proteins using 15N‐metabolic labeling in combination with a cysteine‐reactive biotin affinity tag and demonstrated that no significant isotope‐dependent chromatographic shift was observed between 14N and 15N‐labeled peptides allowing more precise quantification of peptide abundances. Therefore, the 15N metabolic labeling strategy presents an alternative to SILAC for global protein labeling and quantitative proteomic studies. The possibility of global protein quantification using the 15N metabolic labeling strategy coupled with MudPIT has been demonstrated in an analysis of the yeast proteome. Large‐scale protein quantification and identification in S. cerevisiae were obtained simultaneously using the combination of 15N metabolic labeling and MudPIT (Washburn et al., 2002). A data‐independent data acquisition strategy further improves protein quantification using 15N metabolic labeling and shotgun proteomics. This 15N based protein quantification method was applied to measure protein expression levels in two developmental stages of Caenorhabditis elegans (Venable et al., 2004) and was shown to improve the accuracy of quantification.
Although differential labeling using stable isotopes allows accurate quantification of protein expression, there are significant drawbacks in using stable isotope labeling methods in general. First, the labeling efficiency of amino acids or proteins varies depending on the rate of protein turnover in cells. Second, MS‐based quantification requires detection of chromatographic peaks for both light and heavy peptides. Substantial changes in protein expression often result in a poor peak correlation between the light and heavy peptides and reduce the number of quantifiable peptide measurements. Third, implementation of the stable isotopic labeling strategy is further hampered by the cost of stable isotope‐labeled amino acids as well as the limited availability of isotope/metabolic labeling media. An interest in alternate protein quantification methods has prompted researchers to look for suitable features of tandem mass spectra that would be surrogate measures of protein abundance. In fact, several features of tandem mass spectra collected by shotgun proteomics, including peptide hits, protein sequence coverage, spectral count, and normalized ion intensity, have been observed to be indicative of protein abundance in a given sample (Washburn et al., 2001; Florens et al., 2002; Gao et al., 2003; Pang et al., 2002; Wang et al., 2003). Using LC(LC)–MS/MS based approaches such as MudPIT, the number of tandem mass spectra assigned per protein (spectral count) has been used to provide relative protein quantification in biological samples (Figure 3). Linearity and sensitivity of spectral count measurements were evaluated by studying the correlation between spectral counts and increased protein concentrations (Liu et al., 2004; Resing et al., 2004). A positive correlation between spectral counts and protein concentration was observed, with a linear dynamic range over two orders of magnitude, using the number of spectra (spectral sampling) acquired for each protein.
Also, spectral counts distributed across different steps or runs of MudPIT can be combined (Zybailov et al., 2005). Last, the spectral counting method can be modified to allow normalization by protein length as a normalized spectral abundance factor (NSAF) (Florens et al., 2006). Another comparable label‐free parameter, termed the protein abundance index (PAI) and defined as the number of observed peptides divided by the number of observable peptides per protein, was also reported to show a linear relationship with the logarithm of protein concentration in LC–MS/MS experiments (Ishihama et al., 2005, 2005).
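Both label‐free measures reduce to short calculations. The sketch below assumes the commonly cited form of NSAF, in which each protein's spectral count is divided by its length and the result is normalized by the sum of these values over all proteins in the run, and it takes PAI exactly as defined above. The protein names, counts, and lengths are invented for illustration.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one run.

    spectral_counts: dict of protein -> number of assigned tandem mass spectra
    lengths: dict of protein -> sequence length in residues
    """
    # Spectral abundance factor: counts corrected for protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra assigned to any protein")
    return {p: value / total for p, value in saf.items()}


def pai(observed_peptides, observable_peptides):
    """Protein abundance index: observed over observable peptides."""
    return observed_peptides / observable_peptides


if __name__ == "__main__":
    counts = {"protA": 10, "protB": 10}    # same number of spectra
    lengths = {"protA": 100, "protB": 400}  # protB is four times longer
    print(nsaf(counts, lengths))
    print(pai(5, 20))
```

In this example the two proteins have equal spectral counts, but the shorter one receives the larger NSAF, which reflects the reasoning that a longer protein yields more peptides and therefore more spectra at the same abundance.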
Figure 3. Spectral count quantification in LC–MS/MS‐based proteomics.
The use of stable isotope labeling in conjunction with tandem mass spectrometry is a precise and accurate method to measure protein changes (Gygi et al., 1999; Washburn et al., 2002; MacCoss et al., 2003; Lemmel et al., 2004). The method typically relies on the acquisition of tandem mass spectra to identify a peptide prior to measuring the quantities of ions from the two different states, and because the proteome coverage is dependent on acquisition speed, it is seldom comprehensive. To circumvent this limitation and improve coverage of a proteome, a differential analysis strategy has evolved. In this strategy the intact proteins are digested and then peptide ion currents are measured using LC/MS (Wang et al., 2003; Fang et al., 2006). It is intended to provide broad coverage of peptides and allow relative quantitative comparison across multiple samples. However, many peptides co‐elute when thousands of peptides are measured in one LC/MS analysis. To distinguish experimental variability from real biological differences across various LC/MS analyses, efforts are made to make the LC/MS analysis as reproducible as possible. Larger diameter (1 mm or 2.1 mm) commercial columns are sometimes used at higher flow rates to achieve better inter‐experiment reproducibility. It is easier to ensure uniform conditions at higher flows, and commercial columns have better quality assurance than home‐made columns. Analyses can be compared by using ion chromatograms, and chromatographic alignment is frequently performed before comparing ion signals (MacCoss et al., 2003; Sadygov et al., 2006). Once the chromatograms have been compared and differences identified, the peptides representing these differences are analyzed by tandem mass spectrometry. On some types of instruments, e.g. LTQ‐FTMS and LTQ‐Orbitrap, the recording of the ion chromatogram and acquisition of tandem mass spectra can occur simultaneously.
In the event a tandem mass spectrum was not acquired for a peptide representing a difference, this information can be acquired in a subsequent experimental step. This process of using LC/MS increases the acquisition throughput for m/z values and thus potentially produces a higher level of coverage across the proteome. Dynamic range is still an issue, and ultimately the process may need to move to a multi‐dimensional analysis for more complex systems, but there are still technical hurdles to overcome before multi‐dimensional analysis can be used. The complexity of peptide mixtures in biological samples is one of the major limitations for LC/MS differential analysis. Co‐elution of peptides can mask biological differences or decrease the dynamic range of the experiment. Other issues, such as experimental noise and uncorrelated measurements between experimental and biological variations, present challenging analytical concerns for large‐scale multidimensional proteomic datasets (Listgarten and Emili, 2005).
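The comparison step of the differential LC/MS strategy can be sketched as follows. This is a deliberately simplified illustration, not the alignment procedure of the cited work: it assumes that each run has already been reduced to a list of (m/z, retention time, intensity) features with positive intensities, that retention times have already been aligned, and that a feature is "the same" in two runs when it falls within a ppm and retention time tolerance. The function names, tolerances, and fold‐change threshold are hypothetical.

```python
def match_features(run_a, run_b, ppm_tol=10.0, rt_tol=0.5):
    """Pair features from two aligned LC/MS runs.

    Each run is a list of (mz, retention_time, intensity) tuples.
    A feature in run_b is paired with one in run_a when their m/z values
    agree within `ppm_tol` and retention times within `rt_tol`; the
    closest retention time wins, and each run_b feature is used once.
    """
    matches = []
    used = set()
    for mz_a, rt_a, int_a in run_a:
        best = None
        for j, (mz_b, rt_b, _) in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(mz_a - mz_b) / mz_a * 1e6
            rt_diff = abs(rt_a - rt_b)
            if ppm <= ppm_tol and rt_diff <= rt_tol:
                if best is None or rt_diff < best[0]:
                    best = (rt_diff, j)
        if best is not None:
            j = best[1]
            used.add(j)
            matches.append((mz_a, rt_a, int_a, run_b[j][2]))
    return matches


def differential_features(matches, min_fold=2.0):
    """Return (mz, rt, fold) for pairs whose intensity changed by at least
    `min_fold` in either direction (fold = intensity_b / intensity_a)."""
    flagged = []
    for mz, rt, int_a, int_b in matches:
        fold = int_b / int_a
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged.append((mz, rt, fold))
    return flagged
```

Features flagged in this way would be the candidates for targeted tandem mass spectrometry in a subsequent step. Co‐eluting peptides with similar m/z values would be mismatched by such a simple rule, which reflects the limitation noted above.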
An emerging form of analysis uses single ion monitoring as a way to measure peptides and ultimately proteins. This approach has its roots for peptide analysis in methods developed by Desiderio and Kai (1983) to measure and quantitate neuropeptides. Barr et al. (1996) described its use to quantitate proteins by digesting the protein and measuring the peptide levels. Gerber et al. (2003) brought this method to proteomics by creating a strategy to measure the absolute amounts of proteins. This strategy is also called “hypothesis driven mass spectrometry” because the analytes to be measured must be known in advance so that the mass spectrometer can be preset to acquire those ions (Kalkum et al., 2003). Two strategies are emerging in the use of hypothesis driven methods. The first preselects ions and acquires a tandem mass spectrum for each ion. Fragmentation patterns in the spectrum confirm the identity of the peptide. The second strategy uses the exquisite sensitivity and specificity of the triple quadrupole mass spectrometer to perform selected reaction monitoring (SRM). Ion m/z values are preselected as above, but only one or two fragment ions are monitored to record the ion current signal. The combination of detecting both parent and fragment ions yields maximum sensitivity for detecting target peptides and has been shown to be useful for targeting low‐abundance proteins (Wienkoop and Weckwerth, 2006). Anderson and Hunter (2006) applied the SRM approach to measure specific peptides in tryptic digests of human plasma. Tryptic peptides representing 53 high and medium abundance proteins in human plasma were quantitatively measured by SRM LC–MS/MS assays. Synthetic stable isotope‐labeled peptides corresponding to the targeted peptides were spiked into the trypsin‐digested plasma.
Results from this study exemplify the utility of target‐specific mass spectrometry‐based assays and the possibility of high‐throughput quantification as well as validation of shotgun proteomic datasets.
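The quantification principle behind spiking a stable isotope‐labeled peptide can be written as a short calculation. The sketch below assumes that the labeled (heavy) and endogenous (light) forms of a peptide respond identically in the instrument, so that the ratio of their summed SRM transition areas multiplied by the known spiked amount estimates the endogenous amount. The function name, units, and numbers are hypothetical and do not come from the cited assays.

```python
def endogenous_amount(light_areas, heavy_areas, spiked_fmol):
    """Estimate the amount of an endogenous peptide from SRM peak areas.

    light_areas: peak areas of the monitored transitions for the
                 unlabeled (endogenous) peptide
    heavy_areas: peak areas of the same transitions for the spiked,
                 isotope-labeled standard
    spiked_fmol: known amount of standard added to the sample

    Assumes equal response of the light and heavy forms.
    """
    if len(light_areas) != len(heavy_areas):
        raise ValueError("light and heavy transitions must be paired")
    heavy_total = sum(heavy_areas)
    if heavy_total <= 0:
        raise ValueError("no signal recorded for the labeled standard")
    ratio = sum(light_areas) / heavy_total
    return ratio * spiked_fmol
```

With two transitions giving light areas of 1200 and 800, heavy areas of 600 and 400, and 50 fmol of spiked standard, the light‐to‐heavy ratio is 2 and the estimate is 100 fmol.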
Unregulated growth is the hallmark of cancer cell development. Deregulation of oncogenic and tumor suppressive genes enables cancer cells to increase rates of cell proliferation and evade programmed cell death (Weinstein and Joe, 2006; Tamura, 2006). Often advanced tumors display mutations in multiple oncogenes and tumor suppressor genes and harbor epigenetic abnormalities resulting in increased expression of hundreds of genes. Despite the fact that differential regulation of oncogenes and tumor suppressor genes is recognized as essential to cancer development and metastasis, the mechanisms of their tumor enhancing or suppressing functions are largely unknown. It has been observed that oncogenes and tumor suppressor genes are often involved in multiple cellular processes that are critical to tumor development (Tamura, 2006; Cantley et al., 1991). Furthermore, structural information often offers limited mechanistic insights into the functions of oncogenes and tumor suppressor genes. For example, a tumor suppressor gene, maspin, is classified as a family member of serine proteinase inhibitors (Serpins) based on its protein structure but does not function as a classical serpin (Pemberton et al., 1995). Although the anti‐tumor mechanism of maspin is uncertain, maspin has been found to interfere with several cancer‐associated processes such as cell invasion, apoptosis, and tumor angiogenesis (Hendrix, 2000; Sheng et al., 1996; Zhang et al., 2000). Using MudPIT and the spectra counting method, a mechanistic investigation of maspin's anti‐metastatic function was performed. Over a thousand proteins were identified and quantified using the spectra count comparison from a large‐scale protein profiling study between cancer cells with or without the ectopic expression of maspin (Chen et al., 2005). Changes in protein levels found by spectra count assessment were verified by Western blot analysis and found to be consistent.
A novel connection between maspin and suppression of the proteasome activity in cancer cells was revealed from this study, and our results further support deregulation of the proteasome in cancer metastasis. Using two‐dimensional electrophoresis coupled with mass spectrometry, Hondermarck (2003) performed similar mechanistic investigations by identifying targets of an oncofetal gene, H19. Hondermarck (2003) determined proteomic profiles of mammary cells transfected with the H19 gene, which encodes an untranslated mRNA, and identified thioredoxin, one of the major proteins regulating intracellular redox metabolism, as the first‐identified target of the H19 gene product. These studies demonstrate the possibility of combining large‐scale protein identification with label‐free protein quantification for rapid and comprehensive molecular analysis of cancer development. Furthermore, results from these studies illustrate the value of proteomic analysis and generate new leads for understanding the mechanisms involved in the development of malignant phenotypes.
The diverse realm of post‐translational modification (PTM) of proteins encompasses various signaling events occurring at different stages of tumor development. Post‐translational protein modifications can result in differential transcriptional regulation, induction of protein–protein interactions, and alteration in protein stability (Seo and Lee, 2004; Gill, 2003; Dunne and O'Neill, 2005; Bode and Dong, 2004; Kaiser and Huang, 2005; Vogt et al., 2005). The dynamic composition and complexity generated by post‐translational modification make identification a tremendous challenge in proteomics. Patterns of modified proteins such as differential phosphorylation of a protein can be visualized in 2DE; however, selective and specific probes (e.g. radiolabeling of proteins by 32P incorporation) are necessary to detect intact modified proteins (Bykova et al., 2003). After the detection of specific modified proteins, phosphopeptides and phosphoamino acid assignments are achieved by LC/MS or LC–MS/MS analysis. PTM characterization can also be performed by mass spectrometry‐based approaches. In fact, MS‐based methods have made significant progress in recent years toward qualitative and quantitative measurement of post‐translational modifications. Modified proteins can be characterized by either top down or bottom up (shotgun) techniques (Olsen et al., 2006; Kjeldsen et al., 2003). Both techniques involve the use of liquid chromatography and tandem mass spectrometry. Although different techniques excel in various aspects of PTM mapping, the heterogeneous nature of modifications requires enrichment of modified proteins/peptides for complete characterization of PTMs.
Often the context of modifications within a protein is important to understand regulatory events. This information can be lost in a bottom up approach when multiple modified forms of a protein are analyzed together. To circumvent this problem, top down approaches that measure and fragment intact proteins or large fragments of proteins are being used to retain or discover the context of modifications. Top down proteomics requires the ability to measure intact protein masses, and thus much higher resolution mass spectrometers are needed (Sze et al., 2002; Pesavento et al., 2006). Two major advances in techniques have enhanced the capability of the top down approach to detect intact modified proteins. First, Fourier transform ion cyclotron resonance (FT‐ICR) MS provides very high‐resolution measurement of intact ions. A linear ion trap coupled to FT‐ICR improves the ion accumulation efficiency for ion fragmentation with accurate mass and high resolution (Chalmers et al., 2003). Second, the novel technique of electron capture dissociation (ECD) FT‐ICR MS cleaves only the peptide backbone, leaving the modified amino acid side‐chains intact, and enables sequencing of phosphopeptides, glycopeptides and other types of modified peptides as well as of intact, modified proteins up to 45 kDa (Kjeldsen et al., 2003; Sze et al., 2002). Pesavento et al. (2006, 2006, 2006) and Siuti et al. (2006) have pioneered these novel techniques and demonstrated the utility of top down proteomics to characterize post‐translational modifications in human histones.
Bottom up approaches require proteolytic digestion of intact proteins into a collection of peptides that are more readily characterized than the intact protein. Then, the peptide mixtures are separated by single or multidimensional chromatography with offline fraction collection or online electrospray mass spectrometry (LC/MS). High protein sequence coverage using the bottom up approach involves generating peptides of a length optimal for MS analysis (500–3000 Da) and prediction of mass shifts due to known modifications. To increase the sequence coverage of the modified proteins, enzymes with different proteolytic specificities are used to cleave the protein and generate a ladder of peptides potentially covering the sequence of the entire protein (Yates, 1993). In theory, the peptide sequence, mass, and localization of modifications can be determined from the MS/MS fragmentation pattern of the targeted precursor ion, since the modified peptide generates a fragmentation pattern similar to that of the unmodified peptide, differing only by the mass increase resulting from the modified amino acid. However, in practice, if the modification is labile, then it will be lost before the peptide itself fragments. In this case, the peptide can still be sequenced and identified, but only the mass difference, not the location of the modification, is determined. The recent development of electron transfer dissociation improves the localization of modifications such as phosphorylation, as labile modifications are not as prone to elimination (Mikesh et al., 2006).
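The localization argument above can be made concrete with a small calculation. The sketch below assumes singly charged b‐ and y‐type fragment ions, as typically produced by collision‐induced dissociation, standard monoisotopic residue masses, and a phosphate addition of 79.96633 Da that stays attached during fragmentation (the ideal, non‐labile case). The peptide used in the usage note and all function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
PHOSPHO = 79.96633  # assumed mass added by phosphorylation (HPO3)


def fragment_ions(peptide, mods=None):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    `mods` maps a zero-based residue index to an added mass.
    b_ions[k] is b(k+1); y_ions[k] is y(k+1).
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0)
              for i, aa in enumerate(peptide)]
    b_ions, running = [], 0.0
    for mass in masses[:-1]:           # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    y_ions, running = [], 0.0
    for mass in reversed(masses[1:]):  # C-terminal fragments
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def shifted_ions(peptide, mods, tol=1e-6):
    """List which b and y ions differ between modified and unmodified forms."""
    b0, y0 = fragment_ions(peptide)
    b1, y1 = fragment_ions(peptide, mods)
    b_shift = [k + 1 for k, (u, m) in enumerate(zip(b0, b1)) if abs(m - u) > tol]
    y_shift = [k + 1 for k, (u, m) in enumerate(zip(y0, y1)) if abs(m - u) > tol]
    return b_shift, y_shift
```

For the hypothetical peptide GASPEK phosphorylated on the serine (index 2), only b3 to b5 and y4 to y5 carry the added mass, which places the modification on the third residue. If the phosphate were lost before backbone fragmentation, none of the fragment ions would be shifted and only the precursor mass difference would remain.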
Protein phosphorylation is one of the most important PTMs found to play a critical role in tumor growth and has been studied intensively. Many growth factors initiate cellular signaling through tyrosine kinase receptors, which in turn induce protein–protein interaction and phosphorylation cascades involving a variety of signaling proteins such as MAP kinases. These cascades of protein phosphorylation ultimately induce changes in gene expression, with resulting modifications in protein synthesis, and lead to cell survival, proliferation, differentiation, or migration in normal cells as well as cancer cells. For example, the function of epidermal growth factor receptor (EGFR) is dysregulated in the vast majority of human epithelial tumors (Yarden, 2001; Normanno et al., 2006). To understand the signaling events in cancer cell development, techniques allowing sensitive and quantitative detection of phosphopeptides have been developed and applied to cancer studies. An immunoaffinity‐based strategy targeting tyrosine phosphorylation has been developed to characterize the phosphotyrosine component of the phosphoproteome and identify tyrosine phosphorylation sites (Rush et al., 2005). Peptides containing phosphotyrosine are specifically enriched by a phosphotyrosine‐specific antibody from protease‐digested cancer cell extracts and analyzed by MS/MS. Another strategy using phosphotyrosine immunoprecipitation in tandem with immobilized metal affinity chromatography (Yoshida et al., 1995) and mass spectrometry has been applied to follow changes in tyrosine phosphorylation occurring over time during the activation of human T cells or the inhibition of cancer cells in response to treatment with STI571 (Gleevec) (Brill et al., 2004; Salomon et al., 2003).
Other large‐scale profiling strategies combining phosphopeptide enrichment, high accuracy identification, and SILAC have also been applied to qualitatively and quantitatively study the phosphoproteome in cancer cells. Olsen et al. (2006) utilized SILAC for phosphopeptide quantification and two chromatographic strategies, strong‐cation exchange (SCX) and titanium dioxide (TiO2), to enrich phosphopeptides for MS/MS analysis. A phospho‐site analysis algorithm to predict the probabilities of phosphosites has also been developed to analyze high throughput phosphoproteome profiling data (Olsen et al., 2006; Beausoleil et al., 2006). To further enhance the capability of phosphopeptide identification, a new fragmentation technique, electron transfer dissociation (ETD), has been developed for use in tandem mass spectrometry to preserve labile PTMs such as phosphorylation in peptides (Mikesh et al., 2006; Chi et al., 2007). With the newly developed techniques and improved algorithms for assigning phosphorylation sites, the field of PTM characterization shows tremendous potential for shedding new light on important signaling events occurring in the pathogenesis of cancer.
Protein ubiquitination is another important post‐translational modification that plays a critical role in tumor growth. The ubiquitin‐proteasome system is involved in regulation of cell growth and apoptosis. In normal cells, the proteasome controls degradation of intracellular proteins (Mukhopadhyay and Riezman, 2007). Proteins destined for proteolysis are targeted to the proteasome through the attachment of a poly‐ubiquitin chain and degraded to small peptides, and the ubiquitin is released and recycled (Goldberg, 1995). By coordinating protein degradation in the cell, the ubiquitin‐proteasome system regulates many important cellular processes such as the cell cycle, apoptosis, and transcription (Hershko, 1997; King et al., 1996; Orlowski, 1999; Karin and Ben‐Neriah, 2000). In preclinical cancer models, proteasome inhibitors induce apoptosis, have in vivo anti‐tumor efficacy, and sensitize malignant cells to other anti‐tumor therapy (Voorhees et al., 2003). Although the biological significance of protein ubiquitination is well studied, the proteins regulated by ubiquitination are far less well known. Two problems in identifying ubiquitin‐modified proteins are the low steady‐state conjugate level, due to rapid turnover of ubiquitinated proteins, and the large poly‐ubiquitin chains (~8 kDa per ubiquitin unit) on modified proteins. To overcome these two problems and identify ubiquitinated proteins at a global level, Peng et al. (2003, 2003) overexpressed 6xHis‐tagged ubiquitin in yeast and enriched ubiquitin conjugates for characterization. Then, they cleaved the isolated ubiquitinated proteins with trypsin to generate peptides that contain two glycines derived from the C terminus of ubiquitin and attached covalently to the targeted lysine residue. The trypsin‐digested peptides were fractionated first by strong cation exchange (SCX) chromatography and then analyzed by LC–MS/MS.
The tandem spectra collected were searched against a yeast protein database for the signature mass shift of 114.1 Da (two glycines attached) and for a missed proteolytic cleavage at the modified lysine residue, because trypsin cannot cleave at modified lysine residues. Another ubiquitin‐related polypeptide modification, sumoylation (SUMO), has also been studied using the global profiling strategy. SUMO conjugation is carried out in a multi‐step enzymatic pathway similar to ubiquitin conjugation, resulting in a covalent isopeptide bond to the amino group of specific lysines in the target protein (Johnson and Blobel, 1997; Johnson et al., 1997). Some of the known SUMO targets are involved in important biological processes including transcription, cell cycle progression, DNA damage response, and signal transduction. Using the shotgun proteomics approach, Wohlschlegel et al. (2004) carried out large‐scale identification of SUMO targets in S. cerevisiae and revealed novel SUMO substrates involved in chromatin modifications.
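The two signatures used to recognize ubiquitination sites, the di‐glycine mass shift and the missed tryptic cleavage at the modified lysine, can be sketched in a few lines. This is an illustration of the search logic only, not the database search software used in the cited work. It assumes a simplified trypsin rule (cleavage after every K or R, ignoring the proline exception) and the monoisotopic glycine residue mass, which gives a shift of about 114.04 Da where the text quotes 114.1 Da. The sequence in the usage note is hypothetical.

```python
GLYCINE = 57.02146           # monoisotopic residue mass of glycine, Da
DIGLY_SHIFT = 2 * GLYCINE    # mass added to a lysine carrying Gly-Gly


def tryptic_peptides(sequence, modified_lysines=()):
    """Digest `sequence` after K or R, skipping modified lysines.

    `modified_lysines` holds zero-based positions of lysines that carry
    the Gly-Gly remnant and therefore are not cleaved.
    Returns a list of (start_index, peptide) tuples.
    Simplification: the proline rule is ignored.
    """
    blocked = set(modified_lysines)
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR" and i not in blocked:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def diglycine_candidates(sequence, modified_lysines):
    """Return (peptide, site_within_peptide, mass_shift) for each peptide
    that contains a Gly-Gly modified lysine as a missed cleavage."""
    hits = []
    for start, peptide in tryptic_peptides(sequence, modified_lysines):
        for pos in sorted(modified_lysines):
            if start <= pos < start + len(peptide):
                hits.append((peptide, pos - start, DIGLY_SHIFT))
    return hits
```

For the hypothetical sequence AAKGGKCCR, an unmodified digest gives AAK, GGK, and CCR, whereas a Gly‐Gly remnant on the first lysine gives AAKGGK and CCR, with AAKGGK carrying the added mass on its internal lysine.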
It has become apparent that signaling events occur in cancer cells, among cancer cells, and between the extracellular matrix and cancer cells. Cancer cells become migratory as a result of genetic alterations and in response to the surrounding microenvironment (Boudreau and Bissell, 1998). Dynamic interactions between the cancer cell and its microenvironment can contribute to the metastatic potential of the cancer cell and influence the access of the therapeutic agents to cancer cells (Brown and Giaccia, 1998). Therefore, characterization of dynamic interactions and communication of cancer cells in the tumor microenvironment will provide a more comprehensive view of complex networks within and among cancer cells.
Characterization of normal and tumor tissues by proteomic approaches has been performed, but the outcome from these studies has not been fruitful. In vivo human tissue studies are hindered by limitations of sample size and the availability of matched normal and tumor patient samples for proper controls. Therefore, researchers have been developing mouse models to reproduce and monitor human cancer progression. These mouse cancer models can be used to model complex cellular networks of cancer cells and to evaluate targeted therapeutics. To obtain qualitative and quantitative changes of proteins on a global level in vivo, we developed a method to metabolically introduce 15N stable isotopes into proteins of living mammalian organisms (Wu et al., 2004). Ideally, 15N‐labeled, tissue‐specific mouse proteins can be used as an internal protein standard for quantification. Changes in protein level can be estimated by measuring the ratio of unlabeled to 15N‐labeled protein in each tissue sample. Any systematic errors between the labeled standard and unlabeled sample can be canceled out by calculating the ratio over the 15N‐labeled internal protein standard (Figure 4). To explore the concept of monitoring qualitative and quantitative changes in protein abundance induced by cancer‐host interactions in vivo, we carried out in vivo 15N metabolic labeling of human cancer cells in mice. Grafting human cancer cells in mice has been used extensively to study the growth of tumors and pathological processes such as angiogenesis (Kluger et al., 2005; Eibl and Reber, 2005; Khalili et al., 2005; McAlhany et al., 2003; Tuxhorn et al., 2002). Successful application of 15N metabolic labeling in the human xenograft cancer model will enable us to probe the dynamic cellular networks within the cancer cells and the interactions of cancer cells with the tumor microenvironment to obtain quantitative protein changes from all stages of tumor development.
Based on our preliminary study, we found that the 15N metabolic labeling method can be readily incorporated into the human xenograft cancer model, and proteins in human cancer cells can be rapidly labeled by 15N stable isotope when implanted in 15N metabolically labeled mice (unpublished data). Future applications of the 15N metabolic labeling coupled with quantitative proteomic approaches such as MudPIT will reveal more pathologically relevant molecular patterns from animal cancer models.
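The way a 15N‐labeled internal standard cancels systematic error can be shown with a short calculation. The sketch assumes that the same labeled standard is mixed with each unlabeled sample before processing, so that any loss or instrument drift within a run scales the unlabeled and labeled signals by the same factor. The function names and intensities are hypothetical.

```python
def ratio_to_standard(unlabeled_signal, labeled_signal):
    """Ratio of the unlabeled (14N) signal to the 15N-labeled standard."""
    if labeled_signal <= 0:
        raise ValueError("labeled standard signal must be positive")
    return unlabeled_signal / labeled_signal


def relative_change(sample_a, sample_b):
    """Relative abundance of a protein in sample A versus sample B.

    Each sample is an (unlabeled_signal, labeled_signal) pair measured
    against the same 15N-labeled standard. Dividing the two ratios
    removes the standard and any run-specific scaling factor.
    """
    ratio_b = ratio_to_standard(*sample_b)
    if ratio_b == 0:
        raise ValueError("protein not detected in sample B")
    return ratio_to_standard(*sample_a) / ratio_b
```

If a protein gives signals of (200, 100) in one sample and (50, 100) in another, the estimated change is fourfold. Scaling both signals of either run by a common factor, as a uniform sample loss would, leaves that result unchanged, which is the cancellation described above.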
In vivo quantitative proteomic strategy for cancer research.
There are two main expectations from proteomic analysis of cancer proteomes. The first is to discover new molecular targets associated with different stages of tumor development. The second is to decipher the molecular mechanisms and signaling events that lead to cancer development. Currently, direct analysis of complex protein mixtures using powerful liquid chromatographic separation methods such as MudPIT, advanced mass spectrometers capable of high mass accuracy, and sophisticated bioinformatics algorithms shows great potential for comprehensive and quantitative cancer proteome analysis and holds great promise for future cancer discovery. However, we are still in a pioneering phase that is exploring the limits of current proteomic technologies to understand the complex deregulation of cellular signaling during cancer progression.
No one proteomic technology can address all questions, but integrating methods should allow progress to be made. For example, large‐scale proteomic methods can identify new biochemical features or perturbations that can be validated, substantiated or extended using more directed mass spectrometry approaches.
Hypothesis driven mass spectrometry, a directed approach, may provide a powerful way to decipher initial changes observed in signaling or protein expression events associated with cellular processes. Thus, initial changes observed with shotgun or differential methods can then be followed over time using more directed mass spectrometry methods. In summary, future development of cancer proteomics is likely to depend on innovative technologies focused on improving LC–MS/MS‐based protein identification and quantification methods for comprehensive molecular profiling.
The authors are supported by the following grants: NIAID sub‐contract grant UCSD/MCB0237059, 5R01MH067880‐02, and NIH P41 RR11823‐10.
Chen Emily I., Yates John R. III, (2007), Cancer proteomics by quantitative shotgun proteomics, Molecular Oncology, 1, doi:10.1016/j.molonc.2007.05.001.