Because early detection greatly improves clinical prognosis, robust screening tests are needed for all types of gastrointestinal (GI) cancer. Although a wide variety of screening modalities exist for colorectal cancer, only 40% of cases are detected at an early stage. In contrast, few modalities exist for pancreatic cancer, resulting in only 10%–20% of cases diagnosed at an early stage.1 As blood laboratory testing is a routine procedure in primary care clinics, blood-based screening tests would greatly improve compliance rates. Proteomics is the study of proteins in a biological system. Biomarkers are discrete proteins or other molecular entities that act as surrogate markers for the presence of disease. Although these biomarkers remain elusive, historical studies certainly support their presence, particularly in the major malignancies of the GI system. For example, carcinoembryonic antigen, first discovered in 1965 by Gold and Freedman in human colon cancer tissue extracts, is often elevated in the serum of patients with luminal GI tumors.2 Another glycoprotein, carbohydrate antigen 19-9, discovered in patients with colon and pancreatic cancer in 1981, is often elevated in the serum of patients with pancreaticobiliary tumors.3 Finally, alpha fetoprotein, first described in 1963, is often elevated in the serum of patients with hepatocellular cancer.4 Recent advances in proteomics methods make the discovery of robust biomarkers a coming reality. These methods were brought to the forefront in 2002, with a study by Petricoin et al describing biomarker discovery in ovarian cancer.5 Although subsequent work demonstrated the preliminary nature of their approach, the field of proteomics has made significant strides in the subsequent years, resulting in powerful new technologies and increased understanding of protein biology. In this review, important advances and studies in the field of biomarker discovery focused on GI malignancies are discussed.
In most proteomic workflows, proteins are digested to peptides by a site-specific protease such as trypsin. The resulting peptide mixture is then applied to a high-performance liquid chromatography column coupled to a mass spectrometer (MS). After ionization, the peptides are resolved inside the MS according to mass to charge (m/z) ratio, with the final output consisting of a full mass spectrum with the total ion current on the y-axis and the m/z ratio on the x-axis. Tandem MS (MS/MS) consists of serial analyses in which a few of the most abundant peptides from the primary mass scan (precursors) are selected for additional fragmentation (products) and analysis. Using sophisticated computational algorithms, these characteristic fragmentation patterns can be matched against a database of the entire proteome that has been digested in silico to provide peptide sequencing information and ultimately protein identities (Figure 1). This information is critical for the analysis of proteome profiling datasets in the context of biological significance. With this basic overall approach, the proteomes of complex biological fluids can be analyzed in a completely unbiased fashion, thus increasing chances for the discovery of novel biomarkers.
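The in silico digestion and m/z calculation described above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular search engine: it assumes the common simplification that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), allows no missed cleavages or modifications, and uses standard monoisotopic residue masses. The function names and the example sequence are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum of a free peptide
PROTON = 1.00728   # mass of each proton added on positive ionization


def tryptic_digest(protein):
    """Split after K or R unless followed by P (assumed rule, no missed cleavages).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mz(peptide, charge):
    """Return the m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    # Hypothetical sequence, used only to show the cleavage rule.
    for pep in tryptic_digest("MKRPLAKGR"):
        print(pep, round(peptide_mz(pep, 2), 4))
```

A database search engine performs the same digestion over every protein in the proteome and then compares predicted fragment masses with the observed MS/MS spectra; the sketch covers only the first of those steps.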
Identification of clinically useful circulating biomarkers has proven difficult because of several profound obstacles. First, because the blood proteome is a complex mixture of proteins, the identification of low abundance biomarkers can be quite a challenge.6 For example, the most abundant blood protein, albumin, is present at 30 mg/ml, whereas most biomarkers are thought to be present at ng to pg/ml concentrations. As such, a dynamic range of detection of up to 12 orders of magnitude is required, whereas most MS can measure only about 3–4 orders of magnitude. Second, malignancy is often associated with inflammation and a systemic acute phase reaction. Proteins differentially regulated by these processes are present at much higher levels than biomarkers specific for malignancy, thus making the discovery process difficult. Third, the blood proteome is a very dynamic entity that is extremely sensitive to environmental changes. For example, meal to meal dietary effects are extremely difficult to control for in large-scale human trials. Fourth, although biomarker discovery is usually performed using MS approaches, large-scale validation is historically performed using immunologic approaches. Therefore, although proteomics studies can identify hundreds of candidate biomarkers, validation is limited to the subset of proteins for which high-quality antibodies already exist or can be developed.
Because the blood proteome contains contributions from all organ systems in the body, an alternative is the proteomic analysis of a proximal fluid, such as ductal fluids in the case of pancreatic or biliary cancer, or the actual primary tumor tissue for candidate biomarkers.7 The rationale behind this is that, whereas the concentration of tumor-derived proteins is very low in the blood, their abundance should be greatly increased at or close to the source. A major drawback to this approach is the accessibility to tissue or proximal fluid samples. For instance, although colonic samples can be obtained relatively easily, pancreatic samples would be more difficult to obtain. Furthermore, there can often be poor concordance between protein abundance in the tumor or proximal fluid and the presence of the corresponding protein in the blood. This is because analysis of protein levels at the proximal source does not provide information regarding protein entrance into and clearance from the blood compartment. For these reasons, we and many other groups focus mainly on biomarker discovery directly in the blood compartment.
Biomarker discovery directly in humans can often be quite challenging. This is secondary to the inherent genetic variation between humans, not only in background, but also in the tumor itself. Further complications include poorly controlled environmental exposures and study conditions. The study of genetically engineered mouse models of GI cancer provides an excellent discovery platform.8 These models are derived from inbred mouse strains of homogeneous genetic backgrounds. In addition, they have been engineered to express critical mutations known to be important for carcinogenesis in humans. Furthermore, environmental exposures can be carefully controlled. We and others have employed this approach.9, 10 Most important, a recent study using a mouse model for pancreatic cancer demonstrated that results from a discovery effort using mouse plasma can readily translate to orthogonal validation in humans.11
The most effective method for addressing the obstacle of abundant serum proteins has been simply to remove them. This is accomplished using high-performance liquid chromatography columns with a stationary phase consisting of immobilized antibodies specific for multiple abundant proteins. Progressively more powerful systems have become available, capable of removing the 7, 12, and 20 most common plasma proteins. Most recently, the Seppro IgY–SuperMix system has been demonstrated to effectively remove >60 of the most common serum proteins. These systems have been demonstrated to be selective, to perform reproducibly12-14 and to remove a significant proportion of these abundant proteins, up to 99%, for example, with the SuperMix system.15 One unavoidable drawback to this approach is the potential loss of biomarker proteins bound to highly abundant proteins.
Even with these procedures, further fractionation is required to fully analyze the serum proteome for purposes of biomarker discovery. Two-dimensional liquid chromatography-tandem MS (2D LC-MS/MS) has become an essential methodology for such an analysis. Termed “shotgun proteomics” and first described in 2001, complex protein mixtures can be digested with trypsin, then subjected to orthogonal separation methods, such as fractionation by offline strong cation exchange LC followed by online reversed phase LC.16 Offline fractionation strategies can be based on any biochemical property, such as hydrophobicity, charge, or glycosylation. By adding multiple layers of offline orthogonal separation, even complex proteomes such as plasma can be deeply explored. These fractionation strategies are also effective at the protein level. Combinations of these approaches can dramatically increase the dynamic range of detection for the overall discovery project. For example, by combining removal of abundant proteins, N-glycoprotein and cysteinyl peptide separation, followed by standard 2D LC-MS/MS, a total of 3,654 proteins have been identified in human plasma, with some of these proteins present in the low ng/ml concentration range.17 The drawback to this methodology is the generation of a significant number of fractions requiring MS analysis. Thus, the rate limiting step in many of these studies is the availability of MS instrument time.
One approach to overcome analysis time limitations is to isolate and analyze specific subsets of the larger serum proteome, usually consisting of specific posttranslational modifications. The glycoproteome is of particular interest because most traditional cancer biomarkers are glycoproteins, and changes in patterns of glycosylation have been described in cancer cells. In an early study, we used concanavalin A slurries to enrich for the glycoproteome from plasma of Apcmin mice and increase the ultimate dynamic range of detection.18 In later studies of colorectal cancer, Qiu et al employed lectin affinity chromatography to isolate N-linked plasma glycoproteins, followed by extensive fractionation, printing of protein fractions on nitrocellulose-coated slides, and probing with lectins to determine patterns of glycosylation.19 Increased sialylation and fucosylation were observed in colorectal cancer and adenoma samples, and validation of 3 biomarkers was performed in an independent set of plasma samples. In a study of pancreatic cancer, Li et al employed a lectin antibody microarray to extract candidate glycoprotein biomarkers from human serum samples, followed by probing with a variety of biotinylated lectins.20 Captured proteins were digested, then sequenced by MS. Using this high-throughput approach, α-1-β-glycoprotein response to SNA was found to be significantly increased in cancer samples.
The foundation for any biomarker discovery effort is based on identification of proteins that show differential expression between disease and control samples. Two fundamental quantitative approaches have emerged: label-free and labeled methods. For label-free quantitation, either peptide peak heights/areas from the mass chromatogram or the number of peptide MS/MS sequencing events can be used as a relative metric for peptide abundance.21 These approaches are attractive owing to their relative ease of use, low cost, and absence of chemical modifications. Common disadvantages include no internal controls for variations during offline sample preparation, variable ion suppression during MS analysis, and poor sensitivity for low-abundance biomarkers.
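As an illustration of the label-free approach, the following Python sketch computes a relative abundance metric from the number of MS/MS sequencing events (spectral counts) per protein. It is only one simple way to do this: normalizing to the total count per sample and adding a pseudocount are assumptions made here, and the function name and protein labels are hypothetical.

```python
import math
from collections import Counter


def log2_spectral_ratio(disease_psms, control_psms, pseudocount=1.0):
    """Return log2(disease/control) spectral count ratios per protein.

    Each input is a list with one protein identifier per MS/MS sequencing
    event. Counts are normalized to the total number of events in the
    sample; the pseudocount (an assumption of this sketch) avoids division
    by zero for proteins seen in only one sample.
    """
    d_counts, c_counts = Counter(disease_psms), Counter(control_psms)
    d_total, c_total = sum(d_counts.values()), sum(c_counts.values())
    if d_total == 0 or c_total == 0:
        raise ValueError("both samples need at least one sequencing event")
    ratios = {}
    for protein in set(d_counts) | set(c_counts):
        d = (d_counts[protein] + pseudocount) / d_total
        c = (c_counts[protein] + pseudocount) / c_total
        ratios[protein] = math.log2(d / c)
    return ratios


if __name__ == "__main__":
    # Hypothetical sequencing events for two proteins, "A" and "B".
    disease = ["A"] * 7 + ["B"] * 1
    control = ["A"] * 3 + ["B"] * 5
    print(log2_spectral_ratio(disease, control))
```

Because nothing in this calculation corrects for sample preparation losses or ion suppression, it inherits the disadvantages listed above.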
Chemical labeling approaches employ differential offline modification of the biological samples such that disease and control samples can be analyzed concurrently and resolved in the MS via mass differential. In this manner, relative protein abundances between disease and controls can be identified for all labeled species. The isotope-coded affinity tag reagent can be used to differentially label cysteine residues in proteins from the disease and control samples, with the limitation that only cysteine-bearing proteins can be quantitated.22 Isobaric tag for relative and absolute quantitation reagents can provide simultaneous quantitation for up to 8 samples in a single experiment. In this method, samples are labeled at the peptide level with isobaric tags that react with the N-terminus and side chain amines of peptides.23 O18 from heavy water can be incorporated into the C-terminus of peptides of disease or control samples to provide peak discrimination.24 A drawback of this approach is that there is often back exchange of the O18 label with residual water in the system that might affect quantitation. Nonetheless, these approaches share the common advantage that they account for any downstream experimentation variation after the labeling step and mixture of the disease and control samples and allow for more precise quantitation of low-abundance species. Disadvantages include cost, increased sample handling, and decreased dynamic range of detection. One intrinsic advantage of a chemical labeling approach compared with a metabolic labeling is the ability to directly interrogate biological samples such as human serum or tissue.
Metabolic labeling is performed using an approach termed stable isotope labeling with amino acids in cell culture (SILAC).25 SILAC methodology uses essential amino acids containing stable heavy isotopes, usually C13 and N15, in the culture of cells, resulting in the production of labeled proteins and thus peptides, which then can be mixed with and compared with the unlabeled forms in control samples. Taken to the next step, stable isotope-labeled peptides can be synthesized directly and used to perform absolute quantitation of proteins by means of stable isotope dilution LC-MS.26 A disease-specific, metabolically labeled proteome standard can be used, not only to quantify proteins in serum, but also to focus the analysis on relevant biomarkers. Circulating protein biomarkers are likely to be proteins secreted by cancer cells. Good biomarker candidates, therefore, are likely enriched in the secreted proteomes of cancer cells in vitro. To test this hypothesis, the CAPAN-2 pancreatic cancer cell line was SILAC labeled and the secreted proteome collected.27 This stable isotope-labeled proteome, termed SILAP, was used as a standard, and was added to pooled pancreatic cancer and control human serum. Over 100 differentially expressed biomarker candidates were identified. The presence of these proteins both in sera and in the CAPAN-2 secreted proteome improves their biological plausibility. Two of these proteins, intercellular adhesion molecule-1 and B-cell adhesion molecule (BCAM), were validated by enzyme-linked immunosorbent assay (ELISA) in the original serum samples used for discovery and in an independent cohort of serum samples. By ELISA, BCAM was present in serum in the 200 pg/ml range, demonstrating the sensitivity of this approach.
The biologically targeted SILAP standard approach allows for relative quantitation of the corresponding unlabeled endogenous proteins, while controlling for non-specific losses during sample processing. Limiting the analysis to proteins secreted and over-expressed in the cell line model makes it possible to exclude acute-phase proteins and other abundant proteins found in serum while simultaneously focusing on proteins with biological relevance to the disease of interest.
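The arithmetic behind quantitation against a stable isotope-labeled standard can be sketched in a few lines of Python. The sketch assumes that the labeled (heavy) and endogenous (light) forms of a peptide respond identically in the MS and that a single-point calibration is adequate. The function names and all numeric values are hypothetical and are not taken from the studies cited.

```python
def light_to_heavy(light_area, heavy_area):
    """Ratio of the endogenous (light) peak area to the labeled (heavy) one."""
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return light_area / heavy_area


def absolute_concentration(light_area, heavy_area, spiked_conc):
    """Stable isotope dilution: scale the known spiked concentration.

    Assumes equal MS response for the light and heavy forms.
    """
    return light_to_heavy(light_area, heavy_area) * spiked_conc


def fold_change(light_disease, heavy_disease, light_control, heavy_control):
    """Relative quantitation of disease vs control through a shared standard.

    The same amount of labeled standard is added to both samples, so it
    cancels out and corrects for losses during sample processing.
    """
    return (light_to_heavy(light_disease, heavy_disease)
            / light_to_heavy(light_control, heavy_control))


if __name__ == "__main__":
    # Hypothetical peak areas and a hypothetical 400 pg/ml synthetic standard.
    print(absolute_concentration(5000.0, 10000.0, 400.0))
    print(fold_change(6000.0, 3000.0, 2000.0, 4000.0))
```

With a SILAP standard the absolute amount of each labeled protein is generally unknown, so only the `fold_change` style of comparison applies; absolute values require a synthetic labeled peptide of known concentration.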
Because the profiling strategies described result in thousands of candidate biomarkers, a rational approach to their prioritization before subsequent validation is needed. Using a systems biology approach, one can use orthogonal datasets as filters to identify candidate biomarkers that are of high interest. One approach is to incorporate parallel proteomic analysis of a proximal fluid or tissue. However, because the rate-limiting step in most biomarker discovery efforts is MS instrument time, this approach would severely decrease the depth of analysis of the blood proteome. For this reason, this strategy is not feasible for most biomarker discovery efforts. Integration of a proteome standard derived from secreted cancer cell lines as described in the previous section is one way to prioritize analysis.10, 27 Another approach is to make use of the wealth of available transcriptomic data in the public domain, based on the underlying hypothesis that concordance between increased abundance of circulating protein levels and increased levels of corresponding mRNA in the tumor supports tumor specificity of the blood protein in question. We have utilized this approach in a biomarker discovery effort using acrylamide labeling and extensive multidimensional LC-MS/MS to characterize differentially expressed proteins in the genetically engineered mouse model for intestinal cancer, Apc Δ580.9 In parallel, we processed publicly available transcriptome datasets from mouse intestinal tumors. Proteins and their corresponding RNA species that were concordantly differentially expressed were chosen for further analysis. The sensitivity of a subset of these candidate biomarkers was assessed by antibody microarrays in a validation cohort. The specificity of these candidate biomarkers was confirmed by immunohistochemistry at the tumor site.
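The transcriptome-based filter can be pictured as a simple intersection of two datasets. The Python sketch below keeps proteins that are increased in plasma and whose mRNA is also increased in the tumor. The log2 fold-change threshold, the function name, and the gene labels are hypothetical choices for illustration and do not reproduce the criteria used in the study cited.

```python
def concordant_candidates(protein_log2fc, mrna_log2fc, threshold=1.0):
    """Return genes increased in both the plasma proteome and tumor mRNA.

    Both arguments map a gene identifier to a log2 fold change (disease
    vs control). Genes missing from the transcriptome data are dropped,
    because concordance cannot be assessed for them.
    """
    return sorted(
        gene
        for gene, protein_fc in protein_log2fc.items()
        if protein_fc >= threshold
        and gene in mrna_log2fc
        and mrna_log2fc[gene] >= threshold
    )


if __name__ == "__main__":
    # Hypothetical values; gene labels are placeholders, not real findings.
    plasma = {"GENE_A": 2.1, "GENE_B": 1.4, "GENE_C": 0.2, "GENE_D": 3.0}
    tumor = {"GENE_A": 1.8, "GENE_B": -0.5, "GENE_C": 2.2}
    print(concordant_candidates(plasma, tumor))
```

In this toy input only `GENE_A` passes: `GENE_B` is increased in plasma but not in the tumor (as might be expected for an acute-phase protein), `GENE_C` is not increased in plasma, and `GENE_D` has no transcript data.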
Traditionally, MS has been used for discovery in a handful of carefully selected samples, whereas antibody-based techniques have been used for larger scale validation. The simplest form of validation is the traditional Western blot, which requires a single primary antibody. Sandwich ELISAs improve sensitivity and allow for absolute quantitation, and therefore are well suited for clinical laboratory tests. These approaches can be used to rapidly screen a large validation cohort. For even higher throughput, antibody microarray technology has rapidly expanded.28
Despite the advantages of antibody-based approaches, generation of new assays is hindered by the poor sensitivity and specificity of existing and newly generated antibodies and the long development time. As a result, there is great interest in development of MS-based validation approaches. A logical solution is stable isotope dilution multiple reaction monitoring (MRM) MS, long considered the “gold standard” approach for small molecule quantitation (Figure 2). Quantitation is performed in conjunction with an internal stable isotope-labeled peptide standard. Specificity is imparted by the MRM MS technique, during which peptide ions of interest are monitored based on both the precursor and resulting product ions after fragmentation. A recent review explores this emerging field in greater detail.29 Anderson and Hunter demonstrated the power of this approach, reliably monitoring 47 different proteins in human plasma across >4.5 orders of magnitude with good sensitivity in the low μg/ml range.30 A subsequent study by Keshishian et al has lowered the limit of detection to the high pg/ml range by integrating depletion of abundant proteins and minimal fractionation.31
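The way MRM imparts specificity can be shown with a short Python sketch: a signal counts toward a peptide only when both the precursor m/z and the product m/z match a predefined transition. The data structure, the matching tolerance, and every m/z and intensity value below are hypothetical. The heavy transition assumes a doubly charged precursor and a singly charged product ion that both carry a label about 8 Da heavier than the unlabeled form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One monitored precursor/product ion pair."""
    precursor_mz: float
    product_mz: float


def matches(transition, precursor_mz, product_mz, tol=0.5):
    """True only if BOTH m/z values fall within the tolerance window."""
    return (abs(precursor_mz - transition.precursor_mz) <= tol
            and abs(product_mz - transition.product_mz) <= tol)


def summed_signal(transitions, readings, tol=0.5):
    """Sum intensities of readings that match any of the given transitions.

    `readings` is an iterable of (precursor_mz, product_mz, intensity).
    """
    return sum(
        intensity
        for precursor_mz, product_mz, intensity in readings
        if any(matches(t, precursor_mz, product_mz, tol) for t in transitions)
    )


if __name__ == "__main__":
    light = [Transition(500.3, 700.4)]   # endogenous peptide (hypothetical)
    heavy = [Transition(504.3, 708.4)]   # labeled internal standard
    readings = [
        (500.3, 700.4, 1000.0),  # light peptide
        (500.3, 650.0, 9999.0),  # same precursor m/z, wrong product: ignored
        (504.3, 708.4, 2000.0),  # heavy standard
    ]
    ratio = summed_signal(light, readings) / summed_signal(heavy, readings)
    print(ratio)
```

The interfering reading shares the precursor m/z of the light peptide but is excluded because its product ion does not match, which is the source of the added specificity over a single mass filter.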
A study by Rangiah et al combined a SILAP standard approach with MRM-based validation to study biomarkers in the Apcmin mouse, a colon cancer model.10 Candidate biomarkers were identified in the secreted proteome of the CT26 colon cancer cell line and an MRM method was developed to measure 12 biomarkers of interest. The CT26-derived SILAP standard was added to pooled Apcmin mouse or normal serum samples. Levels of all 12 biomarkers could be quantitated with differential expression validated independently by Western blot analysis for 5 biomarkers.
The past decade has witnessed remarkable innovation in proteomic methods and technologies. The feasibility of LC-MS approaches for the development of protein biomarkers has clearly been demonstrated by the studies described here. In the future, the development of more robust and specific separation technologies coupled to higher sensitivity MS instrumentation will enable the identification of exponentially more candidate biomarkers. However, with the resultant expanding discovery datasets, homogeneous “reagent grade” patients are vital to maximize the signal to noise ratio. Alternatively, genetically engineered mouse models of GI disease provide an ideal test platform for such studies. The development of high-throughput MS-based validation protocols will facilitate the development of clinically relevant biomarkers. Taken together, these developments will result in significant progress toward development of protein biomarkers for the early detection of GI cancers.
Conflicts of interest
The authors disclose no conflicts.
Note. The first 5 references associated with this article are available below in print. The remaining references accompanying this article are available online only with the electronic version of the article. Visit the online version of Gastroenterology at www.gastrojournal.org, and at doi:10.1053/j.gastro.2009.11.020.