Plasma membranes form a critical biological interface between the inside of every cell and its external environment. Their roles in multiple key cellular functions make them important drug targets. However, the protein composition of plasma membranes in general is poorly defined because the inherent properties of lipid-embedded proteins, such as their hydrophobicity, low abundance, poor solubility, and resistance to digestion and extraction, make them difficult to isolate, solubilize, and identify on a large scale by traditional mass spectrometry methods. Here we describe some of the significant advances that have occurred over the past ten years to address these challenges, including: i) the development of new and improved membrane isolation techniques via either subfractionation or direct labeling and isolation of plasma membranes from cells and tissues; ii) modification of mass spectrometry methods to adapt to the hydrophobic nature of membrane proteins and peptides; iii) improvements to digestion protocols to compensate for the shortage of trypsin cleavage sites in lipid-embedded proteins, particularly multi-spanning proteins; and iv) the development of numerous bioinformatics tools which allow not only the identification and quantification of proteins, but also the prediction of membrane protein topology, membrane post-translational modifications, and subcellular localization. This review emphasizes the importance and difficulty of defining cells in their proper pathological and physiological context to maintain the in vivo reality. We focus on how key technological challenges associated with the isolation and identification of cell surface proteins in tissues using mass spectrometry are being addressed in order to identify and quantify a comprehensive plasma membrane proteome for drug and target discovery efforts.
Plasma membranes (PM) and their associated proteins are part of a key biological interface between the outside and the inside of the cell. They are implicated in important cellular functions, such as small molecule transport, cell communication, and signaling. Such proteins are critical in sensing changes in the external environment and in transmitting signals into and out of the cell. Impaired cellular signaling, often involving PM proteins, is apparent in many cancers (1, 2). Membrane proteins represent one-third of the proteins encoded by the human genome (3) but represent more than two-thirds of the known protein targets for existing drugs (4). Thus, defining the proteome of PMs is critical for understanding cellular functions and fundamental biological processes and for finding new targets for drug discovery efforts.
In an ideal world, the PM proteome of all cell types would be analyzed comprehensively to (i) better understand why and where different membrane proteins are expressed, (ii) reveal new functions for the PM in various cell types, (iii) identify cell-specific surface-accessible markers as targetable proteins for local drug delivery, and (iv) identify diagnostic or prognostic indicators in healthy and diseased cell or tissue states. This can be attempted relatively easily in cell culture using a homogenous cell population as compared with the heterogeneity of cells within any tissue or organ from which they are derived. Unfortunately, once cells are isolated from different organs and even more so when cultured and grown in vitro, the cells can change dramatically in appearance, structure, and responsiveness, and protein expression and distribution within the cell can change (5, 6). Biomarkers readily disappear, and the cultured cells do not reflect the in vivo reality. Most importantly, they lose expression of tissue-specific proteins and dedifferentiate into a more common phenotype. Mass spectrometry (MS) analysis revealed that as much as 40% of the proteins expressed by endothelial cells in vivo are not found in vitro (6). Thus, several groups have tried to address this problem by capturing the PMs in as close to an in vivo situation as possible as discussed below.
PM proteins in general may not serve as suitable targets for drug treatment regimens mainly because of their inaccessibility to intravenously injected agents. Although most small molecule drugs can readily penetrate and accumulate in almost any tissue, often a high dose must be administered for the therapeutic dosage to accumulate in the diseased tissue of interest. However, major problems can arise when such drugs are toxic to both diseased and healthy tissue, e.g. chemotherapeutic agents, which can accumulate in healthy tissue, resulting in the unwanted and often severe side effects associated with these drugs. Thus, even if a protein expressed on the surface of a cell is indicative of a disease state, it would be nearly impossible to target this cell specifically while avoiding accumulation in healthy tissue using current drug delivery approaches, especially if the cell of interest is embedded within a tissue or organ. This is simply because multiple barriers, e.g. endothelium, epithelium, etc., must be crossed to access the cell regardless of the route of administration. As a result, great effort is now being focused on developing more targeted approaches where the toxic agents are specifically delivered to the organ or tissue of interest (7–9). With such targeting approaches, dosages can be significantly reduced, thus increasing the therapeutic efficacy while minimizing the side effects (10, 11). Consequently, proteins that are expressed on the surface of endothelial cells (ECs) that line the luminal surface of all vasculature are attractive targets for drugs and imaging agents as these proteins are in direct contact with the circulating blood and are thus inherently more accessible to intravenously administered agents than PM proteins of cells residing deep inside tissues and organs. Proteins expressed on the outer luminal EC surface can readily bind antibodies and other agents that are circulating in the blood.
Thus, identifying and characterizing the proteins that line the vasculature of each organ and tissue is highly desirable for drug delivery and diagnostic imaging.
PM proteins, regardless of the cell or tissue of origin, have generally been under-represented in proteomics analysis mainly because of their low abundance. In addition, the inherent insolubility of membrane proteins due to their hydrophobic nature has rendered them difficult to isolate and identify compared with their counterparts in the soluble cytosol and nuclear fractions. In many high throughput protein identification approaches, soluble proteomes are readily characterized, and it is fairly common for thousands of proteins to be identified in such samples. However, when more challenging proteomes, such as those of the PM, are of interest, the numbers of proteins identified are significantly lower. There are many technical reasons for such a dramatic decrease in protein identification when membrane proteomes are of interest. This review focuses on how we and other laboratories are overcoming key technological challenges associated with using traditional MS-based approaches, which were initially developed for the identification of more soluble proteins, for the mapping of membrane proteomes and in particular the in vivo cell surface proteome of endothelial cells.
With the advent of mass spectrometry for shotgun protein analysis, key challenges associated with the large scale analysis of PM proteins became readily apparent. These include 1) membrane isolation and enrichment methods; 2) the subsequent solubilization of the membrane proteins, which although possible with detergents may hinder downstream MS analysis; 3) identification of membrane proteins using traditional MS approaches; 4) determining the topology of membrane proteins, which is very important for the development of antibodies, drugs, or targeting reagents; and 5) achieving comprehensive coverage of the protein content of the membrane sample, which is critical for 6) quantification and relative enrichment but requires 7) proper normalization for meaningful comparisons to be made. The following sections discuss these challenges in more detail and outline how these challenges are being addressed.
PM protein enrichment through subfractionation or isolation aims not only to reduce the complexity of the sample and improve the overall dynamic range of detectable proteins but also to help overcome the low abundance issue, as PMs represent only a small fraction of the cell. However, many PM separation techniques based on subcellular fractionation principles (e.g. density gradient centrifugation and two-phase partitioning), which use the whole cultured cell as starting material, do not appear to have the resolution to provide highly purified PM fractions without substantial contaminating membranes from other subcellular organelles, such as nuclei, Golgi, ER, and mitochondria (12, 13). Therefore, many newly published PM “capture” techniques center around a similar theme where the goal is to capture, specifically label, or alternatively “shave off” the proteins protruding from the PM into the extracellular space.
Some groups have explored the use of proteases floating free in solution around intact cells in culture to cleave off surface-exposed peptides. The cleaved peptides are then concentrated in the cell solution and analyzed by mass spectrometry (14–17). Although the method appears intuitive, there may be problems relating to the lysis of the cells by the proteases, thus contaminating the membrane “peptide” solution with intracellular proteins. An alternative approach is the cross-linking of PM protein complexes where specific reagents are used to maintain protein complexes in their close to native state (18, 19). The cells are subsequently lysed, and the non-complexed proteins can be removed by various methods, including size exclusion chromatography.
Over the past 30 years, numerous chemical techniques have been developed to label surface proteins of cells. The majority of these methods were developed in cell culture systems before the advent of mass spectrometry permitted large scale protein identification. In one such approach, lactoperoxidase-catalyzed iodination was carried out with epithelial cells to label cell sheets with 125I. The iodination was carried out under conditions that allowed little penetration of lactoperoxidase into the cells, which restricted iodination to the cell surface, and thus, membrane-labeled 125I could therefore provide an effective marker for following PM fragments through subcellular fractionation. This labeling strategy was used to differentiate between apical and basal-lateral PMs in epithelial cell polarization studies (20, 21). This approach was later applied to label surface proteins of EC PMs in vitro and was most often used in combination with lectin affinity chromatography to isolate and characterize glycoproteins (22–24). This approach was adapted for the first mapping of the EC surface in tissue (24, 25). In situ radioiodination of pulmonary endothelial plasma membranes was performed by perfusing intact lungs via the pulmonary artery with lactoperoxidase-conjugated beads. This method could be used to verify protein expression in vivo versus cell culture. Although this radioiodination approach helped identify a small number of PM proteins, its utility for large scale protein mapping is significantly limited as there is no simple way to separate and identify radiolabeled proteins. Nonetheless, lectin affinity principles were still used to pull out membrane glycoproteins (26–28) in the absence of radiolabeling as the vast majority of proteins expressed on the surface of the PM are post-translationally modified by the addition of glycans to asparagine residues. This fact has been exploited by many groups to isolate glycoproteins prior to mass spectrometry analysis.
Other methods for the selective isolation, identification, and quantification of peptides that contain N-linked glycoproteins have emerged and are based on the conjugation of glycoproteins to a solid support using hydrazide chemistry, stable isotope labeling of glycopeptides, and the specific release of formerly N-linked glycosylated peptides via peptide-N-glycosidase F. The recovered peptides are then identified and quantified by MS/MS (29). Alternative glycolabeling enrichment chemistries include the oxidation of cell surface polysaccharides on living cells combined with subsequent biocytin hydrazide labeling (30), but such approaches have been developed for cell culture and have not yet been applied in vivo. Several approaches have been developed that introduce a stable isotope to a given glycan via chemical derivatization (31, 32). Glycans are typically derivatized prior to analysis either by tagging the reducing terminus with a chromophore when subsequent analyses are chromatographic or by permethylation when the sample is to be analyzed by MS. A typical work flow for these in vitro labeling approaches involves the parallel release of glycans from the sample populations under investigation and derivatization with an isotopic label after which the samples are mixed and then analyzed by MS. In vivo labeling strategies have also been described. One such approach is called IDAWG, isotopic detection of amino sugars with glutamine (33), which is similar in principle to SILAC (34). The main source of nitrogen for amino sugars in the production of sugar nucleotides is the side chain of glutamine; hence, isotopically labeled glutamine can be added as the sole source of glutamine for the cells, meaning that it will be incorporated into all N- and O-linked glycans. However, as with SILAC approaches, the cost of the reagents as well as incomplete incorporation of the heavy labeled amino acid into all the glycans can limit its application.
The next development evolved to allow both labeling and capturing of surface-expressed proteins by exploiting biotin-avidin chemistry to isolate cell surface membrane proteins. The membrane proteins are biotinylated on amino acid residues located in extracellular domains and subsequently enriched using magnetic streptavidin beads. There were numerous reports in the early 1980s describing biotinylation of cell surface proteins from plants and microorganisms (35, 36). This approach was later applied to mammalian cells in culture. Initial reports described its application to epithelial cell polarization studies to differentiate between apical and basal-lateral membranes. The cells were grown on filters and then incubated with the biotin reagents to facilitate apical surface labeling, and therefore determine polarized expression of proteins (37).
Zhang et al. (38) and others (39) (for a review, see Ref. 40) later adapted this affinity enrichment method for proteomics analysis. It combines cell surface biotinylation with affinity enrichment by immobilized streptavidin beads for the isolation of cell surface proteins from cultured cells. The authors showed enrichment of PMs relative to other cellular organelles. Using this method, Zhao et al. (39) were able to detect 898 proteins, including a significant number of proteins identified with only 1 peptide, with 781 of them being annotated as PM-localized. Although this approach gave a good first approximation of proteins at the cell surface, cytoskeletal and intracellular proteins were also observed in these preparations. This is not surprising as biotin reagents are fairly small and can cross lipid membranes to get into cells, and thus proteins within the cells can become labeled. The use of charged or polar biotinylation reagents reduces the entry of the reagents into the cells, but cytoskeletal proteins are still often detected (40).
Modifications to the biotin approach in which the probe is membrane-impermeable have also been described (41). In this study, a commercially available sulfo-NHS-SS-biotin probe was used in combination with 18O labeling for quantitative analysis of cell surface membrane proteins. There are various sulfo-NHS-biotin ester reagents available with varying properties and spacer arm lengths. They are all polar and water-soluble, thus reducing their ability to penetrate cell membranes. Alternatively, membrane-impermeable thiol-reactive biotins (e.g. maleimido-biotin) are also available for the labeling of membrane proteins where the usual amine labeling by biotin is undesirable. However, whether all PM proteins are biotinylated similarly remains unclear as different biotinylation reagents preferentially target different reactive groups; for example, some reagents target primary amino groups, which are abundant in the form of lysine side chain ε-amines, and others target free sulfhydryl groups found in cysteine residues (40). Therefore, the extent of protein biotinylation is dependent not only on the nature of the reagent but the number and type of accessible amino acids of the protein. This variability of protein labeling will likely complicate any attempt to quantify proteins and thus highlights the need to investigate the degree of protein biotinylation further.
After a decade or two of studying various isolated cells in culture, researchers became acutely aware of the very significant phenotypic drift associated with removing and culturing cells away from their native microenvironment (5, 6, 8). There is documentation of large morphological changes, including altered distribution and even loss of subcellular organelles, which provide primary evidence of the problem (5, 42). This has major implications for biomarker and target discovery efforts as the in vitro phenotype generally does not reflect the in vivo reality. However, it has been very difficult to study cells in vivo, especially to get key samples for protein analysis. ECs are one cell type that may allow such analysis as we can perfuse the vasculature to isolate these cells for direct analysis. It is for this reason that we focus on ECs and highlight the requirement for new technologies for other cells in tissues. Although most cells can be removed from tissues, the key is not to perturb the surface of the cells, which is hard to do. However, other readily accessible surfaces, such as the epithelial lining of the lungs, intestine, etc., can be readily perfused in the same manner as described for the lung.
In 1997, Merker and co-workers (43) described a method for biotinylating cell membrane proteins accessible via the vascular lumen in the isolated-perfused rat lung to examine the impact of hyperoxia on the spectrum of the biotinylated proteins. This approach was pivotal for moving the biotin-based membrane capture approaches in vivo. In the last 5 years, this approach has been adapted and used for large scale proteomics analysis of vasculature-accessible proteins (44). However, the specificity of this technique in vivo for the specific labeling of endothelial cell membranes remains an issue as extracellular matrix proteins are also found in this preparation because it is difficult to control the location of biotinylation in vivo. Not all expressed cell surface proteins are necessarily biotinylated and clearly not biotinylated equally so identification and especially quantification can be problematic. Also, access into tissue beyond the endothelial cell surface remains an issue as immunostaining clearly shows that labeling has gone beyond the EC lining and well into the tissue.
We took a different approach to isolating EC surface membrane proteins from tissue. For over two decades, we have developed and optimized several methods to study surface proteins in vitro and in vivo (6, 8, 23, 24, 45–55), the most promising of which has been an adaptation of the in vitro silica coating procedure developed for cultured cells (56) to isolate luminal vascular endothelial PMs directly from tissue (6, 8, 48, 55, 57). In this method, the vasculature is perfused at low temperature with a colloidal silica solution, and the cationic silica nanoparticles selectively coat the luminal EC surface, greatly increasing the buoyant density of the PM beyond that of any other component in the tissue. After tissue homogenization, large sheets of silica-coated EC PM are readily isolated from other cellular membranes and debris by centrifugation through a high density medium. This method serves to preserve the native EC conditions in vivo, which is critical for target discovery efforts.
We use this protocol for all of our PM preparations and perform rigorous quality control on these samples, including a requirement of a 20-fold enrichment in EC plasma membrane markers, such as caveolin-1, angiotensin-converting enzyme, endothelial NOS, and VE-cadherin, and a 20-fold depletion in markers of other organelles, e.g. transportin (nuclear marker), p58 (Golgi marker), ERp72 (ER marker), and cell types, e.g. CD11 (macrophage) and GlyA (red blood cell). Such validation is very important for target discovery proteomics efforts. It is well known that classical biochemical cell fractionation techniques do not yield perfectly purified PMs, but usually the PMs contain levels of contaminations derived from intracellular organelles membranes and tissue components that must be tested, such as the nucleus, mitochondria, ER, Golgi, lysosomes, etc. Therefore, validation of PM enrichment as well as depletion of known intracellular markers is key for successful proteomics mapping of such membranes. Ideally, a panel of benchmark proteins for determining enrichment of PMs should be established for each cell type as opposed to a “one panel fits all approach” as not all proteins are expressed equally in each cell type, especially on the PM surface. This also needs to be taken into account when cells are isolated in vivo and transferred to culture as discussed above.
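The enrichment and depletion criteria described above can be sketched as a simple calculation. The following is a minimal illustration only, assuming each marker has been quantified (for example by Western blot densitometry) in both the starting tissue homogenate and the isolated PM fraction, in the same units and normalized to the same amount of loaded protein. The function names and all signal values are hypothetical and are not data from the studies cited.

```python
from typing import Dict, Tuple

# (homogenate signal, PM-fraction signal), same units, equal protein load.
Signal = Tuple[float, float]


def fold_enrichment(homogenate: float, pm: float) -> float:
    """PM-fraction signal relative to the starting homogenate."""
    if homogenate <= 0:
        raise ValueError("homogenate signal must be positive")
    return pm / homogenate


def fold_depletion(homogenate: float, pm: float) -> float:
    """Homogenate signal relative to the PM fraction (infinite if undetected)."""
    if homogenate <= 0:
        raise ValueError("homogenate signal must be positive")
    return float("inf") if pm == 0 else homogenate / pm


def qc_report(
    pm_markers: Dict[str, Signal],
    contaminant_markers: Dict[str, Signal],
    threshold: float = 20.0,
) -> Dict[str, bool]:
    """Return pass/fail per marker against the fold-change threshold."""
    report = {}
    for name, (homogenate, pm) in pm_markers.items():
        report[name] = fold_enrichment(homogenate, pm) >= threshold
    for name, (homogenate, pm) in contaminant_markers.items():
        report[name] = fold_depletion(homogenate, pm) >= threshold
    return report


# Hypothetical densitometry values, for illustration only.
pm_markers = {"caveolin-1": (1.0, 25.0), "VE-cadherin": (2.0, 44.0)}
contaminants = {"p58 (Golgi)": (10.0, 0.4), "ERp72 (ER)": (8.0, 1.0)}

report = qc_report(pm_markers, contaminants)
preparation_passes = all(report.values())
print(report, preparation_passes)
```

In this made-up example the ER marker is depleted only 8-fold, so the preparation would fail the check even though both PM markers are sufficiently enriched.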
Regardless of where the membrane proteins come from or how they are purified, the same underlying issues associated with their analysis remain and include solubility and difficulty in detection by traditional mass spectrometry approaches. It is the inherent properties of the membrane proteins, such as their low abundance and poor solubility, that make them difficult to isolate, purify, solubilize, and identify on a large scale. Although significant progress is being made on isolating and enriching for membrane proteins, the solubilization of these proteins remains a key issue as many membrane proteins contain both hydrophobic and hydrophilic domains that make them difficult to purify, separate, and characterize. There are various types of PM proteins, including 1) integral membrane proteins (IMPs), which are permanently attached to the membrane and are either single pass or multipass proteins that weave in and out, crossing the membrane several times; 2) peripheral or membrane-associated proteins, which adhere only temporarily to the membrane with which they are associated as they generally bind to other proteins or lipids in the PM but are not anchored inside the lipid bilayer (e.g. cytoskeletal proteins); and 3) lipid-anchored proteins, which are attached to the membrane via covalently attached lipids, such as palmitate, myristate, or glycosylphosphatidylinositol (GPI). Various methods have been developed to preferentially isolate and solubilize these different types of membrane proteins and have been described extensively elsewhere (39, 58–65).
Once isolated, PM proteins are further separated prior to MS analysis to prevent overwhelming the detection capabilities of the mass spectrometer and to allow the unmasking of low abundance proteins. Such separation usually occurs at either the protein or peptide level. Proteins can be separated by two-dimensional electrophoresis (2DE), one-dimensional SDS-PAGE, or off-gel isoelectric focusing (66). Alternatively, peptides are often separated by peptide isoelectric focusing, capillary electrophoresis (for a review, see Ref. 67), or single or multidimensional liquid chromatography.
Although 2DE was initially the gold standard for large scale protein separation because two-dimensional gels greatly facilitate sample visualization and are esthetically pleasing, they are quite laborious to run consistently and are biased against IMP identification. 2DE is limited in its ability to resolve basic or hydrophobic proteins as well as those with greater than three transmembrane regions. In addition, its limited dynamic range and difficulty in unmasking low abundance proteins mean that a lack of sensitivity is also a key issue (68). Of the estimated 30% of proteins that are membrane proteins, only ~1% of them can be resolved properly on two-dimensional gels with current techniques (69) despite the increased use of denaturing agents such as thiourea (70) and zwitterionic detergents (71), which improve resolution somewhat. Further improvements are needed for the analysis of membrane proteins and are expected in the future.
With the advent of high throughput mass spectrometry for protein identification, researchers sought to combine the separation power of LC with the rapid sequencing capabilities of modern mass spectrometers. The exploitation of more than one property of a peptide is a powerful chromatographic technique for protein identification in complex samples (72). When this technique was first described by Yates and co-workers (72), proteins were first digested in solution into peptides prior to separation by reverse phase, which separates by hydrophobicity, and/or strong cation exchange, which separates by electrostatic charge, liquid chromatography (one-dimensional and/or two-dimensional LC). This separation can be done either off line or coupled directly to the mass spectrometer. Although this approach showed great promise for soluble cell extracts, its application to less soluble membrane preparations quickly revealed its limitations.
Although we and other groups were achieving modest success with membrane protein identification strategies using gel-free chromatographic separation (16, 73–75), the notable absence of integral membrane proteins from such MS analysis became readily apparent (6, 76, 77). In our initial survey of normal rat lung EC PMs using three standard analytical techniques of the time (2DE, Western analysis, and the shotgun method of gel-free two-dimensional liquid chromatography-tandem mass spectrometry (LC-MS/MS)), we successfully identified 450 proteins, which was a notable number at the time (6, 8). Of the 450 proteins, only ~15% were integral membrane proteins. Further examination of the data set revealed that several well known EC surface marker proteins, including specific enzymes, adhesion molecules, and growth factor receptors, were not detected in the sample by the large scale two-dimensional LC analysis.
To begin to understand why so many IMPs escaped detection by the gel-free two-dimensional LC approach, we investigated the sample preparation procedures for two-dimensional LC. Most membrane proteins are best solubilized by ionic detergents, such as SDS. However, SDS and other detergents are either not compatible with trypsin digestion or interfere with the MS analysis so that in the sample processing for two-dimensional LC the detergent-solubilized proteins must first be precipitated before being resolubilized for trypsin digestion (usually in urea solution). We assessed possible protein loss during the precipitation and resolubilization steps. As expected, the more soluble proteins were equivalently present regardless of the process and showed minimal losses after precipitation and urea solubilization. However, IMPs and even lipid-anchored proteins were significantly affected by sample processing. Some proteins unaffected by the precipitation step did not resolubilize well in urea solution, which typifies the procedure used in two-dimensional LC. Other proteins showed a dramatic loss in signal from the precipitation step regardless of resolubilization conditions. The sample loss for two-dimensional LC was fairly apparent, ranging from 0% for soluble proteins to >90% for many lipid-embedded proteins (47).
Mass spectrometry-compatible detergents have been developed to aid with membrane protein identification (78). Chen et al. (79) recently examined the effect of adding such MS-compatible detergents to the digestion buffer in place of urea and observed a significant improvement in the number of peptides and proteins identified compared with the traditional urea-based in-solution digestion. However, they noted that the different MS-compatible detergents resulted in preferential detection of different types of peptides and suggested the combination of multiple approaches to generate a more diverse mixture of peptides; this is a problem similar to that observed with the different membrane isolation methods described above. Unfortunately, none of these MS-compatible detergents have the membrane solubilization powers of SDS.
We sought to exploit the benefits of SDS and SDS-PAGE for membrane protein solubilization and separation, respectively, while bypassing standard precipitation and resolubilization prior to in-solution trypsin digestion and MS analysis. We recently performed a large scale analysis using four different MS-based strategies involving two- and three-dimensional separation of proteins and peptides (47). Proteins were either digested in solution before two-dimensional LC-MS analysis or separated by SDS-PAGE after which the gel was cut into multiple slices for in-gel trypsin digestion followed by either one-dimensional or two-dimensional LC-MS analysis of digested peptides. We found that the best method was the three-dimensional approach, which combined SDS-PAGE separation of proteins followed by two-dimensional separation of in-gel digested proteins directly coupled on line to MS/MS analysis. This analysis showed the benefits to be gained from the very complete solubilization of membrane proteins by SDS and SDS-PAGE separation of the proteins prior to in-gel trypsin digestion. SDS-PAGE also plays a significant role in resolving IMPs and lipid-anchored proteins and minimizing losses associated with the sample preparation. By using the methods outlined above, we succeeded in expanding this vascular endothelial plasma membranome to 1833 proteins, including >500 lipid-embedded proteins, which provided nearly a 10-fold increase in the identification of key multispanning transmembrane proteins over past approaches.
Membrane protein and peptide hydrophobicity provides another challenge in the identification of membrane proteins by LC-MS/MS apart from the solubilization issues discussed above. Peptides can often be so hydrophobic that they are not eluted from the C18 reverse phase material normally used in LC-MS/MS and thus are never detected. Recently, the application of heat to reverse phase LC has been shown to enhance the identification of extremely hydrophobic proteins and peptides (16, 80). Also, few tryptic cleavage sites occur within the hydrophobic domains of membrane proteins. The target sites for tryptic cleavage, lysine and arginine, are mainly absent in transmembrane helices and only found in the hydrophilic part of the protein (81). The size of exposed hydrophilic domains varies among integral membrane proteins from large (e.g. epidermal growth factor receptor) to small (e.g. rhodopsin). The number of available tryptic sites for membrane proteins is significantly lower than for more soluble proteins, meaning that few if any tryptic peptides are produced of appropriate size that are within the target mass range for MS analysis.
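The scarcity of usable tryptic peptides from lipid-embedded regions can be illustrated with a small in silico digest. This is a minimal sketch under stated assumptions: cleavage is taken to occur C-terminal to lysine or arginine except when the next residue is proline (a commonly applied simplification), a length window of 6 to 30 residues stands in for the instrument's target mass range, and both sequences are invented toy examples rather than real proteins.

```python
def tryptic_digest(seq: str) -> list:
    """Cleave after K or R unless the following residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal remainder
    return peptides


def detectable(peptides: list, min_len: int = 6, max_len: int = 30) -> list:
    """Keep peptides inside an assumed length window for MS analysis."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Toy sequences (hypothetical): a hydrophilic stretch with regular K/R sites,
# and a stretch with a 40-residue hydrophobic core that lacks K and R.
soluble_like = "MSDEKLTAAGERVLNQDKAAEFGHSR"
membrane_like = "MGK" + "LIVAFLGVAL" * 4 + "RDE"

for name, seq in [("soluble-like", soluble_like), ("membrane-like", membrane_like)]:
    peptides = tryptic_digest(seq)
    print(name, [len(p) for p in peptides], len(detectable(peptides)))
```

In the toy membrane-like sequence the entire hydrophobic core ends up in a single 41-residue peptide that falls outside the assumed window, so no peptide from that region would be expected to be observed.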
To address the problem of low numbers of tryptic sites for membrane proteins, alternative enzymes to trypsin for protein digestion as well as modifications to the trypsin digestion protocol have been used to increase the number of peptides available for MS analysis. These include combining trypsin digestion with cyanogen bromide (82) and chymotrypsin cleavage. A recent study by Fischer and Poetsch (83) looked at protein cleavage strategies for improved analysis of membrane proteins. They tested the effects, in silico, of different proteases for improved sequence coverage for membrane proteins from three different organisms and found that the combination of chymotrypsin and staphylococcal peptidase 1 or trypsin/cyanogen bromide gave significantly better results than trypsin alone. Other studies have shown that cleavage with trypsin and cyanogen bromide in formic acid almost exclusively yields peptides of exposed hydrophilic domains from membrane proteins (e.g. loops) (72) but that digestion in 60% methanol permits the analysis of peptides from transmembrane helices (84).
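An in silico comparison of cleavage strategies of the kind described above can be sketched as follows. This is not the procedure of the cited study. It is a simplified illustration that assumes idealized specificities (trypsin after K/R, cyanogen bromide after M, chymotrypsin after F/Y/W, and staphylococcal peptidase I after E), ignores missed cleavages and proline effects, and again uses a 6 to 30 residue window as a stand-in for the detectable mass range. The toy sequence is invented.

```python
# Idealized cleavage specificities: cleave C-terminal to the listed residues.
CLEAVAGE_RULES = {
    "trypsin": set("KR"),
    "trypsin + CNBr": set("KRM"),
    "chymotrypsin + staphylococcal peptidase I": set("FYWE"),
}


def digest(seq: str, sites: set) -> list:
    """Split a sequence after every residue found in `sites`."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in sites:
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def coverage(seq: str, sites: set, min_len: int = 6, max_len: int = 30) -> float:
    """Fraction of residues that fall in peptides inside the length window."""
    covered = sum(
        len(p) for p in digest(seq, sites) if min_len <= len(p) <= max_len
    )
    return covered / len(seq)


# Hypothetical sequence: two hydrophobic stretches separated by a methionine.
toy = "SGK" + "LIVAFLGVAL" * 2 + "M" + "AIVLGAAVLL" * 2 + "RDE"

results = {name: coverage(toy, sites) for name, sites in CLEAVAGE_RULES.items()}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

The numbers printed for this toy sequence only demonstrate the calculation. Real comparisons depend on the proteome examined and on the digestion conditions.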
Studies involving the use of elastase (85) and pepsin (86) for the identification of membrane proteins have also been reported. Alternative nonspecific proteases, such as Proteinase K, which at high pH creates overlapping peptides of 6–20 amino acids, have also been used (87, 88). A major disadvantage of this approach is the dramatic increase in the number of peptides produced compared with a traditional trypsin digestion. Additional chromatographic steps are then needed to separate the peptides, which can complicate the analysis of a complex membrane proteome even further.
Once a successful digestion protocol has been established and the membrane proteins have been identified, the orientations of the proteins in the membrane as well as the regions of the protein spanning the membrane are often determined. This process is referred to as topology mapping, and such mapping is very important for developing antibodies, drugs, and other reagents that will bind to and/or affect the function of membrane proteins. Classically, protein structure is determined by x-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy, but unfortunately, there are no general and reliable methods for forming three-dimensional crystals of membrane proteins suitable for crystallographic analysis mainly due to the inherent hydrophobicity. To date, only a handful of high resolution membrane protein structures have been solved, whereas several thousands of three-dimensional structures of globular proteins are known. Nevertheless, biochemical and prediction methods can still be used to obtain structural information about membrane proteins. The biochemical methods used to determine membrane topology are based on similar principles and are reviewed elsewhere (89). In short, they are based on the fact that, due to the impermeability of the membrane to hydrophilic molecules, parts of a membrane protein that lie on opposite sides of the membrane are differently accessible to various agents. Easily identified target sites are inserted in the protein, and membrane-impenetrable reagents are used to determine their accessibility at one side of the membrane. By inserting the tag at different positions in the protein, a more complete topology can be determined.
Alternatively, the physicochemical constraints imposed by the lipid membrane environment provide a simple method to predict the topology of a membrane protein. Membrane protein topology predictions are based on two observations: 1) transmembrane α-helices have a high overall hydrophobicity, and 2) the charge distribution of the hydrophilic loops that connect the transmembrane segments follows the “positive inside” rule, which states that non-translocated loops are enriched in positively charged residues compared with protein domains translocated across the lipid bilayer (90). The first observation is used to identify the transmembrane segments in the amino acid sequence by analyzing the hydropathic properties of the amino acid sequence (81, 91, 92), and the second observation is used to predict the overall orientation of the protein in the membrane. The biochemical techniques mentioned above are often used to verify these predictions. Examples of prediction tools include TMHMM, SOSUI, and TMpred. These and others are available on the ExPASy web site at www.expasy.org/tools.
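The two observations can be illustrated with a minimal Python sketch. It assumes the Kyte-Doolittle hydropathy scale with a 19-residue sliding window and a threshold of 1.6, which is one common choice and not necessarily what the tools named above use (TMHMM, for instance, is based on a hidden Markov model). The function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of each full window, indexed by window center."""
    half = window // 2
    return [
        sum(KD[aa] for aa in sequence[c - half:c + half + 1]) / window
        for c in range(half, len(sequence) - half)
    ]


def candidate_tm_cores(sequence, window=19, threshold=1.6):
    """Runs of window centers (start, end-exclusive) scoring >= threshold."""
    half = window // 2
    profile = hydropathy_profile(sequence, window)
    cores, start = [], None
    for offset, score in enumerate(profile):
        center = offset + half
        if score >= threshold:
            if start is None:
                start = center
        elif start is not None:
            cores.append((start, center))
            start = None
    if start is not None:
        cores.append((start, half + len(profile)))
    return cores


def n_terminus_side(sequence, cores):
    """Apply the positive-inside rule: loops alternate sides of the membrane,
    and the side richer in K/R is predicted to be non-translocated (inside)."""
    loops, previous_end = [], 0
    for start, end in cores:
        loops.append(sequence[previous_end:start])
        previous_end = end
    loops.append(sequence[previous_end:])
    positives = [loop.count("K") + loop.count("R") for loop in loops]
    same_side, other_side = sum(positives[0::2]), sum(positives[1::2])
    if same_side > other_side:
        return "in"
    if other_side > same_side:
        return "out"
    return "undetermined"


if __name__ == "__main__":
    toy = "K" * 20 + "L" * 21 + "D" * 20  # hypothetical single-pass protein
    cores = candidate_tm_cores(toy)
    print(cores, n_terminus_side(toy, cores))
```

Because the reported ranges are runs of window centers, they mark the hydrophobic core of a candidate helix and tend to underestimate its full extent.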
Because of the inherent difficulties associated with the proteomics analysis of membranomes, it is not surprising that a plethora of bioinformatics tools have emerged to facilitate their analysis. Depending on the question of interest, these tools can either be used prior to or after MS analysis or, in some situations, may be used independently of MS analysis. For example and as discussed above, there are a number of tools available that can predict the cleavage sites for particular enzymes; thus, the entire proteome of interest can be digested in silico prior to MS analysis to determine the optimal enzyme to maximize the number of detectable peptides. This is very important as many membrane proteins have few or inaccessible trypsin cleavage sites, significantly reducing the probability of being detected by traditional MS approaches. Such tools include PeptideCutter, which predicts potential protease cleavage sites and sites cleaved by chemicals in a given protein sequence, and PeptideMass, which cleaves a protein sequence with a chosen enzyme and computes the masses and sequences of the generated peptides.
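As a rough illustration of the mass calculation step, the sketch below computes monoisotopic peptide masses and filters peptides by precursor m/z. It is written in the spirit of such tools and is not the PeptideMass implementation. The 400 to 2000 m/z window and the default charge of 2 are assumptions chosen for the example, and modifications are ignored.

```python
# Monoisotopic residue masses in Da (unmodified residues only).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.007276  # mass of the charge carrier


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def peptide_mz(peptide, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


def detectable(peptides, charge=2, mz_min=400.0, mz_max=2000.0):
    """Peptides whose m/z falls inside an assumed acquisition window."""
    return [p for p in peptides if mz_min <= peptide_mz(p, charge) <= mz_max]


if __name__ == "__main__":
    for pep in ["MSK", "LLVFALIGWLAVMLFYIGLAVLLFK", "DESGR"]:
        print(pep, round(peptide_mass(pep), 4), round(peptide_mz(pep, 2), 4))
```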
Subcellular localization prediction of proteins is often used after mass spectrometry analysis of membrane proteomes to confirm that the majority of proteins identified in the analysis are localized to the plasma membrane, thus helping to validate the membrane preparation. These tools are based on sequence analysis as the subcellular localization of a protein is influenced by several features present within the primary structure of the protein, such as the presence of a signal peptide (which directs a protein for export past the cytoplasmic membrane), particular domains or motifs (e.g. mitochondrial targeting peptide), or membrane-spanning α-helices. Examples of such tools include PSORT and TargetP. However, it is common for proteins to have dual localizations, a fact that is not always picked up by such predictive tools.
Various other bioinformatics tools have been developed to predict post-translational modification sites. For the analysis of PM proteins, the main PTMs that are of interest are phosphorylation and glycosylation. Phosphorylation plays an important role in cell signaling. The plasma membrane is often the initiation center for cell signaling events; thus, the identification of phosphorylation events at the plasma membrane remains an important challenge. Glycosylation as a modification is usually in the form of either O-linked or N-linked glycosylation. O-Linked glycans are attached to serine or threonine residues and act in a dynamic fashion similar in principle to phosphorylation. N-Linked glycosylation is more of a co-translational modification than a PTM. This modification is usually found in membrane and secreted proteins and thus is of interest in the analysis of membrane proteins. PTM prediction tools can prove very helpful prior to mass spectrometry analysis because if the modification site is known prior to analysis then the peptides that can carry the modification can be monitored throughout the MS analysis. Examples of such PTM prediction tools include NetPhos, which predicts Ser, Thr, and Tyr phosphorylation sites in eukaryotic proteins; NetPhosK, which predicts kinase-specific phosphorylation sites in eukaryotic proteins; NetNGlyc, which allows the prediction of N-glycosylation sites in human proteins; and NetOGlyc, which can predict O-GalNAc glycosylation sites in mammalian proteins.
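For N-linked glycosylation, the simplest possible screen is a scan for the consensus sequon N-X-S/T, where X is any residue except proline. The sketch below does only that. It is a far cruder filter than NetNGlyc and is not a description of how that tool works: a sequon is necessary for canonical N-glycosylation but does not guarantee that a site is occupied. The function name and example sequence are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNSS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T motifs (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


if __name__ == "__main__":
    print(find_sequons("ANGTNPSNAS"))
```

Positions found this way can be used to build a list of candidate glycopeptides to monitor during the MS analysis.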
In addition to the PTM prediction tools, PTMs can also be determined through targeted MS analysis, which often relies upon enrichment techniques for the PTM of interest. For example, the most commonly used strategies for determination of phosphorylation sites involve peptide-focused analysis of phosphopeptides enriched typically by IMAC (93, 94) and/or TiO2 (95, 96) microchromatography. In traditional CID mass spectrometry, phosphorylation is detected by one intensive peak, corresponding to the loss of phosphoric acid from the parent mass, and a few low abundance fragment ions are generated upon CID of serine and threonine phosphorylated peptides (97). Additional losses of water can also be observed (98). PTMs, including phosphorylation, often mask putative tryptic cleavage sites, sometimes resulting in peptides that are too large for efficient MS/MS analysis using CID. In addition, PTMs are often labile upon CID, meaning that the modification may be lost by the fragmentation process. As a result, alternative methods of fragmentation were sought that would allow sequencing for peptide identification while maintaining the integrity of the PTM. Electron transfer dissociation is one such technique where cleavage of the N–Cα bond is induced, resulting in the generation of c- and z-type fragment ions as opposed to b- and y-ions produced by CID. In contrast to CID, electron transfer dissociation produces almost complete c- and z-ion series, thus preserving the phosphorylation modification, which therefore improves global proteomics profiling of phosphopeptides. The field of phosphoproteomics has been extensively reviewed, and the details can be found elsewhere (99, 100).
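The neutral loss signature mentioned above lends itself to a small worked calculation. Loss of phosphoric acid (H3PO4, monoisotopic mass 97.9769 Da) from a precursor of charge z shifts the peak by 97.9769/z on the m/z axis. The Python sketch below computes the expected position and checks whether the base peak of a spectrum sits there. The 0.5 m/z tolerance and the peak list are assumptions made for illustration.

```python
H3PO4 = 97.976896  # monoisotopic mass of phosphoric acid, Da
WATER = 18.010565  # monoisotopic mass of water, Da


def neutral_loss_mz(precursor_mz, charge, extra_waters=0):
    """Expected m/z after loss of H3PO4 (plus optional water losses)."""
    return precursor_mz - (H3PO4 + extra_waters * WATER) / charge


def dominated_by_neutral_loss(precursor_mz, charge, peaks, tolerance=0.5):
    """True if the most intense peak matches the H3PO4 neutral loss.

    `peaks` is a list of (m/z, intensity) pairs from one CID spectrum.
    """
    if not peaks:
        return False
    base_mz, _ = max(peaks, key=lambda peak: peak[1])
    return abs(base_mz - neutral_loss_mz(precursor_mz, charge)) <= tolerance


if __name__ == "__main__":
    spectrum = [(751.0, 1000.0), (400.2, 50.0), (620.3, 80.0)]  # made-up peaks
    print(neutral_loss_mz(800.0, 2), dominated_by_neutral_loss(800.0, 2, spectrum))
```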
Lipid-anchored proteins, including those with a GPI anchor, represent another PTM that is of interest for membrane proteins as proteins can be attached to the plasma membrane by a GPI anchor. Elortza et al. (14, 101) have described various techniques for the analysis of GPI-anchored proteins by MS, including one “shave and conquer” approach where GPI-anchored proteins are released from the membrane by phosphatidylinositol phospholipase C or D cleavage prior to MS analysis. N-Myristoylation is another co-translational modification or PTM of interest where myristate is covalently attached to the N-terminal glycine of proteins. It promotes weak and reversible protein-membrane as well as protein-protein interactions. In addition, a subset of proteins undergo post-translational covalent modification of cysteine residues with one or more palmitoyl groups after N-myristoylation, including the Src family of tyrosine kinases. This reversible modification provides a mechanism for regulated interactions between these N-myristoylated proteins and cellular membranes. In addition, palmitoylation enhances the hydrophobicity of proteins and contributes to their membrane association. These modifications are usually determined using prediction algorithms, such as Myristoylator found on the ExPASy website or the MYR predictor found at http://mendel.imp.ac.at/myristate/SUPLpredictor.htm.
Although sample prefractionation and protein/peptide separation have greatly enhanced the number of proteins detected in a typical high throughput mass spectrometry protein identification experiment, achieving comprehensive coverage of the protein sample remains a key issue, especially for membrane proteins. This issue becomes more difficult as the complexity of the sample increases because often only the most abundant proteins in the sample are detected. To increase proteome coverage, data-dependent acquisition is a widely used technology in LC-MS/MS that facilitates the unmasking or sampling of lower abundance proteins. The intensity-based dynamic exclusion algorithm that underlies this function in most software usually works as follows: the most intense full mass scan peak(s) is picked for the succeeding MS/MS scan(s), and when an m/z value has been chosen for CID analysis a user-defined number of times, it is excluded from CID analysis for a user-defined number of full mass scans before becoming eligible again (102, 103). Once a precursor ion mass is placed on the dynamic exclusion list, the instrument will move on to other, usually less abundant, ions to generate additional MS/MS scans of new precursor ions. As a result, both high abundance and low abundance co-eluting ions might have a chance to undergo fragmentation. Obviously, more unique peptides can be identified when dynamic exclusion is enabled. On the other hand, when dynamic exclusion is not enabled, the instrument will only select the most abundant ions for repeated MS/MS generation, and the number of identified peptides will rely primarily on the separation capabilities of the liquid chromatography system used.
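The dynamic exclusion logic described above can be mimicked with a toy simulation. The Python sketch below is a simplified model, not vendor acquisition software: precursors are matched by exact m/z rather than within a tolerance window, and the parameter names (`top_n`, `repeat_count`, `exclusion_scans`) are hypothetical labels for the user-defined settings.

```python
def run_dda(scans, top_n=1, repeat_count=1, exclusion_scans=3,
            dynamic_exclusion=True):
    """Simulate precursor selection over a series of full MS scans.

    `scans` is a list of {m/z: intensity} dicts, one per full scan.
    Returns, for each scan, the list of m/z values chosen for MS/MS.
    """
    times_selected = {}  # how often each m/z has been picked since last exclusion
    eligible_from = {}   # first scan index at which an excluded m/z is allowed again
    selections = []
    for index, scan in enumerate(scans):
        candidates = [
            mz for mz in scan
            if not dynamic_exclusion or eligible_from.get(mz, 0) <= index
        ]
        picked = sorted(candidates, key=lambda mz: scan[mz], reverse=True)[:top_n]
        if dynamic_exclusion:
            for mz in picked:
                times_selected[mz] = times_selected.get(mz, 0) + 1
                if times_selected[mz] >= repeat_count:
                    eligible_from[mz] = index + 1 + exclusion_scans
                    times_selected[mz] = 0
        selections.append(picked)
    return selections


if __name__ == "__main__":
    # Three co-eluting ions of very different intensity, seen in three scans.
    scans = [{500.0: 100.0, 600.0: 10.0, 700.0: 1.0}] * 3
    print(run_dda(scans))
    print(run_dda(scans, dynamic_exclusion=False))
```

With exclusion enabled, the three co-eluting ions are each sampled once; with it disabled, the most intense ion is fragmented in every cycle and the other two are never sampled.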
A more recent modification of this approach is the generation of mass exclusion lists where the masses of successfully identified peptides are not selected for sequencing during subsequent analysis (104, 105). When a run concludes, peptide database searching is performed, and the masses of the sequenced peptides from that run are compiled into the mass exclusion list such that the list is amalgamated after each run. This analysis may be repeated until the desired number of peptides is identified or until no additional peptides are sequenced. Selected ions that do not generate a validated peptide sequence are not added to the exclusion list. Rudomin et al. (104) recently described such an exclusion list-based approach, termed AMEx. They showed that AMEx increases the total number of unique peptides identified in a standard 90-min reverse phase separation by 26% compared with standard data-dependent acquisition analysis. This translated into increasing protein identifications by 15%. It will be interesting to see how such an approach can be applied to multidimensional analysis such as two-dimensional LC and those recently described by Li et al. (47) as the run times (both chromatographic and MS) and data sets are significantly larger. The generation of such exclusion lists may reduce the number of analyses that need to be performed to comprehensively describe the data set. Nevertheless, how quantitative data, in particular for label-free quantification, can be obtained from such an approach remains to be determined.
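The run-to-run accumulation of an exclusion list can be sketched as follows. This is a schematic model of the general idea, not the AMEx implementation: each run is reduced to picking the most intense precursors not yet excluded, and database searching is replaced by a lookup in a set of precursors assumed to yield validated identifications. All names and values are hypothetical.

```python
def iterative_exclusion(precursors, identifiable, per_run=2, max_runs=10):
    """Repeat runs, excluding only precursors that were successfully identified.

    `precursors` maps m/z to intensity; `identifiable` is the set of m/z values
    that would produce a validated peptide sequence when fragmented.
    Returns the list of newly identified m/z values for each productive run.
    """
    exclusion = set()
    history = []
    for _ in range(max_runs):
        candidates = [mz for mz in precursors if mz not in exclusion]
        picked = sorted(candidates, key=precursors.get, reverse=True)[:per_run]
        identified = [mz for mz in picked if mz in identifiable]
        if not identified:
            break  # no additional peptides sequenced, so stop
        exclusion.update(identified)  # unidentified ions stay eligible
        history.append(identified)
    return history


if __name__ == "__main__":
    precursors = {400.1: 100.0, 500.2: 90.0, 600.3: 80.0, 700.4: 70.0}
    identifiable = {400.1, 600.3, 700.4}
    print(iterative_exclusion(precursors, identifiable))
```

Note that the intense but unidentifiable ion is reselected in every run because it never enters the list, which is one reason each additional run in this toy model yields only a single new identification.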
It is not surprising that both the type of instrument and the profiling method can impact the depth of coverage achieved, not only for a membrane protein sample but for any sample, as each mass analyzer has unique properties, such as mass range, analysis speed, resolution, sensitivity, ion transmission, and dynamic range. The type of instrument chosen for the analysis of membrane proteins is often simply whatever instrument is on hand, but if choices are available, then the instrument should match the desired outcome. For example, if the goal is to identify as many proteins as possible in a complex membrane (or any) sample, then fast scanning, high sensitivity instruments such as ion traps are preferred. Alternatively, if the aim is to identify the post-translationally modified forms of intact proteins, then high resolution, high mass accuracy instruments such as Orbitraps and FT-ICR instruments are preferred (for a review, see Ref. 106). Hybrid mass spectrometers have been built that combine more than one mass analyzer to address specific needs during the analysis. For example, the Orbitrap, when coupled to an LTQ ion trap, has the advantages of both high resolution and mass accuracy of the Orbitrap and the speed and sensitivity of the LTQ. Furthermore, one can operate an LTQ-Orbitrap in a parallel fashion: the Orbitrap acquires MS full scans, whereas the LTQ carries out fragmentation reactions. This acquisition of high resolution MS data can improve the identification and quantification of low abundance peptides. This has important implications for membrane proteins because of the low number of potential tryptic sites as well as the hydrophobic nature of resulting peptides. As discussed above, if multispanning transmembrane proteins with short ecto- and endodomains (little proteins embedded in the membrane) are to be analyzed by mass spectrometry, then it is often the case that few if any peptides are available for detection. This highlights the need for high resolution instruments such as Orbitraps and FT-ICR instruments to have confidence in single peptide identifications.
While achieving comprehensive coverage of the protein content of a sample is critical to target or biomarker discovery efforts, emphasis should also be placed on proteins with the biggest differential expression patterns. This means that quantitative methods must be used for comparison. Most of these quantitative approaches can be applied to virtually any protein sample, with few if any approaches being developed specifically for the analysis of membrane proteins aside from the membrane protein capture methods described above, where the target proteins are isotopically labeled, e.g. IDAWG or SILAC approaches.
Most quantitative experiments utilize a special tagging system that is usually either chemical (ICAT and iTRAQ (isobaric tags for relative and absolute quantitation)) (107) or biosynthetic (cells grown in medium with one or more stable isotopes) (34). Differences in protein expression are quantified by the relative intensity of conjoint spectra, each with a tag unit difference (for a review, see Ref. 108). Label-free quantitative methods have also emerged (109–114) and are growing in popularity in part because they avoid the use of expensive labeling reagents; eliminate the extra analytical complexity from labeling, which requires additional MS/MS spectra interpretation; permit comparison of multiple data sets; and facilitate retrospective comparisons. These methods are based on using one MS output feature of abundance, such as spectral or peptide counts (111, 112, 115, 116), as a means of determining relative protein abundance of the same protein in several samples. Chromatographic peak intensity and peak area have also been shown to correlate with protein abundance (117–122) but require complex algorithms to integrate the area under the curve or total elution curve for each isotope pattern. A number of tools can be used to extract peptide ion intensities following identification (109, 123–125).
Regardless of whether label or label-free quantitative approaches are used to identify and quantify membrane proteins, current large scale MS approaches cannot identify all proteins in complex tissues, cells, or membrane preparations. Each cell contains approximately twenty thousand genes, which can give rise to the expression of hundreds of thousands of proteins due to alternative splice isoforms, post-translational modifications and other factors. Although subfractionation helps reduce this complexity for membrane proteomes, it is still not possible to identify every protein and its modification states in a complex membrane (or any) sample. As a result, we need to define when the MS analysis of a sample is complete using a specific technique. Currently there is no definition in the field of when a sample is sufficiently analyzed. Most journals require a minimum of two MS replicates per sample group when samples are compared. Yet no emphasis has been placed on the importance of analytical completeness. We previously discovered that each MS measurement of a shotgun proteomics analysis identifies only a subset of proteins and that any second and third MS measurement of the same sample will reveal 33% and 16% respectively of new proteins not detected in the previous analysis (6, 126). Just because a protein is not detected in one measurement does not mean it will not be detected in a second or subsequent measurement of the same sample. This means that multiple MS measurements should be performed to comprehensively define the full proteome to the degree possible with the technique used before meaningful comparisons can be made between sample groups; otherwise, false positives will prevail. This is critical for biomarker and target discovery as illustrated in Fig. 1.
Analytical completeness serves to instill statistical confidence that a sample has been analyzed to the capacity or limitations of the analytical technique used. Our unique laboratory standard is that multiple MS measurements of the same sample are analyzed until >95% analytical completeness is achieved. This means that after this level has been reached, subsequent repeated measurements of the same sample using the same analytical technique will not contribute a significant number of new protein identifications. However, doing so requires extensive resources, including time, instrumentation, and sample. We recently adapted a mathematical model to model the number of peptide identifications across various cycles of a typical multidimensional protein identification technology experiment (126). This model can be used to estimate the total number of proteins in the experimental sample and thus facilitate identification of a statistically determined end point of analytical completeness. We later applied this model to multiple mass spectrometry-based methods and found that, depending on the method used, 5–10 replicate MS measurements are needed to reach 95% analytical completeness (6, 47, 126–128).
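To make the idea of a statistically determined end point concrete, the Python sketch below tracks how many new proteins each replicate contributes and extrapolates the number still unseen. It is not the model of Ref. 126. It assumes, purely for illustration, that the number of new identifications shrinks by a constant ratio from one replicate to the next (a geometric decay), and it uses only the last two replicates to estimate that ratio. Function names and example data are hypothetical.

```python
def new_identifications(replicates):
    """Number of previously unseen proteins contributed by each replicate."""
    seen, counts = set(), []
    for identified in replicates:
        fresh = set(identified) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def estimated_completeness(new_counts):
    """Fraction of the estimated total observed so far (geometric-decay assumption).

    Returns None when the counts are not yet shrinking, in which case no
    saturation point can be extrapolated from them.
    """
    if not new_counts:
        return None
    if new_counts[-1] == 0:
        return 1.0
    if len(new_counts) < 2 or new_counts[-2] == 0:
        return None
    ratio = new_counts[-1] / new_counts[-2]
    if ratio >= 1:
        return None
    observed = sum(new_counts)
    remaining = new_counts[-1] * ratio / (1 - ratio)  # sum of the geometric tail
    return observed / (observed + remaining)


if __name__ == "__main__":
    replicates = [{"A", "B", "C", "D"}, {"A", "B", "E", "F"}, {"C", "E", "G"}]
    counts = new_identifications(replicates)
    print(counts, estimated_completeness(counts))
```

Under this toy model, measurements would continue until the returned fraction exceeds 0.95.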
Although achieving analytical completeness through multiple replicate measurements is the end goal before any meaningful comparisons can be made between samples, it must be noted that such replicate data contain inherent biases and variations as MS signals are frequently corrupted by systematic or even apparently random changes. Peptide detection and thus protein identification are dependent on a number of parameters intrinsic to the MS method itself, including peptide ionization efficiency, which is clearly an issue with the mainly hydrophobic nature of peptides derived from membrane proteins. Thus, replicate samples will usually show variation in MS abundance signals that is likely not a reflection of biological change (127). This highlights the need to normalize measurements to minimize inherent experimental bias and variability so that real changes in protein expression or abundance between distinct samples can be reliably determined. Such statistical quantification becomes critical when multiple replicate samples are analyzed using single or multiple high throughput, shotgun methods, which result in large volumes of data.
To overcome this hurdle, we recently set out to develop and test various methods to quantify, normalize, and compare complex label-free proteomics data, the majority of which were derived from the MS analysis of membrane proteins (127). To quantify protein levels, we compared the effectiveness of using features intrinsic to typical MS measurements, such as spectral and peptide counts, and also tested the utility of fragment ion intensities. We subsequently combined these features into a single measurement called the spectral index (SI) and found that this method is superior to any single MS feature used in isolation. We found that each of these raw features was not reproducible across replicate experiments, meaning that normalization methods are required to allow meaningful comparisons between the replicates. We concurrently developed and tested various methods to normalize these features to control for measurement biases and variations. We found that the new normalized spectral index (SIN) largely eliminated variances between replicate MS measurements and did so better than all other methods tested (127). In addition, SIN could accurately determine the correct amount of each protein standard in a mixture better than all other methods tested, thus allowing the quantitative comparison of biologically distinct data sets with high confidence and relative ease. Importantly, SIN could be used to determine enrichment ratios of PM proteins compared with the rest of the cell, and such ratios showed excellent correlation with Western blot ratios. Such a normalization approach is crucial when multiple replicate samples are analyzed to achieve comprehensive protein coverage of a sample before true biomarkers or targets can be confidently discovered; otherwise, false positives will thrive.
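The text above does not give formulas for SI or SIN, so the following Python sketch rests on an assumed formulation that should be checked against Ref. 127: SI for a protein is taken as the sum of fragment ion intensities over every MS/MS spectrum of every peptide assigned to it (which folds in peptide count, spectral count, and intensity), and SIN divides SI by the summed SI of all proteins in the run and by protein length. The data structure and names are hypothetical.

```python
def spectral_index(peptide_spectra):
    """Assumed SI: total fragment ion intensity over all spectra of all peptides.

    `peptide_spectra` maps each peptide to a list of spectra, where each
    spectrum is a list of matched fragment ion intensities.
    """
    return sum(
        sum(spectrum)
        for spectra in peptide_spectra.values()
        for spectrum in spectra
    )


def normalized_spectral_index(run, lengths):
    """Assumed SIN: SI divided by the run-wide SI total and by protein length.

    `run` maps protein -> peptide_spectra; `lengths` maps protein -> number
    of residues.
    """
    si = {protein: spectral_index(peptides) for protein, peptides in run.items()}
    total = sum(si.values())
    return {
        protein: value / total / lengths[protein]
        for protein, value in si.items()
    }


if __name__ == "__main__":
    run = {
        "P1": {"PEPA": [[10.0, 20.0]], "PEPB": [[30.0]]},
        "P2": {"PEPC": [[40.0]]},
    }
    print(normalized_spectral_index(run, {"P1": 100, "P2": 200}))
```

Dividing by the run-wide total makes values comparable across replicates that differ in overall signal, while dividing by length keeps long proteins from appearing more abundant simply because they yield more peptides.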
A downside to shotgun proteomics is that many peptides observed in some samples are not observed in others, resulting in widespread missing values. This can occur with any sample type and is not limited to membrane proteins. Several approaches have been developed to try to address the issue of “missing peptides,” which can hamper protein quantification across replicates and between distinct samples. With these approaches, analysis can be carried out either on the complete data set, excluding the missing values (129, 130), or based on imputed data using standard imputation routines developed for microarray data like k-nearest neighbors (131) or the least squares principle (132) for filling in the missing values. A number of commercial software packages also try to take into account missing values for protein quantification. For example, the ProQuant software package allows the use of unfiltered, subthreshold peptide identifications for quantitative analysis if, say, a tandem MS spectrum is present that correlates with the peptide of interest but is below the user-defined XCorr threshold for identification; this XCorr value is used to generate a score for the peptide. The unfiltered data are only used for imputing missing values and not used for peptide identification. In other software packages, including ProteinPilot (Applied Biosystems), DeCyder MS (GE Healthcare), and Proteome Discoverer (Thermo Fisher), the user can select the number of replicates where a peptide has to appear for it to be considered significant and can manually decide whether to include a particular peptide in the quantitation results. In an alternative approach for dealing with missing values in intensity-based protein quantification, the intensities for the peptides of a protein are averaged after suitable scaling or normalizing of the sibling peptides. For example, the DAnTE software constructs a median protein abundance profile across samples and scales all other proteins to this profile (133).
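The k-nearest neighbors idea mentioned above can be sketched in plain Python. This is a minimal illustration, not the routine of Ref. 131: rows are peptides, columns are samples, missing values are `None`, distance is the root mean square difference over the columns two rows share, and the default `k=2` is an arbitrary choice. The example matrix is made up.

```python
import math


def row_distance(a, b):
    """Root mean square difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(matrix, k=2):
    """Fill each missing value with the mean of that column in the k most
    similar rows that have the value; leave it as None if no donor exists."""
    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (row_distance(row, other), other[j])
                for n, other in enumerate(matrix)
                if n != i and other[j] is not None
            )
            nearest = [v for d, v in donors if d != math.inf][:k]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


if __name__ == "__main__":
    intensities = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 3.2],
        [9.0, 9.0, 9.0],
    ]
    for row in knn_impute(intensities):
        print(row)
```

Distances are computed on the original matrix, so one imputed value never influences another.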
The advantages to be gained by performing comprehensive proteomics analysis on plasma membranes as they exist in vivo become readily apparent, especially as it offers the potential to uncover vast numbers of new targets that otherwise may not be discovered by traditional biochemical and molecular methods. However, accessing cells and their plasma membranes in vivo remains a challenge. While blood cells can be readily isolated and analyzed, solid tissue components are more difficult. We and others have shown that isolated luminal endothelial cells offer a significant reduction in complexity over analyzing the entire tissue and afford the potential to define the set of proteins in direct contact with the circulating blood that are therefore inherently more accessible to biological agents injected intravenously. Such an approach should be applied to other luminal surfaces including the gut and lung epithelium (6, 8, 134).
Although there are key challenges associated with the identification and characterization of PM proteins, over the past 5 years, we as a field have made significant progress in overcoming some of the main obstacles associated with their proteomics analysis. These include making significant improvements to PM isolation, solubilization, and separation methodologies, meaning that more IMPs, which were previously lost during the sample preparation process for two-dimensional LC, can be detected by newer methodologies (47). A clear need has also been shown to achieve 95% analytical completeness for each sample before reliable comparisons between samples can be made (6, 47, 126–128). We have created a statistical way to define analytical completeness. Without such statistical rigor, proteins initially classed as being differentially expressed may in fact be false positives and end up being detected in the corresponding sample after an additional replicate measurement. In addition, we have developed a method (SIN) to control for variation between replicate samples so that real changes in protein abundance between biologically distinct samples can be detected (127). This method allows the quantitative comparison of biologically distinct samples and should therefore greatly facilitate the use of label-free quantitative proteomics approaches for differential protein expression analysis both in PM and other proteomes. Many of the key challenges that were apparent at the beginning of this enterprise have now been met.
Elucidating the molecular topography of the cell surface in multiple tissues constitutes the first step in gaining a better fundamental understanding of functional differences across organ systems. However, such comprehensive profiling of membranes is no trivial matter, and although there have been many hurdles along the way, great progress has been achieved over the past few years in addressing key technological challenges associated with the use of mass spectrometry for cell surface mapping. Although we are making significant progress with characterizing ECs in vivo, such an approach needs to be applied to all in vivo cells as it is almost impossible to extrapolate in vitro findings into an in vivo situation. This is one challenge on which we need to focus because, for biomarker and target discovery to move forward, defining the in vivo reality is critical.
* This work was supported, in whole or in part, by National Institutes of Health Grants RO1HL074063, R33CA118602, P01CA164898, and RO1CA119378 (to J. E. S.) and The Regents of the University of California, Tobacco-Related Disease Research Program, Grant Number 18XT-0196. The opinions, findings, and conclusions herein are those of the author(s) and do not necessarily represent those of The Regents of the University of California, or any of its programs.
1 The abbreviations used are: