Reverse phase protein arrays (RPPA) have been demonstrated to be a useful experimental platform for quantitative protein profiling in a high-throughput format. Target protein detection relies on the readout obtained from a single detection antibody. For this reason, antibody specificity is a key factor for RPPA. RNAi allows the specific knockdown of a target protein in complex samples and was therefore examined for its utility to assess antibody performance for RPPA applications.
To prove the feasibility of our strategy, two different anti-EGFR antibodies were compared by RPPA. Both detected the knockdown of EGFR, but to different extents. Western blot data were used to identify the most reliable antibody. The RNAi approach was also used to characterize commercial anti-STAT3 antibodies. Out of ten tested anti-STAT3 antibodies, four antibodies detected the STAT3-knockdown at 80-85%, and the most sensitive anti-STAT3 antibody was identified by comparing detection limits. Thus, the use of RNAi for RPPA antibody validation was demonstrated to be a stringent approach to identify highly specific and highly sensitive antibodies. Furthermore, the RNAi/RPPA strategy is also useful for the validation of isoform-specific antibodies as shown for the identification of AKT1/AKT2 and CCND1/CCND3-specific antibodies.
RNAi is a valuable tool for the identification of very specific and highly sensitive antibodies, and is therefore especially useful for the validation of RPPA-suitable detection antibodies. On the other hand, when a set of well-characterized RPPA-antibodies is available, large-scale RNAi experiments analyzed by RPPA might deliver useful information for network reconstruction.
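The knockdown percentages quoted above (for example, 80-85% for STAT3) can be read as the relative loss of antibody signal between a control lysate and the RNAi-treated lysate. The following is a minimal sketch of that arithmetic, not the authors' analysis pipeline; the function name, the optional background term, and the signal values are all hypothetical.

```python
def knockdown_percent(control_signal, knockdown_signal, background=0.0):
    """Percent signal reduction of a knockdown sample relative to its control.

    All inputs are raw spot intensities on the same array; `background`
    is an optional blank-spot intensity subtracted from both (assumption).
    """
    control = control_signal - background
    knockdown = knockdown_signal - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - knockdown / control)


# Hypothetical intensities: (control lysate, siRNA-treated lysate)
signals = {
    "antibody_A": (1200.0, 210.0),
    "antibody_B": (900.0, 540.0),
}

for name, (ctrl, kd) in signals.items():
    print(name, round(knockdown_percent(ctrl, kd), 1))
```

Under this reading, an antibody that reports a much smaller signal drop than the knockdown seen by Western blot is likely picking up off-target signal.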
Reverse phase protein arrays (RPPA) emerged as a useful experimental platform to analyze biological samples in a high-throughput format. Different signal detection methods have been described to generate a quantitative readout on RPPA including the use of fluorescently labeled antibodies. Increasing the sensitivity of RPPA approaches is important since many signaling proteins or posttranslational modifications are present at a low level.
A new antibody-mediated signal amplification (AMSA) strategy relying on sequential incubation steps with fluorescently-labeled secondary antibodies reactive against each other is introduced here. The signal quantification is performed in the near-infrared range. The RPPA-based analysis of 14 endogenous proteins in seven different cell lines demonstrated a strong correlation (r = 0.89) between AMSA and standard NIR detection. Probing serial dilutions of human cancer cell lines with different primary antibodies demonstrated that the new amplification approach improved the limit of detection especially for low abundant target proteins.
Antibody-mediated signal amplification is a convenient and cost-effective approach for the robust and specific quantification of low-abundance proteins on RPPAs. In contrast to other amplification approaches, it allows target protein detection over a large linear range.
The reverse phase protein array (RPPA) data platform provides expression data for a prespecified set of proteins, across a set of tissue or cell line samples. Being able to measure either total proteins or posttranslationally modified proteins, even ones present at lower abundances, RPPA represents an excellent way to capture the state of key signal transduction pathways in normal or diseased cells. RPPA data can be combined with those of other molecular profiling platforms, in order to obtain a more complete molecular picture of the cell. This review offers perspective on the use of RPPA as a component of integrative molecular analysis, using recent case examples from The Cancer Genome Atlas consortium, showing how RPPA may provide additional insight into cancer besides what other data platforms may provide. There also exists a clear need for effective visualization approaches to RPPA-based proteomic results; this was highlighted by the recent challenge, put forth by the HPN-DREAM consortium, to develop visualization methods for a highly complex RPPA dataset involving many cancer cell lines, stimuli, and inhibitors applied over a time course. In this review, we put forth a number of general guidelines for effective visualization of complex molecular datasets, namely, showing the data, ordering data elements deliberately, enabling generalization, focusing on relevant specifics, and putting things into context. We give examples of how these principles can be utilized in visualizing the intrinsic subtypes of breast cancer and in meaningfully displaying the entire HPN-DREAM RPPA dataset within a single page.
RPPA; proteomics; molecular profiling; integrative analysis; breast cancer; TCGA
The current study analyzed reverse phase protein arrays (RPPA) as a means to experimentally validate biomarkers in blood samples. One-µl samples of sera (n=71) and plasma (n=78) were serially diluted and printed on nitrocellulose-coated slides. CA19-9 levels from RPPA results were compared with those of the same patient samples as measured by ELISA. There was a strong correlation between RPPA and ELISA (r=0.87) as determined by scatter plots. Sample reproducibility of CA19-9 levels was excellent (interslide correlation r=0.88; intraslide correlation r=0.83). The ability of RPPA to accurately distinguish CA19-9 levels between cancer and non-cancer samples was determined using receiver operating characteristic curves and compared with ELISA. The AUCs for RPPA and ELISA were comparable (0.87 and 0.86, respectively). When the mean CA19-9 level of normal samples was used as a cutoff for RPPA and compared with the standard clinical ELISA cutoff, comparable specificities (71% for both) were observed. Notably, RPPA samples normalized to albumin showed increased sensitivity compared to ELISA (90% vs 75%). As RPPA is a high throughput method that shows results comparable to those of ELISA, we propose that RPPA is a viable technique for rapid experimental screening and validation of candidate biomarkers in blood samples.
biomarker; CA19-9; pancreatic cancer; reverse phase protein array
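The evaluation described above rests on two standard quantities: sensitivity and specificity at a fixed cutoff (here, the mean level of the normal samples), and the area under the ROC curve. The sketch below shows one common way to compute them, assuming that higher values indicate cancer and that a sample is called positive when it lies strictly above the cutoff. The marker values are invented for illustration and are not taken from the study.

```python
from statistics import mean


def sensitivity_specificity(cancer, non_cancer, cutoff):
    """Return (sensitivity, specificity), calling a sample positive when value > cutoff."""
    true_pos = sum(v > cutoff for v in cancer)
    true_neg = sum(v <= cutoff for v in non_cancer)
    return true_pos / len(cancer), true_neg / len(non_cancer)


def auc(cancer, non_cancer):
    """AUC as the probability that a random cancer value exceeds a random
    non-cancer value, counting ties as one half (rank-based definition)."""
    wins = 0.0
    for c in cancer:
        for n in non_cancer:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer) * len(non_cancer))


# Hypothetical normalized marker levels
cancer_levels = [5.0, 8.0, 3.0, 9.0]
normal_levels = [1.0, 2.0, 4.0, 1.0]

cutoff = mean(normal_levels)  # mean of normal samples used as the cutoff
sens, spec = sensitivity_specificity(cancer_levels, normal_levels, cutoff)
print(cutoff, sens, spec, auc(cancer_levels, normal_levels))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable at the sample sizes reported here (tens of sera and plasma samples).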
Reverse phase protein arrays (RPPAs) are a powerful high-throughput tool for measuring protein concentrations in a large number of samples. In RPPA technology, the original samples are often diluted successively multiple times, forming dilution series to extend the dynamic range of the measurements and to increase confidence in quantitation. An RPPA experiment is equivalent to running multiple ELISA assays concurrently except that there is usually no known protein concentration from which one can construct a standard response curve. Here, we describe a new method called ‘serial dilution curve for RPPA data analysis’. Compared with the existing methods, the new method has the advantage of using fewer parameters and offering a simple way of visualizing the raw data. We showed how the method can be used to examine data quality and to obtain robust quantification of protein concentrations.
Availability: A computer program in R for using serial dilution curve for RPPA data analysis is freely available at http://odin.mdacc.tmc.edu/~zhangli/RPPA.
Reverse-phase protein arrays (RPPAs) have become an important tool for the sensitive and high-throughput detection of proteins from minute amounts of lysates from cell lines and cryopreserved tissue. The current standard method for tissue preservation in almost all hospitals worldwide is formalin fixation and paraffin embedding, and it would be highly desirable if RPPA could also be applied to formalin-fixed and paraffin embedded (FFPE) tissue. We investigated whether the analysis of FFPE tissue lysates with RPPA would result in biologically meaningful data in two independent studies. In the first study on breast cancer samples, we assessed whether a human epidermal growth factor receptor (HER) 2 score based on immunohistochemistry (IHC) could be reproduced with RPPA. The results showed very good concordance between the IHC and RPPA classifications of HER2 expression. In the second study, we profiled FFPE tumor specimens from patients with adenocarcinoma and squamous cell carcinoma in order to find new markers for differentiating these two subtypes of non-small cell lung cancer. p21-activated kinase 2 could be identified as a new differentiation marker for squamous cell carcinoma. Overall, the results demonstrate the technical feasibility and the merits of RPPA for protein expression profiling in FFPE tissue lysates.
The ability to predict the developmental and implantation ability of embryos remains a major goal in human assisted-reproductive technology (ART) and most ART laboratories use morphological criteria to evaluate the oocyte competence despite the poor predictive value of this analysis. Transcriptomic and proteomic approaches on somatic cells surrounding the oocyte (granulosa cells, cumulus cells [CCs]) have been proposed for the identification of biomarkers of oocyte competence. We propose to use a Reverse Phase Protein Array (RPPA) approach to investigate new potential biomarkers of oocyte competence in human CCs at the protein level, an approach that is already used in cancer research to identify biomarkers in clinical diagnostics.
Antibodies targeting proteins of interest were validated for their utilisation in RPPA by measuring siRNA-mediated knockdown efficiency in HEK293 cells in parallel with Western blotting (WB) and RPPA from the same lysates. The proteins of interests were measured by RPPA across 13 individual human CCs from four patients undergoing intracytoplasmic sperm injection procedure.
The knockdown efficiencies of VCL, RGS2 and SRC were measured in HEK293 cells by WB and by RPPA and were acceptable for the VCL and SRC proteins. The antibodies targeting these proteins were used for their detection in human CCs by RPPA. The detection of the proteins VCL, SRC and ERK2 (by using an antibody already validated for RPPA) was then carried out on individual CCs and signals were detected for each individual sample. After normalisation by VCL, we showed that the level of expression of ERK2 was almost the same across the 13 individual CCs while the level of expression of SRC was different between the 13 individual CCs of the four patients and between the CCs from one individual patient.
The exquisite sensitivity of RPPA allowed detection of specific proteins in individual CCs. Although the validation of antibodies for RPPA is labour intensive, RPPA is a sensitive and quantitative technique allowing the detection of specific proteins from very small quantities of biological samples. RPPA may be of great interest in clinical diagnostics to predict the oocyte competence prior to transfer of the embryo using robust protein biomarkers expressed by CCs.
Biomarkers; Cumulus cells; Oocyte developmental competence; Reverse phase protein array
Reverse phase protein array (RPPA) is a powerful dot-blot technology that allows studying protein expression levels as well as post-translational modifications in a large number of samples simultaneously. Yet, correct interpretation of RPPA data has remained a major challenge for its broad-scale application and its translation into clinical research. Satisfying quantification tools are available to assess a relative protein expression level from a serial dilution curve. However, appropriate tools allowing the normalization of the data for external sources of variation are currently missing.
Here we propose a new method, called NormaCurve, that allows simultaneous quantification and normalization of RPPA data. For this, we modified the quantification method SuperCurve in order to include normalization for (i) background fluorescence, (ii) variation in the total amount of spotted protein and (iii) spatial bias on the arrays. Using a spike-in design with a purified protein, we test the capacity of different models to properly estimate normalized relative expression levels. The best performing model, NormaCurve, takes into account a negative control array without primary antibody, an array stained with a total protein stain and spatial covariates. We show that this normalization is reproducible and we discuss the number of serial dilutions and the number of replicates that are required to obtain robust data. We thus provide a ready-to-use method for reliable and reproducible normalization of RPPA data, which should facilitate the interpretation and the development of this promising technology.
The raw data, the scripts and the NormaCurve package are available at the following web site: http://microarrays.curie.fr.
The goal of personalized medicine is to provide patients optimal drug screening and treatment based on individual genomic or proteomic profiles. Reverse-Phase Protein Array (RPPA) technology offers proteomic information of cancer patients which may be directly related to drug sensitivity. For cancer patients with different drug sensitivity, the proteomic profiling reveals important pathophysiologic information which can be used to predict chemotherapy responses.
The goal of this paper is to present a framework for personalized medicine using both RPPA and drug sensitivity (drug resistance or intolerance). In the proposed personalized medicine system, the prediction of drug sensitivity is obtained by a proposed augmented naive Bayesian classifier (ANBC) whose edges between attributes are augmented in the network structure of naive Bayesian classifier. For discriminative structure learning of ANBC, local classification rate (LCR) is used to score augmented edges, and greedy search algorithm is used to find the discriminative structure that maximizes classification rate (CR). Once a classifier is trained by RPPA and drug sensitivity using cancer patient samples, the classifier is able to predict the drug sensitivity given RPPA information from a patient.
In this paper we proposed a framework for personalized medicine where a patient is profiled by RPPA and drug sensitivity is predicted by ANBC and LCR. Experimental results with lung cancer data demonstrate that RPPA can be used to profile patients for drug sensitivity prediction by a Bayesian network classifier, and that the proposed ANBC for personalized cancer medicine achieves, on average, better prediction accuracy than the naive Bayes classifier on small-sample data and outperforms other state-of-the-art classifiers in terms of classification accuracy.
Aberrations in oncogenes and tumor suppressors frequently affect the activity of critical signal transduction pathways. To analyze systematically the relationship between the activation status of protein networks and other characteristics of cancer cells, we performed reverse phase protein array (RPPA) profiling of the NCI60 cell lines for total protein expression and activation-specific markers of critical signaling pathways. To extend the scope of the study, we merged those data with previously published RPPA results for the NCI60. Integrative analysis of the expanded RPPA data set revealed 5 major clusters of cell lines and 5 principal proteomic signatures. Comparison of mutations in the NCI60 cell lines with patterns of protein expression demonstrated significant associations for PTEN, PIK3CA, BRAF and APC mutations with proteomic clusters. PIK3CA and PTEN mutation enrichment were not cell lineage-specific but were associated with dominant yet distinct groups of proteins. The five RPPA-defined clusters were strongly associated with sensitivity to standard anti-cancer agents. RPPA analysis identified 27 protein features significantly associated with sensitivity to paclitaxel. The functional status of those proteins was interrogated in a paclitaxel whole genome siRNA library synthetic lethality screen, and confirmed the predicted associations with drug sensitivity. These studies expand our understanding of the activation status of protein networks in the NCI60 cancer cell lines, demonstrate the importance of the direct study of protein expression and activation, and provide a basis for further studies integrating the information with other molecular and pharmacological characteristics of cancer.
NCI60; reverse phase protein arrays; signal transduction
Reverse phase protein arrays (RPPA) are an efficient, high-throughput, cost-effective method for the quantification of specific proteins in complex biological samples. The quality of RPPA data may be affected by various sources of error. One of these, spatial variation, is caused by uneven exposure of different parts of an RPPA slide to the reagents used in protein detection. We present a method for the determination and correction of systematic spatial variation in RPPA slides using positive control spots printed on each slide. The method uses a simple bi-linear interpolation technique to obtain a surface representing the spatial variation occurring across the dimensions of a slide. This surface is used to calculate correction factors that can normalize the relative protein concentrations of the samples on each slide. The adoption of the method results in increased agreement between technical and biological replicates of various tumor and cell-line derived samples. Further, in data from a study of the melanoma cell-line SKMEL-133, several slides that had previously been rejected because they had a coefficient of variation (CV) greater than 15%, are rescued by reduction of CV below this threshold in each case. The method is implemented in the R statistical programming language. It is compatible with MicroVigene and SuperCurve, packages commonly used in RPPA data analysis. The method is made available, along with suggestions for implementation, at http://bitbucket.org/rppa_preprocess/rppa_preprocess/src.
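To illustrate the idea of a bilinear correction surface, the sketch below interpolates positive-control intensities, assumed to sit on a regular grid of slide positions, at an arbitrary sample spot and turns the result into a multiplicative correction factor. This is an illustration in Python rather than the published R implementation; the grid layout, the choice of the control mean as the reference level, and all names and numbers are assumptions.

```python
from bisect import bisect_right
from statistics import mean


def bilinear(rows, cols, grid, r, c):
    """Bilinearly interpolate control intensities at slide position (r, c).

    rows, cols: sorted coordinates of the control spots (at least two each).
    grid[i][j]: control intensity at (rows[i], cols[j]).
    Positions outside the control grid are linearly extrapolated.
    """
    # Locate the grid cell whose corners bracket (r, c), clamped to valid cells.
    i = min(max(bisect_right(rows, r) - 1, 0), len(rows) - 2)
    j = min(max(bisect_right(cols, c) - 1, 0), len(cols) - 2)
    tr = (r - rows[i]) / (rows[i + 1] - rows[i])
    tc = (c - cols[j]) / (cols[j + 1] - cols[j])
    # Interpolate along columns on both bracketing rows, then between the rows.
    upper = grid[i][j] * (1 - tc) + grid[i][j + 1] * tc
    lower = grid[i + 1][j] * (1 - tc) + grid[i + 1][j + 1] * tc
    return upper * (1 - tr) + lower * tr


def correction_factor(rows, cols, grid, r, c):
    """Factor that rescales a spot at (r, c) to the mean control level (assumed reference)."""
    reference = mean(v for row in grid for v in row)
    return reference / bilinear(rows, cols, grid, r, c)


# Hypothetical control layout: four control spots at the corners of the slide
rows = [0, 10]
cols = [0, 20]
grid = [[100.0, 120.0],
        [80.0, 100.0]]

raw_signal = 500.0  # hypothetical sample spot at row 10, column 0
print(raw_signal * correction_factor(rows, cols, grid, 10, 0))
```

A spot in a region where the controls read low is scaled up, and one in a region where they read high is scaled down, which is the behaviour the abstract attributes to its correction factors.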
The human epidermal growth factor receptor-2 (HER-2) expression level is a critical element for determining the prognosis and management of breast cancer. HER-2 targeted therapy in breast cancer depends on the reliable assessment of HER-2 expression status, but current standard methods lack a rigorous quantitative assay. To address this challenge, we developed a method for assessing HER-2 expression by well-based reverse phase protein array (RPPA).
Well-based RPPA is based on a robust protein isolation methodology paired with a novel electrochemiluminescence detection system. The HER-2 value from well-based RPPA correlated significantly with dot blotting results (R² = 0.939). By well-based RPPA, we successfully detected HER-2 expression in 76 human breast formalin-fixed paraffin-embedded tissue samples. We observed 93.4% (71/76) concordance between well-based RPPA and the current HER-2 immunohistochemical assessment guideline. When the cutoff level of the HER-2 value was set to 0.689 (HER-2/GAPDH) on the basis of the receiver-operating characteristic curve, the area under the curve was 0.975 (95% CI, 0.941-1.000). The sensitivity and specificity of well-based RPPA were 92.1% and 94.7%, respectively.
HER-2 value by well-based RPPA was correlated with the current HER-2 status guideline, suggesting that this normalized HER-2 assessment may offer advantages over unnormalized current immunohistochemical assessment methods.
Breast cancer; Formalin-fixed paraffin-embedded; Human epidermal growth factor receptor- 2; Immunohistochemistry; Reverse-phase protein array
The aim of this study was to use the reverse-phase protein array to predict patients who are at low risk of developing bone metastasis from breast cancer. The model showed novel predictive potential. Patients with low assessed risk are unlikely to benefit from receiving a bisphosphonate in the adjuvant setting. Further clinical trials excluding these patients will clarify the benefit of bisphosphonates.
A biomarker that predicts bone metastasis based on a protein laboratory assay has not been demonstrated. Reverse-phase protein array (RPPA) enables quantification of total and phosphorylated proteins, providing information about their functional status. The aim of this study was to identify bone-metastasis-related markers in patients with primary breast cancer using RPPA analysis.
Patients and Methods.
Tumor samples were obtained from 169 patients with primary invasive breast carcinoma who underwent surgery. The patients were categorized by whether they developed breast cancer bone metastasis (BCBM) during follow-up. Clinical characteristics and protein expression by RPPA were compared and verified by leave-one-out cross-validation.
Lymph node status (p = .023) and expression level of 22 proteins by RPPA were significantly correlated with BCBM in logistic regression analysis. These variables were used to build a logistic regression model. After filtering the variables through a stepwise algorithm, the final model, consisting of 8 proteins and lymph node status, had sensitivity of 30.0%, specificity of 90.5%, positive predictive value of 30.0%, and negative predictive value of 90.5% in the cross-validation. Most of the identified proteins were associated with cell cycle or signal transduction (CDK2, CDKN1A, Rb1, Src, phosphorylated-ribosomal S6 kinase, HER2, BCL11A, and MYH11).
Our validated model, in which the primary tumor is tested with RPPA, can predict patients who are at low risk of developing BCBM and thus who likely would not benefit from receiving a bisphosphonate in the adjuvant setting. Clinical trials excluding these patients have the potential to clarify the benefit of bisphosphonates in the adjuvant setting.
Breast neoplasm; Bone metastasis; Reverse-phase protein array; Prediction model; Phosphorylated protein
Motivation: Reverse phase protein arrays (RPPA) measure the relative expression levels of a protein in many samples simultaneously. A set of identically spotted arrays can be used to measure the levels of more than one protein. Protein expression within each sample on an array is estimated by borrowing strength across all the samples, but using only within array information. When comparing across slides, it is essential to account for sample loading, the total amount of protein printed per sample. Currently, total protein is estimated using either a housekeeping protein or the sample median across all slides. When the variability in sample loading is large, these methods are suboptimal because they do not account for the fact that the protein expression for each slide is estimated separately.
Results: We propose a new normalization method for RPPA data, called variable slope (VS) normalization, that takes into account that quantification of RPPA slides is performed separately. This method is better able to remove loading bias and recover true correlation structures between proteins.
Availability: Code to implement the method in the statistical package R and anonymized data are available at http://bioinformatics.mdanderson.org/supplements.html.
Supplementary data are available at Bioinformatics online.
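For orientation, the baseline that the abstract above calls suboptimal, estimating sample loading from the sample median across all slides, can be sketched as follows. This is not the variable slope (VS) method itself, whose details are not given here. The sketch assumes log2-scale expression estimates, one per sample per slide, and it first centers each slide on its median, which is an added assumption; the data are invented.

```python
from statistics import median


def median_loading_correction(log_expr):
    """Subtract a per-sample loading estimate from log2 expression values.

    log_expr: dict mapping sample name -> list of log2 estimates, one per
    slide (protein), in the same slide order for every sample.
    """
    samples = list(log_expr)
    n_slides = len(log_expr[samples[0]])
    # Center each slide so that slides with different overall levels are comparable.
    slide_medians = [median(log_expr[s][k] for s in samples) for k in range(n_slides)]
    corrected = {}
    for s in samples:
        centered = [log_expr[s][k] - slide_medians[k] for k in range(n_slides)]
        loading = median(centered)  # sample median across all slides
        corrected[s] = [v - loading for v in centered]
    return corrected


# Hypothetical data: the three samples differ only by a constant loading offset
log_expr = {
    "sample_1": [1.0, 2.0, 3.0],
    "sample_2": [2.0, 3.0, 4.0],
    "sample_3": [3.0, 4.0, 5.0],
}
corrected = median_loading_correction(log_expr)
print(corrected)
```

In this toy case the correction removes the loading offset completely. The abstract's point is that such a single additive shift per sample does not account for each slide being quantified separately, which matters when the variability in loading is large.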
Loading control (LC) and variance stabilization of reverse-phase protein array (RPPA) data have been challenging mainly due to the small number of proteins in an experiment and the lack of reliable inherent control markers. In this study, we compare eight different normalization methods for LC and variance stabilization. The invariant marker set concept was first applied to the normalization of high-throughput gene expression data. A set of “invariant” markers is selected to create a virtual reference sample. Then all the samples are normalized to the virtual reference. We propose a variant of this method in the context of RPPA data normalization and compare it with seven other normalization methods previously reported in the literature. The invariant marker set method performs well with respect to LC, variance stabilization and association with the immunohistochemistry/fluorescence in situ hybridization data for three key markers in breast tumor samples, while the other methods have inferior performance. The proposed method is a promising approach for improving the quality of RPPA data.
reverse-phase protein array; RPPA; normalization; proteomics
The lack of large panels of validated antibodies, tissue handling variability, and intratumoral heterogeneity potentially hamper comprehensive study of the functional proteome in non-microdissected solid tumors. The purpose of this study was to address these concerns and to demonstrate clinical utility for the functional analysis of proteins in non-microdissected breast tumors using reverse phase protein arrays (RPPA).
Herein, 82 antibodies that recognize kinase and steroid signaling proteins and effectors were validated for RPPA. Intraslide and interslide coefficients of variability were <15%. Multiple sites in non-microdissected breast tumors were analyzed using RPPA after intervals of up to 24 h on the benchtop at room temperature following surgical resection.
Twenty-one of 82 total and phosphoproteins demonstrated time-dependent instability at room temperature with most variability occurring at later time points between 6 and 24 h. However, the 82-protein functional proteomic “fingerprint” was robust in most tumors even when maintained at room temperature for 24 h before freezing. In repeat samples from each tumor, intratumoral protein levels were markedly less variable than intertumoral levels. Indeed, an independent analysis of prognostic biomarkers in tissue from multiple tumor sites accurately and reproducibly predicted patient outcomes. Significant correlations were observed between RPPA and immunohistochemistry. However, RPPA demonstrated a superior dynamic range. Classification of 128 breast cancers using RPPA identified six subgroups with markedly different patient outcomes that demonstrated a significant correlation with breast cancer subtypes identified by transcriptional profiling.
Thus, the robustness of RPPA and stability of the functional proteomic “fingerprint” facilitate the study of the functional proteome in non-microdissected breast tumors.
Functional proteome; RPPA; Breast cancer; Kinase signaling; Steroid signaling
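The intraslide and interslide reproducibility figures above are coefficients of variability, that is, the standard deviation of replicate measurements expressed as a percentage of their mean. A minimal sketch follows, assuming the sample standard deviation is used and taking 15% as the acceptance threshold from the text; the replicate intensities are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of replicates is zero; CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical replicate spot intensities for one antibody on one slide
replicates = [100.0, 110.0, 90.0]

cv = cv_percent(replicates)
print(cv, "accept" if cv < 15.0 else "reject")
```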
Vascular endothelial growth factor (VEGF) is a critical pro-angiogenic factor, found in a number of cancers, and a target of therapy. It is typically assessed by immunohistochemistry (IHC) in clinical research. However, IHC is not a quantitative assay and is rarely reproducible. We compared VEGF levels in colon cancer by IHC and a quantitative immunoassay on proteins isolated from formalin fixed, paraffin embedded tissues.
VEGF expression was studied by means of a well-based reverse phase protein array (RPPA) and immunohistochemistry in 69 colon cancer cases, and compared with various clinicopathologic factors. Protein lysates derived from formalin fixed, paraffin embedded tissue contained measurable immunoreactive VEGF molecules. The VEGF expression level of well differentiated colon cancer was significantly higher than that of moderately and poorly differentiated carcinomas by immunohistochemistry (P = 0.04) and well-based RPPA (P = 0.04). VEGF quantification by well-based RPPA also demonstrated an association with nodal metastasis status (P = 0.05). In addition, the normalized VEGF value by well-based RPPA correlated with IHC (r = 0.283, P = 0.018). Furthermore, subgroup analysis by histologic type revealed that adenocarcinoma cases showed significant correlation (r = 0.315, P = 0.031) between well-based RPPA and IHC.
The well-based RPPA method is a high-throughput, sensitive approach and an excellent tool for quantification of marker proteins. Notably, this method may be helpful for more objective evaluation of protein expression in cancer patients.
Vascular endothelial growth factor; Formalin-fixed paraffin-embedded; Colon cancer; Immunohistochemistry; Reverse-phase protein array
Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax.
We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights into the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions. Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities from the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature's Scientific Data online publication.
Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
Motivation: Reverse-phase protein arrays (RPPAs) allow sensitive quantification of relative protein abundance in thousands of samples in parallel. Typical challenges involved in this technology are antibody selection, sample preparation and optimization of staining conditions. The issue of combining effective sample management and data analysis, however, has been widely neglected.
Results: This motivated us to develop MIRACLE, a comprehensive and user-friendly web application bridging the gap between spotting and array analysis by conveniently keeping track of sample information. Data processing includes correction of staining bias, estimation of protein concentration from response curves, normalization for total protein amount per sample and statistical evaluation. Established analysis methods have been integrated with MIRACLE, offering experimental scientists an end-to-end solution for sample management and for carrying out data analysis. In addition, experienced users have the possibility to export data to R for more complex analyses. MIRACLE thus has the potential to further spread utilization of RPPAs as an emerging technology for high-throughput protein analysis.
Availability: Project URL: http://www.nanocan.org/miracle/
Supplementary data are available at Bioinformatics online.
Using a new type of array technology, the reverse phase protein array (RPPA), we measure time-course protein expression for a set of selected markers that are known to co-regulate biological functions in a pathway structure. To accommodate the complex dependent nature of the data, including temporal correlation and pathway dependence for the protein markers, we propose a mixed effects model with temporal and protein-specific components. We develop a sequence of random probability measures (RPM) to account for the dependence in time of the protein expression measurements. Marginally, for each RPM we assume a Dirichlet process (DP) model. The dependence is introduced by defining multivariate beta distributions for the unnormalized weights of the stick breaking representation. We also acknowledge the pathway dependence among proteins via a conditionally autoregressive (CAR) model. Applying our model to the RPPA data, we reveal a pathway-dependent functional profile for the set of proteins as well as marginal expression profiles over time for individual markers.
Bayesian nonparametrics; dependent random measures; Markov beta process; mixed effects model; stick breaking processes; time series analysis
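As a rough illustration of the stick-breaking representation mentioned above, the following sketch draws truncated stick-breaking weights for a single Dirichlet process. It covers only the marginal DP case: the dependence across time that the abstract introduces through multivariate beta distributions is not modeled here. The function name, the truncation level, and the concentration value are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha.

    Each unnormalized weight V_k is drawn from Beta(1, alpha); the k-th
    weight is V_k times the length of the stick that is still unassigned.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Truncation: the last atom takes whatever is left, so weights sum to 1.
    weights.append(remaining)
    return weights


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
weights = stick_breaking_weights(alpha=2.0, n_atoms=20, rng=rng)
print(round(sum(weights), 6), max(weights))
```

Smaller values of alpha tend to concentrate mass on the first few atoms, while larger values spread it across many atoms.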
Gathering vast data sets of cancer genomes requires more efficient and autonomous procedures to classify cancer types and to discover a few essential genes that distinguish different cancers. Because protein expression is more stable than gene expression, we chose reverse phase protein array (RPPA) data, a powerful and robust antibody-based high-throughput approach for targeted proteomics, to perform our research. In this study, we proposed a computational framework to classify patient samples into ten major cancer types based on RPPA data using the SMO (Sequential Minimal Optimization) method. A careful feature selection procedure, combining mRMR (minimum Redundancy Maximum Relevance Feature Selection) and IFS (Incremental Feature Selection) on the training set, was employed to select 23 important proteins from the total of 187 proteins. Using these 23 proteins, we successfully classified the ten cancer types with an MCC (Matthews Correlation Coefficient) of 0.904 on the training set, evaluated by 10-fold cross-validation, and an MCC of 0.936 on an independent test set. Further analysis of these 23 proteins was performed. Most of these proteins are related to the hallmarks of cancer; Chk2, for example, plays an important role in the proliferation of cancer cells. Our analysis of these 23 proteins lends credence to their importance as indicators for cancer classification. We also believe our methods and findings may shed light on the discovery of specific biomarkers for different types of cancer.
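The MCC values quoted above are for a ten-class problem. A minimal sketch of one common multiclass generalization of the MCC, computed from a confusion matrix, is given here. Whether the authors used exactly this form is an assumption, and the 3-class matrix is invented purely for illustration; it is not data from the study.

```python
import math


def multiclass_mcc(confusion):
    """Multiclass MCC from a square confusion matrix.

    confusion[i][j] = number of samples with true class i predicted as j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)           # all samples
    correct = sum(confusion[i][i] for i in range(k))     # diagonal
    true_counts = [sum(confusion[i]) for i in range(k)]  # row sums
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(
        p * t for p, t in zip(pred_counts, true_counts)
    )
    denom_pred = total ** 2 - sum(p * p for p in pred_counts)
    denom_true = total ** 2 - sum(t * t for t in true_counts)
    if denom_pred == 0 or denom_true == 0:
        return 0.0  # convention when every sample falls in a single class
    return numerator / math.sqrt(denom_pred * denom_true)


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
toy_confusion = [
    [5, 1, 0],
    [0, 4, 1],
    [1, 0, 6],
]
toy_mcc = multiclass_mcc(toy_confusion)
print(round(toy_mcc, 3))
```

For two classes this expression reduces to the familiar binary MCC, which is why a single function can serve both settings.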
Protein extraction from formalin-fixed paraffin-embedded (FFPE) tissues is challenging due to the extensive molecular crosslinking that occurs upon formalin fixation. Reverse-phase protein array (RPPA) is a high-throughput technology that can detect changes in protein levels and protein functionality in numerous tissue and cell sources. It has been used to evaluate protein expression mainly in frozen preparations or in FFPE-based studies of limited scope. Reproducibility and reliability of the technique in FFPE samples have not yet been demonstrated extensively. We developed and optimized an efficient and reproducible procedure for extraction of proteins from FFPE cells and xenografts, then applied the method to FFPE patient tissues and evaluated its performance on RPPA.
Fresh frozen and FFPE preparations from cell lines, xenografts and breast cancer and renal tissues were included in the study. Serial FFPE cell or xenograft sections were deparaffinized and extracted by six different protein extraction protocols. The yield and level of protein degradation were evaluated by SDS-PAGE and Western Blots. The most efficient protocol was used to prepare protein lysates from breast cancer and renal tissues, which were subsequently subjected to RPPA. Reproducibility was evaluated and Spearman correlation was calculated between matching fresh frozen and FFPE samples.
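The Spearman correlation between matched preparations can be sketched as the Pearson correlation of average ranks. The marker intensities below are hypothetical values for one marker across five matched samples, and the function names are illustrative only; the study's actual analysis pipeline is not described in the text.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation undefined for constant input
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical intensities of one marker in five matched samples.
frozen = [1.2, 0.8, 2.5, 1.9, 0.3]
ffpe = [1.0, 0.9, 2.1, 2.2, 0.1]
rho = spearman(frozen, ffpe)
print(round(rho, 3))
```

Working on ranks rather than raw intensities makes the comparison insensitive to any monotone difference in signal scale between frozen and FFPE lysates.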
The most effective approach from six protein extraction protocols tested enabled efficient extraction of immunoreactive protein from cell line, breast cancer and renal tissue sample sets. 85% of the total of 169 markers tested on RPPA demonstrated significant correlation between FFPE and frozen preparations (p < 0.05) in at least one cell or tissue type, with only 23 markers common in all three sample sets. In addition, FFPE preparations yielded biologically meaningful observations related to pathway signaling status in cell lines, and classification of renal tissues.
With optimized protein extraction methods, FFPE tissues can be a valuable source in generating reproducible and biologically relevant proteomic profiles using RPPA, with specific marker performance varying according to tissue type.
Formalin-fixed; Paraffin-embedded tissue; Protein extraction; Reverse phase protein array; Breast cancer; Renal cancer
BACKGROUND: (blind field). METHODS: Proteomic profiles of 20 patient-derived glioblastoma xenografts (PDGX), established and genetically characterized at Duke's Preston Robert Tisch Brain Tumor Center, were obtained using Reverse Phase Protein Arrays (RPPA). The proteomic profiles of the parent tumors, obtained from Duke's brain tumor repository, were also obtained using RPPA. The RPPA analysis examined the expression of 128 proteins and phosphoproteins in each primary tumor and PDGX. We compared the expression levels of the 128 proteins in each PDGX/primary tumor pair by plotting the expression level on the y-axis and the name of the protein/phosphoprotein on the x-axis. RESULTS: For each primary tumor and PDGX pair, the expression of all but 4 to 8 proteins matched. These data indicate that PDGX retain the proteomic profile of the parent tumor. Since the proteomic profile provides information on activated growth pathways, these data can be used to develop an arsenal of signal transduction modulators effective in arresting PDGX growth. For example, 5 xenografts exhibited high phosphorylation of EGFR, and these PDGX also showed significantly increased phosphorylation of five other proteins, including GSK3-β, HER2, MAPK, Rb, and Src. Thus, a decrease in phosphorylation of these five proteins can be used to monitor the effectiveness of EGFR blockade in silencing EGFR signaling. If the agents found to be effective in silencing EGFR signaling also inhibit xenograft growth, then these agents can be further tested in glioblastoma patients whose tumors have similar molecular features. CONCLUSIONS: The concordance between the proteomic/phosphoproteomic profiles of PDGX and their parent tumors makes the PDGX an excellent preclinical model for personalized drug development. Acknowledgements: This research is supported by a grant from the NIH (R21NS078642) and a grant from the Musella Foundation to MMK. SECONDARY CATEGORY: Tumor Biology.
Contemporary informatics and genomics research requires efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols.
This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls.
The LONI Pipeline environment (http://pipeline.loni.ucla.edu) provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators: experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community.
The identification of key pathways dysregulated in non-small cell lung cancer (NSCLC) is an important step toward understanding lung pathogenesis and developing new therapeutic approaches.
Toward this goal, reverse-phase protein lysate arrays (RPPA) were used to compare signaling pathways between NSCLC tumors and paired normal lung tissue from 46 patients and assess their association with clinical outcome.
After RPPA quantification of 63 proteins and phosphoproteins, tissue pairs were randomized to a training set (n = 25 pairs) and test set (n = 21 pairs). In the training set, 15 protein markers were differentially expressed between tumors and normal lung (p ≤ 0.01), including markers in the PI3K/AKT and p38 MAPK signaling pathways (e.g., p70S6K, S6, p38, and phospho-p38), as well as caveolin-1 and β-catenin. A four-protein signature (p70S6K, cyclin B1, pSrc(Y527), and caveolin-1) independent of histology classified specimens as tumor versus normal with a predicted accuracy of 83%, sensitivity of 67%, and specificity of 100%. The signature was validated in the test set, correctly classifying all normal tissues and 14 of 21 tumor tissues. RPPA results were confirmed by immunohistochemistry for caveolin-1 and p70S6K. In tumors from patients with resected NSCLC, expression of proteins in the energy-sensing AMPK pathway (pLKB1, AMPK, p-Acetyl-CoA, pTSC2), adhesion, EGFR, and Rb signaling pathways was inversely associated with NSCLC recurrence.
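The arithmetic behind the classification figures can be checked from the test-set counts given above. The sketch assumes that the 21 test pairs contribute 21 normal and 21 tumor specimens, with tumor as the positive class; the function name is illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of tumors called tumor
    specificity = tn / (tn + fp)  # fraction of normals called normal
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Test set as described: all 21 normal tissues and 14 of 21 tumor tissues
# classified correctly (tumor treated as the positive class).
sens, spec, acc = classification_metrics(tp=14, fn=7, tn=21, fp=0)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Under these assumptions the counts give a sensitivity of 14/21, a specificity of 21/21 and an accuracy of 35/42, which round to the 67%, 100% and 83% quoted for the signature.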
These data provide evidence for dysregulation of several pathways including those involving energy sensing and adhesion that are potentially associated with NSCLC pathogenesis and disease recurrence.
NSCLC; Proteomics; Recurrence; AMPK; Adhesion