In MS-based studies to discover urinary protein biomarkers, an important question is how to analyze the data to find the most promising potential biomarkers to be advanced to large-scale validation studies. Here, we describe a ‘systems biology-based’ approach to address this question.
We analyzed large-scale LC-MS/MS data of urinary exosomes from renal allograft recipients with biopsy-proven evidence of immunological rejection or tubular injury. We asked whether bioinformatic analysis of urinary exosomal proteins can identify protein groups that correlate with biopsy findings and whether the protein groups fit with general knowledge of the pathophysiological mechanisms involved.
LC-MS/MS analysis of urinary exosomal proteomes identified more than 1000 proteins in each pathologic group. These protein lists were analyzed computationally to identify Biological Process and KEGG Pathway terms that are significantly associated with each pathological group. Among the most informative terms for each group were: “sodium ion transport” for tubular injury; “immune response” for all rejection; “epithelial cell differentiation” for cell-mediated rejection; and “acute inflammatory response” for antibody-mediated rejection. Based on these terms, candidate biomarkers were identified using a novel strategy to allow a dichotomous classification between different pathologic categories.
The terms and candidate biomarkers identified make rational connections to pathophysiological mechanisms, suggesting that the described bioinformatic approach will be useful in advancing large-scale biomarker identification studies toward a validation phase.
The main objective of the present study is to develop a bioinformatic strategy that can reveal exploitable patterns in protein mass spectrometry data from urinary samples. The question being addressed is how to proceed from mass spectrometry-generated lists of proteins to a choice of candidate biomarkers for future testing in validation clinical trials. Here we describe and illustrate an approach derived from the field of systems biology that uses computational tools to discover biologically meaningful differences among experimentally derived protein lists [1, 2]. This approach is based on the idea that the objective of biomarker discovery studies should be to identify biological processes that are deranged rather than to look for specific proteins that may stand out. The rationale has been presented previously. Briefly, it is based on the idea that protein biomarkers that “make sense” from the perspective of pathophysiology are more likely to succeed in the clinical setting than randomly discovered proteins that have uncertain connection to the relevant disease mechanisms (see Discussion). In this study, as a ‘proof of principle’, we apply such an approach to proteins detected by LC-MS/MS (liquid chromatography-tandem mass spectrometry) analysis of urinary exosome samples from three groups of renal transplant patients with different pathologic findings in the allograft biopsy: tubular injury, cell-mediated rejection, and antibody-mediated rejection. The bioinformatic approach outlined identified biological processes and candidate biomarkers associated with the different patient groups that fit with known pathophysiological mechanisms.
Urine samples were collected prospectively from adult renal transplant recipients followed at the Johns Hopkins Comprehensive Transplant Center. These patients underwent a kidney graft biopsy, either for cause (due to a rise in baseline serum creatinine) or as a protocol biopsy in patients with stable renal function. All patients received standard immunosuppressive treatment with prednisone, mycophenolate, and tacrolimus. Exclusion criteria included combined transplantation of kidney and other organs, evidence of infection in the kidney (bacterial or viral such as BK/polyoma, CMV, or Herpes), primary glomerular disease (de novo or recurrent), and post-transplant lymphoproliferative disorder (PTLD). Kidney graft biopsies were processed for routine studies as previously described, including staining with hematoxylin-eosin (H&E), periodic acid-Schiff (PAS), methenamine silver, and Masson’s trichrome for light microscopy examination. C4d staining by immunofluorescence was routinely performed on frozen sections of transplant kidney biopsies (alternatively, a C4d immunoperoxidase-based stain was also available on sections from paraffin-embedded tissue). Biopsies were graded for cellular- and antibody-mediated rejection according to the Banff score [5, 6]. Based on the graft biopsy findings, patients were assigned to four groups: Non-Specific Findings (N), mild to moderate Tubular Injury (TI), Cell-Mediated Rejection (CMR), or Antibody-Mediated Rejection (AMR). The study protocol was approved by the Johns Hopkins Medical Institutions Review Board.
Freshly voided urine (20-200 ml) was collected in the morning, before the biopsy (but was not the first morning urine sample), and was processed immediately to isolate urinary exosomes using the differential centrifugation procedure described by Gonzales et al. Processed protein samples solubilized in Laemmli reagent were pooled for each pathological group for analysis by LC-MS/MS.
Triplicate sets of 200 μg of urinary exosomal proteins pooled from each pathological group (7 TI, 6 CMR, 3 AMR, and 2 N, supporting Information Table S1) were separated by one-dimensional SDS-PAGE using 10% polyacrylamide gels. After staining with Coomassie blue, the gels were de-stained, cut into multiple slices, dehydrated, reduced, alkylated, and subjected to trypsin digestion at 37°C overnight to obtain peptides, which were reconstituted in 0.1% formic acid for analysis as described. LC-MS/MS was performed on triplicate aliquots of digested protein samples using LTQ Orbitrap XL (two runs) and LTQ Orbitrap Velos (one run) (Thermo Scientific, Waltham, MA). Reversed-phase C18 chromatographic separation of trapped peptides was carried out on a prepacked Beta Basic C18 PicoFrit column (75 μm i.d. × 10 cm length; New Objective) at 300 nL/min using the following gradient: 2–5% solvent B for 2 min; 5–45% solvent B for 45 min; 45–50% solvent B for 5 min; 50–95% solvent B for 5 min (solvent A: 0.1% formic acid in 98% water, 2% acetonitrile; solvent B: 0.1% formic acid in 100% acetonitrile). To identify peptide sequences, the MS data were searched against the National Center for Biotechnology Information (NCBI) Reference Sequence human protein database (April 8, 2010, 38767 entries), which included a list of common contaminating proteins, and scored using the SEQUEST algorithm in Proteome Discoverer software (ver. 1.1, Thermo Scientific). Precursor ion tolerance was 50 ppm, while fragment ion tolerance was 1.0 Da. Two missed trypsin cleavage sites were allowed. Static modifications included carbamidomethylation of cysteine (+57.021 Da) and variable modifications included oxidation of methionine (+15.995 Da). The peptide false discovery rate was limited to < 2% for individual peptides using the target-decoy approach. All MS spectral files associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hashes:
For the evaluation of relative abundance of exosomal proteins among groups, the spectral counts were normalized by dividing by the number of theoretical tryptic peptides detectable by LC-MS/MS (criteria for predicting a detectable tryptic peptide are as follows: theoretical m/z = 300-2000 for charge states +1 to +3, peptide length = 6-35, possible missed cleavage sites < 3). These values were further normalized by dividing by the median value for each group to account for potential differences in sample size.
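The two-step normalization described above can be sketched in Python as follows. This is a minimal illustration and not the code used in the study: the function names are hypothetical, the cleavage rule (after K or R, but not before P) is an assumed common convention that the text does not specify, and residue modifications such as carbamidomethylation are ignored when computing peptide masses.

```python
import re
from statistics import median

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide
PROTON = 1.00728   # added once per charge


def tryptic_fragments(sequence):
    """Fully cleaved fragments (assumption: cut after K/R unless followed by P)."""
    return [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]


def candidate_peptides(sequence, max_missed=2):
    """Distinct peptides with at most max_missed missed cleavage sites (< 3)."""
    frags = tryptic_fragments(sequence)
    peptides = set()
    for start in range(len(frags)):
        for n_joined in range(1, max_missed + 2):
            if start + n_joined <= len(frags):
                peptides.add("".join(frags[start:start + n_joined]))
    return peptides


def is_detectable(peptide):
    """Length 6-35 and m/z within 300-2000 for at least one charge in +1..+3."""
    if not 6 <= len(peptide) <= 35:
        return False
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return any(300.0 <= (mass + z * PROTON) / z <= 2000.0 for z in (1, 2, 3))


def count_detectable(sequence):
    """Number of theoretical tryptic peptides that meet the criteria."""
    return sum(is_detectable(p) for p in candidate_peptides(sequence))


def normalize_group(spectral_counts, sequences):
    """Spectral count / detectable peptides, then divided by the group median."""
    per_peptide = {}
    for protein, count in spectral_counts.items():
        n = count_detectable(sequences[protein])
        if n > 0:  # proteins with no detectable peptide are skipped
            per_peptide[protein] = count / n
    group_median = median(per_peptide.values())
    return {p: v / group_median for p, v in per_peptide.items()}
```

Here `spectral_counts` and `sequences` are dictionaries keyed by protein identifier for one pathological group. A median-normalized value above 1 then means the protein is more abundant than the median protein of that group.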
Identified proteins were classified as ‘extrinsic to exosomes’ if they possessed the Gene Ontology Cellular Component term of “extracellular space”, but lacked the terms “anchor” and “cytosol”. The remaining proteins were placed in the ‘intrinsic to exosomes’ subgroup. Protein lists were analyzed using the DAVID bioinformatic tool (Database for Annotation, Visualization and Integrated Discovery, NIAID, Bethesda, MD, http://david.abcc.niifcrf.gov/) to determine what Gene Ontology Biological Process terms are over-represented relative to the concatenated lists from all four conditions. The DAVID tool was also applied to determine what KEGG pathways (Kyoto Encyclopedia of Genes and Genomes) are over-represented in each pathological group.
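The intrinsic/extrinsic rule can be written as a short Python function. This is a sketch under stated assumptions: the function name and the example annotations are hypothetical, and matching the words “anchor” and “cytosol” as case-insensitive substrings of the Cellular Component terms is our reading of the rule, not a documented detail of the original analysis.

```python
def classify_exosomal_protein(cellular_component_terms):
    """Return 'extrinsic' or 'intrinsic' from a protein's GO Cellular Component terms.

    Extrinsic: annotated with "extracellular space" and with no term
    containing "anchor" or "cytosol" (substring matching is an assumption).
    """
    terms = [t.lower() for t in cellular_component_terms]
    has_extracellular = any("extracellular space" in t for t in terms)
    has_cell_link = any("anchor" in t or "cytosol" in t for t in terms)
    if has_extracellular and not has_cell_link:
        return "extrinsic"
    return "intrinsic"


# Hypothetical annotations, for illustration only.
examples = {
    "protein_A": ["extracellular space", "extracellular region"],
    "protein_B": ["extracellular space", "cytosol"],
    "protein_C": ["apical plasma membrane"],
}
labels = {name: classify_exosomal_protein(terms) for name, terms in examples.items()}
```

Proteins without any Cellular Component annotation fall into the intrinsic subgroup under this rule, because only a positive “extracellular space” annotation can make a protein extrinsic.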
Proteomes were profiled by LC-MS/MS in urinary exosome samples from renal allograft recipients with acute decrease in renal function and distinct biopsy findings: tubular injury (TI), cell-mediated rejection (CMR), and antibody-mediated rejection (AMR) (supporting Information Table S1). We also examined urinary exosomes of two allograft recipients with stable renal function undergoing “protocol” kidney graft biopsy, with non-specific findings (N). We selected cases that were stereotypical of each category, recognizing that real-world cases often display less distinct histology. Exosome samples were pooled within each histology class (supporting Information Table S1) and analyzed by LC-MS/MS. 1989 urine exosomal proteins were identified in renal transplant patients (false discovery rate < 2%, target-decoy analysis). This total exceeds the 1160 proteins reported in the Urinary Exosome Protein Database (http://dir.nhlbi.nih.gov/papers/lkem/exosome/). A list of all identified proteins along with associated median-normalized spectral counts, a measure of relative abundance (Materials and methods), can be accessed online through http://helixweb.nih.gov/ESBL/Database/EXORT/ (username: clp, password: Esbl!$, login to be removed upon acceptance of this paper). Each group had a large subset of uniquely identified proteins, TI: 353 proteins; CMR: 322 proteins; and AMR 165 proteins, with 1073 proteins present in more than one group.
We classified the identified proteins into one of two subgroups (Materials and methods): ‘intrinsic to exosomes’ or ‘extrinsic to exosomes’ (see the online database). The ‘extrinsic’ proteins are presumably plasma proteins that have crossed the glomerular filter and are nonspecifically associated with exosomes. The ‘intrinsic’ proteins were used for bioinformatic analysis. Overall, 92% of the proteins identified were classified as intrinsic, suggesting that they were a component of exosomes (or similar membranous structures) released from kidney cells.
Interestingly, the proportion of extrinsic proteins appears to be higher in the transplanted control group than in the non-transplanted group (chi-square, p < 0.0001). This comparison was made between the most abundant proteins (based on spectral counts) found in the group of transplant recipients with non-specific findings and the most abundant proteins found in a group of healthy, non-transplanted volunteers that we studied previously (supporting Information Table S2). Although these results need to be confirmed on a larger number of observations in transplanted patients without kidney pathology, they suggest that transplanted kidneys excrete higher levels of plasma proteins than non-transplanted kidneys even with no obvious pathological changes suggestive of tubular injury or rejection, and are also consistent with previously noted differences in the proteomic profile of urine from patients with transplanted or native kidneys [12, 13].
We analyzed the intrinsic exosomal proteins in each group to determine what characteristics distinguish the groups based on the biological processes that they take part in. This analysis was conducted with an online bioinformatic analysis tool called “DAVID” (Materials and methods), using the Gene Ontology (GO) Database, which attaches hierarchical descriptors to all annotated proteins coded by the human genome. We applied the Biological Process GO terms to identify which Biological Process terms are represented statistically more frequently in a list of exosomal proteins from a given patient group than from the entire set of exosomal proteins.
We asked whether any GO Biological Process terms are over-represented in the urinary exosomal proteins from kidneys of each patient group with abnormal biopsy findings (compared with the list of all proteins identified in transplanted kidneys). Table 1 shows the GO Biological Processes identified in each group with p < 0.05 (Fisher's exact test). A comparison of Table 1A, 1B, and 1C reveals that several terms were enriched in more than one group. For example, the term “immune response” is found in both graft rejection groups. Thus, proteins populating the “immune response” list can be deemed potential discriminators between rejection and tubular injury, but not between the two forms of rejection. Supporting Information Table S3 shows proteins in this “immune response” list (cell-mediated and antibody-mediated rejection combined). Each pathological group in Table 1 is associated with at least one unique term (shown in bold). As discussed below, most of these appear to be related to the underlying pathophysiology.
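The over-representation test behind Table 1 can be illustrated with a one-sided Fisher exact (hypergeometric tail) calculation. The sketch below uses only the Python standard library; the function and argument names are hypothetical, and DAVID's own scoring may differ in detail from this plain version, so it should be read as an illustration of the principle only.

```python
from math import comb


def enrichment_p_value(term_in_group, group_size, term_in_background, background_size):
    """One-sided p-value that a term is over-represented in a group's protein list.

    Probability of seeing at least `term_in_group` annotated proteins when
    `group_size` proteins are drawn without replacement from a background of
    `background_size` proteins, `term_in_background` of which carry the term.
    """
    upper = min(group_size, term_in_background)
    tail = sum(
        comb(term_in_background, i)
        * comb(background_size - term_in_background, group_size - i)
        for i in range(term_in_group, upper + 1)
    )
    return tail / comb(background_size, group_size)


# Hypothetical counts, for illustration only: 12 of 300 group proteins carry
# a term that annotates 29 of 1800 background proteins.
p = enrichment_p_value(12, 300, 29, 1800)
significant = p < 0.05
```

In this setting the background is the list of all intrinsic exosomal proteins identified across the groups, and the group list is the subset found in one pathological group.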
For the Tubular Injury group, the Gene Ontology Biological Process term “sodium ion transport” was the only unique term (Table 1A). This protein list (Table 2) includes a large number of transporters, many of which are sodium-coupled solute transporters expressed in the apical brush border membrane of the proximal tubule. Based on the transcriptomic databases of proximal tubule, medullary thick ascending limb, and inner medullary collecting duct, 25 out of all 29 proteins (86%) in Table 2 were expressed with relatively high signal values in proximal tubule. This finding is presumably indicative of proximal tubule damage and therefore makes sense from a pathophysiological perspective.
In the Cell-Mediated Rejection group (Table 1B), there are several unique terms, including three similar terms: “actin filament based process”, “actin cytoskeleton organization”, and “cytoskeleton organization”. The proteins in the former group are shown in supporting Information Table S4. An additional unique term in the Cell-Mediated Rejection group is “epithelial cell differentiation” (Table 3). These are predominantly proteins involved in intermediate filament organization and include four different uroplakin isoforms. The latter are expressed in the renal ureter and pelvis, and point to the possibility that the transitional epithelium of the urinary drainage system is a target tissue in cell-mediated allograft rejection.
In the Antibody-Mediated Rejection group (Table 1C), several unique terms appear, including three similar terms relevant to protein trafficking. The proteins in the “protein localization” group are listed in supporting Information Table S5. Additional unique terms in the Antibody-Mediated Rejection group are “response to unfolded protein” (supporting Information Table S6) and “acute inflammatory response” (Table 4). The latter category is dominated by components of the complement pathway. These categories refer to aspects of pathophysiology that appear logical in the context of current knowledge about antibody-mediated rejection (see Discussion).
We next analyzed the data to determine what characteristics distinguish the groups based on the KEGG biological pathways that they take part in, asking the question: “What KEGG pathways are represented statistically more frequently in a list of urinary exosomal proteins from a given patient group than in the list of all identified intrinsic exosomal proteins in all groups?”
The findings are summarized in Table 5. There were no unique KEGG pathways for exosomal proteins in the Tubular Injury group. One pathway was over-represented in the two Rejection groups, namely “endocytosis” (supporting Information Table S7 shows the “endocytosis” proteins combined from the two Rejection groups). Two KEGG pathways were uniquely over-represented in the list of exosomal proteins found in the Cell-Mediated Rejection group, viz. “tight junction” (supporting Information Table S8) and “glutathione metabolism” (supporting Information Table S9). Interestingly, the “tight junction” KEGG pathway proteins in the Cell-Mediated Rejection group include three proteins that are critically involved in epithelial polarization through PDZ interactions, viz. Crumbs3 (CRB3), PALS1 (MPP5), and PATJ (INADL), and may be important in the recovery of tubular integrity damaged by “Tubulitis”, a signature pathologic lesion in this type of rejection. Also as seen in Table 5, four KEGG pathways were uniquely over-represented in the Antibody-Mediated Rejection group, viz. “antigen processing and presentation” (Table 6), “neurotrophin signaling pathway”, “pathways in cancer” and “pathogenic E. coli infection” (supporting Information Tables S10-S12). The “antigen processing and presentation” list is dominated by major histocompatibility complex proteins and heat shock proteins.
The approach described above selects groups of MS-identified proteins associated via Gene Ontology Biological Process terms or KEGG Pathways that relate to pathophysiological mechanisms in different patient groups. In full-scale biomarker studies, the next task would be to select candidate biomarkers from the protein groups that are predicted from the mass spectrometry data to have the potential to discriminate patient groups. In this study, we developed a strategy to identify candidate biomarkers from our ‘proof-of-principle’ data using the protein lists associated with the over-represented Biological Processes for each pathologic category (Tables 2-4 and supporting Information Tables S3-S6). In this context, a first practical goal would be to identify pairs of proteins that will allow a dichotomous classification into Rejection vs. Tubular Injury categories (Figure 1). A second goal would be to choose protein pairs that allow samples to be categorized as Antibody-Mediated Rejection vs. Cell-Mediated Rejection (Figure 2). Selecting pairs of proteins that may have sufficient discriminating power between two different conditions, based on ratios of their abundance, would avoid the need for normalization factors, a main hurdle in optimizing assay conditions for urine biomarkers.
As rationalized in the Discussion, criteria for selection of candidate proteins from the Gene Ontology Biological Process lists obtained in the present study were: 1) maximization of the spectral count ratio between the two states being discriminated and 2) high absolute abundance based on spectral counts. Figure 1A shows a heat map representation of the dichotomous classification between All Rejection vs. Tubular Injury. The numbers for each pair of proteins are discrimination factors between the two conditions for a protein pair [values are log2(R_R/R_TI), where R_R is the median-normalized spectral count ratio of the two proteins for All Rejection and R_TI is the ratio of the two proteins in Tubular Injury]. The best discrimination factors are the highest positive numbers. Figure 1B shows the same type of information for the comparison between Antibody-Mediated Rejection versus Tubular Injury and Figure 1C shows comparable information for Cell-Mediated Rejection versus Tubular Injury. Figure 2 shows the dichotomous classification between Antibody-Mediated Rejection and Cell-Mediated Rejection, in which the median-normalized spectral count ratios between potential markers are represented in the same manner as in Figure 1. All proteins presented in Figures 1 and 2 were also selected from the Biological Process protein lists only if their absolute abundance levels in the pathologic category associated with a particular protein list were at least 2-fold above the median value for that pathologic category (median-normalized spectral count values > 2).
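The discrimination factor and the abundance filter can be sketched in Python as follows. The function names and the numbers in the usage example are hypothetical. Returning `None` when a protein is missing or has a zero count in either state is an assumption made here to avoid an undefined ratio; the text does not say how such cases were handled.

```python
from math import log2


def discrimination_factor(state_a, state_b, numerator, denominator):
    """log2(R_A / R_B), where R is the ratio numerator/denominator within a state.

    state_a and state_b map protein names to median-normalized spectral counts.
    Returns None if any of the four values is missing or zero (assumption).
    """
    values = [state_a.get(numerator), state_a.get(denominator),
              state_b.get(numerator), state_b.get(denominator)]
    if any(v is None or v <= 0 for v in values):
        return None
    ratio_a = values[0] / values[1]
    ratio_b = values[2] / values[3]
    return log2(ratio_a / ratio_b)


def rank_pairs(state_a, state_b, list_a, list_b, min_abundance=2.0):
    """Rank (protein from list_a, protein from list_b) pairs, best factor first.

    A protein is kept only if its value exceeds min_abundance in the state
    its own list is associated with (median-normalized value > 2).
    """
    numerators = [p for p in list_a if state_a.get(p, 0) > min_abundance]
    denominators = [p for p in list_b if state_b.get(p, 0) > min_abundance]
    ranked = []
    for num in numerators:
        for den in denominators:
            factor = discrimination_factor(state_a, state_b, num, den)
            if factor is not None:
                ranked.append((num, den, factor))
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Hypothetical median-normalized spectral counts, for illustration only.
rejection = {"P1": 8.0, "P2": 2.0}
tubular_injury = {"P1": 2.0, "P2": 4.0}
pairs = rank_pairs(rejection, tubular_injury, ["P1"], ["P2"])
```

In this toy example the P1/P2 ratio is 4 in rejection and 0.5 in tubular injury, giving a factor of log2(8) = 3. A large positive value means the ratio of the two proteins shifts strongly toward the first state.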
Typically, biomarker development studies involve at least three distinct stages: discovery, validation, and implementation. Here, we describe a bioinformatic approach to the analysis of discovery data. We used state-of-the-art mass spectrometry techniques for proteomic analysis of urinary exosomes from renal transplant recipients with renal dysfunction and biopsy-proven tubular injury, cell-mediated rejection, or antibody-mediated rejection. Although this approach was developed in the context of urinary proteomics data, it theoretically could be applied to analysis of data from other body fluids.
A central question addressed in this study is: “how best to analyze large-scale mass spectrometry data from urinary biomarker discovery studies to find proteins that have a meaningful connection to the pathophysiology of the disease process under study?” Optimally, candidate protein biomarkers must be sufficiently abundant to allow measurement in whole urine and should ‘make sense’ in terms of the pathophysiological processes involved. The latter requirement is based on three considerations: a) measurements of biomarkers consistent with current pathophysiological knowledge will more readily be integrated with other types of results in future clinical studies; b) such biomarkers will be more readily acceptable and useful to the clinical nephrology community than those with no known connection to pathophysiology; and c) biomarkers that fit with recognized mechanisms are less likely to be false positives.
Because of the considerations outlined in the foregoing Discussion, we have devised a ‘process-oriented’ approach to bioinformatic interpretation of the protein mass spectrometry discovery data. Thus, we are not looking for specific proteins that are enriched in individual patient groups, but rather we wish to identify classes of proteins (involved in specific biological processes) that can be shown statistically to be associated with individual patient groups. To implement this approach, we used the online computer utility, DAVID (see Materials and methods), to identify either Gene Ontology Biological Process (GO-BP) terms or KEGG Pathway terms that are selectively associated with lists of proteins identified in urinary exosomes in each renal transplant subgroup. Based on this approach, we can summarize the key observations in our ‘proof of principle’ samples as follows:
In a full-blown discovery study, the ultimate objective would be the development and testing of immunoassays (or other assays) for selected biomarker proteins. A major problem in designing useful urinary assays is normalization. The need for normalization arises from the fact that water excretion is highly variable, depending on physiological factors, rendering measurement of the absolute concentrations of a given biomarker substance virtually useless for making practical comparisons. Much has been written about the use of various measures such as creatinine, osmolality, Tamm-Horsfall protein, etc., to normalize urinary measurements. Such normalizing factors add work and variability to the read-out. However, because many clinical problems (including the one described in this paper) involve the development of a dichotomous classification that distinguishes one state from another, the normalization problem may be obviated; it is theoretically sufficient to form ratios of measured biomarker abundances, thereby canceling out the normalization factor. In the example developed for this study, two dichotomies are seen: 1) between rejection and tubular injury and 2) between cell-mediated rejection and antibody-mediated rejection. Consequently, in such studies it is rational to identify pairs of biomarker proteins, which, when measured under arbitrary loading conditions, give a concentration ratio that correlates with disease state. If the assays are linear, the ratio should be independent of water excretion rate or sample size, and separate normalization should be unnecessary.
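The cancellation argument can be checked numerically with a small Python sketch. It assumes an idealized linear assay in which each signal is proportional to concentration (excreted amount divided by urine volume); the function names, slopes, and amounts are hypothetical and chosen only to illustrate the point.

```python
from math import isclose


def assay_signal(excreted_amount, urine_volume, slope=1.0):
    """Idealized linear assay: signal = slope * concentration."""
    return slope * excreted_amount / urine_volume


def marker_ratio(amount_1, amount_2, urine_volume, slope_1=1.0, slope_2=1.0):
    """Ratio of two marker signals measured in the same urine sample."""
    signal_1 = assay_signal(amount_1, urine_volume, slope_1)
    signal_2 = assay_signal(amount_2, urine_volume, slope_2)
    return signal_1 / signal_2


# Same excreted amounts, concentrated versus dilute urine (hypothetical values).
concentrated = marker_ratio(6.0, 2.0, urine_volume=0.5)
dilute = marker_ratio(6.0, 2.0, urine_volume=2.0)
volume_independent = isclose(concentrated, dilute)
```

Each individual signal changes fourfold between the two samples, but the ratio stays at 3 because the urine volume appears in both numerator and denominator. With unequal assay slopes the ratio is scaled by the constant slope_1/slope_2, which does not depend on water excretion either.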
Given the above observations, we selected candidate biomarkers representative of the biological processes identified by bioinformatic analysis. Operationally, we identified pairs of candidate biomarker proteins that potentially could allow a dichotomous classification into Rejection vs. Tubular Injury (Figure 1) and, among those classified as Rejection, could allow samples to be categorized as Antibody-Mediated vs. Cell-Mediated (Figure 2). Among the proteins in Figures 1 and 2, SERPING1 and S100A9 have already been reported to be candidate biomarker proteins for acute rejection in renal allografts. Many membrane protein candidates were identified that have not been reported in the literature on renal allograft rejection. This observation is likely to be due to the use of urinary exosomes in this study. Urinary exosomes are tiny (40–80 nm) membranous structures secreted by every renal tubule epithelial cell type, as well as podocytes and transitional epithelia from the urinary collecting system.
In summary, the object of this paper was to describe and illustrate an approach to bioinformatic analysis of biomarker discovery data in order to facilitate identification of candidate biomarkers for future testing in validation clinical trials. For this purpose, we prepared samples from a small number of actual renal transplant patients with different pathological changes in the allograft. The primary goal of this study was not to validate protein biomarkers for different processes that could cause an elevation of serum creatinine in renal transplant patients. Optimally, such validation studies would warrant the analysis of larger numbers of samples than used here. Nevertheless, data presented here, when combined with results from other biomarker studies, could be useful in the design of validation studies for a limited number of protein biomarker candidates that make pathophysiological sense.
A rising serum creatinine concentration in renal transplant patients raises a diagnostic dilemma: Is the rise in creatinine due to rejection or to tubular injury? If the former is true, what is the pathophysiological basis of the rejection? Presently these questions are best answered by examination of tissue from renal biopsy, an invasive procedure that requires hours to provide a diagnosis. The ultimate long-term objective of the present line of investigation is to devise immunological assays that can predict the biopsy results, possibly allowing earlier start of appropriate therapeutic intervention pending biopsy findings. The question addressed in this paper is: “how best to analyze large-scale mass spectrometry data from urinary biomarker discovery studies to find proteins that have a meaningful connection to the pathophysiology of the disease process under study?” To address this, we used state-of-the-art mass spectrometry techniques for analysis of urinary exosomes from patients with renal transplants, including those with biopsy evidence of tubular injury, cell-mediated rejection, and antibody-mediated rejection. We have developed a ‘process-oriented’ approach to bioinformatic interpretation of the protein mass spectrometry discovery data that facilitates identification of candidate biomarkers for future testing in validation clinical trials.
Parts of this study were funded with a grant from the National Kidney Foundation of Maryland (to SMB) and the operating budget of Division of Intramural Research, National Heart, Lung, and Blood Institute (Project ZO1-HL001285 to MAK). Mass spectrometry was conducted in the National Heart, Lung, and Blood Institute Proteomics Core Facility (director, Marjan Gucek). A portion of the data described in this study was presented in Abstract form at the American Society of Nephrology Renal Week 2009.
Conflict of interest statement
The authors have no competing financial interests to disclose.