Acute kidney injury (AKI) is an important cause of death among hospitalized patients. The two most common causes of AKI are acute tubular necrosis (ATN) and prerenal azotemia (PRA). Appropriate diagnosis of the disease is important but often difficult. We analyzed urine proteins by 2-DE from 38 patients with AKI. Patients were randomly assigned to a training set, an internal test set or an external validation set. Spot abundances were analyzed by artificial neural networks (ANN) to identify biomarkers which differentiate between ATN and PRA. When the trained neural network algorithm was tested against the training data it identified the diagnosis for 16/18 patients in the training set and all 10 patients in the internal test set. The accuracy was validated in the novel external set of patients where 9/10 subjects were correctly diagnosed including 5/5 with ATN and 4/5 with PRA. Plasma retinol binding protein (PRBP) was identified in one spot and a fragment of albumin and PRBP in the other. These proteins are candidate markers for diagnostic assays of AKI.
Mortality associated with acute kidney injury (AKI) is approximately 50% in spite of advances in medical therapy [1,2]. Timely recognition of AKI and its causes could lead to better understanding of the pathophysiology and improve outcomes. Unfortunately, no reliable diagnostic markers that can predict the cause of AKI are currently available. The two most common causes of AKI are prerenal azotemia (PRA) and acute tubular necrosis (ATN) [1–3]. Identification of markers that predict which of these diseases is present would allow earlier and more appropriate therapy. The most common method to differentiate ATN from PRA is to determine the fractional excretion of sodium (FENa) in the urine. The FENa is typically low in PRA and higher in ATN. However, treatment with diuretics increases the FENa in PRA. In addition to the problem with diuretics, FENa often gives incorrect results in other situations [5–8]. Because of the difficulty differentiating between these conditions, new markers are necessary.
Several urine protein markers of tubular injury have been proposed, including serum proteins such as albumin, alpha-1 microglobulin, beta 2-glycoprotein and plasma retinol binding protein [9,10]. Proteins of renal tubular origin which are potential markers include NHE3, N-acetyl-glucosaminidase (NAG) [12,13], neutrophil gelatinase-associated lipocalin (NGAL), cytokines and proteases. Kidney Injury Molecule-1 (KIM-1), a proximal tubular brush border protein which is upregulated by injury, was identified as a potential urinary marker of tubular injury [16,17], and a combination of KIM-1 and NAG has been used to predict the need for renal replacement therapy in patients with acute kidney injury. Urine levels of the two proteins, along with four clinical variables, predicted the need for renal replacement therapy better than any of the factors alone. IL-18 is known to be a mediator of AKI in animal models. Urine levels of IL-18 correlate with the diagnosis of ATN compared to other causes of AKI. Elevated levels of NGAL were highly correlated with AKI in a study of patients seen in an emergency room. These markers have been used to identify early tubular injury or to provide prognostic information but have not been used to distinguish ATN from PRA. Traditionally, potential biomarkers have been identified from known proteins and screened individually for their diagnostic potential. Proteomic methodologies can quantify multiple different potential markers simultaneously in the sample to provide a more powerful analysis [20–22]. Previous studies of urine biomarkers have shown an advantage in sensitivity and specificity using combinations of markers rather than a single marker [12,23].
Proteomic methods have previously been used to identify proteins associated with tubular injury. Nguyen and colleagues used SELDI (Surface Enhanced Laser Desorption Ionization) to identify proteins which increased in postoperative AKI prior to an increase in serum creatinine. These markers have not been confirmed to be reproducible or accurate on samples that were not used to train the algorithm. In a rat model of sepsis-induced AKI, two dimensional gel electrophoresis with DIGE (Difference Gel Electrophoresis) was used to show that 30 urine proteins changed in abundance, including the renal tubular brush border protein meprin 1-alpha. Proteomic analysis of urinary exosomes has also been used to identify urine protein changes associated with tubular injury. Exosomes are apical membrane vesicles that are secreted into the urine which contain both membrane proteins and intracellular fluid [26,27]. Analysis of urinary exosomes by DIGE in two models of renal injury demonstrated proteins that were changed in abundance prior to an increase in serum creatinine. One of the proteins, fetuin A, was also increased in three ICU patients with acute kidney injury compared to ICU patients without renal injury. These studies have used proteomic methods to identify individual proteins that are changed with AKI. They have not systematically examined the accuracy of these markers in complicated patients with coexisting diseases. Combinations of markers will most likely be required for diagnostic markers in acutely ill patients with confounding illnesses. The objective of this study is to identify biomarkers that can differentiate between ATN and PRA. In this report, we separate patients with ATN from those with PRA by defined criteria and by following them over time. This approach allows us to accurately separate the two groups.
We then use artificial neural networks (ANN), which are computer algorithms that can define a nonlinear association, to identify a combination of two spots from 2D gels that can differentiate between patients with ATN and PRA and test the accuracy of the algorithm in a different set of patients.
AKI was defined by an increase in serum creatinine of ≥50% or 0.3 mg/dl over a period ≤ 48 hours. ATN was defined as AKI in a situation which could be expected to cause ATN that could not be attributed to another cause, did not respond to fluid hydration and had at least 2 of the following characteristics: 1) BUN:Cr ratio ≤20, 2) urinalysis with muddy brown granular and epithelial cell casts and free epithelial cells; 3) fractional excretion of sodium >1%, fractional excretion of uric acid >12% or fractional excretion of urea >35%; 4) urine osmolality <350; 5) rate of rise of serum creatinine > 0.3 mg/dl/day. PRA was defined as AKI in an appropriate clinical situation which could not be attributed to another cause, where urinary sediment did not show epithelial cell casts or free epithelial cells, responded to fluid hydration with improvement or no further worsening of serum creatinine and had at least two of the following characteristics: 1) BUN:Cr ratio ≥ 20; 2) fractional excretion of sodium <1%, fractional excretion of uric acid <12% or fractional excretion of urea <35%; 3) urine osmolality >500 mosm; 4) urine volume <500 ml/day. In all cases we continued to follow the clinical course of the patients to confirm the diagnosis of ATN or PRA. Since the patients were followed over time we could differentiate the two conditions better than observations at a single time point. Patients were excluded for whom informed consent could not be obtained. In most cases, the investigators were made aware of the patients by the clinical Nephrology service. All patients who we identified with AKI that was caused by either ATN or PRA and from whom we obtained urine during the time from June 2003 to October 2004 were enrolled. Serum laboratory values were obtained from the clinical record. Fractional excretion of sodium was calculated as (urine sodium × serum creatinine)/(serum sodium × urine creatinine). 
Fractional excretion of urea or uric acid was calculated by replacing sodium in the formula with urea or uric acid.
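The fractional excretion formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the input values are hypothetical, and the factor of 100 is an assumption added so that the result is a percentage comparable to the 1%, 12% and 35% cutoffs in the definitions.

```python
def fractional_excretion(urine_solute, serum_solute, urine_creatinine, serum_creatinine):
    """Fractional excretion of a solute, as a percentage.

    Follows (urine solute x serum creatinine) / (serum solute x urine creatinine).
    Urine and serum values for the same analyte must share units.
    The factor of 100 (percentage) is an assumption for comparison with cutoffs.
    """
    return 100.0 * (urine_solute * serum_creatinine) / (serum_solute * urine_creatinine)


# Hypothetical values, not taken from any study patient:
# urine sodium 20, serum sodium 140, urine creatinine 100, serum creatinine 2.0.
fena = fractional_excretion(
    urine_solute=20.0,
    serum_solute=140.0,
    urine_creatinine=100.0,
    serum_creatinine=2.0,
)
print(f"FENa = {fena:.2f}%")
```

The same function covers urea or uric acid by passing those concentrations in place of sodium, as the text describes.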
Urine was collected from patients at the Medical University of South Carolina or the Ralph H. Johnson VA Medical Center in accordance with a protocol approved by the appropriate institutional review boards. Random fresh urine was collected and processed immediately. We did not collect first morning voids because we attempted to collect the urine as soon as possible after recognition that the patient had AKI. In addition, many of the patients had bladder catheters. Cellular elements and debris were removed by centrifugation at 1000 × g for 10 minutes. Samples were frozen without protease inhibitors at −80° C until they were prepared for two dimensional gel electrophoresis (2-DE). After thawing, an equal volume of acetone at 4° C was added to 10 ml of urine sample and incubated at 4° C for 10 minutes. The samples were centrifuged at 9000 × g for 10 minutes and the precipitate was resuspended in 100 μl of a buffer containing 7 M urea, 2 M thiourea, 2% CHAPS and 1% Amidosulfobetaine-14 (ASB-14). Urine protein concentration was adjusted to 100 μg in 185 μl with a buffer containing 7 M urea, 2 M thiourea, 2% CHAPS, 1% ASB-14, 0.2% 3–10 ampholytes and 50 mM DTT. Two-dimensional electrophoresis was done as previously described over a pI range of 4–7 and on 8–16% gradient SDS-PAGE gels. Gels were stained with Sypro Ruby (Molecular Probes) and imaged on an FX Pro Plus fluorescent imager (Bio-Rad). Gel images were matched into a set and protein spots were aligned across the 38 gels using PDQuest software (Bio-Rad). Automated spot alignment by the software was followed by manual confirmation and correction of alignments by an experienced user.
We used an exploratory multivariate statistical cluster algorithm to look for general patterns of similarity. Protein abundance data were converted to a usable format by exporting the list of protein coordinates, matches and intensities from PDQuest as a text file. The text file was transformed to an XML form called annotated gel markup language (AGML) as previously described. This data structure enabled the assembly of a bioinformatics infrastructure, AGML central, which is a web-based open-source public infrastructure for dissemination of 2-DE data in XML format. The AGML formatted structure describing the urine protein spot abundance was processed by code written in MATLAB (MathWorks Inc) for data analysis and graphic visualization. The spots were ranked by intensity and expressed as quantiles. This is a non-parametric approach that replaces spot intensity by its quantile within the individual gel. Quantile rank rather than spot intensity was used for analysis. To determine if groupings of patients could be observed based on disease, we performed an exploratory analysis using multivariate statistical methods with code written using the MATLAB statistics toolbox. Bottom-up exploratory analysis was performed by unweighted pair group average (UPGA), using Euclidean distance between spot quantile vectors as a dissimilarity metric.
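The quantile transformation and the dissimilarity metric described above can be illustrated with a minimal Python sketch. It is not the MATLAB code used in the study: the function names, the toy intensities, the 0-to-1 scaling (least abundant 0, most abundant 1) and the arbitrary handling of ties are assumptions made for illustration.

```python
import math


def quantile_ranks(intensities):
    """Replace each spot intensity by its quantile rank within one gel.

    Assumed scaling: least abundant spot -> 0.0, most abundant -> 1.0.
    Ties are broken by position, which is an arbitrary choice for this sketch.
    """
    n = len(intensities)
    if n < 2:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: intensities[i])
    quantiles = [0.0] * n
    for rank, idx in enumerate(order):
        quantiles[idx] = rank / (n - 1)
    return quantiles


def euclidean(u, v):
    """Euclidean distance between two spot quantile vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical gels with four matched spots each.
gel_a = [1200.0, 300.0, 50.0, 800.0]
gel_b = [240.0, 60.0, 10.0, 160.0]   # same pattern as gel_a, stained 5x fainter
gel_c = [10.0, 500.0, 900.0, 40.0]   # different pattern

qa, qb, qc = quantile_ranks(gel_a), quantile_ranks(gel_b), quantile_ranks(gel_c)
print(euclidean(qa, qb))  # identical rank patterns give zero distance
print(euclidean(qa, qc))
```

Because only the within-gel ordering is kept, gels that differ merely in overall staining intensity map to the same quantile vector, which is the property that makes the ranks usable as input to a distance-based clustering.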
The protein spot intensities were ranked by intensity and expressed as quantiles. The 38 patients in the set were randomly sorted into one of three groups: training (18 patients), internal testing (10 patients) and external validation (10 patients). The random selection to each group was performed to keep a balanced proportion between the two outcomes, that is, by independent selection from each of the two groups of patients. The identification of artificial neural network models was performed by MATLAB code written along the guidelines previously proposed, which include bootstrapped cross-validation as an early-stopping criterion and screening for optimal topology. The predictive value of each spot was evaluated by sensitivity analysis, determining for each ith spot in each jth gel/patient $S_{i,j} = \frac{dO_j}{dI_{i,j}} \cdot \frac{I_{i,j}}{O_j}$. The TRAINING dataset was used to parameterize the ANN, which was pursued with its own internal testing by cross-validation, with each run including early stopping and topology optimization. A cross-validation scheme with leave 1/9th out was used, where the median performing ANN was selected from each run. The topology optimization, cross-validation, early stopping and training procedures have been described in more detail previously.
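The sensitivity measure in this paragraph, the derivative of the output with respect to one input scaled by input over output, can be approximated numerically. The sketch below is illustrative only: it reads O as the model output and I as a spot's input value (an interpretation of the symbols), and `toy_model` is a hypothetical logistic function standing in for a trained network.

```python
import math


def toy_model(inputs):
    """Hypothetical stand-in for a trained ANN: logistic of a weighted sum."""
    weights = [3.0, -2.0]
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-z))


def relative_sensitivity(model, inputs, i, eps=1e-6):
    """Estimate S_i = (dO/dI_i) * (I_i / O) by central finite differences."""
    up = list(inputs)
    down = list(inputs)
    up[i] += eps
    down[i] -= eps
    d_out_d_in = (model(up) - model(down)) / (2.0 * eps)
    return d_out_d_in * inputs[i] / model(inputs)


# Hypothetical quantile inputs for two spots in one gel.
gel_inputs = [0.5, 0.25]
sensitivities = [relative_sensitivity(toy_model, gel_inputs, i) for i in range(2)]
print(sensitivities)
```

Scaling by input over output makes the measure dimensionless, so sensitivities of different spots can be compared directly when ranking their predictive value.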
The TESTING dataset was used strictly for variable addition. At each iteration, with s variables already selected from a total of n, each of the n − s remaining variables was added in turn to the selected pool, and the variable whose trained ANN best predicted this second dataset was selected for that iteration of the variable addition procedure.
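The variable addition procedure is a form of greedy forward selection, which can be sketched as follows. This is a generic illustration under stated assumptions: `score` is a hypothetical callback that would, in the setting described here, train a network on the candidate variable set and return its performance on the internal testing data; the toy scoring function and variable names below are invented.

```python
def forward_selection(candidates, score, max_vars):
    """Greedy variable addition.

    At each iteration every remaining variable is tried alongside the
    already-selected pool, and the one giving the best score is kept.
    Returns a list of (selected_variables, score) after each iteration.
    """
    selected = []
    remaining = list(candidates)
    history = []
    while remaining and len(selected) < max_vars:
        best = max(remaining, key=lambda v: score(selected + [v]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


def toy_score(variables):
    """Hypothetical score: 'a' and 'c' are informative mainly in combination."""
    chosen = set(variables)
    value = 0.5
    if "a" in chosen:
        value += 0.10
    if "b" in chosen:
        value += 0.05
    if {"a", "c"} <= chosen:
        value += 0.30
    return value


history = forward_selection(["a", "b", "c"], toy_score, max_vars=3)
for variables, value in history:
    print(variables, round(value, 2))
```

In the toy example the variable "c" contributes nothing alone but is selected second because of its interaction with "a", which is the kind of interdependency a one-variable-at-a-time screen would miss.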
Finally, the EXTERNAL validation dataset was kept completely independent from both the training and variable selection procedures in order to have an unbiased assessment of the model predictability, which was used to determine the point of the variable selection procedure at which we find the optimal set of variables. Comparison of the diagnostic results of the external set with those of the training and internal testing sets was done only after the training was completed.
The threshold value to discriminate ATN from PRA was determined in the training set and applied independently to the external validation set. Therefore, since the ANN was trained and the threshold value was set independently of the external validation set, the classification is unbiased. Outputs greater than the threshold predict a diagnosis of ATN and outputs less than the threshold predict PRA. Sensitivity was defined as True positive ATN/All patients with ATN and specificity was defined as True negative ATN/All patients without ATN. Overall accuracy was calculated in the external set by true discovery rate, that is, by all correct diagnoses/number of patients.
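The thresholding rule and the definitions of sensitivity, specificity and overall accuracy given above can be written out as a short Python sketch. The network outputs below are hypothetical values chosen for illustration and are not the study data; the threshold of 0.21 is the value reported in the results.

```python
def classify(outputs, threshold):
    """Outputs greater than the threshold predict ATN (1); otherwise PRA (0)."""
    return [1 if o > threshold else 0 for o in outputs]


def diagnostic_metrics(observed, predicted):
    """Sensitivity, specificity and accuracy with ATN as the positive class."""
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for o, p in pairs if o == 1 and p == 1)
    true_neg = sum(1 for o, p in pairs if o == 0 and p == 0)
    n_atn = sum(1 for o in observed if o == 1)
    n_pra = len(observed) - n_atn
    return {
        "sensitivity": true_pos / n_atn,               # true positive ATN / all ATN
        "specificity": true_neg / n_pra,               # true negative / all without ATN
        "accuracy": (true_pos + true_neg) / len(observed),
    }


# Hypothetical example: 5 ATN (1) and 5 PRA (0) patients with made-up outputs.
observed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
outputs = [0.90, 0.70, 0.50, 0.30, 0.25, 0.05, 0.10, 0.15, 0.20, 0.40]
predicted = classify(outputs, threshold=0.21)
metrics = diagnostic_metrics(observed, predicted)
print(metrics)
```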
Protein spots were picked from the gel using a Proteome Works spot picking robot (Bio-Rad) and digested with trypsin as previously described. Digests were concentrated using C18 ziptips (Millipore). Digests were initially analyzed on a Micromass MALDI-TOF mass spectrometer. Peptide mass fingerprinting was done using the Mascot search engine against the MSDB database with the following assumptions: fixed carbamidomethyl modifications of cysteine residues, variable oxidation modifications of methionine and peptide mass tolerance ± 100 ppm. For the spots that could not be identified by MALDI-TOF, tryptic digests of gel plugs were loaded onto a 15 cm, 75 μm internal diameter reversed phase C18 column (Microtech Scientific, Vista, CA) for liquid chromatography/mass spectrometry analysis. Peptides were eluted with a gradient of 2–80% buffer B (acetonitrile/0.1% formic acid) directly into a Finnigan LTQ linear ion trap mass spectrometer in nanospray, positive ion mode. Full scans, zoom scans and MS/MS scans were done. Protein identification was done using the Turbo-SEQUEST algorithm in Bioworks 3.1 software and the MSDB database. Criteria for identification of peptides were XCorr >1.5 for singly charged ions, >2.0 for doubly charged ions and >2.5 for triply charged ions.
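The charge-dependent XCorr criteria can be expressed as a simple filter. The sketch below is an illustration, not part of the Bioworks software: the function and variable names are hypothetical, rejecting charge states other than 1 to 3 is an assumption, and the last entry in the example list is invented to show a failing case. The first four entries use the scan numbers, XCorr values and charges reported for spot 40 in the results.

```python
# Minimum XCorr by precursor charge state, as stated in the criteria above.
XCORR_CUTOFF = {1: 1.5, 2: 2.0, 3: 2.5}


def passes_xcorr(xcorr, charge):
    """True when the peptide match exceeds the cutoff for its charge state.

    Charge states without a stated cutoff are rejected (an assumption).
    """
    cutoff = XCORR_CUTOFF.get(charge)
    return cutoff is not None and xcorr > cutoff


# (scan, XCorr, charge)
matches = [
    (1755, 2.532, 2),
    (1769, 2.509, 2),
    (1324, 2.968, 3),
    (1654, 2.263, 2),
    (9999, 2.200, 3),  # hypothetical match that fails the triply charged cutoff
]
accepted = [scan for scan, xcorr, charge in matches if passes_xcorr(xcorr, charge)]
print(accepted)
```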
Mean characteristics of the 38 patients are shown in Table 1. Although we were comparing urine protein differences between two diseases, the patients in the groups had very heterogeneous characteristics. Many coexisting diseases were present among patients in both groups. Table 2 shows the individual patients in each group with their renal disease, gender, urine specific gravity, serum creatinine, fractional excretion of sodium, coexisting disease processes, and whether or not they were septic at the time the urine was obtained. The table also shows whether they required dialysis or not. The patients were a diverse group and it was often difficult to tell exactly when the injury occurred. This is typical of the clinical presentation of AKI. Proteins were separated by 2-DE and 231 spots were aligned across the gels from the 38 patients. Abundance values of seven spots representing three proteins (alpha-1 antitrypsin, gelsolin and transferrin) were statistically different between the groups but none could consistently predict the etiology of the disease (data not shown).
To determine if unsupervised groupings of proteins were correlated with unrelated factors such as patient age, the amount of protein in the urine, the batch in which the sample was prepared and separated or the serum creatinine, we performed exploratory multivariate statistical analysis by the bottom-up approach of unsupervised simultaneous clustering of gels and spots by unweighted pair group average (UPGA). This approach did not reveal any clustering by any of the factors (Figure 1) and demonstrates that factors other than the cause of AKI or the other measured variables are causing most of the variability in spot intensity on the gels.
We used a machine learning algorithm (ANN) to identify nonlinear patterns that could differentiate the diseases. The rank ordered spot intensity data from the 38 patients was used to train the ANN. Networks were repetitively trained with the training set and iteratively checked against the test set. A combination of two markers had the maximum predictive independent performance. The two spots are shown in Figure 2.
To test the ability of the algorithm derived from the ANN to identify the etiology of AKI, we evaluated the output of the ANN in the external validation set which was not used for training the algorithm. Table 3 shows the output values of individual patients in each of the three sets. The table compares the observed diagnosis (0 for PRA or 1 for ATN) to the predicted diagnosis value for each of the three sets (a number between 0 and 1). The output value is assigned by the ANN and represents the similarity of the data from each patient to data in the training set. A segmentation threshold value at which a diagnosis of ATN or PRA was made was set at 0.21 based on the training set. Since this value was selected in the training set and applied to the external set, no violation of the independent assessment for the external dataset takes place. By comparing the output value to the threshold value, a predicted diagnosis is obtained. The diagnoses for 16/18 patients in the training set and for all 10 patients in the internal test set were correctly identified. Most importantly, all 5 of the patients with ATN in the external validation set were correctly diagnosed (100% sensitivity) as were 4 of 5 patients with PRA (80% specificity). An ROC curve was generated for the group of 10 patients in the external validation set where a total area under the curve (AUC) of 0.88 was observed. The 90% accuracy and high AUC value in the validation set indicate a very good quality test since the validation set was not used in either training or variable selection for the algorithm. The algorithm was not able to predict death or dialysis in these patients.
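For readers unfamiliar with how an area under the ROC curve is obtained from continuous outputs such as these, the following Python sketch computes it as the fraction of ATN/PRA patient pairs in which the ATN patient has the higher output, counting ties as one half. This rank-based formulation is a standard equivalent of the trapezoidal area; it is not necessarily the procedure used in the study, and the outputs below are hypothetical rather than the values in Table 3.

```python
def roc_auc(observed, outputs):
    """AUC as the probability that a positive (ATN = 1) outranks a negative (PRA = 0)."""
    positives = [s for o, s in zip(observed, outputs) if o == 1]
    negatives = [s for o, s in zip(observed, outputs) if o == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical outputs for 5 ATN and 5 PRA patients.
observed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
outputs = [0.90, 0.70, 0.50, 0.30, 0.25, 0.05, 0.10, 0.15, 0.20, 0.40]
auc = roc_auc(observed, outputs)
print(auc)
```

Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold, so it summarizes how well the outputs order the two diagnoses over all possible cutoffs.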
The two protein spots that were identified as markers were spot numbers 40 (pI 4.9, Mr 13, Spot 2103 in AGML) and 133 (pI 5.4, Mr 20, Spot 5305 in AGML). Figure 3 plots the quantile ranks for intensity of the spots in the validation set. The correlation of protein rank abundances for the two spots with the disease is nonlinear; more specifically, it indicates an XOR (exclusive OR) interdependency, where either high or low values of spot 133 can correlate with presence of ATN depending on the value of spot 40. Intermediate abundances of spot 133 are associated with PRA only for low values of spot 40. The abundance of both protein spots must be known to differentiate between the two outcomes. The XOR distribution of intensities observed also suggests a possible reason that linear approaches to identify biomarkers failed: the ANN predictor is relying on inter-dependencies that would not be captured by linear discriminant analysis tools.
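Why an XOR pattern defeats linear discriminants but not a small network can be shown with a toy example. The four points below are hypothetical and are not the spot 40 and spot 133 data; the weights of the two-hidden-unit network are set by hand rather than trained, purely to show that such a solution exists.

```python
import itertools

# Hypothetical XOR-patterned data: label 1 when exactly one coordinate is high.
points = [((0.1, 0.1), 0), ((0.9, 0.9), 0), ((0.1, 0.9), 1), ((0.9, 0.1), 1)]


def step(z):
    return 1 if z > 0 else 0


def linear_predict(w1, w2, b, x):
    """Single linear boundary: w1*x1 + w2*x2 + b > 0."""
    return step(w1 * x[0] + w2 * x[1] + b)


def best_linear_correct(data, grid):
    """Most points any linear boundary on the weight grid classifies correctly."""
    best = 0
    for w1, w2, b in itertools.product(grid, repeat=3):
        correct = sum(linear_predict(w1, w2, b, x) == y for x, y in data)
        best = max(best, correct)
    return best


def two_unit_network(x):
    """Hand-set network with two hidden units combined nonlinearly."""
    h_or = step(x[0] + x[1] - 0.5)    # at least one input high
    h_and = step(x[0] + x[1] - 1.5)   # both inputs high
    return step(h_or - h_and - 0.5)   # high when OR is on but AND is off


grid = [i / 2 for i in range(-4, 5)]  # weights and bias from -2.0 to 2.0
linear_best = best_linear_correct(points, grid)
network_correct = sum(two_unit_network(x) == y for x, y in points)
print(linear_best, network_correct)
```

No single straight boundary classifies more than three of the four points, whereas the network with one hidden layer classifies all four, which mirrors the argument that both spot abundances must be considered jointly.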
After determining that analysis of abundance of two spots on the gel could differentiate between ATN and PRA, we attempted to identify the proteins that make up those spots. Digested spots were analyzed by MALDI-TOF MS and MALDI-TOF/TOF tandem MS. Spot 133 was identified as PRBP by peptide mass fingerprinting with a MASCOT score of 81 (> 63 significant, p<0.05). Ten peptides were matched from 43 searched. 53% of the protein was covered by the expected peptides (Figure 4A). To confirm the identification, peptide sequencing was done by MALDI-TOF/TOF MS. Based on three peptide ions sequenced (two were overlapping), the protein was again identified as PRBP with a total ion score of 167, where a total ion score of 60 is considered significant. Bold sequence in Figure 4A shows sequenced peptides.
Spot 40 was not identified by MALDI-TOF mass spectrometry with peptide mass fingerprinting nor by tandem mass spectrometry using the MALDI-TOF/TOF mass spectrometer. To identify this spot, we took advantage of the higher sensitivity of the linear ion trap mass spectrometer. We identified sequence from two different proteins in the gel plug. One peptide from PRBP was sequenced from both scan 1755 (XCorr 2.532, charge 2) and scan 1769 (XCorr 2.509, charge 2). The precursor ion had a mass of 1199.38 (YWGVASFLQK). Two different peptides were sequenced which mapped to albumin protein. They were an ion from scan 1324 (XCorr 2.968, charge 3) with a precursor mass of 1640.91 (KVPQVSTPTLVEVSR) and an ion from scan 1654 (XCorr 2.263, charge 2) with a precursor mass of 1312.54 (HPDYSVVLLLR). Since the molecular weight of the original protein spot was 13 kDa and the molecular weights of PRBP and albumin are 21 and 66 kDa, the proteins identified in the spot are likely fragments. To confirm that these proteins were in the gel spot and to refine the likely sequence of the peptide fragments, we compared the spectrum obtained from MALDI-TOF analysis of the tryptic peptides to predicted tryptic peptides from the two proteins (Figure 5 and Table 4). All but six of the sixteen most abundant peptides in the MALDI spectrum could be accounted for by either PRBP or albumin or by the contaminant keratin or autolysis of trypsin. Three peptides were identified whose masses matched to predicted tryptic fragments of PRBP. The peptides covered a region of PRBP extending from amino acid residue 90–139 of the circulating form of PRBP (Figure 4B). All 50 amino acids of this fragment were accounted for yielding 100% coverage. The predicted mass of this peptide is 5.8 kDa. Since the gel plug was cut from a position on the gel corresponding to 13 kDa, this sequence must represent only a portion of the sequence of the protein fragment in the spot. 
Four peptides were identified whose mass matched to predicted tryptic peptides from a fragment of serum albumin extending from 324–428 (Figure 4C). The predicted size of the fragment is 12.2 kDa which correlates well with the position of the protein on the gel implying that all or most of this fragment can be accounted for by the postulated sequence.
The two most common definitions of AKI are the RIFLE and the AKIN definitions. The change in creatinine criteria for the first stage of RIFLE is an increase in serum creatinine of at least 50% and the first stage AKIN is an increase of 50% or 0.3 mg/dl. All patients in this study met the criteria for at least the first stage of both AKI criteria. ATN and PRA are the two most common causes of AKI. Differentiating between these two causes is a common clinical problem. Tools such as FENa that are most commonly used are not reliable. Using artificial neural network analysis we identified a set of two markers that can differentiate between ATN and PRA 90% of the time in patients that were not used to train the network. Not only was more than one marker required for the diagnostic ability, but the pattern of spot rank abundances for the two spots suggests a plausible explanation why data analysis methods that rely on assumptions of linear independence have been inefficient at uncovering biomarkers. The predictive association relies on the covariance between the two candidate markers, and specifically on a particularly complicated type of interdependency, the Exclusive OR relationship (XOR), a well established litmus test for machine learning tools. This study demonstrates that urine protein abundance can be used to predict the cause of AKI. We have recently used artificial neural network analysis of urine protein abundances to identify biomarkers of glomerular diseases. In that study of patients with biopsy proven diseases, we showed that altered glomerular permeability to glycosylated proteins can be used to predict the cause of the glomerular disease. The current study extends these findings to demonstrate that urine proteins can be used to predict the cause of AKI.
We did not identify several previously suggested candidate proteins for the occurrence of AKI such as NGAL and KIM-1 because they were not observed to be differentially expressed in our 2D gels. The proteins we identified in the diagnostic spots were PRBP and a spot that contained fragments of PRBP and albumin. PRBP has been suggested previously as a marker for renal tubular dysfunction [10,35,36]. As we found for a number of potential protein markers, however, levels of PRBP alone had too much overlap between tubular and nontubular diseases to be useful. In fact, mean abundances of PRBP were not statistically different between patients with the two diseases. PRBP was diagnostic only when it was analyzed in a complex algorithm with the abundance of a second spot. As shown in Figure 3, the relationship of spot abundance to the disease was not linear. In fact, either high or low levels of plasma retinol-binding protein were associated with a diagnosis of ATN, and these levels become informative only when the abundance of the albumin component marker is also known. Intermediate levels were associated with a diagnosis of PRA. We propose a hypothesis for the reason that these urine proteins change in this way. PRBP and albumin are filtered at the glomerulus and reabsorbed and metabolized in the renal proximal tubule. When the tubules are dysfunctional, as in ATN, larger amounts of PRBP will appear in the urine. In severe ATN the fall in GFR is greater than that seen in PRA; therefore, filtration of plasma proteins declines, leading to a lower level in the urine. The value of the abundance of the second spot, which is a fragment of a filtered serum protein, is required to interpret the value for PRBP and predict the diagnosis. Since there may be a transition state between ATN and PRA, this relationship may not always hold up diagnostically.
A frequent problem in interpretation of urine protein abundance is how to normalize protein abundance in the urine. Several methods have been proposed, including loading equal amounts of protein, adjusting for the amount of creatinine in the sample or for the volume of urine. In the ANN analysis we normalized protein abundance by assigning proteins a quantile value based on the rank of their abundance within the gel. The most abundant protein has a rank of 1 and the least abundant has a rank of 0. We have recently shown this nonparametric method of normalization to be particularly advantageous for gel electrophoresis data, where strong variability in the staining and exposure procedures is compounded by spot identification error. These data show that it is also a useful method for normalizing urine protein abundance because it expresses protein abundance as it relates to other proteins in the sample.
These studies have defined candidate markers that can be used to differentiate ATN from PRA. The findings were confirmed in a set of ten patients that were not used to train the algorithm demonstrating the validity of the test. Further validation using samples obtained from other centers will be necessary to confirm the usefulness of these markers.
This project has been funded by R01DK080234 and with Federal funds as part of the NHLBI Proteomics Initiative from the National Heart, Lung, and Blood Institute, National Institutes of Health, under Contract No. N01-HV-28181. Additional support for this project came from the Department of Veterans Affairs. 2D gel electrophoresis was done by Alison Bland, Lou D'Eugenio and Melissa Dugan. Tim Taylor assisted with sample collection and preparation. Tandem mass spectrometry was done in the MUSC mass spectrometry facility with the assistance of Dr. Kevin Schey and Jennifer Bethard. We are grateful to the Nephrology fellows at MUSC for help with identification of patients and sample collection.