Kidney transplant recipients who develop signs of renal dysfunction or proteinuria one or more years after transplantation are at considerable risk for progression to renal failure. To assess the kidney at this time, a “for-cause” biopsy is performed, but this provides little indication as to which recipients will go on to organ failure. In an attempt to identify molecules that could provide this information, we used microarrays to analyze gene expression in 105 for-cause biopsies taken between 1 and 31 years after transplantation. Using supervised principal components analysis, we derived a molecular classifier to predict graft loss. The genes associated with graft failure were related to tissue injury, epithelial dedifferentiation, matrix remodeling, and TGF-β effects and showed little overlap with rejection-associated genes. We assigned a prognostic molecular risk score to each patient, identifying those at high or low risk for graft loss. The molecular risk score was correlated with interstitial fibrosis, tubular atrophy, tubulitis, interstitial inflammation, proteinuria, and glomerular filtration rate. In multivariate analysis, molecular risk score, peritubular capillary basement membrane multilayering, arteriolar hyalinosis, and proteinuria were independent predictors of graft loss. In an independent validation set, the molecular risk score was the only predictor of graft loss. Thus, the molecular risk score reflects active injury and is superior to either scarring or function in predicting graft failure.
Kidney transplants that develop dysfunction or proteinuria after one year following transplantation are at considerable risk for progression to renal failure (1). Certain histopathologic features, particularly interstitial fibrosis and tubular atrophy (IFTA), correlate with graft dysfunction, treatment response, and risk of progression to failure in transplants as well as in native kidneys (2–8). This has led to the belief that late failure of kidney transplants is due to progressive nonspecific scarring, possibly related to calcineurin inhibitor toxicity. However, IFTA is common in kidney transplants, reflecting the burden of injury including donor death, organ harvest, and the transplantation process, and mostly develops in the first year (9). Recent analyses (1, 10) indicate that the main cause of late graft loss is not unexplained scarring or calcineurin inhibitor toxicity but specific disease entities, particularly late antibody-mediated injury and recurrent disease.
Identifying the molecules associated with graft failure could potentially lead to interventions that would slow the progression of organ failure. Some of the individual molecules that predict risk of failure in native proteinuric kidney disease include VEGF and molecules associated with activation of intracellular hypoxia response (11). In kidney transplant biopsies, many molecules show altered expression related to rejection or injury (12–16). However, no comprehensive analysis of the relationship between the transcriptome and allograft survival has been performed.
The emergence of microarrays permits a genome-wide survey of the transcripts associated with future failure in renal allografts presenting with clinical indications for a biopsy, i.e., biopsies for cause (BFCs). The present study analyzed the relationship between gene expression in late BFCs in human kidney transplants and subsequent graft loss and assessed the predictive value of gene expression alone and in combination with histologic lesions and clinical variables. We evaluated the performance of these genes in an independent validation set and in a population of early biopsies that have a very low risk of subsequent graft failure.
Because almost all failures occurred in patients who presented for a BFC after 1 year following transplantation (1), this group was selected for the analysis of risk prediction. The study population included 105 consecutive consenting patients who underwent BFCs between 1 and 31 years after transplantation (median, 57 months). Where more than 1 biopsy was available per patient, only the first biopsy was used for analyses. Median time to graft loss was 14 months, and the median follow-up after biopsy for patients without death or graft loss was 32 months.
We observed 30 graft failures during follow-up, and 4 patients died with a functioning graft. Demographics and clinical characteristics of all patients are outlined in Table 1. Grafts that subsequently failed had higher incidence of proteinuria and rapid deterioration in function before biopsy and lower glomerular filtration rate (GFR) at time of biopsy. There were no differences in primary disease, time after transplantation, maintenance immunosuppression, or incidence of anti-HLA antibodies between grafts that subsequently failed after biopsy and those that did not. As previously reported, the main disease diagnoses in biopsies from grafts that failed were antibody-mediated rejection (ABMR) (either C4d-positive or C4d-negative) and glomerulonephritis (1).
To identify all genes associated with graft loss, we performed a Cox regression on the entire dataset of 105 late biopsies, using all probesets that passed the interquartile range (IQR) filter (n = 11,500) (see Methods). 886 genes represented by 1,312 probesets were significantly associated with graft failure (598 positively and 288 negatively) at the 0.0001 level (uncorrected P value) (Supplemental Table 1; supplemental material available online with this article; doi: 10.1172/JCI41789DS1).
We compared the genes associated with subsequent graft loss with those associated with rejection at the time of biopsy in the same dataset. Rejection-associated genes had been identified in a previous analysis (14) using the BioConductor software package limma (17). Rejection-associated genes and genes associated with future graft loss were derived from the same biopsies. Of the 886 genes associated with graft loss, only 82 (9%) overlapped with the gene list associated with rejection at the same P value cutoff (Figure 1). The transcripts associated with future graft loss were primarily those that had been annotated as associated with tissue injury, matrix remodeling, and epithelial dedifferentiation (Table 2), while the rejection-associated genes had been annotated as reflecting inflammation, i.e., IFN-γ effects, infiltrating T cells, and macrophages (18).
Because we aimed to develop a biopsy-based risk prediction method, we built a gene-based classifier to predict graft loss. Classifier results were obtained using a multiple 10-fold cross-validation method (19). Details of the analysis are described in Methods. A diagram illustrating the process of building the classifier and validation steps is shown in Supplemental Figure 1.
The classifier used 2,748 probesets on at least one occasion, and 117 of the probesets were used in at least 50% of the classifiers. The 30 annotated genes used most frequently by the classifier to assign the risk of graft loss are shown in Table 3, ordered by the proportion of times each gene was used in all the cross-validation/resampling loops. Twenty-seven of the top 30 genes (90%) had previously been annotated in experimental transplantation systems as members of pathogenesis-based transcript sets (PBTs) reflecting tissue injury and matrix remodeling (see Methods and Supplemental Table 2).
We used the molecular classifier to assign a prognostic molecular risk score to each biopsy using supervised principal components analysis (PCA) (20). Based on the average risk score across the test sets in the 100 validation loops, we assigned patients into either the high- or low-risk group (n = 52 high-risk, n = 53 low-risk patients) (Figure 2). The range of risk scores for each biopsy across the validation loops is shown in Supplemental Figure 2. The mean risk score in those kidney grafts surviving to 1 year after biopsy was –0.31 versus 1.85 in those grafts that failed by 1 year (t test, P = 9.3 × 10⁻⁸). In the high-risk group, 25 of 52 patients progressed to graft loss after biopsy, compared with only 5 losses in the low-risk group (log-rank test, P = 3 × 10⁻⁷). Kaplan-Meier survival curves for the 2 risk groups are shown in Figure 3.
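For readers less familiar with the Kaplan-Meier curves referred to above, the following is a minimal sketch of the product-limit estimator in plain Python. The function name `kaplan_meier` and the follow-up data are hypothetical illustrations, not the study's code or data; grafts that did not fail are treated as censored, as in the analysis described here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct failure time.

    times  : follow-up time after biopsy for each graft
    events : 1 if the graft failed at that time, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = 0
        leaving = 0
        # Collect every graft whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            failures += pairs[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            # Product-limit step: multiply by the fraction surviving at t.
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up data in months (not taken from the study).
high_risk_curve = kaplan_meier([3, 6, 6, 10, 14, 20], [1, 1, 0, 1, 0, 1])
for month, prob in high_risk_curve:
    print(f"month {month}: estimated graft survival {prob:.2f}")
```

Computing one such curve per risk group and plotting both gives the kind of comparison shown in Figure 3; the log-rank test then asks whether the two curves differ more than chance would predict.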
Table 4 shows the correlation between the molecular risk score and histologic and clinical variables. The risk score was negatively correlated with renal function (estimated GFR) at the time of biopsy and positively correlated with proteinuria, interstitial inflammation, atrophy, and fibrosis and less strongly with tubulitis. It was not correlated (P > 0.05) with time of the biopsy after transplantation, glomerular or arterial changes, or arteriolar hyalinosis.
We examined the relationship between risk score and time to failure (Figure 4). Among the patients with graft loss, a higher risk score was associated with shorter time to failure (P = 0.0006). Among patients censored only because of end of study, there was no relationship between risk score and time to censoring. Among patients censored because of death, all 4 of whom died fewer than 400 days after biopsy, there was a significant correlation between risk score and time to death (P = 0.01). This may reflect the effect of a failing kidney transplant on the risk of death, but the small number of observations precludes firm conclusions. Thus, we included deaths with functioning grafts as censored data in the classifier and Cox regression analyses. We conclude that the risk score predicts not only whether failure will occur but also time to failure.
The association of molecular, clinical, and histologic features with graft loss was assessed in a univariate Cox regression analysis (Table 5). In this analysis, the features significantly associated with graft loss were molecular risk score, proteinuria, interstitial fibrosis, tubular atrophy, peritubular capillary basement membrane multilayering (PTCML), mesangial matrix score, interstitial inflammation, glomerulitis, low GFR, and absence of arteriolar hyalinosis. Many variables were not associated with outcome, including time of biopsy after transplantation, the presence of HLA antibody, C4d staining, transplant glomerulopathy, capillaritis, and arterial changes.
To assess the independent associations of these variables, we performed a forward stepwise multivariate regression (Table 6) of all features that reached significance (P < 0.05) in the univariate analysis. This analysis identified the independent predictors of graft loss as the molecular risk score, PTCML, proteinuria, and arteriolar hyalinosis. Surprisingly, neither GFR nor IFTA was significantly associated with graft loss when the risk score was included in the multivariate analysis.
Since the presence of proteinuria and GFR were known before the biopsy, we repeated the multivariate analysis excluding these factors to determine which information arising from the biopsy (histology, microarray results) predicted failure. When GFR and proteinuria were excluded, the independent risk factors were the molecular risk score, PTCML, and absence of arteriolar hyalinosis.
When the receiver operating characteristic (ROC) curves for each of the features independently associated with graft loss in the multivariate analysis were compared (Figure 5A), the molecular risk score showed a greater area under the curve (AUC = 0.83) than the clinical or histologic features (AUC = 0.63–0.76). When the P values for the differences in AUCs were calculated by permutation test, the differences versus proteinuria (P = 0.17) and PTCML (P = 0.06) were not significant, whereas the difference versus arteriolar hyalinosis was (P = 0.002). However, the maximum accuracy obtainable using the risk score was significantly greater than with proteinuria (Figure 5B) (P = 0.001). In fact, using the presence of proteinuria as a threshold led to a lower accuracy (0.69) than did the null distribution (accuracy, 0.71; i.e., guessing that all samples survived). In contrast, all risk score thresholds greater than 0.0 produced higher accuracies than the null distribution.
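The comparison with the null distribution can be made concrete with a short sketch. The helper names below are hypothetical, and the six-sample score list is invented for illustration; only the counts of 105 biopsies and 30 graft failures come from the text, and the sketch assumes that the 4 deaths with a functioning graft are counted as non-failures.

```python
def accuracy_at_threshold(scores, failed, threshold):
    """Fraction classified correctly when score > threshold predicts failure."""
    correct = sum((s > threshold) == bool(f) for s, f in zip(scores, failed))
    return correct / len(scores)


def null_accuracy(failed):
    """Accuracy of guessing that every graft survives."""
    return (len(failed) - sum(failed)) / len(failed)


# Counts from the text: 105 late biopsies, 30 subsequent graft failures.
outcomes = [1] * 30 + [0] * 75
baseline = null_accuracy(outcomes)
print(f"null accuracy: {baseline:.2f}")

# Invented toy scores, used only to show the threshold calculation.
toy_scores = [2.1, 1.4, 0.3, -0.2, -0.8, -1.1]
toy_failed = [1, 1, 0, 1, 0, 0]
toy_accuracy = accuracy_at_threshold(toy_scores, toy_failed, threshold=0.0)
print(f"toy accuracy at threshold 0.0: {toy_accuracy:.2f}")
```

Under these assumptions the baseline works out to 75/105, which rounds to the 0.71 quoted above. Any predictor has to beat this figure before its accuracy is informative, which is why a proteinuria threshold with accuracy 0.69 adds nothing on this measure.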
We assessed the strength of predictions from simple single gene classifiers, based solely on the expression values of the highest ranked and the 100th, the 500th, and the 5,000th ranked genes in each training set (Supplemental Figure 3). Predictions from single genes, even those that were not near the top of the ranked list of genes, were very similar to those using the full PCA model. Thus, the risk for graft loss can be predicted adequately by any of a great many genes. This is due to the highly coordinated changes in expression found in many thousands of genes.
We assessed the performance of the classifier in an independent set of biopsies (n = 48) taken for clinical indication more than 1 year after transplantation at the University of Minnesota, with patient characteristics and clinical and histologic features similar to those in the studies described above. This population experienced 11 graft losses and had a median follow-up time of 406 days. Risk scores were calculated by supplying the classifier built from our main dataset with the gene expression values from the Minnesota dataset. The relationship of the molecular risk score to graft loss, the Kaplan-Meier plots of high- versus low-risk groups (defined using the risk threshold in the main dataset), and the ROC curve illustrating the tradeoff between sensitivity and specificity (Figure 6, A–C) showed results similar to those in the original dataset. In the univariate analysis in this validation set, only the molecular risk score was significantly associated with subsequent graft loss (P = 0.0004). Thus, the risk score was superior to classic predictors such as IFTA and low GFR.
To assess whether high risk scores always predict risk for graft loss, we applied the classifier algorithm (based on late biopsies) to a set of biopsies taken within the first year after transplantation, which had very little risk of graft loss during follow-up in our study (Figure 7). Of 73 early biopsies (<1 year after transplantation), 44 (60%) had a risk score above the high-risk threshold from the late biopsy analysis (sensitivity, 1.00; specificity, 0.41; Table 7). However, only 2 of these grafts subsequently failed. Both of these were actually biopsied late in the first year, at 273 days and 360 days. Most of the remaining 42 early biopsies, which were classified as high risk but did not fail during the follow-up period, were diagnosed as T cell–mediated rejection (TCMR) (n = 8), borderline TCMR (n = 4), polyoma virus nephropathy (n = 3), or acute tubular necrosis (n = 21). When comparing early and late biopsies with high-risk scores, we found differences in diagnostic categories: early biopsies with high-risk scores (but low incidence of failure) had mainly acute tubular necrosis and TCMR, while late biopsies with high-risk scores (and high incidence of failure) were often diagnosed as having ABMR or glomerulonephritis (Table 8).
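The sensitivity and specificity quoted for the early biopsies follow directly from the counts given in the text. The short calculation below is a sketch with hypothetical variable names; it assumes that no low-risk early graft failed, which is implied by the reported sensitivity of 1.00 rather than stated as a count.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


total_early = 73       # early biopsies (from the text)
high_risk = 44         # scored above the late-biopsy threshold (from the text)
failed_high_risk = 2   # high-risk grafts that later failed (from the text)
failed_low_risk = 0    # assumption, implied by sensitivity = 1.00

early = diagnostic_metrics(
    tp=failed_high_risk,
    fp=high_risk - failed_high_risk,
    fn=failed_low_risk,
    tn=(total_early - high_risk) - failed_low_risk,
)
for name, value in early.items():
    print(f"{name}: {value:.2f}")
```

The specificity of 29/71 rounds to the reported 0.41, while the positive predictive value of 2/44 is under 5%, which quantifies how rarely a high risk score in an early biopsy was followed by graft loss.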
To exclude that the high rate of false-positive predictions in the early biopsy group was due to a statistical bias (based on the fact that the classifier was built on late biopsies and applied to the clinically different group of early biopsies), we built a separate classifier on the entire dataset (early and late biopsies). In this analysis, the molecular risk score continued to have a high predictive value for graft loss, but the difference between early and late biopsies in terms of positive predictive value remained (Table 7).
Thus, reversible or treatable early injury and rejection can induce the same gene expression changes as the disease processes leading to graft loss in the late biopsies, without leading to graft failure.
The present study developed a gene expression–based classifier to predict kidney allograft failure after a late biopsy (>1 year after transplantation) for clinical indications. The molecular risk scores were strongly associated with graft loss. Using previously annotated transcript sets reflecting major biological events in renal transplants, we found that the transcripts used by the classifier to predict graft failure were those reflecting tissue injury, dedifferentiation of the epithelium, and tissue remodeling, but not those reflecting inflammation (IFNG effects and T cell or macrophage infiltration). The predictive ability of the classifier was high, with a sensitivity of 0.83 and a specificity of 0.63 at the high/low risk threshold. The gene expression risk score correlated with fibrosis and atrophy, interstitial inflammation, and glomerulitis but independently predicted graft failure in a multivariate analysis. The risk score was validated in an independent set of late BFCs.
The risk score is an indicator of active injury, whose significance depends on the disease state of the patient. Risk predictions were accurate when applied to the late biopsy population, where subsequent progression is considerable, but not when applied to early biopsies, where the diseases operating have a low risk of progression when treated. Early after transplantation the diseases causing injury are self-limited (acute tubular necrosis) or reversible by therapy (e.g., TCMR), but in late biopsies the diseases causing injury are progressive and unresponsive to treatment (ABMR, recurrence of primary disease). Thus, the probability of progression after a biopsy for clinical indications is conferred both by the presence of active injury (the risk score) and by the diseases inducing the active injury response and their potential for spontaneous resolution or response to therapy. The classifier and the individual genes predict risk not in absolute terms but in relation to the diseases or injury mechanisms disturbing the kidney and triggering the biopsy.
The transcripts predicting graft loss reflect an ongoing response to injury, including epithelial distress and dedifferentiation, with reexpression of developmental genes and loss of transcripts associated with differentiated epithelium as well as remodeling of the matrix. Many of the molecules used by the classifier to predict progression to renal failure are already known to be involved in responses to injury and play important roles in kidney development. For example, nicotinamide N-methyltransferase (NNMT) and versican (VCAN), both associated with cell migration in cancer (21–24), may represent dedifferentiation and epithelial-mesenchymal transition. ITGB6 is expressed by renal epithelium during stress and injury (25) and activates TGFB1 (26). HAVCR1 (also known as KIM1) is a well-established feature of the injured kidney (27). Interestingly, the collagen genes were not prominent in the list, suggesting that the genes predicting graft loss represent an active response to injury in the epithelium and matrix but not fibrogenesis per se. In previous analyses, we have performed extensive comparisons between microarray results and RT-PCR, which showed excellent reproducibility of both methods and confirmed the robustness of microarray results (28).
The genes associated with graft loss in late biopsies indicate that an active ongoing tissue response to injury is the final common pathway linking mechanisms of inflammation and noninflammatory disease states to parenchymal loss, dysfunction, and eventual kidney failure. The changes in the expression of these genes represent a stereotyped response of the tissue to injury, a structured program. Many transcripts were also found in mouse isografts, which allowed us to map the time-dependent intrinsic responses of the nephron to a clearly defined and self-limited stress, indicating that these genes represent inherently reversible injury (29, 30), if the injury mechanism is self limited or treatable. The gene list was similar to the transcript changes we recently found correlating with GFR disturbances in BFCs (31), which also reflect perturbations in the parenchyma and are indicators of an ongoing response to injury and thus potentially of a reversible epithelial injury repair process. This was further corroborated by observing high-risk scores in early biopsies from kidneys that did not progress to failure. Thus, the transcript changes reflect a perturbation in kidney biology that indicates ongoing remodeling and repair but not inevitable decline toward failure.
The risk score emerges as an independent predictor in multivariate analysis, indicating that transcriptome changes provide information beyond that derived from histology, demographics, GFR, or proteinuria. A molecularly based test has the potential to be more objective than a biopsy read by a pathologist, despite the standardization of the Banff criteria, because of intraobserver variation. In addition, the features of active injury are not captured by morphology but are reflected by gene expression changes. Thus, the risk score emerges as a more robust predictor of risk in the validation set than the classical factors: atrophy and fibrosis, low GFR, and proteinuria. The fact that not all grafts with a high-risk score failed, resulting in false positives, is not unexpected given the fact that patients were treated after the biopsy, resulting in reversal of the injury process in some cases. Since we did not include protocol biopsies performed after 1 year, we do not know whether there is a molecular signature of patients at high risk for graft failure in those without symptoms/signs of ongoing injury.
The superiority of the risk score to the classical features associated with progression to renal failure after a biopsy (atrophy, fibrosis, low GFR, proteinuria) indicates that these actually predict progression because they are correlated with an active injury response in biopsies for clinical indications. Atrophy, fibrosis, and low GFR indicate nephron loss, but the risk score subsumes this risk because it actually reflects nephrons in distress. This may offer hope that progression to renal failure is not inevitable if we can arrest the diseases causing progression. In other words, we find evidence here not of an overwhelming “point of no return” but of active injury that could be reversed if we find therapies that arrest the cause of the active injury.
Written informed consent was obtained from all study patients. The study was approved by the institutional review boards of the University of Alberta (issue 5299), the University of Illinois, Chicago (protocol 2006-0544), the University of Minnesota (protocol HSC#0606 M87646), and the Hennepin County Medical Center (protocol HSR#06-2670). All consenting renal transplant patients undergoing BFCs as standard of care between September 2004 and October 2007 at the University of Alberta or between November 2006 and February 2007 at the University of Illinois were included in the analysis. In addition to our cross-validation analysis of the Edmonton dataset, biopsies obtained from Minnesota between September 2006 and September 2007 were used as an independent validation set.
Biopsies were obtained under ultrasound guidance by using spring-loaded needles (ASAP Automatic Biopsy, Microvasive). In addition to the cores required for standard histopathology, we collected one core for gene expression studies. The biopsy sample processing was performed as described in detail in our previous study (14).
All biopsies were assessed using the updated Banff 07 criteria (32, 33) by a pathologist who was blinded to the results of molecular studies. All biopsies had adequate cortical tissue for analysis by Banff criteria, with the exception of 2 biopsies that had no arteries. Diagnostic groups included TCMR, borderline TCMR, ABMR, IFTA not otherwise specified, glomerulonephritis, acute tubular necrosis, and BK virus nephropathy. In addition to ABMR with positive C4d staining (as specified in the Banff classification), we included the emerging category of C4d-negative ABMR (published in ref. 1), which is defined by the presence of circulating donor-specific HLA antibodies and evidence of microcirculation changes in the biopsy (presence of glomerulitis, transplant glomerulopathy, peritubular capillaritis, PTCML).
PBTs represent biological processes during rejection and other types of injury in renal allografts. They serve as an annotation tool, with which genes identified in our analysis can be assigned to biological processes relevant to transplantation. PBTs were derived previously from mouse transplant models and in vitro human cell lines (15, 29, 30, 34–40). The definitions and algorithms for the PBTs referenced in the tables can be found in Supplemental Table 2. The PBTs included IFN-γ–inducible transcripts: GRIT1 and GRIT2 (34); cytotoxic T cell–associated transcripts: CAT1 (15) and QCAT (35); NK cell–associated transcripts: NKST (36); classical macrophage activation transcripts: IMAT (37); alternative macrophage activation transcripts: AMAT1 (38) and AMAT (IL-4–inducible transcripts); B cell–associated transcripts: BAT (39); immunoglobulin transcripts: IGT (39); endothelial transcripts: ENDAT (40); injury and repair transcripts: IRIT subsets IRIT(E), IRIT(I), and IRIT(L) (29); transcripts associated with severe tissue injury: GST and CISTS (38); kidney parenchymal transcripts: KT1 and KT2 (30); and transcript sets reflecting fibroblast activation (FIBET and FIBTG) or TGF-β activation (TGFB and TGFB).
Because we aimed to develop a biopsy-based risk prediction method, graft survival was assessed as time between biopsy and graft failure/censoring, not time between transplantation and failure/censoring. Patients were censored for the end of study (July 26, 2009), death with functioning graft, or loss to follow-up. Graft failure was defined as return to dialysis (n = 30).
Microarray data files were preprocessed using robust multichip averaging (RMA) in Bioconductor and subjected to variance-based filtering (41) as described previously (34). Nonspecific IQR filtering was used to eliminate probesets with low variation across the dataset. Of the original 54,675 probesets, 11,500 passed this filtering step and were retained for further analysis. Expression and phenotype data, as well as the CEL files, are available at the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE21374).
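As an illustration of nonspecific IQR filtering, the sketch below keeps only probesets whose interquartile range across samples exceeds a cutoff. The function names, probeset labels, expression values, and the cutoff of 0.5 are all hypothetical; the text does not state the cutoff used in the study, and the analysis itself was run in Bioconductor rather than Python.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range of one probeset's expression across samples."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def iqr_filter(expression, cutoff):
    """Keep probesets whose IQR across samples exceeds the cutoff."""
    return {probe: vals for probe, vals in expression.items() if iqr(vals) > cutoff}


# Invented log-scale expression values for two probesets in six samples.
expression = {
    "probe_flat": [7.0, 7.1, 7.0, 7.1, 7.0, 7.1],
    "probe_variable": [5.0, 9.0, 6.5, 8.0, 4.5, 10.0],
}
kept = iqr_filter(expression, cutoff=0.5)
print(sorted(kept))
```

The filter is called nonspecific because it never looks at the outcome: a probeset is dropped only for varying too little to be informative, so it does not bias the later survival analysis.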
Classifier results were obtained using a multiple 10-fold cross-validation method (19). In each iteration, the data were divided into 10 roughly equal-sized subsamples. Nine of the subsamples were used to predict the risk scores of the remaining “left-out” subsample. This procedure was repeated 10 times, each time using a different left-out subsample, so that all biopsies received a single predicted risk score. Within each of the 10 folds of this algorithm, the genes used in the classifier were reselected based only on those samples not being left out, i.e., only using the training set for that left-out subsample. The supervised PCA method uses a shrinkage algorithm (20, 42), ordering the selected genes by their Cox regression P values within each training set. The shrinkage procedure itself selects for genes whose expression is most stable within samples from the same phenotypic class (e.g., within failures and within non-failures). One of the desirable consequences of this method is that many genes with similar P values, in terms of their association with graft loss, can be eliminated in such a way as to retain those with the most informative predictive content. No fixed P value cutoff is used. In each training set, the shrinkage threshold (equal to the number of genes used) was chosen by maximizing the predictive accuracy in that particular training set. This entire procedure was repeated 100 times (each time using a different random 10-fold data split, following the method used in ref. 19), and each biopsy’s average risk score over all 100 iterations was recorded. Based on the average risk score, patients were assigned to one of 2 risk groups (high and low), using the median across all biopsies as the cutoff between both groups. In the case of multiple biopsies from one patient, only the first biopsy was used. A diagram illustrating the process of building the classifier and validation steps is shown in Supplemental Figure 1.
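The resampling scheme described above can be sketched as follows. This is an illustrative skeleton rather than the authors' implementation: `repeated_kfold_scores`, `median_split`, and `toy_fit` are hypothetical names, the data are invented, and `toy_fit` is a deliberately simple stand-in (it picks the single gene with the largest mean difference between failures and non-failures) for the supervised PCA model with Cox-based gene ranking. What the sketch does preserve is the key design point: gene selection happens inside each training set only, so the left-out fold never influences which genes are used.

```python
import random
from statistics import mean, median


def repeated_kfold_scores(samples, fit, n_folds=10, n_repeats=100, seed=0):
    """Average out-of-fold risk score per sample over repeated k-fold splits.

    fit : callable(training_samples) -> function(sample) -> float.
          All gene selection must happen inside `fit`.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(n_repeats):
        order = list(range(len(samples)))
        rng.shuffle(order)  # a different random split on every repeat
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in folds:
            held = set(held_out)
            training = [samples[i] for i in order if i not in held]
            score = fit(training)  # genes are reselected on the training set
            for i in held_out:
                totals[i] += score(samples[i])
    return [t / n_repeats for t in totals]


def median_split(scores):
    """Assign 'high' or 'low' risk using the median score as the cutoff."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def toy_fit(training):
    """Stand-in model (NOT supervised PCA): score with the most separated gene."""
    n_genes = len(training[0]["expr"])

    def mean_diff(g):
        failed = [s["expr"][g] for s in training if s["failed"]]
        intact = [s["expr"][g] for s in training if not s["failed"]]
        return mean(failed) - mean(intact)

    best = max(range(n_genes), key=lambda g: abs(mean_diff(g)))
    sign = 1.0 if mean_diff(best) > 0 else -1.0
    center = mean(s["expr"][best] for s in training)
    return lambda s: sign * (s["expr"][best] - center)


# Invented data: gene 0 is higher in failures, gene 1 is uninformative.
samples = [
    {"expr": [(2.0 if i < 10 else 0.0) + 0.05 * i, (i % 3) * 0.1],
     "failed": 1 if i < 10 else 0}
    for i in range(20)
]
risk_scores = repeated_kfold_scores(samples, toy_fit, n_folds=10, n_repeats=5)
groups = median_split(risk_scores)
print(groups)
```

Averaging over many random splits, as the study did with 100 repeats, reduces the dependence of any one biopsy's score on a particular partition of the data.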
Single gene classifiers were assessed by the same cross-validation method. In each training set, the nth ranked gene (n = 1, 100, 500, or 5,000) was determined, and its expression value in the corresponding test set was used as the risk score. Since a different nth ranked gene might occur in every training/test set split, all expression values were first standardized by subtracting that gene’s mean expression value and dividing by the gene’s standard deviation.
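The standardization step is an ordinary z-score. The sketch below shows the arithmetic for one gene; the function name and values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def standardize(values):
    """Z-score one gene's expression values across samples."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (assumed)
    return [(v - m) / sd for v in values]


# Invented expression values for one gene in three samples.
z_scores = standardize([6.0, 8.0, 10.0])
print(z_scores)
```

Putting every gene on this common scale is what makes scores comparable across splits in which different genes happen to occupy the nth rank.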
The full classifier derived from the Edmonton dataset was also applied to the gene expression values of an independent validation set from Minnesota, and risk scores were assigned.
Univariate associations between gene expression and graft survival were assessed by Cox regression analysis (P value cutoff = 0.0001). Multivariate regressions using the molecular risk score, pathology lesions, and clinical variables were built using a forward stepwise model using all features that reached significance (P < 0.05) in the univariate analysis. False discovery rates were estimated using the R package fdrtool (43). Unless otherwise specified, a P value of 0.05 was considered statistically significant.
Special thanks go to Zija Jacaj for help with collection of the clinical data and to Vido Ramassar and Anna Hutton for technical support. This research has been supported by funding and/or resources from Genome Canada, Genome Alberta, the University of Alberta, the University of Alberta Hospital Foundation, Roche Molecular Systems, Hoffmann–La Roche Canada Ltd., the Alberta Ministry of Advanced Education and Technology, the Roche Organ Transplant Research Foundation, the Kidney Foundation of Canada, and Astellas Canada. P. Halloran held a Canada Research Chair in Transplant Immunology until 2008 and currently holds the Muttart Chair in Clinical Immunology.
Conflict of interest: The authors have declared that no conflict of interest exists.
Citation for this article: J Clin Invest. 2010;120(6):1862–1872. doi:10.1172/JCI41789.
See the related Commentary beginning on page 1803.