The management of clinically localized renal masses suspicious for renal cell carcinoma varies, partially because of gaps in the evidence base. We conducted a systematic review to summarize research gaps for the evaluation of composite models for predicting malignancy; use of percutaneous renal sampling for diagnosis; and comparative effectiveness of surgery, thermal ablation, and active surveillance. A total of 147 studies, published in 150 articles, were identified. To promote improved patient care and health outcomes, we recommend incorporation of emerging biomarkers into validated composite models, standardization of biopsy protocols, standard reporting of clinical stage, and performance of prospective studies with objective selection criteria.
Kidney cancer affects approximately 5,000 new patients each year with increasing incidence over the past few decades.1,2 Small, clinically localized tumors now account for upward of 40% of all kidney cancers.3 A simplified view of the current treatment paradigm generally favors extirpation of a mass suspicious for localized renal cell carcinoma (RCC) on a renal protocol computed tomography scan if aligned with patient preferences and surgical risk. However, the detection of a localized renal mass on imaging presents a diagnostic dilemma in urology, as the mass can represent a benign lesion, indolent cancer, or aggressive RCC. Notably, benign lesions may comprise approximately 20% of surgically resected tumors.1,2
Ideally, clinical decisions about the diagnosis and management of renal masses and localized renal cancer should be informed by high-quality evidence. Evaluation would start with a pretest probability from a composite model of predictors including imaging, and diagnosis could further be augmented by defining a role for renal mass biopsy. Indications for treatment would depend on the comparative effectiveness of various management options. However, a number of inherent research gaps and limitations, such as study type, selection bias, and lack of standardized definitions, have made it difficult for prior attempts to synthesize the renal mass literature into clinical recommendations. Although a number of opportunities for randomized trials are now available, most existing studies are not randomized or controlled. For instance, there are currently no randomized and only a few comparative retrospective trials comparing partial nephrectomy with laparoscopic cryoablation.4 Although data are sometimes collected prospectively, many analyses are performed retrospectively and are thus subject to the many inherent biases associated with this approach.5
We conducted a systematic review of the effectiveness and comparative effectiveness of different strategies in diagnosing and treating patients with a renal mass suspicious for RCC and identified research gaps in the existing literature. The purpose of this article is to identify and summarize critical weaknesses (“research gaps”) in the literature on the diagnosis and management of renal masses and localized renal cancer. We defined a research gap as an area in which the evidence base inadequately addressed a key question (KQ).
The systematic review focused on 3 main KQs developed by various stakeholder groups and submitted to the Agency for Healthcare Research and Quality (AHRQ) via an open process:
KQ1: In patients who undergo surgery for a renal mass that is suspicious for stage I or II RCC, how does the pathologic diagnosis compare with the likelihood of malignancy predicted by using a preoperative composite profile of patient characteristics, including demographics, clinical characteristics, blood or urine markers, and imaging?
KQ2a: In patients who undergo surgery for a renal mass suspicious for stage I or II RCC, what is the accuracy (ie sensitivity, specificity, positive predictive value, and negative predictive value) of percutaneous renal mass sampling (using fine-needle aspiration or core biopsy with cytopathology or surgical pathology) in establishing a diagnosis (eg malignancy, histology, and grade)?
KQ2b: In patients with a renal mass suspicious for stage I or II RCC, what are the adverse effects associated with using renal mass sampling (see KQ2a) to estimate the risk of malignancy, including direct complications (eg pain, infection, hemorrhage, and radiation exposure) and harms related to false positives, false negatives, or non-diagnostic results?
KQ3a: In patients with a renal mass suspicious for stage I or II RCC, what is the effectiveness and comparative effectiveness of the available management strategies on health outcomes?
KQ3b: Do the comparative benefits and harms of the available management strategies differ according to a patient's demographic or clinical characteristics, or disease severity defined in terms of clinical presentation, tumor characteristics (imaging), renal mass sampling result, or laboratory evaluations?
We searched MEDLINE, Embase, Cochrane Central Register of Controlled Trials and Clinicaltrials.gov from January 1, 1997 to May 1, 2015. In addition, we requested Scientific Information Packets from device manufacturers to identify additional relevant studies. The results of the searches were downloaded and imported, duplicates were screened out, and the remaining articles were uploaded for systematic review data management. This database was used to track the search results at the levels of title review, abstract review, and article inclusion or exclusion.
Study selection was based on predefined eligibility criteria of patient populations, interventions, outcome measures, and study design (Tables S1 and S2). Abstracts were screened independently by 2 reviewers and were excluded if both reviewers agreed that 1 or more of the exclusion criteria was met. Differences between reviewers regarding abstract eligibility were resolved through consensus. We used DistillerSR (Evidence Partners, 2010) to manage the screening process.
Citations promoted on the basis of the abstract screen underwent another independent screen using the full text of the articles. Additional exclusion criteria were applied at this level (Table S2). Differences regarding citation eligibility were resolved through consensus. Full-text articles underwent an additional independent review by paired investigators to determine whether they should be included in the full data abstraction.
Reviewers extracted information on general study characteristics (eg study design, study period, and follow-up), study participants (eg age, gender, race or ethnicity, etc), eligibility criteria, interventions, outcome measures and the method of ascertainment, as well as the results of each outcome, including measures of variability. One reviewer completed the data abstraction, and a second reviewer checked the first reviewer's abstraction for completeness and accuracy. We resolved differences through discussion and, as needed, through consensus among our team.
Throughout the review, the Population, Intervention, Comparison, Outcome, Timing and Setting structure was used to describe questions or parts of questions inadequately addressed by the evidence synthesized in the systematic review. The framework and impetus for the process have been previously described by the AHRQ.6 Two reviewers independently assessed the risk of bias of individual studies. We used the Quality Assessment of Diagnostic Accuracy Studies-version 2 tool for diagnostic studies, Cochrane Collaboration's tool for randomized trials, and A Cochrane Risk Of Bias Assessment Tool: for Non-Randomized Studies of Interventions for nonrandomized studies.7-9 Differences between reviewers on individual components of the tool as well as overall ratings were resolved through consensus. This was achieved with mediation by an independent party, discussion between individual reviewers, and finally weekly group discussions involving the entire study team where individual study methodology and research gaps by KQ were discussed. We graded the strength of evidence using the AHRQ Evidence-based Practice Centers Methods Guide for Conducting Comparative Effectiveness Reviews scheme (Table S3).10 The draft report was peer reviewed and posted for public comment. Comments received from invited reviewers and through the public comment website were compiled and addressed.
Figure 1 summarizes the results of our search for relevant studies. The review focuses on 147 studies, reported in 150 articles, that met the inclusion criteria. Full details are given in the AHRQ report (Table S4).11 The gaps identified by the review are organized by KQ along with recommendations for bridging the gaps (Table 1).
There were 3 primary gaps in research regarding composite models: the general paucity of data evaluating composite models predicting malignancy, lack of validation of these composite models, and limited use of laboratory biomarkers.
A composite model necessarily included data with a component of imaging and at least 1 additional category (demographics, clinical characteristics, and blood or urine tests). Although a number of studies evaluated a single predictor of malignancy, only 20 studies assessed a composite model with proper adjustment. The studies were split on directionality (whether they evaluated predictors of malignancy or predictors of benign pathology), with 10 studies each. The variables included in composite models differed between studies; therefore, individual composite models lacked validation. The most commonly evaluated variables were sex (16 studies), age (15 studies), tumor size (12 studies), tumor characteristics (9 studies), body mass index (5 studies), and incidental presentation (5 studies).
Although there are a number of emerging biomarkers and imaging technologies that may improve on the diagnostic discrimination of current composite models based on clinical findings, the literature lacked published studies using biomarkers in composite models for localized renal masses. This may be due to a failure to test potential biomarkers within a composite model or because tested biomarkers were found to be non-predictive.
Research gaps related to renal mass biopsy in the diagnosis of localized renal masses include lack of standardization of biopsy protocols, inadequate characterization of tumors, uncertainty regarding adverse events and impact on management, and absence of gold standard surgical pathology for biopsies showing benign histology.
Core biopsy has been shown to have more favorable performance characteristics than fine-needle aspiration. Still, core biopsy protocols vary in the size of needle used for biopsy and the imaging guidance used, both of which could affect nondiagnostic rates and overall biopsy yield. Diagnostic outcomes may also be related to tumor characteristics, namely tumor size, presence of a cystic component, and location, which can affect how easy a tumor is to access for a percutaneous biopsy. Performance characteristics may be biased by selection for patients with tumors that are easy to biopsy. Some studies do not report the number of initially non-diagnostic biopsies, the proportion where Fuhrman grading was possible, or the correlation to Fuhrman grade at surgical pathology.
Sequelae from a biopsy could also impact the execution of the selected management strategy. For example, perirenal hematomas may delay the ability to perform surgery or increase the difficulty of surgery, but this has not been studied. Although the overall rate of complications of renal mass biopsy was found to be low, indirect risks such as withholding anticoagulation have not been quantified, and the array of complications reported by individual studies varied. This can lead to the potentially incorrect assumption that if a complication is not reported (ie no mention of pneumothorax in a series), the number of cases is zero. Additionally, most studies are focused solely on biopsy performance characteristics. The counterfactual, of what management strategy would have been pursued in the absence of a biopsy, is not well established to assess the proportion of cases where biopsy changes clinical management and improves outcomes.
Our findings demonstrated a high positive predictive value for core biopsy but also a notable non-diagnostic rate and relatively poor negative predictive value. The findings have an associated risk of bias, as there is often no surgical pathology associated with negative or non-diagnostic biopsies. Some studies presumed that a negative biopsy result was a “true negative” based on surveillance parameters. Therefore, false negatives (ie missed cancers) may go uncounted, which would inflate the observed sensitivity and negative predictive value relative to their true values.
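The effect of presuming unverified negative biopsies to be true negatives can be illustrated with a small worked example. The Python sketch below computes the four accuracy measures named in KQ2a from a 2×2 table, first for a hypothetical cohort in which every negative biopsy is verified by surgical pathology, and then for the same cohort when most negatives are never verified. All counts and the function name are invented for illustration and are not data from the review.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical cohort (illustrative counts only, not from the review):
# 200 biopsies, 160 positive for malignancy and 40 negative.
# Suppose full surgical verification would show that 15 of the
# 40 negative biopsies were in fact cancers.
verified = accuracy_measures(tp=158, fp=2, fn=15, tn=25)

# Same cohort when only 10 negatives reach surgery (5 cancers found)
# and the remaining 30 unverified negatives are presumed true negatives.
presumed = accuracy_measures(tp=158, fp=2, fn=5, tn=35)

for name in ("sensitivity", "npv"):
    print(f"{name}: verified={verified[name]:.3f} presumed={presumed[name]:.3f}")
```

In this invented cohort, the presumed-negative analysis reports higher sensitivity and negative predictive value than full verification would, because cancers hidden among the unverified negatives are never counted as false negatives.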
The major research gaps related to comparative effectiveness include selection bias and confounding within comparative studies, poor reporting of clinical stage, and variation in reporting of treatment outcomes.
Assessment of the efficacy and comparative efficacy of management strategies is strongly confounded by the abundance of retrospective, uncontrolled studies with poor reporting of clinical stage. Although randomized controlled trials may be challenging in this space, the number of prospective observational studies is also limited. The greatest number of studies was available for the comparative outcomes of radical nephrectomy and partial nephrectomy. Strength of evidence was rated insufficient for all comparisons for the outcome of quality of life, as there were very few studies. Furthermore, very few comparative studies included active surveillance, leading to the inclusion of uncontrolled studies. For example, strength of evidence was insufficient for oncologic outcomes and overall survival of active surveillance compared with partial nephrectomy or thermal ablation because of the lack of any studies.
Clinical stage was sometimes not reported. Whereas partial nephrectomy is usually limited to localized renal masses, radical nephrectomy can be performed for more advanced tumors. The inability to appropriately determine preoperative clinical tumor stage, or the inclusion of locally advanced or metastatic tumors, was an exclusion criterion to limit selection bias. However, among studies evaluating localized renal masses, patient and tumor characteristics and key outcomes were often not stratified by clinical stage; such stratification could help determine if 1 modality was preferable for T1 compared with T2 tumors.
Finally, inconsistent reporting of relevant treatment outcomes was prevalent. For oncologic and overall survival outcomes, studies might report various year cutoffs for survival via a product limit estimator (ie 3-year survival, 5-year survival) or list only the absolute proportion of patients surviving with a median or mean follow-up, making it difficult to combine data between studies and perform robust meta-analyses. Renal functional outcomes varied in whether the reported outcome was creatinine, estimated glomerular filtration rate, incidence of chronic kidney disease, or a combination of these end points. Perioperative outcomes and harms were usually reported without controlling for surgeon experience.
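Why the two survival reporting styles are not interchangeable can be shown with a short calculation. The following Python sketch implements the product limit (Kaplan-Meier) estimator for a small invented cohort and compares the 5-year estimate with the crude proportion of patients alive at last follow-up. The function name and the data are hypothetical illustrations and do not represent the method or results of any included study.

```python
def kaplan_meier(times, events, horizon):
    """Product-limit estimate of survival at `horizon`.

    times  -- follow-up time for each patient (years)
    events -- 1 if the patient died at that time, 0 if censored
    """
    survival = 1.0
    # Walk through the distinct death times up to the horizon.
    death_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in death_times:
        # Patients still under observation just before time t,
        # including those censored exactly at t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
    return survival

# Hypothetical cohort of 10 patients (illustrative only).
times = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7]
events = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]

km_5yr = kaplan_meier(times, events, horizon=5)
crude_alive = 1 - sum(events) / len(events)
print(f"5-year Kaplan-Meier survival: {km_5yr:.3f}")
print(f"Crude proportion alive at last follow-up: {crude_alive:.3f}")
```

The two numbers differ because the product limit estimate accounts for when patients were censored and stops at a fixed time point, whereas the crude proportion depends on how long each study happened to follow its patients. Pooling one study's fixed-time estimate with another's crude proportion therefore mixes quantities that measure different things.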
Patient demographics, clinical characteristics, and disease severity are available for each patient, although these data are not easily extracted from medical records or routinely published. Our analysis identifies that these data are important in the evaluation of interventions but dramatically underreported. There was limited evidence to suggest that age, tumor size, and grade were inversely associated with cancer-specific survival among studies comparing radical and partial nephrectomy, and that age and comorbidity predicted overall survival. It could not be determined if differences in these characteristics accounted for differences in comparative effectiveness between management strategies.
An understanding of these gaps can direct future clinical research efforts. To address the research gaps identified above, we have the following recommendations.
Studies evaluating composite models should include standard demographic and tumor characteristics based on face validity and prior research, which would allow for easier comparisons between models and a true assessment of whether additional variables increase the discrimination of the model. These include patient sex, age, tumor size, and other tumor characteristics such as components of the RENAL nephrometry score. Body mass index and incidental presentation are 2 additional variables evaluated by very few studies that should be included when possible to strengthen the evidence base. If a continuous variable is evaluated in a categorical fashion, an additional analysis should also evaluate it in a continuous fashion to allow pooling of data in the future for robust meta-analyses.
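One common way to assess whether an added variable increases discrimination is to compare the concordance statistic (c-statistic, equivalent to the area under the receiver operating characteristic curve) of a base model and an extended model in the same patients. The Python sketch below computes it by pairwise comparison. The predicted probabilities, outcomes, and variable names are invented for illustration; a real comparison would also need validation in an independent cohort.

```python
def c_statistic(scores, outcomes):
    """Probability that a malignant case scores higher than a benign one.

    scores   -- predicted probability of malignancy for each patient
    outcomes -- 1 for malignant surgical pathology, 0 for benign
    Tied scores count as half a concordant pair.
    """
    malignant = [s for s, y in zip(scores, outcomes) if y == 1]
    benign = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                concordant += 1.0
            elif m == b:
                concordant += 0.5
    return concordant / (len(malignant) * len(benign))

# Hypothetical predictions for 8 patients (illustrative only).
outcomes = [1, 1, 1, 1, 1, 0, 0, 0]
# Base model, e.g. sex, age, and tumor size.
base_model = [0.9, 0.8, 0.6, 0.5, 0.7, 0.7, 0.4, 0.6]
# Extended model: base variables plus a candidate marker.
extended_model = [0.9, 0.8, 0.7, 0.6, 0.8, 0.6, 0.3, 0.5]

print(f"base c-statistic: {c_statistic(base_model, outcomes):.3f}")
print(f"extended c-statistic: {c_statistic(extended_model, outcomes):.3f}")
```

A value of 0.5 indicates no discrimination and 1.0 indicates perfect separation of malignant from benign masses. Such a comparison is only interpretable when both models share the same standard base variables.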
Furthermore, nomograms should be developed specifically for localized renal masses.2,12 A model created from patients with all stages of disease may not perform as well in a different population. For example, a nomogram based on the RENAL nephrometry score was shown to predict high-grade pathology in the training cohort including locally advanced and metastatic kidney tumors, but it did not predict high-grade disease in an external validation cohort of small renal masses.13,14
Lastly, emerging biomarkers and imaging modalities should be evaluated in the setting of composite models to assess their utility in changing management decisions. Examples of emerging biomarkers for kidney cancer include urine aquaporin-1 and perilipin-2.15 New imaging methods such as 99m technetium-sestamibi single-photon emission computed tomography are also under evaluation to differentiate between malignant and benign pathology.16 However, they require validation with sufficient power for patients with localized renal masses. Additionally, null associations or findings are as important as positive studies. Therefore, a well-done study showing no association of a new biomarker or no improvement of a composite model deserves to overcome publication bias, as it can help reduce repetition and guide new study directions.
First, we recommend standardization and detailed publication of biopsy protocols, including needle size, the number of biopsy attempts, the number of successful biopsies, and the number of patients whose procedures were aborted secondary to technical difficulties.17 Second, details on the tumor and its anatomic location need to be reported in relationship to the renal mass sampling outcomes.18 Varying tumor characteristics are likely to yield disparate diagnostic outcomes, and tumors that are more or less amenable to yielding an accurate biopsy should be identified. Third, further transparent reporting of data is needed to better characterize negative (normal renal parenchyma vs missed tumor, fibrosis, oncocytoma, angiomyolipoma, etc) and nondiagnostic biopsies and the relationship of Fuhrman grade at biopsy to surgical pathology.
Fourth, studies should report a standard array of harms and adverse events after renal mass biopsy to provide comparable data between studies, including if no events are observed. A special focus should extend to the need to stop anticoagulation or delay surgery because of the performance of a biopsy. Finally, the ideal evaluation of performance characteristics would involve performing biopsy on all patients before extirpative surgery in a blinded fashion or in the setting of a diagnostic randomized controlled trial to best assess its impact on clinical management and oncologic outcomes.19
We propose that well-designed, prospective studies with objective inclusion criteria and selection for intervention be performed when possible; this is an achievable goal that would strengthen the evidence base.20 Retrospective literature is more prone to bias and may not capture perioperative considerations such as minimally invasive procedures converted to open procedures, or planned partial nephrectomies converted to radical nephrectomies. Only 1 randomized trial was included in the systematic review, and although additional trials may be difficult to perform, especially if involving active surveillance or when comparing thermal ablation with partial nephrectomy, prospective cohorts would greatly improve the quality of perioperative and long-term data for most key outcomes.21 For thermal ablation and partial nephrectomy, inclusion criteria could be strict enough to allow analysis of outcomes for patients who would have been eligible for either treatment modality. For active surveillance, longer term prospective follow-up from cohorts such as the Delayed Intervention and Surveillance for Small Renal Masses registry will provide data on proper selection and oncologic outcomes.22 We also recognize that more comparative quality of life studies are needed, which is an area ripe for discovery.23
Greater standardization of treatment data is required. We recommend that all studies report the clinical stage of patients as analyses based on clinical stage would be more clinically relevant for preoperative decisions compared with analyses based on pathologic stage. Outcomes should be stratified by clinical stage when possible. There is a known survival bias for renal masses in the Surveillance, Epidemiology, and End Results database where analyses are possible only by pathologic stage, and incomplete data are a concern.24,25
Survival end points need to be defined precisely, and methods of calculating survival should be transparent. For example, the definition of cancer-specific survival may be confused with recurrence-free survival, disease-specific survival, or cancer-specific mortality. In addition to defining terminology, methods sections should include how survival data were calculated, including when patients were censored from analyses.
Renal functional and survival outcomes need to be standardized in the routine reporting of outcomes. Immediate postoperative renal functional data are insufficient and inaccurate for reporting the renal effects of interventions.26 We recommend reporting baseline renal function within 1 month of intervention or management, as well as renal function data at 1 and 12 months after surgical intervention. Reporting estimated glomerular filtration rate is preferable to reporting serum creatinine. At a minimum, survival outcomes (ie local recurrence, metastasis, cancer-specific, and overall) should be reported at 1, 3, and 5 years depending on the amount of available follow-up.
Lastly, a standardized definition of surgical competency or expertise is needed. This definition may be achieved either by case volume or by a review of proficiency, success, and complications associated with index cases. Defining surgical or technical proficiency will be an ongoing challenge, and standardizing how this is defined is paramount to comparative studies and health policy.
Required reporting of basic demographics, clinical stage, tumor characteristics including anatomic location within the kidney, and pre-intervention and post-intervention health assessments will improve the strength of evidence. It will allow easier comparisons between studies as well as stratified assessments to determine patient populations best suited to each management strategy. Age and comorbidity are known to modify comparative survival and surgical risk to aid in management decisions.27
There is a robust literature regarding the diagnosis and management of renal masses and localized renal cancer; however, the literature is not without critical weakness or research gaps. To improve patient care and health outcomes, we recommend incorporation of emerging biomarkers into composite models predicting malignancy, standardization and detailed publication of biopsy protocols, standard reporting of clinical stage before intervention, and performance of prospective studies with objective selection criteria when possible.
The authors thank Allen Zhang and Emily Little for their assistance in data abstraction, and Associate Editor, Tim Carey, MD, MPH, for revisions and commentary. The authors also thank the Key Informants Technical Expert Panel, Peer Reviewers, Dr. Steven Campbell, and the AHRQ Task Order Officer Dr. Aysegul Gozu for their feedback and insight.
Financial Disclosure: Emmanuel Iyoha, Ritu Sharma, and Eric B. Bass are members of Evidence-Based Practice center, which was funded to conduct the systematic review by Contract No. HHSA2902012000071 from the Agency for Healthcare Research and Quality, U.S. Department of Health and Human Services.
The remaining authors declare that they have no relevant financial interests.
The authors of this report are responsible for this work's content. Statements in the report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services.
Appendix: Supplementary Data: Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.urology.2016.08.013.