We examined the relationship of surgery with survival among the 7,086 individuals 30 years or older with metastatic kidney cancer in the SEER dataset diagnosed between 1988 and 2002 [24]. We excluded 2,836 individuals because of missing tumor size, surgery, or mortality information; most of these (2,562) were missing tumor size. Our final cohort for analysis was 4,250 individuals. displays characteristics of the sample stratified by surgical and missing grade status.

Patients who underwent partial, complete, or radical nephrectomy or nephrectomy NOS were classified as undergoing surgery. Patients who underwent only biopsy, exploratory surgery, palliative bypass, or had unknown status were classified as having not received surgery. There were 1,837 (43%) individuals who did not have surgery and 2,413 (57%) who did have surgery. A total of 2,875 (68%) individuals had missing grade data, with 83% of those who did not have surgery missing grade information compared with 52% of those who did have surgery. Grade was defined as specified in the SEER database except for the collapsing of the two highest grades: well differentiated (grade 1), moderately differentiated (grade 2), and poorly differentiated, undifferentiated, or anaplastic (which we label as grade 3).
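The classification rules above can be sketched in code. This is a minimal illustration only: the string labels, the names `SURGERY_PROCEDURES` and `GRADE_MAP`, and both helper functions are hypothetical, and the actual SEER surgery and grade codes are not reproduced here.

```python
# Hypothetical text labels standing in for the SEER surgery and grade codes.
SURGERY_PROCEDURES = {
    "partial nephrectomy",
    "complete nephrectomy",
    "radical nephrectomy",
    "nephrectomy NOS",
}

# The two highest SEER grades are collapsed into a single grade 3.
GRADE_MAP = {
    "well differentiated": 1,
    "moderately differentiated": 2,
    "poorly differentiated": 3,
    "undifferentiated": 3,
    "anaplastic": 3,
}


def had_surgery(procedure):
    """Return True only for the nephrectomy categories.

    Biopsy, exploratory surgery, palliative bypass, and unknown status
    all count as not having received surgery.
    """
    return procedure in SURGERY_PROCEDURES


def collapse_grade(label):
    """Map a grade label to 1, 2, or 3; return None when grade is missing."""
    return GRADE_MAP.get(label)
```

Under this sketch, a patient with unknown surgical status is placed in the non-surgery group, and a missing grade stays missing rather than being assigned to a category.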

Demographic information is listed in . After stratifying by missing grade, those who had surgery were younger on average and had larger tumors. In the stratified data, the surgery group also had fewer women, more married individuals, and fewer non-whites than the non-surgery group. For these analyses, we grouped a small number of individuals (fewer than 25) with missing race information into the non-white category, and 93 individuals with missing marital status into the non-married category. Those with missing grade information were more likely to have been diagnosed in earlier time periods (e.g., pre-1992) than those with complete grade information. Among those with grade data available, grade appeared to be worse in those who had surgery than in those who did not. This may be because patients who underwent surgery had more adequate pathologic specimens, allowing for assessment of tumor grade.

In we present Kaplan-Meier curves of mortality outcomes by stage across the 3 grades. We see that those who did not have surgery in the sample had substantially worse outcomes on average than those who did have surgery. This may be due in large part to selection bias, since less healthy patients will likely be considered poor surgical candidates.
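For readers less familiar with the estimator, the product-limit (Kaplan-Meier) calculation behind such curves can be sketched as follows. This is a minimal illustration on made-up data, not the code used for the analysis; the function name `kaplan_meier` and the toy inputs are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times[i] is the follow-up time for subject i; events[i] is 1 for a
    death and 0 for censoring. Returns a list of (time, survival) pairs,
    one for each distinct time at which at least one death occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Toy data: deaths at times 1, 3, and 4; one subject censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In an analysis of this kind, the function would be applied separately within each surgery-by-grade group to obtain one curve per group.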

To control for baseline differences between the groups, we examined the effect of surgery in a Weibull proportional hazards regression fit to the complete data, as presented in Table 3. We included age, sex, tumor size, race, marital status, and year of diagnosis as covariates in the model in addition to surgery, grade, and their interaction. We used restricted cubic spline basis functions [25] for age (2 interior knots), size (2 interior knots), and year of diagnosis (1 interior knot); restricted cubic splines allow for nonlinear effects in models. We entered the covariates similarly into the missing data model for the weighted analysis. In Table 3, the hazard ratio for the effect of surgery on mortality among those with grade 1 tumors (the main effect term of surgery) is 0.31 (95% CI 0.19-0.52). The hazard ratio increases with increasing grade. In the regression, the interaction terms of surgery with grade are not significant, suggesting that grade does not moderate the protective effect of surgery on outcomes. None of the coefficients of the potentially confounding variables was significantly associated with mortality in the model. In the case of the continuous variables, however, the magnitude and interpretability of the effects depend on the scale of the spline basis functions.
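As an illustration of what a restricted cubic spline basis looks like, the following sketch builds one in the common truncated power form. It is not the code used for the analysis: the function name `rcs_basis`, the example knot locations, the assumption that "2 interior knots" means two knots between two boundary knots, and the scaling by the squared knot range are all assumptions made for illustration.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis evaluated at a scalar x.

    For k knots, returns [x, s_1(x), ..., s_{k-2}(x)]. Each s_j is zero
    below the first knot, cubic between knots, and linear beyond the
    last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cubic: u**3 when u > 0, otherwise 0.
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # keeps the columns on roughly the scale of x
    d = t[-1] - t[-2]
    cols = [float(x)]
    for j in range(k - 2):
        term = (
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / d
            + pos3(x - t[-1]) * (t[-2] - t[j]) / d
        )
        cols.append(term / scale)
    return cols


# Hypothetical knots for age: boundary knots at 40 and 80, interior at 55 and 65.
age_columns = rcs_basis(60, [40, 55, 65, 80])
```

Because each nonlinear column is divided by an arbitrary scale factor, the fitted coefficient on that column changes with the choice of scale even though the fitted curve does not, which is why individual spline coefficients are hard to interpret in isolation.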

**Table 3.** Hazard ratios from Weibull model fit on those with complete data (n = 1,375)
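To make explicit how grade-specific hazard ratios follow from the main effect and interaction terms, the model can be written schematically as below. The notation is assumed for illustration and is not taken from the original analysis: $S$ is the surgery indicator, $G_2$ and $G_3$ are indicators for grades 2 and 3 (grade 1 as reference), $x$ collects the remaining covariates including spline terms, and $\lambda$ and $p$ are the Weibull scale and shape.

```latex
\begin{align*}
h(t \mid S, G, x) &= \lambda p\, t^{p-1}
  \exp\!\bigl(\beta_S S + \beta_2 G_2 + \beta_3 G_3
  + \gamma_2 S G_2 + \gamma_3 S G_3 + x^{\top}\theta\bigr), \\
\mathrm{HR}_1 &= \exp(\beta_S), \\
\mathrm{HR}_g &= \exp(\beta_S + \gamma_g), \qquad g = 2, 3.
\end{align*}
```

Under this parameterization, the reported grade 1 hazard ratio of 0.31 corresponds to $\beta_S = \log 0.31 \approx -1.17$, and a hazard ratio that increases with grade corresponds to positive point estimates of $\gamma_2$ and $\gamma_3$.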

Fitting the naive regression is similar to assuming that *τ*_{1} and *τ*_{2} equal zero. In , we present the hazard ratio under various assumptions about the relationship of surgery and its interaction with grade in the missing data model. For completeness of presentation, we report analyses over a range of sensitivity analysis parameters that includes extreme assumptions (values of *τ*_{1} and *τ*_{2} that, when exponentiated, correspond to odds ratios from 1/15 to 15). Later in this section, we propose a region in which we believe the true values are likely to lie.
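The range of sensitivity parameters can be made concrete with a short sketch. Odds ratios from 1/15 to 15 correspond to values of *τ* between −log 15 and log 15, or roughly −2.71 to 2.71. The grid resolution and the names `TAU_MAX` and `tau_grid` below are assumptions made for illustration; the spacing actually used is not stated here.

```python
import math

TAU_MAX = math.log(15)  # about 2.708; exp(TAU_MAX) is an odds ratio of 15


def tau_grid(n_steps=11):
    """Evenly spaced values of tau from -log(15) to +log(15), inclusive."""
    step = 2 * TAU_MAX / (n_steps - 1)
    return [-TAU_MAX + i * step for i in range(n_steps)]


# Every (tau1, tau2) pair at which the weighted model would be refit.
grid = [(t1, t2) for t1 in tau_grid() for t2 in tau_grid()]
```

The pair (0, 0) at the center of this grid is the setting that most closely resembles the naive complete-data regression.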

In the sensitivity analyses, we entered covariates into the missing data model as described in the Weibull model, but also included censoring status, time until death or censoring (using a restricted cubic spline with 1 interior knot), interactions between censoring status and all covariates, and an interaction between surgical status and time until death or censoring. In this way, we sought to create a flexible missing data model.
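One way to picture how the sensitivity parameters enter a weighted analysis is sketched below. The form of the missing data model used here is an assumption for illustration: the log odds of missing grade are taken to be a covariate part `h_x` (standing in for the flexible function of covariates, censoring status, and time described above) plus *τ*_{1} times grade plus *τ*_{2} times grade times surgery, with complete cases weighted by the inverse of their probability of being observed. The function names are hypothetical, and the estimation of `h_x` is not shown.

```python
import math


def missing_log_odds(h_x, tau1, tau2, grade, surgery):
    """Assumed log odds that grade is missing.

    h_x is the covariate contribution; grade is 1, 2, or 3; surgery is 0 or 1.
    """
    return h_x + tau1 * grade + tau2 * grade * surgery


def complete_case_weight(h_x, tau1, tau2, grade, surgery):
    """Inverse of P(grade observed), which equals 1 + odds(grade missing)."""
    return 1.0 + math.exp(missing_log_odds(h_x, tau1, tau2, grade, surgery))
```

Under these assumptions, a positive *τ*_{1} gives larger weights to complete cases with higher grade, since such cases are taken to stand in for more patients whose grade is missing.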

When *τ*_{2} = 0, we have the no-interaction model, in which the relationship of grade with missingness does not vary by surgical status. In the no-interaction case, when *τ*_{1} = 0, the hazard ratio is approximately 0.3 to 0.35 regardless of grade, which is consistent with the naive Weibull regression presented in Table 3. When *τ*_{1} is negative, the effect of surgery becomes more protective for those with grade 2 tumors. In the no-interaction case, a negative value of *τ*_{1} means that the log odds of having missing data decrease as grade increases. At extreme negative values of *τ*_{1}, this would indicate that much of the missing data consists of better differentiated tumors, as discussed in Section 4.1. Intuitively, incorrect negative specification of *τ*_{1} would result in the incorrect classification of many grade 3 tumors as grade 1 or grade 2 tumors. Since there are fewer grade 1 and grade 2 tumors with observed data, this could explain why varying *τ*_{1} has a larger impact on grade 1 and grade 2 estimates than on grade 3 estimates.

As *τ*_{1} increases but *τ*_{2} remains equal to zero, more of the missing tumors are assumed to be higher grades. At extreme positive values of *τ*_{1}, few of the missing tumors are assumed to be grade 1 tumors, as discussed in Section 4.1, and hence the surgical effect among grade 1 tumors would be less affected by the sensitivity analysis. This indeed seems to be the case in regardless of *τ*_{2}.
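The intuition above, that the sign of *τ*_{1} determines which grades the missing tumors are assumed to have, can be illustrated numerically. The sketch assumes that the log odds of missingness are linear in grade with slope *τ*_{1} + *τ*_{2} × surgery within a covariate stratum; under that assumption, Bayes' rule implies that the grade distribution among patients with missing grade is the observed distribution reweighted by the exponentiated slope times grade. The function name and the example proportions are hypothetical.

```python
import math


def implied_missing_grade_dist(observed_dist, tau1, tau2, surgery):
    """Grade distribution implied for patients with missing grade.

    observed_dist maps grade (1, 2, 3) to its proportion among patients
    with observed grade in one stratum; surgery is 0 or 1. Each proportion
    is tilted by exp((tau1 + tau2 * surgery) * grade) and renormalized.
    """
    slope = tau1 + tau2 * surgery
    weights = {g: p * math.exp(slope * g) for g, p in observed_dist.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical observed grade proportions within one stratum.
observed = {1: 0.1, 2: 0.4, 3: 0.5}
negative_tau1 = implied_missing_grade_dist(observed, -1.0, 0.0, surgery=0)
positive_tau1 = implied_missing_grade_dist(observed, 1.0, 0.0, surgery=0)
```

With these made-up proportions, a negative *τ*_{1} shifts the implied distribution toward grade 1 and a positive *τ*_{1} shifts it toward grade 3, while setting both parameters to zero returns the observed distribution unchanged.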

As *τ*_{2} decreases, missing grades in those with surgery are assumed to be better grades relative to those who did not have surgery. This could affect estimates of the surgical effect, as the surgical benefit would be due in part to confounding by grade rather than a true surgical effect. Such an effect could explain why the benefit of surgery decreases dramatically, to the point of statistical non-significance and a hazard ratio greater than 0.60, in those with grade 1 tumors when *τ*_{2} is at extreme negative values. Such confounding could also explain the strengthening of the association when *τ*_{2} is positive but *τ*_{1} is negative.

Overall, the magnitude of the protective surgical effect is generally consistent across the three tumor grades regardless of the values of the sensitivity parameters. The hazard ratio generally ranges between 0.3 and 0.5, except in grade 1 tumors when *τ*_{2} is extremely negative and *τ*_{1} is positive.

The majority of the hazard ratio estimates are statistically significant. For those with grade 2 and 3 tumors, the p-value is less than 0.01 in all cases. This is likely due to the larger number of individuals with grade 2 and 3 tumors in the completely observed data, which reduces the standard errors of the estimates. For those with grade 1 tumors, the significance holds in all cases except for extreme negative values of *τ*_{2} combined with slightly positive values of *τ*_{1}, for which the p-value is above 0.10. Again, the lack of statistical significance could be due to confounding from differential classification of missing grade data between surgical groups, as discussed above.

In , we present figures representing the main effect and interaction terms from the Weibull proportional hazards model used to generate . In , while there is some evidence of an interaction between surgery and grade in the region in which the surgical effect loses statistical significance in those with grade 1 disease, the p-values for the interaction terms do not fall below 0.05.

We do not know the true values of *τ*_{1} and *τ*_{2}. However, it is possible to postulate reasonable bounds for them. We believe that in this population of patients with metastatic disease, those with worse grade (grade 3) tumors are more likely to be sicker on presentation. Therefore, they may be less likely to undergo aggressive procedures such as cytoreductive nephrectomy and to have sufficient tissue to properly identify grade. This suggests that *τ*_{1} would be positive (*τ*_{1} > 0). However, such a trend is likely to be less pronounced among those who have undergone surgery, as an adequate surgical specimen would reduce the association between grade and missing data. This would result in *τ*_{2} attenuating the relationship of grade with missing data (−*τ*_{1} < *τ*_{2} < 0). Together, these bounds suggest that the true values of *τ*_{1} and *τ*_{2} reside in a triangle in the lower right quadrants of the graphs in and . The location of the truth in such a region would suggest that surgery is associated with better mortality outcomes (*p* < 0.05 in all cases). However, the absolute magnitude of the protective effects would not be as great as implied by the naive analysis.
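The proposed region can be stated as a simple membership check, which may help when reading results off a grid of sensitivity parameters. The function name `in_plausible_region` is hypothetical; the two inequalities are the ones given above.

```python
def in_plausible_region(tau1, tau2):
    """Return True when tau1 > 0 and -tau1 < tau2 < 0."""
    return tau1 > 0 and -tau1 < tau2 < 0


# A few illustrative parameter pairs, paired with whether each one qualifies.
examples = [(1.0, -0.5), (1.0, -1.5), (-1.0, -0.5), (1.0, 0.5)]
flags = [in_plausible_region(t1, t2) for t1, t2 in examples]
```

With *τ*_{1} on the horizontal axis and *τ*_{2} on the vertical axis, the points passing this check form the triangle bounded by the horizontal axis and the line *τ*_{2} = −*τ*_{1}. The naive setting *τ*_{1} = *τ*_{2} = 0 lies on the boundary and is excluded by the strict inequalities.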