3.1 The data
When no meaning is lost, for simplicity we will suppress the subject index i. The trial recorded baseline covariates and per-course variables measured at the end of each course of chemo and just prior to assignment of the next chemo, until just prior to discontinuation of study chemos. The per-course variables included PSA (a positive continuous variable) and a compound binary favorable/unfavorable response indicator defined in terms of PSA and an indicator of advance of disease (AD). AD was defined as any of four events: (1) new spots of bone involvement on bone scans; (2) increase in product of cross-sectional diameters of soft tissue of visceral metastases by 25% or more; (3) increase in cancer related symptoms; or (4) increase in PSA from baseline by 25% or more, confirmed by serial measurements one week apart. Per protocol, a favorable response in the course in which a chemo was first given was defined as a drop in PSA of at least 40% compared to baseline without evidence of AD, and a favorable response in the second consecutive course with the same chemo was defined as a drop in PSA of at least 80% compared to baseline without evidence of AD (Thall et al., 2000).
For subjects departing from the study protocol there also were records indicating the reasons for doing so. In particular, it was recorded whether the decision to stop the study therapy was due to the development of severe toxicity or severe PD, or for other reasons. Note that PD was AD considered by the attending physician to be so severe that it precluded further therapy per the protocol algorithm. The extended data set which incorporated new compound per-course variables that recorded the development of toxicity or PD contained, for each patient, entries for the following 19 variables:
Variable Aj, j = 1, …, 4, records which of the study chemos was received at the start of course j, if the patient actually received one. If the patient had discontinued the study chemos at or prior to the start of course j, Aj was coded either with OFF or with N/A. It was coded with OFF if the patient was alive at the start of course j and discontinuation was as mandated by protocol (i.e. due to the occurrence of two consecutive per-course favorable responses, or two unfavorable responses, consecutive or not), or due to either PD or severe toxicity. It was coded as N/A if discontinuation was for other reasons, including death. This data-coding convention is needed for our formal definition of viable rules given in the next section. Note that a patient with Aj = OFF in course j would still be adhering to the viable rule during course j, whereas one alive and with Aj = N/A would not.
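The coding convention for Aj can be made concrete with a minimal Python sketch. The function name `code_treatment` and the reason labels are hypothetical, introduced only for illustration; they are not taken from the trial database.

```python
from typing import Optional

OFF, NA = "OFF", "N/A"

# Hypothetical labels for the discontinuation reasons that the text codes as OFF.
OFF_REASONS = {
    "two_consecutive_favorable",  # protocol-mandated stop
    "two_unfavorable",            # protocol-mandated stop
    "PD",                         # severe progressive disease
    "severe_toxicity",
}

def code_treatment(chemo: Optional[str], alive: bool, stop_reason: Optional[str]) -> str:
    """Return the coded value of A_j for one patient at the start of one course.

    chemo: the study chemo received in this course, or None if study chemos
           were discontinued at or before the start of the course.
    alive: whether the patient is alive at the start of the course.
    stop_reason: illustrative label for why study chemos were discontinued.
    """
    if chemo is not None:
        return chemo
    if alive and stop_reason in OFF_REASONS:
        return OFF
    # Any other reason for discontinuation, including death, is coded N/A.
    return NA
```

Under this sketch a living patient taken off study chemo because of PD is coded OFF, while a patient who withdrew for an unlisted reason, or who died, is coded N/A.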
Variables P1 and V1 are measured at baseline, prior to receiving the first chemo. P1 records PSA, and V1 is a binary indicator of high (versus low) disease volume, defined as at least 4 areas of presumed pathologic uptake or involvement of the appendicular skeleton as shown by bone scan, or visceral involvement (Thall et al., 2007).
Variables Pj, Tj and Ej, j = 2, …, 5, record PSA, toxicity and our compound measure of efficacy, all computed at the end of course j − 1 and just prior to Aj, provided the subject received a study chemo in course j − 1; otherwise they are coded as N/A. Toxicity Tj was a three-level ordinal variable: TOX0 (no toxicity), TOX1 (toxicity occurring at a level of severity that precludes further therapy but allows efficacy to be evaluated) and TOX2 (toxicity so severe that therapy must be stopped and efficacy cannot be evaluated). Efficacy Ej was a four-level variable: EFF0 (favorable response to a chemo in course j − 1), EFF1 (non-favorable response but no PD), EFF2 (PD) and EFF3 (inevaluable response due to severe toxicity).
Although the protocol stipulated that PSA values should be recorded even after study therapy discontinuation, these values were recorded in a very small number of subjects and, for several of them, only intermittently. We have chosen to disregard the few available post-study therapy PSA values and code them as N/A, since any analysis that used them would need to make untestable assumptions about the mechanism leading to the missing PSA values.
The variable X records the time to death measured in months from the time the first chemo was administered. All but two subjects were known to have died by March 1, 2011 and their death times were all recorded. Of the remaining two, one was last recorded to be alive 28.7 months after study enrollment. The other was still alive as of March 1, 2011. The death times of these two subjects were imputed as the last time they were known to have been alive.
In the sequel, we denote L1 = (P1, V1) and let Lj denote the entries for the covariates (Pj, Tj, Ej) at the end of course j − 1 together with the indicator that the person is alive at the start of course j, i.e. that X is greater than month 2 × (j − 1).
The accompanying figure illustrates the possible per-course trajectories for (Ej, Tj), with the numbers of subjects observed to have followed each trajectory in parentheses. The figure also displays the courses of action prescribed by the viable DTRs defined in the next section and the number of subjects that dropped out from the viable DTRs at each course. Comparison of this figure with its per-protocol counterpart shows that only 12 of the 47 cases that dropped out of the per-protocol DTRs remain drop-outs of the viable DTRs.
3.2 The viable switch rules
To define the viable switch rules, we first consider the hypothetical world in which the only reasons for not adhering to the trial protocol are discontinuation of treatment because of PD and/or severe toxicity. In this hypothetical world, Aj will be coded as N/A only if the person is dead at the start of course j. In section 4.3 we will extend our definition to the case in which drop-outs for other reasons are present.
We will use the notational convention V̄j = (V1, …, Vj) to represent the information accumulated on a variable V up to course j, and we use an unsubscripted V̄ to denote the entire history. For any viable switch rule, described in section 2, the patient initially is treated with a study chemo a and, if and when he qualifies for a switch to a second pre-specified chemo, he receives a different study chemo a* ≠ a, but otherwise is treated with therapy left to the doctors' discretion. The rule is defined by four functions, ga,a*,j(L̄j), for j = 1, 2, 3, 4. The function ga,a*,j(L̄j) returns the therapy prescribed by the rule for course j when a patient has data L̄j. To define ga,a*,j(·) let
$$S_j = I_{\{\mathrm{EFF0}\}}[E_j]\, I_{\{\mathrm{TOX0}\}}[T_j], \qquad F_j = I_{\{\mathrm{EFF1}\}}[E_j]\, I_{\{\mathrm{TOX0}\}}[T_j],$$
where $I_{\mathcal{C}}[B]$ is the indicator that B is in the set $\mathcal{C}$. Thus, Sj is the indicator of a favorable response without toxicity in course j − 1 and Fj is the indicator of a non-favorable response without toxicity or PD. The functions ga,a*,j, j = 1, 2, 3, 4, are defined as follows:
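As an illustration only, the following Python sketch shows one way a viable switch rule could be coded from the verbal description in Section 3.1. It is an assumption-laden reading, not the authors' displayed definition. It assumes that study therapy stops after two consecutive favorable responses or after two unfavorable responses in total, that the first unfavorable response triggers the switch from a to a*, and that any course with Sj = Fj = 0 (severe toxicity and/or PD) leads to OFF. The names `s_f` and `viable_rule` are hypothetical.

```python
OFF = "OFF"

def s_f(e, t):
    """Return (S, F) for one completed course with efficacy e and toxicity t."""
    s = int(e == "EFF0" and t == "TOX0")  # favorable response, no toxicity
    f = int(e == "EFF1" and t == "TOX0")  # non-favorable, no toxicity, no PD
    return s, f

def viable_rule(a, a_star, history, alive=True):
    """Therapy prescribed at the start of course len(history) + 1.

    history: list of (E, T) pairs for the completed courses, earliest first,
             for a patient who has followed the rule so far.
    Returns a, a_star, OFF, or None when the patient has died (no action needed).
    """
    if not alive:
        return None
    current, failures, streak = a, 0, 0
    for e, t in history:
        s, f = s_f(e, t)
        if s == 0 and f == 0:
            return OFF            # severe toxicity and/or PD
        if s:
            streak += 1
            if streak == 2:
                return OFF        # two consecutive favorable responses
        else:
            failures += 1
            streak = 0
            if failures == 2:
                return OFF        # two unfavorable responses, consecutive or not
            current = a_star      # first unfavorable response: switch chemo
    return current
```

For instance, under these assumptions a patient with a favorable first course stays on a, one with a non-favorable first course and no toxicity or PD moves to a*, and one with PD after the first course is coded OFF.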
Although X is not a component of L̄j, the indicator that X > 2(j − 1) is. Thus, ga,a*,j(·) is a well defined function of just the components of L̄j. Recall that an OFF in a course j indicates that the patient is no longer receiving a chemotherapy from the sequence (a, a*) at the start of course j and has been switched to a therapeutic/palliative action decided by the treating physician. For example, at the start of course 2, a patient who had both S2 = 0 and F2 = 0 must have T2 = TOX1 or TOX2, or E2 = EFF2 or EFF3, i.e. he must have experienced severe toxicity and/or PD after the first course of chemo. As such, he should be taken off study chemo and switched to a therapeutic/palliative action, so ga,a*,2(L̄2) = OFF. Of course, no treatment action at the start of a given course needs to be specified if death has occurred prior to that time.
3.3 Outcome scores
In our analysis, we are interested in comparing DTRs on the basis of their effects on both long-term survival and efficacy in diminishing disease burden over 32 weeks. For the first goal, we analyze U = log X, log survival time. For the second goal, we analyze three endpoints of the form Y = y(L̄) for specific scoring functions y(·) taking values in the interval [0, 1]. The value taken by y(L̄) is a numerical score that quantifies the clinical desirability of the response trajectory L̄. Each choice of y(·) reflects a different viewpoint on what is desirable in a given response trajectory while receiving study chemos. All three scores are composites defined as functions of toxicity and efficacy while on study chemo. The first two scores are functions of the indicators of evaluable (with or without toxicity) favorable response at course j.
1. Binary Scores: This scoring system simply assigns the value 1 if there were two consecutive per-course favorable responses and 0 otherwise.
The score Ybin regards therapies that provide transient benefits, in the sense of having a positive probability of either only one successful course or two non-consecutive successful courses, as being just as undesirable as therapies that provide no benefits at all. This score is not quite the same as the overall success/failure endpoint stipulated by the trial protocol, since Ybin takes the value 0 for a subject who drops out due to toxicity or PD, whereas the trial endpoint would be missing for such a subject.
2. Ordinal Scores: This scoring function, Yord, differs from Ybin in that the outcomes of patients for whom therapy achieved one successful course, or two non-consecutive successful courses, were scored as 0.5. Thus, it distinguishes therapies that produce transient efficacy benefits from therapies that don't.
3. Expert Score: This score reflects the viewpoint of the PI of the trial regarding the relative clinical desirability of each of the possible per-course toxicity and efficacy outcomes while the patient was on study therapies. It thus distinguishes therapies on the basis of their benefits over the entire available trajectory of efficacy and toxicity. To construct this score, we elicited numerical values Cj = c(Ej, Tj), j = 2, …, 5, between 0 and 1 for each of the possible combinations of values of (Ej, Tj) for every j such that the subject received a study chemo in course j − 1. The seven possible numerical values of Cj are listed in Table 1. They reflect the clinical viewpoint that a course success, EFF0, is highly desirable; that the absence of PD even if a success is not achieved, EFF1, is desirable; and that extreme toxicity, TOX2, is highly undesirable. The symbol X in the table indicates that the corresponding combination of (Ej, Tj) is not feasible. The overall outcome score, which we call the “expert score,” is defined as the mean of the per-course scores while the patient was on a study chemo, formally,
$$Y_{\mathrm{expert}} = \frac{\sum_{j=2}^{5} C_j \bigl(1 - I_{\{\mathrm{OFF},\,\mathrm{N/A}\}}[A_{j-1}]\bigr)}{\sum_{j=2}^{5} \bigl(1 - I_{\{\mathrm{OFF},\,\mathrm{N/A}\}}[A_{j-1}]\bigr)}.$$
Table 1. Expert score for the possible combinations of efficacy and toxicity outcomes.
Note that 1−I{OFF,N/A} [Aj−1] equals 1 if the subject is alive and received a study chemo at the beginning of course j −1, and equals 0 otherwise. Recall that (Ej, Tj) denotes the efficacy and toxicity measured at the end of course j − 1.
The expert score is more informative than the ordinal score, as it not only distinguishes regimes that provide transient efficacy benefits from those that don’t, but also quantifies the clinical desirability of the different transient benefits. For example, consider two subjects who had a favorable outcome with no toxicity in the first course of chemotherapy (E2 = EFF0, T2 = TOX0) but no more favorable outcomes afterwards. Suppose the first subject experienced PD and no toxicity in the second course of chemo (E3 = EFF2, T3 = TOX0), so his chemotherapy was discontinued, whereas the second subject experienced no PD and no toxicity in the second and third courses of chemo (E3 = E4 = EFF1, T3 = T4 = TOX0). The response trajectory of the second patient, while not an overall success, is still preferable to the response trajectory of the first patient. This is reflected in the expert score but not in the ordinal score: for both patients the ordinal score is 0.5, whereas the expert scores for the first and second patients are 0.55 = (1 + 0.1)/2 and 0.67 = (1 + 0.5 + 0.5)/3, respectively.
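The three scoring systems can be sketched in Python as follows. This is a minimal illustration under stated assumptions. A trajectory is represented as a list of (E, T) pairs for the courses in which the patient was on study chemo, a representation chosen here for convenience. The function names are hypothetical. The lookup table contains only the three per-course values that can be read off the worked example above (1 for EFF0 with TOX0, 0.5 for EFF1 with TOX0, 0.1 for EFF2 with TOX0); the remaining entries of Table 1 are deliberately left out rather than guessed.

```python
def favorable_flags(traj):
    """True for each course with an evaluable favorable response (EFF0)."""
    return [e == "EFF0" for e, _ in traj]

def binary_score(traj):
    """1 if there were two consecutive favorable courses, 0 otherwise."""
    r = favorable_flags(traj)
    return 1.0 if any(x and y for x, y in zip(r, r[1:])) else 0.0

def ordinal_score(traj):
    """1 for two consecutive favorable courses, 0.5 for any other favorable course, else 0."""
    if binary_score(traj) == 1.0:
        return 1.0
    return 0.5 if any(favorable_flags(traj)) else 0.0

# Partial table: only the values implied by the worked example in the text.
PARTIAL_EXPERT_TABLE = {
    ("EFF0", "TOX0"): 1.0,
    ("EFF1", "TOX0"): 0.5,
    ("EFF2", "TOX0"): 0.1,
}

def expert_score(traj, table=PARTIAL_EXPERT_TABLE):
    """Mean per-course score over the courses received on study chemo.

    Assumes traj is non-empty; raises KeyError for combinations not in the table.
    """
    scores = [table[pair] for pair in traj]
    return sum(scores) / len(scores)

# The two subjects from the worked example.
subject_1 = [("EFF0", "TOX0"), ("EFF2", "TOX0")]
subject_2 = [("EFF0", "TOX0"), ("EFF1", "TOX0"), ("EFF1", "TOX0")]
```

With these inputs both subjects receive an ordinal score of 0.5, while the expert scores are about 0.55 and 0.67, matching the example.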
For comparing the benefits of the different DTRs in reducing disease burden over 32 weeks, we use scores computed using only outcome data while the patient was on study chemo. We do so because, by design, data on efficacy and toxicity were not collected subsequent to discontinuation of the study chemos and, as indicated earlier, even though PSA records were obtained for some subjects after they went off study chemo, these records were very incomplete. The lack of off-study chemo outcome data limits our ability to compare the effects of different viable DTRs on disease burden, while alive, over the fixed period of 32 weeks. Our choice to analyze expert score endpoints is an attempt to remedy this problem insofar as we believe this score is a good predictor of health trajectory over the 32 weeks. The binary and ordinal scores can be viewed as alternative, possibly poorer, substitute endpoints. Of course, if data on efficacy and toxicity had been collected over the 32 weeks even after chemotherapy discontinuation, this would have avoided the need for substitute endpoints.
The three scores Ybin, Yord and Yexpert are meant to quantify the health trajectory over the 32 weeks following the start of the first course of chemo. Yet, because they do not depend on survival, they rank equally two individuals who have the same outcomes while on study chemos, even if one dies soon after chemo discontinuation and the other remains alive at the end of the 32 weeks. A more reasonable utility function would score these two individuals differently, penalizing the former and rewarding the latter. Nevertheless, for simplicity, we have chosen to analyze scores that do not incorporate survival because only nine out of the 150 patients died in the first 32 weeks, all but one did so after study chemo discontinuation, and they were spread evenly among the four initial treatment arms. Comparing treatments on the basis of the log-survival means E[U(a,a*)] informs about the long-term effects of the different DTRs but not about their immediate effects, while comparisons based on the means of the three scores inform about their more immediate effects.
3.4 Counterfactual outcomes and the target of inference
To compare the different switch rules used in the trial, we apply the counterfactual framework for causal inference as originally developed by Rubin (1978) for time-independent treatments and later extended by Robins (1986, 1987) for time-dependent treatments in longitudinal studies. Henceforth, we define the vector L̄ā = (Lā,1, Lā,2, Lā,3, Lā,4, Lā,5) of potential outcomes and the potential survival time Xā for each possible value ā = (a1, a2, a3, a4) that Ā can take. Each Lā,j denotes the value of Lj that would have been recorded at the end of course j − 1 in a given subject in the hypothetical world in which his Ā would have been equal to ā. Likewise, Xā denotes the survival time if Ā had been equal to ā. We then define the collection {(L̄ā, Xā) : ā is in the range of Ā} comprised of the potential outcome vectors and survival times under all possible treatment sequences ā. This collection includes potential outcome vectors L̄ā corresponding even to values of ā with some components equal to OFF. For those, the corresponding entries of the vector L̄ā are set equal to N/A. For example, if ā = (CVD, TEC, OFF, OFF) then L̄ā = (Lā,1, Lā,2, Lā,3, N/A, N/A). We use this convention because we want Lā,j to reflect the value that would have been entered for Lj in the event that the person had Ā equal to ā, and recall that, by convention, we code an outcome after discontinuation of study chemos as N/A. Given the complete collection of potential outcomes, we define for each switch rule ga,a* the hypothetical outcome vector L̄(a,a*), the potential survival X(a,a*), and the potential endpoint Y(a,a*) = y(L̄(a,a*)). These are the values of L̄, survival time X, and score Y that would have been recorded on a given patient if he had been randomized, perhaps contrary to fact, to follow the switch rule ga,a*. Thus, for example, L̄(a,a*) = L̄ā where a1 = a, a2 = ga,a*,2(La1), etc.
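The N/A convention for potential outcome vectors can be illustrated with a short Python sketch. The function name `potential_outcome_vector` and the placeholder entries are hypothetical. The sketch assumes, as in the example above, that an OFF in course j makes the entry recorded at the end of that course, i.e. component j + 1 of the vector, equal to N/A.

```python
NA = "N/A"

def potential_outcome_vector(a_bar, l_values):
    """Apply the N/A convention to a potential outcome vector.

    a_bar: treatment sequence (a1, a2, a3, a4).
    l_values: the five entries (L1, ..., L5) that would be recorded if every
              course were given; L1 is the baseline entry and is always kept.
    """
    if len(a_bar) != 4 or len(l_values) != 5:
        raise ValueError("expected 4 treatments and 5 outcome entries")
    out = [l_values[0]]
    for j, a in enumerate(a_bar):
        # a is the treatment in course j + 1; its outcome is entry j + 2 of the vector.
        out.append(NA if a == "OFF" else l_values[j + 1])
    return out
```

Applied to the sequence (CVD, TEC, OFF, OFF), the sketch keeps the first three entries and sets the last two to N/A.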
In our analysis, we use the mean scores E[Y(a,a*)] and mean log-survival times E[U(a,a*)], where U(a,a*) = log X(a,a*), with (a, a*) ranging over all 12 possible pairs, as the target parameters that form the basis for comparing the different switch rules in the trial. In particular, we will estimate each E[Y(a,a*)] and E[U(a,a*)], as well as the optimal switch rules, i.e. the rules ga,a* whose pair (a, a*) maximizes E[Y(a,a*)] or E[U(a,a*)], depending on whether our goal is to compare DTRs on the basis of their benefits for transitorily diminishing disease burden or for prolonging survival.
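The count of 12 pairs and the selection of an optimal rule can be sketched in Python. This is a hedged illustration: the function names are hypothetical, the chemo labels are supplied by the caller, and the estimates passed to `best_pair` stand for whatever estimates of E[Y(a,a*)] or E[U(a,a*)] an analysis produces. No estimation method is implied.

```python
from itertools import permutations

def all_switch_pairs(chemos):
    """All ordered pairs (a, a_star) of distinct chemos."""
    return list(permutations(chemos, 2))

def best_pair(estimates):
    """Return the pair (a, a_star) with the largest estimated mean.

    estimates: dict mapping (a, a_star) to an estimated mean score or mean
               log-survival time.
    """
    return max(estimates, key=estimates.get)
```

With four chemos, `all_switch_pairs` returns 4 × 3 = 12 ordered pairs, in agreement with the count given in the text.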
SMART trials like the one considered here furnish data that identify the effects of the DTRs they were designed to compare on the basis of a predetermined endpoint. This is so because at each stage each subject is randomized to one of the treatment options that would be available to him if he were to follow any of the DTRs being compared. One immediate question is whether the prostate cancer trial data could also identify the effects of the viable DTRs that we consider in our analysis. In fact, our modification of the definition of the switch rule does not impede identification. This is because the viable DTRs differ from the original DTRs only in that they prescribe a switch to a non-prespecified therapy in the event of high toxicity or PD, and this rule was followed by all participating physicians. Intuitively, after a patient develops toxicity or PD, there is only one possible treatment option (the non-prespecified therapy), so identification is possible so long as everybody in the study complies with this added mandate, which indeed happened in the prostate cancer trial.