Since morbidity early after HCT results in large part from the development of acute GVHD, we previously proposed that a longitudinal assessment of morbidity involving the skin, liver, and gastrointestinal tract might provide a more complete, objective approach for comparing two arms of open-label randomized clinical trials for acute GVHD prevention. In the current study, we determined both morbidity-across-time and GVHD-across-time in a retrospective analysis of a database from an open-label randomized clinical trial comparing tacrolimus/methotrexate versus cyclosporine/methotrexate after myeloablative conditioning and marrow transplantation from HLA-matched unrelated donors. The results confirmed differences in overall morbidity across time among patients with peak grades II-IV GVHD as compared to those with grades 0-I GVHD, but no significant differences were found between morbidity associated with grade II GVHD as compared to grades 0-I GVHD. We observed less skin and a trend towards less liver morbidity across time in the tacrolimus group (p=0.04; p=0.09, respectively) but not for gastrointestinal or overall morbidity, despite significantly decreased skin and liver stages and overall grades of GVHD-across-time in the tacrolimus arm. In conclusion, an objective assessment of differences in morbidity (regardless of cause) as a measure of acute GVHD in a randomized clinical trial of acute GVHD prevention had limited utility. The difficulty of demonstrating clinical benefits from objective parameters such as survival and morbidity and the subjectivity of grading acute GVHD emphasize that blinded assessments are required in clinical trials of GVHD prevention.
Acute graft-versus-host disease (GVHD) is a complication of allogeneic hematopoietic cell transplantation (HCT), and it contributes significantly to transplant-related mortality. Acute GVHD after HCT has conventionally been measured by assessing the degree of rash, total serum bilirubin and diarrhea volume and then, based on the pattern and severity of the organ involvement, assigning a peak grade [2-5]. This approach has potential limitations. First, there are other causes for organ dysfunction besides acute GVHD that may confound the assessment, and so the assignment of stages and overall grades of GVHD may require some element of subjective interpretation [6-8]. This subjectivity increases the risk for bias in open-label randomized clinical trials with acute GVHD endpoints [7,8]. Second, acute GVHD is a process that may persist or relapse over time, and therefore a system that grades only the peak severity of the abnormalities leaves a large amount of data unconsidered [7,8]. A longer duration of morbidity related to acute GVHD may be associated with resistance to treatment and could be highly associated with patient outcomes [8,9]. The importance of conducting blinded randomized clinical trials is well established. However, most, if not all, previous randomized clinical trials of interventions for GVHD prophylaxis have been open-label [11-16]. This may have resulted from the belief by investigators that effective control of acute GVHD would result in the improvement of more objective endpoints like survival, or that the type of study intervention prevented effective blinding of the patient or the transplant team. Since more effective control of acute GVHD may not necessarily improve survival and blinding is not always feasible for randomized clinical trials, the approach to the assessment of interventions on acute GVHD end points requires re-evaluation because of the potential bias associated with the unblinded grading.
A new potentially more objective method for assessing the effects of an intervention on acute GVHD had been previously proposed by Al-Ghamdi et al. based on the overall morbidity of the target organs measured longitudinally across time regardless of cause . In that proposal, an assumption was made that if the study arms of the randomized clinical trial were balanced for factors that influence the risk for regimen-related morbidity then any differences in skin, liver or gastrointestinal (G.I.) morbidity between the arms would result from acute GVHD. Further, if differences between the arms were not detected, then it was unlikely that the treatment affected the risk of GVHD, provided that the study design had adequate statistical power to detect any true differences. It was concluded by Al-Ghamdi et al. that there were significant differences in morbidity between the peak grades II-IV and grade 0 acute GVHD . No differences were observed in morbidity between the two study arms, but no difference had been noted in the incidence of the peak grades of acute GVHD in the clinical trial which had been retrospectively analyzed . The study by Al-Ghamdi et al. was limited by the small sample size, the exclusion from the analysis of patients with grade I acute GVHD, and the lack of assessments before the onset of GVHD or during the first 3 weeks after transplant for patients who did not develop GVHD. Finally, the lack of difference in the incidence of acute GVHD between the study arms did not allow an assessment of the methodology under conditions where a treatment effect was evident by conventional methods. Based on the potential for this approach and the limitations of the study by Al-Ghamdi et al., another analysis of this methodology was undertaken to assess its validity.
In the present study, we assessed the relationships between morbidity and acute GVHD among subjects who participated in an open-label prospective randomized clinical trial of acute GVHD prevention comparing the efficacy of tacrolimus to cyclosporine (CSP). One goal of the current study was to validate the prior observation that higher peak organ stages and overall grades of GVHD correlated with greater morbidity across time during the first 14 weeks after transplantation. A second goal was to determine whether differences observed in peak overall grades of GVHD in a prospective clinical trial translated into differences in morbidity between the two arms.
Between March 1995 and September 1996, 180 patients at 10 institutions in the United States were enrolled in an open-label, multicenter randomized clinical trial comparing the combination of tacrolimus and methotrexate (MTX) versus the combination of CSP and MTX for the prevention of acute GVHD after marrow transplantation from matched unrelated donors. The study design, criteria for HLA donor matching, transplant procedure, supportive care, patient characteristics and outcomes of the study have been previously published. Briefly, donor selection was based on typing for HLA-A and -B antigens with serological methods, including all splits defined by the World Health Organization Nomenclature Committee at the Tenth Histocompatibility Workshop in November 1987. Typing of HLA-DRB1 alleles was by DNA hybridization with sequence-specific oligonucleotide probes. All patients had hematologic malignancies and were at low to intermediate risk for relapse after HCT. Preparative regimens were assigned according to the treatment protocols at the investigational sites. Unmodified donor marrow was infused on day 0.
Tacrolimus was given initially at a dose of 0.03 mg/kg/day and CSP at 3 mg/kg/day as a continuous IV infusion over 24 hours starting on day -1. The oral formulations were later given at a ratio of 4:1 in two divided doses for both drugs. During the first 8 weeks, tacrolimus and CSP whole blood trough concentrations were maintained at 10-30 ng/mL and 150-450 ng/mL, respectively. In the absence of GVHD, doses were tapered beginning at week 9 and administration was discontinued at 6 months after transplant. MTX doses were 15 mg/m2 on day 1 and 10 mg/m2 on days 3, 6, and 11 after HCT. All patients received either prophylactic or preemptive cytomegalovirus (CMV) therapy and anti-fungal prophylaxis according to the institutional protocols.
During the study, an overall peak grade of acute GVHD was assigned by the site investigator at each institution according to modified Seattle criteria, and these grades were used to evaluate the primary end point as previously described [2-4,13]. Biopsies were obtained when indicated to corroborate the clinical diagnosis of GVHD. In addition, an Endpoint Evaluation Committee (EPEC) was formed to independently assess the acute GVHD end point for all patients. Patients received primary treatment with corticosteroids and continued the administration of tacrolimus or cyclosporine according to the original randomization.
During the clinical study, measurements of rash, serum bilirubin, nausea, and average daily stool volume were recorded weekly from marrow infusion until day 100 after HCT. Additional weekly notations recorded the presence of overtly visible blood in stool or abdominal cramping. At the investigational site, the attribution of organ morbidity to GVHD or another cause was also recorded. Weekly stages and grades of acute GVHD were assigned to patients by the site investigators based on clinical parameters and biopsy results if available [3,4,13]. An overall peak grade of acute GVHD was also assigned to patients by the site investigators for that period between day 0 and 100.
For the longitudinal assessment of morbidity, staging of organs and grading for overall morbidity were modified from the system used for grading acute GVHD (Table 1) . Morbidity was defined as any skin rash, gastrointestinal dysfunction (diarrhea, cramps or blood in the stools) or liver dysfunction resulting in hyperbilirubinemia regardless of cause. Stages for specific organ morbidity and grades for overall morbidity were assigned for each week after HCT regardless of the cause. The causes of morbidity after HCT included but were not limited to regimen-related toxicities, infections, drug-related adverse effects and GVHD. For all organs, a morbidity stage 0 indicated normal function. Skin morbidity was categorized according to the extent of rash or erythema and presence of bullae: stage 1, ≤25% of skin surface area; stage 2, 26-50%; stage 3, >50%; stage 4, wet desquamation with tenderness or bullae formation (or both). Desquamation or bullae formation were considered relevant in assigning stage 4 skin morbidity only in the presence of rash or erythema involving more than 50% of body surface area; otherwise these manifestations were classified as skin stage 3. Liver morbidity was categorized according to the serum total bilirubin concentration; stage 1, 2.0-2.9 mg/dL; stage 2, 3.0-5.9 mg/dL; stage 3, 6.0-14.9 mg/dL; stage 4, ≥15 mg/dL. G.I. morbidity was staged according to the previously published scoring system and included symptoms of diarrhea, cramps and overtly visible blood in the stool . Scores of 1, 2 and 3 were assigned, respectively, for diarrhea with average daily volumes <1,000 mL, 1,000-1,499 mL, and >1,500 mL. A score of 2 was assigned for the presence of either abdominal cramps or overt blood in the stool for a potential total score of 7. Overall stages 1, 2, 3 and 4 were assigned, respectively, for total G.I. symptom scores of 1, 2, 3-4, and 5-7. 
If urinary mixing was present in the assessment of diarrhea volume, a score of 1 was assigned when the average daily volume was <1,000 mL, and it was the only sign of G.I. disease. The longitudinal assessment of G.I. morbidity was repeated with consideration of anorexia, nausea, and vomiting for determining the overall score. If nausea and vomiting were present and were the only G.I. symptoms reported, a score of 1 was assigned; otherwise, anorexia, nausea, and vomiting were not considered in the score in the presence of diarrhea, abdominal pain, or visible blood. Grade 1 overall morbidity was defined as stage 1-2 skin morbidity with stage 0 liver and G.I. morbidity. Grade 2 overall morbidity was defined as stage 3 skin morbidity or stage 1 liver or G.I. morbidity. Grade 3 overall morbidity was defined as stage 2-3 liver or G.I. morbidity. Grade 4 overall morbidity was assigned for patients with stage 4 morbidity in the skin, G.I., or liver.
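The staging and grading rules above can be sketched in code. This is only an illustrative reading of the text, not the study's actual software: the function names are hypothetical, the bilirubin bands are assumed to be contiguous, a diarrhea volume of exactly 1,500 mL is assumed to score 3, and the special handling of urinary mixing and of anorexia, nausea, and vomiting is left out.

```python
def skin_stage(rash_pct, bullae_or_desquamation=False):
    """Skin morbidity stage from percent of body surface with rash or erythema."""
    if bullae_or_desquamation:
        # Stage 4 only when rash exceeds 50% of body surface; otherwise stage 3.
        return 4 if rash_pct > 50 else 3
    if rash_pct <= 0:
        return 0
    if rash_pct <= 25:
        return 1
    if rash_pct <= 50:
        return 2
    return 3


def liver_stage(bilirubin_mg_dl):
    """Liver morbidity stage from total serum bilirubin (mg/dL)."""
    if bilirubin_mg_dl >= 15:
        return 4
    if bilirubin_mg_dl >= 6:
        return 3
    if bilirubin_mg_dl >= 3:
        return 2
    if bilirubin_mg_dl >= 2:
        return 1
    return 0


def gi_stage(stool_ml_per_day, cramps=False, blood=False):
    """G.I. morbidity stage from the symptom score (maximum total of 7)."""
    if stool_ml_per_day <= 0:
        diarrhea = 0  # assumption: no diarrhea contributes no points
    elif stool_ml_per_day < 1000:
        diarrhea = 1
    elif stool_ml_per_day < 1500:
        diarrhea = 2
    else:
        diarrhea = 3  # assumption: exactly 1,500 mL is scored as 3
    total = diarrhea + (2 if cramps else 0) + (2 if blood else 0)
    if total <= 2:
        return total  # totals 0, 1, 2 map to stages 0, 1, 2
    return 3 if total <= 4 else 4


def overall_grade(skin, liver, gi):
    """Overall morbidity grade; the most severe applicable rule wins."""
    if 4 in (skin, liver, gi):
        return 4
    if liver >= 2 or gi >= 2:
        return 3
    if skin == 3 or liver == 1 or gi == 1:
        return 2
    if skin in (1, 2):
        return 1
    return 0
```

For example, a patient-week with rash over 30% of the body surface, a bilirubin of 1.2 mg/dL, and 1,200 mL of daily diarrhea with cramps would be skin stage 2, liver stage 0, and G.I. stage 3 under this reading, giving overall grade 3.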
A longitudinal method was used to assess morbidity to allow for the fact that morbidity changes with time after transplant. Curves were fit to morbidity scores (0, 1, 2, 3, 4) across time. Differences between groups defined by the Seattle criteria for peak grades of acute GVHD or study arm (patients with peak grade 0 versus those with peak grades III-IV acute GVHD, patients with grades 0-I versus those with grades II, and those with grades III-IV acute GVHD or alternatively tacrolimus versus CSP) were compared using an odds ratio (OR) for the proportions of patients with scores above specific thresholds of morbidity (0, 1, 2, 3). This analysis assumed that the OR was identical for all thresholds. Accordingly, the OR for a score above 3 was the same as the OR for a score above 2. The analysis also assumed that the general shape of the curves describing morbidity over time was the same for the two groups being compared. Plots of predicted values were compared to observed values to validate these assumptions. Statistical significance and confidence intervals were estimated from the OR and the corresponding variance estimate. In more technical terms, ordinal regression methods were used to evaluate differences between groups in ordered categorical distributions of morbidity grades measured at weekly intervals. Cubic spline functions with five knots were included in the model for the time variable. The proportional odds model was then fit to the ordered categories, using robust sandwich variance estimates to account for the multiple observations contributed across time by each subject [20,21]. An additional analysis was performed to assess the impact of morbidity scores on non-relapse mortality. Morbidity scores were utilized as time-dependent covariates in a Cox regression model with non-relapse death by day 100 post-transplant as the outcome of interest to obtain hazard ratios (HR) and associated confidence intervals. 
Patient follow-up was censored at relapse date, date of last contact or day 100 post-transplant, whichever came first.
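The proportional odds assumption described above can be checked informally by computing the odds ratio separately at each threshold and seeing whether the values are roughly equal. The sketch below does only that, on hypothetical scores. It ignores time, the spline terms, and the robust variance for repeated measures, so it illustrates the assumption and does not reproduce the model that was fit.

```python
def cumulative_odds_ratios(scores_a, scores_b, thresholds=(0, 1, 2, 3)):
    """Empirical odds ratio (group A vs. group B) for a score above each threshold.

    Returns a dict {threshold: odds ratio}, with None where a cell is empty
    and the odds ratio is undefined.
    """
    result = {}
    for k in thresholds:
        a_above = sum(s > k for s in scores_a)
        a_below = len(scores_a) - a_above
        b_above = sum(s > k for s in scores_b)
        b_below = len(scores_b) - b_above
        if min(a_above, a_below, b_above, b_below) == 0:
            result[k] = None
        else:
            result[k] = (a_above / a_below) / (b_above / b_below)
    return result


# Hypothetical morbidity scores (0-4) for two groups; not study data.
group_a = [0, 1, 2, 3, 4, 4]
group_b = [0, 0, 0, 1, 2, 3]
ors = cumulative_odds_ratios(group_a, group_b)
```

If the threshold-specific odds ratios differ widely, a single common OR would be a poor summary, which is why the predicted and observed values were compared.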
Using the staging and grading system for morbidity after HCT previously described by Al-Ghamdi et al., there was a high prevalence of overall morbidity before week 5, likely related to regimen-related toxicity, after which morbidity began to decrease progressively (Figure 1A). While skin and liver had stable trends towards less morbidity with increased time from transplantation, the prevalence of stage 1-4 G.I. morbidity was initially high, and then had a more fluctuating trend with persistence of significant morbidity to week 14 after transplantation (data not shown). There was a similar profile for the prevalence of acute GVHD but the proportions were substantially lower during the early weeks after HCT (Figure 1B). Differences between Figures 1A and 1B demonstrate substantial morbidity attributed to causes other than acute GVHD persisting to week 14 after HCT.
Higher peak grades of acute GVHD were associated with a longer duration of GVHD manifestations, and there was a trend toward a longer duration of morbidity. The median proportion of weeks at grade 2 or higher morbidity was 0.63 among patients with peak grades III-IV acute GVHD (n=38) as compared to 0.38 among those with peak grade II acute GVHD (n=71) (p<0.01). The median proportion of weeks at grade II or higher of acute GVHD was 0.36 in patients with a peak of grade III-IV acute GVHD compared to 0 in patients with a peak of grade II GVHD (p<0.01).
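The "median proportion of weeks" summary can be sketched as follows. The source does not state the denominator, so this sketch assumes it is the number of weeks with a recorded grade for each patient. The function names and data are hypothetical.

```python
from statistics import median


def proportion_of_weeks_at_or_above(weekly_grades, threshold=2):
    """Fraction of a patient's observed weeks with a grade at or above threshold.

    Weeks with no assessment are passed as None and excluded (an assumption).
    """
    observed = [g for g in weekly_grades if g is not None]
    if not observed:
        return None
    return sum(g >= threshold for g in observed) / len(observed)


def median_proportion(patients, threshold=2):
    """Median across patients of the per-patient proportion of weeks."""
    proportions = [
        p
        for p in (proportion_of_weeks_at_or_above(w, threshold) for w in patients)
        if p is not None
    ]
    return median(proportions)


# Hypothetical weekly grades for three patients; not study data.
patients = [[2, 3, 1, 0], [0, 0, 0, 0], [4, 4, None, 4]]
result = median_proportion(patients)
```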
Skin, liver, G.I. tract, and overall morbidity distributions across time during the first 14 weeks after HCT were higher among patients with peak grades III-IV acute GVHD than among those with peak grades 0-II GVHD (all p<0.01) (Table 2A; Figure 2). Skin, G.I., and liver morbidity distributions were also significantly higher among patients with peak grades II-IV than among those with peak grades 0-I acute GVHD (p<0.01) (Table 2B), but there was only a marginally significant difference in overall morbidity between the two groups (OR 1.4; P = 0.05). In further analysis comparing peak grades 0-I to II and III-IV GVHD separately, there was a significant difference in morbidity between peak grades 0-I and III-IV GVHD (OR=3.0; p<0.01), but no significant difference was found between peak grades 0-I and II GVHD (OR=1.0; p=0.94). To decrease the contribution of the morbidity related to the conditioning regimen, the analysis was repeated for events from weeks 3-14 only, but the results did not change significantly in any category (data not shown). The inclusion of nausea and vomiting in the staging of G.I. morbidity did not improve the association of overall morbidity with the peak grade of acute GVHD (data not shown).
With an overall grade of 3-4 for morbidity, the hazard of death was significantly higher compared to grade 0 (HR=20.9, p<0.001) but was not significantly higher with overall grade 2 morbidity (HR=0.4; p=0.36). The hazard of death was significantly higher with overall grade III-IV acute GVHD compared to grade 0 (HR=4.6, p<0.001) but not for grade II acute GVHD (HR=0.9, p=0.86). Higher grades of morbidity and acute GVHD were associated with increased non-relapse mortality.
The two arms of the study population were balanced with regard to all variables including the number of deaths before day 100 (p=0.49), the number of relapses by day 100 (p=0.19), and the number of patients followed for 14 weeks (p=0.76) (Table 3). Morbidity-across-time was not significantly different for patients in the two arms of the study for G.I., liver, or overall morbidity (Table 4). There was less skin morbidity-across-time, and there was a trend towards less morbidity of the liver in the tacrolimus group (OR=1.5, p=0.04 and OR=1.6, p=0.09, respectively). The analysis was also performed for the period from week 3-14, but the OR did not change significantly for any of the groups (data not shown). When the analysis was performed using GVHD-across-time, we found significantly less skin, liver, and overall GVHD (p<0.01, p=0.02, and p<0.01, respectively) but not G.I. GVHD (p=0.33) in the tacrolimus group (Table 4). These differences were also observed for peak stages and grade of acute GVHD between the two arms for skin, liver, and overall GVHD (p<0.01, p<0.01, and p=0.02, respectively), but not for G.I. GVHD (p=0.67). We also measured overall and organ-specific morbidity in patients with peak grades 0 and I acute GVHD to determine whether morbidity was differentially attributed between study arms due to the open-label design of the clinical trial. No significant imbalances in morbidity were observed between the study arms for any of these groups. Therefore, an imbalance of the non-GVHD morbidity did not explain the lack of significant differences for organ-specific or overall morbidity between the study arms.
The conventional systems for grading acute GVHD have been informative as a measurement of disease severity for treatment interventions. On the other hand, current grading systems can be biased by subjective assessment and represent a single-event statement that might not provide an ideal method for evaluating results in randomized clinical trials in which GVHD was an outcome of interest [2,4-8,22]. To overcome these deficiencies, we proposed a novel method for assessing the effect of study interventions on acute GVHD in randomized clinical trials based on a longitudinal evaluation of morbidity regardless of cause . This approach could be used as a study end point and would not necessarily replace the conventional staging and grading of acute GVHD. If the arms of the study were well balanced with respect to factors that influence morbidity after HCT, we hypothesized that any difference in morbidity involving the skin, liver, and G.I. tract between the treatment groups should be related to the incidence and severity of GVHD. In the present study, we found a high prevalence of skin, liver, G.I., and overall morbidity during the first 3-5 weeks after marrow transplantation, most likely as a result of regimen-related toxicity. After week 5, morbidity decreased more in the group with grades 0-II acute GVHD than the group with grades III-IV acute GVHD as defined by the modified Seattle criteria. However, persistence of morbidity not related to GVHD was still substantial until week 14, which may contribute to difficulties in the grading of acute GVHD even after resolution of regimen-related toxicity.
The morbidity-across-time analysis was able to detect differences in overall grade as well as stage of skin, G.I., and liver morbidity when patients with peak grades 0-II were compared with grades III-IV acute GVHD. Importantly, there was no difference in overall morbidity between 0-I and II acute GVHD. This may have implications for the sample size required in a clinical trial to detect differences in morbidity regardless of cause related to better prevention of acute GVHD. Compared to a randomized clinical trial in which the end point was peak grades II-IV acute GVHD, an increase of the sample size would likely be required if the expected differences in morbidity between the study arms were associated primarily with peak grades III-IV acute GVHD. The lack of difference in overall morbidity between peak grades 0-I and II acute GVHD may also explain why the incidence of acute GVHD identified by the Endpoint Evaluation Committee (EPEC) was markedly decreased compared to the incidence reported by the site investigators. Most of the cases of acute GVHD identified by the EPEC had been reported by the site investigators as grades III-IV even though most cases of acute GVHD relevant to the study end point were grade II [13,17]. Morbidity-across-time measurements in randomized clinical trials are likely to detect primarily the differences between study arms associated with severe acute GVHD.
Grades 3-4 morbidity and grades III-IV acute GVHD were associated with an increased risk of non-relapse mortality. Grade 2 morbidity and grade II acute GVHD were not associated with an increased risk of non-relapse mortality. Previous studies have also failed to show a strong association between mortality and overall grade II GVHD [9,23]. Since grade II acute GVHD was not associated with an increase in morbidity or an increased risk of non-relapse mortality, the incidence of grade III-IV acute GVHD could be considered the most relevant endpoint for clinical trials of GVHD prevention. This endpoint, however, would have some limitations, since the difference between grade II GVHD versus grades III-IV GVHD reflects both the efficacy of the prophylaxis regimen and the efficacy of any treatment for acute GVHD. In addition, the distinction between grade II GVHD and higher grades cannot be made reliably and consistently in patients who have other gastrointestinal or hepatic complications.
The EPEC, which was blinded to study arm and GVHD treatment, determined that the incidence of GVHD was less in the tacrolimus than in the CSP group. Although the overall incidence of acute GVHD was substantially less in the EPEC analysis than the incidence assessed by the study investigators, it confirmed the treatment effect observed by the unblinded assessment of acute GVHD by study investigators [10,17]. The longitudinal analysis of morbidity did find significantly less skin morbidity in the tacrolimus group but no difference in the overall grades of morbidity between the two study arms. Less skin morbidity and a trend towards less liver morbidity in the tacrolimus group were consistent with what had been determined by the EPEC assessment. The failure to observe a larger effect on morbidity, when the incidence of conventionally graded acute GVHD was significantly reduced, may have resulted from the large number of morbidity events related to causes other than GVHD. The increased use of corticosteroids for the primary treatment of GVHD in the CSP group may have also mitigated the potential difference in morbidity compared to the tacrolimus group. Although potentially more objective because an analysis of morbidity-across-time does not require the assignment of cause, this type of analysis would appear here to increase the chances of failing to detect an effective therapy for GVHD prevention as determined by the EPEC. There was significantly less skin, liver, and overall acute GVHD-across-time in the tacrolimus group but the incidence of acute GVHD in the G.I. tract was similar. It is not clear why the major effect of tacrolimus was the reduction of skin and liver acute GVHD. The hazard ratios obtained by comparing tacrolimus to CSP were similar for peak acute GVHD grades and GVHD-across-time.
In this study, the longitudinal analysis of morbidity and acute GVHD did not appear to be more informative than the analysis of peak grades of acute GVHD for assessing the differences between study arms.
In summary, this was a novel attempt at describing morbidity-across-time after allogeneic HCT and how it might be used as a surrogate endpoint for acute GVHD. Findings from this study demonstrated that the longitudinal assessment of morbidity in skin, liver, and G.I. tract in a randomized clinical trial was feasible. However, contrary to our original assumption, considerable morbidity not related to acute GVHD was observed even after the expected resolution of regimen-related toxicity. Significant differences were observed for acute GVHD-across-time and peak stages and grades of acute GVHD between the arms of the randomized clinical trial but not for overall morbidity. The lack of difference in overall morbidity between grades 0-I and II acute GVHD and the presence of considerable morbidity unrelated to GVHD limit the utility of this method in a randomized clinical trial of GVHD prevention and would likely require a larger patient sample size for detecting significant differences in severe acute GVHD between the study arms. In addition, factors that might affect the morbidity of acute GVHD and are not balanced between the two arms (most notably the duration and intensity of other immunosuppressive treatment after diagnosis of GVHD) would need to be considered in the interpretation of the outcomes of GVHD studies. Although the evaluation of morbidity and GVHD-across-time may be informative, it might best serve as a descriptive analysis or a secondary end point rather than a primary end point of a randomized clinical trial of GVHD prevention. In open-label studies, a carefully constructed EPEC analysis with blinding of investigators to both investigational intervention and GVHD treatment appears to be the best option for minimizing bias in evaluation of acute GVHD end points.
The authors thank William Fitzsimmons, PharmD, Fujisawa Inc., for permitting the use of the clinical database for the purpose of this analysis and all the investigators who contributed to the original report of this study including J.H. Antin, M.D., V. Ratanatharathorn, M.D., C. Karanes, M.D., J.W. Fay, M.D., B.R. Avalos, M.D., A.M. Yeager, M.D., D. Przepiorka, M.D., S. Davies, M.D., and F.B. Petersen, M.D. We would also like to thank Helen Crawford, Bonnie Larson, Sue Carbonneau, and Connie Chan for their excellent support in preparing the manuscript.
This work was supported in part by grants CA18029, HL36444, and CA15704, NIH, DHHS, Bethesda, MD.