Currently, no agents are approved by the United States Food and Drug Administration for either prevention or treatment of acute graft-versus-host disease (GVHD). Since the field lacks formal precedents establishing a comparative basis for assessing the efficacy and safety of new investigational agents, the design of trials to demonstrate overall clinical benefit with statistical certainty remains extremely difficult both for academic and industry sponsors. As a step toward addressing this problem, a panel of experts met on two occasions to reach consensus on recommendations for terminology defining a clinically meaningful primary endpoint in studies assessing treatment for acute GVHD. The panel recommended terminology for “very good partial response” that includes both diagnostic and functional criteria. Assessment of response at day 28 after starting treatment is appropriate for the primary end point, but later time points can be considered. Since treatment agents can be designed for use on a single occasion, on repeated occasions as needed, or continuously, durability of response should not be required as a component in the primary end point of the initial trials, although follow-up trials may be needed in order to define the optimal conditions of use and the associated risks and benefits, and sustained response may be essential for optimal clinical benefit.
Progress in the treatment of acute graft-versus-host disease (GVHD) requires appropriate planning, conduct and interpretation of results of clinical trials. Most of the historical studies that have assessed the efficacy of treatment for acute GVHD were sponsored by academic investigators. As a result, clinical practice in the management of GVHD has been based largely on institutional and physician experience, with some consideration of evidence from the literature. Currently, no agents are approved by the United States Food and Drug Administration (FDA) for either prevention or treatment of acute GVHD. Numerous clinical trials with GVHD-related endpoints are in progress, although very few phase III studies have a primary endpoint directly related to treatment of GVHD. Since the field lacks formal precedents that could provide a consistent comparative basis for assessing the efficacy and safety of new investigational agents, the design of trials to demonstrate overall clinical benefit with statistical certainty remains extremely difficult both for academic and industry sponsors.
The challenges inherent in assessing response to treatment of acute GVHD in the context of the complex and variable manifestations of the disease suggest the need for a more standardized and clinically meaningful approach to clinical trial design. Such guidance would benefit regulatory agencies, the transplant community, sponsors, and ultimately the patients for whom these new treatments are intended. A similar effort for chronic GVHD has resulted in the publication of a series of consensus documents describing unified recommendations for the diagnosis, staging, and response criteria for chronic GVHD. This effort was sponsored by the National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-Versus-Host Disease [3–8].
As an initial step in addressing clinical trial design for acute GVHD, a panel of experts met on two occasions to reach consensus on recommendations for terminology defining a clinically meaningful primary endpoint in studies assessing treatment for acute GVHD. The goal was to develop criteria for treatment success that are sufficiently flexible to allow interpretation according to institutional protocol and physician experience while minimizing subjectivity and bias to achieve sufficient consistency of response for regulatory approval.
A regulatory approval pathway is clearly needed for products intended for treatment of acute GVHD. Such pathways have already been established for products in other therapeutic areas such as oncology and autoimmune diseases. The overall goal of clinical trials is to provide direct evidence of clinical benefit for a treatment. Although improved survival would provide persuasive evidence of benefit in a GVHD treatment trial, experience has shown that successful control of GVHD does not necessarily lead to improved survival. For example, a recent study by Levine et al. showed that despite impressive differences in day 28 response rates after treatment of acute GVHD with etanercept plus steroids compared to steroids alone, survival differences were observed among patients who had related donors but not among those with unrelated donors. Among patients with related donors, the difference in survival between the two treatment groups was much smaller than the difference in response rates. In GVHD treatment trials, differences in the magnitude of response and survival effects are likely related to complications such as infection, regimen-related toxicity, recurrent malignancy, and pre-existing conditions unrelated to GVHD. Even though most GVHD treatments are not likely to produce a survival benefit, survival remains an appropriate secondary end point to consider in acute GVHD treatment trials.
Although prolonged survival is considered the most reliable end point with clinical benefit in oncology trials, the FDA has accepted non-survival end points such as tumor response rates as the basis for both regular and accelerated approval. In studies of patients with serious or life-threatening diseases, accelerated approval status permits the use of non-survival end points if they are reasonably likely to provide clinical benefit. Post-marketing studies are usually required to confirm clinical benefit [10, 11]. From January 1990 to November 2002, 68% (39 of 57) of regular approvals and all accelerated approvals for oncology drugs were based on non-survival end points.
Regulatory approval pathways based on non-survival end points have been established for products in autoimmune diseases that have some similarity to GVHD, including Crohn’s disease, rheumatoid arthritis and systemic lupus erythematosus. For these chronic inflammatory diseases characterized by episodes of flares and remissions, the goals of treatment are to control inflammation and suppress disease activity. The first biologic (infliximab) for Crohn’s disease was approved in 1998 for reduction of signs and symptoms in patients with moderate to severe active disease. In 2002, a supplemental filing was approved for inducing and maintaining clinical remission of Crohn’s disease. Thus, infliximab was first approved based on induction of clinical response, whereas repeated therapy and maintenance of remission were assessed in a subsequent trial [13, 14].
Treatment success in clinical studies of autoimmune diseases is not predicated on producing complete response (CR) or remission, but on demonstrating improvement in a validated score or index based on a set of established measures of activity in diseases such as rheumatoid arthritis, systemic lupus erythematosus [16, 17] and Crohn’s disease. These indices have been periodically reviewed and updated as better understanding of disease pathophysiology and new treatments evolve (Table 1). A disease index or score, however, might not be appropriate for treatment trials in acute GVHD, since expectations for acute GVHD differ from those for chronic autoimmune diseases. In autoimmune disease, mortality is not a key issue, whereas death is an appreciable risk with GVHD. Furthermore, a disease activity score is applicable for extended periods of time in patients with autoimmune diseases but for only a short period of time in patients with acute GVHD. Typically, GVHD has one of three outcomes: death, progression to chronic GVHD, or complete resolution within a period of 4 to 10 weeks. In most cases, manifestations do not persist for longer periods of time without progression to chronic GVHD. Therefore, control of GVHD manifestations measured primarily as the response and secondarily as the durability of the response might have the greatest impact in determining these three possible outcomes.
The close relationship between acute and chronic GVHD and the lack of an accepted severity index complicate the measurement of outcomes in GVHD treatment trials. The introduction of non-myeloablative conditioning regimens has highlighted some of the difficulties in distinguishing acute and chronic GVHD . Although acute GVHD is often associated with the development of chronic GVHD, experts agree that acute and chronic GVHD should be viewed as separate diseases, despite the extensive overlap in signs, symptoms, and management strategies [20–22]. Currently, no single accepted standard has been established for grading the severity of skin, liver, and gastrointestinal tract involvement in acute GVHD, although most institutions accept and use the Seattle grading system  modified to omit performance criteria, the consensus criteria  or the International Bone Marrow Transplant Registry  grading system to assess the peak severity of GVHD. These scales represent static measurements that do not necessarily correlate with treatment outcome or mortality .
In clinical trials, the assessment of response to treatment requires comparison between 2 sets of measurements—baseline and a designated time of endpoint measurement. In some studies, a change of GVHD grade has been used to assess response. One problem with this approach is that trivial changes in GVHD manifestations can cause a change of grade that should not be scored as success in a clinical trial (e.g., a decrease in body surface area affected by rash from 55% to 45%, corresponding to a decrease from grade II GVHD to grade I GVHD). In principle, this problem can be circumvented by defining the endpoint as CR or a 2-grade reduction, rather than a 1-grade reduction. As an alternative, a graft-versus-host disease activity index (GVHDAI) that predicts non-relapse mortality at day 200 among patients with GVHD could be used to measure response in GVHD trials . This index includes weighted consideration of the total serum bilirubin concentration, oral intake, need for treatment with prednisone and overall performance score. A disadvantage of this index, however, is that the bilirubin and performance components are not necessarily related to GVHD, although it should be noted that the original description of the GVHD grading scale included undefined criteria of “mild,” “moderate,” or “extreme” decrease in clinical performance [23, 28]. Later scales omitted these less precise performance criteria.
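The grade-change problem described above can be illustrated with a minimal sketch. It is not taken from any protocol: the skin-stage thresholds (under 25%, 25% to 50%, and over 50% of body surface area) and the skin-only grade mapping are assumptions based on commonly cited consensus staging, chosen so that they reproduce the 55% to 45% example in the text. The function names are hypothetical, and liver and gut involvement are deliberately ignored.

```python
def skin_stage(bsa_percent: float) -> int:
    """Skin stage from the percentage of body surface area (BSA) with rash.

    Thresholds are assumptions for illustration, not values from this text.
    Stage 4 (erythroderma with bullae) cannot be derived from BSA alone
    and is therefore not modeled here.
    """
    if bsa_percent <= 0:
        return 0
    if bsa_percent < 25:
        return 1
    if bsa_percent <= 50:
        return 2
    return 3


def overall_grade_skin_only(bsa_percent: float) -> int:
    """Overall grade assuming skin is the only involved organ (assumption)."""
    stage = skin_stage(bsa_percent)
    if stage == 0:
        return 0
    return 1 if stage <= 2 else 2


def is_response(baseline_grade: int, endpoint_grade: int, min_drop: int) -> bool:
    """Response = complete resolution (grade 0) or a drop of >= min_drop grades."""
    if baseline_grade <= 0:
        raise ValueError("baseline grade must be at least 1")
    if endpoint_grade == 0:
        return True
    return baseline_grade - endpoint_grade >= min_drop


baseline = overall_grade_skin_only(55)  # rash on 55% of BSA at baseline
endpoint = overall_grade_skin_only(45)  # rash on 45% of BSA at the endpoint

# A 1-grade rule scores this small change as a response; a rule requiring
# CR or a 2-grade reduction does not.
print(is_response(baseline, endpoint, min_drop=1))
print(is_response(baseline, endpoint, min_drop=2))
```

Under these assumed thresholds, a 10-point change in affected body surface area crosses a stage boundary, which is why the stricter rule (CR or a 2-grade reduction) filters it out.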
The eligibility criteria for enrollment in GVHD treatment trials are often specified according to the grade of GVHD, but the decision to treat acute GVHD does not correspond completely with the overall grade of the disease. In general, a “watch and wait” approach is taken with patients with grade I acute GVHD, but in some cases, it may be justified to treat grade I GVHD if clinical circumstances offer reasons to expect rapid progression to more severe disease. Treatment is generally indicated for patients with grades II to IV disease, although grade II GVHD can leave room for clinical judgment. For example, stable GVHD with mild rash involving more than 50% of the body surface without liver or gastrointestinal involvement does not necessarily require systemic treatment.
The design of clinical trials, whether sponsored by academic institutions or industry, should be guided by regulatory principles as set forth in the FDA Good Clinical Practice guidelines and Title 21 of the Code of Federal Regulations, Part 314.126. In part, unbiased compilation and reporting of data should be based on selection of a primary end point that demonstrates clinical benefit, with adequate statistical power to discriminate between study arms for the primary end point.
A review of selected randomized trials for the treatment of acute GVHD shows variation of primary end points and timing of assessment (Table 1). Although a CR, defined as resolution of all signs and symptoms of acute GVHD, might represent the ultimate goal for individual patients, the consensus group agreed that CR may be too stringent as a primary endpoint in treatment trials. Protocols that seek complete disappearance of all GVHD manifestations as a primary end point may actually discourage or delay appropriate tapering of concomitant steroid treatment among patients with GVHD. At the same time, the group agreed that a partial response (PR) is insufficient for approval. Control of serious GVHD manifestations with some durability to permit reduction in steroid doses or other immunosuppressive therapy may reflect the true clinical goals of successful therapy.
The group sought to initiate an iterative process that would identify a primary clinical trial endpoint to permit better comparisons between treatment arms. The term “very good partial response” (VGPR) has been used to indicate a response that approximates CR, i.e., a CR with qualifications, or a functional CR. Precedent for the use of VGPR has been established within the oncology community, where the term is now an accepted addition to the uniform clinical response criteria for multiple myeloma and is used as an end point in clinical trials [32, 33]. The International Myeloma Working Group introduced VGPR to identify patients with excellent responses who may have outcomes similar to those observed among patients with CR . VGPR differs in a fundamental way from PR, despite the similarity in terminology for the two outcomes. PR indicates an improvement from baseline and does not consider whether the response approximates a CR. In contrast, VGPR indicates any improvement that approximates CR, which would always qualify as a PR if more than trivial manifestations of GVHD were present at baseline, although the criteria for VGPR do not explicitly consider whether the magnitude of improvement from baseline is sufficient to qualify as a PR.
The group agreed on working terminology for VGPR for acute GVHD as shown in Table 2. The terminology incorporates both diagnostic and functional criteria and is intended for use with adult patients. The statements are framed generally according to the acceptable limits of any manifestations that might be related to GVHD.
Agreement on the terminology describing VGPR was based on the potential comfort level with continued tapering of steroid treatment. A degree of subjectivity in assessing functional criteria provides some flexibility in interpretation. VGPR generally justifies continued tapering of steroid treatment, while PR might not.
Several criteria were excluded because they lack specificity for GVHD or because they are heavily influenced by physician or institutional preferences. For example, requiring the absence of parenteral nutrition was considered too stringent, since parenteral nutrition can be used to supplement oral intake in some patients who are otherwise doing well or in patients who have oral intake limited by factors other than GVHD. The group recognized that special considerations for pediatric patients should include allowances for decreased oral intake attributed to factors other than GVHD, such as food aversion, gastric hypomotility or the effects of pretransplant diseases such as leukodystrophy syndromes. Even though oral intake at less than 40% of daily caloric needs has been associated with a higher risk of mortality , the terminology does not include a specific level of oral caloric intake as a criterion for VGPR, because medical records do not routinely include estimates of oral caloric needs and oral caloric intake. Stable body weight was rejected as a criterion, since weight loss can accompany improvement in liver and gut function in patients with hypoalbuminemia, and overall performance status was not included due to lack of specificity for GVHD.
The group endorsed the assessment of response on day 28 as a conventional time point for the primary endpoint in GVHD treatment trials, since previous studies have shown that day 28 appears to yield the best discriminatory data for GVHD response when an accelerated response to treatment is anticipated [9, 35, 36]. Assessment of the primary endpoint at a single specified time point has the advantage that sample size can be easily estimated from binomial distributions. An additional advantage is the simplicity of describing the clinical benefit for individual patients. Time to event measurements, however, remain useful in exploratory phase II studies where the timing of maximal differences in response rates between groups cannot be anticipated. Time to event analysis can also be incorporated as a secondary end point in phase III studies.
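As a rough illustration of why a single-time-point binary endpoint simplifies planning, the sketch below estimates the number of patients per arm needed to compare two response rates. It uses the standard normal-approximation formula for two independent binomial proportions with a two-sided test and equal allocation; this is one common textbook method, not one prescribed by the consensus group. The response rates of 50% and 70% are hypothetical values chosen only for the example.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect p_control vs p_treatment.

    Normal-approximation formula for two independent binomial proportions,
    two-sided significance level alpha, equal allocation between arms.
    """
    if not (0 < p_control < 1 and 0 < p_treatment < 1):
        raise ValueError("response rates must lie strictly between 0 and 1")
    if p_control == p_treatment:
        raise ValueError("response rates must differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treatment * (1 - p_treatment))
    ) ** 2
    return math.ceil(numerator / (p_treatment - p_control) ** 2)


# Hypothetical day 28 response rates: 50% with control, 70% with the new agent.
print(sample_size_per_arm(0.50, 0.70))
```

Smaller anticipated differences in day 28 response rates increase the required sample size sharply, which is one practical reason the choice of assessment time point matters.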
The use of a specified time point for assessment of the primary end point might depend on whether the agent is intended for use on a single occasion, on repeated occasions as needed, or continuously until either response is observed or until it is determined that a response is not likely. The group concluded that the clinical benefit of a new investigational agent should be evaluated first by demonstrating an improved response rate at a single time point, without including durability of response as a component in the primary end point. Durability should be included as a secondary observational end point [37, 38], so that the data can be used to construct a hypothesis to be tested as the primary end point in a subsequent trial to evaluate the benefits and risks of ad hoc variation in the schedule or dose of administration according to the initial response. Durability of response, of course, may be greatly influenced by concomitant steroid treatment or by other therapy.
Since the efficacy of corticosteroids as first line treatment for acute GVHD has been established [35, 39–44], it would be very difficult to design a clinical trial without including steroids in the treatment regimen. Steroids can have many well-recognized side effects, but the relationship of steroid dose and treatment duration with potential harms has been only partially defined. End point definitions that require CR might discourage investigators from fulfilling their ethical mandate to minimize steroid-related side effects by using the lowest effective steroid dose. Potentially additive or synergistic side effects between investigational agent and steroids must be considered in assessing results of clinical trials. Assessments must account for the contribution of steroids to the response, particularly in demonstrating that any favorable treatment effect was not merely a consequence of longer duration of steroid treatment or higher cumulative steroid dose in the investigational arm compared to the control arm.
A lack of consensus among investigators and centers regarding the starting steroid dose and subsequent rate of taper has caused problems with add-on study designs. The starting dose of prednisone or methylprednisolone is usually 1 or 2 mg/kg per day, depending on the severity of GVHD manifestations and the degree of steroid-related toxicity. The starting dose is particularly controversial among patients with different subsets of stage II GVHD. Protocols should allow some degree of case-by-case physician discretionary judgment, since patients vary in their response to steroids and in their ability to tolerate steroids.
A crucial question is whether the steroid dose and the taper schedule should be mandated in the study protocol: a mandated dose might be beneficial for the trial, but not for the patient. Another question is how long to maintain the starting dose before initiating the taper (e.g., 7 days or 14 days), or whether this should again be left to the physician’s decision. Strictly mandated taper rates are bound to cause protocol deviations, but too much variation of steroid dose could lead to a regression to the mean between study arms, which could dampen the observed contribution of the study product to the response.
Protocols should provide guidance regarding the use of steroids during the trial in an effort to balance the ideal of minimizing the steroid dose versus the ideal of standardizing the treatment regimen. In trials where steroid management is tightly controlled, steroid dose should not be included as a component in the definition of response for individual patients. In trials where the steroid management is not tightly controlled (e.g., when steroid-sparing claims are being assessed) steroid dose could be incorporated as part of the primary endpoint. For example, a remission would have uncertain value if the steroid dose remains at the starting dose on day 28. On the other hand, excessively rapid tapering of steroid doses increases the risk of not having a response at day 28. Limits on the maximum or minimum steroid doses at specified intervals, and particularly at the time for assessment of the primary endpoint, may be important for valid comparisons.
The management of flares during taper of steroid doses is an important issue to address in the protocol. To a large extent, decisions regarding any increase in steroid dose or resort to other immunosuppressive agents must be left to the discretion of the physician who is responsible for the care of the patient. In any case, the group agreed that a response should not count in cases where systemic agents other than those administered at the beginning of enrollment in the trial were given to control GVHD before the assessment day designated for the primary end point.
Demonstration of the safety of the investigational agent is an integral part of the clinical trial design. Although most efficacy studies are not powered to prove the absence of major harmful effects, a statistically significant excess of deaths, non-relapse mortality, recurrent malignancy, infections or serious adverse events in the investigational arm would make regulatory approval very difficult, if not impossible. Safety data between study arms should be monitored by a Data and Safety Monitoring Board, so that appropriate actions can be taken if substantial differences are observed between study arms, as exemplified by the study by Lee et al. Safety monitoring beyond the date of the primary endpoint assessment (e.g., 3 or 6-month survival with a day 28 primary endpoint) would be important in judging the clinical value of the primary endpoint.
The following position statement summarizes the consensus recommendations and opinions regarding clinical endpoints in treatment trials for acute GVHD.
A survival difference between the investigational and control arms of a GVHD trial would provide persuasive evidence of clinical benefit, but prior experience indicates that even large differences in response rates might not be sufficient to provide a survival benefit among patients with acute GVHD. As an alternative, a very good partial response (VGPR) that closely approximates complete resolution of disease manifestations has considerable appeal as a primary end point in clinical trials for treatment of acute GVHD (Table 2).
Results of previous studies have suggested that assessment of response at day 28 after starting treatment is appropriate for the primary end point, but later time points can be considered. Earlier time points might not allow a sufficient interval for optimal responses, and substantially later time points carry increasing risks of confounding by development of chronic GVHD.
Since treatment agents can be designed for use on a single occasion, on repeated occasions as needed, or continuously, durability of response should not be required as a component in the primary end point of the initial trials but is an important secondary endpoint for all new therapies. Follow-up trials may be needed in order to define the optimal conditions of use and the associated risks and benefits.
Inclusion of steroid dose as a component in the criteria for VGPR may be considered if results of the study are intended to support steroid-sparing claims. Otherwise, steroid dose should serve as a group assessment rather than an individual patient assessment. If a study product is highly effective, steroid doses could be lower in the investigational arm than in the control arm, but claims of efficacy would not be credible if steroid doses were higher in the investigational arm than in the control arm. Claims of steroid-sparing benefit must be supported by demonstrated overall clinical benefit in order to ensure that the benefits related to reduction of steroid exposure are not overwhelmed by side effects of the investigational agent.
VGPR attained after systemic treatment that is not prescribed by the protocol, and VGPR followed by death before the day designated for assessment of the primary end point should not be categorized as success for the primary endpoint. The VGPR category should allow administration of secondary systemic therapy after the day designated for assessment of the primary end point, but the proportions of patients needing secondary therapy and the time distributions of these events should be included as secondary end points. Protocols should encourage tapering of steroid doses at a rate that is commensurate with resolution of GVHD manifestations, in order to minimize the risk of steroid-related complications. The VGPR category should allow escalation of steroid doses to regain control of GVHD before or after assessment of the primary endpoint, but the frequency of dose escalation should be included as a secondary endpoint. Protocols should provide criteria and guidelines for dose escalation and subsequent tapering of steroid doses.
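The rules in the preceding statement can be sketched as a small classification routine. This is a hypothetical illustration, not an analysis plan endorsed by the group: the record fields, the function names, and in particular the assumption that CR also counts as success for a VGPR-based primary endpoint are all introduced here for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: CR satisfies a VGPR-based primary endpoint as well as VGPR itself.
SUCCESS_CATEGORIES = {"CR", "VGPR"}


@dataclass
class PatientOutcome:
    """Hypothetical per-patient record for the primary endpoint analysis."""
    response_on_assessment_day: str          # "CR", "VGPR", "PR", or "NR"
    alive_on_assessment_day: bool
    non_protocol_therapy_before_assessment: bool
    steroid_escalation: bool = False         # allowed; reported as secondary
    secondary_therapy_after_assessment: bool = False  # allowed; secondary


def primary_endpoint_success(p: PatientOutcome) -> bool:
    """Apply the exclusions described in the position statement."""
    if not p.alive_on_assessment_day:
        return False  # response followed by death before assessment day
    if p.non_protocol_therapy_before_assessment:
        return False  # response attained after non-protocol systemic therapy
    return p.response_on_assessment_day in SUCCESS_CATEGORIES


def summarize(patients: List[PatientOutcome]) -> Dict[str, float]:
    """Primary success rate plus the secondary endpoints named in the text."""
    n = len(patients)
    if n == 0:
        raise ValueError("no patients to summarize")
    return {
        "primary_success_rate": sum(primary_endpoint_success(p) for p in patients) / n,
        "steroid_escalation_rate": sum(p.steroid_escalation for p in patients) / n,
        "secondary_therapy_rate": sum(p.secondary_therapy_after_assessment for p in patients) / n,
    }


cohort = [
    PatientOutcome("VGPR", True, False, steroid_escalation=True),
    PatientOutcome("VGPR", True, True),    # non-protocol therapy: not a success
    PatientOutcome("CR", False, False),    # died before assessment: not a success
    PatientOutcome("PR", True, False, secondary_therapy_after_assessment=True),
]
print(summarize(cohort))
```

In this sketch, steroid dose escalation and secondary therapy given after the assessment day do not change the primary classification; they are only tallied as secondary endpoints, in keeping with the statement above.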
In clinical practice, either VGPR or CR can justify gradual withdrawal of immunosuppressive treatment. The use of VGPR as the primary endpoint in clinical trials, particularly if it is durable, addresses problems potentially associated with the use of CR as the primary endpoint. One concern is that it is not always possible to determine whether hepatic and gastrointestinal abnormalities are caused by GVHD or other complications. The use of VGPR as an endpoint makes some allowance for minor clinical abnormalities that might not be caused by GVHD, whereas the use of CR does not. A second concern is that overzealous immunosuppressive treatment given to produce CR could have detrimental effects on survival . The use of VGPR as an endpoint follows from the belief that the potential harm of giving more treatment than is really needed to produce or maintain CR exceeds the harm of slight under-treatment that may be associated with VGPR.
VGPR clearly cannot be used as the sole outcome measure in GVHD treatment trials. In reality, multiple endpoints are necessary in evaluating the overall results of treatment for acute GVHD. CR should be reported in order to facilitate comparisons with control groups and results of previous studies. Likewise, results should report flares of acute GVHD after CR or VGPR, need for secondary therapy, chronic GVHD, non-relapse mortality, recurrent malignancy and overall survival at standardized time points (e.g., 6 months and 1 year) in order to evaluate the overall effects of treatment and to facilitate comparisons between studies.
The consensus meetings were held on September 11, 2007 and on May 23, 2008 and were sponsored by Osiris Therapeutics, Inc., Columbia, MD. Osiris provided honoraria to the participants (except P.J.M.) but did not contribute to the meeting output or to the content of this manuscript.
Financial Disclosure Statements
P.J.M., research support from Osiris Therapeutics; C.R.B., research support from Osiris Therapeutics; H.-G.K., independent safety monitor for a clinical trial sponsored by Osiris Therapeutics and owns < $50,000 of Osiris stock; M.W.S., research support from Osiris Therapeutics; D.W., research support from Amgen, Hospira, Eisai, Roche, Genzyme and honoraria for advisory board participation from Genzyme, Dor, Hospira, and Osiris; R.J.S., P.L.M., P.S., J.P.U., N.J.C., P.W., E.J.S. and M.L.M., no additional relevant financial disclosures.