Appropriate control group selection in a randomized controlled trial (RCT) is a critical factor in generating results which are both interpretable and generalizable. Control groups ideally encompass and realistically reflect prevailing medical practices. This goal can be challenging in investigations of standard therapies that are routinely titrated. To eliminate the heterogeneity in clinical practice from the trial design, recent investigations of titrated therapies have randomized patients to fixed-dose regimens. Although this approach may produce statistically significant differences, the results may not be interpretable or generalizable.
In this trial design, randomization disrupts the normal relationship between clinically important characteristics and therapy titration, thereby creating subgroups of patients within each study arm that receive levels of therapy inconsistent with current practices outside of the clinical study. These misaligned subgroups may have worse outcomes than usual care. Practice misalignments can occur in any clinical trial of a preexisting therapy that is typically adjusted based on severity of illness or other patient characteristics.
In this manuscript, we review 3 recent RCTs to demonstrate how practice misalignments can affect the safety, results, and conclusions of RCTs. Furthermore, we discuss methods to prospectively identify potentially important relationships between therapy titration and patient- and disease-specific characteristics. Finally, we review trial design options that may minimize the occurrence and impact of practice misalignments. Since these designs may limit the feasibility of a clinical trial, a thorough characterization of usual care is necessary to determine whether one of these designs to protect patient safety should be used.
Choosing an appropriate control group in a randomized controlled trial (RCT) is critical for generating results which are both interpretable and generalizable. Control groups should closely reflect prevailing medical practices. For conditions with no proven therapy or when withholding therapy is acceptable, use of a placebo control may represent the best design option.1,2 However, for RCTs investigating life-threatening conditions, such as septic shock or acute myocardial infarction, control groups that reflect prevailing clinical practices are required. In these cases, withholding or changing therapies can harm patients and may destroy the external validity of the study.
It is especially challenging to design control groups for investigations of standard therapies that are routinely titrated based on physiologic end-points or patient characteristics. Some authors have argued that including a usual care control group in a clinical trial of a titrated therapy introduces too much heterogeneity into the results, making it difficult to detect a treatment effect and potentially limiting the scientific relevance and feasibility of clinical trials.3,4 Due to these concerns, institutional review boards and other regulatory bodies (e.g., the United States Food and Drug Administration) may ask that clinical trials eliminate treatment titration. Therefore, some trials have been designed to avoid titration altogether and compare 2 or more fixed-dose regimens.3-9 This approach may increase the likelihood of identifying statistically significant differences between study arms. However, such results may not be interpretable or generalizable because the comparator group does not reflect the prevailing practice of therapy titration. In previous publications, we have referred to this potential design flaw as a practice misalignment.10
In trials affected by practice misalignments, randomization disrupts the normal relationship between clinically important characteristics and therapy titration.10 It creates subgroups of patients within each study arm that receive levels of therapy inconsistent with current practices outside of the clinical study.10 In the study arm receiving the low level of fixed therapy, a portion of the patients will receive therapy that is arguably insufficient based on relevant clinical factors, such as severity of disease. Conversely, in the study arm receiving the high level of fixed therapy, a portion of the study subjects will receive excessive therapy compared to clinically similar patients outside of the study. These misaligned subgroups may have a worse outcome than patients receiving usual care and may substantially contribute to outcome differences between the trial arms. As such, trial results will lack external validity and cannot readily be used to inform clinical practice. Compared to physician-titrated care, fixed treatment regimens may improve or worsen outcomes; however, this can only be determined by performing a clinical trial to compare them.
Identifying potential practice misalignments in published clinical trials can be difficult. Within either trial arm, the dose or intensity of treatment may be inappropriate for certain patients based on specific clinical characteristics (e.g., age, weight, severity of illness, presence of diabetes). If practice misalignments are suspected, then statistical analyses can be used to determine whether treatment effects differ in specific patient subpopulations. When sufficient published data are available for analysis or can be obtained from the authors, standard meta-analytic techniques, mixed models or logistic regression can be used to look for differences in treatment effects across subgroups. When sufficient data are unavailable, the impact of practice misalignments cannot be assessed and the trial results may be impossible to interpret.
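As a minimal sketch of the subgroup comparison described above, the following Python snippet tests whether a treatment odds ratio differs between two subgroups using a simple z-test on the difference of log odds ratios (Woolf standard errors). This is only one of several possible approaches and stands in for the meta-analytic, mixed-model, or logistic regression methods mentioned in the text. The function names and all counts are hypothetical and are not taken from any of the trials discussed.

```python
import math

def log_odds_ratio(deaths_a, n_a, deaths_b, n_b):
    """Return (log odds ratio, standard error) for arm A vs. arm B (Woolf method)."""
    a, b = deaths_a, n_a - deaths_a   # arm A: deaths, survivors
    c, d = deaths_b, n_b - deaths_b   # arm B: deaths, survivors
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def interaction_test(subgroup_1, subgroup_2):
    """z-test for whether the treatment odds ratio differs between two subgroups."""
    lor1, se1 = log_odds_ratio(*subgroup_1)
    lor2, se2 = log_odds_ratio(*subgroup_2)
    z = (lor1 - lor2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# HYPOTHETICAL counts: (deaths high-level arm, n high-level arm,
#                       deaths low-level arm,  n low-level arm)
more_severe = (40, 100, 30, 100)
less_severe = (20, 100, 35, 100)

z, p = interaction_test(more_severe, less_severe)
print(f"interaction z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value here would suggest that the direction or size of the treatment effect depends on the subgroup, which is the pattern a practice misalignment is expected to produce. The test assumes no zero cells and large enough counts for the normal approximation.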
Practice misalignments are not limited to any one field of medicine. They can occur in any clinical trial of a preexisting therapy that is typically titrated. In the remainder of this paper, we will re-review 2 previously published examples from the critical care literature and present a new example from the addiction literature to demonstrate how practice misalignments can affect the results and conclusions of RCTs.5,8,10,11 In addition, we will discuss methods to characterize usual care and explore potential trial design strategies that may minimize practice misalignments.
The Canadian Critical Care Trials Group study of transfusion triggers (the Transfusion Requirements in Critical Care [TRICC] trial) randomized critically ill patients to either a liberal (10 g/dL of hemoglobin) or restrictive (7 g/dL) transfusion threshold independent of co-morbidities.8 In this trial, hospital mortality was significantly higher in the liberal strategy group than in the restrictive strategy group (28.1% vs. 22.2%, p = 0.05). However, patient subgroups within each arm of the trial were assigned to treatments that were opposite to routine practices outside of the trial.10
At the time of the TRICC trial, physicians titrated transfusion based on many indicators of health status, including age, severity of illness (as measured by Acute Physiology and Chronic Health Evaluation II [APACHE II] scores), preoperative risk status, presence of shock, presence of coronary ischemia, and presence of anemia.12-14 Accordingly, randomization of a heterogeneous patient population to fixed transfusion thresholds created practice misalignments in both study arms of the TRICC trial. In the restrictive strategy arm, patients with ischemic heart disease were randomized to receive blood transfusions only when their hemoglobin level dropped below 7 g/dL, a transfusion strategy that only 3% of physicians surveyed before the TRICC trial chose for patients with ischemic heart disease.14 In the liberal strategy arm, young, relatively healthy patients were randomized to receive blood transfusions whenever their hemoglobin level decreased below 10 g/dL, a transfusion strategy that only 12% of physicians surveyed before the TRICC trial would have used in these less severely ill patients.14 Previous analyses of this trial demonstrated the effects of these practice misalignments. For patients with ischemic heart disease, a liberal transfusion strategy resulted in a lower 30-day mortality than a restrictive strategy, whereas the opposite pattern was seen in patients without ischemic heart disease (Figure 1A).10 In contrast to its benefits in patients with ischemic heart disease, the liberal strategy was primarily harmful in young, relatively healthy patients (< 55 years old, APACHE II scores ≤ 20) (Figure 1B).10 These misalignments make the results of the TRICC trial difficult to interpret and lessen their ability to inform clinical practice.
The Acute Respiratory Distress Syndrome (ARDS) Network trial of low tidal volume ventilation (ARMA trial) randomized ARDS patients to mechanical ventilation with a tidal volume of either 6 mL/kg or 12 mL/kg.5 In this trial, changing pre-randomization tidal volumes to 6 mL/kg significantly decreased mortality compared to changing them to 12 mL/kg (31% vs. 40%, p = 0.007). However, practice misalignments occurred in both study arms, with subgroups within each arm receiving care that was opposite to routine practices outside of the trial.10
At the time of the ARMA trial, clinical practice was loosely characterized by tidal volume titration based on markers reflective of severity of lung injury, including airway pressures and compliance;15-17 physicians tended to ventilate ARDS patients' lungs with smaller tidal volumes as airway pressures increased and lung compliance decreased.10,18,19 Accordingly, randomization of patients with varying degrees of lung injury to ventilation with fixed tidal volumes created practice misalignments in both arms of the trial. In the 12 mL/kg arm, subjects with severely injured lungs were randomized to receive a relatively high tidal volume, whereas clinicians would be inclined to use lower tidal volumes in these patients. In the 6 mL/kg arm, subjects with less severe lung injury were randomized to receive a relatively low tidal volume, whereas clinicians would tend to use higher tidal volumes in similar patients. A previous analysis demonstrated that the impact of increasing or decreasing tidal volume on mortality in the ARMA trial was dependent on pre-randomization lung compliance (a marker of severity of lung injury) (p = 0.003) (Figure 2).10 In patients with less compliant lungs (compliance < 0.6 mL/cm H2O/kg predicted body weight (PBW)), increasing tidal volume increased mortality compared to decreasing tidal volume (42% vs. 29%). Conversely, in patients with more compliant lungs (compliance > 0.6 mL/cm H2O/kg PBW), decreasing tidal volume increased mortality compared to increasing tidal volume (37% vs. 21%). The results of the ARMA trial were confounded by the presence of practice misalignments that weaken the external validity of the results.
Our last example illustrates that practice misalignments can occur in any field of medicine. Johnson et al. compared the efficacy of 3 different narcotic therapies (levomethadyl acetate, buprenorphine, and high-dose methadone) to a low-dose methadone control group for the treatment of opioid dependence.11 The 3 treatment groups in this trial consisted of a high-dose methadone group (60 mg to 100 mg) representing the “upper range (of) doses generally used in clinical practice” and 2 other treatment groups (levomethadyl acetate and buprenorphine groups) with doses equivalent to 60 to 100 mg of methadone daily.11 Accordingly, all 3 treatment groups in this trial represent high-dose therapy for opioid dependence. In these 3 treatment groups, drug doses were further increased if patients attended scheduled clinic appointments regularly and were still using illicit drugs. In contrast, the low-dose methadone control group received a fixed dose of 20 mg of methadone per day regardless of regular clinic attendance or continued use of illicit drugs. The authors reported significantly higher trial completion rates, fewer opioid-positive urine specimens, less frequent use of illicit drugs, and lower patient-reported ratings of the severity of their drug problem in the 3 high-dose treatment arms compared to the low-dose methadone group.11
To fully understand and interpret these results, usual practices at the time of the trial should be examined. In 1998, a National Institutes of Health consensus conference recommended methadone for the treatment of opiate dependence and recognized that its effectiveness is dependent upon adequate dosage and duration combined with continuity of treatment and accompanying psychosocial services.20 According to a 1995 Institute of Medicine report, a typical patient should begin methadone treatment at a dose of 20-40 mg/day and the dose should then be titrated to the minimum level of methadone necessary to prevent symptoms of opiate withdrawal and to eliminate cravings and the use of illicit drugs.21 During the first week of therapy, patients may require additional daily doses of 5 to 20 mg of methadone 3 to 12 hours after the initial dose to prevent symptoms of opiate withdrawal.21 After the first week, the daily methadone dose is increased by 5-10 mg per week during the next 4 to 8 weeks to reach the minimal dose necessary to achieve the treatment goals (typical dose range: 60-120 mg/day, but individual patients may need more or less).21
Based on data available at the time of this trial, patients in clinical practice would have been started on methadone (20-40 mg/day) with subsequent titration to minimize symptoms of withdrawal and eliminate cravings and the use of illicit drugs. Accordingly, randomization of a heterogeneous group of opioid-dependent patients to fixed treatment protocols may have created practice misalignments in all arms of this trial.11 In the low-dose methadone group, randomization created a practice misalignment by restricting a subgroup of patients to a fixed level of methadone which was insufficient. Not surprisingly, this misaligned subgroup had a low retention rate and a high rate of continued illicit drug use. In the 3 high-dose treatment groups, randomization created a practice misalignment by assigning some patients to excessive levels of therapy because titration was not permitted below a dose equivalent to 60 mg/day of methadone. The potential harmful effects in these misaligned subgroups are supported by the increased rate of adverse events (including treatment side effects leading to study withdrawal, over-medication and hospitalization) in the high-dose treatment groups compared to the low-dose methadone group (Table 1).11 The presence of practice misalignments and the absence of a usual care comparison group make the results of this trial difficult to interpret.10,11
Practice misalignments occur when clinical trials of titrated therapies randomize patients to fixed regimens and thereby create subgroups of patients that receive levels of therapy inconsistent with usual care. These misaligned subgroups may have worse outcomes than usual care and may make the overall results of a trial difficult to interpret. A thorough understanding of clinical practice before designing a study may help alert investigators to the potential for unwanted practice misalignments. As illustrated by our 3 retrospective examples, once a study is completed, it may be possible to detect and understand practice misalignments, but not undo their harm in terms of patient safety or trial validity.
Although potentially difficult to perform, a prospective characterization of usual care is necessary to design trials that can minimize the occurrence and impact of practice misalignments. This process begins with a comprehensive review of all available literature, including previous RCTs, observational and retrospective studies, expert opinion, physician practice surveys, and published expert guidelines, to identify variables known to affect therapy level (Table 2).10 For example, blood transfusion survey data that were collected before the TRICC trial identified important clinical characteristics that affected physician transfusion thresholds.12-14 For mechanical ventilation, published literature suggested that tidal volume was adjusted for severity of lung injury.15-17 For the opioid trial, guidelines were available that recommended frequent titration based on individualized drug habits and signs and symptoms of withdrawal.20
In addition to conducting provider surveys, reviewing published studies, and using guidelines, investigators can analyze institutional historical data to characterize usual care at participating hospitals. This process can determine whether potentially important relationships reported in the literature are also present at individual centers. Within historical cohorts, the effect of categorical variables (e.g., presence of coronary artery disease) on the administered level of a continuous titrated therapy (e.g., hemoglobin level) can be evaluated by visual inspection and common statistical tests. Relationships between categorical variables and categorical levels of a titrated therapy (e.g., on-pump vs. off-pump coronary artery bypass grafting) can be assessed using chi-square or Fisher's exact tests. The impact of continuous variables (e.g., lung compliance or age) on the level of a titrated therapy can be examined using correlation or regression modeling. Patient variables that are identified as significantly related to therapy titration and not accounted for in the study design may lead to practice misalignments.
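The two kinds of screening analysis described above can be sketched in a few lines of Python. The example below computes a Pearson chi-square statistic for a 2x2 table (a categorical patient variable against a categorical therapy level) and a Pearson correlation coefficient (a continuous patient variable against a continuous therapy level). It uses only the standard library, the function names are illustrative, and every number is hypothetical rather than drawn from any historical cohort.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL 2x2 table: rows = coronary artery disease (yes, no),
# columns = transfused at a higher vs. lower hemoglobin level.
chi2, p = chi_square_2x2(30, 10, 15, 45)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")

# HYPOTHETICAL cohort: lung compliance (mL/cm H2O/kg) vs. tidal volume (mL/kg).
compliance = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
tidal_volume = [7.5, 8.0, 9.0, 9.5, 10.5, 11.0]
print(f"Pearson r = {pearson_r(compliance, tidal_volume):.2f}")
```

For small cell counts, Fisher's exact test would be preferred over the chi-square approximation shown here; a statistics package would normally be used for that and for regression modeling.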
A simulated randomization of patients to the proposed arms of a trial using a historical cohort may help identify potentially misaligned subgroups and allow for changes in the trial design before the enrollment of patients. Using patient-level data from the cohort, each patient is “assigned” to each treatment arm. These treatment assignments should then be evaluated to determine whether a comparable level of therapy would be used for the same patients outside of the trial. If therapy received by participants is not comparable to usual care, then practice misalignments will likely occur in an actual trial. The investigators must then decide if it is still informative to proceed. If so, a usual practice control or other measures may be needed to ensure participant safety.
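The simulated randomization described above can be illustrated with a short Python sketch. Each patient in a historical cohort is "assigned" to every proposed arm, and the fixed arm level is compared with the level of therapy the patient actually received. The cohort, the arm definitions, the field names, and the tolerance used to define a "comparable" level of therapy are all hypothetical assumptions made for illustration; in practice the tolerance would have to be justified clinically.

```python
from collections import defaultdict

# HYPOTHETICAL historical cohort: transfusion threshold (g/dL) used in usual care.
cohort = [
    {"id": 1, "ischemic_heart_disease": True, "usual_threshold": 10.0},
    {"id": 2, "ischemic_heart_disease": True, "usual_threshold": 9.5},
    {"id": 3, "ischemic_heart_disease": True, "usual_threshold": 9.0},
    {"id": 4, "ischemic_heart_disease": False, "usual_threshold": 7.0},
    {"id": 5, "ischemic_heart_disease": False, "usual_threshold": 7.5},
    {"id": 6, "ischemic_heart_disease": False, "usual_threshold": 8.0},
]

ARMS = {"restrictive": 7.0, "liberal": 10.0}  # fixed thresholds of the proposed arms
TOLERANCE = 1.5  # ASSUMED largest difference (g/dL) still "comparable" to usual care

def misalignment_rates(cohort, arms, tolerance, subgroup_key):
    """Assign every patient to every arm; return the misaligned fraction
    for each (arm, subgroup value) pair."""
    counts = defaultdict(lambda: [0, 0])  # (arm, subgroup) -> [misaligned, total]
    for patient in cohort:
        for arm, level in arms.items():
            key = (arm, patient[subgroup_key])
            counts[key][1] += 1
            if abs(level - patient["usual_threshold"]) > tolerance:
                counts[key][0] += 1
    return {key: bad / total for key, (bad, total) in counts.items()}

rates = misalignment_rates(cohort, ARMS, TOLERANCE, "ischemic_heart_disease")
for (arm, has_ihd), rate in sorted(rates.items()):
    print(f"arm={arm:<11} ischemic_heart_disease={has_ihd!s:<5} misaligned={rate:.0%}")
```

Subgroups with a high misaligned fraction in a given arm are the ones that would receive therapy inconsistent with usual care if the trial were run as proposed.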
Randomizing historical cohorts to the study arms in our 3 examples may have identified practice misalignments before trial enrollment. In the TRICC trial, randomizing a historical group of intensive care unit patients to the low hemoglobin threshold of 7 g/dL might have uncovered the deviation in care that the trial would cause patients with ischemic heart disease. Conversely, the random assignment of young subjects without shock to the high hemoglobin threshold of 10 g/dL might have alerted investigators to this other group of misaligned patients. For the ARMA trial, a simulated randomization may have raised concerns that randomization would increase the tidal volume of patients with severe ARDS, poor lung compliance, and high airway pressures. Alternatively, other patients with less severe ARDS and relatively high lung compliance would be randomized to low tidal volumes, which might require heavy sedation and predispose these subjects to atelectasis. In the opioid trial, simulated randomization would have demonstrated that heroin addicts with a history of heavy drug use or multidrug use were being randomized to a fixed low-dose methadone regimen without the ability to increase the dose. It may also have demonstrated that physically smaller patients or patients with less heavy heroin use were being randomized to a high dose of methadone without the ability to titrate down. Within each example, misalignments identified during the simulated randomization would have suggested that these variables need to be accounted for in the trial design.
For investigators designing a clinical trial, multiple approaches are available that may minimize the impact of potential practice misalignments (Table 3). In many instances, the best approach is to include a usual care control group managed in a manner consistent with medical practice at the participating hospitals outside of the trial setting. Comparisons between usual titrated care and the other study arms will increase external validity and consequently the generalizability of the results. Although practice misalignments are not eliminated and sample size is likely to increase, this design improves safety monitoring by providing an ability to detect harmful effects relative to usual care.
Another trial design strategy that might minimize the occurrence of practice misalignments is restriction of enrollment to a more homogeneous population. A trial of blood transfusion using this design might have included only younger patients or patients with lower APACHE II scores and randomized them to a hemoglobin transfusion threshold of 7 g/dL or usual care. A tidal volume trial might restrict enrollment to only patients with more severe lung disease and randomize patients to 6 mL/kg or usual care. Treatment for opioid dependence could be studied in a trial that randomized high-intensity addicts to either a high-dose therapy group or usual care beginning with low-dose methadone with titration. These restricted population trials can minimize practice misalignments and improve patient safety monitoring. However, the results will only be applicable to the subpopulations of patients included in the study.
A heterogeneous patient population could be studied in a single trial using a stratification scheme based on important patient/disease characteristics, with matched usual care control groups. In this design, participants are randomized to either the treatment arm or the usual care arm within strata created by clinical characteristics that affect therapy level. For example, the TRICC trial could have stratified patients based on 2 age-based categories and then randomized patients to a hemoglobin transfusion threshold of 7 g/dL or usual care, creating a total of 4 subgroups (age < 55 years and a threshold of 7 g/dL, age < 55 years and usual care, age > 55 years and a threshold of 7 g/dL, and age > 55 years and usual care). The ARMA trial could have measured lung compliance in each patient before randomization and then stratified patients into high or low compliance groups with randomization to 6 mL/kg or usual care. The opioid trial could have similarly stratified patients based on the severity of their addiction. Stratification allows for a heterogeneous patient population to be studied and monitored in separate strata for safety and efficacy with a planned statistical analysis to test for treatment-covariate interactions across the strata. This design allows for improved safety monitoring, with earlier detection of harmful treatment effects within each subgroup. However, this design requires multiple comparisons and may require larger sample sizes to maintain statistical power.
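One common way to implement stratified randomization is permuted-block randomization within each stratum, sketched below in Python. The text does not specify an allocation method, so the block scheme, the class and function names, the block size, and the handling of patients aged exactly 55 years (placed in the older stratum here) are all assumptions made for illustration.

```python
import random

def stratum_of(age, cutoff=55):
    """Map a patient's age to a stratum label (cutoff handling is an assumption)."""
    return "age<55" if age < cutoff else "age>=55"

class StratifiedRandomizer:
    """Permuted-block randomization carried out separately within each stratum."""

    def __init__(self, arms, block_size, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = list(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.blocks = {}  # stratum -> remaining assignments in the current block

    def assign(self, stratum):
        """Return the next arm for a patient in the given stratum."""
        block = self.blocks.get(stratum)
        if not block:  # no block yet, or the current block is used up
            block = self.arms * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            self.blocks[stratum] = block
        return block.pop()

randomizer = StratifiedRandomizer(["threshold_7_g_dL", "usual_care"], block_size=4, seed=2024)
for age in (34, 71, 48, 52, 80, 29):  # HYPOTHETICAL patient ages
    stratum = stratum_of(age)
    print(age, stratum, randomizer.assign(stratum))
```

Because every completed block contains each arm equally often, the treatment and usual care arms stay balanced within each stratum, which supports the per-stratum safety monitoring and interaction analysis described above.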
Alternatively, an adaptive trial design could be used that links titration-dependent characteristics to treatment dose or allows for therapy titration based on a priori rules.22-24 In general, this approach uses an algorithm that adapts treatment to intermediate patient outcomes. For example, in a trial of blood transfusion, patients could be randomized to a hemoglobin transfusion threshold of either 7 or 10 g/dL with the following algorithms: in the 7 g/dL group, if signs of ischemia or active bleeding develop, transfuse to 10 g/dL; for signs of hypoxia, transfuse to 8 g/dL; in the 10 g/dL group, if pulmonary edema develops, decrease to 8 g/dL. The major drawback of this trial design is that it is best suited to compare the effects of the overall algorithms and not the individual segments. Creating algorithms with external validity that are widely accepted may not be possible for many interventions. However, for some titrated therapies, treatment algorithms responsive to patient outcomes may be more relevant to clinical practice than comparing fixed treatment regimens that misalign important patient subgroups.
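The example transfusion algorithm above can be written as a small rule function. The Python sketch below encodes only the rules stated in the example; the function and parameter names are illustrative, and the precedence when several findings coexist in the 7 g/dL arm (the higher target wins) is an assumption, since the example does not address it.

```python
def transfusion_target(assigned_threshold, ischemia=False, active_bleeding=False,
                       hypoxia=False, pulmonary_edema=False):
    """Return the hemoglobin level (g/dL) to apply for a patient, given the
    randomized arm (7 or 10 g/dL) and intermediate clinical findings."""
    if assigned_threshold == 7:
        if ischemia or active_bleeding:
            return 10  # escalate for ischemia or active bleeding
        if hypoxia:
            return 8   # partial escalation for hypoxia
        return 7
    if assigned_threshold == 10:
        if pulmonary_edema:
            return 8   # de-escalate if pulmonary edema develops
        return 10
    raise ValueError("assigned_threshold must be 7 or 10")

# Example: a patient in the 7 g/dL arm who develops signs of ischemia.
print(transfusion_target(7, ischemia=True))
```

Writing the rules down this explicitly makes clear that the trial compares two complete algorithms rather than two fixed thresholds.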
Another trial design that may limit misalignments is broadly known as outcome-adaptive randomization.24 In this design, randomization within subgroups that might be susceptible to treatment-covariate interactions can be adjusted based on outcomes as the trial progresses. The occurrence of practice misalignments is minimized in this design because randomization to harmed subgroups is automatically reduced over time. For example, in the TRICC trial, this design would have led to fewer patients with ischemic heart disease being randomized to the restrictive transfusion threshold arm as the trial progressed. In the ARMA trial, fewer patients with less compliant lungs would have been randomized to the 12 mL/kg arm. In the opioid trial, fewer patients would have been randomized to the fixed low-dose treatment group. This design allows for treatment effects to be detected, but generalizability of the results might still be limited if one of the study arms is not reflective of usual care. In addition, the randomization process is complicated because it uses patient outcomes to alter the probability of assigning patients to certain trial arms. Monitoring subgroups and randomization based on more than 1 or 2 characteristics becomes technically challenging and may limit the feasibility of a trial.
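To make the idea concrete, the Python sketch below shows one possible Bayesian form of outcome-adaptive allocation within a single subgroup. The text does not specify a method, so this is an assumed approach: survival in each arm is given an independent Beta(1, 1) prior, the probability that arm A has the higher survival rate is estimated by Monte Carlo sampling, and that probability (bounded away from 0 and 1 by an assumed floor) is used as the chance of assigning the next patient to arm A. All names and counts are hypothetical.

```python
import random

def prob_a_better(survivors_a, deaths_a, survivors_b, deaths_b, draws=10000, seed=0):
    """Monte Carlo estimate of P(survival rate in arm A > arm B),
    assuming independent Beta(1, 1) priors on each arm's survival rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + survivors_a, 1 + deaths_a)
        rate_b = rng.betavariate(1 + survivors_b, 1 + deaths_b)
        wins += rate_a > rate_b
    return wins / draws

def assign_arm(prob_a, rng, floor=0.1):
    """Assign the next patient, keeping each arm's probability at or above `floor`."""
    prob_a = min(max(prob_a, floor), 1 - floor)
    return "A" if rng.random() < prob_a else "B"

# HYPOTHETICAL interim outcomes within one subgroup:
# arm A has 30 survivors and 10 deaths; arm B has 20 survivors and 20 deaths.
p_a = prob_a_better(30, 10, 20, 20)
print(f"estimated P(A better) = {p_a:.2f}")
print("next assignment:", assign_arm(p_a, random.Random(1)))
```

Running this separately for each subgroup is what makes the bookkeeping grow quickly as more patient characteristics are monitored, which is the feasibility concern raised above.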
Finally, if therapy titration in clinical practice cannot be easily protocolized, then a proportionate alteration in therapy might be compared to a usual care group.10 For example, to test the hypothesis that lower transfusion thresholds lead to improved survival, a trial could randomize patients to usual care or transfusion at a hemoglobin level 1 g/dL lower than the level set at the bedside by the provider team. For a tidal volume trial, patients with ARDS would be randomized to remain at the same tidal volume or to a 10% decrease in their current tidal volume. This design preserves the relationship between therapy titration and patient- and disease-specific characteristics by using the usual care level chosen by the clinician as the starting point. In addition, it should permit adequate safety monitoring and produce results which can be generalized to usual care. Issues that may arise with this trial design include difficulty with blinding and with adherence to the assigned study groups.
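The arithmetic of a proportionate alteration is simple, and the following Python sketch shows how the assigned level is derived from the clinician-chosen level rather than from a fixed value. The function name, the arm labels, and the patient values are hypothetical; the 1 g/dL and 10% changes are the ones given in the examples above.

```python
def proportionate_assignment(usual_level, arm, absolute_change=0.0, relative_change=0.0):
    """Return the therapy level to apply, starting from the clinician-chosen level.

    arm: "usual_care" keeps the clinician's level; any other label applies the change.
    absolute_change: added to the level (e.g., -1.0 g/dL for a transfusion threshold).
    relative_change: fractional change (e.g., -0.10 for a 10% decrease in tidal volume).
    """
    if arm == "usual_care":
        return usual_level
    return usual_level * (1 + relative_change) + absolute_change

# HYPOTHETICAL patients: the clinician-chosen levels differ, and so do the assigned levels.
for bedside_threshold in (7.5, 9.0, 10.0):  # g/dL
    altered = proportionate_assignment(bedside_threshold, "altered", absolute_change=-1.0)
    print(f"bedside threshold {bedside_threshold} g/dL -> study threshold {altered:.1f} g/dL")

for tidal_volume in (400.0, 500.0):  # mL
    altered = proportionate_assignment(tidal_volume, "altered", relative_change=-0.10)
    print(f"current tidal volume {tidal_volume:.0f} mL -> study tidal volume {altered:.0f} mL")
```

Because the change is applied to each patient's own starting point, patients who would receive more therapy in usual care still receive more therapy within the intervention arm.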
Designing a clinical trial of a titrated therapy is difficult. Randomization to artificial fixed treatment protocols simplifies trial design and reduces the required number of patients, but may produce results that fail to protect participant safety and inform medical practice. Although complicated and potentially cumbersome, the characterization of usual care to identify significant relationships between therapy titration and patient- and disease-specific characteristics is critical to designing trials that minimize practice misalignments. Trials that do not include a control group reflective of usual care cannot conclude that the treatments used in these trials are superior to the titrated care provided in clinical practice. In fact, universally applying the recommendations of these trials in clinical practice may lead to worse patient outcomes due to the presence of practice misalignments. The results of the discussed trials demonstrate the importance of titration and the potential pitfalls of using fixed treatment regimens. There are alternative trial designs that may minimize the impact of practice misalignments on trial results and produce generalizable results. However, each of the designs discussed has its own set of benefits and drawbacks and would need to be applied appropriately. We recognize that there are limited research resources available; however, insufficient resources should not be the basis for using trial designs that are potentially unsafe and whose results cannot be used to change practice. Designing adequately powered trials of titrated therapies that are feasible, safe, and valid remains an ongoing challenge to the clinical researcher.
Sources of Funding: This study was funded by the Intramural Research Division of the Critical Care Medicine Department, Clinical Center at the NIH, and by extramural NHLBI grant # K22HL089041-01.
Publisher's Disclaimer: The content of this manuscript represents the opinions of the individual authors and is not the official opinion of the National Institutes of Health, the United States Government, the University of Pennsylvania School of Medicine or the Children's Hospital of Philadelphia.
Conflicts of Interest: None