Clinical trials that test interventions for symptom management must target patients whose symptoms are severe and can benefit from participation. Screening symptoms for their severity prior to trial entry may be an important element of trial design. This research describes the utility of screening for severity of symptoms prior to entry into clinical trials for symptom management in cancer. To accomplish this, 601 cancer patients undergoing chemotherapy were assessed at screening and the initial intervention contact, using the 0-10 rating scale for severity of nine symptoms. Post-test probabilities and likelihood ratios were estimated across cut-offs in screening severity scores. Areas under Receiver Operating Characteristic curves for reaching threshold of four at the initial intervention contact were estimated by a nonparametric method. It was found that screening severity scores were good predictors for identifying patients who would not reach threshold but did not always accurately predict patients who would. The cut-offs between 2 and 4 on a 0-10 scale could be used to identify patients that might benefit from receipt of interventions. For all symptoms, the likelihood ratios were greater than one across possible screening cut-offs. The findings indicate that decision rules based on screening prior to entry into cancer symptom management trials can provide reasonable discriminative accuracy by differentiating among patients who are likely to reach higher levels of severity later in the trial from those who are not. Optimal severity cut-offs can be established based on likelihood ratios and desired sensitivity and specificity.
Numerous pharmacological and behavioral interventions to manage symptoms experienced by cancer patients during treatment have been designed and tested. For selected symptoms (pain, fatigue and distress), National Comprehensive Cancer Network guidelines suggest monitoring symptoms with a severity of 1 to 3 on a 0-10 rating scale and intervening for symptoms at or above the threshold of 4 in severity (1). When multiple symptoms are present, guidelines have not been developed for when to monitor symptoms and when to initiate symptom management interventions for each symptom. Screening for symptom severity prior to enrollment into clinical trials may be used to ensure that research resources are directed at patients who are in need of symptom management and can benefit from the interventions delivered. However, symptom screening faces several challenges. First, it is argued that symptoms are ephemeral, lasting from hours to weeks. Second, symptoms differ dramatically in their severity and duration. Symptoms reported when patients are screened for entry into a trial may be present at less severe levels, or not present at all, when patients begin the trial; patients may then no longer perceive a need to participate, or may be experiencing different symptoms from those that are the focus of the intervention. As a result, the integrity of symptom management trials may be compromised. For example, trials that seek to detect significant reductions in pain or fatigue, and that are powered on the assumption that all cases have that symptom, may have inadequate sample sizes if a substantial number of patients fail to report the symptom at the first intervention contact. Similarly, trials of interventions that depend upon symptoms being at a minimum severity to be effective will have inadequate power to detect differences if too few patients report the pre-specified level of symptom severity at the onset of the intervention.
When the literature is assessed, differences emerge regarding the nature of symptoms. For example, Kroenke (2) suggests that symptoms are of varying and often undeterminable duration, and Komaroff (3) cites literature suggesting that there may be a psychosomatic component to symptom expression. Kollar et al. (4) found that symptoms were positively associated with negative affect and negatively associated with social desirability measures. Countering these perspectives are arguments that symptoms, particularly cancer-related ones, are sustained over time. Given et al. (5) report sustained but varying levels of pain and fatigue over the course of a year following the diagnosis of solid tumor cancers. Among lung cancer patients, symptoms appear to persist over time, with variations according to treatment modality (6). Among patients receiving palliative care (7), observed symptom scores decreased as time from diagnosis increased. The increased severity of certain symptoms as patients undergo treatment is well documented (8). Large bodies of literature have reported on how nausea and vomiting follow administration of some chemotherapies, and other literature has documented the presence of anticipatory nausea and vomiting (9). Lowered hemoglobin levels and absolute neutrophil counts strongly suggest that certain chemotherapeutic agents are responsible for fatigue, weakness, insomnia and other symptoms. Finally, new targeted agents probably produce severe and specific symptoms such as rash, fever, and joint pain (10).
Many symptom management trials focus on a single symptom (11-14) or groups of symptoms (15-18). This study draws on data from two symptom management trials directed toward the management of multiple cancer- and treatment-related symptoms to inform the design of future trials that focus on specific symptom(s) or groups of symptoms. Two questions guide this research. First, for each of nine cancer-related symptoms, what is the utility of screening for predicting need for treatment as defined by National Comprehensive Cancer Network (NCCN) guidelines for symptom management (1)? Second, what severity cut-offs may be selected at screening to predict a symptom status that may require an intervention in a clinical trial?
This research will inform the design of future symptom management trials by evaluating the utility of screening of nine common cancer- and treatment-related symptoms for reaching or exceeding predefined NCCN symptom severity guidelines and thus suggesting a need for management.
This research is based on data from two randomized clinical trials comparing a cognitive behavioral intervention with education- and information-only strategies delivered to assist cancer patients with the management of their multiple symptoms while undergoing chemotherapy.
The Institutional Review Board (IRB) of the sponsoring university and the IRBs of two comprehensive cancer centers, one community cancer oncology program, and six hospital-affiliated community oncology centers approved this research. Through subcontracts, nurses from the respective clinical trials offices were hired and trained to implement the recruitment protocol. To be eligible, patients had to meet the following requirements: 1) be 21 years of age or older, 2) have a diagnosis of a solid tumor cancer or non-Hodgkin’s lymphoma, 3) be undergoing a course of chemotherapy, 4) be able to speak and read English, and 5) have a touchtone telephone. Patients who met these requirements and were willing to participate signed consent forms. For all participating patients, recruiters entered socio-demographic information into a web-based tracking system. Prior to entry into the trials, patients were screened for symptom severity using an automated voice response version of the M. D. Anderson Symptom Inventory (19). All patients were called twice weekly for up to six weeks. Since the two trials were targeting multiple symptoms, a score of 2 or higher on any symptom was selected as the cut-off for entry, based on guidelines for symptom monitoring. Only two patients did not meet this criterion; that is, all but two patients had at least one symptom at 2 or higher in screening. Therefore, complete data are available on patients with varying levels of severity of multiple symptoms at screening.
Upon entering the trial, patients had a baseline interview and received a copy of the Symptom Management Guide (SMG). In one trial, patients were randomized either to a nurse-administered arm or to an arm conducted by a non-nurse coach trained to assess patients’ symptoms and to refer them to the SMG for each symptom rated at 4 or higher in severity. A severity of 4 or higher is defined by NCCN guidelines as the threshold for the need for symptom management. In the second trial, patients were randomized either to a nurse-administered arm or to an interactive voice response arm that assessed patients’ symptom severity and, for symptoms at or above the threshold of 4, referred them to the SMG. In the nurse-administered arms of both studies, nurses delivered up to four strategies for each symptom, supplemented with references to the SMG. In all four arms, patients had six contacts over eight weeks. Data from the initial intervention contact, collected prior to the delivery of any intervention strategies, are used in this study. We included a total of 601 patients. Figure 1 summarizes the flow of patients from eligibility through randomization and completion of their initial intervention contact. The interval from screening to baseline interview was on average 7.3 days, and the interval from baseline interview to the initial intervention contact was on average seven days.
Age, gender, site and stage of cancer were obtained from the patients’ medical records, entered into the tracking system, and confirmed at the baseline interview. Comorbid conditions were assessed at baseline using a 13-item questionnaire, which asked patients whether a health care provider had ever told them that they had such conditions as diabetes, high blood pressure, or other chronic diseases.
At screening, baseline interview, and each intervention contact, the severity of nine symptoms commonly associated with solid tumor cancers and their chemotherapy was measured. The symptom list included pain, fatigue, nausea, insomnia, shortness of breath, memory loss, poor appetite, dry mouth, and numbness and tingling. Patients were asked to rate the severity of each of their symptoms in the past 24 hours, with 0 being not present and 10 being the worst it could be.
For each of the nine symptoms, several indicators were used to assess the utility of the screening severity score as a predictor of whether a patient would report a severity of 4 or higher at the initial intervention contact. For each symptom, pre-test probabilities were estimated as the proportions of patients who reached the threshold of 4 at the initial intervention contact out of all patients. Post-test probabilities were estimated as the proportions of those who reached the threshold of 4 at the initial intervention contact out of those who scored above a specified cut-off in screening. Cut-offs of 1-6 were investigated.
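The two proportions defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names and the eight-patient data set are hypothetical, and "above a specified cut-off" is assumed here to mean a screening score at or above the cut-off.

```python
from typing import Sequence

THRESHOLD = 4  # severity threshold for needing symptom management


def pre_test_probability(initial: Sequence[int], threshold: int = THRESHOLD) -> float:
    """Proportion of all patients at or above threshold at the initial contact."""
    return sum(score >= threshold for score in initial) / len(initial)


def post_test_probability(
    screening: Sequence[int],
    initial: Sequence[int],
    cutoff: int,
    threshold: int = THRESHOLD,
) -> float:
    """Proportion reaching threshold among patients who screened at or above cutoff.

    Assumption: "above the cut-off" is read as score >= cutoff.
    """
    selected = [i for s, i in zip(screening, initial) if s >= cutoff]
    if not selected:
        return float("nan")  # nobody screened at or above this cut-off
    return sum(i >= threshold for i in selected) / len(selected)


# Hypothetical 0-10 severity scores for eight patients (not study data).
screening_scores = [0, 1, 2, 3, 3, 5, 6, 8]
initial_scores = [0, 0, 4, 2, 5, 4, 3, 7]

print("pre-test:", pre_test_probability(initial_scores))
for cutoff in range(1, 7):  # cut-offs 1-6, as in the text
    print(cutoff, post_test_probability(screening_scores, initial_scores, cutoff))
```

Running one symptom at a time through these functions would give one row of a pre-test/post-test table per cut-off.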
Discrimination of the symptom screening was quantified using likelihood ratios, defined as the ratios of post-test to pre-test odds of reaching a threshold of 4 at the initial intervention contact. Likelihood ratios (LRs) and their 95% confidence intervals (CI) were calculated for each symptom and for possible cut-offs of 1-6 in screening. A likelihood ratio of 1 indicates that screening is not useful for predicting whether a symptom will reach 4 at the initial intervention contact, whereas an LR greater than 1 indicates that patients who score above the cut-off in screening are more likely to reach a severity of 4 or higher at the initial intervention contact than patients overall. The likelihood ratio can also be expressed as the true-positive rate (sensitivity) divided by the false-positive rate (one minus specificity). Sensitivity was the proportion of patients who reported severity above each cut-off in screening out of patients reaching a threshold of 4 at the initial intervention contact. Specificity was the proportion of patients who reported severity below each cut-off in screening out of patients not reaching a threshold of 4 at the initial intervention contact. The areas under the Receiver Operating Characteristic curves (AUC) and their standard errors were estimated for each of the nine symptoms using the nonparametric method proposed by Hanley and McNeil (20). An AUC between 0.7 and 0.8 represents acceptable discrimination, while an AUC greater than 0.8 is considered excellent discrimination (21).
Table 1 summarizes the characteristics of the 601 patients who completed screening and the initial intervention contact. Cancer sites were, in order of prevalence, breast, lung, and colon, followed by gastrointestinal (4.5%), gynecological (7.8%), pancreatic (3.2%), non-Hodgkin’s lymphoma (5.8%), myeloma (12.4%), and other (2.3%). More than half of the participants (62.5%) had two or more comorbid conditions.
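The quantities in this paragraph can be illustrated with a short Python sketch. It is written under stated assumptions rather than taken from the study: the function names and data are hypothetical, a screening score at or above the cut-off counts as a positive screen, the AUC is computed as the Wilcoxon (Mann-Whitney) statistic with ties counted as one half, and the standard error uses the closed-form approximation associated with Hanley and McNeil. Confidence intervals for the LRs are not covered, since the text does not say how they were computed.

```python
import math
from typing import List, Sequence, Tuple


def split_by_outcome(
    screening: Sequence[int], initial: Sequence[int], threshold: int = 4
) -> Tuple[List[int], List[int]]:
    """Screening scores of patients who did / did not reach threshold later."""
    reached = [s for s, i in zip(screening, initial) if i >= threshold]
    not_reached = [s for s, i in zip(screening, initial) if i < threshold]
    return reached, not_reached


def sensitivity_specificity(
    screening: Sequence[int], initial: Sequence[int], cutoff: int, threshold: int = 4
) -> Tuple[float, float]:
    reached, not_reached = split_by_outcome(screening, initial, threshold)
    sens = sum(s >= cutoff for s in reached) / len(reached)
    spec = sum(s < cutoff for s in not_reached) / len(not_reached)
    return sens, spec


def likelihood_ratio(sens: float, spec: float) -> float:
    """True-positive rate divided by false-positive rate."""
    return sens / (1.0 - spec) if spec < 1.0 else float("inf")


def auc_with_se(
    screening: Sequence[int], initial: Sequence[int], threshold: int = 4
) -> Tuple[float, float]:
    """Nonparametric AUC and its approximate standard error."""
    pos, neg = split_by_outcome(screening, initial, threshold)
    n_pos, n_neg = len(pos), len(neg)
    # Fraction of (reached, not reached) pairs ranked correctly; ties count 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (n_pos * n_neg)
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return auc, math.sqrt(variance)


# Hypothetical scores for eight patients (not study data).
screening_scores = [0, 1, 2, 3, 3, 5, 6, 8]
initial_scores = [0, 0, 4, 2, 5, 4, 3, 7]

sens, spec = sensitivity_specificity(screening_scores, initial_scores, cutoff=3)
print(sens, spec, likelihood_ratio(sens, spec))
print(auc_with_se(screening_scores, initial_scores))
```

In this toy data set the LR at a cut-off of 3 is 1.5, which matches the ratio of post-test odds (0.6 / 0.4) to pre-test odds (0.5 / 0.5), illustrating that the two definitions of the LR given above agree.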
The pre-test probability, post-test probability, and likelihood ratios for screening cut-offs 1-6 are summarized in Table 2.
The greatest proportions of patients reaching the threshold of 4 or higher at the initial contact (pre-test probabilities) were observed for fatigue (57.8%), insomnia (35.2%), poor appetite (31.2%), dry mouth (27.5%), and pain (27.1%). The smallest pre-test probabilities were observed for nausea (13.1%) and shortness of breath (16.8%). The post-test probabilities were substantially greater than the pre-test probabilities at a screening severity cut-off of 3 or greater for most symptoms. The LRs were greater than 2 at the screening severity of 3 for all symptoms except fatigue and insomnia. That is, for all symptoms except fatigue and insomnia, patients who reported a screening severity of 3 or higher had more than twice the odds of reaching a threshold of 4 at the initial contact compared with the odds for all patients in the sample. Numbness (LR=4.0, 95% CI=3.00, 5.00) and memory loss (LR=3.5, 95% CI=2.87, 4.26) had the greatest likelihood ratios at the screening severity of 3. The likelihood ratio for fatigue was close to 2 (LR=1.97, 95% CI=1.54, 2.49) for the screening severity cut-off of 6.
The graphs of sensitivity and specificity across the range of possible cut-offs in screening are shown in Figure 2. These graphs describe more clearly how sensitivity and specificity changed across screening scores for each symptom. Convex (upward) curves of sensitivity (solid line) and specificity (dashed line) reflect a greater frequency of higher screening severity for patients reaching threshold and of lower severity for patients not reaching threshold. The convex curves therefore represent higher sensitivity and specificity. For example, if the specificity reaches 80% at a screening score of 2, then the specificity curve will be convex. In contrast, concave (downward) curves indicate that patients reaching threshold were more likely to report lower screening severity and those not reaching threshold were more likely to report higher screening severity scores. For most symptoms, with the exception of fatigue and, to a lesser extent, insomnia, severity scores at screening are good indicators of patients who will not reach a threshold score of 4 and thus will not require the intervention.
Memory loss and numbness (symptoms with fairly high LRs) had higher specificity along the possible range of cut-offs for screening severity (Figure 2) and the highest AUCs among the nine symptoms. For numbness, the AUC was 0.81 (95% CI=0.71, 0.86), and for memory loss the AUC was 0.80 (95% CI=0.75, 0.85) (not shown). Many patients reported fatigue at screening, but not necessarily at particularly high levels of severity, which made the screening severity score for this symptom a poorer predictor of which patients would reach threshold at the initial intervention contact than it was for other symptoms. Fatigue had the lowest area under the ROC curve of all symptoms (AUC=0.67, 95% CI=0.62, 0.71). The AUCs were greater than 0.7 (acceptable discrimination) for all symptoms except fatigue. Finally, the graphs in Figure 2 for nausea reflect relatively high specificity but poor sensitivity (concave curve located below the diagonal). This is explained by the fact that patients reported nausea at relatively low levels of severity at both screening and the initial contact. Because about 430 patients experienced nausea severity of less than 4 at both screening and the initial contact, nausea has a fairly high specificity (83%) at the cut-off of 4.
Trials seeking to test new pharmacologic or psycho-behavioral interventions for the management of cancer-related symptoms need to accrue patients whose symptoms are sufficiently severe to demonstrate a response to treatment. Accruing patients whose symptoms fail to reach a pre-specified threshold, as established in the research design, will result in lost power to detect a statistically significant response when in fact one may occur. Thus it is important to set eligibility criteria at levels that will assure that patients who enroll and enter the trial can benefit from the intervention being tested.
This study examined how levels of symptom severity at screening predicted the need for symptom management (a severity of 4 or higher at the first intervention contact) among patients with solid tumor cancers who were undergoing chemotherapy. The principal finding from this study is that the introduction of a screening mechanism for symptom severity can successfully identify patients who would report a symptom at a pre-specified threshold of 4 or higher during the following two weeks, when the intervention is introduced. The estimated AUC for all symptoms except fatigue was between 0.7 and 0.8, and for all symptoms the LRs were greater than 1 across the range of cut-offs, indicating the usefulness of including screening prior to implementing a symptom management trial. The finding of a relatively low AUC for fatigue is consistent with the results of Wang et al. (8), who noted that among lung cancer patients undergoing concurrent chemotherapy and radiation, fatigue was the most severe symptom and remained at high levels over several weeks from initiation of treatment. High prevalence of fatigue among cancer patients has been reported in several studies (1, 22-24). The relatively small number of patients who reported low screening severity of fatigue resulted in a small denominator when estimating the specificity. Other symptoms, such as pain and peripheral neuropathy (25) and cognitive problems (7, 26), have been found to persist over time; it is therefore not surprising that memory loss and numbness and tingling had AUCs and LRs suggestive of excellent discrimination.
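Because the LR is the ratio of post-test to pre-test odds, a reported pre-test probability and LR together imply a post-test probability. The Python sketch below shows that arithmetic; the helper names are hypothetical, and the printed figure is a derived illustration rather than a value read from Table 2.

```python
def probability_to_odds(probability: float) -> float:
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


def post_test_probability_from_lr(pre_test_probability: float, lr: float) -> float:
    """Post-test odds = pre-test odds x LR, converted back to a probability."""
    post_test_odds = probability_to_odds(pre_test_probability) * lr
    return odds_to_probability(post_test_odds)


# Fatigue: pre-test probability 57.8%, LR = 1.97 at a screening cut-off of 6.
print(round(post_test_probability_from_lr(0.578, 1.97), 2))
```

With these inputs the implied post-test probability for fatigue is roughly 0.73; rounding in the published LR and pre-test probability means the tabulated value may differ slightly.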
Specificities made the major contribution to the overall accuracy of prediction based on screening severity. Increasing the cut-off at screening would decrease sensitivity but increase specificity. Therefore, using the results of these analyses, researchers and clinicians can select cut-offs to establish levels of sensitivity and specificity consistent with the goals of their research or clinical care. Several approaches to determining a decision rule have been used to classify patients according to their need for a pharmacologic or psycho-behavioral intervention for managing their symptoms (27-29): 1) selecting the cut-off that maximizes the sum of sensitivity and specificity; 2) selecting the cut-off with equivalent sensitivity and specificity; 3) choosing a desired level of sensitivity or specificity and finding the corresponding cut-off; 4) weighting sensitivity and specificity by the prevalence of each symptom to minimize the total number of misclassifications. In this study, we determined that fairly consistent cut-offs can be selected using any of these approaches. For example, the first two approaches would result in an identical cut-off between 2 and 4, depending on the symptom.
Approaches to establishing decision rules depend to a large extent on the issues to be addressed. For clinical trials, it may be appropriate to set cut-offs that minimize false positives, since enrolling patients who at screening appear to need symptom management but upon entry into the trial fail to meet threshold scores will lead to floor effects and increase sample size requirements. In practice, failure to aggressively manage cancer patients’ symptoms can lead to dose delays or interruptions, or to costly emergency visits and hospitalizations. In these situations, where the costs of symptom management may be small relative to those of symptom-related adverse events, the emphasis should be placed on reducing false negatives and thus including more patients who may benefit from the intervention. The results reported in this study could be used as an input into any of these algorithms for determining optimal cut-offs.
It is important to acknowledge that the prevalence of each symptom (the proportion of patients reaching threshold) can affect both AUC and post-test probabilities. For cancer patients, significant variations in average symptom severity have been observed over time during and after a course of chemotherapy and radiotherapy (22). Therefore, changing the timing of screening across the treatment period could affect the prevalence of a symptom and, in turn, the properties of screening. Future use of the statistics obtained in this study depends upon a similar study setting, including treatment, timing of screening, and level of threshold.
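The four decision rules listed above can be sketched as simple selections over a table of sensitivity and specificity by cut-off. The Python below is illustrative only: the function names, the sensitivity/specificity values, and the prevalence of 0.30 are hypothetical and are not taken from the study's tables.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (sensitivity, specificity) by screening cut-off for one symptom.
SENS_SPEC: Dict[int, Tuple[float, float]] = {
    1: (0.95, 0.30),
    2: (0.88, 0.55),
    3: (0.75, 0.72),
    4: (0.60, 0.83),
    5: (0.42, 0.90),
    6: (0.30, 0.95),
}


def max_sum_cutoff(table: Dict[int, Tuple[float, float]]) -> int:
    """Rule 1: maximize sensitivity + specificity."""
    return max(table, key=lambda c: table[c][0] + table[c][1])


def balanced_cutoff(table: Dict[int, Tuple[float, float]]) -> int:
    """Rule 2: sensitivity and specificity as close to equal as possible."""
    return min(table, key=lambda c: abs(table[c][0] - table[c][1]))


def lowest_cutoff_with_specificity(
    table: Dict[int, Tuple[float, float]], target: float
) -> Optional[int]:
    """Rule 3: smallest cut-off whose specificity meets a desired level."""
    eligible = [c for c in sorted(table) if table[c][1] >= target]
    return eligible[0] if eligible else None


def min_misclassification_cutoff(
    table: Dict[int, Tuple[float, float]], prevalence: float
) -> int:
    """Rule 4: minimize the expected misclassified fraction given prevalence."""

    def error_rate(cutoff: int) -> float:
        sens, spec = table[cutoff]
        return prevalence * (1.0 - sens) + (1.0 - prevalence) * (1.0 - spec)

    return min(table, key=error_rate)


print(max_sum_cutoff(SENS_SPEC))
print(balanced_cutoff(SENS_SPEC))
print(lowest_cutoff_with_specificity(SENS_SPEC, target=0.80))
print(min_misclassification_cutoff(SENS_SPEC, prevalence=0.30))
```

In this made-up table the first two rules pick the same cut-off, while the specificity-driven and prevalence-weighted rules move to a higher one, which mirrors the trade-off between false positives and false negatives discussed in the text.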
Decision rules based on screening prior to entry into cancer symptom management trials can provide reasonable discriminative accuracy by differentiating among patients who are likely to reach higher levels of severity later in the trial from those who are not. The findings of this study can inform planning of the trials with regard to selecting optimal cut-offs based on likelihood ratios and desired sensitivity and specificity.
This research was supported by Grants R01 CA79280 and R01 CA30724 from the National Cancer Institute.