This rather complex triangulation study to derive CID estimates for heart disease patients in the domains of the CHQ and scales of the SF-36 followed Denzin's approach for combining informant groups and methods (Denzin 1970). We specifically targeted the three primary stakeholder groups: expert physicians, outpatients with heart disease, and the PCPs who care for those outpatients. Ultimately, however, the question comes down to this: do the resulting CID estimates allow us to “overcome the deficiencies that flow from one investigator and/or one method,” and if so, how? (Denzin 1970, p. 300). To answer that question, we must begin by reviewing the information obtained from each informant group.
First, our expert physician panel used methods that may be subject to coercion and/or domination by one or more members (Stasser, Kerr, and Davis 1989), although we did not witness any evidence of this. Second, although prior patient-based measurement studies were referenced by panelists throughout the Delphi rounds and consensus meeting, the panelists made their final recommendation without reference to specific patient data, relying instead on the state change concept. Moreover, the simplicity of symmetric, evenly incremented change is attractive and easily communicated, yet “real” patient data may or may not behave in such a predictable pattern. In addition, we cannot know exactly what criteria individual experts used to set their final and consensus values for the magnitude of each improvement or decline. It is important to note that our proposal for this study stated a priori that expert panel-based CID estimates would be considered “of lowest evidence value because they do not reflect actual encounters between patients and their primary care physicians” (Wolinsky et al., R01 HS10234, p. 102).
The second group of informants—outpatients with heart disease—provided the most important source of data for our triangulation. However, their bimonthly cross-sectional and retrospective measurements are also subject to error. The retrospective change items for each CHQ domain and SF-36 scale (“Since your last interview on ‘prior interview date,’ has there been a change in your fatigue? Is it better, worse or about the same?”) ask participants to revisit the day of their last interview, remember their HRQoL dimensional state (e.g., How much energy did I have at that time?), and then compare it with their current HRQoL state. These global assessments of change have long been reported to be biased toward the patient's current health rating and may not accurately reflect health status 2 months earlier.
However, we had little (<1 percent) incomplete data for these retrospective-anchoring items. Of course, most retrospective comparison responses were “about the same,” which was also the “easiest” way to reply. If an interviewee chose “better” or “worse,” he or she then received a follow-up item soliciting how much better or worse on a seven-point scale. Nonetheless, nearly 33 percent of the retrospective change item responses did reflect patient-perceived changes. Our study design attempted to improve the accuracy of these responses through the use of a memory marker. At the end of each interview, interviewers solicited an event or statement from respondents that would help them remember that particular day at the next interview, and noted it in the interview database. Examples of the memory markers include “I went to church with my niece today” and “I dug up the garden for planting.” These statements were then read back to the interviewees before they responded to the retrospective change items at their next interview. Although participants expressed delight in hearing their own descriptions of what was memorable repeated back to them 2 months later, we have no empirical evidence that this practice decreased the well-known error associated with retrospective change assessments. Moreover, both our patients and PCPs provided repeated assessments, and therefore our ratings are not statistically independent. However, corrections for this statistical dependence, as well as for the nesting of patients within PCPs, would affect the standard deviation of the associated mean change score thresholds, but not the mean point estimates themselves. Thus, we have considerable confidence in the validity of the patient-based CIDs.
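The anchor-based logic described above—taking the mean prospective change score among patients who report a given level of retrospective change—can be sketched as follows. The data, the 0–100 scale, and the category labels here are hypothetical illustrations, not the study's instruments or results.

```python
"""Minimal sketch of anchor-based CID estimation (hypothetical data)."""
from statistics import mean

# (prospective change score, retrospective anchor rating) pairs
# for a hypothetical 0-100 HRQoL scale; values are illustrative only
records = [
    (12, "a little better"), (8, "a little better"), (10, "a little better"),
    (1, "about the same"), (-2, "about the same"), (0, "about the same"),
    (-9, "a little worse"), (-11, "a little worse"), (-7, "a little worse"),
]

def anchor_based_cid(records, category):
    """Mean change score among patients reporting the given anchor category."""
    changes = [delta for delta, rating in records if rating == category]
    return mean(changes)

print(anchor_based_cid(records, "a little better"))  # 10
print(anchor_based_cid(records, "a little worse"))   # -9
```

With real cohort data, the standard deviations of these means would also need the dependence corrections noted above; the point estimates themselves would be unaffected.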
Measurements from the third informant group reflected PCP assessments of change in the patients' heart disease, and these were then used to anchor change for all HRQoL measures. Unfortunately, this global item is not consistent with the specific domain change items used by patients. An alternative would have been to ask PCPs to separately assess changes in activities, fatigue, emotional functioning, physical functioning, role physical, bodily pain, etc., at each linked follow-up visit with each participant. This rather long list of relevant HRQoL dimensional changes would have burdened our PCPs and the busy practice settings where they treated our enrollees, and that would have made this study of clinically significant change unworkable. Nonetheless, it is clear that our PCP assessments reflect physician-perceived changes in the patients' heart disease, and that construct does not directly correspond to any specifically measured dimension of HRQoL in this study. Although this clinically informed evaluation is important in understanding CIDs, it is evident that PCPs are generally gauging a different theoretical construct than the one patients are evaluating (Detmar et al. 2001). Those limitations notwithstanding, our low weighted κ results for seemingly related areas, such as the CHQ activities domain (κ=0.23) or the SF-36 physical functioning scale (κ=0.14), demonstrate the great need for improved dialogue between patients and their PCPs to enhance clinical encounters and clinical decision making. It is possible that PCPs based their assessments on objective findings (e.g., evidence of CHF worsening on physical examination) or on changes sufficient to trigger alterations in treatment, while quite different changes were driving patient-perceived improvements or declines, such as the inconvenience of moving to a new home with no stairs or selling one's farm (Velikova et al. 2004). Whether this explanation or others will surface in these necessary discussions, the need for such dialogue is the primary insight gained from this triangulation process.
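For readers unfamiliar with the agreement statistic behind these comparisons, a minimal pure-Python sketch of linearly weighted Cohen's kappa on ordinal change ratings follows. The three categories (0 = worse, 1 = about the same, 2 = better) and the paired ratings are illustrative, not the study's data.

```python
"""Linearly weighted Cohen's kappa between two raters (illustrative sketch)."""

def weighted_kappa(rater_a, rater_b, n_categories):
    n = len(rater_a)
    # observed joint proportions and marginal proportions
    obs = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1.0 / n
    pa = [rater_a.count(c) / n for c in range(n_categories)]
    pb = [rater_b.count(c) / n for c in range(n_categories)]
    # linear agreement weights: full credit on the diagonal,
    # partial credit for near-misses
    po = pe = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = 1.0 - abs(i - j) / (n_categories - 1)
            po += w * obs[i][j]          # weighted observed agreement
            pe += w * pa[i] * pb[j]      # weighted chance agreement
    return (po - pe) / (1.0 - pe)

# hypothetical paired change ratings (0 = worse, 1 = same, 2 = better)
patients = [0, 1, 1, 2, 2, 1, 0, 2, 1, 1]
pcps     = [1, 1, 0, 2, 1, 1, 0, 1, 1, 2]
print(round(weighted_kappa(patients, pcps, 3), 2))  # 0.29
```

Values near zero, as in the κ=0.14–0.23 range reported above, indicate agreement only modestly better than chance, consistent with patients and PCPs gauging different constructs.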
It is also important to note that our results for small patient-perceived changes reflect an average change score on each CHQ domain that meets or exceeds at least one state change. The SF-36 results did not perform as strongly; indeed, they often yielded mean values smaller than a single state change value.
Unlike the use of triangulation by a surveyor or navigator, the results from this study elucidate how different information sources and methods do not, and should not necessarily, point to a single best estimate. Instead, we seek to use these data to better understand the three stakeholders—experts, heart disease patients, and their PCPs—and their approaches to the daunting challenge of estimating CID thresholds for patient-reported outcomes (PROs). We also seek to provide a process that others can employ to determine CIDs for PROs. Toward that goal, we believe our results demonstrate, first, that it is very difficult in cohort studies of chronic disease to obtain sufficiently large samples of patient- and PCP-perceived “changers” at the moderate or large level for either improvements or declines. Clinical trials involving effective interventions might instead be required, although that design would better assess moderate to large improvements than declines. Second, these results and others indicate that the SF-36 may not have the sensitivity needed to capture individual-level changes consistently enough to yield a stable CID estimate. Therefore, we recommend the use of disease-specific instruments to demonstrate important HRQoL changes, provided the instrument also strives to capture the often overlooked mental and psychosocial domains of patients' health and well-being.
A recent Food and Drug Administration report, Innovation or Stagnation: Challenges and Opportunities on the Critical Path to New Medical Products, focused on stemming the tide of products that are delayed from reaching those consumers who can benefit (Food and Drug Administration 2004). This document speaks directly to the need for community consensus between health professionals and patients “on appropriate outcome measures and therapeutic claims” (p. 24).
Beyond this FDA recommendation, the HRQoL of patients with heart disease should be a primary concern for their health care providers. Hence, awareness and appreciation of the multiple perspectives, and of the uniqueness of each, are necessary for assessing the impact of potential therapeutic interventions on maintaining and/or improving health. Like the blind men and the elephant, all the points of view are essential. Dialogue, measurement, and respect for each of these stakeholders' perspectives are necessary to secure an in-depth understanding and eventually achieve an informed community consensus on the magnitude of an important change over time in PRO measures.