A detailed discussion of CER for patient outcomes associated with diagnostic tests is naturally beyond the scope of this commentary. As most readers are aware, at least one class of such studies, the comparative studies of screening modalities, has received considerable attention in the literature and has consumed substantial resources. However, CER studies are infrequent when it comes to patient outcomes related to the three other primary roles of diagnostic modalities in health care, namely, diagnosis and disease staging, patient management, and post-therapy surveillance. This situation will undoubtedly change in the coming years, as more attention is given to the evaluation of outcomes of tests and biomarkers across the spectrum of health care.
The development of CER studies of patient outcomes associated with tests and biomarkers will benefit enormously from the application of the principles of an evidentiary framework for CER, presented by Tunis and colleagues. Surely a range of methods will need to be used, including but not limited to randomized clinical trials. And just as surely, there is no “one-size-fits-all” approach.
When it comes to randomized studies of test outcomes, it is important to note that both long- and short-term effects of tests materialize in a context defined by the available health care options, including therapeutic interventions. It is therefore not possible to define and measure test effects outside the particular health care context in which the test will be used. For example, in the unfortunate but not altogether rare situation in which the available therapies are largely ineffective, the impact of diagnostic tests on patient outcomes will be minimal. It is also important for study designs to link test results to specific therapeutic strategies.
Even with close linkage of test results to therapeutic interventions, there are significant challenges when it comes to the practical feasibility of randomized studies of test outcomes. Consider, for example, a simple design for a randomized study to compare outcomes of two alternative tests, A and B, which are being evaluated for use in detecting the presence of a clinical condition. In this design, patients are randomized to undergo test A or test B, and the subsequent course of action is based on the results of the tests. For simplicity, assume that two therapeutic interventions are available and that a positive test result on either A or B would lead to a decision to adopt the first intervention (Tx1). A negative result on either test would lead to a decision to adopt the second intervention (Tx2). Tx2 could be “usual care” or active therapy, depending on the context.
The simple randomized design described above is representative of many settings in which randomized studies are conducted to compare test outcomes. Assume now that prior studies provide estimates of the success rates r1 and r2 for therapeutic interventions Tx1 and Tx2, when performed on cases that actually have the clinical condition the two tests are intended to detect. If the specificities of the two tests are equal, some algebra shows that the difference in the overall success rates between the two arms of the randomization is

$$\Delta = p\,(\mathrm{Sens}_A - \mathrm{Sens}_B)\,(r_1 - r_2) \tag{1}$$

where p denotes the prevalence of the clinical condition and Sens denotes the sensitivity of a test. As can be seen from Eq (1), even if the prevalence of the clinical condition is relatively high, the actual difference in overall success rates between the two arms of the study is only a fraction of the difference in success rates between the two therapeutic interventions. If, as in the screening setting, the prevalence is low, the estimate of effect is even smaller.
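To make the size of this effect concrete, the arm-level success rates for the two-test design can be computed directly. The sketch below is illustrative only: every numeric value is hypothetical, and it assumes that outcomes in patients without the condition depend only on the therapy received (the rates s1 and s2 are introduced here for that purpose and do not appear in the commentary). Under that assumption and equal specificities, the contribution of patients without the condition cancels between the arms, and the difference reduces to the product p × (Sens_A − Sens_B) × (r1 − r2), which the code uses as a check.

```python
def arm_success_rate(p, sens, spec, r1, r2, s1, s2):
    """Overall success rate in one arm of the test-comparison design.

    Test-positive patients receive Tx1; test-negative patients receive Tx2.
    r1, r2: success rates of Tx1, Tx2 in patients WITH the condition.
    s1, s2: success rates of Tx1, Tx2 in patients WITHOUT the condition
            (hypothetical quantities, assumed for this illustration).
    """
    with_condition = sens * r1 + (1 - sens) * r2
    without_condition = (1 - spec) * s1 + spec * s2
    return p * with_condition + (1 - p) * without_condition


def closed_form_difference(p, sens_a, sens_b, r1, r2):
    """Difference in overall success rates when specificities are equal."""
    return p * (sens_a - sens_b) * (r1 - r2)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
p = 0.30                      # prevalence of the condition
sens_a, sens_b = 0.90, 0.75   # sensitivities of tests A and B
spec = 0.85                   # common specificity of both tests
r1, r2 = 0.70, 0.40           # therapy success rates, condition present
s1, s2 = 0.95, 0.97           # therapy success rates, condition absent

arm_a = arm_success_rate(p, sens_a, spec, r1, r2, s1, s2)
arm_b = arm_success_rate(p, sens_b, spec, r1, r2, s1, s2)
direct = arm_a - arm_b
closed = closed_form_difference(p, sens_a, sens_b, r1, r2)

print(f"arm A: {arm_a:.4f}, arm B: {arm_b:.4f}")
print(f"difference (direct): {direct:.4f}")
print(f"difference (closed form): {closed:.4f}")
```

With these hypothetical inputs, a 30-point difference between the two therapies and a 15-point difference in sensitivity translate into a difference of about 1.35 percentage points between the two arms.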
The simplified setting of the example underscores the often-stated point that randomized studies of test outcomes can be very resource intensive, to the point of becoming impractical. Insistence on the availability of randomized study results as the definitive evidence for the appropriateness of a particular diagnostic modality in a given clinical context can create the type of impasse mentioned by Tunis and colleagues in the case of PET in cancer therapy.
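The resource implications can be illustrated with a rough sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two independent proportions (two-sided test, unpooled variance); it is not a method proposed in this commentary, and all success rates are hypothetical. It contrasts a direct comparison of two therapies in patients who have the condition with a comparison of two test-guided arms whose overall success rates differ only slightly.

```python
import math
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p1 versus p2.

    Normal approximation for two independent proportions,
    two-sided significance level alpha, unpooled variance.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical success rates of two therapies among patients with the condition.
n_therapy_trial = n_per_arm(0.70, 0.40)

# Hypothetical overall success rates in two test-guided arms that differ by
# 1.35 percentage points (values assumed for illustration only).
n_test_trial = n_per_arm(0.8779, 0.8644)

print(f"therapy comparison: {n_therapy_trial} patients per arm")
print(f"test comparison:    {n_test_trial} patients per arm")
```

Under these assumptions the therapy comparison needs a few dozen patients per arm, while the test comparison needs several thousand, which is the sense in which such studies can become impractical.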
This is not to say that randomized studies of outcomes are to be abandoned in the case of PET or any other diagnostic modality. More efficient designs may already be available or could be developed in the future (2). However, the promise of CER is not to confine its range of possibilities within a single evidentiary dimension but to encompass a range of methodologic approaches.
An example of an approach that has already been put into practice is the use of registries. As noted by Tunis and colleagues, the National Oncology PET Registry (NOPR) provides a useful example of the possibilities and limitations of the approach (3). The registry was developed in order to assess the effect of PET on referring physicians’ plans for intended patient management and to do so across a spectrum of cancer conditions in which PET may be used. To date, NOPR has gathered information on more than 130,000 cancer patients and has led to a substantial list of published reports documenting particular aspects of PET use. These reports have become available more quickly, and reflect a more real-world setting, than would be possible in prospective clinical trials, perhaps even in the (curiously named) “pragmatic” trials advocated by Tunis and colleagues. Of course, the evidence provided by NOPR has significant limitations, including selection bias and silence on key questions about whether the intended therapy changes were actually implemented and whether they led to improved patient outcomes. Some of these limitations could be overcome by additional follow-up data collection, while others, such as the potential for selection bias, would apply to many types of registries.