We developed and followed a structured protocol. We considered randomised controlled trials of surveillance mammography and diagnostic consecutive cohort studies of surveillance mammography or other comparator tests, involving women previously treated for primary breast cancer without detectable metastatic disease at the time of presentation for their initial treatment. We also considered indirect (between-study) comparisons by comparing cohort studies analysing results of at least 100 women who received surveillance mammography, or a comparator test, or a combination of tests, with the reference standard test in the same population. We excluded case reports and studies investigating technical aspects of a test. Comparator tests included ultrasound, magnetic resonance imaging (MRI), specialist-led clinical examination and unstructured primary care follow-up (defined as absence of formal routine secondary care follow-up, which may or may not involve mammography). The reference standard was histopathological assessment for test positives and a period of follow-up for test negatives.
We chose to include studies assessing test performance for routine and non-routine surveillance patients. Adjunct tests are part of breast cancer surveillance management and the performance of diagnostic tests used for this purpose is relevant to our population of interest. The accuracy of non-routine adjunct imaging tests may differ from the accuracy of first-line surveillance tests as the test operator is primed to evaluate a suspicious finding in the non-routine surveillance patient. It is unclear what effect this has on test accuracy but it is likely to focus attention on a particular area of the breast and may conceivably increase the diagnostic test sensitivity. Consequently, we have not attempted to mix or compare the performance of tests used for these different purposes. Similarly, because of anatomical differences between a “treated” and an “untreated” breast (due to treatment effects) it was inappropriate to combine data on test performance for the detection of ipsilateral breast tumour recurrence and metachronous contralateral breast cancer.
The following types of outcome were considered:
- Test performance in diagnosing ipsilateral breast tumour recurrence in women undergoing routine surveillance
- Test performance in diagnosing ipsilateral breast tumour recurrence in women undergoing non-routine surveillance
- Test performance in diagnosing metachronous contralateral breast cancer in women undergoing routine surveillance
- Test performance in diagnosing metachronous contralateral breast cancer in women undergoing non-routine surveillance
To be considered for inclusion, the studies had to report the absolute numbers of true-positives, false-positives, false-negatives and true-negatives, or provide information allowing their calculation, and report a per-patient analysis.
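As an illustration of what "information allowing their calculation" can mean in practice, the sketch below back-calculates the four cell counts from a reported sensitivity, specificity and the numbers of women with and without disease. The function name, its parameters and the example figures are hypothetical and are not taken from any included study. Rounding to the nearest whole patient is an assumption, not a rule stated in the review protocol.

```python
def counts_from_summary(n_diseased: int, n_non_diseased: int,
                        sensitivity: float, specificity: float) -> tuple[int, int, int, int]:
    """Reconstruct (TP, FP, FN, TN) from reported summary measures.

    Assumes a per-patient analysis and rounds to the nearest whole patient.
    """
    if n_diseased < 0 or n_non_diseased < 0:
        raise ValueError("group sizes must be non-negative")
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")

    tp = round(sensitivity * n_diseased)      # diseased women with a positive test
    fn = n_diseased - tp                      # diseased women with a negative test
    tn = round(specificity * n_non_diseased)  # non-diseased women with a negative test
    fp = n_non_diseased - tn                  # non-diseased women with a positive test
    return tp, fp, fn, tn


# Hypothetical report: 50 women with recurrence, 150 without,
# sensitivity 90%, specificity 93.33%.
tp, fp, fn, tn = counts_from_summary(50, 150, 0.90, 0.9333)
```

Counts reconstructed this way are only as precise as the rounding in the published percentages, so they are best cross-checked against any totals the report also gives.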
In studies reporting the above outcomes, we planned to record the following additional outcomes, if stated:
- Adverse effects (defined as physical harms) of mammography and other tests
- Acceptability of the tests
- Reliability of the tests
- Radiological/operator expertise (who conducts the test and previous experience)
- Interpretability/readability of the tests
Major electronic databases were searched using sensitive search strategies to identify diagnostic studies of surveillance mammography, MRI, ultrasound or clinical follow-up. Searches covered the period from 1990 to March 2009 and were restricted to English-language reports. Conference abstracts were not included. The following databases were searched for primary studies: Medline, Medline In process, Embase, Biosis, Science Citation Index and Cancerlit. Medion, the Cochrane Database of Systematic Reviews (CDSR), the Database of Abstracts of Reviews of Effects (DARE) and the HTA Database were searched for reports of evidence syntheses. Reports of ongoing and recently completed trials were sought from Current Controlled Trials, Clinical Trials, WHO International Clinical Trials Registry Platform, NCI Clinical Trials Database, NRR Archive and NIHR Portfolio Database. In addition, relevant websites were searched and the reference lists of all included studies were scanned for additional reports. Full details of the search strategies used are available from the authors or from the full study report, currently in press (Robertson et al., “The clinical effectiveness and cost-effectiveness of different surveillance mammography regimens after the treatment of primary breast cancer”, accepted for publication in Health Technol Assess 2011).
From an initial screening of titles and abstracts, we excluded reports that were clearly irrelevant to the review (e.g. those that did not include any of the diagnostic tests under consideration). We then assessed the full-text versions of the remaining reports against our eligibility criteria, using a screening checklist that we developed specifically for this review. One reviewer carried out data extraction and a second reviewer independently validated it. We calculated sensitivity, specificity, positive and negative likelihood ratios and the diagnostic odds ratio for each included study.
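The accuracy measures listed above follow from the standard definitions applied to a 2x2 table of true-positives, false-positives, false-negatives and true-negatives. The following is a minimal sketch of those definitions, not the software used in the review. The class and function names are hypothetical, the example counts are invented for illustration, and returning infinity when a denominator is zero is an assumption (the text does not say how zero cells were handled, and no continuity correction is applied here).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # true-positives
    fp: int  # false-positives
    fn: int  # false-negatives
    tn: int  # true-negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning infinity when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator != 0 else math.inf


def accuracy_measures(t: TwoByTwo) -> dict[str, float]:
    """Standard per-study diagnostic accuracy measures from a 2x2 table."""
    if t.tp + t.fn == 0 or t.tn + t.fp == 0:
        raise ValueError("need at least one diseased and one non-diseased patient")

    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": _ratio(sensitivity, 1 - specificity),
        "lr_negative": _ratio(1 - sensitivity, specificity),
        # Computed from the counts directly to avoid compounding rounding error.
        "diagnostic_odds_ratio": _ratio(t.tp * t.tn, t.fp * t.fn),
    }


# Invented example counts, not data from any included study.
measures = accuracy_measures(TwoByTwo(tp=45, fp=10, fn=5, tn=140))
```

For non-degenerate tables the diagnostic odds ratio equals the positive likelihood ratio divided by the negative likelihood ratio, which offers a quick consistency check on extracted data.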
We evaluated the quality of studies using an adapted version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [6]. Higher quality studies were defined as those that considered a representative patient spectrum and were judged to have avoided partial verification bias (whether the whole population, or a random sample of it, received reference standard verification), differential verification bias (whether all patients received the same reference standard) and test review bias (whether index and reference standard test results were interpreted independently). Disagreement or uncertainty regarding data extraction or quality assessment was resolved by discussion or by arbitration by a third reviewer.
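The definition of a higher quality study amounts to requiring all four criteria to be met at once. A minimal sketch of that rule is given below. The field and function names are hypothetical, and treating each judgement as a simple yes/no is an assumption: how an "unclear" judgement on the adapted QUADAS tool was counted is not stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAssessment:
    representative_spectrum: bool            # patient spectrum judged representative
    avoided_partial_verification: bool       # whole or random sample verified
    avoided_differential_verification: bool  # same reference standard for all patients
    avoided_test_review_bias: bool           # index and reference results read independently


def is_higher_quality(a: QualityAssessment) -> bool:
    """A study counts as higher quality only if every criterion is satisfied."""
    return all((
        a.representative_spectrum,
        a.avoided_partial_verification,
        a.avoided_differential_verification,
        a.avoided_test_review_bias,
    ))


# Hypothetical studies for illustration only.
study_a = QualityAssessment(True, True, True, True)
study_b = QualityAssessment(True, True, False, True)
```

Under this rule `study_a` would be classed as higher quality and `study_b` would not, because it fails the differential verification criterion.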