7 January 2016
Scientific progress is made by building upon the findings of other researchers and confirming those findings through repeated experiment. It also comes through learning from one’s mistakes.
Eight major randomized trials of breast cancer screening have been conducted to test whether mammography reduces breast cancer mortality. In November 2014, most of the world’s leading breast cancer epidemiologists met for 8 days in Lyon, France, at the International Agency for Research on Cancer, part of the World Health Organization, to review all the studies on breast cancer screening. Anthony Miller and I both attended that meeting. The assembled experts concluded, on the weight of the randomized trials and multiple other studies that used more modern mammography than was used in the randomized trials, that there was sufficient evidence that mammography screening reduces breast cancer deaths among women 50–74 years of age and limited evidence among women 40–49 years of age1. (A resolution for “sufficient evidence” for screening women 45–49 failed by 1 vote.) The Independent U.K. Panel on Breast Cancer Screening had similar findings2. Nevertheless, Miller continues to base his position on screening only on his study, the Canadian National Breast Screening Study (cnbss), ignoring the results of all of the other studies that found benefit when his did not.
In my response to Steven Narod’s defense of the cnbss3, I pointed to the critical 1993 review of those studies by Boyd et al.4, which raised multiple concerns regarding the conduct of the cnbss. Two of the most important were the apparent imbalance in advanced cancers found in the prevalence round of the cnbss and the poor diagnostic quality of the images as determined by at least 5 external expert breast radiologists who reviewed cnbss images5. In his letter, Miller, while referring to those concerns as specious, then launches into a masterful circumlocution. Responding to my tongue-in-cheek remark about the apparent danger of having been allocated to the mammography arm of his study, he refers to a speculative publication6 suggesting (without presenting much in the way of evidence) that breast surgery should not be performed because it promotes metastases that could lead to death.
Miller also suggests that one should not expect to influence mortality on the basis of cancer detected at the prevalence screen. That makes little sense. In a previously unscreened population, one expects the prevalent cancers to consist of a mix of more advanced cancers that have had an opportunity to develop (and for which the opportunity to avert death is probably lower), but also some earlier cancers whose detection and treatment are more likely to be efficacious. My remark regarding the danger of being in the mammography arm refers to the imbalance in the proportions of the earlier and more advanced cancers between the two trial arms (addressed in more detail shortly), an imbalance that was not seen in any of the other randomized trials.
Miller describes me as a “critical” member of the cnbss. I should clarify my role there. Initially, he asked me to provide consultation to the study when Irwin Bross, an American biostatistician, suggested that the radiation from all the mammograms used in the cnbss would be responsible for killing more women than would be saved by the screening. It was not difficult to demonstrate that that argument was wrong. But what I did learn was that the image quality at cnbss sites was poor. I implemented a quality control program and did my best to advise Miller and Cornelia Baines, the cnbss deputy director, that it was necessary to improve both the technology used (which in many cases was already obsolete) and also the techniques used by the technologists producing the mammograms and by the radiologists interpreting them. But the mammography in cnbss was performed at independent facilities, many of them being privately-owned clinics whose choice of technology, decisions to update equipment and techniques, and training of personnel were largely in the hands of the owners, many of whom were unwilling to invest in such improvement. In fact, Miller was able to secure modest funds to assist some of the facilities in improving equipment, but those changes occurred later in the study, and not all facilities were compliant. The structure of the study simply did not allow rigorous standardization of quality, and I expressed that concern publicly in 19937.
Although the quality of the imaging and of the image interpretation was certainly problematic, the process for allocating women to the study arms is what has elicited (and should elicit) even greater concern. Miller speaks of the “perfect balance” in the randomization with respect to traditional risk factors. Because most women were properly randomized, that balance is expected, but it does not negate the concern about improper allocation of women with palpable abnormalities at the time of randomization. The greatest risk factor for dying of breast cancer is to have entered the study with advanced cancer. An imbalance of only a few women with such cancers is certainly enough to influence the results of the study markedly, and it appears that this indeed is the case. More women with advanced cancers were assigned to the screening group in cnbss1.
I pointed to that situation in my recent Point–Counterpoint article in this journal8, but it might be worthwhile to demonstrate how sensitive a study such as the cnbss is to bias in allocation. Although 50,430 women were enrolled in the cnbss1, after the initial follow-up of 8.5 years, there was a relatively small number of breast cancer deaths [n = 66: 38 in the screened group (mp) and 28 in the usual care group9], an apparent mortality disadvantage of 38 / 28 = 1.36. In the first (prevalence) screen, several (n = 24) poor-prognosis cancers (4 or more positive axillary lymph nodes) were found. That number is not unexpected in a previously unscreened population, but what was surprising is that 19 of those cancers (17 with palpable abnormalities) appeared in the mp arm and only 5 in the usual care arm. Within statistical fluctuation, one would expect equal numbers in the two arms. In his letter, Miller suggests that the imbalance is the result of differences in how nodal involvement was assessed between the study arms (another potential limitation of the study), but there is no evidence that this was the case10. He also states that because an abnormality “might have been detected on physical examination of the breasts does not necessarily mean that the cancer was palpable.” Of course not. Screening doesn’t directly detect cancers. It detects abnormalities, some of which are diagnosed as cancer. The point is that almost all of the poor-prognosis cancers in the mp arm had palpable abnormalities and therefore provided no mammography lead time.
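How unusual a 19-versus-5 split is can be checked with a simple exact binomial calculation. The sketch below is an illustration, not an analysis taken from the cnbss reports. It assumes that each of the 24 poor-prognosis cancers was equally likely to fall in either arm (that is, arms of equal size and unbiased allocation), and the function name `binomial_two_sided_p` is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k of n events landing in one arm,
    assuming (hypothetically) a 50:50 chance for each event."""
    # Take the larger side of the split so the tail is always the upper one.
    extreme = max(k, n - k)
    # Probability of a split at least this uneven in one direction.
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    # Double for the two-sided test; cap at 1 for perfectly even splits.
    return min(1.0, 2 * upper_tail)


# 19 of the 24 poor-prognosis cancers appeared in the mp arm.
p_value = binomial_two_sided_p(19, 24)
print(f"two-sided p = {p_value:.4f}")
```

Under those assumptions, the probability of a split at least as uneven as 19 versus 5 is well below 1%.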
Given that many of the observed breast cancer deaths in cnbss1 would have been attributable to those cancers (presumably Miller could provide this information), even a small imbalance in the number of those cancers would affect the hazard ratio. A rough example is given in Table i. The top row gives the observations presented in the first report of cnbss1 8. In each subsequent row, 1 additional poor-prognosis cancer has been shifted from the mp arm to the usual care arm until, at the row shown in boldface type, the result that might be expected had those cancers been balanced between the arms is reached.
Whatever the reason for the imbalance, the effect on the conclusions of the cnbss of shifting only 7 women of the 50,430 between study arms is enormous: a 36% mortality disadvantage of screening becomes an 11% mortality reduction! Given the fact that women received clinical breast examination before registration, and thus, that clinic staff were aware of any palpable abnormalities before registration to the trial arm through an open-list process, such a shift is plausible. In the 25-year follow-up publication, Miller et al.11 show—by omitting the cancer deaths associated with the prevalence screening round from their analysis—almost the same result. However, in that work they combine data from cnbss1 and cnbss2 (women 50–59), and so it is more difficult to interpret the individual results. In any case, to avoid such problems, this type of randomization would not be permitted in modern trials.
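The arithmetic behind that shift can be sketched in a few lines. This is a rough illustration only, not a reproduction of Table i. It assumes that every poor-prognosis cancer moved from the mp arm to the usual care arm carries exactly one breast cancer death with it, and that the two arms are similar enough in size for the ratio of death counts to stand in for the ratio of rates. The function name `shift_deaths` is hypothetical.

```python
def shift_deaths(mp_deaths=38, uc_deaths=28, mp_poor=19, uc_poor=5, max_shift=7):
    """Return one row per number of poor-prognosis cancers shifted from
    the mp arm to the usual care (uc) arm.

    Assumption (hypothetical): each shifted cancer moves one death with it.
    """
    rows = []
    for shifted in range(max_shift + 1):
        mp_d = mp_deaths - shifted
        uc_d = uc_deaths + shifted
        rows.append({
            "shifted": shifted,
            "mp_poor": mp_poor - shifted,
            "uc_poor": uc_poor + shifted,
            "mp_deaths": mp_d,
            "uc_deaths": uc_d,
            "ratio": mp_d / uc_d,  # deaths in mp arm / deaths in usual care arm
        })
    return rows


rows = shift_deaths()
for row in rows:
    print(f"{row['shifted']} shifted: poor-prognosis {row['mp_poor']} vs {row['uc_poor']}, "
          f"deaths {row['mp_deaths']} vs {row['uc_deaths']}, ratio {row['ratio']:.2f}")
```

With no shift, the ratio is 38 / 28 = 1.36. With 7 cancers shifted, the poor-prognosis cancers are balanced at 12 in each arm and the ratio is 31 / 35 = 0.89, which corresponds to the 11% reduction cited above.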
I have declared my potential conflicts of interest. Much of my research involves technology and is therefore carried out in collaboration with industry. I also own shares in a company that develops software to measure breast density. Its greatest use will likely be to identify and divert away from mammography screening those women unlikely to benefit from the procedure. After spending more than $30 million (in today’s Canadian dollars) on cnbss, Miller has invested more than 25 years in attempting to defend that study, whose flaws are revealed by its own data. I suggest that this is also a conflict of interest that should be declared.
Mammography screening has limitations. They include reduced sensitivity in some women, including those with very dense breasts, and recalls of some women without cancer for further imaging and, occasionally, needle biopsy. Certainly, some screen-detected breast cancers, mainly ductal carcinoma in situ (and probably some self-detected cancers), are overtreated. But to assert that screening does not reduce breast cancer deaths (not to mention allow for less use of some of the debilitating, aggressive therapies necessary for women with advanced disease) is a fringe opinion at odds with the evidence and global expert opinion. Furthermore, it is an irresponsible message to convey to women and their health care providers.
I have read and understood Current Oncology’s policy on disclosing conflicts of interest, and I declare the following interests: My institution receives funding from GE Healthcare for a research collaboration in the areas of digital breast tomosynthesis and contrast-enhanced digital mammography. I am also a founder and shareholder in Volpara Health Technology, a company that produces software for the purposes of quantifying breast density.