In this study, we used IHC data from 1,212 patients and FISH data from 616 patients. The data were derived from a series of 4,046 cases of invasive breast carcinoma diagnosed in 1986–1992, referred to the British Columbia Cancer Agency (BCCA) for treatment, and assembled into 17 tissue microarray (TMA) blocks. Ethical approval for the study was obtained from the Clinical Research Ethics Board of the BCCA [
28]. Previously frozen breast cancer tissue samples were fixed in 10% neutral buffered formalin, embedded in paraffin and used to construct TMAs consisting of 0.6 mm tissue cores using a manual arrayer (Beecher Instruments, Inc., Silver Spring, Maryland, USA) as previously described [
35,
36].
From each TMA block, 4 μm thick sections were cut and immunostained on a Ventana Benchmark XT staining system (Ventana Medical Systems, Tucson, Arizona, USA). Sections were deparaffinized in xylene, dehydrated through three alcohol changes and transferred to Ventana Wash solution. Endogenous peroxidase activity was blocked in 3% hydrogen peroxide. Slides were then incubated with Ventana PATHWAY anti-HER2/neu (4B5) rabbit monoclonal antibody at 37°C for 32 min and developed in DAB for 10 min. Finally, sections were counterstained with hematoxylin and mounted.
HER2 was scored visually by two independent pathologists (BG, GT) according to the HercepTest guidelines: 0 (negative): no staining is observed, or membrane staining is observed in <10% of the tumor cells; 1+ (negative): a faint/barely perceptible membrane staining is detected in >10% of tumor cells; the cells exhibit incomplete membrane staining; 2+ (weakly positive, equivocal): a weak to moderate complete membrane staining is observed in >10% of tumor cells; and 3+ (strongly positive): a strong complete membrane staining is observed in >10% of tumor cells. Only six 3+ cases (0.5%) showed heterogeneous staining, i.e. would have been interpreted as 2+ by ASCO/CAP guidelines. Therefore, the scoring system used in this study would not impact the results and conclusions. Scores were entered into a standardized Excel worksheet with a sector map matching each TMA section. Cases were not included in the statistical analysis if there was no tumor tissue in the cores or the cores were cut through. Original scoring grids were converted to tables using Deconvoluter 1.10 [
37] and combined in a single text file with TMA-Combiner 1.00 [
38]. The resulting text files were imported into SPSS 15.0 and R 2.4.0 for Windows [
39].
The same slides were digitized with a commercial image analysis system, Ariol (Applied Imaging Inc., San Jose, California, USA). For clinical lab applications, Ariol has received FDA clearance as an aid to pathologists in the detection, classification, and counting of cells of a particular color, intensity, size, pattern, and shape. Applied Imaging has received additional FDA 510(k) clearances for specific applications, including immunohistochemical assessment of HER2 in breast cancer. The Ariol system is based on an Olympus microscope with motorized stage and autofocus capabilities, and is equipped with a black and white video camera. We regularly performed bright-field calibration using the Calibration slide to ensure accurate scanning and analysis. The system was set to Köhler illumination to capture high quality images. Slides were scanned at 20× objective magnification with three filters: red, green and blue. Ariol software, which converts these three-channel images into color reconstructions, was used for image analysis. The program was trained by a pathologist (DT) using representative cores containing areas that would be scored as 1+ and 3+ visually. Using the color pickup tool within the Ariol image analyzer, we selected membranes with weak positive staining and assigned "1+ intensity"; we then selected the membranes with strong positivity and assigned "3+ intensity". Similarly, we selected counterstained nuclei with the color pickup tool, and adjusted the desired size, roundness and other shape parameters under visual control. Numeric values for colors of the positive objects, i.e. membranes, and negative objects, i.e. nuclei, were stored on the hard drive in a color classifier file. Numeric values for the shape of the nuclei were stored in a separate shape classifier file. The program used these two files for segmentation of the nuclei and the membranes in all other cores, and the same two files were transferred to the second Ariol system for use there.
Scores from "0" to "3+" were automatically generated by the Ariol image analysis software for each core, based on the intensity and completeness of the positively stained membranes, and the percent of positive cells. The Ariol algorithm applies HercepTest criteria for the score calculations. Visual examples and a graphical explanation are given in Figure . The training step increases the specificity of the analysis as it ensures that extracellular matrix and most stromal cells are excluded from image analysis, and it allows the program to calculate the percent of positive tumor cells more precisely. After the program was trained on one of the representative TMA cores, the rest of the analysis was performed without human supervision. All tissue cores were analyzed in toto; no specific pathologist selection of tumor tissue within the cores was made following the training step. For statistical analysis, we selected only cores with at least 50 tumor cells detected, i.e. all cores with fewer than 50 cells were considered unscorable. To estimate the demands placed on the operator of the Ariol system, the same slides were scanned and processed on an identical Ariol system by an operator with less than one week of experience working with this particular Ariol script (KM). The descriptors of the color and shape of the positive and negative tumor cells were transferred from one system to the other; therefore, variations in the image analysis results depended only on the scanner settings, i.e. brightfield calibration, positioning and white balance, but not on the image analysis settings.
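The scoring categories and the 50-cell minimum described above can be illustrated with a short sketch. This is not the Ariol algorithm, whose internals are not described here; it only encodes the HercepTest categories as stated in the text. The function name, the argument names and the intensity labels are hypothetical. Two boundary choices are assumptions: a core with exactly 10% stained cells is scored 0, and any incomplete membrane staining above 10% is scored 1+ regardless of intensity.

```python
from typing import Optional

MIN_TUMOR_CELLS = 50  # cores with fewer detected tumor cells are unscorable


def herceptest_score(n_tumor_cells: int,
                     pct_stained: float,
                     intensity: str,
                     complete: bool) -> Optional[str]:
    """Return '0', '1+', '2+' or '3+', or None for an unscorable core.

    intensity is one of 'none', 'weak', 'moderate', 'strong' (hypothetical labels).
    pct_stained is the percentage of tumor cells with membrane staining.
    """
    if n_tumor_cells < MIN_TUMOR_CELLS:
        return None  # too few tumor cells detected
    # Assumption: exactly 10% is treated as 0 (the text defines <10% and >10%).
    if intensity == "none" or pct_stained <= 10:
        return "0"
    if complete and intensity == "strong":
        return "3+"  # strong complete membrane staining in >10% of cells
    if complete:
        return "2+"  # weak to moderate complete membrane staining
    # Assumption: incomplete staining is 1+ whatever its intensity.
    return "1+"


print(herceptest_score(120, 35.0, "strong", True))   # 3+
print(herceptest_score(120, 35.0, "moderate", True))  # 2+
print(herceptest_score(30, 80.0, "strong", True))     # None (unscorable)
```

Returning None for unscorable cores keeps them distinguishable from a true score of 0, which matters when such cores are excluded from the statistical analysis.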
The hematoxylin and eosin and IHC images of all cores used in this study are publicly available at the companion site [
40]. The site was constructed using the Genetic Pathology Evaluation Centre (GPEC) database and a Java applet provided by Bacus Laboratories, Inc. All slides were scanned with a BLISS scanner (Bacus Laboratories, Inc., Lombard, Illinois, USA), and posted on the site. WebSlide Browser for Windows (Bacus Laboratories, Inc., Lombard, Illinois, USA) can be used for viewing preview images of the arrays and images of individual cores.
Six-micron sections of the TMA slides were hybridized with probes to LSI HER2 and CEP17 with the PathVysion™ HER2 DNA Probe Kit using a modified protocol, as previously described [
41]. Analysis of FISH signals was performed using the Metasystems™ automated image acquisition and analysis system, Metafer (Metasystems, Altlussheim, Germany). This automated system scores FISH signals by employing specific measurement algorithms to detect and quantify clustered signals. The average copy number for each probe was calculated and the amplification ratio (the ratio between the average copy number per cell for HER2 and the average copy number for centromere 17) was determined (MC). HER2 amplification was defined as a HER2/CEP17 ratio of 2.2 or more. A HER2/CEP17 ratio <1.8 was considered negative for HER2 amplification, and a ratio at or near the cut-off (1.8–2.2) was interpreted as equivocal. Tumors that failed to hybridize were not included in the analysis. We only accepted scores if >40 tiles were counted. With the Metafer system, one tile is considered one cell, as the size of a tile is approximately the average size of a nucleus. Normal cells were excluded wherever possible, and the corresponding H&E slides were reviewed when needed.
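The ratio calculation and cut-offs above can be written out as a minimal sketch. It does not reproduce the Metafer signal-detection algorithms; it starts from per-tile signal counts that are assumed to be already available. The function and argument names are hypothetical. Following the text, a ratio of exactly 2.2 is treated as amplified, so the equivocal range here is 1.8 up to but not including 2.2.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_TILES = 40         # scores are accepted only if more than 40 tiles were counted
AMPLIFIED_RATIO = 2.2  # HER2/CEP17 ratio of 2.2 or more: amplified
NEGATIVE_RATIO = 1.8   # HER2/CEP17 ratio below 1.8: not amplified


def classify_her2_fish(her2_per_tile: Sequence[float],
                       cep17_per_tile: Sequence[float]) -> Optional[str]:
    """Classify a core from per-tile HER2 and CEP17 signal counts.

    One tile is taken as one cell. Returns None when the core cannot be scored.
    """
    if len(her2_per_tile) != len(cep17_per_tile):
        raise ValueError("both probes must be counted on the same tiles")
    if len(her2_per_tile) <= MIN_TILES:
        return None  # too few tiles counted
    mean_cep17 = mean(cep17_per_tile)
    if mean_cep17 == 0:
        return None  # no centromere signal, e.g. failed hybridization
    ratio = mean(her2_per_tile) / mean_cep17
    if ratio >= AMPLIFIED_RATIO:
        return "amplified"
    if ratio < NEGATIVE_RATIO:
        return "not amplified"
    return "equivocal"


print(classify_her2_fish([6] * 50, [2] * 50))  # amplified (ratio 3.0)
print(classify_her2_fish([4] * 50, [2] * 50))  # equivocal (ratio 2.0)
print(classify_her2_fish([6] * 40, [2] * 40))  # None (only 40 tiles)
```

The ratio is computed from the two averages rather than as an average of per-tile ratios, which matches the definition in the text and avoids division by zero in tiles with no CEP17 signal.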
For statistical analysis, we used data from 1,212 patients for the IHC and 616 patients for the IHC/FISH comparisons. Exclusion criteria included core drop-off during processing, insufficient or absent tumor tissue within the cores, and artifactual distortion of the tissue making discrimination of cellular structure impossible. Statistical analysis was performed in SPSS 15.0 for Windows (SPSS Inc., Chicago, Illinois) and R 2.4.0 [
39]. All tests were two-sided and used a 5% alpha level to determine significance. 95% bootstrapped confidence intervals were calculated using the adjusted bootstrap percentile (bias-corrected and accelerated) method [
42]. Breast cancer specific survival was estimated using Kaplan-Meier curves and survival differences were determined by log-rank tests. We used the open-source R 2.4.0 package to calculate differences between kappa statistics from visual to automated scoring comparisons; a permutation test with 10,000 permutations was implemented.
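The comparison of kappa statistics can be sketched as follows. The analysis itself was run in R and the exact permutation scheme is not specified in the text, so everything below is an assumption made for illustration: unweighted Cohen's kappa, a paired design in which two raters are each compared against the same reference scores, and a null distribution built by randomly swapping the two raters' scores case by case. All function and argument names are hypothetical.

```python
import random
from collections import Counter
from typing import Sequence, Tuple


def cohen_kappa(a: Sequence, b: Sequence) -> float:
    """Unweighted Cohen's kappa for two equal-length lists of scores."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        return 0.0  # kappa is undefined here; 0 is an arbitrary convention
    return (observed - expected) / (1 - expected)


def kappa_difference_test(reference: Sequence,
                          rater_1: Sequence,
                          rater_2: Sequence,
                          n_perm: int = 10_000,
                          seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test for kappa(ref, rater_1) - kappa(ref, rater_2).

    Assumed scheme: under the null the two raters are exchangeable, so their
    scores are swapped within each case with probability 0.5.
    """
    rng = random.Random(seed)
    observed = cohen_kappa(reference, rater_1) - cohen_kappa(reference, rater_2)
    extreme = 0
    for _ in range(n_perm):
        perm_1, perm_2 = [], []
        for x, y in zip(rater_1, rater_2):
            if rng.random() < 0.5:
                x, y = y, x
            perm_1.append(x)
            perm_2.append(y)
        diff = cohen_kappa(reference, perm_1) - cohen_kappa(reference, perm_2)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


visual = ["0", "1+", "2+", "3+"] * 10
shifted = ["1+", "2+", "3+", "0"] * 10  # a rater who never agrees with visual
diff, p = kappa_difference_test(visual, visual, shifted, n_perm=1000)
print(diff, p)
```

Swapping within cases preserves the pairing between the two raters' scores on the same core, which an unpaired shuffle of all scores would destroy.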