We investigated biomarker heterogeneity and the optimal number of fields of view (FOVs) required for accurate immunostaining assessment of biomarker expression in breast carcinoma. Our mixed-effects analysis showed significant differences in heterogeneity, quantified by the intra-tumor coefficient of variation, among the eight biomarkers we examined. The optimal number of 20X FOVs, determined by the cross-validated average prediction error, varied by epitope from three to fourteen. The clinical significance of our findings is two-fold. First, they suggest that biopsies consisting of very few FOVs may be inadequate for diagnostic immunostains, because they may not contain enough FOVs to account for biomarker heterogeneity. Second, they suggest that the tissue sampling algorithm required to account for biomarker heterogeneity must be determined individually for each biomarker introduced into clinical use.
The optimal number of FOVs tracked the results of the mixed-effects analysis of heterogeneity. S6K1, ERK, and AKT had similar optimal FOV sample sizes and correspondingly large overlaps in the 95% confidence intervals for their coefficients of variation. ER, which had the highest measured coefficient of variation, had a relatively large optimal sample size. Although it was not possible to calculate a coefficient of variation for MAP-Tau, its large optimal FOV sample size is consistent with the qualitative heterogeneity observed in its immunostains. However, ER and ERK had similar optimal numbers of FOVs despite significant differences in their coefficients of variation, which shows that the correspondence between mixed-effects modeling of heterogeneity and the optimal number of FOVs is imperfect. This suggests that optimal sampling must be determined empirically for each marker rather than predicted from models of marker heterogeneity.
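The relationship between FOV count and prediction error can be illustrated with a minimal Python sketch. It is not the study's implementation and rests on several assumptions: the per-subject coefficient of variation (SD / mean of FOV scores) is a simplified stand-in for the mixed-effects estimate; the whole-slide mean is treated as the 'true' score; Monte Carlo resampling of k FOVs without replacement is substituted for the study's cross-validation scheme, which is not described here; and relative absolute error, together with the `tolerance` cutoff in the hypothetical helper `optimal_fov_count`, are illustrative choices. The data are synthetic, and scores are assumed positive.

```python
import numpy as np


def intra_tumor_cv(fov_scores_by_subject):
    """Average per-subject coefficient of variation (SD / mean of FOV scores).

    Simplified stand-in for a mixed-effects estimate (assumption).
    """
    cvs = [np.std(s, ddof=1) / np.mean(s) for s in fov_scores_by_subject]
    return float(np.mean(cvs))


def average_prediction_error(fov_scores_by_subject, k, n_draws=500, seed=0):
    """Mean relative error of a k-FOV average versus the whole-slide average.

    Uses Monte Carlo resampling without replacement (assumed stand-in for the
    study's cross-validation scheme).
    """
    rng = np.random.default_rng(seed)
    errors = []
    for scores in fov_scores_by_subject:
        scores = np.asarray(scores, dtype=float)
        truth = scores.mean()  # whole-slide mean treated as the 'true' score
        for _ in range(n_draws):
            sample = rng.choice(scores, size=k, replace=False)
            errors.append(abs(sample.mean() - truth) / truth)
    return float(np.mean(errors))


def optimal_fov_count(fov_scores_by_subject, tolerance=0.05, seed=0):
    """Hypothetical rule: smallest k whose average error is <= tolerance."""
    # k cannot exceed the smallest number of FOVs measured for any subject
    max_k = min(len(s) for s in fov_scores_by_subject)
    for k in range(1, max_k + 1):
        if average_prediction_error(fov_scores_by_subject, k, seed=seed) <= tolerance:
            return k
    return None  # no k reached the tolerance


# Synthetic example: 10 subjects with 30 FOV scores each
rng = np.random.default_rng(1)
subjects = [rng.normal(100, 20, size=30) for _ in range(10)]
print(round(intra_tumor_cv(subjects), 2))
print(optimal_fov_count(subjects))
```

Running `optimal_fov_count` separately on each marker's FOV scores mirrors the marker-by-marker determination argued for above; in practice the tolerance would need clinical justification rather than the arbitrary default used here.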
Of note, we included only ER-positive cases, as judged by pathologist-based scoring systems, in our analysis. We expected ER-negative cases to have extremely low intra-tumor variability; therefore, analyzing a mixed sample of ER-negative and ER-positive cases might have underestimated both intra-tumor variability and the optimal number of FOVs. We do not believe that this limits the generalizability of our results, as our goal was to estimate the minimum number of FOVs required for accurate determination of ER status.
The differences in the optimal number of FOVs among the biomarkers we tested suggest that no single sampling algorithm is optimal for all biomarkers in breast carcinoma. Instead, the optimal number must be determined on a marker-by-marker basis. Biomarkers known to be more heterogeneous, such as MAP-Tau, are likely to require more FOVs; however, for the reasons stated above, precise sampling algorithms must be determined empirically.
The observed heterogeneity likely arises from several sources: intrinsic biological differences in epitope expression, pre-analytic variables (such as variable cold ischemic time and formalin penetration of tissue), and technical variables of the AQUA method of quantitative immunofluorescence. Because the relative contributions of these sources of variability cannot be known a priori, we believe that blindly adjusting the assay to reduce its dynamic range risks losing clinically relevant information. Instead, we believe that the best strategy is first to determine the degree of sampling necessary to produce a representative score and then to compare that score to cutoffs that have been validated against clinical outcomes.
This study has several limitations. First, we used the average AQUA score over all FOVs in a whole tissue slide to model the 'true' representative score for each subject when calculating prediction error. The variation within a single whole tissue slide may be smaller than the variation between histologic 'blocks' from different regions of the tumor. As a result, the number of FOVs determined in this study may underestimate the number required to obtain a representative measure of each biomarker's expression. Our results should therefore be interpreted conservatively, as the minimum number required for clinical use.
A second limitation is the relatively small number of subjects available for many of the biomarkers. For all biomarkers other than MAP-Tau, to avoid introducing bias, we simulated FOV sampling from a normal distribution parameterized by the measured mean and variance of the observed FOVs. However, the validity of this approach for the smaller cohort (n = 14) is strongly supported by our dual analysis of MAP-Tau, a large cohort (n = 122) with many FOVs measured per subject. When the MAP-Tau data were analyzed by both direct sampling and simulation, the optimal number of fields and the SE of the estimate were identical.
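The comparison between direct sampling and simulation can be sketched in Python. This is a minimal illustration, not the study's code: the function names `direct_error` and `simulated_error` are hypothetical, relative absolute error against each subject's mean is an assumed error metric, the normal parameters are assumed to be fitted per subject, and the data are synthetic. Direct sampling draws observed FOVs without replacement, whereas simulation draws from the fitted normal distribution, which also permits sample sizes larger than the number of FOVs actually measured.

```python
import numpy as np


def direct_error(fov_scores_by_subject, k, n_draws=500, seed=0):
    """Relative error of a k-FOV mean, resampling observed FOVs without replacement."""
    rng = np.random.default_rng(seed)
    errs = []
    for scores in fov_scores_by_subject:
        scores = np.asarray(scores, dtype=float)
        truth = scores.mean()  # whole-slide mean as the reference score
        for _ in range(n_draws):
            sample = rng.choice(scores, size=k, replace=False)
            errs.append(abs(sample.mean() - truth) / truth)
    return float(np.mean(errs))


def simulated_error(fov_scores_by_subject, k, n_draws=500, seed=0):
    """Relative error of a k-FOV mean when FOV scores are drawn from a normal
    distribution with each subject's observed mean and SD (assumption)."""
    rng = np.random.default_rng(seed)
    errs = []
    for scores in fov_scores_by_subject:
        scores = np.asarray(scores, dtype=float)
        mu, sd = scores.mean(), scores.std(ddof=1)
        # n_draws simulated biopsies of k FOVs each, shape (n_draws, k)
        sims = rng.normal(mu, sd, size=(n_draws, k))
        errs.extend(np.abs(sims.mean(axis=1) - mu) / mu)
    return float(np.mean(errs))


# Synthetic check in the spirit of the MAP-Tau dual analysis: many FOVs per subject
rng = np.random.default_rng(2)
subjects = [rng.normal(100, 20, size=200) for _ in range(20)]
for k in (3, 8, 14):
    print(k, round(direct_error(subjects, k), 3), round(simulated_error(subjects, k), 3))
```

With many FOVs per subject the two estimates should agree closely. They diverge as k approaches the number of observed FOVs, because direct sampling without replacement converges to the whole-slide mean while the simulated draws do not.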
The third limitation is that AQUA is not currently used in many clinical laboratories. AQUA uses fluorescence for visualization and to optimize quantification, rather than the DAB chromogen used in most conventional laboratories. However, the underlying immunohistochemical technique and biology are the same, so the results should be generalizable to any method of visualization. Furthermore, the most recent ASCO/CAP guidelines state that 'image analysis is a desirable method of quantifying percentage of tumor cells that are immunoreactive' [2].
This pilot study offers guidance regarding the size of tissue sample required to account for heterogeneity in the specific biomarkers studied. More broadly, it suggests that further investigation is needed to define optimal sampling for other biomarkers in pre-clinical or clinical use, both in breast carcinoma and in other tissue types.