In this study we examined the reproducibility of a 70-gene breast cancer signature in a series of experiments performed in three laboratories, one in Amsterdam, one in California, and one in Paris. In the first part of the study identical RNA samples were labeled and hybridized to identical microarrays using the same platform and protocols, in both the Amsterdam and California laboratories. Reproducibility of signals and ratios was measured for replicate assays in each laboratory. We found that the results were very reproducible between sites. The low noise across the entire platform, as shown by the reproducibility of replicate hybridizations (those done in the same laboratory with the same labeled material), allowed the averaging of the replicates, with the result that minor differences in the data became more apparent (Figure ). In the second phase of the study, the same tumors were labeled and hybridized in the Paris laboratory. Despite being done several months later, and using different lots of microarrays and labeling reagents, the results from the third laboratory were in close agreement with those from the two other laboratories, giving another indication of the robustness of the measurement technology.
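The effect of averaging low-noise replicates can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the study: the noise level, the number of replicates, and the number of features are hypothetical values chosen for illustration, and the noise is assumed to be independent and Gaussian.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

N_FEATURES = 1000    # hypothetical number of array features
N_REPLICATES = 4     # hypothetical number of replicate hybridizations
NOISE_SD = 0.05      # hypothetical measurement noise, in log10 ratio units

# Every feature has a true log10 ratio of 0, so any spread is pure noise.
replicates = [
    [rng.gauss(0.0, NOISE_SD) for _ in range(N_FEATURES)]
    for _ in range(N_REPLICATES)
]

# Average the replicate measurements feature by feature.
averaged = [statistics.mean(values) for values in zip(*replicates)]

single_sd = statistics.stdev(replicates[0])
averaged_sd = statistics.stdev(averaged)

print(f"spread of one replicate:       {single_sd:.4f}")
print(f"spread of averaged replicates: {averaged_sd:.4f}")
```

For independent noise, the spread of the average shrinks roughly with the square root of the number of replicates, so a small systematic difference between sites is easier to see above the noise.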
We took care to ensure that the same operating protocols were used in all laboratories and that the operators in all laboratories were well trained. We found that if variations in the wash protocol were introduced between laboratories, significant discrepancies in the results emerged (data not shown). It is clear from our findings and those of others [17] that microarray protocols must be uniform and strictly adhered to in order to achieve good reproducibility between laboratories and operators. However, as we show here, if this is done then reproducibility is very high.
A DNA microarray measurement can be considered as hundreds or thousands of simultaneous analytical measurements of the relative concentrations of mRNAs in a sample. In order to examine the analytical precision, accuracy, and detection limits of these measurements, several laboratories have published cross-platform and other comparisons of microarray measurements [17]. However, there has not been a detailed examination of the factors contributing to any observed variability in the measurements. A microarray measurement requires several distinct steps. The microarrays themselves must be printed, handled, and stored until use. The RNA sample is purified, labeled with fluorophores, possibly amplified, and possibly fragmented. The labeled sample is hybridized to the arrays, which are then washed, dried, and scanned. At each of these steps variation and errors can arise which could contribute to imprecision in the overall measurement. By using the same input RNAs, the same batches of arrays and reagents, and by exchanging labeled samples and hybridized slides between the Amsterdam and California laboratories, we were able to examine which steps exhibited the largest variation between the two sites.
It should be noted that the experimental setup used in this study cannot measure every possible source of variation. Since all of the hybridizations involving a common sample were performed on arrays on the same slide, and the replicate slides in each laboratory were hybridized on different days, we cannot determine whether any variation observed between the two replicate slides is due to slide-to-slide variability, day-to-day variability, or a combination of the two. However, since the experimental setup compounds both potential sources of variation, we would expect that any such differences would be maximized in this study. Despite this, the 70-gene signature correlation values did not vary significantly by hybridization day (Figure ).
Another possible source of variation is inter-individual variability. Since all the labelings and hybridizations done at each site were performed by a single individual, the cross-laboratory variability cannot be deconvoluted from the inter-individual variability. However, we would expect that if two different individuals took care to follow the exact protocols, as in this study, interlaboratory variation would be greater than inter-individual variation, due to the use of a different set of laboratory equipment (pipettes, hybridization ovens, etc.). Another study reported measuring the 70-gene signature correlation values of two tumor samples repeatedly in the same laboratory, by six different individuals, with very consistent results ([14], and data not shown).
We found that the largest discrepancy between the Amsterdam and California sites was in the amplification/labeling step. This discrepancy was relatively small (about 0.02 in the log10 ratios, which amounts to a difference of roughly 5% in the actual expression ratio) but is detectable nonetheless. We used labeling kits from the same lots, purchased at the same time, so all labeling reagents were equivalent. While the labeling site differences were significant for only two of the four tumors when comparing the tumor signature correlation values, the differences extended to all four tumors when examining the log10 ratios of the 70 signature genes on an individual basis. This suggests that the differences seen at the individual gene level are relatively random and cancel one another out when looking at the signature as a whole, which represents a correlation of the log10 ratios of all 70 genes and averages of measurements from three replicate features for each gene. The variation in individual genes did not correlate with the expression level of the genes, which differs from the findings of Dobbin et al. [19], who found that genes expressed at lower levels were more variable between laboratories.
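The arithmetic behind the log10 ratio discrepancy, and the way small random per-gene offsets can leave a 70-gene correlation nearly unchanged, can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the function names, the simulated template and tumor profiles, and the use of a plain Pearson correlation are all hypothetical choices made here for the example.

```python
import math
import random

def log10_diff_to_percent(delta):
    """Percent change in a ratio implied by a shift of `delta` in its log10."""
    return (10 ** delta - 1) * 100

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# A shift of 0.02 in log10 ratio is a bit under 5% in the ratio itself.
percent = log10_diff_to_percent(0.02)
print(f"0.02 in log10 ratio = {percent:.1f}% change in ratio")

# Hypothetical data: a 70-gene template and one tumor measured at two sites.
rng = random.Random(0)
template = [rng.gauss(0.0, 0.3) for _ in range(70)]
site_a = [t + rng.gauss(0.0, 0.1) for t in template]
# Site B differs from site A by small random per-gene offsets (sd 0.02).
site_b = [a + rng.gauss(0.0, 0.02) for a in site_a]

r_a = pearson(template, site_a)
r_b = pearson(template, site_b)
print(f"signature correlation, site A: {r_a:.3f}")
print(f"signature correlation, site B: {r_b:.3f}")
```

Because the simulated offsets are random in sign, they largely cancel in the correlation across all 70 genes, which is consistent with the interpretation given above; a systematic offset that tracked the template would not cancel in this way.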
Several previous studies examined the cross-platform comparability of microarray measurements [17], with some studies reporting less variability between platforms than others. Our finding that array results on one platform performed with identical protocols are reproducible across laboratories is similar to the findings of other studies [17]. However, ours is the first report of the reproducibility of a gene expression signature composed of a small, defined set of genes. Such signatures have great potential utility in biomedical research, toxicogenomics, pharmaceutical development, and diagnostics. Reproducibility across laboratories and over time is essential in all these application areas, and our results are an encouraging indication that microarray-based analysis of defined gene signature sets can yield highly robust and reproducible measurements.