Using the intraclass correlation coefficient, which in repeatability studies provides an index of the natural variability between samples relative to the total variability [16], we found that three (IL-1β, IL-6, and IL-8) of the four cytokines with measurable levels had good interlaboratory agreement. This was the case both for absolute level measurements and (for IL-1β and IL-6) for within-subject change when baseline and post-product samples were tested within the same laboratory. IFN-α2, in contrast, showed poor reproducibility. We also note that several cytokines of biologic interest (IFN-γ, IL-10, IL-17, and TNF) were present at levels too low to be reliably measured using the Luminex-based assay kits, an observation that may guide analyte selection for future studies of genital-tract immune markers.
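For readers unfamiliar with the statistic, the following is a minimal sketch of a one-way random-effects intraclass correlation coefficient computed from a samples-by-laboratories table. The one-way form, the function name `icc_oneway`, and the example values are assumptions made for illustration; they are not necessarily the model used in this study, and they are not study data.

```python
def icc_oneway(table):
    """One-way random-effects ICC for a table with one row per sample
    and one column per laboratory (illustrative form, an assumption)."""
    n = len(table)       # number of samples
    k = len(table[0])    # number of laboratories per sample
    grand_mean = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]

    # Between-sample and within-sample sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(table, row_means)
                    for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # Share of total variability attributable to differences between samples.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical concentrations (pg/mL): three samples, three laboratories.
example = [
    [10.0, 12.0, 11.0],
    [50.0, 48.0, 52.0],
    [100.0, 105.0, 95.0],
]
print(round(icc_oneway(example), 3))
```

A value near 1 indicates that most of the observed variability reflects true differences between samples rather than disagreement among laboratories.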
While our results suggest reasonably good interlaboratory agreement, there was variability in absolute measurements among the laboratories, which was more marked for some samples than for others. This most obviously affected IFN-α2 reproducibility, but was also observed to a lesser extent for the other cytokines tested. Among known causes of variability in Luminex measurements [17], it has been demonstrated that different instruments can give significantly different readings even when calibrated to the same standard, presumably because of differences in their opto-electrical response curves [20]. Also of note in our study is that the instruments used by the participating laboratories were outfitted with different software packages for acquisition and analysis. This raises the possibility that variability was introduced through different underlying curve-fitting algorithms in the respective software packages, even though all three laboratories used a 5-parameter logistic curve fit. While the 5-parameter model most closely fits the ligand-binding kinetics of immunoassays, nearly eliminating the lack-of-fit error of the 4-parameter logistic model while avoiding the pitfalls of overparameterized models [21], it can be much more difficult to fit via software algorithms. In determining best fit by minimizing the weighted sum of squared errors, most algorithms are unable to reliably distinguish a local minimum in an ill-conditioned regression from the global minimum of the correct result [21]. Unfortunately, the use of a proprietary data-file format by STarStation [22] precludes our exchanging raw fluorescence data for reanalysis on the other platform to further examine the role of software in our findings.
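To make the curve-fit model concrete, the sketch below writes out one common parameterization of the 5-parameter logistic function and its inverse, which is the step used to interpolate an unknown concentration from a measured signal. The parameterization, the function names, and the parameter values are assumptions chosen for illustration; the software packages used by the participating laboratories may define and fit the model differently.

```python
import math


def five_pl(x, a, b, c, d, g):
    """5-parameter logistic: signal at concentration x.
    a: response at zero concentration, d: response at infinite concentration,
    c: mid-range concentration scale, b: slope, g: asymmetry.
    With g = 1 this reduces to the 4-parameter logistic model."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Back-calculate concentration from signal y (valid for y between a and d)."""
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical standard-curve parameters (not from any real assay kit).
params = dict(a=20.0, b=1.2, c=500.0, d=25000.0, g=0.8)

signal = five_pl(100.0, **params)
recovered = inverse_five_pl(signal, **params)
print(round(signal, 1), round(recovered, 3))
```

Because two software packages can converge on different values of these five parameters for the same standard-curve data, the same raw signal may be back-calculated to different concentrations, which is the mechanism proposed above.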
Low levels may also have contributed to the poor interlaboratory agreement observed for IFN-α2. Wong et al. reported coefficients of variation in cytokine measurements between replicate serum samples as high as 44%, in sharp contrast to previous studies that had assessed the reliability of Luminex measurements in the linear portion of the standard curves [23]. Because these authors studied physiologic levels, many cytokines fell into the lowest portion of the sigmoidal standard curve, where the curve levels off and interpolation of unknowns becomes less precise. In spite of that, they concluded that the method has potential utility in epidemiologic studies in cases where the intersubject variability results in high intraclass correlation coefficients irrespective of assay coefficients of variation. Thus, in considering the two cytokines with low but measurable levels in our own study, caution is urged in the measurement of IFN-α2 from CVL specimens, whereas the high intraclass correlation coefficients we report for IL-1β are reassuring for measurement of that cytokine.
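The loss of precision near the lower asymptote can be illustrated numerically. The sketch below applies the same 5% error in measured signal at a very low and at a mid-range concentration and compares the resulting relative error in the back-calculated concentration. The curve parameterization, the parameter values, and the 5% figure are all hypothetical and are not taken from this study or from Wong et al.

```python
def five_pl(x, a, b, c, d, g):
    """5-parameter logistic signal at concentration x (assumed parameterization)."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Concentration back-calculated from signal y."""
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


def relative_conc_error(true_conc, signal_error, params):
    """Relative error in interpolated concentration caused by a
    proportional error in the measured signal."""
    noisy_signal = five_pl(true_conc, **params) * (1.0 + signal_error)
    estimate = inverse_five_pl(noisy_signal, **params)
    return abs(estimate - true_conc) / true_conc


# Hypothetical standard-curve parameters (not from any real assay kit).
params = dict(a=20.0, b=1.2, c=500.0, d=25000.0, g=0.8)

low = relative_conc_error(0.1, 0.05, params)    # near the lower asymptote
mid = relative_conc_error(500.0, 0.05, params)  # mid-range of the curve
print(f"low-end error: {low:.0%}, mid-range error: {mid:.0%}")
```

Under these assumed parameters, the same proportional signal error produces a much larger concentration error at the low end of the curve than in its mid-range.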
To our knowledge, this is the first study of the interlaboratory reproducibility of cytokine measurements by Luminex using clinical cervicovaginal specimens. In a multicenter study of cytokine immunoassay performance, Fichorova et al. examined the contributions of interlaboratory variability, matrix effect, and assay method to recovery, using recombinant reference standards for IL-1β and IL-6 spiked into different matrices [24]. The authors concluded that, in the commercial Luminex kit studied, interlaboratory reproducibility was good for IL-1β (able to detect a 1.84-fold difference between measurements performed in different laboratories), but less so for IL-6 (able to detect only a 6.5-fold or higher difference). The relative contributions of manufacturing lot, software package, and curve-fit model to that variability were not addressed. They reported that recoveries were better for both cytokines when prepared in saline (as was used for CVL collection in the present study) than in phosphate-buffered saline, highlighting an important matrix effect of the specimen collection medium. Another important conclusion from that study was that biologically active reference standards or endogenous cytokines, rather than the calibrators included with assay kits, should be used to validate assay performance and reproducibility. Our results, using clinical study specimens, confirm their findings for IL-1β but differ for IL-6.
Although the intraclass correlation coefficients reported here suggest good agreement for several cytokines, and thus potential utility for the Luminex platform in microbicide safety studies, they do not provide a context for evaluating whether that level of agreement is sufficient. For a biomarker assay to be useful, it must have a level of reproducibility (interlaboratory and other) that allows one to distinguish biologically meaningful changes in expression from background variability. Adopting concepts and terminology from Lee et al. [25], assay validation is best regarded as an iterative process that is intertwined with biomarker "qualification" (i.e., identification of specific biomarkers that can serve as acceptable surrogates for an endpoint of interest). Part of the biomarker qualification process for microbicide safety studies will entail defining meaningful thresholds of change and should include revisiting the question of reproducibility. Thus, as candidate biomarkers of microbicide safety are identified and characterized, assay acceptance criteria must include a demonstration that the fold differences that can be reliably measured within and between laboratories allow clinically meaningful changes to be detected. Whether a centralized laboratory is needed will depend on interlaboratory variability as evaluated within that context. Irrespective of whether multiple laboratories are used, there appears to be broad support in the literature for selecting a single assay kit vendor for use throughout a study [13], as well as for using either biologically active reference standards or actual clinical specimens for validation [24]. Our results also support testing pre- and post-product specimens from a given subject in the same laboratory. Lastly, we recommend either that the same software package be used for curve fitting, or that software packages be employed that allow exchange of raw fluorescence data for cross-laboratory reanalysis and validation. If such cross-validation were to indicate interoperator variability, curve fitting and interpolation of unknowns for data from different laboratories could then be centralized.
> Three laboratories measure cytokines in cervicovaginal lavage samples by Luminex
> IFN-γ, IL-10, IL-17 and TNF are below detection in a majority of CVL samples
> IL-1β, IL-6 and IL-8, but not IFN-α2, show good agreement in absolute measurements
> IL-1β and IL-6 show good agreement in within-subject change after microbicide gel use
> Cytokine measurement by Luminex has potential utility in microbicide safety studies