The interlaboratory reproducibility of cytokine measurements from cervicovaginal samples by Luminex has not been reported. Using cervicovaginal lavage specimens collected on three study days from 12 women participating in a Phase I microbicide study, we measured a panel of eight cytokines in three independent laboratories. Four (IFN-γ, IL-10, IL-17, and TNF) were below the limit of detection in 85% or more of samples in two or all three laboratories, an observation that may guide analyte selection for future studies. Good interlaboratory agreement in absolute levels (intraclass correlation coefficient, r > 0.7) was observed for IL-1β, IL-6, and IL-8, whereas agreement was poor for IFN-α2 (r = 0.47). For within-subject change from baseline (pre-product, study day 0) to either post-product visit (study days 7 and 14), IL-1β and IL-6 showed good interlaboratory agreement (r > 0.7), whereas IFN-α2 and IL-8 did not. Future studies of the clinical utility of specific inflammatory biomarkers for microbicide trials should consider reproducibility when defining biologically meaningful thresholds of change for candidate biomarkers, to ensure that such change can be reliably distinguished from background variability.
Vaginal microbicides hold great promise as a female-controlled strategy for preventing human immunodeficiency virus (HIV) and other sexually transmitted infections. The recent CAPRISA 004 trial, in which a 39% reduction in HIV acquisition and an unanticipated 51% reduction in herpes simplex virus type 2 (HSV-2) acquisition were observed in women who applied 1% tenofovir gel before and after sex, illustrates the exciting potential of this strategy. These encouraging results contrast with those of earlier microbicide trials of surfactants and polyanionic entry inhibitors. Not only did the earlier products fail to protect against HIV, but several (Nonoxynol-9, C31G [Savvy], and cellulose sulfate) were associated with at least a trend toward higher rates of HIV infection [2–4]. Subsequent work indicated that these products may have facilitated HIV acquisition by inducing local inflammation and disrupting the epithelial barrier [5–9].
The central role of inflammation in promoting HIV infection is further supported by studies of other sexually transmitted infections, which suggest that the inflammatory response recruits T cells into the genital tract and thereby increases the risk of HIV infection. Inflammatory cytokines may also promote HIV replication by activating the HIV long terminal repeat. Together, these observations suggest that inflammatory mediators in genital tract secretions may serve as biomarkers of HIV risk, and that their measurement could help predict the safety of vaginal microbicides, mucosal vaccines, or other interventions. A critical prerequisite in biomarker development is the validation and standardization of assays. Although Luminex multiplex technology is commonly used to measure cytokines and chemokines in various specimen types, including female genital tract specimens, there are few or no data on the reproducibility of results across laboratories. This has important implications for future clinical studies: it will determine whether assays must be performed at a single centralized laboratory and the extent to which results from different studies can be compared.
To address this gap, we evaluated convenience cervicovaginal samples, collected as part of a Phase I microbicide safety trial, in a blinded fashion in three independent laboratories to determine the interlaboratory variability of cytokine and chemokine measurements made with the Luminex-100 multiplex system. We selected a panel of mediators that included those previously shown to be associated with HIV infection risk in vitro (IL-1β, IL-6, IL-8, and TNF), the antiviral cytokines IFN-α2 and IFN-γ, the anti-inflammatory cytokine IL-10, and IL-17, which plays a role in neutrophil recruitment, promotes the production of β-defensins, and has been shown to play a role in the immune response to N. gonorrhoeae.
Cervicovaginal lavage (CVL) specimens collected from women participating in a Phase I clinical safety trial of a candidate vaginal microbicide were used for this study. The trial is registered at www.ClinicalTrials.gov (NCT00331032), and all participants provided informed consent. De-identified specimens from days 0, 7, and 14 were evaluated, where day 0 represents baseline (pre-product) and days 7 and 14 represent the number of days of use of either product or placebo (in the same gel carrier). At each study visit, the cervix was lavaged with 5 ml of normal saline, and the specimen was then aspirated and frozen at −80°C without centrifugation.
Thirty-six CVL specimens from 12 subjects were tested by three independent research laboratories, blinded to subject and study day, for eight cytokines (IFN-α2, IFN-γ, IL-1β, IL-6, IL-8, IL-10, IL-17, and TNF). Because assays from different vendors have previously been reported to show poor intervendor agreement [13–15], a single vendor, Millipore (Billerica, MA), was used for this study. Specimens were thawed, vortexed to ensure homogeneity, aliquoted, refrozen, and distributed to each laboratory. All three laboratories followed identical sample handling and testing protocols: specimens were thawed at room temperature, centrifuged (2000 × g at 4°C for 10 min) to remove mucus and cellular debris, and tested in duplicate using MilliPlex MAP human cytokine/chemokine immunoassay kits (Millipore) according to the manufacturer's instructions. Kits from a single manufacturing lot were used by all three laboratories. Briefly, samples were incubated overnight with antibody-conjugated microspheres in 96-well filter-membrane assay plates with agitation. Plates were then washed with the kit wash buffer using vacuum filtration, after which analyte-bound beads were incubated with a biotinylated detection antibody cocktail and finally with streptavidin-phycoerythrin. After additional wash steps and resuspension of the beads in instrument sheath fluid, plates were read on Luminex 100 instruments (Luminex, Austin, TX). Each laboratory fit 5-parameter logistic regression curves and determined unknown concentrations (pg/ml) by interpolation, using its local software (Laboratories A and B: STarStation [Applied Cytometry Systems, Sacramento, CA]; Laboratory C: MiraiBio MasterPlex QT version 2.5 [Hitachi Software, South San Francisco, CA]). Concentrations from duplicate wells were averaged.
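The curve-fitting and interpolation step can be sketched as follows. This is a minimal illustration, not the algorithm implemented in STarStation or MasterPlex: it assumes one common 5-parameter logistic (5PL) parameterization, and the parameter values in PARAMS are hypothetical rather than fitted to this study's standards. The inverse function is what converts a well's fluorescence reading into a concentration, and it is defined only strictly between the curve's two asymptotes.

```python
def five_pl(x, a, b, c, d, g):
    """5PL response (e.g., median fluorescence intensity) at concentration x (pg/ml).

    a: response at zero concentration; d: response at infinite concentration;
    c: concentration scale; b: slope; g: asymmetry (g = 1 reduces to the 4PL).
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Interpolate the concentration that produces response y on the 5PL curve."""
    if not min(a, d) < y < max(a, d):
        # Readings at or beyond an asymptote cannot be mapped back to a concentration.
        raise ValueError("response outside the curve's asymptotes")
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


def duplicate_well_concentration(y1, y2, params):
    """Interpolate each duplicate well separately, then average the two concentrations."""
    return (inverse_five_pl(y1, **params) + inverse_five_pl(y2, **params)) / 2.0


# Hypothetical curve parameters, for illustration only.
PARAMS = {"a": 20.0, "b": 1.2, "c": 500.0, "d": 20000.0, "g": 0.8}

if __name__ == "__main__":
    mfi_1, mfi_2 = five_pl(95.0, **PARAMS), five_pl(105.0, **PARAMS)
    print(duplicate_well_concentration(mfi_1, mfi_2, PARAMS))
```

Averaging after interpolation, rather than averaging the two fluorescence readings first, mirrors the description above that concentrations from duplicate wells were averaged.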
Sample values below the lowest standard (3.2 pg/ml) were set at the midpoint between zero and this value (1.6 pg/ml); values above the highest standard (10,000 pg/ml) were set at 10,000 pg/ml. Agreement of the cytokine measurements among the three laboratories was assessed, for all measurements and stratified by study day, using the intraclass correlation coefficient (r), which provides an index of the intersubject variability relative to the total variability. Within-subject changes from baseline (day 0) at day 7 (“Δ7”) and day 14 (“Δ14”) were calculated by subtracting each subject's log-transformed baseline level from the log-transformed day-7 and (separately) day-14 levels; each such difference equals the logarithm of the ratio of the untransformed post-product and baseline values. Interlaboratory agreement in detecting within-subject change at the two post-product visits was then examined using the intraclass correlation coefficient. SPSS version 19 (SPSS, Inc.) was used for all statistical analyses.
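The substitution rule and the within-subject change calculation described above can be sketched as below. The source does not state the logarithm base; base 10 is an assumption here. Because changing the base multiplies every value by the same constant, it does not alter the intraclass correlation coefficients computed from these changes. The example values are invented.

```python
import math

LOWEST_STANDARD = 3.2        # pg/ml, lowest standard in this study
HIGHEST_STANDARD = 10_000.0  # pg/ml, highest standard in this study


def apply_standard_limits(value):
    """Substitute out-of-range concentrations as described in the Methods."""
    if value < LOWEST_STANDARD:
        return LOWEST_STANDARD / 2.0  # midpoint between zero and the lowest standard
    if value > HIGHEST_STANDARD:
        return HIGHEST_STANDARD       # capped at the highest standard
    return value


def within_subject_change(baseline, post_product):
    """Log10 change from baseline; equals log10(post_product / baseline) after substitution."""
    return math.log10(apply_standard_limits(post_product)) - math.log10(
        apply_standard_limits(baseline)
    )


if __name__ == "__main__":
    # Hypothetical values (pg/ml) for one subject on days 0, 7, and 14.
    day0, day7, day14 = 40.0, 160.0, 1.0
    print("delta7 =", within_subject_change(day0, day7))    # log10(160 / 40)
    print("delta14 =", within_subject_change(day0, day14))  # day-14 value below detection -> 1.6
```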
The median and range of baseline (day 0) values obtained for each cytokine and from each lab are shown in Table 1. Four of the cytokines tested (IFN-γ, IL-10, IL-17 and TNF) exhibited expression levels too low in CVLs to measure reliably, with 85% or more of samples (from all study days) falling below detection limits in either two or all three laboratories (Table 2). These were therefore excluded from further analyses. Of the other four cytokines, good interlaboratory agreement (r > 0.7) was seen for IL-1β and IL-6, both overall and stratified by study day, and for IL-8 overall and on study-days 7 and 14 (Table 3). IFN-α2, in contrast, showed poor interlaboratory agreement except for the day-7 samples.
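The exclusion criterion in the paragraph above can be written as a small decision rule. This sketch assumes that each laboratory's raw (pre-substitution) values for all specimens are available as a list, and it reads "85% or more of samples ... in either two or all three laboratories" as "in at least two laboratories"; the function names and example data are hypothetical.

```python
def fraction_below_detection(values, lowest_standard=3.2):
    """Fraction of one laboratory's raw values that fall below the lowest standard."""
    return sum(v < lowest_standard for v in values) / len(values)


def exclude_analyte(values_by_lab, threshold=0.85, min_labs=2, lowest_standard=3.2):
    """True if at least `min_labs` laboratories report >= `threshold` of samples below detection."""
    labs_failing = sum(
        fraction_below_detection(values, lowest_standard) >= threshold
        for values in values_by_lab.values()
    )
    return labs_failing >= min_labs


if __name__ == "__main__":
    # Invented data for 36 specimens: 31 of 36 below detection in laboratories A and B.
    mostly_low = [1.0] * 31 + [5.0] * 5
    measurable = [1.0] * 10 + [50.0] * 26
    print(exclude_analyte({"A": mostly_low, "B": mostly_low, "C": measurable}))
```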
Because of the variable degree of interlaboratory agreement in absolute cytokine measurements, and because normal ranges for these markers in CVL specimens have not been established, it was important to assess whether the ability to detect within-subject change from baseline was consistent across laboratories. Interlaboratory agreement in within-subject change was good at both post-product study days for IL-1β and IL-6 (Table 4). IFN-α2 and IL-8, in contrast, showed poor interlaboratory agreement.
In the case of IL-8, we noted that, despite the good overall interlaboratory agreement in absolute levels shown in Table 3, three of the 36 specimens showed marked interlaboratory discordance (data not shown), and all three were baseline specimens. Because discordance among baseline samples affects both the day-7 and the day-14 within-subject change assessments, which would not necessarily have been the case had these specimens been distributed across study days, we repeated the analysis with these three subjects omitted. In this analysis, the intraclass correlation coefficient rose to 0.46 (95% CI: 0.05, 0.82) at day 7 and 0.81 (95% CI: 0.53, 0.95) at day 14.
Using the intraclass correlation coefficient, which in repeatability studies provides an index of the natural variability between samples relative to the total variability, we found that three (IL-1β, IL-6, and IL-8) of the four cytokines with measurable levels showed good interlaboratory agreement. This held both for absolute levels and, for IL-1β and IL-6, for within-subject change when baseline and post-product samples were tested within the same laboratory. IFN-α2, in contrast, showed poor reproducibility. We also note that several important cytokines of biologic interest (IFN-γ, IL-10, IL-17, and TNF) were present at levels too low to be measured reliably with the Luminex-based assay kits, an observation that may guide analyte selection for future studies of genital-tract immune markers.
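For readers unfamiliar with the statistic, the intraclass correlation coefficient can be computed from a two-way analysis of variance, with specimens as rows and laboratories as columns. The source does not state which ICC form was selected in SPSS; this sketch assumes the two-way random-effects, absolute-agreement, single-measure form (Shrout and Fleiss ICC(2,1)), which treats the laboratories as a random sample and counts systematic offsets between them against agreement.

```python
from statistics import mean


def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC (Shrout & Fleiss ICC(2,1)).

    ratings: one row per specimen, one column per laboratory, no missing values.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between specimens
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between laboratories
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical log10 concentrations for three specimens measured in two laboratories,
    # where laboratory 2 reads a constant 1 log unit higher than laboratory 1.
    print(icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))
```

A constant offset between laboratories lowers this absolute-agreement ICC, whereas a consistency-type ICC would ignore it; which form is appropriate depends on whether absolute levels must match across laboratories.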
While our results suggest reasonably good interlaboratory agreement, absolute measurements varied among the laboratories, more markedly for some samples than for others. This variability affected IFN-α2 reproducibility most obviously, but was also observed, to a lesser extent, for the other cytokines tested. Among known causes of variability in Luminex measurements [17–19], different instruments have been shown to give significantly different readings even when calibrated to the same standard, presumably because of differences in their opto-electrical response curves. Also of note in our study, the instruments used by the participating laboratories were outfitted with different software packages for acquisition and analysis. This raises the possibility that variability was introduced by the different curve-fitting algorithms underlying the respective packages, even though all three laboratories used a 5-parameter logistic curve fit. Although the 5-parameter model most closely fits the ligand-binding kinetics of immunoassays, nearly eliminating the lack-of-fit error of the 4-parameter logistic model while avoiding the pitfalls of overparameterized models, it can be much more difficult to fit with software algorithms. When determining the best fit by minimizing the weighted sum of squared errors, most algorithms cannot reliably distinguish a local minimum in an ill-conditioned regression from the global minimum that represents the correct result. Unfortunately, STarStation's proprietary data-file format prevented us from exchanging raw fluorescence data for reanalysis on the other platform, which would have allowed us to examine further the role of software in our findings.
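The relationship between the 4- and 5-parameter models, and the objective that fitting software minimizes, can be sketched as follows. The 1/y² weighting is an assumption chosen for illustration (the weighting schemes used by STarStation and MasterPlex are not stated here, and packages can differ in this choice), and best_local_fit is a hypothetical helper illustrating one common mitigation, multi-start fitting: it compares solutions obtained from different starting guesses but does not itself perform any optimization.

```python
def four_pl(x, a, b, c, d):
    """4-parameter logistic: symmetric sigmoid."""
    return d + (a - d) / (1.0 + (x / c) ** b)


def five_pl(x, a, b, c, d, g):
    """5-parameter logistic: g != 1 makes the curve asymmetric; g == 1 gives the 4PL."""
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def weighted_sse(params, standards, weight=lambda y: 1.0 / y ** 2):
    """Weighted sum of squared errors of a 5PL curve against the standards.

    standards: (concentration, observed response) pairs.
    weight: hypothetical 1/y^2 weighting; other software may weight differently.
    """
    return sum(
        weight(y_obs) * (y_obs - five_pl(x, **params)) ** 2 for x, y_obs in standards
    )


def best_local_fit(candidate_fits, standards):
    """Keep the lowest-objective parameter set among fits started from different guesses.

    Comparing several local optima does not guarantee the global minimum, but it lowers
    the chance of reporting a poor local minimum in an ill-conditioned regression.
    """
    return min(candidate_fits, key=lambda p: weighted_sse(p, standards))
```

Because the 4PL is the special case g = 1 of the 5PL, the extra asymmetry parameter can only reduce lack-of-fit error, at the cost of an objective surface that may contain more than one local minimum.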
Low analyte levels may also have contributed to the poor interlaboratory agreement observed for IFN-α2. Wong et al. reported coefficients of variation as high as 44% for cytokine measurements in replicate serum samples, in sharp contrast to earlier studies that had assessed the reliability of Luminex measurements in the linear portion of the standard curves. Because these authors studied physiologic levels, many cytokines fell in the lowest portion of the sigmoidal standard curve, where the flattening of the curve increases the imprecision of interpolated unknowns. Nevertheless, they concluded that the method has potential utility in epidemiologic studies in cases where intersubject variability yields high intraclass correlation coefficients irrespective of assay coefficients of variation. Thus, of the two cytokines with low but measurable levels in our study, caution is urged in measuring IFN-α2 in CVL specimens, whereas the high intraclass correlation coefficients we report for IL-1β are reassuring for measurement of that cytokine.
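Why the flat lower end of a sigmoidal standard curve inflates imprecision can be illustrated numerically. This sketch uses the usual definition of the coefficient of variation and propagates a fixed fluorescence reading error through a hypothetical 5PL curve; the curve parameters and the ±5 MFI error are assumptions for illustration, not values from this study or from Wong et al.

```python
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of replicate measurements."""
    return stdev(replicates) / mean(replicates)


def five_pl(x, a, b, c, d, g):
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    # Assumes y lies strictly between the asymptotes a and d.
    return c * (((a - d) / (y - d)) ** (1.0 / g) - 1.0) ** (1.0 / b)


def relative_concentration_error(x, mfi_error, params):
    """Relative half-width of the concentration range implied by a +/- mfi_error reading error."""
    y = five_pl(x, **params)
    low = inverse_five_pl(y - mfi_error, **params)
    high = inverse_five_pl(y + mfi_error, **params)
    return (high - low) / (2.0 * x)


PARAMS = {"a": 20.0, "b": 1.2, "c": 500.0, "d": 20000.0, "g": 0.8}  # hypothetical curve

if __name__ == "__main__":
    for conc in (5.0, 500.0):  # near the lower asymptote vs. mid-curve (pg/ml)
        print(conc, relative_concentration_error(conc, 5.0, PARAMS))
```

With these assumed parameters, the same absolute reading error produces a relative concentration error at 5 pg/ml that is more than an order of magnitude larger than at 500 pg/ml, because the curve's slope approaches zero near its lower asymptote.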
To our knowledge, this is the first study of the interlaboratory reproducibility of Luminex cytokine measurements in clinical cervicovaginal specimens. In a multicenter study of cytokine immunoassay performance, Fichorova et al. examined the contributions of interlaboratory variability, matrix effect, and assay method to recovery, using recombinant reference standards for IL-1β and IL-6 spiked into different matrices. The authors concluded that, for the commercial Luminex kit studied, interlaboratory reproducibility was good for IL-1β (a 1.84-fold difference between measurements performed in different laboratories could be detected) but poorer for IL-6 (only differences of 6.5-fold or more could be detected). The relative contributions of manufacturing lot, software package, and curve-fit model to that variability were not addressed. They also reported that recoveries of both cytokines were better in saline (as used for CVL collection in the present study) than in phosphate-buffered saline, highlighting an important matrix effect of the specimen collection medium. Another important conclusion of that study was that biologically active reference standards or endogenous cytokines, rather than the calibrators included with assay kits, should be used to validate assay performance and reproducibility. Our results, using clinical study specimens, confirm their findings for IL-1β but differ for IL-6.
Although the intraclass correlation coefficients reported here suggest good agreement for several cytokines, and thus potential utility of the Luminex platform in microbicide safety studies, they provide no context for judging whether that level of agreement is sufficient. For a biomarker assay to be useful, its reproducibility (interlaboratory and otherwise) must allow biologically meaningful changes in expression to be distinguished from background variability. Adopting concepts and terminology from Lee et al., assay validation is best regarded as an iterative process intertwined with biomarker "qualification" (i.e., identification of specific biomarkers that can serve as acceptable surrogates for an endpoint of interest). Part of the biomarker qualification process for microbicide safety studies will entail defining meaningful thresholds of change, and it should include revisiting the question of reproducibility. Thus, as candidate biomarkers of microbicide safety are identified and characterized, assay acceptance criteria must include a demonstration that the fold differences that can be reliably measured within and between laboratories allow clinically meaningful changes to be detected. Whether a centralized laboratory is needed will depend on interlaboratory variability evaluated in that context. Irrespective of whether multiple laboratories are used, the literature broadly supports selecting a single assay kit vendor for use throughout a study [13–15], as well as using either biologically active reference standards or actual clinical specimens for validation. Our results also support testing pre- and post-product specimens from a given subject in the same laboratory. Lastly, we recommend either that the same software package be used for curve fitting or that software packages be used that allow exchange of raw fluorescence data for cross-laboratory reanalysis and validation. If such cross-validation were to indicate interoperator variability, curve fitting and interpolation of unknowns for data from the different laboratories could then be centralized.
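One way to make the proposed acceptance criterion concrete is to convert paired measurements of the same specimens from two laboratories into a fold-difference limit, then compare that limit with a proposed threshold of biologically meaningful change. The sketch below uses Bland-Altman limits of agreement on the log10 scale, a technique swapped in for illustration; it is not necessarily how Fichorova et al. derived their 1.84-fold and 6.5-fold figures, and the function names and any threshold passed to change_is_detectable are hypothetical.

```python
import math
from statistics import mean, stdev


def interlab_fold_limit(lab_a, lab_b, z=1.96):
    """Largest fold discrepancy expected between two laboratories for ~95% of specimens.

    lab_a, lab_b: paired concentrations (pg/ml) for the same specimens (at least two pairs).
    Uses limits of agreement on the log10 scale: mean difference +/- z * SD of differences.
    """
    diffs = [math.log10(b) - math.log10(a) for a, b in zip(lab_a, lab_b)]
    return 10 ** (abs(mean(diffs)) + z * stdev(diffs))


def change_is_detectable(meaningful_fold, fold_limit):
    """True if a biologically meaningful fold change exceeds the interlaboratory limit."""
    return meaningful_fold > fold_limit


if __name__ == "__main__":
    # Invented paired values: laboratory B reads roughly twice laboratory A.
    lab_a = [5.0, 20.0, 80.0, 300.0]
    lab_b = [9.0, 42.0, 170.0, 590.0]
    limit = interlab_fold_limit(lab_a, lab_b)
    print(limit, change_is_detectable(4.0, limit))
```

Under this framing, a candidate biomarker whose meaningful threshold lies below the interlaboratory fold limit would argue for testing all specimens, or at least all specimens from a given subject, in a single laboratory.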
The authors thank Dr. Craig Cohen, protocol chair of the VivaGel Phase I trial, for making specimens available for this study and for facilitating pre-submission review of the manuscript by the STI Clinical Trials Group Executive Committee. This study was supported by the National Institutes of Health (NIH) grants AI065309 from the National Institute of Allergy and Infectious Diseases (NIAID) and R37 CA051323 from the National Cancer Institute, and by the STI Clinical Trials Group (NIAID Division of Microbiology and Infectious Diseases HHSN266200400074C). In addition, this research was supported by NIH/NCRR UCSF-CTSI grant number UL1 RR024131 from the National Center for Research Resources (NCRR). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. Information on NCRR is available at http://www.ncrr.nih.gov. Information on Re-engineering the Clinical Research Enterprise can be obtained from http://nihroadmap.nih.gov/clinicalresearch/overview-translational.asp.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.