The concentrations of cytokines in human serum and plasma can provide valuable information about in vivo immune status, but low concentrations often require high-sensitivity assays to permit detection. The recent development of multiplex assays, which can measure multiple cytokines in one small sample, holds great promise, especially for studies in which limited volumes of stored serum or plasma are available. Four high-sensitivity cytokine multiplex assays on a Luminex (Bio-Rad, BioSource, Linco) or electrochemiluminescence (Meso Scale Discovery) platform were evaluated for their ability to detect circulating concentrations of 13 cytokines, as well as for laboratory and lot variability. Assays were performed in six different laboratories utilizing archived serum from HIV-uninfected and -infected subjects from the Multicenter AIDS Cohort Study (MACS) and the Women's Interagency HIV Study (WIHS) and commercial plasma samples spanning initial HIV viremia. In a majority of serum samples, interleukin-6 (IL-6), IL-8, IL-10, and tumor necrosis factor alpha were detectable with at least three kits, while IL-1β was clearly detected with only one kit. No single multiplex panel detected all cytokines, and there were highly significant differences (P < 0.001) between laboratories and/or lots with all kits. Nevertheless, the kits generally detected similar patterns of cytokine perturbation during primary HIV viremia. This multisite comparison suggests that current multiplex assays vary in their ability to measure serum and/or plasma concentrations of cytokines and may not be sufficiently reproducible for repeated determinations over a long-term study or in multiple laboratories but may be useful for longitudinal studies in which relative, rather than absolute, changes in cytokines are important.
The role of immune activation and inflammation in the pathogenesis of multiple disease states such as HIV (22), cardiovascular disease (20), and neurological disease (17) is becoming increasingly well recognized. Of particular interest is the study of multiple potential biomarkers to identify predictors of disease progression or outcome (12, 13, 28). A widely used approach for relating systemic inflammation or immune status to disease outcome is the measurement of soluble biomarkers in blood (2, 26), tissue culture supernatant (24), or mucosal secretions (32). Historically, studies have focused on quantitation of one or a few soluble biomarkers for correlation with or prediction of disease outcome. However, given the complexity of the human immune system, studies using a limited panel of analytes could potentially miss important disease-associated markers. A number of techniques have recently been developed or improved to address these limitations, including mRNA expression profiling (29), mass spectrometry (9), and multiplex assays based on flow cytometry, bead-based detection, or electrochemiluminescence. The ability to measure a broad array of cytokines using small sample volumes has allowed new insight into disease pathogenesis, for example, defining the ontogeny of cytokine induction in acute viral infections (18, 30, 31).
Despite these advances, the precision and reproducibility of these new approaches have not been well defined. In a study of multianalyte bead-based (Luminex) kits, World Health Organization (WHO) cytokine standards were assayed at the same expected concentrations as the standards provided with each kit, but WHO and kit standards often yielded very different absolute concentrations (21). Multiple studies have compared standard-sensitivity multiplex assays with each other (4, 15, 16) or with enzyme-linked immunosorbent assays (ELISAs) (19), flow cytometry cytokine bead arrays (27), or electrochemiluminescent Meso Scale Discovery (MSD) assays (3). These comparison studies have shown variable agreement among assays and have indicated that absolute cytokine concentrations differ across testing platforms. Such variability is not unique to multiplex assays, as proficiency testing has demonstrated that absolute concentrations of cytokines measured by a single-analyte ELISA can vary widely from lab to lab, although a similar rank order of cytokine concentrations between samples is often preserved (5).
Recently, high-sensitivity cytokine and chemokine multiplex panels have been introduced for the Luminex and MSD platforms. The higher level of sensitivity has allowed detection of cytokine perturbations not previously recognized (30) and raised the possibility that multiplex cytokine assays may be appropriate for use in large multisite prospective studies. The present study was undertaken to investigate this possibility. Six laboratories associated with the Multicenter AIDS Cohort Study (MACS) and the Women's Interagency HIV Study (WIHS) compared high-sensitivity multiplex cytokine kits based on the Luminex and MSD platforms in two ways: first, for their ability to detect circulating concentrations of cytokines, and second, for variability due to laboratory and kit lot. Multiplex panels designed to measure up to 13 cytokines (including some that are known to be altered by HIV infection and disease progression) were evaluated using archived serum samples from MACS and WIHS participants. Serial samples from commercial plasma donors undergoing primary HIV infection were also utilized to determine whether the kits detected changes in plasma cytokine concentrations before and after the onset of detectable HIV viremia.
Serum samples that had been frozen and stored at −80°C and had never been thawed were obtained from 9 HIV-seronegative (HIV−) and 9 HIV-seropositive (HIV+) individuals from each of the study repositories of the MACS (n = 18 men) and the WIHS (n = 18 women). HIV− and HIV+ subjects were matched by race (all African-American), age (range, 36 to 49 years), study center, and sample date (±6 months), with all samples obtained between 2001 and 2006. At the time of the sampling, all HIV+ subjects had been HIV infected for >2 years, were free of clinical AIDS, had received no antiretroviral therapy for at least 6 months, and had an HIV plasma load greater than 10,000 RNA copies/ml (Roche Amplicor).
Aliquots of frozen sodium citrate plasma, originally obtained from three paid donors over the course of primary HIV infection, were purchased from Zeptometrix Corporation (Franklin, MA) and SeraCare Life Sciences (Milford, MA). Each plasma sample had been thawed and refrozen two times. Each donor donated plasma regularly at 2- and 5-day intervals, with 4 to 7 samples collected before (total n = 15) and 3 samples collected after (total n = 9) the first day with detectable plasma HIV RNA (>100 copies/ml, determined retrospectively by Quest Diagnostics [Amplicor; Roche]). The samples spanned 4 to 5 weeks, which included 14 to 23 days prior to and 15 days following initial HIV viremia.
The six participating laboratories received a 1-ml aliquot of each plasma and serum sample, except for lab D, which received aliquots of plasma samples only. Each laboratory thawed the samples upon receipt, centrifuged the plasma samples, and prepared aliquots of 70 to 125 μl that were refrozen at −80°C until testing. Thus, all samples had the same freeze-thaw history at the time of testing in each laboratory.
Labs A to D used fluorescent bead-based instruments (Luminex-100 [Luminex Corporation, Austin, TX] or Bio-Plex 200 [Bio-Rad, Hercules, CA]) and tested three different high-sensitivity kits: Bio-Plex Precision Pro human cytokine 10-plex (Bio-Rad, Hercules, CA), Human UltraSensitive Cytokine Ten-Plex (BioSource/Invitrogen, Carlsbad, CA), and LincoPlex/Milliplex Map high-sensitivity human cytokine panel (13-plex; Linco/Millipore, Billerica, MA) (Table 1). Luminex/Bio-Plex instruments were validated using a Bio-Plex validation kit within 2 weeks of each assay and calibrated on assay days using a Bio-Plex or Luminex validation kit. Labs E and F used electrochemiluminescence instruments (Sector Imager 2400; Meso Scale Discovery, Gaithersburg, MD) and the Th1/Th2 10-plex MSD assay kit; lab E also used a singleplex MSD assay to measure interleukin-6 (IL-6) since this was considered an analyte of particular interest (Table 1). Each assay was performed strictly according to the manufacturer's protocol for serum or plasma samples, utilizing recommended sample dilutions and standard curve concentrations, with all samples and standards assayed in duplicate.
For Luminex assays, thawed aliquots were gently vortexed and then centrifuged at 13,200 rpm for 10 min at 4°C immediately prior to testing. The Bio-Rad assay protocol required that serum and plasma samples (and respective standard curves) be prepared differently. Therefore, for all Luminex kits, serum and plasma samples were assayed on separate plates. Each sample was tested using two different lots of each kit in two independent assays at least 1 month apart. Complete data were available, with the following exceptions: (i) no serum data from the BioSource kit by lab B (operator error), (ii) no serum data from the Bio-Rad kit in lab C (kit failures), and (iii) no plasma data from lot 2 of the Bio-Rad kit in lab D (reagent was missing from kit). Luminex data were analyzed using Bio-Plex Manager software, version 4.1 (Bio-Rad), and a five-parameter logistic curve fit, with the exception of BioSource and Linco data from one laboratory (lab C), which utilized STarStation software (Applied Cytometry, Sheffield, United Kingdom).
In MSD assays, samples were tested on two different kit lots in lab E but were tested twice on the same Th1/Th2 kit lot in lab F. Electrochemiluminescent data were analyzed with a four-parameter logistic curve fit using MSD Data Analysis Toolbox, version 3.0, or Discovery Workbench, version 3.
IL-6 concentrations in previously unthawed aliquots of the same serum and plasma samples were also determined using a high-sensitivity ELISA kit (Quantikine HS human IL-6 immunoassay; R&D Systems, Inc., Minneapolis, MN), according to the manufacturer's protocol. ELISA data were acquired using a Multiskan MCC/340 plate reader and Ascent software (Thermo Fisher Scientific, Waltham, MA) with a 4-parameter logistic curve fit.
All individual raw data and calculated cytokine concentrations for standards and samples were reported by each laboratory to the MACS/WIHS statistical coordinating center, along with the kit manufacturer, lot number, and assay date; data were reported as generated by the respective data acquisition and analysis software programs, without any operator editing. For each cytokine on each assay, laboratories also reported the concentration of the lowest detectable standard, i.e., the lowest standard on the standard curve provided with the kit (prepared as directed by the manufacturer) for which a concentration could be calculated by the analysis program. For MSD data, the lowest detectable standard was required to be above background and to have a coefficient of variation (CV) for duplicate wells of less than 25%. The data underwent quality control review by one of the authors (E.C.B.), and discrepancies were resolved with each of the participating laboratories.
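The acceptance rule for the lowest detectable MSD standard can be illustrated with a short sketch. The function names and example signals are hypothetical; the text specifies only the 25% CV threshold and the above-background requirement, so the use of the sample standard deviation and of the duplicate mean for the background comparison are assumptions.

```python
from statistics import mean, stdev

def duplicate_cv_percent(well_a, well_b):
    """Percent CV of two replicate wells (sample SD / mean * 100)."""
    values = [well_a, well_b]
    m = mean(values)
    if m == 0:
        return float("inf")  # CV is undefined for a zero mean; treat as failing
    return stdev(values) / m * 100.0

def standard_is_detectable(well_a, well_b, background, max_cv=25.0):
    """Assumed rule: duplicate mean above background and duplicate CV below max_cv."""
    above_background = mean([well_a, well_b]) > background
    return above_background and duplicate_cv_percent(well_a, well_b) < max_cv
```

Under these assumptions, duplicate signals of 120 and 130 against a background of 100 would pass, whereas duplicates of 100 and 200 would fail on CV alone.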
The four data analysis programs reported cytokine concentrations for every sample well for which it was mathematically possible to calculate a value, based on the standard curve generated for each cytokine on that day's assay. For many samples, this resulted in extrapolated values, i.e., a calculated concentration that was less than the lowest detectable standard reported for that assay (after taking sample dilution, if any, into account). Only the Bio-Plex Manager program flagged extrapolated values. Therefore, in order to treat data from all sources consistently, two approaches were taken. In the first approach, the lower limit of quantitation (LLOQ) was defined strictly as the concentration of the lowest detectable standard, and any extrapolated values (i.e., calculated values less than the LLOQ) were considered undetectable (strict data set). In the second approach, the LLOQ was defined as the minimum value able to be calculated on any sample or standard by the analysis programs, so that all calculated values reported by the software were utilized even if they were extrapolated (relaxed data set). For Luminex-generated data, all calculated extrapolated values above 0.01 pg/ml were included; for MSD-generated data, extrapolated values were included only if they were more than 2.5 standard deviations above background.
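The two LLOQ definitions can be summarized in a minimal sketch for Luminex-style data. The function name, the dictionary keys, and the convention that the reported value is already dilution corrected (so that the lowest standard is multiplied by the dilution factor before comparison) are assumptions made for illustration; only the 0.01 pg/ml floor and the strict and relaxed definitions come from the text.

```python
LUMINEX_RELAXED_FLOOR = 0.01  # pg/ml, threshold stated in the text for Luminex data

def detectability(value, lowest_standard, dilution=1.0, floor=LUMINEX_RELAXED_FLOOR):
    """Classify one calculated concentration (pg/ml) under both data sets.

    value is None when the analysis software could not calculate a concentration.
    """
    if value is None:
        return {"strict": False, "relaxed": False, "extrapolated": False}
    strict_lloq = lowest_standard * dilution  # assumed dilution handling
    extrapolated = value < strict_lloq        # below the lowest detectable standard
    return {
        "strict": not extrapolated,                      # strict: extrapolated = undetectable
        "relaxed": (not extrapolated) or value > floor,  # relaxed: keep extrapolated values above the floor
        "extrapolated": extrapolated,
    }
```

For example, a calculated value of 0.5 pg/ml with a lowest detectable standard of 1.0 pg/ml would be undetectable in the strict data set but retained in the relaxed data set.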
Cytokine data acquired from sera (36 cross-sectional samples) and plasma (3 individuals, longitudinal samples) were treated as independent data sets and analyzed separately. Utilizing the serum data, the overall frequencies of detectable values for each cytokine on each kit were determined for the strict and relaxed data sets; additional analyses were also performed by examining frequencies by lab and lot and by HIV status. Cytokines with ≥50% of detectable results overall (all labs, all lots) on a given kit were considered to have a majority of detectable results. Because the strict data set excluded data points that would have been reported by three of the four data analysis programs, the relaxed data set was utilized for all additional analyses.
Continuous analyses of serum cytokine concentrations and sources of variability (residual error [RE], between-subject [BS] variance) were performed separately for each kit, using a linear mixed model that included a random-subject effect to account for within-subject correlation, with left censoring at the LLOQ (as described above for the relaxed data set). Since the distributions were quite skewed, the cytokine data were log transformed for analysis; descriptive data are presented as medians and interquartile ranges (25th to 75th percentiles). Models included gender, HIV status, lab, lot, and their interaction (the lab-lot interaction); statistical significance was defined as a P value of <0.05. Although multiple cytokines were included in the models, corrections for multiple comparisons were not performed because the primary interest in this study was the pattern of significance, not individual P values. Therefore, methods for adjusting for multiplicity that focus on individual P values for individual effects and not on an overall pattern were not appropriate. No significant differences were observed on the basis of gender (data not shown). If a lab-lot interaction was significant, no further analyses were performed. If this interaction was not significant, then the model was run without the interaction effect and results were reported for the separate lab and lot effects.
For longitudinal plasma data, day 0 (d0) was defined as the first day with a detectable HIV load (>100 copies/ml). For each multiplex kit, changes in cytokine concentrations were evaluated by comparing the mean concentration of all time points from all three subjects preceding d0 to the mean of those following d0, using a linear mixed model with a random-subject effect to account for repeated measurements on the same subjects and adjustment for lab and lot. For graphical purposes, plasma concentrations across time were plotted for each cytokine on each kit together with a nonparametric smoother utilizing pooled data from all lots, labs, and donors; samples with undetectable values were plotted as 0.1 pg/ml or one-half the LLOQ for that cytokine, whichever was greater.
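The plotting convention for undetectable plasma values can be written as a one-line rule. The function name is hypothetical, and treating any value below the LLOQ (or any missing value) as undetectable is an assumption about how the rule was applied.

```python
def plot_value(value, lloq, floor=0.1):
    """Value (pg/ml) used for plotting: the measurement if detectable,
    otherwise the greater of the floor (0.1 pg/ml) and one-half the LLOQ."""
    if value is None or value < lloq:
        return max(floor, lloq / 2.0)
    return value
```

With an LLOQ of 1.0 pg/ml an undetectable sample would be drawn at 0.5 pg/ml, while with an LLOQ of 0.1 pg/ml it would be drawn at the 0.1 pg/ml floor.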
For comparisons of serum or plasma IL-6 concentrations obtained by the Luminex multiplex or MSD singleplex kit to those from the high-sensitivity ELISA, Pearson correlation coefficients were calculated after a value of one-half of the LLOQ was assigned to all undetectable values. Correlation analyses were performed utilizing all data points from all labs and lots and were then repeated for serum samples with right-censored IL-6 ELISA data, excluding three values of >10 pg/ml, which is the upper limit of quantitation for neat samples on the high-sensitivity ELISA.
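The correlation procedure can be sketched as follows, assuming paired lists of multiplex and ELISA results in which undetectable multiplex values are recorded as None. All function and parameter names are hypothetical, and the Pearson coefficient is computed directly from its definition rather than with any particular statistics package.

```python
import math

def impute_half_lloq(values, lloq):
    """Replace undetectable results (None) with one-half of the LLOQ."""
    return [lloq / 2.0 if v is None else v for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlate_with_elisa(multiplex, elisa, multiplex_lloq, elisa_ceiling=None):
    """Correlate multiplex with ELISA values after half-LLOQ imputation.

    If elisa_ceiling is given (e.g. 10 pg/ml), pairs whose ELISA value
    exceeds it are excluded, mirroring the repeated analysis in the text.
    """
    x = impute_half_lloq(multiplex, multiplex_lloq)
    pairs = [(a, b) for a, b in zip(x, elisa)
             if elisa_ceiling is None or b <= elisa_ceiling]
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)
```

Calling `correlate_with_elisa(values, elisa, lloq)` and then again with `elisa_ceiling=10.0` would reproduce the two-pass structure of the analysis, though not the significance testing.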
The ability of a high-sensitivity assay to detect the low concentrations of cytokine typically found in serum or plasma depends on how the sensitivity of the assay is defined. In the package inserts or technical manuals of the multiplex kits, the manufacturers define and report assay sensitivity on the basis of a lower limit of detection, i.e., the cytokine concentration equivalent to a raw value 2 or 2.5 standard deviations above the raw values of repeated measures of a zero standard or blank sample. In this study, assay sensitivity was evaluated on the basis of two different definitions of the LLOQ, i.e., using the concentration of either the lowest detectable standard (strict data set) or the minimum calculated value (relaxed data set) as the LLOQ. As shown in Table 1, the strict LLOQs were the same or higher than the relaxed LLOQs, but for nearly all cytokines and kits, the LLOQs in the relaxed data set were consistent with (and sometimes even lower than) the manufacturer's reported assay sensitivities.
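The manufacturers' lower limit of detection starts from a raw-signal cutoff, which can be sketched as below. The function name is hypothetical and the use of the sample standard deviation is an assumption; converting the cutoff signal into a concentration would additionally require the fitted standard curve of the kit, which is not reproduced here.

```python
from statistics import mean, stdev

def blank_signal_cutoff(blank_signals, k=2.0):
    """Raw-signal cutoff: mean of repeated blank (zero standard) readings
    plus k sample standard deviations, with k = 2 or 2.5."""
    return mean(blank_signals) + k * stdev(blank_signals)
```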
Relative assay sensitivity was assessed by calculating the frequency of detectable values in the serum samples tested for each cytokine using each kit (see Table S1 in the supplemental material). If the frequency of detectable values was less than 50%, then the median value for that cytokine would be undetectable. Therefore, acceptable overall assay sensitivity for each cytokine on each kit was defined as having detectable results in a majority (≥50%) of the serum samples. Cytokines meeting this criterion are indicated by LLOQs in boldface type in Table 1. The Linco kit had the greatest number of cytokines with acceptable sensitivity in the strict data set (8/13), with one additional cytokine crossing the majority detectable threshold in the relaxed data set. MSD had 5/11 cytokines with acceptable sensitivity in the strict data set and benefited the most from the use of the relaxed data set, with three additional cytokines having a majority of detectable results. The Bio-Rad and BioSource kits, both of which required sample dilutions, had the fewest cytokines with acceptable sensitivity in the strict data set (1/10 and 3/10, respectively), and these results improved only slightly in the relaxed data set (3/10 and 4/10, respectively). Hence, out of the 44 cytokine assays across all kits, 17 assays (39%) in the strict data set and 24 (55%) in the relaxed data set yielded a majority of detectable results. Since there was not a major difference between the strict and relaxed data sets in the pattern of cytokines with a majority of detectable results and in order to be most consistent across different software reporting formats (as described in Materials and Methods), the relaxed data set was used for all further analyses.
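The overall tallies can be checked by summing the per-kit counts quoted above. The data structure and function names are illustrative only; the counts themselves are taken from the text (relaxed counts are the strict counts plus the stated number of additional cytokines).

```python
# kit: (strict majority-detectable, relaxed majority-detectable, cytokines on panel)
KIT_COUNTS = {
    "Linco": (8, 9, 13),
    "MSD": (5, 8, 11),
    "Bio-Rad": (1, 3, 10),
    "BioSource": (3, 4, 10),
}

def has_majority(n_detectable, n_samples):
    """Acceptable sensitivity: detectable results in at least 50% of samples."""
    return n_detectable / n_samples >= 0.5

def overall_majority(counts):
    """Sum strict, relaxed, and total assay counts across kits."""
    strict = sum(c[0] for c in counts.values())
    relaxed = sum(c[1] for c in counts.values())
    total = sum(c[2] for c in counts.values())
    return strict, relaxed, total

def percent(n, total):
    """Percentage rounded to the nearest integer."""
    return round(100 * n / total)
```

Summing these counts gives 17 of 44 assays (39%) for the strict data set and 24 of 44 (55%) for the relaxed data set, matching the figures reported.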
When the overall assay sensitivities were evaluated by cytokine, every cytokine except gamma interferon (IFN-γ) had a majority of detectable results on at least one kit, while IL-6, IL-8, IL-10, and tumor necrosis factor alpha (TNF-α) were detected in a majority of serum samples on at least three kits. Since the panel of serum samples used was composed of samples from equal numbers of matched HIV− and HIV+ subjects, the pattern of majority-detectable cytokines was examined according to HIV status. Different patterns between HIV− and HIV+ subjects were seen only for three cytokines in the Linco kit (IL-1β, IL-12p70, granulocyte-macrophage colony-stimulating factor [GM-CSF]) and one cytokine in the MSD kit (IL-4); for each of these cytokines, the HIV− subjects had a majority of samples with detectable cytokines, while the HIV+ subjects did not (data not shown).
When the frequencies of detectable results for cytokines were plotted according to lab and lot, it became apparent that the relative sensitivity of some cytokine assays varied considerably from one lot (or assay run) to another (Fig. 1). Among cytokines with a majority of detectable results overall, all three cytokines in the Bio-Rad kit (IL-1β, IL-4, TNF-α), as well as IL-2 in the Linco and MSD kits, showed notable shifts in detection frequencies from one lot to another (Fig. 1). However, most of the other cytokines showed relatively consistent frequencies of detection across labs and lots, in many cases at or near 100%.
Serum cytokine data for each Luminex kit were available from at least two labs for the same two kit lots, which permitted quantitative assessment of kit variability due to lab and/or lot differences. Only those cytokines with a majority of detectable results were included in the analyses. Lab A generated data for all kits and lots and so served as the reference for pairwise comparisons (Table 2). Overall, every cytokine on every kit showed at least one statistically significant lab and/or lot effect. However, IL-8 on the BioSource kit appeared to have the least variability, with only a lab effect of borderline statistical significance. Interestingly, this was the assay that generated the highest median cytokine concentrations (>20 pg/ml). Other BioSource assays showed either a significant lab-lot interaction (IL-6), individual lab and lot effects (IL-1β), or lot-only effects (IL-10). Bio-Rad assays showed lab-lot interactions (IL-1β) or lab and/or lot effects (IL-4, TNF-α), but the cytokine concentrations were extremely low (≤0.1 pg/ml) in many cases, indicating that although the assays had a majority of detectable results, those results were just barely above the detection limit. Linco assays, which had the most cytokines with detectable results, showed significant variability in all of those cytokines, with lab-lot interactions in IL-2, IL-5, IL-7, IL-8, IL-10, GM-CSF, and TNF-α and lab-only effects in IL-6 and IL-13. An example of lab-lot interaction can be seen in the Linco IL-10 data (Fig. 2A), where the median lot 1 value (12.0 pg/ml) was higher than the lot 2 value (9.8 pg/ml) in lab A, but the lot 1 value (8.9 pg/ml) was lower than the lot 2 value (11.0 pg/ml) in lab B and the values were similar between lots 1 and 2 (7.4 and 8.2 pg/ml, respectively) in lab C. A similar lab-lot interaction was seen in the Linco TNF-α data (Fig. 2B).
Interestingly, between the same two Linco lots, median IL-6 values in labs A and B differed by less than 1 pg/ml, while lab C had higher median IL-6 values in lot 1 than lot 2 (7.2 and 4.6 pg/ml, respectively; Fig. 2C). Although these lot or lab differences may be small in absolute terms, they are of the magnitude often seen between groups for these cytokines that are typically found at low concentrations in serum (Table 3 and results below). Figure 2 also illustrates the differences in absolute cytokine concentrations yielded by the different kits for the same samples, especially by the Bio-Rad kit for IL-10 and IL-6, which had few detectable results yet generated some individual cytokine values of >100 pg/ml.
Among the MSD results, only the seven analytes in the MSD Th1/Th2 multiplex panel were included in the analyses for lab effects, since the IL-6 singleplex assay was performed only in lab E. It was not possible to evaluate lab-lot interactions for the MSD multiplex kit, since only lab E used two different lots. However, when results of labs E and F on lot 1 are compared, every Th1/Th2 cytokine showed highly significant lab differences (P < 0.001; Table 2). When the results for lots 1 and 2 within lab E were compared, four of the multiplex MSD assays (IL-2, IL-5, IL-13, IFN-γ) and the IL-6 singleplex assay showed significant lot-to-lot differences (Table 2).
Variability in measurements can be attributed to RE, which is a function of technical and/or measurement error, and to BS variance, which represents biological differences. In order to maximize the ability to discern biological differences, it is desirable to minimize RE relative to BS to have a lower RE/BS ratio. Because of the large number of very low cytokine values obtained, the calculated RE for many cytokines was ≤0.2 pg/ml (see Table S2 in the supplemental material), with only one very high RE/BS ratio (Bio-Rad kit for IL-2) that was attributable to an extremely low frequency of detectable results (17%).
Utilizing all data from all laboratories for cytokines with a majority of detectable results, the Linco kit showed statistically significant differences between HIV− and HIV+ subjects in serum concentrations of IL-2, GM-CSF, and TNF-α (Table 3). Unadjusted median IL-2 and GM-CSF concentrations (Table 3) and adjusted mean concentrations (accounting for lab and lot differences) were lower in HIV+ subjects, which is consistent with published reports suggesting defects in production of these two cytokines as a result of HIV infection (6, 10, 25). In other reports, serum concentrations of IL-2 or GM-CSF were consistently below the level of detection or showed lower median concentrations in HIV+ subjects (7, 11, 14). It is likely, therefore, that the differences detected by the high-sensitivity Linco kit reflect the significantly lower frequency of detectable results in HIV+ than HIV− subjects (for IL-2, 55% versus 78% [P = 0.02]; for GM-CSF, 35% versus 68% [P = 0.007]). In contrast, TNF-α, which was 100% detectable on the Linco assay regardless of HIV status, had higher serum concentrations in HIV+ subjects. This is consistent with the long-established observation of immune activation, especially inflammation, in HIV infection (1, 6, 14). In subanalyses examining Linco data from each laboratory individually, no additional cytokines showed significant differences by HIV status. For IL-2 and TNF-α, one laboratory no longer showed statistically significant differences, while for GM-CSF, two laboratories retained significant differences but the third could not be evaluated due to a majority of undetectable results (data not shown). Bio-Rad, BioSource, and MSD kits showed no significant differences in mean serum concentrations between HIV− and HIV+ subjects (see Table S3 in the supplemental material).
Changes in circulating cytokine concentrations were assessed in three donors who had onset of HIV viremia amid serial donations of plasma (Fig. 3) by comparing the means of pooled plasma data from all time points pre- and postviremia for each cytokine for each kit (Table 4). As a qualitative evaluation, mean lab plasma values and a smoothed curve for each cytokine for each kit were plotted; cytokines with statistically significant postviremia increases using at least one out of three kits are shown in Fig. 4 and 5. Three patterns of plasma cytokine responses were observed. In the first pattern, significant postviremia increases in IL-8 (P ≤ 0.02), IL-10 (P ≤ 0.01), and IFN-γ (P ≤ 0.04) were detected by all of the kits in which these cytokines were included (Fig. 4); increased IL-7 (P = 0.03) was also detected by the Linco kit, the only panel to include this cytokine (Table 4). In the second pattern, five cytokines (IL-2, IL-4, IL-5, IL-12p70, IL-13) showed no significant postviremia increases in any of the kits and/or were unable to be analyzed due to the large number of undetectable results. Finally, the third pattern showed mixed results for three cytokines (IL-1β, TNF-α, IL-6) which were assayed on all four kits (Fig. 5), as well as for GM-CSF, which was assayed on two kits (Table 4). All four of these cytokines had statistically significant postviremia increases in at least one but not all kits. Interestingly, for these cytokines, there was no kit that detected increases in all four cytokines in question. For example, the BioSource kit was the only one that detected a significant increase in IL-1β (P = 0.03) (Fig. 5A) but did not detect an increase in TNF-α (Fig. 5B). In contrast, neither the Linco kit nor the MSD kit detected a significant increase in IL-1β, but both clearly detected increases in TNF-α (P = 0.01). For IL-6 (Fig. 5C), postviremia increases were seen using all four kits, but the increase was not statistically significant for MSD. This result may be due at least in part to the smaller number of data points generated by the MSD assays in only one laboratory.
Aliquots of the same serum and plasma samples with the same freeze-thaw history were available for a comparison of multiplex assays to ELISAs. However, the volume available was enough to perform only a single ELISA. IL-6, a proinflammatory and B cell-stimulatory cytokine which figures prominently in many studies of immune activation, was chosen for this comparison.
On a high-sensitivity IL-6 ELISA (LLOQ = 0.2 pg/ml) performed by lab A, all serum and plasma samples had clearly detectable IL-6 concentrations (serum concentration range, 1.4 to 26 pg/ml; plasma concentration range, 0.7 to 6.7 pg/ml). Consistent with Luminex and MSD IL-6 serum data (Table 3; see Table S3 in the supplemental material), the ELISA IL-6 serum data showed no significant difference in mean IL-6 concentrations between HIV− and HIV+ subjects (median, 2.9 pg/ml in both groups; P = 0.3). Eleven of the plasma samples had previously been assayed by lab B using the same IL-6 ELISA kit. The correlation between the IL-6 ELISA data obtained in the two laboratories on these 11 samples more than 3 years apart was 0.98 (data not shown).
Among the serum IL-6 data, MSD clearly correlated the best overall with the ELISA (r = 0.72) and Bio-Rad correlated the worst (r = 0.13), with BioSource and Linco showing weak correlations (r = 0.32 and 0.25, respectively; Table 5, see Fig. S1A in the supplemental material). However, when three serum samples with very high serum ELISA values (>10 pg/ml) were excluded from the analyses, only MSD still showed a statistically significant correlation (r = 0.57, P < 0.001). There was no significant difference in correlations between HIV− and HIV+ subjects (data not shown). Surprisingly, among the plasma IL-6 data (all of which were <10 pg/ml on ELISA), a very different pattern was seen (Table 5; see Fig. S1B in the supplemental material). Bio-Rad correlated the best with the ELISA (r = 0.79), followed by BioSource (r = 0.49), with weak correlations in MSD and Linco (r = 0.22 and 0.20, respectively). Some of the weaker correlations for multiplex assays did reach statistical significance, but this is likely due to the large number of data points from all labs and lots included in these analyses (144 to 240 for Luminex multiplex assays, 58 to 72 for the MSD singleplex assay).
The work reported here was designed to evaluate the performance of commercially available high-sensitivity multiplex cytokine assays for possible use in prospective human studies performed at multiple sites over time. The critical assay elements needed in such studies are (i) the ability to detect circulating concentrations of cytokines found in serum and plasma, (ii) reproducibility across users so as to produce comparable results in different laboratories, and (iii) reproducibility across kit lots so as to give comparable results over time and/or large batches of samples.
Different laboratories often have different approaches to immunoassay protocols, data acquisition, and analysis that could impact the cytokine values generated in an assay. In order to make all multiplex cytokine data as comparable as possible, all six laboratories in this study followed a consensus protocol for sample handling and instrument settings (where applicable), performed assays strictly according to each manufacturer's protocol (especially the preparation/concentration of assay standards), and reported data exactly as they were produced by the data acquisition/analysis software associated with each assay instrument. While we recognize that this may not be the exact approach that any one laboratory might take when performing these assays independently, it provided a framework in which to collect data with as little operator-associated influence as possible.
The four high-sensitivity multiplex assay kits evaluated on the Luminex and MSD platforms (Table 1) were selected because they were designed to have an extended range at the low end of the standard curve to enable detection of cytokine concentrations in human serum and plasma. Even with this extended range, many cytokine values were calculated on the basis of extrapolation below the lowest concentration on the standard curve. These extrapolated values were included in a relaxed data set but were excluded in a strict data set limited to the range of the standard curve. When the two data sets were compared to see which cytokine assays yielded a majority of detectable results on a panel of serum samples (boldface data in Table 1), it was somewhat surprising to find that only one or two additional cytokines on each Luminex kit reached that threshold in the relaxed data set. This was true even though several LLOQs dropped by 3-fold or more from the strict to the relaxed data set. This indicated that most undetectable values remained that way even when calculations were extrapolated, giving similar results with either data set. For MSD assays, three additional cytokines crossed the majority detectable threshold in the relaxed data set, but two of these (IL-12p70, IL-13) were analytes that had the highest LLOQs (9.8 pg/ml) in the entire strict data set (Table 1). It is possible that the standards for these particular MSD assays were performing suboptimally at concentrations below 9.8 pg/ml (excluding more than 75% of serum results in the strict data set), while many serum samples with concentrations below this level were actually being detected, as reported in the relaxed data set.
Four cytokines (IL-6, IL-8, IL-10, TNF-α) emerged as generally having good detectability in serum samples on the high-sensitivity multiplex assays (Table 1; Fig. 1). IL-8 was detected in 95 to 100% of serum samples by all three of the kits in which it was included (BioSource, Linco, MSD). For IL-6, IL-10, and TNF-α, which were included on all four kits, there was always one kit that did not have a majority of detectable results (Bio-Rad for IL-6 and IL-10, BioSource for TNF-α). The requirement for sample dilutions for the Bio-Rad (1:4) and BioSource (1:2) kits appeared to reduce the likelihood of detectable results, with the notable exception of IL-1β on BioSource, which was the only kit to detect this cytokine in nearly 100% of serum samples (Fig. 1). Interestingly, comparison of standard-sensitivity, multiplex bead-based kits for detection of IL-1β also found that the BioSource kit was more sensitive than the other Luminex kits tested for this particular analyte (15). For other analytes, it is possible that it is not simply the sample dilution but, rather, some other component(s) of the Bio-Rad and BioSource assays (such as the proprietary monoclonal antibody pairs) that makes them less sensitive. Reduced assay sensitivity is not due to the bead-based nature of the Bio-Rad and BioSource assays, in which antibodies are conjugated to latex beads in solution, since the Linco kit, which yielded the greatest number of cytokines with a majority of detectable results (9/13), utilized the same bead-based technology. The MSD kits (Th1/Th2 multiplex, IL-6 singleplex), in which monoclonal antibodies are immobilized on the bottom of a well, similar to a traditional ELISA, were almost as sensitive as the Linco assays, with 8/11 cytokines having a majority of detectable results. 
These observations are consistent with prior studies using standard-sensitivity multiplex kits that have shown greater sensitivity of Linco assays than the Bio-Rad assay (4) and greater sensitivity of MSD assays than the BioSource or Bio-Rad assay (8, 21). As pointed out by Fu et al., it is important to note that while the MSD platform may be more sensitive, its well-based technology limits the number of analytes that can be tested using a single sample volume (8).
While the performance of the multiplex kits in detecting circulating concentrations of at least some cytokines was encouraging, concerns arose when comparability across laboratories and/or kit lots was examined (Table 2; Fig. 2). All cytokines included in the analyses showed significant lab and/or lot variability, although one had only a borderline lab effect. The cytokine assay with the least variability was the BioSource IL-8 assay, which also had the highest median values (>20 pg/ml) of any cytokine on any kit (Table 2). It might be expected that cytokines that have serum or plasma concentrations not clustered at the low end of the standard curve would make better targets for multiplex assays. However, less variability was not guaranteed by higher cytokine concentrations, as the other two kits with IL-8 (Linco, MSD) also generated relatively high values but still had significant variability. Similarly, Linco and MSD TNF-α concentrations had median values in the 5- to 20-pg/ml range, yet the values were highly variable across labs (MSD) or lab and lot (Linco). It is hard to discern whether the difficulties in generating comparable data on the same samples are primarily a function of lab operations or instrumentation differences not addressed by our consensus study protocol, differences in the level of experience with kits and instruments, or lot variability due to manufacturing processes (including leakage of Luminex assay filter plates, which was reported by more than one operator). Given the large number of lab-lot interactions, all of these could be contributing factors. In any case, the lab and lot variability observed in this study under carefully standardized conditions may actually be less than what might be experienced in independent laboratories under real-world conditions.
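One simple way to inspect lab and lot variability for a single sample is to compute a percent coefficient of variation (CV) within each lot (spread across laboratories) or within each lab (spread across lots). The sketch below is descriptive only and is not the statistical analysis behind Table 2; the record layout, function names, and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def percent_cv(values):
    """Percent coefficient of variation (sample SD divided by mean, times 100)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    if m <= 0:
        raise ValueError("mean must be positive")
    return 100.0 * stdev(values) / m


def cv_within(records, group_by):
    """Group records by 'lab' or 'lot' and return the percent CV per group.

    Grouping by 'lot' shows the spread across labs using that lot;
    grouping by 'lab' shows the spread across lots within that lab.
    Groups with fewer than two values are skipped.
    """
    groups = defaultdict(list)
    for r in records:
        groups[r[group_by]].append(r["value"])
    return {k: percent_cv(v) for k, v in groups.items() if len(v) >= 2}


# Hypothetical results (pg/ml) for one serum sample and one cytokine.
records = [
    {"lab": "A", "lot": "1", "value": 8.0},
    {"lab": "B", "lot": "1", "value": 12.0},
    {"lab": "A", "lot": "2", "value": 15.0},
    {"lab": "B", "lot": "2", "value": 9.0},
]
print(cv_within(records, "lot"))
print(cv_within(records, "lab"))
```

A summary of this kind does not separate lab effects from lab-lot interactions; doing so would require a formal model fitted to the full data set.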
Taken as a whole, the results of this study raise concerns about the utility of the assays tested in multiple sites and/or at multiple times in prospective epidemiologic studies. These considerations are not unique to multiplex assays, as prior studies focused on ELISAs demonstrated that the absolute concentration of cytokines measured by ELISA varied depending on which lab performed the testing, but trends in cytokine concentrations were consistent across multiple labs (5). Strategies to address assay reproducibility could focus on including an aliquot of a pedigreed control sample of human serum or plasma to be run on each assay or testing cytokines singly and randomly retesting 15 to 20% of samples, as has been suggested for radioimmunoassay and enzyme immunoassay testing (1).
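The random retesting strategy mentioned above could be set up along the following lines. This is a minimal sketch, assuming that samples are identified by unique IDs and that a fixed seed is acceptable for documenting the selection; the function name, ID format, and default fraction are illustrative assumptions.

```python
import math
import random


def select_retest_subset(sample_ids, fraction=0.15, seed=None):
    """Randomly choose a subset of samples to be assayed a second time.

    fraction: proportion of samples to retest (0.15 to 0.20 in the
    suggestion cited above); the count is rounded up.
    seed: optional seed so that the selection can be reproduced.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ids = list(sample_ids)
    n = math.ceil(fraction * len(ids))
    rng = random.Random(seed)
    # random.Random.sample draws without replacement.
    return sorted(rng.sample(ids, n))


# Hypothetical sample IDs.
ids = [f"S{i:03d}" for i in range(40)]
print(select_retest_subset(ids, fraction=0.15, seed=2011))
```

Recording the seed alongside the assay run makes it possible to show afterward that the retested samples were not chosen on the basis of their initial results.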
Since the panel of serum samples tested in the multiplex assays was composed of samples from age-, gender-, race-, and time-matched populations of HIV− and HIV+ men and women, we hoped to evaluate whether multiplex assays detected differences in cytokine concentrations as a function of HIV status. The Linco kit demonstrated three modest differences (decreased IL-2 and GM-CSF and increased TNF-α in HIV+ subjects; Table 3) that are consistent with published reports (1, 6, 10, 25). No significant differences were seen for any other Linco cytokines or for any cytokines with the other kits, suggesting either that there were no other cytokine differences between these two groups or that the differences were small enough to be obscured by other sources of variability within the assays included in this study. Two of the participating laboratories recently described cytokine perturbations associated with very early HIV viremia, with many of the changes detectable by high-sensitivity cytokine assays (23, 30). This suggested that plasma cytokine concentrations before and after initial HIV viremia might be an appropriate context in which to evaluate the ability of high-sensitivity multiplex assays to characterize the immunologic changes in this study. Although there were some differences in absolute values and LLOQs, the same general plasma cytokine patterns were seen across all six laboratories when using the same kit (Fig. 4 and 5). Also, consistent with the serum assays, the BioSource IL-1β assay was clearly superior to the other three assays, as it was the only one that demonstrated a significant increase in this cytokine following initial HIV viremia (Table 4; Fig. 5A). IL-8, IL-10, and IL-6 assays (Fig. 4A and B and 5C, respectively) showed significant increases postviremia in at least three different kits, while significant TNF-α increases were demonstrated (Fig. 5B) in two kits (Linco and MSD, but not BioSource, where the IL-1β increase was seen).
Unlike in the serum samples, in which IFN-γ was consistently undetectable, significant IFN-γ increases were seen with all four kits in the context of primary HIV infection (Fig. 4C). Hence, longitudinal plasma studies over the course of initial HIV viremia clearly demonstrated the value of multiplex assays to detect changes in certain cytokines over time, if the change is of sufficient magnitude and samples are assayed at the same time.
Although not a primary focus of this study, the comparison of IL-6 data generated by Luminex or MSD assays to data from a widely used high-sensitivity ELISA highlights some additional technical issues that should be considered when assays are evaluated. As has been reported by others (19), it was expected that the absolute IL-6 values would differ among ELISA, the Luminex assay, and the MSD assay. It was somewhat unexpected, however, to see poor correlations between an ELISA and other assays in most cases and very surprising to see contradictory results in unrelated sets of serum and plasma samples (Table 5). While the good correlations seen among plasma samples in the Bio-Rad and BioSource assays may be attributable at least in part to a higher percentage of detectable IL-6 values in these samples (data not shown), this does not explain the good correlation in serum but weak correlation in plasma by the MSD assay, which had 98 to 100% detectable IL-6 values in plasma and serum sample sets. Other studies have reported generally excellent agreement between ELISA and Luminex-based tests for relatively high concentrations of serum IL-8 (15) but have found that correlations between the results of other multiplexed cytokine assays and ELISA results vary depending upon the specific analyte (19, 27). Our observations raise questions beyond the scope of the current study, including the effects of sample matrix (serum versus citrate plasma versus other types of plasma), assay type (multiplex in liquid phase versus singleplex in solid phase versus ELISA), study design (cross-sectional samples from 36 individuals versus repeated measures on 3 individuals), and possibly even HIV status (chronic versus acute infection), although among the serum samples, the correlation results were virtually identical in HIV− and HIV+ subjects (data not shown).
In summary, the Bio-Rad, BioSource, Linco, and MSD high-sensitivity multiplex assays proved capable of detecting circulating serum and plasma concentrations of some cytokines. However, in spite of our best efforts to standardize assay performance across laboratories and lots, they did not appear to be well-suited for large studies involving multiple laboratories or using multiple lots of assay kits. If samples are collected at multiple sites, performing all multiplex assays in a single laboratory would address potential between-laboratory variability. Similarly, if samples are collected over an extended period of time, careful planning and consultation with kit manufacturers would be required to ensure a large and consistent lot of multiplex kits or the inclusion of control specimens to measure lot-to-lot variation and an analytical strategy to take such variation into account. However, this may not always be feasible when cytokine data must be generated in a cost-effective or timely fashion. In the context of smaller studies in a single laboratory or when all samples have been collected and can be run at a single time and place, multiplex kits may be useful and offer the benefit of greatly reduced sample volumes. Since no one kit successfully detected all of the cytokines tested (especially the inflammatory cytokines IL-1β, IL-6, IL-8, and TNF-α) and in at least one case (IL-6) multiplex kits may generate data that cannot be consistently correlated with or compared to data previously obtained by ELISA, the choice of a particular multiplex kit or the decision to use multiplex assays at all needs to be tailored to each study, its resources, and the question(s) that it is designed to address.
Data and samples used in this study were collected by MACS, with centers [principal investigator(s)] at The Johns Hopkins Bloomberg School of Public Health (Joseph B. Margolick, Lisa P. Jacobson), Howard Brown Health Center, Feinberg School of Medicine, Northwestern University, and Cook County Bureau of Health Services (John P. Phair, Steven M. Wolinsky), University of California, Los Angeles (Roger Detels), and University of Pittsburgh (Charles R. Rinaldo) and by the WIHS Collaborative Study Group at New York City/Bronx Consortium (Kathryn Anastos), Brooklyn, NY (Howard Minkoff), Washington, DC, Metropolitan Consortium (Mary Young), The Connie Wofsy Study Consortium of Northern California (Ruth Greenblatt), Los Angeles County/Southern California Consortium (Alexandra Levine), Chicago Consortium (Mardge Cohen), and the Data Coordinating Center (Stephen Gange).
MACS is funded by the National Institute of Allergy and Infectious Diseases (NIAID), with additional supplemental funding from the National Cancer Institute (NCI; UO1-AI-35042, UL1-RR025005 [GCRC], UO1-AI-35043, UO1-AI-35039, UO1-AI-35040, UO1-AI-35041; website located at http://www.statepi.jhsph.edu/macs/macs.html). The WIHS is funded by NIAID, the Eunice Kennedy Shriver National Institute of Child Health and Human Development, NCI, the National Institute on Drug Abuse, the National Institute on Deafness and Other Communication Disorders, and the National Center for Research Resources (UO1-AI-35004, UO1-AI-31834, UO1-AI-34994, UO1-AI-34989, UO1-AI-34993, UO1-AI-42590, UO1-HD-32632, UCSF-CTSI UL1 RR024131). This work was also supported by the UCLA Older Americans Independence Center (OAIC) and the OAIC Inflammatory Biology Core Laboratory (NIH/NIA P30-AG028748), the NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI; AI-067854), and the UCSF-GIVI Center for AIDS Research (P30-AI027763). P.B. is a Jenner Institute Investigator (United Kingdom).
This work is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
We thank the MACS and WIHS participants for their ongoing participation, which makes this and many other studies possible, and Norr Santz, Susanne Yoon, Djeneba Dabitao, Amritha Ramakrishnan, and Matt Matheson for technical support and assistance with manuscript preparation.
†Supplemental material for this article may be found at http://cvi.asm.org/.
Published ahead of print on 22 June 2011.