During the analysis of a retrospective biomarker study conducted with a set of banked sera from melanoma patients [23], we discovered a potential correlation between the levels of analytes measured by Luminex and the time that the sera were stored at -80°C. Therefore, we examined several aspects of serum storage and the Luminex assay.
Repeat testing in 2010 of sera stored in 2005
Our first sample set consisted of sera from 23 melanoma patients (the "old patients") who had blood drawn in 2005 and whose serum samples were assayed by Luminex on 10/31/2005, 11/01/2005 or 2/17/2006; we refer to these as the "early" assays. To determine any changes over storage time, we thawed previously unthawed aliquots and retested a subset of the originally tested analytes, again by Luminex (Table ). Unexpectedly, we identified a number of apparent changes in analyte levels. We repeated these measurements up to three times for these 23 samples (on 2/02/2010, 5/13/2010 and 8/11/2010), depending on the number of previously untouched aliquots remaining; we refer to these as the "late" assays. Seven of the 10 analytes we examined had highly significant changes during the approximately 5 years of storage at -80°C.
Different groups of analytes showed different patterns. Some were relatively stable over time (IL-4, change over time: p = 0.28), while others showed a tendency to change (IL-10, p = 0.093; GM-CSF, p = 0.11). Levels of some analytes decreased over the storage time (IL-6, p = 0.00021, decreasing in 21/23 samples; TNFα, p = 0.0078, decreasing in 20/23). Surprisingly, IL-8 levels were significantly increased from the initial test to the subsequent tests 5 years later (IL-8, p = 0.000030, approximately 5-fold increased in 23/23 patient samples). MCP-1 levels also increased in a majority of samples (MCP-1, p = 0.00012) (Table /Figure ). Each p-value was computed with a one-sample Wilcoxon test on the ratio of the 5/13/2010 assay result (for which we had the most data) to the result of the early assay.
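The one-sample Wilcoxon test on late-to-early ratios can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the test is applied to log-ratios against zero (the text does not say whether ratios were log-transformed), it uses a normal approximation with tie correction rather than an exact p-value, and the function name `wilcoxon_signed_rank` and the example ratios are hypothetical.

```python
import math

def wilcoxon_signed_rank(ratios):
    """Two-sided one-sample Wilcoxon signed-rank test on log-ratios.

    H0: the log-ratios are symmetric about 0 (no systematic change).
    Returns (W_plus, p) using a normal approximation with tie correction
    and no continuity correction. Ratios equal to 1 are dropped.
    """
    diffs = [math.log(r) for r in ratios if r > 0 and r != 1.0]
    n = len(diffs)
    if n == 0:
        raise ValueError("no non-zero log-ratios to test")

    # Rank |log-ratio|, assigning average ranks to tied values.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W_plus is the sum of ranks belonging to increases (ratio > 1).
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_two_sided

# Hypothetical late/early concentration ratios for one analyte.
example_ratios = [4.8, 5.3, 3.9, 6.1, 4.2, 5.0, 5.7, 4.5, 6.4, 3.6]
w, p = wilcoxon_signed_rank(example_ratios)
print(f"W+ = {w}, p = {p:.4f}")
```

With small sample sizes an exact p-value is generally preferable to the normal approximation used here.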
Representative cytokine and chemokine changes over time. Data are shown for old patients 13 through 23, for cytokines IL-8 and IL-10, and chemokine MCP-1. On a log scale, changes detected between 2005-2006 and 2010 assays are shown.
Healthy donor and melanoma patient serum time course in 2010
To determine whether we could detect similar changes over a period of months, we drew blood from 10 healthy donors (HD, Additional File 2, Table S2, Table data) and 5 melanoma patients ("new patients") (Additional File 3, Table S3, Table data). HD samples were tested initially 2 months after processing and freezing, and then twice more, at 5 and 8 months of storage, on the same dates as the old patient samples described above. The melanoma patient samples were tested 2 days after processing and cryopreservation, and again 3 months later.
New Melanoma Patient Sera Analysis
As expected, HD samples had low circulating levels of many of the analytes tested. These HD control samples also showed changes in analyte levels, even after short-term storage. Again, some analytes were stable, whereas others were much less so. IL-8 increased in 3/10 HD at the 8-month timepoint (n.s.), but not by 5 months. IP-10 also began to increase in 5/10 HD at 8 months (p = 0.01). Several analytes decreased over this relatively short storage interval, including IFNγ (p = 0.06 at 5 mo., p = 0.03 at 8 mo., decreasing in 6/10 HD) and MCP-1, which showed the most dramatic decreases, in 10/10 donors by 8 mo. (p = 0.002). These changes, between the first assay and the second and third assays (100 and 190 days after the first), are shown graphically in Figure . The melanoma patient samples did not show significant changes within the short storage time, with the exception of MCP-1, which decreased in 5/5 samples within 3 months (p = 0.06). When the ratios of the concentrations of the different analytes measured at different times were plotted together (Figure ), the trends in concentration changes observed were not significantly different between the serum sample data sets (old patients, HD, new patients) (Table , Table , Table ).
Time course of analyte concentrations for healthy donors. Assays done on 5/13/2010 and 8/11/2010 were normalized to those done on 2/02/2010.
Comparison of assay results obtained with sera from healthy donors, new melanoma patients and old melanoma patients. Points are the ratio of concentrations of the assays done on 8/11/2010 normalized to those done on 5/13/2010.
Cytokine Controls used in assays
We purchased our Luminex kits from a single source; however, that source changed ownership between Oct. '05 and Aug. '10 (from Biosource to Invitrogen to Life Technologies). Each kit includes reagents to generate an 8-point standard curve from which all values are determined. For the custom kits we requested to test a specific array of analytes of interest, the manufacturer pre-tests the specific antibodies together to confirm lack of cross-reactivity. The manufacturer indicates that the kits are not released unless the following criterion is met: "< 10% cross-reactivity to related recombinant protein at the highest point of the standard curve" (Life Technologies). We requested the specific cross-reactivity testing data for the kits we used in this study, but were repeatedly informed that company policy prohibits the release of QC data to customers.
As an additional control, we included "Multiplex QC" controls, which are complex mixtures of recombinant cytokines, chemokines and growth factors prepared by the manufacturer at 3 concentrations (low, medium and high). We have established the reproducibility of this control (Additional File 4, Table S4) when tested via Luminex (% CV = 1%-52%, average % CV = 14% for 8 analytes). The absolute values for each analyte do not exactly match the "expected" values from the QC control manufacturer (R&D Systems), but they are similar. We use a different platform and different antibody clones for detection via Luminex, which may account for those differences (as indicated in the package insert).
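The % CV figures quoted for the QC controls follow the usual definition, 100 × standard deviation / mean, computed per analyte across replicate measurements. The sketch below illustrates that calculation under stated assumptions: it uses the sample standard deviation (the text does not say which estimator was used), and the function name `percent_cv` and all readings are hypothetical.

```python
from statistics import mean, stdev

def percent_cv(replicates):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * stdev(replicates) / m

# Hypothetical replicate readings (pg/mL) of one QC control level.
qc_runs = {
    "IL-6": [98.0, 102.0, 100.0],
    "IL-8": [80.0, 120.0, 100.0],
}

# Per-analyte %CV, then the average across analytes.
cvs = {analyte: percent_cv(values) for analyte, values in qc_runs.items()}
average_cv = mean(list(cvs.values()))

for analyte, cv in cvs.items():
    print(f"{analyte}: %CV = {cv:.1f}")
print(f"average %CV = {average_cv:.1f}")
```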
We also received WHO cytokine standards for IL-4, IL-8, IL-10 and GM-CSF. These lyophilized cytokine controls were resuspended (Materials and Methods) and individually tested at 1:10, 1:50 and 1:100 dilutions in two replicate Luminex assays for the same ten analytes described above. These data are presented in Table . As expected, the standard under study was almost always detected. However, there were some surprising results. MCP-1 was also almost always detected in addition to the standard, and MIG was always detected when the standard IL-10 was used. The apparent concentrations of these two analytes in some instances exceeded 10% of that of the standard. IL-6, IFN-γ and GM-CSF also showed evidence of minor cross-reactivity.
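The 10% comparison used above can be expressed as a simple check: for a well containing a single cytokine standard, express each off-target analyte's apparent concentration as a percentage of the measured standard and flag those above the threshold. This is a sketch only. The manufacturer's criterion is defined at the highest point of the standard curve, whereas this example compares against the measured concentration of the standard in the same well; the function name `apparent_cross_reactivity` and the readings are hypothetical.

```python
def apparent_cross_reactivity(measured, target, threshold_pct=10.0):
    """Flag off-target analytes whose apparent concentration exceeds
    threshold_pct percent of the target standard's measured concentration.

    measured: dict mapping analyte name -> apparent concentration (pg/mL)
    target:   name of the cytokine standard actually added to the well
    Returns a dict mapping flagged analyte -> percent of target signal.
    """
    target_conc = measured[target]
    if target_conc <= 0:
        raise ValueError("target standard was not detected")
    flagged = {}
    for analyte, conc in measured.items():
        if analyte == target:
            continue
        pct = 100.0 * conc / target_conc
        if pct > threshold_pct:
            flagged[analyte] = pct
    return flagged

# Hypothetical readings for a well containing only an IL-10 standard.
il10_well = {"IL-10": 1000.0, "MIG": 150.0, "MCP-1": 40.0, "IL-6": 5.0}
flagged = apparent_cross_reactivity(il10_well, target="IL-10")
print(flagged)
```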
The apparent cross-reactivity seen for MCP-1 and MIG might be caused by an additive present in the AIM V medium (a serum-free lymphocyte culture medium) used in a dilution step for these proteins. We tested several commonly used culture media (AIM V, RPMI 1640, Iscove's and CellGenix DC media) in a 30-plex Luminex assay, which also included a repeat test of the WHO standards. The results identified low levels (3-62 pg/mL) of several analytes in the culture media (HGF, FGF basic, RANTES, IL-17 and IL-2R) but not MCP-1 or MIG (data not shown). MCP-1 was again detected in the IL-8 and GM-CSF WHO standards, and MIG in the IL-10 standard (as well as HGF, FGF basic and RANTES). We are investigating other possible sources of low levels of other cytokines and growth factors in the WHO standards.
As a test of the day-to-day reproducibility of two cytokines of particular interest, IL-6 and IL-8, a set of samples and controls was run on two different custom kits, both of which included IL-6 and IL-8, one day apart (with samples kept thawed at 4°C overnight). Notably, these two kits also had different standard curves and upper limits of detection. For IL-6, the 10-plex kit upper limit was 7,400 pg/mL, while in the 8-plex, it was 13,800 pg/mL (1.8 fold higher). For IL-8, the 10-plex upper limit was 24,800 pg/mL and in the 8-plex, 10,160 pg/mL (2.4 fold lower). When the values for the 38 samples were compared between the two kits, the ratio of the IL-6 values was 1.0 (median and mean), showing excellent concordance. For IL-8, where the upper limits were more disparate, the ratio of the values was 0.80, which was a small but significant difference (Figures and ). These data indicate that the assay with the higher upper limit yields larger measured values.
Figure 4 Two plates run together compared for A) IL-6 and B) IL-8 values. A set of 38 cell culture samples was run on both an 8-plex and a 10-plex plate. The values for IL-6 and IL-8 are compared on a log scale. Each plate had a unique upper limit.
Upper limit problem
The Luminex kits that we used at the different time points were not identical. In particular, we noticed that the upper limits of quantitation for individual analytes changed over time for the different kits. In principle, this should not affect the measured concentrations, because the kits include kit-specific standards to generate 8-point standard curves matched to the expected detection range. However, if the concentration determinations were affected, that would confound our interpretation of the observed changes in analyte concentration over time, and therefore we investigated that possibility. Data from assays done on 5/13/2010 ("late" assay) were compared to data from assays on 10/31/2005, 11/01/2005 or 2/17/2006 ("early" assays). Kits used in 2005 and 2006 had the same upper limits, and because no samples had assays done on more than one of these dates, results were combined. Figure is a scatter plot of the late-to-early ratio of analyte concentrations versus the late-to-early ratio of assay upper limits, with a smooth curve superimposed. The late-to-early ratio of upper limits was different for each of the 10 analytes. Typically, 12 samples were assessed for each analyte. The correlation of the two ratios is highly significant (p < 10^-15, Spearman's test). Therefore, we are concerned that assays performed at different times with different kits may not be comparable.
Figure 5 Scatter plot of the late-to-early ratio of analyte concentrations versus the late-to-early ratio of assay upper limits of quantitation; a smooth curve is superimposed. Early assays were done on 10/31/2005, 11/01/2005 or 2/17/2006; late assays were done on 5/13/2010.
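The rank correlation between the two late-to-early ratios can be illustrated with a small, dependency-free sketch. It computes Spearman's rho as the Pearson correlation of average ranks, which handles the many ties that arise when every sample of an analyte shares the same upper-limit ratio. It does not compute a p-value, it is not the authors' analysis code, and the function names and data points are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rank variance is zero; rho is undefined")
    return cov / (sx * sy)

# Hypothetical points: one upper-limit ratio per analyte (repeated for
# each sample) paired with per-sample concentration ratios.
upper_limit_ratio = [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 2.4, 2.4, 2.4]
concentration_ratio = [0.3, 0.4, 0.6, 0.9, 1.1, 0.8, 3.5, 5.0, 4.1]
print(f"rho = {spearman_rho(upper_limit_ratio, concentration_ratio):.3f}")
```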
In this report, we detail reproducibility problems we encountered when testing circulating cytokines, chemokines and growth factors by Luminex in serum samples that were stored for months to years under highly controlled conditions. Some of these changes were very dramatic: IL-8 increased 4-6 fold in old patient samples; MCP-1 decreased 4-6 fold in new patient samples, and up to 10-fold in healthy donor samples; IL-10 changed from negative to positive or positive to negative within the same old patient serum dataset (Figure ). Our initial hypothesis was that the changes were entirely biological, and that despite standardized blood handling procedures and temperature-controlled freezer storage, some analytes became unstable over time or upon thaw. Two recent reports testing cytokine stability found most tested cytokines to be stable over 1-2 years at -80°C, while a subset (including IL-8 and IL-10) became unstable after 2-4 years [24]. Many of the proteins became unstable after repeated freeze-thaw cycles. If these were the only mechanisms, then the analytes we tested should have behaved consistently across our three datasets, because the change would be analyte-specific. They cannot be the only explanation, however, because, for example, MCP-1 increased over time in the majority of old patient samples but decreased over time in both the HD and new patient sets.
Our study has a number of limitations. The more recently acquired HD and new patient data sets were tested within months of blood draw. A better analysis of the impact of storage time on analyte stability would require a large number of patient and HD samples stored for longer periods, with costly repeated multiplex testing. We also limited the diversity of analytes we examined. Another variable was the time from blood draw to serum separation and freezing. Some of our samples were drawn within the laboratory or at our nearby clinic and processed within a few hours, while other old patient samples were shipped overnight and processed the following morning. However, these blood handling procedures reflect the unavoidable limitations inherent in transferring patient blood from the clinic to a central laboratory capable of standardized processing, and in multi-institutional trials, where large numbers of patients can be treated and tested but overnight shipping is required. Lastly, some of our healthy donor and control samples were run in duplicate, but to reduce costs, large numbers of patient sera were run as singlets. Given the small average % CVs determined for many duplicates (Additional File 1, Table S1), this may have had minimal impact on the trends we observed.
The Luminex assay has been shown (by ourselves [26] and others [27]) to correspond well to ELISA platform assays. In addition, the Luminex assay has good reproducibility from well to well and from day to day (Figure ). Also, our use of the R&D QC controls (Additional File 4, Table S4) indicates good reproducibility of recombinant analytes when mixed together. This suggests that the serum matrix may affect reproducibility, and/or that the biological impact of a tumor may lead to systemic changes (including altered glycosylation) that affect the assay.
This study also suggests that the changes in the upper limits of detection, which can vary substantially from kit to kit, month to month, and analyte to analyte from a single manufacturer, may impact the ability to determine analyte concentration. This impacts kit-to-kit reproducibility, and greatly increases the importance of comparing samples with the identical lot of kits with identical standard curve ranges. We attempted to dissect this further by requesting access to manufacturer QC data, but we were repeatedly denied access to any additional information specific to the testing performed on the kits we used.
We do not understand why the assay kit upper limits seem to affect assay performance in the systematic way that is evident in Figure . However, we have to conclude that the results of assays done with different kits cannot be directly compared. Therefore, the apparent changes in analyte levels over time that we observe may arise from the kit-to-kit variability: we cannot claim to observe changes in analyte levels over storage time at -80°C.