While viral load testing has gained widespread acceptance, a primary limitation remains the variability of results, particularly between different laboratories. While some work has demonstrated the importance of standardized quantitative control material in reducing this variability, little has been done to explore other important factors in the molecular amplification process. Results of 185 laboratories enrolled in the College of American Pathologists (CAP) 2009 viral load proficiency testing (PT) survey (VLS) were examined. This included 165 labs (89.2%) testing for cytomegalovirus (CMV), 99 (53.5%) for Epstein-Barr virus (EBV), and 64 (34.6%) for BK virus (BKV). At the time of PT, laboratories were asked a series of questions to characterize their testing methods. The responses to these questions were correlated to mean viral load (MVL) and result variability (RV). Contribution of individual factors to RV was estimated through analysis of variance (ANOVA) modeling and the use of backward selection of factors to fit those models. Selection of the quantitative calibrator, commercially prepared primers and probes, and amplification target gene were found most prominently associated with changes in MVL or RV for one or more of the viruses studied. Commercially prepared primers and probes and amplification target gene made the largest contribution to overall variability. Factors contributing to MVL and RV differed among viruses, as did relative contribution of each factor to overall variability. The marked variability seen in clinical quantitative viral load results is associated with multiple aspects of molecular testing design and performance. The reduction of such variability will require a multifaceted approach to improve the accuracy, reliability, and clinical utility of these important tests.
Molecular amplification methods have played an increasing role in clinical diagnostics since PCR was first described in 1985. Over time, qualitative assays have evolved into quantitative methods; the latter have been adapted to fit numerous applications across medicine. Viral load testing in particular has become an integral part of patient care (6, 16, 17), recognized in numerous guidance documents for its role in guiding therapy and predicting disease course (3, 19). Early assays that used endpoint and competitive quantitative methods have given way to real-time determinations, potentially able to provide rapid and highly accurate results. Numerous assays have been developed for diagnostic testing, targeting virtually every known clinically important virus. Testing that was once limited to the research lab has now become a routine part of laboratory medicine and patient care.
Despite these advances, significant challenges remain in the optimal use of this technology. The perceived exquisite accuracy of such methods is belied by the enormous variability seen in quantitative results. While quantitative clinical chemistry assays typically have coefficients of variation (CVs) in the range of 1 to 5%, it is not unusual to see numbers of 30 to 40% or more when viewing real-time, quantitative PCR results for viral targets. This wide fluctuation is seen primarily between different laboratories quantifying the same analyte. Compounding this fluctuation are differences in accuracy, such that log-unit differences in viral load measurements may be seen when split samples are tested by different laboratories (6, 9, 11, 18).
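The scale of that contrast is easy to see with a short calculation. The replicate values below are hypothetical, chosen only to illustrate the difference between a typical chemistry analyte and an interlaboratory viral load comparison; they are not data from this study.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical replicate results for the same specimen.
chemistry = [98, 100, 101, 99, 102]            # e.g., a serum analyte, same units
viral_load = [28_000, 40_000, 58_000, 34_000]  # copies/ml reported by different labs

print(f"chemistry CV:  {cv_percent(chemistry):.1f}%")   # low single digits
print(f"viral load CV: {cv_percent(viral_load):.1f}%")  # well above 30%
```

The same spread looks even starker on the linear scale than in log10 units, which is why interlaboratory viral load comparisons are usually summarized as log10 standard deviations.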
This marked viral load variability has hindered the widespread adoption of quantitative molecular testing for patient management. In this setting, published case series from individual centers can be valuable in demonstrating the utility of quantitative PCR test strategies, but clinical correlations may be limited to the specific method used. This may account for the lack of viral load interpretive standards for many viral pathogens. Specifically, with the exceptions of those established for HIV and hepatitis C virus (HCV) viral load testing, viral load breakpoints for risk stratification and therapeutic decisions are largely absent.
For HIV and HCV viral load testing, the development of international quantitative standards, the availability of FDA approved tests, the development of quality calibration standards, and other factors have resulted in improved reproducibility, both within and between laboratories. In fact, it has been shown that the use of common calibration materials can lead to a reduction of interlaboratory variability of viral load results (2, 5, 6). This may have marked benefits for patient care. One can surmise that the development of international quantitative standards for other viruses, and the application of those standards throughout the diagnostic and clinical communities, will have a beneficial effect on the outcome of patients infected with these pathogens.
It is clear, however, that the calibration materials used do not account for the entire range of variability seen in viral load testing. The process of virus quantification involves the sequential performance of several individual processes, each of which may contribute to diminished accuracy or precision of results; standardization of each critical process is required to achieve optimal test accuracy and reproducibility.
While individual aspects of assay design and performance have been examined to determine their impact on quantitative results, absent from the literature is a multivariate approach which attempts to assess the relative contribution of different aspects of the assay process to inter- and intralaboratory variability. Proficiency testing (PT) data offer a unique opportunity to view results across a large number of individual diagnostic laboratories, each using its own methodology to measure viral load in shared, common specimens. The present study uses PT data in combination with supplemental data regarding several aspects of laboratory methods to more fully explore the causes of variability in viral load data generated by clinical diagnostic laboratories.
The study population consisted of 193 laboratories out of the 218 laboratories enrolled in the College of American Pathologists (CAP) 2009 viral load survey (VLS-A). The VLS survey included two challenges (specimens) for each of three analytes (cytomegalovirus [CMV], Epstein-Barr virus [EBV], and BK virus [BKV]). The two challenges sent for each viral analyte in the 2009 VLS-A mailing were identical in composition, including viral load. This had the effect of producing duplicate testing results for each of the three viruses. Participants were asked to test all samples using the methodology that they would use for patient specimens. Laboratories that did not have clinical methods in place for testing all three analytes submitted results only for those viruses for which they routinely tested. In addition, participants were asked to respond to a series of questions regarding the characteristics of the methods they used for detection and quantification of those viruses for which they submitted quantitative results. PT materials were mailed to participants on 22 June 2009, and results received by 16 July of the same year were included in this analysis.
Survey challenges were manufactured as 1-ml liquid aliquots (ZeptoMetrix Corporation, Buffalo, NY) and provided in screw-cap vials. The viral particles were treated chemically, using the NATtrol process (proprietary method; ZeptoMetrix Corporation, Buffalo, NY), prior to being spiked into EDTA-preserved plasma for distribution. This process alters viral surface proteins, rendering agents noninfective, while leaving nucleic acids intact. Material was shipped to participants at ambient temperature via 2-day delivery. Participants were asked to store materials at 2 to 8°C upon receipt, prior to testing.
At the time of PT result entry, each participating laboratory was asked to respond to 11 multiple-choice questions regarding lab practices for viral load testing. The questions were asked in triplicate, once for each viral analyte. Questions covered all major aspects of testing, including specimen preparation, use of commercially prepared primers and probes (commercial detection reagents), nucleic acid amplification target, calibration curve generation, and result reporting. Individual responses were anonymously correlated to viral load data and compared across participating laboratories to determine associations with results and with result variability (RV) (Fig. 1).
For each factor variable, the difference in mean and homogeneity of variance of viral load readings for subgroups of that variable were tested by one-way analysis of variance (ANOVA) and Levene's test (1), respectively. Evaluating differences in the homogeneity of variance was undertaken in order to determine if the observed differences in mean were due in part to differing variability between subgroups rather than true differences in the mean. Repeated measurements per lab were taken from the two observations each lab reported, and standard deviations for between-lab and within-lab variation were estimated by fitting a generalized linear model. As shown by the analysis, within-laboratory variation was very small and negligible. Therefore, the observational unit for further analysis comprised the average of the two observations submitted per target by each laboratory. To determine sources of variation stemming from interaction between variables, means and variances of subgroups generated by combining two variables were evaluated for pairs of variables with significant interaction. To determine major sources of variation that contributed to the overall variation of the viral load measurements for each of the three targets, candidate variables were considered, and identification of those variables and interactions that made significant contributions to the total variation was done by fitting the ANOVA model with backward selection. Only data from labs that answered all questions for any given virus were included in the analysis. Results from laboratories reporting in units other than copies/ml were excluded from the analysis, as were variables for which fewer than 10 laboratories reported results. Findings that did not reach a significance level (P value) of 0.1 were not reported.
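As a rough sketch of the per-factor screening described above, the two tests can be run with SciPy; the grouping of log10 results by calibrator and all values below are hypothetical, chosen only to show the mechanics, and the study's actual backward-selection ANOVA model is not reproduced here.

```python
from scipy import stats

# Hypothetical mean log10 viral loads, grouped by calibrator (one value per lab).
groups = {
    "calibrator_A": [4.9, 5.1, 4.8, 5.0, 5.2],
    "calibrator_B": [4.3, 4.4, 4.2, 4.5, 4.3],
    "calibrator_C": [4.8, 5.6, 4.1, 5.3, 4.6],
}
samples = list(groups.values())

# One-way ANOVA: do the subgroup means differ?
f_stat, p_mean = stats.f_oneway(*samples)

# Levene's test: do the subgroup variances differ (homogeneity of variance)?
w_stat, p_var = stats.levene(*samples)

print(f"ANOVA P = {p_mean:.4f}, Levene P = {p_var:.4f}")
```

Running both tests per factor mirrors the paper's distinction between differences in mean viral load (MVL) and differences in result variability (RV): a factor can shift the mean, inflate the spread, or both.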
The VLS-A response rate was 95.9% (185). Of the 185 laboratories, 165 (89.2%) tested for CMV, 99 (53.5%) tested for EBV, and 64 (34.6%) tested for BKV. Laboratories testing for only one virus comprised 48.6% (90) of the respondents (CMV, 81.1%; EBV, 11.1%; BKV, 7.8%); 25.4% (47) tested for a combination of two viruses (CMV and EBV, 80.8%; CMV and BKV, 12.8%; EBV and BKV, 6.4%), while 26% (48) of the labs tested for all three viruses. Overall descriptive statistics for the 2009 VLS PT survey are shown in Table 1, together with 2009 PT survey results for HIV and hepatitis C virus PCR testing. The data clearly indicate a high degree of variability for all three VLS analytes, with the range of results for most samples extending across greater than three log units. This variability is strikingly greater than that seen for HIV and HCV PT results from the same period.
Intralaboratory absolute variation was smallest for EBV, with a standard deviation of 0.13 log10 (mean of 0.1 log10); this was followed by CMV, with a standard deviation (SD) of 0.14 log10 (mean of 0.09 log10) and BK, with a SD of 0.2 log10 (mean of 0.14 log10).
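Because the two challenges per virus were identical, within-lab and between-lab spread can be separated directly from the duplicate results. The sketch below uses a simplified moment-based estimate rather than the generalized linear model fit in the study, and the duplicate pairs are hypothetical.

```python
import statistics

# Hypothetical duplicate log10 results (two identical challenges per lab).
duplicates = [(4.9, 5.0), (4.3, 4.4), (5.5, 5.3), (4.8, 4.8)]

# Within-lab SD: for duplicates, Var(a - b) = 2 * sigma_within^2,
# so each pair contributes (a - b)^2 / 2 to the variance estimate.
within_var = statistics.mean((a - b) ** 2 / 2 for a, b in duplicates)
within_sd = within_var ** 0.5

# Between-lab SD: SD of per-lab means, the observational unit used in the study.
lab_means = [(a + b) / 2 for a, b in duplicates]
between_sd = statistics.stdev(lab_means)

print(f"within-lab SD:  {within_sd:.2f} log10")
print(f"between-lab SD: {between_sd:.2f} log10")
```

When the within-lab component is small relative to the between-lab component, as reported above, averaging the duplicates loses little information, which is why the per-lab mean was used for all further analysis.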
Quantitative calibrators used in testing for CMV included those manufactured by Acrometrix (29 laboratories), Advanced Biotechnologies or ABI (20), Qiagen (38), and Roche (38) (Fig. 2; Table 2) (see Table S1 in the supplemental material). The CMV mean viral load (MVL) for laboratories using Roche (4.32 log10) was lower than that for the laboratories using Acrometrix (4.87 log10, P = 0.0003), ABI (5.06 log10, P < 0.0001), and Qiagen (5.03 log10, P < 0.0001) calibrators. The RV observed for those using Roche calibrators (SD, 0.28 log10) was also significantly lower than that for those using Acrometrix calibrators (SD, 0.71 log10; P < 0.0001) or ABI calibrators (SD, 0.59 log10; P = 0.0107); there was no difference in variability between results reported by Roche and Qiagen calibrator users (SD, 0.46 log10).
In testing for EBV, a lower variability was observed when Qiagen (0.30 log10 SD) was used for calibration instead of ABI (0.56 log10; P = 0.008) or Roche (0.47 log10; P = 0.0437); use of the Qiagen calibrator yielded a higher MVL (4.30 log10) than when the Roche calibrator (4.04 log10) was used (P = 0.0302) (Fig. 3; Table 2; see also Table S1 in the supplemental material). Other calibrators used in testing for EBV included those manufactured by Abbott (1 laboratory) and Acrometrix (7 laboratories); however, the numbers were relatively small compared to the number of laboratories using ABI (14), Qiagen (13), and Roche (25) calibrators.
Among laboratories testing for BKV, only 1 laboratory used Acrometrix calibrators, 1 used Roche, and 20 used ABI. The low number of laboratories using Acrometrix or Roche reagents for calibration did not allow for pairwise comparisons.
A total of 119 laboratories reported the use of commercial detection reagents when testing for CMV, while 40 reported the use of noncommercial detection reagents; there were no differences observed in variability or MVL between laboratories using commercially versus noncommercially prepared detection reagents, nor were there significant differences in variability or MVL between individual commercial detection reagent results (Table 2) (Fig. 2; see also Table S1 in the supplemental material).
Of those laboratories testing for EBV, 53 (53.5%) reported the use of commercial detection reagents and 40 (32.3%) reported the use of a noncommercial detection reagent, the latter group showing greater variability (SD, 0.75 log10) than that among those using commercial detection reagents (SD, 0.47 log10; P = 0.0043). Subgroup analysis revealed a higher MVL with Qiagen than that with Roche reagents (P = 0.002) but no other significant differences in either MVL or variability between results for individual commercial detection products (Fig. 3; Table 2; see also Table S1 in the supplemental material).
No significant differences in variability or mean viral load were observed for laboratories testing for BKV when analyzed based on the detection reagent used.
Three methods of nucleic acid extraction (NAE) were used by laboratories testing for CMV or EBV: liquid phase, magnetic bead, or silica membrane/column. Among those laboratories testing for BKV, either the magnetic bead or the silica membrane/column method was used.
Liquid-phase extraction was used by 8.7% of laboratories testing for CMV, whereas 65.1% of the labs used magnetic bead or silica membrane/columns as their NAE method (Table 2; see also Table S1 in the supplemental material). The lowest variability and MVL were observed for liquid-phase extraction (MVL, 4.20 log10; SD, 0.21 log10) compared to results for the silica membrane/column (MVL, 5.16 log10; SD, 0.53 log10) and magnetic bead (MVL, 4.78 log10; SD, 0.56 log10) methods. This may have been due primarily to the relatively small number of laboratories using liquid-phase extraction. The MVL was significantly higher for the silica membrane/column method than for magnetic bead (P = 0.0002) and liquid-phase (P < 0.0001) extraction; variability was significantly lower for liquid phase than for magnetic bead and silica methods, with P values of 0.001 and 0.0021, respectively.
The distribution of NAE was similar for EBV and BKV, compared to that noted for CMV above. However, no significant differences were noted in either variability of results or MVL when results for these two viruses were stratified based on NAE.
Differences in MVL were associated with the amplification target gene used for all three viruses; however, little effect on variability was seen (Fig. 2 to 4; Table 2; see also Table S1 in the supplemental material). Amplification of the DNA polymerase gene when testing for CMV yielded a lower MVL (4.49 log10) than when any of the following genes were targeted: glycoprotein B (4.82 log10; P = 0.0359), immediate-early (5.16 log10; P < 0.0001), major immediate-early (4.79 log10; P = 0.0269), and other (5.02 log10; P < 0.0001) genes. Targeting the immediate-early gene resulted in a higher MVL than targeting either glycoprotein B or the major immediate-early genes. Targeting of the EBV nuclear antigen (EBNA) gene in EBV assays resulted in a lower MVL than that for DNA polymerase (P = 0.0296) or LMP-2 (P = 0.0022) genes, and targeting the large T gene in BKV assays resulted in a lower MVL than when the VP-1 gene was targeted (P = 0.0132).
Neither the frequency of calibration (every run versus periodically) nor the method of pipetting (manual versus automated or robotic) was correlated with a difference in MVL or variability for any of the three viruses (Table 2; see also Table S1 in the supplemental material). No significant difference in variability was seen whether hydrolysis (TaqMan) or dual hybridization (fluorescence resonance energy transfer [FRET]) probes were used, although TaqMan probes were associated with a somewhat lower MVL (4.88 log10) than FRET probes (4.93 log10) for BKV. A higher MVL was significantly associated with the use of TaqMan over FRET probes when testing for CMV (5.05 log10 versus 4.62 log10) and EBV (4.34 log10 versus 4.09 log10), with P values of <0.0001 and 0.0487, respectively. While extraction of calibrators prior to use when testing for CMV was not associated with a different MVL, it was found to have a significantly higher variability than when they were not extracted (P = 0.0187); differences in MVL or variability were not observed for the remaining targets. Results from those using real-time PCR as the detection method for CMV showed both a higher MVL and a higher variability than results from those using enzyme immunoassay (EIA), with P values of <0.0001 and 0.0007, respectively; no significant differences were observed for the other two targets. These results, however, may be due in part to the small number of labs using methods other than PCR; 85.4% of those testing for CMV, 99% for EBV, and 98% for BKV reported the use of real-time PCR.
To further stratify viral load measurements by two factors at once, pairs of variables were evaluated for differences in variability and viral load caused by the interaction of the individual effects (see Table S2 in the supplemental material). Interactions were observed most often in CMV data, though there were also significant interactions in EBV and BKV testing. Interaction of two variables is a positive or negative synergistic effect in addition to the sum of the main effects of the two variables. Notably, some variables not showing any significance in pairwise analyses did demonstrate interaction with other variables, by which those combined factors may affect MVL and/or variability of viral load results.
Changes in CMV MVL could be explained by interactions observed when Acrometrix or ABI calibrators were paired with commercial or noncommercial detection reagents (P = 0.0191). MVLs observed when ABI calibrators were used in conjunction with commercial detection reagents (5.53 log10) were higher than those obtained when ABI calibrators were used in conjunction with noncommercial detection reagents (4.89 log10). MVLs observed when Acrometrix calibrators were used with commercial detection reagents (4.88 log10) were lower than those obtained when Acrometrix calibrators were used with noncommercial detection reagents (5.11 log10). For the latter combination, the main effects of the two variables (calibrator and detection reagent) were not significant, but their interaction was. The calibrator used also showed interactive effects in combination with whether or not calibrators were extracted (P = 0.0001). Higher MVLs were observed without calibrator extraction for ABI (5.06 log10), Roche (4.41 log10), and Acrometrix (5.58 log10) calibrators, and with calibrator extraction for Abbott (4.93 log10) and Qiagen (5.41 log10) calibrators. Examining the combination of detection method and calibrator, interactions were found to be significant when the detection method was real-time PCR and either Roche (4.42 log10) or Acrometrix (5.32 log10) calibrators were used (P < 0.0001), both yielding higher MVLs than those obtained with “other” detection methods. Other interactions involving the calibrators used can be found in Table S2 in the supplemental material.
Besides interacting with calibrators used, commercial detection reagents also interacted with other variables, causing significant variation that may explain differences in CMV MVL. These interacting variables included: amplification target gene (P = 0.0013), calibrator extraction (P = 0.0307), calibration frequency (P < 0.0001), method of nucleic acid extraction (P = 0.0022), probe chemistry (P < 0.0001), and the specimen routinely tested (P = 0.0065). For a list of all combinations of variables, along with their MVL, SD, median, minimum, and maximum values, see Table S2 in the supplemental material.
The same combinations of variables that reached significance in CMV were not observed for EBV or BKV testing. Observed significant interactions for both EBV and BKV included specimen type (for EBV, plasma, whole blood, and peripheral blood mononuclear cells; for BKV, plasma and urine) routinely tested together with whether or not the quantitative calibrators were extracted (P = 0.0031 for EBV; P = 0.0178 for BKV).
Three variables and one interaction between variables were found to explain 67.4% of overall variation in CMV viral load results (Table 3). Six variables and three interactions between those variables were found to explain 63.8% of overall variation in EBV viral load results. Five variables accounted for 93.7% of overall variation in BKV viral load results. Excluding the specimen type usually tested, the greatest source of overall variability when testing for each of the three viruses was the commercial detection reagents used. This variable accounted for 47.1% of CMV result variability (P < 0.0001), 10.8% for that of EBV (P = 0.0003), and 18.2% for that of BKV (P < 0.0001).
Following commercial detection reagents, amplification target gene used made the second highest contribution to overall variability for CMV and BKV, contributing to 9.6% for CMV testing (P < 0.0001) and 17.3% for BKV testing (P < 0.0001). Amplification target gene was also a significant factor in variation of EBV results (7.09%), but the calibrator used appeared to account for a higher proportion of variability (10.6%).
Another source of variation for CMV viral load results was the interaction of commercial detection reagents with the amplification target gene used (7.2%; P < 0.0001). The remaining variables and their interactions could be other sources of variation for EBV; however, their contribution to variation was less than 9%. BKV viral load variation was found to come mostly from the specimen routinely tested (44.1%), with the highest MVL seen for those laboratories that routinely tested urine.
The data here confirm a high degree of variability in viral load results from assays targeting CMV, EBV, and BKV. Compared with results from viral load testing for HIV and HCV, this variability becomes even more apparent. While a limited number of studies have addressed this issue, most have focused solely on lack of standardized quantitative calibrators as a primary reason for differences in precision and accuracy among laboratories (2, 6, 18). This study is perhaps the first to look comprehensively at multiple factors in the complex process of quantitative PCR. The commercial detection reagents, the calibrator, extraction and detection methods, and the target gene selected can all play a role in magnitude and variability of quantitative results.
That the calibrator selected was shown to affect MVL and variability of CMV results is not surprising. Several studies have now shown that different quantitative standards may have dramatic effects on viral load results (6, 18). The consistent results seen for HIV and hepatitis viral load proficiency testing can be, at least in part, attributed to the fact that international quantitative standards have been broadly adopted for these targets (8, 14, 15); such international quantitative standards were not available for CMV, EBV, and BKV at the time of data collection for this study. The recent introduction of a WHO quantitative CMV standard can be expected to help address this issue, and standardization for EBV and BKV is planned (4, 5).
The impact of the detection reagent selected is also consistent with that in various reports in the literature. In fact, the reagent used was consistently ranked as one of the highest contributors to overall assay variability, across all three target viruses in this study. While not the subjects of a systematic review, many detection reagents have shown variability in performance characteristics when different reagents were compared (6, 7, 10, 12, 13). Such differences can be expected to affect both accuracy and precision, as seen in this study. The relatively low number of FDA-approved assays for HIV and hepatitis virus quantification may contribute to the uniformity of results for those targets. This study was not intended as a comparative analysis of commercial detection reagents, and the data shown cannot be used to assess the superiority of one product over another. However, it is clear that not all detection reagents produced equivalent results and that variability of results should be carefully assessed when validating viral load assays for routine clinical practice.
Target gene selection has been the subject of much study. It has previously been shown that target gene selection represents a critical part of the test design process, as polymorphisms vary in frequency depending on genomic region (7, 20). Assays targeting regions of increased sequence heterogeneity may be expected to show effects on sensitivity and on quantitative results. Consistent with this premise, findings here showed that the choice of genetic amplification target affected the MVL for all three targets. A substantial proportion of the overall quantitative variability seen for all three viruses could be accounted for by this factor. While these analyses were sufficient to indicate the importance of genetic target selection as a contributor to changes in MVL and RV, further inferences cannot be made to explain the behavior of individual targets.
Extraction is often overlooked as a key to assay performance. While findings were limited by the fact that most users now rely on silica-based (column/membrane or magnetic bead) extraction, a substantial minority continue to use liquid-phase extraction. These users reported diminished MVL and variability of results. The former may reflect improved yield, purity (removal of inhibitors), or both in the newer, solid-phase systems. The specific findings here may be less important than the underlying point: sample preparation methods can contribute meaningfully to assay accuracy and precision.
Factors lacking an apparent association with changes in MVL or variability may be as revealing as those that did show such an association. Whether or not calibrators were extracted and how frequently they were assayed (periodically or with every PCR run) have been the subjects of much debate. The factors go beyond convenience and effects on variability, as recalibration of every PCR run can substantially impact cost and turnaround time for results. Neither calibrator extraction nor frequency of full quantitative calibration was associated with altered MVL or variability. Similarly, and perhaps surprisingly, the use of automated pipetting devices did not afford any clear benefits in terms of uniformity of results, compared to manual pipetting devices. Although such tools may save time and their value in reducing repetitive motion injuries is unquestioned, they did not contribute to reduced result variability in this study.
This study was limited by the number of survey participants for each analyte. CMV, with the largest number of responding laboratories, demonstrated the most factors reaching statistical significance in their effects on MVL and variability of results. Clearly, had larger numbers of participants tested for EBV and BKV (as well as for CMV), the number and strength of our findings may have been improved. Findings with respect to relative variability with the use of different calibrators may have been biased to some extent by the fact that some calibrators were used only with detection reagents produced by the same manufacturer (for example, Roche and Qiagen). Variability and MVL were examined separately in this study, but they are actually integral to one another. Although differences in MVL may be distinguished when variability is equal, major differences in variability between two groups may artifactually give the appearance of discrepant means. While the proficiency testing material was handled in a uniform manner prior to distribution, and for each virus was the product of a single, homogeneous production lot, some variation among aliquots cannot be ruled out. Furthermore, this study was not designed to measure the effects of preextraction manipulation and storage conditions. Clearly, there are factors involved in the process of viral load testing that either were not queried of participants or could not be adequately measured here, as evidenced by a substantial proportion of variability that remained unaccounted for in the analysis. Finally, although survey questions were designed to be as simple and unambiguous as possible, some imprecision or inaccuracies may have been introduced in participant interpretations and responses.
Future efforts in this area may include larger studies, perhaps isolating each of the variables seen to be important here. The contribution of calibration variability to imprecision and inaccuracy of viral load measurements should not be underestimated, and the introduction of international standards can be expected to have a significant and beneficial impact. Repeating some of this work in the presence of such standards may further enable the identification of other important factors. These may in turn be optimized, leading to an iterative process for improving the accuracy and reproducibility of these important tests.
The findings here build on the work of others in clarifying the sources of nonuniformity in viral load testing by molecular diagnostic methods. The value gained in identifying such factors may suggest ways that individual users, as well as commercial manufacturers, may work toward systems that give the same results no matter when, where, or how many times a sample is tested. Improvement of test precision and accuracy can ensure portability of results, enable patients to be more accurately monitored as they move to different institutions, allow for consistency in the literature, and allow the establishment of more informative thresholds for therapeutic and prognostic purposes.
The assistance of Mark Merritt of the College of American Pathologists is gratefully acknowledged.
This work was supported in part by the American Lebanese Syrian Associated Charities (ALSAC).
Published ahead of print 23 November 2011
Supplemental material for this article may be found at http://jcm.asm.org/.