We report on a reproducible, internationally comparable quality assurance methodology traceable to ISs. The methodology for evaluation of laboratory performance needs to be standardized, in order to enable accurate comparison of the methodologies used in laboratories worldwide.
The current study has established that repeatedly issuing international proficiency panels containing known amounts of virus plasmids, with unitage traceable to ISs, makes it possible to follow both the development of the HPV typing methodologies used globally for vaccinology and their robustness when performed in different laboratories.
Overall, a majority of HPV DNA typing methodologies used by laboratories participating in this study had a proficient performance according to the established criteria. However, some limitations were revealed.
The 2008 study findings that there were systematic differences in the sensitivity to detect different HPV types remained in 2010. For example, HPV-16, HPV-11, and HPV-18 were still the types detected at the smallest amount in most data sets (only 3, 9, and 11 data sets, respectively, could not detect 500 IU/5 μl), whereas HPV-39, HPV-59, and HPV-56 could not be detected in the 500-GE/5-μl amount by 41, 37, and 32 data sets, respectively. This suggests that many surveys of circulating HPV types systematically underestimate the prevalence of HPV-39, -56, and -59 compared to that of HPV-16 and -18. As also found in 2008, HPV-52, -56, and -59 were the types most difficult to detect.
Correct typing of samples containing multiple HPV types was reported in 44% to 78% of the data sets, in comparison to an average of 86% when only 1 HPV type was present in the sample. A lower sensitivity in samples with multiple types was also seen in the 2008 study. The underestimation of the prevalence of multiple infections will introduce a systematic detection bias in epidemiological studies, with detectability being dependent on determinants of HPV acquisition. Some high-risk HPV types will thus be more difficult to detect in patients in high-risk groups, because of a higher likelihood of multiple HPV infections.
A rather large number of false-positive results were reported, with only 71/132 (54%) of the data sets being 100% specific. This is a small but noteworthy improvement compared to the results in 2008, when only 42% (34 of 80) of the data sets were 100% specific.
The proficiency panel contained only 2 entirely HPV-negative samples. The study was designed to evaluate HPV typing, and we considered that in this context specificity should be measured primarily as absence of detection of a specific HPV type when other HPV types are also present. Thus, for each HPV type evaluated, at least 38 negative samples were included in the panel, and 1 false-positive result thus equals >97% specificity.
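The arithmetic behind the >97% figure can be sketched as follows. This is a minimal illustration only, assuming that specificity is computed per HPV type as the fraction of type-negative samples correctly reported as negative; the function name `type_specificity` is hypothetical and not part of the study's analysis.

```python
def type_specificity(negative_samples: int, false_positives: int) -> float:
    """Per-type specificity: share of type-negative samples reported as negative.

    Assumption for illustration: specificity = true negatives / all negatives,
    counted separately for each HPV type.
    """
    if negative_samples <= 0:
        raise ValueError("negative_samples must be positive")
    if not 0 <= false_positives <= negative_samples:
        raise ValueError("false_positives must lie between 0 and negative_samples")
    return (negative_samples - false_positives) / negative_samples


# With the minimum of 38 type-negative samples stated in the text:
one_fp = type_specificity(38, 1)   # 37/38
two_fp = type_specificity(38, 2)   # 36/38
print(f"1 false positive:  {one_fp:.1%}")
print(f"2 false positives: {two_fp:.1%}")
```

With 38 negative samples, a single false positive leaves 37/38 correct negatives, which is just above 97%; a second false positive brings the value below 95%.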
We searched the data sets for patterns of consistent false positivity for any specific sample in the panel. The false-positive results appeared to be essentially randomly distributed among the samples, indicating that the problem with false positives is usually not related to a property of the assay itself (e.g., cross-reactivity) but rather is related to the laboratory conditions of use (e.g., contamination).
A systematic false positivity was found for the samples that contained the HPV-58 plasmid, where 15 data sets also detected HPV-52 in at least one of the HPV-58-containing samples. This could be related to the fact that, for both the Linear Array and InnoLiPA assays, it is stated that the tests cannot exclude HPV-52 detection in samples that contain HPV-58. Most of the HPV-52 detections in the HPV-58-positive samples were generated using the SPF10 primers used in InnoLiPA, but other assays, including HPV-52 type-specific PCRs, were also involved. As HPV-52 and HPV-58 are closely related viruses, it is conceivable that several assays could have problems distinguishing these HPV types. However, it should also be considered whether these samples could have been contaminated in the proficiency panel itself. No fewer than 94 data sets from laboratories proficient in detecting HPV-52 in the lowest dilution did not report this false HPV-52 positivity in these samples, and several of them used the same assays as those reporting the false HPV-52 positivity, suggesting that a general proficiency panel contamination is unlikely as an explanation.
Some needs for improvement of the proficiency panel itself were identified by this study. The commercial test Papillocheck, used by 4 laboratories, uses primers directed to the E1 gene. Since the plasmid used for HPV-18 is cloned at one of the primer binding sites in E1, this assay cannot detect the HPV-18 plasmid and was considered not to have tested for HPV-18 in the study. The plasmid used to test for HPV-68a was not full length but contained only the L1 gene. We noted in 2008 that Linear Array and all other PGMY-based assays that are indeed directed against L1 could not detect the HPV-68a plasmid. In this new panel, a plasmid containing HPV-68b was included in addition to HPV-68a (18, 23). All data sets reporting usage of primers directed to genes other than L1 or that used the PGMY primers were considered not to have tested for HPV-68a in this study. Accordingly, only 61 data sets could be analyzed for detection of HPV-68a. Still, only 17 of these laboratories (28%) could detect HPV-68a. In order to allow detection systems with targets outside L1, full-length genomes of HPV-68a will be included in the next panel.
The most commonly used commercial assay, Linear Array, used to generate 17 data sets, cannot exclude HPV-52 when the sample is positive for HPV-33, HPV-35, or HPV-58. In the 2008 study, 4/15 laboratories scored all samples with multiple infections containing HPV-52 as negative for HPV-52. In 2010, no laboratory scored HPV-52 as negative in multiple infections containing HPV-52, and all laboratories using Linear Array were proficient in detecting HPV-52. Six data sets generated using Linear Array reported between 2 and 10 false-positive results and were considered not proficient. Among the 31 total false-positive results submitted for the 17 data sets using Linear Array, 11 were false positive for HPV-66. Ten of these 11 false-positive detections were in samples that contained HPV-56. This confirms the observation already made in 2008 that the Linear Array assay is prone to false detection of HPV-66 in HPV-56-positive samples.
For the commercial test InnoLiPA, 9 out of 12 data sets reported between 2 and 8 false-positive results. Fifteen out of the 42 false-positive results reported were for HPV-52 detection in samples with HPV-58 plasmids, as described above, and four data sets detected HPV-52 in samples that contained HPV-68b. The other false-positive results appeared to be randomly distributed among the samples and were always different for the different laboratories.
Four of eight laboratories using the assay CLART HPV 2/3 submitted data sets with between 2 and 4 false-positive results. This is a major improvement compared to the study results in 2008, when 3 laboratories using this assay reported 7, 17, and 21 false-positive results, respectively, with some having more than 3 false positives in each sample. This indicates that the previous problem with low specificity is not related to the assay kit itself, and there are also examples of several laboratories that had completely proficient results using this assay.
The line blot assay PGMY-CHUV is described in the WHO HPV laboratory manual (36). The assay was developed within the WHO HPV LabNet (9) in order to provide an inexpensive assay that would be independent of any specific commercial vendor. The 6 different laboratories in 4 different continents that had used this assay generally had good results, with no false-positive results and 4/6 laboratories being fully proficient, supporting the suggestion that this assay is suitable for standardization and technology transfer.
As was also found in our previous study (6), differences in performance were much larger between laboratories than between different types of assays. Proficiency panel testing is thus particularly useful to stimulate a learning process for improved performance in laboratories.
Three samples were included in the panel to evaluate the DNA extraction step before the HPV testing and typing. These contained cells from the cervical cancer cell line SiHa in a background of the HPV-negative cell line C33A to mimic a clinical sample. SiHa cells have about 1 copy of HPV-16 per cell, and HPV-16 was correctly identified in samples with 2,500 cells/5 μl in 83% of the data sets. This is a major improvement compared to the results obtained in 2008, when only one-third of the data sets could detect 2,000 IU of HPV-16/5 μl. In the sample containing only the HPV-negative cell line, 12 data sets reported false-positive results, and in total, 21 false-positive results were reported in the 3 extraction samples. This suggests that, for a noteworthy minority of laboratories, contamination in the DNA extraction step is an issue.
The HPV LabNet has chosen to perform proficiency testing using a panel of HPV plasmids since this material can be used to generate exactly reproducible panels with defined content in quantities that can be distributed to hundreds of laboratories over many years. The use of clinical samples in proficiency panels does not allow the same reproducibility over time. To assess the additional steps in the laboratory detection process that are not evaluated by the current proficiency panel, e.g., evaluation of the sampling technique, handling, and storage and for the presence of PCR-inhibiting substances, the HPV LabNet instead performs quality control by a confirmatory testing scheme. Participating laboratories annually submit a part of their clinical samples tested to a higher-level reference laboratory for retesting (5).
This was the second HPV DNA proficiency panel issued by HPV LabNet that was open for testing by participants worldwide. The number of participating laboratories almost doubled, from 54 laboratories in 2008 to 98 laboratories in 2010. This increased participation in the study shows that many laboratories are interested in quality assurance for their assay methodologies and laboratory performance. Comparing the results of the laboratories that tested both the 2008 and 2010 WHO HPV DNA proficiency panels, we observed only marginal overall improvements. Among laboratories that used the same assay in both years, 27% were proficient in 2008, whereas 30% were proficient in 2010. However, there are several noteworthy examples of laboratories that achieved major improvements. We also saw a strong trend toward increased sensitivity of assays. For example, among the laboratories using the same assay in 2008 and 2010, 50 IU of HPV-16 could be detected by all (100%) laboratories in 2010, whereas 86% of laboratories could detect 50 IU of HPV-16 in 2008. However, for several laboratories, the increased sensitivity was accompanied by an increased number of false-positive results, resulting in nonproficiency. We suggest that recommendations for HPV laboratory testing include an increased emphasis on the use of negative controls in the assays. Furthermore, we suggest that the requirements for proficiency in future proficiency panels announce at the outset that proficiency requires no false positives at all.
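The effect of the proposed stricter requirement can be sketched with a small example. This is an illustration under stated assumptions, not the study's scoring procedure: it assumes that a data set is proficient when it misses none of the required detections and reports no more than one false positive (as implied by the discussion above, where data sets with 2 or more false positives were considered not proficient), and it compares that rule with a zero-tolerance rule. The class `DataSet`, the function `is_proficient`, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    """Summary of one submitted data set (hypothetical structure)."""
    missed_required_detections: int  # required type/amount combinations not detected
    false_positives: int             # HPV types reported in samples lacking them


def is_proficient(ds: DataSet, max_false_positives: int = 1) -> bool:
    """Assumed rule: all required detections made and few enough false positives."""
    return (ds.missed_required_detections == 0
            and ds.false_positives <= max_false_positives)


# Invented example values, for illustration only.
examples = [
    DataSet(missed_required_detections=0, false_positives=0),
    DataSet(missed_required_detections=0, false_positives=1),
    DataSet(missed_required_detections=0, false_positives=3),
    DataSet(missed_required_detections=2, false_positives=0),
]

tolerant = [is_proficient(ds, max_false_positives=1) for ds in examples]
strict = [is_proficient(ds, max_false_positives=0) for ds in examples]
print("at most 1 false positive:", tolerant)
print("no false positives:      ", strict)
```

Under these assumptions, only the second example changes status when the threshold is lowered from one false positive to none, which is the kind of data set the proposed requirement would affect.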
The demands on sensitivity of HPV typing assays vary depending on the purpose of the testing. The WHO HPV LabNet proficiency panels are designed to evaluate the performance of HPV typing tests used in HPV vaccinology and HPV surveillance. In vaccinology, high analytical sensitivity is needed, as failure to detect prevalent infections at trial entry may result in false vaccine failures in vaccination trials. It should be noted that the HPV tests used in cervical cancer screening programs have different requirements for evaluation, since for that purpose, only HPV infections associated with high-grade cervical intraepithelial neoplasia or cancer and not those transient HPV infections that do not give rise to clinically meaningful disease are relevant. Since the latter are characterized by low viral loads, HPV screening assays do not have demands on analytical sensitivity that are as high (19).
In conclusion, we find that the use of global HPV DNA typing proficiency panels for validating different HPV DNA tests and laboratories promotes the comparability of data generated from different laboratories worldwide. Regularly issued global HPV DNA typing proficiency panels that allow comparison of global results over time will be required for the continuing work toward international standardization and quality improvement of HPV DNA typing results worldwide.