We report on the development of an internationally comparable quality assurance methodology that is traceable to ISs. An accurate and internationally comparable HPV DNA detection and typing methodology is an essential component in the evaluation of HPV vaccines and in the effective implementation and monitoring of HPV vaccination programs. A standardized methodology for evaluating laboratory performance is a prerequisite for comparing the methods used in laboratories worldwide. The major tools for achieving progress toward this goal are the development of international biological standards and the preparation and validation of proficiency panels to qualify methods. The current study has established that such international proficiency panels, with units traceable to ISs, can be used in global studies. We have also demonstrated that such studies provide a unique overview of the HPV detection and typing methodologies in use globally and of how well they perform in different laboratories.
Overall, most laboratories in this study performed well with their HPV DNA-typing tests. However, some limitations were revealed.
Limits of sensitivity differed systematically between HPV types. For example, HPV16 and HPV18 were the types detected at the smallest amounts in most data sets (only 1 and 3 data sets, respectively, failed to detect 500 IU/5 μl), whereas HPV52, HPV59, and HPV56 were not detected at the 500-GE/5 μl concentration by 25, 19, and 18 data sets, respectively. Thus, many surveys of circulating HPV types may systematically underestimate the prevalence of HPV52, -56, and -59 relative to that of HPV16 and -18.
There was also a tendency toward lower sensitivity when multiple HPV types were present. In the samples containing multiple HPV types, between 50% and 73% of the data sets correctly identified the types present, whereas in samples containing only 1 HPV type, an average of 84% of HPV types were identified without false-positive results. This tendency would lead to systematic underestimation of the prevalence of multiple infections and would introduce a detection bias in epidemiological studies, because detectability would depend on determinants of HPV acquisition (e.g., a given HPV type would be harder to detect in high-risk groups, where concurrent infections with other HPV types are more likely).
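Why differential sensitivity biases prevalence estimates can be made concrete with the standard relationship between true and apparent prevalence: the expected fraction of samples testing positive for a type is sensitivity × true prevalence plus (1 − specificity) × (1 − true prevalence). The sketch below applies this to two hypothetical HPV types; the 10% prevalence and the sensitivities of 0.95 and 0.60 are illustrative assumptions, not values measured in this study.

```python
def apparent_prevalence(true_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Expected fraction of samples that test positive for one HPV type.

    Positives arise from true infections that are detected (sensitivity)
    plus uninfected samples that are falsely called positive (1 - specificity).
    """
    for name, value in (
        ("true_prevalence", true_prevalence),
        ("sensitivity", sensitivity),
        ("specificity", specificity),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sensitivity * true_prevalence + (1.0 - specificity) * (1.0 - true_prevalence)


# Hypothetical illustration: two HPV types with the same true prevalence (10%)
# but different assay sensitivities; specificity assumed perfect for both.
well_detected = apparent_prevalence(0.10, sensitivity=0.95, specificity=1.0)
poorly_detected = apparent_prevalence(0.10, sensitivity=0.60, specificity=1.0)
print(f"{well_detected:.3f} vs {poorly_detected:.3f}")  # 0.095 vs 0.060
```

With equal true prevalence, the less sensitively detected type appears markedly rarer, which is the kind of systematic distortion described above for HPV52, -56, and -59 and for types occurring in multiple infections.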
A surprisingly large number of false-positive results were reported, with only 34/80 data sets being 100% specific. The proficiency panel contained only 1 entirely HPV-negative sample. The present study was designed primarily to evaluate HPV typing (rather than mere HPV detection), and we considered that in this context specificity should be measured primarily as the absence of detection of a specific HPV type, including when other HPV types were present. Thus, for each HPV type evaluated, the panel includes at least 39 samples that are negative for that type, so a single false-positive result corresponds to a specificity of >97% for that type. There was only 1 indication of systematic mistyping (some Linear Array-based data sets reported HPV56-containing samples as positive for HPV66); otherwise, no single sample showed systematic false positivity for the same type in several laboratories. These very common false positives therefore do not appear to be associated with the panel or with the assays used but rather appear to result from the laboratory environment and performance. Considering the deleterious consequences that a false-positive result may have, a substantial effort to increase the specificity of testing appears warranted.
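The per-type specificity figure follows from treating every panel sample that lacks a given HPV type as a negative for that type. The minimal sketch below computes specificity as true negatives divided by all such negatives; the function name is hypothetical, and the only input taken from the text is the minimum of 39 negative samples per type.

```python
from fractions import Fraction


def type_specificity(negative_samples: int, false_positives: int) -> Fraction:
    """Specificity for one HPV type: true negatives / all samples lacking that type."""
    if negative_samples <= 0:
        raise ValueError("need at least one sample lacking the HPV type")
    if not 0 <= false_positives <= negative_samples:
        raise ValueError("false positives must lie between 0 and the number of negative samples")
    true_negatives = negative_samples - false_positives
    return Fraction(true_negatives, negative_samples)


# Each HPV type is absent from at least 39 panel samples.
spec = type_specificity(negative_samples=39, false_positives=1)
print(f"{float(spec):.1%}")  # 97.4%
```

Because 39 is the minimum number of negatives per type, 38/39 is the lowest specificity a single false positive can produce, which is why the text states >97%.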
On the other hand, this study also identified some needs for improvement of the proficiency panel itself. The HPV39 plasmid used in the panel had been cloned into the vector at the binding site of one of the most commonly used PCR primers (PGMY). None of the assays using the PGMY primer system, including Linear Array and CLART, could detect the HPV39 plasmid in the panel. Because this failure resulted from the way the plasmid was constructed, all of these data sets were considered not to have been evaluated for HPV39 in this study.
The plasmid used to test for HPV68a was not full length but contained only the L1 gene. Linear Array and all other PGMY-based assays, which do target L1, could not detect the HPV68a plasmid. Comparison of the sequences of HPV68a and HPV68b (ME180 isolate) showed significant differences in the region corresponding to the PGMY primer binding site. As the sequence of HPV68b was published before that of HPV68a, it appears that these systems were designed to detect only HPV68b (11, 14). All data sets reporting the use of primers directed to genes other than L1, or reporting the use of the PGMY primers, were considered not to have been evaluated for HPV68 in this study. Accordingly, only 29 data sets could be analyzed for detection of HPV68a, and only 11 of these 29 (38%) detected HPV68a. For the next WHO HPV LabNet proficiency panel, HPV39 will be recloned to change the cloning site, and full-length genomes of both HPV68a and HPV68b will also be included.
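The eligibility rule for HPV68a can be expressed as a simple filter: only data sets whose primers target L1 and that do not use PGMY can be judged on the L1-only HPV68a plasmid. The sketch below assumes a hypothetical per-data-set record (`DataSet`, with fields `target_gene`, `uses_pgmy`, and `detected_hpv68a`); these names and the toy entries are illustrative, not the study's data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    # Hypothetical record; field names are illustrative, not from the study.
    name: str
    target_gene: str       # gene targeted by the primers, e.g. "L1" or "E6/E7"
    uses_pgmy: bool        # True if the assay uses the PGMY primer system
    detected_hpv68a: bool  # whether HPV68a was reported in the relevant sample


def evaluable_for_hpv68a(ds: DataSet) -> bool:
    """The L1-only HPV68a plasmid can only be judged by L1-directed, non-PGMY assays."""
    return ds.target_gene == "L1" and not ds.uses_pgmy


def hpv68a_detection_rate(data_sets: list[DataSet]) -> tuple[int, int]:
    """Return (data sets detecting HPV68a, data sets eligible for evaluation)."""
    eligible = [ds for ds in data_sets if evaluable_for_hpv68a(ds)]
    detected = sum(ds.detected_hpv68a for ds in eligible)
    return detected, len(eligible)


toy = [
    DataSet("A", "L1", uses_pgmy=False, detected_hpv68a=True),
    DataSet("B", "L1", uses_pgmy=True, detected_hpv68a=False),      # excluded: PGMY
    DataSet("C", "E6/E7", uses_pgmy=False, detected_hpv68a=False),  # excluded: not L1
    DataSet("D", "L1", uses_pgmy=False, detected_hpv68a=False),
]
print(hpv68a_detection_rate(toy))  # (1, 2)
print(f"{11 / 29:.0%}")            # 38%, the proportion reported in the text
```

Excluding ineligible data sets before computing the rate keeps construct-related failures (as with the PGMY assays) from being counted as assay failures.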
The Linear Array cannot exclude HPV52 when the sample is positive for HPV33, HPV35, or HPV58. Some laboratories have developed a type-specific PCR for HPV52 to test HPV33-, -35-, and -58-positive samples, whereas some laboratories (4/15) scored all samples with multiple infections containing HPV52 as negative for HPV52 (4, 23). As a result, these laboratories were regarded as not proficient for HPV52 in this study. Four data sets generated using Linear Array were considered not proficient because they reported 2 or even 3 false-positive results. HPV66 accounted for 7 of the 15 false-positive results submitted in the 15 data sets using Linear Array; 6 of these samples contained 500 GE of HPV56, which was correctly identified. Detection of HPV66 in these samples was not reported with any other assay, indicating that false detection of HPV66 in HPV56-positive samples is a common problem with the Linear Array assay.
For two commercial tests (InnoLiPA and CLART), 4 out of 6 data sets were not proficient because of too many false positives. InnoLiPA could not identify HPV52 in 5 of 6 data sets; conversely, HPV52 was reported in 9 samples in which it was not present. The 4 laboratories using InnoLiPA that were not proficient reported between 3 and 5 false-positive samples each. Three laboratories using CLART reported 7, 17, and 21 false-positive results, some with more than 3 false positives in a single sample. Four laboratories using CLART could not detect HPV56 and -45 in samples with multiple types. Neither assay showed consistent false positivity for any specific sample. The false-positive results for these assays appeared to be randomly distributed among the samples and were always different for different laboratories, indicating that the problem is not related to the assay kit itself. Indeed, several laboratories obtained completely proficient results using these assays.
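The distinction drawn above between systematic mistyping (such as HPV66 reported in HPV56-containing samples) and scattered, laboratory-specific false positives can be checked by counting how many distinct laboratories report the same false call for the same sample. The sketch below is a hypothetical illustration: the function name, the sample identifiers, and the default threshold of 3 laboratories are assumptions, not criteria used in the study.

```python
from collections import defaultdict
from collections.abc import Iterable


def recurrent_false_positives(
    false_calls: Iterable[tuple[str, str, str]], min_labs: int = 3
) -> dict[tuple[str, str], int]:
    """Return (sample, hpv_type) pairs falsely reported by at least `min_labs` distinct labs.

    false_calls: one (lab, sample, hpv_type) tuple per false-positive call.
    """
    labs_per_call: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
    for lab, sample, hpv_type in false_calls:
        # A set ensures a lab reporting the same false call twice is counted once.
        labs_per_call[(sample, hpv_type)].add(lab)
    return {
        key: len(labs)
        for key, labs in labs_per_call.items()
        if len(labs) >= min_labs
    }


# Hypothetical calls: one recurring pattern, the rest scattered across labs.
calls = [
    ("lab1", "S07", "HPV66"), ("lab2", "S07", "HPV66"), ("lab3", "S07", "HPV66"),
    ("lab4", "S12", "HPV31"), ("lab5", "S03", "HPV52"), ("lab5", "S19", "HPV11"),
]
print(recurrent_false_positives(calls))  # {('S07', 'HPV66'): 3}
```

A recurring (sample, type) pair points toward an assay or panel issue, whereas an empty result with many scattered false calls is consistent with the laboratory-level problems described here.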
A major conclusion of the present study is that differences in performance were much larger between laboratories than between different types of assays. Proficiency panel testing is particularly useful for stimulating a learning process that improves laboratory performance. Once regular feedback on proficiency testing results is implemented, performance usually improves rapidly. One example is the PGMY-Lineblot assay, which was recently set up in the HPV LabNet: several laboratories using this assay for the first time had suboptimal results but became proficient in a subsequent proficiency test, performed after they had had more time for practice.
The 2 samples designed to evaluate the DNA extraction step preceding HPV testing and typing yielded a surprisingly low proportion of correct results. The sample containing 2,000 cells of the cervical cancer cell line SiHa, with about 1 copy of HPV16 per cell (i.e., a total of 2,000 IU of HPV16/5 μl), was detected in only about one-third of the data sets. In addition, a large number of data sets (six) reported false-positive results for the sample containing an HPV-negative human cell line. This indicates that both low DNA yield in the extraction step, which can reduce sensitivity, and contamination during extraction may be significant problems in HPV DNA testing. Future proficiency panels will contain a larger set of samples designed specifically to evaluate the DNA extraction step that precedes the actual HPV testing and typing.
Some steps in the laboratory detection process are not evaluated by the present strategy, notably sampling technique, handling and storage, natural variability of circulating virus strains, PCR-inhibiting substances, and naturally occurring genome modifications (e.g., integration and rearrangement). The HPV LabNet has chosen to perform quality control for these aspects of testing by launching a confirmatory testing scheme, in which a portion of the clinical samples tested is submitted annually to a higher-level reference laboratory for retesting (5). The alternative strategy, including clinical samples in proficiency-testing schemes, was not chosen because of the need for exactly reproducible panels with defined content that can be used by hundreds of laboratories over many years, and because confirmatory testing schemes were considered to better reflect the testing actually being done.
It should be emphasized that the current proficiency panel study was designed to evaluate the performance of HPV detection and typing tests used in HPV vaccinology and HPV surveillance, not of HPV tests used in cervical cancer screening (12). The demands on the performance of HPV-typing assays vary depending on the purpose of the testing. In vaccinology, high sensitivity is needed for clinical vaccine trials, as failure to detect prevalent infections at entry may result in apparent vaccine failures. In contrast, clinical HPV-associated lesions, such as high-grade cervical intraepithelial neoplasia, typically contain larger amounts of virus, so cervical-screening programs using HPV testing place lower demands on sensitivity (12). Guidelines for evaluating such tests have recently been published (12).
In conclusion, we found that global HPV DNA proficiency studies are both feasible and informative. An internationally standardized methodology for analyzing the sensitivity and specificity of different HPV-typing assays (as well as the performance of participating laboratories) in correctly identifying the 16 HPV types that are most important in HPV surveillance and vaccinology is likely to greatly enhance the quality and comparability of studies in these fields.