High-risk human papillomavirus (HPV) DNA/RNA testing provides higher sensitivity but lower specificity than cytology for the identification of high-grade cervical intraepithelial neoplasia (CIN). Several new HPV tests are now available for this purpose, and a direct comparison of their properties is needed. Seven tests were evaluated with samples in liquid PreservCyt transport medium from 1,099 women referred for colposcopy: the Hybrid Capture 2 (Qiagen), Cobas (Roche), PreTect HPV-Proofer (NorChip), Aptima HPV (Gen-Probe), and Abbott RealTime assays, the BD HPV test, and CINtec p16INK4a cytology (mtm laboratories) immunocytochemistry. Sensitivity, specificity, and positive predictive value (PPV) were based on the worst histology found on either the biopsy or the treatment specimen after central review. Three hundred fifty-nine women (32.7%) had CIN grade 2+ (CIN2+), with 224 (20.4%) having CIN3+. For detection of CIN2+, Hybrid Capture 2 had 96.3% sensitivity, 19.5% specificity, and 37.4% PPV. Cobas had 95.2% sensitivity, 24.0% specificity, and 37.6% PPV. The BD HPV test had 95.0% sensitivity, 24.2% specificity, and 37.8% PPV. Abbott RealTime had 93.3% sensitivity, 27.3% specificity, and 38.2% PPV. Aptima had 95.3% sensitivity, 28.8% specificity, and 39.3% PPV. PreTect HPV-Proofer had 74.1% sensitivity, 70.8% specificity, and 55.4% PPV. CINtec p16INK4a cytology had 85.7% sensitivity, 54.7% specificity, and 49.1% PPV. Cytology of a specimen taken at colposcopy (mild dyskaryosis or worse) had 88.9% sensitivity, 58.1% specificity, and 50.7% PPV. Our study confirms that, in a referral setting, HPV testing by a number of different tests provides high sensitivity for high-grade disease. Further work is needed to confirm these findings in a routine screening setting.
High-risk (HR) human papillomavirus (HPV) is a necessary factor for the development of cervical cancer (31), but the presence of HR HPV DNA does not invariably lead to disease. We have shown that the detection of HR HPV DNA provides high sensitivity but has lower specificity than cytology for the identification of high-grade cervical lesions in a screening population in the United Kingdom (9), and this finding has been replicated in several other studies (1, 3, 8, 15–17, 21, 22, 26, 27). In addition, prospective studies have shown that HPV DNA-positive women who do not have disease initially are significantly more likely to develop high-grade squamous intraepithelial lesions (SILs) within 10 years than women with a negative HPV DNA test (4, 12, 14, 18). If testing for HR HPV DNA is to be used as a primary cervical screening test, refinements or additional tests are highly desirable to improve its specificity while retaining its very high sensitivity. Candidates include HPV typing, HPV E6/E7 mRNA tests, and p16INK4a cytology, which have shown higher specificity than HPV DNA testing (10, 19).
The introduction of a liquid-based medium for collection of cytological specimens has allowed other molecular techniques to be evaluated more easily as adjunctive or triage tests (2). We have previously compared a number of different molecular tests in a population referred for colposcopy because of abnormal cytology (7, 29). However, there are now a number of newer tests, which also require evaluation. A number of studies have compared two tests against each other (usually the comparator was the Hybrid Capture 2 [HC2] assay), but as far as we are aware, apart from our previous study, none have simultaneously compared such a wide range of tests using the same sample. The aim of this study was to compare directly the sensitivity and specificity of several tests for the detection of high-grade cervical intraepithelial neoplasia (CIN) using aliquots from the same sample from the same women in a population referred for colposcopy because of abnormal cytology. This population was chosen because the high disease rate allows comparison of the sensitivities of the different tests within a modest sample size. Use of a screening population would require a sample size at least 20 times larger in order to have similar power. All the tests were compared against the “gold standard” of centrally reviewed histopathology.
The study population comprised 1,099 women who had been referred to the colposcopy clinics at the Hammersmith and St. Mary's Hospitals in London, United Kingdom, between September 2007 and October 2009 because of abnormal screening smears (Fig. 1). In England, women are referred for colposcopy if they have cytology showing mild dyskaryosis or worse or three smears showing borderline dyskaryosis. While not a screening population, the advantage was a broad range of outcomes and a high disease rate, which would enable accurate comparisons of sensitivity and specificity in a relatively small sample. Women were eligible if they had been referred as a result of one or more abnormal cervical smears, were not pregnant, had not been treated previously for CIN, and had not had a hysterectomy. All women received a patient information sheet explaining the study and provided written consent. Approvals were obtained from the relevant local research ethics committees.
Prior to colposcopy, two cervical samples were obtained using a Cervex broom and placed in separate 20-ml vials of PreservCyt transport medium. Two samples were taken because the number of tests proposed required more aliquots than could be obtained from a single sample. Testing from two samples has been shown to have good concordance for molecular testing (11). Colposcopy was then performed in the usual manner. The liquid-based cytology (LBC) samples were transported to The Doctors Laboratory (TDL), where an aliquot from the first sample was removed for cytology, processed using the ThinPrep system, and returned to the relevant cytopathology departments for reporting. Cytopathology reporting was carried out blindly, although the cytopathologists were inevitably aware that the sample came from a colposcopy clinic. The remaining material from the first sample was used for NorChip PreTect HPV-Proofer testing (which was also performed by TDL).
Both samples were then sent to the laboratory at Queen Mary, University of London (QMUL), where aliquots were removed for the other tests. The Qiagen Hybrid Capture 2 assay and mtm laboratories CINtec p16INK4a cytology were carried out using material from the first sample. The second sample was used for the Abbott RealTime, Roche Cobas, Gen-Probe Aptima, and BD HPV tests. Tests were carried out in the QMUL laboratory according to standard manufacturers' instructions. The molecular testing laboratories were blinded to the cytology and histopathology results.
Neither the women nor their clinicians were informed of the HPV test results, and the results were not used to influence patient management.
Histopathology was first reported locally and then centrally reviewed by M.Y., who was blinded to all study test results. Where discrepant readings occurred, further review was undertaken by M.S., and the majority opinion was taken; where all three readings were discrepant (one each of CIN less severe than grade 2 [<CIN2], CIN2, and CIN3), CIN2 was assigned. All results are presented on the basis of the reviewed histopathology, and the highest grade of abnormality seen in the biopsy or treatment specimen was used.
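The adjudication rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual procedure or software: it assumes histology has been collapsed to the three categories named in the text, and the function names `adjudicate` and `worst_grade` are hypothetical.

```python
from collections import Counter

# Ordered from least to most severe; collapsing histology to these three
# labels is an assumption made for illustration.
GRADES = ("<CIN2", "CIN2", "CIN3")

def adjudicate(local, central, third=None):
    """Final grade for one specimen from up to three readings."""
    if local == central:
        return local  # readings agree, no further review needed
    if third is None:
        raise ValueError("discrepant readings require a third review")
    grade, votes = Counter([local, central, third]).most_common(1)[0]
    if votes >= 2:
        return grade  # majority opinion
    return "CIN2"  # all three readings differ

def worst_grade(grades):
    """Highest grade seen across the biopsy and treatment specimens."""
    return max(grades, key=GRADES.index)

print(adjudicate("<CIN2", "CIN3", "CIN2"))  # three-way discrepancy
print(worst_grade(["<CIN2", "CIN3"]))
```

With only three categories, a three-way disagreement necessarily contains one reading of each grade, which is why the fallback can simply return CIN2.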
In this study, the following assays were carried out and scored in strict accordance with the manufacturers' protocols.
The Hybrid Capture 2 assay (Qiagen) detects 13 HR HPV genotypes collectively and is based on the hybridization of HPV DNA to a 13-HR-HPV-type RNA probe cocktail. The DNA-RNA hybrid is captured by an anti-DNA-RNA antibody and detected by chemiluminescence. Readings over 1 relative light unit (RLU) are considered positive.
The Cobas 4800 (Roche Diagnostics) test is a qualitative in vitro test for the detection of 14 HR HPV types. The test specifically identifies HPV type 16 (HPV-16) and HPV-18 while concurrently detecting the rest of the high-risk types as a group (types 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68).
The Abbott RealTime High Risk HPV assay (Abbott Molecular, Wiesbaden, Germany) is a qualitative multiplex real-time test that also specifically identifies HPV-16 and HPV-18 while concurrently detecting the rest of the high-risk types as a group (types 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, 66, and 68).
The BD HPV test (BD Diagnostics) is a real-time PCR assay (at the time of writing, not yet commercially available) that detects 14 HR HPV types. Typing is provided for types 16, 18, 31, 45, 51, 52, and 59. The remaining HPV types are grouped into the type 33, 56, 58, and 66 group and the type 35, 39, and 68 group.
The PreTect HPV-Proofer assay (NorChip) was performed by TDL. PreTect HPV-Proofer is a real-time multiplex nucleic acid sequence-based amplification assay for isothermal amplification of E6/E7 mRNA expressed by five high-risk HPV types (types 16, 18, 31, 33, and 45) using proprietary primer sets (19).
The Aptima (Gen-Probe) assay is based on target capture, transcription-mediated amplification, and hybridization protection for the detection of E7 mRNA expression of 14 HR HPV types. Although not commercially available as yet, in this study, typing was also performed at Gen-Probe Laboratories for HPV types 16, 31, and 33 and HPV types 18 and 45 combined (13).
p16INK4a cytology uses immunocytochemical detection of overexpression of the p16INK4a tumor suppressor gene (CINtec cytology; mtm laboratories) and scoring using nuclear morphology.
Immunostaining was done at mtm laboratories using monoclonal antibody to p16INK4a with antimouse as secondary antibody and detected by diaminobenzidine chromogen. The p16INK4a score was based on nuclear assessment of brown-stained cells by four criteria (32): A, increased size; B, granular or hyperchromatic chromatin; C, irregular shape; D, variable morphology from cell to cell.
Cells positive for any one of these criteria were scored as 2. Cells positive for criterion A and one other criterion were scored as 3, and cells positive for criterion A and more than one other criterion were scored as 4. The sample score was the highest score observed. Reporting of CINtec p16INK4a cytology was carried out blindly by C.B. according to a scoring method recently described (32, 33).
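The scoring rule can be written out as a short Python sketch. It is a reading of the rule as stated here, not the published scoring software. Two cases are not defined in the text and are handled by labeled assumptions: a cell meeting two or more criteria without criterion A is scored 2, and a cell meeting no criterion receives a placeholder score of 0. The default positivity cutoff of ≥2 follows the cutoff discussed in the Results; all function names are hypothetical.

```python
def cell_score(criteria):
    """Score one stained cell from the criteria it meets (letters A to D)."""
    met = set(criteria)
    if not met <= set("ABCD"):
        raise ValueError("criteria must be drawn from A, B, C, D")
    others = len(met - {"A"})
    if "A" in met and others > 1:
        return 4  # criterion A plus more than one other criterion
    if "A" in met and others == 1:
        return 3  # criterion A plus exactly one other criterion
    if met:
        return 2  # any one criterion (assumption: also B/C/D combinations without A)
    return 0  # assumption: placeholder, not defined in the text

def sample_score(cells):
    """The sample score is the highest cell score observed."""
    return max((cell_score(c) for c in cells), default=0)

def is_positive(cells, cutoff=2):
    """Positive when the sample score reaches the cutoff."""
    return sample_score(cells) >= cutoff

cells = ["B", "AC", "D"]  # invented example sample
print(sample_score(cells), is_positive(cells), is_positive(cells, cutoff=4))
```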
Sample adequacy was assessed by a cellular DNA or RNA marker for all molecular assays except HC2. Assay variability was controlled by including a pooled positive control and a pooled negative control in each run for each test. Reagents from the same batch were used throughout, and the controls provided in the kits showed little assay variability over time.
Data entry and all statistical analysis were carried out at QMUL.
The main outcome measures were sensitivity, specificity, and positive predictive value (PPV). Confidence intervals for these were based on binomial statistics. Comparisons between tests were conducted using McNemar's test for matched pairs, with odds ratios and 95% confidence intervals given for discordant samples. The study was designed to detect with 80% power a difference in sensitivity of 95% versus 90%. This assumed 300 CIN2+ cases and a discordant difference of 20 (positive/negative) versus 5 (negative/positive). Assays were considered to be significantly different if the two-sided P value was less than 0.05. Additional calculations of sensitivity, specificity, and PPV were carried out for women aged ≤30 and women aged >30 years. Full details of type-specific results will appear elsewhere, but positivity results for HPV-16 and HPV-18 are shown.
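For readers who wish to reproduce this kind of analysis, the quantities above can be sketched in plain Python. This is a minimal illustration, not the Stata code used in the study. The exact form of McNemar's test and the Wilson score interval are assumptions chosen because they need only the standard library; the text says only that intervals were "based on binomial statistics". The function names are hypothetical, and the discordant counts in the example (13 versus 2) are those given in the Discussion for Hybrid Capture 2 versus Abbott RealTime among women with CIN2+.

```python
from math import comb, sqrt

def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV from a 2x2 table against histology."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed stand-in for the
    binomial intervals mentioned in the text)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value and discordant-pair odds ratio.

    b: pairs positive by the first test only
    c: pairs positive by the second test only
    """
    n = b + c
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    odds_ratio = b / c if c else float("inf")
    return p_value, odds_ratio

p_value, odds_ratio = mcnemar_exact(13, 2)
print(f"P = {p_value:.4f}, odds ratio = {odds_ratio:.1f}")
```

Only the discordant pairs enter McNemar's test, which is why a comparison of two tests on the same women can be powered with far fewer samples than a comparison between independent groups.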
We considered the worst histology within 9 months of the initial baseline visit. Histologically confirmed CIN2+ and CIN3+ were taken as the primary endpoints. We also reported CIN2 separately. Specificity is reported only for <CIN2, as we do not consider the detection of CIN2 to be a false positive. Where appropriate, we also computed receiver operating characteristic (ROC) curves to further compare sensitivity and specificity at different cutoffs for Hybrid Capture 2, BD HPV, Abbott RealTime, Roche Cobas, Gen-Probe Aptima, and mtm laboratories p16INK4a.
All statistical analyses were carried out using the Stata (version 10.1) program (StataCorp).
Figure 1 (flowchart) shows the study recruitment. A total of 1,117 women consented to participate in the study, and the tests from 1,099 women were analyzed. The median length of time between the referral smear and colposcopy was 1.8 months (interquartile range [IQR], 1.4 to 2.6 months; range, 0.3 to 59 months). The median age of the women was 29 years (IQR, 26.6 to 34.3 years). Over three-quarters of the women were under 35 years of age, with 55% being under the age of 29 years.
Approximately 24% of the referral population had high-grade dyskaryosis, and 76% had low-grade disease (16.7% borderline, 44.2% with a single mildly dyskaryotic cytology, and 15.4% with mild dyskaryosis and one or more borderline smears or worse). Over 20% of the concurrent smears were borderline, 32% were mild dyskaryosis, and approximately 26% were high grade. Just under one-fifth (19%) of the concurrent smear specimens taken in the colposcopy clinic were negative. This may reflect sampling variability in the smear or regression of disease.
Table 1 tabulates referral cytology against worst histology. Twenty-seven percent of the women either had a normal colposcopy with no biopsy or a negative biopsy, 37% of women had CIN1 or other minor abnormalities, 12% had CIN2, and 20% had CIN3 or worse.
Cytology of samples taken at colposcopy (mild dyskaryosis or worse) had 88.9% sensitivity, 58.1% specificity, and 50.7% PPV. The cytology was read with the knowledge that the patient was having colposcopy and so may not reflect normal screening practice.
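The relationship between these three figures and the disease rate can be checked with a short calculation. The sketch below applies the standard identity linking PPV to sensitivity, specificity, and prevalence. It assumes that every one of the 1,099 women contributed both a cytology result and a histology classification, which the text does not state, so small departures from the reported values would not be surprising. The function name is hypothetical.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity, specificity and disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# 359 of 1,099 women had CIN2+ (figures from the abstract).
prevalence = 359 / 1099
ppv = ppv_from_rates(0.889, 0.581, prevalence)
print(f"{ppv:.1%}")
```

Under that assumption the calculation gives about 50.7%, in line with the reported PPV. The same identity shows why PPV would be far lower in a screening population, where prevalence is a small fraction of the 32.7% seen here.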
The overall HPV positivity of the different tests is shown in Table 2 and ranged from 79% to 86%, apart from p16INK4a and PreTect HPV-Proofer, at 58.9% and 43.9%, respectively. The proportion of women positive for HPV type 16 and/or 18 was similar across tests (for HPV type 16, the positivity ranged from 26.2% to 31.9%, and for HPV type 18, it ranged from 7.8% to 11.8%; Table 2).
Sensitivity, specificity, and PPV for CIN2+, CIN2 alone, and CIN3+ of the different tests (all samples taken concurrently at colposcopy) are reported in Table 3. We have chosen to report separately the results for CIN2+ and CIN3+ because CIN2 is less likely to progress and has greater variability in diagnosis. CIN3 has been shown to have greater reproducibility.
Five adjunctive tests had a sensitivity greater than 95% for CIN3+: Hybrid Capture 2, Cobas 4800, Abbott RealTime, BD HPV, and Aptima. The Abbott RealTime test was significantly less sensitive for CIN2+ than Hybrid Capture 2, BD HPV, and Aptima and had a marginally but not significantly lower sensitivity than Cobas 4800. All these tests were significantly more sensitive than p16INK4a or PreTect HPV-Proofer.
There were seven cancers, of which all but one (a microinvasive stage 1A cancer missed by PreTect HPV-Proofer only) were detected by all the HPV tests. This sample tested positive for HPV-52 by BD HPV and for other HPV types by Abbott RealTime and Cobas 4800.
Overall, the highest specificity was achieved with the PreTect HPV-Proofer (70.8%, for CIN2+), but this test had relatively low sensitivity (Table 3). The CINtec p16INK4a cytology test had a significantly lower specificity (54.7% and 49.4% for CIN2+ and CIN3+, respectively) than PreTect HPV-Proofer but a higher specificity than the other tests. Of the five highly sensitive tests, Hybrid Capture 2 showed significantly lower specificity than the other four tests (McNemar's test). Aptima and Abbott RealTime were also significantly more specific than Cobas 4800 and BD HPV. When focusing on CIN3+, overall, sensitivity is slightly improved; however, the ordering of the tests remains similar (Table 3). Because there is uncertainty about the progressive potential of CIN2, we provide sensitivity estimates for both CIN2+ and CIN3+; however, few would consider CIN2 to be a false positive, so we do not report specificity for CIN3+.
Figure 2 shows the effects on sensitivity and specificity of using different cutoffs for Hybrid Capture 2, Cobas, Abbott RealTime, Aptima, BD HPV, and p16INK4a in predicting histologically confirmed high-grade disease (CIN2+ and CIN3+). The ROC curves for the highly sensitive tests are very similar. If the cutoff for Hybrid Capture 2 was raised (from ≥1 RLU to ≥2 RLU), the sensitivity remained relatively unchanged, while the specificity slightly improved. Using a cutoff ≥3 instead of ≥2 (see Materials and Methods) for p16INK4a reduced the sensitivity for CIN3+ (from 90.2% to 79.0%), although it increased the specificity from 49.4% to 75.9%. For the BD HPV test, lowering the cutoff to a cycle threshold (CT) value of ≤33 from the nominal ≤36.2 would result in a small decrease in sensitivity for CIN3+ (from 97.8% to 97.3%) and for CIN2+ (from 95% to 93.8%), while it would result in an increase in specificity for CIN3+ (from 21.9% to 26.7%) and for CIN2+ (from 24.1% to 29.4%).
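The effect of moving a cutoff can be illustrated with a small Python sketch. The readings below are invented for illustration and are not study data; the function name and its `positive_if_at_least` parameter are hypothetical. The parameter exists because signal-type readings (such as RLU) are positive at or above the cutoff, whereas cycle threshold values are positive at or below it.

```python
def sens_spec_at(values, diseased, cutoff, positive_if_at_least=True):
    """Sensitivity and specificity of a quantitative test at one cutoff."""
    tp = fp = fn = tn = 0
    for value, has_disease in zip(values, diseased):
        if positive_if_at_least:
            positive = value >= cutoff
        else:
            positive = value <= cutoff  # e.g. cycle threshold values
        if has_disease and positive:
            tp += 1
        elif has_disease:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented readings (RLU-like) and histology outcomes, for illustration only.
readings = [0.4, 1.5, 2.5, 30.0, 0.8, 120.0]
cin2_plus = [False, False, True, True, False, True]

for cutoff in (1.0, 2.0):
    sens, spec = sens_spec_at(readings, cin2_plus, cutoff)
    print(f"cutoff >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Evaluating the function over a grid of cutoffs and plotting sensitivity against 1 minus specificity yields an ROC curve of the kind shown in Figure 2.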
Sensitivity was slightly higher for all tests in younger women (age <30 years), being (for CIN3+) between 3.0% and 5.3% higher for all tests except PreTect HPV-Proofer, whose sensitivity was 88.9% in women under age 30 years and 70.3% in those over age 30 years. All tests showed higher specificity in the older age group (data not shown).
To more accurately reflect the use of these tests in a situation where HPV triage has been recommended, we also considered the restricted population of women who had a borderline or mildly dyskaryotic referral smear. The results were generally similar (Table 3). For the five tests that showed the highest sensitivity (Hybrid Capture 2, Abbott RealTime, Cobas 4800, BD HPV, and Aptima), specificities were similar when only women with borderline or mildly dyskaryotic smears were considered, and the relative performance (i.e., the order) of the tests was unchanged in these lower-risk cytology categories.
It should be noted that the tests did not all miss the same cases, making comparisons more complex. There were 120 CIN2+ individuals missed by at least one test: 87 individuals were negative by one test only (60 by PreTect HPV-Proofer, 26 by CINtec p16INK4a cytology, and 1 by Hybrid Capture 2); 10 individuals were negative by two tests (7 by PreTect HPV-Proofer and CINtec p16INK4a cytology, 3 by PreTect HPV-Proofer and Abbott RealTime); 23 individuals were negative by three or more tests. Data on individual HPV types were complicated by the large number of multiple infections and will be reported separately.
This study's strength is that it is possible to compare at the same time a wide range of adjunctive tests with samples taken from the same woman. All the women in the population studied were referred with abnormal cytology, but they had a wide range of histological outcomes, which facilitated the assessment of relative sensitivity. One other study (23) has compared several commercially available tests in the same cohort. The study was performed in 281 women referred with an abnormal smear, and the tests compared were HC2, PreTect HPV-Proofer, Roche's linear array, and the DR.HPV IVD kit (DR.Chip Biotech, Inc.). Sensitivity for CIN3+ was highest for HC2 and linear array (100%), whereas NorChip PreTect HPV-Proofer showed a high specificity (88%). Another study based in a screening setting (5) compared Abbott RealTime with Hybrid Capture 2. The study was performed in a random sample of 998 specimens and showed sensitivities for CIN2+ of 96.4% and 97.6% for Abbott RealTime and HC2, respectively. Specificity was 92.3% for Abbott RealTime and 92.6% for HC2.
A recent population-based study in France (20) which compared HC2 with Aptima showed sensitivities similar to those from our study for CIN2+ of 92% for Aptima and 96.4% for HC2. Specificities for CIN2+ were much higher in the FASE study (91.8% for Aptima versus 86.4% for HC2) than our study (28.8% for Aptima versus 19.5% for HC2). This difference in specificity is due to the differences in a screening population versus a referral population. A second population-based study from Slovenia (25) compared Abbott RealTime with Hybrid Capture 2. In that study, Abbott RealTime showed an overall sensitivity of 98.2%, compared with 93.3% in our study, while Hybrid Capture 2 showed a sensitivity of 94.7%, compared with 96.3% in our study. As with the FASE study, specificity was much higher for both tests, for similar reasons.
A recent study (6) compared the performance of the BD HPV assay to that of HC2 in an archived subset of ALTS clinical trial specimens among women referred with either atypical squamous cells of undetermined significance (ASCUS) or low-grade squamous intraepithelial lesions (LSILs; borderline or mild dyskaryosis). The clinical performance of the BD assay (sensitivity = 90.0%; specificity = 43.9%; PPV = 12.9%) for detection of a 2-year worst histologic outcome of CIN3 was comparable to that of HC2 (sensitivity = 87.5%; specificity = 41.8%; PPV = 12.2%). In addition, specimens that tested HC2 negative and BD HPV positive were more likely to be positive for oncogenic HPV genotypes, whereas those that tested HC2 positive and BD HPV negative were more likely to be positive for low-risk HPV genotypes.
A study conducted in the United States (28) compared HC2 with Cobas 4800 among 1,578 women with ASCUS cytology. This study showed a slightly lower sensitivity for CIN2+ for both tests (87.2% for HC2, 90.0% for Cobas 4800). However, specificity in the study by Stoler et al. (28) was considerably higher (71.1% for HC2, 70.5% for Cobas 4800). It should be noted that in the United Kingdom, women are referred for colposcopy only after three smears showing borderline dyskaryosis, whereas in the study by Stoler et al. (28), referral for colposcopy took place after a single ASCUS result.
Five tests demonstrated very high sensitivity, indicating that they are unlikely to miss significant disease, and of these, the Aptima and Abbott RealTime tests had the best specificity, suggesting that these tests may lead to fewer unnecessary colposcopy referrals. Sample adequacy was not an issue for these samples, as they were collected by trained health care professionals in referral clinics. Sample adequacy can be a problem when using material after cytology slides have been prepared, but since we had two 20-ml vials, this was not an issue here.
The results for CINtec p16INK4a cytology were less favorable than those in other recent publications (33), which found a sensitivity of 100% in an LSIL population, with a specificity of 81.7%. However, there is a wide variation in the literature, which may be at least partly related to changes in and a lack of standardization in reporting of p16INK4a cytology results (10, 30). The results for CINtec p16INK4a cytology are similar to those of the concurrent cytology. However, the concurrent cytology was based on samples taken at colposcopy, where sampling and reading may be better than those for routine screening samples.
High sensitivity is clearly important. However, it is known that many CIN2 and some CIN3 lesions will regress spontaneously, and it is possible that tests with lower sensitivity still identify the lesions which are destined to progress to cancer. Among individuals with CIN2+ in our study, 13 tested positive by HC2 but were missed by Abbott RealTime, whereas only 2 tested positive by Abbott RealTime but were missed by HC2. These results differ from those of Poljak et al. (24) in the comparison of HC2 with Abbott RealTime, where 7 individuals with CIN3+ tested positive by HC2 but were missed by Abbott RealTime and 15 were positive by Abbott RealTime but negative by HC2.
Triage is most needed for women with a single mildly dyskaryotic or borderline dyskaryosis on smear. For this group, the absolute sensitivity and the relative performance of the tests were similar to those for the whole cohort, but specificity was higher, especially in the case of borderline dyskaryotic smears, indicating the value of a triage test for these women. These data provide useful relative comparisons of the performances of these tests in such a situation. However, a limitation of this study in terms of triage is that the HPV tests were not done on the referral sample, as would normally be the case when using liquid-based cytology.
It should be noted that this study was not designed to provide specific algorithms for cervical screening or triage but was designed to compare a range of HPV tests in a sample of women with a high disease rate, with a particular focus on evaluating the test sensitivities relative to each other. The very high sensitivity in this situation has provided reassurance that disease is unlikely to be missed in women with abnormal cytology. It also suggests that sensitivity will be very high in cytology-negative women, although this requires direct verification. However, there is already some evidence to support the relative performances of the tests in a screening setting, although only in pairwise comparisons (5, 6, 20, 25).
Compared with the population for our previous publication (29), this population contained a somewhat greater proportion of women aged under 29 years and fewer women aged over 45 years, which may account for the overall higher HPV positivity rates. This may also contribute to the higher proportion of individuals with CIN2 and the lower specificity observed compared to the previous study.
Our study confirms that, in a referral setting, HPV testing by a number of different tests offers high sensitivity for high-grade disease. However, this finding may not be extrapolated to a screening setting, and further work in unselected screening populations is needed.
We thank the staff in the Departments of Cytopathology and Colposcopy at the Hammersmith Hospital and St. Mary's Hospital, London, for their help with this study.
Jack Cuzick is on advisory boards for Gen-Probe, Roche, Qiagen, Abbott, and BD; Mark Stoler is a consultant to MSD, Roche, BD, Gen-Probe, Qiagen, and mtm laboratories; Christine Bergeron is on advisory boards for Gen-Probe, Roche, and mtm laboratories.
This study was supported by Cancer Research United Kingdom Programme grants C569/A10404 and C8162/A10406 and supplemented with financial contributions and/or assay kits from Qiagen, Gen-Probe Incorporated, Abbott Molecular, BD Diagnostics, mtm laboratories, NorChip, and Roche Diagnostics.
Published ahead of print 14 March 2012