Our objective was to assess the sensitivity and specificity of human papillomavirus (HPV) testing for cervical cancer screening in randomized trials. We conducted a systematic literature search of the following databases: MEDLINE, CINAHL, EMBASE, and Cochrane. Eligible studies were randomized trials comparing HPV-based to cytology-based screening strategies, with disease status determined by colposcopy/biopsy for participants with positive results. Disease rates (cervical intraepithelial neoplasia [CIN]2 or greater and CIN3 or greater), sensitivity, and positive predictive value were abstracted or calculated from the articles. Six studies met inclusion criteria. Relative sensitivities for detecting CIN3 or greater of HPV testing-based strategies vs cytology ranged from 0.8 to 2.1. The main limitation of our study was that testing methodologies and screening/management protocols were highly variable across studies. Screening strategies in which a single initial HPV-positive test led to colposcopy were more sensitive than cytology but resulted in higher colposcopy rates. These results have implications for cotesting with HPV and cytology as recommended in the United States.
Cytology, the primary modality for cervical cancer screening in the United States, has resulted in significant declines in cervical cancer morbidity and mortality.1 Nonetheless, cervical cytology has limitations, including a false-negative rate for cancer of at least 20%,2 leading to a search for more sensitive screening strategies. Human papillomavirus (HPV) is the causal factor of cervical cancer, and 18 HPV types including HPV16 and HPV18 have been associated with invasive cancer.3 HPV DNA testing has therefore been proposed as an alternative or adjunct for cervical cancer screening, with advantages that the test is more objective and sensitive than cytology.
Most women acquire HPV soon after sexual debut and spontaneously clear the virus within 1-2 years after infection; only approximately 10% of women remain HPV positive 5 years after acquisition.4 Although incident HPV infection is common, the risk for cervical cancer is associated with persistent infection. Therefore, the prognostic value of a single positive HPV test in young women is limited.
Recent guidelines in the United States recommend that in women older than 30 years, cervical cancer screening can be performed with either cytology every 3 years or cytology plus HPV cotesting every 5 years.5,6 In cases in which the HPV test is positive and cytology normal, repeating both tests in 12 months is recommended, unless HPV16 or HPV18 is present.7 In addition to cotesting as used in the United States, other strategies under evaluation globally include HPV testing alone without cytology and HPV testing first followed by cytology triage for positive results.
Metaanalyses comprised largely of observational studies have shown that HPV testing-based strategies are more sensitive but less specific than cytology-based screening strategies.8,9 Although observational studies are useful in determining test accuracy, performance estimates from observational studies can be biased when evaluating 2 diagnostic tests simultaneously in a manner that differs from how the tests are used separately. Potential bias is minimized when sensitivity and specificity of different strategies (single or combined tests) are analyzed in randomized trials.10 More reliable comparisons of sensitivity and specificity can be made in the context of randomized clinical trials conducted over 1 or more screening episodes.11
Our objective was to assess the sensitivity and specificity of HPV testing in randomized trials by summarizing data from randomized trials of various cervical cancer screening strategies that incorporate HPV testing.
We conducted a systematic literature search in 2010 of the following electronic databases: MEDLINE, CINAHL, EMBASE, and the Cochrane Library. We used the following subsets of search terms combined by the word “and:” (papillomavirus/papilloma-viridae/papilloma virus/hpv) and (cytodiagnosis/cytolo*/pap smear/papanicolaou/colposcopy/cervical smear/uterine smear/cervicovagina smear/cervix uteri smear/endocervix smear) and (cervical cancer/cervical intraepithelial neoplasia/CIN/cervical dysplasia/cervical neoplasm/uterine cervical neoplasm/uterine cervix tumor/cervix cancer/uterine cervix carcinoma-in situ/uterine cervix dysplasia/cervix dysplasia/cervix neoplasm/uterine cervix cancer/mass screening/cancer screening). The following limit was placed on all searches to retrieve primarily randomized studies: “random*.” No language restrictions were included. The references cited in the articles selected for study inclusion were hand searched for additional citations. For each primary study, citations of relevant articles were searched for in MEDLINE, and a cited reference search was conducted in the Institute for Scientific Information Web of Knowledge electronic database. As a secondary analysis of published data, the study was exempt from institutional review board approval.
We applied the following inclusion criteria: (1) the study was a randomized trial comparing HPV-based strategies to cytology-based strategies for primary cervical cancer screening and (2) disease status was determined by colposcopy/biopsy for study participants in whom treatment was warranted. Our study was limited to screening strategies and therefore did not assess the use of reflex HPV testing for a cytology result of atypical squamous cells of undetermined significance (ASCUS). None of the studies in this review used genotyping assays recently approved by the US Food and Drug Administration.
Studies were selected with a 2-step method. First, resulting titles and abstracts from literature searches were analyzed, and citations that were likely to meet the aforementioned criteria were chosen. The full manuscripts of these citations were then evaluated to determine whether full inclusion criteria were met.
Each study was abstracted onto pre-tested data abstraction forms by at least 2 reviewers with a third reviewer for adjudication of discrepancies. In cases of multiple publications from a single study, the publication analyzing the most recent data set was used. If this was unclear, the study authors were contacted for clarification; repeat data abstraction was performed if indicated by the authors’ response. In addition, although another study (the HPV FOCAL trial: a randomized trial of Human papillomavirus testing for cervical cancer screening)12 met our inclusion criteria, colposcopy results were not published; therefore, it was excluded from the systematic review. The final results from 1 trial were published after the performance of the systematic literature review; results from this publication13 were abstracted and used in the analysis primarily rather than the interim results.14
We devised a simplified classification system of different HPV testing strategies to determine whether differences in outcomes were related to testing strategy. These strategies, illustrated in Figure 1, included the following: (1) HPV testing alone (no cytology) with referral colposcopy for positive HPV test results, (2) HPV testing with cytology triage for positive HPV test results, (3) combination of cytology and HPV testing (cotesting) with an active response to positive HPV testing results (women with a positive initial HPV test were referred to colposcopy), and (4) cotesting with a passive response to positive HPV testing results (women with a positive HPV test but normal cytology underwent more frequent surveillance but were not referred to colposcopy based on the initial positive HPV test result).
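The four-way classification above can be written as a small decision function. The following Python sketch is illustrative only: the enum names, the function `initial_action`, and the returned labels are hypothetical. Where the text does not spell out a case (for example, abnormal cytology with a negative HPV test under cotesting, or normal triage cytology after a positive HPV test), the handling shown is an assumption, not any trial's protocol.

```python
from enum import Enum


class Strategy(Enum):
    HPV_ALONE = 1            # (1) HPV testing alone, no cytology
    HPV_CYTOLOGY_TRIAGE = 2  # (2) HPV first, cytology triage if HPV positive
    COTEST_ACTIVE = 3        # (3) cotesting, active response to HPV positive
    COTEST_PASSIVE = 4       # (4) cotesting, passive response to HPV positive


def initial_action(strategy, hpv_positive, cytology_abnormal=False):
    """Return 'colposcopy', 'surveillance', or 'routine' after the first screen.

    cytology_abnormal means cytology at or above the trial's referral
    threshold; it is ignored for HPV_ALONE.
    """
    if strategy is Strategy.HPV_ALONE:
        return "colposcopy" if hpv_positive else "routine"
    if strategy is Strategy.HPV_CYTOLOGY_TRIAGE:
        if not hpv_positive:
            return "routine"  # cytology is only read when HPV is positive
        # Assumption: HPV positive with normal triage cytology is followed up.
        return "colposcopy" if cytology_abnormal else "surveillance"
    if strategy is Strategy.COTEST_ACTIVE:
        # A single positive HPV test is enough for referral.
        # Assumption: abnormal cytology alone also leads to referral.
        return "colposcopy" if (hpv_positive or cytology_abnormal) else "routine"
    if strategy is Strategy.COTEST_PASSIVE:
        if cytology_abnormal:
            return "colposcopy"
        # HPV positive with normal cytology: closer surveillance, no referral.
        return "surveillance" if hpv_positive else "routine"
    raise ValueError(f"unknown strategy: {strategy}")
```

The contrast that matters for the analysis is between strategies 3 and 4: for a woman who is HPV positive with normal cytology, the active response returns colposcopy and the passive response returns surveillance.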
Quality was assessed using the following markers for internal validity: method of randomization, blinding, intention-to-treat analysis, and loss to follow-up. Blinding was assessed with a point given for each entity blinded: patient, clinician, pathologist, cytologist, and statistician.
Study quality was determined by using the Quality Assessment of Diagnostic Accuracy (QUADAS) questionnaire, an evidence-based tool that assesses the quality of diagnostic accuracy studies.15 Although a 14-point QUADAS tool is used primarily to assess the quality of observational studies rather than randomized trials of diagnostic tests, we used a modified 11-point assessment to provide additional information on study quality. The modified QUADAS tool covered the following topics: representativeness of the study population to a screening population, description of selection criteria, validity of reference standard, the delay between index and reference tests, the reproducibility of these tests, blinding, clinical information available to clinicians, and reporting of inadequate results and loss to follow-up. Based on the criteria used in previous studies, we chose a score of 50% or higher (at least 6 of 11 points) to denote a high-quality study.15
Disease outcome measures for the purposes of this study were cervical intraepithelial neoplasia (CIN)2 or worse (CIN2 or greater) and CIN3 or worse (CIN3 or greater). For each study, the HPV test positivity rate, the abnormal cytology rate, or the percentage referred to colposcopy was either abstracted or calculated. When the positivity/abnormality rate was not recorded, the colposcopy referral rate was used instead because these values should theoretically be the same. The only exception was for passive-response studies in which a positive HPV test did not result in a referral to colposcopy. The only passive-response study that did not give a colposcopy referral rate was phase 1 of the New Technologies for Cervical Cancer Screening (NTCCS) trial in the 25-34 year age group.16 Hereafter, the term "test positivity rate" signifies the HPV positivity rate, the abnormal cytology rate, or the colposcopy referral rate.
Using these values, the rates of CIN2 or greater and CIN3 or greater per positive test result were calculated by dividing the total number of CIN cases by the number of positive test results (the test positivity rate multiplied by the total number of participants in each group). To calculate the rate of disease per person screened, the total number of participants was used as the denominator instead. Most of these data were not available for the second round of studies with multiple screening rounds; therefore, these measures were reported only for the first round of screening.
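The two rates described above can be sketched in a few lines of Python. The function name and the example counts are hypothetical and are not taken from any of the included trials.

```python
def disease_rates(n_participants, test_positivity_rate, n_cin_cases):
    """Return (rate per positive test result, rate per person screened)."""
    # Number of positive results is reconstructed from the positivity rate.
    n_positive = test_positivity_rate * n_participants
    per_positive = n_cin_cases / n_positive
    per_screened = n_cin_cases / n_participants
    return per_positive, per_screened


# Hypothetical arm: 10,000 women screened, 5% test positive, 50 CIN cases.
per_positive, per_screened = disease_rates(10_000, 0.05, 50)
print(f"per positive test: {per_positive:.3f}, per woman screened: {per_screened:.4f}")
```

With these made-up inputs, 500 women test positive, so the rate per positive result is 50/500 and the rate per woman screened is 50/10,000.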
To assess the performance of the different strategies, the values for relative sensitivity, specificity, and relative positive predictive value (PPV) for HPV testing vs cytology-based strategies were directly abstracted from the publication if available. If not stated in the text, relative sensitivity was computed by dividing the disease rate in the HPV-based testing group by the disease rate in the cytology-based testing group.
Comparisons of test specificity were reported for A Randomised Trial of HPV Testing in Primary Cervical Screening (ARTISTIC)18 and the Finnish trial.20 The relative PPV was used as a marker for specificity; PPV was calculated as the number of cases found divided by the number of participants who underwent colposcopy. The relative sensitivities for the Population Based Screening Study Amsterdam (POBASCAM) trial13,14 and relative sensitivities and relative PPVs for the India trial17 were calculated in the manner described in the previous text. The relative sensitivities and PPVs were calculated for the ARTISTIC trial using published sensitivities and PPVs.18
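As an illustration of the two ratios defined above, the following Python sketch computes a relative sensitivity from arm-level disease rates and a relative PPV from cases per colposcopy. The function names and all counts are hypothetical; in both ratios the cytology arm is the reference.

```python
def relative_sensitivity(cases_hpv, screened_hpv, cases_cyt, screened_cyt):
    """Disease rate in the HPV-based arm divided by that in the cytology arm."""
    return (cases_hpv / screened_hpv) / (cases_cyt / screened_cyt)


def ppv(cases, colposcopies):
    """Cases found divided by participants who underwent colposcopy."""
    return cases / colposcopies


def relative_ppv(cases_hpv, colpo_hpv, cases_cyt, colpo_cyt):
    """PPV in the HPV-based arm divided by PPV in the cytology arm."""
    return ppv(cases_hpv, colpo_hpv) / ppv(cases_cyt, colpo_cyt)


# Hypothetical arms of 20,000 women each.
rs = relative_sensitivity(60, 20_000, 40, 20_000)
rp = relative_ppv(60, 1_200, 40, 600)
print(f"relative sensitivity: {rs:.2f}, relative PPV: {rp:.2f}")
```

In this made-up example the HPV-based arm finds more disease (relative sensitivity above 1) but needs twice as many colposcopies to do so (relative PPV below 1), which is the tradeoff the review examines.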
One potential harm associated with cervical cancer screening is the excess number of colposcopies and biopsies needed to detect 1 case of disease (number needed to colposcopy [NNC]). The NNC for CIN2 or greater and CIN3 or greater was calculated for round 1 as the reciprocal of the number of CIN cases per positive test result. The NNC was not calculated for the NTCCS phase 1 study16 in 25-34 year olds because the colposcopy rate was not given. The SwedeScreen study19 was also not included because overall sensitivities and PPVs were not published. The colposcopy rate used for the POBASCAM trial was reported in the earlier publication.14
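The NNC definition above amounts to inverting the number of cases per positive test result. A minimal Python sketch follows; the function name and the counts are hypothetical.

```python
def number_needed_to_colposcopy(n_cases, n_positive):
    """Reciprocal of cases per positive test result: positives per case found."""
    if n_cases <= 0:
        raise ValueError("NNC is undefined when no cases were detected")
    cases_per_positive = n_cases / n_positive
    return 1.0 / cases_per_positive


# Hypothetical arm: 500 positive results leading to 50 CIN3-or-greater cases.
print(number_needed_to_colposcopy(50, 500))
```

A lower NNC means fewer colposcopies per case detected, so an arm with a higher NNC exposes more women to colposcopy for each case it finds.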
Statistical methods used to evaluate sensitivities varied between individual studies and included a χ2 test, a Fisher exact test, and Poisson regression analysis. For the studies in which we calculated relative sensitivities based on the published data, statistical comparison was not performed.
The NNC values were compared statistically by calculating a z-statistic: the difference between the NNC for the HPV arm and the NNC for the cytology arm, divided by the SE of that difference. A P value was then determined for this z-statistic, and statistical significance was defined as P < .05.
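The text does not state how the SE of each NNC was obtained. The Python sketch below shows one plausible way to carry out such a comparison and should not be read as the authors' exact procedure. It assumes a delta-method SE for the reciprocal of a binomial proportion, independent arms, and a 2-sided normal P value; the function names and counts are hypothetical.

```python
import math


def nnc_with_se(n_cases, n_colposcopies):
    """Return (NNC, approximate SE of NNC) for one study arm."""
    p = n_cases / n_colposcopies                    # cases per colposcopy
    nnc = 1.0 / p
    se_p = math.sqrt(p * (1.0 - p) / n_colposcopies)
    se_nnc = se_p / p ** 2                          # assumption: delta method for 1/p
    return nnc, se_nnc


def compare_nnc(cases_a, colpo_a, cases_b, colpo_b):
    """Return (z, two-sided P value) for the difference NNC_a - NNC_b."""
    nnc_a, se_a = nnc_with_se(cases_a, colpo_a)
    nnc_b, se_b = nnc_with_se(cases_b, colpo_b)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)      # assumption: independent arms
    z = (nnc_a - nnc_b) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))    # 2-sided normal tail area
    return z, p_value


# Hypothetical arms: HPV arm 50 cases in 1,000 colposcopies (NNC of 20),
# cytology arm 50 cases in 500 colposcopies (NNC of 10).
z, p_value = compare_nnc(50, 1_000, 50, 500)
print(f"z = {z:.2f}, significant at .05: {p_value < 0.05}")
```

The delta-method approximation is reasonable when the number of cases per arm is not very small; with sparse outcomes such as cancer, an exact or resampling-based comparison would be a safer choice.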
Six studies met inclusion criteria and were chosen after undergoing the evaluation process described in the previous text (Figure 2).13,14,16-20 Covariate characteristics such as study location, study size, and exclusion criteria are summarized in Table 1. The sample sizes ranged from 12,410 to 131,746 women; combined the studies included a total of 422,084 women. Four studies had data available from 2 screening rounds. A round is defined by a screening episode; in the second round, participants from the first round were rescreened. The time between screening intervals ranged from 3 to 5 years. Outcome measures ranged from CIN1 or greater to cancer death. Cancer is rare in countries with established screening programs and the numbers of cases were small in most studies.
The NTCCS study16 evaluated 3 different HPV testing–based methodologies compared with cytology in separate populations imbedded in 1 large study. Phase 1 evaluated a combination of HPV and cytology with differing responses to the HPV test based on age: younger women received passive follow-up, whereas older women received active follow-up (Figure 1). Phase 2 enrolled additional participants to evaluate HPV testing alone compared with cytology in all ages. The different trials were analyzed individually. Although the NTCCS trials categorized results based on age, most studies did not do this for all of their outcomes.
The studies differed in important methodological aspects such as HPV testing methods and thresholds for colposcopy (Table 1). The POBASCAM13,14 and SwedeScreen19 studies used general primer pair GP5-GP6 for HPV DNA polymerase chain reaction testing, whereas the other studies used the Hybrid Capture 2 (hc2) high-risk HPV test (Qiagen, Gaithersburg, MD).
Some studies used liquid-based cytology (LBC), whereas others used conventional cytology. Phase 1 of the NTCCS16 study used conventional cytology in the control arm and LBC combined with HPV testing in the experimental arm. Also, NTCCS16 used HPV testing and/or cytology in round 1 but only cytology in round 2 for all participants, meaning that the randomized screening strategy was not continued into round 2. Similarly, POBASCAM13,14 performed both HPV testing and cytology in all participants in round 2. Most importantly, the cytological and temporal thresholds for colposcopy referral varied between studies.
Two studies used ASCUS, 1 used low-grade squamous intraepithelial lesion (LSIL), and 3 used high-grade squamous intraepithelial lesion (HSIL) as the threshold for colposcopy. One study with passive response retested at 6 and 18 months, and 3 studies with passive response retested at 12 months (Table 1).
The presence of study quality measures is summarized in Table 2. Studies ranged in duration from 6.5 to 8 years. All studies had 2 rounds over this time period with the exception of the India study.17 The second round of the Finnish study20 is still in progress. Most studies were of good quality, scoring 9-10 of 11 with the QUADAS tool. Although still of high quality, the Finnish study20 had the lowest QUADAS score (6 of 11) because participant selection criteria were not clearly defined, the delay between positive index test and colposcopy was not described, colposcopy methods were not described in detail, and unsatisfactory test results and loss to follow-up were not reported.
Given that the HPV testing strategy, the HPV testing method, the cytology threshold for colposcopic referral, the type of cytology (LBC vs conventional), and age were heterogeneous in the included studies, a formal meta-analysis could not be conducted.
To compare the numbers of interventions required to find disease, the test positivity rates and the rates of disease per woman screened and per woman who tested positive in round 1 are presented in Table 3. The test positivity for the HPV-based strategies ranged from 1.2% to 13.1% and the test positivity for the cytology-based strategies ranged from 1.2% to 7.0%. The wide range in positivity rates for HPV-based strategies was most likely related to population differences in HPV prevalence and different responses to positive HPV tests.
The range in cytology-based test positivity generally varied according to the cutoff threshold for colposcopy. For example, POBASCAM13,14 had a low abnormal cytology rate of 1.3%, but the threshold for colposcopy was a diagnosis of HSIL or worse. Higher rates of abnormal cytology (≥3.1%) were observed in NTCCS16 using a colposcopy threshold of ASCUS or worse. However, this pattern did not hold true in the ARTISTIC trial,18 in which the test positivity rate for cytology was 5.2% despite the colposcopy threshold being HSIL or worse.
Rates of disease per woman screened also varied widely. The rate of CIN2 or greater per woman screened ranged from 0.4–2.5% in the HPV testing groups and from 0.3–2.2% for cytology alone, whereas the rate of CIN3 or greater ranged from 0.1% to 1.3% in both the HPV-based testing groups and cytology groups.
Table 4 tabulates the relative sensitivities and relative PPVs for each round of each study, with cytology alone as the reference standard. In general, HPV testing was more sensitive for CIN2 or greater and CIN3 or greater than cytology in the first round of screening and less sensitive in the second round of screening.
The overall relative sensitivities of HPV-based strategies for detecting CIN3 or greater ranged from 0.9 to 2.1. For the 4 studies that used a strategy with a passive response to a positive HPV test result, the overall relative sensitivities for CIN3 or greater ranged from 0.9 to 1.1. For the NTCCS studies16 with an active response to a single positive HPV result, the relative sensitivities for CIN3 or greater ranged from 1.6 to 2.1 and were statistically significant. Thus, higher sensitivities for HPV testing–based strategies were observed only for strategies that incorporated immediate referral to colposcopy based on a single initial positive HPV test.
Regarding cancer outcomes, the NTCCS trial showed significantly decreased detection of invasive cancer during the second round of screening and overall after 2 rounds of screening in the HPV group.16 The India trial showed no difference in cancer detection between the HPV-only testing group and the cytology-only testing group; however, the incidence rates of stage II or higher cervical cancer and death from cervical cancer were higher in the cytology group than in the HPV testing group.17 The SwedeScreen trial described the numbers and histology of the cancers detected in the intervention and control groups; the total numbers were small and the authors did not perform a statistical comparison.19 The POBASCAM study reported that the number of cancers detected in the second round of screening was significantly lower in the intervention group than the control group, but the difference was not statistically significant overall after 2 rounds of screening.13
Only 2 studies reported specificities. The ARTISTIC trial reported that the specificity for detecting CIN2 or greater during round 1 was lower with cytology with HPV cotesting compared with cytology alone.18 Conversely, the Finnish trial reported a higher specificity when comparing HPV testing with cytology triage with cytology alone.20
Because specificities were not given for most studies, the relative PPV for HPV-based strategies compared with cytology-based strategies was used as a marker for specificity. Most studies did not report relative PPVs for round 2 or overall for both rounds. For round 1, the relative PPVs of HPV-based strategies compared with cytology-based strategies for detection of CIN2 or greater ranged from 0.4 to 1.3 and of CIN3 or greater ranged from 0.2 to 1.2. In the studies in which the threshold for colposcopy was ASCUS or greater, the relative PPV ranged from 0.2 to 0.9, whereas in the ARTISTIC trial,18 in which only women with HSIL or worse were referred to colposcopy, the relative PPV was 1.0.
To compare the potential harms of the various strategies, we calculated the numbers of women who would need to undergo colposcopy to detect 1 case of disease from round 1 of each study (CIN2 or greater and CIN3 or greater in Figures 3 and 4, respectively). For detection of CIN2 or greater, significantly higher numbers of colposcopies were needed in the HPV arm in the India trial,17 the NTCCS phase 1 trial in women 35-60 years of age,16 and the ARTISTIC trial,18 whereas significantly higher numbers of colposcopies were needed in the cytology arm in the Finnish trial20 (Figure 3). For detection of CIN3 or greater, significantly higher numbers of colposcopies were needed in the HPV arm in the NTCCS phase 1 trial in women 35-60 years of age16 and in the ARTISTIC trial18 (Figure 4). For the India study,17 which reported cancer rather than CIN3 or greater as an outcome, the NNC values were 27.7 and 14.8 for the HPV-based and cytology-based groups, respectively, to find 1 case of cancer; this difference was statistically significant (P < .0001).
This systematic review synthesizes and compares results from all of the randomized trials with published results comparing cytology with HPV-based testing for cervical cancer screening. Although cervical cancer morbidity and mortality are the most relevant outcomes, cancer is rare in countries with established screening programs. Given that CIN2 reverts in up to 40% of cases,21 detection of CIN2 may result in overtreatment and morbidity in reproductive-aged women; therefore, CIN3 or greater is likely the most clinically relevant outcome in cervical cancer screening trials. One consistent finding from all the studies is that CIN3 or greater is an uncommon outcome, which necessitates a careful assessment of tradeoffs between sensitivity and specificity. The low rates of CIN3 or greater are of additional significance, given that the intervals between screening rounds in the studies (3-5 years) are greater than the screening interval of 1-2 years previously used in the United States until the recent introduction of new guidelines.5,6
The results presented here indicate that in randomized trials, HPV testing is significantly more sensitive for CIN3 or greater during the first round of screening for some, but not all, strategies. The fact that sensitivity of HPV testing is lower than that of cytology in the second round suggests that as more disease is detected and treated in the first round, there will be less disease to detect subsequently. After completion of both screening rounds, sensitivity for detection of CIN3 or greater was significantly increased only with strategies involving an active response to a positive HPV test (ie, immediate colposcopy).16,17 This approach, however, also increases false-positive results as indicated by the relative PPV for detection of CIN3 or greater of 0.50 for women aged 35-60 years in the NTCCS trial.16
The strategy that has been adopted in the United States is to add HPV testing to cytology for women over age 30 years, and to respond passively to HPV-positive results if the cytology is normal, rather than to perform immediate colposcopy.5,6 Four trials in our analysis used a similar passive response with cotesting.13,16,18,19 None of these trials demonstrated an overall significantly increased relative sensitivity for CIN3 or greater, and only 1 trial demonstrated a significantly increased relative sensitivity for CIN2 or greater.
Another important metric is the number of colposcopies needed to find a case of disease. In the ARTISTIC trial, significantly higher numbers of colposcopies were needed in the HPV cotesting arm for detection of CIN3 or greater in round 1, but this did not result in increased sensitivity. The calculated number of colposcopies likely underestimates the true number of colposcopies because it reflects the initial test positivity rate and does not include subsequent colposcopies resulting from increased surveillance in women who remain HPV positive over time. Thus, it will be important to monitor the impact of cotesting on overall colposcopy rates as the new screening guidelines are adopted.
There are many strengths of this review. All studies were high-quality randomized trials with large study populations and provide longitudinal data about disease detection over multiple rounds of screening. This review also provides a novel method to estimate and compare the impact of different screening strategies through the calculation of the number of colposcopies needed to detect a single case of CIN2 or greater or CIN3 or greater.
This review also has limitations. We chose to review randomized controlled trials based on the strength of their study methodology and, furthermore, assessed study quality using a validated tool. However, there were weaknesses in individual studies that fall outside the parameters measured in this tool. For example, 3 of the trials used HPV testing on women under the age of 30 years (Table 1), an age group in which HPV testing is less specific because of the higher prevalence of transient HPV infections. The rate of loss to follow-up was high in some of the studies (Table 2). As discussed in the Results section, in 2 of the trials, the randomized screening strategy was not continued into the second round of screening.13,14,16
Additional limitations include the fact that results on relative specificity of HPV testing compared with cytology were not reported in most of the trials, limiting the ability to compare test performance. The varied strategies for incorporating HPV testing and the differing thresholds for colposcopy used by the studies presented significant challenges in comparing the performance of the screening strategies across studies. The studies were performed before the widespread uptake of HPV vaccination, and test performance is likely to change as the prevalence of HPV decreases. The number of screening rounds studied in these trials to date is insufficient to assess the impact of HPV testing over a woman’s lifetime. Further data will be forthcoming from additional rounds of screening from some of these trials. In addition, all of these studies were completed outside the United States and followed their respective country’s screening guidelines, all of which differed from US guidelines.
This review highlights the need for a longitudinal randomized trial performed in the United States over multiple screening rounds of 3-5 year intervals in which relative specificities and rates of colposcopy are reported. In addition, triage of women with HPV-positive results directly to colposcopy should be considered in any prospective US trial.
In summary, this systematic review indicates that after 2 rounds of screening, HPV testing-based screening strategies are more sensitive than cytology for the detection of CIN3 or greater only when referral to colposcopy follows a single positive HPV test. This strategy results in more colposcopies needed to detect a single case of CIN3 or greater or cancer, especially in women over 35 years of age. Because CIN3 and cervical cancer are rare in well-screened populations, the gain in disease detection needs to be balanced against the impact on cost, numbers of colposcopies, and morbidity associated with potential overtreatment.
We thank Dr John Boscardin, Associate Professor of Medicine and Biostatistics, University of California, San Francisco, School of Medicine, for assistance in statistical analysis.
This study was supported by the University of California, San Francisco, Dean’s Quarterly Research Fellowship (I.Y.P.).
The current affiliation for Dr Patanwala is the Department of Obstetrics and Gynecology, University of Chicago, Chicago, IL.
The current affiliation for Dr Miyamoto is the Department of Pediatrics, University of Washington, Seattle, WA.