We conducted a systematic review to evaluate characteristics of human papillomavirus (HPV) testing, particularly Hybrid Capture 2 (HC2), in follow-up after treatment for cervical intraepithelial neoplasia (CIN) for detection of residual or recurrent CIN grade 2 or worse.
MEDLINE was searched for relevant studies published between 1992 and September 2007. Of the 1,107 citations identified, 20 met inclusion criteria.
Studies using polymerase chain reaction (PCR) testing were too heterogeneous to combine. We identified 5 studies that performed both HC2 and colposcopy. Pooled sensitivity for HC2 was 90.7% (95% CI: 75.4%–96.9%) and pooled specificity was 74.6% (95% CI: 60.4%–85.0%). Pooled sensitivity for cervical cytology was 76.6% (95% CI: 62.0%–86.8%) and pooled specificity was 89.7% (95% CI: 22.7%–99.6%).
HC2 testing can identify about 91% of women with residual or recurrent CIN 2 or worse, but about 30% of women in follow-up would undergo colposcopy.
Guidelines from the American College of Obstetricians and Gynecologists (ACOG) and from the American Society for Colposcopy and Cervical Pathology (ASCCP) include testing for oncogenic types of human papillomavirus (HPV) as an acceptable strategy for surveillance after treatment of cervical intraepithelial neoplasia (CIN).1, 2 Published literature about the use of HPV testing for this purpose, however, is relatively limited. Prior reviews of HPV testing after treatment for CIN have included studies using two different techniques for HPV detection, polymerase chain reaction (PCR) and hybrid capture.3–5 Systematic reviews of HPV testing for primary screening were conducted in a different context, in which risk of CIN and the tissue architecture of the cervix differed from that of women after treatment. Drawing conclusions about the overall body of evidence, therefore, has been challenging.
PCR selectively amplifies HPV DNA, increasing the viral sequences in the sample. PCR techniques vary and studies may include a varying group of selected oncogenic types. Hybrid Capture 2 (HC2) identifies thirteen oncogenic HPV types (16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68) by detecting HPV DNA-RNA hybrids formed from the sample and captured by antibodies.6 Hybrid Capture 2 is the only HPV test approved by the Food and Drug Administration for commercial use in the United States. Hybrid Capture 1 (HC1) is an earlier version of this test and is no longer available. Test performance characteristics (sensitivity and specificity) of PCR testing have been shown to differ from those of HC2 testing in studies of HPV testing for primary screening.7
Three published systematic reviews have examined HPV testing in the context of post-treatment follow-up of CIN.3–5 One review noted “marked heterogeneity in the design, population, intervention and follow-up policy across different studies”.4 Indeed, all three systematic reviews included prospective and retrospective studies and studies using PCR methods as well as HC2 methods for HPV detection. Despite this heterogeneity, these reviews provided pooled estimates of sensitivity and specificity without regard to the number of different HPV types tested for or the HPV detection technique employed. Since the publication of these reviews, data from three studies on HPV testing and recurrent CIN have been published: one relatively large study8 and two smaller studies.9, 10 These new publications add substantially to the body of evidence for post-treatment HPV testing.
We conducted a systematic review and meta-analysis to determine the test characteristics of HPV testing for post-treatment detection of residual or recurrent CIN grade 2 or worse. More stable estimates of test-specific characteristics in this context will allow for the evaluation of potential outcomes of different testing strategies.
We searched MEDLINE for relevant citations from January 1, 1992 through September 20, 2007. Three search queries were used (Appendix). All searches were conducted independently and their results were combined with duplicates removed. References identified from any of the searches were candidates for further review. We also searched reference lists of review articles and of articles identified through our search for additional citations. Since this study was a systematic review, it was considered exempt from review by the University of California, Davis, Institutional Review Board.
Identification of relevant studies was conducted using a multi-step process. The titles of the full list of citations were reviewed for relevance by two reviewers. Abstracts of citations with potentially relevant titles were then reviewed by two reviewers. At the final step, the full text articles of the abstracts deemed relevant were reviewed for inclusion in the systematic review.
We included studies in which women were treated for any grade of CIN by conization, loop excisional procedures, laser ablation, or cryotherapy. All participants had to have had HPV testing within 12 months of treatment and a minimum of 12 months of follow-up. For a study to be considered for pooling, we required colposcopic evaluation with biopsy (when applicable) of all positive results in all women to determine the proportion of women with and without disease post-treatment. The histological distinctions between CIN 2 and CIN 3 are poorly reproducible according to current guidelines, which use CIN 2 as the threshold for treatment.2 Therefore, we chose CIN 2 or worse as the clinically important outcome rather than the more stringent criterion of CIN 3 or worse. In studies in which cytology was also performed, we used a threshold of atypical squamous cells of undetermined significance (ASCUS) to define a positive result since guidelines suggest this threshold for referral to colposcopy.11 Studies that had incomplete reporting of outcomes and interim studies with data included in a later report were excluded. We also excluded studies with more than 30% loss to follow-up.
The literature search identified 1,107 citations (Figure 1). Of these, 1,002 were excluded as not relevant by their titles. After review of the abstracts of the remaining 105 citations, a further 42 were considered not relevant and 63 underwent full text review. Among these, 20 studies were deemed relevant to the topic and met methodological criteria. Among the 43 excluded studies, 32 did not address the research question and 11 had methodological limitations that precluded them from inclusion. The limitations included unspecified treatment,12–16 less than 12 months of follow-up,17–19 follow-up HPV test not done,20 loss to follow-up >30%,21 and colposcopy or biopsy not performed to confirm the diagnosis of recurrence.22
Data were abstracted from full texts by two reviewers (JM and either BC or RA). Discrepancies were resolved by consensus. When the data presented in the published studies were ambiguous or missing, we contacted the corresponding author. Abstracted study characteristics included country and year of study, study design, initial and final sample sizes, mean or median age and range, type of initial treatment, type of HPV test (HC1, HC2, or PCR) and viral types included (for PCR), findings from HPV testing, cytology, colposcopy and biopsy, and mean or median length of follow-up and range. Quality indicators abstracted included blinded assessment and whether follow-up was influenced by HPV test results.
The relationship between HPV positivity and CIN grade 2 or worse at follow-up by colposcopy was abstracted from each study in a 2-by-2 table, which was used to calculate sensitivity and specificity. A continuity correction of 0.5 was added to cells in 2-by-2 tables with zero counts before calculating sensitivities and specificities. This continuity correction allows the use of the logit transform for meta-analysis and the calculation of variances when zero cell counts are present.23
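The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name and the example counts are hypothetical, and it assumes that when any cell is zero the 0.5 correction is added to all four cells (a common convention; the text does not say whether only the zero cells were adjusted).

```python
import math


def accuracy_from_2x2(tp, fp, fn, tn, correction=0.5):
    """Sensitivity and specificity from a 2-by-2 table, plus logit-scale variances.

    Assumption: if any cell is zero, `correction` is added to all four cells.
    """
    cells = [float(tp), float(fp), float(fn), float(tn)]
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    tp, fp, fn, tn = cells

    sens = tp / (tp + fn)  # proportion of diseased women who test positive
    spec = tn / (tn + fp)  # proportion of non-diseased women who test negative
    return {
        "sensitivity": sens,
        "specificity": spec,
        "logit_sens": math.log(sens / (1 - sens)),
        "logit_spec": math.log(spec / (1 - spec)),
        # Approximate variances of the logit-transformed proportions
        "var_logit_sens": 1 / tp + 1 / fn,
        "var_logit_spec": 1 / tn + 1 / fp,
    }


# Hypothetical table with no missed cases (fn = 0): without the correction,
# sensitivity would be exactly 1 and its logit would be undefined.
example = accuracy_from_2x2(tp=10, fp=30, fn=0, tn=60)
```

With the correction applied, the example table yields a sensitivity of 10.5/11 rather than 1, so both the logit and its variance are finite.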
Studies were grouped categorically according to the type of HPV test performed (HC1, HC2, or PCR). We reviewed characteristics of individual studies to determine whether pooling was appropriate, including assay type, HPV types included (for PCR), and whether colposcopy was performed on all subjects as a reference standard. When appropriate, pooled sensitivity and specificity were calculated using a bivariate normal model.24 The model assumes the logit transformed sensitivities and specificities have a bivariate normal distribution. The mean values for each incorporate a random effect to account for variability around the means, and hence, the variability between studies. Correlation between the sensitivity and specificity is explicitly modeled through the bivariate normal variance-covariance matrix. The bivariate normal model also incorporates the precision of the sensitivities and specificities of each study. Studies with more precise estimates are given more weight than studies with less precise estimates. 95% confidence regions were also calculated using the parameters estimated from the bivariate model.25 The model was estimated using the “nlme” package in R.26, 27
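For readers who prefer notation, a model of the kind described above is often written as follows. This is a sketch in conventional notation for a bivariate random-effects model; the symbols are illustrative assumptions and are not taken from the paper or its cited references. Here i indexes studies, the mu terms are the pooled logit sensitivity and specificity, Sigma is the between-study variance-covariance matrix, and C_i holds the within-study variances on the logit scale.

```latex
\begin{pmatrix}
  \operatorname{logit}(\widehat{Se}_i) \\
  \operatorname{logit}(\widehat{Sp}_i)
\end{pmatrix}
\sim \mathcal{N}\!\left(
  \begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},\;
  \Sigma + C_i
\right),
\qquad
\Sigma =
\begin{pmatrix}
  \sigma_{Se}^{2} & \sigma_{Se,Sp} \\
  \sigma_{Se,Sp} & \sigma_{Sp}^{2}
\end{pmatrix},
\qquad
C_i =
\begin{pmatrix}
  s_{Se,i}^{2} & 0 \\
  0 & s_{Sp,i}^{2}
\end{pmatrix}
```

Under this formulation the off-diagonal term of Sigma carries the correlation between sensitivity and specificity, studies with smaller within-study variances in C_i receive more weight, and the pooled sensitivity and specificity are obtained by applying the inverse logit to the estimated means.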
Since hypothesis tests for heterogeneity have poor statistical properties, we used the Q-statistic and the I2-statistic as indicators for potential heterogeneity.23, 28 When heterogeneity was quantitatively indicated, we further examined study characteristics for variation that may have accounted for the observed heterogeneity (e.g., age range of participants, time between colposcopy and procedure).
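As an illustration of the two indicators, the sketch below computes Cochran's Q from study-level estimates and their variances, and derives I2 from Q and the number of studies. The function names are hypothetical, and the formulas are the standard inverse-variance definitions rather than anything specific to this review.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance pooled mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(q, n_studies):
    """I2 as a percentage: share of total variability beyond chance, floored at 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Q values reported in this review for the five pooled HC2 studies
i2_specificity = i_squared(20.1, 5)  # about 80
i2_sensitivity = i_squared(1.7, 5)   # 0, because Q is below its degrees of freedom
```

Applied to the Q-statistics reported in the Results for five studies (four degrees of freedom), these definitions give I2 values that match the reported 80% for specificity and 0% for sensitivity.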
The 11 studies using PCR testing were highly heterogeneous in their study characteristics. First, PCR tests included a varying number of viral types, from as few as 2 types38, 39 to as many as 25.41 In addition, varying study designs were employed: 6 were prospective cohort studies36, 37, 40–43 while 5 were retrospective case-control studies.35, 38, 39, 44, 45 Few of the studies using PCR followed up all subjects with colposcopy.37, 40, 42 These studies also exhibited quantitative heterogeneity, reporting a wide range of sensitivities (29%–93%) and specificities (64%–98%). Thus, data from the PCR studies were not pooled.
The characteristics of the HC2 studies are shown in Table 1. Pooled sensitivities and specificities were calculated for the HC2 studies as this test has uniform test characteristics for a fixed set of oncogenic HPV types and it is the only HPV test approved for commercial use in the United States. Among the HC2 studies, two were excluded from pooling because follow-up colposcopy was not done on all women.31, 34 An additional study was excluded because it included only women who were referred for an abnormal Pap smear after treatment.30 Thus, 5 studies were used in the meta-analysis.8–10, 32, 33 All studies of HC2 used a positive cutoff of 1 pg/ml.
Recurrence rates of CIN grade 2 or worse, and sensitivities and specificities from the included HC2 studies are shown in Table 2. Among the 5 HC2 studies that were pooled, residual disease or recurrence of CIN 2 or worse was found in 1.1%–11.8% of women (pooled rate, 6.6%). The bivariate normal model fitted on these 5 studies estimated the pooled sensitivity for HC2 to be 90.7% (95% CI: 75.4%–96.9%) and the pooled specificity to be 74.6% (95% CI: 60.4%–85.0%) (Table 2 and Figure 2). The Q-statistic for sensitivity was 1.7 (P=0.80) and the I2-statistic was 0%. Both statistics suggest there was no heterogeneity in the sensitivities reported across the 5 studies. The Q-statistic for specificity was 20.1 (P<0.01) and the I2-statistic was 80%, indicating that 80% of the variability in the specificities across the 5 studies was due to heterogeneity. In other words, the specificities of the HC2 studies were heterogeneous. The heterogeneity was due to a relatively low specificity (63.8%) in the largest study.8 The mean age of participants in this study was over 10 years younger (24 years) than in any other HC2 study. A sensitivity analysis excluding this study resulted in a pooled sensitivity estimate of 91.7% and a pooled specificity estimate of 76.5%; neither appreciably different from the estimates when this study was included. Therefore, excluding this study did not seem necessary.
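Because the pooled estimates come from a model fitted on the logit scale, confidence intervals that look asymmetric in percentage terms should be roughly symmetric after a logit transform. The short check below applies this to the pooled HC2 estimates reported above. It is an illustrative sketch with hypothetical function names, not a reproduction of the fitted model.

```python
import math


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def logit_half_widths(estimate, lower, upper):
    """Distances from the point estimate to each CI limit on the logit scale."""
    return logit(estimate) - logit(lower), logit(upper) - logit(estimate)


# Pooled HC2 estimates as reported in the text (as proportions)
sens_widths = logit_half_widths(0.907, 0.754, 0.969)
spec_widths = logit_half_widths(0.746, 0.604, 0.850)
```

For both sensitivity and specificity the lower and upper half-widths agree to within rounding of the published percentages, which is what one would expect if the intervals were computed on the logit scale and then back-transformed.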
Among the HC2 studies, four included data on cervical cytology as well.8–10, 33 All used a threshold of ASCUS or worse for recurrent disease. Two of the studies used liquid-based cytology8, 10 while the other two used conventional cytology.9, 33 Combining the four studies using an ASCUS or worse threshold gave a pooled estimate of cytology sensitivity of 76.6% (95% CI: 62.0%–86.8%) and a pooled estimate of cytology specificity of 89.7% (95% CI: 22.7%–99.6%). The same four studies also included data on the combination of either HC2 positive or cytology positive, where a positive result on either test suggests referral to colposcopy and negative results on both tests suggest routine testing, which corresponds to clinical guidelines in the post-treatment context.2, 11 However, one study used an LSIL or worse threshold for cytology.8 From the three studies using an ASCUS or worse threshold the pooled sensitivity for the HC2 or cytology combination was 93.1% (95% CI: 16.7%–99.9%) and the pooled specificity was 75.7% (95% CI: 57.2%–87.9%) (Table 3). This specificity was lower than for HC2 alone pooled from the same three studies (79.0%; 95% CI: 69.3%–86.2%).8
For the surveillance of women after treatment for CIN, summary evidence indicates that 90.7% of women with residual or recurrent CIN 2 or worse can be identified by HC2 testing. Hybrid Capture 2 combined with cytology offers greater sensitivity (93.1%) than either test alone, but a somewhat lower specificity than HC2 alone (75.7%). Compared to the previous reviews of HPV testing after treatment, which pooled studies of PCR and HC2 testing without a common reference standard,3, 5 our approach yielded a similar value for sensitivity to that found by Zielinski et al,5 but lower than that found by Arbyn et al.3 The pooled specificities of both previous reviews were similar to ours. Women with a history of CIN are at higher risk for future CIN, and cervical treatments may change cervical architecture, potentially hindering adequate sampling and leading to decreased test sensitivity. However, our estimate of HC2 sensitivity was similar to that observed for HC2 in primary screening settings (90.7% versus 90.0%). Our estimate of pooled specificity, however, was considerably lower (74.6% versus 86.5%).7
The clinical implications of our findings are shown in Table 4. Given the mean prevalence of residual or recurrent CIN 2 or worse in the pooled studies of 6.6%, our findings indicate that in a theoretic cohort of 1000 treated women, 66 would have residual or recurrent disease. Assuming our summary estimates reflect a true difference in test accuracy between HC2 and cytology, using cytology surveillance, about 15% (n=147) would be referred to colposcopy, and 51 of the 66 women expected to have CIN 2 or worse would be identified. Using HC2 surveillance, about 30% (n=297) would be referred to colposcopy and 60 of 66 expected cases of CIN 2 or worse would be found. In other words, with HC2 testing, twice as many women would undergo colposcopy to identify 9 additional cases of residual or recurrent CIN 2 or worse for every 1000 women in post-treatment surveillance. In this setting, it seems prudent to offer a highly sensitive test; however, the impact of this clinical decision on the use of healthcare resources and the anxiety and discomfort resulting from extra procedures for women must be considered.46–49
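The arithmetic behind these projections can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it simply applies the pooled prevalence, sensitivity, and specificity reported in this review to a cohort of 1000 women, rounding to whole women at the end.

```python
def referral_projection(cohort, prevalence, sensitivity, specificity):
    """Expected colposcopy referrals and detected cases for a single test."""
    diseased = cohort * prevalence
    not_diseased = cohort - diseased
    true_pos = diseased * sensitivity               # cases correctly flagged
    false_pos = not_diseased * (1.0 - specificity)  # women referred without disease
    return {
        "diseased": round(diseased),
        "detected": round(true_pos),
        "referred": round(true_pos + false_pos),
    }


# Pooled estimates from this review: prevalence 6.6%;
# HC2 sensitivity 90.7%, specificity 74.6%;
# cytology sensitivity 76.6%, specificity 89.7%.
hc2 = referral_projection(1000, 0.066, 0.907, 0.746)
cytology = referral_projection(1000, 0.066, 0.766, 0.897)
extra_cases = hc2["detected"] - cytology["detected"]
```

Run as written, this gives 297 referrals and 60 detected cases for HC2 against 147 referrals and 51 detected cases for cytology, that is, 9 additional cases found for roughly twice as many colposcopies.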
The sensitivities reported in the studies of HC2 did not exhibit quantitative heterogeneity; the specificities, however, did. The study that contributed most of the heterogeneity to the estimate of specificity was the largest.8 This study, which had the lowest specificity, included the youngest cohort of women, with a mean age of 24 years, in contrast to the other HC2 study cohorts with mean ages between 34 and 40 years. Excluding this study from pooling did not yield appreciably different estimates. The heterogeneity found for HPV specificity in our review was consistent with the heterogeneity found in a review of primary screening studies, which also found lower specificity among younger women.50
Our findings are limited by the quantity and quality of available published studies addressing the properties of HPV testing for evaluation of women after treatment for CIN. We did not attempt to locate or include unpublished studies. We were not able to control for differences in patient populations between studies (such as time from treatment to first HPV test, sequential positivity and negativity, or HPV type) because of the small number of studies or lack of uniform reporting of such information; however, we did conduct a sensitivity analysis. Despite these limitations, our review of HPV testing after treatment for CIN applies the most rigorous methods of reviews published to date. Limits on current knowledge are highlighted by our findings, including the lack of randomized controlled trials comparing follow-up strategies and the limited number of studies applying a reference standard test (colposcopy) for all patients. Among those five studies that met our standards for pooling, the largest included 485 women and most were considerably smaller, while the longest mean follow-up time after treatment was only 24 months.
Recent randomized trials of HPV testing for primary screening have suggested that the higher sensitivity of HPV testing for CIN 2 or worse leads to earlier detection of these lesions and lower rates of abnormalities in subsequent screening rounds.51–54 The most recent guidelines addressing follow-up of women after treatment for CIN include HPV testing alone at 6–12 months as an initial follow-up strategy to determine whether women can return to routine screening beginning 12 months after treatment.1, 2 While a highly sensitive test may well be appropriate to detect recurrent CIN after women have undergone treatment, a long-term follow-up study of women in Sweden treated for CIN 2–3 that applied HPV testing retrospectively to archived specimens found that HPV testing 3–24 months after treatment had poor sensitivity (24%) for predicting risk of recurrence beyond 2 years.45 Initial follow-up with colposcopy in this setting offers an alternative as a highly sensitive but more costly test.55
Our findings suggest that policy recommendations for follow-up of women after treatment for CIN may have moved faster than the ability of current scientific evidence to define the most beneficial strategies. Ideally, a randomized, controlled trial comparing HPV testing to cytology, and perhaps to colposcopy, with follow-up over a period of at least five years, would be performed to clarify the best approach, and permit evaluation of the impact of both false-positive and false-negative findings. Alternatively, a large prospective cohort study with standardization of diagnostic tests and performance of a simultaneous reference standard (e.g., colposcopy) with follow-up over an extended period of time would provide important information. Until such data are available, cost-effectiveness and cost-utility models of available observational data will be valuable in determining surveillance strategies that maximize benefits and minimize harms.
Hybrid Capture 2 identifies 91% of the small proportion of women with post-treatment residual/recurrent disease, but 30% of women will test positive and need colposcopy.
Financial support: This research was supported by National Cancer Institute grant 1R01CA109142.
Previous presentation of findings: Preliminary results from this study were presented as a poster at the North American Primary Care Research Group (NAPCRG) 2007 annual meeting, Vancouver, BC, Canada, October 20–23, 2007.
Conflicts of interest to declare: None