In 2012, the United States Preventive Services Task Force (USPSTF) and a consensus of 25 organizations endorsed concurrent cytology and HPV testing (“cotesting”) for cervical cancer screening. Past screening and management guidelines were implicitly based on risks defined by Pap-alone, without consideration of HPV test results. To promote management that is consistent with accepted practice, new guidelines incorporating cotesting should aim to achieve equal management of women at equal risk of cervical intraepithelial neoplasia grade 3 and cancer (CIN3+).
We estimated cumulative 5-year risks of CIN3+ for 965,360 women aged 30–64 undergoing cotesting at Kaiser Permanente Northern California from 2003 to 2010. We calculated the implicit risk thresholds for Pap-alone and applied them to develop new management guidance for HPV and Pap cotesting, illustrating the approach with 2 examples: HPV-positive/ASC-US and HPV-negative/Pap-negative. We call this guidance process “benchmarking”.
LSIL, for which immediate colposcopy is prescribed, carries a 5-year CIN3+ risk of 5.2%, suggesting that test results with similar risks should also be managed with colposcopy. Similarly, ASC-US (2.6% risk) is managed with 6–12 month follow-up, and Pap-negative (0.26% risk) is managed with 3-year follow-up. The 5-year CIN3+ risk for women with HPV-positive/ASC-US was 6.8% (95%CI 6.2% to 7.6%). This is greater than the 5.2% risk implicitly leading to referral to colposcopy, consistent with current management recommendations that HPV-positive/ASC-US should be referred for immediate colposcopy. The 5-year CIN3+ risk for women with HPV-negative/Pap-negative was 0.08% (95%CI 0.07% to 0.09%), far below the 0.26% implicitly required for a 3-year return, justifying a longer (e.g., 5-year) return.
Using the principle of “equal management of equal risks,” benchmarking to implicit risk thresholds based on Pap-alone can be used to achieve safe and consistent incorporation of cotesting.
In 2012, cervical cancer screening guidelines from both the United States Preventive Services Task Force (USPSTF) (1) and a consensus of 25 organizations endorsed concurrent Pap and HPV testing (“cotesting”) for women age 30 and older (2). These national guidelines recommended that women testing HPV-negative/Pap-negative are at sufficiently low risk of cervical cancer that they can return in 5 years for routine screening. However, management of nearly every other combination of Pap result and HPV test result was left unresolved, as were other management issues such as incorporating HPV testing into post-colposcopic management. In response, the American Society for Colposcopy and Cervical Pathology (ASCCP) convened a consensus meeting, mainly to address management of abnormal cotesting results.
Because many complex combinations of Pap, HPV, and histologic test results can occur, especially over time, an organizing principle is needed to ensure that guidelines promote rational and consistent management. The fundamental organizing principle should be based on risk of precancer and cancer, because risk summarizes a complex combination of test results over time into a single number that forms the basis for action. If 2 very different combinations of screening tests yield the same risk of precancer and cancer, then, all other things being nearly equal, the 2 combinations should be managed equally. This fundamental principle, “equal management of equal risks,” should ensure simplified, safe, and consistent management of different complex combinations of tests that imply equal risk of precancer and cancer.
To use risk for guidelines development, we expand on the concept of “benchmarking to implicit risk thresholds” (3, 4). Until recently, cervical cancer screening was based on Pap testing alone (“Pap-alone”) without consideration of HPV test results. Underlying this was the acceptance by treating clinicians of “implicit risk.” Under this concept, risk estimates were not known when screening and management guidelines were created, but there was an underlying understanding of which screening results carried the greatest risk of clinically important outcomes, such as CIN3 and cancer. As a result, different abnormal Pap and biopsy results were managed with interventions of different aggressiveness depending on the implicit risk they carried (e.g., immediate colposcopy, return for repeat Pap testing in 6–12 months, or repeat routine screening in 3 years). Implicitly, when risk exceeded a given threshold, guidelines triggered the corresponding management option.
These accepted risks for Pap-alone can be used to determine how to incorporate a new testing strategy like cotesting. Risks can be calculated for each cotest combination, and matched (“benchmarked”) to the most similar risk based on Pap-alone. In accordance with the principle of “equal management of equal risks”, the management option for the cotest result would then be the management option for the Pap-alone result with the most similar risk. Also, benchmarking to implicit risk thresholds ensures that new guidelines promote management that is comparable to existing and accepted practice, i.e., the implicit risk thresholds for each management option remain unchanged and are merely applied to cotesting.
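The matching step can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: it assumes each result is summarized by a single 5-year CIN3+ risk, uses as benchmarks the Pap-alone 5-year risks reported in the Results of this article, and measures "most similar" as the distance between log risks (an assumption; the text does not define a distance). The function name `benchmark` is hypothetical. It also flags risks that fall below every benchmark, because such results may merit a longer interval than any Pap-alone option.

```python
import math

# Pap-alone 5-year CIN3+ risks (percent) in KPNC and the management each
# implicitly triggers, as reported in this article.
PAP_ALONE_BENCHMARKS = {
    "LSIL": (5.2, "immediate colposcopy"),
    "ASC-US": (2.6, "repeat testing in 6-12 months"),
    "Pap-negative": (0.26, "routine screening in 3 years"),
}


def benchmark(risk_pct, benchmarks=PAP_ALONE_BENCHMARKS):
    """Match a cotest 5-year CIN3+ risk to the most similar Pap-alone risk.

    Returns (matched Pap-alone result, its management, below_all), where
    below_all is True when the cotest risk is lower than every benchmark,
    signalling that a longer interval than any Pap-alone option may apply.
    Similarity is the absolute difference of log risks (an assumption).
    """
    if risk_pct <= 0:
        raise ValueError("risk must be positive")
    name = min(
        benchmarks,
        key=lambda k: abs(math.log(risk_pct) - math.log(benchmarks[k][0])),
    )
    below_all = risk_pct < min(r for r, _ in benchmarks.values())
    return name, benchmarks[name][1], below_all


# HPV-positive/ASC-US (6.8%) and HPV-negative/Pap-negative (0.08%)
print(benchmark(6.8))
print(benchmark(0.08))
```

With these inputs, HPV-positive/ASC-US matches LSIL (immediate colposcopy), while HPV-negative/Pap-negative matches Pap-negative but is flagged as lying below all benchmarks, mirroring the reasoning used later in the article for a return interval longer than 3 years.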
To calculate risks of CIN2+, CIN3+, or cancer, we used data from 965,360 women aged 30–64 undergoing cotesting (and, in some of the accompanying articles, nearly 400,000 additional women aged 21–29 undergoing Pap-alone testing with HPV triage) from 2003–2010 at Kaiser Permanente Northern California (KPNC). These data represent by far the longest and largest clinical experience with HPV and cytology cotesting, with longitudinal data on nearly every Pap, HPV test, biopsy, and treatment conducted, and with more than 400 cancers, 4,000 CIN3+, and 10,000 CIN2+ diagnosed in this cohort since 2003. The new longitudinal data permit evaluation of follow-up tests and screening intervals. Using the KPNC data, we benchmark the risk for each cotest result to the implicit risk thresholds derived from the risks for each Pap-alone result to suggest how new guidelines might incorporate cotesting according to the principle of “equal management of equal risks”. In this introductory article, we describe the principles of benchmarking and apply them to 2 cotest combinations, HPV-positive/ASC-US and HPV-negative/Pap-negative, for guideline development.
The design of our cohort study from KPNC has been described previously (5). Briefly, we examined 965,360 women aged 30–64 from 2003 to 2010. For this introduction to the principles of risk estimation to guide management, we excluded women under age 30 and the fewer than 5% of women with an unknown Pap result. Biopsy and cancer information was collected on all women through December 31, 2010. The data were matched to the Bay Area Cancer Registry to improve identification of all cancer cases (including women who may have left KPNC). The KPNC Institutional Review Board (IRB) approved use of the data, and the National Institutes of Health Office of Human Subjects Research deemed this study exempt from IRB review.
Pap tests were performed at KPNC regional and facility labs. HPV testing was performed at the single regional lab. Conventional Pap slides were manually reviewed following processing by the BD FocalPoint Slide Profiler (BD Diagnostics, Burlington, NC, USA) primary screening and directed quality control system, in accordance with FDA-approved protocols. Starting in 2009, KPNC transitioned to liquid-based Pap using BD SurePath (BD Diagnostics, Burlington, NC, USA). Conventional and liquid-based Pap tests were reported according to the 2001 Bethesda System (6). Hybrid Capture 2 (HC2; Qiagen, Germantown, MD, USA) was used to test for high-risk HPV types according to the manufacturer’s instructions.
Women were followed according to routine local practice. The Permanente Medical Group (TPMG), which is the physician component of KPNC, develops Clinical Practice Guidelines for cervical cancer screening and management of abnormal tests in KPNC in partnership with the KP National Guideline Program, Care Management Institute, to support the clinical decisions of its providers. Almost all women with ASC-US receive HPV triage, and cotesting is almost always performed for women 30 and older. Women with HPV-negative/Pap-negative cotests are rescreened in 3 years. Cotesting is sometimes used in management of abnormal results.
Cumulative risk of CIN2+, CIN3+, or cervical cancer for each cotest result was calculated as the sum of the risk at the baseline screen (plotted at time zero on each figure) and the risk after baseline. Because cancer itself is rare, we focused on CIN3+. As an example, suppose 1,000 women are found to have LSIL at the baseline screen, 24 are diagnosed with CIN3+ at the biopsy immediately following the baseline screen, and another 28 are diagnosed with CIN3+ over the next 5 years. Then the immediate CIN3+ risk following LSIL is 24/1000 = 2.4%, and the 5-year cumulative risk for LSIL is 2.4% + 28/1000 = 2.4% + 2.8% = 5.2%. We focused primarily on risk of CIN3+ rather than CIN2+ because CIN2 is unreliably diagnosed by pathologists (7, 8), often regresses (9, 10), and may simply reflect uncertainty between acute HPV infection (CIN1) and CIN3 (11). Of note, these analyses predated the adoption of the new LAST diagnostic terminology (12); thus, we continue to use CIN terminology.
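The hypothetical LSIL example above can be reproduced with exact fractions. This sketch only mirrors that simplified worked example, in which baseline and later cases share one denominator; the actual after-baseline risks in this article come from Weibull models rather than raw counts. The function name `cumulative_risk` is hypothetical.

```python
from fractions import Fraction


def cumulative_risk(n_women, n_baseline_cases, n_later_cases):
    """Return (baseline risk, cumulative risk) as exact fractions.

    Baseline risk is the proportion diagnosed at the biopsy following the
    baseline screen; cumulative risk adds the later cases over the same
    denominator, as in the simplified LSIL example in the text.
    """
    baseline = Fraction(n_baseline_cases, n_women)
    later = Fraction(n_later_cases, n_women)
    return baseline, baseline + later


baseline, five_year = cumulative_risk(1000, 24, 28)
print(f"baseline {float(baseline):.1%}, 5-year {float(five_year):.1%}")
```

Here `baseline` is `Fraction(3, 125)` (2.4%) and `five_year` is `Fraction(13, 250)` (5.2%); using `Fraction` keeps the arithmetic exact before formatting.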
We calculated risk at the baseline screen simply as the proportion of women diagnosed with CIN2, CIN3, or cancer at the biopsy following the baseline screen. This was estimated separately for each cotest result or Pap result, including the very few women testing HPV-negative/ASC-US or Pap-negative at their baseline screen who nevertheless underwent colposcopy. To calculate risk after the baseline screen, we used Weibull survival models (13) to estimate risks over time strictly after the baseline screen, among women for whom CIN2+ was not found at the baseline screen. Weibull models can yield smoother and more accurate risk estimates than nonparametric methods analogous to Kaplan-Meier (14), and they naturally handle interval-censoring of disease outcomes between screening tests. Separate Weibull models were fit for each cotest result or Pap-alone result. For more details on the risk calculations and Weibull modeling, see Supplemental Digital Content 1.
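To show what "naturally handle interval-censoring" means, the sketch below writes out an interval-censored Weibull log-likelihood. It assumes the shape/scale parameterization S(t) = exp(-(t/scale)^shape), time in years, and that each woman contributes the interval between her last CIN2+-free visit and the visit at which disease was found, or (last visit, infinity) if never diagnosed. The crude grid search stands in for a real optimizer; it is not the authors' fitting procedure, whose exact formulation is in Supplemental Digital Content 1. All function names are hypothetical.

```python
import math


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)**shape), with S(inf) = 0."""
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / scale) ** shape))


def weibull_cum_risk(t, shape, scale):
    """Post-baseline cumulative risk F(t) = 1 - S(t)."""
    return 1.0 - weibull_survival(t, shape, scale)


def interval_censored_loglik(intervals, shape, scale):
    """Log-likelihood for interval-censored disease times.

    Each element is (left, right): disease arose in (left, right], where
    left is the last disease-free visit and right is the diagnostic visit.
    A woman never diagnosed contributes (last_visit, math.inf), which
    reduces to ordinary right-censoring, S(last_visit).
    """
    ll = 0.0
    for left, right in intervals:
        p = weibull_survival(left, shape, scale) - weibull_survival(right, shape, scale)
        if p <= 0.0:
            return -math.inf
        ll += math.log(p)
    return ll


def fit_grid(intervals, shapes, scales):
    """Crude maximum-likelihood fit by exhaustive grid search (illustration only)."""
    return max(
        ((k, s) for k in shapes for s in scales),
        key=lambda ks: interval_censored_loglik(intervals, *ks),
    )
```

A woman screened negative at 0 and 3 years and diagnosed at 5 years would enter as `(3.0, 5.0)`; the model never needs to know exactly when disease arose within that interval, which is why no imputation of event times is required.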
Table 1 shows the raw data for the estimations, i.e., the distribution of the worst histologic findings by Pap result through 2010 for women aged 30–64. Figure 1 shows risks of CIN2+, CIN3+, and cancer for each Pap-alone result. In purple are the Pap-alone results for which immediate colposcopy is usually performed (i.e., LSIL, AGC, ASC-H, HSIL, and SCC). LSIL, the result with the lowest CIN3+ risk for which colposcopy is usually performed, had a baseline CIN3+ risk (plotted at time zero) of 2.5% (95%CI 2.2% to 2.8%). ASC-US alone (HPV unknown), for which women typically return for a repeat test in 6–12 months, had a baseline CIN3+ risk of 1.3% (95%CI 1.1% to 1.4%). Thus, current practice implicitly recommends immediate colposcopy if the baseline CIN3+ risk is 2.5% or greater and does not recommend it if the baseline risk is 1.3% or below; we have no reference point for how baseline risks between 1.3% and 2.5% would be managed.
In red (Figure 1) is the Pap-alone result for which women typically return in 6–12 months (ASC-US). The 1-year (not baseline) CIN3+ risk after an ASC-US Pap result is 1.6% (95%CI 1.5% to 1.8%). Thus, a 1-year return is implicitly recommended if the 1-year CIN3+ risk is around 1.6%. Finally, in black (Figure 1) is the Pap-alone result for which national guidelines recommend a 3-year return (i.e., Pap-negative). Its 3-year CIN3+ risk is 0.16% (95%CI 0.15% to 0.17%). In effect, a 3-year CIN3+ risk of 0.16% is the level that the national organizations listed above have implicitly deemed acceptably safe, as applied to the KPNC population.
Importantly, the uniform pattern of the risk curves allows us to simplify the benchmarking of risk thresholds to 1 time point. In Figure 1, all the risk curves stack neatly and do not intersect: when 1 Pap interpretation has a greater risk than another at a certain time, its risk is greater at all times. This allows us to simplify the risk thresholds by choosing a single time at which to compute them. We chose 5 years because it is currently the longest screening interval allowed for any cotest result by the national guidelines, and a long interval gives maximal chance for any CIN not diagnosed at baseline to be diagnosed later. In addition, a single time point facilitates risk benchmarking when a cotest risk falls between risk thresholds. Using 5 years for the risk thresholds, immediate colposcopy is typically performed if the 5-year CIN3+ risk is 5.2% or greater (i.e., the 5-year CIN3+ risk for LSIL), a 6–12 month return follows a 5-year CIN3+ risk of around 2.6% (i.e., ASC-US), and national guidelines recommend a 3-year return if the 5-year CIN3+ risk is around 0.26% (i.e., Pap-negative).
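The reason non-intersecting curves justify a single time point can be checked mechanically: if no pair of curves crosses, the ranking of results is the same at every time. The sketch below assumes each curve is a list of cumulative risks evaluated on a common time grid; the function names and the values in the example are hypothetical toy numbers, not KPNC data.

```python
def curves_stack(curves):
    """Return True if no two risk curves cross on their shared time grid.

    curves maps a result name to cumulative risks at the same time points.
    Two curves cross if one is strictly higher at some time and strictly
    lower at another; ties are not treated as crossings.
    """
    if len({len(v) for v in curves.values()}) > 1:
        raise ValueError("all curves must share the same time grid")
    names = list(curves)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diffs = [x - y for x, y in zip(curves[a], curves[b])]
            if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
                return False
    return True


def rank_at(curves, index):
    """Order results from highest to lowest risk at one grid point."""
    return sorted(curves, key=lambda name: curves[name][index], reverse=True)


# Toy curves (not KPNC data) at 3 time points.
toy = {"A": [1.0, 2.0, 3.0], "B": [0.5, 1.0, 1.5], "C": [0.1, 0.2, 0.3]}
print(curves_stack(toy), rank_at(toy, 0), rank_at(toy, 2))
```

When `curves_stack` returns True, `rank_at` gives the same ordering at every index, so thresholds read off at 5 years preserve the ordering of results at any other time on the grid.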
Examples will clarify the use of benchmarking. Using data from Figure 2, we benchmark risks for HPV-positive/ASC-US to the Pap-alone implicit risk thresholds. The 5-year CIN3+ risk for women with HPV-positive/ASC-US is 6.8% (95%CI 6.2% to 7.6%), which is greater than the 5.2% 5-year risk implicitly linked to referral to colposcopy. Similarly, the baseline CIN3+ risk for HPV-positive/ASC-US was 3.2%, which is greater than the 2.5% baseline threshold for colposcopy. Thus, to maintain consistency with current risk thresholds for clinical action in the US guidelines based on Pap-alone, HPV-positive/ASC-US would be referred for immediate colposcopy.
Using data from Figure 2, we benchmark risks for HPV-negative/Pap-negative to the Pap-alone implicit risk thresholds. The 5-year CIN3+ risk for women with HPV-negative/Pap-negative was 0.08% (95%CI 0.07% to 0.09%), far below the 0.26% implicitly linked to a 3-year return. In fact, the 5-year CIN3+ risk of 0.08% is less than the 3-year CIN3+ risk for Pap-negative of 0.16% (p<0.0001). Thus, to maintain consistency with current risk thresholds for clinical action based on Pap-alone, HPV-negative/Pap-negative women would be asked to return at a longer interval than 3 years, and 5 years is the current national guideline.
Table 2 benchmarks all cotest CIN3+ risks to the implicit CIN3+ risk thresholds from Pap-alone. The left side presents the risks for all Pap results in descending order, and the right side presents the risks for all cotest combinations in descending order (HPV-negative/ASC-H and HPV-negative/AGC are grouped with all high-grade Pap results because their cancer risks are high). Shading indicates the possible management options.
To ensure safe and consistent management of the many complex combinations of abnormal Pap, HPV, and histologic test results in new management guidelines, we expanded on the concept of “benchmarking to established, implicit risk thresholds”. Using data from approximately 1 million women undergoing screening at KPNC, we derived implicit risk thresholds for Pap results alone, based on 5-year CIN3+ risks. We benchmarked the risk for each cotest combination to the implicit risk thresholds for Pap-alone, to ensure “equal management of equal risks”. For example, we showed that HPV+/ASC-US has risk greater than the LSIL-derived risk threshold, and thus HPV+/ASC-US would be referred for colposcopy. The following papers in this Monograph will systematically address different cotest combinations in screening and management.
Although we used CIN3+ risk thresholds, the same management conclusions would be reached if CIN2+ risk thresholds were used instead. CIN2+ risk thresholds can be useful for situations where there are too few CIN3+ endpoints in the KPNC data to develop reliable CIN3+ risk estimates. Cancer risk thresholds might be the ideal risk thresholds, especially for assessing the safety following a negative screening test, but for most situations at the low-risk end of the management spectrum (return in 1-, 3-, or 5-years), there are too few cancer outcomes to reliably estimate cancer risks.
Another advantage of benchmarking to implicit risk thresholds is that it applies in any situation, i.e., not only for management of abnormal Pap, but also post-colposcopic management. Risks after colposcopic findings can be benchmarked to the risks after Pap-alone, ensuring “equal management of equal risks” across the entire cervical screening and management program. This is particularly useful as the previous guidelines for post-colposcopic management of CIN were sometimes based on expert opinion rather than more formal evidence.
Although risk is clearly the most important consideration, the harms of management options must also be considered, in particular, the number of colposcopies and excisional treatments expected to be performed under a management guideline. For example, the CIN3+ risk we calculate for ASC-US is somewhat close to that for LSIL, which could tempt one to lower the immediate colposcopy threshold slightly and refer all women with ASC-US to colposcopy. However, 2.8% of all women had ASC-US, so referring all of them to immediate colposcopy would nearly triple the overall colposcopy burden, from 1.5% to 4.3% of women. This would be a substantial increase without proven benefit. The recommendation of a 6–12 month return for ASC-US was therefore also based on the practical need to limit the colposcopy burden and on the recognition that tripling the number of colposcopic examinations would inevitably lead to the detection and (over)treatment of some lesions that are destined to regress spontaneously.
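The burden arithmetic above is simple but worth making explicit. This sketch assumes, as the text implies, that women with ASC-US are not already counted in the 1.5% currently referred, so the percentages add; the function name `added_burden` is hypothetical.

```python
def added_burden(current_pct, newly_referred_pct):
    """Return the new overall referral percentage and the fold change.

    Assumes the newly referred group is disjoint from those already
    referred, so the two percentages simply add.
    """
    total = current_pct + newly_referred_pct
    return total, total / current_pct


total, fold = added_burden(1.5, 2.8)
print(f"{total:.1f}% of women referred, {fold:.2f}x the current burden")
# prints "4.3% of women referred, 2.87x the current burden"
```

A fold change of about 2.87 is what "nearly triple" refers to; the same function could be applied to any candidate cotest result by substituting its prevalence for the 2.8%.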
Therefore, the fundamental principle of “equal management of equal risks”, while powerful, does not incorporate all aspects of clinical management. In particular, it presumes that the burden of each management option (i.e., the expected number of colposcopies/treatments) is similar for the benchmark risks and for the risks based on the new tests. For example, consider a test result that has exactly the CIN3+ risk of ASC-US but is extremely rare. Referring all women with this test result for immediate colposcopy will not appreciably add to the colposcopy burden. Also, the balance may be tipped in favor of immediate colposcopy by other considerations, for example, if the cancer risk following a particular cotest result is particularly high (e.g., cancer, including endometrial cancer, associated with HPV-negative/AGC). Furthermore, as we discuss in another article (REF), women aged 21–24 are generally managed more conservatively, for a given risk of CIN3+, than women aged 30–64 because of the possibility that excisional treatments could increase their chance of preterm birth in the future. Therefore, “equal management of equal risks” is an excellent starting point for the discussion of management options, but it is not a rigid mechanical rule for determining management.
The KPNC data provide some advantages over relying on clinical trial data. Although clinical trials can show efficacy in tightly controlled, ideal circumstances, effectiveness in routine clinical management is the final proof of clinical utility. Our risk estimates reflect the real-world complexity of clinical practice (5). In other words, the KPNC experience serves as a large-scale demonstration project of what could be achieved by incorporating HPV testing into routine clinical practice.
However, using the KPNC experience to aid guidelines development has some limitations, because it may not be generalizable to certain other populations with different patterns of care, compliance, testing, rescreening intervals, ethnicity, or risk factors. KPNC is an integrated health delivery system caring for more than 3.2 million persons (approximately 30% of the population in 14 Northern California counties) who are broadly representative of the local and statewide population (with the exception of a slight underrepresentation of the extremes of income). KPNC is a well-screened population with historically below-average cancer risks. KPNC uses cotesting (with HC2) rather than Pap alone for screening and follow-up, affecting the detection of CIN2+, CIN3+, and cancer in unknown ways. KPNC utilizes an electronic health record to capture information on all outpatient and inpatient clinical encounters, prescription medications, and laboratory tests. KPNC also has quality management systems in place at the local and regional levels to facilitate follow-up and management of patients with abnormal test results. Some practices may not have equivalent quality management capabilities and practices. Results are still generalizable in that the concept of benchmarking to implicit risk thresholds only requires that the risk bands for equal management in Table 2 are the same within the population to which they are being applied. For example, in KPNC, HPV+/ASC-US had the same risk as LSIL. This equivalence of risk has been observed in research studies in other populations as well (15). Although the risks of HPV+/ASC-US and LSIL differ between populations, what matters is that the 2 results have comparable risk within a particular population, thus permitting similar management. Nonetheless, it would be helpful to have data from other US and international populations to assess generalizability.
In the following articles in this Monograph, we apply the principles of benchmarking to each of the combinations of HPV testing and Pap result, to post-colposcopic management, and to post-treatment follow-up. We also consider the special situation of women <25, whose risks of cancer are substantially lower than those of older women. Although the scope of the articles is restricted to cotesting in the management of cervical screening abnormalities, the establishment of a risk-based management philosophy and of benchmarks for clinical decision-making provides objective guidance for guideline development and, ultimately, a framework for integrating future advances (e.g., newly validated biomarkers) and changes in population risk (e.g., HPV vaccination).
Supplemental Digital Content 1. Microsoft Word file with details on statistical methods and equations.
Role of the funding source
The Intramural Research Program of the US National Institutes of Health/National Cancer Institute reviewed the final manuscript for publication. The Kaiser Permanente Northern California Institutional Review Board (IRB) approved use of the data, and the National Institutes of Health Office of Human Subjects Research deemed this study exempt from IRB review.
Conflicts of Interest: Dr. Schiffman and Dr. Gage report working with Qiagen, Inc. on independent evaluations of non-commercial uses of careHPV (a low-cost HPV test for low-resource regions), for which they have received research reagents and technical aid from Qiagen at no cost. They have received HPV testing for research at no cost from Roche. Dr. Castle has received compensation for serving as a member of a Data and Safety Monitoring Board for HPV vaccines for Merck. Dr. Castle has received HPV tests and testing for research at a reduced or no cost from Qiagen, Roche, MTM, and Norchip. Dr. Castle is a paid consultant for BD, GE Healthcare, and Cepheid, and has received a speaker honorarium from Roche. No other authors report any conflicts of interest.