HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Cancer Res. Author manuscript; available in PMC 2011 April 15.
Published in final edited form as:
PMCID: PMC2855741

A population-based prospective study of carcinogenic human papillomavirus (HPV) variant lineages, viral persistence, and cervical neoplasia


HPV types differ profoundly in cervical carcinogenicity. For the most carcinogenic type, HPV16, variant lineages representing further evolutionary divergence also differ in cancer risk. Variants of the remaining 10-15 carcinogenic HPV types have not been well-studied.

In the first prospective, population-based study of HPV variants, we explored whether, on average, the oldest evolutionary branches within each carcinogenic type predicted different risks of ≥2-year viral persistence and/or precancer and cancer (CIN3+). We examined the natural history of HPV variants in the 7-year, 10,049-woman Guanacaste Cohort Study, using a nested case-control design. Infections were assigned to a variant lineage determined by phylogenetic parsimony methods based on URR/E6 sequences. We used the Fisher's combination test to evaluate significance of the risk associations, cumulating evidence across types.

Globally, for HPV types including HPV16, the p-value was 0.01 for persistence and 0.07 for CIN3+. Excluding HPV16, the p-values were 0.04 and 0.37, respectively. For HPV16, non-European viral variants were significantly more likely than European variants to cause persistence (OR = 2.6, p = 0.01) and CIN3+ (OR = 2.4, p = 0.004). HPV35 and HPV51 variant lineages also predicted CIN3+.

HPV variants generally differ in risk of persistence. For some HPV types, especially HPV16, variant lineages differ in risk of CIN3+. The findings indicate that continued evolution of HPV types has led to even finer genetic discrimination linked to HPV natural history and cervical cancer risk. Larger viral genomic studies are warranted, especially to identify the genetic basis for HPV16's unique carcinogenicity.

Keywords: HPV, variants, evolution, cervix, cancer


The evolutionary biology of papillomaviruses, as understood through analyses of phylogenetic trees, is tightly linked to host and tissue tropisms, biological behavior and neoplastic potential. Papillomaviruses have evolved over millions of years from common ancestors into specific host ecosystems. Among human papillomaviruses (HPV), the alpha genus is associated with infections of the anogenital and oral mucosa (Figure 1, Chen and Burk). The topology of the alpha HPV phylogenetic tree based on complete genome information or restricted to the early region of the PV genome strongly predicts viral behavior, including highly specific tissue tropisms (e.g., vagina vs. cervix), induced cytomorphic abnormalities, and persistence related to carcinogenicity (1-3).

Figure 1
Bayesian tree of 99 human papillomavirus types inferred from the concatenated amino acids and nucleotide sequences of 6 ORFs (E6, E7, E1, E2, L2 and L1). MrBayes v3.1.2 (44,45) with 10,000,000 cycles for the Markov chain Monte Carlo algorithm was used ...

A subset of mucosal HPV genotypes (“types”) in the alpha genus (Figure 1) causes virtually all cases of cervical cancer (and some other anogenital and extragenital cancers not discussed here). More specifically, there are three main branches (clades) of the alpha genus; all established carcinogenic HPV types are in a single high-risk clade comprising 5 species groups (“species”): alpha-5, alpha-6, alpha-7, alpha-9, and alpha-11.

The 5 species in the high-risk clade have different risk profiles. Alpha-9 is the most important species, consisting almost entirely of carcinogenic types.

Even infections with HPV types in the high-risk clade typically “clear”, defined as becoming undetectable even using sensitive HPV DNA test methods (4,5). Many infections are undetectable by 3-6 months later, and >90% eventually clear within a few years. Persistent HPV infections with carcinogenic types are linked to a high absolute risk of subsequent precancer (cervical intraepithelial neoplasia grade 3, CIN3) and an elevated risk of eventual cancer.

At an even finer level, within each species of the high-risk clade, HPV type-defined viral behaviors of persistence and carcinogenicity have characteristics of a quantifiable genetic trait. An HPV type is ≥10% different than all other characterized PV types in the L1 ORF (6). Individual types in the highest risk alpha-9 species show very large differences in risk of precancer (associated with viral persistence) and eventual cancer. At the extremes, HPV16 is responsible for half of the cases of cervical cancer globally, while HPV67 has very rarely been found alone in cancer (7,8).

Since the emergence of distinct HPV types within species, population expansion, geographic dispersion and time have contributed to the further evolution of HPV types into variant lineages (“variants”). Variants of each type have evolved primarily through single nucleotide variations with a pattern of lineage fixation of these variations (9,10). HPV variants can be classified based on their nucleotide sequence differences, with most variants differing by ≤3%; however, there are a few examples of more distantly related variants that have also been termed subtypes (e.g., HPV68 variants) (11,12).

Despite evolutionary relatedness, HPV variants can differ in carcinogenicity. As stated above, the variants have co-evolved with humans to correspond, at least for HPV16, with human geographic dispersion (13). Asian-American or African HPV16 variants have been associated in case-control studies with five-fold or greater risks of cancer compared to European variants. Of note, European variants of HPV16 are still very high risk, albeit at lower risk than other HPV16 variants.

There is some suggestion that variants of other HPV types also confer different risks of viral persistence and/or progression to precancer. However, except for some work on HPV18 in alpha-7 and on HPV variants in HIV-infected individuals, there is little known about the natural history of variants of types other than HPV16 among immunocompetent women.

This research gap leaves important questions unanswered. For example, HPV16 variants are all carcinogenic although they differ in risk. But variants in a moderately carcinogenic type like HPV31 could represent a homogeneous set of moderately carcinogenic variants, or a heterogeneous set of variants ranging from HPV16-like behavior to much weaker behavior. HPV types with only borderline carcinogenicity could have a single or few minor variants that are definitely carcinogenic.

In a previous case-control study of HPV16 variants in Guanacaste Province, Costa Rica, we observed differences in carcinogenicity (14). To address systematically the role of HPV variants in viral natural history and cervical carcinogenicity, we have now performed a comprehensive population-based prospective study of variants of all the HPV types in the high-risk alpha clade. We assessed whether variants present in Guanacaste were associated with persistence and/or risk of CIN3 and cancer (CIN3+). As a logical starting point, we studied whether the oldest (i.e., most divergent) evolutionary branching of each carcinogenic type affected viral natural history. The ultimate goal is to identify sequence-specific differences that provide clues about the biological basis of HPV carcinogenicity and eventually therapeutic antiviral targets.


Study Population

The Proyecto Epidemiologico Guanacaste study population is outlined in Figure 2. The project consented and enrolled a random sample of 10,049 (93.6% of eligible) women residing in a high-risk Costa Rican province in 1993-1994. The study was approved by Costa Rican and U.S. ethical boards (15,16). To supplement the random sample, we attempted to recruit all women with invasive cervical cancer diagnosed in Guanacaste during the enrollment phase of the study, and successfully enrolled 31.

Figure 2
Study Population. The cases and controls for the analysis were drawn from the population-based prospective Proyecto Epidemiologico Guanacaste, which included a random sample of approximately 1/6 of the adult women in the province. During the enrollment ...

A few well-trained nurses screened adult women (age range 18–97) using visual assessment, an early pooled HPV test (Hybrid Capture Tube Test, Digene, Gaithersburg, MD), three kinds of cytology, and cervicography (magnified cervical images). We did not screen virgins (n = 583) until they became sexually active. Women without hysterectomies were categorized into several groups based on the results of the enrollment screening examination (which did not include HPV genotyping) and followed differently according to predicted risk of developing CIN3+. Women with any abnormal screening result were re-screened at 6-12 month intervals for seven years. Those without any abnormal screening test were re-screened at the five-year anniversary of their enrollment.

Throughout the study, women with possible CIN2 (equivocal precancer) or worse (CIN3, cancer), suggested by any screening technique including nurse concern on gross examination, were referred to colposcopy with either guided biopsy of visible lesions or immediate loop electrical excision procedure, depending on referral reason, colposcopic impression, age and parity. These women were followed individually for safety, with in-patient surgery if needed, but were no longer studied in the cohort.

Cases and Controls

A case-control study was nested within this prospective cohort analysis. For each woman and viral type, we defined disease outcomes in a hierarchical manner as the most severe diagnosis during the study (baseline and follow-up combined). The original case categories were invasive cancer, else histologic CIN3, else CIN2, else HSIL cytology by liquid-based cytology that did not lead to histologic confirmation of CIN2 or worse, else definite long-term viral persistence of 2 years or longer (median = 46 months), else short-term viral persistence for at least 2 visits not observed to last 2 years (median = 8 months). For each HPV type, the original control group was chosen as a random sample of women with a single detection of that HPV type, frequency matched approximately 2:1 to the original cases.
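The hierarchical assignment described above can be illustrated with a minimal Python sketch. The names `Outcome`, `DIAGNOSIS_RANK`, and `classify_infection` are hypothetical, and the persistence rule is an assumed simplification: it uses only the span in months between the first and last type-specific positive visits and ignores intervening negative results.

```python
from enum import IntEnum


class Outcome(IntEnum):
    # Higher value = more severe, following the hierarchy in the text.
    SINGLE_DETECTION = 0
    SHORT_TERM_PERSISTENCE = 1
    LONG_TERM_PERSISTENCE = 2
    HSIL_UNCONFIRMED = 3
    CIN2 = 4
    CIN3 = 5
    INVASIVE_CANCER = 6


# Hypothetical diagnosis labels; the study's actual coding is not given.
DIAGNOSIS_RANK = {
    "cancer": Outcome.INVASIVE_CANCER,
    "CIN3": Outcome.CIN3,
    "CIN2": Outcome.CIN2,
    "HSIL": Outcome.HSIL_UNCONFIRMED,
}


def classify_infection(diagnoses, positive_months):
    """Return the most severe outcome for one woman and one HPV type.

    diagnoses: iterable of diagnosis labels seen at baseline or follow-up.
    positive_months: months since enrollment of each type-specific
        positive visit (assumed representation).
    """
    worst = max(
        (DIAGNOSIS_RANK[d] for d in diagnoses if d in DIAGNOSIS_RANK),
        default=None,
    )
    if worst is not None:
        return worst
    if len(positive_months) >= 2:
        span = max(positive_months) - min(positive_months)
        # Assumed cut-off: 24 months or more counts as long-term persistence.
        if span >= 24:
            return Outcome.LONG_TERM_PERSISTENCE
        return Outcome.SHORT_TERM_PERSISTENCE
    return Outcome.SINGLE_DETECTION
```

Under these assumptions, a woman with both CIN2 and CIN3 recorded for a type is classified as CIN3. An infection with no neoplastic diagnosis is ranked only by how long the type was detected.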

We condensed the original case-control groups to achieve more statistical power. Because infections with short-term persistence did not vary from single detections in terms of variant distribution (data not shown), we combined the two to form a single control group (Figure 2). We compared this single control group to three case groups: CIN3+, CIN2, and long-term persistence (a relatively rare outcome used as an endpoint in some HPV vaccine trials). We combined the few cases of invasive cancer with CIN3 (CIN3+), but examined invasive cancer separately for HPV16, since we had adequate power.

The women in this nested case-control analysis had a mean of 5.0 visits (standard deviation = 3.0 visits), median of 4.0 visits, and a range of 1-14 visits including enrollment and follow-up specimen collection visits. The mean follow-up was 6.5 +/− 2.3 years (median 7.0 years).


Histology used for clinical purposes in Costa Rica was reviewed by U.S. pathologists to define study disease outcomes. The few cases of glandular lesions were combined with comparably severe squamous in situ or invasive carcinoma. The final assignment of cases as invasive cancer, CIN3, CIN2, or <CIN2 was made by an algorithm of the Costa Rican and U.S reviews (17). A few difficult cases were adjudicated by joint review, occasionally with consideration of cytologic slides as well as histology.

CIN1 was considered a morphologic consequence of HPV infection, not serious neoplasia, and was not treated during follow-up. Although CIN2 is a widely accepted treatment threshold, it is a sub-optimal surrogate endpoint for cancer risk because of its poor reproducibility and diagnostic heterogeneity (17,18). Thus, we kept histologic CIN2 and even more equivocal outcomes, like histologically unconfirmed cytologic high grade squamous intraepithelial lesions (HSIL), separate from the primary CIN3/cancer analytic case group as diagnosed by biopsy, LEEP, or surgical specimen.

HPV Detection and Genotyping

Typing methods

Cervical specimens were collected by nurses using broom devices, and were placed into Digene specimen transport medium (STM). HPV DNA was detected using a MY09/MY11 L1 primer PCR system (MY09/11 PCR) with AmpliTaq Gold polymerase as previously described (19-21). Briefly, an aliquot of the STM specimen was lysed, and the specimen DNA was precipitated by ammonium acetate/ethanol solution, pelleted by centrifugation and resuspended in TE buffer. After PCR, the amplicons were analyzed by gel electrophoresis and Southern blot hybridization as previously described. Filters were individually hybridized to biotinylated, type-specific oligonucleotide probes. Dot-blot hybridization was used for typing (probes: 2, 6, 11, 13, 16, 18, 26, 31-35, 39, 40, 42-45, 51-59, 61, 62, 64, 66-74, 81-85, and 89). A specimen was considered HPV positive, but uncharacterized, if it tested positive for HPV DNA by the radiolabeled generic probe mix but was not positive for any type-specific probe.

Testing with additional PCR primer sets

To verify the sensitivity of the MY09/11-based PCR method, and to rule out false negative results, we tested approximately 2000 initially negative specimens using other primer sets (GP5+ and FAP) (21,22). The specimens were derived from initially HPV-negative women with cytologic or histologic abnormalities (low-grade or worse) at enrollment or during follow-up (n=850), women with 5 or more sexual partners in their lifetime (n=187), and a random selection of >1000 women with no reason to suspect false-negativity (n=955). Few specimens re-tested as positive for any HPV type by any of the other primers (n = 69); we included specimens positive for carcinogenic types in variant testing.

HPV Variant Selection and Testing Methods

Choice of HPV types for variant testing

For this analysis, we were interested in the possible relationship of HPV variants to risk of CIN3+. Thus, we considered only HPV types in the high-risk clade (Figure 1), which includes all types considered carcinogenic or at least possibly carcinogenic according to the latest IARC classification (7). These include all types in species alpha-9 (HPV16, 31, 33, 35, 52, 58 and 67), alpha-11 (HPV73), alpha-7 (HPV18, 39, 45, 70, and 68), alpha-5 (HPV51 and 69), and alpha-6 (HPV53, 56, and 66). We excluded those types in the high-risk clade that were extremely rare in our population.

Choice of which specimen to sequence from each woman

In this longitudinal analysis, most women had several specimens available for analysis. Initial comparison of variants from multiple samples from a single participant confirmed that we rarely detected more than one variant or a change in variant per subject over time. Specifically, to assess the intra-individual variability of variants in women infected with the same type more than once, we tested two specimens from different visits for a convenience sample of 268 women (mean time difference of 3.0 +/- 2.2 years, range of 0.7-7.7 years). Only 6 women had discordant variant results (mean time of 4.6 +/- 0.9 years).

We had begun by choosing the last positive specimen for cases before diagnosis of CIN2+, and (arbitrarily) the first positive for controls. Infections after CIN2+ diagnosis were not considered, because of possible treatment effects. Some specimens were depleted; and when we confirmed that HPV variants rarely varied for a woman, we switched to selecting the available sample with the highest signal strength on PCR testing for that HPV type.

Determination of variant lineages

We have been studying variant lineages from specimens tested in our laboratory; they are derived from Guanacaste, Costa Rica and many other countries. In order to classify variant lineages of carcinogenic HPV types detected by dot blot hybridization, we designed type-specific primer sets to amplify a partial fragment of the URR region and/or E6 gene using a one-tube nested PCR method (23). We test E6 regions for specimens that do not yield data for the URR region. PCR product size is confirmed by gel electrophoresis, purified using QuickStep 2 PCR Purification Kit (Edge BioSystems, USA) or QIAquick Gel Extraction Kit (Qiagen, USA), and submitted for sequencing of both strands.

We consider a specimen to contain a possibly unique variant if the amplified URR and/or E6 fragment sequences, in more than 1 specimen, differ by ≥2 SNPs compared to the prototype variant lineage or other available complete genome sequence. To pursue the possibility of a new variant, we amplify and sequence the complete genomes (9,10). Type specific variant lineages are defined based on a complete genome nucleotide sequence having >1% dissimilarity with the prototype and are named sequentially as A (prototype), B, C, etc. according to the phylogenetic tree topologies for each individual type (Chen & Burk, personal communication; manuscript in preparation). If the dissimilarity is ≤1%, we sometimes designate A1 vs. A2, etc. The nomenclatures of HPV16, HPV18 and HPV45 variant lineages are based on previous publications (10,23,24).
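As a rough illustration of the >1% dissimilarity rule, the following Python sketch compares two pre-aligned sequences. It is a pairwise simplification only: the lineage assignments described above rest on phylogenetic tree topology, which this sketch does not attempt. The function names, the handling of gap columns, and the toy sequences are assumptions.

```python
def percent_dissimilarity(seq_a, seq_b):
    """Percent of aligned, gap-free positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumed convention: skip alignment gap columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * differing / compared


def lineage_level(seq, prototype, threshold=1.0):
    """Apply the >1% rule from the text to one aligned sequence pair."""
    if percent_dissimilarity(seq, prototype) > threshold:
        return "separate lineage"
    return "same lineage (possible sublineage)"


# Toy example: a 200-base prototype and a variant with 3 substitutions (1.5%).
prototype = "A" * 200
variant = "C" * 3 + "A" * 197
print(lineage_level(variant, prototype))
```

In practice the comparison would be made on complete genome alignments, and a dissimilarity of 1% or less corresponds to the A1 vs. A2 style of designation mentioned above.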

Lineage Groupings

For this project testing Guanacaste specimens, masked to case-control status, we assigned each HPV type detected to a specific variant lineage. The phylogenetic trees for the vast majority of the alpha PV variants have a root bifurcation (two main evolutionary branches), similar to that seen for HPV16, 18 and 45 (9,10). The average percentage of the complete genomic variation between the dichotomized lineage groups varies by type. For example, for HPV68, the two variant lineages differ by ~8% across the complete genomes, indicating that these are technically subtypes of HPV68. For HPV35, the two lineages found in Guanacaste differed by <1% in the complete genomes; as they are minimally divergent, they were named A1 and A2 rather than assigning them separate letters.

Data Analysis

This nested case-control analysis was performed at the level of individual infections, not individual women. Multiple HPV infections, whether concurrent or not, are common. Each infection outcome was considered separately, and the women in the cohort could contribute to multiple type analyses and case-control outcomes. For example, an individual with HPV X linked to development of CIN3 or CIN2 (it could not be both because of censoring for treatment) and HPV Y that cleared quickly would contribute both a case infection for the HPV X analysis, and a control infection for the HPV Y analysis. Moreover, a case of CIN3+, CIN2, persistence, or a control could be associated with 2 or more HPV types, and be counted more than once. This occurred in 38.4% of the 2074 women in the study; on average, each woman contributed 1.6 infections to the analysis. Because we have not found confounding between multiple infections in previous HPV natural history studies in which we employed GEE statistical methods (25), we checked for possible confounding more simply in this analysis by restriction to women with single infections. This decreased statistical power and widened confidence intervals, but did not change the conclusions (data not shown). Thus, the full results are presented.

Statistical testing

We wished to know whether the major bifurcation, i.e., the division into dichotomized variant lineages (arbitrarily labeled lineage “A” or “B”, etc.), affected the probability of different case-control outcomes for each HPV type within the high-risk clade individually, and globally combining over all types. To assess each type individually, we compared the distribution of the major variant lineages among the cases and controls, within that type. For ease of presentation, we defined the “test” and “referent” variants arbitrarily so that the OR was greater than 1 for the risk of CIN3+. We calculated the odds ratio (OR) as a measure of the strength of association for each lineage and case group (CIN3+, CIN2, persistence) compared with controls.
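The odds ratio and the arbitrary choice of test and referent lineage can be sketched in Python as follows. The function names and the counts are hypothetical and are not study data. A zero cell in the denominator yields an OR of infinity.

```python
import math


def odds_ratio(test_cases, ref_cases, test_controls, ref_controls):
    """Cross-product odds ratio for a 2x2 table of lineage by case status."""
    numerator = test_cases * ref_controls
    denominator = ref_cases * test_controls
    if denominator == 0:
        # No cases (or no controls) in one cell: the estimate is unbounded.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


def oriented_odds_ratio(counts_by_lineage):
    """Choose test and referent lineages so the reported OR is at least 1.

    counts_by_lineage: mapping of exactly two lineage labels to
        (cases, controls) tuples, e.g. {"A": (5, 40), "B": (10, 20)}.
    Returns (test_label, referent_label, odds_ratio).
    """
    (label_1, (cases_1, controls_1)), (label_2, (cases_2, controls_2)) = (
        counts_by_lineage.items()
    )
    or_1_vs_2 = odds_ratio(cases_1, cases_2, controls_1, controls_2)
    if or_1_vs_2 >= 1:
        return label_1, label_2, or_1_vs_2
    return label_2, label_1, odds_ratio(cases_2, cases_1, controls_2, controls_1)


# Hypothetical counts (not from the study): lineage B becomes the test lineage.
print(oriented_odds_ratio({"A": (5, 40), "B": (10, 20)}))
```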

We did not have large enough numbers of individual types (except for HPV16) to calculate confidence intervals around the OR estimates; instead, we calculated a p value by Fisher's exact test (26). Specifically, we calculated a mid-p statistic for each type because the standard p-values for exact tests are too conservative (27). To test whether variant lineages, on average, were associated with probability of case-control status, we combined the p-values across the types (analogous to a meta-analysis) using Fisher's combination test (28). For extra rigor, we computed the Fisher's combination test statistics twice, including and excluding HPV16 (because we had prior reason to expect that risk associations exist for HPV16 variants).
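For readers unfamiliar with these two procedures, a minimal Python sketch using only the standard library follows. The two-sided mid-p convention used here (doubling the smaller one-sided mid-p) is an assumption, since the text does not state which convention was applied. The function names are hypothetical. Fisher's combination statistic is X = -2 Σ ln p, referred to a chi-square distribution with 2k degrees of freedom for k combined p-values.

```python
from math import comb, exp, log


def fisher_mid_p(a, b, c, d):
    """Two-sided mid-p for a 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are test and referent lineage.
    Assumed convention: twice the smaller one-sided mid-p, capped at 1.
    """
    n_cases, n_test, n_total = a + b, a + c, a + b + c + d
    lo = max(0, n_cases - (n_total - n_test))
    hi = min(n_cases, n_test)
    denom = comb(n_total, n_cases)
    # Hypergeometric probability of each possible count in cell a.
    pmf = {
        x: comb(n_test, x) * comb(n_total - n_test, n_cases - x) / denom
        for x in range(lo, hi + 1)
    }
    upper = sum(p for x, p in pmf.items() if x > a) + 0.5 * pmf[a]
    lower = sum(p for x, p in pmf.items() if x < a) + 0.5 * pmf[a]
    return min(1.0, 2.0 * min(upper, lower))


def fisher_combination(p_values):
    """Fisher's combination test: returns (statistic, combined p-value)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    k = len(p_values)
    statistic = -2.0 * sum(log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom (closed form
    # for even degrees of freedom): exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = statistic / 2.0
    term = total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total
```

A per-type mid-p would be computed with `fisher_mid_p` from each type's case-control table, and the resulting list passed to `fisher_combination`, once with and once without the HPV16 entry.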

To assure that we highlighted evidence only from types for which we had reasonable statistical power, we excluded from statistical testing any case-control comparison for which there were fewer than 5 case outcomes. We also excluded variant lineages for types having limited variability within the Guanacaste population (i.e., <1% variability), reasoning that these variants might be too minor to show phenotypic differences. The results of this agnostic decision in the special and interesting case of HPV35 are discussed in the results.


The results are shown in Table 1. The order of the species (alpha-9, 11, 7, 5, 6) and types within groups corresponds to Figure 1, to highlight phylogenetic relatedness. For HPV types including HPV16, the combined mid-p-value for CIN3+ versus controls was 0.07, for CIN2 versus controls it was 0.27, and for long-term persistence it was <0.01. When we excluded HPV16, the combined mid-p-values were 0.37, 0.25, and 0.04, respectively. Thus, the risk associations for HPV16 contributed greatly to the global statistic for CIN3+ when the predominant, powerfully carcinogenic type was included. Nevertheless, the association of variants with persistence remained significant even when HPV16 was excluded.

Table 1
Variant Lineages of HPV Types in the High-Risk Clade and Outcome of Infection in the Prospective Guanacaste, Costa Rica Cohort Study

In confirmatory analyses, we re-created Table 1 combining all women without CIN2+ as the controls, and CIN3+ as the main case group (because CIN2 is unreliable). There were isolated small differences for a few types but no change in overall interpretation. We also restricted the analysis to incident infections. The statistical power was much reduced, but the risk relationships were unchanged within the limitations of the data.

Most type-specific analyses had limited power, except for HPV16 for which we were able to compute reliable confidence intervals. In a preliminary, cross-sectional examination of HPV16 during the enrollment phase of the Proyecto Epidemiologico Guanacaste, we previously reported that the non-European variant lineage was associated with an increased risk of histologic CIN2+ (14). We confirmed in the current analysis that the risk of the non-European lineage was especially high for the 10 screen-detected cases of invasive cervical cancer (OR = 6.3, 95% CI 1.6-24.6), and was elevated among the 70 cases of CIN3, although the confidence interval included 1 (OR = 1.8, 95% CI 0.9-3.6). Addition of 10 HPV16-associated supplemental cases of invasive cervical cancer diagnosed during the time of the Guanacaste Cohort Study, outside of the random sample of the population that comprised our cohort, supported the association for invasive cancer (OR = 5.25, 95% CI 1.4-19.6). Viral persistence of HPV16 for 2 or more years was linked to the same non-European variants as CIN3 and cancer (OR = 2.6, 95% CI 1.2-5.7). It should be noted that the Asian American HPV16 variants comprised 94% of the non-European variants detected in the Guanacaste population.
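The intervals reported above are symmetric on the log scale (for example, the square root of 0.9 × 3.6 is 1.8), so an approximate standard error and Wald p-value can be backed out of each reported OR and 95% CI. The Python sketch below does this under the assumption of a normal approximation on the log-odds scale. The method actually used to compute the intervals is not stated, and the function name is hypothetical.

```python
from math import erfc, log, sqrt

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def wald_from_ci(or_estimate, ci_low, ci_high):
    """Back out (standard error of log OR, z, two-sided p) from a 95% CI.

    Assumes the interval was built as exp(log(OR) +/- Z_95 * SE).
    """
    se = (log(ci_high) - log(ci_low)) / (2.0 * Z_95)
    z = log(or_estimate) / se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


# Values as reported in the text for HPV16 non-European vs. European variants.
for label, args in {
    "invasive cancer": (6.3, 1.6, 24.6),
    "CIN3": (1.8, 0.9, 3.6),
    "persistence": (2.6, 1.2, 5.7),
}.items():
    se, z, p = wald_from_ci(*args)
    print(f"{label}: z = {z:.2f}, approximate two-sided p = {p:.3f}")
```

Under this approximation the CIN3 estimate gives a two-sided p of roughly 0.10, whereas the invasive cancer and persistence estimates fall below 0.05.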

The more equivocal precancerous diagnoses of histologic CIN2 or cytologic HSIL were not associated with HPV16 variant status (or any other type with the possible exception of HPV18).

Alpha-9, which contains HPV16, is the most important HPV species with regard to carcinogenicity; all other types in this species with the exception of HPV67 are also clearly carcinogenic. Nevertheless, HPV35 infections in this population had limited genetic variability precluding the characterization into major variant lineages at the >1% difference level. Although HPV35 variants were excluded from the main calculation of p-values, because A1 and A2 vary by less than 1% in the whole genome, A1 was associated with a substantial and statistically significantly elevated risk of both CIN3+ (OR = 6.4) and long-term viral persistence (OR = 3.7) compared with A2.

There are three lineages of HPV31 of nearly equal phylogenetic distance (Chen and Burk unpublished data, manuscript in preparation); comparing A and B to C resulted in an OR of 1.8, but the p-value was non-significant. No other grouping was linked to risk. For several of the carcinogenic HPV16-related types in alpha-9, all cases of CIN3 were linked to one of the two variant lineage groups (e.g., HPV33, 52 and 58), resulting in OR estimates of infinity. However, these findings were highly unstable and not statistically significant.

For many of the remaining HPV types in other species, one of the variants was associated with >1.5 - 2 times the risk of CIN3+ compared to the other arbitrarily chosen referent. The B variant of HPV51 was significantly associated with increased risk of CIN3+, although there were few cases.


We designed this population-based case-control study to extend our previous analysis in Guanacaste that demonstrated strong differences in risk of viral persistence and CIN3+ diagnosis for different HPV types in the alpha genus (1). In that analysis combining phylogenetics and epidemiology, we observed that all carcinogenic and possibly carcinogenic types belonged to one evolutionary branching or clade. This high-risk clade was generated in large part by the early gene-containing segment of the HPV genome (11,29), which contains the HPV oncogenes. Nevertheless, the types in the high-risk clade showed strong differences in their carcinogenic potential. HPV carcinogenicity may resemble a quantitative trait that peaks with the genome of HPV16 and nearly completely dissipates with other types, such as the evolutionarily related alpha-9 species member, HPV67.

We examined whether more recent evolutionary forces resulting in variant lineages of specific types might yield even better prediction of viral natural history. For each HPV type emerging from the high-risk alpha clade, we determined whether the earliest, most divergent variant lineages predicted risk of viral persistence or CIN3+ compared with viral clearance. To our knowledge, this is the first complete analysis of HPV variants and cervical carcinogenicity.

Our results demonstrate that variants of HPV16 have evolved clearly different propensities for both viral persistence and carcinogenicity. In Guanacaste, the higher risk, non-European HPV16 variants were virtually all Asian-American. Other investigators have shown that, compared with European lineages, African variants of HPV16 are associated with a higher risk of viral persistence, CIN3, and cancer (13). The underlying genetic details that make non-European variants of HPV16 more carcinogenic are not known, and identifying them deserves pursuit.

Even the large Guanacaste cohort was too small to generate enough cases to evaluate conclusively the association of variants of other HPV types and risk of CIN3+. We had hypothesized that some HPV types with intermediate (like HPV52 or 56) or even equivocal carcinogenicity (like HPV68) might be a mixture of strongly carcinogenic and non-carcinogenic variants. If true, such heterogeneity would have shown itself as large variation in risk between lineages. However, there were too few cases to detect anything but extreme risk differences. From a genetic epidemiological point of view, this is reminiscent of the large sample sizes required to identify single nucleotide polymorphisms (SNPs) with a modest risk for breast, colon or prostate cancer.

We found substantial and significant associations for HPV35 variant A1 compared to A2 for both CIN3+ (OR 6.4) and long-term persistence (OR 3.7). The finding for HPV35 is especially interesting because it is so closely related to HPV16. Variants of HPV35 have not been studied previously in relation to carcinogenicity. The number of SNP changes differentiating these HPV35 variants is only about 100. If we had included HPV35 in the mid-p calculations, the global estimates would have strengthened, particularly for a summary (albeit a posteriori) examination of alpha-9 types (including HPV16, mid-p would be 0.001 for persistence, 0.03 for CIN3+). We intend to explore whether higher-risk HPV35 variants resemble HPV16 in pending full genome studies.

There have been only a few previous studies of variants of types related to HPV16 in immunocompetent women, including two previous reports of HPV33 variants associated with high-grade cytology/histology (30,31), and one report of HPV58 variants associated with increasing severity of neoplasia (32). Another study of HPV58 and HPV52 variants was null (30). A needed standard nomenclature of variants of all the carcinogenic HPV types accepted by the International Classification of Tumor Viruses Committee is forthcoming, which will promote comparisons and combinations of data (available from the authors, Chen & Burk, manuscript in preparation).

Regarding HPV species other than alpha-9, the numbers of cases of CIN3 for individual types were very small; some of the positive and negative results could have resulted by chance. For example, we observed a significantly elevated risk of CIN3+ for HPV51 variant B compared with variant A. There is no published work on HPV51 variants to which we can refer.

We observed no differences for CIN2 compared with controls for HPV16 variants or any other type, with the possible exception of HPV18. The lack of associations with risk of CIN2 or HSIL is consistent with the belief that many CIN2 histologic lesions (and HSIL cytology interpretations) are misclassified as precancer even when diagnosed by an expert pathology panel (17). This further supports the critical importance of having the correct phenotype (endpoint definition) for analysis, to avoid dilution of true cancer surrogate endpoints. This also applies to using long-term type-specific persistence of HPV as an outcome. Accurate repeated characterization of HPV types can be complicated by the presence of multiple types within a clinical sample; the methods we used are generally robust for the detection of multiple types (21).

The aggregate data showed that variants of other HPV types in the high-risk clade, even excluding HPV16, significantly influenced risk of persistence. We observed that viral variants influenced long-term viral persistence (2+ years) even when CIN3+ had not (yet?) been diagnosed. Viral persistence is an uncommon viral outcome (i.e., clearance is typical) and a necessary part of progression to precancer. The persistence of HPV with interruption of p53 and pRB pathways links HPV carcinogenesis to the mechanism of other tumor viruses (33). We had more power to study long-term viral persistence than CIN3+. Our finding that HPV variants assessed globally influence persistence is in line with a small literature including some studies in HIV-infected populations (34-38). It is not known whether observations among immunosuppressed individuals apply to immunocompetent populations in which most HPV evolution has presumably taken place.

Our analyses were based on viral lineages determined from URR sequence alignment, with examination of E6 when necessary. Lineage assignment based on this hyper-variable region is very efficient; the evolution of HPV is slow with very little evidence for recombination (13,39,40). Thus, lineages are stable and predictive of most viral SNPs. However, URR-based lineages are not sufficient for all purposes, and our data might have missed important variability in other regions of the HPV genome that predicts persistence and/or carcinogenicity.
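The idea that a short, variable region can stand in for the lineage of the whole genome can be sketched as follows. This is a deliberately simplified illustration that assigns a query to the nearest reference by counting mismatches in pre-aligned sequences; it is not the phylogenetic parsimony method used in this study. The function names and the short sequences are hypothetical and are not real HPV sequence data.

```python
def hamming(a, b):
    """Count positions at which two pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_lineage(query, references):
    """Return (lineage, distance) for the closest reference.

    The lineage is None when two or more references are equally close,
    so that ambiguous calls are flagged rather than guessed.
    """
    distances = {name: hamming(query, seq) for name, seq in references.items()}
    best = min(distances.values())
    winners = sorted(name for name, d in distances.items() if d == best)
    if len(winners) > 1:
        return None, best
    return winners[0], best


# Toy aligned fragments, invented for illustration (not real URR sequence).
references = {"A": "ACGTACGTAC", "B": "ACGTTCGTGC"}
print(assign_lineage("ACGTTCGTGA", references))  # closest to "B"
print(assign_lineage("ACGTTCGTAC", references))  # equidistant, so ambiguous
```

In a setting like the one described above, an ambiguous call would be the natural point at which to examine a second region such as E6.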

Using whole genome sequencing and sufficiently large numbers of cases (i.e., thousands), we could make substantial progress in understanding the viral genetic basis of HPV carcinogenesis (41). Because the HPV genome is relatively small (~8,000 bases) with only 8 genes, and because variants of each of the HPV types differ by only about 200 base pairs, we believe we will be able to identify viral genetic changes associated with carcinogenicity. We will also be able to identify targets and/or biochemical functions within the host that are perturbed in the cascade to cancer.

Though much more difficult, we envision eventually implementing systems biology approaches to address the vast networks of changes required for an infected cervical cell to become frankly malignant. It might prove fruitful to pursue why a given woman becomes a case of CIN3+ caused by one HPV type or variant but successfully clears other HPV types and variants. Consideration of viral gene variation in relation to relevant human variation in immune recognition or tumor suppressor genes might permit an initial, focused look at systems biology (42,43). If successful, understanding the multiple networks, pathways and interactions could lead to a new understanding of cancer biology and innovative strategies to combat HPV carcinogenesis.


The Guanacaste Project was supported by the National Cancer Institute, National Institutes of Health, Department of Health and Human Services contracts NO1-CP-21081, NO1-CP-33061, NO1-CP-40542 and NO1-CP-506535. We are grateful for the support of Costa Rican health authorities and for the decade-long dedication of the superb project staff. Dr. Burk was supported by National Cancer Institute grant CA78527 and used the facilities available through the Einstein Cancer Research Center and CFAR.


1. Schiffman M, Herrero R, Desalle R, et al. The carcinogenicity of human papillomavirus types reflects viral evolution. Virology. 2005;337:76–84. [PubMed]
2. Schiffman M, Castle PE, Jeronimo J, Rodriguez AC, Wacholder S. Human papillomavirus and cervical cancer. Lancet. 2007;370:890–907. [PubMed]
3. Kovacic MB, Castle PE, Herrero R, et al. Relationships of human papillomavirus type, qualitative viral load, and age with cytologic abnormality. Cancer Res. 2006;66:10112–9. [PubMed]
4. Ho GY, Bierman R, Beardsley L, Chang CJ, Burk RD. Natural history of cervicovaginal papillomavirus infection in young women. N Engl J Med. 1998;338:423–8. [PubMed]
5. Rodriguez AC, Schiffman M, Herrero R, et al. Rapid clearance of human papillomavirus and implications for clinical focus on persistent infections. J Natl Cancer Inst. 2008;100:513–7. [PubMed]
6. de Villiers EM, Fauquet C, Broker TR, Bernard HU, zur Hausen H. Classification of papillomaviruses. Virology. 2004;324:17–27. [PubMed]
7. Bouvard V, Baan R, Straif K, et al. A review of human carcinogens--Part B: biological agents. Lancet Oncol. 2009;10:321–2. [PubMed]
8. Schiffman M, Clifford G, Buonaguro FM. Classification of weakly carcinogenic human papillomavirus types: addressing the limits of epidemiology at the borderline. Infect Agent Cancer. 2009;4:8. [PMC free article] [PubMed]
9. Chen Z, Terai M, Fu L, Herrero R, DeSalle R, Burk RD. Diversifying selection in human papillomavirus type 16 lineages based on complete genome analyses. J Virol. 2005;79:7014–23. [PMC free article] [PubMed]
10. Chen Z, DeSalle R, Schiffman M, Herrero R, Burk RD. Evolutionary dynamics of variant genomes of human papillomavirus types 18, 45, and 97. J Virol. 2009;83:1443–55. [PMC free article] [PubMed]
11. Narechania A, Chen Z, DeSalle R, Burk RD. Phylogenetic incongruence among oncogenic genital alpha human papillomaviruses. J Virol. 2005;79:15503–10. [PMC free article] [PubMed]
12. Hazard K, Andersson K, Dillner J, Forslund O. Human papillomavirus subtypes are not uncommon. Virology. 2007;362:6–9. [PubMed]
13. Bernard HU, Calleja-Macias IE, Dunn ST. Genome variation of human papillomavirus types: phylogenetic and medical implications. Int J Cancer. 2006;118:1071–6. [PubMed]
14. Hildesheim A, Schiffman M, Bromley C, et al. Human papillomavirus type 16 variants and risk of cervical cancer. J Natl Cancer Inst. 2001;93:315–8. [PubMed]
15. Bratti MC, Rodriguez AC, Schiffman M, et al. Description of a seven-year prospective study of human papillomavirus infection and cervical neoplasia among 10000 women in Guanacaste, Costa Rica. Rev Panam Salud Publica. 2004;15:75–89. [PubMed]
16. Herrero R, Schiffman MH, Bratti C, et al. Design and methods of a population-based natural history study of cervical neoplasia in a rural province of Costa Rica: the Guanacaste Project. Rev Panam Salud Publica. 1997;1:362–75. [PubMed]
17. Carreon JD, Sherman ME, Guillen D, et al. CIN2 is a much less reproducible and less valid diagnosis than CIN3: results from a histological review of population-based cervical samples. Int J Gynecol Pathol. 2007;26:441–6. [PubMed]
18. Stoler MH, Schiffman M, Atypical Squamous Cells of Undetermined Significance-Low-grade Squamous Intraepithelial Lesion Triage Study (ALTS) Group. Interobserver reproducibility of cervical cytologic and histologic interpretations: realistic estimates from the ASCUS-LSIL Triage Study. JAMA. 2001;285:1500–5. [PubMed]
19. Castle PE, Schiffman M, Gravitt PE, et al. Comparisons of HPV DNA detection by MY09/11 PCR methods. J Med Virol. 2002;68:417–23. [PubMed]
20. Herrero R, Hildesheim A, Bratti C, et al. Population-based study of human papillomavirus infection and cervical neoplasia in rural Costa Rica. J Natl Cancer Inst. 2000;92:464–74. [PubMed]
21. Qu W, Jiang G, Cruz Y, et al. PCR detection of human papillomavirus: comparison between MY09/MY11 and GP5+/GP6+ primer systems. J Clin Microbiol. 1997;35:1304–10. [PMC free article] [PubMed]
22. Chen Z, Schiffman M, Herrero R, Desalle R, Burk RD. Human papillomavirus (HPV) types 101 and 103 isolated from cervicovaginal cells lack an E6 open reading frame (ORF) and are related to gamma-papillomaviruses. Virology. 2007;360:447–53. [PMC free article] [PubMed]
23. Wheeler CM, Yamada T, Hildesheim A, Jenison SA. Human papillomavirus type 16 sequence variants: identification by E6 and L1 lineage-specific hybridization. J Clin Microbiol. 1997;35:11–9. [PMC free article] [PubMed]
24. Ong CK, Chan SY, Campo MS, et al. Evolution of human papillomavirus type 18: an ancient phylogenetic root in Africa and intratype diversity reflect coevolution with human ethnic groups. J Virol. 1993;67:6424–31. [PMC free article] [PubMed]
25. Plummer M, Schiffman M, Castle PE, Maucort-Boulch D, Wheeler CM, ALTS Group. A 2-year prospective study of human papillomavirus persistence among women with a cytological diagnosis of atypical squamous cells of undetermined significance or low-grade squamous intraepithelial lesion. J Infect Dis. 2007;195:1582–9. [PubMed]
26. Fisher RA. On the interpretation of χ2 from contingency tables, and the calculation of P. J Royal Statistical Society. 1922;85:87–94.
27. Lancaster HO. Significance tests in discrete distributions. J Am Statistical Assoc. 1961;56:223–34.
28. Fisher RA. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd; 1954.
29. Burk RD, Chen Z, Van Doorslaer K. Human papillomaviruses: genetic basis of carcinogenicity. Public Health Genomics. 2009;12:281–90. [PMC free article] [PubMed]
30. Xin CY, Matsumoto K, Yoshikawa H, et al. Analysis of E6 variants of human papillomavirus type 33, 52 and 58 in Japanese women with cervical intraepithelial neoplasia/cervical cancer in relation to their carcinogenic potential. Cancer Lett. 2001;170:19–24. [PubMed]
31. Khouadri S, Villa LL, Gagnon S, et al. Human papillomavirus type 33 polymorphisms and high-grade squamous intraepithelial lesions of the uterine cervix. J Infect Dis. 2006;194:886–94. [PubMed]
32. Chan PK, Lam CW, Cheung TH, Li WW, Lo KW, Chan MY, Cheung JL, Cheng AF. Association of human papillomavirus type 58 variant with the risk of cervical cancer. J Natl Cancer Inst. 2002;94:1249–53. [PubMed]
33. zur Hausen H. Papillomaviruses and cancer: from basic studies to clinical application. Nat Rev Cancer. 2002;2:342–50. [PubMed]
34. Gagnon S, Hankins C, Tremblay C, et al. Polymorphism of human papillomavirus type 31 isolates infecting the genital tract of HIV-seropositive and HIV-seronegative women at risk for HIV infection. J Med Virol. 2005;75:213–21. [PubMed]
35. Gagnon S, Hankins C, Tremblay C, et al. Viral polymorphism in human papillomavirus types 33 and 35 and persistent and transient infection in the genital tract of women. J Infect Dis. 2004;190:1575–85. [PubMed]
36. Aho J, Hankins C, Tremblay C, et al. Genomic polymorphism of human papillomavirus type 52 predisposes toward persistent infection in sexually active women. J Infect Dis. 2004;190:46–52. [PubMed]
37. Schlecht NF, Burk RD, Palefsky JM, et al. Variants of human papillomaviruses 16 and 18 and their natural history in human immunodeficiency virus-positive women. J Gen Virol. 2005;86:2709–20. [PubMed]
38. Xi LF, Kiviat NB, Hildesheim A, et al. Human papillomavirus type 16 and 18 variants: race-related distribution and persistence. J Natl Cancer Inst. 2006;98:1045–52. [PubMed]
39. Prado JC, Calleja-Macias IE, Bernard HU, et al. Worldwide genomic diversity of the human papillomaviruses-53, 56, and 66, a group of high-risk HPVs unrelated to HPV-16 and HPV-18. Virology. 2005;340:95–104. [PubMed]
40. Calleja-Macias IE, Villa LL, Prado JC, et al. Worldwide genomic diversity of the high-risk human papillomavirus types 31, 35, 52, and 58, four close relatives of human papillomavirus type 16. J Virol. 2005;79:13630–40. [PMC free article] [PubMed]
41. Diallo AB, Badescu D, Blanchette M, Makarenkov V. A whole genome study and identification of specific carcinogenic regions of the human papilloma viruses. J Comput Biol. 2009;16:1461–73. [PubMed]
42. de Araujo Souza PS, Maciag PC, Ribeiro KB, Petzl-Erler ML, Franco EL, Villa LL. Interaction between polymorphisms of the human leukocyte antigen and HPV-16 variants on the risk of invasive cervical cancer. BMC Cancer. 2008;8:246. [PMC free article] [PubMed]
43. Koshiol J, Hildesheim A, Gonzalez P, et al. Common genetic variation in TP53 and risk of human papillomavirus persistence and progression to CIN3/cancer revisited. Cancer Epidemiol Biomarkers Prev. 2009;18:1631–7. [PMC free article] [PubMed]
44. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17:754–5. [PubMed]
45. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–4. [PubMed]
46. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14:817–8. [PubMed]