Anogenital cancers are associated with about 13 carcinogenic HPV types in a broader group that cause cervical intraepithelial neoplasia (CIN). Multiple concurrent cervical HPV infections are common, which complicates the attribution of HPV types to different grades of CIN. Here we report the analysis of HPV genotype patterns in the ASCUS-LSIL triage study using unsupervised hierarchical clustering. Women who underwent colposcopy at baseline (n = 2780) were grouped into 20 disease categories based on histology and cytology. Disease groups and HPV genotypes were clustered using complete linkage. Risk of 2-year cumulative CIN3+, viral load, colposcopic impression, and age were compared between disease groups and major clusters. Hierarchical clustering yielded four major disease clusters: Cluster 1 included all CIN3 histology with abnormal cytology; Cluster 2 included CIN3 histology with normal cytology and combinations with either CIN2 or high-grade squamous intraepithelial lesion (HSIL) cytology; Cluster 3 included older women with normal or low grade histology/cytology and low viral load; Cluster 4 included younger women with low grade histology/cytology, multiple infections, and the highest viral load. Three major groups of HPV genotypes were identified: Group 1 included only HPV16; Group 2 included nine carcinogenic types plus non-carcinogenic HPV53 and HPV66; and Group 3 included non-carcinogenic types plus carcinogenic HPV33 and HPV45. Clustering results suggested that colposcopy missed a prevalent precancer in many women with no biopsy/normal histology and HSIL. This result was confirmed by an elevated 2-year risk of CIN3+ in these groups. Our novel approach to study multiple genotype infections in cervical disease using unsupervised hierarchical clustering can address complex genotype distributions on a population level.
More than 40 different types of human papillomaviruses (HPV) can infect the anogenital mucosa. Most of these types cause asymptomatic transient infections that may be associated with minor cytological alterations, while approximately a dozen carcinogenic types can cause anogenital cancer (1). HPV16 is by far the most carcinogenic type. HPV18, 31, 33, and 45 follow and together with HPV16 account for more than 90% of HPV-related cancers (2).
Traditionally, cervical cancer was thought to arise through increasingly severe grades of cervical intraepithelial neoplasia, defined by the extent and severity of cellular atypia. However, CIN1 is now known to represent acute HPV infection while CIN3 is precancer (and includes carcinoma in situ). At the transition between acute infection and precancer is an equivocal and poorly reproducible diagnosis called CIN2 (3), which probably represents a mixture of precancer and HPV infection. The cytologic correlates used in screening are high-grade squamous intraepithelial lesion (HSIL) corresponding to CIN2/3 or low-grade SIL (LSIL) corresponding to CIN1. The most common equivocal cytologic abnormalities are called atypical squamous cells of undetermined significance (ASC-US).
The multiple stages of cervical carcinogenesis are being re-defined due to increased understanding of HPV natural history (1;4). We distinguish acute HPV infection, a common and benign condition, from uncommon persistent infection that is the true risk factor for precancer and cancer. The prevalence of HPV types in the genital tract and the association of HPV types with different stages in CIN are related to multiple host and viral factors. HPV infection is easily transmitted by sexual contact while poorly understood immunological factors are related to viral clearance/persistence. The risk of viral persistence and associated development of precancer vary by type. Some non-carcinogenic types from the alpha3/alpha15 species are preferentially detected in the vaginal epithelium, while alpha7 types such as HPV18 and HPV45 are frequently found in endocervical lesions (5). There is no convincing evidence for interaction between multiple cervical HPV infections.
We wish to sort the different histologic and cytologic findings by their relationships with HPV natural history. But HPV prevalence studies using broad HPV genotyping have shown that multiple HPV infections are very common, especially in young women at the peak of their sexual activity (6;7). Current genotyping assays allow detection of up to ~40 HPV genotypes from the same sample, generating complex HPV genotyping data that complicate the attribution of individual HPV genotypes to grades of CIN and corresponding cytology. So far, the complexity has been mainly addressed by restricting analyses to single genotype infections, by attributing genotypes to disease hierarchically based on the HPV prevalence in cervical cancers, or by combining genotypes within phylogenetic species. We previously demonstrated the wide range of potential type attribution to all stages of cervical disease that can only be resolved by genotyping of individual lesions on the cervix (7).
The attribution of individual genotypes to cervical disease is further complicated by the imprecise ascertainment of cervical disease stages. Colposcopy and biopsy frequently miss prevalent precancer (8;9), and even if the worst lesion is not missed, cytology and histology both have limited reproducibility, especially at the lower disease stages (10). For example, in a cross-sectional study, we demonstrated that women with HSIL cytology and normal biopsy results had HPV genotype patterns very similar to those of women with biopsy-confirmed high grade disease, suggesting that the worst lesion was frequently missed in colposcopy (11).
To further our understanding of the spectrum of histologic and cytologic abnormalities in relationship to HPV types, we analyzed HPV genotype distributions in 20 disease categories based on cytology and histology and show their relation to subsequent 2-year risk of CIN3 in a large clinical trial called the ASCUS-LSIL Triage Study (ALTS).
ALTS was a multicenter, randomized clinical trial conducted by the National Cancer Institute to compare three clinical management strategies for women referred with a community cytologic interpretation of ASCUS (n = 3488) or LSIL (n = 1572) cytology (12). At enrollment, cytology was repeated and HPV testing was performed using HC2 (Hybrid Capture 2, Digene Corporation, now Qiagen). Women in the immediate colposcopy (IC) arm were referred to colposcopy regardless of test results. In the HPV Triage arm, women were referred to colposcopy if they had an HC2 positive or missing result at enrollment or if their enrollment cytology was HSIL. Women in the conservative management (CM) arm received colposcopy only in the case of an HSIL cytology result at enrollment. Women were followed for two years, with cytology follow-ups every 6 months. Women with an HSIL cytology result at any of these follow-up visits were referred to colposcopy. Our analysis included all women referred for either ASCUS¹ or LSIL who underwent colposcopy at enrollment and who had a cytology diagnosis in the enrollment period (n = 2780, Figure 1).
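The enrollment referral rules of the three arms can be sketched as a small function. This is only an illustration of the rules as described in the paragraph above: the arm labels ("IC", "HPV", "CM"), the function name, and the encoding of a missing HC2 result as None are assumptions made for the example, not part of the trial protocol. Referral at follow-up visits is not modeled.

```python
from typing import Optional


def referred_at_enrollment(arm: str, cytology: str,
                           hc2_positive: Optional[bool]) -> bool:
    """Return True if a woman is referred to colposcopy at enrollment.

    arm          -- "IC", "HPV" or "CM" (hypothetical labels for the three arms)
    cytology     -- enrollment cytology, e.g. "Normal", "ASCUS", "LSIL", "HSIL"
    hc2_positive -- True/False, or None when the HC2 result is missing
    """
    if arm == "IC":
        # Immediate colposcopy: everyone is referred, whatever the test results.
        return True
    if arm == "HPV":
        # HPV triage: HC2 positive or missing, or HSIL enrollment cytology.
        return hc2_positive is None or hc2_positive or cytology == "HSIL"
    if arm == "CM":
        # Conservative management: only HSIL enrollment cytology.
        return cytology == "HSIL"
    raise ValueError(f"unknown arm: {arm!r}")
```

For instance, under these assumptions a woman in the HPV triage arm with a missing HC2 result and ASCUS cytology is referred, while the same woman in the conservative management arm is not.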
Colposcopy was performed by nurse practitioner colposcopists, general gynecologists, gynecology oncology fellows, or gynecologic oncologists. The type of medical training did not influence the sensitivity of colposcopy to detect CIN3+ in two years of follow-up (7). Before colposcopy, high-resolution photography of the cervix was performed to evaluate visual screening and as additional colposcopy quality control.
At each study visit, a pelvic exam was performed and two cervical specimens were collected. One specimen was preserved in PreservCyt (Cytyc, now Hologic) for cytology and hc2 testing, and the second was preserved in specimen transport medium (STM; Qiagen). The National Cancer Institute and local institutional review boards approved the study and all participants provided written informed consent.
Line Blot Assay (LBA) was performed on enrollment STM specimens as previously described for the detection of 27 individual HPV types (HPV6, 11, 16, 18, 26, 31, 33, 35, 39, 40, 42, 45, 51–59, 66, 68, 73, 82–84) (13). A subset of specimens was retested by Linear Array (LA), a commercialized version of LBA that tests for 37 HPV genotypes, including 26 detected by LBA, as previously described (14).
Residual PreservCyt (Hologic, Bedford, MA) specimens, after being used for liquid-based cytology, were tested by Hybrid Capture 2 (hc2; Qiagen, Gaithersburg, MD), a pooled probe, signal amplification DNA test that targets a group of 13 carcinogenic HPV types.
Treatment was based on cytologic and histologic diagnoses made by the clinical center (CC) pathologists as described previously. For quality control purposes, the Pathology Quality Control Group (QC Pathology) at Johns Hopkins Hospital reviewed referral smears, ThinPreps, and histology slides and provided secondary diagnoses. Excisional treatment by the loop electrosurgical excision procedure (LEEP) was offered to any women receiving a CC histology diagnosis of CIN2 or worse, or a QC diagnosis of CIN3 or worse. At the time of study exit, all women with persistent mild cervical abnormalities were offered treatment by LEEP.
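The treatment threshold described above combines two diagnoses with different cut-offs, which a short sketch may make easier to follow. The ordinal ranking, the added "Cancer" grade (so that "or worse" has a meaning above CIN3), and the function name are assumptions for illustration; the offer of LEEP at study exit for persistent mild abnormalities is not modeled.

```python
from typing import Optional

# Assumed ordinal ranking of histology results (higher = more severe).
# "Cancer" is a hypothetical top grade added only for this example.
SEVERITY = {"No Biopsy": 0, "Normal": 1, "CIN1": 2, "CIN2": 3, "CIN3": 4, "Cancer": 5}


def leep_offered(cc_histology: str, qc_histology: Optional[str] = None) -> bool:
    """Return True if LEEP would be offered under the stated rule.

    LEEP is offered when the clinical center (CC) diagnosis is CIN2 or worse,
    or when the quality control (QC) diagnosis is CIN3 or worse.
    """
    cc_high = SEVERITY[cc_histology] >= SEVERITY["CIN2"]
    qc_high = qc_histology is not None and SEVERITY[qc_histology] >= SEVERITY["CIN3"]
    return cc_high or qc_high
```

Note the asymmetry: a CC diagnosis of CIN2 is enough on its own, whereas a QC diagnosis of CIN2 with a lower CC diagnosis is not.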
Women were separated into twenty categories of cervical disease, formed by crossing enrollment histology diagnosis with cytology result. Disease combinations were consistently labeled using the following format: 'histology result'/'cytology result'. Histology diagnoses were drawn from the QC histology diagnosis when available. If a subject did not have a QC histology diagnosis for the enrollment period, which was rare, her CC enrollment histology diagnosis was used. Histology diagnoses were classified as Normal, CIN1, CIN2, CIN3, or 'No Biopsy' (because no biopsy was taken at colposcopy, indicating a negative colposcopic impression). Enrollment cytology interpretation categories were Normal, ASCUS (including ASC-H), LSIL, or HSIL. Using each individual's human papillomavirus infection status from the LBA results, we calculated type-specific HPV infection frequencies at enrollment for each diagnosis category. Also for each category, we calculated two-year risk of CIN3+ diagnosis and described some other clinical and demographic data.
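The construction of the twenty categories (five histology classes crossed with four cytology classes) and of the type-specific frequencies can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the input layout (a category label paired with a set of detected types per woman) are assumptions, and the example values in the usage note are invented.

```python
from collections import defaultdict
from itertools import product

HISTOLOGY = ["No Biopsy", "Normal", "CIN1", "CIN2", "CIN3"]
CYTOLOGY = ["Normal", "ASCUS", "LSIL", "HSIL"]

# All 'histology result'/'cytology result' labels: 5 x 4 = 20 categories.
ALL_CATEGORIES = [f"{h}/{c}" for h, c in product(HISTOLOGY, CYTOLOGY)]


def disease_category(qc_histology, cc_histology, cytology):
    """Label one woman, preferring the QC histology diagnosis when available."""
    histology = qc_histology if qc_histology is not None else cc_histology
    if histology not in HISTOLOGY or cytology not in CYTOLOGY:
        raise ValueError(f"unexpected diagnosis: {histology!r}/{cytology!r}")
    return f"{histology}/{cytology}"


def type_frequencies(women):
    """Compute type-specific HPV frequencies per disease category.

    women -- iterable of (category, set_of_detected_types) pairs
    Returns {category: {hpv_type: fraction of women in that category}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for category, types in women:
        totals[category] += 1
        for hpv_type in types:
            counts[category][hpv_type] += 1
    return {
        category: {t: n / totals[category] for t, n in counts[category].items()}
        for category in totals
    }
```

With two hypothetical women labeled CIN3/HSIL, one carrying HPV16 alone and one carrying HPV16 and HPV31, `type_frequencies` would report a frequency of 1.0 for HPV16 and 0.5 for HPV31 in that category. Because a woman with a multiple infection contributes to several types at once, the frequencies within a category need not sum to one.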
In order to examine patterns of HPV infection in these twenty diagnosis categories, we used hierarchical clustering to compare HPV genotype patterns in each disease group. We used complete linkage and a Euclidean distance metric. We simultaneously clustered both disease combinations and HPV genotypes and created dendrograms to visualize the clustering using the TreeView software. For sensitivity analyses, we also performed the same hierarchical clustering 1) using LA results instead of the LBA results (LA results were only available in women referred into ALTS with an ASCUS Pap) and 2) restricted to single-type HPV infections. We also recategorized the women based on worst two-year histology and enrollment cytology and performed the same analyses to examine the effect of misclassification of disease at the time of enrollment on the HPV frequency patterns. We confirmed the clustering results by performing a k-means cluster analysis specifying three and four clusters.
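For readers unfamiliar with the method, agglomerative clustering with complete linkage and Euclidean distance can be sketched in a few lines. This is a naive illustration of the technique, not the software used in the study; the function names are hypothetical, and the group labels and frequency values in the usage note are invented placeholders.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equally long frequency profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage.

    profiles -- dict mapping a label to its list of genotype frequencies
    Returns the merges in order as (members_a, members_b, distance) tuples,
    where the distance between two clusters is the largest pairwise
    distance between their members.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = max(euclidean(profiles[i], profiles[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two closest clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, d))
    return merges
```

Calling `complete_linkage({"A": [0.70, 0.10], "B": [0.60, 0.12], "C": [0.10, 0.02], "D": [0.08, 0.03]})` on these made-up profiles first joins C with D, then A with B, and finally joins the two pairs. Clustering the genotypes instead of the disease groups amounts to running the same procedure on the transposed frequency table, with one profile per HPV type.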
Supplemental table 1 displays the HPV genotype prevalence in all histology/cytology combinations. In 18 of the 20 groups the most frequent genotype was HPV16, the exceptions being the groups normal/LSIL and CIN1/normal. Within each histology category, the women with HSIL cytology had the highest frequency of HPV16 infection; HPV16 frequency also increased with increasing histological severity. One hundred seventeen of 165 (71%) women with CIN3/HSIL were infected with HPV16. Multiple infections were very common in this population, but varied significantly between disease groups from an average number of 0.74 types in women with normal/normal to an average number of 2.38 types in women with CIN2/LSIL. Across all histologic categories, most infections were found in women with LSIL cytology (Supplemental table 1).
Unsupervised hierarchical clustering of disease groups by genotype frequencies yielded a tree with four major histology/cytology clusters (Figure 2, Table 1). Cluster 1 included CIN3/ASCUS, CIN3/LSIL, and CIN3/HSIL. Cluster 2 included CIN3/normal, all less severe histology with HSIL except for CIN1/HSIL, and CIN2/normal. Cluster 3 included normal histology or no biopsy with normal, ASCUS, or LSIL cytology. Cluster 4 included CIN2/ASCUS, CIN2/LSIL, normal/LSIL, and all CIN1 except for CIN1/normal.
We observed very similar clustering of disease combinations when restricting the analysis to women referred for ASCUS cytology only using either LBA or LA HPV genotyping data, although some histology/cytology combinations had fewer than 20 cases and were excluded. When restricting to cases with single-type HPV infections only, many combinations had only very small numbers and generated unstable clusters. K-means clustering specifying four disease clusters reproduced the same grouping pattern that was observed with unsupervised hierarchical clustering.
HPV genotypes were clustered in three major groups (Figure 2, Table 2): HPV16 clustered separately from all other HPV genotypes, driven by its high frequency across all disease stages (HPV cluster 1). Nine carcinogenic types plus HPV66 and HPV53 were included in HPV cluster 2; the first subgroup included only α9 types, the second mainly α7 types, and the third was dominated by α6 and α9 types. All remaining non-carcinogenic types as well as HPV33 and HPV45 were included in HPV cluster 3; there was no specific distribution of HPV clades in the three subgroups of cluster 3. HPV genotypes in the first and second HPV clusters showed a differential distribution across disease categories with their lowest prevalence in the disease cluster 3 and the highest prevalence in the disease cluster 4. Within the third HPV group, we observed two sub-groups: 1) a subset of HPV genotypes with very low prevalence and no disease-specific distribution (HPV11, HPV26, HPV40, and HPV57 in HPV cluster 3a) and 2) eleven types with higher prevalence evenly distributed across all disease clusters (HPV clusters 3b+c). Exclusion of non-carcinogenic types had only a minor impact on disease clustering, while exclusion of carcinogenic types produced a completely different clustering of disease combinations, suggesting that the clustering was driven by carcinogenic types. Overall, the HPV clusters were less distinct than the disease clusters, as indicated by the distance metrics and visualized by the flat branching in the HPV genotype dendrogram.
We studied two-year risk of CIN3 within the individual disease groups and within disease clusters (Figure 3, Table 1). The 2-year risk of CIN3 differed significantly between the clusters (p<0.001). Cluster 1 included only women with CIN3. Women in cluster 2 (including some women with CIN3 histology) had a 20% risk of CIN3 on average. The risk was much lower for women in cluster 4 (6.5%) and lowest for women in cluster 3 (3.5%). The clustering of women with normal or no histology and HSIL cytology in clusters with high grade histology and cytology is also reflected by their higher risk of developing CIN3: Among women with normal histology at enrollment, 16.5% of women with HSIL enrollment cytology (cluster 2) had a worst 2-year histology result of CIN3+, compared to 3.3% of women who had normal enrollment cytology (cluster 3) (p<0.001).
We also studied the clinical and demographic characteristics across the four disease clusters (Figure 3). Women in cluster 3 were oldest with a median age of 27 while women in cluster 4 were youngest with a median age of 23, similar to clusters 1 and 2 (median age of 24) (p<0.001). Semi-quantitative viral load determined by hc2 signal strength was highest in cluster 4 and lowest in cluster 3 (p<0.001). The frequency of high grade colposcopy was highest in cluster 1, followed by clusters 2 and 4, and lowest in cluster 3 (p<0.001), a pattern confirmed for the review of Cervigram images (p<0.001). The frequency of current smoking increased with increasing severity of disease in a cluster (1 > 2 > 4 > 3), further suggesting that smoking is indeed a cofactor for the development of precancerous lesions.
In this analysis, we used a novel approach to study complex HPV genotype patterns in cervical disease and to address the misclassification of cervical disease stages defined by histology and cytology. Both histology and cytology are subjective methods with limited reproducibility. Colposcopic biopsy frequently misses the worst lesion on the cervix and augments the problem of disease misclassification (8;9). Here, we performed unsupervised hierarchical clustering of HPV genotyping data to agnostically define disease groups with similar HPV genotype patterns. We identified 4 disease clusters based on their unique HPV genotype distributions. The first cluster included women with CIN3 and HSIL, LSIL, or ASCUS cytology. The second cluster included mainly women with HSIL and/or CIN2. The third cluster included only women without high grade histology or cytology and low levels of viral infection, while the fourth cluster included women with mild to moderate dysplasia and cytologic signs of active viral infections (ASCUS and LSIL). We confirmed a previous finding that women with HSIL cytology but without histologically confirmed high-grade biopsy results clustered with histologically-confirmed high-grade disease (11). We now show that these women also have a high risk of being detected with CIN3+ in the subsequent two years, corroborating that colposcopy-biopsy missed some prevalent precancer.
Women grouped in the four clusters had distinct clinical characteristics: Women in the first cluster all had prevalent CIN3 and accordingly, they were the most likely to have visual and microscopic evidence of abnormalities and higher viral load. Women in the second cluster had the second highest risk of CIN3, had fewer abnormal cervical impressions and lower viral load. Women in cluster three were slightly older, had the lowest number of HPV infections across all types, few abnormal cervical impressions, very low viral load and the lowest risk of CIN3. In contrast, women in the fourth cluster were slightly younger, had the highest number of infections, the highest viral load and an intermediate risk of CIN3, lower than cluster two.
These results were obtained using HPV genotyping for 27 HPV types without applying any weighting or hierarchical attribution by genotype (e.g. giving more weight to carcinogenic types). The four clusters separate out distinct groups with different characteristics, partly reflecting different stages of the natural history of HPV-related disease: Younger women are likely to have multiple infections that are mainly productive rather than transforming and associated with high viral load (cluster four). Older women have fewer infections and lower viral load if there is no prevalent disease (cluster three). Women with prevalent precancer have fewer infections with many carcinogenic types (most importantly HPV16) and may have high viral load (clusters one and two). Admittedly, the age range in the ALTS population is limited and the difference in median age between clusters three and four is very small.
It is important to note that the risk prediction for women with CIN2 detected at baseline is limited by censoring, as most women with CIN2 had a LEEP procedure, interrupting the natural history. The counterintuitive grouping of CIN2/ASCUS and CIN2/LSIL in disease cluster four, while CIN2/normal is grouped in disease cluster two, is driven by higher prevalence of genotypes from HPV cluster three, which mainly includes non-carcinogenic types. Restricting the clustering to carcinogenic types led to a closer grouping of CIN2 disease groups.
The second dimension of cluster analysis identified three major clusters of HPV genotypes: HPV16 clustered separately from all other HPV genotypes and showed the closest association to risk of CIN3 among all types. The second HPV cluster included nine high risk types plus HPV53 and HPV66 and was found at higher frequencies in disease clusters 1, 2, and 4, while the third HPV cluster included mainly non-carcinogenic types plus HPV33 and HPV45 and showed frequencies distributed more evenly across the disease clusters. Although HPV33 and HPV45 had the highest prevalence in CIN3/HSIL in cluster 3, their more uniform distribution in comparison to other carcinogenic types most likely caused their grouping with non-carcinogenic types. The carcinogenicity of HPV53, HPV66, and HPV68 has been widely debated. In the most recent IARC classification, HPV53 and HPV66 were considered possibly carcinogenic (Group 2B), while HPV68 was classified as probably carcinogenic (Group 2A) because of experimental and phylogenetic evidence but without strong supporting epidemiologic data (17;18). The clustering of HPV53 and HPV66 with carcinogenic types in our analysis reflects their ability to cause a spectrum of mild and more severe precursor lesions, including CIN3, but little or no chance of invasion. The correlation of genotype clustering with the phylogenetic clades (19) and the WHO carcinogen classification (17) was quite good (Table 2) and we think that the remaining discrepancies are mainly related to the lack of cancers in the disease spectrum we analyzed. Still, while our approach can indicate which genotypes are important in the progression to precancer, it cannot identify the causal type in a multiple infection. Ultimately, lesion-specific genotyping is required to precisely attribute HPV genotypes to cervical precancer.
With our analytic approach we were able to show the complex relation of HPV genotypes and cervical disease in 2780 women from ALTS in a single figure. As exemplified in our study, it is possible to use this technique to address disease misclassification. For example, the same approach can be used to address the heterogeneity of CIN2, a very diverse category that, depending on the local histological interpretation, may include a lot of low grade disease (more likely to be in cluster 4) or that is more similar to CIN3 (more likely to be in cluster 2). Our clustering approach allows studying type attribution to disease in different regions of the world by comparing type allocation to genotype clusters. Similarly, it can be used to analyze shifts in type attribution to disease in vaccinated populations.
Due to the frequent misclassification of cervical disease, the disease groups used in the analysis are a combination of true results, cases in whom the worst lesion was missed in colposcopy, and cases in whom histology and/or cytology was under- or overcalled. Despite the common notion that cervical histology is the gold standard of disease ascertainment, in our analysis, cytology was an important indicator of subsequent risk of CIN3+ within the group of normal/low grade histology, reflected by HPV genotype patterns similar to those found in histology-confirmed high grade disease.
In summary, we present a novel solution to the handling of complex HPV genotype data in cervical disease and applied it in the prospective ASCUS-LSIL triage study. We show that HPV genotype patterns at various cervical disease stages are complex, but distinctive when analyzed in aggregate. Our approach allows HPV prevalence in disease groups to be displayed easily and may be used to compare the distribution of HPV genotypes within diagnostic categories between different populations.
Financial Support and Conflict of Interest Statement: The research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Some of the equipment and supplies used in the ALTS trial were donated or provided at reduced cost by Digene (Gaithersburg, MD), Cytyc (Boxborough, MA), National Testing Laboratories (Fenton, MO), DenVu (Tucson, AZ), TriPath Imaging (Burlington, NC), and Roche Molecular Systems (Alameda, CA). Roche Molecular Systems provided reagents and research support to the lab of Patti E. Gravitt. Cosette M. Wheeler has received support through her institution from Roche Molecular Systems for HPV genotyping studies.
¹ ASCUS under the 1991 Bethesda system was slightly more inclusive, particularly of probable reactive changes and ASC-H (atypical squamous cells, cannot rule out high-grade intraepithelial lesion), than the ASC-US category of the 2001 Bethesda system.