In this analysis, we used a novel approach to study complex HPV genotype patterns in cervical disease and to address the misclassification of cervical disease stages defined by histology and cytology. Both histology and cytology are subjective methods with limited reproducibility. Colposcopic biopsy frequently misses the worst lesion on the cervix and augments the problem of disease misclassification (8; 9). Here, we performed unsupervised hierarchical clustering of HPV genotyping data to agnostically define disease groups with similar HPV genotype patterns. We identified 4 disease clusters based on their unique HPV genotype distributions. The first cluster included women with CIN3 and HSIL, LSIL, or ASCUS cytology. The second cluster included mainly women with HSIL and/or CIN2. The third cluster included only women without high grade histology or cytology and low levels of viral infection, while the fourth cluster included women with mild to moderate dysplasia and cytologic signs of active viral infections (ASCUS and LSIL). We confirmed a previous finding that women with HSIL cytology but without histologically confirmed high-grade biopsy results clustered with histologically-confirmed high-grade disease (11). We now show that these women also have a high risk of being detected with CIN3+ in the subsequent two years, corroborating that colposcopy-biopsy missed some prevalent precancer.
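As an illustration of the clustering step described above, the following is a minimal sketch rather than the analysis code used in this study. It assumes a matrix of genotype prevalences with one row per histology/cytology disease group and one column per HPV type, and it assumes average linkage on Euclidean distances, neither of which is specified in the text. The group names, type names, and all values are invented toy data, not ALTS results.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical disease groups (rows) and HPV types (columns).
# All prevalence values are invented for illustration only.
disease_groups = ["group_A", "group_B", "group_C", "group_D", "group_E", "group_F"]
hpv_types = ["type_1", "type_2", "type_3", "type_4"]
prevalence = np.array([
    [0.60, 0.20, 0.10, 0.05],
    [0.55, 0.25, 0.10, 0.05],
    [0.10, 0.10, 0.05, 0.05],
    [0.08, 0.12, 0.05, 0.04],
    [0.15, 0.40, 0.45, 0.40],
    [0.12, 0.45, 0.40, 0.42],
])

# Agglomerative clustering of disease groups by their genotype profiles.
# Linkage method and distance metric are assumptions of this sketch.
tree = linkage(prevalence, method="average", metric="euclidean")

# Cut the dendrogram into a fixed number of clusters (3 for this toy matrix).
labels = fcluster(tree, t=3, criterion="maxclust")

for name, label in zip(disease_groups, labels):
    print(f"{name}: disease cluster {label}")
```

No genotype is weighted more heavily than another in this sketch, so groups are joined purely on the similarity of their overall prevalence profiles.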
Women grouped in the four clusters had distinct clinical characteristics: Women in the first cluster all had prevalent CIN3 and accordingly, they were the most likely to have visual and microscopic evidence of abnormalities and higher viral load. Women in the second cluster had the second highest risk of CIN3, had fewer abnormal cervical impressions and lower viral load. Women in cluster three were slightly older, had the lowest number of HPV infections across all types, few abnormal cervical impressions, very low viral load and the lowest risk of CIN3. In contrast, women in the fourth cluster were slightly younger, had the highest number of infections, the highest viral load and an intermediate risk of CIN3, lower than cluster two.
These results were obtained using HPV genotyping for 27 HPV types without applying any weighting or hierarchical attribution by genotype (e.g. giving more weight to carcinogenic types). The four clusters separate out distinct groups with different characteristics, partly reflecting different stages of the natural history of HPV-related disease: Younger women are likely to have multiple infections that are mainly productive rather than transforming and associated with high viral load (cluster four). Older women have fewer infections and lower viral load if there is no prevalent disease (cluster three). Women with prevalent precancer have fewer infections, many of them with carcinogenic types (most importantly HPV16), and may have high viral load (clusters one and two). Admittedly, the age range in the ALTS population is limited and the difference in median age between clusters three and four is very small.
It is important to note that the risk prediction for women with CIN2 detected at baseline is limited by censoring, as most women with CIN2 had a LEEP procedure, interrupting the natural history. The counterintuitive grouping of CIN2-ASCUS and CIN2-LSIL in disease cluster four, while CIN2-normal is grouped in disease cluster two, is driven by a higher prevalence of genotypes from HPV cluster three, which mainly includes non-carcinogenic types. Restricting the clustering to carcinogenic types led to a closer grouping of CIN2 disease groups.
The second dimension of cluster analysis identified three major clusters of HPV genotypes: HPV16 clustered separately from all other HPV genotypes and showed the closest association to risk of CIN3 among all types. The second HPV cluster included nine high risk types plus HPV53 and HPV66 and was found at higher frequencies in disease clusters 1, 2, and 4, while the third HPV cluster included mainly non-carcinogenic types plus HPV33 and HPV45, and showed frequencies distributed more evenly across the disease clusters. Although HPV33 and HPV45 had the highest prevalence in CIN3/HSIL in cluster 3, their more uniform distribution in comparison to other carcinogenic types most likely caused their grouping with non-carcinogenic types. The carcinogenicity of HPV53, HPV66, and HPV68 has been widely debated. In the most recent IARC classification, HPV53 and HPV66 were considered possibly carcinogenic (Group 2B), while HPV68 was classified as probably carcinogenic (Group 2A) because of experimental and phylogenetic evidence but without strong supporting epidemiologic data (17; 18). The clustering of HPV53 and HPV66 with carcinogenic types in our analysis reflects their ability to cause a spectrum of mild and more severe precursor lesions, including CIN3, but little or no chance of invasion. The correlation of genotype clustering with the phylogenetic clades (19) and the WHO carcinogen classification (17) was quite good, and we think that the remaining discrepancies are mainly related to the lack of cancers in the disease spectrum we analyzed. Still, while our approach can indicate which genotypes are important in the progression to precancer, it cannot identify the causal type in a multiple infection. Ultimately, lesion-specific genotyping is required to precisely attribute HPV genotypes to cervical precancer.
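The second clustering dimension can be sketched in the same way by clustering the columns (genotypes) of the prevalence matrix instead of the rows. The example below is a hedged illustration only: the matrix, the group and type names, the choice of average linkage with Euclidean distance, and the cut into three genotype clusters for the toy data are all assumptions made for this sketch and do not reproduce the study's data or settings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, leaves_list

# Hypothetical toy matrix: rows are disease groups, columns are HPV types.
# Values are invented prevalences, not study results.
disease_groups = ["group_1", "group_2", "group_3", "group_4"]
hpv_types = ["type_A", "type_B", "type_C", "type_D", "type_E"]
prevalence = np.array([
    [0.60, 0.20, 0.22, 0.05, 0.06],
    [0.40, 0.18, 0.20, 0.05, 0.05],
    [0.05, 0.03, 0.04, 0.04, 0.05],
    [0.15, 0.25, 0.24, 0.06, 0.05],
])

# Cluster genotypes: each genotype is described by its prevalence
# across disease groups, so the matrix is transposed first.
type_tree = linkage(prevalence.T, method="average", metric="euclidean")
type_labels = fcluster(type_tree, t=3, criterion="maxclust")

# Cluster disease groups as well, to order both axes for display.
group_tree = linkage(prevalence, method="average", metric="euclidean")

# Dendrogram leaf order gives a row/column ordering for a heatmap.
row_order = leaves_list(group_tree)
col_order = leaves_list(type_tree)
ordered = prevalence[np.ix_(row_order, col_order)]

for name, label in zip(hpv_types, type_labels):
    print(f"{name}: genotype cluster {label}")
print(ordered.round(2))
```

In this toy matrix, a type whose prevalence rises steeply in only some disease groups separates from types with a flat distribution across groups, which is the kind of pattern that places uniformly distributed types together regardless of their carcinogenicity.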
With our analytic approach we were able to show the complex relation of HPV genotypes and cervical disease in 2780 women from ALTS in a single figure. As exemplified in our study, it is possible to use this technique to address disease misclassification. For example, the same approach can be used to address the heterogeneity of CIN2, a very diverse category that, depending on the local histological interpretation, may include a large share of low grade disease (more likely to be in cluster 4) or disease that is more similar to CIN3 (more likely to be in cluster 2). Our clustering approach allows studying type attribution to disease in different regions of the world by comparing type allocation to genotype clusters. Similarly, it can be used to analyze shifts in type attribution to disease in vaccinated populations.
Due to the frequent misclassification of cervical disease, the disease groups used in the analysis are a combination of true results, cases in whom the worst lesion was missed in colposcopy, and cases in whom histology and/or cytology was under- or overcalled. Despite the common notion that cervical histology is the gold standard of disease ascertainment, in our analysis, cytology was an important indicator of subsequent risk of CIN3+ within the group of normal/low grade histology, reflected by HPV genotype patterns similar to those found in histology-confirmed high grade disease.
In summary, we present a novel solution to the handling of complex HPV genotype data in cervical disease and apply it in the prospective ASCUS-LSIL Triage Study (ALTS). We show that HPV genotype patterns at various cervical disease stages are complex, but distinctive when analyzed in aggregate. Our approach allows HPV prevalence in disease groups to be displayed easily and may be used to compare the distribution of HPV genotypes within diagnostic categories between different populations.