In this work, over 12,000 cervicovaginal samples from women in the Americas (Costa Rica), Africa (Rwanda, Zambia, and Burkina Faso) and Asia (China, Taiwan, and Indonesia) were tested for HPV, and 120 complete genomes were sequenced from nearly 2,000 HPV16-related alpha-9 HPV isolates (HPV31, 33, 35, 52, 58 and 67). These isolates were selected on the basis of URR/E6 sequence analysis to identify samples that represented or formed major variant lineages and that had the most diverse URR/E6 regions for each type. Based on the analyses of these genomes, two aspects of this study deserve further consideration. First, the description of the HPV16-related alpha-9 type variants provides a framework for establishing a nomenclature for variant lineages. Second, we discuss an emerging picture of the evolution of this highly pathogenic clade of HPVs.
Isolates of the same HPV type were originally considered "variants" when their L1 genes differed by 1 to 2% in nucleotide sequence; however, the L1 ORF does not contain the optimal sequence information for distinguishing closely related HPV variants. As part of the ICTV Papillomavirus Study Group, we were recently assigned the task of developing a classification system for HPV variants. In contrast to the genus, species and type definitions, which are based on the L1 ORF nucleotide sequence, we set the criteria for classification and nomenclature of variant lineages and sublineages using the complete genome, because the changes in recently evolved variant genomes are not always evenly distributed throughout the genome. To define distinct variant lineages, we used a nucleotide sequence difference of approximately 1.0% between two or more variants of the same type. This value was derived from empirical data on the distribution of differences between genomes of the same type (see Figure S1). Similarly, complete-genome differences of 0.5%–1.0% were used to identify sublineages. Each variant lineage was classified and named with an alphanumeric value. The prototype sequence (i.e., the cloned genome designated as the original type) is always designated variant lineage A and/or sublineage A1.
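The lineage and sublineage cutoffs reduce to simple arithmetic on pairwise differences between complete genomes. The Python sketch below illustrates that arithmetic only and is not the authors' pipeline. It assumes the genomes are already aligned to equal length. It uses the uncorrected p-distance (the fraction of differing sites, skipping gap `-` and ambiguous `N` columns) as the measure of "nucleotide sequence difference", and it treats the approximate 1.0% and 0.5% values as hard boundaries. The names `p_distance`, `classify_pair`, `mutate` and the cutoff constants are hypothetical. In practice, lineage assignment also depends on phylogenetic tree topology rather than on any single pairwise comparison.

```python
"""Sketch: apply approximate lineage/sublineage cutoffs to a pair of aligned genomes."""

# Hypothetical hard cutoffs based on the approximate values in the text:
# ~1.0% separates lineages; 0.5%-1.0% separates sublineages.
LINEAGE_CUTOFF = 0.010
SUBLINEAGE_CUTOFF = 0.005

SKIP = set("-N")  # gap and ambiguous-base symbols excluded from comparison


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared alignment columns at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIP or b in SKIP:
            continue  # ignore columns with a gap or an N in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def classify_pair(seq_a: str, seq_b: str) -> str:
    """Label a pair of variants of the same type using the hypothetical cutoffs."""
    d = p_distance(seq_a, seq_b)
    if d >= LINEAGE_CUTOFF:
        return "different lineages"
    if d >= SUBLINEAGE_CUTOFF:
        return "different sublineages"
    return "same sublineage"


def mutate(seq: str, positions) -> str:
    """Toy helper: introduce transitions at the given 0-based positions."""
    transition = {"A": "G", "G": "A", "C": "T", "T": "C"}
    bases = list(seq)
    for i in positions:
        bases[i] = transition[bases[i]]
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGT" * 250  # toy 1,000-nt "genome"
    print(classify_pair(reference, mutate(reference, range(0, 120, 10))))  # different lineages (12 diffs)
    print(classify_pair(reference, mutate(reference, range(0, 70, 10))))   # different sublineages (7 diffs)
    print(classify_pair(reference, mutate(reference, range(0, 20, 10))))   # same sublineage (2 diffs)
```

For a papillomavirus genome of roughly 8 kb, a 1.0% difference corresponds to about 80 differing nucleotide positions.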
Variants of HPV31, 33, 52, 58 and 67, like those of HPV16 and HPV18, form at least two deeply separated clades, suggesting codivergence of host and virus as different lineages diversified from their most recent common ancestor (MRCA). HPV35 variants are highly conserved and did not meet the criteria for classification into more than one lineage. This probably reflects a recent divergence from the MRCA of the HPV31, HPV35 and HPV16 clade. Alternatively, another HPV35 variant lineage might exist in an isolated and/or unsampled population, or it could have been lost through genetic isolation and/or host demise.
Although HPV16 and HPV18 variants are associated with specific geographic locations, the geographic distribution and ethnic associations of HPV31, 33, 35, 52, 58 and 67 variant lineages are not well established. We believe an alphanumeric nomenclature is preferable to one based on geographic names, since it avoids the problem of naming a lineage that is found in multiple geographic areas.
A number of investigators have classified isolates into variant lineages or groups by PCR amplifying and sequencing one or a few informative segments. For example, HPV58 variants containing the E7 SNPs C632T and G760A (aa 63G), which have been reported to be associated with higher cervical cancer risk, can be classified into HPV58 sublineage A3 (Figure S2E). The C7732G SNP in HPV33 variants, which results in the loss of a putative binding site for the cellular upstream stimulatory factor, has also been associated with high-grade squamous intraepithelial lesions (HSILs). HPV33 C7732G is a lineage-specific SNP within the URR and marks HPV33 variant sublineage A2 (Figure S2B). The URR contains many cis-acting regulatory sequences; variations within these motifs may alter viral transcription and replication. Alternatively, these changes may be markers of other linked nucleotide changes within a lineage. A few studies have reported that alpha-9 HPV variants differ in risk of persistence, and for some HPV genotypes, variant lineages or sublineages (e.g., HPV35 A1) differ in their risk of CIN3+. Knowledge of the complete genome sequences and phylogenetic structure will help clarify the clinical role that sequence variation plays in genotype-phenotype associations. An important point of the current analysis is that individual SNPs, or groups of SNPs, need to be interpreted in light of the high correlation among sets of SNPs within each lineage. We have previously termed the stochastic process by which papillomavirus genomes accumulate mutations "lineage fixation". This has important practical implications: investigators sequencing or analyzing different regions of the genome will now be able to classify variants into lineages for genotype-phenotype studies using the sequence data presented in Figure S2. Thus, a common nomenclature will allow HPV researchers to discuss the properties of HPV variant lineages without having to describe the sets of nucleotide changes that define a group of HPV variants. This will be particularly useful for future studies of the alpha-9 species group, an abundant group of related HPVs with high pathogenic potential.
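Because SNPs co-segregate within lineages, a single diagnostic SNP read from a short amplicon can place an isolate in a lineage or sublineage. The Python sketch below shows that lookup logic, assuming the input is a set of position-to-base calls in prototype-genome coordinates. The marker table is deliberately limited to the two examples given in the text: HPV58 E7 C632T plus G760A for A3, and HPV33 C7732G for A2. `MARKERS` and `assign_lineage_label` are hypothetical names. A usable classifier would need the full set of lineage-specific SNPs, such as those tabulated in Figure S2.

```python
"""Sketch: assign a lineage/sublineage label from diagnostic SNP calls."""
from typing import Dict, Optional

# Illustrative marker table containing ONLY the two examples cited in the text.
# Positions are assumed to be in the coordinates of each type's prototype genome;
# each label requires ALL of its listed alleles to be observed.
MARKERS: Dict[str, Dict[str, Dict[int, str]]] = {
    "HPV58": {"A3": {632: "T", 760: "A"}},  # E7 C632T and G760A
    "HPV33": {"A2": {7732: "G"}},           # URR C7732G
}


def assign_lineage_label(hpv_type: str, calls: Dict[int, str]) -> Optional[str]:
    """Return the first label whose marker alleles all appear in `calls`
    (position -> observed base); None if no label matches, including when
    a required marker position was not sequenced."""
    for label, alleles in MARKERS.get(hpv_type, {}).items():
        if all(calls.get(pos, "").upper() == base for pos, base in alleles.items()):
            return label
    return None


if __name__ == "__main__":
    print(assign_lineage_label("HPV58", {632: "T", 760: "A"}))  # A3
    print(assign_lineage_label("HPV58", {632: "T"}))            # None (760 not sequenced)
    print(assign_lineage_label("HPV33", {7732: "g"}))           # A2
```

A `None` result means only that none of the listed markers matched. It does not imply that the isolate belongs to the prototype sublineage A1.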
Papillomavirus genomes accumulate SNPs and indels (see Figure S4) through a stochastic process, at mutation rates similar to those of the host genomes they infect. This reflects the fact that PVs use the host's DNA replication machinery to copy and amplify their genomes; natural selection has likely played an important role over the course of evolution in filtering and fixing nucleotide changes within variant lineages. Whether the variation seen in highly related genomes (i.e., the variants described in this report) is determined by selection or by genetic drift remains to be determined. However, the relatively recent evolution of the alpha-9 group of HPV types and variants cannot easily be explained by natural selection. Analyses designed to detect selection at individual codon positions identified only a few scattered nucleotide sites under Darwinian selection. This might be expected of viruses that have existed for hundreds of millions of years and have perfected a survival strategy and optimized their structural components through natural selection. This inference is supported by strong purifying selection and conserved genome regions, which are most evident in the L1, L2, E1 and E2 ORFs. Other regions of the genome have more flexibility to adapt HPVs to different biological niches, but the exact mechanisms and sequences responsible for these changes have not been identified. Moreover, because recombination is not a major force in papillomavirus evolution, the correlation between SNPs does not depend on physical distance. This contrasts with the linkage disequilibrium (LD) blocks observed in the human HapMap and 1000 Genomes projects. Instead, in HPV evolution, changes in one region of the genome are highly correlated with those in other regions among variants of the same lineage. Nevertheless, there is at least one example of recombination between a polyomavirus and a papillomavirus, indicating that recombination has occurred in the distant past and could occur in the future. To date, our laboratory has not observed direct evidence of recombination in human papillomavirus genomes, and there are no compelling data to suggest that recombination is important in the evolution of the alpha-9 HPVs.
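The contrast with LD blocks can be made concrete with the standard pairwise r² measure of linkage disequilibrium. The toy Python example below uses invented haplotypes, not data from this study, for six isolates of one hypothetical type. Biallelic SNPs are coded 0 for the prototype allele and 1 for the variant allele. Three SNPs fixed on the branch leading to a second lineage are perfectly correlated even though they lie thousands of bases apart. In contrast, a private SNP carried by a single isolate is only weakly correlated with a neighboring SNP about 500 bases away. The positions, haplotypes and function names are all hypothetical.

```python
"""Toy illustration: SNP correlation (r^2) under clonal, non-recombining descent."""
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical SNP positions (nt) and haplotypes; 0 = prototype allele, 1 = variant.
POSITIONS: List[int] = [120, 3510, 4020, 7730]
HAPLOTYPES: List[List[int]] = [
    [0, 0, 0, 0],  # lineage A isolate
    [0, 0, 1, 0],  # lineage A isolate carrying a private SNP at 4020
    [0, 0, 0, 0],  # lineage A isolate
    [1, 1, 0, 1],  # lineage B isolates share all three lineage-defining SNPs
    [1, 1, 0, 1],
    [1, 1, 0, 1],
]


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> float:
    """Pairwise LD r^2 = D^2 / (pA(1-pA) pB(1-pB)) for two biallelic sites."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic site")
    return d * d / denom


def pairwise_r2(positions, haplotypes) -> Dict[Tuple[int, int], float]:
    """r^2 for every pair of SNP columns, keyed by (position_i, position_j)."""
    columns = list(zip(*haplotypes))  # transpose isolates x sites -> sites x isolates
    return {
        (positions[i], positions[j]): r_squared(columns[i], columns[j])
        for i, j in combinations(range(len(positions)), 2)
    }


if __name__ == "__main__":
    for (x, y), r2 in pairwise_r2(POSITIONS, HAPLOTYPES).items():
        print(f"{x:>5} vs {y:>5}  distance {y - x:>5} nt  r^2 = {r2:.2f}")
```

In this toy data the SNPs 7,610 nt apart (120 and 7730) have r² = 1.00, while those only 510 nt apart (3510 and 4020) have r² = 0.20. In a recombining genome such as the human genome, r² would instead tend to decline with distance, producing LD blocks.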
In summary, we present an extensive description of the HPV16-related alpha-9 papillomavirus variants. We provide a taxonomy and nomenclature of these variants that should be useful for evolutionary biologists, virologists, epidemiologists and health care workers. Nevertheless, the mechanisms of adaptation and oncogenic pathogenicity of the alpha-9 HPVs will require additional study, and these viruses will remain an important cause of morbidity and mortality, especially from cervical cancer, for decades to come.