Search tips
Search criteria 


Logo of jvirolPermissionsJournals.ASM.orgJournalJV ArticleJournal InfoAuthorsReviewers
J Virol. 2013 August; 87(16): 9313–9322.
PMCID: PMC3754072

Human Papillomavirus Type 58 Genome Variations and RNA Expression in Cervical Lesions


Human papillomavirus type 58 (HPV58) is relatively prevalent in China and other Asian countries. In this study, the HPV58 genome in cervical lesions was decoded from five grade 2 or 3 cervical intraepithelial neoplasia lesion (CIN2/3) samples and five cervical cancer tissues using rolling-circle amplification of total cell DNA and deep sequencing and verified by whole-genome cloning and sequencing. HPV58 isolates from China feature a total of 52 nucleotide substitutions (0.66%) from the reference HPV58 sequence, which appear mainly in two regions, with 12 from nucleotides (nt) 3430 to 4136 covering the E2/E4/E5 open reading frames (ORFs) and 13 from nt 4621 to 5540 covering the L2 ORF; these could be grouped as HPV58 Chinese Zhejiang-1, -2, and -3 (CNZJ-1, -2, and -3) according to their sequence similarities and restriction enzyme digestion. Phylogenetically, CNZJ-3 is similar to the reference HPV58 sublineage A1 sequence. The other two are close to sublineage A2. Analysis of cervical lesion-derived RNA revealed abundant HPV58 early transcripts spliced at the E6 and E1/E2 ORFs, where two 5′ splice sites at nt 232 and nt 898 and two 3′ splice sites at nt 510 and nt 3355 can be identified. Thus, our study represents the first genome-wide analysis of HPV58 and its expression in cervical lesions.


Human papillomaviruses (HPVs) are causative agents in the development of cervical intraepithelial neoplasia (CIN) lesions and invasive carcinoma. Cervical persistent infection by high-risk HPV types is the single greatest risk factor for malignant progression (1, 2). Of 15 known high-risk HPV types, HPV16 and HPV18 are the two most common types inducing the development of cervical cancer (35). However, the prevalence of high-risk HPV58 infection displays considerable geographical variation (6). HPV58 was found in 3.3% of cervical cancers worldwide but in 5.6% of cervical cancers in Asia. HPV58 was detected in 7% of high-grade CIN lesions globally but in 12.2% in Asia (7, 8). Studies from East Asian populations have shown that HPV58 infection ranks third in cervical cancer cases (7, 9). In China's Zhejiang area, HPV58 is more prevalent than HPV18 in CIN lesions (19.1% versus 5.4%), and equal infection frequencies of HPV58 and HPV18 are found in cervical cancers (9.4% each) (10). Similar to other high-risk HPVs (1113), HPV58 integration in cervical lesions could occur by disruption of the viral E1 coding region (14).

In this study, we performed comprehensive analyses of the HPV58 genome and transcriptome. The full complement of the viral genome was decoded using a rolling-circle amplification and deep sequencing (RCA-seq) technique, which does not require prior knowledge of the underlying genome. Viral RNA transcripts from total cell RNA were analyzed by means of directional ligation strand-specific RNA sequencing (DeLi-seq) (15). Subsequently, three episomal HPV58 genomes were cloned and sequenced. A transcription map of the HPV58 early region was constructed from clinical CIN2 lesions. Our study represents the first genome-wide analysis of HPV58 and its expression in cervical lesions.

(Portions of this report were presented at the 28th International Papillomavirus Conference, San Juan, Puerto Rico, 30 November to 6 December 2012.)


Sample preparation.

DNA and RNA samples from 10 cervical cancer tissues and 10 CIN2/CIN3 (CIN2/3) tissues were collected from the Women's Hospital, School of Medicine, Zhejiang University. The study was approved by the Institutional Review Board for clinical research of this hospital. Informed consent was obtained from each participant prior to the study.

RNA and DNA were isolated simultaneously from each sample by TRIzol (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions.

Rolling-circle amplification.

Since HPV has a circular genome, it was enriched by rolling-circle amplification (RCA) using ϕ29 DNA polymerase (16, 17). RCA was performed in a 20-μl reaction mixture containing 100 ng of DNA, 1 mM each deoxynucleoside triphosphate (dNTP; Epicentre, Madison, WI), 4 μg of bovine serum albumin (BSA; New England BioLabs [NEB], Ipswich, MA), 10 μM random hexamer (5′-NNNN*N*N-3′, in which the asterisk indicates a phosphothiol group), 10 U of ϕ29 DNA polymerase (NEB), 2 μl of dimethyl sulfoxide (DMSO), and 1× ϕ29 reaction buffer. Before the addition of ϕ29 enzyme, the reaction mixture was heated to 80°C for 2 min, followed by snap-cooling on ice for 2 min. The ϕ29 DNA polymerase was then added to the mixture, incubated at 10°C for 10 min and at 28°C for 16 h, and heat inactivated at 65°C for 10 min.

Three steps were performed to reduce the chimeras from the RCA reaction (18), as follows. (i) RCA products were debranched with 400 U of ϕ29 DNA polymerase (Epicenter), 1× reaction buffer, and 0.25 mM each dNTP (Epicentre) in a 50-μl reaction mixture without any primers and incubated at 30°C for 2 h and at 65°C for 3 min. (ii) Twenty-five microliters of debranching products was incubated at 37°C for 30 min with 2 U of S1 nuclease (USB, Santa Clara, CA) to digest the remaining single-stranded DNA in a 100-μl reaction mixture containing 1× S1 buffer (pH 4.6), 30 mM sodium acetate (NaAc), 50 mM NaCl, and 1 mM ZnSO4. (iii) DNA fragments were then sheared to fragments with an average length of 200 bp (Covaris S2 ultrasonicator), followed by nick translation and end repair by DNA polymerase I and T4 DNA polymerase in 1× buffer with 500 μM each dNTP. The reaction mixture was incubated at 25°C for 1 h and inactivated at 75°C for 10 min before paired-end (PE) DNA library construction.

To quantify RCA enrichment efficiency of HPV58 copy numbers, a real-time quantitative PCR (qPCR) with the HPV58-specific primer pair Pr743 and Pr854 was performed on ~100 pg of DNA from each sample before and after RCA enrichment. RCA products used for qPCR were debranched, without S1 digestion and DNA shearing, and linearized by AgeI or DraII digestion of HPV58.

Paired-end library preparation and sequencing.

The end-repaired double-stranded DNA (dsDNA) was size selected to 200 to 300 bp using a 2% agarose gel, phosphorylated with T4 polynucleotide kinase, and A-tailed with Klenow (exo) DNA polymerase (Epicentre), followed by ligation to homemade bar-coded Illumina paired-end Y adaptor and size selection (~250 to 450 bp). A 16-cycle PCR was then carried out with Phusion Hot Start High-Fidelity DNA Polymerase (Finnzymes, Vantaa, Finland) to generate the final RCA-seq library. All 21 RCA-seq libraries (20 independent patient samples plus a technical duplicate of sample 10) were sequenced with an Illumina HiSeq-2000 platform. During library preparation, each homemade paired-end Y adaptor was tagged with a unique 5-mer barcode so that multiple samples could be sequenced within a single lane. Paired 50-mer reads were obtained, of which the first 5 bp is the barcode for determining the sample identity (19). After the raw data based on the barcodes were split, the trimmed reads were then aligned to the HPV58 reference genome for downstream analysis.

RNA sample preparation and RNA-seq.

Total RNA isolated was further cleaned using an RNeasy Minikit (Qiagen, Germantown, MA) with an on-column DNase I digestion step. Starting with 10 μg of total RNA, two rounds of poly(A)+ RNA selection were performed using Dynabeads oligo(dT)25 (Invitrogen) to obtain high-quality poly(A)+ RNA. DeLi-seq, a protocol for strand-specific high-throughput sequencing of RNA transcripts (RNA-seq), was used to prepare the sequencing library (15). The final PCR products (or final DeLi-seq libraries) were purified by a Zymo Clean and Concentrator-5 kit (Irvine, CA) and quantified by a Qubit Fluorometer (Invitrogen) before Illumina sequencing.

Sequencing data analysis.

The trimmed reads were aligned to a human reference genome (human genome build 19 [hg19] assembly) from the University of California—Santa Cruz (UCSC) genome browser ( and an HPV58 reference genome (GenBank accession number D90400 or GI:222386) from the PaVE database ( simultaneously using the Burrows-Wheeler alignment tool (BWA) with default settings (20). The output alignment files in sequence alignment/map (SAM) format were further processed using SAMtools (21). The genome coverage files (WIGGLE or BAM files) were loaded onto the Integrative Genomics Viewer (IGV []) to visualize sequence alignments, genomic annotations, and substitutions (22) with default settings.

We defined a position as identical if more than 95% of the bases obtained were identical to those of the reference genome. Nucleotide substitutions were identified if the majority of bases in the reads differed from the reference genome with a read depth of 5 or more. Nucleotide positions with a read depth of less than 5 were treated as ambiguous sites since there was insufficient depth to make a high-confidence call.

A genomic phylogenetic tree analysis for each Chinese isolate along with the reference HPV58 genome was constructed using ClustalW2, a multiple-sequence alignment tool. Branch distances of the tree were calculated on the basis of percentage identity (PID) between the two sequences at each aligned position.

Isolation and cloning of full-length HPV58 genomes from clinical samples.

RCA reaction products without S1 digestion and DNA shearing were selectively linearized by AgeI or DraII digestion of HPV58 DNA. The primer set of Pr3506 and Pr7036 (Table 1) was used to amplify an ~3.5-kb fragment from nucleotides (nt) 3506 to 7036 of AgeI-digested RCA products, while the primer set of Pr6905 and Pr3694 (Table 1) was used to amplify an ~4.6-kb fragment from nt 6905 to 3694 of DraII-digested RCA products. The two fragments amplified by high-fidelity Taq DNA polymerase (Roche, Indianapolis, IN) were separately inserted into a pCR-XL-TOPO vector (Invitrogen) and sequenced by primer walking. Subsequently, the 3.5-kb insert from the vector was swapped into the 4.6-kb insert-containing vector at the NotI and BstEII sites to create a full-length HPV58 genome in the pCR-XL-TOPO vector.

Table 1
HPV-specific primers

PCR and reverse transcription-PCR (RT-PCR).

To examine the HPV DNA in each sample, gene-specific primers were designed according to the HPV58 open reading frames (ORFs) (23), and primers are listed in Table 1. HPV16 plasmid DNA (0.2 ng) served as a positive PCR control with the primer pair Pr122 and Pr821 (Table 1). Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) served as a DNA loading control with the primer pair oZMZ269 and oZMZ270 (24). PCR products were digested overnight by SwaI at 25°C, by BstEII at 60°C, or by MfeI at 37°C.

RT-PCR was performed on total RNA (2 μg) extracted from each clinical sample using three pairs of HPV58 primers (Table 1): Pr116 and Pr762 for detection of E6/E7 RNA splicing, Pr743 and Pr3532 for detection of E1/E2 RNA splicing, and Pr3506 and Pr5787 for detection of L2/L1 RNA splicing. RT reactions were conducted in the presence of random hexamers at 42°C for 1 h. RT-PCR products were gel purified and sequenced.

Nucleotide sequence accession numbers.

GenBank accession numbers for the HPV58 genomes of CNZJ-1, CNZJ-2, and CNZJ-3 are KC860269, KC860271, and KC860270, respectively.


Decoding the full-length HPV58 genome in cervical lesions.

Since the HPV genome is present at several orders of magnitude less than the human genome in cervical lesions, an enrichment procedure is required for efficient viral sequencing. This was achieved by RCA, which preferentially amplifies the circular HPV genome over the linear human genome (16, 17). The enriched DNAs were then subjected to debranching, S1 nuclease digestion, end repair, and nick translation to reduce potential chimeras from the RCA reaction (25). By using this protocol, we examined 20 DNA samples, including 10 each from CIN tissues and cervical cancer tissues, respectively.

As shown in Table 2, we obtained 38.6 to 195.9 million paired-end (PE) sequence reads from each sample by RCA-seq. Subsequently, the raw sequence reads from each sample were mapped to the reference human genome (hg19) and HPV58 genome using the BWA aligner. Of these uniquely mapped reads, contributions of human origins ranged from 43.91% (sample 12) to 74.5% (sample 3). Among 20 samples examined, 10 contained sequence reads uniquely mapped to the HPV58 genome, ranging from 0.0002% (181 counts, sample 20) to 1.1647% (1,399,332 counts, sample 13). Six of these 10 samples had coinfection with HPV16, but we focused our study on HPV58 in this report. To test whether viral read abundance reflects the copy numbers of HPV58 genome in the samples, we analyzed nine samples (four CIN samples and five cervical cancer tissues) for HPV58-specific amplification by PCR. Figure 1A shows that HPV58 at levels of only 476, 825, and 829 reads in samples 12, 8, and 14, respectively (Table 2), was not detectable, but at 3,453 reads or above (Table 2) the virus was detectable by PCR. Sample 13 with more than 1 million HPV58 reads gave the strongest PCR product, with a band intensity 10 times greater than that from samples 9 and 10, suggesting that the RCA-seq reads correlate with copy numbers of the virus genome in the tissues. This was confirmed by qPCR showing that HPV58 copy numbers were enriched by RCA to 7.31 × 106 copies (~6,300-fold increase) from the original 1.15 × 103 copies in sample 13 and to 2.0 × 104 copies from an undetectable level in sample 10. As an internal control, the linear GAPDH DNA in sample 13 exhibited no enrichment.

Table 2
RCA-seq detection of HPV58 from 20 Chinese cervical tissue samples
Fig 1
HPV58 in Chinese women with cervical lesions. (A) PCR validation of HPV58 DNA in the selected samples analyzed by RCA-seq. HPV58 DNA was amplified by Pr6159 and Pr7077 and resolved in a 1.5% agarose gel. GAPDH served as a loading control. Water and HPV16 ...

Figure 1B shows annotation of all sequence reads from 10 samples to the HPV58 genome position. Six samples (samples 9, 10/10B, 13, 15, 18, and 19) had sufficient reads for reconstruction of the respective full-length genome. The other four (samples 8, 12, 14, and 20) had fewer than 1,000 sequence reads, giving incomplete genome coverage. The average coverage of the sequence reads in each sample at each genomic position varied from 28 (sample 19) to 17,885 (sample 13), with samples 13 and 18 having the highest coverage depth.

Existence of two HPV58 sublineages in Zhejiang Province, China.

By comparing the nucleotide sequence at each position against the reference HPV58 genome containing 7,824 nucleotides, we uncovered 52 nucleotide substitutions (0.66%) scattered in the HPV58 genome (Fig. 2 and and3A),3A), with 37 from samples 13, 15, and 18, 39 from sample 10 (10B), and 22 from sample 9. The other five samples, which had less coverage depth of the HPV58 genome, were excluded from the substitution analysis. Subsequently, we grouped these clinical HPV58 isolates as Chinese Zhejiang-1, -2, and -3 (CNZJ-1, -2, and -3, respectively) based on sequence similarity (Fig. 2 and and3A3A).

Fig 2
Single nucleotide polymorphisms in the genomes of Chinese HPV58 CNZJ-1, -2, and -3. SNPs in six HPV58 genomes (sample 10 was run in duplicate) of Chinese isolates were aligned against the reference genome (23). Alignments were visualized by the Integrative ...
Fig 3
Chinese HPV58 isolates are sublineage A1 and A2 viruses. (A) Nucleotide substitutions of Chinese HPV58 isolates with respect to the reference genome (GI:222386). Amino acid (AA) residues are expressed in one-letter code, with the letter preceding the ...

Phylogenetic tree analyses (Fig. 3B) show that the CNZJ-3 is close to reference HPV58 (23), with a sequence similarity of 99.73%, and thus belongs to sublineage A1. The other two, CNZJ-1 and -2, deviate more from the reference genome and could be classified to sublineage A2 (6). Although sample 9 is an A1 virus, it contains two mutations described in A2: an A388C substitution in the E6 ORF and a T803C substitution in the E7 ORF (26). Within A2 isolates, CNZJ-2 (sample 10) differs from CNZJ-1 (samples 13, 15, and 18) at six nucleotide positions (nt 5375, 5378, 5437, 6560, 6827, and 6950) (Fig. 3A). Although the majority of these substitutions are silent, some lead to changes of amino acid residues from the reference HPV58 sequence. These changes include 9 residues in A1 and 13 residues in A2 (Fig. 3A). Many of these substitutions are novel and occur in the E1, E2, E4, and L2 coding regions.

Presence of an episomal HPV58 genome in clinical CIN2 lesions.

From the sequencing data it could not be distinguished whether HPV58 is integrated into the host chromosome and/or is present as an episome. Given that high-risk HPV genomes commonly integrate into the host genome, two additional approaches were taken to measure HPV58 status in these clinical samples. First, we detected a head-to-tail junction of the HPV58 genome. Two pairs of primers were used to amplify the regions overlapping the tail-to-head junction (Fig. 4A). All amplicons showed a correct tail-to-head junction from nt 7824 to 1 in the HPV58 genome (TATATAATA/CTAAACTA, where the slash indicates the division between the tail and head).

Fig 4
HPV58 genome in cervical CIN2 lesions is episomal. (A) Verification of the head-to-tail junction of the HPV58 genome by PCR and sequencing. Two pairs of primers were designed on each side of the junction (diagram above) for PCR amplification of HPV58 ...

Second, the full-length HPV58 genome from three clinical CIN2 lesions was analyzed by long-PCR amplification. As shown in Fig. 4B, two large fragments of HPV58 DNA, overlapped from nt 3505 to 7036 and from 6905 to 3694, were amplified from each sample, with a respective size of ~3.5 kb and ~4. 6 kb. This strategy allowed us to amplify an episomal (circular) HPV58 genome. By cloning and sequencing the amplified fragments, we verified all 52 nucleotide substitutions identified by RCA-seq. Further analyses showed that these nucleotide substitutions either create or inactivate eight restriction enzyme cutting sites which distinguish HPV58 sublineage A1 from A2 and even CNZJ-1 (sample 13) from CNZJ-2 (sample 10) within the A2 sublineage (Fig. 5). By ligation of the ~3.5-kb fragment to the ~4.6-kb fragment (Fig. 4B) into the pCR-XL-TOPO vector, we obtained a full-length HPV58 genome of CNZJ-1, CNZJ-2, and CNZJ-3.

Fig 5
Nucleotide substitutions in Chinese HPV58 isolates create or inactivate restriction enzyme cutting sites. (A) Distinguishable restriction enzyme cutting sites due to nucleotide substitutions in HPV58 genomes. *, nucleotide position with a substitution ...

HPV58 early transcripts from cervical lesions are spliced transcripts from E6 and E1/E2 ORFs.

Using the CIN2-derived sample 13 giving the highest coverage of sequence reads for the entire HPV58 genome, we extracted total cell RNA from this sample and investigated the HPV58 transcription profile by DeLi-seq. Overall, ~230 million paired-end sequence reads were obtained, of which 47,958 reads were predominantly mapped to the sense strand of the early region of the HPV58 genome, with average coverage reads of 1,139 counts at a given position (Fig. 6A, lower portion). No reads were detected in the antisense strand. The coverage values of sequence reads to the late region were less than 7, indicating that viral late transcripts were much less abundant in this CIN2 sample. This was in sharp contrast to an RCA-seq profile derived from the same sample, where the DNA sequence reads covered the entire genome continuously at high coverage depth (Fig. 6A, top portion).

Fig 6
HPV58 gene expression in cervical lesions. (A) Visualization of viral gene expression in cervical tissues by RNA-seq. Total DNA and RNA were collected from sample 13 for RCA-seq and RNA-seq separately. Both the RCA-seq reads (viral DNA) and RNA-seq reads ...

Interestingly, the majority of the RNA sequence reads are clustered at three regions overlapping the E6, E7, and E4/E5 ORFs in the early region (Fig. 6A, lower portion). Notably, the first nucleotide of the read was mapped to nt 100C or 102G, 10 or 8 nucleotides upstream of the E6 start codon AUG and 25 or 27 nucleotides downstream of a TATA box in the reference HPV58 genome. The last nucleotide of the RNA-seq reads in a single-base annotation analysis was mapped to nt 4230A, 21 nucleotides downstream of the viral early poly(A) signal in the reference genome. Examination of the sequences upstream and downstream of the three read clusters revealed the presence of a consensus splice site next to each cluster of the reads, suggesting that the regions with lower read coverage presumably represent an intron, whereas the regions with high read coverage could represent an exon of the spliced mRNA (Fig. 6A, lower portion).

To confirm the RNA splicing event in HPV58 early transcripts, we performed RT-PCR on total cell RNA extracted from cervical lesions using several primer pairs from the putative exon regions determined by RNA-seq (Fig. 6A). Two spliced RNA products from the viral early region were detected from both sample 10 (CNZJ-2) and sample 13 (CNZJ-1) (Fig. 6B), but none could be amplified from the viral late region when the primer pair Pr3506 and Pr5787 was used (data not shown). Sequencing of the gel-purified RT-PCR products indicated that one splicing event occurred from nt 232 to 510 in the E6 ORF region and that the other spliced from nt 898 to 3355 in the E1/E2 ORF regions (Fig. 6C).

Construction of an HPV58 transcription map from the viral early region.

We next aimed to reconstruct a transcription map for the HPV58 early region. As diagramed in Fig. 7, HPV58 early transcripts are most likely to start at nt 100 or 102 from a viral early promoter containing a TATA box at nt 72 in the virus genome and are polyadenylated from a nt 4230A cleavage site using an early poly(A) signal at nt 4209. Similar to HPV16 and HPV18 early transcripts (27, 28), HPV58 early transcripts also consist of three exons and two introns, which can be alternatively spliced. Splicing from nt 232 to 510 in the E6 ORF leads to an E6*I ORF, which is truncated compared to the E6 ORF. Thus, retention of this intron is required for viral E6 expression. The second RNA splicing event from nt 898 to 3355 disrupts both the E1 and E2 ORFs. Therefore, retention of this intron is important for E1 and E2 expression. An RNA splicing event from nt 232 to 3355 is also expected, as seen in other members of the alpha-HPVs. In this map, the viral E1^E4 ORF spans two exons, which is analogous to the structure previously described for other alpha-HPVs (27, 28). Because the first AUG of E1^E4 is positioned in the first exon, formation of an intact E1^E4 ORF requires RNA splicing.

Fig 7
A simple transcription map of HPV58 early genes deduced from HPV58-infected cervical tissues. Illustrated at the top of the figure is the updated HPV58 genome organization from PaVE. Note that the frame order changes from the original publication (23 ...


HPV58 accounts for a larger share of disease burden in East Asia (7). Recent studies of HPV58 mainly focus on the prevalence of different HPV variants based on nucleotide variations from the viral long control region (LCR) and the L1, E6, and E7 ORFs in different geographical regions (6, 26). To date, little is known about HPV58 genome variations and transcription of the genome, which prevents us from better understanding HPV58 molecular biology, pathogenesis, and diagnosis. Because of the noncultivable characteristics of papillomaviruses, which can be propagated only in sophisticated epithelial raft cultures (2931), whole-HPV genomic analysis is hampered by a lack of a sufficient amount of purified viral genome for traditional PCR amplification and direct sequencing analysis. In this report, we took advantage of RCA (16, 17) and conducted for the first time a genome-wide analysis of HPV58 and its expression in cervical lesions. We applied RCA to enrich viral DNA from a clinical sample and performed high-throughput deep sequencing of the RCA products (32). Although HPV58 DNA molecules are rare relative to human DNA sequences in clinical samples, RCA is capable of enriching viral DNA with a length of 8 kb by 2.4 × 104-fold, compared to linear cellular DNA, directly from infected cells (17). In this study, we found that RCA could enrich the copy numbers of the episomal HPV58 genome by more than 6,300-fold but was unable to do so for a linear genomic GAPDH DNA.

In this study, a total of 52 nucleotide substitutions from Chinese HPV58 isolates were identified cross over the reference genome by RCA-seq and verified by cloning and sequencing of three HPV58 genomes, indicating that the RCA-seq method used in our study, in addition to its deep coverage of any full-length genome sequences, was reliable in identifying nucleotide substitutions compared with a proofreading long-PCR template cloning and sequencing approach (~0.12% error rate [unpublished data]). The nucleotide substitutions identified in this study create or inactivate restriction enzyme cutting sites which distinguish HPV58 sublineage A1 from A2 and even CNZJ-1 from CNZJ-2 within the A2. In addition to two (C307T and A388C) in E6, four (G694A, T744G, G761A, and T803C) in E7, and five (A6014C, A6416G, T6434C, A6539G, and C7266T) substitutions in the L1 region and LCR that were reported in other PCR-based studies (6, 26), our RCA-seq identified 41 novel nucleotide substitutions from Chinese HPV58 isolates. The nucleotide substitutions appear mainly in two regions, with 12 from nt 3430 to 4136 covering the E2/E4/E5 ORFs and 13 from nt 4621 to 5540 covering the L2 ORF. Interestingly, the nucleotide substitutions from nt 4621 to 5540 appear less frequently in A1 isolate CNZJ-3. Thus, the two regions will be useful for further classification of HPV58 isolates. Moreover, an A-to-G substitution was identified at nt 7714 in all Chinese HPV58 A2 isolates, which is different from a reported C-to-G substitution at this position (6). We notice that the reference HPV58 genome (accession number D90400 [GI:222386] or NC_001443 [GI:9626489]) has an A at nt 7714 (23). We did not find any HPV58 A3 isolates in this study or determine the relation of this lineage to oncogenic risk (6), presumably due to our limited sample size.

Comparative study of viral DNA and RNA shows that HPV58 transcribes only from one strand of its double-stranded genome in one direction. This conclusion is largely based on the usage of the strand-specific RNA-seq strategy. In addition, we found two RNA splicing events for HPV58 early transcripts, one from nt 232 to 510 and one from nt 898 to nt 3355. These two splicing events are conserved in all HPV58 isolates sequenced in this study. Analogous to other high-risk HPVs, splicing from nt 232 to 510 of the HPV58 E6/E7 pre-mRNAs disrupts the E6 ORF, leading to production of an E6*I mRNA which encodes an E6*I protein with 59 amino acid residues. Similar to all other E6*I proteins from high-risk HPVs, the N-terminal 41 amino acid residues of HPV58 E6*I are the same residues as seen in the N terminus of HPV58 E6. It remains unknown whether the E6 splicing in HPV58 is necessary for the expression of the downstream E7 ORF observed in HPV16 and HPV18 (33). Although a slightly increased read coverage was noticed from the nt 3355 3′ splice site upstream to nt 2702 (Fig. 6A, lower portion), whether this represents alternative 3′ splice site usage in the intron or a transcription start site upstream of the nt 3355 splice site remains to be determined.

In this study, the expression of HPV58 late genes appeared to be below the detection level of RNA-seq and RT-PCR. This could be because tissue samples were from only the early phase of HPV58 infection. Nevertheless, with the advent of our RCA-seq and RNA-seq techniques, we can decode any viral genome sequence and gene transcription from clinical samples, without prior knowledge of the underlying genome information, in a timely and cost-effective manner.


This study was supported by the Intramural Research Programs of the NCI, Center for Cancer Research, and the NHLBI, NIH, and by the Natural Science Foundation of China (NSFC 81172475) and the Natural Science Foundation of Zhejiang Province of China (grant number LQ13H160003). Y.L., a visiting fellow to NCI, was supported by Zhejiang University Women's Hospital.

We thank Ding J. Jin for his comments on the manuscript.


Published ahead of print 19 June 2013


1. Bodily J, Laimins LA. 2011. Persistence of human papillomavirus infection: keys to malignant progression. Trends Microbiol. 19:33–39 [PMC free article] [PubMed]
2. zur Hausen H. 2002. Papillomaviruses and cancer: from basic studies to clinical application. Nat. Rev. Cancer 2:342–350 [PubMed]
3. Walboomers JM, Jacobs MV, Manos MM, Bosch FX, Kummer JA, Shah KV, Snijders PJ, Peto J, Meijer CJ, Munoz N. 1999. Human papillomavirus is a necessary cause of invasive cervical cancer worldwide. J. Pathol. 189:12–19 [PubMed]
4. Munoz N, Bosch FX, de Sanjose S, Herrero R, Castellsague X, Shah KV, Snijders PJ, Meijer CJ. 2003. Epidemiologic classification of human papillomavirus types associated with cervical cancer. N. Engl. J. Med. 348:518–527 [PubMed]
5. Durst M, Gissmann L, Ikenberg H, zur Hausen H. 1983. A papillomavirus DNA from a cervical carcinoma and its prevalence in cancer biopsy samples from different geographic regions. Proc. Natl. Acad. Sci. U. S. A. 80:3812–3815 [PubMed]
6. Chan PK, Luk AC, Park JS, Smith-McCune KK, Palefsky JM, Konno R, Giovannelli L, Coutlee F, Hibbitts S, Chu TY, Settheetham-Ishida W, Picconi MA, Ferrera A, De MF, Woo YL, Raiol T, Pina-Sanchez P, Cheung JL, Bae JH, Chirenje MZ, Magure T, Moscicki AB, Fiander AN, Di SR, Cheung TH, Yu MM, Tsui SK, Pim D, Banks L. 2011. Identification of human papillomavirus type 58 lineages and the distribution worldwide. J. Infect. Dis. 203:1565–1573 [PubMed]
7. Chan PK. 2012. Human papillomavirus type 58: the unique role in cervical cancers in East Asia. Cell Biosci. 2:17.10.1186/2045-3701-2-17 [PMC free article] [PubMed] [Cross Ref]
8. Smith JS, Lindsay L, Hoots B, Keys J, Franceschi S, Winer R, Clifford GM. 2007. Human papillomavirus type distribution in invasive cervical cancer and high-grade cervical lesions: a meta-analysis update. Int. J. Cancer 121:621–632 [PubMed]
9. Lin QQ, Yu SZ, Qu W, Cruz Y, Burk RD. 1998. Human papillomavirus types 52 and 58. Int. J. Cancer 75:484–485 [PubMed]
10. Hong D, Ye F, Chen H, Lu W, Cheng Q, Hu Y, Xie X. 2008. Distribution of human papillomavirus genotypes in the patients with cervical carcinoma and its precursors in Zhejiang Province, China. Int. J. Gynecol. Cancer. 18:104–109 [PubMed]
11. Romanczuk H, Howley PM. 1992. Disruption of either the E1 or the E2 regulatory gene of human papillomavirus type 16 increases viral immortalization capacity. Proc. Natl. Acad. Sci. U. S. A. 89:3159–3163 [PubMed]
12. Shirasawa H, Tomita Y, Sekiya S, Takamizawa H, Simizu B. 1987. Integration and transcription of human papillomavirus type 16 and 18 sequences in cell lines derived from cervical carcinomas. J. Gen. Virol. 68:583–591 [PubMed]
13. Corden SA, Sant-Cassia LJ, Easton AJ, Morris AG. 1999. The integration of HPV-18 DNA in cervical carcinoma. Mol. Pathol. 52:275–282 [PMC free article] [PubMed]
14. Li H, Zhang R, Cai Y, Li Y, Cheng X, Zhu B, Yang Y, Xiang Y. 2012. Determination of integrated HPV58 sequences in cervical lesions. Int. J. Gynecol. Cancer 22:1234–1237 [PubMed]
15. Ni T, Tu K, Wang Z, Song S, Wu H, Xie B, Scott KC, Grewal SI, Gao Y, Zhu J. 2010. The prevalence and regulation of antisense transcripts in Schizosaccharomyces pombe. PLoS One 5:e15271.10.1371/journal.pone.0015271 [PMC free article] [PubMed] [Cross Ref]
16. Dean FB, Nelson JR, Giesler TL, Lasken RS. 2001. Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11:1095–1099 [PubMed]
17. Rector A, Tachezy R, Van RM. 2004. A sequence-independent strategy for detection and cloning of circular DNA virus genomes by using multiply primed rolling-circle amplification. J. Virol. 78:4993–4998 [PMC free article] [PubMed]
18. Zhang K, Li JB, Gao Y, Egli D, Xie B, Deng J, Li Z, Lee JH, Aach J, Leproust EM, Eggan K, Church GM. 2009. Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human. Nat. Methods 6:613–618 [PMC free article] [PubMed]
19. Ni T, Corcoran DL, Rach EA, Song S, Spana EP, Gao Y, Ohler U, Zhu J. 2010. A paired-end sequencing strategy to map the complex landscape of transcription initiation. Nat. Methods 7:521–527 [PMC free article] [PubMed]
20. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760 [PMC free article] [PubMed]
21. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079 [PMC free article] [PubMed]
22. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. 2011. Integrative genomics viewer. Nat. Biotechnol. 29:24–26 [PMC free article] [PubMed]
23. Kirii Y, Iwamoto S, Matsukura T. 1991. Human papillomavirus type 58 DNA sequence. Virology 185:424–427 [PubMed]
24. Ajiro M, Jia R, Zhang L, Liu X, Zheng ZM. 2012. Intron definition and a branch site adenosine at nt 385 control RNA splicing of HPV16 E6*I and E7 expression. PLoS One 7:e46412.10.1371/journal.pone.0046412 [PMC free article] [PubMed] [Cross Ref]
25. Zhang K, Martiny AC, Reppas NB, Barry KW, Malek J, Chisholm SW, Church GM. 2006. Sequencing genomes from single cells by polymerase cloning. Nat. Biotechnol. 24:680–686 [PubMed]
26. Chan PK, Zhang C, Park JS, Smith-McCune KK, Palefsky JM, Giovannelli L, Coutlee F, Hibbitts S, Konno R, Settheetham-Ishida W, Chu TY, Ferrera A, Alejandra PM, De MF, Woo YL, Raiol T, Pina-Sanchez P, Bae JH, Wong MC, Chirenje MZ, Magure T, Moscicki AB, Fiander AN, Capra G, Young KE, Tan Y, Chen Z, Burk RD, Chan MC, Cheung TH, Pim D, Banks L. 2013. Geographical distribution and oncogenic risk association of human papillomavirus type 58 E6 and E7 sequence variations. Int. J. Cancer 132:2528–2536 [PubMed]
27. Zheng ZM, Baker CC. 2006. Papillomavirus genome structure, expression, and post-transcriptional regulation. Front. Biosci. 11:2286–2302 [PMC free article] [PubMed]
28. Wang X, Meyers C, Wang HK, Chow LT, Zheng ZM. 2011. Construction of a full transcription map of human papillomavirus type 18 during productive viral infection. J. Virol. 85:8080–8092 [PMC free article] [PubMed]
29. Meyers C, Frattini MG, Hudson JB, Laimins LA. 1992. Biosynthesis of human papillomavirus from a continuous cell line upon epithelial differentiation. Science 257:971–973 [PubMed]
30. Wilson JL, Dollard SC, Chow LT, Broker TR. 1992. Epithelial-specific gene expression during differentiation of stratified primary human keratinocyte cultures. Cell Growth Differ. 3:471–483 [PubMed]
31. Wang HK, Duffy AA, Broker TR, Chow LT. 2009. Robust production and passaging of infectious HPV in squamous epithelium of primary human keratinocytes. Genes Dev. 23:181–194 [PubMed]
32. Johne R, Muller H, Rector A, Van RM, Stevens H. 2009. Rolling-circle amplification of viral DNA genomes using ϕ29 polymerase. Trends Microbiol. 17:205–211 [PubMed]
33. Tang S, Tao M, McCoy JP, Jr, Zheng ZM. 2006. The E7 oncoprotein is translated from spliced E6*I transcripts in high-risk human papillomavirus type 16- or type 18-positive cervical cancer cell lines via translation reinitiation. J. Virol. 80:4249–4263 [PMC free article] [PubMed]

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)