|Home | About | Journals | Submit | Contact Us | Français|
Human papillomavirus (HPV) initiates cervical cancer, and continuous expression of HPV oncogenes E6 and E7 is thought to be necessary to maintain malignant growth. Current therapies target proliferating cells, rather than specific pathways, and most experimental therapies specifically target E6/E7. We investigated the presence and expression of HPV in cervical cancer, to correlate HPV oncogene expression with clinical and molecular features of these tumors that may be relevant to new targeted therapies.
While virtually all cervical cancers contained HPV DNA, and most expressed E6/E7 (HPV-active), a subset (8%) of HPV DNA-positive cervical cancers did not express HPV transcripts (HPV-inactive). HPV-inactive tumors occurred in older women (median 54 vs. 45 years, p = 0.02) and were associated with poorer survival (median 715 vs 3046 days, p = 0.0003). Gene expression profiles of HPV-active and -inactive tumors were distinct. HPV-active tumors expressed E2F target genes and increased AKT/MTOR signaling. HPV-inactive tumors had increased WNT/β-catenin and Sonic Hedgehog signaling. Substantial genome-wide differences in DNA methylation were observed. HPV-inactive tumors had a global decrease in DNA methylation; however, many promoter-associated CpGs were hypermethylated. Many inflammatory response genes showed promoter methylation and decreased expression. The somatic mutation landscapes were significantly different. HPV-active tumors carried few somatic mutations in driver genes, whereas HPV-inactive tumors were enriched for non-synonymous somatic mutations (p-value < 0.0000001) specifically targeting TP53, ARID, WNT, and PI3K pathways.
The Cancer Genome Atlas (TCGA) cervical cancer data were analyzed.
Many of the gene expression changes and somatic mutations found in HPV-inactive tumors alter pathways for which targeted therapeutics are available. Treatment strategies focused on WNT, PI3K, or TP53 mutations may be effective against HPV-inactive tumors and could improve survival for these cervical cancer patients.
Essentially all cervical cancers contain human papillomavirus (HPV) DNA  suggesting that HPV infection is necessary for cervical cancer initiation. Routine HPV testing has revealed that most HPV infections resolve, indicating that HPV infection is necessary, but not sufficient for the development of cervical cancer, and that additional events are required .
The molecular features of cervical cancers are beginning to be described [3–6]. Here we report a detailed analysis of TCGA cervical cancer gene expression, DNA methylation, and somatic mutation profiles. Similar to recent reports in head and neck cancers , we identified a subset of tumors, which no longer express HPV E6/E7 oncogenes (HPV-inactive). These tumors have gene expression, DNA methylation and somatic mutation signatures different from HPV-active tumors, and more similar to those of other, viral-independent cancers. Implications for cervical cancer progression, and opportunities for targeted therapy are discussed.
We extracted all reads from 264 BAM files previously assembled to GRCh37-lite+HPV-Redux reference genome, and then re-assembled to a custom reference genome containing the HPV types most often associated with cervical cancer (Table (Table1).1). All but one of the cervical cancers analyzed (Supplementary Table 1) contained two or more normalized counts of Illumina DNA sequencing reads that assembled with HPV and were classified as HPV positive. In order to assign the HPV type, we used a more stringent cutoff of 50 normalized counts of HPV aligned reads, to classify a sample positive for a specific HPV type. We manually inspected all alignments and discarded instances of duplicated reads aligning to more than one HPV type. Using these criteria, there were 255 samples that we used to determine HPV type distribution (Table (Table1).1). Twenty-nine percent (75/255) of typed samples were positive for multiple HPV types. In order to characterize the gene expression landscapes of cervical cancers, we obtained the RNA-seq FASTQ files from the 261 patients with both tumor DNA and RNA sequencing data available, and determined expression levels of all genes (including HPV genes) (Supplementary Table 2A). To characterize the samples by HPV oncogene expression, we plotted the sum of the E6 and E7 expression against the sum of all HPV gene expression and performed unsupervised partitioning by K-Means clustering using the Euclidian distance function. This class discovery method revealed 2 distinct classes of cervical cancers (Figure (Figure1),1), those with high levels of E6 and E7 oncogene expression (HPV-active, n = 241, 92.3%) and those with low, or zero E6/E7 expression (HPV-inactive, n = 20, 7.7%), similar to what we previously reported for head and neck cancer .
The distributions of patient ancestry, tumor grade, and stage were not different between HPV-active and -inactive cancers (Table (Table2,2, Supplementary Table 3). However, patients with HPV-inactive tumors were significantly older at diagnosis (median age 54 vs 45, log rank Mantel-Cox p = 0.0360) (Figure (Figure2A),2A), and their median survival was dramatically shorter (715 days vs 3046 days, Gehan-Breslow-Wilcoxon p = 0.0003) (Figure (Figure2B).2B). Adenocarcinomas and adenosquamous carcinomas were significantly more common in the HPV-inactive than in the HPV-active tumors (Table (Table2,2, Supplementary Table 3) (p = 0.00022). Additionally, the percentage of HPV16 positive tumors was greater in the HPV-inactive tumors (81.8% vs 73.8%, Supplementary Table 1). Taken together, these differences point to the existence of a subtype of cervical cancers that silence HPV oncogene expression over time and have a worse prognosis.
We compared expression levels for 40,014 genomic elements and found that 2446 genes were significantly differentially expressed (4-fold up or down; FDR-adjusted p < 0.05) (Figure (Figure3A,3A, Supplementary Table 4). The majority of these most differentially expressed genes (94.6%) were overexpressed in the HPV-inactive tumors. In order to explore differences in biological themes, we performed Gene Set Enrichment Analysis (Partek Genomics Suite) using normalized gene expression data (Supplementary Table 2A) to query the hallmarks (h) and the chemical and genetic perturbations (cgp) gene sets . Selected enriched gene sets are summarized in Table Table3,3, enrichment scores and p-values for all gene sets are listed in Supplementary Tables 2B and 2C.
As would be predicted for tumors that have become HPV-independent, E2F target gene sets are decreased in expression in the HPV-inactive tumors (Ishida_E2F_Targets-c2.cgp). The expression of HPV associated gene sets that characterize gene expression profiles of HPV-driven tumors of the oropharynx is also decreased in the HPV-inactive tumors (Slebos_head_and_neck_cancer_with_HPV_up-c2.cgp, Pyeon_HPV_positive_tumors_up–c2.cgp). These results support the conclusion that the HPV-inactive cervical tumors are more similar to HPV negative head and neck tumors than to HPV-positive tumors, and they show that the HPV-inactive tumor class is not only characterized by the absence of HPV transcripts, but also by the absence of expression of many other human genes normally positively correlated with the presence and expression of HPV.
HPV-inactive tumors have lost E6 and E7 oncogene expression, and have evolved alternative pathways to support cancerous growth. Gene sets associated with activated WNT/CTNNB1 signaling, a pathway important to other types of adenocarcinomas, are elevated in expression in the HPV-inactive tumor class (Hallmark_WNT_beta_catenin_signaling–h.all.v5.1).
Multiple interferon response gene sets are decreased in the HPV-inactive tumors, including gene sets that deal directly with the inflammatory reaction (Browne _interferon_responsive_genes–c2.cgp, Sana_response_to_INFG_up–c2.cgp, Moserle_INFA_response–c2.cgp, Einav _interferon_signature_in_cancer–c2.cgp, Hecker_IFNB1_targets–c2.cgp, Der_inf_alpha_response_up–c2.cgp, Der_inf_beta_response_up–c2.cgp, Der_inf_gamma_response _up–c2.cgp, interferon_alpha_response-h.all.v5.1, interferon _gamma_response-h.all.v5.1), and gene sets that deal indirectly with viral mediated inflammatory response (Bosco_interferon_induced_antiviral_module–c2.cgp, Zhang_interferon_response–c2.cgp). Thus, HPV-active tumors still have an active interferon associated inflammation response that is lost in HPV-inactive tumors. Robust inflammation present in the HPV-active tumors, along with higher expression of immune checkpoint inhibitor targets TIGIT, CTLA4 and PDL-1 (Supplementary Table 2A), suggest that immune checkpoint inhibitor therapy may be a productive approach for these virally driven cancers.
Differences in gefitinib resistance gene sets (Coldren_Gefitinib_resistance_up – c2.cgp, Coldren_Gefitinib_resistance_down – c2.cgp) indicate gefitinib would be more effective in HPV-active tumors, whereas differences in dasatinib resistance gene sets (Huang_Dasatinib_resistance_up – c2.cgp, Huang_Dasatinib_resistance_down – c2.cgp) indicate HPV-inactive tumors may be more sensitive to dasatinib therapy. Thus, informing treatment decisions based on HPV gene expression may improve outcomes for cervical cancer patients.
To explore mechanisms that could explain the differential gene expression observed, we compared DNA methylation and found a dramatic genome-wide loss of methylation in the HPV-inactive class, with 24,206 loci being significantly differentially methylated (Bonferroni-adjusted p < 0.05). Among these most differentially methylated loci, 77% were hypomethylated in HPV-inactive tumors. This is concordant with reports that viral independent cancers are often hypomethylated compared to the normal surrounding tissue  (Figure (Figure3B,3B, Supplementary Table 5). Loci associated with gene promoter regions were evenly split between increased and decreased methylation (Figure (Figure4A)4A) and we noted that the gene expression was inversely correlated with promoter methylation (Figure (Figure4B).4B). There were 164 genes that had increased promoter methylation and decreased gene expression in HPV-inactive tumors (Supplementary Table 6). These genes were significantly enriched for the interferon response gene sets (Hallmarks_Interferon_Gamma_Response, Hallmarks_Interferon_Alpha_Response).
We compared the numbers and types of somatic mutations between the two classes (Table (Table4).4). HPV-active tumors had on average 115 somatic mutations per tumor, whereas HPV-inactive tumors had 228 somatic mutations per tumor. The overall background rates of silent (synonymous) mutations were different. HPV-inactive tumors had, on average 1.7 times as many silent mutations as did HPV-active tumors (58.5 mutations per tumor vs 34.8 mutations per tumor). This is consistent with the older age of HPV-inactive patients, which may have allowed additional passenger mutations to accumulate in the progenitor cell that became the last common ancestor of the tumor. We also noted a larger difference in the numbers of non-synonymous somatic mutations. HPV-inactive tumors had on average twice as many non-synonymous mutations as did HPV-active tumors (169 non-synonymous mutations per tumor vs 80 non-synonymous mutations per tumor). The ratio of non-synonymous to synonymous mutations, which is a crude measure of driver mutation load, was significantly higher in the HPV-inactive tumors (2.9 vs 2.3, Chi square two-tailed p < 0.0000001, odds ratio = 1.26 CI = 1.17–1.35) (Table (Table4).4). The non-synonymous/synonymous ratio in the HPV-active tumors (2.3) is not significantly different from the 2.2 ratio that is expected to occur by chance alone , indicating that the somatic mutations in HPV-active tumors are not enriched for drivers. In contrast, the non-synonymous/synonymous ratio in the HPV-inactive tumors (2.9) is significantly greater than what is expected by chance alone, which indicates that these tumors are enriched for cancer driver mutations enabling them to evolve to HPV independence by acquiring non-synonymous driver mutations to cancer causing genes.
We also investigated potential differences in the underlying mutation processes by enumerating and comparing each of the six types of nucleotide substitutions that can possibly occur (A:T > C:G, A:T > G:C, A:T > T:A, C:G > A:T, C:G > G:C, C:G > T:A) . As has been seen in other epithelial cancers , the most common single nucleotide substitution observed was C:G > T:A, representing 44% of the mutational burden; however, this substitution was not different between the two groups of cervical cancer. C:G > T:A mutations have been previously reported in cervical cancers  and can occur by different mechanisms depending on the sequence context. For example, in the a CpG dinucleotide, the C > T mutation is due to the spontaneous deamination of methylated cytosines . This type of mutation was significantly enriched in HPV-inactive tumors (HPV-inactive 638/4,326 vs. HPV-active 2,710/27,495; Chi Square p-value < 0.0000001). The PTEN tumor suppressor gene was more frequently the target of non-enzymatic deamination of methylated cytosine (3/5 mutations, 60%) in the HPV-inactive tumors, and less frequently the target of this process (3/14 mutations, 21%) in HPV-active tumors.
Alternatively, C:G > T:A mutations when the C is preceded by a T are caused by APOBEC family of cytidine deaminases [11, 14]. The activity of this enzyme is upregulated in HPV infected cells leading to a characteristic mutation pattern often seen in HPV-mediated cancers . We found that HPV-active tumors have a significantly greater number of C:G > T:A mutations in a TpC context (HPV-active 7,342/27,495 vs HPV-inactive 808/4,326; Chi Square p-value < 0.0000001). The majority of the PIK3CA mutations observed in the HPV-active tumors were C:G > T:A substitutions (29/34, 85%) in which the C was preceded by a T (TpC), consistent with an APOBEC mediated mutational process . In contrast, only 33% of the PIK3CA mutations in HPV-inactive tumors (2/6) were C:G > T:A substitutions in which the C is preceded by a T (TpC). These two different mutation patterns (C > T) suggest that HPV-active and HPV-inactive tumors have different mutational process histories.
In order to place cervical cancers of both classes into context based on somatic mutations, we performed a global comparison of select TCGA cancer cohorts according to the significance of somatic mutations . MutSig outputs containing the p-values for every gene in select cancer cohorts were obtained from Firebrowse.org. There were 18,278 genes with one or more non-synonymous somatic mutations in any of the 33 select TCGA cohorts. MutSig analysis revealed that 7,882 genes are significant, with a nominal p-value less than 0.05 in at least one cohort (MutSigCV v0.9). We used the minus-log p-values of these genes to drive unsupervised hierarchical cluster analysis of the cohorts (Figure (Figure5).5). In general, tumor cohorts clustered according to organ system such as GI tumors (stomach, colon and rectum), brain (glioma, lower grade glioma, glioma multiforme), and kidney (papillary, clear cell). The cervical cancers including both subsets of cervical cancers (HPV-active and HPV-inactive) are closely associated with uterine cancers (carcinosarcoma, endometrial carcinoma) with HPV-inactive cervical cancers mutationally more similar to uterine endometrial cancers.
In general, the frequencies with which MutSig significant cervical cancer genes are mutated are similar between the two classes of cervical cancer. For example, cervical cancer associated genes EP300 has on average 0.17 and 0.15 mutations per tumor (HPV-inactive, HPV-active respectively). Another commonly mutated gene, FBXW7, has 0.15 and 0.10 mutations per tumor (HPV-inactive, HPV-active respectively). However, there was a significantly different distribution in the mutation frequencies of other key cancer driver genes (Figure (Figure6).6). We compared the mutational frequencies of the HPV-inactive and HPV-active tumors using our somatic mutation calls on 258 tumor/normal cervical cancer pairs. 361/468 genes had at least one somatic mutation identified by our variant calling pipeline. Twenty-six genes had different somatic mutation rates and all were more frequently mutated in the HPV-inactive tumors (Figure (Figure6).6). TP53, a gene not commonly mutated in cervical cancers, was 17 times more likely (95% CI 5.8, 49.7) to be mutated in HPV-inactive tumors (47%) than in HPV-active tumors (4%). This is consistent with the absence of E6 HPV oncogene expression in inactive tumors. The enrichment of TP53 mutations in the HPV-inactive tumors provides convincing evidence that these tumors have escaped HPV dependence. Chromatin remodeling genes ARID1A and ARID5B, WNT signaling regulator CTNNB1, and the negative regulator of PI3K signaling PTEN, were among the notable genes significantly more mutated in HPV-inactive cervical cancers. Previously, TP53 and ARID1A have been reported as mutually exclusive in endometrial cancers . We found that one tumor contained mutations in both of these genes and twenty-six contain mutations in one of these genes reinforcing the mutual exclusivity between these genes. In addition to PTEN mutational differences, the PIK3CA gene was more frequently mutated in the HPV-inactive tumors (26% vs 13%). Overall, AKT pathway mutations are more common in the HPV-inactive tumors, indicating that an AKT inhibitor may be more effective in treating these tumors . Finally, the enrichment of the CTNNB1 mutations in HPV-inactive tumors is noteworthy because of the widespread up-regulation of WNT target gene expression that we detected in the HPV-inactive tumors by GSEA.
Essentially all cervical cancers are HPV positive by DNA, and it is widely accepted that HPV oncogene expression is necessary for cervical cancer development. Here we show strong evidence that a fraction of cervical cancers can slowly evolve to HPV independence by accumulating somatic mutations to cancer driver genes. HPV-inactive cancers differ from HPV-active by more than just the expression of HPV E6/E7, diminishing the possibility of a trivial explanation such as a technical failure to detect HPV transcripts. The patients harboring HPV-active tumors are on average 9 years younger than the HPV-inactive patients, and they have dramatically longer overall survival (4.3 fold). Global gene expression changes are also apparent, with E2F target genes up-regulated in HPV-active tumors. Additionally, other gene expression differences mirror what has been reported for HPV-positive and HPV-negative head and neck cancers [7, 20]. There were also profound differences in global DNA methylation levels, with more methylation observed in HPV-active tumors than in HPV-inactive tumors. The somatic mutation profiles are also substantially different. The ratio of non-synonymous to synonymous mutations, which is a crude measure of driver gene mutation load, is low for TCGA cervical cancers, but HPV-inactive tumors are significantly enriched for non-synonymous changes, which often target important cancer driver genes. Somatic mutations consisting of C:G > T:A changes are common in both types of cervical cancers, but are unevenly distributed when subdivided according to the surrounding sequence context and mutational mechanism. For example, the PTEN tumor suppressor gene was more frequently mutated in HPV-inactive tumors, and it was more often the target of spontaneous, non-enzymatic deamination of methylated cytosines in a CpG context. In contrast, the PIK3CA oncogene was frequently mutated in both tumor types, but was more often the target of APOBEC-mediated deamination in the HPV-active tumors. Finally, the cancer driver genes TP53, CTNNB1, PTEN, and ARIDs were far more likely to be mutated in HPV-inactive tumors. In total, these clinical and molecular phenotypes strongly indicate that a subset of cervical cancer exists that is independent of HPV oncogene activity.
Personalized oncology promises to deliver therapies tailored to the genetic features of an individual's tumor, to improve outcome. Cervical cancers are usually treated with a combined regimen of platinum-based chemotherapy and radiation; however, few biomarkers of response to targeted therapies are available in this disease type. The differences between HPV-inactive and HPV-active tumors suggest the use of different targeted therapeutic approaches. For example, gefitinib, an EGF-receptor inhibitor, may be more effective in treating HPV-active cervical cancers, while dasatinib may be more effective in treating HPV-inactive tumors. Overall, somatic mutations in the genes upstream of AKT are more common in HPV-inactive tumors, suggesting that dual PI3K/MTOR inhibitors may be more effective for these patients. Finally, inflammation-associated gene sets are increased in HPV-active tumors, as are the targets for checkpoint inhibitor based immunotherapy such as, TIGIT, CTLA4, PDL-1 [21–23], indicating that immunotherapy, in combination with standard radiotherapy and DNA-damaging chemotherapy protocols may also be more effective in treating HPV-active tumors.
Most HPV infections do not develop into cervical cancer, but rather resolve within one or two years [24, 25]. Studies of immunosuppressed women have revealed higher incidences of cervical dysplasia, neoplasia, and cancers than in immunocompetent individuals, highlighting the role that the immune system plays in clearing HPV-infected cells [26, 27]. The small fraction of HPV infections that ultimately give rise to cervical cancer may do so only in individuals with genetic variants that attenuate some components of the immune response to HPV needed for clearance . Our results suggest that, among the population of women who are immune surveillance competent, a small percentage of HPV-infected cells may acquire somatic mutations in cancer driver genes, rendering the cells ultimately independent of the expression of HPV, and allowing sub-clones to arise that become HPV-inactive tumors. Thus, women with HPV-inactive tumors may have a genetic background that is more similar to the majority of women who readily clear HPV infections. Careful analysis of the genomes of women without and with persistent HPV infection may reveal important cervical cancer risk alleles, and ultimately provide additional clues as to the origin and development of HPV-inactive cancers.
All sequencing data were downloaded from TCGA repository CGHUB. We obtained TCGA CESC whole exome sequence data from tumors and matched normal samples that were assembled to the GrCH37-lite-+-HPV_Redux reference sequence. Tumor and matched normal BAM files, and RNA-seq FASTQ files that were available on February 10, 2015 at Cancer Genomics Hub (CGHub) were used. Access to this level data was approved by the National Cancer Institute. Normalized, mean centered DNA methylation data for the same samples were obtained from TCGA data portal.
Clinical information on the CESC samples analyzed were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). Statistical analyses presented in Table Table22 were calculated using the Chi Square Test (R by C Table, OpenEpi, http://www.openepi.com/), unknown samples were not included in the test for significance. The log rank Mantel-Cox survival model was used to compare the age of disease-onset between the two classes (PRISM v6.0g). The Gehan-Breslow-Wilcoxon survival model was used to compare overall survival (PRISM v6.0g).
We created a custom reference by concatenating hg19 human reference genome, and 18 human papillomavirus whole genome sequences as listed on Table Table1.1. We extracted forward and reverse paired-end FASTQ reads from all tumor and normal BAM files using bam2fastq (https://gsl.hudsonalpha.org/information/software/bam2fastq). Reads were imported into CLC Genomics Workbench 8 and assembled to our custom reference sequence (hg19+HPV) with the following parameters: References = Genome (Genome), Masking mode = No masking, Mismatch cost = 2, Cost of insertions and deletions = Linear gap cost, Insertion cost = 3, Deletion cost = 3, Insertion open cost = 6, Insertion extend cost = 1, Deletion open cost = 6, Deletion extend cost = 1, Length fraction = 0.5, Similarity fraction = 0.8, Global alignment = No, Auto-detect paired distances = Yes, Non-specific match handling = Ignore, Output mode = Create reads track, Create report = Yes, Collect un-mapped reads = No.
HPV typing was determined by inspecting the number of normalized reads mapping to each HPV reference. HPV-reads were normalized by dividing the raw number of HPV mapping reads by the raw number of all mapped reads (viral and human) to control for the depth of sequencing for the different tumor samples. Resulting normalized reads were scaled down to the sample with the lowest depth of coverage (TCGA-MA-AA41, 38,688,500). The threshold for calling a specific HPV type was set at a normalized read count greater than 50 for that HPV type. Tumor variants were called using the following setting in CLC Genomics Workbench 8: Required significance (%) = 1.0, Ignore positions with coverage above = 100,000, Restrict calling to target regions = Not set, Ignore broken pairs = No, Ignore non-specific matches = No, Minimum coverage = 10, Minimum count = 2, Minimum frequency (%) = 1.0, Base quality filter = Yes, Neighborhood radius = 5, Minimum central quality = 20, Minimum neighborhood quality = 15, Read direction filter = No, Relative read direction filter = Yes, Significance (%) = 1.0, Read position filter = No, Remove pyro-error variants = No, Create track = Yes, Create annotated table = Yes, Create report = Yes.
To identify somatic variants we filtered each tumor's variants calls against the patient matched normal BAM file. Any variant appearing in 2 or more reads in the normal BAM file and covered by more than 20 reads was considered a germline variant and removed. The remaining variants were filtered for overlap with the coding sequence. Amino acid changes were identified using the standard codon table. We filtered marginal variant calls based on the following parameters: Minimum frequency (%) = 30.0, Minimum forward/reverse balance = 0.05, Minimum average base quality = 20.0, Variant frequency = Yes, Forward/reverse balance = Yes, Average base quality = Yes. Next, we annotated all variants with identical matches in the dbSNP, 1000 genomes phase 3, exome variant server (ESP6500), or COSMIC databases. We annotated all variants with overlapping gene name information. The initial germline variants filter failed to identify all germline variants due to regions of low coverage in the matched normal. The remaining germline variants were removed by filtering against exact matches in the dbSNP, 1000 genomes phase 3 and ESP6500 databases.
The unassembled FASTQ files from CGHub were imported into CLC Genomics Workbench 8 and assembled to our custom reference sequence (HG19+HPV) using the RNA-Seq Analysis tool with the following parameters: Reference type = Genome annotated with genes and transcripts, Reference sequence = Genome (Genome), Gene track = Genome (Gene), mRNA track = Genome (mRNA), Mapping type = Also map to inter-genic regions, Mismatch cost = 2, Insertion cost = 3, Deletion cost = 3, Length fraction = 0.8, Similarity fraction = 0.8, Global alignment = No, Auto-detect paired distances = Yes, Strand specific = Both, Maximum number of hits for a read = 30, Count paired reads as two = No, Expression value = Total counts, Calculate RPKM for genes without transcripts = Yes, Minimum read count fusion gene table = 25, Create report = Yes, Create fusion gene table = Yes, Create list of unmapped reads = No. Gene expression levels were defined as the total number of reads assembling to the exons of the gene, normalized to the total number of reads that assembled to all exons of all genes.
HPV classifier was determined using the normalized expression of all HPV reads against the sum of the HPV E6 and E7 oncogene reads. We utilized the unsupervised Partitioning Clustering tool in Partek Genomics Suite 6.5 to cluster the samples into two groups using the Euclidean distance function.
Differentially expressed human genes were identified using a one-way ANOVA and defined as greater than 4 fold different in either direction and FDR corrected p < 0.05. All genes were rank ordered by the t-test statistic and used to perform gene set enrichment analysis [8, 29].
We downloaded CESC Illumina 450k array mean-centered beta values from TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). Differentially methylated human loci were identified using a one-way ANOVA and defined as Bonferroni-corrected p-value < 0.05.
Non-synonymous to synonymous ratios were compared using the Chi Squared test (OpenEpi, http://www.openepi.com/). Somatic mutation landscapes were compared between multiple tumor cohorts using variant calls and MutSig scores from firebrowse.org. We manually divided the samples in the CESC cohort by HPV expression class and ran MutSig CV0.9 (https://www.broadinstitute.org/cancer/cga/mutsig). The p-values were minus log p transformed and used to drive the unsupervised hierarchical clustering (Pearson dissimilarity, complete linkage) in Partek Genomics Suite 6.5.
The per gene somatic mutation frequencies (Figure (Figure6)6) were compared between classes using a two-tailed Fisher's Exact p-value. The corresponding conditional maximum likelihood estimate of odds ratio and 95% CI (OpenEpi, http://www.openepi.com/ ) are plotted on Figure Figure66.
This study was supported by funding provided from the NIH grants U01CA158428 and R21CA201853.
CONFLICTS OF INTEREST