The 12 ORFs encoding the E2 proteins were amplified from viral genomic DNA of the corresponding HPV genotypes, cloned with the Gateway recombinational cloning system (Invitrogen) into the entry vector pDON207 (Invitrogen), and listed in the ViralORFeome database. The E2 ORFs were then transferred into Gateway-compatible destination vectors: pGBKT7-gw to generate E2-GAL4 DNA-binding domain fusion proteins for Y2H; pCI-Neo-FLuc-gw to generate firefly luciferase-E2 fusion proteins for measurement of steady-state levels; and pSPICA-N2-gw to generate proteins with amino acids 110 to 185 of the humanized Gaussia princeps luciferase fused to the N-terminus of E2 (GL2-E2 fusion proteins) for the High-Throughput Gaussia princeps Luciferase-based Complementation Assay (see
for construct details). Gateway entry plasmids for cellular partners were obtained either by PCR amplification from clones recovered by Y2H or from the human ORFeome resource (hORFeome v3.1). The cellular ORFs were transferred into the Gateway-compatible destination vector pSPICA-N1-gw to generate proteins fused at the N-terminus to amino acids 18 to 109 of humanized Gaussia luciferase (GL1-fusion proteins). Mutagenesis of the E2 proteins from HPV 16 and 18 was performed by PCR-directed mutagenesis. The luciferase reporter driven by an E2-responsive promoter (pTK6E2BS) contained 6 E2 binding sites upstream of the minimal TK promoter. The E2 BS sequences were as follows: (aACCGTTTTCGGTtaaACCGTTTTCGGTt)×3, designed after the study of Sanchez et al. to be optimal for the binding of a large panel of E2 proteins. The polymerase III-directed Renilla luciferase plasmid (polIII-Ren), used as an internal control for transfection, contained a 100-mer nucleotide sequence encompassing the human histone H1 promoter upstream of the Renilla ORF (hRluc).
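As a quick consistency check on the reporter design, the repeat unit given above can be expanded three times and scanned for binding sites. The sketch below is illustrative only: it assumes that E2 binding sites follow the palindromic ACCGNNNNCGGT consensus, and the function names are hypothetical rather than taken from the study.

```python
import re

# Repeat unit and copy number as given in the text.
REPEAT_UNIT = "aACCGTTTTCGGTtaaACCGTTTTCGGTt"
COPIES = 3

# Assumption: an E2 binding site matches the ACCGNNNNCGGT consensus.
E2_SITE = re.compile(r"ACCG[ACGT]{4}CGGT")


def build_cassette(unit: str = REPEAT_UNIT, copies: int = COPIES) -> str:
    """Concatenate the repeat unit the requested number of times (upper case)."""
    return unit.upper() * copies


def find_e2_sites(sequence: str) -> list:
    """Return the 0-based start position of every consensus match."""
    return [match.start() for match in E2_SITE.finditer(sequence.upper())]


cassette = build_cassette()
sites = find_e2_sites(cassette)
print(f"cassette length: {len(cassette)} nt, E2 sites found: {len(sites)}")
```

Under this assumption, each repeat unit contributes two sites, so three copies give the six sites stated for pTK6E2BS.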
Yeast Two Hybrid
For yeast two-hybrid screening, GAL4 DNA-binding domain-E2 fusion proteins, expressed from the pGBKT7 vector, were used to probe a human HaCaT cDNA library (Clontech) cloned in fusion with the GAL4 transcription activation domain in pACT2. Each independent screen was performed by mating the pGBKT7-E2-transformed yeast strain AH109 (MATa, trp1-901, leu2-3, 112, ura3-52, his3-200, gal4Δ, gal80Δ, LYS2::GAL1UAS-GAL1TATA-HIS3, GAL2UAS-GAL2TATA-ADE2, URA3::MEL1UAS-MEL1TATA-lacZ) with the Y187 strain (MATα, ura3-52, his3-200, ade2-101, trp1-901, leu2-3, 112, gal4Δ, met–, gal80Δ, URA3::GAL1UAS-GAL1TATA-lacZ) transformed with the HaCaT cDNA library. Mating was performed for 4 h at 30°C on plates of non-selective rich YCM medium. The number of diploid cells generated was systematically evaluated to be at least 10 times higher than the complexity of the HaCaT cDNA library (2.5×10⁶, Clontech).
Mated yeasts were grown on selective medium lacking tryptophan, leucine and histidine (SD-W-L-H) and supplemented with 3-aminotriazole (3-AT) at the concentration determined in the basal autoactivation test performed beforehand (see below). HaCaT cDNA sequences from positive colonies were PCR-amplified and sequenced. Independent Y2H screens were repeated in the same way for each E2 protein until around 100 PPIs had been sequenced.
Yeast Two-Hybrid Bait Basal Transactivation Test
Because bait constructs sometimes self-transactivate reporter genes, the SD-W-L-H culture medium was supplemented with 3-aminotriazole (3-AT) in the Y2H screens. Appropriate concentrations of this inhibitor were determined by growing the bait strains (AH109 yeast transformed with each E2 bait) on SD-W-H culture medium supplemented with increasing concentrations of 3-AT. Concentrations of 3-AT ranging from 5 mM (for the E2 proteins of HPV 33, 39, 18, 11, 5 and 8) to 10 mM (for those of HPV 1, 3, 6, 9, 32 and 16) were sufficient to counter the weak transactivation observed. These values fall within the range of Clontech standards.
Analysis of Sequenced Y2H PPI (Interactor Sequence Tag or IST)
A bioinformatic pipeline was developed to assign each IST to its native human genome transcript. First, ISTs were filtered with PHRED at a high quality score: sequence was extracted using a sliding window of 30 bases, shifted successively by 10 bases until the average quality value of the window dropped. A 30-base motif from the pACT2 linker was then searched for, and sequences downstream of this motif were translated into peptides and aligned using BLASTP against human protein sequence databases from Ensembl (release 58, based on NCBI assembly 37), UniProt and primate EMBL. Low-confidence alignments (E value > 10⁻¹⁰, identity < 80% and peptide length < 20 amino acids), as well as sequences containing frameshifts or premature stop codons, were eliminated.
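The sliding-window quality filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name and the `min_mean_q` cut-off of 20 are hypothetical, since the text does not state the quality threshold that was applied.

```python
def trim_by_quality(quals, window=30, step=10, min_mean_q=20.0):
    """Return how many leading bases to keep from a read.

    quals      : per-base PHRED quality scores, 5' to 3'
    window     : window size in bases (30 in the text)
    step       : shift between successive windows (10 in the text)
    min_mean_q : HYPOTHETICAL cut-off; the threshold is not given in the text
    """
    keep = 0
    start = 0
    while start + window <= len(quals):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean_q:
            break  # average quality has dropped: stop extending the read
        keep = start + window
        start += step
    return keep


# Illustrative read: 60 high-quality bases followed by 40 low-quality bases.
example_quals = [40] * 60 + [5] * 40
kept = trim_by_quality(example_quals)
print(f"bases kept: {kept} of {len(example_quals)}")
```

The retained prefix would then be searched for the pACT2 linker motif before translation and alignment.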
High-Throughput Gaussia princeps Luciferase-Based Complementation Assay (HT-GPCA)
HEK-293T cells were seeded at 35,000 cells per well in 96-well plates. After 24 h, cells were transfected using linear PEI (polyethylenimine) with pSPICA-N2-E2 and pSPICA-N1-cellular protein constructs (100 ng each) for expression of the GL2-E2 and GL1-fusion proteins, where GL1 and GL2 are two inactive fragments of the Gaussia princeps luciferase. 10 ng of a CMV-firefly luciferase reporter plasmid was added to normalize for transfection efficiency. Cells were lysed 24 h post-transfection in 40 µL of Renilla luciferase lysis buffer (Promega) for 30 minutes. Gaussia princeps luciferase activity was measured on 30 µL of total cell lysate with a Berthold Centro XS LB960 luminometer after injection of 100 µL of Renilla luciferase substrate (Promega). Firefly luciferase activity was measured on the remaining 10 µL of lysate with firefly luciferase substrate. For each sample, Gaussia luciferase activity was divided by firefly luciferase activity, giving a normalized Gaussia luminescence. Each normalized Gaussia luciferase activity was calculated from the mean of triplicate samples. For a given pair of proteins (A and B), the normalized Gaussia luminescence of cells coexpressing the GL1-A and GL2-B proteins was divided by the sum of the normalized Gaussia luminescence of each partner coexpressed with the matched empty plasmid: NLR = (GL1-A + GL2-B) / [(GL1-A + GL2) + (GL1 + GL2-B)], where each parenthesized term denotes the normalized luminescence of one coexpression. This gave a Normalized Luminescence Ratio (NLR) corresponding to the reconstituted Gaussia luciferase activity, thus reflecting the level of interaction between protein pairs. See
for further details on the method.
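The NLR calculation described above can be written out as a short sketch. The function names and the luminescence readings are hypothetical and serve only to illustrate the arithmetic: per-well Gaussia/firefly ratios are averaged over triplicates, and the pair signal is divided by the sum of the two empty-plasmid controls.

```python
from statistics import mean


def normalized_luminescence(gaussia, firefly):
    """Mean of per-well Gaussia/firefly ratios over replicate wells."""
    return mean(g / f for g, f in zip(gaussia, firefly))


def nlr(pair, a_with_empty_gl2, b_with_empty_gl1):
    """Normalized Luminescence Ratio for one protein pair.

    pair             : normalized luminescence of GL1-A + GL2-B
    a_with_empty_gl2 : normalized luminescence of GL1-A + empty GL2
    b_with_empty_gl1 : normalized luminescence of empty GL1 + GL2-B
    """
    return pair / (a_with_empty_gl2 + b_with_empty_gl1)


# Made-up triplicate readings (arbitrary luminescence units).
pair = normalized_luminescence([900, 1000, 1100], [10, 10, 10])
ctrl_a = normalized_luminescence([100, 100, 100], [10, 10, 10])
ctrl_b = normalized_luminescence([150, 150, 150], [10, 10, 10])

ratio = nlr(pair, ctrl_a, ctrl_b)
print(f"NLR = {ratio:.2f}")
```

With these invented readings the pair signal is 100, the controls are 10 and 15, and the NLR is 4.0.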
Literature-curated interactions (LCI) involving the E2 proteins were extracted from the VirHostNet, VirusMINT and PubMed databases. Interaction data analyses were performed using the R statistics package. Raw NLR interaction data were separated into categories in order to minimize the dispersion of NLR values. The cut-off thresholds of each category were determined with the goal of maintaining the same frequency distribution across all categories. A Euclidean distance matrix was calculated from the data categories using the “dist” function from R. The interaction dendrogram was calculated using the “complete” (UPGMA) linkage method from the “hclust” function from R. E2 protein sequences were clustered using the “phylip” package. Protein distances were calculated with the “prodist” program, using default parameters. The phylogenetic dendrogram was generated with the “neighbor” program using the UPGMA method and default parameters. Both the interaction and phylogenetic dendrograms were generated using JavaTreeView. To determine the closeness of the two dendrograms, a Pearson correlation coefficient was calculated with the “cor” function in R from the cophenetic distances of the interaction and phylogenetic dendrograms. The label order of the intensity data was then randomly changed to generate 100,000 random dendrograms. The cophenetic distance matrix of each randomized dendrogram was compared to the cophenetic distance matrix of the phylogenetic dendrogram with the Pearson correlation (“cor”) function from R. The p-value was calculated from the number of standard deviations separating the correlation between the interaction and phylogenetic dendrograms from the mean of the distribution of correlations between the random and phylogenetic dendrograms. To check the normality of the data, the cumulative distribution function of the randomized dataset was compared to that of a normal distribution generated by the R function “rnorm” with the same mean and standard deviation as the randomized dataset.
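The final step of this comparison, turning an observed correlation into a p-value from the distribution of randomized correlations, can be sketched as below. The study performed the analysis in R; this Python sketch only illustrates the arithmetic. It leaves out dendrogram construction and cophenetic distance extraction, uses invented numbers, and assumes a one-sided upper-tail test, which the text does not specify.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def z_score_p_value(observed, null_correlations):
    """Z-score of the observed correlation against the randomized ones.

    ASSUMPTION: one-sided upper-tail p-value under a normal approximation.
    """
    mu = mean(null_correlations)
    sigma = stdev(null_correlations)
    z = (observed - mu) / sigma
    p = 1.0 - NormalDist().cdf(z)
    return z, p


# Invented cophenetic distances for the two dendrograms (flattened).
interaction_coph = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
phylogeny_coph = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
observed = pearson(interaction_coph, phylogeny_coph)

# Invented stand-in for the 100,000 randomized correlations.
null = [-0.2, -0.1, 0.0, 0.1, 0.2]
z, p = z_score_p_value(observed, null)
print(f"r = {observed:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The normal approximation is only reasonable if the randomized correlations are close to normally distributed, which is what the comparison with the “rnorm” sample is meant to verify.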
The E2 interaction networks were generated with the Cytoscape software from interactions scoring positive in HT-GPCA (NLR above 3.5). The degree of each cellular protein in both the E2 and the HPRD-based human interactomes was extracted from Cytoscape. To determine the overrepresented GO (Gene Ontology) terms in the interaction dataset and to evaluate the grouping of E2 targets by functional category, we used the DAVID bioinformatic database. P-values were generated by DAVID.
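The selection of positive interactions and the degree calculation can be illustrated with a short sketch. The networks in the study were built in Cytoscape; the code below is only an equivalent toy calculation, with hypothetical function names, a placeholder protein (`HOST_X`) and invented NLR values.

```python
from collections import Counter

NLR_THRESHOLD = 3.5  # positivity cut-off stated in the text


def positive_interactions(nlr_table, threshold=NLR_THRESHOLD):
    """Keep the (E2, cellular protein) pairs whose NLR exceeds the threshold."""
    return [pair for pair, value in nlr_table.items() if value > threshold]


def cellular_degree(edges):
    """Number of E2 proteins connected to each cellular protein."""
    return Counter(host for _e2, host in edges)


# Invented NLR values, for illustration only.
nlr_table = {
    ("HPV16 E2", "GTF2B"): 12.0,
    ("HPV18 E2", "GTF2B"): 8.5,
    ("HPV16 E2", "HOST_X"): 1.2,
    ("HPV18 E2", "HOST_X"): 4.0,
}

edges = positive_interactions(nlr_table)
degree = cellular_degree(edges)
print(sorted(edges))
print(dict(degree))
```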
293T cells were plated at 35,000 cells per well in 96-well plates and transfected 24 h later using linear PEI with 25 ng of the pTK6E2BS E2-responsive reporter plasmid, 10 ng of polIII-Ren as an internal control for transfection efficiency, and 100 ng of plasmid expressing GL2-E2 fusion proteins or of empty GL2 plasmid. To assess the effect of GTF2B, HeLa cells plated in 12-well plates were transfected using linear PEI with 100 ng of pTK6E2BS, 10 ng of polIII-Ren, 100 ng of plasmid expressing mCherry-fused E2 or mCherry alone, and 1 µg of either GTF2B expressed from pCI Neo or empty pCI Neo (Promega). At 30 h post-transfection, cells were lysed in passive lysis buffer according to the manufacturer's instructions, and luciferase activity was measured with Dual-Glo buffer (Promega). Results are given as the mean of three independent tests ± SD (error bars).
HaCaT cells grown on coverslips were co-transfected using linear PEI with expression plasmids for GFP-fused E2 proteins (3 µg) and Cherry-fused cellular proteins (1 µg). At 24 h post-transfection, cells were fixed with 4% paraformaldehyde for 30 min, washed in PBS, and incubated with DAPI for 30 min. Cells were mounted with CitiFluor. Fluorescence images were acquired using a Zeiss Apotome microscope.
7,500 HeLa cells were reverse-transfected using INTERFERin (Polyplus-Transfection) with 1.75 pmol of a pool of four siRNAs targeting GTF2B (from the Qiagen Human Whole Genome siRNA Set V4.1) and plated in 96-well plates. Two scrambled siRNAs (ref 1027310, Qiagen) were used as negative controls. After 48 h, 20 ng of Cherry-E2 expression plasmid was transfected using linear PEI along with 25 ng of the pTK6E2BS reporter and 10 ng of polIII-Ren as an internal control for transfection efficiency and cell viability. At 24 h post-transfection, cells were lysed in passive lysis buffer according to the manufacturer's instructions (Promega). Firefly and Renilla luciferase activities were measured on a Berthold Centro luminometer to generate a Luciferase/Renilla ratio. Each transfection was tested in triplicate, with each bar representing the mean ± SD. Results are given as the fold activation of TKE2BS by E2 in the presence of the siRNA, calculated relative to TKE2BS activity without E2. P-values were calculated with Student's t-test.
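The fold-activation readout described above reduces to two steps: a firefly/Renilla ratio per well, then the mean ratio with E2 divided by the mean ratio without E2. The sketch below illustrates this with invented triplicate readings and hypothetical function names. The t-test is not reproduced here.

```python
from statistics import mean, stdev


def luc_ren_ratios(firefly, renilla):
    """Firefly/Renilla ratio for each replicate well."""
    return [f / r for f, r in zip(firefly, renilla)]


def fold_activation(ratios_with_e2, ratios_without_e2):
    """Mean reporter ratio with E2 relative to the mean ratio without E2."""
    return mean(ratios_with_e2) / mean(ratios_without_e2)


# Invented triplicate readings (arbitrary luminescence units).
with_e2 = luc_ren_ratios([500, 600, 700], [10, 10, 10])
without_e2 = luc_ren_ratios([100, 100, 100], [10, 10, 10])

fold = fold_activation(with_e2, without_e2)
spread = stdev(with_e2) / mean(without_e2)  # SD expressed in fold units
print(f"fold activation = {fold:.1f} ± {spread:.1f}")
```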