|Home | About | Journals | Submit | Contact Us | Français|
Although significant effort is expended on identifying transcripts/proteins that are up-regulated in cancer, there are few reports on systematic elucidation of transcriptional mechanisms underlying such druggable cancer-specific targets. The mesothelin (MSLN) gene offers a promising subject, being expressed in a restricted pattern normally, yet highly overexpressed in almost one-third of human malignancies and a target of cancer immunotherapeutic trials. CanScript, a cis promoter element, appears to control MSLN cancer-specific expression; its related genomic sequences may up-regulate other cancer markers. CanScript is a 20-nt bipartite element consisting of an SP1-like motif and a consensus MCAT sequence. The latter recruits TEAD (TEA domain) family members, which are universally expressed. Exploration of the active CanScript element, especially the proteins binding to the SP1-like motif, thus could reveal cancer-specific features having diagnostic or therapeutic interest. The effcient identification of sequence-specific DNA-binding proteins at a given locus, however, has lagged in biomarker explorations. We used two orthogonal proteomics approaches— unbiased SILAC (stable isotope labeling by amino acids in cell culture)/DNA affnity-capture/mass spectrometry survey (SD-MS) and a large transcription factor protein microarray (TFM)—and functional validation to explore systematically the CanScript interactome. SD-MS produced nine candidates, and TFM, 18. The screens agreed in confirming binding by TEAD proteins and by newly identified NAB1 and NFATc. Among other identified candidates, we found functional roles for ZNF24, NAB1 and RFX1 in MSLN expression by cancer cells. Combined interactome screens yield an effcient, reproducible, sensitive, and unbiased approach to identify sequence-specific DNA-binding proteins and other participants in disease-specific DNA elements.
Cancer-specific proteins serve as starting points for studies of carcinogenesis and for developing diagnostics and therapeutics. Exploring their transcriptional regulation could unveil cancer-specific processes. There have been, however, relatively few successes in such transcriptional studies; for example, the most recent report exploring the detailed mechanism of transcriptional regulating a familiar cancer marker, CEA (CEACAM5), was published in 1994.1 Technical challenges, such as a lack of comprehensive unbiased screening methods, previously hindered the identification of locus-specific DNA-binding proteins. Here, we revisited this challenge using several technologies to determine the interactome of the MSLN cancer-specific element (the CanScript sequence) and then assayed the in vivo relevance of some of these interactions.
Historically, protein libraries were used to screen for unknown proteins binding to known DNA sequences. For example, phage display of cDNA libraries successfully uncovered several zinc finger proteins2,3 (including Can-Script-regulating candidate KLF6), by using specific transcriptionally active DNA elements as probes to generate qualitative results. The yeast one-hybrid screening of cDNA libraries was the in vivo analog of such screens and provided a quantitative output. Limitations have included studying libraries of inadequate complexity, poor modeling of native protein structure, omission of relevant binding partners, and blindness to the indirect interactions as would occur in assembled protein complexes.
Protein microarrays and yeast screens of custom polypeptide libraries are practical for screening DNA-binding proteins, carry similar limitations, but offer greater reproducibility (~80% hits reproducible between two experiments)4 and investigator control than natural libraries. DNA–protein interaction databases were established from their results,4,5 in which thousands of purified and arrayed proteins directly bound in vitro to applied DNA probes of interest.
Bioinformatics tools offer predictions of DNA-binding proteins.6,7 Their computational power rests on databases generated from hundreds of known DNA–protein interactions. The predicted results are limited by database size and accuracy, sampling biases, disparate informational sources, omission of cooperative binding patterns, etc. We used such tools in prior explorations.8
Tandem mass spectrometry allows identification and quantitation of native protein interactomes affnity-captured by defined DNA sequences directly from cell lysates.9,10 One of the major limitations of this method is that specific binding proteins are usually obscured by a background of nonspecific protein binding, even when incorporating stringent controls. To overcome this limitation, protein samples can be isotope labeled to distinguish the binding proteins in control and test samples to discern sequence-specific DNA-binding proteins. This is because mass spectrometry allows one to compare the intensities of the different isotopic forms in the MS spectrrum.9–12 We used the SILAC method for labeling proteins that were then subjected to affnity capture by target or control biotinylated oligonucleotides, mixed and analyzed together (Figure 1A). The protein samples were digested with trypsin, and each resulting peptide generates a quantitative ratio (heavy/light) of the paired peptides, which greatly increases the chance to identify specific binding proteins.10,11
Normally, MSLN is expressed only in a thin layer of mesothelial cells lining body cavities. For reasons that are currently unknown, MSLN expression is strongly activated in many cancers, especially the vast majority of ovarian and ductal pancreatic adenocarcinomas13,14 and is the molecular target for antigen-directed experimental cancer immunotherapeutics.15,16 We previously ascribed MSLN cancer-overexpression to CanScript, a 20-nt sequence with promoter-like activity in MSLN-positive cancers.8,17 CanScript is bipartite, comprising an MCAT motif and an SP1-like motif separated by a short linker (Figure 1B). The MCAT motif recruits a complex of TEAD1 and YAP1 (yes-associated protein1),18 genes whose expression is almost universal. TEAD1 and YAP1 are necessary, but not sufficient, for MSLN gene expression and CanScript reporter activity.8 Interestingly, the 5′ flanking sequence of MCAT motif often plays a determining role in tissue-specific expression of genes, such as cardiac troponin T in striated muscle.19 We postulated that the SP1-like motif and its associated unknown transcription factors may lend the cancer-specificity. In previous study, we found KLF6 as a candidate associated with the SP1-like motif, but it did not fully explain the features of CanScript’s activity.8
AsPc1, HeLa, RKO and HEK293 cell lines were obtained from the American Type Culture Collection. Cells were cultured in RPMI (AsPc1 only) or DMEM medium, supplemented with 10% FBS and 1% penicillin/streptomycin. The cultures were monitored for characteristic patterns of MSLN expression and morphology during passage.
The CanScript 3× wild-type sequence was CGGGG-(TCTCCACCCACACATTCCTG)3GGGCG; the mutant C5 sequence was CGGGG(TCTCAACCCACACATTCCTG)3-GGGCG; the mutant M6c sequence was CGGGG(TCTCCA-CCCACACTTACCTG)3GGGCG, the double mutant C5M6c sequence was CGGGG(TCTCAACCCACACTTACCTG)3-GGGCG. Biotin with a 15-atom spacer arm was covalently linked to the 5′ end of these polynucleotides. These were annealed with their complementary nonmodified sequence (both from IDT) to make double-stranded DNA.
The pRL-SV40 plasmids were obtained from Promega (E2231). pGL3-Can3 was described previously.8 TEAD1, YAP1 and ZNF24 sequences were inserted into pcDNA3.1-FLAG vector (Invitrogen) between BamHI and NotI to create pcDNA3.1-FLAG-TEAD1, pcDNA3.1-FLAG-YAP1, and pcDNA3.1-FLAG-ZNF24, TEAD1 was also inserted into the pRK5-HA vector (BD Pharmingen) between SalI and NotI to create pRK5-HA-TEAD1.
Cells were lysed by suspension using 1 mL hypotonic detergent (10 mM Tris pH 8.0, 10 mM KCl, 1.5 mM MgCl2, 0.1 mM EDTA, 0.5 mM DTT, 0.75% NP-40 with protease inhibitors, Roche 1183617001) per 0.2 g cell pellet for 15 min. The lysate was agitated followed by 10 min centrifugation at 3500× g for 30 s. The supernatant was collected as the cytoplasmic fraction. The nuclear pellet was resuspended in 1 mL isotonic solution A (20 mM Tris pH 7.5, 100 mM KCl, 50 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA with protease inhibitors) per 0.2 g cell pellet. The nuclear suspension was sonicated for ten seconds and clarified by 10 min centrifugation at 13000× g. This supernatant was collected as the nuclear extract. Separation of cytoplasmic and nuclear proteins was confirmed by immunoblot to detect alpha-tubulin and lamin B, respectively.
For immunoblot analysis of captured proteins, nuclear extracts from ~107 AsPc1 cells were incubated with 0.5 μM biotinylated CanScript ×3 probes and 50 μg/mL dI-dC for 1 h, followed by addition of 1/20 volume NeutrAvidin beads (Pierce, 29200). For DS-MS, cell numbers and solution volumes were increased 10-fold excepting the beads, which were doubled. After 30 min binding, beads were washed three times in solution A. Bound proteins were released by boiling beads in 1× loading buffer. Released proteins were separated by SDS-PAGE before immunoblot or mass spectrometric analysis.
For analysis of unseparated proteins, cells were washed in PBS and lysed by RIPA buffer (containing protease inhibitors), followed by sonication. Protein concentrations were determined by DC protein assay (Bio-Rad 500–011). Samples were boiled with loading buffer and proteins separated by SDS-PAGE. For all immunoblots, proteins transferred onto a PVDF membrane were detected using one or more primary and secondary antibodies: anti-MSLN (Abcam, ab-3362); anti-TEAD1 (Santa Cruz, sc-81396); anti-TEAD2 (Santa Cruz, sc-32427); anti-YAP1 (Santa Cruz, sc-101199); anti-SP1 (Santa Cruz, sc-420); anti-KLF6 (Santa Cruz sc-7158); anti-ZNF24 (Santa Cruz, sc-101079); anti-RFX1 (Santa Cruz sc-10650); anti-NAB1 (Santa Cruz, sc-81565); anti-NFATC1 (Santa Cruz, sc-7294); anti-NFATC2 (Santa Cruz, sc-7296); anti-CEA/CEACAM5 (Santa Cruz, sc-23928); anti-IKZF1 (Santa Cruz, sc-13039); anti-HBP1 (Santa Cruz, sc-25390); anti-PRDM14 (Santa Cruz, sc-133923); antialpha-tubulin (Santa Cruz, sc-8035); antilamin B (Santa Cruz, sc-6216); anti-GRSF (Novus Biologicals, NBP1–57316); anti-HA (Santa Cruz, sc-7392 HRP); anti-FLAG (Sigma A8592). HRP-linked secondary antibodies were from Santa Cruz (antimouse sc-2005; antirabbit sc-2004; antigoat sc-2020). Membranes were developed using chemiluminescence substrate (Millipore WBKLS0500) and exposed to film.
AsPc1 cells were cultured in lysine and arginine-free RPMI (Cambridge, RPMI-500) supplied with 13C6 L-Lysine-2HCl (Pierce, 89988), 13C6 L-Arginine-HCl (Pierce, 88210) and 10% dialyzed FBS (Invitrogen, 26400036) for at least five cell generations for the complete incorporation of heavy stable isotope-labeled amino acids.
In-gel digestion of affnity-purified proteins was done prior to protein identification and quantification. Colloidal Coomassie blue-stained gel bands were excised and destained. Gel bands were subjected to reduction (5 mM DTT) and alkylation (20 mM iodoacetamide) and subsequently proteolyzed using sequencing-grade modified porcine trypsin (Promega, V5111) as described.20 Dried extracted peptides were resuspended in 0.1% formic acid and loaded for liquid chromatography–tandem mass spectrometry (LC–MS/MS) analysis. We carried out SD-MS experiments for three replicate, distinct experiments. Proteins identified twice to have wild-type/mutant (i.e., heavy/light) ratios of at least 10.0 were arbitrarily chosen as high-confidence interactors.
Protein identification by LC–MS/MS analysis of peptides was performed using the LTQ-Orbitrap XL mass spectrometer (Thermo Fisher Scientific) interfaced with a 2D nanoLC system (Eksigent) and Agilent 1100 autosampler (Meadows Instrumentation Inc.). Peptides were loaded on a 75 μm × 2.5 cm trap packed with Magic AQ C18, 5 μm 100 Å reversed-phase material, then separated by reversed-phase HPLC on the same material using an acetonitrile/0.1% formic acid gradient of 60 min with a flow rate of 300 nL/min.
Identification and quantitation of proteins were carried out using Proteome Discoverer 1.3 software with the Mascot and SEQUEST search algorithms and a human protein database (NCBI RefSeq, Reference Sequence collection, Release 46). Program parameter settings were: trypsin as the digestion enzyme (missed cleavage 1 allowed), peptide tolerance at 10 ppm, fragment tolerance at 0.6 Da (carbamidomethylation on Cys, deamidation of asparagine and glutamine, oxidation at methionine), lysine (13C6) and arginine (13C6) as variable modifications. Proteins were considered positive when their peptides were identified with a cutoff of <1% false discovery rate (FDR). SILAC-based quantitation was carried out using the Proteome Discoverer Event Detector (mass precision of 2 ppm) and Precursor Ions Quantifier algorithms to measure the area under extracted ion chromatograms. The data associated with this manuscript may be downloaded from Proteome-Commons.org Tranche using the following hash: 9jrzamhW-qUF+MRkk6cUzXJ0z+xt9yZWv+T5LZGXbYO2np +BQ8p6pRoFro2HS4bMti7q0TRun/OJedyz4qFK6clYVm-b0AAAAAAAAL/A==
The fluorophore Cy5 was covalently linked to the 5′ end of wild-type or C5M6c mutant CanScript ×3 sequences, which were then annealed to their complementary nonmodified sequences to form double-stranded DNA probes. Wild-type and mutant probes were incubated individually with identical TFM chips as described.21 The chips included 4191 human GST-tagged proteins purified from yeast. The software of GenePix@ Pro 7 was used to read spots’ intensities on the protein microarray. The raw intensity of the spot was defined as the ratio (in logarithmic scale, base 2) of foreground median to local background median. We adopted quantile normalization to modify the signals across microarrays under different conditions (wild-type and mutant in the study). Intensities of less than zero were treated as a half of the background (noise) data and its symmetric part (>0) was generated. Both parts formed the whole of the background noise, which approximately fitted to a normal distribution with the mean value of zero. Then, each spot’s signal was standardized according to
where Z is the Z-score of each spot, I is the intensity of the spot after quantile normalization, and σ is the standard deviation of the background noise. Since all proteins had duplicated spots on each microarray, only proteins having both spots’ Z-scores greater than three were considered as positive for DNA probe-binding activity. Those positive proteins for the wild-type probe (Z > 3), but not the mutant probe, were considered having wild-type binding specificity. We listed only those with wild-type/mutant ratio at least 10.0 in Table 2.
We primarily used HeLa cells for transient transfections, owing to low transfection effciency in AsPc1 cells. We seeded 1.5 × 105 cells per well of a 6-well plate one day before transfection by Lipofectamine (Invitrogen 18324–012, manufacturer’s instructions). pGL3-Can3 firefly luciferase vector (0.2 μg) and pRL-SV40 Renilla luciferase control vector (20 ng) were transfected per well. Transfection medium was replaced by growth medium at 6 h.
2.5 million HEK293 cells were seeded in a 10-cm dish followed by transfection with 12 μg pcDNA3.1-FLAG vector. Cells were harvested after 48 h and sonicated in 300 μL PBS buffer. After 10 min centrifuge at 10,000 × g, supernatant was collected and incubated with 20 μL anti-FLAG M2 magnetic beads (Sigma, M8823) at 4 °C for 1 h. The beads were washed and protein eluted using 150 ng/μL 3 × FLAG peptide (Sigma, F4799). The yield was ~1 μg/dish.
The dual luciferase reporter assay system (Promega E1910) was used. Cells were harvested 24 h after cotransfection with pGL3-Can3 and pRL-SV40 vector. Cells were lysed by the kit-provided lysis solution (300 μL/well). Each sample was measured using 20 μL of cell lysate mixed with luciferase reagent II (LARII, 100 μL per well). After gathering the firefly luminescent signal by photometer, Stop and Glo reagent was added (100 μL per well), followed by the Renilla luminescence reading. Firefly luciferase readings were normalized using the Renilla readings of the same wells. The average relative luciferase activity (RLA) was obtained from the triplicate wells in each experiment (the differences among three wells were usually less than 5%). Two independent experiments performed on different days had high agreement; a representative experiment was presented (Supporting Information Figure S3).
siRNA sequences and transfection. siRNA oligonucleotides were synthesized by Ambion. An effcient siRNA from an irrelevant gene (FANCD2, Irr) was systematically used as a negative control. Except for the previously reported TEAD1 and TEAD2 genes,17 three nonoverlapping siRNA sequences were evaluated for each gene (total five siRNA sequences were tried on RFX1 and only three could effectively reduce RFX1 protein level). Immunoblot assays were used to verify siRNA effciencies of gene depletion. The sense sequences were: FANCD2 (Irr) si:CCAUGUCCUUAGUAGCCGATT; TEAD1 si: GCCCUGUUUCUAAUUGUGGTT; ZNF24 si1: GGCACUGUGAUGAUGAUGGTT; ZNF24 si2: GUUC-CUGGCACUCUCAAUATT; ZNF24 si3: GCAUUCAGCC-GAAGUUCCATT; RXF1 si1: GAGUACAUGUACUACCU-GATT; RXF1 si2: GAGAGAUCUGUGGUCCAGGTT; RXF1 si3: GAUGGAAGGCAUGACCAACTT; NAB1 si1: GGA-GUUCCUUUGCAACCAATT; NAB1 si2: GUAGCAUACC-CAUCUAUAATT. NAB1 si3: GAGUGAAGAACUUGCAG-CUTT. siRNA transfections were done in HeLa cells using Oligofectamine (Invitrogen 12252011). The protocol was similar to our transient plasmid transfection except that 0.2 nM siRNA and 4 μL Oligofectamine were added into each well.
Using an immunoblot assay following DNA affnity-capture (Figure 1A), the positive control TEAD1 was retained at greater than 2% of the input amount using the wild-type CanScript sequence, but not retained by the known inactive C5M6c mutant (Figure 1B)8,17 (Figure 2). KLF6, a reported candidate,8 had a similar although weaker pattern. TEAD2, which shared DNA-binding characteristics with TEAD1, yet in our previous study was not recruited to CanScript in vivo,17 was not detected (Figure 2). The negative control SP1 bound to CanScript wild-type and mutant indistinguishably (Figure 2), confirming our findings that SP1 was not a regulator of CanScript activity.8 SP1 transcriptional function is not expected to tolerate the central adenine in the Canscript sequence. The TEAD1-binding partner YAP1 was specifically effciently captured by wild-type CanScript, reflecting indirect capture of a member of the complex (as contrasted with exclusively reflecting direct DNA-binding).
We attempted to visually select “probe-specific” protein bands appearing after DNA affnity-capture (Supporting Information Figure S1). To be complete, we excised and analyzed several of the visible differential protein bands even in the mutant bound lane, but they seemed nonrelevant (such as universally expressed ribosome-binding proteins). We then adopted the SILAC approach coupled to the DNA-affnity capture. A total of 850 proteins, based on 7,700 unique peptides, were identified from three replicate SD-MS experiments. Only 15 proteins preferentially bound to wild-type probe (wild-type/mutant ratio more than 10) whereas 83 proteins preferred C5M6c mutant probe (ratio less than 0.1). Nine hits were identified using the predefined criterion (10 as threshold ratio) (Table 1). TEAD1 (Figure 3A), an interactor reported previously by us (and thus the positive control), was one of the two with the highest ratios. TEA family members TEAD3 and TEAD4 were the first and the fourth in the order, respectively. TEAD2 was not identified by SD-MS. The discovery of a structural redundancy of TEAD family members (TEAD1–3) might indicate a functional redundancy, also. These results reflected those found using immunoblots (above) in place of mass spectrometry. We subsequently chose seven out of the nine hits (excluding TEAD3 and TEAD4 as being biochemically similar) to examine their cognate binding motifs (MCAT or SP1-like) by immunoblot analysis to measure DNA capture (Figure 4A). ZNF24, a TCAT repeat-binding protein,22 was identified as specifically recruited by the MCAT motif (Figure 3B). ZNF24 was reported as a transcriptional repressor for vascular endothelial growth factor.23,24 The knockout of ZNF24 in mice resulted in early embryo lethality,25 attributed to inhibition of proliferation of neural progenitor cells.26 In this study we demonstrated that ZNF24 might function as a transcriptional activator for MSLN expression. In contrast, RFX1 (regulatory factor X, 1) and NAB1 (NGF1A-binding protein 1) were specifically captured by the SP1-like motif. RFX1 is a transactivator for expression of MHC class II,27 proliferating cell nuclear antigen (PCNA),28 and certain viral proteins such as HBsAg.29 It binds as a monomer to the consensus sequence RGYAAC and as a dimer to palindromic DNA29,30 with a preference for methylated sequence.31 NFATc2 (cytoplasmic nuclear factor of activated T cells) bound wild-type CanScript and tolerated each of the tested single mutants (C5 and M6c), but not the double mutant, C5M6c. The NFATc consensus-binding sequence WGGAAANH,32 however, was not present in CanScript. NFATc family members NFATc1–4 are long known to play pivotal and redundant roles in activation of immune system.32 The DNA binding core sequences RAHYETEG is conserved in all NFATc family members.32 Binding to CanScript was not confirmed by immunoblots for CEACAM5 and GRSF; the highly overexpressed membrane protein CEACAM533 was considered a false positive due to all membrane components in the nuclear extract.
We next examined whether the four immunoblot-confirmed hits (ZNF24, RFX1, NAB1 and NFATc2) might correlate to MSLN expression patterns among various cell lines (Figure 4B). This was not the general pattern, for ZNF24, RFX1 and NAB1 were expressed in both MSLN-positive (AsPc1 and HeLa) and -negative cells (RKO and HEK 293). NFATc2 had high expression in AsPc1 cells, yet a very low level in HeLa and HEK293 cells and was not detected in RKO cells. Not surprisingly, these proteins identified as transcriptionally active interactors were predominantly distributed in the nuclear fraction (Figure 4B). This is interesting for NFATc2 because calcium influx is required for NFAT nuclear translocation in T cells.34 It is unclear whether NFATc2 nuclear localization in cancer cells is calcium influx-dependent.
We probed Cy5-labeled double-stranded wild-type or C5M6c mutant CanScript ×3 sequences individually on our TFM chips. Eighteen hits that bound specifically to the wild-type sequence were identified at a threshold ratio of 10 (wild-type/mutant signal intensity, logarithmic transformed) (Table 2). Due to fluorescent signals being a more sensitive reading (i.e., producing data for a greater proportion of the analytes), we expected and attained diversity in the ratios from this screen. Three of the TFM hits shared gene family members with SD-MS hits: NAB1 was a hit in both; NFATc1 and TEAD2 from the TFM hits shared identical DNA-binding protein sequences with NFATc2 and TEAD1, respectively, identified by SD-MS. It was not surprising to identify TEAD2 by TFM because the 72-amino acid DNA-binding domain is identical between TEAD1 and TEAD2,35 and a report had demonstrated that in vitro-synthesized TEAD2 could bind to the MCAT motif36 (note that TFM is an in vitro screen). Similarly, recombinant NFATc1 and NFATc2 were known to have share cognate DNA-binding sequences.37 Identification of NAB1 by TFM indicated that NAB1 could directly bind to CanScript. NAB1 might be an unconventional DNA-binding protein as described previously,4 because it has not been reported to have direct DNA-binding activity. NAB1 was reported as a zinc finger protein EGR1 (Early Growth Response factor 1) binding partner,38 which is implicated in promoting growth and progression of prostate cancer.39 ERG1 itself was not a hit on either of our screens. Interestingly, HMG box transcription factor 1 (HBP1)40 shared the binding consensus sequence TCAT repeat with ZNF24, identified by SD-MS (Table 1).
We next evaluated further an arbitrarily chosen five of the ten top hits (Table 2) by immunoblot assay after DNA-affnity capture (Supporting Information Figure S2). NAB1 and NFATc1 were specifically captured by wild-type CanScript. Immunoblotting did not confirm the binding of HBP1 and PRDM14 with CanScript, despite being expressed in the tested cells (AsPc1). In contrast, IKZF1 had CanScript C5M6c mutant-specific binding.
To validate some of these newly identified candidates, we asked whether they played roles in MSLN expression or CanScript reporter activity. We chose to examine ZNF24, NAB1 and RFX1 because: (1) ZNF24 was one of the two CanScript-binding candidates whose captured fraction was greater than 2% of the input (Figure 4A); (2) NAB1 and RFX1 were the only two proteins specifically recruited by the wild-type SP1-like motif, thought to be the likely cancer-specific component;8 (3) NAB1 was an unconventional DNA-binding protein and had been identified with both orthogonal proteomics approaches.
For functional studies, we used siRNA to individually knock down ZNF24, NAB1 and RFX1 in HeLa cells. Testing three sequences per gene, the expression levels of all three genes were effectively reduced by their siRNAs (Figure 5A–C). NAB1 and RFX1 siRNA reduced most MSLN expression with efficacies similar to our positive control, the TEAD1 siRNA (Figure 5A and B), whereas MSLN protein was moderately reduced by ZNF24 siRNAs in HeLa cells (Figure 5C). TEAD1 expression was slightly reduced by two of three ZNF24 siRNAs (Figure 5C, si1 and si3), but not by NAB1 or RFX1 siRNAs (Supporting Information Figure S4). We did not observe prominent cell loss or cell death induced by any siRNA (the reported proliferation inhibition of ZNF24 RNAi could be cell line-specific).25 Similarly, two of three ZNF24 and NAB1 siRNAs reduced CanScript ×3 reporter activity (Supporting Information Figure S3) with one of the ZNF24 siRNAs being as effective as TEAD1 siRNA. In contrast, two RFX1 siRNAs increased CanScript reporter activity; one reached almost 3-fold induction. The inconsistent effects of RFX1 manipulation on MSLN expression and CanScript reporter activity, are as yet unexplained but might represent a complex regulatory effect or a plasmid-dependent artifact.8,41–44
Overexpression of ectopic wild-type ZNF24, NAB1 or RFX1 neither switched on MSLN expression nor significantly augmented CanScript reporter activity in MSLN-negative cell lines (data not shown). The overall results suggested that the two transcription factors and NAB1 might be necessary or contributory, but not sufficient, for MSLN overexpression.
The association of ZNF24 with the MCAT motif raised a question, whether ZNF24 participated in a complex with TEAD1-YAP1 or bound to CanScript independently. ZNF24 was reported to bind to DNA only through its heterodimers,45,46 its homodimerization being weak when compared to its heterodimerization, as when it was paired with the leucine-rich SCAN domain-containing partners ZNF202 or SDP1,45 which were not identified in our screens. We did not observe the ZNF24-TEAD1 interaction when we attempted to coimmunoprecipitate ZNF24 using anti-TEAD1 antibody in transiently transcfected HeLa cells, yet the YAP1-TEAD1 interaction was confirmed (data not shown). We also failed to observe direct CanScript interaction with purified ZNF24 by EMSA (data not shown). These negative results were thus inconclusive.
We combined two orthogonal proteomic screens, SD-MS and TFM, systematically exploring the interactome at a structurally novel cancer-specific promoter driving the expression of a therapeutically and diagnostically relevant cancer marker. The combination provided an effcient, reproducible; and unbiased approach for identification of interactors having DNA-binding specificity. The results agreed well with each other: eight of 27 (9 + 18) hits overlapped at the level of protein families. The two approaches were practically complementary to each other as well, SD-MS presumably scanning all relevant native nuclear proteins specifically in relevant cells, and TFM providing evidence of direct DNA-binding.
Multiple advantages attend SD-MS as an approach for identifying DNA sequence-specific interactomes. Mass spectrometry is quite sensitive, requiring only tens of nanograms of captured protein for identification. Also, SILAC provides a facile and numerical prioritization among captured proteins. SD-MS is an unbiased screen, owing to all native proteins of the nuclear extract being interrogated by the DNA probe, and includes analysis of protein complexes that might mimic the in vivo functional binding as well or better than could purified proteins.
The major challenge of SD-MS is that the DNA-binding activities of native proteins captured and resolved in vitro are subject to arbitrary experimental conditions. For example, using a conventional nuclear extract protocol (high salt, ~ 0.4 M NaCl) solution, we found the TEAD1-Canscript interaction greatly impaired (data not shown). Our modified nuclear extract also did not fully exclude contamination from membrane proteins (CEACAM5, Table 1).
TFM has a unique advantage of detecting only direct interactions with DNA. The use of individual purified proteins on a chip also provides a clean background free of the nonspecific interferences expected from cell lysates. Thus, TFM could discover weak or rare DNA binding proteins that are at a competitive disadvantage in the SD-MS screen. TFM’s disadvantages, however, include: (1) purified recombinant proteins (here, provided in yeast) might not mimic the in vivo functional conformations; (2) DNA-binding proteins dependent on protein cooperation (such as ZNF24) might evade discovery.
We identified nine hits from SD-MS and 18 from TFM comparing the Canscript sequence with a transcriptionally inactive C5Mc6 mutant CanScript matched control and with stringent cutoffs. The screens confirmed the previous candidate TEAD1, but also reflected the other TEAD family members, as an expected redundancy. We selected eleven hits (TEAD1, ZNF24, RFX1, NFATc1, NFATc2, NAB1, CEACAM5, GRSF1, IKZF1, HBP1 and PRDM14) and confirmed the binding activities of six (TEAD1, ZNF24, RFX1, NFATc1, NFATc2 and NAB1) by immunoblot following DNA affnity-capture.
We chose MCAT motif-bound ZNF24 and SP1-like motif-bound NAB1 and RFX1 for further functional analysis. Knockdown of any of these three transcription factors could effectively reduce MSLN expression in MSLN-positive cells (Figure 5), suggesting they were necessary for MSLN expression. In a test of sufficiency, their overexpression did not switch on MSLN expression in MSLN-negative cells; this was not surprising, owing to their being nonspecifically expressed in MLSN-positive and -negative cells (Figure 4B). In addition, CanScript itself is likely necessary but not sufficient for MSLN expression; the reporter activity of the pGL3-CanScript ×1 plasmid, when tested exclusive of MSLN flanking sequences, was almost at the basal level of an empty reporter construct (data not shown),8,17 in contrast to the MSLN cancer-specific promoter region (CanScript is located at the very 5′ end) containing ~160 bp, which had greater than a 100-fold increase of transcriptional activity over the basic construct. Additional cis factors in the MSLN promoter region presumably cooperate with CanScript to fully activate MSLN cancer-specific overexpression.
Four additional annotated transcription factors would be considered had we chose 2.0 rather than 10.0 as our criterion in SD-MS (Table 1): ZFP64, MAFG, MAFF and MAFK. The MAF proteins belong to the AP-1/JUN superfamily of basic leucine zipper proteins. MAF homo- and heterodimers are reported to recognize a palindromic consensus DNA sequence TMARE (TGCTGAG/CTCAGCA),47 which is not present in CanScript, yet their possible roles related to MSLN expression may worth investigation.
Interestingly, 9 of the 15 (60%) detected wild-type Canscript-specific binding proteins were known transcription factors (Supporting Information Table S1), whereas only 12 of 83 (14%) mutant Canscript-specific binding proteins were known transcription factors (Supporting Information Table S2). We can not speculate what roles these mutant Canscript-binding transcription factors might play in vivo; the human genome lacks exact copies of that mutant sequence.
The study of DNA–protein interactions is a key bridge connecting genetics and proteomics. Combinatorial and unbiased approaches, such as those explored here, could effciently explore transcriptional controls driving cancer- or disease-specific, transcriptionally active, DNA elements known to be distributed among diverse genome loci in cancer and in other diseases. Better knowledge of such interactions could suggest novel targets for pharmacologic disruption, or draw fresh attention to overlooked influences in the recurring biochemical signatures of disease.
This work has been supported by NIH grant CA62924, NIH Technology Centers for Networks and Pathways grant U54 RR020839 and the Everett and Marjorie Kovler Professorship in Pancreas Cancer Research.