|Home | About | Journals | Submit | Contact Us | Français|
Genome-wide association studies identified numerous disease risk loci. Delineating molecular mechanisms influenced by cis-regulatory variants is essential to understand gene regulation and ultimately disease pathophysiology. Combining bioinformatics and public domain chromatin information with quantitative proteomics supports prediction of cis-regulatory variants and enabled identification of allele-dependent binding of both, transcription factors and coregulators at the type 2 diabetes associated PPARG locus. We found rs7647481A nonrisk allele binding of Yin Yang 1 (YY1), confirmed by allele-specific chromatin immunoprecipitation in primary adipocytes. Quantitative proteomics also found the coregulator RING1 and YY1 binding protein (RYBP) whose mRNA levels correlate with improved insulin sensitivity in primary adipose cells carrying the rs7647481A nonrisk allele. Our findings support a concept with diverse cis-regulatory variants contributing to disease pathophysiology at one locus. Proteome-wide identification of both, transcription factors and coregulators, can profoundly improve understanding of mechanisms underlying genetic associations.
Genome-wide association studies (GWAS) identified thousands of loci associated with diverse diseases (1). The majority of variants are located in noncoding DNA regions and have been proposed to affect transcriptional regulation. Advances of the ENCODE project and novel bioinformatics approaches improved the identification of cis-regulatory variants at complex loci (2–7). Moreover, deciphering allele-specific binding of transcription factors is essential to unravel the mechanisms ultimately affecting gene expression (6,8–10). However, the identification of allele-specific binding of transcription factors and coregulators remains elusive in most cases, despite the well-established importance of a coordinated interaction between transcription factors, coregulators, and the basal transcriptional machinery for regulation of gene expression (11). Thus, in most cases the precise molecular mechanisms underlying associations between variants and disease risk remain unknown.
Quantitative protein–DNA proteomics, coupling affinity chromatography with liquid–chromatography tandem mass spectrometry (LC–MS/MS) was reported for identification of enhancer-binding proteins and for allele-specific DNA binding transcription factors (6,12,13). In this study, we introduce label-free quantitative proteomics on salt eluted sub-fractions containing native complexes, circumventing limitations of metabolic labeling strategies such as time-consuming workflows, high-costs or inefficiencies (14), or artifacts (15). The preserved allele-specific DNA-protein binding activity in the eluted sub-fractions significantly reduced complexity and allowed high coverage of quantified proteins and thereby identification of both, transcription factors and coregulators (Figure (Figure1A1A).
The Peroxisome proliferator-activated receptor gamma (PPARG) locus has been robustly associated with type 2 diabetes (16–18). PPARG is crucial for adipogenesis, lipid metabolism, and systemic insulin sensitivity (19, 20) with the PPARG2 isoform being mainly expressed in adipocytes (21–24). In a previous study, we reported that adipose tissue PPARG locus expression quantitative trait locus (eQTL) data revealed allele-specific regulation of PPARG expression (6). In the same study, using bioinformatics phylogenetic module complexity analysis (PMCA) we found the regulatory variant rs4684847. This variant overlaps with a homeobox transcription factor binding site (TFBS), a common feature inferred for type 2 diabetes risk variants. We further reported rs4684847C risk allele binding of the Paired Related Homeobox 1 (PRRX1) transcription factor. PRRX1 represses PPARG expression, and negative correlation of PRRX1 mRNA levels with insulin-sensitivity supports contribution to insulin resistance phenotype at the PPARG locus (6).
In the current study, combining PMCA (6) with public domain DNase sequencing (DNase-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) data we inferred rs7647481, a second PPARG locus cis-regulatory variant. We show that label-free proteomics can find both, transcription factors and cofactors at the rs7647481A nonrisk and rs4684847C risk allele, supporting the prediction of both regulatory variants. For rs7647481, we present novel experimental and human genetics data supporting the pathophysiological relevance of the transcription factor YY1 and cofactor RYBP.
The SGBS human preadipocytes cell strain, primary human preadipocytes, Huh7 human hepatoma, C2C12 mouse myoblast, INS-l rat insulinoma, 293T human embryonic kidney and the HIB1B mouse brown adipocyte cell line were cultured, induced and differentiated as previously described (6,25). Ethical committee approval for primary human preadipocyte cells was obtained from the Faculty of Medicine of the Technical University of Munich, Germany.
Nuclear protein extracts from human SGBS preadipocytes, differentiated SGBS adipocytes, and human primary preadipocytes were prepared as described (26). Briefly, cells were washed with phosphate-buffered saline (PBS), lysed in pre-chilled buffer A (10 mM HEPES pH 8.0, 10 mM KCl, 0.1 mM EDTA, 0.1 mM EGTA, 1 mM DTT, protease- and phosphatase inhibitor mix (Roche, Mannheim, Germany), 0.6% Nonidet-P40). Nuclei were collected by centrifugation, lysed in buffer B (20 mM HEPES pH 7.9, 400 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1 mM DTT, 1 mM PMSF, 20% v/v glycerol, protease and phosphatase inhibitors (Roche, Mannheim, Germany), incubated on ice for 15 min, and the nuclear extract supernatant collected for further analysis. Protein concentration was quantitated using Bradford assay. HIB 1B nuclear extracts were prepared as described (25); in brief, cells were washed with PBS, lysed in pre-chilled homogenization buffer (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 20 mM NaF, 0.5 mM DTT). Nuclei were collected by centrifugation, resuspended in low salt buffer (20 mM HEPES pH 7.9, 1.5 mM MgCl2, 20 mM KCl, 0.2 mM EDTA, 20 mM NaF, 25% v/v glycerol, 0.5 mM DTT), lysed by adding high salt buffer containing 1.2 M KCl followed by vigorous shaking, centrifuged, and the nuclear extract supernatant was recovered for further analysis.
40 bp oligonucleotides rs4684847 5΄-TTTAAATCATCTCTAATTCT[C/T]ACAACTCCGAAAAGATAAG-3΄; rs7647481 5΄-CAACTCCCCCACTTTATTCC[A/G]TGATGTTCAGACCCAGCCA-3΄; rs17036342 5΄-GCTCTCCCAAAGAATTGTAA[A/G]TTCCCAGAGTGTAGGACCA-3΄; rs2881479 5΄GCAAGACTCTGTCTCAAAAA[A/T]AAATAAATAAATAAATAAA-3΄ with Cy5- (for EMSA) or biotin-label (for affinity chromatography) and with the respective SNP alleles at mid-position were synthesized (Eurofins, Ebersberg, Germany) and annealed with unmodified complementary oligonucleotides to obtain double stranded oligonucleotides, by heating to 90°C for 5 min in TE buffer followed by slow cooling down at room temperature overnight. For competition experiments unlabeled oligonucleotides (CdxA 5΄-GCATTTTATTACCACGCCTGCACTGTTGGTA-3΄; MyoD 5΄-CCCCCAACAGCTGTTGCCTGA-3΄, Scramble 5΄-AGCAAACCCTGACTAGTTATAGAGTCAAGACCGCCCACTT-3΄; YY1 5΄-CGCTCCCCGGCCATCTTGGCGGCTGGT-3΄) were used. Double-stranded oligonucleotides were separated from remaining single strand oligonucleotides on a 12% polyacrylamide gel.
Non-radioactive EMSA was performed with Cy5-labeled oligonucleotide as previously described (6) with some modifications. 5 μg of nuclear protein and 0.35 μg of poly (dI-dC) (Roche, Mannheim, Germany) were used for each reaction. The reaction conditions for affinity chromatography are in general similar to those used for EMSA. To optimize DNA-protein binding in EMSAs, several conditions were tested, such as amount of proteins, concentration of the nonspecific competitor poly (dI-dC) and salt concentration in the gel binding buffer (3% (v/v) glycerol, 0.7 mM MgCl2, 0.4 mM EDTA, 0.4 mM DTT, 37 mM NaCl, 0.7 mM Tris–HCl pH 7.5). In competition EMSA, 33-fold molar excess of unlabeled allele-specific probes and probes with perfect binding sites for YY1, MyoD, CdxA and a scramble oligonucleotides was added to the reaction, prior to addition of Cy5-labeled probes. Binding reactions were incubated for 20 min at 4°C. In supershift experiments, nuclear extracts were pre-incubated with 0.8 μg of anti-YY1 (sc-281x, Santa Cruz Biotechnology, Heidelberg, Germany) or isotype control IgG (sc-2027, Santa Cruz Biotechnology, Heidelberg, Germany) antibodies, respectively, for 30 min at 4°C. All EMSA experiments were replicated at least three times.
To isolate and enrich allele-specific binding proteins, we performed magnetic beads-based affinity chromatography. According to the manufacturer΄s instructions, streptavidin coupled magnetic beads (Dynabeads M-280, Invitrogen, Darmstadt, Germany) were washed and collected using Bind&Wash buffer (Dynabeads M-280, Invitrogen, Darmstadt, Germany) and Magnetic particle separator (Magna-Sep™, Invitrogen, Darmstadt, Germany), discarding the supernatant. Magnetic beads were coupled to biotinylated oligonucleotides at 4°C overnight. We tested different conditions for optimal binding of beads to oligonucleotides and found that both the concentration of oligonucleotides and beads, as well as incubation time were critical for the coupling efficiency of oligonucleotides to beads. To prevent undesired reaction with streptavidin, the magnetic beads were incubated with the Bind&Wash buffer containing 8.2 μM biotin for 1 h at room temperature. The magnetic beads were then washed two times with wash buffer followed by equilibration with 1× binding buffer (10 mM Tris–HCl, 1 mM MgCl2, 0.5 mM EDTA, 0.5 mM DTT, 4% v/v glycerol) and incubated with nuclear extracts for 20 min in binding buffer containing 50 mM NaCl and 0.01% CHAPS using a rotator. We used 7 mg of nuclear extracts, which showed the most significant enrichment of proteins of interest. To reduce non-specific DNA binding, poly (dI-dC) was subsequently added to the mixture and incubated further for 10 min. Competition with poly (dI-dC) supports the specific binding of proteins to the oligonucleotides. Subsequently, the beads were washed three times with binding buffer containing 10 mM NaCl and the bound proteins were eluted by 1× binding buffer with increasing concentration of NaCl (200 and 300 mM) in a volume of 100 μl (eluates E200 and E300). All steps were performed at 4°C. Finally, 5–10 μl of protein from supernatants, washes and eluates were used for EMSA. If eluates showed enrichment of proteins of interest, the remaining volumes were subjected to LC–MS/MS analysis. All affinity chromatography experiments were replicated at least three times.
Salt eluted fractions were processed as described before (10,27) in an adaptation of the FASP approach (28) using Microcon devices YM-30 (Merck Millipore, Darmstadt, Germany). The LC–MS/MS analyses were performed as described previously on a LTQ-Orbitrap XL (Thermo Scientific, Dreieich, Germany) (29) with the following adjustments: A nano trap column was used (300 μm inner diameter × 5 mm, packed with Acclaim PepMap100 C18, 5 μm, 100 Å; LC Packings) before separation by reversed phase chromatography (PepMap, 25 cm, 75 μm ID, 2 μm/100 Å pore size, LC Packings, Thermo Scientific, Dreieich, Germany) operated on a RSLC (Ultimate 3000, Thermo Scientific, Dreieich, Germany) using a nonlinear 170 min LC-gradient from 5 to 31% of buffer B (98% acetonitrile) at 300 nl/min flow rate followed by a short gradient from 31 to 95% buffer B in 5 min and an equilibration for 15 min to starting conditions. From the MS pre-scan (acquired in profile mode), the 10 most abundant peptide ions were selected for fragmentation in the linear ion trap if they exceeded an intensity of at least 200 counts and if they were at least doubly charged. Dynamic exclusion was set to 30 s. During fragment analysis, a high-resolution (60 000 full-width half maximum) MS spectrum was acquired in the Orbitrap with a mass range from 300 to 1500 Da.
The RAW files (Thermo Scientific, Dreieich, Germany) were analyzed using the Progenesis LC–MS software (version 4.0, Nonlinear Dynamics, Waters, Eschborn, Germany), as described previously (29,30), with the following changes: Spectra were searched using the search engine Mascot (Matrix Science, London, UK) against the Ensembl mouse database (Release 69; 50512 sequences). False discovery rates were stringently kept below 1% as calculated by a Mascot-integrated decoy database search using the percolator algorithm (cut-off score 13, significance threshold of P < 0.05). Peptide assignments were re-imported into Progenesis LC–MS. Normalized abundances of all unique peptides were summed up and allocated to the respective protein.
For each experiment, 1 × 106 primary human preadipocytes as well as preadipocytes in vitro differentiated to adipocytes for 14 days were cultured in six wells of a six-well plate as previously described (6). ChIP experiments were performed using the ChIP‐IT® Express Enzymatic Chromatin Immunoprecipitation Kit from Active Motif (La Hulpe, Belgium) according to the manufacturer's protocol with slight modifications as described elsewhere (31). Briefly, after enzymatic digestion for 15 min, 10 mM EDTA was added and chromatin was sheared using the EpiShear™ Probe Sonicator (Active Motif, La Hulpe, Belgium; 20 pulses consisting of 20 s sonication followed by 30 s rest at 25% amplitude) in the same buffer. Chromatin was then incubated for 30 min with protein G magnetic beads (Active Motif, La Hulpe, Belgium) and 2 μg of rabbit polyclonal αYY1 antibody (sc-281x, Santa Cruz Biotechnology, Heidelberg, Germany). Incubation with 2 μg of rabbit IgG (sc-2027x) served as internal negative controls. The amount of precipitated DNA was evaluated by allele-specific quantitative PCR (AS-qPCR) using the Eppendorf Mastercycler (Eppendorf, Hamburg, Germany). To quantify allele-specific protein binding we performed SYBR-green qPCR (Maxima SYBR Green/ROX qPCR Master Mix, Thermo Scientific, Dreieich, Germany) using rs7647481 forward primer (5΄-AAGATGTTTTGGGGCTTAATGG-3΄) with the allele-specific reverse primers rs7647481A nonrisk (5΄-GCTGGGTCTGAACATCATAG-3΄) and G-risk (5΄-CTGGGTCTGAACATCACAG-3΄), respectively (bold: SNP position, underlined: additional mutation). Allele-specific reverse primers were designed (TIB Molbiol, Berlin, Germany) with the respective rs7647481-allele and an additional mutation to increase allele-specificity as previously described (32). Allele-specificity was tested using a 611 bp DNA fragment containing rs7647481A nonrisk and G risk allele, respectively; primer efficiencies were calculated using REST 2009 software (www.gene-quantification.de/rest-2009.html). The rs7647481 allele-specific protein-chromatin interaction at the A nonrisk/G risk allele (Figure (Figure4E)4E) was determined by calculating ΔCt(A) and ΔCt(G) for A- and G-allele by subtracting the input-chromatin Ct-values from respective ChIP-chromatin Ct-values for both anti-YY1 and IgG experiments, the allele-specific ratio for each antibody based on ΔΔCt method (here (primer efficiency A-allele)(–ΔCt(A))/(primer efficiency G-allele)(–ΔCt(G))) and finally the ratio to the respective IgG control for each experiment. Pairs of anti-YY1 and IgG ChIP experiments were performed from fixed chromatin of preadipocytes and in vitro differentiated adipocytes from three donors, which were previously genotyped with a concordance rate of >99.5% using the MassARRAY system with iPLEX™ chemistry (Sequenom, Hamburg, Germany) as described (33).
SGBS cells were cultured in 6-well plates and transfected with 25 nM siRNA targeting YY1, RYBP or as control non-targeting (NT) siRNA (ON-TARGETplus human siRNA SMARTpool, Dharmacon, Freiburg, Germany) using HiPerFect (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Seventy two hours after transfection, confluent cells were harvested using the RNeasy-Minikit (Qiagen, Hilden, Germany) to extract total RNA and stored at –80°C. The high capacity cDNA Reverse Transcription kit (Applied Biosystems, Darmstadt, Germany) was used for transcription of 1 μg total RNA into cDNA according to the manufacturer's instructions and stored at –20°C. RNA/cDNA concentration and purity (A260/280) was assessed using NanoQuant Plate™ (TECAN, Männedorf, Switzerland). qPCR analysis of human PPARG1, PPARG2 isoform transcripts (6), GAPDH housekeeping gene, YY1 and RYBP to control for knockdown efficiency, was performed using the qPCR SYBR-Green ROX Mix (ABgene, Hamburg, Germany) according to the manufacturer's instructions, using a Mastercycler Realplex 4 and Realplex Software (Eppendorf, Hamburg, Germany) with an initial activation of 15 min at 95°C followed by 40 cycles of 15 s at 95°C, 30 s at 60°C and 30 s at 72°C. Amplification of specific transcripts was confirmed by initial sequencing, and melting curve profiles (cooling samples to 68°C and heating slowly to 95°C with measurement of fluorescence) and by agarose gel electrophoresis to assess the size of PCR products. PPARG1 (NM_138712.3): forw5΄-CGTGGCCGCAGATTTGA-3΄ + rev5΄-AGTGGGAGTGGTCTTCCATTAC-3΄ = 177bp; PPARG2 (NM_015869.4): forw5΄-GAAAGCGATTCCTTCACTGAT-3΄ + rev5΄-TCAAAGGAGTGGGAGTGGTC-3΄ = 146bp; GAPDH (NM_002046.5): forw5΄-GATCATCAGCAATGCCTCCTGC-3΄ + rev5΄-ACAGTCTTCTGGGTGGCAGTGA-3΄ = 128bp; RYBP (NM_012234.6): forw5΄-CTGCACCTTCAGAAACAGTGC-3΄ + rev5΄-GTGCCACCAGCTGAGAATTG-3΄ = 103 bp; YY1 (NM_003403.4): forw5΄-CGAGTTCTCGGTCACCATGT-3΄ + rev5΄-CTGCCAGTTGTTTGGGATCT-3΄ = 181 bp (specificity of primer tested by BLAST). Mean target mRNA levels, standardized to gene expression levels of the housekeeping gene, from five independent biological replicates of knockdown experiments, each PCR measured in duplicates, were calculated using the ΔΔCt method relative to the siNT control experiment. P-values were calculated using one-sample t-test.
To assess transcriptional activity mediated by SNP-adjacent regions, luciferase reporter gene assay was performed as described previously (6) with the following modifications. C2C12 cells cultured in 48-well plates to approximately 80% confluence were differentiated for 7 days as described above. C2C12 cells (undifferentiated myoblasts, differentiated myocytes), 293T, INS-1 β and Huh7 cells at 80–90% confluence were transfected in 48-well plates by Lipofectamine 2000 transfection reagent (Invitrogen, Darmstadt, Germany). 293T, Huh7, and C2C12 (undifferentiated and differentiated) were transfected with 0.3 μg of the respective firefly luciferase reporter vector, 0.04 μg of the ubiquitin promoter vector and 1 μl of Lipofectamine 2000 reagent. INS-1 β-cells were transfected with 1.2 μg of the respective firefly luciferase reporter vector, 0.16 μg of the ubiquitin promoter vector and 2 μl of Lipofectamine 2000 reagent. 3–4 h after transfection the medium was replaced by fresh medium followed by incubation at 37°C. Twenty four hours after transfection, the cells were washed once with PBS and lysed in 1× passive lysis buffer (Promega, Mannheim, Germany) on rocking for 20 min at room temperature. Firefly and renilla luciferase activity were measured using luminometer (Berthold, Pforzheim, Deutschland), respectively. The ratios of firefly luciferase expression to renilla luciferase expression were calculated and normalized to the thymidine kinase promoter control vector. All experiments were performed repeatedly as indicated. P-values were calculated using paired t-test.
The Genomatix GePS-tool (Genomatix, Munich, Germany) was used to assess the enrichment of molecular function GO-terms (Supplementary Table S2) and signaling pathways (Supplementary Table S3) in the protein/gene data sets identified by LC–MS/MS, using all identified proteins. Next, to calculate the enrichment of transcriptional coregulators, we used the Genomatix GePS-tool to build a co-citation based network. All proteins annotated as cofactors (Supplementary Table S4) and the respective candidate transcription factors PRRX1, YY1 and NFATC4 were used as input gene list to create networks using the settings co-citation level: Function word level; co-citation filter: (i) Network generation: simple network; and additional interactions per gene: (ii) Genes with interactions are shown in the Figure Figure5A5A and Supplementary Figure S5A-C.
The insulin-resistance measure HOMA-IR (homeostasis model assessment of insulin resistance) and mRNA expression levels were measured in a cohort comprising 30 obese (BMI > 30 kg/m2) otherwise healthy and 26 non-obese (BMI < 30 kg/m2) healthy women (34), all pre-menopausal and free of continuous medication and investigated in the morning after an overnight fast. Venous blood sample was obtained for measurements of glucose, insulin and for preparation of DNA. HOMA-IR was calculated by the formula fP-Glucose (mmol/L) × (fS-Insulin (microU/ml)/ 22.5) (35). Following blood sampling abdominal subcutaneous adipose tissue biopsy was obtained by needle aspiration and adipose microarray analysis was performed exactly as described (34) using the Affymetrix GeneChip Array protocol with 1 μg of total adipose RNA from each subject (gene expression deposited in the National Center for Biotechnology Information Gene Expression Omnibus (GEO; https://ncbi.nlm.nih.gov/geo/) and accessible using GEO series accession number GSE25402). Linear regression analyses were performed with R, version 3.0.2 (R: A Language and Environment for Statistical Computing: from the R Development Core Team of the R Foundation for Statistical Computing, Vienna, Austria, 2014, http://www.R-project.org/) to assess correlation of YY1, RYBP and PPARG adipose tissue mRNA levels with HOMA-IR (adjusted for age, age/BMI and without adjustment) for 20 risk allele and 18 nonrisk allele carriers with available genotypes. Adipose tissue samples were genotyped for rs1801282, rs4684847 and rs7647481 with a concordance rate of >99.5% using the MassARRAY system with iPLEX™ chemistry (Sequenom, Hamburg, Germany), as previously described (6,33). The study was approved by the local ethics committee of the Karolinska University Hospital, Stockholm, Sweden; written informed consent was obtained from all patients who donated biological samples.
All results were expressed as mean ± SD. Student΄s t-tests, Wilcoxon signed rank test and one-sample t-tests were used to compare two groups (two alleles/complex and non-complex SNP regions). All statistical analyses were performed using the Graph Pad Prism software v5.02 (GraphPad Software, CA, USA) or the Statistical Software SPSS v20.0 (IBM Corporation, NY, USA). The applied statistical methods for each experiment are given in the respective figure legend. Statistical differences of results were shown in *P < 0.05, **P < 0.01, ***P < 0.001.
At the PPARG locus, with a total of 23 non-coding variants in high linkage disequibrilium (r2 ≥ 0.7 with PPARG tagSNP rs1801282 (6)), six variants were predicted to be cis-regulatory by conserved binding site modularity analysis (Figure (Figure1B).1B). We showed previously that rs4684847 specifically overlaps with a homeobox TFBS and modulates regulation of PPARG expression (6). Further assessing proximity of the remaining five variants to marks of regulatory regions in adipocytes, we find rs7647481 to overlap with all histone modification regions (36) at both tested late stages (day 3 and day 9) of human adipogenesis (Figure (Figure1C),1C), and with DNase-seq regions (2) in differentiated primary human adipocytes (Supplementary Figure S1); supporting a contribution of rs7647481 to adipocyte specific PPARG2 expression (Figure (Figure1D).1D). Both, rs7647481 and the previously reported rs4684847 (6) are in perfect linkage disequilibrium (r2 = 1.0 in 1000 Genomes (37)), confirmed by sequencing in this study for all analysed tissues and cells (data not shown). To experimentally evaluate these cis-regulatory predictions, we further analysed allele-specific protein–DNA interactions at both predicted cis-regulatory variants and additionally at two variants predicted as non cis-regulatory (Figure (Figure1D1D).
Electrophoretic mobility shift assays (EMSA) using nuclear extracts of mouse brown adipocytes, revealed allele-specific protein–DNA interaction for both predicted cis-regulatory SNPs, not observed with the predicted non cis-regulatory SNPs (Figure (Figure2A2A upper panel). Quantification of protein–DNA complexes confirmed allele-specific binding at both predicted cis-regulatory SNPs (P = 0.03) in contrast to non cis-regulatory SNPs (P = 0.82 for rs17036342 and P = 0.80 for rs2881479, respectively) (Figure (Figure2A2A lower panel). PPARG regulates adipogenesis and mature adipocyte metabolism. In EMSA experiments using nuclear protein extract from primary human preadipocytes, human SGBS cell strain preadipocytes and in vitro differentiated SGBS adipocytes, we find differential protein–DNA interaction patterns (Figure (Figure2B),2B), indicative for different transcription factors and regulatory protein complexes contributing to PPARG expression in different stages of adipogenesis. Allele-specific differential binding was observed consistently at predicted cis-regulatory variants, with increased binding at rs7647481A nonrisk and rs4684847C risk allele, respectively.
Next, we aimed to identify allele-specific binding proteins, i.e. regulators and coregulators at those variants by an unbiased, quantitative label-free protein–DNA proteomics (Figure (Figure1A-5).1A-5). To enrich allele-specific binding proteins for identification by mass spectrometry, we incubated biotinylated oligonucleotides of 40 bp length with risk or nonrisk allele of each SNP at midposition with nuclear extracts. DNA-binding proteins were concentrated by affinity chromatography with streptavidin coupled to magnetic beads. We performed a fractionation approach using increasing elution stringency of the native protein complexes, enabling direct control for an enrichment of allele-specific binding proteins in the eluted fractions by EMSA assays prior to LC–MS/MS analysis. This mass spectrometric analysis of the relevant fractions effectively reduced complexity, and thus enabled specific detection of both, transcription factors and transcriptional coregulators (Figure (Figure1A-6,1A-6, see also chapter on YY1 transcription factor and RYBP cofactor at rs7647481). EMSA experiments with bead-eluted proteins revealed an enrichment of allele-dependent protein DNA-binding complexes for both predicted cis-regulatory SNPs, by increased binding to the rs4684847C risk and to the rs7647481A nonrisk allele (Figure (Figure2C).2C). Protein eluates from predicted non cis-regulatory SNPs revealed no obvious allelic differences (Supplementary Figure S2).
At predicted cis-regulatory SNPs, LC–MS/MS found 41–108 and 142–165 proteins (Figures (Figures33 and Supplementary Figure S3, respectively, upper panels) with a significant allele-specific binding (fold change ≥ 2, P ≤ 0.05, n = 3, t-test), in contrast to 25–29 and 44–82 proteins at predicted non cis-regulatory SNPs (lower panels). Comparing the numbers of allele-specific binding proteins at predicted cis- versus non cis-regulatory variants, we found a significant enrichment of differentially binding proteins at cis-regulatory variants (1.87 × 10−25 ≤ P ≤ 1.72 × 10−6, two-sided, two-group binomial test for pairwise comparison of differentially binding proteins, Supplementary Table S1). The comparison of predicted cis- versus cis-regulatory and non cis- versus non cis-regulatory SNPs revealed no significant enrichment for most pairs (4.26 × 10−4 ≤ P ≤ 0.28, Supplementary Table S1). Thus, the highest numbers of allele-specific binding proteins were found at predicted cis-regulatory SNPs supporting specific protein–DNA interaction (Figure (Figure2A).2A). Moreover, when assessing GO-terms for allele-specific binding proteins (fold change ≥ 2, P ≤ 0.05), we found a significant enrichment in the GO-terms DNA binding proteins (P = 1.36 × 10−6, P = 1.44 × 10−7) and structure-specific DNA binding proteins (P = 2.11 × 10−5, P = 3.69 × 10−8) at the predicted cis-regulatory variants rs4684847 and rs7647481 in contrast to low respective GO-term enrichment at predicted non cis-regulatory variants (4.44 × 10−3 ≤ P ≤ 0.04, and 1.47 × 10−3 ≤ P ≤ 9.43 × 10−3, Fisher΄s exact test, Supplementary Table S2).
Focusing on transcription factors identified at the novel cis-regulatory variant rs7647481, label-free proteomics found the nonrisk allele binding transcription factors YY1 with a 6.6-fold (P = 2.94 × 10−3) and NFATC4 with a 2.6-fold (P = 0.01) allelic fold-change (Table (Table1,1, Figure Figure3).3). For rs4684847, previously shown to abrogate a type 2 diabetes-specific homeobox TFBS and to infer with PRRX1 homeobox protein binding (6), proteomics found increased risk allele binding of PRRX1 and ILF3 (2.6-fold, P = 0.01 and 4.2-fold, P = 0.01, respectively, Table Table1,1, Supplementary Figure S3). Subsequently, we assessed the enrichment of canonical signaling pathways using the GePS-tool (Genomatix) within the set of all identified allele-specific binding proteins and the occurrence of candidate transcription factors in the identified pathways (Supplementary Table S3). Notably, the only transcription factor identified in our screen included in significantly enriched signaling pathways was YY1 (P < 0.05, Fisher΄s exact test, E2f transcription factor network, p53 pathway, Prc2 complex sets long-term gene silencing through modification of histone tails, and signaling events mediated by HDAC Class I).
The common rs7647481G risk allele abrogates the core of a YY1 consensus TFBS, confirming the mass spectrometric identifications and GO-term analyses (Figure (Figure4A).4A). The allele-specific protein–DNA interaction at the rs7647481-adjacent region was efficiently blocked in competition and supershift EMSA experiments by 33-fold molar excess of unlabeled YY1 consensus binding sequence or by pre-incubation with a YY1 specific antibody (Figure (Figure4B4B left panel), while protein binding was not affected at all other tested SNP-adjacent regions, including the cis-regulatory variant rs4684847 (Figure (Figure4B).4B). Competition with unspecific competitor oligonucleotides (consensus MyoD myogenic regulatory factors, consensus CdxA chicken homeodomain protein, and scrambled control sequence) did not affect the allele-specific protein binding (Supplementary Figure S4A), further confirming the specificity of YY1 binding at the rs7647481-adjacent region. PPARG expression is essential for preadipocyte differentiation and metabolic function of mature adipocytes. By YY1 competition and supershift experiments we confirmed YY1 binding at the rs7647481A nonrisk allele in both, 3T3-L1 preadipocytes and adipocytes (Supplementary Figure S4B). In reporter gene assays with luciferase constructs of all tested variants transfected into 293T cells, overexpression of the transcription factor YY1 revealed a significant higher transcriptional activity at the rs7647481A nonrisk allele as compared to the risk allele (P = 0.003, Figure Figure4C).4C). Although we found an overall increase of all tested reporters, no allele-specific effects of the cis-regulatory variant rs4684847 and both non cis-regulatory variants were observed. The rs7647481A nonrisk variant also increased transcriptional activity in different cell types significantly, i.e. by 1.2-fold in 293T cells, 1.4-fold in INS1 β-cells, 2.7-fold in C2C12 myoblasts, 2.2-fold in C2C12 myocytes, 1.3-fold in Huh7 hepatocytes (P < 0.01, Figure Figure4D)4D) and 1.5-fold in 3T3-L1 adipocytes (6).
Next, we aimed to confirm in vivo allele-specific binding of YY1 at the rs7647481A nonrisk allele, as results from reporter assays are limited by missing chromosome context or length of analysed sequences. We performed chromatin immunoprecipitation (ChIP) combined with allele-specific quantitative PCR (AS-qPCR) to test for allelic imbalance in primary human adipose tissue cells heterozygous for rs7647481G/A. ChIP experiments were performed using a YY1-specific antibody to pull down cross-linked YY1-protein / chromatin complexes from primary human preadipocyte and adipocyte cells of three donors (Figure (Figure4E).4E). In both, input chromatin and immuno-precipitated chromatin, the nonrisk and risk allele, respectively, were analyzed using AS-qPCR. In adipocytes we observed a significant mean 31.8-fold (P = 0.02, Wilcoxon signed rank test) increased binding of YY1 at the rs7647481A nonrisk allele as compared to the risk allele in chromatin immuno-precipitated with anti-YY1. In preadipocytes, we observed a large variation of YY1 binding at the nonrisk allele but no significant allele-specific binding. No allele-specific differences were observed when using an IgG control antibody. Overall, YY1 supershift (Figure (Figure4B,4B, Supplementary Figure S4B) and ChIP (Figure (Figure4E)4E) experiments in preadipocytes and adipocytes establish binding of the transcription factor YY1 at the rs7647481A nonrisk allele, and support its role in transcriptional regulation of the PPARG gene in both stages of adipogenesis.
Metabolic homeostasis is largely regulated at the transcriptional level through the coordinated interaction between transcription factors, coregulators, and the basal transcriptional machinery (11). Our pull down of protein–DNA binding complexes offers the opportunity to identify functional protein–DNA interactions as we found both, transcription factors (Table (Table1)1) and co-eluting transcriptional coregulators (Supplementary Table S4). We assessed relevant protein-protein interaction using literature co-citation of the allele-specific binding transcription factors YY1, NFATC4, PRRX1 and ILF3 (Table (Table1)1) with all identified coregulators for prioritization of in-depth proof of concept experiments. Note, that considering the second order binding of cofactors to transcription factors binding at DNA, we included all cofactors identified by proteomics regardless of fold-change (Supplementary Table S4). In contrast to NFATC4, PRRX1 or ILF3, we found a significant enrichment of reported transcriptional coregulators co-cited with YY1 (P = 1.56×10−5, fishers exact test, Methods), i.e. RING1 and YY1 binding protein (RYBP), YY1 associated factor 2 (YAF2), prohibitin (PHB), nucleophosmin (NPM1), host cell factor C1 (HCFC1), metastasis associated 1 family (MTA2), DEK oncogene (DEK), and high mobility group box 2 (HMGB2). Additionally, visualizing annotated gene-gene interactions of the transcription factor YY1 with all proteomics-identified proteins annotated as cofactors reveals a network connecting YY1 with RYBP, NPM1, YAF2, MTA2, HCFC1 and metadherin (MTDH) (Figure (Figure5A).5A). Co-localization of YY1 and RYBP has been reported previously by using immunofluorescence staining and co-immunoprecipitation experiments (38). NFATC4 and ILF3 were found connected with the cofactor calreticulin (CALR) and DEAH-box helicase 9 (DHX9), respectively, none of the identified cofactors was connected with PRRX1 (Supplementary Figure S5A–C). For the novel cofactors which were identified by our unbiased proteomics but were not co-cited with YY1 or NFTATC4 at rs7647481A nonrisk and PRRX1 or ILF3 at rs4684847C risk allele (Supplementary Table S4) further experimental proof will be necessary to support protein-protein interaction. Here, we focused on experimental proof for the YY1 / RYBP interaction.
Next, we assessed if the risk and nonrisk allele-specific proteomics findings can be related to disease pathophysiology. The minor nonrisk allele of the PPARG locus (tagSNP rs1801282 Pro12Ala) was repeatedly associated with improved insulin-sensitivity in diverse studies. Thus, allele-specific PPARG expression may contribute to this phenotype, given the essential role of PPARG to maintain insulin-sensitivity. In fact, testing allele-specific PPARG mRNA levels in adipose tissue samples, we observed a confirmative age- and age/BMI-independent negative correlation of total PPARG mRNA levels with the insulin resistance measure HOMA-IR in nonrisk allele carriers (β = –6.25, P = 2.18 × 10−4; β = –3.26, P = 0.05, respectively) as compared to risk allele carriers (β = 0.23, P = 0.89; β = 0.11; P = 0.92, respectively, Table Table2).2). A coordinated regulation of PPARG expression by YY1 and co-identified coregulators at the rs7647481A nonrisk allele may contribute to phenotypes—such as insulin-resistance—associated with the PPARG locus. Here, for adipose tissue mRNA expression levels of the coregulator RYBP, identified by proteomics and reported to interact with the YY1 transcription factor (39), we found a negative age- and age/BMI-independent correlation with HOMA-IR in individuals carrying the nonrisk (β = −5.71, P = 1.15 × 10−3; β = −3.38, P = 7.04 × 10−3; respectively) as compared to risk (β = 0.57, P = 0.67; β = 0.16, P = 0.85, respectively) allele. For none of the other proteins co-cited with YY1 or NFATC4 (Supplementary Table S5) an allele-dependent and BMI-independent correlation was observed. We found a BMI-dependent correlation of PHB, CALR and NPM1 expression (also found at rs4684847C risk allele with PRRX1 and ILF3), requiring further analysis in the future. Taking into account the impact of BMI on insulin-sensitivity, our data support the literature-mining based initial focus on YY1 and RYBP. While we found no significant correlation of YY1 mRNA levels with HOMA-IR in the small available data set, a confirmative direction of β-values was observed (Table (Table2).2). Finally, we also found a positive correlation for adipose mRNA expression levels of both, YY1 and RYBP, with total mRNA levels of the insulin-sensitizing transcription factor PPARG from both alleles (Supplementary Table S5). YY1 and RYBP can function as repressor or activator depending on genomic context and availability of interaction partners (38,40). Assessing the effect on endogenous mRNA expression levels in SGBS preadipocytes, we found that knockdown of YY1 and its cofactor RYBP alone (Figure (Figure5B,5B, 63% or 60% knockdown efficiency, P = 7 × 10−3 or 4 × 10−4, respectively) resulted in a slight but not significant 1.6-fold increase of PPARG2 expression (Figure (Figure5C)5C) suggesting inhibitor-activity. However, in 293T cells YY1 overexpression significantly increased rs7647481A nonrisk allele reporter activity (Figure (Figure4C)4C) suggesting activator-activity, possibly by recruitment of endogenously expressed RYBP (Supplementary Figure S5D). Therefore, we performed simultaneous knockdown of YY1 and RYBP (Figure (Figure5B,5B, 68% or 73% knockdown efficiency, P = 3×10−3/2×10−3, respectively) in the PPARG1 and PPARG2 expressing adipose tissue cell line SGBS. We found a significant 2-fold reduction of endogenous mRNA expression levels for the insulin-sensitizing PPARG2 isoform (P = 0.027, Figure Figure5C)5C) suggesting that YY1 and RYBP jointly activate PPARG2 expression and supporting the importance to account for both, transcription factors and related cofactors. How further cofactors identified by our unbiased proteomics approach (Supplementary Table S4) contribute to activation or inhibition of PPARG gene expression requires future experiments. Overall, our findings suggest that PPARG expression, and thereby insulin sensitivity, may be increased by rs7647481A nonrisk allele-specific binding of the transcription factor YY1 and its cofactor RYBP (Figure (Figure5D5D).
GWAS revealed numerous risk loci associated with common traits (1). Advances of the ENCODE project and novel bioinformatics approaches facilitate identification of cis-regulatory, potentially diseases-causing variants within complex loci (2–7). However, the spatial and temporal varied expression pattern of transcription factors and coregulators supports the need to consider cell-type specific open chromatin data (2,41–45) to prioritize candidate cis-regulatory variants. In our study, combining computational TFBS modularity analysis (6) and functional cell type- and differentiation-specific data from human adipocytes (36) enabled identification of a second cis-regulatory variant at the PPARG locus, additional to a previously reported (6). Our unbiased quantitative label-free proteomics approach came up with significantly more allele-specific binding proteins at predicted cis-regulatory as compared to non cis-regulatory variants, supporting the integrative framework predictions. Moreover, we find both, allele-specific transcription factors and cofactors which may contribute to the rs7647481A nonrisk allele-specific association of the PPARG locus with insulin sensitivity.
Identifying the allele-dependent binding proteins is an essential step to further decipher the molecular mechanisms underlying genetic associations with disease pathophysiology and ultimately to define personalized interventions. However, identification of allele-specific binding proteins by TFBS matrix overlap or ChIP-seq faces limitations, such as availability of TFBS matrix annotation and the complexity of TFBS modularity. So far, few proteomic studies have provided successful transcription factor identification at cis-regulatory variants (12,13,46). Here, taking into account the importance of TFBS modularity for specific protein–DNA interaction (6,47,48), we assessed protein binding directly at the genomic regions of interest, instead of e.g. concatenated oligonucleotides (13). Peptide intensity-based label-free quantification across samples was previously benchmarked against SILAC labeling and compared well with respect to reproducibility, sensitivity, and robustness (14). In contrast to metabolic (13,46) or chemical labeling strategies (12) our label-free approach does not limit future application to any tissue or cell type, including primary human tissue. Our proteomics workflow on eluted fractions containing allele-specific binding-proteins allows identification of allele-specific binding transcription factors. Moreover, our workflow enables for the first time the identification of transcriptional coregulators related to allele-specific gene regulation, which are among the lowest abundant proteins in cells (42,49).
The PPARG locus is robustly associated with type 2 diabetes and insulin-sensitivity (16–18,50,51) and mRNA levels of the insulin-sensitizing PPARG are increased in nonrisk allele carriers (6). At the here identified rs7647481A nonrisk allele, proteomic analysis identified allele specific binding of two transcription factors, YY1 and NFATC4. YY1 was reported to regulate metabolic, diabetes-related phenotypes in skeletal muscle, liver and adipocytes (52–56). NFATC4 contributes to regulation of PPARG and mouse adipocyte differentiation by direct binding at the PPARG2 promoter (57), YY1 indirectly by interaction with C/EBPα (56). We focused on the allele-specific activity of human YY1 and found in vivo allele-specific binding at the rs7647481A nonrisk allele in the human PPARG promoter by ChIP experiments combined with assessment of allelic imbalance in heterozygous primary human adipose tissue cells, supporting the biological relevance of the proteins identified by our label-free proteomics approach. At the rs4684847C risk allele, we previously reported an overlapping homeobox TFBS, risk allele specific binding of the transcriptional inhibitor PRRX1 repressing PPARG expression and promoting insulin-resistance (6). Our unbiased proteomics confirmed risk allele binding of PRRX1 and also found ILF3, a transcription factor so far not related to adipose PPARG gene regulation or diabetes. The regulation of PPARG expression is complex, and future experiments are needed to support contribution of further factors, such as NFATC4 and ILF3 at rs7647481 and rs4684847, respectively, to allele-specific PPARG expression in human adipose tissue, but also for possible interactions of PPRX1 with ILF3 or NFATC4 with YY1.
In addition to transcription factors, the high sensitivity of the applied proteomics approach enabled identification of several cofactors which may be involved in modulating transcriptional regulation of PPARG, by yet to be characterized mechanisms in future studies. For proof-of-concept that our allele-specific proteomics can find both, allele-specific transcription factors and cofactors, we selected the transcription factor YY1 with the highest allele-specific binding and the co-citation enriched RYBP cofactor at rs7647481. The coregulator RYBP—known to interact with YY1 and here co-identified at the rs7647481 variant—was recently shown to associate with skeletal myogenesis (38), in addition to its function as transcription repressor in cancer (58,59), embryogenesis (39) and central nervous system development (60). While YY1 has been reported previously to promote adipocyte differentiation (61), we found that the combined action of the transcription factor YY1 and the cofactor RYBP is necessary for full activation of PPARG2 isoform expression in an adipocyte cell line. PPARG2 has been reported to be crucial for maintaining insulin-sensitivity (20). In human samples we demonstrated a nonrisk allele-specific association of RYBP expression levels with the PPARG nonrisk allele associated phenotype insulin-sensitivity. Thus, in carriers of the protective PPARG allele, binding of YY1 and RYBP at the rs7647481A nonrisk allele may increase PPARG2 expression and thereby insulin sensitivity (Figure (Figure5D,5D, upper panel); and in PPARG risk allele carriers the previously reported rs4684847C risk allele binding of PRRX1 inhibits PPARG expression thereby promoting insulin-resistance (6) (Figure (Figure5D,5D, lower panel).
In conclusion, our results demonstrate the importance to consider allele-specific protein-complexes for delineation of the molecular mechanisms affected by cis-regulatory variants. We present an approach enabling an unbiased identification of allele-specific protein–DNA interactions including both, transcription factors and transcriptional cofactors. Additionally, we provide data supporting that different cis-regulatory variants at the PPARG locus may contribute to disease pathophysiology (Figure (Figure5D),5D), in line with the ‘multiple enhancer hypothesis’ suggesting multiple regulatory SNPs per LD block (62). Integrative approaches combining computational and cell type specific histone / chromatin mark based cis-regulatory prediction in combination with highly sensitive proteome-wide identification of allele-specific binding proteins can help to clarify the phenotypic effects of inherited and somatic genetic variability.
We thank M. Arnold and C. Fuchs for statistics, G. Leboulle for advice with allele-specific primer design, E. Hofmair, C. Herrmann, J. Behler, S. Becker and M. Hubersberger for technical assistance. We acknowledge the ENCODE Consortium and the John Stamatoyannopoulos (UW) ENCODE production laboratory generating the datasets ENCFF513INJ and ENCFF206PZO.
Authors Contributions: Conceived and designed the experiments: H. Lee, H.L. S.M.H.; performed the experiments: H. Lee., K.Q., L.H., F.E., L.J., C.H., Cv.T., V.G., M.B.; analyzed the data: H.Lee., H.L., C.B., S.M.H., Cv.T., S.W., S.M., H.G., M.C.; contributed reagents/materials/analysis tools: I.D., C.B., P.A., H.H., H.G.; wrote the paper: H.Lee., H.L., S.M.H., Cv.T.; revised the manuscript: C.H., K.Q., C.B., L.H., V.G., S.W., M.B., M.C., S.M., H.G., I.D., P.A., H.H. All authors approved the final version.
Present address: Heekyoung Lee, Max Planck Institute for Immunobiology and Epigenetics, 79108 Freiburg, Germany.
Supplementary Data are available at NAR Online.
This work was funded by the Else Kröner-Fresenius Foundation, Bad Homburg v.d.H, Germany; the grant Virtual Institute ‘Molecular basis of glucose regulation and type 2 diabetes’ received from the Helmholtz Zentrum München, München-Neuherberg, Germany; the grant Clinical Cooperation Group ‘Nutrigenomics and type 2 diabetes’ received from the Helmholtz Zentrum München, München-Neuherberg, Germany, and the Technische Universität München, München, Germany; a grant from the German Federal Ministry of Education and Research to the German Centre for Diabetes Research (DZD e.V.); and the grant LA2595/3-1 from the German Research Foundation (DFG). Funding for open access charge: Grant Clinical Cooperation Group ‘Nutrigenomics and type 2 diabetes’ received from the Helmholtz Zentrum München and the Technische Universität München.
Conflict of interest statement. None declared.