Single nucleotide polymorphisms (SNPs) that alter the pattern of transcription factor (TF) binding to DNA are believed to be a major contributor to cis-modulation of gene expression; approximately 30% of expressed genes show evidence of cis-regulation influenced by common alleles [1]. Polymorphisms occurring within TF binding sites (TFBSs) are of particular interest here, because they can directly change how regulatory proteins bind to DNA. Recent advances in genomic technologies [2] are now making allele-specific analyses of expression, TF-DNA interactions and chromatin states possible across the human genome, aiding the evaluation of how DNA polymorphisms in regulatory elements control gene expression.
Chromatin immunoprecipitation-sequencing (ChIP-Seq) and related approaches are now extensively applied to study genome-wide binding of TFs. ChIP-Seq detects total binding at specific genomic regions and, where heterozygous sites overlap ChIP-Seq peaks, allele-specific binding as well. For example, recent reports extended global allele-specific analysis across individuals to DNA-protein binding [5]. Of particular relevance to our study is the work of Kasowski and co-workers [6], in which the authors analyzed binding of the NF-κB protein RELA in stimulated lymphoblastoid cells across eight individuals and documented binding differences between paired individuals at numerous genomic locations.
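One common way to assess allele-specific binding at a heterozygous site is to ask whether the ChIP-Seq reads carrying each allele deviate from the 50:50 ratio expected in the absence of allelic bias. The sketch below applies an exact two-sided binomial test; the function names, the read counts and the 0.5 null proportion are illustrative assumptions, and a real analysis must also correct for read-mapping bias and copy-number differences, which this sketch ignores.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of every outcome
    that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

def allelic_imbalance(ref_reads: int, alt_reads: int, expected_ref: float = 0.5):
    """Return (fraction of reads carrying the reference allele, p-value)
    for the ChIP-Seq reads overlapping one heterozygous site."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads overlap this site")
    return ref_reads / n, binomial_two_sided_p(ref_reads, n, expected_ref)

# Hypothetical read counts at one heterozygous SNP inside a ChIP-Seq peak.
fraction, p_value = allelic_imbalance(ref_reads=42, alt_reads=18)
print(f"reference allele fraction = {fraction:.2f}, p = {p_value:.4f}")
```

A balanced site (for example 10 reference and 10 alternative reads) returns p = 1, whereas a strongly skewed site yields a small p-value that flags candidate allele-specific binding.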
A major impediment to the ChIP-based evaluation of cis-regulatory SNPs is that, by its nature, ChIP identifies genomic regions that interact with TFs but not the individual binding sites within them [7]. Other factors that can confound TF-DNA binding as measured by ChIP include the state of chromatin at binding regions [9], differing extents of nucleosome occupancy [10], the quality of the antibodies on which the method critically depends, and the near impossibility of isolating a specific dimer rather than all dimers that share a common subunit. ChIP-based methods are therefore typically used in conjunction with other techniques that map the sites of TF-DNA interaction more precisely. In particular, protein binding microarrays have substantially enhanced our understanding of how individual sequence variants alter binding potential in an in vitro setting, improving our ability to predict the effect of a SNP on a TFBS [11]. Whereas microarrays rely on DNA stably attached to a solid surface that is in contact with the TF through a liquid medium, alternative high-throughput platforms, such as Bind-n-Seq [14] or multiplexed massively parallel SELEX (systematic evolution of ligands by exponential enrichment) [8], keep both the TF and the DNA entirely in solution. SELEX uses consecutive rounds of selective purification to progressively enrich a population of DNA ligands for those that are preferentially bound by the TF in question.
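The enrichment logic of SELEX can be illustrated with a deliberately simplified model. If each round retains a ligand in proportion to its bound fraction θ_i and amplification restores the library size, the frequency of ligand i after r rounds is proportional to f_i·θ_i^r, so tighter binders come to dominate geometrically. This sketch assumes single-site equilibrium binding, a constant free TF concentration and bias-free amplification; the ligand classes and Kd values are hypothetical.

```python
def simulate_selex(initial_freqs, kd, tf_conc, rounds):
    """Idealized SELEX (illustrative model, not a published protocol).

    In each round, ligand i is retained with probability
    theta_i = C / (C + Kd_i) (single-site equilibrium binding, free TF
    concentration C assumed constant); PCR re-amplification is modelled by
    renormalizing the retained fractions so they sum to 1.
    """
    freqs = dict(initial_freqs)
    history = [freqs]
    for _ in range(rounds):
        retained = {s: f * tf_conc / (tf_conc + kd[s]) for s, f in freqs.items()}
        total = sum(retained.values())
        freqs = {s: r / total for s, r in retained.items()}
        history.append(freqs)
    return history

# Hypothetical ligand classes with arbitrary dissociation constants (nM).
initial = {"strong": 1 / 3, "medium": 1 / 3, "weak": 1 / 3}
kd = {"strong": 1.0, "medium": 10.0, "weak": 1000.0}
for r, freqs in enumerate(simulate_selex(initial, kd, tf_conc=10.0, rounds=3)):
    print(r, {s: round(f, 3) for s, f in freqs.items()})
```

Renormalizing rather than tracking absolute molecule numbers keeps the model focused on library composition, which is what sequencing of selected ligands actually reports.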
This study focuses on NF-κB, but there is, in general, great interest within the scientific community in qualitatively and quantitatively defining, at high resolution, all the DNA sequences bound by TFs [15]. The NF-κB family of TFs has been extensively studied because of its roles in diverse biological processes such as inflammation, apoptosis, development and oncogenesis [16]. NF-κB proteins function as homo- or heterodimers of Rel homology domain-containing monomers from two subfamilies: the p50 and p52 subfamily (type I subunits) and the RELA, RELB and C-Rel subfamily (type II subunits). Type I subunits lack a transactivation domain and can activate transcription only as heterodimers with a type II subunit or as homodimers in complex with co-factors such as BCL3 and IKBZ [18]. In a given heterodimer, the type II subunit confers the transcription-activating capability. Members of the NF-κB TF family bind to a 'core motif' 10 or 11 bases in length [21].
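Because the core motif is short and degenerate, candidate κB sites are often located by scanning both DNA strands for an IUPAC consensus. The sketch below assumes the frequently cited 10-bp consensus GGGRNWYYCC (R = A/G, W = A/T, Y = C/T, N = any base); this pattern is an illustrative assumption rather than the binding model used in this study, and a simple consensus match ignores the graded binding potentials that the microarray and EMSA-Seq data measure.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}

def iupac_to_regex(motif):
    # A lookahead with a capture group lets finditer report overlapping hits.
    return re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def scan_kb_sites(seq, motif="GGGRNWYYCC"):
    """Return sorted (forward-strand start, strand, matched site) tuples."""
    seq = seq.upper()
    pattern = iupac_to_regex(motif)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    length, k = len(seq), len(motif)
    for m in pattern.finditer(revcomp(seq)):
        # Convert the reverse-strand start back to forward-strand coordinates.
        hits.append((length - m.start() - k, "-", m.group(1)))
    return sorted(hits)

print(scan_kb_sites("TTGGGAATTTCCTT"))
print(scan_kb_sites("GGGAATTCCC"))  # palindromic site: one hit per strand
```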
Our overall approach is outlined in Figure 1. We first characterized the binding of nine NF-κB dimers (the RELA, p50 and p52 homodimers and the RELAp50, RELAp52, RELBp50, RELBp52, C-Relp50 and C-Relp52 heterodimers) to a limited, 11-mer NF-κB consensus binding space using our microarray platform. This produced data that did not require extensive post-processing and allowed rapid visualization of the different binding profiles of the dimers. Previously, Badis and co-workers [24] highlighted binding models that covered sequence space beyond that defined by more canonical models, including models whose sequence compositions differed substantially from those of the canonical models. This suggested that there may be an entire region of 'less canonical' k-mer space that is not yet well defined. We therefore extended our observations to this space by further profiling the three RELA dimers using a method we have developed, electrophoretic mobility shift assay-sequencing (EMSA-Seq), which combines EMSA performed with purified proteins and degenerate oligonucleotide libraries that completely cover 11-mer space, followed by next-generation sequencing of the bound DNA molecules. Our results show that many bound sequences fall outside the canonical NF-κB consensus, and we validated the binding specificity of typical examples of these novel sequences by UV-laser footprinting.
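A degenerate library covering all 11-mers samples 4^11 = 4,194,304 distinct sequences (before reverse complements are collapsed). One simple way to summarize such sequencing data is to count every 11-mer in the bound DNA and compare its frequency with that in the unselected input library. The sketch assumes that an input library is sequenced as background and uses an add-one pseudocount; it is not the scoring scheme used in this study, and it does not merge reverse-complement k-mers.

```python
from collections import Counter
from math import log2

K = 11
VALID = frozenset("ACGT")

def count_kmers(reads, k=K):
    """Count every overlapping k-mer, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if VALID.issuperset(kmer):
                counts[kmer] += 1
    return counts

def log2_enrichment(bound_reads, input_reads, k=K, pseudocount=1.0):
    """log2(frequency in bound DNA / frequency in input) for each observed k-mer."""
    bound = count_kmers(bound_reads, k)
    background = count_kmers(input_reads, k)
    kmers = set(bound) | set(background)
    # Additive smoothing over the k-mers actually observed in either library.
    n_bound = sum(bound.values()) + pseudocount * len(kmers)
    n_input = sum(background.values()) + pseudocount * len(kmers)
    return {
        kmer: log2(((bound[kmer] + pseudocount) / n_bound)
                   / ((background[kmer] + pseudocount) / n_input))
        for kmer in kmers
    }

# Toy reads (hypothetical), each exactly one 11-mer long.
bound_reads = ["GGGAATTTCCC"] * 5 + ["ACGTACGTACG"]
input_reads = ["ACGTACGTACG"] * 5 + ["GGGAATTTCCC"]
scores = log2_enrichment(bound_reads, input_reads)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Positive scores mark k-mers over-represented among bound molecules; with full 11-mer coverage, such a table also reaches sequences far from the canonical consensus.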
Figure 1 Outline of the dual platform approach used to profile NF-κB family dimers. Double-purified, His-tagged NF-κB dimers interact with DNA-probes (microarray) or DNA-ligands (electrophoretic mobility shift assay-sequencing (EMSA-Seq)). Two ...
Finally, we examine the relationship between NF-κB in vitro binding affinities (defined as binding potential) and their significance in vivo by overlaying sequences and measured binding affinities from our datasets onto the genomic locations of RELA ChIP-Seq peaks containing SNPs in stimulated lymphoblastoid cells from eight individuals [6]. A direct positive correlation between NF-κB binding potential and in vivo NF-κB binding is found in 65% of the relevant cases examined; these span 1,405 genomic locations that show differences in ChIP-Seq peak heights between individuals. They include regions with potential implications for disease association studies, and we show examples in which the disease risk allele lies in the haplotype associated with stronger binding both in vitro and in vivo, whereas the normal-allele haplotype contains motifs with weaker binding. This illustrates the utility of TF binding potential for interpreting regulatory functional traits.
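The comparison between in vitro binding potential and in vivo occupancy can be framed as a directional concordance check: at each informative SNP, does the allele with the higher binding potential also show the higher ChIP-Seq signal? The sketch below is a hypothetical illustration. The record layout, the rule of scoring an allele by its best-scoring 11-mer overlapping the SNP, the 0.0 default for unmeasured k-mers and the toy values are all assumptions, not the procedure or data of this study.

```python
from dataclasses import dataclass

def allele_binding_potential(left, allele, right, kmer_scores, k=11):
    """Best in vitro score among k-mers overlapping the SNP position.
    Unmeasured k-mers default to 0.0 (an assumption of this sketch)."""
    seq = left + allele + right
    snp = len(left)
    starts = range(max(0, snp - k + 1), min(snp, len(seq) - k) + 1)
    return max((kmer_scores.get(seq[i:i + k], 0.0) for i in starts), default=0.0)

@dataclass
class SnpSite:
    bp_ref: float    # in vitro binding potential, reference allele
    bp_alt: float    # in vitro binding potential, alternative allele
    chip_ref: float  # ChIP-Seq signal associated with the reference allele
    chip_alt: float  # ChIP-Seq signal associated with the alternative allele

def _sign(x):
    return (x > 0) - (x < 0)

def concordance(sites):
    """Fraction of informative sites where both measures favour the same
    allele; sites with no difference in either measure are skipped."""
    informative = [s for s in sites
                   if _sign(s.bp_ref - s.bp_alt) and _sign(s.chip_ref - s.chip_alt)]
    if not informative:
        return float("nan")
    agree = sum(_sign(s.bp_ref - s.bp_alt) == _sign(s.chip_ref - s.chip_alt)
                for s in informative)
    return agree / len(informative)

scores = {"GGGAATTTCCC": 2.0}  # hypothetical in vitro score table
ref = allele_binding_potential("GGGAA", "T", "TTCCC", scores)
alt = allele_binding_potential("GGGAA", "C", "TTCCC", scores)
sites = [SnpSite(ref, alt, 10, 5), SnpSite(0, 1, 3, 8),
         SnpSite(1, 3, 9, 2), SnpSite(5, 1, 7, 1), SnpSite(1, 1, 4, 2)]
print(ref, alt, concordance(sites))
```

Skipping ties keeps sites where either measure cannot discriminate between alleles from diluting the concordance estimate.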