Possible outcomes of SeqChIP experiments
Outcomes of SeqChIP experiments can fall into one of three categories: (i) complete co-occupancy, (ii) no co-occupancy or (iii) partial co-occupancy of the two factors (A and B) being tested (Figure 1). Complete co-occupancy occurs when two proteins always associate with the same DNA fragment, and neither factor is found on the DNA in the absence of the other. No co-occupancy occurs when A and B associate with the same genomic region in vivo but can do so only on mutually exclusive sub-populations of DNA fragments. In the case of partial co-occupancy, some DNA molecules have both A and B, while others have only A and/or only B.
Figure 1 SeqChIP experiments have three different outcomes: complete co-occupancy, no co-occupancy, and partial co-occupancy. In full co-occupancy, proteins A and B always co-occupy promoters; neither one is found bound to a particular region without the other.
There are two types of partial co-occupancy (Figure 1). First, DNA binding of protein B always occurs in combination with protein A, whereas protein A can associate with DNA in the absence of protein B. For example, promoter binding by yeast TAFs depends on TBP, yet TBP is found at many promoters that lack TAFs (17). Second, A and B can bind independently of one another to a given genomic region, but the proteins may co-occupy the same stretch of DNA in only a fraction of the cases. This second possibility may occur when two DNA-binding proteins do not interact with each other and recognize different target sequences in an enhancer or other transcriptional regulatory region of a eukaryotic promoter.
Fold-enrichments and the nature of experimental background in SeqChIP experiments
The basic measurements in ChIP and SeqChIP experiments are the fold-enrichments of ‘target’ genomic regions (e.g. an active promoter) relative to ‘non-target’ or control genomic regions (e.g. an intergenic region or coding region of an inactive gene). The experimental values for the control genomic regions are usually considered as experimental background. However, two separate components contribute to this experimental background.
One such experimental background can arise from the non-specific precipitation of DNA fragments that are either not crosslinked to proteins or are crosslinked with nuclear proteins not specifically targeted for immunoprecipitation. This background is frequently due to non-specific ‘purification’ by the antibody, protein A, or the agarose-based resin, and can occur even though those fragments are not associated with the protein being specifically precipitated. The extent to which this type of experimental background would be a problem largely depends on the avidity/affinity of the antibody toward the antigen and the conditions employed in the immunoprecipitation (e.g. stringencies of washes, elution conditions, etc.). Specific steps taken to minimize this type of background can include longer antibody–antigen incubation times, more stringent washes of bound complexes and optimized elution with epitope-containing peptides or protein fragments. Ideally, and often in practice, DNA fragments not crosslinked to the immunoprecipitated protein make a minor contribution to the overall experimental background.
A second experimental background in SeqChIP experiments can arise when the protein of interest gets crosslinked to genomic DNA in a non-sequence-specific manner. Such crosslinked material can arise from random collisions between the protein and DNA in the nucleus and/or true non-sequence-specific association of the protein. The extent to which non-specific DNA crosslinking contributes to the experimental background depends on both the average length of sheared DNA fragments and on the relative binding affinities to non-specific versus specific DNA sequences, which vary widely among proteins. Importantly, background due to non-specific protein–DNA crosslinking is an intrinsic feature of SeqChIP. It generally represents the major form of experimental background in SeqChIP, and it cannot be eliminated by varying the experimental procedure. As a consequence, meaningful information cannot be obtained from control experiments involving two sequential immunoprecipitations involving the same protein. In essence, the second immunoprecipitation largely re-purifies the same material (i.e. both crosslinked control and target regions), and it normally results in no change in fold-enrichments.
Theoretical predictions for fold-enrichments in SeqChIP experiments
The key concept for quantitating SeqChIP experiments is that the final fold-enrichment of the sequential ChIP should be equal to the product of the fold-enrichments of the individual ChIPs, if two proteins completely co-occupy DNA (12). In this regard, the immunopurifications involved in SeqChIP are analogous to sequential biochemical purification steps or fold-stimulation of a biochemical process by two independent events. Importantly, this concept relies only on the assumptions that in vivo formaldehyde crosslinking events are independent and inefficient.
Formaldehyde crosslinking of proteins to DNA is inefficient, with maximal crosslinking efficiencies for individual proteins typically ranging between 1 and 10% (1). As the physical crosslink of A to DNA is independent of the physical crosslink of B to DNA, most DNA molecules crosslinked to A will lack a similar physical link to protein B (and vice versa) even when proteins A and B always co-occupy a region of DNA. Hypothetically, extremely efficient protein–protein crosslinks between A and B (or crosslinks involving a common intermediary protein) could yield SeqChIP values that are significantly more than the product of the individual ChIPs. However, protein–protein contacts per se do not generate ChIP signals and as such play a secondary role relative to direct protein–DNA contacts. Furthermore, substantial experimental evidence presented here and elsewhere (16) validates these key concepts.
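The independence argument can be made explicit with a short worked calculation. The symbols p_A and p_B are hypothetical per-molecule crosslinking efficiencies introduced here only for illustration, and the value 0.05 is an assumed example within the 1–10% range quoted above, not a measured figure.

```latex
% Assumption: crosslinking of A and of B to the same DNA molecule are independent events.
\[
P(\text{B crosslinked} \mid \text{A crosslinked})
  = \frac{p_A \, p_B}{p_A} = p_B
\]
% Illustrative value: p_B = 0.05
\[
1 - p_B = 0.95
\]
% i.e. about 95% of A-crosslinked molecules carry no B crosslink,
% even if A and B always occupy the region together.
```

Under these assumptions the second immunoprecipitation samples a largely different set of crosslinked molecules from the first, which is why the two enrichments are expected to multiply.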
The theoretical results of a SeqChIP experiment are calculated as follows (Figure 2). There are three relevant classes of DNA molecules: those containing A alone, B alone, or A + B. ‘X’ is defined as the fraction of A-DNA molecules that also contain B, and ‘Y’ represents the fraction of B-DNA molecules that also contain A. X and Y are unrelated to each other because the occupancy characteristics of proteins A and B are different. Thus, if the fold-enrichments of the individual ChIP experiments are defined as A and B, then the expected fold-enrichments for the different classes of molecules are as follows:
Figure 2 Predicted experimental outcomes for quantitative SeqChIP. Predicted full co-occupancy (A), no co-occupancy (B) and partial co-occupancy (C) values are expressed as functions of parameters X and Y.
From the above equations it follows that:
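The expressions sketched below are a reconstruction, not a verbatim copy of the authors' equations. They are the simplest form that agrees with the definitions of X and Y and with every predicted SeqChIP value quoted in this section. AB denotes the SeqChIP fold-enrichment with the A ChIP first, BA the fold-enrichment with the B ChIP first, and A · B the product of the individual fold-enrichments.

```latex
% Reconstructed (assumed) form; checked against the numerical examples in the text.
\begin{align*}
\text{A alone:}\quad & (1 - X)\,A \\
\text{B alone:}\quad & (1 - Y)\,B \\
\text{A + B:}\quad   & X\,Y\,(A \cdot B) \\[4pt]
AB &= (1 - X)\,A + X\,Y\,(A \cdot B) \\
BA &= (1 - Y)\,B + X\,Y\,(A \cdot B)
\end{align*}
```

With X = Y = 1 both expressions reduce to A · B. With X = Y = 0 the A + B term vanishes and the SeqChIP value reduces to the fold-enrichment of the first IP.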
For complete co-occupancy (X = 1 and Y = 1; Figure 2A), the fold-enrichment in a SeqChIP equals the product of the individual ChIPs, and fold-enrichment is not affected by the order of the individual ChIPs (AB = BA). For no co-occupancy (X = 0 and Y = 0; Figure 2B), the fold-enrichment in the SeqChIP is within experimental error of the fold-enrichment of the first IP, and is also independent of the actual order of the IPs (AB = BA = 0). Partial co-occupancy occurs when the fold-enrichment of a SeqChIP is significantly higher (in practical terms, this is often considered to be 2-fold) than the fold-enrichment of the first ChIP, but significantly lower than the product of the two ChIPs (Figure 2C). Unlike the examples above, the order of the individual IPs can matter in cases of partial co-occupancy (AB ≠ BA). In fact, partial co-occupancy will always be the result if two proteins do not completely co-occupy DNA, even though it might be experimentally observed only in one direction under special circumstances (see below).
In cases of partial co-occupancy, the order of individual ChIPs makes a difference (Figure 2C). Furthermore, partial co-occupancy will be observed in only one direction when X = 1 and Y is small; i.e. factor A always co-occupies DNA with factor B, but only a small proportion of B molecules co-occupy with A molecules (Figure 2C). When DNA-crosslinked A is immunoprecipitated first, most or all of those molecules will also have B associated and hence will benefit from the second IP. In contrast, when B is immunoprecipitated first, only a small percentage of molecules will contain A, so that the second IP will not add much fold-enrichment to the already highly enriched pool of B-DNA molecules.
To illustrate the importance of the order of individual ChIPs in cases of partial co-occupancy, consider an example in which X = 1, Y = 0.1, A = 10, B = 50. In this case, the predicted fold-enrichments are 50 when the A ChIP is first and 95 when the B ChIP is first. When the A ChIP is first, the SeqChIP value of 50 represents a 5-fold enrichment over the A ChIP, which indicates partial co-occupancy as it is well below the theoretically predicted maximal co-occupancy value of 500. However, when the B ChIP is first, the SeqChIP value of 95 is <2-fold enrichment over the individual B ChIP, and hence is within experimental error, making it impossible to distinguish partial co-occupancy from experimental background. In contrast, in situations where X and Y are both significant (e.g. each being 0.5), the predicted SeqChIP values are 130 if the A ChIP is first and 150 if the B ChIP is first, indicating that partial co-occupancy should be observed in both directions. Even though partial and full co-occupancy results are definitive in one direction, the examples above underscore the importance of performing SeqChIP in both directions (e.g. A first then B and vice versa), as lack of co-occupancy in one direction is not necessarily indicative of no co-occupancy.
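The numbers in this example can be reproduced with a few lines of Python. This is a minimal sketch, assuming the predicted values take the form AB = (1 − X)·A + X·Y·A·B and BA = (1 − Y)·B + X·Y·A·B. That form is inferred from the quoted values rather than copied from the authors' equations, and the function name `predicted_seqchip` is hypothetical.

```python
def predicted_seqchip(a, b, x, y):
    """Return predicted SeqChIP fold-enrichments (AB, BA).

    a, b : fold-enrichments of the individual A and B ChIPs
    x    : fraction of A-DNA molecules that also contain B
    y    : fraction of B-DNA molecules that also contain A

    Assumed form (inferred from the worked examples in the text):
      AB = (1 - x) * a + x * y * a * b   (A ChIP first)
      BA = (1 - y) * b + x * y * a * b   (B ChIP first)
    """
    for name, value in (("x", x), ("y", y)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    co_occupied = x * y * a * b          # contribution of the A + B class
    ab = (1.0 - x) * a + co_occupied     # A alone + A + B
    ba = (1.0 - y) * b + co_occupied     # B alone + A + B
    return ab, ba


if __name__ == "__main__":
    # The two scenarios discussed in the text, with A = 10 and B = 50.
    for x, y in [(1.0, 0.1), (0.5, 0.5)]:
        ab, ba = predicted_seqchip(10, 50, x, y)
        print(f"X={x}, Y={y}: A first = {ab:.0f}, B first = {ba:.0f}")
```

Run as a script, the sketch gives 50 and 95 for X = 1, Y = 0.1, and 130 and 150 for X = Y = 0.5, matching the values quoted above.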
A quantitative metric of protein co-occupancy
We define a measure of SeqChIP efficiency (C; in percent) that is related to the extent of partial co-occupancy between two proteins. A C-value of 100 is defined as complete co-occupancy, a value of 0 is defined as no co-occupancy, and intermediate C-values represent partial co-occupancy. Specifically, C = 100(AB − A)/(A · B − A), where AB represents the fold-enrichment for the sequential ChIP and A and B represent the fold-enrichments for the individual ChIPs. The value A is subtracted from both AB (SeqChIP result) and A · B (product of the individual ChIP results), because A represents the contribution of the first IP and as such does not represent sequential IP enrichment per se. When the order of IPs is reversed (B ChIP is first, A ChIP is second), C is calculated according to the formula C = 100(BA − B)/(B · A − B).
In cases of partial co-occupancy, efficiencies depend on the order of the individual IPs (as AB ≠ BA), which in turn means that typically C_AB ≠ C_BA. From the above example where X = 1, Y = 0.1, A = 10 and B = 50, C_AB ≈ 8, a low partial co-occupancy value (C_BA is undefined because the SeqChIP enrichment over the first ChIP is within experimental error). When the X and Y values are 0.5 each, then C_AB ≈ 24 and C_BA ≈ 22, indicating substantial partial co-occupancy between factors A and B. Thus, aside from providing information on the extent to which two proteins occupy a given genomic region, this quantitative metric can be used to estimate values of X and Y and hence additional information about the nature of co-occupancy.
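The C metric defined above is straightforward to compute. The sketch below implements C = 100(AB − A)/(A · B − A) as stated in the text; the function name `co_occupancy_c` and its argument names are hypothetical, and the guard against a non-positive denominator is an added safety check rather than part of the published definition.

```python
def co_occupancy_c(seq_enrichment, first_enrichment, second_enrichment):
    """SeqChIP efficiency C (in percent).

    seq_enrichment    : fold-enrichment of the sequential ChIP (AB or BA)
    first_enrichment  : fold-enrichment of the ChIP performed first
    second_enrichment : fold-enrichment of the ChIP performed second

    C = 100 * (seq - first) / (first * second - first)
    """
    denominator = first_enrichment * second_enrichment - first_enrichment
    if denominator <= 0:
        # Happens when the second ChIP shows no enrichment (<= 1-fold);
        # C is not meaningful in that case.
        raise ValueError("second ChIP must be enriched more than 1-fold")
    return 100.0 * (seq_enrichment - first_enrichment) / denominator


if __name__ == "__main__":
    # A ChIP first, with A = 10 and B = 50 as in the worked example.
    print(round(co_occupancy_c(50, 10, 50)))    # X = 1, Y = 0.1
    print(round(co_occupancy_c(130, 10, 50)))   # X = Y = 0.5
    print(round(co_occupancy_c(500, 10, 50)))   # complete co-occupancy
    print(round(co_occupancy_c(10, 10, 50)))    # no co-occupancy
```

For the reverse order, pass the B-first SeqChIP value as `seq_enrichment` and swap the two individual enrichments.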
Considerations related to the relative locations of the two proteins associated with DNA
Our analysis of co-occupancy is best suited for two proteins with binding sites that are spaced closely together (<100–200 bases apart). As sonication of crosslinked chromatin produces a randomized population of fragments that average 400–500 bases in length, inter-site spacing significantly greater than 200 bases will result in a substantial proportion of DNA fragments that contain one binding site but not the other. For proteins that always co-occupy a given DNA region, but whose sites are separated by significantly more than 200 bases, the outcome from a standard SeqChIP experiment will appear to be partial co-occupancy, because the percentage of fragments containing both sites will be significantly lower than if the sites are close together. To investigate co-occupancy of two proteins whose binding sites are separated by several hundred bases, sonication times should be decreased so that the average fragment length after sonication is 1.5–2 kb, and PCR primers should be designed so that amplification products encompass both binding sites if possible. However, these modifications will result in higher background as well as lowered signal. These considerations do not reflect any inadequacy of the theoretical treatment of SeqChIP experiments described above, but rather the fact that proteins that associate with DNA sequences too far apart from one another will not often be found on the same DNA fragments generated by sonication or other fragmentation methods.
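The effect of site spacing can be illustrated with a deliberately simple geometric model. This is a rough sketch under assumptions that are not part of the original analysis: every fragment has the same length, breakpoints fall uniformly at random, and each binding site is treated as a single point. Under those assumptions, a fragment that contains one site also contains the other with probability 1 − d/L, for spacing d and fragment length L. The function name `fraction_spanning_both` is hypothetical.

```python
def fraction_spanning_both(spacing_bp, fragment_bp):
    """Fraction of fragments containing site 1 that also contain site 2.

    Idealized model (assumption): fixed fragment length, uniformly random
    fragment position, point-like binding sites. Real sonicated chromatin
    has a broad length distribution, so treat the result as a rough guide.
    """
    if fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    if spacing_bp < 0:
        raise ValueError("spacing must be non-negative")
    return max(0.0, 1.0 - spacing_bp / fragment_bp)


if __name__ == "__main__":
    # Compare typical fragments (~450 bp) with longer ones (~1750 bp).
    for fragment in (450, 1750):
        for spacing in (100, 200, 400):
            frac = fraction_spanning_both(spacing, fragment)
            print(f"fragment {fragment} bp, spacing {spacing} bp: {frac:.2f}")
```

In this model, sites 200 bases apart share only a little over half of 450-base fragments, whereas longer fragments retain most site pairs. This matches the qualitative argument above for reducing sonication when sites are far apart.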
To validate the above theoretical treatment of SeqChIP data, we performed co-occupancy experiments for situations related to the RNA polymerase II transcription machinery for which detailed biochemical and structural information is available. It has been strongly suggested that basic components of the Pol II machinery completely co-occupy promoters (17). In accord with this prediction, SeqChIP experiments involving pairwise combinations of TBP, TFIIA, TFIIB, and Pol II at the PGK1 promoter show complete co-occupancy in all cases tested, with an average C-value remarkably close to 100 (Figure 3A). As expected, TBP and TFIIB do not co-occupy a tRNA promoter transcribed by RNA Polymerase III (Pol III) since TFIIB binding to this promoter is <2-fold above background. On the other hand, no co-occupancy of Mot1 and TFIIA is observed at the PYK1 promoter when the SeqChIP is performed in either direction (Figure 3B; C = 0), even though both Mot1 and TFIIA exhibit considerable binding to this promoter individually. The lack of co-occupancy by Mot1 and TFIIA is in accord with their functional antagonism in vitro and their competitive binding to the solvent-exposed surface of TBP (25). Finally, we observe varying degrees of partial co-occupancy between TBP and TAFs depending on whether the promoter is TAF-dependent or TAF-independent (17). C-values for TAF6–TBP and TAF12–TBP co-occupancies at TAF-dependent promoters range from 33 to 50, whereas C-values for TAF-independent promoters are 2- to 3-fold lower (Figure 3C). Partial co-occupancy between TBP and TAFs is expected, because distinct TBP complexes lacking TAFs can associate with promoters (18). Furthermore, the difference in TBP–TAF co-occupancy values at TAF-dependent and TAF-independent promoters is consistent with previous genetic and molecular observations (17), and it provides strong evidence for a direct correlation between the magnitude of C and the extent of co-occupancy. More extensive SeqChIP analyses of the co-occupancy behaviors of TBP, TFIIB, TFIIA, Pol II, TAFs and Mot1 demonstrate that cellular stress alters the transcriptional properties of promoter-bound Mot1–TBP complexes, and are presented elsewhere (16).
Figure 3 The three predicted outcomes of SeqChIP: full co-occupancy, no co-occupancy and partial co-occupancy. (A) Full co-occupancy of TFIIA, TFIIB, TBP and Pol II at the PGK1 promoter. (B) No co-occupancy of Mot1 and TFIIA at the PYK1 promoter.
Practical experimental considerations
To apply SeqChIP in a quantitative manner as described above, it is essential that the individual immunoprecipitations are efficient, and that the fold-enrichment for a given protein is equivalent when the immunoprecipitation is first or second. Ideally, both individual immunoprecipitations should yield ≥5-fold enrichment of target sites over non-target sites in order to unambiguously resolve instances of partial co-occupancy from cases of full (or no) co-occupancy. It is highly recommended to perform SeqChIP experiments in both forward (A-B) and reverse (B-A) directions. Although technically unnecessary in cases of complete co-occupancy, performing SeqChIP in both directions will provide compelling evidence of complete co-occupancy, and will permit unambiguous differentiation between states of partial and no co-occupancy in the event that A and B do not fully co-occupy the DNA fragment being studied.
To demonstrate that the first and second immunoprecipitations are equivalent, it is critical to have a positive control involving two proteins that are known or strongly suspected to completely co-occupy genomic regions and hence give C-values of 100 in the control experiment. It is best, although not always possible (or practical), to individually design positive control experiments for every protein analyzed by SeqChIP. In the examples described above, the positive controls for the partial co-occupancy of TBP and TAF6 are the complete TBP–TFIIB and TAF6–TAF12 co-occupancies at the same genomic sequences. While the actual positive controls are determined on a case-by-case basis, tightly associated polypeptides or subunits of multiprotein complexes as determined by biochemical studies will often be good choices. It is also possible to use some of the protein combinations described here and elsewhere (16) as general controls for the SeqChIP procedure, depending on the availability of antibodies or epitope-tagged strains. As discussed in the section on the nature of experimental background, control experiments involving sequential immunoprecipitations of the same protein are inappropriate, because re-purification of the same material normally does not change the fold-enrichment. In contrast, sequential immunoprecipitations involving two subunits of a multi-protein complex result in increased fold-enrichments as described by the theoretical treatment above, because the two purification steps involve different classes of DNA molecules due to the low crosslinking efficiency of the two proteins.