ChIP-Seq has become a standard tool for whole-genome mapping of histone modifications, transcription factors and chromatin-associated proteins1. However, the technique is plagued by inefficiencies at the ChIP and sequencing steps that translate into a requirement for large amounts of starting material, typically on the order of millions or tens of millions of cells. ChIP assays yield small amounts of DNA whose availability for downstream assays is further reduced by DNA damage introduced during fixation and fragmentation. Although locus-specific analysis has been achieved for small cell numbers by coupling ChIP with PCR or promoter arrays2–5, such assays do not achieve the comprehensiveness afforded by DNA sequencing approaches. Sequencing, however, requires relatively large amounts of DNA, in part because library preparation involves an inefficient ligation step that requires double-stranded DNA with intact ends. Direct sequencing of DNA without need for ligation or amplification6 can address some of these inefficiencies and interrogate small quantities of immunoprecipitated DNA. However, neither standard nor direct ChIP-Seq approaches have so far been successfully applied to small cell numbers. We therefore sought to develop a ChIP-Seq assay for limited numbers of cells using the Illumina Genome Analyzer, the dominant short-read sequencing platform.
First, we optimized the ChIP assay for small samples. We identified reproducible conditions for shearing small amounts of chromatin (Supplementary Fig. 1; Methods) and minimized background due to non-specific pull-down by reducing sample volumes and titrating quantities of antibody and beads (Supplementary Fig. 1; Methods). These improvements enabled us to effectively enrich for histone H3 lysine 4 trimethylation (H3K4me3) in a ChIP performed with 10,000 mouse embryonic stem (ES) cells, as judged by quantitative PCR (qPCR).
However, the modified ChIP procedure yielded scarce quantities of DNA, below the detection limit of fluorometry, and thus remained incompatible with standard ChIP-Seq protocols, which require several nanograms of ChIP DNA. We estimated that an H3K4me3 ChIP performed on 10,000 cells yielded ~10–50 picograms of DNA, and sought to develop a library preparation procedure compatible with these amounts. We initially pursued a strategy that coupled a random primer-based amplification procedure widely used in microarray studies7 with standard Illumina library preparation. However, when we sequenced the resulting libraries and integrated the reads into density maps, we observed many false-positive peaks, a conspicuous absence of signal over GC-rich regions and high numbers of unalignable reads that likely reflect primer artifacts.
We therefore implemented a series of strategies to overcome these problems. First, we tested modified random primers that carry bulky chemical groups or were designed to form secondary structures that prevent self-annealing. By using a random primer hairpin structure and implementing an exonuclease digestion step after priming, we were able to amplify small quantities of ChIP DNA while minimizing non-specific product (Supplementary Fig. 2; Online Methods). Second, we identified additives, cycling conditions and a specific polymerase enzyme that enabled faithful amplification of ChIP DNA, maintaining representation of GC-rich sequences (Online Methods). Finally, we digested the amplified DNA at a BciVI site introduced near the ends of the ChIP fragments to yield double-stranded products with 3′ A overhangs that could be ligated directly to Illumina adapters for sequencing.
Preparation of sequencing library from low amounts of ChIP DNA
We validated this protocol with qPCR on the sequencing libraries because the scarce starting ChIP samples could not themselves be evaluated. This procedure quantifies short amplicons corresponding to positive- and negative-control genomic sites whose chromatin states are relatively invariant across cell types (Supplementary Fig. 1; Online Methods).
We deep sequenced a library prepared from an H3K4me3 ChIP performed on 10,000 ES cells. We obtained roughly 10 million 36-base reads. Initial alignments indicated that the first 9 bases of the sequencing reads had higher mismatch rates, possibly due to imperfect hybridization of random primers. Accordingly, subsequent alignments were performed using bases 10 to 36 of each read. Seven of the 10 million sequenced reads could be aligned to the mouse reference genome, as is typical for ChIP-Seq experiments. Aligned reads were processed into a density profile using standard procedures8. Visual analysis of the ChIP-Seq map generated from 10,000 ES cells suggested good concordance with a standard H3K4me3 ChIP-Seq map generated from roughly 10 million ES cells, with striking peaks at a majority of GC-rich promoters. A more quantitative analysis revealed roughly 93% overlap between a set of 11,193 H3K4me3-enriched promoters identified in the low cell number dataset and a set of 12,079 enriched promoters identified from the standard dataset. Considering the standard ChIP-Seq dataset as a gold standard, we estimate that the low cell number ChIP-Seq dataset has a sensitivity of ~80% at a specificity of ~90% (Online Methods).
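The two computational steps in this paragraph, trimming reads to bases 10 to 36 and scoring one promoter set against a gold standard, can be sketched as follows. This is a minimal illustration only: the function names, the 1-based inclusive coordinate convention and the idea of scoring over a fixed universe of annotated promoters are assumptions made here, and the actual procedure is the one described in the Online Methods.

```python
def trim_read(seq, start=10, end=36):
    """Return bases start..end of a read (1-based, inclusive positions).

    With the defaults this drops the first 9 bases of a 36-base read.
    """
    return seq[start - 1:end]


def sensitivity_specificity(test_calls, gold_calls, universe):
    """Score test_calls against gold_calls over a universe of promoters.

    All three arguments are collections of promoter identifiers
    (hypothetical names; any hashable identifier works).
    """
    test_calls = set(test_calls)
    gold_calls = set(gold_calls)
    universe = set(universe)

    true_pos = len(test_calls & gold_calls)
    false_neg = len(gold_calls - test_calls)
    false_pos = len(test_calls - gold_calls)
    true_neg = len(universe - test_calls - gold_calls)

    # Guard against empty classes so the function never divides by zero.
    sens = true_pos / (true_pos + false_neg) if true_pos + false_neg else float("nan")
    spec = true_neg / (true_neg + false_pos) if true_neg + false_pos else float("nan")
    return sens, spec
```

Sensitivity is the fraction of gold-standard promoters recovered, and specificity is the fraction of promoters not enriched in the gold standard that are also not called in the test set. Both depend on the enrichment thresholds used to call promoters.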
Validation of low cell number ChIP-Seq maps
Next, we profiled chromatin in hematopoietic stem and progenitor cells isolated from mouse bone marrow and enriched for the immunophenotype Lineage− Sca-1+ c-Kit+ (LSK) (Supplementary Fig. 3; Online Methods). Using roughly 20,000 cells per assay, we profiled H3K4me3 and two other H3 modifications, lysine 27 trimethylation (H3K27me3) and lysine 36 trimethylation (H3K36me3). These experiments yielded between 13 and 19 million reads, of which roughly 75% could be aligned to the mouse reference genome. Several lines of evidence suggest that the resulting chromatin maps accurately reflect the true patterns of histone modification in the LSK population. H3K4me3 and H3K36me3 localize to gene promoters and gene bodies, respectively, consistent with known roles for these marks in transcriptional initiation and elongation9. Integration of published gene expression profiles for LSKs10 confirmed expected positive (H3K4me3, H3K36me3) and negative (H3K27me3) correlations between the modifications and the transcriptional status of corresponding genes. Finally, we confirmed the reproducibility of the method by profiling all three modifications in a second LSK population.
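One way to test for such positive or negative associations is a rank correlation between per-gene modification signal and expression level. The text does not state which statistic was used, so the Spearman coefficient below is an assumption chosen for illustration, and all function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(signal, expression):
    """Spearman rank correlation between two equal-length per-gene lists."""
    return pearson(average_ranks(signal), average_ranks(expression))
```

Under this sketch, an activating mark would be expected to give a coefficient above zero against expression and a repressive mark a coefficient below zero.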
The LSK chromatin profiles revealed several interesting findings of potential relevance to hematopoietic development. First, we observed a relatively small number of large H3K4me3 regions or ‘domains’ (>10 kb) that are distinct from the sharp peaks observed at most promoters. In ES cells, the largest H3K4me3 domains correspond to pluripotency genes such as Oct4 (ref. 8). In the hematopoietic progenitors, the largest H3K4me3 domains coincide with known hematopoietic regulators, including HoxB4, HoxA7, HoxA9, Runx1, Meis1 and Ikzf2 (Supplementary Table 1).
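A simple way to separate large domains from sharp peaks is to merge adjacent enriched intervals and keep only merged regions longer than 10 kb. The sketch below assumes enriched intervals are already available as (chromosome, start, end) tuples in half-open coordinates; the `max_gap` parameter and both function names are hypothetical and not taken from the paper.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge (chrom, start, end) intervals that overlap or lie within max_gap bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Extend the previous region instead of starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def call_domains(intervals, min_length=10_000, max_gap=0):
    """Return merged regions strictly longer than min_length bp."""
    return [
        region
        for region in merge_intervals(intervals, max_gap)
        if region[2] - region[1] > min_length
    ]
```

For example, two overlapping enriched intervals that together span 12 kb would be reported as one domain, whereas an isolated 2 kb promoter peak would not.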
Chromatin domains at developmental regulators in hematopoietic progenitors
To investigate the significance of the H3K4me3 domains, we examined chromatin maps for ES cells8. Roughly a third of loci with H3K4me3 domains in LSKs carry both H3K4me3 and H3K27me3 in ES cells. Such ‘bivalent domains’ have been correlated with developmental loci that are silent but poised for activation at later developmental stages11,12. Reasoning that this subset of loci could illuminate genes with critical functions in hematopoietic progenitors, we collated genes marked by bivalent domains in ES cells and H3K4me3 domains in LSKs. The resulting list includes 40 genes, 30 of which encode transcription factors or other developmental regulators with previously described functions in hematopoiesis (Supplementary Table 2).
Although initially described in ES cells, bivalent domains are also present in multipotent cells, including hematopoietic progenitors8,13,14. The LSK chromatin maps contain roughly 1,700 promoters with detectable H3K27me3 and H3K4me3. A majority of these promoters are also bivalent in ES cells, but they have variable patterns in CD4+ T-cells, a differentiated progeny of LSKs15 (Supplementary Fig. 4).
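The promoter classification implied here can be sketched as a lookup on two enrichment calls per promoter. The sketch assumes each promoter has already been scored as enriched or not for each mark; how enrichment is called, and every name used below, are hypothetical and not taken from the paper.

```python
from collections import Counter


def chromatin_state(k4_enriched, k27_enriched):
    """Label a promoter from its H3K4me3 and H3K27me3 enrichment calls."""
    if k4_enriched and k27_enriched:
        return "bivalent"
    if k4_enriched:
        return "H3K4me3 only"
    if k27_enriched:
        return "H3K27me3 only"
    return "neither"


def count_states(promoters):
    """Tally states for a dict of promoter name -> (k4_enriched, k27_enriched)."""
    return Counter(chromatin_state(k4, k27) for k4, k27 in promoters.values())
```

Running `count_states` on the same promoters in two cell types would show how many keep, gain or lose the bivalent label between them.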
Whereas essentially all promoters with H3K27me3 in ES cells also have high H3K4me3, H3K27me3-marked promoters in LSKs have a wide range of H3K4me3 levels. We used this opportunity to explore the significance of the bivalent state in an in vivo setting, specifically asking whether the level of H3K4me3 predicts the likelihood that a given promoter will be activated in differentiated progeny. We integrated our chromatin data with a compendium of gene expression profiles for differentiated hematopoietic cells15,16. We found that the amount of H3K4me3 at a given H3K27me3-marked promoter in LSKs significantly correlated with the number of differentiated cell types in which the corresponding transcript could be detected. Accordingly, master regulators of hematopoietic lineages, such as Pax5, have high levels of H3K27me3 and H3K4me3 in LSKs (Supplementary Fig. 5). In contrast, master regulators that function in non-hematopoietic lineages, such as MyoD, are enriched for H3K27me3, but not for H3K4me3 (Supplementary Fig. 5). Thus, the presence of bivalent chromatin in hematopoietic progenitors is associated with an increased likelihood of transcriptional induction during differentiation. Further study is needed to determine the causality of these chromatin structures and their relationship to transcriptional priming and to the developmental potential of progenitor cells17.
This ChIP-Seq approach should be generally applicable for characterizing chromatin landscapes in biologically and clinically important cell models that have been inaccessible due to inadequate sample size. Going forward, further reductions in cell numbers may be achievable by careful optimization of ChIP conditions. The library preparation scheme and accompanying validation procedure may also be useful for profiling transcription factors from relatively small samples. However, this application is likely to prove challenging due to variability among factors and affinity reagents and the exceedingly small DNA yields associated with such experiments.