High-throughput DNA sequencing approaches have enabled direct interrogation of chromatin samples from mammalian cells. We are beginning to develop a genome-wide description of nuclear function during development, but further data collection, refinement, and integration are needed.
Until recently, the primary motivation for DNA sequencing was a structural one. We wanted to sequence a specific molecule, perhaps a genome, to ‘read’ the information encoded in the DNA sequence. Determination of new genome sequences is still a major activity, and sequencing of additional copies of the same genome to identify variation has moved on apace. However, the availability of extensive genome sequences of the major model organisms, including human, and the development of second-generation high-throughput (HTP) methods and instrumentation for genome sequencing [2,3] have opened up a new area of application: the use of DNA sequencing as an analytic method. In this analytic mode, sequencing stands at the evolutionary pinnacle of a line of methods stretching back through DNA microarrays and Southern and Northern blotting to solution hybridisation. For any experimental sample containing nucleic acids, we want to determine its composition, preferably with accurate quantitation. Whereas earlier methods depended on nucleic acid hybridisation, which produces an analog signal one step removed from the sequence of the DNA molecules, sequencing determines the composition of a nucleic acid mixture precisely and is not restricted to the sequences represented on a microarray. Moreover, for complex nucleic acid samples, HTP sequencing generates gigabases of genome-wide sequence from each instrument run, providing quantitative digital information on an acceptable timescale and at a cost that makes coverage feasible even for large genomes.
Over the past 2 years, a number of reports have described the application of this new analytic approach to a variety of chromatin sample mixtures derived from mammalian cells. These data are beginning to give us a description of the complex regulatory processes that control the function of the genome in the cell nucleus.
Johnson et al. described the application of HTP sequencing to chromatin immunoprecipitated with an antibody to human neuron-restrictive silencer factor (NRSF) in the Jurkat T-cell line [chromatin-immunoprecipitation sequencing (ChIP-seq)]. They identified 1,946 NRSF-binding sites, associated with transcriptional repression, close to 1,020 genes. These included previously unidentified gene targets of NRSF repression. In addition, a significant number of NRSF sites were found to contain a relaxed form of the previously identified canonical NRSF-binding motif. Concurrently, Robertson et al. used ChIP-seq to identify ~41,000 and ~11,000 potential STAT1 targets in human HeLa S3 cells with and without γ-interferon stimulation, respectively, reflecting activation of STAT1 upon cytokine stimulation. Sensitivity relative to known STAT1 targets was estimated at 71%. Notably, Robertson et al. deployed almost an order of magnitude more sequence reads than Johnson et al. In both studies, approximately one-half to two-thirds of the sequences generated could be mapped uniquely to the genome. The success of both studies showed that ChIP-seq with antibodies to transcription factors (TFs) could be used to create genome-wide maps of TF-binding sites. Future mapping of many TFs in samples from multiple tissues will yield a rich map of regulatory sites across the genome.
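A sensitivity figure such as the 71% quoted for STAT1 can be read as the fraction of previously known binding sites recovered by the called peaks. The following is a minimal Python sketch of that calculation, assuming both known sites and called peaks are given as half-open (chromosome, start, end) intervals and that any overlap of one base or more counts as recovery. The function names and example coordinates are hypothetical, and Robertson et al. may have defined overlap differently.

```python
from typing import List, Tuple

# Hypothetical interval type: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def sensitivity(known_sites: List[Interval], peaks: List[Interval]) -> float:
    """Fraction of known sites overlapped by at least one called peak."""
    if not known_sites:
        raise ValueError("need at least one known site")
    recovered = sum(
        1 for site in known_sites if any(overlaps(site, p) for p in peaks)
    )
    return recovered / len(known_sites)


# Illustrative coordinates only, not data from either study.
known = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
peaks = [("chr1", 150, 250), ("chr2", 70, 90)]
print(sensitivity(known, peaks))  # 2 of the 3 known sites are recovered
```

The all-pairs scan is quadratic in the number of intervals. For genome-scale peak sets, sorting the intervals or using an interval tree would be the usual choice.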
An alternative approach to mapping regulatory sites, including promoters, enhancers, silencers, and insulators, is to identify sites of DNase 1 hypersensitivity. Boyle et al. used HTP sequencing of libraries from DNase 1-digested nuclei of primary human CD4+ T-cells, together with microarray data, to define ~95,000 sites on a genome-wide DNase 1 hypersensitivity map. Although many of the strongest DNase 1 sites are found at transcription start sites (TSSs), TSSs account for only 16% of all sites, and only 15.5% of sites lie outside genes. Virtually all of the TSSs of highly expressed genes are marked by DNase 1 hypersensitive sites (HSs). Strikingly, Boyle et al. found that background DNase 1 sensitivity away from HSs oscillated over the length of a single nucleosome with a period of one DNA double-helical turn (about 10 bp). Furthermore, well-positioned nucleosomes could be detected close to some HSs. DNase 1 HS maps will serve as a valuable backbone on which to overlay TF-binding sites identified by ChIP-seq.
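An oscillation with a period of one helical turn is the kind of signal that an autocorrelation reveals. The following is a minimal sketch, assuming background DNase 1 cleavage counts have already been aggregated into a per-base profile. The profile below is synthetic (a pure 10 bp cosine over ~147 bp, roughly one nucleosome length), not data from Boyle et al., and the function names are hypothetical.

```python
import math
from typing import Sequence


def autocorrelation(signal: Sequence[float], lag: int) -> float:
    """Normalised autocorrelation of the mean-centred signal at the given lag."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    denom = sum(c * c for c in centred)
    if denom == 0:
        return 0.0  # a flat profile has no periodicity to detect
    num = sum(centred[i] * centred[i + lag] for i in range(n - lag))
    return num / denom


def dominant_period(signal: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Lag (in bp) with the highest autocorrelation within [min_lag, max_lag]."""
    return max(range(min_lag, max_lag + 1), key=lambda k: autocorrelation(signal, k))


# Synthetic per-base cleavage profile over ~one nucleosome length (147 bp) with
# an assumed 10 bp period; real input would be aggregated read-end counts.
profile = [1.0 + math.cos(2 * math.pi * i / 10) for i in range(147)]
print(dominant_period(profile, 5, 15))  # 10
```

The lag window (5 to 15 bp here) is an assumption chosen to bracket one helical turn. Widening it would also admit multiples of the period, such as 20 bp.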
ChIP-seq has also been applied to analysis of the distributions of modified histones across the genome. Barski et al. analysed the enrichment of 20 different histone methylations, as well as RNA polymerase II, CCCTC-binding factor (CTCF), and the histone variant H2AZ, in CD4+ T-cells. In addition to the previously described association of H3K4me1, H3K4me2, H3K4me3, and H3K36me3 with gene activation, monomethylation of H3K27, H3K9, H4K20, H3K79, and H2BK5 was linked to active genes. In contrast, trimethylation of H3K27 and H3K9 was associated with repressed genes and heterochromatin formation. CTCF was found to mark the boundaries of active and repressed histone modification domains. Because the Barski et al. ChIP-seq protocol used micrococcal nuclease digestion to prepare mononucleosomes, the sequence data could be analysed on a strand-specific basis to identify nucleosome positions in promoters with extremely fine resolution. Further analysis of additional histone methylations and acetylations in CD4+ T-cells indicated that many of the modifications are strongly correlated. For instance, a module of 17 histone modifications appears to be common at promoters. Mikkelsen et al. also generated genome-wide maps of chromatin modifications in embryonic stem and lineage-committed cells and observed that actively transcribed genes are marked by H3K4me3 at their promoters and H3K36me3 along their transcribed length. H3K27me3 marked either transcriptionally repressed regions or, when it occurred together with H3K4me3 at a promoter, genes poised for expression along future developmental paths. Subsequently, the H3K4me3/H3K36me3 signature of actively transcribed genes has been used to identify novel multi-exonic non-coding RNA genes. A further advantage of the sequencing approach was revealed by Mikkelsen et al., who were able to identify allele-specific histone modification patterns using single-nucleotide polymorphisms.
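Micrococcal nuclease leaves mononucleosome-sized fragments, and each single-end read reports one end of its fragment. Plus-strand and minus-strand reads can therefore be shifted towards the fragment centre to estimate the nucleosome dyad. The following is a minimal sketch of that idea, assuming reads are given as 5′-end genome positions with a strand label and assuming a fixed fragment length of about 147 bp. The shift value and function names are hypothetical, and the analysis in the cited work may differ.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed mononucleosome fragment length of ~147 bp, so the dyad lies
# about 73 bp in from either end of the fragment.
HALF_FRAGMENT = 73


def dyad_counts(reads: Iterable[Tuple[int, str]]) -> Counter:
    """Map (five_prime_position, strand) reads to estimated dyad positions.

    Plus-strand reads begin at the left end of the fragment, so the dyad is
    shifted rightwards. The 5' end of a minus-strand read is the fragment's
    right end, so the dyad is shifted leftwards.
    """
    dyads: Counter = Counter()
    for pos, strand in reads:
        if strand == "+":
            dyads[pos + HALF_FRAGMENT] += 1
        elif strand == "-":
            dyads[pos - HALF_FRAGMENT] += 1
        else:
            raise ValueError(f"unknown strand {strand!r}")
    return dyads


# Two reads from opposite ends of one 147 bp fragment spanning 100..246,
# plus one read from a different fragment.
reads = [(100, "+"), (246, "-"), (300, "+")]
print(dyad_counts(reads).most_common(1))  # [(173, 2)]
```

Reads from the two strands of the same fragment converge on a single dyad estimate. A well-positioned nucleosome would then appear as a sharp peak in these counts.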
The application of HTP sequencing in ChIP-seq and DNase 1-seq is now extending to other TFs and to other cells and tissues. The next step will be to integrate the different datasets with each other and with other HTP sequencing approaches, including DNA methylation analysis and sequencing of cellular transcriptomes [14,15], to provide a comprehensive view of the regulation of transcription at a chromatin-wide level. Further ahead, it may be possible, together with HTP data on three-dimensional interactions within chromatin, to model the regulatory structure of the nucleus. Comparing these comprehensive chromatin-state maps between tissues and along developmental pathways should further our understanding of regulation in development and disease.
The application of HTP sequencing to these complex samples is not without caveats. Substantial algorithmic development is still under way, both to map the sequence reads efficiently and to score sites of enrichment against an untreated control (non-immunoprecipitated chromatin, or ‘input’, for ChIP-seq). Moreover, even for the most comprehensively sequenced genome, not all sequences are represented in the reference, including unknown amounts of certain repetitive sequences; careful filtering of the data is therefore required. Improvements in sequencing technologies, including higher throughput and longer sequence reads, will help to address these issues as well as increase productivity and cost efficiency. ChIP-seq, however, remains limited by the availability of suitable antibodies. In addition, future technologies that sequence single molecules without amplification may allow certain applications to work with very few cells, overcoming the current restriction of analyses to averages over broad populations of cells. In any case, it is not hard to imagine a time when an HTP sequencer will be located in every functional genomics laboratory.
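Scoring enrichment against input is often framed as a count test in each genomic window. The following is a minimal sketch, assuming read counts have already been tallied in fixed windows and are Poisson-distributed. The depth scaling and genome-wide background floor are simplifications of the kind used by several peak callers, not the method of any study discussed here, and all names are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) derived from P(X = i)
    return max(0.0, 1.0 - cdf)


def enrichment_pvalue(chip_count: int, input_count: int,
                      chip_total: int, input_total: int,
                      background_lambda: float) -> float:
    """Poisson p-value for ChIP reads in one window versus depth-scaled input.

    The expected count is the input count rescaled to the ChIP library size,
    floored at an assumed genome-wide background rate so that windows with
    sparse input are not over-called.
    """
    scaled_input = input_count * chip_total / input_total
    lam = max(scaled_input, background_lambda)
    return poisson_sf(chip_count, lam)


# Equal library sizes: 20 ChIP reads in a window where input predicts 10.
p = enrichment_pvalue(20, 10, 10_000_000, 10_000_000, background_lambda=2.0)
print(f"{p:.4f}")
```

This direct summation suits the modest per-window counts shown here. Very large expected counts (where `math.exp(-lam)` underflows) or very small p-values would call for a log-space or library routine such as `scipy.stats.poisson.sf`, together with multiple-testing correction across windows.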
The author thanks John Collins and Stephan Graf for their comments on the draft of the manuscript.
The electronic version of this article is the complete one and can be found at: http://F1000.com/Reports/Biology/content/1/32
The author declares that he has no competing interests.