In the era of functional genomics, the challenge is to elucidate gene function, regulatory networks and signaling pathways
[1]. Since regulation of gene expression
in vivo mainly occurs at the transcriptional level, identifying the location of genetic regulatory elements is key to understanding the machinery that regulates gene transcription. A major goal of current genome research is to identify the locations of all gene regulatory elements, including promoters, enhancers, silencers, insulators and boundary elements, and to analyze their relationship to the current annotation of human genes
[2],
[3]. In recent years, many genome-wide strategies have been developed for identifying functional elements. However, no existing method has the resolution to identify all regulatory elements precisely, and none can be readily applied to the entire human genome. The classical method of mapping DNase I hypersensitive sites (DHSs) by Southern blotting has been used to identify many different types of genetic regulatory elements
[4], but it can only be applied to one small region of the genome at a time. Chromatin immunoprecipitation with microarray detection (ChIP-chip) can define the global locations of regulatory factors
[5],
[6],
[7], but it is better suited to studying known factors and requires high-quality ChIP antibodies. More recently, methods have been described that capture a library of chromatin with DNase I-digested ends and analyze it either by massively parallel signature sequencing (MPSS; DNase-seq) or by labeling and hybridization to tiled microarrays (DNase-chip)
[8],
[9]. Crawford et al. produced approximately 230,000 sequence tags and identified an estimated 20% of sites in their DNase-seq experiments
[10], while their DNase-chip strategy covered 1% of the genome
[11]. Boyle et al. mapped open chromatin using a DNA library from single DNase I cleavage ends and next-generation sequencing (NGS)
[12], while Sabo et al. generated a DNase I library of DNA fragments (<~1200 bp) released by two cleavage ‘hits’ occurring close to each other and identified DHSs using microarrays
[13],
[14].
The introduction of next-generation sequencing (NGS) technology is one of the major breakthroughs in recent genomics research
[15],
[16],
[17],
[18]. NGS generally requires a DNA library of short fragments (100–500 bp), so methods that generate large numbers of short DNA fragments are advantageous. We speculated that DNase I double-hit fragments of 100–300 bp would resist mechanical shearing better than longer fragments during DNase I digestion, which would help to lower background noise. In addition, such short fragments would be easy to purify and could be used for NGS library preparation, greatly simplifying both library preparation and sequencing.
In the present study, we enriched short DNA fragments (100–300 bp) released by DNase I digestion and generated a DNA library from human HeLa S3 cells. For convenience, we call this method the “Short DHS Assay”. We identified 83,897 DHSs from 10,505,607 DHS tag sequences with high sensitivity and specificity. By combining whole-genome data from the Short DHS Assay with expression microarray data, we detected a specific correlation between DHS location and gene expression. Our data suggest that the Short DHS Assay is straightforward and should be a valuable tool for preparing DNA libraries for the global identification of gene regulatory elements.