In contrast to changes in protein-coding sequences, the significance of noncoding DNA variation in human disease has been minimally explored. A recent torrent of genome-wide association studies suggests that noncoding variation represents a significant risk factor for common disorders, but the mechanisms by which it contributes to disease remain largely obscure. Distant-acting transcriptional enhancers - a major category of functional noncoding DNA - are likely involved in many developmental and disease-relevant processes. Genome-wide approaches for their discovery and functional characterization are now available and provide a growing knowledgebase for the systematic exploration of their role in human biology and disease susceptibility.
Multiple lines of evidence indicate that important functional properties are embedded in the noncoding portion of the human genome, yet identifying and defining these features remains a major challenge. An initial glimpse of the magnitude of functional noncoding DNA was derived from comparative analysis of the first available mammalian genomes (human and mouse) which indicated that less than half of the evolutionarily constrained sequences in the human genome encode for proteins 1, a notion that was further reinforced when additional vertebrate genomes became available for comparative genomic analyses 2.
The overall impact of these presumably functional noncoding sequences on human biology was initially unclear. A considerable urgency to define their locations and functions came from a growing number of known associations of noncoding sequence variants with common human diseases. Specifically, genome-wide association studies (GWASs) have revealed a large number of disease susceptibility regions that do not overlap protein-coding genes, but rather map to noncoding intervals. For example, a 58 kb linkage disequilibrium block located at human chromosome 9p21 was shown to be reproducibly associated with an increased risk for coronary artery disease, yet the risk interval lies more than 60 kb away from the nearest known protein-coding gene 3,4. To estimate the global contribution of variation in noncoding sequences to phenotypic and disease traits, we performed a meta-analysis of ~1200 SNPs identified as the most significantly associated variants in GWASs published to date (www.genome.gov/26525384, accessed on March 2, 2009). Using conservative parameters that tend to overestimate the size of linkage disequilibrium blocks (details available upon request), we find that in 40% of cases (472 of 1170) no known exon overlaps either the linked SNP or its associated haplotype block, suggesting that in more than a third of cases noncoding sequence variation causally contributes to the traits under investigation.
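The overlap test behind such an estimate can be illustrated with a minimal sketch. The code below is not the analysis pipeline used for the meta-analysis (whose parameters are available only on request); it assumes that linkage disequilibrium blocks and exons are given as half-open intervals, and all coordinates, as well as the names `overlaps` and `noncoding_fraction`, are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates: start included, end excluded.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def noncoding_fraction(ld_blocks: List[Interval], exons: List[Interval]) -> float:
    """Fraction of LD blocks that overlap no annotated exon."""
    noncoding = sum(
        1 for block in ld_blocks
        if not any(overlaps(block, exon) for exon in exons)
    )
    return noncoding / len(ld_blocks)


# Hypothetical toy data, not taken from the GWAS catalog.
exons = [("chr1", 1_000, 1_200), ("chr2", 5_000, 5_300)]
ld_blocks = [
    ("chr1", 900, 1_050),        # overlaps the chr1 exon
    ("chr1", 50_000, 108_000),   # intergenic block
    ("chr2", 5_300, 9_000),      # starts exactly where the chr2 exon ends: no overlap
    ("chr3", 10, 500),           # chromosome without annotated exons
]

print(noncoding_fraction(ld_blocks, exons))  # 3 of 4 toy blocks are exon-free
print(round(472 / 1170, 3))                  # proportion reported in the text
```

Because overestimating block size can only add exon overlaps, a block counted as exon-free under such conservative parameters remains exon-free under tighter ones, so the noncoding fraction obtained this way is a lower bound.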
One possibility that could explain these GWAS hits is that the noncoding intervals contain enhancers, a category of gene regulatory sequences that can act over long distances. A simplified view of our current understanding of the role of enhancers in regulating genes is summarized in Figure 1. The docking of RNA polymerase II to proximal promoter sequences and transcription initiation are fairly well characterized; in contrast the mechanisms by which insulator and silencer elements buffer or repress gene regulation, respectively, are less well understood 5. Transcriptional enhancers represent regulatory sequences that can be located upstream, downstream or within their target gene and can modulate expression independent of their orientation 6. In vertebrates, enhancer sequences are thought to represent densely clustered aggregations of transcription factor binding sites 7. When appropriate occupancy of transcription factor binding sites is achieved, recruitment of transcriptional co-activators and chromatin remodeling proteins occurs. The resulting protein aggregates are thought to facilitate DNA looping and ultimately promoter-mediated gene activation. In-depth studies of individual genes such as APOE or NKX2-5 (reviewed in ref. 8) have shown that many genes are regulated by complex arrays of enhancers, each driving distinct aspects of the mRNA expression pattern. These modular properties of mammalian enhancers are also supported by their additive regulatory activities in heterologous recombination experiments 9.
The purely genetic evidence from GWASs does not allow any direct inferences regarding the underlying molecular mechanisms, but a number of in-depth studies of individual loci (see below) suggest that variation in distant-acting enhancer sequences and the resulting changes in their activities can contribute to human disorders. While we clearly expect a variety of other noncoding functional categories such as negative gene regulators or noncoding RNAs to play a role in human disease, in this review we will focus on the role of enhancers and on strategies to define their location and function genome-wide.
Beginning with the discovery that an inherited amino acid change in the beta-globin gene causes sickle-cell anemia 10,11, thousands of coding mutations in genes responsible for monogenic disorders were identified over the past half-century. In sharp contrast, the role of mutations not involving primary gene structural sequences has been minimally explored, largely due to the inability to recognize relevant noncoding sequences, let alone predict their function. Molecular genetic identification of individual enhancers involved in disease has been in most cases a painstaking and inefficient endeavor. Nevertheless, a number of successful studies have elegantly shown that distant-acting gene enhancers exist in the human genome and variation in their sequences can contribute to disease. In this section, we discuss three examples where enhancers were directly demonstrated to play a role in human disease: thalassemias resulting from deletions or rearrangements of beta-globin (HBB) enhancers, preaxial polydactyly resulting from Sonic hedgehog (SHH) limb enhancer point mutations, and susceptibility to Hirschsprung disease associated with a RET proto-oncogene enhancer variant.
The extensive studies of the human globin system and its role in hemoglobinopathies have historically not only served as a test bed for defining the role of coding sequences in disease 10,11, but also for that of noncoding sequences. Alpha- and beta-thalassemias are hemoglobinopathies resulting from imbalances in the alpha- to beta-globin chain ratios in red blood cells. The molecular basis for these conditions was initially elucidated in those cases where inactivation or deletion of globin structural genes could be readily identified 12. However, while gene deletion or sequence changes resulting in a truncated or nonfunctional gene product explained some thalassemia cases, for a subset of patients intensive sequencing efforts failed to reveal abnormalities in globin protein coding sequences. Through the extensive long-range mapping and sequencing of DNA from individuals diagnosed with thalassemia but lacking globin coding mutations, it was eventually discovered that many of these globin chain imbalances were due to deletions or chromosomal rearrangements which resulted in the repositioning of distant-acting enhancers required for normal globin gene expression 13,14. These early molecular genetic studies revealed a clear role for noncoding regulatory elements as a cause of human disorders through their impact on gene expression. Since then multiple such examples of “position effects”, defined as a change in the expression of a gene when its location in a chromosome is changed, often by translocation, have been uncovered 15.
In addition to the pathological consequences of the removal or the repositioning of distant-acting enhancers, there are also examples of single nucleotide changes within enhancer elements as a cause of human disorders. One example of this category of disease-causing noncoding mutations involves the limb-specific ZRS (also known as MFCS1) long-distance enhancer of Sonic hedgehog (SHH) (Figure 2). This enhancer is located at the extreme distance of approximately one million base-pairs from SHH within the intron of a neighboring gene 16,17. Of interest is the fact that initially the gene in which the enhancer resides was thought to be relevant for limb development based on mouse studies and was therefore named limb region 1 (LMBR1) 18. Facilitated by the functional knowledge of the ZRS enhancer from mouse studies, targeted resequencing screens of this enhancer in humans revealed that it is associated with preaxial polydactyly. Approximately a dozen different single nucleotide variations in this regulatory element have been identified in humans with preaxial polydactyly and segregate with the limb abnormality in families 17,19. Studies of the impact of the human ZRS sequence changes have been carried out in transgenic mice where the single nucleotide changes result in ectopic anterior limb expression during development, consistent with preaxial digit outgrowth 20. Furthermore, sequence changes in the orthologous enhancers were found in mice as well as cats with preaxial polydactyly 21,22, and targeted deletion of the enhancer in mice causes truncation of limbs 16. These elegant studies illustrate the importance of first experimentally identifying distant acting enhancers to enable subsequent human genetic studies to explore the potential role of disease-causing mutation in functional noncoding sequences.
Another example of enhancer variation contributing to human disease is provided by the discovery of a common noncoding variant linked to disease susceptibility in Hirschsprung disease (HSCR). While multigenic, HSCR disease risk is strongly linked to coding mutations in the RET proto-oncogene 23,24. However, familial studies have also revealed evidence for HSCR disease linked to the RET locus but lacking any accompanying functional RET coding mutations. Through the use of multi-species comparisons of orthologous genomic intervals including and flanking RET coupled with in vitro and in vivo functional studies, an enhancer sequence located in intron 1 of RET was identified and found to contain a common variant contributing greater than a 20-fold increased risk for HSCR disease compared to rarer alleles in this element 25,26. In transgenic mice, this enhancer was shown to be active in the nervous system and digestive tract during embryogenesis in a way consistent with its putative role in HSCR 26. It is interesting to note that while this enhancer variation is clearly important in disease risk, the variant alone is not sufficient to cause HSCR, highlighting the complex etiology of this disorder.
As is evident from these labor-intensive gene-centric studies, enhancers can in principle play an important role in disease, but it remains unclear whether they represent rare exceptions or if variation in enhancers contributes to disease on a pervasive scale. Support for the latter comes from a rapidly growing number of examples where noncoding SNPs linked to disease traits through GWASs were found to affect the expression levels of nearby genes 27, suggesting that variation in regulatory sequences may commonly contribute to a wide range of disorders. The results of the recent GWASs, coupled with the role of gene regulation in normal human biology, provide a strong incentive for defining the distant-acting enhancer architecture of the human genome.
Gene-centric studies have been crucial for defining general characteristics of gene regulatory regions in specific human disorders but have only identified and characterized a limited number of such elements. Systematic large-scale identification of sequences that are likely to be enhancers was first enabled by comparative genomic strategies. These approaches are based on the assumption that the sequences of gene regulatory elements, like those of protein-coding genes, are under negative evolutionary selection because most changes in functional sequences have deleterious consequences 28–31. Thus, it was hypothesized that statistical measures of evolutionary sequence constraint would provide a way to identify potential enhancer sequences within the vast amount of noncoding sequence in the human genome. Support for this approach initially came from retrospective comparative genomic analyses of experimentally well-defined enhancers revealing that they frequently shared sequence conservation with orthologous regions present in the genomes of other mammals. The observation that DNA conservation identified many of these complex regulatory elements encouraged investigators to move from blind studies of regions flanking genes of interest to focusing specifically on noncoding sequences constrained across vertebrate species, culminating in whole-genome studies where conservation level alone guided experimentation 31–33.
Initially, comparisons across extreme evolutionary distances, such as between human and fish, were deemed most effective for this purpose 28,30. Indeed, it was observed through large-scale transgenic mouse and fish studies that many of these noncoding sequences that had been conserved for hundreds of millions of years of evolution were enhancers that drove expression to highly specific anatomical structures during embryonic development. Likewise, so-called “ultraconserved” noncoding elements, which are blocks of 200 bp or more that are perfectly conserved between human and rodents 34, were also found to be highly enriched in tissue-specific enhancers, suggesting that the success rate of comparative approaches for enhancer identification depends on scoring criteria, rather than just evolutionary distance 31. This notion was further supported by the development of statistical tools specifically for this purpose, from which it became evident that even comparisons between relatively closely related species can be effective predictors of enhancers 2,35,36. A large-scale transgenic mouse study that included nearly all non-exonic ultraconserved elements in the human genome revealed that while many of them are developmental in vivo enhancers, other noncoding conserved sequences that are under similar evolutionary constraint, but less than perfectly conserved between human and rodents, are equally enriched in enhancers 32. These results suggest that ultraconserved elements do not represent a functionally distinct subgroup of conserved noncoding sequences regarding their enrichment for in vivo enhancers, but rather that there is a much larger number of noncoding sequences that are under similar evolutionary constraint and just as enriched in enhancers as ultraconserved elements.
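The ultraconservation criterion (200 bp or more of perfect identity) is simple enough to state as code. The following minimal sketch assumes a pairwise alignment supplied as two equal-length strings with `-` marking gaps; the function name `identical_runs` and the toy sequences are hypothetical, and the sketch covers only the scoring rule, not the whole-genome alignment procedure used in the cited studies.

```python
from typing import List, Tuple


def identical_runs(seq_a: str, seq_b: str, min_len: int = 200) -> List[Tuple[int, int]]:
    """Return half-open (start, end) alignment-column ranges in which two aligned
    sequences are identical, gap-free and unambiguous for at least min_len columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    runs = []
    run_start = None
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        # Gaps and ambiguous bases (e.g. N) break a run even if both rows agree.
        match = a == b and a in "ACGT"
        if match and run_start is None:
            run_start = i
        elif not match and run_start is not None:
            if i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the end of the alignment.
    if run_start is not None and len(seq_a) - run_start >= min_len:
        runs.append((run_start, len(seq_a)))
    return runs


# Hypothetical toy alignment: 240 identical columns, one mismatch, 40 identical columns.
human = "ACGT" * 60 + "A" + "ACGT" * 10
rodent = "ACGT" * 60 + "G" + "ACGT" * 10
print(identical_runs(human, rodent))
```

A single mismatch is enough to split a run under this rule, which illustrates why elements under comparable constraint can fall just outside the ultraconserved category while being equally enriched in enhancers.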
Independent of the specific algorithms and metrics that were used, most categories of conserved noncoding sequences were found to be not randomly distributed in the genome. Instead, they are located in a highly biased manner near genes active during development 2,32–34, consistent with the observation that a large fraction of these noncoding sequences give robust positive signals in various assays as tissue-specific in vivo enhancers active during development.
Comparative approaches are an effective high throughput genomic strategy for identifying noncoding sequences with a high likelihood of being an enhancer, but they suffer from several limitations. First, while conservation is indicative of function, it is not necessarily indicative of enhancer activity because many other types of noncoding functional elements are known to exist that may have similar conservation signatures. Second, even when conserved noncoding DNA is due to enhancer function, conservation cannot predict when and where an enhancer is active in the developing or adult organism. For all identified candidates experimental studies are needed to decipher the gene regulatory properties of each element and these transgenic studies cannot feasibly be scaled to generate truly comprehensive genome-wide datasets.
A perplexing finding that called into question the importance of extremely conserved enhancers was the lack of an apparent phenotype upon targeted deletion of four independent ultraconserved elements in mice 37. General expectations were that noncoding sequences perfectly conserved in mammals for tens of millions of years must be essential and their deletion should result in severe phenotypes, comparable to those observed upon deletion of the Shh limb enhancer and other less well-conserved enhancers 8,16. However, mice with deletions of such ultraconserved enhancers were viable, fertile and showed no overt phenotype 37. Interpretations of this lack of obvious effects are similar to those for absence of phenotypes upon deletion of highly conserved protein-coding genes: minor phenotypes may have escaped detection in the assays used, other genes or enhancers may provide functional redundancy, or reductions in fitness may only become apparent over multiple generations or may not be easily detected in a controlled laboratory environment. This study highlighted that while extreme noncoding sequence conservation is an effective predictor of the location of enhancers in the genome, the degree of evolutionary constraint is not directly correlated with the severity of anticipated phenotypes.
As a complementary strategy to comparative genomic methods, it has recently become possible to generate genome-wide maps of chromatin marks that can be used to identify the location of enhancers and other regulatory regions. These genomic approaches have been enabled by (a) an improved understanding of the proteins and epigenetic marks found at particular categories of regulatory elements and (b) concurrently developed technologies that allow traditional chromatin-immunoprecipitation techniques to be applied on the scale of whole vertebrate genomes. In particular, the initial in-depth studies of 1% of the genome in the ENCODE pilot project, largely based on datasets generated by the ChIP-chip technique (Box 1), revealed molecular properties of a variety of regulatory elements. With respect to enhancer identification, a particularly relevant insight was the identification of specific methylation signatures found at enhancers. In contrast to promoters, which are marked by trimethylation of histone H3 at lysine residue 4 (H3K4me3), active enhancers are marked by monomethylation (H3K4me1) at this position 38. Mapping these marks in the ENCODE regions and, more recently, throughout the entire genome 39 revealed tens of thousands of elements that were predicted to be active enhancers in the examined cell types. Importantly, these predicted enhancers were also frequently associated with the transcriptional coactivators p300 and/or TRAP220, raising the possibility that such coactivators might represent useful general markers for mapping enhancers. While it was initially not clear to what extent the presence of transcriptional coactivators like p300 is indicative of active vs. inactive enhancers, comparison of DNaseI hypersensitivity (DNaseI HS, a marker of open chromatin structure) in several cell lines throughout the ENCODE regions revealed that the location of cell line-specific distal DNaseI HS sites correlates with cell line-specific p300 binding at these sites, providing further support for the possibility that transcriptional coactivators, along with histone modification signatures, may be useful for mapping of DNA elements with cell- and tissue-specific enhancer activities 40.
Formaldehyde cross-linking of DNA to proteins that bind to it directly or as part of larger complexes 70, combined with subsequent immunoprecipitation targeting specific DNA-associated proteins (ChIP, 71), was widely used in the pre-genomic era to study protein-DNA interactions directly in living cells. The technique involves the molecular fixation of non-covalent protein-DNA interactions, shearing of the cross-linked chromatin, immunoprecipitation with an antibody binding the protein (or protein modification) of interest, and subsequent quantitation of enrichment of the associated DNA fragments compared to non-immunoprecipitated (“input”) DNA. While useful to examine protein-DNA interactions at individual hypothesized binding locations, the need for quantitation at every single site of interest initially thwarted the application of this technique on a genomic scale. The introduction of DNA microarrays enabled hybridization-based quantitation of large numbers of candidate sites in parallel (“ChIP-on-chip” or “ChIP-chip”), thus making it possible to screen in a single experiment entire compact model organism genomes 72,73 or large vertebrate genome intervals 74 (Figure 3). This technique was used on a massive scale in the Encyclopedia of DNA Elements (ENCODE) pilot project, where dozens of proteins and protein modifications were initially mapped in a representative 1% portion of the human genome 57.
Recently, chromatin immunoprecipitation coupled to massively-parallel sequencing (ChIP-seq) has become increasingly utilized as an alternative to ChIP-chip 42–45. The ChIP-seq method is very similar to the experimental setup of ChIP-chip, except that in the final step, massive-parallel sequencing techniques are used to determine the sequence of immunoprecipitated DNA fragments, which are then computationally mapped to the reference genome (Figure 3). Improved sequencing technologies offer the possibility to obtain millions of mappable reads in a single ChIP-seq experiment at moderate cost. The results from ChIP-seq are based on statistical analysis of read counts, which overcomes many of the challenges associated with the quantitation and normalization of hybridization signals, and an increasing number of advanced computational ChIP-seq analysis tools are becoming available 75. ChIP-seq analysis covers by default the entire mappable portion of the reference genome without the need to restrict the analysis to its subregions.
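To make the idea of a statistical analysis of read counts concrete, the sketch below scores one genomic window by comparing its ChIP read count with the count expected from the input sample under a Poisson model. This is one common simplification and not the method of any particular tool cited here; the function names, the library-size scaling and the pseudocount of one input read are assumptions made for illustration.

```python
import math
from typing import Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    # Probability mass at k, computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Keep adding terms until they are past the mode and negligible.
    while term > 0.0 and (i <= lam or term > total * 1e-15):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def window_enrichment(chip_reads: int, input_reads: int,
                      chip_total: int, input_total: int) -> Tuple[float, float]:
    """Fold enrichment and Poisson p-value for one window.

    Assumptions (hypothetical choices): input counts are scaled by the ratio of
    library sizes, and a pseudocount of one input read avoids a zero expectation.
    """
    scale = chip_total / input_total
    expected = max(input_reads, 1) * scale
    fold = chip_reads / expected
    return fold, poisson_sf(chip_reads, expected)


# Hypothetical counts for two windows in libraries of equal depth.
print(window_enrichment(40, 5, 10_000_000, 10_000_000))  # strongly enriched window
print(window_enrichment(5, 5, 10_000_000, 10_000_000))   # background-like window
```

Because every mappable window is scored the same way, nothing in this calculation requires choosing candidate regions in advance, which is the practical difference from array-based quantitation.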
Thanks to the development of the ChIP-seq technique (Box 1), which has now superseded ChIP-chip as the method of choice for many applications, genome-wide maps for a considerable number of chromatin marks and transcription factors both in human and mouse have become available 41–53. In addition to the H3K4me1/3 signature discussed above, these datasets enabled the identification of additional chromatin marks present at predicted or validated enhancers and provided a refined view of their correlation to enhancer activities 42,49,53. However, with very few exceptions (e.g., references 48,52) genome-wide mapping of these and other regulation-associated chromatin marks (Table 1) was done in immortalized cell lines, cultured stem cells or primary cell cultures. Thus, the maps of potentially enhancer-associated marks produced by these studies provided limited insight into their in vivo distribution during embryonic development and in adult organs, likely concealing the genomic location of enhancers that are inactive in these cells.
In a recent ChIP-seq study targeted at the prediction of enhancers that are active in a particular tissue during embryonic development, the transcriptional coactivator p300 was mapped in chromatin directly derived from embryonic mouse tissues including the forebrain, the midbrain, and the limb buds 54. Overall, several thousand p300 peaks were identified from these three tissues, with the vast majority of genome regions only being significantly enriched in one of the three tissues and located in noncoding regions distal from known promoters. Transgenic mouse experiments with close to a hundred of these sequences revealed that they are in almost all cases developmental enhancers. More importantly, the tissue-specific occupancy by p300 as identified by ChIP-seq could in most cases also accurately predict the in vivo patterns of expression driven by these enhancers, providing an important advantage over comparative genomic methods for enhancer identification. The study also showed that tissue-specific p300 peaks are globally enriched near genes that are expressed in the same tissue, again consistent with their hypothesized function as active transcriptional enhancers.
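The two filters described here, enrichment in only one tissue and location distal from known promoters, can be sketched as follows. This is an illustration rather than the procedure of the cited study: peaks are assumed to lie on a single chromosome and to be given as half-open intervals, the 2,500 bp promoter distance is a hypothetical threshold, and all coordinates are invented.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Peak = Tuple[int, int]  # (start, end) on one chromosome, half-open


def nearest_tss_distance(peak: Peak, sorted_tss: List[int]) -> int:
    """Distance from the peak midpoint to the closest transcription start site.
    Assumes sorted_tss is non-empty and sorted in ascending order."""
    mid = (peak[0] + peak[1]) // 2
    i = bisect_left(sorted_tss, mid)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(mid - tss) for tss in candidates)


def tissue_specific_distal(peaks_by_tissue: Dict[str, List[Peak]],
                           sorted_tss: List[int],
                           min_distance: int = 2_500) -> Dict[str, List[Peak]]:
    """Keep peaks that are promoter-distal and overlap no peak from another tissue."""
    result = {}
    for tissue, peaks in peaks_by_tissue.items():
        others = [p for t, ps in peaks_by_tissue.items() if t != tissue for p in ps]
        result[tissue] = [
            p for p in peaks
            if nearest_tss_distance(p, sorted_tss) >= min_distance
            and not any(p[0] < o[1] and o[0] < p[1] for o in others)
        ]
    return result


# Hypothetical toy data.
tss_positions = [10_000, 50_000]
peaks = {
    "forebrain": [(20_000, 20_500), (9_800, 10_300), (70_000, 70_400)],
    "midbrain": [(70_100, 70_600), (30_000, 30_400)],
    "limb": [(40_000, 40_300)],
}
print(tissue_specific_distal(peaks, tss_positions))
```

In the toy data, the forebrain peak at the promoter and the peak shared between forebrain and midbrain are discarded, leaving one candidate tissue-specific distal element per tissue.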
These experimentally predicted genome-wide sets of in vivo enhancers also made it possible to address the controversial issue of the extent to which evolutionary conservation is a hallmark of in vivo enhancers 55. Several studies have shown that highly conserved noncoding elements are enriched in developmental in vivo enhancers 31–33. However, some observations have challenged such a generalized correlation between sequence conservation and enhancer activity: (1) experimental analysis of individual loci suggested that a large proportion of enhancers cannot be detected by comparative genomics 56, (2) a surprisingly large fraction of sequences in the ENCODE regions whose molecular marks suggest regulatory functions were not or only weakly conserved 57, and (3) histone methylations present at orthologous loci in human and mouse did not correlate with overall increased levels of sequence conservation 58. In contrast to these findings, approximately 90% of the tissue-specific p300 peaks identified by ChIP-seq in developing mouse tissues overlapped regions that are under detectable evolutionary constraint 54. While there may be variation in the degree of evolutionary constraint of enhancers that are active in different types of cells or developing tissues, these data suggest that developmental enhancers that can be identified through p300-binding are commonly evolutionarily constrained.
While this field is still in its infancy, the selected studies reviewed here highlight the clear potential of mapping various chromatin marks for identifying and predicting the activity of transcriptional enhancers on a genome-wide scale. The continued progress in throughput and cost reductions of next-generation sequencing technologies offers an increasingly powerful genome-wide means for identifying specific DNA-protein interactions. We anticipate that high-resolution genome-wide in vivo maps of chromatin marks will become available for comprehensive series of developing and adult tissues in normal as well as disease states, providing multi-layered in vivo annotations of the noncoding portion of our genome. It is important to realize that despite this expected progress, we will continue to need parallel in vitro and in vivo biological studies to understand the functions associated with chromatin marks and to conclusively study the mechanisms by which sequence variation in distant-acting enhancers contributes to disease.
The methods described above have considerably improved our capability to identify enhancers and their associated activity patterns on a genomic scale, but a remaining important challenge will be to determine the relations between enhancers and genes. Comparing ChIP-chip or ChIP-seq data with transcriptome data from microarrays or RNA-seq 59 can provide highly suggestive clues as to the target gene of a given enhancer in a given tissue, but this comparison does not provide the direct evidence for enhancer-promoter interactions that would be desirable in order to map tissue-specific regulatory networks on a genomic scale.
Early circumstantial evidence suggested that long-distance regulation of genes by enhancers occurs through the formation of physical chromatin loops, yet it first became possible to study such interactions systematically through the introduction of the chromosome conformation capture (3C) assay and its derivative technologies 60. Similar to ChIP, the 3C approach relies on formaldehyde cross-linking to capture DNA-DNA interactions directly in intact cells or cell nuclei. Previously hypothesized pairs of interacting sites are subsequently tested and validated in a one-by-one fashion through the quantitation of cross-linking events. As one of many examples demonstrating the utility of 3C in the analysis of distant-acting vertebrate enhancers, Amano et al. 61 recently used this technique to study chromatin interactions at the Sonic hedgehog (Shh) locus, whose role in limb development we discussed in detail above. Using the 3C technique, the authors demonstrated elegantly that the limb-specific long-range enhancer located in an intron of the Lmbr1 gene directly interacts with increased frequency with the Shh promoter in limb buds but not in other tissues tested, providing important mechanistic support for its proposed role in Shh gene regulation in limb development. As an alternative approach to 3C, RNA-tagging and recovery of associated proteins (RNA-TRAP) can also be used to establish physical proximity between distal noncoding sequences and actively transcribed genes, which was first demonstrated in the mouse beta-globin locus 62.
This work and other gene-centric studies (for more examples see references 63,64) were critical to shape our understanding of enhancer-promoter interactions. However, they suffer from the fundamental limitation that only one or very few previously hypothesized interactions between specific loci can be assayed per experiment. This limitation was partially overcome through the use of microarrays to analyze entire 3C libraries (chromosome conformation capture-on-chip, 4C 65 and circular chromosome conformation capture, also called 4C 66). By applying this approach to fetal liver and brain, it was demonstrated that the beta-globin locus control region (LCR) makes reproducible tissue-specific contacts with other loci predominantly located on the same chromosome, yet in some cases dozens of megabases away from the LCR 65. Of possible relevance for adopting this approach for enhancer discovery, reproducible interactions with other chromosome regions were also observed in the brain where the LCR is thought to be inactive.
The 4C approaches represented a significant improvement, but they still preclude the generation of truly genome-wide interaction networks because each experiment only reveals the genome-wide interactions of a single site of interest. This problem is partially alleviated by the chromosome conformation capture carbon copy (5C) method 67 in which a complex 3C library generated through multiplexed PCR is analyzed by large-scale sequencing to generate a comprehensive “many-to-many” interaction map of DNA-DNA-interactions. However, due to the need for specific primers for each possible interacting fragment and the sequencing depth required for analysis of the resulting libraries, application of 5C has so far been restricted to the in-depth analysis of single loci or chromosome regions.
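The difference between one-to-one (3C), one-to-many (4C) and many-to-many (5C) readouts can be pictured as the difference between one cell, one row and the full matrix of a contact table. The sketch below builds such a table from a list of ligation junctions. It assumes that each junction has already been assigned to a pair of named restriction fragments; the fragment names, the counts and the function `contact_matrix` are hypothetical, and no normalization of the kind a real experiment would need is attempted.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def contact_matrix(pairs: Iterable[Tuple[str, str]],
                   fragments: List[str]) -> List[List[int]]:
    """Symmetric matrix of ligation-junction counts between named fragments."""
    index = {name: i for i, name in enumerate(fragments)}
    counts = Counter()
    for a, b in pairs:
        if a == b:
            continue  # self-ligation carries no interaction information
        i, j = sorted((index[a], index[b]))
        counts[(i, j)] += 1
    n = len(fragments)
    matrix = [[0] * n for _ in range(n)]
    for (i, j), c in counts.items():
        matrix[i][j] = matrix[j][i] = c
    return matrix


# Hypothetical junctions between three fragments.
fragments = ["enhancer", "promoter", "control"]
junctions = (
    [("enhancer", "promoter")] * 3
    + [("promoter", "enhancer")]
    + [("enhancer", "control")]
    + [("control", "control")]
)
matrix = contact_matrix(junctions, fragments)

one_to_one = matrix[0][1]   # 3C-style readout: a single hypothesized pair
one_to_many = matrix[0]     # 4C-style readout: all partners of one site
for row in matrix:          # 5C-style readout: every pair among the fragments
    print(row)
```

The number of cells grows with the square of the number of fragments, which gives a sense of why primer design and sequencing depth limit many-to-many maps to single loci or chromosome regions.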
As an alternative genome-wide approach, antibody-based methods might be used to restrict the analysis space for studying DNA-DNA interactions to a size that can be affordably analyzed by currently available sequencing technologies. Namely, it was suggested to couple a chromatin interaction paired-end tag sequencing (ChIA-PET) strategy to a ChIP step that enriches for chromatin fragments bound to a specific transcription factor or other chromatin mark of interest 63. While the technical feasibility of this approach remains to be demonstrated, it has remarkable potential for enhancer discovery. This is because its application to general enhancer-associated marks such as p300 or histone methylations 38,54 might identify, in a single step, enhancers active in a tissue of interest, as well as their respective target genes.
Genetic and medical resequencing studies have been empowered by knowledge about the structure of protein-coding genes and a detailed understanding of the relation between mRNA sequences and the primary structure of the proteins they encode. Through such studies, disease links have been established for a sizeable fraction of the ~20,000 protein-encoding genes in the human genome. In contrast, a very limited number of sequence changes in gene regulatory sequences could be linked to human disease. Consequently, an important impetus for functionally annotating the noncoding portion of the human genome and the cis-regulatory elements it contains is to assess the relationship between variations in noncoding sequences and human disease. In the absence of genome-wide catalogues of functionally annotated regulatory elements, their impact on human biology as well as disease will remain an untested hypothesis.
In this review, we have outlined how the number of annotated noncoding regulatory sequences is poised to dramatically expand through the continued progress of DNA sequencing technologies coupled with markers to assess higher order chromatin status. Nevertheless, functionally characterizing the distant-acting enhancer architecture of the human genome in its entirety will be an enormous undertaking due to the vast number of data points needed, which include dozens of tissues and cell types, as well as developmental and possibly disease states.
A further challenge will be to link distant-acting enhancers to the genes they regulate. Linking enhancers to their cognate gene will allow the further assigning of these functional sequences to their basic “gene” unit of heredity for collective resequencing analysis.
It is important to keep in mind that several categories of functional elements exist in the noncoding portion of the genome (e.g. enhancers, insulators, negative regulators, promoters, and non-coding RNAs). While this review focused on distant-acting enhancers, these other types of noncoding sequences will also be crucial targets for large-scale identification and characterization and it is expected that technologies similar to those described here for enhancers will make it possible to explore their roles in human biology and disease.
The authors wish to thank M. Blow, S. Deutsch, and A. Sczyrba for help with computational analysis of GWAS data and C. Attanasio for critical comments. L.A.P./E.M.R. were supported by the Berkeley-PGA, under the Programs for Genomic Applications, funded by National Heart, Lung, & Blood Institute, and L.A.P. by the National Human Genome Research Institute.