We present a rapid and unbiased
in vivo method to screen a large genomic fragment for enhancer activity. The high efficiency of lentiviral vector-mediated transgenesis
[26] enables testing of many sequences in a single experiment. Moreover, the method bypasses time-consuming mouse breeding because it does not require the generation and maintenance of transgenic lines and instead relies on the analysis of F0 embryos. The method described here substantially reduces the number of oocyte injections and foster mice, and thus increases throughput compared to single-construct injections
[19]. Our demonstration that injecting pools of up to 20 different lentiviral vectors leads to the successful identification of transcriptional enhancers means that this enhancer screen can be scaled up to cover up to megabases of DNA.
We have extensively screened a mouse BAC for enhancer activity, with over 74% of the total sequence tested. To our knowledge, this is the first broad unbiased (i.e. not driven by evolutionary conservation) screen for transcriptional enhancers in transgenic mice. We identify three enhancers with a high degree of confidence, the most robust of which drives expression of the reporter in the posterior diencephalon and neural tube. Importantly, of the three identified enhancers, only one is strongly evolutionarily conserved. The two other regulatory elements show no detectable sequence conservation and would not have been uncovered in a conservation-based candidate approach. This observation indicates that exhaustive screens for functional elements should not be restricted to conserved DNA elements. Moreover, while current annotation of the mouse genome (NCBI37-mm9) does not display predicted transcription factor binding sites, the human orthologous fragment of enhancer 5F7 harbours an abundance of predicted binding sites (FoxC1, Oct-B1, POU3F2). It is possible that even the non-conserved elements contain a short conserved sequence that is responsible for their enhancer activity, particularly since a typical transcription factor binding site is only a few nucleotides long. Interestingly, the two non-conserved enhancers, separated by only 57 kb, displayed the same pattern of reporter expression in the trigeminal ganglion. They could represent “shadow enhancers” with overlapping activities
[36], but it remains unknown whether the target gene of these enhancers is
Olig or a more distant or unannotated gene.
Since we screened a BAC mapped within an orthologous fragment studied in the ENCODE project pilot phase, we asked whether our identified conserved enhancer 5F7 carried annotations suggestive of function. Human 5F7 does not show any significant DNaseI hypersensitivity in the seven cell lines tested (CD4+ T cells, GM06990 lymphoblastoid, HeLa S3 cervical carcinoma, HepG2 liver carcinoma, H9 human embryonic stem, IMR90 human fibroblast, K562 myeloid leukemia-derived). Interestingly, human 5F7 is mostly covered by repressive chromatin marks (H3K27me3 mainly) in all cell lines investigated by ENCODE (erythroleukaemia, umbilical vein endothelial, skeletal muscle myoblast, mammary epithelial, lymphoblastoid, embryonic stem, epidermal keratinocyte, lung fibroblast). However, the most conserved part of human 5F7 is marked by monomethylation on lysine 4 of histone H3 (H3K4me1) in embryonic stem cells (H1-hESC), a modification associated with enhancers
[37],
[38]. This suggests that the locus is tightly regulated and mostly repressed but can be activated in a specific spatio-temporal manner. Such a tight control pattern would be compatible with the likely regulation of
OLIG genes. These data should be treated with caution, however, as they originate from non-neural human cell lines that likely differ in their regulation of this locus from the LacZ-positive cells in our E11 murine embryos. We also looked at p300 binding sites in forebrain, midbrain and limbs of E11 mouse embryos (data from
[39]), but none of our identified enhancers overlapped with a peak of p300 binding in these tissues.
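The overlap check described above can be sketched as a simple interval comparison. This is a minimal illustration and not the pipeline used in the study: the `Interval` type, the function names and every coordinate below are hypothetical placeholders, and half-open coordinates (as in BED files) are assumed.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """Genomic interval, assuming half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def peaks_per_enhancer(
    enhancers: Dict[str, Interval], peaks: List[Interval]
) -> Dict[str, List[Interval]]:
    """Map each enhancer name to the list of peaks that overlap it."""
    return {
        name: [p for p in peaks if overlaps(region, p)]
        for name, region in enhancers.items()
    }


# Made-up coordinates for illustration only (not the real enhancer positions).
enhancers = {
    "enhancer_A": Interval("chrA", 1000, 2000),
    "enhancer_B": Interval("chrA", 5000, 6000),
}
peaks = [
    Interval("chrA", 2500, 2600),  # falls between the two enhancers
    Interval("chrA", 5900, 6100),  # partially overlaps enhancer_B
    Interval("chrB", 1000, 2000),  # same positions, different chromosome
]

hits = peaks_per_enhancer(enhancers, peaks)
for name, found in hits.items():
    print(name, "overlapping peaks:", len(found))
```

An enhancer with an empty list corresponds to the situation reported here, where no identified element coincided with a p300 peak. For genome-scale peak sets, a sorted or interval-tree lookup would be preferable to this quadratic comparison.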
The ENCODE project pilot phase had previously described several functional regions that showed no evidence of evolutionary constraint
[23]. Likewise, another report had subsequently suggested that non-conserved elements could also harbour enhancer activities in zebrafish transgenics
[20], but a broad unbiased screen had not yet been conducted in mice. Here, we provide further evidence that non-conserved sequences with enhancer activity exist. This observation has important implications for the annotation of genomes and the identification of disease-related variation. Our study had two limitations that precluded the exhaustive identification of enhancers in the DNA region under study. First, we concentrated our analysis on a narrow window of embryonic development. Second, overlapping signals may have masked the activity of some discrete enhancers.
To increase the likelihood of discovering sequences potentially associated with human disorders, we set out to study a region syntenic with human chromosome 21 that harbours the
OLIG1 and
OLIG2 genes. These genes are specifically expressed in the CNS, hence their dysregulation is potentially involved in Down Syndrome. A recent study in a mouse model of Down Syndrome confirmed that
Olig gene triplication indeed causes neurological phenotypes
[40]. Moreover,
OLIG2 deregulation has been associated with disorders such as schizophrenia
[41],
[42] and Alzheimer's disease
[43]. Our in situ hybridisation and chromosome conformation capture data support the hypothesis that enhancer 5F7 contributes to the expression pattern of
OLIG genes in the posterior diencephalon but could also be regulating other more distant genes. The specificity of this CNS transcriptional enhancer differed slightly between the human, chicken and mouse orthologous sequences. All three highlighted the posterior diencephalon and the neural tube. However, the human and chicken elements displayed very similar staining, with a higher frequency of diencephalon staining and a lower frequency of neural tube staining relative to the mouse element. These differences could be explained by an inaccurate “reading” of foreign DNA fragments by the murine transcriptional machinery. However, the recent study of a transchromosomal mouse carrying human chromosome 21 showed that the foreign chromosome could be recognised and interpreted in the appropriate spatio-temporal manner by the host machinery. In hepatocytes of this mouse, the human chromosome was recognised by murine transcription factors to dictate accurate gene expression despite the lack of conservation of certain DNA binding motifs, showing that adequate instructions to direct species-specific transcription must be embedded in the genetic sequence
[44]. Alternatively, the differences we observe could be evolutionarily relevant and represent species-specific differential regulation. A recent example of such differences was reported for an enhancer gaining a limb expression domain in human relative to chimpanzee
[45]. However, the high relatedness of the expression patterns induced by all three orthologous 5F7 elements strongly suggests a conserved role for this enhancer in the three species. Furthermore, the combination of all the patterns seen with the three enhancers includes all the patterns seen in
Olig1 and
Olig2 in situ hybridizations. For example, the less penetrant LacZ neural tube and hindbrain domains are visible in the
Olig2 in situ hybridization. How different activities of this enhancer are generated with respect to
Olig1 or
Olig2 at the original locus in different tissues is not known and could be dependent on other regulatory influences coming from additional
cis-acting elements.
It is becoming increasingly apparent that sequence conservation alone is not a sufficient criterion to predict all regulatory elements and that other features can facilitate the identification of functional sequences. For example, a recent study showed that p300 association accurately predicted tissue-specific activity of enhancers
[39], while evolutionary conservation of the three-dimensional structure of DNA has also been proposed as a marker of functional elements
[46]. It is likely that a combination of chromatin marks
[37],
[47], bound proteins
[39], DNaseI hypersensitivity
[48], three-dimensional structure
[46] and sequence conservation criteria, along with other as yet unknown parameters, will be required to improve the prediction of regulatory elements. The method we present here could be scaled up to cover large chromosomal regions and to determine what fraction of the conserved and non-conserved genome has regulatory potential.