One type of sequence that is prone to silencing by methylation is repetitive DNA. Endogenous regions carrying multiple blocks of similar sequence, each more than 1 kilobase (kb) long, such as those found at centromeres or the D4Z4 and NBL2 macrosatellite repeats of the human genome, are known to accumulate repressive chromatin marks, including DNA methylation (Kondo et al., 2000; Miller et al., 1974; Ponzetto-Zimmerman and Wolgemuth, 1984). Such silencing of repetitive sequences can have functional consequences. For example, silencing of the D4Z4 repeats was recently shown to repress expression of a polymorphic allele of a gene that would otherwise trigger the human disease facioscapulohumeral muscular dystrophy (Lemmers et al., 2010).
Studies using transgenic constructs newly introduced into the genome support the idea that the repetitive nature of a DNA sequence is a strong cue for silencing. In mice and plants, transgenes that integrate into the genome as high-copy-number concatemeric arrays typically show decreased expression (Davis and MacDonald, 1988; Linn et al., 1990; Mittelsten Scheid et al., 1991; Robertson et al., 1995; Sharpe et al., 1993). The link between silencing and repetitive DNA was elegantly demonstrated by Garrick et al. (1998), who established a mouse line carrying approximately 100 repeats of an erythroid-specific LacZ transgene flanked by loxP sites. Initially, animals expressed LacZ at very low levels in fewer than 1% of cells, and the transgene array had accumulated high levels of DNA methylation. However, when embryos were injected with Cre recombinase, the resulting mice carried a single copy of the transgene, which was less methylated and, correspondingly, showed a more than 1000-fold increase in the number of LacZ-expressing cells (Garrick et al., 1998).
Short tandem repeats with unit lengths of less than 100 base pairs (bp) are also widespread in eukaryotic genomes (Boby et al., 2005), and there is some evidence that they are silenced. Short repeats that contain CpG dinucleotides, such as those associated with Fragile X syndrome and other trinucleotide expansion diseases, accumulate methylation that correlates with reduced gene expression (Oberle et al., 1991). Short tandem repeats may also be involved in the epigenetic regulation of imprinted genes, as they are often enriched in the DNA surrounding these genes (Hutter et al., 2006). However, the potential for short tandem repeats to accumulate epigenetic marks associated with silencing has not been explored in depth.
The Gal4/UAS regulatory system serves as a useful model for monitoring DNA methylation and transcriptional silencing of a short tandem repeat. In yeast, the Gal4 transcription factor binds to upstream activating sequences (UAS) to direct transcription of genes required for galactose metabolism (Giniger et al., 1985). Each UAS is 17 bp long, roughly palindromic, and has the form CGG-N₁₁-CCG. The CpG dinucleotides are essential for Gal4 binding (Marmorstein et al., 1992) and serve as targets for methylation (Goll et al., 2009). The Gal4/UAS system was first adapted to zebrafish by Scheer and Campos-Ortega (1999), who assayed reporter expression under the control of five UAS copies (5X UAS). It was difficult to obtain high levels of expression from these constructs, most likely because they integrated as large concatemers of multiple transgenes, which made them susceptible to silencing. To compensate for the low expression, Köster and Fraser (2001) used the potent Gal4-VP16 fusion protein for transcriptional activation, together with modified constructs, originally designed for overexpression screens in Drosophila, that contained fourteen tandem copies of a synthetically generated upstream activating sequence (14X UAS) (Rorth, 1996). Although this approach resulted in robust expression, high toxicity was observed and stable transgenic lines were not generated. Since this initial work, new technologies such as Tol2 transposition have become available that allow transgenes to integrate as single copies, thereby eliminating the problems associated with insertions containing complex concatemeric arrays (Kawakami et al., 2000). High levels of gene expression are obtained in transient embryo injection assays when Gal4-VP16 binds to the 14X UAS to promote transcription of the gene encoding green fluorescent protein (GFP) (Köster and Fraser, 2001). However, when stably integrated into the genome as a single-copy sequence, the same 14X UAS is prone to CpG methylation. Transgenic embryos show variegated GFP expression that correlates with increased DNA methylation, and silenced transgenes can be reactivated in larvae with hypomethylated genomes (Feng et al., 2010; Goll et al., 2009). Strikingly, although there is minimal silencing in the first generation, it is exacerbated upon propagation through later generations (Goll et al., 2009). The Gal4/UAS system therefore makes it possible to monitor the progression of methylation at short repeats and to probe the cues that trigger their silencing.
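To make the CpG content of a UAS array concrete, the CGG-N11-CCG consensus can be written as a regular expression. The Python sketch below is illustrative only: the function names, the 17-bp example site, and the 4-bp spacer are hypothetical and are not the sequences of any cited construct. Because the CGG and CCG flanks each contribute one CpG, every matching site carries at least two CpGs, so a 14X array presents at least 28 CpG targets in its binding sites alone. The pattern is also its own reverse complement, so it detects sites on either strand.

```python
import re

# Gal4 site consensus as stated in the text: CGG, any 11 bases, CCG (17 bp).
# The lookahead lets overlapping candidate sites be reported.
UAS_PATTERN = re.compile(r"(?=(CGG[ACGT]{11}CCG))")


def find_uas_sites(seq: str) -> list[tuple[int, str]]:
    """Return (start, site) for every CGG-N11-CCG match in seq (hypothetical helper)."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in UAS_PATTERN.finditer(seq)]


def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides; 'CG' cannot overlap itself, so str.count is exact."""
    return seq.upper().count("CG")


if __name__ == "__main__":
    # Hypothetical 17-bp site chosen only to fit the pattern; not a published UAS.
    site = "CGGAGTACTGTCCTCCG"
    # Hypothetical 14-copy tandem array with an arbitrary 4-bp spacer.
    array14 = "".join(site + "AATT" for _ in range(14))
    print(len(find_uas_sites(array14)), count_cpg(array14))  # prints: 14 28
```

In this toy array the N11 cores contain no CpGs, so the count of 28 comes entirely from the CGG and CCG flanks; a real array could carry more if its cores contain CpGs.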
Silencing of UAS-regulated transgenes can be a technical challenge for the zebrafish field. This applies especially to studies of developmental processes that require every cell of a given population to express the UAS-regulated transgene, such as genetic ablation of a specific cell type. The presence of DNA methylation machinery in fish, and the associated variegation or silencing of gene expression, impedes the creation of a repertoire of Gal4-based tools as powerful as that currently available to the Drosophila community.
Some efforts have been made toward optimizing the Gal4/UAS system for zebrafish. Using a luciferase-based assay in cultured zebrafish fibroblasts, Distel et al. (2009) showed that expression from UAS constructs increased linearly from one to five UAS copies and then leveled off, indicating that fewer than 14 UAS copies can provide an effective substrate for Gal4-VP16 in zebrafish cells and in transgenic animals. In other work, stable transgenic lines carrying fluorescent reporter genes driven by five copies of the UAS were shown to produce strong labeling (Asakawa et al., 2008; Collins et al., 2010). However, these studies did not directly address the susceptibility of UAS variants to DNA methylation and transcriptional silencing over multiple generations.
We set out to test systematically how UAS sites with different copy numbers and sequence diversity behave in vivo, by monitoring reporter expression in transgenic animals over three generations and correlating it with methylation at the UAS repeats. Four distinct Gal4 binding sites were placed in tandem, and expression from this non-repeating construct (4Xnr UAS) was compared with that from the 14X UAS used in many zebrafish studies (e.g., Campbell et al., 2007; Davison et al., 2007; Douglass et al., 2008; Köster and Fraser, 2001; Pisharath and Parsons, 2009; Scott et al., 2007). We show that the 4Xnr UAS drives high levels of reporter expression and is significantly less susceptible to methylation than the 14X UAS. In addition, we find that silencing and methylation are enhanced when promoter-driven Gal4 is placed upstream of UAS-regulated responder genes in a bicistronic construct. Our findings suggest strategies for effective Gal4-regulated gene expression in transgenic zebrafish. Moreover, the results support the hypothesis that sequence or structural cues embedded in short tandem repeats attract DNA methylation, and they demonstrate the utility of zebrafish for elucidating the specific nature of these cues in a live organism.
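The two designs compared here can be sketched the same way. Besides the number of sites, the 14X and 4Xnr arrays differ in whether the same site sequence recurs. The hypothetical helper below, `site_diversity`, reports the number of CGG-N11-CCG sites, the number of distinct site sequences, and the highest copy number of any single sequence. The four "non-repeating" example sites and the spacer are invented to fit the consensus and differ only in their N11 cores; they are not the actual 4Xnr or 14X sequences.

```python
import re
from collections import Counter

# Same CGG-N11-CCG consensus as a regex; the lookahead allows overlapping matches.
UAS_PATTERN = re.compile(r"(?=(CGG[ACGT]{11}CCG))")


def site_diversity(seq: str) -> tuple[int, int, int]:
    """Return (number of sites, distinct site sequences, max copies of one sequence).

    Hypothetical helper for illustration only.
    """
    sites = [m.group(1) for m in UAS_PATTERN.finditer(seq.upper())]
    counts = Counter(sites)
    return len(sites), len(counts), max(counts.values(), default=0)


if __name__ == "__main__":
    spacer = "AATT"  # arbitrary hypothetical spacer
    repeat_site = "CGGAGTACTGTCCTCCG"  # hypothetical site, repeated 14 times
    array_14x = (repeat_site + spacer) * 14
    # Four hypothetical sites that fit the pattern but differ in their N11 cores.
    nr_sites = ["CGGAGTACTGTCCTCCG", "CGGATTACAGTCATCCG",
                "CGGTCTAGAGACTACCG", "CGGACAATTGTTGTCCG"]
    array_4xnr = spacer.join(nr_sites)
    print(site_diversity(array_14x))   # prints: (14, 1, 14)
    print(site_diversity(array_4xnr))  # prints: (4, 4, 1)
```

A metric like the maximum copy number separates a perfect tandem repeat from a non-repeating array even when both contain CpG-bearing Gal4 sites, which is the contrast the 4Xnr design is meant to test.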