Beyond Mendelian inheritance, an understanding of the complexities and consequences of the transfer of nonhereditary information to successive generations is at an early stage. Such epigenetic functionality is exemplified by DNA methylation and, as genome-wide high-throughput methodologies emerge, is increasingly being considered in the context of conserved intragenic and intergenic CpG islands that function as alternate sites of transcription initiation. Here we characterize an intragenic CpG island in exon 2 of the protein-coding mouse Klf1 gene, from which clustered transcription initiation sites yield positive-strand, severely truncated, capped and spliced RNAs. Expression from this CpG island in the testis begins between Postnatal Days 14–20, increases during development, and is temporally correlated with the maturation of secondary spermatocytes as they become the dominant cell population in the seminiferous epithelium. Only full-length KLF1-encoding mRNAs are detected in the hematopoietic tissue, spleen; thus, expression from the exon 2 CpG island is both developmentally regulated and tissue restricted. DNA methylation analysis indicates that spatiotemporal expression from the Klf1 CpG island is not associated with hypermethylation. Finally, our computational analysis from multiple species confirms intragenic transcription initiation and indicates that the KLF1 CpG island is evolutionarily conserved. Currently we have no evidence that these truncated RNAs can be translated via nonconventional mechanisms such as in-frame, conserved non-AUG-dependent Kozak consensus sequences; however, high-quality carboxyl-terminal antibodies will more effectively address this issue.
Contrary to initial expectations, completion of the human genome sequence and the development of high-throughput genome sequencing of multiple species have actually expanded the scope of efforts needed to develop a comprehensive understanding of gene regulation because they have increased our awareness of the diversity and complexity of transcription and epigenetic regulation. For example, high-throughput sequencing has revealed that the genomes of higher organisms are transcribed mainly into noncoding RNAs (ncRNAs) that regulate numerous essential cellular processes. This sequencing technology has also sparked new interest in short hypomethylated regions of the genome that are enriched in CG dinucleotides (CpG islands [CGI]) [2–6].
From early studies, CGIs were thought to be confined to active or permissive gene promoters, and methylation of CG dinucleotides was viewed as a simple mechanism by which expression could be permanently suppressed at imprinted loci or across broad chromosomal regions such as the X chromosome. However, recent advances in the field show that DNA methylation is far more complex; it is developmentally regulated, reversible, and tissue- or cell-type specific, and altered methylation patterns are involved in the pathophysiology of human diseases, including cancer [2, 6–12]. Furthermore, approximately half of the CGIs found in the genomes of higher mammals are distant from gene promoters and are evolutionarily conserved [4, 5]. These intragenic and intergenic CGIs are often sites of transcription initiation and alternative splicing and generate transcripts that do not contain open reading frames for protein synthesis. Rather, such ncRNAs appear to regulate the expression of other genes or specify chromosomal domains (reviewed in Jones and in Deaton and Bird).
Herein, we characterize testis expression of the mouse erythroid Kruppel-like factor (Klf1) gene, including a conserved intragenic CGI in exon 2. This CGI is permissive for transcription initiation that occurs at several clustered sites, as is typical for orphan CGIs. In addition, expression of the resulting capped and spliced RNAs is both developmentally regulated and tissue restricted. An apparent lack of protein-coding capacity stems from severe 5′ truncation compared to the canonical Klf1 mRNA and the absence of an in-frame AUG codon. However, these short RNAs may have protein-coding potential at several conserved non-AUG-dependent Kozak consensus sites, as has been demonstrated for other transcription factors.
Expression of the Klf family of genes in testis is not without precedent, and several family members have attracted considerable attention for their roles in germ cell biology (reviewed in Nandan and Yang ). However, the functions of the encoded proteins are not always clear. For example, KLF5 is expressed in primordial germ cells during rodent embryogenesis, but its function in these cells has not been established . KLF4 is expressed in round spermatids in adult tissue , although it is not required for spermatogenesis . Expression of KLF4 also has been studied in the Sertoli cell TM4 cell line in vitro, where it is an immediate-early factor and is thought to regulate expression of the tight junction protein CLMP , as well as in vivo, where it may be involved in differentiation but is ultimately not required for adult Sertoli cell function .
Expression of a truncated sense-strand Klf1 transcript in testis has been known for many years, but its function and molecular identity have remained uncharacterized. We have determined that expression correlates with the differentiation and expansion of the germ cell lineage beginning around 20 days of age. We also show low-level expression of canonical Klf1 mRNA throughout testis development in Sertoli cells and possibly spermatogonia. The function of this protein-coding isoform of the Klf1 gene has been studied extensively in erythroid tissues such as spleen, in which transcription initiation occurs at exon 1 to give rise to an mRNA encoding the full-length KLF1 protein ; however, its expression has not been systematically examined outside of erythroid lineages. KLF1 is a developmentally regulated C2H2-class zinc finger transcription factor that binds to asymmetric sites on DNA, probably as a monomer via its three zinc fingers , and plays pivotal roles in the terminal differentiation of hematopoietic cells and in regulation of the HBB gene in adults (reviewed in Siatecka and Bieker ).
All experiments involving mice were carried out in compliance with Department of Laboratory and Animal Sciences guidelines at Wayne State University under the auspices of our IACUC protocol approved by the Wayne State University Animal Investigation Committee. Mice were maintained in-house on an intercrossed hybrid background comprising three inbred strains, C57BL/6J and (129X1/SvJ and 129S1/Sv-+p+Tyr-cKitlSl-J/+). Interrogation of dbSNP, build 128 (http://www.ncbi.nlm.nih.gov/projects/SNP), to identify single-nucleotide polymorphisms in reference inbred mouse strains was performed using the Mouse Genome Informatics SNP query form, version 5.0.10 (http://www.informatics.jax.org/strains_SNPs.shtml).
Testes were rapidly dissected, rinsed briefly in PBS (pH 7.4), and frozen in 13-ml Falcon tubes (Fisher Scientific, Fair Lawn, NJ) on dry ice. To purify total RNA, frozen testes were homogenized using an ULTRA-TURRAX model T 50, fitted with a microprobe (IKA Works Inc., Wilmington, NC) for 1 min in 2 ml of 4 M guanidinium isothiocyanate containing 0.5% sodium lauroyl sarcosine (Sigma-Aldrich, St. Louis, MO) and 0.1 M mercaptoethanol added freshly. Homogenates were briefly centrifuged at 3000 × g for 5 min, and the supernates were overlaid onto 3-ml columns of 5.7 M CsCl in SW55 tubes (Beckman Coulter, Brea, CA) for centrifugation at 35000 rpm for 20 h at 22°C. Northern blots were generated using 10 μg total RNA samples run on 1% formaldehyde-agarose gels  and probed as detailed previously . Randomly primed (New England Biolabs, Ipswich, MA), 32P-labeled probes used were a 1.5-kb full-length mouse Klf1 cDNA, a 0.8-kb Nco I-Nae I fragment of the Klf1 transactivation domain containing exon 1 and part of exon 2, a 0.25-kb Sac I-Bam HI fragment of the Klf1 zinc finger domain containing part of exons 2 and 3, and full-length mouse Actb (β-actin) cDNA.
Spleen and testis were dissected from P42 mice and rapidly frozen on dry ice for storage. Frozen tissues were thawed and homogenized using an ULTRA-TURRAX for 1 min at room temperature in Promega (Madison, WI) cell lysis buffer containing 1% Triton X-100, 2 mM EDTA, 0.5 mM PMSF, protease inhibitors (Sigma-Aldrich; P8340; concentration as recommended), and benzonase (1:400 dilution; EMD-Millipore, Billerica, MA) in 10 mM Tris-buffered saline (pH 7.4). Protein concentrations were measured using Bradford (Bio-Rad, Hercules, CA), and 30 (spleen) or 100 (testis) μg of protein per lane were electrophoresed on 10% or 15% SDS-polyacrylamide gels . Proteins were transferred to Protran membranes (Fisher) at 100 V for 1 h, blocked with 5% Nestlé Carnation instant nonfat dry milk (http://www.amazon.com) in PBS, and incubated with primary antibodies in 2.5% dry milk in PBS overnight. Secondary antibodies in 2.5% dry milk in PBS were incubated for 1 h and developed on x-ray film using Pierce ECL (Fisher) as recommended.
Testes were dissected from P42 mice, punctured through the poles with an 18-gauge hypodermic needle, and immersion fixed in Bouin reagent (Fisher) overnight at room temperature and then washed for several days in PBS with multiple changes until the yellow color ceased to leach from the tissue. Samples were dehydrated in an ethanol series into xylene over several hours, cut in half at the equator, and embedded in paraffin (Fisher). Seven-micrometer sections were cut, mounted on Superplus slides (Fisher), dewaxed in xylene, and rehydrated through ethanol into PBS. Sections were blocked in 1% tyramide blocking solution (Life Technologies, Grand Island, NY) for 1 h and incubated in primary antibodies overnight and secondary antibodies for 3 h. After staining with DAPI (Invitrogen, Carlsbad, CA) for 10 min, the sections were coverslipped with Prolong Gold antifade (Invitrogen) for fluorescence microscopy. Digital photomicrographs were captured using a DM5500 microscope (Leica Microsystems Inc., Buffalo Grove, IL) and Hamamatsu (Bridgewater, NJ) ORCA-R2 camera. Images were processed in Photoshop CS5 (Adobe, San Jose, CA) by assembling the 16-bit TIFF images and their cognate negative controls into a single layer and then simultaneously adjusting the contrast of the image set followed by resizing and conversion to eight-bit images for figures.
The following primary antibodies were used in this study: mouse anti-KLF1 IgG1 (6B3; raised against amino acids 20–60, diluted 1:200 for immunocytochemistry and 1:2000 for Western blots; kind gift from Dr. James Bieker, Mount Sinai School of Medicine, New York, NY); rabbit anti-KLF1 (LS-C46845; diluted 1:200 for immunocytochemistry and 1:1000 for Western blots; Lifespan Biosciences, Seattle, WA); goat anti-GATA4 (sc-1237; diluted 1:200 for immunocytochemistry; Santa Cruz Biotechnology, Santa Cruz, CA). Anti-goat or anti-mouse Alexa secondary antibodies used for immunocytochemistry (Southern Biotechnology, Birmingham, AL) were diluted 1:1000, and anti-mouse-HRP conjugates were diluted 1:10000 for Western blots. A Cell Signaling (Danvers, MA) anti-rabbit HRP conjugate antibody for Western blots was diluted 1:2000.
Additional anti-KLF1 antibodies tested in the current project that did not detect KLF1 on testis Western blots or paraffin sections include two monoclonal mouse antibodies, 7B2 and 4B9, which recognize the amino terminus of KLF1 (gifts from Dr. James Bieker); four polyclonal antibodies from Santa Cruz, H210 (sc-14034), C12 (sc-27139), F20 (sc-27194), and L20 (sc-27195), were also tested on Western blots and paraffin sections without success. Three of these antibodies recognize the amino terminal 150 amino acids of KLF1, while the C12 antibody is carboxyl terminal.
For 5′ RACE and cloning of Klf1 transcripts, we used GeneRacer (Invitrogen) according to the manufacturer's instructions. Cesium chloride-purified total RNA from P45 mouse testis was used for priming with the antisense primer 5′-GAG CGA GCG AAC CTC CAG TCA CA-3′ (Invitrogen) in exon 3 of the Klf1 gene. After annealing the 5′ RACE primer and PCR amplification, amplicons were TA cloned into the pCR II vector (Invitrogen), and 14 insert-containing clones were sequenced in one direction. These sequences were aligned to the RefSeq mouse Klf1 cDNA using the Sequence Conformation software application from MacVector version 11.0.1 (Accelrys Software, San Diego, CA).
The BLAT (BLAST-Like Alignment Tool) functionality of the UCSC Genome Browser [29, 30] was used to align the 5′ RACE products directly to the mouse genome. The clones were mapped in the UCSC Genome Browser within the context of an integrated view of their localization relative to the known Klf1 gene structure and publicly available evidence of Klf1 transcription, including full-length cDNA clones and expressed sequence tags (ESTs) from GenBank.
The reference sequences (RefSeq) of mouse and human Klf1 used in the current study were retrieved from the Entrez Gene search function of NCBI (http://www.ncbi.nlm.nih.gov) as NM_010635.2 and NM_006563.3. The mouse reference protein sequence, NP_034765.2 (http://www.ncbi.nlm.nih.gov/protein/225543580), was displayed using the Conserved Domains option to localize the zinc finger (ZnF) domain. The sequence length of 376 amino acids was identical to the published sequence .
The mouse RefSeq primary structure was identical to the Miller and Bieker  sequence except for three amino acid substitutions, two of which were documented mouse SNPs, rs33450402 (outside of the ZnF domains) and rs32936102 (inside the first ZnF domain encoded by exon 2), and one of which was not supported by known SNPs and was not pursued because it may have resulted from an error in the original sequence .
The UCSC Genome Browser  (http://genome.ucsc.edu) was used to visualize the RefSeq KLF1 gene structure from multiple species (exons, introns, direction along the genome assembly), special tracks for DNA methylation, histone binding, CGI annotations, and transcriptome support. All coordinates and select feature locations in Figures 4 and 5, including CGI and EST mappings, were obtained from the Browser using the mm9 mouse genome and the hg19 human genome assemblies.
The FANTOM4 mouse genome viewer release 2009/03/25 (http://fantom.gsc.riken.jp/4/gev/gbrowse/mm9) was used to search for CAGE evidence of Klf1 intragenic transcription initiation from nonredundant FANTOM3/4 CAGE transcription clusters (TCs). The “LiftOver TC from mm5” track (Supplemental Fig. S1; all Supplemental Data are available online at www.biolreprod.org) is linked to the mm5 FANTOM3 CAGE Basic Viewer (http://gerg01.gsc.riken.jp/cage). The FANTOM3 mouse transcriptome project is the principal source of the data used in the current study (Supplemental Fig. S1 and Supplemental Table S1). Importantly, FANTOM [31–33] is the most comprehensive, highest-quality, and best spatiotemporally defined collection of sequence-verified transcriptome libraries from any mammal. Although not rigorously quantitative, this data set provides valuable and specific support for utilization of specific TSSs.
We extracted the FASTA sequence of the mouse Klf1 RefSeq exon 2 (mm9 genomic coordinates 87426588–87427401) and used it as a query in an NCBI BLASTN search (http://www.ncbi.nlm.nih.gov/blast) of the “human ESTs” and “other ESTs” (non-mouse) subsets of dbEST. We manually curated all 5′ EST matches for Supplemental Table S2 to confirm that they corresponded to Klf1 orthologs and not to paralogous genes by using reciprocal BLASTN and/or cross-species BLAT to establish Klf1 as the best match of all such non-mouse EST sequences in the mouse genome .
Testes were rapidly dissected from mice, rinsed in PBS, cut in half at the equator, and coarsely minced with a razor to facilitate digestion with 0.5 mg/ml proteinase K (Sigma) in 50 mM Tris buffer (pH 8) containing 0.5% sodium dodecyl sulfate (Fisher) and 0.1 M EDTA (Fisher). Genomic DNA was purified from these homogenates using phenol:chloroform:isoamyl alcohol (Fisher) extractions followed by sodium acetate precipitation in ethanol and resolubilization in TE buffer at 1 μg/ml.
Methylation of DNA samples was determined by Southern blot using a method loosely based on the HELP assay . Genomic DNAs purified from testes were restriction digested with Eco RI and Bam HI (New England Biolabs) at 37°C for 4–6 h. DNA samples were then either restriction digested with Hpa II (New England Biolabs) for 4–6 h or methylated in vitro using Hpa II methyltransferase (New England Biolabs) and digested with Hpa II. Alternatively, the Eco RI-Bam HI-digested samples were further digested with Aci I, Fau I, or Hha I enzymes (New England Biolabs) for 4–6 h.
Eight- to 10-μg aliquots of these processed samples were run on 1% agarose gels overnight and transferred overnight to Nytran N or Nytran N supercharged membranes (Whatman plc, Maidstone, Kent, U.K.) for Southern blot, as previously described. Blots were probed using a randomly primed (New England Biolabs), 32P-dCTP (PerkinElmer, Waltham, MA) labeled 1.1-kb fragment of the mouse Klf1 gene containing the entire predicted CGI in exon 2. This DNA fragment was generated by PCR using the sense/antisense primer pair 5′-ACC GTG TGG GTA AAT GAC-3′ and 5′-AGT CAC TGC TAT CCA CCT-3′ (Invitrogen) and purified from an agarose gel using QIAquick gel extraction (QIAGEN Sciences, Germantown, MD).
In view of our previous analyses in Cldn11-null and Tg (Cldn11)605Gow mice [37–39], we have become interested in the transcriptional regulation of this gene in testis. As part of a computational analysis of the promoter/enhancer region of the Cldn11 gene in mice , we identified potential KLF1 DNA binding sites within the 2 kb of the Cldn11 TSS known to regulate testis expression, which has prompted us to examine the developmental expression of this transcription factor.
Northern blots revealed a developmentally regulated RNA from the Klf1 gene that is roughly half the size of the archetypal message in other tissues, such as spleen (Fig. 1A). These data confirm earlier evidence of a truncated tissue-restricted isoform of Klf1 RNA in wild-type mice . Because of the possibility that KLF1 regulates expression of the mouse Cldn11 gene, we have explored the molecular identity, expression profile, epigenetic regulation, phylogenetic conservation, and protein-encoding potential of this truncated RNA species.
To extend the previous report of Klf1 expression in testis , which demonstrated the exclusion of exon 1 from the truncated Klf1 RNA in adult tissue, we probed developmental Northern blots of mouse testis whole RNA from the day of birth (P1) to 6 wk of age (P42) with cDNAs making up different regions of the Klf1 cDNA (Fig. 1A). Expression of the major truncated isoform begins around P20 and increases to plateau intensity by P28 (Fig. 1B). However, normalization of the signal at each age to Actb expression indicates that Klf1 expression increases steadily from P20 to P42 (at least two independent samples per age).
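The difference between raw and Actb-normalized signal can be illustrated with a minimal sketch. The band intensities below are invented placeholder values chosen only to show the effect, not measurements from this study, and `normalize_to_loading_control` is a hypothetical helper name.

```python
def normalize_to_loading_control(target, control):
    """Divide each target-band intensity by the matching loading-control intensity."""
    if target.keys() != control.keys():
        raise ValueError("target and control must cover the same samples")
    return {sample: target[sample] / control[sample] for sample in target}


# Hypothetical densitometry values (arbitrary units), for illustration only.
klf1_signal = {"P20": 10.0, "P28": 30.0, "P42": 30.0}
actb_signal = {"P20": 100.0, "P28": 150.0, "P42": 100.0}

normalized = normalize_to_loading_control(klf1_signal, actb_signal)
for age, ratio in normalized.items():
    print(f"{age}: raw={klf1_signal[age]:.0f} normalized={ratio:.2f}")
# P20: raw=10 normalized=0.10
# P28: raw=30 normalized=0.20
# P42: raw=30 normalized=0.30
```

In this toy case the raw signal appears to plateau between P28 and P42, whereas the normalized ratio continues to rise because the loading-control signal differs between lanes.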
The hybridization signal at P28 and beyond is well defined, which contrasts with the diffuse or heterogeneous size of Klf1 message at P20. Such heterogeneity may reflect limited degradation of the sample, but this seems unlikely in view of the clean Actb signal at this age. Alternatively, splice isoforms or multiple intragenic transcription start sites (TSSs) within the Klf1 gene could account for the diffuse signal.
Spleen total RNA contains the canonical Klf1 mRNA at approximately 2 kb and lacks a clearly defined shorter transcript, which indicates that the 0.8-kb RNA is tissue restricted. Probing blots with a cDNA making up the KLF1 transactivation domain encoded by exon 1 and the 5′ end of exon 2 fails to reveal the truncated RNA in testis but does yield the 2-kb signal in spleen; thus, the major testis isoform reflects transcription beginning from the central region of this 3 exon-containing gene.
Full-length Klf1 mRNA is visible in testis samples on the developmental Northern blot. Expression is barely detected from P6 and increases after P14 to a maximum around P20. This age corresponds to the first spermatogenic wave of differentiating germ line cells as they begin migration into the luminal compartment of the seminiferous tubules. Thereafter, full-length Klf1 mRNA expression appears to decrease by P28; however, after normalization using the Actb signal, Klf1 expression is likely constant into adulthood or may slightly increase.
Consistent with Northern blots showing the canonical Klf1 mRNA, Western blots probed with either of two anti-KLF1 antibodies, 6B3 and LS-C46845, detect the full-length protein in the spleen (positive control) and P42 whole testis (Fig. 1C). Although KLF1 is a 40-kD protein, it typically migrates on denaturing gels as a doublet at 37 kD. The relative abundance of KLF1 in the upper and lower doublet bands differs between spleen and testis, which may reflect tissue-specific differences in posttranslational phosphorylation and/or acetylation of the protein (reviewed in Siatecka and Bieker ). No bands are labeled from either tissue if the primary antibody incubation step is omitted (No Ab lanes), and neither of the antibodies appears to cross-react with a related 65-kD protein, GKLF/KLF4, which is developmentally expressed in germ line and Sertoli cells in testis .
To determine which cells in testis express KLF1, we labeled paraffin sections from P42 mice with both anti-KLF1 antibodies used for Western blots. At low magnification (Fig. 2A), cells at the basement membrane of most seminiferous tubules are labeled with the 6B3 amino-terminal antibody against KLF1 (green). Antibodies against GATA4 (red), which is a well-characterized Sertoli cell marker, label the small number of these cells near the basement membrane in each tubule, and this labeling colocalizes with 6B3 (arrowheads). This antibody also labels immature spermatids probably by binding nonspecifically to the acrosomes in many tubules (arrows). The high-magnification inset shows a Sertoli cell (left) strongly expressing GATA4 and KLF1 as well as two spermatogonia weakly expressing KLF1. Several surrounding cells are unlabeled. The carboxyl-terminal anti-KLF1 antibody yields similar results to 6B3, but staining quality is inferior because of a lower signal-to-noise ratio (data not shown).
The black-and-white lower panels (Fig. 2B) show portions of the basal region of seminiferous tubules labeled with DAPI (upper panels), antibodies against KLF1 (middle left), and GATA4 (lower left). Four Sertoli cells identified from their morphology by DAPI and GATA4 staining are clearly KLF1 positive (arrowheads). Sertoli cells are unlabeled when sections are incubated with secondary antibodies without prior primary antibody incubation (right panels). Thus, together, these data indicate that KLF1 is expressed by Sertoli cells and spermatogonia.
A capped, 5′ RACE analysis of testis total RNA (P42) using an antisense primer near the 5′ end of exon 3 yielded 14 Klf1 clones, all of which are canonically spliced to remove intron 2. The TSSs inferred by DNA sequencing from 13 of these clones localize to two clusters within 80 base pairs (bp) in exon 2 (Fig. 3).
The upstream cluster represented by clones 1–7 corresponds to nucleotides 715–735 of the NCBI RefSeq for Klf1 cDNA (NM_010635.2) and the downstream cluster (clones 8–13) to nucleotides 771–792. Because the size of the RefSeq cDNA from these positions to the beginning of the polyA+ tail approximates 0.8 kb, the genomic location of intragenic TSSs in exon 2 is consistent with the Northern data in Figure 1. The TSS of the singleton 14th clone maps to nucleotide 923 (not shown), which is located at the beginning of the first zinc finger domain close to the 3′ end of exon 2.
We observe several sequence differences between the 5′ RACE products of the upstream cluster and the Klf1 RefSeq cDNA. The triple nucleotide changes G768C, G769C, and G770C observed in clones 1–7 result in the amino acid change, G241P. These changes are consistent with the current published mouse genomic sequence (mm9) and also are present in multiple species from marmoset to human, thereby likely representing the correct nucleotide sequence and identifying a sequence error in the Klf1 RefSeq data.
Further downstream (data not shown), a nonsynonymous single-nucleotide polymorphism (SNP) is present in 13 of 13 5′ RACE products. This previously identified SNP (rs32936102) is a T960A substitution in the reference Klf1 cDNA (NM_010635.2) and is found in two inbred mouse strains, C57BL/6J and 129X1/SvJ (build 128 of the dbSNP database [http://www.ncbi.nlm.nih.gov/projects/SNP]). The presence of this SNP in all 13 5′ RACE clones suggests homogeneity at this allele and, therefore, that both copies of the Klf1 gene in our colony are derived from either or both of the C57BL/6J and 129X1/SvJ inbred strains because 129S1/Sv does not carry this SNP.
We also find a hemizygous C737T SNP in 5′ RACE clones 4–7. This SNP is not present in the C57BL/6J, 129X1/SvJ, or 129S1/Sv reference strains or, indeed, in any of the other inbred strains included in dbSNP and must be a de novo mutation in our colony. Importantly, this change does not alter the primary sequence of KLF1 and is likely to be functionally neutral for this protein.
In light of canonical splicing of the 5′ RACE clones, the first open reading frame (ORF) is in exon 3 and is 278 bases downstream of the 5′-cap for the most common TSS in the upstream cluster. This ORF encodes a 17-amino-acid peptide within the canonical KLF1 ORF but is out of frame. An additional ORF, comprising 52 codons, is located in the 3′-untranslated region downstream of the canonical KLF1 ORF. Although the abundance and developmental regulation of the 0.8-kb RNA in testis suggest that expression does not arise from indiscriminate transcription, the polypeptides theoretically encoded by these RNAs do not conform to conserved protein domains in NCBI databases, do not share homology with other gene products, and are too short to fit the generally accepted definitions of protein-coding capacity beyond conservation metrics. Furthermore, the 5′ RACE-supported TSSs that give rise to these transcripts are not likely to be pseudogene derived and, rather, map uniquely to chromosome 8 (chr8) between nucleotides 87427114 and 87427190 of the mm9 genome assembly.
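The kind of ORF search described above can be sketched as a scan of the three forward reading frames for AUG-initiated ORFs. This is a minimal illustration under stated assumptions, not the procedure used in the study: `find_orfs` is a hypothetical helper, only ATG start codons and the three standard stop codons are considered, and the example sequence is a toy string rather than Klf1 sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (start, end, n_codons) for ATG-initiated ORFs in the forward frames.

    start is the 0-based index of the A in ATG, end is the index just past the
    stop codon, and n_codons counts sense codons (the stop codon is excluded).
    Only the first ATG before each in-frame stop is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)


# Toy sequence (not Klf1): one ORF of three codons (ATG AAA TTT) then TAG.
toy = "CCATGAAATTTTAGGG"
print(find_orfs(toy))  # [(2, 14, 3)]
```

The distance from the 5′ end to the first reported start codon corresponds to the quantity quoted in the text (278 bases for the most common TSS), and `n_codons` corresponds to the peptide length in amino acids.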
The nucleotide sequence flanking the upstream TSS cluster (−10 to +10 sequence) is highly conserved in placental mammals, suggesting that it may be active in multiple species (Fig. 4). This is apparent in the Mammalian Cons multiZ alignment histogram (score = 1) assembled from alignments of 30 marine and terrestrial vertebrate genomic sequences  and spanning more than 400 million years of evolution. One exception is the de novo C/T polymorphism in our colony (Fig. 3), three nucleotides downstream of the most abundant 5′ RACE TSS cluster (red arrowhead). The ancestral base is C. In addition, the sixth nucleotide downstream of this TSS is substituted in mice and rats from the ancestral C to a T. Finally, the 10th nucleotide upstream of the TSS is changed from the ancestral C to a G in rodents. These polymorphisms are silent at the amino acid level, but we do not know if they have functional consequences for intragenic transcription in testis.
In conjunction with our 5′ RACE data demonstrating internal initiation of a 5′-truncated Klf1 RNA in testis, a 5′ EST, CB273655, from a mouse round spermatid library corresponds to an orphan TSS within exon 2 in proximity to our 5′ RACE products (nucleotide 919 of the mouse Klf1 RefSeq). This TSS is independently supported by two other ESTs from an embryo (13.5–14.5 dpc) library, and a second orphan EST, CX059853, from a pig testis library maps to the upstream 5′ RACE TSS cluster (nucleotide 732 of the mouse Klf1 RefSeq). Neither EST contains a translation start site in frame with KLF1.
A multispecies alignment to the human KLF1 gene (UCSC Genome Browser) reveals 17 EST clones from a range of tissues in six placental mammal species with apparent TSSs in exon 2 (Supplemental Table S2). Five of these are in the vicinity of our 5′ RACE products, and a rat EST maps close to the 5′ end of exon 2. None of these ESTs include either an in-frame ATG or an out-of-frame ORF greater than 40 codons. However, two sheep ESTs map closer to the 5′ end of exon 2 and include weak Kozak  consensus codons with potential for translation of the KLF1 zinc finger domain. Thus, there is little overall evidence that canonical translation of exon 2-initiated transcripts would yield functional KLF1 protein isoforms across multiple species.
Independent of the 5′ RACE and public 5′ EST data sets, the FANTOM Consortium Cap Analysis of Gene Expression (CAGE) resource, versions 3 and 4 [31, 33], identifies five singleton-tag sites supporting sense strand exon 2 TSSs in Klf1 (Supplemental Fig. S1 and Supplemental Table S1). Two sites map to the TSS of the round spermatid EST, CB273655, and another maps within 17 bp of a human forebrain TSS. Thus, our 5′ RACE data, public ESTs, and FANTOM CAGE data jointly support extensive intragenic TSS heterogeneity within the Klf1 gene from multiple species.
In light of intragenic transcription from the Klf1 gene in mouse testis, it is of interest that exon 2 is entirely overlapped by a CGI, which also extends more than 100 bp into the 5′ end of intron 2. Assuming that this CGI and its associated TSSs are biologically relevant, both features might be expected to be evolutionarily conserved. Indeed, our bioinformatics analysis has identified multiple TSSs in exon 2 in different tissues from six placental mammal species, and Klf1 orthologs from dog to human harbor the exon 2 CGI of approximately 1 kbp. In all cases, the ratio of observed to expected CG dinucleotides within the CGI is between 0.79 and 0.81, which is well above the threshold ratio of 0.65 used to define CGIs at the 5′ end of genes while excluding common repetitive elements. In addition, the CGI in all of these species except rat extends 80–350 bp into the 5′ end of intron 2, which suggests that the maintenance of this feature is not simply a consequence of evolutionary constraint on the CG-rich protein-coding sequence in exon 2. Together with deep-sequencing and EST data for exon 2 TSSs, evolutionary conservation of the CGI beyond the boundaries of exon 2 suggests that truncated Klf1 RNAs are functional, even if their biological activity is unknown.
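The observed-to-expected CG ratio referred to above can be computed as in the following minimal sketch. The text does not state the formula used, so the sketch assumes the commonly used definition, (number of CG dinucleotides × sequence length) / (number of C × number of G). The function names are hypothetical, only the 0.65 ratio threshold from the text is applied (other CGI criteria such as minimum length or GC content are not), and the example sequences are toy strings.

```python
def cpg_obs_exp(seq):
    """Observed/expected CG dinucleotide ratio (assumed standard definition)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    return n_cg * len(seq) / (n_c * n_g)


def exceeds_ratio_threshold(seq, threshold=0.65):
    """True if the obs/exp ratio is above the threshold quoted in the text."""
    return cpg_obs_exp(seq) > threshold


# Toy sequences, for illustration only.
print(cpg_obs_exp("CCGGAATT"))              # 2.0  (1 CG * 8 / (2 C * 2 G))
print(cpg_obs_exp("CAGTCAGT"))              # 0.0  (no CG dinucleotides)
print(exceeds_ratio_threshold("CCGGAATT"))  # True
```

On a real sequence of approximately 1 kbp, a ratio of 0.79 to 0.81, as reported here, would pass the 0.65 threshold under this definition.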
To determine if the Klf1 CGI is methylated in testis during development and might account for its temporally regulated expression, we purified DNA from P5 to P45 wild-type mouse testes for digestion with the restriction endonucleases, Eco RI and Bam HI, followed by the methylation-sensitive enzyme, Hpa II. Controls were either left undigested with Hpa II or methylated in vitro prior to Hpa II digestion and then electrophoresed for Southern blot. The schematic in Figure 6A shows mouse Klf1 exon 2 and flanking introns, the locations of salient endonuclease sites, the CGI, and the DNA fragment used for hybridization to Southern blots of the digested DNAs. From the schematic, Hpa II recognition sequences form four clusters of one or more sites in the CGI. The predicted sizes of the major fragments (5′–3′) are 404, 187, 283, 174, and 598 bp and match those observed on the Southern blots. All of the remaining fragments are less than 40 bp in length and are not resolved on Southern blots.
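The logic of the assay can be sketched as an in silico digest: Hpa II cuts unmethylated CCGG sites (C^CGG) but not methylated ones, so methylation merges neighboring fragments into larger ones. The code below is a minimal single-strand sketch under stated assumptions; `digest_fragment_lengths` and its `methylated` parameter are hypothetical names, and the example is a toy sequence rather than the Klf1 Eco RI-Bam HI fragment.

```python
def digest_fragment_lengths(seq, site="CCGG", cut_offset=1, methylated=frozenset()):
    """Return fragment lengths after cutting a linear sequence at each site.

    cut_offset is the cut position within the site (1 for C^CGG).
    methylated holds 0-based start positions of sites that are protected
    from cutting, modelling a methylation-sensitive enzyme.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        if pos not in methylated:
            cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy 20-bp sequence with CCGG sites starting at positions 4 and 14.
toy = "AAAACCGGTTTTTTCCGGAA"
print(digest_fragment_lengths(toy))                      # [5, 10, 5]
print(digest_fragment_lengths(toy, methylated={4}))      # [15, 5]
print(digest_fragment_lengths(toy, methylated={4, 14}))  # [20]
```

In this model a fully unmethylated template yields only the small predicted fragments, whereas protected sites produce larger fragments, up to the full-length double-digest fragment when every site is methylated.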
Eco RI/Bam HI/Hpa II triple-digested testis DNAs from P5 to P45 (Fig. 6B, left panel) show four hybridization signals of approximately 600, 400, 300, and 200 bp, which correspond to the predicted sizes of the major triple enzyme-restricted fragments. These bands are also present in P45 spleen DNA (P45 Spl). We do not detect appreciable hybridization signals of larger DNA fragments on the blots, indicating that the CGI is likely hypomethylated in virtually all cells in the testis at all postnatal developmental ages. As expected, the Eco RI/Bam HI double-digested DNAs show unique hybridization at 1.85 kb (center panel), as do control DNAs from several ages that were methylated prior to triple enzyme digestion (right panel).
We performed additional methylation experiments (Supplemental Fig. S2) using triple Eco RI/Bam HI digests with Aci I (left panel), Fau I (center panel), or Hha I (right panel) and obtained similar results at all ages, indicating that the Klf1 CGI is hypomethylated in the testis samples throughout development. Together, the four methylation-sensitive enzymes interrogate 50% (46/92) of the CG dinucleotides in the exon 2 CGI (Supplemental Fig. S3), including 59% (19/32) of the mouse-human conserved CGs; thus, we conclude that the Klf1 exon 2 CGI is hypomethylated in virtually all cells at all stages of testis development as well as in young adult spleen. This finding is consistent with the deep-sequencing study by Maunakea et al. , who found that the KLF1 CGI in cells from human forebrain cortex is essentially unmethylated.
While considering the contribution of KLF1 expression to regulation of the Cldn11 gene in mouse testis, we encountered diffuse transcription initiation in exon 2 of the Klf1 gene. In previous analyses of erythroid tissue and a number of bone marrow-derived stem cell lines, the exon 1 TSS has been the only site identified [23, 40]. However, transcripts of 0.8–0.9 kb in length were previously observed in testis by Lingrel et al. . In the current study, we have determined the intragenic TSSs of these RNAs in exon 2 and have characterized their temporal expression during postnatal testis development. In addition, we observe full-length Klf1 transcripts in testis as well as protein expression in Sertoli cells and spermatogonia.
Mouse transcriptome support of the exon 2 Klf1 TSSs is not limited to testis; FANTOM CAGE data (Supplemental Fig. S1 and Supplemental Table S1), which are high-quality, rigorously characterized data sets derived from direct sequencing, support intragenic TSSs in bone marrow, liver, cerebellum, and lung. Together with deep-sequencing and Northern blot evidence of intragenic initiation, this multitissue expression profile supports our contention that truncated Klf1 RNAs arise from specific, nonstochastic intragenic initiation. Furthermore, the absence of these RNA species in spleen total RNA (this study and Anderson et al. ) indicates that intragenic transcription from the Klf1 CGI is tissue restricted. Finally, our Southern blot analysis suggests that expression of these truncated, noncoding transcripts is not associated with developmentally regulated or tissue-specific hypermethylation of the overlapping CGI. Nevertheless, we have examined only 50% of the CpGs in the CGI and cannot exclude the possibility that low-level or site-specific methylation may regulate the developmental expression of Klf1.
Several lines of evidence presented here argue for intragenic transcription initiation of the Klf1 gene in testis. First, we demonstrate developmentally regulated expression of a 0.8- to 0.9-kb transcript lacking exon 1 of the gene using Northern blot (Fig. 1). Second, because our 5′ RACE reaction is based on cap trapping and 5′ cap addition to primary transcripts occurs cotranscriptionally for short genes (reviewed in Cowling ), it is likely that our cloned cDNAs arise from the TSSs of a bona fide TATA-less promoter rather than RNA from truncation artifacts. Third, in silico prediction of an evolutionarily conserved intragenic CGI overlying exon 2 of KLF1 in placental mammals from dog to human, as well as methylation-dependent cloning of this region from human blood cell genomic DNA , is suggestive of function. Fourth, in silico alignment to the Klf1 gene of EST clusters from multiple libraries, tissues, and species indicates that exon 2 TSS clusters are evolutionarily conserved, even though they do not generally contain ORFs of substantial length or homology to KLF1 (Supplemental Table S2).
Two recently published genome-wide analyses also lend support to the existence of exon 2 intragenic TSSs in the KLF1 gene from multiple species. In one study , a digital DNase I hypersensitivity analysis of human ectoderm-, mesoderm-, and endoderm-derived ENCODE cell lines (UCSC Genome Browser track: Digital DNAse I Hypersensitivity, hg19) reveals an open chromatin conformation across the 3′ end of exon 2 in the KLF1 gene for most of the 74 cell lines tested in duplicate, which is consistent with transcriptional initiation of truncated transcripts. A fine-scale analysis of the DNase I hypersensitivity region shows that it extends through intron 2 to the start of exon 3, coincident with the 5′ end (putative TSS) of the EST, W57216.
In the second study of adult human forebrain chromatin , histone H3 lysine 4 trimethylation (H3K4Me3) status, which is a well-established marker of transcriptional activity at canonical promoters, is indicative of transcription initiation across the 3′ end of exon 2 of KLF1 (refer to Worksheet 2 of Supplemental File 1 from Maunakea et al.  at hg18 chr19:12857086–12858039). In addition, RNA-seq analysis in this tissue reveals a TSS centered at nucleotide 647 of the KLF1 mRNA (NM_006563.3), which corresponds to nucleotide 667 of mouse Klf1 (NM_010635.2), only 68 bp upstream of the most abundant 5′ RACE cluster. Together, these studies provide evidence that KLF1 exon 2 is characterized by a chromatin conformation that is consistent with transcriptional initiation and that truncated mRNAs from this gene arise in at least two tissues from different species.
It is unlikely that functional proteins could be generated from AUG-dependent initiator codons, particularly for those with homology to KLF1, although a possible exception is found in sheep. However, there is potential for noncanonical translation of a family of amino terminally truncated KLF1 isoforms in multiple species from dog to human. Supplemental Figure S4 shows four conserved in-frame non-AUG-dependent initiator codons in exon 2, downstream of the TSSs identified in the current study (Fig. 5B), that could initiate translation. With regard to our 5′ RACE analysis, the major TSS at nucleotide 735 (Fig. 3) would lead to translation initiation at nucleotide 813, and the resulting protein isoform would be 13.4 kD and would include the nuclear localization signals at amino acids 275–296/293–376  as well as the full-length zinc finger domain for binding to its CCN CNC CCN target sites in DNA.
In light of functional studies to characterize the KLF1 transactivation domain (reviewed in Siatecka and Bieker ), the properties of the short isoforms of KLF1 could differ significantly from the full-length protein in several respects. First, short isoforms would have a longer half-life in the cell because they lack two amino-terminal PEST sequences . Second, they would be comparatively weak transcriptional activators because they include a critical acetylation site at lysine 288 for interacting with SWI/SNF-related complexes but lack a regulatory phosphorylation site at threonine 41. Finally, the truncated proteins would be relatively weak repressors because they include the acetylation site at lysine 302 for binding to Sin3A/HDAC1 but lack the sumoylation site at lysine 74 that can bind to the NuRD inhibitor complex. Thus, because KLF1 probably interacts with its target binding sites on DNA as a monomer , the greater stability of the short isoforms could enable them to competitively inhibit the recruitment of full-length KLF1 to transcriptional complexes and, thereby, influence gene expression.
Despite the potential for short non-AUG-dependent KLF1s in testis, we cannot detect these proteins on Western blots from testis or spleen. Three monoclonal antibodies have been raised against full-length KLF1, and their approximate binding sites have been mapped by Western blot of truncated recombinant proteins from bacteria [50, 51]. Only the 6B3 antibody recognizes KLF1 from testis homogenates (data not shown), and its epitope lies within the amino-terminal 60 amino acids of KLF1. Accordingly, this antibody recognizes the full-length protein but would not recognize the theoretical truncated isoforms. The commercial carboxyl-terminal antibody (LS) used in this study also recognizes the full-length protein but does not reveal any proteins approximating 13 kD.
An aspect of our study of potential interest is the developmental activation of Klf1 exon 2 TSS clusters from P20 (Fig. 1) to adulthood, which is concomitant with increasing abundance of round and condensed spermatids (Fig. 7). Germ line cells make up 75% of cells in the seminiferous epithelium at the beginning of this developmental period  and rise to 95% by P45 and beyond , with relatively transcriptionally inactive meiotic spermatocytes accounting for the vast majority. The temporal correlation between truncated Klf1 RNA induction and spermatocyte dominance of the seminiferous epithelium during development suggests germ cell-based expression of these RNAs. The levels of full-length Klf1 transcript increase up to P20, when somatic cells are abundant, and subsequently remain relatively constant in similar fashion to the proportion of somatic cells, which falls below 5% by P35. Indeed, we are able to confirm Sertoli cell expression of canonical KLF1 using immunocytochemistry (Fig. 2). Together, these data are consistent with expression of full-length Klf1 mRNA by Sertoli and germ cells and truncated Klf1 transcripts by differentiating germ line cells, although we have not directly demonstrated that spermatids express the 0.8-kb Klf1 transcript.
In light of our data and taking into account the lack of EST evidence for antisense transcripts in the Klf1 gene (antisense CAGE tags in Supplemental Fig. S1 are of very low abundance and are likely background), it is tempting to speculate that TSS choice and evolutionarily conserved transcription of spliced ORF-less RNAs from exon 2 initiation sites may repress canonical Klf1 expression by diverting transcriptional machinery away from the core promoter at exon 1 by a mechanism known as squelching . Alternatively, recent work on promoter-associated short RNAs  shows that posttranscriptional processing may generate intermediates with 5′ caps that are indistinguishable from mature functional mRNAs. However, the ENCODE DNase I hypersensitivity data and the H3K4Me3 deep-sequencing profiles provide two independent lines of evidence that exon 2 of human KLF1 has an epigenetic configuration with multiple properties of a TSS [5, 47]. This chromatin state strongly suggests that the truncated Klf1 transcripts initiating in this region in multiple species are genuine transcription initiation events rather than products of posttranscriptional cleavage and recapping.
We thank Michael Bradley and Rhochelle Krawetz, CMMG, Wayne State University, for technical assistance with cloning and annotation and Dr. Xin Wu, CMMG, for use of Northern blots from other projects. We also thank Dr. James Bieker, Developmental and Regenerative Biology, Mount Sinai School of Medicine, New York, for his critique of our manuscript and helpful suggestions.
1Supported by a grant to A.G. from the National Institutes of Health, NIDCD (DC006262).