Cloning of mmGCN5.
In order to study the function of acetyltransferases in a mammalian system, we set out to clone mouse GCN5 homologs. First, we generated a fragment of the mmGCN5 cDNA by a nested PCR strategy employing degenerate primers corresponding to conserved regions of the yeast GCN5 and the Tetrahymena p55 genes. To further increase the probability of identifying GCN5-related sequences, we used this fragment together with a human GCN5 EST to screen a 13.5-dpc mouse embryonic cDNA library under low-stringency conditions (11). Multiple positive clones were identified, and sequencing showed that they contained open reading frames predicted to encode proteins with significant homology to either hsGCN5 or hsP/CAF (Fig. ).
FIG. 1 Alignment of the mmGCN5, mmP/CAF, and reported hsGCN5 amino acid sequences. Identical amino acids are shaded. Amino acid deletions are indicated with a dotted line. The locations of the histone acetyltransferase (HAT)/acetyl coenzyme A binding regions …
One cDNA clone contained an open reading frame encoding 756 amino acids. Over the length of the predicted proteins, the C-terminal portion of this predicted amino acid sequence was 98% identical to the reported hsGCN5 sequence but only 71% homologous to the hsP/CAF sequence (Fig. A). We therefore tentatively concluded that this cDNA clone encodes mmGCN5, as confirmed below.
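Percent identity and homology figures such as these depend on the alignment and on how gaps are counted, and the section does not state which program or convention was used. As a minimal sketch, assuming a pairwise alignment is already available as two equal-length gapped strings, identity over ungapped columns can be computed as follows. The function name is hypothetical, and the sketch counts identical residues only, not the similar residues that a "homology" score usually also credits.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two already-aligned sequences.

    Columns where either sequence has a gap are skipped, so the denominator
    is the number of ungapped aligned columns. Other conventions (e.g.
    dividing by the length of the shorter sequence) give different numbers.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy peptides for illustration only; these are not GCN5 or P/CAF sequences.
print(round(percent_identity("MKR-LLQA", "MKRSLIQA"), 1))  # → 85.7
```

Here 6 of the 7 ungapped columns match; counting the gapped column in the denominator instead would give 75%, which is why the convention matters when comparing published figures.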
FIG. 2 Comparison of GCN5 and P/CAF sequences across species. (A) Schematic comparisons of mmGCN5 and mmP/CAF sequences with GCN5 and P/CAF proteins from other species. Published sequences were obtained by searching the PIR-Protein and SWISS-PROT databases. …
We next used a fragment from the 5′ end of this clone to screen a library of mouse genomic sequences. Three different clones were isolated, and restriction analysis and sequencing indicated that all three harbored the entire mmGCN5 gene. Comparison of the genomic and cDNA clones of mmGCN5 revealed that the cDNA clone described above lacked the first 74 amino-terminal codons and that the mmGCN5 gene is divided into 19 exons separated by relatively small introns (85 bp to 1 kb) (Fig. A). We inserted sequences from the genomic clone containing the missing amino-terminal codons into the cDNA clone to generate a full-length recombinant mmGCN5 cDNA encoding 830 amino acids.
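Recognizing that a cDNA clone is missing amino-terminal codons comes down to comparing open reading frames in the cDNA and genomic sequences. The following is a minimal sketch of a forward-strand ORF scan, assuming an ORF is defined as an ATG through the first in-frame stop codon. This is a simplification: it ignores the reverse strand, non-ATG starts, and ORFs that run off either end of a truncated clone. The helper names are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    `end` is exclusive and includes the stop codon; (-1, -1) means no ORF.
    ORFs lacking an in-frame stop before the end of `seq` are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = (-1, -1)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


def orf_length_aa(orf: Tuple[int, int]) -> int:
    """Number of encoded amino acids (stop codon excluded)."""
    start, end = orf
    return 0 if start < 0 else (end - start) // 3 - 1


orf = longest_orf("CCATGAAATTTTAGGG")
print(orf, orf_length_aa(orf))  # → (2, 14) 3
```

Applied to the mmGCN5 clones, the same kind of comparison shows the truncation directly: the cDNA's 756 codons plus the 74 codons recovered from the genomic clone give the 830-codon full-length reading frame.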
The two previously reported hsGCN5 sequences differ in the position of the initiating methionine, such that one reported sequence contains 49 additional amino-terminal amino acids relative to the other (7). The mmGCN5 open reading frame also encodes these additional amino acids, but it extends further upstream, potentially encoding 356 additional amino acids. The sequence context of the predicted translation initiation site in this extended region of mmGCN5 matches the Kozak consensus sequence well (Fig. B) (21). Moreover, the amino acids in this amino-terminal extension are more than 66% identical to the corresponding regions of both mouse (see below) and human P/CAF, and the extended region is similar in length to that of the P/CAF proteins. These data indicate that mmGCN5 encodes a protein that is highly homologous to yeast Gcn5p and almost identical to the previously reported hsGCN5 but that contains an extended N-terminal domain resembling P/CAF in both size and sequence.
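The Kozak consensus around a start codon is often written GCCRCCAUGG. The purine (A or G) at position −3 and the G at position +4, counting the A of AUG as +1, are usually treated as the most influential positions. The section does not show the sequence or say how the match was scored, so the following is only a minimal sketch, assuming that simplified two-position rule. The function name and the strong/adequate/weak labels are illustrative.

```python
def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the start-codon context of the ATG beginning at atg_index.

    Simplified rule: checks only a purine at -3 and G at +4 (A of ATG = +1).
    Returns "strong" (both present), "adequate" (one), or "weak" (neither).
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence around the ATG")
    purine_minus3 = seq[atg_index - 3] in "AG"   # position -3
    g_plus4 = seq[atg_index + 3] == "G"          # position +4
    if purine_minus3 and g_plus4:
        return "strong"
    if purine_minus3 or g_plus4:
        return "adequate"
    return "weak"


print(kozak_context("GCCACCATGG", 6))  # → strong
print(kozak_context("GCCTCCATGA", 6))  # → weak
```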
Incomplete splicing might yield a shorter GCN5 protein in mouse and human cells.
We wanted to determine the basis of the size discrepancy between mmGCN5 and the reported human cDNA. Inspection of the mmGCN5 genomic sequence revealed an intron (intron 6 in Fig. A) located 10 bp upstream of the most upstream of the previously reported hsGCN5 translation initiation sites (41). Sequences highly similar (91% identical) to these intronic sequences are also present in the predicted 5′ untranslated region of the reported hsGCN5 cDNA but are absent from the mouse cDNA isolated as described above. These comparisons suggest either that the mouse and human GCN5 genes undergo different splicing events, in which this intron is removed (mouse) or retained (human), or that the previously identified human cDNA sequence is incomplete. Interestingly, a conserved, in-frame stop codon lies near the beginning of intron 6. Retention of this intron would prevent translation of the larger protein in both mouse and human cells, perhaps yielding a smaller protein of the size previously predicted for hsGCN5.
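The argument that retaining intron 6 would block translation of the long protein rests on a reading-frame check. First, read from the upstream start codon across the spliced junction. Then repeat the reading with the intron left in place and see whether an in-frame stop now appears inside the intron. The following is a minimal sketch with made-up toy sequences (not the GCN5 sequences), assuming the ATG lies in the upstream exon. The helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str, start: int) -> int:
    """Index of the first stop codon in frame with `start`, or -1 if none."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def truncated_by_retention(exon_a: str, intron: str, exon_b: str, atg: int) -> bool:
    """True if keeping `intron` introduces an in-frame stop that ends the ORF
    inside the retained intron, where the spliced mRNA would read through.

    `atg` is the index of the start codon within `exon_a`.
    """
    exon_a, intron, exon_b = exon_a.upper(), intron.upper(), exon_b.upper()
    junction = len(exon_a)
    stop_spliced = first_stop(exon_a + exon_b, atg)
    if stop_spliced != -1 and stop_spliced + 3 <= junction:
        return False  # ORF already ends inside exon_a; retention is irrelevant
    stop_retained = first_stop(exon_a + intron + exon_b, atg)
    # exon_a has no complete in-frame stop (checked above), so any stop that
    # begins before the end of the intron must use intron bases.
    return stop_retained != -1 and stop_retained < junction + len(intron)


print(truncated_by_retention("ATGGCC", "GTTTAGAG", "AAACCCTAA", 0))  # → True
print(truncated_by_retention("ATGGCC", "GTCCCAAG", "AAACCCTAA", 0))  # → False
```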
To investigate the possibility of alternative (or incomplete) splicing of mouse and human GCN5 transcripts, we performed RT-PCR on total RNA isolated from human HeLa cells, human hepatoma cells, mouse kidney, mouse ovary, and a 13.5-dpc mouse embryo. All RNAs were treated with RNase-free DNase I before RT-PCR to remove contaminating genomic DNA. An mmGCN5 genomic DNA clone was amplified in a separate reaction as a positive control for the presence of the intron sequences. Amplification used two primers corresponding to conserved sequences in exons 6 and 8, which flank introns 6 and 7 (Fig. A and B). The RT-PCR products were separated on an agarose gel, transferred to a membrane, and then probed sequentially with mmGCN5 cDNA sequences and intron 6 sequences.
A predominant RT-PCR product of the size expected for the spliced cDNA (lacking the intron) was amplified from mouse embryonic, kidney, and ovarian RNAs (lower band in Fig. B). As expected, this product was much smaller (126 bp) than the product amplified from genomic DNA (about 1 kb), which contains introns 6 and 7. This small product hybridized to the mmGCN5 cDNA sequences but not to the intron 6 probe, consistent with removal of these intronic sequences by splicing. In contrast, two less abundant, closely spaced bands were detected by both the cDNA and the intron 6 probes. An intron 7 probe hybridized only to the genomic DNA and did not detect any of the RT-PCR products (data not shown), suggesting that intron 7 had been removed from all of the transcripts. Sequencing of the larger, closely spaced RT-PCR products revealed that they represent two alternatively spliced variants of mmGCN5 (Fig. C). Both variants retained intron 6, and one also contained a novel 25-bp exon (exon 7) located between introns 6 and 7. Intron 7 was removed from both alternatively spliced products, placing the stop codons in intron 6 just upstream of the ATG corresponding to the previously predicted translation start site of hsGCN5. Together, these data indicate that the predominant form of the mouse transcript is completely spliced, lacks these stop codons, and is therefore predicted to encode the longer version of GCN5. However, the two minor RT-PCR products might encode shorter GCN5 proteins, consisting either of the amino-terminal, P/CAF-like domain in isolation or of the C-terminal domain, which is most similar to yeast GCN5.
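The band pattern follows from adding segment lengths. Every product carries the same exon 6 and exon 8 sequence between the primers, which gives 126 bp in the fully spliced form, and each retained segment adds its own length. The length of intron 6 is not given in this section, so it is a parameter here, and the value in the example is a hypothetical placeholder. The function name is illustrative.

```python
def expected_rtpcr_sizes(spliced_bp: int, intron6_bp: int, exon7_bp: int = 25) -> dict:
    """Predicted RT-PCR product sizes (bp) for the exon 6/exon 8 primer pair.

    spliced_bp: product from the fully spliced mRNA (126 bp in the text).
    intron6_bp: length of intron 6 (not given here; supply the real value).
    exon7_bp:   length of the alternative exon 7 (25 bp in the text).
    """
    return {
        "fully spliced": spliced_bp,
        "intron 6 retained": spliced_bp + intron6_bp,
        "intron 6 retained + exon 7": spliced_bp + intron6_bp + exon7_bp,
    }


# 300 bp is a hypothetical placeholder, not the measured length of intron 6.
for form, size in expected_rtpcr_sizes(126, 300).items():
    print(f"{form}: {size} bp")
```

Whatever the true intron 6 length, the two intron-retaining products differ by exactly the 25 bp of exon 7. This is consistent with the closely spaced doublet detected by both probes.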
RT-PCR of total RNA from human cells revealed a similar mixture of completely and incompletely spliced RNAs. Two RT-PCR products were generated from both the HeLa cell and the hepatoma cell RNAs. The size of the more abundant, smaller product is again consistent with a spliced cDNA lacking sequences homologous to mouse intron 6 and exon 7, and this product hybridized only to cDNA sequences. The less abundant, larger product hybridized to both intron and exon sequences (Fig. B, middle panel). We suggest that the longer product corresponds to the previously reported hsGCN5 cDNA sequences, whereas the more prevalent, shorter form represents a spliced product predicted to encode a longer protein analogous to that encoded by the mouse cDNA isolated as described above.
Long GCN5 proteins are present in both human and mouse cells.
To determine the size of the native mammalian GCN5 protein(s), we probed total cell extracts prepared from a 12.5-dpc mouse embryo or human HeLa cells with a polyclonal serum raised against the previously described hsGCN5 (generously provided by Shelley Berger, Wistar Institute). The hsGCN5-specific antiserum detected a 98-kDa protein in the HeLa cell nuclear extracts, consistent with the predicted size of the full-length GCN5 protein containing the extended amino-terminal region (Fig. A, left panel). To ensure that this band corresponded to GCN5 and that the hsGCN5 antibody did not cross-react with P/CAF, we compared the signals obtained with the hsGCN5 antibody and a P/CAF antibody (generously provided by Yoshihiro Nakatani, National Institutes of Health) in extracts from U2OS cells or HeLa cells. The P/CAF antibody recognized a single band in the U2OS extract, consistent with previous reports that P/CAF is well expressed in these cells (41), and in the HeLa cell nuclear extract. The hsGCN5 antibody, however, did not recognize a protein comigrating with the P/CAF band in either extract, although it did recognize a prominent band of ~98 kDa in the HeLa cell nuclear extract. Therefore, the hsGCN5 antibody does not appear to cross-react significantly with P/CAF, and we conclude that the 98-kDa protein recognized by this antibody in HeLa cell extracts is GCN5.
FIG. 4 Detection of both long and short GCN5 proteins. (A) Left panel, protein extracts were prepared from U2OS cells or HeLa cell nuclei and probed with polyclonal antibodies to hsP/CAF or hsGCN5, as indicated. Right panel, protein extracts were prepared from …
The hsGCN5 antibody also recognized a faint 60-kDa band (lower arrow in right panel of Fig. A) in the HeLa cell extracts, close to the predicted size of the shorter GCN5 protein described previously (38) and above. Thus, both the long and short forms of GCN5 appear to be expressed in these cells, with the long form predominating. Interestingly, the long form of GCN5 was the only form detected in mouse embryo extracts. The presence of GCN5 protein in the embryonic extracts is consistent with the high levels of GCN5-specific RNA detected in these tissues (see Fig. ). Moreover, because only very low levels of P/CAF RNA were detected at this (or any) stage of mouse embryogenesis (data not shown and see Fig. ), these data further support our conclusion that the hsGCN5 antibody recognizes mmGCN5 rather than mmP/CAF. Neither the long nor the short form of GCN5 was detected by control preimmune serum in either the mouse or the human extracts (data not shown).
FIG. 5 Ubiquitous and complementary expression of mmGCN5 and mmP/CAF. (Top panel) Total RNA was isolated from various mouse tissues and embryos as indicated. Northern blot hybridization was performed with a mixture of mmGCN5 and mmP/CAF cDNA probes. Two transcripts …
We also used the anti-hsGCN5 serum to immunoprecipitate GCN5 proteins from the mouse embryo extract. Precipitated proteins were then detected by Western blotting with the same serum. Again, a 98-kDa protein was detected by the hsGCN5 antibody but not by a control rabbit serum (Fig. B). Unfortunately, the shorter form of GCN5, if it was present, would comigrate with the immunoglobulin G band and thus could not be detected by this approach. Nevertheless, these experiments confirm the presence of the longer GCN5 protein in mouse embryos.
Cloning of mmP/CAF.
A second GCN5-related cDNA clone with a high degree of similarity to hsP/CAF was isolated in our screen of the mouse cDNA library. Because all initial clones appeared to be incomplete, containing an 867-bp fragment of the cDNA (relative to the human sequence), a second library was screened by using GeneTrapper technology. Multiple full-length cDNAs containing an open reading frame predicted to encode 813 amino acids were obtained. This open reading frame was 93% identical to the hsP/CAF cDNA sequence but only 75% identical to the reported hsGCN5 cDNA sequence (41). We therefore designated this clone mmP/CAF.
Both the mmGCN5 and the mmP/CAF sequences contain the predicted catalytic domains and bromodomains found in a number of recently identified histone acetyltransferases, including several highly conserved amino acids near the putative catalytic center (Fig. ).
Using a fragment from the 5′ region of the mmP/CAF cDNA as a probe, we identified multiple clones containing P/CAF sequences in a library of mouse genomic sequences. Four of these contained different portions of the cDNA sequence. These clones indicate that, in contrast to the mmGCN5 gene, which contains small introns (85 bp to 1 kb), the mmP/CAF gene contains very large introns (16 to 20 kb). Because of these large introns, we have not completed cloning of the mmP/CAF genomic sequences.
Interestingly, several clones identified in our genomic screens apparently contain a P/CAF pseudogene. These clones contain no intronic sequences, and several base substitutions relative to the cDNA sequence are scattered throughout the predicted coding region of the pseudogene. RT-PCR analysis indicated that the pseudogene is not expressed in any of several mouse tissues examined, including brain, eye, heart, lung, liver, kidney, thymus, spleen, fat, diaphragm, small intestine, ovary, testis, and a 13.5-dpc embryo (data not shown).
Ubiquitous but complementary expression of mmGCN5 and mmP/CAF.
To examine and compare the expression of mmGCN5 and mmP/CAF, total RNA was extracted from various mouse tissues, subjected to denaturing electrophoresis, transferred to a membrane, and then probed with mmGCN5- or mmP/CAF-specific sequences.
A single 3.3-kb transcript was detected in all tissues with the GCN5 probe, consistent with the size of the cDNA clone we isolated. Similarly, a single, ubiquitous transcript was detected with the P/CAF probe, and the size of this RNA, 4.4 kb, is similar to that of the P/CAF cDNA that we isolated. Interestingly, the P/CAF RNA always produced a broader band than did the GCN5 RNA. The two RNAs were clearly distinguishable when probed on the same blot, and they showed different patterns of expression (Fig. ). For example, the ratio of mmGCN5 to mmP/CAF expression is higher in brain, thymus, spleen, testis, and 13.5-dpc embryonic tissue, whereas it is much lower in heart, liver, kidney, and skeletal muscle. Western blot analysis of GCN5 protein levels in various mouse tissues (with the polyclonal antiserum to hsGCN5 described above) confirmed the general pattern of expression indicated by this RNA analysis (data not shown).
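Because both probes were hybridized to the same blot, the GCN5 and P/CAF signals in a given lane share the same RNA loading. Their ratio can therefore be compared across tissues without separate loading normalization. The following is a minimal sketch, assuming band intensities have already been quantified, for example by densitometry. The sample values are invented placeholders, not data from this study, and the function names and the median split are illustrative choices.

```python
from statistics import median
from typing import Dict, List, Tuple


def gcn5_pcaf_ratios(signals: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map sample -> GCN5/P/CAF ratio from (GCN5, P/CAF) band intensities.

    Both bands come from the same lane, so RNA loading cancels in the ratio.
    Samples with a zero P/CAF signal are skipped to avoid division by zero.
    """
    return {name: g / p for name, (g, p) in signals.items() if p > 0}


def split_by_median(ratios: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (samples above the median ratio, samples at or below it)."""
    m = median(ratios.values())
    high = sorted(name for name, r in ratios.items() if r > m)
    low = sorted(name for name, r in ratios.items() if r <= m)
    return high, low


# Invented placeholder intensities (arbitrary units); not data from this study.
toy = {"sample_1": (8.0, 2.0), "sample_2": (6.0, 2.0),
       "sample_3": (1.0, 4.0), "sample_4": (1.5, 5.0)}
print(split_by_median(gcn5_pcaf_ratios(toy)))
```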
Chromosomal locations of the mmGCN5 and mmP/CAF genes.
The chromosomal location of the mmGCN5 gene was mapped by standard linkage analysis with the Jackson Laboratory interspecific backcross panel (C57BL/6Jei × SPRET/Ei)F1 × SPRET/Ei, also known as Jackson BSS (33). mmGCN5 mapped cleanly to a distal region of chromosome 11 and cosegregated tightly with BRCA1, as well as with a number of other genes previously mapped to that locus (data not shown, but raw data from the Jackson Laboratory are available at http://www.jax.org/resources/documents/cmdata). Interestingly, the hsGCN5 gene was recently mapped by fluorescence in situ hybridization to a syntenic region of human chromosome 17 (9) and was also found to cosegregate with human BRCA1.
The location of mmP/CAF was mapped in the same way, with the same backcross panel. In this case we used a probe specific for intronic sequences to ensure that we mapped the authentic mmP/CAF gene and not the intronless P/CAF pseudogene. This analysis placed mmP/CAF 32 centimorgans from the centromere of mouse chromosome 17, where it cosegregates with the DNA marker D17Bir8 (raw data are available at the web address given above).
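In a backcross, the genetic distance between two loci is estimated from the fraction of progeny that are recombinant between them. Loci that cosegregate show no, or almost no, recombinants in the panel. The following is a minimal sketch of that arithmetic, assuming simple counts of recombinant and total backcross progeny. It reports the raw recombination fraction in centimorgans, its binomial standard error, and the Haldane map distance, which corrects for undetected double crossovers. It illustrates the standard calculation, not the Jackson Laboratory's own analysis pipeline, and the function name and counts are hypothetical.

```python
import math
from typing import Tuple


def backcross_distance(recombinants: int, total: int) -> Tuple[float, float, float]:
    """Return (raw cM, standard error in cM, Haldane cM) from backcross counts."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    r = recombinants / total          # recombination fraction
    if r >= 0.5:
        raise ValueError("r >= 0.5: loci appear unlinked")
    se = math.sqrt(r * (1 - r) / total)        # binomial standard error
    haldane = -50.0 * math.log(1 - 2 * r)      # Haldane mapping function, in cM
    return 100 * r, 100 * se, haldane


# Hypothetical counts: 3 recombinants among 100 backcross progeny.
raw, se, hal = backcross_distance(3, 100)
print(f"{raw:.1f} +/- {se:.1f} cM (Haldane {hal:.2f} cM)")
```

Zero recombinants gives a distance of 0 cM, which is what "cosegregated tightly" corresponds to in practice, although a finite panel can only bound the true distance from above.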
mmGCN5 encodes a histone acetyltransferase with substrate specificity similar to that of P/CAF.
The high degree of homology among the mouse, human, and yeast GCN5 proteins strongly predicts that mmGCN5 and mmP/CAF will exhibit histone acetyltransferase activity. We confirmed this initially by examining the activities of the isolated, conserved acetylase domains of mmGCN5 and mmP/CAF, expressed as recombinant proteins in Escherichia coli. As expected, this domain of mmGCN5 was quite active as a histone acetylase and preferentially acetylated free (nonnucleosomal) histone H3 and, to a lesser degree, H4, as do yeast Gcn5p (23) and the previously reported form of the hsGCN5 protein (41). Full-length mmGCN5 and mmP/CAF recombinant proteins (also expressed in bacteria) exhibited the same substrate specificity toward free histones (Fig. and data not shown).
FIG. 6 Acetylation of histone H3 synthetic peptides by mmGCN5 and mmP/CAF. (A and B) Results of acetyltransferase assays with recombinant full-length mmGCN5 and synthetic peptides corresponding to the amino-terminal tail of histone H3. (C and D) Results of peptide …
To determine which residues of histone H3 are acetylated by mmGCN5, we performed assays with synthetic peptides corresponding to the amino-terminal tail of this histone. As expected, the full-length GCN5 protein efficiently acetylated peptides corresponding to the first 20 amino acids of histone H3 (Fig. A and B), so this region of H3 alone is sufficient for binding to the enzyme and subsequent catalysis. However, mmGCN5 could not acetylate a peptide carrying acetyl-lysine at positions 9 and 14 (Fig. B), suggesting that one or both of these lysines may be a target site for mmGCN5. In contrast, mmGCN5 readily acetylated a peptide carrying acetyl-lysine at positions 9 and 18 (Fig. A). Because K9 is blocked in both peptides, the loss of activity when K14 is also blocked, but not when K18 is blocked, points to K14 as the preferred acetylation site in H3 for mmGCN5. Similar assays performed with H4 peptides indicate that K8 is the preferred site of acetylation in H4 (data not shown). These results are consistent with the site specificity determined for recombinant yeast Gcn5p, which was confirmed by protein sequencing of acetylated histones (23). Importantly, these results indicate that the extended amino-terminal domain of mmGCN5 does not change the histone or lysine residue specificity of the enzyme.
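The peptide logic can be written as an elimination over candidate lysines. Residues 1 to 20 of histone H3 (ARTKQTARKSTGGKAPRKQL) contain lysines at positions 4, 9, 14, and 18. Assume the enzyme has a single preferred site per peptide; this is an assumption, not something the assays prove. Under it, a peptide that is still acetylated cannot have that site blocked, and a peptide that is not acetylated must have it blocked. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

H3_TAIL = "ARTKQTARKSTGGKAPRKQL"  # histone H3 residues 1-20


def lysine_positions(peptide: str) -> Set[int]:
    """1-based positions of lysines (K) in a peptide."""
    return {i + 1 for i, aa in enumerate(peptide) if aa == "K"}


def infer_site(candidates: Set[int], assays: Iterable[Tuple[Set[int], bool]]) -> Set[int]:
    """Narrow down the acetylation site from pre-acetylated peptide assays.

    Each assay is (positions already acetylated, was the peptide acetylated?).
    Assumes the enzyme uses a single preferred lysine.
    """
    possible = set(candidates)
    for blocked, acetylated in assays:
        if acetylated:
            possible -= blocked   # the site was free in this peptide
        else:
            possible &= blocked   # the site must have been blocked
    return possible


assays = [
    (set(), True),      # unmodified H3 1-20 peptide: acetylated
    ({9, 14}, False),   # K9/K14-acetylated peptide: not acetylated
    ({9, 18}, True),    # K9/K18-acetylated peptide: acetylated
]
print(infer_site(lysine_positions(H3_TAIL), assays))  # → {14}
```

The K9/K14 peptide still has K4 and K18 free, so its failure to be acetylated also argues against K4 and K18 as the preferred site.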
The specificity of mmP/CAF was also tested with the peptide substrates. In all respects, mmP/CAF exhibited a substrate specificity identical to that of mmGCN5 (Fig. C and D).
One striking difference between the previously reported, shorter form of recombinant hsGCN5 (or yeast Gcn5p) and recombinant hsP/CAF was the ability of P/CAF to acetylate nucleosomal substrates (23). Given the homology between the amino-terminal portions of P/CAF and mmGCN5, we asked whether full-length recombinant mmGCN5 could also acetylate histones within a nucleosome. We found that mmGCN5, like hsP/CAF, can acetylate nucleosomal H3 and, to a lesser degree, H4 (Fig. ). In agreement with previously reported results (23), we also found that neither the short form of mmGCN5 nor yeast Gcn5p could acetylate nucleosomes (data not shown). These results suggest that one function of the amino-terminal domains of mammalian GCN5 and P/CAF may be to facilitate the recognition of chromatin templates.
FIG. 7 Acetylation of nucleosomal histones by mmGCN5 and mmP/CAF. Acetyltransferase assays were performed with HeLa cell mononucleosomes or free histones as indicated, and an aliquot of each assay mixture was resolved on an SDS–22% polyacrylamide …
mmGCN5 and mmP/CAF both interact with CBP and p300.
hsP/CAF interacts with CBP and p300 (41). Given the similarity between mmGCN5 and mmP/CAF, we examined the abilities of both of these proteins to bind to CBP or p300 in vitro (Fig. ).
FIG. 8 mmGCN5 and mmP/CAF both interact with CBP and p300 in vitro. (A) Fragments of CBP fused to GST that were used for the interaction assays in panel C. These fragments span the region of homology to ADA2 and extend into the transactivation domain of CBP …
Whole-cell lysates from bacteria expressing fragments of CBP fused to GST (fusion constructs were kindly provided by Y. Nakatani, National Institutes of Health) (41) were mixed with lysates from cells expressing the amino-terminal domain of mmP/CAF, the amino-terminal domain of mmGCN5, or the C-terminal domain of mmGCN5. The CBP fragments (A to F) spanned the ADA2 homology domain and extended into the transcriptional activation domain (41). A fragment of p300 (B′) homologous to the B fragment of CBP was also tested. GST fusion proteins were purified together with any interacting proteins by using glutathione-Sepharose, and the interacting proteins were identified by Western blotting with an antiserum specific for the six-histidine tag present in the recombinant mmP/CAF or mmGCN5 protein.
The amino-terminal domain of mmP/CAF selectively bound to fragments A and B of CBP and the corresponding B′ fragment of p300. In some experiments, we also observed binding to the D fragment, but we never observed binding to fragment C, E, or F. A deletion within the B fragment (ΔB) of CBP that removed residues 1801 to 1851 eliminated binding. As expected, this pattern of binding to the CBP/p300 fragments closely resembles that previously reported for hsP/CAF (41).
A recombinant form of hsGCN5 that lacked the amino-terminal domain reported here for mmGCN5 failed to bind CBP or p300 in previous experiments by Yang et al. (41). This form of hsGCN5 corresponds to the C-terminal region of mmGCN5. We therefore compared the binding of the amino-terminal and C-terminal halves of mmGCN5 to the GST-CBP and GST-p300 fragments. Surprisingly, both of these mmGCN5 domains bound to CBP fragments A to D, with little or no binding to fragment E, fragment F, or the ΔB fragment. Both the amino-terminal and C-terminal regions of mmGCN5 also bound to the p300 B′ fragment. The amounts of the GST fusion proteins that did not bind the GCN5 fragments, as recovered on glutathione-Sepharose, were greater than or equal to those of the GST fusions that did bind (data not shown), so the absence of binding was not due to reduced amounts of the E, F, or ΔB fragment. In addition, the selective binding of the mmGCN5 polypeptides to CBP fragments A to D indicates that these interactions are not mediated nonspecifically by the GST moiety, which is also present in fragments E, F, and ΔB. The specificity of the interactions was further tested with an unrelated protein, HIRA, which failed to bind to any of the GST-CBP or GST-p300 fragments; thus, CBP fragments A to D do not bind proteins nonspecifically. We conclude that mmGCN5 contains two distinct CBP/p300 interaction domains and that these domains interact with a broader region of CBP than does P/CAF. Importantly, our finding that mmGCN5 and mmP/CAF both interact with CBP/p300 indicates that these proteins are very similar in function as well as in structure.
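The claim that mmGCN5 contacts a broader region of CBP than P/CAF can be made explicit by tabulating the pull-down results described above and comparing them fragment by fragment. The following is a minimal sketch that encodes each outcome as bound, not bound, or variable. "Variable" is used for the D fragment with mmP/CAF, which bound only in some experiments, and the "little or no" binding of mmGCN5 to E, F, and ΔB is encoded as not bound. The dictionary layout and function name are illustrative choices.

```python
from typing import Dict, Set

FRAGMENTS = ["A", "B", "C", "D", "E", "F", "ΔB", "B'"]  # B' is the p300 fragment
GCN5_BOUND = {"A", "B", "C", "D", "B'"}

# Outcomes as described in the text: "yes", "no", or "variable".
binding: Dict[str, Dict[str, str]] = {
    "mmP/CAF N-term": {"A": "yes", "B": "yes", "C": "no", "D": "variable",
                       "E": "no", "F": "no", "ΔB": "no", "B'": "yes"},
    "mmGCN5 N-term": {f: ("yes" if f in GCN5_BOUND else "no") for f in FRAGMENTS},
    "mmGCN5 C-term": {f: ("yes" if f in GCN5_BOUND else "no") for f in FRAGMENTS},
}


def bound_only_by(first: str, second: str) -> Set[str]:
    """Fragments bound by `first` that `second` was never seen to bind."""
    return {f for f in FRAGMENTS
            if binding[first][f] == "yes" and binding[second][f] == "no"}


print(sorted(bound_only_by("mmGCN5 N-term", "mmP/CAF N-term")))  # → ['C']
print(sorted(bound_only_by("mmP/CAF N-term", "mmGCN5 C-term")))   # → []
```

Fragment C, bound by both mmGCN5 domains but never by mmP/CAF, is the clearest single difference. Fragment D adds to it only if the variable P/CAF result is treated as negative.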