Methylation at CpG dinucleotides in genomic DNA is a fundamental epigenetic mechanism of gene expression control in vertebrates. Proteins with a methyl-CpG-binding domain (MBD) can bind to single methylated CpGs and most of them are involved in transcription control. So far, five vertebrate MBD proteins have been described as MBD family members: MBD1, MBD2, MBD3, MBD4 and MECP2.
We performed database searches for new proteins containing an MBD and identified six amino acid sequences which are different from the previously described ones. Here we present a comparison of their MBD sequences, additional protein motifs and the expression of the encoding genes. A calculated unrooted dendrogram indicates the existence of at least four different groups of MBDs within these proteins. Two of these polypeptides, KIAA1461 and KIAA1887, were only present as predicted amino acid sequences based on a partial human cDNA. We investigated their expression by Northern blot analysis and found transcripts of ~8 kb and ~5 kb respectively, in all eight normal tissues studied.
Eleven polypeptides with an MBD could be identified in mouse and man. The analysis of protein domains suggests a role in transcriptional regulation for most of them. Knowledge of additional MBD proteins and their expression patterns is important in the context of Rett syndrome.
Methylation at CpG dinucleotides in genomic DNA is a fundamental epigenetic mechanism of gene expression control in vertebrates [1-3]. Strong evidence exists for a correlation between DNA hypermethylation, hypoacetylation of histones, tightly packed chromatin, and transcriptional repression. Effects of DNA methylation are mediated through proteins which bind to symmetrically methylated CpGs. Such proteins contain a specific domain, the methyl-CpG-binding domain (MBD) which consists of ~70 residues in an α/β-sandwich fold built of three to four β-twisted sheets and a helix with a characteristic hairpin loop in the opposite layer [4-7]. Recently, a transcriptional repressor protein, Kaiso, lacking an MBD, has also been shown to bind to methylated CpG dinucleotides. This binding is mediated through zinc finger motifs [8,9]. Members of the MBD protein family are found in different animal species. So far, five vertebrate MBD proteins have been identified as members of the MBD protein family: MBD1, MBD2, MBD3, MBD4 and MECP2 (for review, see: [6,10]). Except for MBD4, all of them are associated with histone deacetylases (HDAC), and a transcriptional repression mechanism mediated by the recruitment of HDACs has been shown for MECP2, MBD1 and MBD2 [11-13].
One of these five MBD proteins, MECP2, is implicated in a human neurological disorder called Rett syndrome [14,15]. Symptoms of this syndrome are mental retardation, loss of speech and purposeful hand use, autism, ataxia, and stereotypic hand movements. The similar phenotype of conditional Mecp2 knockout mice and in vitro studies of functional consequences of MECP2 mutations indicate that the disorder is due to a loss of MECP2 function in the nervous system [16-21]. It remains unknown why these patients present with a neurological phenotype although MECP2 is ubiquitously expressed. It has been proposed that MECP2 is complemented by other MBD proteins in non-neural tissues, and this hypothesis was tested for MBD2 by crossing Mecp2 knockout mice with Mbd2 knockout mice. However, no evidence for functional redundancy of these two genes could be found in this way.
Here we report two new polypeptide sequences with an MBD as well as four MBD proteins in man and mouse that had not been mentioned as MBD protein family members to date. Analysis of their amino acid sequences revealed additional domains associated with chromatin and points to a function in transcription control.
We used a bioinformatics approach with the MBD of human MECP2 as query sequence to search for new members of the MBD protein family. Initial standard BLAST searches of the NCBI, Celera and SwissProt databases returned only the five MBD proteins (MECP2, MBD1, MBD2, MBD3 and MBD4) which had previously been described and studied intensively. However, the search of protein domain family databases (NCBI, Pfam, Smart and Prosite) revealed similarities to the MBD of MECP2 for four additional proteins, i.e. BAZ2A/TIP5, BAZ2B, CLLD8 and SETDB1, and for two cDNAs, KIAA1461 and KIAA1887. These databases use Hidden Markov Models (HMM) to detect motifs in amino acid sequences. An MBD has been described in CLLD8, SETDB1 and BAZ2A/TIP5 so far [22-24].
Nine of the eleven MBD-containing protein sequences could also be detected by screening the Sequence Similarity DataBase (SSDB) at GenomeNet. The cDNAs for KIAA1461 and KIAA1887 were not found in the KEGG database that underlies the SSDB. These results are summarized in Tab. 1. MBD amino acid sequences of the five previously published and the six newly described human polypeptides were aligned (Fig. 1). A sequence logo derived from the alignment of all eleven sequences is shown in Fig. 2. These analyses implicate a small number of highly conserved and apparently essential amino acids within the domain. At three positions identical amino acids are present and five positions with conservative substitutions can be found.
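The distinction between identical positions and positions with conservative substitutions can be illustrated with a minimal Python sketch. The residue grouping and the toy alignment below are assumptions chosen for illustration only; they are not the grouping or the MBD sequences used for the figures.

```python
# Assumed, simplified grouping of chemically similar residues (illustrative only).
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("STNQ"), set("AG"),
]

def classify_columns(alignment):
    """Return (identical, conservative) column indices of a gapped alignment.

    A column is 'identical' if all sequences share one residue, and
    'conservative' if all residues fall into a single assumed group.
    Columns containing a gap ('-') are skipped.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    identical, conservative = [], []
    for i, column in enumerate(zip(*alignment)):
        residues = set(column)
        if "-" in residues:
            continue
        if len(residues) == 1:
            identical.append(i)
        elif any(residues <= group for group in CONSERVATIVE_GROUPS):
            conservative.append(i)
    return identical, conservative

# Hypothetical toy alignment, not taken from the MBD data.
toy_alignment = ["RKDW", "RKEW", "RREW"]
print(classify_columns(toy_alignment))
```

Applied to a real alignment, the counts depend on which substitutions are treated as conservative, so the grouping should be stated together with the result.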
A phylogenetic tree of the MBD amino acid sequences of all eleven polypeptides is shown in Fig. 3. Four major MBD subsets are indicated there. The MBDs of the originally described proteins (MBD1, MBD2, MBD3, MBD4 and MECP2) form one group, alongside a second (BAZ2A/TIP5, BAZ2B) and a third subset (CLLD8 and SETDB1), which are joined by a very short branch. KIAA1461 and KIAA1887 appear in a fourth branch. MBDs of the original five proteins are more similar to each other than to the novel ones, which explains why BLAST analyses with the MECP2 MBD query failed to identify the second, third or fourth class.
An analysis of the amino acid sequences revealed that the MBD was the only domain shared by all eleven sequences. The MBD of MECP2, MBD1, MBD2, MBD4 and BAZ2A/TIP5 mediates binding to DNA, in case of MECP2, MBD1 and MBD2 preferentially to methylated CpG [13,24,26-28]. MBD4 has a special role acting as a DNA repair enzyme that reverses spontaneous CpG to TpG base exchanges, thereby maintaining methylated CpG motifs. It binds preferably to m5CpG/GpT mismatches [29,30].
In the case of human MBD3 and SETDB1, the MBD has been shown to mediate protein-protein interactions [23,31]. Xenopus MBD3 is exceptional in its binding to methylated CpG, which can be explained by the difference of an amino acid residue within the MBD (Lys30) important for DNA binding. It remains to be determined whether the MBDs of BAZ2B, CLLD8, KIAA1461 and KIAA1887 mediate DNA binding or protein-protein interactions. Additional domains found in seven of the eleven polypeptides indicate that they are associated with chromatin and function in epigenetic mechanisms of gene regulation. Some of the proteins are already known to be involved in transcriptional repression, and the domains of the remainder strongly suggest a comparable function.
MECP2 recruits the Sin3A co-repressor complex and MBD2 the NuRD co-repressor complex, which itself contains MBD3. Both complexes contain HDACs, and MBD1 is also associated with HDAC activity although the identity of the deacetylase remains unknown. Within the C-terminal part of MECP2, a histidine and proline-rich region is present which is conserved in certain neural-specific transcription factors.
BAZ2A/TIP5 is part of the NoRC, nucleolar remodeling complex, which represses rDNA transcription by recruiting histone methyltransferases, HDACs and DNA methyltransferases.
BAZ2B has a domain structure similar to BAZ2A/TIP5: both contain a DDT domain (DNA binding homeobox and Different Transcription factors) and a tandem PHD-bromodomain. The PHD domain is a C4HC3 zinc-finger-like motif, and the bromodomain consists of 110 amino acids and is found in many chromatin-associated proteins that can interact specifically with acetylated lysine. Tandem PHD-bromodomains have been found in several transcriptional co-repressors. The DDT domain is exclusively associated with nuclear domains in other proteins and was found in different transcription and chromatin remodeling factors. An AT_hook motif (which allows binding to the minor groove of AT-rich DNA regions) was found in BAZ2A/TIP5 but not in BAZ2B.
SETDB1 is an H3-K9 histone methyltransferase; its mouse homologue, ESET, has furthermore been shown to interact with the mSin3A/B co-repressor complex. The SET domain is a signature motif for lysine-specific histone methyltransferases [37,38]. This domain is also present in CLLD8, to which no function has yet been assigned.
The predicted protein sequence of KIAA1461 harbors a PWWP motif named after the conserved amino acids Pro-Trp-Trp-Pro. It was first described in the WHSC1 protein, encoded by a gene within the Wolf-Hirschhorn syndrome critical region. The PWWP domain of Dnmt3b, a DNA methyltransferase, has recently been shown to bind to DNA. A common feature of PWWP-containing proteins is the presence of additional domains known to be associated with chromatin. Furthermore, KIAA1461 has been shown to interact with the KIAA1549 protein in a yeast two-hybrid experiment (http://www.kazusa.or.jp/huge/ppi). This interaction partner has not been studied in detail so far, but contains a serine-rich stretch as well as a helix-turn-helix motif (PS00622, LuxR family) according to Prosite (http://www.expasy.org/prosite). Helix-turn-helix motifs can be found in many transcription regulation proteins.
In the predicted protein sequence of KIAA1887, only a proline-rich extension (http://www.ebi.ac.uk/interpro) but no protein motif as such could be found.
The co-existence of MBDs and domains involved in chromatin modification, present in many of the identified polypeptides, could also point to a connection between the latter mechanism and methylated DNA. Interestingly, a very recent study has shown that Mecp2 is associated with an H3-K9 methyltransferase activity, indicating a link between DNA methylation and histone methylation.
In the mouse, homologues were found for all human MBD proteins. Sequence identity scores range from 63.8% to 94.0% (Tab. 2), indicating a conserved function in both species. Human and mouse MBD1, MBD2, MBD3, MBD4 and MECP2 are curated orthologues [27,42-44].
Homology searches in the ENSEMBL database revealed the following murine homologues of BAZ2B, CLLD8, KIAA1461 protein and KIAA1887 protein. The mouse homologue of BAZ2B, the ENSEMBL protein ENSMUSP00000028367 (gene ENSMUSG00000026987), shows an 82.6 % amino acid sequence identity. The predicted mouse gene ENSMUSG00000021980 coding for ENSEMBL protein ENSMUSP00000022552 has a 65.2 % amino acid sequence identity to the human CLLD8. We furthermore found the predicted gene ENSMUSG00000036792 coding for the ENSEMBL protein ENSMUSP00000036847 as mouse homologue of human KIAA1461 protein with a 94.0 % amino acid sequence identity. For the KIAA1887 protein, the mouse gene sequence ENSMUSG00000025409 (ENSEMBL protein ENSMUSP00000026476) with 91.1 % sequence identity was present in the database. The latter database entry, however, consists of only 101 amino acids. The existence of translated proteins remains to be determined for all four mouse genes.
DNA methylation as a mechanism of gene expression regulation also exists in plants. In our database searches we detected plant MBD proteins as well. The Pfam database contains polypeptides from Arabidopsis thaliana and Triticum aestivum. BLAST analyses revealed additional proteins in Zea mays, Hordeum vulgare and Lycopersicon esculentum. Entries for MBD-containing proteins from plants over C. elegans to mouse and human are present in Pfam.
Expression analyses had been carried out previously for all genes of the mouse/human MBD family except for KIAA1461 and KIAA1887 (only the abundance of KIAA1887 ESTs in different tissues has been reported). The results of published Northern blot experiments are summarized in Tab. 3. Since expression levels of MBD4 were too low to be detected by Northern blots, only results of RT-PCR studies in three tissues are shown. However, the presence of MBD4 EST sequences from numerous tissues points to a ubiquitous expression (http://www.ncbi.nlm.nih.gov/UniGene).
We performed Northern blot analyses for KIAA1461 and KIAA1887. Strong signals of ~8 kb were detected for KIAA1461 in skeletal muscle, heart, pancreas, kidney and placenta. A faint band could be detected in brain, lung and liver. For KIAA1887 a strong band of ~5 kb was present in heart, kidney, liver, skeletal muscle, placenta and pancreas; weaker signals could be seen for brain and lung tissue (Fig. 4).
Taken together, MBD1, MBD2, MBD3 and MECP2 as well as SETDB1, CLLD8, BAZ2A, KIAA1461 and KIAA1887 show a broad tissue distribution. The expression of BAZ2B is more restricted according to Northern blot results. It is of note that CLLD8, BAZ2A, KIAA1461 and KIAA1887 show a very low expression in brain.
In this study we present four additional proteins and two cDNAs with a methyl-CpG-binding domain in mouse and man. Transcripts of SETDB1, BAZ2A, CLLD8, KIAA1461 and KIAA1887 are found in all adult tissues studied. Among the six proteins described here as new members of the MBD protein family, CLLD8, BAZ2B, KIAA1461 and KIAA1887 show a low expression in brain. MECP2 is found preferentially in mature neurons of the brain [47,48], and the cell types that express CLLD8, BAZ2A/TIP5, SETDB1 as well as the predicted KIAA1461 and KIAA1887 in brain are not known.
Rett syndrome is caused by mutations in MECP2. Even though MECP2 is ubiquitously expressed, the phenotype of the syndrome is restricted to the brain. This could be explained by a greater need of long-lived, non-dividing neuronal cells for a special chromatin state that involves MECP2 and tightly suppresses transcription of undesired genes. Another explanation would be that the loss of function of MECP2 in non-neural tissues is compensated by another protein with similar properties. MBD2 has been studied in this respect, but no evidence for a genetic interaction of the two genes was found by combining Mbd2- and Mecp2-null mice. Based on gene expression studies of Mecp2 knockout mice and biochemical evidence, it has been suggested that the essential function of Mecp2 in the brain might not be transcriptional. In view of this aspect and the protein-protein interaction property of the methyl-CpG-binding domain of human MBD3 and SETDB1, functional compensation would not necessarily require a DNA binding property.
Almost all presented polypeptides are known or predicted to be involved in mechanisms of gene expression regulation. In order to understand the higher-order interplay of MBD proteins and associated complexes, it will be a major task to identify interacting proteins as well as regulated targets of all components. This will help to solve the question whether some of the polypeptides can functionally complement MECP2 in tissues other than the brain.
Non-redundant GenBank, high throughput genomic sequences and expressed sequence tag (EST) databases were searched using BLASTP and TBLASTN of the NCBI web tools (http://www.ncbi.nlm.nih.gov/blast) applying default conditions. The human MECP2 chain A (MBD) was used as a query input. TBLASTX was carried out applying the web tools at the European Bioinformatics Institute (http://www.ebi.ac.uk/blast2/).
Using the MBD of human MECP2 as query, the NCBI (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein), Pfam (release 7) (http://www.sanger.ac.uk/Software/Pfam), Prosite (release 17.17) (http://www.expasy.ch/prosite/) and Smart (version 3.4) (http://smart.embl-heidelberg.de/) databases were screened for proteins with this domain. The resulting polypeptides were sorted according to their species of origin and additional identified domains within their sequence.
SSDB at GenomeNet (http://ssdb.genome.ad.jp) was searched with pfm:MBD as query motif. BLASTP and BLASTN searches were carried out against the Celera database (http://www.celera.com) with standard settings.
Homologues of KIAA1461, CLLD8 and KIAA1887 were identified in the ENSEMBL database.
Alignments and phylogenetic trees were computed using the ClustalW program at GenomeNet (http://clustalw.genome.ad.jp/) with standard settings and the ClustalX program (version 1.8) for graphical representation.
The sequence logo was constructed by means of the plogo script (http://www.cbs.dtu.dk/~gorodkin/appl/plogo.html). File formats were converted with the GCG package (version 10.2) when necessary.
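As an illustration of what a sequence logo encodes, the following Python sketch computes the information content of each alignment column in bits. It is a simplified stand-in, not the plogo script: it assumes a uniform background over 20 amino acids, applies no small-sample correction, and uses a hypothetical toy alignment.

```python
import math
from collections import Counter

def column_information(alignment, alphabet_size=20):
    """Return the information content (bits) of each alignment column.

    Information = log2(alphabet_size) - Shannon entropy of the column.
    Gaps ('-') are ignored; an all-gap column scores 0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            info.append(0.0)
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info

# Hypothetical toy alignment, not taken from the MBD data.
toy_alignment = ["RKDW", "RKEW", "RREW", "RK-W"]
print([round(bits, 2) for bits in column_information(toy_alignment)])
```

In a logo, the total height of a column corresponds to this value, so invariant positions reach the maximum of log2(20) bits and variable positions are correspondingly lower.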
Sequence comparison between mouse and man was carried out using the LALIGN algorithm at EMBNET (http://www.ch.embnet.org/software/LALIGN_form.html) with default settings.
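How a percent identity value is derived from a pairwise alignment can be sketched in Python as follows. This is an illustrative assumption about the calculation, not the LALIGN implementation: identity is taken as the number of matching residues divided by the number of aligned columns, and the input sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two sequences that are already aligned.

    Columns in which both sequences carry a gap ('-') are ignored;
    a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Hypothetical aligned fragments, not taken from the MBD data.
print(percent_identity("MKRS-EDW", "MKRSAEEW"))
```

Because the denominator can be defined in several ways (alignment length, shorter sequence, ungapped columns only), identity values from different tools are only comparable when the same definition is used.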
CCRF-CEM and lymphoblastoid cells were grown according to manufacturer's instructions (DSMZ, Braunschweig, Germany and the Coriell Institute for Medical Research, Camden, NJ, USA) and harvested at 95 % confluency. Genomic DNA was isolated with the QIAGEN Blood Maxi-KIT (QIAGEN, Hilden, Germany). Total RNA was isolated using the Trizol reagent (Invitrogen, Karlsruhe, Germany). A reverse transcriptase reaction was performed in the presence of 50 ng/μl oligo(dT) and 2 mM dA/C/G/TTP with 10 U/μl SSII reverse transcriptase (Invitrogen, Karlsruhe, Germany). The resulting cDNA was purified with the QIAquick PCR purification kit (QIAGEN, Hilden, Germany). cDNA (200 ng) was used in subsequent 50-μl PCR amplifications with 10 pmol gene-specific primers. Standard PCR conditions and respective annealing temperatures were used.
A cDNA fragment of KIAA1461 was PCR-amplified with exon specific primers KIAA1461for (5'-CTAGACCATGGGAAAAATGT-3') / KIAA1461rev (5'-ACTTGGAGACTGCTCCTCTA-3') and human genomic DNA as template using standard conditions. For KIAA1887 a cDNA fragment was PCR-amplified with exon 6 specific primers KIAA1887for (5'-CAGACCCCCTACTGTATTTC-3') / KIAA1887rev (5'-CAAAAGGTTAAAGCTTCCAT-3') and cDNA from a lymphoblastoid cell line as template using standard conditions. A cDNA fragment of β-actin amplified with primers β-actinfor (5'-TGAACCCTAAGGCCAACCGTG-3') / β-actinrev (5'-GCTCATAGCTCTTCTCCAGGG-3') was used as a loading control. These probes were radioactively labeled with 32P-dCTP in a random prime reaction and hybridized in ExpressHyb solution (CLONTECH, Palo Alto, CA, USA) to a human multiple tissue northern blot (CLONTECH, Palo Alto, CA, USA) for 16 h at 65°C. Washing was performed in 2 × SSC / 0.1% SDS at 65°C for 10 min. Signals were detected with a PhosphorImager (Amersham Biosciences, Freiburg, Germany).
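The annealing temperatures used are not stated in the text. As a rough orientation only, basic properties of the primers listed above can be estimated with the short Python sketch below. It uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is an assumption on our part and not necessarily how the temperatures were chosen in this study; the function and dictionary names are hypothetical.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in a primer sequence."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    This rule of thumb is intended for short oligonucleotides and ignores
    salt and primer concentration.
    """
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "KIAA1461for": "CTAGACCATGGGAAAAATGT",
    "KIAA1461rev": "ACTTGGAGACTGCTCCTCTA",
    "KIAA1887for": "CAGACCCCCTACTGTATTTC",
    "KIAA1887rev": "CAAAAGGTTAAAGCTTCCAT",
    "beta-actinfor": "TGAACCCTAAGGCCAACCGTG",
    "beta-actinrev": "GCTCATAGCTCTTCTCCAGGG",
}

for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

Such estimates are only a starting point; annealing temperatures are usually set a few degrees below the estimated melting temperature and then optimized empirically.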
MBD: Methyl-CpG-binding domain
HDAC: histone deacetylase
TCR: carried out the database searches, comparative analyses and molecular genetic experiments
HHR: revised the manuscript and participated in the study coordination
UAN: devised the study, supervised and coordinated it and drafted the manuscript
All authors read and approved the manuscript.
We thank Ralph Schulz and Bettina Lipkowitz for excellent technical assistance and Ulf Gurok and Antje Krause for critical reading and suggestions. This work was supported by a grant from the Deutsche Forschungsgemeinschaft (DFG) (SFB 577, Teilprojekt C3).