The cell-type-specific expression pattern of genes is established during development. This complex process is accompanied by differential packaging of the genome in a cell- and tissue-specific manner. The packaging involves post-translational modifications of histones, such as methylation and acetylation, followed by the interaction of other regulatory proteins. Once set early in development, the differential organization of chromatin is maintained by Polycomb group (PcG) and trithorax group (trxG) proteins. The maintenance of chromatin structure, and thereby of the expression state, is referred to as epigenetic cellular memory. It ensures that specific expression patterns are passed on to daughter cells when a differentiated cell divides and persist throughout the life span of the organism. PcG proteins maintain the repressed state, while trxG proteins maintain genes in the active state. Both PcG and trxG proteins function as multi-protein complexes. In Drosophila, PcG proteins form two major complexes. PC, Ph, Psc and dRing form Polycomb Repressive Complex 1 (PRC1) [1], while PRC2 consists of Esc, E(z), Su(z)12 and P55 [2,3]. PCL has also been observed to interact with a subset of PRC2, forming a highly active and distinct complex [4,5]. A third complex contains the DNA-binding protein Pleiohomeotic (PHO), which interacts with PC and directs its binding to specific recruitment sites [6]. Similarly, trxG proteins form specific complexes [7]. The DNA sequences that serve as recruitment sites for PcG/trxG proteins are called cellular memory elements or Polycomb response elements (PREs) [8]. Often the same elements recruit both PcG and trxG proteins. It is generally believed that a balance between these two opposing functions at PREs maintains the precise expression level of a particular genomic region. The expression state of a locus is interpreted and maintained by distinct sets of PcG and trxG complexes that bind to PREs and establish a chromatin state marked by specific histone modifications [9-14].
PC, the core member of PRC1, was first identified in Drosophila. PC is an essential gene, and its mutation causes a dominant extra sex comb phenotype. Mutations in many other members of this group produce a similar phenotype because of defects in the expression pattern of homeotic genes [15]. In Drosophila, the initial expression of homeotic genes is determined by segmentation genes [16], and this expression pattern is subsequently maintained by PcG and trxG proteins [17]. Changes in the expression pattern lead to homeotic transformation and/or lethality [18]. Insects have one PC gene, whereas mammals have five homologues. Vertebrate homologues of PC contain the chromatin organization modifier domain (chromodomain) and are referred to as chromobox (CBX) proteins. These are CBX2, CBX4, CBX6, CBX7 and CBX8. The remaining CBX proteins, CBX1, CBX3 and CBX5, are homologues of heterochromatin protein 1 (HP1). Here we refer to the vertebrate PC proteins as CBX proteins.
Several lines of evidence suggest that homeotic genes are not the only targets of PcG proteins [19,20]. More recently, genome-wide ChIP-on-chip analyses of PcG proteins and their associated histone methylation marks in fly, human and mouse cells have identified a large number of targets of these proteins [21-23]. PcG proteins are essential for the maintenance and normal proliferation of cells and have been implicated in stem cell maintenance [24]. Genome-wide mapping of H3K27Me3 in various prostate cancer tissues reveals PcG-mediated repression of several genes that are downregulated in cancer [25]. Abnormal expression of PcG genes causes misregulation of their target loci, leading to abnormal cell proliferation and cancer [24,26]. CBX7 and CBX8 are involved in maintaining the repressed state of the INK4A-ARF locus, which regulates cellular proliferation and senescence [27,28]. CBX7 knockdown increases ARF and INK4A expression, which impairs cell growth [29]. CBX4 is a repressor of C-MYC, and mutations in its C-terminal region lead to enhanced expression of this proto-oncogene and to cellular transformation [30]. Genome-wide mapping of CBX8 targets shows that this PcG protein is predominantly associated with genes involved in developmental and differentiation processes [31]. CBX2 and CBX7 have also been implicated in the maintenance of the inactive X chromosome in mouse [32,33].
The N-terminal region of PC contains a chromodomain, which binds to the histone methylation marks created by PRC2 at PREs [34]. The chromodomain, made up of three beta strands and a helix, occurs in proteins involved in chromatin organization, such as HP1, Su(var)3-9, Swi6, CHD1, MSL-3 and MOF. It targets these proteins to specific regions of chromatin. The chromodomain of Drosophila PC preferentially binds histone H3 tri-methylated at lysine 27 (H3K27Me3) [35], whereas the chromodomain of HP1 recognizes the H3K9Me3 mark. Mutations in the PC chromodomain cause disintegration of PRC1 and consequent loss of its silencing activity [36]. In Drosophila, a chimeric HP1 in which its chromodomain is replaced by the PC chromodomain localizes to euchromatic PC binding sites. This indicates that the chromodomain is essential for recognizing specific histone methyl marks [37,38]. It also suggests that subtle differences in chromodomain sequence or structure may confer differential affinity for distinct histone methylation patterns, for example H3K9Me3 and H3K27Me3. Unlike fly PC, which recognizes only H3K27Me3, mammalian PC homologues differ in their binding to methylated histones. CBX2 and CBX7 bind both H3K9Me3 and H3K27Me3, whereas CBX4 shows strong affinity for H3K9Me3 [33].
The significance of the PcG system is apparent from two observations. These genes are conserved from plants to animals, and more complex animals carry more PcG homologues. For example, insects have only one copy of PC, whereas vertebrates have at least five homologues. The functional importance of having multiple PC homologues in an organism remains elusive. Identifying the uniquely conserved regions in each CBX protein should help clarify the functions of these homologues. In this study, we carried out extensive mining and analysis of PC homologues to understand their evolution and their sequence-structure-function relationships in the context of motif organization.