|Home | About | Journals | Submit | Contact Us | Français|
A recent comparative genomic analysis tentatively identified roughly 40 orthologous groups of C2H2 Zinc-finger proteins that are well conserved in "bilaterians" (i.e. worms, flies, and humans). Here we extend that analysis to include a second arthropod genome from the crustacean, Daphnia pulex.
Most of the 40 orthologous groups of C2H2 zinc-finger proteins are represented by just one or two proteins within each of the previously surveyed species. Likewise, Daphnia were found to possess a similar number of orthologs for all of these small orthology groups. In contrast, the number of Sp/KLF homologs tends to be greater and to vary between species. Like the corresponding mammalian Sp/KLF proteins, most of the Drosophila and Daphnia homologs can be placed into one of three sub-groups: Class I-III. Daphnia were found to have three Class I proteins that roughly correspond to their Drosophila counterparts, dSP1, btd, CG5669, and three Class II proteins that roughly correspond to Luna, CG12029, CG9895. However, Daphnia have four additional KLF-Class II proteins that are most similar to the vertebrate KLF1/2/4 proteins, a subset not found in Drosophila. Two of these four proteins are encoded by genes linked in tandem. Daphnia also have three KLF-Class III members, one more than Drosophila. One of these is a likely Bteb2 homolog, while the other two correspond to Cabot and KLF13, a vertebrate homolog of Cabot.
Consistent with their likely roles as fundamental determinants of bilaterian form and function, most of the 40 groups of C2H2 zinc-finger proteins are conserved in kind and number in Daphnia. However, the KLF family includes several additional genes that are most similar to genes present in vertebrates but missing in Drosophila.
Zinc-finger proteins (ZFP) represent the largest family of DNA-binding transcription factors in eukaryotes. Although many proteins are predicted to contain single zinc-finger domains, two zinc fingers in close proximity appear to be required for high-affinity DNA binding. There are many diverse subfamilies of zinc-finger proteins in eukaryotes, but the most numerous are the Kruppel-type C2H2 ZFPs. Many of these proteins contain either multiple tandem pairs of zinc-fingers or tandem arrays of three or more zinc-fingers. As transcription factors, they participate generally in the fundamental mechanism of gene expression. However, they usually also play more specific roles in a wide variety of regulated biological processes, including signal transduction, cell growth, differentiation, and development. As part of our collaborative role in annotating the draft genome assembly v1.1 of the Daphnia pulex genome, we focused our attention on a subset of roughly 40 orthologous groups of C2H2 ZFPs identified in a recent comparative genomic analysis to be well conserved in "bilaterians" (i.e. worms, flies, and humans). While many of these were known or likely DNA-binding transcription factors encoding proteins with tandem arrays of zinc-fingers (e.g. Zif268, MTF1, TFIIIA, SP1 and KLF), others had only a single zinc finger (e.g. SAP61, SAP62, and Kin17), which generally lack a DNA-binding function . Also included are some genes that encode multiple split pairs of C2H2 zinc-fingers, like Disco
Previously 39 families of C2H2 ZFP were determined to be present in the common ancestor of bilaterians based on a survey of three organisms: Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans. Although the work described below necessitated the addition of three more conserved families to this collection, two pairs of the original 39 families could also be reasonably combined into single families. Hence, we created a reorganized summary list of 40 orthologous groups of C2H2 ZFP. The resulting list of family members and their accession IDs are provided in Table Table1,1, ,2,2, ,33 and and4,4, while an efficient numerical summary is provided in Table Table5.5. A brief description of the known or proposed function(s) and structural organization for each of these families is provided in series below.
A relatively rigorous assessment of homology/orthology is provided by a set of carefully constructed phylogenetic trees (see Figure Figure1,1, ,2,2, ,3,3, ,4,4, ,5,5, ,6,6, ,77 and and88 and supplementary Additional file 1 and 2). These trees also serve to summarize various evolutionary events of interest, such as presumptive gene losses, duplications, and lineage-specific expansions. Larger C2H2 Zinc-finger families (i.e. Sp and KLF) are presented within separate trees (Figures (Figures11 and and2),2), while most of the remaining families are displayed in a short series of multi-family summary trees (Figure (Figure3,3, ,4,4, ,5,5, ,6,6, ,77 and and8).8). The latter provide additional evidence that member gene clusters represent distinct families of correctly identified homologs and putative orthologs. Evidence for the expression of almost all of these genes at the RNA level was obtained using a NimbleGen tiling array [3-5] and the EST data available at JGI portal [1,6]. Only three genes lack any evidence of expression to date (KLF2D, ZFam9, and FEZL).
The Sp/KLF proteins are DNA-binding transcription factors each containing 3 zinc-fingers. Although Sp and KLF factors are closely related, they occupy distinct branches on a combined evolutionary tree. To facilitate presentation, however, it was most convenient to separate these families into two distinct trees, each rooted with two members from the other family (Figures (Figures11 and and2).2). Although frequently described as simple transcription factors (especially Sp), many are known to interact with particular sets of chromatin remodeling complexes to facilitate transcriptional activation or repression . In vertebrates, Sp/KLF proteins are grouped into three classes that tend to correlate with the type of chromatin remodeling complexes they utilize. Conveniently, most of the invertebrate proteins can also be assigned to one of these classes . dSP1, btd, and CG5669 correspond to three distinct subsets of Sp-Class I proteins Sp7/8, Sp5, and Sp1/2/3/4, respectively (Figure (Figure1).1). Luna, CG12029, and CG9895 correspond to the KLF-Class II proteins KLF6/7, KLF5, and KLF3/8/12, respectively (Figure (Figure2).2). Bteb2 and Cabot correspond to KLF-Class III proteins KLF15 and KLF9/10/11/13/14/16, respectively (Figure (Figure2).2). The two remaining fly proteins (CG3065 and hkb) are difficult to place unambiguously. The former shows roughly equal similarity to members of both the Sp and KLF subfamilies, while the latter appears to be a highly diverged and relatively unique member of the Sp/KLF family. Information concerning the function of many of these proteins is available in a variety of organisms [9,10]. In vertebrates, some Sp/KLF proteins produce early embryonic lethals when mutated (Sp1, KLF5), some are known to affect behavior and/or the development of structures within the brain (Sp4, Sp8, KLF9), and others affect development of the blood cells (Sp3, KLF1, KLF3), goblet cells in the colon, or bone cells (Sp7). In Drosophila, buttonhead (btd) is important for the development of head structures, and both btd and dSp1 affect development of the mechano-sensory organs . Cabot (cbt) also affects sensory organ development, as well as dorsal closure . Perturbations of luna expression via RNA interference or over-expression during early Drosophila embryogenesis leads to developmental arrest at different embryonic stages . Daphnia appear to contain three Sp homologs (Sp8, Sp5, Sp4), each of which correspond to specific counterparts (SP1, btd, CG5669) in Drosophila. In contrast, the KLF gene family in Daphnia includes 10 distinct genes, 4 more than the complement in Drosophila. Dp-KLF3, Dp-Luna, Dp-Cabot, and Dp-Bteb2 appear to correspond to the fly genes CG9895, Luna, Cabot, and Bteb2, respectively. No clear homologs of KLF17 or CG12029 are apparent in Daphnia. In contrast, six KLF genes in Daphnia with no direct homologs in Drosophila seem to correspond to two subfamilies of KLF found in the vertebrate genome. Five of these, Dp-KLF1A through Dp-KLF1E, may represent a species specific gene expansion that roughly corresponds to an independent expansion in humans that includes KLF1/2/4. The sixth may be a lone homolog of the vertebrate expansion that includes KLF 9/13/14/16.
Some C2H2 ZFP exist as single copy family members in all 4 genomes. Hence, these genes appear to be relatively resistant to deletion or expansion over evolutionary time. ZNF277 is one of several examples (Figure (Figure3).3). This gene encodes a protein with five C2H2 zinc fingers. The function of this gene is not well understood. In humans, this gene is expressed in early embryonic tissues, parathyroid adenoma, and chronic lymphocytic leukemia suggesting that this gene might be involved in differentiation .
ZFAM5 is also known as ZNF622 or Zinc finger related protein 9 (ZRP9) and is highly conserved almost universally in eukaryotes. Most homologs have 4 zinc-fingers (Figure (Figure3).3). ZFAM5 was originally identified in mouse as a cellular MPK38 serine/threonine kinase binding protein that may be involved in early T cell activation and embryonic development . In humans this gene is responsible for interaction with the ubiquitously expressed MYB-B transcriptional regulator. The role of this gene in other organisms is not well understood .
ZFAM6 is also known as Zinc finger Matrin Type 2 (ZMAT2). It is a highly conserved gene present in most eukaryotes. Little is known about its function. A single homolog of this gene was also found in Daphnia (Figure (Figure3).3). All family members possess one U1-like zinc finger. This type of C2H2 zinc finger is also present in the protein matrin, the U1 small nuclear ribonucleoprotein C, and other RNA-binding proteins.
ZFAM7 is also known as Zinc Finger 598 (ZNF598) in vertebrates. This gene is highly conserved in most eukaryotes and is present as a single homolog in most species, including Daphnia (Figure (Figure3).3). The gene has five C2H2 zinc fingers. The function of this gene is largely unknown.
SAP61 (Splicosome Associated Protein 61) is also known as Splicosome factor 3a subunit 3 (SF3a3), while SAP62 (Splicosome Associated Protein 62) is also known as Splicosome factor 3a subunit 2 (SF3a2). These sequences do not cluster within a phylogenetic tree of zinc finger genes (Figure (Figure3).3). However, since they share common function by virtue of being subunits of the same protein complex involved in RNA splicing, we described them together. A single homolog for these genes is present in almost all species of eukaryotes, including Daphnia. Both have a highly conserved U1-like zinc finger that is typical for RNA binding proteins. These are essential proteins, required for the formation of SF3a and functional U2 snRNP. Together with SF3b, SF3a binds to the 12S U2 snRNP, which contains a common core of seven Sm proteins and the U2-specific proteins U2-A and U2-B [17,18].
KIN17 is present in almost all eukaryotic organisms. Daphnia have a single homolog for this gene. KIN17 includes a single highly conserved U1-like zinc finger and is likely to be involved in cellular response to DNA damage, gene expression, and DNA replication. The KIN17 protein shares sequence homology with bacterial RecA protein over 40 residues near the c terminus. KIN 17 is ubiquitously expressed in mammals at low levels, but is up regulated after exposure to UV and ionizing radiation. KIN17 binds to DNA targets found in hot spots of illegitimate recombination [19-21].
A single homolog of ZNF207 is present in almost all vertebrates and invertebrates. Daphnia also has a single homolog (Figure (Figure8).8). The N terminus of this protein contains 2 C2H2 type zinc fingers. This gene is expressed ubiquitously in humans , but the exact function is still not clear.
The Daphnia genome appears to deploy a relatively efficient set of well conserved C2H2 ZFP because many C2H2 subfamilies have undergone lineage specific expansions in other genomes. In fact, 12 of the 40 well conserved families considered here show expansions in flies or humans, but not in Daphnia. For instance, Zfam1 (Figure (Figure4)4) codes for a small peptide of 70 to 80 residues that contains one C2H2 type zinc finger. This gene is conserved in chordates and insects. Lineage specific independent duplications have generated 2 homologs in Drosophila and 2 in C. elegans. Daphnia have just one homolog for this gene that clusters with the Drosophila homolog. The function of these genes is not well understood.
Humans have three ZFAM2/BCL11 homologs: Bcl11A, Bcl11B, and ZNF342/Zfp296. BCL11A has 5 zinc-fingers, and is a homolog of the murine gene Evi9. Evi9 was found to be deregulated in mouse myeloid leukemias induced by proviral integration. Hence Evi9 has characteristics of a dominant oncogene. Human EVI9/BCL11A is expressed in CD341 myeloid precursors. BCL11A is known to be involved in both Hodgkins and non-Hodgkins B-cell lymphoma [23,24]. BCL11A acts as a proto-oncogene for B-cell lymphoma, as a recessive oncogene for T-cell lymphoma, and is apparently required for the expression of some globin genes . Bcl11B also appears to act like a recessive oncogene for T-cells . The single Daphnia homolog appears closely related to the Drosophila version (Figure (Figure4).4). Apparently, duplication in mammals led to 2 or three versions of this gene, two of which became key regulators in hematopoetic lineages, while the third appears to function in the nervous system. ZNF342 has been indirectly implicated in the suppression of gliomas.
ZFAM4 is also known as Rotund and Squeeze in Drosophila, and Lin-29 (abnormal cell LINeage family member 29) in C. elegans. There appears to be 3 homologs for this family in humans and one additional homolog in Drosophila. Most genes in this family have 5 zinc fingers while two human genes (ZNF384 and ZNF362) and one Drosophila gene (CG2052) have an additional zinc finger. The Daphnia homolog clusters with Drosophila Rotund and squeeze (Figure (Figure4).4). Roughened eye (Roe) is a part of the rotund gene represented by a different transcript by using a different promoter. They both share the C-terminal region and zinc finger domain but differ in their N-terminal regions. Roe appears to have a role in eye development in the embryos . The C. elegans gene lin-29 is required for terminal differentiation of the lateral hypodermal seam cells during the larval-to-adult molt and proper vulva morphogenesis. CIZ (CAS interacting Protein or ZNF384) is a nucleocytoplasmic shuttling protein that binds to CAS elements found in promoters of matrix metalloproteinases (MMPs) genes that produce enzymes used to degrade the extracellular matrix proteins .
ZFAM11 is also known as KCMF (Potassium channel modulatory factor). It is present in most vertebrates and invertebrates. Daphnia, like most vertebrates, has a single copy. C. elegans too have a single member for this family which suggests that the gene has been duplicated specifically in flies/insects (Figure (Figure4).4). All of these genes have a highly conserved region which includes one ZZ type zinc finger and one C2H2 type zinc finger. The ZZ motif is known to bind two zinc ions and most likely participates in ligand binding or molecular scaffolding. In vertebrates, KCMF1 is shown to have intrinsic E3 ubiquitin ligase activity. Studies indicated that KCMF1 is involved in regulating growth modulators . The function of KCMF1 homologs in Drosophila and worms is poorly understood.
Fez typically has 6 zinc-fingers, and is a likely ortholog of the human genes ZNF-312 and 312-like. The forebrain expression pattern for this gene was first described in zebrafish, where there is also a second homolog known as Fezl. Fez expression is first detected in the anterior presumptive neuroectoderm of zebrafish during epiboly. Expression becomes focused in the rostral forebrain region during somitogenesis. By 24 hrs, expression is largely restricted to the telencephalon and anterior/ventral region of the diencephalon. Hence Fez is an early marker of anterior neuroectoderm and appears to regulate forebrain development . In mammals, these proteins appear to regulate olfactory-bulb development and neuronal differentiation in the cortex [31,32]. Double knockouts indicate that together FEZ and FEZL play a role in rostral brain patterning in mouse. Drosophila and Daphnia appear to have only one homolog of Fez (Figure (Figure4).4). There is little information about the role of the Fez protein in these organisms.
The zinc-finger E-box binding (ZEB) homeobox genes (Figure (Figure5)5) were previously described as two separate families in chordates (ZEB1/ZFHX1A/ZFH1 and ZEB2/ZFHX1B/ZFH2) or as the zinc finger axon guidance gene (ZAG1) in C. elegans. The gene is conserved in most bilaterians and usually has a homeodomain flanked by two separate, highly conserved zinc finger clusters. Most have 6 C2H2 type zinc fingers present as triplets distributed over the length of the gene. The E-box-like target sites overlap with those bound by the Snail family of zinc-finger proteins. ZEB proteins can repress target genes transcription by recruiting the CtBP (C-terminal-binding protein) co-repressor, which is a component of the larger repressor complex containing HDAC (histone deacetylase) and PcG (polycomb group proteins) . ZEB1 and ZEB2 in humans are expressed in several tissues including muscle and CNS. They are also expressed in T lymphocytes and during skeletal differentiation. They are mediators of epithelial dedifferentiation in mammals through the down-regulation of E-cadherin expression . ZEB2, also known as Smad Interacting Protein 1 (SIP1), is over expressed in cancer cells, causing loss of cell polarity and facilitating migratory and invasive behavior. SIP1 is also involved in the development of the neural-crest, the central nervous system, the septum of the heart, and establishment of the midline . Mutations in SIP1 cause Mowat-Wilson Syndrome, a mental retardation syndrome in humans [36,37]. In Drosophila, ZFH1 is a transcriptional repressor that regulates differentiation of muscle and gonadal cells, but is also expressed in the CNS . ZAG-1 in C. elegans also acts as a repressor that regulates multiple, discrete neuron-specific aspects of terminal differentiation, including cell migration, axonal development, and gene expression . Daphnia have a single homolog.
ZFHX genes encode zinc-finger homeobox containing proteins previously described as two separate families (ZFH3 and ZFH4) in bilaterians. In vertebrates this gene appears to have undergone duplications generating 2 or more additional homologs (Figure (Figure5).5). In humans, there are 3 homologs (ZFHX2, ZFHX3 and ZFHX4). Family members usually contain 8 or more C2H2 zinc fingers distributed throughout the gene. The Daphnia homolog has 11 C2H2 type zinc fingers. ZFHX genes are thought to be important regulators of neuronal differentiation . Like most homeotic genes, these genes are also involved in embryonic morphogenesis. ZFHX3, also known as AT motif binding factor 1 (ATBF1), inhibits cell growth and differentiation and may play a role in malignant transformation. It has been shown that it is potential tumor suppressor genes that repress alpha-fetoprotein (AFP) whose altered expression may lead to development of carcinoma in various tissues . ZFHX4 expression is important for neuronal and muscle differentiation, and in rats it is shown to be involved in neural cell maturation . The Drosophila homolog ZFH2 is involved in establishing proximal-distal domains in the developing wing disc .
Spalt-like (SALL) proteins have a variable number of zinc-fingers: the worm homolog has 6, flies and Daphnia have 7, and chicken/human have 7 or 9, depending on the homolog. The two fly genes, Spalt-major (SALM) and Spalt-related (SALR), appear to have duplicated independently from the ancestral gene that also gave rise to the four human homologs (Figure (Figure5).5). SAL in flies is required for proper development of the trachea, for vein patterning in wing imaginal discs, and for bristle formation in the thorax. In the later case, SAL acts through regulation of pro-neural gene expression . Nervous system expression is a well-conserved aspect of SAL gene function from worms to man. Mutations in the worm homolog (SEM4) affect development of neurons and sensory organs, while mutations in a human homolog (SALL1) result in sensorineural hearing loss and mental retardation (but also anal, genital, and limb malformation). Flies lacking both SALM and SALR are also deaf, with limb and genital malformations potentially analogous to those in humans .
ZEP homologs (Figure (Figure6)6) are referred to as Schnurri (Shn) in flies and SMA9 in worms. There are three or more homologs in vertebrates, but just one in worms, flies, and Daphnia, suggesting a lineage specific expansion exclusive to vertebrates. ZEP proteins generally have five C2H2 zinc-fingers divided into two pairs and a solitary medial finger (missing in some homologs). The worm homologs have an additional C-terminal zinc finger pair with little similarity to the other family members, implying a unique function. The C terminus of this protein is also unique to worm . ZEP/Shn/SMA9 homologs are involved in BMP signaling. Bone morphogenetic proteins (BMPs) are members of the transforming growth factor β (TGFβ) family that regulate various biological process including embryonic axes, cell fate determination, proliferation and apoptosis in both invertebrate and vertebrate model systems. In mouse, Shn-2 is required for efficient transcription of PPARγ2, which in turn drives the expression of several genes involved in adipocyte differentiation . Shn3 in mouse is a transcriptional regulator of Runx2, which in turn activates several osteoblast differentiation genes. In humans Shn3 is involved in T-cell proliferation, cytokine production, effector function, and inflammatory response . In worms, SMA9/SMAD affects body size regulation and male tail patterning in worms . In Drosophila, Shn binds to SMAD to form the repression complex controlling brinker (Brk), which is a transcriptional repressor of the Dpp gene. Dpp is involved in anterior-posterior patterning and cell proliferation in the wing blade .
The Insulinoma Associated gene (IA1) of vertebrates is called Nerfin in flies and Egg laying 46 (EGL46) in worms. Daphnia and C. elegans appear to have just one homolog with two conserved zinc-fingers. This gene appears to have undergone independent duplications in the human and fly lineages, giving rise to two paralogs in each (Figure (Figure6).6). IA homologs are involved in various aspects of neuronal differentiation including cell fate specification, axon guidance decisions and cell migration. In Humans IA1 promotes pancreatic and intestinal endocrine cells development . Recent reports for mice and zebra fish imply that its role in neurogenesis is conserved across vertebrates as well as invertebrates [52,53].
Hamlet is also called PR domain zinc finger protein 16 (PRDM16) or ecotropic virus integration site 1 (EVI-1) homolog in vertebrates, Hamlet in Drosophila and Egg laying 43 (EGL43) in C. elegans. Daphnia has one homolog. Independent duplications in insects and vertebrates appear to have generated two paralogs each in their respective clades (Figure (Figure6).6). All homologs contain an N-terminal PR (PRD1-BF1-RIZ1) homology domain followed by a group of six zinc fingers and a group of three additional ZFs at the C-terminus. In Drosophila, hamlet functions as a binary genetic switch specifically affecting the dendritic branching structure of external sensory (ES) neurons in the peripheral nervous system . In C. elegans, egl-43 encodes two transcription factors that act to control HSN migration and phasmid neuron development, presumably by regulating other genes that function directly in these processes . The murine homolog Evi-1 was obtained from a common site of viral integration in murine myeloid leukemia. The human homolog MDS1/EVI1 is transcriptionally activated by several recurrent chromosomal aberrations like acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Recently, another homolog of HAM called MEL1 (MDS1/EVI1-like gene 1) was identified as a member of the EVI1 gene family and also as a PR domain member (PRDM16), all of which are implicated in neural development . The disruption of the PR domain of this gene can cause leukemia. A partial disruption of the Mds1/Evi1 locus in mouse leads to multiple defects causing mid-gestation lethality, including defects of hypocellularity in the neuroectoderm and a failure of peripheral nerve formation .
The stripe gene (Sr) in Drosophila functions in the epidermis to facilitate cellular recognition of myotubules (Figure (Figure7).7). Hence, stripe mutants exhibit a disruption in myotubule patterning. Stripe is a member of the EGR (early growth response) family of transcription factors. The Egr transcription factors are rapidly induced by diverse extracellular physiological/chemical stimuli within the vertebrate nervous system. These proteins possess 3 zinc fingers. Another member of this family, Krox20, is known to be involved in development of the hindbrain and neural crest in mammals. Analysis of mouse knockouts has demonstrated that Egr2/Krox-20 is important for hindbrain segmentation and development, peripheral nervous system (PNS) myelination, and Schwann cell differentiation . Krox20 expression correlates with the onset of myelination in the PNS. Egr-1 and egr-3 are also both implicated in learning and memory . EGR-1 was shown to be induced in specific subregions of the brain during retrieval of fear memories. Knockout mice further showed that egr-1 was essential for the transition from short- to long-term plasticity and for the formation of long-term memories. In T-cells of the immune system, egr-3 and egr-4 work together with NF-kappaB to control transcription of genes encoding inflammatory cytokines. Egr-2 and egr-3 can also inhibit T cell activation . The egr genes are distantly related to the Wilm's tumor (WT) gene. The latter, like the distantly related klumpfuss gene in Drosophila, has 4 zinc fingers rather than 3. The single Daphnia homolog is seen to be most similar to that in Drosophila (Figure (Figure33).
Disco homologs have 4 to 6 zinc fingers in a paired arrangement. In humans, Basonuclin 1 and 2 (BNC1 & 2) correspond to the Disco and Disco-r genes present in flies (Figure (Figure7).7). In humans basonuclin is expressed in keratinocytes, germ cells, cornea, and lens epithelia. BNC2 mRNA is abundant in cell types that possess BNC1 but is also found in tissues that lack BCN1, such as kidney, intestine, and uterus . In keratinocytes, BNC maintains proliferative capacity and prevents their terminal differentiation . Recently, it has been suggested that BNC2 is also involved in mRNA export, nonsense-mediated decay, and/or polyadenylation . Both BNC's activate the expression of rRNA genes. Disconnected (Disco) and disco-related (Disco-r) are two functionally redundant, neighboring genes localized on the fly X chromosome that may act in combination with the homeotic genes deformed (dfd) and Sex Combs Reduced (SCR) to specify gnathal structures in Drosophila [63,64]. The ancestral Disco gene appears to have undergone independent duplication events in the human and fly lineages. Daphnia appear to have just one homolog.
Lineage Specific duplications of well conserved C2H2 ZFP in Daphnia appear to be rare. GPS homologs (GFI/PAG/Sens) generally have 6 tandem zinc-fingers (Figure (Figure6).6). The C elegans homolog (PAG3) has only 5 fingers (missing the first one). The Drosophila homolog (SENS) has only 4 fingers (missing the first two). The two Daphnia homologs cluster together and thus appear to be a recent duplication independent of that producing the two GFI homologs in humans. One Daphnia homolog (GPSa) has a unique 5aa insertion between fingers 1 and 2. GPS proteins are involved in hematopoesis and neurogenesis. The hematopoetic functions of vertebrate GFI-1 and GFI-1B (6 fingers) appear to be exchangeable, but distinct, due to differential cell-type specific expression. They differ in their ability to facilitate late maturation of inner ear neurons . While generally known as transcriptional repressors, some act as transcriptional activators and/or conditional repressors (like Sens). The single worm homolog serves to repress touch neuron gene expression in interneuron cells . The Drosophila Senseless gene is required for normal sensory organ development . The two Daphnia GPS genes are closely linked on the same scaffold. A domain required for transcriptional repression, the SNAG domain, is found only in vertebrate GFI proteins to date; hence no SNAG domain is seen in the Daphnia homologs .
PRDM/Blimp proteins are Putative Positive Regulatory Domain/B-lymphocyte induced maturation proteins. These proteins have 5 fingers, but the 5th finger is relatively poorly conserved and has a C2HC structure. A single PRDM1/Blimp gene is found in humans, one in flies, and one in worms (Figure (Figure6).6). Blimp1 expression in the tracheal system of Drosophila embryos was found to be important for the development of this tissue. Blimp1 is also induced by ecdysone, and reduced Blimp1 expression results in prepupal lethality . Blimp1 is expressed in many other tissues of Drosophila. Blimp is similarly expressed in many different tissues in vertebrates, where it is known to play important roles in embryogenesis, germ cell determination, specification in nerve and muscle cells, linage determination in epidermis, and B-cell maturation [70-72]. There appear to be two Blimp homologs in Daphnia. However, in one, the 5th finger is missing, and the third finger has two serines replacing the two cysteines of the finger. Consequently, the function of this finger (and the entire protein) may have changed substantially.
Zinc finger X-linked duplicated (ZXD) is a newly described C2H2 zinc finger family in bilaterians present in most chordates and has undergone duplication specifically in mammals. Among the bilaterians, humans and other mammals have 3 homologs for this family while nematodes, water fleas, sea urchins, chicken and frog all have one homolog including Daphnia (not shown). Interestingly, no homologs have been detected in insect lineage that has a sequenced genome and in C. elegans. Zinc finger X-linked duplicated family member C (ZXDC) along with its binding partner ZXDA, forms a complex that interacts with CIITA and regulates MHC II transcription [73,74]. The function of the other paralog ZXDB as well as the homolog in C. elegans is little understood.
CTCF and CTCFL (Figure (Figure8)8) have 11 zinc fingers arranged in tandem. They act in part as "enhancer blockers" in vertebrates by binding to insulator elements. In flies, dCTCF binds to the Fab-8 insulator element between iab-7 and iab-8 . Mammalian CTCF's are also involved in reading gene imprinting marks (i.e. Boris) at a high fraction of imprinted genes . Recently, CTCF (along with YY1) has also been implicated in the global repression mechanism known as X-inactivation . There are two human homologs, CTCF and CTCFL, but only one in Daphnia and Drosophila, and none in worms.
Yin Yang 1 (YY1) generally has 4 zinc-fingers. Recent phylogenetic analyses proposed that the YY1 gene has undergone independent duplication events in different lineages through retro-transposition. Two duplication events in placental mammals are believed to have given rise to the YY1, YY2, and REX1 (Reduced EXpression). A similar duplication event in flies produced the pleiohomeotic (Pho) and Pho-like genes . The Daphnia Pho gene clusters with the Drosophila Pho (Figure (Figure8).8). YY1 acts to activate or repress transcription in different contexts. In mammals, this gene appears to play multiple roles, including induction and patterning of the embryonic nervous system, differentiation within blood cell lineages, cell-cycle control, cell proliferation, differentiation, and apoptosis, DNA synthesis and packaging, and X-inactivation . The fly homologs, Pho and phol, are classified as PcG (poly comb group) proteins that bind to PREs (PcG response elements that regulate homeotic genes). Pho and Phol act redundantly to repress homeotic gene expression in imaginal discs of the fly .
Hindsight (Hnt) in Drosophila is a homolog of Ras-Responsive Element Binding protein 1 (RREB1) in vertebrates. No homolog has been detected in worms, but Daphnia appears to have a single homolog (Additional file 1). The number of zinc fingers varies from species to species: 15 in humans, but only 10 in Daphnia. The Pebbled (peb) gene encodes the Hindsight protein, involved in morphogenetic processes and is expressed in several kinds of epithelial cells during development including extra embryonic amnioserosa, midgut, trachea, and the photoreceptor cells of the developing adult retina. In the amnioserosa, Hnt is required for embryonic germ band retraction and embryonic dorsal closure . In tracheal development, it is required for the maintenance of epithelial integrity and assembly of apical extracellular structures known as taenidia . During eye development, it is required for the accumulation of F actin in the apical tip of photoreceptor precursor cells in the ommatidial clusters, as well as in the developing rhabdomere during the pupal period . HNT expression is also essential for maintaining epithelial integrity for amnioserosa, and retinal epithelium. Recently HNT has been shown to regulate Notch signaling in follicular epithelial development, which in turn alters cell differentiation and cell division. It is responsible for repressing String, Cut, and Hedgehog signaling, which are essential for regulating follicular cell proliferation . The human homolog of HNT, RREB1 acts as a transcription factor that binds specifically to the RAS-responsive elements (RRE) of gene promoters. Recent investigations indicate that RREB1 is essential for spreading and migration of MCF-10A breast epithelial cells .
OAZ was apparently duplicated giving rise to two homologs in many vertebrates, including humans. OAZ is also called ZF423, while the closely related protein EHZF1 is called ZF521. No worm homolog has been detected, but a single homolog exists in Daphnia (Figure (Figure5).5). Human and mouse homologs for this family have 30 zinc fingers, while Drosophila and Daphnia have fewer. OAZ/ZNF 423 and EHZF/ZNF521 are implicated in the control of olfactory epithelium, in B-lymphocyte differentiation, and in signal transduction by bone morphogenic protein (BMP). They are known to activate the BMP target genes vent-2 (Xenopus) and ventx2 (human) via interaction with SMAD . ZNF521, in humans is known to regulate ontogenesis of the hemato-vascular system through BMP pathways. OAZ/ZNF423 can also repress BMPs by activating repressors of BMPs [87,88]. OAZ can apparently use different clusters of zinc fingers to interact with DNA, RNA or Protein . In flies, 21 Zinc fingers are grouped into 4 clusters; the cluster near the amino terminus is assumed to bind DNA. DmOAZ is expressed throughout the life of flies and is strongest in posterior spiracles. Recent studies have shown that OAZ is involved in controlling posterior structure by regulating specific genes .
ZFAM9 is also known as Positive regulatory domain 13 (PRDM13) and is present in most vertebrates. A likely homolog of this gene is also present in Drosophila and Daphnia (Figure (Figure8).8). However, the C. elegans genome had no homolog for this gene. All orthologs have 4 C2H2 zinc fingers. The function of these genes is unclear.
MTF (Metal-responsive Transcription Factor) have 6 tandem zinc-fingers. MTF activates metallothionein promoters in metazoans (Figure (Figure3).3). MTF binds to the metal responsive element (MRE) and is involved in metal homeostasis and heavy-metal detoxification . MTF in Drosophila appears to have a greater role in copper homeostasis than seen in vertebrates . A single MTF homolog exists in Daphnia (not shown), but there appear to be no homologs in worms.
TFIIIA is a DNA-binding transcription factor that also binds RNA. It is generally required for 5sRNA gene expression in metazoans. TF3A's are poorly conserved between distantly related organisms. Vertebrate, insect, fungal, and plant sequences show within group similarity but only weak between group similarity [93,94]. TFIIIA usually has 9 zinc-fingers, as does the single homolog in Daphnia (not shown). In yeast, S. pombe has a 10th zinc finger following a long spacer, while S. cereviseae has a long spacer between the 8th and 9th fingers . No homolog of TFIIIA has yet been identified in worms.
Daphnia have single homologs of the following well-known developmental control genes in Drosophila: Ovo, CI, LMD, OPA, SCRT, Slug, and ESG. In addition, Daphnia have three genes similar to the Odd gene family members Bowl, Bowel, and SOB (Figures S1 and S2). Although these genes encode C2H2 ZNF, other companion papers from the Daphnia consortium are intended to cover these in detail (personal communication).
Zinc-finger proteins probably represent the largest class of DNA-binding transcription factors in metazoan organisms, and as such, are likely to play critical roles in determining the extent to which various aspects of form and function are shared among taxa. The majority of these proteins are C2H2 zinc-finger proteins, many of which are already known to affect development and/or differentiation through a more or less direct effect on gene activation and/or repression.
A recent comparison of the full complement of C2H2 zinc-finger protein observed or predicted within worm, fly, and human genomes has lead to the tentative identification of nearly 40 orthologous groups shared between humans and invertebrates. From a phylogenetic perspective, the Daphnia genome would be expected to contain identifiable members for most of these groups. Using the reciprocal blast hit approach for estimating orthology, we uncovered 58 genes in Daphnia that appeared to be members of one of the 40 families conserved in bilaterians (Tables, (Tables,2,2, ,3,3, ,44 and and5).5). At least one member was identified for almost all families; only the JAZ family appeared to be absent from both Daphnia and C. elegans. All but two families had 3 or fewer members; only the SP and KLF families had more than three members each. Only the Odd-skipped and Snail families had 4 members each, while GLI, GFI, and Blimp had two members each. For all other families (33), only a single conserved member was identified in Daphnia. For 9 of these, a single conserved member was also present in each of the three other genomes; hence the latter genes appear to be relatively resistant to lineage specific deletion or expansion. Only three of the 40 families (including the most notable example, KLF) exhibited duplication or expansion in Daphnia relative to flies, but in many cases, gene duplications or expansions observed in flies and/or humans appeared to be absent in Daphnia. Thus the Daphnia genome appears to be relatively efficient with respect to the number of C2H2 ZNF homologs per family.
Updating a previous analysis of C2H2 ZFP present in the common ancestor of bilaterians based on a survey of Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans, we identified 58 well conserved C2H2 ZFP genes in Daphnia that belong to 40 distinct families. The Daphnia genome appears to be relatively efficient with respect to these well conserved C2H2 ZFP, since only 7 of the 40 gene families have more than one identified member. Worms have a comparable number of 6. In flies and humans, C2H2 ZFP gene expansions are more common, since these organisms display 15 and 24 multi-member families respectively. In contrast, only three of the well conserved C2H2 ZFP families have expanded in Daphnia relative to Drosophila, and in two of these cases, just one additional gene was found. The KLF/SP family in Daphnia, however, is significantly larger than that of Drosophila, and many of the additional members found in Daphnia appear to correspond to KLF 1/2/4 homologs, which are absent in Drosophila, but present in vertebrates.
Identification of orthologs in D. pulex: Previously identified orthologs  that were present in the common ancestor of the bilaterians Homo sapiens, Drosophila melanogaster, and Caenorhabditis elegans were used as a focus for the present study. Protein sequences from each of the 3 different species belonging to 39 different classes of C2H2 zinc finger proteins were collected. Each of these sequences was used in turn as a query in a BLAST search against the v1.1 gene model annotations of the draft genome assembly of D. pulex to detect homologous protein sequences. High-scoring Daphnia sequences were examined to ensure a good overall match, and then used in a reciprocal BLAST against Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans. Only those sequences that detected members of the same family represented by the original query sequence were retained as putative orthologs for those families. Other known C2H2 zinc finger binding genes were also used to search for any new families common to these bilaterians.
After the initial reciprocal BLAST approach, to search for additional gene members for the families, Hidden-Markov model (HMM) searches were conducted using the HMM profiles obtained from TreeFam for each of these 39 gene families. Daphnia pulex protein predictions containing all models were used to search with HMM profiles using HMMER 3.0 search. Only domains with an E-value < 0.1 were accepted, and any identified zinc finger gene was further manually inspected for their family characteristics. BLAST search was performed against the non redundant protein dataset at NCBI to confirm its family association. Only genes that appeared to be substantially complete and those that had approximately the same number of zinc fingers as other members of the proposed family were considered to be unambiguous members of that family. This approach identified additional homologs of Odd-skipped and Snail families and validated all the genes that were identified with reciprocal BLAST approach.
Alignments and phylogenetic analyses: Daphnia pulex homologs identified as above were combined with their respective family members from the other three bilaterians (Homo sapiens, Drosophila melanogaster and Caenorhabditis elegans; see Tables, Tables,2,2, ,3,3, ,44 and and5)5) and used to create family-specific and/or multiple family alignments using the Muscle program (version 3.6) with 16 iterations and a standard Clustalw weighting scheme (gap opening extension,  closing and separation penalty of 10, 0.2, 4 and 1 respectively). The obtained alignment was then trimmed and converted to suitable format using trimAI (version 1.2). Phylogenetic trees were generated using Bayesian inference (MrBayes; version 3.1.2) using WAG amino acid substitution matrix, empirically estimated amino acid frequencies plus gamma distribution of eight categories (WAG+F+Γ8). Successive runs were executed for a fixed number of generations with a sampling frequency of 100 and a burn-in parameter of 200. Runs were extended in each case until a convergence value of less than 0.03 was achieved. Since the multi-family trees each contained only an exclusive subset of the 40 total C2H2 zinc-finger families, internal branch patterns and statistics could be misleading with respect to the degree of relatedness between families. Hence, internal branches indicating a specific relationship between specific families within multi-family trees were collapsed into polytomies for presentation.
GS established the overall concept and approach, and both YB and GS initiated gene identification and annotation. AS completed the bulk of the identification, annotation, organization, and documentation of genes, as well as producing all phylogenetic trees and writing early drafts of the manuscript. All authors read and approved the final manuscript.
Genes likely to be involved in oogenesis and/or pattern formation showing no expansion in Drosophila relative to Daphnia.
Genes likely to be involved in oogenesis and/or pattern formation showing expansions in Drosophila relative to Daphnia.
The sequencing and portions of the analyses were performed at the DOE Joint Genome Institute under the auspices of the U.S. Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Los Alamos National Laboratory under Contract No. W-7405-ENG-36 and in collaboration with the Daphnia Genomics Consortium (DGC) http://daphnia.cgb.indiana.edu. Additional analyses were performed by wFleaBase, developed at the Genome Informatics Lab of Indiana University with support to Don Gilbert from the National Science Foundation and the National Institutes of Health. Coordination infrastructure for the DGC is provided by The Centre for Genomics and Bioinformatics at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. Our work benefits from, and contributes to the Daphnia Genomics Consortium.