Identification of proteins with a DCX domain
Human and mouse proteomes were searched for sequences similar to that of the human DCX domain, yielding a total of 22 proteins containing one or two DCX repeats (Table ; the complete sequences used in the present study are found in supplementary Fig. ).
Sequence relationships between human and mouse proteins containing one or two DCX domains.
Figure 1 A. Schematic representation of human and mouse proteins containing DCX domains. DCX domains more similar to the N-terminal repeat of DCX were labeled in green, whereas those more similar to the C-terminal repeat were labeled in purple. Protein kinase ...
Serine/threonine protein kinase domains were found in three human/mouse proteins (DCLK, DCLK2, and DCLK3), and a ricin domain predicted to bind carbohydrates was found in a human/mouse protein referred to as FLJ46154 [30]. The structure of the human FLJ46154 and DCDC2B proteins differed from that of other proteins with tandem repeats: they contained a repeat more similar to the DCX C-terminal repeat in the N-terminal part of the protein, and a second repeat more similar to the DCX N-terminal repeat. In the mouse orthologs of these two proteins, only one DCX domain was present. All the mouse genes reside in chromosomal regions (Fig. ) that are syntenic to those of the human orthologs (supplementary Fig. 2). This also holds for the locations of DCDC1 and BAC26042; however, they are not true orthologs, since the sequence similarity is very low (52%, among 46 out of 86 amino acids) and confined to the DCX domain, and the phylogenetic and evolutionary analyses described below indicate that they are different. BAC26042 is also unique in its close physical proximity to FLJ46154: the two genes are only 2 kb apart, suggesting that they may share common regulatory elements.
This study is focused on the DCX domains and does not cover the full-length proteins. Phylogenetic analysis was conducted for the individual DCX domains, separating the N- and C-terminal parts (Fig. ). Several interesting features emerged from the human and mouse DCX domain phylogenetic analysis. The majority of human genes had a mouse ortholog. Two genes do not obey this rule, as they do not have unambiguous orthologs (human DCDC1 and mouse BAC26042). Furthermore, in most instances the N-terminally located DCX domains were more similar to other N-terminal domains than to the C-terminal domains of the same protein. The two exceptions were already mentioned: human DCDC2B and FLJ46154. Sequence analysis combining BLAT [31] and phylogenetic analysis identified the orthologous relationships listed in Table .
Maximum Likelihood (ML) phylogenetic tree including DCX domain proteins from human and mouse. Bootstrap values are indicated.
Next, we extended the sequence analysis by including several additional non-mammalian genomes. Initially, the analysis encompassed proteins found in the conserved domain database CDD [32]. Subsequently, these searches were broadened with extensive BLAST, TBLASTN, and BLAT searches. Using BLAT searches [31], sequences from opossum, rat, and rhesus monkey were added. Ciona sequences were added using TBLASTN analysis against the genomic data, and only those sequences corresponding to ESTs were included. Hence, the present phylogenetic analysis included DCX-motif-containing proteins from human, chimpanzee, mouse, cow, dog, chicken, fish, worms, insects, frogs, fungi, and sea squirts (multiple alignments are provided in supplementary Fig. 3). The analysis of the tandem DCX domain proteins (67 proteins) resulted in an unrooted tree with bootstrap values shown in Fig. .
ML tree of the tandem DCX domain proteins from different species. Bootstrap values are indicated.
Four groups of proteins are easily distinguished within the tandem DCX domain tree, which contains 67 proteins. From top to bottom, the group of RP1 and RP1L1 includes orthologs from the frog Xenopus laevis, fish (zebrafish Danio rerio, and pufferfish Tetraodon nigroviridis), chicken, cow, dog, mouse, rat, chimpanzee, and human. The second group includes proteins similar to DCDC2A (previously known as DCDC2, name approved by the HUGO Gene Nomenclature Committee) from mammals, including opossum (a marsupial), as well as chicken, fish, frog, and simpler organisms such as the ascidian Halocynthia roretzi and the sea squirt Ciona intestinalis. The third group is devoid of mammalian proteins, but contains proteins from the social amoeba Dictyostelium discoideum and one protein from Ciona intestinalis. Similar proteins were identified in the fruit fly, Drosophila melanogaster, the malaria mosquito, Anopheles gambiae, and the honey bee, Apis mellifera. Furthermore, two similar proteins from the worms Caenorhabditis elegans (ZYG-8) and Caenorhabditis briggsae are found in this group. The fourth group of proteins includes those most similar to DCX, DCLK, and DCLK2. This group included mammalian, chicken, and fish proteins, and one protein from Ciona intestinalis. This analysis of proteins with two domains was followed by an analysis of the N- and C-terminal domains (supplementary Figs. 4-5). One hundred and seven proteins were analyzed in the N-group, and one hundred and one proteins in the C-group, suggesting that there are slightly more proteins similar to the N-terminal part of DCX. The general subdivision into the four groups was preserved. Inspection of the proteins composing the N-terminal phylogenetic tree showed that additional proteins were added mainly to the third group containing the Dictyostelium discoideum protein (including 8 members). Proteins from flies and worms were also added to this group.
The fruit fly genome contains five DCX proteins, four of which are single repeats. Several mammalian proteins were added to this group as well, bringing it to 26 members in the N-group and 19 members in the C-group. This group included a protein from the unicellular organism Plasmodium falciparum, the malaria parasite.
Inspection of the proteins composing the C-terminal phylogenetic tree detected a group containing all the DCLK3 proteins. It should be noted that this group as a whole is quite distinct from DCX, DCLK, and DCLK2. Proteins in this group contain a single DCX domain and come from mammals (human, chimpanzee, cow, rat, and opossum), but also from fruit flies, honeybees, and malaria mosquitoes. An exception is the Ciona protein demarcating this group (Sca_10), which has a tandem repeat. One of the groups contains both DCDC2A and DCDC2B proteins, and yet another group contains several more DCDC2B proteins, suggesting that the C-terminal domains of this subset of proteins are probably less evolutionarily conserved.
During the analysis of the DCX domain proteins, we noted that corresponding orthologs could contain either tandem or single DCX domains. The simplest way to explain these differences may be through loss of intergenic sequences. The analysis of exon-intron boundaries included all the mammalian species and chicken, since chicken is a non-mammalian vertebrate close enough to mammals to make comparison possible (Table ). In general, the location of the intron-exon boundaries is highly conserved. In some cases the presence of an additional exon does not change the number of amino acids that are part of the DCX domains. Such is the case with DCDC2C: most species contain one exon, whereas in the cow ortholog the corresponding amino acid sequence is divided into two exons. However, in most cases, the lack of an exon implies a reduction in the amino acid information. For example, FLJ46154 contains three exons in most species, whereas in mouse, and in the corresponding sequence in rat, it contains only two. Consequently, in mouse and rat only a single DCX domain was identified in the region corresponding to the human FLJ46154 DCX domains. This analysis also allows identification of key time points in the evolution of the DCX-domain proteins. The common vertebrate ancestor of mammals and birds is now believed to date back 310 million years, marsupials split from the main (placental) group about 180 million years ago, and humans and rodents split off from their evolutionary family tree about 87 million years ago. The above analysis revealed that it is likely that BAC26042 was lost during evolution (in mouse two exons exist, while rat and rhesus monkey harbour only one exon). This analysis has been complicated by a predicted sequence in rat (XM_230359), which is a fused sequence containing both FLJ46154 and BAC26042. However, we have experimental evidence that does not support the existence of this fused sequence.
Antibodies we generated against the mouse FLJ46154 protein recognize a protein of the predicted size for FLJ46154 in mouse brain extract (supplementary Fig. 6). Thus, we have conducted our analysis based on the human data, which are derived from mRNA and EST data, and the mouse data, which are based on EST data and supported by our experimental data. DCLK3 was generated after the split between mammals and birds. BAC26042, FLJ46154, and DCDC2C were generated after the marsupials split from the main placental group. DCDC1 was generated after the split between humans and rodents. According to this analysis, the most conserved genes in this superfamily are DCX, DCLK, and DCDC2A.
Summary of the number of exons in DCX domains using BLAT. The species analyzed are human (hum), chimpanzee (pan), rhesus monkey (rhe), dog, cow, mouse (mou), rat, opossum (opp), chicken (chi).
Following analysis of the two groups including N- and C-terminal domains, analysis of all the DCX proteins was conducted (data not shown). As previously observed for the human and mouse proteins (Fig. ), the N-terminal domains were more similar to the N-terminal domains of other proteins, and the C-terminal domains to other C-terminal domains, than to the other repeat within the same protein. This result suggested that the DCX-domain duplications were ancient, and that the two repeats have probably diverged in their functions. Subspecialization of the N-terminal and C-terminal DCX motifs can be visualized at the level of logo sequences. Previously, four conserved blocks (A-D) within the DCX motif were identified [12]; these conserved blocks are shown at the bottom of Fig. . When the N-terminal region was analyzed separately from the C-terminal region, it was obvious that the A subdomain and portions of the B and C subdomains specify the N-terminus, while a portion of the C subdomain specifies the C-terminus (Fig. ). This result was obtained using the Lawrence Gibbs sampler motif-finding algorithm. Similar results were obtained with Smith's MOTIF motif-finding algorithm (data not shown). This analysis indicates that although the tandem domains share a short sequence of similar amino acids, the N-terminal domain has a unique, highly conserved block of amino acids.
Figure 4 Sequence logos of the N-terminal and C-terminal DCX motifs. Multiple alignments of the DCX motifs are shown as sequence logos. The height of each amino acid represents bits of information and is proportional to its conservation at that ...
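As an illustration of how letter heights in a sequence logo relate to bits of information, the following minimal sketch computes the information content of each alignment column as log2(20) minus the Shannon entropy of the observed residue frequencies, and scales each residue by its frequency. It is not the procedure used to produce the figure: the function names and the toy alignment are hypothetical, gaps are simply ignored, and no small-sample correction is applied.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Return (information content in bits, letter heights) for one column.

    Assumption: information = log2(alphabet_size) - Shannon entropy,
    with no small-sample correction; gap characters ('-') are ignored.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0, {}
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    info = math.log2(alphabet_size) - entropy
    # Each letter is drawn with height = frequency * column information.
    heights = {res: (n / total) * info for res, n in counts.items()}
    return info, heights


def logo_heights(alignment):
    """Apply column_information to every column of an equal-length alignment."""
    return [column_information(col) for col in zip(*alignment)]


# Hypothetical four-sequence alignment, for illustration only.
toy_alignment = ["WKGA", "WRGS", "WKGT", "WKAA"]

for position, (info, heights) in enumerate(logo_heights(toy_alignment), start=1):
    print(position, round(info, 2), heights)
```

In this toy example, a fully conserved column reaches the maximum of log2(20) bits, whereas a variable column scores lower, which is the property that makes conserved blocks stand out in a logo.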
Expression analysis by in situ hybridization
Taking into consideration the similarities among the different DCX-domain paralogs, and their common functions in relation to signal transduction and microtubule regulation [26], it is important to establish when and where these genes are expressed. This will help in delineating their potential function. For example, knowing whether a specific gene is expressed in proliferating, migrating, or differentiating cells is critical when trying to determine gene function. Additionally, coexpression in a particular tissue may indicate that paralogs could cooperate or be redundant.
Our analysis was carried out by in situ hybridization at E14.5, a stage at which many differentiated cell types characteristic of an adult organism have formed, yet at the same time such mid-gestation embryonic tissues still contain progenitor cells. This analysis was performed with the goal of generating an expression profile "snapshot". With the exception of the ubiquitously expressed Dcdc2B (Fig. ), expression patterns of genes encoding DCX-repeat-containing proteins are to a greater or lesser extent regional. Dcx, Dclk and Dclk2 are expressed in the central and peripheral nervous system including the brain, spinal cord, cranial and dorsal root ganglia and in the parasympathetic ganglia (Fig. ). A high power view (Fig. ) shows that in the developing neocortex Dcx and Dclk transcripts are much more abundant in the preplate, but individual cells expressing the Dcx and Dclk genes can be detected in the ventricular zone. Both Dclk2 and Dcdc2B are expressed in the developing neocortex, largely uniformly and at low levels, but more prominently in the ventricular zone than Dcx and Dclk. Outside the nervous system, prominent sites of Dcx and Dclk expression are the skeletal muscles, tongue muscles and individual cells of the olfactory epithelium (Fig. ). The latter tissue also expresses Dclk2 (Fig. ).
Figure 5 In situ hybridization patterns of genes containing the DCX protein domain. Blue stain denotes expression of the gene whose name is indicated in each panel. For details see text. (A-D) Sagittal sections of whole E14.5 embryos. (E-H) Expression in frontal ...
BAC26042, FLJ46154 and Dcdc2A exhibit highly regional expression patterns, which in the brain appear to be similar for BAC26042 and FLJ46154 (Fig. ). Fig. and show sagittal sections through the forebrain with BAC26042 and FLJ46154 transcripts present in the septum, various cell groups of the ventral thalamus, and in the posterior hypothalamus. Other sites of expression are a group of neurons at the base of the olfactory bulb (Fig. ), the pretectal area, the facial nucleus, and scattered neurons in the ventral and dorsal parts of the spinal cord (data not shown). Dcdc2A expression in the CNS is restricted to a group of scattered neurons in the most lateral part of the developing cerebellum (Fig. ). BAC26042 and Dcdc2A are expressed in the choroid plexi (Fig. ).
The majority of DCX-repeat encoding genes are expressed in the developing retina. Three types of patterns emerge. First, Dcx, Dclk, and Dclk2 transcripts are strongly expressed in the postmitotic inner neuroblastic layer (Fig. ). Second, BAC26042 and FLJ46154 are also expressed in this layer, but in a more restricted fashion near and at its surface (Fig. ). Finally, Rp1l1 transcripts are found in the outer neuroblastic layer, which contains proliferating cells (Fig. ). Radially arranged Dcx-, Dclk- or Dclk2-expressing cells are detected in the outer neuroblastic layer, which is reminiscent of the situation seen in the ventricular zone of the neocortex (Fig. ).
In addition, lung and kidney express Dcx, Dclk and Dcdc2A. Dclk2 transcripts are also found in the developing ovary and weak expression is also seen throughout the kidney (data not shown).
Our analysis included most of the 11 genes listed in Table , the exceptions being Dclk3 and Dcdc2C, for which we could not yet identify suitable templates. Rp1 was also examined, but it is not expressed at E14.5, except for expression noted in some midline cells of the spinal cord (data not shown). To summarize our studies, we found that tissues destined to respond to electrical stimuli (central and peripheral nervous systems and skeletal muscles) represent the most striking sites of expression of DCX-repeat encoding genes. Outside these tissues, expression is mostly low and usually not regional, the exceptions being kidney and lung.
Expression analysis in human and mouse
The relevance of functional genomics approaches using mouse models for studying human diseases obviously depends on the similarity of gene expression in the two species. Thus, we compared the expression of the human members of the DCX gene superfamily investigated in this study with that of their mouse orthologs. For this purpose, we used expression data from the UniGene database. Tissue-dependent expression profiles for both human and murine DCX repeat-containing proteins were generated from the EST counts provided by UniGene [33]. Since the mouse-human comparison was a key feature, the analysis was limited to tissues with a high total number of EST counts that were common to both organisms. We analyzed data for ten different human genes and eight mouse genes. For two human genes there were no corresponding expression data in mouse: DCDC2B, which has a mouse ortholog that is not listed in UniGene, and DCDC1, which does not have a mouse ortholog. The clustered expression data resulting from this analysis are shown in Fig. and a gene-gene correlation based on this information is shown in Fig. .
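The steps from EST counts to gene-gene correlations can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: counts are normalized to transcripts per million ESTs in each tissue, similarity is measured with the Pearson correlation coefficient, and the gene labels, tissue totals, and counts are all invented placeholders.

```python
import math


def to_tpm(gene_counts, tissue_totals):
    """Scale raw EST counts per tissue to transcripts per million (assumed normalization)."""
    return [1e6 * count / total for count, total in zip(gene_counts, tissue_totals)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a flat profile
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """Return {(gene_a, gene_b): r} for every ordered pair of genes."""
    genes = list(profiles)
    return {(g, h): pearson(profiles[g], profiles[h]) for g in genes for h in genes}


# Hypothetical data: total ESTs per tissue and raw counts for three placeholder genes.
tissue_totals = [200_000, 150_000, 300_000, 100_000]  # e.g. brain, eye, kidney, testis
raw_counts = {
    "geneA_human": [40, 3, 6, 1],
    "geneA_mouse": [35, 2, 9, 1],
    "geneB_human": [1, 30, 0, 2],
}

profiles = {gene: to_tpm(counts, tissue_totals) for gene, counts in raw_counts.items()}
matrix = correlation_matrix(profiles)
print(round(matrix[("geneA_human", "geneA_mouse")], 2))
```

Restricting the comparison to tissues with high total EST counts, as described above, matters in such a scheme because a single EST in a sparsely sampled tissue would otherwise translate into a large normalized value.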
A) Clustered UniGene gene-tissue expression data. B) Gene-gene correlations based on UniGene expression data.
We tested the significance of the correlations by random permutation analysis. The correlations were re-calculated 1000 times after randomly scrambling the tissues for each gene independently. We found that all high correlations (>0.5) were significant (p < 0.01). Two clusters revealing very high correlation were observed. The largest group included human RP1, and their murine orthologs. In addition, DCDC1, which so far had been reported to be expressed mainly in testis and embryonic brain [34], was included in this group. This group is characterized by high levels of expression in the eye, which is common amongst most DCX proteins, and has been noted in our in situ analysis. In addition to expression in the eye, these genes are expressed at lower levels in only a few other tissues. In this group there is no clear distinction between mouse and human in the gene-gene correlation of expression. The correlation between the different members of this group is >0.9 in all cases. Both the human and the mouse FLJ46154 are related to this group; however, the correlation between the human and mouse FLJ46154 is low (0.3). The protein products of these two genes have also diverged, with a loss of a DCX domain in the mouse protein. Thus, it is possible that there has been less conservation in the regulatory regions of these genes as well.
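The permutation analysis described above can be sketched in a few lines. The sketch below is an assumption-laden illustration rather than the code used in the study: it uses the Pearson correlation, a one-sided comparison, and the common (count + 1) / (permutations + 1) estimate of the p-value, and the function names and example profiles are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Estimate how often scrambled profiles correlate at least as well as the real ones.

    The tissue order of each gene is shuffled independently, as in the text.
    Assumptions: one-sided test and a +1 correction so that p is never zero.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    x_perm, y_perm = list(x), list(y)
    at_least_as_high = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        rng.shuffle(y_perm)
        if pearson(x_perm, y_perm) >= observed:
            at_least_as_high += 1
    return observed, (at_least_as_high + 1) / (n_perm + 1)


# Hypothetical expression profiles of two genes across ten tissues.
gene_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
gene_2 = [2 * value + 1 for value in gene_1]

r, p = permutation_pvalue(gene_1, gene_2)
print(r, p)
```

With 1000 permutations the smallest attainable estimate under this scheme is 1/1001, which is compatible with the p < 0.01 threshold quoted above.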
The second group exhibiting high gene-gene correlations includes the murine genes Dcx, Dclk, and Dclk2, and their human orthologs. Human DCLK2 exhibited a somewhat lower correlation with its mouse ortholog (0.4) than the other genes in this group. This may stem from its generally lower expression levels (Fig. ). Our in situ data also indicated a high similarity in the co-expression of Dcx, Dclk, and Dclk2. Furthermore, our functional analysis [26] indicated that the members of this group share more properties, and that only they interact with the scaffold protein neurabin 2. A third group of genes with lower levels of correlation includes DCDC2A, DCLK3, and Dclk3. In this group the correlation between the corresponding orthologs does not exceed 0.5. It should be noted that there are some additional high correlations between different genes, for example, DCLK3 or FLJ46154 with DCX, DCLK, and Dcx