We identified and annotated 82 genes representing 21 unique gene families from the Daphnia pulex v1.1 draft genome sequence assembly (September, 2006) (Table ) [23]. In parallel, we collated information from the genome sequences of five insect species to identify changes in gene family membership among taxa. For the phylogenetic analyses, we included additional arthropod sequences when they were available. The differences in gene family membership among taxa are of particular interest as they may reflect the evolutionary genomic response to the unique repertoire of immune challenges that a species has faced, and thus provide clues as to which genes are evolving in response to host/parasite interactions. For example, recognition and effector molecules that interact directly with pathogens display considerable species-specific gene expansion, in contrast to signal transduction molecules, which show no copy number variation and high sequence divergence [22
Annotated gene copy number for five species of insects and the crustacean Daphnia pulex
Overall, our search for immune system homologues uncovered fewer genes in D. pulex than in D. melanogaster, A. gambiae or A. aegypti, a similar repertoire to T. castaneum, and more genes than A. mellifera (Table ). The fact that we found fewer genes than are present in the dipteran genomes may be an artefact due to the high degree of sequence divergence between query and target sequences. However, D. pulex gene families are not consistently lower in number in comparison to the three dipteran species. For example, D. pulex has 11 members of the gram-negative bacterium binding protein (GNBP) family, while D. melanogaster has 3 members, and A. gambiae and A. aegypti each have 6 members. Thus, lower copy numbers in D. pulex are unlikely to be entirely due to the use of dipteran sequences to search a crustacean genome, and may instead reflect real differences in these organisms' evolutionary histories and subsequent strategies in combating their respective pathogens.
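The comparison made in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the counts are the GNBP numbers quoted in the text, while the function name and the dictionary layout are assumptions introduced here, not part of the original analysis.

```python
# GNBP copy numbers quoted in the text; counts for other families
# would have to be taken from the copy-number table.
GNBP_COUNTS = {
    "D. pulex": 11,
    "D. melanogaster": 3,
    "A. gambiae": 6,
    "A. aegypti": 6,
}


def compare_to_focal(counts, focal="D. pulex"):
    """Describe the focal species' copy number relative to every other species.

    Returns a dict mapping each non-focal species to 'fewer', 'equal' or
    'more', read as "the focal species has fewer/equal/more copies".
    (Hypothetical helper, for illustration only.)
    """
    focal_n = counts[focal]
    result = {}
    for species, n in counts.items():
        if species == focal:
            continue
        if focal_n < n:
            result[species] = "fewer"
        elif focal_n > n:
            result[species] = "more"
        else:
            result[species] = "equal"
    return result


if __name__ == "__main__":
    for species, relation in compare_to_focal(GNBP_COUNTS).items():
        print(f"D. pulex has {relation} GNBP genes than {species}")
```

Running the same function over every family in the table would show whether D. pulex is consistently lower than the dipterans, which is the pattern expected if low counts were purely a search artefact.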
Gram-negative binding proteins (GNBPs)
GNBPs are pattern recognition receptors (PRR) that bind pathogens and are involved in initiating the prophenoloxidase and Toll immune system cascades. There are two distinct groups of GNBPs, characterised by the presence or absence of the cysteine rich (CR) domain, which binds compounds of pathogenic origin (e.g. β-1-3-glucan, lipopolysaccharide, or lipoteichoic acid). All of the GNBP genes of D. melanogaster, A. mellifera and Bombyx mori have the CR domain. In contrast, only two of seven GNBP genes of A. gambiae contain this domain. Additionally, all GNBPs, except two from D. melanogaster (CG13422 and CG12780), contain a glucanase-like (GLU) domain that is susceptible to protease digestion and has lower affinity for polysaccharides than the CR domain. In all crustaceans examined previously, the GLU domain contains a putative catalytic site that is absent in D. melanogaster. A. gambiae and D. pulex have genes both with and without the putative catalytic site.
Eleven D. pulex GNBP genes were found. Three scaffolds contain two GNBP genes each whilst one scaffold has four GNBP genes, three of which are in close proximity to each other. Dappu-GNBP2 is alone on a separate scaffold. The number of exons in the D. pulex GNBP genes is typically six or seven but Dappu-GNBP2 has nine exons. Overall, the conservation of intron/exon boundaries is apparent and may indicate recent duplication events, a model supported by phylogenetic analysis suggesting that D. pulex GNBP family expansion is recent on an evolutionary timescale.
Phylogenetic analysis shows that the GNBPs fall into four well-supported clades (Figure ). GNBP clade I is insect-specific, and contains proteins with CR domains and inactive GLU domains. Within clade I, two D. melanogaster proteins, CG13422 and CG12780, consist of only a signal peptide and CR domain. Both of these D. melanogaster paralogues are likely to be functional despite the loss of the GLU domain, as they are upregulated during bacterial infection and CG13422 is also upregulated during fungal infection [27]. Clade II contains a species-specific expansion of ten D. pulex GNBP paralogues. The only other member of this clade is a GNBP from the oligochaete annelid Eisenia foetida, suggesting that insects have lost this GNBP subtype. Many of the D. pulex clade II genes are clustered on the same scaffold, and thus probably arose from local duplication events. It is not unusual for diversified duplicated immune genes to be selectively advantageous against new pathogenic challenges [22]. The short branch lengths separating some gene pairs in clade II are indicative of either recent duplication or concerted evolution. None of the GNBP clade II proteins have a CR domain, but all of them, except for Dappu-GNBP1, have an active GLU domain.
Figure 1 Bayesian phylogenetic analysis of the Gram-negative binding proteins (GNBPs) from available insect and crustacean species. All genes containing a catalytic site in their glucanase-like domain are marked with asterisks (*). All GNBP genes known to be involved (more ...)
GNBP clade III consists of proteins exclusively from crustaceans. However, surprisingly, none of the D. pulex GNBP homologues are members, indicating that it is missing from the draft assembly, that it has been lost, or that it diverged substantially since the separation of these species. GNBP clade III proteins have no CR but retain GLU domains. GNBP clade IV consists of five A. gambiae proteins and one D. pulex protein suggesting an Anopheles-specific gene expansion and, in D. melanogaster and in other insect species, either loss or substantial divergence. None of the proteins in this clade contains a CR domain and all have an active GLU domain.
Peptidoglycan recognising proteins (PGRPs)
Another major class of PRR are the peptidoglycan recognition proteins (PGRPs). Aside from recognising peptidoglycan, a required constituent of the cell walls of gram-negative bacteria, PGRPs participate in signalling: in D. melanogaster, GNBP1 and PGRP-SA form a complex that leads to the activation of the Toll receptor, resulting in the production of antimicrobial peptides. PGRPs have also been implicated in a variety of other immune functions, notably the induction of phagocytosis and activation of the Imd and prophenoloxidase pathways, and may even have direct cytotoxic activity towards bacteria [29]. Surprisingly, our search of the D. pulex genome found no PGRP genes. Notably, however, D. melanogaster PGRP-LD is part of a complex transcriptional unit that undergoes differential splicing and translation in a different reading frame to yield two very different peptides: PGRP-LD or PMI. Although we recovered a homologue of PMI in D. pulex, its alternative reading frames do not encode a PGRP. This is a surprising finding because D. pulex has homologues of Toll-1, the receptor that is activated in response to the recognition of gram-negative bacteria by PGRPs. Our finding that D. pulex has an expansion of GNBP genes relative to D. melanogaster could indicate that some GNBPs compensate for the absence of a PGRP.
The TOLL pathway
The Toll family cell surface receptors play an important role in the innate immune system of both invertebrates and vertebrates. These ancient proteins function as signal transducers, inducing pathways that result in the production of antibacterial and antifungal proteins following the recognition of pathogens via peptidoglycan recognition proteins (PGRPs) and gram-negative binding proteins (GNBPs). Different Toll proteins have specific interactions with particular pathogens [30]. Unlike mammalian Toll-like proteins, which function solely as recognition and signalling proteins resulting in the activation of immune system pathways, some D. melanogaster Toll genes are also involved in developmental regulation.
The Toll gene family is among the best characterised in the immune system. All members encode transmembrane proteins that contain signal peptides, leucine-rich repeat (LRR) regions interspersed with cysteine rich areas, and an intracellular C-terminal Toll-interleukin 1 receptor (TIR) domain. LRRs are 22–28 amino acids long and are typically involved in protein-protein interactions. The intracellular TIR domain is involved in signalling and interactions with the other players in the Toll pathway. Signal transduction in the Toll pathway involves three well conserved single copy genes (Tube, Pelle and MyD88), orthologues of two of which, Pelle and MyD88, we have identified in the D. pulex genome (Table ), indicating that this pathway is likely to function in a manner similar to that of D. melanogaster.
We found seven Toll receptor genes in the D. pulex genome, located on seven different scaffolds (Figure ). These proteins possess 13 to 28 LRRs, similar to the Toll receptors of D. melanogaster (which have 10–31 LRRs). The variation in the number of LRRs makes credible sequence alignment of the full protein sequences problematic, and thus only the relatively conserved TIR domains were used in the phylogenetic analysis (Figure ). The deep branches of the TIR tree are not well supported, and so it is not possible to comment on the ancestral relationships among the different paralogous groups. However, some of the more recent nodes of the tree are strongly supported, allowing the identification of several candidate immune-related D. pulex Toll genes. Dappu-TOLL1 and Dappu-TOLL3 are members of a clade that includes Dm-Toll1 (shown to act as a signal transducer for the induction of the Toll pathway following infection with fungi and bacteria [31]) and Dm-Tehao, implicated in induction of drosomycin in response to fungal infection [33]. Dappu-TOLL2 and Dappu-TOLL4 are clustered with Dm-Toll-9, which is most similar to the mammalian Toll-like receptors that are involved in signalling to initiate acute inflammatory responses, phagocytosis and antimicrobial peptide production. However, definitive evidence of the involvement of Dm-Toll-9 in immune responses is lacking. Dappu-TOLL3 differs from Dappu-TOLL4 and the other members of this clade in being encoded by a single, large exon (rather than 6 or 7 exons). This gene may be the result of an ancient duplication mediated by retrotransposition of a mature mRNA.
Figure 2 Bayesian phylogenetic analysis of Toll genes from insects and crustaceans, with C. elegans as an outgroup. All Toll genes thought to be involved in the immune system are highlighted in bold text. Genes marked with * need further in vivo experiments to (more ...)
Invertebrate thioester proteins (TEPs) are an ancient family related to vertebrate complement factors. Most of these proteins contain a thioester (TE) motif (GCGEQ) accompanied by a catalytic histidine residue. These two elements covalently bind pathogens through a thioester bond until the pathogens are cleared by phagocytosis [35]. However, not all TEPs contain a TE motif or catalytic histidine. How TEPs function in the absence of these motifs is currently unknown, but it has been suggested they may act as adaptors for the initiation of the membrane attack complex as is found in vertebrates [36
Among the best-studied TEP family members are the α2-macroglobulins (α2m), protease inhibitors that are activated by pathogen-released proteases. This class of TEPs contains a TE motif, but lacks a catalytic histidine residue. Upon activation, α2m proteins undergo a conformational change that traps the pathogen protease within the protein, leading to the exposure of the TEP C-terminal recognition domain, which binds phagocytic cells and promotes endocytotic clearance.
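Screening a predicted protein for the TE motif described above amounts to a simple substring search. The sketch below is a minimal illustration, not the annotation procedure used in this study: the function name and the two toy peptides are hypothetical, and the position of the catalytic histidine is not checked because the text does not specify where it lies.

```python
import re

TE_MOTIF = "GCGEQ"  # thioester motif named in the text


def find_te_motif(protein_seq):
    """Return the 1-based start position of every GCGEQ occurrence.

    An empty list means the sequence carries no TE motif.
    (Hypothetical helper, for illustration only.)
    """
    seq = protein_seq.upper()
    return [match.start() + 1 for match in re.finditer(TE_MOTIF, seq)]


# Hypothetical toy peptides, not real TEP sequences.
TOY_PEPTIDES = {
    "with_motif": "MKTLLAGCGEQNMVLFAP",
    "without_motif": "MKTLLAGSGEQNMVLFAP",
}

if __name__ == "__main__":
    for name, seq in TOY_PEPTIDES.items():
        print(name, find_te_motif(seq))
```

An exact-match search is deliberately strict; a real screen would probably need to tolerate alignment context and sequencing errors.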
While not all invertebrate TEPs have a documented immune function, several functional studies have begun to elucidate their role in this capacity. For example, the expression of the Ag-TEP1 gene is upregulated after immune challenge and, upon activation, it has been shown to promote phagocytosis [37]. Additionally, some D. melanogaster TEPs (Tep1, Tep2 and Tep4) are upregulated following immune challenge [38]. Finally, it has been suggested that different TEP family members bind different pathogens [39
We found seven TEP genes in the D. pulex genome. Three are on different scaffolds and four lie clustered on a single scaffold. All seven D. pulex TEP proteins have a signal peptide, indicating they are secreted. Our phylogenetic analysis (Figure ) identified four well-supported clades. TEP clade III unites the mammalian α2m with non-vertebrate TEPs including Dappu-TEP1; this clade does not have representatives from A. gambiae or D. melanogaster. TEP clades II and IV contain 1 and 5 representatives respectively from D. pulex and correspond to the previously defined invertebrate TEP class. All members of TEP clade II except Dm-TEP5, and all members of clade IV have a TE motif. TEP clade IV includes the D. melanogaster macroglobulin complement-related (Mcr) gene, essential for the specific phagocytosis of the fungus Candida albicans ]. Four of the five D. pulex proteins in this clade (Dappu-TEP4, -5, -6 and -7) are similar in sequence and are neighbours on one scaffold, and thus probably arose through recent local duplication events. Clade I is an A. gambiae
Figure 3 Bayesian phylogenetic analysis of thioester proteins. Clades I–IV represent (I) an Anopheles gambiae specific species-expansion, (II) all genes except D. melanogaster TEP-5 have a TE motif (III) alpha-2-macroglobulin-like genes and (IV) genes (more ...)
Regarding the possible immune function of these TEPs, Dm-TEP3 from clade II has been shown to act as an opsonin for Staphylococcus aureus. Another D. melanogaster gene in this clade, Dm-TEP2, has five splice variants, the functional significance of which remains to be determined. Dappu-TEP2, also in clade II, may also exhibit alternative splice forms as there are additional putative coding exons in the region of the gene homologous to those found to be alternatively spliced in D. melanogaster. Interestingly, a clade III TEP from the tick Ornithodoros moubata also shows splice variants [40]. Multiple splice forms may serve to increase the repertoire of proteases that are recognised by these TEP proteins. Of the seven D. pulex TEP genes, only Dappu-TEP2 has both the TE motif and a catalytic histidine residue, suggesting that it likely functions as an opsonin. TEP clade III includes representatives from human and arthropods: from insects (a hymenopteran and a coleopteran), two chelicerates, three malacostracan crustaceans and three species of Daphnia. As is the case for all the proteins in this clade, Dappu-TEP1 has a TE motif but no histidine residue, and likely functions similarly to the functionally characterized α2
Scavenger receptors (SR) are a diverse, multigene family of cell surface membrane proteins that share broad structural similarities. SR recognize and bind modified low-density lipoprotein (LDL), multiple polyanionic ligands and cell wall components [41]. SR have a dual cellular role: they are PRR of the immune system, whose triggering results in the cellular encapsulation of bacteria, and also have a housekeeping role of 'scavenging' cellular debris. Structurally, SR can display different numbers and types of protein domains, including chitin-binding domains, scavenger cysteine-rich receptors (SRCR), low-density lipoproteins (LDL), and C-type lectins (see Additional file 1). SRCR domains are candidates for ligand binding and protein interaction, although their precise biological functions remain largely unknown. We examined only one of three classes of SR, namely the macrophage scavenger receptor class A (SR-A), as they appear to function primarily in the immune system [42]. The SR-A class can be divided into two groups: those that have at least one SRCR domain and those that do not (SR-A1 and SR-A2 respectively; see Additional file 1
, A. gambiae and A. aegypti each have five SR-A genes that form four orthologue pairs [22]. In contrast, six class A scavenger genes were recovered from six different scaffolds within the D. pulex genome. Structurally, only A. gambiae Scrasp3 and Dappu-SCV1 do not contain SRCR domains and are therefore classified as SR-A2 type. Due to the highly variable domain structure of the SR, an alignment of the full protein sequences was not possible. Therefore, we conducted a phylogenetic analysis of the conserved SRCR domains of the remaining fourteen SR-A1 proteins (containing 31 domains) from D. melanogaster, A. gambiae and D. pulex. Thus, a single protein that contains three SRCR domains (labelled A, B, C from the N-terminus) has three representations on the tree (Figure ).
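Because each SRCR domain is treated as its own sequence, one protein can contribute several leaves to the tree. The following sketch shows one way such per-domain labels could be generated; the label format and the function name are assumptions made for illustration, while the domain counts used in the example (three for Dappu-SCV3, two for Dappu-SCV4, one for Corin) are those stated in the text.

```python
from string import ascii_uppercase


def srcr_leaf_labels(protein, n_domains):
    """Label each SRCR domain of a protein A, B, C ... from the N-terminus.

    Returns one tree-leaf label per domain, e.g. 'Dappu-SCV3_SRCR_A'.
    (Hypothetical label format, for illustration only.)
    """
    if n_domains < 1:
        raise ValueError("an SR-A1 protein has at least one SRCR domain")
    return [f"{protein}_SRCR_{ascii_uppercase[i]}" for i in range(n_domains)]


# SRCR domain counts stated in the text for three of the proteins.
EXAMPLE_PROTEINS = {"Dappu-SCV3": 3, "Dappu-SCV4": 2, "Corin": 1}

if __name__ == "__main__":
    leaves = []
    for protein, n in EXAMPLE_PROTEINS.items():
        leaves.extend(srcr_leaf_labels(protein, n))
    print(len(leaves), "leaves:", leaves)
```

Applied to all fourteen SR-A1 proteins, the same expansion would yield the 31 domain sequences placed on the tree.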
Figure 4 Bayesian phylogenetic analysis of scavenger receptor domains (SRCR) from scavenger genes from available arthropod sequences. SRCR copy number varies from 1–3 among the genes analysed. For those genes containing more than one SRCR domain, the gene (more ...)
Eight well-supported groups were resolved, each containing at least one SRCR domain from each of D. melanogaster, A. gambiae and D. pulex. The fact that orthologous trios are recovered when a crustacean is added to the sequences examined strongly supports the hypothesis that these recognition molecules are under strong functional constraint. SRCR domains found within the same protein are placed in different groups, excluding intragenic domain duplication as a mechanism of their evolution. Further supporting this conclusion is the fact that the spatial orientation of the multiple domains is maintained across these diverse taxa, including the single protein examined from the nematode Caenorhabditis elegans. The single SRCR domain from group 2 proteins is most similar in sequence to the third SRCR domain from group 4 proteins, but other inter-group relationships were not resolved. This may be due to a rapid burst of domain duplication in an ancestral genome and subsequent maintenance of functionally divergent paralogues.
With regard to the different functional roles of the SR paralogues, the D. melanogaster gene encoding the protein Tequila/Graal, a secreted protein that is primarily transcribed in the fat body, was found to be significantly upregulated after immune challenge [44]. Likewise, its orthologue in A. gambiae, Scrasp1, is also upregulated after microbial challenge [45]. However, disruption of Tequila did not affect the survival of flies infected with either gram-positive or gram-negative bacteria, the production of antimicrobial peptides or prophenoloxidase activity [44]. Therefore, the role, if any, of these genes in immunity is currently unclear, although it is evident from the experiments with D. melanogaster that they do not play a role in the activation of the Toll pathway. D. pulex has two co-orthologues of Tequila/Graal/Scrasp1 (Dappu-SCV3 and Dappu-SCV4), both of which are therefore also putatively involved in the immune system. This is the only example of two gene copies from one species within a group, indicating that either a gene duplication has occurred in D. pulex, or a gene copy has been lost in the other species. Contrary to what one would expect as the result of recent gene duplication, the two D. pulex genes are not identical in domain structure: one (Dappu-SCV3) contains three SRCR domains and two chitin-binding domains, while the other (Dappu-SCV4) contains two SRCR domains and no chitin-binding domains. Similar to Dappu-SCV3, D. melanogaster Tequila and A. gambiae Scrasp1 both contain at least two chitin-binding domains, but resemble Dappu-SCV4 in that they each have only two SRCR domains. Nevertheless, a D. pulex-specific gene duplication seems likely.
The phylogenetic analysis shows that SRCR_B and SRCR_C from Dappu-SCV3 share a common ancestry with the first and second domain copies respectively of Tequila, Scrasp1 and Dappu-SCV4. Thus, Dappu-SCV3 SRCR_A lacks a direct orthologue in the other members of the group, indicating that the other two species may have lost it. Moreover, the branch lengths separating the two D. pulex genes are very short. Thus, it appears that the group 1 proteins in the lineage leading to the dipterans have lost an SRCR domain, and that D. pulex has undergone a recent species-specific gene duplication resulting in a truncated gene copy with only two SRCR domains and no chitin-binding domains. Finally, the D. melanogaster protein Corin, containing a single SRCR domain, also has a documented putative functional role in the immune system. Indeed, it is up-regulated three-fold when challenged by gram-positive, gram-negative or fungal pathogens [28]. Thus, based on our phylogenetic analysis, the Corin orthologue Dappu-SCV2 is likely to have an immune function.
Chitin is a polysaccharide found in the supportive structures of many organisms including the exoskeletons of invertebrates, cell walls of some fungal spores, and cysts of amoeboid parasites. Chitinases hydrolyse chitin, and are critical in arthropod development (e.g. during ecdysis). A wide range of organisms that do not synthesise chitin also express chitinases, which play roles in chitin digestion and in immune defence against chitin-containing pathogens. For example, plant class I chitinase has been shown to undergo rapid adaptive evolution in its active site cleft, presumably due to an arms race with a fungal pathogen [47]. Additionally, two chitinase-like proteins in A. gambiae (Ag-AgBr1 and Ag-AgBr2) were shown to be upregulated in the presence of gram-positive bacteria [48], and chitinase-like proteins are part of the mammalian immune system [49]. We identified 17 chitinase genes in D. pulex, a similar repertoire to that of D. melanogaster, which has 16. Intrachromosomal tandem duplication has likely contributed to the large number of chitinase genes within the D. pulex
Chitinase and chitinase-like proteins contain two primary structural domains: the glycosyl hydrolase family 18 domain (GH18), which is responsible for hydrolysing chitin oligosaccharides, and the chitin-binding domain (CH14). The number and spatial arrangement of these structural domains vary among the available arthropod sequences. However, all contain between 1 and 5 GH18 domains while only a subset contain CH14 domains.
All of the chitinase and chitinase-like proteins that have a characterised immune related function contain a single GH18 domain and do not contain a CH14 domain. Phylogenetic analysis of the individual GH18 domains for all available chitinase and chitinase-like genes from various arthropod taxa yielded a well-supported tree (Figure ), with all the immune-related genes in one distinct clade (clade Ib). The GH18 domains from Dappu-CHT1 and Dappu-CHT2 are closely related to this cluster, and define these two as likely immune-related chitinase genes. Furthermore, all members of clade Ib, including Dappu-CHT1 and Dappu-CHT2, lack an active site glutamate residue critical for the hydrolysis of chitin, a trait that does not appear to be necessary in immune system function. Additionally, based on the longer branch lengths in clade Ib, its members appear to be evolving more rapidly than most of the other clades, a trait consistent with genes experiencing changing selective pressure, such as can be caused by host-pathogen interactions. Interestingly, a further four GH18 domains found elsewhere in the tree also lack the glutamate residue (Dm-CHT12, Dm-CHT10, Dappu-CHT3, and Tenebrio molitor CHT1), indicating that these domains also have a function that does not require hydrolysis of chitin, and should be considered for functional studies.
Figure 5 Bayesian phylogenetic analysis of the glycosyl hydrolase family 18 (GH18) domain from arthropod chitinase gene sequences. The number of GH18 domains in the chitinase genes analysed ranges from 1 to 5. For those genes containing more than one GH18 domain, (more ...)
An additional nine GH18 domains from seven D. pulex chitinases are contained within clade I, representing a large species-specific gene expansion. Although all of these gene copies are predicted to have the ability to hydrolyse chitin, such a large gene expansion of genes in the class that may have given rise to the immune-related chitinases warrants functional characterization.
Nitric oxide synthase (NOS)
The nitric oxide synthase (NOS) genes encode an enzyme responsible for the production of nitric oxide (NO), a highly reactive free radical gas. NO is toxic to nearly all types of pathogens. All vertebrates studied to date have three NOS paralogues, permitting partitioning of gene function. Two paralogues produce constitutively expressed proteins (neuronal and endothelial NOS), while the third, inducible NOS, is an immune response molecule triggered by pro-inflammatory cytokines. In contrast to the vertebrate NOS genes, all previously investigated invertebrates, which include six insect species and one crustacean (the blackback land crab Gecarcinus lateralis), have a single NOS that performs multiple metabolic functions, including roles in both humoral and cellular innate immune responses [50]. In contrast, we identified two NOS paralogues in the D. pulex genome. Dappu-NOS1 and Dappu-NOS2 differ in gene structure, suggesting that they are not the result of a recent duplication event. Indeed, the two D. pulex NOS proteins are only 44% identical, and thus are less similar to each other than are mouse inducible NOS and either mouse neuronal NOS (51%) or mouse endothelial NOS (52%).
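Percent identity figures such as those above are computed from a pairwise alignment. The sketch below uses one common convention, assumed here because the text does not state which was applied: identical residues divided by the number of alignment columns in which neither sequence has a gap. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns containing a gap in either sequence are ignored; the result is
    100 * identical columns / compared columns. Other conventions (for
    example dividing by the full alignment length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not real NOS sequence.
    print(round(percent_identity("MKT-LLA", "MRTALLA"), 1))
```

With this definition the toy alignment has six compared columns, five of which match.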
Our reconstruction of the NOS phylogeny identifies an insect-specific clade with the NOS of the blackback land crab as the sister to this clade (see Additional file 2). However, the relationship of the two D. pulex paralogues to this clade is not resolved. Regardless of their evolutionary relationship with the other NOS homologues, the duplication of NOS in D. pulex could have resulted in gene subfunctionalization [53], potentially in a manner similar to the NOS genes in vertebrates, neofunctionalization [54], or even a combination of the two (subneofunctionalization [55]). Without further functional information with respect to the product of these two genes, it is not possible to distinguish between these different models. However, the fact that the branch lengths of both of the D. pulex NOS genes (especially that of Dappu-NOS2) are long relative to other branches within the tree suggests that both of these genes are either experiencing release from selective constraint or are subject to positive selection that has accelerated their evolution.
Caspases are members of the cysteinyl aspartate proteinase family that cleave particular substrates after aspartic acid residues. Caspase proteins contain three domains: prodomain, p20 and p10. The prodomain varies in sequence length and composition, and contains motifs that direct the protein to particular complexes or organelles. The p20 and p10 units are necessary for substrate recognition and catalytic activity.
Members of the caspase family play roles in programmed cell death and inflammation in vertebrates, but their roles in invertebrates are less clear. A phylogenetic analysis of caspase p-domains from D. melanogaster, Danio rerio, Xenopus laevis, Gallus gallus, Mus musculus and Homo sapiens identified three main clades corresponding to inflammatory, apoptotic or apoptotic initiator responses [56]. No arthropod caspase homologues were placed in the clades containing vertebrate inflammation responsive homologues. However, the D. melanogaster caspase Dredd mediates immune responses to infection by gram-negative bacteria, possibly by cleaving the antimicrobial peptide transcription factor Relish [57]. Furthermore, a study of the D. melanogaster caspases Decay, Daydream and Drice found that all three were upregulated by pathogens [28]. Thus it appears that vertebrate and arthropod caspases may have independently evolved an immune function.
We identified eight putative caspases in D. pulex, the same number found in D. melanogaster and T. castaneum. A. gambiae has fifteen (including two caspase-like genes), while A. mellifera has only one (Table ). D. pulex caspases are distributed among five scaffolds, with three paralogues arranged on a single scaffold. The prodomain of the various caspases was too variable to align with confidence and thus was excluded from further analysis. In contrast, the p20 and p10 domains were sufficiently similar to allow an alignment to be constructed. The phylogenetic tree of the caspases contains 8 clades, the relationship among which is unresolved (Figure ). There are only two 1:1 orthologues between D. melanogaster and A. gambiae, and neither orthologous pair includes a D. pulex member. However, all but one D. pulex and one A. gambiae caspase cluster with paralogues from the same species, indicating that lineage-specific gene duplication is a relatively frequent event.
Figure 6 Bayesian phylogenetic analysis of caspase genes from available insect and crustacean sequences. Genes with known immune function are highlighted in bold text. Numbers at nodes are posterior probabilities.
Two of the eight clades (clades II and III), containing a total of five genes, are unique to D. pulex, while another (clade IV) is unique to A. gambiae. Only clade VIII, containing the immune-related Dm-Dredd, includes orthologues from all three arthropods, suggesting that the putative immune function of this clade is ancestral to the split between the crustaceans and insects. The transcription factor Relish that is activated by Dredd is also conserved among these species. However, only D. pulex contains two paralogues within clade VIII, a lineage-specific expansion that may provide additional flexibility in immune-related function. The other caspases with characterized immune function in insects are found in clades I and V, neither of which contains D. pulex orthologues.
Antiviral RNAi genes
RNA interference (RNAi) is an ancient defence mechanism that targets invading viral double-stranded RNA and transposons [58]. The RNAi cascade involves many genes working in synchrony, including Argonaute, an endonuclease, and Dicer, which is responsible for cutting dsRNA into small fragments for further processing. It has been shown that genes involved in the RNAi pathway are evolving rapidly due to positive selection, and that they show patterns of nucleotide polymorphisms that are consistent with a recent selective sweep [5]. Based on these findings, it is suggested that the rapid adaptive evolution in these genes may be caused by a coevolutionary arms race between viral pathogens and host defence [5
As expected from the D. melanogaster genome, we identified two Argonaute genes in D. pulex, corresponding to a housekeeping endonuclease with no immune function, and an RNAi endonuclease (Figure ). Contrary to expectations from D. melanogaster, three Dicer paralogues were identified in the D. pulex genome. Dicer paralogues resolve into two primary clusters corresponding to housekeeping genes and those that take part in the antiviral activity (Figure ). The additional D. pulex Dicer gene appears to be the result of a lineage-specific duplication of the antiviral pathway paralogue. These duplicates show lower sequence similarity to each other than do the orthologues from the mosquitoes Aedes aegypti and A. gambiae, so the duplication event is unlikely to be recent. From their respective branch lengths, both the D. pulex antiviral Dicer proteins are evolving at a higher rate than the housekeeping paralogues, a finding consistent with the hypothesis that they are undergoing a co-evolutionary arms race with D. pulex pathogens. Similarly, the D. pulex Argonaute gene copy in the putative antiviral pathway shows a higher rate of evolution than the housekeeping paralogue. A population genetic survey could test whether these genes are indeed experiencing strong selective pressures.
Figure 7 Bayesian phylogenetic analysis of two RNAi genes. (A) Argonaute genes, and (B) Dicer genes. Based on functional studies in Drosophila, AGO1 and DCR1 gene copies are thought to play a housekeeping function, whilst AGO2 and DCR2 gene copies are thought to (more ...)
In vertebrates, the innate immune system is complemented by a complex adaptive immune system that generates novelty in immune receptors by somatic recombination and rearrangement of immunoglobulin-domain containing genes. While there has been no evidence that non-vertebrates have homologous machinery for an adaptive immune response, one gene with an immunoglobulin domain, a homologue of the human Down Syndrome cell adhesion molecule (DSCAM), has recently been implicated in immune-related somatic diversification in insects [59]. DSCAM is a key player controlling neural wiring in both vertebrates and invertebrates. However, while vertebrates can make three different transcripts through alternative splicing, D. melanogaster has the capacity to make 30,000 unique transcripts due to four clusters of variable exons spliced in a mutually exclusive manner. This remarkable gene structure has been observed in all insects studied to date, including the dipterans D. melanogaster, A. gambiae, and A. aegypti, the lepidopteran Bombyx mori, the hymenopteran A. mellifera, and the coleopteran T. castaneum, although alternative exon number is variable.
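The scale of this diversity follows from simple combinatorics: when exactly one exon is chosen from each cluster, the number of possible transcripts is the product of the cluster sizes. The sketch below illustrates the calculation. The cluster sizes are the figures commonly cited for D. melanogaster Dscam (12, 48, 33 and 2 alternatives for exons 4, 6, 9 and 17); they are not given in this text and should be read as an assumption, as should the function name.

```python
from math import prod

# Commonly cited alternative-exon counts for D. melanogaster Dscam.
# Assumption: these values are not stated in the text above.
DMEL_DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(cluster_sizes):
    """Number of transcripts when one exon is chosen from each cluster.

    Mutually exclusive splicing means the choices multiply.
    """
    return prod(cluster_sizes)


if __name__ == "__main__":
    total = isoform_count(DMEL_DSCAM_CLUSTERS.values())
    print(f"{total:,} possible transcripts")
```

Under these assumed cluster sizes the product is 38,016, the same order of magnitude as the figure quoted above. A gene with three variable clusters would be handled the same way, with one fewer factor in the product.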
It has been speculated that the alternative transcripts may yield protein isoforms that act in a manner similar to that of vertebrate antibodies. Dscam transcripts have been found in fly hemolymph, fat body cells and hemocytes [60]. In these tissues, novel alternative splice products were found that have not been observed in neural tissue. Furthermore, A. gambiae challenged with different types of pathogens resulted in the transcription of different Dscam variants [59]. RNAi experiments in both the fly and mosquito have shown that the organisms were less able to clear pathogens when Dscam was inhibited [59]. Dappu-Dscam, fully described in a companion paper [61], has similar complexity to that observed in insects (Dappu-Dscam contains three variable exon clusters) and the frequency of alternative transcript expression differs between the brain and hemocytes, suggesting that Daphnia DSCAM may also play a role in immunity.