This manuscript presents the first extensive phylogenetic analysis of a key family of immune regulators, the interferon regulatory factor (IRF) family. The IRF family encodes transcription factors that play important roles in immune defense, stress responses, reproduction, development, and carcinogenesis. Several times during their evolution, the IRF genes have undergone expansion and diversification. These genes were also completely lost on two separate occasions in large groups of metazoans. The origin of the IRF family coincides with the appearance of multicellularity in animals. IRF genes are present in all principal metazoan groups, including sea sponges, placozoans, comb jellies, cnidarians, and bilaterians. Although the number of IRF family members does not exceed two per genome in sponges and placozoans, it reaches five in cnidarians. At least four additional independent expansions led to up to 11 members in different groups of bilaterians. In contrast, the IRF genes either disappeared or mutated beyond recognition in roundworms and insects, the two groups that include most metazoan species. The IRF family separated very early into two branches that ultimately led to the vertebrate IRF1 and IRF4 supergroups (SGs). Genes encoding both IRF-SGs are present in cnidarians and in all bilaterians that retain IRF genes. The evolution of vertebrate IRF family members then proceeded in at least two additional steps. First, close to the appearance of the first vertebrate, the IRF family probably expanded to four members, the predecessors of the four vertebrate IRF groups (IRF1, 3, 4, and 5 groups). In the second step, the 10 vertebrate family members evolved from these four genes, likely as a result of the 2-fold duplication of the entire genome.
Interestingly, the IRF family coevolved with the Rel/NF-κB family with which it shares some important evolutionary characteristics, including roles in defense responses, metazoan specificity, extensive diversification in vertebrates, and elimination of all family members in nematodes.
Members of the interferon regulatory factor (IRF) family of transcription factors are major regulators of host defense in vertebrates, controlling many different aspects of the innate and adaptive immune response and of reactions to cell stress (Tamura et al. 2008). This family also functions in reproduction and development (Bazer et al. 1997; Ozato et al. 2007). All IRF family members share a conserved N-terminal DNA-binding domain (DBD) with a winged helix-turn-helix structure and a motif containing five regularly spaced tryptophan residues that resembles the Myb DBD. The C-terminal region of these proteins contains an IRF association domain 1 or 2 (IAD1, IAD2) that has transactivation potential and can mediate association with other IRF members or with members of other transcription factor families (Takaoka et al. 2008).
Ten IRF family members (IRF1–10) have been described in vertebrates, and all 10 genes are present in many vertebrate species (Takaoka et al. 2008; N.J., H.R., B.H.R., unpublished data). In certain species, some of these genes have been eliminated or rendered nonfunctional, such as IRF10 in mice and humans (Nehyba et al. 2002). In other species, the IRF family has expanded (Stein et al. 2007). Phylogenetic analysis has demonstrated that the 10 IRF proteins can be subdivided into four groups that reflect their evolutionary history: IRF1-G (IRF1, IRF2), IRF3-G (IRF3, IRF7), IRF4-G (IRF4, IRF8, IRF9, IRF10), and IRF5-G (IRF5, IRF6; Nehyba et al. 2002). Eight IRF proteins comprising three of these groups (IRF3-G, IRF4-G, and IRF5-G) share a common C-terminal IAD1 domain, which is closely related to the C-terminal MH2 domain of the Smad proteins and, more distantly, to the forkhead-associated domain of certain kinases and transcription factors (Qin et al. 2003). The remaining two factors, which form the IRF1-G group, have a different C-terminal association domain (IAD2) that is not related to any other known domain. Consequently, the IRF family can be viewed as consisting of two large supergroups (SGs): the IAD1-containing IRF4-SG and the IAD2-containing IRF1-SG.
Recent advances in our knowledge of the genomes and transcriptomes of numerous invertebrate species have changed our view of the evolution of many genes. Genes that were previously thought to have arisen in deuterostomes, because they are absent from the Drosophila and Caenorhabditis genomes, have since been detected in other metazoan species (Pennisi 2007). The IRF genes fall into this category. Although IRF genes have long been known in vertebrates and, more recently, in other deuterostomes, their origin has not been determined (Sodergren et al. 2006; Azumi et al. 2007; Huang et al. 2008; Takaoka et al. 2008). IRF-like genes have also occasionally been detected in genomic and expressed sequence tag (EST) databases of a few basal metazoans and protostomians (Miller et al. 2007; Putnam et al. 2007; Venancio et al. 2007; Srivastava et al. 2008). To understand the origins of the IRF family and how the 10 vertebrate family members may have evolved, we carried out a comprehensive search for IRF sequences in available databases and analyzed the relationships of IRF genes and proteins in different metazoan taxa.
Phylogenetic analysis suggests that the origin of IRF genes coincides with the appearance of animal multicellularity. IRF genes are present in all five principal metazoan groups (sea sponges, placozoans, comb jellies, cnidarians, and bilaterians) but were not detected in any species outside the Metazoa. The IRF genes of the ancient sea sponge lineage already contain sequences encoding both principal IRF domains, the IRF DBD and IAD1. In cnidarians and in all bilaterians that possess IRF genes, the IRF family is divided into the IRF1- and IRF4-SGs. The IRF family apparently underwent a turbulent evolution. Several independent expansions led to 5–11 members in Cnidaria, Mollusca, amphioxus (Cephalochordata), sea squirts (Tunicata), and finally in Vertebrata. In contrast, all IRF family members disappeared or mutated beyond recognition, independently, in two large protostomian groups: roundworms (Nematoda) and insects (Hexapoda). Finally, our analysis suggests that at some stage close to the appearance of the first vertebrate, the IRF family expanded to four members, the predecessors of the four IRF groups (IRF1-, 3-, 4-, and 5-G), and that the 10 vertebrate family members evolved from those four genes as a result of the 2-fold duplication of the entire genome.
Most searches employed databases accessible through the National Center for Biotechnology Information (NCBI) Web site (http://www.ncbi.nlm.nih.gov) and the set of Blast programs available at this site (exceptions are mentioned below). EST databases from metazoan species were searched with TBlastN, first using the human IRF1 DBD as a query and then the complete sequences of human IRF1 (hsIRF1; NP_002189), human IRF4 (hsIRF4; NP_002451), and Oscarella carmela IRF-OC1 (FJ752596). Whole-genome shotgun (WGS) databases from metazoan species were searched by TBlastN using the DBD sequences of hsIRF1, hsIRF4, and, in some instances, IRF-OC1. EST and WGS databases of nonmetazoan eukaryotes were searched with a query composed of the concatenated DBD sequences of hsIRF1, hsIRF4, IRF-OC1, Pleurobrachia pileus IRF-PP1, and Trichoplax adhaerens IRF-TA1. EST and WGS databases of ecdysozoan protostomians were also searched with the concatenated DBD sequences of Mesobuthus gibbosus IRF-MG1, Rhipicephalus appendiculatus IRF-RA1, and Eriocheir sinensis IRF-Er. Amphimedon queenslandica WGS sequences were searched by TBlastN using the database and program provided by the University of Kiel, Germany (http://www.compagen.org; Hemmrich and Bosch 2008). Saccoglossus kowalevskii genome traces were downloaded from the NCBI Web site and searched with a local TBlastN program. Genomic sequences of Lottia gigantea, Capitella sp. I, and Helobdella robusta were searched using the database and program provided by the Department of Energy Joint Genome Institute (http://www.jgi.doe.gov). Genomic sequences of Schistosoma mansoni were searched using the database and program provided by the Sanger Institute (http://www.sanger.ac.uk). All hits with an expect value of 5 or lower were evaluated individually by a reciprocal BlastX search of the target sequence against the protein “nonredundant” database at the NCBI Web site.
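The screening rule above (keep forward hits with an expect value of 5 or lower, then confirm each by a reciprocal BlastX search) can be sketched as a small filter. This is a minimal sketch, not the authors' pipeline: it assumes the forward TBlastN hits are available in the 12-column BLAST tabular format (BLAST+ `-outfmt 6`) and that the description of the top reciprocal BlastX hit for each target has already been collected into a dictionary. The function names and the keyword test on the description are hypothetical simplifications of what the text describes as individual evaluation.

```python
from dataclasses import dataclass

# Default column order of BLAST tabular output (BLAST+ -outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query: str
    subject: str
    evalue: float


def parse_tabular(lines):
    """Parse tab-separated BLAST hits, skipping blank and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hits.append(Hit(row["qseqid"], row["sseqid"], float(row["evalue"])))
    return hits


def candidate_irfs(hits, reciprocal_top_hit, max_evalue=5.0):
    """Keep targets with E <= max_evalue whose reciprocal top hit looks like an IRF.

    reciprocal_top_hit (hypothetical) maps a target ID to the description of its
    best BlastX hit against the protein database, collected beforehand.
    """
    keep = set()
    for hit in hits:
        if hit.evalue > max_evalue:  # permissive forward cutoff from the text
            continue
        description = reciprocal_top_hit.get(hit.subject, "").lower()
        if "interferon regulatory factor" in description:
            keep.add(hit.subject)
    return sorted(keep)
```

A keyword match is far cruder than manual inspection; the point of the sketch is that a very permissive forward cutoff only works when every surviving hit is checked by a reverse search.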
Detected invertebrate IRF sequences are listed in supplementary tables S1 and S2 (Supplementary Material online). Control human and chicken IRF sequences were retrieved from the Swiss-Prot database (P10914, P14316, Q14653, Q15306, Q13568, O14896, Q92985, Q02556, Q00978, Q90876, Q98925, Q98TX7, Q5ZJM5, Q1PS65, Q90643, Q90871, and Q90WI0), and the Smad sequences from GenBank (NP_005891.1, ABC88374.1, NP_005893.1, NP_005576.3, and XP_001631691.1).
Protein sequence alignments were constructed with the ClustalX program (Jeanmougin et al. 1998), and some of the aligned sequences were corrected by visual inspection in the Seaview program (Galtier et al. 1996). Two methods were used to construct phylogenetic trees: Bayesian inference and the distance-based Neighbor-Joining (NJ) method. Bayesian inference was performed with MrBayes 3.1.2 (Ronquist and Huelsenbeck 2003), using a mixed fixed-rate amino acid substitution model and gamma-distributed rate variation across sites. Two independent Bayesian analyses, each consisting of four Metropolis-coupled Markov chain Monte Carlo (MCMC) chains, were run for 200,000 generations and sampled every 200th generation. Convergence of both analyses was assessed by plotting the log probability of the data against the generation number. The consensus tree was created with the burn-in set to 250 samples. To confirm the Bayesian analysis, phylogenetic trees were also constructed from the same alignments using the NJ method as implemented in ClustalX, and 1,000 bootstrap replicates were generated by ClustalX for the bootstrap tests. The trees were plotted with the tree-drawing program Dendroscope (Huson et al. 2007). The alignments and the confirmatory NJ trees are shown in two supplementary files (Alignments and Sequences) and in figures S1–S3 (Supplementary Material online).
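The MCMC settings above translate into a simple sample budget. The sketch below assumes the MrBayes 3.1.x convention that `sumt burnin` counts saved samples rather than generations; it computes how many trees enter the consensus and renders a command block with the stated settings. The helper names are hypothetical, the data (matrix) block is omitted, and the commands illustrate the described settings rather than reproduce the authors' actual input file.

```python
def mcmc_budget(ngen, samplefreq, burnin_samples, nruns):
    """Sample bookkeeping for a MrBayes-style run (the starting state is ignored)."""
    samples_per_run = ngen // samplefreq
    kept_per_run = samples_per_run - burnin_samples
    return {
        "samples_per_run": samples_per_run,
        "burnin_generations": burnin_samples * samplefreq,
        "burnin_fraction": burnin_samples / samples_per_run,
        "trees_in_consensus": kept_per_run * nruns,
    }


def mrbayes_block(ngen=200_000, samplefreq=200, nruns=2, nchains=4, burnin=250):
    """Render a MrBayes command block (hypothetical; data block loaded separately)."""
    return "\n".join([
        "begin mrbayes;",
        "  prset aamodelpr=mixed;",  # mixed fixed-rate amino acid substitution models
        "  lset rates=gamma;",       # gamma-distributed rate variation across sites
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nruns={nruns} nchains={nchains};",
        f"  sumt burnin={burnin};",  # burn-in expressed as a number of samples
        "end;",
    ])


print(mcmc_budget(200_000, 200, 250, 2))
print(mrbayes_block())
```

With these settings each run saves about 1,000 samples (the starting state may add one more, depending on the version), the first 250 samples (50,000 generations, a quarter of each run) are discarded, and roughly 1,500 trees from the two runs contribute to the consensus.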
The data of Putnam et al. (2008), together with the chromosomal positions of the human IRF genes in the Homo sapiens genome (build 36.3) at the NCBI Web site, were used to assign the human IRF genes to ancestral chordate linkage groups (see supplementary table S5, Supplementary Material online). The nonfunctional human IRF10 gene is located between the DUSP15 and FKHL18 genes on chromosome 20, and the remnants of its 3’ sequence are annotated as the 3’ region of the DUSP15 gene.
Two EST cDNAs (G840P34RB5.T0 and G840P310RO18.T0) from the O. carmela pSport1 cDNA library (Nichols et al. 2006) were sequenced. The newly determined nucleotide sequence of O. carmela IRF was submitted to GenBank (FJ752596). IRF sequences determined by new annotation and/or assembly of WGS or EST sequences already in GenBank are shown in the supplementary files Alignments and Sequences (Supplementary Material online).
The IRF DBD, the defining and most conserved sequence of the IRF family in vertebrates, was used to search EST and genomic databases of metazoan animals. Subsequent searches also used full-length IRF sequences that included the IAD1 domain. Blast searches of sequences from sponges (Porifera), placozoans, comb jellies (Ctenophora), cnidarians, and protostomian bilaterians, which together represent more than 95% of all metazoan species, revealed 3 IRF genes in sea sponges, 2 in placozoans, 2 in comb jellies, 13 in cnidarians, 5 in flatworms (Platyhelminthes), 9 in segmented worms (Annelida), 14–15 in mollusks, and 6 in arthropods (fig. 1; supplementary tables S1 and S2, Supplementary Material online). Because of the limited data available, we did not attempt to classify these genes into ortholog groups. Each gene was named with an abbreviation of the species name followed by the gene number (e.g., T. adhaerens has two IRF genes, IRF-TA1 and IRF-TA2, abbreviated TA1 and TA2) (Table 1).
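The naming convention can be expressed as a one-line rule: the initials of the genus and species, followed by the gene number. The sketch below is an assumption inferred from the examples given (TA1, OC1, AQ1, BF4); it does not cover every label used in the text (e.g., IRF-Er for E. sinensis), and the function name is hypothetical.

```python
def irf_label(binomial: str, number: int, short: bool = False) -> str:
    """Build a label such as 'IRF-TA1' from a binomial species name (hypothetical rule)."""
    genus, species = binomial.split()[:2]
    code = f"{genus[0]}{species[0]}".upper() + str(number)
    return code if short else f"IRF-{code}"


for name, n in [("Trichoplax adhaerens", 1), ("Trichoplax adhaerens", 2),
                ("Oscarella carmela", 1)]:
    print(irf_label(name, n), irf_label(name, n, short=True))
```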
Although IRF genes are common in the metazoan kingdom, Blast searches of genomic and EST databases of nonmetazoan eukaryotes (as specified in supplementary tables S3 and S4, Supplementary Material online) failed to find any gene encoding a protein with detectable homology to the IRF DBD. IRFs are absent even in Choanoflagellata, the sister group of Metazoa (fig. 1), suggesting that the origin of this family coincides with the origin of animal multicellularity.
Among the most ancient IRF genes in extant animal species are those of sponges. Multiple phylogenetic analyses place poriferans close to the base of the metazoan tree (fig. 1; Budd 2008). To determine the exact structure of the poriferan IRF gene, two EST cDNAs of the homoscleromorph species O. carmela were sequenced and the coding sequence was analyzed (IRF-OC1; fig. 2). Two additional IRF genes were identified by database searches in the genome of the demosponge A. queenslandica (AQ1 and AQ2). The DBD amino acid sequences of the three poriferan genes are 40% identical (60% similar) to those of vertebrate IRFs and have all the hallmarks of the IRF DBD, including the five regularly spaced tryptophans and the five conserved amino acids needed for DNA sequence recognition (Fujii et al. 1999). In addition to the IRF DBD, IAD1 is also present in IRF-OC1 (based on the mRNA sequence) and in IRF-AQ1 (based on the juxtaposition of the DBD and IAD1 in the genome). The IAD1 domain is less conserved than the DBD, being 25% identical (40% similar) to its vertebrate counterpart, but it is clearly recognizable. The complete coding sequence of IRF-OC1 also indicates that three other regions typical of vertebrate IRF4-SG proteins are present in sponges: 1) a short, variable N-terminus that precedes the IRF DBD; 2) an approximately 150-amino-acid, proline- and serine/threonine-rich middle region separating the DBD from IAD1; and 3) an approximately 50-amino-acid, variable C-terminus. Despite belonging to one of the most ancient metazoan lineages, poriferan IRF proteins are surprisingly complete and “modern” in their architecture.
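Identity and similarity percentages such as those quoted above depend on how they are counted. The sketch below shows one common convention, assuming a pairwise alignment is already available: only gap-free columns are counted, identical residues score as identical, and residues from the same physicochemical group score as similar. The residue grouping is a hypothetical simplification (BLAST's "positives" instead count pairs with a positive BLOSUM62 score), the text does not specify how its figures were computed, and the toy sequences are invented, not IRF data.

```python
# Hypothetical, coarse grouping of "similar" residues, for illustration only.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG", "C", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def identity_similarity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Percent identity and similarity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF.get(x) == GROUP_OF.get(y):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100 * identical / compared, 100 * similar / compared


print(identity_similarity("MKWLA-DE", "MRWIAGDQ"))
```

In the toy pair, seven gap-free columns are compared: four are identical and two more (K/R, L/I) fall into the same group, giving about 57% identity and 86% similarity.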
During the evolution of the basal metazoan and protostomian groups, the number and diversity of IRF family members in the genome fluctuated considerably (fig. 1). To evaluate the degree of diversification of IRF genes in individual metazoan groups, phylogenetic trees of IRF protein sequences were constructed using IRF sequences from four basal metazoan groups (Porifera, Placozoa, Ctenophora, and Cnidaria) and two protostomian groups (Lophotrochozoa and Ecdysozoa), together with control sequences from two vertebrate species (human and chicken). Separate trees were created for the two domains, the IRF DBD and IAD1, because full-length IRF sequences were not available for most invertebrate metazoan species. Phylogenetic relationships were analyzed both by character-based Bayesian inference (figs. 3 and 4) and by the distance-based NJ method (supplementary figs. S1–S2, Supplementary Material online). The trees yielded by the two methods were in mutual agreement.
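For readers unfamiliar with the distance-based method, the sketch below implements the standard Neighbor-Joining algorithm on a precomputed distance matrix: at each step it joins the pair minimizing Q(a, b) = (n − 2)·d(a, b) − r(a) − r(b), assigns branch lengths, and updates distances to the new node. It is a minimal illustration, not the ClustalX implementation used here (which also derives distances from the alignment and performs bootstrapping); the function names are hypothetical, and the example uses a small textbook-style toy matrix rather than IRF data.

```python
def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): distance}."""
    d = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


def neighbor_joining(dist):
    """Return an unrooted Newick string with branch lengths built by Neighbor-Joining."""
    d = {a: dict(row) for a, row in dist.items()}
    nodes = sorted(d)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]  # NJ selection criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.3f},{b}:{len_b:.3f})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                # distance from the new internal node to every remaining node
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"


toy = symmetric({("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
                 ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
                 ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3})
print(neighbor_joining(toy))
```

On this toy matrix the first join pairs a and b with branch lengths 2 and 3, and d and e are later joined with branch lengths 2 and 1. Bootstrapping, as used in the confirmatory trees, would repeat this procedure on distance matrices recomputed from alignment columns resampled with replacement.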
Two genes are present in the genomes of several basal metazoan groups (Porifera, Placozoa, and possibly Ctenophora, for which only EST data are currently available). In all three groups, IRF sequence diversity is limited: all sequences belonging to any single species cluster together (fig. 3A). By contrast, Cnidaria have up to five IRF family members per genome. These genes form two separate clusters, indicating increased gene diversification (fig. 3B). In the protostomian group Lophotrochozoa, the number of genes varies considerably, with Mollusca and Platyhelminthes representing two extremes. The IRF family of Mollusca diversified to up to seven members per genome, and their sequences form four clusters (fig. 3C). In contrast, just one IRF gene was found in the genomes of three representatives of Platyhelminthes. Finally, a general tendency toward reduced IRF family complexity is apparent in the protostomian group Ecdysozoa. IRF DBD sequences were not detected in the genomes or EST sequences of two well-studied groups, Nematoda and Hexapoda (supplementary tables S3 and S4, Supplementary Material online). Nevertheless, several IRF genes were found in other arthropods (Chelicerata and Crustacea), indicating that the IRF family was eliminated or mutated beyond recognition at least twice during the evolution of Metazoa. Two IRF genes per genome were found in Chelicerata (Ixodes scapularis). Interestingly, the only crustacean in which an IRF gene was detected is the Chinese mitten crab (E. sinensis), although extensive sequence data exist for other crustacean species. All arthropod genes form a single cluster (even when all DBD trees in fig. 3 are combined; not shown), but with long internal and terminal branches (fig. 3D).
The IRF genes of cnidarians and lophotrochozoans are divided into two groups, one related to the vertebrate IRF1-SG and the other to the IRF4-SG (fig. 3B and C). Genes related to IRF1 include Cnidaria G1, Mollusca G1, and Annelida G1 groups, whereas Cnidaria G2; Mollusca G2, G3, and G4; Annelida G2 and G3; and platyhelminthan IRF genes comprise the IRF4-related group. The control vertebrate proteins in all DBD trees also form two separate clusters, corresponding to the vertebrate IRF1- and IRF4-SGs, with no invertebrate sequences located within these clusters. Together, these results suggest that both vertebrate IRF-SGs diverged prior to the separation of cnidarian and bilaterian lineages and that the diversification within those two SGs occurred only after the split of the protostomian and deuterostomian lineages.
Diversification of IRF genes into two SGs in cnidarians and bilaterians is further supported by the pattern of occurrence of IAD1, the signature domain of the IRF4-SG. Our data suggest that IAD1 occurs in proteins that cluster in branches close to the vertebrate IRF4-SG (Cnidaria G2; Mollusca G2, G3, and G4; Platyhelminthes), whereas it is absent in clusters branching close to IRF1-SG (Cnidaria G1, Mollusca G1). The phylogenetic tree of the IAD1 sequences indicates that the IAD1 domains of invertebrates are closely related to IAD1 of vertebrate IRF proteins and clearly distinct from the MH2 domain of vertebrate and cnidarian Smad proteins (fig. 4A). The Smad MH2 domain is the closest known relative of IRF IAD1 (Eroshkin and Mushegian 1999; Qin et al. 2003). The IAD1 domains of all vertebrate IRF genes form a separate cluster, supporting the conclusion from DBD analyses that the eight vertebrate IRF4-SG members diverged only after the split of protostomians and deuterostomians. This result is further supported by the tree of the sequences of full-length protostomian proteins (fig. 4B).
A survey of several prevertebrate deuterostomian genomes revealed 3 IRF genes in a hemichordate (S. kowalevskii), 2 in the sea urchin (Strongylocentrotus purpuratus), 11 in the cephalochordate amphioxus (Branchiostoma floridae), and 9–11 in tunicates (Ciona; fig. 1; supplementary tables S1 and S2, Supplementary Material online). Additional echinoderm and tunicate genes were found in EST databases. Phylogenetic relationships among these genes were analyzed both by character-based Bayesian inference (fig. 5) and by the distance-based NJ method (supplementary fig. S3, Supplementary Material online), and the resulting trees were in mutual agreement.
The phylogenetic tree of the DBDs of these IRFs indicates that some of these proteins are closely related to vertebrate IRFs (fig. 5A). The analysis also confirmed the ancient separation of the IRF1-SG from all other IRF genes. A single protein from each of the four species (SK1, SP1, BF4, and CI1 from the hemichordate, sea urchin, amphioxus, and tunicate, respectively) is clearly a member of the IRF1-SG. All the remaining IRFs either have an IAD1 domain or are closely related to IAD1-containing IRFs, suggesting that those lacking it lost the IAD1 motif recently. An interesting example is the sea squirt IRF-CI3, which has, in place of the IAD1 domain, a sequence similar to a highly charged region of the kazrin gene (data not shown).
The relationship between the IAD1-containing prevertebrate deuterostomian IRFs and the vertebrate IRF4-SG was also addressed. The phylogenetic tree of the DBDs (fig. 5A) shows that the vertebrate IRF4-SG is subdivided into two clusters (IRF3 + 5-G and IRF4-G) and that groups of invertebrate IRFs are positioned between these two clusters. One of these groups (Deuterostomia, IRF4-G-like DBD), which includes IRFs from hemichordate, echinoderm, cephalochordate, and tunicate species, is much more closely related to vertebrate IRF4-G than to IRF3 + 5-G. The other cephalochordate and tunicate proteins (BF1-2, BF7-11, CI2, and CI5-9), however, do not show a particularly close relationship to any vertebrate IRF protein. The phylogenetic analysis of the IAD1 domain provides a different perspective on IRF evolution (fig. 5B). Based on the IAD1 sequence, vertebrate IRF4-SG members form a single cluster separated from the other deuterostomian IRF genes. Apparently, the IRF DBD and IAD1 sequences were under different constraints during evolution. The simplest reconciliation of the DBD and IAD1 analyses is to assume that the IRF4-G and IRF3 + 5-G ancestors separated after the vertebrate lineage split from other deuterostomians. In that case, the DBD sequence of vertebrate IRF4-G proteins would have remained conserved and therefore highly similar to the ancestral sequence, whereas the DBD sequences of vertebrate IRF3 + 5-G would have diverged significantly (for further considerations, see Discussion).
Many features of the vertebrate genome have been explained by the suggestion that a 2-fold whole-genome duplication (2WGD) event occurred in early vertebrates after they separated from other deuterostomians (Kasahara 2007). This hypothesis is well supported and may explain the origin of the 10 vertebrate IRF genes. Comparing the amphioxus and vertebrate genes, Putnam et al. (2008) reconstructed the 17 original chordate linkage groups (CLGs) and showed that each of these groups corresponds to up to four gene regions in vertebrates. Using their tables and the known positions of the IRF genes in the human genome, we were able to trace the 10 human IRF genes (including the nonfunctional human IRF10) to four original CLGs (fig. 6). Importantly, each linkage group is related to one of the four groups of vertebrate IRFs (CLG3 to IRF4-G, CLG6 to IRF1-G, CLG13 to IRF5-G, and CLG14 to IRF3-G). This arrangement suggests that prevertebrate chordates had one ancestral IRF gene for the IRF1-SG and three ancestral IRF genes for the IRF4-SG, and it implicates 2WGD as the mechanism that generated the 10 vertebrate genes from these four predecessors. The amphioxus genome, therefore, clearly preserves the general gene arrangement of the basal prevertebrate chordate before 2WGD. However, the 11 distinct IRF genes in the amphioxus genome and their location in eight different CLGs also suggest extensive independent evolution. Moreover, only two of the amphioxus IRF genes (IRF-BF3 and BF4) appear to be directly related to vertebrate IRF genes. IRF-BF4, located in CLG6, to which human IRF1 can be traced, is very likely directly linked to the predecessor of the vertebrate IRF1-G genes. IRF-BF3 is the closest relative of the IRF4-G genes and is located, together with three other amphioxus IRF genes, in CLG3, from which the IRF4-G originated.
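The bookkeeping behind this argument can be made explicit. Under the simplifying assumption that two rounds of whole-genome duplication turn each ancestral gene into at most four paralogs, and ignoring later tandem duplications or retrotransposition, the sketch below tallies the human IRF groups against that ceiling. The group membership and CLG assignments are taken from the text; the dictionary layout and function name are hypothetical.

```python
# Group membership and chordate linkage groups (CLGs) as given in the text;
# IRF10 is nonfunctional in humans but still traceable to its CLG.
HUMAN_IRF_GROUPS = {
    "IRF1-G": {"clg": "CLG6", "members": ["IRF1", "IRF2"]},
    "IRF3-G": {"clg": "CLG14", "members": ["IRF3", "IRF7"]},
    "IRF4-G": {"clg": "CLG3", "members": ["IRF4", "IRF8", "IRF9", "IRF10"]},
    "IRF5-G": {"clg": "CLG13", "members": ["IRF5", "IRF6"]},
}


def wgd_tally(groups, rounds=2):
    """Compare retained paralogs per ancestral gene with the 2**rounds ceiling."""
    ceiling = 2 ** rounds  # one gene -> 2 copies after round 1 -> 4 after round 2
    per_group = {}
    for name, info in groups.items():
        retained = len(info["members"])
        if retained > ceiling:
            raise ValueError(f"{name}: {retained} paralogs exceed the ceiling of {ceiling}")
        per_group[name] = {"clg": info["clg"], "retained": retained,
                           "lost": ceiling - retained}
    return {
        "ancestors": len(groups),
        "possible": ceiling * len(groups),
        "observed": sum(g["retained"] for g in per_group.values()),
        "per_group": per_group,
    }


summary = wgd_tally(HUMAN_IRF_GROUPS)
print(summary["ancestors"], summary["observed"], summary["possible"])  # 4 10 16
```

Four ancestral genes could have yielded up to 16 paralogs; 10 are traceable in humans, so six copies would have been lost after duplication, and only the IRF4-G lineage retains all four. A count like this is consistent with 2WGD but cannot by itself exclude other duplication scenarios; that argument rests on the linkage-group evidence.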
Phylogenetic analysis suggests that the evolutionary origin of the IRF family is temporally associated with the appearance of multicellular animals. Although IRF genes were not found outside the taxon Metazoa, they are present in all five principal metazoan groups. Sponges, considered by many to exemplify the most ancient metazoan design, have the oldest geological record (635 million years [My]), represent a basal metazoan branch in most phylogenetic trees, and are separated from vertebrates by more than 552 My of independent evolution (Benton and Donoghue 2007; Love et al. 2009; Schierwater et al. 2009). The IRF genes of sponges already have all the major features that define the IRF family in vertebrates. It is therefore plausible that the two conserved domains, the IRF DBD and IAD1, evolved from an unidentified winged helix-turn-helix sequence and a forkhead-associated-like predecessor, respectively, and were combined in a single protein during the transition from single-celled to multicellular animals. To substantiate this hypothesis further, it will be important to determine the full-length IRF sequences of other basal metazoans, especially those of the Ctenophora, another group that has been proposed as the basal metazoan branch (Dunn et al. 2008).
Additional new families of transcription factors appear to have emerged before the divergence of living metazoans. This was recognized by comparing the transcription factors encoded in the genomes of the single-celled choanoflagellate Monosiga and the sponge A. queenslandica, as well as several other metazoans (King et al. 2008; Larroux et al. 2008). Metazoan-specific transcription factors include the Paired and POU homeobox factors, T-box, ETS, WNT, SMAD, Rel/NF-κB, and nuclear hormone receptors (Gauthier and Degnan 2008; King et al. 2008). It has been suggested that these transcription factors evolved to function in embryogenesis, a new process specific to multicellular organisms that requires cooperative interactions between cells (Larroux et al. 2008). Metazoan immunity also involves intricate cellular cooperation. Importantly, many families of metazoan-specific transcription factors that function in embryogenesis, such as Rel/NF-κB, ETS, and SMAD, also have central roles in immunity and are known to interact directly with IRFs (Li et al. 2000; López-Rovira et al. 2000; Marecki and Fenton 2000). Therefore, it is likely that the increased demand for more complex regulation of both processes, immunity and embryogenesis, led to the evolution of new families of transcription factors in multicellular organisms.
Variable numbers of IRF family members in different metazoan species suggest a complicated evolutionary history of the family, marked by several periods of expansion and contraction. In sponges (Porifera), placozoans, and comb jellies (Ctenophora), the number and diversity of IRF family members are limited. Only two IRF family members are present in the sponge A. queenslandica and the placozoan T. adhaerens, whose genomes have been sequenced, and two IRF family members have been detected in EST sequences of a comb jelly. The IRF genes of poriferans, placozoans, and ctenophores do not form lineages that can be unambiguously linked to the vertebrate IRF1- and IRF4-SGs. Other transcription factor families in sponges also have a limited number of members (Larroux et al. 2008). The IRF family expanded in Cnidaria, as has also been shown for other transcription factor families (Larroux et al. 2008). The IRF family diverged into the IRF1 and IRF4 branches early in evolution, likely before the separation of the cnidarian and bilaterian lineages. Proteins of the IRF1 branch are unlikely to possess an IAD1 domain, because no IAD1-encoding sequence was found juxtaposed to their IRF DBD-encoding sequence in several cnidarian and bilaterian genomes. It is not clear when the IAD2 domain appeared, owing to the poor conservation of this domain and the lack of full-length sequences for most invertebrate IRF genes.
The IRF family evolved distinctly in different taxonomic groups within Bilateria. The family apparently underwent independent expansions in several taxa, including Mollusca, Cephalochordata, Tunicata, and Vertebrata. In each case, it was predominantly the members of the IRF4 branch that increased in number and diversified. In a few instances, IRF4-like proteins lost the IAD1 domain, or IAD1 was replaced with a different sequence. In other bilaterians, entire IRF genes were deleted or their sequences diversified beyond recognition. In this way, IRF1 genes were likely lost in Platyhelminthes, and in the two most extreme cases (insects and nematodes), the entire IRF family was lost. The loss of all members of the family may be related to the unusually high evolutionary rate in these animals. The high level of protein sequence divergence in insects suggests that they are evolving up to three times faster than vertebrates, and this elevated rate is associated with high rates of ortholog loss (Wyder et al. 2007). Although fewer genomic data are available for nematodes, the remarkable genome-level divergence among three Caenorhabditis species and the large number of extant nematode species suggest a high rate of evolution similar to that of insects (Coghlan 2005; Thomas 2008).
Two deuterostomian species (a hemichordate and an echinoderm) have a single IRF1-SG gene and 1–2 IRF4-SG genes per genome, which may reflect the original state of IRF genes in basal deuterostomians. In more complex chordates, 8 to more than 10 IRF genes per genome are present. The expansion of the IRF family in the three divisions of chordates (cephalochordates, tunicates, and vertebrates) likely represents three separate evolutionary events. Apart from the apparently ancient division of the family into the IRF1- and IRF4-SG branches, IRF genes from cephalochordates, tunicates, and vertebrates are generally less similar to each other than IRFs within each group are. One exception (see Results) is that some deuterostomian IRF genes form a group whose members have a DBD highly similar to those of vertebrate IRF4, 8, 9, and 10, suggesting that the IRF4-G could have separated from the IRF3 and IRF5 groups early in deuterostomian evolution (fig. 5A). However, the tree constructed from the IAD1 domains of these IRF genes does not support this conjecture (fig. 5B). When two new paralogs are created, one possible outcome is that one copy retains the function of the original gene and remains more conserved, whereas the other is free to mutate and acquire new functions (Ohno 1970). It is likely that the higher similarity of vertebrate IRF4-G DBDs to certain deuterostomian IRFs reflects conservation of function rather than an early split of the IRF4-SG in Deuterostomia.
In conclusion, analysis of the IRF genes in prevertebrate metazoans indicates that a large number of family members evolved independently in many animal groups. Likewise, most of the IRF diversity seen in vertebrate species is vertebrate-specific and therefore likely evolved to regulate functions specific to vertebrates.
A number of mechanisms can lead to gene duplication and the creation of gene families, including retrotransposition, segmental duplication, and genome duplication (Babushok et al. 2007). Each of these mechanisms appears to have contributed to the evolution of the IRF family. The intronless IRF genes of the sea hare (supplementary files Alignments and Sequences, Supplementary Material online) apparently arose by a reverse transcriptase-mediated copying mechanism, and the four tandem, head-to-tail-oriented IRF genes in amphioxus (fig. 6) likely arose by segmental tandem duplication. Finally, the IRF genes expanded during early vertebrate evolution. The relative positions of the human and amphioxus genes indicate that the 10 vertebrate IRF genes evolved from four ancestral genes in the prevertebrate genome by 2WGD. Only two extant amphioxus IRF genes can be linked by both homology and genomic position to two of these four predicted predecessors (the founding members of vertebrate IRF1-G and IRF4-G). The other nine amphioxus genes are likely specific to cephalochordates. Indeed, the innate immune system of the amphioxus lineage appears to have undergone extensive lineage-specific changes, as these species contain many proteins with novel domain architectures found in neither echinoderms nor vertebrates (Zhang et al. 2008). The putative IRF3-G and IRF5-G predecessors that were not identified in the cephalochordate genome could have been lost or mutated beyond recognition or, most likely, evolved only after the split of the cephalochordate and vertebrate lineages.
The IRF family has many features in common with the Rel/NF-κB family of transcription factors (Takaoka et al. 2008). In vertebrates, the two families cooperate extensively and together represent the major signaling system employed in innate immunity (Hiscott 2007; Ghosh and Hayden 2008; Takaoka et al. 2008). Interestingly, there are also many similarities in the evolution of the two families (Takaoka et al. 2008). Both the Rel/NF-κB and IRF families evolved during the transition from unicellular to multicellular animals. The single Rel/NF-κB gene cloned from sponges encodes a protein that is architecturally similar to modern Rel/NF-κB proteins, containing an N-terminal Rel homology domain and C-terminal ankyrin repeats followed by a death domain (Gauthier and Degnan 2008). In cnidarians, the fates of the two families differ: although IRF genes expanded to about five members, only a single Rel/NF-κB gene was found in Anthozoa, and Hydrozoa may lack Rel/NF-κB altogether (Miller et al. 2007; Sullivan et al. 2007). The Rel/NF-κB family split into two branches in early bilaterians or before: one contains C-terminal ankyrin repeats and a death domain (NF-κB/Relish branch), and the other contains a C-terminal transactivation domain (Rel branch). The IRF family also split into two branches, the IRF1-SG and IRF4-SG, before the bilaterian stage (Larroux et al. 2008). Among protostomian species, both IRF and Rel/NF-κB genes are present in Mollusca and Chelicerata but absent from the nematode Caenorhabditis elegans (Montagnani et al. 2004; Wang et al. 2006). The absence of both major signaling systems in C. elegans, and possibly in other nematodes, suggests that innate immune signaling functions very differently in these animals. In contrast, innate immunity in insects depends heavily on Rel/NF-κB (Cherry and Silverman 2006).
It is possible that Rel/NF-κB partially substitutes for the functions of IRF genes in insects and that IRF genes substitute for some NF-κB functions in hydrozoans. In deuterostomians, Rel/NF-κB genes are present in echinoderms, tunicates, and cephalochordates (Yagi et al. 2003; Sodergren et al. 2006; Huang et al. 2008). Unlike the IRF family, the Rel/NF-κB family has only two members in these species (one from each branch), indicating that it did not expand in tunicates and cephalochordates. However, like the IRF family, the Rel/NF-κB family expanded in vertebrates, to five members. Although we were not able to trace the positions of these Rel/NF-κB members on human chromosomes to common ancient chordate clusters as we did for the IRF genes, the timing of the expansion points to 2WGD as the likely mechanism. As a result, vertebrates possess the most complex IRF–Rel/NF-κB networks of all metazoan species.
The evolutionary histories of both the IRF and Rel/NF-κB families involved multiple expansions and contractions. Both families are known to be critical for the ability of metazoans to detect and respond to external threats. Because the efficiency of defense responses affects organismal fitness, the complicated evolutionary history of both families likely reflects changing selection pressures. Studying their evolution may therefore contribute to our understanding of key factors that shaped metazoan evolution.
We would like to thank N. King and S. A. Nichols (University of California, Berkeley, CA) for providing EST clones of Oscarella carmela. This study was supported by the Public Health Service grants CA33192 and CA098151 from the National Cancer Institute. We also thank W. Bargmann, J. Sheely, R. Tiwari, and E. Ulug for careful reading of the manuscript.