Phylogenetic tree
Bayesian inference and maximum parsimony yielded nearly identical topologies, so we have presented the tree in which branch lengths are proportional to the amount of change occurring in each lineage (Fig. ). This tree is also available in a detailed, expandable, readable pdf document as supplementary material (Additional file
2). The Bayesian tree includes posterior probabilities at nodes, which reflect the proportion of trees sampled during the search that included each particular node. The sea anemone genome has not been manually curated, and therefore the data are subject to potential assembly errors until validated. However, the sea anemone, an anthozoan cnidarian, contributes an ancient lineage and, more importantly, a diploblast to the analysis as the only non-triploblastic species in our phylogenetic tree. The honeybee and fruitfly provide protostome insect CYP genomes for comparison, while the human CYPs provide a deuterostome for comparison to the branchiopod crustacean,
D. pulex.
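To make the meaning of the posterior probabilities at nodes concrete, the following minimal Python sketch computes the support for a clade as the fraction of sampled trees that contain it. The representation of each tree as a set of clades (each clade a frozenset of taxon labels), the function name `clade_support`, and the toy sample are illustrative assumptions; this is not the procedure or output of the software used for the analysis.

```python
from collections import Counter

def clade_support(sampled_trees):
    """Return {clade: fraction of sampled trees that contain it}.

    Each tree is represented (hypothetically) as a set of clades,
    and each clade as a frozenset of taxon labels.
    """
    counts = Counter()
    for tree in sampled_trees:
        counts.update(set(tree))  # count each clade once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}

# Toy posterior sample of four trees over taxa A-D (made-up data).
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
trees = [{AB, CD}, {AB, CD}, {AB}, {AC}]

support = clade_support(trees)
for clade, fraction in support.items():
    print(sorted(clade), fraction)
```

In this toy sample, a clade present in every sampled tree would receive a support of 1.00, which is how the values reported at the nodes of the tree should be read.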
In general, the tree shows six distinct monophyletic clades, all but one supported by posterior probabilities of 1.00. These include the mitochondrial, CYP2, CYP3, and CYP4 clans, and two deep branches that do not include any arthropod CYPs. The non-arthropod lineages would be exclusively deuterostome if not for the two anemone CYPs that show their closest relationships to CYP26. These anemone CYPs, nested within a vertebrate lineage, indicate that a CYP26-like ancestor may have existed in cnidarians but was lost in protostomes. Similar inferences concerning CYP loss in protostomes have already been made about CYP51, CYP20 and possibly CYP7. The history of CYP20 in protostomes may be complex since an ortholog has been detected in the annelid leech
Haementeria depressa (CN807321). In addition, CYP19 (aromatase) clusters with the CYP2 and mitochondrial clans, and CYP46 and related sea anemone CYPs are part of a sister clade to the CYP4 clan (Additional file
2).
Overall, the tree demonstrates that the four major clans found in insects (mitochondrial, CYP2, 3, 4) encompass all of the CYPs in
D. pulex. An Excel table of each of the CYP genes and pseudogenes found in the
Daphnia pulex genome, their nucleotide and amino acid sequences, and links to their scaffold position is available as supplementary material (Additional file
3).
The mitochondrial clan of
D. pulex contains six members in five families and five subfamilies. Three of the members are highly conserved Halloween genes involved in ecdysone synthesis [
19,
23]. Specifically,
disembodied (Cyp302A1;
dib),
shade (Cyp314A1,
shd), and
shadow (Cyp315A1;
sad) are all mitochondrial CYPs involved in the last three steps of 20-hydroxyecdysone (20-HE) synthesis. Cyp314A1 is the CYP required for the conversion of ecdysone to its active form, 20-HE [
19,
23]. The other three CYPs are divided into two new families CYP362 (Cyp362A1, Cyp362A2) and Cyp363A1, indicating that these CYPs are not as well conserved and probably have taken on new roles. A tree of the mitochondrial CYP clan is available as supplementary material (Additional file
4).
The CYP2 clan of
D. pulex contains 21 members, including one pseudogene (not shown in Fig. , ), separated into five distinct families (Cyp18, 306, 307, 364, 370) and six subfamilies. Two of the genes,
phantom (Cyp306A1; phm) and
spook (Cyp307A1;
spo) are conserved Halloween genes involved in the early stages of ecdysone synthesis [
19,
24]. The exact role of Cyp307A1 in ecdysone biosynthesis is unknown, but the enzyme appears to be part of the "Black box" of reactions involved in the earlier steps of ecdysone synthesis from cholesterol.
Daphnia pulex contains only one Cyp307 gene, similar to sequenced lepidopterans but different from
Drosophila, which uses two Cyp307 genes at different stages of its lifecycle [
24,
25]. Interestingly, the Cyp307 genes are a sister group to the non-arthropod CYP2 members (Fig. ).
The other three CYP2-clan families in
D. pulex (Cyp18, 364, 370) are divided into four subfamilies and contain 19 genes. Interestingly, a sister-group in the CYP2 clan contains no arthropod CYPs. This group primarily contains anemone CYPs, but also contains CYP17, CYP21, and the CYP1 family members inducible by chlorinated hydrocarbons in vertebrates [
26](Fig. ; also available as Additional file
5). The Cyp370 family is the largest of the CYP2-clan families, containing 15 members with 13 members in the 370A subfamily and 2 members in the 370B subfamily (Cyp370B1,2). There is also one pseudogene in the Cyp370A subfamily, Cyp370A3P.
The Daphnia CYP370 family is greatly expanded relative to the single gene CYP15 and CYP303 families in insects it most closely resembles. Cyp15A1 is a regio- and stereo-specific epoxidase critical in the formation of juvenile hormone III (JH III) from methyl farnesoate in the corpora allata of the cockroach [
27]. However, methyl farnesoate, a juvenile hormone precursor, is considered the major terpenoid hormone in crustaceans [
28]; therefore, a methyl farnesoate epoxidase is unnecessary and it is unlikely that the CYP370A and CYP370B subfamily members specifically perform this function. The role of the Cyp370 family in
Daphnia is currently unknown. Cyp18 and Cyp364 are both close relatives of Cyp306 (
phantom), suggesting potential involvement in ecdysone synthesis or catabolism (Fig. ). The Cyp364 family is a new family that contains three genes (Cyp364A1,2,3). Cyp18, which is also found in insects, is induced by 20-HE in
Drosophila [
29].
The CYP3 clan consists of numerous CYPs involved in detoxification of xenobiotics and endobiotics [
30-
33]. Some CYP3 clan members are inducible by hormones such as progesterone [
34] and ecdysone [
17], and are responsible for the metabolism and elimination of steroid hormones in vertebrates [
20,
35,
36]. Although the posterior probability at the base of the CYP3 clan is low (0.54), this only reflects the uncertainty of the position of the first two
Drosophila lineages at the base of the CYP3 clan. The CYP3 clade is strongly supported when
Drosophila is not included in the analysis (1.00). In
Daphnia pulex, the CYP3 clan contains 12 genes and one pseudogene, arranged into two new families (Cyp360, Cyp361) and three subfamilies. Eleven of these thirteen genes are in the Cyp360A subfamily leaving just Cyp361A1 and Cyp361B1 outside this subfamily in the Cyp3 clan of
D. pulex. The closest relatives of the Cyp360 family in the tree are the Cyp6 and Cyp9 family members of insects involved in endobiotic and xenobiotic metabolism and detoxification. Similarly, the closest relatives of the two Cyp361 family members are the anemone CYP3-like group, and the human CYP3A and CYP5A subfamily members involved in detoxification and thromboxane A
2 biosynthesis. In general, the CYP3 clan in insect species has more CYPs than the
Daphnia pulex CYP3 clan. Most CYP families in insects have only a few members, but the CYP6AS subfamily has 18 members, 37.5% of the honeybee P450s, and there are 35 members of the CYP3 clan distributed between seven families making up 42% of the
D. melanogaster P450s. The CYP3 clan in insects, and in particular the CYP6AS subfamily in honeybee, was recruited for major gene expansion, as were the CYP360 and CYP370 families in
D. pulex, and to a lesser degree the CYP4C subfamily. A tree of the CYP3 clan is available as an additional file (Additional file
6).
The CYP4 clan, which is the sister-group to a clade containing the CYP3 clan plus one of the two "non-arthropod" clans, consists of 38 members all in the same family (Cyp4) and arranged into five subfamilies (Cyp4C, Cyp4AN, Cyp4AP, Cyp4BX, Cyp4BY) with 4–10 members in each subfamily (Additional file
7). There is also a pseudogene in the Cyp4C subfamily. Two members of the Cyp4 family were not observed in the
Daphnia pulex v1.1 draft genome sequence assembly (September 2006), but were cloned by degenerate PCR in a previous study in which nine CYP4 members were partially cloned [
15]. The two absent CYPs, Cyp4C32 (95% identical to 4C34v1; 89% to 4C34v2 from the
Daphnia pulex genome) and Cyp4AN1 (92% to 4AN2v1; 96% identical to CYP4AN2v2 from the
Daphnia pulex genome), are available on GenBank (
BQ703381 and
BQ703379, respectively). The
D. pulex genome sequence coverage was 8.7×; therefore, our inability to find these two CYPs may be due to the known gaps in the genome assembly. It is also possible that these two genes were deleted from the Daphnia Genomics Consortium's chosen parthenogenetic
D. pulex or "chosen one" due to strain differences.
The Cyp4 members are considered the least studied of the CYP clans in insects [
22] and are involved in fatty acid metabolism, including inflammatory arachidonic acid metabolites, and xenobiotic metabolism in mammals [
37]. Some CYP4 members may be involved in sensory perception in insects as they are found in the antenna [
38]. A Cyp4c member is also involved in the biosynthesis of juvenile hormone, and another is inducible by hypertrehalosemic hormone, a key hormone in arthropod carbohydrate metabolism [
39]. Furthermore, some Cyp4 members are down-regulated by ecdysteroids [
40], indicating that Cyp4 members may play a key role in sensory and hormonal functions in
D. pulex.
Members of the CYP3 clan primarily, but also some CYP4 and mitochondrial clan members, have been associated with resistance to pesticides [
17,
30,
41-
44]. Several Cyp3 clan members, such as Cyp6g1 and Cyp6a5, are associated with DDT and pyrethroid resistance, respectively [
41,
44]. Cyp4D10 and other CYP4 members in
Drosophila are inducible by plant alkaloids and may be important in plant host interactions [
45]. In
D. pulex, differential expression of two Cyp4 genes is associated with resistance to tannic acid and leaf litter [
15]. Cyp4C32 expression is much higher in ecotype 1, and Cyp4AP1 shows much higher levels of transcription in ecotype 2, which is exposed to high amounts of leaf litter and polyphenols and is in turn resistant to toxic leaf litters [
15]. Interestingly, the CYPs that show differential expression based on ecotype and leaf litter exposure are the two CYPs that are not found within the
D. pulex v1.1 genome sequence assembly. The complete sequence of these two CYPs is not available [
15].
Expansion of the CYP2 and CYP4 clans
A comparison of the number of genes in each of the CYP clans from
D. pulex, silkmoth (
Bombyx mori), human (
Homo sapiens), pufferfish (Fugu rubripes), honeybee (
Apis mellifera), sea urchin (
Strongylocentrotus purpuratus), and fruitfly (
Drosophila melanogaster) indicates both subtle and pronounced differences in CYP clan numbers between species (Fig. ). In general, the protostomes (fruitfly, honeybee,
D. pulex) have significantly fewer CYP2 clan members than deuterostomes, indicating an expansion of the CYP2 clan in deuterostomes (Fig. ). Of the protostomes investigated,
D. pulex has the greatest percentage of CYP2 clan members. The CYP2 clan encompasses approximately 5.5–10% of the total CYPs in most insects with honeybee the exception at 17.4% of its CYPs in the CYP2 clan [
22]. In contrast,
D. pulex has 21 CYPs in the CYP2 clan, nearly double that of any sequenced insect (11/132 in
Aedes aegypti), and
D. pulex's CYP2 clan members encompass 26.9% of its CYPs, indicating a significant expansion of this clan relative to the insects. No other crustacean has been sequenced, so whether the expansion of the CYP2 clan is typical of crustaceans is unknown.
The CYP4 clan is also slightly expanded in
D. pulex relative to the insects. There are 38 CYP4 members that encompass 49% of the CYPs in the genome. The CYP4 clan varies from 8.6–42% of the CYP genome in sequenced insects [
22]. Excluding the honeybee, which has only 4 CYP4 members, CYP4 members account for 30.7–42% of the total CYPs in the remaining insects. Relative increases in CYP2 and CYP4 members in
D. pulex leave a relative reduction in the CYP3 clan compared to insects. Only 16.7% (13/78) of the members of the
D. pulex CYP genome are CYP3 clan members, whereas 38–61% (28–76) of the insect CYPs are CYP3 clan members.
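The clan percentages quoted above follow directly from the member counts given in the text (6 mitochondrial, 21 CYP2, 13 CYP3 and 38 CYP4 members, summing to the 78 used as the denominator). The short Python sketch below reproduces that arithmetic; the dictionary layout and the function name `clan_percentages` are illustrative choices, not part of the original analysis.

```python
# Member counts per clan as given in the text.
clan_counts = {"mitochondrial": 6, "CYP2": 21, "CYP3": 13, "CYP4": 38}

def clan_percentages(counts):
    """Return each clan's share of the total CYP complement, in percent."""
    total = sum(counts.values())
    return {clan: round(100 * n / total, 1) for clan, n in counts.items()}

total_cyps = sum(clan_counts.values())
shares = clan_percentages(clan_counts)
for clan, pct in shares.items():
    print(f"{clan}: {clan_counts[clan]}/{total_cyps} = {pct}%")
```

Rounded to one decimal place, the CYP4 share is 48.7%, which the text reports as 49%.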
There are several CYP2-clan members similar in structure to other ecdysone metabolizers (CYP18, CYP364 members). However, the formation of juvenile hormone III from methyl farnesoate by CYP15A1 is unnecessary in crustaceans and
D. pulex in turn lacks the CYP15A1 gene. In addition, D. pulex lacks the CYP303 family members with unknown but putative external sensory development function [
5]. Nevertheless, the CYP370 family, which is phylogenetically related to the CYP15 and CYP303 families, has expanded dramatically in
D. pulex, and this family lacks a specific enzyme with a known function. Based on our current knowledge of CYPs, the expansion of the CYP370 family is probably necessary for responses to environmental stressors such as toxicants, and/or other growth or behavioral stimulators such as plant alkaloid toxins.
Tandem repeat regions
Tandem duplicates are genes that are within intron distances of each other, nearly identical (95+% identity), and may harbor some interesting biology. Gene expansion by tandem duplication is common in P450 evolution but the basis for recruitment of a founder gene is not understood. There are several CYP tandem repeat regions in the
D. pulex genome (Fig. ). Overall, 45 of the 77 (58%) CYP genes are located within tandem repeats. Twenty-eight of the 37 (76%) CYP4 family members, including Cyp4C53, are found in tandem repeat regions, which explains in part the expansion of the CYP4 clan in
D. pulex. In contrast, the CYP3 clan has 69% (9/13) of its genes found in tandem repeat regions, including 9/11 of the Cyp360 family members. However, the CYP3 clan is small relative to the insect genomes; thus, presence in tandem repeats is not correlated with the size of the clan. Lastly, none of the mitochondrial CYPs, many of which have specific functions in ecdysone synthesis [
19], are members of a tandem repeat region (Fig. ). Expansions of conserved endogenous functions are rare and may be selected against. Conversely, tandem duplications may indicate diversity in exogenous xenobiotic substrates [
3].
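The two criteria in the working definition of a tandem duplicate (physical proximity and at least 95% identity) can be sketched as a simple filter over neighbouring genes. The Python example below is a hypothetical illustration only: the `Gene` record, the `tandem_pairs` function, the 5000 bp `max_gap` standing in for "intron distance", and all coordinates and identity values are assumptions, not the annotation procedure used for the D. pulex genome.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    scaffold: str
    start: int
    end: int

def tandem_pairs(genes, identity, max_gap=5000, min_identity=0.95):
    """Return adjacent gene pairs that satisfy both tandem-duplicate criteria.

    identity: dict mapping frozenset({name1, name2}) -> fractional identity.
    max_gap: hypothetical stand-in for "intron distance", in base pairs.
    """
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.scaffold != right.scaffold:
            continue  # neighbours must share a scaffold
        gap = right.start - left.end
        ident = identity.get(frozenset({left.name, right.name}), 0.0)
        if gap <= max_gap and ident >= min_identity:
            pairs.append((left.name, right.name))
    return pairs

# Made-up coordinates and identities, for illustration only.
genes = [
    Gene("geneA", "scaffold_1", 1000, 3000),
    Gene("geneB", "scaffold_1", 3500, 5500),
    Gene("geneC", "scaffold_1", 90000, 92000),
    Gene("geneD", "scaffold_2", 100, 2100),
]
identity = {
    frozenset({"geneA", "geneB"}): 0.97,
    frozenset({"geneB", "geneC"}): 0.96,
}
result = tandem_pairs(genes, identity)
print(result)
```

Here only geneA and geneB qualify: geneB and geneC are similar enough but too far apart, and geneD lies on a different scaffold.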
The CYP2 clan is highly expanded relative to the insects, but only 9 of the 21 CYP2 (43%) clan genes are members of a tandem repeat region. Initially, we thought this indicated that tandem repeat regions had little to do with the expansion of the CYP2 clan in D. pulex. However, all nine of the tandemly repeated CYP2 clan members are in the CYP370 family. It is interesting that the rest of the CYP2 clan members are not found in tandem repeat regions, and it is tempting to speculate that most D. pulex CYP2 clan members, just as many of the mitochondrial CYPs, have highly specific functions such as ecdysone biosynthesis. Needs for P450s are often met by expansion via tandem duplication leading to gene clusters. This suggests that the CYP370 family expanded while under selective pressure. Which genes expand may be independent of clan or family membership and depend on substrate specificity required to cope with a new xenobiotic stress. The C. elegans genome has almost half of its P450s in the CYP2 clan, yet it only has one mitochondrial clan member, CYP44. Insects have expanded the CYP3 clan into the large CYP6 and CYP9 families and several spinoff families. Deuterostomes have expanded CYP2 extensively, while Trichoplax adhaerens has only one CYP2 clan member (e_gw1.8.275.1|Triad1 at JGI).
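For a side-by-side view, the per-clan tandem repeat fractions reported in the text can be recomputed from the stated counts (28 of 37 CYP4, 9 of 13 CYP3, 9 of 21 CYP2, and none of the six mitochondrial CYPs). The Python sketch below does only this arithmetic; the data layout and the function name `tandem_percent` are illustrative assumptions.

```python
# (genes in tandem repeats, genes considered) per clan, as reported in the text.
tandem = {
    "CYP4": (28, 37),
    "CYP3": (9, 13),
    "CYP2": (9, 21),
    "mitochondrial": (0, 6),
}

def tandem_percent(in_tandem, total):
    """Percentage of genes located in tandem repeats, rounded to an integer."""
    return round(100 * in_tandem / total)

percents = {clan: tandem_percent(*pair) for clan, pair in tandem.items()}
for clan, pct in percents.items():
    print(f"{clan}: {tandem[clan][0]}/{tandem[clan][1]} = {pct}%")
```

The comparison shows that the fraction of a clan found in tandem repeats does not track the size of the clan: the small CYP3 clan has a higher fraction than the expanded CYP2 clan.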
Scaffold 4 is especially rich in tandem repeat regions. It contains 26 of the 44 (59%) CYP genes located in tandem repeats, and 19 of these genes are CYP4 family members. Scaffold 4 contains tandem repeats for the CYP4AN, 4AP, 4BX, and 4BY subfamilies. Interestingly, a Cyp4BY subfamily member (Cyp4BY5) is located in the middle of the Cyp4AN subfamily tandem repeat region. This is unusual, as most of the genes adjacent to each other in a tandem repeat belong to the same subfamily. As the genes diversify, a single gene cluster may contain multiple subfamilies, as in the CYP2ABFGST and the CYP4ABXZ gene clusters in mammals [
46]. An error could have been made in the annotation of this CYP; however, re-examination of the Cyp4BY5 gene provided no evidence of this, and this CYP firmly fits in the Cyp4BY subfamily based on identity and phylogenetic status (Fig. ).