Looking for ERG orthologs in C. elegans and D. melanogaster
In this section we present our search for ERG gene orthologs in C. elegans and D. melanogaster, following their order in the sterol synthesis cascade (as shown in ). We took advantage of the fact that the full genomic sequences of these two animals are available. We used BLASTp and considered the best reciprocal hits to be orthologs. The results are summarized in and .
Details of the BLAST analysis that allowed the detection of ERG orthologs in C. elegans and D. melanogaster.
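The reciprocal-best-hit criterion used in this search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hits are assumed to have been parsed already into (query, subject, bit score) tuples, the function names `best_hits` and `reciprocal_best_hits` are hypothetical, and the identifiers and scores in the toy data are invented placeholders, not results from this study.

```python
from typing import Dict, Iterable, List, Tuple

# A hit is (query_id, subject_id, bit_score); assumed to be pre-parsed BLASTp output.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to the subject with the highest bit score."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward)   # e.g. yeast proteins searched against the animal proteome
    rev = best_hits(reverse)   # the animal hits searched back against yeast
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Toy data with placeholder identifiers and invented scores.
forward = [("yA", "wX", 210.0), ("yA", "wY", 95.0), ("yB", "wZ", 48.0)]
reverse = [("wX", "yA", 205.0), ("wZ", "yC", 300.0), ("wZ", "yB", 50.0)]

print(reciprocal_best_hits(forward, reverse))
```

In the toy data, `wZ` is the best forward hit of `yB`, but its own best hit is `yC`, so the pair is rejected. This mirrors the situation described for candidates whose reverse BLAST favours Coq6 or Coq5 over the ERG protein.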
We failed to detect a homolog of squalene synthase (ERG9p), which catalyzes the first committed step in cholesterogenesis. In the search for squalene epoxidase (ERG1p), a BLAST with yeast ERG1p detected the gene CBG19254 in C. briggsae with a marginal score (S = 48 bits, E-value ). This protein, which also exists in C. elegans, contains two functional domains: monooxygenase and UbiH (2-polyprenyl-6-methoxyphenol hydroxylase and FAD-dependent oxidase). This is also the case for yeast ERG1p. However, in a reverse BLAST, CBG19254 recognized yeast Coq6 with a very high score and ERG1p only marginally (S ). A similar behavior was observed for GA20231-RA from D. pseudoobscura, so we did not analyze these sequences further.
For lanosterol synthase (ERG7p), which follows in the classical pathway, we could not gather any convincing evidence for the existence of orthologs in the fruitfly and the nematode.
The next enzyme in the pathway is the sterol 14α-demethylase ERG11p/Cyp51, which is involved in the biosynthesis of cholesterol, phytosterols and ergosterol. It is thus the only cytochrome P450 with an ortholog common to animals, plants and fungi. BLAST hits with yeast ERG11p/Cyp51 were obtained in Drosophila (CG2397, CG10247 and other Cyps). However, the Drosophila genes are likely to belong to other Cyp subfamilies (not Cyp51). Cyp51 is probably missing, in agreement with the results of Tijet, Helvig & Feyereisen, who analyzed 90 sequences of the cytochrome P450 gene superfamily. Cyp51 is also absent in C. elegans.
ERG24 orthologs in D. melanogaster (CG17952) and in C. elegans (B0250.9) were easily found. The corresponding proteins contain the ERG4-24 domain. D. melanogaster
produces three isoforms that are longer than the yeast ortholog, a peculiarity that they share with the human ortholog. The ortholog of ERG24
in mammals encodes the Lamin B receptor (LBR), a nuclear envelope protein first described in vertebrates. LBR bears extensive structural similarities with the members of the sterol reductase family (ERG24p and ERG4p). Human LBR (hLBR) cannot restore ergosterol biosynthesis in an ERG4
yeast mutant, whereas it is able to restore ergosterol prototrophy in an ERG24
mutant. This strongly suggests that hLBR is a sterol C14-reductase 
. Not surprisingly, a mutation in the hLBR gene causes an autosomal recessive disease called hydrops-ectopic calcification-‘moth-eaten’ (HEM). This mutation leads to high levels of cholesta-8,14-dien-3-beta-ol in cultured skin fibroblasts, which is compatible with a deficiency of the cholesterol biosynthetic enzyme 3-beta-hydroxysterol delta(14)-reductase.
The hLBR contains two major domains: a highly charged N-terminal segment of ~220 amino acids and a hydrophobic C-terminal half with eight putative transmembrane segments
. Interestingly, it has been hypothesized that the region encoding the N-terminal domain of the LBR
gene arose from an ancestral gene coding for a soluble nuclear protein (which provides a nuclear localization signal) and that the rest of the protein evolved from another gene, similar to yeast ERG24
. Indeed, the C-terminal hydrophobic domain of LBR can be retained in the endoplasmic reticulum when expressed in transfected cells, as expected for the ortholog of ERG24
in mammals. In turn, the N-terminal domain is transported to the nucleus 
. This domain might be responsible for the targeting of the hydrophobic domain to the nucleus and for the interaction with lamin B 
. Cholesterol synthesis is currently thought to occur in the smooth ER. Since the N-terminal domain of LBR is responsible for its nuclear localization, it would be interesting to investigate whether the LBR transcript, or protein, undergoes some processing that produces a C-terminal domain sorted to the ER.
Recent functional studies show that the Drosophila
CG17952 gene is the ortholog of vertebrate LBR 
. The protein encoded by CG17952 shares some properties with hLBR. The Drosophila
LBR (dLBR) possesses a highly charged N-terminal domain of 307 amino acids followed by eight transmembrane segments. Transmembrane segments 1–6 are similar in length and position to the transmembrane domains 1–6 of hLBR. However, the putative membrane domains 7 and 8 of dLBR are shorter than those of hLBR. Thus, dLBR is expected to have a topological organization similar to that of its vertebrate orthologs 
. dLBR is able to bind to the Drosophila
lamin B, a function residing in the N-terminal domain. Not unexpectedly, dLBR does not display sterol C14 reductase activity when expressed in the yeast ERG24
mutant. This shows that, during insect evolution, although the enzymatic activity of this protein has been lost, its capacity to bind lamin B has not. However, depletion of dLBR by RNA interference does not lead to any obvious effect on nuclear architecture, or viability, of treated cells and embryos. Thus, although dLBR might be important, it is not a limiting component of the nuclear architecture in Drosophila
cells, at least during the first days of development 
. Our BLAST search indicates that sequence B0250.9 is the potential ERG24 ortholog in C. elegans. It would be interesting to assess experimentally whether it has retained LBR activity.
Sequences homologous to ERG25p (C-4 methyloxidase) were also easily found in both D. melanogaster (CG1998/dERG25A and CG11162/dERG25B) and C. elegans (F49E12.9/ERG25A and F49E12.10/ERG25B) using the sequence of yeast ERG25p as a starting point. In both organisms the duplicated copies of ERG25 are located on the same chromosome (chromosome II for C. elegans and chromosome X for D. melanogaster). The two paralogs in D. melanogaster are separated by 0.25 Mb and contain different numbers of predicted exons: CG1998 contains 6 exons, while CG11162 contains only 2. However, the last intron of both genes interrupts the coding sequence at very similar positions (i.e. between the second and the third positions of a Lys codon, ). In C. elegans the paralogs lie less than 3 kb apart. F49E12.9 contains 8 exons, while F49E12.10 has 5 exons, which may have been produced through exon fusion/splitting events.
Segments of the Drosophila paralogs CG1998 and CG11162 (homologs of Erg25) and the corresponding conceptual translations.
ERG26p (C-3 dehydrogenase) belongs to the 3β-hydroxysteroid dehydrogenase family. BLAST searches starting from the yeast sequence gave convincing evidence for the presence of orthologs in C. elegans (ZC8.1) and D. melanogaster (CG7724) (see details in ). In the case of Drosophila CG7724, similarity with yeast ERG26p extends over the first 250 amino acids, while the remaining amino acids are more divergent. Interestingly, in a BLAST with the Drosophila sequence, the first hit in C. elegans was C32D5.12, which was not the best hit with the yeast, A. thaliana or even human (NSDHL) orthologs. However, a BLAST with C32D5.12 detected NSDHL in human and ERG26 in yeast as the first hits (). Thus, it is tempting to invoke some kind of sequence convergence as an explanation for this behavior.
In agreement with Breitling et al., we failed to find any clear homologue of ERG27p (C-3 ketoreductase) in either Drosophila or C. elegans, although several oxidoreductases were detected.
As outlined above, ERG28p might tether many other ERG proteins to the ER. The ERG28p ortholog of C. elegans (C14C10.6) was barely detectable by BLAST starting with yeast sequences. This precludes the use of standard phylogenetic methods to show orthology. However, further evidence of sequence relatedness was gathered using Psi-BLAST with the yeast sequence against the Metazoa division of GenBank. We included in the iterations the very divergent sequence CG17270 of D. melanogaster, which is the ortholog of ERG28p according to our previous results. This allowed us to detect C14C10.6 as a significant hit (S = 120 bits and E ). Reverse Psi-BLAST also suggested orthology (i.e. significant scores). The sequence of C14C10.6 is so divergent that it had no match in the conserved domain database (CDD). Moreover, we computed the hydrophobic profiles for some of the orthologs and calculated the Pearson correlation coefficient “R” for various pairs of profiles. Significant R-values were obtained for the pair-wise comparisons (). This is not a proof of orthology but strengthens the idea of a structural relationship at the protein level. Finally, we found that all proteins had similar lengths and were basic, with isoelectric points (pI) >8.5.
Hydrophobic profiles of several potential ERG orthologous proteins.
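The comparison of hydrophobic profiles by a Pearson coefficient can be illustrated with a short sketch. The text does not state which hydropathy scale, window size or length-matching rule was used, so everything below is an assumption: the Kyte-Doolittle scale, a sliding window of 9 residues, and truncation of both profiles to the shorter length. The function names and the two example sequences are hypothetical.

```python
from math import sqrt
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale; the text does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Sliding-window mean hydropathy along a protein sequence."""
    values = [KD[aa] for aa in seq.upper()]
    if len(values) < window:
        raise ValueError("sequence shorter than window")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson R over the common length of two profiles (truncation is an assumption)."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("flat profile: correlation undefined")
    return sxy / sqrt(sxx * syy)


# Two made-up sequences, for illustration only.
p1 = hydropathy_profile("MKTLLVAGIFLLAASRKDEEGS", window=5)
p2 = hydropathy_profile("MRSLIVAGLFILAATKKDDEGT", window=5)
print(round(pearson_r(p1, p2), 3))
```

Profiles of proteins with different lengths would in practice need alignment or resampling before comparison; plain truncation is used here only to keep the sketch short.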
The ortholog of ERG6p in C. elegans (H14E04.1) was easily detected by BLAST with protein sequences from either yeast or A. thaliana (SMT1). We used the plant sequence since no clear ERG6 homolog could be detected in human. The situation in Drosophila turned out to be more complex: when starting the search with the yeast sequence, we detected CG8067 only marginally (E>1), and the result was even worse when starting with the sequence of A. thaliana. However, when using the C. elegans sequence as a starting point, the first significant hit in Drosophila was CG2453, which proved to be the ortholog of yeast Coq5, but not of ERG6p. Finally, considering i) the similar lengths of the marginally detected CG8067, of H14E04.1 and of SMT1 proteins and ii) their similar pI, we preferred CG8067 as the most likely ortholog. Indeed, the hydrophobic profiles of SMT1 and the protein encoded by CG8067 displayed a strong correlation: we obtained R = 0.43 with a p-value of 10⁻¹⁶.
The ortholog of ERG2p in C. elegans (W08F4.3) was marginally detected by BLAST with the yeast and the human protein (opioid sigma-1 receptor, OPRS1). However, W08F4.3 was found to contain a sigma1-receptor domain when compared with the CDD. This strengthens the idea that this gene is the ortholog of OPRS1. The situation in Drosophila was more complex. When starting our BLAST with either yeast or human sequences, we detected the sequence HDC14735 (DAA04220) very marginally (E = 1) and no conserved domain was found. However, when starting with the C. elegans sequence, it came as the best hit with E = 0.002 (the reverse was also true, with E = 0.003). Although not significant, this result was taken as suggestive of similarity. The results were improved using Psi-BLAST. Again, we computed the hydrophobic profiles for the various potential orthologs and found strong correlations (). Finally, the pI of the protein encoded by HDC14735 (pI = 6.14) was comparable to those of OPRS1 (5.61) and ERG2 (5.54) (see below).
In BLAST searches with yeast ERG3p, we again detected CG1998 and CG11162 in D. melanogaster and F49E12.9 and F49E12.10 in C. elegans, but with lower scores than in BLASTs with ERG25 (S~55 bits versus 85 bits, respectively). These hits are therefore better explained as ERG25 homologs, and we propose that the orthologs of ERG3 are potentially missing in both organisms.
For ERG5p, which belongs to the large Cyp protein family, no clear orthologs could be established. However, three potential candidates were found: CG4321-PA (Cyp4d8), CG3540-PA (Cyp4d14) and CG8859-PA (Cyp6g2). In the reverse BLAST, they all matched ERG5p as the best scoring hit in yeast.
Finally, the search for ERG4p orthologs led to the same ERG24 orthologs in the nematode and the fruitfly. Moreover, with ERG4p the BLAST scores were worse than with ERG24p. Thus, either ERG4 orthologs are missing or they have been replaced by ERG24.
Analyzing expression-profiling data to explore the potential function of the divergent ERG orthologs
Co-expression is indicative of i) physical interaction between proteins and ii) membership of the same complex or molecular process. This is called the paradigm of “guilt by association”. Thus, we used published microarray expression data, downloadable from the Gene Expression Omnibus of the NCBI, to investigate potential co-expression patterns of the ERG orthologs that we have described above for both D. melanogaster and C. elegans. Co-expression can be assessed by determining the correlation coefficient. However, the correlation coefficient can be artificially inflated by flat profiles (no changes in the expression of the relevant genes). To avoid this, we focused on experiments where the genes of interest display strong variation (see ). We thus gathered data from 51 different microarray experiments for D. melanogaster that met this criterion. We performed a similar analysis for C. elegans but, unfortunately, the most interesting genes showed flat profiles (close to 0 in all experiments) and it was not possible to proceed further with the analysis.
Expression profiles of several genes expressionally correlated with dLBR and CG1998.
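The selection of data in which the genes of interest are not flat could be implemented along the following lines. The text does not specify how variation was measured or which threshold was applied, so the use of the population standard deviation, the `min_sd` value of 0.5 and the name `informative_datasets` are all assumptions, and the expression values are invented.

```python
from statistics import pstdev
from typing import Dict, List

# dataset name -> gene name -> expression values (e.g. log ratios) in that dataset
Datasets = Dict[str, Dict[str, List[float]]]


def informative_datasets(datasets: Datasets, genes: List[str], min_sd: float = 0.5) -> List[str]:
    """Keep datasets in which every gene of interest varies by at least min_sd.

    min_sd is a hypothetical threshold; flat profiles fall below it and are dropped.
    """
    keep = []
    for name, profiles in datasets.items():
        if all(g in profiles and pstdev(profiles[g]) >= min_sd for g in genes):
            keep.append(name)
    return sorted(keep)


# Invented values: geneA is essentially flat in series_2.
toy = {
    "series_1": {"geneA": [0.1, 1.9, -1.2, 2.4], "geneB": [1.0, -0.8, 2.2, 0.3]},
    "series_2": {"geneA": [0.0, 0.01, -0.01, 0.0], "geneB": [1.0, -0.8, 2.2, 0.3]},
}

print(informative_datasets(toy, ["geneA", "geneB"]))
```

Here `series_2` is discarded because one of the two genes barely changes, which is the situation reported for the most interesting C. elegans genes.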
First, we asked whether the expression profiles of the D. melanogaster ERG
orthologs were correlated (). The strongest correlation was found between the dLBR
) and CG1998 (ERG25
) with an associated p-value of 10⁻¹⁶ (after a Bonferroni correction). Such a p-value means that only one correlation coefficient out of 10¹⁶ is expected to be as high as 0.89 just by chance (for n = 51 experimental points). Considering the maximum number of possible correlations for the 14000 transcripts in the microarrays (representing the Drosophila genome), such a high R is very unlikely to arise by chance. The behavior of dLBR and CG1998 might be reminiscent of the situation in yeast because ERG24p and ERG25p are thought to interact, according to Epistatic MiniArray Profiling experiments
. CG7724 and CG11162 displayed the second highest R (R
0.03), but this R is not relevant in genomic terms.
Expressional correlation among D. melanogaster ERG orthologs.
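The reasoning about multiple testing can be made concrete with a small sketch. It counts the possible pairwise correlations among 14000 transcripts and applies a Bonferroni correction to the p-value of a Pearson R. The p-value here is obtained with the Fisher z-transformation and a normal approximation, which is an assumption of this sketch and not necessarily the test used in the analysis, so it is not expected to reproduce the reported value exactly. All function names are hypothetical.

```python
from math import atanh, erfc, sqrt


def n_pairs(n_genes: int) -> int:
    """Number of distinct gene pairs, i.e. possible pairwise correlations."""
    return n_genes * (n_genes - 1) // 2


def pearson_p_fisher(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson R from n points (Fisher z)."""
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = atanh(r) * sqrt(n - 3)      # approximately standard normal under H0
    return erfc(abs(z) / sqrt(2))   # two-sided tail probability


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


m = n_pairs(14000)
print(m)  # 97993000 possible pairs
print(bonferroni(pearson_p_fisher(0.89, 51), m))
```

With roughly 10⁸ possible pairs, a raw p-value must be many orders of magnitude below the usual 0.05 threshold to remain significant after correction, which is why only a very high R is meaningful at the genomic scale.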
We initially expected good expressional correlation among the ERG orthologs in Drosophila. However, it seems that only dLBR and CG1998 still “remember” their ancestral membership of the sterol biosynthesis pathway. Poor expressional correlation among the rest of the ERG orthologs also suggests that the corresponding proteins either have lost their ability to physically interact in order to form stable complexes, or do so in conditions/moments not covered by the microarray experiments explored here. We then focused our attention on dLBR and CG1998 by determining which other genes were expressionally correlated with them. For this, we gathered 84 genes displaying R≥0.875 with respect to both genes. For a correlation involving 51 data points, this R cut-off is associated with a safe p-value of 10⁻¹³ after correction (). To gain insight into these 84 genes, we used the functional classification tool of the DAVID database (http://david.abcc.ncifcrf.gov/). This software provides a rapid means to organize large lists of genes into functionally related groups and to unravel biological relationships.
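The step of gathering genes that correlate with both reference genes above a cut-off can be sketched as follows. Only the cut-off of 0.875 comes from the text; the function names, the gene labels and the expression values are hypothetical and serve as a toy example.

```python
from math import sqrt
from typing import Dict, List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-flat profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def co_correlated(profiles: Dict[str, List[float]], ref_a: str, ref_b: str,
                  cutoff: float = 0.875) -> List[str]:
    """Genes whose profile has R >= cutoff with BOTH reference genes."""
    a, b = profiles[ref_a], profiles[ref_b]
    return sorted(
        gene for gene, p in profiles.items()
        if gene not in (ref_a, ref_b)
        and pearson_r(p, a) >= cutoff
        and pearson_r(p, b) >= cutoff
    )


# Invented profiles over five conditions.
toy_profiles = {
    "refA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "refB": [1.1, 2.0, 3.2, 3.9, 5.1],
    "g1": [2.0, 4.0, 6.0, 8.0, 10.0],   # follows both references
    "g2": [5.0, 4.0, 3.0, 2.0, 1.0],    # anti-correlated
    "g3": [1.0, 1.0, 2.0, 1.0, 1.0],    # unrelated
}

print(co_correlated(toy_profiles, "refA", "refB"))
```

Requiring the cut-off for both references is stricter than requiring it for either one, which keeps the resulting list focused on genes that track the shared behaviour of the two reference genes.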
In the analysis using DAVID, the most overrepresented class included genes encoding membrane proteins, often targeted to the ER, where sterol biosynthesis takes place in prototrophs (Group 1, in ). Interestingly, several of these proteins are supposed to be involved in co-translational protein targeting to membranes, signal peptide recognition, heat shock protein-binding, as well as unfolded protein binding, or to be elements of the translocon (a complex of proteins associated with the translocation of nascent polypeptides into the ER 
). The following functional category (Group 2) contained four transporter proteins while the last functional group involved chaperones (i.e. peptidyl-prolyl cis-trans
isomerase), chaperone cofactors or unfolded protein-binding factors. A similar analysis conducted using g
confirmed that genes encoding protein folding factors were overrepresented among the genes displaying strong expressional correlation with dLBR and CG1998 (p<10⁻⁵). The existence of expressional correlation does not imply any causality. In fact, from this exploration it is not possible to determine whether dLBR and CG1998 somehow interact with other partners to participate in intracellular protein trafficking or folding or whether, on the contrary, they are acted upon by the latter. Since dLBR has been shown to be a nuclear protein, it would be interesting to investigate whether the dLBR transcript or protein is somehow processed to produce a C-terminal polypeptide that might be sorted to the ER. That would explain the expressional correlation between dLBR and ER proteins observed above. We have also explored the annotation of the genes expressionally correlated with the rest of the ERG orthologs. However, no unifying theme emerged from this analysis (data not shown, available upon request). All in all, the strong expressional correlation of dLBR and CG1998 with genes involved in intracellular protein trafficking or folding, and the absence of such correlation for other ERG orthologs (that also require chaperones), suggest that the involvement of dLBR and CG1998 in both processes is worth exploring.
Functional clustering of genes whose expression profiles strongly correlate with those of dLBR and CG1998 (using the DAVID classification tool at Medium stringency).
In a previous paper, given the structural similarity between cholesterol and ecdysteroids, we had proposed that divergent ERGp orthologs might somehow participate in the synthesis of the latter. We have therefore assessed the expressional correlation between candidate genes involved in this process (Dare1, Jhamt and Start1) and dLBR. While for Jhamt the values of R are below 0.6, a very strong correlation (R = 0.8) was found for Start1. Interestingly, Start1, which is involved in intra-mitochondrial sterol transport, is expressed ubiquitously. However, in situ hybridization demonstrates a stronger expression in the prothoracic gland, where ecdysteroids are synthesized from cholesterol. These and other observations are consistent with the idea that Start1 plays a key role in the regulation of ecdysteroid synthesis. The potential functional link between dLBR, CG1998 and Start1 is also worth exploring.
In conclusion, we detected a preservation of ERG genes in Drosophila melanogaster and Caenorhabditis elegans. In spite of their sequence divergence with respect to the corresponding orthologs in sterol prototrophs, they are still under selective pressure. Since insects are unable to synthesize cholesterol de novo, an appealing way to explain this evolutionary acceleration is that ERGp orthologs have other biological functions in addition to sterol synthesis. This is clearly the case of the LBR, which is also a reductase in sterol prototrophs. Shut-down of cholesterogenesis in insects and nematodes would have allowed these proteins to evolve as long as their other functions were not compromised. Another, less parsimonious, explanation would be the evolution of different novel functions. Our microarray meta-analysis shows strong expressional correlation between the orthologs of ERG24 in D. melanogaster and genes encoding factors involved in intracellular protein trafficking and folding. This is compatible with our idea that ERGp might be involved in other biological roles in addition to sterol synthesis. The potential link between ERG proteins and intracellular protein trafficking and folding deserves experimental exploration not only in Drosophila but also in sterol prototrophs. Moreover, the potential link between dLBR, CG1998 and Start1 remains to be explored in D. melanogaster. Such a link would be compatible with our previous idea of a potential implication of these proteins in the synthesis of ecdysteroids. We hope that this genomic exploration and the hypotheses prompted here might open new avenues of experimental research.