Filarial parasites are related to the free-living nematode
Caenorhabditis elegans, a model organism with a fully sequenced and extensively annotated genome. Multiple independent genome-wide analyses of gene function for nearly all ~20000
C. elegans genes have been undertaken using high-throughput RNA interference (RNAi). This data, comprising ~61000 entries, is publicly accessible via Wormbase
[12]. The set of genes with non-wild type phenotypes in RNAi screens constitutes a pool of phenotypically significant and potentially essential
C. elegans genes. We reasoned that homologs of these genes in
B. malayi are also likely to be essential.
C. elegans is generally believed to be a valid model for less genetically tractable parasitic nematodes
[13]–
[15]. Indeed, there is good concordance between the phenotypes resulting from the few cases where genes from filarial nematodes have been targeted by RNAi and similar experiments targeting their
C. elegans orthologs
[16]–
[19].
Using release 150 of Wormbase (
http://www.wormbase.org), we recovered 4827
C. elegans genes with non-wild type RNAi phenotypes (RNAi positive set). From the 11771 predicted gene products in the data snapshot of the
B. malayi genome used in our studies, we identified 7435 as having an ortholog in
C. elegans (Materials and Methods). Of these, 3059 were mapped to the RNAi positive set, constituting a predicted “essential”
B. malayi genome. The majority of these essential genes have close human homologs and were removed. The remainder is a set of 589 first-pass candidate drug targets (Table S1).
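The subtractive filter described above amounts to a chain of set operations. The following is a minimal Python sketch under stated assumptions, not the pipeline actually used: the function name `subtractive_filter`, the gene identifiers and the toy inputs are all hypothetical, and only the four counts at the end are taken from the text.

```python
def subtractive_filter(parasite_genes, ortholog_map, rnai_positive, human_homologs):
    """Apply the three filtering steps and return the set left after each one.

    parasite_genes : iterable of parasite gene IDs (hypothetical IDs here)
    ortholog_map   : dict mapping parasite gene ID -> C. elegans ortholog ID
    rnai_positive  : set of C. elegans genes with a non-wild type RNAi phenotype
    human_homologs : set of parasite gene IDs that have a close human homolog
    """
    # Step 1: keep genes that have a C. elegans ortholog.
    with_ortholog = {g for g in parasite_genes if g in ortholog_map}
    # Step 2: keep genes whose ortholog is in the RNAi positive set.
    essential = {g for g in with_ortholog if ortholog_map[g] in rnai_positive}
    # Step 3: remove genes with a close human homolog.
    candidates = essential - set(human_homologs)
    return with_ortholog, essential, candidates


# Toy input (invented for illustration only).
parasite_genes = ["Bm1", "Bm2", "Bm3", "Bm4", "Bm5"]
ortholog_map = {"Bm1": "ce-a", "Bm2": "ce-b", "Bm3": "ce-c", "Bm4": "ce-d"}
rnai_positive = {"ce-a", "ce-b", "ce-c"}
human_homologs = {"Bm1", "Bm2"}

with_ortholog, essential, candidates = subtractive_filter(
    parasite_genes, ortholog_map, rnai_positive, human_homologs
)
print(len(with_ortholog), len(essential), sorted(candidates))  # 4 3 ['Bm3']

# Counts reported in the text at each step of the real analysis.
reported = {
    "predicted gene products": 11771,
    "with C. elegans ortholog": 7435,
    "mapped to RNAi positive set": 3059,
    "first-pass candidates": 589,
}
fold_reduction = reported["predicted gene products"] / reported["first-pass candidates"]
print(round(fold_reduction, 1))  # 20.0
```

Returning the intermediate sets as well as the final one makes it easy to report how many genes survive each step.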
Analysis of protein domains in the target set shows the presence of several over-represented domains as compared to the whole genome (
Table S2), suggestive of an important role in nematode biology. The C2H2 type zinc-finger domain and basic helix-loop-helix dimerization domain are over-represented 3- and 4-fold respectively in the target list, as compared to the whole genome, indicative of proteins that bind to nucleic acids and are presumably involved in essential gene regulation and developmental pathways in the parasite. The collagen triple helix repeat, over-represented by 5-fold, reflects unique components of the cuticle and extracellular matrix. Twenty-four potential targets contain InterPro domains that can be mapped to 14 distinct Enzyme Commission (E.C.) numbers (
Table S3). Functional classification of the target set using gene ontology (GO) annotations (
Table S4) and statistical analysis of the GO term content (Table 1) revealed several over-represented terms including cuticle structure and ion transport.
| Table 1. Over-represented GO terms in the target pool. |
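The fold over-representation quoted for domains and GO terms is a ratio of two frequencies: how often a feature occurs in the target set versus in the whole genome. A small Python sketch of that arithmetic follows. The function name and the example counts are hypothetical, and the sketch does not attempt the statistical test behind the GO analysis, which is not specified in this section.

```python
def fold_enrichment(hits_in_targets, n_targets, hits_in_genome, n_genome):
    """Ratio of a feature's frequency in the target set to its frequency genome-wide."""
    if min(n_targets, n_genome, hits_in_genome) <= 0:
        raise ValueError("set sizes and genome-wide hits must be positive")
    target_freq = hits_in_targets / n_targets
    genome_freq = hits_in_genome / n_genome
    return target_freq / genome_freq


# Invented counts: a domain found in 10 of 100 targets and 20 of 1000 genes.
example = fold_enrichment(10, 100, 20, 1000)
print(round(example, 2))  # 5.0
```

A value above 1 indicates over-representation in the target set and a value below 1 indicates under-representation.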
While the pool of 589 candidates reflects a 20-fold reduction in the search space, it is still too large to enter drug-screening pipelines. To rank the output and identify the most promising potential targets, we developed a computational algorithm for integrating and weighting the biological data from
C. elegans and
B. malayi (Table 2). The aim of the prioritization algorithm was to predict the efficacy, selectivity and tractability of each candidate target. Hasan
et al. recently used a similar approach for prioritizing potential drug targets in
Mycobacterium tuberculosis [20].
| Table 2. Prioritization factors and relative weighting scheme. |
Potential targets were rewarded for high sequence similarity with
C. elegans orthologs, but penalized heavily for the presence of a close homolog in humans. Based on the protein length ratios of the orthologs, we identified and penalized
B. malayi gene models that were incomplete or fragmented. Examples of such gene models include two previously proposed drug targets, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase
[21] (model 13047.m00009) and chitin synthase 2
[22] (models 12621.m00166 and 14328.m00023); despite being penalized, these gene models appear in the top half of the ranked list because of their high scores in other positive ranking criteria. In some instances, manual prediction of the complete coding region revealed strong similarity to human proteins that was not detected using the incomplete or fragmented models. RNAi phenotype data for
C. elegans (obtained from Wormbase) was used to prioritize
B. malayi orthologs with respect to their potential efficacy. All reported
C. elegans RNAi phenotypes were binned into nine categories and assigned weights based on the severity of the observed phenotype (see Methods and
Table S5). Adult/larval lethality/arrest was assigned the highest weight. Replicating the adult lethality phenotype would be an important first step towards developing an effective and much-needed macrofilaricide (compound targeting adult worms). To overcome the complications arising from false positives we used ‘phenotype redundancy’
[23] as a measure of confidence, in which independent experiments using different reagents targeting a single gene produce the same phenotype. The product of severity and redundancy for each phenotype category was summed up and normalized by the total number of RNAi experiments for each gene to provide an aggregate confidence score. Interestingly, when the frequency distribution of the binned RNAi categories for
C. elegans sequences orthologous to the target pool was compared with that expected from the whole genome, we observed that reproductive and embryonic phenotypes (sterility and embryonic arrest/lethality) associated with genes involved in highly conserved metazoan processes were under-represented, whereas post-embryonic phenotypes were slightly over-represented. The latter bodes well for our attempts to prioritize drug targets for larvicidal and macrofilaricidal discovery.
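The aggregate RNAi confidence score can be illustrated with a short Python sketch. It rests on one reading of the description above, namely that "redundancy" is the number of independent experiments reporting a given phenotype category. The category names and severity weights below are hypothetical placeholders (the actual nine categories and weights are in Table S5), and `rnai_confidence` is an illustrative name.

```python
from collections import Counter

# Hypothetical severity weights; the real nine categories are in Table S5.
SEVERITY = {
    "lethal_or_arrest": 1.0,
    "sterile": 0.8,
    "slow_growth": 0.4,
    "wild_type": 0.0,
}


def rnai_confidence(experiment_phenotypes):
    """Aggregate score for one gene.

    experiment_phenotypes: one phenotype category per independent RNAi experiment.
    Assumption: redundancy = number of experiments reporting the same category.
    """
    if not experiment_phenotypes:
        return 0.0
    redundancy = Counter(experiment_phenotypes)
    # Sum severity x redundancy over categories ...
    weighted = sum(SEVERITY[cat] * n for cat, n in redundancy.items())
    # ... then normalize by the total number of experiments for the gene.
    return weighted / len(experiment_phenotypes)


# Two experiments agree on lethality, one shows slow growth, one is wild type.
score = rnai_confidence(
    ["lethal_or_arrest", "lethal_or_arrest", "wild_type", "slow_growth"]
)
print(round(score, 2))  # 0.6
```

Under this reading, a severe phenotype seen in a single experiment among many wild-type results scores lower than the same phenotype reproduced across experiments, which is the intended guard against false positives.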
Targets were also prioritized based on stage-specific expression data from approximately 24000 ESTs derived from various stage- and gender-specific
B. malayi libraries
[24]. Of 589 targets, 252 had corresponding EST sequences. We compiled expression data from microfilariae (L1), L2, L3, L4 and adult stages of the parasite and assigned the highest weight to targets with evidence of expression in all five stages. Next were targets expressed in the adult, L4, L1, L3 and L2 stages, in decreasing order of priority.
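The stage-expression ordering can be written as a small scoring function. This Python sketch assumes that a target expressed in several but not all stages is scored by its highest-priority stage. That tie-breaking rule, the numeric values and the name `stage_score` are illustrative assumptions; only the ordering (all five stages first, then adult, L4, L1, L3, L2) comes from the text.

```python
# Priority order from the text, highest first.
STAGE_PRIORITY = ["adult", "L4", "L1", "L3", "L2"]


def stage_score(stages_with_ests):
    """Return a hypothetical integer score from EST evidence per stage."""
    stages = set(stages_with_ests) & set(STAGE_PRIORITY)
    if not stages:
        return 0  # no EST evidence
    if len(stages) == len(STAGE_PRIORITY):
        return len(STAGE_PRIORITY) + 1  # expressed in all five stages
    # Assumption: score by the highest-priority stage with evidence.
    best_rank = min(STAGE_PRIORITY.index(s) for s in stages)
    return len(STAGE_PRIORITY) - best_rank


print(stage_score(["adult", "L4", "L1", "L3", "L2"]))  # 6
print(stage_score(["L2", "L4"]))                        # 4
print(stage_score([]))                                  # 0
```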
Other important prioritization criteria included predicted ‘druggability’ and expressability. Druggability can be described as the presence of protein folds that favor interactions with drug-like chemical compounds. Hopkins
et al. identified 130 InterPro protein domains that are targeted by established and experimental small molecule drugs that follow the Lipinski rule of 5 (LR5)
[25]. Similarly, a list of 70 EC numbers of known enzyme targets and respective marketed drugs was compiled
[26]. Proteins with LR5 druggable domains or druggable EC numbers were given a high priority. An important factor in selecting targets for rational drug design is their potential to be expressed in heterologous systems for protein production, purification and crystallization. A genome-wide survey of high-throughput expression of
C. elegans proteins in
Escherichia coli found that protein expression and solubility are inversely correlated with hydrophobicity. Proteins having GRAVY (grand average of hydropathicity) scores below an empirically derived cutoff of −0.4 were more likely to be soluble
[27]. To prioritize drug targets in
B. malayi, we penalized proteins with a GRAVY score higher than −0.4. The complete set of data values used for prioritizing the potential targets is available in Supplementary
Data Set S1.
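A GRAVY score is the mean hydropathy of a protein's residues. The Python sketch below computes it with the Kyte-Doolittle hydropathy scale, which is the standard scale for GRAVY but is an assumption here because the scale is not named in the text. The penalty value and the function names are hypothetical; only the −0.4 cutoff comes from the text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
GRAVY_CUTOFF = -0.4  # cutoff quoted in the text


def gravy(sequence):
    """Grand average of hydropathicity; non-standard residues are skipped."""
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def expressability_penalty(sequence, penalty=1.0):
    """Return a hypothetical penalty for proteins above the cutoff."""
    return penalty if gravy(sequence) > GRAVY_CUTOFF else 0.0


print(round(gravy("KKDD"), 2))           # -3.7 (hydrophilic, no penalty)
print(expressability_penalty("KKDD"))    # 0.0
print(expressability_penalty("AAAA"))    # 1.0
```

Skipping ambiguous residues such as X keeps the function usable on draft gene models, at the cost of slightly biasing the average for sequences with many unknown positions.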
The ranked output (Table 3 and Data Set S1), sorted by the sum of the individual scores for each predicted target, was then manually curated to improve functional annotations where possible. Twelve known or previously proposed targets were identified; nine of these are among the top 40 targets shown in Table 3, supporting the validity of our approach. Two potential targets with domains associated with druggable enzymes (triacylglycerol lipase and adenosine deaminase) and ten targets with LR5 domains, including the rhodopsin-like GPCR superfamily and integrins (alpha-chain), were concentrated in the top half of the list. Many of the candidates were predicted to participate in essential processes that have no counterpart in mammals, such as molting and synthesis of chitin. Perhaps surprisingly, we also found potential targets that participate in important processes shared across Metazoa. These potential targets are functionally analogous to proteins present in mammals yet bear no sequence similarity to them. These include the glycolytic/gluconeogenic enzyme 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (iPGM) characterized previously
[21] and the innexin family of gap junction proteins
[28]. The functions of some of our potential targets are described below in more detail.
| Table 3. Ranked listing of the top 40 predicted drug targets. |
Molting
Several potential
B. malayi targets identified by our bioinformatics approach may mediate molting. Nematode molting, which takes place 4 times from hatching to adulthood, is a highly regulated and complex process involving the synthesis and secretion of a new exoskeleton, followed by the separation and shedding of the old cuticle
[29]. Steroid hormones have been implicated in triggering molting in nematodes, as found in arthropods
[30],
[31]. A recent genome-wide RNAi screen in
C. elegans has identified 159 genes that are required for molting
[32]. These genes may mediate distinct aspects of the process, from intracellular signaling (such as hypodermal-specific transcription factors) to extracellular execution (such as cuticle-digesting proteases). The sequencing of the
B. malayi genome has revealed that almost all these genes have a
B. malayi counterpart
[11], pointing to phylum-wide conservation in the molting machinery, validating
C. elegans as a good model for this process. There is wide agreement that molting represents an excellent process for chemotherapeutic intervention, given that it is an ancestral feature of the phylum Nematoda and does not occur in vertebrates
[32],
[33]. Consistent with this, we recovered more than a dozen
B. malayi orthologs of proteins necessary for molting in
C. elegans which could be considered potential drug targets. These include the
B. malayi orthologs of
C. elegans NOAH-1 and NOAH-2, which contain zona pellucida (ZP) domains and several plasminogen N-terminal (PAN) modules. These proteins share similarity with
Drosophila melanogaster NompA, a component of the extracellular matrix
[34]. Other high-ranking targets include the orthologs of
C. elegans bli-5 and
mlt-11, which encode predicted serine-peptidase inhibitors containing multiple Kunitz/Bovine trypsin inhibitor domains. These protease inhibitors may play a role in regulating the activity of hypodermally-expressed subtilisin-like peptidases, such as BLI-4, which could be required for processing cuticular collagens and activation of further collagen processing/degrading enzymes, such as astacin metallopeptidases
[35]. Significantly, Kunitz-type serine protease inhibitors have been implicated in molting in the related filarial nematode
Onchocerca volvulus [36], further supporting the hypothesis that the molecular machinery involved in the molting process is conserved between filarial and rhabditid nematodes.
We also identified
B. malayi orthologs of
C. elegans mlt-8 and
mlt-9.
mlt-8 encodes a novel protein that has been proposed to act as an amplifier of endocrine cues during synthesis of the new cuticle, while MLT-9 may be involved in hypodermal signaling
[32]. In addition, we identified orthologs of the
C. elegans Patched signaling family member
ptr-23 and Hedgehog signaling family members
qua-1 and
wrt-4. These genes have been demonstrated to play a role in molting, even though their functions in the process remain unclear
[32],
[37]–
[39]. In particular,
qua-1, which has been implicated in hypodermal signaling, encodes a nematode-specific cysteine peptidase capable of autocatalytic activation.
qua-1 is essential for ecdysis and viability: deletion mutants arrest at the first molt (L1 to L2) exhibiting severe morphological abnormalities.
qua-1 orthologs are both well conserved and ubiquitous throughout the phylum Nematoda
[39], making QUA-1 a particularly attractive target for the development of specific inhibitors
[33].
Structural Components
C. elegans has become one of the preferred models to investigate the assembly and molecular interactions of cell junctions because cell-cell and cell-matrix attachment components are generally well conserved between nematodes and vertebrates (reviewed in
[40]). However, a few nematode-specific components do exist, some of which were identified in our screen, including the
B. malayi homologs of
C. elegans ajm-1 and
pat-12/gei-16. The
C. elegans coiled-coil protein AJM-1 localizes to apical junctions and is required for embryonic elongation and maintenance of epithelial integrity
[41],
[42].
C. elegans pat-12/gei-16 has been implicated in the formation of Fibrous Organelles (FOs), which are found exclusively in nematodes and mediate attachment between body wall muscle and the cuticle across the hypodermis. FOs are essential for viability, ensure maintenance of body rigidity and allow for locomotion
[43]. Phenotypic inspection of
pat-12/gei-16 mutants, together with the molecular characterization of the gene product function, suggests that the protein acts as an adaptor providing linkages between the various structural components of FOs (Benjamin D. Williams and Caroline A. Behm, personal communication;
[44],
[45]). It is noteworthy that in the human filarial nematode
O. volvulus, the homolog of
gei-16 encodes the well-characterized OvB20 larval antigen
[46],
[47]. Immunogold electron microscopy of
O. lienalis with an OvB20-specific serum revealed localization to discrete foci in the hypodermis and cuticle
[47], suggesting that the essential function of
pat-12/gei-16 homologs in formation of FOs is likely to be evolutionarily conserved in filarial nematodes.
Eight
B. malayi innexin homologs were identified as potential targets (see
Tables S1 and
S2). Innexins are invertebrate structural proteins that form intercellular channels, or gap junctions, allowing electrical coupling between adjacent cells (reviewed in
[28]). Distantly related connexins in vertebrates perform analogous functions. In
C. elegans, the innexin family comprises 25 paralogs, showing different spatio-temporal expression patterns
[48]. Detailed studies on seven
C. elegans inx genes have revealed that particular
inx genes are required for distinct processes including locomotion, egg laying, synchronized contraction of the pharyngeal musculature and inhibition of oocyte maturation
[28],
[49]. Notably, the innexin genes
unc-7 and
unc-9, which are required for locomotion, also modulate response to the anthelmintic drug ivermectin
[50]–
[52].
Chitin is a structural component of the eggshell
[53] and pharynx
[54] of nematodes and it is absent in mammals. As expected, our analyses revealed the two chitin synthase genes previously proposed as drug targets in
B. malayi [22],
[55] and
O. volvulus [22]. These genes are orthologs of the two chitin synthase genes present in the
C. elegans genome that are responsible for chitin deposition in the eggshell (
chs-1) and pharynx (
chs-2) and essential for development
[54]. Functional conservation of nematode chitin synthases is highly likely since the
B. malayi chs-1 transcript is predominantly found in the oocytes and early embryos
[55]. Orthologs of two other
C. elegans genes (H02I12.1 and W03F11.1) encoding proteins containing putative chitin binding domains, were also identified. Interestingly, RNAi against H02I12.1, which contains a peritrophin A chitin-binding module, compromises the egg osmotic integrity during early embryogenesis
[56], suggesting that this gene plays a role in eggshell chitin deposition. Thus, aspects of chitin metabolism are clearly essential in nematodes and involve a number of components worthy of further evaluation as drug targets.
The sugar galactofuranose (Gal
f) is an important component of cell surface glycoconjugates of several prokaryotic and eukaryotic pathogens and has been shown to be essential for viability and virulence
[57]–
[59]. From the
B. malayi genome, we annotated two putative orthologs of UDP-galactopyranose mutase (GLF), the enzyme that is required for biosynthesis of Gal
f. Both the sugar and the enzyme are absent from mammals making GLF an attractive drug target
[57].
Central Metabolism
In nematodes, the glucose disaccharide trehalose is proposed to serve as an energy reserve and a protectant against various environmental stresses such as heat, cold and freezing, oxidative and osmotic stress, anoxia, and even desiccation and anhydrobiosis
[60],
[61]. It is an abundant storage sugar in the filarial nematodes
Brugia pahangi and
Acanthocheilonema viteae [62] and is also found in bacteria, fungi and insects but not in mammals. We identified trehalose-6-phosphate phosphatase as an ortholog of the essential
C. elegans gene
gob-1 (gut obstructed). Removal of this gene activity in
C. elegans gives rise to larval lethality, partly due to intestinal blockage and subsequent starvation
[63]. This
gob-1 lethality is completely suppressed when the upstream trehalose-6-phosphate synthase genes are deleted, indicating that the lethality is due to toxic build-up of the intermediate trehalose-6-phosphate
[63].
Mammals take up various unsaturated fatty acids from food as essential nutrients whereas C.
elegans has fatty acid desaturases that catalyze the production of polyunsaturated fatty acids
[64]. Among the highly ranked targets was the
B. malayi ortholog of the essential
C. elegans fat-2 gene encoding a Δ-12 fatty acid desaturase that converts oleic acid (18:1) to linoleic acid (18:2), implying that
B. malayi also synthesizes polyunsaturated fatty acids rather than acquiring them from the host environment.
The glycolytic/gluconeogenic pathway is present in most cellular organisms; however, the enzymes in the pathway may not be conserved. We identified a 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (iPGM) as one such example. This enzyme differs in sequence and structure from the 2,3-bisphosphoglycerate-dependent phosphoglycerate mutase (dPGM) found in mammals. Both enzymes catalyze the interconversion of 2-phosphoglycerate and 3-phosphoglycerate, but through different catalytic mechanisms. The biochemical activities of both
B. malayi and
C. elegans iPGM enzymes have been demonstrated, as has the essentiality of the gene for nematode development. Down-regulation of
C. elegans iPGM using RNAi results in embryonic and larval lethality
[21].
Nucleic Acid Metabolism
Other potentially interesting targets revealed by our analysis include orthologs of
C. elegans transcription factors
lin-14,
die-1 and
pry-1, which are known to be involved in key developmental and morphogenetic processes.
C. elegans lin-14 is a nematode-specific transcription factor required for larval stage-specific gene expression
[65]. Mutations in
lin-14 cause cell lineage defects in several cell types. The
C. elegans gene
die-1 belongs to the zinc finger family of transcription factors. Loss of
die-1 affects epithelial cell rearrangements during embryonic epidermal morphogenesis, leading ultimately to embryonic arrest
[66]. We also recovered the
B. malayi homolog of
C. elegans pry-1 [67] encoding a protein with limited homology to vertebrate Axins, which act as scaffold proteins in the Wnt/beta-catenin signaling pathway
[68]. Despite its sequence divergence, PRY-1, like Axin, serves as a negative regulator in the Wnt signaling pathway in
C. elegans and can functionally complement the
Danio rerio (zebrafish)
axin1 knockout
masterblind [69]. This example illustrates how specific components of signaling pathways, which are conserved between vertebrates and nematodes but have diverged at the primary sequence level, may differ sufficiently to allow for the development of nematode-specific inhibitors.
We also identified genes involved in RNA processing.
Trans-splicing, which involves the addition of a short leader sequence to the 5′-end of mRNA, is an essential step in the maturation of most mRNAs in nematodes and several other invertebrates and protozoa (reviewed in
[70]). Our analysis identified the
B. malayi orthologs of two known components (SL30p and SL95p) required for
in vitro RNA
trans-splicing in embryonic lysates from the human nematode
Ascaris lumbricoides [71]. Recently, orthologs of these two genes in
C. elegans (
sut-1 and
sna-2 respectively) have also been implicated in RNA
trans-splicing
[72]. Additionally, we identified an ortholog of
C. elegans ego-1, which belongs to a family of RNA-directed RNA polymerases.
ego-1 is essential for viability and fertility and in particular plays a crucial role in germline development, where it promotes cell proliferation, meiosis, and gametogenesis. It is thought that EGO-1 influences all these distinct processes by inducing and reinforcing germline RNAi of specific genes
[73]–
[75]. While many components of the RNAi pathway appear to be missing from the
B. malayi genome, most notably the spreading machinery
[11], the presence of
ego-1 suggests conservation of the role of this class of RNA-directed RNA polymerases in germline silencing across Nematoda.
In addition to drug target discovery, our method highlights proteins participating in biological processes that are necessarily conserved across parasitic and free-living worms; in the case of
B. malayi and the sequenced Caenorhabditids these processes span an evolutionary distance of 350 million years since their last common ancestor
[11]. This substantially extends our confidence in identifying nematode-centric processes over those conserved only between the Caenorhabditid genomes. Significantly, 50% of the targets were annotated as hypothetical proteins. These may participate in completely novel nematode processes and are worthy of further study.
The recently completed draft genomic sequence of B. malayi has enabled us to predict potentially essential genes and apply a method for rational drug target discovery. In contrast to empirical methods, the bioinformatics approach described herein yields a larger pool of candidates and is not biased, thereby providing a wider range of potential targets. Given the threat of emerging drug resistance resulting from continued reliance on a limited repertoire of available drugs, a wider array of choices for drug targets will be invaluable. The method is also tunable and quickly provides a manageable set of targets for closer analysis. By adjusting the parameters of the comparative sequence analysis, the initial target pool size can be increased or decreased by an order of magnitude. Varying the weights for the factors used in the prioritization scheme can tailor the ranking to the needs of the end-user.
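The tunability of the ranking can be illustrated with a weighted sum over per-factor scores. The Python sketch below is a simplified illustration rather than the weighting scheme of Table 2: the factor names, target labels, scores and weights are all hypothetical.

```python
def rank_targets(factor_scores, weights):
    """Rank targets by the weighted sum of their factor scores, best first.

    factor_scores: dict target -> dict factor name -> score
    weights      : dict factor name -> weight chosen by the end-user
    """
    totals = {
        target: sum(weights[name] * value for name, value in scores.items())
        for target, scores in factor_scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Invented scores for two hypothetical targets.
factor_scores = {
    "target_A": {"rnai": 0.9, "stage": 0.2, "druggable": 0.0},
    "target_B": {"rnai": 0.4, "stage": 0.5, "druggable": 1.0},
}

equal_weights = {"rnai": 1.0, "stage": 1.0, "druggable": 1.0}
efficacy_first = {"rnai": 5.0, "stage": 1.0, "druggable": 1.0}

print(rank_targets(factor_scores, equal_weights))   # ['target_B', 'target_A']
print(rank_targets(factor_scores, efficacy_first))  # ['target_A', 'target_B']
```

Raising the weight on the RNAi factor moves the target with the stronger phenotype evidence to the top, which shows how a user focused on efficacy would obtain a different ordering from one focused on druggability.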
The basic subtractive filtering methodology is applicable to a wide variety of sequenced pathogens, ranging from microbial species to the metazoan parasite analyzed here. Although it is currently limited by the availability of complete genome sequence and functional genomics data, the rapid pace of technological advancements in these areas will soon overcome those limitations, and we expect this methodology to gain widespread applicability.