Plasmodium parasites are causative agents of malaria which affects >500 million people and claims ~2 million lives annually. The completion of Plasmodium genome sequencing and availability of PlasmoDB database has provided a platform for systematic study of parasite genome. Aminoacyl-tRNA synthetases (aaRSs) are pivotal enzymes for protein translation and other vital cellular processes. We report an extensive analysis of the Plasmodium falciparum genome to identify and classify aaRSs in this organism.
Using various computational and bioinformatics tools, we have identified 37 aaRSs in P. falciparum. Our key observations are: (i) the fraction of the proteome dedicated to aaRSs in P. falciparum is very high compared to many other organisms; (ii) 23 out of 37 Pf-aaRS sequences contain signal peptides possibly directing them to different cellular organelles; (iii) expression profiles of Pf-aaRSs vary considerably at various life cycle stages of the parasite; (iv) several Pf-aaRSs possess very unusual domain architectures; (v) phylogenetic analyses reveal evolutionary relatedness of several parasite aaRSs to bacterial and plant aaRSs; (vi) three-dimensional structural modelling has provided insights which could be exploited in inhibitor discovery against parasite aaRSs.
We have identified 37 Pf-aaRSs based on our bioinformatics analysis. Our data reveal several unique attributes in this protein family. We have annotated all 37 Pf-aaRSs based on predicted localization, phylogenetics, domain architectures and their overall protein expression profiles. The sets of distinct features elaborated in this work will provide a platform for experimental dissection of this family of enzymes, possibly for the discovery of novel drugs against malaria.
Aminoacylation is the process of adding an aminoacyl group to the 3' end (CCA) of the tRNA molecule. Each tRNA is aminoacylated with a specific amino acid by an aminoacyl-tRNA synthetase (aaRS). aaRSs are responsible for attaching the correct amino acid onto the cognate tRNA molecule in a two-step reaction. The amino acid is first activated with ATP, forming an aminoacyl-adenylate intermediate. Once activated, this amino acid is transferred to the 3' end of its corresponding tRNA molecule to be used during protein synthesis. All aaRSs require the divalent cation Mg2+ (supplied as MgCl2) for their aminoacylation reaction [1,2].
1. amino acid + ATP → aminoacyl-AMP + PPi
2. aminoacyl-AMP + tRNA → aminoacyl-tRNA + AMP
The aaRSs are divided into two major classes based on the structural topology of their active sites. Class I aaRSs cover 11 amino acids: Arg, Cys, Gln, Glu, Ile, Leu, Lys, Met, Val, Trp and Tyr. Class II aaRSs cover 10 amino acids: Ala, Asp, Asn, Gly, His, Lys, Phe, Pro, Ser and Thr. Core domains of class I enzymes are characterized by a Rossmann fold which consists of α-helices and β-pleated sheets. This domain contains two conserved motifs ('HIGH' and 'KMSKS') which are directly involved in ATP binding. The catalytic domain of class II enzymes has a unique fold with a central core of anti-parallel β strands flanked by α helices. There are three weakly conserved motifs; two of them are involved in ATP binding while the third plays a role in homodimerization. Class I enzymes bind ATP in an extended conformation while class II enzymes do so in a bent conformation. The two aaRS classes have different modes of aminoacylation - class I enzymes aminoacylate the 2'OH of the cognate tRNA whereas class II enzymes aminoacylate the 3'OH of the tRNA (with the exception of PheRS). All known aaRSs are multidomain proteins with complex modular architectures. In addition, eukaryotic aaRSs are distinguished by the presence of appended domains at either the N- or C-terminus which are generally absent from their bacterial/archaeal counterparts. These appendages to the catalytic cores of several aaRSs are non-catalytic and instead function to mediate protein-protein interactions or act as general RNA-binding domains [7-9].
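As a minimal sketch of how the class I signature motifs named above could be located in a protein sequence, the following Python snippet performs an exact-match scan. The function name `find_motifs` and the toy sequence are hypothetical illustrations, not part of the original analysis; real aaRSs often carry divergent variants of these motifs that an exact search will miss.

```python
import re

# Class I signature motifs named in the text.
CLASS_I_MOTIFS = ("HIGH", "KMSKS")

def find_motifs(sequence, motifs=CLASS_I_MOTIFS):
    """Return {motif: [1-based start positions]} for exact matches."""
    hits = {}
    seq = sequence.upper()
    for motif in motifs:
        # A lookahead pattern reports overlapping occurrences as well.
        pattern = re.compile("(?=" + re.escape(motif) + ")")
        hits[motif] = [m.start() + 1 for m in pattern.finditer(seq)]
    return hits

# Hypothetical toy sequence, not a real aaRS.
toy_sequence = "MPSAHIGHLRTAVEDKMSKSLGN"
result = find_motifs(toy_sequence)
print(result)
```

An empty list for a motif only means that the canonical spelling is absent; a profile-based search would be needed to detect diverged motifs.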
In mammalian cells, some aaRSs are present as a larger multi-aaRS complex (MSC) composed of nine synthetases (arginyl-, aspartyl-, glutamyl-, glutaminyl-, leucyl-, lysyl-, isoleucyl-, methionyl- and prolyl-tRNA synthetases) [10-12]. The MSC is composed of a mixture of class I and class II aaRSs along with three non-aaRS proteins: p38, p43 and p18. It is not clear why certain aaRSs exist as a complex while some are in free form. The MSC might help in efficient protein synthesis by preventing mixing of charged tRNAs with the cellular pool and by increasing the local concentration of tRNA near the site of protein synthesis.
The accuracy of the tRNA aminoacylation reaction is critical in ensuring fidelity in protein translation. To achieve this accuracy, some aaRS enzymes possess a proofreading (editing) mechanism that hydrolyzes tRNAs aminoacylated with a non-cognate amino acid. For example, editing domains may be found attached to alanyl-tRNA synthetase (AlaRS), leucyl-tRNA synthetase (LeuRS) and so on [16-21]. In other cases, the editing domain is not attached to an aaRS but rather functions as an individual protein [22,23]. For example, the YbaK protein from Haemophilus influenzae is capable of efficiently editing Cys-tRNAPro. ThrRS has been shown to have another editing domain, called NTD, which can cleave the bond between a D-amino acid and tRNA.
Recently it has been shown that aaRSs are not only involved in protein synthesis but also perform many non-catalytic and non-canonical roles in RNA processing/trafficking, apoptosis, rRNA synthesis, angiogenesis and inflammation [26-30]. These versatile properties of aaRSs are the outcome of their differential cellular localization, nucleic acid binding properties, protein-protein interactions and collaboration (fusion) with additional domains. In case of malaria parasite, apicoplast proteins and pathways have already received particular attention as drug targets . In this work we present a study of aaRSs from P. falciparum - the most virulent agent of human malaria. Our aim for this study was to use bioinformatics tools to (a) discover special and unusual modules present in parasite aaRSs which are potentially absent from human homologues, and (b) to identify potential new drug targets based on this protein family.
We exploited the current annotation available in PlasmoDB to identify the repertoire of aaRSs in the P. falciparum genome. According to their Enzyme Commission (EC) numbers, 37 proteins in PlasmoDB (see additional file 1) are annotated as belonging to EC group 6.1.1.- (the EC group assigned to aaRSs). Although in many cases current annotations allow an assignment to class I or II of aaRSs, for some proteins the annotations are still preliminary. For this reason, we used Hidden Markov Models (HMMs) for identifying aaRSs in P. falciparum. For each aaRS a set of known sequences was used, giving 20 HMMs in total (see methods for details). For each database search a score distribution was obtained and 4 cutoffs were considered to identify aaRSs. Results are reported in Table 1. We observed that 2 proteins annotated as belonging to EC group 6.1.1.- in PlasmoDB are not found by HMMs - PF14_0401, annotated as MetRS, is instead a generic tRNA binding protein as elucidated in the genome re-annotation process, while the second one (PFC0470w) is still mis-annotated as ValRS. A total of 18 Pf-aaRSs can be classified within the 10 aaRSs that define class I. All members of this class are represented in the P. falciparum proteome. The annotations of these sequences are summarized in additional file 1. Similar to class I Pf-aaRSs, the class II Pf-aaRSs have a total of 18 sequences for 10 different amino acid synthetases. Four genes are present in P. falciparum for PheRS but these likely encode 1 heterodimeric and 2 monomeric versions of PheRS.
In order to carry out comparative analyses of aaRSs of P. falciparum with those of other species we considered aaRS sequences from several organisms representing the three domains of life (see methods section). As expected, we found a variable number of aaRSs in different species. M. jannaschii (archaea) and M. tuberculosis (bacteria) have the lowest aaRS counts among the organisms compared, which also include E. coli, S. cerevisiae, D. discoideum, P. falciparum, O. sativa, R. norvegicus, D. melanogaster, and H. sapiens. Humans have the highest number of aaRSs in this analysis (Figure 1a). Our analysis also shows that P. falciparum has the highest aaRS fraction (relative to its proteome size) when compared with bacteria, yeast and human counterparts (Figure 1b). The number of copies of an individual aaRS varies between species. For example, when individual aaRSs from human and P. falciparum were compared it was evident that AlaRS and ThrRS were higher in number in humans (Figure 2). The presence of more than one copy of an aaRS in an organism may indicate additional biological, temporal or spatial roles for these enzymes, as several aaRSs also perform non-canonical functions. In this work we describe in detail the 37 Pf-aaRSs.
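The comparison of aaRS fraction relative to proteome size can be illustrated with a short Python sketch. Only the count of 37 Pf-aaRSs comes from the text; every proteome size and the entries for "organism A" and "organism B" are placeholders chosen for illustration, and the function names are hypothetical.

```python
def aars_fraction(n_aars, proteome_size):
    """Fraction of the proteome made up of aaRS proteins."""
    if proteome_size <= 0:
        raise ValueError("proteome_size must be positive")
    return n_aars / proteome_size

def rank_by_fraction(counts):
    """Sort organism names by aaRS fraction, highest first."""
    return sorted(counts,
                  key=lambda org: aars_fraction(*counts[org]),
                  reverse=True)

# (aaRS count, proteome size). Only the 37 is taken from the text;
# all proteome sizes and the other organisms are placeholders.
example_counts = {
    "P. falciparum": (37, 5000),
    "organism A": (40, 20000),
    "organism B": (25, 4000),
}

ranking = rank_by_fraction(example_counts)
for organism in ranking:
    print(organism, round(aars_fraction(*example_counts[organism]), 5))
```

The placeholder values show why the two panels of such a comparison can disagree: an organism with the largest absolute aaRS count need not have the largest fraction once proteome size is taken into account.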
It was earlier believed that 20 aaRSs were necessary for the incorporation of 20 amino acids in proteins. Surprisingly, some archaea, bacteria and chloroplasts lack GlnRS and AsnRS enzymes [34-38]. These organisms use an alternate pathway based on tRNA-dependent amino acid transformation. A non-discriminating GluRS charges tRNAGln with glutamate and then a second enzyme called tRNA-dependent amidotransferase (AdT) amidates the glutamate to make glutamine. A corresponding reaction occurs in the case of asparagine. In P. falciparum, the occurrence of glutamate-tRNA synthetase (PF13_0257, MAL13P1.281) together with amidotransferase subunit A (PFD0780w) and subunit B (PFF1395c) indicates the presence of both direct and indirect pathways for aminoacylation [39,40]. Both subunits of the amidotransferase have apicoplast targeting signals, suggesting an indirect pathway for aminoacylation in the P. falciparum apicoplast. The expression of Pf-AdT subunit A is predicted in all life cycle stages of the parasite based on proteomic and microarray data. We therefore suggest that this pathway is active in the parasite apicoplast. We could not find sequence homologues of enzymes involved in indirect aminoacylation of cysteine residues [41-43] in the proteome of P. falciparum.
In mammalian cells, some aaRSs are present as a larger multi-aaRS complex (MSC). A constituent of the MSC - protein p43 - has a sequence homologue (PF14_0401 - EMAP-II-like cytokine) in P. falciparum, although there is no evidence for the presence of an MSC in malaria parasites. Interestingly, p43 is not only required for stability of the MSC but also functions as a proinflammatory cytokine [44-46]. The role of the p43 homologue in P. falciparum is unknown, but evidence from other organisms indicates that the MSC functions in protein stability, efficient protein translation and protein elongation. Sequence identity between P. falciparum p43 and its human homologue is ~24%, and based on microarray data p43 seems to be expressed at asexual life cycle stages of P. falciparum. A mitochondrial targeting signal was also predicted for parasite p43, but the role of p43 in the parasite remains to be explored experimentally.
aaRSs are not only involved in protein synthesis but also in various other cellular activities including intron splicing, translational regulation and tRNA channeling. Diversified roles for aaRSs necessitate their presence in (or transit into) various cellular compartments. We therefore analyzed P. falciparum aaRS sequences for the presence of putative signal sequences predicted by MITOPROT, PredictNLS and PATS for mitochondria, nucleus and apicoplast respectively. We found that 23 P. falciparum aaRSs have signal peptides, possibly for directing them to different cellular organelles. Another 14 aaRSs from P. falciparum may be resident in the parasite cytoplasm (Figure 3a). The apicoplast is known to have protein synthesis machinery which may use aaRSs. Trafficking of nuclear-encoded aaRSs to the apicoplast may explain why ~20 out of 37 Pf-aaRSs have apicoplast targeting signals. Our data indicate that out of a total of ~20 Pf-aaRSs bearing apicoplast targeting signals, ~12 aaRSs may be exclusive to this organelle. Others are predicted to be shared between apicoplast, nucleus and mitochondria (Figure 3b). It has been shown earlier that some tRNAs need to be aminoacylated in the nucleus before they can be exported to the cytoplasm, an observation indicating occurrence of the aminoacylation reaction (mediated by aaRSs) inside the nucleus. In P. falciparum, we found 10 aaRSs with nuclear localization signals but only one is predicted to be exclusively resident in the nucleus (PFA0480w - PheRS). Interestingly, we found no Pf-aaRS sequences with specific PEXEL (Plasmodium export element) motifs. This motif is found in parasite proteins that are exported beyond the parasitophorous vacuole membrane [50,51].
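The grouping of proteins into organelle-exclusive, shared and putatively cytoplasmic sets can be sketched in Python as follows. This is an illustrative assumption about how per-predictor results might be combined, not the authors' procedure: the function name and the identifiers gene_A, gene_B and gene_C are hypothetical, and only the nuclear prediction for PFA0480w is taken from the text.

```python
def classify_localization(predictions):
    """Split genes by the number of organelles with a predicted signal.

    predictions maps a gene ID to the set of organelles for which a
    targeting signal was predicted (empty set = no signal found).
    """
    exclusive = {}    # organelle -> genes predicted for that organelle only
    shared = []       # genes with signals for more than one organelle
    cytoplasmic = []  # genes without any predicted signal
    for gene_id, organelles in sorted(predictions.items()):
        if not organelles:
            cytoplasmic.append(gene_id)
        elif len(organelles) == 1:
            (organelle,) = organelles
            exclusive.setdefault(organelle, []).append(gene_id)
        else:
            shared.append(gene_id)
    return exclusive, shared, cytoplasmic

example_predictions = {
    "PFA0480w": {"nucleus"},                    # nuclear-only, as in the text
    "gene_A": {"apicoplast"},                   # hypothetical
    "gene_B": {"apicoplast", "mitochondrion"},  # hypothetical
    "gene_C": set(),                            # hypothetical
}

exclusive, shared, cytoplasmic = classify_localization(example_predictions)
print(exclusive, shared, cytoplasmic)
```

Treating the absence of any predicted signal as cytoplasmic residence mirrors the reasoning in the text, but it remains a prediction that depends on the sensitivity of each tool.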
In order to study expression of aaRSs during the life cycle of the malaria parasite, we took advantage of available transcriptomics and proteomics data from PlasmoDB. Firstly, we analyzed proteomic data from several independent experiments and compared them with transcriptomics data by Le Roch. The latter sets of data were obtained using the Affymetrix technology and hence provide a quantitative measure of mRNA levels in the parasite. Our results are provided in Table 2. Interestingly, we found that mRNA levels of potential apicoplast proteins (AP in the table) are lower on average (mean1 = 44.6; mean2 = 41.5; gam = 91.3; spor = 58.1) than those of potential cytoplasmic proteins (mean1 = 259; mean2 = 264.8; gam = 174.8; spor = 73.8). Proteomic data confirmed that while the cytoplasmic aaRSs are found in almost all stages, the apicoplast aaRSs are rarely found in the parasite. This could be in part due to experimental limits in the identification of apicoplast proteins by mass spectrometry. Indeed, when we carried out a chi-square test we found that proteins predicted to be targeted to the apicoplast are significantly less represented (p < 10^-4) in the sample of proteins identified by mass spectrometry. For these reasons we limited the analysis of gene expression profiles to putative cytoplasmic proteins. We considered transcriptomics data for sexual stages and asexual stages [52,53]. We considered a reduced set of the time course gene expression data (22 time points instead of 48) and normalized data by Le Roch (see methods for details). This allowed us to analyse the expression of aaRS genes along the whole intra-erythrocytic life cycle of the parasite (Table 2).
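A chi-square test of this kind can be sketched with the Python standard library for a 2x2 contingency table (apicoplast-targeted versus other proteins, detected versus not detected by mass spectrometry). The counts below are invented purely for illustration and are not the values behind the reported p-value; the sketch also assumes a plain Pearson statistic without continuity correction, which may differ from the test variant the authors used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical counts: rows = apicoplast-targeted / other proteins,
# columns = detected / not detected by mass spectrometry.
stat, p = chi_square_2x2(10, 90, 60, 40)
print(stat, p)
```

With these placeholder counts the apicoplast group is detected far less often than the other group, so the statistic is large and the p-value is well below 10^-4.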
Further observations of the protein expression profiles indicated that some aaRSs were exclusively detected at specific stages: LeuRS (PF08_0011) and AspRS (PFE0715w) in sporozoites; IleRS (PFL1210w), SerRS (PF07_0073), GlnRS (PF13_0170), HisRS (PF14_0428) and PheRS (PFA0480w) in merozoites; AsnRS (PFE0475w), PheRS (PF11_0051) and HisRS (PFI1645c) in trophozoites; and TrpRS (PF13_0205) in gametocyte stages (Figure 4).
aaRSs are multi-domain proteins typically consisting of a conserved catalytic domain and an anti-codon binding domain. In addition, some aaRSs have RNA binding and editing domains that cleave incorrectly aminoacylated tRNA molecules. Additional functional domains may be appended to aaRSs in the course of biological evolution [55,56]. Careful examination of the 37 identified P. falciparum aaRSs using the Pfam database showed that most of them have a generic modular architecture that adheres to prototypical aaRSs (Figure 5). The remaining P. falciparum aaRSs or related proteins like PF14_0423 (a protein having a serine-threonine kinase domain in fusion with an anti-codon binding module) have complex domain architectures. In several, concatenation of unusual domains such as YbaK, GST, Ser-Thr kinase and DNA binding domains is evident (Figure 5). The functional relevance of these additional domains fused to typical aaRSs in P. falciparum needs to be experimentally addressed. Intriguingly, two of the four Pf-PheRS subunits contain DNA binding domains (PF11_0051, PFA0480w). It is likely that PheRS, in addition to its aminoacylation function, influences other cellular processes via DNA binding. Consistent with its potential DNA binding property, the P. falciparum PheRS (PFA0480w) has a nuclear localization signal. The CysRS of B. subtilis (which also contains a DNA binding domain) is believed to play a role in initiating chromosomal replication. Therefore, functional roles for P. falciparum PheRSs may extend from aminoacylation to DNA recognition and replication - a suggestion that requires experimental investigation. Similarly, it has been shown that GST or GST homology domains can help in complex formation of aaRSs with multifunctional factors (p38, p18) [56,57]. Additional data show that deletion of the GST homology domain from the C-terminal region of p38 results in the dissociation of EPRS (glutamyl-prolyl-tRNA synthetase) and MetRS from the MSC.
Mammalian ValRS associated with elongation factor subunits also contain the GST homology domain [60-62]. Thus, the presence of GST domains might be a crucial feature of aaRSs. P. falciparum proteome has two such proteins with GST domains appended to MetRS (PF10_0340) and GluRS (PF13_0257). We also found a most interesting fusion of anticodon binding domain with a serine-threonine kinase (PF14_0423) in P. falciparum. This unusual kinase seems to be expressed throughout the life cycle of parasite (microarray data) and interestingly is predicted to be localized to the parasite nucleus. Clearly, the presence of unusual domain fusions in P. falciparum aaRSs suggests multiple functional roles for many of these P. falciparum enzymes as has been shown in other organisms.
Overall, the percentage identity between matching human and P. falciparum aaRS domains varies from 17% to 51%. Pf-aaRSs which have low sequence identity with their human counterparts might serve as good drug targets. In order to study evolutionary relationships of P. falciparum aaRSs with other species, phylogenetic trees were built in PHYML using the maximum likelihood method. For each type of P. falciparum aaRS a separate tree was constructed (see additional file 2). aaRS sequences from 102 different species were used for multiple sequence alignments. As an example, a phylogenetic tree of TyrRS from various species (including two sequences from P. falciparum) was constructed. Interestingly, one Pf-TyrRS (MAL8P1.125) clustered with human TyrRS whereas the second Pf-TyrRS (PF11_0181) clustered with bacterial TyrRS, indicating different evolutionary origins (Figure 6a). Based on distance matrices, several P. falciparum aaRS sequences clustered as being closer to plants (A. thaliana) or to bacteria (E. coli) (Figure 6b). It is already known that apicomplexan parasites like P. falciparum house a secondary endosymbiotic plastid, possibly acquired from an alga and accompanied by lateral gene transfer. Therefore, the P. falciparum aaRS sequences which are evolutionarily close to bacteria and plants are likely to be the outcome of horizontal gene transfer from the plastid. P. falciparum contains ~12 such aaRS sequences which cluster with bacterial or plant sequences. Functional and structural characterization of these bacterial/plant-like aaRSs may be relevant in focusing efforts at using aaRSs as drug targets.
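The pairwise percentage identity used above can be computed from two aligned sequences with a few lines of Python. This is a sketch under one stated assumption: identity is counted over alignment columns in which neither sequence has a gap. Other denominators (alignment length, shorter sequence length) are also in common use and the text does not say which was applied. The example sequences are toy strings, not real aaRS domains.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x.upper() == y.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared

# Toy aligned fragments (hypothetical).
identity = percent_identity("KMSKS-A", "KMSSSGA")
print(round(identity, 2))
```

In the toy example six columns are compared and five match, giving an identity of about 83%.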
To date, no crystal structures have been obtained for any aaRS from P. falciparum. Hence, we performed homology modeling of several P. falciparum aaRSs using homologous structures available in the PDB. Known structural templates (≥ 40% identity) were used for molecular modeling of several P. falciparum aaRSs including the two TyrRSs (PF11_0181, MAL8P1.125), the PheRS (PFA0480w), ThrRS (PF11_0270), LysRS (PF13_0262), MetRS (PF10_0340) and TrpRS (PF13_0205). The program Align2D (sequence alignment module in Modeller) was used to perform dynamic programming-based global alignments of the target and template sequences. This program uses a variable gap penalty for structural loops and core regions using information derived from template structures. We found key differences in the conserved motifs in various aaRSs. For example, the class I motif 'KYSKS' in P. falciparum TyrRS (PF11_0181) and 'KMSKS' in MAL8P1.125 differ from 'KLGKS' of human mitochondrial TyrRS (2PID) and 'KMSSS' of human cytoplasmic TyrRS (1N3L) respectively. Similarly, the class I motif 'HIGH' has subtle sequence variations between P. falciparum and H. sapiens TyrRSs (Figure 7a, Table 3). Using the above procedures, we could generate structural models for several Pf-aaRSs. Stereo-chemical qualities of the generated protein models were assessed using PROCHECK (85-90% of residues are in allowed regions of the Ramachandran plot). The overall superimposed three-dimensional models were visualized in CHIMERA and PYMOL (Figure 7b). Many sequence insertions were observed for P. falciparum enzymes when compared to their homologues. The location of insertions in P. falciparum TyrRS between well-conserved secondary structures suggests the ability of the TyrRS anticodon binding core to accommodate larger sequence inserts with minimum disruption to the catalytic domain. Direct comparison of modeled P. falciparum aaRSs with human aaRSs revealed several other important structural differences.
For example, numerous insertions are present in the loop regions linking various α-helices (α10 to α13) in the anticodon binding domain of P. falciparum TyrRSs (PF11_0181 and MAL8P1.125) when compared to their human homologues (2PID and 1N3L) respectively. Structural differences between TyrRS (from P. falciparum) and human counterparts are summarized in Table 3 and shown in Figure 7c. These subtle structural changes that manifest as partial conservation of important motifs in P. falciparum aaRSs reflect evolutionary divergence, and may be useful for exploitation of parasite-specific features as drug targets.
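The motif differences quoted for the TyrRS pairs can be made explicit with a small position-by-position comparison. The motif strings and their pairing with the human templates are taken from the text; the function name `motif_differences` is a hypothetical helper introduced for illustration.

```python
def motif_differences(motif_a, motif_b):
    """List (1-based position, residue_a, residue_b) where motifs differ."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must have the same length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(motif_a, motif_b))
            if a != b]

# Parasite motif versus the motif of the corresponding human template.
pairs = {
    "PF11_0181 vs 2PID": ("KYSKS", "KLGKS"),
    "MAL8P1.125 vs 1N3L": ("KMSKS", "KMSSS"),
}

differences = {name: motif_differences(a, b) for name, (a, b) in pairs.items()}
for name, diffs in differences.items():
    print(name, diffs)
```

The first pair differs at positions 2 and 3 of the motif and the second pair at position 4, while the flanking lysine and the final serine are shared in both comparisons.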
Aminoacyl-tRNA synthetases (aaRSs) link RNA with protein translation. Besides their key role in protein synthesis, aaRSs are also integral to various other cellular processes. aaRS enzymes have been the focus of antimicrobial drug discovery [64,65]. An example of the clinical application of an aaRS inhibitor is provided by the antibiotic mupirocin (marketed as Bactroban), which selectively inactivates bacterial isoleucyl-tRNA synthetase. Similarly, it has been shown that the broad-spectrum antifungal 5-fluoro-1,3-dihydro-1-hydroxy-2,1-benzoxaborole (AN2690) inhibits yeast cytoplasmic leucyl-tRNA synthetase by blocking the editing site of the enzyme [67,68]. Therefore, the presence of distinct or tinkered P. falciparum aaRSs offers an opportunity for their exploitation as new drug targets against malaria. In this study, we have extensively analyzed aaRS sequences from Plasmodium species in terms of their mRNA/protein expression profiles, their cellular localization, their organelle targeting and their unique sequence/domain attributes. We have discovered several distinct aaRSs in P. falciparum with no clear human counterparts in terms of their overall domain structures. We have also highlighted deviations in some highly conserved sequence motifs and active site sequence clusters. Our analyses show that a larger fraction of the P. falciparum proteome is devoted to aaRSs when compared with many other organisms. The phylogenetic data hint at evolutionary closeness of some Pf-aaRSs to bacteria and plants, which is consistent with secondary endosymbiosis in this apicomplexan. We hope that our in-depth phylogenetic, protein targeting, domain architecture, protein expression profiling and homology modeling data on Pf-aaRSs can be used as a platform for experimental studies of this important protein family in malaria parasites.
The P. falciparum genome database PlasmoDB Release 5.4 was used for the present analyses. Sequence sets of aaRSs were assembled from the following organisms: P. berghei, P. chabaudi, P. falciparum, P. knowlesi, P. yoelii, P. vivax, H. sapiens, M. tuberculosis, D. discoideum, M. jannaschii, R. norvegicus, C. parvum, B. bovis, S. cerevisiae, D. melanogaster, Y. pestis, T. aquaticus, S. pneumoniae, S. enterica, E. coli, A. thaliana, A. pisum, A. salmonicida, B. cereus, B. thuringiensis, B. afzelii, B. burgdorferi, B. garinii, B. valaisiana, Bradyrhizobium, B. pennsylvanicus, C. acidaminovorans, H. defensa, C. taiwanensis, E. fergusonii, F. bacterium, F. novicida, F. tularensis, F. alni, G. tenuistipitata, H. arsenicoxydans, A. cellulolyticus, A. chlorophenolicus, A. ferrooxidans, Algoriphagus, A. muciniphila, Anaeromyxobacter, A. thermophilum, B. ambifaria, B. indica, B. mycoides, B. taurus, B. tribocorum, C. atlanticus, Caulobacter, C. aurantiacus, C. cellulolyticum, Citrobacter, C. pinensis, C. Ruthia, Cyanothece, D. desulfuricans, D. hafniense, Diaphorobacter, D. shibae, D. turgidum, E. cuniculi, E. lenta, E. ruminantium, Exiguobacterium, G. diazotrophicus, Geobacillus, M. maris, N. multipartita, Nocardioides, O. terrae, P. abelii, P. atlantica, P. denitrificans, P. ingrahamii, P. lavamentivorans, R. castenholzii, S. arenicola, S. fumaroxidans, X. autotrophicus, V. vadensis, V. paradoxus, T. whipplei, T. auensis, S. stellata, Ch. parvum, S. heliotrinireducens, Silicibacter, S. putrefaciens, S. usitatus, Thauera, X. laevis, Theileria annulata, Vibrio fischeri, W. succinogenes, X. tropicalis, Zea mays. Additional sequences were obtained based on sequence similarity via the NCBI BLAST and ENSEMBL databases. Known sequence motifs of aaRSs were used as templates to retrieve aaRS sequences from other organisms. Some aaRS sequences were manually annotated based on the presence of signature motifs.
Protein domains and motifs in the predicted aaRSs were identified using the following programs - Superfamily, SMART and MotifScan, available at the ExPASy web server. The following databases - Pfam, TIGR, PIR, EBI and PlasmoDB - were also extensively used. Hidden Markov Models (HMMs) for each of the 20 aaRSs were constructed with the software package Sequence Alignment and Modeling System version 2.2.1 (SAM), exploiting sequences in the aaRS database. HMM profiles were then used to carry out database searches against P. falciparum proteins. A score was assigned to each protein by calculating the probability that the corresponding sequence is generated by the HMM; hence for each database search a score distribution was obtained. The score distributions were normalized and 4 ranges of values were considered to identify aaRSs (c > 5, 10 < c < 20, 20 < c < 50, c < 50).
The prediction of signal sequences for cellular localization in P. falciparum was performed using various available online web-servers - MITOPROT, PredictNLS and PATS for mitochondria, nucleus and apicoplast respectively. PEXEL motif prediction was carried out by querying PlasmoDB. To identify specific gene expression profiles, we combined information from different data sets. For the spotted oligonucleotide array data, only half of the 48 time points of the intra-erythrocytic cycle are shown for simplicity, and ratios (versus a common reference) were log2-transformed prior to cluster analysis. For the photolithography data, CEL files were downloaded from website and transferred into the Bioconductor package for analysis using a robust multi-array averaging algorithm (RMA) for background adjustment and quantile normalization. Genes whose expression level was less than 10 (too close to background) or whose logP was greater than -0.5 (too few probes per gene) were removed from the dataset. Total intensity values for each time point were converted to mean-centered ratios by dividing the total intensity by the average intensity for that gene across all experimental conditions and were then log2-transformed prior to clustering. These data manipulations were necessary because the oligonucleotide array data were collected as the intensity ratio between the experimental sample and a common reference, while the photolithography data were collected as the total signal intensity at each spot. Gene expression patterns in which fewer than 80% of the values were present were eliminated from the rest of the analysis. The remaining missing values were replaced using the KNN-imputation method.
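The mean-centering, log2 transformation and 80% presence filter described above can be sketched in plain Python. The function names and the example intensities are hypothetical; missing values are represented as None, and the KNN imputation step is deliberately not reproduced here.

```python
import math

def mean_centered_log2(intensities):
    """Divide each intensity by the gene's mean, then take log2.

    Missing values (None) are ignored when computing the mean and are
    passed through unchanged.
    """
    present = [v for v in intensities if v is not None]
    if not present or any(v <= 0 for v in present):
        raise ValueError("need at least one value, all strictly positive")
    mean = sum(present) / len(present)
    return [None if v is None else math.log2(v / mean) for v in intensities]

def passes_presence_filter(values, min_fraction=0.8):
    """Keep a profile only if enough of its values are present."""
    if not values:
        return False
    present = sum(v is not None for v in values)
    return present / len(values) >= min_fraction

# Hypothetical total intensities for one gene across four conditions.
profile = mean_centered_log2([100.0, 200.0, 400.0, 100.0])
print(profile)
```

After this transformation a value of 0 means the gene is at its own average level in that condition, which puts the total-intensity data on a scale comparable to log2 ratios against a common reference.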
To explore the evolutionary relationships amongst aaRSs, phylogenetic analyses were performed for each P. falciparum aaRS on an expanded set of 102 sequences. Multiple sequence alignments (MSAs) of these sequences were obtained from CLUSTALW with default parameters (performed locally) in PHYLIP format. These MSAs were used as input to run PHYML_v2.4.4 using the Jones-Taylor-Thornton (JTT) model. The resulting file was further used in MEGA4.2 for visualization of trees.
We used Sali's Modeller8v2 tool for building models of various P. falciparum aaRSs. The stereo-chemical quality of the modeled proteins was verified by PROCHECK. Structural mapping of active site residues and other motifs was performed using CHIMERA and PYMOL.
TKB, CK and SK carried out the computational experiments and data analysis and wrote the paper; MAJ and VS contributed to the manuscript writing; DS and EP carried out HMM construction and database search by HMM; FS performed analysis of transcriptomic and proteomic data; AS designed the study and supervised the work. All authors have read and approved the final manuscript.
List of Pf-aaRSs categorized into class I, class II, and related proteins. Gene ID, gene location, description of product and its length are given.
Phylogenetic trees of aaRSs from P. falciparum. The evolutionary trees were constructed with PHYML and displayed using MEGA 4.0. P. falciparum aaRSs are labeled with green triangles. The 102 species considered for the evolutionary analysis are taken from the three domains of life and are the same organisms listed for the aaRS sequence sets in the methods.
TKB, CK and AS are supported by grants from the Department of Biotechnology, Govt. of India. SK is supported by a MEPHITIS grant. This work has been conducted as part of the MEPHITIS project and partially funded by the European Commission (Grant Agreement no: HEALTH-F3-2009-223024).