Full-malaria (http://fullmal.ims.u-tokyo.ac.jp), a database of full-length cDNAs from the human malaria parasite Plasmodium falciparum, has been updated in three main respects. (i) We added 8934 sequences from new libraries, so that our collection of 11 424 full-length cDNAs now covers 1375 (25%) of the estimated 5409 parasite genes. (ii) All of our full-length cDNAs and GenBank EST sequences were mapped to the genomic sequences together with publicly available annotated genes and other predictions. This mapping precisely determined the gene structures and the positions of the transcriptional start sites, which are indispensable for identifying promoter regions. (iii) A total of 4257 cDNA sequences were newly generated from the murine malaria parasite Plasmodium yoelii yoelii. The genome and cDNA sequences were compared with those of P. falciparum at both the nucleotide and amino acid levels, and the sequence alignment for each gene is presented graphically. This part of the database serves as a versatile platform for elucidating the function(s) of malaria genes by a comparative genomic approach. Finally, all of the cDNAs represented in this database are supported by physical cDNA clones, which are publicly and freely available and should serve as indispensable resources for functional analyses of malaria genomes.
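As a rough illustration of how transcriptional start sites can be read off mapped oligo-capped cDNAs, the Python sketch below takes the 5' end of each alignment (the start coordinate on the plus strand, the end coordinate on the minus strand) and reports the most frequent 5' end per gene. The tuple layout and the `call_tss` name are hypothetical, the sketch assumes alignments have already been assigned to genes, and it is not the Full-malaria pipeline.

```python
from collections import Counter, defaultdict


def five_prime_end(start, end, strand):
    """Return the 5' end of an alignment given 1-based inclusive coordinates."""
    return start if strand == "+" else end


def call_tss(alignments):
    """Return the most frequent cDNA 5' end for each (gene, chromosome, strand).

    alignments: iterable of (gene_id, chrom, start, end, strand) tuples
    (hypothetical layout; alignments are assumed to be pre-assigned to genes).
    """
    ends = defaultdict(Counter)
    for gene_id, chrom, start, end, strand in alignments:
        ends[(gene_id, chrom, strand)][five_prime_end(start, end, strand)] += 1
    # most_common(1) returns [(position, count)] for the top-ranked 5' end
    return {key: counter.most_common(1)[0][0] for key, counter in ends.items()}


alignments = [
    ("geneA", "chr1", 100, 500, "+"),
    ("geneA", "chr1", 100, 450, "+"),
    ("geneA", "chr1", 102, 480, "+"),
    ("geneB", "chr2", 300, 900, "-"),
    ("geneB", "chr2", 310, 900, "-"),
]
print(call_tss(alignments))
```

In this toy input the plus-strand gene is assigned position 100 and the minus-strand gene position 900. Real data would also need a tolerance window, because capped 5' ends often scatter over several nearby positions.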
Motivation: The sequencing of the genome of Plasmodium yoelii, a model rodent malaria parasite, has greatly facilitated research aimed at developing new drug and vaccine candidates against malaria. Unfortunately, only preliminary gene models were annotated on the partially sequenced genome, mostly by in silico gene prediction, and the annotation has not been substantially improved since 2002.
Results: Here we report a systematic assessment of the accuracy of the genome annotation based on a detailed analysis of a comprehensive set of cDNA sequences and proteomics data. We found that the coverage of the current annotation tends to be biased toward genes expressed in the blood stages of the parasite life cycle. Based on our proteomic analysis, we estimate that about 15% of the liver-stage proteome data we generated are absent from the current annotation. Through comparative analysis, we identified and manually curated a further 510 P. yoelii genes that have clear orthologs in the P. falciparum genome but were either missing from or incorrectly annotated in the current annotation. This study suggests that improvements to the current P. yoelii genome annotation should focus on genes expressed in stages other than the blood stages. Comparative analysis will be critical for this re-annotation. The addition of newly annotated genes will facilitate the use of P. yoelii as a model system for studying human malaria.
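Orthologs are often identified with a reciprocal best hit criterion. The Python sketch below shows that idea, assuming precomputed best-hit tables (for example from BLASTP searches in each direction); the dictionaries, gene IDs and the `reciprocal_best_hits` name are illustrative assumptions, and the study may have used a different orthology criterion.

```python
def reciprocal_best_hits(best_ab, best_ba):
    """Return (a, b) pairs where a's best hit in genome B is b and b's best hit in A is a.

    best_ab, best_ba: dicts mapping a query gene ID to its best-hit gene ID
    in the other genome (assumed precomputed, e.g. from BLASTP).
    """
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical IDs: P. yoelii genes (PY*) searched against P. falciparum genes (PF*).
py_to_pf = {"PY1": "PF1", "PY2": "PF2", "PY3": "PF9"}
pf_to_py = {"PF1": "PY1", "PF2": "PY7"}
print(reciprocal_best_hits(py_to_pf, pf_to_py))
```

Only PY1/PF1 is reported here: PF2's best hit points back to a different gene, and PF9 has no recorded best hit.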
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Apicomplexan parasites cause various diseases, including malaria, and have been targets of extensive genome sequencing. To improve on the current genome annotations and to demonstrate the value of physical cDNA clone resources, we generated large-scale collections of full-length cDNAs for several apicomplexan parasites, including 5'-EST collections for six species, using our full-length oligo-capping cDNA library method.
In this study, we used a total of 61,056 single-pass 5'-end cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partial cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of previously unidentified transcripts. For 732 T. gondii cDNAs, the entire sequences were determined in order to evaluate the annotated gene models at the level of complete full-length transcripts. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified 140 previously unidentified transcripts in the intergenic regions of the current gene annotations and confirmed them by RT-PCR. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes.
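The exon-level comparison above can be approximated by asking, for each annotated exon, whether any aligned cDNA block overlaps it. The Python sketch below computes the fraction of exons with no such support. Treating any overlap as "support" is an assumption made for illustration (a stricter study might require matching exon boundaries), and the tuple layout and function names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end


def unsupported_exon_fraction(exons, cdna_blocks):
    """Fraction of annotated exons not overlapped by any aligned cDNA block.

    exons, cdna_blocks: lists of (chrom, start, end) tuples, 1-based inclusive.
    A simple all-against-all scan; an interval tree would scale better.
    """
    if not exons:
        return 0.0
    unsupported = sum(
        1
        for chrom, start, end in exons
        if not any(c == chrom and overlaps(start, end, s, e) for c, s, e in cdna_blocks)
    )
    return unsupported / len(exons)


exons = [("chr1", 1, 100), ("chr1", 200, 300), ("chr1", 400, 500), ("chr2", 1, 50)]
cdna_blocks = [("chr1", 50, 150), ("chr1", 450, 460)]
print(unsupported_exon_fraction(exons, cdna_blocks))
```

In the toy data, two of the four exons have no overlapping cDNA block, so the function returns 0.5.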
Our data indicate that the current gene models are likely still incomplete and leave much room for improvement. Our unique full-length cDNA information is especially useful for further refining the genome annotations of apicomplexan parasites.
The DNA amplification process can be a source of bias and artifacts, especially when amplifying genomic regions with extreme AT or GC content. The human malaria parasite Plasmodium falciparum has an AT-rich genome, and some of its highly AT-rich regions have been shown to be refractory to polymerase chain reaction (PCR) amplification. Biased amplification may lead to erroneous conclusions in studies of genome-wide gene expression, nucleosome positioning and copy number variation. Here we compare genome-wide nucleosome coverage in libraries amplified at three different extension temperatures and show that reducing the PCR extension temperature from 70 °C to 60 °C can greatly increase the fraction of coverage in AT-rich regions of the P. falciparum genome. Our method will improve the efficiency and coverage of sequencing AT-rich genomes.
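One simple way to quantify the "fraction of coverage at AT-rich regions" is to split the genome into fixed windows, flag windows above an AT threshold, and count the reads that start in flagged windows. The Python sketch below does this for two hypothetical libraries. The 100-bp window, the 0.9 AT threshold and the read positions are illustrative assumptions, not parameters from the study.

```python
def at_fraction(seq):
    """Fraction of A/T bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0


def at_rich_read_fraction(genome, read_starts, window=100, threshold=0.9):
    """Fraction of reads whose 0-based start falls in a window with AT content >= threshold."""
    if not read_starts:
        return 0.0
    at_rich_windows = {
        w // window
        for w in range(0, len(genome), window)
        if at_fraction(genome[w:w + window]) >= threshold
    }
    hits = sum(1 for pos in read_starts if pos // window in at_rich_windows)
    return hits / len(read_starts)


# Toy genome: one fully AT window followed by one fully GC window.
genome = "A" * 100 + "GC" * 50
library_70c = [5, 10, 150, 160]  # hypothetical read starts, 70 °C extension
library_60c = [5, 10, 20, 150]   # hypothetical read starts, 60 °C extension
print(at_rich_read_fraction(genome, library_70c), at_rich_read_fraction(genome, library_60c))
```

With these toy reads the fractions are 0.5 and 0.75, which is the kind of shift the comparison between extension temperatures would look for. Real analyses would normalize for library size and use coverage depth rather than read starts alone.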
new generation sequencing; malaria; genome; amplification bias; nucleosome
It has been shown that nearly a quarter of the initially predicted gene models in the Plasmodium falciparum genome contain errors. Although there have been efforts to obtain complete cDNA sequences to correct these errors, cDNA coverage of the predicted genes is still incomplete, and many gene models, particularly those of genes expressed in sexual or mosquito stages, have not been validated. Antisense transcripts have been widely reported in P. falciparum; however, the extent and pattern of antisense transcription across developmental stages remain largely unknown.
We sequenced seven bidirectional libraries from ring, early and late trophozoite, schizont, gametocyte II, gametocyte V, and ookinete stages, and four strand-specific libraries from late trophozoite, schizont, gametocyte II, and gametocyte V stages of 3D7 parasites. Alignment of the cDNA sequences to the 3D7 reference genome revealed stage-specific antisense transcripts and novel intron-exon splicing junctions. Sequencing of the strand-specific cDNA libraries suggested that more genes are expressed in only one direction in gametocytes than in schizonts. Alternatively spliced genes, antisense transcripts, and stage-specifically expressed genes were also characterized.
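With a strand-specific library, each read can be labeled as sense or antisense relative to the annotated strand of the gene it falls in. The Python sketch below counts both categories per gene. It assumes that read strands have already been converted to the strand of the original RNA (how to do this depends on the library protocol), and the tuple layout and function name are hypothetical.

```python
from collections import Counter


def count_sense_antisense(reads, genes):
    """Count sense and antisense reads per gene for a strand-specific library.

    reads: (chrom, pos, strand) tuples, with strand already converted to the
           strand of the original RNA (this depends on the library protocol).
    genes: dict gene_id -> (chrom, start, end, strand), 1-based inclusive.
    """
    counts = {gene_id: Counter() for gene_id in genes}
    for chrom, pos, strand in reads:
        for gene_id, (g_chrom, start, end, g_strand) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                label = "sense" if strand == g_strand else "antisense"
                counts[gene_id][label] += 1
    return counts


genes = {"gene1": ("chr1", 100, 200, "+")}
reads = [("chr1", 150, "+"), ("chr1", 160, "-"), ("chr1", 170, "-"), ("chr1", 300, "+")]
print(count_sense_antisense(reads, genes))
```

The read at position 300 lies outside the gene and is ignored. A non-directional (bidirectional) library cannot make this distinction, because each read's strand is not tied to the RNA strand.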
It is necessary to continue sequencing cDNA from different developmental stages, particularly non-erythrocytic stages. The presence of antisense transcripts for some gametocyte and ookinete genes suggests that these antisense RNAs may play an important role in regulating gene expression and parasite development. Future gene expression studies should use directional cDNA libraries. Antisense transcripts may partly explain the observed discrepancy between mRNA and protein expression levels.
Malaria is the most important parasitic disease in the world, with approximately two million people dying every year, mostly from Plasmodium falciparum infection. During its complex life cycle in the Anopheles vector and the human host, the parasite requires the coordinated and modulated expression of diverse sets of genes involved in epigenetic, transcriptional and post-transcriptional regulation. However, despite the availability of the complete P. falciparum genome sequence, little is known about the mechanisms of transcriptional gene regulation in Plasmodium. This is due to the poor prediction of nuclear proteins, cognate DNA motifs and structures involved in transcription.
A comprehensive directory of proteins reported to be potentially involved in the Plasmodium transcriptional machinery was built from all in silico reports and databanks. The transcription-associated proteins were grouped into three main sets of factors: general transcription factors, chromatin-related proteins (structuring, remodelling and histone-modifying enzymes), and specific transcription factors. Only a few of these factors have been analysed at the molecular level. Furthermore, using transcriptome and proteome data, we modelled the expression patterns of transcripts and the corresponding proteins during the intra-erythrocytic cycle. Finally, an interactome of these proteins, based on either in silico or yeast two-hybrid experimental approaches, is discussed.
This is the first attempt to build a comprehensive directory of potential transcription-associated proteins in Plasmodium. In addition, all complete transcriptome, proteome and interactome raw data were re-analysed, compared and discussed to better understand the complex biological processes of Plasmodium falciparum transcriptional regulation during erythrocytic development.
Alternative splicing contributes significantly to the complexity of the human transcriptome and proteome. Computational prediction of alternative splice isoforms is usually based on EST sequences, which also allow the expression pattern of the related transcripts to be approximated. However, the limited number of tissues represented in the EST data, as well as the different cDNA construction protocols, may limit the ability of ESTs to reveal tissue-specifically expressed transcripts.
We predict tissue- and tumor-specific splice isoforms based on the genomic mapping (SpliceNest) of the EST consensus sequences and the library annotation provided in the GeneNest database. We further identify potentially rare tissue-specific transcripts as those represented only by ESTs derived from normalized libraries. A subset of the predicted tissue- and tumor-specific isoforms is then validated by RT-PCR across 40 tissue types.
Our strategy revealed 427 genes with at least one tissue-specific transcript as well as 1120 genes showing tumor-specific isoforms. While our experimental evaluation of computationally predicted tissue-specific isoforms had a high success rate in confirming the expression of these isoforms in the respective tissue, the strategy frequently failed to detect the expected restricted expression pattern. The analysis of putatively low-abundance transcripts using normalized cDNA libraries suggests that our ability to detect tissue-specific isoforms strongly depends on the expression level of the respective transcript as well as on the sensitivity of the experimental methods. In particular, splice isoforms predicted to be disease-specific tend to represent transcripts that are expressed in a set of healthy tissues rather than novel isoforms.
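The library-based rule described above can be sketched as follows: an isoform is called tissue-specific if all of its supporting ESTs come from libraries of a single tissue, and flagged as potentially rare if all of them come from normalized libraries. The Python code below is a minimal illustration under those assumptions; the library table, IDs and function name are hypothetical, and the real SpliceNest/GeneNest annotation is richer than a (tissue, normalized) pair.

```python
def classify_isoform(est_libraries, library_info):
    """Label an isoform from the cDNA libraries of its supporting ESTs.

    est_libraries: list of library IDs, one per supporting EST.
    library_info: dict library ID -> (tissue, is_normalized).
    Returns (tissue, rare): tissue is set only if every EST comes from one tissue;
    rare is True if every EST comes from a normalized library.
    """
    if not est_libraries:
        return None, False
    tissues = {library_info[lib][0] for lib in est_libraries}
    tissue = next(iter(tissues)) if len(tissues) == 1 else None
    rare = all(library_info[lib][1] for lib in est_libraries)
    return tissue, rare


library_info = {"L1": ("brain", False), "L2": ("brain", True), "L3": ("liver", True)}
print(classify_isoform(["L1", "L2"], library_info))
print(classify_isoform(["L2"], library_info))
print(classify_isoform(["L2", "L3"], library_info))
```

As the results above caution, an isoform that passes this filter is only a candidate. RT-PCR across many tissues is still needed to confirm that its expression is actually restricted.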
We propose to combine the computational prediction of alternative splice isoforms with experimental validation for efficient delineation of an accurate set of tissue-specific transcripts.
FULL-malaria is a database for a full-length-enriched cDNA library from the human malaria parasite Plasmodium falciparum (http://18.104.22.168/). Because of its medical importance, this organism was the first eukaryotic pathogen targeted for genome sequencing; the sequences of two of its 14 chromosomes have already been determined. However, to fully exploit this rapidly accumulating information, correct identification of the genes and study of their expression are essential. Using the oligo-capping method, we produced a full-length-enriched cDNA library from erythrocytic-stage parasites and performed single-pass sequencing. The database consists of the nucleotide sequences of 2490 random clones, 390 (16%) of which correspond to known malaria genes according to BLASTN analysis against the GenBank nr-nt database; these represent 98 genes, and the clones for 48 (49%) of these genes contain the complete protein-coding sequence. In addition, comparison with the complete chromosome 2 sequence revealed that 35 of its 210 predicted genes are expressed and led to the detection of three previously unknown gene candidates. In total, 19 of these 38 clones (50%) were full-length. From these observations, the database is expected to contain ∼1000 genes, including 500 full-length clones. It should be an invaluable resource for the development of vaccines and novel drugs.
The development of Plasmodium falciparum within human erythrocytes induces a wide array of changes in the ultrastructure, function and antigenic properties of the host cell. Numerous proteins encoded by the parasite have been shown to interact with the erythrocyte membrane. The identification of new interactions between human erythrocyte and P. falciparum proteins is a key area of malaria research. To circumvent the difficulties posed by conventional protein techniques, a novel application of phage display technology was used.
P. falciparum phage display libraries were created and biopanned against purified erythrocyte membrane proteins. Interacting, in-frame amino acid sequences were identified by sequencing the parasite cDNA inserts and performing bioinformatic analyses with the PlasmoDB database.
Following four rounds of biopanning, sequencing and bioinformatic investigation, seven P. falciparum proteins with significant binding specificity toward human erythrocyte spectrin and protein 4.1 were identified. The specificity of these P. falciparum proteins was demonstrated by the marked enrichment of the respective in-frame binding sequences in a fourth-round phage display library.
The construction and biopanning of P. falciparum phage display expression libraries provide a novel approach for the identification of new interactions between the parasite and the erythrocyte membrane.
Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some GenBank dbEST sequences have proven to be “unclean”. Identifying cDNA termini and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcript ends, but also furthers our understanding of the potential sources of data abnormalities and errors introduced during the wet-lab procedures for cDNA library construction.
After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure pattern. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors, such as: failure of one or both restriction enzymes to cut the plasmid vector; failure of the restriction enzymes to cut the vector at the correct positions; insertion of two cDNA inserts into a single vector; insertion of multiple and/or concatenated adapters/linkers; and the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa, among others. Close examination of these artifacts showed that many problematic ESTs deposited in public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs to dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found that 7.4% of the Pinus taeda and 29.2% of the Arachis hypogaea GenBank ESTs were “unclean” or abnormal, all of which could be cleaned or filtered by AFST.
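A greatly simplified form of terminus pattern analysis classifies each raw read by where copies of a known adapter occur: back-to-back copies suggest concatenated adapters, and a copy deep inside the read suggests two ligated inserts. The Python sketch below implements only these two rules. The adapter sequence, the 50-bp end window and the category names are hypothetical, and the code does not reproduce AFST's actual patterns (for example, it ignores restriction sites and vector sequence).

```python
import re

# Hypothetical adapter sequence, for illustration only.
ADAPTER = "GAATTCGGCACGAG"


def terminus_pattern(read, adapter=ADAPTER, end_window=50):
    """Classify a raw EST read by where copies of an adapter occur.

    Returns 'no_adapter', 'concatenated_adapters', 'possible_chimera' or 'normal'.
    An adapter counts as internal if it lies more than end_window bases from both ends.
    """
    hits = [m.start() for m in re.finditer(re.escape(adapter), read)]
    if not hits:
        return "no_adapter"
    # Back-to-back copies suggest concatenated adapters/linkers.
    if any(b - a == len(adapter) for a, b in zip(hits, hits[1:])):
        return "concatenated_adapters"
    # An adapter deep inside the read suggests two inserts in one vector.
    if any(end_window < h and h + len(adapter) < len(read) - end_window for h in hits):
        return "possible_chimera"
    return "normal"


insert = "ACGT" * 30
for name, read in [
    ("normal", "TT" + ADAPTER + insert),
    ("concatenated", ADAPTER * 2 + insert),
    ("chimera", insert[:80] + ADAPTER + insert[:80]),
    ("none", insert),
]:
    print(name, terminus_pattern(read))
```

Exact string matching is a simplification. Sequencing errors in adapters are common, so a practical tool would allow mismatches.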
cDNA terminal pattern analysis, as implemented in the AFST software tool, can be used to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, and improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, and can therefore greatly benefit downstream EST-based applications.
cDNA terminus; cDNA library construction; Pattern analysis; Restriction enzyme cutting abnormality; Chimeric EST sequences
Because the Plasmodium falciparum genome is AT-rich, the presence of GC-rich regions suggests functional significance. Evolution imposes selection pressure to retain functionally important coding and regulatory elements. Hence, searching for evolutionarily conserved, GC-rich intergenic regions in an AT-rich genome will help in discovering new coding regions and regulatory elements. We used elevated GC content in intergenic regions, coupled with sequence conservation with P. reichenowi, a close evolutionary relative of P. falciparum, to identify potential sequences of functional importance. Interestingly, ~30% of the GC-rich, conserved sequences were associated with antigenic proteins encoded by var and rifin genes. The majority of sequences identified in the 5′ UTR of var genes are represented by short expressed sequence tags (ESTs) in cDNA libraries, indicating that they are transcribed in the parasite. Additionally, 19 sequences were located in the 3′ UTR of rifins, and 4 of these also have overlapping ESTs. Further analysis showed that several sequences associated with var genes have the capacity to encode small peptides. A previous report has shown that upstream peptides can regulate the expression of var genes; hence, we propose that these conserved GC-rich sequences may play roles in the regulation of gene expression.
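The first filtering step, finding intergenic stretches with elevated GC content, can be sketched as a sliding-window scan. In the Python code below, the window size, step and 35% GC cutoff are illustrative assumptions chosen to stand out against the very low GC background of P. falciparum intergenic DNA. The conservation filter against P. reichenowi would be a separate step and is not shown.

```python
def gc_rich_windows(seq, window=50, step=10, min_gc=0.35):
    """Return (start, gc_fraction) for windows whose GC fraction is >= min_gc.

    Coordinates are 0-based; a trailing partial window is skipped.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        if gc >= min_gc:
            hits.append((start, gc))
    return hits


# Toy intergenic sequence: an AT-rich background with a 50-bp GC-rich island.
intergenic = "A" * 50 + "GC" * 25 + "A" * 50
print([start for start, _ in gc_rich_windows(intergenic)])
```

Overlapping windows that pass the cutoff would normally be merged into a single candidate region before the conservation check.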
Plasmodium; regulatory elements; comparative genomics; genome bias; antigenic variation
The Plasmodium falciparum genome (3D7 strain), published in 2002, revealed ~5,400 genes, mostly based on in silico predictions. Experimental data are therefore required for structural and functional assessments of P. falciparum genes and their expression, and polymorphism data are also needed to exploit genomic information to further qualify therapeutic target candidates. Here, we undertook a large-scale analysis of a P. falciparum FcB1 schizont EST library, previously constructed by suppression subtractive hybridization (SSH) to study genes expressed during merozoite morphogenesis, with the aim of: 1) obtaining an exhaustive collection of schizont-specific ESTs, 2) experimentally validating or correcting P. falciparum gene models and 3) pinpointing genes displaying protein polymorphism between the FcB1 and 3D7 strains.
A total of 22,125 clones randomly picked from the SSH library were sequenced, yielding 21,805 usable ESTs that were then clustered on the P. falciparum genome. This allowed the identification of 243 protein-coding genes, including 121 previously annotated as hypothetical. Statistical analysis of GO terms, where available, indicated significant enrichment in genes involved in "entry into host-cells" and "actin cytoskeleton". Although most ESTs do not span full-length gene reading frames, detailed comparison of FcB1 ESTs with 3D7 genomic sequences allowed confirmation of exon/intron boundaries in 29 genes, detection of new boundaries in 14 genes and identification of protein polymorphism in 21 genes. In addition, a large number of non-protein-coding ESTs were identified, mainly matching the two A-type rRNA units (on chromosomes 5 and 7) and, to a lesser extent, two atypical rRNA loci (on chromosomes 1 and 8), TARE subtelomeric regions (several chromosomes) and the recently described telomerase RNA gene (chromosome 9).
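Confirming or correcting exon/intron boundaries amounts to comparing the introns implied by spliced EST alignments with those of the gene model. The Python sketch below sorts introns into confirmed (present in both), new (EST only) and unsupported (model only). It assumes intron coordinates have already been extracted from the alignments, and the tuple layout and function name are hypothetical.

```python
def compare_introns(est_introns, model_introns):
    """Compare introns inferred from spliced EST alignments with those of a gene model.

    Both arguments are iterables of (chrom, start, end) tuples.
    Returns (confirmed, new, unsupported) as sorted lists:
    confirmed = in both, new = EST only, unsupported = model only.
    """
    est, model = set(est_introns), set(model_introns)
    return sorted(est & model), sorted(est - model), sorted(model - est)


est_introns = [("chr5", 100, 200), ("chr5", 300, 400)]
model_introns = [("chr5", 100, 200), ("chr5", 310, 400)]
confirmed, new, unsupported = compare_introns(est_introns, model_introns)
print(confirmed, new, unsupported)
```

Because most ESTs here do not span full-length reading frames, a model intron absent from the ESTs is not necessarily wrong. It may simply lie outside the sequenced region.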
This FcB1 schizont EST analysis confirmed the expression of 243 protein-coding genes and allowed the structural annotations of a quarter of these sequences to be corrected. In addition, this analysis demonstrated the transcription of several remarkable non-protein-coding loci: two atypical rRNA loci, the TARE region and the telomerase RNA gene. Together with other collections of P. falciparum ESTs, usually generated from mixed parasite stages, this collection of FcB1 schizont ESTs provides valuable data for gaining further insight into P. falciparum gene structure, polymorphism and expression.
Almost a decade after the publication of the complete genome sequence of the human malaria parasite Plasmodium falciparum, the mechanisms involved in gene regulation remain poorly understood. Like other eukaryotic organisms, P. falciparum organizes its genomic DNA into nucleosomes. Nucleosomes are the basic structural units of eukaryotic chromatin, and their regulation is known to play a key role in the regulation of gene expression. Despite its importance, the relationship between nucleosome positioning and gene regulation in the malaria parasite has only recently been investigated. Using two independent and complementary techniques followed by next-generation high-throughput sequencing, our laboratory recently generated a dynamic atlas of nucleosome-bound and nucleosome-free regions (NFRs) at single-nucleotide resolution throughout the parasite erythrocytic cycle. We found evidence that genome-wide changes in nucleosome occupancy play a critical role in controlling the rigorous parasite replication in infected red blood cells. However, the role of nucleosome positioning at notable locations such as transcriptional start sites (TSSs) was not investigated. Here we show that a study of NFRs at experimentally determined TSSs and in silico-predicted promoters can provide deeper insight into how a transcriptionally permissive chromatin organization can control the parasite’s progression through its life cycle. We find that NFRs at TSSs and core promoters are strongly associated with high levels of gene expression in asexual erythrocytic stages, whereas nucleosome-bound TSSs and promoters are associated with silent genes preferentially expressed in sexual stages. The implications in terms of regulatory evolution and adaptation of gene expression, and their impact on the design of antimalarial strategies, are discussed.
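Classifying a TSS as nucleosome-free or nucleosome-bound can be sketched by comparing the mean nucleosome occupancy in a window around it with the genome-wide mean. In the Python code below, the ±100-bp window and the 0.5 ratio are illustrative assumptions, not the thresholds used in the study, and the occupancy track is a toy list rather than MAINE-seq or FAIRE-seq coverage.

```python
def mean_occupancy(occupancy, center, flank):
    """Mean per-base occupancy in [center - flank, center + flank], clipped to the array."""
    lo, hi = max(0, center - flank), min(len(occupancy), center + flank + 1)
    region = occupancy[lo:hi]
    return sum(region) / len(region) if region else 0.0


def classify_tss(occupancy, tss_positions, flank=100, nfr_ratio=0.5):
    """Label each TSS 'NFR' if its local occupancy is below nfr_ratio x the genome-wide mean."""
    genome_mean = sum(occupancy) / len(occupancy)
    return {
        tss: ("NFR"
              if mean_occupancy(occupancy, tss, flank) < nfr_ratio * genome_mean
              else "nucleosome-bound")
        for tss in tss_positions
    }


# Toy occupancy track: uniform coverage with a depleted region around position 500.
occupancy = [10] * 1000
occupancy[400:601] = [0] * 201
print(classify_tss(occupancy, [200, 500]))
```

The resulting labels could then be cross-tabulated with expression levels to test the association described above.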
Nucleosome; Malaria; TSS; Promoter; Transcription; Virulence; Evolution; MAINE-seq; FAIRE-seq
The intraerythrocytic development of Plasmodium falciparum, the most virulent human malaria parasite, involves asexual and gametocyte stages. There has been a significant increase in disparate datasets derived from genomic and post-genomic analyses of the parasite, which calls for integrated analyses from which biological processes important to the survival of the parasite can be determined.
To resolve the genes associated with transcripts differentially expressed between stages, we developed and implemented an integrative approach that combines evidence from P. falciparum expressed sequence tags (ESTs) with genomic, microarray, proteomic and gene ontology data.
A total of 143 gametocyte-overexpressed and 51 asexual-overexpressed transcripts were identified. A subset of 74 genes associated with these transcripts showed evidence of stage-correlated protein expression, and 53 of these have not been experimentally characterised. Our study has revealed (1) possible regulatory mechanisms of gametocyte maturation in malaria parasites, (2) a correlation between EST and microarray data for a P. falciparum gene family, together with unique EST-derived information, (3) candidate drug and antigenic targets on which computational and experimental studies can be performed, and (4) the need for more empirical studies of gene and protein expression in malaria parasites.
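A common way to call a transcript over-represented in one EST library relative to another is a 2×2 test on EST counts. The Python sketch below implements a two-sided Fisher's exact test using only the standard library. This is a swapped-in illustration: the statistic actually used in the study is not stated here, and the toy counts are hypothetical. The table layout is [[gene ESTs in gametocyte library, other gametocyte ESTs], [gene ESTs in asexual library, other asexual ESTs]].

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Toy counts: 8 of 10 gametocyte ESTs versus 1 of 6 asexual ESTs hit the gene.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

With many genes tested at once, the resulting p-values would also need a multiple-testing correction before any transcript is called stage-overexpressed.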
Applying different domains of data to the same underlying gene set has yielded novel insights into the biology of the parasite and presents an approach to appraise critically the data quality of post-genomic datasets from malaria parasites.
Plasmodium vivax is the most widely distributed human malaria parasite, responsible for 70–80 million clinical cases each year and large socio-economic burdens for countries such as Brazil, where it is the most prevalent species. Unfortunately, because this parasite cannot be grown in continuous in vitro culture, research on P. vivax remains largely neglected.
A pilot survey of expressed sequence tags (ESTs) from the asexual blood stages of P. vivax was performed. To do so, 1,184 clones from a cDNA library constructed with parasites obtained from 10 different human patients in the Brazilian Amazon were sequenced. Sequences were automatically processed to remove contaminants and low-quality reads. A total of 806 sequences with an average length of 586 bp met these criteria, and their clustering revealed 666 distinct clusters and singlets. The consensus sequence of each cluster and the unique sequences of the singlets were used in similarity searches against different databases, including P. vivax, Plasmodium falciparum, Plasmodium yoelii, Plasmodium knowlesi, Apicomplexa and the GenBank non-redundant database. An E-value below 10⁻³⁰ was used to define a significant database match. ESTs were manually assigned Gene Ontology (GO) terms.
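The significance cutoff described above can be expressed as a simple filter. The Python sketch below keeps, for each query, the database hit with the lowest E-value among hits below 10⁻³⁰. The `(query, subject, evalue)` tuple layout, the IDs and the `best_hits` name are assumptions, and parsing of actual BLAST output is not shown.

```python
def best_hits(hits, max_evalue=1e-30):
    """Keep the lowest-E-value hit per query, discarding hits at or above max_evalue.

    hits: iterable of (query_id, subject_id, evalue) tuples.
    Returns a dict query_id -> (subject_id, evalue).
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue  # not a significant match under the cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


hits = [
    ("cluster1", "PvA", 1e-40),
    ("cluster1", "PfB", 1e-50),
    ("cluster2", "X", 1e-10),
    ("cluster3", "Y", 1e-31),
]
print(best_hits(hits))
```

Here cluster2 is dropped because its only hit fails the cutoff, and cluster1 keeps its stronger P. falciparum match.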
A total of 769 ESTs could be assigned a putative identity based on sequence similarity to known proteins in GenBank. Moreover, 292 ESTs were annotated, and GO terms were assigned to 164 of them.
These are the first ESTs reported for P. vivax and, as such, they represent a valuable resource to assist in the annotation of the P. vivax genome currently being sequenced. Moreover, since the GC-content of the P. vivax genome is strikingly different from that of P. falciparum, these ESTs will help in the validation of gene predictions for P. vivax and to create a gene index of this malaria parasite.
The variant surface antigens expressed on Plasmodium falciparum–infected erythrocytes are potentially important targets of immunity to malaria and are encoded, at least in part, by a family of var genes, about 60 of which are present within every parasite genome. Here we use semi-conserved regions within short var gene sequence “tags” to make direct comparisons of var gene expression in 12 clinical parasite isolates from Kenyan children. A total of 1,746 var clones were sequenced from genomic DNA and cDNA and assigned to one of six sequence groups using specific sequence features. The results show the following. (1) The relative numbers of genomic clones falling into each of the sequence groups were similar between parasite isolates and corresponded well with the numbers of genes found in the genome of a single, fully sequenced parasite isolate. In contrast, the relative numbers of cDNA clones falling into each group varied considerably between isolates. (2) Expression of sequences belonging to a relatively conserved group was negatively associated with the repertoire of variant surface antigen antibodies carried by the infected child at the time of disease, whereas expression of sequences belonging to another group was associated with the parasite “rosetting” phenotype, a well-established virulence determinant. Our results suggest that measurements of the differential expression of different var groups can provide information on the state of the host–parasite relationship in vivo, and that these groups need only be defined by short stretches of sequence data.
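The genomic-versus-cDNA comparison in point (1) rests on the relative frequency of each sequence group among sequenced clones. The Python sketch below computes those proportions for one isolate. The group labels and clone assignments are hypothetical, and assigning tags to groups from their sequence features is assumed to have been done already.

```python
from collections import Counter


def group_proportions(clone_groups):
    """Relative frequency of each sequence group among a set of sequenced clones."""
    counts = Counter(clone_groups)
    total = sum(counts.values())
    return {group: n / total for group, n in sorted(counts.items())}


# Hypothetical group labels for genomic and cDNA clones from one isolate.
genomic = ["1", "1", "2", "2", "3", "3", "4", "4"]
cdna = ["1", "1", "1", "1", "1", "1", "2", "3"]
g, c = group_proportions(genomic), group_proportions(cdna)
for group in sorted(set(g) | set(c)):
    print(group, g.get(group, 0.0), c.get(group, 0.0))
```

In this toy isolate, the genomic clones are spread evenly across the groups while the cDNA clones are dominated by group 1. That is the pattern of "similar genomic representation, variable expression" described above.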
The hope that it will be possible to develop a malaria vaccine is supported by the fact that individuals who have grown up in malaria-endemic regions learn to carry malarial infections without suffering disease. Surprisingly little is still known about how this immunity develops. Much current research focuses on how the host develops immune responses to parasite antigens that are exposed to the host immune system. A major family of such antigens is inserted into the surface of parasite-infected erythrocytes, where its members undergo antigenic switching to evade a developing antibody response. These proteins are encoded by a family of approximately 60 var genes, variants of which are present in every parasite genome.
The extreme diversity of the var genes has prevented meaningful comparison of their expression in clinical isolates. However, the authors of this paper show that var genes can be placed in groups that have a similar representation in the genomes of all parasites the authors collected from Kenyan children. Having demonstrated an underlying similarity at the genomic level, the authors show that var expression patterns vary markedly between patients. The expression levels of specific groups of var genes were associated with poorly developed antibody responses in the children and with a well-established parasite virulence phenotype. The study provides tools for exploring how host and parasite adapt to one another as immunity develops.
A new paradigm of biological investigation takes advantage of technologies that produce large high throughput datasets, including genome sequences, interactions of proteins, and gene expression. The ability of biologists to analyze and interpret such data relies on functional annotation of the included proteins, but even in highly characterized organisms many proteins can lack the functional evidence necessary to infer their biological relevance.
Here we have applied high-confidence function predictions from our automated prediction system, PFP, to three genome sequences: Escherichia coli, Saccharomyces cerevisiae, and Plasmodium falciparum (malaria). PFP increases the fraction of annotated genes to over 90% for all three genomes. Using this broad coverage of function annotation, we introduced functional similarity networks, which represent the functional space of the proteomes. Four different functional similarity networks are constructed for each proteome: three by considering similarity in a single Gene Ontology (GO) category (Biological Process, Cellular Component, or Molecular Function), and a fourth by considering overall similarity with the funSim score. The functional similarity networks are shown to have higher modularity than the protein-protein interaction network. Moreover, the funSim score network is distinct from the single GO-score networks in showing a higher clustering degree exponent and thus has a higher tendency to be hierarchical. In addition, examining function assignments in the protein-protein interaction network and in local regions of genomes has identified numerous cases where subnetworks or local regions contain functionally coherent proteins. These results will help in interpreting protein interactions and gene order in a genome. Several examples of both analyses are highlighted.
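To make the idea of a functional similarity network concrete, the sketch below connects two proteins when their GO annotation sets overlap strongly. It uses the Jaccard index of the GO term sets as a deliberately simple stand-in. This is not the funSim score or the single-category GO scores used in the study, and the 0.5 threshold, protein IDs and GO IDs are arbitrary assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_network(annotations, threshold=0.5):
    """Return edges (p, q, score) for protein pairs whose GO-term Jaccard index >= threshold.

    annotations: dict protein ID -> set of GO term IDs.
    """
    edges = []
    for p, q in combinations(sorted(annotations), 2):
        score = jaccard(annotations[p], annotations[q])
        if score >= threshold:
            edges.append((p, q, score))
    return edges


annotations = {
    "protA": {"GO:0000001", "GO:0000002"},
    "protB": {"GO:0000001", "GO:0000002", "GO:0000003"},
    "protC": {"GO:0000009"},
}
print(similarity_network(annotations))
```

A set-overlap score ignores the GO hierarchy, so two proteins annotated with closely related but distinct terms score 0 here. Semantic-similarity measures such as funSim are designed to avoid this limitation.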
The analyses demonstrate that applying high-confidence predictions from PFP can have a significant impact on a researcher's ability to interpret the immense biological data being generated today. The newly introduced functional similarity networks of the three organisms show different network properties compared with the protein-protein interaction networks.
PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genome-scale data emerging from the Plasmodium research community, including gene expression analyses from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema), employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically based queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. for malaria researchers working in the field). The goal of PlasmoDB is to facilitate use of the vast quantities of genome-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org).
The genomes of a number of malaria parasite species (Plasmodium spp.) have been sequenced in the hope of identifying new drug and vaccine targets. However, almost half of the predicted Plasmodium genes are annotated as hypothetical, and they are difficult to analyse in bulk because current reverse genetic methodologies for Plasmodium are inefficient. Recently, it has been shown that the piggyBac transposon integrates at random into the genome of the human malaria parasite P. falciparum, offering the possibility of developing forward genetic screens to analyse Plasmodium gene function. This study reports the development and application of the piggyBac transposition system for the rodent malaria parasite P. berghei and evaluates its potential as a tool for forward genetic studies. P. berghei is the most frequently used malaria parasite model in gene function analysis, because phenotype screens throughout the complete Plasmodium life cycle are possible both in vitro and in vivo.
We demonstrate that piggyBac-based gene inactivation and promoter trapping are both easier and more efficient in P. berghei than in the human malaria parasite, P. falciparum. Random piggyBac-mediated insertion into genes was achieved after parasites were transfected with the piggyBac donor plasmid, whether transposase was expressed from a helper plasmid or from a gene stably integrated in the genome. Characterization of more than 120 insertion sites showed that more than 70 of them most likely affect gene expression, classifying the encoded proteins as non-essential for asexual blood-stage development. The non-essential nature of two of these genes was confirmed by targeted gene deletion; one of them encodes P41, an ortholog of a human malaria vaccine candidate. Importantly for the future development of whole-genome phenotypic screens, remobilization of the piggyBac element was demonstrated in parasites that stably express transposase.
These data demonstrate that piggyBac behaves as an efficient and random transposon in P. berghei. Remobilization of the piggyBac element shows that, with further development, the piggyBac system can be an effective tool for generating random genome-wide mutant parasite libraries for use in large-scale phenotype screens in vitro and in vivo.
The work of the consortium formed to complete the entire genome sequence of a selected clone of the human malaria parasite, Plasmodium falciparum, is almost finished. Huge tracts of the genome are already available as fully assembled chromosomes or large contigs, and initial annotation is at an advanced stage. In one sense, post-genomic research extends the process of annotation by creating biological atlases, and preliminary attempts to describe gene transcription and the proteome globally are underway. Comparison of substantial amounts of genome data from both closely and more distantly related organisms can facilitate the identification of genes themselves, of coordinately regulated gene expression groups, of gene function and of genome organization. Models of malaria can fulfil these functions and, in addition, provide an experimental system in which predictions can be tested and basic experimental investigations performed on numerous aspects of disease, pathology, and parasite-host and parasite-vector interactions. Comparative genomics in Plasmodium has already been shown to be informative in completing annotation and elucidating gene function. These roles will be illustrated by example and used as the basis for a discussion of the utility of genome information and malaria models in realizing the desired product of Plasmodium genomics: the development of malaria therapies.
Recent advances in high-throughput sequencing present a new opportunity to deeply probe an organism's transcriptome. In this study, we used Illumina-based massively parallel sequencing of the transcriptome (RNA-Seq) to gain new insight into the transcriptome of the human malaria parasite, Plasmodium falciparum. Using data collected at seven time points during the intraerythrocytic developmental cycle, we (i) detect novel gene transcripts; (ii) correct hundreds of gene models; (iii) propose alternative splicing events; and (iv) predict 5′ and 3′ untranslated regions. Approximately 70% of the unique sequencing reads map to previously annotated protein-coding genes. The RNA-Seq results greatly improve the existing annotation of the P. falciparum genome, with over 10% of gene models modified. Our data confirm 75% of predicted splice sites and identify 202 new splice sites, including 84 previously uncharacterized alternative splicing events. We also discovered 107 novel transcripts and expression of 38 pseudogenes, many of which show differential expression across the developmental time series. Our RNA-Seq results correlate well with DNA microarray analysis performed in parallel on the same samples, and provide better resolution than the microarray-based method. These data reveal new features of the P. falciparum transcriptional landscape and significantly advance our understanding of the parasite's red blood cell-stage transcriptome.
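The statement that about 70% of unique reads map to annotated protein-coding genes is, at its core, an overlap count between aligned read coordinates and gene coordinates. Below is a minimal sketch of that counting step. It assumes the reads have already been aligned and reduced to half-open genomic intervals, and it uses a brute-force scan where a real pipeline would use an aligner's output and an interval index. The chromosome names and coordinates are invented toy values, not data from the study, and this is not the authors' pipeline.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str  # hypothetical chromosome label
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def fraction_in_genes(reads: list[Interval], genes: list[Interval]) -> float:
    """Fraction of aligned reads that overlap at least one annotated gene."""
    if not reads:
        return 0.0
    hits = sum(1 for r in reads if any(overlaps(r, g) for g in genes))
    return hits / len(reads)


# Toy annotation and reads (invented coordinates).
genes = [Interval("chr1", 100, 500), Interval("chr2", 0, 300)]
reads = [
    Interval("chr1", 90, 140),   # overlaps the chr1 gene
    Interval("chr1", 600, 650),  # intergenic
    Interval("chr2", 250, 300),  # overlaps the chr2 gene
    Interval("chr3", 0, 50),     # no annotation on this chromosome
]
print(fraction_in_genes(reads, genes))  # 0.5
```

Reads that fall outside every annotated gene are the raw material for the novel-transcript and UTR predictions described above, which is why the unassigned fraction is as informative as the assigned one.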
With the completion and near completion of many malaria parasite genome-sequencing projects, efforts are now being directed toward a better understanding of gene functions and the discovery of vaccine and drug targets. Inter- and intraspecies comparisons of the parasite genomes will provide invaluable insights into parasite evolution, virulence, drug resistance, and immune evasion. Genome-wide searches for loci under various selection pressures may lead to the discovery of genes conferring drug resistance or encoding protective antigens. In addition, the Plasmodium falciparum genome sequence provides the basis for developing various microarrays to monitor gene expression and to detect nucleotide substitutions and deletions/amplifications. Genome-wide profiling of the parasite proteome, chromatin modification, and nucleosome positioning also depends on the availability of the parasite genome. In this brief review, we highlight some recent advances in characterizing gene function and related phenotypes in P. falciparum that were made possible by the genome sequence, particularly the development of a genome-wide diversity map and various high-throughput genotyping methods for genome-wide association studies (GWAS).
Malaria; microarray; genome diversity; SNP; recombination; comparative genomics.
Potassium channels are essential for cell survival and participate in the regulation of cell membrane potential and electrochemical gradients. During its life cycle, Plasmodium falciparum must successfully traverse widely diverse environmental milieus, in which K+ channel function is likely to be critical. The parasite encounters dramatically different conditions in the mosquito midgut, the red blood cell (RBC) cytosol and the human circulatory system.
In silico sequence analyses identified two open reading frames in the P. falciparum genome that are predicted to encode proteins with high homology to K+ channels. To further analyse these putative channels, specific antisera were generated and used in immunoblot and immunofluorescence analyses of P. falciparum-infected RBCs. Recombinant genetic methods in cultured P. falciparum were used to create knockouts of each K+ channel gene to assess the importance of their expression.
Immunoblot and IFA analyses confirmed the expression of the two putative P. falciparum K+ channels (PfK1 and PfK2). PfK1 is expressed in all asexual stages, predominantly in late stages, and localizes to the RBC membrane. Conversely, PfK2 is predominantly expressed in late schizont and merozoite stages and remains primarily localized to the parasite. Repeated attempts to knock out PfK1 and PfK2 expression by targeted gene disruption were unsuccessful despite evidence of recombinant gene integration, indicating that pfk1 and pfk2 are apparently refractory to genetic disruption.
Putative K+ channel proteins PfK1 and PfK2 are expressed in cultured P. falciparum parasites with differing spatial and temporal patterns. Eventual functional characterization of these channels may reveal future pharmacological targets.
Plasmodium vivax is the most widespread human malaria parasite. However, genetic information about its pathogenesis is currently limited because no reproducible in vitro cultivation method exists. Sequencing of the Plasmodium vivax genome suggested the presence of a homolog of P. falciparum deoxyhypusine synthase (DHS), the key regulatory enzyme in the first committed step of hypusine biosynthesis. DHS is involved in cell proliferation and is thus a valuable drug target in the human malaria parasite P. falciparum. Comparing the enzymatic properties of DHS between the benign and the severe Plasmodium species should contribute to our understanding of the differences in pathogenicity and phylogeny of the two malaria parasites.
We describe the cloning, by touchdown PCR, of a 1368 bp putative deoxyhypusine synthase gene (dhs) from genomic DNA of P. vivax PEST strain Salvador I (Accession number AJ549098). The corresponding protein was expressed and functionally characterized as deoxyhypusine synthase by determining its specific activity and its cross-reactivity with human DHS antibody on a Western blot.
The putative DHS protein from P. vivax has a FASTA score of 75 relative to DHS from the rodent malaria parasite P. yoelii, and 74 relative to DHS from the human parasite P. falciparum strain 3D7. The ORF encoding 456 amino acids was expressed under the control of an IPTG-inducible T7 promoter, yielding a protein of approximately 50 kDa (theoretical mass 52.7 kDa) in E. coli BL21(DE3) cells. The N-terminally histidine-tagged protein was purified by nickel-chelate affinity chromatography under denaturing conditions. DHS, with a theoretical pI of 6.0, was present in both eluate fractions. The specific enzymatic activity of DHS was determined as 1268 U/mg protein. The inhibitor N-guanyl-1,7-diaminoheptane (GC7) reduced specific activity 36-fold. Western blot analysis performed with a polyclonal anti-human DHS antibody revealed cross-reactivity with DHS from P. vivax, despite only 44% amino acid identity between the proteins.
We identify a novel DHS protein in the more benign malaria parasite, P. vivax, on the basis of its specific enzymatic activity, its cross-reactivity with a polyclonal antibody against human DHS, and its amino acid identity with DHS homologs from the rodent malaria parasite P. yoelii and from human P. falciparum strains.
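To make the size of the GC7 effect concrete, the residual activity can be worked out from the two reported numbers. This worked example assumes that the 36-fold reduction is measured relative to the uninhibited specific activity of 1268 U/mg; the abstract does not state the reference value explicitly.

```latex
A_{\mathrm{GC7}} \approx \frac{A_{0}}{36}
  = \frac{1268\ \mathrm{U\,mg^{-1}}}{36}
  \approx 35.2\ \mathrm{U\,mg^{-1}},
\qquad
\frac{A_{\mathrm{GC7}}}{A_{0}} = \frac{1}{36} \approx 2.8\%
\;\Rightarrow\; \text{inhibition} \approx 97.2\%
```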
Much of the Plasmodium falciparum genome encodes hypothetical proteins with limited homology to other organisms. A lack of robust tools for genetic manipulation of the parasite limits functional analysis of these hypothetical proteins and other aspects of the Plasmodium genome. Transposon mutagenesis has been used widely to identify gene functions in many organisms and would be extremely valuable for functional analysis of the Plasmodium genome.
In this study, we investigated the lepidopteran transposon piggyBac as a molecular genetic tool for functional characterization of the Plasmodium falciparum genome. Through multiple transfections, we generated 177 unique P. falciparum mutant clones, most carrying a single piggyBac insertion in their genomes. Analysis of the piggyBac insertion sites revealed random insertion into the P. falciparum genome with regard to both gene expression across parasite life cycle stages and functional categories. We further explored the possibility of forward genetic studies in P. falciparum with a phenotypic screen for attenuated growth, which identified several parasite genes and pathways critical for intra-erythrocytic development.
Our results clearly demonstrate that piggyBac is a novel, indispensable tool for forward functional genomics in P. falciparum that will help better understand parasite biology and accelerate drug and vaccine development.