Serial analysis of gene expression (SAGE) was applied to the malarial parasite Plasmodium falciparum to characterize the comprehensive transcriptional profile of erythrocytic stages. A SAGE library of ~8335 tags representing 4866 different genes was generated from 3D7 strain parasites. Basic local alignment search tool analysis of high abundance SAGE tags revealed that a majority (88%) corresponded to 3D7 sequence, and despite the low complexity of the genome, 70% of these highly abundant tags matched unique loci. Characterization of these suggested the major metabolic pathways that are used by the organism under normal culture conditions. Furthermore, several tags expressed at high abundance (30% of tags matching to unique loci of the 3D7 genome) were derived from previously uncharacterized open reading frames, demonstrating the use of SAGE in genome annotation. The open platform “profiling” nature of SAGE also led to the important discovery of a novel transcriptional phenomenon in the malarial pathogen: a significant number of highly abundant tags that were derived from annotated genes (17%) corresponded to antisense transcripts. These SAGE data were validated by two independent means, strand-specific reverse transcription-polymerase chain reaction and Northern analysis, where antisense messages were detected in both asexual and sexual stages. This finding has implications for transcriptional regulation of Plasmodium gene expression.
Malaria, an infectious disease caused by the protozoan parasite Plasmodium falciparum, affects 300–500 million people globally each year (WHO, 1997). Increasing drug resistance in the parasite and insecticide resistance in the Anopheles vector have exacerbated this substantial public health problem. Against this backdrop, effective strategies to combat the disease require a fundamental knowledge of the basic biology of Plasmodium to develop new pharmacotherapeutics and vaccines that target the parasite.
Most studies of Plasmodium biology have been directed at single genes thought to be important for pathogenesis. With the advent of genomic technologies, however, new approaches to combat the disease, such as identifying entire repertoires of transcripts expressed under different conditions, have now become available. Genomic approaches were initiated with the sequencing of the P. falciparum (3D7 strain) genome, a collaborative project, undertaken by the Malaria Genome Consortium that is already close to completion (Butler, 1997 ; O'Brien, 1997 ; Craig et al., 1999 ). Chromosomes 2 and 3 have been fully sequenced (Gardner et al., 1998 ; Bowman et al., 1999 ), whereas 80 to 90% of the estimated 6000 open reading frames (ORFs) in the 3D7 genome are now available as raw sequence data. The next challenge is to use this vast amount of data to study the functional relevance of various genes. For example, it is now possible to identify genes that are transcribed in different stages of the parasite's development and also genes that are induced or repressed in response to various stimuli such as immune or drug pressure. For this reason, whole genome expression analyses with the use of high-density microarrays (Hayward et al., 2000 ) and serial analysis of gene expression (SAGE) (Munasinghe et al., 2000 ) have been developed for P. falciparum. These new approaches will complement each other to generate data for the Plasmodium research community. Genome sequence will expedite the microarray and SAGE analyses; conversely, open platform profiling techniques such as SAGE will help the Malaria Genome Project with annotation of previously uncharacterized ORFs and with novel gene discovery.
SAGE provides a sensitive and highly quantitative description of the transcript profile of a given cell type (Velculescu et al., 1995 , 1997 ). The SAGE technology samples short sequence tags (14 bases) from mRNA transcripts in the population of interest. These tags contain sufficient sequence information to identify, by basic local alignment search tool (BLAST) analysis, the transcript from which each tag was derived (Munasinghe et al., 2000 ). The frequency of each tag in the SAGE library is an accurate estimate of the abundance of its corresponding mRNA transcript. Numerous groups have used this technique successfully and described the SAGE protocol in detail (Velculescu et al., 1995 , 1997 ; Madden et al., 1997 ; Polyak et al., 1997 ; Matsumura et al., 1999 ; Virlon et al., 1999 ).
In this report, we show that SAGE can be used to study gene expression of the asexual stages of P. falciparum. Asexual parasites express many virulence factors and are the targets of antimalarials such as chloroquine; hence, an in-depth understanding of their transcriptional profiles will set the stage for future experiments addressing responses to immune or drug pressure.
SAGE was successfully applied to erythrocytic stage parasites (3D7 strain) of P. falciparum at baseline culturing conditions, and a SAGE library of ~8000 tags was generated. A majority of these corresponded to unique parasite genes, as demonstrated by BLAST analysis of a subset of tags. The SAGE data were validated by Northern and reverse transcription-polymerase chain reaction (RT-PCR) analysis of genes predicted to be highly expressed based on tag counts. BLAST analysis of highly abundant tags also provided insight into networks of major metabolic pathways that are used by the parasite under normal culture conditions. These pathways include mitochondrial, glucose, polyamine, and deoxy-d-xylulose 5-phosphate (DOXP) metabolism. Finally, SAGE also revealed the presence of antisense transcription in the malarial parasite, a phenomenon that has been previously missed by other methods of transcriptional analysis. These SAGE data were also validated by two independent methods, RT-PCR and Northern analysis; here antisense transcripts for genes expressed in asexual as well as sexual stage parasites were found. In summary, SAGE in Plasmodium has revealed many facets of the basic functioning of the parasite in culture, and it sets the stage for future comparisons of the transcriptional responses of P. falciparum to different stimuli.
3D7 strain parasites were maintained under standard culturing conditions (Trager and Jensen, 1976 ) with modifications as previously described (Munasinghe et al., 2000 ). Polyadenylated RNA was harvested from cultures at 8% parasitemia (1% rings, 5% trophozoites, and 2% schizonts) and used in the SAGE procedure as previously described (Munasinghe et al., 2000 ).
SAGE tags from 3D7 asexual stages were analyzed with the use of the SAGE software (Johns Hopkins University, Baltimore, MD, and Genzyme, Cambridge, MA), which extracts 14-bp tag counts from sequence files. To assign gene identity to each tag, the 3D7 experimental tag list was matched against a P. falciparum tag database. This database was created by extracting 14-bp tags from P. falciparum sequence deposited in GenBank (as of July 13, 2000), as well as from a compiled database of recently deposited 3D7 genome sequence (obtained from The Institute For Genomic Research (Rockville, MD), Sanger (Cambridge, UK), and Stanford (Palo Alto, CA), sequencing centers, and compiled at the University of Pennsylvania (Philadelphia, PA) as of July 26, 2000; kindly provided by Drs. Jessica Kissinger and David Roos). Because the P. falciparum genome is not fully annotated, all potential SAGE tags from both sense and antisense strands were extracted (i.e., tags were extracted from each database in the “genomic mode” rather than the “cDNA mode”).
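The "genomic mode" extraction described above can be illustrated with a minimal sketch. It is not the SAGE software itself: the function names are hypothetical, and it assumes that every CATG site on either strand contributes a candidate tag made of the CATG anchor plus the 10 downstream bases.

```python
# Hypothetical illustration of "genomic mode" tag extraction; not the SAGE software.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_genomic_tags(seq, anchor="CATG", tag_len=14):
    """Collect every full-length tag (anchor + downstream bases) on both strands.

    Returns a list of (tag, strand, position) tuples, where position is the
    0-based index of the anchor on the strand that was scanned.
    """
    tags = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        start = strand_seq.find(anchor)
        while start != -1:
            tag = strand_seq[start:start + tag_len]
            if len(tag) == tag_len:  # skip anchors too close to the end
                tags.append((tag, strand, start))
            start = strand_seq.find(anchor, start + 1)
    return tags
```

Because CATG is its own reverse complement, each site is visited once per strand, which is why a database built this way holds both sense and antisense candidates for every locus.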
The software output files are organized in such a way that matches to a single locus, matches to multiple loci, and no matches to database sequences can be readily determined. For genomic sequence that is annotated, it is possible to assign gene identification to each tag in the manner outlined above; however, most of the available P. falciparum genome sequence is not annotated. Therefore, the 187 most abundant tags (abundance level of >4 hits) were characterized by manual BLASTx analysis; see flow chart in Figure 1. Here, for tags derived from unannotated reads, a 500–1000-bp sequence surrounding the tag was translated in all six reading frames and compared with the entire National Center for Biotechnology Information protein database. Fourteen-bp tags that failed to match either database were analyzed with the use of only the first 13 bp of the tag sequence in the manner outlined above.
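The six-frame translation step that precedes a BLASTx-style comparison can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a character other than A, C, G, or T is reported as "X".

```python
# Hypothetical sketch of six-frame translation; assumes the standard genetic code.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order line up with the amino acid string above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq ('*' marks a stop)."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return translations for frames +1..+3 and -1..-3 keyed by frame label."""
    frames = {}
    for sign, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

In practice the 500–1000-bp window around a tag would be passed to `six_frame_translation`, and each resulting peptide string compared against a protein database.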
RT-PCR was performed with the use of the 3′ RACE kit (Invitrogen, Carlsbad, CA) according to the manufacturer's protocol. First-strand cDNA synthesis was primed with oligo(dT)18 (0.5 μg of mRNA was used per reaction), whereas PCR was performed with the gene-specific primers described below. All primers anneal within coding regions of the genes and result in 350–790-bp PCR products.
calmodulin (sense): 5′ GTCCATCACCATCAATATCAGC 3′
calmodulin (antisense): 5′ CTAAGGAGTTAGGAACGGTCATG 3′
msp-3 (sense): 5′ TTTTTGTGTTCTGGAACGCCTCCTCC 3′
msp-3 (antisense): 5′ GCTTCCGAAGATGCTGAAAAAGCTGC 3′
pfg27/25 (sense): 5′ TCTTGTCGTTCATGATACGCTTC 3′
pfg27/25 (antisense): 5′ GTACAAAAGGATAGTGCCAAGCCC 3′
rap-1 (sense): 5′ CTTTGAAGAAATCTCTGATTTCAGC 3′
rap-1 (antisense): 5′ GCTTTAGAAGGTGTCTGTTCATATC 3′
hsp86 (sense): 5′ CCGAATTACTCCGATTCCAAACCTC 3′
hsp86 (antisense): 5′ CTTCTTCCATTTTAGAATCGGTTGC 3′
PCR reactions were carried out according to the manufacturer's protocol (3′ RACE kit; Invitrogen). Initial denaturation of the template occurred at 94°C for 3 min. Amplification was performed for five cycles at 94°C for 45 s, 52–54°C for 45 s, and 72°C for 45 s, followed by 21–26 cycles of identical amplification where the annealing temperature was increased to 55–57°C. Finally, extension of partial PCR products was completed at 72°C for 6 min.
Strand-specific RT-PCR used 1 μg of total RNA per reaction and was performed with the express purpose of distinguishing sense RNA from antisense RNA (Yu et al., 1995 ). RT-PCR was performed with the use of the 3′ RACE kit (Invitrogen); however, first-strand cDNA was primed with gene-specific primers that hybridize to either sense or antisense messages, rather than with an oligo(dT)18 primer. The gene-specific primers are identical to the primers listed above. A 10th of the cDNA sample was PCR amplified, with the same set of gene-specific primers and amplification conditions described above.
PCR products were electrophoresed on 1.2% agarose gels. All resultant PCR products were cloned into the pCRII vector with the use of the TA cloning kit (Invitrogen) and sequenced to confirm the identity of the amplified cDNA.
Northern analysis was performed according to standard protocols. Briefly, 1 μg of mRNA from 3D7 cultures was gel electrophoresed, blotted onto BA85 nitrocellulose membranes (Schleicher & Schuell, Keene, NH), and probed with gene-specific DNA probes. All probes (calmodulin, msp-3, rap-1, and pfg27/25) were derived from the RT-PCR products described in the previous section. DNA probes were radiolabeled with [α-³²P]dATP with the use of random hexanucleotides and the Klenow fragment of DNA polymerase. Blots were visualized by autoradiography.
For strand-specific Northerns, 20 μg of total RNA was used per blot as described above. Synthetic RNAs corresponding to a sense or antisense fragment (~300 bp) of either calmodulin or msp-3 were used as probes for strand-specific Northern analysis. The synthetic RNAs were generated in the following manner. Briefly, RT-PCR products of calmodulin and msp-3 cDNAs (~300-bp fragment; see previous section) were cloned into pBluescript. The orientations of calmodulin and msp-3 genes within pBluescript were determined by sequencing. Each plasmid (pBluescript-msp-3 and pBluescript-calmodulin) was linearized with either BamHI or XhoI. These plasmids were then used as DNA templates for in vitro transcription reactions with the use of T3 or T7 RNA polymerase to generate synthetic sense or antisense RNA fragments for each gene. Plasmids digested with BamHI were incubated with T7 RNA polymerase (following standard protocols) to produce antisense RNAs for both genes. Similarly, plasmids digested with XhoI were incubated with T3 polymerase to produce sense RNAs. Strand-specific RNA probes were also obtained under the same conditions in the presence of [α-³²P]ATP.
Quantitative Northern analysis was carried out for calmodulin and msp-3 to determine whether the ratio of their transcripts was comparable with that determined by SAGE. Northern blots and gene-specific DNA probes were prepared as described above. Known amounts of synthetic 300-bp RNA fragments (in the sense orientation) from each gene were run alongside the mRNA sample as markers for quantification. Blots were exposed to x-ray film (Kodak XO-MAT) such that the intensity of the signal was within the linear range of the film. Signal intensities for each of the transcripts in the mRNA sample were converted to molar amounts by reference to those of the synthetic RNAs. Signal intensities were measured by scanning the x-ray film into Adobe Photoshop (Adobe Systems, Mountain View, CA) and by using NIH Image software to quantify bands by pixel density.
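The conversion of band intensities to molar amounts can be sketched as a simple standard-curve calculation. The following is a minimal illustration rather than the procedure actually used with NIH Image: it assumes a linear relationship between signal and amount within the working range of the film, and the function names are hypothetical.

```python
# Hypothetical standard-curve sketch; assumes signal is linear in RNA amount.

def fit_standard_curve(amounts, signals):
    """Least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def signal_to_amount(signal, slope, intercept):
    """Invert the standard curve to estimate the molar amount for a band."""
    return (signal - intercept) / slope


def molar_ratio(signal_a, curve_a, signal_b, curve_b):
    """Ratio of transcript A to transcript B, each read off its own curve."""
    amount_a = signal_to_amount(signal_a, *curve_a)
    amount_b = signal_to_amount(signal_b, *curve_b)
    return amount_a / amount_b
```

Each gene gets its own curve because its synthetic RNA standard is detected with its own probe; the ratio of the two interpolated amounts is then the quantity compared with the ratio of SAGE tag counts.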
P. gallinaceum parasites were propagated in White Leghorn chickens by serial injection into wing veins. At parasitemia of 50–70%, blood was withdrawn by heart puncture. Gametogenesis was induced as described previously (Goonewardene et al., 1993), with the inclusion of xanthurenic acid (Sigma, St. Louis, MO) at a final concentration of 50 μM in the exflagellation buffer. Gametes and zygotes were purified, also as described previously (Goonewardene et al., 1993), and 1 × 10⁷ cells were incubated at 25°C in Medium 199 (Invitrogen) and harvested for analysis at 0, 24, and 48 h after isolation. Total RNA was isolated with the use of Tri reagent (Molecular Research Center, Cincinnati, OH) according to the manufacturer's protocol. Total RNA obtained from 1 × 10⁷ parasites was used for each RT-PCR reaction. Strand-specific RT-PCR was performed as described previously with the following primers: pgs28 (sense): 5′ CATCTAGCATAGTCAGCACAAGGTTTATTTG 3′ and pgs28 (antisense): 5′ CAAACGAAGATTATTTAGTCAAAC 3′.
A total of 8335 SAGE tags was analyzed from the asexual blood stages of P. falciparum, 3D7 strain. A preliminary analysis showed that these 8335 tags corresponded to 4798 unique genes (Figure 2A). Of these, 1254 genes were present at an abundance of two hits (or counts) or greater. The 537 tags expressed at abundance levels ≥20 tags (percent abundance of 0.2) accounted for 6.4% of the total collection of tags but only 0.3% (15) of the total number of unique genes. As expected, these abundance groups had the highest percentage of matches to GenBank entries (Figure 2B), implying that many highly expressed messages have been readily cloned and studied. The lower abundance tags (abundance of <20 tags) accounted for 93.6% of the total collection of tags, and represented a vast majority of the unique genes expressed in the parasite. Moreover, these tags gave many fewer matches to GenBank; hence, SAGE in P. falciparum will aid in the discovery of novel malarial genes.
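The abundance breakdown above amounts to counting tag occurrences and comparing two denominators: the total number of tags sampled and the number of distinct tags. A minimal sketch, with a hypothetical function name and toy input rather than the actual library, is:

```python
# Hypothetical sketch of an abundance-class summary for a SAGE tag list.
from collections import Counter


def abundance_summary(tags, threshold):
    """Summarise how much of a tag library falls at or above a count threshold."""
    counts = Counter(tags)
    total_tags = sum(counts.values())
    abundant = {tag: n for tag, n in counts.items() if n >= threshold}
    return {
        "total_tags": total_tags,
        "unique_tags": len(counts),
        "abundant_unique": len(abundant),
        # Share of all sampled tags contributed by the abundant class.
        "pct_of_total_tags": 100 * sum(abundant.values()) / total_tags,
        # Share of distinct tags (putative genes) in the abundant class.
        "pct_of_unique_tags": 100 * len(abundant) / len(counts),
    }
```

With the figures reported here, the same two ratios are 537/8335 ≈ 6.4% of all tags and 15/4798 ≈ 0.3% of unique genes.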
To assess whether 14-bp tags could uniquely identify genes in the highly A-T–rich Plasmodium genome, these SAGE tags were searched against 3D7 genome sequence. We decided that for an accurate estimate of the “tag to gene” mapping in Plasmodium, all available sequence data, both cDNA and genomic, would provide the most complete picture. Sequencing of the P. falciparum genome is close to completion; however, much of the newly available P. falciparum sequence data has yet to be annotated. Therefore, the 187 most abundant SAGE tags were analyzed in a more rigorous manner by BLASTx analysis. A schematic of the BLAST analysis is shown in Figure 1. This analysis revealed that a majority of the SAGE tags (88%) corresponded to P. falciparum genome sequence. Most of the tags that match to single loci (70%) lie within known genes; hence, SAGE tags can be used to uniquely identify genes in Plasmodium. The other 30% of tags that match single sites correspond to unknown genes and hypothetical open reading frames. Thus, SAGE data reveal not only predicted ORFs that are expressed but also previously uncharacterized transcripts; hence, SAGE in Plasmodium has the capacity to assist in annotation of the genome.
Approximately 10% of the 187 most abundant SAGE tags did not match parasite sequence. We expect this number to decrease as the genome project nears completion. The percentage of SAGE tags that gave multiple matches within the P. falciparum genome was also calculated and found to be 18%. In the present study, the 35 tags that matched more than one locus were further investigated; of these tags, 21 (60%) matched two or three genes, whereas 14 (40%) matched more than three genes. The latter set of tag sequences was of lower complexity in general. Northern blot analysis should help resolve whether tags that match multiple genes indeed represent multiple transcripts.
The BLAST analysis described above enabled us to assign genes to highly abundant SAGE tags; examples of these are listed in Table 1. This analysis provided a snapshot of the major transcripts expressed by the parasite. A complete picture of metabolic pathways used by P. falciparum growing in culture will incorporate protein expression and stability; nevertheless, BLAST analysis of abundant SAGE tags provides the first global description of genes and, hence, metabolic pathways that might be transcriptionally regulated. The most abundant transcripts were grouped into functional categories to reveal the transcriptional profile of 3D7 parasites grown in culture (Figure 3). Many tags represented housekeeping functions carried out by all prokaryotic and eukaryotic cells (transcription, translation, chaperones, cytoskeleton), whereas some functional classes were highly specific for the unique life cycle of Plasmodium (membrane-associated proteins involved in invasion, DOXP pathway).
Interestingly, many of the highly abundant messages (5.3%) appear to be transcribed from the 6-kb mitochondrial genome, and another 2.1% (thioredoxin, vacuolar ATPase subunit B, ATPase transporter, ubiquinol cytochrome c reductase-like protein) are probably involved in oxidative metabolism. Therefore, a significant proportion of abundant transcripts encodes proteins that play a role in oxidative metabolism.
Stage-specific transcripts are highly represented in the list of abundant messages, reflecting the different developmental stages present in the culture. For example, mRNAs encoding cell surface proteins involved in merozoite invasion (Cowman et al., 2000 ) comprise 8% of the most abundant transcripts. These include merozoite surface proteins 3 and 4 (MSP-3 and -4), rhoptry-associated protein-1 (RAP-1), and merozoite capping protein. Tags corresponding to serine repeat antigen, a soluble protein that is associated with the parasitophorous vacuole, were found at high abundance (0.32%). Surprisingly, a tag representing the gametocyte surface antigen Pfg27/25, shown to be essential for gametogenesis (Lobo et al., 1999 ), was also present at high abundance (0.25%) in this SAGE library derived from asexual parasites.
Abundant SAGE tags represented major metabolic pathways of the malarial parasite. Because asexual blood stages of Plasmodium do not store energy reserves in the form of glycogen or lipids, glucose taken up from plasma is the primary source of energy (Sherman, 1991 ). Therefore, glucose metabolism is a prominent aspect of intracellular growth and not unexpectedly, proteins required for glucose metabolism were represented among the abundant tags (aldolase, phosphoenolpyruvate carboxykinase, and triosephosphate isomerase).
Although lipids are not used as a major source of energy by P. falciparum, there is a significant increase in levels of phospholipids, diacylglycerol, and triacylglycerol, within the red blood cell upon merozoite invasion (Vial and Ancelin, 1998 ). This increase in the total lipid content is associated with a biosynthetic requirement for lipids during formation of the membranes surrounding the parasite (the parasitophorous vacuolar membrane and the tubovesicular membrane). N-Myristoyl transferase, an enzyme that plays a role in the formation of lipoproteins, was found among the 187 most abundant tags; however, tags representing proteins involved in lipid biosynthesis were not present.
Intraerythrocytic P. falciparum parasites are capable of de novo synthesis of pyrimidines from precursor molecules (Walsh and Sherman, 1968 ), with a requirement for para-aminobenzoic acid and folate cofactors. Unlike their hosts, malarial parasites do not use exogenous folate cofactors, but instead synthesize these de novo (Scheibel and Sherman, 1988 ). SAGE data revealed tags corresponding to ribonucleotide reductase, an enzyme of the pyrimidine biosynthetic pathway, and dihydrofolate synthase, an enzyme of the folate pathway. Polyamine biosynthetic enzymes were also represented among the SAGE tags (ornithine decarboxylase and ornithine aminotransferase).
The unique intracellular niche of malarial parasites results in the expression of many parasite-specific metabolic pathways. For example, growth of the asexual parasites within red blood cells is accompanied by degradation of hemoglobin and the subsequent detoxification of heme by-products (Foley and Tilley, 1998 ; Krogstad and De, 1998 ; Rosenthal and Meshnick, 1998 ). Tags representing proteins implicated in the detoxification of heme (histidine-rich proteins I and II, glutathione reductase) were found at high abundance in the SAGE library. Surprisingly, the plasmepsin and falcipain proteases that play a role in hemoglobin degradation were not found in the list of highly expressed genes. This may be due to the fact that their transcription occurs at an earlier stage in the parasite life cycle than the trophozoite stage, which was the predominant stage in the study population. Alternatively, these transcripts may be present at a very low abundance.
Finally, SAGE data revealed the expression of mRNA encoding DOXP synthase at high levels (0.09%). The DOXP pathway was recently identified as a parasite-specific metabolic pathway important for isoprenoid biosynthesis (Jomaa et al., 1999 ). Because this pathway is localized in the apicoplast, a plant-derived organelle of Plasmodium, DOXP metabolism provides a novel target for antimalarial drug development.
To confirm the expression data in asexual-stage parasites as determined by SAGE, RT-PCR and Northern analysis of several genes with highly abundant SAGE tag counts (calmodulin, msp-3, rap-1, and pfg27/25; Figure 4) were performed. Pfg27/25 represents a gametocyte-specific antigen, whereas the other three are predicted to be expressed in asexual stages. Because the SAGE library was derived from a culture that contained no detectable gametocytes, pfg27/25 was specifically chosen for RT-PCR and Northern analysis. RT-PCR products for all four genes were generated from asexual-stage mRNA (Figure 4A). These were cloned, sequenced, and found to correspond to the expected gene. Transcripts at the predicted length for all four genes were also detected by Northern blotting (our unpublished results; Figure 4B). The presence of pfg27/25 transcripts in the asexual stages of P. falciparum has been reported in another genome-wide expression analysis with the use of microarrays (Hayward et al., 2000).
For a more quantitative estimate of gene expression, quantitative Northern analysis of two highly expressed genes (msp-3 and calmodulin) was performed (our unpublished results; Figure 4B). Here, the molar ratio of msp-3 to calmodulin was ~3:1, which is similar to the ratio of their SAGE tag counts (Figure 4B). Hence, SAGE tag data appear to correlate well with relative levels of mRNA within the cells.
A surprising observation of SAGE in P. falciparum was the large proportion of tags corresponding to antisense transcripts. Unlike microarrays, SAGE is able to detect antisense transcription because the orientation of the SAGE tag on the mRNA can be readily determined. A SAGE tag consists of the 4-bp recognition sequence (CATG) of the restriction enzyme NlaIII (this enzyme defines the position of each tag in an mRNA transcript) and 10 bp of adjacent sequence in the direction of the 3′ poly(A) tail of the RNA molecule. Among 45 annotated genes whose 5′ and 3′ ends are clearly denoted, 17% of the tags consisted of a CATG and the 3′ adjacent 10 bp, in the direction of the 5′ end of the transcript, on the noncoding strand of cDNA. This result was unexpected; hence, we wanted independent confirmation of the SAGE data. This was accomplished by strand-specific RT-PCR analysis of asexual as well as sexual blood stages, and strand-specific Northern analysis in erythrocytic stage parasites.
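The orientation test described above can be illustrated with a short sketch. It assumes the coding-strand sequence of an annotated gene is available as a plain string; the function names are hypothetical, and a tag found on both strands is reported as ambiguous rather than assigned.

```python
# Hypothetical sketch of assigning a SAGE tag to the sense or antisense strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_tag(tag, coding_strand):
    """Classify a tag relative to a gene given as its coding-strand sequence.

    A tag read from the mRNA appears verbatim in the coding strand ("sense");
    a tag read from an antisense transcript appears in its reverse complement.
    """
    in_sense = tag in coding_strand
    in_antisense = tag in reverse_complement(coding_strand)
    if in_sense and in_antisense:
        return "ambiguous"
    if in_sense:
        return "sense"
    if in_antisense:
        return "antisense"
    return "no match"
```

This only works for genes whose 5′ and 3′ ends are annotated, since the coding strand must be known before a tag can be called antisense.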
We confirmed the presence of antisense transcripts from erythrocytic stages by strand-specific RT-PCR analysis of the three genes calmodulin, rap-1, and msp-3, and subsequent sequencing of the RT-PCR products to establish gene identity. Based on SAGE data, we expected all three transcripts to be present in both the sense and antisense orientations, a prediction that was confirmed by RT-PCR (Figure 5A, lanes 1–12) and sequence analysis. On the other hand, a PCR product for hsp-86 was only detected for sense RNA (Figure 5A, lane 15), consistent with the absence of an antisense SAGE tag for this gene. Importantly, control experiments that excluded reverse transcriptase (lanes 2, 4, 6, 8, 10, 12, 14, and 16) indicated a lack of contaminating genomic DNA, showing that the PCR products obtained during strand-specific RT-PCR were indeed derived from RNA. These data validate the antisense transcripts predicted by SAGE.
The presence of antisense transcripts was also confirmed by strand-specific Northern analysis for calmodulin and msp-3. To control for the specificity of the strand-specific RNA probes, synthetic RNA corresponding to the sense or antisense strands of each gene was included in the experiment. This synthetic RNA consisted of short transcripts (250–300 bp within the coding regions) derived from each gene in vitro. Figure 5B shows that strand-specific probes can specifically detect synthetic antisense RNA (lanes 1 and 2 for calmodulin; lanes 7 and 8 for msp-3) or synthetic sense RNA (lanes 4 and 5 for calmodulin; lanes 10 and 11 for msp-3). With the use of these strand-specific probes, total RNA isolated from asexual stage parasites was shown to contain both antisense (~1 kb) and sense (~1.2 kb) transcripts for both calmodulin (lanes 3 and 6) and msp-3 (~2 kb) (lanes 9 and 12). Therefore, as confirmed by two independent techniques, the presence of antisense tags in the SAGE library reflects antisense transcription in asexual stages of the malarial parasite.
We wondered whether genes expressed in other stages of the Plasmodium life cycle also exhibited antisense transcription. To address this, the sexual stages (zygotes and ookinetes) of the chicken malarial parasite P. gallinaceum were tested for the presence of antisense RNAs. Pgs28 is a major surface antigen of P. gallinaceum sexual stages (Duffy et al., 1993), and transcription of the pgs28 gene has been studied previously. Strand-specific RT-PCR of total RNA from zygotes (0 h) and mature ookinetes (48 h) showed that the pgs28 gene expressed both sense and antisense transcripts (Figure 6) at different stages of in vitro development (lanes 1, 5, and 9 show antisense PCR product).
This report demonstrates the application of SAGE in P. falciparum. Despite the low complexity of the genome, SAGE tags as short as 14 bp can uniquely identify a majority of genes in P. falciparum. This observation has been exploited to study transcription in the asexual stages of the parasite, resulting in new insights into the biology of the pathogen. First, we provide a description of the transcriptional profile of the 3D7 strain of P. falciparum that builds upon the extensive data generated by the Malaria Genome Project. Second, the major metabolic pathways present in blood stage parasites are delineated; modulation of these pathways in response to stimuli such as drug and immune pressure can now be studied. Finally, this report shows that Plasmodium parasites express antisense RNAs at multiple stages during the developmental cycle, a finding that has implications for transcriptional regulation of Plasmodium gene expression.
Of the tags that matched to single loci, 70% matched to known genes, whereas 30% matched to unknown genes or hypothetical ORFs. This distribution is in stark contrast to genome sequencing data, where 60% of the putative ORFs were of unknown function, whereas 40% were genes encoding proteins of known functions (Gardner et al., 1998). This discrepancy could be explained by the fact that the asexual blood stages are more amenable to cultivation and experimental manipulation in the laboratory than other stages; hence, many of the transcripts expressed in these stages at high abundance have been previously studied and are of known functions. It is also likely that a majority of the transcripts expressed during laboratory culture of asexual blood stages encode proteins that serve housekeeping functions conserved within organisms widely separated on the phylogenetic tree. The genes of unknown function identified by the Malaria Genome Project may turn out to be of importance in host–parasite interactions and disease; however, under culturing conditions only relatively few may be expressed at high levels. Alternatively, the higher percentage of uncharacterized, putative ORFs in the sequence data might be due to overprediction of genes. Because SAGE data reveal genes that are actually expressed in asexual stage parasites, identification of tags that correspond to unknown genes and hypothetical ORFs will be of tremendous use in annotation of the P. falciparum genome.
Some tags (10%) did not match to the Plasmodium databases. Because the P. falciparum genome is 80–90% complete, these tags should prove to be informative as the genome project proceeds to completion. Alternatively, tags that do not match genome sequence may turn out to span splice junctions. These questions should be resolved as more genome sequence becomes available. Nevertheless, SAGE in P. falciparum is comparable with other studies where tags with no matches to the genome were as high as 20% (Matsumura et al., 1999 ) and 23% (Yamashita et al., 2000 ) of the total tags.
Finally, of the 8335 tags, 18% gave multiple matches to Plasmodium databases, a number that is fourfold higher than that obtained from human pancreatic SAGE libraries, where ~5% of tags gave multiple matches (Velculescu et al., 1995 ). However, pancreatic SAGE tags were only searched against RNA sequence databases, in contrast to our more extensive analysis that surveyed all available Plasmodium genome sequence. Hence, the higher percentage of multiple matches to the genome may reflect the method of analysis rather than any limitation of the technique when applied to the A-T rich genome of Plasmodium. Alternatively, the higher percentage of tags giving multiple matches may be a consequence of the lower complexity of the Plasmodium genome. Ambiguous tags of interest can be investigated further on an individual basis by Northern analysis.
Other reports on SAGE have revealed metabolic profiles that are highly specific to the organism or tissue under study. For example, SAGE of mouse kidney revealed a preponderance of ion channels and mitochondrial enzymes, consistent with the role of the kidney in filtration and solute transport and the high-energy requirement for the same (El-Meanawy et al., 2000 ). Transcriptional profiling of the 100 most abundant SAGE tags derived from seedlings of the rice plant, Oryza sativa L., demonstrated a prevalence of prolamin, a storage protein expressed in seeds (Matsumura et al., 1999 ). As expected, other highly abundant transcripts included those encoding water channels and respiratory metabolism enzymes.
SAGE data from P. falciparum shed light on the transcriptional profile of blood stage parasites and hence reveal the classes of proteins and metabolic pathways that are probably used during asexual growth. For example, membrane-associated proteins form the most abundant category of expressed proteins. This is not surprising because the parasite is separated from its extracellular environment by three membranes: the host red blood cell membrane, the parasitophorous vacuole membrane, and the parasite plasma membrane (Torii and Masamichi, 1998). Many of these highly expressed proteins are stage specific and have been previously shown to be important in invasion of the red blood cell (MSP-3 and -4) (Barnwell and Galinski, 1998); others are transporters that may import nutrients into the parasite cell (importin β-subunit). Hence, the unique niche of the malarial parasite within the red blood cell requires the high expression of specific surface proteins.
A significant proportion (7.4%) of the most abundant tags was derived either from transcripts encoded on the 6-kb mitochondrial genome or from nuclear encoded transcripts involved in oxidative metabolism. High levels of RNA synthesis from the 6-kb element may reflect the fact that this episomally replicating molecule is present at ~20 copies per cell (Preiser et al., 1996). However, a high demand for mitochondrial function and oxidative metabolism is suggested by the abundance of nuclear transcripts encoding proteins (thioredoxin, ubiquinol cytochrome c reductase-like protein) probably involved in the maintenance of intracellular oxidative homeostasis. Moreover, SAGE data show that transcripts encoding the molecular chaperones hsp-60 and -70, which may be involved in import of nuclear-encoded proteins into the mitochondria (Das et al., 1997), are also expressed at high levels. Hence, mitochondrial functions are highly represented among the abundant classes of SAGE tags, probably reflecting the microaerophilic lifestyle of the parasite within the red blood cell. The robust expression of genes involved in mitochondrial physiology may explain why mitochondrial pathways have been excellent targets for antimalarial drugs.
The major transcriptional pathways in the parasite as revealed by SAGE will help to identify potential drug targets and lead compounds. For example, atovaquone inhibits erythrocytic growth by targeting the mitochondrial cytochrome bc1 complex (Fry and Pudney, 1992). Further evidence that other highly expressed metabolic pathways could also serve as drug targets is found in the following studies: the antimalarial drug fosmidomycin has been shown to target DOXP metabolism (Jomaa et al., 1999); the ornithine decarboxylase inhibitor, difluoromethylornithine, inhibits erythrocytic growth of P. falciparum in culture (Assaraf et al., 1984); and folate antagonists such as pyrimethamine and cycloguanil target dihydrofolate reductase (Ferone et al., 1969). Other major transcriptional patterns uncovered by SAGE in the parasite (proteasome, chaperones, unknown ORFs) may provide new targets for antimalarial drug development.
Most techniques for global analysis of gene expression are unable to distinguish sense and antisense transcripts. Due to the directional nature of SAGE tags (4 bp representing the NlaIII site that is closest to the 3′ end of each transcript, and 10 bp downstream on the coding strand), we were able to identify numerous antisense transcripts in the transcriptional repertoire of P. falciparum asexual-stage parasites. Strand-specific RT-PCR and Northern analysis confirmed this observation for three of the genes (msp-3, rap-1, and calmodulin) predicted to transcribe antisense messages. The fact that antisense transcription can be detected in Plasmodium by three independent methods suggests that this is a bona fide biological phenomenon and not an artifact of the SAGE procedure.
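The strand logic behind this identification can be sketched in a few lines of Python. This is a simplified illustration rather than the analysis pipeline used here: it assumes that the sense and antisense transcripts both span the whole of the supplied gene sequence, it ignores untranslated regions and poly(A) tails, and the function names and example sequence are hypothetical.

```python
NLAIII_SITE = "CATG"   # NlaIII recognition sequence (its own reverse complement)
TAG_LENGTH = 10        # bases taken downstream of the site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_tag(transcript):
    """Return CATG plus the 10 bases downstream of the 3'-most NlaIII site.

    Returns None if there is no site or fewer than 10 bases follow it
    (a simplification: real tags could extend into the poly(A) tail).
    """
    seq = transcript.upper()
    pos = seq.rfind(NLAIII_SITE)
    if pos == -1:
        return None
    tag = seq[pos:pos + len(NLAIII_SITE) + TAG_LENGTH]
    return tag if len(tag) == len(NLAIII_SITE) + TAG_LENGTH else None


def classify_tag(tag, coding_strand):
    """Classify a tag as 'sense', 'antisense', or 'no match' for one gene.

    Assumes (hypothetically) that transcripts in either orientation
    cover the entire coding_strand sequence.
    """
    tag = tag.upper()
    if tag == extract_tag(coding_strand):
        return "sense"
    if tag == extract_tag(reverse_complement(coding_strand)):
        return "antisense"
    return "no match"


# Hypothetical gene fragment containing two NlaIII sites.
gene = "ATGAAAGGTTCCAACATGGCTTAAGGCCTTACATGAAATTTGGGCCCAAATAA"
print(classify_tag("CATGAAATTTGGGC", gene))  # tag from the coding strand
print(classify_tag("CATGTTGGAACCTT", gene))  # tag from the opposite strand
```

Because the 10 bases are always read downstream of the site on the transcribed strand, a tag derived from an antisense message is anchored at what is the 5′-most NlaIII site of the gene and reads as the reverse complement of the coding sequence, so the two orientations yield different tags.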
Whether antisense RNAs transcribed by the malarial parasite have poly(A) tails is unclear. However, their representation in the SAGE library suggests that, because the genome of P. falciparum contains long poly(A) tracts, SAGE samples both polyadenylated RNAs and those lacking poly(A) tails, as we observed previously for mitochondrial transcripts (Munasinghe et al., 2000).
Our data also demonstrate that antisense transcripts are expressed in other stages of Plasmodium development. The pgs28 gene that encodes a major surface antigen of P. gallinaceum sexual stages has been studied extensively (Duffy et al., 1993). Transcription of pgs28 is restricted to the zygotes and ookinetes. Strand-specific RT-PCR shows that pgs28 expresses both sense and antisense transcripts in both stages. Hence, the presence of antisense transcripts may be a widespread phenomenon in multiple stages of Plasmodium development and should be tested further. For example, a family of genes (var) encodes variable surface proteins involved in host-parasite interactions; var genes are transcribed during erythrocytic growth, resulting in the expression of the PfEMP-1 protein (Wahlgren et al., 1999). Several var genes are transcribed in the ring stages, whereas a single var gene is transcribed in trophozoites (Chen et al., 1998; Scherf et al., 1998). It would be interesting to test whether any of the ring stage var transcripts are antisense.
What is the biological significance of antisense transcription in P. falciparum? Antisense transcripts may reflect mechanisms of transcriptional initiation in a parasite with a highly A-T–rich genome (86% A-T in noncoding regions and 76% in coding sequence) (Bowman et al., 1999). Numerous studies have shown that transcription in P. falciparum is initiated from the A-T–rich 5′ upstream region of genes, resulting in sense transcripts (Horrocks et al., 1998; Dechering et al., 1999; Horrocks and Lanzer, 1999). The finding that 17% of the highly abundant tags derived from annotated genes correspond to antisense transcripts implies novel mechanisms of transcriptional initiation and termination, and antisense transcripts may also play a role in posttranscriptional control of protein expression.
In conclusion, we have shown that SAGE can be readily adapted for the study of global transcription in P. falciparum. SAGE of 3D7 asexual parasites sheds light on the prominent metabolic pathways used in these stages. Because blood stages are the targets of both antimalarial drugs and the host immune system, this comprehensive transcriptional profile generated by SAGE will form the basis for future comparisons of gene expression under drug or immune pressure. Finally, the directional nature of SAGE tags revealed a novel phenomenon in the parasite, antisense transcription, that had previously been missed.
We thank Drs. J. Kissinger and D. Roos (University of Pennsylvania, Philadelphia, PA; http://www.plasmoDB.org) for providing access to assembled P. falciparum genomic sequences. We also thank Dr. Connie Chow for insightful comments about this manuscript, and Dr. Kevin Militello for providing the hsp86 primers and valuable suggestions. We acknowledge the invaluable data provided by the Malaria Genome Consortium: Sequence data for P. falciparum chromosomes 1, 3, 4, 5, 6, 7, 8, 9, and 13 were obtained from The Sanger Center Web site at http://www.sanger.ac.uk/Projects/P_falciparum/. Sequencing of P. falciparum chromosomes 1, 3, 4, 5, 6, 7, 8, 9, and 13 was accomplished as part of the Malaria Genome Project with support by The Wellcome Trust. Sequence data for P. falciparum chromosome 12 were obtained from the Stanford DNA Sequencing and Technology Center Web site at http://www-sequence.stanford.edu/group/malaria. Sequencing of P. falciparum chromosome 12 was accomplished as part of the Malaria Genome Project with support by the Burroughs Wellcome Fund. Preliminary sequence data for P. falciparum chromosomes 2, 10, 11, and 14 were obtained from The Institute for Genomic Research Web site (www.tigr.org). Sequencing of chromosomes 2, 10, 11, and 14 was part of the International Malaria Genome Sequencing Project and was supported by awards from the Burroughs Wellcome Fund and the U.S. Department of Defense. The Chromosome 2 Sequencing Project was a collaborative effort by The Institute for Genomic Research (TIGR) and the Naval Medical Research Center. This work was supported by the Burroughs Wellcome Fund and by the U.S. Department of Defense.