3D7 SAGE Tag Library from Asexual Blood Stage Parasites
A total of 8335 SAGE tags were analyzed from the asexual blood stages of P. falciparum, 3D7 strain. A preliminary analysis showed that these 8335 tags corresponded to 4798 unique genes (Figure A), of which 1254 were represented by two or more tags (counts). Tags present at an abundance of ≥20 (a percent abundance of about 0.2) totaled 537, accounting for 6.4% of the total collection of tags but only 0.3% (15) of the unique genes. As expected, these high-abundance tags had the highest percentage of matches to GenBank entries (Figure B), implying that many highly expressed messages have already been cloned and studied. The lower abundance tags (<20 counts) accounted for 93.6% of the total collection of tags and represented the vast majority of the unique genes expressed in the parasite. Moreover, these tags gave far fewer matches to GenBank; hence, SAGE in P. falciparum will aid in the discovery of novel malarial genes.
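The abundance bookkeeping above can be reproduced from a list of observed tags. The Python sketch below is a minimal illustration, assuming that each unique 14-bp tag stands for one unique gene (as in the preliminary analysis) and using the ≥20-count cutoff from the text; the function name and any toy tags passed to it are hypothetical, not data from this library.

```python
from collections import Counter
from collections.abc import Iterable


def summarize_abundance(tags: Iterable[str], threshold: int = 20) -> dict:
    """Summarize a SAGE library by tag abundance class.

    Each unique tag is treated as one unique transcript (gene).
    `threshold` is the count separating high- from low-abundance tags.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty library")
    high = [n for n in counts.values() if n >= threshold]
    high_total = sum(high)
    return {
        "total_tags": total,
        "unique_tags": len(counts),
        "unique_seen_twice_or_more": sum(1 for n in counts.values() if n >= 2),
        # Share of all tag observations contributed by high-abundance tags.
        "pct_of_tags_high": 100 * high_total / total,
        # Share of unique tags (genes) that fall in the high-abundance class.
        "pct_of_unique_high": 100 * len(high) / len(counts),
        "pct_of_tags_low": 100 * (total - high_total) / total,
        # Percent abundance that the threshold count corresponds to.
        "threshold_pct_abundance": 100 * threshold / total,
    }
```

Plugging in the reported totals instead, the high-abundance class gives 537 / 8335 ≈ 6.4% of tags and 15 / 4798 ≈ 0.3% of unique genes, and a 20-count cutoff corresponds to 20 / 8335 ≈ 0.24% abundance.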
BLAST Analysis of SAGE Tags
To assess whether 14-bp tags could uniquely identify genes in the highly A-T–rich Plasmodium genome, we searched these SAGE tags against the 3D7 genome sequence. We reasoned that using all available sequence data, both cDNA and genomic, would give the most complete and accurate estimate of "tag-to-gene" mapping in Plasmodium. Sequencing of the P. falciparum genome is close to completion; however, much of the newly available P. falciparum sequence has yet to be annotated. Therefore, the 187 most abundant SAGE tags were analyzed more rigorously by BLASTx analysis. A schematic of the BLAST analysis is shown in Figure . This analysis revealed that most of the SAGE tags (88%) corresponded to P. falciparum genome sequence. Of the tags that matched a single locus, 70% lay within known genes; hence, SAGE tags can be used to uniquely identify genes in Plasmodium. The remaining 30% of single-locus tags corresponded to unknown genes and hypothetical open reading frames. Thus, SAGE data reveal not only which predicted ORFs are expressed but also previously uncharacterized transcripts, so SAGE in Plasmodium has the capacity to assist in annotation of the genome.
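Searching short tags against genomic sequence can be illustrated with exact matching on both strands. This is a sketch, not the pipeline used here: the study used BLAST, whereas the code below swaps in plain exact string matching, which reports only perfect full-length 14-bp matches. The input format (a dict mapping contig names to sequences) and the function names are assumptions for illustration.

```python
import re

# Complement table for A/C/G/T; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_tag_loci(tag: str, genome: dict) -> list:
    """Report every exact occurrence of a tag on either strand.

    `genome` maps contig names to sequences (hypothetical input format).
    Returns (contig, 0-based start on the + strand, strand) tuples.
    """
    tag = tag.upper()
    patterns = [(tag, "+")]
    rc = reverse_complement(tag)
    if rc != tag:  # a palindromic tag would otherwise be counted twice
        patterns.append((rc, "-"))
    hits = []
    for contig, seq in genome.items():
        seq = seq.upper()
        for pattern, strand in patterns:
            # The zero-width lookahead also reports overlapping matches,
            # which can occur in low-complexity, A-T-rich sequence.
            for m in re.finditer(f"(?={re.escape(pattern)})", seq):
                hits.append((contig, m.start(), strand))
    return hits
```

A tag with no hits falls in the unmatched group, exactly one hit marks a single-locus tag, and several hits mark a tag that cannot by itself identify a gene.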
Approximately 10% of the 187 most abundant SAGE tags did not match parasite sequence; we expect this proportion to decrease as the genome project nears completion. The percentage of SAGE tags that matched multiple sites within the P. falciparum genome was also calculated and found to be 18%. The 35 tags that matched more than one locus were investigated further: 21 (60%) matched two or three genes, whereas 14 (40%) matched more than three genes. The latter tags were generally of lower sequence complexity. Northern blot analysis should help resolve whether tags that match multiple genes indeed represent multiple transcripts.
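The binning of tags by number of matched loci can be sketched as below. How sequence complexity was judged is not stated in the text; the sketch assumes base-composition Shannon entropy (in bits) as one common proxy, so tags dominated by a single base, such as A or T, score lower. All function names are hypothetical.

```python
import math
from collections import Counter


def hit_class(n_loci: int) -> str:
    """Bin a tag by how many genomic loci it matched."""
    if n_loci < 0:
        raise ValueError("n_loci must be non-negative")
    if n_loci == 0:
        return "no match"
    if n_loci == 1:
        return "single locus"
    if n_loci <= 3:
        return "2-3 loci"
    return ">3 loci"


def summarize_hits(loci_per_tag: dict) -> Counter:
    """Count tags in each hit class; loci_per_tag maps tag -> number of loci."""
    return Counter(hit_class(n) for n in loci_per_tag.values())


def shannon_entropy(seq: str) -> float:
    """Base-composition entropy in bits; 2.0 is the maximum for DNA."""
    seq = seq.upper()
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())
```

With the counts reported above, 21 of the 35 multi-locus tags fall in the two-to-three-loci bin (21/35 = 60%) and 14 in the more-than-three bin (14/35 = 40%).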
Abundant Transcripts Expressed in P. falciparum Grown in Culture
The BLAST analysis described above enabled us to assign genes to highly abundant SAGE tags; examples are listed in Table . This analysis provided a snapshot of the major transcripts expressed by the parasite. A complete picture of the metabolic pathways used by P. falciparum growing in culture will need to incorporate protein expression and stability; nevertheless, BLAST analysis of abundant SAGE tags provides the first global description of the genes, and hence the metabolic pathways, that might be regulated at the level of transcription. The most abundant transcripts were grouped into functional categories to reveal the transcriptional profile of 3D7 parasites grown in culture (Figure ). Many tags represented housekeeping functions shared by prokaryotic and eukaryotic cells (transcription, translation, chaperones, cytoskeleton), whereas some functional classes were highly specific to the unique life cycle of Plasmodium (membrane-associated proteins involved in invasion, DOXP pathway).
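Grouping abundant transcripts into functional classes amounts to summing tag counts per category. The sketch below assumes a precomputed mapping from tag to functional category (for example, from the BLAST-based gene assignments) and weights each category by tag counts; weighting by number of unique genes would be an equally reasonable alternative, and which weighting underlies the figure is an assumption here. The function name is hypothetical.

```python
from collections import defaultdict


def percent_by_category(tag_counts: dict, tag_to_category: dict) -> dict:
    """Percent of abundant-tag observations falling in each functional category.

    Weighting is by tag counts; tags without an assignment go to "unassigned".
    """
    totals = defaultdict(int)
    for tag, count in tag_counts.items():
        totals[tag_to_category.get(tag, "unassigned")] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no tag counts supplied")
    # Categories are returned in alphabetical order.
    return {category: 100 * n / grand_total for category, n in sorted(totals.items())}
```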
Table 1. Highly expressed genes in the 3D7 library
Interestingly, 5.3% of the highly abundant messages appear to be transcribed from the 6-kb mitochondrial genome, and another 2.1% (thioredoxin, vacuolar ATPase subunit B, ATPase transporter, ubiquinol cytochrome c reductase-like protein) probably encode proteins involved in oxidative metabolism. Therefore, a significant proportion of abundant transcripts (about 7.4% in total) encodes proteins that play a role in oxidative metabolism.
Stage-specific transcripts are highly represented in the list of abundant messages, reflecting the different developmental stages present in the culture. For example, mRNAs encoding cell surface proteins involved in merozoite invasion (Cowman et al., 2000) comprise 8% of the most abundant transcripts. These include merozoite surface proteins 3 and 4 (MSP-3 and -4), rhoptry-associated protein-1 (RAP-1), and merozoite capping protein. Tags corresponding to serine repeat antigen, a soluble protein that is associated with the parasitophorous vacuole, were found at high abundance (0.32%). Surprisingly, a tag representing the gametocyte surface antigen Pfg27/25, shown to be essential for gametogenesis (Lobo et al., 1999), was also present at high abundance (0.25%) in this SAGE library derived from asexual parasites.
Abundant SAGE tags represented major metabolic pathways of the malarial parasite. Because asexual blood stages of Plasmodium do not store energy reserves in the form of glycogen or lipids, glucose taken up from plasma is their primary source of energy (Sherman, 1991). Glucose metabolism is therefore a prominent aspect of intracellular growth, and, not unexpectedly, proteins required for glucose metabolism (aldolase, phosphoenolpyruvate carboxykinase, and triosephosphate isomerase) were represented among the abundant tags.
Although lipids are not used as a major source of energy by P. falciparum, levels of phospholipids, diacylglycerol, and triacylglycerol within the red blood cell increase significantly upon merozoite invasion (Vial and Ancelin, 1998). This increase in total lipid content is associated with a biosynthetic requirement for lipids during formation of the membranes surrounding the parasite (the parasitophorous vacuolar membrane and the tubovesicular membrane).
N-Myristoyl transferase, an enzyme that attaches myristate to proteins and thereby forms lipid-modified proteins, was found among the 187 most abundant tags; however, tags representing proteins involved in lipid biosynthesis were not present.
Intraerythrocytic P. falciparum parasites are capable of de novo synthesis of pyrimidines from precursor molecules (Walsh and Sherman, 1968), with a requirement for para-aminobenzoic acid and folate cofactors. Unlike their hosts, malarial parasites do not use exogenous folate cofactors but instead synthesize them de novo (Scheibel and Sherman, 1988). SAGE data revealed tags corresponding to ribonucleotide reductase, an enzyme of the pyrimidine biosynthetic pathway, and dihydrofolate synthase, an enzyme of the folate pathway. Polyamine biosynthetic enzymes (ornithine decarboxylase and ornithine aminotransferase) were also represented among the SAGE tags.
The unique intracellular niche of malarial parasites results in the expression of many parasite-specific metabolic pathways. For example, growth of asexual parasites within red blood cells is accompanied by degradation of hemoglobin and subsequent detoxification of heme by-products (Foley and Tilley, 1998; Krogstad and De, 1998; Rosenthal and Meshnick, 1998). Tags representing proteins implicated in heme detoxification (histidine-rich proteins I and II, glutathione reductase) were found at high abundance in the SAGE library. Surprisingly, the plasmepsin and falcipain proteases, which play a role in hemoglobin degradation, were not found among the highly expressed genes. This may be because they are transcribed at an earlier stage of the parasite life cycle than the trophozoite stage, which predominated in the culture studied. Alternatively, these transcripts may be present at very low abundance.
Finally, SAGE data revealed expression of mRNA encoding DOXP synthase at high levels (0.09%). The DOXP pathway was recently identified as a parasite-specific metabolic pathway important for isoprenoid biosynthesis (Jomaa et al., 1999). Because this pathway is localized in the apicoplast, a plant-derived organelle of Plasmodium, and is absent from humans, who synthesize isoprenoids through the mevalonate pathway, DOXP metabolism provides a novel target for antimalarial drug development.
Validation of SAGE Data
To confirm the SAGE expression data for asexual-stage parasites, we performed RT-PCR and Northern analysis of several genes with high SAGE tag counts (calmodulin, msp-3, rap-1, and pfg27/25; Figure ). Pfg27/25 is a gametocyte-specific antigen, whereas the other three genes are predicted to be expressed in asexual stages. Because the SAGE library was derived from a culture that contained no detectable gametocytes, pfg27/25 was specifically chosen for RT-PCR and Northern analysis. RT-PCR products for all four genes were generated from asexual-stage mRNA (Figure A); these were cloned, sequenced, and found to correspond to the expected genes. Transcripts of the predicted length for all four genes were also detected by Northern blotting (our unpublished results; Figure B). The presence of pfg27/25 transcripts in the asexual stages of P. falciparum has also been reported in another genome-wide expression analysis using microarrays (Hayward et al., 2000).
To estimate gene expression more quantitatively, we performed quantitative Northern analysis of two highly expressed genes, msp-3 and calmodulin (our unpublished results; Figure B). The molar ratio of msp-3 to calmodulin mRNA was ~3:1, similar to the ratio of their SAGE tag counts (Figure B). Hence, SAGE tag counts appear to correlate well with relative mRNA levels within the cells.
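Comparing SAGE tag counts with a Northern molar ratio is easiest on a log2 scale, where exact agreement is 0 and a twofold over- or underestimate is ±1. This is a hypothetical helper, not part of the original analysis; the msp-3 and calmodulin tag counts are not reproduced in this section, so any numbers passed in are placeholders.

```python
import math


def log2_ratio_discrepancy(count_a: int, count_b: int, measured_ratio: float) -> float:
    """log2(SAGE tag-count ratio / independently measured mRNA ratio).

    0 means the two ratios agree exactly; +1 means SAGE overstates gene A
    relative to gene B twofold; -1 means it understates it twofold.
    """
    if count_a <= 0 or count_b <= 0 or measured_ratio <= 0:
        raise ValueError("counts and the measured ratio must be positive")
    return math.log2((count_a / count_b) / measured_ratio)
```

For the comparison in the text, count_a and count_b would be the msp-3 and calmodulin tag counts and measured_ratio about 3.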
Antisense Transcripts
A surprising observation from SAGE in P. falciparum was the large proportion of tags corresponding to antisense transcripts. Unlike microarrays, SAGE can detect antisense transcription because the orientation of each SAGE tag on the mRNA can be readily determined. A SAGE tag consists of the 4-bp recognition sequence (CATG) of the restriction enzyme NlaIII, which defines the position of each tag in an mRNA transcript, plus the 10 bp of adjacent sequence in the direction of the 3′ poly(A) tail of the RNA molecule. Among tags matching 45 annotated genes with clearly defined 5′ and 3′ ends, 17% lay on the noncoding strand of the cDNA: they consisted of a CATG and the adjacent 10 bp running toward the 5′ end of the annotated transcript, the orientation expected for a tag derived from an antisense transcript. This result was unexpected, so we sought independent confirmation of the SAGE data by strand-specific RT-PCR analysis of asexual blood stages and of sexual stages, and by strand-specific Northern analysis of erythrocytic-stage parasites.
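The tag definition and the sense/antisense call can be sketched in Python. The sketch assumes that transcripts and genes are supplied 5′→3′ on the sense (coding) strand without poly(A) tails, and that the tag comes from the 3′-most NlaIII site, as in standard SAGE; sage_tag and tag_orientation are hypothetical names, and exact substring matching stands in for whatever matching the study used.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
NLAIII_SITE = "CATG"  # anchoring enzyme recognition sequence


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sage_tag(mrna: str, tag_len: int = 14) -> Optional[str]:
    """Tag from the 3'-most NlaIII site: CATG plus the adjacent bases toward the 3' end.

    `mrna` is given 5'->3'. Returns None if there is no site, or if the tag
    would run past the end of the supplied sequence.
    """
    seq = mrna.upper()
    pos = seq.rfind(NLAIII_SITE)
    if pos == -1 or pos + tag_len > len(seq):
        return None
    return seq[pos:pos + tag_len]


def tag_orientation(tag: str, gene_sense: str) -> str:
    """Classify a tag against a gene given 5'->3' on its coding (sense) strand."""
    tag = tag.upper()
    gene = gene_sense.upper()
    if tag in gene:
        return "sense"
    # An antisense transcript is the reverse complement of the sense strand,
    # so its tag reads toward the gene's 5' end on the noncoding strand.
    if tag in reverse_complement(gene):
        return "antisense"
    return "no match"
```

A tag that occurs only in the reverse complement of an annotated gene is what the text counts as an antisense tag.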
We confirmed the presence of antisense transcripts in erythrocytic stages by strand-specific RT-PCR analysis of three genes (calmodulin, rap-1, and msp-3), followed by sequencing of the RT-PCR products to establish gene identity. Based on the SAGE data, we expected all three transcripts to be present in both the sense and antisense orientations, a prediction that was confirmed by RT-PCR (Figure A, lanes 1–12) and sequence analysis. In contrast, a PCR product for hsp-86 was detected only for sense RNA (Figure A, lane 15), consistent with the absence of an antisense SAGE tag for this gene. Importantly, control reactions that excluded reverse transcriptase (lanes 2, 4, 6, 8, 10, 12, 14, and 16) indicated a lack of contaminating genomic DNA, showing that the products of strand-specific RT-PCR were indeed derived from RNA. These data validate the antisense transcripts predicted by SAGE.
The presence of antisense transcripts was also confirmed by strand-specific Northern analysis of calmodulin and msp-3. To control for the specificity of the strand-specific RNA probes, synthetic RNA corresponding to the sense or antisense strand of each gene was included in the experiment. This synthetic RNA consisted of short transcripts (250–300 nt from within the coding region) transcribed in vitro from each gene. Figure B shows that the strand-specific probes specifically detect synthetic antisense RNA (lanes 1 and 2 for calmodulin; lanes 7 and 8 for msp-3) or synthetic sense RNA (lanes 4 and 5 for calmodulin; lanes 10 and 11 for msp-3). With these strand-specific probes, total RNA isolated from asexual-stage parasites was shown to contain both antisense and sense transcripts for calmodulin (~1 kb and ~1.2 kb, respectively; lanes 3 and 6) and for msp-3 (~2 kb; lanes 9 and 12). Therefore, as confirmed by two independent techniques, the antisense tags in the SAGE library reflect antisense transcription in asexual stages of the malarial parasite.
We wondered whether genes expressed in other stages of the Plasmodium life cycle also exhibit antisense transcription. To address this, we tested the sexual stages (zygotes and ookinetes) of the chicken malarial parasite P. gallinaceum for the presence of antisense RNAs. Pgs28 is a major surface antigen of P. gallinaceum sexual stages (Duffy et al., 1993), and transcription of the pgs28 gene has been studied previously. Strand-specific RT-PCR of total RNA from zygotes (0 h) and mature ookinetes (48 h) showed that the pgs28 gene gives rise to both sense and antisense transcripts at different stages of in vitro development (Figure ; lanes 1, 5, and 9 show the antisense PCR product).