Several recent studies have reported large numbers of cDNA sequences that have greatly improved genome annotation and gene models of the P. falciparum genome [17-19]; however, none of these studies included sequences from G or from mosquito stages. In this study, we obtained cDNA sequences from GII, GV, Oo, and four time points of the asexual stages, as well as strand-specific cDNA sequences from LT, Sc, GII, and GV. We identified many previously unknown splice junctions and stage-specific expressed genes, including Oo-specific genes. Alignment of our sequences to the 3D7 genome sequence showed good agreement between the intron-exon junctions in our sequences and those in the predicted gene models (> 6,000 of the 8,553 predicted junctions were supported by 10 or more reads). We also showed that, compared with the asexual stages, Gs appear to have sense transcripts characterized by a smaller variance in antisense levels.
Natural antisense transcripts (NATs) have been widely recognized as an important mechanism of post-transcriptional regulation in both prokaryotic and eukaryotic organisms [27-29]. In the human and mouse genomes, up to 72% of all genomic loci have been found to produce both sense and antisense transcripts [22,29]. Antisense transcripts have been shown to play roles in sense RNA transcription, pre-mRNA splicing, RNA editing, RNA stability and transport, and regulation of translation [27]. NATs can regulate gene expression through several mechanisms [30]. In the transcriptional interference model, two bulky RNA polymerase II complexes on opposite DNA strands may interfere with one another, arresting transcription in one direction. In the RNA masking model, an antisense transcript may mask a splice site on the sense pre-mRNA, leading to an alternative splicing event. Formation of double-stranded RNA, which can trigger RNA editing or RNA interference (RNAi), is another regulatory mechanism and may lead to degradation of sense transcripts. In the chromatin remodeling mechanism, transcription of non-coding antisense transcripts may also be involved in monoallelic gene expression, such as genomic imprinting, X-inactivation, and clonal expression of lymphocyte genes. Antisense transcripts can silence the expression of nearby genes through chromatin remodeling, most likely through the recruitment of histone-modifying enzymes [30]. NATs have previously been reported in P. falciparum [4,18,31,32], and antisense RNA or oligodeoxynucleotides have also been used to regulate gene expression in the parasite [33-35]. Our observation that the majority of the genes in schizont are expressed with S/AS pairs (Figure and ) is consistent with these previous reports and with observations in human and mouse, in which the majority of genes are expressed in both directions [22,29]. Although we do not have additional experimental evidence that the antisense transcripts found here regulate gene expression through a specific mechanism, the presence of high levels of antisense RNA in a stage-specific manner suggests that these antisense transcripts are likely associated with some unique features of the parasite developmental stages. Although the RNAi machinery has been shown to be absent in malaria parasites [36], pairing of antisense and sense transcripts may still affect gene expression through non-RNAi mechanisms. The mapping of ~86% of all antisense reads to intron-containing genes suggests that antisense transcripts may play a role in intron splicing, possibly by masking the sense splice sites; alternatively, the predominance of antisense transcripts in intron-containing genes could be due to promoter activity of the introns. Of course, the antisense coverage of some genes could also be an artifact of the DNA-dependent DNA polymerase activity of reverse transcriptase [37]; however, it is difficult to imagine such an artifact occurring in only one stage, such as G, and not in the others, because all samples were processed similarly and at the same time.
An interesting observation from our strand-specific library sequences was that, across stages, the numbers of genes expressed in a single direction (sense or antisense only) or as different S/AS mixtures changed. Whereas the majority of the genes in schizont are transcribed as mixtures with 10-30% of RNA in the direction opposite to the major transcripts (Figure and ), the sexual stages appear to have a higher proportion of genes transcribed with higher S/AS ratios, suggesting a gradual shift from more genes with transcripts in both directions in early G to more single-direction transcripts in mature G. Similarly, there appeared to be more genes with strand-specific transcripts in the LT stage than in the Sc stage. Interestingly, the mouse X chromosome has been shown to contain fewer bidirectional S/AS transcript pairs than the autosomes, and S/AS pairing is also associated with imprinted loci [22]. High levels of strand-specific transcripts in G stages would lead to fewer S/AS pairs in these stages, raising the possibility that a similar antisense-RNA-mediated mechanism regulates sexual development in the malaria parasite. Further investigation of transcription-direction patterns in more stages will provide additional information on changes in S/AS ratios and on the relationship between S/AS ratio variation and the parasite developmental cycle. The observation may also help explain the lack of correlation between RNA transcript and protein expression levels [38,39], and suggests that RNA expression analyses should use strand-specific cDNA libraries so that transcriptional patterns can be characterized more precisely.
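The stage comparison above rests on binning genes by how much of their signal comes from the minor strand. The sketch below shows one way to do that from per-gene sense and antisense read counts. It assumes the opposite-direction fraction is computed as minor-strand reads over total reads; the 10-30% band follows the text, while the other cutoffs, the function names (`minor_strand_fraction`, `classify`, `category_proportions`), and the toy counts are hypothetical and are not taken from our data.

```python
from collections import Counter
from typing import Dict, Tuple


def minor_strand_fraction(sense: int, antisense: int) -> float:
    """Fraction of reads on the less abundant strand (0.0 to 0.5)."""
    total = sense + antisense
    if total == 0:
        raise ValueError("gene has no reads")
    return min(sense, antisense) / total


def classify(sense: int, antisense: int) -> str:
    """Assign a gene to an S/AS category.

    The 10-30% band mirrors the text; the other cutoffs are
    illustrative assumptions.
    """
    f = minor_strand_fraction(sense, antisense)
    if f == 0.0:
        return "single direction"  # sense-only or antisense-only
    if f < 0.10:
        return "<10% opposite"
    if f <= 0.30:
        return "10-30% opposite"
    return ">30% opposite"


def category_proportions(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Proportion of genes in each category for one stage's library."""
    tally = Counter(classify(s, a) for s, a in counts.values())
    n = sum(tally.values())
    return {cat: k / n for cat, k in tally.items()}


# Hypothetical per-gene (sense, antisense) read counts; NOT data from this study.
toy_stage = {
    "gene_A": (850, 150),  # 15% antisense
    "gene_B": (500, 0),    # sense only
    "gene_C": (80, 20),    # 20% antisense
    "gene_D": (300, 300),  # 50%, i.e. ">30% opposite"
}
print(category_proportions(toy_stage))
```

Running the same function on each stage's library and comparing the "single direction" share between, for example, GII and GV would express the gradual shift described above as a single number per stage.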
The mature gametocyte (GV) is a unique sexual stage that is developmentally arrested but can quickly resume development, producing male and female gametes as soon as it is taken up into a mosquito midgut. In the rodent malaria parasite Plasmodium berghei, it has been shown that many genes are transcribed but not translated in G; this is controlled by a mechanism termed "translational repression", mediated by the DDX6-class RNA helicase DOZI (development of zygote inhibited), which is found in a complex with mRNA species in cytoplasmic bodies [40-42]. Our data showing the presence of large numbers of antisense transcripts in G (Additional file 1 and Additional file 2) suggest that these antisense transcripts could also be involved in gene expression regulation, either as part of the described translational repression complex or through other unknown mechanisms.
Using strand-specific reads, we were able to confirm ~84% and ~73% of the intron-exon junctions in the predicted gene models when ≥ 1 or ≥ 10 bridging reads were required, respectively. Compared with the sequences from two previous reports [17,19], our strand-specific sequences matched a higher percentage of the predicted intron-exon junctions. The performance of a set of libraries in identifying splice junctions depends on read length, base quality, total number of reads, the fraction of transcripts expressed at appreciable levels, and sequence alignment parameters. One likely explanation for the higher match rate of our data is that our libraries included transcripts from G and Oo, whereas the other two studies sequenced asexual stages only and would therefore miss splice junctions used in sexual stages. Our sequences also identified nearly 700 putative new intron-exon junctions using a PPV = 0.9 cutoff. As expected, many of the new junctions were from genes that were expressed in Gs and had not been characterized in previous studies. Another potential explanation for the lower junction coverage of the data from Otto et al. could be amplification bias caused by a high PCR extension temperature. We have shown that using an extension temperature of 60°C can increase sequence coverage of non-coding regions [43]. Intron-exon junctions in the 5' or 3' UTR may not amplify well under the standard PCR conditions that were likely employed by Otto et al. [17]. We used a 60°C extension temperature, and Bartfai et al. [19] used a linear amplification method that avoided biased sequencing of the AT-rich Plasmodium genome; these two approaches therefore produced similar numbers of junction calls (Figure ).
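The junction-confirmation percentages above amount to counting, for each predicted intron, the strand-specific reads that bridge it and then applying a minimum-read threshold. A minimal sketch, assuming junctions are keyed by hypothetical (chromosome, intron start, intron end, strand) tuples and that bridging-read counts have already been extracted from the alignments; the function name `fraction_confirmed`, the chromosome labels, and all coordinates and counts are illustrative, not our pipeline or data.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical junction key: (chromosome, intron start, intron end, strand).
Junction = Tuple[str, int, int, str]


def fraction_confirmed(
    predicted: Iterable[Junction],
    bridging_counts: Dict[Junction, int],
    min_reads: int,
) -> float:
    """Fraction of predicted junctions supported by >= min_reads bridging reads.

    Because the strand is part of the key, reads from the opposite strand
    do not confirm a junction, which is only possible with stranded data.
    """
    predicted = list(predicted)
    if not predicted:
        raise ValueError("no predicted junctions")
    hits = sum(1 for j in predicted if bridging_counts.get(j, 0) >= min_reads)
    return hits / len(predicted)


# Toy example; coordinates and counts are made up.
predicted = [
    ("chr1", 1000, 1150, "+"),
    ("chr1", 2000, 2210, "+"),
    ("chr2", 500, 640, "-"),
    ("chr2", 900, 1010, "-"),
]
observed = {
    ("chr1", 1000, 1150, "+"): 25,
    ("chr1", 2000, 2210, "+"): 3,
    ("chr2", 500, 640, "-"): 12,
    ("chr2", 900, 1010, "+"): 40,  # opposite strand: does not confirm
}
for threshold in (1, 10):
    print(threshold, fraction_confirmed(predicted, observed, threshold))
```

Raising the threshold from 1 to 10 bridging reads can only lower the confirmed fraction, which matches the drop from ~84% to ~73% reported above; with unstranded reads, the strand field would have to be dropped from the key.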
Sequences from Oo also allowed us to systematically characterize transcripts expressed specifically in this stage (Additional file 6). Oo is a transient stage in the mosquito, and it has been difficult to obtain sufficient parasite material for genome-wide analysis. Although the majority of the sequences we obtained were mosquito transcripts, they allowed us to identify many genes that were upregulated in or uniquely expressed at this stage, including genes encoding Oo capsule protein and Oo secreted proteins. In addition, 26 genes encoding conserved Plasmodium proteins were specifically expressed or upregulated in Oo. Our work represents the first large-scale transcriptional analysis of the Oo stage of P. falciparum, and the information presented here provides new insights into gene expression and regulation in this transient stage.