|Home | About | Journals | Submit | Contact Us | Français|
Asexual development in Toxoplasma gondii is a vital aspect of the parasite's life cycle, allowing transmission and avoidance of the host immune response. Differentiation of rapidly dividing tachyzoites into slowly growing, encysted bradyzoites involves significant changes in both physiology and morphology. We generated microarrays of ~4,400 Toxoplasma cDNAs, representing a minimum of ~600 genes (based on partial sequencing), and used these microarrays to study changes in transcript levels during tachyzoite-to-bradyzoite differentiation. This approach has allowed us to (i) determine expression profiles of previously described developmentally regulated genes, (ii) identify novel developmentally regulated genes, and (iii) identify distinct classes of genes based on the timing and magnitude of changes in transcript levels. Whereas microarray analysis typically involves comparisons of mRNA levels at different time points, we have developed a method to measure relative transcript abundance between genes at a given time point. This method was used to determine transcript levels in parasites prior to differentiation and to further classify bradyzoite-induced genes, thus allowing a more comprehensive view of changes in gene expression than is provided by standard expression profiles. Newly identified developmentally regulated genes include putative surface proteins (a SAG1-related protein, SRS9, and a mucin-domain containing protein), regulatory and metabolic enzymes (methionine aminopeptidase, oligopeptidase, aminotransferase, and glucose-6-phosphate dehydrogenase homologues), and a subset of genes encoding secretory organelle proteins (MIC1, ROP1, ROP2, ROP4, GRA1, GRA5, and GRA8). This analysis permits the first in-depth look at changes in gene expression during development of this complex protozoan parasite.
Toxoplasma gondii is a ubiquitous protozoan parasite able to infect a broad range of warm-blooded animals, including humans (12, 21). This parasite is a member of the Apicomplexa family, which includes the causative agents of malaria (Plasmodium spp.) and chicken coccidiosis (Eimeria spp.). Toxoplasma is distributed worldwide, with human infection rates ranging from 15 to 85% depending on the country. The prevalence of this parasite is due to its effective means of dissemination and ability to resist immune system clearance. Both properties depend on the complex developmental biology of this single-celled eukaryote. For example, one form of dissemination is through the sexual cycle, which occurs only in felines. The product of this cycle, oocysts, are passed in the feces of cats and, after 2 to 3 days in the environment, are infectious if ingested. Sporozoites within the oocysts enter the asexual cycle by differentiating into rapidly dividing tachyzoites that disseminate throughout the intermediate host. Pressure from the immune response causes tachyzoites to differentiate into slowly dividing bradyzoites, which form cysts within muscle and brain cells (13). Bradyzoite cysts are somehow protected from the host immune response and establish a life-long chronic infection. If ingested, e.g., through eating undercooked meat, bradyzoites are released from the tissue cysts in the gut, invade epithelial cells and differentiate into tachyzoites that initiate a new asexual cycle. Thus, T. gondii is able to disseminate via both the sexual and asexual cycles.
In most cases of human toxoplasmosis, infection ultimately becomes chronic with bradyzoites remaining essentially dormant and causing no overt disease. Serious disease can occur, however, when bradyzoites reactivate and differentiate into tachyzoites. In immunocompromised individuals the result can be potentially fatal encephalitis. Effective treatment for chronic toxoplasmosis is not available due to the inability of drugs to eliminate tissue cysts. The differentiation process, therefore, represents an important target for both vaccine and drug development.
Asexual development in T. gondii is an attractive model for the study of development in protozoa. Effective methods of transfection and targeted disruption of genes (which is facilitated by the haploid state of the asexual stages) make Toxoplasma amenable to genetic manipulation (3, 11). Various methods of inducing differentiation from tachyzoites to bradyzoites in vitro (28) are available and allow a detailed investigation of this biological process. The genetic regulatory mechanisms that control differentiation in Toxoplasma are unknown. Several studies have identified stage-specific proteins, including the surface antigens BSR4 (19), CST1 (39), SAG2C/D (20), SAG4A (22); metabolic enzymes (pyruvate kinase, lactate dehydrogenase, and enolase [9, 14, 37]); and stress response proteins (BAG1/5 [4, 25]). Regulation of expression occurs at both the posttranscriptional and the transcriptional levels (19, 38).
Stage-specific transcription is an important aspect of the asexual cycle; the parasite must alter expression of an unknown number of genes in order to change its physiology. EST sequencing of an in vivo bradyzoite cDNA library has identified a number of potential bradyzoite-specific genes based on the absence of these genes in tachyzoite cDNA libraries (22). A subtractive library approach has been used to identify potential developmentally regulated genes, although this approach does not provide a measure of mRNA abundance or timing of expression (36). Preliminary promoter analysis has identified sequences responsible for the bradyzoite-specific expression of a heat shock protein, BAG1 (5), and the tachyzoite-specific expression of NTPase (24), but a common promoter element or transcription factor that regulates stage-specific expression has yet to be identified.
A key step in dissecting the regulatory pathways that control bradyzoite differentiation is to expand the known set of developmentally regulated genes and to identify different patterns of gene expression. To accomplish this, we have generated T. gondii cDNA microarrays and used them to analyze gene expression during in vitro bradyzoite development. Here we describe the identification of several developmentally regulated genes, as well as the expression patterns of these genes. The results provide a comprehensive view of transcription regulation during differentiation and reveal novel aspects of bradyzoite biology.
T. gondii DNA microarrays were constructed from the previously described bradyzoite cDNA library (8). PCR products (generated by using 5 μl of overnight bacterial culture as a template) were ethanol precipitated, dried, resuspended in 3× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate), and spotted onto poly-l-lysine-treated glass slides as previously described (15).
All tissue culture was done with a Pru strain (the BSG-4 clone) of Toxoplasma that has been engineered to express green fluorescent protein (GFP) under the control of the LDH2 promoter, thus conferring bradyzoite-specific expression of GFP (U. Singh, J. L. Brewer, and J. C. Boothroyd, unpublished data). In vitro bradyzoite development was induced with the high-pH (8.1) method as previously described (27, 29, 35). At 48 h after invasion, the flasks containing tachyzoites were scraped, passed sequentially through a 27-gauge needle to release parasites from the host cells, and pelleted by spinning at 2,600 rpm for 15 min at room temperature. At 48, 72, and 96 h after induction of bradyzoite formation, the medium was removed from the flasks containing bradyzoites, followed by scraping and digestion in 170 mM NaCl-pepsin (0.1 mg/ml)-60 mM HCl for 1 min at 37°C and then neutralization with 94 mM Na2CO3 (22).
Total RNA was isolated from the pelleted tachyzoites and bradyzoites by using Trizol reagent (Gibco-BRL). RNA samples were electrophoresed and stained with SYBR green (Molecular Probes), and the fluorescence intensity of both the human and Toxoplasma large-subunit ribosomal RNAs (the human subunit is larger and therefore distinguishable from the Toxoplasma subunit) was determined by using ImageQuant (Molecular Dynamics). The accuracy of the SYBR green measurements was confirmed by Northern blot probing for the large subunit RNA of Toxoplasma (data not shown).
Toxoplasma cDNA probe preparation was performed essentially as described previously (15). Briefly, 3 μg of Toxoplasma RNA was converted to first-strand cDNA by using Superscript II (Life Technologies) and an oligo-dT18 primer (New England BioLabs). The cDNA was labeled with Cy5-dUTP (Amersham Pharmacia Biotech) with Klenow enzyme and random nanomer. Then, 500 ng of reference sample probe (prepared by PCR amplifying the polylinker region of the pBluescript plasmid [Stratagene] with T3 and T7 primers) was labeled with Cy3-dUTP. The Cy5- and Cy3-labeled samples were mixed, hybridized to a microarray overnight at 65°C, washed (15), and scanned with a GenePix 4000 microarray scanner (Axon Instruments).
For the BAG1 probing experiment, 200 ng of a PCR-amplified BAG1 cDNA was labeled with Cy5-dUTP with Klenow enzyme and hybridized to a microarray after mixing it with a Cy3-labeled reference as described above. Spots with a channel 2 (Cy5) mean net fluorescence intensity of >500 and a Cy5/Cy3 ratio of >1.0 were considered positive for BAG1 hybridization.
Microarrays were analyzed with the Scanalyze program (written by Mike Eisen and available on the Web) to determine the fluorescence intensities of the two dyes at each spot. Spots of poor quality were removed from further analysis based on the following criteria: the channel 1 (reference) net mean intensity must be ≥250 pixels, and the percentage of channel 1 pixels 1.5 times above background must be ≥66%. Spots must also have a channel 2-channel 1 normalized net signal intensity ratio of ≥0.1. This filter was used because channel 2-channel 1 normalized net ratios of <0.1 were indistinguishable from spots with channel 2 signal due to background only.
The signal intensities of the microarrays were normalized by adjusting the channel 2 intensity (for the Cy5-labeled Toxoplasma cDNA) so that the mean Cy5/Cy3 (Toxoplasma/reference) ratio for each microarray was equal to 1.0. Due to the bradyzoite bias of the EST library used to make these microarrays, a large number of spots in the bradyzoite arrays have high Cy5/Cy3 ratios, while fewer spots in the tachyzoite arrays have high Cy5/Cy3 ratios. This increase in signal intensity results in a higher Cy5/Cy3 mean for the bradyzoite arrays; the prenormalization means for one set of arrays, for example, were 1.74 (day 2 tachyzoites), 2.46 (day 2 bradyzoites), 3.29 (day 3 bradyzoites), and 3.28 (day 4 bradyzoites). The higher mean signal intensity for the bradyzoite arrays artificially lowers the Cy5 signal when we normalized it to a mean of 1.0. To avoid this effect, we removed all spots that were greater than one standard deviation above the prenormalized mean for the bradyzoite arrays (ranging from 13 to 15% of the total spots) and then calculated the normalization factor based on the remaining spots.
For each time point, the mean Toxoplasma cDNA versus reference value for each spot across all four microarrays was determined. Spots with only one datum point and spots with highly variable values (average deviation of >40% of the mean) were also excluded from further analysis. The fold change values were then calculated by dividing the bradyzoite versus reference mean by the tachyzoite versus reference mean for each spot. The values for all spots within a single contig were then averaged. When multiple contigs were found to represent a single gene, based on overlap identified by using database searches and our own sequencing data, the values for all ESTs within those contigs were averaged. Data for these types of genes are presented with all contigs listed (Table (Table1,1, LDH2, for example).
Statistical analysis of microarrays (SAM) was used according to recommended protocols (software and instructions are available at http://www-stat.stanford.edu/~tibs/SAM/). We performed two-class, unpaired sample analysis by treating the four day 2 tachyzoite arrays as one “class” and the four bradyzoite arrays at each time point as the second “class.” SAM identifies significant differences in mean expression levels based on repeat measurements of a given spot. For each set of experiments, a threshold “delta” value is defined (in our case, the absolute difference between the bradyzoite/reference and the tachyzoite/reference) that determines how different the two data sets must be in order to be considered significant. The delta value influences the false discovery rate (FDR), an estimate of the percentage of spots that will be falsely determined to be significant, such that higher delta values give lower FDRs and vice versa. The ability to set thresholds allows asymmetric cutoffs to be used for different categories of genes.
For our analysis, we used delta values that gave FDRs of ca. 10% when we tested for “induction” and “constitutive” expression and delta values that gave FDRs of ca. 20% when we tested for “repression” and “intermediate” expression. The exact delta values and corresponding FDRs for each time point (at the ~10% FDR level for induction and constitutive expression and the ~20% FDR level for repression and intermediate expression, respectively) were as follows. For day 2 bradyzoites, Δ = 1.09 and FDR = 10.24% and Δ = 0.79 and FDR = 20.74%. For day 3 bradyzoites, Δ = 1.24 and FDR = 10.29% and Δ = 0.96 and FDR = 20.43%. For day 4 bradyzoites, Δ = 1.15 and FDR = 10.48% and Δ = 0.86 and FDR = 20.04%. The different criterion for repressed and intermediate genes was used due to the challenges of detecting decreases in transcript abundance, as described in Results, and the lower difference in transcript abundance (lower delta value) required to be considered intermediate. FDR values in the 10 to 20% range are expected to give reliable results: previous work characterizing the application of SAM found that an FDR of 18%, combined with a fold change threshold of 1.5, was very accurate (based on Northern blot confirmation) at identifying differentially expressed genes (33). Because SAM is designed to measure each spot on a microarray as a distinct gene, we were not able to obtain a single score for genes spotted multiple times. Data for each contig were therefore grouped, and the fraction of spots called significant by SAM was determined.
A total of 10 μg of total parasite RNA was separated by agarose gel electrophoresis. Probes were prepared from the corresponding ESTs in the in vivo bradyzoite cDNA library and hybridized with Express-Hyb (Clontech). The same blot was sequentially stripped and hybridized with each probe. The blot was exposed to a PhosphorImager cassette (Molecular Dynamics) and analyzed by using the ImageQuant analysis program (Molecular Dynamics).
To study the regulation of a large set of Toxoplasma genes simultaneously, we constructed cDNA microarrays according to previously described methods (15). An in vivo bradyzoite EST library (22) was used to PCR amplify DNA for spotting on the microarrays. A total of 4,224 ESTs from this library are represented on each microarray. Of the 2,353 ESTs for which quality sequence was available, 613 separate contigs have been identified, 378 of which are represented by two or more ESTs. The redundancy of this library allows expression levels of genes represented by multiple ESTs to be measured at least twice on a single array. The 1,871 spots corresponding to unsequenced ESTs are expected to include a large number of additional genes not represented in the sequenced EST set. To augment this EST set, 95 highly represented ESTs from a tachyzoite library (1) were spotted, in duplicate, on the microarrays. The in vivo bradyzoite library (generated from cysts from infected mouse brains) contains a low level of inserts derived from mouse cDNA due to host contamination during RNA preparation (22). Such spots (144 [3.4%] of the total spots), identified by probing the microarrays with Cy5-labeled mouse and human cDNAs, were excluded from the differentiation time course analysis. Such spots could be true host genes or Toxoplasma orthologues that are so highly conserved that substantial cross-hybridization occurs.
Although extremely powerful, microarrays have technical limitations. For example, they will not distinguish differences that are a result of changes in transcriptional activity versus subsequent decay rates. Throughout this study, we use the terms “induced” and “repressed” gene expression to refer to changes in mRNA levels without meaning to imply that the change is due to synthesis versus decay. Genes that are members of highly conserved gene families can give misleading results because of cross-hybridization and cannot be distinguished by using cDNA microarray technology since hybridization by transcripts from another family member could dampen, obscure, or invert the fold change values, depending on relative transcript abundance in the different RNA preparations. Note, however, that Northern blots are also subject to this potential artifact if the mRNA sizes are the same for the two genes. Absolute proof, therefore, for differences predicted by the microarrays depends on detailed studies of each gene by using methods such as real-time PCR with gene-specific primers. The fact that the gene family must be highly similar for cross-hybridization (>80% identical at the nucleotide level) means this caveat will apply rarely; we note all such instances wherever there are indications that this situation might exist.
Changes in gene expression were monitored by comparing mRNA levels of in vitro tachyzoites to mRNA levels of parallel cultures induced to develop into bradyzoites by growth in high-pH media (28). Total RNA was isolated from intracellular tachyzoites 2 days postinvasion and from developing bradyzoites at 2, 3, and 4 days after bradyzoite induction. Later tachyzoite time points were not analyzed because parasites began to lyse the host cells by 3 days postinvasion and were therefore subject to extracellular conditions which would likely affect transcript levels for many genes. Day 2 tachyzoites therefore provided the best sample for comparing transcript levels between tachyzoites and bradyzoites. Pru strain parasites containing an LDH2-GFP fusion construct were used in these experiments, and the efficiency of bradyzoite induction was assessed based on GFP expression. Induction efficiencies were ~75% (day 2 bradyzoites) and ~85 to 90% (day 3 to 4 bradyzoites) (data not shown); these levels are comparable to what has been reported for in vitro differentiation (28). Two in vitro differentiation time course analyses were performed, yielding two independent RNA samples for all time points.
Since the parasites are grown in human foreskin fibroblast cells, some degree of human RNA will inevitably be present in the parasite RNA preparation. To determine the amount of such contamination, the fluorescence intensity of the human and Toxoplasma large-subunit rRNAs was assessed in SYBR green-stained agarose gels and used to normalize the samples, such that equivalent amounts of Toxoplasma RNA were labeled with the Cy5 fluorophore during cDNA synthesis. The labeled cDNAs were mixed with a Cy3-labeled reference sample and hybridized to the microarrays. The reference sample consists of the pBluescript polylinker amplified with T3 and T7 primers. These are the same primers used to amplify the spotted cDNAs, so all spots should hybridize to the reference probe in proportion to the molar amount of DNA present in the spot.
Fold changes in the bradyzoite versus tachyzoite expression levels for each spot can be calculated by dividing the ratio of signal intensities for bradyzoite versus reference by the ratio for tachyzoite versus reference. Data obtained with the reference sample, termed type II experiments (31), yielded fold change results (correlation coefficient = 0.83) similar to those of experiments in which Cy3-labeled tachyzoite and Cy5-labeled bradyzoite cDNAs were hybridized on the same microarray, type I experiments (Fig. (Fig.1).1). This demonstrates the validity of using the vector-derived reference, which is considerably more powerful since it allows us to normalize signal intensities for each array against a common reference and thus make direct comparisons between time points. It also allows us to compare signal intensities between spots on the same microarray and to perform the SAM statistical analysis.
To determine the reproducibility of the microarray data, RNA samples from both time courses were hybridized twice to microarrays to give four values for each spot at all time points. The overall microarray results were highly reproducible, both within and between the two time courses. The average correlation coefficient values for all pairwise comparisons of replicates within a given time point were as follows: 0.72 ± 0.06 (day 2 tachyzoites), 0.71 ± 0.06 (day 2 bradyzoites), 0.85 ± 0.02 (day 3 bradyzoites), and 0.77 ± 0.05 (day 4 bradyzoites). The lower correlation values for the day 2 tachyzoites and day 2 bradyzoites likely reflect the fact that many genes on the microarray are expressed at low levels in tachyzoites and early bradyzoites which can result in more variable measurements (33). To determine which genes showed significant changes in transcript levels, we combined fold change criteria (based on the bradyzoite versus tachyzoite expression levels) with statistical analysis of differences in transcript abundance, as described in Materials and Methods and below.
The average bradyzoite versus tachyzoite transcript level for each gene, represented by either a single EST or a contig of multiple ESTs, was used to classify genes as induced, repressed, constitutive, or intermediate with regard to changes in transcript levels during differentiation. Genes considered induced during tachyzoite-to-bradyzoite differentiation have a mean fold change (mean of all ESTs within a contig) of ≥1.95 and a mean minus standard error value of ≥1.50. The standard error of the mean was used in these measurements to account for differences in the number of spots for each gene that are present on the array. Genes considered repressed have a mean plus standard error value of ≤0.66. This value was chosen based on the prediction that the stability of tachyzoite transcripts in differentiating parasites and low levels of tachyzoite contamination (switch efficiency is 95% at best, as described above) will make it difficult to detect repression ratios of ≤0.50. We were also interested in identifying constitutively expressed genes. Genes in this category have a mean bradyzoite-to-tachyzoite ratio of between 0.75 and 1.25. For these genes, the mean minus standard error value must be >0.66 and the mean plus standard error value must be <1.50. Contigs that do not overlap with the induced, repressed, or constitutive cutoffs when the standard error value is added or subtracted from the mean are referred to as intermediate to distinguish them from contigs that were not placed in any category due to large standard error values. Averaging of bradyzoite/tachyzoite ratios for all spots within a contig allows for multiple, independent measurements to be considered for most contigs (although some multiply represented contigs occasionally yielded only one good spot). This gives greater confidence to the results although, as expected, not all ESTs in a contig gave exactly similar results. One explanation for this variability is that the 3′ ends of mRNAs are converted to cDNA more consistently by reverse transcriptase. For this reason, the 3′-most ESTs will give more consistent results.
Variability could also be due to contamination or misidentification of spots. To determine the frequency of such contamination or database errors, we hybridized a Cy5-labeled BAG1 probe to a microarray and compared the hybridizing spots to those identified as BAG1 in the microarray database (based on EST sequences). BAG1 was used as a probe because it is highly represented in the in vivo bradyzoite EST library used to make the microarrays and therefore provides a large sample to use in the validation of spot identity. Of 70 spots that were bound by the BAG1 probe (and for which sequence data were available), only 1 corresponded to an EST that was not labeled as BAG1, indicating that the level of contamination and/or database error is very low.
To determine the statistical significance of the fold change measurements, we used the SAM software program (33). Developmentally regulated contigs were defined as those that meet our fold change criteria and have ≥50% of spots within the contig called significantly different by SAM (for induced, repressed, and intermediate genes). To be considered “constitutively” expressed, genes must meet our fold change criteria and have <50% of spots called significantly different by SAM at all three time points. Results for singleton ESTs were analyzed by using the same SAM parameters applied to the contigs. For these ESTs the single spot must meet both the fold change and the SAM criteria.
Based on the above criteria, we identified 39 genes, represented by 37 contigs (including six pairs in which two contigs cover different regions of the same gene) and 8 singleton ESTs, that were induced at one or more time points and 20 genes, represented by 19 contigs (including two pairs) and 3 singleton ESTs, that were repressed at one or more time points. We also identified 312 genes that were “constitutively” expressed at all three time points (represented by 182 contigs and 130 singleton ESTs). Although several genes with intermediate levels of induction or repression were identified, we have chosen to focus on only those genes where changes in transcript abundance are dramatic (i.e., meet the induced or repressed fold change cutoffs). Fold change and SAM results for all 466 contigs for which data were obtained and the 11 ESTs that are developmentally regulated are available from our supplemental website: (www.stanford.edu/~mdcleary/bradyzoitetimecourse.xls).
As a further check on the microarray approach, we selected several genes for independent analysis by Northern blot assay. Figure Figure22 shows the results for two induced genes (SRS9, Ctoxoqual_4130, and a mucin domain-containing protein, Ctoxoqual_3897), a repressed gene (Ctoxoqual_2199), and a constitutive gene (GRA2, Ctoxoqual_604). In each case, the Northern blot supports the designation made by microarray analysis. However, the fold changes for SRS9 and Ctoxoqual_2199 appear by Northern blot to be considerably higher than those predicted by microarray analysis. This is due to the absence of a signal in one of the lanes, making it impossible to give a ratio for SRS9 or Ctoxoqual_2199, but the fold change is clearly more than the ~3-fold predicted by microarray. This underestimating aspect of microarrays has been previously noted (see, e.g., reference 17) and is likely due to the fact that some background hybridization occurs for each spot on the microarrays, providing a denominator even when the number of transcripts may be below the limit of detection by Northern blot. These results show that the microarrays correctly identify genes as up- or downregulated, but the degree of induction or repression (i.e., the ratio value) should be taken as a relative rather than literal statement of the change in abundance for a given transcript.
For SRS9, the results may also be affected by the fact that this gene is part of a gene family, some members of which are highly conserved (20). Although no gene with the requisite similarity to SRS9 has yet been identified, such could conceivably exist and distort the results.
Previously described stage-specific genes were clearly identified as such by microarray analysis. We examined genes for which published Northern blot or reverse transcription-PCR (RT-PCR) data indicate developmental changes in transcript abundance: BAG1, SAG4A, LDH2, SAG1, and NTP1. As expected, all were differentially expressed according to the microarray analysis (Table (Table1).1). We also observed the expected changes for genes encoding differentially expressed proteins whose regulation has not previously been studied at the transcript level (e.g., SAG2C/D and SRS2). Ctoxoqual_4436 encodes a previously described deoxyribose phosphate aldolase homologue (8) that was identified as bradyzoite specific in the previously reported subtracted bradyzoite library (36). Of the five contigs identified in the subtracted bradyzoite library, two, Ctoxoqual_819 and Ctoxoqual_4140, were identified as constitutively expressed by our microarray analysis. The reason for this discrepancy is not known but could include differences in the parasite strains and/or switch conditions used.
In addition to the 11 contigs that represent previously known stage-specific genes, an additional 48 contigs were identified as developmentally regulated, i.e., induced or repressed during bradyzoite differentiation (Table (Table2,2, and supplemental websiteatwww.stanford.edu/~mdcleary/bradyzoitetimecourse.xls).Putative functions were assigned to some of these genes based on previously performed BLAST homology searches (Toxoplasma EST database at http://ParaDB.cis.upenn.edu/toxo/index.html). Among the singleton ESTs that encode developmentally regulated genes, only tgzz64d12.r1 (GenBank AA520817) had homology to a known gene, hypothetical human protein FLJ10420, with a tBLASTx score of p = 10−22.
To obtain additional sequence information for use in homology searches, we performed 5′ and 3′ RACE (rapid amplification of cDNA ends) on bradyzoite cDNA with primers based on the EST sequences of 20 induced contigs. Homologies obtained by these two methods revealed several previously unrecognized aspects of bradyzoite biology. The set of induced genes include three novel putative surface proteins: a new SAG1-related sequence, SRS9 (Ctoxoqual_ 4130); a mucin-domain containing homologue of a Cryptosporidium parvum surface protein (Ctoxoqual_ 3897); and a homologue of a Plasmodium falciparum glutamic acid dipeptide repeat family of surface proteins (Ctoxoqual_4245). The tBLASTx values for the predicted open reading frames of these genes (based on available sequences, not the entire gene) are as follows: SRS9, p = 2 × 10−20 (homology to BSR4, one of the bradyzoite-specific members of the SAG1-related gene family, where p is the probability of homology occurring by chance); Ctoxoqual_3897, p = 5 × 10−39 (homology to mucin domains I and II of the C. parvum GP900 protein); and Ctoxoqual_4245, p = 10−11 (homology to the RESA H3 antigen of P. falciparum). The predicted protein sequence of the Ctoxoqual_4245 contig also contains a series of conserved VEE repeats that are found in the D260, RESA-H3, and LSA-3 proteins of P. falciparum.
Three genes encoding protein-modifying and/or metabolic enzymes were also identified among the induced contigs. A methionine aminopeptidase homologue (Ctoxoqual_4080, p = 2.3 × 10−28, homology to D. melanogaster methionine aminopeptidase), a bacterial oligopeptidase homologue (Ctoxoqual_1284, p = 1.9 × 10−11, homology to Lactococcus lactis oligopeptidase F), and a contig with homology to members of the class II pyridoxal phosphate-dependent aminotransferase family (Ctoxoqual_4095, p = 4 × 10−22, homology to 8-amino-7-oxononanoate synthase of Synechocystis sp.) were all induced in bradyzoites. The remaining induced contigs had no significant homology to any known genes.
Microarray analysis identified several genes whose transcript levels decrease during bradyzoite development. This set includes several previously described genes that were not known to be developmentally regulated: ROP1, ROP2, ROP4, GRA1, GRA5, GRA8, and MIC1. A glucose-6-phosphate dehydrogenase (G6PD) homologue (p = 2 × 10−20, homology to Caenorhabditis elegans G6PD) was also shown to be repressed in bradyzoites (Table (Table2).2). The other repressed contigs did not display any significant homology to known genes.
Constitutively expressed genes include many that encode housekeeping proteins, such as actin and ribosomal proteins. For example, 13 of the 18 contigs encoding different 60S ribosomal proteins were constitutively expressed at all three time points. None of the other five contigs fell into the repressed or induced categories at any time point; instead, they were expressed at intermediate levels or could not be placed in any category due to large standard error values.
About 40% of the clones spotted onto the microarray were not sequenced in the original EST sequencing effort (22). Therefore, to identify additional genes of interest, we sequenced cDNA clones corresponding to spots that were induced by microarray analysis. To avoid sequencing previously known bradyzoite-specific genes, we first hybridized the microarrays with Cy5-labeled fragments of known genes that are highly represented on the arrays (BAG1, LDH2, ENO1, MAG1, Ctoxoqual_4436, SAG4A, and SAG2C/D) and excluded any spots that were bound by these probes. Of 273 previously unsequenced spots that showed >2.0-fold induction in a single day 3 bradyzoite versus day 2 tachyzoite type II experiment, 55 were picked for sequence analysis and 52 yielded quality sequence. Five of these were novel sequences with no significant BLASTn homology to sequences in the Toxoplasma EST database and no significant BLASTx homology with any known protein. These sequences have been deposited in GenBank (under accession numbers BM351821, BM351822, BM351823, BM351824, and BM351825). Of the other 47 clones, 7 contained ESTs of known genes that the prescreen hybridization probes failed to detect (probably because the amount of DNA for those spots was low on the prescreen arrays). Of the remaining 40 clones, 8 overlapped with singletons from the Toxoplasma EST database and 32 overlapped with contigs represented elsewhere on the microarrays but not used as a probe in the prescreen hybridizations. A total of 17 of these 32 contigs are in the set of contigs we found to be induced in bradyzoites. The remaining 15 were not part of the induced set and thus represent contigs that appeared to be induced in one microarray experiment but, upon averaging across multiple arrays, failed to meet the stringent criteria we set for induction. This demonstrates the importance of the repeat experiments and statistical analysis in determining which contigs give reproducible and significant results.
To identify coordinately regulated genes, we clustered contigs based on patterns of significant changes in gene expression. All genes used for clustering met, at one or more time points, the fold change criteria for induction or repression, with ≥50% of spots called significant by SAM. At time points where the fold change values did not meet our criteria for induction or repression but ≥50% of the spots were determined to be significant by SAM, the data were included to show intermediate values and values very close to the fold change cutoffs. This clustering approach revealed several distinct classes of regulated genes (Fig. (Fig.3,3, rightmost columns). Genes that are induced in bradyzoites fell into five general categories: transient (day 2 only, days 2 and 3, or day 3 only), mid-late (days 3 and 4), late (day 4 only), all (days 2, 3, and 4), and all-high (levels >3-fold at days 2, 3, and 4). Clustering of genes whose transcript levels decrease during bradyzoite development resulted in three groups: early (down in day 2, 3, and 4 bradyzoites), middle (down in day 3 and day 4 bradyzoites), and late (down only in day 4 bradyzoites).
One unique advantage of our “type II” experimental design is the ability to compare signal intensities between spots on the same microarray using the common vector-derived reference. Although oligonucleotide microarrays can provide a measure of transcript abundance (10), spotted cDNA microarrays are not normally used to obtain this information due to variability in the amount of DNA in each spot. In our experiments, the cDNA signals are normalized to a reference DNA fragment common to all spots, thus allowing comparison of signal intensities between contigs on the same microarray. One caveat of this approach is that the insert size will affect the amount of reference DNA that can bind to equivalent mass amounts of DNA in a spot. Given that all inserts in the bradyzoite library were size selected within a range of 0.7 to 2 kb (22) and that signal intensities for a contig are based on the average value for ESTs of various lengths, it is unlikely that insert size will have a significant effect on these measurements. Another consideration is the effect of mRNA secondary structure on the length and amount of cDNA produced for a given gene. Again, measurement of multiple ESTs within a contig will overcome this problem in many cases. To test the validity of using the cDNA versus reference values to predict differences in transcript abundance between genes, we compared these values to the EST frequencies of given genes within a tachyzoite cDNA library (1), where the frequency of ESTs for a given gene correlate with the abundance of its transcript, since the library was neither amplified nor normalized. We examined the ME49 strain tachyzoite data from this library to randomly choose 20 contigs that fall into each of two classes: those that contain zero tachyzoite ESTs and those that contain 2 to 3 tachyzoite ESTs. A third class that includes all 14 contigs spotted on the arrays with at least 8 tachyzoite ESTs was also analyzed. For each contig, we determined the average tachyzoite cDNA versus reference ratio.
The results (Fig. (Fig.4)4) show good agreement with the predicted correlations, with the rarest EST class having the lowest average ratio and the most frequent class having the highest average ratio. Differences in the signal intensity values of the three groups were very significant: P < 0.0001 (unpaired Kruskal-Wallis test). Thus, the cDNA versus reference ratio is indeed a predictor of the abundance of a given transcript relative to other genes at the same time point. We cannot define an absolute cDNA versus reference value that represents zero transcripts present. Northern blot analysis of the SRS9 transcript (Fig. (Fig.2),2), which had an average tachyzoite versus reference value of 0.56 (standard deviation = 0.07), suggests that average ratio values that are well below the median for all contigs signify little if any transcript is present at that time point.
The ability to measure transcript abundance at a given time point allowed us to determine the transcript levels of developmentally regulated genes prior to the induction of bradyzoite development. We calculated the tachyzoite versus reference ratio for all contigs shown in Fig. Fig.33 and compared these values to the median tachyzoite versus reference ratio for all contigs on the microarrays. The results (Fig. (Fig.3,3, leftmost column) show that overall there is an inverse correlation between transcript abundance in tachyzoites and subsequent changes during bradyzoite development. However, the transcript level for one gene, Ctoxoqual_3897, starts at a level above the median signal intensity in tachyzoites and significantly increases in abundance during bradyzoite development. This gene therefore appears to be expressed in both stages but is significantly induced in bradyzoites. For the day 2 tachyzoites, cDNA versus reference values for constitutive genes were widely scattered, ranging from 0.39 to 2.82. The fact that many constitutive genes had very high or very low abundance levels in tachyzoites indicates that this feature alone is not a predictor of whether a gene will be induced or repressed upon differentiation.
The bradyzoite versus reference values were used to further characterize the expression profiles of genes that are induced in bradyzoites. Within the classes identified based on timing of induction, contigs were grouped according to transcript abundance (Fig. (Fig.5).5). Using this type of analysis, distinct categories emerge from within the same temporal class. For example, within the “all” class of genes, although each gene has a positive slope for relative abundance (as a function of time during differentiation), the transcript levels in day 2 bradyzoites differ significantly. Transcripts of LDH2 and the “mucin domain” gene are already very abundant in day 2 bradyzoites and rise further, while transcripts of the oligopeptidase gene and several others have only median levels of abundance before rising through days 3 and 4. This type of distinction is not apparent when genes are clustered based on fold change values alone, as in Fig. Fig.3.3. Although the ENO1 and BAG1 genes have similar expression profiles in Fig. Fig.3,3, clustering based on transcript levels reveals that the relative abundance of these two genes in bradyzoites is quite different (Fig. (Fig.5,5, mid-late class). The cDNA versus reference data for all contigs, for both tachyzoites and bradyzoites, are available at http://www.stanford.edu/~blader/ClearyetalmRNAabundance.xls.
Microarray analysis of in vitro bradyzoite development provides a global view of changes in transcript abundance during this important process. Using this approach, we have identified several novel developmentally regulated genes that fall into distinct classes based on the timing and magnitude of changes in their transcript abundance. Our method of microarray analysis combined fold change criteria, the statistical power of the SAM technique, and analysis of differences in transcript abundance between genes at a given time point.
The ability to identify differentially expressed genes was confirmed by Northern blot analysis and corroborated by the fact that genes previously reported to be induced or repressed were identified as such by the microarrays. Our analysis also provided information regarding genes whose protein products were known to be differentially expressed but whose transcript levels had not been investigated. Tachyzoite-specific expression of the SRS2 surface protein, for example, has been demonstrated by using immunofluorescence analysis (20), but the nature of this regulation was previously unknown. The microarray data suggest that SRS2 is transcriptionally regulated. This distinction is important considering that some stage-specific regulation in Toxoplasma is known to occur at the posttranscriptional level (e.g., BSR4 and LDH1) (19, 37).
The novel developmentally regulated genes identified by this study reveal several interesting aspects of bradyzoite biology. These genes can be divided into three general categories.
The discovery of a new SAG1-related sequence family member that is expressed only in bradyzoites, SRS9, brings to 21 the total number of genes in this family and adds one to the subset that are bradyzoite specific (SAG2C/D and BSR4 have been previously described as bradyzoite specific; see Lekutis et al.  for a review of this gene family). Transcriptional regulation of SAG family members during bradyzoite differentiation represents an intricate regulatory system that is believed to be crucial to the ability of this parasite to establish a chronic infection.
The mucin domain-containing protein that we identified is interesting given the role of a close homologue in host cell invasion in a related apicomplexan, Cryptosporidium parvum (2), and the observation that highly glycosylated mucin domains provide protection from degradative enzymes in the gut (34). The Toxoplasma mucin domain protein may be involved in epithelial cell invasion by bradyzoites or may function as a protective barrier. The tachyzoite transcript level data (Fig. (Fig.3)3) and Northern blot data (Fig. (Fig.2)2) showed that this gene is exceptional in being expressed at above-median levels in tachyzoites before being substantially upregulated in bradyzoites, so a role for this protein in tachyzoites cannot be ruled out. The identification of an additional potential surface protein induced in bradyzoites, a homologue of a P. falciparum surface protein (7), suggests that bradyzoites express a diverse array of surface antigens, including genes that are not part of the SAG1-related family. Antibody localization studies are necessary to confirm that these putative homologues are present on the surface of bradyzoites.
Three genes encoding homologues of protein-modifying enzymes were found to be induced in bradyzoites. These enzymes may serve to regulate bradyzoite-specific metabolic or signaling pathways. Methionine aminopeptidase (MAP) regulates protein biosynthesis by cleaving the amino-terminal methionine during the processing of many nascent proteins. Differential expression of MAP during the cell cycle has been described for the dinoflagellate Alexandrium fundyense (32). Induction of MAP in bradyzoites may alter protein biosynthesis through either methionine cleavage or inhibition of initiation factor 2a phosphorylation, another function of MAPs in some eukaryotes (18). The oligopeptidase induced in bradyzoites may be involved in regulation of bradyzoite-specific metabolic pathways, as is the case for bacterial oligopeptidases that modify enzymes in the lactose metabolism pathway (23). The third induced contig in this group, Ctoxoqual_4095, has homology to conserved regions of class II pyridoxal phosphate-dependent aminotransferases. These proteins are involved in a range of processes, including amino acid metabolism and sphingolipid synthesis (6).
The decreased expression of the G6PD gene in bradyzoites reinforces arguments for substantial differences in sugar metabolism compared with tachyzoites. Glucose-6-phosphate dehydrogenase is a regulatory enzyme that initiates the pentose phosphate pathway, and downregulation of G6PD will shift the flow of glucose-6-phosphate toward glycolysis or gluconeogenesis/amylopectin synthesis. Changes in glycolytic activity have previously been predicted based on the upregulation of pyruvate kinase and lactate dehydrogenase in bradyzoites (9).
The group of genes that is repressed in bradyzoites includes several that encode proteins that are targeted to the unique secretory organelles of apicomplexan parasites. Of this set, only NTP1 has previously been shown to be downregulated during bradyzoite development (24). For the GRA1, GRA5, and ROP1 proteins, previous studies have shown only that these are qualitatively present in bradyzoites. Thus, microarray analysis may be a more sensitive and quantitative method for detecting subtle changes compared to immunofluorescence staining, although changes in transcript and protein abundance will not always parallel each other. The finding that transcript levels of a subset of GRA genes decrease during bradyzoite development, while the transcripts for GRA2, GRA3, GRA4, GRA6, and GRA7 are constitutively expressed, shows that selective regulation of GRA genes occurs during differentiation.
The regulated expression of genes encoding rhoptry proteins, if reflected in the protein levels, may explain structural differences in the rhoptries of tachyzoites and bradyzoites. The rhoptries of tachyzoites appear mottled by electron microscopy, while bradyzoite rhoptries are extremely electron dense (16), a phenotype that was also observed in ROP1-knockout tachyzoites (30). This developmental change could, therefore, be due to decreased expression of ROP1 in bradyzoites.
The expression profiles obtained by microarray analysis allowed us to identify distinct classes of temporally regulated genes. Predictions regarding the role of these genes in Toxoplasma development and physiology can be made based on these profiles. For example, genes that fall into the transient class may be essential to the developmental process, while genes that group in the mid-late and all time point classes likely encode proteins responsible for bradyzoite survival, including metabolic enzymes such as enolase and stress response proteins such as BAG1. A crucial role for the surface proteins SAG2C/D and SAG4A and the unknown protein encoded by the Ctoxoqual_3905+4432 contig is suggested by the early induction and continued expression of these genes at high levels (all-high class).
The ability to compare transcript levels between genes at the same time point, as shown for tachyzoites in Fig. Fig.33 and bradyzoites in Fig. Fig.5,5, allows further refinement of the expression profiles. Microarray data are usually presented as clusters of genes that are grouped based on fold change in transcript levels over time. In such cases, the data reveal changes relative to the starting time or condition, but they do not provide information on whether a given gene's transcripts are abundant or rare relative to other genes. We have developed a technique that determines such relative transcript abundance at each time point, thus allowing distinct expression patterns to emerge from clusters of genes that may otherwise appear to be coordinately regulated using the fold change criteria. Differences in the amount of transcript present for each gene suggest distinct regulatory mechanisms due, for example, to different promoter strengths or different mRNA stabilities. Relative abundance data can also be used to distinguish between “stage-specific” genes (i.e., significant transcript levels in only one stage) and “up- (or down-) regulated” genes (substantial transcript levels in both stages but higher in one than the other). One example of a gene with an upregulated expression profile, not shown in Fig. Fig.3,3, is the gene encoding a cyst wall antigen, MAG1. This gene is expressed at levels above the median value in tachyzoites, with a tachyzoite versus reference value of 1.88, and is then induced in bradyzoites at intermediate lev-els(www.stanford.edu/~mdcleary/bradyzoitetimecourse.xls).These data are supported by previous RT-PCR studies that detected MAG1 transcripts in tachyzoites and found subsequent increased levels in bradyzoites (26).
The identification of novel developmentally regulated genes and of distinct patterns of gene expression provides a comprehensive view of the changes that occur during bradyzoite development. Genes whose transcript levels increase in bradyzoites represent promising targets for a reverse genetics approach to identifying genes essential for bradyzoite development through targeted disruption. Expression profile data will allow us to focus on genes that are induced early and at all time points as targets that may prove central to the development and/or survival of bradyzoites. Future microarray studies will examine changes in gene expression under additional conditions that induce bradyzoite development, as well as gene expression in other developmental stages (e.g., sexual stages) of T. gondii, as and when parasites in these stages are obtainable in the necessary quantities.
The identification of distinct classes of regulated genes will make it now possible to search for common regulatory sequences. For example, the analysis of genes with very high transcript levels and coordinate expression over time may reveal conserved sequence elements in their promoters or untranslated regions, as has been shown for groups of coordinately regulated genes identified by microarray analysis of sporulation in yeast (8). Ongoing T. gondii genome sequencing efforts will facilitate such analysis as the sequences needed to identify consensus motifs become available.
M.D.C., U.S., and I.J.B. contributed equally to this work.
We thank David Sibley for providing the 95 tachyzoite ESTs; John Storey for help with the statistical analysis; Kimberley Chong for help with the microarray printing; John Matese, Gavin Sherlock, and the Stanford Microarray Database for bioinformatic support and guidance; and members of the Boothroyd laboratory for helpful discussion.
M.D.C., I.J.B., and U.S were supported in part by the NIH (CMB GM07276, 1F32AI10478-01, and K08 AI01453, respectively). U.S. is a Burroughs-Wellcome Fund Career Development Awardee. J.C.B. was supported by the NIH (AI45057).