This study reports our findings on changes in gene expression between those stages of T. brucei that can be readily studied in the laboratory. During natural infections in the vertebrate host, T. brucei progresses from the actively dividing slender BF to the non-dividing stumpy BF. This form is primed for differentiation to procyclic forms in the insect host. We examined the expression level of nearly all predicted protein-coding genes of strain 927 T. brucei, as well as many RNA genes, in five different parasite populations. Most of the analyses reported here are restricted to nuclearly encoded transcripts, although some findings for mitochondrially derived transcripts are noted. Three biological replicates were used for each test condition. Three of the five sets were derived from in vitro culture under highly standardized conditions: log-phase cultured bloodstream forms (cBF), log-phase cultured procyclic forms (PF-log), and stationary-phase procyclic forms (PF-stat) (see Additional file 1). The remaining two samples, slender BF and stumpy BF, were derived from infected rats and were likely to show more inter-sample variability, since they were grown for varying times in rats, and the animals became progressively less healthy after irradiation and infection. Since the stumpy BF populations were the last to be harvested and each contained a variable proportion of intermediately differentiated forms, these populations were the most biologically variable. They ranged from 76% to 93% morphologically stumpy and 68% to 88% of cells expressed the PF marker EP procyclin upon appropriate stimulation (see Table ). The slender BF populations showed less than 5% intermediate or stumpy forms and less than 1% expressed procyclin after stimulation.
Characterization of bloodstream form T. brucei from animals
The Nimblegen arrays that were hybridized with cDNA prepared from each parasite population contained multiple probes per gene (in most cases, eight), and three copies of these probe-sets per chip. After normalization to allow for cross-chip comparisons, we calculated a single value for each set of triplicate probes using Tukey's biweight formula. We then obtained a single gene level value using the Tukey biweight of the signals for the probes corresponding to each gene (Additional file 2). In most cases, this value represented the "average" of 72 data points collected for every protein-coding gene (45 for RNA genes) in each of the five growth conditions. Thus, as detailed in Methods, we were able to utilize robust statistical analyses that increase our confidence in the findings of the comparisons reported below.
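The two-stage summary described above can be sketched in a few lines of Python. This is a minimal illustration of a one-step Tukey biweight, not the pipeline used in the study: the tuning constant `c`, the small offset `eps`, and the helper name `gene_level_value` are assumptions made for the example, and the signal values are invented.

```python
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight estimate of location.

    c and eps are assumed tuning constants; the study does not state
    which values it used.
    """
    values = list(values)
    m = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median(abs(v - m) for v in values)
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Bisquare weight: points far from the median get weight zero.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total

def gene_level_value(probe_copies):
    """Summarise replicate copies of each probe, then summarise across probes."""
    per_probe = [tukey_biweight(copies) for copies in probe_copies]
    return tukey_biweight(per_probe)

# Invented signals: three probes, each present in triplicate on the chip.
print(gene_level_value([[100, 101, 99], [100, 100, 100], [102, 101, 103]]))
# A single extreme value is down-weighted to zero rather than averaged in.
print(tukey_biweight([100, 102, 98, 101, 99, 1000]))
```

The appeal of this estimator for probe-level data is that a single aberrant probe or spot contributes nothing to the summary, whereas an arithmetic mean would be pulled towards it.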
Gene expression levels
The normalized expression values from the 8110 probe-sets corresponding to nuclear genes were hierarchically clustered using the TMeV software package and are shown graphically as a heat map in Figure . A small number of genes (those colored black or dark blue in the middle of the figure) showed no or very low expression under any condition, and a similar number (colored red) showed high expression levels in some (top) or all (bottom) conditions. However, the large majority of the genes showed low (blue) or moderate (green or yellow) expression levels. Each biological sample set showed a similar distribution of signal intensities obtained for the 8004 probe-sets corresponding to nuclear protein-coding genes, with the curves obtained from cBF and PF-stat (the most divergent samples) shown in Figure . This figure also depicts the signals obtained for the 345 probe-sets that map to multicopy CDSs. This curve was skewed dramatically to the right, as compared to the total sample of probe-sets, confirming gene amplification as a clear strategy for increasing gene expression in T. brucei. Indeed, 20 of the 50 probe-sets detecting the highest signals map to two or more genes (see below and Table ). However, the signals did not rank in order of the number of genes detected by each probe-set.
Figure 1 Distribution of different expression levels for all T. brucei nuclear genes. A. Heatmap showing the signals obtained for cBF, slender BF (SL-BF), stumpy BF (ST-BF), PF-log, and PF-stat. B. Density plot of gene level intensities for nuclear protein-coding (more ...)
The 50 most highly expressed protein-coding genes
For the total probe-sets, there is a large peak centered at ~4600 for the PF-stat and ~5800 for cBF. In both cases, the peak moves sharply down towards lower values, with a small shoulder at ~200. When the same arrays were probed with RNA derived from a different strain, this shoulder was more pronounced, and a corresponding increase in probe-sets with signal intensities less than 200 was observed (unpublished data). Since many of these probe-sets mapped to VSGs, most of which are not conserved between strains, a signal level of ~200 can be taken as a generous estimate of background for non-transcribed regions. However, almost all regions of the T. brucei genome are thought to be constitutively transcribed, and hence even those genes whose mRNAs are unstable would likely show a signal higher than this background. The number of probe-sets that failed to show a signal level of at least four times the "non-transcribed background" signal (i.e., 800) in at least one of the five different biological conditions was small (164 protein-coding genes). Even then, VSG genes, which are subject to clonal variations in expression, accounted for all but 65 of this set. The majority (55) of the remaining genes have unknown function, with most of these (44) found only in T. brucei, raising the possibility that they do not represent authentic genes. In addition, almost all of these low-expressing genes are located in sub-telomeric clusters of VSGs or expression site associated genes (ESAGs) or are immediately adjacent to convergent strand-switch regions where transcription terminates [15]. Even when considering only cBF parasites, only 133 predicted non-VSG protein-coding genes had a signal <800, and of these only 22 had annotated function. Notably, most of these genes were expressed to higher levels in at least one other stage, and those that were low in all stages were located in a sub-telomeric or convergent strand-switch region context.
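The thresholding logic in this paragraph (a background of roughly 200, and a cutoff of four times that value) is simple enough to sketch directly. The following Python fragment is only an illustration: the gene identifiers, signal values, and function names are hypothetical, and the condition labels are shorthand for the five populations.

```python
# Signal values and gene IDs below are invented for illustration only.
BACKGROUND = 200            # approximate "non-transcribed" signal from the text
THRESHOLD = 4 * BACKGROUND  # 800, the cutoff used in the text

CONDITIONS = ("cBF", "SL-BF", "ST-BF", "PF-log", "PF-stat")

def low_in_all_conditions(signals, threshold=THRESHOLD):
    """Gene IDs whose signal stays below the threshold in every condition."""
    return sorted(
        gene for gene, by_cond in signals.items()
        if all(by_cond[c] < threshold for c in CONDITIONS)
    )

def low_in_condition(signals, condition, threshold=THRESHOLD):
    """Gene IDs whose signal is below the threshold in one named condition."""
    return sorted(g for g, by_cond in signals.items() if by_cond[condition] < threshold)

example = {
    "geneA": {"cBF": 150, "SL-BF": 210, "ST-BF": 180, "PF-log": 300, "PF-stat": 250},
    "geneB": {"cBF": 600, "SL-BF": 700, "ST-BF": 900, "PF-log": 4000, "PF-stat": 3500},
    "geneC": {"cBF": 5800, "SL-BF": 5600, "ST-BF": 5000, "PF-log": 4700, "PF-stat": 4600},
}
print(low_in_all_conditions(example))    # ['geneA']
print(low_in_condition(example, "cBF"))  # ['geneA', 'geneB']
```

The first function corresponds to the set of genes that never reach the cutoff, the second to the single-stage tally reported for cBF.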
The protein-coding genes were ranked according to their maximal expression level in any stage, and broadly categorized according to their annotated function (Figure ). As suggested above, VSG genes and T. brucei-specific genes of unknown function accounted for most of the genes showing the lowest expression (86% of the bottom 2% and 64% of the bottom 5%). Conversely, genes which are more highly expressed tend to have some type of functional annotation; indeed many of these have been studied experimentally.
We identified the nuclear CDSs with the top 10% of signals for each biological condition. The genes were individually examined and placed into categories based on their annotation and/or their proteomic detection in specific sub-cellular fractions. Figure shows the distribution of these genes into broad categories as in Figure , while Figure shows the distribution of genes with ascribed function (yellow in Figure ) according to various categories of biological function or location for each of the five different biological conditions. As indicated in Figure , above, genes with unknown function were under-represented in the highly expressed category, as compared to the whole genome, and this was most striking in the PF-log cells (Figure ). Overall, the categories with the largest numbers of highly expressed genes were translation, metabolism, and cytoskeleton, with the majority (99 out of 144) of the genes in the translation category encoding cytoplasmic ribosomal proteins. Genes involved in mitochondrial function were highly represented in the top expressors in the PF stages, but more modestly represented in BF stages. As expected, significantly higher proportions of ESAGs or genes related to them (GRESAGs) were expressed at high levels in all BF stages than in the PF stages. In general, relatively similar proportions of genes were found in each category in all biological stages, although the number of genes involved in metabolism was increased in PF-log and the number of genes encoding cytoskeleton proteins was decreased in stumpy BF. However, this does not mean that the same genes are expressed to high levels in all stages. For example, 13% of mRNAs highly expressed in cBF were more than 2-fold up-regulated as compared to PF and similarly 12% of mRNAs highly expressed in PF were more than 2-fold up-regulated as compared to cBF (see below).
Figure 2 Distribution of functional gene category in the top 10th percentile of expression values in each biological condition. A. Distribution of the functional gene categories as described in Figure 1C. B. Further breakdown of functional category. Only those (more ...)
Among the 800 most highly expressed nuclear CDSs in cBF, 307 were highly expressed in every condition examined. Surprisingly, 111 of these had unknown function. Thus, there is a large set of highly expressed T. brucei-specific (20) or conserved (91) genes that have not been ascribed a function, including nine of the 50 most highly expressed genes in cBF (see Table ). Some of the "hypothetical" proteins encoded by these nine genes have been shown to exist by proteomic analyses (Tb11.01.2800, Tb10.6k15.1500) or other studies (Tb927.1.4600 [13]). Conversely, two other "hypothetical genes" lie in intergenic regions between coding regions that are highly expressed, raising the possibility that these are not separate genes, but rather simply sequences within 3' UTRs of the neighboring genes. The first, as represented by Tb927.1.2540, is one of a set of seven almost identical putative genes (six of which are annotated as "hypothetical protein, unlikely") that are interspersed between the histone H3 genes. The second, as represented by Tb927.1.4590, corresponds to a set of genes that are interspersed between the highly expressed CFB1 genes. Tb11.1190 encodes a putative protein which is composed of 49 repeats of 68 amino acids and is represented on the microarray by a single probe corresponding to a unique sequence at the C-terminus. Tb11.02.4690 specifies a 22 kDa protein with a signal sequence and three transmembrane domains. It is expressed to a much higher level than the flanking genes, indicating it is a distinct mRNA. The last of these nine genes, Tb927.4.1000, encodes a 25 kDa protein that is also expressed much more highly than the adjacent genes.
Differential gene expression
Comparison of the Tukey mean maximum and minimum signal levels for all probe-sets corresponding to nuclear genes revealed 122 genes that showed a greater than 10-fold change between two or more of the five biological conditions tested. Of these, 30 were VSG genes or ESAG/GRESAGs, many of which are associated with antigenic variation. A total of 446 genes (including 161 VSGs and ESAGs) showed more than 4-fold variation, while at the 2-fold level, 2105 genes (including 233 VSGs and ESAGs) showed a statistically supported difference in expression. Thus, over one-fourth of all of the genes assessed on the microarray were differentially expressed (i.e., q-value of <5% in multi-class significance analysis of microarrays (SAM)) between at least two conditions. This dataset was further reduced by retaining only those genes showing a >2-fold deviation from the mean of all five conditions in at least one sample, and by excluding all genes encoding VSG/VRs and ESAG/GRESAGs (which were analyzed separately, see below). These 534 probe-sets were K-median clustered using settings indicated in the Methods to yield nine clusters which contained between 31 and 80 genes (Figure and Additional file 3). Each of these clusters represents a distinct pattern of gene expression, although some are similar. Overall, these highly regulated genes are enriched in those involved in metabolism, proteolysis, translation and T. brucei-specific unknown functions, but under-represented in those genes that are conserved but have unknown function. However, individual clusters show differential enrichment in particular functional gene categories.
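The two fold-change criteria used in this paragraph (the ratio between the highest and lowest condition, and the deviation of any one condition from the five-condition mean) can be illustrated with a short Python sketch. It assumes positive, untransformed signal values and an arithmetic mean, neither of which is confirmed by the text, and it leaves out the SAM significance test entirely; gene names and values are invented.

```python
from statistics import mean

def max_fold_change(values):
    """Ratio of the highest to the lowest condition-level signal."""
    return max(values) / min(values)

def deviates_from_mean(values, fold=2.0):
    """True if any condition lies more than `fold` above or below the mean."""
    m = mean(values)
    return any(v > fold * m or v < m / fold for v in values)

def tally(expression, cutoffs=(10, 4, 2)):
    """Count genes whose max/min ratio exceeds each cutoff."""
    return {
        c: sum(1 for vals in expression.values() if max_fold_change(vals) > c)
        for c in cutoffs
    }

# Invented signals for five conditions per gene.
example = {
    "g1": [100, 100, 100, 100, 1200],
    "g2": [100, 150, 120, 130, 110],
    "g3": [100, 500, 100, 100, 100],
}
print(tally(example))  # {10: 1, 4: 2, 2: 2}
print([g for g, v in example.items() if deviates_from_mean(v)])  # ['g1', 'g3']
```

Because the cutoffs are nested, every gene counted at the 10-fold level is also counted at the 4-fold and 2-fold levels, which matches the way the tallies in the text grow as the threshold is relaxed.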
Figure 3 Cluster analysis of highly-regulated mRNAs. The 534 genes showing two-fold deviation from the mean expression value in at least one of the 5 samples using multi-class SAM analysis were analyzed. ESAGs, GRESAGs, VSGs, and VRs were excluded from the analysis. (more ...)
Figure shows the heatmap depicting gene clustering, while Figure shows overlaid expression patterns graphically for every gene in each cluster. Figure depicts specific examples that illustrate the expression patterns characteristic of the gene clusters. Genes in clusters 5, 6, 7 and 8 all have higher mRNA levels in BF than in PF, although the clusters each show subtle differences in expression pattern. For example, cluster 8 contains genes that encode mRNAs substantially down-regulated in PF-log and PF-stat, whereas in cluster 6 the down-regulation in PF is more modest and some genes begin to decrease expression in stumpy BF. Cluster 8 is enriched in genes involved in metabolism and adenosine transport and also includes several genes encoding 64-65 kDa invariant surface glycoproteins (ISGs) and procyclin-associated genes (PAGs). Cluster 6 contains genes encoding three 75 kDa ISGs, several proteases, and a substantial number of T. brucei-specific proteins of unknown function. Cluster 7 contains genes that are somewhat down-regulated in PF-log (relative to BF), but are expressed at even lower levels in PF-stat. Genes encoding proteins involved in interaction, metabolism, protein folding and protein transport or modification, are overrepresented in this cluster. Conversely, cluster 5 contains 60 genes that are down-regulated to a greater extent in PF-log than in PF-stat. This cluster is dominated by genes encoding proteases and T. brucei-specific proteins with unknown function. The former category includes several paralogues encoding homologues of the Leishmania gp63 surface protease.
Clusters 2, 3, and 4 show different patterns of up-regulation with respect to the two PF biological conditions. Cluster 2 contains genes that are up-regulated in PF-log, but not in PF-stat. For some genes this change in expression begins in stumpy BFs. The genes in this cluster are over-represented for those encoding proteins involved in interaction, metabolism, RNA processing, transcription, translation and cytoskeleton function. Their reduced expression in PF-stat is consistent with cessation of growth functions upon entry into stationary phase. Indeed we observed a significant accumulation of rRNA precursors in the PF-stat samples upon analysis on the Agilent BioAnalyzer (not shown). Conversely, cluster 3 contains genes that are up-regulated only in PF-stat and mostly have unknown function, including eight that are T. brucei-specific. It is possible that some of these gene products are involved in preparation for differentiation into epimastigotes, the next stage in the parasite life cycle. Finally, cluster 4 contains genes with higher expression levels in both PF-log and PF-stat and is enriched in genes encoding proteins involved in metabolism, proteolysis, or with unknown function but located on the cell surface or mitochondrion. As discussed in more detail below, this is consistent with the switch to mitochondrial pathways for energy generation in procyclics.
Cluster 1 contains the largest number of genes, with 82 members. The expression of these genes is lower in PF-stat (and stumpy BF in some cases), but the genes are expressed at higher levels in cBF and slender BF than seen in cluster 2. These genes are over-represented in those involved in metabolism, DNA replication/repair, protein folding, proteolysis and translation, consistent with their down-regulation in stationary-phase cells. Cluster 9, with 31 members, is the smallest cluster. The pattern of gene regulation is similar to cluster 1, but with some up-regulation in PF-log. Like cluster 1, this cluster is enriched in genes involved in DNA replication/repair.
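For readers unfamiliar with K-median clustering, the idea behind the cluster assignments discussed above can be sketched as follows. This is a generic, minimal implementation and not the TMeV routine: the Manhattan distance, the choice of the first k profiles as starting centers, and the toy profiles are all assumptions made for the example.

```python
from statistics import median

def manhattan(a, b):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))

def k_medians(profiles, k, max_iter=100):
    """Cluster expression profiles; returns (labels, centers).

    Starting centers are simply the first k profiles, an arbitrary
    deterministic choice made for this sketch.
    """
    centers = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: manhattan(p, centers[j]))
            for p in profiles
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the per-condition median.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [median(col) for col in zip(*members)]
    return labels, centers

# Toy profiles over five conditions: two "BF-high" and two "PF-high" genes.
profiles = [
    [1.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [1.1, 0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.9, 1.0],
]
labels, centers = k_medians(profiles, k=2)
print(labels)  # [0, 1, 0, 1]
```

Using medians rather than means for the cluster centers makes each center less sensitive to a few genes with extreme values in one condition.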
The existence of these varied expression patterns implies a complex set of regulatory mechanisms operating at the RNA level to control the abundance of transcripts encoded by nuclear genes. The specific proteins involved in these processes are only beginning to be examined (see for example refs. [13
In contrast to the analyses above that examined the transcripts showing the most variation in abundance, we also looked at the transcripts that showed the least variation. Genes such as these would provide excellent controls for studies of developmental changes in gene expression. We identified 830 genes with a maximum variation in expression between the five stages of <25% (see Additional file 2). As expected, genes encoding proteins involved in known stage-regulated processes such as glycolysis and electron transport are significantly under-represented. However, the group is slightly enriched for genes of unknown function. Genes encoding proteins involved in lipid or fatty acid metabolism are also over-represented, comprising 28% of the metabolic enzymes that showed little variation as opposed to 11% of all metabolic enzymes. Similarly, genes encoding proteins of the ubiquitin pathway represent 40% of all protease-related genes, but 85% of the subset of protease-related genes that showed little variation. Genes involved in histone acetylation or chromatin structure, such as Tb927.7.1690 and Tb927.4.2520 (which encode transcriptional silencer Sir2) also tended to maintain similar mRNA levels between life cycle stages.
Comparison of cBF and log phase PF forms
In order to identify differences in gene expression between specific conditions, we conducted pair-wise comparisons of specific datasets (including cBF versus slender BF, slender BF versus stumpy BF, cBF versus PF-log, and PF-log versus PF-stat) using SAM, setting the q-value to <5% and the fold-change to >2 (see Methods). Because VSG expression is both clonal and highly variable, VSGs are excluded from the gene tallies below, unless otherwise noted.
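The selection rule applied to each pairwise comparison (a fold-change cutoff combined with a q-value cutoff, with VSGs set aside) can be sketched as a simple filter. The Python below assumes the q-values have already been produced by a separate SAM run, which is not reproduced here; the function name, gene identifiers, and numbers are hypothetical.

```python
def select_regulated(results, min_fold=2.0, max_q=5.0, excluded=frozenset()):
    """Split genes into those higher in condition A or in condition B.

    results maps gene ID -> (mean_a, mean_b, q_percent); q_percent is
    assumed to come from a separate SAM run, which is not reproduced here.
    """
    up_in_a, up_in_b = [], []
    for gene, (mean_a, mean_b, q_percent) in sorted(results.items()):
        if gene in excluded or q_percent >= max_q:
            continue
        ratio = mean_a / mean_b
        if ratio > min_fold:
            up_in_a.append(gene)
        elif ratio < 1.0 / min_fold:
            up_in_b.append(gene)
    return up_in_a, up_in_b

# Invented values: (signal in A, signal in B, q-value in percent).
example = {
    "gene1": (4000.0, 1000.0, 0.5),
    "gene2": (1000.0, 1800.0, 1.2),
    "gene3": (900.0, 3000.0, 12.0),
    "vsgX": (9000.0, 300.0, 0.1),
}
print(select_regulated(example, excluded={"vsgX"}))                # (['gene1'], [])
print(select_regulated(example, min_fold=1.7, excluded={"vsgX"}))  # (['gene1'], ['gene2'])
```

Lowering `min_fold` admits genes with more modest changes while the q-value cutoff still removes poorly supported ones, which is why relaxing the fold-change threshold enlarges the gene list without changing the confidence criterion.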
A comparison of the signals between cBF and PF-log identified 691 differentially expressed genes. When the stringency of the SAM analysis was reduced to a 1.7-fold change, 963 genes were detected. A further reduction to 1.5-fold identified 1508 genes, approximately 19% of the genome. Thus, a relatively large fraction of the genome encodes mRNAs that differ in abundance between these two stages. Figure shows a comparison of the functional categories of the genes showing >2-fold regulation; these are individually listed (along with their fold-changes in mRNA expression and q-values) in Additional file 4. Table itemizes those genes upregulated in cBF that have predicted functions (excluding VSGs and ESAGs, which are discussed below).
Figure 4 Pairwise comparison of gene expression in different biological conditions. Nuclear genes with >2-fold change in expression between the two conditions indicated (by either Tukey medians or SAM-calculated mean) were categorized into different functional (more ...)
Genes with functional annotation that are up-regulated in cBF as compared to PF-log
As can be seen in Figure , categories of genes where cBF show higher expression than PF-log cells include ESAGs and GRESAGs, uncharacterized proteins bearing interaction motifs (such as zinc fingers and leucine-rich repeats), and known surface and secreted proteins (in part because there are multiple distinct genes in several surface protein families). However, it is also interesting that a larger number of genes upregulated in cBF encode proteins with hypothetical status (conserved or T. brucei-specific) that have signal sequences. This fits well with the finding that the secretory and endocytic systems are more active in BF than PF [18]. However, unlike Koumandou et al. [12], we did not find that mRNAs specifying proteins involved in secretory traffic were highly up-regulated in BF.
Categories with more representatives up-regulated in PF-log cells include those encoding mitochondrial proteins, metabolic proteins, and translation. It is known that the metabolism of PF (which can use both glucose and amino acids for energy metabolism) is more complex than that of BF (which are highly glycolytic) [19], presumably accounting for the more diverse set of metabolic genes up-regulated in PF-log cells. In PF, the mitochondrion becomes enlarged with more fully developed cristae and the respiratory chain is active [2]. These changes are reflected in the increased expression of a large number of genes (72) encoding products associated with the mitochondrion, including 27 that are of unknown function. In contrast, a single gene known to encode a mitochondrial protein is upregulated in cBF: the alternative oxidase. This oxidase is required for the glycerophosphate shuttle that allows glycolysis to continue [20
The comparison of mRNA abundance between these two stages led to the identification of several groups of interesting genes. These include those encoding nucleoside transporters NT2-NT7 which reside in an array immediately adjacent to the sub-telomeric VSG cluster at the "right" end of chromosome 2. There they alternate with a set of iron-ascorbate oxidoreductase genes (see Tb927.2.6180, Tb927.2.6230, Tb927.2.6310) that have not been functionally characterized to our knowledge. The NT genes were reported to be more highly expressed in BF than PF forms, a finding which we also observe [21]. Interestingly, all of these oxidoreductase genes are also significantly more highly expressed in cBF than PF-log (3.3-13.2-fold, see Table ). Three additional iron/ascorbate oxidoreductase genes are found on chromosomes 5, 7, and 9; these are each expressed to similar levels in cBF and PF-log. Thus, the chromosome 2 region represents a rare cluster of genes encoding similarly regulated mRNAs. Another interesting case is that of VSP1, an acidocalcisomal pyrophosphatase encoded by two tandemly linked genes (Tb11.02.4910 and Tb11.02.4930) with almost identical coding regions [22]. The array data show that the two genes are reciprocally regulated, which may potentially be traced to their divergent 3' UTRs.
Comparison of gene expression in BF under different conditions
We compared the expression of all nuclear genes in slender BF isolated from infected animals with the expression in slender BF obtained by in vitro culture (cBF) and the expression in stumpy BF from animals. In the comparison of cBF and slender BF, other than a few VSG genes, no gene showed a difference in expression that met our criteria of a 2-fold change and q-value < 5%. Additional file 5 lists those genes that showed more moderate (>1.5-fold) or less well-supported changes (q-value < 15%). However, two ISG64 genes (Tb927.5.1390 and Tb927.5.1430) showed a slightly smaller (1.6-1.8-fold), but high-confidence increase in signal in slender BF. A few other genes showed similar (1.5-1.9-fold) changes in expression, but had somewhat lower confidence (q-value = 7.9%). These included a CAMK group protein kinase (Tb927.7.6580), a nucleoside phosphorylase (Tb927.8.4430), a tryparedoxin (Tb927.3.5090) and two proteins with unknown function that had higher signals in cBF, and a GRESAG4 that had a higher signal in slender BF. These data contrast with a previous microarray study examining 550 genes that found 35 were upregulated in cBF and 3 were upregulated in slender BF [12]. None of those 38 genes correspond to the few genes that we identified above. Two sets of genes that we identified as modestly upregulated were on the previous array, but these were not observed to be upregulated in that analysis. The lack of consistency between the two studies in this regard could arise from differences in the strains or conditions (e.g., medium, serum, use of intact vs. immunocompromised animals). Nonetheless both studies do suggest that in vitro cultivation provides a reasonable model for analysis of most mRNAs in slender BF.
A comparison of the rapidly dividing slender BF with the non-dividing stumpy BF showed a total of 107 genes with at least a 2-fold change in signal in the arrays, not including VSGs. About twice as many genes were up-regulated in slender forms (Table ) as were up-regulated in stumpy forms (Table ). The most prominent categories of genes showing increased signals in slender forms are those that are related to the cytoskeleton, including the flagellum (Figure ). Many of these genes are annotated as hypothetical proteins, but they were detected in the flagellar proteome [23]. Non-dividing forms do not build new flagella or cytoskeleton. Additionally, several metabolic enzymes were up-regulated, predominantly those which are localized to the glycosome (a specialized peroxisome) or are involved in glycolysis. Conversely, the entire set of eight ESAG9 genes in the 927 strain was upregulated in stumpy BF (from 2-fold to 30-fold), as were two genes that are related to ESAG9. The function of ESAG9 is not known; it was originally described as a gene found in a VSG expression site (ES) in the closely related parasite Trypanosoma equiperdum. At that time, the authors noted that a related ESAG9 was transcribed independently of the VSG ES. Seven of the eight annotated ESAG9 genes encode proteins with a predicted signal sequence, but none of these contain predicted transmembrane domains, suggesting the ESAG9s could encode a family of secreted proteins. The metabolic enzymes encoded by genes with higher mRNA levels in stumpy BF were predominantly mitochondrial, consistent with pre-adaptation for differentiation into insect forms. We also noted the increased mRNA for the PAD1 genes, which encode citrate transporters and were previously shown to be upregulated in stumpy forms of T. brucei strain EATRO 2340 [25
Genes showing increased expression in slender BF as compared to stumpy BF
Genes showing increased expression in stumpy BF as compared to slender BF
Comparison of gene expression of PF in different conditions
Unlike cBF, in vitro cultured PF can be grown to stationary phase where they can persist for several days as viable cultures. Thus, we could directly compare the abundance of mRNAs in actively replicating (PF-log) and non-dividing (PF-stat) cells. A total of 895 genes showed differential expression (see Additional file 6), many more than the 107 genes differentially regulated between the slender (log) and stumpy (stationary) BF. About three times as many genes were up-regulated in PF-log as compared to PF-stat. As shown in Figure , this increase was reflected across almost all categories of genes, except for proteins categorized as unknown (both conserved and T. brucei-specific). The most skewed group was genes annotated as encoding hypothetical proteins (conserved or T. brucei-specific) that have predicted transmembrane domains; many more such genes were upregulated in stationary phase than in log phase. For proteins with ascribed function, those associated with protein phosphorylation/dephosphorylation were enriched in stationary phase. As discussed above, some of the changes in PF-stat may reflect the decrease in cellular growth functions, or perhaps preparation for development to epimastigotes. It is also possible that some transcripts with higher signals in PF-stat are simply those that decay most slowly.
Genes encoded by the mitochondrial genome
Several genes on the mitochondrial maxicircle genome are extensively remodeled by RNA editing to yield transcripts encoding components of mitochondrial respiratory complexes. Only 15 mitochondrial probe-sets could be designed (see Additional file 7). Four corresponded to both edited and unedited sequences, and six to never-edited sequences, including the two rRNAs. Three corresponded to edited sequences, two of which had corresponding unedited probe-sets. From this limited set, a few trends could be observed, which were compatible with prior literature [26]. For example, 12S and 9S rRNA, cytochrome b, cytochrome oxidase subunit I, and cytochrome oxidase subunit II (edited plus unedited) transcripts all increased in stumpy BF and further increased in PF-log, although some did not reach statistical significance until PF-log phase. Somewhat surprisingly, ATP synthase subunit 6 (edited), NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 7 (edited) and NADH dehydrogenase subunit 8 (edited) all showed increased mRNA levels in stumpy BF, but decreased in PF-log. Unexpectedly, many of the signals reached their maximum in PF-stat. This could reflect a potential differential stability as compared to the nuclearly-encoded transcripts under conditions of growth arrest, and would be highlighted by the normalization procedure.
We noted several tandem arrays of gene families containing non-identical genes that were differentially regulated. Three families encoding proteins with multiple transmembrane domains are depicted in Figure . The first cluster (Figure ) of genes are those in the recently described PAD array of carboxylate transporters [25]. In contrast to PAD1, which are induced in stumpy BF [25], the other members of this gene family are either constitutively expressed at the mRNA level or more highly expressed in PF. PAD5 show an increase in expression from stumpy forms to PF-log and even higher expression in PF-stat. PAD8 showed similar expression in all conditions except in PF-stat, which was ~1.5-fold increased over PF-log (q-value = 4.89%). Figure shows an unrelated gene family on chromosome 10 that also encodes major facilitator proteins. These four genes show a high level of conservation with one another, with long stretches of amino acid identity. Two are most highly expressed in BF, whereas the other two show a more complex pattern of regulation. The final set of genes (Figure ) encodes a set of related proteins predicted to have four to five transmembrane domains; four of these genes are tandemly arrayed on chromosome 8. Here the mRNA abundances of the three most closely related genes are higher in the BF samples. In contrast, the first gene in the array and another more divergent, unlinked gene on chromosome 11 do not show this pattern, and have similar or higher expression in PF.
Figure 5 Differential expression of tandemly arrayed genes. A. The PAD gene array of carboxylate transporter proteins (PAD1, Tb927.7.5930, PAD2, Tb927.7.5940, PAD3, Tb927.7.5950, PAD4, Tb927.7.5960, PAD5, Tb927.7.5970, PAD6, Tb927.7.5980, PAD7, Tb927.7.5990, PAD8 (more ...)
VSGs and ESAGs
The T. brucei strain 927 genome contains approximately 1600 VSG genes (or pseudogenes), but each BF trypanosome expresses only a single VSG, which covers the surface of the parasite in a dense coat. Although T. brucei possesses ~20 VSG ESs (located at telomeres of megabase- and intermediate-sized chromosomes) [29], the expressed VSG gene encoding the surface coat protein is located in the sole active ES. In BF, transcription initiates in all ESs, but attenuates rapidly in the inactive ESs, never reaching the downstream genes including the resident VSG. Similarly, transcription of ESs initiates in PF, but transcript elongation is minimal [31]. A relatively small number of apparently functional VSG genes exist on the 11 megabase-sized chromosomes in T. brucei. The minichromosomes also contain a reservoir of apparently functional VSG genes, but only a few have been sequenced. In contrast, most VSG genes reside in sub-telomeric arrays that are comprised of pseudogenes (which were not included on these microarrays) and atypical VSG genes, which encode proteins that are neither clearly pseudogenes nor clearly functional [11]. The pseudogenes provide the fuel for generating novel VSG genes by mosaic gene conversion during antigenic variation, particularly later in infection [32]. The VSG-related VR genes are located not in the telomeric ESs or sub-telomeric arrays, but rather typically reside in chromosome-internal strand-switch regions and lack the 70-bp repeats typically found upstream of VSG genes. The telomeric ESs and sub-telomeric VSG arrays also contain hundreds of ESAGs, many of which are pseudogenes. However, a number of genes related to ESAGs (GRESAGs) have chromosome-internal locations (the nomenclature discriminating ESAGs and GRESAGs was not consistently applied as genes were named).
The microarray design used in this study contained probes for 74 VSGs, 70 atypical VSGs, and 46 VSGs that were unclassified on VSGdb [33]; 21 sub-telomeric ESAGs, 104 chromosome-internal ESAGs and GRESAGs, as well as 17 ESAGs from three T. brucei strain 427 ESs (no T. brucei strain 927 ESs have been annotated to date). This VSG subset of genes was represented by a total of 357 probe-sets. Even though individual parasites express only one ES (containing a single VSG and ~10 ESAGs) at a time, since the parasites have been maintained without regard for antigenic type, we expected that there would be a diverse set of VSG genes showing some expression at the population level. In addition, we expected that expression of these VSGs and ESAGs would vary between biological replicates, and indeed, a subset of VSGs showed considerable variation in BF, but not PF (Figure ), probably reflecting antigenic variation within these populations. Thus, subsequent analyses were carried out on the 15 individual samples rather than on the mean of the biological conditions (see Additional file 8 for gene level data).
Figure 6 Cluster analysis of ESAG and VSG gene expression. A. Signals from probes detecting VSG/VR genes. Density plot of gene-level coefficient of variation for VSG genes, defined as the standard deviation across the three biological replicates divided by the ...
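The coefficient of variation mentioned in the caption can be illustrated with a minimal Python sketch. It assumes the denominator is the replicate mean (the usual definition; the caption is truncated at that point) and uses the sample standard deviation, which is also an assumption. The signal values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev

def coefficient_of_variation(replicates):
    """Sample standard deviation divided by the mean of the replicate signals.

    Assumption: the mean is the denominator and the sample (n - 1)
    standard deviation is used; the source does not state either choice.
    """
    return stdev(replicates) / mean(replicates)

# Hypothetical gene-level signals for three biological replicates.
stable_gene = [2000.0, 2100.0, 1900.0]      # similar signal in every replicate
variable_gene = [1000.0, 4000.0, 10000.0]   # signal differs widely between replicates

cv_stable = coefficient_of_variation(stable_gene)
cv_variable = coefficient_of_variation(variable_gene)

print(f"stable gene CV:   {cv_stable:.3f}")
print(f"variable gene CV: {cv_variable:.3f}")
```

A gene expressed from the active ES in only some replicate populations would be expected to behave like the second example, with a coefficient of variation well above that of a uniformly expressed gene.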
Hierarchical clustering of the 357 probe-sets (after log2-transformation of the normalized expression values) allowed us to define four distinct patterns of VSG gene and ESAG expression (marked A-D in Figure ). Interestingly, the distribution of VSG genes and ESAGs from different genomic locations within each group differed markedly (see Figure and ). Group A contained a large number (137) of VSGs not expressed in any sample, or only at low levels in some BF samples, exemplified by gene 1 in Figure . All these genes were located within sub-telomeric clusters and were likely not transcribed at any stage, except when translocated to the active expression site in small sub-populations of BFs. This group also included five ESAGs from T. brucei 427 ESs that presumably either reside in inactive expression sites or are not present in T. brucei 927.
A second group (B) contained 34 VSG genes and 54 ESAGs, which were expressed at substantially higher (but still relatively moderate) levels in BF and generally low levels in PF. Many of these showed variable expression levels in different biological replicates of the BF samples, indicative of expression from active ESs in sub-populations of BF. This group contained VSG and VR genes from sub-telomeric clusters (genes 2 and 3, in Figure ), as well as from chromosomal-internal locations (mostly VRs, e.g. gene 4). It also contained ESAGs and GRESAGs from the 427 ES, sub-telomeric clusters and chromosomal-internal loci. Of particular interest are several ESAG9 genes that are up-regulated only in stumpy BF (as discussed above). While this group of genes has many of the hallmarks of canonical VSG/ESAG expression from ESs, it should be noted that in many cases their signal levels in PF were substantially above background, suggesting that the genes are actively transcribed in PF, but the mRNAs are less stable than in BF.
Unexpectedly, a group (C) of 34 sub-telomeric VSG and nine ESAG genes showed variable expression levels in both BF and PF. The function of these VSG genes is unclear, since they appear to encode both typical and atypical VSGs. In particular, one group of four tandemly-linked VSG genes from an allele-specific region of chromosome 5 showed highest expression in PF-log cells (see gene 5 in Figure ). Group C contains both sub-telomeric and internal genes encoding ESAGs 2, 3, 5, and 11. Interestingly, several ESAG11-related genes and a VR are located in a tandem array on chromosome 4, where they are interspersed with genes encoding hypothetical proteins. Since the hypothetical proteins show very similar expression patterns to the adjacent ESAG11-related genes, at least some may simply represent 3' UTRs of the neighboring genes. Interestingly, this cluster is located between rRNA and tRNA gene clusters, and would not be expected to be transcribed, since it appears to lack the modified chromatin found at typical RNA polymerase II transcription initiation sites [15]. The signal levels for all these genes are modest (<3000) and are lowest in stationary phase, suggesting that they may merely represent increased "background" transcription due to their proximity to the actively transcribed RNA genes. However, this does not rule out functionality of this set of putative genes.
The final group (D) of VSG genes was expressed in all life cycle stages. All nine VSG genes in this group are located in chromosomal-internal loci: four are annotated as VRs, four are atypical VSGs and one is uncategorized. mRNA for several of the VR genes has previously been detected in PF using PCR [32]. One of the VRs shows highest expression in PF-stat (gene 7 in Figure ). Three of the atypical VSG genes show similar expression in all stages (e.g., gene 6 in Figure ). These genes, Tb927.4.5400, Tb927.4.5420 and Tb927.4.5430, along with a fourth gene identical to Tb927.4.5420, are tandemly linked to form a small cluster just 5' (and on the opposite strand) to the sub-telomeric VSG cluster at the "right" end of chromosome 4. Interestingly, this cluster of genes is immediately downstream of a convergent strand-switch region that appears to contain an RNA polymerase transcription initiation site in both BF and PF [16]. A large number of ESAGs are also expressed in all life cycle stages; of these, most are chromosomal-internal GRESAG4 genes that have been shown previously to be expressed in PF [34]. However, two ESAG4s and an ESAG7 from the 427 ES show this expression pattern, as do six sub-telomeric ESAGs (two encode ESAG3, two ESAG5, one ESAG4 and one ESAG9-like). The functional significance of their expression in PF is unknown.
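The grouping of probe-sets into expression patterns by hierarchical clustering of log2-transformed signals can be sketched in a few lines of Python. This is an illustration only: the distance metric (Euclidean) and linkage rule (average) are assumptions, since the text does not specify them, the number of columns is reduced to four for brevity, and all signal values and the helper names `log2_profile`, `euclidean` and `average_linkage` are hypothetical.

```python
import math

def log2_profile(signals):
    """Log2-transform one gene's normalized signals."""
    return [math.log2(s) for s in signals]

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until n_clusters remain.
    Returns a list of clusters, each a list of row indices."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(profiles[a], profiles[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical signals; columns are two BF samples followed by two PF samples.
raw = [
    [500, 500, 500, 500],        # 0: not expressed in any sample
    [520, 480, 510, 490],        # 1: not expressed in any sample
    [8000, 9000, 600, 550],      # 2: higher in BF, low in PF
    [7000, 8500, 650, 600],      # 3: higher in BF, low in PF
    [9000, 9500, 9200, 8800],    # 4: expressed in all samples
    [10000, 9000, 9500, 9100],   # 5: expressed in all samples
]
profiles = [log2_profile(row) for row in raw]
clusters = average_linkage(profiles, n_clusters=3)
print(clusters)
```

With these toy values the silent genes, the BF-enriched genes and the constitutively expressed genes fall into three separate clusters, loosely mirroring groups A, B and D described above.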
Of the 215 VSG genes examined, 43 showed expression levels above the bottom quartile (~4500) of all genes in at least one BF sample. These included 10 classified as encoding functional VSGs, and eight that were unclassified, but also included eight encoding atypical VSGs and 17 VRs. From these data it is apparent that at least some atypical VSGs are expressed and hence likely to be functional. Indeed, a query on GeneDB for VSGs annotated as being detected in proteomic analysis of BF [35] yielded seven genes, two of which are atypical VSGs. Only nine of the 43 VSG genes noted above were expressed below the 5th percentile (~1300) in PF. These included five encoding typical VSGs (and two that were uncharacterized), but one gene encoded an atypical VSG, and one VR gene also had this expression pattern. Thus, these data suggest that the functional diversity of VSG genes is likely more complex than currently appreciated.
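The two-step filter used in this paragraph (signal above the bottom quartile in at least one BF sample, then signal below the 5th percentile in PF) can be written as a short Python sketch. The cutoffs are the approximate values quoted in the text; requiring every PF sample to fall below the cutoff is an assumption, as the text does not say whether individual samples or means were compared. Gene names and signal values are hypothetical.

```python
BF_CUTOFF = 4500   # approximate bottom-quartile signal quoted in the text
PF_CUTOFF = 1300   # approximate 5th-percentile signal quoted in the text

# Hypothetical per-sample signals for three illustrative genes.
signals = {
    "vsg_a": {"BF": [9000, 300, 5200], "PF": [400, 350]},
    "vsg_b": {"BF": [6000, 7000, 6500], "PF": [5000, 4800]},
    "vsg_c": {"BF": [800, 900, 700], "PF": [600, 650]},
}

def expressed_in_bf(gene):
    """True if any BF sample exceeds the bottom-quartile cutoff."""
    return any(s > BF_CUTOFF for s in gene["BF"])

def low_in_pf(gene):
    """True if every PF sample is below the 5th-percentile cutoff
    (assumption: all samples, not the mean, must pass)."""
    return all(s < PF_CUTOFF for s in gene["PF"])

bf_expressed = [name for name, g in signals.items() if expressed_in_bf(g)]
bf_specific = [name for name in bf_expressed if low_in_pf(signals[name])]

print("expressed in BF:", bf_expressed)
print("expressed in BF and low in PF:", bf_specific)
```

In this toy example, two genes pass the BF filter but only one is also low in PF, analogous to the nine of 43 VSG genes reported above.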