Here we report on the identification of T. brucei genes encoding predicted unknown surface proteins obtained via in silico GPI-anchor attachment signal sequence prediction analysis. Expression profiling analysis from mammalian and tsetse developmental stages indicates that transcripts for the majority of the hypothetical and hypothetical conserved proteins are expressed in parasites during their development in the tsetse salivary glands and proventriculus. Most notably, we identified 8 trypanosome genes specifically expressed in parasitized salivary glands, expression for all of which was also detected from mammalian infective MCF trypanosomes present in fly saliva. The results of this analysis give the first large-scale insight into stage-regulated expression of genes encoding putative hypothetical surface proteins during key developmental processes in the tsetse fly, and support the established paradigm of differential expression through development. Functional characterization of these unknown proteins, particularly those expressed by metacyclics in saliva, may lead the way to novel transmission blocking strategies in the mammalian host.
Proteins with GPI posttranslational modification are typically expressed on the surface of eukaryotic parasites and have the potential to participate in important biological processes such as cell–cell interactions, signal transduction, endocytosis, complement regulation, and antigenic presentation. In protozoan parasites, GPI-anchored glycoconjugates extensively coat the plasma membrane and are involved in many aspects of host–parasite interactions, such as adhesion to and invasion of host cells, and modulation and evasion of the host immune response. As such, there is interest in identifying the surface proteins of the medically important kinetoplastids, as reported in L. and T. cruzi, where proteomic techniques were applied to capture this class of proteins. Current knowledge of the VSGs and procyclins, two of the best characterized GPI-anchored surface proteins of T. brucei, has demonstrated the importance of these proteins in trypanosome developmental processes. Further, GPI biosynthesis has also been implicated as a molecular target for development of new drugs against African sleeping sickness. The availability of the T. brucei genome allows for postgenomic discoveries, including screens for hallmark motifs such as GPI anchor attachment signals associated with surface proteins.
Several publicly available programs can be used to predict post-translational modifications (PTMs) such as glycosylation and GPI-anchor attachment, although a gold standard for prediction software remains to be found. As a result, experimental validation of predicted features is always warranted. The quality of predictive algorithm output varies in response to several factors. In the case of GPI-anchor prediction, variables include the size of the motif recognized, the quality of the underlying data used to test the algorithm, and the correct application of learning procedures such as neural networks. The ideal tool would have high sensitivity to detect true positives, with a low false prediction rate. Also relevant is the biological context being considered; as a result, there are algorithms specifically for protozoa, fungi, plants, etc. As seen with our dataset, two algorithms can generate different results from the same dataset. In this work, FragAnchor agreed with most, but not all, of the genes previously identified by a BigPI search specific for protozoan GPI anchor attachment domains. A similar outcome with these two programs was reported after testing both against known positive and negative control GPI-anchored protein datasets, and against a dataset from the protozoan pathogen Plasmodium falciparum. In both of these cases, although correct identification of true GPI-anchored proteins was high, the false positive rate was high as well. Conversely, another group found FragAnchor to be more accurate than BigPI, while maintaining the same false positive rate, although limitations associated with the algorithm they employed for comparison make it difficult to draw clear conclusions. With these challenges in mind, we opted for a conservative approach in the identification of putative GPI-anchored proteins by selecting only those genes encoding products that showed agreement between the two predictive programs. As the absence of predicted trans-membrane domains is necessary to support a prediction of GPI-anchoring, we further excluded putative proteins bearing any predicted trans-membrane domains from expression analysis despite predictions of GPI-anchoring. While the presence of a GPI anchor attachment signal suggests cell surface membrane expression as mentioned earlier, there is evidence that both N- and O-glycosylation status directs nascent proteins to the apical region. Like GPI anchor attachment sites, glycosylation sites can be predicted using in silico methodology. Importantly, while the presence of predicted glycosylation sites supports the expectation of surface expression, the absence of glycosylation does not imply a lack of surface expression of a protein.
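The conservative selection rule described above (agreement between the two predictors, plus the absence of any predicted trans-membrane domain) can be illustrated with a minimal sketch. The record layout, field names, and gene identifiers below are hypothetical and chosen only for illustration; they do not reflect the actual output formats of BigPI, FragAnchor, or any trans-membrane predictor, and this is not the pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-gene summary of in silico predictions."""
    gene_id: str
    bigpi_gpi: bool       # GPI-anchor signal called by the first predictor
    fraganchor_gpi: bool  # GPI-anchor signal called by the second predictor
    tm_domains: int       # number of predicted trans-membrane domains


def consensus_gpi_candidates(predictions):
    """Keep genes called by both predictors and lacking any TM domain."""
    return [
        p.gene_id
        for p in predictions
        if p.bigpi_gpi and p.fraganchor_gpi and p.tm_domains == 0
    ]


# Made-up records covering the four possible outcomes of the filter.
example = [
    Prediction("gene_A", True, True, 0),   # both agree, no TM domain: kept
    Prediction("gene_B", True, False, 0),  # predictors disagree: dropped
    Prediction("gene_C", True, True, 2),   # TM domains predicted: dropped
    Prediction("gene_D", False, True, 0),  # predictors disagree: dropped
]

candidates = consensus_gpi_candidates(example)
print(candidates)
```

Requiring agreement between both programs trades sensitivity for a lower false positive rate, which matches the conservative intent stated in the text.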
Fifty-six of the in silico-identified genes in the T. b. brucei genome had known or predicted functions in other closely related kinetoplastid parasites and were not pursued for further expression analysis. These included all members of the BARP family, and many genes with putative functions, such as GP-63 surface protease (5 copies), trans-sialidase (4 copies), procyclin associated gene 4 (2 copies), and numerous carrier or transporter proteins. Our aim was to identify unknown SG stage-regulated genes for downstream characterization and investigation as novel transmission blocking targets. Of the 163 non-procyclin, non-VSG coding genes that were identified as encoding GPI-anchored proteins using the BigPI prediction software, 104 were confirmed with FragAnchor. With regard to the possible function of these gene products, 106/163 had no known function. A search of the available whole genome sequence information from T. b. gambiense, L. major, T. cruzi, T. congolense and T. vivax indicated that about 21% (22/106) of the identified genes were unique to T. b. brucei. Of the 25 genes that met our criteria to be considered likely to encode predicted GPI-anchored proteins, 5 were conserved at the level of the TriTryp genomes, 10 were shared with other species of Trypanosoma, and 10 were unique to T. b. brucei. It is possible that the lack of homologs in these genomes reflects the different biology of the parasite species, although it is also possible that homologs will be revealed as genome annotations improve. While T. b. gambiense is more closely related to T. b. brucei than the other trypanosomatid species analyzed, its biology differs from that of T. b. brucei. It remains to be seen whether the genes unique to the T. b. brucei genome contribute to its differing epidemiology. The annotated whole genome sequence of T. b. rhodesiense is not yet available; however, the status of T. b. brucei-specific genes in T. b. rhodesiense is of interest from both an evolutionary and an epidemiological point of view.
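As a quick check on the proportions quoted in this paragraph, the counts can be tabulated in a few lines. The sketch below uses only the numbers stated in the text; the variable names and the helper `pct` are introduced here for illustration.

```python
# Counts as stated in the text.
bigpi_hits = 163            # non-procyclin, non-VSG genes called by BigPI
fraganchor_confirmed = 104  # of those, also called by FragAnchor
no_known_function = 106     # of the 163, genes with no known function
unique_to_tbb = 22          # of the 106, genes unique to T. b. brucei

# Conservation breakdown of the 25 genes meeting the selection criteria.
conservation = {"TriTryp": 5, "Trypanosoma": 10, "T. b. brucei only": 10}


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("Confirmed by FragAnchor:", pct(fraganchor_confirmed, bigpi_hits), "%")
print("No known function:", pct(no_known_function, bigpi_hits), "%")
print("Unique to T. b. brucei:", pct(unique_to_tbb, no_known_function), "%")
print("Genes meeting criteria:", sum(conservation.values()))
```

The 22/106 ratio rounds to the "about 21%" reported above, and the three conservation classes sum to the 25 selected genes.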
Gene expression profiling analysis showed that the majority of the 21 genes for which we detected transcripts are expressed by trypanosome developmental stages present in the tsetse fly PV and SG tissues, while comparatively fewer are expressed by mammalian bloodstream forms and none in the MG. A similar trend was found in genes encoding proteins with less likelihood of GPI anchoring. Similarly, a proteomic analysis that identified GPI-anchored molecules in T. cruzi insect-stage epimastigote cultures also found the majority of the identified proteins to be novel. In the case of T. brucei, obtaining sufficient epimastigote and metacyclic parasites from infected tsetse flies for functional analysis is difficult since these stages are unculturable in vitro. Confirmation of the corresponding stage-regulated protein expression is a necessary next step, and the resulting data may shed light on the roles of these products in parasite biology. Complex gene expression profiles for putative surface proteins in the proventricular and salivary gland stages of T. brucei may reflect the multiple discrete trypanosome developmental stages infecting these tissues, or heightened sensitivity of these trypanosomes to the tsetse or mammalian bite-site host environment. In contrast to the SG and PV stages, far fewer unknown putative surface proteins were associated with the BSF and MG stages. This minimal detection of unknown transcripts in PF and BSF samples may be related to the abundant expression of known GPI-anchored major surface proteins in these stages, specifically the procyclins and VSGs, respectively.
Interestingly, genes encoding 8 of the 21 putative GPI-anchored proteins were specifically upregulated by parasites infecting tsetse SG. Although trypanosomes undergo four distinct developmental steps in this tissue, only two GPI-anchored protein families have been demonstrated on the surface of any SG stages to date. The alanine-rich BARP proteins are expressed on epimastigotes attached to the salivary gland epithelium. Free metacyclics in saliva no longer express BARP, but have upregulated the metacyclic variant surface glycoproteins (M-VSGs) in advance of inoculation into the mammalian host. The data presented here suggest a more complex series of events may be involved in the maturation of the SG-inhabiting trypanosome stages. Proteins specifically expressed on the immature SG stages might be involved in host-parasite interactions and as such could be targeted to prevent parasite maturation in the fly using genetic modification strategies in the tsetse host. On the other hand, proteins expressed on the mature metacyclics may present novel vaccine targets for use in the vertebrate hosts.
Importantly, transcripts corresponding to the SG stage-regulated genes were not detected in the bloodstream form stages. Since the mammalian infective metacyclic trypomastigote is suggested to be “pre-adapted” to life in the vertebrate host, one could expect these samples to share proteins. There are two potential explanations for this observation. First, many gene products associated with adaptation to the vertebrate environment are likely to be intracellular, i.e., related to energy metabolism, and therefore do not bear GPI-anchor attachment domains. As a result, these genes are expected to have been excluded from the in silico screen applied here. Second, when an infective fly bites the vertebrate host, metacyclic parasites are detected for several days, with the bloodstream forms not becoming apparent until nearly a week after the infective bite. Thus it is possible that transitional metacyclics (t-MCFs), i.e., those detected in vertebrate blood in the days immediately after an infective tsetse bite but before differentiation to the BSF, may have a transcriptome that reflects the parasite adaptation process from the environment of invertebrate saliva to vertebrate blood.
MCF trypanosomes, like malarial sporozoites, are the critical developmental stage of the parasite that gives rise to infection in the vertebrate host. While considerable effort has been mounted towards development of a sporozoite vaccine for the prevention of malaria, this has not been the case with the MCF of T. brucei. To date, VSGs have effectively thwarted all attempts at developing a vaccine against the mature BSF. It is thought that MCF parasites also express variable proteins (M-VSGs), which would hamper vaccine development efforts targeting MCF. Our results suggest, however, that the GPI-anchored surface protein repertoire of MCF may be more complex, and differ more from that of the BSF, than originally thought. The expression of the genes encoding putative surface proteins on the mammalian-infective stage suggests a complex interface between the MCF and the mammalian bite site.
In summary, the combined in silico and semi-quantitative gene expression analysis approach used here has allowed an important first look at the stage-regulated expression of genes encoding putative GPI-anchored proteins with no known functions in the human and animal pathogen T. brucei. The findings presented here suggest that the tsetse host-parasite interplay during differentiation may be quite complex. Most importantly, these results increase our understanding of trypanosome biology at the point of transmission to the vertebrate host, and identify a number of putative invariant surface proteins that could be investigated further for novel transmission blocking strategies.