The estimated complexity of the normalized library was 2.3 × 10⁶. A total of 12,768 randomly selected clones were sequenced. Of these, 1,128 had no insert and are not included in other figures. A total of 10,450 clones were sequenced from the 5′ end, 10,977 were sequenced from the 3′ end and 9,857 were sequenced from both ends. This yielded 21,427 ESTs (9,879,196 bp). Median EST size was 461 bp and 8,983 of the ESTs contained polyA tails. Overall, 3,761 5′ and 3′ read pairs from the same clone overlap, giving the full sequence of the clone. The cDNA library insert size was tested by comparing the sequences to previously characterized Glossina genes (of which there are 19). Interestingly, 65 clones had higher than 95% homology to these genes across their length in both the forward and reverse direction. We estimate that these clones range in size from 113 bp to 1,870 bp, with an average insert size of 990 bp. All the 3′ sequences hitting the known Glossina genes had polyA tails, although not always as long as those of the previously described sequences. The average distance of the 5′ end start for the EST was 124 bp from the start of the cDNA. None of the clones contained full-length cDNAs.
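As a quick consistency check on the sequencing figures above, the following minimal sketch recomputes the EST total and the average read length from the reported counts. The variable names are illustrative and not part of the original analysis. Note that total base pairs divided by the number of ESTs is a mean, which here rounds to the same 461 bp quoted in the text.

```python
# Read counts and total sequence length as reported in the text.
five_prime_reads = 10_450
three_prime_reads = 10_977
total_bp = 9_879_196

# Each read, from either end of a clone, counts as one EST.
total_ests = five_prime_reads + three_prime_reads

# Mean EST length in bp (total bases divided by number of ESTs).
mean_est_length = total_bp / total_ests

print(total_ests)
print(round(mean_est_length))
```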
Clustering with Phrap (Phil Green, unpublished observations) produced 3,220 clusters with a median membership of 4.90 (range 2-135; 74.3% of the total ESTs). This left 5,656 singletons (25.7% of the total ESTs). The ESTs generated were 73.7% redundant, which is relatively low considering that all clones were sequenced from both ends and hence there is an inherent level of redundancy, caused by overlapping forward and reverse reads, which we estimate to be 10%. Preliminary sequencing of a library that had not been normalized demonstrated a redundancy of 87.5% after just 181 reads. At a similar stage of sequencing, the normalized library had a redundancy level of just 10%.
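The cluster membership figure can be related to the other counts with a short sketch. It assumes, as an illustration only, that all 21,427 ESTs entered the clustering and that every EST is either a singleton or a cluster member. The percentages quoted in the text may rest on a slightly different denominator, so they are not recomputed here.

```python
total_ests = 21_427  # ESTs reported in the text
singletons = 5_656   # ESTs that did not join any cluster
clusters = 3_220     # clusters produced by Phrap

# ASSUMPTION: every EST is either a singleton or a member of exactly one cluster.
clustered_ests = total_ests - singletons

# Average number of ESTs per cluster under that assumption.
mean_membership = clustered_ests / clusters

print(clustered_ests)
print(f"{mean_membership:.2f}")
```

Under these assumptions the average comes out at about 4.90 ESTs per cluster, in line with the membership figure reported above.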
It is probable that the final midgut library contained ESTs representing transcripts from four sources: midguts exposed to trypanosomes from one to seven days; fat body from the same trypanosome-challenged flies; trypanosomes; and bacterial symbionts from the fly. Whilst the vast majority of mRNA used in library construction was of midgut origin, the minor components would be expected to increase in representation during the normalization procedure. We found 356 ESTs that matched known T. brucei DNA sequences with a BLASTn score above 400 and these were eliminated. Tsetse flies also contain three bacterial symbionts [7]. Of these, Wigglesworthia have a strong presence in the midgut. We attempted to minimize the representation of these in the library by including a polyA mRNA purification step, which acts to exclude bacterial sequences, most of which characteristically lack a polyA tail. In addition, we analyzed the data specifically looking for bacterial sequences. Using BLASTX against Swall we identified 34 sequences with highest hits to bacterial sequences: identity varied between 46% and 96%. Of these 34 sequences, 20 were to the endosymbiont Wigglesworthia brevipalpis, the midgut endosymbiont found in another tsetse fly, Glossina brevipalpis. The amino acid identity displayed by these 20 hits ranged from 52% to 91%. Given the very high abundance of Wigglesworthia in the midgut bacteriome, it is clear that the polyA exclusion strategy was effective in minimizing the representation of bacterial sequences in the library. Of the remaining sequences, 3,884 sequences had matches in the Drosophila protein database, that is, 45% of the total number of clusters and singletons. We also found that 17.15% of the 5′ alignments contained AUG in the alignment. Only 151 sequences had matches to proteins in Swall and not to Drosophila proteins. The high degree of similarity between the two species makes Glossina a very good comparative model for studying Drosophila and the Drosophila database an excellent resource for those studying Glossina. A description of the proportion of ESTs falling into different functional classes is given in Figure . The sequences have been submitted to GenBank: Accession numbers BX548257-BX569683.
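The trypanosome screening step described above (discarding ESTs whose match to known T. brucei sequences scores above 400) can be sketched as a simple threshold filter. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (a mapping from EST identifier to best BLASTn score) and the toy identifiers and scores are all hypothetical.

```python
from typing import Dict, List


def filter_contaminants(best_scores: Dict[str, float],
                        threshold: float = 400.0) -> List[str]:
    """Return the EST ids whose best contaminant hit does not exceed the threshold."""
    return [est_id for est_id, score in best_scores.items() if score <= threshold]


# Toy input: hypothetical EST ids with made-up best scores against T. brucei.
toy_scores = {"est_A": 812.0, "est_B": 35.5, "est_C": 400.0, "est_D": 0.0}

kept = filter_contaminants(toy_scores)
print(kept)  # ['est_B', 'est_C', 'est_D']
```

Because the text speaks of scores "above 400", a score of exactly 400 is retained in this sketch.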
Figure 1. Classification of EST clusters using Gene Ontology™. (a) Individual annotations were summarized by mapping to broad level terms from the 'biological process' ontology and (b) the 'molecular function'. ESTs which mapped to process unknown or molecular ...
Using BLASTX and Pfam we identified 78 homologs of genes with known or putative immunity-related functions. Although the insect midgut is known to be involved in the immune response [8], this is still a surprisingly high number of immune-related genes to find, particularly in the gut of an insect which feeds exclusively on (normally sterile) vertebrate blood. For example, an EST project centered on immune-competent, hemocyte-like cell lines from the malarial mosquito Anopheles gambiae identified only 38 such clusters. Many explanations are possible for the comparatively high number found in Glossina. For example, the genes identified may not have immunity-related functions in the midgut. Alternatively, this high number may be a function of the presence of bacterial symbionts in the tsetse fly midgut [7] and the need to regulate their numbers. On the other hand, it may be a result of the low redundancy of the library and the comparatively large numbers of ESTs produced in this study.
Serine proteases and inhibitors
In this group, 15 genes have been identified. The presence of 11 putative proteinase inhibitors in a gut dedicated to the digestion of a high-protein diet is surprising and suggests they have an important function. Serine protease genes possessing 'clip domains' are implicated in the activation of the prophenol oxidase cascade and other cascades associated with the immune response [12]. Four homologs of such serine proteases have been uncovered (Table ). The function of such genes in a midgut environment remains to be determined. Many serine proteases involved in the immune response exist in fine balance with serine protease inhibitors to ensure that the impact of protease-activated cascades remains localized in time and space [13]. Serpins may have particular involvement in inactivating serine proteases with clip domains [13]. Some serpins are involved in the immune response. For example, Spn43Ac in Drosophila is involved in the regulation of Toll-mediated antifungal defense [15]. We have identified nine putative serpins, but it seems unlikely that these are involved in the regulation of complex insect-based cascades in the midgut. Instead, the large numbers of serpins found here may reflect the need to inactivate the complement and coagulation cascades in the blood meal, to protect the midgut epithelium and to retain the meal in a physical state suitable for digestion, respectively. In support of the latter contention, Gmm-2766 is a homolog of Infestin, which is reported in the NCBI protein database (AAK57342) as a novel thrombin inhibitor present in the midgut of the blood-sucking hemipteran Triatoma infestans. Serpins may also have an additional direct role in immunity, as proteolytic enzymes are important virulence factors in many pathogens and proteinase inhibitors may have important roles in regulating disease [16].
Serine protease-related group of putative immune-related genes chosen for arraying
Two further components of this group are members of the complement C3/α2 macroglobulin superfamily, which are homologs of the TepIV gene in Drosophila is strongly upregulated in response to immune challenge in adult Drosophila and it has been suggested that Tep genes in Drosophila may have complement-like properties [17]. Other suggested functions for members of this protein family are as proteinase inhibitors [16].
In this group, 28 putative adhesion genes have been isolated (Table ). We have eliminated homologs of enzymes involved in sugar metabolism which also contain chitin-binding domains. This leaves 14 molecules containing chitin-binding domains which may have an adhesive function and play a role in immunity. Tse36b05, Gmm-3093 and Gmm-2445 are homologs of a mucin, a peritrophic matrix constituent and a peritrophin, respectively, and may play a defensive barrier role in the midgut [18]. The cluster Gmm-1329 is a homolog of an Anopheles gambiae gene which is immune inducible and highly expressed in the adult mosquito midgut [20]. Five other clusters are homologs of a Drosophila gene which encodes a protein related to chitinase but lacking catalytic activity. In Drosophila, the product encoded by Chit is an imaginal disk growth factor [21]. The function(s) of this group of genes in an adult insect is open to question, but may include an immune function given the immune responsiveness of other molecules carrying chitin-binding domains [20].
Adhesion molecules group of putative immune-related genes chosen for arraying
Three homologs of pattern recognition proteins known to bind microbial surface molecules have been found. Two are homologs of peptidoglycan-binding proteins, which are pattern recognition proteins involved in immune response pathways targeting bacterial infections [22]. The third is a homolog of a Gram-negative binding protein known to bind lipopolysaccharide and β-1,3-glucan from Gram-negative bacteria and fungi, respectively. The homolog is involved in the regulation of NF-κB-dependent antimicrobial peptide gene expression in Drosophila.
The sequencing uncovered seven homologs of scavenger receptor molecules which may be involved in the immune response, including a homolog of croquemort, which is a macrophage receptor involved in the phagocytosis of apoptotic cells in Drosophila.
Of particular interest are three putative lectin genes. Two are homologs of C-type lectin superfamily members. One is from Drosophila and the other is an immune-responsive C-type lectin from the fleshfly Sarcophaga peregrina. The third putative lectin (Tse33h03.q) is a member of the ConA-like lectin superfamily. Indirect evidence from sugar inhibition experiments suggests lectins play a role in determining the initial success of trypanosome infections in tsetse flies and stimulate the maturation of successful trypanosome infections [4]. Until now these genes have defied attempts to clone them, hindering precise analysis of their effects on trypanosomes. Further work on these lectins is underway.
Other putative immune-related genes
There are 35 genes in this group. An attacin gene, distinct from the previously reported AttA gene from Glossina, has been identified; attacins are important antimicrobial peptides in insects. It is interesting that other antimicrobial peptide genes, including ones already cloned and sequenced which are known to be expressed in the adult midgut [5], are notable by their absence from our EST list. Reactive oxygen species (ROS) play a role in insect immunity [30]; we identified 18 putative antioxidant genes consisting of five superoxide dismutases, three catalases, five peroxidases and five peroxiredoxins. Here we give details for the eight which were included in the macroarrays; we are carrying out further work on these antioxidant genes. This unusually high number of antioxidant genes may reflect the need of the fly to protect the midgut epithelium from oxygen radical attack on lipids caused by the abundance of heme molecules liberated from the digested blood meal [31]. Some of these antioxidant genes may also protect against ROS generated during immune responses. For example, Gmm-2058 is a homolog of Dpx4156, which is itself a homolog of mammalian and parasite genes believed to play a role in preventing oxidative damage by scavenging extracellular ROS released by immune effector cells [32].
Five clusters with homology to genes involved in iron metabolism were identified. Iron metabolism genes have been implicated in immunity [34]. The remaining 11 genes are homologs of genes involved in signaling pathways associated with the immune response. They include components of the well-studied antimicrobial peptide-regulating pathways Toll and Imd [36]. They also include a homolog of the Drosophila Thor gene, which has been implicated in post-transcriptional regulation of immune responses [37].
In the future, comparative studies of immune-related genes in different hematophagous species may be informative in the understanding of the relationships between insects and the parasites that they transmit. Publication of the full genome of A. gambiae makes this the benchmark species [38]. Of the 68 immunity-related genes used in the arrays, 54 had a homolog in the A. gambiae database at a BLASTX value of 1e-08 or less (Tables ,,), suggesting that Glossina and A. gambiae potentially show considerable overlap in the genes underpinning their immune systems. This suggests that future comparative studies of these species may help provide wide-ranging insights into immune mechanisms in blood-sucking insects.
Miscellaneous group of putative immune-related genes chosen for arraying
Both trypanosome and bacterial challenges lead to a significant number of downregulated genes (Figure ), which may represent a malaise reaction of the gut to infection such as that recorded in Drosophila and Manduca. Only relatively small numbers of genes are upregulated in response to infection. However, based on their homology, many of the genes which are not upregulated probably do have an immune function; for example, Gmm-3156, Tse70a12 and Tse812d11 are all involved in bacterial binding.
Figure 2. Expression profiles of putative immunity genes following bacteria or trypanosome challenge. Macroarrays were produced for the 68 putative immunity-related genes listed in Tables ,,. The contig (Gmm-) or clone identifier ...
Successful trypanosome infection results in considerable changes in the transcriptional profile (Figure ). Comparing infected midguts to equivalent non-trypanosome challenged midguts, 10 genes are upregulated and 28 downregulated. Comparing midguts from tsetse which had self-cleared trypanosome infections to equivalent non-trypanosome challenged midguts, 13 genes were upregulated and 31 downregulated. Remarkably, six of the 10 genes upregulated and 25 of the 28 genes downregulated in infected flies show the same changes in self-cleared flies.
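To put the overlap between infected and self-cleared flies in proportion, the following sketch converts the counts reported above into fractions of the genes that changed in infected flies. The variable names are illustrative only, and no gene-level data are assumed beyond the counts stated in the text.

```python
# Counts reported in the text (infected flies vs. non-challenged controls).
up_infected = 10     # genes upregulated in infected flies
down_infected = 28   # genes downregulated in infected flies

# Of those, genes showing the same change in self-cleared flies.
up_shared = 6
down_shared = 25

up_fraction = up_shared / up_infected
down_fraction = down_shared / down_infected
overall_fraction = (up_shared + down_shared) / (up_infected + down_infected)

print(f"shared upregulated:   {up_fraction:.0%}")
print(f"shared downregulated: {down_fraction:.0%}")
print(f"shared overall:       {overall_fraction:.0%}")
```

On these counts, roughly four in five of the expression changes seen in infected flies are also seen in self-cleared flies.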
Two peroxidase homologs are upregulated in both self-cleared and infected flies. In addition, two more peroxidase genes are upregulated in self-cleared flies. It is known that trypanosomes are particularly susceptible to ROS [42] and it is an interesting speculation that these genes may be upregulated to protect the fly against ROS which are generated during the tsetse immune response against trypanosomes (Zhengrong Hao and Serap Aksoy, unpublished observations). A homolog of the TepIV gene of D. melanogaster, a member of the larger C3/α2 macroglobulin family, is also upregulated in both trypanosome-infected and self-cleared flies. It has been suggested that invertebrate members of this family may have complement-like [17] or proteinase inhibitor functions [16]. Members of the Tep gene family of D. melanogaster are also upregulated on immune stimulation [17]. A serpin is upregulated in both self-cleared and infected flies and it is interesting to note that artificial proteinase inhibitors can have cytocidal effects on trypanosomes [45]. One of the three putative lectins, Tse33h03, is also upregulated in both self-cleared flies and infected flies. While the full-length cDNAs for the other two lectins described both have putative signal peptides and so are probably secreted from cells, this particular lectin is of the ConA superfamily and lacks a signal peptide and so is probably intracellular (M.J.L., unpublished observations). Therefore it is unlikely that this lectin has a direct interaction with trypanosomes.
Genes that are differentially regulated between infected and self-cleared flies are clearly of particular interest. The peroxidase homologs are discussed above. Others upregulated in self-cleared flies but not in infected flies include homologs of members of both the Toll and Imd signaling pathways. This may suggest a role for these pathways in successful tsetse immune responses to trypanosomes. These findings are consistent with previous reports that immune peptides known to be regulated by these pathways in D. melanogaster are upregulated in response to trypanosome challenge [5]. Consistent with these findings, attacin, which can be regulated through both pathways in Drosophila, is upregulated in self-cleared flies.
The expression profile resulting from bacterial infection of the gut is quite distinct from that occurring in response to trypanosome infection (Figure ). This suggests the possibility that different recognition pathways are involved in bacterial and trypanosome infections. Of particular note in these expression profiles is the upregulation of homologs of several genes implicated in binding chitin, which occurs only following bacterial infection. The consistent pattern of upregulation seen here adds support to the possibility of an immune function for chitin-binding molecules in insects [20]. The upregulation of genes involved in the Toll and Imd pathways and genes involved in the management of oxidative stress, uniquely during trypanosome infection, also merits further investigation.
The array data represent a first attempt to assess immune responses in the tsetse fly midgut. The ESTs reported here will permit more sophisticated approaches to be made in the future when it will be possible to determine the effect of the insect immune system on each of the developmental stages of the parasite. This will help us to understand why so few tsetse flies that have ingested trypanosomes go on to become infective.
The project has increased the number of Glossina genes in public databases by two orders of magnitude. Identification of the putative immunity genes provides a resource which should be of value in the experimental dissection of tsetse-trypanosome interactions. For example, we are currently producing recombinant lectins to determine their impact on trypanosomes. The remarkably high proportion of homologs found in the Drosophila database suggests that this rich resource will contribute much to our knowledge of tsetse biology. In particular, as annotation of the Drosophila genome proceeds, it is likely that putative functions can be assigned to many of the 4,485 ESTs for which no putative function has yet been ascribed.