TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. ‘User Comments’ may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.
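As a conceptual illustration of how a multi-step search strategy can combine results from different data types, the sketch below treats each search step as a set of gene identifiers and folds the steps together with set operations. It is not the TriTrypDB interface: the function name `run_strategy`, the operator names and the gene identifiers are all hypothetical and chosen only for this example.

```python
def run_strategy(steps):
    """Combine search steps left to right.

    Each step is (operator, gene_id_set). The operator of the first step
    is ignored; later steps use 'intersect', 'union' or 'minus'.
    All names here are hypothetical, for illustration only.
    """
    ops = {
        "intersect": set.intersection,
        "union": set.union,
        "minus": set.difference,
    }
    result = set(steps[0][1])
    for op, ids in steps[1:]:
        result = ops[op](result, set(ids))
    return result


# Hypothetical result sets from three different searches
has_signal_peptide = {"geneA", "geneB", "geneC"}
upregulated_in_proteomics = {"geneB", "geneC", "geneD"}
has_human_ortholog = {"geneC"}

candidates = run_strategy([
    ("start", has_signal_peptide),
    ("intersect", upregulated_in_proteomics),
    ("minus", has_human_ortholog),
])
print(sorted(candidates))
```

Storing the list of steps rather than only the final set is what would allow a saved strategy to be re-run or extended later.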
Bioinformatic analyses have been used to identify potential downstream targets of the essential enzyme N-myristoyl transferase in the TriTryp species, Leishmania major, Trypanosoma brucei and Trypanosoma cruzi. These database searches predict ∼60 putative N-myristoylated proteins with high confidence, including both previously characterised and novel molecules. One of the latter is an N-myristoylated protein phosphatase which has high sequence similarity to the Protein Phosphatase with EF-Hand (PPEF) proteins identified in sensory cells of higher eukaryotes. In L. major and T. brucei, the PPEF-like phosphatases are encoded by single-copy genes and are constitutively expressed in all parasite life cycle stages. The N-terminus of LmPPEF is a substrate for N-myristoyl transferase and is also palmitoylated in vivo. The wild type protein has been localised to the endocytic system by immunofluorescence. The catalytic and fused C-terminal domains of the kinetoplastid and other eukaryotic PPEFs share high sequence similarity, but unlike their higher eukaryotic relatives, the C-terminal parasite EF-hand domains are degenerate and do not bind calcium.
PPEF, Protein Phosphatase with EF-Hands; NMT, N-myristoyl transferase; BSF, bloodstream form; PCF, procyclic form; N-Myristoylation; Palmitoylation; Protein phosphatases; Bioinformatics
The Trypanosoma cruzi genome was sequenced from a hybrid strain (CL Brener). However, high allelic variation and the repetitive nature of the genome have prevented the complete linear sequence of chromosomes being determined. Determining the full complement of chromosomes and establishing syntenic groups will be important in defining the structure of T. cruzi chromosomes. A large amount of information is now available for T. cruzi and Trypanosoma brucei, providing the opportunity to compare and describe the overall patterns of chromosomal evolution in these parasites.
The genome sizes, repetitive DNA contents, and the numbers and sizes of chromosomes of nine strains of T. cruzi from four lineages (TcI, TcII, TcV and TcVI) were determined. The genome of the TcI group was statistically smaller than other lineages, with the exception of the TcI isolate Tc1161 (José-IMT). Satellite DNA content was correlated with genome size for all isolates, but this was not accompanied by simultaneous amplification of retrotransposons. Regardless of chromosomal polymorphism, large syntenic groups are conserved among T. cruzi lineages. Duplicated chromosome-sized regions were identified and could be retained as paralogous loci, increasing the dosage of several genes. By comparing T. cruzi and T. brucei chromosomes, homologous chromosomal regions in T. brucei were identified. Chromosomes Tb9 and Tb11 of T. brucei share regions of syntenic homology with three and six T. cruzi chromosomal bands, respectively.
Despite genome size variation and karyotype polymorphism, T. cruzi lineages exhibit conservation of chromosome structure. Several syntenic groups are conserved among all isolates analyzed in this study. The syntenic regions are larger than expected if rearrangements occur randomly, suggesting that they are conserved owing to positive selection. Mapping of the syntenic regions on T. cruzi chromosomal bands provides evidence for the occurrence of fusion and split events involving T. brucei and T. cruzi chromosomes.
The protozoan pathogens Leishmania major, Trypanosoma brucei and Trypanosoma cruzi (the Tritryps) are parasites that produce devastating human diseases. These organisms show very unusual mechanisms of gene expression, such as polycistronic transcription. We are interested in the study of tRNA genes, which are transcribed by RNA polymerase III (Pol III). To analyze the sequences and genomic organization of tRNA genes and other Pol III-transcribed genes, we have performed an in silico analysis of the Tritryps genome sequences.
Our analysis indicated the presence of 83, 66 and 120 genes in L. major, T. brucei and T. cruzi, respectively. These numbers include several previously unannotated selenocysteine (Sec) tRNA genes. Most tRNA genes are organized into clusters of 2 to 10 genes that may contain other Pol III-transcribed genes. The distribution of genes in the L. major genome does not seem to be totally random, like in most organisms. While the majority of the tRNA clusters do not show synteny (conservation of gene order) between the Tritryps, a cluster of 13 Pol III genes that is highly syntenic was identified. We have determined consensus sequences for the putative promoter regions (Boxes A and B) of the Tritryps tRNA genes, and specific changes were found in tRNA-Sec genes. Analysis of transcription termination signals of the tRNAs (clusters of Ts) showed differences between T. cruzi and the other two species. We have also identified several tRNA isodecoder genes (having the same anticodon, but different sequences elsewhere in the tRNA body) in the Tritryps.
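The definition of isodecoder genes given above (same anticodon, different sequence elsewhere in the tRNA body) can be expressed as a simple grouping step. The following is a minimal sketch that assumes tRNA genes have already been predicted and are available as records with hypothetical `amino_acid`, `anticodon` and `sequence` fields; the toy sequences are invented.

```python
from collections import defaultdict


def find_isodecoders(trna_genes):
    """Group tRNA genes by (amino acid, anticodon) and keep only the
    groups that contain more than one distinct gene sequence."""
    groups = defaultdict(set)
    for gene in trna_genes:
        key = (gene["amino_acid"], gene["anticodon"])
        groups[key].add(gene["sequence"].upper())
    return {key: sorted(seqs) for key, seqs in groups.items() if len(seqs) > 1}


# Hypothetical toy records (sequences are invented and truncated)
genes = [
    {"amino_acid": "Ala", "anticodon": "AGC", "sequence": "GGGGATGTAGCTCA"},
    {"amino_acid": "Ala", "anticodon": "AGC", "sequence": "GGGGATGTAGCTCG"},
    {"amino_acid": "Ala", "anticodon": "AGC", "sequence": "GGGGATGTAGCTCA"},
    {"amino_acid": "Sec", "anticodon": "TCA", "sequence": "GCCGCGATGATCTC"},
]

isodecoders = find_isodecoders(genes)
for (aa, anticodon), seqs in isodecoders.items():
    print(aa, anticodon, len(seqs), "distinct sequences")
```

Identical copies collapse into one entry, so only genuinely different sequences sharing an anticodon are reported.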
A low number of tRNA genes is present in Tritryps. The overall weak synteny that they show indicates a reduced importance of genome location of Pol III genes compared to protein-coding genes. The fact that some of the differences between isodecoder genes occur in the internal promoter elements suggests that differential control of the expression of some isoacceptor tRNA genes in Tritryps is possible. The special characteristics found in Boxes A and B from tRNA-Sec genes from Tritryps indicate that the mechanisms that regulate their transcription might be different from those of other tRNA genes.
Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryps) are unicellular protozoa that cause leishmaniasis, sleeping sickness and Chagas' disease, respectively. Most drugs against them were discovered through the screening of large numbers of compounds against whole parasites. Nonhomologous isofunctional enzymes (NISEs) may present good opportunities for the identification of new putative drug targets because, though sharing the same enzymatic activity, they possess different three-dimensional structures thus allowing the development of molecules against one or other isoform. From public data of the Tritryps' genomes, we reconstructed the Genetic Information Processing Pathways (GIPPs). We then used AnEnPi to look for the presence of these enzymes between Homo sapiens and Tritryps, as well as specific enzymes of the parasites. We identified three candidates (ECs 184.108.40.206 and 6.1.1.-) in these pathways that may be further studied as new therapeutic targets for drug development against these parasites.
In 2005, draft sequences of the genomes of Trypanosoma brucei, Trypanosoma cruzi and Leishmania major, also known as the Tri-Tryp genomes, were published. These protozoan parasites are the causative agents of three distinct insect-borne diseases, namely sleeping sickness, Chagas disease and leishmaniasis, all with a worldwide distribution. Despite the large estimated evolutionary distance among them, a conserved core of ~6,200 trypanosomatid genes was found among the Tri-Tryp genomes. Extensive analysis of these genomic sequences has greatly increased our understanding of the biology of these parasites and their host-parasite interactions. In this article, we review the recent advances in the comparative genomics of these three species. This analysis also includes data on additional sequences derived from other trypanosomatid species, as well as recent data on gene expression and functional genomics. In addition to facilitating the identification of key parasite molecules that may provide a better understanding of these complex diseases, genome studies offer a rich source of new information that can be used to define potential new drug targets and vaccine candidates for controlling these parasitic infections.
Trypanosoma brucei; Trypanosoma cruzi; Leishmania major; genome; RNAseq
Molecular studies have shown several peculiarities in the regulatory mechanisms of gene expression in trypanosomatids. Protein-coding genes are organized in long polycistronic units that seem to be constitutively transcribed. Therefore, post-transcriptional regulation of gene expression is considered to be the main point for control of transcript abundance and functionality. Here we describe the characterization of a 17 kDa RNA-binding protein from Trypanosoma cruzi (TcRBP19) containing an RNA recognition motif (RRM). This protein is encoded by a single-copy gene located on a high molecular weight chromosome of T. cruzi. Orthologous genes are present in the TriTryp genomes. TcRBP19 shows target selectivity: among the different homoribopolymers, it preferentially binds polyC. TcRBP19 is a low-expression protein, only barely detected at the amastigote stage, that localizes in a diffuse pattern in the cytoplasm.
Kinetoplastida; Trypanosoma cruzi; RNA binding proteins; RRM protein; TcRBP19
By using improved pulsed field gel conditions, the karyotypes of several strains of the protozoan parasite Trypanosoma cruzi were analyzed and compared with those of Leishmania major and two other members of the genus Trypanosoma. There was no difference in chromosome migration patterns between different life cycle stages of the T. cruzi strains analyzed. However, the sizes and numbers of chromosomal bands varied considerably among T. cruzi strains. This karyotype variation among T. cruzi strains was analyzed further at the chromosomal level by using multicopy genes as probes in Southern hybridizations. The chromosomal location of the genes encoding alpha- and beta-tubulin, ubiquitin, rRNA, spliced leader RNA, and an 85-kilodalton protein remained stable during developmental conversion of the parasite. The sizes and numbers of chromosomes containing these sequences varied among the different strains analyzed, implying multiple rearrangements of these genes during evolution of the parasites. During continuous in vitro cultivation of T. cruzi Y, the chromosomal location of the spliced leader gene shifted spontaneously. The spliced leader gene encodes a 35-nucleotide RNA that is spliced in trans from a 105-nucleotide donor RNA onto all mRNAs in T. cruzi. The spliced leader sequences changed in their physical location in both the cloned and uncloned Y strains. Associated with the complex changes was an increase in the infectivity of the rearranged variant for tissue culture cells. Our results indicate that the spliced leader gene clusters in T. cruzi undergo high-frequency genomic rearrangements.
TcruziDB is an integrated post-genomics database for the parasitic organism Trypanosoma cruzi, the causative agent of Chagas' disease. TcruziDB was established in 2003 as a flat-file database with tools for mining the unannotated sequence reads and preliminary contig assemblies emerging from the Tri-Tryp genome consortium (TIGR/SBRI/Karolinska). Today, TcruziDB houses the recently published assembled genomic contigs and annotation provided by the genome consortium in a relational database supported by the Genomics Unified Schema (GUS) architecture. The combination of an annotated genome and a relational architecture has facilitated the integration of genomic data with expression data (proteomic and EST) and permitted the construction of automated analysis pipelines. TcruziDB has accepted, and will continue to accept, the deposition of genomic and functional genomic datasets contributed by the research community.
Extracellular factors produced by Leishmania spp., Trypanosoma cruzi, and Trypanosoma brucei are important in the host-parasite relationship. Here, we describe a genome-based approach to identify putative extracellular proteins conserved among trypanosomatids that are likely involved in the classical secretory pathway. Potentially secreted proteins were identified by bioinformatic analysis of the T. cruzi genome. A subset of thirteen genes encoding unknown proteins with orthologs containing a signal peptide sequence in L. infantum, L. major, and T. brucei were transfected into L. infantum. Tagged proteins detected in the extracellular medium confirmed computer predictions in about 25% of the hits. Secretion was confirmed for two L. infantum orthologous proteins using the same experimental system. Infectivity studies of transgenic Leishmania parasites suggest that one of the secreted proteins increases parasite replication inside macrophages. This methodology can identify conserved secreted proteins involved in the classical secretory pathway, and they may represent potential virulence factors in trypanosomatids.
The structurally complex network of minicircles and maxicircles comprising the mitochondrial DNA of kinetoplastids mirrors the complexity of the RNA editing process that is required for faithful expression of encrypted maxicircle genes. Although a few of the guide RNAs that direct this editing process have been discovered on maxicircles, guide RNAs are mostly found on the minicircles. The nuclear and maxicircle genomes have been sequenced and assembled for Trypanosoma cruzi, the causative agent of Chagas disease; however, the complement of 1.4-kb minicircles, carrying four guide RNA genes per molecule in this parasite, has been less thoroughly characterised.
Fifty-four CL Brener and 53 Esmeraldo strain minicircle sequence reads were extracted from T. cruzi whole genome shotgun sequencing data. With these sequences and all published T. cruzi minicircle sequences, 108 unique guide RNAs from all known T. cruzi minicircle sequences and two guide RNAs from the CL Brener maxicircle were predicted using a local alignment algorithm and mapped onto predicted or experimentally determined sequences of edited maxicircle open reading frames. For half of the sequences no statistically significant guide RNA could be assigned. Likely positions of these unidentified gRNAs in T. cruzi minicircle sequences are estimated using a simple Hidden Markov Model. With the local alignment predictions as a standard, the HMM had an ~85% chance of correctly identifying at least 20 nucleotides of guide RNA from a given minicircle sequence. Inter-minicircle recombination was documented. Variable regions contain species-specific areas of distinct nucleotide preference. Two maxicircle guide RNA genes were found.
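To make the local alignment step more concrete, the sketch below scores candidate guide RNA/mRNA duplexes with a Smith-Waterman style recurrence. This is an illustrative stand-in, not the algorithm or parameters used in the study: the scoring values, the gap penalty and the decision to count G:U wobble pairs as weaker matches are assumptions made for this example.

```python
# Base pairs allowed in the duplex: Watson-Crick plus G:U wobble.
# Scores and gap penalty are assumed values, chosen for illustration.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_score(m, g):
    """Score one mRNA base against one gRNA base."""
    if (m, g) in WATSON_CRICK:
        return 2
    if (m, g) in WOBBLE:
        return 1
    return -3


def best_local_duplex(mrna, grna, gap=-4):
    """Local alignment of an edited mRNA (5'->3') against a gRNA read
    3'->5', so that aligned positions are base-paired.

    Returns (best_score, mrna_end, reversed_grna_end), 1-based ends.
    """
    g_rev = grna[::-1]  # antiparallel pairing
    prev = [0] * (len(g_rev) + 1)
    best = (0, 0, 0)
    for i in range(1, len(mrna) + 1):
        curr = [0] * (len(g_rev) + 1)
        for j in range(1, len(g_rev) + 1):
            score = max(
                0,  # local alignment: never go below zero
                prev[j - 1] + pair_score(mrna[i - 1], g_rev[j - 1]),
                prev[j] + gap,
                curr[j - 1] + gap,
            )
            curr[j] = score
            if score > best[0]:
                best = (score, i, j)
        prev = curr
    return best


# Toy example: the gRNA is the exact reverse complement of the mRNA
print(best_local_duplex("GCUA", "UAGC"))
```

In practice a score threshold or a significance estimate would be needed to decide whether a hit counts as a guide RNA, which is where the "no statistically significant guide RNA" outcome mentioned above would arise.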
The identification of new minicircle sequences and the further characterization of all published minicircles are presented, including the first observation of recombination between minicircles. Extrapolation suggests a level of 4% recombinants in the population, supporting a relatively high recombination rate that may serve to minimize the persistence of gRNA pseudogenes. Characteristic nucleotide preferences observed within variable regions provide potential clues regarding the transcription and maturation of T. cruzi guide RNAs. Based on these preferences, a method of predicting T. cruzi guide RNAs using only primary minicircle sequence data was created.
The genomes of the three parasitic protozoa Trypanosoma cruzi, Trypanosoma brucei and Leishmania major are the main subject of this study. These parasites are responsible for devastating human diseases known as Chagas disease, African sleeping sickness and cutaneous Leishmaniasis, respectively, that affect millions of people in the developing world. The prevalence of these neglected diseases results from a combination of poverty, inadequate prevention and difficult treatment. Protein phosphorylation is an important mechanism of controlling the development of these kinetoplastids. With the aim to further our knowledge of the biology of these organisms we present a characterisation of the phosphatase complement (phosphatome) of the three parasites.
An ontology-based scan of the three genomes was used to identify 86 phosphatase catalytic domains in T. cruzi, 78 in T. brucei, and 88 in L. major. We found interesting differences with other eukaryotic genomes, such as the low proportion of tyrosine phosphatases and the expansion of the serine/threonine phosphatase family. Additionally, a large number of atypical protein phosphatases were identified in these species, representing more than one third of the total phosphatase complement. Most of the atypical phosphatases belong to the dual-specificity phosphatase (DSP) family and show considerable divergence from classic DSPs in both the domain organisation and sequence features.
The analysis of the phosphatome of the three kinetoplastids indicates that they possess orthologues to many of the phosphatases reported in other eukaryotes, including humans. However, novel domain architectures and unusual combinations of accessory domains suggest distinct functional roles for several of the kinetoplastid phosphatases, which await further experimental exploration. These distinct traits may be exploited in the selection of suitable new targets for drug development to prevent transmission and spread of the diseases, taking advantage of the already extensive knowledge on protein phosphatase inhibitors.
Trypanosoma cruzi is a Kinetoplastid parasite of humans and is the cause of Chagas disease, a potentially lethal condition affecting the cardiovascular, gastrointestinal, and nervous systems of the human host. Constraint-based modeling has emerged in the last decade as a useful approach to integrating genomic and other high-throughput data sets with more traditional, experimental data acquired through decades of research and published in the literature.
We present a validated, constraint-based model of the core metabolism of Trypanosoma cruzi strain CL Brener. The model includes four compartments (extracellular space, cytosol, mitochondrion, glycosome), 51 transport reactions, and 93 metabolic reactions covering carbohydrate, amino acid, and energy metabolism. In addition, we make use of several replicate high-throughput proteomic data sets to specifically examine metabolism of the morphological form of T. cruzi in the insect gut (epimastigote stage).
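The core idea behind a constraint-based model is that a candidate flux distribution must satisfy steady-state mass balance for every internal metabolite and stay within per-reaction bounds. The sketch below checks those two constraints on a tiny invented network. It does not reproduce the published model and does not perform an optimisation such as flux balance analysis; the metabolite names, reaction names, stoichiometry and bounds are all hypothetical.

```python
def mass_balance(stoich, fluxes):
    """Net production rate of each metabolite (one row of S . v each)."""
    return {
        met: sum(coef * fluxes[rxn] for rxn, coef in row.items())
        for met, row in stoich.items()
    }


def is_feasible(stoich, fluxes, bounds, tol=1e-9):
    """True if fluxes satisfy steady state (S . v = 0) and flux bounds."""
    balanced = all(abs(x) <= tol for x in mass_balance(stoich, fluxes).values())
    bounded = all(
        bounds[rxn][0] - tol <= v <= bounds[rxn][1] + tol
        for rxn, v in fluxes.items()
    )
    return balanced and bounded


# Hypothetical toy network: glucose uptake, conversion to 2 pyruvate, export
stoich = {
    "glc_c": {"uptake": 1, "glycolysis": -1},
    "pyr_c": {"glycolysis": 2, "export": -1},
}
bounds = {"uptake": (0, 10), "glycolysis": (0, 10), "export": (0, 10)}

steady = {"uptake": 1.0, "glycolysis": 1.0, "export": 2.0}
unbalanced = {"uptake": 1.0, "glycolysis": 1.0, "export": 1.0}

print(is_feasible(stoich, steady, bounds))
print(is_feasible(stoich, unbalanced, bounds))
```

A full analysis would add an objective function and a linear programming solver on top of these constraints, and proteomic data could be used to tighten the bounds of reactions whose enzymes are not detected.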
This work demonstrates the utility of constraint-based models for integrating various sources of data (e.g., genomics, primary biochemical literature, proteomics) to generate testable hypotheses. This model represents an approach for the systematic study of T. cruzi metabolism under a wide range of conditions and perturbations, and should eventually aid in the identification of urgently needed novel chemotherapeutic targets.
The factors influencing variation in the clinical forms of Chagas disease have not been elucidated; however, it is likely that the genetics of both the host and the parasite are involved. Several studies have attempted to correlate the T. cruzi strains involved in infection with the clinical forms of the disease by using hemoculture and/or PCR-based genotyping of parasites from infected human tissues. However, both techniques have limitations that hamper the analysis of large numbers of samples. The goal of this work was to identify conserved and polymorphic linear B-cell epitopes of T. cruzi that could be used for serodiagnosis and serotyping of Chagas disease using ELISA.
By performing B-cell epitope prediction on proteins derived from pairs of alleles of the hybrid CL Brener genome, we have identified conserved and polymorphic epitopes in the two CL Brener haplotypes. The rationale underlying this strategy is that, because CL Brener is a recent hybrid between the TcII and TcIII DTUs (discrete typing units), it is likely that polymorphic epitopes in pairs of alleles could also be polymorphic in the parental genotypes. We excluded sequences that are also present in the Leishmania major, L. infantum, L. braziliensis and T. brucei genomes to minimize the chance of cross-reactivity. A peptide array containing 150 peptides was covalently linked to a cellulose membrane, and the reactivity of the peptides was tested using sera from C57BL/6 mice chronically infected with the Colombiana (TcI) and CL Brener (TcVI) clones and Y (TcII) strain.
Findings and Conclusions
A total of 36 peptides were considered reactive, and the cross-reactivity among the strains is in agreement with the evolutionary origin of the different T. cruzi DTUs. Four peptides were tested against a panel of chagasic patients using ELISA. A conserved peptide showed 95.8% sensitivity, 88.5% specificity, and 92.7% accuracy for the identification of T. cruzi in patients infected with different strains of the parasite. Therefore, this peptide, in association with other T. cruzi antigens, may improve Chagas disease serodiagnosis. Together, three polymorphic epitopes were able to discriminate between the three parasite strains used in this study and are thus potential targets for Chagas disease serotyping.
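For readers less familiar with the diagnostic metrics quoted above, the following sketch shows how sensitivity, specificity and accuracy are derived from a confusion matrix. The patient counts are hypothetical: they were chosen only so that the rounded percentages agree with the reported values, and the actual panel sizes are not stated in this text.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions.

    tp/fn: infected patients testing positive/negative
    tn/fp: uninfected controls testing negative/positive
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, NOT taken from the study
sens, spec, acc = diagnostic_metrics(tp=69, fn=3, tn=46, fp=6)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Accuracy depends on the ratio of infected to uninfected samples in the panel, which is why it cannot be recovered from sensitivity and specificity alone.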
Serological tests are preferentially used for the diagnosis of Chagas disease during the chronic phase because of the low parasitemia and high anti-T. cruzi antibody titers. However, contradictory or inconclusive results, mainly related to the characteristics of the antigens used, are often observed. Additionally, the factors influencing variation in the clinical forms of Chagas disease have not been elucidated, although it is likely that host and parasite genetics are involved. Several studies attempting to correlate the parasite strain with the clinical forms have used hemoculture and/or PCR-based genotyping. However, both techniques have limitations. Hemoculture requires the isolation of parasites from patient blood and the growth of these parasites in animals or in vitro culture, thereby possibly selecting certain subpopulations. Moreover, the level of parasitemia in the chronic phase is very low, hindering the detection of parasites. Additionally, direct genotyping of parasites from infected tissues is an invasive procedure that requires medical care and hinders studies with a large number of samples. The goal of this work was to identify conserved and polymorphic linear B-cell epitopes of T. cruzi on a genome-wide scale for use in the serodiagnosis and serotyping of Chagas disease using ELISA. Development of a serotyping method based on the detection of strain-specific antibodies may help to understand the relationship between the infecting strain and disease evolution.
Trypanosomatids of the order Kinetoplastida are major contributors to global disease and morbidity, and understanding their basic biology coupled with the development of new drug targets represents a critical need. Additionally, trypanosomes are among the more accessible divergent eukaryote experimental systems. The genome of Trypanosoma brucei contains 8,131 predicted open reading frames (ORFs), of which over half have no known homologues beyond the Kinetoplastida and a substantial number of others are poorly defined by in silico analysis. Thus, a major challenge following completion of the T. brucei genome sequence is to obtain functional data for all trypanosome ORFs. As T. brucei is more experimentally tractable than the related Trypanosoma cruzi and Leishmania spp. and shares >75% of their genes, functional analysis of T. brucei has the potential to inform a range of parasite biology. Here, we report methods for systematic mRNA ablation by RNA interference (RNAi) and for phenotypic analysis, together with online data dissemination. This represents the first systematic analysis of gene function in a parasitic organism. In total, 210 genes have been targeted in the bloodstream form parasite, representing an essentially complete phenotypic catalogue of chromosome I together with a validation set. Over 30% of the chromosome I genes generated a phenotype when targeted by RNAi; most commonly, this affected cell growth, viability, and/or cell cycle progression. RNAi against approximately 12% of ORFs was lethal, and an additional 11% had growth defects but retained short-term viability in culture. Although we found no evidence for clustering or a bias towards widely evolutionarily conserved genes within the essential ORF cohort, the putative chromosome I centromere is adjacent to a domain containing genes with no associated phenotype. 
Involvement of such a large proportion of genes in robust growth in vitro indicates that a high proportion of the expressed trypanosome genome is required for efficient propagation; many of these gene products represent potential drug targets.
CCCH type zinc finger proteins are RNA binding proteins with regulatory functions at all stages of mRNA metabolism. The best-characterized member, tristetraprolin (TTP), binds to AU-rich elements in 3' UTRs of unstable mRNAs, mediating their degradation. In kinetoplastids, CCCH type zinc finger proteins have been identified as being involved in the regulation of the life cycle and possibly the cell cycle. To date, no systematic listing of CCCH proteins in kinetoplastids is available.
We have identified the complete set of CCCH type zinc finger proteins in the available genomes of the kinetoplastid protozoa Trypanosoma brucei, Trypanosoma cruzi and Leishmania major. One fifth (20%) of all CCCH motifs fall into non-conventional classes and many had not been previously identified. One third of all CCCH proteins have more than one CCCH motif, suggesting multivalent RNA binding. One third have additional recognizable domains. The vast majority are unique to Kinetoplastida or to a subgroup within. Two exceptions are of interest: the putative orthologue of the mRNA nuclear export factor Mex67 and a 3'-5' exoribonuclease restricted to Leishmania species. CCCH motifs are absent from these proteins in other organisms and might be unique, novel features of the Kinetoplastida homologues. Of the others, several have a predicted, and in one case experimentally confirmed, connection to the ubiquitination pathways, for instance a HECT-type E3 ubiquitin ligase. The total number of kinetoplastid CCCH proteins is similar to the number in higher eukaryotes but lower than in yeast. A comparison of the genomic loci between the Trypanosomatidae homologues provides insight into the evolution of both the CCCH proteins and the CCCH motifs.
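A pattern scan of the kind that underlies such a listing can be sketched with a regular expression. The spacing used below, C-X(7-8)-C-X(5)-C-X(3)-H, is an assumption standing in for a "conventional" CCCH motif; the study's actual motif definitions, including the non-conventional classes, are not given in this text, and the test protein is invented.

```python
import re

# Assumed conventional spacing: C-X7-8-C-X5-C-X3-H.
# The lookahead allows overlapping motifs to be reported.
CCCH = re.compile(r"(?=(C.{7,8}C.{5}C.{3}H))")


def find_ccch_motifs(protein):
    """Return (0-based start, matched motif) for each CCCH-like match."""
    return [(m.start(1), m.group(1)) for m in CCCH.finditer(protein.upper())]


def has_multiple_motifs(protein):
    """True if the protein carries more than one CCCH-like motif."""
    return len(find_ccch_motifs(protein)) > 1


# Invented toy protein with a single C-X8-C-X5-C-X3-H motif
toy = "MK" + "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H" + "LL"
print(find_ccch_motifs(toy))
```

Relaxing the spacing ranges in the pattern is one way non-conventional motif classes could be explored, at the cost of more false positives.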
This study provides the first systematic listing of the Kinetoplastida CCCH proteins. The number of CCCH proteins with more than one CCCH motif is larger than previously estimated, due to the identification of non-conventional CCCH motifs. Experimental approaches are now necessary to examine the functions of the many unique CCCH proteins as well as the function of the putative Mex67 and the Leishmania 3'-5' exoribonuclease.
The mitochondrial DNA of kinetoplastid flagellates is distinctive in the eukaryotic world due to its massive size, complex form and large sequence content. Comprised of catenated maxicircles that contain rRNA and protein-coding genes and thousands of heterogeneous minicircles encoding small guide RNAs, the kinetoplast network has evolved along with an extreme form of mRNA processing in the form of uridine insertion and deletion RNA editing. Many maxicircle-encoded mRNAs cannot be translated without this post-transcriptional sequence modification.
We present the complete sequence and annotation of the Trypanosoma cruzi maxicircles for the CL Brener and Esmeraldo strains. Gene order is syntenic with Trypanosoma brucei and Leishmania tarentolae maxicircles. The non-coding components have strain-specific repetitive regions and a variable region that is unique for each strain with the exception of a conserved sequence element that may serve as an origin of replication, but shows no sequence identity with L. tarentolae or T. brucei. Alternative assemblies of the variable region demonstrate intra-strain heterogeneity of the maxicircle population. The extent of mRNA editing required for particular genes approximates that seen in T. brucei. Extensively edited genes were more divergent among the genera than non-edited and rRNA genes. Esmeraldo contains a unique 236-bp deletion that removes the 5'-ends of ND4 and CR4 and the intergenic region. Esmeraldo shows additional insertions and deletions outside of areas edited in other species in ND5, MURF1, and MURF2, while CL Brener has a distinct insertion in MURF2.
The CL Brener and Esmeraldo maxicircles represent two of three previously defined maxicircle clades and promise utility as taxonomic markers. Restoration of the disrupted reading frames might be accomplished by strain-specific RNA editing. Elements in the non-coding region may be important for replication, transcription, and anchoring of the maxicircle within the kinetoplast network.
Around the world, trypanosomatids are known as etiological agents of several highly disabling and often fatal diseases such as Chagas disease (Trypanosoma cruzi), leishmaniasis (Leishmania spp.), and African trypanosomiasis (Trypanosoma brucei). Throughout their life cycle, they must cope with diverse environmental conditions, and the mechanisms involved in these processes are crucial for their survival. In this review, we describe the role of heme in several essential metabolic pathways of these protozoans. Even though trypanosomatids lack the complete heme biosynthetic pathway, we focus our discussion on the metabolic role played by important heme proteins, such as cytochromes. Although several genes for different types of cytochromes, involved in mitochondrial respiration, polyunsaturated fatty acid metabolism, and sterol biosynthesis, are annotated by the Tritryp Genome Project, the encoded proteins have not yet been studied in depth. We draw attention to relevant aspects of these protein functions that could be considered for the rational design of trypanocidal agents.
L1Tc is the best represented autonomous LINE of the Trypanosoma cruzi genome, throughout which several functional copies may exist. In this study, we show that the first 77 bp of L1Tc (Pr77) (also present in the T. cruzi non-autonomous retrotransposon NARTc, in the Trypanosoma brucei RIME/ingi elements, and in the T. cruzi, T. brucei and Leishmania major degenerate L1Tc/ingi-related elements [DIREs]) behave as a promoter element that activates gene transcription. The transcription rate promoted by Pr77 is 10–14-fold higher than that mediated by sequences located upstream from the T. cruzi tandemly repeated genes KMP11 and the GAPDH. The Pr77 promoter-derived mRNAs initiate at nucleotide +1 of L1Tc, are unspliced and translated. L1Tc transcripts show a moderate half life and are RNA pol II dependent. The presence of an internal promoter at the 5′ end of L1Tc favors the production of full-length L1Tc RNAs and reinforces the hypothesis that this mobile element may be naturally autonomous in its transposition.
The completion of the genome sequencing projects for major pathogens Trypanosoma brucei, Trypanosoma cruzi and Leishmania major has enabled numerous studies that would have been difficult or impossible to perform otherwise. New technologies in sequencing and protein analyses promise further rapid expansion in our capabilities. The keys to successful use of these new tools are recognizing the power and limitations of studies performed thus far, grasping the unrealized potential of new and developing technologies, and creating access to a multidisciplinary set of skills that will facilitate research, particularly in the bioinformatic analysis of the reams of data that will be forthcoming. In this Discussion, we will provide an overview of kinetoplastid genomics studies with emphasis on studies advanced through genomic data, and a preview of what may come in the near future.
bioinformatics; Leishmania; Trypanosoma; trypanosome
Trypanosoma cruzi, the causal agent of Chagas disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a number of highly studied loci. The availability of a number of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale.
Using a bioinformatic strategy, we have clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments in which one or two alleles from the reference CL-Brener were included. These data cover 4 major evolutionary lineages (DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments we have identified 288,957 high-quality single nucleotide polymorphisms (SNPs) and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of the high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show significantly lower nucleotide diversity values.
This study provides the first look at the genetic diversity of T. cruzi at a genomic scale. The analysis covers an estimated ~60% of the genetic diversity present in the population, providing an essential resource for future studies on the development of new drugs and diagnostics for Chagas disease. These data are available through the TcSNP database (http://snps.tcruzi.org).
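The synonymous to non-synonymous ratio reported above depends on classifying each coding SNP by its effect on the encoded amino acid. The following is a minimal sketch of that kind of classification, not the pipeline used in the study. It assumes the standard genetic code, an in-frame coding sequence, and 0-based SNP positions; the function names and the choice to count stop-codon changes separately from non-synonymous changes are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order (assumption: nuclear genes
# use the standard translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(cds, pos, alt):
    """Classify a single-base change at 0-based position `pos` of an in-frame CDS."""
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if "*" in (ref_aa, alt_aa):
        # A stop codon is introduced or removed; kept as its own class here.
        return "stop_gained_or_lost"
    return "non-synonymous"


def syn_nonsyn_ratio(cds, snps):
    """Ratio of synonymous to non-synonymous changes for (pos, alt) pairs."""
    counts = Counter(classify_snp(cds, pos, alt) for pos, alt in snps)
    if counts["non-synonymous"] == 0:
        raise ValueError("no non-synonymous changes; ratio is undefined")
    return counts["synonymous"] / counts["non-synonymous"]


if __name__ == "__main__":
    toy_cds = "ATGGCTTGGTAA"  # hypothetical sequence: Met-Ala-Trp-stop
    toy_snps = [(5, "C"), (3, "A"), (4, "A"), (8, "A")]
    for pos, alt in toy_snps:
        print(pos, alt, classify_snp(toy_cds, pos, alt))
    print(syn_nonsyn_ratio(toy_cds, toy_snps))
```

A genome-scale analysis would also need to handle genes on the reverse strand, multi-allelic sites, and codons carrying more than one SNP, none of which this sketch attempts.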
The increased sequencing of pathogen genomes and the subsequent availability of genome-scale functional datasets are expected to guide the experimental work necessary for target-based drug discovery. However, a major bottleneck in this process has been the difficulty of capturing and integrating relevant information in an easily accessible format for identifying and prioritizing potential targets. The open-access resource TDRtargets.org facilitates drug target prioritization for major tropical disease pathogens such as the mycobacteria Mycobacterium leprae and Mycobacterium tuberculosis; the kinetoplastid protozoans Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi; the apicomplexan protozoans Plasmodium falciparum, Plasmodium vivax, and Toxoplasma gondii; and the helminths Brugia malayi and Schistosoma mansoni.
Here we present strategies to prioritize pathogen proteins based on whether their properties meet criteria considered desirable in a drug target. These criteria are based upon both sequence-derived information (e.g., molecular mass) and functional data on expression, essentiality, phenotypes, metabolic pathways, assayability, and druggability. This approach also highlights the fact that data for many relevant criteria are lacking in less-studied pathogens (e.g., helminths), and we demonstrate how this can be partially overcome by mapping data from homologous genes in well-studied organisms. We also show how individual users can easily upload external datasets and integrate them with existing data in TDRtargets.org to generate highly customized ranked lists of potential targets.
Using the datasets and the tools available in TDRtargets.org, we have generated illustrative lists of potential drug targets in seven tropical disease pathogens. While these lists are broadly consistent with the research community's current interest in certain specific proteins, and suggest novel target candidates that may merit further study, the lists can easily be modified in a user-specific manner, either by adjusting the weights for chosen criteria or by changing the criteria that are included.
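The weighted prioritization described above can be illustrated with a short sketch. This is not the TDRtargets.org implementation or its API; it only shows the general idea of summing user-chosen weights over the criteria a protein satisfies and sorting by the total. The gene identifiers, criterion names and weights below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A candidate protein with boolean criteria (all names are hypothetical)."""
    gene_id: str
    features: dict = field(default_factory=dict)


def score(target, weights):
    """Sum the weights of every criterion the target satisfies."""
    return sum(w for criterion, w in weights.items() if target.features.get(criterion))


def rank_targets(targets, weights):
    """Return targets ordered from highest to lowest weighted score."""
    return sorted(targets, key=lambda t: score(t, weights), reverse=True)


if __name__ == "__main__":
    candidates = [
        Target("gene_A", {"essential": True, "druggable": True}),
        Target("gene_B", {"essential": True}),
        Target("gene_C", {"druggable": True, "assayable": True,
                          "no_human_ortholog": True}),
    ]
    # Illustrative weights; changing them re-orders the list.
    weights = {"essential": 3.0, "druggable": 2.0,
               "assayable": 1.0, "no_human_ortholog": 1.5}
    for t in rank_targets(candidates, weights):
        print(t.gene_id, score(t, weights))
```

Adjusting a weight or dropping a criterion from the dictionary changes the ranking without touching the underlying data, which mirrors the user-specific customization the abstract describes.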
In cell-based drug development, researchers attempt to create drugs that kill a pathogen without necessarily understanding the details of how the drugs work. In contrast, target-based drug development entails the search for compounds that act on a specific intracellular target—often a protein known or suspected to be required for survival of the pathogen. The latter approach to drug development has been facilitated greatly by the sequencing of many pathogen genomes and the incorporation of genome data into user-friendly databases. The present paper shows how the database TDRtargets.org can identify proteins that might be considered good drug targets for diseases such as African sleeping sickness, Chagas disease, parasitic worm infections, tuberculosis, and malaria. These proteins may score highly in searches of the database because they are dissimilar to human proteins, are structurally similar to other “druggable” proteins, have functions that are easy to measure, and/or fulfill other criteria. Researchers can use the lists of high-scoring proteins as a basis for deciding which potential drug targets to pursue experimentally.
The lack of general class II transcription factors was a hallmark of the genomic sequences of the human parasites Trypanosoma brucei, Trypanosoma cruzi and Leishmania major. However, the recent identification of TFIIA as part of a protein complex essential for RNA polymerase II-mediated transcription of SLRNA genes, which encode the trans splicing-specific spliced leader RNA, suggests that trypanosomatids assemble a highly divergent set of these factors at the SLRNA promoter. Here we report the identification of a trypanosomatid TFIIB-like (TFIIBlike) protein which has limited overall sequence homology to eukaryotic TFIIB and archaeal TFB but harbors conserved residues within the N-terminal zinc ribbon domain, the B finger and cyclin repeat I. In accordance with the function of TFIIB, T. brucei TFIIBlike is encoded by an essential gene, localizes to the nucleus, specifically binds to the SLRNA promoter, interacts with RNA polymerase II, and is absolutely required for SLRNA transcription.
The recent availability of genomic sequences and BAC libraries for a large number of mammals provides an excellent opportunity for identifying comparatively-anchored markers that are useful for creating high-resolution radiation-hybrid (RH) and BAC-based comparative maps. To use these maps for multispecies genome comparison and evolutionary inference, robust bioinformatic tools are required for the identification of chromosomal regions shared between genomes and to localize the positions of evolutionary breakpoints that are the signatures of chromosomal rearrangements. Here we report an automated tool for the identification of homologous synteny blocks (HSBs) between genomes that tolerates errors common in RH comparative maps and can be used for automated whole-genome analysis of chromosome rearrangements that occur during evolution.
We developed an algorithm and software tool (SyntenyTracker) that can be used for automated definition of HSBs using pair-wise RH or gene-based comparative maps as input. To verify correct implementation of the underlying algorithm, SyntenyTracker was used to identify HSBs in the cattle and human genomes. Results demonstrated 96% agreement with HSBs defined manually using the same set of rules. A comparison of SyntenyTracker with the AutoGRAPH synteny tool was performed using identical datasets containing 14,380 genes with 1:1 orthology in human and mouse. Discrepancies between the results using the two tools and advantages of SyntenyTracker are reported.
SyntenyTracker was shown to be an efficient and accurate automated tool for defining HSBs using datasets that may contain minor errors resulting from limitations in map construction methodologies. The utility of SyntenyTracker will become more important for comparative genomics as the number of mapped and sequenced genomes increases.
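To make the idea of homologous synteny blocks concrete, the sketch below groups markers that are adjacent in a reference genome and map to the same chromosome in a second genome. It is a simplified illustration and not the SyntenyTracker algorithm, whose rules are not given here. The error tolerance shown (dropping a single marker whose target chromosome disagrees with both of its neighbours) and the `min_markers` threshold are assumptions made for this example, and marker order and orientation within the target genome are ignored.

```python
from itertools import groupby
from typing import NamedTuple


class Marker(NamedTuple):
    name: str
    ref_chr: str
    ref_pos: int
    tgt_chr: str
    tgt_pos: int


class Block(NamedTuple):
    ref_chr: str
    ref_start: int
    ref_end: int
    tgt_chr: str
    tgt_start: int
    tgt_end: int
    n_markers: int


def drop_isolated(markers):
    """Drop a marker when both neighbours agree with each other but not with it."""
    kept = []
    for i, m in enumerate(markers):
        prev = markers[i - 1] if i > 0 else None
        nxt = markers[i + 1] if i + 1 < len(markers) else None
        if (prev and nxt
                and prev.ref_chr == m.ref_chr == nxt.ref_chr
                and prev.tgt_chr == nxt.tgt_chr != m.tgt_chr):
            continue  # treated as a likely mapping error (assumption)
        kept.append(m)
    return kept


def synteny_blocks(markers, min_markers=2):
    """Group reference-ordered markers into blocks sharing a chromosome pair."""
    ordered = sorted(markers, key=lambda m: (m.ref_chr, m.ref_pos))
    cleaned = drop_isolated(ordered)
    blocks = []
    for (ref_chr, tgt_chr), grp in groupby(
            cleaned, key=lambda m: (m.ref_chr, m.tgt_chr)):
        grp = list(grp)
        if len(grp) < min_markers:
            continue  # too few markers to support a block
        tgt_positions = [m.tgt_pos for m in grp]
        blocks.append(Block(ref_chr, grp[0].ref_pos, grp[-1].ref_pos,
                            tgt_chr, min(tgt_positions), max(tgt_positions),
                            len(grp)))
    return blocks


if __name__ == "__main__":
    toy = [  # hypothetical markers
        Marker("m1", "1", 10, "A", 100), Marker("m2", "1", 20, "A", 200),
        Marker("m3", "1", 30, "B", 5),   Marker("m4", "1", 40, "A", 400),
        Marker("m5", "1", 50, "C", 900), Marker("m6", "1", 60, "C", 800),
    ]
    for block in synteny_blocks(toy):
        print(block)
```

In the toy data, marker m3 is discarded as an isolated inconsistency, so the three chromosome A markers form one block instead of being split in two.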
Chagas disease is caused by Trypanosoma cruzi and is endemic to North, Central and South American countries. Current therapy against this disease is only partially effective and produces adverse side effects. Studies on the metabolic pathways of T. cruzi, in particular those with no equivalent in mammalian cells, might identify targets for the development of new drugs. Ceramide is metabolized to inositolphosphoceramide (IPC) in T. cruzi and other kinetoplastid protists, whereas in mammals it is mainly incorporated into sphingomyelin. In T. cruzi, in contrast to Trypanosoma brucei and Leishmania spp., IPC functions as a lipid anchor constituent of glycoproteins and free glycosylinositolphospholipids (GIPLs). Inhibition of IPC and GIPL biosynthesis impairs differentiation of trypomastigotes into the intracellular amastigote forms. The gene encoding IPC synthase in T. cruzi has been identified and the enzyme has been expressed in a cell-free system. The enzyme involved in IPC degradation, and the remodelases responsible for incorporating ceramide into free GIPLs or into the glycosylphosphatidylinositols (GPIs) that anchor glycoproteins and for the fatty acid modifications of these molecules, remain understudied in T. cruzi. IPC metabolism and remodeling could be exploited as targets for Chagas disease chemotherapy.
glycosylphosphatidylinositol; glycosylinositolphospholipids; inositolphosphoceramide; phospholipase C; sphingolipids; Trypanosoma