Related Articles

1.  TriTrypDB: a functional genomic resource for the Trypanosomatidae 
Nucleic Acids Research  2009;38(Database issue):D457-D462.
TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. ‘User Comments’ may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.
doi:10.1093/nar/gkp851
PMCID: PMC2808979  PMID: 19843604
2.  Anatomy and evolution of telomeric and subtelomeric regions in the human protozoan parasite Trypanosoma cruzi 
BMC Genomics  2012;13:229.
Background
The subtelomeres of many protozoa are highly enriched in genes with roles in niche adaptation. T. cruzi trypomastigotes express surface proteins from Trans-Sialidase (TS) and Dispersed Gene Family-1 (DGF-1) superfamilies which are implicated in host cell invasion. Single populations of T. cruzi may express different antigenic forms of TSs. Analysis of TS genes located at the telomeres suggests that chromosome ends could have been the sites where new TS variants were generated. The aim of this study is to characterize telomeric and subtelomeric regions of T. cruzi available in TriTrypDB and connect the sequences of telomeres to T. cruzi working draft sequence.
Results
We first identified contigs carrying the telomeric repeat (TTAGGG). Of 49 contigs identified, 45 have telomeric repeats at one end, whereas in four contigs the repeats are located internally. All contigs display a conserved telomeric junction sequence adjacent to the hexamer repeats which represents a signature of T. cruzi chromosome ends. We found that 40 telomeric contigs are located on T. cruzi chromosome-sized scaffolds. In addition, we were able to map several telomeric ends to the chromosomal bands separated by pulsed-field gel electrophoresis.
The subtelomeric sequence structure varies widely, mainly as a result of large differences in the relative abundance and organization of genes encoding surface proteins (TS and DGF-1), retrotransposon hot spot genes (RHS), retrotransposon elements, RNA-helicase and N-acetyltransferase genes. While the subtelomeric regions are enriched in pseudogenes, they also contain complete gene sequences matching both known and unknown expressed genes, indicating that these regions do not consist of nonfunctional DNA but are instead functional parts of the expressed genome. The size of the subtelomeric regions varies from 5 to 182 kb; the smaller of these regions could have been generated by a recent chromosome breakage and telomere healing event.
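The telomeric-repeat screen described in these Results (finding contigs that carry TTAGGG repeats and separating terminal from internal runs) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the minimum number of tandem copies, and the size of the end window are all assumptions chosen for illustration.

```python
import re

# Telomeric hexamer named in the abstract, plus its reverse complement
# so that repeats on either strand are detected.
REPEAT = "TTAGGG"
REVCOMP = "CCCTAA"


def classify_contig(seq, min_copies=3, end_window=100):
    """Return 'terminal', 'internal', or 'none' for one contig sequence.

    min_copies and end_window are illustrative thresholds (assumptions),
    not values taken from the study.
    """
    seq = seq.upper()
    # Match runs of at least min_copies tandem hexamers on either strand.
    pattern = re.compile(
        "(?:%s){%d,}|(?:%s){%d,}" % (REPEAT, min_copies, REVCOMP, min_copies)
    )
    hits = [m.span() for m in pattern.finditer(seq)]
    if not hits:
        return "none"
    for start, end in hits:
        # A run is terminal if it lies within end_window bases of a contig end.
        if start < end_window or len(seq) - end < end_window:
            return "terminal"
    return "internal"
```

A contig whose only repeat run sits far from both ends is reported as internal, which corresponds to the four contigs the abstract describes as carrying internal repeats.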
Conclusions
The lack of synteny in the subtelomeric regions suggests that genes located in these regions are subject to recombination, which increases their variability, even among homologous chromosomes. The presence of typical subtelomeric genes can increase the chance of homologous recombination mechanisms or microhomology-mediated end joining, which may use these regions for the pairing and recombination of free ends.
doi:10.1186/1471-2164-13-229
PMCID: PMC3418195  PMID: 22681854
3.  Genome Size, Karyotype Polymorphism and Chromosomal Evolution in Trypanosoma cruzi 
PLoS ONE  2011;6(8):e23042.
Background
The Trypanosoma cruzi genome was sequenced from a hybrid strain (CL Brener). However, high allelic variation and the repetitive nature of the genome have prevented the complete linear sequence of chromosomes being determined. Determining the full complement of chromosomes and establishing syntenic groups will be important in defining the structure of T. cruzi chromosomes. A large amount of information is now available for T. cruzi and Trypanosoma brucei, providing the opportunity to compare and describe the overall patterns of chromosomal evolution in these parasites.
Methodology/Principal Findings
The genome sizes, repetitive DNA contents, and the numbers and sizes of chromosomes of nine strains of T. cruzi from four lineages (TcI, TcII, TcV and TcVI) were determined. The genome of the TcI group was statistically smaller than other lineages, with the exception of the TcI isolate Tc1161 (José-IMT). Satellite DNA content was correlated with genome size for all isolates, but this was not accompanied by simultaneous amplification of retrotransposons. Regardless of chromosomal polymorphism, large syntenic groups are conserved among T. cruzi lineages. Duplicated chromosome-sized regions were identified and could be retained as paralogous loci, increasing the dosage of several genes. By comparing T. cruzi and T. brucei chromosomes, homologous chromosomal regions in T. brucei were identified. Chromosomes Tb9 and Tb11 of T. brucei share regions of syntenic homology with three and six T. cruzi chromosomal bands, respectively.
Conclusions
Despite genome size variation and karyotype polymorphism, T. cruzi lineages exhibit conservation of chromosome structure. Several syntenic groups are conserved among all isolates analyzed in this study. The syntenic regions are larger than expected if rearrangements occur randomly, suggesting that they are conserved owing to positive selection. Mapping of the syntenic regions on T. cruzi chromosomal bands provides evidence for the occurrence of fusion and split events involving T. brucei and T. cruzi chromosomes.
doi:10.1371/journal.pone.0023042
PMCID: PMC3155523  PMID: 21857989
4.  A population study of the minicircles in Trypanosoma cruzi: predicting guide RNAs in the absence of empirical RNA editing 
BMC Genomics  2007;8:133.
Background
The structurally complex network of minicircles and maxicircles comprising the mitochondrial DNA of kinetoplastids mirrors the complexity of the RNA editing process that is required for faithful expression of encrypted maxicircle genes. Although a few of the guide RNAs that direct this editing process have been discovered on maxicircles, guide RNAs are mostly found on the minicircles. The nuclear and maxicircle genomes have been sequenced and assembled for Trypanosoma cruzi, the causative agent of Chagas disease; however, the complement of 1.4-kb minicircles, carrying four guide RNA genes per molecule in this parasite, has been less thoroughly characterised.
Results
Fifty-four CL Brener and 53 Esmeraldo strain minicircle sequence reads were extracted from T. cruzi whole genome shotgun sequencing data. With these sequences and all published T. cruzi minicircle sequences, 108 unique guide RNAs from all known T. cruzi minicircle sequences and two guide RNAs from the CL Brener maxicircle were predicted using a local alignment algorithm and mapped onto predicted or experimentally determined sequences of edited maxicircle open reading frames. For half of the sequences no statistically significant guide RNA could be assigned. Likely positions of these unidentified gRNAs in T. cruzi minicircle sequences are estimated using a simple Hidden Markov Model. With the local alignment predictions as a standard, the HMM had an ~85% chance of correctly identifying at least 20 nucleotides of guide RNA from a given minicircle sequence. Inter-minicircle recombination was documented. Variable regions contain species-specific areas of distinct nucleotide preference. Two maxicircle guide RNA genes were found.
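The abstract does not say which local alignment algorithm was used to predict guide RNAs. As a rough illustration of the idea only, the sketch below scores the best local antiparallel duplex between a candidate guide and an edited mRNA with a generic Smith-Waterman recurrence that accepts Watson-Crick and G:U wobble pairs. The function names and the scoring values (match, mismatch, gap) are assumptions, and no statistical significance test is included.

```python
# Base pairs allowed in the guide/mRNA duplex: Watson-Crick plus G:U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(guide_base, mrna_base):
    """True if the two RNA bases can pair in the duplex."""
    return (guide_base, mrna_base) in PAIRS


def best_local_duplex(guide, mrna, match=2, mismatch=-3, gap=-4):
    """Best local duplex score between guide and mrna (both written 5'->3').

    Generic Smith-Waterman sketch; the scoring parameters are illustrative
    assumptions, not those of the study.
    """
    # Reverse the guide so that position-wise comparison models antiparallel pairing.
    guide_rev = guide.upper().replace("T", "U")[::-1]
    mrna = mrna.upper().replace("T", "U")
    prev = [0] * (len(mrna) + 1)
    best = 0
    for g in guide_rev:
        cur = [0]
        for j, m in enumerate(mrna, 1):
            diag = prev[j - 1] + (match if can_pair(g, m) else mismatch)
            # Local alignment: scores never drop below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best
```

In practice a score threshold calibrated against shuffled sequences would be needed before calling a duplex a guide RNA; that step is omitted here.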
Conclusion
The identification of new minicircle sequences and the further characterization of all published minicircles are presented, including the first observation of recombination between minicircles. Extrapolation suggests a level of 4% recombinants in the population, supporting a relatively high recombination rate that may serve to minimize the persistence of gRNA pseudogenes. Characteristic nucleotide preferences observed within variable regions provide potential clues regarding the transcription and maturation of T. cruzi guide RNAs. Based on these preferences, a method of predicting T. cruzi guide RNAs using only primary minicircle sequence data was created.
doi:10.1186/1471-2164-8-133
PMCID: PMC1892023  PMID: 17524149
5.  TcTASV: A Novel Protein Family in Trypanosoma cruzi Identified from a Subtractive Trypomastigote cDNA Library 
Background
The identification and characterization of antigens expressed in Trypanosoma cruzi stages that parasitize mammals are essential steps for the development of new vaccines and diagnostics. Genes that are preferentially expressed in trypomastigotes may be involved in key processes that define the biology of trypomastigotes, like cell invasion and immune system evasion.
Methodology/Principal Findings
With the initial aim of identifying trypomastigote-specific expressed tags, we constructed and sequenced an epimastigote-subtracted trypomastigote cDNA library (library TcT-E). More than 45% of the sequenced clones of the library could not be mapped to previously annotated mRNAs or proteins. We validated the presence of these transcripts by reverse northern blot and northern blot experiments, therefore providing novel information about the mRNA expression of these genes in trypomastigotes. A 280-bp consensus element (TcT-E element, TcT-Eelem) located at the 3′ untranslated region (3′ UTR) of many different open reading frames (ORFs) was identified after clustering the TcT-E dataset. Using an RT-PCR approach, we were able to amplify different mature mRNAs containing the same TcT-Eelem in the 3′ UTR. The proteins encoded by these ORFs are members of a novel surface protein family in T. cruzi, which we named TcTASV (for T. cruzi Trypomastigote, Alanine, Serine and Valine rich proteins). All members of the TcTASV family have conserved coding amino- and carboxy-termini, and a central variable core that allows partitioning of TcTASV proteins into three subfamilies. Analysis of the T. cruzi genome database resulted in the identification of 38 genes/ORFs for the whole TcTASV family in the reference CL-Brener strain (lineage II). Because this protein family was not found in other trypanosomatids, we also looked for the presence of TcTASV genes in other evolutionary lineages of T. cruzi, sequencing 48 and 28 TcTASV members from the RA (lineage II) and Dm28 (lineage I) T. cruzi strains, respectively. Detailed phylogenetic analyses of TcTASV gene products show that this gene family is different from previously characterized mucin (TcMUCII), mucin-like, and MASP protein families.
Conclusions/Significance
We identified TcTASV, a new gene family of surface proteins in T. cruzi.
Author Summary
Chagas' disease, caused by the kinetoplastid protozoan parasite Trypanosoma cruzi, is endemic in Latin America. At present there are neither vaccines for prevention nor totally effective drugs for the treatment of the disease. T. cruzi has a complex life cycle alternating between a reduviid insect (the vector) and a mammalian host, where different parasite stages are found. Differentially expressed genes are the hallmark of the specialized biology of each life cycle stage. The aim of this work was to identify genes expressed in the trypomastigote stage (a blood-circulating stage that invades new cells and spreads the infection in different organs of the mammalian host) that could be used to develop new vaccines or diagnostics. An initial screening of trypomastigote transcripts was performed by sequencing of an epimastigote-subtracted trypomastigote cDNA library. Besides identifying a large proportion of differentially expressed mRNAs, we discovered a novel protein family, which we denominated TcTASV.
doi:10.1371/journal.pntd.0000841
PMCID: PMC2950142  PMID: 20957201
6.  Kinetoplastid PPEF phosphatases: Dual acylated proteins expressed in the endomembrane system of Leishmania 
Bioinformatic analyses have been used to identify potential downstream targets of the essential enzyme N-myristoyl transferase in the TriTryp species, Leishmania major, Trypanosoma brucei and Trypanosoma cruzi. These database searches predict ∼60 putative N-myristoylated proteins with high confidence, including both previously characterised and novel molecules. One of the latter is an N-myristoylated protein phosphatase which has high sequence similarity to the Protein Phosphatase with EF-Hand (PPEF) proteins identified in sensory cells of higher eukaryotes. In L. major and T. brucei, the PPEF-like phosphatases are encoded by single-copy genes and are constitutively expressed in all parasite life cycle stages. The N-terminus of LmPPEF is a substrate for N-myristoyl transferase and is also palmitoylated in vivo. The wild type protein has been localised to the endocytic system by immunofluorescence. The catalytic and fused C-terminal domains of the kinetoplastid and other eukaryotic PPEFs share high sequence similarity, but unlike their higher eukaryotic relatives, the C-terminal parasite EF-hand domains are degenerate and do not bind calcium.
doi:10.1016/j.molbiopara.2006.11.008
PMCID: PMC1885993  PMID: 17169445
PPEF, Protein Phosphatase with EF-Hands; NMT, N-myristoyl transferase; BSF, bloodstream form; PCF, procyclic form; N-Myristoylation; Palmitoylation; Protein phosphatases; Bioinformatics
7.  Ancestral Genomes, Sex, and the Population Structure of Trypanosoma cruzi 
PLoS Pathogens  2006;2(3):e24.
Acquisition of detailed knowledge of the structure and evolution of Trypanosoma cruzi populations is essential for control of Chagas disease. We profiled 75 strains of the parasite with five nuclear microsatellite loci, 24Sα RNA genes, and sequence polymorphisms in the mitochondrial cytochrome oxidase subunit II gene. We also used sequences available in GenBank for the mitochondrial genes cytochrome B and NADH dehydrogenase subunit 1. A multidimensional scaling (MDS) plot based on microsatellite data divided the parasites into four clusters corresponding to T. cruzi I (MDS-cluster A), T. cruzi II (MDS-cluster C), a third group of T. cruzi strains (MDS-cluster B), and hybrid strains (MDS-cluster BH). The first two clusters matched mitochondrial clades A and C, respectively, while the other two belonged to mitochondrial clade B. The 24Sα rDNA and microsatellite profiling data were combined into multilocus genotypes that were analyzed by the haplotype reconstruction program PHASE. We identified 141 haplotypes that were clearly distributed into three haplogroups (X, Y, and Z). All strains belonging to T. cruzi I (MDS-cluster A) were Z/Z, the T. cruzi II strains (MDS-cluster C) were Y/Y, and those belonging to MDS-cluster B (unclassified T. cruzi) had X/X haplogroup genotypes. The strains grouped in the MDS-cluster BH were X/Y, confirming their hybrid character. Based on these results we propose the following minimal scenario for T. cruzi evolution. In the distant past there were at least three ancestral lineages that we may call, respectively, T. cruzi I, T. cruzi II, and T. cruzi III. At least two hybridization events involving T. cruzi II and T. cruzi III produced evolutionarily viable progeny. In both events, the mitochondrial recipient (as identified by the mitochondrial clade of the hybrid strains) was T. cruzi II and the mitochondrial donor was T. cruzi III.
Synopsis
The protozoan parasite Trypanosoma cruzi causes Chagas disease, a malady that afflicts almost 20 million people in South America and Central America. Although the genome sequencing of T. cruzi has been recently completed, little is known about its population structure and evolution. Since 1999, two major evolutionary lineages presenting distinct epidemiological characteristics have been recognized in the parasite: T. cruzi I and T. cruzi II, the latter being much more associated with severe chronic cases of the disease. We describe new and important aspects of the population structure of the parasite, especially the characterization of a third ancestral lineage that we propose to call T. cruzi III. Through careful dissection of the genetic constitution of blocks of genes that are stably transmitted from generation to generation of the parasite, we deduced at least two occurrences of the formation of hybrid strains from the parental lineages T. cruzi II and T. cruzi III, including the strain CLBrener, whose genome was sequenced. We did not find any hybrids originating from T. cruzi I. A fascinating finding was that both hybrids studied had the same mitochondrial DNA type as the T. cruzi III ancestral lineage, which was quite different from T. cruzi II.
doi:10.1371/journal.ppat.0020024
PMCID: PMC1434789  PMID: 16609729
8.  Gene organization and sequence analyses of transfer RNA genes in Trypanosomatid parasites 
BMC Genomics  2009;10:232.
Background
The protozoan pathogens Leishmania major, Trypanosoma brucei and Trypanosoma cruzi (the Tritryps) are parasites that produce devastating human diseases. These organisms show very unusual mechanisms of gene expression, such as polycistronic transcription. We are interested in the study of tRNA genes, which are transcribed by RNA polymerase III (Pol III). To analyze the sequences and genomic organization of tRNA genes and other Pol III-transcribed genes, we have performed an in silico analysis of the Tritryps genome sequences.
Results
Our analysis indicated the presence of 83, 66 and 120 genes in L. major, T. brucei and T. cruzi, respectively. These numbers include several previously unannotated selenocysteine (Sec) tRNA genes. Most tRNA genes are organized into clusters of 2 to 10 genes that may contain other Pol III-transcribed genes. The distribution of genes in the L. major genome does not seem to be totally random, as in most organisms. While the majority of the tRNA clusters do not show synteny (conservation of gene order) between the Tritryps, a cluster of 13 Pol III genes that is highly syntenic was identified. We have determined consensus sequences for the putative promoter regions (Boxes A and B) of the Tritryps tRNA genes, and specific changes were found in tRNA-Sec genes. Analysis of transcription termination signals of the tRNAs (clusters of Ts) showed differences between T. cruzi and the other two species. We have also identified several tRNA isodecoder genes (having the same anticodon, but different sequences elsewhere in the tRNA body) in the Tritryps.
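The definition of isodecoder genes given in these Results (same anticodon, different sequence elsewhere in the tRNA body) maps onto a simple grouping step. The sketch below assumes each tRNA gene is already available as a (gene_id, anticodon, sequence) tuple; that input layout and the function name are hypothetical, and the gene prediction step itself is not shown.

```python
from collections import defaultdict


def find_isodecoders(trnas):
    """Group tRNA genes by anticodon and keep anticodons with >1 distinct sequence.

    trnas: iterable of (gene_id, anticodon, sequence) tuples (hypothetical layout).
    Returns {anticodon: sorted list of distinct gene sequences}.
    """
    by_anticodon = defaultdict(set)
    for _gene_id, anticodon, seq in trnas:
        by_anticodon[anticodon.upper()].add(seq.upper())
    # Identical copies collapse in the set, so only true isodecoders remain.
    return {ac: sorted(seqs) for ac, seqs in by_anticodon.items() if len(seqs) > 1}
```

Exact duplicate copies of a gene are not counted as isodecoders, because only distinct sequences sharing an anticodon are retained.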
Conclusion
A low number of tRNA genes is present in Tritryps. The overall weak synteny that they show indicates a reduced importance of genome location of Pol III genes compared to protein-coding genes. The fact that some of the differences between isodecoder genes occur in the internal promoter elements suggests that differential control of the expression of some isoacceptor tRNA genes in Tritryps is possible. The special characteristics found in Boxes A and B from tRNA-Sec genes from Tritryps indicate that the mechanisms that regulate their transcription might be different from those of other tRNA genes.
doi:10.1186/1471-2164-10-232
PMCID: PMC2695483  PMID: 19450263
9.  Widespread, focal copy number variations (CNV) and whole chromosome aneuploidies in Trypanosoma cruzi strains revealed by array comparative genomic hybridization 
BMC Genomics  2011;12:139.
Background
Trypanosoma cruzi is a protozoan parasite and the etiologic agent of Chagas disease, an important public health problem in Latin America. T. cruzi is diploid, almost exclusively asexual, and displays an extraordinarily diverse population structure both genetically and phenotypically. Yet, to date the genotypic diversity of T. cruzi and its relationship, if any, to biological diversity have not been studied at the whole genome level.
Results
In this study, we used whole genome oligonucleotide tiling arrays to compare gene content in biologically disparate T. cruzi strains by comparative genomic hybridization (CGH). We observed that T. cruzi strains display widespread and focal copy number variations (CNV) and a substantially greater level of diversity than can be adequately defined by the current genetic typing methods. As expected, CNV were particularly frequent in gene family-rich regions containing mucins and trans-sialidases but were also evident in core genes. Gene groups that showed little variation in copy numbers among the strains tested included those encoding protein kinases and ribosomal proteins, suggesting these loci were less permissive to CNV. Moreover, frequent variation in chromosome copy numbers was observed, and chromosome-specific CNV signatures were shared by genetically divergent T. cruzi strains.
Conclusions
The large number of CNV, over 4,000, reported here upholds at a whole genome level the long-held paradigm of extraordinary genome plasticity among T. cruzi strains. Moreover, the fact that these heritable markers do not parse T. cruzi strains along the same lines as traditional typing methods is strongly suggestive of genetic exchange playing a major role in T. cruzi population structure and biology.
doi:10.1186/1471-2164-12-139
PMCID: PMC3060142  PMID: 21385342
10.  Database of Trypanosoma cruzi repeated genes: 20 000 additional gene variants 
BMC Genomics  2007;8:391.
Background
Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes, and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, due to collapse of nearly identical repeats. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred.
Results
We have developed a pipeline to estimate the genomic repeat content, where shotgun reads are aligned to the genomic sequence and the gene copy number is estimated using the average shotgun coverage. This method was applied to the genome of T. cruzi and copy numbers of all protein coding sequences and pseudogenes were estimated. The 22 640 results were stored in a database available online. 18% of all protein coding sequences and pseudogenes were estimated to exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicates a total gene copy number, including allelic gene variants, of over 40 000.
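The coverage-based estimate described in these Results reduces to dividing each gene's average read coverage by the coverage expected for a single-copy sequence. The sketch below illustrates only that arithmetic. How the authors derived their baseline is not stated in the abstract; falling back to the median gene coverage is an assumption, and a questionable one for a genome that is more than 50% repeats, so an explicit baseline is preferable.

```python
from statistics import median
from typing import Dict, Optional


def estimate_copy_numbers(
    gene_coverage: Dict[str, float], baseline: Optional[float] = None
) -> Dict[str, float]:
    """Estimate copies per gene as average coverage / single-copy coverage.

    gene_coverage maps a gene ID to its average shotgun read coverage.
    baseline is the coverage expected for a single-copy sequence; if omitted,
    the median gene coverage is used (an assumption, not the study's method).
    """
    if baseline is None:
        baseline = median(gene_coverage.values())
    if baseline <= 0:
        raise ValueError("baseline coverage must be positive")
    return {gene: cov / baseline for gene, cov in gene_coverage.items()}
```

With hypothetical coverages of 10, 10 and 140, the third gene would be estimated at 14 copies, the cut-off the abstract uses when reporting highly repeated genes.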
Conclusion
Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to closely examine and resolve these regions themselves using all available shotgun data, instead of having to rely on annotated consensus sequences that often are erroneous and possibly misleading. Five repetitive genes were studied in more detail, in order to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.
doi:10.1186/1471-2164-8-391
PMCID: PMC2204015  PMID: 17963481
11.  Trypanosoma cruzi: Molecular characterization of an RNA binding protein differentially expressed in the parasite life cycle 
Experimental parasitology  2007;117(1):99-105.
Molecular studies have shown several peculiarities in the regulatory mechanisms of gene expression in trypanosomatids. Protein coding genes are organized in long polycistronic units that seem to be constitutively transcribed. Therefore, post-transcriptional regulation of gene expression is considered to be the main point for control of transcript abundance and functionality. Here we describe the characterization of a 17 kDa RNA-binding protein from Trypanosoma cruzi (TcRBP19) containing an RNA recognition motif (RRM). This protein is coded by a single copy gene located in a high molecular weight chromosome of T. cruzi. Orthologous genes are present in the TriTryp genomes. TcRBP19 shows target selectivity since, among the different homoribopolymers, it preferentially binds polyC. TcRBP19 is a low expression protein only barely detected at the amastigote stage, localizing in a diffuse pattern in the cytoplasm.
doi:10.1016/j.exppara.2007.03.010
PMCID: PMC2020836  PMID: 17475252
Kinetoplastida; Trypanosoma cruzi; RNA binding proteins; RRM protein; TcRBP19
12.  A genomic scale map of genetic diversity in Trypanosoma cruzi 
BMC Genomics  2012;13:736.
Background
Trypanosoma cruzi, the causal agent of Chagas Disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a number of highly studied loci. The availability of a number of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale.
Results
Using a bioinformatic strategy, we have clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments in which one or two alleles from the reference CL-Brener were included. These data cover 4 major evolutionary lineages (DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments we have identified 288,957 high quality single nucleotide polymorphisms and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show statistically significant lower nucleotide diversity values.
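The synonymous to non-synonymous ratio and the stop-codon changes reported in these Results both rest on classifying each coding SNP by its effect on the codon. The sketch below shows one way to do that classification, assuming the standard genetic code and that the reference and alternate codons are already known. The function names are hypothetical, and counting stop-codon changes as non-synonymous in the ratio is a choice made here, not something the abstract specifies.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(ref_codon, alt_codon):
    """Classify a codon change as synonymous, non-synonymous, stop_gained or stop_lost."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "non-synonymous"


def syn_nonsyn_ratio(codon_pairs):
    """Ratio of synonymous to all other changes (stop changes counted as non-synonymous)."""
    labels = [classify_snp(ref, alt) for ref, alt in codon_pairs]
    syn = labels.count("synonymous")
    nonsyn = len(labels) - syn
    return syn / nonsyn if nonsyn else float("inf")
```

A ratio below 1, such as the 0.77 reported, means that non-synonymous changes outnumber synonymous ones in the SNP set.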
Conclusions
This study provides the first look at the genetic diversity of T. cruzi at a genomic scale. The analysis covers an estimated ~60% of the genetic diversity present in the population, providing an essential resource for future studies on the development of new drugs and diagnostics for Chagas Disease. These data are available through the TcSNP database (http://snps.tcruzi.org).
doi:10.1186/1471-2164-13-736
PMCID: PMC3545726  PMID: 23270511
13.  A Genome-Wide Analysis of Genetic Diversity in Trypanosoma cruzi Intergenic Regions 
Background
Trypanosoma cruzi is the causal agent of Chagas Disease. Recently, the genomes of representative strains from two major evolutionary lineages were sequenced, allowing the construction of a detailed genetic diversity map for this important parasite. However, this map is focused on coding regions of the genome, leaving a vast space of regulatory regions uncharacterized in terms of their evolutionary conservation and/or divergence.
Methodology
Using data from the hybrid CL Brener and Sylvio X10 genomes (from the TcVI and TcI Discrete Typing Units, respectively), we identified intergenic regions that share a common evolutionary ancestry, and are present in both CL Brener haplotypes (TcII-like and TcIII-like) and in the TcI genome; as well as intergenic regions that were conserved in only two of the three genomes/haplotypes analyzed. The genetic diversity in these regions was characterized in terms of the accumulation of indels and nucleotide changes.
Principal Findings
Based on this analysis we have identified i) a core of highly conserved intergenic regions, which remained essentially unchanged in independently evolving lineages; ii) intergenic regions that show high diversity in spite of still retaining their corresponding upstream and downstream coding sequences; iii) a number of defined sequence motifs that are shared by a number of unrelated intergenic regions. A fraction of indels explains the diversification of some intergenic regions by the expansion/contraction of microsatellite-like repeats.
Author Summary
Chagas disease is caused by the protozoan parasite Trypanosoma cruzi, and poses a serious public health problem in the Americas, with approximately 8 million people infected and 200,000 new cases reported annually. The disease has different clinical manifestations. The fact that infections by the same species cause different clinical outcomes is believed to be determined, at least in part, by the genetic background of the parasite (infection by different strains). Previous characterizations of the genetic diversity in Trypanosoma cruzi were carried out on the protein-coding portions of the genome. However, the genetic diversity of non-coding intergenic regions remained unexplored. These regions are particularly important in trypanosomes because they contain essential regulatory sequences that drive the process of mRNA maturation and that ultimately govern the expression of genes. In this study, we analyzed the genetic diversity present in non-coding regions of the genome, and provide a broad picture of the selective forces acting on this subset of the genome. Based on this analysis we identified a highly conserved core of intergenic regions that were maintained essentially unchanged over large evolutionary periods of time, as well as a highly divergent set of intergenic regions.
doi:10.1371/journal.pntd.0002839
PMCID: PMC4006747  PMID: 24784238
14.  Genomic variation of Trypanosoma cruzi: involvement of multicopy genes. 
Infection and Immunity  1990;58(10):3217-3224.
By using improved pulsed field gel conditions, the karyotypes of several strains of the protozoan parasite Trypanosoma cruzi were analyzed and compared with those of Leishmania major and two other members of the genus Trypanosoma. There was no difference in chromosome migration patterns between different life cycle stages of the T. cruzi strains analyzed. However, the sizes and numbers of chromosomal bands varied considerably among T. cruzi strains. This karyotype variation among T. cruzi strains was analyzed further at the chromosomal level by using multicopy genes as probes in Southern hybridizations. The chromosomal location of the genes encoding alpha- and beta-tubulin, ubiquitin, rRNA, spliced leader RNA, and an 85-kilodalton protein remained stable during developmental conversion of the parasite. The sizes and numbers of chromosomes containing these sequences varied among the different strains analyzed, implying multiple rearrangements of these genes during evolution of the parasites. During continuous in vitro cultivation of T. cruzi Y, the chromosomal location of the spliced leader gene shifted spontaneously. The spliced leader gene encodes a 35-nucleotide RNA that is spliced in trans from a 105-nucleotide donor RNA onto all mRNAs in T. cruzi. The spliced leader sequences changed in their physical location in both the cloned and uncloned Y strains. Associated with the complex changes was an increase in the infectivity of the rearranged variant for tissue culture cells. Our results indicate that the spliced leader gene clusters in T. cruzi undergo high-frequency genomic rearrangements.
PMCID: PMC313642  PMID: 2169461
15.  Identification of Strain-Specific B-cell Epitopes in Trypanosoma cruzi Using Genome-Scale Epitope Prediction and High-Throughput Immunoscreening with Peptide Arrays 
Background
The factors influencing variation in the clinical forms of Chagas disease have not been elucidated; however, it is likely that the genetics of both the host and the parasite are involved. Several studies have attempted to correlate the T. cruzi strains involved in infection with the clinical forms of the disease by using hemoculture and/or PCR-based genotyping of parasites from infected human tissues. However, both techniques have limitations that hamper the analysis of large numbers of samples. The goal of this work was to identify conserved and polymorphic linear B-cell epitopes of T. cruzi that could be used for serodiagnosis and serotyping of Chagas disease using ELISA.
Methodology
By performing B-cell epitope prediction on proteins derived from pairs of alleles of the hybrid CL Brener genome, we have identified conserved and polymorphic epitopes in the two CL Brener haplotypes. The rationale underlying this strategy is that, because CL Brener is a recent hybrid between the TcII and TcIII DTUs (discrete typing units), it is likely that polymorphic epitopes in pairs of alleles could also be polymorphic in the parental genotypes. We excluded sequences that are also present in the Leishmania major, L. infantum, L. braziliensis and T. brucei genomes to minimize the chance of cross-reactivity. A peptide array containing 150 peptides was covalently linked to a cellulose membrane, and the reactivity of the peptides was tested using sera from C57BL/6 mice chronically infected with the Colombiana (TcI) and CL Brener (TcVI) clones and Y (TcII) strain.
Findings and Conclusions
A total of 36 peptides were considered reactive, and the cross-reactivity among the strains is in agreement with the evolutionary origin of the different T. cruzi DTUs. Four peptides were tested against a panel of chagasic patients using ELISA. A conserved peptide showed 95.8% sensitivity, 88.5% specificity, and 92.7% accuracy for the identification of T. cruzi in patients infected with different strains of the parasite. Therefore, this peptide, in association with other T. cruzi antigens, may improve Chagas disease serodiagnosis. Together, three polymorphic epitopes were able to discriminate between the three parasite strains used in this study and are thus potential targets for Chagas disease serotyping.
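The sensitivity, specificity, and accuracy quoted above follow the standard confusion-matrix definitions. The sketch below shows how the three figures relate to one another. The abstract does not report the size of the patient panel, so the counts used here are hypothetical: they were chosen only because they happen to reproduce the reported percentages, and they should not be read as the study's data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) as fractions.

    tp/fn: infected samples called positive/negative.
    tn/fp: uninfected samples called negative/positive.
    """
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # all correct calls
    return sensitivity, specificity, accuracy


# Hypothetical counts (NOT from the paper), picked to match the
# percentages reported in the abstract.
sens, spec, acc = diagnostic_metrics(tp=69, fn=3, tn=46, fp=6)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

Accuracy is a prevalence-weighted mix of the other two, which is why it sits between them here.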
Author Summary
Serological tests are preferentially used for the diagnosis of Chagas disease during the chronic phase because of the low parasitemia and high anti-T. cruzi antibody titers. However, contradictory or inconclusive results, mainly related to the characteristics of the antigens used, are often observed. Additionally, the factors influencing variation in the clinical forms of Chagas disease have not been elucidated, although it is likely that host and parasite genetics are involved. Several studies attempting to correlate the parasite strain with the clinical forms have used hemoculture and/or PCR-based genotyping. However, both techniques have limitations. Hemoculture requires the isolation of parasites from patient blood and the growth of these parasites in animals or in vitro culture, thereby possibly selecting certain subpopulations. Moreover, the level of parasitemia in the chronic phase is very low, hindering the detection of parasites. Additionally, direct genotyping of parasites from infected tissues is an invasive procedure that requires medical care and hinders studies with a large number of samples. The goal of this work was to identify conserved and polymorphic linear B-cell epitopes of T. cruzi on a genome-wide scale for use in the serodiagnosis and serotyping of Chagas disease using ELISA. Development of a serotyping method based on the detection of strain-specific antibodies may help to understand the relationship between the infecting strain and disease evolution.
doi:10.1371/journal.pntd.0002524
PMCID: PMC3814679  PMID: 24205430
16.  Predicting the Proteins of Angomonas deanei, Strigomonas culicis and Their Respective Endosymbionts Reveals New Aspects of the Trypanosomatidae Family 
PLoS ONE  2013;8(4):e60209.
Endosymbiont-bearing trypanosomatids have been considered excellent models for the study of cell evolution because the host protozoan co-evolves with an intracellular bacterium in a mutualistic relationship. Such protozoa inhabit a single invertebrate host during their entire life cycle and exhibit special characteristics that group them in a particular phylogenetic cluster of the Trypanosomatidae family, thus classified as monoxenics. In an effort to better understand such symbiotic association, we used DNA pyrosequencing and a reference-guided assembly to generate reads that predicted 16,960 and 12,162 open reading frames (ORFs) in two symbiont-bearing trypanosomatids, Angomonas deanei (previously named as Crithidia deanei) and Strigomonas culicis (first known as Blastocrithidia culicis), respectively. Identification of each ORF was based primarily on TriTrypDB using tblastn, and each ORF was confirmed by employing getorf from EMBOSS and Newbler 2.6 when necessary. The monoxenic organisms revealed conserved housekeeping functions when compared to other trypanosomatids, especially compared with Leishmania major. However, major differences were found in ORFs corresponding to the cytoskeleton, the kinetoplast, and the paraflagellar structure. The monoxenic organisms also contain a large number of genes for cytosolic calpain-like and surface gp63 metalloproteases and a reduced number of compartmentalized cysteine proteases in comparison to other TriTryp organisms, reflecting adaptations to the presence of the symbiont. The assembled bacterial endosymbiont sequences exhibit a high A+T content with a total of 787 and 769 ORFs for the Angomonas deanei and Strigomonas culicis endosymbionts, respectively, and indicate that these organisms hold a common ancestor related to the Alcaligenaceae family. Importantly, both symbionts contain enzymes that complement essential host cell biosynthetic pathways, such as those for amino acid, lipid and purine/pyrimidine metabolism. 
These findings increase our understanding of the intricate symbiotic relationship between the bacterium and the trypanosomatid host and provide clues to better understand eukaryotic cell evolution.
doi:10.1371/journal.pone.0060209
PMCID: PMC3616161  PMID: 23560078
17.  Trypanosoma cruzi IIc: Phylogenetic and Phylogeographic Insights from Sequence and Microsatellite Analysis and Potential Impact on Emergent Chagas Disease 
Trypanosoma cruzi, the etiological agent of Chagas disease, is highly genetically diverse. Numerous lines of evidence point to the existence of six stable genetic lineages or DTUs: TcI, TcIIa, TcIIb, TcIIc, TcIId, and TcIIe. Molecular dating suggests that T. cruzi is likely to have been an endemic infection of neotropical mammalian fauna for many millions of years. Here we have applied a panel of 49 polymorphic microsatellite markers developed from the online T. cruzi genome to document genetic diversity among 53 isolates belonging to TcIIc, a lineage so far recorded almost exclusively in silvatic transmission cycles but increasingly a potential source of human infection. These data are complemented by parallel analysis of sequence variation in a fragment of the glucose-6-phosphate isomerase gene. New isolates confirm that TcIIc is associated with terrestrial transmission cycles and armadillo reservoir hosts, and demonstrate that TcIIc is far more widespread than previously thought, with a distribution at least from Western Venezuela to the Argentine Chaco. We show that TcIIc is truly a discrete T. cruzi lineage, that it could have an ancient origin and that diversity occurs within the terrestrial niche independently of the host species. We also show that spatial structure among TcIIc isolates from its principal host, the armadillo Dasypus novemcinctus, is greater than that among TcI from Didelphis spp. opossums and link this observation to differences in ecology of their respective niches. Homozygosity in TcIIc populations and some linkage indices indicate the possibility of recombination but cannot yet be effectively discriminated from a high genome-wide frequency of gene conversion. Finally, we suggest that the derived TcIIc population genetic data have a vital role in determining the origin of the epidemiologically important hybrid lineages TcIId and TcIIe.
Author Summary
Trypanosoma cruzi, the etiological agent of Chagas disease, infects over 10 million people in Latin America. Six major genetic lineages of the parasite have been identified with differential geographic distributions, ecological associations and epidemiological importance. With the advent of the T. cruzi genome sequence, it is possible to examine the micro-epidemiology of T. cruzi using high resolution genetic markers that assess diversity within these major types. Here we examine the genetic diversity of TcIIc, a poorly understood T. cruzi genetic lineage found predominantly among wild cycles of parasite transmission infecting terrestrial mammals and triatomine vectors, but also a potentially important emergent human disease agent. Amongst a number of findings, we show that TcIIc genetic diversity is comparable to other ancient T. cruzi lineages, highly spatially structured, and that a stringent co-evolutionary relationship with its principal reservoir host can be ruled out. Additionally, TcIIc is one of the two parents of hybrid lineages TcIId and TcIIe, which cause most of the Chagas disease that occurs in the Southern Cone of South America. The system we have developed will help to clarify the ecological circumstances around the emergence of these epidemiologically important hybrids, and perhaps help predict similar events in the future.
doi:10.1371/journal.pntd.0000510
PMCID: PMC2727949  PMID: 19721699
18.  The Trypanosoma cruzi Sylvio X10 strain maxicircle sequence: the third musketeer 
BMC Genomics  2011;12:58.
Background
Chagas disease has a diverse pathology caused by the parasite Trypanosoma cruzi, and is indigenous to Central and South America. A pronounced feature of the trypanosomes is the kinetoplast, which is comprised of catenated maxicircles and minicircles that provide the transcripts involved in uridine insertion/deletion RNA editing. T. cruzi exchange genetic material through a hybridization event. Extant strains are grouped into six discrete typing units by nuclear markers, and three clades, A, B, and C, based on maxicircle gene analysis. Clades A and B are the more closely related. Representative clade B and C maxicircles are known in their entirety, and portions of A, B, and C clades from multiple strains show intra-strain heterogeneity with the potential for maxicircle taxonomic markers that may correlate with clinical presentation.
Results
To perform a genome-wide analysis of the three maxicircle clades, the coding region of clade A representative strain Sylvio X10 (a.k.a. Silvio X10) was sequenced by PCR amplification of specific fragments followed by assembly and comparison with the known CL Brener and Esmeraldo maxicircle sequences. The clade A rRNA and protein coding region maintained synteny with clades B and C. Amino acid analysis of non-edited and 5'-edited genes for Sylvio X10 showed the anticipated gene sequences, with notable frameshifts in the non-edited regions of Cyb and ND4. Comparisons of genes that undergo extensive uridine insertion and deletion display a high number of insertion/deletion mutations that are likely permissible due to the post-transcriptional activity of RNA editing.
Conclusion
Phylogenetic analysis of the entire maxicircle coding region supports the closer evolutionary relationship of clade B to A, consistent with uniparental mitochondrial inheritance from a discrete typing unit TcI parental strain and studies on smaller fragments of the mitochondrial genome. Gene variance that can be corrected by RNA editing hints at an unusual depth for maxicircle taxonomic markers, which will aid in the ability to distinguish strains, their corresponding symptoms, and further our understanding of the T. cruzi population structure. The prevalence of apparently compromised coding regions outside of normally edited regions hints at undescribed but active mechanisms of genetic exchange.
doi:10.1186/1471-2164-12-58
PMCID: PMC3040149  PMID: 21261994
19.  A Simple Strain Typing Assay for Trypanosoma cruzi: Discrimination of Major Evolutionary Lineages from a Single Amplification Product 
Background
Trypanosoma cruzi is the causative agent of Chagas' Disease. The parasite has a complex population structure, with six major evolutionary lineages, some of which have apparently resulted from ancestral hybridization events. Because there are important biological differences between these lineages, strain typing methods are essential to study the T. cruzi species. Currently, there are a number of typing methods available for T. cruzi, each with its own advantages and disadvantages. However, most of these methods are based on the amplification of a variable number of loci.
Methodology/Principal Findings
We present a simple typing assay for T. cruzi, based on the amplification of a single polymorphic locus: the TcSC5D gene. When analyzing sequences from this gene (a putative lathosterol/episterol oxidase) we observed a number of interesting polymorphic sites, including 1 tetra-allelic, and a number of informative tri- and bi-allelic SNPs. Furthermore, some of these SNPs were located within the recognition sequences of two commercially available restriction enzymes. A double digestion with these enzymes generates a unique restriction pattern that allows a simple classification of strains in six major groups, corresponding to DTUs TcI–TcIV, the recently proposed Tcbat lineage, and TcV/TcVI (as a group). Direct sequencing of the amplicon allows the classification of strains into seven groups, including the six currently recognized evolutionary lineages, by analyzing only a few discriminant polymorphic sites.
Conclusions/Significance
Based on these findings we propose a simple typing assay for T. cruzi that requires a single PCR amplification followed either by restriction fragment length polymorphism analysis, or direct sequencing. In the panel of strains tested, the sequencing-based method displays equivalent inter-lineage resolution to recent multilocus sequence typing assays. Due to their simplicity and low cost, the proposed assays represent a good alternative to rapidly screen strain collections, providing the cornerstone for the development of robust typing strategies.
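The restriction-based variant of the assay relies on SNPs that create or destroy enzyme recognition sites, so each lineage yields a distinct set of fragment lengths. A minimal in-silico sketch of that idea follows. The abstract names neither the enzymes nor the amplicon sequence, so everything concrete here is an assumption: the recognition sequences are those of EcoRI and BamHI used purely as stand-ins, and the two "alleles" are toy strings, not TcSC5D sequence.

```python
import re

# Stand-in enzymes (assumption): recognition site -> cut offset in the site.
# EcoRI cuts G^AATTC and BamHI cuts G^GATCC, so both offsets are 1.
SITES = {"GAATTC": 1, "GGATCC": 1}


def digest(seq, sites):
    """Return fragment lengths (longest first) after a complete digest."""
    cuts = set()
    for site, offset in sites.items():
        # Lookahead so that overlapping occurrences are also found.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    lengths = [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
    return sorted(lengths, reverse=True)


# Toy alleles (hypothetical): a single SNP (A->G) removes the first site.
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GGATCC" + "TT"
allele_b = "AAAA" + "GAGTTC" + "CCCCCCCC" + "GGATCC" + "TT"

print(digest(allele_a, SITES))  # three fragments
print(digest(allele_b, SITES))  # two fragments
```

Comparing the two printed patterns is the computational analogue of reading bands on a gel; a real workflow would also need to account for fragments too close in size to resolve.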
Author Summary
Trypanosoma cruzi, the causative agent of Chagas Disease, infects approximately 8 million people in the Americas, with 200,000 new cases reported annually. The disease, in its chronic stage, has different manifestations: mega-colon, mega-esophagus, and cardiomyopathy, among others. The fact that infections by the same species cause these different clinical outcomes is believed to be determined, at least in part, by the genetic background of the parasite (infection by different strains). By analyzing a number of molecular markers, the population of the parasite has been divided into seven major evolutionary lineages, which evolve mostly independently, by clonal expansion with infrequent exchange of genetic material. Accurate classification of different strains and isolates into their corresponding evolutionary lineages is therefore essential to obtain a good map of biological, biochemical and ecoepidemiological features for the whole species. The current methods available to type T. cruzi stocks are either laborious and costly (requiring the amplification and sequencing of a variable number of genes or gene fragments), or limited in resolution. In this work we describe a number of key discriminant sites in a gene encoding a putative enzyme from the sterol pathway of the parasite, which were exploited to design a couple of alternative typing assays. Using these key discriminant sites, we can classify any T. cruzi stock into either six or seven evolutionary lineages using only one gene fragment, and in a matter of hours (depending on the assay used). To our knowledge, the proposed assays are the first typing assays that can discriminate T. cruzi stocks with such speed and low cost.
doi:10.1371/journal.pntd.0001777
PMCID: PMC3409129  PMID: 22860154
20.  Shotgun Sequencing Analysis of Trypanosoma cruzi I Sylvio X10/1 and Comparison with T. cruzi VI CL Brener 
Trypanosoma cruzi is the causative agent of Chagas disease, which affects more than 9 million people in Latin America. We have generated a draft genome sequence of the TcI strain Sylvio X10/1 and compared it to the TcVI reference strain CL Brener to identify lineage-specific features. We found virtually no differences in the core gene content of CL Brener and Sylvio X10/1 by presence/absence analysis, but 6 open reading frames from CL Brener were missing in Sylvio X10/1. Several multicopy gene families, including DGF, mucin, MASP and GP63 were found to contain substantially fewer genes in Sylvio X10/1, based on sequence read estimations. 1,861 small insertion-deletion events and 77,349 nucleotide differences, 23% of which were non-synonymous and associated with radical amino acid changes, further distinguish these two genomes. There were 336 genes indicated as under positive selection, 145 unique to T. cruzi in comparison to T. brucei and Leishmania. This study provides a framework for further comparative analyses of two major T. cruzi lineages and also highlights the need for sequencing more strains to understand fully the genomic composition of this parasite.
Author Summary
Chagas disease is a major health problem in Latin America and it is caused by the protozoan parasite Trypanosoma cruzi. The genome sequence of the T. cruzi strain CL Brener (TcVI) has revealed a genome with large repertoires of genes for surface antigens, among other features. In the present study, we sequenced the genome of a representative member of TcI, the predominant agent of Chagas disease North of the Amazon and performed comparative analyses with CL Brener. Genetic variation between strains can potentially explain differences in disease pathogenesis, host preferences and aid the identification of drug targets. Our analysis showed that the two genomes have very similar sets of genes, but contain large differences in the relative size of several important gene families. Moreover, an abundance of allelic sequence variation was found in a large fraction of genes, and an evolutionary analysis indicated that many genes have evolved at different rates.
doi:10.1371/journal.pntd.0000984
PMCID: PMC3050914  PMID: 21408126
21.  Using comparative genomics to reorder the human genome sequence into a virtual sheep genome 
Genome Biology  2007;8(7):R152.
Using BAC-end sequences, a sparse marker map and the sequences of the human, dog and cow genomes, an accurate and detailed sub-gene level map of the sheep genome has been constructed.
Background
Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes?
Results
A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the human, dog, and cow genomes. To maximize genome coverage, the coordinates of all BAC end sequence hits to the cow and dog genomes were also converted to the equivalent human genome coordinates. The 84,624 sheep BACs (about 5.4-fold genome coverage) with paired ends in the correct orientation (tail-to-tail) and spacing, combined with information from sheep BAC comparative genome contigs (CGCs) built separately on the dog and cow genomes, were used to construct 1,172 sheep BAC-CGCs, covering 91.2% of the human genome. Clustered non-tail-to-tail and outsize BACs located close to the ends of many BAC-CGCs linked BAC-CGCs covering about 70% of the genome to at least one other BAC-CGC on the same chromosome. Using the BAC-CGCs, the intrachromosomal and interchromosomal BAC-CGC linkage information, human/cow and vertebrate synteny, and the sheep marker map, a virtual sheep genome was constructed. To identify BACs potentially located in gaps between BAC-CGCs, an additional set of 55,668 sheep BACs were positioned on the sheep genome with lower confidence. A coordinate conversion process allowed us to transfer human genes and other genome features to the virtual sheep genome to display on a sheep genome browser.
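The coordinate conversion mentioned above (cow or dog hits mapped to equivalent human positions) can be pictured as an offset calculation inside a syntenic block. The sketch below is an illustration under strong assumptions and is not the authors' procedure: it treats each block as an ungapped, equal-length alignment, and the `SyntenyBlock` type, the `convert` function, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SyntenyBlock:
    """Hypothetical ungapped block; intervals are 0-based, half-open."""
    src_chrom: str
    src_start: int
    src_end: int
    dst_chrom: str
    dst_start: int
    dst_end: int
    same_strand: bool


def convert(block: SyntenyBlock, chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a source position into the destination genome, or None if outside."""
    if chrom != block.src_chrom or not (block.src_start <= pos < block.src_end):
        return None
    offset = pos - block.src_start
    if block.same_strand:
        return block.dst_chrom, block.dst_start + offset
    # Inverted block: count the same offset back from the destination end.
    return block.dst_chrom, block.dst_end - 1 - offset


# Made-up coordinates for illustration only.
forward = SyntenyBlock("cow_chr1", 1000, 2000, "human_chr3", 5000, 6000, True)
inverted = SyntenyBlock("cow_chr1", 1000, 2000, "human_chr3", 5000, 6000, False)
print(convert(forward, "cow_chr1", 1500))
print(convert(inverted, "cow_chr1", 1000))
```

Real alignments contain gaps and blocks of unequal length, so a production conversion would work from alignment chains rather than a single linear offset.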
Conclusion
We demonstrate that limited sequencing of BACs combined with positioning on a well assembled genome and integrating locations from other less well assembled genomes can yield extensive, detailed subgene-level maps of mammalian genomes, for which genomic resources are currently limited.
doi:10.1186/gb-2007-8-7-r152
PMCID: PMC2323240  PMID: 17663790
22.  A comparative physical map reveals the pattern of chromosomal evolution between the turkey (Meleagris gallopavo) and chicken (Gallus gallus) genomes 
BMC Genomics  2011;12:447.
Background
A robust bacterial artificial chromosome (BAC)-based physical map is essential for many aspects of genomics research, including an understanding of chromosome evolution, high-resolution genome mapping, marker-assisted breeding, positional cloning of genes, and quantitative trait analysis. To facilitate turkey genetics research and better understand avian genome evolution, a BAC-based integrated physical, genetic, and comparative map was developed for this important agricultural species.
Results
The turkey genome physical map was constructed based on 74,013 BAC fingerprints (11.9 × coverage) from two independent libraries, and it was integrated with the turkey genetic map and chicken genome sequence using over 41,400 BAC assignments identified by 3,499 overgo hybridization probes along with > 43,000 BAC end sequences. The physical-comparative map consists of 74 BAC contigs, with an average contig size of 13.6 Mb. All but four of the turkey chromosomes were spanned on this map by three or fewer contigs, with 14 chromosomes spanned by a single contig and nine chromosomes spanned by two contigs. This map predicts 20 to 27 major rearrangements distinguishing turkey and chicken chromosomes, despite up to 40 million years of separate evolution between the two species. These data elucidate the chromosomal evolutionary pattern within the Phasianidae that led to the modern turkey and chicken karyotypes. The predominant rearrangement mode involves intra-chromosomal inversions, and there is a clear bias for these to result in centromere locations at or near telomeres in turkey chromosomes, in comparison to interstitial centromeres in the orthologous chicken chromosomes.
Conclusion
The BAC-based turkey-chicken comparative map provides novel insights into the evolution of avian genomes, a framework for assembly of turkey whole genome shotgun sequencing data, and tools for enhanced genetic improvement of these important agricultural and model species.
doi:10.1186/1471-2164-12-447
PMCID: PMC3189400  PMID: 21906286
23.  Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics 
BMC Genomics  2013;14:79.
Background
The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome.
Results
The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs.
Conclusions
In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. Our results showed that not only was the gene nucleotide sequence conserved between soybean and lupin GRRs, but the order and orientation of particular genes in syntenic blocks was homologous, as well. These findings will be valuable to the forthcoming sequencing of the lupin genome.
doi:10.1186/1471-2164-14-79
PMCID: PMC3618312  PMID: 23379841
Narrow-leafed lupin; Glycine max; MFLP; Genome mapping; contigs; DNA sequencing; Synteny; BAC-FISH
24.  Improved genome assembly and evidence-based global gene model set for the chordate Ciona intestinalis: new insight into intron and operon populations 
Genome Biology  2008;9(10):R152.
An improved assembly of the Ciona intestinalis genome reveals that it contains non-canonical introns and that about 20% of Ciona genes reside in operons.
Background
The draft genome sequence of the ascidian Ciona intestinalis, along with associated gene models, has been a valuable research resource. However, recently accumulated expressed sequence tag (EST)/cDNA data have revealed numerous inconsistencies with the gene models due in part to intrinsic limitations in gene prediction programs and in part to the fragmented nature of the assembly.
Results
We have prepared a less-fragmented assembly on the basis of scaffold-joining guided by paired-end EST and bacterial artificial chromosome (BAC) sequences, and BAC chromosomal in situ hybridization data. The new assembly (115.2 Mb) is similar in length to the initial assembly (116.7 Mb) but contains 1,272 (approximately 50%) fewer scaffolds. The largest scaffold in the new assembly incorporates 95 initial-assembly scaffolds. In conjunction with the new assembly, we have prepared a greatly improved global gene model set strictly correlated with the extensive currently available EST data. The total gene number (15,254) is similar to that of the initial set (15,582), but the new set includes 3,330 models at genomic sites where none were present in the initial set, and 1,779 models that represent fusions of multiple previously incomplete models. In approximately half, 5'-ends were precisely mapped using 5'-full-length ESTs, an important refinement even in otherwise unchanged models.
Conclusion
Using these new resources, we identify a population of non-canonical (non-GT-AG) introns and also find that approximately 20% of Ciona genes reside in operons and that operons contain a high proportion of single-exon genes. Thus, the present dataset provides an opportunity to analyze the Ciona genome much more precisely than ever.
doi:10.1186/gb-2008-9-10-r152
PMCID: PMC2760879  PMID: 18854010
25.  Trypanosomatid comparative genomics: Contributions to the study of parasite biology and different parasitic diseases 
In 2005, draft sequences of the genomes of Trypanosoma brucei, Trypanosoma cruzi and Leishmania major, also known as the Tri-Tryp genomes, were published. These protozoan parasites are the causative agents of three distinct insect-borne diseases, namely sleeping sickness, Chagas disease and leishmaniasis, all with a worldwide distribution. Despite the large estimated evolutionary distance among them, a conserved core of ~6,200 trypanosomatid genes was found among the Tri-Tryp genomes. Extensive analysis of these genomic sequences has greatly increased our understanding of the biology of these parasites and their host-parasite interactions. In this article, we review the recent advances in the comparative genomics of these three species. This analysis also includes data on additional sequences derived from other trypanosomatid species, as well as recent data on gene expression and functional genomics. In addition to facilitating the identification of key parasite molecules that may provide a better understanding of these complex diseases, genome studies offer a rich source of new information that can be used to define potential new drug targets and vaccine candidates for controlling these parasitic infections.
PMCID: PMC3313497  PMID: 22481868
Trypanosoma brucei; Trypanosoma cruzi; Leishmania major; genome; RNAseq
