1.  Identification of three novel FGF16 mutations in X-linked recessive fusion of the fourth and fifth metacarpals and possible correlation with heart disease 
Nonsense mutations in FGF16 have recently been linked to X-linked recessive hand malformations with fusion between the fourth and the fifth metacarpals and hypoplasia of the fifth digit (MF4; MIM#309630). The purpose of this study was to perform careful clinical phenotyping and to define molecular mechanisms behind X-linked recessive MF4 in three unrelated families. We performed whole-exome sequencing, and identified three novel mutations in FGF16. The functional impact of FGF16 loss was further studied using morpholino-based suppression of fgf16 in zebrafish. In addition, clinical investigations revealed reduced penetrance and variable expressivity of the MF4 phenotype. Cardiac disorders, including myocardial infarction and atrial fibrillation, followed the X-linked FGF16 mutated trait in one large family. Our findings establish that a mutation in exon 1, 2 or 3 of FGF16 results in X-linked recessive MF4 and expand the phenotypic spectrum of FGF16 mutations to include a possible correlation with heart disease.
PMCID: PMC4190875  PMID: 25333065
FGF16; heart; metacarpal fusion; MF4
2.  An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge 
Brownstein, Catherine A | Beggs, Alan H | Homer, Nils | Merriman, Barry | Yu, Timothy W | Flannery, Katherine C | DeChene, Elizabeth T | Towne, Meghan C | Savage, Sarah K | Price, Emily N | Holm, Ingrid A | Luquette, Lovelace J | Lyon, Elaine | Majzoub, Joseph | Neupert, Peter | McCallie Jr, David | Szolovits, Peter | Willard, Huntington F | Mendelsohn, Nancy J | Temme, Renee | Finkel, Richard S | Yum, Sabrina W | Medne, Livija | Sunyaev, Shamil R | Adzhubey, Ivan | Cassa, Christopher A | de Bakker, Paul IW | Duzkale, Hatice | Dworzyński, Piotr | Fairbrother, William | Francioli, Laurent | Funke, Birgit H | Giovanni, Monica A | Handsaker, Robert E | Lage, Kasper | Lebo, Matthew S | Lek, Monkol | Leshchiner, Ignaty | MacArthur, Daniel G | McLaughlin, Heather M | Murray, Michael F | Pers, Tune H | Polak, Paz P | Raychaudhuri, Soumya | Rehm, Heidi L | Soemedi, Rachel | Stitziel, Nathan O | Vestecka, Sara | Supper, Jochen | Gugenmus, Claudia | Klocke, Bernward | Hahn, Alexander | Schubach, Max | Menzel, Mortiz | Biskup, Saskia | Freisinger, Peter | Deng, Mario | Braun, Martin | Perner, Sven | Smith, Richard JH | Andorf, Janeen L | Huang, Jian | Ryckman, Kelli | Sheffield, Val C | Stone, Edwin M | Bair, Thomas | Black-Ziegelbein, E Ann | Braun, Terry A | Darbro, Benjamin | DeLuca, Adam P | Kolbe, Diana L | Scheetz, Todd E | Shearer, Aiden E | Sompallae, Rama | Wang, Kai | Bassuk, Alexander G | Edens, Erik | Mathews, Katherine | Moore, Steven A | Shchelochkov, Oleg A | Trapane, Pamela | Bossler, Aaron | Campbell, Colleen A | Heusel, Jonathan W | Kwitek, Anne | Maga, Tara | Panzer, Karin | Wassink, Thomas | Van Daele, Douglas | Azaiez, Hela | Booth, Kevin | Meyer, Nic | Segal, Michael M | Williams, Marc S | Tromp, Gerard | White, Peter | Corsmeier, Donald | Fitzgerald-Butt, Sara | Herman, Gail | Lamb-Thrush, Devon | McBride, Kim L | Newsom, David | Pierson, Christopher R | Rakowsky, Alexander T | Maver, Aleš | Lovrečić, Luca | Palandačić, Anja | Peterlin, Borut | 
Torkamani, Ali | Wedell, Anna | Huss, Mikael | Alexeyenko, Andrey | Lindvall, Jessica M | Magnusson, Måns | Nilsson, Daniel | Stranneheim, Henrik | Taylan, Fulya | Gilissen, Christian | Hoischen, Alexander | van Bon, Bregje | Yntema, Helger | Nelen, Marcel | Zhang, Weidong | Sager, Jason | Zhang, Lu | Blair, Kathryn | Kural, Deniz | Cariaso, Michael | Lennon, Greg G | Javed, Asif | Agrawal, Saloni | Ng, Pauline C | Sandhu, Komal S | Krishna, Shuba | Veeramachaneni, Vamsi | Isakov, Ofer | Halperin, Eran | Friedman, Eitan | Shomron, Noam | Glusman, Gustavo | Roach, Jared C | Caballero, Juan | Cox, Hannah C | Mauldin, Denise | Ament, Seth A | Rowen, Lee | Richards, Daniel R | Lucas, F Anthony San | Gonzalez-Garay, Manuel L | Caskey, C Thomas | Bai, Yu | Huang, Ying | Fang, Fang | Zhang, Yan | Wang, Zhengyuan | Barrera, Jorge | Garcia-Lobo, Juan M | González-Lamuño, Domingo | Llorca, Javier | Rodriguez, Maria C | Varela, Ignacio | Reese, Martin G | De La Vega, Francisco M | Kiruluta, Edward | Cargill, Michele | Hart, Reece K | Sorenson, Jon M | Lyon, Gholson J | Stevenson, David A | Bray, Bruce E | Moore, Barry M | Eilbeck, Karen | Yandell, Mark | Zhao, Hongyu | Hou, Lin | Chen, Xiaowei | Yan, Xiting | Chen, Mengjie | Li, Cong | Yang, Can | Gunel, Murat | Li, Peining | Kong, Yong | Alexander, Austin C | Albertyn, Zayed I | Boycott, Kym M | Bulman, Dennis E | Gordon, Paul MK | Innes, A Micheil | Knoppers, Bartha M | Majewski, Jacek | Marshall, Christian R | Parboosingh, Jillian S | Sawyer, Sarah L | Samuels, Mark E | Schwartzentruber, Jeremy | Kohane, Isaac S | Margulies, David M
Genome Biology  2014;15(3):R53.
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.
PMCID: PMC4073084  PMID: 24667040
3.  Assistive/Socially Assistive Robotic Platform for Therapy and Recovery: Patient Perspectives 
Improving adherence to therapy is a critical component of advancing outcomes and reducing the cost of rehabilitation. A robotic platform was previously developed to explore how robotics could be applied to the social dimension of rehabilitation to improve adherence. This paper aims to report on feedback given by end users of the robotic platform as well as the practical applications that socially assistive robotics could have in the daily life activities of a patient. A group of 10 former and current patients interacted with the developed robotic platform during a simulated exercise session before taking an experience-based survey. A portion of these participants later provided verbal feedback as part of a focus group on the potential utility of such a platform. Identified applications included assistance with reaching exercise goals, managing to-do lists, and supporting participation in social and recreational activities. The study participants expressed that the personality characteristics of the robotic system should be adapted to individual preferences and that the assistance provided over time should align with the progress of their recovery. The results from this study are encouraging and will be useful for further development of socially assistive robotics.
PMCID: PMC3881578  PMID: 24454355
4.  Investigating highly replicated asthma genes as candidate genes for allergic rhinitis 
BMC Medical Genetics  2013;14:51.
Asthma genetics has been extensively studied and many genes have been associated with the development or severity of this disease. In contrast, the genetic basis of allergic rhinitis (AR) has not been evaluated as extensively. It is well known that asthma is closely related to AR since a large proportion of individuals with asthma also present symptoms of AR, and patients with AR have a 5–6 fold increased risk of developing asthma. Thus, the relevance of asthma candidate genes as predisposing factors for AR is worth investigating. The present study was designed to investigate whether SNPs in highly replicated asthma genes are associated with the occurrence of AR.
A total of 192 SNPs from 21 asthma candidate genes reported to be associated with asthma in 6 or more unrelated studies were genotyped in a Swedish population with 246 AR patients and 431 controls. Genotypes for 429 SNPs from the same set of genes were also extracted from a Singapore Chinese genome-wide dataset which consisted of 456 AR cases and 486 controls. All SNPs were subsequently analyzed for association with AR and their influence on allergic sensitization to common allergens.
A limited number of potential associations were observed and the overall pattern of P-values corresponds well to the expectations in the absence of an effect. However, in the tests of allele effects in the Chinese population the number of significant P-values exceeds the expectations. The strongest signals were found for SNPs in NPSR1 and CTLA4. In these genes, a total of nine SNPs showed P-values <0.001 with corresponding Q-values <0.05. In the NPSR1 gene some P-values were lower than the Bonferroni correction level. Reanalysis after elimination of all patients with asthmatic symptoms excluded asthma as a confounding factor in our results. Weaker indications were found for IL13 and GSTP1 with respect to sensitization to birch pollen in the Swedish population.
Genetic variation in the majority of the highly replicated asthma genes was not associated with AR in our populations, which suggests that asthma and AR could have less in common than previously anticipated. However, NPSR1 and CTLA4 can be genetic links between AR and asthma, and associations of polymorphisms in NPSR1 with AR have not been reported previously.
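The abstract above compares P-values against a Bonferroni correction level and reports Q-values without naming the procedure used. As a minimal sketch only, the following assumes one test per SNP in the 192-SNP Swedish panel for the Bonferroni level and uses the Benjamini-Hochberg adjustment as one common definition of a Q-value; both choices are assumptions, and the P-values are invented for illustration, not taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical P-values for illustration; not results from the study.
example_p = [0.0002, 0.0009, 0.03, 0.2, 0.7]

# Assumption: one test per genotyped SNP (192 in the Swedish panel).
threshold = bonferroni_threshold(0.05, 192)
q_values = benjamini_hochberg(example_p)
below_bonferroni = [p for p in example_p if p < threshold]
```

With these invented inputs, only the smallest P-value falls below the Bonferroni level, while the adjusted values of the two smallest P-values stay below 0.05, which mirrors the distinction the abstract draws between the two criteria.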
PMCID: PMC3653682  PMID: 23663310
Allergic rhinitis; Association; Asthma; Case–control; Replication
5.  Full spatial characterization of a nanofocused x-ray free-electron laser beam by ptychographic imaging 
Scientific Reports  2013;3:1633.
The emergence of hard X-ray free electron lasers (XFELs) enables new insights into many fields of science. These new sources provide short, highly intense, and coherent X-ray pulses. In a variety of scientific applications these pulses need to be strongly focused. In this article, we demonstrate focusing of hard X-ray FEL pulses to 125 nm using refractive x-ray optics. For a quantitative analysis of most experiments, the wave field or at least the intensity distribution illuminating the sample is needed. We report on the full characterization of a nanofocused XFEL beam by ptychographic imaging, giving access to the complex wave field in the nanofocus. From these data, we obtain the full caustic of the beam, identify the aberrations of the optic, and determine the wave field for individual pulses. This information is for example crucial for high-resolution imaging, creating matter in extreme conditions, and nonlinear x-ray optics.
PMCID: PMC3620670  PMID: 23567281
6.  Poor Reproducibility of Allergic Rhinitis SNP Associations 
PLoS ONE  2013;8(1):e53975.
Replication of reported associations is crucial to the investigation of complex disease. More than 100 SNPs have previously been reported as associated with allergic rhinitis (AR), but few of these have been replicated successfully. To investigate the general reproducibility of reported AR-associations in candidate gene studies, one Swedish (352 AR-cases, 709 controls) and one Singapore Chinese population (948 AR-cases, 580 controls) were analyzed using 49 AR-associated SNPs. The overall pattern of P-values indicated that very few of the investigated SNPs were associated with AR. Given published odds ratios (ORs) most SNPs showed high power to detect an association, but no correlations were found between the ORs of the two study populations or with published ORs. None of the association signals were in common to the two genome-wide association studies published in AR, indicating that the associations represent false positives or have much lower effect-sizes than reported.
PMCID: PMC3559641  PMID: 23382861
7.  An N-Terminal Missense Mutation in STX11 Causative of FHL4 Abrogates Syntaxin-11 Binding to Munc18-2 
Familial hemophagocytic lymphohistiocytosis (FHL) is an often-fatal hyperinflammatory disorder caused by autosomal recessive mutations in PRF1, UNC13D, STX11, and STXBP2. We identified a homozygous STX11 mutation, c.173T > C (p.L58P), in three patients presenting clinically with hemophagocytic lymphohistiocytosis from unrelated Pakistani families. The mutation yields an amino acid substitution in the N-terminal Habc domain of syntaxin-11 and resulted in defective natural killer cell degranulation. Notably, syntaxin-11 expression was decreased in patient cells. However, in an ectopic expression system, syntaxin-11 L58P was expressed at levels comparable to wild-type syntaxin-11, but did not bind Munc18-2. Moreover, another N-terminal syntaxin-11 mutant, R4A, also did not bind Munc18-2. Thus, we have identified a novel missense STX11 mutation causative of FHL type 4. The syntaxin-11 R4A and L58P mutations reveal that both the N-terminus and Habc domain of syntaxin-11 are required for binding to Munc18-2, implying similarity to the dynamic binary binding of neuronal syntaxin-1 to Munc18-1.
PMCID: PMC3890652  PMID: 24459464
familial hemophagocytic lymphohistiocytosis; syntaxin-11; Munc18-2; N-peptide
8.  The genome of the heartworm, Dirofilaria immitis, reveals drug and vaccine targets 
The FASEB Journal  2012;26(11):4650-4661.
The heartworm Dirofilaria immitis is an important parasite of dogs. Transmitted by mosquitoes in warmer climatic zones, it is spreading across southern Europe and the Americas at an alarming pace. There is no vaccine, and chemotherapy is prone to complications. To learn more about this parasite, we have sequenced the genomes of D. immitis and its endosymbiont Wolbachia. We predict 10,179 protein coding genes in the 84.2 Mb of the nuclear genome, and 823 genes in the 0.9-Mb Wolbachia genome. The D. immitis genome harbors neither DNA transposons nor active retrotransposons, and there is very little genetic variation between two sequenced isolates from Europe and the United States. The differential presence of anabolic pathways such as heme and nucleotide biosynthesis hints at the intricate metabolic interrelationship between the heartworm and Wolbachia. Comparing the proteome of D. immitis with other nematodes and with mammalian hosts, we identify families of potential drug targets, immune modulators, and vaccine candidates. This genome sequence will support the development of new tools against dirofilariasis and aid efforts to combat related human pathogens, the causative agents of lymphatic filariasis and river blindness.—Godel, C., Kumar, S., Koutsovoulos, G., Ludin, P., Nilsson, D., Comandatore, F., Wrobel, N., Thompson, M., Schmid, C. D., Goto, S., Bringaud, F., Wolstenholme, A., Bandi, C., Epe, C., Kaminsky, R., Blaxter, M., Mäser, P. The genome of the heartworm, Dirofilaria immitis, reveals drug and vaccine targets.
PMCID: PMC3475251  PMID: 22889830
comparative genomics; filaria; transposon; Wolbachia
9.  Toll-like receptor gene polymorphisms are associated with allergic rhinitis: a case control study 
BMC Medical Genetics  2012;13:66.
The Toll-like receptor proteins are important in host defense and initiation of the innate and adaptive immune responses. A number of studies have identified associations between genetic variation in the Toll-like receptor genes and allergic disorders such as asthma and allergic rhinitis. The present study aims to search for genetic variation associated with allergic rhinitis in the Toll-like receptor genes.
A first association analysis genotyped 73 SNPs in 182 cases and 378 controls from a Swedish population. Based on these results an additional 24 SNPs were analyzed in one Swedish population with 352 cases and 709 controls and one Chinese population with 948 cases and 580 controls.
The first association analysis identified 4 allergic rhinitis-associated SNPs in the TLR7-TLR8 gene region. Subsequent analysis of 24 SNPs from this region identified 7 and 5 significant SNPs from the Swedish and Chinese populations, respectively. The corresponding risk-associated haplotypes are significant after Bonferroni correction and are the most common haplotypes in both populations. The associations are primarily detected in females in the Swedish population, whereas they are seen in males in the Chinese population. Further independent support for the involvement of this region in allergic rhinitis was obtained from quantitative skin prick test data generated in both populations.
Haplotypes in the TLR7-TLR8 gene region were associated with allergic rhinitis in one Swedish and one Chinese population. Since this region has earlier been associated with asthma and allergic rhinitis in a Danish linkage study, this speaks strongly in favour of this region being truly involved in the development of this disease.
PMCID: PMC3459792  PMID: 22857391
Allergic rhinitis; Toll-like receptor; Polymorphism; Genetics; Haplotype; Case–control
10.  Epigenetic Regulation of Transcription and Virulence in Trypanosoma cruzi by O-Linked Thymine Glucosylation of DNA 
Molecular and Cellular Biology  2011;31(8):1690-1700.
Unlike other eukaryotes, the protein-coding genes of Trypanosoma cruzi are arranged in large polycistronic gene clusters transcribed by polymerase II (Pol II). Thus, it is thought that trypanosomes rely solely on posttranscriptional processes to regulate gene expression. Here, we show that the glucosylated thymine DNA base (β-d-glucosyl-hydroxymethyluracil or base J) is present within sequences flanking the polycistronic units (PTUs) in T. cruzi. The loss of base J at sites of transcription initiation, via deletion of the two enzymes that regulate base J synthesis (JBP1 and JBP2), correlates with an increased rate of Pol II transcription and subsequent genome-wide increase in gene expression. The affected genes include virulence genes, and the resulting parasites are defective in host cell invasion and egress. These studies indicate that base J is an epigenetic factor regulating Pol II transcription initiation in kinetoplastids and provide the first biological role of the only hypermodified DNA base in eukaryotes.
PMCID: PMC3126337  PMID: 21321080
11.  The Short Non-Coding Transcriptome of the Protozoan Parasite Trypanosoma cruzi 
The pathway for RNA interference is widespread in metazoans and participates in numerous cellular tasks, from gene silencing to chromatin remodeling and protection against retrotransposition. The unicellular eukaryote Trypanosoma cruzi is missing the canonical RNAi pathway and is unable to induce RNAi-related processes. To further understand alternative RNA pathways operating in this organism, we have performed deep sequencing and genome-wide analyses of a size-fractioned cDNA library (16–61 nt) from the epimastigote life stage. Deep sequencing generated 582,243 short sequences of which 91% could be aligned with the genome sequence. About 95–98% of the aligned data (depending on the haplotype) corresponded to small RNAs derived from tRNAs, rRNAs, snRNAs and snoRNAs. The largest class consisted of tRNA-derived small RNAs which primarily originated from the 3′ end of tRNAs, followed by small RNAs derived from rRNA. The remaining sequences revealed the presence of 92 novel transcribed loci, of which 79 did not show homology to known RNA classes.
Author Summary
Chagas' disease is a major health problem in Latin America and is caused by the protozoan parasite Trypanosoma cruzi. T. cruzi lacks the pathway for RNA interference, which is widespread among eukaryotes, and is therefore unable to induce RNAi-related processes. In many organisms, small RNAs play an important role in regulating gene expression and other cellular processes. In order to understand if other small RNA pathways are operating in this organism, we performed high throughput sequencing and genome-wide analyses of the short transcriptome. We identified an abundance of small RNAs derived from non-coding RNA genes, including transfer RNAs, ribosomal RNAs as well as small nucleolar RNAs and small nuclear RNAs. Certain tRNA types were overrepresented as precursors for small RNAs. Further, we identified 79 novel small non-coding RNAs, not previously reported. We did not identify canonical small RNAs, like microRNAs and small interfering RNAs, and concluded that these do not exist in T. cruzi. This study has provided insights into the short transcriptome of a major human pathogen and provided starting points for further functional investigation of small RNAs and their biological roles.
PMCID: PMC3166047  PMID: 21912713
12.  Genome-Wide Identification of Molecular Mimicry Candidates in Parasites 
PLoS ONE  2011;6(3):e17546.
Among the many strategies employed by parasites for immune evasion and host manipulation, one of the most fascinating is molecular mimicry. With genome sequences available for host and parasite, mimicry of linear amino acid epitopes can be investigated by comparative genomics. Here we developed an in silico pipeline for genome-wide identification of molecular mimicry candidate proteins or epitopes. The predicted proteome of a given parasite was broken down into overlapping fragments, each of which was screened for close hits in the human proteome. Control searches were carried out against unrelated, free-living eukaryotes to eliminate the generally conserved proteins, and with randomized versions of the parasite proteins to get an estimate of statistical significance. This simple but computation-intensive approach yielded interesting candidates from human-pathogenic parasites. From Plasmodium falciparum, it returned a 14 amino acid motif in several of the PfEMP1 variants identical to part of the heparin-binding domain in the immunosuppressive serum protein vitronectin. And in Brugia malayi, fragments were detected that matched to periphilin-1, a protein of cell-cell junctions involved in barrier formation. All the results are publicly available by means of mimicDB, a searchable online database for molecular mimicry candidates from pathogens. To our knowledge, this is the first genome-wide survey for molecular mimicry proteins in parasites. The strategy can be adopted to any pair of host and pathogen, once appropriate negative control organisms are chosen. MimicDB provides a host of new starting points to gain insights into the molecular nature of host-pathogen interactions.
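The fragment-and-screen idea described above can be sketched as follows. This is a simplified illustration, not the published pipeline: it uses exact string matching in place of the "close hits" search, omits the randomized-sequence significance control, and all protein names and sequences are hypothetical toy data.

```python
def overlapping_fragments(sequence, length, step=1):
    """Yield (offset, fragment) windows of fixed length over a sequence."""
    for start in range(0, len(sequence) - length + 1, step):
        yield start, sequence[start:start + length]


def index_proteome(proteome, length):
    """Map each fragment of the given length to the proteins containing it."""
    index = {}
    for name, seq in proteome.items():
        for _, frag in overlapping_fragments(seq, length):
            index.setdefault(frag, set()).add(name)
    return index


def mimicry_candidates(parasite, host, control, length):
    """Parasite fragments present in the host but absent from the control."""
    host_index = index_proteome(host, length)
    control_index = index_proteome(control, length)
    hits = []
    for name, seq in parasite.items():
        for offset, frag in overlapping_fragments(seq, length):
            # The control proteome filters out generally conserved fragments.
            if frag in host_index and frag not in control_index:
                hits.append((name, offset, frag, sorted(host_index[frag])))
    return hits


# Hypothetical toy proteomes (names and sequences invented for illustration).
parasite = {"parA": "MKTAYIAKQRQISFVK"}
host = {"hostX": "GGTAYIAKQRGG", "hostY": "PPPPQISFVKPP"}
control = {"ctrl": "AAAQISFVKAAA"}

candidates = mimicry_candidates(parasite, host, control, length=6)
```

In this toy case the fragment shared with `hostY` is discarded because it also occurs in the control proteome, which is the role the free-living eukaryote searches play in the description above.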
PMCID: PMC3050887  PMID: 21408160
13.  Phylogenomics of Ligand-Gated Ion Channels Predicts Monepantel Effect 
PLoS Pathogens  2010;6(9):e1001091.
The recently launched veterinary anthelmintic drench for sheep (Novartis Animal Health Inc., Switzerland) containing the nematocide monepantel represents a new class of anthelmintics: the amino-acetonitrile derivatives (AADs), much needed in view of widespread resistance to the classical drugs. Recently, it was shown that the ACR-23 protein in Caenorhabditis elegans and a homologous protein, MPTL-1 in Haemonchus contortus, are potential targets for AAD action. Both proteins belong to the DEG-3 subfamily of acetylcholine receptors, which are thought to be nematode-specific, and different from those targeted by the imidazothiazoles (e.g. levamisole). Here we provide further evidence that Cel-ACR-23 and Hco-MPTL-1-like subunits are involved in the monepantel-sensitive phenotype. We performed comparative genomics of ligand-gated ion channel genes from several nematodes and subsequently assessed their sensitivity to anthelmintics. The nematode species in the Caenorhabditis genus, equipped with ACR-23/MPTL-1-like receptor subunits, are sensitive to monepantel (EC50<1.25 µM), whereas the related nematodes Pristionchus pacificus and Strongyloides ratti, which lack an ACR-23/MPTL-1 homolog, are insensitive (EC50>43 µM). Genome sequence information has long been used to identify putative targets for therapeutic intervention. We show how comparative genomics can be applied to predict drug sensitivity when molecular targets of a compound are known or suspected.
Author Summary
Increased use of anthelmintics has contributed to the emergence of drug-resistant nematodes, causing serious problems for more than one billion sheep worldwide. The last class of compounds indicated for livestock was introduced 28 years ago. Recently, however, Novartis AH developed a new anthelmintic active against drug-resistant nematodes of sheep, the amino-acetonitrile derivative (AAD) monepantel. We have previously indirectly shown that the AADs have a novel mode of action involving acetylcholine receptor subunits: the ACR-23 protein in Caenorhabditis elegans and a homologous protein, MPTL-1 in Haemonchus contortus. To better understand the mode of action of the AADs, we performed comparative genomics of all ligand-gated ion channel genes from a range of organisms, including members from all nematode clades. We confirmed that MPTL-1 belongs to a unique, nematode-specific sub-family of receptor subunits. We also found that some nematode species lack ACR-23/MPTL-1 and predicted them to be monepantel insensitive. We challenged this hypothesis in a panel of drug tests: several species of Caenorhabditis nematodes equipped with ACR-23/MPTL-1-like receptor subunits were found susceptible to monepantel, whereas Pristionchus pacificus, closely related to these worms but lacking an ACR-23/MPTL-1 homolog, was tolerant. The parasitic nematode Strongyloides ratti, which has only a remote homolog of DES-2 and ACR-23/MPTL-1, was also tolerant to monepantel. This confirms our prediction and highlights how comparative genomic data can be used to predict a drug effect.
PMCID: PMC2936538  PMID: 20838602
14.  Spliced Leader Trapping Reveals Widespread Alternative Splicing Patterns in the Highly Dynamic Transcriptome of Trypanosoma brucei 
PLoS Pathogens  2010;6(8):e1001037.
Trans-splicing of leader sequences onto the 5′ ends of mRNAs is a widespread phenomenon in protozoa, nematodes and some chordates. Using parallel sequencing we have developed a method to simultaneously map 5′ splice sites and analyze the corresponding gene expression profile, which we term spliced leader trapping (SLT). The method can be applied to any organism with a sequenced genome and trans-splicing of a conserved leader sequence. We analyzed the expression profiles and splicing patterns of bloodstream and insect forms of the parasite Trypanosoma brucei. We detected the 5′ splice sites of 85% of the annotated protein-coding genes and, contrary to previous reports, found up to 40% of transcripts to be differentially expressed. Furthermore, we discovered more than 2500 alternative splicing events, many of which appear to be stage-regulated. Based on our findings we hypothesize that alternatively spliced transcripts present a new means of regulating gene expression and could potentially contribute to protein diversity in the parasite. The entire dataset can be accessed online at TriTrypDB or through:
Author Summary
Some organisms like the human and animal parasite Trypanosoma brucei add a leader sequence to their mRNAs through a reaction called trans-splicing. Until now the splice sites for most mRNAs were unknown in T. brucei. Using high throughput sequencing we have developed a method to identify the splice sites and at the same time measure the abundance of the corresponding mRNAs. Analyzing three different life cycle stages of the parasite we identified the vast majority of splice sites in the organism and, to our great surprise, uncovered more than 2500 alternative splicing events, many of which appeared to be specific for one of the life cycle stages. Alternative splicing is a result of the addition of the leader sequence to different positions on the mRNA, leading to mixed mRNA populations that can encode proteins with varying properties. One of the most obvious changes caused by alternative splicing is the gain or loss of targeting signals, leading to differential localization of the corresponding proteins. Based on our findings we hypothesize that alternative splicing is a major mechanism to regulate gene expression in T. brucei and could contribute to protein diversity in the parasite.
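The core idea of mapping splice sites and measuring abundance from the same reads can be illustrated with a small sketch. It assumes reads that begin with a known leader tag, trims the tag, and locates the remainder in a reference by exact, unique string matching; the tag, reference and reads are hypothetical placeholders, and a real analysis would use a proper read aligner rather than `str.find`.

```python
from collections import Counter

# Hypothetical placeholder, not the real T. brucei spliced leader sequence.
LEADER_TAG = "ACGTACGT"


def trap_spliced_leader(reads, leader_tag):
    """Keep reads that start with the leader tag; return the downstream part."""
    tag_len = len(leader_tag)
    return [r[tag_len:] for r in reads
            if r.startswith(leader_tag) and len(r) > tag_len]


def map_splice_sites(trimmed_reads, reference):
    """Count reads per reference position at which the downstream part starts."""
    counts = Counter()
    for read in trimmed_reads:
        pos = reference.find(read)
        # Keep only reads that match the reference exactly once.
        if pos != -1 and reference.find(read, pos + 1) == -1:
            counts[pos] += 1
    return counts


# Hypothetical toy reference and reads, for illustration only.
reference = "CCCCAGATGGCTAAGTTTCCAGGTCGATTACA"
reads = [
    LEADER_TAG + "ATGGCTAAG",
    LEADER_TAG + "ATGGCTAAG",
    LEADER_TAG + "GTCGATTAC",
    "TTTT" + "ATGGCTAAG",  # no leader tag, so it is not trapped
]

site_counts = map_splice_sites(trap_spliced_leader(reads, LEADER_TAG), reference)
```

Each key in `site_counts` is a candidate splice-site position and each value is the number of supporting reads, so one pass gives both the site map and a rough abundance estimate. Several sites upstream of one gene would correspond to the alternative splicing events described above.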
PMCID: PMC2916883  PMID: 20700444
15.  The Trypanosoma brucei MitoCarta and its regulation and splicing pattern during development 
Nucleic Acids Research  2010;38(21):7378-7387.
It has long been known that trypanosomes regulate mitochondrial biogenesis during the life cycle of the parasite; however, the mitochondrial protein inventory (MitoCarta) and its regulation remain unknown. We present a novel computational method for genome-wide prediction of mitochondrial proteins using a support vector machine-based classifier with ∼90% prediction accuracy. Using this method, we predicted the mitochondrial localization of 468 proteins with high confidence and have experimentally verified the localization of a subset of these proteins. We then applied a recently developed parallel sequencing technology to determine the expression profiles and the splicing patterns of a total of 1065 predicted MitoCarta transcripts during the development of the parasite, and showed that 435 of the transcripts significantly changed their expression while 630 remained unchanged in any of the three life stages analyzed. Furthermore, we identified 298 alternative splicing events, a small subset of which could lead to dual localization of the corresponding proteins.
PMCID: PMC2995047  PMID: 20660476
16.  Comparative genomics of metabolic networks of free-living and parasitic eukaryotes 
BMC Genomics  2010;11:217.
Obligate endoparasites often lack particular metabolic pathways as compared to free-living organisms. This phenomenon comprises anabolic as well as catabolic reactions. Presumably, the corresponding enzymes were lost in adaptation to parasitism. Here we compare the predicted core metabolic graphs of obligate endoparasites and non-parasites (free living organisms and facultative parasites) in order to analyze how the parasites' metabolic networks shrank in the course of evolution.
Core metabolic graphs comprising biochemical reactions present in the presumed ancestor of parasites and non-parasites were reconstructed from the Kyoto Encyclopedia of Genes and Genomes. While the parasites' networks had fewer nodes (metabolites) and edges (reactions), other parameters such as average connectivity, network diameter and number of isolated edges were similar in parasites and non-parasites. The parasites' networks contained a higher percentage of ATP-consuming reactions and a lower percentage of NAD-requiring reactions. Control networks, shrunk to the size of the parasites' by random deletion of edges, were scale-free but exhibited smaller diameters and more isolated edges.
The parasites' networks were smaller than those of the non-parasites regarding number of nodes or edges, but not regarding network diameters. Network integrity but not scale-freeness has acted as a selective principle during the evolutionary reduction of parasite metabolism. ATP-requiring reactions in particular have been retained in the parasites' core metabolism while NADH- or NADPH-requiring reactions were lost preferentially.
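The control-network comparison described above (random deletion of edges, then measuring diameter and isolated edges) can be sketched with the standard library alone. The definitions used here are assumptions made for illustration: an isolated edge is taken to be an edge whose two endpoints have no other neighbours, and the diameter is the longest shortest path between any two connected nodes. The toy graph and node names are hypothetical.

```python
import random
from collections import defaultdict, deque


def adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def shrink_randomly(edges, target_size, seed=0):
    """Keep a uniformly random subset of target_size edges."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(edges), target_size))


def isolated_edges(edges):
    """Edges whose two endpoints have no other neighbours (assumed definition)."""
    adj = adjacency(edges)
    return {(a, b) for a, b in edges
            if len(adj[a]) == 1 and len(adj[b]) == 1}


def diameter(edges):
    """Longest shortest path between connected nodes, via BFS from each node."""
    adj = adjacency(edges)
    longest = 0
    for source in list(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical toy network: a five-node chain plus one detached reaction.
toy = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("X", "Y")}
control = shrink_randomly(toy, target_size=3, seed=1)
```

Repeating `shrink_randomly` with many seeds and comparing `diameter` and `len(isolated_edges(...))` against an observed reduced network is one way to ask whether the observed network is more intact than random loss would predict.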
PMCID: PMC2858753  PMID: 20356377
17.  Proteomics in Trypanosoma cruzi - Localization of Novel Proteins to Various Organelles 
Proteomics  2008;8(13):2735-2749.
The completion of the genome sequence of Trypanosoma cruzi has been followed by several studies of protein expression, with the long-term aim to obtain a complete picture of the parasite proteome. We report a proteomic analysis of an organellar cell fraction from T. cruzi CL Brener epimastigotes. A total of 396 proteins were identified by LC-MS/MS. Of these, 138 were annotated as hypothetical in the genome databases and the rest could be assigned to several metabolic and biosynthetic pathways, transport, and structural functions. Comparative analysis with a whole cell proteome study resulted in the validation of the expression of 173 additional proteins. Of these, 38 proteins previously reported in other stages were not found in the only large-scale study of the total epimastigote stage proteome. A selected set of identified proteins was analyzed further to investigate gene copy number, sequence variation, transmembrane domains and targeting signals. The genes were cloned and the proteins expressed with a c-myc epitope tag in T. cruzi epimastigotes. Immunofluorescence microscopy revealed the localization of these proteins in different cellular compartments, such as endoplasmic reticulum, acidocalcisome, mitochondrion, and putative cytoplasmic transport or delivery vesicles. The results demonstrate that the use of enriched subcellular fractions allows the detection of T. cruzi proteins that are undetected by whole cell proteomic methods.
PMCID: PMC2706665  PMID: 18546153
Immunofluorescence; Organelles; Protein isoforms; Trypanosoma cruzi
18.  Fission Yeast Mitogen-Activated Protein Kinase Sty1 Interacts with Translation Factors 
Eukaryotic Cell  2007;7(2):328-338.
Signaling by stress-activated mitogen-activated protein kinase (MAPK) pathways influences translation efficiency in mammalian cells and budding yeast. We have investigated the stress-activated MAPK from fission yeast, Sty1, and its downstream protein kinase, Mkp1/Srk1, for physically associated proteins using tandem affinity purification tagging. We find Sty1, but not Mkp1, to bind to the translation elongation factor eukaryotic elongation factor 2 (eEF2) and the translation initiation factor eukaryotic initiation factor 3a (eIF3a). The Sty1-eIF3a interaction is weakened under oxidative or hyperosmotic stress, whereas the Sty1-eEF2 interaction is stable. Nitrogen deprivation causes a transient strengthening of both the Sty1-eEF2 and the Sty1-Mkp1 interactions, overlapping with the time of maximal Sty1 activation. Analysis of polysome profiles from cells under oxidative stress, or after hyperosmotic shock or nitrogen deprivation, shows that translation in sty1 mutant cells recovers considerably less efficiently than that in the wild type. Cells lacking the Sty1-regulated transcription factor Atf1 are deficient in maintaining and recovering translational activity after hyperosmotic shock but not during oxidative stress or nitrogen starvation. In cells lacking Sty1, eIF3a levels are decreased, and phosphorylation of eIF3a is reduced. Taken together, our data point to a central role in translational adaptation for the stress-activated MAPK pathway in fission yeast similar to that in other investigated eukaryotes, with the exception that fission yeast MAPK-activated protein kinases seem not to be directly involved in this process.
PMCID: PMC2238150  PMID: 18065650
19.  Database of Trypanosoma cruzi repeated genes: 20 000 additional gene variants 
BMC Genomics  2007;8:391.
Repeats are present in all genomes, and often have important functions. However, in large genome sequencing projects, many repetitive regions remain uncharacterized. The genome of the protozoan parasite Trypanosoma cruzi consists of more than 50% repeats. These repeats include surface molecule genes, and several other gene families. In the T. cruzi genome sequencing project, it was clear that not all copies of repetitive genes were present in the assembly, due to collapse of nearly identical repeats. However, at the time of publication of the T. cruzi genome, it was not clear to what extent this had occurred.
We have developed a pipeline to estimate the genomic repeat content, where shotgun reads are aligned to the genomic sequence and the gene copy number is estimated using the average shotgun coverage. This method was applied to the genome of T. cruzi and copy numbers of all protein coding sequences and pseudogenes were estimated. The 22 640 results were stored in a database available online. 18% of all protein coding sequences and pseudogenes were estimated to exist in 14 or more copies in the T. cruzi CL Brener genome. The average coverage of the annotated protein coding sequences and pseudogenes indicates a total gene copy number, including allelic gene variants, of over 40 000.
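The idea of estimating gene copy number from average shotgun coverage can be sketched as follows. This is a simplified illustration, not the published pipeline. The function name, the input format (a mapping from gene identifier to mean read depth) and the use of the median depth as the single-copy baseline when none is supplied are assumptions, and the median is only a reasonable baseline if most genes are single-copy.

```python
from statistics import median


def estimate_copy_numbers(gene_coverage, single_copy_coverage=None):
    """Estimate copy number per gene as mean read depth divided by a baseline.

    gene_coverage: dict mapping a gene identifier to its average read depth.
    single_copy_coverage: depth expected for a single-copy gene. If omitted,
        the median depth across all genes is used (an assumption).
    """
    if not gene_coverage:
        raise ValueError("gene_coverage must not be empty")
    baseline = (
        single_copy_coverage
        if single_copy_coverage is not None
        else median(gene_coverage.values())
    )
    if baseline <= 0:
        raise ValueError("baseline coverage must be positive")
    return {gene: depth / baseline for gene, depth in gene_coverage.items()}


# Hypothetical depths, not values from the T. cruzi data set.
example = estimate_copy_numbers(
    {"gene_a": 10.0, "gene_b": 140.0}, single_copy_coverage=10.0
)
```

In this toy input, a gene covered at fourteen times the single-copy depth is estimated to be present in fourteen copies. A real analysis would also need to handle reads that align to several near-identical copies.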
Our results indicate that the number of protein coding sequences and pseudogenes in the T. cruzi genome may be twice the previous estimate. We have constructed a database of the T. cruzi gene repeat data that is available as a resource to the community. The main purpose of the database is to enable biologists interested in repeated, unfinished regions to closely examine and resolve these regions themselves using all available shotgun data, instead of having to rely on annotated consensus sequences that often are erroneous and possibly misleading. Five repetitive genes were studied in more detail, in order to illustrate how the database can be used to analyze and extract information about gene repeats with different characteristics in Trypanosoma cruzi.
PMCID: PMC2204015  PMID: 17963481
20.  Repetitive DNA is associated with centromeric domains in Trypanosoma brucei but not Trypanosoma cruzi 
Genome Biology  2007;8(3):R37.
Centromeres in Trypanosoma cruzi and Trypanosoma brucei can be localised to regions between directional gene clusters that contain degenerate retroelements, and in the case of T. brucei, repetitive DNA.
Trypanosomes are parasitic protozoa that diverged early from the main eukaryotic lineage. Their genomes display several unusual characteristics and, despite completion of the trypanosome genome projects, the location of centromeric DNA has not been identified.
We report evidence on the location and nature of centromeric DNA in Trypanosoma cruzi and Trypanosoma brucei. In T. cruzi, we used telomere-associated chromosome fragmentation and found that GC-rich transcriptional 'strand-switch' domains composed predominantly of degenerate retrotransposons are a shared feature of regions that confer mitotic stability. Consistent with this, etoposide-mediated topoisomerase-II cleavage, a biochemical marker for active centromeres, is concentrated at these domains. In the 'megabase-sized' chromosomes of T. brucei, topoisomerase-II activity is also focused at single loci that encompass regions between directional gene clusters that contain transposable elements. Unlike T. cruzi, however, these loci also contain arrays of AT-rich repeats stretching over several kilobases. The sites of topoisomerase-II activity on T. brucei chromosome 1 and T. cruzi chromosome 3 are syntenic, suggesting that centromere location has been conserved for more than 200 million years. The T. brucei intermediate and minichromosomes, which lack housekeeping genes, do not exhibit site-specific accumulation of topoisomerase-II, suggesting that segregation of these atypical chromosomes might involve a centromere-independent mechanism.
The localization of centromeric DNA in trypanosomes fills a major gap in our understanding of genome organization in these important human pathogens. These data are a significant step towards identifying and functionally characterizing other determinants of centromere function and provide a framework for dissecting the mechanisms of chromosome segregation.
PMCID: PMC1868937  PMID: 17352808

