1.  Chromerid genomes reveal the evolutionary path from photosynthetic algae to obligate intracellular parasites 
eLife  2015;4:e06974.
The eukaryotic phylum Apicomplexa encompasses thousands of obligate intracellular parasites of humans and animals with immense socio-economic and health impacts. We sequenced nuclear genomes of Chromera velia and Vitrella brassicaformis, free-living non-parasitic photosynthetic algae closely related to apicomplexans. Proteins from key metabolic pathways and from the endomembrane trafficking systems associated with a free-living lifestyle have been progressively and non-randomly lost during adaptation to parasitism. The free-living ancestor contained a broad repertoire of genes many of which were repurposed for parasitic processes, such as extracellular proteins, components of a motility apparatus, and DNA- and RNA-binding protein families. Based on transcriptome analyses across 36 environmental conditions, Chromera orthologs of apicomplexan invasion-related motility genes were co-regulated with genes encoding the flagellar apparatus, supporting the functional contribution of flagella to the evolution of invasion machinery. This study provides insights into how obligate parasites with diverse life strategies arose from a once free-living phototrophic marine alga.
eLife digest
Single-celled parasites cause many severe diseases in humans and animals. The apicomplexans form probably the most successful group of these parasites and include the parasites that cause malaria. Apicomplexans infect a broad range of hosts, including humans, reptiles, birds, and insects, and often have complicated life cycles. For example, the malaria-causing parasites spread by moving from humans to female mosquitoes and then back to humans.
Despite significant differences amongst apicomplexans, these single-celled parasites also share a number of features that are not seen in other living species. How and when these features arose remains unclear. It is known from previous work that apicomplexans are closely related to single-celled algae. But unlike apicomplexans, which depend on a host animal to survive, these algae live freely in their environment, often in close association with corals.
Woo et al. have now sequenced the genomes of two photosynthetic algae that are thought to be close living relatives of the apicomplexans. These genomes were then compared to each other and to the genomes of other algae and apicomplexans. These comparisons reconfirmed that the two algae that were studied were close relatives of the apicomplexans.
Further analyses suggested that thousands of genes were lost as an ancient free-living alga evolved into the apicomplexan ancestor, and further losses occurred as these early parasites evolved into modern species. The lost genes were typically those that are important for free-living organisms, but are either a hindrance to, or not needed in, a parasitic lifestyle. Some of the ancestor's genes, especially those that coded for the building blocks of flagella (structures which free-living algae use to move around), were repurposed in ways that helped the apicomplexans to invade their hosts. Understanding this repurposing process in greater detail will help to identify key molecules in these deadly parasites that could be targeted by drug treatments. It will also offer answers to one of the most fascinating questions in evolutionary biology: how parasites have evolved from free-living organisms.
PMCID: PMC4501334  PMID: 26175406
Chromera velia; Vitrella brassicaformis; evolution of parasitism; malaria; toxoplasmosis; other
2.  glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data 
Cell Regeneration  2014;3(1):1.
Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors that can quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data and support for the efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible and multifunctional toolkit that allows the combination and analysis of high-throughput data (especially next-generation sequencing and genome-wide data), and which has been instrumental in the analysis of complex data sets. glbase is freely available at
PMCID: PMC4230833  PMID: 25408880
ChIP-seq; RNA-seq; Genomics; Microarray; Motifs; Transcription factor; Bioinformatics
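The glbase abstract above mentions intersecting two lists of genomic interval data. As a rough illustration of that kind of operation, here is a minimal pure-Python sketch. It does not use or reproduce the glbase API: the `intersect` function, the `(chromosome, start, end)` tuple layout, the half-open coordinate convention and the toy data are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed layout: (chromosome, start, end) with half-open coordinates [start, end).
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segments between two interval lists."""
    # Group the second list by chromosome so only same-chromosome pairs are compared.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits: List[Interval] = []
    for chrom, start, end in a:
        for b_start, b_end in by_chrom.get(chrom, []):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # a non-empty overlap exists
                hits.append((chrom, lo, hi))
    return sorted(hits)


# Hypothetical toy data: three "peaks" and two "gene" intervals.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
genes = [("chr1", 150, 550), ("chr2", 90, 120)]
overlaps = intersect(peaks, genes)
print(overlaps)
```

This pairwise scan is quadratic per chromosome, which is acceptable for small lists; large genomic files would normally call for a sorted sweep or an interval tree instead.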
3.  Cellular network entropy as the energy potential in Waddington's differentiation landscape 
Scientific Reports  2013;3:3039.
Differentiation is a key cellular process in normal tissue development that is significantly altered in cancer. Although molecular signatures characterising pluripotency and multipotency exist, there is, as yet, no single quantitative mark of a cellular sample's position in the global differentiation hierarchy. Here we adopt a systems view and consider the sample's network entropy, a measure of signaling pathway promiscuity, computable from a sample's genome-wide expression profile. We demonstrate that network entropy provides a quantitative, in-silico, readout of the average undifferentiated state of the profiled cells, recapitulating the known hierarchy of pluripotent, multipotent and differentiated cell types. Network entropy further exhibits dynamic changes in time course differentiation data, and in line with a sample's differentiation stage. In disease, network entropy predicts a higher level of cellular plasticity in cancer stem cell populations compared to ordinary cancer cells. Importantly, network entropy also allows identification of key differentiation pathways. Our results are consistent with the view that pluripotency is a statistical property defined at the cellular population level, correlating with intra-sample heterogeneity, and driven by the degree of signaling promiscuity in cells. In summary, network entropy provides a quantitative measure of a cell's undifferentiated state, defining its elevation in Waddington's landscape.
PMCID: PMC3807110  PMID: 24154593
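The abstract above describes network entropy as a measure of signaling promiscuity computed from an expression profile, but it does not give the formula. The sketch below is therefore only one plausible simplification, not the authors' definition: each node distributes probability over its neighbours in proportion to their expression, the Shannon entropy of that distribution is normalised by its maximum, and the network value is the plain average over nodes. The function names, the toy triangle network and the expression values are hypothetical.

```python
import math
from typing import Dict, List


def local_entropy(node: str,
                  neighbours: Dict[str, List[str]],
                  expression: Dict[str, float]) -> float:
    """Normalised Shannon entropy of a node's expression-weighted neighbour distribution."""
    weights = [expression[n] for n in neighbours[node]]
    total = sum(weights)
    if total == 0 or len(weights) < 2:
        return 0.0  # entropy is undefined or trivially zero in these cases
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(weights))  # scale to the range [0, 1]


def network_entropy(neighbours: Dict[str, List[str]],
                    expression: Dict[str, float]) -> float:
    """Average of the local entropies over all nodes (an assumed aggregation)."""
    values = [local_entropy(n, neighbours, expression) for n in neighbours]
    return sum(values) / len(values)


# Hypothetical fully connected three-protein network.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
uniform = network_entropy(triangle, {"A": 1.0, "B": 1.0, "C": 1.0})
skewed = network_entropy(triangle, {"A": 1.0, "B": 100.0, "C": 1.0})
print(uniform, skewed)
```

In this toy case the uniform profile gives the maximal value, while a profile dominated by one protein concentrates signaling on fewer partners and lowers the score, which matches the qualitative picture of promiscuity in the abstract.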
4.  Systematic identification of transcriptional regulatory modules from protein–protein interaction networks 
Nucleic Acids Research  2013;42(1):e6.
Transcription factors (TFs) combine with co-factors to form transcriptional regulatory modules (TRMs) that regulate gene expression programs with spatiotemporal specificity. Here we present a novel and generic method (rTRM) for the reconstruction of TRMs that integrates genomic information from TF binding, cell type-specific gene expression and protein–protein interactions. rTRM was applied to reconstruct the TRMs specific for embryonic stem cells (ESC) and hematopoietic stem cells (HSC), neural progenitor cells, trophoblast stem cells and distinct types of terminally differentiated CD4+ T cells. The ESC and HSC TRM predictions were highly precise, yielding 77 and 96 proteins, of which ∼75% have been independently shown to be involved in the regulation of these cell types. Furthermore, rTRM successfully identified a large number of bridging proteins with known roles in ESCs and HSCs, which could not have been identified using genomic approaches alone, as they lack the ability to bind specific DNA sequences. This highlights the advantage of rTRM over other methods that ignore PPI information, as proteins need to interact with other proteins to form complexes and perform specific functions. The prediction and experimental validation of the co-factors that endow master regulatory TFs with the capacity to select specific genomic sites, modulate the local epigenetic profile and integrate multiple signals will provide important mechanistic insights not only into how such TFs operate, but also into abnormal transcriptional states leading to disease.
PMCID: PMC3874207  PMID: 24137002
5.  The IL-10/STAT3-mediated anti-inflammatory response: recent developments and future challenges 
Briefings in Functional Genomics  2013;12(6):489-498.
Inflammation is a fundamental response of the immune system whose successful termination involves the elimination of the invading pathogens, the resolution of inflammation and the repair of the local damaged tissue. In this context, the interleukin 10 (IL-10)-mediated anti-inflammatory response (AIR) represents an essential homeostatic mechanism that controls the degree and duration of inflammation. Here, we review recent work on the mechanistic characterization of the IL-10-mediated AIR on multiple levels: from the cataloguing of the in vivo genomic targets of STAT3 (the transcription factor downstream of IL-10) to the identification of specific co-factors that endow STAT3 with genomic-binding specificity, and how genomic and computational methods are being used to elucidate the regulatory mechanisms of this essential physiological response in macrophages.
PMCID: PMC3838198  PMID: 23943603
IL-10; JAK1; STAT3; anti-inflammatory response; macrophages; transcriptional regulatory modules; bioinformatics
6.  Hard-wired heterogeneity in blood stem cells revealed using a dynamic regulatory network model 
Bioinformatics  2013;29(13):i80-i88.
Motivation: Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements, therefore, has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here, we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes.
Results: Despite its apparent simplicity, the model exhibits surprisingly complex behaviour that we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis of our model predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as ‘stepping stones’ for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or ‘trigger’ is required to exit the stem cell state, with distinct triggers characterizing maturation into the various different lineages. By focusing on intermediate states occurring during erythrocyte differentiation, from our model we predicted a novel negative regulation of Fli1 by Gata1, which we confirmed experimentally thus validating our model. In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions has allowed us to make novel, experimentally testable hypotheses about transcriptional mechanisms that control differentiation of mammalian stem cells.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3694641  PMID: 23813012
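To make the idea of a Boolean state space in the abstract above more concrete, the following sketch enumerates the states of a tiny Boolean network, finds its steady states and uses breadth-first search to get a shortest path between two states. Everything specific here is an assumption for illustration: the three genes and their update rules are invented (they are not the 11 HSPC regulators of the paper), asynchronous single-gene updates are an assumed scheme, and the strongly connected component analysis mentioned in the abstract is not implemented.

```python
from collections import deque
from itertools import product
from typing import List, Optional, Tuple

State = Tuple[int, ...]

# Hypothetical update rules for three made-up genes (a, b, c).
RULES = (
    lambda s: s[0] and not s[2],  # a stays on unless c is on
    lambda s: s[0],               # b follows a
    lambda s: s[1],               # c follows b
)


def successors(state: State) -> List[State]:
    """States reachable by updating exactly one gene (asynchronous update)."""
    out = []
    for i, rule in enumerate(RULES):
        new = int(bool(rule(state)))
        if new != state[i]:
            out.append(state[:i] + (new,) + state[i + 1:])
    return out


def steady_states() -> List[State]:
    """States in which no rule changes any gene."""
    return [s for s in product((0, 1), repeat=len(RULES)) if not successors(s)]


def shortest_path(start: State, goal: State) -> Optional[List[State]]:
    """Breadth-first search through the state transition graph."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


fixed = steady_states()
path = shortest_path((1, 0, 0), (0, 0, 0))
print(fixed)
print(path)
```

The intermediate states on the returned path play the role of the "stepping stones" described in the abstract, although in this toy network there is only a single route to the steady state.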
7.  Genomic and computational approaches to dissect the mechanisms of STAT3’s universal and cell type-specific functions 
JAK-STAT  2013;2(4):e25097.
STAT3 is the quintessential pleiotropic transcription factor with many biological roles throughout development as well as in multiple adult tissues. Its functional heterogeneity is encoded in the range of genome-wide binding patterns that specify different regulatory networks in distinct cell types. However, STAT3 does not display remarkable DNA binding preferences that may help correlate specific motifs with individual biological functions or cell types. Therefore, achieving a detailed understanding of the regulatory mechanisms that endow STAT3 (or any other pleiotropic transcription factor) with such a rainbow of functions is not only a central problem in biology but also a fiendishly difficult one. Here we describe key genomic and computational approaches that have shed light into this question, and present the two current models of STAT3 binding (universal and cell type-specific). We also discuss the role that the local epigenetic environment plays in the selection of STAT3 binding sites.
PMCID: PMC3876425  PMID: 24416643
STAT3; JAK-STAT; ChIP-seq; genomics; bioinformatics; pleiotropy; transcriptional regulatory modules; cancer; inflammation
8.  The Repertoires of Ubiquitinating and Deubiquitinating Enzymes in Eukaryotic Genomes 
Molecular Biology and Evolution  2013;30(5):1172-1187.
Reversible protein ubiquitination regulates virtually all known cellular activities. Here, we present a quantitatively evaluated and broadly applicable method to predict eukaryotic ubiquitinating enzymes (UBE) and deubiquitinating enzymes (DUB) and its application to 50 distinct genomes belonging to four of the five major phylogenetic supergroups of eukaryotes: unikonts (including metazoans, fungi, choanozoa, and amoebozoa), excavates, chromalveolates, and plants. Our method relies on a collection of profile hidden Markov models, and we demonstrate its superior performance (coverage and classification accuracy >99%) by identifying approximately 25% and approximately 35% additional UBE and DUB genes in yeast and human, which had not been reported before. In yeast, we predict 85 UBE and 24 DUB genes, compared with 814 UBE and 107 DUB genes in the human genome. Most UBE and DUB families are present in all eukaryotic lineages, with plants and animals harboring massively enlarged repertoires of ubiquitin ligases. Unicellular organisms, on the other hand, typically harbor fewer than 300 UBEs and fewer than 40 DUBs per genome. Ninety-one UBE/DUB genes are orthologous across all four eukaryotic supergroups, and these likely represent a primordial core of enzymes of the ubiquitination system probably dating back to the first eukaryotes approximately 2 billion years ago. Our genome-wide predictions are available through the Database of Ubiquitinating and Deubiquitinating Enzymes, where users can also perform advanced sequence and phylogenetic analyses and submit their own predictions.
PMCID: PMC3670738  PMID: 23393154
ubiquitination; functional prediction; profile hidden Markov model
9.  Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling 
Genome Biology  2013;14(2):R11.
The Amoebozoa constitute one of the primary divisions of eukaryotes, encompassing taxa of both biomedical and evolutionary importance, yet its genomic diversity remains largely unsampled. Here we present an analysis of a whole genome assembly of Acanthamoeba castellanii (Ac), the first representative from a solitary free-living amoebozoan.
Ac encodes 15,455 compact intron-rich genes, a significant number of which are predicted to have arisen through inter-kingdom lateral gene transfer (LGT). A majority of the LGT candidates have undergone a substantial degree of intronization and Ac appears to have incorporated them into established transcriptional programs. Ac manifests a complex signaling and cell communication repertoire, including a complete tyrosine kinase signaling toolkit and a comparable diversity of predicted extracellular receptors to that found in the facultatively multicellular dictyostelids. An important environmental host of a diverse range of bacteria and viruses, Ac utilizes a diverse repertoire of predicted pattern recognition receptors, many with predicted orthologous functions in the innate immune systems of higher organisms.
Our analysis highlights the important role of LGT in the biology of Ac and in the diversification of microbial eukaryotes. The early evolution of a key signaling facility implicated in the evolution of metazoan multicellularity strongly argues for its emergence early in the Unikont lineage. Overall, the availability of an Ac genome should aid in deciphering the biology of the Amoebozoa and facilitate functional genomic studies in this important model organism and environmental host.
PMCID: PMC4053784  PMID: 23375108
10.  Distinct transcriptional regulatory modules underlie STAT3’s cell type-independent and cell type-specific functions 
Nucleic Acids Research  2013;41(4):2155-2170.
Transcription factors (TFs) regulate gene expression by binding to short DNA sequence motifs, yet their binding specificities alone cannot explain how certain TFs drive a diversity of biological processes. In order to investigate the factors that control the functions of the pleiotropic TF STAT3, we studied its genome-wide binding patterns in four different cell types: embryonic stem cells, CD4+ T cells, macrophages and AtT-20 cells. We describe for the first time two distinct modes of STAT3 binding. First, a small cell type-independent mode represented by a set of 35 evolutionarily conserved STAT3-binding sites that collectively regulate STAT3’s own functions and cell growth. We show that STAT3 is recruited to sites with E2F1 already pre-bound before STAT3 activation. Second, a series of different transcriptional regulatory modules (TRMs) assemble around STAT3 to drive distinct transcriptional programs in the four cell types. These modules recognize cell type-specific binding sites and are associated with factors particular to each cell type. Our study illustrates the versatility of STAT3 to regulate both universal- and cell type-specific functions by means of distinct TRMs, a mechanism that might be common to other pleiotropic TFs.
PMCID: PMC3575808  PMID: 23295670
11.  Genome-wide enhancer prediction from epigenetic signatures using genetic algorithm-optimized support vector machines 
Nucleic Acids Research  2012;40(10):e77.
The chemical modification of histones at specific DNA regulatory elements is linked to the activation, inactivation and poising of genes. A number of tools exist to predict enhancers from chromatin modification maps, but their practical application is limited because they either (i) consider a smaller number of marks than those necessary to define the various enhancer classes or (ii) work with an excessive number of marks, which is experimentally unviable. We have developed a method for chromatin state detection using support vector machines in combination with genetic algorithm optimization, called ChromaGenSVM. ChromaGenSVM selects optimum combinations of specific histone epigenetic marks to predict enhancers. In an independent test, ChromaGenSVM recovered 88% of the experimentally supported enhancers in the pilot ENCODE region of interferon gamma-treated HeLa cells. Furthermore, ChromaGenSVM successfully combined the profiles of only five distinct methylation and acetylation marks from ChIP-seq libraries done in human CD4+ T cells to predict ∼21 000 experimentally supported enhancers within 1.0 kb regions and with a precision of ∼90%, thereby improving previous predictions on the same dataset by 21%. The combined results indicate that ChromaGenSVM comfortably outperforms previously published methods and that enhancers are best predicted by specific combinations of histone methylation and acetylation marks.
PMCID: PMC3378905  PMID: 22328731
12.  Deep Evolutionary Conservation of an Intramolecular Protein Kinase Activation Mechanism 
PLoS ONE  2012;7(1):e29702.
DYRK-family kinases employ an intramolecular mechanism to autophosphorylate a critical tyrosine residue in the activation loop. Once phosphorylated, DYRKs lose tyrosine kinase activity and function as serine/threonine kinases. DYRKs have been characterized in organisms from yeast to human; however, all entities belong to the Unikont supergroup, only one of five eukaryotic supergroups. To assess the evolutionary age and conservation of the DYRK intramolecular kinase-activation mechanism, we surveyed 21 genomes representing four of the five eukaryotic supergroups for the presence of DYRKs. We also analyzed the activation mechanism of the sole DYRK (class 2 DYRK) present in Trypanosoma brucei (TbDYRK2), a member of the excavate supergroup and separated from Drosophila by ∼850 million years. Bioinformatics showed the DYRKs clustering into five known subfamilies, class 1, class 2, Yaks, HIPKs and Prp4s. Only class 2 DYRKs were present in all four supergroups. These diverse class 2 DYRKs also exhibited conservation of N-terminal NAPA regions located outside of the kinase domain, and were shown to have an essential role in activation loop autophosphorylation of Drosophila DmDYRK2. Class 2 TbDYRK2 required the activation loop tyrosine conserved in other DYRKs, the NAPA regions were critical for this autophosphorylation event, and the NAPA-regions of Trypanosoma and human DYRK2 complemented autophosphorylation by the kinase domain of DmDYRK2 in trans. Finally, sequential deletion analysis was used to further define the minimal region required for trans-complementation. Our analysis provides strong evidence that class 2 DYRKs were present in the primordial or root eukaryote, and suggests this subgroup may be the oldest, founding member of the DYRK family. The conservation of activation loop autophosphorylation demonstrates that kinase self-activation mechanisms are also primitive.
PMCID: PMC3250476  PMID: 22235329
13.  Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium 
Nature  2010;464(7287):367-373.
Fusarium species are among the most important phytopathogenic and toxigenic fungi. To understand the molecular underpinnings of pathogenicity in the genus Fusarium, we compared the genomes of three phenotypically diverse species: Fusarium graminearum, Fusarium verticillioides and Fusarium oxysporum f. sp. lycopersici. Our analysis revealed lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes and account for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity, indicative of horizontal acquisition. Experimentally, we demonstrate the transfer of two LS chromosomes between strains of F. oxysporum, converting a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in F. oxysporum. These findings put the evolution of fungal pathogenicity into a new perspective.
PMCID: PMC3048781  PMID: 20237561
14.  Motif-blind, genome-wide discovery of cis-regulatory modules in Drosophila and mouse 
Developmental cell  2009;17(4):568-579.
We present new approaches to cis-regulatory module (CRM) discovery in the common scenario where relevant transcription factors and/or motifs are unknown. Beginning with a small list of CRMs mediating a common gene expression pattern, we search genome-wide for CRMs with similar functionality, using new statistical scores, and without requiring known motifs or accurate motif discovery. We cross-validate our predictions on 31 regulatory networks in Drosophila and through correlations with gene expression data. Five predicted modules tested using an in vivo reporter gene assay all show tissue-specific regulatory activity. We also demonstrate our methods’ ability to predict mammalian tissue-specific enhancers. Finally, we predict human CRMs that regulate early blood and cardiovascular development. In vivo transgenic mouse analysis of two predicted CRMs demonstrates that both have appropriate enhancer activity. Overall, 7/7 predictions were validated successfully in vivo, demonstrating the effectiveness of our approach for insect and mammalian genomes.
PMCID: PMC2768654  PMID: 19853570
15.  Genomic Analysis of the Basal Lineage Fungus Rhizopus oryzae Reveals a Whole-Genome Duplication 
PLoS Genetics  2009;5(7):e1000549.
Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called “zygomycetes,” R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99–880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin–proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14α-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments.
Author Summary
Rhizopus oryzae is a widely dispersed fungus that can cause fatal infections in people with suppressed immune systems, especially diabetics or organ transplant recipients. Antibiotic therapy alone is rarely curative, particularly in patients with disseminated infection. We sequenced the genome of a pathogenic R. oryzae strain and found evidence that the entire genome had been duplicated at some point in its evolution and retained two copies of three extremely sophisticated systems involved in energy generation and utilization. The ancient whole-genome duplication, together with recent gene duplications, has led to the expansion of gene families related to pathogen virulence, fungal-specific cell wall synthesis, and signal transduction, which may contribute to the aggressive and frequently life-threatening growth of this organism. We also identified cell wall synthesis enzymes, essential for fungal cell integrity but absent in mammals, which may present potential targets for developing novel diagnostic and therapeutic treatments. R. oryzae represents the first sequenced fungus from the early lineages of the fungal phylogenetic tree, and thus the genome sequence sheds light on the evolution of the entire fungal kingdom.
PMCID: PMC2699053  PMID: 19578406
16.  A HaemAtlas: characterizing gene expression in differentiated human blood cells 
Blood  2009;113(19):e1-e9.
Hematopoiesis is a carefully controlled process that is regulated by complex networks of transcription factors that are, in part, controlled by signals resulting from ligand binding to cell-surface receptors. To further understand hematopoiesis, we have compared gene expression profiles of human erythroblasts, megakaryocytes, B cells, cytotoxic and helper T cells, natural killer cells, granulocytes, and monocytes using whole genome microarrays. A bioinformatics analysis of these data was performed focusing on transcription factors, immunoglobulin superfamily members, and lineage-specific transcripts. We observed that the number of lineage-specific genes varies by 2 orders of magnitude, ranging from 5 for cytotoxic T cells to 878 for granulocytes. In addition, we have identified novel coexpression patterns for key transcription factors involved in hematopoiesis (e.g., GATA3-GFI1 and GATA2-KLF1). This study represents the most comprehensive analysis of gene expression in hematopoietic cells to date and has identified genes that play key roles in lineage commitment and cell function. The data, which are freely accessible, will be invaluable for future studies on hematopoiesis and the role of specific genes and will also aid the understanding of the recent genome-wide association studies.
PMCID: PMC2680378  PMID: 19228925
17.  The Phosphoproteome of Bloodstream Form Trypanosoma brucei, Causative Agent of African Sleeping Sickness 
The protozoan parasite Trypanosoma brucei is the causative agent of human African sleeping sickness and related animal diseases, and it has over 170 predicted protein kinases. Protein phosphorylation is a key regulatory mechanism for cellular function that, thus far, has been studied in T. brucei principally through putative kinase mRNA knockdown and observation of the resulting phenotype. However, despite the relatively large kinome of this organism and the demonstrated essentiality of several T. brucei kinases, very few specific phosphorylation sites have been determined in this organism. Using a gel-free, phosphopeptide enrichment-based proteomics approach we performed the first large scale phosphorylation site analyses for T. brucei. Serine, threonine, and tyrosine phosphorylation sites were determined for a cytosolic protein fraction of the bloodstream form of the parasite, resulting in the identification of 491 phosphoproteins based on the identification of 852 unique phosphopeptides and 1204 phosphorylation sites. The phosphoproteins detected in this study are predicted from their genome annotations to participate in a wide variety of biological processes, including signal transduction, processing of DNA and RNA, protein synthesis, and degradation and to a minor extent in metabolic pathways. The analysis of phosphopeptides and phosphorylation sites was facilitated by in-house developed software, and this automated approach was validated by manual annotation of spectra of the kinase subset of proteins. Analysis of the cytosolic bloodstream form T. brucei kinome revealed the presence of 44 phosphorylated protein kinases in our data set that could be classified into the major eukaryotic protein kinase groups by applying a multilevel hidden Markov model library of the kinase catalytic domain. Identification of the kinase phosphorylation sites showed conserved phosphorylation sequence motifs in several kinase activation segments, supporting the view that phosphorylation-based signaling is a general and fundamental regulatory process that extends to this highly divergent lower eukaryote.
PMCID: PMC2716717  PMID: 19346560
18.  Draft Genome of the Filarial Nematode Parasite Brugia malayi 
Science (New York, N.Y.)  2007;317(5845):1756-1760.
Parasitic nematodes that cause elephantiasis and river blindness threaten hundreds of millions of people in the developing world. We have sequenced the ~90 megabase (Mb) genome of the human filarial parasite Brugia malayi and predict ~11,500 protein coding genes in 71 Mb of robustly assembled sequence. Comparative analysis with the free-living, model nematode Caenorhabditis elegans revealed that, despite these genes having maintained little conservation of local synteny during ~350 million years of evolution, they largely remain in linkage on chromosomal units. More than 100 conserved operons were identified. Analysis of the predicted proteome provides evidence for adaptations of B. malayi to niches in its human and vector hosts and insights into the molecular basis of a mutualistic relationship with its Wolbachia endosymbiont. These findings offer a foundation for rational drug design.
PMCID: PMC2613796  PMID: 17885136
19.  BloodExpress: a database of gene expression in mouse haematopoiesis 
Nucleic Acids Research  2008;37(Database issue):D873-D879.
Haematopoiesis is the process whereby blood stem cells give rise to at least fourteen functionally distinct mature cell types, and represents the best characterized mammalian adult stem cell system. Here we introduce the BloodExpress database, the first public resource integrating mouse blood cell expression profiles. BloodExpress enables the searching of data from individual studies in a single database accessible through a user-friendly web interface. Microarray datasets have been processed uniformly to allow their comparison on the BloodExpress platform. BloodExpress covers the majority of murine blood cell types, including both progenitors and terminally differentiated cells. This allows for the identification of dynamic changes in gene expression as cells differentiate down the well-defined haematopoietic hierarchy. A gene-centric interface returns haematopoietic expression patterns together with functional annotation and a list of other genes with similar expression patterns. A cell type-centric interface allows the identification of genes expressed at specific points of blood development, with the additional and useful capability of filtering by specific gene functional categories. BloodExpress thus constitutes a platform for the discovery of novel gene functions across the haematopoietic tree. BloodExpress is freely accessible at
PMCID: PMC2686428  PMID: 18987008
20.  Kinomer v. 1.0: a database of systematically classified eukaryotic protein kinases 
Nucleic Acids Research  2008;37(Database issue):D244-D250.
The regulation of protein function through reversible phosphorylation by protein kinases and phosphatases is a general mechanism controlling virtually every cellular activity. Eukaryotic protein kinases can be classified into distinct, well-characterized groups based on amino acid sequence similarity and function. We recently reported a highly sensitive and accurate hidden Markov model-based method for the automatic detection and classification of protein kinases into these specific groups. The Kinomer v. 1.0 database presented here contains annotated classifications for the protein kinase complements of 43 eukaryotic genomes. These span the taxonomic range and include fungi (16 species), plants (6), diatoms (1), amoebas (2), protists (1) and animals (17). The kinomes are stored in a relational database and are accessible through a web interface on the basis of species, kinase group or a combination of both. In addition, the Kinomer v. 1.0 HMM library is made available for users to perform classification on arbitrary sequences. The Kinomer v. 1.0 database is a continually updated resource where direct comparison of kinase sequences across kinase groups and across species can give insights into kinase function and evolution. Kinomer v. 1.0 is available at
PMCID: PMC2686601  PMID: 18974176
21.  Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis 
Science (New York, N.Y.)  2007;315(5809):207-212.
We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated in pathogenesis and phagocytosis of host proteins may exemplify adaptations of the parasite during its transition to a urogenital environment. The genome sequence predicts previously unknown functions for the hydrogenosome, which support a common evolutionary origin of this unusual organelle with mitochondria.
PMCID: PMC2080659  PMID: 17218520
22.  The complement of protein kinases of the microsporidium Encephalitozoon cuniculi in relation to those of Saccharomyces cerevisiae and Schizosaccharomyces pombe 
BMC Genomics  2007;8:309.
Microsporidia, parasitic fungi-related eukaryotes infecting many cell types in a wide range of animals (including humans), represent a serious health threat in immunocompromised patients. The 2.9 Mb genome of the microsporidium Encephalitozoon cuniculi is the smallest known of any eukaryote. Eukaryotic protein kinases are a large superfamily of enzymes with crucial roles in most cellular processes, and therefore represent potential drug targets. We report here an exhaustive analysis of the E. cuniculi genomic database aimed at identifying and classifying all protein kinases of this organism with reference to the kinomes of two highly-divergent yeast species, Saccharomyces cerevisiae and Schizosaccharomyces pombe.
A database search with a multi-level protein kinase family hidden Markov model library led to the identification of 29 conventional protein kinase sequences in the E. cuniculi genome, as well as 3 genes encoding atypical protein kinases. The microsporidian kinome presents striking differences from those of other eukaryotes, and this minimal kinome underscores the importance of conserved protein kinases involved in essential cellular processes. ~30% of its kinases are predicted to regulate cell cycle progression while another ~28% have no identifiable homologues in model eukaryotes and are likely to reflect parasitic adaptations. E. cuniculi lacks MAP kinase cascades and almost all protein kinases that are involved in stress responses, ion homeostasis and nutrient signalling in the model fungi S. cerevisiae and S. pombe, including AMP-activated protein kinase (Snf1), previously thought to be ubiquitous in eukaryotes. A detailed database search and phylogenetic analysis of the kinomes of the two model fungi showed that the degree of homology between their kinomes, ~85%, is much higher than previously reported.
The E. cuniculi kinome is by far the smallest eukaryotic kinome characterised to date. The difficulty in assigning clear homology relationships for nine out of the twenty-nine microsporidian conventional protein kinases despite its compact genome reflects the phylogenetic distance between microsporidia and other eukaryotes. Indeed, the E. cuniculi genome presents a high proportion of genes in which evolution has been accelerated by up to four-fold. There are no orthologues of the protein kinases that constitute MAP kinase pathways, and many other protein kinases with roles in nutrient signalling are absent from the E. cuniculi kinome. However, orthologous kinases can nonetheless be identified that correspond to members of the yeast kinomes with roles in some of the most fundamental cellular processes. For example, E. cuniculi has clear orthologues of virtually all the major conserved protein kinases that regulate the core cell cycle machinery (Aurora, Polo, DDK, CDK and Chk1). A comprehensive comparison of the homology relationships between the budding and fission yeast kinomes indicates that, despite an estimated 800 million years of independent evolution, the two model fungi share ~85% of their protein kinases. This will facilitate the annotation of many of the as yet uncharacterised fission yeast kinases, and also those of novel fungal genomes.
PMCID: PMC2078597  PMID: 17784954
23.  GOLD.db: genomics of lipid-associated disorders database 
BMC Genomics  2004;5:93.
The GOLD.db (Genomics of Lipid-Associated Disorders Database) was developed to address the need for integrating disparate information on the function and properties of genes and their products that are particularly relevant to the biology, diagnosis, management, treatment, and prevention of lipid-associated disorders.
The GOLD.db provides a reference for pathways and information about the relevant genes and proteins in an efficiently organized way. The main focus was to provide biological pathways with image maps and visual pathway information for lipid metabolism and obesity-related research. The database also makes it possible to map gene expression data individually to each pathway. Gene expression under different experimental conditions can be viewed sequentially in the context of the pathway. Related large-scale gene expression data sets were provided and can be searched for specific genes to integrate information regarding their expression levels in different studies and conditions. Analytic and data mining tools, reagents, protocols, references, and links to relevant genomic resources were included in the database. Finally, the usability of the database was demonstrated using an example about the regulation of Pten mRNA during adipocyte differentiation in the context of relevant pathways.
The GOLD.db will be a valuable tool that allows researchers to efficiently analyze patterns of gene expression and to display them in a variety of useful and informative ways, allowing outside researchers to perform queries pertaining to gene expression results in the context of biological processes and pathways.
PMCID: PMC544894  PMID: 15588328
24.  An essential Aurora-related kinase transiently associates with spindle pole bodies during Plasmodium falciparum erythrocytic schizogony 
Molecular Microbiology  2011;79(1):205-221.
Aurora kinases compose a family of conserved Ser/Thr protein kinases playing essential roles in eukaryotic cell division. To date, Aurora homologues remain uncharacterized in the protozoan phylum Apicomplexa. In malaria parasites, the characterization of Aurora kinases may help understand the cell cycle control during erythrocytic schizogony where asynchronous nuclear divisions occur. In this study, we revisited the kinome of Plasmodium falciparum and identified three Aurora-related kinases, Pfark-1, -2, -3. Among these, Pfark-1 is highly conserved in malaria parasites and also appears to be conserved across Apicomplexa. By tagging the endogenous Pfark-1 gene with the green fluorescent protein (GFP) in live parasites, we show that the Pfark-1–GFP protein forms paired dots associated with only a subset of nuclei within individual schizonts. Immunofluorescence analysis using an anti-α-tubulin antibody strongly suggests a recruitment of Pfark-1 at duplicated spindle pole bodies at the entry of the M phase of the cell cycle. Unsuccessful attempts at disrupting the Pfark-1 gene with a knockout construct further indicate that Pfark-1 is required for parasite growth in red blood cells. Our study provides new insights into the cell cycle control of malaria parasites and reports the importance of Aurora kinases as potential targets for new antimalarials.
PMCID: PMC3025120  PMID: 21166904
25.  Chemical Proteomic Analysis Reveals the Drugability of the Kinome of Trypanosoma brucei 
ACS Chemical Biology  2012;7(11):1858-1865.
The protozoan parasite Trypanosoma brucei is the causative agent of African sleeping sickness, and there is an urgent unmet need for improved treatments. Parasite protein kinases are attractive drug targets, provided that the host and parasite kinomes are sufficiently divergent to allow specific inhibition to be achieved. Current drug discovery efforts are hampered by the fact that comprehensive assay panels for parasite targets have not yet been developed. Here, we employ a kinase-focused chemoproteomics strategy that enables the simultaneous profiling of kinase inhibitor potencies against more than 50 endogenously expressed T. brucei kinases in parasite cell extracts. The data reveal that T. brucei kinases are sensitive to typical kinase inhibitors with nanomolar potency and demonstrate the potential for the development of species-specific inhibitors.
PMCID: PMC3621575  PMID: 22908928
