Results 1-25 (114)

1.  The genomes of four tapeworm species reveal adaptations to parasitism 
Nature  2013;496(7443):57-63.
Tapeworms cause debilitating neglected diseases that can be deadly and often require surgery due to ineffective drugs. Here we present the first analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115-141 megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes, but we find extreme losses of genes and pathways ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have species-specific expansions of non-canonical heat shock proteins and families of known antigens, specialised detoxification pathways, and metabolism finely tuned to rely on nutrients scavenged from their hosts. We identify new potential drug targets, including those on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control measures.
PMCID: PMC3964345  PMID: 23485966
HSP70; parasitism; Cestoda; cysticercosis; echinococcosis; Platyhelminthes
2.  Extensive Microbial and Functional Diversity within the Chicken Cecal Microbiome 
PLoS ONE  2014;9(3):e91941.
Chickens are a major source of food and protein worldwide. Feed conversion and the health of chickens rely on the largely unexplored, complex microbial community that inhabits the chicken gut, including the ceca. We have carried out deep microbial community profiling of the microbiota in twenty cecal samples via 16S rRNA gene sequences and an in-depth metagenomic analysis of a single cecal microbiota. We recovered 699 phylotypes, over half of which appear to represent previously unknown species. We obtained 648,251 environmental gene tags (EGTs), the majority of which represent new species. These were binned into over two dozen draft genomes, including Campylobacter jejuni and Helicobacter pullorum. We found numerous polysaccharide- and oligosaccharide-degrading enzymes encoded within the metagenome, some of which appeared to be part of polysaccharide utilization systems, with genetic evidence for the coordination of polysaccharide degradation with sugar transport and utilization. The cecal metagenome encodes several fermentation pathways leading to the production of short-chain fatty acids, including some with novel features. We found a dozen uptake hydrogenases encoded in the metagenome and speculate that these provide major hydrogen sinks within this microbial community and might explain the high abundance of several genera within this microbiome, including Campylobacter, Helicobacter and Megamonas.
PMCID: PMC3962364  PMID: 24657972
3.  The mobility of two kinase domains in the Escherichia coli chemoreceptor array varies with signaling state 
Molecular microbiology  2013;89(5):831-841.
Motile bacteria sense their physical and chemical environment through highly cooperative, ordered arrays of chemoreceptors. These signaling complexes phosphorylate a response regulator which in turn governs flagellar motor reversals, driving cells towards favorable environments. The structural changes that translate chemoeffector binding into the appropriate kinase output are not known. Here, we apply high-resolution electron cryotomography to visualize mutant chemoreceptor signaling arrays in well-defined kinase activity states. The arrays were well ordered in all signaling states, with no discernible differences in receptor conformation at 2-3 nm resolution. Differences were observed, however, in a keel-like density that we identify here as CheA kinase domains P1 and P2, which are the phosphorylation site domain and the binding domain for response regulator target proteins, respectively. Mutant receptor arrays with high kinase activities all exhibited small keels and high proteolysis susceptibility, indicative of mobile P1 and P2 domains. In contrast, arrays in kinase-off signaling states exhibited a range of keel sizes. These findings confirm that chemoreceptor arrays do not undergo large structural changes during signaling, and suggest instead that kinase activity is modulated at least in part by changes in the mobility of key domains.
PMCID: PMC3763515  PMID: 23802570
bacterial chemotaxis; signal transduction; electron cryotomography
4.  Mechanistic aspects of inducible nitric oxide synthase-induced lung injury in burn trauma
Although the beneficial effects of inducible nitric oxide synthase (iNOS) inhibition in acute lung injury secondary to cutaneous burn and smoke inhalation were previously demonstrated, the mechanistic aspects are not completely understood. The objective of the present study was to describe the mechanism(s) underlying these favourable effects. We hypothesised that iNOS inhibition prevents formation of excessive reactive nitrogen species and attenuates the activation of poly(ADP-ribose) polymerase (PARP), thus mitigating the severity of acute lung injury in sheep subjected to combined burn and smoke inhalation.
Adult ewes were chronically instrumented for a 24-h study and allocated to three groups: sham (not injured, not treated; n = 6), control (injured, not treated; n = 6) and BBS-2 (injured, treated with the iNOS dimerisation inhibitor BBS-2; n = 6). Control and BBS-2 groups received a 40% total body surface area 3rd-degree cutaneous burn and cotton smoke insufflation into the lungs under isoflurane anaesthesia.
Treatment with the iNOS inhibitor BBS-2 significantly improved pulmonary gas exchange (partial pressure of oxygen in the blood/fraction of inspired oxygen (PaO2/FiO2) 409 ± 43 mmHg vs. 233 ± 50 mmHg in controls, p < 0.05) and reduced airway pressures (peak pressure 20 ± 1 cm H2O vs. 28 ± 2 cm H2O in controls, p < 0.05) and lung water content (lung wet-to-dry ratio 4.1 ± 0.3 vs. 5.2 ± 0.2 in controls, p < 0.05) 24 h after the burn and smoke injury. BBS-2 significantly reduced the increases in lung lymph nitrite/nitrate (10 ± 3 μM vs. 26 ± 6 μM in controls, p < 0.05) and 3-nitrotyrosine (109 ± 11 (densitometry value) vs. 151 ± 18 in controls, p < 0.05). Burn/smoke-induced increases in lung tissue nitrite/nitrate, poly(ADP-ribose) polymerase, nuclear factor-κB (NF-κB) activity, myeloperoxidase activity, malondialdehyde formation and interleukin (IL)-8 expression were also attenuated with BBS-2.
The results provide strong evidence that BBS-2 ameliorated acute lung injury by inhibiting the inducible nitric oxide synthase/reactive nitrogen species/poly(ADP-ribose) polymerase (iNOS/RNS/PARP) pathway.
PMCID: PMC3936245  PMID: 21334141
Thermal injury; Reactive nitrogen species; Poly(ADP-ribose)
5.  Quantitative Genome-Wide Genetic Interaction Screens Reveal Global Epistatic Relationships of Protein Complexes in Escherichia coli 
PLoS Genetics  2014;10(2):e1004120.
Large-scale proteomic analyses in Escherichia coli have documented the composition and physical relationships of multiprotein complexes, but not their functional organization into biological pathways and processes. Conversely, genetic interaction (GI) screens can provide insights into the biological role(s) of individual genes and higher-order associations. Combining the information from both approaches should elucidate how complexes and pathways intersect functionally at a systems level. However, such integrative analysis has been hindered by the lack of relevant GI data. Here we present a systematic, unbiased, and quantitative synthetic genetic array screen in E. coli describing the genetic dependencies and functional cross-talk among over 600,000 digenic mutant combinations. Combining this epistasis information with putative functional modules derived from previous proteomic data and genomic context-based methods revealed unexpected associations, including new components required for iron-sulphur cluster biogenesis and ribosome integrity, and the interplay between molecular chaperones and proteases. We find that functionally linked genes co-conserved among γ-proteobacteria are far more likely to have correlated GI profiles than genes with divergent patterns of evolution. Overall, examining bacterial GIs in the context of protein complexes provides avenues for a deeper mechanistic understanding of core microbial systems.
Author Summary
Genome-wide genetic interaction (GI) screens have been performed in yeast, but no analogous large-scale studies have yet been reported for bacteria. Here, we have used E. coli synthetic genetic array (eSGA) technology developed by our group to quantitatively map GIs, revealing epistatic dependencies and functional cross-talk among ∼600,000 digenic mutant combinations. By combining this epistasis information with functional modules derived in our group's earlier work from proteomic and genomic context (GC)-based methods, we identify several unexpected pathway-level dependencies, functional links between protein complexes, and biological roles of uncharacterized bacterial gene products. As part of the study, two of our pathway predictions from the GI screens were validated experimentally, confirming the role of these new components in iron-sulphur cluster biogenesis and ribosome integrity. We also extrapolated the epistatic connectivity diagram of E. coli to 233 distantly related γ-proteobacterial species lacking GI information, and identified co-conserved genes and functional modules important for bacterial pathogenesis. Overall, this study describes the first genome-scale map of GIs in a Gram-negative bacterium and, through integrative analysis with previously derived protein-protein and GC-based interaction networks, presents a number of novel insights into the architecture of bacterial pathways that could not have been discerned through either network alone.
PMCID: PMC3930520  PMID: 24586182
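The abstract above scores digenic combinations for epistasis without stating the scoring formula. As an illustration only, one widely used way to define a genetic interaction is the multiplicative model: the expected fitness of a double mutant is the product of the two single-mutant fitnesses, and the interaction score is the deviation from that expectation. This is a sketch of that general definition, not the eSGA scoring pipeline used by the authors; the function name and fitness values are hypothetical.

```python
def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Multiplicative-model genetic interaction score (hypothetical helper).

    Fitness values are relative to wild type (wild type = 1.0).
    Negative: double mutant grows worse than expected (aggravating).
    Positive: double mutant grows better than expected (alleviating).
    """
    expected = w_a * w_b  # expected fitness if the two genes act independently
    return w_ab - expected

# Hypothetical relative colony-size values, not data from the study
print(round(epistasis(0.8, 0.9, 0.50), 2))  # -0.22 (aggravating)
print(round(epistasis(0.8, 0.9, 0.80), 2))  # 0.08 (alleviating)
```

Correlating vectors of such scores across many partner genes is what yields the "GI profiles" compared between genes in the abstract.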
6.  Sequencing, De Novo Assembly and Annotation of the Colorado Potato Beetle, Leptinotarsa decemlineata, Transcriptome 
PLoS ONE  2014;9(1):e86012.
The Colorado potato beetle (Leptinotarsa decemlineata) is a major pest and a serious threat to potato cultivation throughout the northern hemisphere. Despite its high importance for invasion biology, phenology and pest management, little is known about L. decemlineata from a genomic perspective. We subjected European L. decemlineata adult and larval transcriptome samples to 454-FLX massively-parallel DNA sequencing to characterize a basal set of genes from this species. We created a combined assembly of the adult and larval datasets including the publicly available midgut larval Roche 454 reads and provided basic annotation. We were particularly interested in diapause-specific genes and genes involved in pesticide and Bacillus thuringiensis (Bt) resistance.
Using 454-FLX pyrosequencing, we obtained a total of 898,048 reads which, together with the publicly available 804,056 midgut larval reads, were assembled into 121,912 contigs. We established a repository of genes of interest, including 101 of the 108 diapause-specific genes described in Drosophila montana and 621 contigs involved in insecticide resistance: 221 CYP450s, 45 GSTs, 13 catalases, 15 superoxide dismutases, 22 glutathione peroxidases, 194 esterases, 3 ADAM metalloproteases, 10 cadherins and 98 calmodulins. We found 460 putative miRNAs, and we predicted a large number of single nucleotide polymorphisms (29,205) and microsatellite loci (17,284).
This report of the assembly and annotation of the transcriptome of L. decemlineata offers new insights into diapause-associated and insecticide-resistance-associated genes in this species and provides a foundation for comparative studies with other species of insects. The data will also open new avenues for researchers using L. decemlineata as a model species, and for pest management research. Our results provide the basis for performing future gene expression and functional analysis in L. decemlineata and improve our understanding of the biology of this invasive species at the molecular level.
PMCID: PMC3900453  PMID: 24465841
7.  Comparative Genomics of Cultured and Uncultured Strains Suggests Genes Essential for Free-Living Growth of Liberibacter 
PLoS ONE  2014;9(1):e84469.
The full genomes of two uncultured plant pathogenic Liberibacter, Ca. Liberibacter asiaticus and Ca. Liberibacter solanacearum, are publicly available. Recently, the larger genome of a closely related cultured strain, Liberibacter crescens BT-1, was described. To gain insights into our current inability to culture most Liberibacter, a comparative genomics analysis was done based on the RAST, KEGG, and manual annotations of these three organisms. In addition, pathogenicity genes were examined in all three bacteria. Key deficiencies were identified in Ca. L. asiaticus and Ca. L. solanacearum that might suggest why these organisms have not yet been cultured. Over 100 genes involved in amino acid and vitamin synthesis were annotated exclusively in L. crescens BT-1. However, none of these deficiencies are limiting in the rich media used to date. Other genes exclusive to L. crescens BT-1 include those involved in cell division, the stringent response regulatory pathway, and multiple two-component regulatory systems. These results indicate that L. crescens is capable of growth under a much wider range of conditions than the uncultured Liberibacter strains. No outstanding differences were noted in pathogenicity-associated systems, suggesting that L. crescens BT-1 may be a plant pathogen on an as yet unidentified host.
PMCID: PMC3885570  PMID: 24416233
8.  Comparison of Metatranscriptomic Samples Based on k-Tuple Frequencies 
PLoS ONE  2014;9(1):e84348.
The comparison of samples, or beta diversity, is one of the essential problems in ecological studies. Next-generation sequencing (NGS) technologies make it possible to obtain large amounts of metagenomic and metatranscriptomic short-read sequences across many microbial communities. De novo assembly of the short reads can be especially challenging because the number of genomes and their sequences are generally unknown and the coverage of each genome can be very low; in such cases, traditional alignment-based sequence comparison methods cannot be used. Alignment-free approaches based on k-tuple frequencies, on the other hand, have yielded promising results for the comparison of metagenomic samples. However, it is not known whether these approaches can be used for the comparison of metatranscriptome datasets or which dissimilarity measures perform best.
We applied several beta diversity measures based on k-tuple frequencies to real metatranscriptomic datasets from pyrosequencing 454 and Illumina sequencing platforms to evaluate their effectiveness for the clustering of metatranscriptomic samples, including three dissimilarity measures, one dissimilarity measure in CVTree, one relative entropy based measure S2 and three classical distances. Results showed that the measure can achieve superior performance on clustering metatranscriptomic samples into different groups under different sequencing depths for both 454 and Illumina datasets, recovering environmental gradients affecting microbial samples, classifying coexisting metagenomic and metatranscriptomic datasets, and being robust to sequencing errors. We also investigated the effects of tuple size and order of the background Markov model. A software pipeline to implement all the steps of analysis is built and is available at
The k-tuple based sequence signature measures can effectively reveal major groups and gradient variation among metatranscriptomic samples from NGS reads. The dissimilarity measure performs well in all application scenarios and its performance is robust with respect to tuple size and order of the Markov model.
PMCID: PMC3879298  PMID: 24392128
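To make the alignment-free idea above concrete, here is a minimal sketch: each sample is summarised as a vector of overlapping k-tuple counts, and two samples are compared with a cosine-based dissimilarity, 0.5 × (1 − cos), which is the plain (uncentred) d2-type measure. This is an assumption-labelled illustration, not the study's pipeline: the background-adjusted variants that subtract expected counts under a Markov model, which the study also evaluated, are omitted. All names and toy reads are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

def kmer_vector(reads, k):
    """Count overlapping k-tuples (A/C/G/T only) across all reads of one sample."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if set(word) <= set("ACGT"):  # skip words containing N or other symbols
                counts[word] += 1
    # Fixed order over all 4**k words so vectors from different samples line up
    return [counts["".join(w)] for w in product("ACGT", repeat=k)]

def d2_dissimilarity(x, y):
    """Cosine-based d2 dissimilarity 0.5 * (1 - cos(x, y)); 0 for proportional profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("a sample produced no valid k-tuples")
    return 0.5 * (1 - dot / norm)

# Hypothetical toy samples; real metatranscriptomes contain millions of reads
sample_a = ["ACGTACGTGG", "TTGACGTACG"]
sample_b = ["ACGTACGTGA", "TTGACGTTCG"]
sample_c = ["AAAAAAAAAA", "CCCCCCCCCC"]
k = 3
va, vb, vc = (kmer_vector(s, k) for s in (sample_a, sample_b, sample_c))
print(d2_dissimilarity(va, vb) < d2_dissimilarity(va, vc))  # True
```

A full pairwise matrix of such dissimilarities can then be passed to any hierarchical clustering routine to group samples, which is the kind of clustering the abstract evaluates.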
9.  Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation 
PLoS ONE  2013;8(12):e84508.
The power of genome sequencing depends on the ability to understand what the sequenced genes and their protein products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately, this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass spectrometry-based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation.
PMCID: PMC3875567  PMID: 24386392
10.  Multigramme synthesis and asymmetric dihydroxylation of a 4-fluorobut-2E-enoate 
Esters of crotonic acid were brominated on a multigramme scale using a free-radical procedure. A phase-transfer-catalysed fluorination transformed these species into the 4-fluorobut-2E-enoates reproducibly and at scale (48–53%, ca. 300 mmol). Asymmetric dihydroxylation reactions were then used to transform the butenoate into, ultimately, all four diastereoisomers of a versatile fluorinated C4 building block with high enantiomeric enrichment. The (DHQ)2AQN and (DHQD)2AQN ligands described by Sharpless were the most effective. The development and optimisation of a new and facile method for the determination of ee is also described: 19F{1H} spectra recorded in d-chloroform/diisopropyl tartrate showed distinct, baseline-separated signals for the different enantiomers.
PMCID: PMC3869297  PMID: 24367430
asymmetric; dihydroxylation; ee determination; fluorination; fluorosugars; organo-fluorine
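Once the two enantiomers give baseline-separated 19F{1H} signals, as described above, the enantiomeric excess follows directly from their integrals: ee = |A − B| / (A + B) × 100%. The sketch below applies that standard definition; the function name and integral values are hypothetical, not measurements from the paper.

```python
def enantiomeric_excess(major: float, minor: float) -> float:
    """Percent ee from the integrals of two baseline-separated enantiomer signals."""
    if major + minor <= 0:
        raise ValueError("integrals must sum to a positive value")
    return 100.0 * abs(major - minor) / (major + minor)

# Hypothetical 19F{1H} integrals, not values reported in the study
print(enantiomeric_excess(95.0, 5.0))  # 90.0
```

Baseline separation matters because overlapping peaks would make each integral include part of the other enantiomer's signal and bias the ratio toward 0% ee.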
11.  Biosynthesis of Vitamins and Cofactors in Bacterium-Harbouring Trypanosomatids Depends on the Symbiotic Association as Revealed by Genomic Analyses 
PLoS ONE  2013;8(11):e79786.
Some non-pathogenic trypanosomatids maintain a mutualistic relationship with a betaproteobacterium of the Alcaligenaceae family. Intensive nutritional exchanges have been reported between the two partners, indicating that these protozoa are excellent biological models to study metabolic co-evolution. We previously sequenced, and herein investigate, the entire genomes of five trypanosomatids that harbor a symbiotic bacterium (SHTs, for Symbiont-Harboring Trypanosomatids) and of the respective bacteria (TPEs, for Trypanosomatid Proteobacterial Endosymbiont), as well as two trypanosomatids without symbionts (RTs, for Regular Trypanosomatids), for the presence of genes of the classical pathways for vitamin biosynthesis. Our data show that genes for the biosynthetic pathways of thiamine, biotin, and nicotinic acid are absent from all trypanosomatid genomes. This is in agreement with the absolute growth requirement for these vitamins in all protozoa of the family. Also absent from the genomes of RTs are the genes for the synthesis of pantothenic acid, folic acid, riboflavin, and vitamin B6. This is also in agreement with the available data showing that RTs are auxotrophic for these essential vitamins. SHTs, on the other hand, are prototrophic for these vitamins. Indeed, all the genes of the corresponding biosynthetic pathways were identified, most of them in the symbiont genomes, while a few genes, mostly of eukaryotic origin, were found in the host genomes. The only exceptions to the latter are the gene coding for the enzyme ketopantoate reductase (EC: …), which is instead related to genes of Firmicutes bacteria, and two other genes, one involved in the salvage pathway of pantothenic acid and the other in the synthesis of ubiquinone, that are related to Gammaproteobacteria. Their presence in trypanosomatids may result from lateral gene transfer.
Taken together, our results reinforce the idea that the low nutritional requirement of SHTs is associated with the presence of the symbiotic bacterium, which contains most genes for vitamin production.
PMCID: PMC3833962  PMID: 24260300
12.  Dead End Metabolites - Defining the Known Unknowns of the E. coli Metabolic Network  
PLoS ONE  2013;8(9):e75210.
The EcoCyc database is an online scientific database which provides an integrated view of the metabolic and regulatory network of the bacterium Escherichia coli K-12 and facilitates computational exploration of this important model organism. We have analysed the occurrence of dead end metabolites within the database – these are metabolites which lack the requisite reactions (either metabolic or transport) that would account for their production or consumption within the metabolic network. 127 dead end metabolites were identified from the 995 compounds that are contained within the EcoCyc metabolic network. Their presence reflects either a deficit in our representation of the network or in our knowledge of E. coli metabolism. Extensive literature searches resulted in the addition of 38 transport reactions and 3 metabolic reactions to the database and led to an improved representation of the pathway for Vitamin B12 salvage. 39 dead end metabolites were identified as components of reactions that are not physiologically relevant to E. coli K-12 – these reactions are properties of purified enzymes in vitro that would not be expected to occur in vivo. Our analysis led to improvements in the software that underpins the database and to the program that finds dead end metabolites within EcoCyc. The remaining dead end metabolites in the EcoCyc database likely represent deficiencies in our knowledge of E. coli metabolism.
PMCID: PMC3781023  PMID: 24086468
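The definition above (a metabolite with no reaction, metabolic or transport, accounting for its production or its consumption) translates into a simple set computation over the reaction list. The sketch below is a simplified, hypothetical illustration, not the EcoCyc dead-end finder: it ignores cellular compartments, treats a reversible reaction as both producing and consuming all of its participants, and models transport as a reaction with an empty side.

```python
def dead_end_metabolites(reactions):
    """Return compounds that no reaction produces or no reaction consumes.

    `reactions` is a list of (substrates, products, reversible) tuples.
    A transport reaction is written with an empty side, e.g. an uptake
    of glucose as (set(), {"glc"}, False).
    """
    produced, consumed = set(), set()
    for substrates, products, reversible in reactions:
        consumed |= set(substrates)
        produced |= set(products)
        if reversible:  # a reversible step can run in either direction
            produced |= set(substrates)
            consumed |= set(products)
    all_compounds = produced | consumed
    return {c for c in all_compounds if c not in produced or c not in consumed}

# Hypothetical toy network, not EcoCyc data
reactions = [
    (set(), {"glc"}, False),    # glucose uptake (transport into the cell)
    ({"glc"}, {"g6p"}, False),  # phosphorylation
    ({"g6p"}, {"f6p"}, True),   # reversible isomerisation
    ({"x"}, {"y"}, False),      # isolated reaction: x never made, y never used
]
print(sorted(dead_end_metabolites(reactions)))  # ['x', 'y']
```

Adding a missing transport or metabolic reaction for `x` or `y` would remove it from the result, mirroring how the curation described in the abstract reduced the dead-end count.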
13.  Mutation Rules and the Evolution of Sparseness and Modularity in Biological Systems 
PLoS ONE  2013;8(8):e70444.
Biological systems exhibit two structural features on many levels of organization: sparseness, in which only a small fraction of possible interactions between components actually occur; and modularity – the near decomposability of the system into modules with distinct functionality. Recent work suggests that modularity can evolve in a variety of circumstances, including goals that vary in time such that they share the same subgoals (modularly varying goals), or when connections are costly. Here, we studied the origin of modularity and sparseness focusing on the nature of the mutation process, rather than on connection cost or variations in the goal. We use simulations of evolution with different mutation rules. We found that commonly used sum-rule mutations, in which interactions are mutated by adding random numbers, do not lead to modularity or sparseness except in special situations. In contrast, product-rule mutations, in which interactions are mutated by multiplying by random numbers (a better model for the effects of biological mutations), led to sparseness naturally. When the goals of evolution are modular, in the sense that specific groups of inputs affect specific groups of outputs, product-rule mutations also lead to modular structure; sum-rule mutations do not. Product-rule mutations generate sparseness and modularity because they tend to reduce interactions, and to keep small interaction terms small.
PMCID: PMC3735639  PMID: 23936433
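The difference between the two mutation rules described above can be seen with a single interaction strength mutated many times, with no selection. The abstract does not give the random distributions, so the Gaussian additive step and the log-normal multiplicative factor below are assumptions; the helper names are hypothetical. The point illustrated is narrow: a multiplicative change scales with the current value, so an absent interaction cannot appear and a small one stays proportionally small, whereas an additive change can create an interaction from nothing.

```python
import random

def sum_rule(w, rng, sigma=0.1):
    """Sum-rule mutation: add a random number to the interaction strength."""
    return w + rng.gauss(0.0, sigma)

def product_rule(w, rng, sigma=0.1):
    """Product-rule mutation: multiply the interaction strength by a random positive factor."""
    return w * rng.lognormvariate(0.0, sigma)

def mutate_repeatedly(w, rule, steps, seed=0):
    """Apply `steps` successive mutations with a seeded generator (no selection)."""
    rng = random.Random(seed)
    for _ in range(steps):
        w = rule(w, rng)
    return w

print(mutate_repeatedly(0.0, sum_rule, 1000) != 0.0)      # True: an absent interaction appears
print(mutate_repeatedly(0.0, product_rule, 1000) == 0.0)  # True: an absent interaction stays absent
```

Because the log-normal factor is always positive, the product rule also preserves the sign of an interaction, which is one reason it is often regarded as closer to how point mutations tweak existing binding strengths.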
14.  Exploring the Host Parasitism of the Migratory Plant-Parasitic Nematode Ditylenchus destructor by Expressed Sequence Tags Analysis
PLoS ONE  2013;8(7):e69579.
The potato rot nematode, Ditylenchus destructor, is a very destructive nematode pest on many agriculturally important crops worldwide, but the molecular characterization of its plant parasitism has been limited. The effectors involved in plant parasitism by several sedentary endoparasitic nematodes, such as Heterodera glycines, Globodera rostochiensis and Meloidogyne incognita, have been identified and extensively studied over the past two decades. Ditylenchus destructor, as a migratory plant-parasitic nematode, has a different feeding behavior, life cycle and host response. Comparing the transcriptome and parasitome among different types of plant-parasitic nematodes is a way to understand more fully the parasitic mechanisms of plant nematodes. We undertook the approach of sequencing expressed sequence tags (ESTs) derived from a mixed-stage cDNA library of D. destructor. This is the first study of D. destructor ESTs. A total of 9800 ESTs were grouped into 5008 clusters, including 3606 singletons and 1402 multi-member contigs, representing a catalog of D. destructor genes. Using a bioinformatics workflow, we found that 1391 clusters have no match in the available gene databases; 31 clusters only have similarities to genes identified from D. africanus, the species most closely related to D. destructor; 1991 clusters were annotated using Gene Ontology (GO); 1550 clusters were assigned enzyme commission (EC) numbers; and 1211 clusters were mapped to 181 KEGG biochemical pathways. 22 ESTs had similarities to reported nematode effectors. Interestingly, most of the effectors identified in this study are involved in host cell wall degradation or modification, such as 1,4-beta-glucanase, 1,3-beta-glucanase, pectate lyase, chitinases and expansin, or in host defense suppression, such as calreticulin, annexin and venom allergen-like protein. This result implies that the migratory plant-parasitic nematode D. destructor secretes effectors similar to those of sedentary plant nematodes. Finally, we further characterized the two D. destructor expansin proteins.
PMCID: PMC3726699  PMID: 23922743
15.  Metagenomic Insights into Metabolic Capacities of the Gut Microbiota in a Fungus-Cultivating Termite (Odontotermes yunnanensis) 
PLoS ONE  2013;8(7):e69184.
Macrotermitinae (fungus-cultivating termites) are major decomposers in tropical and subtropical areas of Asia and Africa. They have evolved mutualistic associations with both a Termitomyces fungus cultivated in the nest and a gut microbiota, providing a model system for probing host-microbe interactions. Yet the symbiotic roles of gut microbes residing in the major feeding caste remain largely undefined. Here, by pyrosequencing the whole gut metagenome of adult workers of a fungus-cultivating termite (Odontotermes yunnanensis), we showed that it harbors a broad set of genes or gene modules encoding carbohydrate-active enzymes (CAZymes) relevant to plant fiber degradation, particularly debranching enzymes and oligosaccharide-processing enzymes. It also contained a considerable number of genes encoding chitinases and glycoprotein oligosaccharide-processing enzymes for fungal cell wall degradation. To investigate the metabolic divergence of higher termites of different feeding guilds, we also carried out a SEED subsystem-based, gene-centric comparison of these data with a previously sequenced wood-feeding Nasutitermes hindgut microbiome. SEED classifications of nitrogen metabolism, and of motility and chemotaxis, were significantly overrepresented in the wood-feeder hindgut metagenome, while Bacteroidales conjugative transposons and subsystems related to central aromatic compound metabolism were apparently overrepresented in the fungus-cultivating termite gut metagenome. This work fills gaps in our understanding of the functional capacities of the fungus-cultivating termite gut microbiota, especially its roles in the symbiotic digestion of lignocellulose and the utilization of fungal biomass, and adds to existing understanding of this peculiar symbiosis.
PMCID: PMC3714238  PMID: 23874908
16.  Gene Fusion Analysis in the Battle against the African Endemic Sleeping Sickness 
PLoS ONE  2013;8(7):e68854.
The protozoan Trypanosoma brucei causes African trypanosomiasis, or sleeping sickness, in humans, which can be lethal if untreated. Most available pharmacological treatments for the disease have severe side effects. The purpose of this analysis was to detect novel protein-protein interactions (PPIs) vital for the parasite, which could lead to the development of drugs that block these specific interactions. In this work, Domain Fusion Analysis (the Rosetta Stone method) was used to identify novel PPIs by comparing T. brucei to 19 organisms covering all major lineages of the tree of life. Overall, 49 possible protein-protein interactions were detected and classified based on (a) statistical significance (BLAST e-value, domain length, etc.), (b) their involvement in crucial metabolic pathways, and (c) their evolutionary history, particularly whether a protein pair is split in T. brucei and fused in the human host. We also evaluated fusion events involving hypothetical proteins and suggest possible molecular functions or involvement in particular biological processes. These results could be further studied through structural biology or other experimental approaches to validate the protein-protein interactions proposed here. The evolutionary analysis of the proteins involved showed that gene fusion or gene fission events can happen in all organisms, while some protein domains are more prone to fusion and fission events and present complex evolutionary patterns.
PMCID: PMC3714255  PMID: 23874788
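The Rosetta Stone (domain fusion) logic summarised above can be sketched as follows: if two separate proteins in the query organism each align to a different, non-overlapping region of one fused protein in another organism, the pair is predicted to be functionally linked. This is a minimal, hypothetical sketch that assumes the sequence hits have already been computed and filtered; the e-value and domain-length filtering the abstract mentions is left out, and the identifiers are invented.

```python
from collections import defaultdict
from itertools import combinations

def rosetta_stone_pairs(hits, max_overlap=0):
    """Predict linked query-protein pairs from gene-fusion evidence.

    `hits` holds (query_id, target_id, start, end) tuples, where start/end
    delimit the aligned region on the target protein (1-based, inclusive).
    Two distinct query proteins are linked when both align to the same
    target ("Rosetta Stone") protein in regions overlapping by at most
    `max_overlap` residues.
    """
    by_target = defaultdict(list)
    for query, target, start, end in hits:
        by_target[target].append((query, start, end))

    links = set()
    for target, regions in by_target.items():
        for (qa, sa, ea), (qb, sb, eb) in combinations(regions, 2):
            if qa == qb:
                continue  # two hits of the same query protein are not a fusion
            overlap = min(ea, eb) - max(sa, sb) + 1  # <= 0 means disjoint regions
            if overlap <= max_overlap:
                links.add((min(qa, qb), max(qa, qb), target))
    return links

# Hypothetical hits: Tb_A and Tb_B are split in the parasite but fused in Hs_fused
hits = [
    ("Tb_A", "Hs_fused", 1, 250),
    ("Tb_B", "Hs_fused", 260, 540),
    ("Tb_C", "Hs_other", 1, 300),
]
print(rosetta_stone_pairs(hits))  # {('Tb_A', 'Tb_B', 'Hs_fused')}
```

Requiring disjoint regions avoids calling a "fusion" when two paralogous query proteins simply hit the same domain of the target.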
17.  ANOVA-Like Differential Expression (ALDEx) Analysis for Mixed Population RNA-Seq 
PLoS ONE  2013;8(7):e67019.
Experimental variance is a major challenge when dealing with high-throughput sequencing data. This variance has several sources: sampling replication, technical replication, variability within biological conditions, and variability between biological conditions. The high per-sample cost of RNA-Seq often precludes the large number of experiments needed to partition observed variance into these categories as per standard ANOVA models. We show that the partitioning of within-condition to between-condition variation cannot reasonably be ignored, whether in single-organism RNA-Seq or in Meta-RNA-Seq experiments, and further find that commonly-used RNA-Seq analysis tools, as described in the literature, do not enforce the constraint that the sum of relative expression levels must be one, and thus report expression levels that are systematically distorted. These two factors lead to misleading inferences if not properly accommodated. As it is usually only the biological between-condition and within-condition differences that are of interest, we developed ALDEx, an ANOVA-like differential expression procedure, to identify genes with greater between- to within-condition differences. We show that the presence of differential expression and the magnitude of these comparative differences can be reasonably estimated with even very small sample sizes.
PMCID: PMC3699591  PMID: 23843979
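The abstract above stresses that relative expression levels must sum to one. A minimal sketch of one way to respect that constraint, under the assumption that it resembles the general ALDEx idea, is to draw Monte Carlo instances of relative abundance from a Dirichlet distribution (counts plus a 0.5 pseudo-count, an assumed prior) and then apply a centred log-ratio (CLR) transform. This is not the published ALDEx code; the between- versus within-condition comparison that ALDEx performs on these values is omitted, and all names and counts are hypothetical.

```python
import math
import random

def dirichlet_instance(counts, rng, prior=0.5):
    """One Monte Carlo draw of relative abundances from Dirichlet(counts + prior).

    Independent Gamma(alpha_i, 1) draws divided by their sum give a
    Dirichlet sample, so every instance sums to one.
    """
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def clr(proportions):
    """Centred log-ratio: log of each part minus the mean log across parts."""
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]

rng = random.Random(1)
counts = [120, 30, 0, 850]  # hypothetical read counts for four features in one sample
instances = [clr(dirichlet_instance(counts, rng)) for _ in range(128)]
# Averaging over instances gives a per-feature expected CLR value for this sample
expected = [sum(col) / len(col) for col in zip(*instances)]
print([round(v, 2) for v in expected])
```

The pseudo-count keeps zero-count features finite after the log transform, and the spread across instances carries the sampling uncertainty that matters most when, as the abstract notes, sample sizes are very small.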
18.  The Embryonic Transcriptome of the Red-Eared Slider Turtle (Trachemys scripta) 
PLoS ONE  2013;8(6):e66357.
The bony shell of the turtle is an evolutionary novelty not found in any other group of animals; however, research into its formation has suggested that it evolved through modification of conserved developmental mechanisms. Although these mechanisms have been extensively characterized in model organisms, the tools for characterizing them in non-model organisms such as turtles have been limited by a lack of genomic resources. We have used a next-generation sequencing approach to generate and assemble a transcriptome from stage 14 and 17 Trachemys scripta embryos, stages during which important events in shell development are known to take place. The transcriptome consists of 231,876 sequences with an N50 of 1,166 bp. GO terms and EC codes were assigned to the 61,643 unique predicted proteins identified in the transcriptome sequences. All major GO categories and metabolic pathways are represented in the transcriptome. Transcriptome sequences were used to amplify several cDNA fragments designed for use as RNA in situ probes. One of these, BMP5, was hybridized to a T. scripta embryo and exhibits both conserved and novel expression patterns. The transcriptome sequences should be of broad use for understanding the evolution and development of the turtle shell and for annotating any future T. scripta genome sequences.
PMCID: PMC3686863  PMID: 23840449
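The N50 reported in this abstract is the sequence length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of that definition follows; the function name and the toy lengths are illustrative and are not data from the study.

```python
def n50(lengths):
    """Return the N50: the length at which the longest sequences first cover half the total bases."""
    if not lengths:
        raise ValueError("n50 requires at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # at least half of all bases covered
            return length

# Toy lengths: total = 54 bases; 10 + 9 + 8 = 27 reaches half, so N50 = 8.
example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```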
19.  Broad Spectrum of Mimiviridae Virophage Allows Its Isolation Using a Mimivirus Reporter 
PLoS ONE  2013;8(4):e61912.
The Mimiviridae family of giant viruses includes three groups: group A (including Acanthamoeba polyphaga Mimivirus), group B (including Moumouvirus) and group C (including Megavirus chilensis). Virophages have been isolated with both group A Mimiviridae (the Mamavirus strain) and the related Cafeteria roenbergensis virus, and they have also been described through bioinformatic analysis of the Phycodnavirus. Here, we found that the first two virophage strains isolated with group A Mimiviridae can multiply easily in groups B and C and play a role in gene transfer among these virus subgroups. To isolate new virophages and their Mimiviridae hosts from the environment, we used PCR to identify a sample containing a virophage and a group C Mimiviridae that failed to grow on amoebae. Moreover, we showed that virophages reduce the pathogenic effect of Mimivirus (plaque formation), establishing their parasitic role on Mimivirus. We therefore developed a co-culture procedure using Acanthamoeba polyphaga and Mimivirus to recover the detected virophage and then sequenced the virophage's genome. We present this technique as a novel approach to isolating virophages. We demonstrated that the newly identified virophages replicate in the viral factories of all three groups of Mimiviridae, suggesting that the host spectrum of virophages is not limited to their initial host.
PMCID: PMC3626643  PMID: 23596530
20.  Species Identification and Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences 
PLoS ONE  2013;8(4):e60811.
The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated, and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and able to distinguish nearly twice as many species-level OTUs as existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification, thereby opening up potential applications of this approach for clinical microbial characterization.
PMCID: PMC3620293  PMID: 23579286
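The claim that a protocol "amplifies" a given fraction of a reference database can be illustrated with an in silico primer check. The sketch below assumes exact matching of degenerate (IUPAC-coded) primers: a sequence counts as amplified if the forward primer matches and the reverse complement of the reverse primer matches downstream of it. The primers and sequences are hypothetical, not those of the protocol described here. Real evaluations typically also tolerate some mismatches.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def reverse_complement(primer):
    """Reverse complement of a degenerate primer, preserving IUPAC codes."""
    return primer.upper().translate(COMPLEMENT)[::-1]

def amplifies(forward, reverse, sequence):
    """True if the forward primer matches and the reverse primer's site lies downstream."""
    seq = sequence.upper()
    fwd_hit = primer_regex(forward).search(seq)
    if fwd_hit is None:
        return False
    rev_site = primer_regex(reverse_complement(reverse))
    return rev_site.search(seq, fwd_hit.end()) is not None

def coverage(forward, reverse, sequences):
    """Fraction of reference sequences predicted to amplify (exact matching only)."""
    return sum(amplifies(forward, reverse, s) for s in sequences) / len(sequences)

# Hypothetical primers and toy reference sequences.
refs = ["ACGACCCCGATT", "ACGTCCCCGATT", "GATTACGA", "ACGGTTTGGTT"]
frac = coverage("ACGR", "AAYC", refs)
```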
21.  Sequencing and annotation of the Ophiostoma ulmi genome 
BMC Genomics  2013;14:162.
The ascomycete fungus Ophiostoma ulmi was responsible for the initial pandemic of the massively destructive Dutch elm disease in Europe and North America in the early 1910s. Dutch elm disease has ravaged elm populations globally and remains a major threat to the remaining elms. O. ulmi is also associated with valuable biomaterials applications: it was recently discovered that proteins from O. ulmi can be used for efficient transformation of amylose in the production of bioplastics.
We have sequenced the 31.5 Mb genome of O. ulmi using Illumina next-generation sequencing. Applying both de novo and comparative genome annotation methods, we predict a total of 8639 gene models. The quality of the predicted genes was validated using a variety of data sources consisting of EST data, mRNA-seq data and orthologs from related fungal species. Sequence-based computational methods were used to identify candidate virulence-related genes. Metabolic pathways were reconstructed and highlight specific enzymes that may play a role in virulence.
This genome sequence will be a useful resource for further research aimed at understanding the molecular mechanisms of pathogenicity by O. ulmi. It will also facilitate the identification of enzymes necessary for industrial biotransformation applications.
PMCID: PMC3618308  PMID: 23496816
22.  Improving Microbial Genome Annotations in an Integrated Database Context 
PLoS ONE  2013;8(2):e54859.
Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency refers to the biological coherence of annotations, while completeness refers to the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule-based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations: the phenotypic traits of an organism are inferred from the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems is available at
PMCID: PMC3570495  PMID: 23424620
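The phenotype-based validation described in this abstract can be sketched as follows. The rule format (a phenotype is predicted when every reaction in at least one alternative set is present), the reaction identifiers and the phenotype names are all hypothetical and do not reflect IMG's actual rule schema. The sketch only shows how comparing a predicted phenotype with an observed one can flag a possible annotation problem.

```python
from typing import Dict, List, Set

# Hypothetical rule format: phenotype -> list of alternative reaction sets.
# A phenotype is predicted if ALL reactions of ANY one alternative are annotated.
Rules = Dict[str, List[Set[str]]]

def predict_phenotypes(reactions: Set[str], rules: Rules) -> Set[str]:
    """Infer phenotypes from the set of annotated reactions."""
    return {
        phenotype
        for phenotype, alternatives in rules.items()
        if any(alternative <= reactions for alternative in alternatives)
    }

def discrepancies(predicted: Set[str], observed: Dict[str, bool]) -> Dict[str, str]:
    """Compare predictions with experimentally observed phenotypes."""
    report = {}
    for phenotype, present in observed.items():
        if present and phenotype not in predicted:
            report[phenotype] = "observed but not predicted: annotation may be missing"
        elif not present and phenotype in predicted:
            report[phenotype] = "predicted but not observed: annotation may be spurious"
    return report

# Hypothetical rules, annotations and observations.
rules = {
    "phenotype_X": [{"rxn_1", "rxn_2"}],
    "phenotype_Y": [{"rxn_3", "rxn_4"}, {"rxn_5"}],
}
annotated = {"rxn_1", "rxn_3", "rxn_4"}
predicted = predict_phenotypes(annotated, rules)
report = discrepancies(predicted, {"phenotype_X": True, "phenotype_Y": False})
```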
23.  Identification of a Functional Connectome for Long-Term Fear Memory in Mice 
PLoS Computational Biology  2013;9(1):e1002853.
Long-term memories are thought to depend upon the coordinated activation of a broad network of cortical and subcortical brain regions. However, the distributed nature of this representation has made it challenging to define the neural elements of the memory trace, and lesion and electrophysiological approaches provide only a narrow window into what is appreciated to be a much more global network. Here we used a global mapping approach to identify networks of brain regions activated following recall of long-term fear memories in mice. Analysis of Fos expression across 84 brain regions allowed us to identify regions that were co-active following memory recall. These analyses revealed that the functional organization of long-term fear memories depends on memory age and is altered in mutant mice that exhibit premature forgetting. Most importantly, these analyses indicate that long-term memory recall engages a network that has a distinct thalamic-hippocampal-cortical signature. This network is concurrently integrated and segregated, and therefore has small-world properties, and it contains hub-like regions in the prefrontal cortex and thalamus that may play privileged roles in memory expression.
Author Summary
Memory retrieval is thought to involve the coordinated activation of multiple regions of the brain, rather than localized activity in a specific region. In order to visualize networks of brain regions activated by recall of a fear memory in mice, we quantified expression of an activity-regulated gene (c-fos) that is induced by neural activity. This allowed us to identify collections of brain regions where Fos expression co-varies across mice, and which presumably form components of a network that is co-active during recall of long-term fear memory. This analysis suggested that expression of a long-term fear memory is an emergent property of large-scale neural network interactions. This network has a distinct thalamic-hippocampal-cortical signature and, like many real-world networks as well as other anatomical and functional brain networks, has a small-world architecture with a subset of highly connected hub nodes that may play more central roles in memory expression.
PMCID: PMC3536620  PMID: 23300432
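The co-variation analysis summarized above can be sketched with a few array operations. The sketch assumes a matrix of Fos measurements with mice as rows and brain regions as columns. Regions are linked when their Fos levels correlate positively across mice above a threshold. Node degree then identifies hub candidates, and the local clustering coefficient measures how integrated each neighbourhood is. The threshold of 0.8 and the toy data are hypothetical, not the study's values. A full small-world assessment would additionally compare clustering and path length against randomized networks.

```python
import numpy as np

def coactivation_network(fos, threshold=0.8):
    """Boolean adjacency linking regions whose Fos levels correlate across mice above threshold."""
    r = np.corrcoef(np.asarray(fos, dtype=float), rowvar=False)  # region-by-region Pearson r
    adj = r >= threshold  # hypothetical cut-off; keeps positive co-activation only
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def degree(adj):
    """Number of co-active partners per region (high values suggest hubs)."""
    return adj.sum(axis=1)

def clustering(adj):
    """Local clustering coefficient: diag(A^3) / (k * (k - 1)) for an undirected graph."""
    a = adj.astype(float)
    k = a.sum(axis=1)
    closed_walks = np.diag(a @ a @ a)  # each triangle through a node is counted twice
    denom = k * (k - 1)
    return np.divide(closed_walks, denom, out=np.zeros_like(closed_walks), where=denom > 0)

# Toy data: 5 mice x 4 regions; regions 0-2 co-vary, region 3 does not.
fos = [[1, 2, 3, 5], [2, 4, 6, 1], [3, 6, 9, 4], [4, 8, 12, 2], [5, 10, 15, 3]]
adj = coactivation_network(fos)
deg = degree(adj)
cc = clustering(adj)
```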
24.  Computational Prediction of Protein-Protein Interactions in Leishmania Predicted Proteomes 
PLoS ONE  2012;7(12):e51304.
The trypanosomatid parasites Leishmania braziliensis, Leishmania major and Leishmania infantum are important human pathogens. Despite years of study and the availability of their genomes, an effective vaccine has not yet been developed, and chemotherapy is highly toxic. It is therefore clear that only integrated interdisciplinary studies are likely to succeed in the search for new targets for vaccine and drug development. An essential part of this rationale is the study of protein-protein interaction (PPI) networks, which can provide a better understanding of complex protein interactions in biological systems. We therefore modeled PPIs for trypanosomatids computationally, predicting interactions by sequence comparison against public databases of protein or domain interactions (interolog mapping), and developed a dedicated combined scoring system to assess the robustness of the predictions. The network prediction approach was evaluated using gold-standard positive and negative datasets, yielding an AUC of 0.94. As a result, 39,420, 43,531 and 45,235 interactions were predicted for L. braziliensis, L. major and L. infantum, respectively. For each predicted network, the top 20 proteins were ranked by the MCC topological index. In addition, information related to immunological potential, the degree of protein sequence conservation among orthologs and the degree of identity to proteins of potential parasite hosts was integrated. Integrating this information improves the interpretability and usefulness of the predicted networks, which can help in selecting new potential biological targets for drug and vaccine development. Network modularity, which is key when the aim is to destabilize PPIs for drug or vaccine purposes, was analyzed along with multiple alignments of the predicted PPIs, revealing patterns associated with protein turnover.
In addition, around 50% of the hypothetical proteins present in the networks received some degree of functional annotation, an important contribution given that approximately 60% of the predicted Leishmania proteomes have no predicted function.
PMCID: PMC3519578  PMID: 23251492
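Interolog mapping, as described in this abstract, transfers a known interaction between two reference proteins to their respective orthologs in the target proteome. The sketch below assumes that ortholog assignments with sequence identities are already available. It scores each predicted pair by the product of the two identities, keeping the best score when several reference interactions map to the same pair. That score is a hypothetical stand-in, not the paper's combined system score. The auc helper shows how scores can be evaluated against gold-standard positive and negative pairs using the rank (Mann-Whitney) formulation of the area under the ROC curve. All protein names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence, Tuple

def interolog_map(
    known: List[Tuple[str, str]],
    orthologs: Dict[str, Dict[str, float]],
) -> Dict[FrozenSet[str], float]:
    """Transfer reference interactions to target-species orthologs.

    known: interacting protein pairs in a reference species.
    orthologs: reference protein -> {target protein: sequence identity in [0, 1]}.
    Returns predicted target pairs with a hypothetical score (product of identities).
    """
    predicted: Dict[FrozenSet[str], float] = {}
    for a, b in known:
        for (ta, ident_a), (tb, ident_b) in product(
            orthologs.get(a, {}).items(), orthologs.get(b, {}).items()
        ):
            pair = frozenset((ta, tb))
            score = ident_a * ident_b
            if score > predicted.get(pair, 0.0):  # keep the best supporting evidence
                predicted[pair] = score
    return predicted

def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical reference interactions and ortholog table.
known = [("A", "B"), ("B", "C")]
orthologs = {"A": {"La": 0.9}, "B": {"Lb": 0.8, "Lb2": 0.5}, "C": {"Lc": 0.6}}
network = interolog_map(known, orthologs)
```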
