The quest for new antimalarial drugs, especially those with novel modes of action, is essential in the face of emerging drug-resistant parasites. Here we describe a new chemical class of molecules, pyrazoleamides, with potent activity against human malaria parasites and showing remarkably rapid parasite clearance in an in vivo model. Investigations involving pyrazoleamide-resistant parasites, whole-genome sequencing and gene transfers reveal that mutations in two proteins, a calcium-dependent protein kinase (PfCDPK5) and a P-type cation-ATPase (PfATP4), are necessary to impart full resistance to these compounds. A pyrazoleamide compound causes a rapid disruption of Na+ regulation in blood-stage Plasmodium falciparum parasites. A similar effect on Na+ homeostasis was recently reported for spiroindolones, which are antimalarials of a chemical class quite distinct from pyrazoleamides. Our results reveal that disruption of Na+ homeostasis in malaria parasites is a promising mode of antimalarial action mediated by at least two distinct chemical classes.
Novel antimalarial drugs are urgently needed to combat parasite drug resistance. Here, Vaidya et al. describe a new chemical class of potent antimalarial compounds that act by disrupting the parasite's sodium homeostasis.
Plasmodium falciparum is unique among human malarias in its ability to sequester in post-capillary venules of host organs. The main variant antigens implicated are the P. falciparum erythrocyte membrane protein 1 (PfEMP1), which can be divided into three major groups (A–C). Our study was a unique examination of sequestered populations of parasites for genetic background and expression of PfEMP1 groups. We collected post-mortem tissue from twenty paediatric hosts with pathologically different forms of cerebral malaria (CM1 and CM2) and parasitaemic controls (PC) to directly examine sequestered populations of parasites in the brain, heart and gut. Use of two different techniques to investigate this question produced divergent results. By quantitative PCR, group A var genes were upregulated in all three organs of CM2 and PC cases. In contrast, in CM1 infections displaying high levels of sequestration but negligible vascular pathology, there was high expression of group B var. Cloning and sequencing of var transcript tags from the same samples indicated a uniformly low expression of group A-like var. Generally, within an organ sample, 1–2 sequences were expressed at dominant levels. 23% of var tags were detected in multiple patients despite the P. falciparum infections being genetically distinct, and two tags were observed in up to seven hosts each with high expression in the brains of 3–4 patients. This study is a novel examination of the sequestered parasites responsible for fatal cerebral malaria and describes expression patterns of the major cytoadherence ligand in three organ-derived populations and three pathological states.
One of the most severe forms of malarial disease is cerebral malaria, which disproportionately affects young children. In this disease, parasites place proteins on the red blood cell surface, providing a “smokescreen” by which they evade host immunity and hide in organ blood vessels, blocking them and causing tissue damage. It is impossible to study parasites in the organs during life, and autopsy studies on children with malaria are exceedingly rare. In Malawi, we examined parasites from the brain, heart and intestine of twenty cases of fatal malaria including controls with low numbers of malaria parasites but another identified cause of death. We found little difference in the category of proteins the parasites used in controls and cerebral malaria, although a small number of specific proteins were detected in multiple infections. In an alternative form of malaria in which the brain is heavily infected but shows no evidence of damage, we found a different set of proteins at high proportion. However, as these children were typically older and most were infected with HIV, we could not determine which of these factors was most important. Interactions between host and parasite have the potential to influence disease outcomes.
Sparganosis is an infection with a larval Diphyllobothriidea tapeworm. From a rare cerebral case presented at a clinic in the UK, DNA was recovered from a biopsy sample and used to determine the causative species as Spirometra erinaceieuropaei through sequencing of the cox1 gene. From the same DNA, we have produced a draft genome, the first of its kind for this species, and used it to perform a comparative genomics analysis and to investigate known and potential tapeworm drug targets in this tapeworm.
The 1.26 Gb draft genome of S. erinaceieuropaei is currently the largest reported for any flatworm. Through investigation of β-tubulin genes, we predict that S. erinaceieuropaei larvae are insensitive to the tapeworm drug albendazole. We find that many putative tapeworm drug targets are also present in S. erinaceieuropaei, allowing possible cross application of new drugs. In comparison to other sequenced tapeworm species we observe expansion of protease classes, and of Kunitz-type protease inhibitors. Expanded gene families in this tapeworm also include those that are involved in processes that add post-translational diversity to the protein landscape, intracellular transport, transcriptional regulation and detoxification.
The S. erinaceieuropaei genome begins to give us insight into an order of tapeworms previously uncharacterized at the genome-wide level. From a single clinical case we have begun to sketch a picture of the characteristics of these organisms. Finally, our work represents a significant technological achievement, as we present a draft genome sequence of a rare tapeworm obtained from a small amount of starting material.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0510-3) contains supplementary material, which is available to authorized users.
Rodent malaria parasites (RMP) are used extensively as models of human malaria. Draft RMP genomes have been published for Plasmodium yoelii, P. berghei ANKA (PbA) and P. chabaudi AS (PcAS). Although availability of these genomes made a significant impact on recent malaria research, these genomes were highly fragmented and were annotated with little manual curation. The fragmented nature of the genomes has hampered genome-wide analysis of Plasmodium gene regulation and function.
We have greatly improved the genome assemblies of PbA and PcAS, newly sequenced the virulent parasite P. yoelii YM genome, sequenced additional RMP isolates/lines and have characterized genotypic diversity within RMP species. We have produced RNA-seq data and utilised it to improve gene-model prediction and to provide quantitative, genome-wide data on gene expression. Comparison of the RMP genomes with the genome of the human malaria parasite P. falciparum and RNA-seq mapping permitted gene annotation at base-pair resolution. Full-length chromosomal annotation permitted a comprehensive classification of all subtelomeric multigene families including the ‘Plasmodium interspersed repeat genes’ (pir). Phylogenetic classification of the pir family, combined with pir expression patterns, indicates functional diversification within this family.
Complete RMP genomes, RNA-seq and genotypic diversity data are excellent and important resources for gene-function and post-genomic analyses and to better interrogate Plasmodium biology. Genotypic diversity between P. chabaudi isolates makes this species an excellent parasite to study genotype-phenotype relationships. The improved classification of multigene families will enhance studies on the role of (variant) exported proteins in virulence and immune evasion/modulation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-014-0086-0) contains supplementary material, which is available to authorized users.
Plasmodium chabaudi; Plasmodium berghei; Plasmodium yoelii; Genomes; RNA-seq; Genotypic diversity; Multigene families; pirs; Phylogeny
Pathogen genome sequencing directly from clinical samples is quickly gaining importance in genetic and medical research studies. However, low DNA yield from blood-borne pathogens is often a limiting factor. The problem worsens in extremely base-biased genomes such as the AT-rich Plasmodium falciparum. We present a strategy for whole-genome amplification (WGA) of low-yield samples from P. falciparum prior to short-read sequencing. We have developed WGA conditions that incorporate tetramethylammonium chloride for improved amplification and coverage of AT-rich regions of the genome. We show that this method reduces amplification bias and chimera formation. Our data show that this method is suitable for as low as 10 pg input DNA, and offers the possibility of sequencing the parasite genome from small blood samples.
whole-genome amplification; AT-rich; malaria; tetramethylammonium chloride
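As a rough illustration of what "AT-rich" means in practice, the sketch below computes the A+T fraction of a sequence in fixed windows. It is not part of the published method; the function names, window size and toy sequence are assumptions chosen only for illustration.

```python
from typing import List


def at_fraction(seq: str) -> float:
    """Fraction of A/T bases among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    # Windows made up entirely of ambiguous bases (e.g. N) report 0.0.
    return at / total if total else 0.0


def windowed_at(seq: str, window: int = 10, step: int = 10) -> List[float]:
    """A+T fraction for each full window along the sequence."""
    return [
        at_fraction(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence: a balanced window, then a pure AT window.
    toy = "AAAAACCCCC" + "ATATATATAT"
    print(windowed_at(toy))
```

Per-window values like these could be compared with read depth per window to check whether coverage drops in the most AT-rich regions.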
Commitment to and completion of sexual development are essential for malaria parasites (protists of the genus Plasmodium) to be transmitted through mosquitoes¹. The molecular mechanism(s) responsible for commitment have been hitherto unknown. Here we show that PBAP2-G, a conserved member of the ApiAP2 family of transcription factors, is essential for the commitment of asexually replicating forms to sexual development in P. berghei, a malaria parasite of rodents. PBAP2-G was identified from mutations in its encoding gene, PBANKA_143750, which account for the loss of sexual development frequently observed in parasites transmitted artificially by blood passage. Systematic gene deletion of conserved ApiAP2 genes in Plasmodium confirmed the role of PBAP2-G and revealed a second ApiAP2 member (PBANKA_103430, termed PBAP2-G2) that significantly modulates but does not abolish gametocytogenesis, indicating that a cascade of ApiAP2 proteins is involved in commitment to the production and maturation of gametocytes. The data suggest a mechanism of commitment to gametocytogenesis in Plasmodium consistent with a positive feedback loop involving PBAP2-G, which might be exploited to prevent the transmission of this pernicious parasite.
Oxamniquine resistance evolved in the human blood fluke (Schistosoma mansoni) in Brazil in the 1970s. We crossed parental parasites differing ~500-fold in drug response, determined drug sensitivity and marker segregation in clonally-derived F2s, and identified a single QTL (LOD=31) on chromosome 6. A sulfotransferase was identified as the causative gene using RNAi knockdown and biochemical complementation assays and we subsequently demonstrated independent origins of loss-of-function mutations in field-derived and laboratory-selected resistant parasites. These results demonstrate the utility of linkage mapping in a human helminth parasite, while crystallographic analyses of protein-drug interactions illuminate the mode of drug action and provide a framework for rational design of oxamniquine derivatives that kill both S. mansoni and S. haematobium, the two species responsible for >99% of schistosomiasis cases worldwide.
Tapeworms cause debilitating neglected diseases that can be deadly and often require surgery due to ineffective drugs. Here we present the first analysis of tapeworm genome sequences using the human-infective species Echinococcus multilocularis, E. granulosus, Taenia solium and the laboratory model Hymenolepis microstoma as examples. The 115-141 megabase genomes offer insights into the evolution of parasitism. Synteny is maintained with distantly related blood flukes but we find extreme losses of genes and pathways ubiquitous in other animals, including 34 homeobox families and several determinants of stem cell fate. Tapeworms have species-specific expansions of non-canonical heat shock proteins and families of known antigens, specialised detoxification pathways, and a metabolism finely tuned to rely on nutrients scavenged from their hosts. We identify new potential drug targets, including those on which existing pharmaceuticals may act. The genomes provide a rich resource to underpin the development of urgently needed treatments and control.
HSP70; parasitism; Cestoda; cysticercosis; echinococcosis; Platyhelminthes
Spatial relationships within the eukaryotic nucleus are essential for proper nuclear function. In Plasmodium falciparum, the repositioning of chromosomes has been implicated in the regulation of the expression of genes responsible for antigenic variation, and the formation of a single, peri-nuclear nucleolus results in the clustering of rDNA. Nevertheless, the precise spatial relationships between chromosomes remain poorly understood, because, until recently, techniques with sufficient resolution have been lacking. Here we have used chromosome conformation capture and second-generation sequencing to study changes in chromosome folding and spatial positioning that occur during switches in var gene expression. We have generated maps of chromosomal spatial affinities within the P. falciparum nucleus at 25 Kb resolution, revealing a structured nucleolus, an absence of chromosome territories, and confirming previously identified clustering of heterochromatin foci. We show that switches in var gene expression do not appear to involve interaction with a distant enhancer, but do result in local changes at the active locus. These maps reveal the folding properties of malaria chromosomes, validate known physical associations, and characterize the global landscape of spatial interactions. Collectively, our data provide critical information for a better understanding of gene expression regulation and antigenic variation in malaria parasites.
Plasmodium falciparum; antigenic variation; genome conformation capture; 3C; HiC
Chemical genetics and a global comparative analysis of phosphorylation and phospholipids in vivo show that PKG is the upstream regulator that induces calcium signals that enable Plasmodium to progress through its complex life cycle.
Many critical events in the Plasmodium life cycle rely on the controlled release of Ca2+ from intracellular stores to activate stage-specific Ca2+-dependent protein kinases. Using the motility of Plasmodium berghei ookinetes as a signalling paradigm, we show that the cyclic guanosine monophosphate (cGMP)-dependent protein kinase, PKG, maintains the elevated level of cytosolic Ca2+ required for gliding motility. We find that the same PKG-dependent pathway operates upstream of the Ca2+ signals that mediate activation of P. berghei gametocytes in the mosquito and egress of Plasmodium falciparum merozoites from infected human erythrocytes. Perturbations of PKG signalling in gliding ookinetes have a marked impact on the phosphoproteome, with a significant enrichment of in vivo regulated sites in multiple pathways including vesicular trafficking and phosphoinositide metabolism. A global analysis of cellular phospholipids demonstrates that in gliding ookinetes PKG controls phosphoinositide biosynthesis, possibly through the subcellular localisation or activity of lipid kinases. Similarly, phosphoinositide metabolism links PKG to egress of P. falciparum merozoites, where inhibition of PKG blocks hydrolysis of phosphatidylinositol (4,5)-bisphosphate. In the face of an increasing complexity of signalling through multiple Ca2+ effectors, PKG emerges as a unifying factor to control multiple cellular Ca2+ signals essential for malaria parasite development and transmission.
Malaria, caused by Plasmodium spp. parasites, is a profound human health problem. Plasmodium parasites progress through a complex life cycle as they move between infected humans and blood-feeding mosquitoes. We know that tight regulation of calcium ion levels within the cytosol of the parasite is critical to control multiple signalling events in their life cycle. However, how these calcium levels are controlled remains a mystery. Here, we show that a single protein kinase, the cGMP-dependent protein kinase G (PKG), controls the calcium signals that are critical at three different points of the life cycle: (1) for the exit of the merozoite form of the parasite from human erythrocytes (red blood cells), (2) for the cellular activation that happens when Plasmodium sexual transmission stages are ingested by a blood-feeding mosquito, and (3) for the productive gliding of the ookinete, which is the parasite stage that invades the mosquito midgut. We provide initial evidence that the universal role of PKG relies on the production of lipid precursors which then give rise to inositol (1,4,5)-trisphosphate (IP3), a messenger molecule that serves as a signal for the release of calcium from stores within the parasite. This signalling pathway provides a potential target to block both malaria development in the human host and transmission to the mosquito vector.
Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.
Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data.
The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.
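The point that erroneous joins can inflate assembly statistics can be made concrete with N50, a commonly reported length metric. The sketch below is illustrative only: the contig lengths are invented, gap sizes between joined contigs are ignored, and the abstract does not state which statistics the evaluation itself reported.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Hypothetical contig lengths (arbitrary units).
    contigs = [100, 80, 60, 40, 20]
    # One correct join (100 + 80) lengthens a scaffold legitimately.
    correct_scaffolds = [180, 60, 40, 20]
    # A further incorrect join (180 + 60) raises N50 again although the order is wrong.
    misjoined_scaffolds = [240, 40, 20]

    print(n50(contigs), n50(correct_scaffolds), n50(misjoined_scaffolds))
```

Because N50 depends only on lengths, it cannot distinguish the correct join from the incorrect one, which is why joins need to be checked against a reference or another independent source of evidence.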
Globodera pallida is a devastating pathogen of potato crops, making it one of the most economically important plant parasitic nematodes. It is also an important model for the biology of cyst nematodes. Cyst nematodes and root-knot nematodes are the two most important plant parasitic nematode groups and together represent a global threat to food security.
We present the complete genome sequence of G. pallida, together with transcriptomic data from most of the nematode life cycle, particularly focusing on the life cycle stages involved in root invasion and establishment of the biotrophic feeding site. Despite the relatively close phylogenetic relationship with root-knot nematodes, we describe a very different gene family content between the two groups and in particular extensive differences in the repertoire of effectors, including an enormous expansion of the SPRY domain protein family in G. pallida, which includes the SPRYSEC family of effectors. This highlights the distinct biology of cyst nematodes compared to the root-knot nematodes that were, until now, the only sedentary plant parasitic nematodes for which genome information was available. We also present in-depth descriptions of the repertoires of other genes likely to be important in understanding the unique biology of cyst nematodes and of potential drug targets and other targets for their control.
The data and analyses we present will be central in exploiting post-genomic approaches in the development of much-needed novel strategies for the control of G. pallida and related pathogens.
Although asexual reproduction via clonal propagation has been proposed as the principal reproductive mechanism across parasitic protozoa of the Leishmania genus, sexual recombination has long been suspected, based on hybrid marker profiles detected in field isolates from different geographical locations. The recent experimental demonstration of a sexual cycle in Leishmania within sand flies has confirmed the occurrence of hybridisation, but knowledge of the parasite life cycle in the wild still remains limited. Here, we use whole genome sequencing to investigate the frequency of sexual reproduction in Leishmania, by sequencing the genomes of 11 Leishmania infantum isolates from sand flies and 1 patient isolate in a focus of cutaneous leishmaniasis in the Çukurova province of southeast Turkey. This is the first genome-wide examination of a vector-isolated population of Leishmania parasites. A genome-wide pattern of patchy heterozygosity and SNP density was observed both within individual strains and across the whole group. Comparisons with other Leishmania donovani complex genome sequences suggest that these isolates are derived from a single cross of two diverse strains with subsequent recombination within the population. This interpretation is supported by a statistical model of the genomic variability for each strain compared to the L. infantum reference genome strain as well as genome-wide scans for recombination within the population. Further analysis of these heterozygous blocks indicates that the two parents were phylogenetically distinct. Patterns of linkage disequilibrium indicate that this population reproduced primarily clonally following the original hybridisation event, but that some recombination also occurred. This observation allowed us to estimate the relative rates of sexual and asexual reproduction within this population, to our knowledge the first quantitative estimate of these events during the Leishmania life cycle.
Sexual reproduction is predicted to be a rare event in Leishmania parasites, as evidenced by detection of rare parasite hybrids in natural populations using molecular methods. Recently, a sexual cycle has been detected experimentally in parasites within the sand fly vector (that transmits this pathogenic microorganism to mammalian species including man, causing human leishmaniasis). In this study, we have used whole genome sequencing to investigate genetic variation at the highest level of resolution in Leishmania parasites isolated from sand flies in a defined focus of leishmaniasis in southeast Turkey. Using a range of analytical tools, we show that variation in these parasites arose following a single cross between two diverse strains and subsequent recombination between the progeny, despite mainly clonal reproduction in the parasite population. We have thus been able to derive quantitative estimates of the relative rates of sexual and asexual reproduction during the Leishmania life cycle for the first time, information that will be critical to our understanding of the epidemiology and evolution of this genus.
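Linkage disequilibrium between two biallelic sites is often summarised with the r² statistic. The sketch below computes r² from phased haplotypes coded 0/1. It is a textbook calculation under simplifying assumptions (known phase, two alleles per site) and is not a description of the analysis pipeline used in the study; the function name and example data are hypothetical.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1,
    one pair per phased haplotype.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Alleles always co-occur: complete association.
    linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
    # Every combination equally frequent: no association.
    shuffled = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(r_squared(linked), r_squared(shuffled))
```

In a strictly clonal population, associations between sites persist regardless of the distance between them, whereas recombination erodes them, which is the intuition behind using linkage disequilibrium patterns to compare clonal and sexual reproduction.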
Advances in both high-throughput sequencing and whole-genome amplification (WGA) protocols have allowed genomes to be sequenced from femtograms of DNA, for example from individual cells or from precious clinical and archived samples. Using the highly curated Caenorhabditis elegans genome as a reference, we have sequenced and identified errors and biases associated with Illumina library construction, library insert size, different WGA methods and genome features such as GC bias and simple repeat content. Detailed analysis of the reads from amplified libraries revealed characteristics suggesting that the majority of amplified fragment ends are identical but inverted versions of each other. Read coverage in amplified libraries is correlated with both tandem and inverted repeat content, while GC content only influences sequencing in long-insert libraries. Nevertheless, single nucleotide polymorphism (SNP) calls and assembly metrics from reads in amplified libraries show comparable results with unamplified libraries. To exploit the full potential of WGA for questions of real biological interest, this article highlights the importance of recognizing additional sources of errors from amplified sequence reads and discusses the potential implications in downstream analyses.
whole-genome amplification; Illumina; SNPs; genome assembly; chimeric DNA
Defining mechanisms by which Plasmodium virulence is regulated is central to understanding the pathogenesis of human malaria. Serial blood passage of Plasmodium through rodents¹⁻³, primates⁴ or humans⁵ increases parasite virulence, suggesting that vector transmission regulates Plasmodium virulence within the mammalian host. In agreement, disease severity can be modified by vector transmission⁶⁻⁸, which is assumed to ‘reset’ Plasmodium to its original character³. However, direct evidence that vector transmission regulates Plasmodium virulence is lacking. Here we utilise mosquito transmission of serially blood passaged (SBP) Plasmodium chabaudi chabaudi⁹ to interrogate regulation of parasite virulence. Analysis of SBP P.c. chabaudi before and after mosquito transmission demonstrates that vector transmission intrinsically modifies the asexual blood-stage parasite, which, in turn, modifies the elicited mammalian immune response, which, in turn, attenuates parasite growth and associated pathology. Attenuated parasite virulence associates with modified expression of the pir multi-gene family. Vector transmission of Plasmodium therefore regulates gene expression of probable variant antigens in the erythrocytic cycle, modifies the elicited mammalian immune response, and thus regulates parasite virulence. These results place the mosquito at the centre of our efforts to dissect mechanisms of protective immunity to malaria for the development of an effective vaccine.
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host–parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
Chromatin diminution is the programmed elimination of specific DNA sequences during development. It occurs in diverse species, but the function(s) of diminution and the specificity of sequence loss remain largely unknown. Diminution in the nematode Ascaris suum occurs during early embryonic cleavages and leads to the loss of germline genome sequences and the formation of a distinct genome in somatic cells. We found that ~43 Mb (~13%) of genome sequence is eliminated in A. suum somatic cells, including ~12.7 Mb of unique sequence. The eliminated sequences and location of the DNA breaks are the same in all somatic lineages from a single individual, and between different individuals. At least 685 genes are eliminated. These genes are preferentially expressed in the germline and during early embryogenesis. We propose that diminution is a mechanism of germline gene regulation that specifically removes a large number of genes involved in gametogenesis and early embryogenesis.
WormBase (http://www.wormbase.org/) is a highly curated resource dedicated to supporting research using the model organism Caenorhabditis elegans. With an electronic history predating the World Wide Web, WormBase contains information ranging from the sequence and phenotype of individual alleles to genome-wide studies generated using next-generation sequencing technologies. In recent years, we have expanded the contents to include data on additional nematodes of agricultural and medical significance, bringing the knowledge of C. elegans to bear on these systems and providing support for underserved research communities. Manual curation of the primary literature remains a central focus of the WormBase project, providing users with reliable, up-to-date and highly cross-linked information. In this update, we describe efforts to organize the original atomized and highly contextualized curated data into integrated syntheses of discrete biological topics. Next, we discuss our experiences coping with the vast increase in available genome sequences made possible through next-generation sequencing platforms. Finally, we describe some of the features and tools of the new WormBase Web site that help users better find and explore data of interest.
We describe an analysis of genome variation in 825 Plasmodium falciparum samples from Asia and Africa that reveals an unusual pattern of parasite population structure at the epicentre of artemisinin resistance in western Cambodia. Within this relatively small geographical area we have discovered several distinct but apparently sympatric parasite subpopulations with extremely high levels of genetic differentiation. Of particular interest are three subpopulations, all associated with clinical resistance to artemisinin, which have skewed allele frequency spectra and remarkably high levels of haplotype homozygosity, indicative of founder effects and recent population expansion. We provide a catalogue of SNPs that show high levels of differentiation in the artemisinin-resistant subpopulations, including codon variants in various transporter proteins and DNA mismatch repair proteins. These data provide a population genetic framework for investigating the biological origins of artemisinin resistance and for defining molecular markers to assist its elimination.
The current standard to assess pentavalent antimonial (SSG) susceptibility of Leishmania is a laborious in vitro assay whose result has little clinical value because SSG-resistant parasites are also found in SSG-cured patients. Candidate genetic markers for clinically relevant SSG-resistant parasites identified by full genome sequencing were here validated on a larger set of clinical strains. We show that 3 genomic locations suffice to specifically detect the SSG-resistant parasites found only in patients experiencing SSG treatment failure. This finding allows the development of rapid assays to monitor the emergence and spread of clinically relevant SSG-resistant Leishmania parasites.
The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most economically important parasites of livestock and humans.
Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug target families, providing the most comprehensive understanding yet of the action of several important anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut that provide a rich source of vaccine and drug target candidates.
The H. contortus genome and transcriptome provide an essential platform for postgenomic research in this and other important strongylid parasites.
Malaria elimination strategies require surveillance of the parasite population for genetic changes that demand a public health response, such as new forms of drug resistance.¹,² Here we describe methods for large-scale analysis of genetic variation in Plasmodium falciparum by deep sequencing of parasite DNA obtained from the blood of patients with malaria, either directly or after short-term culture. Analysis of 86,158 exonic SNPs that passed genotyping quality control in 227 samples from Africa, Asia and Oceania provides genome-wide estimates of allele frequency distribution, population structure and linkage disequilibrium. By comparing the genetic diversity of individual infections with that of the local parasite population, we derive a metric of within-host diversity that is related to the level of inbreeding in the population. An open-access web application has been established for exploration of regional differences in allele frequency and of highly differentiated loci in the P. falciparum genome.
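One way to express a comparison between within-infection and population-level diversity is an inbreeding-style index, 1 - H_w / H_s, where H_w is the mean heterozygosity within one infection and H_s is the mean heterozygosity in the local population. The abstract does not give the formula, so the sketch below is an assumption made for illustration, with hypothetical function names and allele frequencies, and should not be read as the authors' exact procedure.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def mean_heterozygosity(freqs: Sequence[float]) -> float:
    """Mean expected heterozygosity across a set of SNPs."""
    return sum(expected_heterozygosity(p) for p in freqs) / len(freqs)


def within_host_index(sample_freqs: Sequence[float],
                      population_freqs: Sequence[float]) -> float:
    """Illustrative index 1 - H_w / H_s (assumed form, see the lead-in).

    sample_freqs: per-SNP allele frequencies within one infection,
                  e.g. estimated from read counts.
    population_freqs: per-SNP allele frequencies in the local population.
    """
    h_w = mean_heterozygosity(sample_freqs)
    h_s = mean_heterozygosity(population_freqs)
    if h_s == 0:
        raise ValueError("population shows no diversity at these SNPs")
    return 1.0 - h_w / h_s


if __name__ == "__main__":
    population = [0.3, 0.5, 0.2]      # hypothetical population frequencies
    clonal = [0.0, 1.0, 0.0]          # single genotype: no within-host diversity
    mixed = [0.3, 0.5, 0.2]           # infection as diverse as the population
    print(within_host_index(clonal, population))
    print(within_host_index(mixed, population))
```

Under this assumed form, values near 1 indicate an infection dominated by a single genotype, and values near 0 indicate an infection about as diverse as the surrounding population.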
Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/.
Genome assembly; validation; evaluation
Genome projects now produce draft assemblies within weeks thanks to advanced high-throughput sequencing technologies. For milestone projects like E. coli or H. sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT, post-assembly genome-improvement toolkit) to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence, and to exploit reference genomes (if available) for improving scaffolding and generating annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes approximately 24 hours: it doubles the average contig size and annotates over 4300 gene models.
Next generation sequencing; automatic finishing; gap closing; genome annotation; contig ordering