|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: TK JTJ NK NS MB. Performed the experiments: ASF AJR IJT TDO MH TK. Analyzed the data: JJD PM TT KT MCL SAA IJT PJAC KH JM. Contributed reagents/materials/analysis tools: NK TY. Wrote the paper: TK JTJ JAC AGM MB. Contributed equally as second authors: JAC JJD KH NK PM TT IJT. Assembled the genome and directed annotation: ASF AJR IJT TDO MH TK. Analyzed RNAi pathway and neuropeptides: JJD PM AGM. Analyzed effector analysis: PJAC JTJ. Analyzed embryogenesis and stress response: KH JM. Repeat analysis: IJT. orthMCL and HGT analysis: JAC. Peptidase analysis: SAA. Chemoreception analysis: TT KT MCL.
Bursaphelenchus xylophilus is the nematode responsible for a devastating epidemic of pine wilt disease in Asia and Europe, and represents a recent, independent origin of plant parasitism in nematodes, ecologically and taxonomically distinct from other nematodes for which genomic data is available. As well as being an important pathogen, the B. xylophilus genome thus provides a unique opportunity to study the evolution and mechanism of plant parasitism. Here, we present a high-quality draft genome sequence from an inbred line of B. xylophilus, and use this to investigate the biological basis of its complex ecology which combines fungal feeding, plant parasitic and insect-associated stages. We focus particularly on putative parasitism genes as well as those linked to other key biological processes and demonstrate that B. xylophilus is well endowed with RNA interference effectors, peptidergic neurotransmitters (including the first description of ins genes in a parasite) stress response and developmental genes and has a contracted set of chemosensory receptors. B. xylophilus has the largest number of digestive proteases known for any nematode and displays expanded families of lysosome pathway genes, ABC transporters and cytochrome P450 pathway genes. This expansion in digestive and detoxification proteins may reflect the unusual diversity in foods it exploits and environments it encounters during its life cycle. In addition, B. xylophilus possesses a unique complement of plant cell wall modifying proteins acquired by horizontal gene transfer, underscoring the impact of this process on the evolution of plant parasitism by nematodes. Together with the lack of proteins homologous to effectors from other plant parasitic nematodes, this confirms the distinctive molecular basis of plant parasitism in the Bursaphelenchus lineage. The genome sequence of B. xylophilus adds to the diversity of genomic data for nematodes, and will be an important resource in understanding the biology of this unusual parasite.
Bursaphelenchus xylophilus is an important plant pathogen, responsible for an epidemic of pine wilt disease in Asia and Europe. B. xylophilus has acquired the ability to parasitise plants independently from other economically important nematodes and has a complex life cycle that includes fungal feeding and a stage associated with an insect, as well as plant parasitism. We have sequenced the genome of B. xylophilus and used it as a resource to understand disease mechanisms and the biological basis of its complex ecology. The ability to break down cellulose, the major component of the plant cell wall, is a major problem for plant parasitic nematodes as few animals can produce the required enzymes (cellulases). Previous work has shown that other plant parasitic nematodes have acquired cellulases from bacteria but we show that all Bursaphelenchus cellulases were most likely acquired independently from fungi. We also describe a complex set of genes encoding enzymes that can break down proteins and other molecules, perhaps reflecting the range of organisms with which B. xylophilus interacts during its life cycle. The genome sequence of Bursaphelenchus represents an important step forward in understanding its biology, and will contribute to efforts to control the devastating disease it causes.
The nematode Caenorhabditis elegans was the first multicellular organism for which a complete genome sequence was available, and subsequent genomics research on a wider range of nematodes has provided information on many important biological processes and is underpinned by the information developed for C. elegans . While C. elegans is a free-living bacterial feeder, nematodes exhibit a wide range of ecological interactions, including important parasites of humans and livestock that have huge agricultural and medical impacts .
Plant parasitic nematodes cause damage to crops globally. Within the Nematoda, the ability to parasitise plants has evolved independently on several occasions  and nematodes use a wide range of strategies to parasitise plants. Some nematodes are migratory ectoparasites which remain outside the roots and have a very limited interaction with their hosts. Migratory endoparasitic nematodes invade their host and cause extensive damage as they move through the host and feed. Sedentary endoparasitic nematodes, including the cyst nematodes and root knot nematodes, have highly complex biotrophic interactions with their hosts. These are the most damaging plant nematodes and consequently genomes for two species of root knot nematode have been reported ,  with others in progress for cyst nematodes. Currently there is no genome sequence for any migratory endoparasitic nematode.
The pine wood nematode Bursaphelenchus xylophilus is a migratory endoparasite that causes severe damage to forestry and forest ecosystems (reviewed in ). B. xylophilus is native to North America such that trees there have evolved tolerance or resistance to the pathogen. However, at the start of the 20th Century it was introduced into Japan and has subsequently spread to other countries in Asia where no natural resistance is present, causing huge damage in an on-going epidemic of pine wilt disease. Despite global quarantine efforts B. xylophilus was recently introduced into Portugal  and has now also spread to Spain.
Most species within the Bursaphelenchus genus, including the closest relatives of B. xylophilus , are fungal feeders that are transmitted by vector insects only to dead or dying trees during oviposition. B. xylophilus and the few other pathogenic species described to date are unique in their capacity to feed on live trees as well as on the fungi that colonise dead or dying trees, so these species may represent a relatively recent, independent origin of plant parasitism. The nematode is a member of the Aphelenchoididae and belongs to clade 10  while most other major plant parasites including Meloidogyne species belong to clade12  (Figure 1). The life cycle of B. xylophilus is summarised in Figure 2 and the infection and disease process has been reviewed by Mamiya  and by Jones et al. .
Little was known about the molecular basis of the interactions between B. xylophilus and its host plants. A series of advances have been made in the last few years, underpinned by a relatively small scale Expressed Sequence Tag (EST) project on this nematode . Genes involved in parasitism were identified and characterised, including those encoding a wide range of plant cell wall degrading or modifying proteins –. As in other plant parasitic nematodes, it is now clear that horizontal gene transfer has played an important role in the evolution of plant parasitism in B. xylophilus .
In order to shed further light on the mechanisms of parasitism used by B. xylophilus and to investigate the genetic and genomic factors involved in the evolution of parasitism, we have produced a high quality genome sequence from an inbred line of this nematode. We describe the assembly and initial annotation and characterisiation of the genome sequence, then interrogate this dataset to identify genes involved in key biological processes, including those associated with chemosensation, neurotransmission, alimentation, stress-responses and development. Further, we identify genes potentially important in functional genetics such as the RNAi pathway and putative control targets such as G-protein coupled receptors (GPCRs), peptidases and neuropeptide genes. This genome sequence allows a comparison of genes involved in plant parasitism across nematode clades and expands our knowledge of the role played by horizontal gene transfer in the evolution of plant parasitism by nematodes.
The B. xylophilus Ka4 population, which originated in Ibaraki prefecture Japan and has been maintained for over 15 years in the Nematology Lab in FFPRI was previously used to generate biological material for ESTs. The Ka4C1 inbred line was established by sister-brother mating over 8 generations from the Ka4 population and was used to generate of material for genome sequencing.
Nematodes were cultivated for 10 days on Botrytis cinerea grown on autoclaved barley grains with antibiotics (100 µg/ml streptomycin and 25 µg/ml chloramphenicol). The nematodes were collected using a modified Baermann funnel technique for 3 h at 25°C and cleaned by sucrose flotation  followed by three rinses in 1x PBST. The cleaned nematodes were incubated in 1x PBST containing antibiotics (100 µg/ml streptomycin and 25 µg/ml chloramphenicol) at 23°C for 8 hours to allow nematodes to digest fungal residues remaining in their guts before use. Genomic DNA was extracted from nematodes using GenomeTip-100G (Qiagen) following the manufacturer's instructions.
Poly-(A) + RNA was extracted from mixed-stage nematodes or fourth-stage dispersal larva (DL4 or LIV) collected from the vector insect beetle as described previously  and used for the construction of EST libraries.
To determine the B. xylophilus chromosome number, chromosomes were observed in early embryos by DAPI staining and confocal laser-scanning microscopy . The genome size of B. xylophilus was estimated using real time PCR as described in Welhen et al . Translation elongation factor 1 alpha (genbank no GU130155) was used as a reference. DNA concentration was calculated using Qubit (Invitrogen) and the real time PCR reaction was performed using the StepOnePlus system (Applied BioSystems) with SYBR Green I. These protocols can be extremely sensitive and consequently the experiments were repeated in triplicate.
A library of single stranded DNA fragments was obtained from B. xylophilus genomic DNA using Roche/454 standard procedures and a total of 993,947 single reads were obtained from this library using a 454 FLX Genome Sequencer. For paired end sequencing, libraries of 8 kb and 20 kb fragments were prepared using the manufacturer's standard protocol after shearing using a Hydroshear device. A total of 1,729,870 reads for the 8 k library and 223,236 reads for the 20 k library were obtained after standard Roche quality trimming. Of these, 89% and 88% were recognized as paired by the Newbler assembler (version 2.3). The 454 FLX sequence data provided 13× coverage of the genome sequence.
Genomic DNA was sequenced on Illumina Genome Analyser I (GAI) using standard procedures. The sequences were 51 bases long. A total of 71,593,088 reads passed the default filter, giving a total coverage of about 48 genome equivalents.
Sequence data from 454 FLX and Illumina GAI were assembled using the Newbler de novo assembler (version 2.3) and Velvet assembler (version 1.0.12)  respectively. The result from each assembly was combined using Minimus2 (sourceforge.net/projects/amos/) and contigs supported by both assemblies were used as fake unpaired capillary reads in the subsequent Newbler assembly. The resulting assembly was improved using three different methods: AbacasII  to merge small contigs; IMAGE  to iteratively map and assemble short reads to close gaps; and iCORN  to iteratively correct single base substitutions and small indels.
As an indirect measure of completeness of the assembly, a search for orthologues was performed by CEGMA (ver. 2.0) software using CEGs (core eukaryotic genes); a set of 248 extremely highly conserved genes thought to be present in almost all eukaryotes in a reduced number of paralogues .
EST clustering was performed using PartiGene (ver. 3.0), a software pipeline designed to analyze and organize EST data sets . Briefly, EST sequences were clustered into groups (putative genes) on the basis of sequence similarity and then clusters were assembled to yield consensus sequences using Phrap (P. Green, unpublished). Mapping of individual ESTs or clustered ESTs to the genome assembly was performed using PASA software .
EST data generated in this study and previously obtained by capillary sequencing  were used to predict genes in the assembly. The total number of ESTs used was 82,100.
A reference dataset of 565 B. xylophilus protein encoding genes was manually curated from EST clusters and predictions of highly conserved genes using CEGMA . 365 of these were used to train ab-initio gene predictors Augustus  and SNAP  and 200 were used to evaluate accuracy of the predictions. EVidenceModeler  was used to combine all predictions from gene predictors Augustus, SNAP and GeneMark.hmm , EST mapping data from PASA  and protein homology data against the Pfam database using GeneWise2 . Gene prediction accuracy was computed at the level of nucleotides, exons and complete genes on 200 manually-curated gene models (Figure S1) as described previously .
Initial functional annotation was performed using InterProScan to search against the InterPro protein family database, which included PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, PIR SuperFamily and SUPERFAMILY . The latest version Pfam search (ver. 24.0) , which uses HMMER3 and is more sensitive than the previous version packed in InterProScan, was also performed independently for B. xylophilus genome. Gene Ontology annotation was derived using Blast2GO software  based on the BLAST match against NCBI non-redundant (NR) proteins with an E-value cutoff of 1e-10 and InterProScan results.
Assignments to conserved positions in metabolic and regulatory pathways were performed using KOBAS software  based on the KEGG annotation resource . KEGG genes and KO term annotations were assigned based on similarity searches with a 1e-5 E-value cutoff and a rank cutoff 5. Significantly enriched pathway terms between two organisms were identified by frequencies of terms with chi-square and FDR correlation tests.
To study the evolution of gene families across nematodes within the order Rhabditida, we used the predicted protein sets from all 9 genomes available in WormBase release WS221 (www.wormbase.org) – C. elegans, C. brenneri, C. briggsae, C. japonicum, C. remaneri, Meloidogyne hapla, M. incognita, Pristionchus pacificus and Brugia malayi, together with predicted proteins of B. xylophilus. Version 2.0 of the OrthoMCL pipeline  was used to cluster proteins into families of orthologous genes, with default settings and the BLAST parameters recommended in the OrthoMCL documentation. We reconstructed the evolution of gene families on a phylogeny for these 10 species, based on aligning amino-acid sequences from single-copy gene families using Muscle v3.6 , and constructing coding-sequence nucleotide alignments based on these. Phylogenetic inference was performed using BEAST v.1.6.1  from 10 million Markov Chain Monte Carlo (MCMC) generations under a strict clock model using the SRD  model for each gene partition. Three independent MCMC runs converged to the same posterior probability. Birth and death of gene families was inferred under Dollo parsimony using the Dollop program from v3.69 of the Phylip package .
To look for potential horizontal gene transfers specifically into the B. xylophilus lineage, we used BLASTP to compare predicted B. xylophilus protein sequences against the NCBI NR database, producing a candidate set of laterally transferred genes that either had no significant BLAST hit (E-value≤10−5) to any nematode sequence or had significant hits only to genes from Aphelenchoidoidea and no other nematode. For each of these candidates, amino acid sequence data was extracted from the NCBI database, aligned using Muscle v3.6 and ML phylogenetic trees estimated using the best-fitting model under the AICc criterion in RaxML v7.2.8. . Clade support was estimated using 100 non-parametric bootstrap replicates in RaxML, and approximately unbiased (AU) statistical tests of tree topology were performed in Consel v1.19 .
The detection and annotation of ncRNA molecules was performed using the LeARN pipeline . This pipeline contains four independent methods: tRNAscanSE for transfer RNA (tRNA) gene detection, NCBI-BLASTN versus a ribosomal RNA (rRNA) sequence database, the Rfam database (release 9.0) to detect common ncRNA families and a mirfold-based pipeline using the mirBase library as a source of micro RNA (miRNA) candidates.
Transposable elements (TEs) in the assembly were identified using two approaches. The first stage consisted of de novo identification of repeat families in the assembly based on signatures of transposable elements and assuming fragments of TEs are present throughout the genome. Long terminal repeat (LTR) retrotransposons were identified using LTRharvest which searches for two near-identical copies of an LTR flanked by target site duplications that are close to each other. We also used RepeatModeler (http://www.repeatmasker.org/RepeatModeler.html) which aims to construct repeat consensus from two de novo detection programs (RepeatScout and RECON). Repeats present at less than 10 copies in the genome or that were less than 100 bp were excluded from further analysis. The second approach used homology searching of the assembly sequence against curated TEs using TransposonPSI (http://transposonpsi.sourceforge.net/). UCLUST was used to cluster the candidate sequences (with 80% identity) and create a non-redundant library of repeat consensus sequences.
The annotation of repeat candidates involves a search against RepBase and NCBI non-redundant library. Some of these candidates that have some annotations available from program output (for example, from TransposonPSI) were further checked this way. Manual curation of the candidates was carried out to determine coding regions on intact TEs that are potentially active. We found the majority of unannotated elements contained no ORFs longer than 30 bp nor had any significant matches to repetitive elements in the databases.
RepeatMasker (v3.2.8) was used to calculate the distribution of each repeat and its abundance. Custom perl scripts were used to choose the best match from overlapping matches in RepeatMasker output to avoid calculating the same region twice or more when considering repeat content of the genome.
The CAZymes Analysis Toolkit (CAT)  was used to detect B. xylophilus carbohydrate active enzymes (CAZymes) based on the CAZy database. An annotation method “based on association rules between CAZy families and Pfam domains” was used with an E-value threshold of 0.01, a bitscore threshold of 55 and rule support level 40. The annotation was supplemented and confirmed manually using BLAST search similarities and protein length matches. Expansin-like genes were detected by BLAST search using core modules of known expansin proteins as queries. Putative functions of the proteins were predicted by similarity to known protein modules and presence of catalytic sites using BLASTP search against NCBI's Conserved Domain Database service and InterProScan (www.ebi.ac.uk/Tools/InterProScan).
The MEROPS server was used to identify B. xylophilus putative peptidases. The peptidase candidates were derived from MEROPS batch BLAST . The candidates were manually examined in terms of similarity (E-value cutoff 1e-10) to MEROPS proteins and presence of all catalytic sites.
BLASTP was used to search for B. xylophilus homologues of effectors from M. incognita , Heterodera glycines  and Globodera pallida . An E-value cutoff of 1e-5 was used to identify significant matches. In addition, candidate effectors were sought from the B. xylophilus protein set using a bioinformatic approach. Secreted proteins were identified as those having a potential signal peptide at their N-terminus predicted by SignalP 3.0  and no transmembrane domain within the mature peptide as predicted using TMHMM 2.0 . Novel candidate effectors within this secreted protein dataset were identified as those with no BLASTP matches against the NR protein database (E-value cutoff 0.001).
Orthologues of C. elegans genes necessary for unequal cell divisions were identified from the B. xylophilus protein set by BLASTP search using protein sequences retrieved from WormBase as queries. Their structures were confirmed manually using NCBI and WormBase BLAST.
Seventy-eight proteins known to be involved in core aspects of the C. elegans RNAi pathway were identified from the literature. Protein sequences were retrieved from WormBase (release WS206) and used as queries in TBLASTN and BLASTP searches against the predicted protein and contig databases. All primary BLAST hits returning with a bitscore ≥40 and an E-value≤0.01 were manually translated to amino acid sequence in six reading frames (www.expasy.ch/tools/dna.html), and analysed for identity domain structure by BLASTP (through NCBI's Conserved Domain Database service) and InterProScan. The appropriate reading frame in each case (determined empirically on a case by case basis, but usually that with the largest uninterrupted open reading frame) was then subjected to reciprocal TBLASTN and BLASTP against the C. elegans non-redundant nucleotide and protein databases on the NCBI BLAST server (http://www.ncbi.nlm.nih.gov/BLAST), using default settings. The identity of the top-scoring reciprocal BLAST hit was accepted as identity of the relevant primary hit, as long as that identity was also supported by domain structure analysis.
The predicted B. xylophilus AGOs were analysed further through annotation of conserved RNase-H-like catalytic residue sites and the MID sub-domain which was then used for a further BLAST search against the C. elegans nr protein set (NCBI). The MID domain is highly conserved in functionally comparable AGOs , .
Neuropeptide genes were identified using the BLAST search tool available through the genome consortium website. FLP and NLP search strings were constructed from orthologous flp and nlp transcript sequences taken from C. elegans in the first instance, or another plant-parasitic nematode where a C. elegans orthologue was unavailable. Search strings were constructed by concatenating the mature peptides (including basic cleavage sites) encoded by each orthologue. Where this concatenation resulted in a search string of less than 20 characters, the string was repeated end-to-end at least once. Search strings were entered into BLASTP searches of the predicted proteins, and tBLASTn searches of the genome scaffolds, with E-value set to 1000. INS search strings were created by concatenating functionally conserved A and B peptide regions from the C. elegans ins gene complement, in addition to a series of mammalian and molluscan orthologues. These were used to search the predicted protein set of B. xylophilus, using a higher E-value threshold of 1,000,000. The BLAST returns were annotated for both A and B peptides, which were isolated, concatenated and used as BLAST search strings against the C. elegans nr protein set (NCBI) with an increased E-value of 1,000,000. All reciprocal BLAST returns with a Bit score ≥25 and an E-value≤10 were annotated for conserved A and B peptides, and the C. elegans orthologue with the highest similarity to these domains was accepted as identity. Putative INS orthologues which did not meet the reciprocal BLAST criteria, but which closely resemble the structure of INS A and B peptide domains are included as putative variant (Var) INS orthologues (≤25, ≥20 bit score; E-value ≥10, ≤20). All hits were analysed by eye for the presence of neuropeptide precursor sequences, dibasic cleavage sites, and analysed for the presence of secretory signal peptides using SignalP 3.0 .
Orthologues of C. elegans genes were identified from B. xylophilus protein set using BLASTP. The families of GSTs, UGTs, CYPs, and ABC transporters were identified with InterProScan. Their structures were confirmed manually with BLAST on NCBI and WormBase.
Protein sequences known to be involved in dauer formation and maintenance were retrieved from WormBase and used as search strings in a series of tBLASTn and BLASTP searches against B. xylophilus genome and protein sequences. An E-value cutoff of 1e-10 was used to identify significant matches.
Potential chemoreceptors included 7-transmembrane, G-protein coupled receptors (GPCRs), e. g. Str, Sra and Srg gene superfamilies in C. elegans , and other receptors with transmembrane structures, e. g., gustatory receptors (GURs, orthologues of insect gustatory receptors) in C. elegans, ionotropic glutamate receptors (IRs) in Drosophila melanogaster  and transient receptor potential (TRP) channels , . Other putative GPCRs including those for neurotransmitters (e.g., bioorganic amines) and other signal transduction pathways were also searched. BLASTP and InterProScan were used to search for B. xylophilus orthologues of these proteins. All primary BLASTP hits returning with an E-value ≤ 0.0001 and coverage ratio ≥ 0.7 (apart from insect GURs, which included only hits of very low similarity to nematode GURs) were analysed for identity and domain structure by BLASTP (NCBI's Conserved Domain Database service), WormBase (WS221) and TMHMM 2.0 .
Molecular phylogenetic trees of serpentine receptor and gustatory receptor proteins were built by maximum likelihood method using MEGA5 with the JTT matrix-based model . A few receptors were removed in some families based on very long branch lengths in a preliminary maximum likelihood tree. A maximum likelihood tree with culled proteins was drawn to a scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated.
The contigs resulting from Bursaphelenchus xylophilus assembly were deposited in the EMBL/Genbank/DDBJ databases under accession numbers CADV01000001-CADV01010432.
Our assembly strategy assembled the 6 pairs of nuclear chromosomes (Figure S2) into 1,231 scaffolds, totaling 74.5 Mb, with half of these nucleotides present in scaffolds of at least 1.16 Mb (Table 1). The size of the assembly is in good agreement with our experimental estimate of 69.0±5.5 Mb (see Text S1, Table S1 in Text S2). The genome size of 74.5 Mb is smaller than that of C. elegans and other published nematode genomes except that of Meloidogyne hapla. The GC content of the genome was 40.4%, higher than that of other nematodes except for Pristionchus pacificus. Most of the mitochondrial genome was assembled in a single 13,410 bp scaffold, and shows a similar gene content and organization to that of C. elegans (Figure S3). Analysis of conserved eukaryotic genes (CEGs) showed that 96.77% and 97.98% of CEGs were present as full or partial genes respectively, with an average of 1.08―1.09 genes per CEG (Table 1), suggesting high completeness of the assembly.
The assembly is approximately 22% repetitive, of which only around 1.3% had characteristics of transposable elements (TEs, Table 2). A complete set of tRNA (Table S2 in Text S2) and rRNA genes were found in the genome. In common with other parasitic nematodes, including B. malayi and M. incognita , SL2 trans-splicing appears not to exist in Bursaphelenchus, but we find 25 SL1-like sequences, found in the same tandem repeats as the 5S rRNAs, as in C. elegans.
Analysis of chromosomal rearrangements between B. xylophilus and C. elegans identified a similar pattern of macrosynteny to that found between the more distantly related Trichinella spiralis and C. elegans . Large B. xylophilus scaffolds largely contain genes orthologous to those from a single C. elegans chromosome. These genes are, however, interspersed by genes orthologous to those from other chromosomes (see Text S1, Figure 3).
A total of 18,074 protein coding genes were predicted in the assembly (Table 1). This is fewer than the 20,416 in C. elegans (WormBase WS221) and 19,212 in M. incognita, although it is higher than the number for M. hapla. The average protein length is similar to that of other nematodes, but B. xylophilus displays the largest average exon size (289 bp) and the smallest average number of exons per gene (4.5). The Bursaphelenchus genome shows a number of characteristics of compact parasite genomes, for example having relatively few, short introns like M. hapla, but has a similar repetitive element content to other published nematode genomes, and is overall only slightly smaller.
Our automated annotation of the B. xylophilus proteome assigned some functional information to a total of 12,483 proteins (69%) (see Text S1 Figure S4). The top 20 Pfam hits in B. xylophilus are shown in Figure 4, and compared with hits in C. elegans. As part of our annotation approach, B. xylophilus proteins were mapped to pathways defined by KEGG, and pathways that are under- and over-represented in this genome compared to C. elegans are shown in Table S3 in Text S2 and are discussed in sections focusing on particular biological features below.
The combined predicted proteins from B. xylophilus and 9 other nematode species were grouped into 27,547 families of orthologues and an additional 51,942 singleton proteins. We used a molecular phylogeny based on single-copy gene families to reconstruct the distribution and evolutionary dynamics of these gene families (Figure 5), and find that the B. xylophilus genome has been relatively conserved over the long divergence from other plant parasitic nematodes. The pattern of sharing of gene families between genomes (Figure 6) shows little obvious phylogenetic pattern, but identifies relatively small numbers of genes - 202 genes shared by the two plant parasitic genomes, and 144 genes shared by Pristionchus and Bursaphelenchus that both show a close association with insects during their lifecycle – that could be implicated in these particular specializations.
The plant cell wall is the primary barrier faced by most plant parasites and the production of enzymes able to break down this cell wall is thus of critical importance. A summary of the carbohydrate active enzymes (CAZymes) and expansin-like proteins which may modify plant cell walls detected in B. xylophilus and other nematodes is shown in Table 3. B. xylophilus contains 34 putative plant cell wall modifying enzymes but compared to other plant parasitic nematodes its composition is unique. Most interestingly, glycoside hydrolase family 45 (GH45) cellulases are present only in B. xylophilus. Other plant parasitic nematodes have GH5 proteins that degrade cellulose but no such genes are present in B. xylophilus. In addition, GH30 xylanases, GH43 arabinases and GH28 pectinases were also absent in B. xylophilus.
Cell wall degrading CAZymes found in plant parasitic nematodes are thought to have been acquired via horizontal gene transfer (HGT) because similar genes are absent in almost all other nematodes and because they are most similar to genes from bacteria or fungi , . GH5 cellulases have been found in many plant parasitic Tylenchoidea including Meloidogyne, Globodera and Heterodera. Recently, the genome sequence of P. pacificus revealed that this nematode also has GH5 cellulases . However phylogenetic analysis suggested that these cellulases were not closely related to those in the Tylenchida and that they are likely to have been acquired independently from different sources . B. xylophilus GH45 cellulases have not been found in any other nematode genus and are most similar to those from fungi. Thus these genes have been hypothesised to be acquired via HGT from fungi . In a phylogenetic analysis all 11 GH45 proteins found in the B. xylophilus genome are grouped in a highly supported monophyletic group and embedded within a clade of fungal homologues (Figure S5). This supports the idea of HGT from fungi with subsequent duplication within the B. xylophilus genome. Recent analysis has revealed that distribution of GH45 proteins is limited to the genus Bursaphelenchus and its sister genus Aphelenchoides (T Kikuchi unpublished results). The absence of GH5 genes in the B. xylophilus genome and the absence of GH45 proteins in Tylenchida nematodes support these hypotheses and suggest that HGT events have repeatedly played important roles in the evolution of plant parasitism in nematodes.
A systematic evolutionary study of plant cell wall modifying genes in Tylenchoidea concluded that those genes were acquired by multiple HGT events from bacteria closely associated with the ancestors of these nematodes followed by gene duplication . B. xylophilus is closely associated with fungi and it is likely that this feeding strategy is ancestral for this group as most Bursaphelenchus species are solely fungal feeders.
In addition to plant cell wall degrading enzymes, CAZymes are present in B. xylophilus which potentially degrade the fungal cell wall. Chitin is one of the main components of fungal cell walls. Proteins related to chitin degradation were identified in the B. xylophilus genome. In comparison to the two Meloidogyne species, P. pacificus and B. malayi the number of chitin-related CAZymes in B. xylophilus is increased, likely reflecting its fungal feeding activity (Table 4). C. elegans has a much larger number of GH18 proteins than B. xylophilus. This may be because those proteins have been used in C. elegans for specific biological features such as bacterial feeding. Interestingly, six GH16 proteins which may degrade beta-1,3-glucan, another core component of the fungal cell wall, have been identified in the genome while no homologues have been found in other nematodes (Table 4). Because the B. xylophilus GH16 β-1,3-glucanase genes are most similar to those from bacteria, it has been suggested that they were acquired from bacteria which were closely associated with its ancestor . This suggests that HGT processes, similar to those associated with plant cell wall modifying enzymes of other parasitic nematodes, have enhanced the ability of Bursaphelenchus spp. to feed on fungi.
Not all of the carbohydrate-active enzymes over-represented in the B. xylophilus genome are involved in cell-wall degradation – several other CAZyme families are substantially expanded in B. xylophilus compared to other nematodes sequenced to date (Table 4). For example, the genome also has more glycosyl transferases family 43 (GT43) genes than other nematodes. These proteins may have beta-glucuronyltransferase activities, but the reason for the increase in these genes in B. xylophilus remains unclear.
Peptidases (proteases) catalyse the cleavage of peptide bonds within proteins, play important functions in all cellular organisms and are involved in a broad range of biological processes. In nematodes, peptidases play critical roles not only in physiological processes including embryogenesis and cuticle remodeling during larval development but also in parasitic processes such as tissue penetration, digestion of host tissue for nutrition and evasion of the host immune response.
In our analysis 581 peptidase genes were identified in B. xylophilus, which is the largest number in any characterized nematode genome (Table 5), with peptidase families involved in extracellular digestion and lysosomal activities particularly expanded (see Text S1, Table S4 in Text S2). One family of endopeptidases appears to have been acquired by HGT from an ascomycete fungus (Table 6). In addition, B. xylophilus contains an expanded number of GH27 proteins homologous to the gana-1 gene of C. elegans (Table 4), which has α-galactosidase and α-N-acetylgalactosaminidase activities and is localized to lysosomes .
The gut granules of intestinal cells in C. elegans are intestine-specific secondary lysosomes, so lysosomal enzymes play important roles in the digestion of food proteins in nematodes. B. xylophilus has an expanded repertoire of peptidases and other digestive enzymes that are either secreted or localised in lysosomes and so may play important roles in food digestion. Genes in the lysosome pathway were the most significantly over-represented in B. xylophilus (Table S3 in Text S2). B. xylophilus uses food sources such as fungi and woody plants that may be difficult to digest and the expansion of digestive peptidases in the nematode is therefore consistent with its unusual life style.
Plant parasitic nematodes produce a variety of secreted proteins that mediate interactions with their hosts – these encompass a variety of functions and include the cell-wall modifying enzymes discussed above. Such proteins have variously been termed “parasitism genes” or “effectors” and encompass any protein secreted by the nematode into the host that manipulates the host to the benefit of the nematode. For example, cyst nematodes produce effectors that mimic plant peptides and which may help initiate the formation of the biotrophic feeding structures induced by these nematodes , as well as proteins that suppress host defence responses . We found that the majority of effectors from other plant parasitic nematodes have no homologues in B. xylophilus. Some significant matches to effectors from all three species were found (Table 7). However, the B. xylophilus sequences that matched these effectors, except cell wall degrading enzymes, either did not have predicted signal peptides or, if a signal peptide was predicted, homologues were also present in a wide range of other species including C. elegans and animal parasitic nematodes. Both these lines of evidence suggest that the B. xylophilus homologues identified in this analysis are not true effectors that play a role in parasitism. These findings are consistent with the differing biology of the various nematode groups; root knot and cyst nematodes are biotrophic species whereas B. xylophilus is a migratory endoparasite that does not rely on biotrophy.
There are two possible exceptions. B. xylophilus contains homologues of venom allergen proteins. These proteins are present in all nematodes investigated to date and are thought to be important for the parasitic process of animal and plant parasites (e.g. , ). Several venom allergen proteins from B. xylophilus have been characterized and are known to be expressed in the oesophageal gland cells . Our HGT analysis identified a putative cystatin, or cystein protease inhibitor, apparently acquired from a bacterium (Table 6). Cystatins are well known as immunomodulatory pathogenicity factors in the animal parasitic filarial nematodes , so this protein could potentially play a role in parasite-host interaction in a plant parasitic nematode. However proteins from this family are involved in regulating a variety of endogenous proteinase activities in many cellular roles and, for example, are described as having anti-fungal properties , so the function of this protein will require experimental verification. We also identified a total of 923 predicted secreted proteins in the B. xylophilus genome that show no significant similarity to proteins from other species (Dataset S1), representing a pool of candidates that may play a role in the interaction between B. xylophilus and the other organisms with which it interacts. Very few (5) of these sequences produce matches against other plant parasitic nematode ESTs, consistent with previous studies which have shown that secreted proteins of parasitic nematodes often bear a high proportion of novel genes .
Detoxification of potentially damaging compounds is an important process for any organism to cope with its environment and may be particularly crucial for parasitic organisms, which come under attack from host responses to infection. In particular, plant parasites must cope with a wide range of secondary metabolites that plants generate in order to protect their tissues .
B. xylophilus principally inhabits the resin canals of its pine hosts. The resin to which it is exposed – a complex mixture of compounds, including terpenoids  and cyclic aromatic compounds  – is likely to have nematocidal activity and, like the detoxification of xenobiotics by C. elegans , would be expected to proceed in three distinct phases: (I) the addition of functional groups to molecules, making them more suitable substrates for downstream; (II) the actual detoxification reactions; and (III) efflux. Cytochrome P450s (CYPs) represent the most important group of phase I proteins, and B. xylophilus encodes a similar number of CYPs to that found in C. elegans (Table 8). Of the two main families of phase II detoxification enzymes – the glutathione S-transferases (GSTs) and UDP-glucuronosyl transferases (UGTs), we identified 41 full-length and 26 partial GSTs, and 60 UGTs, similar numbers to those found in C. elegans (Table 8, Figure S6). The final phase of the detoxification process involves ATP-binding cassette (ABC) transporters actively exporting detoxified xenobiotics. A total of 106 ABC transporters were detected in the B. xylophilus genome; this number was about twice that for C. elegans and about three times that for M. incognita (Table 8), suggesting that B. xylophilus is particularly enriched in genes responsible for the efflux of detoxified molecules.
Finally, we investigated genes involved in regulating the detoxification process. In C. elegans, the transcription factor SKN-1 regulates expression of many detoxification enzymes , and SKN-1 activity is in turn controlled by a number of different pathways (see Figure S7). In the presence of oxidative stress or electrophilic compounds, SKN-1 induces the expression of many phase II detoxification enzymes. Orthologues of all these regulatory pathways can be identified in B. xylophilus, suggesting that the regulation of xenobiotic degradation may be conserved in nematodes.
There are other signs that Bursaphelenchus may have an unusual repertoire of genes involved in the defence against or utilisation of complex pine tree metabolites – in our KEGG pathway analysis, xenobiotic and drug metabolism through CYPs were among the pathways showing most significant enrichment in gene copy number over C. elegans, confirming that other genes, likely to be involved downstream of the CYP genes themselves, as well as efflux effectors, are notably enriched (Table S3 in Text S2). Furthermore, our search for carbohydrate active enzymes reported a number of genes classified into the GH109 family of glycosyl hydrolases (Table 5) that on closer inspection proved to be most similar (approx 39% identity and E-values<1E-50) to enzymes displaying trans-1,2-dihydrobenzene-1,2-diol dehydrogenase activity, which is involved in the pathway downstream of cytochrome P450 in the metabolism of naphthalene and other polycyclic aromatic hydrocarbons (PAHs)  (Table S3 in Text S2). While PAHs, including naphthalene itself, are known to be produced in small quantities by a few plant species , they are not known from pines, and it seems more likely that this enzyme is homologous to naphthalene-degrading enzymes but acts on some of the many other aromatic molecules generated by plants .
It seems likely that B. xylophilus has a larger number of detoxification enzymes than other plant parasitic nematodes (M. incognita and M. hapla), with similar or expanded repertoires of such genes to those reported for the free-living C. elegans and the necromenic P. pacificus  for the various components of the detoxification process. This expansion in detoxification process components may reflect the variety of stressful environments that it encounters during its life cycle, and perhaps the particular challenges of inhabiting living tissues in a plant host that produces diverse secondary toxic metabolites.
B. xylophilus embryos seem to form the anterior-posterior axis quite differently from those of C. elegans as the point of sperm entry becomes the future anterior end of the animal . Surprisingly, however, other early events in B. xylophilus embryos, such as pronuclear meeting and posterior spindle movement followed by the unequal first cell division are quite similar . Therefore, it is informative to compare and contrast the proteins that control these processes in these two species. Orthologues of the majority of C. elegans proteins involved in these processes were identified in B. xylophilus and appear to be highly conserved. However one putative homologue of the serine/threonine kinase protein PAR-1 was quite distinct in B. xylophilus from that in C. elegans in that the former was considerably smaller (467 AA compared to 1,192 AA in C. elegans); the implications of this difference are unknown.
The formation of dauer (or infective) larvae specialized for surviving adverse conditions or for invading host organisms is an important life stage for many nematodes. In B. xylophilus, we identified orthologues of most genes involved in pathways which regulate dauer larva formation and recovery in C. elegans  (Table S5 in Text S2). We also identified orthologues of genes involved in C. elegans dauer pheromone synthesis (see Text S1). As C. elegans, adverse conditions trigger B. xylophilus to enter the third-stage dauer larva (DL3 or dispersal third-stage larva LIII) (Figure 2). Pathways that respond to these environmental cues may be more conserved in B. xylophilus than in other parasitic nematodes, most of which use different cues when forming a dauer (infective) stage. In addition to DL3, B. xylophilus has a specialized stage called the fourth-stage dispersal larva (DL4 or LIV). B. xylophilus DL3 develop into the DL4 when stimulated by the presence of the vector beetle Monochamus alternatus and become ready to board the vector , . Previous studies showed that several novel genes are expressed specifically in the DL4 nematodes , suggesting that B. xylophilus responds to different environmental stimuli, and likely uses distinct pathways and proteins to control this part of the lifecycle.
Nematode neuropeptides are encoded on flp (FMRFamide-like peptide), nlp (neuropeptide-like protein) or ins (insulin-like peptide) genes , . Diverse arrays of neuropeptides exist within every nematode species that has been studied, and neuropeptide receptors are promising potential drug targets . The complexity of this peptidergic signalling environment likely aids behavioural diversity and plasticity in spite of the structurally simple nematode nervous system. We find B. xylophilus's flp and nlp gene complements are typical of those seen in other parasitic nematode species , , although the absence of two flp genes - flp-30 and -31 - is noteworthy (Table S6 in Text S2), as these genes have previously been considered unique to Meloidogyne spp. , . Their absence from B. xylophilus suggests they may associate with an obligate parasitic lifestyle. The discovery of seven ins-like orthologues in the B. xylophilus genome is significant as the first description of nematode INS-like peptides outside C. elegans (Table S6 in Text S2, Dataset S2).
Chemoreception governs essential aspects of the life of many invertebrates, including the search for mates and hosts and the timing of critical steps in their life cycles. Chemoreceptors constitute one interface between the animal and its world, and could be expected to exhibit local adaptations to the specific chemosensory niche of each organism. The main group of putative chemosensory genes in nematodes is represented by serpentine receptors, which are GPCRs, include a large number of families and are also important drug targets. We find representatives of most C. elegans serpentine receptor families in the B. xylophilus genome, but many represent specific expansions, so the two species have related but largely distinct repertoires. The total number of serpentine receptor genes identified from B. xylophilus represents only 10–20% of the number found in C. elegans  but 35–45 times of those in M. hapla . It is unclear whether these striking differences represent a reduced and/or expanded chemosensory systems in various nematodes, or whether additional gene families have been expanded to cover some of the chemosensory spectrum in B. xylophilus and other species. Other chemosensory genes identified include gustatory receptors, GPCR receptors for a range of neurotransmitters that could have chemosensory roles, and members of the ionotropic glutamate receptor family , among others (see Text S1, Figure S8–S10).
The B. xylophilus genome encodes more predicted orthologues of C. elegans RNAi pathway effectors (37 of a potential 78) than found in M. incognita (27) and M. hapla (28) (unpublished data). Whilst B. xylophilus has orthologues of eight of the nine small RNA biosynthetic protein-encoding genes considered, dsRNA uptake and spreading genes are not well represented, e.g. no sid gene orthologues were identified with rsd-3 the only representative gene identified. RNA-dependent RNA polymerases (RdRps) are expanded relative to C. elegans, with four ego-1-, two rrf-1-, and three rrf-3-like orthologues. Sixteen Argonaute (AGO) genes were identified relative to the 27 of C. elegans and, for some of these, there was divergence within the catalytic and RNA-binding MID subdomains; other RNA-induced silencing complex (RISC) cofactors were identified (ain-1, tsn-1 and vig-1). Whilst the short interfering RNA (siRNA) inhibitor eri-1 was not found, microRNA (miRNA) inhibitors (somi-1; xrn-2) were identified. Nuclear effectors were reasonably well represented such that most of the components of a functional RNAi pathway were identified within the B. xylophilus genome. See Text S1 and Table S7-S11 in Text S2 for details.
In addition to its status as an economically important plant pathogen, B. xylophilus is remarkable for its unusual biological traits that relate to its complex ecology. During its life cycle it occupies two distinct habitats – an insect and a tree – where it exploits a number of different food sources, including plant tissues and a wide variety of fungi. This adaptability to a number of different niches is reflected in its genome sequence. The presence of a rich repertoire of detoxification enzymes and transporters reflects B. xylophilus' habitat in the resin canals of its host trees, where it is exposed to a cocktail of secondary metabolites, and its elaboration of a narrow subset of carbohydrate metabolizing enzymes reflects it adaptation to break down plant cell walls. The unique complement of genes involved in cellulose degradation, and other catabolic enzymes and the absence of effectors previously known to function at the host-parasite interface, confirms that B. xylophilus has a mode of parasitism that is distinct from other plant parasitic nematodes. This parasitism is mediated by a unique suite of parasitism-related genes, assembled through a combination of gene duplication and horizontal gene transfer. The genome provides strong evidence of multiple independent horizontal gene transfer events and these have shaped the evolution of this group.
Most importantly the genome sequence will act as a foundation for functional studies using a wide range of techniques and will directly inform efforts aimed at controlling this parasite. The identification of genes involved in nematode invasion and feeding from the plant will empower efforts to understand the interaction of B. xylophilus's with its host. One exciting possibility is the potential for genomic information from the hosts of B. xylophilus to facilitate understanding of the host-parasite interaction and associated pathology; host genetics is likely to play a key role in the disease as B. xylophilus is non-pathogenic to American pine species. We have identified genes involved in a range of crucial biological processes, many of which, such as neuropeptides, GPCRs and developmental genes could be viable control targets. The presence of a rich set of RNAi pathway effector genes gives much hope that reverse genetics will underpin future functional genomics efforts in this species. In this way, the genome sequence provides the opportunity to identify and validate putative control targets without the need to rely on C. elegans and make assumptions on conserved functionality/importance between nematodes from different clades.
To our knowledge, only seven nematode genomes have previously been published, and data are available for only a handful more, mostly from the genus Caenorhabditis. Given the breadth of the nematode phylum, genomic information from any new nematode species is an important advance but, B. xylophilus, in particular, is the first species to be sequenced from clade 10, and the first from the order Aphelenchoidea. We hope that with the other imminent nematode genomes being sequenced, B. xylophilus will serve as an important comparator. These data provide a rich resource for those trying to develop novel control strategies directed against B. xylophilus. In addition, the parasite's unusual life cycle makes this genome sequence a unique resource to investigate the association between genome structure and lifestyle, casting new light on the many conserved processes for which the free-living non-parasitic C. elegans remains the pre-eminent model.
B. xylophilus novel putative secreted proteins.
Amino acid sequences of B. xylophilus INS-like proteins.
Accuracies of gene predictions by ab-initio methods and combined by EVidenceModeler (EVM). Gene prediction accuracy was calculated at the nucleotide, exon, and complete gene level using 200 manually curated gene models as references. Label Augustus, Snap and Genemark.hmm represent ab-initio gene predictions by each predictor. EVMall, which is the combined one by EVM with the optimised weight, includes the three ab-initio predictions and the GeneWise predictions based on pfam protein similarities. EVMall+PASA includes EVMall and Program to Assemble Spliced Alignments (PASA) alignment assemblies and corresponding terminal exon supplement. Sn, sensitivity.
Chromosome numbers of B. xylophilus and B. mucronatus. B. xylophilus (Ka4C1 line) chromosome was observed to be 2n=12. Bar=2 µm.
Gene orders of mitochondrial DNAs of B. xylophilus and C. elegans. Arrows indicate the direction of transcription of genes. The genomes contain 12 protein-coding genes (atp6, cob, cox1-3, nad1-6 and 4L), two rRNA genes (rrnS and rrnL), tRNA genes (circles with one letter codes to indicate transferred amino acid) and non-coding region (AT). Note that the B. xylophilus mtDNA sequence is not completed, lacking a short AT rich sequence.
Distribution of the three major gene ontology categories assigned to B. xylophilus. Predicted genes can have more than one GO term. The x axis indicates the percentage of the term compared to the total of the terms.
Unrooted phylogenetic tree of GH45 proteins. Amino acid sequences of GH45 proteins in CAZy website (www.cazy.org) were retrieved from uniprot. The maximum likelihood phylogenetic tree was made using Phylogeny.fr (www.phylogeny.fr) using default setting. Proteins with short lengths or with long branches in the preliminary tree were removed from analysis.
GST classifications in B. xylophilus and C. elegans. Full-length GSTs containing both conserved N and C domains are classified into the classes. GST phylogeny tree is reconstructed according to Zimniak & Singh .
Signalling pathways involved in phase II enzyme expression. The XREP-1/DDB-1/CUL-4 complex (red), the DAF-2 pathway (green), and glycogen synthase kinase (yellow) negatively regulate SKN-1 (dark blue), whereas the p38 MAPK pathway (light blue) positively regulates SKN-1. The relationships among these pathways are not clear. When animals are exposed to xenobiotics, SKN-1 accumulates in the nucleus and induces the expression of many phase II enzymes. C. elegans protein names are followed by more general names in brackets.
Phylogenetic tree of 16 gustatory receptors. The phylogenetic tree was built by maximum likelihood method according to MEGA5 based on the JTT matrix-based model with uniform rate. Three proteins of B. xylophilus, (GUR-1, -2 and -3 indicated with bold) proteins of nematodes C. elegans, C. remanei, C. briggsae (GUR indicated with red), and proteins of insects Drosophila pseudoobscura, Anopheles gambiae, Culex quinquefasciatus, Tribolium castaneum (GR with blu) collected from NCBI protein database were used in the analysis. The scale bar indicates number of amino acid changes per site. Bootstrap values more than 50% form 1000 replications were shown on appropriate branches.
Maximum likelihood tree of STR superfamily chemoreceptor proteins in B. xylophilus. The phylogenetic tree was built by maximum likelihood using MEGA5. All 5 families (Str, Srd, Srh, Sri and Srj) from Str superfamilies of B. xylophilus were included in the tree. The scale bar indicates number of amino acid changes per site. Bootstrap values more than 50% were shown in the tree.
Maximum likelihood tree of SRT family chemoreceptor proteins in B. xylophilus and C. elegans. The phylogenetic tree was built using MEGA5. The scale bar indicates number of amino acid changes per site. Bootstrap values more than 50% were shown in the tree.
Supplementary text – Detailed Result.
We thank Neil Rawlings for helpful advice about peptidases. We also thank Takuya Aikawa, Mituteru Akiba, Hayato Masuya, Katsunori Nakamura and members of Sanger Parasite Genomics team for discussion and helpful comments on the manuscript.
The authors have declared that no competing interests exist.
This work was partially supported by Grants-in-Aid for Scientific Research (KAKENHI 23248024 and 23380092) (http://www.jsps.go.jp/) and research grant #200704 of FFPRI (http://www.ffpri.affrc.go.jp/). TK was supported by JSPS Postdoctoral Fellowships for Research Abroad (http://www.jsps.go.jp/). JAC, IJT, SAS, AJR, AS-F, and MB are supported by the Wellcome Trust [grant WT 085775/Z/08/Z] (http://www.wellcome.ac.uk/). TDO is supported by European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n°242095: EVIMalaR - Towards the establishment of a permanent European Virtual Institute dedicated to Malaria Research (EVIMalaR) (http://cordis.europa.eu/fp7/). JH Institute receives funding from the Scottish Government. This work benefited from links funded though COST Action 872 (http://www.cost.esf.org/) and through ERASMUS MUNDUS project 2008-102 (EUMAINE) (http://www.eumaine.ugent.be). Swedish University of Agricultural Sciences was supported by Swedish Science Council, Carl Trygger Foundation (http://www.carltryggersstiftelse.se/), the Linnaeus initiative “Insect Chemical Ecology, Ethology and Evolution” (ICE3). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.