Cluster formation and analysis
To identify overlapping EST sequences, improve base accuracy and transcript length, and produce non-redundant EST data for further functional annotation and comparative analysis, the 5,076 ESTs from the A. avenae library were grouped by sequence identity into clusters. Based upon regions of nucleotide identity, EST sequences were merged into contiguous consensus sequences (contigs). 'Contig' member ESTs derive from identical transcripts, while 'cluster' members may derive from the same gene but represent different transcript splice isoforms (i.e., ESTs form contigs, contigs form clusters). Two thousand seven hundred non-redundant EST clusters were generated from the ESTs (Table ). In 2,005 cases, clusters consist of a single EST, whereas the largest single cluster contains 81 ESTs (1 case) (Fig. ). The majority of contigs (650 out of 695) were composed of 2-10 ESTs. Eighty-nine clusters were found to contain multiple contig members, revealing potential splice isoforms. By eliminating redundancy during this contig building, the total number of nucleotides used for further analysis was reduced from 2.82 million to 1.61 million. In addition, this process significantly increased the length of assembled transcript sequences from 468 ± 114 nt for submitted ESTs alone to 595 ± 321 nt for contigs. The longest sequence generated also increased from 724 to 2,154 nt.
Histogram showing the distribution of ESTs from A. avenae by cluster size. For example, there were five clusters of size 23 containing a sum total of 115 ESTs. Distribution of contig sizes is not shown.
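The relationship between cluster size and EST totals described in the histogram caption can be sketched as a simple tally. This is a minimal illustration, not the authors' pipeline: the function name is hypothetical, and the input list contains only the two data points quoted in the text (five clusters of size 23 and the single cluster of 81 ESTs), not the full dataset.

```python
from collections import Counter

def ests_per_cluster_size(cluster_sizes):
    """Map each cluster size to (number of clusters, total ESTs in those clusters)."""
    counts = Counter(cluster_sizes)
    return {size: (n, size * n) for size, n in sorted(counts.items())}

# Illustrative input only: the two data points quoted in the text.
sizes = [23] * 5 + [81]
summary = ests_per_cluster_size(sizes)
for size, (n_clusters, n_ests) in summary.items():
    print(f"size {size}: {n_clusters} cluster(s), {n_ests} ESTs")
```

Each histogram bar corresponds to one entry of this mapping, so a bar for size 23 with five clusters accounts for 115 ESTs.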
Based on the identified clusters, 2,700 A. avenae genes were identified, corresponding to a new gene discovery rate of 53% (2,700/5,076). However, 2,700 clusters is likely to be an overestimate of the true number of genes discovered, as one gene could be represented by multiple non-overlapping clusters. Such "fragmentation" has been estimated at 18% using C. elegans as a reference genome [39]. After allowing for such potential fragmentation, we estimated that the A. avenae sequences derived from a minimum of 2,214 genes, giving a discovery rate of 44% (2,214/5,076). Assuming between 14,000 and 21,000 total genes, the range encompassed by Meloidogyne hapla, M. incognita and C. elegans (Wormpep v. 203), the cluster dataset could represent approximately 11-16% of A. avenae genes.
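The discovery-rate arithmetic in this paragraph can be reproduced with a short calculation. The sketch below assumes that the 18% fragmentation estimate is applied as a simple multiplicative correction to the cluster count, which is consistent with the reported figure of 2,214 genes; the function and variable names are illustrative.

```python
def discovery_estimates(n_ests, n_clusters, fragmentation, gene_range):
    """Return raw and fragmentation-adjusted gene discovery figures."""
    # Assumed correction: shrink the cluster count by the fragmentation fraction.
    min_genes = round(n_clusters * (1 - fragmentation))
    low_total, high_total = gene_range
    return {
        "raw_rate": n_clusters / n_ests,
        "min_genes": min_genes,
        "adjusted_rate": min_genes / n_ests,
        # Coverage is lowest when the assumed gene total is highest.
        "coverage_low": min_genes / high_total,
        "coverage_high": min_genes / low_total,
    }

est = discovery_estimates(5076, 2700, 0.18, (14000, 21000))
print(est["min_genes"])                      # minimum gene count after correction
print(f"{est['raw_rate']:.1%} raw, {est['adjusted_rate']:.1%} adjusted")
print(f"{est['coverage_low']:.1%} to {est['coverage_high']:.1%} of genes")
```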
Transcript abundance and highly represented genes
A high level of representation in a cDNA library usually correlates with high transcript abundance in the original biological sample [42], although artifacts of library construction can result in selection for or against some transcripts. The A. avenae clusters were ranked according to the number of contributing ESTs, and the top 25 clusters are summarized in Table . Each of these clusters contained fifteen or more ESTs, and together they represented 16% of the total number of ESTs obtained. Eighteen of the clusters had significant matches to genes with annotated functions based on BLASTX (E < 1e-5) against the non-redundant database, and all of these had homologues in nematodes. Transcripts abundantly represented in the A. avenae library included genes encoding structural proteins (such as actin, collagen, tropomyosin and troponin C) and proteins that carry out core metabolic processes (e.g., cytochrome c oxidase, ATP synthase). Other abundant ESTs included a small heat shock protein and phosphoenolpyruvate carboxykinase. The latter enzyme has previously been cloned from the parasitic nematodes Haemonchus contortus and Ascaris suum. Cluster AAC00541, containing 23 ESTs, was similar to an SXP/RAL-2 family protein from the parasitic nematode Anisakis simplex. Similar genes have previously been characterized from plant parasitic nematodes [45], and individual genes have been shown to be expressed in a range of secretory tissues including the gland cells surrounding the main sense organs (amphids) and the hypodermis.
The most abundantly represented transcripts in the A. avenae cDNA library
Seven of the 25 most abundantly represented transcripts from A. avenae had no significant similarity to any sequence in the non-redundant protein database (Table ). Since most nematode data are available only as ESTs and therefore not included in the BLASTX analysis, we compared these 7 contigs against dbEST using BLASTN and TBLASTX. However, these searches returned no significant matches (E < 1e-5). We also conducted BLASTN and TBLASTX searches against the non-redundant nucleotide database for these sequences. Six of the clusters did not return any matches from this database but cluster AAC00148 produced a match using TBLASTX analysis (E < 1e-5) (Table ).
Comparisons to proteins from other species
We compared the 2,700 cluster sequences from A. avenae against three databases containing protein sequences from different organisms. The cluster sequences were compared with protein sequences from (i) C. elegans (WORMPEP v.203 [46]), (ii) other nematodes (available protein sequences and peptides from conceptually translated ESTs), and (iii) organisms other than nematodes (from the NCBI non-redundant protein database) [47]. Of the A. avenae clusters, 66% (1,782 of 2,700) had matches in one or more of the three databases, and these matches were represented using SimiTri [48] (Fig. ). In the majority of cases where homologies were found (1,242/1,782), matches were found in all three databases surveyed. Gene products in this category are generally widely conserved across metazoans and many are involved in core biological processes. Examination of the individual database searches showed that 1,567 clusters (58.0%) had homologues in C. elegans, 1,750 (64.8%) in other nematodes and 1,321 (48.9%) in organisms other than nematodes. The 918 clusters (34.0%) that had no significant similarity to any sequences in these three protein databases were searched against the non-redundant nucleotide and dbEST databases using BLASTN and TBLASTX (employing a cut-off of 1e-05). Fifty-six clusters generated matches in these searches, but no matches were obtained for the remaining 862 sequences (31.9%).
Figure 3 Comparison of A. avenae cluster sequences with C. elegans, other nematodes and non-nematode protein sequence databases using SimiTri. The numbers at each vertex indicate the number of cluster sequences matching only that specific database.
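The three-way partition that underlies a SimiTri-style summary (clusters matching only one database, two databases, all three, or none) can be sketched with set membership. This is a minimal illustration, not SimiTri itself: the cluster identifiers and hit sets below are hypothetical toy values, and in practice each set would hold the clusters with a BLAST hit below the chosen E-value cut-off.

```python
from collections import Counter

def classify_hits(clusters, celegans, other_nematodes, non_nematodes):
    """Count clusters by the combination of databases in which they have a hit.

    Keys are (in_celegans, in_other_nematodes, in_non_nematodes) booleans.
    """
    tally = Counter()
    for cluster in clusters:
        key = (cluster in celegans,
               cluster in other_nematodes,
               cluster in non_nematodes)
        tally[key] += 1
    return tally

# Hypothetical toy data, for illustration only.
clusters = ["c1", "c2", "c3", "c4", "c5"]
tally = classify_hits(
    clusters,
    celegans={"c1", "c2"},
    other_nematodes={"c1", "c3"},
    non_nematodes={"c1"},
)
print(tally[(True, True, True)])     # matched in all three databases
print(tally[(False, False, False)])  # no match in any database
```

The counts at the vertices of the triangle correspond to the keys with exactly one `True` value.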
Table shows the 15 gene products with the highest level of conservation (E-value ranging from 0 to e-151) between A. avenae and C. elegans; these include gene products involved in cell structure (for example, actin, UNC-87), protein biosynthesis or regulation (for example, UBQ-1, elongation factor) and metabolism (for example, enolase, cytochrome c oxidase). Representation of these clusters in the A. avenae EST collection varied from 81 ESTs to 2 ESTs. None of these most conserved gene products were nematode specific. Out of all clusters, 461 (17.1%) had homology only to nematodes, either C. elegans (4), other nematodes (137), or both (320) (Fig. ). The most conserved (1e-111) of these nematode-specific proteins was a homolog of a serine proteinase inhibitor, previously characterized from C. elegans (K10D3.4) and parasitic nematodes [49]. Among the other most conserved nematode-specific clusters were homologs of previously characterized C. elegans structural proteins (for example, cluster AAC01973 matched a collagen family protein, COL-176) as well as uncharacterized C. elegans hypothetical proteins (for example, cluster AAC01948 matched the C. elegans gene C34E7.4, which has no known function).
Most conserved nematode genes between A. avenae and C. elegans
The 137 cluster sequences where homologs were present only in other nematodes were further categorized based on their BLAST (BLASTX and TBLASTX) results (Additional file 1). Matches were found in plant parasitic, animal parasitic and free-living nematodes. Of these sequences, 24.8% (34 of 137) had homology only to sequences from plant parasitic nematodes. Some of these sequences were similar to previously characterized cell-wall-degrading enzymes, which are known to be involved in the parasitism process of these nematodes. For example, cluster AAC01592 matched an expansin-like protein from B. xylophilus and cluster AAC02968 matched a β-1,4-endoglucanase precursor from Globodera rostochiensis. Further analysis of some of the cell-wall-degrading enzymes present in A. avenae is presented below.
Identification of transcripts encoding cell-wall-degrading enzymes
BLASTX analysis allowed us to identify various genes with significant similarity to genes encoding enzymes which degrade plant and fungal cell walls (Table ). The plant cell-wall-degrading enzymes that were identified included cellulase, pectate lyase, polygalacturonase and expansin, while transcripts encoding fungal cell-wall-degrading enzymes included β-1,3-endoglucanase and chitinase.
A. avenae transcripts similar to plant or fungal cell-wall-degrading enzymes
Eight cellulase genes were present in eight different A. avenae clusters, and in all cases homologues were found in other plant parasitic nematodes. One cellulase gene (AAC01152) was identified as a contig of six individual ESTs. Two clusters (AAC00199 and AAC00801) contained two ESTs each, and the remaining cellulase clusters were singletons. Two types of pectin-degrading enzymes, pectate lyase and polygalacturonase, were identified (Table ). While all transcripts encoding pectate lyases were identified as singletons, polygalacturonase clusters contained either one or two ESTs. The features of the cellulase and pectate lyase sequences are discussed in more detail below.
In addition to the plant cell-wall-degrading enzymes, we identified genes encoding expansin-like proteins in the A. avenae dataset. Expansins and expansin-like proteins have recently been described in several plant parasitic nematodes [19], and it is thought that these proteins disrupt non-covalent bonds in the plant cell wall, enhancing the activity of other enzymes such as cellulases. For all four expansin-like transcripts, the best match was an expansin gene from B. xylophilus.
Two different types of genes, β-1,3-endoglucanase and chitinase, that could encode enzymes important in degradation of the fungal cell wall were identified. A gene encoding a β-1,3-endoglucanase has been cloned and characterized from B. xylophilus and is thought to aid fungal feeding in this nematode. Chitinase, an enzyme responsible for breakdown of β-1,4-glycosidic bonds within chitin, has been found in a wide range of nematodes. Since chitin is known to be present in the eggshell [57] and the microfilarial sheath [58] of nematodes, it has been suggested that chitinases have a role in remodeling processes during the molting of filariae and in the hatching of larvae from the eggshell [59]. However, the existence of large families of chitinases in the free-living nematode C. elegans suggests that these enzymes may also fulfill other functions [61]. In the plant parasitic nematode Heterodera glycines, chitinase was found to be expressed in the subventral oesophageal gland cells of the parasitic stages, suggesting a role in parasitism but not in hatching [62]. The fungal-feeding plant parasite B. xylophilus also contains chitinase [15]. Since β-1,3-glucan and chitin are the two major structural polysaccharides of the fungal cell wall, it is possible that fungal-feeding nematodes like B. xylophilus and A. avenae secrete these enzymes in order to metabolize or soften the fungal cell wall as part of the feeding process.
Characterisation of genes encoding cellulases and pectate lyases: sequence, phylogenetic and expression analysis
Cellulase and pectate lyase have been identified and characterized in a wide range of plant parasitic nematodes [16]. The presence of genes encoding cell-wall-degrading enzymes in A. avenae opens up a new avenue for further molecular studies aimed at understanding their functional role in this nematode and investigating the origin and evolution of these genes within the Nematoda. We therefore cloned the full-length cDNA and genomic sequences of two putative cellulases (named Aa-eng-1 and Aa-eng-2) and two putative pectate lyases (named Aa-pel-1 and Aa-pel-2) and compared these sequences to those from other plant parasitic nematodes.
The full-length sequences of the cellulases were identified from two plasmid clones whose EST sequences are part of cluster AAC01152 (Table ). Although six individual ESTs form the contig that represents this cluster, one EST was selected because it showed a slightly different nucleotide sequence from the other five ESTs. Two different plasmid clones containing the full-length cDNA sequences of the cellulases were subsequently selected and sequenced using the specific primers listed in Table .
Primers used in the analysis of cell-wall-degrading enzymes
The Aa-eng-1 cDNA was 1,104 bp in length (excluding the polyA tail) and included a 981-bp open reading frame (ORF) that could encode a protein of 327 amino acids with an ATG start codon at position 35 and a TGA stop codon at position 1,016 (Fig. ). The complete cDNA of Aa-eng-2 was 1,107 bp in length and also contained a potential ORF of 327 amino acids with an ATG start codon at position 35 and a TGA stop codon at position 1,016. A signal peptide of 19 amino acids is predicted by SignalP [63] at the N terminus of the deduced AA-ENG-1 and AA-ENG-2 polypeptides. The predicted molecular masses of the putative mature proteins were 34.130 kDa and 34.059 kDa, respectively, and the theoretical pI value was 6.2 for both proteins. The AA-ENG-1 and AA-ENG-2 proteins contained a catalytic domain homologous to GHF5 β-1,4-endoglucanases as predicted by PRODOM [64]. The deduced amino acid sequences showed highest similarity with the GHF5 endoglucanase from the migratory plant parasitic nematode Radopholus similis (GenBank Accession No. [ABV54446]). The highest non-nematode similarities of both AA-ENG-1 and AA-ENG-2 were with the β-1,4-endoglucanase from Cytophaga hutchinsonii (cellulolytic gliding bacterium; GenBank Accession No. [YP_678708]). AA-ENG-1 and AA-ENG-2 share 99% identity in their amino acid sequences.
Figure 5 Aa-eng-1 cDNA sequence and predicted amino acid sequences. The predicted signal peptide for secretion and polyadenylation signal sequence are underlined. Predicted positions of the five intron sequences identified are indicated by darkened triangles.
Genomic clones of Aa-eng-1 and Aa-eng-2 were obtained by PCR amplification using gene-specific primers (Table ) and genomic DNA as template. The Aa-eng-1 and Aa-eng-2 genomic DNA products were 1,518 bp and 1,522 bp long, respectively, from the ATG to the stop codon. The positions of the exon/intron boundaries were determined by aligning the genomic sequences with the corresponding cDNA sequences. All introns were bordered by canonical cis-splicing sequences [65]. Five introns were identified in Aa-eng-1 (Fig. ), of which four were small (40 bp to 85 bp), a feature commonly found in nematodes [66]. Only the first intron was larger (319 bp). Five introns were also identified in Aa-eng-2. The first intron was 337 bp long and the lengths of the remaining four introns ranged from 40 to 72 bp. The intron positions of the Aa-eng-1 and Aa-eng-2 genes were identical to each other.
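The check that introns are bordered by canonical splice sites, and that removing them recovers the cDNA sequence, can be sketched as follows. This is a minimal illustration under stated assumptions: "canonical" is taken to mean a GT dinucleotide at the 5' end and AG at the 3' end of each intron, intron coordinates are 0-based and half-open, and the sequence is a short hypothetical toy example rather than any Aa-eng sequence.

```python
def has_canonical_splice_sites(genomic, introns):
    """True if every intron (start, end) begins with GT and ends with AG."""
    return all(
        genomic[start:start + 2].upper() == "GT"
        and genomic[end - 2:end].upper() == "AG"
        for start, end in introns
    )

def splice_out(genomic, introns):
    """Remove the given introns and return the joined exon sequence."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        exons.append(genomic[pos:start])
        pos = end
    exons.append(genomic[pos:])
    return "".join(exons)

# Hypothetical toy gene: exon, 12-nt intron, exon.
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
introns = [(6, 18)]
print(has_canonical_splice_sites(genomic, introns))
print(splice_out(genomic, introns))
```

In practice the intron coordinates would come from aligning each genomic sequence to its cDNA, as described above; the spliced result should then match the cDNA coding region.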
Sequence alignment of the two endoglucanases from A. avenae with GHF5 endoglucanases from nematodes and other organisms revealed that both AA-ENG-1 and AA-ENG-2 possess the consensus pattern of GHF5 endoglucanases in their primary amino acid sequences, in which two glutamic acid residues are the predicted proton donor and nucleophile/base of the catalytic site (Fig. ); this is also true for all previously described nematode GHF5 endoglucanases. In addition to the catalytic domain, some of these proteins contain a cellulose binding domain (CBD) joined to the catalytic domain through a linker peptide. The GHF5 endoglucanase genes isolated from plant parasitic nematodes also have different structures: all have a signal peptide and catalytic domain, some have an additional linker and CBD, and others have only a linker but no CBD [25]. However, neither peptide linkers nor CBDs were present in the two GHF5 endoglucanases isolated from A. avenae. Expansins from cyst nematodes have also been shown to contain a CBD, but no such domains were predicted in other sequences within the EST dataset, including the putative expansins.
A phylogenetic tree was generated from an alignment of the β-1,4-endoglucanase protein sequences from AA-ENG-1, AA-ENG-2, cyst and root-knot nematodes, the migratory plant-parasitic nematodes R. similis, Pratylenchus penetrans, Pratylenchus coffeae and Ditylenchus africanus and GHF5 cellulases from phytophagous beetles, bacteria and protists (Fig. ). AA-ENG-1 and AA-ENG-2 clustered into a larger group of protein sequences including all nematode GHF5 cellulases, indicating that A. avenae cellulases are closely related to those of the Tylenchida. This analysis supports the idea that all nematode GHF5 cellulases evolved from a GHF5 sequence acquired by a common ancestor of this group.
Figure 6 Unrooted phylogenetic tree of GHF5 catalytic domains based on the protein sequences using the maximum likelihood method. The GHF5 proteins from A. avenae (AA-ENG-1 and AA-ENG-2) are labeled in bold.
The β-1,4-endoglucanases are the largest family of cell-wall-degrading enzymes that have been identified in parasitic nematodes to date. Over the last decade, a large number of GHF5 endoglucanases have been identified and extensively studied in plant parasitic Tylenchida including cyst and root-knot nematodes [20]. Genes encoding β-1,4-endoglucanases have also been found in Bursaphelenchus spp., but these enzymes are most similar to GHF45 cellulases from fungi [16]. The presence of GHF5 cellulases within A. avenae (as opposed to GHF45 cellulases) provides further support for the suggestion that this nematode is more closely related to the Tylenchida than to Bursaphelenchus and its relatives.
Although the presence of GHF5 cellulases within A. avenae can be readily explained given the phylogenetic arguments above, the presence of a β-1,3-endoglucanase is more surprising. These enzymes act to metabolise the fungal cell wall and have been previously described in Bursaphelenchus spp. Such genes are not usually present in animals and it was suggested that the Bursaphelenchus genes were acquired by horizontal gene transfer from bacteria [17]. No such genes are present in root-knot nematodes (for which two genome sequences are available) or other Tylenchida. It is possible that a fungal-feeding common ancestor of Aphelenchoidea and Tylenchida possessed this gene but that more "advanced" plant parasites have subsequently lost it. Further sequencing within both nematode groups is required to resolve this issue.
All the transcripts potentially encoding pectate lyases were identified as singletons (Table ). Two full-length cDNA sequences of pectate lyases, designated Aa-pel-1 and Aa-pel-2, were identified from the plasmid clones corresponding to the cluster IDs AAC01649 and AAC03048, respectively. The complete cDNA of Aa-pel-1 was 821 bp in length and contained an ORF of 247 amino acids with a putative ATG start codon at position 30 and a TAA stop codon at position 771. A signal peptide of 18 amino acids is predicted by SignalP [63] at the N-terminus of the putative AA-PEL-1 amino acid sequence. The mature protein has a predicted molecular mass of 24.250 kDa and a theoretical pI of 8.93.
The full-length Aa-pel-2 cDNA was 838 bp long and contained an ORF of 249 amino acids with an ATG start codon at position 31 and a TAA stop codon at position 778. An N-terminal signal sequence of 19 amino acids is predicted by SignalP [63]. The molecular mass and theoretical pI value of the putative AA-PEL-2 protein were 24.329 kDa and 9.12, respectively. AA-PEL-1 has 61% identity to AA-PEL-2.
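The ORF coordinates reported for the four cDNAs can be cross-checked with a small calculation. The sketch below assumes that the reported stop position is the first base of the stop codon and that the ORF length excludes the stop codon, which is consistent with the 981-bp ORF stated for Aa-eng-1; the function name is illustrative.

```python
def orf_length_aa(start_pos, stop_pos):
    """Number of codons between the first base of the start codon and
    the first base of the stop codon (1-based cDNA positions)."""
    coding_nt = stop_pos - start_pos
    if coding_nt <= 0 or coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return coding_nt // 3

# (ATG position, stop codon position) as reported in the text.
reported = {
    "Aa-eng-1": (35, 1016),
    "Aa-eng-2": (35, 1016),
    "Aa-pel-1": (30, 771),
    "Aa-pel-2": (31, 778),
}
lengths = {gene: orf_length_aa(*pos) for gene, pos in reported.items()}
for gene, n_aa in lengths.items():
    print(f"{gene}: {n_aa} codons ({n_aa * 3} nt)")
```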
To obtain genomic sequences, the entire coding regions of the Aa-pel-1 and Aa-pel-2 genes were amplified from A. avenae gDNA using gene-specific primers (Table ). Analysis of these sequences showed that the Aa-pel-1 and Aa-pel-2 genes were 1,860 and 1,221 bp long, respectively, from the ATG to the stop codon. Two introns (468 bp and 651 bp) were identified in Aa-pel-1, whereas Aa-pel-2 contained only one intron (690 bp) (Fig. ). All introns were bordered by canonical cis-splicing sequences [65]. The position of the second intron of Aa-pel-1 was identical to that of the intron in Aa-pel-2.
Figure 7 Multiple sequence alignment of A. avenae pectate lyase protein sequences (AA-PEL-1 and AA-PEL-2) with the sequences of pectate lyases from plant parasitic nematodes, a bacterium and a fungus. BX-PEL-1 [BAE48369], BX-PEL-2 [BAE48370] from Bursaphelenchus ...
The intron positions in the A. avenae pectate lyase genes (Aa-pel-1 and Aa-pel-2) were compared with those of other nematode pectate lyase genes. The pectate lyase genes from Bursaphelenchus species (Bx-pel-1/2 and Bm-pel-1/2) each have one intron in their coding region at the same position [18]. This position is also identical to the common intron position of the A. avenae genes (Fig. ). Moreover, one of the three introns of Mi-pel-1 from M. incognita is at the same position as that in the A. avenae genes. Gr-pel-1 from G. rostochiensis has six introns and Mi-pel-2 from M. incognita has two introns. Gr-pel-1 and Mi-pel-2 share two intron positions, but none of the introns of these genes has the same position as that in the A. avenae genes (not shown).
A protein homology search using BLASTP with the deduced amino acid sequences of AA-PEL-1 and AA-PEL-2 indicated high similarity to pectate lyases belonging to polysaccharide lyase family 3 (PL3) from plant parasitic nematodes, bacteria and fungi. Multiple sequence alignment of AA-PEL-1 and AA-PEL-2 with the best matches confirmed that both A. avenae sequences contained the four highly conserved regions characteristic of PL3 pectate lyases in bacteria and fungi, as well as 8-10 cysteine residues and four charged residues (Fig. ) that are potentially involved in catalysis [26]. AA-PEL-1 and AA-PEL-2 were most similar to the pectate lyases (BX-PEL1/2 and BM-PEL1/2) from the pinewood nematodes B. xylophilus and B. mucronatus (52 to 59% identity for AA-PEL-1 and 53 to 63% identity for AA-PEL-2) [18]. The A. avenae sequences shared 23 to 33% identity with pectate lyases (MI-PEL-1, MI-PEL-2, MJ-PEL-1 and GR-PEL-1) from cyst and root-knot nematode spp. [28], and 39 to 45% identity with the sequences from two microbes (Fig. ).
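Percent identity figures such as those quoted above are computed from pairwise alignments. The sketch below shows one common convention, assumed here for illustration: identical residues divided by the number of aligned columns in which neither sequence has a gap. Other denominators (alignment length, shorter sequence length) give different values, and the text does not state which was used. The aligned strings are hypothetical toy sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical toy alignment: 6 ungapped columns, 5 identical.
pid = percent_identity("MKT-LLV", "MKTALIV")
print(f"{pid:.1f}% identity")
```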
A phylogenetic tree was generated from an alignment of A. avenae pectate lyase sequences with selected proteins belonging to PL3 from bacteria, fungi, and nematodes using the maximum likelihood method (Fig. ). Both Aa-pel-1 and Aa-pel-2 were clustered with the Bursaphelenchus genes. Other nematode sequences were not monophyletic but were clustered into distinct clades.
Figure 8 Unrooted phylogenetic tree of selected polysaccharide lyase family 3 proteins generated using maximum likelihood analysis. The PL3 proteins from A. avenae (AA-PEL-1 and AA-PEL-2) are labeled in bold.
The pectate lyase genes from A. avenae are more similar to the Bursaphelenchus genes than to those from cyst and root-knot nematodes (Figs. and ). The identical position of the common intron of the A. avenae genes (Aa-pel-1 and Aa-pel-2) and introns within pectate lyase genes from Bursaphelenchus and M. incognita (Fig. ) suggests that pectate lyase genes from a wide range of plant parasitic nematodes have the same origin. To determine which A. avenae cells express GHF5 endoglucanases and pectate lyases, in situ mRNA hybridisation was performed (Fig. ). Digoxigenin-labeled antisense probes generated from Aa-eng-1 and Aa-pel-1 specifically hybridized with transcripts in the esophageal gland cells of A. avenae (Figs. and ). Staining was observed in both juvenile and adult nematodes rather than being restricted to a specific life stage. No hybridisation was observed with the control (sense) cDNA probes of Aa-eng-1 or Aa-pel-1 (Figs. and ).
Figure 9 Localization of Aa-eng-1 and Aa-pel-1 transcripts in the esophageal gland cells of A. avenae by in situ hybridisation. Nematode sections were hybridized with antisense (A) and sense (B) Aa-eng-1 digoxigenin-labeled cDNA probes.
As a part of the complex process of parasitism, a wide range of plant parasitic nematodes use endogenous β-1,4-endoglucanases and pectate lyases to degrade two abundant constituents of the plant cell wall and thus facilitate their migration through host tissues. The presence of signal peptides in the deduced amino acid sequences of the endoglucanases and pectate lyases from A. avenae, coupled with their expression in the esophageal glands, suggests that both enzymes have a similar role in A. avenae. The presence of such genes in A. avenae suggests that this nematode can enter and migrate through plant tissues and may also be able to feed on plant cell contents. This is supported by the observation that A. avenae is known to feed on plant tissue in culture [13]. A. avenae may therefore have a wide-ranging diet that includes fungi and plant tissues. This, coupled with the position of this nematode as a basal member of a clade that includes a wide range of plant parasitic nematodes, provides further support for the idea that plant parasitism has evolved from fungal feeding and suggests that A. avenae may be a very primitive plant feeder.