As recently as February 2000 only 22 ESTs from plant parasitic nematodes had been deposited in dbEST. As of October 2002, that number has risen to 46,876, including 42,210 from Washington University and collaborators. Included are 32,735 sequences from Meloidogyne species (M. incognita 12,752, M. hapla 11,049, M. javanica 5,600, M. arenaria 3,334), as well as ESTs from cyst nematode species (G. rostochiensis 5,934, H. glycines 4,327, G. pallida 1,832), and the lesion nematode (P. penetrans 2,048). The majority of these sequences have been isolated from L2 and egg libraries, but sequencing from more diverse stages is now underway.
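The species counts quoted above can be cross-checked with a short tally. This is only an illustrative sketch: the variable names are invented for the example, and the numbers are copied directly from the paragraph above.

```python
# EST counts per species as quoted in the text (dbEST, October 2002).
# The dictionary names are illustrative assumptions, not part of any pipeline.
root_knot = {
    "M. incognita": 12752,
    "M. hapla": 11049,
    "M. javanica": 5600,
    "M. arenaria": 3334,
}
cyst = {"G. rostochiensis": 5934, "H. glycines": 4327, "G. pallida": 1832}
lesion = {"P. penetrans": 2048}

# Subtotal for Meloidogyne species and grand total across all listed species.
root_knot_total = sum(root_knot.values())
grand_total = root_knot_total + sum(cyst.values()) + sum(lesion.values())

print(f"Meloidogyne ESTs: {root_knot_total:,}")
print(f"All plant parasitic nematode ESTs: {grand_total:,}")
```

The per-species figures sum to the Meloidogyne subtotal and to the overall dbEST total given in the text.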
The only previous analysis of root-knot nematode ESTs [29] used 914 ESTs from M. incognita L2, without clustering and with non-automated assignment of genes to categories. The two datasets share some overlap, with 35% (316/914) of the previously analyzed ESTs finding matches in 16% (261/1,625) of the NemaGene clusters analyzed in this paper, many with strong homology (< 1e-40). This overlap was less than expected given the redundancy of the cDNA library analyzed here, at nearly 6,000 ESTs, and suggests that: first, libraries made by different methods are likely to result in different representation from an mRNA pool (either different genes or other portions of the same genes as a result of different 5' processivity); and second, that M. incognita L2 are likely to have a substantial number of unsampled messages awaiting generation of new libraries or library normalization. The semi-automated clustering, sequence homology searching and scripted assignment of sequences to functional categories presented here form a scalable approach to analysis that can be applied to larger datasets.
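The overlap figures in this comparison are simple proportions, reproduced below as a minimal sketch. The helper name `percent` and the variable names are assumptions made for illustration; the mean ESTs per cluster is offered only as a rough indicator of library redundancy, derived from the 5,713 ESTs and 1,625 clusters reported in this paper.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# Previously analyzed ESTs that match a NemaGene cluster (316 of 914).
prior_ests_matched = percent(316, 914)

# NemaGene clusters matched by at least one previous EST (261 of 1,625).
clusters_matched = percent(261, 1625)

# Crude redundancy indicator: mean number of ESTs per cluster in this study.
ests_per_cluster = 5713 / 1625

print(f"Previous ESTs with a match: {prior_ests_matched:.1f}%")
print(f"Clusters with a match: {clusters_matched:.1f}%")
print(f"Mean ESTs per cluster: {ests_per_cluster:.2f}")
```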
In addition to applying the approaches presented here to larger and more diverse datasets, further topics in Meloidogyne genome analysis have yet to be explored. The availability of ESTs representing different developmental stages of Meloidogyne will allow an examination of changes in gene representation between stages, and in turn an understanding of the relative importance of various metabolic processes at different stages of development. EST sequences and their corresponding clones can be further used to study relative expression level between stages and conditions using microarrays [75] and amplified fragment length polymorphism (AFLP) approaches [76]. Contig sequences within clusters can also be compared directly for evidence of alternative splicing, another feature which might correlate with developmental stage. Other topics where bioinformatics analysis of available ESTs can improve current knowledge of Meloidogyne molecular biology include the identification of secreted and transmembrane proteins through secretion signal sequence detection [77], the creation of more accurate codon usage/bias and amino-acid usage tables [78], the identification of conserved genes and pathways used in dauer/infective stages across nematode species [79], the definition and study of nematode-specific domains [55], and improved phylogenies based on sampling from multiple genes [53].
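As one illustration of the follow-up analyses listed above, a codon usage table could be tallied from EST-derived coding sequences roughly as follows. This is a minimal sketch and not the method of reference [78]: it assumes each input sequence has already been trimmed to an in-frame coding region (which for ESTs would require a separate ORF prediction step), and the function names `codon_counts` and `codon_frequencies` and the toy sequences are hypothetical.

```python
from collections import Counter

def codon_counts(coding_sequences):
    """Count codons across in-frame coding sequences (assumed frame 0)."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            # Skip codons containing ambiguous base calls such as N.
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    return counts

def codon_frequencies(counts):
    """Convert raw codon counts to fractions of all counted codons."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}

# Toy input for illustration only; these are not real nematode sequences.
toy_sequences = ["ATGGCTGCTAAATAA", "ATGNNNGCT"]
counts = codon_counts(toy_sequences)
freqs = codon_frequencies(counts)
```

Grouping these counts by encoded amino acid, using the standard genetic code, would give relative synonymous usage and an amino-acid usage table.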
While ESTs do not provide information on genome organization in Meloidogyne (no genome sequence or physical map is yet available), they can shed light on the organization of the C. elegans genome. For instance, C. elegans autosomes are organized into central regions dense with predicted genes, highly expressed genes and known mutants, whereas the chromosome arms contain more repetitive sequences and have a higher meiotic recombination rate [31]. By using the expanding collection of ESTs from nematodes at various evolutionary distances from C. elegans, the hypothesis that genes on the autosome arms are more rapidly evolving can be tested more systematically. Mapping of ESTs from other nematode species can also detect genes contained in the C. elegans genome yet not previously recognized, and therefore missing from Wormpep, as well as recognized genes where not all exons have been correctly predicted.
In conclusion, the 5,713 ESTs analyzed here in 1,625 clusters probably represent 6-10% of the genes in the M. incognita genome. This initial study, which will be expanded as further sequences are generated, demonstrates that EST generation is an effective method for the discovery of new genes in plant parasitic nematodes. Further, functional categorization and comparison to known sequences allow the identification of important biological processes at specific developmental stages as well as unusual sequences, such as horizontal gene transfer candidates.