The phylum Nematoda is biologically diverse, including parasites of plants and animals as well as free-living taxa. This diversity is expected to be underpinned by a commensurate diversity in expressed genes, including gene sets associated specifically with the evolution of parasitism.
Methods and Findings
Here we analyzed the extensive expressed sequence tag data available for 37 nematode species, most of which are parasites, and defined over 120,000 distinct putative genes, from which we derived robust protein translations. Combined with the complete proteomes of Caenorhabditis elegans and Caenorhabditis briggsae, these proteins were grouped into 65,000 protein families that in turn contain 40,000 distinct protein domains. We mapped the occurrence of domains and families across the Nematoda and compared the nematode data with those available for other phyla. Gene loss is common, and in particular we identify nearly 5,000 genes that may have been lost from the lineage leading to the model nematode C. elegans. We find a preponderance of novelty, including 56,000 nematode-restricted protein families and 26,000 nematode-restricted domains. Mapping the latest time-of-origin of these new families and domains across the nematode phylogeny revealed ongoing evolution of novelty. A number of genes from parasitic species had signatures of horizontal transfer from their host organisms, and parasitic species had a greater proportion of novel, secreted proteins than did free-living ones.
These classes of genes may underpin parasitic phenotypes, and thus may be targets for development of effective control measures.
The high-throughput sequencing of messenger RNA from parasitic organisms has permitted large-scale sequence analyses of a kind typically reserved for complete genome studies. Such expressed sequence tags (ESTs) have previously been generated for 37 species from the phylum Nematoda, 35 of which are parasitic. These datasets were combined with the complete genomes of Caenorhabditis elegans and C. briggsae. The resulting protein sequences were grouped into 65,000 protein families and annotated with 40,000 distinct protein domains. These annotations were analysed in the context of the nematode phylogeny. We identified massive gene loss in the model nematode, C. elegans, as well as plant-like proteins in nematodes that cause crop damage. Furthermore, many protein families were found only in small groups of closely related species and may represent innovations necessary to sustain their parasitic ecologies. All of these data are presented at NemBase (www.nematodes.org) and will aid researchers working on this important group of parasites.
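To illustrate what mapping a protein family onto the phylogeny can mean in practice, the following is a minimal sketch, not the pipeline used in this study. It assumes that a family's latest possible time-of-origin is the most recent common ancestor of all species in which it is observed, and it ignores horizontal transfer and incomplete EST sampling. The tree topology, the node labels (`clade_V`, `clade_III`), and the function names are hypothetical and chosen only for illustration.

```python
# Toy phylogeny as a child -> parent map. The topology and the internal
# node labels are illustrative assumptions, not the tree used in the study.
PARENT = {
    "Caenorhabditis elegans": "Caenorhabditis",
    "Caenorhabditis briggsae": "Caenorhabditis",
    "Caenorhabditis": "clade_V",
    "Haemonchus contortus": "clade_V",
    "clade_V": "Nematoda",
    "Brugia malayi": "clade_III",
    "Ascaris suum": "clade_III",
    "clade_III": "Nematoda",
    "Nematoda": None,  # root
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def latest_origin(species):
    """Return the most recent common ancestor of the given species.

    Under the simplifying assumption stated above, this is the latest
    node at which a family seen in all these species could have arisen.
    """
    species = list(species)
    if not species:
        raise ValueError("a family must be observed in at least one species")
    # Start with the first species' lineage (ordered leaf -> root) and keep
    # only the nodes that are also ancestors of every other species.
    common = path_to_root(species[0])
    for sp in species[1:]:
        ancestors = set(path_to_root(sp))
        common = [n for n in common if n in ancestors]
    return common[0]  # deepest shared node, i.e. closest to the leaves


# Hypothetical families, each listed with the species it was observed in.
FAMILIES = {
    "fam_A": {"Caenorhabditis elegans", "Caenorhabditis briggsae"},
    "fam_B": {"Haemonchus contortus", "Caenorhabditis briggsae"},
    "fam_C": {"Brugia malayi", "Caenorhabditis elegans"},
}

if __name__ == "__main__":
    for name, members in sorted(FAMILIES.items()):
        print(name, "->", latest_origin(members))
```

A family restricted to a few closely related species maps to a shallow node in this scheme, which corresponds to the lineage-restricted families described above. A family that is present in an outgroup and in C. briggsae but absent from C. elegans would, by the same logic, be a candidate for loss on the C. elegans lineage.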