Advances in genome sequencing and in computational methods for clustering homologous proteins have been helping the scientific community to reevaluate several aspects of basic biology. Here we applied clustering to protein sequences chosen from two clades of organisms that are known to be autotrophic for the biosynthesis of Essential Amino Acids (EAAs). Furthermore, we searched for the enzymes responsible for nitrogen assimilation, which incorporate ammonium into glutamate. Lack of cytoplasmic glutamate dehydrogenase leads to a dependency on amino acid consumption as the source of organic nitrogen; i.e., the organism in a certain sense becomes auxotrophic for both EAAs and NEAAs (Non-Essential Amino Acids), since it needs them to build other nitrogen-containing molecules.
The work presented here takes advantage of both the Seed Linkage software and a home-built UniProt Enriched KEGG Orthology database (UEKO) as sources of information to rapidly group homologues of fungal and plant amino acid sequences, represented respectively by Saccharomyces cerevisiae and Arabidopsis thaliana. KEGG Orthology contains to date more than 1 million sequences from nearly 1,000 genomes, and it was enriched by a procedure developed by our group to attain 2,442,384 sequences from 25,024 organisms, constituting the UEKO database (UniRef50 enriched KEGG Orthology database, to be published elsewhere and further distributed). Of the total recruited sequences reported in this work (31,392), the percentages recruited by (i) Seed Linkage, (ii) the original KO or (iii) the enriched portion of KO (UEKO) were, respectively, 6%, 44% and 50%. Moreover, 26% of all detected enzymes for the phyla represented in Figures , and were detected exclusively by the Seed Linkage software and/or the UEKO database. These numbers reinforce the relevance of developing homologue search capability, which improves the ability of the KEGG Orthology database to build a scenario for biological processes of interest such as those presented here.
In addition to the search for homologues represented by circles in the Figures, a complementary search using the 31,392 clustered sequences allowed the investigation of all UniProt sequences, including fragments (e.g. UniProt accession B7QGP4, VIL1 from Arthropoda) and some full-length proteins not accessed by the initial search (e.g. UniProt accession D3AYE6, complete protein K14, from Amoebozoa; a more recent version of KO already incorporates this entry). It is important to note that, in UniProt, the technical term fragment is applied to partial CDS sequences, which are a product of incompletely sequenced mRNA, as well as to amino acid sequences modeled from the genome that lack the initial methionine. Thus, they might represent additional evidence of the presence of the enzyme rather than a remnant pseudogene. Stringent criteria (e-value of 1 × 10⁻¹⁰, 50% identity and 50% subject coverage cutoffs) were adjusted with extensive manual inspection, and the additional evidence was included as triangles in the Figures. One piece of evidence shown as a triangle drew our attention, since it came from a clade that includes the complete genome of the well-annotated organism Drosophila melanogaster (Figure , enzyme VIL1, phylum Arthropoda). Manual inspection revealed that the evidence yielded by the additional search (represented by a triangle) was a hit from Ixodes scapularis (a genome under “assembly” status) but, remarkably, the gene was found to be missing in the fly. Thus, this represents a recent gene loss within a non-functional pathway.
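The stringent criteria described above can be illustrated with a minimal sketch. It assumes that each hit from the complementary search is available as a record with an e-value, a percent identity and a percent subject coverage, and that the cutoffs are inclusive; the `Hit` class, the function name and the example records are hypothetical and are not part of Seed Linkage or UEKO.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit; field names are illustrative assumptions."""
    query: str
    subject: str
    evalue: float
    identity: float          # percent identity, 0-100
    subject_coverage: float  # percent of the subject covered, 0-100


def passes_stringent_cutoffs(hit: Hit,
                             max_evalue: float = 1e-10,
                             min_identity: float = 50.0,
                             min_subject_coverage: float = 50.0) -> bool:
    """Return True when a hit satisfies all three cutoffs (assumed inclusive)."""
    return (hit.evalue <= max_evalue
            and hit.identity >= min_identity
            and hit.subject_coverage >= min_subject_coverage)


# Hypothetical hits, invented only to exercise the filter.
hits = [
    Hit("cluster_seq_1", "subject_A", 1e-40, 72.0, 88.0),  # passes all cutoffs
    Hit("cluster_seq_1", "subject_B", 1e-05, 65.0, 90.0),  # e-value too high
    Hit("cluster_seq_2", "subject_C", 1e-30, 41.0, 95.0),  # identity too low
    Hit("cluster_seq_2", "subject_D", 1e-25, 60.0, 30.0),  # coverage too low
]

kept = [h for h in hits if passes_stringent_cutoffs(h)]
print([h.subject for h in kept])
```

Hits that pass such a filter would still be candidates for the manual inspection mentioned in the text; the filter only narrows the list.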
The main interest of this work was to depict the evolution of amino acid essentiality, or heterotrophy. Grouping organisms at the phylum level allowed easy labeling of clades that comprise organisms with sequenced or draft genomes, as shown in Figures , and , making it possible to infer deletion events specific to these clades. It is important to note that many phyla contain complete genomes, which allowed us to infer the deletion process with more certainty. Moreover, picturing the entire scenario allowed the analysis to be extended to the branched clades, although this requires additional caution in interpretation. Although beyond the scope of this work, this suggests a need for a planned choice of genomes to be completely sequenced since, as clearly shown here, we lack information from several phyla, such as the ones represented with empty circles (e.g. Cryptophyta, Haptophyta, Neocallimastigomycota and Glaucophyta). Enzymes not found by our analysis require further attention and searches using more sensitive methods and detailed manual or even experimental analysis to detect divergent sequences; in other words, absence of evidence is not evidence of absence. Nevertheless, the present work exemplifies a method that can be easily applied to other scenarios of gene or pathway loss.
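The reasoning in the paragraph above, in which an undetected enzyme is interpreted differently depending on the sequencing status of the phylum, can be sketched as a small classification rule. This is only an illustration under stated assumptions: the function, the category labels and the phylum records are hypothetical and do not reproduce the procedure or the data behind the Figures.

```python
def classify(enzyme_found: bool, n_complete: int, n_draft: int) -> str:
    """Label one enzyme in one phylum.

    enzyme_found: at least one homologue was detected in the phylum.
    n_complete:   number of completely sequenced genomes in the phylum.
    n_draft:      number of draft genomes in the phylum.
    """
    if enzyme_found:
        return "present"
    if n_complete > 0:
        # Absence is supported by at least one complete genome.
        return "candidate loss"
    if n_draft > 0:
        # Absence may reflect incomplete sequencing rather than deletion.
        return "unresolved"
    # No genome available: absence of evidence, not evidence of absence.
    return "no data"


# Hypothetical phyla: (enzyme_found, n_complete, n_draft).
phyla = {
    "phylum_A": (True, 3, 1),
    "phylum_B": (False, 2, 0),
    "phylum_C": (False, 0, 4),
    "phylum_D": (False, 0, 0),
}

calls = {name: classify(*status) for name, status in phyla.items()}
for name, call in calls.items():
    print(f"{name}: {call}")
```

Keeping "unresolved" and "no data" as separate labels mirrors the caution expressed in the text: only phyla with complete genomes support an inferred deletion with reasonable confidence.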
The scenario of amino acid auxotrophy supports the hypothesis of a Great Genomic Deletion model of amino acid biosynthesis in association with heterotrophy. This phenomenon has probably occurred several times, particularly at the origin of metazoans. The deletion has likely been associated with endosymbiotic relationships or with the development of systems specialized in nutrient absorption. It seems that amino acid essentiality originated as a phenotypic loss of pathways early in Choanozoa, followed by multiple losses during metazoan evolution. Similar progressions of deletions occur close to Heterokontophyta and Rhizaria, culminating in Apicomplexa. Rhodophyta and Microsporidia have also become auxotrophic.
Moreover, the remaining enzymes, set apart from their original roles in amino acid biosynthetic metabolism, seem to be more prone to evolutionary change, whilst enzymes present in complete pathways are more structurally conserved among distant phyla (Figures and ). Although a detailed investigation is needed, our preliminary analysis suggests that the copies that remained in metazoan genomes may have undergone subfunctionalization, and that this might sometimes have occurred in more ancestral organisms (Figure and additional files 2). Thus, in some sense, the orthologous enzyme might actually have been deleted in animals, and the divergent copy is the one remaining. These divergent copies are sometimes named outparalogues. We are currently investigating substitution rate ratios and promoter elements in these genes.
A subsequent deletion involves the enzymes implicated in nitrogen assimilation. It takes place just after the broad deletion of EAA biosynthetic enzymes (since, except for metazoans, the other eukaryotic clades that lack the biosynthetic pathways still contain a nitrogen assimilative enzyme), and it is observed in more derived metazoans but not in Cnidaria. Most Cnidaria are carnivorous, so one possibility is that Cnidaria benefit from the assimilation of organic nitrogen during long periods of fasting; however, this finding needs additional investigation. Thus, the simplest explanation is that the loss of nitrogen assimilative enzymes is related to the lower selective pressure associated with the origin of the most heterotrophic organisms, the animals.
To our knowledge, this is the first initiative to clarify the complete scenario using powerful homologue grouping approaches and the total repertoire of sequenced genomes.