
1.  SH3 interactome conserves general function over specific form 
The Caenorhabditis elegans SH3 domain interactome was mapped and compared with the yeast SH3 interactome. Orthologous SH3 domain-mediated interactions are highly rewired, but the general function of the SH3 domain network is conserved between the two species.
The C. elegans Src homology 3 (SH3) domain interactome was mapped using stringent yeast two-hybrid, resulting in a total of 1070 interactions among 79 of 84 worm SH3 domains and 475 proteins.
SH3 domain binding specificities were profiled for 36 worm SH3 domains using peptide phage display.
The yeast and worm SH3 domain interactomes are significantly enriched in endocytosis proteins, but the specific interactions mediated by orthologous SH3 domains are highly rewired.
Using the worm SH3 interactome, we identified new endocytosis proteins in worms and humans.
Src homology 3 (SH3) domains bind peptides to mediate protein–protein interactions that assemble and regulate dynamic biological processes. We surveyed the repertoire of SH3 binding specificity using peptide phage display in a metazoan, the worm Caenorhabditis elegans, and discovered that it structurally mirrors that of the budding yeast Saccharomyces cerevisiae. We then mapped the worm SH3 interactome using stringent yeast two-hybrid and compared it with the equivalent map for yeast. We found that the worm SH3 interactome resembles the analogous yeast network because it is significantly enriched for proteins with roles in endocytosis. Nevertheless, orthologous SH3 domain-mediated interactions are highly rewired. Our results suggest a model of network evolution where general function of the SH3 domain network is conserved over its specific form.
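The abstract reports that the worm SH3 interactome is "significantly enriched" for endocytosis proteins but does not name the test. A common choice for this kind of claim is a one-sided hypergeometric test, sketched below under that assumption; the function name and all parameters are illustrative, not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value, P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: proteins in the background set (e.g. the whole proteome)
    K: background proteins carrying the annotation (e.g. endocytosis)
    n: proteins in the interactome being tested
    k: interactome proteins carrying the annotation
    """
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probability mass of drawing k or more annotated proteins.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, `hypergeom_enrichment_p(k=20, n=500, K=200, N=20000)` (hypothetical counts) asks how surprising 20 annotated proteins are when about 5 would be expected by chance. Exact integer arithmetic via `math.comb` avoids floating-point loss in the tail sum.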
PMCID: PMC3658277  PMID: 23549480
network evolution; phage display; protein interaction conservation; SH3 domains; yeast two-hybrid
2.  Identification of 526 Conserved Metazoan Genetic Innovations Exposes a New Role for Cofactor E-like in Neuronal Microtubule Homeostasis 
PLoS Genetics  2013;9(10):e1003804.
The evolution of metazoans from their choanoflagellate-like unicellular ancestor coincided with the acquisition of novel biological functions to support a multicellular lifestyle, and eventually, the unique cellular and physiological demands of differentiated cell types such as those forming the nervous, muscle and immune systems. In an effort to understand the molecular underpinnings of such metazoan innovations, we carried out a comparative genomics analysis for genes found exclusively in, and widely conserved across, metazoans. Using this approach, we identified a set of 526 core metazoan-specific genes (the ‘metazoanome’), approximately 10% of which are largely uncharacterized, 16% of which are associated with known human disease, and 66% of which are conserved in Trichoplax adhaerens, a basal metazoan lacking neurons and other specialized cell types. Global analyses of previously characterized core metazoan genes suggest a prevalent property, namely that they act as partially redundant modifiers of ancient eukaryotic pathways. Our data also highlight the importance of exaptation of pre-existing genetic tools during metazoan evolution. Expression studies in C. elegans revealed that many metazoan-specific genes, including tubulin folding cofactor E-like (TBCEL/coel-1), are expressed in neurons. We used C. elegans COEL-1 as a representative to experimentally validate the metazoan-specific character of our dataset. We show that coel-1 disruption results in developmental hypersensitivity to the microtubule drug paclitaxel/taxol, and that overexpression of coel-1 has broad effects during embryonic development and perturbs specialized microtubules in the touch receptor neurons (TRNs). In addition, coel-1 influences the migration, neurite outgrowth and mechanosensory function of the TRNs, and functionally interacts with components of the tubulin acetylation/deacetylation pathway.
Together, our findings unveil a conserved molecular toolbox fundamental to metazoan biology that contains a number of neuronally expressed and disease-related genes, and reveal a key role for TBCEL/coel-1 in regulating microtubule function during metazoan development and neuronal differentiation.
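The selection criterion in the abstract above, genes "found exclusively in, and widely conserved across, metazoans", can be read as a presence/absence filter over gene families. The sketch below assumes a precomputed mapping from family to the set of species containing it; the species labels and the 90% coverage threshold are hypothetical, since the abstract does not give the study's exact cutoffs.

```python
from typing import Dict, Set


def metazoan_specific_families(
    presence: Dict[str, Set[str]],
    metazoans: Set[str],
    non_metazoans: Set[str],
    min_metazoan_fraction: float = 0.9,  # hypothetical threshold
) -> Set[str]:
    """Families absent from every non-metazoan genome and present in at
    least `min_metazoan_fraction` of the metazoan genomes surveyed."""
    selected: Set[str] = set()
    for family, species in presence.items():
        if species & non_metazoans:
            continue  # found outside metazoans, so not metazoan-specific
        coverage = len(species & metazoans) / len(metazoans)
        if coverage >= min_metazoan_fraction:
            selected.add(family)
    return selected
```

Requiring broad coverage as well as exclusivity is what separates a conserved "core" from lineage-restricted novelties; the threshold trades sensitivity against tolerance for gene loss or incomplete genome annotation.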
Author Summary
The evolution of multicellular animals (metazoans) from their single-celled ancestor required new molecular tools to create and coordinate the various biological functions involved in a communal, or multicellular, lifestyle. This would eventually include the unique cellular and physiological demands of specialized tissues like the nervous system. To identify and understand the genetic bases of such unique metazoan traits, we used a comparative genomics approach to identify 526 metazoan-specific genes which have been evolutionarily conserved throughout the diversification of the animal kingdom. Interestingly, we found that some of those genes are still completely uncharacterized or poorly studied. We used the metazoan model organism C. elegans to examine the expression of some poorly characterized metazoan-specific genes and found that many, including one encoding tubulin folding cofactor E-like (TBCEL; C. elegans COEL-1), are expressed in cells of the nervous system. Using COEL-1 as an example to understand the metazoan-specific character of our dataset, our studies reveal a new role for this protein in regulating the stability of the microtubule cytoskeleton during development, and function of the touch receptor neurons. In summary, our findings help define a conserved molecular toolbox important for metazoan biology, and uncover an important role for COEL-1/TBCEL during development and in the nervous system of the metazoan C. elegans.
PMCID: PMC3789837  PMID: 24098140
3.  Phylogenetic analysis of the human basic helix-loop-helix proteins 
Genome Biology  2002;3(6):research0030.1-research0030.18.
The basic helix-loop-helix (bHLH) proteins are a large and complex multigene family of transcription factors with important roles in animal development, including that of fruitflies, nematodes and vertebrates. The identification of orthologous relationships among the bHLH genes from these widely divergent taxa allows reconstruction of the putative complement of bHLH genes present in the genome of their last common ancestor.
We identified 39 different bHLH genes in the worm Caenorhabditis elegans, 58 in the fly Drosophila melanogaster and 125 in human (Homo sapiens). We defined 44 orthologous families that include most of these bHLH genes. Of these, 43 include both human and fly and/or worm genes, indicating that genes from these families were already present in the last common ancestor of worm, fly and human. Only two families contain both yeast and animal genes, and no family contains both plant and animal bHLH genes. We suggest that the diversification of bHLH genes is directly linked to the acquisition of multicellularity, and that important diversification of the bHLH repertoire occurred independently in animals and plants.
As the last common ancestor of worm, fly and human is also that of all bilaterian animals, our analysis indicates that this ancient ancestor must have possessed at least 43 different types of bHLH, highlighting its genomic complexity.
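The inference rule used above is that a family containing both a human gene and a fly and/or worm gene was already present in the last common ancestor of the three. Because human sits on the other side of the deepest bilaterian split from fly and worm, a family spanning that split parsimoniously dates back to the ancestor. A minimal sketch, with hypothetical family names:

```python
from typing import Dict, Set


def ancestral_families(families: Dict[str, Set[str]]) -> Set[str]:
    """Families inferred to be present in the worm-fly-human ancestor:
    they contain a human gene plus a fly and/or worm gene."""
    return {
        name
        for name, species in families.items()
        if "human" in species and ({"fly", "worm"} & species)
    }
```

Families restricted to fly and worm, or to human alone, are not counted, which is why the resulting number (43 in the study) is a lower bound on the ancestral repertoire.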
PMCID: PMC116727  PMID: 12093377
4.  A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes 
Genome Biology  2004;5(2):R7.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.
Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. 
Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.
The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.
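The phyletic-pattern analysis in the KOG abstract above groups clusters by how many of the seven genomes they are represented in, and singles out those found in six or seven species. The sketch below assumes each KOG is given as the set of genomes contributing at least one protein; the genome labels are our own shorthand, not identifiers from the KOG database.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set

# Shorthand labels for the seven genomes named in the abstract.
GENOMES: FrozenSet[str] = frozenset({
    "C_elegans", "D_melanogaster", "H_sapiens", "A_thaliana",
    "S_cerevisiae", "S_pombe", "E_cuniculi",
})


def species_count(members: Set[str]) -> int:
    """Number of the seven genomes in which a KOG has at least one protein."""
    return len(members & GENOMES)


def pattern_histogram(kogs: Dict[str, Set[str]]) -> Counter:
    """How many KOGs are represented in exactly 1, 2, ..., 7 genomes."""
    return Counter(species_count(m) for m in kogs.values())


def broadly_conserved(kogs: Dict[str, Set[str]], min_species: int = 6) -> Set[str]:
    """KOGs represented in at least `min_species` genomes (six or seven here)."""
    return {k for k, m in kogs.items() if species_count(m) >= min_species}
```

Allowing one missing genome (six of seven) tolerates lineage-specific loss, which matters here because the parasite E. cuniculi has a highly reduced genome.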
PMCID: PMC395751  PMID: 14759257
5.  Protein–Protein Interactions More Conserved within Species than across Species 
PLoS Computational Biology  2006;2(7):e79.
Experimental high-throughput studies of protein–protein interactions are beginning to provide enough data for comprehensive computational studies. Today, about ten large data sets, each with thousands of interacting pairs, coarsely sample the interactions in fly, human, worm, and yeast. Roughly 55,000 additional pairs of interacting proteins have been identified by more careful, detailed biochemical experiments. Most interactions are experimentally observed in prokaryotes and simple eukaryotes; very few interactions are observed in higher eukaryotes such as mammals. It is commonly assumed that pathways in mammals can be inferred through homology to model organisms, e.g. the experimental observation that two yeast proteins interact is transferred to infer that the two corresponding proteins in human also interact. Two pairs for which the interaction is conserved are often described as interologs. The goal of this investigation was a large-scale comprehensive analysis of such inferences, i.e. of the evolutionary conservation of interologs. Here, we introduced a novel score for measuring the overlap between protein–protein interaction data sets. This measure appeared to reflect the overall quality of the data and was the basis for our two surprising results from our large-scale analysis. Firstly, homology-based inferences of physical protein–protein interactions appeared far less successful than expected. In fact, such inferences were accurate only for extremely high levels of sequence similarity. Secondly, and most surprisingly, the identification of interacting partners through sequence similarity was significantly more reliable for protein pairs within the same organism than for pairs between species. Our analysis underlined that the discrepancies between different datasets are large, even when using the same type of experiment on the same organism. This reality considerably constrains the power of homology-based transfer of interactions.
In particular, the experimental probing of interactions in distant model organisms has to be undertaken with some caution. More comprehensive images of protein–protein networks will require the combination of many high-throughput methods, including in silico inferences and predictions.
The IntAct database contains about ten large-scale data sets of protein–protein interactions. Each set contains thousands of experimentally observed pair interactions. Most pairs were observed in yeast (Saccharomyces cerevisiae), fly (Drosophila melanogaster), and worm (Caenorhabditis elegans). These organisms are often treated as models in the sense that one can infer that two mouse proteins interact if the two corresponding worm proteins are experimentally observed to interact. Here, the authors analyzed in detail how the sequence signals of physical protein–protein interactions are conserved. It is a common assumption that protein–protein interactions can easily be inferred through homology transfer from one model organism to another organism of interest. Here, the authors demonstrated that such homology transfers are only accurate at unexpectedly high levels of sequence identity. Even more surprisingly, homology transfers of protein–protein interactions are significantly more reliable for protein pairs from the same species than for pairs from different organisms. The observation that interactions were much more conserved within than across species was valid for all levels of sequence similarity, i.e. for very similar as well as for more diverged interologs.
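Homology transfer of interactions, as described above, maps each interacting pair in a source species through an ortholog table to predict an interolog in the target species, then compares the predictions with what was observed there. The abstract does not define its novel overlap score, so the sketch below substitutes a plain Jaccard index as a stand-in; the one-to-one ortholog table and all protein names are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def norm(a: str, b: str) -> Pair:
    """Store each undirected interaction with its two members in sorted order."""
    return (a, b) if a <= b else (b, a)


def transfer(source: Set[Pair], orthologs: Dict[str, str]) -> Set[Pair]:
    """Predict target-species interactions (interologs) from source-species
    pairs via a one-to-one ortholog table; pairs with an unmapped member
    are dropped."""
    predicted: Set[Pair] = set()
    for a, b in source:
        if a in orthologs and b in orthologs:
            predicted.add(norm(orthologs[a], orthologs[b]))
    return predicted


def jaccard(x: Set[Pair], y: Set[Pair]) -> float:
    """Generic overlap score; a stand-in, not the score defined in the study."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0
```

Normalizing pairs to sorted tuples ensures that (A, B) and (B, A) count as the same interaction when sets are compared. A real analysis would also have to handle one-to-many orthology and stratify by sequence identity, which is where the study found transfer accuracy to break down.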
PMCID: PMC1513270  PMID: 16854211
6.  Functional characterization in Caenorhabditis elegans of transmembrane worm-human orthologs 
BMC Genomics  2004;5:85.
The complete genome sequences for human and the nematode Caenorhabditis elegans offer an opportunity to learn more about human gene function through functional characterization of orthologs in the worm. Based on a previous genome-wide analysis of worm-human orthologous transmembrane proteins, we selected seventeen genes to explore experimentally in C. elegans. These genes were selected on the basis that they all have high confidence candidate human orthologs and that their function is unknown. We first analyzed their phylogeny, membrane topology and domain organization. Then gene functions were studied experimentally in the worm by using RNA interference and transcriptional gfp reporter gene fusions.
The experiments gave functional insights for twelve of the genes studied. For example, C36B1.12, the worm ortholog of three presenilin-like genes, was almost exclusively expressed in head neurons, suggesting an ancient conserved role important to neuronal function. We propose a new transmembrane topology for the presenilin-like protein family. sft-4, the worm ortholog of the surfeit locus gene Surf-4, proved to be an essential gene required for development during the larval stages of the worm. R155.1, whose human ortholog is entirely uncharacterized, was implicated in body size control and other developmental processes.
By combining bioinformatics and C. elegans experiments on orthologs, we provide functional insights on twelve previously uncharacterized human genes.
PMCID: PMC533873  PMID: 15533247
7.  Expression of a unique drug-resistant Hsp90 ortholog by the nematode Caenorhabditis elegans 
Cell Stress & Chaperones  2003;8(1):93-104.
In all species studied to date, the function of heat shock protein 90 (Hsp90), a ubiquitous and evolutionarily conserved molecular chaperone, is inhibited selectively by the natural product drugs geldanamycin (GA) and radicicol. Crystal structures of the N-terminal region of yeast and human Hsp90 have revealed that these compounds interact with the chaperone in a Bergerat-type adenine nucleotide–binding fold shared throughout the gyrase, Hsp90, histidine kinase, MutL (GHKL) superfamily of adenosine triphosphatases. To better understand the consequences of disrupting Hsp90 function in a genetically tractable multicellular organism, we exposed the soil-dwelling nematode Caenorhabditis elegans to GA under a variety of conditions designed to optimize drug uptake. Mutations in the gene encoding C. elegans Hsp90 affect larval viability, dauer development, fertility, and life span. However, exposure of worms to GA produced no discernible phenotypes, although the amino acid sequence of worm Hsp90 is 85% homologous to that of human Hsp90. Consistent with this observation, we found that solid phase–immobilized GA failed to bind worm Hsp90 from worm protein extracts or when translated in a rabbit reticulocyte lysate system. Further, affinity precipitation studies using chimeric worm-vertebrate fusion proteins or worm C-terminal truncations expressed in reticulocyte lysate revealed that the conserved nucleotide-binding fold of worm Hsp90 exhibits the novel ability to bind adenosine triphosphate but not GA. Despite its unusual GA resistance, worm Hsp90 appeared fully functional when expressed in a vertebrate background. It heterodimerized with its vertebrate counterpart and showed no evidence of compromising its essential cellular functions. Heterologous expression of worm Hsp90 in tumor cells, however, did not render them GA resistant.
These findings provide new insights into the nature of the unusual N-terminal nucleotide-binding fold of Hsp90 and suggest that target-related drug resistance is unlikely to emerge in patients receiving GA-like chemotherapeutic agents.
PMCID: PMC514859  PMID: 12820659
8.  Genome-Wide Analysis Reveals Novel Genes Essential for Heme Homeostasis in Caenorhabditis elegans 
PLoS Genetics  2010;6(7):e1001044.
Heme is a cofactor in proteins that function in almost all sub-cellular compartments and in many diverse biological processes. Heme is produced by a conserved biosynthetic pathway that is highly regulated to prevent the accumulation of heme—a cytotoxic, hydrophobic tetrapyrrole. Caenorhabditis elegans and related parasitic nematodes do not synthesize heme, but instead require environmental heme to grow and develop. Heme homeostasis in these auxotrophs is, therefore, regulated in accordance with available dietary heme. We have capitalized on this auxotrophy in C. elegans to study gene expression changes associated with precisely controlled dietary heme concentrations. RNA was isolated from cultures containing 4, 20, or 500 µM heme; derived cDNA probes were hybridized to Affymetrix C. elegans expression arrays. We identified 288 heme-responsive genes (hrgs) that were differentially expressed under these conditions. Of these genes, 42% had putative homologs in humans, while genomes of medically relevant heme auxotrophs revealed homologs for 12% in both Trypanosoma and Leishmania and 24% in parasitic nematodes. Depletion of each of the 288 hrgs by RNA-mediated interference (RNAi) in a transgenic heme-sensor worm strain identified six genes that regulated heme homeostasis. In addition, seven membrane-spanning transporters involved in heme uptake were identified by RNAi knockdown studies using a toxic heme analog. Comparison of genes that were positive in both of the RNAi screens resulted in the identification of three genes in common that were vital for organismal heme homeostasis in C. elegans. Collectively, our results provide a catalog of genes that are essential for metazoan heme homeostasis and demonstrate the power of C. elegans as a genetic animal model to dissect the regulatory circuits which mediate heme trafficking in both vertebrate hosts and their parasites, which depend on environmental heme for survival.
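The abstract above says the 288 hrgs were "differentially expressed" across 4, 20, and 500 µM heme, without stating the criterion. One simple way to operationalize that is a log2 fold-change filter against a reference condition, sketched here under two assumptions that are ours, not the study's: 20 µM serves as the reference, and a gene qualifies if either extreme differs from it by at least a 2-fold change (|log2 FC| ≥ 1). The gene identifiers are hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def heme_responsive(
    expression: Dict[str, Tuple[float, float, float]],
    min_abs_log2fc: float = 1.0,  # assumed cutoff: at least a 2-fold change
) -> Set[str]:
    """expression[gene] = mean array signal at (4, 20, 500) uM heme.

    Flags genes whose signal at 4 uM or 500 uM differs from the 20 uM
    reference (an assumed choice) by at least `min_abs_log2fc` log2 units.
    """
    hits: Set[str] = set()
    for gene, (low, ref, high) in expression.items():
        if min(low, ref, high) <= 0:
            continue  # log ratio is undefined for non-positive signal
        low_fc = abs(math.log2(low / ref))
        high_fc = abs(math.log2(high / ref))
        if max(low_fc, high_fc) >= min_abs_log2fc:
            hits.add(gene)
    return hits
```

A fold-change cutoff alone ignores replicate variance; a real array analysis would normally pair it with a per-gene statistical test and multiple-testing correction.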
Author Summary
Heme is an iron-containing cofactor for proteins involved in many critical cellular processes. However, free heme is toxic to cells, suggesting that heme synthesis, acquisition, and transport are highly regulated. Efforts to understand heme trafficking in multicellular organisms have failed primarily due to the inability to separate the processes of endogenous heme synthesis from heme uptake and transport. Caenorhabditis elegans is unique among model organisms because it cannot synthesize heme but instead eats environmental heme to grow and develop normally. Thus, worms are an ideal genetic animal model to study heme homeostasis. This work identifies a novel list of 288 heme-responsive genes (hrgs) in C. elegans and a number of related genes in humans and medically relevant parasites. Knocking down the function of each of these hrgs reveals roles for several in heme uptake, transport, and detection within the organism. Our study provides insights into metazoan regulation of organismal heme homeostasis. The identification of parasite-specific hrg homologs may permit the selective design and screening of drugs that specifically target heme uptake pathways in parasites without affecting the host. Thus, this work has therapeutic implications for the treatment of human iron deficiency, one of the top ten mortality factors worldwide.
PMCID: PMC2912396  PMID: 20686661
9.  The COG database: an updated version includes eukaryotes 
BMC Bioinformatics  2003;4:41.
The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.
We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; addition of new eukaryotic genomes is expected to result in a substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.
The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
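As a quick consistency check, the coverage percentages quoted in the COG/KOG abstract above follow directly from the raw counts it reports, and the "~20%" conserved core corresponds to roughly 970 of the 4852 KOGs (approximate, since the abstract gives only a rounded percentage):

```latex
\frac{138{,}458}{185{,}505} \approx 0.746 \;(\text{COG coverage, reported as } 75\%), \qquad
\frac{59{,}838}{110{,}655} \approx 0.541 \;(\text{KOG coverage, reported as } {\sim}54\%), \qquad
0.20 \times 4852 \approx 970 \;\text{KOGs in the conserved core}
```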
PMCID: PMC222959  PMID: 12969510
10.  Metazoan Scc4 Homologs Link Sister Chromatid Cohesion to Cell and Axon Migration Guidance 
PLoS Biology  2006;4(8):e242.
Saccharomyces cerevisiae Scc2 binds Scc4 to form an essential complex that loads cohesin onto chromosomes. The prevalence of Scc2 orthologs in eukaryotes emphasizes a conserved role in regulating sister chromatid cohesion, but homologs of Scc4 have not hitherto been identified outside certain fungi. Some metazoan orthologs of Scc2 were initially identified as developmental gene regulators, such as Drosophila Nipped-B, a regulator of cut and Ultrabithorax, and delangin, a protein mutant in Cornelia de Lange syndrome. We show that delangin and Nipped-B bind previously unstudied human and fly orthologs of Caenorhabditis elegans MAU-2, a non-axis-specific guidance factor for migrating cells and axons. PSI-BLAST shows that Scc4 is evolutionarily related to metazoan MAU-2 sequences, with the greatest homology evident in a short N-terminal domain, and protein–protein interaction studies map the site of interaction between delangin and human MAU-2 to the N-terminal regions of both proteins. Short interfering RNA knockdown of human MAU-2 in HeLa cells resulted in precocious sister chromatid separation and in impaired loading of cohesin onto chromatin, indicating that it is functionally related to Scc4, and RNAi analyses show that MAU-2 regulates chromosome segregation in C. elegans embryos. Using antisense morpholino oligonucleotides to knock down Xenopus tropicalis delangin or MAU-2 in early embryos produced similar patterns of retarded growth and developmental defects. Our data show that sister chromatid cohesion in metazoans involves the formation of a complex similar to the Scc2-Scc4 interaction in the budding yeast. The very high degree of sequence conservation between Scc4 homologs in complex metazoans is consistent with increased selection pressure to conserve additional essential functions, such as regulation of cell and axon migration during development.
A complex previously found only in yeast is described in metazoa, where it functions both in chromatid cohesion and in migration during development.
PMCID: PMC1484498  PMID: 16802858
11.  Intrinsic Disorder Is a Common Feature of Hub Proteins from Four Eukaryotic Interactomes 
PLoS Computational Biology  2006;2(8):e100.
Recent proteome-wide screening approaches have provided a wealth of information about interacting proteins in various organisms. To test for a potential association between protein connectivity and the amount of predicted structural disorder, the disorder propensities of proteins with various numbers of interacting partners from four eukaryotic organisms (Caenorhabditis elegans, Saccharomyces cerevisiae, Drosophila melanogaster, and Homo sapiens) were investigated. The results of PONDR VL-XT disorder analysis show that for all four studied organisms, hub proteins, defined here as those that interact with ≥10 partners, are significantly more disordered than end proteins, defined here as those that interact with just one partner. The proportion of predicted disordered residues, the average disorder score, and the number of predicted disordered regions of various lengths were higher overall in hubs than in ends. A binary classification of hubs and ends into ordered and disordered subclasses using the consensus prediction method showed a significant enrichment of wholly disordered proteins and a significant depletion of wholly ordered proteins in hubs relative to ends in worm, fly, and human. The functional annotation of yeast hubs and ends using GO categories and the correlation of these annotations with disorder predictions demonstrate that proteins with regulation, transcription, and development annotations are enriched in disorder, whereas proteins with catalytic activity, transport, and membrane localization annotations are depleted in disorder. The results of this study demonstrate that intrinsic structural disorder is a distinctive and common characteristic of eukaryotic hub proteins, and that disorder may serve as a determinant of protein interactivity.
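The comparison above rests on two degree-based classes (hubs with ≥10 partners, ends with exactly one) and a per-protein disorder measure. The sketch below computes the mean fraction of predicted-disordered residues in each class. It assumes per-residue disorder scores are already available from a predictor such as PONDR VL-XT and that a residue counts as disordered at a score of 0.5 or above; that cutoff is our assumption here, and the protein names and scores in the test are made up.

```python
from statistics import mean
from typing import Dict, List, Tuple

HUB_MIN_PARTNERS = 10  # hub: interacts with >= 10 partners (from the abstract)
END_PARTNERS = 1       # end: interacts with exactly one partner
DISORDER_CUTOFF = 0.5  # assumed per-residue threshold for calling disorder


def fraction_disordered(scores: List[float]) -> float:
    """Share of residues whose disorder score meets the cutoff."""
    return sum(s >= DISORDER_CUTOFF for s in scores) / len(scores)


def hub_vs_end(
    degree: Dict[str, int], scores: Dict[str, List[float]]
) -> Tuple[float, float]:
    """Mean disordered-residue fraction for hubs and for ends.

    Proteins with 2-9 partners belong to neither class and are ignored.
    Raises statistics.StatisticsError if either class is empty.
    """
    hubs = [p for p, d in degree.items() if d >= HUB_MIN_PARTNERS]
    ends = [p for p, d in degree.items() if d == END_PARTNERS]
    return (
        mean(fraction_disordered(scores[p]) for p in hubs),
        mean(fraction_disordered(scores[p]) for p in ends),
    )
```

Dropping intermediate-degree proteins sharpens the contrast between the two extremes, at the cost of ignoring most of the network; the study additionally compared disordered-region counts and average scores, which would need separate summaries.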
From the formulation of Emil Fischer's lock-and-key hypothesis in 1894 until the early 1990s, a dominating and widely accepted concept in molecular biology was the protein structure–function paradigm. According to this concept, a protein can perform its biological function(s) only after folding into a specific rigid 3-D structure. Only recently has the validity of this structure–function paradigm been seriously challenged, primarily through the wealth of counterexamples that have gradually accumulated over the past 15 years. These counterexamples demonstrated that many proteins exist in a natively unfolded (or intrinsically disordered) state, and function without a prerequisite stably folded structure. In many cases, the lack of structure is required for biological function. Previous results have implicated intrinsic disorder as having an important role in protein interactions. The authors generalize this notion by comparing interaction networks from four eukaryotic organisms: yeast, worm, fly, and human. They have found that within these networks the proteins that interact with multiple protein partners (network hubs) are significantly more disordered than proteins that interact with a single protein partner (network ends). The results of this study demonstrate that intrinsic structural disorder is a distinctive and common characteristic of hub proteins, and that disorder may serve as a determinant of protein interactivity.
PMCID: PMC1526461  PMID: 16884331
12.  Upstream plasticity and downstream robustness in evolution of molecular networks 
Gene duplication followed by the functional divergence of the resulting pair of paralogous proteins is a major force shaping molecular networks in living organisms. Recent species-wide data for protein-protein interactions and transcriptional regulations allow us to assess the effect of gene duplication on robustness and plasticity of these molecular networks.
We demonstrate that the transcriptional regulation of duplicated genes in baker's yeast Saccharomyces cerevisiae diverges fast so that on average they lose 3% of common transcription factors for every 1% divergence of their amino acid sequences. The set of protein-protein interaction partners of their protein products changes at a slower rate exhibiting a broad plateau for amino acid sequence similarity above 70%. The stability of functional roles of duplicated genes at such relatively low sequence similarity is further corroborated by their ability to substitute for each other in single gene knockout experiments in yeast and RNAi experiments in the nematode worm Caenorhabditis elegans. We also quantified the divergence rate of physical interaction neighborhoods of paralogous proteins in the bacterium Helicobacter pylori and the fly Drosophila melanogaster. However, in the absence of system-wide data on transcription factors' binding in these organisms we could not compare this rate to that of transcriptional regulation of duplicated genes.
For all molecular networks studied in this work we found that even the most distantly related paralogous proteins with amino acid sequence identities around 20% on average have more similar positions within a network than a randomly selected pair of proteins. For yeast we also found that the upstream regulation of genes evolves more rapidly than downstream functions of their protein products. This is consistent with the view that regulatory changes are among the main driving forces of evolution. In this context a very important open question is to what extent our results obtained for homologous genes within a single species (paralogs) carry over to homologous proteins in different species (orthologs).
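The rate quoted in the paralog abstract above (about 3% of shared transcription factors lost per 1% of amino-acid divergence) is a slope relating regulatory overlap to sequence divergence. The sketch below shows one way to estimate such a slope: measure the shared fraction of regulators for each paralog pair, then fit an ordinary least-squares line. Using the Jaccard fraction of transcription-factor sets as the overlap measure is our assumption; the abstract does not specify the study's exact metric, and the data in the test are invented.

```python
from typing import List, Set


def shared_fraction(a: Set[str], b: Set[str]) -> float:
    """Fraction of the union of regulators shared by both genes (Jaccard)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fit_slope(xs: List[float], ys: List[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

With divergence on the x-axis in percent and shared regulators on the y-axis in percent, a fitted slope near -3 would correspond to the reported rate. Running the same fit on interaction-partner overlap instead of regulator overlap would give the slower, plateauing curve described for protein-protein interactions.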
PMCID: PMC385226  PMID: 15070432
13.  Evolution of SET-domain protein families in the unicellular and multicellular Ascomycota fungi 
The evolution of multicellularity is accompanied by the occurrence of differentiated tissues, of organismal developmental programs, and of mechanisms keeping the balance between proliferation and differentiation. Initially, SET-domain proteins were associated exclusively with the regulation of developmental genes in metazoa. However, the discovery of SET-domain genes in the unicellular yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe suggested that SET-domain proteins regulate a much broader variety of biological programs. Intuitively, one would expect the numbers, types, and biochemical specificity of SET-domain proteins in multicellular versus unicellular forms to reflect the differences in their biology. However, comparisons across the unicellular and multicellular domains of life are complicated by the lack of knowledge of the ancestral SET-domain genes. Even within the crown group, different biological systems might use the epigenetic 'code' differently, adapting it to organism-specific needs. To simplify the model, we undertook a systematic phylogenetic analysis of one monophyletic fungal group (Ascomycetes) containing the unicellular yeasts, Saccharomycotina (hemiascomycetes), and a filamentous fungal group, Pezizomycotina (euascomycetes).
Systematic analysis of the SET-domain genes across an entire eukaryotic phylum has outlined clear distinctions between the SET-domain gene collections of the unicellular and the multicellular (filamentous) relatives; diversification of SET-domain gene families has increased further with the expansion and elaboration of multicellularity in animal and plant systems. We found several Ascomycota-specific SET-domain gene groups, each unique to either Saccharomycotina or Pezizomycotina fungi. Our analysis revealed that the numbers and types of SET-domain genes in the Saccharomycotina did not reflect their habitats, pathogenicity, mechanisms of sexuality, or ability to undergo morphogenic transformations. However, novel genes have appeared for functions associated with the transition to multicellularity. Descendants of most of the SET-domain gene families found in the filamentous fungi could be traced in the genomes of extant animals and plants, albeit as more complex structural forms.
SET-domain genes found in the filamentous species but absent from the unicellular sister group reflect two alternative evolutionary events: deletion from the yeast genomes or appearance of novel structures in filamentous fungal groups. There were no Ascomycota-specific SET-domain gene families (i.e., families absent from animal and plant genomes); however, plants and animals share SET-domain gene subfamilies that do not exist in the fungi. Phylogenetic and gene-structure analyses defined several animal and plant SET-domain genes as sister groups, while those of fungal origin were basal to them.
PMCID: PMC2474616  PMID: 18593478
14.  Eleven ancestral gene families lost in mammals and vertebrates while otherwise universally conserved in animals 
Gene losses may have played a role as important as gene and genome duplications and rearrangements in shaping present-day species' genomes from a common ancestral set of genes. The set and diversity of protein-coding genes in a species has a direct output at the functional level. While gene losses have been reported in all the major lineages of the metazoan tree of life, no study has focused specifically on losses in the vertebrate and mammalian lineages. In contrast, genes lost in protostomes (i.e. arthropods and nematodes) but still present in vertebrates have been reported and extensively detailed. This probably over-anthropocentric way of comparing genomes overlooks gene losses in species usually described as "higher" as an important phenomenon. However, reporting universally conserved genes that have recently been lost in vertebrates and mammals could reveal interesting features about the evolution of our genome, particularly if these losses can be related to losses of capability.
We report 11 gene families conserved throughout eukaryotes from yeasts (such as Saccharomyces cerevisiae) to bilaterian animals (such as Drosophila melanogaster or Caenorhabditis elegans). This evolutionarily wide conservation suggests they were present in the last common ancestor of fungi and metazoan animals. None of these 11 gene families is found in either the human or the mouse genome, and their absence generally extends to all vertebrates. Eight of these 11 gene families have orthologs in plants, suggesting they were present in the Last Eukaryotic Common Ancestor (LECA). We investigated known functional information for these 11 gene families, which allowed us to correlate some of the lost gene families with losses of capability.
Mammalian and vertebrate genomes lost evolutionarily conserved ancestral genes that are probably otherwise not dispensable in eukaryotes. Hence, the human genome, which is generally viewed as the result of increased complexity and gene content, has also evolved through simplification and gene losses. This acknowledgement confirms, as already suggested, that the genome of our far ancestor was probably more complex than previously considered.
PMCID: PMC1382263  PMID: 16420703
15.  Evolution of Innate Immunity: Clues from Invertebrates via Fish to Mammals 
Host responses against invading pathogens are basic physiological reactions of all living organisms. Since the appearance of the first eukaryotic cells, a series of defense mechanisms have evolved to secure cellular integrity, homeostasis, and survival of the host. Invertebrates, ranging from protozoans to metazoans, possess cellular receptors that bind to foreign elements and differentiate self from non-self. In multicellular animals, this ability is associated with the presence of phagocytes, which bear different names (amebocytes, hemocytes, coelomocytes) in various groups, including sponges, worms, cnidarians, mollusks, crustaceans, chelicerates, insects, and echinoderms (sea stars and urchins). Basically, these cells have a macrophage-like appearance and function, and the repair and/or fight functions associated with them are prominent even at the earliest evolutionary stage. The cells possess pathogen recognition receptors that recognize pathogen-associated molecular patterns, which are well-conserved molecular structures expressed by various pathogens (viruses, bacteria, fungi, protozoans, helminths). Scavenger receptors, Toll-like receptors, and Nod-like receptors (NLRs) are prominent representatives within this group of host receptors. Following receptor–ligand binding, signal transduction initiates a complex cascade of cellular reactions, which lead to the production of one or more of a wide array of effector molecules. Cytokines take part in this orchestration of responses even in lower invertebrates, which may eventually result in elimination or inactivation of the intruder. Important innate effector molecules are oxygen and nitrogen species, antimicrobial peptides, lectins, fibrinogen-related peptides, leucine-rich repeats (LRRs), pentraxins, and complement-related proteins.
Echinoderms represent the most developed invertebrates and the bridge leading to the primitive chordates, cephalochordates, and urochordates, in which many autologous genes and functions from their ancestors can be found. They exhibit numerous variants of innate recognition and effector molecules, which allow fast innate responses toward diverse pathogens despite the lack of adaptive responses. The primitive vertebrates (agnathans, also termed jawless fish) were the first to supplement innate responses with adaptive elements. Thus, hagfish and lampreys use LRRs as variable lymphocyte receptors, whereas higher vertebrates [cartilaginous and bony fishes (jawed fish), amphibians, reptiles, birds, and mammals] developed the major histocompatibility complex, T-cell receptors, and B-cell receptors (immunoglobulins) as additional adaptive weaponry to assist innate responses. Extensive cytokine networks are recognized in fish, but related signal molecules can be traced among invertebrates. The high specificity, antibody maturation, immunological memory, and secondary responses of adaptive immunity were so successful that they allowed higher vertebrates to reduce the number of variants of the innate molecules originating from both invertebrates and lower vertebrates. Nonetheless, vertebrates combine the two arms in an intricate, interdependent network. To survive, organisms at all developmental stages have applied available genes and functions, some of which may have been lost or may have changed function through evolution. The molecular mechanisms involved in the evolution of immune molecules might, apart from simple base substitutions, be as diverse as gene duplication, deletions, alternative splicing, gene recombination, domain shuffling, retrotransposition, and gene conversion. Further, variable regulation of gene expression may have played a role.
PMCID: PMC4172062  PMID: 25295041
evolution; immunity; innate immunity; adaptive immunity; invertebrates; vertebrates
16.  Bacteria, Yeast, Worms, and Flies: Exploiting simple model organisms to investigate human mitochondrial diseases 
The extensive conservation of mitochondrial structure, composition, and function across evolution offers a unique opportunity to expand our understanding of human mitochondrial biology and disease. By investigating the biology of much simpler model organisms, it is often possible to answer questions that are unreachable at the clinical level. Here, we review the relative utility of four different model organisms, namely the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, in studying the role of mitochondrial proteins relevant to human disease. E. coli is a single-celled prokaryote that has proven to be a useful model system in which to investigate mitochondrial respiratory chain protein structure and function. S. cerevisiae is a single-celled eukaryote that can grow equally well by mitochondrial-dependent respiration or by ethanol fermentation, a property that has proven to be a veritable boon for investigating mitochondrial functionality. C. elegans is a multicellular, microscopic worm that is organized into five major tissues and has proven to be a robust model animal for in vitro and in vivo studies of primary respiratory chain dysfunction and its potential therapies in humans. Studied for over a century, D. melanogaster is a classic metazoan model system offering an abundance of genetic tools and reagents that facilitate investigations of mitochondrial biology using both forward and reverse genetics. The respective strengths and limitations of each species relative to mitochondrial studies are explored. In addition, an overview is provided of major discoveries made in mitochondrial biology in each of these four model systems.
PMCID: PMC3628736  PMID: 20818735
Escherichia coli; Saccharomyces cerevisiae; Caenorhabditis elegans; Drosophila melanogaster; mitochondria; model organisms
17.  A Specificity Map for the PDZ Domain Family 
PLoS Biology  2008;6(9):e239.
PDZ domains are protein–protein interaction modules that recognize specific C-terminal sequences to assemble protein complexes in multicellular organisms. By scanning billions of random peptides, we accurately map binding specificity for approximately half of the over 330 PDZ domains in the human and Caenorhabditis elegans proteomes. The domains recognize features of the last seven ligand positions, and we find 16 distinct specificity classes conserved from worm to human, significantly extending the canonical two-class system based on position −2. Thus, most PDZ domains are not promiscuous, but rather are fine-tuned for specific interactions. Specificity profiling of 91 point mutants of a model PDZ domain reveals that the binding site is highly robust, as all mutants were able to recognize C-terminal peptides. However, many mutations altered specificity for ligand positions both close and far from the mutated position, suggesting that binding specificity can evolve rapidly under mutational pressure. Our specificity map enables the prediction and prioritization of natural protein interactions, which can be used to guide PDZ domain cell biology experiments. Using this approach, we predicted and validated several viral ligands for the PDZ domains of the SCRIB polarity protein. These findings indicate that many viruses produce PDZ ligands that disrupt host protein complexes for their own benefit, and that highly pathogenic strains target PDZ domains involved in cell polarity and growth.
Author Summary
The PDZ domain is a structural domain that functions as a protein–protein interaction module that recognizes specific C-terminal peptide sequences to assemble intracellular complexes important in signaling pathways of multicellular organisms. These modules are associated with human disease and are targets of viruses and other pathogens. By examining peptide specificity and substrate diversity of roughly one half of the PDZ domains known to exist in human and the nematode Caenorhabditis elegans, we were able to show that PDZ domains are more specific than previously appreciated. PDZ domains also remain functional under high mutational pressure, and only a few of the vast number of possible PDZ domain specificities are utilized in nature. These PDZ domain specificities are conserved from human to worm, implying that the specificities evolved early and were reused over evolution instead of being reshaped. The specificity map generated here was used to predict and experimentally confirm new viral PDZ-binding motifs. We present evidence that pathogenic viruses, including avian influenza, bind host PDZ domains via these motifs, thereby competing with signaling by host complexes, which leads to disruption of growth and polarity of the host cells.
A genome-scale specificity map for PDZ domains reveals how family members recognize ligands to assemble signaling complexes and also reveals how viruses target these domains to subvert host cell function.
PMCID: PMC2553845  PMID: 18828675
18.  A genomic analysis of chronological longevity factors in budding yeast 
Cell Cycle  2011;10(9):1385-1396.
Chronological life span (CLS) has been studied as an aging paradigm in yeast. A few conserved aging genes have been identified that modulate both chronological and replicative longevity in yeast as well as longevity in the nematode Caenorhabditis elegans; however, a comprehensive analysis of the relationship between genetic control of chronological longevity and aging in other model systems has yet to be reported. To address this question, we performed a functional genomic analysis of chronological longevity for 550 single-gene deletion strains, which account for approximately 12% of the viable homozygous diploid deletion strains in the yeast ORF deletion collection. This study identified 33 previously unknown determinants of CLS. We found no significant enrichment for enhanced CLS among deletions corresponding to yeast orthologs of worm aging genes or among replicatively long-lived deletion strains, although a trend toward overlap was noted. In contrast, a subset of gene deletions identified from a screen for reduced acidification of culture media during growth to stationary phase was enriched for increased CLS. These results suggest that genetic control of CLS under the most commonly utilized assay conditions does not strongly overlap with longevity determinants in C. elegans, with the existing overlap confined to a small number of genetic pathways. These data also further support the model that acidification of the culture medium plays an important role in survival during chronological aging in synthetic medium, and suggest that chronological aging studies using alternate medium conditions may be more informative with regard to aging of multicellular eukaryotes.
PMCID: PMC3356828  PMID: 21447998
aging; genomic; screen; lifespan; yeast; C. elegans; pH; chronological; replicative
19.  Evolutionary cores of domain co-occurrence networks 
Modeling complex systems as disparate as the World Wide Web and cellular metabolism as networks has recently uncovered a set of generic organizing principles: most of these systems are scale-free while at the same time modular, resulting in a hierarchical architecture. The structure of the protein domain network, where individual domains correspond to nodes and their co-occurrences in a protein are interpreted as links, also falls into this category, suggesting that domains involved in the maintenance of increasingly developed, multicellular organisms accumulate links. Here, we take the next step by studying link-based properties of the protein domain co-occurrence networks of the eukaryotes S. cerevisiae, C. elegans, D. melanogaster, M. musculus and H. sapiens.
We construct the protein domain co-occurrence networks from the PFAM database and analyze them by applying a k-core decomposition method that, through an iterative peeling process, separates the globally central protein domains (highly connected domains in the central cores) from the locally central ones (highly connected domains in the peripheral cores). Furthermore, we compare the subnetworks thus obtained to the physical domain interaction network of S. cerevisiae. We find that the innermost cores of the domain co-occurrence networks gradually grow with increasing degree of evolutionary development in going from single-celled to multicellular eukaryotes. The comparison of the cores across all the organisms under consideration uncovers patterns of domain combinations that are predominantly involved in protein functions such as cell-cell contacts and signal transduction. Analyzing a weighted interaction network of PFAM domains in yeast, we find that domains having only a few partners frequently interact with these, while the converse is true for domains with a multitude of partners. Combining domain co-occurrence and interaction information, we observe that the co-occurrence of domains in the innermost cores (globally central domains) strongly coincides with physical interaction. The comparison of the multicellular eukaryotic domain co-occurrence networks with that of the single-celled S. cerevisiae (the overlap network) uncovers small, connected network patterns.
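The k-core "peeling" described above can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes an undirected, unweighted co-occurrence graph supplied as a list of domain pairs, and the function names `core_numbers` and `innermost_core`, as well as the toy domains A to D, are hypothetical. A node's core number is the largest k for which it survives repeated removal of nodes with fewer than k remaining neighbors; the innermost core is the set of nodes sharing the maximum core number.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected graph given as (u, v) pairs.

    Nodes are peeled level by level: at level k, every node whose remaining
    degree is <= k is removed and assigned core number k.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops (a domain co-occurring with itself)
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Nodes that cannot belong to the (k+1)-core at this point
        stack = [n for n in remaining if degree[n] <= k]
        if not stack:
            k += 1  # every remaining node has degree > k: move to next level
            continue
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue  # already peeled via another neighbor
            remaining.remove(n)
            core[n] = k
            # Removing n lowers its neighbors' degrees; peel any that drop to <= k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)
    return core


def innermost_core(core):
    """Return (k_max, set of nodes whose core number equals k_max)."""
    k_max = max(core.values())
    return k_max, {n for n, k in core.items() if k == k_max}


# Toy graph, illustrative only (not PFAM data): triangle A-B-C plus D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
core = core_numbers(toy_edges)
print(sorted(core.items()))  # [('A', 2), ('B', 2), ('C', 2), ('D', 1)]
k_max, nodes = innermost_core(core)
print(k_max, sorted(nodes))  # 2 ['A', 'B', 'C']
```

In the toy graph, D is peeled at level 1, and A, B and C form the innermost 2-core, the analogue of the "globally central" domains. For real networks, the networkx function `core_number` performs the same computation.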
We hypothesize that these patterns, consisting of the domains and links preserved through evolution, may constitute nucleation kernels for the evolutionary increase in proteome complexity. Combining co-occurrence and physical interaction data we argue that the driving force behind domain fusions is a collective effect caused by the number of interactions and not the individual interaction frequency.
PMCID: PMC1079808  PMID: 15788102
20.  The unfolded protein response in fission yeast modulates stability of select mRNAs to maintain protein homeostasis 
eLife  2012;1:e00048.
The unfolded protein response (UPR) monitors the protein folding capacity of the endoplasmic reticulum (ER). In all organisms analyzed to date, the UPR drives transcriptional programs that allow cells to cope with ER stress. The non-conventional splicing of Hac1 (yeasts) and XBP1 (metazoans) mRNA, encoding orthologous UPR transcription activators, is conserved and dependent on Ire1, an ER membrane-resident kinase/endoribonuclease. We found that the fission yeast Schizosaccharomyces pombe lacks both a Hac1/XBP1 ortholog and a UPR-dependent transcriptional program. Instead, Ire1 initiates the selective decay of a subset of ER-localized mRNAs that is required to survive ER stress. We identified Bip1 mRNA, encoding a major ER chaperone, as the sole mRNA cleaved upon Ire1 activation that escapes decay. Rather, truncation of its 3′ UTR, including loss of its polyA tail, stabilized Bip1 mRNA, resulting in increased Bip1 translation. Thus, S. pombe uses a universally conserved stress-sensing machinery in novel ways to maintain homeostasis in the ER.
eLife digest
Protein folding—the process by which a sequence of amino acids adopts the precise shape that is needed to perform a specific biological function—is one of the most important processes in all of biology. Any sequence of amino acids has the potential to fold into a large number of different shapes, and misfolded proteins can lead to toxicity and other problems. For example, all cells rely on signaling proteins in the membranes that enclose them to monitor their environment so that they can adapt to changing conditions and, in multicellular organisms, communicate with neighboring cells: without properly folded signaling proteins, chaos would ensue. Moreover, many diseases—including diabetes, cancer, viral infection and neurodegenerative disease—have been linked to protein folding processes. It is not surprising, therefore, that cells have evolved elaborate mechanisms to exert exquisite quality control over protein folding.
One of these mechanisms, called the unfolded protein response (UPR), operates in a compartment within the cell known as the endoplasmic reticulum (ER). The ER is a labyrinthine network of tubes and sacs within all eukaryotic cells, and most proteins destined for the cell surface or outside the cell adopt their properly folded shapes within this compartment. If the ER does not have enough capacity to fold all of the proteins that are delivered there, the UPR switches on to increase the protein folding capacity, to expand the surface area and volume of the compartment, and to degrade misfolded proteins. If the UPR cannot adequately adjust the folding capacity of the ER to meet the demands of the cell, the UPR triggers a program that kills the cell to prevent putting the whole organism at risk.
Researchers have identified the cellular components that monitor the protein folding conditions inside the ER. All eukaryotic cells, from unicellular yeasts to mammalian cells, contain a highly conserved protein-folding sensor called Ire1. In all species analyzed to date, Ire1 is known to activate the UPR through a messenger RNA (mRNA) splicing mechanism. This splicing event provides the switch that drives a gene expression program in which the production of ER components is increased to boost the protein folding capacity of the compartment.
Kimmig, Diaz et al. now report the first instance of an organism in which the UPR does not involve mRNA splicing or the initiation of a gene expression program. Rather, the yeast Schizosaccharomyces pombe utilizes Ire1 to an entirely different end. The authors find that the activation of Ire1 in S. pombe leads to the selective decay of a specific class of mRNAs that all encode proteins entering the ER. Thus, rather than increasing the protein folding capacity of the ER when faced with an increased protein folding load, S. pombe cells correct the imbalance by decreasing the load.
The authors also show that a lone mRNA—the mRNA that encodes the molecular chaperone BiP, which is one of the major protein-folding components in the ER—uniquely escapes this decay. Rather than being degraded, Ire1 truncates BiP mRNA and renders it more stable. By studying the UPR in a divergent organism, the authors shed new light on the evolution of a universally important process and illustrate how conserved machinery has been repurposed.
PMCID: PMC3470409  PMID: 23066505
Unfolded Protein Response; Ire1; selective mRNA decay; Bip1 mRNA stabilization; ER homeostasis; S. pombe
21.  Highly Divergent Mitochondrial ATP Synthase Complexes in Tetrahymena thermophila 
PLoS Biology  2010;8(7):e1000418.
Tetrahymena ATP synthase, an evolutionarily divergent protein complex, has a very unusual structure and protein composition including a unique Fo subunit a and at least 13 proteins with no orthologs outside of the ciliate lineage.
The F-type ATP synthase complex is a rotary nano-motor driven by proton motive force to synthesize ATP. Its F1 sector catalyzes ATP synthesis, whereas the Fo sector conducts the protons and provides a stator for the rotary action of the complex. Components of both F1 and Fo sectors are highly conserved across prokaryotes and eukaryotes. Therefore, it was a surprise that genes encoding the a and b subunits as well as other components of the Fo sector were undetectable in the sequenced genomes of a variety of apicomplexan parasites. While the parasitic existence of these organisms could explain the apparent incomplete nature of ATP synthase in Apicomplexa, genes for these essential components were absent even in Tetrahymena thermophila, a free-living ciliate belonging to a sister clade of Apicomplexa, which demonstrates robust oxidative phosphorylation. This observation raises the possibility that the entire clade of Alveolata may have invented novel means to operate ATP synthase complexes. To assess this remarkable possibility, we have carried out an investigation of the ATP synthase from T. thermophila. Blue native polyacrylamide gel electrophoresis (BN-PAGE) revealed the ATP synthase to be present as a large complex. Structural study based on single particle electron microscopy analysis suggested the complex to be a dimer with several unique structures including an unusually large domain on the intermembrane side of the ATP synthase and novel domains flanking the c subunit rings. The two monomers were in a parallel configuration rather than the angled configuration previously observed in other organisms. Proteomic analyses of well-resolved ATP synthase complexes from 2-D BN/BN-PAGE identified orthologs of seven canonical ATP synthase subunits, and at least 13 novel proteins that constitute subunits apparently limited to the ciliate lineage. 
A mitochondrially encoded protein, Ymf66, with eight predicted transmembrane domains could be a substitute for subunit a of the Fo sector. The absence of genes encoding orthologs of the novel subunits even in apicomplexans suggests that the Tetrahymena ATP synthase, despite core similarities, is a unique enzyme exhibiting dramatic differences compared to the conventional complexes found in metazoan, fungal, and plant mitochondria, as well as in prokaryotes. These findings have significant implications for the origins and evolution of a central player in bioenergetics.
Author Summary
Synthesis of ATP, the currency of the cellular energy economy, is carried out by a rotary nano-motor, the ATP synthase complex, which uses proton flow to drive the rotation of protein subunits so as to produce ATP. There are two main components in mitochondrial F-type ATP synthase complexes, each made up of a number of different proteins: F1 has the catalytic sites for ATP synthesis, and Fo forms channels for proton movement and provides a bearing and stator to contain the rotary action of the motor. The two parts of the complex have to interact with each other, and critical protein subunits of the enzyme are conserved from bacteria to higher eukaryotes. We were surprised that a group of unicellular organisms called alveolates (including ciliates, apicomplexa, and dinoflagellates) seemed to lack two critical proteins of the Fo component. We have isolated intact ATP synthase complexes from the ciliate Tetrahymena thermophila and examined their structure by electron microscopy and their protein composition by mass spectrometry. We found that the ATP synthase complex of this organism is quite different, both in its overall structure and in many of the associated protein subunits, from the ATP synthase in other organisms. At least 13 novel proteins are present within this complex that have no orthologs in any organism outside of the ciliates. Our results suggest significant divergence of a critical bioenergetic player within the alveolate group.
PMCID: PMC2903591  PMID: 20644710
22.  Bioinformatics Analysis Identify Novel OB Fold Protein Coding Genes in C. elegans 
PLoS ONE  2013;8(4):e62204.
The C. elegans genome has been extensively annotated by the WormBase consortium, which uses state-of-the-art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, the identification of novel genes in silico in this model organism is becoming more challenging, requiring new approaches. The oligonucleotide/oligosaccharide-binding (OB) fold is a highly divergent protein family, in which protein sequences, in spite of having the same fold, share very little sequence identity (5–25%). Therefore, evidence from sequence-based annotation may not be sufficient to identify all the members of this family. In C. elegans, the number of OB-fold proteins reported is remarkably low (n = 46) compared to other evolutionarily related eukaryotes, such as the yeast S. cerevisiae (n = 344) or the fruit fly D. melanogaster (n = 84). Gene loss during evolution or differences in the level of annotation for this protein family may explain these discrepancies.
Methodology/Principal Findings
This study examines the possibility that novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that uses the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods, followed by 3D-structure prediction as a filtering step to eliminate false-positive candidate sequences. We have predicted 18 coding genes containing the OB fold that, remarkably, have been only partially characterized in C. elegans.
This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. Similar strategies could be implemented for large-scale analysis by the WormBase consortium when new versions of the genome sequence of C. elegans, or of other evolutionarily related species, are released. This approach is of general interest to the scientific community since it can be used to annotate any genome.
PMCID: PMC3636199  PMID: 23638006
23.  Comparative Genomics of the Eukaryotes 
Science (New York, N.Y.)  2000;287(5461):2204-2215.
A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae—and the proteins they are predicted to encode—was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease.
PMCID: PMC2754258  PMID: 10731134
24.  Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation 
BMC Genomics  2008;9:609.
One mechanism to account for robustness against gene knockouts or knockdowns is through buffering by gene duplicates, but the extent and general correlates of this process in organisms is still a matter of debate. To reveal general trends of this process, we provide a comprehensive comparison of gene essentiality, duplication and buffering by duplicates across seven bacteria (Mycoplasma genitalium, Bacillus subtilis, Helicobacter pylori, Haemophilus influenzae, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Escherichia coli), and four eukaryotes (Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), Drosophila melanogaster (fly), Mus musculus (mouse)).
In nine of the eleven organisms, duplicates significantly increase chances of survival upon gene deletion (P-value ≤ 0.05), but only by up to 13%. Given that duplicates make up to 80% of eukaryotic genomes, the small contribution is surprising and points to dominant roles of other buffering processes, such as alternative metabolic pathways. The buffering capacity of duplicates appears to be independent of the degree of gene essentiality and tends to be higher for genes with high expression levels. For example, buffering capacity increases to 23% amongst highly expressed genes in E. coli. Sequence similarity and the number of duplicates per gene are weak predictors of the duplicate's buffering capacity. In a case study we show that buffering gene duplicates in yeast and worm are somewhat more similar in their functions than non-buffering duplicates and have increased transcriptional and translational activity.
In sum, the extent of gene essentiality and buffering by duplicates is not conserved across organisms and does not correlate with the organisms' apparent complexity. This heterogeneity goes beyond what would be expected from differences in experimental approaches alone. Buffering by duplicates contributes to robustness in several organisms, but to a small extent – and the relatively large amount of buffering by duplicates observed in yeast and worm may be largely specific to these organisms. Thus, the only common factor of buffering by duplicates between different organisms may be the by-product of duplicate retention due to demands of high dosage.
PMCID: PMC2627895  PMID: 19087332
25.  A longitudinal study of Caenorhabditis elegans larvae reveals a novel locomotion switch, regulated by Gαs signaling 
eLife  2013;2:e00782.
Despite their simplicity, longitudinal studies of invertebrate models are rare. We thus sought to characterize behavioral trends of Caenorhabditis elegans, from the mid fourth larval stage through the mid young adult stage. We found that, outside of lethargus, animals exhibited abrupt switching between two distinct behavioral states: active wakefulness and quiet wakefulness. The durations of epochs of active wakefulness exhibited non-Poisson statistics. Increased Gαs signaling stabilized the active wakefulness state before, during and after lethargus. In contrast, decreased Gαs signaling, decreased neuropeptide release, or decreased CREB activity destabilized active wakefulness outside of, but not during, lethargus. Taken together, our findings support a model in which protein kinase A (PKA) stabilizes active wakefulness, at least in part through two of its downstream targets: neuropeptide release and CREB. However, during lethargus, when active wakefulness is strongly suppressed, the native role of PKA signaling in modulating locomotion and quiescence may be minor.
eLife digest
The roundworm C. elegans is a key model organism in neuroscience. It has a simple nervous system, made up of just 302 neurons, and was the first multicellular organism to have its genome fully sequenced. The lifecycle of C. elegans begins with an embryonic stage, followed by four larval stages and then adulthood, and worms can progress through this cycle in only three days. However, relatively little is known about how the behaviour of the worms varies across these distinct developmental phases.
The body wall of C. elegans contains pairs of muscles that extend along its length, and when waves of muscle contraction travel along its body, the worm moves in a sinusoidal pattern. A signalling cascade involving a molecule called protein kinase A (PKA) is thought to help control these movements, and upregulation of this cascade has been shown to increase locomotion.
Now, Nagy et al. have analysed the movement of C. elegans during these different stages of development. This involved developing an image processing tool that can analyse the position and posture of a worm's body in each of three million (or more) images per day. Using this tool, which is called PyCelegans, Nagy et al. identified two behavioural macro-states in one of the larval forms of C. elegans: these states, which can persist for hours, are referred to as active wakefulness and quiet wakefulness. During periods of active wakefulness, the worms spent most (but not all) of their time moving forwards; during quiet wakefulness, they remained largely still.
The worms switched abruptly between these two states, and the transition seemed to be regulated by PKA signaling. By using PyCelegans to compare locomotion in worms with mutations in genes encoding various components of this pathway, Nagy et al. showed that mutants with increased PKA activity spent more time in a state of active wakefulness, while the opposite was true for worms with mutations that reduced PKA activity.
In addition to providing new insights into the control of locomotion in C. elegans, this study has provided a new open-source PyCelegans suite of tools, which are available to be extended and adapted by other researchers for new uses.
PMCID: PMC3699835  PMID: 23840929
quiet wakefulness; modulation; longitudinal study; PKA; CREB; unc-31/CAPS; C. elegans