The multinational SYSCILIA consortium aims to gain a mechanistic understanding of the cilium. We utilize multiple parallel high-throughput (HTP) initiatives to develop predictive models of relationships between complex genotypes and variable phenotypes of ciliopathies. The models generated are only as good as the wet laboratory data fed into them. It is therefore essential to orchestrate a well-annotated and high-confidence dataset to be able to assess the quality of any HTP dataset. Here, we present the inaugural SYSCILIA gold standard of known ciliary components as a public resource.
Quantifying patterns of adaptive divergence between taxa is a major goal in the comparative and evolutionary study of prokaryote genomes. When applied appropriately, the McDonald-Kreitman (MK) test is a powerful test of selection based on the relative frequency of non-synonymous and synonymous substitutions between species compared to non-synonymous and synonymous polymorphisms within species. The webserver ODoSE (Ortholog Direction of Selection Engine) allows the calculation of a novel extension of the MK test, the Direction of Selection (DoS) statistic, as well as the calculation of a weighted-average Neutrality Index (NI) statistic for the entire core genome, allowing for systematic analysis of the evolutionary forces shaping core genome divergence in prokaryotes. ODoSE is hosted in a Galaxy environment, which makes it easy to use and amenable to customization and is freely available at www.odose.nl.
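As an illustration of the statistics named above, the following minimal sketch computes DoS and NI from the four MK-test counts for a gene. The function names and the tuple layout are hypothetical and are not taken from ODoSE. The weighted average shown is a Tarone-Greenland-style estimator (often written NI_TG); that ODoSE uses exactly this weighting is an assumption.

```python
def direction_of_selection(dn, ds, pn, ps):
    """DoS = Dn/(Dn+Ds) - Pn/(Pn+Ps); positive values suggest adaptive divergence."""
    if dn + ds == 0 or pn + ps == 0:
        return None  # undefined without substitutions or without polymorphisms
    return dn / (dn + ds) - pn / (pn + ps)


def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds), written as one fraction to avoid nested division."""
    if ps * dn == 0:
        return None  # undefined when Ps or Dn is zero
    return (pn * ds) / (ps * dn)


def weighted_neutrality_index(genes):
    """Tarone-Greenland-style weighted NI over (dn, ds, pn, ps) tuples (assumed form)."""
    numerator = denominator = 0.0
    for dn, ds, pn, ps in genes:
        if ps + ds == 0:
            continue  # gene carries no synonymous information
        numerator += ds * pn / (ps + ds)
        denominator += ps * dn / (ps + ds)
    return numerator / denominator if denominator else None


# Hypothetical counts for one gene: Dn=10, Ds=20, Pn=5, Ps=20
example = (10, 20, 5, 20)
dos = direction_of_selection(*example)
ni = neutrality_index(*example)
ni_core = weighted_neutrality_index([example])
```

Unlike NI, DoS remains defined when Ps or Dn is zero, which is common for short or highly conserved genes.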
Current epigenetic research makes frequent use of whole-genome ChIP profiling for determining the in vivo binding of proteins, e.g. transcription factors and histones, to DNA. Two important and recurrent questions for these large scale analyses are: 1) What is the genomic distribution of a set of binding sites? and 2) Does this genomic distribution differ significantly from another set of sites?
We exemplify the functionality of the PinkThing by analysing a ChIP profiling dataset of cohesin binding sites. We show that the subset of cohesin sites without CTCF binding has a characteristic genomic distribution different from that of the set of all cohesin sites.
The PinkThing is a web application for fast and easy analysis of the context of genomic loci, such as peaks from ChIP profiling experiments. The output of the PinkThing analysis includes: categorisation of position relative to genes (intronic, exonic, 5’ near, 3’ near, 5’ far, 3’ far and distant), distance to the closest annotated 3’ and 5’ end of genes, direction of transcription of the nearest gene, and the option to include other genomic elements like ESTs and CpG islands. The PinkThing enables easy statistical comparison between experiments, i.e. experimental versus background sets, reporting over- and underrepresentation as well as p-values for all comparisons. Access to and use of the PinkThing is free and open (without registration) to all users via the website: http://pinkthing.cmbi.ru.nl
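The positional categories listed above can be sketched as a small classifier. This is not the PinkThing's implementation: the distance cut-offs `NEAR_BP` and `FAR_BP`, the gene dictionary layout, and the function name are all hypothetical, chosen only to show how strand orientation decides between the 5’ and 3’ side.

```python
NEAR_BP = 5_000    # hypothetical cut-off for "near"
FAR_BP = 50_000    # hypothetical cut-off for "far"; beyond this is "distant"


def categorise(pos, gene):
    """Classify a genomic position relative to one gene on the same chromosome."""
    start, end, strand = gene["start"], gene["end"], gene["strand"]
    if start <= pos <= end:
        in_exon = any(a <= pos <= b for a, b in gene["exons"])
        return "exonic" if in_exon else "intronic"
    before_start = pos < start
    # The 5' side lies before the start on the + strand and after the end on the - strand.
    five_prime = before_start if strand == "+" else not before_start
    distance = start - pos if before_start else pos - end
    side = "5'" if five_prime else "3'"
    if distance <= NEAR_BP:
        return f"{side} near"
    if distance <= FAR_BP:
        return f"{side} far"
    return "distant"


plus_gene = {"start": 1000, "end": 2000, "strand": "+",
             "exons": [(1000, 1200), (1800, 2000)]}
minus_gene = dict(plus_gene, strand="-")
```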
The large size of metabolic networks entails an overwhelming multiplicity in the possible steady-state flux distributions that are compatible with stoichiometric constraints. This space of possibilities is largest in the frequent situation where the nutrients available to the cells are unknown. These two factors, network size and lack of knowledge of nutrient availability, challenge the identification of the actual metabolic state of living cells among the myriad possibilities. Here we address this challenge by developing a method that integrates gene-expression measurements with genome-scale models of metabolism as a means of inferring metabolic states. Our method explores the space of alternative flux distributions that maximize the agreement between gene expression and metabolic fluxes, and thereby identifies reactions that are likely to be active in the culture from which the gene-expression measurements were taken. These active reactions are used to build environment-specific metabolic models and to predict actual metabolic states. We applied our method to model the metabolic states of Saccharomyces cerevisiae growing in rich media supplemented with either glucose or ethanol as the main energy source. The resulting models comprise about 50% of the reactions in the original model, and predict environment-specific essential genes with high sensitivity. By minimizing the sum of fluxes while forcing our predicted active reactions to carry flux, we predicted metabolic states of these yeast cultures that are largely in agreement with what is known about yeast physiology. Most notably, our method predicts the Crabtree effect in yeast cells growing in excess glucose, a long-known phenomenon that could not have been predicted by traditional constraint-based modeling approaches.
Our method is of immediate practical relevance for medical and industrial applications, such as the identification of novel drug targets, and the development of biotechnological processes that use complex, largely uncharacterized media, such as biofuel production.
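The final step described above, minimizing the sum of fluxes while forcing the predicted active reactions to carry flux, can be written as a linear program. The formulation below is a sketch under stated assumptions rather than the published one: reversible reactions are assumed to be split into two irreversible ones so that all fluxes are non-negative, and the symbols S (stoichiometric matrix), u (upper bounds), A (set of predicted active reactions) and ε (minimum flux threshold) are introduced here for illustration.

```latex
\begin{aligned}
\min_{v}\quad & \sum_{i} v_i \\
\text{subject to}\quad & S\,v = 0, \\
& 0 \le v_i \le u_i \quad \text{for all reactions } i, \\
& v_j \ge \varepsilon \quad \text{for all } j \in A .
\end{aligned}
```

The constraint S v = 0 encodes the steady state, and the lower bound ε on the reactions in A is what ties the solution to the gene-expression data.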
Metabolic fluxes are steady-state rates of metabolite interconversion within living cells. They determine the rates of growth and product formation, and are of biotechnological and medical importance. An important and pressing question is how to identify the actual distribution of fluxes in living cells among the manifold possibilities that complex metabolic networks allow. One way to address this question is to constrain the space of possibilities using gene-expression measurements. Here we present a method that uses gene-expression measurements to infer the metabolic state of cells growing in uncharacterized environments. We applied this method to model the metabolism of Saccharomyces cerevisiae grown with glucose or ethanol as the main energy source. Our modeling enables the prediction of genes that are essential for growth in either environment. We also show that our method predicts aspects of the energy metabolism of these cultures that are largely in agreement with what is known about yeast physiology. Our method is of direct practical importance in the fields of biotechnology and medicine, for example in in vivo drug target identification, where nutrient conditions are largely unknown.
Azole compounds are the primary therapy for patients with diseases caused by Aspergillus fumigatus. However, prolonged treatment may cause resistance to develop, which is associated with treatment failure. The azole target cyp51A is a hotspot for mutations that confer phenotypic resistance, but in an increasing number of resistant isolates the underlying mechanism remains unknown. Here, we report the discovery of a novel resistance mechanism, caused by a mutation in the CCAAT-binding transcription factor complex subunit HapE. From one patient, four A. fumigatus isolates were serially collected. The last two isolates developed an azole resistant phenotype during prolonged azole therapy. Because the resistant isolates contained a wild type cyp51A gene and the isolates were isogenic, the complete genomes of the last susceptible isolate and the first resistant isolate (taken 17 weeks apart) were sequenced using Illumina technology to identify the resistance conferring mutation. By comparing the genome sequences to each other as well as to two A. fumigatus reference genomes, several potential non-synonymous mutations in protein-coding regions were identified, six of which could be confirmed by PCR and Sanger sequencing. Subsequent sexual crossing experiments showed that resistant progeny always contained a P88L substitution in HapE, while the presence of the other five mutations did not correlate with resistance in the progeny. Cloning the mutated hapE gene into the azole susceptible akuBKU80 strain showed that the HapE P88L mutation by itself could confer the resistant phenotype. This is the first time that whole genome sequencing and sexual crossing strategies have been used to find the genetic basis of a trait of interest in A. fumigatus. The discovery may help understand alternate pathways for azole resistance in A. fumigatus with implications for the molecular diagnosis of resistance and drug discovery.
The quaternary structure of eukaryotic NADH:ubiquinone oxidoreductase (complex I), the largest complex of the oxidative phosphorylation, is still mostly unresolved. Furthermore, it is unknown where transiently bound assembly factors interact with complex I. We therefore asked whether the evolution of complex I contains information about its 3D topology and the binding positions of its assembly factors. We approached these questions by correlating the evolutionary rates of eukaryotic complex I subunits using the mirror-tree method and mapping the results into a 3D representation by multidimensional scaling.
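The core of the mirror-tree method is a correlation between the pairwise evolutionary distance matrices of two protein families computed over the same set of species. The sketch below shows that step only, using Pearson correlation on the upper triangles of two matrices; the function names and toy matrices are hypothetical, and the multidimensional scaling step is not shown.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strictly upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlate two families' species-by-species distance matrices (same species order)."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy matrices for three species: family B evolves twice as fast as A,
# family C shows the opposite distance ordering.
dist_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dist_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
dist_c = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
```

A matrix of such scores between all subunit pairs is the kind of input that a multidimensional scaling step would then embed in three dimensions.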
More than 60% of the evolutionary correlation among the seven conserved subunits of the complex I matrix arm can be explained by the physical distance between the subunits. The three-dimensional evolutionary model of the eukaryotic conserved matrix arm has a striking similarity to the matrix arm quaternary structure in the bacterium Thermus thermophilus (rmsd = 19 Å) and supports the previous finding that in eukaryotes the N-module is turned relative to the Q-module when compared to bacteria. By contrast, the evolutionary rates contained little information about the structure of the membrane arm. A large evolutionary model of 45 subunits and assembly factors allows the prediction of subunit positions and interactions (rmsd = 52.6 Å). The model supports an interaction of NDUFAF3, C8orf38 and C2orf56 during the assembly of the proximal matrix arm and the membrane arm. The model further suggests a tight relationship between the assembly factor NUBPL and NDUFA2, both of which have been linked to iron-sulfur cluster assembly, as well as between NDUFA12 and its paralog, the assembly factor NDUFAF2.
The physical distance between subunits of complex I is a major correlate of the rate of protein evolution in the complex I matrix arm and is sufficient to infer parts of the complex’s structure with high accuracy. The resulting evolutionary model predicts the positions of a number of subunits and assembly factors.
Eukaryotic complex I; Quaternary topology; Assembly; Mirror-tree method; Co-evolution
Translation termination is accomplished by proteins of the Class I release factor family (RF) that recognize stop codons and catalyze the ribosomal release of the newly synthesized peptide. Bacteria have two canonical RFs: RF1 recognizes UAA and UAG, RF2 recognizes UAA and UGA. Although these two release factor proteins are sufficient for de facto translation termination, the eukaryotic organellar RF protein family, which has evolved from bacterial release factors, has expanded considerably, comprising multiple subfamilies, most of which have not been functionally characterized or formally classified. Here, we integrate multiple sources of information to analyze the remarkable differentiation of the RF family among organelles. We document the origin, phylogenetic distribution and sequence structure features of the mitochondrial and plastidial release factors: mtRF1a, mtRF1, mtRF2a, mtRF2b, mtRF2c, ICT1, C12orf65, pRF1, and pRF2, and review relevant published experimental data. The canonical release factors (mtRF1a, mtRF2a, pRF1, and pRF2) and ICT1 are derived from bacterial ancestors, whereas the others have resulted from gene duplications of another release factor. These new RF family members have all lost one or more specific motifs relevant for bona fide release factor function but are mostly targeted to the same organelle as their ancestor. We also characterize the subset of canonical release factor proteins that bear nonclassical PxT/SPF tripeptide motifs and provide a molecular-model-based rationale for their retained ability to recognize stop codons. Finally, we analyze the coevolution of canonical RFs with the organellar genetic code. Although the RF presence in an organelle and its stop codon usage tend to coevolve, we find three taxa that encode an RF2 without using UGA stop codons, and one reverse scenario, where Mamiellales green algae use UGA stop codons in their mitochondria without having a mitochondrial type RF2.
For the latter, we put forward a “stop-codon reinvention” hypothesis that involves the retargeting of the plastid release factor to the mitochondrion.
release factor; translation termination; mitochondrion; plastid; evolution; genetic code
mtRF1 is a vertebrate mitochondrial protein with an unknown function that arose from a duplication of the mitochondrial release factor mtRF1a. To elucidate the function of mtRF1, we determined the positions that are conserved among mtRF1 sequences but that are different in their mtRF1a paralogs. We subsequently modeled the 3D structure of mtRF1a and mtRF1 bound to the ribosome, highlighting the structural implications of these differences to derive a hypothesis for the function of mtRF1.
Our model predicts, in agreement with the experimental data, that the 3D structure of mtRF1a allows it to recognize the stop codons UAA and UAG in the A-site of the ribosome. In contrast, we show that mtRF1 likely can only bind the ribosome when the A-site is devoid of mRNA. Furthermore, while mtRF1a will adopt its catalytic conformation, in which it functions as a peptidyl-tRNA hydrolase in the ribosome, only upon binding of a stop codon in the A-site, mtRF1 appears specifically adapted to assume this extended, peptidyl-tRNA hydrolyzing conformation in the absence of mRNA in the A-site.
We predict that mtRF1 specifically recognizes ribosomes with an empty A-site and is able to function as a peptidyl-tRNA hydrolase in those situations. Stalled ribosomes with empty A-sites that still contain a tRNA bound to a peptide chain can result from the translation of truncated, stop-codon-less mRNAs. We hypothesize that mtRF1 recycles such stalled ribosomes, performing a function that is analogous to that of tmRNA in bacteria.
This article was reviewed by Dr. Eugene Koonin, Prof. Knud H. Nierhaus (nominated by Dr. Sarah Teichmann) and Dr. Shamil Sunyaev.
Class I release factor; mtRF1; mtRF1a; Mitochondrial genetic code; Translation termination; Stalled ribosome
Orthology is a central tenet of comparative genomics and ortholog identification is instrumental to protein function prediction. Major advances have been made to determine orthology relations among a set of homologous proteins. However, they depend on the comparison of individual sequences and do not take into account divergent orthologs.
We have developed an iterative orthology prediction method, Ortho-Profile, that uses reciprocal best hits at the level of sequence profiles to infer orthology. It increases ortholog detection by 20% compared to sequence-to-sequence comparisons. Ortho-Profile predicts 598 human orthologs of mitochondrial proteins from Saccharomyces cerevisiae and Schizosaccharomyces pombe with 94% accuracy. Of these, 181 were not known to localize to mitochondria in mammals. Among the predictions of the Ortho-Profile method are 11 human cytochrome c oxidase (COX) assembly proteins that are implicated in mitochondrial function and disease. Their co-expression patterns, experimentally verified subcellular localization, and co-purification with human COX-associated proteins support these predictions. For the human gene C12orf62, the ortholog of S. cerevisiae COX14, we specifically confirm its role in negative regulation of the translation of cytochrome c oxidase.
Divergent homologs can often only be detected by comparing sequence profiles and profile-based hidden Markov models. The Ortho-Profile method takes advantage of these techniques in the quest for orthologs.
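The reciprocal-best-hit criterion that Ortho-Profile applies at the profile level can be sketched independently of how the scores are obtained. In the example below, the score dictionaries stand in for profile-to-profile comparison scores; the identifiers, the scores and the function names are hypothetical, and the iterative profile-building part of the method is not shown.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target; scores is {(query, target): score}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores between yeast (y*) and human (h*) profiles.
yeast_vs_human = {("yA", "hX"): 90, ("yA", "hY"): 40,
                  ("yB", "hY"): 70, ("yC", "hX"): 60}
human_vs_yeast = {("hX", "yA"): 88, ("hX", "yC"): 50, ("hY", "yB"): 65}
orthologs = reciprocal_best_hits(yeast_vs_human, human_vs_yeast)
```

Here yC is excluded because its best hit hX prefers yA, which is the behaviour that makes the criterion conservative.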
In a comparative genomics study for mitochondrial ribosome-associated proteins, we identified C7orf30, the human homolog of the plant protein iojap. Gene order conservation among bacteria and the observation that iojap orthologs cannot be transferred between bacterial species predict this protein to be associated with the mitochondrial ribosome. Here, we show colocalization of C7orf30 with the large subunit of the mitochondrial ribosome using isokinetic sucrose gradient and 2D Blue Native polyacrylamide gel electrophoresis (BN-PAGE) analysis. We co-purified C7orf30 with proteins of the large subunit, and not with proteins of the small subunit, supporting an interaction that is specific to the large mitoribosomal complex. Consistent with this physical association, a mitochondrial translation assay reveals negative effects of C7orf30 siRNA knock-down on mitochondrial gene expression. Based on our data we propose that C7orf30 is involved in ribosomal large subunit function. Sequencing the gene in 35 patients with impaired mitochondrial translation did not reveal disease-causing mutations in C7orf30.
Chromatin Immuno Precipitation (ChIP) profiling detects in vivo protein-DNA binding, and has revealed a large combinatorial complexity in the binding of chromatin associated proteins and their post-translational modifications. To fully explore the spatial and combinatorial patterns in ChIP-profiling data and detect potentially meaningful patterns, the areas of enrichment must be aligned and clustered, which is an algorithmically and computationally challenging task. We have developed CATCHprofiles, a novel tool for exhaustive pattern detection in ChIP profiling data. CATCHprofiles is built upon a computationally efficient implementation for the exhaustive alignment and hierarchical clustering of ChIP profiling data. The tool features a graphical interface for examination and browsing of the clustering results. CATCHprofiles requires no prior knowledge about functional sites, detects known binding patterns “ab initio”, and enables the detection of new patterns from ChIP data at a high resolution, exemplified by the detection of asymmetric histone and histone modification patterns around H2A.Z-enriched sites. CATCHprofiles' capability for exhaustive analysis combined with its ease-of-use makes it an invaluable tool for explorative research based on ChIP profiling data.
CATCHprofiles and the CATCH algorithm run on all platforms and are available for free through the CATCH website: http://catch.cmbi.ru.nl/.
User support is available by subscribing to the mailing list firstname.lastname@example.org.
MicroRNAs (miRNAs) play a fundamental role in the regulation of gene expression by translational repression or target mRNA degradation. Regulatory elements in miRNA promoters are less well studied, but may reveal a link between their expression and a specific cell type.
To explore this link in myeloid cells, miRNA expression profiles were generated from monocytes and dendritic cells (DCs). Differences in miRNA expression among monocytes, DCs and their stimulated progeny were observed. Furthermore, putative promoter regions of miRNAs that are significantly up-regulated in DCs were screened for Transcription Factor Binding Sites (TFBSs) based on TFBS motif matching score, the degree to which those TFBSs are over-represented in the promoters of the up-regulated miRNAs, and the extent of conservation of the TFBSs in mammals.
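The abstract does not state which statistic was used to score over-representation of TFBSs. One common choice, shown here purely as an assumption, is the upper tail of the hypergeometric distribution: given a background of promoters of which some contain a motif, how likely is it to see at least the observed number of motif-containing promoters in the up-regulated set? The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement.

    population: all promoters considered (background)
    successes:  promoters in the background that contain the motif
    draws:      promoters of the up-regulated miRNAs
    observed:   up-regulated promoters that contain the motif
    """
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


# Toy numbers: 10 promoters, 5 with the motif, 5 up-regulated, all 5 with the motif.
p_value = hypergeom_upper_tail(10, 5, 5, 5)
```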
Analysis of evolutionarily conserved TFBSs in DC promoters revealed preferential clustering of sites within 500 bp upstream of the precursor miRNAs and that many mRNAs of cognate TFs of the conserved TFBSs were indeed expressed in the DCs. Taken together, our data provide evidence that selected miRNAs expressed in DCs have evolutionarily conserved TFBSs relevant to DC biology in their promoters.
Motivation: The intensification of DNA sequencing will increasingly unveil uncharacterized species with potential alternative genetic codes. A total of 0.65% of the DNA sequences currently in GenBank encode their proteins with a variant genetic code, and these exceptions occur in many unrelated taxa.
Results: We introduce FACIL (Fast and Accurate genetic Code Inference and Logo), a fast and reliable tool to evaluate nucleic acid sequences for their genetic code that detects alternative codes even in species distantly related to known organisms. To illustrate this, we apply FACIL to a set of mitochondrial genomic contigs of Globobulimina pseudospinescens. This foraminifer does not have any sequenced close relative in the databases, yet we infer its alternative genetic code with high confidence values. Results are intuitively visualized in a Genetic Code Logo.
Availability and implementation: FACIL is available as a web-based service at http://www.cmbi.ru.nl/FACIL/ and as a stand-alone program.
Supplementary information: Supplementary data are available at Bioinformatics online.
It is generally accepted that hydrogenosomes (hydrogen-producing organelles) evolved from a mitochondrial ancestor. However, until recently, only indirect evidence for this hypothesis was available. Here, we present the almost complete genome of the hydrogen-producing mitochondrion of the anaerobic ciliate Nyctotherus ovalis and show that, except for the notable absence of genes encoding electron transport chain components of Complexes III, IV, and V, it has a gene content similar to the mitochondrial genomes of aerobic ciliates. Analysis of the genome of the hydrogen-producing mitochondrion, in combination with that of more than 9,000 genomic DNA and cDNA sequences, allows a preliminary reconstruction of the organellar metabolism. The sequence data indicate that N. ovalis possesses hydrogen-producing mitochondria that have a truncated, two-step (Complex I and II) electron transport chain that uses fumarate as electron acceptor. In addition, components of an extensive protein network for the metabolism of amino acids, defense against oxidative stress, mitochondrial protein synthesis, mitochondrial protein import and processing, and transport of metabolites across the mitochondrial membrane were identified. Genes for MPV17 and ACN9, two hypothetical proteins linked to mitochondrial disease in humans, were also found. The inferred metabolism is remarkably similar to the organellar metabolism of the phylogenetically distant anaerobic Stramenopile Blastocystis. Notably, the Blastocystis organelle and that of the related flagellate Proteromonas lacertae also lack genes encoding components of Complexes III, IV, and V. Thus, our data show that the hydrogenosomes of N. ovalis are highly specialized hydrogen-producing mitochondria.
hydrogenosome; mitochondrion; horizontal gene transfer; evolution; adaptation; Nyctotherus
Here we show that c17orf42, hereafter TEFM (transcription elongation factor of mitochondria), makes a critical contribution to mitochondrial transcription. Inactivation of TEFM in cells by RNA interference results in respiratory incompetence owing to decreased levels of H- and L-strand promoter-distal mitochondrial transcripts. Affinity purification of TEFM from human mitochondria yielded a complex comprising mitochondrial transcripts, mitochondrial RNA polymerase (POLRMT), pentatricopeptide repeat domain 3 protein (PTCD3), and a putative DEAD-box RNA helicase, DHX30. After RNase treatment only POLRMT remained associated with TEFM, and in human cultured cells TEFM formed foci coincident with newly synthesized mitochondrial RNA. Based on deletion mutants, TEFM interacts with the catalytic region of POLRMT, and in vitro TEFM enhanced POLRMT processivity on ss- and dsDNA templates. TEFM contains two HhH motifs and a Ribonuclease H fold, similar to the nuclear transcription elongation regulator Spt6. These findings lead us to propose that TEFM is a mitochondrial transcription elongation factor.
Motivation: Protein–protein interaction (PPI) networks are a valuable resource for the interpretation of genomics data. However, such networks have interaction enrichment biases for proteins that are often studied. These biases skew quantitative results from comparing PPI networks with genomics data. Here, we introduce an approach named physical interaction enrichment (PIE) to eliminate these biases.
Methodology: PIE employs a normalization that ensures equal node degree (edge) distribution of a test set and of the random networks it is compared with. It quantifies whether a set of proteins has more interactions among its members than proteins in random networks, and can therefore be regarded as physically cohesive.
Results: Among other datasets, we applied PIE to genetic morbid disease (GMD) genes and to genes whose expression is induced upon infection with human-metapneumovirus (HMPV). Both sets contain proteins that are often studied and that have relatively many interactions in the PPI network. Although interactions between proteins of both sets are found to be overrepresented in PPI networks, the GMD proteins are not more likely to interact with each other than random proteins when this overrepresentation is taken into account. In contrast, the HMPV-induced genes, representing a biologically more coherent set, encode proteins that do tend to interact with each other and can be used to predict new HMPV-induced genes. By handling biases in PPI networks, PIE can be a valuable tool to quantify the degree to which a set of genes is involved in the same biological process.
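The idea of comparing a test set only against random sets with the same degree distribution can be sketched as follows. This is not the published PIE normalization: it substitutes simple degree-matched node sampling, and every function name, the toy network and the returned tuple are assumptions made for illustration.

```python
import random
from collections import defaultdict


def node_degrees(edges):
    """Count interactions per protein in an undirected edge list."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def internal_edges(edges, members):
    """Number of interactions with both partners inside the set."""
    members = set(members)
    return sum(1 for a, b in edges if a in members and b in members)


def degree_matched_sample(degree, test_set, rng):
    """Draw one random set with exactly the same degrees as the test set."""
    by_degree = defaultdict(list)
    for node in sorted(degree):
        by_degree[degree[node]].append(node)
    sample = set()
    for node in sorted(test_set):
        candidates = [n for n in by_degree[degree[node]] if n not in sample]
        sample.add(rng.choice(candidates))
    return sample


def enrichment(edges, test_set, n_random=1000, seed=0):
    """Return (observed, mean of random sets, ratio, empirical p-value)."""
    rng = random.Random(seed)
    degree = node_degrees(edges)
    observed = internal_edges(edges, test_set)
    null = [internal_edges(edges, degree_matched_sample(degree, test_set, rng))
            for _ in range(n_random)]
    mean_null = sum(null) / n_random
    p_value = (1 + sum(1 for x in null if x >= observed)) / (n_random + 1)
    ratio = observed / mean_null if mean_null > 0 else float("inf")
    return observed, mean_null, ratio, p_value


# Toy network: a triangle (a, b, c) plus a five-node ring; every node has degree 2.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("f", "g"), ("g", "h"), ("h", "d")]
observed, mean_null, ratio, p_value = enrichment(toy_edges, {"a", "b", "c"})
```

Because every random set has the same degrees as the test set, a ratio above 1 cannot be explained by the test proteins simply being well studied and highly connected.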
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Heterozygous mutations in p63 are associated with split hand/foot malformations (SHFM), orofacial clefting, and ectodermal abnormalities. Elucidation of the p63 gene network that includes target genes and regulatory elements may reveal new genes for other malformation disorders. We performed genome-wide DNA–binding profiling by chromatin immunoprecipitation (ChIP), followed by deep sequencing (ChIP–seq) in primary human keratinocytes, and identified potential target genes and regulatory elements controlled by p63. We show that p63 binds to an enhancer element in the SHFM1 locus on chromosome 7q and that this element controls expression of DLX6 and possibly DLX5, both of which are important for limb development. A unique micro-deletion including this enhancer element, but not the DLX5/DLX6 genes, was identified in a patient with SHFM. Our study strongly indicates disruption of a non-coding cis-regulatory element located more than 250 kb from the DLX5/DLX6 genes as a novel disease mechanism in SHFM1. These data provide a proof-of-concept that the catalogue of p63 binding sites identified in this study may be of relevance to the studies of SHFM and other congenital malformations that resemble the p63-associated phenotypes.
Mammalian embryonic development requires precise control of gene expression in the right place at the right time. One level of control of gene expression is through cis-regulatory elements controlled by transcription factors. Deregulation of gene expression by mutations in such cis-regulatory elements has been described in developmental disorders. Heterozygous mutations in the transcription factor p63 are found in patients with limb malformations, cleft lip/palate, and defects in skin and other epidermal appendages, through disruption of normal ectodermal development during embryogenesis. We reasoned that the identification of target genes and cis-regulatory elements controlled by p63 would provide candidate genes for defects arising from abnormally regulated ectodermal development. To test our hypothesis, we carried out a genome-wide binding site analysis and identified a large number of target genes and regulatory elements regulated by p63. We further showed that one of these regulatory elements controls expression of DLX6 and possibly DLX5 in the apical ectodermal ridge in the developing limbs. Loss of this element through a micro-deletion was associated with split hand foot malformation (SHFM1). The list of p63 binding sites provides a resource for the identification of mutations that cause ectodermal dysplasias and malformations in humans.
Bioinformatic analysis classifies the human protein encoded by immature colon carcinoma transcript-1 (ICT1) as one of a family of four putative mitochondrial translation release factors. However, this has not been supported by any experimental evidence. As only a single member of this family, mtRF1a, is required to terminate the synthesis of all 13 mitochondrially encoded polypeptides, the true physiological function of ICT1 was unclear. Here, we report that ICT1 is an essential mitochondrial protein, but unlike the other family members that are matrix-soluble, ICT1 has become an integral component of the human mitoribosome. Release-factor assays show that although ICT1 has retained its ribosome-dependent PTH activity, this is codon-independent, consistent with its loss of both domains that promote codon recognition in class-I release factors. Mutation of the GGQ domain common to ribosome-dependent PTHs causes a loss of activity in vitro and, crucially, a loss of cell viability in vivo. We suggest that ICT1 may be essential for hydrolysis of prematurely terminated peptidyl-tRNA moieties in stalled mitoribosomes.
mitoribosomes; peptidyl-tRNA hydrolase; translation release factor
The alpha-kinase family represents a class of atypical protein kinases that display little sequence similarity to conventional protein kinases. Early studies on myosin heavy chain kinases in Dictyostelium discoideum revealed their unusual propensity to phosphorylate serine and threonine residues in the context of an alpha-helix. Although recent studies show that some members of this family can also phosphorylate residues in non-helical regions, the name alpha-kinase has remained. During evolution, the alpha-kinase domains combined with many different functional subdomains such as von Willebrand factor-like motifs (vWKa) and even cation channels (TRPM6 and TRPM7). As a result, these kinases are implicated in a large variety of cellular processes such as protein translation, Mg2+ homeostasis, intracellular transport, cell migration, adhesion, and proliferation. Here, we review the current state of knowledge on different members of this kinase family and discuss the potential use of alpha-kinases as drug targets in diseases such as cancer.
Signal transduction; Protein phosphorylation; Atypical kinases; Alpha-kinase family
Hydrogenosomes are organelles that produce molecular hydrogen and ATP. The broad phylogenetic distribution of their hosts suggests that the hydrogenosomes of these organisms evolved several times independently from the mitochondria of aerobic progenitors. Morphology and 18S rRNA phylogeny suggest that the microaerophilic amoeboflagellate Psalteriomonas lanterna, which possesses hydrogenosomes and elusive "modified mitochondria", belongs to the Heterolobosea, a taxon that consists predominantly of aerobic, mitochondriate organisms. This taxon is rather unrelated to taxa with hitherto studied hydrogenosomes.
Electron microscopy of P. lanterna flagellates reveals a large globule in the centre of the cell that is built up from stacks of some 20 individual hydrogenosomes. The individual hydrogenosomes are surrounded by a double membrane that encloses a homogeneous, dark staining matrix lacking cristae. The "modified mitochondria" are found in the cytoplasm of the cell and are surrounded by 1-2 cisterns of rough endoplasmic reticulum, just as the mitochondria of certain related aerobic Heterolobosea. The ultrastructure of the "modified mitochondria" and hydrogenosomes is very similar, and they have the same size distribution as the hydrogenosomes that form the central stack.
The phylogenetic analysis of selected EST sequences (Hsp60, Propionyl-CoA carboxylase) supports the phylogenetic position of P. lanterna close to aerobic Heterolobosea (Naegleria gruberi). Moreover, this analysis also confirms the identity of several mitochondrial or hydrogenosomal key genes encoding proteins such as an Hsp60, a pyruvate:ferredoxin oxidoreductase, a putative ADP/ATP carrier, a mitochondrial complex I subunit (51 kDa), and a [FeFe] hydrogenase.
Comparison of the ultrastructure of the "modified mitochondria" and hydrogenosomes strongly suggests that both organelles are just two morphs of the same organelle. The EST studies suggest that the hydrogenosomes of P. lanterna are physiologically similar to the hydrogenosomes of Trichomonas vaginalis and Trimastix pyriformis. Phylogenetic analysis of the ESTs confirms the relationship of P. lanterna with its aerobic relative, the heterolobosean amoeboflagellate Naegleria gruberi, corroborating the evolution of hydrogenosomes from a common, mitochondriate ancestor.
The human mitochondrial proteome is shown to have expanded due to duplication of protein-encoding genes and re-localization of the duplicated proteins.
Mitochondria are highly complex, membrane-enclosed organelles that are essential to the eukaryotic cell. The experimental elucidation of organellar proteomes combined with the sequencing of complete genomes allows us to trace the evolution of the mitochondrial proteome.
We present a systematic analysis of the evolution of mitochondria via gene duplication in the human lineage. The most common duplications are intra-mitochondrial, in which the ancestral gene and the daughter genes encode mitochondrial proteins. These duplications significantly expanded carbohydrate metabolism, the protein import machinery and the calcium regulation of mitochondrial activity. The second most prevalent duplication, inter-compartmental, extended the catalytic as well as the RNA processing repertoire by the novel mitochondrial localization of the protein encoded by one of the daughter genes. Evaluation of the phylogenetic distribution of N-terminal targeting signals suggests a prompt gain of the novel localization after inter-compartmental duplication. Relocalized duplicates are more often expressed in a tissue-specific manner relative to intra-mitochondrial duplicates and mitochondrial proteins in general. In a number of cases, inter-compartmental duplications can be observed in parallel in yeast and human lineages leading to the convergent evolution of subcellular compartments.
One-to-one human-yeast orthologs are typically restricted to their ancestral subcellular localization. Gene duplication relaxes this constraint on the cellular location, allowing nascent proteins to be relocalized to other compartments. We estimate that the mitochondrial proteome has expanded by at least 50% since the common ancestor of human and yeast.
There are thousands of very diverse ciliate species, of which only a handful of mitochondrial genomes have been studied so far. These genomes are rather similar because the ciliates analysed (Tetrahymena spp. and Paramecium aurelia) are closely related. Here we study the mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus. These ciliates are only distantly related to Tetrahymena spp. and Paramecium aurelia, but more closely related to Nyctotherus ovalis, which possesses a hydrogenosomal (mitochondrial) genome.
The linear mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus were sequenced and compared with the mitochondrial genomes of several Tetrahymena species, Paramecium aurelia and the partially sequenced mitochondrial genome of the anaerobic ciliate Nyctotherus ovalis. This study reports new features such as long 5' gene extensions of several mitochondrial genes, extremely long cox1 and cox2 open reading frames and a large repeat in the middle of the linear mitochondrial genome. The repeat separates the open reading frames into two blocks, each having a single direction of transcription, from the repeat towards the ends of the chromosome. Although the Euplotes mitochondrial gene content is almost identical to that of Paramecium and Tetrahymena, the order of the genes is completely different. In contrast, the 33273 bp (excluding the repeat region) piece of the mitochondrial genome that has been sequenced in both Euplotes species exhibits no difference in gene order. Unexpectedly, many of the mitochondrial genes of E. minuta encoding ribosomal proteins possess N-terminal extensions that are similar to mitochondrial targeting signals.
The mitochondrial genomes of the hypotrichous ciliates Euplotes minuta and Euplotes crassus are rather different from the previously studied genomes. Many genes are extended in size compared to mitochondrial genes from other sources.
In species with large effective population sizes, highly expressed genes tend to be encoded by codons with highly abundant cognate tRNAs to maximize translation rate. However, there has been little evidence for a similar bias of synonymous codons in highly expressed human genes. Here, we ask instead whether there is evidence of selection for codons associated with low abundance tRNAs. Rather than averaging the codon usage of complete genes, we scan the genes for windows with deviating codon usage. We show that there is a significant overrepresentation of human genes that contain clusters of codons with low abundance cognate tRNAs. We name these regions, which on average have a 50% reduction in the amount of cognate tRNA available compared to the remainder of the gene, RTS (rare tRNA score) clusters. We observed a significant reduction in the substitution rate between the human RTS clusters and their orthologous chimp sequence, when compared to non–RTS cluster sequences. Overall, the genes with an RTS cluster have higher tissue specificity than the non–RTS cluster genes. Furthermore, these genes are functionally enriched for transcription regulation. As genes that regulate transcription in lower eukaryotes are known to be involved in translation on demand, this suggests that the mechanism of translation level expression regulation also exists within the human genome.
The degeneracy of the genetic code means that many amino acids are encoded by not one, but a range of codons. In bacteria and yeast, it is known that the choice of codons used can be beneficial (or detrimental) to the gene function. As humans have a relatively small effective population size, and the efficiency of selection to purge mutations of mild deleterious effect decreases as population size decreases, it has been assumed that the benefit/cost of codons is not large enough to have a measurable effect on codon choice. Here we show that codons with the lowest amount of tRNA are clustered in gene sequences more often than anticipated. The genes containing these clusters were found to have specific functions in gene expression. Comparisons to known bacterial and yeast processes suggest a translation level mechanism for the regulation of protein expression in human genes. Thus, our investigation highlights the potential for the presence of a novel regulatory mechanism in human genes.
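The window scan described above can be illustrated with a minimal Python sketch. It is not the authors' RTS scoring: the abundance table, the window length, the 0.5 ratio threshold and the function names are all hypothetical choices made for illustration. The sketch slides a fixed window of codons along a coding sequence and flags windows whose mean cognate tRNA abundance is at most half the mean of the remainder of the gene.

```python
from typing import Dict, List, Tuple

# Hypothetical relative tRNA abundances per codon (illustrative values only,
# not measured data). Codons missing from the table raise a KeyError.
TRNA_ABUNDANCE: Dict[str, float] = {
    "GCT": 1.0, "CTG": 1.0,
    "CGA": 0.1, "ATA": 0.2, "TCG": 0.1,
}


def split_codons(cds: str) -> List[str]:
    """Split a coding sequence into complete codons, dropping a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def low_trna_windows(
    cds: str,
    abundance: Dict[str, float],
    window: int = 3,
    ratio: float = 0.5,
) -> List[Tuple[int, float, float]]:
    """Return (start codon index, window mean, rest-of-gene mean) for every
    window whose mean abundance is at most `ratio` times the mean abundance
    of the codons outside that window."""
    scores = [abundance[codon] for codon in split_codons(cds)]
    outside_count = len(scores) - window
    if outside_count <= 0:
        return []  # gene too short to compare a window against the remainder
    total = sum(scores)
    hits = []
    for start in range(len(scores) - window + 1):
        inside = sum(scores[start:start + window])
        window_mean = inside / window
        rest_mean = (total - inside) / outside_count
        if window_mean <= ratio * rest_mean:
            hits.append((start, window_mean, rest_mean))
    return hits


# Toy gene: four abundant codons, three rare codons, four abundant codons.
toy_cds = "GCT" * 4 + "CGA" + "ATA" + "TCG" + "CTG" * 4
toy_hits = low_trna_windows(toy_cds, TRNA_ABUNDANCE)
```

In the toy gene the flagged windows are those overlapping the three rare codons, and the window with the lowest mean covers exactly those three codons. A real analysis would also need a null model to decide whether such clusters occur more often than expected, which this sketch does not attempt.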
Motivation: Most microbial species cannot be cultured in the laboratory. Metagenomic sequencing may still yield a complete genome if the sequenced community is enriched and the sequencing coverage is high. However, the complexity in a natural population may cause the enrichment culture to contain multiple related strains. This diversity can confound existing strict assembly programs and lead to a fragmented assembly, which is unnecessary if we have a related reference genome available that can function as a scaffold.
Results: Here, we map short metagenomic sequencing reads from a population of strains to a related reference genome, and compose a genome that captures the consensus of the population's sequences. We show that by iteration of the mapping and assembly procedure, the coverage increases while the similarity with the reference genome decreases. This indicates that the assembly becomes less dependent on the reference genome and approaches the consensus genome of the multi-strain population.
Supplementary Information: Supplementary data are available at Bioinformatics online.
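The iterative mapping and consensus procedure summarized in the Results above can be illustrated with a toy Python sketch. It is not the published pipeline: the ungapped Hamming-distance mapper, the majority-vote consensus, the mismatch limit and all function names are simplifying assumptions. Each round maps reads to the current scaffold, rebuilds the consensus, and uses that consensus as the scaffold for the next round.

```python
from collections import Counter
from typing import List, Optional, Tuple


def best_position(read: str, scaffold: str, max_mismatches: int) -> Optional[int]:
    """Return the ungapped placement with the fewest mismatches,
    or None if every placement exceeds max_mismatches."""
    best_pos, best_dist = None, max_mismatches + 1
    for pos in range(len(scaffold) - len(read) + 1):
        segment = scaffold[pos:pos + len(read)]
        dist = sum(a != b for a, b in zip(read, segment))
        if dist < best_dist:
            best_pos, best_dist = pos, dist
    return best_pos


def consensus_round(reads: List[str], scaffold: str, max_mismatches: int) -> Tuple[str, int]:
    """Map all reads once; return the majority-vote consensus and the number
    of mapped reads. Uncovered positions keep the scaffold base."""
    columns = [Counter() for _ in scaffold]
    mapped = 0
    for read in reads:
        pos = best_position(read, scaffold, max_mismatches)
        if pos is None:
            continue
        mapped += 1
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    consensus = "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(columns, scaffold)
    )
    return consensus, mapped


def iterate_consensus(
    reads: List[str], reference: str, max_mismatches: int = 1, max_rounds: int = 10
) -> Tuple[str, List[int]]:
    """Repeat mapping and consensus calling until the consensus stops changing.
    Returns the final consensus and the mapped-read count of each round."""
    current, history = reference, []
    for _ in range(max_rounds):
        updated, mapped = consensus_round(reads, current, max_mismatches)
        history.append(mapped)
        if updated == current:
            break
        current = updated
    return current, history


# Toy data: the reads come from a population that differs from the
# reference at two positions (both sequences are made up).
toy_reference = "AAACCCGGGTTT"
toy_reads = ["AAACTC", "ACTCGA", "GAGTTT", "CTCGAG"]
toy_consensus, toy_history = iterate_consensus(toy_reads, toy_reference)
```

In this toy case two reads carry two mismatches against the reference and fail to map in the first round. After the scaffold has been updated they map in the second round, so the mapped-read count rises while the consensus moves away from the reference, which mirrors the trend described in the Results.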
An investigation of metabolic networks in E. coli and S. cerevisiae reveals that asymmetric protein interactions affect gene expression, the relative effect of gene knockouts and genome evolution.
The relationships between proteins are often asymmetric: one protein (A) depends for its function on another protein (B), but the second protein does not depend on the first. In metabolic networks there are multiple pathways that converge into one central pathway. The enzymes in the converging pathways depend on the enzymes in the central pathway, but the enzymes in the latter do not depend on any specific enzyme in the converging pathways. Asymmetric relations are analogous to the “if->then” logical relation where A implies B, but B does not imply A (A->B).
We show that the majority of relationships between enzymes in metabolic flux models of metabolism in Escherichia coli and Saccharomyces cerevisiae are asymmetric. We show furthermore that these asymmetric relationships are reflected in the expression of the genes encoding those enzymes, the effect of gene knockouts and the evolution of genomes. From the asymmetric relative dependency, one would expect that the gene that is relatively independent (B) can occur without the other dependent gene (A), but not the reverse. Indeed, when only one gene of an A->B pair is expressed, is essential, or is present in a genome after an evolutionary gain or loss, it tends to be the independent gene (B). This bias is strongest for genes encoding proteins whose asymmetric relationship is evolutionarily conserved.
The asymmetric relations between proteins that arise from the system properties of metabolic networks affect gene expression, the relative effect of gene knockouts and genome evolution in a predictable manner.
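The A->B relation can be made concrete with a small Python sketch. It is not the flux-model analysis used in the study: it assumes, hypothetically, that each enzyme is described by the set of states (for example growth conditions or genomes) in which it is active or present, and it calls a pair asymmetric when one set is a proper subset of the other. The enzyme names, state labels and function names are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def classify_pair(a_states: Set[str], b_states: Set[str]) -> str:
    """Classify the relation between two activity profiles.

    'a_implies_b': A is only active where B is active, but not the reverse.
    'b_implies_a': the mirror case.
    'symmetric':   both are active in exactly the same states.
    'none':        neither profile contains the other.
    """
    if a_states == b_states:
        return "symmetric"
    if a_states < b_states:  # proper subset
        return "a_implies_b"
    if b_states < a_states:
        return "b_implies_a"
    return "none"


def asymmetric_pairs(profiles: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (dependent, independent) pairs, i.e. A->B written as (A, B).

    Assumption: an enzyme that is never active has an empty profile and would
    trivially 'imply' every other enzyme, so such enzymes should be filtered
    out before calling this function.
    """
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        relation = classify_pair(profiles[a], profiles[b])
        if relation == "a_implies_b":
            pairs.append((a, b))
        elif relation == "b_implies_a":
            pairs.append((b, a))
    return pairs


# Hypothetical example: two converging branch enzymes feed one central enzyme.
toy_profiles = {
    "branch1": {"condition1"},
    "branch2": {"condition2"},
    "central": {"condition1", "condition2", "condition3"},
}
toy_pairs = asymmetric_pairs(toy_profiles)
```

Here both branch enzymes imply the central enzyme, while the two branch enzymes are unrelated to each other, matching the converging-pathway picture: the central enzyme can occur without either branch enzyme, but not the reverse.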