The rate of mutations in eukaryotes depends on a plethora of factors and cannot be derived directly from the fidelity of DNA polymerases (Pols). Replication of chromosomes containing the anti-parallel strands of duplex DNA occurs through the copying of leading and lagging strand templates by a trio of Pols α, δ and ε, with the assistance of Pol ζ and Y-family Pols at difficult DNA template structures or sites of DNA damage. The parameters of the synthesis at a given location are dictated by the quality and quantity of nucleotides in the pools, replication fork architecture, transcription status, regulation of Pol switches, and structure of chromatin. The products of these transactions are subject to surveillance and editing by DNA repair.
DNA polymerases; nucleotide pools; mutagenesis; Okazaki fragments
Aberrant activation of receptor tyrosine kinases (RTKs) is a common feature of many cancer cells. It was previously suggested that the mechanisms of kinase activation in cancer might be linked to transitions between active and inactive states. Here we estimate the effects of single and double cancer mutations on the stability of active and inactive states of the kinase domains from different RTKs. We show that singleton cancer mutations destabilize both active and inactive states; however, inactive states are destabilized more than active ones, leading to kinase activation. We show that there exists a relationship between the estimated oncogenic potential of a cancer mutation and kinase activation. Namely, more frequent mutations have a higher activating effect, which might allow us to predict the activating effect of mutations from the mutation spectra. Independent evolutionary analysis of mutation spectra complements this observation and finds the same frequency threshold defining mutation hot spots. We analyze double mutations and report positive epistasis and an additional advantage of doublets with respect to cancer cell fitness. The activation mechanisms of double mutations differ from those of single mutations, and the double mutation spectrum is found to be dissimilar to the mutation spectrum of singletons.
cancer mutation; receptor tyrosine kinase; protein structure; kinase activation; mutation spectra; double mutations
Genetic information should be accurately transmitted from cell to cell; conversely, the adaptation in evolution and disease is fueled by mutations. In the case of cancer development, multiple genetic changes happen in somatic diploid cells. Most classic studies of the molecular mechanisms of mutagenesis have been performed in haploids. We demonstrate that the parameters of the mutation process are different in diploid cell populations. The genomes of drug-resistant mutants induced in yeast diploids by base analog 6-hydroxylaminopurine (HAP) or AID/APOBEC cytosine deaminase PmCDA1 from lamprey carried a stunning load of thousands of unselected mutations. Haploid mutants contained almost an order of magnitude fewer mutations. To explain this, we propose that the distribution of induced mutation rates in the cell population is uneven. The mutants in diploids with coincidental mutations in the two copies of the reporter gene arise from a fraction of cells that are transiently hypersensitive to the mutagenic action of a given mutagen. The progeny of such cells were never recovered in haploids due to the lethality caused by the inactivation of single-copy essential genes in cells with too many induced mutations. In diploid cells, the progeny of hypersensitive cells survived, but their genomes were saturated by heterozygous mutations. The reason for the hypermutability of cells could be transient faults of the mutation prevention pathways, such as sanitization of nucleotide pools for HAP, an elevated expression of the PmCDA1 gene, or a temporary inability to degrade the deaminase. The hypothesis on spikes of mutability may explain the sudden acquisition of multiple mutational changes during evolution and carcinogenesis.
Evolution and carcinogenesis are driven by mutations. Cells maintain constant mutation rates and can afford only transient mutagenesis bursts for adaptation. The nature of the mutational avalanches is not very clear. We sequenced the whole genomes of mutants induced in haploid and diploid yeast by nucleobase analog HAP and by DNA editing cytosine deaminase. Mutants selected in diploids are saturated with passenger mutations. Far fewer mutations are found in haploid mutants. Treatment with a mutagen without selection results in intermediate mutagenesis. The observed transient hypermutability of diploids under mutagenic insult helps to explain the wellspring of mutations that arise during evolution and carcinogenesis.
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or ‘promiscuous’). These promiscuous domains are typically involved in protein–protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently have we begun to understand the diversity of protein domain architectures and the role that promiscuous domains play in the evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain poorly understood. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
mobile domain; promiscuous domain; domain network; domain architecture; domain evolution
Kindlin-3 is a novel integrin activator in hematopoietic cells and its deficiency leads to immune problems and severe bleeding, known as LAD-III. Our current understanding of Kindlin-3 function primarily relies on analysis of animal models or cell lines.
To understand the functions of Kindlin-3 in human primary blood cells.
Here we analyze primary and immortalized hematopoietic cells obtained from a new LAD-III patient with immune problems, bleeding, a history of anemia and abnormally shaped red blood cells.
The patient’s WBCs and platelets showed defects in agonist-induced integrin activation and botrocetin-induced platelet agglutination. Primary leukocytes from this patient exhibited abnormal activation of beta1 integrin. Integrin activation defects were responsible for the observed deficiency of the botrocetin-induced platelet response. Analysis of the patient’s genomic DNA revealed a novel mutation in the kindlin-3 gene. The mutation abolished Kindlin-3 expression in primary WBCs and platelets due to abnormal splicing. Kindlin-3 is expressed in erythrocytes, and its deficiency is proposed to lead to the abnormal shape of RBCs. The patient’s immortalized WBCs expressed a truncated form of Kindlin-3, which was not sufficient to support integrin activation. Expression of Kindlin-3 cDNA in the patient’s immortalized WBCs rescued integrin activation defects, while overexpression of the truncated form did not.
Kindlin-3 deficiency impairs integrin function, including activation of beta 1 integrin.
Abnormalities in GPIb-IX function in kindlin-3 deficient platelets are secondary to integrin defects.
Region of Kindlin-3 encoded by Exon 11 is crucial for its ability to activate integrins in humans.
Integrins; Kindlins; Leukocyte Adhesion Deficiency; Platelets; Red Blood Cells; White Blood Cells
We compare the sets of experimentally validated long intergenic non-coding (linc)RNAs from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Under the assumption that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 species, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction of protein-coding sequences. Although the sequences of most lincRNAs are much less strongly conserved than protein sequences, the extent of orthology between the lincRNomes is unexpectedly high, with 60 to 70% of the lincRNA genes shared between human and mouse. The orthologous mammalian lincRNAs can be predicted to perform equivalent functions; accordingly, it appears likely that thousands of evolutionarily conserved functional roles of lincRNAs remain to be characterized.
Genome analysis of humans and other mammals reveals a surprisingly small number of protein-coding genes, only slightly over 20,000 (although the diversity of actual proteins is substantially augmented by alternative transcription and alternative splicing). Recent analysis of the mammalian genomes and transcriptomes, in particular, using the RNAseq technology, shows that, in addition to protein-coding genes, mammalian genomes encode many long non-coding RNAs. For some of these transcripts, various regulatory functions have been demonstrated, but on the whole the repertoire of long non-coding RNAs remains poorly characterized. We compared the identified long intergenic non-coding (linc)RNAs from human and mouse, and employed a specially developed statistical technique to estimate the size and evolutionary conservation of the human and mouse lincRNomes. The estimates show that there are at least twice as many human and mouse lincRNAs as there are protein-coding genes. Moreover, about two thirds of the lincRNA genes appear to be conserved between human and mouse, implying thousands of conserved but still uncharacterized functions.
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 Mb and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than 1/3 of Daphnia’s genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The co-expansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes – including many additional loci within sequenced regions that are otherwise devoid of annotations – are the most responsive genes to ecological challenges.
Clusters of localized hypermutation in human breast cancer genomes, named “kataegis” (from the Greek for thunderstorm), are hypothesized to result from multiple cytosine deaminations catalyzed by AID/APOBEC proteins. However, a direct link between APOBECs and kataegis is still lacking. We have sequenced the genomes of yeast mutants induced in diploids by expression of the gene for PmCDA1, a hypermutagenic deaminase from sea lamprey. Analysis of the distribution of 5,138 induced mutations revealed localized clusters very similar to those found in tumors. Our data provide evidence that unleashed cytosine deaminase activity is an evolutionarily conserved, prominent source of genome-wide kataegis events.
This article was reviewed by: Professor Sandor Pongor, Professor Shamil R. Sunyaev, and Dr Vladimir Kuznetsov.
APOBEC; Deaminase; Mutation; Kataegis; Cancer; Diploid yeast; Hypermutation
In order to maintain visual sensitivity at all light levels, the vertebrate eye possesses a mechanism to regenerate the visual pigment chromophore 11-cis retinal in the dark enzymatically, unlike in all other taxa, which rely on photoisomerization. This mechanism is termed the visual cycle and is localized to the retinal pigment epithelium (RPE), a support layer of the neural retina. Speculation has long revolved around whether more primitive chordates, such as tunicates and cephalochordates, anticipated this feature. The two key enzymes of the visual cycle are RPE65, the visual cycle all-trans retinyl ester isomerohydrolase, and lecithin:retinol acyltransferase (LRAT), which generates RPE65’s substrate. We hypothesized that the origin of the vertebrate visual cycle is directly connected to an ancestral carotenoid oxygenase acquiring a new retinyl ester isomerohydrolase function. Our phylogenetic analyses of the RPE65/BCMO and NlpC/P60 (LRAT) superfamilies show that neither RPE65 nor LRAT orthologs occur in tunicates (Ciona) or cephalochordates (Branchiostoma), but occur in Petromyzon marinus (Sea Lamprey), a jawless vertebrate. The closest homologs to RPE65 in Ciona and Branchiostoma lacked predicted functionally diverged residues found in all authentic RPE65s, but lamprey RPE65 contained all of them. We cloned RPE65 and LRATb cDNAs from lamprey RPE and demonstrated appropriate enzymatic activities. We show that Ciona β-carotene monooxygenase a (BCMOa) (previously annotated as an RPE65) has carotenoid oxygenase cleavage activity but not RPE65 activity. We verified the presence of RPE65 in lamprey RPE by immunofluorescence microscopy, immunoblot and mass spectrometry. On the basis of these data we conclude that the crucial transition from the typical carotenoid double bond cleavage functionality (BCMO) to the isomerohydrolase functionality (RPE65), coupled with the origin of LRAT, occurred subsequent to divergence of the more primitive chordates (tunicates, etc.) in the last common ancestor of the jawless and jawed vertebrates.
Among thousands of long non-coding RNAs (lncRNAs), only a small subset is functionally characterized, and the functional annotation of lncRNAs on the genomic scale remains inadequate. In this study we computationally characterized two functionally different parts of the human lncRNA transcriptome based on their ability to bind the polycomb repressive complex, PRC2. This classification is enabled by the fact that while all lncRNAs constitute a diverse set of sequences, the classes of PRC2-binding and PRC2 non-binding lncRNAs possess characteristic combinations of sequence-structure patterns and, therefore, can be separated within the feature space. Based on the specific combination of features, we built several machine-learning classifiers and identified the SVM-based classifier as the best performing. We further showed that the SVM-based classifier is able to generalize to independent data sets. We observed that this classifier, trained on the human lncRNAs, can predict up to 59.4% of PRC2-binding lncRNAs in mice. This suggests that, despite the low degree of sequence conservation, many lncRNAs play functionally conserved biological roles.
Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. In order to come up with a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence–absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.
Maximum likelihood; expectation-maximization; intron evolution; ancestral reconstruction; eukaryotic gene structure
It was proposed that if some mRNA characteristics resulted in a low efficiency of the termination signal, an additional closely located stop codon (tandem stop codons) could be used to prevent harmful readthrough. However, the role of tandem terminators in higher eukaryotes was not verified and remains hypothetical. In this work the sequence features of Arabidopsis thaliana and Oryza sativa mRNAs were analyzed. It was found that plant mRNAs with a UGA terminator were characterized by a higher frequency of nonsense codons in the first triplet position of the 3′-UTR, which could result from weak natural selection for a “reserve” stop signal. Interestingly, the presence of tandem stop codons positively correlated with a specific amino acid composition in the C-terminal position of the encoded proteins. In particular, C-terminal glycine positively correlated with significantly higher frequencies of reserve terminators at the beginning positions of the 3′-UTR in UGA-containing mRNAs. This finding coincides with some earlier observations concerning the role of glycine and its codons in inefficient termination of translation and recoding (e.g., 2A oligopeptide).
mRNA; Arabidopsis thaliana; Oryza sativa; stop codon; tandem terminators; readthrough
The two types of eukaryotic spliceosomal introns, U2 and U12, possess different splice signals and are excised by distinct spliceosomes. The nature of the primordial introns remains uncertain. A comparison of the amino acid distributions at insertion sites of introns that retained their positions throughout eukaryotic evolution with the distributions for human and Arabidopsis thaliana U2 and U12 introns reveals close similarity with U2 but not U12. Thus, the primordial spliceosomal introns were, most likely, U2-type.
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’, held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes.
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence; rather, the exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNAs, they can be compared with the much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis, according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
long noncoding RNA; ncRNA; RNA expression; genomic alignments; introns; RNA folding
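The negative correlation between evolutionary rate and expression level described above can be illustrated with a rank-based (Spearman) coefficient. This is a minimal sketch only: the abstract does not state which correlation statistic was used, and the function names and the toy expression and rate values below are hypothetical, not data from the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values: expression level and evolutionary rate per gene.
expression = [0.5, 2.0, 8.0, 1.0, 30.0, 4.0]
evolutionary_rate = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3]

rho = spearman(expression, evolutionary_rate)
print(f"Spearman rho = {rho:.3f}")  # negative: higher expression, slower evolution
```

A rank-based coefficient is used in this sketch because expression levels typically span orders of magnitude, so a monotonic measure is less sensitive to a few very highly expressed genes.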
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated, whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
In eukaryotes, protein-coding genes are interrupted by non-coding introns. The intron densities widely differ, from 6–7 introns per kilobase of coding sequence in vertebrates, some invertebrates and plants, to only a few introns across the entire genome in many unicellular forms. We applied a robust statistical methodology, Markov Chain Monte Carlo, to reconstruct the history of intron gain and loss throughout the evolution of eukaryotes using a set of 245 homologous genes from 99 genomes that represent the diversity of eukaryotes. Intron-rich ancestors were confidently inferred for each major eukaryotic group including 53% to 74% of the human intron density for the last eukaryotic common ancestor, and 120% to 130% of the human value for the last common ancestor of animals. Evolution of eukaryotic genes involved primarily intron loss, with substantial gain only at the bases of several major branches including plants and animals. Thus, the common ancestor of all extant eukaryotes was a complex organism with a gene architecture resembling those in multicellular organisms. The line of descent from the last common ancestor to mammals was an uninterrupted intron-rich state that, given the error-prone splicing in intron-rich organisms, was conducive to the elaboration of functional alternative splicing.
Editing deaminases have a pivotal role in cellular physiology. A notable member of this superfamily, APOBEC3G (A3G), restricts retroviruses, and Activation Induced Deaminase (AID) generates antibody diversity by localized deamination of cytosines in DNA. Unconstrained deaminase activity can cause genome-wide mutagenesis and cancer. The mechanisms that protect the genomic DNA from the undesired action of deaminases are unknown. Using in vitro deamination assays and expression of A3G in yeast, we show that replication protein A (RPA), the eukaryotic single-stranded DNA (ssDNA) binding protein, severely inhibits the deamination activity and processivity of A3G.
We found that mutations induced by A3G in the yeast genomic reporter are changes of a single nucleotide. This is unexpected because of the known property of A3G to catalyze multiple deaminations upon one substrate encounter event in vitro. The addition of recombinant RPA to the oligonucleotide deamination assay severely inhibited A3G activity. Additionally, we reveal the inverse correlation between RPA concentration and the number of deaminations induced by A3G in vitro on long ssDNA regions. This resembles the “hit and run” single base substitution events observed in yeast.
Our data suggest that RPA is a plausible antimutator factor limiting the activity and processivity of editing deaminases in the model yeast system. Because of the similar antagonism of yeast RPA and human RPA with A3G in vitro, we propose that RPA plays a role in the protection of the human genome from A3G and other deaminases when they are inadvertently diverted from their natural targets. We propose a model where RPA serves as one of the guardians of the genome that protects ssDNA from the destructive processive activity of deaminases by non-specific steric hindrance.
The deaminase-like fold includes, in addition to nucleic acid/nucleotide deaminases, several catalytic domains such as the JAB domain, and others involved in nucleotide and ADP-ribose metabolism. Using sensitive sequence and structural comparison methods, we develop a comprehensive natural classification of the deaminase-like fold and show that its ancestral version was likely to operate on nucleotides or nucleic acids. Consequently, we present evidence that a specific group of JAB domains are likely to possess a DNA repair function, distinct from the previously known deubiquitinating peptidase activity. We also identified numerous previously unknown clades of nucleic acid deaminases. Using inference based on contextual information, we suggest that most of these clades are toxin domains of two distinct classes of bacterial toxin systems, namely polymorphic toxins implicated in bacterial interstrain competition and those that target distantly related cells. Genome context information suggests that these toxins might be delivered via diverse secretory systems, such as Type V, Type VI, PVC and a novel PrsW-like intramembrane peptidase-dependent mechanism. We propose that certain deaminase toxins might be deployed by diverse extracellular and intracellular pathogens, as well as endosymbionts, as effectors targeting nucleic acids of host cells. Our analysis suggests that these toxin deaminases have been acquired by eukaryotes on several independent occasions and recruited as organellar or nucleo-cytoplasmic RNA modifiers, operating on tRNAs, mRNAs and short non-coding RNAs, and also as mutators of hyper-variable genes, viruses and selfish elements. This scenario potentially explains the origin of mutagenic AID/APOBEC-like deaminases, including novel versions from Caenorhabditis, Nematostella and diverse algae and a large class of fast-evolving fungal deaminases.
These observations greatly expand the distribution of possible unidentified mutagenic processes catalyzed by nucleic acid deaminases.
Accurate estimation of the divergence time of the extant eukaryotes is a fundamentally important but extremely difficult problem owing primarily to gross violations of the molecular clock at long evolutionary distances and the lack of appropriate calibration points close to the date of interest. These difficulties are intrinsic to the dating of ancient divergence events and are reflected in the large discrepancies between estimates obtained with different approaches. Estimates of the age of the Last Eukaryotic Common Ancestor (LECA) vary approximately twofold, from ~1,100 million years ago (Mya) to ~2,300 Mya.
We applied the genome-wide analysis of rare genomic changes associated with conserved amino acids (RGC_CAs) and used several independent techniques to obtain date estimates for the divergence of the major lineages of eukaryotes with calibration intervals for insects, land plants and vertebrates. The results suggest an early divergence of monocot and dicot plants, approximately 340 Mya, raising the possibility of plant-insect coevolution. The divergence of bilaterian animal phyla is estimated at ~400-700 Mya, a range of dates that is consistent with cladogenesis immediately preceding the Cambrian explosion. The origin of opisthokonts (the supergroup of eukaryotes that includes metazoa and fungi) is estimated at ~700-1,000 Mya, and the age of LECA at ~1,000-1,300 Mya. We separately analyzed the red algal calibration interval, which is based on a single fossil. This analysis produced time estimates that were systematically older than the other estimates. Nevertheless, the majority of the estimates for the age of the LECA using the red algal data fell within the 1,200-1,400 Mya interval.
The inference of a "young LECA" is compatible with the latest of previously estimated dates and has substantial biological implications. If these estimates are valid, the approximately 1 to 1.4 billion years of evolution of eukaryotes that is open to comparative-genomic study probably was preceded by hundreds of millions of years of evolution that might have included extinct diversity inaccessible to comparative approaches.
This article was reviewed by William Martin, Herve Philippe (nominated by I. King Jordan), and Romain Derelle.
bilateria; opisthokonts; angiosperms; last eukaryotic common ancestor; molecular dating
Yeast DNA polymerase ε (Pol ε) is a highly accurate and processive enzyme that participates in nuclear DNA replication of the leading strand template. In addition to a large subunit (Pol2) harboring the polymerase and proofreading exonuclease active sites, Pol ε also has one essential subunit (Dpb2) and two smaller, non-essential subunits (Dpb3 and Dpb4) whose functions are not fully understood. To probe the functions of Dpb3 and Dpb4, here we investigate the consequences of their absence on the biochemical properties of Pol ε in vitro and on genome stability in vivo. The fidelity of DNA synthesis in vitro by purified Pol2/Dpb2, i.e. lacking Dpb3 and Dpb4, is comparable to that of the four-subunit Pol ε holoenzyme. Nonetheless, deletion of DPB3 and DPB4 elevates spontaneous frameshift and base substitution rates in vivo, to the same extent as the loss of Pol ε proofreading activity in a pol2-4 strain. In contrast to pol2-4, however, the dpb3Δdpb4Δ double deletion does not lead to a synergistic increase of mutation rates with defects in DNA mismatch repair. The increased mutation rate in dpb3Δdpb4Δ strains is partly dependent on REV3, as well as the proofreading capacity of Pol δ. Finally, biochemical studies demonstrate that the absence of Dpb3 and Dpb4 destabilizes the interaction between Pol ε and the template DNA during processive DNA synthesis and during processive 3′ to 5′ exonucleolytic degradation of DNA. Collectively, these data suggest a model wherein Dpb3 and Dpb4 do not directly influence replication fidelity per se, but rather contribute to normal replication fork progression. In their absence, a defective replisome may more frequently leave gaps on the leading strand that are eventually filled by Pol ζ or Pol δ, in a post-replication process that generates errors not corrected by the DNA mismatch repair system.
The high fidelity of DNA replication is safeguarded by the accuracy of nucleotide selection by DNA polymerases, the proofreading activity of the replicative polymerases, and the DNA mismatch repair system. Errors made by replicative polymerases are corrected by mismatch repair, and inactivation of the mismatch repair system results in a multiplicative increase in error rates when combined with a proofreading-deficient allele of a replicative polymerase. In this study, we demonstrate that the deletion of two non-essential genes encoding two subunits of Pol ε gives an increased mutation rate due to increased synthesis by the error-prone DNA polymerase ζ. Surprisingly, there was no multiplicative increase in error rates when the mismatch repair system was inactivated. We propose that the deletion of DPB3 and DPB4 gives a defective replisome, which in turn gives increased synthesis, in part, by Pol ζ during an error-prone post-replication process that is not efficiently repaired by the mismatch repair system.
Retroposition, a leading mechanism for gene duplication, is an important process shaping the evolution of genomes. Retrogenes are also involved in gene structure evolution as a major player in the process of intron deletion. Here, we demonstrate the role of retrogenes in intron gain in mammals. We identified one case of “intronization,” the transformation of exonic sequences into an intron, in the primate-specific retrogene RNF113B and two independent “intronization” events in the retrogene DCAF12L2, one in the common ancestor of primates and rodents and another one in the rodent lineage. Intron gain resulted from the origin of new splice variants, and both genes have two transcript forms, one with the intron retained and one with the intron spliced out. Evolution of these genes, especially RNF113B, has been very dynamic and has been accompanied by several additional events including parental gene loss, secondary retroposition, and exaptation of transposable elements.
intron gain; gene structure evolution; splice variant; RNF113; DCAF12
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
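As a rough illustration of what a probabilistic treatment of a binary character involves, the sketch below computes the likelihood of one presence/absence pattern on a small bifurcating tree using a two-state continuous-time Markov chain and Felsenstein's pruning algorithm. It is not the EREM implementation, whose model may well be richer: the function names, the nested-tuple tree encoding, the single pair of gain and loss rates shared by all branches, and the stationary root prior are all assumptions made for this example.

```python
import math


def transition_probs(gain, loss, t):
    """Return P where P[i][j] is the probability of state j after time t,
    given state i at the start (0 = absent, 1 = present).

    Assumes a two-state Markov chain with constant gain and loss rates.
    """
    total = gain + loss
    pi0 = loss / total  # stationary probability of absence
    pi1 = gain / total  # stationary probability of presence
    decay = math.exp(-total * t)
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]


def partial_likelihoods(node, gain, loss):
    """Felsenstein pruning: probability of the observed leaves below `node`
    for each possible state of `node`.

    A leaf is the integer 0 or 1; an internal node is the tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    This encoding is hypothetical and chosen only for brevity.
    """
    if isinstance(node, int):
        return [1.0 if node == state else 0.0 for state in (0, 1)]
    left, t_left, right, t_right = node
    result = [1.0, 1.0]
    for child, t in ((left, t_left), (right, t_right)):
        child_like = partial_likelihoods(child, gain, loss)
        p = transition_probs(gain, loss, t)
        for state in (0, 1):
            result[state] *= p[state][0] * child_like[0] + p[state][1] * child_like[1]
    return result


def tree_likelihood(tree, gain, loss):
    """Likelihood of the leaf pattern, with the root state drawn from the
    stationary distribution (an assumption of this sketch)."""
    total = gain + loss
    root_prior = [loss / total, gain / total]
    like = partial_likelihoods(tree, gain, loss)
    return root_prior[0] * like[0] + root_prior[1] * like[1]


# Character present in two sister species and absent in the outgroup.
example_tree = ((1, 0.2, 1, 0.4), 0.1, 0, 0.6)
print(tree_likelihood(example_tree, gain=0.3, loss=0.7))
```

A maximum likelihood analysis would maximize the product of such per-character likelihoods over the gain and loss rates (and, in richer models, over branch-specific parameters). The same partial likelihoods are the starting point for reconstructing ancestral states at internal nodes.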
The deep phylogeny of eukaryotes is an important but extremely difficult problem of evolutionary biology. Five eukaryotic supergroups are relatively well established but the relationship between these supergroups remains elusive, and their divergence seems to best fit a “Big Bang” model. Attempts were made to root the tree of eukaryotes by using potential derived shared characters such as unique fusions of conserved genes. One popular model of eukaryotic evolution that emerged from this type of analysis is the unikont–bikont phylogeny: The unikont branch consists of Metazoa, Choanozoa, Fungi, and Amoebozoa, whereas bikonts include the rest of eukaryotes, namely, Plantae (green plants, Chlorophyta, and Rhodophyta), Chromalveolata, excavates, and Rhizaria. We reexamine the relationships between the eukaryotic supergroups using a genome-wide analysis of rare genomic changes (RGCs) associated with multiple, conserved amino acids (RGC_CAMs and RGC_CAs), to resolve trifurcations of major eukaryotic lineages. The results do not support the basal position of Chromalveolata with respect to Plantae and unikonts or the monophyly of the bikont group and appear to be best compatible with the monophyly of unikonts and Chromalveolata. Chromalveolata show a distinct, additional signal of affinity with Plantae, conceivably, owing to genes transferred from the secondary, red algal symbiont. Excavates are derived forms, with extremely long branches that complicate phylogenetic inference; nevertheless, the RGC analysis suggests that they are significantly more likely to cluster with the unikont–Chromalveolata assemblage than with the Plantae. Thus, the first split in eukaryotic evolution might lie between photosynthetic and nonphotosynthetic forms and so could have been triggered by the endosymbiosis between an ancestral unicellular eukaryote and a cyanobacterium that gave rise to the chloroplast.
eukaryotic phylogeny; rare genomic changes; parsimony; substitutions; insertions; deletions
To probe Pol ζ functions in vivo via its error signature, here we report the properties of Saccharomyces cerevisiae Pol ζ in which phenylalanine was substituted for the conserved Leu-979 in the catalytic (Rev3) subunit. We show that purified L979F Pol ζ is 30% as active as wild-type Pol ζ when replicating undamaged DNA. L979F Pol ζ shares with wild-type Pol ζ the ability to perform moderately processive DNA synthesis. When copying undamaged DNA, L979F Pol ζ is error-prone compared to wild-type Pol ζ, providing a biochemical rationale for the observed mutator phenotype of rev3-L979F yeast strains. Errors generated by L979F Pol ζ in vitro include single-base insertions, deletions and substitutions, with the highest error rates involving stable misincorporation of dAMP and dGMP. L979F Pol ζ also generates multiple errors in close proximity to each other. The frequency of these events far exceeds that expected for independent single changes, indicating that the first error increases the probability of additional errors within 10 nucleotides. Thus L979F Pol ζ, and perhaps wild-type Pol ζ, which also generates clustered mutations at a lower but significant rate, performs short patches of processive, error-prone DNA synthesis. This may explain the origin of some multiple clustered mutations observed in vivo.
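The comparison with "independent single changes" can be made concrete with a small calculation. The sketch below assumes a uniform per-nucleotide error rate and independence between sites, and asks how often an error would then have a second error within a given window on either side. The function names and all numerical inputs are hypothetical; the abstract reports no specific rates, so nothing here reproduces the authors' data or their statistical method.

```python
def expected_cluster_fraction(error_rate, window=10):
    """Probability that a given error has at least one other error within
    `window` nucleotides on either side, if errors were independent and
    occurred at a uniform per-nucleotide rate (simplifying assumptions)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability")
    flanking_sites = 2 * window
    return 1.0 - (1.0 - error_rate) ** flanking_sites


def fold_excess(observed_fraction, error_rate, window=10):
    """Ratio of an observed clustered fraction to the independent expectation."""
    expected = expected_cluster_fraction(error_rate, window)
    if expected == 0.0:
        raise ValueError("expected fraction is zero; the ratio is undefined")
    return observed_fraction / expected


# Hypothetical inputs, chosen only for illustration: one error per 1000
# nucleotides, and 20% of observed errors having a neighbor within 10 nt.
print(expected_cluster_fraction(1e-3, window=10))
print(fold_excess(0.20, 1e-3, window=10))
```

A fold excess well above 1 is the kind of signal that points to the first error raising the probability of further errors nearby, rather than to chance coincidence of independent events.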