Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes.
The abundance of mammalian long intergenic non-coding RNA (lincRNA) genes is high, yet their functions remain largely unknown. One possible way to study this important question is to use large-scale comparisons of various characteristics of lincRNA with those of protein-coding genes for which a large body of functional information is available. A prominent feature of mammalian protein-coding genes is the high evolutionary conservation of the exon-intron structure. Comparative analysis of putative intron positions in lincRNA genes from various mammalian genomes suggests that some lincRNA introns have been conserved for over 100 million years, thus the primary and/or secondary structure of these molecules is likely to be functionally important.
lincRNA; exon; intron; non-coding RNA; genomic alignments; intron gain; intron loss
Activation-induced deaminase (AID) is the master regulator of class switch recombination (CSR) and somatic hypermutation (SHM), but the mechanisms regulating AID function are obscure. The differential pattern of switch plasmid activity in three IgM+/AID+ and two IgG+/AID+ B cell lines prompted an analysis of global gene expression to discover the origin of these cells. Gene profiling suggested that the IgG+/AID+ B cell lines derived from germinal center B cells. Analysis of SHM potential demonstrates that the IgVκ domains are inducibly diversified at high rate during in vitro culture. The mutation spectra focused to A:T base pairs, revealing a component of the hypermutation program that occurs preferentially during phase 2 of SHM. The A:T error spectra were analyzed and were not characteristic of polymerase η activity. A differential pattern of three consensus motifs used for A:T base substitutions was observed in WT and Polη-, Msh2- and Msh6-deficient B cells. Strikingly, mutations in our B cell lines recapitulated the mutable motif profile for Polη and Msh2 deficiency, respectively, and suggest that an additional pathway for the generation of A:T mutations in SHM is conserved in mouse and human.
AID; B cell; Immunoglobulin; Somatic hypermutation
Germline endogenous viral elements (EVEs) genetically preserve viral nucleotide sequences useful to the study of viral evolution, gene mutation, and the phylogenetic relationships among host organisms. Here, we describe a lineage-specific, adeno-associated virus (AAV)-derived endogenous viral element (mAAV-EVE1) found within the germline of numerous closely related marsupial species. Molecular screening of a marsupial DNA panel indicated that mAAV-EVE1 occurs specifically within the marsupial suborder Macropodiformes (present-day kangaroos, wallabies, and related macropodoids), to the exclusion of other Diprotodontian lineages. Orthologous mAAV-EVE1 locus sequences from sixteen macropodoid species, representing a speciation history spanning an estimated 30 million years, facilitated compilation of an inferred ancestral sequence that recapitulates the genome of an ancient marsupial AAV that circulated among Australian metatherian fauna sometime during the late Eocene to early Oligocene. In silico gene reconstruction and molecular modelling indicate remarkable conservation of viral structure over a geologic timescale. Characterisation of AAV-EVE loci among disparate species affords insight into AAV evolution and, in the case of macropodoid species, may offer an additional genetic basis for assignment of phylogenetic relationships among the Macropodoidea. From an applied perspective, the identified AAV “fossils” provide novel capsid sequences for use in translational research and clinical applications.
Transposable elements (TEs) are abundant in mammalian genomes and appear to have contributed to the evolution of their hosts by providing novel regulatory or coding sequences. We analyzed different regions of long intergenic non-coding RNA (lincRNA) genes in human and mouse genomes to systematically assess the potential contribution of TEs to the evolution of the structure and regulation of expression of lincRNA genes. Introns of lincRNA genes contain the highest percentage of TE-derived sequences (TES), followed by exons and then promoter regions, although the density of TEs is not significantly different between exons and promoters. Higher frequencies of ancient TEs in promoters and exons compared to introns imply that many lincRNA genes emerged before the split of primates and rodents. The content of TES in lincRNA genes is substantially higher than that in protein-coding genes, especially in exons and promoter regions. A significant positive correlation was detected between the content of TEs and the evolutionary rate of lincRNAs, indicating that inserted TEs are preferentially fixed in fast-evolving lincRNA genes. These results are consistent with the repeat insertion domains of LncRNAs hypothesis, under which TEs have substantially contributed to the origin, evolution, and, in particular, fast functional diversification of lincRNA genes.
mobile elements; molecular domestication; exaptation; junk DNA; long non-coding RNA; repetitive elements
More than half a century after Warburg postulated his theory of the origin of cancer cells, the question of altered metabolism in cancer is again taking center stage. For several decades, the generalized picture of cancer metabolism was replaced by the analysis of signaling and oncogenes in each type of cancer. However, now that a wealth of knowledge about tumor suppressors, oncogenes, and signaling pathways is available, reprogramming of cellular metabolism (e.g., an increased glycolysis to respiration ratio in cancer cells) has reemerged as an important element of cancer progression. To analyze the level of expression of various proteins, including metabolic enzymes, across various cancers, we used dbEST and UniGene data. We delineated a list of genes that are overexpressed in different types of cancer. We also grouped overexpressed enzymes into KEGG pathways and analyzed adjacent pathways to describe enzymatic reactions that take place in cancer cells and to identify major players that are abundant in the cancer protein machinery. Glycolysis/gluconeogenesis and oxidative phosphorylation are the most abundant pathways, although several other pathways are enriched in genes from our list. Ubiquitously overexpressed genes could be marked as nonspecific cancer-associated genes when analyzing genes that are overexpressed in certain types of cancer. Thus, the list of overexpressed genes may be a useful tool for cancer research.
A substantial fraction of eukaryotic proteins contains multiple domains, some of which show a tendency to occur in diverse domain architectures and can be considered mobile (or ‘promiscuous’). These promiscuous domains are typically involved in protein–protein interactions and play crucial roles in interaction networks, particularly those contributing to signal transduction. They also play a major role in creating diversity of protein domain architecture in the proteome. It is now apparent that promiscuity is a volatile and relatively fast-changing feature in evolution, and that only a few domains retain their promiscuity status throughout evolution. Many such domains attained their promiscuity status independently in different lineages. Only recently, we have begun to understand the diversity of protein domain architectures and the role the promiscuous domains play in evolution of this diversity. However, many of the biological mechanisms of protein domain mobility remain shrouded in mystery. In this review, we discuss our present understanding of protein domain promiscuity, its evolution and its role in cellular function.
mobile domain; promiscuous domain; domain network; domain architecture; domain evolution
Mutations in genomes of species are frequently distributed non-randomly, resulting in mutation clusters, including recently discovered kataegis in tumors. DNA editing deaminases play a prominent role in the etiology of these mutations. To gain insight into the enigmatic mechanisms of localized hypermutagenesis that lead to cluster formation, we analyzed the single nucleotide variation (SNV) data obtained by whole-genome sequencing of drug-resistant mutants induced in yeast diploids by AID/APOBEC deaminase and base analog 6-HAP. Deaminase from sea lamprey, PmCDA1, induced robust clusters, while 6-HAP induced a few weak ones. We found that PmCDA1, AID, and APOBEC1 deaminases preferentially mutate the beginning of actively transcribed genes. Inactivation of transcription initiation factor Sub1 strongly reduced deaminase-induced can1 mutation frequency, but, surprisingly, did not decrease the total SNV load in genomes. However, the SNVs in the genomes of the sub1 clones were re-distributed, and the effect of mutation clustering in the regions of transcription initiation was even more pronounced. At the same time, the mutation density in the protein-coding regions was reduced, resulting in a decrease in phenotypically detected mutants. We propose that the induction of clustered mutations by deaminases involves: a) the exposure of ssDNA strands during transcription and loss of protection of ssDNA due to the depletion of ssDNA-binding proteins, such as Sub1, and b) attainment of conditions favorable for APOBEC action in a subpopulation of cells, leading to enzymatic deamination within the currently expressed genes. This model is applicable to both the initial and the later stages of oncogenic transformation and explains variations in the distribution of mutations and kataegis events in different tumor cells.
Genomes of tumors are heavily enriched with mutations. Some of these mutations are distributed non-randomly, forming mutational clusters. Editing cytosine deaminases from the APOBEC superfamily are responsible for the formation of many of these clusters. We have expressed an APOBEC enzyme in diploid yeast cells and found that most of the mutations occur at the beginning of active genes, where transcription starts. Clusters of mutations overlapped with promoters/transcription start sites. This is likely due to the weaker protection of ssDNA, the ultimate target of APOBEC deaminases, at the beginning of genes. This hypothesis was reinforced by the finding that inactivation of the Sub1 transcription initiation factor, which is found predominantly in the regions of transcription initiation, leads to a further increase in mutagenesis at the beginning of genes. Interestingly, the total number of mutations in the genomes of Sub1-deficient clones did not change, despite the 100-fold decrease in frequency of mutants in a reporter gene. Thus, the drastic change in genome-wide distribution of mutations can be caused by inactivation of a single gene. We propose that the loss of ssDNA protection factors causes formation of mutation clusters in human cancer.
Kindlins are essential for integrin-mediated cell adhesion. This study focuses on the evolutionary origin and subsequent functional specialization of kindlins and their role in evolutionary adaptation of cell adhesiveness in multicellular organisms.
Kindlins are integrin-interacting proteins essential for integrin-mediated cell adhesiveness. In this study, we focused on the evolutionary origin and functional specialization of kindlins as a part of the evolutionary adaptation of cell adhesive machinery. Database searches revealed that many members of the integrin machinery (including talin and integrins) existed before kindlin emergence in evolution. Among the analyzed species, all metazoan lineages—but none of the premetazoans—had at least one kindlin-encoding gene, whereas talin was present in several premetazoan lineages. Kindlin appears to originate from a duplication of the sequence encoding the N-terminal fragment of talin (the talin head domain) with a subsequent insertion of the PH domain of separate origin. Sequence analysis identified a member of the actin filament–associated protein 1 (AFAP1) superfamily as the most likely origin of the kindlin PH domain. The functional divergence between kindlin paralogues was assessed using the sequence swap (chimera) approach. Comparison of kindlin 2 (K2)/kindlin 3 (K3) chimeras revealed that the F2 subdomain, in particular its C-terminal part, is crucial for the differential functional properties of K2 and K3. The presence of this segment enables K2 but not K3 to localize to focal adhesions. Sequence analysis of the C-terminal part of the F2 subdomain of K3 suggests that insertion of a variable glycine-rich sequence in vertebrates contributed to the loss of constitutive K3 targeting to focal adhesions. Thus emergence and subsequent functional specialization of kindlins allowed multicellular organisms to develop additional tissue-specific adaptations of cell adhesiveness.
The ortholog conjecture (OC), which is central to functional annotation of genomes, posits that orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of Gene Ontology (GO) annotations and expression profiles, among within-species paralogs compared with orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. However, several subsequent studies suggest that GO annotations and microarray data could artificially inflate functional similarity between paralogs from the same organism. We sought to test the OC using approaches distinct from those used in previous studies. Analysis of a large RNAseq data set from multiple human and mouse tissues shows that expression similarity (correlation coefficients, rank scores, or Z-scores) between orthologs is substantially greater than that for between-species paralogs with the same sequence divergence, in agreement with the OC and the results of recent detailed analyses. These findings are further corroborated by a fine-grain analysis in which expression profiles of orthologs and paralogs were compared separately for individual gene families. Expression profiles of within-species paralogs are more strongly correlated than profiles of orthologs, but it is shown that this is caused by high background noise, that is, correlation between profiles of unrelated genes in the same organism. Z-scores and rank scores show a nonmonotonic dependence of expression profile similarity on sequence divergence. This complexity of gene expression evolution after duplication might be at least partially caused by selection for protein dosage rebalancing following gene duplication.
duplicated genes; selection; neutral evolution; rebalancing dosage effect model; duplication–degeneration–complementation model; neofunctionalization model; subfunctionalization model
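The role of background noise in the expression comparison above can be illustrated with a minimal sketch. It assumes that a Z-score is computed by scaling a raw Pearson correlation against the correlations of unrelated gene pairs from the same comparison; the abstract does not give the exact procedure, and the function names and all numeric values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def background_z(r, background):
    """Z-score of correlation r relative to correlations of unrelated gene pairs."""
    n = len(background)
    mean = sum(background) / n
    sd = math.sqrt(sum((b - mean) ** 2 for b in background) / (n - 1))
    return (r - mean) / sd

# Hypothetical background correlations between unrelated genes.
# Within one organism the background is assumed to be high (shared noise);
# between organisms it is assumed to be low.
within_species_bg = [0.55, 0.60, 0.65, 0.58, 0.62]
between_species_bg = [0.05, 0.10, 0.15, 0.08, 0.12]

# Hypothetical raw correlations: the paralog pair looks more similar...
r_paralog, r_ortholog = 0.80, 0.70

# ...but after background correction the ortholog pair stands out more.
z_paralog = background_z(r_paralog, within_species_bg)
z_ortholog = background_z(r_ortholog, between_species_bg)
print(f"paralog z = {z_paralog:.2f}, ortholog z = {z_ortholog:.2f}")
```

With these toy numbers the raw correlation favors the within-species paralogs, while the background-corrected score favors the orthologs, which is the kind of reversal the abstract attributes to correlation between unrelated genes in the same organism.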
Mitochondria are ubiquitous membranous organelles of eukaryotic cells that evolved from an alpha-proteobacterial endosymbiont and possess a small genome that encompasses from 3 to 106 genes. Accumulation of thousands of mitochondrial genomes from diverse groups of eukaryotes provides an opportunity for a comprehensive reconstruction of the evolution of the mitochondrial gene repertoire.
Clusters of orthologous mitochondrial protein-coding genes (MitoCOGs) were constructed from all available mitochondrial genomes and complemented with nuclear orthologs of mitochondrial genes. With minimal exceptions, the mitochondrial gene complements of eukaryotes are subsets of the superset of 66 genes found in jakobids. Reconstruction of the evolution of mitochondrial genomes indicates that the mitochondrial gene set of the last common ancestor of the extant eukaryotes was slightly larger than that of jakobids. This superset of mitochondrial genes likely represents an intermediate stage following the loss and transfer to the nucleus of most of the endosymbiont genes early in eukaryote evolution. Subsequent evolution in different lineages involved largely parallel transfer of ancestral endosymbiont genes to the nuclear genome. The intron density in nuclear orthologs of mitochondrial genes typically is nearly the same as in the rest of the genes in the respective genomes. However, in land plants, the intron density in nuclear orthologs of mitochondrial genes is almost 1.5-fold lower than the genomic mean, suggestive of ongoing transfer of functional genes from mitochondria to the nucleus.
The MitoCOGs are expected to become an important resource for the study of mitochondrial evolution. The nearly complete superset of mitochondrial genes in jakobids likely represents an intermediate stage in the evolution of eukaryotes after the initial, extensive loss and transfer of the endosymbiont genes. In addition, the bacterial multi-subunit RNA polymerase that is encoded in the jakobid mitochondrial genomes was replaced by a single-subunit phage-type RNA polymerase in the rest of the eukaryotes. These results are best compatible with the rooting of the eukaryotic tree between jakobids and the rest of the eukaryotes. The land plants are the only eukaryotic branch in which the gene transfer from the mitochondrial to the nuclear genome appears to be an active, ongoing process.
Electronic supplementary material
The online version of this article (doi:10.1186/s12862-014-0237-5) contains supplementary material, which is available to authorized users.
Mitochondria; Genome evolution; Gene loss; Gene transfer; Introns; Clusters of orthologous genes
A dramatic increase in the prevalence of autism and Autistic Spectrum Disorders (ASD) has been observed over the last two decades in the USA, Europe and Asia. Given the accumulating data on the possible role of translation in the etiology of ASD, we analyzed potential effects of rare synonymous substitutions associated with ASD on mRNA stability, splicing enhancers and silencers, and codon usage.
Presentation of the hypothesis
We hypothesize that subtle impairment of translation, resulting in dosage imbalance of neuron-specific proteins, contributes to the etiology of ASD synergistically with environmental neurotoxins.
Testing the hypothesis
A statistically significant shift from optimal to suboptimal codons caused by rare synonymous substitutions associated with ASD was detected whereas no effect on other analyzed characteristics of transcripts was identified. This result suggests that the impact of rare codons on the translation of genes involved in neuron development, even if slight in magnitude, could contribute to the pathogenesis of ASD in the presence of an aggressive chemical background. This hypothesis could be tested by further analysis of ASD-associated mutations, direct biochemical characterization of their effects, and assessment of in vivo effects on animal models.
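One way such a shift from optimal to suboptimal codons could be quantified is sketched below. This is an illustration under stated assumptions, not the procedure used in the study: the codon usage values are illustrative placeholders rather than a real usage table, the substitution list is invented, and the exact two-sided sign test is a swapped-in choice of statistic.

```python
from math import comb

# Hypothetical relative usage of synonymous codons (illustrative values only).
CODON_USAGE = {
    "CTG": 0.40, "CTA": 0.07,  # Leu
    "GCC": 0.40, "GCG": 0.11,  # Ala
    "AAG": 0.57, "AAA": 0.43,  # Lys
}

def shift_direction(ref_codon, alt_codon, usage=CODON_USAGE):
    """Return +1 if the substitution moves to a more frequent codon,
    -1 if it moves to a less frequent codon, and 0 for a tie."""
    diff = usage[alt_codon] - usage[ref_codon]
    return (diff > 0) - (diff < 0)

def sign_test_p(n_down, n_up):
    """Two-sided exact sign test: p-value for observing a split at least as
    uneven as (n_down, n_up) when both directions are equally likely."""
    n = n_down + n_up
    k = min(n_down, n_up)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical synonymous substitutions as (reference codon, variant codon).
substitutions = [("CTG", "CTA"), ("GCC", "GCG"), ("AAG", "AAA"), ("CTA", "CTG")]
directions = [shift_direction(ref, alt) for ref, alt in substitutions]
n_down = directions.count(-1)  # optimal -> suboptimal
n_up = directions.count(1)     # suboptimal -> optimal
p_value = sign_test_p(n_down, n_up)
print(n_down, n_up, p_value)
```

A real analysis would need a codon usage table appropriate to the tissue or gene set of interest and a far larger set of substitutions; four toy substitutions cannot reach significance.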
Implications of the hypothesis
It seems likely that the synergistic action of environmental hazards with genetic variations that in themselves have limited or no deleterious effects but are potentiated by the environmental factors is a general principle that underlies the alarming increase in the ASD prevalence.
This article was reviewed by Andrey Rzhetsky, Neil R. Smalheiser, and Shamil R. Sunyaev.
Synonymous mutations; Single nucleotide polymorphism; Codon usage; Splicing enhancer; Splicing silencer; mRNA secondary structure; Transcription factor binding; Neurotoxin
The rate of mutations in eukaryotes depends on a plethora of factors and is not immediately derived from the fidelity of DNA polymerases (Pols). Replication of chromosomes containing the anti-parallel strands of duplex DNA occurs through the copying of leading and lagging strand templates by a trio of Pols α, δ and ε, with the assistance of Pol ζ and Y-family Pols at difficult DNA template structures or sites of DNA damage. The parameters of the synthesis at a given location are dictated by the quality and quantity of nucleotides in the pools, replication fork architecture, transcription status, regulation of Pol switches, and structure of chromatin. The result of these transactions is a subject of survey and editing by DNA repair.
DNA polymerases; nucleotide pools; mutagenesis; Okazaki fragments
Aberrant activation of receptor tyrosine kinases (RTKs) is a common feature of many cancer cells. It was previously suggested that the mechanisms of kinase activation in cancer might be linked to transitions between active and inactive states. Here we estimate the effects of single and double cancer mutations on the stability of active and inactive states of the kinase domains from different RTKs. We show that singleton cancer mutations destabilize both active and inactive states; however, inactive states are destabilized more than active ones, leading to kinase activation. We show that there exists a relationship between the estimate of oncogenic potential of a cancer mutation and kinase activation. Namely, more frequent mutations have a higher activating effect, which might allow us to predict the activating effect of the mutations from the mutation spectra. Independent evolutionary analysis of mutation spectra complements this observation and finds the same frequency threshold defining mutation hot spots. We analyze double mutations and report a positive epistasis and additional advantage of doublets with respect to cancer cell fitness. The activation mechanisms of double mutations differ from those of single mutations, and the double mutation spectrum is found to be dissimilar to the mutation spectrum of singletons.
cancer mutation; receptor tyrosine kinase; protein structure; kinase activation; mutation spectra; double mutations
Genetic information should be accurately transmitted from cell to cell; conversely, the adaptation in evolution and disease is fueled by mutations. In the case of cancer development, multiple genetic changes happen in somatic diploid cells. Most classic studies of the molecular mechanisms of mutagenesis have been performed in haploids. We demonstrate that the parameters of the mutation process are different in diploid cell populations. The genomes of drug-resistant mutants induced in yeast diploids by base analog 6-hydroxylaminopurine (HAP) or AID/APOBEC cytosine deaminase PmCDA1 from lamprey carried a stunning load of thousands of unselected mutations. Haploid mutants contained almost an order of magnitude fewer mutations. To explain this, we propose that the distribution of induced mutation rates in the cell population is uneven. The mutants in diploids with coincidental mutations in the two copies of the reporter gene arise from a fraction of cells that are transiently hypersensitive to the mutagenic action of a given mutagen. The progeny of such cells were never recovered in haploids due to the lethality caused by the inactivation of single-copy essential genes in cells with too many induced mutations. In diploid cells, the progeny of hypersensitive cells survived, but their genomes were saturated by heterozygous mutations. The reason for the hypermutability of cells could be transient faults of the mutation prevention pathways, like sanitization of nucleotide pools for HAP or an elevated expression of the PmCDA1 gene or the temporary inability of the destruction of the deaminase. The hypothesis on spikes of mutability may explain the sudden acquisition of multiple mutational changes during evolution and carcinogenesis.
Evolution and carcinogenesis are driven by mutations. Cells maintain constant mutation rates and can afford only transient mutagenesis bursts for adaptation. The nature of the mutational avalanches is not very clear. We sequenced the whole genomes of mutants induced in haploid and diploid yeast by nucleobase analog HAP and by DNA editing cytosine deaminase. Mutants selected in diploids are saturated with passenger mutations. Far fewer mutations are found in haploid mutants. Treatment with a mutagen without selection results in intermediate mutagenesis. The observed transient hypermutability of diploids under mutagenic insult helps to explain the wellspring of mutations that arise during evolution and carcinogenesis.
Kindlin-3 is a novel integrin activator in hematopoietic cells and its deficiency leads to immune problems and severe bleeding, known as LAD-III. Our current understanding of Kindlin-3 function primarily relies on analysis of animal models or cell lines.
To understand the functions of Kindlin-3 in human primary blood cells.
Here we analyze primary and immortalized hematopoietic cells obtained from a new LAD-III patient with immune problems, bleeding, a history of anemia and abnormally shaped red blood cells.
The patient’s WBCs and platelets showed defects in agonist-induced integrin activation and botrocetin-induced platelet agglutination. Primary leukocytes from this patient exhibited abnormal activation of beta1 integrin. Integrin activation defects were responsible for the observed deficiency of the botrocetin-induced platelet response. Analysis of the patient’s genomic DNA revealed a novel mutation in the kindlin-3 gene. The mutation abolished Kindlin-3 expression in primary WBCs and platelets due to abnormal splicing. Kindlin-3 is expressed in erythrocytes, and its deficiency is proposed to lead to the abnormal shape of RBCs. Immortalized WBCs from the patient expressed a truncated form of Kindlin-3, which was not sufficient to support integrin activation. Expression of Kindlin-3 cDNA in the immortalized patient WBCs rescued integrin activation defects, while overexpression of the truncated form did not.
Kindlin-3 deficiency impairs integrin function, including activation of beta 1 integrin.
Abnormalities in GPIb-IX function in kindlin-3 deficient platelets are secondary to integrin defects.
Region of Kindlin-3 encoded by Exon 11 is crucial for its ability to activate integrins in humans.
Integrins; Kindlins; Leukocyte Adhesion Deficiency; Platelets; Red Blood Cells; White Blood Cells
We compare the sets of experimentally validated long intergenic non-coding (linc)RNAs from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Under the assumption that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 species, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction of protein-coding sequences. Although the sequences of most lincRNAs are much less strongly conserved than protein sequences, the extent of orthology between the lincRNomes is unexpectedly high, with 60 to 70% of the lincRNA genes shared between human and mouse. The orthologous mammalian lincRNAs can be predicted to perform equivalent functions; accordingly, it appears likely that thousands of evolutionarily conserved functional roles of lincRNAs remain to be characterized.
Genome analysis of humans and other mammals reveals a surprisingly small number of protein-coding genes, only slightly over 20,000 (although the diversity of actual proteins is substantially augmented by alternative transcription and alternative splicing). Recent analysis of the mammalian genomes and transcriptomes, in particular, using the RNAseq technology, shows that, in addition to protein-coding genes, mammalian genomes encode many long non-coding RNAs. For some of these transcripts, various regulatory functions have been demonstrated, but on the whole the repertoire of long non-coding RNAs remains poorly characterized. We compared the identified long intergenic non-coding (linc)RNAs from human and mouse, and employed a specially developed statistical technique to estimate the size and evolutionary conservation of the human and mouse lincRNomes. The estimates show that there are at least twice as many human and mouse lincRNAs as there are protein-coding genes. Moreover, about two thirds of the lincRNA genes appear to be conserved between human and mouse, implying thousands of conserved but still uncharacterized functions.
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 Mb and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than 1/3 of Daphnia’s genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The co-expansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes – including many additional loci within sequenced regions that are otherwise devoid of annotations – are the most responsive genes to ecological challenges.
Clusters of localized hypermutation in human breast cancer genomes, named “kataegis” (from the Greek for thunderstorm), are hypothesized to result from multiple cytosine deaminations catalyzed by AID/APOBEC proteins. However, a direct link between APOBECs and kataegis is still lacking. We have sequenced the genomes of yeast mutants induced in diploids by expression of the gene for PmCDA1, a hypermutagenic deaminase from sea lamprey. Analysis of the distribution of 5,138 induced mutations revealed localized clusters very similar to those found in tumors. Our data provide evidence that unleashed cytosine deaminase activity is an evolutionary conserved, prominent source of genome-wide kataegis events.
This article was reviewed by: Professor Sandor Pongor, Professor Shamil R. Sunyaev, and Dr Vladimir Kuznetsov.
APOBEC; Deaminase; Mutation; Kataegis; Cancer; Diploid yeast; Hypermutation
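The notion of localized mutation clusters in the kataegis abstract above can be made concrete with a minimal sketch. It assumes a simple definition in which a cluster is a run of mutations on one chromosome whose consecutive positions lie within a maximum gap; the thresholds `max_gap` and `min_mutations`, the function name, and the positions are all hypothetical and are not taken from the study.

```python
def find_clusters(positions, max_gap=1000, min_mutations=5):
    """Group mutation positions on one chromosome into clusters.

    A cluster is a run of sorted positions in which every consecutive pair
    is at most max_gap apart and which has at least min_mutations members.
    Returns a list of (start, end, count) tuples.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the current run.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_mutations:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Close the final run.
    if len(current) >= min_mutations:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Hypothetical mutation coordinates: one dense run and a few scattered sites.
positions = [100, 400, 900, 1500, 1800, 50000, 120000, 120300]
clusters = find_clusters(positions)
print(clusters)
```

A gap-based rule like this ignores the expected spacing of mutations under a uniform model; a fuller treatment would compare observed intermutation distances with that expectation before calling a run a cluster.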
In order to maintain visual sensitivity at all light levels, the vertebrate eye possesses a mechanism to regenerate the visual pigment chromophore 11-cis retinal in the dark enzymatically, unlike in all other taxa, which rely on photoisomerization. This mechanism is termed the visual cycle and is localized to the retinal pigment epithelium (RPE), a support layer of the neural retina. Speculation has long revolved around whether more primitive chordates, such as tunicates and cephalochordates, anticipated this feature. The two key enzymes of the visual cycle are RPE65, the visual cycle all-trans retinyl ester isomerohydrolase, and lecithin:retinol acyltransferase (LRAT), which generates RPE65’s substrate. We hypothesized that the origin of the vertebrate visual cycle is directly connected to an ancestral carotenoid oxygenase acquiring a new retinyl ester isomerohydrolase function. Our phylogenetic analyses of the RPE65/BCMO and NlpC/P60 (LRAT) superfamilies show that neither RPE65 nor LRAT orthologs occur in tunicates (Ciona) or cephalochordates (Branchiostoma), but occur in Petromyzon marinus (Sea Lamprey), a jawless vertebrate. The closest homologs to RPE65 in Ciona and Branchiostoma lacked predicted functionally diverged residues found in all authentic RPE65s, but lamprey RPE65 contained all of them. We cloned RPE65 and LRATb cDNAs from lamprey RPE and demonstrated appropriate enzymatic activities. We show that Ciona β-carotene monooxygenase a (BCMOa) (previously annotated as an RPE65) has carotenoid oxygenase cleavage activity but not RPE65 activity. We verified the presence of RPE65 in lamprey RPE by immunofluorescence microscopy, immunoblot and mass spectrometry. On the basis of these data we conclude that the crucial transition from the typical carotenoid double bond cleavage functionality (BCMO) to the isomerohydrolase functionality (RPE65), coupled with the origin of LRAT, occurred subsequent to divergence of the more primitive chordates (tunicates, etc.) in the last common ancestor of the jawless and jawed vertebrates.
Among thousands of long non-coding RNAs (lncRNAs), only a small subset has been functionally characterized, and the functional annotation of lncRNAs on the genomic scale remains inadequate. In this study we computationally characterized two functionally distinct parts of the human lncRNA transcriptome based on their ability to bind the Polycomb repressive complex 2 (PRC2). This classification is possible because, although lncRNAs as a whole constitute a diverse set of sequences, the classes of PRC2-binding and PRC2-non-binding lncRNAs possess characteristic combinations of sequence-structure patterns and, therefore, can be separated within the feature space. Based on the specific combination of features, we built several machine-learning classifiers and identified the SVM-based classifier as the best performing. We further showed that the SVM-based classifier generalizes to independent data sets. We observed that this classifier, trained on human lncRNAs, can predict up to 59.4% of PRC2-binding lncRNAs in mice. This suggests that, despite the low degree of sequence conservation, many lncRNAs play functionally conserved biological roles.
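The idea of separating two lncRNA classes in a feature space can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: dinucleotide frequencies stand in for the sequence-structure features (which the abstract does not specify), the trainer is a simplified linear SVM fitted by subgradient descent on the hinge loss rather than the classifier used in the study, and the toy sequences and the function names `kmer_features`, `train_linear_svm` and `predict` are hypothetical.

```python
from itertools import product

def kmer_features(seq, k=2):
    """Return k-mer frequencies of an RNA/DNA sequence (assumed stand-in features)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            # The regularization term always shrinks w slightly.
            w = [wj * (1 - lr * lam) for wj in w]
            if yi * score < 1:  # margin violated: step toward the example
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def predict(w, b, x):
    """Return +1 (assumed 'binding' class) or -1 (assumed 'non-binding' class)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical toy data: two artificial classes that differ in composition.
train_seqs = ["GCGCGCGCGC", "GGCCGGCCGC", "AUAUAUAUAU", "AAUUAAUUAU"]
train_labels = [1, 1, -1, -1]
X_train = [kmer_features(s) for s in train_seqs]
w, b = train_linear_svm(X_train, train_labels)
```

A real analysis would replace the toy features with the study's own descriptors and evaluate the classifier on held-out and cross-species data, as the abstract describes for mouse lncRNAs.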
Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. To obtain a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization of the likelihood. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence–absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.
Maximum likelihood; expectation-maximization; intron evolution; ancestral reconstruction; eukaryotic gene structure
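The general shape of expectation-maximization on presence–absence data can be sketched with a deliberately small toy model. It is not the model of the paper: it assumes a star phylogeny with a single hidden ancestral state per intron position, one loss probability and one gain probability shared by all leaves, and hypothetical names (`em_presence_absence`, `log_likelihood`). The E-step computes the posterior probability that the ancestor carried the intron; the M-step re-estimates the parameters from those posteriors.

```python
from math import log, prod

EPS = 1e-9  # keeps probabilities away from exactly 0 and 1

def _clamp(p):
    return min(max(p, EPS), 1 - EPS)

def _joint(pattern, pi, loss, gain):
    """Joint probability of a leaf pattern with ancestor present (p1) or absent (p0)."""
    p1 = pi * prod((1 - loss) if x else loss for x in pattern)
    p0 = (1 - pi) * prod(gain if x else (1 - gain) for x in pattern)
    return p1, p0

def log_likelihood(patterns, pi, loss, gain):
    """Log-likelihood of all patterns, summing over the hidden ancestral state."""
    return sum(log(sum(_joint(pat, pi, loss, gain))) for pat in patterns)

def em_presence_absence(patterns, n_iter=200, pi=0.5, loss=0.3, gain=0.2):
    """Toy EM. Each pattern is a tuple of 0/1 intron states, one per leaf species."""
    k = len(patterns[0])
    n = len(patterns)
    for _ in range(n_iter):
        # E-step: posterior probability that the ancestor had the intron.
        post = []
        for pat in patterns:
            p1, p0 = _joint(pat, pi, loss, gain)
            post.append(p1 / (p1 + p0))
        # M-step: expected counts give the updated parameters.
        w1 = sum(post)   # expected number of positions with ancestral intron
        w0 = n - w1      # expected number without
        lost = sum(r * (k - sum(pat)) for r, pat in zip(post, patterns))
        gained = sum((1 - r) * sum(pat) for r, pat in zip(post, patterns))
        pi = _clamp(w1 / n)
        loss = _clamp(lost / max(w1 * k, EPS))
        gain = _clamp(gained / max(w0 * k, EPS))
    return pi, loss, gain

# Hypothetical presence-absence map for four species.
patterns = ([(1, 1, 1, 1)] * 40 + [(1, 1, 1, 0)] * 20
            + [(0, 0, 0, 0)] * 30 + [(0, 0, 0, 1)] * 10)
pi_hat, loss_hat, gain_hat = em_presence_absence(patterns)
```

Because each iteration cannot decrease the likelihood, comparing `log_likelihood` before and after fitting is a simple sanity check. A realistic model would add a tree topology, branch-specific rates and rate variation across sites, which is where the parameter richness mentioned in the abstract comes from.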
It has been proposed that if some mRNA characteristics result in a low efficiency of the termination signal, an additional closely located stop codon (tandem stop codons) could be used to prevent harmful readthrough. However, the role of tandem terminators in higher eukaryotes has not been verified and remains hypothetical. In this work, the sequence features of Arabidopsis thaliana and Oryza sativa mRNAs were analyzed. It was found that plant mRNAs with a UGA terminator were characterized by a higher frequency of nonsense codons in the first triplet position of the 3′-UTR, which could result from weak natural selection for a “reserve” stop signal. Interestingly, the presence of tandem stop codons positively correlated with a specific amino acid composition in the C-terminal position of the encoded proteins. In particular, C-terminal glycine positively correlated with significantly higher frequencies of reserve terminators at the first positions of the 3′-UTR in UGA-containing mRNAs. This finding is consistent with some earlier observations concerning the role of glycine and its codons in inefficient termination of translation and recoding (e.g., 2A oligopeptide).
mRNA; Arabidopsis thaliana; Oryza sativa; stop codon; tandem terminators; readthrough
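The basic count behind such an analysis, how often the first in-frame triplet of the 3′-UTR is itself a stop codon, grouped by the main terminator, can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the authors' pipeline; the function name `reserve_stop_frequency` and the example transcripts are hypothetical.

```python
from collections import defaultdict

STOPS = {"UAA", "UAG", "UGA"}

def reserve_stop_frequency(transcripts):
    """Fraction of mRNAs whose first 3'-UTR triplet is a stop codon, per terminator.

    transcripts: iterable of (cds, utr3) strings, where cds ends with its stop codon.
    DNA input is accepted and converted to RNA.
    """
    counts = defaultdict(lambda: [0, 0])  # terminator -> [with reserve stop, total]
    for cds, utr3 in transcripts:
        cds = cds.upper().replace("T", "U")
        utr3 = utr3.upper().replace("T", "U")
        terminator = cds[-3:]
        # Skip records with a broken reading frame, no stop codon, or a too-short UTR.
        if len(cds) % 3 or terminator not in STOPS or len(utr3) < 3:
            continue
        counts[terminator][1] += 1
        if utr3[:3] in STOPS:
            counts[terminator][0] += 1
    return {t: hits / total for t, (hits, total) in counts.items()}

# Hypothetical toy input: two UGA-terminated mRNAs and one UAA-terminated mRNA.
example = [
    ("AUGGGAUGA", "UAAGCC"),  # tandem stop directly after UGA
    ("AUGGGAUGA", "GCCUAA"),  # no stop in the first UTR triplet
    ("AUGGCUUAA", "GCCGCC"),
]
frequencies = reserve_stop_frequency(example)
```

Grouping additionally by the C-terminal amino acid (the codon preceding the terminator) would extend the same counting scheme to the glycine comparison described in the abstract.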