Search tips
Search criteria

Results 1-10 (10)

Clipboard (0)
Year of Publication
1.  EREM: Parameter Estimation and Ancestral Reconstruction by Expectation-Maximization Algorithm for a Probabilistic Model of Genomic Binary Characters Evolution 
Advances in Bioinformatics  2010;2010:167408.
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
PMCID: PMC2866244  PMID: 20467467
2.  The Ecoresponsive Genome of Daphnia pulex 
Science (New York, N.Y.)  2011;331(6017):555-561.
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 Mb and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than 1/3 of Daphnia’s genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The co-expansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes – including many additional loci within sequenced regions that are otherwise devoid of annotations – are the most responsive genes to ecological challenges.
PMCID: PMC3529199  PMID: 21292972
3.  Dependencies among Editing Sites in Serotonin 2C Receptor mRNA 
PLoS Computational Biology  2012;8(9):e1002663.
The serotonin 2C receptor (5-HT2CR)–a key regulator of diverse neurological processes–exhibits functional variability derived from editing of its pre-mRNA by site-specific adenosine deamination (A-to-I pre-mRNA editing) in five distinct sites. Here we describe a statistical technique that was developed for analysis of the dependencies among the editing states of the five sites. The statistical significance of the observed correlations was estimated by comparing editing patterns in multiple individuals. For both human and rat 5-HT2CR, the editing states of the physically proximal sites A and B were found to be strongly dependent. In contrast, the editing states of sites C and D, which are also physically close, seem not to be directly dependent but instead are linked through the dependencies on sites A and B, respectively. We observed pronounced differences between the editing patterns in humans and rats: in humans site A is the key determinant of the editing state of the other sites, whereas in rats this role belongs to site B. The structure of the dependencies among the editing sites is notably simpler in rats than it is in humans implying more complex regulation of 5-HT2CR editing and, by inference, function in the human brain. Thus, exhaustive statistical analysis of the 5-HT2CR editing patterns indicates that the editing state of sites A and B is the primary determinant of the editing states of the other three sites, and hence the overall editing pattern. Taken together, these findings allow us to propose a mechanistic model of concerted action of ADAR1 and ADAR2 in 5-HT2CR editing. Statistical approach developed here can be applied to other cases of interdependencies among modification sites in RNA and proteins.
Author Summary
The serotonin receptor 2C is a key regulator of diverse neurological processes that affect feeding behavior, sleep, sexual behavior, anxiety and depression. The function of the receptor itself is regulated via so-called pre-mRNA editing, i.e. site-specific adenosine deamination in five distinct sites. The greater the number of edited sites in the serotonin receptor mRNA, the lower the activity of the receptor it encodes. Here we used the results of extensive massively parallel sequencing from human and rat brains to elucidate the dependencies among the editing states of the five sites. Despite the apparent simplicity of the problem, disambiguation of these dependencies is a difficult task that required development of a new statistical technique. We employed this method to analyse the dependencies among editing in the 5 susceptible sites of the receptor mRNA and found that the proximal, juxtaposed sites A and B are strongly interdependent, and that the editing state of these two sites is a major determinant of the editing states of the other three sites, and hence the overall editing pattern. The statistical approach we developed for the analysis of mRNA editing can be applied to other cases of multiple site modification in RNA and proteins.
PMCID: PMC3435259  PMID: 22969417
4.  A Maximum Likelihood Method for Reconstruction of the Evolution of Eukaryotic Gene Structure 
Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. In order to come up with a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence–absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.
PMCID: PMC3410445  PMID: 19381540
Maximum likelihood; expectation-maximization; intron evolution; ancestral reconstruction; eukaryotic gene structure
5.  Origin and evolution of spliceosomal introns 
Biology Direct  2012;7:11.
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
PMCID: PMC3488318  PMID: 22507701
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
6.  A Universal Nonmonotonic Relationship between Gene Compactness and Expression Levels in Multicellular Eukaryotes 
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximate bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression, whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genome design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular, in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing and for increased fidelity of transcription, splicing, and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least, in animals, coding and noncoding parts of genes show similar architectonic trends.
PMCID: PMC2817431  PMID: 20333206
eukaryotic gene structure; eukaryotic gene architecture; selection on gene compactness; genomic design; intron functionality; intron density
7.  Widespread Positive Selection in Synonymous Sites of Mammalian Genes 
Molecular biology and evolution  2007;24(8):1821-1831.
Evolution of protein sequences is largely governed by purifying selection, with a small fraction of proteins evolving under positive selection. The evolution at synonymous positions in protein-coding genes is not nearly as well understood, with the extent and types of selection remaining, largely, unclear. A statistical test to identify purifying and positive selection at synonymous sites in protein-coding genes was developed. The method compares the rate of evolution at synonymous sites (Ks) to that in intron sequences of the same gene after sampling the aligned intron sequences to mimic the statistical properties of coding sequences. We detected purifying selection at synonymous sites in ∼28% of the 1,562 analyzed orthologous genes from mouse and rat, and positive selection in ∼12% of the genes. Thus, the fraction of genes with readily detectable positive selection at synonymous sites is much greater than the fraction of genes with comparable positive selection at nonsynonymous sites, i.e., at the level of the protein sequence. Unlike other genes, the genes with positive selection at synonymous sites showed no correlation between Ks and the rate of evolution in nonsynonymous sites (Ka), indicating that evolution of synonymous sites under positive selection is decoupled from protein evolution. The genes with purifying selection at synonymous sites showed significant anticorrelation between Ks and expression level and breadth, indicating that highly expressed genes evolve slowly. The genes with positive selection at synonymous sites showed the opposite trend, i.e., highly expressed genes had, on average, higher Ks. For the genes with positive selection at synonymous sites, a significantly lower mRNA stability is predicted compared to the genes with negative selection. Thus, mRNA destabilization could be an important factor driving positive selection in nonsynonymous sites, probably, through regulation of expression at the level of mRNA degradation and, possibly, also translation rate. So, unexpectedly, we found that positive selection at synonymous sites of mammalian genes is substantially more common than positive selection at the level of protein sequences. Positive selection at synonymous sites might act through mRNA destabilization affecting mRNA levels and translation.
PMCID: PMC2632937  PMID: 17522087
synonymous sites; nonsynonymous sites; positive selection; purifying selection; introns
8.  Homoplasy in genome-wide analysis of rare amino acid replacements: the molecular-evolutionary basis for Vavilov's law of homologous series 
Biology Direct  2008;3:7.
Rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are becoming an increasingly important class of markers in genome-wide phylogenetic studies. Recently, we proposed a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions) that were inferred using genome-wide identification of amino acid replacements that were: i) located in unambiguously aligned regions of orthologous genes, ii) shared by two or more taxa in positions that contain a different, conserved amino acid in a much broader range of taxa, and iii) require two or three nucleotide substitutions. When applied to animal phylogeny, the RGC_CAM approach supported the coelomate clade that unites deuterostomes with arthropods as opposed to the ecdysozoan (molting animals) clade. However, a non-negligible level of homoplasy was detected.
We provide a direct estimate of the level of homoplasy caused by parallel changes and reversals among the RGC_CAMs using 462 alignments of orthologous genes from 19 eukaryotic species. It is shown that the impact of parallel changes and reversals on the results of phylogenetic inference using RGC_CAMs cannot explain the observed support for the Coelomata clade. In contrast, the evidence in support of the Ecdysozoa clade, in large part, can be attributed to parallel changes. It is demonstrated that parallel changes are significantly more common in internal branches of different subtrees that are separated from the respective common ancestor by relatively short times than in terminal branches separated by longer time intervals. A similar but much weaker trend was detected for reversals. The observed evolutionary trend of parallel changes is explained in terms of the covarion model of molecular evolution. As the overlap between the covarion sets in orthologous genes from different lineages decreases with time after divergence, the likelihood of parallel changes decreases as well.
The level of homoplasy observed here appears to be low enough to justify the utility of RGC_CAMs and other types of RGCs for resolution of hard problems in phylogeny. Parallel changes, one of the major classes of events leading to homoplasy, occur much more often in relatively recently diverged lineages than in those separated from their last common ancestor by longer time intervals of time. This pattern seems to provide the molecular-evolutionary underpinning of Vavilov's law of homologous series and is readily interpreted within the framework of the covarion model of molecular evolution.
This article was reviewed by Alex Kondrashov, Nicolas Galtier, and Maximilian Telford and Robert Lanfear (nominated by Laurence Hurst).
PMCID: PMC2292158  PMID: 18346278
9.  Patterns of intron gain and conservation in eukaryotic genes 
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs.
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
PMCID: PMC2151770  PMID: 17935625
10.  Unifying measures of gene function and evolution 
Recent genome analyses revealed intriguing correlations between variables characterizing the functioning of a gene, such as expression level (EL), connectivity of genetic and protein–protein interaction networks, and knockout effect, and variables describing gene evolution, such as sequence evolution rate (ER) and propensity for gene loss. Typically, variables within each of these classes are positively correlated, e.g. products of highly expressed genes also have a propensity to be involved in many protein–protein interactions, whereas variables between classes are negatively correlated, e.g. highly expressed genes, on average, evolve slower than weakly expressed genes. Here, we describe principal component (PC) analysis of seven genome-related variables and propose biological interpretations for the first three PCs. The first PC reflects a gene's ‘importance’, or the ‘status’ of a gene in the genomic community, with positive contributions from knockout lethality, EL, number of protein–protein interaction partners and the number of paralogues, and negative contributions from sequence ER and gene loss propensity. The next two PCs define a plane that seems to reflect the functional and evolutionary plasticity of a gene. Specifically, PC2 can be interpreted as a gene's ‘adaptability’ whereby genes with high adaptability readily duplicate, have many genetic interaction partners and tend to be non-essential. PC3 also might reflect the role of a gene in organismal adaptation albeit with a negative rather than a positive contribution of genetic interactions; we provisionally designate this PC ‘reactivity’. The interpretation of PC2 and PC3 as measures of a gene's plasticity is compatible with the observation that genes with high values of these PCs tend to be expressed in a condition- or tissue-specific manner. Functional classes of genes substantially vary in status, adaptability and reactivity, with the highest status characteristic of the translation system and cytoskeletal proteins, highest adaptability seen in cellular processes and signalling genes, and top reactivity characteristic of metabolic enzymes.
PMCID: PMC1560323  PMID: 16777745
gene expression; gene dispensability; protein–protein interaction; sequence evolution rate; gene loss; principal component analysis

Results 1-10 (10)