Search tips
Search criteria

Results 1-19 (19)

Clipboard (0)
Year of Publication
1.  Origin and evolution of spliceosomal introns 
Biology Direct  2012;7:11.
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
PMCID: PMC3488318  PMID: 22507701
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
2.  A late origin of the extant eukaryotic diversity: divergence time estimates using rare genomic changes 
Biology Direct  2011;6:26.
Accurate estimation of the divergence time of the extant eukaryotes is a fundamentally important but extremely difficult problem owing primarily to gross violations of the molecular clock at long evolutionary distances and the lack of appropriate calibration points close to the date of interest. These difficulties are intrinsic to the dating of ancient divergence events and are reflected in the large discrepancies between estimates obtained with different approaches. Estimates of the age of Last Eukaryotic Common Ancestor (LECA) vary approximately twofold, from ~1,100 million years ago (Mya) to ~2,300 Mya.
We applied the genome-wide analysis of rare genomic changes associated with conserved amino acids (RGC_CAs) and used several independent techniques to obtain date estimates for the divergence of the major lineages of eukaryotes with calibration intervals for insects, land plants and vertebrates. The results suggest an early divergence of monocot and dicot plants, approximately 340 Mya, raising the possibility of plant-insect coevolution. The divergence of bilaterian animal phyla is estimated at ~400-700 Mya, a range of dates that is consistent with cladogenesis immediately preceding the Cambrian explosion. The origin of opisthokonts (the supergroup of eukaryotes that includes metazoa and fungi) is estimated at ~700-1,000 Mya, and the age of LECA at ~1,000-1,300 Mya. We separately analyzed the red algal calibration interval which is based on single fossil. This analysis produced time estimates that were systematically older compared to the other estimates. Nevertheless, the majority of the estimates for the age of the LECA using the red algal data fell within the 1,200-1,400 Mya interval.
The inference of a "young LECA" is compatible with the latest of previously estimated dates and has substantial biological implications. If these estimates are valid, the approximately 1 to 1.4 billion years of evolution of eukaryotes that is open to comparative-genomic study probably was preceded by hundreds of millions years of evolution that might have included extinct diversity inaccessible to comparative approaches.
This article was reviewed by William Martin, Herve Philippe (nominated by I. King Jordan), and Romain Derelle.
PMCID: PMC3125394  PMID: 21595937
bilateria; opisthokonts; angiosperms; last eukaryotic common ancestor; molecular dating
3.  Evolution of alternative and constitutive regions of mammalian 5'UTRs 
BMC Genomics  2009;10:162.
Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized.
We evaluated the abundance, conservation and evolution of putative regulatory control elements, namely, upstream start codons (uAUGs) and open reading frames (uORFs), in the 5'UTRs of human and mouse genes impacted by alternative events. For genes with alternative 5'UTRs, the fraction of alternative sequences (those present in a subset of the transcripts) is much greater than that in the corresponding coding sequence, conceivably, because 5'UTRs are not bound by constraints on protein structure that limit AS in coding regions. Alternative regions of mammalian 5'UTRs evolve faster and are subject to a weaker purifying selection than constitutive portions. This relatively weak selection results in over-abundance of uAUGs and uORFs in the alternative regions of 5'UTRs compared to constitutive regions. Nevertheless, even in alternative regions, uORFs evolve under a stronger selection than the rest of the sequences, indicating that some of the uORFs are conserved regulatory elements; some of the non-conserved uORFs could be involved in species-specific regulation.
The findings on the evolution and selection in alternative and constitutive regions presented here are consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes are likely to contribute to the regulation of translation.
PMCID: PMC2674463  PMID: 19371439
4.  Evolution of DNA polymerases: an inactivated polymerase-exonuclease module in Pol ε and a chimeric origin of eukaryotic polymerases from two classes of archaeal ancestors 
Biology Direct  2009;4:11.
Evolution of DNA polymerases, the key enzymes of DNA replication and repair, is central to any reconstruction of the history of cellular life. However, the details of the evolutionary relationships between DNA polymerases of archaea and eukaryotes remain unresolved.
We performed a comparative analysis of archaeal, eukaryotic, and bacterial B-family DNA polymerases, which are the main replicative polymerases in archaea and eukaryotes, combined with an analysis of domain architectures. Surprisingly, we found that eukaryotic Polymerase ε consists of two tandem exonuclease-polymerase modules, the active N-terminal module and a C-terminal module in which both enzymatic domains are inactivated. The two modules are only distantly related to each other, an observation that suggests the possibility that Pol ε evolved as a result of insertion and subsequent inactivation of a distinct polymerase, possibly, of bacterial descent, upstream of the C-terminal Zn-fingers, rather than by tandem duplication. The presence of an inactivated exonuclease-polymerase module in Pol ε parallels a similar inactivation of both enzymatic domains in a distinct family of archaeal B-family polymerases. The results of phylogenetic analysis indicate that eukaryotic B-family polymerases, most likely, originate from two distantly related archaeal B-family polymerases, one form giving rise to Pol ε, and the other one to the common ancestor of Pol α, Pol δ, and Pol ζ. The C-terminal Zn-fingers that are present in all eukaryotic B-family polymerases, unexpectedly, are homologous to the Zn-finger of archaeal D-family DNA polymerases that are otherwise unrelated to the B family. The Zn-finger of Polε shows a markedly greater similarity to the counterpart in archaeal PolD than the Zn-fingers of other eukaryotic B-family polymerases.
Evolution of eukaryotic DNA polymerases seems to have involved previously unnoticed complex events. We hypothesize that the archaeal ancestor of eukaryotes encoded three DNA polymerases, namely, two distinct B-family polymerases and a D-family polymerase all of which contributed to the evolution of the eukaryotic replication machinery. The Zn-finger might have been acquired from PolD by the B-family form that gave rise to Pol ε prior to or in the course of eukaryogenesis, and subsequently, was captured by the ancestor of the other B-family eukaryotic polymerases. The inactivated polymerase-exonuclease module of Pol ε might have evolved by fusion with a distinct polymerase, rather than by duplication of the active module of Pol ε, and is likely to play an important role in the assembly of eukaryotic replication and repair complexes.
This article was reviewed by Patrick Forterre, Arcady Mushegian, and Chris Ponting. For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2669801  PMID: 19296856
5.  A highly conserved family of inactivated archaeal B family DNA polymerases 
Biology Direct  2008;3:32.
A widespread and highly conserved family of apparently inactivated derivatives of archaeal B-family DNA polymerases is described. Phylogenetic analysis shows that the inactivated forms comprise a distinct clade among archaeal B-family polymerases and that, within this clade, Euryarchaea and Crenarchaea are clearly separated from each other and from a small group of bacterial homologs. These findings are compatible with an ancient duplication of the DNA polymerase gene followed by inactivation and parallel loss in some of the lineages although contribution of horizontal gene transfer cannot be ruled out. The inactivated derivative of the archaeal DNA polymerase could form a complex with the active paralog and play a structural role in DNA replication.
This article was reviewed by Purificacion Lopez-Garcia and Chris Ponting. For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2527604  PMID: 18684330
6.  Accumulation of GC donor splice signals in mammals 
Biology Direct  2008;3:30.
The GT dinucleotide in the first two intron positions is the most conserved element of the U2 donor splice signals. However, in a small fraction of donor sites, GT is replaced by GC. A substantial enrichment of GC in donor sites of alternatively spliced genes has been observed previously in human, nematode and Arabidopsis, suggesting that GC signals are important for regulation of alternative splicing. We used parsimony analysis to reconstruct evolution of donor splice sites and inferred 298 GT > GC conversion events compared to 40 GC > GT conversion events in primate and rodent genomes. Thus, there was substantive accumulation of GC donor splice sites during the evolution of mammals. Accumulation of GC sites might have been driven by selection for alternative splicing.
This article was reviewed by Jerzy Jurka and Anton Nekrutenko. For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2490688  PMID: 18613975
7.  U12 intron positions are more strongly conserved between animals and plants than U2 intron positions 
Biology Direct  2008;3:19.
We report that the positions of minor, U12 introns are conserved in orthologous genes from human and Arabidopsis to an even greater extent than the positions of the major, U2 introns. The U12 introns, especially, conserved ones are concentrated in 5'-portions of plant and animal genes, where the U12 to U2 conversions occurs preferentially in the 3'-portions of genes. These results are compatible with the hypothesis that the high level of conservation of U12 intron positions and their persistence in genomes despite the unidirectional U12 to U2 conversion are explained by the role of the slowly excised U12 introns in down-regulation of gene expression.
This article was reviewed by John Logsdon and Manyuan Long. For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2426677  PMID: 18479526
8.  Homoplasy in genome-wide analysis of rare amino acid replacements: the molecular-evolutionary basis for Vavilov's law of homologous series 
Biology Direct  2008;3:7.
Rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are becoming an increasingly important class of markers in genome-wide phylogenetic studies. Recently, we proposed a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions) that were inferred using genome-wide identification of amino acid replacements that were: i) located in unambiguously aligned regions of orthologous genes, ii) shared by two or more taxa in positions that contain a different, conserved amino acid in a much broader range of taxa, and iii) require two or three nucleotide substitutions. When applied to animal phylogeny, the RGC_CAM approach supported the coelomate clade that unites deuterostomes with arthropods as opposed to the ecdysozoan (molting animals) clade. However, a non-negligible level of homoplasy was detected.
We provide a direct estimate of the level of homoplasy caused by parallel changes and reversals among the RGC_CAMs using 462 alignments of orthologous genes from 19 eukaryotic species. It is shown that the impact of parallel changes and reversals on the results of phylogenetic inference using RGC_CAMs cannot explain the observed support for the Coelomata clade. In contrast, the evidence in support of the Ecdysozoa clade, in large part, can be attributed to parallel changes. It is demonstrated that parallel changes are significantly more common in internal branches of different subtrees that are separated from the respective common ancestor by relatively short times than in terminal branches separated by longer time intervals. A similar but much weaker trend was detected for reversals. The observed evolutionary trend of parallel changes is explained in terms of the covarion model of molecular evolution. As the overlap between the covarion sets in orthologous genes from different lineages decreases with time after divergence, the likelihood of parallel changes decreases as well.
The level of homoplasy observed here appears to be low enough to justify the utility of RGC_CAMs and other types of RGCs for resolution of hard problems in phylogeny. Parallel changes, one of the major classes of events leading to homoplasy, occur much more often in relatively recently diverged lineages than in those separated from their last common ancestor by longer time intervals of time. This pattern seems to provide the molecular-evolutionary underpinning of Vavilov's law of homologous series and is readily interpreted within the framework of the covarion model of molecular evolution.
This article was reviewed by Alex Kondrashov, Nicolas Galtier, and Maximilian Telford and Robert Lanfear (nominated by Laurence Hurst).
PMCID: PMC2292158  PMID: 18346278
9.  Patterns of intron gain and conservation in eukaryotic genes 
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs.
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
PMCID: PMC2151770  PMID: 17935625
10.  Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns 
BMC Genomics  2006;7:311.
The signals that determine the specificity and efficiency of splicing are multiple and complex, and are not fully understood. Among other factors, the relative contributions of different mechanisms appear to depend on intron size inasmuch as long introns might hinder the activity of the spliceosome through interference with the proper positioning of the intron-exon junctions. Indeed, it has been shown that the information content of splice sites positively correlates with intron length in the nematode, Drosophila, and fungi. We explored the connections between the length of vertebrate introns, the strength of splice sites, exonic splicing signals, and evolution of flanking exons.
A compensatory relationship is shown to exist between different types of signals, namely, the splice sites and the exonic splicing enhancers (ESEs). In the range of relatively short introns (approximately, < 1.5 kilobases in length), the enhancement of the splicing signals for longer introns was manifest in the increased concentration of ESEs. In contrast, for longer introns, this effect was not detectable, and instead, an increase in the strength of the donor and acceptor splice sites was observed. Conceivably, accumulation of A-rich ESE motifs beyond a certain limit is incompatible with functional constraints operating at the level of protein sequence evolution, which leads to compensation in the form of evolution of the splice sites themselves toward greater strength. In addition, however, a correlation between sequence conservation in the exon ends and intron length, particularly, in synonymous positions, was observed throughout the entire length range of introns. Thus, splicing signals other than the currently defined ESEs, i.e., potential new classes of ESEs, might exist in exon sequences, particularly, those that flank long introns.
Several weak but statistically significant correlations were observed between vertebrate intron length, splice site strength, and potential exonic splicing signals. Taken together, these findings attest to a compensatory relationship between splice sites and exonic splicing signals, depending on intron length.
PMCID: PMC1713244  PMID: 17156453
11.  Signs of positive selection of somatic mutations in human cancers detected by EST sequence analysis 
BMC Cancer  2006;6:36.
Carcinogenesis typically involves multiple somatic mutations in caretaker (DNA repair) and gatekeeper (tumor suppressors and oncogenes) genes. Analysis of mutation spectra of the tumor suppressor that is most commonly mutated in human cancers, p53, unexpectedly suggested that somatic evolution of the p53 gene during tumorigenesis is dominated by positive selection for gain of function. This conclusion is supported by accumulating experimental evidence of evolution of new functions of p53 in tumors. These findings prompted a genome-wide analysis of possible positive selection during tumor evolution.
A comprehensive analysis of probable somatic mutations in the sequences of Expressed Sequence Tags (ESTs) from malignant tumors and normal tissues was performed in order to access the prevalence of positive selection in cancer evolution. For each EST, the numbers of synonymous and non-synonymous substitutions were calculated. In order to identify genes with a signature of positive selection in cancers, these numbers were compared to: i) expected numbers and ii) the numbers for the respective genes in the ESTs from normal tissues.
We identified 112 genes with a signature of positive selection in cancers, i.e., a significantly elevated ratio of non-synonymous to synonymous substitutions, in tumors as compared to 37 such genes in an approximately equal-sized EST collection from normal tissues. A substantial fraction of the tumor-specific positive-selection candidates have experimentally demonstrated or strongly predicted links to cancer.
The results of EST analysis should be interpreted with extreme caution given the noise introduced by sequencing errors and undetected polymorphisms. Furthermore, an inherent limitation of EST analysis is that multiple mutations amenable to statistical analysis can be detected only in relatively highly expressed genes. Nevertheless, the present results suggest that positive selection might affect a substantial number of genes during tumorigenic somatic evolution.
PMCID: PMC1431556  PMID: 16469093
12.  Mutational hotspots in the TP53 gene and, possibly, other tumor suppressors evolve by positive selection 
Biology Direct  2006;1:4.
The mutation spectra of the TP53 gene and other tumor suppressors contain multiple hotspots, i.e., sites of non-random, frequent mutation in tumors and/or the germline. The origin of the hotspots remains unclear, the general view being that they represent highly mutable nucleotide contexts which likely reflect effects of different endogenous and exogenous factors shaping the mutation process in specific tissues. The origin of hotspots is of major importance because it has been suggested that mutable contexts could be used to infer mechanisms of mutagenesis contributing to tumorigenesis.
Here we apply three independent tests, accounting for non-uniform base compositions in synonymous and non-synonymous sites, to test whether the hotspots emerge via selection or due to mutational bias. All three tests consistently indicate that the hotspots in the TP53 gene evolve, primarily, via positive selection. The results were robust to the elimination of the highly mutable CpG dinucleotides. By contrast, only one, the least conservative test reveals the signature of positive selection in BRCA1, BRCA2, and p16. Elucidation of the origin of the hotspots in these genes requires more data on somatic mutations in tumors.
The results of this analysis seem to indicate that positive selection for gain-of-function in tumor suppressor genes is an important aspect of tumorigenesis, blurring the distinction between tumor suppressors and oncogenes.
This article was reviewed by Sandor Pongor, Christopher Lee and Mikhail Blagosklonny.
PMCID: PMC1403748  PMID: 16542006
13.  A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes 
Genome Biology  2004;5(2):R7.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.
Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.
The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.
PMCID: PMC395751  PMID: 14759257
14.  Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ 
Genome Biology  2003;4(9):R55.
Comparative genomics and phylogenetic analysis have been used to examine horizontal transfer of entire operons versus displacement of individual genes within operons by horizontally acquired orthologs and independent assembly of the same or similar operons from genes with different phylogenetic affinities.
Shuffling and disruption of operons and horizontal gene transfer are major contributions to the new, dynamic view of prokaryotic evolution. Under the 'selfish operon' hypothesis, operons are viewed as mobile genetic entities that are constantly disseminated via horizontal gene transfer, although their retention could be favored by the advantage of coregulation of functionally linked genes. Here we apply comparative genomics and phylogenetic analysis to examine horizontal transfer of entire operons versus displacement of individual genes within operons by horizontally acquired orthologs and independent assembly of the same or similar operons from genes with different phylogenetic affinities.
Since a substantial number of operons have been identified experimentally in only a few model bacteria, evolutionarily conserved gene strings were analyzed as surrogates of operons. The phylogenetic affinities within these predicted operons were assessed first by sequence similarity analysis and then by phylogenetic analysis, including statistical tests of tree topology. Numerous cases of apparent horizontal transfer of entire operons were detected. However, it was shown that apparent horizontal transfer of individual genes or arrays of genes within operons is not uncommon either and results in xenologous gene displacement in situ, that is, displacement of an ancestral gene by a horizontally transferred ortholog from a taxonomically distant organism without change of the local gene organization. On rarer occasions, operons might have evolved via independent assembly, in part from horizontally acquired genes.
The discovery of in situ gene displacement shows that combination of rampant horizontal gene transfer with selection for preservation of operon structure provides for events in prokaryotic evolution that, a priori, seem improbable. These findings also emphasize that not all aspects of operon evolution are selfish, with operon integrity maintained by purifying selection at the organism level.
PMCID: PMC193655  PMID: 12952534
15.  Getting positive about selection 
Genome Biology  2003;4(8):331.
A report on the 68th Symposium on Quantitative Biology, 'The Genome of Homo Sapiens', Cold Spring Harbor, USA, 28 May-2 June 2003.
A report on the 68th Symposium on Quantitative Biology, The Genome of Homo Sapiens', Cold Spring Harbor, USA, 28 May-2 June 2003.
PMCID: PMC193638  PMID: 12914654
16.  The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers 
Genome Biology  2003;4(3):R19.
The near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer.
The rhomboid family of polytopic membrane proteins shows a level of evolutionary conservation unique among membrane proteins. They are present in nearly all the sequenced genomes of archaea, bacteria and eukaryotes, with the exception of several species with small genomes. On the basis of experimental studies with the developmental regulator rhomboid from Drosophila and the AarA protein from the bacterium Providencia stuartii, the rhomboids are thought to be intramembrane serine proteases whose signaling function is conserved in eukaryotes and prokaryotes.
Phylogenetic tree analysis carried out using several independent methods for tree constructions and the corresponding statistical tests suggests that, despite its broad distribution in all three superkingdoms, the rhomboid family was not present in the last universal common ancestor of extant life forms. Instead, we propose that rhomboids evolved in bacteria and have been acquired by archaea and eukaryotes through several independent horizontal gene transfers. In eukaryotes, two distinct, ancient acquisitions apparently gave rise to the two major subfamilies, typified by rhomboid and PARL (presenilins-associated rhomboid-like protein), respectively. Subsequent evolution of the rhomboid family in eukaryotes proceeded by multiple duplications and functional diversification through the addition of extra transmembrane helices and other domains in different orientations relative to the conserved core that harbors the protease activity.
Although the near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer. This emphasizes the importance of explicit phylogenetic analysis for the reconstruction of ancestral life forms. A hypothetical scenario for the origin of intracellular membrane proteases from membrane transporters is proposed.
PMCID: PMC153459  PMID: 12620104
17.  Selection in the evolution of gene duplications 
Genome Biology  2002;3(2):research0008.1-research0008.9.
Gene duplications have a major role in the evolution of new biological functions. Theoretical studies often assume that a duplication per se is selectively neutral and that, following a duplication, one of the gene copies is freed from purifying (stabilizing) selection, which creates the potential for evolution of a new function.
In search of systematic evidence of accelerated evolution after duplication, we used data from 26 bacterial, six archaeal, and seven eukaryotic genomes to compare the mode and strength of selection acting on recently duplicated genes (paralogs) and on similarly diverged, unduplicated orthologous genes in different species. We find that the ratio of nonsynonymous to synonymous substitutions (Kn/Ks) in most paralogous pairs is <<1 and that paralogs typically evolve at similar rates, without significant asymmetry, indicating that both paralogs produced by a duplication are subject to purifying selection. This selection is, however, substantially weaker than the purifying selection affecting unduplicated orthologs that have diverged to the same extent as the analyzed paralogs. Most of the recently duplicated genes appear to be involved in various forms of environmental response; in particular, many of them encode membrane and secreted proteins.
The results of this analysis indicate that recently duplicated paralogs evolve faster than orthologs with the same level of divergence and similar functions, but apparently do not experience a phase of neutral evolution. We hypothesize that gene duplications that persist in an evolving lineage are beneficial from the time of their origin, due primarily to a protein dosage effect in response to variable environmental conditions; duplications are likely to give rise to new functions at a later phase of their evolution once a higher level of divergence is reached.
PMCID: PMC65685  PMID: 11864370
18.  Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins 
Genome Biology  2001;2(12):research0053.1-research0053.9.
Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviates from this pattern and instead shows variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment.
Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.
PMCID: PMC64838  PMID: 11790256
19.  Genome trees constructed using five different approaches suggest new major bacterial clades 
The availability of multiple complete genome sequences from diverse taxa prompts the development of new phylogenetic approaches, which attempt to incorporate information derived from comparative analysis of complete gene sets or large subsets thereof. Such attempts are particularly relevant because of the major role of horizontal gene transfer and lineage-specific gene loss, at least in the evolution of prokaryotes.
Five largely independent approaches were employed to construct trees for completely sequenced bacterial and archaeal genomes: i) presence-absence of genomes in clusters of orthologous genes; ii) conservation of local gene order (gene pairs) among prokaryotic genomes; iii) parameters of identity distribution for probable orthologs; iv) analysis of concatenated alignments of ribosomal proteins; v) comparison of trees constructed for multiple protein families. All constructed trees support the separation of the two primary prokaryotic domains, bacteria and archaea, as well as some terminal bifurcations within the bacterial and archaeal domains. Beyond these obvious groupings, the trees made with different methods appeared to differ substantially in terms of the relative contributions of phylogenetic relationships and similarities in gene repertoires caused by similar life styles and horizontal gene transfer to the tree topology. The trees based on presence-absence of genomes in orthologous clusters and the trees based on conserved gene pairs appear to be strongly affected by gene loss and horizontal gene transfer. The trees based on identity distributions for orthologs and particularly the tree made of concatenated ribosomal protein sequences seemed to carry a stronger phylogenetic signal. The latter tree supported three potential high-level bacterial clades,: i) Chlamydia-Spirochetes, ii) Thermotogales-Aquificales (bacterial hyperthermophiles), and ii) Actinomycetes-Deinococcales-Cyanobacteria. The latter group also appeared to join the low-GC Gram-positive bacteria at a deeper tree node. These new groupings of bacteria were supported by the analysis of alternative topologies in the concatenated ribosomal protein tree using the Kishino-Hasegawa test and by a census of the topologies of 132 individual groups of orthologous proteins. Additionally, the results of this analysis put into question the sister-group relationship between the two major archaeal groups, Euryarchaeota and Crenarchaeota, and suggest instead that Euryarchaeota might be a paraphyletic group with respect to Crenarchaeota.
We conclude that, the extensive horizontal gene flow and lineage-specific gene loss notwithstanding, extension of phylogenetic analysis to the genome scale has the potential of uncovering deep evolutionary relationships between prokaryotic lineages.
PMCID: PMC60490  PMID: 11734060

Results 1-19 (19)