
Results 1-4 (4)

1.  Estimation of prokaryotic supergenome size and composition from gene frequency distributions 
BMC Genomics  2014;15(Suppl 6):S14.
Because prokaryotic genomes experience a rapid flux of genes, selection may act at a higher level than an individual genome. We explore a quantitative model of the distributed genome whereby groups of genomes evolve by acquiring genes from a fixed reservoir, which we call the supergenome. Previous attempts to understand the nature of the supergenome treated genomes as random, independent collections of genes and assumed that the supergenome consists of a small number of homogeneous sub-reservoirs. Here we explore the consequences of relaxing both assumptions.
We surveyed several methods for estimating the size and composition of the supergenome. The methods assumed that genomes were either random, independent samples of the supergenome or that they evolved from a common ancestor along a known tree via stochastic sampling from the reservoir. The reservoir was assumed to be either a collection of homogeneous sub-reservoirs or alternatively composed of genes with Gamma distributed gain probabilities. Empirical gene frequencies were used to either compute the likelihood of the data directly or first to reconstruct the history of gene gains and then compute the likelihood of the reconstructed numbers of gains.
Supergenome size estimates using the empirical gene frequencies directly are not robust with respect to the choice of the model. By contrast, using the gene frequencies and the phylogenetic tree to reconstruct multiple gene gains produces reliable estimates of the supergenome size and indicates that a homogeneous supergenome is more consistent with the data than a supergenome with Gamma distributed gain probabilities.
PMCID: PMC4240607  PMID: 25572821
supergenome; genome evolution; gene frequency distribution; ancestral reconstruction
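To make the simplest of the models described in this abstract concrete, the sketch below computes the expected gene frequency spectrum when genomes are random, independent samples from a single homogeneous reservoir. It is an illustration under stated assumptions, not the authors' implementation: the function name `expected_spectrum` and the parameters `reservoir_size`, `p` and `n_genomes` are hypothetical, and every gene is assumed to be present in any given genome with the same probability `p`.

```python
import math


def expected_spectrum(reservoir_size, p, n_genomes):
    """Expected number of genes found in exactly k of n_genomes genomes.

    Assumption (illustrative only): every one of the reservoir_size genes
    is present in each genome independently with the same probability p,
    so the count of genomes carrying a gene is binomial.
    """
    return [
        reservoir_size
        * math.comb(n_genomes, k)
        * p ** k
        * (1 - p) ** (n_genomes - k)
        for k in range(n_genomes + 1)
    ]


# Entry k=0 counts genes that are in the reservoir but in none of the
# sampled genomes; these are never observed directly.
spectrum = expected_spectrum(1000, 0.5, 2)
print(spectrum)
```

The `k = 0` entry is the unobserved part of the reservoir, which is why supergenome size has to be inferred from the shape of the observed frequencies rather than counted.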
2.  Evolution of alternative and constitutive regions of mammalian 5'UTRs 
BMC Genomics  2009;10:162.
Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized.
We evaluated the abundance, conservation and evolution of putative regulatory control elements, namely, upstream start codons (uAUGs) and open reading frames (uORFs), in the 5'UTRs of human and mouse genes impacted by alternative events. For genes with alternative 5'UTRs, the fraction of alternative sequences (those present in a subset of the transcripts) is much greater than that in the corresponding coding sequence, conceivably, because 5'UTRs are not bound by constraints on protein structure that limit AS in coding regions. Alternative regions of mammalian 5'UTRs evolve faster and are subject to a weaker purifying selection than constitutive portions. This relatively weak selection results in over-abundance of uAUGs and uORFs in the alternative regions of 5'UTRs compared to constitutive regions. Nevertheless, even in alternative regions, uORFs evolve under a stronger selection than the rest of the sequences, indicating that some of the uORFs are conserved regulatory elements; some of the non-conserved uORFs could be involved in species-specific regulation.
The findings on the evolution and selection in alternative and constitutive regions presented here are consistent with the hypothesis that alternative events, namely, AS and ATI, in 5'UTRs of mammalian genes are likely to contribute to the regulation of translation.
PMCID: PMC2674463  PMID: 19371439
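The regulatory elements examined in this abstract, uAUGs and uORFs, can be located with a simple sequence scan. The following is a minimal sketch, assuming a uORF is defined as an ATG in the 5'UTR followed by an in-frame stop codon that also lies within the 5'UTR; the function names `find_uaugs` and `find_uorfs` are hypothetical, and the study itself may have used different criteria (for example, uORFs that overlap the main coding sequence are not reported here).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def _normalize(utr):
    """Upper-case the sequence and convert RNA (U) to DNA (T) letters."""
    return utr.upper().replace("U", "T")


def find_uaugs(utr):
    """Return 0-based positions of every ATG in the 5'UTR sequence."""
    utr = _normalize(utr)
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"]


def find_uorfs(utr):
    """Return (start, end) pairs for uAUGs with an in-frame stop in the UTR.

    start is the 0-based position of the A in ATG; end is exclusive and
    includes the stop codon. uAUGs without an in-frame stop inside the
    UTR are skipped (assumption: only fully contained uORFs count).
    """
    utr = _normalize(utr)
    uorfs = []
    for start in find_uaugs(utr):
        # Walk codon by codon downstream of the uAUG.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


example_utr = "GGATGAAATAGCCATGCC"
print(find_uaugs(example_utr))
print(find_uorfs(example_utr))
```

In the example sequence, the first ATG is followed by an in-frame TAG and is reported as a uORF, while the second ATG has no in-frame stop before the end of the UTR and is reported only as a uAUG.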
3.  Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns 
BMC Genomics  2006;7:311.
The signals that determine the specificity and efficiency of splicing are multiple and complex, and are not fully understood. Among other factors, the relative contributions of different mechanisms appear to depend on intron size inasmuch as long introns might hinder the activity of the spliceosome through interference with the proper positioning of the intron-exon junctions. Indeed, it has been shown that the information content of splice sites positively correlates with intron length in the nematode, Drosophila, and fungi. We explored the connections between the length of vertebrate introns, the strength of splice sites, exonic splicing signals, and evolution of flanking exons.
A compensatory relationship is shown to exist between different types of signals, namely, the splice sites and the exonic splicing enhancers (ESEs). In the range of relatively short introns (approximately <1.5 kilobases in length), the enhancement of the splicing signals for longer introns was manifest in the increased concentration of ESEs. In contrast, for longer introns, this effect was not detectable, and instead, an increase in the strength of the donor and acceptor splice sites was observed. Conceivably, accumulation of A-rich ESE motifs beyond a certain limit is incompatible with functional constraints operating at the level of protein sequence evolution, which leads to compensation in the form of evolution of the splice sites themselves toward greater strength. In addition, however, a correlation between sequence conservation in the exon ends and intron length, particularly in synonymous positions, was observed throughout the entire length range of introns. Thus, splicing signals other than the currently defined ESEs, i.e., potential new classes of ESEs, might exist in exon sequences, particularly those that flank long introns.
Several weak but statistically significant correlations were observed between vertebrate intron length, splice site strength, and potential exonic splicing signals. Taken together, these findings attest to a compensatory relationship between splice sites and exonic splicing signals, depending on intron length.
PMCID: PMC1713244  PMID: 17156453
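The abstract above refers to the information content of splice sites as a measure of their strength. As a rough illustration of that quantity, the sketch below sums, over the columns of a set of aligned sites, the reduction in Shannon entropy relative to a uniform background of four nucleotides (2 bits per position). The function name `information_content` is hypothetical, and the uniform background and the absence of a small-sample correction are simplifying assumptions; this is not necessarily the procedure used in the study.

```python
import math
from collections import Counter


def information_content(sites):
    """Total information content (bits) of equal-length aligned sites.

    Assumptions (illustrative only): uniform background frequencies,
    so each column contributes 2 - H bits, where H is the Shannon
    entropy of the observed nucleotides; no small-sample correction.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")

    total = 0.0
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - entropy
    return total


# A perfectly conserved dinucleotide carries 2 bits per position.
print(information_content(["GT", "GT", "GT", "GT"]))
```

Under this measure, a fully conserved position contributes 2 bits and a position where all four nucleotides are equally frequent contributes 0 bits, so stronger (more conserved) splice sites score higher.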
4.  Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52 
BMC Genomics  2002;3:8.
The DNA single-strand annealing proteins (SSAPs), such as RecT, Redβ, ERF and Rad52, function in RecA-dependent and RecA-independent DNA recombination pathways. Recently, they have been shown to form similar helical quaternary superstructures. However, despite the functional similarities between these diverse SSAPs, their actual evolutionary affinities are poorly understood.
Using sensitive computational sequence analysis, we show that the RecT and Redβ proteins, along with several other bacterial proteins, form a distinct superfamily. The ERF and Rad52 families show no direct evolutionary relationship to these proteins and define novel superfamilies of their own. We identify several previously unknown members of each of these superfamilies and also report, for the first time, bacterial and viral homologs of Rad52. Additionally, we predict the presence of aberrant HhH modules in RAD52 that are likely to be involved in DNA-binding. Using the contextual information obtained from the analysis of gene neighborhoods, we provide evidence of the interaction of the bacterial members of each of these SSAP superfamilies with a similar set of DNA repair/recombination proteins. These include different nucleases or Holliday junction resolvases, the ABC ATPase SbcC and the single-strand-binding protein. We also present evidence of independent assembly of some of the predicted operons encoding SSAPs and in situ displacement of functionally similar genes.
There are three evolutionarily distinct superfamilies of SSAPs, namely the RecT/Redβ, ERF, and RAD52, that have different sequence conservation patterns and predicted folds. All these SSAPs appear to be primarily of bacteriophage origin and have been acquired by numerous phylogenetically distant cellular genomes. They generally occur in predicted operons encoding one or more of a set of conserved DNA recombination proteins that appear to be the principal functional partners of the SSAPs.
PMCID: PMC101383  PMID: 11914131
