1.  A DNA topoisomerase IB in Thaumarchaeota testifies for the presence of this enzyme in the last common ancestor of Archaea and Eucarya 
Biology Direct  2008;3:54.
DNA topoisomerase IB (TopoIB) was long thought to be a eukaryote-specific enzyme. A shorter version was then found in viruses and later in several bacteria, but not in archaea. Here, we show that a eukaryotic-like TopoIB is present in the recently sequenced genomes of two archaea of the newly proposed phylum Thaumarchaeota. Phylogenetic analyses suggest that a TopoIB was present in the last common ancestor of Archaea and Eucarya. This finding indicates that the last common ancestor of Archaea and Eucarya may have harboured a DNA genome.
This article was reviewed by Eugene Koonin and Anthony Poole.
PMCID: PMC2621148  PMID: 19105819
2.  Evidence from glycine transfer RNA of a frozen accident at the dawn of the genetic code 
Biology Direct  2008;3:53.
Transfer RNA (tRNA) is the means by which the cell translates DNA sequence into protein according to the rules of the genetic code. A credible proposition is that tRNA was formed from the duplication of an RNA hairpin half the length of the contemporary tRNA molecule, with the point at which the hairpins were joined marked by the canonical intron insertion position found today within tRNA genes. If these hairpins possessed a 3'-CCA terminus with different combinations of stem nucleotides (the ancestral operational RNA code), specific aminoacylation and perhaps participation in some form of noncoded protein synthesis might have occurred. However, the identity of the first tRNA and the initial steps in the origin of the genetic code remain elusive.
Here we show evidence that glycine tRNA was the first tRNA, as revealed by a vestigial imprint in the anticodon loop sequences of contemporary descendants. This provides a plausible mechanism for the missing first step in the origin of the genetic code. In 448 of 466 glycine tRNA gene sequences from bacteria, archaea and eukaryote cytoplasm analyzed, CCA occurs immediately upstream of the canonical intron insertion position, suggesting the first anticodon (NCC for glycine) has been captured from the 3'-terminal CCA of one of the interacting hairpins as a result of an ancestral ligation.
That this imprint (including the second and third nucleotides of the glycine tRNA anticodon) has been retained through billions of years of evolution suggests Crick's 'frozen accident' hypothesis has validity for at least this very first step at the dawn of the genetic code.
This article was reviewed by Dr Eugene V. Koonin, Dr Rob Knight and Dr David H Ardell.
PMCID: PMC2630981  PMID: 19091122
3.  Pitfalls of the most commonly used models of context dependent substitution 
Biology Direct  2008;3:52.
Neighboring nucleotides exert a striking influence on mutation, with the hypermutability of CpG dinucleotides in many genomes being an exemplar. Among the approaches employed to measure the relative importance of sequence neighbors on molecular evolution have been continuous-time Markov process models for substitutions that treat sequences as a series of independent tuples. The most widely used examples are the codon substitution models. We evaluated the suitability of derivatives of the nucleotide frequency weighted (hereafter NF) and tuple frequency weighted (hereafter TF) models for measuring sequence context dependent substitution. Critical properties we address are their relationships to an independent nucleotide process and the robustness of parameter estimation to changes in sequence composition. We then consider the impact on inference concerning dinucleotide substitution processes from application of these two forms to intron sequence alignments from primates.
We prove that the NF form always nests the independent nucleotide process and that this is not true for the TF form. As a consequence, using TF to study context effects can be misleading, which is shown by both theoretical calculations and simulations. We describe a simple example where a context parameter estimated under TF is confounded with composition terms unless all sequence states are equi-frequent. We illustrate this for the dinucleotide case by simulation under a nucleotide model, showing that the TF form identifies a CpG effect when none exists. Our analysis of primate introns revealed that the effect of nucleotide neighbors is over-estimated under TF compared with NF. Parameter estimates for a number of contexts are also strikingly discordant between the two model forms.
Our results establish that the NF form should be used for analysis of independent-tuple context dependent processes. Although neighboring effects in general are still important, prominent influences such as the elevated CpG transversion rate previously identified using the TF form are an artifact. Our results further suggest as few as 5 parameters may account for ~85% of neighboring nucleotide influence.
This article was reviewed by Dr Rob Knight, Dr Josh Cherry (nominated by Dr David Lipman) and Dr Stephen Altschul (nominated by Dr David Lipman).
PMCID: PMC2628887  PMID: 19087239
4.  The evolution of domain-content in bacterial genomes 
Biology Direct  2008;3:51.
Across all sequenced bacterial genomes, the number of domains $n_c$ in different functional categories $c$ scales as a power-law in the total number of domains $n$, i.e. $n_c \propto n^{\alpha_c}$, with exponents $\alpha_c$ that vary across functional categories. Here we investigate the implications of these scaling laws for the evolution of domain-content in bacterial genomes and derive the simplest evolutionary model consistent with these scaling laws.
We show that, using only an assumption of time invariance, the scaling laws uniquely determine the relative rates of domain additions and deletions across all functional categories and evolutionary lineages. In particular, the model predicts that the rate of additions and deletions of domains of category $c$ is proportional to the number of domains $n_c$ currently in the genome and we discuss the implications of this observation for the role of horizontal transfer in genome evolution. Second, in addition to being proportional to $n_c$, the rate of additions and deletions of domains of category $c$ is proportional to a category-dependent constant $\rho_c$, which is the same for all evolutionary lineages. This 'evolutionary potential' $\rho_c$ represents the relative probability for additions/deletions of domains of category $c$ to be fixed in the population by selection and is predicted to equal the scaling exponent $\alpha_c$. By comparing the domain content of 93 pairs of closely-related genomes from all over the phylogenetic tree of bacteria, we demonstrate that the model's predictions are supported by available genome-sequence data.
Our results establish a direct quantitative connection between the scaling of domain numbers with genome size, and the rate of addition and deletions of domains during short evolutionary time intervals.
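As an illustration of the scaling relation described in this abstract, the exponent of a power law can be estimated as the slope of a straight-line fit on log-log axes. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function name `fit_scaling_exponent` and the genome sizes and domain counts are invented for the example.

```python
import math

def fit_scaling_exponent(total_domains, category_domains):
    """Estimate the exponent as the least-squares slope of log(n_c) against log(n)."""
    xs = [math.log(n) for n in total_domains]
    ys = [math.log(nc) for nc in category_domains]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic data (hypothetical): counts generated with an exact exponent of 2.
totals = [1000, 2000, 4000, 8000]
counts = [0.001 * n ** 2 for n in totals]
alpha = fit_scaling_exponent(totals, counts)
print(round(alpha, 6))
```

On real genome data the points scatter around the line, so the fitted slope is only an estimate of the exponent for that category.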
This article was reviewed by Eugene V. Koonin, Martijn A. Huynen, and Sergei Maslov.
PMCID: PMC2615428  PMID: 19077245
5.  Are we degenerate tetraploids? More genomes, new facts 
Biology Direct  2008;3:50.
Within the bilaterians, the appearance and evolution of vertebrates is accompanied by enormous changes in anatomical, morphological and developmental features. This evolution of increased complexity has been associated with two genome duplications (2R hypothesis) at the origin of vertebrates. However, in spite of extensive debate, the validity of the 2R hypothesis remains controversial. The paucity of sequence data in the early years of the genomic era was an intrinsic obstacle in tracking the genome evolutionary history of chordates.
In this article I review the 2R hypothesis by taking into account the recent availability of genomic sequence data for an expanding range of animals. I argue here that the genetic architecture of lower metazoans and representatives of major vertebrate and invertebrate lineages provides no support for the hypothesis relating the origin of vertebrates with widespread gene or genome duplications.
It appears that much of the genomic complexity of modern vertebrates is very ancient, likely predating the origin of chordates or even the Bilaterian-Nonbilaterian divergence. The origin and evolution of vertebrates is partly accompanied by an increase in gene number. However, we can take this subtle increase in gene number neither as the only causative factor for the evolution of phenotypic complexity in modern vertebrates nor as a reflection of polyploidization events early in their history.
This article was reviewed by Eugene Koonin, Joshua Cherry (nominated by David Lipman), and Jerzy Jurka.
PMCID: PMC2615429  PMID: 19077184
6.  Activating and inhibiting connections in biological network dynamics 
Biology Direct  2008;3:49.
Many studies of biochemical networks have analyzed network topology. Such work has suggested that specific types of network wiring may increase network robustness and therefore confer a selective advantage. However, knowledge of network topology does not allow one to predict network dynamical behavior – for example, whether deleting a protein from a signaling network would maintain the network's dynamical behavior, or induce oscillations or chaos.
Here we report that the balance between activating and inhibiting connections is important in determining whether network dynamics reach steady state or oscillate. We use a simple dynamical model of a network of interacting genes or proteins. Using the model, we study random networks, networks selected for robust dynamics, and examples of biological network topologies. The fraction of activating connections influences whether the network dynamics reach steady state or oscillate.
The activating fraction may predispose a network to oscillate or reach steady state, and neutral evolution or selection of this parameter may affect the behavior of biological networks. This principle may unify the dynamics of a wide range of cellular networks.
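The abstract does not specify the dynamical model, so the following is only a stand-in sketch: a synchronous threshold network in which each node switches on when its summed signed input is non-negative. All names (`classify_dynamics`, `activating_fraction`) and the update rule are assumptions made for illustration. The sketch shows how a trajectory can be classified as reaching steady state or oscillating.

```python
def activating_fraction(weights):
    """Fraction of non-zero connections that are activating (+1)."""
    links = [w for row in weights for w in row if w != 0]
    return sum(1 for w in links if w > 0) / len(links)

def classify_dynamics(weights, state, max_steps=1000):
    """Iterate the network until a state repeats, then report the attractor type.

    weights[i][j] is +1 if node j activates node i, -1 if it inhibits it, else 0.
    Assumed update rule: node i is on (1) when its summed input is >= 0.
    """
    seen = {}
    state = tuple(state)
    for step in range(max_steps):
        if state in seen:
            period = step - seen[state]
            return "steady state" if period == 1 else "oscillation"
        seen[state] = step
        state = tuple(
            1 if sum(w * x for w, x in zip(row, state)) >= 0 else 0
            for row in weights
        )
    return "undetermined"

# Two toy networks (hypothetical): mutual activation and self-inhibition.
mutual_activation = [[0, 1], [1, 0]]
self_inhibition = [[-1]]
print(classify_dynamics(mutual_activation, (1, 1)))
print(classify_dynamics(self_inhibition, (1,)))
```

Because the state space is finite, every trajectory eventually repeats a state; the `max_steps` guard only matters for networks too large to enumerate.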
Reviewed by Sergei Maslov, Eugene Koonin, and Yu (Brandon) Xia (nominated by Mark Gerstein). For the full reviews, please go to the Reviewers' comments section.
PMCID: PMC2651858  PMID: 19055800
7.  Functional insight into Maelstrom in the germline piRNA pathway: a unique domain homologous to the DnaQ-H 3'–5' exonuclease, its lineage-specific expansion/loss and evolutionarily active site switch 
Biology Direct  2008;3:48.
Maelstrom (MAEL) plays a crucial role in a recently discovered piRNA pathway; however, its specific function remains unknown. Here a novel MAEL-specific domain characterized by a set of conserved residues (Glu-His-His-Cys-His-Cys, EHHCHC) was identified in a broad range of species including vertebrates, sea squirts, insects, nematodes, and protists. It exhibits ancient lineage-specific expansions in several species but appears to be lost in all examined teleost fish species. Functional involvement of MAEL domains in DNA- and RNA-related processes was further revealed by its association with HMG, SR-25-like and HDAC_interact domains. A distant similarity to the DnaQ-H 3'–5' exonuclease family with the RNase H fold was discovered based on the evidence that all MAEL domains adopt the canonical RNase H fold and that several protist MAEL domains contain the conserved 3'–5' exonuclease active site residues (Asp-Glu-Asp-His-Asp, DEDHD). This evolutionary link together with structural examinations leads to a hypothesis that MAEL domains may have a potential nuclease activity or RNA-binding ability that may be implicated in piRNA biogenesis. The observed transition of two sets of characteristic residues between the ancestral DnaQ-H and the descendant MAEL domains may suggest a new mode for protein function evolution called "active site switch", in which the protist MAEL homologues are the likely evolutionary intermediates because they harbor the specific characteristics of both 3'–5' exonuclease and MAEL domains.
This article was reviewed by L Aravind, Wing-Cheong Wong and Frank Eisenhaber. For the full reviews, please go to the Reviewers' Comments section.
PMCID: PMC2628886  PMID: 19032786
8.  On the brink between extinction and persistence 
Biology Direct  2008;3:47.
The nature of size fluctuations is crucial in forecasting future population persistence, independently of whether the variability stems from external forces or from the dynamics of the population renewal process. The risk of intercepting zero is highly dependent on the way the variance of the population size relates to its mean. The minimum population size required for a population not to go extinct can be determined by a scaling equation relating the variance to the arithmetic mean. By the use of a derived expression for the harmonic mean defined by the parameters of the scaling equation we show how it is possible to separate the domains of persistence from those of extinction and to facilitate the identification of populations on the brink of extinction.
This article was reviewed by Mark W. Schwartz (nominated by Peter Olofsson), Josef Bryja (nominated by Aniko Szabo) and Wai-Yuan Tan. For the full reviews, please go to the Reviewers' Comments section.
PMCID: PMC2613133  PMID: 19019237
9.  Exon definition as a potential negative force against intron losses in evolution 
Biology Direct  2008;3:46.
Previous studies have indicated that the wide variation in intron density (the number of introns per gene) among different eukaryotes largely reflects varying degrees of intron loss during evolution. The most popular model, which suggests that organisms lose introns through a mechanism in which reverse-transcribed cDNA recombines with the genomic DNA, concerns only one mutational force.
Using exons as the units of splicing-site recognition, exon definition constrains the length of exons. An intron-loss event results in fusion of flanking exons and thus a larger exon. The large size of the newborn exon may cause splicing errors, i.e., exon skipping, if the splicing of pre-mRNAs is initiated by exon definition. By contrast, if the splicing of pre-mRNAs is initiated by intron definition, intron loss does not matter. Exon definition may thus be a selective force against intron loss. An organism with a high frequency of exon definition is expected to experience a low rate of intron loss throughout evolution and have a high density of spliceosomal introns.
The majority of spliceosomal introns in vertebrates may be maintained during evolution not because of potential functions, but because of their splicing mechanism (i.e., exon definition). Further research is required to determine whether exon definition is a negative force in maintaining the high intron density of vertebrates.
This article was reviewed by Dr. Scott W. Roy (nominated by Dr. John Logsdon), Dr. Eugene V. Koonin, and Dr. Igor B. Rogozin (nominated by Dr. Mikhail Gelfand). For the full reviews, please go to the Reviewers' comments section.
PMCID: PMC2614967  PMID: 19014515
10.  Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination 
Biology Direct  2008;3:45.
Recently Mycobacterium tuberculosis was shown to possess a novel protein modification, in which a small protein Pup is conjugated to the epsilon-amino groups of lysines in target proteins. Analogous to ubiquitin modification in eukaryotes, this remarkable modification recruits proteins for degradation via archaeal-type proteasomes found in mycobacteria and allied actinobacteria. While a mycobacterial protein named PafA was found to be required for this conjugation reaction, its biochemical mechanism has not been elucidated. Using sensitive sequence profile comparison methods we establish that the PafA family proteins are related to the γ-glutamyl-cysteine synthetase and glutamine synthetase. Hence, we predict that PafA is the Pup ligase, which catalyzes the ATP-dependent ligation of the terminal γ-carboxylate of glutamate to lysines, similar to the above enzymes. We further discovered that an ortholog of the eukaryotic PAC2 (e.g. cg2106) is often present in the vicinity of the actinobacterial Pup-proteasome gene neighborhoods and is likely to represent the ancestral proteasomal chaperone. Pup-conjugation is sporadically present outside the actinobacteria in certain lineages, such as verrucomicrobia, nitrospirae, deltaproteobacteria and planctomycetes, and in the latter two lineages it might modify membrane proteins.
This article was reviewed by M. Madan Babu and Andrei Osterman.
PMCID: PMC2588565  PMID: 18980670
11.  Synaptic enrichment of microRNAs in adult mouse forebrain is related to structural features of their precursors 
Biology Direct  2008;3:44.
Within mouse forebrain, a subset of microRNAs are significantly enriched in synaptoneurosomes (a synaptic fraction containing pinched-off dendritic spines) and a subset are significantly depleted relative to total forebrain homogenate. Here I show that, as a group, the pre-miR hairpin precursors of synaptically enriched microRNAs exhibit significantly different structural features than those that are non-enriched or depleted. Precursors of synaptically enriched microRNAs tend to have a) shorter uninterrupted double-stranded stem segments, and b) more symmetrical bulges containing a single nucleotide on each side. These structural differences may provide a basis for the differential binding of proteins that mediate dendritic transport of pre-miRs, or that prevent pre-miRs from being prematurely processed into mature miRNAs during the transport process.
This article was reviewed by I. King Jordan and Jerzy Jurka.
PMCID: PMC2588566  PMID: 18957138
12.  Did RNA editing in plant organellar genomes originate under natural selection or through genetic drift? 
Biology Direct  2008;3:43.
The C↔U substitution types of RNA editing have been observed frequently in organellar genomes of land plants. Although various attempts have been made to explain why such a seemingly inefficient genetic mechanism would have evolved, no satisfactory explanation exists in our view. In this study, we examined editing patterns in chloroplast genomes of the hornwort Anthoceros formosae and the fern Adiantum capillus-veneris and in mitochondrial genomes of the angiosperms Arabidopsis thaliana, Beta vulgaris and Oryza sativa, to gain an understanding of the question of how RNA editing originated.
We found that 1) most editing sites were distributed at the 2nd and 1st codon positions, 2) editing affected codons that resulted in larger hydrophobicity and molecular size changes much more frequently than those with little change involved, 3) editing uniformly increased protein hydrophobicity, 4) editing occurred more frequently in ancestrally T-rich sequences, which were more abundant in genes encoding membrane-bound proteins with many hydrophobic amino acids than in genes encoding soluble proteins, and 5) editing occurred most often in genes found to be under strong selective constraint.
These analyses show that editing mostly affects functionally important and evolutionarily conserved codon positions, codons and genes encoding membrane-bound proteins. In particular, abundance of RNA editing in plant organellar genomes may be associated with disproportionately large percentages of genes in these two genomes that encode membrane-bound proteins, which are rich in hydrophobic amino acids and selectively constrained. These data support a hypothesis that natural selection imposed by protein functional constraints has contributed to selective fixation of certain editing sites and maintenance of the editing activity in plant organelles over a period of more than four hundred million years. The retention of genes encoding RNA editing activity may be driven by forces that shape nucleotide composition equilibrium in two organellar genomes of these plants. Nevertheless, the causes of lineage-specific occurrence of a large portion of RNA editing sites remain to be determined.
This article was reviewed by Michael Gray (nominated by Laurence Hurst), Kirsten Krause (nominated by Martin Lercher), and Jeffery Mower (nominated by David Ardell).
PMCID: PMC2584032  PMID: 18939975
13.  A new model defines the minimal set of polymorphism in HLA-DQ and -DR that determines susceptibility and resistance to autoimmune diabetes 
Biology Direct  2008;3:42.
The mechanism underlying autoimmune diabetes has been difficult to define. There is a strong genetic contribution and numerous studies associate the major histocompatibility complex, especially the class II region, with predisposition or resistance. However, how these molecules are implicated remains obscure.
We have supplemented structural analysis with computational biophysical and sequence analyses and propose a heuristic for distinguishing between human leukocyte antigen molecules that predispose to insulin-dependent diabetes mellitus and those that are protective. Polar residues at both β37 and β9 suffice to distinguish accurately between class II alleles that predispose to type 1 diabetes and those that do not. The electrostatic potential within the peptide binding pocket exerts a strong influence on diabetogenic epitopes with basic residues. Diabetes susceptibility alleles are predicted to bind autoantigens strongly with tight affinity, prolonged association and altered cytokine expression profile. Protective alleles bind moderately, and neutral alleles poorly or not at all. Non-Asp β57 is a modifier that supplements disease risk but only in the presence of the polymorphic, polar pair at β9 and β37. The nature of β37 determines resistance on one hand, and susceptibility or dominant protection on the other.
The proposed ideas are illustrated with structural, functional and population studies from the literature. The hypothesis, in turn, rationalizes their results. A plausible mechanism of immune mediated diabetes based on binding affinity and peptide kinetics is discussed. The number of the polymorphic markers present correlates with onset of disease and severity. The molecular elucidation of disease susceptibility and resistance paves the way for risk prediction, treatment and prevention of disease based on analogue peptides.
This article was reviewed by Eugene V. Koonin, Michael Lenardo, Hossam Ashour, and Bhagirath Singh. For the full reviews, please go to the Reviewers' comments section.
PMCID: PMC2590596  PMID: 18854049
14.  Transduplication resulted in the incorporation of two protein-coding sequences into the Turmoil-1 transposable element of C. elegans 
Biology Direct  2008;3:41.
Transposable elements may acquire unrelated gene fragments into their sequences in a process called transduplication. Transduplication of protein-coding genes is common in plants, but is unknown in animals. Here, we report that the Turmoil-1 transposable element in C. elegans has incorporated two protein-coding sequences into its inverted terminal repeat (ITR) sequences. The ITRs of Turmoil-1 contain a conserved RNA recognition motif (RRM) that originated from the rsp-2 gene and a fragment from the protein-coding region of the cpg-3 gene. We further report that an open reading frame specific to C. elegans may have been created as a result of a Turmoil-1 insertion. Mutations at the 5' splice site of this open reading frame may have reactivated the transduplicated RRM motif.
This article was reviewed by Dan Graur and William Martin. For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2572040  PMID: 18842128
15.  Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution 
Biology Direct  2008;3:40.
Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate.
This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate. We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude.
Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution.
This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2572155  PMID: 18840284
16.  A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases 
Biology Direct  2008;3:39.
Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases.
This article was reviewed by Eugene Koonin and Mark Ragan.
PMCID: PMC2579912  PMID: 18834537
17.  CAIcal: A combined set of tools to assess codon usage adaptation 
Biology Direct  2008;3:38.
The Codon Adaptation Index (CAI) was first developed to measure the synonymous codon usage bias for a DNA or RNA sequence. The CAI quantifies the similarity between the synonymous codon usage of a gene and the synonymous codon frequency of a reference set.
We describe here CAIcal, a web-server available at that includes a complete set of utilities related to the CAI. The server provides useful features, such as the calculation and graphical representation of the CAI along either an individual sequence or a protein multiple sequence alignment translated to DNA. The automated calculation of CAI and its expected value is also included as one of the CAIcal tools. The software can also be downloaded freely as a standalone application for local use.
The CAIcal server provides a complete set of tools to assess codon usage adaptation and to help in genome annotation.
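To make the index itself concrete, here is a minimal sketch of the commonly used CAI definition: each codon gets a weight equal to its reference-set count divided by the count of the most frequent synonymous codon, and the CAI of a gene is the geometric mean of those weights. This is not CAIcal's code; the function names and the toy reference sequences are assumptions, and codons absent from the reference set are simply skipped, which is a simplification.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def split_codons(seq):
    """Cut a coding sequence into complete codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def relative_adaptiveness(reference_seqs):
    """Weight of each codon: its count divided by the top synonymous codon count."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)
    weights = {}
    for aa, codons in synonyms.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met and Trp carry no usage information
        best = max(counts[c] for c in codons)
        for c in codons:
            if best and counts[c]:
                weights[c] = counts[c] / best  # unseen codons are skipped
    return weights

def cai(seq, weights):
    """Geometric mean of the weights of the scored codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Toy reference set (hypothetical): GCT x3, GCC x1, AAA x2, AAG x1.
reference = ["GCTGCTGCTGCC", "AAAAAAAAG"]
weights = relative_adaptiveness(reference)
print(cai("GCTAAA", weights), cai("GCCAAG", weights))
```

A gene built only from the preferred codons of the reference set scores 1; any use of rarer synonymous codons lowers the score.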
This article was reviewed by Purificación López-García, Dan Graur, Rob Knight and Shamil Sunyaev.
PMCID: PMC2553769  PMID: 18796141
18.  An extension of the coevolution theory of the origin of the genetic code 
Biology Direct  2008;3:37.
The coevolution theory of the origin of the genetic code suggests that the genetic code is an imprint of the biosynthetic relationships between amino acids. However, this theory does not seem to attribute a role to the biosynthetic relationships between the earliest amino acids that evolved along the pathways of energetic metabolism. As a result, the coevolution theory is unable to clearly define the very earliest phases of genetic code origin. In order to remove this difficulty, I here suggest an extension of the coevolution theory that attributes a crucial role to the first amino acids that evolved along these biosynthetic pathways and to their biosynthetic relationships, even when defined by the non-amino acid molecules that are their precursors.
It is re-observed that the first amino acids to evolve along these biosynthetic pathways are predominantly those codified by codons of the type GNN, and this observation is found to be statistically significant. Furthermore, the close biosynthetic relationships between the sibling amino acids Ala-Ser, Ser-Gly, Asp-Glu, and Ala-Val are not random in the genetic code table and reinforce the hypothesis that the biosynthetic relationships between these six amino acids played a crucial role in defining the very earliest phases of genetic code origin.
All this leads to the hypothesis that there existed a code, GNS, reflecting the biosynthetic relationships between these six amino acids which, as it defines the very earliest phases of genetic code origin, removes the main difficulty of the coevolution theory. Furthermore, it is here discussed how this code might have naturally led to the code codifying only for the domains of the codons of precursor amino acids, as predicted by the coevolution theory. Finally, the hypothesis here suggested also removes other problems of the coevolution theory, such as the existence of certain pairs of amino acids with an unclear biosynthetic relationship between the precursor and product amino acids and the collocation of Ala between the amino acids Val and Leu belonging to the pyruvate biosynthetic family, which the coevolution theory considered as belonging to different biosyntheses.
This article was reviewed by Rob Knight, Paul Higgs (nominated by Laura Landweber), and Eugene Koonin.
PMCID: PMC2538516  PMID: 18775066
19.  Same-strand overlapping genes in bacteria: compositional determinants of phase bias 
Biology Direct  2008;3:36.
Same-strand overlapping genes may occur in frameshifts of one (phase 1) or two nucleotides (phase 2). In previous studies of bacterial genomes, long phase-1 overlaps were found to be more numerous than long phase-2 overlaps. This bias was explained by either genomic location or an unspecified selection advantage. Models that focused on the ability of the two genes to evolve independently did not predict this phase bias. Here, we propose that a purely compositional model explains the phase bias in a more parsimonious manner. Same-strand overlapping genes may arise through either a mutation at the termination codon of the upstream gene or a mutation at the initiation codon of the downstream gene. We hypothesized that given these two scenarios, the frequencies of initiation and termination codons in the two phases may determine the number of overlapping genes.
We examined the frequencies of initiation- and termination-codons in the two phases, and found that termination codons do not significantly differ between the two phases, whereas initiation codons are more abundant in phase 1. We found that the primary factors explaining the phase inequality are the frequencies of amino acids whose codons may combine to form start codons in the two phases. We show that the frequencies of start codons in each of the two phases, and, hence, the potential for the creation of overlapping genes, are determined by a universal amino-acid frequency and species-specific codon usage, leading to a correlation between long phase-1 overlaps and genomic GC content.
Our model explains the phase bias in same-strand overlapping genes by compositional factors without invoking selection. Therefore, it can be used as a null model of neutral evolution to test selection hypotheses concerning the evolution of overlapping genes.
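As an illustration of the kind of count the abstract describes, the following minimal sketch tallies potential start codons in the two shifted phases of a coding sequence. It is not the authors' pipeline: the function name, the choice of ATG, GTG and TTG as start codons, and the convention that phase 1 and phase 2 mean offsets of one and two nucleotides from the annotated reading frame are assumptions made here for illustration.

```python
# Hypothetical helper, not taken from the article.
# Assumption: the common bacterial start codons ATG, GTG and TTG.
START_CODONS = {"ATG", "GTG", "TTG"}


def count_start_codons_by_phase(cds):
    """Count start-codon triplets at offsets 1 and 2 from the reading frame.

    `cds` is a coding sequence whose reading frame begins at index 0.
    Triplets in the annotated frame itself (offset 0) are ignored.
    """
    cds = cds.upper()
    counts = {1: 0, 2: 0}
    for phase in (1, 2):
        # Step through the shifted frame one triplet at a time.
        for i in range(phase, len(cds) - 2, 3):
            if cds[i:i + 3] in START_CODONS:
                counts[phase] += 1
    return counts


if __name__ == "__main__":
    # The in-frame GTG at index 6 is not counted; the ATG at index 1 is.
    print(count_start_codons_by_phase("GATGAAGTGCTT"))
```

Applied to every annotated gene of a genome, counts of this kind could be compared between phases and against GC content, which is the sort of comparison the abstract reports.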
This article was reviewed by Bill Martin, Itai Yanai, and Mikhail Gelfand.
PMCID: PMC2542354  PMID: 18717987
20.  Critical role for BIM in T cell receptor restimulation-induced death 
Biology Direct  2008;3:34.
Upon repeated or chronic antigen stimulation, activated T cells undergo a T cell receptor (TCR)-triggered propriocidal cell death important for governing the intensity of immune responses. This is thought to be chiefly mediated by an extrinsic signal through the Fas-FasL pathway. However, we observed that TCR restimulation still potently induced apoptosis when this interaction was blocked, or genetically impaired in T cells derived from autoimmune lymphoproliferative syndrome (ALPS) patients, prompting us to examine Fas-independent, intrinsic signals.
Upon TCR restimulation, we specifically noted a marked increase in the expression of BIM, a pro-apoptotic Bcl-2 family protein known to mediate lymphocyte apoptosis induced by cytokine withdrawal. In fact, T cells from an ALPS type IV patient in which BIM expression is suppressed were more resistant to restimulation-induced death. Strikingly, knockdown of BIM expression rescued normal T cells from TCR-induced death to as great an extent as Fas disruption.
Our data implicate BIM as a critical mediator of apoptosis induced by restimulation as well as growth cytokine withdrawal. These findings suggest an important role for BIM in eliminating activated T cells even when IL-2 is abundant, working in conjunction with Fas to eliminate chronically stimulated T cells and maintain immune homeostasis.
This article was reviewed by Dr. Wendy Davidson (nominated by Dr. David Scott), Dr. Mark Williams (nominated by Dr. Neil Greenspan), and Dr. Laurence C. Eisenlohr.
PMCID: PMC2529272  PMID: 18715501
21.  A nitty-gritty aspect of correlation and network inference from gene expression data 
Biology Direct  2008;3:35.
All currently available methods of network/association inference from microarray gene expression measurements implicitly assume that such measurements represent the actual expression levels of different genes within each cell included in the biological sample under study. Contrary to this common belief, modern microarray technology produces signals aggregated over a random number of individual cells, a "nitty-gritty" aspect of such arrays, thereby causing a random effect that distorts the correlation structure of intra-cellular gene expression levels.
This paper provides a theoretical consideration of the random effect of signal aggregation and its implications for correlation analysis and network inference. An attempt is made to quantitatively assess the magnitude of this effect from real data. Some preliminary ideas are offered to mitigate the consequences of random signal aggregation in the analysis of gene expression data.
Resulting from the summation of expression intensities over a random number of individual cells, the observed signals may not adequately reflect the true dependence structure of intra-cellular gene expression levels needed as a source of information for network reconstruction. Whether the reported effect is extreme or not, the important point is to recognize and incorporate such a signal source for proper inference. The usefulness of inference on genetic regulatory structures from microarray data depends critically on the ability of investigators to overcome this obstacle in a scientifically sound way.
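The aggregation effect can be illustrated with a toy simulation. The sketch below is not the model analysed in the article: the Gaussian per-cell expression levels, the uniform distribution of cell counts, and all function and parameter names are assumptions chosen for illustration. Two genes are simulated as independent within each cell, yet their aggregated signals become strongly correlated once the number of cells per sample is random.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def aggregated_correlation(n_samples=500, min_cells=50, max_cells=500, seed=0):
    """Correlation of two aggregated signals across simulated samples.

    Assumption: per-cell expression of each gene is Gaussian (mean 10, sd 2)
    and the two genes are independent within every cell.
    """
    rng = random.Random(seed)
    gene_x, gene_y = [], []
    for _ in range(n_samples):
        # Each sample pools a random number of cells.
        n_cells = rng.randint(min_cells, max_cells)
        gene_x.append(sum(rng.gauss(10.0, 2.0) for _ in range(n_cells)))
        gene_y.append(sum(rng.gauss(10.0, 2.0) for _ in range(n_cells)))
    return pearson(gene_x, gene_y)


if __name__ == "__main__":
    print("random cell count:", aggregated_correlation())
    print("fixed cell count: ", aggregated_correlation(min_cells=100, max_cells=100))
```

With a fixed cell count the aggregated signals remain close to uncorrelated, so in this toy setting the correlation seen with random cell counts comes from the shared cell number alone.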
This article was reviewed by Byung Soo KIM, Jeanne Kowalski and Geoff McLachlan
PMCID: PMC2569917  PMID: 18715503
22.  The evolution of strand preference in simulated RNA replicators with strand displacement: Implications for the origin of transcription 
Biology Direct  2008;3:33.
The simplest conceivable example of evolving systems is RNA molecules that can replicate themselves. Since replication produces a new RNA strand complementary to a template, all templates would eventually become double-stranded and, hence, become unavailable for replication. Thus the problem of how to separate the two strands is considered a major issue for the early evolution of self-replicating RNA. One biologically plausible way to copy a double-stranded RNA is to displace a preexisting strand by a newly synthesized strand. Such copying can in principle be initiated from either the (+) or (-) strand of a double-stranded RNA. Assuming that only one of them, say (+), can act as replicase when single-stranded, strand displacement produces a new replicase if the (-) strand is the template. If, however, the (+) strand is the template, it produces a new template (but no replicase). Modern transcription exhibits extreme strand preference wherein anti-sense strands are always the template. Likewise, replication by strand displacement seems optimal if it also exhibits extreme strand preference wherein (-) strands are always the template, favoring replicase production. Here we investigate whether such strand preference can evolve in a simple RNA replicator system with strand displacement.
We first studied a simple mathematical model of the replicator dynamics. Our results indicated that if the system is well-mixed, there is no selective force acting upon strand preference per se. Next, we studied an individual-based simulation model to investigate the evolution of strand preference under finite diffusion. Interestingly, the results showed that selective forces "emerge" because of finite diffusion. Strikingly, the direction of the strand preference that evolves [i.e. (+) or (-) strand excess] is a complex non-monotonic function of the diffusion intensity. The mechanism underlying this behavior is elucidated. Furthermore, a speciation-like phenomenon is observed under certain conditions: two extreme replication strategies, namely replicase producers and template producers, emerge and coexist among competing replicators.
Finite diffusion enables the evolution of strand preference, the direction of which is a non-monotonic function of the diffusion intensity. By identifying the conditions under which strand preference evolves, this study provides an insight into how a rudimentary transcription-like pattern might have emerged in an RNA-based replicator system.
This article was reviewed by Eugene V Koonin, Rob Knight and István Scheuring (nominated by David H Ardell). For the full reviews, please go to the Reviewers' comments section.
PMCID: PMC2648946  PMID: 18694481
23.  A highly conserved family of inactivated archaeal B family DNA polymerases 
Biology Direct  2008;3:32.
A widespread and highly conserved family of apparently inactivated derivatives of archaeal B-family DNA polymerases is described. Phylogenetic analysis shows that the inactivated forms comprise a distinct clade among archaeal B-family polymerases and that, within this clade, Euryarchaea and Crenarchaea are clearly separated from each other and from a small group of bacterial homologs. These findings are compatible with an ancient duplication of the DNA polymerase gene followed by inactivation and parallel loss in some of the lineages although contribution of horizontal gene transfer cannot be ruled out. The inactivated derivative of the archaeal DNA polymerase could form a complex with the active paralog and play a structural role in DNA replication.
This article was reviewed by Purificacion Lopez-Garcia and Chris Ponting. For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2527604  PMID: 18684330
24.  Origin of the nucleus and Ran-dependent transport to safeguard ribosome biogenesis in a chimeric cell 
Biology Direct  2008;3:31.
The origin of the nucleus is a central problem in the origin of eukaryotes. The common ancestry of nuclear pore complexes (NPC) and vesicle coating complexes indicates that the nucleus evolved via the modification of a pre-existing endomembrane system. Such an autogenous scenario is cell biologically feasible, but it is not clear which selective or neutral mechanisms led to the origin of the nuclear compartment.
A key selective force during the autogenous origin of the nucleus could have been the need to segregate ribosome factories from the cytoplasm where ribosomal proteins (RPs) of the protomitochondrium were synthesized. After its uptake by an anuclear cell the protomitochondrium transferred several of its RP genes to the host genome. Alphaproteobacterial RPs and archaebacterial-type host ribosomes were consequently synthesized in the same cytoplasm. This could have led to the formation of chimeric ribosomes. I propose that the nucleus evolved when the host cell compartmentalised its ribosome factories and the tightly linked genome to reduce ribosome chimerism. This was achieved in successive stages by first evolving karyopherin and RanGTP dependent chaperoning of RPs, followed by the evolution of a membrane network to serve as a diffusion barrier, and finally a hydrogel sieve to ensure selective permeability at nuclear pores. Computer simulations show that a gradual segregation of cytoplasm and nucleoplasm via these steps can progressively reduce ribosome chimerism.
Ribosome chimerism can provide a direct link between the selective forces for and the mechanisms of evolving nuclear transport and compartmentalisation. The detailed molecular scenario presented here provides a solution to the gradual evolution of nuclear compartmentalization from an anuclear stage.
This article was reviewed by Eugene V Koonin, Martijn Huynen, Anthony M. Poole and Patrick Forterre.
PMCID: PMC2503971  PMID: 18652645
25.  The Last Universal Common Ancestor: emergence, constitution and genetic legacy of an elusive forerunner 
Biology Direct  2008;3:29.
Since the reclassification of all life forms in three Domains (Archaea, Bacteria, Eukarya), the identity of their alleged forerunner (Last Universal Common Ancestor or LUCA) has been the subject of extensive controversies: progenote or already complex organism, prokaryote or protoeukaryote, thermophile or mesophile, product of a protracted progression from simple replicators to complex cells or born in the cradle of "catalytically closed" entities? We present a critical survey of the topic and suggest a scenario.
LUCA does not appear to have been a simple, primitive, hyperthermophilic prokaryote but rather a complex community of protoeukaryotes with an RNA genome, adapted to a broad range of moderate temperatures, genetically redundant, morphologically and metabolically diverse. LUCA's genetic redundancy predicts loss of paralogous gene copies in divergent lineages to be a significant source of phylogenetic anomalies, i.e. instances where a protein tree departs from the SSU-rRNA genealogy; consequently, horizontal gene transfer may not have the rampant character assumed by many. Examination of membrane lipids suggests LUCA had sn1,2 ester fatty acid lipids from which Archaea emerged from the outset as thermophilic by "thermoreduction," with a new type of membrane, composed of sn2,3 ether isoprenoid lipids; this occurred without major enzymatic reconversion. Bacteria emerged by reductive evolution from LUCA and some lineages further acquired extreme thermophily by convergent evolution. This scenario is compatible with the hypothesis that the RNA to DNA transition resulted from different viral invasions as proposed by Forterre. Beyond the controversy opposing "replication first" to "metabolism first", the predictive arguments of theories on "catalytic closure" or "compositional heredity" heavily weigh in favour of LUCA's ancestors having emerged as complex, self-replicating entities from which a genetic code arose under natural selection.
Life was born complex and the LUCA displayed that heritage. It had the "body" of a mesophilic eukaryote well before maturing by endosymbiosis into an organism adapted to an atmosphere rich in oxygen. Abundant indications suggest reductive evolution of this complex and heterogeneous entity towards the "prokaryotic" Domains Archaea and Bacteria. The word "prokaryote" should be abandoned because it is epistemologically unsound.
This article was reviewed by Anthony Poole, Patrick Forterre, and Nicolas Galtier.
PMCID: PMC2478661  PMID: 18613974
