1.  Detection of a common chimeric transcript between human chromosomes 7 and 16 
Biology Direct  2012;7:49.
Interchromosomal chimeric RNA molecules are often transcription products from genomic rearrangement in cancerous cells. Here we report the computational detection of an interchromosomal RNA fusion between ZC3HAV1L and CHMP1A from RNA-seq data of normal human mammary epithelial cells, and experimental confirmation of the chimeric transcript in multiple human cells and tissues. Our experimental characterization also detected three variants of the ZC3HAV1L-CHMP1A chimeric RNA, suggesting that these genes are involved in complex splicing. The fusion sequence at the novel exon-exon boundary, and the absence of corresponding DNA rearrangement suggest that this chimeric RNA is likely produced by trans-splicing in human cells.
This article was reviewed by Rory Johnson (nominated by Fyodor Kondrashov); Gal Avital and Itai Yanai.
PMCID: PMC3538553  PMID: 23273016
Chimeric transcripts; RNA fusion; trans-splicing; Genome rearrangement
2.  Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios 
Biology Direct  2012;7:48.
A long recognized problem is the inference of the supertree S that amalgamates a given set {Gj} of trees Gj, with leaves in each Gj being assigned homologous elements.
We build on an approach that finds the tree S by minimizing the total cost of the mappings αj of the individual gene trees Gj into S. Traditionally, this cost is defined as the sum of duplications and gaps in each αj. The classical problem is to minimize the total cost, where S runs over the set of all trees that contain an exhaustive non-redundant set of species from all input Gj.
We suggest a reformulation of the classical NP-hard problem of building a supertree in terms of the global minimization of the same cost functional but only over species trees S that consist of clades belonging to a fixed set P (e.g., an exhaustive set of clades in all Gj). We developed a deterministic solving algorithm with a low degree polynomial (typically cubic) time complexity with respect to the size of input data.
We define an extensive set of elementary evolutionary events and suggest an original definition of mapping β of tree G into tree S. We introduce the cost functional c(G, S, f ) and define the mapping β as the global minimum of this functional with respect to the variable f, in which sense it is a generalization of classical mapping α.
We suggest a reformulation of the classical NP-hard mapping (reconciliation) problem by introducing time slices into the species tree S and present a cubic time solving algorithm to compute the mapping β. We introduce two novel definitions of the evolutionary scenario based on mapping β or a random process of gene evolution along a species tree.
The developed algorithms are mathematically proven, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given set of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mapping β, the minimal total cost and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of tree S.
The algorithms and their effective software implementations provide useful tools in many biological studies. They facilitate processing of voluminous tree data in acceptable time still largely avoiding heuristics. Performance of the tools is tested with artificial and prokaryotic tree data.
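As an illustration of the classical setting that this abstract generalizes, the sketch below counts duplications under the standard lowest-common-ancestor (LCA) mapping of a gene tree into a species tree. It is not the authors' mapping β or their cubic-time algorithm; the nested-tuple tree encoding and the function names `clades` and `count_duplications` are assumptions made for this example, and gene-tree leaves are assumed to be labeled with species names.

```python
def clades(tree):
    """List the leaf set (clade) of every node; the node's own clade comes last."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]  # last entry is the child's own clade
    result.append(leaves)
    return result


def count_duplications(gene_tree, species_tree):
    """Count gene-tree nodes that the LCA mapping marks as duplications."""
    species_clades = clades(species_tree)

    def lca(leafset):
        # Smallest species-tree clade that contains all species in leafset.
        return min((c for c in species_clades if leafset <= c), key=len)

    dups = 0

    def visit(node):
        nonlocal dups
        if isinstance(node, str):
            leafset = frozenset([node])
            return leafset, lca(leafset)
        child_info = [visit(child) for child in node]
        leafset = frozenset().union(*(ls for ls, _ in child_info))
        image = lca(leafset)
        # A node is a duplication if it maps to the same clade as one of its children.
        if any(img == image for _, img in child_info):
            dups += 1
        return leafset, image

    visit(gene_tree)
    return dups
```

For the species tree `(("A", "B"), "C")`, the gene tree `(("A", "B"), ("A", "C"))` needs one duplication at its root, whereas a gene tree identical to the species tree needs none.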
This article was reviewed by Prof. Anthony Almudevar, Prof. Alexander Bolshoy (nominated by Prof. Peter Olofsson), and Prof. Marek Kimmel.
PMCID: PMC3577452  PMID: 23259766
Phylogenetics; Fast algorithms; Tree inference; Species tree; Tree amalgamation; Tree reconciliation; Supertree; Evolutionary events; Gene duplication; Gene loss; Horizontal gene transfer; Gene gain; Time slices
3.  AID/APOBEC cytosine deaminase induces genome-wide kataegis 
Biology Direct  2012;7:47.
Clusters of localized hypermutation in human breast cancer genomes, named “kataegis” (from the Greek for thunderstorm), are hypothesized to result from multiple cytosine deaminations catalyzed by AID/APOBEC proteins. However, a direct link between APOBECs and kataegis is still lacking. We have sequenced the genomes of yeast mutants induced in diploids by expression of the gene for PmCDA1, a hypermutagenic deaminase from sea lamprey. Analysis of the distribution of 5,138 induced mutations revealed localized clusters very similar to those found in tumors. Our data provide evidence that unleashed cytosine deaminase activity is an evolutionarily conserved, prominent source of genome-wide kataegis events.
This article was reviewed by: Professor Sandor Pongor, Professor Shamil R. Sunyaev, and Dr Vladimir Kuznetsov.
PMCID: PMC3542020  PMID: 23249472
APOBEC; Deaminase; Mutation; Kataegis; Cancer; Diploid yeast; Hypermutation
4.  Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer 
Biology Direct  2012;7:46.
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.
The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer.
The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time.
This article was reviewed by (for complete reviews see the Reviewers’ Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
PMCID: PMC3534625  PMID: 23241446
Archaea; Orthologs; Horizontal gene transfer
5.  Ethanolamine utilization in Vibrio alginolyticus 
Biology Direct  2012;7:45.
Ethanolamine is used as an energy source by phylogenetically diverse bacteria including pathogens, by the concerted action of proteins from the eut-operon. Previous studies have revealed the presence of eutBC genes encoding ethanolamine-ammonia lyase, a key enzyme that breaks ethanolamine into acetaldehyde and ammonia, in about 100 bacterial genomes including members of gamma-proteobacteria. However, ethanolamine utilization has not been reported for any member of the Vibrio genus. Our comparative genomics study reveals the presence of genes that are involved in ethanolamine utilization in several Vibrio species. Using Vibrio alginolyticus as a model system we demonstrate that ethanolamine is better utilized as a nitrogen source than as a carbon source.
This article was reviewed by Dr. Lakshminarayan Iyer and Dr. Vivek Anantharaman (nominated by Dr. L Aravind).
PMCID: PMC3542024  PMID: 23234435
Pathogenesis; Ethanolamine; Vibrio; Eut operon; Metabolosome
6.  Harnessing the complexity of gene expression data from cancer: from single gene to structural pathway methods 
Biology Direct  2012;7:44.
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods.
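To make the multiple hypotheses testing problem mentioned above concrete, here is a minimal sketch of one widely used correction, the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The abstract does not say which corrections the review covers, so the choice of procedure, the function name `benjamini_hochberg`, and the default `alpha` are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

With one p-value per gene, `benjamini_hochberg(p_values)` flags the genes whose differential expression remains significant after correction. Because the rule is step-up, a gene can be rejected even if a smaller p-value failed its own threshold, as long as a larger-ranked p-value passes.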
This article was reviewed by Arcady Mushegian, Byung-Soo Kim and Joel Bader.
PMCID: PMC3769148  PMID: 23227854
Gene expression data; Cancer data; Statistical analysis methods; Pathway methods; Correlation structure; Cancer genomics
7.  Bioinformatics clouds for big data manipulation 
Biology Direct  2012;7:43.
As advances in life sciences and information technology profoundly influence bioinformatics owing to its interdisciplinary nature, the field is moving from in-house computing infrastructure to utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Although relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics.
This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.
PMCID: PMC3533974  PMID: 23190475
Cloud computing; Bioinformatics; Big data; Data storage; Data analysis
8.  The origin of life is a spatially localized stochastic transition 
Biology Direct  2012;7:42.
Life depends on biopolymer sequences as catalysts and as genetic material. A key step in the Origin of Life is the emergence of an autocatalytic system of biopolymers. Here we study computational models that address the way a living autocatalytic system could have emerged from a non-living chemical system, as envisaged in the RNA World hypothesis.
We consider (i) a chemical reaction system describing RNA polymerization, and (ii) a simple model of catalytic replicators that we call the Two’s Company model. Both systems have two stable states: a non-living state, characterized by a slow spontaneous rate of RNA synthesis, and a living state, characterized by rapid autocatalytic RNA synthesis. The origin of life is a transition between these two stable states. The transition is driven by stochastic concentration fluctuations involving relatively small numbers of molecules in a localized region of space. These models are simulated on a two-dimensional lattice in which reactions occur locally on single sites and diffusion occurs by hopping of molecules to neighbouring sites.
If diffusion is very rapid, the system is well-mixed. The transition to life becomes increasingly difficult as the lattice size is increased because the concentration fluctuations that drive the transition become relatively smaller when larger numbers of molecules are involved. In contrast, when diffusion occurs at a finite rate, concentration fluctuations are local. The transition to life occurs in one local region and then spreads across the rest of the surface. The transition becomes easier with larger lattice sizes because there are more independent regions in which it could occur. The key observations that apply to our models and to the real world are that the origin of life is a rare stochastic event that is localized in one region of space due to the limited rate of diffusion of the molecules involved and that the subsequent spread across the surface is deterministic. It is likely that the time required for the deterministic spread is much shorter than the waiting time for the origin, in which case life evolves only once on a planet, and then rapidly occupies the whole surface.
Reviewed by Omer Markovitch (nominated by Doron Lancet), Claus Wilke, and Nobuto Takeuchi (nominated by Eugene Koonin).
PMCID: PMC3541068  PMID: 23176307
9.  Somatic transposition in the brain has the potential to influence the biosynthesis of metabolites involved in Parkinson’s disease and schizophrenia 
Biology Direct  2012;7:41.
It has recently been discovered that transposable elements show high activity in the brain of mammals; however, the magnitude of their influence on its functioning remains unclear. In this paper, I use flux balance analysis to examine the influence of somatic retrotransposition on brain metabolism, and the biosynthesis of its key metabolites, including neurotransmitters. The analysis shows that somatic transposition in the human brain can influence the biosynthesis of more than 250 metabolites, including dopamine, serotonin and glutamate, shows large inter-individual variability in metabolic effects, and may contribute to the development of Parkinson’s disease and schizophrenia.
This article was reviewed by Dr Kenji Kojima (nominated by Dr Jerzy Jurka) and Dr Eugene Koonin.
PMCID: PMC3534579  PMID: 23176288
Retrotransposition; Brain; Metabolic network; Parkinson’s disease; Schizophrenia
10.  Live virus-free or die: coupling of antivirus immunity and programmed suicide or dormancy in prokaryotes 
Biology Direct  2012;7:40.
The virus-host arms race is a major theater for evolutionary innovation. Archaea and bacteria have evolved diverse, elaborate antivirus defense systems that function on two general principles: i) immune systems that discriminate self DNA from nonself DNA and specifically destroy the foreign, in particular viral, genomes, whereas the host genome is protected, or ii) programmed cell suicide or dormancy induced by infection.
Presentation of the hypothesis
Almost all genomic loci encoding immunity systems such as CRISPR-Cas, restriction-modification and DNA phosphorothioation also encompass suicide genes, in particular those encoding known and predicted toxin nucleases, which do not appear to be directly involved in immunity. In contrast, the immunity systems do not appear to encode antitoxins found in typical toxin-antitoxin systems. This raises the possibility that components of the immunity system themselves act as reversible inhibitors of the associated toxin proteins or domains as has been demonstrated for the Escherichia coli anticodon nuclease PrrC that interacts with the PrrI restriction-modification system. We hypothesize that coupling of diverse immunity and suicide/dormancy systems in prokaryotes evolved under selective pressure to provide robustness to the antivirus response. We further propose that the involvement of suicide/dormancy systems in the coupled antivirus response could take two distinct forms:
1) induction of a dormancy-like state in the infected cell to ‘buy time’ for activation of adaptive immunity; 2) suicide or dormancy as the final recourse to prevent viral spread triggered by the failure of immunity.
Testing the hypothesis
This hypothesis entails many experimentally testable predictions. Specifically, we predict that Cas2 protein present in all cas operons is a mRNA-cleaving nuclease (interferase) that might be activated at an early stage of virus infection to enable incorporation of virus-specific spacers into the CRISPR locus or to trigger cell suicide when the immune function of CRISPR-Cas systems fails. Similarly, toxin-like activity is predicted for components of numerous other defense loci.
Implications of the hypothesis
The hypothesis implies that antivirus response in prokaryotes involves key decision-making steps at which the cell chooses the path to follow by sensing the course of virus infection.
This article was reviewed by Arcady Mushegian, Etienne Joly and Nick Grishin. For complete reviews, go to the Reviewers’ reports section.
PMCID: PMC3506569  PMID: 23151069
11.  ALOG domains: provenance of plant homeotic and developmental regulators from the DNA-binding domain of a novel class of DIRS1-type retroposons 
Biology Direct  2012;7:39.
Members of the Arabidopsis LSH1 and Oryza G1 (ALOG) family of proteins have been shown to function as key developmental regulators in land plants. However, their precise mode of action remains unclear. Using sensitive sequence and structure analysis, we show that the ALOG domains are a distinct version of the N-terminal DNA-binding domain shared by the XerC/D-like, protelomerase, topoisomerase-IA, and Flp tyrosine recombinases. ALOG domains are distinguished by the insertion of an additional zinc ribbon into this DNA-binding domain. In particular, we show that the ALOG domain is derived from the XerC/D-like recombinases of a novel class of DIRS-1-like retroposons. Copies of this element, which have been recently inactivated, are present in several marine metazoan lineages, whereas the stramenopile Ectocarpus retains an active copy. Thus, we predict that ALOG domains help establish organ identity and differentiation by binding specific DNA sequences and acting as transcription factors or recruiters of repressive chromatin. They are also found in certain plant defense proteins, where they are predicted to function as DNA sensors. The evolutionary history of the ALOG domain represents a unique instance of a domain, otherwise exclusively found in retroelements, being recruited as a specific transcription factor in the streptophyte lineage of plants. Hence, they add to the growing evidence for derivation of DNA-binding domains of eukaryote-specific TFs from mobile and selfish elements.
PMCID: PMC3537659  PMID: 23146749
DIRS1; Tyrosine recombinase; Plant development; DNA-binding; Retroposon; Transcription factor; Chromatin protein; Plant defense
12.  Early evolution of efficient enzymes and genome organization 
Biology Direct  2012;7:38.
Cellular life with complex metabolism probably evolved during the reign of RNA, when it served as both information carrier and enzyme. Jensen proposed that enzymes of primordial cells possessed broad specificities: they were generalist. When and under what conditions could primordial metabolism run by generalist enzymes evolve to contemporary-type metabolism run by specific enzymes?
Here we show by numerical simulation of an enzyme-catalyzed reaction chain that specialist enzymes spread after the invention of the chromosome because protocells harbouring unlinked genes maintain largely non-specific enzymes to reduce their assortment load. When genes are linked on chromosomes, high enzyme specificity evolves because it increases biomass production, also by reducing taxation by side reactions.
The constitution of the genetic system has a profound influence on the limits of metabolic efficiency. The major evolutionary transition to chromosomes is thus proven to be a prerequisite for a complex metabolism. Furthermore, the appearance of specific enzymes opens the door for the evolution of their regulation.
This article was reviewed by Sándor Pongor, Gáspár Jékely, and Rob Knight.
PMCID: PMC3534232  PMID: 23114029
Origin of life; Chromosome; Metabolism; Ribozyme; Major transitions; Enzyme evolution
13.  Thousands of missed genes found in bacterial genomes and their analysis with COMBREX 
Biology Direct  2012;7:37.
The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST.
By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX.
Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and many of the missed genes are short genes <300 bp. Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. 359 of these genes, found in pathogenic organisms, may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX’s website.
This article was reviewed by Daniel Haft, Arcady Mushegian, and M. Pilar Francino (nominated by David Ardell).
PMCID: PMC3534567  PMID: 23111013
14.  Distinct groups of repetitive families preserved in mammals correspond to different periods of regulatory innovations in vertebrates 
Biology Direct  2012;7:36.
Mammalian genomes are repositories of repetitive DNA sequences derived from transposable elements (TEs). Typically, TEs generate multiple, mostly inactive copies of themselves, commonly known as repetitive families or families of repeats. Recently, we proposed that families of TEs originate in small populations by genetic drift and that the origin of small subpopulations from larger populations can be fueled by biological innovations.
We report three distinct groups of repetitive families preserved in the human genome that expanded and declined during the three previously described periods of regulatory innovations in vertebrate genomes. The first group originated prior to the evolutionary separation of the mammalian and bird lineages and the second one during subsequent diversification of the mammalian lineages prior to the origin of eutherian lineages. The third group of families is primate-specific.
The observed correlation implies a relationship between regulatory innovations and the origin of repetitive families. Consistent with our previous hypothesis, it is proposed that regulatory innovations fueled the origin of new subpopulations in which new repetitive families became fixed by genetic drift.
This article was reviewed by Eugene Koonin, I. King Jordan, and Jürgen Brosius.
PMCID: PMC3500645  PMID: 23098210
Transposable elements; Conserved repeats; Genetic drift; Evolution
15.  Constructive neutral evolution: exploring evolutionary theory’s curious disconnect 
Biology Direct  2012;7:35.
Constructive neutral evolution (CNE) suggests that neutral evolution may follow a stepwise path to extravagance. Whether or not CNE is common, the mere possibility raises provocative questions about causation: in classical neo-Darwinian thinking, selection is the sole source of creativity and direction, the only force that can cause trends or build complex features. However, much of contemporary evolutionary genetics departs from the conception of evolution underlying neo-Darwinism, resulting in a widening gap between what formal models allow, and what the prevailing view of the causes of evolution suggests. In particular, a mutationist conception of evolution as a 2-step origin-fixation process has been a source of theoretical innovation for 40 years, appearing not only in the Neutral Theory, but also in recent breakthroughs in modeling adaptation (the “mutational landscape” model), and in practical software for sequence analysis. In this conception, mutation is not a source of raw materials, but an agent that introduces novelty, while selection is not an agent that shapes features, but a stochastic sieve. This view, which now lays claim to important theoretical, experimental, and practical results, demands our attention. CNE provides a way to explore its most significant implications about the role of variation in evolution.
Alex Kondrashov, Eugene Koonin and Johann Peter Gogarten reviewed this article.
PMCID: PMC3534586  PMID: 23062217
Evolutionary theory; Constructive neutral evolution; Neo-Darwinism; Mutation; Evolutionary genetics; Mutation bias; Modern Synthesis
16.  Proteorhodopsin genes in giant viruses 
Biology Direct  2012;7:34.
Viruses with large genomes encode numerous proteins that do not directly participate in virus biogenesis but rather modify key functional systems of infected cells. We report that a distinct group of giant viruses infecting unicellular eukaryotes that includes Organic Lake Phycodnaviruses and Phaeocystis globosa virus encode predicted proteorhodopsins that have not been previously detected in viruses. Search of metagenomic sequence data shows that putative viral proteorhodopsins are extremely abundant in marine environments. Phylogenetic analysis suggests that giant viruses acquired proteorhodopsins via horizontal gene transfer from proteorhodopsin-encoding protists although the actual donor(s) could not be presently identified. The pattern of conservation of the predicted functionally important amino acid residues suggests that viral proteorhodopsin homologs function as sensory rhodopsins. We hypothesize that viral rhodopsins modulate light-dependent signaling, in particular phototaxis, in infected protists.
This article was reviewed by Igor B. Zhulin and Lakshminarayan M. Iyer. For the full reviews, see the Reviewers’ reports section.
PMCID: PMC3500653  PMID: 23036091
17.  Stable feature selection and classification algorithms for multiclass microarray data 
Biology Direct  2012;7:33.
Recent studies suggest that gene expression profiles are a promising alternative for clinical cancer classification. One major problem in applying DNA microarrays for classification is the dimension of the obtained data sets. In this paper we propose a multiclass gene selection method based on Partial Least Squares (PLS) for selecting genes for classification. The new idea is to solve the multiclass selection problem with the PLS method and decomposition into a set of two-class sub-problems: one versus rest (OvR) and one versus one (OvO). We also use OvR and OvO two-class decomposition with another recently published gene selection method. Ranked gene lists are highly unstable in the sense that a small change in the data set often leads to big changes in the obtained ordered lists. In this paper, we assess the stability of the proposed methods. As classifiers, we use linear support vector machines (SVM) in different variants (one versus one, one versus rest, and multiclass SVM (MSVM)) and linear discriminant analysis (LDA). We use balanced bootstrap to estimate the prediction error and to test the variability of the obtained ordered lists.
This paper focuses on effective identification of informative genes. As a result, a new strategy to find a small subset of significant genes is designed. Our results on real multiclass cancer data show that our method has a very high accuracy rate for different combinations of classification methods, giving concurrently very stable feature rankings.
This paper shows that the proposed strategies can improve the performance of selected gene sets substantially. OvR and OvO techniques applied to existing gene selection methods improve results as well. The presented method makes it possible to obtain a more reliable classifier with less classifier error. At the same time, the method generates more stable ordered feature lists in comparison with existing methods.
This article was reviewed by Prof Marek Kimmel, Dr Hans Binder (nominated by Dr Tomasz Lipniacki) and Dr Yuriy Gusev.
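The one-versus-rest and one-versus-one decompositions named in this abstract can be sketched as follows. This is only an illustration of how a multiclass label vector is split into two-class sub-problems; it does not include the PLS-based gene ranking, and the function names `one_vs_rest` and `one_vs_one` and the return formats are assumptions made for this example.

```python
from itertools import combinations


def one_vs_rest(labels):
    """Map each class c to a binary label vector: 1 where the sample is c, else 0."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one(labels):
    """Map each class pair (a, b) to (sample indices, binary labels); 1 marks class a."""
    classes = sorted(set(labels))
    problems = {}
    for a, b in combinations(classes, 2):
        # Keep only the samples that belong to one of the two classes.
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        problems[(a, b)] = (idx, [1 if labels[i] == a else 0 for i in idx])
    return problems
```

For K classes, OvR yields K sub-problems that each use all samples, while OvO yields K(K-1)/2 sub-problems that each use only the samples of two classes. A two-class gene selection method can then be run on every sub-problem and the resulting rankings combined.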
PMCID: PMC3599581  PMID: 23031190
18.  Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage 
Biology Direct  2012;7:32.
The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, one used to link gene with function by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can be now used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC, Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important “missing genes” or “missing function” cases and illustrates the danger of functional annotation of protein families by homology alone.
Reviewers’ names
This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.
PMCID: PMC3541065  PMID: 23013770
Diphthamide; Vitamin B12; Amidotransferase; Comparative genomics
19.  Evasion of tumours from the control of the immune system: consequences of brief encounters 
Biology Direct  2012;7:31.
In this work a mathematical model describing the growth of a solid tumour in the presence of an immune system response is presented. Specifically, attention is focused on the interactions between cytotoxic T-lymphocytes (CTLs) and tumour cells in a small, avascular multicellular tumour. At this stage of the disease the CTLs and the tumour cells are considered to be in a state of dynamic equilibrium or cancer dormancy. The precise biochemical and cellular mechanisms by which CTLs can control a cancer and keep it in a dormant state are still not completely understood from a biological and immunological point of view. The mathematical model focuses on the spatio-temporal dynamics of tumour cells, immune cells, chemokines and “chemorepellents” in an immunogenic tumour. The CTLs and tumour cells are assumed to migrate and interact with each other in such a way that lymphocyte-tumour cell complexes are formed. These complexes result in either the death of the tumour cells (the normal situation) or the inactivation of the lymphocytes and consequently the survival of the tumour cells. In the latter case, we assume that each tumour cell that survives its “brief encounter” with the CTLs undergoes certain beneficial phenotypic changes.
We explore the dynamics of the model under these assumptions and show that the process of immuno-evasion can arise as a consequence of these encounters. We show that the proposed mechanism shapes not only the dynamics of the total number of tumour cells and of CTLs, but also the dynamics of their spatial distribution. We also briefly discuss the evolutionary features of our model by framing them within recent quasi-Lamarckian theories.
Our findings might have interesting implications for clinical practice. Indeed, the immuno-editing process can be seen as an “involuntary” antagonistic process acting against immunotherapies, which aim at maintaining a tumour in a dormant state or at suppressing it.
This article was reviewed by G. Bocharov (nominated by V. Kuznetsov, member of the Editorial Board of Biology Direct), M. Kimmel and A. Marciniak-Czochra.
PMCID: PMC3582466  PMID: 23009638
Tumour growth; Immune response; Cytotoxic T-lymphocytes; Immuno-evasion; Mathematical models; Chemotaxis; Diffusion; Immuno-editing
20.  Stop codons in bacteria are not selectively equivalent 
Biology Direct  2012;7:30.
The evolution and genomic frequencies of stop codons have not been rigorously studied, with the exception of the coding of non-canonical amino acids. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes.
We show that in bacteria stop codons evolve slower than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutations we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being optimal for G-content < 16% while for G-content > 16% TGA has a higher fitness than TAG.
Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications.
This article was reviewed by Michail Gelfand, Arcady Mushegian and Shamil Sunyaev. For the full reviews, please go to the Reviewers’ Comments section.
PMCID: PMC3549826  PMID: 22974057
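As a rough illustration of the kind of tabulation this abstract refers to (stop codon frequencies set against genomic GC-content), the sketch below counts terminal stop codons in a list of coding sequences and computes the pooled GC fraction. It is not the authors' analytical model; the function names and the toy sequences are hypothetical and chosen only for the example.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")  # standard bacterial stop codons


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def stop_codon_usage(cds_list):
    """Fraction of coding sequences that end in each stop codon.

    Sequences whose last three bases are not a stop codon are ignored.
    """
    counts = Counter()
    for cds in cds_list:
        codon = cds[-3:].upper()
        if codon in STOP_CODONS:
            counts[codon] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in STOP_CODONS}


# Toy "genome": four made-up coding sequences (assumption, not real data).
toy_genome = ["ATGGCCTAA", "ATGAAATGA", "ATGCCCTAA", "ATGGGGTAG"]
usage = stop_codon_usage(toy_genome)
genome_gc = gc_content("".join(toy_genome))
print(usage, round(genome_gc, 3))
```

Repeating this per genome would give one (GC-content, stop codon frequency) point per genome, which is the raw relationship the abstract describes before any model is fitted.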
21.  Novel and unexpected bacterial diversity in an arsenic-rich ecosystem revealed by culture-dependent approaches 
Biology Direct  2012;7:28.
Acid Mine Drainages (AMDs) are extreme environments characterized by very acidic conditions and heavy metal contamination. In these ecosystems, the bacterial diversity is considered to be low. Previous culture-independent approaches performed in the AMD of Carnoulès (France) confirmed this low species richness. However, very little is known about the cultured bacteria in this ecosystem. The aims of the study were firstly to apply novel culture methods in order to access the largest possible cultured bacterial diversity, and secondly to better define the robustness of the community for 3 important functions: As(III) oxidation, cellulose degradation and cobalamin biosynthesis.
Despite the oligotrophic and acidic conditions found in AMDs, the newly designed media covered a large range of nutrient concentrations and a pH range from 3.5 to 9.8, in order to target also non-acidophilic bacteria. These approaches generated 49 isolates representing 19 genera belonging to 4 different phyla. Importantly, overall diversity gained 16 extra genera never detected in Carnoulès. Among the 19 genera, 3 were previously uncultured, one of them being novel in databases. This strategy increased the overall diversity in the Carnoulès sediment by 70% when compared with previous culture-independent approaches, as specific phylogenetic groups (e.g. the subclass Actinobacteridae or the order Rhizobiales) were only detected by culture. Cobalamin auxotrophy, cellulose degradation and As(III)-oxidation are 3 crucial functions in this ecosystem, and a previous meta- and proteo-genomic work attributed each function to only one taxon. Here, we demonstrate that other members of this community can also assume these functions, thus increasing the overall community robustness.
This work highlights that bacterial diversity in AMDs is much higher than previously envisaged, thus pointing out that the AMD system is functionally more robust than expected. The isolated bacteria may be part of the rare biosphere which remained previously undetected due to molecular biases. Whatever their current ecological relevance, the exploration of the full diversity remains crucial to decipher the function and dynamics of any community. This work also underlines the importance of combining culture-dependent and -independent approaches to gain an integrative view of the community function.
This paper was reviewed by Sándor Pongor, Eugene V. Koonin and Brett Baker (nominated by Purificacion Lopez-Garcia).
PMCID: PMC3443666  PMID: 22963335
Acid mine drainage (AMD); Alkaliphilic bacteria; Neutrophilic bacteria; Functional redundancy; Rare biosphere; Uncultured bacteria; Molecular biases; Culture-dependent approaches; Actinobacteria; Bacterial diversity
22.  Identifying the mechanisms of intron gain: progress and trends 
Biology Direct  2012;7:29.
Continued improvements in Next-Generation DNA/RNA sequencing coupled with advances in gene annotation have provided researchers access to a plethora of annotated genomes. Subsequent analyses of orthologous gene structures have identified numerous intron gain and loss events that have occurred both recently and in the very distant past. This research has afforded exceptional insight into the temporal and lineage-specific rates of intron gain and loss among various species throughout evolution. Numerous studies have also attempted to identify the molecular mechanisms of intron gain and loss. However, even after considerable effort, very little is known about these processes. In particular, the mechanism(s) of intron gain have proven exceptionally enigmatic and remain topics of considerable debate. Currently, there exists no definitive consensus as to what mechanism(s) may generate introns. Because many introns are known to affect gene expression, it is necessary to understand the molecular process(es) by which introns may be gained. Here we review the seven most commonly purported mechanisms of intron gain and, when possible, summarize molecular evidence for or against the occurrence of each of these mechanisms. Furthermore, we catalogue indirect evidence that supports the occurrence of each mechanism. Finally, because these proposed mechanisms fail to explain the mechanistic origin of many recently gained introns, we also look at trends that may aid researchers in identifying other potential mechanism(s) of intron gain.
This article was reviewed by Eugene Koonin, Scott Roy (nominated by W. Ford Doolittle), and John Logsdon.
PMCID: PMC3443670  PMID: 22963364
Intron; Intron gain; Intron evolution; Gene structure; Evolution; Mechanism
23.  Does the central dogma still stand? 
Biology Direct  2012;7:27.
Prions are agents of analog, protein conformation-based inheritance that can confer beneficial phenotypes to cells, especially under stress. Combined with genetic variation, prion-mediated inheritance can be channeled into prion-independent genomic inheritance. Latest screening shows that prions are common, at least in fungi. Thus, there is non-negligible flow of information from proteins to the genome in modern cells, in a direct violation of the Central Dogma of molecular biology. The prion-mediated heredity that violates the Central Dogma appears to be a specific, most radical manifestation of the widespread assimilation of protein (epigenetic) variation into genetic variation. The epigenetic variation precedes and facilitates genetic adaptation through a general ‘look-ahead effect’ of phenotypic mutations. This direction of the information flow is likely to be one of the important routes of environment-genome interaction and could substantially contribute to the evolution of complex adaptive traits.
This article was reviewed by Jerzy Jurka, Pierre Pontarotti and Juergen Brosius. For the complete reviews, see the Reviewers’ Reports section.
PMCID: PMC3472225  PMID: 22913395
24.  Modeling RNA polymerase interaction in mitochondria of chordates 
Biology Direct  2012;7:26.
In previous work, we introduced a concept, a mathematical model and its computer realization that describe the interaction between bacterial and phage type RNA polymerases, protein factors, DNA and RNA secondary structures during transcription, including transcription initiation and termination. The model accurately reproduces changes of gene transcription level observed in polymerase sigma-subunit knockout and heat shock experiments in plant plastids. The corresponding computer program and a user guide are available at Here we apply the model to the analysis of transcription and (partially) translation processes in the mitochondria of frog, rat and human. Notably, mitochondria possess only phage-type polymerases. We consider the entire mitochondrial genome so that our model allows RNA polymerases to complete more than one circle on the DNA strand.
Our model of RNA polymerase interaction during transcription initiation and elongation accurately reproduces experimental data obtained for plastids. Moreover, it also reproduces evidence on bulk RNA concentrations and RNA half-lives in the mitochondria of frog, human with or without the MELAS mutation, and rat with normal (euthyroid) or hyposecretion of thyroid hormone (hypothyroid). The transcription characteristics predicted by the model include: (i) the fraction of polymerases terminating at a protein-dependent terminator in both directions (the terminator polarization), (ii) the binding intensities of the regulatory protein factor (mTERF) with the termination site, and (iii) the transcription initiation intensities (initiation frequencies) of all promoters in all five conditions (frog, healthy human, human with MELAS syndrome, healthy rat, and hypothyroid rat with aberrant mtDNA methylation). Using the model, absolute levels of all gene transcription can be inferred from an arbitrary array of the three transcription characteristics, whereas for selected genes only relative RNA concentrations have been experimentally determined. Conversely, these characteristics and absolute transcription levels can be obtained using relative RNA concentrations and RNA half-lives known from various experimental studies. In this case, the “inverse problem” is solved with multi-objective optimization.
In this study, we demonstrate that our model accurately reproduces all relevant experimental data available for plant plastids, as well as the mitochondria of chordates. Using experimental data, the model is applied to estimate binding intensities of phage-type RNA polymerases to their promoters as well as to predict terminator characteristics, including polarization. In addition, one can predict characteristics of phage-type RNA polymerases and the transcription process that are difficult to measure directly, e.g., the association between the promoter’s nucleotide composition and the intensity of polymerase binding. To illustrate the application of our model in functional predictions, we propose a possible mechanism for MELAS syndrome development in humans involving a decrease of Phe-tRNA, Val-tRNA and rRNA concentrations in the cell. In addition, we describe how changes in methylation patterns of the mTERF binding site and three promoters in hypothyroid rat correlate with changes in intensities of the mTERF binding and transcription initiations. Finally, we introduce an auxiliary model to describe the interaction between polysomal mRNA and ribonucleases.
PMCID: PMC3583402  PMID: 22873568
RNA polymerase interaction; RNA polymerase competition; Transcription; Circular DNA; mtDNA in chordates; MELAS syndrome; Impact of DNA methylation; Hyposecretion of hormones; RNA interaction model; Polysome and ribonuclease interaction model
25.  Integrative transcriptome analysis suggest processing of a subset of long non-coding RNAs to small RNAs 
Biology Direct  2012;7:25.
The availability of sequencing technology has enabled understanding of transcriptomes through genome-wide approaches including RNA-sequencing. Contrary to the previous assumption that large tracts of the eukaryotic genomes are not transcriptionally active, recent evidence from transcriptome sequencing approaches has revealed pervasive transcription in many genomes of higher eukaryotes. Many of these loci encode transcripts that have no obvious protein-coding potential and are designated as non-coding RNA (ncRNA). Non-coding RNAs are classified empirically as small and long non-coding RNAs based on the size of the functional RNAs. Each of these classes is further classified into functional subclasses. Although microRNAs (miRNA), one of the major subclasses of ncRNAs, have been extensively studied for their roles in regulation of gene expression and involvement in a large number of patho-physiological processes, the functions of a large proportion of long non-coding RNAs (lncRNA) still remain elusive. We hypothesized that some lncRNAs could potentially be processed to small RNA and thus could have a dual regulatory output.
Integration of large-scale independent experimental datasets in the public domain revealed that certain well studied lncRNAs harbor small RNA clusters. Expression analysis of the small RNA clusters in different tissue and cell types reveals that they are differentially regulated, suggesting a regulated biogenesis mechanism.
Our analysis suggests the existence of a potentially novel pathway for lncRNA processing into small RNAs. Expression analysis further suggests that this pathway is regulated. We argue that this evidence supports our hypothesis, though limitations of the datasets and analysis cannot completely rule out alternate possibilities. Further in-depth experimental verification of the observation could potentially reveal a novel pathway for biogenesis.
This article was reviewed by Dr Rory Johnson (nominated by Fyodor Kondrashov), Dr Raya Khanin (nominated by Dr Yuriy Gusev) and Prof Neil Smalheiser. For full reviews, please go to the Reviewer’s comment section.
PMCID: PMC3477000  PMID: 22871084
