1.  Description of plant tRNA-derived RNA fragments (tRFs) associated with argonaute and identification of their putative targets 
Biology Direct  2013;8:6.
tRNA-derived RNA fragments (tRFs) are 19mer small RNAs that associate with Argonaute (AGO) proteins in humans. However, in plants, it is unknown if tRFs bind with AGO proteins. Here, using public deep sequencing libraries of immunoprecipitated Argonaute proteins (AGO-IP) and bioinformatics approaches, we identified the Arabidopsis thaliana AGO-IP tRFs. Moreover, using three degradome deep sequencing libraries, we identified four putative tRF targets. The expression pattern of tRFs, based on deep sequencing data, was also analyzed under abiotic and biotic stresses. The results obtained here represent a useful starting point for future studies on tRFs in plants.
doi:10.1186/1745-6150-8-6
PMCID: PMC3574835  PMID: 23402430
tRNAs; Small RNA; tRFs; tRNA-derived RNA fragments; Argonaute and Arabidopsis
2.  GABBR1 has a HERV-W LTR in its regulatory region – a possible implication for schizophrenia 
Biology Direct  2013;8:5.
Abstract
Schizophrenia is a complex disease with uncertain aetiology. We suggest that GABBR1 (GABA receptor B1) is implicated in schizophrenia, based on a HERV-W LTR in the regulatory region of GABBR1. Our hypothesis is supported by the following: (i) GABBR1 is in the 6p22 genomic region most often implicated in schizophrenia; (ii) microarray studies found that only presynaptic pathway-related genes, including GABA receptors, have altered expression in schizophrenic patients; and (iii) it explains how HERV-W elements, expressed in schizophrenia, play a role in the disease: by altering the expression of GABBR1 via a long terminal repeat that is also a regulatory element of GABBR1.
Reviewers
This paper was reviewed by Sandor Pongor and Martijn Huynen.
doi:10.1186/1745-6150-8-5
PMCID: PMC3574838  PMID: 23391219
Schizophrenia; Human endogenous retrovirus; HERV-W; long terminal repeat; LTR; GABA; GABBR1; GABA receptor; Enhancer; Silencer
3.  Surprisingly high number of Twintrons in vertebrates 
Biology Direct  2013;8:4.
Twintrons represent a special intronic arrangement in which introns of two different types occupy the same gene position. Consequently, alternative splicing of these introns requires two different spliceosomes competing for the same RNA molecule. So far, only two twintrons have been described in insects. Surprisingly, we discovered several such arrangements in vertebrate genomes, which are quite conserved throughout the lineages.
Reviewers
This article was reviewed by Fyodor Kondrashov and Eugene Koonin.
doi:10.1186/1745-6150-8-4
PMCID: PMC3564746  PMID: 23356793
Twintrons; Vertebrate genomes; Gene expression
4.  Next-generation phylogenomics 
Biology Direct  2013;8:3.
Abstract
Thanks to advances in next-generation technologies, genome sequences are now being generated at breadth (e.g. across environments) and depth (thousands of closely related strains, individuals or samples) unimaginable only a few years ago. Phylogenomics – the study of evolutionary relationships based on comparative analysis of genome-scale data – has so far been developed as industrial-scale molecular phylogenetics, proceeding in the two classical steps: multiple alignment of homologous sequences, followed by inference of a tree (or multiple trees). However, the algorithms typically employed for these steps scale poorly with number of sequences, such that for an increasing number of problems, high-quality phylogenomic analysis is (or soon will be) computationally infeasible. Moreover, next-generation data are often incomplete and error-prone, and analysis may be further complicated by genome rearrangement, gene fusion and deletion, lateral genetic transfer, and transcript variation. Here we argue that next-generation data require next-generation phylogenomics, including so-called alignment-free approaches.
Reviewers
Reviewed by Mr Alexander Panchin (nominated by Dr Mikhail Gelfand), Dr Eugene Koonin and Prof Peter Gogarten. For the full reviews, please go to the Reviewers’ comments section.
doi:10.1186/1745-6150-8-3
PMCID: PMC3564786  PMID: 23339707
Phylogenomics; Multiple sequence alignment; Alignment-free methods; k-mers; Homology signal
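As a rough illustration of the alignment-free, k-mer-based comparison that the abstract above argues for, the following minimal sketch computes a distance between two sequences from their k-mer count profiles. The function names, the choice of cosine distance, and the default k=3 are assumptions made for illustration; the article itself does not prescribe a particular measure.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(seq_a, seq_b, k=3):
    """Alignment-free distance: 1 - cosine similarity of k-mer count vectors.

    Illustrative choice of measure; returns 1.0 when either sequence
    is too short to contain any k-mer.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only k-mers present in both profiles contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)
```

No multiple alignment is built at any point: each sequence is reduced to a count vector in a single pass, so the cost per pairwise comparison grows roughly linearly with sequence length.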
5.  Microevolutionary, macroevolutionary, ecological and taxonomical implications of punctuational theories of adaptive evolution 
Biology Direct  2013;8:1.
Abstract
Punctuational theories of evolution suggest that adaptive evolution proceeds mostly, or even entirely, in the distinct periods of existence of a particular species. The mechanisms of this punctuated nature of evolution suggested by the various theories differ. Therefore the predictions of particular theories concerning various evolutionary phenomena also differ.
Punctuational theories can be subdivided into five classes, which differ in their mechanism and their evolutionary and ecological implications. For example, the transilience model of Templeton (class III), the genetic revolution model of Mayr (class IV) or the frozen plasticity theory of Flegr (class V) suggests that adaptive evolution in sexual species is operative shortly after the emergence of a species by peripatric speciation – while it is evolutionarily plastic. To a major degree, i.e. throughout 98-99% of their existence, sexual species are evolutionarily frozen (class III) or elastic (class IV and V) on a microevolutionary time scale and evolutionarily frozen on a macroevolutionary time scale and can only wait for extinction, or the highly improbable return of a population segment to the plastic state due to peripatric speciation.
The punctuational theories have many evolutionary and ecological implications. Most of these predictions could be tested empirically, and should be analyzed in greater depth theoretically. The punctuational theories offer many new predictions that need to be tested, but also provide explanations for a much broader spectrum of known biological phenomena than classical gradualistic evolutionary theories.
Reviewers
This article was reviewed by Claus Wilke, Pierre Pontarotti and David Penny (nominated by Anthony Poole).
doi:10.1186/1745-6150-8-1
PMCID: PMC3564765  PMID: 23324625
Speciation; Frozen plasticity; Frozen evolution; Peripatric speciation; Invasive species; Domestication; Asexual species; Genetic draft; Genetic hitchhiking; Advantage of sex; Evolutionary trends; Dead clade walking; Cambrian explosion; Origin of genera; Taxonomy
6.  Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability 
Biology Direct  2013;8:2.
Background
Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation.
Results
In order to complement existing EST resources we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias), which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins supported higher CvP-bias as a measure of thermal stability when compared to their mesophilic orthologs. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals.
Conclusions
Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation.
Reviewers
This article was reviewed by Sándor Pongor, L. Aravind and Anthony M. Poole.
doi:10.1186/1745-6150-8-2
PMCID: PMC3564776  PMID: 23324115
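The CvP-bias described in the abstract above is the difference between the frequency of charged residues and the frequency of polar residues. A minimal sketch of that calculation follows. The residue sets used here (charged: D, E, K, R; polar: N, Q, S, T) follow a commonly used definition and are an assumption, since the abstract does not list them; the function name is likewise hypothetical.

```python
# Assumed residue classes; the abstract does not spell these out.
CHARGED = set("DEKR")  # Asp, Glu, Lys, Arg
POLAR = set("NQST")    # Asn, Gln, Ser, Thr


def cvp_bias(protein):
    """Return (% charged residues) - (% polar residues) for one sequence."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    charged = sum(aa in CHARGED for aa in residues)
    polar = sum(aa in POLAR for aa in residues)
    return 100.0 * (charged - polar) / len(residues)
```

For example, `cvp_bias("DDNA")` has two charged and one polar residue out of four, giving 25.0. A proteome-wide value would pool the residues of all proteins, or average the per-protein values, before comparing organisms.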
7.  Detection of a common chimeric transcript between human chromosomes 7 and 16 
Biology Direct  2012;7:49.
Abstract
Interchromosomal chimeric RNA molecules are often transcription products from genomic rearrangement in cancerous cells. Here we report the computational detection of an interchromosomal RNA fusion between ZC3HAV1L and CHMP1A from RNA-seq data of normal human mammary epithelial cells, and experimental confirmation of the chimeric transcript in multiple human cells and tissues. Our experimental characterization also detected three variants of the ZC3HAV1L-CHMP1A chimeric RNA, suggesting that these genes are involved in complex splicing. The fusion sequence at the novel exon-exon boundary, and the absence of corresponding DNA rearrangement suggest that this chimeric RNA is likely produced by trans-splicing in human cells.
Reviewers
This article was reviewed by Rory Johnson (nominated by Fyodor Kondrashov); Gal Avital and Itai Yanai
doi:10.1186/1745-6150-7-49
PMCID: PMC3538553  PMID: 23273016
Chimeric transcripts; RNA fusion; trans-splicing; Genome rearrangement
8.  Cubic time algorithms of amalgamating gene trees and building evolutionary scenarios 
Biology Direct  2012;7:48.
Background
A long recognized problem is the inference of the supertree S that amalgamates a given set {Gj} of trees Gj, with leaves in each Gj being assigned homologous elements.
We build on an approach that finds the tree S by minimizing the total cost of mappings αj of individual gene trees Gj into S. Traditionally, this cost is defined as a sum of duplications and gaps in each αj. The classical problem is to minimize the total cost, where S runs over the set of all trees that contain an exhaustive non-redundant set of species from all input Gj.
Results
We suggest a reformulation of the classical NP-hard problem of building a supertree in terms of the global minimization of the same cost functional but only over species trees S that consist of clades belonging to a fixed set P (e.g., an exhaustive set of clades in all Gj). We developed a deterministic solving algorithm with a low degree polynomial (typically cubic) time complexity with respect to the size of input data.
We define an extensive set of elementary evolutionary events and suggest an original definition of mapping β of tree G into tree S. We introduce the cost functional c(G, S, f ) and define the mapping β as the global minimum of this functional with respect to the variable f, in which sense it is a generalization of classical mapping α.
We suggest a reformulation of the classical NP-hard mapping (reconciliation) problem by introducing time slices into the species tree S and present a cubic time solving algorithm to compute the mapping β. We introduce two novel definitions of the evolutionary scenario based on mapping β or a random process of gene evolution along a species tree.
Conclusions
The developed algorithms are mathematically proven, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given set of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mapping β, the minimal total cost and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of tree S.
The algorithms and their effective software implementations provide useful tools in many biological studies. They facilitate processing of voluminous tree data in acceptable time while largely avoiding heuristics. Performance of the tools is tested with artificial and prokaryotic tree data.
Reviewers
This article was reviewed by Prof. Anthony Almudevar, Prof. Alexander Bolshoy (nominated by Prof. Peter Olofsson), and Prof. Marek Kimmel.
doi:10.1186/1745-6150-7-48
PMCID: PMC3577452  PMID: 23259766
Phylogenetics; Fast algorithms; Tree inference; Species tree; Tree amalgamation; Tree reconciliation; Supertree; Evolutionary events; Gene duplication; Gene loss; Horizontal gene transfer; Gene gain; Time slices
9.  AID/APOBEC cytosine deaminase induces genome-wide kataegis 
Biology Direct  2012;7:47.
Clusters of localized hypermutation in human breast cancer genomes, named “kataegis” (from the Greek for thunderstorm), are hypothesized to result from multiple cytosine deaminations catalyzed by AID/APOBEC proteins. However, a direct link between APOBECs and kataegis is still lacking. We have sequenced the genomes of yeast mutants induced in diploids by expression of the gene for PmCDA1, a hypermutagenic deaminase from sea lamprey. Analysis of the distribution of 5,138 induced mutations revealed localized clusters very similar to those found in tumors. Our data provide evidence that unleashed cytosine deaminase activity is an evolutionarily conserved, prominent source of genome-wide kataegis events.
Reviewers
This article was reviewed by: Professor Sandor Pongor, Professor Shamil R. Sunyaev, and Dr Vladimir Kuznetsov.
doi:10.1186/1745-6150-7-47
PMCID: PMC3542020  PMID: 23249472
APOBEC; Deaminase; Mutation; Kataegis; Cancer; Diploid yeast; Hypermutation
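The localized mutation clusters mentioned in the abstract above can be illustrated with a simple distance-based grouping of mutation positions on one chromosome. This is a sketch under stated assumptions, not the authors' method: the function name, the 1,000 bp gap, and the minimum of five mutations per cluster are illustrative values that do not come from the article.

```python
def find_clusters(positions, max_gap=1000, min_mutations=5):
    """Group mutation positions from a single chromosome into clusters.

    Consecutive mutations stay in the same cluster while the distance
    between neighbours is at most max_gap; clusters with fewer than
    min_mutations members are discarded. Both thresholds are
    illustrative assumptions.
    Returns a list of (start, end, mutation_count) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap closes the running cluster.
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_mutations:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final cluster after the loop.
    if len(current) >= min_mutations:
        clusters.append((current[0], current[-1], len(current)))
    return clusters
```

Calling `find_clusters([100, 300, 500, 700, 900, 50000, 90000])` reports one cluster spanning positions 100 to 900 with five mutations; the two isolated mutations are ignored.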
10.  Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer 
Biology Direct  2012;7:46.
Background
Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.
Results
The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and gene gain and loss events in archaeal evolution. This reconstruction shows that the last Common Ancestor of the extant Archaea was an organism of greater complexity than most of the extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major ‘highways’ of horizontal gene transfer.
Conclusions
The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Given that, in spite of the major increase in the number of genomes, the conserved core of archaeal genes appears to be stabilizing, the major evolutionary trends revealed here have a chance to stand the test of time.
Reviewers
This article was reviewed by (for complete reviews see the Reviewers’ Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).
doi:10.1186/1745-6150-7-46
PMCID: PMC3534625  PMID: 23241446
Archaea; Orthologs; Horizontal gene transfer
11.  Ethanolamine utilization in Vibrio alginolyticus 
Biology Direct  2012;7:45.
Abstract
Ethanolamine is used as an energy source by phylogenetically diverse bacteria including pathogens, by the concerted action of proteins from the eut-operon. Previous studies have revealed the presence of eutBC genes encoding ethanolamine-ammonia lyase, a key enzyme that breaks ethanolamine into acetaldehyde and ammonia, in about 100 bacterial genomes including members of gamma-proteobacteria. However, ethanolamine utilization has not been reported for any member of the Vibrio genus. Our comparative genomics study reveals the presence of genes that are involved in ethanolamine utilization in several Vibrio species. Using Vibrio alginolyticus as a model system we demonstrate that ethanolamine is better utilized as a nitrogen source than as a carbon source.
Reviewers
This article was reviewed by Dr. Lakshminarayan Iyer and Dr. Vivek Anantharaman (nominated by Dr. L Aravind).
doi:10.1186/1745-6150-7-45
PMCID: PMC3542024  PMID: 23234435
Pathogenesis; Ethanolamine; Vibrio; Eut operon; Metabolosome
12.  Bioinformatics clouds for big data manipulation 
Biology Direct  2012;7:43.
Abstract
As advances in life sciences and information technology bring profound influences on bioinformatics due to its interdisciplinary nature, bioinformatics is experiencing a new leap-forward from in-house computing infrastructure into utility-supplied cloud computing delivered over the Internet, in order to handle the vast quantities of biological data generated by high-throughput experimental technologies. Albeit relatively new, cloud computing promises to address big data storage and analysis issues in the bioinformatics field. Here we review extant cloud-based services in bioinformatics, classify them into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS), and present our perspectives on the adoption of cloud computing in bioinformatics.
Reviewers
This article was reviewed by Frank Eisenhaber, Igor Zhulin, and Sandor Pongor.
doi:10.1186/1745-6150-7-43
PMCID: PMC3533974  PMID: 23190475
Cloud computing; Bioinformatics; Big data; Data storage; Data analysis
13.  The origin of life is a spatially localized stochastic transition 
Biology Direct  2012;7:42.
Background
Life depends on biopolymer sequences as catalysts and as genetic material. A key step in the Origin of Life is the emergence of an autocatalytic system of biopolymers. Here we study computational models that address the way a living autocatalytic system could have emerged from a non-living chemical system, as envisaged in the RNA World hypothesis.
Results
We consider (i) a chemical reaction system describing RNA polymerization, and (ii) a simple model of catalytic replicators that we call the Two’s Company model. Both systems have two stable states: a non-living state, characterized by a slow spontaneous rate of RNA synthesis, and a living state, characterized by rapid autocatalytic RNA synthesis. The origin of life is a transition between these two stable states. The transition is driven by stochastic concentration fluctuations involving relatively small numbers of molecules in a localized region of space. These models are simulated on a two-dimensional lattice in which reactions occur locally on single sites and diffusion occurs by hopping of molecules to neighbouring sites.
Conclusions
If diffusion is very rapid, the system is well-mixed. The transition to life becomes increasingly difficult as the lattice size is increased because the concentration fluctuations that drive the transition become relatively smaller when larger numbers of molecules are involved. In contrast, when diffusion occurs at a finite rate, concentration fluctuations are local. The transition to life occurs in one local region and then spreads across the rest of the surface. The transition becomes easier with larger lattice sizes because there are more independent regions in which it could occur. The key observations that apply to our models and to the real world are that the origin of life is a rare stochastic event that is localized in one region of space due to the limited rate of diffusion of the molecules involved and that the subsequent spread across the surface is deterministic. It is likely that the time required for the deterministic spread is much shorter than the waiting time for the origin, in which case life evolves only once on a planet, and then rapidly occupies the whole surface.
Reviewers
Reviewed by Omer Markovitch (nominated by Doron Lancet), Claus Wilke, and Nobuto Takeuchi (nominated by Eugene Koonin).
doi:10.1186/1745-6150-7-42
PMCID: PMC3541068  PMID: 23176307
14.  Somatic transposition in the brain has the potential to influence the biosynthesis of metabolites involved in Parkinson’s disease and schizophrenia 
Biology Direct  2012;7:41.
Abstract
It has recently been discovered that transposable elements show high activity in the brain of mammals; however, the magnitude of their influence on its functioning is so far unclear. In this paper, I use flux balance analysis to examine the influence of somatic retrotransposition on brain metabolism, and the biosynthesis of its key metabolites, including neurotransmitters. The analysis shows that somatic transposition in the human brain can influence the biosynthesis of more than 250 metabolites, including dopamine, serotonin and glutamate, shows large inter-individual variability in metabolic effects, and may contribute to the development of Parkinson’s disease and schizophrenia.
Reviewers
This article was reviewed by Dr Kenji Kojima (nominated by Dr Jerzy Jurka) and Dr Eugene Koonin.
doi:10.1186/1745-6150-7-41
PMCID: PMC3534579  PMID: 23176288
Retrotransposition; Brain; Metabolic network; Parkinson’s disease; Schizophrenia
15.  Live virus-free or die: coupling of antivirus immunity and programmed suicide or dormancy in prokaryotes 
Biology Direct  2012;7:40.
Background
The virus-host arms race is a major theater for evolutionary innovation. Archaea and bacteria have evolved diverse, elaborate antivirus defense systems that function on two general principles: i) immune systems that discriminate self DNA from nonself DNA and specifically destroy the foreign, in particular viral, genomes, whereas the host genome is protected, or ii) programmed cell suicide or dormancy induced by infection.
Presentation of the hypothesis
Almost all genomic loci encoding immunity systems such as CRISPR-Cas, restriction-modification and DNA phosphorothioation also encompass suicide genes, in particular those encoding known and predicted toxin nucleases, which do not appear to be directly involved in immunity. In contrast, the immunity systems do not appear to encode antitoxins found in typical toxin-antitoxin systems. This raises the possibility that components of the immunity system themselves act as reversible inhibitors of the associated toxin proteins or domains as has been demonstrated for the Escherichia coli anticodon nuclease PrrC that interacts with the PrrI restriction-modification system. We hypothesize that coupling of diverse immunity and suicide/dormancy systems in prokaryotes evolved under selective pressure to provide robustness to the antivirus response. We further propose that the involvement of suicide/dormancy systems in the coupled antivirus response could take two distinct forms:
1) induction of a dormancy-like state in the infected cell to ‘buy time’ for activation of adaptive immunity; 2) suicide or dormancy as the final recourse to prevent viral spread triggered by the failure of immunity.
Testing the hypothesis
This hypothesis entails many experimentally testable predictions. Specifically, we predict that Cas2 protein present in all cas operons is an mRNA-cleaving nuclease (interferase) that might be activated at an early stage of virus infection to enable incorporation of virus-specific spacers into the CRISPR locus or to trigger cell suicide when the immune function of CRISPR-Cas systems fails. Similarly, toxin-like activity is predicted for components of numerous other defense loci.
Implications of the hypothesis
The hypothesis implies that antivirus response in prokaryotes involves key decision-making steps at which the cell chooses the path to follow by sensing the course of virus infection.
Reviewers
This article was reviewed by Arcady Mushegian, Etienne Joly and Nick Grishin. For complete reviews, go to the Reviewers’ reports section.
doi:10.1186/1745-6150-7-40
PMCID: PMC3506569  PMID: 23151069
16.  ALOG domains: provenance of plant homeotic and developmental regulators from the DNA-binding domain of a novel class of DIRS1-type retroposons 
Biology Direct  2012;7:39.
Members of the Arabidopsis LSH1 and Oryza G1 (ALOG) family of proteins have been shown to function as key developmental regulators in land plants. However, their precise mode of action remains unclear. Using sensitive sequence and structure analysis, we show that the ALOG domains are a distinct version of the N-terminal DNA-binding domain shared by the XerC/D-like, protelomerase, topoisomerase-IA, and Flp tyrosine recombinases. ALOG domains are distinguished by the insertion of an additional zinc ribbon into this DNA-binding domain. In particular, we show that the ALOG domain is derived from the XerC/D-like recombinases of a novel class of DIRS-1-like retroposons. Copies of this element, which have been recently inactivated, are present in several marine metazoan lineages, whereas the stramenopile Ectocarpus retains an active copy of the same. Thus, we predict that ALOG domains help establish organ identity and differentiation by binding specific DNA sequences and acting as transcription factors or recruiters of repressive chromatin. They are also found in certain plant defense proteins, where they are predicted to function as DNA sensors. The evolutionary history of the ALOG domain represents a unique instance of a domain, otherwise exclusively found in retroelements, being recruited as a specific transcription factor in the streptophyte lineage of plants. Hence, they add to the growing evidence for derivation of DNA-binding domains of eukaryotic specific TFs from mobile and selfish elements.
doi:10.1186/1745-6150-7-39
PMCID: PMC3537659  PMID: 23146749
DIRS1; Tyrosine recombinase; Plant development; DNA-binding; Retroposon; Transcription factor; Chromatin protein; Plant defense
17.  Early evolution of efficient enzymes and genome organization 
Biology Direct  2012;7:38.
Background
Cellular life with complex metabolism probably evolved during the reign of RNA, when it served as both information carrier and enzyme. Jensen proposed that enzymes of primordial cells possessed broad specificities: they were generalists. When and under what conditions could primordial metabolism run by generalist enzymes evolve to contemporary-type metabolism run by specific enzymes?
Results
Here we show by numerical simulation of an enzyme-catalyzed reaction chain that specialist enzymes spread after the invention of the chromosome because protocells harbouring unlinked genes maintain largely non-specific enzymes to reduce their assortment load. When genes are linked on chromosomes, high enzyme specificity evolves because it increases biomass production, also by reducing taxation by side reactions.
Conclusion
The constitution of the genetic system has a profound influence on the limits of metabolic efficiency. The major evolutionary transition to chromosomes is thus proven to be a prerequisite for a complex metabolism. Furthermore, the appearance of specific enzymes opens the door for the evolution of their regulation.
Reviewers
This article was reviewed by Sándor Pongor, Gáspár Jékely, and Rob Knight.
doi:10.1186/1745-6150-7-38
PMCID: PMC3534232  PMID: 23114029
Origin of life; Chromosome; Metabolism; Ribozyme; Major transitions; Enzyme evolution
18.  Thousands of missed genes found in bacterial genomes and their analysis with COMBREX 
Biology Direct  2012;7:37.
Background
The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST.
Results
By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX.
Conclusions
Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and many of the missed genes are short genes <300 bp. Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. 359 of these genes, found in pathogenic organisms, may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX’s website.
Reviewers
This article was reviewed by Daniel Haft, Arcady Mushegian, and M. Pilar Francino (nominated by David Ardell).
doi:10.1186/1745-6150-7-37
PMCID: PMC3534567  PMID: 23111013
19.  Distinct groups of repetitive families preserved in mammals correspond to different periods of regulatory innovations in vertebrates 
Biology Direct  2012;7:36.
Background
Mammalian genomes are repositories of repetitive DNA sequences derived from transposable elements (TEs). Typically, TEs generate multiple, mostly inactive copies of themselves, commonly known as repetitive families or families of repeats. Recently, we proposed that families of TEs originate in small populations by genetic drift and that the origin of small subpopulations from larger populations can be fueled by biological innovations.
Results
We report three distinct groups of repetitive families preserved in the human genome that expanded and declined during the three previously described periods of regulatory innovations in vertebrate genomes. The first group originated prior to the evolutionary separation of the mammalian and bird lineages and the second one during subsequent diversification of the mammalian lineages prior to the origin of eutherian lineages. The third group of families is primate-specific.
Conclusions
The observed correlation implies a relationship between regulatory innovations and the origin of repetitive families. Consistent with our previous hypothesis, it is proposed that regulatory innovations fueled the origin of new subpopulations in which new repetitive families became fixed by genetic drift.
Reviewers
Eugene Koonin, I. King Jordan, Jürgen Brosius.
doi:10.1186/1745-6150-7-36
PMCID: PMC3500645  PMID: 23098210
Transposable elements; Conserved repeats; Genetic drift; Evolution
20.  Constructive neutral evolution: exploring evolutionary theory’s curious disconnect 
Biology Direct  2012;7:35.
Abstract
Constructive neutral evolution (CNE) suggests that neutral evolution may follow a stepwise path to extravagance. Whether or not CNE is common, the mere possibility raises provocative questions about causation: in classical neo-Darwinian thinking, selection is the sole source of creativity and direction, the only force that can cause trends or build complex features. However, much of contemporary evolutionary genetics departs from the conception of evolution underlying neo-Darwinism, resulting in a widening gap between what formal models allow, and what the prevailing view of the causes of evolution suggests. In particular, a mutationist conception of evolution as a 2-step origin-fixation process has been a source of theoretical innovation for 40 years, appearing not only in the Neutral Theory, but also in recent breakthroughs in modeling adaptation (the “mutational landscape” model), and in practical software for sequence analysis. In this conception, mutation is not a source of raw materials, but an agent that introduces novelty, while selection is not an agent that shapes features, but a stochastic sieve. This view, which now lays claim to important theoretical, experimental, and practical results, demands our attention. CNE provides a way to explore its most significant implications about the role of variation in evolution.
Reviewers
Alex Kondrashov, Eugene Koonin and Johann Peter Gogarten reviewed this article.
doi:10.1186/1745-6150-7-35
PMCID: PMC3534586  PMID: 23062217
Evolutionary theory; Constructive neutral evolution; Neo-Darwinism; Mutation; Evolutionary genetics; Mutation bias; Modern Synthesis
21.  Proteorhodopsin genes in giant viruses 
Biology Direct  2012;7:34.
Viruses with large genomes encode numerous proteins that do not directly participate in virus biogenesis but rather modify key functional systems of infected cells. We report that a distinct group of giant viruses infecting unicellular eukaryotes, which includes Organic Lake Phycodnaviruses and Phaeocystis globosa virus, encodes predicted proteorhodopsins that have not been previously detected in viruses. A search of metagenomic sequence data shows that putative viral proteorhodopsins are extremely abundant in marine environments. Phylogenetic analysis suggests that giant viruses acquired proteorhodopsins via horizontal gene transfer from proteorhodopsin-encoding protists, although the actual donor(s) cannot presently be identified. The pattern of conservation of the predicted functionally important amino acid residues suggests that viral proteorhodopsin homologs function as sensory rhodopsins. We hypothesize that viral rhodopsins modulate light-dependent signaling, in particular phototaxis, in infected protists.
This article was reviewed by Igor B. Zhulin and Laksminarayan M. Iyer. For the full reviews, see the Reviewers’ reports section.
doi:10.1186/1745-6150-7-34
PMCID: PMC3500653  PMID: 23036091
22.  Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage 
Biology Direct  2012;7:32.
Background
The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the discovery of biological function. Classically, genes were linked to functions by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can now be used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.
Results
The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
Conclusions
This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important “missing genes” or “missing function” cases and illustrates the danger of functional annotation of protein families by homology alone.
Reviewers’ names
This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.
doi:10.1186/1745-6150-7-32
PMCID: PMC3541065  PMID: 23013770
Diphthamide; Vitamin B12; Amidotransferase; Comparative genomics
23.  Evasion of tumours from the control of the immune system: consequences of brief encounters 
Biology Direct  2012;7:31.
Background
In this work a mathematical model describing the growth of a solid tumour in the presence of an immune system response is presented. Specifically, attention is focused on the interactions between cytotoxic T-lymphocytes (CTLs) and tumour cells in a small, avascular multicellular tumour. At this stage of the disease the CTLs and the tumour cells are considered to be in a state of dynamic equilibrium or cancer dormancy. The precise biochemical and cellular mechanisms by which CTLs can control a cancer and keep it in a dormant state are still not completely understood from a biological and immunological point of view. The mathematical model focuses on the spatio-temporal dynamics of tumour cells, immune cells, chemokines and “chemorepellents” in an immunogenic tumour. The CTLs and tumour cells are assumed to migrate and interact with each other in such a way that lymphocyte-tumour cell complexes are formed. These complexes result in either the death of the tumour cells (the normal situation) or the inactivation of the lymphocytes and consequently the survival of the tumour cells. In the latter case, we assume that each tumour cell that survives its “brief encounter” with the CTLs undergoes certain beneficial phenotypic changes.
Results
We explore the dynamics of the model under these assumptions and show that the process of immuno-evasion can arise as a consequence of these encounters. We show that the proposed mechanism shapes not only the dynamics of the total number of tumour cells and of CTLs, but also the dynamics of their spatial distribution. We also briefly discuss the evolutionary features of our model by framing them within recent quasi-Lamarckian theories.
Conclusions
Our findings might have interesting implications for clinical practice. Indeed, the immuno-editing process can be seen as an “involuntary” antagonistic process acting against immunotherapies, which aim at maintaining a tumour in a dormant state, or at suppressing it.
Reviewers
This article was reviewed by G. Bocharov (nominated by V. Kuznetsov, member of the Editorial Board of Biology Direct), M. Kimmel and A. Marciniak-Czochra.
doi:10.1186/1745-6150-7-31
PMCID: PMC3582466  PMID: 23009638
Tumour growth; Immune response; Cytotoxic T-lymphocytes; Immuno-evasion; Mathematical models; Chemotaxis; Diffusion; Immuno-editing
24.  Stop codons in bacteria are not selectively equivalent 
Biology Direct  2012;7:30.
Background
The evolution and genomic frequencies of stop codons have not been rigorously studied, except in the context of coding for non-canonical amino acids. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes.
Results
We show that in bacteria, stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicates that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data, we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutations we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being optimal for G-content < 16% while for G-content > 16% TGA has a higher fitness than TAG.
Conclusions
Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications.
Reviewers
This article was reviewed by Michail Gelfand, Arcady Mushegian and Shamil Sunyaev. For the full reviews, please go to the Reviewers’ Comments section.
doi:10.1186/1745-6150-7-30
PMCID: PMC3549826  PMID: 22974057
25.  Novel and unexpected bacterial diversity in an arsenic-rich ecosystem revealed by culture-dependent approaches 
Biology Direct  2012;7:28.
Background
Acid Mine Drainages (AMDs) are extreme environments characterized by highly acidic conditions and heavy metal contamination. In these ecosystems, the bacterial diversity is considered to be low. Previous culture-independent approaches performed in the AMD of Carnoulès (France) confirmed this low species richness. However, very little is known about the cultured bacteria in this ecosystem. The aims of the study were firstly to apply novel culture methods in order to access the largest possible cultured bacterial diversity, and secondly to better define the robustness of the community for 3 important functions: As(III) oxidation, cellulose degradation and cobalamin biosynthesis.
Results
Despite the oligotrophic and acidic conditions found in AMDs, the newly designed media covered a large range of nutrient concentrations and a pH range from 3.5 to 9.8, in order to also target non-acidophilic bacteria. These approaches generated 49 isolates representing 19 genera belonging to 4 different phyla. Importantly, overall diversity gained 16 extra genera never detected in Carnoulès. Among the 19 genera, 3 were previously uncultured, one of them being novel in databases. This strategy increased the overall diversity in the Carnoulès sediment by 70% when compared with previous culture-independent approaches, as specific phylogenetic groups (e.g. the subclass Actinobacteridae or the order Rhizobiales) were only detected by culture. Cobalamin auxotrophy, cellulose degradation and As(III)-oxidation are 3 crucial functions in this ecosystem, and previous meta- and proteo-genomic work attributed each function to only one taxon. Here, we demonstrate that other members of this community can also assume these functions, thus increasing the overall community robustness.
Conclusions
This work highlights that bacterial diversity in AMDs is much higher than previously envisaged, indicating that the AMD system is functionally more robust than expected. The isolated bacteria may be part of the rare biosphere, which previously remained undetected due to molecular biases. Regardless of their current ecological relevance, exploring the full diversity remains crucial to decipher the function and dynamics of any community. This work also underlines the importance of combining culture-dependent and -independent approaches to gain an integrative view of community function.
Reviewers
This paper was reviewed by Sándor Pongor, Eugene V. Koonin and Brett Baker (nominated by Purificacion Lopez-Garcia).
doi:10.1186/1745-6150-7-28
PMCID: PMC3443666  PMID: 22963335
Acid mine drainage (AMD); Alkaliphilic bacteria; Neutrophilic bacteria; Functional redundancy; Rare biosphere; Uncultured bacteria; Molecular biases; Culture-dependent approaches; Actinobacteria; Bacterial diversity
