1.  De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods 
Biology Direct  2015;10:38.
In the post-genomic era where sequences are being determined at a rapid rate, we are highly reliant on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Family” (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families in search for both structure and function.
We applied a ‘computational structural genomics’ approach using five state-of-the-art remote similarity detection methods to detect the relationship between uncharacterized DUFs and domain families of known structures. The association with a structural domain family could serve as a starting point in elucidating the function of a DUF. Amongst these five methods, searches in SCOP-NrichD database have been applied for the first time. Predictions were classified into high, medium and low confidence based on the consensus of results from various approaches and also annotated with enzyme and Gene ontology terms. 614 uncharacterized DUFs could be associated with a known structural domain, of which high confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed on-line. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659.
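The consensus-based confidence classification described above can be illustrated with a minimal sketch. Only the "at least four methods" threshold for high confidence comes from the abstract; the medium and low cut-offs, the method names and the fold labels below are hypothetical placeholders chosen for illustration, not the authors' actual criteria.

```python
from collections import Counter

HIGH_MIN = 4    # stated in the abstract: high confidence needs at least four methods
MEDIUM_MIN = 2  # ASSUMPTION: the abstract does not give the medium/low cut-offs


def consensus_assignment(hits):
    """Return (fold, supporting_methods, tier) for one DUF.

    `hits` maps a method name to the structural family it assigned,
    or None when that method found no significant relationship.
    """
    votes = Counter(fold for fold in hits.values() if fold is not None)
    if not votes:
        return None, 0, "unassigned"
    fold, support = votes.most_common(1)[0]
    if support >= HIGH_MIN:
        tier = "high"
    elif support >= MEDIUM_MIN:
        tier = "medium"
    else:
        tier = "low"
    return fold, support, tier


# Hypothetical example: four of five methods converge on the same fold.
example_hits = {
    "method_1": "fold_A",
    "method_2": "fold_A",
    "method_3": "fold_A",
    "method_4": "fold_A",
    "method_5": None,
}
result = consensus_assignment(example_hits)
```

Counting agreement on a single assignment, rather than counting any hit at all, reflects the abstract's remark that predictions are most reliable when complementary methods converge on a common assignment.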
This study provides insights into the structure and potential function for nearly 20 % of the DUFs. Use of different computational approaches enables us to reliably recognize distant relationships, especially when they converge to a common assignment because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role is still ‘non-trivial’ with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners.
This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.
PMCID: PMC4520260
Domain of Unknown Function (DUF); Fold assignment; Function annotation; Remote similarity; Homology detection and protein evolution
2.  Synthetic circuit designs for earth terraformation 
Biology Direct  2015;10:37.
Mounting evidence indicates that our planet might experience runaway effects associated with rising temperatures and ecosystem overexploitation, leading to catastrophic shifts on short time scales. Remediation scenarios capable of counterbalancing these effects involve geoengineering, sustainable practices and carbon sequestration, among others. None of these scenarios seems powerful enough to achieve the desired restoration of safe boundaries.
Presentation of the hypothesis
We hypothesize that synthetic organisms with the appropriate engineering design could be used to safely prevent declines in some stressed ecosystems and help improve carbon sequestration. Such schemes would include engineering mutualistic dependencies preventing undesired evolutionary processes. We hypothesize that some particular design principles introduce inescapable constraints to the engineered organisms that act as effective firewalls.
Testing the hypothesis
These designed organisms can be tested using controlled bioreactor models, with single and heterogeneous populations, and accurate computational models spanning different scales (from genetic constructs and metabolic pathways to population dynamics).
Implications of the hypothesis
Our hypothesis points towards a future anthropogenic action that should effectively act as a terraforming process. It also implies a major challenge to the existing biosafety policies, since we suggest the release of modified organisms as a potentially necessary strategy for success.
This article was reviewed by Eugene V. Koonin, Tom Ellis (nominated by Purificación Lopez-Garcia) and Eörs Szathmary.
PMCID: PMC4506446  PMID: 26187273
Synthetic biology; Ecological engineering; Climate change; Catastrophic shifts; Mutualism
3.  The UBR-box and its relationship to binuclear RING-like treble clef zinc fingers 
Biology Direct  2015;10:36.
The N-end rule pathway is a part of the ubiquitin–dependent proteolytic system wherein N-recognin proteins recognize the amino terminal degradation signals (N-degrons) of the substrate. The type 1 N-degron recognizing UBR-box domain of the eukaryotic Arg/N-end rule pathway is known to possess a novel three-zinc-stabilized heart-shaped fold.
Using sequence and structure analysis we argue that the UBR-box fold emerged from a binuclear RING-like treble clef zinc finger. The RING-like core is preserved in the UBR-box and the metal-chelating motifs display significant sequence and structural similarity to B-box and ZZ domains. UBR-box domains retrieved in our analysis co-occur with a variety of other protein domains, suggesting their involvement in diverse biological roles. The UBR-box is a unique family of RING-like treble clefs as it displays a distinct circular permutation at the zinc-knuckle of the first zinc-binding site, unlike other documented permutations of the RING-like domains, which occur at the second zinc-binding site. The circular permutation of the RING-like treble clef scaffold has possibly aided the gain of a novel and relatively deep cleft suited for binding N-degrons. The N- and C-terminal extensions to the circularly permuted RING-like region bind a third zinc ion, which likely provides additional stability to the domain by keeping the two halves of the permuted zinc-knuckle together.
Structural modifications and extensions to the RING-like core have resulted in a novel UBR-box fold, which can recognize and target the type 1 N-degron containing proteins for ubiquitin-mediated proteolysis. The UBR-box appears to have emerged during the expansion of ubiquitin system pathway-related functions in eukaryotes, but is likely to have other, non-N-recognin functions as well.
This article was reviewed by Eugene Koonin, Balaji Santhanam, Kira S. Makarova.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0066-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4506424  PMID: 26185100
Binuclear treble clefs; zinc fingers; B-box; ZZ domain; N-end rule; novel fold; U-box
4.  The distribution, diversity, and importance of 16S rRNA gene introns in the order Thermoproteales 
Biology Direct  2015;10:35.
Intron sequences are common in 16S rRNA genes of specific thermophilic lineages of Archaea, specifically the Thermoproteales (phylum Crenarchaeota). Environmental sequencing (16S rRNA gene and metagenome) from geothermal habitats in Yellowstone National Park (YNP) has expanded the available datasets for investigating 16S rRNA gene introns. The objectives of this study were to characterize and curate archaeal 16S rRNA gene introns from high-temperature habitats, evaluate the conservation and distribution of archaeal 16S rRNA introns in geothermal systems, and determine which “universal” archaeal 16S rRNA gene primers are impacted by the presence of intron sequences.
Several new introns were identified and their insertion loci were constrained to thirteen locations across the 16S rRNA gene. Many of these introns encode homing endonucleases, although some introns were short or partial sequences. Pyrobaculum, Thermoproteus, and Caldivirga 16S rRNA genes contained the most abundant and diverse intron sequences. Phylogenetic analysis of introns revealed that sequences within the same locus are distributed biogeographically. The most diverse set of introns was observed in a high-temperature, circumneutral (pH 6) sulfur sediment environment, which also contained the greatest diversity of different Thermoproteales phylotypes.
The widespread presence of introns in the Thermoproteales indicates a high probability of misalignments using different “universal” 16S rRNA primers employed in environmental microbial community analysis.
This article was reviewed by Dr. Eugene Koonin and Dr. W. Ford Doolittle.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0065-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4496867  PMID: 26156036
Archaea; Thermoproteales; Introns; 16S rRNA; Geothermal; Metagenomics
5.  The Lamarckian chicken and the Darwinian egg 
Biology Direct  2015;10:34.
“Which came first, the Chicken or the Egg?” We suggest this question is not a paradox. The Modern Synthesis envisions speciation through genetic changes in germ cells via random mutations, an “Egg first” scenario, but perhaps epigenetic inheritance mechanisms can transmit adaptive changes initiated in the soma (“Chicken first”).
The article was reviewed by Dr. Eugene Koonin, Dr. Itai Yanai, Dr. Laura Landweber.
PMCID: PMC4486432  PMID: 26126811
Evolution; Darwinian Evolution; Lamarckian Evolution; Modern Synthesis; Weismann Barrier; Epigenetic Inheritance; Transgenerational RNA Interference
6.  Emergence of life: Physical chemistry changes the paradigm 
Biology Direct  2015;10:33.
Origin of life research has been slow to advance not only because of its complex evolutionary nature (Franklin Harold: In Search of Cell History, 2014) but also because of the lack of agreement on fundamental concepts, including the question of ‘what is life?’. To re-energize the research and define a new experimental paradigm, we advance four premises to better understand the physicochemical complexities of life’s emergence:
(1) Chemical and Darwinian (biological) evolutions are distinct, but become continuous with the appearance of heredity.
(2) Earth’s chemical evolution is driven by energies of cycling (diurnal) disequilibria and by energies of hydrothermal vents.
(3) Earth’s overall chemical complexity must be high at the origin of life for a subset of (complex) chemicals to phase separate and evolve into living states.
(4) Macromolecular crowding in aqueous electrolytes under confined conditions enables evolution of molecular recognition and cellular self-organization.
We discuss these premises in relation to the current ‘constructive’ (non-evolutionary) paradigm of origins research – the process of complexification of chemical matter ‘from the simple to the complex’. This paradigm artificially avoids planetary chemical complexity and the natural tendency of molecular compositions toward maximum disorder embodied in the second law of thermodynamics. Our four premises suggest an empirical program of experiments involving complex chemical compositions under cycling gradients of temperature, water activity and electromagnetic radiation.
PMCID: PMC4460864  PMID: 26059688
Chemical evolution; Darwinian evolution; Origin of life; Diurnal gradients; Chemical complexity; Biomacromolecular crowding; Non-covalent intermolecular forces; Molecular recognition; Cellular organization
7.  The human transmembrane proteome 
Biology Direct  2015;10:31.
Transmembrane proteins have important roles in cells, as they are involved in energy production, signal transduction, cell-cell interaction, cell-cell communication and more. In human cells, they are frequently targets for pharmaceuticals; therefore, knowledge about their properties and structure is crucial. The topology of transmembrane proteins provides low-resolution structural information, which can be a starting point for either laboratory experiments or modelling their 3D structures.
Here, we present a database of the human α-helical transmembrane proteome, including the predicted and/or experimentally established topology of each transmembrane protein, together with the reliability of the prediction. In order to distinguish transmembrane proteins in the proteome as well as for topology prediction, we used a newly developed consensus method (CCTOP) that incorporates recent state of the art methods, with tested accuracies on a novel human benchmark protein set. CCTOP utilizes all available structure and topology data as well as bioinformatic evidence for topology prediction in a probabilistic framework provided by the hidden Markov model. This method shows the highest accuracy (98.5 % for discriminating between transmembrane and non-transmembrane proteins and 84 % for per protein topology prediction) among the dozen tested topology prediction methods. Analysis of the human proteome with CCTOP indicates that it contains 4998 (26 %) transmembrane proteins. Besides predicting topology, reliability of the predictions is estimated as well, and it is demonstrated that the per protein prediction accuracies of more than 60 % of the predictions are over 98 % on the benchmark sets and most probably on the predicted human transmembrane proteome too.
Here, we present the most accurate prediction of the human transmembrane proteome together with the experimental topology data. These data, as well as various statistics about the human transmembrane proteins and their topologies, can be downloaded from and visualized at the website of the human transmembrane proteome.
This article was reviewed by Dr. Sandor Pongor, Dr. Michael Galperin and Dr. Pascale Gaudet (nominated by Dr Michael Galperin).
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0061-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4445273  PMID: 26018427
Transmembrane protein; Topology prediction; Hidden markov model; Constrained prediction
8.  Design principles for cancer therapy guided by changes in complexity of protein-protein interaction networks 
Biology Direct  2015;10:32.
The ever-increasing expanse of online bioinformatics data is enabling new ways not only to explore the visualization of these data, but also to apply novel mathematical methods to extract meaningful information for clinically relevant analysis of pathways and treatment decisions. One of the methods used for computing topological characteristics of a space at different spatial resolutions is persistent homology. This concept can also be applied to network theory, and more specifically to protein-protein interaction networks, where the number of rings in an individual cancer network represents a measure of complexity.
We observed a linear correlation of R = −0.55 between persistent homology and 5-year survival of patients with a variety of cancers. This relationship was used to predict the proteins within a protein-protein interaction network with the most impact on cancer progression. By re-computing the persistent homology after computationally removing an individual node (protein) from the protein-protein interaction network, we were able to evaluate whether such an inhibition would lead to improvement in patient survival. The power of this approach lies in its ability to identify the effects of inhibition of multiple proteins and in the ability to expose whether the effect of a single inhibition may be amplified by inhibition of other proteins. More importantly, we illustrate specific examples of persistent homology calculations, which correctly predict the survival benefit observed in clinical trials using inhibitors of the identified molecular target.
We propose that computational approaches such as persistent homology may be used in the future for selection of molecular therapies in clinic. The technique uses a mathematical algorithm to evaluate the node (protein) whose inhibition has the highest potential to reduce network complexity. The greater the drop in persistent homology, the greater reduction in network complexity, and thus a larger potential for survival benefit. We hope that the use of advanced mathematics in medicine will provide timely information about the best drug combination for patients, and avoid the expense associated with an unsuccessful clinical trial, where drug(s) did not show a survival benefit.
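The node-removal idea above can be sketched with a deliberately simplified stand-in for persistent homology: the cycle rank of an undirected graph (its first Betti number, E − V + C, for edges E, vertices V and connected components C), which counts independent rings. This is not the authors' pipeline, and the toy network and its node names are hypothetical.

```python
from collections import defaultdict


def cycle_rank(edges, nodes=None):
    """First Betti number of an undirected graph: E - V + C."""
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}  # drop self-loops
    vertices = set(nodes) if nodes is not None else set()
    adjacency = defaultdict(set)
    for edge in edge_set:
        u, v = tuple(edge)
        vertices.update((u, v))
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), 0
    for start in vertices:  # count connected components by depth-first search
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return len(edge_set) - len(vertices) + components


def rank_by_complexity_drop(edges):
    """Rank nodes by how much removing each one lowers the cycle rank."""
    baseline = cycle_rank(edges)
    nodes = {n for edge in edges for n in edge}
    drops = {}
    for node in nodes:
        remaining = [e for e in edges if node not in e]
        drops[node] = baseline - cycle_rank(remaining, nodes - {node})
    return sorted(drops.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network: two triangles sharing node "C", plus a pendant "F".
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"),
             ("C", "D"), ("D", "E"), ("E", "C"), ("E", "F")]
ranking = rank_by_complexity_drop(toy_edges)
```

In this toy network the shared node "C" sits on both rings, so removing it gives the largest drop in cycle rank, whereas removing the pendant node "F" changes nothing.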
This article was reviewed by Nathan J. Bowen (nominated by I. King Jordan), Tomasz Lipniacki, and Merek Kimmel.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0058-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4445818  PMID: 26018239
Topology; Persistent homology; Betti number; Cancer; Protein interaction networks
9.  Imprecise intron losses are less frequent than precise intron losses but are not rare in plants 
Biology Direct  2015;10:24.
In this study, we identified 19 intron losses, including 11 precise intron losses (PILs), six imprecise intron losses (IILs), one de-exonization, and one exon deletion in tomato and potato, and 17 IILs in Arabidopsis thaliana. Comparative analysis of related genomes confirmed that all of the IILs have been fixed during evolution. Consistent with previous studies, our results indicate that PILs are a major type of intron loss. However, at least in plants, IILs are unlikely to be as rare as previously reported.
This article was reviewed by Jun Yu and Zhang Zhang. For complete reviews, see the Reviewers’ Reports section.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0056-7) contains supplementary material, which is available to authorized users.
PMCID: PMC4443532
Intron loss; De-intronization; De-exonization; Insertion; Deletion; Solanum; Arabidopsis thaliana
10.  In silico ribozyme evolution in a metabolically coupled RNA population 
Biology Direct  2015;10:30.
The RNA World hypothesis offers a plausible bridge from no-life to life on prebiotic Earth, by assuming that RNA, the only known molecule type capable of playing genetic and catalytic roles at the same time, could have been the first evolvable entity on the evolutionary path to the first living cell. We have developed the Metabolically Coupled Replicator System (MCRS), a spatially explicit simulation modelling approach to prebiotic RNA-World evolution on mineral surfaces, in which we incorporate the most important experimental facts and theoretical considerations to comply with recent knowledge on RNA and prebiotic evolution. In this paper the MCRS model framework has been extended in order to investigate the dynamical and evolutionary consequences of adding an important physico-chemical detail, namely explicit replicator structure – nucleotide sequence and 2D folding calculated from thermodynamical criteria – and their possible mutational changes, to the assumptions of a previously less detailed toy model.
For each mutable nucleotide sequence the corresponding 2D folded structure with minimum free energy is calculated, which in turn is used to determine the fitness components (degradation rate, replicability and metabolic enzyme activity) of the replicator. We show that the community of such replicators providing the monomer supply for their own replication by evolving metabolic enzyme activities features an improved propensity for stable coexistence and structural adaptation. These evolutionary advantages are due to the emergent uniformity of metabolic replicator fitnesses imposed on the community by local group selection and attained through replicator trait convergence, i.e., the tendency of replicator lengths, ribozyme activities and population sizes to become similar between the coevolving replicator species that are otherwise both structurally and functionally different.
In the most general terms it is the surprisingly high extra viability of the metabolic replicator system that the present model adds to the MCRS concept of the origin of life. Surface-bound, metabolically coupled RNA replicators tend to evolve different, enzymatically active sites within thermodynamically stable secondary structures, and the system as a whole evolves towards the robust coexistence of a complete set of such ribozymes driving the metabolism producing monomers for their own replication.
This article was reviewed by Gáspár Jékely, Anthony Poole and Armen Mulkidjanian.
PMCID: PMC4445502  PMID: 26014147
RNA; folding; RNA-World; enzyme evolution; enzyme promiscuity; MCRS
11.  Modeling of interaction between cytochrome c and the WD domains of Apaf-1: bifurcated salt bridges underlying apoptosome assembly 
Biology Direct  2015;10:29.
Binding of cytochrome c, released from the damaged mitochondria, to the apoptotic protease activating factor 1 (Apaf-1) is a key event in the apoptotic signaling cascade. The binding triggers a major domain rearrangement in Apaf-1, which leads to oligomerization of Apaf-1/cytochrome c complexes into an apoptosome. Despite the availability of crystal structures of cytochrome c and Apaf-1 and cryo-electron microscopy models of the entire apoptosome, the binding mode of cytochrome c to Apaf-1, as well as the nature of the amino acid residues of Apaf-1 involved remain obscure.
We investigated the interaction between cytochrome c and Apaf-1 by combining several modeling approaches. We have applied protein-protein docking and energy minimization, evaluated the resulting models of the Apaf-1/cytochrome c complex, and carried out a further analysis by means of molecular dynamics simulations. We ended up with a single model structure where all the lysine residues of cytochrome c that are known to be functionally relevant were involved in forming salt bridges with acidic residues of Apaf-1. This model has revealed three distinctive bifurcated salt bridges, each involving a single lysine residue of cytochrome c and two neighboring acidic residues of Apaf-1. Salt bridge-forming amino acids of Apaf-1 showed a clear evolutionary pattern within Metazoa, with pairs of acidic residues of Apaf-1, involved in bifurcated salt bridges, reaching their highest numbers in the sequences of vertebrates, in which the cytochrome c-mediated mechanism of apoptosome formation seems to be typical.
The reported model of an Apaf-1/cytochrome c complex provides insights into the nature of protein-protein interactions which are hard to observe in crystallographic or electron microscopy studies. Bifurcated salt bridges can be expected to be stronger than simple salt bridges, and their formation might promote the conformational change of Apaf-1, leading to the formation of an apoptosome. Combination of structural and sequence analyses provides hints on the evolution of the cytochrome c-mediated apoptosis.
This article was reviewed by Andrei L. Osterman, Narayanaswamy Srinivasan, Igor N. Berezovsky, and Gerrit Vriend (nominated by Martijn Huynen).
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0059-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4445527  PMID: 26014357
Apoptosis; WD40 domains; Hydrogen bond; Salt bridge; Protein-protein interactions; Caspase; Molecular dynamics simulations; Sequence analysis; Evolution
12.  Ancient origin of the biosynthesis of lignin precursors 
Biology Direct  2015;10:23.
Lignin plays an important role in plant structural support and water transport, and is considered one of the hallmarks of land plants. The recent discovery of lignin or its precursors in various algae has raised questions on the evolution of its biosynthetic pathway, which could be much more ancient than previously thought. To determine the taxonomic distribution of the lignin biosynthesis genes, we screened all publicly available genomes of algae and their closest non-photosynthetic relatives, as well as representative land plants. We also performed phylogenetic analysis of these genes to decipher the evolution and origin(s) of lignin biosynthesis.
Enzymes involved in making p-coumaryl alcohol, the simplest lignin monomer, are found in a variety of photosynthetic eukaryotes, including diatoms, dinoflagellates, haptophytes, cryptophytes as well as green and red algae. Phylogenetic analysis of these enzymes suggests that they are ancient and spread to some secondarily photosynthetic lineages when they acquired red and/or green algal endosymbionts. In some cases, one or more of these enzymes was likely acquired through lateral gene transfer (LGT) from bacteria.
Genes associated with p-coumaryl alcohol biosynthesis are likely to have evolved long before the transition of photosynthetic eukaryotes to land. The original function of this lignin precursor is therefore unlikely to have been related to water transport. We suggest that it participates in the biological defense of some unicellular and multicellular algae.
This article was reviewed by Mark Ragan, Uri Gophna, Philippe Deschamps.
PMCID: PMC4455696  PMID: 25994183
Lignin; Monolignol; Algae; Evolution; Haptophyte; Chlorophyte; Rhodophyte; Diatom; Cryptophyte; Dinoflagellate
13.  Mitochondrial activity in gametes and transmission of viable mtDNA 
Biology Direct  2015;10:22.
The retention of a genome in mitochondria (mtDNA) has several consequences, among which is the problem of ensuring faithful transmission of its genetic information through generations despite the accumulation of oxidative damage by reactive oxygen species (ROS) predicted by the free radical theory of ageing. A division of labour between male and female germ line mitochondria was proposed: since mtDNA is maternally inherited, female gametes would prevent damage by repressing oxidative phosphorylation, thus being quiescent genetic templates. We assessed mitochondrial activity in gametes of an unusual biological system (doubly uniparental inheritance of mitochondria, DUI), in which sperm mtDNA is also transmitted to the progeny, thus having to overcome the problem of maintaining genetic information viability while producing ATP for swimming.
Ultrastructural analysis shows no difference in the conformation of mitochondrial cristae in male and female mature gametes, while mitochondria in immature oocytes exhibit a simpler internal structure. Our data on transcriptional activity in germ line mitochondria show variability between sexes and different developmental stages, but we do not find evidence for transcriptional quiescence of mitochondria. Our observations on mitochondrial membrane potential are consistent with mitochondria being active in both male and female gametes.
Our findings and the literature we discussed may be consistent with the hypothesis that template mitochondria are not functionally silenced; on the contrary, their activity might be fundamental for the inheritance mechanism. We think that during gametogenesis, fertilization and embryo development, mitochondria undergo selection for different traits (e.g. replication, membrane potential), increasing the probability of the transmission of functional organelles. In these phases of the life cycle, the great reduction in mtDNA copy number per organelle/cell and the stochastic segregation of mtDNA variants would greatly improve the efficiency of selection. When a higher mtDNA copy number per organelle/cell is present, selection on mtDNA deleterious mutants is less effective, due to the buffering effect of wild-type variants. In our opinion, a combination of drift and selection on the germ line mtDNA population might be responsible for the maintenance of viable mitochondrial genetic information through generations, and mitochondrial activity would be necessary for the selective process.
This article was reviewed by Nick Lane, Fedor S Severin and Fyodor Kondrashov.
PMCID: PMC4435915  PMID: 25981894
Mitochondrial membrane potential; Mitochondrial inheritance; Doubly uniparental inheritance of mitochondria; Germ line mitochondria; Oxidative phosphorylation; Reactive oxygen species; Oxidative stress; Ageing; mtDNA selection; Balbiani body
14.  The eukaryotic translation initiation regulator CDC123 defines a divergent clade of ATP-grasp enzymes with a predicted role in novel protein modifications 
Biology Direct  2015;10:21.
Deciphering the origin of uniquely eukaryotic features of sub-cellular systems, such as the translation apparatus, is critical in reconstructing eukaryogenesis. One such feature is the highly conserved, but poorly understood, eukaryotic protein CDC123, which regulates the abundance of the eukaryotic translation initiation eIF2 complex and binds one of its components eIF2γ. We show that the eukaryotic protein CDC123 defines a novel clade of ATP-grasp enzymes distinguished from all other members of the superfamily by a RAGNYA domain with two conserved lysines (henceforth the R2K clade). Combining the available biochemical and genetic data on CDC123 with the inferred enzymatic function, we propose that the eukaryotic CDC123 proteins are likely to function as ATP-dependent protein-peptide ligases which modify proteins by ribosome-independent addition of an oligopeptide tag. We also show that the CDC123 family emerged first in bacteria where it appears to have diversified along with the two other families of the R2K clade. The bacterial CDC123 family members are of two distinct types, one found as part of type VI secretion systems which deliver polymorphic toxins and the other functioning as potential effectors delivered to amoeboid eukaryotic hosts. Representatives of the latter type have also been independently transferred to phylogenetically unrelated amoeboid eukaryotes and their nucleo-cytoplasmic large DNA viruses. Similarly, the two other prokaryotic R2K clade families are also proposed to participate in biological conflicts between bacteriophages and their hosts. These findings add further evidence to the recently proposed hypothesis that the horizontal transfer of enzymatic effectors from the bacterial endosymbionts of the stem eukaryotes played a fundamental role in the emergence of the characteristically eukaryotic regulatory systems and sub-cellular structures.
This article was reviewed by Michael Galperin and Sandor Pongor.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0053-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4431377  PMID: 25976611
15.  Evolution of the RAG1-RAG2 locus: both proteins came from the same transposon 
Biology Direct  2015;10:20.
The RAG1 and RAG2 proteins are essential subunits of the V(D)J recombinase that is required for the generation of the enormous variability of antibodies and T-cell receptors in jawed vertebrates. It was demonstrated previously that the 600-aa catalytic core of RAG1 evolved from the transposase of the Transib superfamily transposons. However, although homologs of RAG1 and RAG2 genes are adjacent in the purple sea urchin genome, a transposon encoding both proteins so far has not been reported. Here we describe such transposons in the genomes of green sea urchin, a starfish and an oyster. Comparison of the domain architectures of the RAG1 homologs in these transposons, denoted TransibSU, and other Transib superfamily transposases provides for reconstruction of the structure of the hypothetical TransibVDJ transposon that gave rise to the VDJ recombinases at the onset of vertebrate evolution some 500 million years ago.
This article was reviewed by Mart Krupovic and I. King Jordan.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0055-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4411706  PMID: 25928409
Molecular evolution; genetics; immune system; V(D)J recombination; RAG1 and RAG2 proteins; Transib DNA transposons; Transib transposase
16.  Co-regulation of translation in protein complexes 
Biology Direct  2015;10:18.
Co-regulation of gene expression has been known for many years, and studied widely both globally and for individual genes. Nevertheless, most analyses concerned transcriptional control, which in the case of physically interacting proteins and protein complex subunits may be of secondary importance. This research is the first quantitative analysis that provides global-scale evidence for translation co-regulation among associated proteins.
By analyzing the results of our previous quantitative model of translation, we have demonstrated that protein production rates plus several other translational parameters, such as mRNA and protein abundance, or number of produced proteins from a gene, are well concerted between stable complex subunits and party hubs. This may be energetically favorable during synthesis of complex building blocks and ensure their accurate production in time. In contrast, for connections with regulatory particles and date hubs translational co-regulation is less visible, indicating that in these cases maintenance of accurate levels of interacting particles is not necessarily beneficial.
Similar results obtained for distantly related model organisms, Saccharomyces cerevisiae and Homo sapiens, suggest that the phenomenon of translational co-regulation applies to a variety of living organisms and concerns many complex constituents. This phenomenon was also observed among the set of functionally linked proteins from Escherichia coli operons. This leads to the conclusion that translational regulation of a protein should always be studied with respect to the expression of its primary interacting partners.
This article was reviewed by Sandor Pongor and Claus Wilke.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0048-7) contains supplementary material, which is available to authorized users.
PMCID: PMC4409705  PMID: 25909184
Translation control; Translation regulation; Protein complex; Protein-protein interaction; Computer modeling; Protein production rate
17.  A new family of hybrid virophages from an animal gut metagenome 
Biology Direct  2015;10:19.
Search of metagenomics sequence databases for homologs of virophage capsid proteins resulted in the discovery of a new family of virophages in the sheep rumen metagenome. The genomes of the rumen virophages (RVP) encode a typical virophage major capsid protein, ATPase and protease combined with a Polinton-type, protein primed family B DNA polymerase. The RVP genomes appear to be linear molecules, with terminal inverted repeats. Thus, the RVP seem to represent virophage-Polinton hybrids that are likely capable of formation of infectious virions. Virion proteins of mimiviruses were detected in the same metagenomes as the RVP suggesting that the virophages of the new family parasitize on giant viruses that infect protist inhabitants of the rumen.
This article was reviewed by Mart Krupovic and Kenneth Stedman; for complete reviews, see the Reviewers’ Reports section.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0054-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4409740  PMID: 25909276
18.  Synthetic lethals in HIV: ways to avoid drug resistance 
Biology Direct  2015;10:17.
RNA viruses rapidly accumulate genetic variation, which can give rise to synthetic lethal (SL) and deleterious (SD) mutations. Synthetic lethal mutations (non-lethal when alone but lethal when combined in one genome) have been studied to develop cancer therapies. This principle can also be used against fast-evolving RNA viruses. Indeed, targeting protein sites involved in SD + SL interactions with a drug would render any mutation of such sites lethal.
Here, we set up a strategy to detect intragenic pairs of SL and SD at the surface of the protein to predict less escapable drug target sites. For this, we detected SD + SL, studying HIV protease (PR) and reverse transcriptase (RT) sequence alignments from two groups of HIV+ individuals: treated with drugs (T) or not (NT). Using a series of statistical approaches, we were able to propose bona fide SD + SL couples. When focusing on spatially close co-variant SD + SL couples at the surface of the protein, we found 5 SD + SL groups (2 in the protease and 3 in the reverse transcriptase), which could be good candidates to form pockets to accommodate potential drugs.
Thus, designing drugs targeting these specific SD + SL groups would not allow the virus to mutate any residue involved in such groups without losing an essential function. Moreover, we also show that the selection pressure induced by the treatment leads to the appearance of new mutations, which change the mutational landscape of the protein. This drives the existence of differential SD + SL couples between the drug-treated and non-treated groups. Thus, new anti-viral drugs should be designed differently to target such groups.
This article was reviewed by Neil Greenspan, Csaba Pal and István Simon.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0044-y) contains supplementary material, which is available to authorized users.
PMCID: PMC4399722  PMID: 25888435
Synthetic lethals; Drug targets; Drug design; RNA viruses
19.  Sequence analysis reveals a conserved extension in the capping enzyme of the alphavirus supergroup, and a homologous domain in nodaviruses 
Biology Direct  2015;10:16.
Members of the alphavirus supergroup include human pathogens such as chikungunya virus, hepatitis E virus and rubella virus. They encode a capping enzyme with methyltransferase-guanylyltransferase (MTase-GTase) activity, which is an attractive drug target owing to its unique mechanism. However, its experimental study has proven very difficult.
We examined over 50 genera of viruses by sequence analyses. Earlier studies showed that the MTase-GTase contains a “Core” region conserved in sequence. We show that it is followed by a long extension, which we termed “Iceberg” region, whose secondary structure, but not sequence, is strikingly conserved throughout the alphavirus supergroup. Sequence analyses strongly suggest that the minimal capping domain corresponds to the Core and Iceberg regions combined, which is supported by earlier experimental data. The Iceberg region contains all known membrane association sites that contribute to the assembly of viral replication factories. We predict that it may also contain an overlooked, widely conserved membrane-binding amphipathic helix. Unexpectedly, we detected a sequence homolog of the alphavirus MTase-GTase in taxa related to nodaviruses and to chronic bee paralysis virus. The presence of a capping enzyme in nodaviruses is biologically consistent, since they have capped genomes but replicate in the cytoplasm, where no cellular capping enzyme is present. The putative MTase-GTase domain of nodaviruses also contains membrane-binding sites that may drive the assembly of viral replication factories, revealing an unsuspected parallel with the alphavirus supergroup.
Our work will guide the functional analysis of the alphaviral MTase-GTase and the production of domains for structure determination. The identification of a homologous domain in a simple model system, nodaviruses, which replicate in numerous eukaryotic cell systems (yeast, flies, worms, mammals, and plants), can further help crack the function and structure of the enzyme.
This article was reviewed by Valerian Dolja, Eugene Koonin and Sebastian Maurer-Stroh.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0050-0) contains supplementary material, which is available to authorized users.
PMCID: PMC4392871  PMID: 25886938
Methyltransferase; Guanylyltransferase; Capping; Alphavirus; Bromovirus; Nodavirus; Homology detection; Protein sequence analysis; Amphipathic alpha-helix; Viral replication factory; Chikungunya virus; Sindbis virus; Hepatitis E virus
20.  Test on existence of histology subtype-specific prognostic signatures among early stage lung adenocarcinoma and squamous cell carcinoma patients using a Cox-model based filter 
Biology Direct  2015;10:15.
Non-small cell lung cancer (NSCLC) is the predominant histological type of lung cancer, accounting for up to 85% of cases. Disease stage is commonly used to determine adjuvant treatment eligibility of NSCLC patients; however, it is an imprecise predictor of the prognosis of an individual patient. Currently, many researchers resort to microarray technology for identifying relevant genetic prognostic markers, with particular attention on trimming or extending a Cox regression model.
Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are two major histology subtypes of NSCLC. It has been demonstrated that fundamental differences exist in their underlying mechanisms, which motivated us to postulate the existence of specific genes related to the prognosis of each histology subtype.
In this article, we propose a simple filter feature selection algorithm with a Cox regression model as the base. Applying this method to real-world microarray data identifies a histology-specific prognostic gene signature. Furthermore, the resulting 32-gene (32/12 for AC/SCC) prognostic signature for early-stage AC and SCC samples has superior predictive ability relative to two relevant prognostic signatures, and has comparable performance with signatures obtained by applying two state-of-the-art algorithms separately to AC and SCC samples.
Our proposal is conceptually simple, and straightforward to implement. Furthermore, it can be easily adapted and applied to a range of other research settings.
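To make the idea of a Cox-based filter concrete, here is a minimal sketch in Python. It is not the authors' algorithm, whose details the abstract does not give. It assumes one plausible reading: each gene is scored on its own with a univariate Cox score test at beta = 0 (Breslow handling of ties), and genes below a p-value cutoff are kept. The function names, the `alpha` threshold and the toy data are all hypothetical.

```python
import math


def cox_score_test(times, events, x):
    """Univariate Cox score test at beta = 0 (Breslow handling of ties).

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if censored
    x      : expression value of one gene per patient
    Returns (chi-square statistic with 1 d.f., p-value).
    """
    score = 0.0        # sum over events of (x_i - risk-set mean)
    information = 0.0  # sum over events of the risk-set variance
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        mean = sum(risk) / len(risk)
        var = sum((x_j - mean) ** 2 for x_j in risk) / len(risk)
        score += x_i - mean
        information += var
    if information == 0:
        return 0.0, 1.0
    stat = score * score / information
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


def filter_genes(times, events, expr, alpha=0.05):
    """Keep genes whose univariate p-value is below alpha, best first.

    expr : dict mapping gene name -> list of expression values
    """
    scored = [(cox_score_test(times, events, vals)[1], gene)
              for gene, vals in expr.items()]
    return [gene for p, gene in sorted(scored) if p < alpha]


# Toy data (hypothetical): gene "A" tracks survival time, gene "B" does not.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
expr = {"A": [8, 7, 6, 5, 4, 3, 2, 1], "B": [1, 2, 1, 2, 1, 2, 1, 2]}
print(filter_genes(times, events, expr))
```

Under this reading, a histology-specific signature would come from running `filter_genes` separately on the AC and the SCC patients and comparing the two gene lists.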
This article was reviewed by Leonid Hanin (nominated by Dr. Lev Klebanov), Limsoon Wong and Jun Yu.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0051-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4415297  PMID: 25887039
Non-small cell lung cancer (NSCLC); Adenocarcinoma (AC); Squamous cell carcinoma (SCC); Cox model; Prognosis; Histology-subtype specific; Gene expression barcode; Feature selection algorithm
21.  An atlas of mouse CD4+ T cell transcriptomes 
Biology Direct  2015;10:14.
CD4+ T cells are key regulators of the adaptive immune system and can be divided into T helper (Th) cells and regulatory T (Treg) cells. During an immune response Th cells mature from a naive state into one of several effector subtypes that exhibit distinct functions. The transcriptional mechanisms that underlie the specific functional identity of CD4+ T cells are not fully understood.
To assist investigations into the transcriptional identity and regulatory processes of these cells we performed mRNA-sequencing on three murine T helper subtypes (Th1, Th2 and Th17) as well as on splenic Treg cells and induced Treg (iTreg) cells. Our integrated analysis of this dataset revealed the gene expression changes associated with these related but distinct cellular identities. Each cell subtype differentially expresses a wealth of ‘subtype upregulated’ genes, some of which are well known whilst others promise new insights into signalling processes and transcriptional regulation. We show that hundreds of genes are regulated purely by alternative splicing to extend our knowledge of the role of post-transcriptional regulation in cell differentiation.
This CD4+ transcriptome atlas provides a valuable resource for the study of CD4+ T cell populations. To facilitate its use by others, we have made the data available in an easily accessible online resource at
This article was reviewed by Wayne Hancock, Christine Wells and Erik van Nimwegen.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0045-x) contains supplementary material, which is available to authorized users.
PMCID: PMC4384382  PMID: 25886751
T helper cell; CD4+ T cell; Treg; RNA-seq; Transcriptomics
22.  Babela massiliensis, a representative of a widespread bacterial phylum with unusual adaptations to parasitism in amoebae 
Biology Direct  2015;10:13.
Only a small fraction of bacteria and archaea that are identifiable by metagenomics can be grown on standard media. Recent efforts on deep metagenomics sequencing, single-cell genomics and the use of specialized culture conditions (culturomics) increasingly yield novel microbes some of which represent previously uncharacterized phyla and possess unusual biological traits.
We report isolation and genome analysis of Babela massiliensis, an obligate intracellular parasite of Acanthamoeba castellanii. B. massiliensis shows an unusual fission mode of cell multiplication whereby large, polymorphic bodies accumulate in the cytoplasm of infected amoeba and then split into mature bacterial cells. This unique mechanism of cell division is associated with a deep degradation of the cell division machinery and delayed expression of the ftsZ gene. The genome of B. massiliensis consists of a circular chromosome approximately 1.12 megabases in size that encodes 981 predicted proteins, 38 tRNAs and one typical rRNA operon. Phylogenetic analysis shows that B. massiliensis belongs to the putative bacterial phylum TM6 that so far was represented by the draft genome of the JCVI TM6SC1 bacterium obtained by single cell genomics and numerous environmental sequences.
Currently, B. massiliensis is the only cultivated member of the putative TM6 phylum. Phylogenomic analysis shows diverse taxonomic affinities for B. massiliensis genes, suggestive of multiple gene acquisitions via horizontal transfer from other bacteria and eukaryotes. Horizontal gene transfer is likely to be facilitated by the cohabitation of diverse parasites and symbionts inside amoeba. B. massiliensis encompasses many genes encoding proteins implicated in parasite-host interaction including the greatest number of ankyrin repeats among sequenced bacteria and diverse proteins related to the ubiquitin system. Characterization of B. massiliensis, a representative of a distinct bacterial phylum, thanks to its ability to grow in amoeba, reaffirms the critical role of diverse culture approaches in microbiology.
This article was reviewed by Dr. Igor Zhulin, Dr. Jeremy Selengut, and Pr Martijn Huynen.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0043-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4378268  PMID: 25884386
Intracellular bacteria; Amoeba; Bacterial replication
23.  Plant viruses of the Amalgaviridae family evolved via recombination between viruses with double-stranded and negative-strand RNA genomes 
Biology Direct  2015;10:12.
Plant viruses of the recently recognized family Amalgaviridae have monopartite double-stranded (ds) RNA genomes and encode two proteins: an RNA-dependent RNA polymerase (RdRp) and a putative capsid protein (CP). Whereas the RdRp of amalgaviruses has been found to be most closely related to the RdRps of dsRNA viruses of the family Partitiviridae, the provenance of their CP remained obscure. Here we show that the CP of amalgaviruses is homologous to the nucleocapsid proteins of negative-strand RNA viruses of the genera Phlebovirus (Bunyaviridae) and Tenuivirus. The chimeric genomes of amalgaviruses are a testament to the effectively limitless gene exchange between viruses that shaped the evolution of the virosphere.
Reviewers: This article was reviewed by Lakshminarayan M. Iyer and Nick V. Grishin. For complete reviews, see the Reviewers’ Reports section.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0047-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4377212  PMID: 25886840
Amalgaviridae; Virus origin; dsRNA viruses; Negative-sense RNA viruses; Capsid proteins
24.  Erratum to: Minimization of extracellular space as a driving force in prokaryote association and the origin of eukaryotes 
Biology Direct  2015;10:11.
Following the publication of this article [1] it was noticed that, due to an error on the part of the publisher, the 2nd round of comments submitted by Reviewer 1, Dr. López-García, were unintentionally omitted during the peer review process. As a consequence of this error, the authors were unable to reply to Dr. López-García’s comments and subsequently revise their manuscript accordingly (where appropriate).
In fairness to both the authors and reviewer, Dr. López-García’s (Reviewer 1) 2nd round of comments are now included below and Scott L Hooper and Helaine J Burstein (author) were given the opportunity to reply. Any consequent amendments to the research article [1] are outlined in the author’s replies.
PMCID: PMC4383194  PMID: 25888113
25.  QSAR based model for discriminating EGFR inhibitors and non-inhibitors using Random forest 
Biology Direct  2015;10:10.
Epidermal Growth Factor Receptor (EGFR) is a well-characterized cancer drug target. In the past, several QSAR models have been developed for predicting inhibition activity of molecules against EGFR. These models are useful to a limited set of molecules for a particular class like quinazoline-derivatives. In this study, an attempt has been made to develop prediction models on a large set of molecules (~3500 molecules) that include diverse scaffolds like quinazoline, pyrimidine, quinoline and indole.
We train, test and validate our classification models on a dataset called EGFR10 that contains 508 inhibitors (having inhibition activity IC50 less than 10 nM) and 2997 non-inhibitors. Our Random forest based model achieved a maximum MCC of 0.49 with an accuracy of 83.7% on a validation set using 881 PubChem fingerprints. In this study, a frequency-based feature selection technique has been used to identify the best fingerprints. It was observed that PubChem fingerprints FP380 (C(~O)(~O)), FP579 (O=C-C-C-C), FP388 (C(:C)(:N)(:N)) and FP816 (ClC1CC(Br)CCC1) are more frequent in the inhibitors than in the non-inhibitors. In addition, we created further datasets, namely EGFR100 containing inhibitors having IC50 < 100 nM and EGFR1000 containing inhibitors having IC50 < 1000 nM. We trained, tested and validated our models on the EGFR100 and EGFR1000 datasets and achieved maximum MCC values of 0.58 and 0.71, respectively. In addition, models were developed for predicting quinazoline and pyrimidine based EGFR inhibitors.
In summary, models have been developed on a large set of molecules of various classes for discriminating EGFR inhibitors and non-inhibitors. These highly accurate prediction models can be used to design and discover novel EGFR inhibitors. In order to provide service to the scientific community, a web server/standalone EGFRpred also has been developed (
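The abstract reports both accuracy and the Matthews correlation coefficient (MCC). The following sketch, in Python, shows how the two are computed from a confusion matrix and why MCC is the more informative figure on an imbalanced dataset. The function names are illustrative. The example counts are the EGFR10 class sizes given above (508 inhibitors, 2997 non-inhibitors), not the authors' validation-set confusion matrix, which the abstract does not report.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical baseline: label every molecule in EGFR10 a non-inhibitor.
n_inhibitors, n_non_inhibitors = 508, 2997
baseline = dict(tp=0, tn=n_non_inhibitors, fp=0, fn=n_inhibitors)
print(round(accuracy(**baseline), 3), mcc(**baseline))
```

A classifier that never predicts the minority class still scores a high accuracy on such data while its MCC is zero, which is why the MCC values are the ones to compare across the EGFR10, EGFR100 and EGFR1000 models.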
This article was reviewed by Dr Murphy, Prof Wang and Dr. Eisenhaber.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0046-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4372225  PMID: 25880749
EGFR inhibitors; Classification of EGFR inhibitors and non-inhibitors; Active substructure; Active functional groups; PubChem fingerprint; QSAR; Random forest