CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci provide prokaryotes with adaptive immunity against viruses and other mobile genetic elements. CRISPR arrays can be transcribed and processed into small crRNA molecules, which are then used by the cell to target foreign nucleic acid. Since spacers are accumulated by active CRISPR/Cas systems, the sequences of these spacers provide a record of the past "infection history" of the organism.
Here we analyzed all currently known spacers present in archaeal genomes and identified their source by DNA similarity. While nearly 50% of archaeal spacers matched mobile genetic elements, such as plasmids or viruses, several others matched chromosomal genes of other organisms, primarily other archaea. Thus, networks of gene exchange between archaeal species were revealed by the spacer analysis, including many cases of inter-genus and inter-species gene transfer events. Spacers that recognize viral sequences tend to be located further away from the leader sequence, implying that there exists a selective pressure for their retention.
CRISPR spacers provide direct evidence for extensive gene exchange in archaea, especially within genera, and support the current dogma where the primary role of the CRISPR/Cas system is anti-viral and anti-plasmid defense.
Open peer review
This article was reviewed by: Profs. W. Ford Doolittle, John van der Oost, Christa Schleper (nominated by board member Prof. J Peter Gogarten)
CRISPR; Lateral Gene transfer; Horizontal gene transfer; viruses; archaea; competence
Data assimilation refers to methods for updating the state vector (initial condition) of a complex spatiotemporal model (such as a numerical weather model) by combining new observations with one or more prior forecasts. We consider the potential feasibility of this approach for making short-term (60-day) forecasts of the growth and spread of a malignant brain cancer (glioblastoma multiforme) in individual patient cases, where the observations are synthetic magnetic resonance images of a hypothetical tumor.
We apply a modern state estimation algorithm (the Local Ensemble Transform Kalman Filter), previously developed for numerical weather prediction, to two different mathematical models of glioblastoma, taking into account likely errors in model parameters and measurement uncertainties in magnetic resonance imaging. The filter can accurately shadow the growth of a representative synthetic tumor for 360 days (six 60-day forecast/update cycles) in the presence of a moderate degree of systematic model error and measurement noise.
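As a rough illustration of a single forecast/update cycle, the sketch below applies a square-root ensemble Kalman update to a scalar, directly observed state. It is a toy under stated assumptions, not the Local Ensemble Transform Kalman Filter used in the study: the function names, the exponential `forecast` model, its growth rate, and the example numbers are all hypothetical, and no spatial localization is performed.

```python
import math
import statistics


def forecast(state, rate=0.01, days=60):
    """Hypothetical toy forecast model: exponential growth over one cycle."""
    return state * math.exp(rate * days)


def ensemble_update(ensemble, obs, obs_var):
    """Square-root ensemble Kalman update for a scalar, directly observed state.

    ensemble: list of forecast values (at least two members)
    obs:      the new observation of the same quantity
    obs_var:  variance of the observation error
    """
    mean = statistics.fmean(ensemble)
    var = statistics.variance(ensemble)  # sample variance (n - 1 denominator)
    gain = var / (var + obs_var)         # scalar Kalman gain
    new_mean = mean + gain * (obs - mean)
    # Shrink the deviations so the analysis variance equals (1 - gain) * var.
    shrink = math.sqrt(1.0 - gain)
    return [new_mean + shrink * (x - mean) for x in ensemble]


def cycle(ensemble, obs, obs_var):
    """One forecast/update cycle: propagate every member, then assimilate."""
    predicted = [forecast(x) for x in ensemble]
    return ensemble_update(predicted, obs, obs_var)
```

With a prior ensemble of 1, 2 and 3 and an observation of 4 with unit error variance, the gain is 0.5, so `ensemble_update` moves the ensemble mean halfway to the observation and halves the ensemble variance.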
The mathematical methodology described here may prove useful for other modeling efforts in biology and oncology. An accurate forecast system for glioblastoma may prove useful in clinical settings for treatment planning and patient counseling.
This article was reviewed by Anthony Almudevar, Tomas Radivoyevitch, and Kristin Swanson (nominated by Georg Luebeck).
State estimation; data assimilation; mathematical models; glioblastoma multiforme
The ability to perform de novo biosynthesis of purines is present in organisms in all three domains of life, reflecting the essentiality of these molecules to life. Although the pathway is quite similar in eukaryotes and bacteria, the archaeal pathway is more variable. A careful manual curation of genes in this pathway demonstrates the value of manual curation in archaea, even in pathways that have been well-studied in other domains.
We searched the Integrated Microbial Genome system (IMG) for the 17 distinct genes involved in the 11 steps of de novo purine biosynthesis in 65 sequenced archaea, finding 738 predicted proteins with sequence similarity to known purine biosynthesis enzymes. Each sequence was manually inspected for the presence of active site residues and other residues known or suspected to be required for function.
Many apparently purine-biosynthesizing archaea lack evidence for a single enzyme, either glycinamide ribonucleotide formyltransferase or inosine monophosphate cyclohydrolase, suggesting that there are at least two more gene variants in the purine biosynthetic pathway to discover. Variations in domain arrangement of formylglycinamidine ribonucleotide synthetase and substantial problems in aminoimidazole carboxamide ribonucleotide formyltransferase and inosine monophosphate cyclohydrolase assignments were also identified.
Manual curation revealed some overly specific annotations in the IMG gene product name, with predicted proteins without essential active site residues assigned product names implying enzymatic activity (21 proteins, 2.8% of proteins inspected) or Enzyme Commission (E.C.) numbers (57 proteins, 7.7%). There were also 57 proteins (7.7%) assigned overly generic names and 78 proteins (10.6%) without E.C. numbers as part of the assigned name when a specific enzyme name and E.C. number were well-justified.
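The percentages quoted here appear to use the 738 inspected proteins as the denominator. A short check of that arithmetic, with dictionary keys and function names that are ours rather than the authors':

```python
TOTAL_INSPECTED = 738  # predicted proteins inspected in the study

# Counts as stated in the text; the key names are illustrative labels.
counts = {
    "name_implies_activity": 21,
    "ec_number_unjustified": 57,
    "overly_generic_name": 57,
    "missing_ec_number": 78,
}


def percent(count, total=TOTAL_INSPECTED):
    """Return count as a percentage of total."""
    return 100.0 * count / total


for label, count in counts.items():
    print(f"{label}: {count} proteins, {percent(count):.2f}%")
```

To one decimal place the shares are consistent with the stated 2.8%, 7.7% and 10.6%.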
The patchy distribution of purine biosynthetic genes in archaea is consistent with a pathway that has been shaped by horizontal gene transfer, duplication, and gene loss. Our results indicate that manual curation can improve upon automated annotation for a small number of automatically-annotated proteins and can reveal a need to identify further pathway components even in well-studied pathways.
This article was reviewed by Dr. Céline Brochier-Armanet, Dr Kira S Makarova (nominated by Dr. Eugene Koonin), and Dr. Michael Galperin.
Speciation corresponds to the progressive establishment of reproductive barriers between groups of individuals derived from an ancestral stock. Since Darwin did not believe that reproductive barriers could be selected for, he proposed that most events of speciation would occur through a process of separation and divergence, and this point of view is still shared by most evolutionary biologists today.
I do, however, contend that, if so much speciation occurs, the most likely explanation is that there must be conditions where reproductive barriers can be directly selected for. In other words, situations where it is advantageous for individuals to reproduce preferentially within a small group and reduce their breeding with the rest of the ancestral population. This leads me to propose a model whereby new species arise not by populations splitting into separate branches, but by small inbreeding groups "budding" from an ancestral stock. This would be driven by several advantages of inbreeding, and mainly by advantageous recessive phenotypes, which could only be retained in the context of inbreeding. Reproductive barriers would thus not arise as secondary consequences of divergent evolution in populations isolated from one another, but under the direct selective pressure of ancestral stocks. Many documented cases of speciation in natural populations appear to fit the model proposed, with more speciation occurring in populations with high inbreeding coefficients, and many recessive characters identified as central to the phenomenon of speciation, with these recessive mutations expected to be surrounded by patterns of limited genomic diversity.
Whilst adaptive evolution would correspond to gains of function that would, most of the time, be dominant, this type of speciation by budding would thus be driven by mutations resulting in the advantageous loss of certain functions since recessive mutations very often correspond to the inactivation of a gene. A very important further advantage of inbreeding is that it reduces the accumulation of recessive mutations in genomes. A consequence of the model proposed is that the existence of species would correspond to a metastable equilibrium between inbreeding and outbreeding, with excessive inbreeding promoting speciation, and excessive outbreeding resulting in irreversible accumulation of recessive mutations that could ultimately only lead to extinction.
Eugene V. Koonin, Patrick Nosil (nominated by Dr Jerzy Jurka), Pierre Pontarotti
speciation; inbreeding; saeptation; mutation load; extinction; evolution
Transcription factors are thought to regulate the transcription of microRNA genes in a manner similar to that of protein-coding genes; that is, by binding to conventional transcription factor binding site DNA sequences located in or near promoter regions that lie upstream of the microRNA genes. However, in the course of analyzing the genomics of human microRNA genes, we noticed that annotated transcription factor binding sites commonly lie within 70- to 110-nt long microRNA small hairpin precursor sequences.
We report that about 45% of all human small hairpin microRNA (pre-miR) sequences contain at least one predicted transcription factor binding site motif that is conserved across human, mouse and rat, and this rises to over 75% if one excludes primate-specific pre-miRs. The association is robust and has extremely strong statistical significance; it affects both intergenic and intronic pre-miRs and both isolated and clustered microRNA genes. We also confirmed and extended this finding using a separate analysis that examined all human pre-miR sequences regardless of conservation across species.
The transcription factor binding sites localized within small hairpin microRNA precursor sequences may regulate their transcription. Transcription factors may also bind directly to nascent primary microRNA gene transcripts or small hairpin microRNA precursors and regulate their processing.
This article was reviewed by Guillaume Bourque (nominated by Jerzy Jurka), Dmitri Pervouchine (nominated by Mikhail Gelfand), and Yuriy Gusev.
Transcription factors; microRNA biogenesis; drosha
The universal common ancestry (UCA) of all known life is a fundamental component of modern evolutionary theory, supported by a wide range of qualitative molecular evidence. Nevertheless, both the status and nature of UCA have recently been questioned. In earlier work I presented a formal, quantitative test of UCA in which model selection criteria overwhelmingly choose common ancestry over independent ancestry, based on a dataset of universally conserved proteins. These model-based tests are founded in likelihoodist and Bayesian probability theory, in opposition to classical frequentist null hypothesis tests such as Karlin-Altschul E-values for sequence similarity. In a recent comment, Koonin and Wolf (K&W) claim that the model preference for UCA is "a trivial consequence of significant sequence similarity". They support this claim with a computational simulation, derived from universally conserved proteins, which produces similar sequences lacking phylogenetic structure. The model selection tests prefer common ancestry for this artificial data set.
For the real universal protein sequences, hierarchical phylogenetic structure (induced by genealogical history) is the overriding reason for why the tests choose UCA; sequence similarity is a relatively minor factor. First, for cases of conflicting phylogenetic structure, the tests choose independent ancestry even with highly similar sequences. Second, certain models, like star trees and K&W's profile model (corresponding to their simulation), readily explain sequence similarity yet lack phylogenetic structure. However, these are extremely poor models for the real proteins, even worse than independent ancestry models, though they explain K&W's artificial data well. Finally, K&W's simulation is an implementation of a well-known phylogenetic model, and it produces sequences that mimic homologous proteins. Therefore the model selection tests work appropriately with the artificial data.
For K&W's artificial protein data, sequence similarity is the predominant factor influencing the preference for common ancestry. In contrast, for the real proteins, model selection tests show that phylogenetic structure is much more important than sequence similarity. Hence, the model selection tests demonstrate that real universally conserved proteins are homologous, a conclusion based primarily on the specific nested patterns of correlations induced in genetically related protein sequences.
This article was reviewed by Rob Knight, Robert Beiko (nominated by Peter Gogarten), and Michael Gilchrist.
Genome-wide studies of intron dynamics in mammalian orthologous genes have found convincing evidence for loss of introns but very little for intron turnover. Similarly, large-scale analysis of intron dynamics in a few vertebrate genomes has identified only intron losses and no gains, indicating that intron gain is an extremely rare event in vertebrate evolution. These studies suggest that the intron-rich genomes of vertebrates do not allow intron gain. The aim of this study was to search for evidence of de novo intron gain in domesticated genes from an analysis of their exon/intron structures.
A phylogenomic approach has been used to analyse all domesticated genes in mammals and chordates that originated from the coding parts of transposable elements. Gain of introns in domesticated genes has been reconstructed on well established mammalian, vertebrate and chordate phylogenies, and examined as to where and when the gain events occurred. The locations, sizes and numbers of de novo introns gained in the domesticated genes during the evolution of mammals and chordates have been analyzed. A significant amount of intron gain was found only in domesticated genes of placental mammals, where more than 70 cases were identified. De novo gained introns show clear positional bias, since they are distributed mainly in 5' UTR and coding regions, while 3' UTR introns are very rare. In the coding regions of some domesticated genes up to 8 de novo gained introns have been found. Intron densities in Eutheria-specific domesticated genes and in older domesticated genes that originated early in vertebrates are lower than those for normal mammalian and vertebrate genes. Surprisingly, the majority of intron gains have occurred in the ancestor of placentals.
This study provides the first evidence for numerous intron gains in the ancestor of placental mammals and demonstrates that adequate taxon sampling is crucial for reconstructing intron evolution. The findings of this comprehensive study slightly challenge the current view on the evolutionary stasis in intron dynamics during the last 100 - 200 My. Domesticated genes could constitute an excellent system on which to analyse the mechanisms of intron gain in placental mammals.
Reviewers: this article was reviewed by Dan Graur, Eugene V. Koonin and Jürgen Brosius.
A few major discoveries have influenced how ecologists and evolutionists study microbes. Here, in the format of an interview, we answer questions that directly relate to how these discoveries are perceived in these two branches of microbiology, and how they have impacted on both scientific thinking and methodology.
The first question is "What has been the influence of the 'Universal Tree of Life' based on molecular markers?" For evolutionists, the tree was a tool to understand the past of known (cultured) organisms, mapping the invention of various physiologies on the evolutionary history of microbes. For ecologists the tree was a guide to discover the current diversity of unknown (uncultured) organisms, without much knowledge of their physiology.
The second question we ask is "What was the impact of discovering frequent lateral gene transfer among microbes?" In evolutionary microbiology, frequent lateral gene transfer (LGT) made a simple description of relationships between organisms impossible, and for microbial ecologists, functions could not be easily linked to specific genotypes. Both fields initially resisted LGT, but methods or topics of inquiry were eventually changed in one to incorporate LGT in its theoretical models (evolution) and in the other to achieve its goals despite that phenomenon (ecology).
The third and last question we ask is "What are the implications of the unexpected extent of diversity?" The variation in the extent of diversity between organisms invalidated the universality of species definitions based on molecular criteria, a major obstacle to the adaptation of models developed for the study of macroscopic eukaryotes to evolutionary microbiology. This issue has not overtly affected microbial ecology, as it had already abandoned species in favor of the more flexible operational taxonomic units. This field is nonetheless moving away from traditional methods to measure diversity, as they do not provide enough resolution to uncover what lies below the species level.
The answers of the evolutionary microbiologist and microbial ecologist to these three questions illustrate differences in their theoretical frameworks. These differences mean that both fields can react quite distinctly to the same discovery, incorporating it with more or less difficulty in their scientific practice.
This article was reviewed by W. Ford Doolittle, Eugene V. Koonin and Maureen A. O'Malley.
Ribosomal RNA genes; diversity, lateral gene transfer; microbial ecology; microbial evolution; evolutionary microbiology; ecological microbiology; Tree of Life
Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments, including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included in or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues.
We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring devices. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring, such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur in both single- and multi-membrane-spanning proteins, in essentially any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information.
For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments.
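The two quantities named in this criterion can be sketched as follows. The abstract does not state the actual scoring scheme or thresholds, so the code below assumes stand-ins: mean Kyte-Doolittle hydropathy for hydrophobicity and Shannon entropy of the residue composition for sequence complexity. The function names and the example segments are hypothetical, and no cut-off is applied.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle hydropathy over a non-empty segment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Made-up 20-residue segments, for illustration only.
for name, seg in [("low-complexity", "LLLLLLLLLLLLLLLLLLLL"),
                  ("mixed", "LGSTAVIFNYWCMPLGAVTS")]:
    print(name, mean_hydropathy(seg), shannon_entropy(seg))
```

Under these assumed measures, a segment made only of leucine scores high on hydropathy and zero on entropy, which is the direction the text describes for simple TMs. Where a real threshold would sit is not something this sketch can say.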
This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian.
Mitochondria mediate most of the energy production that occurs in the majority of eukaryotic organisms. These subcellular organelles contain a genome that differs from the nuclear genome and is referred to as mitochondrial DNA (mtDNA). Despite a disparity in gene content, all mtDNAs encode at least two components of the mitochondrial electron transport chain, including cytochrome c oxidase I (Cox1).
Presentation of the hypothesis
A positionally conserved ORF has been found on the complementary strand of the cox1 genes of both eukaryotic mitochondria (protist, plant, fungal and animal) and alpha-proteobacteria. This putative gene has been named gau for gene antisense ubiquitous in mtDNAs. The length of the deduced protein is approximately 100 amino acids. In vertebrates, several stop codons have been found in the mt gau region, and potentially functional gau regions have been found in nuclear genomes. However, a recent bioinformatics study showed that several hypothetical overlapping mt genes could be predicted, including gau; this involves the possible import of the cytosolic AGR tRNA into the mitochondria and/or the expression of mt antisense tRNAs with anticodons recognizing AGR codons according to an alternative genetic code that is induced by the presence of suppressor tRNAs. Despite an evolutionary distance of at least 1.5 to 2.0 billion years, the deduced Gau proteins share some conserved amino acid signatures and structure, which suggests a possible conserved function. Moreover, BLAST analysis identified rare, sense-oriented ESTs with poly(A) tails that include the entire gau region. Immunohistochemical analyses using an anti-Gau monoclonal antibody revealed strict co-localization of Gau proteins and a mitochondrial marker.
Testing the hypothesis
This hypothesis could be tested by purifying the gau gene product and determining its sequence. Cell biological experiments are needed to determine the physiological role of this protein.
Implications of the hypothesis
Studies of the gau ORF will shed light on the origin of novel genes and their functions in organelles and could also have medical implications for human diseases that are caused by mitochondrial dysfunction. Moreover, this strengthens evidence for mitochondrial genes coded according to an overlapping genetic code.
Mitochondrial DNA; cox-1 gene; ubiquitous gene; overprinting; genome evolution; janolog
Mitochondria are thought to have evolved from eubacteria-like endosymbionts; however, the origin of the mitochondrion remains a subject of debate. In this study, we investigated the phenomenon of chimerism in mitochondria to shed light on the origin of these organelles by determining which species played a role in their formation. We used the mitochondria of four distinct organisms, Reclinomonas americana, Homo sapiens, Saccharomyces cerevisiae and multichromosome Pediculus humanus, and attempted to identify the origin of each mitochondrial gene.
Our results suggest that the origin of mitochondrial genes is not limited to the Rickettsiales and that the creation of these genes did not occur in a single event, but through multiple successive events. Some of these events are very old and were followed by events that are more recent and occurred through the addition of elements originating from current species. The points in time at which the elements were added and the parental species of each gene in the mitochondrial genome differ among the individual species. These data constitute strong evidence that mitochondria do not have a single common ancestor but likely have numerous ancestors, including proto-Rickettsiales, proto-Rhizobiales and proto-Alphaproteobacteria, as well as current alphaproteobacterial species. The analysis of the multichromosome P. humanus mitochondrion supports this mechanism.
The most plausible scenario of the origin of the mitochondrion is that ancestors of Rickettsiales and Rhizobiales merged in a proto-eukaryotic cell approximately one billion years ago. The fusion of the Rickettsiales and Rhizobiales cells was followed by gene loss, genomic rearrangements and the addition of alphaproteobacterial elements through ancient and more recent recombination events. Each gene of each of the four studied mitochondria has a different origin, while in some cases, multichromosomes may allow for enhanced gene exchange. Therefore, the tree of life is not sufficient to explain the chimeric structure of current genomes, and the theory of a single common ancestor and a top-down tree does not reflect our current state of knowledge. Mitochondrial evolution constitutes a rhizome, and it should be represented as such.
This article was reviewed by William Martin, Arcady Mushegian and Eugene V. Koonin.
The vertebrate globin gene repertoire consists of seven members that differ in terms of structure, function and phyletic distribution. While hemoglobin, myoglobin, cytoglobin, and neuroglobin are present in almost all gnathostomes examined so far, other globin genes, like globin X, are much more restricted in their phyletic distribution. Until now, globin X had only been found in teleost fish and Xenopus. Here, we report that globin X is also present in the genomes of the sea lamprey, ghost shark and reptiles. Moreover, the identification of orthologs of globin X in crustaceans, insects, platyhelminthes, and hemichordates confirms its ancient origin.
Identifying group-specific characteristics in metabolic networks can provide better insight into evolutionary developments. Here, we present an approach to classify the three domains of life using topological information about the underlying metabolic networks. These networks have been shown to share domain-independent structural similarities, which pose a special challenge for our endeavour. We quantify specific structural information by using topological network descriptors to classify this set of metabolic networks. Such measures quantify the structural complexity of the underlying networks. In this study, we use such measures to capture domain-specific structural features of the metabolic networks to classify the data set. So far, it has been a challenging undertaking to examine what kind of structural complexity such measures do detect. In this paper, we apply two groups of topological network descriptors to metabolic networks and evaluate their classification performance. Moreover, we combine the two groups to perform a feature selection to estimate the structural features with the highest classification ability in order to optimize the classification performance.
By combining the two groups, we can identify seven topological network descriptors that show a group-specific characteristic by ANOVA. A multivariate analysis using feature selection and supervised machine learning leads to a reasonable classification performance with a weighted F-score of 83.7% and an accuracy of 83.9%. We further demonstrate that our approach outperforms alternative methods. Also, our results reveal that entropy-based descriptors show the highest classification ability for this set of networks.
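For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and a support-weighted F-score from a three-class confusion matrix. The matrix values are invented for illustration and are not the study's results; the function names and the row/column convention are likewise assumptions.

```python
def accuracy(cm):
    """Fraction of correct predictions; cm[i][j] = true class i, predicted j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def weighted_f_score(cm):
    """Per-class F1 averaged with weights equal to each class's support."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    score = 0.0
    for k in range(n):
        support = sum(cm[k])                          # true members of class k
        predicted = sum(cm[i][k] for i in range(n))   # predicted as class k
        denom = support + predicted
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted)
        f1 = 2.0 * cm[k][k] / denom if denom else 0.0
        score += f1 * support / total
    return score


# Hypothetical confusion matrix for three classes
# (for example Archaea, Bacteria, Eukarya); not data from the study.
example = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
print(accuracy(example), weighted_f_score(example))
```

Because the F-score is weighted by class support, it can differ from accuracy when classes are unbalanced or when errors concentrate in one class.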
Our results show that these particular topological network descriptors are able to capture domain-specific structural characteristics for classifying metabolic networks between the three domains of life.
Based on unique, coherent properties of phylogenetic analysis, key amino acid substitutions and structural modeling, we have identified a new class of unusual microbial rhodopsins related to the Anabaena sensory rhodopsin (ASR) protein, including multiple homologs not previously recognized. We propose the name xenorhodopsin for this class, reflecting a taxonomically diverse membership spanning five different Bacterial phyla as well as the Euryarchaeotal class Nanohaloarchaea. The patchy phylogenetic distribution of xenorhodopsin homologs is consistent with historical dissemination through horizontal gene transfer. Shared characteristics of xenorhodopsin-containing microbes include the absence of flagellar motility and isolation from high light habitats.
Reviewers: This article was reviewed by Dr. Michael Galperin and Dr. Rob Knight.
High-throughput sequencing is generating massive amounts of data at a pace that largely exceeds the throughput of data analysis routines. Here we introduce Fish the ChIPs (FC), a computational pipeline aimed at a broad public of users and designed to perform complete ChIP-Seq data analysis of an unlimited number of samples, thus increasing throughput, reproducibility and saving time.
Starting from short read sequences, FC performs the following steps: 1) quality controls, 2) alignment to a reference genome, 3) peak calling, 4) genomic annotation, 5) generation of raw signal tracks for visualization on the UCSC and IGV genome browsers. FC exploits some of the fastest and most effective tools available today. Installation on a Mac platform requires very basic computational skills, while configuration and usage are supported by a user-friendly graphical user interface. Alternatively, FC can be compiled from the source code on any Unix machine and then run with the possibility of customizing every parameter through a simple configuration text file that can be generated using a dedicated user-friendly web form. In terms of execution time, FC can be run on a desktop machine, even though the use of a computer cluster is recommended for analyses of large batches of data. FC is well suited to data coming from Illumina Solexa Genome Analyzers or ABI SOLiD, and its usage can potentially be extended to any sequencing platform.
Compared to existing tools, FC has two main advantages that make it suitable for a broad range of users. First, it can be installed and run by wet-lab biologists on a Mac machine. Second, it can handle an unlimited number of samples, which makes it convenient for large analyses. In this context, computational biologists can increase reproducibility of their ChIP-Seq data analyses while saving time for downstream analyses.
This article was reviewed by Gavin Huttley, George Shpakovski and Sarah Teichmann.
Volutin granules appear to be universally distributed and are morphologically and chemically identical to acidocalcisomes, which are electron-dense granular organelles rich in calcium and phosphate, whose functions include storage of phosphorus and various metal ions, metabolism of polyphosphate, maintenance of intracellular pH, osmoregulation and calcium homeostasis. Prokaryotes are thought to differ from eukaryotes in that they lack membrane-bounded organelles. However, it has been demonstrated that as in acidocalcisomes, the calcium and polyphosphate-rich intracellular "volutin granules (polyphosphate bodies)" in two bacterial species, Agrobacterium tumefaciens, and Rhodospirillum rubrum, are membrane bound and that the vacuolar proton-translocating pyrophosphatases (V-H+PPases) are present in their surrounding membranes. Volutin granules and acidocalcisomes have been found in organisms as diverse as bacteria and humans.
Here, we show that volutin granules also occur in Archaea and are, therefore, present in the three superkingdoms of life (Archaea, Bacteria and Eukarya). Molecular analyses of V-H+PPase pumps, which acidify the acidocalcisome lumen and are diagnostic proteins of the organelle, also reveal the presence of this enzyme in all three superkingdoms, suggesting it is ancient and universal. Since V-H+PPase sequences contained too little phylogenetic signal to fully resolve the ancestral nodes of the tree, we investigated the divergence of protein domains in the V-H+PPase molecules. Using the Protein family (Pfam) database, we found a domain in the protein, PF03030. The domain is shared by 31 species in Eukarya, 231 in Bacteria, and 17 in Archaea. The universal distribution of the V-H+PPase PF03030 domain, which is associated with the V-H+PPase function, suggests the domain and the enzyme were already present in the Last Universal Common Ancestor (LUCA).
The importance of the V-H+PPase function and the evolutionary dynamics of these domains support the early origin of the acidocalcisome organelle. In particular, the universality of volutin granules and presence of a functional V-H+PPase domain in the three superkingdoms of life reveals that the acidocalcisomes may have appeared earlier than the divergence of the superkingdoms. This result is remarkable and highlights the possibility that a high degree of cellular compartmentalization could already have been present in the LUCA.
This article was reviewed by Anthony Poole, Lakshminarayan Iyer and Daniel Kahn
A huge amount of protein-protein interaction data is currently available, so extracting the meaningful interactions is a challenging task. In a protein-protein interaction network, hubs are considered key proteins that maintain the function and stability of the network. Therefore, studying protein-protein complexes from a structural perspective provides valuable information for predicted interactions.
In this study, we used comparative modelling and docking methods to predict protein-protein complexes of hubs of the human NR-RTK network inferred from our earlier study. We found that some interactions are mutually exclusive while others could occur simultaneously. Structural analysis revealed the key role played by the estrogen receptor (ESR1) in mediating signal transduction between human receptor tyrosine kinases (RTKs) and nuclear receptors (NRs).
Although the methods require human intervention and judgment, they can identify the interactions that could occur together and those that are mutually exclusive. This adds a fourth dimension to the interaction network, that of time, and can assist in obtaining concrete predictions consistent with experiments.
Open peer review
This article was reviewed by Dr. Anthony Almudevar, Prof. James Faeder and Prof. Eugene Koonin. For the full reviews, please go to the Reviewers' comments.
Chromosomal orthologs can reveal the shared ancestral gene set and their evolutionary trends. Additionally, physico-chemical properties of encoded proteins could provide information about functional adaptation and ecological niche requirements.
We analyzed 7080 genes (five groups of 1416 orthologs each) from Rhizobiales species (S. meliloti, R. etli, and M. loti, plant symbionts; A. tumefaciens, a plant pathogen; and B. melitensis, an animal pathogen). We evaluated their phylogenetic relationships and observed three main topologies: the first with a closer association of R. etli to A. tumefaciens; the second with R. etli closer to S. meliloti; and the third with A. tumefaciens and S. meliloti as the closest pair. This was not unusual, given the close relatedness of these three species. We calculated the synonymous (dS) and nonsynonymous (dN) substitution rates of these orthologs, and found that genes with informational and metabolic functions showed relatively low dN rates; in contrast, genes with hypothetical functions and those involved in cellular processes showed high dN rates. An alternative measure of sequence variability, the percentage of changes by species, was used to evaluate the most specific proportion of amino acid residues from alignments. When dN was compared with that measure, a high correlation was obtained, revealing that much of the evolutionary information was captured by the percentage of changes by species at the amino acid level. By analyzing the sequence variability of orthologs with a set of five properties (polarity, electrostatic charge, formation of secondary structures, molecular volume, and amino acid composition), we found that physico-chemical characteristics of proteins correlated with specific functional roles, and that the association of species did not follow their typical phylogeny, probably reflecting adaptation to their life styles and niche preferences. In addition, orthologs with low dN rates had residues with more positive values of polarity, volume and electrostatic charge.
These findings revealed that even when orthologs perform the same function in each genomic background, their sequences reveal important evolutionary tendencies and differences related to adaptation.
This article was reviewed by: Dr. Purificación López-García, Prof. Jeffrey Townsend (nominated by Dr. J. Peter Gogarten), and Ms. Olga Kamneva.
rhizobia; comparative genomics; evolutionary rates; nonsynonymous substitution; adaptation
In the presence of horizontal gene transfer (HGT), the concepts of lineage and genealogy in the microbial world become more ambiguous because chimeric genomes trace their ancestry from a myriad of sources, both living and extinct.
We present the evolutionary histories of three aminoacyl-tRNA synthetases (aaRS) to illustrate that the concept of organismal lineage in the prokaryotic world is defined by both vertical inheritance and reticulations due to HGT. The acquisition of a novel gene from a distantly related taxon can be considered as a shared derived character that demarcates a group of organisms, as in the case of the spirochaete Phenylalanyl-tRNA synthetase (PheRS). On the other hand, when organisms transfer genetic material with their close kin, the similarity and therefore relatedness observed among them is essentially shaped by gene transfer. Studying the distribution patterns of divergent genes with identical functions, referred to as homeoalleles, can reveal preferences for transfer partners. We describe the very ancient origin and the distribution of the archaeal homeoalleles for Threonyl-tRNA synthetases (ThrRS) and Seryl-tRNA synthetases (SerRS).
Patterns created through biased HGT can be indistinguishable from those created through shared organismal ancestry. A re-evaluation of the definition of lineage is necessary to reflect genetic relatedness due to both HGT and vertical inheritance. In most instances, HGT bias will maintain and strengthen similarity within groups. Only in cases where HGT bias is due to other factors, such as shared ecological niche, do patterns emerge from gene phylogenies that are in conflict with those reflecting shared organismal ancestry.
This article was reviewed by W. Ford Doolittle, François-Joseph Lapointe, and Frederic Bouchard.
In many biological and therapeutic contexts, it is highly desirable to target a chemical specifically to a particular tissue where it exerts its biological effect. In this paper, we present a simple, generic, mathematical model that elucidates a general method for targeting a chemical to particular tissues. The model consists of coupled reaction-diffusion equations to describe the evolution within the tissue of the concentrations of three chemical species: a (concentration of free chemical), b (binding protein) and their complex, c (chemical bound to binding protein). We assume that all species are free to diffuse, and that a and b undergo a reversible reaction to form c. In addition, the complex, c, can be broken down by a process (e.g. an enzyme in the tissue) that results in the release of the chemical, a, which is then free to exert its biological action.
For simplicity, we consider a one-dimensional geometry. In the special case where the rate of complex formation is small (compared to the diffusion timescale of the species within the tissue) the system can be solved analytically. This analytic solution allows us to show how the concentration of free chemical, a, in the tissue can be increased over the concentration of free chemical at the tissue boundary. We show that, under certain conditions, the maximum concentration of a can occur at the centre of the tissue, and give an upper bound on this maximum level. Numerical simulations are then used to determine how the behaviour of the system changes when the assumption of negligible complex formation rate is relaxed.
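The one-dimensional system described above can be explored with a minimal numerical sketch. The abstract does not give the governing equations, so everything specific here is an assumption: mass-action binding with rates `k_on` and `k_off`, first-order complex degradation `k_deg` that releases a and destroys the binding protein, fixed concentrations at both tissue boundaries, an explicit finite-difference scheme, and purely illustrative parameter values.

```python
def simulate(n=21, length=1.0, t_end=2.0, dt=5e-4,
             d_a=1.0, d_b=1.0, d_c=1.0,
             k_on=0.1, k_off=0.0, k_deg=5.0,
             a0=1.0, b0=1.0, c0=1.0):
    """Integrate an assumed a/b/c reaction-diffusion system on [0, length].

    Assumed equations (not taken from the paper):
        da/dt = d_a * a_xx - k_on*a*b + k_off*c + k_deg*c
        db/dt = d_b * b_xx - k_on*a*b + k_off*c
        dc/dt = d_c * c_xx + k_on*a*b - k_off*c - k_deg*c
    Boundary values are held at a0, b0, c0 (Dirichlet conditions).
    """
    dx = length / (n - 1)
    # Stability limit of the explicit (forward Euler) diffusion scheme.
    if dt > dx * dx / (2.0 * max(d_a, d_b, d_c)):
        raise ValueError("dt too large for the explicit scheme")

    a, b, c = [a0] * n, [b0] * n, [c0] * n
    for _ in range(int(round(t_end / dt))):
        new_a, new_b, new_c = a[:], b[:], c[:]
        for i in range(1, n - 1):  # interior points only; ends stay fixed
            lap_a = (a[i - 1] - 2.0 * a[i] + a[i + 1]) / (dx * dx)
            lap_b = (b[i - 1] - 2.0 * b[i] + b[i + 1]) / (dx * dx)
            lap_c = (c[i - 1] - 2.0 * c[i] + c[i + 1]) / (dx * dx)
            bind = k_on * a[i] * b[i] - k_off * c[i]  # net complex formation
            release = k_deg * c[i]                    # degradation frees a
            new_a[i] = a[i] + dt * (d_a * lap_a - bind + release)
            new_b[i] = b[i] + dt * (d_b * lap_b - bind)
            new_c[i] = c[i] + dt * (d_c * lap_c + bind - release)
        a, b, c = new_a, new_b, new_c
    return a, b, c


if __name__ == "__main__":
    a, b, c = simulate()
    mid = len(a) // 2
    print(f"free chemical at boundary: {a[0]:.3f}, at centre: {a[mid]:.3f}")
```

With a small `k_on`, the complex diffuses in from the boundary and is degraded inside the tissue, so in this sketch the free chemical ends up higher at the centre than at the boundary. Raising `k_on` is one way to probe what happens when the assumption of negligible complex formation is relaxed.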
We have shown, using our mathematical model, how complex degradation can potentially be exploited to target a chemical to a particular tissue, and how the level of the active chemical depends on factors such as the diffusion coefficients and degradation/production rates of each species. The biological significance of these results in terms of potential applications in cartilage tissue engineering and chemotherapy is discussed. In particular, we believe these results may be of use in determining the most promising prodrug candidates.
Solute transport; Tissue; Mathematical model; Prodrug; Insulin-like Growth factor (IGF); proteases
Phylogenetic reconstruction using DNA and protein sequences has allowed the reconstruction of evolutionary histories encompassing all life. We present and discuss a means to incorporate much of this rich narrative into a single model that acknowledges the discrete evolutionary units that constitute the organism. Briefly, this Rooted Net of Life genome phylogeny is constructed around an initial, well resolved and rooted tree scaffold inferred from a supermatrix of combined ribosomal genes. Extant sampled ribosomes form the leaves of the tree scaffold. These leaves, but not necessarily the deeper parts of the scaffold, can be considered to represent a genome or pan-genome, and to be associated with members of other gene families within that sequenced (pan)genome. Unrooted phylogenies of gene families containing four or more members are reconstructed and superimposed over the scaffold. Initially, reticulations are formed where incongruities between topologies exist. Given sufficient evidence, edges may then be differentiated as those representing vertical lines of inheritance within lineages and those representing horizontal genetic transfers or endosymbioses between lineages.
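The step in which gene-family topologies are compared with the scaffold can be illustrated with a small sketch. It is not the authors' implementation: trees are written as nested tuples of leaf names (a hypothetical representation), and incongruity is detected with the standard bipartition (split) compatibility test on the taxa shared by both trees. The function names and example trees are invented for illustration.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf set below every internal node of the tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found


def splits(tree, taxa):
    """Non-trivial bipartitions of `taxa` induced by the tree.

    Each split is stored as the side that does not contain the
    alphabetically first taxon, so rooting does not matter.
    """
    ref = min(taxa)
    result = set()
    for clade in clades(tree):
        side = clade & taxa
        other = taxa - side
        if len(side) >= 2 and len(other) >= 2:
            result.add(other if ref in side else side)
    return result


def compatible(s1, s2, taxa):
    """Two splits can share a tree iff one of the four intersections is empty."""
    return any(not (x & y) for x in (s1, taxa - s1) for y in (s2, taxa - s2))


def conflicts(scaffold, gene_tree):
    """Gene-tree splits that contradict at least one scaffold split."""
    taxa = leaves(scaffold) & leaves(gene_tree)
    if len(taxa) < 4:  # fewer than four shared taxa carry no topology signal
        return []
    scaffold_splits = splits(scaffold, taxa)
    bad = [s for s in splits(gene_tree, taxa)
           if any(not compatible(s, t, taxa) for t in scaffold_splits)]
    return sorted(sorted(s) for s in bad)


# Hypothetical rooted scaffold and two unrooted gene-family trees.
scaffold = ((("A", "B"), ("C", "D")), "E")
congruent_family = (("A", "B"), "E", ("C", "D"))
incongruent_family = (("A", "C"), ("B", "D"), "E")

print(conflicts(scaffold, congruent_family))    # no conflicting splits
print(conflicts(scaffold, incongruent_family))  # candidate reticulations
```

Each conflicting split only marks where a reticulation would be drawn. Deciding whether an edge reflects vertical inheritance, horizontal transfer or endosymbiosis requires the additional evidence the text mentions.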
This article was reviewed by W. Ford Doolittle, Eric Bapteste and Robert Beiko.
Eukaryotic genomes harbor diverse families of repetitive DNA derived from transposable elements (TEs) that are able to replicate and insert into genomic DNA. The biological role of TEs remains unclear, although they have profound mutagenic impact on eukaryotic genomes and the origin of repetitive families often correlates with speciation events. We present a new hypothesis to explain the observed correlations based on classical concepts of population genetics.
Presentation of the hypothesis
The main thesis presented in this paper is that the TE-derived repetitive families originate primarily by genetic drift in small populations derived mostly by subdivisions of large populations into subpopulations. We outline the potential impact of the emerging repetitive families on genetic diversification of different subpopulations, and discuss implications of such diversification for the origin of new species.
Testing the hypothesis
Several testable predictions of the hypothesis are examined. First, we focus on the prediction that the number of diverse families of TEs fixed in a representative genome of a particular species positively correlates with the cumulative number of subpopulations (demes) in the historical metapopulation from which the species has emerged. Furthermore, we present evidence indicating that human AluYa5 and AluYb8 families might have originated in separate proto-human subpopulations. We also revisit prior evidence linking the origin of repetitive families to mammalian phylogeny and present additional evidence linking repetitive families to speciation based on mammalian taxonomy. Finally, we discuss evidence that mammalian orders represented by the largest numbers of species may be subject to relatively recent population subdivisions and speciation events.
Implications of the hypothesis
The hypothesis implies that subdivision of a population into small subpopulations is the major step in the origin of new families of TEs as well as of new species. The origin of new subpopulations is likely to be driven by the availability of new biological niches, consistent with the hypothesis of punctuated equilibria. The hypothesis also has implications for the ongoing debate on the role of genetic drift in genome evolution.
This article was reviewed by Eugene Koonin, Juergen Brosius and I. King Jordan.
MicroRNAs (miRNAs) regulate their targets by triggering mRNA degradation or translational repression. The negative relationship between miRNAs and their targets suggests that the regulatory effect of a miRNA could be determined from the expression levels of its targets. Here, we investigated the relationship between miRNA activities determined by computational programs and miRNA expression levels, using data in which both mRNA and miRNA expression were measured in the same samples. We found that, contrary to intuitive expectation, miRNA activity shows only a very weak correlation with miRNA expression, which points to complex regulatory mechanisms between miRNAs and their target genes.
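The comparison described above can be sketched in a few lines. The abstract does not name the computational programs used, so the activity score here is a deliberately simple stand-in (mean expression of non-target genes minus mean expression of predicted targets in a sample), and the gene names, target set and expression values are hypothetical toy data.

```python
from statistics import mean


def activity_score(sample_expr, targets):
    """Assumed activity proxy: non-target mean minus target mean.

    A larger value means the predicted targets are more repressed
    relative to the other genes in this sample.
    """
    target_vals = [v for g, v in sample_expr.items() if g in targets]
    other_vals = [v for g, v in sample_expr.items() if g not in targets]
    return mean(other_vals) - mean(target_vals)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: mRNA expression per sample, plus the expression
# of one miRNA measured in the same three samples.
targets = {"g1", "g2"}
mrna_samples = [
    {"g1": 2.0, "g2": 3.0, "g3": 5.0, "g4": 6.0},
    {"g1": 4.0, "g2": 5.0, "g3": 5.0, "g4": 5.0},
    {"g1": 1.0, "g2": 2.0, "g3": 6.0, "g4": 7.0},
]
mirna_expression = [7.1, 6.8, 7.0]

activities = [activity_score(s, targets) for s in mrna_samples]
r = pearson(activities, mirna_expression)
print(f"activity vs expression: r = {r:.2f}")
```

A rank-based coefficient such as Spearman's could be substituted for `pearson` if the relationship is expected to be monotonic but not linear.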
This manuscript was reviewed by an anonymous reviewer and Dr Yuriy Gusev.
MicroRNA; microRNA activity; microRNA expression
More and more antiretroviral therapies are being developed for treatment of HIV infection. The in-vivo efficacy of these drugs is commonly predicted based on in-vitro measures of antiviral effect. One primary in-vitro measure is the IC50, the amount of drug required for 50% inhibition of viral replication. We have previously shown that HIV life-cycle kinetics impact clinically observed HIV viral dynamics. Here we present a mathematical model of how they affect the pharmacodynamics of antiretroviral drugs.
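To make the IC50 definition concrete, the sketch below uses the standard Hill (median-effect) dose-response form, in which the fraction of replication inhibited equals one half when the dose equals the IC50. This is a generic textbook relationship, not the life-cycle model of this paper. The function name, the `hill` slope parameter and the example doses are assumptions for illustration.

```python
def fraction_inhibited(dose, ic50, hill=1.0):
    """Fraction of viral replication inhibited at a given drug dose.

    Standard Hill form: dose^m / (dose^m + IC50^m), with slope m = hill.
    """
    if dose < 0 or ic50 <= 0 or hill <= 0:
        raise ValueError("dose must be >= 0; ic50 and hill must be > 0")
    d = dose ** hill
    return d / (d + ic50 ** hill)


if __name__ == "__main__":
    ic50 = 2.0  # hypothetical value, arbitrary concentration units
    for dose in (0.5, 2.0, 8.0):
        print(f"dose {dose}: {fraction_inhibited(dose, ic50):.2f} inhibited")
```

In this form a lower IC50 shifts the whole curve to the left, so the same dose inhibits a larger fraction of replication.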
We find that experimentally measured antiretroviral IC50s are determined by three factors: (i) intrinsic drug properties (e.g. drug-target binding), (ii) kinetics of the HIV life cycle, and (iii) kinetics of drug-inhibited infected cells. Our model predicts that the IC50 is a declining function of the duration of the drug-susceptible stage in the host cell. We combine our model with known viral life-cycle kinetics to derive a measure of intrinsic properties, reflecting drug action, for known antiretroviral drugs from previously measured IC50s. We show that this measure of intrinsic drug property correlates very well with in vitro-measured antiviral activity, whereas experimentally measured IC50 does not.
Our results have implications for understanding pharmacodynamics of and improving activity of antiretroviral drugs. Our findings predict that drug activity can be improved through co-administration of synergistic drugs that delay the viral life cycle but are not inhibitory by themselves. Moreover, our results may easily extend to treatment of other pathogens.
This article was reviewed by Dr. Ruy Ribeiro, Dr. Ha Youn Lee, Dr. Alan Perelson and Dr. Christoph Adami.
HIV; viral dynamics; HAART; antiretroviral therapy; viral life cycle; pharmacodynamics; IC50
It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods hypothesis and show that it is appropriate for describing biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined as being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist, and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the Public Goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we do not expect to see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club.
We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just as classical mechanics and Euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis.