1.  Temporal order of evolution of DNA replication systems inferred by comparison of cellular and viral DNA polymerases 
Biology Direct  2006;1:39.
The core enzymes of the DNA replication systems show striking diversity among cellular life forms and more so among viruses. In particular, and counter-intuitively, given the central role of DNA in all cells and the mechanistic uniformity of replication, the core enzymes of the replication systems of bacteria and archaea (as well as eukaryotes) are unrelated or extremely distantly related. Viruses and plasmids, in addition, possess at least two unique DNA replication systems, namely, the protein-primed and rolling circle modalities of replication. This unexpected diversity makes the origin and evolution of DNA replication systems a particularly challenging and intriguing problem in evolutionary biology.
I propose a specific succession for the emergence of different DNA replication systems, drawing arguments from the differences in their representation among viruses and other selfish replicating elements. In a striking pattern, the DNA replication systems of viruses infecting bacteria and eukaryotes are dominated by the archaeal-type B-family DNA polymerase (PolB) whereas the bacterial replicative DNA polymerase (PolC) is present only in a handful of bacteriophage genomes. There is no apparent mechanistic impediment to the involvement of the bacterial-type replication machinery in viral DNA replication. Therefore, I hypothesize that the observed, markedly unequal distribution of the replicative DNA polymerases among the known cellular and viral replication systems has a historical explanation. I propose that, among the two types of DNA replication machineries that are found in extant life forms, the archaeal-type, PolB-based system evolved first and had already given rise to a variety of diverse viruses and other selfish elements before the advent of the bacterial, PolC-based machinery. Conceivably, at that stage of evolution, the niches for DNA-viral reproduction had already been filled with viruses replicating with the help of the archaeal system, and viruses with the bacterial system never took off. I further suggest that the two other systems of DNA replication, the rolling circle mechanism and the protein-primed mechanism, which are represented in diverse selfish elements, also evolved prior to the emergence of the bacterial replication system. This hypothesis is compatible with the distinct structural affinities of PolB, which has the palm-domain fold shared with reverse transcriptases and RNA-dependent RNA polymerases, and PolC, which has a distinct, unrelated nucleotidyltransferase fold.
I propose that PolB is a descendant of polymerases that were involved in the replication of genetic elements in the RNA-protein world, prior to the emergence of DNA replication. By contrast, PolC might have evolved from an ancient non-templated polymerase, e.g., polyA polymerase. The proposed temporal succession of the evolving DNA replication systems does not depend on the specific scenario adopted for the evolution of cells and viruses, i.e., whether viruses are derived from cells or virus-like elements are thought to originate from a primordial gene pool. However, arguments are presented in favor of the latter scenario as the most parsimonious explanation of the evolution of DNA replication systems.
Comparative analysis of the diversity of genomic strategies and organizations of viruses and cellular life forms has the potential to open windows into the deep past of life's evolution, especially with regard to the origin of genome replication systems. When complemented with information on the evolution of the relevant protein folds, this comparative approach can yield credible scenarios for very early steps of evolution that otherwise appear to be out of reach.
This article was reviewed by Eric Bapteste, Patrick Forterre, and Mark Ragan.
PMCID: PMC1766352  PMID: 17176463
2.  A computational analysis of the three isoforms of glutamate dehydrogenase reveals structural features of the isoform EC supporting a key role in ammonium assimilation by plants 
Biology Direct  2006;1:38.
There are three isoforms of glutamate dehydrogenase. The isoform EC (GDH4) catalyses glutamate synthesis from 2-oxoglutarate and ammonium, using NAD(P)H. Ammonium assimilation is critical for plant growth. Although the GDH4 enzymes of animals and prokaryotes are well characterized, few data are available on plant GDH4, even for plants whose genomes are well annotated.
A large set of sequences of the three GDH isoforms was assembled, yielding 116 non-redundant full-length polypeptide sequences. A computational analysis was carried out to gain more information on the structure-function relationship of GDH4 from plants (Eukaryota, Viridiplantae). The plant GDH4 sequences tested were the only two known to date, those of Chlorella sorokiniana. This analysis revealed several structural features specific to plant GDH4: (i) the lack of a structure called the "antenna"; (ii) the NAD(P)-binding motif GAGNVA; and (iii) a second putative coenzyme-binding motif GVLTGKG together with four residues involved in the binding of the reduced form of NADP.
A number of structural features specific to plant GDH4 have been found. The results reinforce the probable key role of GDH4 in ammonium assimilation by plants.
This article was reviewed by Tina Bakolitsa (nominated by Eugene Koonin), Martin Jambon (nominated by Laura Landweber), Sandor Pongor and Franck Eisenhaber.
PMCID: PMC1716157  PMID: 17173671
3.  Proteomic and bioinformatic analysis of epithelial tight junction reveals an unexpected cluster of synaptic molecules 
Biology Direct  2006;1:37.
Zonula occludens, also known as the tight junction, is a specialized cell-cell interaction characterized by membrane "kisses" between epithelial cells. A cytoplasmic plaque of ~100 nm corresponding to a meshwork of densely packed proteins underlies the tight junction membrane domain. Due to its enormous size and difficulties in obtaining a biochemically pure fraction, the molecular composition of the tight junction remains largely unknown.
A novel biochemical purification protocol has been developed to isolate tight junction protein complexes from cultured human epithelial cells. After identification of proteins by mass spectrometry and fingerprint analysis, candidate proteins are scored and assessed individually. A simple algorithm has been devised to incorporate transmembrane domains and protein modification sites for scoring membrane proteins. Using this new scoring system, a total of 912 proteins have been identified. These 912 hits are analyzed using a bioinformatics approach to bin them into four categories: configuration, molecular function, cellular function, and specialized process. Prominent clusters of proteins related to the cytoskeleton, cell adhesion, and vesicular traffic have been identified. Weaker clusters of proteins associated with cell growth, cell migration, translation, and transcription are also found. However, the strongest clusters belong to synaptic proteins and signaling molecules. Localization studies of key components of synaptic transmission have confirmed the presence of both presynaptic and postsynaptic proteins at the tight junction domain. To correlate proteomics data with structure, the tight junction has been examined using electron microscopy. This has revealed many novel structures, including end-on cytoskeletal attachments, vesicles fusing with or budding from the tight junction membrane domain, secreted substances encased between the tight junction kisses, endocytosis of tight junction double membranes, and satellite Golgi apparatus with associated vesicular structures. A working model of the tight junction consisting of multiple functions and sub-domains has been generated using the proteomics and structural data.
This study provides an unbiased proteomics and bioinformatics approach to elucidate novel functions of the tight junction. The approach has revealed an unexpected cluster of proteins associated with synaptic function. This surprising finding suggests that the tight junction may be a novel epithelial synapse for cell-cell communication.
This article was reviewed by Gáspár Jékely, Etienne Joly and Neil Smalheiser.
PMCID: PMC1712231  PMID: 17156438
4.  Did group II intron proliferation in an endosymbiont-bearing archaeon create eukaryotes? 
Biology Direct  2006;1:36.
Martin & Koonin recently proposed that the eukaryote nucleus evolved as a quality control mechanism to prevent ribosome readthrough into introns. In their scenario, the bacterial ancestor of mitochondria was resident in an archaeal cell, and group II introns (carried by the fledgling mitochondrion) inserted into coding regions in the archaeal host genome. They suggest that if transcription and translation were coupled, and because splicing is expected to have been slower than translation, the effect of insertion would have been ribosome readthrough into introns, resulting in production of aberrant proteins. The emergence of the nuclear compartment would thus have served to separate transcription and splicing from translation, thereby alleviating this problem. In this article, I argue that Martin & Koonin's model is not compatible with current knowledge. The model requires that group II introns would spread aggressively through an archaeal genome. It is well known that selfish elements can spread through an outbreeding sexual population despite a substantial fitness cost to the host. The same is not true for asexual lineages, however, where both theory and observation argue that such elements will be under pressure to reduce proliferation, and may be lost completely. The recent introduction of group II introns into archaea by horizontal transfer provides a natural test case with which to evaluate Martin & Koonin's model. The distribution and behaviour of these introns fit prior theoretical expectations, not the scenario of aggressive proliferation advocated by Martin & Koonin. I therefore conclude that the mitochondrial seed hypothesis for the origin of eukaryote introns, on which their model is based, better explains the early expansion of introns in eukaryotes.
The mitochondrial seed hypothesis has the capacity to separate the origin of eukaryotes from the origin of introns, leaving open the possibility that the cell that engulfed the ancestor of mitochondria was a sexually outcrossing eukaryote cell.
PMCID: PMC1712230  PMID: 17156426
5.  Did the last common ancestor have a biological membrane? 
Biology Direct  2006;1:35.
All theories about the origin and evolution of membrane-bound cells necessarily have to cope with the nature of the last common ancestor of cellular life. One of the most important aspects of this ancestor, whether it had a closed biological membrane or not, has recently been intensely debated. Having a consensus about it would be an important step towards an eventual (though probably still remote) synthesis of the best elements of the current multitude of cell evolution models. Here I analyse the structural and functional conservation of the few universally distributed proteins that were undoubtedly present in the last common ancestor and that carry out membrane-associated functions. These include the SecY subunit of the protein-conducting channel, the signal recognition particle, the signal recognition particle receptor, the signal peptidase, and the proton ATPase. The conserved structural and functional aspects of these proteins indicate that the last common ancestor was associated with a hydrophobic layer with two hydrophilic sides (an inside and an outside) that had a full-fledged and asymmetric protein insertion and translocation machinery and served as a permeability barrier for protons and other small molecules. It is difficult to escape the conclusion that the last common ancestor had a closed biological membrane from which all cellular membranes evolved.
PMCID: PMC1675992  PMID: 17129384
6.  Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus 
Biology Direct  2006;1:34.
The interpandemic evolution of the influenza A virus hemagglutinin (HA) protein is commonly considered a paragon of rapid evolutionary change under positive selection in which amino acid replacements are fixed by virtue of their effect on antigenicity, enabling the virus to evade immune surveillance.
We performed phylogenetic analyses of the recently obtained large and relatively unbiased samples of the HA sequences from 1995–2005 isolates of the H3N2 and H1N1 subtypes of influenza A virus. Unexpectedly, it was found that the evolution of H3N2 HA includes long intervals of generally neutral sequence evolution without apparent substantial antigenic change ("stasis" periods) that are characterized by an excess of synonymous over nonsynonymous substitutions per site, lack of association of amino acid replacements with epitope regions, and slow extinction of coexisting virus lineages. These long periods of stasis are punctuated by shorter intervals of rapid evolution under positive selection during which new dominant lineages quickly displace previously coexisting ones. The preponderance of positive selection during intervals of rapid evolution is supported by the dramatic excess of amino acid replacements in the epitope regions of HA compared to replacements in the rest of the HA molecule. In contrast, the stasis intervals showed a much more uniform distribution of replacements over the HA molecule, with a statistically significant difference in the rate of synonymous over nonsynonymous substitution in the epitope regions between the two modes of evolution. A number of parallel amino acid replacements – the same amino acid substitution occurring independently in different lineages – were also detected in H3N2 HA. These parallel mutations were, largely, associated with periods of rapid fitness change, indicating that there are major limitations on evolutionary pathways during antigenic change. The finding that stasis is the prevailing modality of H3N2 evolution suggests that antigenic changes that lead to an increase in fitness typically result from epistatic interactions between several amino acid substitutions in the HA and, perhaps, other viral proteins. 
The strains that become dominant due to increased fitness emerge from low frequency strains thanks to the last amino acid replacement that completes the set of replacements required to produce a significant antigenic change; no subset of substitutions results in a biologically significant antigenic change and corresponding fitness increase. In contrast to H3N2, no clear intervals of evolution under positive selection were detected for the H1N1 HA during the same time span. Thus, the ascendancy of H1N1 in some seasons is, most likely, caused by the drop in the relative fitness of the previously prevailing H3N2 lineages as the fraction of susceptible hosts decreases during the stasis intervals.
Numbers of synonymous and nonsynonymous substitutions per site (dN/dS) in H3N2 HA
We show that the common view of the evolution of influenza virus as a rapid, positive selection-driven process is, at best, incomplete. Rather, the interpandemic evolution of influenza appears to consist of extended intervals of stasis, which are characterized by neutral sequence evolution, punctuated by shorter intervals of rapid fitness increase when evolutionary change is driven by positive selection. These observations have implications for influenza surveillance and vaccine formulation; in particular, the possibility exists that parallel amino acid replacements could serve as a predictor of new dominant strains.
This article was reviewed by Ron Fouchier (nominated by Andrey Rzhetsky), David Krakauer, and Christopher Lee.
PMCID: PMC1647279  PMID: 17067369
7.  Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression 
Biology Direct  2006;1:33.
The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data has been, and continues to be, accumulated in public repositories through different technology platforms. However, exploitation of these rich data sources remains limited, in part because of issues of technology standardization. Our objective is to test the comparability of data between SAGE and microarray technologies by examining the expression patterns of genes under normal physiological states across a variety of tissues.
Between 42% and 54% of genes show significant correlations in tissue expression patterns between SAGE and GeneChip: 30–40% of genes have positively correlated expression patterns and 10–15% have negatively correlated patterns at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy between the expression patterns derived from the two technology platforms is unlikely to stem from the heterogeneity of the tissues used, or from other spurious correlations resulting from microarray probe design, abundance of genes, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes due to the evolution of sequence databases. In addition, sequence analysis has indicated that many SAGE tags and Affymetrix array probe sets are mapped to different splice variants or different sequence regions although they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data.
To our knowledge, this is the first report attempting to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies that only demonstrated the discrepancies between the two gene expression platforms, we carried out an in-depth analysis to investigate the causes of such discrepancies. Our study shows that exploiting rich public expression resources requires extensive knowledge of the technologies and experiments involved. Informatic methodologies for better interoperability among platforms are still lacking. One area that can be improved in practice is the accurate sequence mapping of SAGE tags and array probes to full-length genes.
This article was reviewed by Dr. I. King Jordan, Dr. Joel Bader, and Dr. Arcady Mushegian.
PMCID: PMC1634740  PMID: 17064414
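The cross-platform comparison in entry 7 can be illustrated with a minimal sketch that correlates one gene's tissue expression profile as measured by two platforms. The abstract does not say which correlation statistic was used, so the Pearson coefficient here is an assumption, and the function name and all expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for one gene measured in the same five tissues.
sage_counts = [12.0, 30.0, 7.0, 55.0, 21.0]           # SAGE tag counts
genechip_signal = [110.0, 260.0, 90.0, 480.0, 200.0]  # array intensities

r = pearson(sage_counts, genechip_signal)
print(f"r = {r:.3f}")
```

Repeating this for every gene and testing each coefficient for significance would give the kind of per-gene positive and negative correlation fractions the abstract reports; a rank-based statistic could be substituted without changing the structure.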
8.  The immune-body cytokine network defines a social architecture of cell interactions 
Biology Direct  2006;1:32.
Three networks of intercellular communication can be associated with cytokine secretion: one limited to cells of the immune system (immune cells), one limited to parenchymal cells of organs and tissues (body cells), and one involving interactions between immune and body cells (immune-body interface). These cytokine connections determine the inflammatory response to injury and subsequent healing as well as the biologic consequences of the adaptive immune response to antigens. We informatically probed the cytokine database to uncover the underlying network architecture of the three networks.
We now report that the three cytokine networks are among the densest of complex networks yet studied, and each features a characteristic profile of specific three-cell motifs. Some legitimate cytokine connections are shunned (anti-motifs). Certain immune cells can be paired by their input-output positions in a cytokine architecture tree of five tiers: macrophages (MΦ) and B cells (BC) comprise the first tier; the second tier is formed by T helper 1 (Th1) and T helper 2 (Th2) cells; the third tier includes dendritic cells (DC), mast cells (MAST), Natural Killer T cells (NK-T) and others; the fourth tier is formed by neutrophils (NEUT) and Natural Killer cells (NK); and cytotoxic T cells (CTL) stand alone as a fifth tier. The three-cell cytokine motif architecture of immune system cells places the immune system in a super-family that includes social networks and the World Wide Web. Body cells are less clearly stratified, although cells involved in wound healing and angiogenesis are most highly interconnected with immune cells.
Cytokine network architecture creates an innate cell-communication platform that organizes the biologic outcome of antigen recognition and inflammation. Informatics sheds new light on immune-body systems organization.
This article was reviewed by Neil Greenspan, Matthias von Herrath and Anne Cooke.
PMCID: PMC1636025  PMID: 17062134
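Two of the quantities mentioned in entry 8, network density and three-node motif counts, can be sketched on a toy directed graph. The node labels and edges below are hypothetical placeholders rather than cytokine connections from the study, and the feed-forward loop is just one example of a three-node motif; the abstract does not specify which motifs were profiled.

```python
from itertools import permutations


def density(nodes, edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1))


def count_feed_forward(nodes, edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edges and (b, c) in edges and (a, c) in edges
    )


# Hypothetical cell types and "secretes a cytokine received by" edges.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}

print(f"density = {density(nodes, edges):.3f}")
print(f"feed-forward loops = {count_feed_forward(nodes, edges)}")
```

A motif profile in the usual sense also compares such counts against randomized networks with the same degree sequence; that step is omitted here for brevity.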
9.  Evolution of glyoxylate cycle enzymes in Metazoa: evidence of multiple horizontal transfer events and pseudogene formation 
Biology Direct  2006;1:31.
The glyoxylate cycle is thought to be present in bacteria, protists, plants, fungi, and nematodes, but not in other Metazoa. However, activity of the glyoxylate cycle enzymes, malate synthase (MS) and isocitrate lyase (ICL), in animal tissues has been reported. In order to clarify the status of the MS and ICL genes in animals and get an insight into their evolution, we undertook a comparative-genomic study.
Using sequence similarity searches, we identified MS genes in arthropods, echinoderms, and vertebrates, including platypus and opossum, but not in the numerous sequenced genomes of placental mammals. The regions of the placental mammals' genomes expected to code for malate synthase, as determined by comparison of the gene orders in vertebrate genomes, show clear similarity to the opossum MS sequence but contain stop codons, indicating that the MS gene became a pseudogene in placental mammals. By contrast, the ICL gene is undetectable in animals other than the nematodes that possess a bifunctional, fused ICL-MS gene. Examination of phylogenetic trees of MS and ICL suggests multiple horizontal gene transfer events that probably went in both directions between several bacterial and eukaryotic lineages. The strongest evidence was obtained for the acquisition of the bifunctional ICL-MS gene from an as yet unknown bacterial source with the corresponding operonic organization by the common ancestor of the nematodes.
The distribution of the MS and ICL genes in animals suggests that either they encode alternative enzymes of the glyoxylate cycle that are not orthologous to the known MS and ICL or the animal MS acquired a new function that remains to be characterized. Regardless of the ultimate solution to this conundrum, the genes for the glyoxylate cycle enzymes present a remarkable variety of evolutionary events including unusual horizontal gene transfer from bacteria to animals.
This article was reviewed by Arcady Mushegian (Stowers Institute for Medical Research), Andrey Osterman (Burnham Institute for Medical Research), and Chris Ponting (Oxford University).
PMCID: PMC1630690  PMID: 17059607
10.  Mathematical modeling of tumor therapy with oncolytic viruses: effects of parametric heterogeneity on cell dynamics 
Biology Direct  2006;1:30.
One of the mechanisms that ensure cancer robustness is tumor heterogeneity, and its effects on tumor cell dynamics have to be taken into account when studying cancer progression. There is no unifying theoretical framework in mathematical modeling of carcinogenesis that would account for parametric heterogeneity.
Here we formulate a modeling approach that naturally takes stock of inherent cancer cell heterogeneity and illustrate it with a model of interaction between a tumor and an oncolytic virus. We show that several phenomena that are absent in homogeneous models, such as cancer recurrence and tumor dormancy, appear in a heterogeneous setting. We also demonstrate that, within the applied modeling framework, a heterogeneous population of an oncolytic virus must be used to overcome the adverse effect of tumor cell heterogeneity on the outcome of cancer treatment. Heterogeneity in parameters of the model, such as tumor cell susceptibility to virus infection and the ability of an oncolytic virus to infect tumor cells, can lead to complex, irregular evolution of the tumor. Thus, quasi-chaotic behavior of the tumor-virus system can be caused not only by random perturbations but also by the heterogeneity of the tumor and the virus.
The modeling approach described here reveals the importance of tumor cell and virus heterogeneity for the outcome of cancer therapy. It should be straightforward to apply these techniques to mathematical modeling of other types of anticancer therapy.
This article was reviewed by Leonid Hanin (nominated by Arcady Mushegian), Natalia Komarova (nominated by Orly Alter), and David Krakauer.
PMCID: PMC1622743  PMID: 17018145
11.  The ancient Virus World and evolution of cells 
Biology Direct  2006;1:29.
Recent advances in genomics of viruses and cellular life forms have greatly stimulated interest in the origins and evolution of viruses and, for the first time, offer an opportunity for a data-driven exploration of the deepest roots of viruses. Here we briefly review the current views of virus evolution and propose a new, coherent scenario that appears to be best compatible with comparative-genomic data and is naturally linked to models of cellular evolution that, from independent considerations, seem to be the most parsimonious among the existing ones.
Several genes coding for key proteins involved in viral replication and morphogenesis as well as the major capsid protein of icosahedral virions are shared by many groups of RNA and DNA viruses but are missing in cellular life forms. On the basis of this key observation and the data on extensive genetic exchange between diverse viruses, we propose the concept of the ancient virus world. The virus world is construed as a distinct contingent of viral genes that continuously retained its identity throughout the entire history of life. Under this concept, the principal lineages of viruses and related selfish agents emerged from the primordial pool of primitive genetic elements, the ancestors of both cellular and viral genes. Thus, notwithstanding the numerous gene exchanges and acquisitions attributed to later stages of evolution, most, if not all, modern viruses and other selfish agents are inferred to descend from elements that belonged to the primordial genetic pool. In this pool, RNA viruses would evolve first, followed by retroid elements, and DNA viruses. The Virus World concept is predicated on a model of early evolution whereby emergence of substantial genetic diversity antedates the advent of full-fledged cells, allowing for extensive gene mixing at this early stage of evolution. We outline a scenario of the origin of the main classes of viruses in conjunction with a specific model of precellular evolution under which the primordial gene pool dwelled in a network of inorganic compartments. Somewhat paradoxically, under this scenario, we surmise that selfish genetic elements ancestral to viruses evolved prior to typical cells, to become intracellular parasites once bacteria and archaea arrived at the scene. Selection against excessively aggressive parasites that would kill off the host ensembles of genetic elements would lead to early evolution of temperate virus-like agents and primitive defense mechanisms, possibly, based on the RNA interference principle. 
The emergence of the eukaryotic cell is construed as the second melting pot of virus evolution from which the major groups of eukaryotic viruses originated as a result of extensive recombination of genes from various bacteriophages, archaeal viruses, plasmids, and the evolving eukaryotic genomes. Again, this vision is predicated on a specific model of the emergence of the eukaryotic cell under which archaeo-bacterial symbiosis was the starting point of eukaryogenesis, a scenario that appears to be best compatible with the data.
The existence of several genes that are central to virus replication and structure, and that are shared by a broad variety of viruses but missing from cellular genomes (virus hallmark genes), suggests the model of an ancient virus world, a flow of virus-specific genes that went uninterrupted from the precellular stage of life's evolution to this day. This concept is tightly linked to two key conjectures on the evolution of cells: the existence of a complex, precellular, compartmentalized but extensively mixing and recombining pool of genes, and the origin of the eukaryotic cell by archaeo-bacterial fusion. The virus world concept and these models of major transitions in the evolution of cells provide complementary pieces of an emerging coherent picture of life's history.
This article was reviewed by W. Ford Doolittle, J. Peter Gogarten, and Arcady Mushegian.
PMCID: PMC1594570  PMID: 16984643
12.  Diverse bacterial genomes encode an operon of two genes, one of which is an unusual class-I release factor that potentially recognizes atypical mRNA signals other than normal stop codons 
Biology Direct  2006;1:28.
While all codons that specify amino acids are universally recognized by tRNA molecules, codons signaling termination of translation are recognized by proteins known as class-I release factors (RF). In most eukaryotes and archaea, a single RF accomplishes termination at all three stop codons. In most bacteria, there are two RFs with overlapping specificity: RF1 recognizes UA(A/G) and RF2 recognizes U(A/G)A.
The hypothesis
First, we hypothesize that orthologues of the E. coli K12 pseudogene prfH encode a third class-I RF that we designate RFH. Second, it is likely that RFH responds to signals other than conventional stop codons. Supporting evidence comes from the following facts: (i) a number of bacterial genomes contain prfH orthologues with no discernible interruptions in their ORFs; (ii) RFH shares strong sequence similarity with other class-I bacterial RFs; (iii) RFH contains a highly conserved GGQ motif associated with peptidyl hydrolysis activity; and (iv) residues located in the areas supposedly interacting with mRNA and the ribosomal decoding center are highly conserved in RFH, but different from other RFs. RFH lacks the functional, but non-essential, domain 1. Yet RFH-encoding genes are invariably accompanied by a highly conserved gene of unknown function, which is absent in genomes that lack a gene for RFH. The accompanying gene is always located upstream of the RFH gene and in the same orientation. The proximity of the 3' end of the former to the 5' end of the RFH gene makes it likely that their expression is co-regulated via translational coupling. In summary, RFH has the characteristics expected for a class-I RF, but likely with a different specificity from that of RF1 and RF2.
Testing the hypothesis
The most puzzling question is which signals RFH recognizes to trigger its release function. Genetic swapping of RFH mRNA recognition components with its RF1 or RF2 counterparts may reveal the nature of RFH signals.
Implications of the hypothesis
The hypothesis implies a greater versatility of release-factor like activity in the ribosomal A-site than previously appreciated. A closer study of RFH may provide insight into the evolution of the genetic code and of the translational machinery responsible for termination of translation.
This article was reviewed by Daniel Wilson (nominated by Eugene Koonin), Warren Tate (nominated by Eugene Koonin), Yoshikazu Nakamura (nominated by Eugene Koonin) and Eugene Koonin.
PMCID: PMC1586002  PMID: 16970810
13.  Generalization of DNA microarray dispersion properties: microarray equivalent of t-distribution 
Biology Direct  2006;1:27.
DNA microarrays are a powerful technology that can provide a wealth of gene expression data for disease studies, drug development, and a wide scope of other investigations. Because of the large volume and inherent variability of DNA microarray data, many new statistical methods have been developed for evaluating the significance of the observed differences in gene expression. However, until now little attention has been given to the characterization of dispersion of DNA microarray data.
Here we examine the expression data obtained from 682 Affymetrix GeneChips® of 22 different types and we demonstrate that the Gaussian (normal) frequency distribution is characteristic for the variability of gene expression values. However, typically 5 to 15% of the samples deviate from normality. Furthermore, it is shown that the frequency distributions of the difference of expression in subsets of ordered, consecutive pairs of genes (consecutive samples) in pair-wise comparisons of replicate experiments are also normal. We describe a consecutive sampling method, which is employed to calculate the characteristic function approximating standard deviation and show that the standard deviation derived from the consecutive samples is equivalent to the standard deviation obtained from individual genes. Finally, we determine the boundaries of probability intervals and demonstrate that the coefficients defining the intervals are independent of sample characteristics, variability of data, laboratory conditions and type of chips. These coefficients are very closely correlated with Student's t-distribution.
In this study we ascertained that the non-systematic variations possess Gaussian distribution, determined the probability intervals and demonstrated that the Kα coefficients defining these intervals are invariant; these coefficients offer a convenient universal measure of dispersion of data. The fact that the Kα distributions are so close to t-distribution and independent of conditions and type of arrays suggests that the quantitative data provided by Affymetrix technology give "true" representation of physical processes, involved in measurement of RNA abundance.
This article was reviewed by Yoav Gilad (nominated by Doron Lancet), Sach Mukherjee (nominated by Sandrine Dudoit) and Amir Niknejad and Shmuel Friedland (nominated by Neil Smalheiser).
PMCID: PMC1586001  PMID: 16959036
14.  Refuting phylogenetic relationships 
Biology Direct  2006;1:26.
Phylogenetic methods are philosophically grounded, and so can be philosophically biased in ways that limit explanatory power. This constitutes an important methodologic dimension not often taken into account. Here we address this dimension in the context of concatenation approaches to phylogeny.
We discuss some of the limits of a methodology restricted to verificationism, the philosophy on which gene concatenation practices generally rely. As an alternative, we describe software that identifies and focuses on impossible or refuted relationships, through a simple analysis of bootstrap bipartitions, followed by multivariate statistical analyses. We show how refuting phylogenetic relationships could in principle facilitate systematics. We also apply our method to the study of two complex phylogenies: the phylogeny of the archaea and the phylogeny of the core of genes shared by all life forms. While many groups are rejected, our results left open a possible proximity of N. equitans and the Methanopyrales, and of the Archaea and the Cyanobacteria, as well as the possible grouping of the Methanobacteriales/Methanococcales and Thermoplasmatales, of the Spirochaetes and the Actinobacteria, and of the Proteobacteria and Firmicutes.
It is sometimes easier (and preferable) to decide which species do not group together than which ones do. When possible topologies are limited, identifying local relationships that are rejected may be a useful alternative to classical concatenation approaches aiming to find a globally resolved tree on the basis of weak phylogenetic markers.
This article was reviewed by Mark Ragan, Eugene V Koonin and J Peter Gogarten.
PMCID: PMC1574289  PMID: 16956399
15.  The signaling helix: a common functional theme in diverse signaling proteins 
Biology Direct  2006;1:25.
The mechanism by which the signals are transmitted between receptor and effector domains in multi-domain signaling proteins is poorly understood.
Using sensitive sequence analysis methods we identify a conserved helical segment of around 40 residues in a wide range of signaling proteins, including numerous sensor histidine kinases such as Sln1p, and receptor guanylyl cyclases such as the atrial natriuretic peptide receptor and nitric oxide receptors. We term this helical segment the signaling (S)-helix and present evidence that it forms a novel parallel coiled-coil element, distinct from previously known helical segments in signaling proteins, such as the Dimerization-Histidine phosphotransfer module of histidine kinases, the intra-cellular domains of the chemotaxis receptors, inter-GAF domain helical linkers and the α-helical HAMP module. Analysis of domain architectures allowed us to reconstruct the domain-neighborhood graph for the S-helix, which showed that the S-helix almost always occurs between two signaling domains. Several striking patterns in the domain neighborhood of the S-helix also became evident from the graph. It most often separates diverse N-terminal sensory domains from various C-terminal catalytic signaling domains such as histidine kinases, cNMP cyclase, PP2C phosphatases, NtrC-like AAA+ ATPases and diguanylate cyclases. It might also occur between two sensory domains such as PAS domains and occasionally between a DNA-binding HTH domain and a sensory domain. The sequence conservation pattern of the S-helix revealed the presence of a unique constellation of polar residues in the dimer-interface positions within the central heptad of the coiled-coil formed by the S-helix.
Combining these observations with previously reported mutagenesis studies on different S-helix-containing proteins we suggest that it functions as a switch that prevents constitutive activation of linked downstream signaling domains. However, upon occurrence of specific conformational changes due to binding of ligand or other sensory inputs in a linked upstream domain it transmits the signal to the downstream domain. Thus, the S-helix represents one of the most prevalent functional themes involved in the flow of signals between modules in diverse prokaryote-type multi-domain signaling proteins.
This article was reviewed by Frank Eisenhaber, Arcady Mushegian and Sandor Pongor.
PMCID: PMC1592074  PMID: 16953892
16.  Codon insertion and deletion functions as a somatic diversification mechanism in human antibody repertoires 
Biology Direct  2006;1:24.
It has been suggested that codon insertion and/or deletion may represent a mechanism that, along with hypermutation, contributes to the affinity maturation of antibodies. We used repertoire cloning to examine human antibodies directed against 3 carbohydrate antigens and 1 protein antigen for the presence of such modifications. We find that both the insertion and deletion of codons occur frequently in antigen-specific responses following vaccination. Codon insertions and deletions were observed most often in the complementarity determining regions, and less frequently in the framework regions, of VH, Vκ, and Vλ gene segments, and involved motifs known to be preferred targets of somatic hypermutation. Clonal lineage analysis shows that these events occur throughout the course of the somatic maturation of individual antibody clones. We also determined that these alterations of paratope structure have varying effects on the relative affinity of the binding site for its cognate antigen.
This article was reviewed by Mark Shlomchik, Deborah Dunn-Walters (nominated by Dr. Andrew Macpherson), and Rachel M. Gerstein.
PMCID: PMC1624809  PMID: 16942619
17.  A system for studying evolution of life-like virtual organisms 
Biology Direct  2006;1:23.
Fitness landscapes, the dependences of fitness on the genotype, are of critical importance for the evolution of living beings. Unfortunately, fitness landscapes that are relevant to the evolution of complex biological functions are very poorly known. As a result, the existing theory of evolution is mostly based on postulated fitness landscapes, which diminishes its usefulness. Attempts to deduce fitness landscapes from models of actual biological processes led, so far, to only limited success.
We present a model system for studying the evolution of biological function, which makes it possible to attribute fitness to genotypes in a natural way. The system mimics a very simple cell and takes into account the basic properties of gene regulation and enzyme kinetics. A virtual cell contains only two small molecules, an organic nutrient A and an energy carrier X, and proteins of five types – two transcription factors, two enzymes, and a membrane transporter. The metabolism of the cell consists of importing A from the environment and utilizing it in order to produce X and an unspecified end product. The genome may carry an arbitrary number of genes, each one encoding a protein of one of the five types. Both major mutations that affect whole genes and minor mutations that affect individual characteristics of genes are possible. Fitness is determined by the ability of the cell to maintain homeostasis when its environment changes. The system has been implemented as a computer program, and several numerical experiments have been performed on it. Evolution of the virtual cells usually involves a rapid initial increase of fitness, which eventually slows down, until a fitness plateau is reached. The origin of a wide variety of genetic networks is routinely observed in independent experiments performed under the same conditions. These networks can have different, including very high, levels of complexity and often include large numbers of non-essential genes.
The described system displays a rich repertoire of biologically sensible behaviors and, thus, can be useful for investigating a number of unresolved issues in evolutionary biology, including evolution of complexity, modularity and redundancy, as well as for studying the general properties of genotype-to-fitness maps.
This article was reviewed by Drs. Eugene Koonin, Shamil Sunyaev and Arcady Mushegian.
PMCID: PMC1569368  PMID: 16916465
18.  The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? 
Biology Direct  2006;1:22.
Ever since the discovery of 'genes in pieces' and mRNA splicing in eukaryotes, origin and evolution of spliceosomal introns have been considered within the conceptual framework of the 'introns early' versus 'introns late' debate. The 'introns early' hypothesis, which is closely linked to the so-called exon theory of gene evolution, posits that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. Under this scenario, the absence of spliceosomal introns in prokaryotes is considered to be a result of "genome streamlining". The 'introns late' hypothesis counters that spliceosomal introns emerged only in eukaryotes, and moreover, have been inserted into protein-coding genes continuously throughout the evolution of eukaryotes. Beyond the formal dilemma, the more substantial side of this debate has to do with possible roles of introns in the evolution of eukaryotes.
I argue that several lines of evidence now suggest a coherent solution to the introns-early versus introns-late debate, and the emerging picture of intron evolution integrates aspects of both views although, formally, there seems to be no support for the original version of introns-early. Firstly, there is growing evidence that spliceosomal introns evolved from group II self-splicing introns which are present, usually, in small numbers, in many bacteria, and probably, moved into the evolving eukaryotic genome from the α-proteobacterial progenitor of the mitochondria. Secondly, the concept of a primordial pool of 'virus-like' genetic elements implies that self-splicing introns are among the most ancient genetic entities. Thirdly, reconstructions of the ancestral state of eukaryotic genes suggest that the last common ancestor of extant eukaryotes had an intron-rich genome. Thus, it appears that ancestors of spliceosomal introns, indeed, have existed since the earliest stages of life's evolution, in a formal agreement with the introns-early scenario. However, there is no evidence that these ancient introns ever became widespread before the emergence of eukaryotes, hence, the central tenet of introns-early, the role of introns in early evolution of proteins, has no support. However, the demonstration that numerous introns invaded eukaryotic genes at the outset of eukaryotic evolution and that subsequent intron gain has been limited in many eukaryotic lineages implicates introns as an ancestral feature of eukaryotic genomes and refutes radical versions of introns-late. Perhaps, most importantly, I argue that the intron invasion triggered other pivotal events of eukaryogenesis, including the emergence of the spliceosome, the nucleus, the linear chromosomes, the telomerase, and the ubiquitin signaling system. This concept of eukaryogenesis, in a sense, revives some tenets of the exon hypothesis, by assigning to introns crucial roles in eukaryotic evolutionary innovation.
The scenario of the origin and evolution of introns that is best compatible with the results of comparative genomics and theoretical considerations goes as follows: self-splicing introns since the earliest stages of life's evolution – numerous spliceosomal introns invading genes of the emerging eukaryote during eukaryogenesis – subsequent lineage-specific loss and gain of introns. The intron invasion, probably, spawned by the mitochondrial endosymbiont, might have critically contributed to the emergence of the principal features of the eukaryotic cell. This scenario combines aspects of the introns-early and introns-late views.
This article was reviewed by W. Ford Doolittle, James Darnell (nominated by W. Ford Doolittle), William Martin, and Anthony Poole.
PMCID: PMC1570339  PMID: 16907971
19.  Phylogeographic support for horizontal gene transfer involving sympatric bruchid species 
Biology Direct  2006;1:21.
We report on the probable horizontal transfer of a mitochondrial gene, cytb, between species of Neotropical bruchid beetles, in a zone where these species are sympatric.
The bruchid beetles Acanthoscelides obtectus, A. obvelatus, A. argillaceus and Zabrotes subfasciatus develop on various bean species in Mexico. Whereas A. obtectus and A. obvelatus develop on Phaseolus vulgaris in the Mexican Altiplano, A. argillaceus feeds on P. lunatus on the Pacific coast. The generalist Z. subfasciatus feeds on both bean species, and is sympatric with A. obtectus and A. obvelatus in the Mexican Altiplano, and with A. argillaceus on the Pacific coast. In order to assess the phylogenetic position of these four species, we amplified and sequenced one nuclear (28S rRNA) and two mitochondrial (cytb, COI) genes.
Whereas species were well segregated in topologies obtained for COI and 28S rRNA, an unexpected pattern was obtained in the cytb phylogenetic tree. In this tree, individuals from A. obtectus and A. obvelatus, as well as Z. subfasciatus individuals from the Mexican Altiplano, clustered together in a single monophyletic unit with little variation. In contrast, A. argillaceus and Z. subfasciatus individuals from the Pacific coast clustered in two separate clades, identical to the pattern obtained for COI and 28S rRNA. An additional analysis showed that Z. subfasciatus individuals from the Mexican Altiplano also possessed the cytb gene present in individuals of this species from the Pacific coast. Zabrotes subfasciatus individuals from the Mexican Altiplano thus carried two cytb genes, an "original" one and an "infectious" one, showing 25% nucleotide divergence. The "infectious" cytb gene seems to be under purifying selection and to be expressed in mitochondria.
The high degree of incongruence of the cytb tree with patterns for other genes is discussed in the light of three hypotheses: experimental contamination, hybridization, and pseudogenisation. However, none of these seem able to explain the patterns observed. A fourth hypothesis, involving recent horizontal gene transfer (HGT) between A. obtectus and A. obvelatus, and from one of these species to Z. subfasciatus in the Mexican Altiplano, seems the only plausible explanation. The HGT between our study species seems to have occurred recently, and only in a zone where the three beetles are sympatric and share common host plants. This suggests that transfer could have been effected by some external vector such as a eukaryotic or viral parasite, which might still host the transferred fragment.
This article was reviewed by Eric Bapteste, Adam Eyre-Walker and Alexey Kondrashov.
PMCID: PMC1562361  PMID: 16872524
20.  Transposable element derived DNaseI-hypersensitive sites in the human genome 
Biology Direct  2006;1:20.
Transposable elements (TEs) are abundant genomic sequences that have been found to contribute to genome evolution in unexpected ways. Here, we characterize the evolutionary and functional characteristics of TE-derived human genome regulatory sequences uncovered by the high throughput mapping of DNaseI-hypersensitive (HS) sites.
Human genome TEs were found to contribute substantially to HS regulatory sequences characterized in CD4+ T cells: 23% of HS sites contain TE-derived sequences. While HS sites are far more evolutionarily conserved than non HS sites in the human genome, consistent with their functional importance, TE-derived HS sites are highly divergent. Nevertheless, TE-derived HS sites were shown to be functionally relevant in terms of driving gene expression in CD4+ T cells. Genes involved in immune response are statistically over-represented among genes with TE-derived HS sites. A number of genes with both TE-derived HS sites and immune tissue related expression patterns were found to encode proteins involved in immune response such as T cell specific receptor antigens and secreted cytokines as well as proteins with clinical relevance to HIV and cancer. Genes with TE-derived HS sites have higher average levels of sequence and expression divergence between human and mouse orthologs compared to genes with non TE-derived HS sites.
The results reported here support the notion that TEs provide a specific genome-wide mechanism for generating functionally relevant gene regulatory divergence between evolutionary lineages.
This article was reviewed by Wolfgang J. Miller (nominated by Jerzy Jurka), Itai Yanai and Mikhail S. Gelfand.
PMCID: PMC1538576  PMID: 16857058
21.  Rooting the tree of life by transition analyses 
Biology Direct  2006;1:19.
Despite great advances in clarifying the family tree of life, it is still not agreed where its root is or what properties the most ancient cells possessed – the most difficult problems in phylogeny. Protein paralogue trees can theoretically place the root, but are contradictory because of tree-reconstruction artefacts or poor resolution; ribosome-related and DNA-handling enzymes suggested one between neomura (eukaryotes plus archaebacteria) and eubacteria, whereas metabolic enzymes often place it within eubacteria but in contradictory places. Palaeontology shows that eubacteria are much more ancient than eukaryotes, and, together with phylogenetic evidence that archaebacteria are sisters not ancestral to eukaryotes, implies that the root is not within the neomura. Transition analysis, involving comparative/developmental and selective arguments, can polarize major transitions and thereby systematically exclude the root from major clades possessing derived characters and thus locate it; previously the 20 shared neomuran characters were thus argued to be derived, but whether the root was within eubacteria or between them and archaebacteria remained controversial.
I analyze 13 major transitions within eubacteria, showing how they can all be congruently polarized. I infer the first fully resolved prokaryote tree, with a basal stem comprising the new infrakingdom Glidobacteria (Chlorobacteria, Hadobacteria, Cyanobacteria), which is entirely non-flagellate and probably ancestrally had gliding motility, and two derived branches (Gracilicutes and Unibacteria/Eurybacteria) that diverged immediately following the origin of flagella. Proteasome evolution shows that the universal root is outside a clade comprising neomura and Actinomycetales (proteates), and thus lies within other eubacteria, contrary to a widespread assumption that it is between eubacteria and neomura. Cell wall and flagellar evolution independently locate the root outside Posibacteria (Actinobacteria and Endobacteria), and thus among negibacteria with two membranes. Posibacteria are derived from Eurybacteria and ancestral to neomura. RNA polymerase and other insertions strongly favour the monophyly of Gracilicutes (Proteobacteria, Planctobacteria, Sphingobacteria, Spirochaetes). Evolution of the negibacterial outer membrane places the root within Eobacteria (Hadobacteria and Chlorobacteria, both primitively without lipopolysaccharide): as all phyla possessing the outer membrane β-barrel protein Omp85 are highly probably derived, the root lies between them and Chlorobacteria, the only negibacteria without Omp85, or possibly within Chlorobacteria.
Chlorobacteria are probably the oldest and Archaebacteria the youngest bacteria, with Posibacteria of intermediate age, requiring radical reassessment of dominant views of bacterial evolution. The last ancestor of all life was a eubacterium with acyl-ester membrane lipids, large genome, murein peptidoglycan walls, and fully developed eubacterial molecular biology and cell division. It was a non-flagellate negibacterium with two membranes, probably a photosynthetic green non-sulphur bacterium with relatively primitive secretory machinery, not a heterotrophic posibacterium with one membrane.
This article was reviewed by John Logsdon, Purificación López-García and Eric Bapteste (nominated by Simonetta Gribaldo).
PMCID: PMC1586193  PMID: 16834776
22.  Variation in fiberoptic bead-based oligonucleotide microarrays: dispersion characteristics among hybridization and biological replicate samples 
Biology Direct  2006;1:18.
Gene expression microarray technology continues to evolve and its use has expanded into all areas of biology. However, the high dimensionality of the data makes analysis a difficult challenge. Evaluating measurements and estimating the significance of the observed differences among samples remain important issues that must be addressed for each technology platform. In this work we use a consecutive sampling method to characterize the dispersion patterns of data generated from Illumina fiberoptic bead-based oligonucleotide arrays.
To describe general properties of the dispersion we used a linear function SD = a + b·Ymean, approximating the standard deviation across arrays (Ymean is the mean expression of a given consecutive sample). First we examined three levels of variability: 1) same cell culture, same reverse transcription, duplicate hybridizations; 2) same cell culture, reverse transcription replicates; 3) parallel cultures. Each higher level is expected to introduce a new source of variability. We observed minor differences in the constant term a: the mean values are 3.5, 3.1 and 3.5, respectively. However, the mean coefficient b increased from 0.045 to 0.147 and 0.133. We compared the coefficients derived from the consecutive sampling to those obtained from the standard deviation of individual gene expressions and found them in good agreement. In the second experiment we detected 11 genes with systematically different expression between samples treated with glucose oxidase and controls, and corroborated the selection using the Mann-Whitney and other tests. We also compared the consecutive sampling and coincidence method to the t-test: the average percentage of consistency was above 80 for the former and below 50 for the latter.
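The linear dispersion function described above can be illustrated with a small sketch. This is not the authors' implementation: the window size, the use of two replicate arrays, the division by sqrt(2), and the helper names consecutive_samples, fit_line and simulate are all assumptions made for illustration, and the data are simulated rather than taken from Illumina arrays.

```python
import random
import statistics


def consecutive_samples(rep1, rep2, window=50):
    """Order genes by mean expression and cut them into consecutive windows.

    For each window, return (ymean, sd): the mean expression of the window
    and a standard deviation estimated from the replicate differences.
    Dividing by sqrt(2) converts the SD of a difference of two replicates
    into a per-array SD (an assumption of this sketch).
    """
    pairs = sorted(zip(rep1, rep2), key=lambda p: (p[0] + p[1]) / 2)
    points = []
    for start in range(0, len(pairs) - window + 1, window):
        chunk = pairs[start:start + window]
        ymean = statistics.fmean((x + y) / 2 for x, y in chunk)
        diffs = [x - y for x, y in chunk]
        sd = statistics.stdev(diffs) / 2 ** 0.5
        points.append((ymean, sd))
    return points


def fit_line(points):
    """Ordinary least-squares fit of sd = a + b * ymean; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def simulate(n_genes=5000, a=3.5, b=0.05, seed=0):
    """Simulate two replicate arrays whose noise follows SD = a + b * mean."""
    rng = random.Random(seed)
    rep1, rep2 = [], []
    for _ in range(n_genes):
        mu = rng.uniform(50, 2000)
        sd = a + b * mu
        rep1.append(rng.gauss(mu, sd))
        rep2.append(rng.gauss(mu, sd))
    return rep1, rep2


if __name__ == "__main__":
    rep1, rep2 = simulate()
    a_hat, b_hat = fit_line(consecutive_samples(rep1, rep2))
    print(f"a = {a_hat:.2f}, b = {b_hat:.3f}")
```

Estimating the SD from windows of neighbouring genes, rather than gene by gene, is what lets a single pair of replicate arrays yield a dispersion curve; the fitted slope b then plays the role of the coefficient that the abstract reports as rising with each added source of variability.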
Our results indicate that the consecutive sampling method and standard deviation function provide a convenient description of the overall dispersion of Illumina arrays. We observed that the constant term of the standard deviation function is on average approximately the same for duplicate hybridization as for the assays with additional sources of variability. Furthermore, among the genes affected by glucose oxidase treatment we identified 6 genes in oxidative stress pathways and 5 genes involved in DNA repair. Finally, we noted that the consecutive sampling and coincidence test provide, under given conditions, more consistent results than the t-test.
This article was reviewed by Alexander Karpikov (nominated by Mark Gerstein), Jordan King and Eugene V. Koonin.
PMCID: PMC1533816  PMID: 16787528
23.  Genome increase as a clock for the origin and evolution of life 
Biology Direct  2006;1:17.
The size of non-redundant functional genome can be an indicator of biological complexity of living organisms. Several positive feedback mechanisms including gene cooperation and duplication with subsequent specialization may result in the exponential growth of biological complexity in macro-evolution.
I propose a hypothesis that biological complexity increased exponentially during evolution. Regression of the logarithm of functional non-redundant genome size versus time of origin in major groups of organisms showed a 7.8-fold increase per 1 billion years, and hence the increase of complexity can be viewed as a clock of macro-evolution. A strong version of the exponential hypothesis is that the rate of complexity increase in early (pre-prokaryotic) evolution of life was at most the same (or even slower) than observed in the evolution of prokaryotes and eukaryotes.
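As a worked illustration of the exponential hypothesis, the reported rate of a 7.8-fold increase per billion years corresponds to a doubling time of ln 2 / ln 7.8, or about 0.34 billion years. The sketch below only restates that arithmetic; the function names are hypothetical, and no genome sizes are assumed beyond the rate quoted in the abstract.

```python
import math

FOLD_PER_GYR = 7.8  # rate quoted in the abstract (fold increase per billion years)


def doubling_time_gyr(fold_per_gyr=FOLD_PER_GYR):
    """Time, in billions of years, for functional genome size to double."""
    return math.log(2) / math.log(fold_per_gyr)


def fold_change(gyr, fold_per_gyr=FOLD_PER_GYR):
    """Total fold increase accumulated over `gyr` billion years."""
    return fold_per_gyr ** gyr


def time_to_reach(size_start, size_end, fold_per_gyr=FOLD_PER_GYR):
    """Billions of years needed to grow from size_start to size_end.

    Both sizes must be in the same units (for example, base pairs of
    functional non-redundant genome); only their ratio matters.
    """
    return math.log(size_end / size_start) / math.log(fold_per_gyr)


if __name__ == "__main__":
    print(f"doubling time: {doubling_time_gyr():.3f} Gyr")
    print(f"fold increase over 2 Gyr: {fold_change(2):.2f}")
```

Because the model is a straight line on a logarithmic scale, extrapolating backwards to a minimal genome size is just a call to time_to_reach with the chosen start and end sizes, which is how a regression of this kind can be read as a clock.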
The increase of functional non-redundant genome size in macro-evolution was consistent with the exponential hypothesis. If the strong exponential hypothesis is true, then the origin of life should be dated 10 billion years ago. Thus, the possibility of panspermia as a source of life on Earth should be discussed on an equal basis with alternative hypotheses of de novo life origin. Panspermia may be proven if bacteria similar to terrestrial ones are found on other planets or satellites in the solar system.
This article was reviewed by Eugene V. Koonin, Chris Adami and Arcady Mushegian.
PMCID: PMC1526419  PMID: 16768805
24.  Clinical applications of Genome Polymorphism Scans 
Biology Direct  2006;1:16.
Applications of Genome Polymorphism Scans range from the relatively simple, such as gender determination and confirmation of biological relationships, to the relatively complex, such as determination of autozygosity and propagation of genetic information throughout pedigrees. Unlike nearly all other clinical DNA tests, the Scan is a universal test: it covers all people and all genes. On balance, I argue that the Genome Polymorphism Scan is the most powerful, affordable clinical DNA test available today.
Reviewers: This article was reviewed by Scott Weiss (nominated by Neil Smalheiser), Roberta Pagon (nominated by Jerzy Jurka) and Val Sheffield (nominated by Neil Smalheiser).
PMCID: PMC1524726  PMID: 16756678
25.  Positive selection on the nonhomologous end-joining factor Cernunnos-XLF in the human lineage 
Biology Direct  2006;1:15.
Cernunnos-XLF is a nonhomologous end-joining factor that is mutated in patients with a rare immunodeficiency with microcephaly. Several other microcephaly-associated genes such as ASPM and microcephalin experienced recent adaptive evolution apparently linked to brain size expansion in humans. In this study we investigated whether Cernunnos-XLF experienced similar positive selection during human evolution.
We obtained or reconstructed full-length coding sequences of chimpanzee, rhesus macaque, canine, and bovine Cernunnos-XLF orthologs from sequence databases and sequence trace archives. Comparison of coding sequences revealed an excess of nonsynonymous substitutions consistent with positive selection on Cernunnos-XLF in the human lineage. The hotspots of adaptive evolution are concentrated around a specific structural domain, whose analogue in the structurally similar XRCC4 protein is involved in binding of another nonhomologous end-joining factor, DNA ligase IV.
Cernunnos-XLF is a microcephaly-associated locus newly identified to be under adaptive evolution in humans, and possibly played a role in human brain expansion. We speculate that Cernunnos-XLF may have contributed to the increased number of brain cells in humans by efficient double strand break repair, which helps to prevent frequent apoptosis of neuronal progenitors and aids mitotic cell cycle progression.
This article was reviewed by Chris Ponting and Richard Emes (nominated by Chris Ponting), Kateryna Makova, Gáspár Jékely and Eugene V. Koonin.
PMCID: PMC1552050  PMID: 16749933
