1.  Mitochondrial pathogenic mutations are population-specific 
Biology Direct  2010;5:68.
Background
Surveying deleterious variation in human populations is crucial for our understanding, diagnosis and potential treatment of human genetic pathologies. A number of recent genome-wide analyses have focused on the prevalence of segregating deleterious alleles in the nuclear genome. However, such studies have not been conducted for the mitochondrial genome.
Results
We present a systematic survey of polymorphisms in the human mitochondrial genome, including those predicted to be deleterious and those that correspond to known pathogenic mutations. Analyzing 4458 completely sequenced mitochondrial genomes, we characterize the genetic diversity of different types of single nucleotide polymorphisms (SNPs) in African (L haplotypes) and non-African (M and N haplotypes) populations. We find that the overall level of polymorphism is higher in the mitochondrial than in the nuclear genome, although the mitochondrial genome appears to be under stronger selection, as indicated by proportionally fewer nonsynonymous than synonymous substitutions. The African mitochondrial genomes show higher heterozygosity, a greater number of polymorphic sites and higher frequencies of synonymous, benign and damaging polymorphisms than non-African genomes. However, African genomes carry significantly fewer SNPs that have been previously characterized as pathogenic compared to non-African genomes.
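For haploid mitochondrial sequences, "heterozygosity" is usually reported as gene diversity per site. The sketch below assumes Nei's unbiased estimator, n/(n-1) * (1 - sum of squared allele frequencies), applied column by column to an alignment; the abstract does not state which estimator the authors used, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def gene_diversity(alleles):
    """Unbiased gene diversity at one site: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    if n < 2:
        return 0.0
    freqs = [c / n for c in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def diversity_summary(sequences):
    """Mean per-site gene diversity and number of polymorphic sites in an alignment."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    per_site = [gene_diversity([s[i] for s in sequences]) for i in range(length)]
    # A site with at least two distinct alleles has diversity > 0.
    polymorphic = sum(1 for h in per_site if h > 0)
    return sum(per_site) / length, polymorphic


# Toy alignment of four short haplotypes (not real mtDNA data).
mean_h, n_poly = diversity_summary(["ACGT", "ACGA", "ACTT", "ACGT"])
print(mean_h, n_poly)
```

Gaps, ambiguous bases and the synonymous/nonsynonymous split used in the study are not handled here; those would need a codon-aware annotation of each site.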
Conclusions
Finding SNPs classified as pathogenic to be the only category of polymorphisms that are more abundant in non-African genomes is best explained by a systematic ascertainment bias that favours the discovery of pathogenic polymorphisms segregating in non-African populations. This further suggests that, contrary to the common disease-common variant hypothesis, pathogenic mutations are largely population-specific and different SNPs may be associated with the same disease in different populations. Therefore, to obtain a comprehensive picture of the deleterious variability in the human population, as well as to improve the diagnostics of individuals carrying African mitochondrial haplotypes, it is necessary to survey different populations independently.
Reviewers
This article was reviewed by Dr Mikhail Gelfand, Dr Vasily Ramensky (nominated by Dr Eugene Koonin) and Dr David Rand (nominated by Dr Laurence Hurst).
doi:10.1186/1745-6150-5-68
PMCID: PMC3022564  PMID: 21194457
2.  Evolution of gene regulation of pluripotency - the case for wiki tracks at genome browsers 
Biology Direct  2010;5:67.
Background
Experimentally validated data on gene regulation are hard to obtain. In particular, information about transcription factor binding sites in regulatory regions is scattered throughout the literature. This impedes their systematic in-context analysis, e.g. the inference of their conservation in evolutionary history.
Results
We demonstrate the power of integrative bioinformatics by incorporating curated transcription factor binding site information into the UCSC genome browser, using wiki and custom tracks, which enable easy publication of annotation data. Data integration allows us to investigate the evolution of gene regulation of the pluripotency-associated genes Oct4, Sox2 and Nanog. For the first time, experimentally validated transcription factor binding sites in the regulatory regions of all three genes were assembled together based on manual curation of data from 39 publications. Using the UCSC genome browser, these data were then visualized in the context of multi-species conservation based on genomic alignment. We confirm previous hypotheses regarding the evolutionary age of specific regulatory patterns, establishing their "deep homology". We also confirm some other principles of Carroll's "Genetic theory of Morphological Evolution", such as "mosaic pleiotropy", exemplified by the dual role of Sox2 reflected in its regulatory region.
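Custom tracks for the UCSC browser are commonly uploaded as a "track" header line followed by BED rows. The sketch below builds such a text block from a list of binding sites; it covers only the custom-track route (not wiki tracks), and the track name, function name and coordinates are hypothetical placeholders, not the curated data from the study.

```python
def bed_custom_track(track_name, description, sites):
    """Return a minimal UCSC-style custom track: one track line plus BED4 rows.

    `sites` holds (chrom, start, end, name) tuples in BED coordinates:
    0-based start, end-exclusive.
    """
    header = f'track name="{track_name}" description="{description}" visibility=2'
    rows = [header]
    for chrom, start, end, name in sites:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {name}: {start}-{end}")
        rows.append(f"{chrom}\t{start}\t{end}\t{name}")
    return "\n".join(rows) + "\n"


# Placeholder coordinates only; these are not curated binding sites.
example = bed_custom_track(
    "TFBS_demo",
    "hypothetical curated TF binding sites",
    [("chr6", 100, 112, "site1"), ("chr6", 150, 160, "site2")],
)
print(example, end="")
```

Keeping the sites as plain tuples makes it easy to regenerate the track when new curated entries are added; visibility=2 requests the "full" display mode.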
Conclusions
We were able to elucidate some aspects of the evolution of gene regulation for three genes associated with pluripotency. Based on the expected return on investment for the community, we encourage other scientists to contribute experimental data on gene regulation (original work as well as data collected for reviews) to the UCSC system, to enable studies of the evolution of gene regulation on a large scale, and to report their findings.
Reviewers
This article was reviewed by Dr. Gustavo Glusman and Dr. Juan Caballero, Institute for Systems Biology, Seattle, USA (nominated by Dr. Doron Lancet, Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel), Dr. Niels Grabe, TIGA Center (BIOQUANT) and Medical Systems Biology Group, Institute of Medical Biometry and Informatics, University Hospital Heidelberg, Germany (nominated by Dr. Mikhail Gelfand, Department of Bioinformatics, Institute of Information Transfer Problems, Russian Academy of Science, Moscow, Russian Federation) and Dr. Franz-Josef Müller, Center for Regenerative Medicine, The Scripps Research Institute, La Jolla, CA, USA and University Hospital for Psychiatry and Psychotherapy (part of ZIP gGmbH), University of Kiel, Germany (nominated by Dr. Trey Ideker, University of California, San Diego, La Jolla CA, United States).
doi:10.1186/1745-6150-5-67
PMCID: PMC3024949  PMID: 21190561
3.  Bayesian classification of residues associated with protein functional divergence: Arf and Arf-like GTPases 
Biology Direct  2010;5:66.
Background
Certain residues within proteins are highly conserved across very distantly related organisms, yet their (presumably critical) structural or mechanistic roles are completely unknown. To obtain clues regarding such residues within Arf and Arf-like (Arf/Arl) GTPases--which function as on/off switches regulating vesicle trafficking, phospholipid metabolism and cytoskeletal remodeling--I apply a new sampling procedure for comparative sequence analysis, termed multiple category Bayesian Partitioning with Pattern Selection (mcBPPS).
Results
The mcBPPS sampler classified sequences within the entire P-loop GTPase class into multiple categories by identifying those evolutionarily-divergent residues most likely to be responsible for functional specialization. Here I focus on categories of residues that most distinguish various Arf/Arl GTPases from other GTPases. This identified residues whose specific roles have been previously proposed (and in some cases corroborated experimentally, so that they serve as positive controls), as well as several categories of co-conserved residues whose possible roles are first hinted at here. For example, Arf/Arl/Sar GTPases are most distinguished from other GTPases by a conserved aspartate residue within the phosphate binding loop (P-loop) and by co-conserved residues nearby that, together, can form a network of salt-bridge and hydrogen bond interactions centered on the GTPase active site. Residues corresponding to an N-[VI] motif that is conserved within Arf/Arl GTPases may play a role in the interswitch toggle characteristic of the Arf family, whereas other, co-conserved residues may modulate the flexibility of the guanine binding loop. Arl8 GTPases conserve residues that strikingly diverge from those typically found in other Arf/Arl GTPases and that form structural interactions suggestive of a novel interswitch toggle mechanism.
Conclusions
This analysis suggests specific mutagenesis experiments to explore mechanisms underlying GTP hydrolysis, nucleotide exchange and interswitch toggling within Arf/Arl GTPases. More generally, it illustrates how the mcBPPS sampler can complement traditional evolutionary analyses by providing an objective, quantitative and statistically rigorous way to explore protein functional divergence in molecular detail. Because the sampler classifies the input sequences at the same time, it can be used to generate subgroup profiles, in which functionally-divergent categories of residues are annotated automatically.
Reviewers
This article was reviewed by Frank Eisenhaber, L Aravind and Daniel Gaston (nominated by Eric Bapteste). For the full reviews, go to the Reviewers' comments section.
doi:10.1186/1745-6150-5-66
PMCID: PMC3012027  PMID: 21129209
4.  The scenario on the origin of translation in the RNA world: in principle of replication parsimony 
Biology Direct  2010;5:65.
Background
It is now believed that, in the origin of life, proteins must have been "invented" in an RNA world. However, given the complexity of a possible RNA-based proto-translation system, this evolutionary process seems quite complicated and the associated scenario remains very blurry. Considering that RNA can bind amino acids with specificity, it has been reasonably supposed that initial peptides might have been synthesized on "RNA templates" containing multiple amino acid binding sites. This "Direct RNA Template (DRT)" mechanism is attractive because it should be the simplest mechanism for RNA to synthesize peptides, and is thus very likely to have been adopted initially in the RNA world. How this mechanism could then develop into a proto-translation system is an interesting problem.
Presentation of the hypothesis
Here an explanation of this problem is proposed based on the principle of "replication parsimony": genetic information tends to be utilized in a parsimonious way under selection pressure, due to its replication cost (e.g., in the RNA world, nucleotides and ribozymes for RNA replication). Because a DRT would be quite long even for a short peptide, its replication cost would be high. Thus the diversity and the length of functional peptides synthesized by the DRT mechanism would be seriously limited. Adaptors (proto-tRNAs) would arise to allow a DRT's complementary strand (called "C-DRT" here) to direct the synthesis of the same peptide synthesized by the DRT itself. Because the C-DRT is a necessary part of the DRT's replication, fewer rounds of DRT replication would be needed to synthesize a given number of copies of the functional peptide, thus saving replication cost. Acting through adaptors, C-DRTs could transform into much shorter templates (called "proto-mRNAs" here) and take over the role of DRTs, thus significantly saving replication cost. A proto-rRNA corresponding to the small subunit rRNA would then emerge to aid the binding of proto-tRNAs and proto-mRNAs, allowing the reduction of base pairs between them (ultimately resulting in the triplet anticodon/codon pair), thus further saving replication cost. In this context, the replication cost saved would allow the appearance of more and longer functional peptides and, finally, proteins. The hypothesis could be called "DRT-RP" ("RP" for "replication parsimony").
Testing the hypothesis
The scenario described here is open to experimental work at several key stages, including the compact DRT mechanism, the development of adaptors from aa-aptamers, the synthesis of peptides by proto-tRNAs and proto-mRNAs without the participation of proto-rRNAs, etc. Interestingly, a recent computer simulation study has demonstrated the plausibility of one of the evolutionary processes driven by replication parsimony in the scenario.
Implication of the hypothesis
An RNA-based proto-translation system could arise gradually from the DRT mechanism according to the principle of "replication parsimony", that is, to save the replication cost of RNA templates for functional peptides. A surprising side deduction along the logic of the hypothesis is that complex, biosynthetic amino acids might have entered the genetic code earlier than simple, prebiotic amino acids, which runs contrary to common expectation. Overall, the present discussion offers a major clue toward clarifying the blurry scenario of the origin of translation, showing vividly how life could "manage" to exploit potential chemical resources in nature, eventually in an efficient way, over the course of evolution.
Reviewers
This article was reviewed by Eugene V. Koonin, Juergen Brosius, and Arcady Mushegian.
doi:10.1186/1745-6150-5-65
PMCID: PMC3002371  PMID: 21110883
5.  The common ancestry of life 
Biology Direct  2010;5:64.
Background
It is commonly believed that all cellular life forms on earth have a common origin. This view is supported by the universality of the genetic code and the universal conservation of multiple genes, particularly those that encode key components of the translation system. A remarkable recent study claims to provide a formal, homology independent test of the Universal Common Ancestry hypothesis by comparing the ability of a common-ancestry model and a multiple-ancestry model to predict sequences of universally conserved proteins.
Results
We devised a computational experiment on a concatenated alignment of universally conserved proteins which shows that the purported demonstration of the universal common ancestry is a trivial consequence of significant sequence similarity between the analyzed proteins. The nature and origin of this similarity are irrelevant for the prediction of "common ancestry" by the model-comparison approach. Thus, homology (common origin) of the compared proteins remains an inference from sequence similarity rather than an independent property demonstrated by the likelihood analysis.
Conclusion
A formal demonstration of the Universal Common Ancestry hypothesis has not been achieved and is unlikely to be feasible in principle. Nevertheless, the evidence in support of this hypothesis provided by comparative genomics is overwhelming.
Reviewers
This article was reviewed by William Martin, Ivan Iossifov (nominated by Andrey Rzhetsky) and Arcady Mushegian. For the complete reviews, see the Reviewers' Report section.
doi:10.1186/1745-6150-5-64
PMCID: PMC2993666  PMID: 21087490
6.  How many antiviral small interfering RNAs may be encoded by the mammalian genomes? 
Biology Direct  2010;5:62.
Background
The discovery of the RNA interference phenomenon (RNAi) and the understanding of its mechanisms have revolutionized our views on many molecular processes in the living cell. Among other things, RNAi is involved in silencing of transposable elements and in inhibition of virus infection in various eukaryotic organisms. Recent experimental studies demonstrate a few cases of viral replication suppression via complementary interactions between mammalian small RNAs and viral transcripts.
Presentation of the hypothesis
It has been found that >50% of the human genome is transcribed in different cell types and that these transcripts are mainly not associated with known protein-coding genes, but represent non-coding RNAs of unknown function. We propose the hypothesis that mammalian DNAs encode thousands of RNA motifs that may serve for antiviral protection. We also presume that the evolutionary success of some groups of genomic repeats and, in particular, of transposable elements (TEs) may be due to their ability to provide antiviral RNA motifs to the host organism. Intense propagation of genomic repeats would inevitably cause bidirectional transcription of these sequences, and the resulting double-stranded RNAs may be recognized and processed by the RNA interference enzymatic machinery. Provided that these processed motifs are complementary to viral transcripts, fixation of the repeats in the host genome may be of considerable benefit to the host. This fits with our bioinformatic data revealing thousands of 21-28 bp long motifs identical between human DNA and human-pathogenic adenoviral and herpesviral genomes. Many of these motifs are transcribed in human cells, and the transcribed fraction grows in proportion to their length. Many such motifs are included in human TEs. For example, one 23-nt motif that is part of the abundant human Alu retrotransposon shares sequence identity with eight human adenoviral genomes.
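Finding motifs "identical between human DNA and viral genomes" amounts to intersecting sets of fixed-length substrings (k-mers). The sketch below assumes exact matching and, optionally, checks both strands of the viral sequence; whether the authors searched the reverse complement is not stated, and all function names are hypothetical. A genome-scale search would use an indexed or hashed structure rather than Python sets of every k-mer.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shared_motifs(host, virus, k, both_strands=True):
    """Exact k-mers present in the host sequence and in the viral sequence."""
    host_set = kmers(host, k)
    viral_set = kmers(virus, k)
    if both_strands:
        # Assumption: a match to either viral strand counts as shared.
        viral_set |= kmers(reverse_complement(virus), k)
    return host_set & viral_set


def shared_motifs_range(host, virus, k_min=21, k_max=28):
    """Shared motifs for each length in the 21-28 range mentioned in the text."""
    return {k: shared_motifs(host, virus, k) for k in range(k_min, k_max + 1)}


# Toy sequences with k=6 so the overlap is easy to inspect by eye.
print(sorted(shared_motifs("AAAACGTACGTTTT", "GGACGTACGGG", 6)))
```

Overlapping hits of different lengths are reported separately per k; merging them into maximal shared segments would be a natural next step.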
Testing the hypothesis
This hypothesis could be tested on various mammalian species and viruses infecting mammalian cells.
Implications of the hypothesis
This hypothesis proposes that mammalian organisms may use their own genomes as sources of thousands of putative interfering RNA motifs that can be recruited to repress intracellular pathogens like proliferating viruses.
Reviewers
This article was reviewed by Eugene V. Koonin, Valerian V. Dolja and Yuri V. Shpakovski.
doi:10.1186/1745-6150-5-62
PMCID: PMC2992506  PMID: 21059241
7.  Modeling compositional dynamics based on GC and purine contents of protein-coding sequences 
Biology Direct  2010;5:63.
Background
Understanding the compositional dynamics of genomes and their coding sequences is of great significance for gaining clues into molecular evolution, and the large number of publicly available genome sequences has allowed us to quantitatively predict deviations of empirical data from their theoretical counterparts. However, the quantification of theoretical compositional variations for a wide diversity of genomes remains a major challenge.
Results
To model the compositional dynamics of protein-coding sequences, we propose two simple models that take into account both mutation and selection effects, which act differently at the three codon positions, and use both GC and purine contents as compositional parameters. The two models concern the theoretical composition of nucleotides, codons, and amino acids, with no prerequisite of homologous sequences or their alignments. We evaluated the two models by quantifying theoretical compositions of a large collection of protein-coding sequences (including 46 of Archaea, 686 of Bacteria, and 826 of Eukarya), yielding consistent theoretical compositions across all the collected sequences.
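The models take GC and purine contents at the three codon positions as their parameters. As a minimal sketch of the input side only (not the authors' mutation-selection models), the observed GC and purine (A+G) fractions at positions 1, 2 and 3 of an in-frame coding sequence can be computed as follows; the function name is hypothetical.

```python
def codon_position_composition(cds):
    """GC and purine (A+G) fractions at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    result = {}
    for pos in range(3):
        bases = cds[pos::3]  # every third base, starting at this codon position
        n = len(bases)
        result[pos + 1] = {
            "GC": sum(b in "GC" for b in bases) / n,
            "purine": sum(b in "AG" for b in bases) / n,
        }
    return result


# Three codons: ATG GCC AAA
print(codon_position_composition("ATGGCCAAA"))
```

Deviations between values like these and the compositions predicted by a model are what the Conclusions interpret as compositional signatures.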
Conclusions
We show that the compositions of nucleotides, codons, and amino acids are largely determined by both GC and purine contents and suggest that deviations of the observed from the expected compositions may reflect compositional signatures that arise from a complex interplay between mutation and selection via DNA replication and repair mechanisms.
Reviewers
This article was reviewed by Zhaolei Zhang (nominated by Mark Gerstein), Guruprasad Ananda (nominated by Kateryna Makova), and Daniel Haft.
doi:10.1186/1745-6150-5-63
PMCID: PMC2989939  PMID: 21059261
8.  Proteomic changes associated with deletion of the Magnaporthe oryzae conidial morphology-regulating gene COM1 
Biology Direct  2010;5:61.
Background
The rice blast disease caused by Magnaporthe oryzae is a major constraint on world rice production. The conidia produced by this fungal pathogen are the main source of disease dissemination. The morphology of conidia may be a critical factor in the spore dispersal and virulence of M. oryzae in the field. Deletion of a conidial morphology regulating gene encoding putative transcriptional regulator COM1 in M. oryzae resulted in aberrant conidial shape, reduced conidiation and attenuated virulence.
Results
In this study, a two-dimensional gel electrophoresis/matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (2-DE/MALDI-TOF MS) based proteomics approach was employed to identify the cellular and molecular components regulated by the COM1 protein (COM1p) that might contribute to the aberrant phenotypes in M. oryzae. By comparing the conidial proteomes of the COM1 deletion mutant and its isogenic wild-type strain P131, we identified a diverse set of 31 proteins that exhibited statistically significant alterations in their abundance levels. Of these differentially regulated proteins, nine were elevated and twelve were reduced in abundance in the Δcom1 mutant. Three proteins were detected only in the Δcom1 conidial proteome, whereas seven proteins were apparently undetectable in it. The data obtained in the study suggest that COM1p plays a key role in transcriptional reprogramming of genes implicated in melanin biosynthesis, carbon and energy metabolism, structural organization of the cell, lipid metabolism, amino acid metabolism, etc. Semi-quantitative RT-PCR analysis revealed the down-regulation of genes encoding enzymes involved in melanin biosynthesis in the COM1 mutant.
Conclusions
Our results suggest that the COM1p may regulate the transcription of genes involved in various cellular processes indispensable for conidial development and appressorial penetration. These functions are likely to contribute to the effects of COM1p upon the aberrant phenotypes of M. oryzae.
Reviewers
This article was reviewed by George V. Shpakovski, Karthikeyan Sivaraman (nominated by M. Madan Babu) and Lakshminarayan M. Iyer.
doi:10.1186/1745-6150-5-61
PMCID: PMC2989938  PMID: 21040590
9.  Riboswitches as hormone receptors: hypothetical cytokinin-binding riboswitches in Arabidopsis thaliana 
Biology Direct  2010;5:60.
Background
Riboswitches are mRNA elements that change conformation when bound to small molecules. They are known to be key regulators of biosynthetic pathways in both prokaryotes and eukaryotes.
Presentation of the Hypothesis
The hypothesis presented here is that riboswitches function as receptors in hormone perception. We propose that riboswitches initiate or integrate signaling cascades upon binding to classic signaling molecules. The molecular interactions for ligand binding and gene expression control would be the same as for biosynthetic pathways, but the context and the cadre of ligands to consider is dramatically different. The hypothesis arose from the observation that a compound used to identify adenine binding RNA sequences is chemically similar to the classic plant hormone, or growth regulator, cytokinin. A general tenet of the hypothesis is that riboswitch-binding metabolites can be used to make predictions about chemically related signaling molecules. In fact, all cell permeable signaling compounds can be considered as potential riboswitch ligands. The hypothesis is plausible, as demonstrated by a cursory review of the transcriptome and genome of the model plant Arabidopsis thaliana for transcripts that i) contain an adenine aptamer motif, and ii) are also predicted to be cytokinin-regulated. Here, one gene, CRK10 (for Cysteine-rich Receptor-like Kinase 10, At4g23180), contains an adenine aptamer-related sequence and is down-regulated by cytokinin approximately three-fold in public gene expression data. To illustrate the hypothesis, implications of cytokinin-binding to the CRK10 mRNA are discussed.
Testing the hypothesis
At the broadest level, screening various cell-permeable signaling molecules against random RNA libraries and comparing hits to sequence and gene expression databases could determine how broadly the hypothesis applies. Specific cases, such as CRK10 presented here, will require experimental validation of direct ligand binding, altered RNA conformation, and effect on gene expression. Each case will be different depending on the signaling pathway and the physiology involved.
Implications of the hypothesis
This would be a very direct signal perception mechanism for regulating gene expression, rivaling animal steroid hormone receptors, which are frequently ligand-dependent transcription initiation factors. Riboswitch-regulated responses could occur by modulating target RNA stability, translatability and alternative splicing, all known expression platforms used in riboswitches. The specific illustration presented, CRK10, implies a new mechanism for the perception of cytokinin, a classic plant hormone. Experimental support for the hypothesis would add breadth to the growing list of important functions attributed to riboswitches.
Reviewers
This article was reviewed by Anthony Poole, Rob Knight and Mikhail Gelfand.
doi:10.1186/1745-6150-5-60
PMCID: PMC2974657  PMID: 20961447
10.  The calculation of information and organismal complexity 
Biology Direct  2010;5:59.
Background
It is difficult to measure precisely the phenotypic complexity of living organisms. Here we propose a method to calculate the minimal amount of genomic information needed to construct an organism (effective information) as a measure of organismal complexity, using permutation and combination formulas and Shannon's information concept.
Results
The results demonstrate that the calculated information correlates quite well with the intuitive organismal phenotypic complexity defined by traditional taxonomy and evolutionary theory. From viruses to human beings, the effective information gradually increases, from thousands of bits to hundreds of millions of bits. The simpler the organism, the less the information; the more complex the organism, the more the information. About 13% of the human genome is estimated to be effective information, or functional sequence.
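A simplified piece of Shannon bookkeeping helps relate bits to genome fractions: choosing one sequence out of 4^L equally likely sequences of length L takes log2(4^L) = 2L bits. The sketch below uses only this 2-bits-per-nucleotide relation; it is not the authors' permutation-and-combination formula, and the genome size used in the example (about 3.1e9 bp) is an approximate assumption rather than a figure from the abstract.

```python
import math


def sequence_information_bits(length_bp, alphabet_size=4):
    """Bits needed to pick one sequence out of alphabet_size**length_bp equally likely ones.

    log2(alphabet_size**length_bp) == length_bp * log2(alphabet_size); the second
    form avoids building a huge integer.
    """
    if length_bp < 0:
        raise ValueError("length must be non-negative")
    return length_bp * math.log2(alphabet_size)


def functional_fraction(effective_bits, genome_bp, alphabet_size=4):
    """Fraction of a genome's maximal information capacity accounted for by effective_bits."""
    return effective_bits / sequence_information_bits(genome_bp, alphabet_size)


# Toy numbers: an assumed genome of ~3.1e9 bp with 13% treated as functional.
genome_bp = 3.1e9
bits = sequence_information_bits(0.13 * genome_bp)
print(f"{bits:.2e} bits")
```

Under this simplification, 13% of an assumed 3.1e9 bp genome corresponds to roughly 8 × 10^8 bits, which is on the "hundreds of millions of bits" scale quoted above.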
Conclusions
The effective information can be used as a quantitative measure of the phenotypic complexity of living organisms and also as an estimate of the functional fraction of the genome.
Reviewers
This article was reviewed by Dr. Lavanya Kannan (nominated by Dr. Arcady Mushegian), Dr. Chao Chen, and Dr. ED Rietman (nominated by Dr. Marc Vidal).
doi:10.1186/1745-6150-5-59
PMCID: PMC2973933  PMID: 20937149
11.  Application of computational approaches to study signalling networks of nuclear and Tyrosine kinase receptors 
Biology Direct  2010;5:58.
Background
Nuclear receptors (NRs) and receptor tyrosine kinases (RTKs) are essential proteins in many cellular processes, and sequence variations in their genes have been reported to be involved in many diseases, including cancer. Although crosstalk between RTK and NR signalling and its contribution to the development of endocrine-regulated cancers have been areas of intense investigation, the direct coupling of their signalling pathways remains elusive. To understand the role and function of nuclear receptors at the cell membrane, the interactions between nuclear receptors and tyrosine kinase receptors deserve further attention.
Results
We constructed a human signalling network containing nuclear receptors and tyrosine kinase receptors that identified a network topology involving eleven highly connected hubs.
We further developed an integrated knowledge database, named NR-RTK database, dedicated to human RTKs and NRs, their vertebrate orthologs and their interactions. These interactions were inferred using computational tools, and those supported by literature evidence are indicated. NR-RTK database contains links to other relevant resources and includes data on receptor ligands. It aims to provide a comprehensive interaction map that identifies the complex dynamics and potential crosstalk involved.
Availability: NR-RTK database is accessible at http://www.bioinfo-cbs.org/NR-RTK/
Conclusions
We infer that the NR-RTK interaction network has a scale-free topology. We also uncovered the key receptors mediating signal transduction between these two types of receptors. Furthermore, NR-RTK database is expected to be useful for researchers working on various aspects of the molecular basis of signal transduction by RTKs and NRs.
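Identifying "highly connected hubs" and checking for a scale-free topology both start from node degrees. The sketch below computes degrees, the top-n hubs and the empirical degree distribution P(k) for an undirected edge list; the node names are hypothetical placeholders, not receptors from the NR-RTK database. Establishing scale-freeness would additionally require fitting a power law to P(k) on a real network, which is not shown.

```python
from collections import Counter


def degree_table(edges):
    """Node degrees for an undirected graph, ignoring self-loops and duplicate edges."""
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    return degree


def top_hubs(edges, n):
    """The n nodes with the highest degree (ties broken arbitrarily)."""
    return [node for node, _ in degree_table(edges).most_common(n)]


def degree_distribution(edges):
    """Empirical P(k): fraction of nodes having degree k."""
    degree = degree_table(edges)
    counts = Counter(degree.values())
    total = len(degree)
    return {k: c / total for k, c in sorted(counts.items())}


# Toy network; the last edge duplicates the first and is ignored.
toy_edges = [("NR1", "RTK1"), ("NR1", "RTK2"), ("NR1", "RTK3"),
             ("NR2", "RTK1"), ("RTK1", "RTK2"), ("RTK1", "NR1")]
print(top_hubs(toy_edges, 2))
print(degree_distribution(toy_edges))
```

Storing edges as frozensets makes (a, b) and (b, a) the same interaction, which matches treating the signalling network as undirected.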
Reviewers
This article was reviewed by Professor Paul Harrison (nominated by Dr. Mark Gerstein), Dr. Arcady Mushegian and Dr. Anthony Almudevar.
doi:10.1186/1745-6150-5-58
PMCID: PMC2964540  PMID: 20937105
12.  The origin of Eastern European Jews revealed by autosomal, sex chromosomal and mtDNA polymorphisms 
Biology Direct  2010;5:57.
Background
This study aims to establish the likely origin of EEJ (Eastern European Jews) by genetic distance analysis of autosomal markers and haplogroups on the X and Y chromosomes and mtDNA.
Results
According to the autosomal polymorphisms, the investigated Jewish populations do not share a common origin, and EEJ are closer to Italians in particular and to Europeans in general than to the other Jewish populations. The similarity of EEJ to Italians and Europeans is also supported by the X chromosomal haplogroups. In contrast, according to the Y-chromosomal haplogroups, EEJ are closest to the non-Jewish populations of the Eastern Mediterranean. MtDNA shows a mixed pattern, but overall EEJ are more distant from most populations and hold a marginal rather than a central position. The autosomal genetic distance matrix has a very high correlation (0.789) with geography, whereas the X-chromosomal, Y-chromosomal and mtDNA matrices have lower correlations (0.540, 0.395 and 0.641, respectively).
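Correlations between a genetic distance matrix and a geographic distance matrix are typically computed over the pairwise (upper-triangle) entries, with significance assessed by a Mantel permutation test. The abstract does not say which procedure produced the values above, so the sketch below is a generic Mantel-style illustration; the function names and the toy matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Entries above the diagonal of a square symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def matrix_correlation(d1, d2):
    return pearson(upper_triangle(d1), upper_triangle(d2))


def mantel_test(d1, d2, permutations=999, seed=0):
    """Observed matrix correlation and a one-sided permutation p-value.

    Rows and columns of d2 are permuted together, so each permuted matrix
    is still a valid distance matrix over the same populations.
    """
    rng = random.Random(seed)
    observed = matrix_correlation(d1, d2)
    n = len(d2)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        shuffled = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if matrix_correlation(d1, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


genetic = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
geographic = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(mantel_test(genetic, geographic, permutations=99))
```

The permutation test is used because the pairwise entries of a distance matrix are not independent, so the usual p-value for a Pearson coefficient would be misleading.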
Conclusions
The close genetic resemblance to Italians accords with the historical presumption that Ashkenazi Jews started their migrations across Europe in Italy and with historical evidence that conversion to Judaism was common in ancient Rome. The reasons for the discrepancy between the biparental markers and the uniparental markers are discussed.
Reviewers
This article was reviewed by Damian Labuda (nominated by Jerzy Jurka), Kateryna Makova and Qasim Ayub (nominated by Dan Graur).
doi:10.1186/1745-6150-5-57
PMCID: PMC2964539  PMID: 20925954
13.  Relating underrepresented genomic DNA patterns and tiRNAs: the rule behind the observation and beyond 
Biology Direct  2010;5:56.
Background
One of the central problems of post-genomic biology is understanding the regulatory networks of genes. Traditionally the problem has been approached from the protein-DNA interaction perspective. In recent years, various types of noncoding RNAs have appeared on the scene as potent new players. The exact role of these molecules in gene expression control is mostly unknown at present, although their importance is generally recognized.
Results
The human and mouse genomes have been screened with a statistical model for sequence patterns underrepresented in these genomes, and a subset of motifs, named spanions, has been identified. The motif lists of the two species overlap by 75%, indicating evolutionary conservation of this feature. These motifs are arranged in clusters in close proximity to distinct genetic landmarks: 5' ends of genes, the exon side of exon/intron junctions and 5' ends of 3' UTRs. The clusters are typically 20 to 25 bases long. The findings agree with the known C/G bias of promoter regions, while capturing much more sequence information than a simple composition-based model.
In the human genome, the recently reported transcription initiation RNAs (tiRNAs) are typically transcribed from these spanion clusters, according to the presented results. The spanion clusters account for 70% of the published tiRNAs. Apparently, the model captures a common statistical feature of this new and mostly uncharacterized class of non-coding RNA and, in this way, provides a theoretical background for the experimental observations.
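Underrepresentation is judged against an expectation. The sketch below uses the simplest baseline, an independent-base (zero-order composition) model, to compute observed/expected ratios for every k-mer and list those falling below a threshold. The abstract explicitly says the authors' model captures more than composition, so this is the simple baseline it is compared against, not the spanion model itself; the function names and the threshold are hypothetical.

```python
from collections import Counter
from itertools import product


def observed_expected(seq, k, alphabet="ACGT"):
    """Observed/expected count ratio for every k-mer under an independent-base model."""
    seq = seq.upper()
    n = len(seq)
    windows = n - k + 1
    if windows <= 0:
        raise ValueError("sequence shorter than k")
    base_freq = {b: seq.count(b) / n for b in alphabet}
    observed = Counter(seq[i:i + k] for i in range(windows))
    ratios = {}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        p = 1.0
        for b in kmer:
            p *= base_freq[b]
        if p == 0.0:
            continue  # expected count is zero, so the ratio is undefined
        ratios[kmer] = observed.get(kmer, 0) / (windows * p)
    return ratios


def underrepresented(seq, k, threshold=0.5):
    """k-mers whose observed/expected ratio is below threshold, most depleted first."""
    ratios = observed_expected(seq, k)
    return sorted((km for km, r in ratios.items() if r < threshold),
                  key=lambda km: (ratios[km], km))


print(underrepresented("ATATATAT", 2))
```

Enumerating all 4^k k-mers includes those never observed, which is essential here because the most strongly underrepresented patterns may have a count of zero.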
Conclusions
The presented results seem to support the emerging model of RNA-driven eukaryotic gene expression control. Beyond that, the model detects spanion clusters at genetic positions where no tiRNA counterpart has been considered or reported. The GO-term analysis of genes with a high concentration of spanion clusters in their promoter-proximal region indicates involvement in gene regulatory processes. The results of the analysis suggest that the gene regulatory potential of small non-coding RNAs is grossly underestimated at present.
Reviewers
This article was reviewed by Frank Eisenhaber, Sandor Pongor and Rotem Sorek (nominated by Doron Lancet).
doi:10.1186/1745-6150-5-56
PMCID: PMC3583238  PMID: 20860791
14.  The ancient function of RB-E2F Pathway: insights from its evolutionary history 
Biology Direct  2010;5:55.
Background
The RB-E2F pathway is conserved in most eukaryotic lineages, including animals and plants. E2F and RB family proteins perform crucial functions in cell cycle control, differentiation, development and apoptosis. However, there are two kinds of E2Fs (repressive E2Fs and active E2Fs) and three RB family members in humans. To date, the detailed evolutionary history of these protein families and how the RB-E2F pathway evolved in different organisms remain poorly explored.
Results
We performed a comprehensive evolutionary analysis of the E2F, RB and DP (dimerization partners of E2Fs) protein families in representative eukaryotic organisms. Several interesting facts were revealed. First, orthologues of the RB, E2F and DP families are present in several representative unicellular organisms and in all multicellular organisms we checked. Second, the ancestral E2F and RB genes duplicated before placozoans and bilaterians diverged. As a result, the E2F family was divided into the E2F4/5 subgroup (including the repressive E2Fs E2F4 and E2F5) and the E2F1/2/3 subgroup (including the active E2Fs E2F1, E2F2 and E2F3). The RB family was divided into the RB1 subgroup (including RB1) and the RBL subgroup (including RBL1 and RBL2). Third, E2F4 and E2F5 share more sequence similarity with the predicted ancestral E2F sequence than E2F1, E2F2 and E2F3 do. E2F4 and E2F5 also have lower evolutionary rates and are under stronger purifying selection than E2F1, E2F2 and E2F3. Fourth, within the RB family, the RBL subgroup proteins have lower evolutionary rates and are under stronger purifying selection than RB1 subgroup proteins in vertebrates.
Conclusions
Protein evolutionary rates and purifying selection pressures are usually linked to protein functions. We speculate that the functions carried out by the E2F4/5 subgroup and RBL subgroup proteins might mainly represent the ancient function of the RB-E2F pathway. The E2F1/2/3 subgroup proteins and RB1 might contribute more to functional diversification of the pathway. Our results will enhance the current understanding of the RB-E2F pathway and should also be useful for further functional studies in humans and other model organisms.
Reviewers
This article was reviewed by Dr. Pierre Pontarotti, Dr. Arcady Mushegian and Dr. Zhenguo Lin (nominated by Dr. Neil Smalheiser).
doi:10.1186/1745-6150-5-55
PMCID: PMC3224931  PMID: 20849664
15.  Asymmetric and non-uniform evolution of recently duplicated human genes 
Biology Direct  2010;5:54.
Background
Gene duplications are a source of new genes and protein functions. The innovative role of duplication events makes families of paralogous genes an interesting target for studies in evolutionary biology. Here we study global trends in the evolution of human genes that resulted from recent duplications.
Results
Negative selection is weaker for a short period immediately after a duplication event. Roughly one fifth of genes in paralogous gene families evolve asymmetrically: one of the proteins encoded by the two closest paralogs accumulates amino acid substitutions significantly faster than its partner. This asymmetry cannot be explained by differences in gene expression levels. Compared with genes in non-asymmetrically evolving pairs, asymmetric gene pairs show an increased number of deleterious mutations in one copy and a decreased number in the other. The asymmetry in the rate of synonymous substitutions is much weaker and not significant.
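The abstract does not name the test used to call a paralog pair asymmetric. One standard way to ask whether one of two paralogs has accumulated amino acid substitutions faster than the other is Tajima's relative rate test. This test uses an outgroup sequence to count substitutions unique to each paralog. The sketch below assumes three pre-aligned, gap-free protein sequences of equal length. It illustrates that generic test and is not necessarily the procedure used in the study.

```python
import math


def tajima_relative_rate(paralog_a: str, paralog_b: str, outgroup: str):
    """Tajima's 1D relative rate test on three aligned sequences.

    Returns (m_a, m_b, chi2, p_value). m_a counts sites where only paralog_a
    differs (paralog_b matches the outgroup); m_b counts the converse.
    """
    if not (len(paralog_a) == len(paralog_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m_a = m_b = 0
    for a, b, o in zip(paralog_a, paralog_b, outgroup):
        if b == o and a != o:
            m_a += 1  # substitution on the branch leading to paralog_a
        elif a == o and b != o:
            m_b += 1  # substitution on the branch leading to paralog_b
    if m_a + m_b == 0:
        return m_a, m_b, 0.0, 1.0  # no informative sites: no evidence of asymmetry
    chi2 = (m_a - m_b) ** 2 / (m_a + m_b)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m_a, m_b, chi2, p_value


m_a, m_b, chi2, p = tajima_relative_rate("ACDEFGHIKM", "WWWWWWHIKL", "ACDEFGHIKL")
print(m_a, m_b, round(chi2, 3), round(p, 3))
```

The test ignores sites where both paralogs differ from the outgroup. When many gene pairs are screened, a multiple-testing correction would also be needed before calling a pair asymmetric.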
Conclusions
The increase of negative selection pressure over time after a duplication event seems to be a major trend in the evolution of human paralogous gene families. The observed asymmetry in the evolution of paralogous genes shows that in many cases one of two gene copies remains practically unchanged, while the other accumulates functional mutations. This supports the hypothesis that slowly evolving gene copies preserve their original functions, while fast evolving copies obtain new specificities or functions.
Reviewers
This article was reviewed by Dr. Igor Rogozin (nominated by Dr. Arcady Mushegian), Dr. Fyodor Kondrashov, and Dr. Sergei Maslov.
doi:10.1186/1745-6150-5-54
PMCID: PMC2942815  PMID: 20825637
16.  Uniting sex and eukaryote origins in an emerging oxygenic world 
Biology Direct  2010;5:53.
Background
Theories about eukaryote origins (eukaryogenesis) need to provide unified explanations for the emergence of diverse complex features that define this lineage. Models that propose a prokaryote-to-eukaryote transition are gridlocked between the opposing "phagocytosis first" and "mitochondria as seed" paradigms, neither of which fully explains the origins of eukaryote cell complexity. Sex (outcrossing with meiosis) is an example of an elaborate trait not yet satisfactorily addressed in theories about eukaryogenesis. The ancestral nature of meiosis and its dependence on eukaryote cell biology suggest that the emergence of sex and eukaryogenesis were simultaneous and synergic and may be explained by a common selective pressure.
Presentation of the hypothesis
We propose that a local rise in oxygen levels, due to cyanobacterial photosynthesis in ancient Archean microenvironments, was highly toxic to the surrounding biota. This selective pressure drove the transformation of an archaeal (archaebacterial) lineage into the first eukaryotes. Crucially, oxygen might have acted in synergy with environmental stresses such as ultraviolet (UV) radiation and/or desiccation, resulting in the accumulation of reactive oxygen species (ROS). The emergence of eukaryote features such as the endomembrane system and the acquisition of the mitochondrion are posited as strategies to cope with, respectively, a metabolic crisis in the cell plasma membrane and the accumulation of ROS. Selective pressure for efficient repair of ROS/UV-damaged DNA drove the evolution of sex, which required cell-cell fusions, cytoskeleton-mediated chromosome movement, and the emergence of the nuclear envelope. Our model implies that the evolution of sex and eukaryogenesis were inseparable processes.
Testing the hypothesis
Several types of data can be used to test our hypothesis. These include paleontological predictions, simulation of ancient oxygenic microenvironments, and cell biological experiments with Archaea exposed to ROS and UV stresses. Studies of archaeal conjugation, prokaryotic DNA recombination, and the universality of nuclear-mediated meiotic activities might corroborate the hypothesis that sex and the nucleus evolved to support DNA repair.
Implications of the hypothesis
Oxygen tolerance emerges as an important principle for investigating eukaryogenesis. The evolution of eukaryotic complexity might be best understood as a synergic process among key evolutionary innovations, in which meiosis (sex) played a central role.
Reviewers
This manuscript was reviewed by Eugene V. Koonin, Anthony M. Poole, and Gáspár Jékely.
doi:10.1186/1745-6150-5-53
PMCID: PMC2933680  PMID: 20731852
17.  Encoding the states of interacting proteins to facilitate biological pathways reconstruction 
Biology Direct  2010;5:52.
Background
From a systems biology perspective, protein-protein interactions (PPI) are encoded in machine-readable formats to avoid issues encountered in their retrieval for the reconstruction of comprehensive interaction maps and biological pathways. However, the information stored in the electronic formats currently in use does not allow valid automatic reconstruction of biological pathways.
Results
We propose a logical model of PPI that takes into account the "state" of proteins before and after the interaction. This information is necessary for proper reconstruction of the pathway.
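The abstract does not give the model's exact encoding. Below is a minimal sketch of the idea. It assumes a protein "state" can be summarized by a name plus a set of modifications and bound partners, and it records each interaction as a transition from input states to output states. Two interactions can then be chained into a pathway only when an output state of one matches an input state of the next. All class, field and site names (for example `ProteinState` and `pSer10`) are hypothetical and are not part of any existing PPI exchange format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProteinState:
    """A protein plus the features that define its state (hypothetical encoding)."""
    name: str
    modifications: frozenset = field(default_factory=frozenset)
    bound_to: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Interaction:
    """An interaction recorded as a transition from input states to output states."""
    label: str
    inputs: frozenset
    outputs: frozenset


def can_follow(first: Interaction, second: Interaction) -> bool:
    """True if some protein state produced by `first` is consumed by `second`."""
    return bool(first.outputs & second.inputs)


def chain(interactions: list) -> list:
    """List (a, b) label pairs such that interaction b can directly follow a."""
    return [(a.label, b.label)
            for a in interactions for b in interactions
            if a is not b and can_follow(a, b)]


# Toy example: kinase K phosphorylates S, then phosphorylated S binds adaptor A.
K = ProteinState("K")
S = ProteinState("S")
S_p = ProteinState("S", modifications=frozenset({"pSer10"}))
A = ProteinState("A")
phosphorylation = Interaction("phosphorylation", frozenset({K, S}), frozenset({K, S_p}))
binding = Interaction(
    "binding",
    frozenset({S_p, A}),
    frozenset({ProteinState("S", frozenset({"pSer10"}), frozenset({"A"})),
               ProteinState("A", bound_to=frozenset({"S"}))}),
)

print(chain([phosphorylation, binding]))  # [('phosphorylation', 'binding')]
```

Suppose the states were collapsed to bare protein names. The binding step would then also appear to feed back into the phosphorylation step, because both involve "S". Recording states before and after each interaction is meant to remove this kind of ambiguity.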
Conclusions
The adoption of the proposed model, which can easily be integrated into the existing machine-readable formats used to store PPI data, would facilitate the automated or semi-automated reconstruction of biological pathways.
Reviewers
This article was reviewed by Dr. Wen-Yu Chung (nominated by Kateryna Makova), Dr. Carl Herrmann (nominated by Dr. Purificación López-García) and Dr. Arcady Mushegian.
doi:10.1186/1745-6150-5-52
PMCID: PMC2930634  PMID: 20707925
18.  Measuring gene expression divergence: the distance to keep 
Biology Direct  2010;5:51.
Background
Gene expression divergence is a phenotypic trait that reflects the evolution of gene regulation and characterizes dissimilarity between species, and between cells and tissues within the same species. Several distance measures, such as Euclidean and correlation-based distances, have been proposed for measuring expression divergence.
Results
We show that different distance measures identify different trends in gene expression patterns. When comparing orthologous genes in eight rat and human tissues, the Euclidean distance identified genes expressed uniformly near background level in all tissues as those with the most conserved expression pattern. In contrast, correlation-based distance and generalized-average distance identified genes with concerted changes among homologous tissues as the most conserved. Nevertheless, all three measures (correlation-based, Euclidean and generalized-average distance) clearly highlight the relatively high similarity of gene expression patterns in homologous tissues between species, compared with non-homologous tissues within species.
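To make the contrast concrete, the sketch below computes two distances between expression profiles across tissues: the Euclidean distance and a Pearson-correlation-based distance (1 - r). The expression values are invented toy numbers, not data from the study. The generalized-average distance is omitted because the abstract does not define it.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles.

    Undefined (raises ValueError) when either profile is perfectly constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (sx * sy)


# Hypothetical (human, rat) expression levels in four homologous tissues.
ORTHOLOGS = {
    "flat_near_background": ([1.0, 1.1, 0.9, 1.0], [1.1, 1.0, 1.0, 0.9]),
    "tissue_specific": ([2.0, 8.0, 2.0, 2.0], [4.0, 16.0, 4.0, 4.0]),
}


def most_conserved(distance):
    """Name of the ortholog pair with the smallest divergence under `distance`."""
    return min(ORTHOLOGS, key=lambda gene: distance(*ORTHOLOGS[gene]))


print(most_conserved(euclidean_distance))    # flat_near_background
print(most_conserved(correlation_distance))  # tissue_specific
```

The flat, near-background gene pair has a small Euclidean distance but an uncorrelated tissue pattern. The tissue-specific pair has the same pattern at a twofold higher level in one species, so it is far apart in Euclidean terms but almost perfectly correlated. This mirrors the qualitative difference described above.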
Conclusions
Different trends exist in high-dimensional numeric data, and highlighting a particular trend requires choosing an appropriate distance measure. The choice of distance measure for expression divergence can therefore be guided by the expression patterns of interest in a particular study.
Reviewers
This article was reviewed by Mikhail Gelfand, Eugene Koonin and Subhajyoti De (nominated by Sarah Teichmann).
doi:10.1186/1745-6150-5-51
PMCID: PMC2928186  PMID: 20691088
19.  A kinetic model of TBP auto-regulation exhibits bistability 
Biology Direct  2010;5:50.
Background
TATA Binding Protein (TBP) is required for transcription initiation by all three eukaryotic RNA polymerases. It participates in transcriptional initiation at the majority of eukaryotic gene promoters, either by direct association to the TATA box upstream of the transcription start site or by indirectly localizing to the promoter through other proteins. TBP exists in solution in a dimeric form but binds to DNA as a monomer. Here, we present the first mathematical model for auto-catalytic TBP expression and use it to study the role of dimerization in maintaining the steady state TBP level.
Results
We show that the autogenous regulation of TBP results in a system that can exhibit three steady states: an unstable low-TBP state, a stable state corresponding to a physiological TBP concentration, and another stable steady state corresponding to unviable cells in which no TBP is expressed. Our model predicts that a basal level of TBP is required to establish transcription of the TBP gene, and hence for cell viability. It also predicts that, under conditions corresponding to a typical mammalian cell, the high-TBP state and cell viability are sensitive to variation in DNA binding strength. We use the model to explore the effect of the dimer in buffering the response to changes in TBP levels, and show that under some physiological conditions the dimer is not important in buffering against perturbations.
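The abstract does not give the model equations. A generic toy model of positive autoregulation reproduces the same qualitative picture of three steady states:

dx/dt = k·x²/(K² + x²) - d·x

Here x is the TBP level, k the maximal synthesis rate, K a dissociation constant (a larger K stands in for weaker DNA binding) and d a first-order loss rate. These symbols and values are assumptions for illustration only. The toy omits dimerization and is not the authors' model. Setting dx/dt = 0 gives x = 0 and the roots of d·x² - k·x + d·K² = 0. The stability of each steady state follows from the sign of the derivative of the rate at that point.

```python
import math


def rate(x, k, K, d):
    """dx/dt for the toy model: Hill-type (n = 2) self-activation minus linear loss."""
    return k * x ** 2 / (K ** 2 + x ** 2) - d * x


def rate_slope(x, k, K, d):
    """d(rate)/dx; a negative slope at a steady state means it is stable."""
    return 2 * k * x * K ** 2 / (K ** 2 + x ** 2) ** 2 - d


def steady_states(k, K, d):
    """Return [(x*, 'stable' or 'unstable'), ...] in increasing order of x*."""
    roots = [0.0]  # the no-TBP state always exists
    disc = k ** 2 - 4 * d ** 2 * K ** 2
    if disc > 0:  # two positive steady states exist only when k > 2*d*K
        s = math.sqrt(disc)
        roots += [(k - s) / (2 * d), (k + s) / (2 * d)]
    return [(x, "stable" if rate_slope(x, k, K, d) < 0 else "unstable") for x in roots]


print(steady_states(k=2.0, K=0.5, d=1.0))  # zero (stable), ~0.134 (unstable), ~1.866 (stable)
print(steady_states(k=2.0, K=1.5, d=1.0))  # [(0.0, 'stable')]
```

In this toy, a system that starts below the unstable threshold decays to zero, which loosely parallels the need for a basal TBP supply. Increasing K past k/(2d) removes both nonzero steady states and leaves only the x = 0 state. This qualitatively echoes the sensitivity to DNA binding strength described above.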
Conclusions
The finding that a minimum basal TBP level is necessary supports the in vivo observation that TBP is maternally inherited, which provides the small amount of TBP required to establish its ubiquitous expression. The model shows that the system is sensitive to parameter variation, indicating that it is vulnerable to mutations in TBP. A reduction in the TBP-DNA binding constant can drive the system into a regime where the unviable state is the only steady state. Contrary to current hypotheses, we show that under some physiological conditions the dimer is not very important for restoring the system to steady state. This model demonstrates the use of mathematical modelling to investigate system behaviour and to generate hypotheses about the dynamics of such nonlinear biological systems.
Reviewers
This article was reviewed by Tomasz Lipniacki, James Faeder and Anna Marciniak-Czochra.
doi:10.1186/1745-6150-5-50
PMCID: PMC2928763  PMID: 20687914
20.  Phylogenetic and regulatory region analysis of Wnt5 genes reveals conservation of a regulatory module with putative implication in pancreas development 
Biology Direct  2010;5:49.
Background
Wnt5 genes belong to the large Wnt family, which encodes proteins implicated in several tumorigenic and developmental processes. Phylogenetic analyses have shown that the Wnt5 gene was duplicated at the time gnathostomata diverged from agnatha. Interestingly, experimental data for some species indicate that only one of the two Wnt5 paralogs participates in the development of the endocrine pancreas. The purpose of this paper is to reexamine the phylogenetic history of the Wnt5 developmental regulators and to investigate the functional shift between paralogs through comparative genomics.
Results
In this study, the phylogeny of Wnt5 genes was investigated in species belonging to protostomia and deuterostomia. Furthermore, an in silico analysis of the regulatory regions of Wnt5 paralogs was conducted. This analysis was limited to species with insulin-producing cells and a pancreas, covering the evolutionary distance from agnatha to gnathostomata. Our results confirmed the Wnt5 gene duplication and additionally revealed that this duplication event also included the upstream region. Moreover, within this upstream region we detected a conserved module that is bound by a complex of transcription factors known to be implicated in embryonic pancreas formation.
Conclusions
The results and observations presented in this study allow us to conclude that the Wnt5 gene was duplicated in early vertebrates. They also indicate that some paralogs conserved a module within their regulatory region that is functionally related to embryonic development of the pancreas. Interestingly, our results suggest a possible explanation for why the Wnt5 orthologs do not share the same function during pancreas development. As a final remark, we suggest that in silico comparative analysis of regulatory regions, especially when combined with published experimental data, is a powerful approach for explaining shifts of roles among paralogs.
Reviewers
This article was reviewed by Sarath Janga (nominated by Sarah Teichmann), Ran Kafri (nominated by Yitzhak Pilpel), and Andrey Mironov (nominated by Mikhail Gelfand).
doi:10.1186/1745-6150-5-49
PMCID: PMC2922100  PMID: 20684756
21.  Predicted class-I aminoacyl tRNA synthetase-like proteins in non-ribosomal peptide synthesis 
Biology Direct  2010;5:48.
Background
Recent studies point to a great diversity of non-ribosomal peptide synthesis systems with major roles in amino acid and co-factor biosynthesis, secondary metabolism, and post-translational modifications of proteins by peptide tags. The least studied of these systems are those utilizing tRNAs or aminoacyl-tRNA synthetases (AAtRS) in non-ribosomal peptide ligation.
Results
Here we describe novel examples of AAtRS-related proteins that are likely to be involved in the synthesis of widely distributed peptide-derived metabolites. Using sensitive sequence profile methods, we show that the cyclodipeptide synthases (CDPSs) are members of the HUP class of Rossmannoid domains and are likely to be highly derived versions of the class-I AAtRS catalytic domains. We also identify the first eukaryotic CDPSs, in fungi and in animals; in animals they might be involved in the immune response. In addition, we identify a paralogous version of methionyl-tRNA synthetase that is widespread in bacteria. Using contextual information, we present evidence that it might function independently of protein synthesis, as a peptide ligase in the formation of a peptide-derived secondary metabolite. This metabolite is likely to be heavily modified through multiple reactions catalyzed by a metal-binding cupin domain and a lysine N6 monooxygenase that are strictly associated with this paralogous methionyl-tRNA synthetase (MtRS). We further identify an analogous system in which the MtRS has been replaced by more typical peptide ligases with ATP-grasp or modular condensation domains.
Conclusions
The prevalence of these predicted biosynthetic pathways in phylogenetically distant, pathogenic or symbiotic bacteria suggests that the metabolites they synthesize might participate in interactions with the host. More generally, these findings point to a complete spectrum of recruitment of AAtRS into various non-ribosomal biosynthetic pathways. This spectrum ranges from the conventional AAtRS, through closely related paralogous AAtRS dedicated to certain pathways, to highly derived versions of the class-I AAtRS catalytic domain such as the CDPSs. Both the conventional AAtRS and their closely related paralogs often provide aminoacylated tRNAs for peptide ligations by MprF/Fem/MurM-type acetyltransferase-fold ligases. These ligations occur in peptidoglycan synthesis, N-end rule modifications of proteins, lipid aminoacylation, and the biosynthesis of antibiotics such as valinamycin. Alternatively, they might supply aminoacylated tRNAs to other biosynthetic pathways, such as tetrapyrrole biosynthesis, or function directly as peptide ligases, as in the case of mycothiol and the systems identified here.
Reviewers
This article was reviewed by Andrei Osterman and Igor Zhulin.
doi:10.1186/1745-6150-5-48
PMCID: PMC2922099  PMID: 20678224
22.  Some considerations for analyzing biodiversity using integrative metagenomics and gene networks 
Biology Direct  2010;5:47.
Background
Improving knowledge of biodiversity will benefit conservation biology, enhance bioremediation studies, and could lead to new medical treatments. However, there is no standard approach to estimate and compare the diversity of different environments, or to study its past and, possibly, future evolution.
Presentation of the hypothesis
We argue that there are two conditions for significant progress in the identification and quantification of biodiversity. First, integrative metagenomic studies - aiming at the simultaneous examination (or even better at the integration) of observations about the elements, functions and evolutionary processes captured by the massive sequencing of multiple markers - should be preferred over DNA barcoding projects and over metagenomic projects based on a single marker. Second, such metagenomic data should be studied with novel inclusive network-based approaches, designed to draw inferences both on the many units and on the many processes present in the environments.
Testing the hypothesis
We reached these conclusions through a comparison of the theoretical foundations of two molecular approaches seeking to assess biodiversity: metagenomics (mostly used on prokaryotes and protists) and DNA barcoding (mostly used on multicellular eukaryotes), and by pragmatic considerations of the issues caused by the 'species problem' in biodiversity studies.
Implications of the hypothesis
Evolutionary gene networks reduce the risk of producing biodiversity estimates that have limited explanatory power, are biased by unequal rates of lateral gene transfer (LGT), or are difficult to interpret because of (practical) problems caused by type I and type II grey zones. Moreover, these networks would easily accommodate additional (meta)transcriptomic and (meta)proteomic data.
Reviewers
This article was reviewed by Pr. William Martin, Dr. David Williams (nominated by Pr. J Peter Gogarten) & Dr. James McInerney (nominated by Pr. John Logsdon).
doi:10.1186/1745-6150-5-47
PMCID: PMC2921367  PMID: 20673351
23.  The evolutionary rate variation among genes of HOG-signaling pathway in yeast genomes 
Biology Direct  2010;5:46.
Background
Responses to extracellular stress are required for microbes to survive in changing environments. Although stress response mechanisms have been characterized extensively, the evolution of stress response pathways remains poorly understood. Here, we studied the evolution of the High Osmolarity Glycerol (HOG) pathway, one of the important osmotic stress response pathways, across 10 yeast species, and examined the evolutionary forces acting on the pathway.
Results
Although the HOG pathway is well conserved across the surveyed yeast species, the evolutionary rates of the genes in this pathway vary substantially among and within lineages. The fast divergence of the MSB2 gene indicates that it is subject to positive selection. Moreover, transcription factors in the HOG pathway tend to evolve more rapidly, whereas the genes in the conserved MAPK cascade have undergone stronger functional selection. Remarkably, dN/dS values (the ratio of nonsynonymous to synonymous substitution rates) are negatively correlated with position along the HOG pathway. This pathway transmits external signals from Sln1 (Sho1) to Hog1 and into the nucleus. The resulting gradient of increasing selective constraint from upstream to downstream genes suggests that the downstream genes are more pleiotropic, being required for a wider range of pathways. In addition, protein length, codon usage, gene expression and protein interactions appear to be important factors determining the evolution of genes in the HOG pathway.
Conclusions
Taken together, our results suggest that functional constraints play a large role in the evolutionary rate variation in the HOG pathway. However, genetic variation is shaped by a complex set of factors, such as pathway position and protein length. These findings provide some insight into how HOG pathway genes evolved rapidly in response to changes in environmental osmotic stress.
Reviewers
This article was reviewed by Han Liang (nominated by Laura Landweber), Georgy Bazykin (nominated by Mikhail Gelfand) and Zhenguo Lin (nominated by John Logsdon).
doi:10.1186/1745-6150-5-46
PMCID: PMC2914728  PMID: 20618989
24.  Opportunities and challenges for digital morphology 
Biology Direct  2010;5:45.
Advances in digital data acquisition, analysis, and storage have revolutionized work in many biological disciplines, such as genomics, molecular phylogenetics, and structural biology, but have not yet found satisfactory acceptance in morphology. Improvements in non-invasive imaging and three-dimensional visualization techniques, however, now also permit high-throughput analyses of whole biological specimens, including museum material. These developments pave the way towards a digital era in morphology. Using sea urchins (Echinodermata: Echinoidea), we provide examples illustrating the power of these techniques. However, coping with the foreseeable exponential increase in digital morphological data will require three things: remote visualization, a specialized database, and standardized, worldwide accepted practices for depositing data prior to publication.
Reviewers
This article was reviewed by Marc D. Sutton (nominated by Stephan Beck), Gonzalo Giribet (nominated by Lutz Walter), and Lennart Olsson (nominated by Purificación López-García).
doi:10.1186/1745-6150-5-45
PMCID: PMC2908069  PMID: 20604956
25.  Save the tree of life or get lost in the woods 
Biology Direct  2010;5:44.
Background
The wealth of prokaryotic genomic data available has revealed that the histories of many genes are inconsistent, leading some to question the value of the tree of life hypothesis. It has been argued that a tree-like representation requires suppressing too much information, and that a more pluralistic approach is necessary for understanding prokaryotic evolution. We argue that trees may still be a useful representation for evolutionary histories in light of new data.
Results
Genomic data alone can be highly misleading when trying to resolve the tree of life. We present evidence from protein abundance data sets that genomic conservation greatly underestimates functional conservation. Function follows more of a tree-like structure than genetic material, even in the presence of horizontal transfer. We argue that the tree of cells must be incorporated into any new synthesis in order to place horizontal transfers into their proper selective context. We also discuss the role data sources other than primary sequence can play in resolving the tree of cells.
Conclusions
The tree of life is alive, but not well. Construction of the tree of cells has been viewed as the end goal of the study of evolution, whereas in reality it should be considered more of a starting point. We propose a duality in which variation of genetic material is considered in terms of networks, and selection of cellular function in terms of trees. Otherwise one gets lost in the woods of neutral evolution.
Reviewers
This article was reviewed by Dr. Eric Bapteste, Dr. Arcady Mushegian, and Dr. Celine Brochier.
doi:10.1186/1745-6150-5-44
PMCID: PMC2910001  PMID: 20594329
