Results 1-14 (14)

1.  The Evolution of the Ribosome and the Genetic Code 
Life : Open Access Journal  2014;4(2):227-249.
The evolution of the genetic code is mapped out starting with the aminoacyl tRNA-synthetases and their interaction with the operational code in the tRNA acceptor arm. Combining this operational code with a metric based on the biosynthesis of amino acids from the citric acid cycle, we come to the conclusion that the earliest genetic code was a Guanine Cytosine (GC) code. This has implications for the likely earliest positively charged amino acids. The progression from this pure GC code to the extant one is traced out in the evolution of the Large Ribosomal Subunit (LSU) and its proteins, in particular those associated with the Peptidyl Transfer Center (PTC) and the nascent peptide exit tunnel. This progression has implications for the earliest encoded peptides and their evolutionary progression into full complex proteins.
PMCID: PMC4187167  PMID: 25370196
Genetic Code; evolutions; ribosomal proteins
2.  SCOREM: statistical consolidation of redundant expression measures 
Nucleic Acids Research  2011;40(6):e46.
Many platforms for genome-wide analysis of gene expression contain ‘redundant’ measures for the same gene. For example, the most highly utilized platforms for gene expression microarrays, Affymetrix GeneChip® arrays, have as many as ten or more probe sets for some genes. Occasionally, individual probe sets for the same gene report different trends in expression across experimental conditions, a situation that must be resolved in order to accurately interpret the data. We developed an algorithm, SCOREM, for determining the level of agreement between such probe sets, utilizing a statistical test of concordance, Kendall's W coefficient of concordance, and a graph-searching algorithm for the identification of concordant probe sets. We also present methods for consolidating concordant groups into a single value for its corresponding gene and for post hoc analysis of discordant groups. By combining statistical consolidation with sequence analysis, SCOREM possesses the unique ability to identify biologically meaningful discordant behaviors, including differing behaviors in alternate RNA isoforms and tissue-specific patterns of expression. When consolidating concordant behaviors, SCOREM outperforms other methods in detecting both differential expression and overrepresented functional categories.
PMCID: PMC3315298  PMID: 22210887
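The concordance statistic named in the SCOREM abstract above, Kendall's W, can be illustrated with a minimal sketch. This is not the SCOREM implementation: the function names `rank` and `kendalls_w` and the toy expression values are assumptions made for illustration, the formula omits the correction for tied ranks, and the graph-searching step described in the abstract is not shown.

```python
def rank(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kendalls_w(probe_sets):
    """Kendall's W for m probe sets measured over the same n conditions.

    probe_sets: list of m equal-length lists of expression values.
    Returns a value between 0 (no agreement) and 1 (identical rankings).
    No tie correction is applied (a simplifying assumption).
    """
    m = len(probe_sets)
    n = len(probe_sets[0])
    rank_rows = [rank(row) for row in probe_sets]
    # Sum of ranks each condition receives across all probe sets.
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


if __name__ == "__main__":
    # Hypothetical probe sets for one gene across four conditions.
    concordant = [[1, 2, 3, 4], [10, 20, 30, 40], [0.1, 0.2, 0.5, 0.9]]
    discordant = [[1, 2, 3, 4], [4, 3, 2, 1]]
    print(kendalls_w(concordant))
    print(kendalls_w(discordant))
```

Probe sets that rank the conditions identically give W = 1, while two probe sets with exactly reversed trends give W = 0, which is the kind of disagreement the abstract says must be resolved before interpretation.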
3.  Comparison of the Complete Protein Sets of Worm and Yeast: Orthology and Divergence 
Science (New York, N.Y.)  1998;282(5396):2022-2028.
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
PMCID: PMC3057080  PMID: 9851918
4.  Trichomonas Transmembrane Cyclases Result from Massive Gene Duplication and Concomitant Development of Pseudogenes 
Trichomonas vaginalis has an unusually large genome (∼160 Mb) encoding ∼60,000 proteins. With the goal of beginning to understand why some Trichomonas genes are present in so many copies, we characterized here a family of ∼123 Trichomonas genes that encode transmembrane adenylyl cyclases (TMACs).
Methodology/Principal Findings
The large family of TMAC genes is the result of recent duplications of a small set of ancestral genes that appear to be unique to trichomonads. Duplicated TMAC genes are not closely associated with repetitive elements, and duplications of flanking sequences are rare. However, there is evidence for TMAC gene replacements by homologous recombination. A high percentage of TMAC genes (∼46%) are pseudogenes, as they contain stop codons and/or frame shifts, or the genes are truncated. Numerous stop codons present in the genome project G3 strain are not present in orthologous genes of two other Trichomonas strains (S1 and B7RC2). Each TMAC is composed of a series of N-terminal transmembrane helices and a single C-terminal cyclase domain that has adenylyl cyclase activity. Multiple TMAC genes are transcribed by Trichomonas cloned by limiting dilution.
We conclude that one reason for the unusually large genome of Trichomonas is the presence of unstable families of genes such as those encoding TMACs that are undergoing massive gene duplication and concomitant development of pseudogenes.
Author Summary
Trichomonas vaginalis is the only medically important protist (single-cell eukaryote) that is sexually transmitted. The ∼160-Mb Trichomonas genome contains more predicted protein-encoding genes (∼60,000) than the human genome. To begin to understand why there are so many copies of some genes, we chose here to study a large family of genes encoding unique transmembrane cyclases. Our most important results include the following. More than 100 transmembrane cyclase genes do not result from chromosomal duplications, because for the most part only the coding regions of the genes, rather than flanking sequences, are duplicated. Almost half of the transmembrane cyclase genes are pseudogenes, and these pseudogenes are polymorphic among laboratory strains of Trichomonas. Messenger RNAs for numerous transmembrane cyclases are expressed simultaneously, and representative cyclase domains have adenylyl cyclase activity. In summary, the large family of Trichomonas genes encoding transmembrane adenylyl cyclases results from massive gene duplication and concomitant development of pseudogenes.
PMCID: PMC2914791  PMID: 20689771
5.  GTPases and the origin of the ribosome 
Biology Direct  2010;5:36.
This paper is an attempt to trace the evolution of the ribosome through the evolution of the universal P-loop GTPases that are involved with the ribosome in translation and with the attachment of the ribosome to the membrane. The GTPases involved in translation in Bacteria/Archaea are the elongation factors EFTu/EF1, the initiation factors IF2/aeIF5b + aeIF2, and the elongation factors EFG/EF2. All of these GTPases contain the OB fold that is also found in the non-GTPase IF1 involved in initiation. The GTPase involved in the signal recognition particle in most Bacteria and Archaea is SRP54.
1) The Elongation Factors of the Archaea based on structural considerations of the domains have the following evolutionary path: EF1 → aeIF2 → EF2. The evolution of the aeIF5b was a later event; 2) the Elongation Factors of the Bacteria based on the topological considerations of the GTPase domain have a similar evolutionary path: EFTu → IF2 → EFG. These evolutionary sequences reflect the evolution of the LSU followed by the SSU to form the ribosome; 3) the OB-fold IF1 is a mimic of an ancient tRNA minihelix.
The evolution of translational GTPases of both the Archaea and Bacteria points to the evolution of the ribosome. The elongation factors, EFTu/EF1, began as a Ras-like GTPase bringing the activated minihelix tRNA to the Large Subunit. The initiation factors and elongation factor would then have evolved from the EFTu/EF1 as the small subunit was added to the evolving ribosome. The SRP has an SRP54 GTPase and a specific RNA fold in its RNA component similar to the PTC. We consider the SRP to be a remnant of an ancient form of an LSU bound to a membrane.
This article was reviewed by George Fox, Leonid Mirny and Chris Sander.
PMCID: PMC2881122  PMID: 20487556
6.  Transcriptional Analysis of Fracture Healing and the Induction of Embryonic Stem Cell–Related Genes 
PLoS ONE  2009;4(5):e5393.
Fractures are among the most common human traumas. Fracture healing represents a unique temporally definable post-natal process in which to study the complex interactions of multiple molecular events that regulate endochondral skeletal tissue formation. Because of the regenerative nature of fracture healing, it is hypothesized that large numbers of post-natal stem cells are recruited and contribute to formation of the multiple cell lineages that contribute to this process. Bayesian modeling was used to generate the temporal profiles of the transcriptome during fracture healing. The temporal relationships between ontologies that are associated with various biologic, metabolic, and regulatory pathways were identified and related to developmental processes associated with skeletogenesis, vasculogenesis, and neurogenesis. The complement of all the expressed BMPs, Wnts, FGFs, and their receptors was related to the subsets of transcription factors that were concurrently expressed during fracture healing. We further defined during fracture healing the temporal patterns of expression for 174 of the 193 genes known to be associated with human genetic skeletal disorders. In order to identify the common regulatory features that stem cells recruited during fracture healing might share with other types of stem cells, we queried the transcriptome of fracture healing against that seen in embryonic stem cells (ESCs) and mesenchymal stem cells (MSCs). Approximately 300 known genes that are preferentially expressed in ESCs and ∼350 of the known genes that are preferentially expressed in MSCs showed induction during fracture healing. Nanog, one of the central epigenetic regulators associated with ESC stem cell maintenance, was shown to be associated in multiple forms of bone repair as well as MSC differentiation.
In summary, these data present the first temporal analysis of the transcriptome of an endochondral bone formation process that takes place during fracture healing. They show that neurogenesis as well as vasculogenesis are predominant components of skeletal tissue formation and suggest common pathways are shared between post-natal stem cells and those seen in ESCs.
PMCID: PMC2673045  PMID: 19415118
7.  The origin and evolution of the ribosome 
Biology Direct  2008;3:16.
The origin and early evolution of the active site of the ribosome can be elucidated through an analysis of the ribosomal proteins' taxonomic block structures and their RNA interactions. Comparison between the two subunits, exploiting the detailed three-dimensional structures of the bacterial and archaeal ribosomes, is especially informative.
The analysis of the differences between these two sites can be summarized as follows: 1) There is no self-folding RNA segment that defines the decoding site of the small subunit; 2) there is one self-folding RNA segment encompassing the entire peptidyl transfer center of the large subunit; 3) the protein contacts with the decoding site are made by a set of universal alignable sequence blocks of the ribosomal proteins; 4) the majority of those peptides contacting the peptidyl transfer center are made by bacterial or archaeal-specific sequence blocks.
These clear distinctions between the two subunit active sites support an earlier origin for the large subunit's peptidyl transferase center (PTC) with the decoding site of the small subunit being a later addition to the ribosome. The main implications are that a single self-folding RNA, in conjunction with a few short stabilizing peptides, formed the precursor of the modern ribosomal large subunit in association with a membrane.
This article was reviewed by Jerzy Jurka, W. Ford Doolittle, Eugene Shaknovich, and George E. Fox (nominated by Jerzy Jurka).
PMCID: PMC2386862  PMID: 18430223
8.  Inferring genome-scale rearrangement phylogeny and ancestral gene order: a Drosophila case study 
Genome Biology  2007;8(11):R236.
A simple, fast, and biologically-inspired computational approach to infer genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight Drosophila genomes, providing insights into evolutionary chromosomal dynamics.
A simple, fast, and biologically inspired computational approach for inferring genome-scale rearrangement phylogeny and ancestral gene order has been developed. This has been applied to eight Drosophila genomes. Existing techniques are either limited to a few hundred markers or a small number of taxa. This analysis uses over 14,000 genomic loci and employs discrete elements consisting of pairs of homologous genetic elements. The results provide insight into evolutionary chromosomal dynamics and synteny analysis, and inform speciation studies.
PMCID: PMC2258185  PMID: 17996033
9.  Transcription Factor Map Alignment of Promoter Regions 
PLoS Computational Biology  2006;2(5):e49.
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
Sequence comparisons and alignments are among the most powerful tools in research in biology. Since similar sequences perform, in general, similar functions, identification of sequence conservation between two or more nucleotide or amino acid sequences is often used to infer common biological functionality. Sequence comparisons, however, have limitations; often similar functions are encoded by higher order elements which do not hold a univocal relationship to the underlying primary sequence. In consequence, similar functions are frequently encoded by diverse sequences. Promoter regions are a case in point. Often, promoter sequences of genes with similar expression patterns do not show conservation. This is because, even though their expression may be regulated by a similar arrangement of transcription factors, the binding sites for these factors may exhibit great sequence variability. To overcome this limitation, the authors obtain predictions of transcription factor binding sites on promoter sequences, and annotate the predicted sites with the labels of the corresponding transcription factors. They develop an algorithm—inspired by an early algorithm to align restriction enzyme maps—to align the resulting sequence of labels—the so-called TF-maps (transcription factor maps). They show that TF-map alignments are able to uncover conserved regulatory elements common to the promoter regions of co-regulated genes, but those regulatory elements cannot be detected by typical sequence alignments.
PMCID: PMC1464811  PMID: 16733547
10.  The archaeal origins of the eukaryotic translational system 
Archaea  2005;2(1):1-9.
Among the 78 eukaryotic ribosomal proteins, eleven are specific to Eukarya, 33 are common only to Archaea and Eukarya and 34 are homologous (at least in part) to those of both Bacteria and Archaea. Several other translational proteins are common only to Eukarya and Archaea (e.g., IF2a, SRP19, etc.), whereas others are shared by the three phyla (e.g., EFTu/EF1A and SRP54).
Although this and other analyses strongly support an archaeal origin for a substantial fraction of the eukaryotic translational machinery, especially the ribosomal proteins, there have been numerous unique and ubiquitous additions to the eukaryotic translational system besides the 11 unique eukaryotic ribosomal proteins. These include peptide additions to most of the 67 archaeal homolog proteins, rRNA insertions, the 5.8S RNA and the Alu extension to the SRP RNA. Our comparative analysis of these and other eukaryotic features among the three different cellular phylodomains supports the idea that an archaeal translational system was most likely incorporated by means of endosymbiosis into a host cell that was neither bacterial nor archaeal in any modern sense. Phylogenetic analyses provide support for the timing of this acquisition coinciding with an ancient bottleneck in prokaryotic diversity.
PMCID: PMC2685589  PMID: 16877317
Crenarchaea; eukaryotic origin; evolution; ribosome
11.  Constraining ribosomal RNA conformational space 
Nucleic Acids Research  2005;33(16):5106-5111.
Despite the potential for many possible secondary-structure conformations, the native sequence of ribosomal RNA (rRNA) is able to find the correct and universally conserved core fold. This study reports a computational analysis investigating two mechanisms that appear to constrain rRNA secondary-structure conformational space: ribosomal proteins and rRNA sequence composition. The analysis was carried out by using rRNA–ribosomal protein interaction data for the Escherichia coli 16S rRNA and free energy minimization software for secondary-structure prediction. The results indicate that selection pressures on rRNA sequence composition and ribosomal protein–rRNA interaction play a key role in constraining the rRNA secondary structure to a single stable form.
PMCID: PMC1214544  PMID: 16155182
12.  Functional conservation between members of an ancient duplicated transcription factor family, LSF/Grainyhead 
Nucleic Acids Research  2003;31(15):4304-4316.
The LSF/Grainyhead transcription factor family is involved in many important biological processes, including cell cycle, cell growth and development. In order to investigate the evolutionary conservation of these biological roles, we have characterized two new family members in Caenorhabditis elegans and Xenopus laevis. The C.elegans member, Ce-GRH-1, groups with the Grainyhead subfamily, while the X.laevis member, Xl-LSF, groups with the LSF subfamily. Ce-GRH-1 binds DNA in a sequence-specific manner identical to that of Drosophila melanogaster Grainyhead. In addition, Ce-GRH-1 binds to sequences upstream of the C.elegans gene encoding aromatic l-amino-acid decarboxylase and genes involved in post-embryonic development, mab-5 and dbl-1. All three C.elegans genes are homologs of D.melanogaster Grainyhead-regulated genes. RNA-mediated interference of Ce-grh-1 results in embryonic lethality in worms, accompanied by soft, defective cuticles. These phenotypes are strikingly similar to those observed previously in D.melanogaster grainyhead mutants, suggesting conservation of the developmental role of these family members over the course of evolution. Our phylogenetic analysis of the expanded LSF/GRH family (including other previously unrecognized proteins/ESTs) suggests that the structural and functional dichotomy of this family dates back more than 700 million years, i.e. to the time when the first multicellular organisms are thought to have arisen.
PMCID: PMC169928  PMID: 12888489
13.  Probabilistic prediction of Saccharomyces cerevisiae mRNA 3′-processing sites 
Nucleic Acids Research  2002;30(8):1851-1858.
We present a tool for the prediction of mRNA 3′-processing (cleavage and polyadenylation) sites in the yeast Saccharomyces cerevisiae, based on a discrete state-space model or hidden Markov model. Comparison of predicted sites with experimentally verified 3′-processing sites indicates good agreement. All predicted or known yeast genes were analyzed to find probable 3′-processing sites. Known alternative 3′-processing sites, both within the 3′-untranslated region and within the protein coding sequence were successfully identified, leading to the possibility of prediction of previously unknown alternative sites. The lack of an apparent 3′-processing site calls into question the validity of some predicted genes. This is specifically investigated for predicted genes with overlapping coding sequences.
PMCID: PMC113205  PMID: 11937640
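The abstract above describes site prediction with a hidden Markov model. As a rough illustration of how such a model labels a sequence, the sketch below runs standard Viterbi decoding on a toy two-state model. Everything specific here is an assumption made for illustration: the state names `bg` and `site`, all probabilities, and the function name `viterbi` are hypothetical, and the abstract does not state which decoding procedure or model topology the authors used.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `seq` under the given HMM.

    Works in log space to avoid underflow on long sequences.
    All probabilities must be strictly positive.
    """
    # Initialise with the start and first-emission log-probabilities.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]])
               for s in states}]
    backpointers = []
    for symbol in seq[1:]:
        column, pointers = {}, {}
        for s in states:
            # Best predecessor state for ending in state s at this position.
            best_prev = max(
                states,
                key=lambda p: scores[-1][p] + math.log(trans_p[p][s]),
            )
            column[s] = (scores[-1][best_prev]
                         + math.log(trans_p[best_prev][s])
                         + math.log(emit_p[s][symbol]))
            pointers[s] = best_prev
        scores.append(column)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: a background state and an A/T-rich "site" state.
STATES = ["bg", "site"]
START = {"bg": 0.9, "site": 0.1}
TRANS = {"bg": {"bg": 0.9, "site": 0.1},
         "site": {"bg": 0.1, "site": 0.9}}
EMIT = {"bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        "site": {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45}}

if __name__ == "__main__":
    print(viterbi("GCGCGCATATATATAT", STATES, START, TRANS, EMIT))
```

With these made-up parameters a G/C-only sequence is labeled entirely as background and a long run of A is labeled entirely as site. A real predictor would estimate its parameters from experimentally verified 3′-processing sites, as the abstract indicates.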
14.  Survey of human mitochondrial diseases using new genomic/proteomic tools 
Genome Biology  2001;2(6):research0021.1-research0021.16.
We have constructed Bayesian prior-based, amino-acid sequence profiles for the complete yeast mitochondrial proteome and used them to develop methods for identifying and characterizing the context of protein mutations that give rise to human mitochondrial diseases. (Bayesian priors are conditional probabilities that allow the estimation of the likelihood of an event - such as an amino-acid substitution - on the basis of prior occurrences of similar events.) Because these profiles can assemble sets of taxonomically very diverse homologs, they enable identification of the structurally and/or functionally most critical sites in the proteins on the basis of the degree of sequence conservation. These profiles can also find distant homologs with determined three-dimensional structures that aid in the interpretation of effects of missense mutations.
This survey reports such an analysis for 15 missense mutations, one insertion and three deletions involved in Leber's hereditary optic neuropathy, Leigh syndrome, mitochondrial neurogastrointestinal encephalomyopathy, Mohr-Tranebjaerg syndrome, iron-storage disorders related to Friedreich's ataxia, and hereditary spastic paraplegia. We present structural correlations for seven of the mutations.
Of the 19 mutations analyzed, 14 involved changes in very highly conserved parts of the affected proteins. Five out of seven structural correlations provided reasonable explanations for the malfunctions. As additional genetic and structural data become available, this methodology can be extended. It has the potential for assisting in identifying new disease-related genes. Furthermore, profiles with structural homologs can generate mechanistic hypotheses concerning the underlying biochemical processes - and why they break down as a result of the mutations.
PMCID: PMC33397  PMID: 11423010
