Nucleosomes, the basic repeat units of eukaryotic chromatin, have been suggested to influence the evolution of eukaryotic genomes, both by altering the propensity of DNA to mutate and by selection acting to maintain or exclude nucleosomes in particular locations. Contrary to the popular idea that nucleosomes are unique to eukaryotes, histone proteins have also been discovered in some archaeal genomes. Archaeal nucleosomes, however, are quite unlike their eukaryotic counterparts in many respects, including their assembly into tetramers (rather than octamers) from histone proteins that lack N- and C-terminal tails. Here, we show that despite these fundamental differences the association between nucleosome footprints and sequence evolution is strikingly conserved between humans and the model archaeon Haloferax volcanii. In light of this finding we examine whether selection or mutation can explain concordant substitution patterns in the two kingdoms. Unexpectedly, we find that neither the mutation model nor the selection model is sufficient to explain the observed association between nucleosomes and sequence divergence. Instead, we demonstrate that nucleosome-associated substitution patterns are more consistent with a third model in which sequence divergence results in frequent repositioning of nucleosomes during evolution. Indeed, we show that nucleosome repositioning is both necessary and largely sufficient to explain the association between current nucleosome positions and biased substitution patterns. This finding highlights the importance of considering the direction of causality between genetic and epigenetic change.
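The abstract does not say how substitutions were related to nucleosome footprints. One common approach is to tabulate substitutions by their signed distance to the nearest nucleosome dyad (center). The minimal sketch below does this under two assumptions: substitution positions and dyad coordinates share one chromosome coordinate system, and the 150 bp window and function names are illustrative. A real analysis would also normalize each offset by the number of aligned sites available there.

```python
from bisect import bisect_left
from collections import Counter


def nearest_dyad_offset(pos, dyads):
    """Signed distance (bp) from pos to the closest dyad in a sorted, non-empty list.
    Ties between two equidistant dyads go to the downstream dyad."""
    i = bisect_left(dyads, pos)
    candidates = []
    if i < len(dyads):
        candidates.append(dyads[i])      # first dyad at or after pos
    if i > 0:
        candidates.append(dyads[i - 1])  # last dyad before pos
    nearest = min(candidates, key=lambda d: abs(pos - d))
    return pos - nearest


def substitution_profile(substitutions, dyads, window=150):
    """Count substitutions by signed offset from the nearest dyad, within +/- window bp."""
    dyads = sorted(dyads)
    profile = Counter()
    if not dyads:
        return profile
    for pos in substitutions:
        offset = nearest_dyad_offset(pos, dyads)
        if abs(offset) <= window:
            profile[offset] += 1
    return profile


profile = substitution_profile([995, 1003, 1100, 1190, 1500], dyads=[1000, 1200])
```

In this example the substitution at position 1500 lies more than 150 bp from any dyad, so it is excluded from the profile.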
Genome sequences as well as epigenetic states, such as DNA methylation or nucleosome binding patterns, change during evolution. But what is the causal relationship between the two? We already know that nucleotide variation within and between species is distributed unevenly around nucleosome footprints, but does this mean that sequence evolution follows a biased course because the presence of nucleosomes affects mutation and DNA repair dynamics? Or is it, in fact, the other way around, i.e. changes happen at the DNA level and prompt shifts in nucleosome positioning? To investigate the direction of causality in genetic versus epigenetic evolution, we analyze substitution patterns in eukaryotes as well as the archaeon Haloferax volcanii in the context of genome-wide nucleosome binding maps. We demonstrate that the relationship between nucleosome positions and between-species divergence patterns, strikingly similar in eukaryotes and archaea, can be explained in large part by nucleosomes shifting positions in response to substitution, although both mutation and selection biases might still exist. Our results illustrate that it is important to consider the direction of causality between epigenetic and genetic change when analyzing patterns of sequence divergence and using sequence conservation to infer selection on epigenetic states.
A high-fat diet accompanied by hypertriglyceridemia increases an individual’s risk for developing atherosclerosis. An early event in this process is monocyte recruitment through binding to VCAM-1 upregulated on inflamed arterial endothelium. Diets high in polyunsaturated fatty acids (PUFAs) may provide athero-protection by ameliorating this effect.
We investigated the acute regulation of VCAM-1 expression in human aortic endothelial cells (HAEC) in response to triglyceride-rich lipoproteins (TGRL) isolated from subjects following consumption of a high-fat meal.
Methods and Results
Postprandial TGRL isolated from 38 subjects were categorized as pro- or anti-atherogenic according to their capacity to alter the inflammatory response of HAEC. Pro-atherogenic TGRL increased expression of VCAM-1, ICAM-1, and E-selectin by ~20% compared to stimulation with TNFα alone, whereas anti-atherogenic TGRL decreased VCAM-1 expression by ~20% but still upregulated ICAM-1. The relative atherogenicity of TGRL positively correlated with particle density of TG, ApoCIII, ApoE, and cholesterol. ω3-PUFA mimicked the effect of anti-atherogenic TGRL by down-regulating VCAM-1 expression. TGRL exerted this differential regulation of VCAM-1 by reciprocally modulating expression and activity of the transcription factor IRF-1 and expression of microRNA 126 (miR-126). Overexpression or silencing of IRF-1 or miR-126 expression recapitulated the pro- or anti-atherogenic regulation of VCAM-1.
In response to a high-fat meal, TGRL bias the inflammatory response of endothelium via transcriptional and post-transcriptional editing of VCAM-1. Subjects with an anti-inflammatory response to a high-fat meal produced TGRL that was enriched in non-esterified fatty acids, decreased IRF-1 expression, increased miR-126 activity, and diminished monocyte arrest.
Hypertriglyceridemia; triglyceride; fatty acid; atherosclerosis; endothelial dysfunction
In bacteriorhodopsin, the order of molecular events that controls the cytoplasmic or extracellular accessibility of the Schiff base (SB) is not well understood. We use molecular dynamics simulations to study a process involved in the second accessibility switch of the SB that occurs after its reprotonation in the N intermediate of the photocycle. We find that once protonated, the SB C15=NZ bond switches from a cytoplasmic-facing (13-cis, 15-anti) configuration to an extracellular-facing (13-cis, 15-syn) configuration on the picosecond to nanosecond timescale. Significantly, rotation about the retinal’s C13=C14 double bond is not observed. The dynamics of the isomeric state transitions of the protonated SB are strongly influenced by the surrounding charges and dielectric effects of other buried ions, particularly D96 and D212. Our simulations indicate that the thermal isomerization of retinal from 13-cis back to all-trans likely occurs independently from and after the SB C15=NZ rotation in the N-to-O transition.
Cross-species comparisons of gene neighborhoods (also called genomic contexts) in microbes may help identify functionally related or co-regulated sets of genes, suggest annotations for previously unannotated genes, and reveal horizontal gene transfer events across microbial species. Existing tools for investigating genomic contexts, however, lack features for dynamically comparing and exploring genomic regions from multiple species. As DNA sequencing technologies improve and the number of fully sequenced microbial genomes increases, a user-friendly genome context comparison platform designed for a broad range of users promises to satisfy a growing need in the biological community.
Here we present JContextExplorer: a tool that organizes genomic contexts into branching diagrams. We implement several alternative context-comparison and tree rendering algorithms, and allow for easy transitioning between different clustering algorithms. To facilitate genomic context analysis, our tool implements GUI features, such as text search filtering, point-and-click interrogation of individual contexts, and genomic visualization via a multi-genome browser. We demonstrate a use case of our tool by attempting to resolve annotation ambiguities between two highly homologous yet functionally distinct genes in a set of 22 alpha and gamma proteobacteria.
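The abstract does not specify which context-comparison and tree-rendering algorithms JContextExplorer implements. As a hypothetical illustration of how contexts can be organized into a branching diagram, the sketch below makes three assumptions:

- each context is represented as a set of gene-family identifiers;
- pairs of contexts are scored by Jaccard distance;
- contexts are joined into an average-linkage (UPGMA-style) tree, returned as nested tuples.

The identifiers, function names, and choice of distance are assumptions made for the example. None of them is taken from the tool.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance: 1 - (shared gene families / all gene families)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma_tree(contexts):
    """Average-linkage clustering of {name: gene_family_ids}; returns nested tuples."""
    if not contexts:
        raise ValueError("need at least one context")
    names = list(contexts)
    leaf_dist = {}
    for x, y in combinations(names, 2):
        d = jaccard_distance(contexts[x], contexts[y])
        leaf_dist[(x, y)] = leaf_dist[(y, x)] = d
    # Each cluster is (subtree, list of leaf names it contains).
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            left, right = clusters[i][1], clusters[j][1]
            # UPGMA distance: mean of all leaf-to-leaf distances between the two clusters.
            avg = sum(leaf_dist[(p, q)] for p in left for q in right) / (len(left) * len(right))
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical contexts: A and B share two of four gene families; C shares none.
tree = upgma_tree({"A": {"f1", "f2", "f3"}, "B": {"f1", "f2", "f4"}, "C": {"f5", "f6"}})
```

Swapping in a different distance (for example, one that also accounts for gene order or strand) only requires changing `jaccard_distance`. This mirrors the abstract's point that users may want to move between clustering choices.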
JContextExplorer should enable a broad range of users to analyze and explore genomic contexts. The program has been tested on Windows, Mac, and Linux operating systems and is available both as an executable JAR file and via Java Web Start. Program executables, source code, and documentation are available at http://www.bme.ucdavis.edu/facciotti/resources_data/software/.
Genomic context; Genomic neighborhood; Comparative genomics; Java; GUI
Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research.
We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm, each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant motifs into a tree, which allowed us to gain additional insights. In all cases, we were able to support our findings with experimental work from the literature.
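The abstract describes MotifCatcher only at a high level. The sketch below illustrates the general random-sampling idea and is not MotifCatcher's algorithm. It rests on these assumptions:

- `top_kmer` is a hypothetical toy stand-in for a real motif finder such as MEME; it returns the k-mer present in the most sequences.
- Each input sequence is scored by how often it contains the motif found in the random subsets that included it.

Sequences with low scores are candidates for removal as non-motif-containing.

```python
import random
from collections import Counter


def top_kmer(seqs, k):
    """Toy stand-in for a motif finder: the k-mer present in the most sequences.
    Ties are broken lexicographically so results are reproducible."""
    presence = Counter()
    for s in seqs:
        presence.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return min(presence, key=lambda m: (-presence[m], m))


def resampling_scores(seqs, k=4, n_iter=200, frac=0.5, seed=0):
    """Per sequence: fraction of sampled subsets containing it whose top motif occurs in it."""
    rng = random.Random(seed)
    size = max(1, int(len(seqs) * frac))
    sampled = [0] * len(seqs)
    hits = [0] * len(seqs)
    for _ in range(n_iter):
        idx = rng.sample(range(len(seqs)), size)      # random subset of sequence indices
        motif = top_kmer([seqs[i] for i in idx], k)   # "discover" a motif in the subset
        for i in idx:
            sampled[i] += 1
            if motif in seqs[i]:
                hits[i] += 1
    return [h / n if n else 0.0 for h, n in zip(hits, sampled)]


# Hypothetical input: the first three sequences share the planted motif AAAA.
seqs = ["CGAAAAGC", "TCAAAATG", "GTAAAACT", "GCGTCCTG", "TGGCATCG", "CTTGAGGT"]
scores = resampling_scores(seqs)
```

The subset fraction, the number of iterations, and any score threshold for filtering are free parameters in this sketch. The abstract does not state MotifCatcher's values.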
Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we proposed binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesized 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at
Motif; Monte Carlo; ChIP-seq; ChIP-chip; Comparative genomics; MEME; STAMP; TFB
Remarkable advances in DNA sequencing technology have created a need for de novo genome assembly methods tailored to work with the new sequencing data types. Many such methods have been published in recent years, but assembling raw sequence data to obtain a draft genome has remained a complex, multi-step process, involving several stages of sequence data cleaning, error correction, assembly, and quality control. Successful application of these steps usually requires intimate knowledge of a diverse set of algorithms and software. We present an assembly pipeline called A5 (Andrew And Aaron's Awesome Assembly pipeline) that simplifies the entire genome assembly process by automating these stages and integrating several previously published algorithms with new algorithms for quality control and automated assembly parameter selection. We demonstrate that A5 can produce assemblies of quality comparable to a leading assembly algorithm, SOAPdenovo, without any prior knowledge of the particular genome being assembled and without the extensive parameter tuning that SOAPdenovo requires. In particular, the assemblies produced by A5 exhibit a reduction of 50% or more in broken protein coding sequences relative to SOAPdenovo assemblies. The A5 pipeline can also assemble Illumina sequence data from libraries constructed by the Nextera (transposon-catalyzed) protocol, which have markedly different characteristics from mechanically sheared libraries. Finally, A5 has modest compute requirements and can assemble a typical bacterial genome on current desktop or laptop computer hardware in under two hours, depending on depth of coverage.
We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20× coverage and assembled to an average of 50 contigs (range: 5 scaffolds to 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genus-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology.
Deciphering the structure of gene regulatory networks across the tree of life remains one of the major challenges in postgenomic biology. We present a novel ChIP-seq workflow for the archaea using the model organism Halobacterium salinarum sp. NRC-1 and demonstrate its application for mapping the genome-wide binding sites of natively expressed transcription factors. This end-to-end pipeline is the first protocol for ChIP-seq in archaea, with methods and tools for each stage from gene tagging to data analysis and biological discovery. Genome-wide binding sites for transcription factors with many binding sites (TfbD) are identified with sensitivity, while specificity is retained in the identification of smaller regulons (bacteriorhodopsin-activator protein). Chromosomal tagging of target proteins with a compact epitope facilitates a standardized and cost-effective workflow that is compatible with high-throughput immunoprecipitation of natively expressed transcription factors. The Pique package, an open-source bioinformatics method, is presented for identification of binding events. Relative to ChIP-Chip and qPCR, this workflow offers a robust catalog of protein–DNA binding events with improved spatial resolution and significantly decreased cost. While this study focuses on the application of ChIP-seq in H. salinarum sp. NRC-1, our workflow can also be adapted for use in other archaea and bacteria with basic genetic tools.
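The abstract does not describe how Pique identifies binding events. As a generic illustration of peak calling, the sketch below tests each genomic window's ChIP read count against a Poisson expectation derived from a library-size-scaled control. This is a swapped-in textbook technique, not Pique's method. The window binning, pseudocount, and `alpha` threshold are assumptions.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0); underflows for very large lam
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enriched_windows(chip_counts, control_counts, alpha=1e-5, pseudocount=1.0):
    """Return (index, chip_count, p_value) for windows enriched over the scaled control."""
    ctrl_total = sum(control_counts)
    if ctrl_total == 0:
        raise ValueError("control has no reads")
    scale = sum(chip_counts) / ctrl_total  # library-size normalization
    hits = []
    for i, (chip, ctrl) in enumerate(zip(chip_counts, control_counts)):
        expected = (ctrl + pseudocount) * scale
        p = poisson_upper_tail(chip, expected)
        if p < alpha:
            hits.append((i, chip, p))
    return hits


# Hypothetical read counts in five equal-sized windows.
hits = enriched_windows([5, 6, 4, 60, 5], [5, 5, 5, 5, 5])
```

For deeply sequenced windows, the tail probability should be computed in log space to avoid underflow. A production caller would also correct for multiple testing across windows.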
It is becoming clear that the regulation of gas vesicle biogenesis in Halobacterium salinarum NRC-1 is multifaceted and appears to integrate environmental and metabolic cues at both the transcriptional and posttranscriptional levels. The mechanistic details underlying this process, however, remain unclear. In this manuscript, we quantify the contribution to light scattering made by both intracellular and released gas vesicles isolated from Halobacterium salinarum NRC-1, demonstrating that each form can lead to distinct features in growth curves determined by optical density measured at 600 nm (OD600). In the course of the study, we also demonstrate the sensitivity of gas vesicle accumulation in Halobacterium salinarum NRC-1 to small differences in growth conditions and reevaluate published works in the context of our results to present a hypothesis regarding the roles of the general transcription factor tbpD and the TCA cycle enzyme aconitase in the regulation of gas vesicle biogenesis.
Summary: High-throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high-quality reference genome, and measuring assembly accuracy using one or more quality metrics.
We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high-quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
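The abstract does not list the scoring metrics used. One widely used contiguity measure is N50: the contig length L such that contigs of length at least L cover half of the assembly. NG50 is the same measure with a reference genome size as the denominator. The minimal sketch below generalizes both to any percentage. The function name and interface are illustrative, not this program's API.

```python
def nx(lengths, x=50, genome_size=None):
    """N<x> (or NG<x> when genome_size is given): the contig length L such that
    contigs of length >= L together cover at least x% of the total.
    Returns None if the contigs cannot reach x% of genome_size."""
    if not lengths:
        raise ValueError("no contigs")
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * x:  # integer comparison avoids float rounding
            return length
    return None


contigs = [100, 80, 50, 30, 20]  # hypothetical contig lengths (bp)
n50 = nx(contigs)                 # 100 + 80 = 180 >= 140, so N50 is 80
```

Contiguity alone can reward misassemblies. That is one reason to score against a high-quality reference genome, as the abstract describes, rather than rely on N50 by itself.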
Availability: GPL source code and a usage tutorial are available at http://ngopt.googlecode.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Next-generation DNA sequencing coupled with chromatin immunoprecipitation (ChIP-seq) is revolutionizing our ability to interrogate whole-genome protein-DNA interactions. Identification of protein binding sites from ChIP-seq data has required novel computational tools, distinct from those used for the analysis of ChIP-Chip experiments. The growing popularity of ChIP-seq spurred the development of many different analytical programs (at last count, we noted 31 open source methods), each with some purported advantage. Given that the literature is dense and empirical benchmarking is challenging, selecting an appropriate method for ChIP-seq analysis has become a daunting task. Herein we compare the performance of eleven different peak calling programs on common empirical transcription factor datasets and measure their sensitivity, accuracy, and usability. Our analysis provides an unbiased critical assessment of available technologies, and should assist researchers in choosing a suitable tool for handling ChIP-seq data.
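The abstract does not define how sensitivity and accuracy were scored. A minimal sketch of one common scheme compares called peak summits against a set of reference binding sites, counting a match when the two lie within a distance tolerance. The 50 bp tolerance and the function name are assumptions, not the benchmark's actual criteria.

```python
from bisect import bisect_left


def benchmark(called, reference, tolerance=50):
    """Return (sensitivity, precision) of called peak positions against reference sites."""
    called, reference = sorted(called), sorted(reference)

    def has_match(pos, targets):
        # First target >= pos - tolerance; it matches if it is also <= pos + tolerance.
        i = bisect_left(targets, pos - tolerance)
        return i < len(targets) and targets[i] <= pos + tolerance

    found = sum(has_match(r, called) for r in reference)    # reference sites recovered
    correct = sum(has_match(c, reference) for c in called)  # calls near a reference site
    sensitivity = found / len(reference) if reference else 0.0
    precision = correct / len(called) if called else 0.0
    return sensitivity, precision


# Hypothetical positions (bp): two of three sites recovered, one spurious call.
sens, prec = benchmark(called=[120, 480, 700], reference=[100, 500, 900])
```

Because the two counts are computed independently, one call near two reference sites counts toward both. Stricter benchmarks use one-to-one matching.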
Rapidly characterizing the operational interrelationships among all genes in a given organism is a critical bottleneck to significantly advancing our understanding of thousands of newly sequenced microbial and eukaryotic species. While evolving technologies for global profiling of transcripts, proteins, and metabolites are making it possible to comprehensively survey cellular physiology in newly sequenced organisms, these experimental techniques have not kept pace with sequencing efforts. Compounding these technological challenges is the fact that individual experiments typically only stimulate relatively small-scale cellular responses, thus requiring numerous expensive experiments to survey the operational relationships among nearly all genetic elements. Therefore, a relatively quick and inexpensive strategy for observing changes in large fractions of the genetic elements is highly desirable.
We have discovered in the model organism Halobacterium salinarum NRC-1 that batch culturing in complex medium stimulates meaningful changes in the expression of approximately two thirds of all genes. While the majority of these changes occur during transition from rapid exponential growth to the stationary phase, several transient physiological states were detected beyond what has been previously observed. In sum, integrated analysis of transcript and metabolite changes has helped uncover growth phase-associated physiologies, operational interrelationships among two thirds of all genes, specialized functions for gene family members, waves of transcription factor activities, and growth phase associated cell morphology control.
Simple laboratory culturing in complex medium can be enormously informative regarding the activities of and interrelationships among a large fraction of all genes in an organism. This also yields important baseline physiological context for designing specific perturbation experiments at different phases of growth. The integration of such growth and perturbation studies with measurements of associated environmental factor changes is a practical and economical route for the elucidation of comprehensive systems-level models of biological systems.
The relatively small numbers of proteins and of possible posttranslational modifications in microbes provide a unique opportunity to comprehensively characterize their dynamic proteomes. We have constructed a Peptide Atlas (PA) for 62.7% of the predicted proteome of the extremely halophilic archaeon Halobacterium salinarum NRC-1 by compiling approximately 636,000 tandem mass spectra from 497 mass spectrometry runs in 88 experiments. Analysis of the PA with respect to biophysical properties of constituent peptides, functional properties of parent proteins of detected peptides, and performance of different mass spectrometry approaches has helped highlight plausible strategies for improving proteome coverage and selecting signature peptides for targeted proteomics. Notably, discovery of a significant correlation between absolute abundances of mRNAs and proteins has helped identify low protein abundance as the major limitation in peptide detection. Furthermore, we have discovered that iTRAQ labeling for quantitative proteomic analysis introduces a significant bias in peptide detection by mass spectrometry. Therefore, despite identifying at least one proteotypic peptide for almost all proteins in the PA, a context-dependent selection of proteotypic peptides appears to be the most effective approach for targeted proteomics.
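The abstract reports a significant correlation between mRNA and protein abundances but does not name the statistic. Spearman's rank correlation is one common, outlier-robust choice for abundance data that spans orders of magnitude. The minimal pure-Python sketch below averages tied ranks. Whether the original analysis used Spearman, Pearson, or another measure is not stated here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired abundances for four genes (arbitrary units).
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

For real data, a significance test (for example, a permutation test or a t approximation) would accompany the coefficient.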
Peptide Atlas; Halobacterium; iTRAQ; bioinformatics; archaea; proteomics
Despite the knowledge of complex prokaryotic transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of ∼64% of all genes, including putative non-coding RNAs, in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein–DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3′ ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate with binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
archaea; ChIP–chip; non-coding RNA; tiling array; transcription
The Inferelator, a method for deriving genome-wide transcriptional regulatory interactions, successfully predicted global expression in Halobacterium under novel perturbations.
We present a method (the Inferelator) for deriving genome-wide transcriptional regulatory interactions, and apply the method to predict a large portion of the regulatory network of the archaeon Halobacterium NRC-1. The Inferelator uses regression and variable selection to identify transcriptional influences on genes based on the integration of genome annotation and expression data. The learned network successfully predicted Halobacterium's global expression under novel perturbations with predictive power similar to that seen over training data. Several specific regulatory predictions were experimentally tested and verified.
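The abstract says only that the Inferelator uses regression and variable selection to identify transcriptional influences. The minimal sketch below illustrates that general idea with a simplified greedy forward selection. It is not the Inferelator's own procedure, which is not reproduced here. At each step it fits the current residual of a target gene's expression on each unused candidate regulator's expression, keeps the regulator that most reduces the residual sum of squares, and stops when the gain becomes small. The predictor names, `max_terms`, and `min_gain` are hypothetical.

```python
def fit_simple(x, r):
    """Least-squares slope and intercept of r on a single predictor x."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, mr  # constant predictor: explain only the mean
    slope = sum((a - mx) * (b - mr) for a, b in zip(x, r)) / sxx
    return slope, mr - slope * mx


def greedy_select(target, predictors, max_terms=3, min_gain=0.01):
    """Greedily pick regulators that most reduce the residual sum of squares (RSS).
    Returns [(name, slope), ...]; stops when the gain < min_gain * initial RSS."""
    mean = sum(target) / len(target)
    residual = [t - mean for t in target]
    rss0 = rss = sum(r * r for r in residual)
    chosen, used = [], set()
    while len(chosen) < max_terms and rss > 0:
        best = None
        for name, x in predictors.items():
            if name in used:
                continue
            slope, icpt = fit_simple(x, residual)
            new_res = [r - (slope * a + icpt) for a, r in zip(x, residual)]
            new_rss = sum(v * v for v in new_res)
            if best is None or new_rss < best[0]:
                best = (new_rss, name, slope, new_res)
        if best is None or rss - best[0] < min_gain * rss0:
            break
        rss, name, slope, residual = best
        used.add(name)
        chosen.append((name, slope))
    return chosen


# Hypothetical data: the target gene tracks tf1 exactly (target = 2 * tf1).
model = greedy_select([0, 2, 4, 6, 8], {"tf1": [0, 1, 2, 3, 4], "tf2": [1, 0, 1, 0, 1]})
```

Fitting residuals one regulator at a time is simpler than refitting a joint model, and it can misattribute effects when regulators are correlated. Shrinkage methods such as the lasso are a common remedy.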