All archaeal genomes encode RNA polymerase (RNAP) subunits E and F that share a common ancestry with the eukaryotic RNAP subunits A43 and A14 (Pol I), Rpb7 and Rpb4 (Pol II), and C25 and C17 (Pol III). By gene replacement, we have isolated archaeal mutants of Thermococcus kodakarensis with the subunit F-encoding gene (rpoF) deleted, but we were unable to isolate mutants lacking the subunit E-encoding gene (rpoE). Wild-type T. kodakarensis grows at temperatures ranging from 60 to 100 °C, optimally at 85 °C, and the ΔrpoF cells grew at the same rate as wild-type cells at 70 °C, but much more slowly and to lower cell densities at 85 °C. The abundance of a chaperonin subunit, CpkB, was much reduced in the ΔrpoF strain growing at 85 °C, and increased expression of cpkB, rpoF or rpoE integrated at a remote site in the genome, using a nutritionally-regulated promoter, improved the growth of ΔrpoF cells. RNAP preparations purified from ΔrpoF cells lacked subunit F and also subunit E and a transcription factor TFE that co-purifies with RNAP from wild-type cells, but in vitro, this mutant RNAP exhibited no discernible differences from wild-type RNAP in promoter-dependent transcription, abortive transcript synthesis, transcript elongation or termination.
Archaea; RNA polymerase subunits E and F; ΔrpoF; TFE; Thermococcus kodakarensis
Both ppGpp and pppGpp are thought to function collectively as second messengers for many complex cellular responses to nutritional stress throughout biology. There are few indications that their regulatory effects might be different; however, this question has been largely unexplored for lack of an ability to experimentally manipulate the relative abundance of ppGpp and pppGpp. Here, we achieve preferential accumulation of either ppGpp or pppGpp with Escherichia coli strains through induction of different Streptococcal (p)ppGpp synthetase fragments. In addition, expression of E. coli GppA, a pppGpp 5′-gamma phosphate hydrolase that converts pppGpp to ppGpp, is manipulated to fine tune differential accumulation of ppGpp and pppGpp. In vivo and in vitro experiments show that pppGpp is less potent than ppGpp with respect to regulation of growth rate, RNA/DNA ratios, ribosomal RNA P1 promoter transcription inhibition, threonine operon promoter activation and RpoS induction. To provide further insights into regulation by (p)ppGpp, we have also determined crystal structures of E. coli RNA polymerase-σ70 holoenzyme with ppGpp and pppGpp. We find that both nucleotides bind to a site at the interface between β′ and ω subunits.
BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research.
The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization.
We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies - from source provider, through middleware, to the end-consumer.
BioHackathon; Open source; Software; Semantic Web; Databases; Data integration; Data visualization; Web services; Interfaces
Proteins interact with other proteins or biomolecules in complexes to perform cellular functions. Existing protein-protein interaction (PPI) databases and protein complex databases for human proteins are not organized to provide protein complex information or facilitate the discovery of novel subunits. Data integration of PPIs focused specifically on protein complexes, subunits, and their functions is therefore needed. Predicted candidate complexes or subunits are also important for experimental biologists.
Based on integrated PPI data and literature, we have developed a human protein complex database with a complex quality index (PCDq), which includes both known and predicted complexes and subunits. We integrated six PPI data sets (BIND, DIP, MINT, HPRD, IntAct, and GNP_Y2H), and predicted human protein complexes by finding densely connected regions in the PPI networks. They were curated with the literature so that missing proteins were complemented and some complexes were merged, resulting in 1,264 complexes comprising 9,268 proteins with 32,198 PPIs. The evidence level of each subunit was assigned as a categorical variable. This variable indicated whether the protein was a known subunit and whether a specific function was inferable from sequence or network analysis. To summarize the categories of all the subunits in a complex, we devised a complex quality index (CQI) and assigned it to each complex. We examined the proportion of consistency of Gene Ontology (GO) terms among protein subunits of a complex. Next, we compared the expression profiles of the corresponding genes and found that many proteins in larger complexes tend to be expressed cooperatively at the transcript level. The proportion of duplicated genes in a complex was evaluated. Finally, we identified 78 hypothetical proteins that were annotated as subunits of 82 complexes, which included known complexes. Of these hypothetical proteins, after our prediction had been made, four were reported to be actual subunits of the assigned protein complexes.
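The idea of finding densely connected regions in a PPI network can be illustrated with a minimal sketch. The abstract does not name the algorithm that was used, so the k-core peeling below is only a stand-in technique, and the protein names and interactions are hypothetical toy data.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the nodes of the k-core: the largest subgraph in which
    every node has at least k neighbours inside the subgraph."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    nodes = set(adj)
    changed = True
    while changed:
        changed = False
        for n in list(nodes):
            # Peel off nodes with too few remaining neighbours.
            if len(adj[n] & nodes) < k:
                nodes.discard(n)
                changed = True
    return nodes


def density(edges, nodes):
    """Fraction of all possible node pairs in `nodes` that interact."""
    n = len(nodes)
    if n < 2:
        return 0.0
    present = {
        frozenset((a, b))
        for a, b in edges
        if a != b and a in nodes and b in nodes
    }
    return len(present) / (n * (n - 1) / 2)


# Hypothetical toy interactions (not taken from any of the six PPI sources).
toy_ppis = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
    ("D", "E"), ("E", "F"),
]
candidate_complex = k_core(toy_ppis, 3)
candidate_density = density(toy_ppis, candidate_complex)
```

In this toy network the loosely attached proteins E and F are peeled away, leaving a fully connected four-protein candidate; in practice such candidates would still need the literature curation described above.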
We constructed a new protein complex database PCDq including both predicted and curated human protein complexes. CQI is a useful source of experimentally confirmed information about protein complexes and subunits. The predicted protein complexes can provide functional clues about hypothetical proteins. PCDq is freely available at http://h-invitational.jp/hinv/pcdq/.
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 of these 19 309 genes were assigned protein functions in this version of H-InvDB; they had been annotated as having unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. These findings suggest that many biologically functional genes are hidden among the H-InvDB-specific genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and a new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
The nucleotidyl transfer reaction leading to formation of the first phosphodiester bond has been followed in real time by Raman microscopy as it proceeds in single crystals of the N4 phage virion RNA polymerase (RNAP). The reaction is initiated by soaking NTP substrates and divalent cations into the RNAP and promoter DNA complex crystal, where the phosphodiester bond formation is completed in about 40 minutes. This slow reaction allowed us to monitor the changes of RNAP and DNA conformations as well as binding of substrate and metal through Raman spectra taken every 5 minutes. Recently published snapshot X-ray crystal structures along the same reaction pathway assisted the spectroscopic assignments of changes in the enzyme and DNA, while isotopically labeled NTP substrates allowed differentiation of the Raman spectra of bases in substrates and DNA. We observed that substrates are bound at 2-7 minutes after commencing soaking, that the O-helix completes its conformational change, and that binding of both divalent metals required for catalysis in the active site changes the conformation of the ribose triphosphate at position +1. These are followed by a slower decrease of NTP triphosphate groups due to phosphodiester bond formation that reaches completion at about 15 minutes, and even slower complete release of the divalent metals at about 40 minutes. We have also shown that the O-helix movement can be driven by substrate binding only. The kinetics of the in crystallo nucleotidyl transfer reaction revealed in this study suggest that soaking the substrate and metal into the RNAP-DNA complex crystal for a few minutes generates novel and uncharacterized intermediates for future X-ray and spectroscopic analysis.
Results are presented supporting a regulatory role for the product of the MA3302 gene locus (designated MreA) previously annotated as a hypothetical protein in the methanogenic species Methanosarcina acetivorans of the domain Archaea. Sequence analysis of MreA revealed identity to the TrmB family of transcription factors, albeit the sequence is lacking the sensor domain analogous to TrmBL2, abundant in nonmethanogenic species of the domain Archaea. Transcription of mreA was highly upregulated during growth on acetate versus methylotrophic substrates, and an mreA deletion (ΔmreA) strain was impaired for growth with acetate in contrast to normal growth with methylotrophic substrates. Transcriptional profiling of acetate-grown cells identified 280 genes with altered expression in the ΔmreA strain versus the wild-type strain. Expression of genes unique to the acetate pathway decreased whereas expression of genes unique to methylotrophic metabolism increased in the ΔmreA strain relative to the wild type, results indicative of a dual role for MreA in either the direct or indirect activation of acetate-specific genes and repression of methylotrophic-specific genes. Gel shift experiments revealed specific binding of MreA to promoter regions of regulated genes. Homologs of MreA were identified in M. acetivorans and other Methanosarcina species for which expression patterns indicate roles in regulating methylotrophic pathways.
Species in the domain Archaea utilize basal transcription machinery resembling that of the domain Eukarya, raising questions addressing the role of numerous putative transcription factors identified in sequenced archaeal genomes. Species in the genus Methanosarcina are ideally suited for investigating principles of archaeal transcription through analysis of the capacity to utilize a diversity of substrates for growth and methanogenesis. Methanosarcina species switch pathways in response to the most energetically favorable substrate, metabolizing methylotrophic substrates in preference to acetate marked by substantial regulation of gene expression. Although conversion of the methyl group of acetate accounts for most of the methane produced in Earth’s biosphere, no proteins involved in the regulation of genes in the acetate pathway have been reported. The results presented here establish that MreA participates in the global regulation of diverse methanogenic pathways in the genus Methanosarcina. Finally, the results contribute to a broader understanding of transcriptional regulation in the domain Archaea.
To elucidate the mechanism of transcription by cellular RNA polymerases (RNAPs), high-resolution X-ray crystal structures together with structure-guided biochemical, biophysical and genetic studies are essential. The recently solved X-ray crystal structures of archaeal RNAP allow a structural comparison of the transcription machinery among all three domains of life. The archaea were once thought to be closely related to bacteria, but they are now considered to be more closely related to eukaryotes than to bacteria at the molecular level. According to these structures, the archaeal transcription apparatus, which includes RNAP and general transcription factors, is similar to the eukaryotic transcription machinery. Yet the transcription regulators (activators and repressors) encoded by archaeal genomes are closely related to bacterial factors. Therefore, archaeal transcription appears to be an intriguing hybrid of a eukaryotic-type transcription apparatus and bacterial-like regulatory mechanisms. Elucidating the transcription mechanism in archaea, which combine bacterial and eukaryotic transcription mechanisms that are commonly regarded as separate and mutually exclusive, can provide data that will clarify basic transcription mechanisms across all three domains of life.
The roles of three TATA binding protein (TBP) homologs (TBP1, TBP2, and TBP3) in the archaeon Methanosarcina acetivorans were investigated by using genetic and molecular approaches. Although tbp2 and tbp3 deletion mutants were readily obtained, a tbp1 mutant was not obtained, and the growth of a conditional tbp1 expression strain was tetracycline dependent, indicating that TBP1 is essential. Transcripts of tbp1 were 20-fold more abundant than transcripts of tbp2 and 100- to 200-fold more abundant than transcripts of tbp3, suggesting that TBP1 is the primary TBP utilized during growth. Accordingly, tbp1 is strictly conserved in the genomes of Methanosarcina species. Δtbp3 and Δtbp2 strains exhibited an extended lag phase compared with the wild type, although the lag phase for the Δtbp2 strain was less pronounced when this strain was transitioning from growth on methylotrophic substrates to growth on acetate. Acetate-adapted Δtbp3 cells exhibited growth rates, final growth yields, and lag times that were significantly reduced compared with those of the wild type when the organisms were cultured with growth-limiting concentrations of acetate, and the acetate-adapted Δtbp2 strain exhibited a final growth yield that was reduced compared with that of the wild type when the organisms were cultured with growth-limiting acetate concentrations. DNA microarray analyses identified 92 and 77 genes with altered transcription in the Δtbp2 and Δtbp3 strains, respectively, which is consistent with a role for TBP2 and TBP3 in optimizing gene expression. Together, the results suggest that TBP2 and TBP3 are required for efficient growth under conditions similar to the conditions in the native environment of M. acetivorans.
The recently solved X-ray crystal structures of archaeal RNA polymerase allow a structural comparison of the transcription machinery among all three domains of life. Archaeal transcription is very simple and all components, including the structures of general transcription factors and RNA polymerase, are highly conserved in eukaryotes. Therefore, it could be a new model for dissection of the eukaryotic transcription apparatus. The archaeal RNA polymerase structure also provides a framework for addressing the functional role that Fe–S clusters play within the transcription machinery of archaea and eukaryotes. A comparison between bacterial and archaeal open complex models reveals likely key motifs of archaeal RNA polymerase for DNA unwinding during the open complex formation.
The transcription apparatus in Archaea can be described as a simplified version of its eukaryotic RNA polymerase (RNAP) II counterpart, comprising an RNAPII-like enzyme as well as two general transcription factors, the TATA-binding protein (TBP) and the eukaryotic TFIIB ortholog TFB [1,2]. It has been widely understood that precise comparisons among cellular RNAP crystal structures could reveal structural elements common to all enzymes and that these insights would be useful to analyze components of each enzyme that enable it to perform domain-specific gene expression. However, the structure of archaeal RNAP has been limited to individual subunits [3,4]. Here, we report the first crystal structure of the archaeal RNAP from Sulfolobus solfataricus at 3.4 Å resolution, completing the suite of multi-subunit RNAP structures from all three domains of life. We also report the high-resolution (1.76 Å) crystal structure of the D/L subcomplex of archaeal RNAP and provide the first experimental evidence of any RNAP possessing an iron-sulfur (Fe-S) cluster, which may play a structural role in a key subunit of RNAP assembly. The striking structural similarity between archaeal RNAP and eukaryotic RNAPII highlights the simpler archaeal RNAP as an ideal model system for dissecting the molecular basis of eukaryotic transcription.
Coliphage N4 virion-encapsidated RNA polymerase (vRNAP) is a member of the phage T7-like single-subunit RNA polymerase (RNAP) family. Its central domain (mini-vRNAP) contains all RNAP functions of the full-length vRNAP, which recognizes a five- to seven-base pair stem and three-nucleotide loop hairpin DNA promoter. Here we report the X-ray crystal structures of mini-vRNAP bound to promoters. Mini-vRNAP uses four structural motifs to recognize DNA sequences at the hairpin loop and stem, and to unwind DNA. Despite their low sequence similarity, three out of four motifs are shared with T7 RNAP that recognizes a double-stranded DNA promoter. The binary complex structure and results of engineered disulfide-linkage experiments reveal that the plug and motif B loop, which block the access of template DNA to the active site in the apo-form mini-vRNAP, undergo a large-scale conformational change upon promoter binding, explaining the restricted promoter specificity that is critical for N4 phage early transcription.
We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource of human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides the annotation for 219 765 human transcripts in 43 159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources—‘Navigation search’, ‘H-InvDB Enrichment Analysis Tool (HEAT)’ and web service APIs. ‘Navigation search’ is an extended search system that enables complicated searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire H-InvDB representative transcripts. H-InvDB now has web service APIs of SOAP and REST to allow the use of H-InvDB data in programs, providing the users extended data accessibility.
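The kind of enrichment question HEAT addresses (is an annotation over-represented in a user-defined gene set relative to the full background?) can be sketched as follows. The abstract does not state which statistical test HEAT uses; the one-sided hypergeometric test here is a common choice and is an assumption, as are the function name and the toy numbers.

```python
from math import comb


def hypergeom_enrichment_p(background_size, background_hits, set_size, set_hits):
    """Upper-tail hypergeometric p-value P(X >= set_hits).

    background_size: number of genes in the background
    background_hits: background genes carrying the annotation
    set_size:        number of genes in the user-defined set
    set_hits:        genes in the set carrying the annotation
    """
    upper = min(set_size, background_hits)
    total = comb(background_size, set_size)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, set_size - k)
        for k in range(set_hits, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 3 of 10 background genes carry an annotation,
# and all 3 genes in the query set carry it.
p_value = hypergeom_enrichment_p(10, 3, 3, 3)
```

When many annotations are tested against one gene set, the resulting p-values would normally also need a multiple-testing correction, which is omitted here.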
Summary: G-compass is designed for efficient comparative genome analysis between human and other vertebrate genomes. The current version of G-compass allows us to browse two corresponding genomic regions between human and another species in parallel. One-to-one evolutionarily conserved regions (i.e. orthologous regions) between species are highlighted along the genomes. Information such as locations of duplicated regions, copy number variations and mammalian ultra-conserved elements is also provided. These features of G-compass enable us to easily determine patterns of genomic rearrangements and changes in gene orders through evolutionary time. Since G-compass is a satellite database of H-InvDB, which is a comprehensive annotation resource for human genes and transcripts, users can easily refer to manually curated functional annotations and other abundant biological information for each human transcript. G-compass is expected to be a valuable tool for comparing human and model organisms and promoting the exchange of functional information.
Availability: G-compass is freely available at http://www.h-invitational.jp/g-compass/.
Recent structures of Escherichia coli catabolite activator protein (CAP) in complex with DNA, and in complex with RNA polymerase α subunit C-terminal domain (αCTD) and DNA, have yielded insights into how CAP binds DNA and activates transcription. Comparison of multiple structures of CAP-DNA complexes has revealed contributions of direct readout and indirect readout to DNA binding by CAP. The structure of the CAP-αCTD-DNA complex has provided the first structural description of interactions between a transcription activator and its functional target within the general transcription machinery. Using the structure of the CAP-αCTD-DNA complex, the structure of an RNAP-DNA complex, and restraints from biophysical, biochemical, and genetic experiments, it has been possible to construct detailed three-dimensional models of intact Class I and Class II transcription activation complexes.
catabolite activator protein (CAP); cAMP receptor protein (CRP); RNA polymerase; σ70; promoter; DNA binding; DNA bending; transcription activation
It is essential in modern biology to understand how transcriptional regulatory regions are composed of cis-elements, yet we have limited knowledge of, for example, the combinational uses of these elements and their positional distribution.
We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions within -2000 and +1000 bp of transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. In the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 Class 2 pairs appear at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor the Class 1 pairs are more likely to be CpG-rich and to be expressed ubiquitously than those that harbor Class 2 pairs. Third, the 'hub' motifs, which are used in many different motif pairs, are different between the two classes. In addition, many of the transcription factors that correspond to the Class 2 hub motifs contain domains rich in specific amino acids; these domains may form disordered regions important for protein-protein interaction.
There exist at least two classes of motif pairs with respect to TSSs in human promoters, possibly reflecting compositional differences between promoters and enhancers. We anticipate that our visualization method may be useful for the further characterisation of promoters.
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Currently, with the rapid growth of transcriptome data of various species, more reliable orthology information is a prerequisite for further studies. However, detection of orthologs could be erroneous if pairwise distance-based methods, such as reciprocal BLAST searches, are utilized. Thus, as a sub-database of H-InvDB, an integrated database of annotated human genes (http://h-invitational.jp/), we constructed a fully curated database of evolutionary features of human genes, called ‘Evola’. In the process of the ortholog detection, computational analysis based on conserved genome synteny and transcript sequence similarity was followed by manual curation by researchers examining phylogenetic trees. In total, 18 968 human genes have orthologs, either computationally detected or manually curated, among 11 vertebrates (chimpanzee, mouse, cow, chicken, zebrafish, etc.). Evola provides amino acid sequence alignments and phylogenetic trees of orthologs and homologs. In ‘dN/dS view’, natural selection on genes can be analyzed between human and other species. In ‘Locus maps’, all transcript variants and their exon/intron structures can be compared among orthologous gene loci. We expect Evola to serve as a comprehensive and reliable database to be utilized in comparative analyses for obtaining new knowledge about human genes. Evola is available at http://www.h-invitational.jp/evola/.
Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. It is also unclear which TFs have TFBS concentrated in promoters and to what extent. This study aims to answer some of these questions.
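How a PWM built from collected binding sites can be used to predict TFBS may be sketched as below. This is a generic log-odds PWM scan and not the specific software or matrices used in the study; the pseudocount, the uniform background frequency, the score threshold, and the toy sites are all assumptions made for illustration.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from equal-length binding sites (A/C/G/T only)."""
    length = len(sites[0])
    pwm = []
    for i in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        # Log-odds of the observed base frequency against the background.
        pwm.append({b: log2((counts[b] / total) / background) for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical toy binding sites and sequence.
toy_sites = ["TATA", "TATA", "TATT", "TACA"]
toy_pwm = build_pwm(toy_sites)
toy_hits = scan("GGTATAGG", toy_pwm, threshold=3.0)
```

Lowering the threshold quickly increases the number of reported windows, which illustrates the large-output problem described above for genomic DNA.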
We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) against the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups including those strongly correlated with CGI and those not correlated with CGI.
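The partial correlation step can be made concrete with the standard first-order formula, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)), which measures the correlation between x and y after removing the linear effect of z. The sketch below applies it to three variables; mapping x, y and z to promoter, TFBS cluster and CGI is only illustrative, and the numeric values are hypothetical toy data rather than anything from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy values per sequence: y is driven by both x and z.
promoter = [0, 1, 0, 1]   # x
cgi = [0, 0, 1, 1]        # z
cluster = [0, 1, 1, 2]    # y = x + z

raw_r = pearson(promoter, cluster)
partial_r = partial_corr(promoter, cluster, cgi)
```

In this toy case the raw correlation between `promoter` and `cluster` is diluted by the CGI contribution, while the partial correlation recovers the full association once CGI is held fixed.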
Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
promoter; tissue-specific gene expression; position weight matrix; regulatory motif
During quorum sensing in Vibrio fischeri, the luminescence, or lux, operon is regulated in a cell density-dependent manner by the activator LuxR in the presence of an acylated homoserine lactone autoinducer molecule [N-(3-oxohexanoyl) homoserine lactone]. LuxR, which binds to the lux operon promoter at a position centered at −42.5 relative to the transcription initiation site, is thought to function as an ambidextrous activator making multiple contacts with RNA polymerase (RNAP). The specific role of the α-subunit C-terminal domain (αCTD) of RNAP in LuxR-dependent transcriptional activation of the lux operon promoter has been investigated. The effects of 70 alanine substitution variants of the α subunit were determined in vivo by measuring the rate of transcription of the lux operon via luciferase assays in recombinant Escherichia coli. The mutant RNAPs from strains exhibiting at least twofold-increased or -decreased activity in comparison to the wild type were further examined by in vitro assays. Since full-length LuxR has not been purified, an autoinducer-independent N-terminally truncated form of LuxR, LuxRΔN, was used for in vitro studies. Single-round transcription assays were performed using reconstituted mutant RNAPs in the presence of LuxRΔN, and 14 alanine substitutions in the αCTD were identified as having negative effects on the rate of transcription from the lux operon promoter. Five of these 14 α variants were also involved in the mechanisms of both LuxR- and LuxRΔN-dependent activation in vivo. The positions of these residues lie roughly within the 265 and 287 determinants in α that have been identified through studies of the cyclic AMP receptor protein and its interactions with RNAP. This suggests a model where residues 262, 265, and 296 in α play roles in DNA recognition and residues 290 and 314 play roles in α-LuxR interactions at the lux operon promoter during quorum sensing.
The icd gene of Escherichia coli, encoding isocitrate dehydrogenase, was shown to be expressed from two different promoters: the previously identified icd P1 and a newly detected second promoter, icd P2, whose expression is positively regulated by the catabolite repressor-activator protein Cra, formerly called FruR. In each case, we determined the mRNA start site by primer extension analysis of in vivo transcripts and examined the interaction of the icd control region with either RNA polymerase or Cra. We observed that (i) the Cra factor binds to and activates transcription from a site centered at position −76.5 within the icd P2 promoter region and (ii) three particular mutations in the C-terminal end of the α subunit of RNA polymerase (L262A, R265A, and N268A) considerably diminish transcription initiating from the icd P2 promoter, as shown by in vitro experiments performed in the presence of mutant RNA polymerases carrying Ala substitutions.