Electron transfer enzymes and associated pathways in the membrane
Previously, curation of electron transfer reactions in EcoCyc was limited to brief written summaries of the gene products and protein complexes. This approach did not provide for a visual representation of the electron transfer enzymes in the membrane, nor did it indicate known or potential roles in cellular electron transfer and proton movement relative to the cell compartments. To address these issues, we have extended the Pathway Tools software that underlies EcoCyc in two respects: First, it can now visually depict electron transfer enzyme complexes and their associated balanced oxidation/reduction reactions (). Reaction displays now show enzyme membrane localization, the flow of all substrates and products, and the fate of the protons associated with the overall reactions. Second, the software can now depict electron transfer pathways that consist of coupled systems of electron transfer enzymes ().
E. coli possesses more than 25 enzymes and enzyme complexes that participate in the oxidation of primary electron donors or in the reduction of terminal electron acceptors under different cell culture conditions. The literature-based curation for approximately 15 electron transfer enzymes and enzyme complexes has been updated, and the associated membrane depictions and balanced reactions are available. Electron transfer pathways have been generated and curated for 10 sets of electron donor/acceptor pairs.
An example of a membrane depiction is shown in for the E. coli enzyme NADH dehydrogenase I, encoded by the nuoABCDEFGHIJKLMN operon. Herein, the oxidation of NADH is shown to occur at the cytoplasmic face of the enzyme with electron transfer within the enzyme to the physiological electron acceptors, ubiquinone (UQ) or menaquinone (MQ).
Combining the oxidation reactions for a physiological electron donor and an acceptor yields an electron transport pathway. For example, the NADH dehydrogenase I enzyme is combined with cytochrome bo oxidase (cyoABCD) to represent the transfer of electrons from NADH to molecular oxygen (O2). Net movement of protons across the membrane by each enzyme complex provides, in part, the proton motive force (PMF) needed for ATP synthesis.
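The bookkeeping behind such a coupled pathway can be sketched in a few lines of Python. This is only an illustration: the dictionary representation and function names are hypothetical, and the stoichiometries (4 H+ translocated per NADH by NADH dehydrogenase I, 8 H+ per O2 by cytochrome bo oxidase) are commonly cited textbook values assumed here. They are not taken from the text above or from the Pathway Tools implementation.

```python
from collections import Counter
from math import gcd

# Each reaction maps species -> net coefficient (negative = consumed,
# positive = produced). [c] = cytoplasm, [p] = periplasm.
# ASSUMPTION: stoichiometries are textbook values, not from the article.
NADH_DEHYDROGENASE_I = {
    "NADH[c]": -1, "H+[c]": -5, "UQ": -1,
    "NAD+[c]": 1, "UQH2": 1, "H+[p]": 4,
}
CYTOCHROME_BO_OXIDASE = {
    "UQH2": -2, "O2": -1, "H+[c]": -8,
    "UQ": 2, "H2O": 2, "H+[p]": 8,
}


def coupling_multipliers(donor_rxn, acceptor_rxn, carrier):
    """Smallest integer multipliers for which the carrier produced by the
    donor reaction equals the carrier consumed by the acceptor reaction."""
    produced = donor_rxn[carrier]
    consumed = -acceptor_rxn[carrier]
    common = produced * consumed // gcd(produced, consumed)
    return common // produced, common // consumed


def combine(scaled_reactions):
    """Sum (reaction, multiplier) pairs; drop species that cancel out."""
    net = Counter()
    for reaction, multiplier in scaled_reactions:
        for species, coeff in reaction.items():
            net[species] += coeff * multiplier
    return {s: c for s, c in net.items() if c != 0}


donor_mult, acceptor_mult = coupling_multipliers(
    NADH_DEHYDROGENASE_I, CYTOCHROME_BO_OXIDASE, carrier="UQH2"
)
net_pathway = combine([
    (NADH_DEHYDROGENASE_I, donor_mult),
    (CYTOCHROME_BO_OXIDASE, acceptor_mult),
])
print(net_pathway)
```

Under these assumptions the quinone pool cancels, leaving a net reaction of 2 NADH + O2 + 18 H+ (cytoplasm) yielding 2 NAD+ + 2 H2O + 16 H+ (periplasm). The 16 periplasmic protons are the pathway's contribution to the PMF.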
Updates to regulation of transcription initiation
Curation of transcriptional regulation is performed by the RegulonDB group at the Center for Genomic Sciences, Universidad Nacional Autónoma de México. Curation of older literature on transcriptional regulation was completed in December 2006; since then, data from new literature have been added to EcoCyc shortly after publication.
After reports of differences and apparent inconsistencies between the transcriptional regulatory networks of EcoCyc and RegulonDB appeared (3, 4), we undertook detailed curation that led to fully synchronized content and releases in both databases (5). Other systematic curation efforts included the sigmulons of σ54 (RpoN), σ28 (FliA), σ19 (FecI), σ24 (RpoE), σ32 (RpoH), and σ38 (RpoS); various metabolic and motility regulons; and representations of the binding sites for the ArcA and NarL transcription factors. In addition, we have developed guidelines for transcription factor summaries to include relevant physiological data found in the literature that cannot be easily added as database objects. Many summaries have been updated according to these guidelines.
To facilitate the tracking and querying of data based on the quality of the evidence, we have classified the types of evidence used to annotate regulatory objects as ‘strong’ or ‘weak’. Strong evidence corresponds to experiments—irrespective of methodology—that provide direct physical evidence. Examples of strong evidence include the experimental mapping of transcription start sites and DNA binding of purified transcription factors. Evidence such as that from gene expression analyses that provide only indirect evidence is considered weak. Strong and weak evidence types are graphically distinguished by using solid or dashed lines for the corresponding objects (such as promoter arrows).
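The strong/weak scheme can be illustrated with a minimal Python sketch. The evidence labels, function names, and the rule that a single strong evidence type is enough to mark an object as strong are all assumptions made for illustration. The actual evidence codes and display logic in EcoCyc are not reproduced here.

```python
# Hypothetical evidence labels modeled on the examples in the text.
STRONG_EVIDENCE = {
    "transcription start site mapping",
    "binding of purified transcription factor",
}
WEAK_EVIDENCE = {
    "gene expression analysis",
}


def evidence_class(evidence_types):
    """Return 'strong' if any evidence type is direct physical evidence,
    otherwise 'weak'. ASSUMPTION: one strong type suffices, and unknown
    labels are treated as weak."""
    if any(e in STRONG_EVIDENCE for e in evidence_types):
        return "strong"
    return "weak"


def line_style(evidence_types):
    """Map the evidence class to the line style used when drawing the object."""
    return "solid" if evidence_class(evidence_types) == "strong" else "dashed"


print(line_style(["gene expression analysis"]))
print(line_style(["gene expression analysis", "transcription start site mapping"]))
```

With this kind of classification stored alongside each regulatory object, a query can keep only objects whose evidence class is strong.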
To expand the information about transcription regulation of E. coli, the RegulonDB group has incorporated various new types of experimental and predicted data into EcoCyc. A collection of 259 new transcription start sites, which resulted from a high-throughput experimental modified RACE approach, was added (6). Promoters and DNA binding sites with evidence from at least two types of high-throughput data (such as computational predictions, microarrays and ChIP-chip experiments) have been added to EcoCyc. Examples include a collection of 54 σ32 promoters experimentally identified by ChIP-chip and by gene expression assays (7); 45 σ32 promoters identified by microarray analysis, transcription initiation mapping and computational analysis (8); and 45 Fur DNA binding sites identified by computational prediction and binding of purified protein (9).
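The "at least two types of high-throughput data" criterion amounts to a simple filter, sketched below in Python. The candidate names, the evidence labels, and the data layout are hypothetical and serve only to make the rule concrete.

```python
# Hypothetical labels for the high-throughput data types named in the text.
HIGH_THROUGHPUT_TYPES = {"computational prediction", "microarray", "ChIP-chip"}


def has_enough_support(evidence_types, minimum=2):
    """True if the object is supported by at least `minimum` distinct
    high-throughput data types (duplicates are counted once)."""
    return len(set(evidence_types) & HIGH_THROUGHPUT_TYPES) >= minimum


# Invented candidates, for illustration only.
candidates = {
    "promoter_A": ["ChIP-chip", "microarray"],
    "promoter_B": ["microarray", "microarray"],
    "site_C": ["computational prediction"],
}

accepted = sorted(name for name, ev in candidates.items() if has_enough_support(ev))
print(accepted)
```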
Beyond regulation of transcription initiation
EcoCyc has included information about the regulation of both transcription initiation and enzyme activity for many years. A major new EcoCyc initiative is to expand the database schema and content to include other types of regulation, such as attenuation and regulation of translation by small RNAs (sRNAs). For example, the EcoCyc schema can now represent all six known types of regulation by attenuation of transcription, each of which involves slightly different database fields to capture aspects such as the regulatory ligand, protein and RNA regions involved. This initiative will provide both more complete information about E. coli regulation and the regulatory datasets that can be used by bioinformaticians to develop predictors for a broader diversity of regulatory interactions from genome datasets.
All known examples of ribosome-mediated attenuation in the pathways of amino acid biosynthesis have been added to EcoCyc in release 12.5. For example, shows regulation of the thrLABC operon by attenuation, which is modulated by the availability of charged isoleucyl- and threonyl-tRNA. In this example of attenuation, translation of the thrL leader peptide open reading frame influences the formation of an attenuator structure. When charged isoleucyl- and threonyl-tRNAs are abundant, unobstructed translation by the ribosome enables the formation of a secondary structure that acts as a terminator, releasing RNA polymerase and halting transcription of the operon. On the EcoCyc display, the charged tRNAs are represented as rods. Their role in modulating termination at the attenuator is indicated by their red color and the ‘X’ near the terminator structure; this shows at a glance that a charged tRNA leads to premature termination. Curation of other attenuation systems is ongoing.
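The logic of the thrLABC example can be written as a small Boolean model in Python. This is a deliberate simplification: real attenuation is graded rather than on/off, and the function names are hypothetical. The sketch encodes only the rule stated above, that abundant charged isoleucyl- and threonyl-tRNAs allow the terminator to form.

```python
def terminator_forms(charged_ile_trna_abundant: bool,
                     charged_thr_trna_abundant: bool) -> bool:
    """The terminator forms only when the ribosome translates the thrL
    leader peptide unobstructed, i.e. when both charged tRNAs are abundant."""
    return charged_ile_trna_abundant and charged_thr_trna_abundant


def operon_transcribed(charged_ile_trna_abundant: bool,
                       charged_thr_trna_abundant: bool) -> bool:
    """Transcription continues into thrABC unless the terminator forms."""
    return not terminator_forms(charged_ile_trna_abundant,
                                charged_thr_trna_abundant)


# Enumerate the four combinations of tRNA availability.
for ile in (True, False):
    for thr in (True, False):
        print(ile, thr, operon_transcribed(ile, thr))
```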
An example of the representation of regulation by sRNAs is shown in . The transcription unit that encompasses the glmUS operon is shown. Expression of this operon is regulated at the level of transcription initiation by the transcription factor NagC (10), whose binding sites are shown as green boxes upstream of the glmUS transcription start site. In addition, the sRNA GlmZ was recently shown to regulate translation of the second open reading frame, glmS (11, 12). glmS encodes L-glutamine:D-fructose-6-phosphate aminotransferase, the enzyme that catalyzes the first step in the biosynthesis of UDP-N-acetylglucosamine, which is used as the precursor for the synthesis of peptidoglycan, lipid A and the enterobacterial common antigen. Genetic experiments suggest that full-length GlmZ interacts directly with the 5′ UTR of glmS, unmasking the ribosome binding site and thus activating translation (11, 12). The interaction of GlmZ with the glmUS mRNA is shown by a bar (representing GlmZ) that is connected with lines to glmUS, suggesting base-pairing at the position indicated.
The 12.5 release of EcoCyc contains 19 examples of attenuation and 15 examples of regulation by mechanisms other than transcription initiation, attenuation, or regulation of enzyme activity. We are actively expanding both the curation of the preceding regulatory mechanisms and the ability of the Pathway Tools software to handle additional regulatory mechanisms.
Annotation of EcoCyc gene products with Gene Ontology terms
Gene Ontology (GO) is an accepted standard for ontological annotation of gene products (www.GeneOntology.org). The EcoCyc project has been annotating E. coli genes with GO terms for the past two years. Overall, the more than 38 000 GO terms present in EcoCyc have been derived from four sources: (i) GO terms were inferred from a mapping from the original MultiFun (13) ontology annotations within EcoCyc to GO terms; (ii) GO terms were inferred from a mapping from the Enzyme Commission (EC) numbers present within EcoCyc to GO terms; (iii) GO term assignments are manually curated by EcoCyc curators on an ongoing basis; and (iv) many GO terms were imported into EcoCyc from UniProt. EcoCyc and the EcoliWiki project (www.EcoliWiki.net) are jointly producing an official data file of E. coli GO terms that we regularly submit to the GO project, and that is available from the GO Web site at http://www.geneontology.org/GO.current.annotations.shtml.
GO terms are found on EcoCyc gene and gene product pages and provide a useful way of finding all E. coli genes with a common function. For example, rsmD encodes an rRNA methyltransferase and is annotated with the GO process term for rRNA methylation, GO:0031167. Clicking that GO term navigates the user to a page that both provides the definition of that GO term and lists all other gene products within EcoCyc that have been annotated with that GO term. The GO term annotations within EcoCyc should be considered incomplete, as manual curation of GO terms is ongoing.
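Listing all gene products that share a GO term is essentially an inverted-index lookup, as the Python sketch below illustrates. Only rsmD and GO:0031167 come from the text. The other gene names and term identifiers are placeholders invented for the example, and the data structure is an assumption rather than a description of how EcoCyc stores annotations.

```python
from collections import defaultdict

# rsmD / GO:0031167 (rRNA methylation) are from the text; "geneX", "geneY"
# and the GO:PLACEHOLDER_* identifiers are invented for illustration.
ANNOTATIONS = {
    "rsmD": {"GO:0031167"},
    "geneX": {"GO:0031167", "GO:PLACEHOLDER_A"},
    "geneY": {"GO:PLACEHOLDER_B"},
}


def index_by_term(annotations):
    """Invert gene -> GO terms into GO term -> set of genes."""
    index = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            index[term].add(gene)
    return index


genes_by_term = index_by_term(ANNOTATIONS)
print(sorted(genes_by_term["GO:0031167"]))  # ['geneX', 'rsmD']
```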
Updates to metabolic pathways
EcoCyc began as a database of E. coli metabolism, primarily describing metabolic enzymes and pathways, and has since expanded far beyond that initial role. As a result, annotations for many metabolic enzymes are among the oldest entries in EcoCyc. During the past decade, significant progress has been made in understanding E. coli metabolic pathways and their enzymes. We have therefore begun to systematically re-annotate these pathways; in release 12.5, 41 pathways that were entered into EcoCyc more than ten years ago, as well as 19 more recently added pathways, have been updated. As part of this effort, the curation of more than 180 metabolic enzymes has already been updated to reflect the latest state of knowledge.