A number of enhancements have been made to EcoCyc in the last three years (3). Our efforts to curate protein function based on the published literature have been expanded considerably. We are striving for complete coverage of all genes, proteins and RNAs in the E.coli genome by comments summarizing structural, functional and regulatory information from the scientific literature. For example, the number of proteins covered by comments and functional annotations has more than tripled since 2002 to nearly 3500 in Version 8.5 of EcoCyc (see Table and http://BioCyc.org/ecocyc/release-notes.shtml). Recent literature providing important insights into the properties of a gene or protein is incorporated into EcoCyc as quickly as possible, and literature references are in most cases provided as hyperlinks to the PubMed record.
Summary of data content in EcoCyc (Version 8.5)
An important recent addition to the information provided by EcoCyc is a set of icons that allow the user to evaluate the evidence underlying the curator's annotations (see below); these are provided in addition to hyperlinks to the pertinent primary literature. The icons indicate whether the information is based on experimental evidence, computational evidence or human inference. Clicking on an icon displays a more detailed description of the evidence, drawn from an evidence ontology (4), describing, for example, whether a transcription start site was mapped by primer extension.
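The relationship between the three icon categories and the more detailed evidence descriptions can be illustrated with a small sketch. This is not the EcoCyc evidence ontology itself: the class name `EvidenceTerm`, the function `icon_for` and all term identifiers below are hypothetical and chosen only for illustration. The sketch assumes that each detailed term has a single parent and that the icon shown corresponds to the top-level category.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvidenceTerm:
    """One node in a toy evidence hierarchy (hypothetical, not EcoCyc's schema)."""
    code: str                      # illustrative identifier, not a real EcoCyc code
    description: str
    parent: Optional["EvidenceTerm"] = None

    def top_level(self) -> "EvidenceTerm":
        """Walk up the parent links until the root category is reached."""
        term = self
        while term.parent is not None:
            term = term.parent
        return term


# The three top-level categories named in the text.
EXPERIMENTAL = EvidenceTerm("experimental", "Based on experimental evidence")
COMPUTATIONAL = EvidenceTerm("computational", "Based on computational evidence")
HUMAN_INFERENCE = EvidenceTerm("human-inference", "Based on human inference")

# A more detailed term, following the example given in the text.
PRIMER_EXTENSION = EvidenceTerm(
    "primer-extension",
    "Transcription start site mapped by primer extension",
    parent=EXPERIMENTAL,
)


def icon_for(term: EvidenceTerm) -> str:
    """Return the category whose icon would be shown for a given term."""
    return term.top_level().code


print(icon_for(PRIMER_EXTENSION))
```

Under these assumptions, a detailed term carries the specific description shown after clicking, while its root category determines which icon appears next to the annotation.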
We are continuously incorporating new information, such as the recently released version of the E.coli K12 MG1655 genome sequence (GenBank accession number U00096, version U00096.2, GI:48994873; June 24, 2004), which corrects a significant number of sequencing errors present in the original genome sequence released in 1997 (5). In addition, extensive links to other databases such as Swiss-Prot and RefSeq have recently been added or updated. We also continue to curate the EcoCyc description of the E.coli metabolic network to reflect newly discovered pathways and enzymes. The EcoCyc network of small-molecule metabolism consists of 905 reactions catalyzed by 865 enzymes encoded by 961 genes.
Our curation procedure now includes partnering with outside experts on particular cellular systems to provide a more comprehensive literature overview and up-to-date coverage of the field. Special reviewers are acknowledged on the ‘Credits’ page (http://EcoCyc.org/contributors.shtml). Recently, this type of curation has been applied to the process of DNA repair. We have annotated both direct repair mechanisms, such as photolyase, and indirect repair mechanisms, such as nucleotide excision repair, base excision repair and homologous recombination. We have also curated 56 untranslated RNA species in EcoCyc.
Transport reactions in EcoCyc
The known and predicted membrane transporter complement of E.coli is now fully described within EcoCyc. A total of 202 transport reactions are described; all have been annotated with the same detailed, literature-based approach that EcoCyc uses for enzymes and pathways. Since the last publication describing the EcoCyc database (6), we have completed the curation of cytoplasmic membrane transporters and expanded coverage to include outer membrane channels, auxiliary transport proteins within transport systems, and protein secretion systems, such as the Sec and Tat pathways.
Transcriptional regulation in EcoCyc
Since merging with the RegulonDB (7) database in 1998, EcoCyc has incorporated extensive information on pathways that regulate the transcription initiation step of gene expression (3). EcoCyc's current contents on the elements supporting the regulatory network of E.coli are summarized in Table . The information on mechanisms of regulation gathered in both EcoCyc and RegulonDB is currently the largest known network of regulatory interactions of a bacterial cell, with 2393 specific interactions of transcription initiation (promoters and binding sites for regulators). The network includes more than 1000 mapped transcription initiation sites, which are regulated by nearly 1400 binding sites for specific transcriptional regulatory factors (TFs).
The names of TFs have been standardized to indicate whether a TF acts as a repressor, as an activator, or has a dual effect. Comments on regulatory proteins have been expanded and updated. Annotations now include, among other things, the active conformation of each TF with its associated signal metabolites, the evolutionary family to which it belongs, and whether it is autoregulated. The anatomy of regulatory regions upstream of operons and transcription units is also encoded in the database.
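The repressor, activator and dual categories can be derived mechanically from a list of regulatory interactions, as the following sketch suggests. It is an illustration only and does not reproduce the curation procedure or data model used by EcoCyc or RegulonDB. The function name `classify_tfs`, the tuple layout and the placeholder names (`TF-A`, `promoter-1` and so on) are all assumptions made for this example, as is the rule that a TF is labeled dual once both a positive and a negative effect are recorded for it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Each interaction is (tf_name, promoter_name, effect), where effect is
# "+" for activation or "-" for repression. This layout is hypothetical.
Interaction = Tuple[str, str, str]


def classify_tfs(interactions: Iterable[Interaction]) -> Dict[str, str]:
    """Label each TF as 'activator', 'repressor' or 'dual' from its effects."""
    effects: Dict[str, Set[str]] = defaultdict(set)
    for tf, _promoter, effect in interactions:
        if effect not in {"+", "-"}:
            raise ValueError(f"unknown effect {effect!r} for {tf}")
        effects[tf].add(effect)

    roles: Dict[str, str] = {}
    for tf, seen in effects.items():
        if seen == {"+"}:
            roles[tf] = "activator"
        elif seen == {"-"}:
            roles[tf] = "repressor"
        else:
            roles[tf] = "dual"  # both "+" and "-" were observed
    return roles


# Placeholder data, not taken from EcoCyc.
toy_interactions = [
    ("TF-A", "promoter-1", "+"),
    ("TF-A", "promoter-2", "+"),
    ("TF-B", "promoter-1", "-"),
    ("TF-C", "promoter-3", "+"),
    ("TF-C", "promoter-4", "-"),
]

toy_roles = classify_tfs(toy_interactions)
print(toy_roles)
```

A classification of this kind depends on the set of interactions recorded at the time, so a TF labeled as an activator here would be relabeled as dual if a repressing interaction were added later.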
Most recently, we have expanded our scope to encompass regulation at all levels of gene expression and metabolism. We have initiated this effort by focusing on the conditions of anaerobiosis and utilization of carbon sources. This approach will produce a more integrated electronic description of the collection of regulatory mechanisms present in E.coli.