REPAIRtoire is a relational database that links together the aforementioned data sets, which can be queried via five menus: ‘DNA DAMAGE’, ‘PATHWAYS’, ‘PROTEINS’, ‘DISEASES’ and ‘PUBLICATIONS’.
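The general idea of a relational database that links separate data sets through shared keys can be illustrated with a minimal sketch using Python's built-in sqlite3 module. Everything here is an assumption made for illustration. The table names, the column names, the link tables and the single toy row (human UNG, a uracil-DNA glycosylase acting in BER) are hypothetical and are not REPAIRtoire's actual schema.

```python
import sqlite3

# Hypothetical schema for illustration only; these table and column names
# are assumptions, not REPAIRtoire's actual schema.
SCHEMA = """
CREATE TABLE damage  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                      organism TEXT NOT NULL);
CREATE TABLE pathway (id INTEGER PRIMARY KEY, abbrev TEXT NOT NULL);
-- Link (junction) tables model the many-to-many relations between data sets.
CREATE TABLE protein_damage (
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    damage_id  INTEGER NOT NULL REFERENCES damage(id),
    PRIMARY KEY (protein_id, damage_id)
);
CREATE TABLE protein_pathway (
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    pathway_id INTEGER NOT NULL REFERENCES pathway(id),
    PRIMARY KEY (protein_id, pathway_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a single toy example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy rows: human UNG removes uracil from DNA and acts in BER.
    conn.execute("INSERT INTO damage VALUES (1, 'uracil in DNA')")
    conn.execute("INSERT INTO protein VALUES (1, 'UNG', 'Homo sapiens')")
    conn.execute("INSERT INTO pathway VALUES (1, 'BER')")
    conn.execute("INSERT INTO protein_damage VALUES (1, 1)")
    conn.execute("INSERT INTO protein_pathway VALUES (1, 1)")
    conn.commit()
    return conn


def pathways_for_damage(conn: sqlite3.Connection, damage_name: str) -> list[str]:
    """Follow the link tables: damage -> recognizing proteins -> their pathways."""
    rows = conn.execute(
        """
        SELECT DISTINCT pw.abbrev
        FROM damage AS d
        JOIN protein_damage AS pd ON pd.damage_id = d.id
        JOIN protein_pathway AS pp ON pp.protein_id = pd.protein_id
        JOIN pathway AS pw ON pw.id = pp.pathway_id
        WHERE d.name = ?
        ORDER BY pw.abbrev
        """,
        (damage_name,),
    )
    return [abbrev for (abbrev,) in rows]


if __name__ == "__main__":
    print(pathways_for_damage(build_demo_db(), "uracil in DNA"))  # ['BER']
```

Keeping each many-to-many relation in its own link table is what lets a single entity (here a protein) be reached from several menus, for example from a damage type or from a pathway.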
DNA DAMAGE: As of October 14, 2010, we had collected information about 85 different types of DNA damage. Many of them describe general classes of damage events, such as single-strand breaks or base loss, that are independent of the local sequence. About 60 DNA-damaging chemical compounds are linked to the corresponding types of damage. Of all lesions, 36 could be mapped to a single molecular structure (e.g. point mutations and nucleotide modifications such as 1-hydroxypropyl-adenine). Each type of damage is described on its own sub-page, which includes information about its potential source (e.g. spontaneous formation, or formation as an intermediate of a DNA repair process), the proteins that may recognize it in DNA, keywords that help place it in context, and literature links. For these 36 lesions with unique chemical structures, REPAIRtoire displays the structure in 1D (as a SMILES string), in 2D and in 3D (with the Jmol Java applet), and provides atomic coordinates for download in MOL format.
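The content of a damage sub-page can be modelled as a simple record in which the chemical structure is optional, because only lesions that map to a single molecular structure receive one. The sketch below is a hypothetical data structure for illustration: the class name `DamageType`, its field names and the helper `structures_available` are assumptions, not REPAIRtoire's code. The toy entries contrast a general damage class (single-strand break) with a lesion that has a defined structure (uracil, whose SMILES string is `O=C1C=CNC(=O)N1`).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DamageType:
    """Hypothetical record mirroring the fields of a damage sub-page."""
    name: str
    sources: list[str] = field(default_factory=list)  # e.g. "spontaneous"
    causing_compounds: list[str] = field(default_factory=list)
    recognizing_proteins: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    pubmed_ids: list[int] = field(default_factory=list)
    # SMILES string; set only when the lesion maps to a single molecular structure.
    smiles: Optional[str] = None

    @property
    def has_structure(self) -> bool:
        return self.smiles is not None


def structures_available(catalog: list[DamageType]) -> list[str]:
    """Names of damage types whose structure could be shown in 1D/2D/3D."""
    return [d.name for d in catalog if d.has_structure]


catalog = [
    DamageType("single-strand break"),  # general class: no single structure
    DamageType("uracil in DNA", sources=["spontaneous"],
               recognizing_proteins=["UNG"], smiles="O=C1C=CNC(=O)N1"),
]
print(structures_available(catalog))  # ['uracil in DNA']
```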
PATHWAYS: This menu provides access to data about eight pathways (DDS, DDR, BER, NER, MMR, HHR, NHEJ, TLS) from three model organisms: Escherichia coli, Saccharomyces cerevisiae and Homo sapiens. These pathways are represented as graphs visualized with PyGraphviz (http://networkx.lanl.gov/pygraphviz/), in which nodes represent DNA states and edges represent the reactions between them (e.g. enzymatic reactions). Every edge of the (sub)graph, i.e. every arrow connecting two images, is hyperlinked to a static ‘reaction’ window comprising one or more panels that display basic information about the selected reaction. Every node of the graph, e.g. a DNA–protein complex at a given stage of the repair process, is hyperlinked to a static window with detailed information about that stage of DNA repair. All protein components of each state or complex are also hyperlinked to individual protein pages.
PROTEINS: As of October 14, 2010, REPAIRtoire stores information about 69, 78 and 154 proteins from E. coli, S. cerevisiae and H. sapiens, respectively, together with their genes; these can be accessed either via the pathways or directly from the ‘PROTEINS’ menu. During manual data curation, we collected the available information on alternative gene and protein names and amino acid sequences from the NCBI databases (22), 3D structures from the Protein Data Bank (23), and various features from other databases (e.g. enzymatic function, presence of isoforms, cellular/tissue and subcellular localization), together with links to the relevant database entries. Currently, the DNA repair protein data set encompasses only E. coli, S. cerevisiae and H. sapiens, but it will be expanded in the future and may eventually comprise all orthologs of the functionally characterized enzymes identifiable in fully sequenced genomes.
DISEASES: Thus far, we have compiled information about 40 diseases caused by mutations in 32 genes that result in defective DNA repair proteins. This data set is presented as a table with hyperlinks to the proteins concerned and to the relevant entries in the Online Mendelian Inheritance in Man (OMIM) database (22). Each disease has its own sub-page with a succinct description and additional links (e.g. keywords). Reciprocal links to diseases are also available on each protein page.
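Reciprocal links of this kind amount to inverting a many-to-many mapping: the curated disease-to-gene links are turned into gene-to-disease links for the protein pages. The sketch below shows this with an inverted index built from `collections.defaultdict`. The function name `invert_links` is hypothetical, and the three entries are a deliberately incomplete toy excerpt (ERCC2 is shown linked to two diseases to illustrate a gene with several reciprocal links), not REPAIRtoire's disease table.

```python
from collections import defaultdict


def invert_links(disease_to_genes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build gene -> diseases links from disease -> genes links (many-to-many)."""
    gene_to_diseases: dict[str, set[str]] = defaultdict(set)
    for disease, genes in disease_to_genes.items():
        for gene in genes:
            gene_to_diseases[gene].add(disease)
    # Sort each list so that the output is stable and page-ready.
    return {gene: sorted(diseases) for gene, diseases in gene_to_diseases.items()}


# Toy, deliberately incomplete excerpt (illustrative only).
disease_to_genes = {
    "Xeroderma pigmentosum, group A": ["XPA"],
    "Xeroderma pigmentosum, group D": ["ERCC2"],
    "Trichothiodystrophy": ["ERCC2"],
}
print(invert_links(disease_to_genes)["ERCC2"])
# ['Trichothiodystrophy', 'Xeroderma pigmentosum, group D']
```

Deriving the reverse direction from a single curated mapping, rather than curating both directions separately, keeps the disease and protein pages consistent with each other.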
KEYWORDS: This menu provides quick access to the most common keywords used to annotate the database entities according to biological processes and activities, such as DNA repair pathways, response to DNA damage, cell cycle checkpoint control, DNA N-glycosylase activity, DNA N-glycosylase/AP-lyase activity, etc.
PUBLICATIONS: Literature references to entries in the PubMed database (22) have been compiled into an additional data set, currently comprising 2613 entries.