RNA editing refers to a molecular process by which the sequence of a transcribed RNA is modified. It has been observed in the mitochondria of several eukaryotic taxa, such as plants (9) and trypanosomes (10), and in chloroplasts (11). At the most basic level, the database contains examples of sequences modified by the substitution of one residue for another, by the deletion of residues, and by the addition of residues, usually uracil.
The RNA editing interface in GOBASE is based primarily on the previously existing RNA query page, with the addition of editing-specific selection parameters such as the type of modification (insertion, deletion or substitution). A query result is shown in Figure 1. In addition to the sequence itself, edited positions are displayed in two ways: as a list specifying the exact change made at each position, and as red markings on an alignment of the relevant sections of sequence, which gives a straightforward and intuitive visual representation. The interface displays only the regions of the sequence where editing occurs. Coding and intronic regions of the sequence are distinguished by background color. Complete unedited and edited sequences can be downloaded from the interface page. Future development will include the possibility of downloading the sequence alignment as displayed, and the addition of multiple rows to the alignment in cases where edits to a sequence are known to occur sequentially, so that observed intermediate stages in the editing process can be represented.
Figure 1. RNA editing result page, showing sequence-specific data, location of edited positions and alignment of gene sequence with edited sequence. Hyperlinks lead to database pages for details of appropriate Gene Product, Taxonomy, Sequence and Gene.
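As an illustration of how a list of edited positions can be derived from an alignment of unedited and edited sequences, the following minimal Python sketch classifies each differing column as an insertion, deletion or substitution. It is not GOBASE code: the function name `list_edits`, the use of `-` as the gap character and the convention of reporting insertions at the preceding unedited position are all assumptions made for the example.

```python
from typing import List, Tuple

Edit = Tuple[int, str, str, str]  # (position, type, unedited residue, edited residue)


def list_edits(unedited: str, edited: str) -> List[Edit]:
    """Classify differences between two pre-aligned sequences.

    Assumptions (hypothetical, for illustration only):
    - both strings are already aligned and have equal length;
    - '-' marks a gap;
    - positions are 1-based and count residues of the unedited sequence,
      so an insertion is reported at the position of the residue before it.
    """
    if len(unedited) != len(edited):
        raise ValueError("aligned sequences must have the same length")

    edits: List[Edit] = []
    pos = 0  # 1-based position in the ungapped, unedited sequence
    for u, e in zip(unedited, edited):
        if u != "-":
            pos += 1
        if u == e:
            continue
        if u == "-":
            edits.append((pos, "insertion", "-", e))
        elif e == "-":
            edits.append((pos, "deletion", u, "-"))
        else:
            edits.append((pos, "substitution", u, e))
    return edits


# Made-up example alignment (not taken from the database).
example = list_edits("AUG-CCGUA", "AUGUCUG-A")
for position, kind, before, after in example:
    print(position, kind, before, after)
```

The list produced this way corresponds to the textual view of edited positions; the same per-column comparison could drive the highlighting of an alignment.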
The Human Sequence query page allows the user to select a set of human mitochondrial sequences based on haplogroup and disease state. More than 450 different haplogroup assignments are available in GOBASE, so a full list might become unwieldy for some queries. As haplogroup designators always start with a letter, the user is offered the option of first selecting an initial letter or letters, and then picking a range of individual haplogroups from the corresponding subset of haplogroup assignments shown in a menu. The results page (Figure 2) provides relevant information from the standard GOBASE Sequence page, and also uses an alignment to show all the positions at which this sequence differs from the reference human mitochondrial genome as defined in GenBank (accession no. NC_001807). On this alignment, mutations that have been associated with disease are marked in yellow, and other polymorphic mutations are indicated in red.
Figure 2. Human sequence result page, showing the difference between the queried sequence and the reference human mitochondrial genome sequence, both as a list of divergent positions and as an alignment of relevant sections of the sequences.
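The two-step haplogroup selection described above can be sketched as a simple filter on the first character of each designator. This is a hedged illustration only: the function name `haplogroups_by_initial` and the short list of designators are assumptions for the example, not the GOBASE implementation or its full set of assignments.

```python
from typing import Iterable, List


def haplogroups_by_initial(haplogroups: Iterable[str], initials: Iterable[str]) -> List[str]:
    """Return the sorted subset of designators starting with any chosen letter.

    Hypothetical helper: relies only on the fact that haplogroup
    designators always begin with a letter.
    """
    wanted = {letter.upper() for letter in initials}
    return sorted(h for h in haplogroups if h and h[0].upper() in wanted)


# Illustrative designators only; the database holds more than 450 assignments.
all_haplogroups = ["U5b", "H1a", "L3e", "J1c", "H2"]
menu = haplogroups_by_initial(all_haplogroups, "HJ")
print(menu)
```

Narrowing by initial letter first keeps the second menu short enough to scan, which is the motivation given for the interface design.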
The Human Mutation query page (Figure 3a) allows the user to search the dataset for mutations of interest within a specified range of positions on the human mitochondrial genome sequence, either by specifying start and end positions directly or by selecting one or more genes from a list on the interface. This search returns a list of positions at which mutations are documented. For each mutation (Figure 3b), the result page provides data on its disease associations, a section of the reference sequence showing the location and neighborhood of the mutation, and a list of the sequences in GOBASE containing this mutation.
Figure 3. (a) Human mutation query page, allowing the user to select the gene(s) of interest and specify the range of positions on the sequence to search for mutations. (b) Result page showing details for an individual mutation.
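The position-range search described above can be sketched as follows. Whether the user enters start and end positions directly or selects genes, the query reduces to a set of inclusive coordinate ranges. Everything named here is an assumption for illustration: the `Mutation` record, the `mutations_in_ranges` function, the placeholder gene coordinates and the sample mutations are invented and do not reflect the GOBASE schema or real data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Mutation:
    """Hypothetical record for one documented mutation."""
    position: int                   # 1-based position on the reference sequence
    change: str                     # e.g. "A>G"
    disease: Optional[str] = None   # None when no disease association is recorded


def mutations_in_ranges(
    mutations: Iterable[Mutation],
    ranges: Iterable[Tuple[int, int]],
) -> List[Mutation]:
    """Return mutations falling in any inclusive (start, end) range, by position."""
    ranges = list(ranges)
    hits = [
        m for m in mutations
        if any(start <= m.position <= end for start, end in ranges)
    ]
    return sorted(hits, key=lambda m: m.position)


# Placeholder gene coordinates and mutations, invented for the example.
gene_ranges = {"geneA": (100, 200), "geneB": (450, 520)}
documented = [
    Mutation(150, "A>G", "example disease"),
    Mutation(300, "C>T"),
    Mutation(500, "G>A"),
    Mutation(120, "T>C"),
]

# Selecting genes from a list is equivalent to supplying their ranges.
selected = mutations_in_ranges(documented, [gene_ranges["geneA"], gene_ranges["geneB"]])
print([m.position for m in selected])
```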
Other functional enhancements
The DNA sequence download functionality has been modified to allow the user to download either genomic sequence or gene-coding regions, selectable via buttons from the Gene query page. There are a small number of unusual cases, such as trans-spliced genes, where there is no straightforward correspondence between a single gene and a contiguous linear region of the source sequence record. The GOBASE database structure has now been modified to address these cases transparently. Sequences of complex gene-coding regions are assembled in advance, stored and made available in query results through the same interface as conventional linear genes.
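To illustrate what assembling a complex gene-coding region in advance might involve, the sketch below joins several segments of a source sequence, reverse-complementing those annotated on the minus strand. It is a minimal example under stated assumptions, not the GOBASE procedure: the function names, the `(start, end, strand)` segment format with 1-based inclusive coordinates, and the sample sequence are all hypothetical.

```python
from typing import Iterable, Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_coding_region(source: str, segments: Iterable[Tuple[int, int, str]]) -> str:
    """Concatenate segments of `source` in the order given.

    Each segment is (start, end, strand) with 1-based inclusive coordinates
    and strand '+' or '-' (an assumed format for this example).
    """
    parts = []
    for start, end, strand in segments:
        piece = source[start - 1:end]
        if strand == "-":
            piece = reverse_complement(piece)
        elif strand != "+":
            raise ValueError(f"unknown strand: {strand!r}")
        parts.append(piece)
    return "".join(parts)


# Made-up source sequence with two non-contiguous segments on opposite strands.
source_record = "AAACCCGGGTTGACA"
assembled = assemble_coding_region(source_record, [(4, 6, "+"), (10, 13, "-")])
print(assembled)
```

Storing the assembled result means a query for such a gene can return a single sequence, just as it would for a conventional linear gene.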
All sequences retrieved from GOBASE now come with detailed literature references derived from the source GenBank records. Journal, author and title are provided, along with a direct link to the appropriate PubMed entry if one exists.
Because of practical constraints, any given query in GOBASE returns at most 5000 results. Users wishing to execute custom queries retrieving larger amounts of data are invited to contact the GOBASE team at gobase/at/bch.umontreal.ca so that the query can be run directly on the database via SQL.