In the future we anticipate that Apollo will be available to any researcher who is interested in annotating particular areas of biological interest. For this to be possible, researchers must be able to readily configure and customize Apollo as needed, and to access data from remote database sites as well as incorporate local information. This section discusses the different means of customizing and extending Apollo so that it is optimized for a particular environment and can meet the requirements of individual researchers. The first part of the section simply describes how to configure Apollo and personalize it. The second part briefly describes how one might add an adaptor to interact directly with an internal database server, a personal file format, or an external source over the network. The third part focuses on how to add a class to draw features in a new shape or style. The final part is an overview describing the addition of entirely new views of the data in independent frames. To present these topics an understanding of Apollo's Java class hierarchy is needed.
Apollo has three main packages: external data adaptors; data models; and the graphical user interfaces. In addition, there are some auxiliary packages from external sources and minor additional packages within Apollo: seq, analysis, util and io. The seq package contains utilities for handling FASTA files. The analysis package will allow various analyses of the sequence and the annotations to be carried out by direct command of the user. The util package provides general utilities, such as sequence translation and simple statistical functions. Similarly, the io package handles basic input and output functions.
The data models fall roughly into two categories: sequence models and feature models (Figure 11). In a sense, feature models may be viewed as a superclass of sequences, because every feature in fact defines a sequence, but within Apollo they are treated as distinct and separable classes.
Figure 11. Categories of data models. The Apollo data models fall into one of two broad categories (or are a descriptive auxiliary class, for example, Comment): (a) a location on a sequence; or (b) a sequence. The corresponding Java superclasses are Range and AbstractSequence.
Sequences have obvious attributes, including the residues themselves, a name, a unique identifier, length, database cross-references and a description, but internally, the abstract sequences are much more complex. It is impracticable for an application to load an entire annotated chromosome into memory, because it is far too large. Therefore, the actual sequence residues are only retrieved and loaded when the magnification level of the view is high enough to limit the required length of sequence to a tractable amount. The logic to determine when it is appropriate to retrieve more sequence (or replace the current sequence) is hidden within the sequence classes, and thus Apollo's performance and memory footprint are kept within reasonable bounds.
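The lazy-loading idea can be illustrated with a minimal sketch. It is not Apollo's implementation: the names LazySequence, ResidueSource, maxWindow and fetchCount are hypothetical, and the rule used here (fetch only when the requested stretch is no longer than a fixed window, and replace the cached window on a miss) is an assumed simplification of the logic described above.

```java
// Hypothetical sketch: these names are illustrative and are not Apollo's actual classes.

/** Supplies residues for a 1-based, inclusive coordinate range (for example from a database). */
interface ResidueSource {
    String fetch(int low, int high);
}

/** A sequence that holds only the window of residues most recently needed by the view. */
final class LazySequence {
    private final ResidueSource source;
    private final int maxWindow;   // longest stretch we are willing to hold in memory
    private int cachedLow = 1;     // the cache starts empty: cachedHigh < cachedLow
    private int cachedHigh = 0;
    private String cached = "";
    private int fetchCount = 0;    // number of round trips to the source, for illustration

    LazySequence(ResidueSource source, int maxWindow) {
        this.source = source;
        this.maxWindow = maxWindow;
    }

    /** Returns the residues for [low, high], or null when the view is too zoomed out. */
    String residues(int low, int high) {
        if (low < 1 || high < low) {
            throw new IllegalArgumentException("invalid range " + low + ".." + high);
        }
        if (high - low + 1 > maxWindow) {
            return null; // too much sequence requested: load nothing
        }
        if (low < cachedLow || high > cachedHigh) {
            // Cache miss: replace the current window with the newly requested one.
            cached = source.fetch(low, high);
            cachedLow = low;
            cachedHigh = high;
            fetchCount++;
        }
        return cached.substring(low - cachedLow, high - cachedLow + 1);
    }

    int fetchCount() {
        return fetchCount;
    }
}
```

In this sketch a caller never decides when to fetch; it simply asks for residues and the class either answers from its cached window, fetches a new window, or declines because the requested span is too long.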
The simplest feature class is reduced to a range on a sequence consisting of just four attributes: a low end for the range; a high end for the range; the sequence that this range falls on; and a strand. The features further split into three subclasses: genomic ranges; results; and annotations. Genomic ranges include a chromosome and organism, and are used to support comparisons of syntenic genomic regions within Apollo. Results support alignments for viewing pairs of ranges along two similar sequences and contain information describing how the analysis was carried out. Annotations reference the results as evidence, but also contain information supplied by the individual carrying out the editing, such as their name, their comments, the date of each edit, and processing flags for feedback to the computational pipeline.
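As an illustration of the four-attribute range described above, the following sketch shows one way such a class might look. The class name SimpleRange, its field names, the use of inclusive coordinates and the encoding of strand as +1 or -1 are all assumptions made for this example; Apollo's own Range class may differ on every one of these points.

```java
// Hypothetical sketch: names and conventions are illustrative, not Apollo's actual API.
final class SimpleRange {
    private final int low;             // low end of the range (assumed inclusive)
    private final int high;            // high end of the range (assumed inclusive)
    private final String sequenceName; // the sequence this range falls on
    private final int strand;          // assumed encoding: +1 forward, -1 reverse

    SimpleRange(int low, int high, String sequenceName, int strand) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        if (strand != 1 && strand != -1) {
            throw new IllegalArgumentException("strand must be +1 or -1");
        }
        this.low = low;
        this.high = high;
        this.sequenceName = sequenceName;
        this.strand = strand;
    }

    int getLow() { return low; }
    int getHigh() { return high; }
    String getSequenceName() { return sequenceName; }
    int getStrand() { return strand; }

    /** Number of positions covered by the range. */
    int length() {
        return high - low + 1;
    }

    /** True when both ranges lie on the same sequence and share at least one position. */
    boolean overlaps(SimpleRange other) {
        return sequenceName.equals(other.sequenceName)
                && low <= other.high
                && other.low <= high;
    }
}
```

Genomic ranges, results and annotations would then extend such a base class with their additional attributes (for example, chromosome and organism for a genomic range).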
Installation and configuration
Apollo runs under the Windows, Mac OS X, Linux and Solaris operating systems and requires Java 1.2 or 1.3. In our memory tests with a 300 kilobase (kb) sequence with several thousand features, the memory heap size hovered around 100 Mbytes (MB). Therefore Apollo will run on a machine with 128 MB of available memory, but it will do better with 256 MB.
Apollo is highly configurable in terms of appearance, connectivity and extensions. Users can specify the color, graphical representation and placement of any feature (both annotations and computational results). The configuration file describes both the tiers and the types of data that are to be displayed. The file describes each tier by a label (for example 'Prediction'), a visible flag (either true or false), and a layout flag (if true then all data in this tier is arranged to avoid overlap, that is, the tier is expanded). The file also describes for each feature type: the tier it is to appear in; a label for that feature type; a color; the list of attributes to appear in the tabular report; initial score thresholds; and the drawable class to be used to display this feature. URLs for connecting features to outside data sources are also configurable. Note that the separation of tiers and types allows multiple feature types to be displayed in the same tier. For example the results of all gene-prediction programs can be shown in a single 'Prediction' tier. The aim is to provide as much flexibility as possible in the presentation of the data. The information included in this file is incorporated when Apollo data are loaded.
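The separation of tiers and types can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class names (TierSketch, FeatureTypeSketch, DisplayConfigSketch), the field names and the drawable class name used in the test are hypothetical, the report attributes and URLs are omitted for brevity, and nothing here reflects the actual syntax of Apollo's configuration file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory model of tiers and feature types, not Apollo's own classes.
final class TierSketch {
    final String label;     // for example "Prediction"
    final boolean visible;  // the visible flag
    final boolean expanded; // the layout flag: true means features are arranged to avoid overlap

    TierSketch(String label, boolean visible, boolean expanded) {
        this.label = label;
        this.visible = visible;
        this.expanded = expanded;
    }
}

final class FeatureTypeSketch {
    final String label;
    final String tierLabel;         // the tier this type appears in
    final String color;
    final double scoreThreshold;    // initial score threshold
    final String drawableClassName; // class used to display features of this type

    FeatureTypeSketch(String label, String tierLabel, String color,
                      double scoreThreshold, String drawableClassName) {
        this.label = label;
        this.tierLabel = tierLabel;
        this.color = color;
        this.scoreThreshold = scoreThreshold;
        this.drawableClassName = drawableClassName;
    }

    /** Features scoring below the threshold are not shown. */
    boolean isShown(double score) {
        return score >= scoreThreshold;
    }
}

final class DisplayConfigSketch {
    private final Map<String, TierSketch> tiers = new LinkedHashMap<>();
    private final List<FeatureTypeSketch> types = new ArrayList<>();

    void addTier(TierSketch tier) {
        tiers.put(tier.label, tier);
    }

    void addType(FeatureTypeSketch type) {
        if (!tiers.containsKey(type.tierLabel)) {
            throw new IllegalArgumentException("unknown tier: " + type.tierLabel);
        }
        types.add(type);
    }

    /** Labels of all feature types assigned to the given tier, in insertion order. */
    List<String> typeLabelsInTier(String tierLabel) {
        List<String> labels = new ArrayList<>();
        for (FeatureTypeSketch type : types) {
            if (type.tierLabel.equals(tierLabel)) {
                labels.add(type.label);
            }
        }
        return labels;
    }
}
```

Because each type names its tier rather than the reverse, several types can share one tier without the tier definition having to change.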
The user can change some of the 'style' attributes of a feature type - for example, its color and tier position - interactively. A 'types' panel pops up as a separate window and shows all the tier names and colors. Right-clicking on a type lets the user adjust the color used for displaying features of that type; middle-clicking allows the score threshold to be adjusted (features with scores below the threshold are not shown in the display). The user can adjust the order of tiers by selecting and dragging tiers with the mouse and the control key. The new tier options can then be saved to a tiers file, allowing each user to store personal style preferences.
External data adaptors: connecting Apollo to a database
Apollo is populated with data via a set of adaptors. Typically (for data coming from an external data source) these consist of two Java classes: a dialog box and the adaptor itself. The dialog box allows the user to tell the system what sequence they wish to examine. The adaptor provides a method for translating between the structure of the external data and the internal Apollo data models.
The existing adaptors allow data to be loaded from the Ensembl CGI server, GAME XML files, direct connections to either the Ensembl or Gadfly databases, DAS servers [49], and the Ensembl variant of GFF. Once edited the data may be stored as GAME XML files, GFF files, or GenBank table format [50]. We recognize a need for additional adaptors to read other variants of GFF (particularly that generated by Artemis), and GenBank and EMBL flat-file entries.
The class apollo.gui.DataLoader manages the fetching and recording of annotation session data, which are held in the CurationSet data model. This class is instantiated whenever a new request is received for additional feature or sequence data from an external source. The DataLoader class is quite straightforward; it provides two publicly available methods for loading and saving the data utilized during an annotation session. If a specific data adaptor is not provided among the arguments in the initial request, then the DataLoader will bring up a dialog box (org.bdgp.swing.widget.DataAdapterChooser) to allow the user to select an adaptor for an external data source (or sink). When the DataLoader instantiates a DataAdapterChooser it provides it with an org.bdgp.io.DataAdapterRegistry. This DataAdapterRegistry is populated with the list of all available adaptors from the configuration file and is initialized during Apollo's startup. These adaptors are one of the two pieces of software that must be provided to add a new external adaptor.
A data adaptor extends the org.bdgp.io.AbstractDataAdapter class and implements the following methods of the ApolloDataAdapterI interface:
getUI: The DataAdapterChooser requires an interface to interact with the user and collect any necessary information. The DataAdapterChooser obtains this interface via this DataAdapter method, including as an argument the type of input/output operation (for example, IOOperation.READ or IOOperation.WRITE). This user interface is the other class (in addition to the DataAdapter itself) that the software developer must provide to add an external adaptor.
getCurationSet: This returns a CurationSet model (from which all annotations and features can be accessed). A CurationSet is the hub for all sequence and feature models contemplated in an annotation session.
commitChanges: The argument passed to this is a CurationSet. The purpose is to record the curator's annotation in a persistent form, whether this is a file or a database.
getStateInformation: Following a load this method captures the current query parameters so that they may be saved and then restored in subsequent sessions.
setStateInformation: This method restores the parameters needed to access a CurationSet that was under examination in an earlier session. It is an alternative to explicitly setting load parameters using a dialog.
There are other required methods as well, but those described above provide the essential functionality for a DataAdapter.
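The shape of such an adaptor can be sketched as follows. This is a deliberately simplified stand-in and not the real ApolloDataAdapterI: the interface AdapterSketch, the types IoOperationSketch and CurationSetSketch, the method signatures, the use of java.util.Properties for state and the "region" key are all assumptions made for illustration. The toy adaptor keeps its data in memory rather than in a file or database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: simplified stand-ins for the types named in the text.
enum IoOperationSketch { READ, WRITE }

final class CurationSetSketch {
    final String region;
    final List<String> annotations = new ArrayList<>();

    CurationSetSketch(String region) {
        this.region = region;
    }
}

/** Assumed, simplified signatures for the five methods described above. */
interface AdapterSketch {
    Object getUI(IoOperationSketch op);
    CurationSetSketch getCurationSet();
    void commitChanges(CurationSetSketch set);
    Properties getStateInformation();
    void setStateInformation(Properties state);
}

/** A toy adaptor whose "external source" is an in-memory map keyed by region. */
final class InMemoryAdapter implements AdapterSketch {
    private final Map<String, List<String>> store = new HashMap<>();
    private String region = "";

    public Object getUI(IoOperationSketch op) {
        // A real adaptor would return a dialog component; a description stands in here.
        return "dialog for " + op + " of region '" + region + "'";
    }

    public CurationSetSketch getCurationSet() {
        CurationSetSketch set = new CurationSetSketch(region);
        List<String> saved = store.get(region);
        if (saved != null) {
            set.annotations.addAll(saved);
        }
        return set;
    }

    public void commitChanges(CurationSetSketch set) {
        // Record a copy so later edits to the set do not silently alter the store.
        store.put(set.region, new ArrayList<>(set.annotations));
    }

    public Properties getStateInformation() {
        Properties state = new Properties();
        state.setProperty("region", region);
        return state;
    }

    public void setStateInformation(Properties state) {
        region = state.getProperty("region", "");
    }
}
```

A session would restore its query parameters with setStateInformation, load with getCurationSet, and persist the curator's edits with commitChanges.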
The explicit DataAdapterUI that must be written provides an implementation of the org.bdgp.swing.AbstractIntDataAdapUI interface. It provides the DataAdapter with any user information that is entered into the dialog window. The key method is doOperation, which performs the data transformation by delegating it to its partner data adaptor. The inputs and outputs to this method are entirely dependent on the operation the data adaptor is supposed to perform.
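The delegation performed by doOperation might look roughly like the sketch below. The names RegionLoader and RegionDialogSketch, the setField method (which stands in for the user typing into a dialog) and the no-argument doOperation signature are hypothetical; the real interface is a Swing component and its signature is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a non-graphical stand-in for an adaptor's dialog class.

/** The partner data adaptor, reduced to a single assumed loading method. */
interface RegionLoader {
    String load(String region);
}

final class RegionDialogSketch {
    private final RegionLoader adaptor;
    private final Map<String, String> fields = new HashMap<>();

    RegionDialogSketch(RegionLoader adaptor) {
        this.adaptor = adaptor;
    }

    /** Stands in for the user entering a value into a field of the dialog window. */
    void setField(String name, String value) {
        fields.put(name, value);
    }

    /** Validates the user's input, then delegates the actual work to the partner adaptor. */
    Object doOperation() {
        String region = fields.get("region");
        if (region == null || region.trim().isEmpty()) {
            throw new IllegalStateException("no region entered");
        }
        return adaptor.load(region.trim());
    }
}
```

The dialog class holds no knowledge of the external data format; it only gathers parameters and hands them on.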
In summary, the mechanics of adding a new external data source are straightforward. This allows a developer to focus solely on writing the software to handle the semantic mapping between their own data representation and the Apollo data models.
Adding new Drawables
Several graphical elements are already available for representing features (Figure ). Developers can incorporate new Drawable classes by simply extending the existing base classes and implementing relevant methods. The following describes some of the methods that can be overridden.
drawSelected, drawUnselected, or simply draw. These methods actually paint the graphics for the primary representation of the sequence.
addHighlights can be used to draw graphical indications that a feature shares start or end positions with other features on the sequence.
addDecorations is used to draw any additional characteristics, for example the sites of start and stop codons within the feature.
A single feature type in fact requires two Drawable classes, one for a set of those features and another for an individual feature. This is useful for drawing gene structures because the draw method for the set can be used to draw a single line to illustrate the introns, whereas the draw method for the individual exons can be used to draw the rectangles to illustrate the exons. Once a developer has written the Drawable software it can be added by including the name of the class in the configuration file.
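The division of labor between the set Drawable and the individual Drawable can be shown with a small sketch. To keep the example self-contained it paints onto a character array rather than a graphics context, which is an assumption made purely for illustration; the names DrawableSketch, ExonDrawable and TranscriptDrawable are likewise hypothetical and do not correspond to Apollo's base classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a text "canvas" stands in for the graphics context a real Drawable paints on.
interface DrawableSketch {
    void draw(char[] canvas);
}

/** Drawable for an individual feature: paints one exon as a filled box. */
final class ExonDrawable implements DrawableSketch {
    final int low;  // first canvas column covered (0-based, inclusive)
    final int high; // last canvas column covered (0-based, inclusive)

    ExonDrawable(int low, int high) {
        this.low = low;
        this.high = high;
    }

    public void draw(char[] canvas) {
        for (int i = low; i <= high; i++) {
            canvas[i] = '#';
        }
    }
}

/** Drawable for the set: paints a single line for the introns, then lets each exon draw itself. */
final class TranscriptDrawable implements DrawableSketch {
    private final List<ExonDrawable> exons = new ArrayList<>();

    void add(ExonDrawable exon) {
        exons.add(exon);
    }

    public void draw(char[] canvas) {
        if (exons.isEmpty()) {
            return;
        }
        int low = Integer.MAX_VALUE;
        int high = Integer.MIN_VALUE;
        for (ExonDrawable exon : exons) {
            low = Math.min(low, exon.low);
            high = Math.max(high, exon.high);
        }
        for (int i = low; i <= high; i++) {
            canvas[i] = '-'; // one line spanning the whole set illustrates the introns
        }
        for (ExonDrawable exon : exons) {
            exon.draw(canvas); // exon boxes are painted over the line
        }
    }
}
```

Drawing the set first and the members second means the intron line never obscures an exon, and each class stays responsible for only one shape.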
Adding new views
This is undoubtedly the most ambitious way of extending Apollo. A developer must understand how events are passed between windows because it is these events that indicate some change has occurred in the state of the shared underlying data models. While there are many other steps that are involved in adding a new view, this section will focus on crucial elements of event handling inside Apollo.
Windows that are interested in receiving a particular type of event must implement the appropriate listener interface for that event. When the listening window is instantiated it is provided with the central Controller object and must add itself to the global list of listeners held in the Controller object by calling the Controller's addListener method with itself as the only argument. When the Controller receives notification of a particular event it notifies every listener on this list.
Events themselves are provided to the central event Controller. The Controller implements an individual handler method for every type of event within Apollo. These Controller handler methods notify all registered Listeners that an Event of that type has occurred. Some examples of the current list of events for broadcasting changes involving feature and sequence selection, movement to a different sequence position, changes to feature content or appearance, window operations, and the availability of new data are briefly described below.
Feature selection: This is the most basic of events and indicates that one or more of the sequence features has been selected, usually by a mouse click, but also from the text search dialog, menus and scrolling lists.
Base focus: This event is used to center a display of the genomic sequence on a particular base. This may be used to position a display on a feature, or because a user has typed in a particular position, or because a given sequence string occurs at that position.
Region change: This event indicates that a completely new region of the genome is replacing the current sequence.
Annotation change: The AnnotationEditor is the sole generator of these events. To track changes to these important features and ensure consistency this task is highly centralized. Thus, for example, when editing from the synteny view, the modification requests will be relayed to the AnnotationEditor. While the user interacts with the synteny display, the non-graphical AnnotationEditor class directly manages the edits. The AnnotationEditor carries out the additions, deletions and changes to start and end positions for exons, transcripts, genes and any other annotation features. The Controller then passes these events on to the listeners so that the annotations can be redrawn accordingly.
Result change: This event is used only under unusual circumstances. Features are occasionally placed on the incorrect strand because of problems the alignment software has in identifying the correct strand (EST sequences that do not contain any splice junctions are the common source of this problem). In these cases we allow the curator to move the computed feature from one strand to the other so that it may be used during annotation.
Appearance change: This event is used to issue a request for a change in the appearance of a feature, for example, if the chosen color is altered.
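The registration and broadcast pattern described in this section can be summarized in a short sketch for one event type, base focus. All names here (ControllerSketch, BaseFocusEventSketch, BaseFocusListenerSketch, SequenceViewSketch, handleBaseFocus) are hypothetical, and the signatures are assumptions; only the overall pattern, in which a view adds itself to the Controller's single listener list and the Controller's handler notifies the listeners, is taken from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the listener pattern; not Apollo's actual event classes.
final class BaseFocusEventSketch {
    final int base; // the base on which displays should center

    BaseFocusEventSketch(int base) {
        this.base = base;
    }
}

/** Listener interface a window implements to receive base focus events. */
interface BaseFocusListenerSketch {
    void handleBaseFocus(BaseFocusEventSketch event);
}

final class ControllerSketch {
    // One global list holds every listener, whatever event types it cares about.
    private final List<Object> listeners = new ArrayList<>();

    void addListener(Object listener) {
        if (!listeners.contains(listener)) {
            listeners.add(listener);
        }
    }

    /** Handler for one event type: notifies each listener that implements the matching interface. */
    void handleBaseFocus(BaseFocusEventSketch event) {
        // Iterate over a copy so a listener may register others while being notified.
        for (Object listener : new ArrayList<>(listeners)) {
            if (listener instanceof BaseFocusListenerSketch) {
                ((BaseFocusListenerSketch) listener).handleBaseFocus(event);
            }
        }
    }
}

/** A view that registers itself with the Controller when it is instantiated. */
final class SequenceViewSketch implements BaseFocusListenerSketch {
    int centeredOn = -1; // -1 means no focus event has been received yet

    SequenceViewSketch(ControllerSketch controller) {
        controller.addListener(this);
    }

    public void handleBaseFocus(BaseFocusEventSketch event) {
        centeredOn = event.base;
    }
}
```

A new view written this way needs no knowledge of the other windows; it reacts only to the events that the Controller relays.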