OnTheFly is a service that automatically annotates document files such as Microsoft Word, Excel, PowerPoint, PDF or plain text files. After the files are submitted, the system returns a tagged HTML version of each document. Gene, protein and chemical names are highlighted, and clicking on a name opens a pop-up window with relevant information about the entity. For proteins, this information includes domains, sequence, organism and subcellular localization. For chemicals, it includes the chemical formula. For both entity types, it includes protein–chemical and chemical–chemical interactions. This functionality is provided by the Reflect server (http://reflect.ws).
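The highlighting step can be pictured as dictionary-based name matching, in which each recognized name is wrapped in markup that a pop-up script can later read. The sketch below is a minimal illustration under that assumption. `NAME_DICTIONARY`, `tag_entities`, the `class`/`data-id` attributes and the dictionary entries are all hypothetical. They are not the Reflect server's actual implementation or markup.

```python
import html
import re

# Hypothetical name dictionary: surface name -> (entity type, identifier).
# A real tagger would use a large curated name list; these entries are illustrative.
NAME_DICTIONARY = {
    "YGL227W": ("protein", "YGL227W"),
    "caffeine": ("chemical", "caffeine"),
}


def tag_entities(text: str, dictionary: dict[str, tuple[str, str]]) -> str:
    """Wrap every dictionary match in an HTML <span> carrying its type and ID.

    Longer names are tried first so a long name is not split by a shorter
    one it contains; matches must fall on word boundaries.
    """
    if not dictionary:
        return html.escape(text)
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untagged text between matches.
        pieces.append(html.escape(text[last:match.start()]))
        entity_type, entity_id = dictionary[match.group(1)]
        pieces.append(
            f'<span class="{entity_type}" data-id="{html.escape(entity_id)}">'
            f"{html.escape(match.group(1))}</span>"
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```

For example, `tag_entities("YGL227W and caffeine", NAME_DICTIONARY)` wraps both names in `<span>` elements and leaves " and " untouched. A pop-up script could then look up the `data-id` value when a span is clicked.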
OnTheFly can also generate interaction networks, extracted from the STITCH database (Kuhn et al.), for a set of bioentities (genes, proteins, chemicals). The user can select the organism whose protein aliases are used for tagging and network generation; the default organism is Homo sapiens. The user can also set the size of the network and the number of interactors per recognized entity. Network generation is not restricted to a single document but can be applied to a set of documents simultaneously.
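The two user-set limits described above can be illustrated with a small network-assembly sketch. It makes three assumptions. First, interactions are available as scored pairs; STITCH does provide confidence scores, but the rows in the usage example are made up. Second, "network size" means a maximum number of edges. Third, the recognized entities of all selected documents are pooled as seeds. `build_network` and its parameter names are hypothetical and do not describe OnTheFly's interface.

```python
from collections import defaultdict

# A scored interaction: (entity_a, entity_b, confidence score).
Interaction = tuple[str, str, float]


def build_network(
    documents: list[set[str]],
    interactions: list[Interaction],
    interactors_per_entity: int,
    max_edges: int,
) -> list[Interaction]:
    """Assemble one network from the entities recognized in several documents.

    Hypothetical sketch: each recognized entity keeps its highest-scoring
    partners, then the whole network is capped at max_edges edges.
    """
    # Pool the recognized entities of all selected documents.
    seeds = set().union(*documents)

    # Index interactions in both directions for quick lookup.
    neighbours = defaultdict(list)
    for a, b, score in interactions:
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))

    edges: dict[tuple[str, str], float] = {}
    for seed in sorted(seeds):
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(neighbours[seed], key=lambda p: (-p[0], p[1]))
        for score, partner in ranked[:interactors_per_entity]:
            key = tuple(sorted((seed, partner)))  # undirected edge
            edges[key] = max(score, edges.get(key, 0.0))

    # Keep only the strongest edges, up to the requested network size.
    strongest = sorted(edges.items(), key=lambda kv: (-kv[1], kv[0]))[:max_edges]
    return [(a, b, score) for (a, b), score in strongest]
```

As an example, take seeds A and B from one document and C from another. The scored rows are A–X 0.9, A–Y 0.5, A–Z 0.7, B–X 0.8 and C–W 0.4. With two interactors per entity and a limit of three edges, the result is A–X, B–X and A–Z. The weaker A–Y edge falls outside A's two best partners, and C–W is cut by the size limit.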
OnTheFly also generates lists summarizing the bioentities identified in the set of selected files. Each entry gives the identifier of the bioentity together with its organism and a description.
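The following minimal sketch shows how per-file results could be merged into one summary list, with one row per bioentity identifier. The input format, the names `EntityRecord`, `summarize` and `to_tsv`, and the extra `files` column are assumptions made for illustration. The source only states that each entry carries the ID, organism and description.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    entity_id: str
    organism: str
    description: str
    # Hypothetical extra field: the files in which the entity was found.
    files: set[str] = field(default_factory=set)


def summarize(hits_per_file: dict[str, list[tuple[str, str, str]]]) -> list[EntityRecord]:
    """Merge per-file (id, organism, description) hits into one list, one row per ID."""
    records: dict[str, EntityRecord] = {}
    for filename, hits in hits_per_file.items():
        for entity_id, organism, description in hits:
            if entity_id not in records:
                records[entity_id] = EntityRecord(entity_id, organism, description)
            records[entity_id].files.add(filename)
    return sorted(records.values(), key=lambda r: r.entity_id)


def to_tsv(records: list[EntityRecord]) -> str:
    """Render the summary as tab-separated text."""
    rows = ["id\torganism\tdescription\tfiles"]
    for r in records:
        rows.append("\t".join([r.entity_id, r.organism, r.description, ",".join(sorted(r.files))]))
    return "\n".join(rows)
```

Keying on the identifier means an entity mentioned in several files appears once, with all of its source files listed together.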
The performance of the service can be assessed in several ways, such as the quality of the document conversion, the time required to tag a document and the accuracy of the annotation. The file converters used preserve most of the document layout, including column separation, tables and figures. Processing a full-text article of about 15 pages with images and tables typically takes 15 to 20 s. This time covers the whole process, including communication with the server.
The name-tagging performance of the Reflect server is comparable to that of other available methods. More information is available in the FAQ section of the web server.
To demonstrate the functionality of OnTheFly, a full-text article on protein–protein interaction prediction (Pitre et al., 2006), stored locally as a PDF file, was processed. Figure 1A shows a table section of the resulting HTML file with the tagged protein identifiers. Figure 1C shows the corresponding association network of these entities, retrieved automatically from the STITCH database.
Fig. 1. An annotated table (A) from a PDF full-text article (Pitre et al., 2006), the generated pop-up window with information about the protein YGL227W (B) and an automatically generated protein–protein interaction network (C) of associated …