Web applications for biology and medicine often need to integrate data from external data sources. Entrez, run by the National Center for Biotechnology Information (NCBI), provides a searchable interface to important biological databases including PubMed, GenBank and GenPept. Application developers may search Entrez directly through the publicly available Entrez Programming Utilities (Entrez eUtils) interface. However, until now no API has allowed web application ('webapp') developers to access Entrez eUtils directly from the browser. Developers have been limited to accessing Entrez eUtils through code running on the web "backend". This approach is less than ideal because the call to Entrez eUtils must complete before each web page can be rendered. As a result, pages load slowly and may be blocked entirely if the Entrez eUtils interface is unavailable for some reason (e.g. downtime, network congestion). It also means the bandwidth required to provide the service may increase, because of the overhead of fetching results on the backend and then returning them to the user. This synchronous approach may also produce larger pages, which further slows page loading (Figure 1).
We describe EntrezAJAX, an AJAX service that provides fast, convenient and reliable access to Entrez eUtils from any web browser. Because it works around the web browser's cross-domain (same-origin) security restrictions, this API can be used by any developer wishing to incorporate Entrez results into their webapp.
The backend was built using the model-view-controller (MVC) web framework Django. MVC approaches to web development help enforce good development practice by separating data (model) from presentation (view) and business logic (controller) in code. We used the Entrez module of Biopython to access Entrez eUtils.
The backend software was deployed on the web using Google App Engine (GAE), a freely available cloud computing service. The Datastore and Memcache components of GAE were used to store the registry of developer API keys and to cache search results, respectively.
The results of web requests are stored in a temporary memory cache for 24 hours; this lifetime is configurable on a per-application basis. Each request is assigned a cache key made up of the method name and the alphabetically sorted parameter list (excluding the developer API key). Before contacting Entrez eUtils, the service checks whether this key is already in the cache; if it is, the cached result is returned directly. This reduces the time taken to serve requests and reduces the number of calls made to Entrez eUtils, saving bandwidth.
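The caching scheme described above can be sketched in Python as follows. This is a minimal illustration, not the EntrezAJAX source: it assumes the developer API key travels in a parameter named `apikey`, and the `TTLCache` class is a hypothetical in-process stand-in for GAE Memcache, not its API. `fetch` stands for whatever function actually calls Entrez eUtils.

```python
import time

# Default lifetime of a cached result: 24 hours, in seconds.
CACHE_TTL_SECONDS = 24 * 60 * 60


def make_cache_key(method, params):
    """Build a cache key from the method name and the alphabetically sorted
    parameters, leaving out the developer API key (assumed to be "apikey")
    so identical queries from different developers share one entry."""
    items = sorted((k, v) for k, v in params.items() if k != "apikey")
    return method + "?" + "&".join("%s=%s" % (k, v) for k, v in items)


class TTLCache:
    """Hypothetical stand-in for Memcache: entries expire after `ttl` seconds."""

    def __init__(self, ttl=CACHE_TTL_SECONDS, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def cached_call(cache, method, params, fetch):
    """Check the cache first; only on a miss call fetch() and store its result."""
    key = make_cache_key(method, params)
    result = cache.get(key)
    if result is None:
        result = fetch(method, params)
        cache.set(key, result)
    return result
```

Because the API key is dropped before the parameters are sorted, two developers issuing the same query share one cache entry, which is what saves calls to Entrez eUtils. The `ttl` argument plays the role of the per-application lifetime setting.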
Developers wishing to use EntrezAJAX must first register their website on the project homepage to receive a developer API key. Web developers make requests to EntrezAJAX by constructing a URL with three components: the endpoint, the method name and a dictionary of parameters (a hash of key/value pairs). The parameter dictionary must include the developer API key, which identifies the originator of the request; other parameters are method-dependent. Each of the Entrez eUtils applications (EInfo, ESearch, etc.) is exposed as a separate method name.
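As a rough illustration of how the three components combine, the following Python 3 sketch assembles a request URL. The `<endpoint>/<method>?<query>` layout, the use of the project home page as the endpoint and the `apikey` parameter name are assumptions made for this example; the project homepage documents the exact format.

```python
from urllib.parse import urlencode

# Assumed endpoint: the project home page host.
ENDPOINT = "http://entrezajax.appspot.com"


def build_request_url(method, params, apikey, endpoint=ENDPOINT):
    """Join endpoint, method name and URL-encoded parameters into one URL.
    The layout and the "apikey" parameter name are illustrative assumptions."""
    query = dict(params)
    query["apikey"] = apikey  # identifies the originator of the request
    # Sort the parameters so the generated URL is deterministic.
    encoded = urlencode(sorted(query.items()))
    return "%s/%s?%s" % (endpoint.rstrip("/"), method, encoded)
```

Under these assumptions, `build_request_url("esearch", {"db": "pubmed", "term": "Loman NJ"}, "KEY")` returns `http://entrezajax.appspot.com/esearch?apikey=KEY&db=pubmed&term=Loman+NJ`. Using `urlencode` ensures that spaces and other special characters in search terms are escaped correctly.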
Developers wishing to access the Entrez eUtils 'ESearch' application should construct a URL in the following format, substituting <APIKEY>:
To accommodate common usage patterns, combined methods allow two calls to Entrez eUtils to be chained together. In these cases, the first method returns a list of GI numbers, which are not returned to the user but are instead passed to the second method via the id parameter. Table 1 lists the available methods.
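The chaining of combined methods can be pictured as a small function composition. In this hypothetical sketch, `first` and `second` are placeholders for functions that call two eUtils applications; the GI numbers from the first call are joined into the comma-separated `id` parameter for the second, and only the second result reaches the caller. Passing the remaining parameters through unchanged is a simplifying assumption.

```python
def chain(first, second, params):
    """Run `first` (which returns a list of GI numbers), then pass those
    numbers to `second` as a comma-separated "id" parameter.
    Only the result of `second` is returned to the caller."""
    gi_numbers = first(params)
    second_params = dict(params)  # copy so the caller's dict is untouched
    second_params["id"] = ",".join(str(gi) for gi in gi_numbers)
    return second(second_params)
```

In a backend like the one described, `first` and `second` might wrap Biopython's `Entrez.esearch` and `Entrez.efetch`; here they are plain callables so the chaining logic can be seen in isolation.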
NCBI imposes strict per-developer limits on use of the Entrez Programming Utilities service. Where practical, we have enforced these limits in code: the service will not pass more than three requests to Entrez within one second. Additionally, the tool and email parameters are filled in automatically using the information supplied when registering for a developer API key. We urge users of this service to familiarise themselves with the NCBI usage limits and to ensure that their applications comply with them.
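One way to enforce the three-requests-per-second limit and the automatic tool/email parameters is sketched below. This is an illustration under stated assumptions, not the EntrezAJAX code: the sliding-window `RateLimiter` rejects excess requests rather than queuing them, and the `developer` record with `tool` and `email` fields is a hypothetical stand-in for the stored registration details.

```python
import collections
import time


class RateLimiter:
    """Sliding-window limiter: allow at most `max_requests` requests in any
    window of `period` seconds (three per second by default)."""

    def __init__(self, max_requests=3, period=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.period = period
        self.clock = clock
        self._sent = collections.deque()  # timestamps of recently passed requests

    def try_acquire(self):
        """Return True if a request may be passed to Entrez now, else False."""
        now = self.clock()
        # Forget requests that have fallen outside the current window.
        while self._sent and now - self._sent[0] >= self.period:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False


def add_registration_details(params, developer):
    """Return a copy of params with NCBI's tool and email parameters taken
    from the (hypothetical) developer registration record."""
    filled = dict(params)
    filled["tool"] = developer["tool"]
    filled["email"] = developer["email"]
    return filled
```

A caller that receives `False` from `try_acquire()` could wait briefly and retry. This limiter only counts requests seen by a single process; a service running across several GAE instances would need the counter kept in shared storage for the limit to hold globally.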
The EntrezAJAX project website provides example code for using EntrezAJAX, including retrieval of results from PubMed and GenBank, retrieval of journal articles related to a nucleotide sequence, and automatic correction of users' spelling. EntrezAJAX is also heavily used on the authors' own xBASE resource for comparative bacterial genomics.
The use of GAE has significant advantages for implementing services such as EntrezAJAX, including a large in-memory cache, persistent data storage, a distributed network infrastructure and automatic failover mechanisms. During the development of EntrezAJAX we did not encounter any period when the service was unavailable. However, the application sometimes took several seconds to respond, probably because GAE was starting up a new process to serve the request.
Currently any user may deploy an application on GAE for free. However, the application must stay within certain quotas; otherwise it may be prevented from serving further requests until the quota period has elapsed. Quotas are subject to change, but the important limits to consider when running this service include the incoming HTTP request limit, the UrlFetch limit and the Memcache API limit. We believe the limits of the free tier are sufficient to cater for likely demand for the service in the near future, but we plan to monitor usage in case they are reached. In that case, we will contact heavy users of the service and may suggest that they deploy the EntrezAJAX application from their own Google App Engine account and update their endpoint details accordingly.
The EntrezAJAX service depends on the availability of NCBI eUtils to work correctly. If NCBI eUtils is unavailable, requests cannot be fulfilled unless the result is already stored in the cache.
EntrezAJAX is a working implementation of direct web browser access to biomedical resources available on the web, and we encourage users wishing to access other resources via AJAX to contribute code. However, we believe this intelligent proxy approach represents a stepping-stone along the path to more integrated biomedical resources on the web. We are actively looking for other bioinformatics web resources that would benefit from an interface similar to EntrezAJAX. We also hope this project will inspire developers to invest the time and energy in producing AJAX-compatible endpoints for their databases.
Project name: EntrezAJAX
Project home page: http://entrezajax.appspot.com/
Source code home page: http://github.com/nickloman/entrezajax
Operating system(s): Platform-independent
Programming language: Python 2.5+
Other requirements: Django 1.1+, Biopython 1.53+, Google App Engine
License: Apache License, Version 2.0
Any restrictions to use by non-academics: None
API: Application Programming Interface; HTTP: HyperText Transfer Protocol
The authors declare that they have no competing interests.
NJL conceived and implemented the software. NJL and MJP jointly drafted the manuscript. Both authors have read and approved the final manuscript.
The xBASE facility and Loman's position are funded by BBSRC grant BB/E011179/1.