1.  Local Antiglycan Antibody Responses to Skin Stage and Migratory Schistosomula of Schistosoma japonicum 
Infection and Immunity  2015;84(1):21-33.
Schistosomiasis is a tropical disease affecting over 230 million people worldwide. Although effective drug treatment is available, reinfections are common, and development of immunity is slow. Most antibodies raised during schistosome infection are directed against glycans, some of which are thought to be protective. Developing schistosomula are considered most vulnerable to immune attack, and better understanding of local antibody responses raised against glycans expressed by this life stage might reveal possible glycan vaccine candidates for future vaccine research. We used antibody-secreting cell (ASC) probes to characterize local antiglycan antibody responses against migrating Schistosoma japonicum schistosomula in different tissues of rats. Analysis by shotgun Schistosoma glycan microarray resulted in the identification of antiglycan antibody response patterns that reflected the migratory pathway of schistosomula. Antibodies raised by skin lymph node (LN) ASC probes mainly targeted N-glycans with terminal mannose residues, Galβ1-4GlcNAc (LacNAc) and Galβ1-4(Fucα1-3)GlcNAc (LeX). Also, responses to antigenic and schistosome-specific glycosphingolipid (GSL) glycans containing highly fucosylated GalNAcβ1-4(GlcNAcβ1)n stretches that are believed to be present at the parasite's surface constitutively upon transformation were found. Antibody targets recognized by lung LN ASC probes were mainly N-glycans presenting GalNAcβ1-4GlcNAc (LDN) and GlcNAc motifs. Surprisingly, antibodies against highly antigenic multifucosylated motifs of GSL glycans were not observed in lung LN ASC probes, indicating that these antigens are not expressed in lung stage schistosomula or are not appropriately exposed to induce immune responses locally. The local antiglycan responses observed in this study highlight the stage- and tissue-specific expression of antigenic parasite glycans and provide insights into glycan targets possibly involved in resistance to S. japonicum infection.
PMCID: PMC4694003  PMID: 26459512
2.  Generation of a Novel Bacteriophage Library Displaying scFv Antibody Fragments from the Natural Buffalo Host to Identify Antigens from Adult Schistosoma japonicum for Diagnostic Development 
PLoS Neglected Tropical Diseases  2015;9(12):e0004280.
The development of effective diagnostic tools will be essential in the continuing fight to reduce schistosome infection; however, the diagnostic tests available to date are generally laborious and difficult to implement in current parasite control strategies. We generated a series of single-chain antibody Fv domain (scFv) phage display libraries from the portal lymph node of field exposed water buffaloes, Bubalus bubalis, 11–12 days post challenge with Schistosoma japonicum cercariae. The selected scFv-phages showed clear enrichment towards adult schistosomes and excretory-secretory (ES) proteins by immunofluorescence, ELISA and western blot analysis. The enriched libraries were used to probe a schistosome specific protein microarray resulting in the recognition of a number of proteins, five of which were specific to schistosomes, with RNA expression predominantly in the adult life-stage based on interrogation of schistosome expressed sequence tags (EST). As the libraries were enriched by panning against ES products, these antigens may be excreted or secreted into the host vasculature and hence may make good targets for a diagnostic assay. Further selection of the scFv library against infected mouse sera identified five soluble scFv clones that could selectively recognise soluble whole adult preparations (SWAP) relative to an irrelevant protein control (ovalbumin). Furthermore, two of the identified scFv clones also selectively recognised SWAP proteins when spiked into naïve mouse sera. These host B-cell derived scFvs that specifically bind to schistosome protein preparations will be valuable reagents for further development of a cost effective point-of-care diagnostic test.
Author Summary
Mass drug administration using the highly effective drug praziquantel (PZQ) is currently the method of choice to combat schistosomiasis. However, this treatment regime has limitations; in particular, it does not prevent re-infection, and sporadic parasite resistance against PZQ is a continuing threat. The path to the successful control of schistosomiasis is highly challenging and must consider not only the complex nature of the host-parasite interaction, but also the capacity to assess disease burden and parasite re-emergence in communities where successful control has been achieved. Furthermore, control programs must be economically sustainable in endemic countries and, despite significant recent advancements, the elimination of schistosomiasis may still be some time away. Accordingly, there is a definite need to formulate innovative approaches for the development of improved diagnostic tools to accurately assess the disease burden associated with active schistosome infections. Here we describe the usefulness of a phage display library to mature antibody fragments derived from lymph node RNA of the natural buffalo host of the Asian schistosome, Schistosoma japonicum, following an experimental infection. These mature antibody fragments were able to bind native parasite proteins and could thus be used to develop a low cost and accurate point-of-care diagnostic test.
PMCID: PMC4686158  PMID: 26684756
3.  Using EMBL-EBI services via Web interface and programmatically via Web Services 
The European Bioinformatics Institute (EMBL-EBI) provides access to a wide range of databases and analysis tools that are of key importance in bioinformatics. As well as providing Web interfaces to these resources, Web Services are available using SOAP and REST protocols that enable programmatic access to our resources and allow their integration into other applications and analytical workflows.
This unit describes the various options available to a typical researcher or bioinformatician who wishes to use our resources via Web interface or programmatically via a range of programming languages.
PMCID: PMC4312015  PMID: 25501941
Web Services; Programmatic access; SOAP; REST; analytical pipelines; workflows
4.  The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI 
Nucleic Acids Research  2015;43(Web Server issue):W585-W588.
The European Bioinformatics Institute (EMBL-EBI) provides free and unrestricted access to data across all major areas of biology and biomedicine. Searching and extracting knowledge across these domains requires a fast and scalable solution that addresses the requirements of domain experts as well as casual users. We present the EBI Search engine, referred to here as ‘EBI Search’, an easy-to-use fast text search and indexing system with powerful data navigation and retrieval capabilities. API integration provides access to analytical tools, allowing users to further investigate the results of their search. The interconnectivity that exists between data resources at EMBL-EBI provides easy, quick and precise navigation and a better understanding of the relationship between different data types including sequences, genes, gene products, proteins, protein domains, protein families, enzymes and macromolecular structures, together with relevant life science literature.
PMCID: PMC4489232  PMID: 25855807
5.  The EMBL-EBI bioinformatics web and programmatic tools framework 
Nucleic Acids Research  2015;43(Web Server issue):W580-W584.
Since 2009 the EMBL-EBI Job Dispatcher framework has provided free access to a range of mainstream sequence analysis applications. These include sequence similarity search services such as BLAST, FASTA and PSI-Search, multiple sequence alignment tools such as Clustal Omega, MAFFT and T-Coffee, and other sequence analysis tools such as InterProScan. Through these services users can search mainstream sequence databases such as ENA, UniProt and Ensembl Genomes, utilising a uniform web interface or systematically through Web Services interfaces using common programming languages, and obtain enriched results with novel visualisations. Integration with EBI Search and the dbfetch retrieval service further expands the usefulness of the framework. New tools and updates such as NCBI BLAST+, InterProScan 5 and PfamScan, new categories such as RNA analysis tools, new databases such as ENA non-coding, WormBase ParaSite, Pfam and Rfam, and new workflow methods, together with the retirement of deprecated services, ensure that the framework remains relevant to today's biological community.
PMCID: PMC4489272  PMID: 25845596
6.  A molecular basis underpinning the T cell receptor heterogeneity of mucosal-associated invariant T cells 
The Journal of Experimental Medicine  2014;211(8):1585-1600.
A novel MAIT cell antagonist, Ac-6-FP, stabilizes MR1 and can inhibit MAIT cell activation with the flexible TCR β-chain serving to fine-tune the affinity of the TCR for antigen-MR1 complexes.
Mucosal-associated invariant T (MAIT) cells express an invariant T cell receptor (TCR) α-chain (TRAV1-2 joined to TRAJ33, TRAJ20, or TRAJ12 in humans), which pairs with an array of TCR β-chains. MAIT TCRs can bind folate- and riboflavin-based metabolites restricted by the major histocompatibility complex (MHC)-related class I−like molecule, MR1. However, the impact of MAIT TCR and MR1-ligand heterogeneity on MAIT cell biology is unclear. We show how a previously uncharacterized MR1 ligand, acetyl-6-formylpterin (Ac-6-FP), markedly stabilized MR1, potently up-regulated MR1 cell surface expression, and inhibited MAIT cell activation. These enhanced properties of Ac-6-FP were attributable to structural alterations in MR1 that subsequently affected MAIT TCR recognition via conformational changes within the complementarity-determining region (CDR) 3β loop. Analysis of seven TRBV6-1+ MAIT TCRs demonstrated how CDR3β hypervariability impacted on MAIT TCR recognition by altering TCR flexibility and contacts with MR1 and the Ag itself. Ternary structures of TRBV6-1, TRBV6-4, and TRBV20+ MAIT TCRs in complex with MR1 bound to a potent riboflavin-based antigen (Ag) showed how variations in TRBV gene usage exclusively impacted on MR1 contacts within a consensus MAIT TCR-MR1 footprint. Moreover, differential TRAJ gene usage was readily accommodated within a conserved MAIT TCR-MR1-Ag docking mode. Collectively, MAIT TCR heterogeneity can fine-tune MR1 recognition in an Ag-dependent manner, thereby modulating MAIT cell recognition.
PMCID: PMC4113946  PMID: 25049336
7.  Discovery of novel Schistosoma japonicum antigens using a targeted protein microarray approach 
Parasites & Vectors  2014;7:290.
Novel vaccine candidates against Schistosoma japonicum are required, and antigens present in the vulnerable larval developmental stage are attractive targets. Post-genomic technologies are now available which can contribute to such antigen discovery.
A schistosome-specific protein microarray was probed using the local antibody response against migrating larvae. Antigens were assessed for their novelty and predicted larval expression and host-exposed features. One antigen was further characterised and its sequence and structure were analysed in silico. Real-time polymerase chain reaction was used to analyse transcript expression throughout development, and immunoblotting and enzyme-linked immunosorbent assays employed to determine antigen recognition by antibody samples.
Several known and novel antigens were discovered, two of which showed up-regulated transcription in schistosomula. One novel antigen, termed S. japonicum Ly-6-like protein 1 (Sj-L6L-1), was further characterised and shown to share structural and sequence features with the Ly-6 protein family. It was found to be present in the worm tegument and expressed in both the larval and adult worms, but was found to be antigenic only in the lungs that the larvae migrate to and traverse.
This study represents a novel approach to vaccine antigen discovery and may contribute to schistosome vaccine development against this important group of human and veterinary pathogens.
PMCID: PMC4080988  PMID: 24964958
Schistosoma japonicum; Vaccine development; Ly-6 proteins; Protein microarray; Immunomics
8.  InterProScan 5: genome-scale protein function classification 
Bioinformatics  2014;30(9):1236-1240.
Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code.
Availability and implementation: InterProScan is distributed via FTP, and the source code is openly available.
PMCID: PMC3998142  PMID: 24451626
9.  Assembly information services in the European Nucleotide Archive 
Nucleic Acids Research  2013;42(Database issue):D38-D43.
The European Nucleotide Archive (ENA) is a repository for the world public domain nucleotide sequence data output. ENA content covers a spectrum of data types including raw reads, assembly data and functional annotation. ENA has faced a dramatic growth in genome assembly submission rates, data volumes and complexity of datasets. This has prompted a broad reworking of assembly submission services, for which we now reach the end of a major programme of work and many enhancements have already been made available over the year to components of the submission service. In this article, we briefly review ENA content and growth over 2013, describe our rapidly developing services for genome assembly information and outline further major developments over the last year.
PMCID: PMC3965037  PMID: 24214989
10.  Local Immune Responses of the Chinese Water Buffalo, Bubalus bubalis, against Schistosoma japonicum Larvae: Crucial Insights for Vaccine Design 
Asian schistosomiasis is a zoonotic parasitic disease infecting up to a million people and threatening tens of millions more. Control of this disease is hindered by the animal reservoirs of the parasite, in particular the water buffalo (Bubalus bubalis), which is responsible for significant levels of human transmission. A transmission-blocking vaccine administered to buffaloes is a realistic option which would aid in the control of schistosomiasis. This will however require a better understanding of the immunobiology of schistosomiasis in naturally exposed buffaloes, particularly the immune response to migrating schistosome larvae, which are the likely targets of an anti-schistosome vaccine. To address this need we investigated the immune response at the major sites of larval migration, the skin and the lungs, in previously exposed and re-challenged water buffaloes. In the skin, a strong allergic-type inflammatory response occurred, characterised by leukocyte and eosinophil infiltration including the formation of granulocytic abscesses. Additionally at the local skin site, interleukin-5 transcript levels were elevated, while interleukin-10 levels decreased. In the skin-draining lymph node (LN) a predominant type-2 profile was seen in stimulated cells, while in contrast a type-1 profile was detected in the lung draining LN, and these responses occurred consecutively, reflecting the timing of parasite migration. The intense type-2 immune response at the site of cercarial penetration is significantly different to that seen in naive and permissive animal models such as mice, and suggests a possible mechanism for immunity. Preliminary data also suggest a reduced and delayed immune response occurred in buffaloes given high cercarial challenge doses compared with moderate infections, particularly in the skin. This study offers a deeper understanding into the immunobiology of schistosomiasis in a natural host, which may aid in the future design of more effective vaccines.
Author Summary
Schistosomiasis is caused by a parasitic blood fluke, and in parts of Asia it infects both humans and livestock such as water buffaloes. This makes controlling the disease more difficult, because both humans and livestock must be treated regularly. A vaccine given to buffaloes is likely to reduce human infection rates and improve buffalo health by providing long-lasting protection from re-infection; at present no vaccines are available. Older buffaloes are known to have some immunity to schistosomiasis which is acquired over time; however, how this occurs is not understood. In this study we investigated the immune response of buffalo against the schistosome larvae, which are vulnerable to immune attack, and hence are the ideal stage to target for vaccination. We found that the buffalo produces a profound allergic type-2 response as larvae penetrate the skin, with significant cellular infiltrates and abscesses. When the larvae move next to the lungs, a uniquely type-1 response was induced. This skin response is much greater than that seen in more susceptible animals such as mice, and may be a mechanism for larval killing in the buffalo. This study offers insight into the immunobiology of an important host for schistosomiasis and may help in designing better vaccines.
PMCID: PMC3784499  PMID: 24086786
11.  Analysis Tool Web Services from the EMBL-EBI 
Nucleic Acids Research  2013;41(Web Server issue):W597-W600.
Since 2004 the European Bioinformatics Institute (EMBL-EBI) has provided access to a wide range of databases and analysis tools via Web Services interfaces. This comprises services to search across the databases available from the EMBL-EBI and to explore the network of cross-references present in the data (e.g. EB-eye), services to retrieve entry data in various data formats and to access the data in specific fields (e.g. dbfetch), and analysis tool services, for example, sequence similarity search (e.g. FASTA and NCBI BLAST), multiple sequence alignment (e.g. Clustal Omega and MUSCLE), pairwise sequence alignment and protein functional analysis (e.g. InterProScan and Phobius). The REST/SOAP Web Services interfaces to these databases and tools allow their integration into other tools, applications, web sites, pipeline processes and analytical workflows. To get users started using the Web Services, sample clients are provided covering a range of programming languages and popular Web Service tool kits, and a brief guide to Web Services technologies, including a set of tutorials, is available for those wishing to learn more and develop their own clients. Users of the Web Services are informed of improvements and updates via a range of methods.
PMCID: PMC3692137  PMID: 23671338
12.  EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats 
Bioinformatics  2013;29(10):1325-1332.
Motivation: Advancing the search, publication and integration of bioinformatics tools and resources demands consistent machine-understandable descriptions. A comprehensive ontology allowing such descriptions is therefore required.
Results: EDAM is an ontology of bioinformatics operations (tool or workflow functions), types of data and identifiers, application domains and data formats. EDAM supports semantic annotation of diverse entities such as Web services, databases, programmatic libraries, standalone tools, interactive applications, data schemas, datasets and publications within bioinformatics. EDAM applies to organizing and finding suitable tools and data and to automating their integration into complex applications or workflows. It includes over 2200 defined concepts and has successfully been used for annotations and implementations.
Availability: The latest stable version of EDAM is available in OWL format and in OBO format. It can be viewed online at the NCBO BioPortal and the EBI Ontology Lookup Service. This article describes version 1.2.
PMCID: PMC3654706  PMID: 23479348
13.  The Annotation-enriched non-redundant patent sequence databases 
The EMBL-European Bioinformatics Institute (EMBL-EBI) offers public access to patent sequence data, providing a valuable service to the intellectual property and scientific communities. The non-redundant (NR) patent sequence databases comprise two-level nucleotide and protein sequence clusters (NRNL1, NRNL2, NRPL1 and NRPL2) based on sequence identity (level-1) and patent family (level-2). Annotation from the source entries in these databases is merged and enhanced with additional information from the patent literature and biological context. Corrections in patent publication numbers, kind-codes and patent equivalents significantly improve the data quality. Data are available through various user interfaces including web browser, downloads via FTP, SRS, Dbfetch and EBI-Search. Sequence similarity/homology searches against the databases are available using BLAST, FASTA and PSI-Search. In this article, we describe the data collection and annotation and also outline major changes and improvements introduced since 2009. Apart from data growth, these changes include additional annotation for singleton clusters, the identifier versioning for tracking entry change and the entry mappings between the two-level databases.
PMCID: PMC3568390  PMID: 23396323
14.  Facing growth in the European Nucleotide Archive 
Nucleic Acids Research  2012;41(Database issue):D30-D35.
The European Nucleotide Archive (ENA) collects, maintains and presents comprehensive nucleic acid sequence and related information as part of the permanent public scientific record. Here, we provide brief updates on ENA content developments and major service enhancements in 2012 and describe in more detail two important areas of development and policy that are driven by ongoing growth in sequencing technologies. First, we describe the ENA data warehouse, a resource for which we provide a programmatic entry point to integrated content across the breadth of ENA. Second, we detail our plans for the deployment of CRAM data compression technology in ENA.
PMCID: PMC3531187  PMID: 23203883
15.  IPD—the Immuno Polymorphism Database 
Nucleic Acids Research  2012;41(Database issue):D1234-D1240.
The Immuno Polymorphism Database (IPD) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project.
PMCID: PMC3531162  PMID: 23180793
16.  The IMGT/HLA database 
Nucleic Acids Research  2012;41(Database issue):D1222-D1227.
It is 14 years since the IMGT/HLA database was first released, providing the HLA community with a searchable repository of highly curated HLA sequences. The HLA complex is located within the 6p21.3 region of human chromosome 6 and contains more than 220 genes of diverse function. Of these, 21 genes encode proteins of the immune system that are highly polymorphic. The naming of these HLA genes and alleles and their quality control is the responsibility of the World Health Organization Nomenclature Committee for Factors of the HLA System. Through the work of the HLA Informatics Group and in collaboration with the European Bioinformatics Institute, we are able to provide public access to these data through the website. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the HLA community and the wider research and clinical communities. This article describes the latest updates and additional tools added to the IMGT/HLA project.
PMCID: PMC3531221  PMID: 23080122
17.  PSI-Search: iterative HOE-reduced profile SSEARCH searching 
Bioinformatics  2012;28(12):1650-1651.
Summary: Iterative similarity searches with PSI-BLAST position-specific score matrices (PSSMs) find many more homologs than single searches, but PSSMs can be contaminated when homologous alignments are extended into unrelated protein domains, a problem known as homologous over-extension (HOE). PSI-Search combines an optimal Smith–Waterman local alignment sequence search, using SSEARCH, with the PSI-BLAST profile construction strategy. An optional sequence boundary-masking procedure, which prevents alignments from being extended after they are initially included, can reduce HOE errors in the PSSM profile. Preventing HOE improves selectivity for both PSI-BLAST and PSI-Search, but PSI-Search has ~4-fold better selectivity than PSI-BLAST and similar sensitivity at 50% and 60% family coverage. PSI-Search also produces 2- to 4-fold fewer false-positives than JackHMMER, but is ~5% less sensitive.
Availability and implementation: PSI-Search is available from the authors as a standalone implementation written in Perl for Linux-compatible platforms. It is also available through a web interface and through SOAP and REST Web Services.
PMCID: PMC3371869  PMID: 22539666
18.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega 
Multiple sequence alignments are fundamental to many sequence analysis methods. The new program Clustal Omega can align virtually any number of protein sequences quickly and has powerful features for adding sequences to existing precomputed alignments.
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
PMCID: PMC3261699  PMID: 21988835
bioinformatics; hidden Markov models; multiple sequence alignment
19.  Fast and efficient searching of biological data resources—using EB-eye 
Briefings in Bioinformatics  2010;11(4):375-384.
The EB-eye is a fast and efficient search engine that provides easy and uniform access to the biological data resources hosted at the EMBL-EBI. Currently, users can access information from more than 62 distinct datasets covering some 400 million entries. The data resources represented in the EB-eye include: nucleotide and protein sequences at both the genomic and proteomic levels, structures ranging from chemicals to macro-molecular complexes, gene-expression experiments, binary level molecular interactions as well as reaction maps and pathway models, functional classifications, biological ontologies, and comprehensive literature libraries covering the biomedical sciences and related intellectual property. The EB-eye can be accessed over the web or programmatically using a SOAP Web Services interface. This allows its search and retrieval capabilities to be exploited in workflows and analytical pipe-lines. The EB-eye is a novel alternative to existing biological search and retrieval engines. In this article we describe in detail how to exploit its powerful capabilities.
PMCID: PMC2905521  PMID: 20150321
text search; biological databases; integration; interoperability; web services; Apache Lucene
20.  The IMGT/HLA database 
Nucleic Acids Research  2010;39(Database issue):D1171-D1176.
It is 12 years since the IMGT/HLA database was first released, providing the HLA community with a searchable repository of highly curated HLA sequences. The HLA complex is located within the 6p21.3 region of human chromosome 6 and contains more than 220 genes of diverse function. Many of the genes encode proteins of the immune system and are highly polymorphic. The naming of these HLA genes and alleles and their quality control is the responsibility of the WHO Nomenclature Committee for Factors of the HLA System. Through the work of the HLA Informatics Group and in collaboration with the European Bioinformatics Institute, we are able to provide public access to this data through the web site. Regular updates to the web site ensure that new and confirmatory sequences are dispersed to the HLA community, and the wider research and clinical communities.
PMCID: PMC3013815  PMID: 21071412
21.  A new bioinformatics analysis tools framework at EMBL–EBI 
Nucleic Acids Research  2010;38(Web Server issue):W695-W699.
The EMBL-EBI provides access to various mainstream sequence analysis applications. These include sequence similarity search services such as BLAST, FASTA and InterProScan, and multiple sequence alignment tools such as ClustalW, T-Coffee and MUSCLE. Through the sequence similarity search services, the users can search mainstream sequence databases such as EMBL-Bank and UniProt, and more than 2000 completed genomes and proteomes. We present here a new framework aimed at both novice and expert users that exposes novel methods of obtaining annotations and visualizing sequence analysis results through one uniform and consistent interface. These services are available over the web and via Web Services interfaces for users who require systematic access or want to interface with customized pipe-lines and workflows using common programming languages. The framework features novel result visualizations and integration of domain and functional predictions for protein database searches. It is available online for sequence similarity searches and for multiple sequence alignments.
PMCID: PMC2896090  PMID: 20439314
22.  Improvements to services at the European Nucleotide Archive 
Nucleic Acids Research  2009;38(Database issue):D39-D45.
The European Nucleotide Archive (ENA) is Europe’s primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL–EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.
PMCID: PMC2808951  PMID: 19906712
23.  Non-redundant patent sequence databases with value-added annotations at two levels 
Nucleic Acids Research  2009;38(Database issue):D52-D56.
The European Bioinformatics Institute (EMBL-EBI) provides public access to patent data, including abstracts, chemical compounds and sequences. Sequences can appear multiple times due to the filing of the same invention with multiple patent offices, or the use of the same sequence by different inventors in different contexts. Information relating to the source invention may be incomplete, and biological information available in patent documents elsewhere may not be reflected in the annotation of the sequence. Search and analysis of these data have become increasingly challenging for both the scientific and intellectual-property communities. Here, we report a collection of non-redundant patent sequence databases, which cover the EMBL-Bank nucleotides patent class and the patent protein databases and contain value-added annotations from patent documents. The databases were created at two levels by the use of sequence MD5 checksums. Sequences within a level-1 cluster are 100% identical over their whole length. Level-2 clusters were defined by sub-grouping level-1 clusters based on patent family information. Value-added annotations, such as publication number corrections, earliest publication dates and feature collations, significantly enhance the quality of the data, allowing for better tracking and cross-referencing. The databases are available online.
PMCID: PMC2808894  PMID: 19884134
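The two-level clustering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout `(entry_id, sequence, patent_family_id)`, the case and whitespace normalization, and the function names are all assumptions made for illustration. Level-1 groups entries whose sequences share an MD5 checksum; level-2 sub-groups each level-1 cluster by patent family.

```python
import hashlib
from collections import defaultdict


def md5_of_sequence(seq):
    """Return the MD5 hex digest of a sequence.

    Stripping whitespace and upper-casing is an assumed normalization,
    so that formatting differences do not split identical sequences.
    """
    normalized = "".join(seq.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def cluster_two_levels(records):
    """Cluster hypothetical (entry_id, sequence, patent_family_id) records.

    Returns (level1, level2):
      level1 maps checksum -> list of entry ids with identical sequences
      level2 maps checksum -> {patent_family_id: list of entry ids}
    """
    level1 = defaultdict(list)
    level2 = defaultdict(lambda: defaultdict(list))
    for entry_id, seq, family_id in records:
        checksum = md5_of_sequence(seq)
        level1[checksum].append(entry_id)             # identical over full length
        level2[checksum][family_id].append(entry_id)  # sub-group by patent family
    # Convert to plain dicts so callers get ordinary, comparable containers
    return dict(level1), {k: dict(v) for k, v in level2.items()}
```

Hashing the full sequence means that entries share a level-1 cluster only when they are identical over their whole length, barring a checksum collision. A production pipeline would also need to define how patent families are resolved, which this sketch leaves out.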
24.  IPD—the Immuno Polymorphism Database 
Nucleic Acids Research  2009;38(Database issue):D863-D869.
The Immuno Polymorphism Database (IPD) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, which covers alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data is currently available online from the website and FTP directory.
PMCID: PMC2808958  PMID: 19875415
25.  Web services at the European Bioinformatics Institute-2009 
Nucleic Acids Research  2009;37(Web Server issue):W6-W10.
The European Bioinformatics Institute (EMBL-EBI) has been providing access to mainstream databases and tools in bioinformatics since 1997. In addition to the traditional web form based interfaces, APIs exist for core data resources such as EMBL-Bank, Ensembl, UniProt, InterPro, PDB and ArrayExpress. These APIs are based on Web Services (SOAP/REST) interfaces that allow users to systematically access databases and analytical tools. From the user's point of view, these Web Services provide the same functionality as the browser-based forms. However, using the APIs frees the user from web page constraints and is ideal for the analysis of large batches of data, performing text-mining tasks and the casual or systematic evaluation of mathematical models in regulatory networks. Furthermore, these services are widespread and easy to use; they require no prior knowledge of the technology and no more than basic experience in programming. Here we report new and updated services and briefly describe planned developments to be made available during the course of 2009–2010.
PMCID: PMC2703973  PMID: 19435877

