Here, we describe a pre-derivation embryo haplotyping strategy that we developed in order to maximize the efficiency and minimize the costs of establishing banks of clinical grade hESC lines in which human leukocyte antigen (HLA) haplotypes match a significant proportion of the population. Using whole genome amplification followed by medium resolution HLA typing using PCR amplification with sequence-specific primers (PCR-SSP), we have typed the parents, embryos and hESC lines from three families as well as our eight clinical grade hESC lines and shown that this technical approach is rapid, reliable and accurate. By employing this pre-derivation strategy where, based on HLA match, embryos are selected for a GMP route on day 3–4 of development, we would have drastically reduced our cGMP laboratory running costs.
clinical grade stem cells; clinical grade human embryonic stem cell validation and banking; HLA typing; human embryonic stem cells; single cell analyses
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
The AID/APOBEC family (activation induced deaminase/apolipoprotein B mRNA editing cytidine deaminase) in B cells plays important roles in adaptive and innate immunity. Whereas APOBEC3G has been studied in CD4+ T cells and myeloid cells, its functional potential in B cells has received little attention. AID combines two critical functions of antibodies, class switching and affinity maturation, and may serve as a functional surrogate of protection. These functions were studied following systemic immunization of rhesus macaques with recombinant HLA constructs, linked with HIV and SIV antigens and HSP70 to dextran. The results showed significant upregulation of AID in CD20+ B cells, APOBEC 3G in CD27+ memory B cells and CD4+ effector memory T cells. After immunization the upregulated APOBEC 3G and AID were directly correlated in B cells (p<0.0001). Following challenge with SHIV SF162.P4 the viral load was inversely correlated with AID in B cells and APOBEC 3G in B and T cells, suggesting that both deaminases may have protective functions. Investigation of major interactions between DC, T cells and B cells showed a significant increase in membrane associated IL-15 in DC and CD40L in CD4+ T cells. IL-15 binds the IL-15 receptor complex in CD4+ T and B cells, which may reactivate the DC, T and B cell interactions. The overall results are consistent with AID inhibiting pre-entry SHIV by eliciting IgG and IgA antibodies, whereas APOBEC 3G may contribute to the post-entry control of SHIV replication and cellular spread.
Major histocompatibility complex (MHC) molecules expressed on the surface of human immunodeficiency virus (HIV) are potential targets for neutralizing antibodies. Since MHC molecules are polymorphic, nonself MHC can also be immunogenic. We have used combinations of novel recombinant HLA class I and II and HIV/simian immunodeficiency virus (SIV) antigens, all linked to dextran, to investigate whether they can elicit protective immunity against heterologous simian/human immunodeficiency virus (SHIV) challenge in rhesus macaques. Three groups of animals were immunized with HLA (group 1, n = 8), trimeric YU2 HIV type 1 (HIV-1) gp140 and SIV p27 (HIV/SIV antigens; group 2, n = 8), or HLA plus HIV/SIV antigens (group 3, n = 8), all with Hsp70 and TiterMax Gold adjuvant. Another group (group 4, n = 6) received the same vaccine as group 3 without TiterMax Gold. Two of eight macaques in group 3 were completely protected against intravenous challenge with 18 50% animal infective doses (AID50) of SHIV-SF162P4/C grown in human cells expressing HLA class I and II lineages represented in the vaccine, while the remaining six macaques showed decreased viral loads compared to those in unimmunized animals. Complement-dependent neutralizing activity in serum and high levels of anti-HLA antibodies were elicited in groups 1 and 3, and both were inversely correlated with the plasma viral load at 2 weeks postchallenge. Antibody-mediated protection was strongly supported by the fact that transfer of pooled serum from the two challenged but uninfected animals protected two naïve animals against repeated low-dose challenge with the same SHIV stock. This study demonstrates that immunization with recombinant HLA in combination with HIV-1 antigens might be developed into an alternative strategy for a future AIDS vaccine.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.
Hepatitis C virus (HCV) nonstructural protein 5A (NS5A) exhibits a preference for G/U-rich RNA in vitro. Biological analysis of the NS5A RNA-binding activity and its target sites in the genome will be facilitated by a description of the NS5A-RNA complex. We demonstrate that the C-4 carbonyl of the uracil base and, by inference, the C-6 carbonyl of the guanine base interact with NS5A. U-rich RNA of 5 to 6 nucleotides (nt) is sufficient for high-affinity binding to NS5A. The minimal RNA-binding domain of NS5A consists of residues 2005 to 2221 (referred to as domain I-plus). This region of the protein includes the amino-terminal domain I as well as the subsequent linker that separates domains I and II. This linker region is the site of adaptive mutations. U-rich RNA-binding activity is not observed for an NS5A derivative containing only residues 2194 to 2419 (domains II and III). Mass spectrometric analysis of an NS5A-poly(rU) complex identified domains I and II as sites for interaction with RNA. Dimerization of NS5A was demonstrated by glutaraldehyde cross-linking. This dimerization is likely mediated by domain I-plus, as dimers of this protein are trapped by cross-linking. Dimers of the domain II-III protein are not observed. The monomer-dimer equilibrium of NS5A shifts in favor of dimer when U-rich RNA is present but not when A-rich RNA is present, consistent with an NS5A dimer being the RNA-binding-competent form of the protein. These data provide a molecular perspective of the NS5A-RNA complex and suggest possible mechanisms for regulation of HCV and cellular gene expression.
A single nucleotide polymorphism (SNP) of the gene encoding protein tyrosine phosphatase type 22 (PTPN22 620W) has recently been described as a strong common genetic risk factor for human autoimmune disease. We have analysed the association of PTPN22 620W in patients with Behçet's disease (BD).
Genomic DNA was obtained from 270 patients with BD from the UK and the Middle East. Normal controls (n = 203) were collected from the same populations. Patients with idiopathic retinal vasculitis from the UK (n = 136) were used as disease controls. PTPN22 620W was detected by SSP–PCR analysis and agarose gel electrophoresis.
The results showed an inverse correlation between the presence of PTPN22 620W and Behçet's disease in either patient group tested. There was a greatly reduced prevalence in Middle Eastern compared to UK patients and controls. Finally, there was no association in UK patients with retinal vasculitis compared with UK controls.
The presence of PTPN22 620W was inversely associated with BD and the distribution of the SNP in the Middle East supports previous findings in the global prevalence.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.
Rheumatoid arthritis (RA) is associated with reduced lifespan and shortened telomere length in lymphocytes, but the mechanism underlying this is unclear. Telomere loss in white blood cells (WBC) is accelerated by oxidative stress and inflammation in vitro. It was postulated that the accelerated WBC telomere shortening in RA occurs as a result of exposure to chronic inflammation.
To measure telomere terminal restriction fragment (TRF) length in a large cohort of RA cases and healthy controls, to explore associations of TRF length with features of disease and with RA‐associated HLA‐DRB1 alleles.
WBC TRF length was measured by Southern blot in DNA from 176 hospital‐based RA cases satisfying the 1987 American College of Rheumatology criteria and from 1151 controls. TRF length was compared between cases and controls, and the effects of disease duration, severity and HLA‐DRB1 alleles encoding the shared epitope (SE) were assessed.
Age‐ and sex‐adjusted TRF length was significantly shorter in RA cases compared with controls (p<0.001). There was no association between age‐ and sex‐adjusted TRF length and disease duration, C reactive protein or Larsen score. The presence of one or more SE‐encoding alleles was associated with reduced adjusted TRF length in RA cases (SE positive vs SE negative cases, p = 0.038), but not in controls.
TRF length was reduced in a large group of patients with RA compared with controls. The reduction is apparently independent of disease duration and markers of disease severity, but is influenced by HLA‐DRB1 genotype.
Epidemiological studies suggest that allogeneic immunity may inhibit HIV-1 transmission from mother to baby and that such transmission is less frequent in multiparous than uniparous women. Alloimmune responses may also be elicited during unprotected heterosexual intercourse, which is associated ex vivo with resistance to HIV infection.
The investigation was carried out in well-defined heterosexual and homosexual monogamous partners, practising unprotected sex and a heterosexual cohort practising protected sex. Allogeneic CD4+ and CD8+ T cell proliferative responses were elicited by stimulating PBMC with the partners' irradiated monocytes and compared with 3rd party unrelated monocytes, using the CFSE method. Significant increase in allogeneic proliferative responses was found in the CD4+ and CD8+ T cells to the partners' irradiated monocytes, as compared with 3rd party unrelated monocytes (p≤0.001). However, a significant decrease in proliferative responses, especially of CD8+ T cells to the partners' compared with 3rd party monocytes was consistent with tolerization, in both the heterosexual and homosexual partners (p<0.01). Examination of CD4+CD25+FoxP3+ regulatory T cells by flow cytometry revealed a significantly greater proportion of these cells in the homosexual than heterosexual partners practising unprotected sex (p<0.05). Ex vivo studies of infectivity of PBMC with HIV-1 showed significantly greater inhibition of infectivity of PBMC from heterosexual subjects practising unprotected compared with those practising protected sex (p = 0.02).
Both heterosexual and homosexual monogamous partners practising unprotected sex develop allogeneic CD4+ and CD8+ T cell proliferative responses to the partners' unmatched cells and a minority may be tolerized. However, a greater proportion of homosexual rather than heterosexual partners developed CD4+CD25+FoxP3+ regulatory T cells. These results, in addition to finding greater inhibition of HIV-1 infectivity in PBMC ex vivo in heterosexual partners practising unprotected, compared with those practising protected sex, suggest that allogeneic immunity may play a significant role in the immuno-pathogenesis of HIV-1 infection.
The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL–EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.
Viral capsid proteins (CPs) can regulate gene expression and encapsulate viral RNAs. Low-level expression of the brome mosaic virus (BMV) CP was found to stimulate viral RNA accumulation, while higher levels inhibited translation and BMV RNA replication. Regulation of translation acts through an RNA element named the B box, which is also critical for the replicase assembly. The BMV CP has also been shown to preferentially bind to an RNA element named SLC that contains the core promoter for genomic minus-strand RNA synthesis. To further elucidate CP interaction with RNA, we used a reversible cross-linking–peptide fingerprinting assay to identify peptides in the capsid that contact the SLC, the B-box RNA, and the encapsidated RNA. Transient expression of three mutations made in residues within or close by the cross-linked peptides partially released the normal inhibition of viral RNA accumulation in agroinfiltrated Nicotiana benthamiana. Interestingly, two of the mutants, R142A and D148A, were found to retain the ability to down-regulate reporter RNA translation. These two mutants formed viral particles in inoculated leaves, but only R142A was able to move systemically in the inoculated plant. The R142A CP was found to have higher affinities for SLC and the B box compared with those of wild-type CP and to alter contacts to the RNA in the virion. These results better define how the BMV CP can interact with RNA and regulate different viral processes. Available online 27 May 2009.
positive-strand RNA virus; brome mosaic virus; capsid protein; RNA replication; translation
Older people suffer from a decline in immune system, which affects their ability to respond to infections and to raise efficient responses to vaccines. Effective and specific antibodies in responses from older individuals are decreased in favour of non-specific antibody production. We investigated the B-cell repertoire in DNA samples from peripheral blood of individuals aged 86–94 years, and a control group aged 19–54 years, using spectratype analysis of the IGHV complementarity determining region (CDR)3. We found that a proportion of older individuals had a dramatic collapse in their B-cell repertoire diversity. Sequencing of polymerase chain reaction products from a selection of samples indicated that this loss of diversity was characterized by clonal expansions of B cells in vivo. Statistical analysis of the spectratypes enabled objective comparisons and showed that loss of diversity correlated very strongly with the general health status of the individuals; a distorted spectratype can be used to predict frailty. Correlations with survival and vitamin B12 status were also seen. We conclude that B-cell diversity can decrease dramatically with age and may have important implications for the immune health of older people. B-cell immune frailty is also a marker of general frailty.
aging; B cells; diversity; elderly; immune frailty
Acute anterior uveitis (AAU) is the most common form of uveitis and is thought to be autoimmune in nature. Recent studies have described genes that act as master controllers of autoimmunity. Protein tyrosine phosphatase type 22 (PTPN22) and Cytotoxic T lymphocyte antigen-4 (CTLA-4) are two of these genes, and single nucleotide polymorphisms (SNPs) in the genes encoding these molecules have been associated with several autoimmune diseases. In this study we have analyzed SNPs in PTPN22 and CTLA-4 in patients with AAU.
The functional protein tyrosine phosphatase type 22 (PTPN22) SNP (R620W rs2476601, 1858C/T), and two CTLA-4 SNPs (rs5742909, −318C/T and rs231775, 49A/G) were analyzed in 140 patients with AAU and 92 healthy controls by sequence-specific primer–polymerase chain reaction (SSP-PCR). Data were analyzed by χ2 analysis and Fisher’s exact test.
There was no significant association between PTPN22 620W, CTLA-4 −318C/T, or CTLA-4 49A/G and AAU. Similarly, there was no association with the three SNPs when patients were classified by race or gender. Finally, there was no association with the presence of ankylosing spondylitis in the patient cohort.
The data do not support an association between AAU and SNPs in PTPN22 and CTLA-4, genes regarded as genetic master switches of autoimmunity. This raises the issue of the etiology of AAU and the possibility that it should be regarded as an autoinflammatory rather than an autoimmune condition.
Toll-like receptor 3 (TLR3) can signal the production of a suite of cytokines and chemokines in response to double-stranded RNA (dsRNA) ligands or the dsRNA mimic poly(I-C). Using a human embryonic kidney 293T cell line to express human TLR3, we determined that poly(I-C)-induced signal could be significantly inhibited by single-stranded DNAs (ssDNAs), but not ssRNA or dsDNA. The ssDNA molecules that down-modulated TLR3 signaling did not affect TLR4 and do not require the hypomethylated CpG motif found in TLR9 ligands. The degree of modulation can be altered by the length, base sequence, and modification state of the ssDNAs. An inhibitory ssDNA was found to colocalize with TLR3 in transfected cells and in a cell line that naturally expresses TLR3. The inhibitory ssDNAs can compete efficiently with dsRNA for binding purified TLR3 ectodomains in vitro, while noninhibitory nucleic acids do not. The ssDNAs also decrease the levels of several cytokines produced by the human bronchial epithelial cell line BEAS-2B and by human peripheral blood mononuclear cells in response to poly(I-C) stimulation of native TLR3. These activities indicate that ssDNAs could be used to regulate the inflammatory response through TLR3.
Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.
The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.
This meeting report summarizes the proceedings of the “eGenomics: Cataloguing our Complete Genome Collection III” workshop held September 11–13, 2006, at the National Institute for Environmental eScience (NIEeS), Cambridge, United Kingdom. This 3rd workshop of the Genomic Standards Consortium was divided into two parts. The first half of the three-day workshop was dedicated to reviewing the genomic diversity of our current and future genome and metagenome collection, and exploring linkages to a series of existing projects through formal presentations. The second half was dedicated to strategic discussions. Outcomes of the workshop include a revised “Minimum Information about a Genome Sequence” (MIGS) specification (v1.1), consensus on a variety of features to be added to the Genome Catalogue (GCat), agreement by several researchers to adopt MIGS for imminent genome publications, and an agreement by the EBI and NCBI to input their genome collections into GCat for the purpose of quantifying the amount of optional data already available (e.g., for geographic location coordinates) and working towards a single, global list of all public genomes and metagenomes.
The EMBL Nucleotide Sequence Database at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data has continued to grow exponentially. Access to the data is provided via SRS, ftp and a variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page. Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.
The EMBL Nucleotide Sequence Database at the EMBL European Bioinformatics Institute, UK, offers a comprehensive set of publicly available nucleotide sequence and annotation, freely accessible to all. Maintained in collaboration with partners DDBJ and GenBank, coverage includes whole genome sequencing project data, directly submitted sequence, sequence recorded in support of patent applications and much more. The database continues to offer submission tools, data retrieval facilities and user support. In 2005, the volume of data offered has continued to grow exponentially. In addition to the newly presented data, the database encompasses a range of new data types generated by novel technologies, offers enhanced presentation and searchability of the data and has greater integration with other data resources offered at the EBI and elsewhere. In stride with these developing data types, the database has continued to develop submission and retrieval tools to maximise the information content of submitted data and to offer the simplest possible submission routes for data producers. New developments, the submission process, data retrieval and access to support are presented in this paper, along with links to sources of further information.
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), maintained at the European Bioinformatics Institute (EBI), incorporates, organizes and distributes nucleotide sequences from public sources. The database is a part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. The web-based tool, Webin, is the preferred system for individual submission of nucleotide sequences, including Third Party Annotation (TPA) and alignment data. Automatic submission procedures are used for submission of data from large-scale genome sequencing centres and from the European Patent Office. Database releases are produced quarterly. The latest data collection can be accessed via FTP, email and WWW interfaces. The EBI’s Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases as well as many other specialist molecular biology databases. For sequence similarity searching, a variety of tools (e.g. FASTA and BLAST) are available that allow external users to compare their own sequences against the data in the EMBL Nucleotide Sequence Database, the complete genomic component subsection of the database, the WGS data sets and other databases. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).