The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact email@example.com
The Transporter Classification Database (TCDB) is a web-accessible, curated relational database containing sequence, classification, structural, functional and evolutionary information about transport systems from a variety of living organisms. Its factual content is compiled from >10000 references and encompasses ~3000 representative transporters and putative transporters, classified into >400 families. The transporter classification (TC) system is an International Union of Biochemistry and Molecular Biology approved system of nomenclature for transport protein classification. TCDB is freely accessible at http://www.tcdb.org. The web interface provides several methods for accessing the data, including step-by-step browsing of the hierarchical classification, direct search by sequence or TC number, and full-text searching. The functional ontology that underlies the database structure supports powerful queries that return relevant data quickly and easily. The TCDB website also offers several tools designed specifically for analyzing the unique characteristics of transport proteins. TCDB not only provides curated information and a tool for classifying newly identified membrane proteins, but also serves as a tool for annotating transporters in sequenced genomes.
Transport is an essential function of every living cell, and thousands of researchers worldwide study it. Research on transmembrane transport proteins has grown explosively in the last several years. Recent high-resolution structures of many transporters [such as BtuB (1,2), AcrB (3), LacY (4), GlpT (5), EmrE (6) and MsbA (7,8)] have made it possible to investigate the molecular dynamics of fundamental transport processes. As the structures of additional transport systems are solved, computational methods for transporter analysis and prediction should become increasingly powerful.
Transporters play critical roles in the life science industries. They regulate the absorption, distribution and excretion of drugs in the human body and must therefore be factored into pharmacological studies. Growing numbers of pathogenic microbial strains are resistant to many common antibiotics, posing a serious threat to public health. Computational prediction of potential inhibitors of multidrug resistance transporters, and of transporters that give pathogenic microbes a survival advantage, would aid the design of novel antimicrobial drugs. Drug resistance in cancer cells caused by drug efflux pumps is likewise of increasing concern in oncology. The importance of transport proteins and the enormous amount of data available on them warrant a systematic classification of transport systems, which promotes a comprehensive understanding of one of the basic functions of all living cells (9–11).
The Transporter Classification Database (TCDB) is a freely accessible web resource (http://www.tcdb.org) that provides access to the data on which the transporter classification (TC) system is based. All data in TCDB are compiled from published information in over 10000 references. Approximately 3000 distinct proteins from all major types of living organisms are organized into >400 transporter families according to the TC system. Data are added continuously as new functional data are published and new transport systems are identified. The website also provides several resources for analyzing transmembrane proteins. Bringing these resources together with links to other biological databases in one place gives life scientists the ease of use they need when researching transporters. The availability of TCDB has enabled major advances in basic research, including answers to fundamental biological questions and the tracing of the evolutionary routes by which these proteins arose (12,13).
The TC system consists of a set of representative protein sequences, most of which have been functionally characterized. These transporters are classified with a five-component designation of the form D1.L1.D2.D3.D4. D1 (a single digit) corresponds to the transporter class (i.e. channel, carrier, primary active transporter, group translocator or transmembrane electron flow carrier). L1 (a letter) corresponds to the transporter subclass, which, e.g. in the case of primary active transporters, refers to the energy source used to drive transport. D2 (a number) corresponds to the transporter family (sometimes actually a superfamily). D3 (a number) corresponds to the subfamily (or the family of a superfamily) in which a transporter is found. D4 (a number) corresponds to the transporter itself, that is, a specific transport system with a defined range of substrates, a known polarity of transport, an energy source that drives vectorial movement of the substrate and a mechanism of action. Only in one of the TC classes (class 9) is this information incomplete or absent.
A TC number for proteins in classes 1–5 provides the following information: (i) the type of transporter (D1); (ii) the subtype of transporter; e.g. for primary active transporters, the type of energy source used to drive transport (L1); (iii) the specific family to which the transporter belongs (D2); (iv) the subfamily to which the transporter belongs (D3) and (v) the specific transporter with a given polarity, specificity and mechanism of action (D4). Because phylogeny reflects the mechanism, mode of energy coupling, polarity and substrate specificity of a transporter, a functional/phylogenetic system of classification provides far more information than a purely functional one could. The basis for the architecture of the TC system, as approved by the International Union of Biochemistry and Molecular Biology, has been described in detail previously (11). A full treatment of the architecture of the TC system is beyond the scope of this article.
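Because every TC number has the same five-component structure, it can be split into its parts mechanically. The Python sketch below is only an illustration, not TCDB's own code: it assumes that L1 is always a single uppercase letter and that D2, D3 and D4 are plain integers, and it takes the class names from the descriptions in this article. The string 2.A.1.1.1 is used purely as an example.

```python
import re
from dataclasses import dataclass

# D1 class names, in the order the article lists classes 1-5,
# plus classes 8 and 9 as described for the TC system.
TC_CLASSES = {
    1: "channel",
    2: "carrier",
    3: "primary active transporter",
    4: "group translocator",
    5: "transmembrane electron flow carrier",
    8: "accessory transport protein",
    9: "incompletely characterized transporter",
}

# Assumption: D1 = one digit, L1 = one uppercase letter, D2-D4 = integers.
TC_PATTERN = re.compile(r"^(\d)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


@dataclass(frozen=True)
class TCNumber:
    tc_class: int   # D1: transporter class
    subclass: str   # L1: subclass (e.g. energy source for primary active transporters)
    family: int     # D2: family (sometimes a superfamily)
    subfamily: int  # D3: subfamily
    system: int     # D4: the specific transport system

    def __str__(self) -> str:
        return f"{self.tc_class}.{self.subclass}.{self.family}.{self.subfamily}.{self.system}"


def parse_tc(text: str) -> TCNumber:
    """Split a full TC number into its five components; reject anything else."""
    match = TC_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a five-component TC number: {text!r}")
    d1, l1, d2, d3, d4 = match.groups()
    return TCNumber(int(d1), l1, int(d2), int(d3), int(d4))


tc = parse_tc("2.A.1.1.1")
print(tc.tc_class, TC_CLASSES[tc.tc_class], tc.family)  # prints: 2 carrier 1
```

Storing D2 to D4 as integers, rather than strings, keeps numbers such as 2.A.1.1.10 and 2.A.1.1.2 in their intended numeric order when sorted.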
At the heart of TCDB are the protein families. Although a few transporters within families can use more than a single mode of action, or use a mechanism dissimilar from that of other family members, for the most part family membership implies similar function and mechanism. Any two transport systems in the same subfamily of a transporter family that transport the same substrate(s) are given the same TC number, regardless of whether they are orthologs (i.e. arose in distinct organisms by speciation) or paralogs (i.e. arose within a single organism by gene duplication). However, because different types of information may be available for two proteins of the same specificity (e.g. regulatory data, subcellular localization data, disease association data), two or more such systems may occasionally be included in TCDB. It should be noted that, within practical limits, TCDB reflects the current state of our knowledge about the proteins included within it.
If two transporters exhibit weak similarity but operate by the same transport mechanism, two distinct subfamilies will represent the two transporters and their close homologs. Sequenced homologs of unknown function are normally not assigned a TC number unless they represent a unique family/subfamily or are from an underrepresented kingdom. Transporter classes 1–5 are well-defined classes, class 8 is reserved for accessory transport proteins, while class 9 is for transporters which are incompletely characterized. When sufficient information warrants their transfer to one of the defined classes (1–5), they will acquire a new TC number. Class 9 is therefore in a continual state of flux. TC classes 6 and 7 are currently unused but will be introduced if additional classes of transporters are discovered.
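The rules above imply that two TC numbers can be compared component by component: transporters sharing D1.L1.D2 belong to the same family, those that also share D3 belong to the same subfamily, and an identical full number denotes the same transport system. The hypothetical helpers below sketch that comparison together with the class status rules for classes 1–5, 8 and 9 and the unused classes 6 and 7. They operate on plain strings and, as a simplification, do not validate their input.

```python
from typing import Optional

# Levels of a TC number, from most general (D1) to most specific (D4).
LEVELS = ("class", "subclass", "family", "subfamily", "system")


def shared_level(a: str, b: str) -> Optional[str]:
    """Deepest TC level at which two TC numbers still agree, or None if even D1 differs."""
    deepest = None
    for level, x, y in zip(LEVELS, a.split("."), b.split(".")):
        if x != y:
            break  # the hierarchy diverges here, so no deeper level can be shared
        deepest = level
    return deepest


def class_status(tc: str) -> str:
    """Status of the D1 class as described in the text (classes 6 and 7 are unused)."""
    d1 = int(tc.split(".")[0])
    if 1 <= d1 <= 5:
        return "defined"
    if d1 == 8:
        return "accessory"
    if d1 == 9:
        return "incompletely characterized"
    return "unused"


print(shared_level("2.A.1.1.1", "2.A.1.2.3"))  # prints: family
print(class_status("9.B.1.1.1"))               # prints: incompletely characterized
```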
The TCDB web application is based on a three-tier architecture. The bottom tier is the open source MySQL database. An Apache-PHP application server forms the middle tier, which retrieves tuples from the database and returns HTML pages populated with those data to the top tier, the web browser client. The system runs on a dual-processor PowerPC G5 under the Mac OS X operating system.
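To make the three-tier flow concrete, the following minimal Python sketch mimics the middle tier: it queries the data tier and returns an HTML fragment for the browser. It is not TCDB's Apache-PHP code. SQLite from the Python standard library stands in for MySQL, and the `protein` table, its columns and the accessions are hypothetical.

```python
import sqlite3
from html import escape


def build_demo_db() -> sqlite3.Connection:
    """Data tier stand-in: an in-memory SQLite database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (tc_number TEXT, accession TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [("2.A.1.1.1", "ACC00001", "Example permease A"),
         ("2.A.1.1.2", "ACC00002", "Example permease B")],
    )
    return conn


def render_family(conn: sqlite3.Connection, family: str) -> str:
    """Middle tier: fetch the tuples for one family and return HTML for the browser."""
    rows = conn.execute(
        "SELECT tc_number, accession, name FROM protein "
        "WHERE tc_number LIKE ? ORDER BY tc_number",  # text ordering, adequate for this demo
        (family + ".%",),  # e.g. '2.A.1.%' matches every member of family 2.A.1
    ).fetchall()
    body = "".join(
        f"<tr><td>{escape(tc)}</td><td>{escape(acc)}</td><td>{escape(name)}</td></tr>"
        for tc, acc, name in rows
    )
    return f"<table>{body}</table>"


print(render_family(build_demo_db(), "2.A.1"))
```

The parameterized query and `html.escape` keep database content from being interpreted as SQL or markup; the same pattern applies with a MySQL client library, although its placeholder syntax may differ.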
The raw data stored in TCDB originate from multiple sources. Protein sequences are obtained from the Swiss-Prot knowledgebase (14). 3D macromolecular structures are retrieved from the PDB (15) in mmCIF format. Protein domains from InterPro (16) are integrated with TCDB. Human transporters with nomenclature approved by the Human Genome Nomenclature Committee are presented as reported in GENEW (17). Literature citations are integrated, and PubMed ID numbers (18) are provided for human transporters and for transporters with structural data.
The functional and phylogenetic taxonomy of the TC system is encoded in the TCDB relational schema. The ‘TC System’ button on the main page (Figure 1) provides access to the data in TCDB. The page is divided into two vertical frames: the left frame allows quick browsing through the hierarchical TC system, and the right frame displays detailed descriptions or protein information. Users can therefore read descriptions of entries at any level of granularity, starting at the top of the hierarchy and descending through the taxonomy. At the deepest level, the user can retrieve information on individual proteins, such as the Swiss-Prot accession number, primary sequence, source organism, protein name, length, molecular weight and probable topology (Figure 2). Links are provided to resources such as the Swiss-Pfam database, the ExPASy server, the Swiss Institute of Bioinformatics BLAST Network service and transmembrane segment (TMS) prediction (Figure 2). A link to the FASTA-formatted protein sequence and a quick link to hydropathy and amphipathicity plots for the protein are also available. A user can search the database by entering a TC family name or TC number (Figure 1). In addition, the ‘Search’ link at the top of the page (Figure 1) allows advanced searches by keyword, disease name, protein name, etc. The literature cited in TCDB can be searched as well.
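Browsing the TC System amounts to walking a tree whose nodes are TC-number prefixes. The sketch below is a hypothetical illustration rather than the TCDB schema: it builds that tree from a list of full TC numbers and lists the children of any node, sorting numeric components numerically so that, for example, 2.A.1.1.2 is listed before 2.A.1.1.10.

```python
from collections import defaultdict


def build_hierarchy(tc_numbers):
    """Map each node ('' = root, then '2', '2.A', '2.A.1', ...) to the set of its direct children."""
    children = defaultdict(set)
    for tc in tc_numbers:
        parts = tc.split(".")
        for depth in range(len(parts)):
            parent = ".".join(parts[:depth])
            children[parent].add(".".join(parts[:depth + 1]))
    return children


def tc_sort_key(node):
    """Compare numeric components as integers and letter components as text."""
    return tuple((0, int(p)) if p.isdigit() else (1, p) for p in node.split("."))


def browse(children, node=""):
    """Children of one node in display order, as when descending the TC System tree."""
    return sorted(children.get(node, ()), key=tc_sort_key)


tree = build_hierarchy(["2.A.1.1.1", "2.A.1.1.10", "2.A.1.1.2", "1.A.1.1.1"])
print(browse(tree))             # top level: the TC classes
print(browse(tree, "2.A.1.1"))  # members of one subfamily
```

Siblings always have the same component type at each position (digits or a letter), so the sort key never has to compare an integer with a string.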
Phylogenetic analyses and refined sequence comparisons of many transport systems in our laboratory have revealed distant relationships between many families in TCDB (19). These distant relationships are detailed in a section named ‘TC Superfamilies’ on the main website (Figure 1). This information has been integrated with the data in TCDB. Thus, if the user explores the TC hierarchy and inspects a family with known distant relationships to other families, the relationships will be mentioned in the family description.
We have also included a section detailing human transporters whose nomenclature has been approved by the Human Genome Nomenclature Committee. Each of these proteins has been cross-referenced with the TC system. This information can be accessed via the ‘Human MTPs’ button on the main website (Figure 1). Another new section, ‘MTP Diseases’ (Figure 1), reports human diseases associated with human transporters and includes cross-references to the Online Mendelian Inheritance in Man database (OMIM, http://www.ncbi.nlm.nih.gov/omim/). The rapidly growing number of solved and deposited structures of transporters and accessory proteins has led us to catalog known transporter macromolecular structures and to cross-reference each structure with its TC number. The ‘MTP Structures’ link (Figure 1) on the TCDB website provides access to a table listing this information. The presence of 3D structural data for any protein in a given family is also noted in that family's description in TCDB.
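Flagging which families have 3D structural data reduces to inverting a structure-to-TC-number table at the family (D1.L1.D2) level. This Python sketch assumes such a table is available as a dictionary; the structure identifiers and TC numbers in it are placeholders, not real PDB entries or curated assignments, and the wording of the family note is hypothetical.

```python
from collections import defaultdict


def family_of(tc):
    """Family-level prefix D1.L1.D2 of a full TC number."""
    return ".".join(tc.split(".")[:3])


def structures_by_family(structure_to_tc):
    """Invert a structure-ID -> TC-number table into family -> sorted structure IDs."""
    index = defaultdict(list)
    for structure_id, tc in sorted(structure_to_tc.items()):
        index[family_of(tc)].append(structure_id)
    return dict(index)


def family_note(family, index):
    """Sentence for a family description noting whether 3D data exist."""
    if family in index:
        return f"3D structures available: {', '.join(index[family])}"
    return "No 3D structure on record."


# Placeholder structure IDs and TC numbers, for illustration only.
demo = {"STRUCT_1": "2.A.1.5.1", "STRUCT_2": "2.A.1.4.1", "STRUCT_3": "1.B.14.3.1"}
index = structures_by_family(demo)
print(family_note("2.A.1", index))  # prints: 3D structures available: STRUCT_1, STRUCT_2
```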
Over the years, we have developed an extensive collection of tools suited to the analysis of transporters. All of these tools (Figure 3) can be accessed through TCDB by clicking on the ‘Analyze’ link on the website or by directly visiting http://www.tcdb.org/analyze.php. They include TMS prediction using HMMTOP 2.0 (20), hydropathy analysis using the Kyte and Doolittle hydropathy scale (21), hydrophobic moment (amphipathicity) analysis (22) using the Hmoment program from EMBOSS (23) and helical wheel plots using the Pepwheel program from EMBOSS. A submitted protein sequence is analyzed for hydropathy and amphipathicity, and the TMSs predicted by HMMTOP are displayed on the resulting plot (Figure 4). The user can then click on a TMS to view its helical wheel plot.
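The hydropathy and amphipathicity analyses described above can be sketched in a few lines of Python. This is a simplified illustration, not the TCDB, HMMTOP or EMBOSS implementation. It uses a sliding-window average of Kyte-Doolittle values (the 19-residue window and 1.6 threshold are commonly quoted values, treated here as adjustable assumptions), a hydrophobic moment computed with the same Kyte-Doolittle values (Eisenberg's analysis used a different consensus scale), and the 100° rotation per residue of an ideal α-helix for helical wheel positions. Flagging high-hydropathy windows is only a crude stand-in for HMMTOP's hidden Markov model.

```python
import math

# Kyte-Doolittle hydropathy scale (ref. 21).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_windows(profile, threshold=1.6):
    """Start indices of windows above the threshold (crude TMS candidates, not HMMTOP)."""
    return [i for i, h in enumerate(profile) if h > threshold]


def hydrophobic_moment(segment, degrees_per_residue=100.0):
    """Mean hydrophobic moment of a helix: |sum of h_n * exp(i * n * delta)| / N."""
    delta = math.radians(degrees_per_residue)
    s = sum(KD[aa] * math.sin(n * delta) for n, aa in enumerate(segment.upper()))
    c = sum(KD[aa] * math.cos(n * delta) for n, aa in enumerate(segment.upper()))
    return math.hypot(s, c) / len(segment)


def helical_wheel_angles(segment, degrees_per_residue=100):
    """Angular position of each residue on a helical wheel projection."""
    return [(aa, (n * degrees_per_residue) % 360) for n, aa in enumerate(segment)]
```

A uniformly hydrophobic helix scores high in the hydropathy profile but has a near-zero moment, while an amphipathic helix, with hydrophobic residues clustered on one face of the wheel, gives a large moment.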
Sequence similarity searches using BLAST (24,25) or SSEARCH (26) are available to search for homologous proteins in TCDB. A quick link to BLAST is provided on the main website as well (Figure 1). A protein or a nucleotide sequence can be submitted to TC-BLAST for a sequence similarity search. The results will specify similar proteins with their TC numbers and numbers of TMSs. The user may then select several sequences and view the multiple sequence alignment and generate a phylogenetic tree by clicking on the ‘TC-TREE’ button. The user interface displays the multiple sequence alignment with marked predicted TMSs as well as a plot of average hydropathy and conservation. A phylogenetic tree for the sequences can be viewed using ATV (27).
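A TC-BLAST-style classification step can be approximated by taking the best BLAST hit for each query and reading off its TC number. The sketch below assumes standard 12-column tabular BLAST output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and a hypothetical subject-ID convention of `accession|TC number`. The e-value cutoff of 1e-5 is an arbitrary illustrative choice, not a TCDB rule, and the sample rows are invented.

```python
import csv
import io


def best_tc_hits(tabular_text, max_evalue=1e-5):
    """Best-scoring hit per query from 12-column tabular BLAST output."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # too weak to suggest a TC assignment
        # Hypothetical subject-ID convention: "accession|TC number".
        accession, tc_number = subject.split("|", 1)
        current = best.get(query)
        # Prefer the lowest e-value; break ties with the higher bit score.
        if current is None or (evalue, -bitscore) < (current["evalue"], -current["bitscore"]):
            best[query] = {"accession": accession, "tc": tc_number,
                           "evalue": evalue, "bitscore": bitscore}
    return best


sample = (
    "q1\tACC1|2.A.1.1.1\t45.0\t400\t210\t5\t1\t400\t1\t400\t1e-50\t300\n"
    "q1\tACC2|2.A.1.5.1\t30.0\t380\t250\t8\t5\t385\t2\t380\t1e-20\t120\n"
    "q2\tACC3|3.A.1.1.1\t22.0\t100\t70\t3\t1\t100\t1\t100\t0.5\t30\n"
)
hits = best_tc_hits(sample)
print(hits["q1"]["tc"])  # prints: 2.A.1.1.1
```

Query q2 receives no assignment because its only hit falls above the cutoff; such a protein would need closer inspection before any TC number is proposed.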
Tools for alignment of two or more sequences are also provided. Two sequences are aligned either locally using SSEARCH or globally using the Needle program from EMBOSS. The output of the pairwise global alignment also highlights the TMSs that are predicted using HMMTOP. Multiple sequence alignment with predicted TMSs displayed (28) can also be carried out. A link to additional sequence analysis tools on the Biotools server (http://saier-144-37.ucsd.edu) is also provided (Figure 3). Several analytical tools developed in our lab are hosted on this server.
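Needle performs Needleman-Wunsch global alignment, and SSEARCH performs Smith-Waterman local alignment. The minimal Python sketch below implements the global case with a simple match/mismatch score and a linear gap penalty. The real programs use substitution matrices such as BLOSUM62 with affine gap penalties, so the scores from this sketch are not comparable to theirs.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x, y):
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # prints: (1, 'ACGT', 'A-GT')
```

Clamping each cell at zero, starting the traceback from the highest-scoring cell and stopping at a zero cell would turn this into the local (Smith-Waterman) variant.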
TCDB is a centralized resource for transporter data and analysis. We are dedicated to bringing data and analytical tools to TCDB users in a timely fashion. Planned improvements include additional sequence analysis tools and a bioinformatics pipeline generator that will enable users to create workflows for complex analyses. We will also improve data mining capabilities for analyzing the textual information stored in our database, such as PubMed citations and TC family descriptions.
Funding to pay the Open Access publication charges for this article was provided by a grant from the National Institutes of Health.
Conflict of interest statement. None declared.