AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to the AAindex as a new section. Accordingly AAindex consists of three sections now: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).
Protein structures and functions are defined by the combinations of physicochemical and biochemical properties of the 20 naturally occurring amino acids that are the building blocks of proteins. A wide variety of amino acid properties have been investigated through a large number of experiments and theoretical studies. Each amino acid property that can be represented by a set of 20 numerical values is referred to as an amino acid index. Nakai et al. (1) collected 222 amino acid indices from published literature and investigated the relationships among them using hierarchical cluster analysis. They also released the amino acid indices as an online database. In 1996, Tomii and Kanehisa (2) collected further amino acid indices to enrich the database. They also collected 42 amino acid substitution matrices from the literature and released the collection as AAindex2. The AAindex database is continuously updated by the present authors (3,4).
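As an illustration of how an amino acid index is typically applied to a sequence, the following minimal sketch averages index values over a sliding window. The Kyte-Doolittle hydropathy scale is used here only as a familiar example of an index; the function name `index_profile` and the default window length are assumptions made for this sketch, not part of the AAindex distribution.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used as an example amino acid index.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_profile(sequence: str, index: Dict[str, float], window: int = 9) -> List[float]:
    """Return the mean index value for every window along the sequence.

    `index_profile` is a hypothetical helper; residues missing from
    `index` raise a KeyError rather than being silently skipped.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [index[residue] for residue in sequence.upper()]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]
```

Any other index of 20 values can be substituted for the dictionary without changing the function.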
AAindex has been used in wide-ranging bioinformatics research on protein sequences, such as predicting protein subcellular localization (5), immunogenicity of MHC class I binding peptides (6), protein SUMO modification sites (7) and coordinated substitutions in multiple alignments of protein sequences (8). Furthermore, there is a derivative database of AAindex (UMBC AAindex Database: http://www.evolvingcode.net:8080/aaindex/) and a web tool for visualizing relationships among AAindex entries (9). As these examples show, AAindex has become a useful resource in bioinformatics.
In 2005, Pokarowski et al. (10) compared 29 published matrices of protein pairwise contact potentials, i.e. energy functions that are obtained from statistical analysis of protein structures. These potentials have long been used to predict protein structures in silico. Pokarowski and coworkers showed that each of the contact potentials is similar to one of two popular matrices derived by Miyazawa and Jernigan (11). Recently, working on 29 mostly new amino acid substitution matrices and 5 contact potentials, the same team (12) obtained a segregation of substitution matrices similar to that of Tomii and Kanehisa (2). Moreover, they found intermediate links between substitution matrices and contact potentials, that is, matrices and potentials that exhibit mutual correlations of at least 0.8. In both works (10,12), Pokarowski and coworkers approximated matrices by simple functions of amino acid indices, which allows us to better comprehend the exchangeability of amino acids as well as the residue–residue interactions in proteins. These relations between substitution matrices, contact potentials and amino acid indices provide motivation to extend the AAindex database. In the present work, we have compiled the data collected in the study on contact potentials (10) as a new section of the AAindex database, named AAindex3. As a result, we believe that AAindex has increased its utility in the bioinformatics study of proteins. In this paper we report the current status of the three sections of AAindex.
The AAindex is released approximately annually. The latest version is the 9.0 release.
The AAindex database is a flat file database that consists of three sections: AAindex1 for the amino acid indices, AAindex2 for the amino acid substitution matrices and AAindex3 for the amino acid contact potentials. The contents of the three sections are as follows.
The AAindex1 section currently contains 544 amino acid indices. Each entry consists of an accession number, a short description of the index, the reference information and the numerical values for the properties of 20 amino acids.
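The entry structure described above can be read with a few lines of code. The following is a minimal sketch, assuming the flat file tags each line with a one-letter record code (H for the accession number, D for the description, R for the reference identifier, I for the data block) and ends each entry with `//`, and that the data block lists residues in the order A R N D C Q E G H I on its first row and L K M F P S T W Y V on its second. These layout details, the `NA` marker for unreported values and the names `IndexEntry` and `parse_entry` are assumptions that should be checked against the downloaded release.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed residue order of the two data rows that follow the "I" line.
ROW1 = "ARNDCQEGHI"
ROW2 = "LKMFPSTWYV"


@dataclass
class IndexEntry:
    accession: str = ""
    description: str = ""
    reference: str = ""
    values: Dict[str, Optional[float]] = field(default_factory=dict)


def _to_float(token: str) -> Optional[float]:
    # "NA" is assumed to mark a value that was not reported.
    return None if token == "NA" else float(token)


def parse_entry(text: str) -> IndexEntry:
    """Parse one AAindex1-style entry (hypothetical helper)."""
    entry = IndexEntry()
    lines = text.strip().splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        code, rest = line[:1], line[2:].strip()
        if code == "H":
            entry.accession = rest
        elif code == "D":
            entry.description = rest
        elif code == "R":
            entry.reference = rest
        elif code == "I":
            # The next two lines hold ten values each.
            tokens = lines[i + 1].split() + lines[i + 2].split()
            for residue, token in zip(ROW1 + ROW2, tokens):
                entry.values[residue] = _to_float(token)
            i += 2
        elif line.startswith("//"):
            break
        i += 1
    return entry
```

Other record types, such as author, title and journal lines, are ignored by this sketch and would need their own branches.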
We have provided a link to the corresponding PubMed entries of each AAindex entry, instead of a link to the LitDB literature database (13) that we originally used. In addition, each entry contains cross-links to other entries with an absolute value for the correlation coefficient of 0.8 or larger. The links enable the users to identify a set of entries describing similar properties. In some instances the values are not reported for all 20 amino acids.
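The cross-linking criterion can be sketched as follows: compute the Pearson correlation coefficient between every pair of indices and keep the pairs whose absolute value is 0.8 or larger. How the database treats unreported values when computing correlations is not stated here, so the pairwise deletion used below (dropping residues for which either index has no value) is an assumption, as are the function names `pearson` and `cross_links`.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation over positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("too few paired values")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant index")
    return sxy / math.sqrt(sxx * syy)


def cross_links(
    indices: Dict[str, Sequence[Optional[float]]], threshold: float = 0.8
) -> Dict[str, List[Tuple[str, float]]]:
    """Map each accession to the entries with |r| >= threshold."""
    links: Dict[str, List[Tuple[str, float]]] = {name: [] for name in indices}
    names = sorted(indices)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(indices[a], indices[b])
            if abs(r) >= threshold:
                links[a].append((b, r))
                links[b].append((a, r))
    return links
```

Taking the absolute value means that strongly anti-correlated indices are linked as well, which matches the criterion stated above.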
To give an overview of the relationships among the current amino acid indices, we constructed the minimum spanning tree of amino acid indices by the procedure described by Tomii and Kanehisa (2) (Figure 1). In Figure 1, each rectangle represents an index. The colored rectangles are the 402 indices classified into six groups defined by Tomii and Kanehisa. The indices belonging to their classification are still grouped into clusters. Newly added indices are distributed evenly across the tree. That is, indices for various kinds of properties have been added to the AAindex.
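For readers who wish to build a comparable tree from their own set of indices, the following minimal sketch applies Prim's algorithm to a symmetric distance matrix. It is not the procedure of reference (2): the distance 1 - |r| between two indices with correlation coefficient r is an assumption made for illustration, and `correlation_distance` and `minimum_spanning_tree` are hypothetical helper names.

```python
import math
from typing import List, Tuple


def correlation_distance(r: float) -> float:
    """Assumed distance measure: strongly (anti-)correlated indices are close."""
    return 1.0 - abs(r)


def minimum_spanning_tree(dist: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on a dense symmetric matrix; returns (parent, child, weight) edges."""
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0           # start growing the tree from node 0
    edges: List[Tuple[int, int, float]] = []
    for _ in range(n):
        # Pick the node outside the tree with the cheapest connection.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Update connection costs for nodes still outside the tree.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges
```

The dense O(n²) form is adequate for a few hundred indices; a heap-based variant would only matter for much larger collections.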
The AAindex2 section currently contains 94 amino acid substitution matrices: 67 symmetric matrices and 27 non-symmetric matrices. The format of the entry is almost the same as that of AAindex1 except that it contains 210 numerical values (20 diagonal and 20 × 19/2 off-diagonal elements) for a symmetric matrix and 400 or more numerical values for a non-symmetric matrix (some matrices include a gap or distinguish two states of cysteine). In the previous release, each symmetric matrix, which is triangular in shape, was folded into a 10 × 21 table for the purpose of saving space, and columns were separated by space characters. In the present release, symmetric matrices are not folded and the column delimiter has been changed to a tab character for easier parsing of the entry.
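The count of 210 values follows from storing only one triangle of a 20 × 20 symmetric matrix: 20 + 20 × 19/2 = 210. The sketch below reads such a triangle and expands it into the full matrix. It assumes that row i of the data block holds the first i values of the lower triangle (1, 2, ..., 20 values per row) separated by whitespace; the row layout and the names `triangular_count`, `parse_triangular_rows` and `unfold_symmetric` are assumptions to be checked against the actual entries.

```python
from typing import List


def triangular_count(n: int) -> int:
    """Number of stored values: n diagonal plus n(n-1)/2 off-diagonal."""
    return n + n * (n - 1) // 2


def parse_triangular_rows(text: str) -> List[List[float]]:
    """Split a tab- or space-delimited block into rows of floats."""
    rows = [
        [float(token) for token in line.split()]
        for line in text.splitlines()
        if line.strip()
    ]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i + 1} should hold {i + 1} values, found {len(row)}")
    return rows


def unfold_symmetric(rows: List[List[float]]) -> List[List[float]]:
    """Mirror a lower triangle into a full symmetric matrix."""
    n = len(rows)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value
    return full
```

Non-symmetric matrices would be read as full rows instead and do not need the mirroring step.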
The AAindex3 section currently contains 47 amino acid contact potential matrices: 44 symmetric matrices and 3 non-symmetric matrices. The format of the entry is almost the same as that of AAindex2. A sample entry of the AAindex3 is shown in Figure 2.
The DBGET/LinkDB system integrates most of the major molecular biology databases and is especially suited for using hyperlinks to related entries within the AAindex database as well as to other databases. Alternatively, the database entries may be copied and used locally. The URL for anonymous FTP is: ftp://ftp.genome.jp/pub/db/community/aaindex/
BioRuby, a bioinformatics library for the Ruby programming language, provides functions to handle the AAindex database (http://bioruby.org/). EMBOSS (16) provides a program to extract the index data from an AAindex entry.
Users are requested to cite this article when making use of the AAindex database.
We thank Drs Kenta Nakai and Kentaro Tomii for the initial developments of the AAindex database. This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology, and the Japan Science and Technology Agency. We thank Ms Mansi Srivastava and Dr Takeshi Kawashima for critical reading of our manuscript. The computational resources were provided by the Bioinformatics Center, Institute for Chemical Research, Kyoto University and the Super Computer System, Human Genome Center, Institute of Medical Science, University of Tokyo. Funding to pay the Open Access publication charges for this article was provided by the University of Tokyo.
Conflict of interest statement. None declared.