Protein interactions underlie the cellular functions and biological processes that are fundamental to all life. With improvements in experimental techniques and progress in research, overall protein interaction network frameworks for several model organisms have been created through data collection and integration. However, most of these networks show only simple relationships without boundary, weight or direction, and so do not truly reflect biological reality. In vivo, different types of protein interactions, such as the assembly of protein complexes or phosphorylation, often have specific functions and qualifications. Ignoring these features introduces considerable bias into network analysis and application. Therefore, we annotate the Arabidopsis proteins in the AtPID database with further information (e.g. functional annotation, subcellular localization, tissue-specific expression, phosphorylation information, SNP phenotype and mutant phenotype) and interaction qualifications (e.g. transcriptional regulation, complex assembly, functional collaboration) via further literature text mining and integration of other resources. The related information is presented to users through comprehensive, newly developed display and analytical tools. The system allows the construction of tissue-specific interaction networks with display of canonical pathways. The latest updated AtPID database is available at http://www.megabionet.org/atpid/.
With improvements in modern biological research technology and advances in the study of the model organism Arabidopsis thaliana, a large amount of protein-related data, such as proteomics, subcellular localization (1,2), three-dimensional structures and tissue-specific gene expression (3,4), has been published in the literature. Progress in functional genomics has also allowed large amounts of mutant data to be reported. These mutant data are partially curated and mapped to protein-coding genes, together with germplasm and phenotype information, in TAIR (5) and other seed resource databases (6). Such annotated data are valuable resources that help researchers understand gene and protein functions comprehensively and at multiple levels.
Meanwhile, research on protein–protein interactions (PPIs) in Arabidopsis has achieved significant results, both experimentally and computationally (7). High-throughput data, such as protein phosphorylation data (8,9), have also been reported. In addition, the accumulation of data on signal transduction and transcriptional regulatory mechanisms (10,11) has allowed a large number of protein–protein interactions to be annotated in detail. The data integrated in the AtPID database have been updated accordingly, and new properties have been added to the previous protein–protein interaction network: the annotation of each node, the direction of edges and the type/style of edges.
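As an illustration of what such an annotated network involves, the following minimal Python sketch stores annotations on nodes and a direction and type on edges. It is not the AtPID schema: the class names, field names and the identifiers K1, S1 and C1 are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    """A network node; the annotation keys are illustrative, not AtPID's."""
    protein_id: str
    annotations: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Interaction:
    """A network edge with a type and an optional direction."""
    source: str
    target: str
    kind: str        # e.g. "phosphorylation" or "complex assembly"
    directed: bool   # True if the relationship runs from source to target only


class Network:
    def __init__(self):
        self.nodes = {}   # protein_id -> Protein
        self.edges = []   # list of Interaction

    def add_protein(self, protein):
        self.nodes[protein.protein_id] = protein

    def add_interaction(self, edge):
        # Refuse edges whose endpoints have not been registered as nodes.
        for pid in (edge.source, edge.target):
            if pid not in self.nodes:
                raise KeyError(f"unknown protein: {pid}")
        self.edges.append(edge)

    def partners(self, protein_id):
        """Proteins one step away from protein_id, respecting edge direction."""
        found = set()
        for e in self.edges:
            if e.source == protein_id:
                found.add(e.target)
            elif e.target == protein_id and not e.directed:
                found.add(e.source)
        return found


# Hypothetical example: a kinase K1 phosphorylates S1 and forms a complex with C1.
net = Network()
net.add_protein(Protein("K1", {"localization": "cytosol"}))
net.add_protein(Protein("S1"))
net.add_protein(Protein("C1"))
net.add_interaction(Interaction("K1", "S1", "phosphorylation", directed=True))
net.add_interaction(Interaction("K1", "C1", "complex assembly", directed=False))
```

In this sketch the directed phosphorylation edge is visible only from the kinase side, whereas the undirected complex-assembly edge is visible from both partners.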
In addition to the text mining and general data collection that expand the contents of the AtPID database, new network display programs have been developed to help researchers focus on the proteins they are interested in. In the latest version, AtPID 4.0, a more advanced query mode that allows retrieval of a whole pathway is implemented with optimized algorithms. These improvements and updates will help researchers exploit the information in our protein–protein interaction network effectively and comprehensively.
Rather than storing data from other databases in our system, we retrieve data directly from these resources through a data interface; this avoids the potential bias due to delayed updates and inconsistent data integration. In assembling data from the literature, we give priority to accuracy over quantity. We also standardize the data mining process, and we determine and record the data from each publication manually, archiving the related literature information and descriptive sentences. Moreover, we provide the PubMed link for each text-mined record, along with cross-links to other related databases. Users can view these resources from the query result pages and the network display page.
For each query result, besides a tabular layout, we also display all related data as a network with our newly developed tool. This intuitive visualization gives users a convenient way to navigate the data and check the relationships between proteins. These relationships include interactions between proteins, transcriptional regulation between transcription factors and their target genes, the pathways in which the proteins are involved and the phosphorylation status of each protein.
Mutants have been widely used in functional genomics research. Given their advantages in seed mutagenesis, genetic modification and tissue culture, stable traits are easier to obtain in plants than in animals, and plants have thus generated rich mutant resources. So far, a large number of characterized stable Arabidopsis mutants have been reported in the research literature (12) and in some seed resource databases. We have integrated information and experimental results extracted from the research literature, seed resource databases and TAIR-released phenotype data into the AtPID database; we have also classified those data and annotated the mutant phenotypes based on plant ontology (13) (Table 1).
We have collected proteomics data from several MS experiments (14) and integrated them into the AtPID database. These MS data were generated from 12 different tissues: flower bud, flower, cotyledon, juvenile leaf, root, seed, carpel, silique, cell suspension culture, shoot, rosette and pollen. Some of the identified proteins are tissue-specific. A total of 13970 non-redundant proteins were identified from these 12 tissues (Table 2), and they can be regarded as a high-confidence set.
Moreover, based on the collected proteomics data, we have integrated the proteins identified in each of the 12 tissues with the protein interaction data in the AtPID database, so that users can display the protein interaction network for a selected tissue. The network view of the proteins in each tissue provides an intuitive way for users to explore protein function, protein interaction relationships, the functional modules in which a protein is involved and potential regulatory relationships, based on the currently available data.
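One simple way to derive a tissue-specific network from these two data sources is to keep only the interactions whose two partners were both detected in the selected tissue. The Python sketch below shows this filter under that assumption; the text does not state the exact rule used by AtPID, and the function name, data layout and protein identifiers are hypothetical.

```python
def tissue_subnetwork(interactions, detected_in, tissue):
    """Return the interactions whose two partners were both detected in `tissue`.

    interactions: iterable of (protein_a, protein_b) pairs
    detected_in:  dict mapping a tissue name to the set of proteins found in it
    """
    present = detected_in.get(tissue, set())
    return [(a, b) for a, b in interactions if a in present and b in present]


# Hypothetical toy data; P1..P4 are placeholder identifiers.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
detected_in = {
    "root": {"P1", "P2", "P3"},
    "pollen": {"P3", "P4"},
}

root_network = tissue_subnetwork(interactions, detected_in, "root")
pollen_network = tissue_subnetwork(interactions, detected_in, "pollen")

# The non-redundant protein set across tissues is the union of the per-tissue sets.
non_redundant = set().union(*detected_in.values())
```

With this rule, an interaction is dropped as soon as one partner is missing from the tissue, which keeps the displayed network conservative.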
Transcriptional regulation is a research field that has recently made dramatic progress with the adoption of new technologies, and it is an important resource for understanding the mechanisms of various biological processes and activities. We have therefore integrated this information into the AtPID database to help the community explore the regulatory relationships between transcription factors and their target genes. In the AtPID database, gene transcriptional regulation information is embedded in the protein interaction network; we believe that integrating it into the otherwise static protein interaction network makes the overall network more consistent with biological reality. Moreover, we have extended our PPI data through data integration and text mining, adding 770 new relationships involving 522 proteins to the AtPID database (Table 3).
To maximize the annotation information displayed and to optimize web data transmission, we developed new online display tools, which provide more detailed network output and incorporate some useful online analysis tools. With optimized algorithms, the displayed network can be expanded to a maximum of 1500 nodes, or to the limit of the dataset if that is smaller. In this way, users can analyze and visualize the query results in a global view (Figure 1). We plan to launch an off-line version in the near future to give users further means of displaying and analyzing the relevant network information.
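The expansion of a displayed network up to a node limit can be sketched as a breadth-first traversal from the queried protein that stops once the limit is reached or the dataset is exhausted. The text does not describe the algorithm actually used, so the breadth-first strategy, the function name and the toy adjacency data below are assumptions; only the default limit of 1500 nodes comes from the text.

```python
from collections import deque


def expand_network(adjacency, seed, max_nodes=1500):
    """Collect nodes breadth-first from `seed`, stopping at `max_nodes`.

    adjacency: dict mapping each protein to an iterable of its neighbours
    Returns the nodes in the order they were reached.
    """
    if max_nodes < 1:
        return []
    reached = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        # Sort neighbours so that the result is deterministic.
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour in seen:
                continue
            if len(reached) >= max_nodes:
                return reached   # node limit reached
            seen.add(neighbour)
            reached.append(neighbour)
            queue.append(neighbour)
    return reached               # dataset exhausted before the limit


# Hypothetical toy graph with placeholder identifiers.
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B"],
}

full_view = expand_network(adjacency, "A")
capped_view = expand_network(adjacency, "A", max_nodes=3)
```

Because nodes are added level by level, a capped view keeps the proteins closest to the query and drops the most distant ones first.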
As a platform for protein–protein interactions in Arabidopsis thaliana, the AtPID database will continue to enrich its protein–protein interaction data through text mining and analysis, to integrate different types of data efficiently and to develop new display tools, so that researchers can analyze, use and visualize the data conveniently. As more direct evidence for protein–protein interactions in A. thaliana cells becomes available, it will be possible to refine the networks we have defined and make them more useful for testing hypotheses about the mechanisms of various physiological activities. Our next step is to integrate several high-throughput data types, such as gene expression data generated by deep sequencing technology and proteomics profiles, and to annotate the network with direction. We also plan to carry out quantitative analysis of the protein interaction network using statistical approaches, and to display the network dynamically, based on time-course data on developmental or real-time responses, to reflect tissue-specific gene expression.
Funding for open access charge: State Key Program of Basic Research of China grants (2007CB108800, 2010CB945400); National High Technology Research and Development Program of China (863 project) (Grant No. 2006AA10Z129, 2006AA02Z313); National Natural Science Foundation of China grants (30870575, 30730078); Science and Technology Commission of Shanghai Municipality (06DZ22923).
Conflict of interest statement. None declared.