More than 7000 papers related to “protein refolding” have been published to date, with approximately 300 reports each year during the last decade. Whilst some of these papers provide experimental protocols for protein refolding, a survey of the structural life science communities showed the need for a comprehensive database of refolding techniques. We have therefore developed a new resource, “REFOLDdb”, that collects refolding techniques into a single, searchable repository to help researchers develop refolding protocols for proteins of interest.
We based our resource on the existing REFOLD database, which has not been updated since 2009. We redesigned the data format to be more concise, allowing more consistent representation across data entries than in the original REFOLD database. The remodeled data architecture enhances the search efficiency and improves the sustainability of the database. After an exhaustive literature search we added experimental refolding protocols from reports published from 2009 to early 2017. In addition to this new data, we fully converted and integrated existing REFOLD data into our new resource. REFOLDdb contains 1877 entries as of March 17th, 2017, and is freely available at http://p4d-info.nig.ac.jp/refolddb/.
REFOLDdb is a unique database for the life sciences research community, providing annotated information for designing new refolding protocols and customizing existing methodologies. We envisage that this resource will find wide utility across broad disciplines that rely on the production of pure, active, recombinant proteins. Furthermore, the database also provides a useful overview of the recent trends and statistics in refolding technology development.
The establishment of heterologous expression technology for recombinant proteins has revolutionized protein purification, which is now typically performed with cloned, recombinant proteins expressed in a suitable host. The predominant host is Escherichia coli. However, many proteins overexpressed in E. coli are found in an insoluble form called inclusion bodies (IBs). Since the target protein is often highly pure in washed IBs, the challenge is not so much to purify the target, but rather to solubilize IBs and refold the protein into its native, biologically active state [1, 2]. While many of the operations to prepare IBs (expression, cell disruption, IB isolation and washing) are quite general, the precise conditions required to achieve efficient refolding vary for each protein.
Refolding experiments consist of two steps: (1) the solubilization of IBs by adding a denaturant and (2) the renaturation of the denatured protein by lowering the denaturant concentration. The solubilization step is relatively easy, and is performed by adding a denaturant, typically urea or guanidinium chloride at a final concentration of 6–8 M or 6 M, respectively. The renaturation step is often difficult. In order to maximize the refolding yield, the optimization of the following experimental methods/conditions of this process is required:
Because the suitable refolding methods/conditions differ from protein to protein, a knowledge database of optimized refolding methods/conditions for each protein is an important resource for many biochemists and molecular biologists. Thus, the REFOLD database established and published by Monash University in 2006 played an important role as the sole information source for refolding experiments [10–13]. This database, however, suspended its updates in 2009. We carried out a preliminary study in 2013 for the development of a sustainable database on protein refolding technologies. We decided that a new database, REFOLDdb, was required as a gateway to experimental methodologies that describe experimental refolding in detail. We therefore designed a simple data format and consistent data representation among entries so that users are able to easily interrogate the database and painlessly retrieve and understand search results. The design also allows straightforward maintenance, allowing the database to be sustainable over a long period.
The sustainability of biological databases is a serious issue [14, 15] and database developers have to analyze cost-effectiveness in advance. In the case of databases relating to technologies (Tech_db), the data volume will not expand as rapidly as in the case of molecular databases, for example the International Sequence Database, the Worldwide Protein Data Bank, UniProt and the SUPERFAMILY database. Nevertheless, developers of a Tech_db must be sensitive to the direct and indirect costs of data extraction from the primary articles, curation and updating. REFOLDdb is designed to balance cost and usefulness.
We have captured the refolding data from up-to-date literature as well as retrospectively from articles published since 2009. We also updated, converted and integrated the data stored in the REFOLD database into REFOLDdb. As of March 17th, 2017, REFOLDdb provides users with data on 1877 experimental methods for refolding 1628 proteins. Most of these data were extracted from 1232 publications.
We searched the NCBI PubMed database with the keyword query “(refolding[All Fields] OR renaturation[All Fields]) AND (“proteins”[MeSH Terms] OR “proteins”[All Fields] OR “protein”[All Fields])” and found 2606 research reports published between 2009 and early 2017 that might be relevant to REFOLDdb. Manual inspection of the results identified 420 reports that contained experimental protocols for the refolding of 650 proteins. These data were then integrated into REFOLDdb along with the data stored in the REFOLD database. REFOLDdb refers to 1232 publications in total (the full list is available via the menu item “List of publications referred by REFOLDdb” on the “About” page at http://p4d-info.nig.ac.jp/refolddb/about.cgi?lang=EN).
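The keyword search described above could in principle be re-run programmatically. The following is a minimal sketch, assuming the NCBI E-utilities `esearch` endpoint is used; the text does not say whether the search was run through the PubMed web interface or an API. The helper name `build_esearch_url` and the exact date bounds (`2009` to `2017/03`) are illustrative assumptions. The sketch only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Query string quoted in the text (straight quotes substituted for typographic ones).
QUERY = (
    '(refolding[All Fields] OR renaturation[All Fields]) AND '
    '("proteins"[MeSH Terms] OR "proteins"[All Fields] OR "protein"[All Fields])'
)

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, min_date, max_date, retmax=0):
    """Build an esearch URL restricted to a publication-date range.

    Hypothetical helper: the date bounds passed in are assumptions,
    since the text only says "2009 to early 2017".
    """
    params = {
        "db": "pubmed",        # search PubMed
        "term": term,          # the keyword query
        "datetype": "pdat",    # filter on publication date
        "mindate": min_date,
        "maxdate": max_date,
        "retmax": retmax,      # 0 = return only the hit count
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(QUERY, "2009", "2017/03")
print(url)
```

Fetching this URL would return a hit count to compare against the 2606 reports mentioned above. The number may differ today because PubMed indexing changes over time.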
Owing to the standardization and other extensions of the data format, the database now offers the following functionality: (1) it is searchable by sequence similarity; (2) it is equipped with statistics that enable the discovery of trends in refolding techniques; and (3) it is easy to upload/submit new data to the database manager. Specifically, the database has the following three sections: Article [title/abstract/PubMed ID/Author/Journal/Date], Protein [Protein name/Amino acid sequence/Comment/UniProt ID/Function/Domain] and Experiment [Refolding methods/pH/Temperature/Validation]. We did not itemize “protein concentration” and “additive(s)”, because “protein concentration” is often missing in articles and the description of “additive(s)” is quite heterogeneous. REFOLDdb is composed of 12 tables in a relational database system.
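To make the three-section layout concrete, the sketch below models Article, Protein and Experiment as three linked tables and retrieves the refolding conditions recorded for one protein. It is a simplified illustration only. The actual 12-table schema is not described here, so all table and column names are hypothetical, and Python's built-in `sqlite3` stands in for the PostgreSQL server. The inserted rows are placeholders invented for the example, not REFOLDdb data.

```python
import sqlite3

# Hypothetical three-table simplification of the Article / Protein /
# Experiment sections; the real schema has 12 tables.
SCHEMA = """
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    pubmed_id  TEXT,
    title      TEXT,
    journal    TEXT
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT,
    uniprot_id TEXT,
    sequence   TEXT
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    article_id    INTEGER REFERENCES article(article_id),
    protein_id    INTEGER REFERENCES protein(protein_id),
    method        TEXT,
    ph            REAL,
    temperature_c REAL
);
"""


def find_methods(conn, protein_name):
    """Return (method, pH, temperature, PubMed ID) rows for one protein."""
    return conn.execute(
        """SELECT e.method, e.ph, e.temperature_c, a.pubmed_id
           FROM experiment e
           JOIN protein p ON p.protein_id = e.protein_id
           JOIN article a ON a.article_id = e.article_id
           WHERE p.name = ?
           ORDER BY e.experiment_id""",
        (protein_name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Placeholder rows, invented purely for illustration.
conn.execute(
    "INSERT INTO article VALUES (1, 'PMID-PLACEHOLDER', 'Example title', 'Example journal')"
)
conn.execute("INSERT INTO protein VALUES (1, 'example protein', NULL, 'MKT')")
conn.execute("INSERT INTO experiment VALUES (1, 1, 1, 'dilution', 8.0, 4.0)")

rows = find_methods(conn, "example protein")
print(rows)
```

Keeping Experiment as its own table linked to both Article and Protein lets one protein carry several refolding methods, which is consistent with the database listing more experimental methods (1877) than proteins (1628).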
REFOLDdb was created using open-source PostgreSQL relational database server software version 9.2.14 (https://www.postgresql.org/), running under CentOS 7 Server (version 7.2-1511) on a virtual machine based on VMware ESXi (http://www.vmware.com/products/esxi-and-esx.html). The system complies with the security policy of the National Institute of Genetics, Japan. A web-based query interface to the database was developed using the PHP programming language and PDO database abstraction classes (http://jp2.php.net/manual/en/book.pdo.php), and is hosted on the same virtual machine running the Apache 2.4.6 web server.
The top page of REFOLDdb is composed of (a) a horizontal bar menu and (b) a large main search window.
The resources, including human resources, required for running and updating REFOLDdb are kept to a minimum. A team of one annotator who is knowledgeable about structural biology and a part-time system engineer will be able to keep REFOLDdb up to date by collecting data from research papers on a monthly basis. The database system based on the virtual machine is almost autonomous and also flexible enough to allow future expansion.
In the future, we will evaluate new data sources other than research articles, such as patents, that might make the database more comprehensive. In addition, we will investigate the implementation of data-mining functionality to allow the prediction of suitable refolding methods based on chemical, physical and/or genetic features of proteins that have been successfully refolded.
The authors are grateful to Professor Junichi Takagi (Institute for Protein Research, Osaka University) for his suggestions on the needs of the research communities for databases in structural life sciences.
This work was supported by the ‘Platform for Drug Discovery, Informatics, and Structural Life Science’ grant from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) and the Japan Agency for Medical Research and Development (AMED). The funding body did not play any role in the design or conclusion of the study. Funding for open access charge: Waseda University, Japan.
REFOLDdb is searchable and downloadable at http://p4d-info.nig.ac.jp/refolddb/. A list of all papers to which REFOLDdb refers is accessible from the menu item “List of publications referred by REFOLDdb” on the “About” page at http://p4d-info.nig.ac.jp/refolddb/about.cgi?lang=EN.
HM, HS, TS, JO, KN and MT designed REFOLDdb. HM, KM, JO, TS, SN, YX, DW and HU contributed to the production of the data set based on research papers on refolding technologies. AB contributed to the conversion of the REFOLD database to REFOLDdb. HS, KN, AB and KY wrote the manuscript. All authors read and approved the final manuscript.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Hisashi Mizutani, Email: pj.ca.gin@natuzimh.
Hideaki Sugawara, Phone: +81-55-981-6895, Email: pj.ca.gin@rawagush.
Ashley M. Buckle, Email: email@example.com.
Takeshi Sangawa, Email: pj.ca.u-akaso.nietorp@31ihsakaya.
Ken-ichi Miyazono, Email: pj.ca.oykot-u.cce.liam@zayima.
Jun Ohtsuka, Email: moc.liamg@kusthoja.
Koji Nagata, Email: pj.ca.oykot-u.cce.liam@ataganka.
Tomoki Shojima, Email: pj.ca.oykot-u.cce.g@3923198541.
Shohei Nosaki, Email: pj.ca.oykot-u.cce.g@ikasona.
Yuqun Xu, Email: moc.liamg@nuquyuxnuquyux.
Delong Wang, Email: firstname.lastname@example.org.
Xiao Hu, Email: moc.liamg@9300uhwas.
Masaru Tanokura, Email: pj.ca.oykot-u.cce.liam@konatma.
Kei Yura, Email: email@example.com.