Nucleic Acids Res. 2011 January; 39(Database issue): D1–D6.
Published online 2010 December 16. doi: 10.1093/nar/gkq1243
PMCID: PMC3013748

The 2011 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection


The current 18th Database Issue of Nucleic Acids Research features descriptions of 96 new and 83 updated online databases covering various areas of molecular biology. It includes two editorials, one that discusses COMBREX, a new exciting project aimed at figuring out the functions of the ‘conserved hypothetical’ proteins, and one concerning BioDBcore, a proposed description of the ‘minimal information about a biological database’. Papers from the members of the International Nucleotide Sequence Database Collaboration (INSDC) describe each of the participating databases, DDBJ, ENA and GenBank, principles of data exchange within the collaboration, and the recently established Sequence Read Archive. A testament to the longevity of databases, this issue includes updates on the RNA modification database, Definition of Secondary Structure of Proteins (DSSP) and Homology-derived Secondary Structure of Proteins (HSSP) databases, which have not been featured here in >12 years. There is also a block of papers describing recent progress in protein structure databases, such as Protein DataBank (PDB), PDB in Europe (PDBe), CATH, SUPERFAMILY and others, as well as databases on protein structure modeling, protein–protein interactions and the organization of inter-protein contact sites. Other highlights include updates of the popular gene expression databases, GEO and ArrayExpress, several cancer gene databases and a detailed description of the UK PubMed Central project. The Nucleic Acids Research online Database Collection now lists 1330 carefully selected molecular biology databases. The full content of the Database Issue is freely available online at the Nucleic Acids Research web site.


This current, 18th annual Database Issue of Nucleic Acids Research (NAR) features descriptions of 96 new online databases (Table 1) covering a variety of molecular biology data and 83 data resources that have previously been published in NAR or other journals. The accompanying NAR online Molecular Biology Database Collection now includes 1330 data sources.

Table 1.
New molecular biology databases featured in the 2011 NAR Database Issue

In addition to this editorial comment, the current issue includes two more editorials. The first of them (1) is a collective statement by a large consortium of scientists, including the authors of this article, who are concerned with the proliferation of new databases that are rarely able to talk to each other. As a result, instead of contributing to building a single body of knowledge, these databases risk functioning increasingly as isolated islands in a sea of disparate biological data. This article proposes creating a community-defined, uniform, generic description of the core attributes of biological databases, BioDBcore, a kind of ‘minimal information about a biological database’, and provides a preliminary checklist to describe basic specifications of each new database (1). We would ask the authors of future submissions to the NAR Database Issue to fill out that checklist (or its latest posted version) and provide it as Supplementary Data to their manuscripts. In addition, we will explore ways in which the NAR online Molecular Biology Database Collection might ultimately support the standard.

Another editorial (2) describes COMBREX, an exciting project that is aimed at figuring out the functions of the ‘conserved hypothetical’ and poorly or incorrectly annotated proteins, identified through genome sequencing [see also refs (3,4)]. This project is designed to serve as a clearinghouse, collecting functional predictions from specialists in bioinformatics and functional genomics and then sending these predictions for testing by experimentalists. COMBREX offers an entirely new arrangement for research funding, whereby relatively small amounts of money are offered on a competitive basis to the experimental groups that are willing to test those predictions, employing the techniques and equipment that already exist in their laboratories. This arrangement dramatically decreases the costs of functional analysis of the uncharacterized proteins and gives hope that many of them could be assigned a biochemical—and/or general biological—function.

A bright example of databases that do talk to each other is the International Nucleotide Sequence Database Collaboration (INSDC), which consists of three participating databases, the DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (ENA) at the European Bioinformatics Institute (EMBL-EBI), and GenBank at the US National Center for Biotechnology Information (NCBI). This issue features separate papers from each of these three databases (5–7), as well as a joint paper describing the principles of data maintenance and exchange within the collaboration (8). A separate paper describes the functioning of the Sequence Read Archive (SRA), recently established by the three INSDC partners (9).

Another area where database collaboration has proved extremely successful is the storage and dissemination of published research. This issue features a detailed description of the UK PubMed Central, an extremely important project that, in collaboration with PubMed Central projects in the USA and Canada, provides a permanent online record for the research sponsored by British funding agencies, such as MRC, BBSRC, Wellcome Trust and the National Institute for Health Research (10).

In addition to the archival databases such as those of the INSDC, this issue includes curated databases of DNA sequence motifs, such as AREsite, a collection of AU-rich elements in vertebrate mRNA UTR sequences, and non-B DB, a repository of DNA sequences that form cruciform, triplex, slipped (hairpin) structures, tetraplex (G-quadruplex), left-handed Z-DNA and other DNA structures (11,12).

The RNA database papers featured in this issue include updates on Rfam and miRBase, two gold-standard databases of RNA sequences (13,14), a description of lncRNAdb, a new resource on experimentally characterized long non-coding RNA (15), as well as descriptions of several databases of predicted and/or experimentally validated microRNA targets (16–21). This issue also includes an update on the status of the RNA Modification Database, which was regularly featured in the NAR Database Issue in the 1990s (22–25) but not in the past 12 years. The current version lists 107 types of posttranscriptional modifications of nucleosides in RNA, primarily in various tRNAs (26). Two new databases present data on the RNA-binding proteins [RBPDB, (27)] and the specific structures of their RNA-binding sites [PRIDB, (28)].

This issue also features a block of 15 papers describing recent progress in protein structure databases, such as Protein DataBank (PDB), PDB in Europe (PDBe), CATH and SUPERFAMILY (29–32), as well as a selection of databases on protein building blocks, protein–protein interactions, protein structure modeling, and the organization of inter-protein contact sites (33–38). Among the new databases, it is worth mentioning EMDataBank, a database of 3D cryo-electron microscopy maps (39), a database of protein circular dichroism data (40) and three databases that are dedicated to the conformational dynamics of proteins (41–43). In addition, a paper from Gert Vriend’s group (44) presents their PDB-facilities web site with several useful PDB-derived databases for the analysis of protein structures. These include the famous Definition of Secondary Structure of Proteins (DSSP) and Homology-derived Secondary Structure of Proteins (HSSP) databases, which were last featured in the NAR Database Issue >12 years ago (45,46).

Progress in the analysis of the human genome prompted the creation of databases that list genes implicated in a variety of human diseases, including coronary artery disease (47), type I diabetes (48) and cancer. Cancer databases in this issue are represented by an update paper on the Catalogue of Somatic Mutations In Cancer [COSMIC, (49)], a description of the University of California Santa Cruz (UCSC) Cancer Genomics Browser (50), a new resource tightly integrated with the popular UCSC Genome Browser and the ENCODE database (51,52), and three more databases, dedicated, respectively, to cervical cancer, prostate cancer and potential cancer drug targets (53–55).

There are many other excellent databases that could not be mentioned here because of the space restrictions. In fact, we expect every single database featured in this issue to be useful to a wide audience of students and researchers in various areas of molecular biology.

As explained in last year’s editorial (56), moving to an online-only format for the NAR Database Issue has allowed us to accommodate longer papers. It has also let us offer the authors of the most popular data resources an opportunity to describe them in more detail, providing a deeper insight into their organization and goals and putting their recent updates into a broader context. This year, such extended papers were invited for a much larger number of databases, resulting in comprehensive descriptions of the PDB, PDBe, EMDataBank, MODBASE, GPCRDB, RegulonDB, STRING and other well-known databases (29,30,35,39,57–59). In some cases, extended length was also granted to first-time descriptions of new databases (36,60,61). We intend to continue accepting long(er) database papers in the future.


Intramural Research Program of the US National Institutes of Health (to M.Y.G.); European Molecular Biology Laboratory (to G.R.C.). Funding for open access charge: Waived by Oxford University Press.

Conflict of interest statement. The authors' opinions do not necessarily reflect the views of their respective institutions.


The authors thank Sir Richard Roberts, Dr Alex Bateman, Dr David Landsman and Dr Francis Ouellette for helpful comments; Patricia Anderson, Dr Martine Bernardes-Silva and Gail Welsh for excellent editorial assistance; and the Oxford University Press team led by Claire Bird and Jennifer Boyd, and Sheila Plaister at EMBL-EBI, for their help in compiling this issue and the online Molecular Biology Database Collection.


1. Gaudet P., Bairoch A., Field D., Sansone S.-A., Taylor C., Attwood T.K., Bateman A., Blake J.A., Bult C.J., Cherry J.M., et al. Towards BioDBcore: a community-defined information specification for biological databases. Nucleic Acids Res. 2011;39:D7–D10.
2. Roberts R.J., Chang Y.-C., Hu Z., Rachlin J., Anton B., Pokrzywa R., Choi H.-P., Faller L., Guleria J., Housman G., et al. COMBREX: a project to accelerate the functional annotation of prokaryotic genomes. Nucleic Acids Res. 2011;39:D11–D14.
3. Galperin M.Y., Koonin E.V. ‘Conserved hypothetical’ proteins: prioritization of targets for experimental study. Nucleic Acids Res. 2004;32:5452–5463.
4. Galperin M.Y., Koonin E.V. From complete genome sequence to ‘complete’ understanding? Trends Biotechnol. 2010;28:398–406.
5. Kaminuma E., Mashima J., Kodama Y., Gojobori T., Ogasawara O., Okubo K., Takagi T., Nakamura Y. DDBJ Progress Report. Nucleic Acids Res. 2011;39:D22–D27.
6. Leinonen R., Akhtar R., Birney E., Bonfield J., Bower L., Corbett M., Cheng Y., Demiralp F., Faruque N., Goodgame N., et al. The European nucleotide archive. Nucleic Acids Res. 2011;39:D28–D31.
7. Benson D., Karsch-Mizrachi I., Lipman D., Ostell J., Sayers E.W. GenBank. Nucleic Acids Res. 2011;39:D32–D37.
8. Cochrane G., Karsch-Mizrachi I., Nakamura Y. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2011;39:D15–D18.
9. Leinonen R., Sugawara H., Shumway M. The Sequence Read Archive. Nucleic Acids Res. 2011;39:D19–D21.
10. McEntyre J.R., Ananiadou S., Andrews S., Black W.J., Boulderstone R., Buttery P., Chaplin D., Chevuru S., Cobley N., Coleman L.-A., et al. UKPMC: a full text article resource for the life sciences. Nucleic Acids Res. 2011;39:D58–D65.
11. Gruber A., Fallmann J., Kratochvill F., Kovarik P., Hofacker I.L. AREsite: a database for the comprehensive investigation of AU-rich elements. Nucleic Acids Res. 2011;39:D66–D69.
12. Stephens R., Cer R., Bruce K., Mudunuri U., Yi M., Volfovsky N., Luke B., Bacolla A., Collins J. Non-B DB - A database of predicted non-B DNA forming motifs in mammalian genomes. Nucleic Acids Res. 2011;39:D383–D391.
13. Gardner P., Daub J., Tate J., Moore B., Osuch I., Griffiths-Jones S., Finn R., Nawrocki E., Kolbe D., Eddy S., et al. Rfam: Wikipedia, clans and the “decimal” release. Nucleic Acids Res. 2011;39:D141–D145.
14. Kozomara A., Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–D157.
15. Amaral P.P., Clark M.B., Gascoigne D.K., Dinger M.E., Mattick J.S. lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res. 2011;39:D146–D151.
16. Mewes H.W., Ruepp A., Theis F., Rattei T., Walter M., Frishman D., Suhre K., Mayer K., Stümpflen V., Antonov A. MIPS: curated databases and comprehensive secondary data resources in 2010. Nucleic Acids Res. 2011;39:D220–D224.
17. Yang J.-H., Li J.-H., Shao P., Zhou H., Chen Y.-Q., Qu L.-H. starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-seq and Degradome-seq data. Nucleic Acids Res. 2011;39:D202–D209.
18. Elefant N., Berger A., Shein H., Hofree M., Margalit H., Altuvia Y. RepTar: a database of predicted cellular targets of host and viral miRNAs. Nucleic Acids Res. 2011;39:D188–D194.
19. Meng Y., Gou L., Chen D., Mao C., Jin Y., Wu P., Chen M. PmiRKB: a plant microRNA knowledge base. Nucleic Acids Res. 2011;39:D181–D187.
20. Hsu S.-D., Lin F.-M., Wu W.-Y., Liang C., Huang W.-C., Chan W.-L., Tsai W.-T., Chen G.-Z., Lee C.-J., Chiu C.-M., et al. miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res. 2011;39:D163–D169.
21. Cho S., Jun Y., Lee S., Choi H., Jung S., Jang Y., Lee S., Kim S., Lee S., Kim W.K. miRGator v2.0: an integrated system for functional investigation of microRNAs. Nucleic Acids Res. 2011;39:D158–D162.
22. Crain P.F., McCloskey J.A. The RNA modification database. Nucleic Acids Res. 1996;24:98–99.
23. Crain P.F., McCloskey J.A. The RNA modification database. Nucleic Acids Res. 1997;25:126–127.
24. McCloskey J.A., Crain P.F. The RNA modification database–1998. Nucleic Acids Res. 1998;26:196–197.
25. Rozenski J., Crain P.F., McCloskey J.A. The RNA modification database: 1999 update. Nucleic Acids Res. 1999;27:196–197.
26. Agris P., Cantara W., Crain P., Rozenski J., McCloskey J., Harris K., Zhang X., Vendeix F., Fabris D. The RNA modification database, RNAMDB: 2011 update. Nucleic Acids Res. 2011;39:D195–D201.
27. Cook K.B., Kazan H., Zuberi K., Morris Q., Hughes T.R. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 2011;39:D301–D308.
28. Lewis B., Walia R., Terribilini M., Ferguson J., Zheng C., Honavar V., Dobbs D. PRIDB: a protein-RNA interface database. Nucleic Acids Res. 2011;39:D277–D282.
29. Rose P.W., Beran B., Bi C., Bluhm W.F., Dimitropoulos D., Goodsell D.S., Prlic A., Quesada M., Quinn G.B., Westbrook J.D., et al. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res. 2011;39:D392–D401.
30. Velankar S., Alhroub Y., Alili A., Best C., Boutselakis H.C., Caboche S., Conroy M.J., Dana J.M., van Ginkel G., Golovin A., et al. PDBe: protein data bank in Europe. Nucleic Acids Res. 2011;39:D402–D410.
31. Cuff A., Sillitoe I., Lewis T., Clegg A., Rentzsch R., Furnham N., Pellegrini-Calace M., Jones D.T., Thornton J., Orengo C. Extending CATH: Increasing coverage of the protein structure universe and linking structure with function. Nucleic Acids Res. 2011;39:D420–D426.
32. de Lima Morais D., Fang H., Rackham O., Wilson D., Pethica R., Chothia C., Gough J. SUPERFAMILY 1.75 including a domain-centric Gene Ontology method. Nucleic Acids Res. 2011;39:D427–D434.
33. Vanhee P., Verschueren E., Baeten L., Stricher F., Serrano L., Rousseau F., Schymkowitz J. BriX: a database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res. 2011;39:D435–D442.
34. Stein A., Ceol A., Aloy P. 3did: identification and classification of domain-based interactions of known three-dimensional structure. Nucleic Acids Res. 2011;39:D718–D723.
35. Pieper U., Webb B.M., Barkan D.T., Schneidman-Duhovny D., Schlessinger A., Braberg H., Yang Z., Meng E., Pettersen E., Huang C., et al. ModBase, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 2011;39:D465–D474.
36. Xu Q., Dunbrack R.L., Jr. The protein common interface database (ProtCID)–a comprehensive database of interactions of homologous proteins in multiple crystal forms. Nucleic Acids Res. 2011;39:D761–D770.
37. Luo Q., Pagel P., Vilne B., Frishman D. DIMA 3.0: domain interaction map. Nucleic Acids Res. 2011;39:D724–D729.
38. Yellaboina S., Tasneem A., Zaykin D., Raghavachari B., Jothi R. DOMINE: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Res. 2011;39:D730–D735.
39. Lawson C.L., Baker M.L., Best C., Bi C., Dougherty M., Feng P., van Ginkel G., Devkota B., Lagerstedt I., Ludtke S.J., et al. EMDataBank.org: unified data resource for CryoEM. Nucleic Acids Res. 2011;39:D456–D464.
40. Whitmore L., Woollett B., Miles A., Klose D., Janes R., Wallace B. PCDDB: the protein circular dichroism data bank, a repository for circular dichroism spectral and metadata. Nucleic Acids Res. 2011;39:D480–D486.
41. Kim D.N., Altschuler J., Strong C., McGill G., Bathe M. Conformational dynamics data bank: a database for conformational dynamics of proteins and supramolecular protein assemblies. Nucleic Acids Res. 2011;39:D451–D455.
42. Juritz E., Fernandez Alberti S., Parisi G. PCDB: a database of protein conformational diversity. Nucleic Acids Res. 2011;39:D475–D479.
43. Sikora M., Sulkowska J.I., Witkowski B.S., Cieplak M. BSDB: the biomolecule stretching database. Nucleic Acids Res. 2011;39:D443–D450.
44. Joosten R., te Beek T., Krieger E., Hooft R., Schneider R., Sander C., Vriend G. A series of PDB related databases for everyday needs. Nucleic Acids Res. 2011;39:D411–D419.
45. Hooft R.W., Sander C., Scharf M., Vriend G. The PDBFINDER database: a summary of PDB, DSSP and HSSP information with added value. Comput. Appl. Biosci. 1996;12:525–529.
46. Dodge C., Schneider R., Sander C. The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res. 1998;26:313–315.
47. Liu H., Liu W., Liao Y., Cheng L., Liu Q., Ren X., Shi L., Tu X., Wang Q.K., Guo A.Y. CADgene: a comprehensive database for coronary artery disease genes. Nucleic Acids Res. 2011;39:D991–D996.
48. Burren O.S., Adlem E.C., Achuthan P., Christensen M., Coulson R.M., Todd J.A. T1DBase: update 2011, organization and presentation of large-scale data sets for type 1 diabetes research. Nucleic Acids Res. 2011;39:D997–D1001.
49. Forbes S.A., Bindal N., Bamford S., Cole C., Kok C.Y., Beare D., Jia M., Shepherd R., Leung K., Menzies A., et al. COSMIC: mining complete cancer genomes in the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2011;39:D945–D950.
50. Sanborn J.Z., Benz S.C., Craft B., Szeto C., Kober K.M., Meyer M., Vaske C.J., Goldman M., Smith K.E., Kuhn R.M., et al. The UCSC cancer genomics browser: update 2011. Nucleic Acids Res. 2011;39:D951–D959.
51. Fujita P.A., Rhead B., Zweig A.S., Hinrichs A.S., Karolchik D., Cline M.S., Goldman M., Barber G.P., Clawson H., Coelho A., et al. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2011;39:D876–D882.
52. Raney B.J., Cline M.S., Rosenbloom K.R., Dreszer T.R., Learned K., Barber G.P., Meyer L.R., Sloan C.A., Malladi V.S., Roskin K.M., et al. ENCODE whole-genome data in the UCSC genome browser (2011 update). Nucleic Acids Res. 2011;39:D871–D875.
53. Agarwal S.M., Raghav D., Singh H., Raghava G.P. CCDB: a curated database of genes involved in cervix cancer. Nucleic Acids Res. 2011;39:D975–D979.
54. Maqungo M., Kaur M., Kwofie S.K., Radovanovic A., Schaefer U., Schmeier S., Oppon E., Christoffels A., Bajic V.B. DDPC: Dragon Database of Genes associated with Prostate Cancer. Nucleic Acids Res. 2011;39:D980–D985.
55. Ahmed J., Meinel T., Dunkel M., Murgueitio M.S., Adams R., Blasse C., Eckert A., Preissner S., Preissner R. CancerResource: a comprehensive database of cancer-relevant proteins and compound interactions supported by experimental knowledge. Nucleic Acids Res. 2011;39:D960–D967.
56. Cochrane G.R., Galperin M.Y. The 2010 Nucleic Acids Research Database Issue and online Database Collection: a community of data resources. Nucleic Acids Res. 2010;38:D1–D4.
57. Gama-Castro S., Salgado H., Peralta-Gil M., Santos-Zavaleta A., Muniz-Rascado L., Solano-Lira H., Jimenez-Jacinto V., Weiss V., Garcia-Sotelo J.S., Lopez-Fuentes A., et al. RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res. 2011;39:D98–D105.
58. Vroling B., Sanders M., Baakman C., Borrmann A., Verhoeven S., Klomp J., Oliveira L., de Vlieg J., Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res. 2011;39:D309–D319.
59. Szklarczyk D., Franceschini A., Kuhn M., Simonovic M., Roth A., Minguez P., Doerks T., Stark M., Muller J., Bork P., et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–D568.
60. Masuya H., Makita Y., Kobayashi N., Nishikata K., Yoshida Y., Mochizuki Y., Doi K., Takatsuki T., Waki K., Tanaka N., et al. The RIKEN integrated database of mammals. Nucleic Acids Res. 2011;39:D861–D870.
61. Lee T.Y., Bo-Kai Hsu J., Chang W.C., Huang H.D. RegPhos: a system to explore the protein kinase-substrate phosphorylation network in humans. Nucleic Acids Res. 2011;39:D777–D787.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press