This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Many scientists lack the background to fully utilize the wealth of solved three-dimensional biomacromolecule structures. Thus, a resource is needed to present structure/function information in a user-friendly manner to a broad scientific audience. Proteopedia http://www.proteopedia.org is an interactive, wiki web-resource whose pages have embedded three-dimensional structures surrounded by descriptive text containing hyperlinks that change the appearance (view, representations, colors, labels) of the adjacent three-dimensional structure to reflect the concept explained in the text.
Structural biology has played a central role in fueling the massive advances made by the life sciences in the last few decades. More than a dozen Nobel prizes have been awarded for achievements in structural biology since solution of the structure of the DNA double helix in the early 1950s was followed by solution of the first protein structures at the end of the same decade. Beautiful images of three-dimensional structures regularly adorn the covers of Science, Nature and Cell. Indeed, a wealth of protein structures has been solved in recent years, and entries in the Protein Data Bank (PDB) [1,2] now number over 50,000. But structural information is surprisingly still not in the mainstream of biology for the simple reason that three-dimensional structures are often hard to understand, even for a structural biologist. The widely held impression is that these structures are understood in detail and put to use in research; in fact, the structures are hardly discussed at all, especially by biologists lacking a structural background. While computer graphics software greatly aids in the understanding of these structures by displaying them in three dimensions, the pages of printed scientific journals flatten the structures to a two-dimensional image, with much of the three-dimensional information thus being lost. It should be noted, however, that a number of journals (Nature, Nature Structural and Molecular Biology, ACS Chemical Biology and Molecular Biosystems) have begun to offer links to FirstGlance in Jmol for interactive three-dimensional structure visualization, and two journals (ACS Chemical Biology and Biochemical Journal) occasionally offer interactive three-dimensional figures crafted by Molecules In Motion; but these still lack the simple direct link between the printed information and the three-dimensional structures that is provided by Proteopedia.
Moreover, many biologists have a limited knowledge of chemistry; thus, structural biologists need to make a special effort to develop tools that make macromolecular structures accessible and useful to the life science and clinical communities.
One such tool is molecular animation. Movies are successful at making biomacromolecules and their complexes come to life on the screen, and thus are often able to preserve and convey three-dimensional information far better than static two-dimensional images. Previous efforts to communicate the structural and functional features of a biomacromolecule have largely focused on creation of such movies and on interactive visualizations (for example, Kinemage, MovieMaker, Protein Explorer [7,8], Protein Movie Generator, and PDB2MGIF [10,11]). Until recently, the time and technical knowledge required to make such macromolecular animations were daunting. This has been partly rectified with the advent of eMovie, a plug-in for the molecular visualization program PyMOL, and PolyView3D [14,15], which have both simplified the creation process and lowered the threshold for sharing molecular three-dimensional information via movies. However, although movies are excellent for individual presentations, they are not an adequate solution to the problem that we are attempting to address, because they are fixed once created, and provide neither an interactive environment nor integration with textual information.
What is missing is a common resource that would make three-dimensional structures easier to understand, permit linking of function to structure, and at the same time simplify the sharing of structural information. This should be accomplished not by reducing the amount of information conveyed, but rather by making three-dimensional information intuitive, and thus more accessible to all. Already, valuable attempts have been made to tackle this problem. Perhaps the most notable recent example is iSee , which, like Kinemage, makes three-dimensional structures more intuitive by linking textual information to three-dimensional views of the structure. However, iSee uses both proprietary authoring tools, which must be purchased, and a proprietary viewer that has to be downloaded and installed in order to view both text and three-dimensional structures.
For non-structural biologists, the issue is not understanding a structure as an end in itself, but relating the structural information to biological applications: for example, how do mutations cause disease? Or, to be more specific, what mutation can be performed that will prevent one protein from interacting with another? How can one design a drug that will stabilize a protein destabilized by mutagenesis? Which part of a protein may be useful as an epitope? What happens in an organism in which a given protein domain is missing? In order for structural biology to provide genuine added value for non-structural biologists, we need a resource that will allow the relevant information and its analysis to be entered by the appropriate, knowledgeable scientists - and easily accessed and understood by users without a formal background in structural biology.
Proteopedia is a wiki-based web-resource that has been designed to address what is missing from structural biology: a mechanism for making three-dimensional structures easier to understand, a linking of function to interactive three-dimensional structure visualization, and a simplified sharing of structural and functional knowledge (a wiki is a resource or website where users can edit the pages in the website using simple text-editing tools). This resource is a tool for all scientists who need to utilize three-dimensional structural information in their research, as well as for educators requiring a medium for compelling presentation of structure-function relationships. Proteopedia is also meant for structural biology specialists in need of a more effective method of communicating their results. As a website, Proteopedia is freely accessible to all users without the need for downloading and installing any software. (Java is required. Most users will find that they already have Java installed on their computers. Should they need to download Java, they will be directed to the Java website for the free and simple download.) Furthermore, adding content to the website is simple: textual content is added in the same way as it is added in Wikipedia, taking advantage of an interface that is familiar to millions. Interactive, customized scenes of three-dimensional structures linked to the text are simple to add via Proteopedia's easy-to-use Scene Authoring Tools. Proteopedia is intended to be the website of first resort for everyone from research scientists to students seeking integrated three-dimensional structural and functional information about a particular protein or molecule.
Proteopedia has three defining features. First, three-dimensional information is presented in an intuitive manner: descriptive text contains hyperlinks that change the adjacently displayed three-dimensional structures to coincide with points made in the text (Figure 1). (The visualizations in Proteopedia are, in fact, not truly three-dimensional, but the impression of three-dimensionality is achieved by having the structure rotate, a visualization technique pioneered by Levinthal in the 1960s.) Second, there is no requirement for installation and operation of downloadable viewers. A web browser is all that is needed for full access, including both interactive three-dimensional viewing and content authoring. The site works equally well on Windows, Mac OS X, and Linux. Third, content can be easily added by any approved, knowledgeable user, via simple-to-use authoring tools.
At first sight, Proteopedia looks a lot like Wikipedia. Indeed, Proteopedia runs on the same open software wiki package used by Wikipedia, MediaWiki. However, a Proteopedia user will soon notice several differences. For one, most pages include at least one instance of the molecular visualization applet Jmol (an applet is a small program embedded in a webpage), displaying a slowly revolving three-dimensional protein structure. Instead of a flattened, two-dimensional image of a protein structure, users are greeted by a three-dimensional structure that may be rotated and explored in real-time. The second most obvious difference is the existence of green hyperlinks within the text. Clicking on these hyperlinks changes the three-dimensional molecular scene displayed within the adjacent Jmol applet to one that better illustrates the concept referred to in the relevant text. In some sense this follows the familiar and important English essay-writing adage "Show, don't tell".
For example, a user interested in hemoglobin visits the page of that name in Proteopedia. A slowly rotating three-dimensional crystal structure of hemoglobin is displayed in an interactive Jmol applet. While reading the text, the user clicks on the embedded green hyperlinks to display new molecular scenes illustrating the points in the text (Figure 1). Each of the links, which can be traversed in any order, smoothly transitions from the previous scene to the next one, enhancing the user's spatial comprehension of relative locations on and within the protein. In contrast, two-dimensional images of protein structures often leave the user grappling with the spatial relations of one image to another.
The key breakthrough in Proteopedia is the ease with which any user can create 'text-to-molecular-scene links' using the Scene Authoring Tools (for example, see  for a narrated video tutorial). The Scene Authoring Tools strive for user-friendliness, and they can be accessed by virtually any system, be it Windows, Linux, or Mac, running any of the most popular web browsers (Internet Explorer, Firefox, Safari, and others).
A Proteopedia user who wants to create a scene uses the Scene Authoring Tools to manipulate his or her three-dimensional structure into the desired viewing-perspective and zoom, colors, representations and labels (like a two-dimensional picture). That particular scene of the three-dimensional structure is then saved and married to a green link in the text of the page. Whenever that green link is clicked, the Jmol applet will recall the saved scene, and will automatically transition smoothly to it. Conformational changes (or morphs) can be animated as well. Previously created scenes are easily recalled and edited within the Scene Authoring Tools.
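The pairing of a saved scene with a green link can be pictured as a small data model. The Python sketch below is illustrative only: the `Scene` and `Page` classes, their fields, and the example values are hypothetical names chosen for this illustration, and are not taken from Proteopedia's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """A saved view of a structure (all fields are hypothetical)."""
    orientation: tuple                          # viewing perspective, e.g. rotation angles
    zoom: float                                 # zoom level in percent
    colors: dict = field(default_factory=dict)  # selection name -> color
    labels: dict = field(default_factory=dict)  # selection name -> label text


@dataclass
class Page:
    """A wiki page whose green links each point at one saved scene."""
    title: str
    scenes: dict = field(default_factory=dict)  # link text -> Scene

    def add_scene_link(self, link_text, scene):
        # "Marry" the saved scene to a green link in the page text.
        self.scenes[link_text] = scene

    def click(self, link_text):
        # Clicking a green link recalls the scene saved for it.
        return self.scenes[link_text]


page = Page("Hemoglobin")
page.add_scene_link(
    "heme group",
    Scene(orientation=(30.0, 45.0, 0.0), zoom=250.0,
          colors={"heme": "red"}, labels={"heme": "Heme"}),
)

# Recall a previously created scene and edit it in place.
scene = page.click("heme group")
scene.zoom = 300.0
```

Because each link stores a complete scene rather than a change relative to the previous one, links can be followed in any order, which matches the behavior described in the text.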
Each page in Proteopedia can be modified by the members of the user community, thus permitting addition and editing of content. Modifications become visible and searchable immediately. Adding and editing content is quick, easy, and accessible to the common non-technical user and scientist.
Compared to other three-dimensional structural databases that solely archive, in a rigid format, data from scientists working on a given protein, Proteopedia, because it is a wiki, permits anyone knowledgeable with respect to that particular protein to add information regarding its function and to relate the information directly to the three-dimensional structure. Mistakes and errors are easily corrected by users who have opted to receive e-mail notification whenever the page on which they are expert is changed. Each change made to a page is logged in that page's history, so that pages can easily be reverted to a previous state. When appropriate or necessary, a page may be protected from being edited except by a selected group of stewards who can evaluate proposed changes to the page.
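The logging of each change in a page's history, and the ability to revert to an earlier state, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `WikiPage` class and its methods are hypothetical and do not reproduce MediaWiki's actual data model. The sketch records a revert as a new revision so that no earlier state is lost, which is a design choice made for this example.

```python
class WikiPage:
    """A page that keeps every revision of its text (hypothetical model)."""

    def __init__(self, title, text=""):
        self.title = title
        # Each revision is an (author, text) pair; index 0 is the initial state.
        self.history = [("initial", text)]

    @property
    def text(self):
        # The current text is always the most recent revision.
        return self.history[-1][1]

    def edit(self, author, new_text):
        # Every change is appended to the history rather than overwriting it.
        self.history.append((author, new_text))

    def revert(self, revision_index, author):
        # Restore an earlier state by re-adding its text as a new revision.
        _, old_text = self.history[revision_index]
        self.history.append((author, old_text))


page = WikiPage("Hemoglobin", "Hemoglobin carries oxygen.")
page.edit("A. Contributor", "Hemoglobin carries oxgyen.")  # an edit that introduces a typo
page.revert(0, "B. Expert")                                # an expert restores the earlier text
```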
In creating a wiki for the scientific community, two chief concerns are to ensure that only knowledgeable users are authoring content, and to ensure that authors receive proper credit for their contributions. Proteopedia addresses these issues in the following manner. While anyone can view Proteopedia pages, only registered users can edit pages and add content. In contrast to Wikipedia, Proteopedia user accounts are exclusive to the scientific community, and only scientists, educators, and students of science are invited to request accounts by clicking on "log in/request account" at the upper right-hand side of the webpage. Approved accounts are created using the users' real names so that the authors both receive appropriate credit for their contributions (each page lists the names of the people who have contributed to the page) and take responsibility for their entries.
In a departure from the purist wiki model, Proteopedia provides each user with a section where she or he can create pages that are protected from editing by others. By so doing, Proteopedia encourages educators and lecturers to take advantage of the three-dimensional visualization features of Proteopedia to create interactive three-dimensional 'lecture slides' for projection from the website, without having to worry that the content might be changed by someone else. Students can access this lecture material at any time, anywhere, even after the lecture. Additionally, scientific papers discussing three-dimensional macromolecular structures may also benefit from the three-dimensional visualization features of Proteopedia via protected pages with interactive, three-dimensional material supplementary to the publication.
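The access rules described above (anyone may view, only registered users may edit, and some pages are protected) can be summarized as a short permission check. The following Python sketch is one possible reading of those rules; the `User` and `Page` classes and the `can_view` and `can_edit` functions are hypothetical names, not part of Proteopedia's code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    real_name: str            # accounts are created under users' real names
    registered: bool = False  # True once an account request has been approved


@dataclass
class Page:
    title: str
    owner: Optional[User] = None                  # set for pages in a user's protected section
    stewards: list = field(default_factory=list)  # non-empty for steward-protected pages


def can_view(user, page):
    # Anyone, including anonymous visitors, can view any page.
    return True


def can_edit(user, page):
    if user is None or not user.registered:
        return False                    # only registered users can edit
    if page.owner is not None:
        return user == page.owner       # user-protected page: owner only
    if page.stewards:
        return user in page.stewards    # steward-protected page
    return True                         # ordinary page: any registered user


alice = User("Alice Example", registered=True)
bob = User("Bob Example", registered=True)
lecture = Page("Lecture slides", owner=alice)
```

With these definitions, `can_edit(alice, lecture)` is true and `can_edit(bob, lecture)` is false, while both users can edit an ordinary page such as `Page("Hemoglobin")`.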
Proteopedia is already online, serving the scientific community. It contains automatically seeded pages for each of the more than 50,000 entries in the PDB, updated weekly with each release of new PDB entries. Each such page includes, along with a rotatable/zoomable three-dimensional structure, the abstract of the paper associated with the structure (from PubMed), green hyperlinks that highlight key parts of the structure defined in the PDB file (for example, ligands and functional sites) and other useful information detailed in Figure 2. A user familiar with a structure will thus find its page ripe and ready for enhancement with additional content and new scenes to better illustrate the function of the protein - much easier than starting from a blank page. Additionally, these PDB entry seed pages have high value to a diverse audience of scientists even before insertion of user-added content due to the inherent convenience of having an interactive, three-dimensional visualization of the structure adjacent to the abstract of its publication.
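The weekly seeding step can be pictured as a loop over newly released entries. The Python sketch below is a simplified illustration, not the actual seeding script: the functions `build_seed_page` and `seed_pages`, the dictionary keys, and the placeholder entries are all hypothetical. The sketch also assumes that an existing page is never overwritten, so that user-added content survives later seeding runs; the paper does not state how this case is handled.

```python
def build_seed_page(entry):
    """Assemble the content of a seed page from one entry record."""
    return {
        "title": entry["pdb_id"],
        "abstract": entry["abstract"],  # abstract of the associated paper
        # One scene link per key part of the structure listed for the entry.
        "scene_links": list(entry.get("ligands", [])) + list(entry.get("sites", [])),
    }


def seed_pages(entries, pages):
    """Create seed pages for entries that do not yet have a page.

    Existing pages are left untouched (an assumption of this sketch).
    Returns the identifiers of the pages that were created.
    """
    created = []
    for entry in entries:
        pdb_id = entry["pdb_id"]
        if pdb_id not in pages:
            pages[pdb_id] = build_seed_page(entry)
            created.append(pdb_id)
    return created


# Placeholder records; the identifiers and text are invented for illustration.
weekly_release = [
    {"pdb_id": "demo1", "abstract": "Placeholder abstract.", "ligands": ["LIG"]},
    {"pdb_id": "demo2", "abstract": "Another placeholder abstract."},
]
pages = {}
first_run = seed_pages(weekly_release, pages)
second_run = seed_pages(weekly_release, pages)  # nothing new, so nothing is created
```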
But Proteopedia is not a one-to-one mapping of the PDB. The seeded PDB entry pages in Proteopedia provide a base level in a hierarchical organization. A higher level consists of pages that explain and summarize structure/function knowledge about particular molecules or classes of molecules. For example, the hemoglobin and acetylcholinesterase pages provide general overviews of these molecules along with rotatable/zoomable three-dimensional structures and links to all of the related PDB entry pages in Proteopedia.
To have real value to a diverse audience, three-dimensional structures of proteins, RNA, DNA, and other biomacromolecules must be communicated, wherever possible, together with their biochemical and biological functions. While Proteopedia makes this integrated communication possible, and even simple, it is a resource that relies on community-annotation, and there is no guarantee that enough knowledgeable users will take to Proteopedia en masse to reach a critical level of users. To minimize this risk, Proteopedia attempts to be as enticing as possible to these knowledgeable users, with intuitive visualization features, with user-friendly authoring tools, with attribution of content, with special protected pages for lectures, tutorials, and supplementary information for journal articles, and with a familiar interface (from Wikipedia). In addition, all textual content and scenes added by users to Proteopedia are licensed under the GNU Free Documentation License (as in Wikipedia), thus ensuring that the content is free, and that Proteopedia is solely a vehicle for content creation and dissemination. Proteopedia will also continue to cater to its knowledgeable users by listening to their feedback and actively developing in ways that satisfy their needs and desires. For example, Proteopedia will shortly offer the option to display the amino acids in three-dimensional protein structures color-coded according to their degree of evolutionary conservation (using ConSurf ).
User-created and user-enhanced pages currently number in the double digits. User-added content is expected to rise following publication of this paper, but over 100 users have already registered accounts. These Proteopedia users have started to develop several protein and molecule pages (see, for example, , a page on recoverin, a calcium-activated myristoyl switch), and have also expanded the seeded pages for the PDB entries they have authored or know well (see, for example, , a page on PDB entry 2rkx from a recent, exciting study of an enzyme designed for a reaction not catalyzed by a naturally occurring biocatalyst). In one case, Proteopedia was used to render in three dimensions several figures from a publication before a journal club meeting (see , a page on the structure of a human p110alpha/p85alpha complex). In another case, an undergraduate student created a page on Photosystem II in Proteopedia for a biochemistry class assignment (see ). Using the protected pages format, a university professor and educator has created a graphical tutorial on Ramachandran plots (Figure 3). A page on the highest impact structures of all time currently lists the DNA double helix (B form), myoglobin, lysozyme, deoxy-hemoglobin, transfer RNA, tobacco bushy stunt virus, major histocompatibility complex class I, and the ribosome, and invites contribution and discussion (see ).
Proteopedia is a novel resource, and its key advantages are as follows. First, it links text with interactive three-dimensional scenes of molecules and molecular complexes. Second, the three-dimensional scenes of molecules and molecular complexes can be created easily by Proteopedia users, using the Proteopedia Scene Authoring Tools, and immediately shared with and viewed by all. Third, it can be viewed via any standard browser and operating system, and requires no proprietary or commercial software. Fourth, in contrast with most other wikis, there are different levels of editorial control and input access, ranging from pages to which any registered user can contribute to protected pages, intended, for example, for teaching, which can be modified by only one author. Fifth, also in contrast with most other wikis, each page shows the full real names of its authors; thus, authors not only receive appropriate credit but also take responsibility for their contributions. Sixth, it features automated seeding of pages for each of the entries in the PDB, but with substantial added content. The added content includes the published abstract of the article associated with the structure, an interactive three-dimensional structure of the macromolecule with, where possible, links to key structural features, including the active site, ligands, and links to other relevant databases. These seeded pages provide valuable templates to which knowledgeable users can add content. Seventh, it extends beyond the contents of the PDB, providing for hierarchical organization of structure and function categories such as protein families, structural classes and biological function. Eighth, content is not restricted to PDB structures. Contributors can upload their own coordinates, experimental or theoretical models, whether of single biomolecules or of complexes. Theoretical models are clearly distinguished as such. 
Contributors may also add small molecules that are biologically relevant or that could benefit from Proteopedia's visualization technology. Ninth, visualization is not restricted to a single format. Several are already incorporated, including Jmol, kinemages (using MageJava), movies, morphs (for example conformational changes or docking actions), and images. Table 1 shows the unique combination of features in Proteopedia in comparison to related software tools.
Protein structures are not ends in themselves. Structural information must be placed in the appropriate biological context in order to be useful. To borrow from Greg Petsko, "Structures have value when they are part of a larger effort to understand the biochemical and biological functions of the protein in question... [Structure determination] is not the end in itself, nor should it be, not anymore..." . Structures have value to a more diverse audience when three-dimensional structural information is smoothly integrated with biochemical and biological information. For example, it would be ideal if each new deposition in the PDB were accompanied by a well-developed page in Proteopedia by its authors, serving at least as a sort of 'News and Views', and touching on deeper details about the structure as necessary.
Proteopedia enhances the scientific community's ability to communicate complex three-dimensional information. Its integrated text and graphics allow for structural information to be conveyed in a manner that is accessible to a broad repertoire of scientists. Relevance of structure to function can be transmitted in a transparent fashion, and shared via simple tools for contributing to the website. Furthermore, Proteopedia has the capacity to leverage the resources of many diverse experts in varied fields rather than just the curators at a database site - and the ability to do so in an exciting, new medium.
Proteopedia is built upon a customized version of the MediaWiki open software package, and integrates the Jmol open-source Java applet viewer for chemical structures in three dimensions using an adapted version of the Jmol MediaWiki Extension with novel Scene Authoring Tools built specifically for Proteopedia. Kinemages are visualized in Proteopedia using MageJava. PDB entry pages are automatically seeded using a script driven by OCA (the browser/database for protein structure/function), which aggregates information from various resources (listed at ). SGKB annotation plays a key part in OCA's data collection for seeding the PDB entry pages, and two-dimensional images for these pages are seeded from the RCSB PDB and the Jena Library. Proteopedia is backed up daily to both local and remote locations at the Weizmann Institute of Science, with incremental backups daily and full backups weekly.
PDB, Protein Data Bank.
EH translated JLS's vision of an easy-to-use and universally accessible resource for communicating complex biological structural information into the first working version of Proteopedia, which included the Proteopedia Scene Authoring Tools. JP migrated this first working version of Proteopedia to an externally accessible server and developed and seeded the automatically created pages for each of the entries in the PDB as well as implemented several new and crucial features such as content attribution. EH and JP are active co-developers of Proteopedia. EM contributed to policy development, lent expert opinion, contributed content, and occasionally code, to the project. EM and JLS have been involved in testing and have provided ideas for new features, improvement of existing features, and for the project in general. IS and JM contributed expert opinion and guidance to the overall direction of the project. The idea for a resource like Proteopedia arose out of discussions between JM, JP, IS and JLS on the urgent need for better tools to integrate three-dimensional structure with functional information. JLS provided the main scientific and strategic guidance for the project. The manuscript was drafted by EH and all authors contributed revisions with JLS leading and coordinating the effort.
This study was supported by the Divadol Foundation, the Nalvyco Foundation, the Jean and Julia Goldwurm Memorial Foundation, the Benoziyo Center for Neuroscience, the Neuman Foundation, a research grant from Mr. Erwin Pearl, the Kimmelman Center, the European Commission Sixth Framework Research and Technological Development Programme 'SPINE2-COMPLEXES' Project under contract number LSHG-CT-2006-031220 and 'Teach-SG' Project, under contract number ISSG-CT-2007-037198. JLS is the Morton and Gladys Pickman Professor of Structural Biology. EH is grateful to the Karyn Kupcinet Program and the Feinberg Graduate School (Weizmann Institute of Science) for a fellowship. EM's visit to the Weizmann Institute of Science was funded by the Divadol Foundation. The authors are very grateful to the Jmol and MediaWiki development teams for their support and development of their respective software packages. Special thanks go to Bob Hanson, the current lead developer of Jmol, whose timely incorporation of requested features and bug fixes is unparalleled. The authors are further very grateful to all of the resources whose information is aggregated on the Proteopedia seeded pages (PDB code-titled pages) and wish to thank David Lipman for his advice on the proper usage of PubMed abstracts. We also greatly appreciate the useful discussions with Karl Oberholser, Frieda Reichsman, Gideon Schreiber, Yigal Burstein, Harry Greenblatt, Anat Kats, Steven Brenner and David Givol, as well as the generous permission to incorporate content and images developed by Jane and David Richardson [5,39] and David S Goodsell . We wish to thank, in particular, Nir Ben-Tal and Elana Erez for making ConSurf data available in Proteopedia and Tali Wiesel, of the Weizmann Institute of Science's Graphics Department, for designing Proteopedia's logo.