Over the past decade, structural genomics (SG) efforts in the USA alone have determined the structures of more than 3000 previously uncharacterized proteins, at a sustained rate of over 500 novel structure depositions per year to the Protein Data Bank (PDB) (1). Through the discovery of numerous new folds and an even greater number of variants of known folds (2), SG structures provide key input for innovative research into protein evolution and function. A central challenge of such high-throughput research is annotating and integrating the resulting data quickly enough to inform ongoing research in the wider biological community. Traditional publication mechanisms are simply too slow to keep pace with structure determination. As a result, over 90% of deposited SG structures have not yet been described in the literature. The rate and volume at which protein structures are being produced require new mechanisms to ensure that the knowledge gained from these structures is disseminated in a timely manner.
Several new protein structure annotation platforms based on wiki methods have been described (3–5). However, their content is largely static and derived from peer-reviewed publications, which limits their usefulness for exploring new knowledge about structures. We developed The Open Protein Structure Annotation Network (TOPSAN) to serve as both an annotation and a communication platform, with the goal of facilitating and accelerating research on SG structures. TOPSAN integrates a wide range of information about SG proteins, ranging from high-throughput experimental data to literature, evolutionary analyses and even functional predictions. The current version implements a semantic web layer that enables database-like searches across all TOPSAN content, thereby promoting further integration of this content with mainstream biology.