Several recent reviews have outlined the progress and achievements of the PSI program [3-9]. Ultimately, the success of the initiative will be determined by the scientific impact of the new technologies, reagents, and 3D structures released into the public domain, and by the knowledge gained from these data. One operational metric of the program is a count of 3D structures of ‘distinct’ proteins (or domains), referred to as ‘Distinct Structures’, deposited into the Protein Data Bank (PDB). Two protein sequences are ‘distinct’ if they share < 98% sequence identity over the full length of the shorter sequence of the pair. For example, two crystal structures of the same protein bound to different ligands count as a single Distinct Structure, even though each provides uniquely valuable information. “Novel” structures are defined as those which have < 30% sequence identity with any structure in the PDB at the time of deposition. In the second phase of the PSI program (called PSI2, 2005-2010), investigators achieved their goal, set in 2005, of depositing more than 3,000 Distinct Structures into the PDB. Most of these were also Novel Structures, greatly expanding our knowledge of the relationship between protein sequences and 3D structure. Over the full ten years of the PSI program, investigators completed and deposited into the public domain more than 5,000 3D protein structures, including protein-ligand complexes and pairs of X-ray and NMR structures which, though not counted as “Distinct”, have important scientific value (see for example [10]).
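The two identity thresholds above can be illustrated with a minimal sketch. It assumes a pairwise alignment has already been computed by some external tool and is supplied as two equal-length strings with '-' marking gaps; the function names and the exact way identity is normalized are illustrative assumptions, not the procedure used by the PSI program or the PDB.

```python
from typing import Iterable

# Cutoffs taken from the definitions in the text (percent sequence identity).
DISTINCT_CUTOFF = 98.0
NOVEL_CUTOFF = 30.0


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the full length of the shorter sequence.

    The inputs are the two rows of a pairwise alignment: equal length,
    with '-' marking gaps. Computing the alignment is out of scope here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both rows carry the same residue (gaps never match).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # Normalize by the ungapped length of the shorter sequence.
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter


def is_distinct(aligned_a: str, aligned_b: str) -> bool:
    """True if the pair shares less than 98% identity (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) < DISTINCT_CUTOFF


def is_novel(identities_to_pdb: Iterable[float]) -> bool:
    """True if every identity to an existing PDB entry is below 30%.

    `identities_to_pdb` holds precomputed percent identities between the
    new structure's sequence and each sequence already in the PDB.
    """
    return all(identity < NOVEL_CUTOFF for identity in identities_to_pdb)
```

Under these assumptions, two depositions of the same protein with different ligands have identical sequences, so `is_distinct("ACDEFGHIKL", "ACDEFGHIKL")` is `False` and they count as one Distinct Structure, whereas `percent_identity("AAAA", "AAAT")` is `75.0` and that pair is distinct.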
Many of the structures determined in the first and second phases of the PSI program were, at the time of deposition into the PDB, the first representatives of extensive protein domain families [11]. A ‘protein domain family’ is a set of homologous protein domains likely to have similar structures and possibly similar biochemical functions. These families included both domains with known biochemical functions and domains of unknown function, known as DUFs. These structures are being used as templates for modeling tens of thousands of homologous proteins [12-17] and provide a database of protein sequences, structures, and biophysical properties (e.g. chemical shifts) that also informs the fields of protein structure prediction, design, and engineering. By focusing the choice of targets on proteins that have minimal sequence similarity with known structures, the PSI program has greatly increased the size of the non-redundant protein structure knowledge base that is being used to develop improved structure prediction algorithms, including fragment-based search algorithms and knowledge-based atomic potentials. In some cases, these structural data are accompanied by extensive chemical shift, nuclear Overhauser effect (NOE), and other NMR data that are being used in hybrid structure determination methods [18].
The PSI has also become the primary contributor of structural data that can be used for testing new methods for protein structure prediction and automated data analysis, including data used in such projects as the Critical Assessment of Structure Prediction [19, 20] and the Critical Assessment of Automated Structure Determination of Proteins from NMR data [21]. Structural genomics projects are unusual in that they can provide their data for community-wide tests of computational methods without concern for how doing so affects their priorities for publishing a particular structure. PSI Centers are also involved in collaborative projects aimed at advancing the fields of protein NMR structure analysis [22-28] and computational protein design [29-31].
As part of the community-outreach goals of the PSI program, the National Institute of General Medical Sciences has created the PSI Structural Biology Knowledge Base (PSI-SBKB) [32], for organizing and disseminating the entire repertoire of scientific information generated by the PSI program, and the PSI Materials Repository (PSI-MR) [33], designed to provide easy, rapid, and broad access to the biochemical reagents produced by PSI Centers, particularly the protein expression systems. These resources serve as a platform for PSI-funded investigators to provide information on protein samples and 3D structures to the broad biological community in an “open source” fashion, in which intermediate results, protein expression systems and protocols, protein structures, and new technologies are made available to the community as soon as the data and/or methods are deemed to be reliable. The PSI-SBKB also provides access to 3D protein models generated using various comparative modeling methods [14], together with coordinates of structures solved by the PSI program.
The PSI program has also instituted a Community Nomination Target (CNT) program, through which scientists can nominate targets for study by PSI Centers and collaborate on functional follow-on studies (http://sbkb.org/cnt/). This program connects PSI investigators with important biological problems and top-tier biological investigators, and gives a wide range of collaborators who are not directly funded by the PSI access to PSI Centers. Several of these CNT projects have yielded important and challenging structures that enable the research programs of individual investigators across the globe (see for example [34-42]).