More than half of all proteins in cells associate with biological membranes, either permanently or temporarily. These include integral monotopic and transmembrane (TM) proteins, which are encoded by 20–30% of the genes in sequenced genomes (1), and the more numerous peripheral proteins and peptides that can form transient complexes with membrane lipids or proteins. Recent progress in structure determination techniques (2) has led to significant growth in the number of membrane proteins with known three-dimensional (3D) structures. Currently, there are approximately 1200 and 10 000 entries in the Protein Data Bank (PDB) (3) related to TM and peripheral proteins, respectively, which correspond to 1.6 and 13% of the PDB content. Many PDB entries represent different complexes, conformations, mutants or crystal forms of the same protein, so the set of distinct proteins is approximately 3-fold smaller.
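The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only rearranges the numbers given in the text: it derives the total PDB size implied by each percentage and the rough number of distinct proteins implied by the 3-fold redundancy. The function and variable names are illustrative, and the results are estimates, not reported values.

```python
# Entry counts and PDB shares quoted in the text.
tm_entries, tm_share = 1200, 0.016
peripheral_entries, peripheral_share = 10_000, 0.13
redundancy = 3  # "approximately 3-fold smaller" set of distinct proteins


def implied_total(entries, share):
    """Total number of PDB entries implied by a count and its share."""
    return entries / share


def distinct_estimate(entries, fold=redundancy):
    """Rough number of distinct proteins after removing redundancy."""
    return round(entries / fold)


pdb_from_tm = implied_total(tm_entries, tm_share)
pdb_from_peripheral = implied_total(peripheral_entries, peripheral_share)
distinct_tm = distinct_estimate(tm_entries)
distinct_peripheral = distinct_estimate(peripheral_entries)

print(f"PDB size implied by TM share:         {pdb_from_tm:.0f}")
print(f"PDB size implied by peripheral share: {pdb_from_peripheral:.0f}")
print(f"Distinct TM proteins (estimate):         {distinct_tm}")
print(f"Distinct peripheral proteins (estimate): {distinct_peripheral}")
```

Both percentages point to a PDB of roughly 75 000 to 77 000 entries, so the two quoted shares are mutually consistent.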
Integral membrane proteins with known 3D structures can be found in several specialized databases, such as Stephen White's list (4), the Membrane Proteins Data Bank (MPDB) (5) and the Transporter Classification Database (TCDB) (6). These resources provide complementary information, including bibliography, crystallization and solubilization conditions (5) or classification and phylogenetic analysis of membrane transporters (6). More specialized resources cover membrane-targeting domains [MeTaDoR (7)] and antimicrobial peptides with non-standard amino acids [Peptaibol (8)].
The critical information missing from these databases is the exact position of the membrane boundaries, which is not obvious from the protein 3D structure, even if the protein was crystallized with phospholipids. The spatial positions of membrane-associated proteins with respect to the lipid bilayer can be determined experimentally or computationally. Experimental methods, including chemical modification, spin-labeling, X-ray scattering, neutron diffraction, infrared spectroscopy, electron cryomicroscopy and NMR, are very laborious and have therefore been applied to only a limited set of proteins and peptides (9). A fast and reliable computational approach, on the other hand, would allow proteins to be positioned in membranes in a timely manner, keeping pace with the expanding flow of experimentally determined structures.
Several theoretical methods have been applied to position proteins in membranes. They are based on molecular dynamics (MD) simulations (11), coarse-grained MD (12), optimization of electrostatic energy (13–16), energy minimization with implicit solvent models (17–21) or membrane depth-dependent scoring functions (22–24). The results for TM proteins have been collected in two databases, the Protein Data Bank of TransMembrane proteins (PDBTM) (25) and the Coarse-Grained DataBase (CGDB) (12). PDBTM includes an up-to-date set of 1441 PDB structures (release of 24 June 2011) of α-helical and β-barrel proteins arranged in the lipid bilayer by the TMDET algorithm (23). CGDB holds pseudo-atom models of approximately 370 TM proteins and about a dozen monotopic proteins generated by coarse-grained MD simulations (12). Both databases focus on integral membrane proteins and do not include peripheral proteins, because predicting the weak association of peripheral proteins with lipid bilayers would require significantly higher precision in the calculation of membrane binding affinities than the underlying methods can provide.
To fill this gap, we have proposed and recently advanced a method for Positioning of Proteins in Membranes (PPM 2.0), which optimizes the free energy of protein transfer from water to the membrane environment and implements an anisotropic solvent model of the lipid bilayer (9). The method was thoroughly verified for several dozen TM and peripheral proteins and peptides whose arrangements in membranes have been studied experimentally (9). The high computational efficiency of PPM 2.0 allows its application to large-scale analysis of proteins from the PDB. The results are deposited in our Orientation of Proteins in Membranes (OPM) database, which includes both TM and peripheral proteins (28). Hence, it covers a significantly larger number of membrane-associated macromolecules (1255 proteins and peptides) and PDB entries (3766 structures) than PDBTM and CGDB. It also provides a four-level protein classification system together with information about protein topology, type of intracellular membrane, source organism and comparison with experimental publications on the arrangement of the corresponding proteins in membranes.
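The general idea of positioning a rigid structure by minimizing its water-to-membrane transfer energy can be illustrated with a toy example. The sketch below is not PPM 2.0 and does not use its anisotropic solvent model. It assumes a simple sigmoidal depth profile, a hypothetical rod of pseudo-atoms with made-up per-atom weights (negative for hydrophobic, positive for polar), and a plain grid search over tilt angle and shift along the membrane normal. All names and parameter values are illustrative assumptions.

```python
import math


def bilayer_profile(z, half_thickness=15.0, decay=0.9):
    """Assumed smooth profile: close to 1 in the hydrocarbon core, close to 0 in water."""
    return 1.0 / (1.0 + math.exp((abs(z) - half_thickness) / decay))


def transfer_energy(atoms, tilt_deg, shift):
    """Sum of per-atom weights scaled by the profile at each atom's depth.

    The body is rotated about the x axis by tilt_deg and then translated
    by shift along the membrane normal (the z axis).
    """
    tilt = math.radians(tilt_deg)
    energy = 0.0
    for _x, y, z, weight in atoms:
        z_membrane = y * math.sin(tilt) + z * math.cos(tilt) + shift
        energy += weight * bilayer_profile(z_membrane)
    return energy


def toy_rod():
    """Hypothetical rod along z: hydrophobic middle, polar ends (arbitrary units)."""
    atoms = []
    for i in range(29):
        z = -21.0 + 1.5 * i
        weight = -1.0 if abs(z) <= 12.0 else 0.5
        atoms.append((0.0, 0.0, z, weight))
    return atoms


def position(atoms, tilts=range(0, 91, 5), shifts=range(-15, 16)):
    """Exhaustive grid search; returns (energy, tilt in degrees, shift)."""
    return min(
        (transfer_energy(atoms, tilt, shift), tilt, shift)
        for tilt in tilts
        for shift in shifts
    )


best_energy, best_tilt, best_shift = position(toy_rod())
print(f"energy={best_energy:.2f}  tilt={best_tilt} deg  shift={best_shift}")
```

A grid search is used here only because it is easy to follow. A production method would need a realistic energy function, all rigid-body degrees of freedom that matter for the membrane geometry, and a more efficient optimizer.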