|Home | About | Journals | Submit | Contact Us | Français|
We discuss recent progresses in computational studies of membrane proteins based on physical models with parameters derived from bioinformatics analysis. We describe computational identification of membrane proteins and prediction of their topology from sequence, discovery of sequence and spatial motifs, and implications of these discoveries. The detection of evolutionary signal for understanding the substitution pattern of residues in the TM segments and for sequence alignment are also discussed. We further discuss empirical potential functions for energetics of inserting residues in the TM domain, for interactions between TM helices or strands, and their applications in predicting lipid-facing surfaces of the TM domain. Recent progresses in structure predictions of membrane proteins are also reviewed, with further discussions on calculation of ensemble properties such as melting temperature based on simplified state space model. Additional topics include prediction of oligomerization state of membrane proteins, identification of the interfaces for protein-protein interactions, and design of membrane proteins.
Membrane proteins account for about 20% to 30% of all proteins encoded in a typical genome (1; 2). They play central roles in transport of nutrients and metabolites, and in signaling of regulatory networks (3; 4; 5). A major obstacle in studying membrane proteins is the difficulty in experimental determination of their three dimensional structures. Computational studies of membrane proteins can compliment experimental studies and have made significant strides. In this review, we discuss recent work based on analysis of sequences and structures of membrane proteins, as well as important understandings gained from these studies on the physical processes of membrane protein assembly. An overview of the scope of studies surveyed in this review is shown in Fig 1, in the form of a diagram of the central dogma of molecular biology, in which different aspects where computational studies have made important contributions are depicted.
It was discovered very early on that the presence of stretches of hydrophobic residues in a protein sequence is a good indicator that this sequence encodes a membrane protein (6). Because most transmembrane helices are hydrophobic, they appear as periodic stretches of non-polar amino acids of length 17–25 in the primary sequence. These stretches of hydrophobic residues cross the lipid membrane multiple times, and are connected by loops containing more polar residues. Such periodicity of hydrophobicity can be easily detected, and early methods for membrane protein prediction are based on calculation of a hydrophobicity index of residues within a window sliding along the protein sequence (6; 7).
A major source of misclassification with this approach is the existence of signal peptides important for targeting proteins for export. Signal peptide contains a hydrophobic region that can easily be mistaken for a transmembrane segment (8). Another source of difficulty is due to C-terminal peptides that are cleaved upon glycosylphosphatidylinositol (GPI)-anchoring, as these peptides are also often hydrophobic (9). An effective solution is to pre-process sequences by deleting signal peptides and cleaved peptides, both can be predicted accurately (10; 11; 12).
Predicting β-barrel membrane proteins is more challenging. Although residues facing the lipid membrane are predominantly hydrophobic, those facing the interior of the barrel can be quite polar (13). Unlike helical membrane proteins, there are no clear stretches of hydrophobic residues in their primary sequences.
Many modern methods for identification of membrane proteins are based on techniques from machine learning and can also predict the topology of membrane proteins. The topology of a membrane protein refers to the number of transmembrane segments and the sidedness of the terminal ends of the protein, namely, whether the N- and C-end is on the non-translocated side or on the translocated side.
The topology of helical membrane protein can be predicted with high accuracy. Most prediction methods are based on processing multiple-sequence alignment data using machine-learning techniques such as neural networks (14; 15), Hidden Markov models (16; 17; 18; 19; 20), and support vector machines (21). The well-known “positive-inside” rule (22; 23; 24), namely, Arg and Lys residues are enriched in loops on the non-translocated side across the membrane compared to the translocated side, greatly aids in the development of these machine learning methods (22; 23; 24). For large scale prediction, recent experimentation using the consensus of many single-sequence based prediction methods also showed promise, which dispenses with time-consuming multiple-sequence alignments and are better suited for genome-scale predictions (25). For β-barrel membrane proteins, despite the lack of clear hydrophobic stretches of residues in the primary sequences, machine learning methods can now predict outer membrane proteins also very accurately (see (26) and (27)).
An approach alternative to machine learning is to predict membrane proteins and their topology based on physical considerations. This approach gives more mechanistic insight and is based on the fact that membrane protein folding and sorting are driven by physical processes. Using the scale of measured free energy contributions of inserting individual amino acids at different positions of the TM helices into the endoplasmic reticulum membrane (28), a simple additive free-energy model was used to identify putative TM helices. Combined with the positive-inside rule, this approach can predict the topology of α-helical membrane proteins accurately based on physical principles (29).
For β-barrel membrane protein, there are several characteristic observations that can help to determine their topology. First, the periplasmic loops are always short compared to extracellular loops (13), although this may not be true for mitochondrial and chloroplast outer membrane proteins. Second, there is a significant, albeit less dramatic bias in the topological sidedness of the distribution of charged residues. Different from the “positive-inside” rule for helical membrane proteins, there exists an overall “positive-outside” distribution. The extracellular cap region of the β-barrel membrane proteins are disproportionately enriched with positively charged Arg and Lys, which are disfavored in the periplasmic cap region (30). This is likely due to the asymmetric distribution of the two leaflets of the lipid bilayer, in which negatively charged lipopolysaccharides (LPS) is enriched in the outer-leaflet of the outer membrane (31). For gram’s negative bacteria, this “positive-out” rule for the outer membrane is consistent and complements the “positive-in” rule for the inner membrane, as both rules implies that positively charged residues are not favored in the periplasmic region.
Several computational methods based on machine learning techniques can predict the topology of β-barrel membrane proteins well (32; 33; 34). Built upon earlier results (35), a recent study based on measured physicochemical properties of residues and empirical statistical potential is also shown to have excellent performance in identifying β-barrel membrane proteins (36).
The GxxxG (or GG4) motif, in which two Gly are separated by three other residues, was the first sequence motif discovered (37). Originally observed in glycophorin A, this motif mediates close interaction of TM helices (38; 37). It is an example of the more general small-xxx-small motif forming helical dimer interface. Found in many biological systems, this class of motif provides a general framework for transmembrane helix association (39). Recent studies greatly broadened our view on the existence of different types of sequence motifs in membrane proteins, as well as their roles in providing structural stabilization and in regulating biological signaling (39; 40; 41; 42).
Computational discovery of sequence motifs of membrane proteins is a challenging task. Because the length of a transmembrane segment is short, there is strong coupling effects between the appearance of residues at one position and its consequential absence in another position (37; 43; 30; 44). The discovery of the GxxxG motif is the outcome of an important development, namely, the formulation of a rigorous statistical treatment of what would be the expected frequency of various patterns of residues for a given transmembrane helix (37). Prior to the study of Senes et al, widely used statistical models such as the Bernoulli/binomial model, the Markovian model, and the χ2 model do not account for this finite-size effect (45; 44). This model, subsequently termed as the permutation model (44), enables detection of very subtle signals even when only limited data are available. Similar permutation model was also later applied in studying spatial motifs in β-barrel membrane protein (30). Senes et al also introduced a dynamic programming method that makes it possible to compute efficiently the random distribution and p-values essential for identifying motifs using a database of membrane protein structures (37).
Subsequently, exact formulae for propensities of motifs with arbitrary number of residues under the permutation model were discovered, along with analytical formulae for p-value calculations for several types of sequence motifs (43; 44). An improved model, called positional null model that is based on exhaustive permutation but also account for bias of residue at certain positions was also developed (43; 44). Further studies showed that anti-motifs, which are sequence patterns that occur far less than would be expected, also reveal important biological information (30; 43; 44). Applications of these results have lead to the discovery of a large number of sequence motifs and antimotifs in β-barrel membrane proteins (43; 44). For example, the terminal motif YF2 was predicted to be important for recognition by periplasmic chaperon SurA for assisted folding (43), as mutations and deletion of the terminal F residue in PhoE from E. coli resulted in impairment of correct assembly of PhoE into the outer membrane (46). The MeMotif database contains many computationally derived sequence motifs for α-helical membrane proteins (47). A study of GPCRs using motifs of reduced alphabet of amino acids can be found in (48).
There are strong specific helix-helix and strand-strand interactions that can be detected through computational analysis. Interactions between TM helices and between strands as well as their overall assembly are the structural basis of sequence motifs. A global view on how TM helices interact spatially was obtained in a comprehensive study of interacting helical pairs, in which pairs of helices were clustered by their shape similarity (49). It was found that just five clusters accounts for about 74% of all observed interacting helical pairs. These clusters can be rationalized in simple principles of helix-helix packing that goes back to Crick (50). The recurring geometric patterns of helix-helix interactions were organized into a library of spatial motifs of interacting helical pairs (49). The classification of spatial motifs and the library of interacting helical pairs lead to important understanding of the structural organization rule of helix assembly (40). This approach also proved to be invaluable in predicting membrane protein structures (40).
Somewhat similar approach was adopted by Martin et al for β-barrel membrane proteins. From the decomposition of known structures of β-barrel membrane proteins, a library of four residue fragment were constructed (51). It was found that there are strong preferences for different fragments to be located at different regions, and there are also specific preferences for inter-strand contacts between these fragments (51).
Another approach for discovery of spatial motifs of interacting residues is by comparing the frequency of observed appearance of certain spatial patterns of interacting residues with the frequency of what would be expected by random chance if there were no specific interhelical or interstrand interactions (52; 30; 44). The serine-zipper spatial motif (Fig 2a) was found in cytochrome c oxidase and in erythropoietin receptor (53; 54), where multiple repeated S-S interacting pairs form a large number of H-bonds (52). The placement of these small Ser ensure close packing between helices (55; 49). The polar clamps spatial motif (Fig 2b) involves three residues located on two helices, where a residue capable of forming two or more H-bonds is clamped by H-bonds formed with two residues (53). This motif is highly conserved among G-protein coupled receptors, and likely contributed to stability and specificity of the assembly of TM helices (53).
A systematic analysis of triplet interactions involving three-residues revealed a number of additional spatial motifs, such as A-G-F and A-G-G (Fig 2c) (52). These well-defined spatial conformations exist on helices of unrelated proteins with similar parallel/antiparallel orientation and similar crossing angles (52). Often, well-known sequence motifs such as GG4 and AG4 participate in these higher order motifs of interaction (52).
In β-barrel membrane proteins, Trp and Tyr residues are found to form a frequently occurring motif through non-H-bonded interaction (Fig 2d). The spatial motif aromatic rescue consists of interacting G–Y residues and G–F residues across neighboring strands (30). The Tyr residue adopts an unusual rotamer and covers the backbone of Gly through H-bonding (Fig 2e). This motif stabilizes the protein structure by mitigates the instability Gly causes, as it prevents exposure of the backbone around Gly to solvent, at the same time minimizing exposure of aromatic ring to the solvent (56; 30). Experimental studies on similar motifs in soluble proteins showed that they contribute significantly to protein stability and affect folding dynamics (57). Other spatial motifs found in β-barrel membrane proteins are discussed in (30).
Both sequence and spatial motifs are products of selection pressure on membrane proteins throughout evolution, either for structural integrity or for biological function. As evolution is a general driving force of biological machineries, we discuss how patterns of evolution of membrane proteins can be detected and how they can be used for biological predictions.
An essential computational tool for membrane protein studies is sequence alignment (58; 59), which is used in database searches for homologous proteins. A key component of sequence alignment is the scoring matrix for quantification of sequence similarity.
Standard scoring matrices such as BLOSUM and PAM used in default NCBI sequence alignment were derived from soluble proteins (60; 61), and are inappropriate for membrane protein studies. Overall, membrane proteins are under unique physicochemical constraints, and experience selection pressure very different from that of soluble proteins. The patterns of allowed and forbidden substitutions at different positions of the transmembrane segments are different from that of soluble proteins. Scoring matrices therefore need to be specifically designed to capture the evolutionary pressure experienced by the TM segments.
A number of specialized scoring matrices have been developed for helical membrane proteins, including the SLIM and the PHAT matrices. Their applications result in significant improvement in identifying homologs of membrane protein (62; 63). These scoring matrices, however, are inappropriate for β-barrel membrane protein studies. As the lipid bilayer of bacterial outer membrane has different composition (eg., the presence of lipopolysac-charides, LPSs), there are significant differences in the selection pressure experienced between helical and barrel membrane proteins. Results of a rigorous test showed that scoring matrices SLIM and PHAT designed for helical membrane proteins misidentified soluble proteins and random sequences as β-barrel membrane proteins (64).
Customized specific scoring matrices can be derived based on a general framework for analyzing amino acid residue substitutions (65). Using a continuous-time Markov process to model amino acid substitution and a Bayesian Monte Carlo estimation algorithm (65), the instantaneous substitution rates of residues in the TM-segments of β-barrel membrane proteins were estimated (64). Scoring matrices specific for different evolutionary time were then derived from the estimated rates (Fig 3), and were shown to have significantly improved sensitivity and specificity in detecting remote homologs of β-barrel membrane proteins (64). As the estimated substitution rates encode probability of exchanges between different residue pairs, they can also be used to suggest design of mutagenesis studies.
A remaining open question is whether evolutionary patterns are sufficiently similar between bacterial and mitochondrial β-barrel membrane proteins, and whether the same scoring matrices would capture their common evolutionary selection pressure. As machineries and mechanisms involved in the assembly of both bacterial and mitochondrial β-barrel membrane protein are quite similar (66; 67; 68), their substitution patterns in the TM strands may be very similar. Further computational study is required to resolve this issue.
Another widely used empirical approach to extract evolutionary information from sequences is using the method of PSI-BLAST (69). Evolutionary information of the transmembrane segments are implicitly encoded in search results, and can be organized into a profile or a position specific weighted matrices (69). Such information can be effectively used to develop machine learning method, for example, as input data in the construction of a Hidden Markov Model or training a neural network for predicting the topology of transmembrane helices (70; 18).
Phospholipid molecules are not only building blocks of membrane, they also play important roles in influencing the topology, folding, and assembly of membrane proteins, as well as in modulating their biological functions (71). By estimating the site-specific ratio of synonymous vs. non-synonymous substitution of the underlying DNA sequences, selection pressure experienced at individual amino acid positions can be measured (72). It was found that among lipid-facing residues, there are specific lipid binding sites that are evolutionarily conserved. These include the cholesterol-binding sites in β2-adrenergic receptor and in Na-K ATPase, the cardiolipin binding site in formate dehydrogenase-N, and the PG binding site in the KcsA potassium channel (72).
If a particular mutation affects the stability or the function of the protein, another mutation might occur at a different position to compensate the effects of the original mutation (73; 74). This phenomenon of co-evolution of residues has been exploited for identification of packing interfaces between helices (75; 76) and for detection of residues that mediates gating in voltage-dependent potassium channels (77).
Evolutionary information can also help to understand the function and classes of poorly characterized membrane proteins, such as those obtained from large scale genome and meta genome sequencing projects (78; 79). For example, there are now a large number of new sequences homologous to ar-chaeal retinal-containing rhodopsin-like proteins found in marine bacteria, fungi, and unicellular algae (80). However, there is a lack of understanding of basic aspects of the biology of these sequences. Structures of known bacterial rhodopsins and evolutionary information contained in the homologous sequences helped to predict and delineate the functional relationship of these rhodopsin-like proteins (81). Although retinal-binding rhodopsins fold in similar structures, the residue make-up of the retinal-binding pockets may be tuned to adapt to different biological functions. Using residue fragments that form the retinal-binding pocket and amino acid substitution matrices derived specifically for the retinal-binding pockets, a relationship tree was obtained that groups rhodopsins by their biological function. This tree characterizes well rhodopsins with known functions, and predicts the functions of uncharacterized rhodpsin-like sequences (81). For example, Gloeobacter violaceus rhodopsin was grouped into the same branch as the xantorhodopsin from Salinibacter ruber, which uses carotenoids for light harvesting in the blue-green region of the light spectrum (82). Subsequent experimental studies showed that G. violaceus rhodopsin indeed binds specifically a carotenoid molecule, which functions as an antenna for light-harvesting (83).
The physical forces that hold membrane proteins together are of fundamental interests (84; 85; 86; 87; 88; 39; 89). Below we discuss experimentally measured hydrophobicity scale and how they are useful for computational studies. We also discuss equivalent scales derived from analysis of structures and sequences of membrane proteins, as well as their applications.
Extensive studies have been carried out to measure the free energy of inserting a residue into the lipid membrane. By measuring partitioning of a model helix-forming peptide between water and a reference state, the free energy of helix insertion into a membrane environment is obtained (90; 91; 85). As the environment of inserted helix is important, both membrane center and interface were taken as the reference state in measurements (90; 91; 92). Recently, the free energy contribution from individual amino acid for inserting a TM helix into the biological endoplasmic reticulum (ER) membrane via the Sec61 translocon were measured (93; 94; 28). The resulting insertion free energy scale, called the biological hydrophobicity scale or translocon scale, was the first free energy scale for insertion into a biological membrane. Very recently, the first water-to-bilayer transfer free energy scale measured in the context of a native membrane protein and lipid bilayer was reported (95).
These experimentally derived insertion free energy scales (or hydrophobic scale) have been used effectively in computational studies of membrane proteins. For example, both the Wimley-White whole residue octanol scale and interface scale can be used to accurately predict TM helices in membrane protein (96; 97). The biological hydrophobicity scale was also successfully used in predicting membrane protein topology, with the topology of 79% of a set of 123 membrane protein chains predicted correctly, which is better or comparable to hidden Markov model based methods (29).
Similar hydrophobic scales have also been developed computationally through statistical analysis of known structures of membrane proteins (7; 2; 98; 99; 30; 100; 36). The main idea is to estimate the ratio of the frequency of observing an amino acid residue in the TM segment vs what would be expected by random chance (101; 102). Similar to experimentally measured scale, the empirical hydrophobic scale can also be made dependent on the local helical position of the residue (103; 100), as well as the random model for expected frequency, which is equivalent to the reference states in experimental studies (102; 43; 44). For example, both computed and measured free energy costs of embedding Asn and Gln strongly depends on their location in the TM helix (103). An empirically derived statistical potential function has been successfully applied in genome wide prediction of membrane proteins, with test results indicating an accuracy of 99% (35; 36). Such potential function can also be used to estimate the tilt angle of a TM helix with respect to the bilayer normal, and to select amino acids in membrane protein design studies (100).
Empirical hydrophobicity scale can be used to predict lipid-facing regions of membrane proteins. From structural analysis, it was found that there are strong preference for certain residues to face the headgroup and the hydrocarbon core regions of the lipid membrane (99). For example, Lys, Arg, Trp, Phe and Leu prefer to face the head-group region of the lipid bilayer instead of facing other helices, whereas Ile, Leu, Phe and Val prefer to face the hydrocarbon core region of the lipid bilayer. Small and polar residues are more likely to be buried inside the helical bundles and are lipophobic. In addition, it was found that Trp is frequently found in the hydrocarbon region, with its side-chain forming extensive interactions with residues on neighboring helices (99). This finding was consistent with subsequent experimental study in which it was found that Trp strongly supports self-assembly of TM helices, especially when placed on the g-position of the standard heptad. This position facilitates the side chain of Trp to interacting with neighboring helices (104). Overall, buried or interior-facing residues are significantly more polar and, hence, lipophobic, than the exterior residues (105). This lipophobic effect may play a general role in the folding and assembly of membrane proteins by encouraging the overall aggregation of TM helices, with the final structure determined through more specific interhelical H-bond, packing interactions, and loop constraints (105).
The lipid preferences of residues were quantified as a specialized empirical propensity scale called TMlip (for TransMembrane helix-LIPid) potential (99). TMlip was successfully used to predict the orientation of TM helices relative to the phospholipid bilayer (106). Based on a canonical model of the coiled-coil heptad repeat and the combination of the TM-lip propensity and evolutionary information, a computational method called LIPS (LIPid-facing Surface (105) can predict helical surface patches interfacing lipid molecules at 88% accuracy. Other studies based on surface propensity scale and evolutionary information also reported excellent results (77; 107; 108; 109). A recent study that integrates evolutionary profiles and propensities for both membrane exposed residues and solvent exposed residues reported excellent performance (110).
The Lips method is also useful in detecting inconsistencies in the structures of membrane proteins, such as the two structures of cytochrome b6f complex (105). It has also been used to aid in methods of template-free protein structure prediction (111), as well as in suggesting experimental studies (112). Further development based on TMlip potentials allowed the development of the Rants method (for RANKING of Transmembrane helices by Solvent accessibility) (106). Predictions made by Rants have been shown to be useful in designing experiments to identify interior facing residues and important polar interactions in the anion transporter SulP protein family (112).
Physical forces beyond single body or insertion free energy are also at play in stabilizing membrane proteins. These include multibody interactions involving two or more helices or strands.
Polar residues buried in the membrane environment likely contribute significantly for maintaining the stability of membrane proteins and their functions (85; 113; 114; 53; 103). Introduction of a single Asn, Asp, Glu or Gln in the TM segment can provide sufficiently strong driving force for helical self-association (113; 114). A survey of known membrane protein structures showed that polar and ionizable residues form extensive H-bond connections between TM helices, as virtually all TM helices form one or more interhelical H-bond (53).
Due to extreme experimental difficulty, quantitative assessment of the magnitude of the H-bond energy in the transmembrane environment became available only recently through elegant studies of double-mutant cycle analysis (115). The average energy of side-chain H-bond interactions is found to be modest (−0.6 kcal/mol). It is possible that the unfolded state of membrane proteins may already have H-bond largely satisfied through alternative interactions (116), as the polarity of the interior of both membrane and soluble proteins are quite similar (99). The apparent contribution of H-bond for specific helical interactions in membrane protein seems to vary significantly (115).
Other physical forces important for the assembly of TM helices include side chains packing, overall helical packing with small residues at helical-helical interfaces, aromatic interactions, and salt bridges (98; 117; 118; 119; 89; 120; 121). For β-barrel membrane proteins, the classical model of β-strand interactions of β-sheets, in which backbone H-bond, side-chain interactions, and weak H-bond stabilize neighboring strands, works well (122; 123; 30) (Fig 4a–c). The energetic contributions of H-bond and residues in the aromatic girdles of TM strands in the protein OmpA have been measured (124; 125). Recently, it was found that specific interactions between lipid and the TM strands of protein FhuA also provide significant stability to the TM domain (72). Based on the TMsip potential function and the reduced state space, it was found that strands 7–9 form the most unstable region in the protein FhuA, and strand 8, which runs through the middle of the LPS-binding site, has the highest energy. These strands are stabilized by biding a lipid molecule.
Inter-helical and inter-strand interactions are also the major source of the mechanical stability of membrane protein, as shown by unfolding experiments of bacteriorhodopsin using atomic force microscopy and single-molecule force spectroscopy (126). In addition, these physical forces can be directly linked to protein stability as measured by the calculated melting temperatures of β-barrel membrane proteins (127) (Fig 4a–c and Fig 5).
A number of empirical potential functions have been developed for helix-helix interactions. In an early study, pairwise potential function based on statistics of known structures was successfully used to predict super-secondary structures of several packed small TM helices (128). Based on an atomic probabilistic model and packing contacts detected through Delaunay triangulation of membrane protein structures, a potential function for helix interaction called MHIP (for membrane helical interfacial pairwise propensity) was developed (98). By combining packing and helix contact analyses, Eilers et al developed an interfacial propensity scale for prediction of the relative orientation of TM helices (129). Dobbs et al developed a potential function for predicting inter-helical packing based on optimized discrimination of native helix-helix interactions from Monte Carlo generated decoy structures (130). Further development includes distance-based empirical potential that works well in predicting anchoring helix pairs (131). An interhelical contact potential was developed using a reduced alphabet of four amino acid types, which can discriminate native structures from many decoy conformations (132). Another empirical pairwise potential function for helical interaction was a major component of the force field used in the ROSETTA structure prediction method (111; 133; 134).
For β-barrel membrane proteins, an empirical potential function called the TMsip (for TransMembrane Strand Interaction Propensity) potential has been developed based on the canonical interaction model of β-sheet (122; 123; 30). Its null model is the rigorous permutation model discussed earlier. The TMsip potential function can be used to identify weakly stable region in the TM domain (127). Analysis of these weakly stable regions revealed four general mechanisms that β-barrel membrane proteins use to stabilize the TM domain (Fig 5): the well-known in-plug mechanism, as seen in FhuA (135), in which an inter-strand loop or a separate domain folds back and plug into the interior barrel to stabilize the TM barrel; the out-clamp mechanism as seen in PagP and hemolysin, in which a secondary structural element such as a helix outside the barrel stabilizes the TM barrel (136; 127), the newly discovered mechanism of specific lipid binding, in which the unstable region of the TM barrel is stabilized through specific strong binding with the LPS lipid molecule (72), as well as the mechanism of protein-protein interactions with which weakly stable regions are stabilized by another membrane protein (127).
Empirical potential function can also help to gain biological understanding. There are many examples where excellent agreement were reported between results obtained using experimentally derived free energy scale and those obtained using empirical potential function. For example, the measured free energy scales of inserting amino acid residues embedded in a helix into the endoplasmic reticulum agrees well with free-energy profiles derived from statistical analysis of membrane protein structures (28).
Multiple Arg residues are found in the S4 trans-membrane helix of KvAP ion channel and other related channel proteins, and are likely to be important in sensing membrane depolarization and mediating channel gating (137; 138). Intuitively, these ionizable residues found in the hydrophobic core of lipid membrane would be energetically costly, and it is important to understand the physical basis of their locations. There is significant discrepancy in free energy of inserting Arg into the hydrophobic core when measured experimentally vs when calculated from molecular dynamics (MD) simulation (93; 139; 140; 141; 95). It was found that extra helices facilitate the retainment of hydration water molecules, which reduces solvation cost significantly (142). It was also suggested that part of the discrepancy may be because MD simulation does not account for the tendency of Arg side chain to snorkel towards the membrane-water interface (141). Although simulations are carried out using physics based force field, the large number of parameters involved and the difficulty in ensuring full sampling of an equilibrium ensemble of conformations may be sources of non-negligible errors (92; 141).
Hydrophobicity scale and empirical potential function can offer significant insight. According to the analysis of Hristova and Wimley using the experimentally derived Wimley-White scale (92), less than two Ala to Leu substitutions are required to compensate for one Ala to Arg substitution. It was found that it is easier to insert Arg in the interface region than the core of the bilayer (92).
The occurrence of Arg in hydrophobic core can also be understood through empirical potential function. Important favorable interactions across neighboring helices/strands often exist (98; 53; 103; 132), and such context dependent interactions will significantly modify the overall free energy of the protein. Since measured insertion free energy scales were mostly based on studies designed with single TM helices, the observed occurrence of Arg in the hydrophobic core of natural membrane proteins can be better interpreted with additional consideration incorporating inter-helical and inter-strand interactions.
This can be illustrated by analyzing the energetic consequence of embedding an Arg residue in the TM segment of a β-barrel membrane protein using the empirical potential function TMsip (Fig 4) (30). Arg in β-barrel membrane proteins facing inside the β-barrel pore is energetically favorable, but very unfavorable when facing the lipid membrane (30). Through interstrand interactions, there are three additional types of interactions that are major contributors to the stability of TM β-strands, namely, strong H-bond between main chain (C-O…H-N), side-chain interaction (R…R) including side chain H-bond, and weak H-bonds between C-O…H-Cα (122; 123; 30). According to the recently updated version of the TMsip scale incorporating additional structural data, Arg can be stabilized by main chain H-bond interactions with Ala, Trp, Val, and Thr, if they are located on appropriate positions of the neighboring strands (30). Since side chain H-bonds are known to contribute only modestly to the overall stability of membrane proteins (143), the context dependent main chain H-bond interactions are likely the main contributors that modifies the single-body energetics of Arg insertion.
According to TMsip, Arg is only slightly energetically unfavorable in the extracellular interfacial region, but is highly unfavorable in the hydrophobic core region and the periplasmic interfacial region (30). As Arg residue is inserted from the periplasmic side into the lipid bilayer, favorable main-chain H-bond interactions with Ala, Trp, Val, and Thr located on neighboring strands may compensate for the unfavorable insertion of Arg (Fig 4). This compensation effect would facilitate the translocation of Arg towards the more favorable extracellular interface in β-barrel membrane protein.
Experimentally measured insertion free energy derived from studies of single helix experiments can be regarded as one-body energetics, and the equivalent empirical potential function are hydrophobic scale involving only a single residue and its depth in the membrane environment. An accurate account of the full energetics of residues in the context of a wild type membrane protein needs also incorporate the effects of inter-helical or inter-strand interactions, namely, the two-body interactions. It is possible that higher order cooperative effects may also be relevant (52; 39).
In a recent study, the free energy changes in a wild type membrane protein were measured when an Ala was replaced with each of the 20 amino acids (95). This is the first time such measurements were made in wild type membrane protein placed in a lipid bilayer. Although Trp fluorescence was employed in experimental measurement, and Glu and Asp are mostly likely in the protonated state at the experimental condition, free energy changes of replacing Ala with the other residue types provide a wealth of quantitative information about membrane protein stability. It was found that Arg substitution incurs only a modest free energy cost (95). Although the analysis of this study was based on a simple one-body additive insertion energy model, interstrand interactions that is context dependent at the host position is likely to be non-negligible in wild type membrane proteins. The wealth of information provided in studies such as (95) can be used for alternative analyses using a statistical mechanical model (127) that considers context dependent interstrand interactions as well as non-native conformations, which works well to account for observed nonlinear and non-additive effects.
An idealized model of helical membrane proteins is that of an assembly of highly hydrophobic helices connected by loops, with orientations perpendicular to the membrane plane. This is the model upon which many successful hidden Markov model (HMM) methods for topology prediction were based (144). However, recent structures showed that there are many irregular structures. Transmembrane helices are often kinked at varying length and tilt angle (145; 146). In the water-membrane interfacial regions, there may exist amphipathic α-helices parallel to the membrane plane (147; 148). In addition, there exists re-entrant regions that enter and leave the membrane from the same side of the transmembrane region (149).
About 44% of TM helices have kinks, with 35% of which associated with Pro residue, and others with Ser and Gly at the center of the kink (150; 151). Kinks are likely to be important for membrane protein function, as they provide locations for movement such as hinge bending, and introduces structural diversity even among members of the same protein family. It was suggested that Pro in ancestral proteins may have initiated such kinks (152). TM helices subsequently were stabilized through evolution to an extent that the maintenance of the kinked conformation no longer required the presence of Pro residues (152). Molecular dynamics simulation of single TM helix has been successful in identifying many kinks (151). In a study of 405 TM helices, it was found that 79% of the proline kinks, 59% of the vestigial proline kinks, and 18% of the non-proline helical kinks can be reproduced from 1 ns of MD simulation (151).
A study of the re-entrant regions using the technique of principal component analysis for dimension reduction revealed that these regions have distinct amino acid composition (149). As many re-entrant regions are found in transporters, Gly and Ala are abundantly found in this region (149). In addition, Ser and Thr are also enriched (153). Hidden Markov models developed based on these patterns can now predict the re-entrant regions successfully at 70–75% accuracy (149; 153).
If the structure of a homologous membrane protein exists, comparative or homology structural model can be built based on the template structure (154; 155). This technique has been applied fruitfully to study the G-protein coupled receptors (GPCRs), an important receptor for cellular signal transduction (155; 156). When a template structure is identified and a quality alignment is obtained, a specialized comparative modeling method Medeller can identify a reliable core structure, and build a structural model by extending the core to other TM region and to the loop region (157). This approach showed higher accuracy in modeled structure than generic homology modeling methods. For β-barrel membrane proteins, the TMBPro method takes predicted secondary structures and evaluate their overall energy to each structural template containing the same number of strands (158). Combined with conformational search via simulated annealing for the lowest energy alignment of the sequence to the structural template, the conformation with the lowest overall energy can be taken as the predicted structure (158). It is expected that improvement in alignment and detection of remote homologs can be obtained through usage of customized scoring matrices (64; 159). This will allow further leverage of current knowledge of existing membrane protein structures, at a rate of about 130 proteins per template structure (159). Furthermore, these scoring matrices are found to be useful for identifying mitochondria outer membrane proteins in eukaryotes (159).
A more challenging task in structure prediction is when there is no known structures that can serve as the template structure. That is, none of the homologous proteins have known structures. The Rosetta de novo protein structure prediction method has been extended to predict structures of helical membrane proteins, without the need of a template structure (111; 133; 134), although no template-free methods currently exist that can predict structures of β-barrel membrane proteins.
Using an empirical potential function that combines van der Waals interaction, backbone torsional force, electrostatic interaction, and orientation dependent H-bond interaction, Barth et al developed a method based on Rosetta Monte Carlo sampling that can successfully recover the side chain conformations of membrane proteins, can model distorted TM helices, and can predict the conformation of glycophorin A interface (133). Further prediction of likely interacting helical pairs with a large sequence separation was obtained from a carefully constructed library of interacting helical pairs and the evolutionary profiles of the two helices. With such predicted inter-helical geometry and co-factor coordination when available to restrict the conformational space, Barth et al successfully predicted three dimensional structures of a divers set of membrane proteins with different size, topologies, and biological functions, with excellent results at the level of about 4 Å in RMSD (134).
Partial experimental information that is insufficient on its own right for structure determination can be very effective in guiding computational prediction methods towards a much smaller feasible space for conformational search. An important form of experimental data is coarse grained density map of cryo-electron microscopy at medium-resolution (7–10 Å), in which helices are better resolved as rods than strands and loops. By placing predicted helices into the density rods for helices and adding modeled loops, the overall structures of helical membrane proteins can be predicted in some cases with much improved resolution, although this method hinges upon the correct prediction of helices (160). Combining CryoEM data with evolutionary information, the Cα-trace model of the transmembrane domain of human copper transporter 1 was also successfully constructed (119; 161).
Another approach is to integrate experimental mutagenesis data into the structure prediction protocol by biasing the selection of the final model towards those that are consistent with the experimental mutagenesis results. This approach has been applied successfully to predict the structure of the transmembrane domain of the homodimeric BNIP3 (162) and the heterodimeric structure of complete αIIb and β3 complex (163). However, significant amount of experimental data are required, and therefore this approach is best-suited for well studied membrane proteins. A general theoretical framework to generate protein structures that satisfy different experimentally derived restraints described in (164) may be useful for such tasks.
Great progresses have been made in predicting structures of membrane proteins. However, many important problems in membrane protein studies require information beyond that of a single native structure. Below we first discuss studies on the ensemble nature of conformations of membrane proteins, which is the basis of their thermodynamics properties. We also discuss prediction of oligomerization state and protein-protein interactions. In addition, we discuss future development in protein design, in which computational studies will likely make significant contributions.
There are many important questions beyond the knowledge of a single predicted structure. For example, do membrane proteins exist in multiple conformations (Fig 4)? What are their associated probabilities? How thermodynamic properties can be calculated from ensemble properties of conformations? How do dynamic transitions occur among these conformations and how such changes may contribute to the observed biological functions?
In the study of β-barrel membrane proteins, progresses have been made in addressing some of these questions (127). Because of the relatively regular pattern in strand interactions, the conformational space of TM strands can be effectively modeled using a simplified state-space model (30). By assuming a reduced conformational space in which each strand can slide up or down for a total of 7 positions, one can enumerate all possible conformations and calculate the energy value for each conformation. Thermodynamic properties of the transmembrane domain can then be computed (127). Fig 5 depicts one such thermodynamic property, namely, the relative melting temperature calculated for the TM domains of a number of β-barre membrane proteins.
It is important to consider non-native conformations in computing thermodynamic properties of membrane proteins. Although it was not immediately obvious why TM strands would not always adopt the ground state conformation, as it would be very costly to break all the H-bonds to move up or down to a different register, experimental results on PagP showed that there can be significant conformational change when different detergent is present (165). In fact, alternative conformations with low energy may serve as obligate on-pathway transient states (166).
Recent studies in helical membrane protein demonstrated the flexible nature of transmembrane helices, which contain many kinks, bulges, and re-entrant loops (149; 167; 150). Furthermore, the spatial close proximity among newly synthesized TM helices during co-tranlational insertion to membrane suggests that there may exist interhelical interactions even in the early stage of membrane protein folding (168). For example, several experimentally determined TM helices in Gltph glutamate transporter were found not to have lowest free energy of insertion in wild type protein, and the segment with both measured and predicted lowest free energy has significant position displacement compared to the wild type protein (168). These findings suggest that TM helices may shift positions dramatically during the folding and oligomerization process, which may be important for bringing functionally important polar residues into places.
Overall, the population of alternative conformational states may play important roles in determining the final native structure and function of membrane protein, and in ensuring the overall stability and robustness of the cell machineries in which membrane proteins are important components.
A genomic scale survey of domain combinations of helical membrane proteins suggested that membrane proteins exist mostly as single domains, and oligomerization within the membrane may be the general mechanism for membrane proteins to gain new biological functions (169; 170). For GPCRs, characterizing their oligomerization state is of considerable importance (171). Computational docking and molecular dynamics simulations have been applied in gaining insight into the oligomerization state and in delineating the protein-protein interface (see ref (171) for a recent review).
The oligomerization state of β-barrel membrane proteins can be accurately predicted computationally (127). Based on the TMsip empirical potential function and the reduced conformational state model, it was found that the average deviation in energy of the unstable strands from the mean of all strands serves as an excellent predictor of the overall oligomerization state of the membrane protein. In a leave-one out blind test of 25 non-homologous β-membrane proteins, in which each of the protein is taken in turn for testing, while the remaining 24 proteins used for model construction, excellent results are obtained in predicting the oligomeric state. As subsequently realized that protein FhuA can exist in dimeric form, the predictions of the oligomerization state for these 25 β-barrel membrane proteins are 100% accurate with 100% specificity (172; 127). These predictions are robust, as the outcome does not depend on specific choice of structures used in the construction of the energy function. Furthermore, as structural information is not essential for such predictions, the oligomerization state can also be predicted quite successful even when only sequence information is employed (127): The accuracy and specificity are 96% and 94% when only sequence information is used (127), respectively, with the consideration that protein FhuA indeed form a dimer (172).
The interface of protein-protein interaction for β-barrel membrane proteins can also be predicted (127). Based on the observation that the protein-protein interface is enriched with weakly stable strands, interfaces can be predicted either with the knowledge of the structure where high accuracy can be achieved, or with sequence information only where accuracy is slightly degraded (Fig 6). Another approach based on the machine learning method of random forest can also predict residues located in the protein-protein interface accurately (173).
Success in predicting the oligomerization state and in identifying protein-protein interaction interface in the TM domain will likely reveal novel insight into the mechanism of many membrane proteins. For β-barrel membrane proteins, mutations can be suggested that would strongly affect the oligomerization state (Fig 6, inlet). It is conceivable that protein-protein interface for eukaryotic membrane proteins can also be predicted, and mutants with different oligomerization behavior can be engineered. For example, the eukaryotic proteins VDAC found in mitochondria oligomerizes during the induction of apoptosis (174). Predicted oligomerization site on VDAC can aid in experimental design of studies to identify key residues involved in VDAC oligomerization. Such investigations will be important for studying the underlying mechanism of apoptosis (174; 175).
De novo protein design and protein engineering aim to produce proteins with new or enhanced activity and stability. Although significant progress has been made in recent years (176), there are only a limited number of reported successes in de novo membrane protein design. The most promising approach is to extend computational methods used for the design of globular proteins. This approach lead to the successful design of a four helix bundle membrane protein engineered to bind two Fe(II/III) diphenylporphyrins in a bis-His geometry. This designed membrane protein forms a channel capable of transmembrane electron transfer (177). There has also been significant progress in the design of small peptides that target the transmembrane proteins and inhibit protein-protein interactions in the TM domain (178). Anti-αIIb peptide that targets the transmembrane domain of the α subunit of the integrin αIIbβ3 disrupts the heteromeric helix-helix interactions. The specificity of the designed anti-αIIb was validated both in vitro and in vivo (178; 179).
As more structures of membrane proteins become available, improved understanding of their organizational principles has led to efforts in engineering membrane proteins with improved protein stability (180). For example, a metal binding site was engineered in the mastoparan X protein, an am-phiphilic α-helix that is too short to form a stable helix in water. This newly acquired metal binding ability stabilizes the helical structure of the protein, and increased the binding and lysis ability of the protein to the membrane (181). Longer transmembrane regions were also engineered for the β-barrel membrane protein FhuA to match the hydrophobic cores of thick polymeric membranes, with the goal for targeted drug delivery (182).
There have also been successes in engineering stability of oligomerized membrane proteins. Using a statistical potential function, mutations that would stabilize or destabilize the dimeric interface of GPCRs were predicted based on a de novo designed rhodopsin homodimer model (183). These predictions compared favorably with experimental studies (183). Computational study on β-barrel membrane protein has also suggested that oligomers form primarily due to the instability of monomers. Such oligomerization can be altered by mutations that stabilize or destabilize the monomeric form of the β-barrel membrane protein (127).
Success has also been reported on engineering the geometry of β-barrel membrane protein. Most β-barrel membrane proteins consist of an even number of strands, and β-hairpins are often thought as the basic repeating unit (184; 185). It is plausible that the evolution of β-barrel membrane proteins are based on the modularity of hairpin duplication and oligomeric assembly of these hairpins (185). Indeed, bacterial toxin α-hemolysin and the multidrug efflux system TolC forms β-barrel membrane protein upon oligomerization once multiple hairpins are inserted into the lipid membrane (186; 187). Arnold et al constructed an artificial β-barrel membrane protein by duplicating the sequence of 8-strand OmpX. The resulting protein has a pore size of that of a 16-strand porin based on single-channel conductance measurements (185).
Pores with specially constructed filters have been successfully engineered to control the flow of ions and metabolites through the membrane bilayer. β-barrel membrane protein OmpF, which is slightly cation-selective due to the −1 net charge in the filter region, has been converted into Ca2+-selective channel by carefully mutating two Args located in the constriction zone to Glus (188). Similarly, aquaporin-1 filter was engineered to enhance proton conductance computationally and the results were subsequently confirmed by experiments (189). Dynamics of reconstituted native plugged FhuA channels in an ion-conducting state have been studied by adding 4M urea on the cis side, which reversibly unfolds the plug domain and open an ion-conducting pathway that mimics the TonB dependent channel (190). Mutants of OmpF whose extracellular loops were deleted one at a time were also engineered to be pH insensitive (191).
It is likely that the pace of designing membrane protein will accelerate, and many more novel membrane proteins with desirable biophysical properties and novel or enhanced functions will be made.
We have summarized key aspects of computational studies of membrane proteins, including bioinformatics prediction of membrane proteins and their topology, the discovery and implication of sequence and spatial motifs, membrane protein evolution and the substitution patterns of amino acids in the TM domain, as well as the modeling of the underlying physical forces through empirical potential function. We have also discussed recent successes in structure prediction and in protein-protein interactions prediction, as well as progress in characterization of ensemble properties of membrane proteins. We believe that computational studies based on both the underlying physical forces as well as bioinformatics analysis of evolutionary signal will continue to make important contributions in understanding and manipulating membrane proteins that compliments experimental investigations.
We thank Drs. Patrick Barth, Ann Delcour, Robert Eisenberg, Alan Finkelstein, Dennis Gessman, Nathan Joh, Linda Kenney, Renhao Li, Stephan Nussberger, Varda Shoshan-Barmatz, Alessandro Senes, Ann Stanley, and Bill Wimley for very helpful discussions. This work was supported by NIH grants GM079804 and GM086145, NSF grants DBI 1062328 and DMS-0800257, and ONR grant N00014-09-1-0028.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.