α-Helical transmembrane (TM) proteins play an important role in many critical and diverse biological processes, and specific associations between TM helices are important determinants of membrane protein folding, dynamics and function. To gain insights into these phenomena, it is necessary to investigate different types of helix-packing modes and interactions. However, such information is difficult to obtain because of experimental impediments and the lack of a well-annotated source of helix-packing folds in TM proteins. We have developed the TMPad (TransMembrane Protein Helix-Packing Database), which addresses these issues by integrating experimentally observed helix–helix interactions and related structural information of membrane proteins. Specifically, the TMPad offers pre-calculated geometric descriptors at the helix-packing interface including residue backbone/side-chain contacts, interhelical distances and crossing angles, helical translational shifts and rotational angles. The TMPad also includes the corresponding sequence, topology, lipid accessibility and ligand-binding information, and supports structural classification, schematic diagrams and visualization of the above structural features of TM helix-packing. Through detailed annotations and visualizations of helix-packing, this online resource can serve as an information gateway for deciphering the relationship between helix–helix interactions and higher levels of organization in TM protein structure and function. The website of the TMPad is freely accessible to the public at http://bio-cluster.iis.sinica.edu.tw/TMPad.
α-Helical transmembrane (TM) proteins are a major class of proteins pivotal for many critical biological processes, including signal transduction, bioenergetics, ion transport, cell adhesion and cell–cell recognition (1). It has also been estimated that ~20–30% of a typical genome encodes proteins with a TM domain (2,3). Despite their biological importance and abundance, the mechanisms by which TM proteins fold into their native states remain elusive due to the limited number of solved structures (4). Therefore, continued development of data collection and analytical methods that help bridge the sequence-to-structure gap in TM proteins is in high demand.
The fold of an α-helical TM protein can be dissected into sets of interacting TM helices, connecting loops and extramembraneous domains. In particular, helix–helix interactions remain an important determinant of folding and stabilization by the commonly accepted two-stage model (5,6). Such an interaction is mediated by residue contacts at the helix-packing interfaces and also influenced by protein–lipid and protein–ligand interactions. In particular, how helix-packing contributes to membrane protein assembly has been the subject of many previous works aiming at delineating different helix-packing geometries, sequence motifs and preference of residue contacts. Historically, canonical models of helix-packing such as ‘ridge-into-groove’ (7) and ‘knob-into-hole’ (8) have been proposed for soluble proteins. Later, a comparison between helix-packing in soluble and membrane proteins also reported the remarkable differences in crossing angles, orientation, and packing density (9). In addition, several groups focused on the occurrence of sequence motifs in helix–helix interactions, from the more specific Gly and Ser zippers (10,11), to the degenerate ‘Gly-Ala-Ser’ and ‘Ala-coil’ motifs described by Walters and DeGrado (12). The above over-represented motifs provide clues to an interesting observation that small (Ala and Gly) and polar (Ser) residues likely contribute to stability by van der Waals (VDW) and hydrogen bond interactions, respectively. Furthermore, it was also found that non-hydrophobic residues are often buried in the TM core at conserved positions (13). Recently, the above-mentioned helix-packing features were recapitulated and tested by designing a novel peptide sequence which yields high binding specificity to a target TM helix (14). This work demonstrates that the principles of helix-packing in TM proteins can be adapted for gaining insights into membrane protein folding and designing new modulators of protein–protein interactions.
Given the progress reported above, it is clear that helix-packing in membrane proteins is not governed by a single factor alone, and understanding this phenomenon may require a multi-faceted approach in which helix-packing geometries, sequence motifs, contact residues/side-chains, as well as protein–lipid and protein–ligand interactions are considered together. Several databases related to the above features in membrane proteins have been published to date. One of them is the PDBTM (15), a collection of TM structures from the PDB (16) with helical boundaries computed using the TMDET algorithm (17). Another related work is the OPM database (18), which also contains experimentally determined TM proteins with calculated membrane depths and the orientations of TM proteins in the membranes. In parallel to the above works, the TOPDB database (19) contains topologies of membrane proteins from combined experimental and predicted information. In contrast, two recent methods, namely HIT (20) and MPlot (21), focused on deriving helix–helix interactions from membrane protein structures in the form of web servers. Both methods allow calculation and visualization of helix–helix interaction features such as interhelical contacts and crossing angles. Another recent work, the MeMotif database (22), provides a collection of linear structural and functional sequence motifs in membrane proteins. Although the above works provide insights into helix–helix interactions in membrane proteins, many other valuable helix-packing descriptors are still missing and their relationships to topology, lipid accessibility and ligand information remain unclear. Furthermore, there is still a lack of a comprehensive, well-annotated and integrated repository for existing helix-packing folds in TM proteins.
To address the above issues, we have developed the TMPad, an integrated repository of helix-packing folds in α-helical TM proteins. The helix-packing folds in α-helical membrane proteins can be viewed as ensembles of tightly packed helical substructures mediated by helix–helix interactions. The TMPad aims to provide comprehensive coverage of observed helix–helix interactions with the corresponding sequence, topology, lipid accessibility, ligand and binding site information extracted from experimentally determined membrane protein structures. Furthermore, the TMPad contains annotations of pre-calculated geometric descriptors of each helix-packing interface including residue backbone/side-chain contacts, interhelical distances and crossing angles, helical translational shifts and rotational angles. The TMPad also provides structural classification, schematic diagrams and 3D visualization of the above structural features of TM helix-packing. To the best of our knowledge, the TMPad is the first database to provide detailed and integrated annotations of helix-packing structural features in α-helical TM proteins. Given this unique characteristic, the TMPad can be used for deciphering the relationship between helix–helix interactions and higher levels of organization in TM protein structure and function.
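To make two of the geometric descriptors named above more concrete, the following is a minimal sketch of how an inter-axis angle and a minimum Cα–Cα distance could be computed from backbone coordinates. It is not the TMPad's own procedure, which is not specified in this section. The axis approximation (centroid of the first four Cα atoms to centroid of the last four), the function names and the toy coordinates are all assumptions made for illustration, and the angle returned is unsigned, so it does not encode handedness.

```python
import math

def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def helix_axis(ca_coords, window=4):
    """Approximate a helix axis as the unit vector from the centroid of the
    first `window` C-alpha atoms to the centroid of the last `window`.
    This approximation is an assumption, not the TMPad definition."""
    start = centroid(ca_coords[:window])
    end = centroid(ca_coords[-window:])
    vec = tuple(e - s for s, e in zip(start, end))
    norm = math.sqrt(sum(c * c for c in vec))
    return tuple(c / norm for c in vec)

def axis_angle_deg(axis_a, axis_b):
    """Unsigned angle between two unit axis vectors, in degrees (0 to 180)."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))

def min_ca_distance(ca_a, ca_b):
    """Smallest distance between any C-alpha of helix A and any of helix B."""
    return min(math.dist(p, q) for p in ca_a for q in ca_b)

# Toy example: two idealized straight segments, anti-parallel, 8 units apart.
helix_a = [(0.0, 0.0, 1.5 * i) for i in range(10)]
helix_b = [(8.0, 0.0, 1.5 * (9 - i)) for i in range(10)]
angle = axis_angle_deg(helix_axis(helix_a), helix_axis(helix_b))
distance = min_ca_distance(helix_a, helix_b)
```

A signed crossing angle would additionally require a handedness convention, for example a dihedral measured about the line of closest approach between the two axes.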
The TMPad contains derived data for a comprehensive set of experimentally determined α-helical TM protein structures. In this section, we describe the data collection and preparation process. First, all available TM structures were obtained from the PDBTM and OPM databases. As a pre-processing step, we filtered out the following structures: (i) redundant structures of the same PDB identifiers; (ii) theoretical models and (iii) those without any α-helical TM domains. For each entry in the TMPad, different levels of information are collected in categories including ‘Overview’, ‘Topology’, ‘Helix–helix Interactions’, ‘Lipid Accessibility’ and ‘Ligands’. The details of data preparation in each category are described below:
The TMPad enables extensive searches and provides user-friendly interfaces for retrieving different levels of helix-packing related information about a TM protein. We summarize a typical workflow in the TMPad with an example in Figure 1. In this example we show the steps for accessing the information by searching for a PDB identifier or the corresponding keywords. The ‘Overview’ of the retrieved entry is displayed first. It gives a quick overview of the protein, including the lists of the complete and individual TM domain chains, experimental details and cross references. The structural and functional annotations of the protein chains from the GO and SCOP databases are also shown. The ‘Topology’ information is shown as a cartoon for the selected TM chain, together with the sequence, classification by HELANAL, orientation and computed tilt angle of each TM helix with respect to the membrane. In the ‘Helix–helix Interactions’ page, users are first presented with a summary of all helical pair interactions of the selected chain, along with a helix–helix interaction graph in a top view where the nodes and the edges represent the TM helices and the pairwise interactions, respectively. Each interaction can be further selected and expanded for retrieving the detailed geometric descriptors of helix-packing described in the previous section on ‘Database Content’, in addition to the visualization in Jmol. Furthermore, in the ‘Lipid accessibility’ page, the helix–helix interaction graph is combined with RLA information represented by helical wheels. The edges of the interaction graph are labeled with the number of contacts and the crossing angle between the interacting helical pair. Each helical wheel shows a color-coded gradient scale of the calculated RLA value for each residue ranging from 0 (dark red: very buried) to 1 (dark blue: very exposed).
In addition, the global surface of the protein is displayed in wireframe, and both interior (highlighted in yellow) and exterior (in green) cavities can also be viewed in Jmol. Lastly, in the ‘Ligands’ view we list the ligands bound to all protein chains in a table with the contact residues. For visualization of ligands, we enabled a customized menu for viewing the selected ligand in Jmol. Users can select different options for visualizing ligands and their contact side-chains, or adjust parameters for viewing the protein–ligand or ligand–ligand complexes.
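The helix–helix interaction graph described above, with TM helices as nodes and pairwise interactions as edges, can be sketched as a simple adjacency structure. The snippet below is only an illustration of that idea. The function name, the input format and the contact counts are hypothetical, and the default threshold of three contacts is borrowed from the VDW-contact cut-off mentioned for Figure 3a rather than from any documented TMPad setting.

```python
from collections import defaultdict

def build_interaction_graph(contact_counts, min_contacts=3):
    """Build an undirected graph from per-pair contact counts.

    contact_counts maps (helix_1, helix_2) tuples to a number of contacts.
    Pairs with fewer than `min_contacts` contacts are not connected.
    """
    graph = defaultdict(set)
    for (helix_1, helix_2), n_contacts in contact_counts.items():
        if n_contacts >= min_contacts:
            graph[helix_1].add(helix_2)
            graph[helix_2].add(helix_1)
    return dict(graph)

# Hypothetical contact counts for one chain (not taken from the database).
example_counts = {
    ("TM1", "TM2"): 5,
    ("TM1", "TM3"): 2,
    ("TM4", "TM5"): 3,
}
example_graph = build_interaction_graph(example_counts)
```

Helices whose pairs all fall below the threshold do not appear as nodes in this sketch; a fuller version might add them as isolated nodes.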
There are several search options for accessing the information in the TMPad. As mentioned above and shown in Figure 1, users can enter a valid PDB identifier or keywords to begin the search. However, users are not limited to this option alone. To further illustrate the database functionalities, we discuss the following example as an application to study helix-packing in TM proteins. In this example, we assume the user is investigating and comparing the topologies, helix-packing interactions and ligand-binding sites of a specific protein family such as the aquaporins (AQPs). To retrieve a list of ‘AQP-like’ structures in the database, we limited the keyword search to the term ‘aquaporin’ and selected representatives of different types from archaeal, bacterial and eukaryotic origins. When multiple structures were found for a particular type, we selected the X-ray structure with the best resolution. The following representative structures were selected for this analysis: bovine AQP0 (PDB ID: 1ymg_A) (41), bovine AQP1 (PDB ID: 1j4n_A) (42), human AQP4 (PDB ID: 3gd8_A) (43), human AQP5 (PDB ID: 3d9s_A) (44), archaeal AQPM (PDB ID: 2f2b_A) (45), Escherichia coli AQPZ (PDB ID: 2o9g_A) (46), yeast AQY1 (PDB ID: 2w2e_A) (47), spinach PIP2 (plasma membrane intrinsic protein; PDB ID: 3cn5_A) (48) and E. coli GlpF (aquaglyceroporin; PDB ID: 1ldf_A) (49). We downloaded the data for all nine proteins and compared them based on different levels of information in the TMPad.
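The selection rule described above (one X-ray structure per type, chosen by best resolution) can be expressed as a short filter. The sketch below assumes search results are available as dictionaries with hypothetical `id`, `type`, `method` and `resolution` fields; the TMPad does not necessarily export data in this form, and the records shown are placeholders rather than real entries.

```python
def pick_representatives(records):
    """Keep, for each protein type, the X-ray structure with the lowest
    (that is, best) resolution value. Non-X-ray entries are ignored."""
    best = {}
    for record in records:
        if record["method"] != "X-ray":
            continue
        current = best.get(record["type"])
        if current is None or record["resolution"] < current["resolution"]:
            best[record["type"]] = record
    return best

# Placeholder search results (identifiers and resolutions are made up).
search_results = [
    {"id": "entry_a", "type": "AQP1", "method": "X-ray", "resolution": 2.2},
    {"id": "entry_b", "type": "AQP1", "method": "X-ray", "resolution": 1.8},
    {"id": "entry_c", "type": "AQP1", "method": "NMR", "resolution": None},
    {"id": "entry_d", "type": "AQPZ", "method": "X-ray", "resolution": 2.5},
]
representatives = pick_representatives(search_results)
```

Ties in resolution keep the first record encountered; a real selection might break ties on other criteria such as completeness of the TM domain.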
Several interesting observations can be made. First, although the above ‘AQP-like’ proteins are diverse in origin, substrate and tissue of expression, they share striking similarities in their topologies and helix-packing. With respect to the membrane topology, the examined AQP-family proteins share the same overall architecture, with six TM helices and two re-entrant loops arranged in a pseudo 2-fold symmetry. The re-entrant loops have been well characterized for their important role in substrate recognition during translocation (50). This process is largely facilitated by the essential Asn-Pro-Ala (NPA) motif and its adjacent Arg residue located in the two re-entrant loops. We show the NPA motifs in AQPM and GlpF protein complexes with glycerol molecules in Figure 2a and b, respectively. It can be observed that the glycerol molecule is in contact with Asn199 (Figure 2a) or Asn203 (Figure 2b) of the NPA motifs in the re-entrant regions buried in the core of the channels. Several hydrophobic amino acids on other TM helices also participate in these interactions.
In addition, the examined proteins exhibit similar global helix–helix interaction patterns, resembling a ‘horseshoe-shaped’ network topology. We show the helix–helix interaction graphs for all proteins in Supplementary Figure S2. For brevity, only the helix–helix interaction topology of AQP0 supported by at least three VDW contacts is shown in Figure 3a. Each monomer of the above proteins has a total of five identical helix–helix interactions (TM1/TM2, TM1/TM3, TM2/TM5, TM4/TM5 and TM4/TM6). We further compared the proteins on the basis of their constituent helix–helix interactions. It can be observed that the same interacting helical pairs across all proteins show similar preferences toward their handedness (right-handed), crossing angles (with minimum and maximum standard deviations [SDmin, SDmax] = [2.08°, 4.42°]) and minimum distances between the backbone Cα atoms (SDmin = 0.15Å; SDmax = 0.45Å). The details of all helix–helix interaction comparisons are listed in Supplementary Table S1. Interestingly, we also observed that TM4/TM5 utilized a higher number of hydrophobic (not including Ala) contacts than the other pairs, which were mainly composed of small (including Ala and Gly) and polar amino acids. For example, we show the hydrophobic contacts between TM4/TM5 of AQP0 in Figure 3b. Using a multiple sequence alignment program, M-coffee (51), we aligned the sequences of all proteins and observed that the hydrophobic residues (I, L, V) were concentrated on TM5, located one turn away from one of the conserved residues, His, whose key role in the selectivity filter of AQPs was previously characterized (50). However, AQPM and GlpF were exceptions to the above observation, where Gly and Ile were found at this position, respectively. The output of the multiple sequence alignment is in Supplementary Figure S3. It can be observed that some residues of the ‘hydrophobic patch’ are localized on TM5 in all proteins and they tend to be buried rather than exposed to the lipids.
Our observation is in agreement with recent characterization of the hydrophobic residues exerting an additional effect on selectivity filter by lowering the permeability of hydrophobic solutes (52).
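The [SDmin, SDmax] summary used above can be read as follows: for each conserved helical pair, a standard deviation of the descriptor is taken across the compared proteins, and the smallest and largest of these per-pair values are reported. The sketch below illustrates that reading with made-up numbers. Whether the original analysis used the sample or the population standard deviation is not stated, so the use of the sample form here is an assumption, as are the function name and input layout.

```python
import statistics

def sd_range(values_by_pair):
    """Return (SDmin, SDmax): the smallest and largest per-pair sample
    standard deviation of a descriptor measured across several proteins."""
    per_pair_sd = {
        pair: statistics.stdev(values)  # sample SD; an assumption
        for pair, values in values_by_pair.items()
    }
    return min(per_pair_sd.values()), max(per_pair_sd.values())

# Made-up crossing angles (degrees) for two helical pairs in three proteins.
toy_angles = {
    "TM1/TM2": [10.0, 12.0, 14.0],
    "TM4/TM5": [20.0, 20.0, 20.0],
}
sd_min, sd_max = sd_range(toy_angles)
```

The same function applies unchanged to other per-pair descriptors, such as the minimum Cα–Cα distances.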
As of 10 October 2010, the TMPad contains 896 α-helical proteins, 2685 chains of TM domains and 10 289 helix–helix interactions. The vast majority of the proteins were solved by X-ray diffraction (85.5%) and solution NMR (11.4%). As for the origin of proteins, the top two groups by taxonomy are bacterial and eukaryotic species, accounting for 50 and 31% of the records in the TMPad, respectively. Furthermore, we show the distribution of GO terms of the proteins containing TM chains in the TMPad according to the Cellular Component (CC), Molecular Function (MF) and Biological Process (BP) terms as annotated by GO in Figure 4. We calculated these distributions based on Level 2 terms that are immediate child nodes of the root node in CC, MF and BP, and we removed redundant counts for each protein. The top terms in CC include ‘cell part’ (>80%), ‘organelle’ and ‘organelle part’ (~55%), and ‘macromolecular complex’ (36%). For MF terms, most proteins carry the label ‘transporter activity’ (55%), followed by ‘binding’ (46%), ‘catalytic activity’ (39%) and ‘electron carrier activity’ (30%). Lastly, terms such as ‘cellular process’ (67%), ‘localization’ (64%) and ‘metabolic process’ (57%) are among the most overrepresented terms in BP. We also calculated and compared the amino acid compositions of the whole proteins and of the TM domains in the TMPad, as shown in Figure 5. As expected, the TM domains contain higher fractions of hydrophobic residues (maximum difference in Leu of 5.5%) and show a sharp decrease in charged residues (difference of 2.6–3.4%). There is also a slightly lower composition of polar residues and an increase in overall Phe composition (2.9%) in the TM domains, but no significant difference for Trp and Tyr.
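The composition comparison between TM domains and whole proteins amounts to computing per-residue percentages for two sets of sequences and subtracting them. The following sketch shows one way to do this for one-letter amino acid strings. The function names are hypothetical, the toy sequences are not from the database, and pooling sequences by simple concatenation is an assumption about how the comparison might be set up.

```python
from collections import Counter

def composition(sequence):
    """Percentage of each residue type in a one-letter amino acid string."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    total = len(sequence)
    return {residue: 100.0 * n / total for residue, n in counts.items()}

def composition_difference(tm_sequence, full_sequence):
    """Per-residue difference in percentage points: TM minus whole protein.
    Positive values mean the residue is enriched in the TM segments."""
    tm = composition(tm_sequence)
    full = composition(full_sequence)
    residues = sorted(set(tm) | set(full))
    return {r: tm.get(r, 0.0) - full.get(r, 0.0) for r in residues}

# Toy sequences: the "TM" part is all Leu, the "whole" part is half Lys.
toy_difference = composition_difference("LLLL", "LLKK")
```

For a database-wide figure, the inputs would be the concatenated TM segments and the concatenated full-length chains of all entries.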
With respect to helix-packing statistics, we show the distribution of helix–helix crossing angles (Ω) in Figure 6 and that of helical tilt angles (τ) in Figure 7. The largest group (42%) of interacting helical pairs in the TMPad is packed in a left-handed and anti-parallel configuration (−180° < Ω < −110°). This group also has a relatively narrow spread (SD = 13°), with the magnitudes of its mean and median between 155° and 160°. The remaining groups in descending order of size are: right-handed and anti-parallel (110° < Ω < 180°; 25%), right-handed and parallel (−70° < Ω < 0°; 18%) and lastly left-handed and parallel helical pairs (0° < Ω < 70°; 13%). In Figure 7, the distribution of tilt angles of all helices in the TMPad appears slightly left-skewed, with a median of 22°, a mean of 23° and an SD of 11°. Lastly, the largest helical geometry class is the ‘curved’ type (53%), while the ‘kinked’ type represents ~37% of all helices. Only ~9% of all helices are classified as ‘linear’ based on the current entries of the database.
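The four packing groups above are defined purely by ranges of the crossing angle Ω, so assigning a helical pair to a group is a simple interval test. The sketch below encodes the ranges exactly as they are quoted in the text. The function names and the ‘unclassified’ label for angles outside the four ranges (including the boundary values) are assumptions, since the text does not say how such cases are treated.

```python
from collections import Counter

def classify_crossing_angle(omega):
    """Assign a crossing angle in degrees to one of the four packing groups,
    using the open intervals quoted in the text."""
    if -180.0 < omega < -110.0:
        return "left-handed anti-parallel"
    if 110.0 < omega < 180.0:
        return "right-handed anti-parallel"
    if -70.0 < omega < 0.0:
        return "right-handed parallel"
    if 0.0 < omega < 70.0:
        return "left-handed parallel"
    return "unclassified"  # label chosen here; not defined in the text

def group_fractions(angles):
    """Fraction of helical pairs falling into each group."""
    counts = Counter(classify_crossing_angle(a) for a in angles)
    return {group: n / len(angles) for group, n in counts.items()}

# Made-up crossing angles for four helical pairs.
toy_fractions = group_fractions([-157.0, 150.0, -30.0, 90.0])
```

Applied to all pairs in a data set, `group_fractions` would yield percentages of the kind reported for Figure 6.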
The TMPad database is a comprehensive source for studying helix-packing, and it aims to support the examination of different levels of structural information, including topology, lipid accessibility and ligand binding, in α-helical TM proteins. As we have shown in the example, the helix-packing descriptors provided in the TMPad may be investigated on their own and/or in conjunction with the above structural features for a better understanding of their relationships. In addition, the database has other applications, particularly in structural modeling, such as the derivation of knowledge-based potentials and structure prediction. For the former problem, it was shown that crossing angles and contact pairs can be used to formulate new energy functions (53). In addition, two recent works have used interacting helical pairs as a template library for reconstructing helical bundles in silico (54,55). Thus, the TMPad may be used not only for gaining insights into helix-packing in existing structures, but also in the development of new structure prediction methods. Lastly, with the recent advances in structure determination of membrane proteins, we anticipate that the rate at which structures are solved will gain momentum. To further enhance the TMPad, we plan to develop a web server for calculating helix-packing based on user input. This work will be a continued and joint effort with structural biologists and computer scientists to expand the database functionality and information content.
Supplementary Data are available at NAR Online.
The thematic program of Academia Sinica (AS95ASIA03); National Science Council, partial (NSC97-2627-P-001-004). Funding for open access charge: Academia Sinica Investigator Award, Academia Sinica.
Conflict of interest statement. None declared.
The authors thank Chao-Ling Lin for designing the logo of the website.