The hydrogen bond patterns between mainchain atoms in protein structures not only give rise to regular secondary structures but also satisfy mainchain hydrogen bond potential. However, not all mainchain atoms can be satisfied through hydrogen bond interactions that arise in regular secondary structures; in some locations sidechain-to-mainchain hydrogen bonds are required to provide polar group satisfaction. Buried polar residues that are hydrogen-bonded to mainchain amide atoms tend to be highly conserved within protein families, confirming that mainchain architecture is a critical restraint on the evolution of proteins. We have investigated the stabilizing roles of buried polar sidechains on the backbones of protein structures by performing an analysis of solvent inaccessible residues that are entirely conserved within protein families and superfamilies and hydrogen bonded to an equivalent mainchain atom in each family member.
We show that polar and sometimes charged sidechains form hydrogen bonds to mainchain atoms in the cores of proteins in a manner that has been conserved in evolution. Although particular motifs have previously been identified where buried polar residues have conserved roles in stabilizing protein structure, for example in helix capping, we demonstrate that such interactions occur in a range of architectures and highlight those polar amino acid types that fulfil these roles. We show that these buried polar residues often span elements of secondary structure and provide stabilizing interactions of the overall protein architecture.
Conservation of buried polar residues and the hydrogen-bond interactions that they form implies an important role for maintaining protein structure, contributing strong restraints on amino acid substitutions during divergent protein evolution. Our analysis sheds light on the important stabilizing roles of these residues in protein architecture and provides further insight into factors influencing the evolution of protein families and superfamilies.
As Pauling and Corey realised, satisfaction of hydrogen bonding potential of polypeptide mainchain functions is one of the major factors that give rise to the β-strand and α-helix [1,2]. These regular elements of secondary structure give their names to the main features of protein structure: classical β-sheets, α-helical bundles, αβ-Rossmann fold, αβ-barrel and many others. Hydrogen bonding also plays important roles in the intricate and sometimes elaborate arches and turns which link α-helices and β-strands [3-5].
However, these elegant architectures still leave many mainchain functions unsatisfied in their potential to form hydrogen bonds: an early survey of hydrogen bonding in proteins revealed that ~40% of mainchain atoms do not form hydrogen bonds with other mainchain atoms. In general these occur in four different circumstances:
Water molecules or sidechains can usually satisfy the hydrogen bonding potential of mainchain functions that are at the protein surface in a variety of ways and so the residues are often substituted in evolution. However, in the smaller proportion of functions that must be satisfied from the core of the protein, this is achieved by buried sidechains of polar residues.
Analysis of the substitution patterns of amino acids within homologous protein families has revealed that buried polar residues that are hydrogen-bonded to mainchain amide atoms are highly conserved, more so than those polar residues forming hydrogen bonds to mainchain carbonyl atoms or other sidechains [19,20]. Furthermore, analysis of the median sequence entropy of buried amino acid residues has shown that buried polar sidechains, for which the hydrogen bond capacity is satisfied, are the most conserved amino acid residues within proteins. The number of hydrogen bonds to mainchain amide groups also influences the conservation of buried satisfied polar residues, with those forming two or more being significantly more conserved than those forming only one or none. Together, these results imply that the hydrogen bond functions maintained by these conserved buried polar groups have an important role in maintaining protein architecture. Figure 1 shows an example of conservation of sequence and local environment for the beta/gamma crystallin family. In the crystallins, the hydrogen bonds provided by a buried and conserved serine help to stabilize a β-hairpin structure; this is the serine that recurs in each of the four domains of β and γ crystallins and is part of the signature motif that has allowed recognition of distant homologues.
Previous in silico analyses of the stabilizing roles that polar sidechains have on the backbone of protein structures have tended to focus on a particular architectural context [13,23,24]. Bordo and Argos identified recurring patterns and amino acid types involved in sidechain-to-sidechain and sidechain-to-mainchain interactions. However, the conservation of polar residues and the three-dimensional (3D) arrangements of the sidechain-to-mainchain hydrogen bonds were not considered. What then are the features of sidechain-to-mainchain hydrogen bonds formed by polar sidechains? Which amino acids are involved? What kinds of structures do these buried polar residues maintain? Are they local to a secondary structure or do they link between different helices and strands, stabilizing tertiary structure?
In this report we focus purely on buried polar residues that are entirely conserved within protein families and superfamilies, hydrogen bonding to a mainchain atom in each family member. We hypothesise that such buried sidechain-to-mainchain hydrogen bonds satisfy mainchain hydrogen bonding potential where secondary structures cannot be formed, and in so doing become irreplaceable elements of the overall architecture. In order to test this hypothesis we characterize the nature and tertiary structural context of these conserved and buried polar residues. We show that polar sidechains which bridge to mainchain functions in the cores of proteins have conserved tertiary structural roles in homologues. Like the elements of secondary structure, they are born of the need to satisfy hydrogen bonding but, in achieving this, they become key, conserved structural features of many well-known protein architectures. Some are joists or braces, spanning the helices and strands, while others form truss-like structures that support complex loop structures (Figure 2).
In HOMSTRAD, a database of structurally aligned families, 143 families have five or more members with high resolution structures, 131 of which are non-redundant, i.e. their sequence alignments do not overlap (see Additional file 1, Table S1). Of these, 65 have conserved and buried polar residues, providing a total of 233 alignment positions where the equivalent residue in each structure forms a hydrogen bond through its sidechain to a mainchain atom (see Additional file 2, Table S2). The frequencies of occurrence of the polar amino acids at these 233 alignment positions are shown in Table 1. We have examined the propensity with which such conserved and buried polar residues participate in various architectural motifs, as shown in Table 2. We have focused on interactions that are conserved in families, on the assumption that these have had a selective advantage and may teach us about important factors that determine protein architectures.
For conserved and buried polar residues making hydrogen bonds to mainchain NH functions in the N-terminal regions of α-helices, cysteine has the highest propensity to form such interactions, followed by negatively charged aspartate, histidine and glutamate (Table 2 and see Additional file 3, Figure S1A - grey bars); surprisingly, neutral residues such as serine, threonine and asparagine have higher propensities when solvent accessible positions are considered (Table 2 and see Additional file 3, Figure S1A - white bars) [8,27,28]. This may reflect the importance of the charged hydrogen bond in regions of low dielectric strength, as well as its interaction with the helix dipole.
Local capping effects of buried aspartates occurring either upstream (Figure 3A-B) or downstream (Figure 3C-D) from their hydrogen bonded partner have been well described, but less attention has been paid to aspartates that are hydrogen bonded to the N-terminal residue of a helix via a distant interaction (Figure 3B, E-F), providing structures that often resemble joists. Similar hydrogen bonded interactions are made by cysteines with N-terminal residues, except that cysteine mostly occurs upstream (Figure 3D, G) or interacts distantly (Figure 3H) and is rarely observed to occur downstream from the N-terminal residue (Figure 3I).
In a similar way to the aspartates that interact with N-terminal regions of α-helices, the charged residue, arginine, has the highest propensity to form capping interactions that are both conserved and buried at the C-termini of α-helices, while at the same time compensating for the helix dipole (Table 2 and see Additional file 3, Figure S1B - grey bars). Interestingly, all conserved, buried arginine residues that interact with C-terminal residues do so distantly (Figure 4), often with the arginine itself also being found within a capping region of a different helix (Figure 4D-F). This feature often occurs when the C-termini of multiple helices are aligned (Figure 4D-F), no doubt providing favourable interactions with the negative helix dipoles by helping to offset charge repulsion between two or more helix C-termini.
The polar amino acids with the highest propensities for interacting with edge strands are arginine, asparagine, glutamine and cysteine (Table 2 and see Additional file 3, Figure S2A - white bars). However, of conserved, buried polar residues making hydrogen bonds to mainchain atoms in edge strands, tyrosine has the highest propensity to form such an interaction, followed by cysteine, arginine, asparagine and threonine, although the propensities are rather low (Table 2 and see Additional file 3, Figure S2A - grey bars). Without the hydrogen bonds from buried, conserved sidechains, these mainchain atoms in edge strands would otherwise form no hydrogen bonds (Figure 5). They include strands in β-barrels that are staggered and have no neighbouring strands (Figure 5B-C).
Arginine, followed by tyrosine and threonine, has the highest propensity to form hydrogen bonds to mainchains within edge strands (Table 2 and see Additional file 3, Figure S2B - white bars). However, amongst conserved and buried residues within edge strands, tryptophan has the highest propensity, followed by glutamine, histidine and asparagine (Table 2 and see Additional file 3, Figure S2B - grey bars). Asparagine and tryptophan often interact locally with regions connecting regular secondary structures, e.g. β-turns and β-hairpins (Figure 6A-C), while glutamine can bridge the gap between two strands in β-barrel structures (Figure 6E-F).
The mainchains in centre strands are sometimes unable to form hydrogen bonds with the neighbouring strand. Examples include where two strands curve away from each other (Figure 7A), where the neighbouring strand is shorter than the central strand in question (Figure 7B-E), or where the mainchain atom is at the terminus of a strand (Figure 7B,C,E) or part of a β-barrel (Figure 6E). These mainchain functions are often satisfied by sidechain hydrogen bonds. Of polar residues that are conserved and buried and carrying out this role, cysteine, glutamine, threonine, asparagine and serine have the highest propensity to form such interactions (Table 2 and see Additional file 3, Figure S2C - grey bars). In some cases the sidechains act as "braces"; for example, the threonines of the conserved aspartic proteinases Asp-Thr-Gly triplet, where the strands diverge after the threonine on either side of the pseudo dyad in the eukaryotic enzymes or the dyad of the dimeric retroviral enzymes (Figure 7F).
Of conserved, buried polar residues within centre strands forming hydrogen bonds to mainchain atoms, tyrosine has the highest propensity to form such interactions, followed closely by arginine, asparagine, serine, aspartate and glutamate (Table 2 and see Additional file 3, Figure S2D - grey bars). We see a different pattern, however, when we consider all polar amino acids in centre strands that form hydrogen bonds to mainchain atoms: arginine has the highest propensity to form this type of interaction, followed by cysteine, tyrosine, threonine and asparagine (Table 2 and see Additional file 3, Figure S2D - white bars). Asparagine, aspartate, glutamate, serine and tyrosine are more commonly found to form hydrogen bonds to mainchain atoms from within centre strands when conservation and solvent accessibility are considered, whereas threonine and cysteine are less common.
The conserved, buried polar residues within centre strands that form hydrogen bonds to mainchain atoms tend to occur at the termini of strands more often than in the middle of the strand (Figure 8). They often interact with coils (Figure 8A-D), β-turns (Figure 8E) and polyproline, forming truss-like structures that support the coil-like regions they are interacting with. Others are observed to interact with helix capping regions (Figure 8F-G) and neighbouring strands in β-barrels, forming structures that resemble joists (Figure 8H-I).
Cysteine has the highest propensity of buried, conserved polar residues to form hydrogen bonds to mainchain atoms in 3₁₀ helices, followed by tyrosine, tryptophan, aspartate and arginine (Table 2 and see Additional file 3, Figure S3 - grey bars). This differs from the pattern for all polar amino acids interacting with 3₁₀ helices, where arginine, histidine, cysteine and asparagine have the highest propensities (Table 2 and see Additional file 3, Figure S3 - white bars). There is less of a clear preference for the 3₁₀ helices to hydrogen bond with particular polar sidechains than in α-helices, probably due to the greater plasticity in these helices, which usually comprise only two or three turns (Figure 9).
In β-hairpins, mainchain atoms that are hydrogen-bonded to conserved and buried sidechains have a high propensity to interact with aspartate, cysteine, tryptophan and serine (Table 2 and see Additional file 3, Figure S4 - grey bars). We see a different pattern when we consider all polar amino acids forming hydrogen bonds to mainchain atoms in β-hairpins; asparagine has the highest propensity to form this type of interaction, followed by aspartate, arginine, serine and threonine (Table 2 and see Additional file 3, Figure S4 - white bars). Therefore, although asparagine, arginine and threonine often form hydrogen bonds to mainchain atoms within β-hairpins, these interactions tend not to be conserved in buried positions.
The conserved buried polar residues that form hydrogen bonds to mainchain atoms in β-hairpins almost always interact distantly with mainchain atoms that would otherwise form no hydrogen bonds (Figure 10). Some of the β-hairpin structures are extremely long and complex (Figure 10A-C).
From the set of conserved, buried polar residues hydrogen-bonded to mainchain atoms of polyproline-type helices, arginine is most common, followed by histidine, tyrosine and tryptophan (Table 2 and see Additional file 3, Figure S5 - grey bars). Arginine also has the highest propensity to form this interaction when we consider all residues forming this type of interaction, followed by glutamine, asparagine and histidine (Table 2 and see Additional file 3, Figure S5 - white bars). A similar result has previously been observed where hydrogen bonds from sidechains to mainchains in polyproline were most frequently formed by arginine followed by glutamine, asparagine, serine and threonine.
Polyproline helices are extended and most often occur on the surface of proteins; it is therefore not surprising that the conserved, buried residues that form hydrogen bonds come from a residue distant in the sequence. Typical examples are shown in Figures 11A and 11B from the α/β hydrolases and the alcohol dehydrogenases, respectively. In such a mode, the polar residues form truss-like structures that help to stabilize the irregular polyproline helices.
Cysteine and aspartate clearly have the highest propensity to form hydrogen bonds to coil regions out of buried conserved polar residues (Table 2 and see Additional file 3, Figure S6 - grey bars). However, arginine has the highest propensity to perform this role when all positions are considered, followed by asparagine and aspartate (Table 2 and see Additional file 3, Figure S6 - white bars). A previous analysis of intra-coil sidechain-to-mainchain hydrogen bonds revealed that aspartate, serine, asparagine and threonine are the polar residues that most commonly form this type of interaction, with 80% of these cases being at solvent-exposed sites.
Polar sidechains frequently form hydrogen bonds to coil regions, often in very elaborate loop structures that form extended turns and arches [3-5] (Figures 3A,C-D,H; 4A-B,E; 5D,F; 6A,C,D; 8A-F). However, there are also instances where the conserved and buried residues only form hydrogen bonds with mainchain atoms in coil regions, indicating that stabilization of these irregular regions by polar sidechains is important enough for them to be conserved during evolution (Figure 12).
We have previously demonstrated that buried polar residues, although small in number, tend to be more conserved when their hydrogen-bonding potential is satisfied or where they form hydrogen bonds to mainchain atoms. Conservation of these residues and the interactions that they form implies that they are important for maintaining protein structure and hence provide restraints on amino acid substitutions during divergent evolution. We have shown that conserved, buried polar residues have conserved roles in stabilizing the tertiary structure of proteins by forming hydrogen bonds to mainchain atoms. The conservation of these sidechain-to-mainchain hydrogen bonds implies that mainchain architecture is a crucial restraint on the evolution of proteins and that the interactions are retained as an essential part of the protein fold. The structural motifs that we have examined have been shown to have particular propensities for polar residues which form hydrogen bonds with mainchain atoms. Although local sidechain-to-mainchain interactions have been the focus of most previous studies, the propensity for sidechain-to-mainchain hydrogen bond formation is often met by distant interaction. For example, we observe that arginine frequently caps the C-termini of α-helices through a distant interaction. We have shown that buried polar residues maintain 3D relationships between secondary structures where mainchain-to-mainchain hydrogen bonds cannot play a role and that similar stabilizing structures recur in different architectures. The key roles of these stabilizing interactions in maintaining protein structures have been previously demonstrated in a few cases, for example in the tyrosine corner, but we have shown here that there are many others important for maintaining protein stability.
Although it is generally unfavourable to bury hydrophilic amino acids in the core of proteins, this is counterbalanced by the need to satisfy mainchain atom hydrogen-bond potential. The interactions that the polar residues form when providing these supporting roles are often quite complex and can be thought of as analogous to features in our own built 3D environment. Many form joists, bridging between the elements of secondary structure (for example, Figures 3B, 4D-F, 5B-C, 7A-E), analogous to those that bridge columns and support structures above them in man-made buildings (Figure 2A). Other sidechains act as braces, tethering two strands at the point at which they diverge (Figure 7F and Figure 2B). Buried hydrogen bonded polar sidechains often maintain triangulated structures, supporting distorted helices and complex loop structures (Figures 3I, 6A,C, 8A-C, 11A-B): these provide a striking parallel with the trusses supporting the roofs of buildings (Figure 2C-E). Remarkably, these structural features have been highly conserved in their respective architectural histories, despite the variation in surface structures. Both are hidden from view and remain unappreciated, except by the cognoscenti. We hope that this paper will help bring understanding of these important structural features of protein architecture to a wider audience.
Protein families containing five or more members were selected from HOMSTRAD where the family alignment contained a conserved, buried polar residue and where the sidechain of the polar residue forms a hydrogen bond to a mainchain atom in each family member. The JOY alignment of each family within HOMSTRAD was used to identify families that met these criteria. JOY's default relative accessibility cut-off (7% or less) was used to define solvent inaccessible (buried) residues. In order to avoid redundancy, where protein families overlapped, the family with the highest sequence coverage was chosen for the analysis.
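The selection criteria above can be illustrated with a minimal sketch. This is not the JOY or HOMSTRAD code: the function name, the input layout (one aligned sequence, one list of relative sidechain accessibilities and one list of hydrogen-bond flags per family member) and the set of residue types treated as polar are assumptions made for illustration. Only the 7% cut-off and the three criteria (entirely conserved, buried, hydrogen-bonded to mainchain in every member) come from the text.

```python
# Assumed set of polar residue types (one-letter codes); the exact set used
# in the analysis is not stated in this section.
POLAR = set("RNDCQEHKSTWY")

def conserved_buried_polar_positions(alignment, accessibility, hbond, cutoff=7.0):
    """Return alignment columns holding a conserved, buried, H-bonded polar residue.

    alignment     -- list of equal-length aligned sequences ('-' marks a gap)
    accessibility -- per member, relative sidechain accessibility (%) per column
                     (None at gaps)
    hbond         -- per member, True where the sidechain hydrogen bonds to a
                     mainchain atom
    cutoff        -- relative accessibility (%) at or below which a residue
                     counts as buried
    """
    selected = []
    for col in range(len(alignment[0])):
        residues = {seq[col] for seq in alignment}
        if len(residues) != 1:          # not entirely conserved
            continue
        (residue,) = residues
        if residue not in POLAR:        # gaps and apolar residues are skipped
            continue
        buried = all(acc[col] is not None and acc[col] <= cutoff
                     for acc in accessibility)
        bonded = all(member[col] for member in hbond)
        if buried and bonded:
            selected.append(col)
    return selected
```

With a toy three-member family, only a column that passes all three tests in every member is returned.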
Hydrogen bond partner(s) to the conserved, buried polar residues were identified using the program HBOND (J. Overington, unpublished). HBOND identifies all possible hydrogen bonds based on a distance criterion (3.5 Å between donor and acceptor).
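A distance criterion of this kind can be sketched as follows. HBOND itself is unpublished, so this is only an illustration of a 3.5 Å donor-acceptor cut-off; the function name and the treatment of the cut-off as inclusive are assumptions, and no angular test is applied.

```python
import math

MAX_DONOR_ACCEPTOR_DISTANCE = 3.5  # Angstroms, the cut-off quoted in the text

def is_possible_hbond(donor_xyz, acceptor_xyz, cutoff=MAX_DONOR_ACCEPTOR_DISTANCE):
    """Return True if donor and acceptor atoms lie within the distance cut-off.

    donor_xyz, acceptor_xyz -- (x, y, z) coordinates in Angstroms
    The inclusive comparison (<=) is an assumption for this sketch.
    """
    return math.dist(donor_xyz, acceptor_xyz) <= cutoff
```

A purely distance-based test is permissive by design: it lists every possible hydrogen bond, which matches the description of identifying "all possible hydrogen bonds".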
We used the program PROMOTIF to identify the structural context of the conserved polar residues and their interaction partners. The following motifs were identified:
1. α-helices (N-terminal and C-terminal residues were identified based on the following positional criteria: N-(N+1) to N-(N+3) for N-terminal residues and N-3 to N+1 for C-terminal residues, where N is the length of the helix).
2. 3₁₀ helices
3. β-strands - edge strands were distinguished from centre strands by referring to the number of hydrogen bonding partner strands. Strands with more than one hydrogen bonding partner strand were defined as centre and all others as edge.
5. Coil regions
We also identified polyproline helices using the program SEGNO.
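The edge/centre rule for β-strands given in the motif list can be written as a one-line classifier. This is a sketch only: the function name is hypothetical, and the count of partner strands is assumed to be available already from the secondary structure assignment.

```python
def classify_strand(n_partner_strands):
    """Classify a β-strand from its number of hydrogen bonding partner strands.

    More than one partner strand -> "centre"; zero or one -> "edge".
    """
    return "centre" if n_partner_strands > 1 else "edge"
```

Under this rule a strand with no partner at all, such as a staggered strand in a β-barrel, is also treated as an edge strand.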
The propensity of a particular residue type $x$ to form hydrogen bonds to mainchain atoms in a particular architectural context, $P_{arch}(x)$, was calculated using the following equation:

\[
P_{arch}(x) = \frac{n_{arch}(x) / N(x)}{n_{arch}(total) / N(total)}
\]

where $n_{arch}(x)$ is the number of residues of type $x$ forming hydrogen bonds to mainchain atoms in a particular architectural context, $N(x)$ is the number of residues of type $x$ in the dataset of 131 families, $n_{arch}(total)$ is the total number of residues forming hydrogen bonds to mainchain atoms in a particular architectural context and $N(total)$ is the total number of residues in the dataset of 131 families.
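As a worked illustration, the four counts defined above can be combined as a ratio of two frequencies: the fraction of residues of type x found in the context, divided by the fraction of all residues found in that context. Treating the propensity as this ratio is an assumption based on the definitions of the four terms; the function name and the counts in the usage note are hypothetical and are not taken from the dataset.

```python
def propensity(n_arch_x, n_x, n_arch_total, n_total):
    """Propensity of residue type x for an architectural context.

    n_arch_x     -- residues of type x hydrogen-bonded to mainchain in the context
    n_x          -- residues of type x in the whole dataset
    n_arch_total -- all residues hydrogen-bonded to mainchain in the context
    n_total      -- all residues in the whole dataset
    A value above 1 means x occurs in the context more often than expected
    from its overall frequency; below 1, less often.
    """
    if n_x <= 0 or n_arch_total <= 0 or n_total <= 0:
        raise ValueError("n_x, n_arch_total and n_total must be positive")
    return (n_arch_x / n_x) / (n_arch_total / n_total)
```

For example, with made-up counts of 4 of 16 residues of type x in the context, against 8 of 128 residues overall, `propensity(4, 16, 8, 128)` evaluates 0.25 / 0.0625, so type x is four times over-represented in that context.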
Propensities were calculated for:
(i) Polar residues which are entirely conserved, buried in each family member and forming a hydrogen bond to a mainchain atom group in each family member. These numbers were therefore derived from the 233 alignment positions identified in the 66 families.
(ii) All polar residues in the 131 family set, regardless of solvent accessibility and conservation but where the polar residue forms a hydrogen bond to a mainchain atom group.
CLW participated in the design of the study, performed the computational experiments, analysed the data and drafted the manuscript. TLB conceived of the study, participated in its design and refined the manuscript. Both authors read and approved the final manuscript.
Table of the 131 non-redundant families which were used in the analysis.
Table of the families and their members that were used in the analysis.
Figures S1 to S6 show the propensity of polar amino acids to form hydrogen bonds to mainchain atoms in the various architectural contexts analysed.
This work was supported by a BBSRC studentship to CLW. TLB is supported by the Wellcome Trust.