|Home | About | Journals | Submit | Contact Us | Français|
Ras-like GTPases function as on-off switches in intracellular signaling pathways. Their on- or off-state is communicated through conformational changes in the so-called switch I and II regions. It is commonly believed that the distinguishing molecular features of these GTPases are well known. Here, however, I identify—through a Bayesian iterative analysis of GTPase evolutionary divergence—a previously undescribed switch II structural component that (along with previously described, functionally critical residues) most distinguish these signaling pathway on-off switches from other GTPases. In certain Ras-like GTPases this newly-identified component forms an aromatic pocket around the negative-dipole moment at the end of a switch II helix with a positively-charged residue inserted into the pocket. This helix is oriented in a specific direction away from the GTPase core, but is dramatically reoriented upon disruption of the charge-dipole pocket. The charge-dipole pocket occurs in both the on- and off-states and both the charge-dipole pocket and an alternative configuration occur within the unit cell of a single crystal structure of Rab5a GTPase in the off-state. Thus, the charge-dipole pocket configuration is closely associated, not with the on- or off-state, but rather with formation of the outward-oriented helix and, as a result, with restructuring of the switch II N-terminal region, which plays a critical role both in sensing the on-off state and in mediating GTP hydrolysis and nucleotide exchange.
Advances in high-throughput sequencing technologies present the opportunity to infer aspects of biological mechanisms from vast amounts of sequence data—the cell's own language for encoding those mechanisms. Such inferences rely on the assumption that sequence patterns that have been conserved for a billion years or more reflect important structural or mechanistic similarities. More relevant to the analysis here, divergent patterns that are conserved in descendent proteins maintaining a particular divergent function likewise reflect important structural or mechanistic differences. Examining functionally divergent residues within available crystal structures thus can identify the molecular features responsible for important functional distinctions and, as a result, suggest specific mechanisms to test experimentally. Moreover, such patterns may reflect aspects of underlying mechanisms that thus far have evaded detection owing to intrinsic experimental limitations. Just as careful statistical analyses of patterns of inherited traits allowed early geneticists to identify and characterize genes prior to the emergence of molecular biology, a statistical analysis of residue patterns may help identify and characterize the components of a protein's molecular machinery prior to the development and application of more direct methods.
Here I describe a Bayesian iterative strategy for comprehensive analysis of protein evolutionary divergence (Fig. 1a). This strategy can: (i) identify and accurately align vast numbers of related sequences; (ii) identify, based on statistically rigorous criteria, evolutionarily divergent patterns and the sequences conserving those patterns; and (iii) identify corresponding protein structural features. A procedure for performing step (i) of this strategy is introduced here (see Materials and Methods); procedures for performing steps (ii) and (iii) have been described previously 1. As a stringent test of its ability to identify important but previously undescribed molecular features, here I apply this strategy to the well characterized class of phosphate-binding loop (P-loop) GTPases and, as a proof of concept, present one outcome of this analysis: the identification of a previously undescribed structural component that is highly distinctive of Ras-like GTPase on-off switches.
P-loop GTPases bind to GDP or GTP through amino acid residues that show up at the sequence level as highly conserved motifs 2. These include the Walker A (G-K-[ST]) motif 3, which corresponds to the P-loop 4, and a Walker B (D-x-x-G) motif, the conserved aspartate (D) of which interacts (indirectly through a water molecule) with the Mg++ ion coordinating with guanine nucleotide phosphate groups. An important subgroup of P-loop GTPases is the extended Ras-like superfamily 5, which includes Ras, Rab, Rho/Rac, Ran, Arf, Arf-like (Arl), and Sar GTPases and α subunits of heterotrimeric G proteins (Fig. 1b) and which are referred to here as the Ras-like GTPases. Ras-like GTPases function as on-off switches within eukaryotic signaling pathways that regulate diverse cellular processes, including vesicle transport, embryonic development, the sensation of vision, odor, taste and pain, microtubule assembly and cell division. These GTPases are associated both with guanine nucleotide exchange factors, which turn them on by mediating the exchange of GTP for GDP, and with GTPase activating proteins, which turn them off by stimulating hydrolysis of GTP to GDP. The on- or off-state of a Ras-like GTPase is communicated through conformational changes within their switch I and II regions, which detect the presence or absence of the γ phosphate of bound guanine nucleotide (Fig. 1c).
Using the strategy described in Fig. 1a and in Materials and Methods, approximately a quarter of a million P-loop GTPase sequences were identified, multiply aligned, classified into evolutionarily divergent categories and, when atomic coordinates were available, functionally divergent residues were structurally analyzed. After removing nearly-identical and fragmented sequences, the final alignment contained 91,406 sequences (the most conserved features of which are highlighted in Fig. S1 of Supplemental Data). As mentioned in the Introduction, the focus here is on the most characteristic features of Ras-like GTPases. These were identified by optimally splitting this final P-loop GTPase alignment (using Bayesian partitioning with pattern selection6; Fig. 1a) into two major evolutionarily-divergent subgroups: Ras-like GTPases versus other P-loop GTPases (plus a few atypical Ras-like GTPases). Fig. 2 highlights the divergent residues that most distinguish typical Ras-like GTPases from other P-loop GTPases.
As expected, some of the residues highlighted in Fig. 2 were previously proposed to perform critical functions in Ras-like GTPases. Because the aim here is to identify Ras-like residues of unknown function, these residues merely serve as positive controls for this analysis and thus are mentioned only briefly here. These residues are located near the γ-phosphate of bound GTP and include: (i) A threonine residue (Thr43-Rab11a in Fig. 2), which is believed to mediate Switch I conformational changes and which coordinates with the Mg++ ion that is associated with guanine nucleotide phosphate groups 7. (ii) A glutamine residue (Gln70-Rab11a), which is proposed to play a critical catalytic role in GTP hydrolysis 7. (iii) An acidic residue (Glu71-Rab11a), which typically interacts with the Walker A lysine whenever Mg++ or guanine nucleotide is absent and which has been proposed to play a role in Mg++ release and nucleotide exchange 8; 9. (iv) A glycine or alanine residue (Ala68-Rab11a), for which-in GTPases harboring an alanine at this position-the side-chain methyl group is believed to block the Mg++ binding site and thereby promote release of the Mg++ ion during nucleotide exchange 10. And (v) an arginine residue within the switch II region (Arg74-Rab11a), which, for Arf, Arl and Gα GTPases, was proposed to sense the GTP-bound state 11.
In addition to residues with previously proposed functions, Fig. 2 reveals three additional patterns specific to Ras-like GTPases: (i) the pattern [RK]-x-[ILV] preceding the P-loop, (ii) the pattern [WF] directly preceding the Walker B aspartate residue (this aspartate is highlighted in Fig. S1 of Supplementary Data), and (iii) the pattern [YF]-[YF] at the C-terminal end of the switch II region. (Incidentally, among the five deleterious Rab GTPase missense mutations thus far identified in humans 12, one changes the tryptophan associated with the [WF] pattern to glycine resulting in Griscelli syndrome, which is typified by pigmentation defects and unusual activation of lymphocytes and macrophages.) Roughly two-thirds of all Ras-like GTPases match all three patterns exactly and about one-half (mostly Rab and Ran GTPases) conserve the most typical pattern residues. Rho/Rac, Ras and Gα GTPases, however, conserve alternative residues at one or more pattern positions, presumably owing to additional functional divergence. Although these patterns occur within three distinct regions in the sequences, within available crystal structures the corresponding residues cluster near the switch II region's C-terminal (swII-CT) end and exhibit diverse structural configurations, which appear to be facilitated by alternative interactions involving the aromatic residues of the [WF] and [YF]-[YF] patterns.
Interactions between aromatic units play a central role in molecular recognition and self-assembly 13; 14 and can accommodate diverse conformational forms inasmuch as aromatic rings can interact in four distinct geometric orientations: displaced face-to-face, face-to-edge, parallel staggered, and herringbone 15. Moreover, aromatic-aromatic interactions are believed to play an important role in stabilizing protein tertiary structure 16; 17. Aromatic residues stabilize protein structure by lowering conformational entropy inasmuch as their side chains have dramatically fewer degrees of freedom relative to the number of hydrophobic interactions they can establish. They occur more frequently in thermophilic proteins than in mesophilic homologs 18, likewise suggesting that aromatic clusters can counteract structurally instability (due, in that case, to high temperature). The swII-CT aromatic residues often interact with other aromatic residues that are highly conserved within specific Ras-like families; these include, for example, a phenylalanine in Rab GTPases that interacts with the swII-CT tryptophan (not shown) and a tryptophan specifically conserved within Arf, Arl and Gα GTPases 11. Aromatic tyrosine residues, which are favored within the swII-CT [YF]-[YF] pattern, can further contribute to structural diversity and stability 19 by forming internal hydrogen bonds involving their side-chain −OH groups, which can serve as both hydrogen donors and acceptors. Thus the swII-CT aromatic residues may facilitate alternative switch II conformational states, while also counteracting the destabilizing effects of conformational plasticity.
One configuration (Fig. 3a) can explain why the swII-CT residues are conserved inasmuch as, within this configuration, they form a pocket around the negative-dipole moment of a switch II α helix with the positively-charged residue of the [RK]-x-[ILV] pattern inserted into this pocket (Fig. 3a inset). The atomic interactions forming this charge-dipole pocket include: a CH-π interaction between the aromatic residues of the [YF]-[YF] pattern 20; a cation-π interaction 21; 22 between the positively-charged residue and the tryptophan (or, at times, a phenylalanine) that directly precedes the Walker B aspartate, and—when the positively-charged residue is a lysine—a CH-π interaction 20 between these residues as well; van der Waals interactions between the aliphatic residue of the [RK]-x-[ILV] pattern and the other swII-CT residues; and—whenever the first residue of the [YF]-[YF] pattern is a tyrosine, which is typically the case—a hydrogen bond between its side-chain −OH group and a specific backbone oxygen.
An association between the charge-dipole pocket and switch II restructuring is suggested by comparisons between typical monomeric forms of Rab family GTPases and an unusual, homodimeric form of GDP-bound Rab27. In the homodimeric form, which is observed in four distinct crystal structures 23 (pdb_ids: 2f7s, 2.70 Å; 2iey, 3.18 Å; 2iez, 2.80 Å; 2if0, 2.80 Å), the inter-switch region (which connects switch I to switch II) is exchanged between subunits so that, for each subunit, this region is donated by the other subunit (Fig. 3b). As a result, the switch II region forms a long α helix that is directed away from the structural core of one subunit and toward the structural core of the other subunit. Presumably this helix lacks the conformational strain typically imposed on monomeric GTPases—the switch II regions of which need to bend around and reconnect to the structural core. Moreover, for each Rab27 homodimeric subunit the swII-CT residues form the charge-dipole pocket configuration with nearly ideal geometry. Hence, presumably the charge-dipole pocket is structurally compatible with formation of this unusual, outward-directed switch II helix—a structural theme that occurs repeatedly in the following analysis of other Ras-like GTPases. Incidentally, effector-bound forms of (monomeric) Rab27-GTP 24; 25 (not shown) lack both the charge-dipole pocket configuration and the outward-directed switch II helix, thereby demonstrating that Rab27 is able to form alternative swII-CT configurations.
Monomeric Ras-like GTPases that form the charge-dipole pocket (Fig. 4a,d,g and Fig. S2a in Supplementary Data) or nearly so (Fig. 4b,e and Fig. S2b in Supplementary Data) consistently possess a switch II helix that traces out a path close to that of the Rab27 homodimeric helix (see structural superpositions in Fig. 4 and in Fig. S2 of Supplementary Data). However, in these monomeric forms this helix is truncated and tends to deviate slightly relative to the homodimeric form, presumably because the switch II backbone needs to bend around and reconnect to the structural core.
In contrast, monomeric Ras-like GTPases that lack the charge-dipole pocket configuration typically possess a switch II helix that more directly bridges the sites of attachment with the core and that thus traces out a path distinct from that of the Rab27 homodimeric helix (Fig. 4c,f,i and Fig. S3a,c,e in Supplementary Data). Less often, Ras-like GTPases exhibit atypical swII-CT alternative configurations, the switch II helix of which likewise follows a path distinct from that of the homodimer (Fig. S3b,d,f in Supplementary Data). Structures of various non-Ras-like GTPases (Fig. S4 in Supplementary Data) also lack the outward-facing switch II helix. Taken together, these observations indicate an association between the charge-dipole pocket and the outward-oriented α-helix that, based on a ‘back-of-the-envelop’ calculation1, is highly statistically significant.
An analysis of available crystal structures reveals that, although the charge-dipole pocket configuration is most often associated with the GDP-bound state, it also is observed in the GTP-bound state and may be present or absent in both states. For example, for Rab11a the charge-dipole pocket occurs in both states (Fig. 4d,e,g), whereas for Rab5a an alternative configuration occurs in both states (Fig. 4c,f). Moreover, within the unit cell of a single crystal of GDP-bound Rab5a, both the charge-dipole pocket and an alternative configuration occur 26 (Fig. 4a,c). In addition, for GDP-bound Rab11a and Rab23 relatively minor deviations from the charge-dipole pocket configuration are associated with shortening (Fig. 4g,e) or lengthening (Fig. S2a,b in Supplementary Data)(respectively) of the outward-directed switch II helix—suggesting that this helix can extend and retract. Together, these and the previous observations suggest that the role of the charge-dipole pocket is to promote formation of the outward-directed helix, which, in turn, is associated with restructuring of the switch II N-terminal region and, as a result, with repositioning of co-conserved residues implicated in GTP hydrolysis 7 and nucleotide exchange 8; 9; 10 (see Fig. S5 in Supplementary Data). These residues correspond to the pattern [AG]-x-Q-[DE] in Fig. 2.
The role of the charge-dipole pocket may be explored experimentally by determining: (i) whether preventing formation of the charge-dipole pocket abolishes or significantly inhibits GTP hydrolysis or nucleotide exchange and (ii) whether the outward facing switch II helix can form in the absence of the charge-dipole pocket. In particular, I propose two experiments involving unnatural amino acid mutagenesis 27 of human Rab5a and of human Rab11a in conjunction with NMR spectroscopic 28 monitoring of 13C and 15N labeled backbone conformations. Such analyses are facilitated by the numerous Ras-like GTPase structures that have been previously determined through both NMR spectroscopy and X-ray crystallography.
For both of these experiments, the first tyrosine residue of the [YF]-[YF] motif (Fig. 2) (Tyr89 and Tyr81 within Rab5a and Rab11a, respectively) would be mutated to o-nitrobenzyl-tyrosine (oNBTyr) using an in vivo system for unnatural amino acid mutagenesis (either in the yeast Saccharomyces cerevisiae 29 or in mammalian cells 30). Within Rab structures exhibiting the charge-dipole pocket configuration, the side chain OH group of this tyrosine forms a hydrogen bond with the backbone of the β-strand directly preceding the P-loop (Figs 3a and 4a,d,e,g). However, within Rab structures exhibiting the typical alternative swII-CT configuration (Figs 4c,f), the tyrosine side chain is flipped in the other direction and, as a result, the OH group is solvent exposed. Thus, mutation of this tyrosine to oNBTyr in Rab11a or in Rab5a is predicted to lock the swII-CT residues into this alternative configuration because the nitrobenzyl group—which is covalently bound to the tyrosine side chain oxygen within oNBTyr —prohibits the typical tyrosine hydrogen bond from forming and, in any case, is too bulky to accommodate the charge-dipole pocket configuration. Formation of the alternative SwII-CT configuration in this way can be confirmed through NMR spectroscopy. A critical feature of the oNBTyr mutant residue is that it can be rapidly converted back to the native tyrosine through UV-irradiation, which cleaves the nitrobenzyl-photocage from oNBTyr 31. The feasibility of this approach is supported by recent studies of oNBTyr mutants within Escherichia coli 32.
The first experiment utilizes the Rab5a oNBTyr89 mutant. Since available crystal structures of both the GTP-bound and the GDP-bound states of Rab5a can exhibit the alternative swII-CT configuration (Figs 4c,f), this mutant can be used to test whether formation of the charge-dipole pocket is important for transitioning between states—as measured by monitoring GTP hydrolysis and nucleotide exchange. At the same time, NMR spectroscopy can be used to monitor whether formation of the outward-directed switch II helix can occur when the swII-CT residues are prohibited from forming the charge-dipole pocket. As a control, the oNBTyr81 mutation can be converted back to the native tyrosine via UV-irradiation; this will ensure that any unusual biochemical properties of the protein (including an inability to form the outward-directed switch II helix) is due solely to this mutation. This experiment serves as a stringent test of the proposed hypothesis: If transitions between states are not significantly inhibited in the mutant relative to the native protein, then this would rule out a functionally-significant restructuring role for the charge-dipole pocket in GTP hydrolysis and nucleotide exchange
The second experiment utilizes the Rab11a oNBTyr81 mutation, and was inspired by the observation that all five currently available structures of Rab11a exhibit the charge-dipole pocket configuration and the outward-directed switch II helix (pdb identifiers and states: 1oiv, bound to GDP; 1oiw, bound to GTPγS; 1oix, bound to GDP + Pi; 1yzk, bound to GppNHp; and 2f9l bound to GDP). In this experiment, NMR spectroscopy would be used to monitor the Rab11a oNBTyr81 mutant backbone configuration. If the outward-directed switch II helix forms in the (mutation-induced) alternative swII-CT configuration, then this would falsify the proposed hypothesis that formation of the charge-dipole pocket is required to form this unusual helix. Again, as a control, the UV-irradiation can be used to convert the mutant back to the native protein.
The typical alternative swII-CT configuration that is observed for Rab GTPases (Fig. 4c,f and Fig. S3a in Supplementary Data) is also observed for Arf (Fig. S3c in Supplementary Data), Gα (Fig. 4i), and other Ras-like GTPases outside of the Rab family. However, for non-Rab family GTPases bound only to GDP or to GTP, the charge-dipole pocket configuration thus far is observed (albeit with suboptimal geometry) only for Ran (Fig. 4b). This is despite the fact that most Arf, Arl and Sar GTPases match the swII-CT patterns exactly. One explanation is that, for these GTPases, some of the swII-CT residues have assumed other functions that could influence formation of the charge-dipole pocket. For example, Arf, Arl and Sar possess an ‘inter-switch toggle’ device 33 that is absent from other Ras-like GTPases and that, in the GDP-bound state, shifts the swII-CT tryptophan residue away from its normal position. Of course, such family-specific mechanisms are not inconsistent with formation of the charge-dipole pocket in certain conformational states that have not yet been crystallized.
To explore this possibility, Fig. 5a shows a hypothetical charge-dipole pocket configuration for Arl8 based on molecular modeling and, for comparison, Fig. 5b shows an empirically-based crystal structure of Arl8 bound to GDP. In the Arl8-GDP structure and in other crystal structures of Arf and Arl GTPases bound to GDP the residue corresponding to Trp77 of Arl8 interacts with the (inter-switch toggle-shifted) swII-CT tryptophan (Trp65 of Arl8) 11. By contrast, in available structures of Arf, Arl and Gα GTPases bound to GTP the residues corresponding to Trp77 and Arg74 in Arl8 typically interact with the N-terminal switch II region involved in sensing of the γ-phosphate of GTP11 (not shown). Thus, it is noteworthy that, for the hypothetical Arl8 charge-dipole pocket configuration in Fig. 5a, Arg74 and Trp77 are directed away from the switch II N-terminal region due to formation of the outward-facing switch II helix, which—by restructuring the switch II region—could disrupt these interactions.
Support for the notion that formation of the charge-dipole pocket may, in some cases, require an additional, interacting component is provided by crystal structure analysis of Gαi (Fig. 4h), which like Arf and Arl GTPases, conserves switch II residues corresponding to Arg74 and Trp77 of Arl8 11. The hetertrimeric G protein α subunit, Gαi, when bound to both GDP and the GDP-selective peptide KB752 34 forms a (suboptimal) charge-dipole pocket with a switch II helix that traces out a path close to that of the Rab27 homodimer (Fig. 4h). This occurs despite the fact that within Gαi the first aromatic residue of the [YF]-[YF] pattern is replaced by a cysteine. Notably, the crystallographic data for the Gαi-GDP-KB752 complex 34 is consistent with this cysteine and the adjacent aromatic residue establishing an SH-π interaction 35; 36 that is chemically analogous to the typical face-to-edge CH-π interaction between the adjacent aromatic residues (Fig. 3a inset).
Why then might this Gαi-GDP-KB752 complex exhibit charge-dipole-pocket-like features? The KB752 peptide stimulates low-level exchange of GTP for GDP in Gαi by mimicking the role of the heterotrimeric G protein βγ complex in receptor-promoted nucleotide exchange 34; 37. Thus the Gαi-GDP-KB752 configuration appears to represent a transitional form required for Gαi activation. If so, then perhaps the cysteine substitution ensures tight regulation of Gαi activation by requiring the participation of Gβγ for charge-dipole-pocket-mediated switch II restructuring. Of course, this possibility, the formation of the charge-dipole pocket in Ras-like GTPases for which it has not yet been observed, and the extent to which it is generally associated with switch II restructuring remains to be determined.
This study has yielded the following observations: (i) The most distinguishing feature of Ras-like GTPases at the sequence level are five conserved residue positions (at three distinct sequence locations) that are noted for the first time here—along with five other conserved residue positions whose likely biological roles in Ras-like GTPases were noted previously. (ii) At the structural level, these newly identified residues come together to form either a ‘charge-dipole pocket’ configuration or a variety of alternative configurations. (iii) The specific atomic interactions forming the charge-dipole pocket configuration require the specific amino acids conserved at these positions. (iv) The charge-dipole pocket configuration occurs in both the GDP- and the GTP-bound states and both the charge-dipole pocket and alternative configurations can occur in the same state. (v) The formation of the charge-dipole pocket is highly correlated with formation within the switch II region of an outward-oriented α-helix. And (vi) the formation of this helix is associated with structural changes in the switch II N-terminal region, which senses the γ phosphate of GTP and which harbors three previously-noted Ras-like residues involved in GTP hydrolysis and nucleotide exchange. Taken together, these observations suggest that the charge-dipole pocket may play a critical role in the switching mechanism by facilitating restructuring of the switch II N-terminal region.
Fig. 1a outlines a strategy for iteratively enlarging and improving a multiple sequence alignment while at the same time categorizing the sequences based on evolutionarily-divergent patterns. This strategy relies on four procedures: (i) a Bayesian partitioning with pattern selection (BPPS) procedure 6; (ii) a routine to aid recursive application of the BPPS procedure; (iii) a search procedure that uses “Multiply-Aligned Profiles for Global Alignment of Protein Sequences” (MAPGAPS); and (iv) an automated procedure for examining the structural locations of the patterns identified by the BPPS procedure. The first and last of these procedures are implemented in the CHAIN program 1, which is available at www.chain.umaryland.edu; programs implementing other procedures, which are first described here, are available at http://mapgaps.igs.umaryland.edu. This strategy iteratively enlarges and improves the input alignment insofar as additional sequences are detected and aligned in Step 4 of each iteration and new subfamilies are identified and incorporated into the multiple profile alignment in Steps 1-3 of the next iteration (see Fig. 1a). In principle, an alignment can be progressively improved in this way until reaching the inherent limitations associated with the currently available data.
Within each subfamily, sequences typically are so closely-related that they can be accurately aligned using standard alignment procedures, such as MUSCLE 38. Across families, however, sequences can be impossible to align correctly using standard procedures. Consider, for example, a conserved arginine that forms a salt bridge with an acidic residue within Ras, Rho, Rab and Ran GTPases 6: In Rho GTPases (relative to these other GTPases) a two-residue deletion directly precedes this arginine and a 15-residue insertion directly follows it, so that conventional alignment methods fail to align this arginine correctly. However, once the subfamily profiles are accurately aligned, the MAPGAPS program can easily align such residues (as described below). For this study, creation of a highly accurate multiple profile alignment was facilitated by over 400 GTPase structures in the pdb database and, as a result, the MAPGAPS program was able to obtain a very accurate alignment compared to automated methods. For example, the MAPGAPS program correctly aligns five weakly conserved sequence motifs within 32 distantly related GTPases of known structure, whereas MUSCLE misaligns 77 of these regions (unpublished).
A multiple profile alignment consists of a set of consensus sequences, one for each profile, each of which is aligned against a master consensus sequence representing the profile alignment itself. For the analysis here, accurate alignment of these consensus sequences was based on the locations of evolutionarily-conserved and evolutionarily-divergent patterns in over 400 P-loop GTPase structures (using the automated structural analysis procedure mentioned above) and on alignment of subtle, yet statistically significant similarities within large numbers of GTPase sequences using publically-available Bayesian sequence analysis methods 39; 40; 41; 42; 43.
The iterative strategy was applied to sequences in the NCBI nr, env_nr and translated EST databases 44. After convergence, the P-loop GTPase alignment contained 278,101 sequences (193,785 unique sequences). After removing closely-related and fragmented sequences, the resulting alignment contained 91,406 P-loop GTPase sequences. The BPPS procedure was applied to this final alignment to obtain the ‘contrast alignment’ shown in Fig. 2.
The BPPS procedure starts by arbitrarily partitioning a multiple sequence alignment (given as input) into two sets (termed the foreground and the background) and by arbitrarily choosing a sequence pattern associated with the foreground set. For example, it might start by randomly selecting half of the sequences for the foreground set and the other half for the background set and by randomly selecting the residue patterns [YF], [ILV] and [ST] at column positions 20, 37 and 68, respectively. Next a Markov chain Monte Carlo strategy 45, called Gibbs sampling, iteratively reassigns sequences to one of the partitioned sets and adds or removes pattern residues at alignment positions 6. Such changes are sampled with probability proportional to the degree to which they improve the underlying statistical model, which measures how well the pattern distinguishes the foreground from the background sequences. This process is continued until convergence on a (local or global) optimum or nearly-optimum partition-pattern pair.
The BPPS procedure is specifically designed to find local optima because otherwise it would miss biologically important divergent patterns. Take Ran GTPases, for example. Ran conserves certain residues unique to the Ran family, other residues unique to the Ran, Rab, Rho and Ras families, and others shared by all Ras-like GTPases. Based on the posterior probability distribution two of these three categories of conserved residues will show up as local optima whereas a third category will show up as a global optimum. Thus it is often desirable to find local optima as well as the global optimum. Simulated annealing 46 is applied to drop into a (typically local) optimum, which corresponds to two groups of proteins that have strikingly diverged from each other at pattern positions. For mathematical details see 6. For input alignments composed of multiple subgroups, the BPPS procedure can identify these multiple subcategories by using an option where multiple runs are initialized with distinct seed patterns, as previously described 1; 6.
The BPPS procedure returns as output a contrast alignment, such as that shown in Fig. 2. Of course, inherent uncertainties are associated with such a contrast alignment due to: (i) the specific sequences included in the input alignment; (ii) regions of alignment uncertainty; (iii) statistical uncertainty regarding the pattern-partition pair; and (iv) the choice of parameters specifying the degree of pattern conservation and the expected number of pattern positions. Given these uncertainties, it is impossible to guarantee that the BPPS procedure will find the biologically most meaningful pattern-partition pair. Nevertheless, the robustness of a pattern-partition pair can be established by running the BPPS procedure multiple times using distinct random seeds, various parameter settings, and various input sequence sets. Fig. S6 of Supplemental Data illustrates how such inherent variability can be explored by generating three contrast alignments, each of which utilizes a different random seed and a different (50%) subset of the aligned sequences used for the analysis in Fig. 2: Although there is some variability, in each case the predominant residue pattern obtained is very similar and does not alter the main conclusions of the analysis. The issue of alignment uncertainty is addressed in the next section.
The MAPGAPS procedure takes as input a multiple alignment of hidden Markov Model (HMM) profiles 47; 48, searches a protein sequence database for significant hits to at least one of these profiles, and outputs a global alignment of the matching sequences. The HMM profiles are generated from multiple alignments obtained in step 1 of Fig. 1a. The MAPGAPS procedure uses PSI-BLAST heuristics and statistics 49 to speed up the search and to ensure appropriate sensitivity and selectivity. The search time depends on the number of profiles, their lengths, and the size of the protein database being search, but typically runs in a couple hours on half a dozen grid nodes when searching the NCBI nr database against about 150 profiles—roughly the number used for the GTPase analysis performed here. If desired, a MAPGAPS program option can be used to eliminate certain protein families from the output alignment when the goal is to analyze functionally divergent residues that distinguish one specific subgroup from another specific subgroup within a larger protein class.
After a search, the MAPGAPS program globally aligns all database sequences with a significant match to at least one profile using both the pair-wise alignment of each database sequence against its highest scoring profile and that profile's alignment against the other profiles—as specified by the multiple profile alignment. More specifically, if position x in a matching sequence aligns with position y in profile Y and if position y in profile Y aligns with position z in the multiple profile alignment, then position x in the sequence is aligned with position z in the global alignment. In this way, as long as the multiple profile alignment is correct, evolutionarily and functionally related residues that would otherwise be impossible to align using standard procedures, can be aligned correctly and automatically. The MAPGAPS program and its application will be described in detail elsewhere.
The Reduce program 50 was used to predict hydrogen atom positions within structural coordinate files; this step is required for automated detection of hydrogen bonds by the CHAIN program1. Molecular images were created using Rasmol 51. To eliminate extraneous influences on GTPase conformations, all of the structures shown in the figures, except for the Gαi structure in (Fig. 4h), are bound either only to GDP or only to GTP or a GTP analog. The structural alignments between monomeric GTPases and the Rab27 homodimer in Fig. 4 and in Supplementary Data were obtained by minimizing the least square distances between the monomeric and homodimeric α-carbons of residues corresponding to the swII-CT patterns [RK]-x-[ILV] and [WF]. Molecular modeling of the charge-dipole pocket configuration for Arl8 GTPase was performed by applying the GROMACS molecular dynamics program 52 (for energy minimization with hydrogen bond constraints) and the KiNG program (http://kinemage.biochem.duke.edu/software/king.php) (for obtaining favorable side-chain and backbone conformations and for resolving steric clashes) starting with the atomic coordinates for 1YZG (structural genomics consortium).
I thank L. Aravind, Leo Chavas, Tony Hunter, Kentaro Ihara, Vishvanath Nene, Ji-Joon Song and David J. Weber for comments and suggestions. This work was funded by the NIH Division of General Medicine Grant GM078541.
1Determining the statistical significance of the association between the charge-dipole pocket configuration and the outward facing switch II helix is complicated by the fact that various structural features associated with the swII-CT residues were examined during the analysis. This requires that the statistical significance be adjusted for multiple hypotheses—though it's not clear by how much. As a rough approximation, the following ‘back-of-the-envelop’ calculation was performed: Assume (conservatively) that about a million hypotheses were tested implicitly while examining about 120 distinct Ras-like GTPase structures: At least two dozen distinct structures exhibit the charge-dipole pocket and the outward-oriented switch II helix (or very nearly so); at least ninety distinct structures exhibit an alternative swII-CT configuration without the outward-oriented helix; and no more than a half dozen of these structures exhibit features intermediate between these two configurations. These intermediate forms can be categorized (conservatively) by viewing them as three charge-dipole pocket configurations without an outward-oriented helix and three alternative configurations with an outward-oriented helix. Based on these observations, Fisher's exact test rejects the hypothesis that there is no association between the charge-dipole pocket and the outward-oriented switch II helix with a p-value of ~10-12 (after adjusting for a million hypotheses using a Bonferroni correction).
Supplemental Data: Supplementary Data associated with this article can be found, in the online version, at doi: xxx.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.