G-protein coupled receptors (GPCRs) are an important class of proteins initiating major biochemical pathways sensing environmental stimuli. They are the largest protein superfamily with an estimated 1000 genes in the human genome alone
[
2]. An estimated 30% of known drug compounds target these receptors
[
3]. Around 500 of GPCRs are odorant or taste receptors and the remaining bind endogenous ligands. The GPCR family is divided into five distinct classes, class A – E
[
4]. The class A family is the largest class and includes rhodopsin, the prototypical GPCR, for which the first crystal structure of any GPCR was solved
[
5]. Its ligand is 11-
cis retinal (RT), covalently attached to the protein.
11-cis RT isomerizes to
alltrans RT upon light incidence, resulting in activation of the receptor. As of 2011, several additional GPCR structures have been deposited in the PDB increasing the total number of structures to 43 representing seven distinct GPCRs (Table

). All GPCR structures are characterized by a transmembrane (TM) region consisting of seven helices, the G-protein interacting intracellular (IC) domain and an extracellular (EC) domain.
In GPCRs, the binding of a ligand in the EC or TM domain is the signal that is propagated to the IC domain wherein different effectors bind, in particular the G protein heterotrimer, GPCR receptor kinases (GRK) and β-arrestin. Thus, receptor activation is an inherently allosteric process where the ligand binding signal is communicated to a distant site. The activation of rhodopsin and other class A GPCRs is thought to be conserved and involves rearrangements in structural microdomains
[
6]. Conformational changes of multiple ‘switches’ in tandem activate the receptor
[
7]. These long-range interactions between distant residues are important for the function of the receptors and are also closely involved in their folding and structural stability
[
8,
9]. Identifying the residues involved in the propagation of signals within the protein is important to understand the mechanism of activation. While much information can be directly extracted from crystal structures, allosteric interactions are dynamic and implicit in nature and thus are not directly observable in static crystal structures. Experimental methods for investigating dynamics, such as nuclear magnetic resonance, are presently incapable of resolving allosteric interactions in large membrane proteins, such as GPCRs.
Due to the limitations of experimental methods, statistical analysis of GPCR sequences is an alternative in identifying residues that may be involved in allosteric communication. Here, considerable effort has been directed towards identifying networks of co-evolving residues from multiple sequence alignments (MSA), i.e. residues that are statistically correlated in the MSA. Such correlations are thought to be necessary for function, and may provide insights into how signals are propagated between different domains. A number of computational methods have been developed to identify such couplings from MSAs, including Hidden Markov Models (HMMs)
[
10], Statistical Coupling Analysis (SCA)
[
11,
12], Explicit Likelihood of Subset Co-variation (ELSC)
[
13], Graphical Models for Residue Coupling (GMRC)
[
14], and Generative REgularized ModeLs of proteINs (GREMLIN)
[
1]. Like the GMRC method, GREMLIN learns an undirected probabilistic graphical model known as a Markov Random Field (MRF). Unlike HMMs, which are also graphical models, MRFs are well suited to modelling long-range couplings (i.e., between non-sequential residues). The SCA and ELSC methods return a set of residue couplings (which may include long-range couplings), but unlike MRFs, they do not distinguish between
direct (conditionally dependent) and
indirect (conditionally independent) correlations. This distinction is crucial in determining whether an observed correlation between two residues can be explained in terms of a network of correlations involving other residues. The key difference between the GMRC and GREMLIN methods is that GREMLIN is statistically consistent and guaranteed to learn an optimal MRF, whereas the GMRC uses heuristics to learn the MRF. We have previously reported detailed comparisons of the GMRC and GREMLIN methods
[
1] and found that GREMLIN achieved higher accuracy and superior scalability.
Multiple sequence alignments of class A GPCRs have previously been examined by the SCA
[
12] and GMRC
[
14] methods. In the SCA study, the authors focused on the critical residue at position 296 corresponding to a lysine (K296
7.43), which is the covalent attachment site for RT in bovine rhodopsin
[
6,
15]. Several networks of residues were proposed to mediate the signal flow from the ligand binding pocket to the G protein coupling site. This focus overlooked the important contribution of the EC domain to GPCR structure and dynamics
[
8]. In contrast to SCA, there were no statistically coupled residues involving K296
7.43 in the GMRC study, rendering a comparison of SCA and GMRC results impossible. Only 5 edges in GMRC were considered statistically significant, limiting the interpretability of the results. At the time of the above studies, the rhodopsin crystal structure was the only GPCR structure available. The now larger number of structures published (Table

) provides us with an opportunity to investigate the generality of the roles of individual residues for allostery in different GPCRs. Furthermore, we re-examine the communication across the entire membrane, not only from a single RT residue to the IC side, but considering all possible communication points.
Because of the demonstrated advantages of GREMLIN over other methods
[
1], we applied GREMLIN to the same GPCR sequence alignment previously investigated by SCA and GMRC studies for comparability
[
12,
14]. Using GREMLIN we identified statistically significant long-range couplings in class A GPCRs and analyzed the results with respect to all seven GPCRs that had been crystallized at the time of our study. Our findings indicate that the ligand binding residues are significantly enriched in these long-range couplings, mediating not only communication to the IC, but also to the EC side of the membrane. 9 out of the 10 residues with the largest number of long-range couplings belong to the ligand binding domain. There a total of 34 statistically significant long-range couplings involving these 10 residues, involving experimentally determined microdomains and activation switches in GPCRs. Our study describes a comprehensive view of the network of statistical couplings across the membrane in class A GPCRs. The details of this network are consistent with the hypothesis that the ligand-binding pocket mediates allosteric communication. The independent identification of a crucial role of the ligand binding pocket in mediating this communication provides the first sequence-based support for the early notion that all three domains in GPCRs are structurally coupled
[
16]. Finally, the extent of enrichment of edges in different GPCR structures allowed us to propose a novel minimal binding pocket predicted to represent the common core of ligand contact residues crucial for activation of all class A GPCRs.