Most proteins are only barely stable, which impedes research, complicates therapeutic applications, and makes proteins susceptible to pathologically destabilizing mutations. Our ability to predict the thermodynamic consequences of even single point mutations is still surprisingly limited, and established methods of measuring stability are slow. Recent advances are bringing protein stability studies into the high-throughput realm. Some methods are based on inferential read-outs such as activity, proteolytic resistance or split-protein fragment reassembly. Other methods use miniaturization of direct measurements, such as intrinsic fluorescence, H/D exchange, cysteine reactivity, aggregation and hydrophobic dye binding (DSF). Protein engineering based on statistical analysis (consensus and correlated occurrences of amino acids) is promising, but much work remains to understand and implement these methods.
Site-directed mutagenesis, still the core technology of protein engineering, will turn 30 next year. The last three decades have seen well in excess of 100,000 mutations made (many more if we count combinatorial approaches) to probe and alter the structure, activity, folding and stability of a vast array of proteins with different folds and functions. A huge number of stability measurements have been amassed, in addition to a massive body of hypothesis-driven experiments designed to tease out the basis of protein stability. But predicting the stability of protein mutants remains one of the great unsolved problems of protein science, proving more difficult than even the prediction of protein structure or the design of fairly efficient enzymes.
This difficulty is in spite of our actually knowing a great deal about the forces that dominate in protein folding [1,2] and perhaps even more about the atomic-resolution structures of folded proteins. So what’s the problem? One problem is that despite large forces being at work in the structure of the folded state, such as the enthalpies associated with all the hydrogen bonds that form, the net stabilities of proteins are small—5–15 kcal mol−1. This is because the forces acting on the unfolded state, such as all the hydrogen bond donors and acceptors that are satisfied by solvent, are also large. This marginal differential means that exquisite accuracy is required from fairly crude potential functions, and the problem is exacerbated by our inability to meaningfully model the unfolded state. Furthermore, it is difficult or impossible to model key aspects of protein folding, such as backbone motion or solvent entropy.
Even empirical approaches that attempt to extrapolate from training sets of thermodynamic data do not capture sufficient information to solve the problem, but it is less clear if the reasons for this are fundamental. On one hand, the standard methods of characterization—calorimetry or spectroscopically observed chemical or thermal denaturation—are slow and laborious. On the other hand, even “large” databases are easily dwarfed by the size of sequence space, and it is certainly clear that the effect of a mutation is only meaningful in context. Mutating alanine to serine is a vastly different thing in different scaffolds, in different secondary structures, with different packing densities or solvent exposures, or with different amino acids nearby. So while insight may not follow from numbers alone, there is a degree to which having large numbers of well-characterized and highly-related mutants will shed light on the problem of protein stability. And even if it doesn’t, the technology to enable those measurements will also enable brute-force approaches for engineering stability.
In recent years, the problem of protein stability has intersected with problems of large numbers in two interesting ways, each of which is proving useful for engineering proteins for improved stability and for elucidating the underlying reasons. The first is the development of fairly general high-throughput methods for measuring protein stability. The second is the use of statistics from the very large number of sequences that have resulted from 15 years of genome sequencing to predict stabilizing mutations. Here we will highlight some of the most important recent advances in these two areas.
High-throughput approaches for measuring or improving protein stability generally fall into two categories: either they attempt to infer the stability from properties that are typically measured close to physiological conditions, or they perturb the conditions of the protein in some way and read out the stability (more or less) directly. For example, protein expression level, solubility, secretion, binding and enzymatic activity, and resistance to proteolysis may all be taken as indications of a stable protein. In general, the Achilles’ heel of these approaches is a lack of broad applicability (for example, many interesting proteins do not have an enzymatic function) and “you-get-what-you-select-for” kinds of escape variants (for example, unstable but protease-resistant mutants). On the other hand, the problem with measuring the stability directly is the difficulty of miniaturization; circular dichroism and differential scanning calorimetry are not well suited to 96-well plates. But some creative ideas have been applied recently with both types of approaches, which we highlight here. Several other recent reviews highlight other aspects of combinatorial approaches to protein biophysical properties [3–6].
A straightforward approach for selecting for thermostable proteins is to monitor protein activity at elevated temperatures or after heating. For example, thermal inactivation was used to engineer an esterase with good tolerance of high temperatures but robust room-temperature activity, a feat that was not universally thought to be possible until it was demonstrated directly. This approach is limited to proteins with an activity that can be assayed easily, and there is not a clear correlation between the degree of thermal inactivation and the stability since it is complicated by aggregation and folding rates. But this is a very practical approach to screening for proteins with improved stability and activity under various perturbing conditions. Screens for achieving a stability threshold based on binding or catalytic activity have formed the basis for several notable combinatorial experiments in protein design using λ suppressor, barnase, chorismate mutase, and Rop, to name a few.
Resistance to proteolysis has been used broadly to identify structured variants, particularly on phage particles. It has been difficult to rely on nonspecific proteolysis as a read-out of stability, because proteolysis rates are related not just to global stability but also to local stability and substrate specificity. Recently, Bardwell and coworkers developed a system in which a protein of interest (POI) is inserted into a loop of TEM-1 β-lactamase, where it was hypothesized that lower-stability mutants of the POI would generally lead to greater degradation by cellular proteases [8*]. For several proteins, the log of the minimum inhibitory concentration (MIC) of antibiotic showed a striking correlation to the stability of mutants (R² > 0.6). The relationship was especially good for Immunity protein 7, where it was clear that expression level was correlated to stability. Often, expression level differences from varying rates of transcription and translation, solubility differences, or display differences (on phage or yeast) can be confounding factors to these types of inferential screens. But the authors convincingly showed that the system could be used to select for Im7 variants with improvements in both thermodynamic and kinetic stability. The selection is also tunable and demands that selected variants be soluble and expressible.
Marqusee and coworkers have recently employed pulse proteolysis in increasing concentrations of urea using thermolysin, which retains its activity in high urea, to measure folding ΔG values. The method is read out by SDS-PAGE, but it can be applied to unpurified protein in crude lysate with sufficient overexpression or specific detection, making it suitable for fairly high-throughput quantitative determinations of stability. By adjusting the pulse time and using chemical denaturant, one can directly measure the fraction folded and avoid confounding differences in protease susceptibility under native conditions. This observation led Park et al. to challenge the entire E. coli proteome with a protease under native conditions to specifically identify resistant proteins [10*]. Maltose binding protein was a notable survivor of thermolysin treatment, and it achieves its resistance through kinetic stability to unfolding. While it may be a challenge to apply to libraries of mutants, this screening principle may be useful to shed light on determinants of kinetic stability.
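The pulse-proteolysis read-out lends itself to a standard two-state analysis: the fraction folded measured at each urea concentration follows the linear extrapolation model, ΔG([urea]) = ΔG(H₂O) − m·[urea]. The sketch below, using entirely synthetic data and a simple grid search (not any published analysis code), shows how ΔG(H₂O) and the m-value could be recovered from such a titration:

```python
import numpy as np

R, T = 0.001987, 298.0  # gas constant in kcal/(mol·K); 25 °C

def frac_folded(urea, dG_h2o, m):
    """Two-state fraction folded: dG(urea) = dG_h2o - m*[urea]."""
    dG = dG_h2o - m * urea
    return 1.0 / (1.0 + np.exp(-dG / (R * T)))

def fit_lem(urea, f_obs):
    """Least-squares fit of the linear extrapolation model by grid search
    (crude but dependency-free; a real analysis would use a nonlinear fitter)."""
    best = (None, None, np.inf)
    for dG in np.linspace(0.5, 10.0, 191):      # kcal/mol
        for m in np.linspace(0.2, 3.0, 141):    # kcal/(mol·M)
            sse = np.sum((frac_folded(urea, dG, m) - f_obs) ** 2)
            if sse < best[2]:
                best = (dG, m, sse)
    return best[0], best[1]

# Synthetic "band intensity" data for a protein with dG = 5 kcal/mol, m = 1.2
urea = np.linspace(0.0, 8.0, 17)
f_obs = frac_folded(urea, 5.0, 1.2)
dG_fit, m_fit = fit_lem(urea, f_obs)
```

With noiseless data the fit recovers the input parameters; with real gel densitometry, error estimates and a proper minimizer would of course be needed.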
Split-protein reassembly, also called protein-fragment complementation, has proven useful for identifying protein interactions in living cells, wherein reassembly of fragments of DHFR, GFP [12,13], luciferase or other proteins is driven by the interaction between POIs fused to the fragments. Split fluorescent proteins reassemble irreversibly, making them useful for detecting weak interactions but generally unsuitable for measuring binding constants. In contrast, split luciferase reassembles reversibly, which has been exploited to look at interaction dynamics in cells but so far not to look at stability directly. Koide and coworkers recently combined yeast surface display with protein reassembly, and they demonstrated that FACS-detected reassembly could be used to measure stabilities and enrich for mutants with a defined range of stabilities [17*]. A human fibronectin type III domain (FN3) was split, with one fragment displayed on the cell surface and the other secreted into the medium. The fragments were fused to two epitopes for fluorescently-labeled antibodies, such that FACS could resolve the display and reassembly levels on each cell. The log of this ratio correlated well with the change in binding energy (R² = 0.8) for a series of mutants that form a β-bulge in FN3. While not every protein will reassemble, and the binding energies may not perfectly match the stabilities of the full-length proteins, the ability to rapidly determine and sort for absolute protein stabilities is especially useful.
Despite the complicating irreversibility of split GFP reassembly, Linse and colleagues demonstrated that the split fragments of the B1 domain of protein G (GB1) could drive the reassembly of the known interaction-detection fragments of GFP, and that fluorescence was related to the thermal stability of the corresponding GB1 full-length protein with the same mutations. This result is somewhat surprising, considering that in general cellular fluorescence from split GFP reassembly does not quantitatively correspond to the binding affinity. This screen is also limited to proteins that can be dissected to reassemble, and while it lacks inherent controls for expression level differences, it is simpler in its implementation than yeast display complementation screening.
Waldo and colleagues introduced a screen for soluble proteins based on fusion of a “folding reporter” GFP to the C-terminus of a POI. The GFP only folds and becomes fluorescent if the fused POI folds and is soluble. One substantial improvement to the screen was the dissection of a “superfolder” GFP into a tagging fragment of 15 amino acids from the C-terminus and a 215 aa “detector” fragment [20**]. These GFP fragments spontaneously reassemble, but only if the peptide is fused to a folded, soluble POI, and the tag influences the solubility of the fusion less than the original folding reporter. Procedures for HT implementation have been described recently. Solubility and stability are not generally directly related, but solubility is another key biophysical property that cannot be predicted and requires HT screening methods.
Of all the traditional methods of measuring protein stability, thermal and chemical denaturation monitored by intrinsic fluorescence of aromatic amino acids is the most straightforward to miniaturize. Stites and colleagues developed an early home-built autotitrator for semi-automated denaturation measurements. Edgell, Pielak and colleagues carried out pioneering work in this area using autotitration methods and robotics with a standard fluorimeter. Dalby and colleagues extended the method considerably by adapting it to microtiter plates with autotitration, which dramatically increased the throughput [24*]. Mayo and coworkers recently coupled this system with computational design of protein libraries, enabling exhaustive characterization of computational predictions in a reasonable experimental timeframe [25**]. Dalby and coworkers have gone on to further miniaturize their method to nanoliter scale using microfluidics, which enables screening with very small amounts of proteins (perhaps only 10⁸ molecules). This approaches a scale where concomitant miniaturization of protein production is a challenge for libraries, but it has great promise immediately for protein-ligand interactions. Of course, these methods rely on the presence of an intrinsic fluorophore, which not all proteins possess, and they require a fair amount of specialized equipment even in their simplest implementation. However, they are likely to produce measurements that directly compare to those taken by standard methods.
Hydrogen-deuterium exchange has been used extensively to measure the stability, dynamics and folding of proteins by NMR and mass spectrometry. Oas and Fitzgerald developed an HT screen called stability of unpurified proteins from rates of H/D exchange, or SUPREX. Cell lysate from 200 µL of cell culture is exposed to a pulse of D2O in varying concentrations of chemical denaturant, and the sample is dried with MALDI matrix for rapid acquisition. The method is complicated by aggregation or low expression. Also, EX2 conditions (wherein folding is faster than the intrinsic exchange rates of the protons) are required to extract thermodynamic parameters. But Oas has successfully used this method to measure protein stabilities in living cells. (Gierasch and colleagues have made similar measurements recently using biarsenical dyes as the readout instead of MALDI mass spectrometry.) Fitzgerald and colleagues recently described a variant of SUPREX based on oxidation rates (SPROX) which addresses some of the complications of H/D exchange for these sorts of experiments, such as resolving power, ion suppression, chromatographic separation and reversibility of modification.
Other reactions can also be used to monitor protein stability. For example, Harbury and colleagues developed a method called misincorporation proton-alkyl exchange (MPAX), which uses weak missense suppressors to make random, residue-specific Cys mutations throughout a protein of interest. The burial of these Cys residues is interrogated by alkylation, which can be read out through mass spectrometry or chemical scission and PAGE. The method is especially useful for measuring the stabilities of proteins that do not refold reversibly, since the measurement is made under native conditions. Hellinga, Oas and colleagues have developed a related HT method called quantitative cysteine reactivity (QCR), which uses gel-shift as a readout, as well as a fast (fQCR) variant with a fluorescence readout [32,33*]. As with H/D exchange, thermodynamic parameters can only be extracted in the EX2 regime. Also, an appropriate buried Cys residue (ideally only one) is required, or the protein must be engineered, at some peril of altering its thermodynamics. This method was demonstrated at picomole (nanogram) scale using HT gene fabrication and cell-free transcription-translation, which is a very exciting frontier in HT stability measurements.
A somewhat different measurement that is applicable to proteins that unfold irreversibly and aggregate—which represent a large fraction of interesting proteins—has been called differential static light scattering (DSLS). Senisterra et al. reported the use of a home-built instrument (which is now commercially available) that is capable of light scattering measurements of protein aggregation in 384-well format [34**]. It is worth noting that 600 nm absorbance in a standard plate reader is also a reasonable way to measure aggregation. The chief advantage of this method is its simplicity, as no intrinsic or extrinsic probes are required. Besides the limitation to aggregating proteins, this non-equilibrium method could be confounded by dramatic changes in the kinetics of unfolding or aggregation for different mutants. But these effects appear to be small in proof-of-principle experiments.
A variation on this theme is isothermal denaturation (ITD), in which the rate of irreversible denaturation is observed, typically at a temperature just below that of melting. In principle, this denaturation can be observed by loss of a signal such as CD or shift of a signal such as UV absorbance or fluorescence. For proteins that aggregate, light scattering is also possible. ITD measurements are highly reproducible and have been reported to be more sensitive to small changes in stability, which is especially useful for ligand binding studies. Senisterra et al. adapted the method to HT using their 384-well scattering apparatus, with the additional advantage that ITD required less protein than comparable methods. ITD measurements do require a priori knowledge of the protein’s approximate melting temperature, which could be problematic for protein libraries, and presumably could be very sensitive to changes in kinetics that may not be directly linked to equilibrium stability.
Schaeffer and colleagues have introduced an in vitro hybrid of the GFP fusion method for solubility and ITD, which can measure stability without purification [36*]. Here, the POI is fused to the N-terminus of GFP. The protein, purified or in lysate, is then subjected to ITD in HT format. The method, called GFP-Basta, is only applicable to proteins that aggregate upon unfolding and is limited by the aggregation and photophysics of GFP, but practically these are not very significant limitations for most POIs.
One especially promising method called differential scanning fluorimetry (DSF) is simple, broadly applicable and requires little specialized equipment. A method called Thermofluor was developed by 3D Pharmaceuticals, now owned by Johnson & Johnson, which reports on the perturbation of the melting temperature of a receptor by a potential ligand through addition of an extrinsic fluorophore [37*]. Hydrophobic dyes such as ANS are quenched in aqueous solvent but become fluorescent in organic solvent or when bound to molten globules or protein unfolding intermediates.
Most laboratory implementations of DSF, which is used extensively to optimize buffer conditions for crystallography [38,39], use real-time PCR machines which typically lack filter sets in the blue. Consequently, dyes such as SYPRO Orange have been widely used instead of ANS. RT-PCR machines enable DSF in 96- and 384-well formats with ~20 µL of solution, where ~1 µg µL−1 solutions are required. Nordlund and colleagues have shown that DSF is applicable to a broad range of proteins but that some proteins bind to SYPRO Orange in the folded state. Magliery and co-workers demonstrated that for a series of related mutants of a protein, the correspondence between Tm values determined from CD thermal denaturation and DSF is excellent [40**]. The reverse-format protein-engineering implementation of Thermofluor, in which the conditions and ligands are held constant and the protein varied, was called High-Throughput Thermal Scanning (HTTS). It has been applied to core and loop libraries of four-helix bundle proteins to elucidate determinants of stability (Lavinder, J.J., Hari, S.B., Sen, S. and TJM, in preparation). DSF is surprisingly reversible through the melting point, although dye-protein aggregates appear upon extended heating of the denatured state. It is likely that the dye itself will perturb the apparent melting point, but the ΔTm values correspond quite well to calorimetric and spectroscopic measurements.
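The raw output of a DSF experiment is one fluorescence-versus-temperature melt curve per well, and a common first-pass analysis takes Tm as the transition midpoint, e.g. the temperature of the steepest fluorescence increase. A minimal illustration on a synthetic sigmoidal curve (a sketch, not any instrument vendor's analysis routine):

```python
import numpy as np

def tm_from_melt(temps, fluorescence):
    """Estimate Tm as the temperature where dF/dT is maximal, i.e. the
    steepest point of the dye-binding fluorescence increase on unfolding."""
    dF = np.gradient(fluorescence, temps)
    return temps[np.argmax(dF)]

# Synthetic melt curve: sigmoidal unfolding transition centered at 55 °C
temps = np.arange(25.0, 95.0, 0.5)
curve = 1.0 / (1.0 + np.exp(-(temps - 55.0) / 2.0))
tm = tm_from_melt(temps, curve)
```

Real curves also show the post-transition fluorescence decay from dye-protein aggregation noted above, so in practice the derivative is usually taken only over the rising portion of the curve, or a Boltzmann sigmoid is fit to the transition region.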
All of these methods require miniaturization and high-throughput handling of protein expression, purification, and conceivably library construction. Growth of bacteria in 1–2 mL of culture in 96 deep-well plates is the technology of choice for most of these methods, where some amount of robotic liquid handling for plate or bead-based affinity purifications (particularly IMAC) is helpful. For the most part, these are achieved with considerable home optimization at present. Platforms for HT oligonucleotide and gene synthesis [42,43] and in vitro protein expression stand to expand the screening front-end further, but these are still far from straightforward implementation in most labs.
Two targets of great interest in the pharmaceutical industry are particularly challenging for adapting to stability screens. Membrane proteins make up a large fraction of all drug targets, but they are difficult to work with in vitro, particularly for structural studies. Many recent successes in membrane protein crystallography have been born of strategies to stabilize the POI . The most rapidly expanding area in pharmaceuticals is that of biologics, which antibodies and antibody-like molecules dominate at present. But their generally poor biophysical properties make them difficult to engineer and formulate as drugs.
Membrane proteins are often difficult to express or purify and are only stable in detergent or lipid formulations. Stevens and colleagues adapted cysteine reactivity reported by fluorescence, similar to the fQCR method, for membrane proteins in detergent. They demonstrated its use on a lipophilic model protein, a monotopic membrane protein, and an integral membrane protein from the GPCR family. More recently, Cherezov, Stevens and colleagues have expanded this method by examining protein unfolding in the lipidic cubic phase with intrinsic fluorescence or fluorescence upon cysteine modification as a read-out (LCP-Tm). The method was additionally used for ITD over long time frames for membrane proteins. Baldwin and coworkers also developed an ITD screen for membrane proteins in detergent. DSF has been applied to membrane proteins, but the high fluorescence background of the dye in detergent is a complicating factor. Dyes that are more specific for proteins over lipids may improve this.
Even many full-length monoclonal antibodies are limited in their use as therapeutics by marginal stability and aggregation. Formats that are more straightforward to engineer and express, such as Fabs and scFvs, often suffer from decreased stability, and more significant engineering for humanization and generation of bispecific species can compromise stability further. DSF and DSLS have both been applied successfully to formulation studies of monoclonal antibodies [50,51]. Thermal inactivation screening has also been used as a means of establishing sufficient stability for scFv variants. Cysteine reactivity has been applied to mAb stability, which seems particularly apt given the importance of disulfide bonds for antibody stability. Little has been published on the application of these sorts of screens to engineering antibody stability, but much of this work is behind industry doors at present.
Most of the methods described above are useful for sorting out which members of a library are folded and stable. This enables the researcher to make mutations according to some hypothesis of design or even entirely at random, and locate variants with suitable physical properties. But it is often not simple to find rare stabilizing mutations, and such mutations may compromise other features of the protein, such as enzymatic function or expression level. An alternative approach to identifying sites of stabilizing mutations is to turn to statistical analysis of the natural repertoire of the motif, domain, protein or fold of interest. An attractive idea is that making mutations to the most common amino acid in some position of a protein is likely to be beneficial, and indeed these so-called consensus mutations are tolerated and stabilizing far more frequently than at random or even from the best predictions today. But implementation is harder than it sounds. Multiple-sequence alignment (MSA) is often challenging, especially in poorly conserved regions or loops, leading to high noise. Most sites in proteins are not well conserved, and taking the most common amino acid in these positions is often little better than picking one at random. Moreover, some positions, especially weakly conserved ones, can be seen to vary together—that is, to be correlated—although these correlations are only sometimes close in space and are of uncertain significance in most cases. The very large number of protein sequences available today makes these kinds of approaches worthy of greater attention in the years ahead.
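In its simplest form, the consensus approach described above reduces to a column-wise majority vote over a multiple-sequence alignment. A toy sketch of that calculation (the gap handling and tie-breaking here are illustrative choices, not a standard from the literature):

```python
from collections import Counter

def consensus(msa):
    """Most common amino acid at each column of an aligned set of sequences.
    Gap characters ('-') are excluded from the vote so that poorly aligned
    columns do not default to a gap; ties go to Counter's first-seen order."""
    seq = []
    for column in zip(*msa):
        counts = Counter(c for c in column if c != '-')
        seq.append(counts.most_common(1)[0][0] if counts else '-')
    return ''.join(seq)

# Toy alignment: four aligned five-residue sequences
msa = ["MKVA-", "MKIAG", "MRVAG", "MKVLG"]
cons = consensus(msa)
```

As the text notes, the hard parts lie upstream and downstream of this trivial step: building a reliable alignment in weakly conserved regions, and deciding what to do at positions where the "most common" residue barely outnumbers the alternatives.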
Steinbacher, Plückthun and colleagues found that about half of the mutations made to an antibody Vκ domain were stabilizing [54**]. Steipe and coworkers went on to use this concept to generate hyperstable VH domains for intracellular expression of Fvs in E. coli. Wyss and colleagues made a number of variants of fungal phytases based on the consensus of a very small number of closely-related sequences (less than 20), and they found that even the full consensus sequences were active and significantly thermostabilized [56,57]. (‘Full consensus’ means the most common amino acid from the MSA was used in every position of the protein.) More recent efforts have focused on the design of ubiquitous motifs, such as ankyrin repeats [58,59] and tetratricopeptide repeats. Consensus variants of both of these repeats have been assembled into very stable domains and engineered using library and rational methods for novel binding properties.
The origin of consensus stabilization is not entirely clear. One possibility is that individual proteins in the MSA only avail themselves of as many stabilizing mutations as necessary for function, but that consensus amalgamates these mostly additive mutations. It is also not yet clear why only half of the mutations are stabilizing. Some light was shed on this recently by Arnold, Hilvert and coworkers, who showed that consensus mutations from library selections were also stabilizing [61*]. The authors suggested that this method benefited from the removal of phylogenetic artifacts. Other factors stemming from correlation and poorly conserved residues also likely play a role. A related approach to making consensus mutations is to make “ancestral” mutations by tracing mutations back to early sequences along the phylogenetic path. These mutations also turn out to be stabilizing about half the time. Tawfik and colleagues have incorporated ancestral mutations into the family shuffling of paraoxonase-3 to successfully yield stable, active chimeric enzymes.
An additional layer of complication in the statistical analysis of MSAs is that not all positions are statistically independent. Ranganathan and colleagues developed a method called statistical coupling analysis (SCA), which is a perturbation-based approach to identifying overrepresented pairs of amino acids. They showed that inclusion of both consensus and correlation information was necessary and sufficient for the design of folded WW domains—meaning that variants that were plausible from positional distributions alone were often unfolded if they did not capture enough correlation information [65**]. The meaning of these types of correlations is even less well understood than the etiology of consensus stabilization. Many correlated residues are not close in space, but many can also be assembled into networks of interacting residues that connect distant regions of a protein fold. A number of studies have identified roles for correlated residues in allosteric regulation [66–68]. Ranganathan has recently devised a new kind of SCA calculation and has used it to identify independent clusters of co-evolving residues in protein families (sectors). Mutations to different sectors in trypsin demonstrated that one had structural and the other had functional consequences. There is a great deal of work that lies ahead to understand the meaning of correlations in a general way.
Magliery and Regan applied consensus analysis and SCA to TPR motifs, which uncovered two subfamilies with distinct alternative networks of interacting residues and resulted in an algorithm for identifying active-site residues [70*,71]. Among the most striking of the results of that study was an explanation for the unusually high charge of a consensus TPR motif (−7) even though the average TPR has zero net charge. Charge neutralization occurred by correlations in weakly conserved positions on the surface of the motif. This effect was not always local (for example, two residues close in space forming a salt bridge), suggesting at least one mechanism for important non-local correlations. Magliery and colleagues recently engineered two closely-related consensus variants of triosephosphate isomerase (TIM) from slightly different sequence databases (BJS, Durani, V. and TJM, submitted). The two variants differed dramatically in their physical properties and activity, with one of them having wild-type like kinetics and the other being weakly active and poorly folded. Both variants differed far more from any natural TIM than they did from each other. The only apparent difference between these two variants is the extent to which they complete networks of correlated residues. This type of host-guest approach will hopefully shed light on the physical meaning of correlated positions.
First-principles computational methods are likely to remain far from a comprehensive predictive model for some time, until better potential functions and better treatments of the unfolded state can be incorporated. Even empirical parameterization is very difficult given our sparse coverage of sequence space in thermodynamics studies. But there has been a dramatic increase in efforts to bridge that gap in the last 5–10 years in the form of new HT screens for foldedness, thermodynamic stability, solubility and kinetic stability. In the next decade, with improved methods of HT gene construction, handling, expression and purification, these new stability screening methods will give us a vastly richer and more detailed view of the effects of mutations on protein physical properties. And in the meantime, these methods are immediately adaptable for screening random libraries to improve the physical properties of proteins for easier handling, crystallization and structural studies, and superior biotherapeutics. Application of these methods to biophysically “unfriendly” proteins that are larger, more complex and do not refold spontaneously is likely to change our view of protein folding for the majority of proteins.
Random screening in the absence of any information is often a slog, and any information to narrow down libraries to find stabilizing mutations is welcome. Protein sequence statistics can be a useful tool for guiding combinatorial experiments and limiting possibilities in difficult engineering experiments. The molecular etiology of the effects of consensus and correlated mutations remains a difficult problem, but the combination of screening methods with these kinds of calculations in the next decade will accelerate research towards an understanding of those effects.
Protein stability remains one of the most difficult problems in protein science, but its illumination by experiments that take advantage of large numbers, both experimentally and statistically, offers new hope for a solution in the years ahead.
A “consensus” residue is simply the most common amino acid in one position of a family of proteins—that is, in a column of a multiple sequence alignment. It is not always easy to determine the consensus sequence of a protein, because it is not easy to align stretches of sequence that are poorly conserved or have insertions or deletions (such as loops). Also, many positions are only weakly conserved and may use nearly all 20 amino acids with some frequency. A correlation is, fundamentally, when a pair of residues in two positions is observed more or less frequently than expected by chance. For example, if Ala is seen in position A in 20% of sequences, and Ala is in position B in 20% of sequences, then we would expect Ala-Ala pairs at A–B in 4% of sequences if the positions were independent. If instead we observed the Ala-Ala pair in the full 20% of sequences, that might represent a strong correlation. Compared to consensus, many more sequences are necessary to be confident of the significance of correlations since there are 400 possible pairs in any two positions. Information theory (e.g., relative entropy and mutual information) can be used to quantify how biased or conserved a position is, and how interconnected the distributions of two positions are.
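The expected-versus-observed arithmetic above, and the mutual-information measure mentioned, can be made concrete in a few lines. This sketch (a toy illustration, not SCA itself) computes mutual information between two alignment columns; in the perfectly correlated example, every observed pair frequency deviates maximally from the p(a)·p(b) independence expectation:

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns: sums
    p(a,b) * ln[p(a,b) / (p(a)*p(b))] over observed residue pairs, where
    p(a)*p(b) is the pair frequency expected if the columns were independent."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi

# Perfectly correlated toy columns: Ala at A always pairs with Ala at B
col_a = list("AAGGG")
col_b = list("AACCC")
mi = mutual_information(col_a, col_b)   # equals the column entropy here
```

Uncorrelated columns give MI near zero; the practical complication, as noted above, is that with finite alignments MI is biased upward, so significance must be assessed against shuffled or null alignments.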
The authors thank the NIH (R01 GM083114 and U54 NS058183 to TJM) and The Ohio State University for support. JJL was an NIH CBIP fellow and a fellow of the Great Rivers affiliate of the AHA. BJS was an NIH CBIP fellow and is a Presidential fellow of The Ohio State University.