


Article sections

- Abstract
- I. Historical overview of Ising models and motivation for the present review
- II. Linear repeat proteins and their connection to linear Ising models
- III. Formulating a homopolymer partition function and the zipper approximation
- IV. Matrix approach: homopolymers
- VI. Matrix approach: heteropolymers
- VI. Solvability criteria for Ising models applied to repeat protein folding
- V. Matrix homopolymer analysis of consensus TPR folding
- VII. Matrix heteropolymer analysis of consensus ankyrin repeat folding
- VIII. Summary and future directions
- LITERATURE CITED



Methods Enzymol. Author manuscript; available in PMC 2010 November 2.

Published in final edited form as:

PMCID: PMC2967780

NIHMSID: NIHMS170338

T.C. Jenkins Department of Biophysics, The Johns Hopkins University, 3400 N. Charles St., Baltimore MD 21218 USA

The publisher's final edited version of this article is available at Methods Enzymol


The linear “Ising” model, which has been around for nearly a century, treats the behavior of linear arrays of repetitive, interacting subunits. Linear “repeat-proteins” have only been described in the last decade or so, and their folding energies have only been characterized very recently. Owing to their repetitive structures, linear repeat-proteins are particularly well suited for analysis by the nearest-neighbor Ising formalism. After briefly describing the historical origins and applications of the Ising model to biopolymers, and introducing repeat protein structure, this chapter will focus on the application of the linear Ising model to repeat proteins. When applied to homopolymers, the model can be represented and applied in a fairly simplified form. When applied to heteropolymers, where differences in energies among individual subunits (i.e. repeats) must be included, some (but not all) of this simplicity is lost. Derivations of the linear Ising model for both homopolymer and heteropolymer repeat-proteins will be presented. With the increased complexity required for analysis of heteropolymeric repeat proteins, the ability to resolve different energy terms from experimental data can be compromised. Thus, a simple matrix approach will be developed to help assess the degree to which different thermodynamic parameters can be extracted from a particular set of unfolding curves. Finally, we will describe the application of these models to analyze repeat-protein folding equilibria, focusing on simplified repeat proteins based on “consensus” sequence information.

The history of the “Ising” model, or perhaps more appropriately, the Ising-Lenz model, has been described extensively (Brush, 1967; Niss, 2005). Originally developed to study ferromagnetism, the model can be traced to the dissertation of Ernst Ising (Ising, 1925) and to an earlier proposal by Wilhelm Lenz (Lenz, 1920). Ising carried out his dissertation work on the model under Lenz’s guidance at Hamburg University. Since then, the model (with which Ising’s name is almost exclusively associated) has been applied to a wide range of cooperative phenomena in one, two, and three dimensions, including phase separation in mixtures, phase transitions in single-component systems (the lattice gas model), and cooperative transitions in linear biopolymers. It seems unfortunate that Ising did not continue in this area; he was discouraged, in part, because in his view the model could not capture ferromagnetic transitions (Brush, 1967).

Although the Ising model has been used to describe order-disorder transitions in a wide variety of systems, the one-dimensional Ising model has been particularly useful for conformational transitions in linear polymers. These transitions, which can be categorized as “helix-coil” transitions, include the equilibrium between α-helix and coil in peptides (Schellman, 1958; Zimm and Bragg, 1959; Lifson and Roig, 1961), and various equilibria of DNA and RNA, including double-helix formation (Zimm, 1960; Crothers and Kallenbach, 1966) and stacking transitions of single strands (Applequist and Damle, 1965; Poland *et al*., 1966). This literature, along with a very clear development of analytical models, is presented in a beautiful monograph by Poland and Scheraga (Poland and Scheraga, 1970). More recent applications include binding of protein ligands to repetitive structures such as DNA and protein filaments (McGhee and von Hippel, 1974; De La Cruz, 2005).

In this review, we develop aspects of the nearest-neighbor or Ising model in the context of linear repeat proteins, emphasizing key features that are pertinent to recent experimental studies (including heterogeneous, homogeneous, and “capped” structures, see below). We focus both on the theory and on how it can be used to analyze experimental data. It is our aim to provide enough detail so that all steps of the derivation can be followed (from the basic model to the development of the partition function, and then to modeling equilibrium unfolding transitions), while avoiding specific features that apply exclusively to other types of linear biopolymers. In addition, we will include a discussion of some practical issues associated with determining the model-dependent parameters, emphasizing the relationship between these parameters and the data needed for their accurate determination.

The structures and global stabilities of linear repeat proteins have been described in a number of reviews (Groves and Barford, 1999; Kobe and Kajava, 2000; Kajava, 2001; Mosavi *et al*., 2004; Main *et al*., 2005). Repeat units are constructed from tandem elements of secondary structure (α-helix, β-strand, PPII helix, turn), arranged in a large loop. Individual repeats are approximately 20-40 residues long, depending on the type of repeat. Typically, individual repeats show primary sequence similarity, and in most cases repeats were identified by primary sequence before structural details were available. However, some repeats show little or no obvious repetition at the primary sequence level. Even when there is repetition, sequence identity from one repeat to the next is typically around 25%. Thus, although consensus sequences can be identified, sequences of natural repeats differ significantly from the consensus.

Three types of repeat proteins that have been amenable to structural and thermodynamic analysis and simplification through consensus information are ankyrin- (ANK), leucine-rich- (LRR), and tetratricopeptide (TPR) repeat proteins (see Kloss *et al*., 2008 for review; also Courtemanche and Barrick, 2008; Kloss and Barrick, 2008). TPR and ANK repeats are composed of α-helices and turns, with two short turns connecting the TPR helices, and one short turn and one extended loop connecting the ANK helices. In contrast, LRR proteins contain a β-strand that packs against strands of neighboring repeats to form a contiguous sheet. Depending on the subtype, LRRs contain either an α-helix, a 3₁₀ helix, or an extended PPII helix (Kajava, 2001).

In linear repeat proteins, adjacent repeat units pack against their neighbors in a roughly linear array (Figure 1). Depending on the shape and packing of repeats, different types of repeats typically show regular deviation from linearity (Kobe and Kajava, 2000), displaying twist from repeat to repeat (particularly pronounced for TPRs) and/or curvature along the entire stack (particularly pronounced for some LRR subtypes). For some repeat proteins, such as WD40 domains and TIM barrels, curvature is so extreme that a “closed” or circular structure is formed. Since such closed proteins have numerous sequence-distant interactions, they are not easily analyzed using nearest-neighbor thermodynamic models, and will not be discussed here.

Linear repeat proteins have two features that make them ideal subjects for simple nearest-neighbor models. First, as described above, they are constructed of a repeating unit at the level of secondary and tertiary structure; repetition can be extended to the level of primary sequence using consensus information (see below). This translational symmetry reduces the number and type of energy terms required to describe stability, allowing different regions of the molecule to be described in the same way. Second, as can be seen in inter-residue contact maps, direct contacts are limited to repeats that are immediately adjacent in sequence, which justifies using a nearest-neighbor approximation to describe folding.

Given this structural simplicity, the free energy of repeat protein folding may be expected to have two dominant contributions: the intrinsic folding of individual units (which we will call $\Delta G_i$) and the interfacial interaction of neighboring repeats ($\Delta G_{i,i+1}$).

The partition function, or sum over states, is central to analysis of the thermodynamic properties of repeat proteins, their populations, and their folding. Here the partition function will be developed for a homopolymeric linear system as a summation. As articulated by Zimm and Bragg in the late 1950s (Zimm and Bragg, 1959), this summation is particularly useful for short chains, for which the number of terms in the sum remains manageable. The summation also simplifies to a useful approximate (closed) form in the high-cooperativity limit.

One intuitive way to build a molecular partition function, *q*, for repeat protein folding is to represent the statistical weight of each conformation (for a linear Ising model of *n* repeats there are $2^n$ in total) as the concentration of that conformation, relative (as a ratio) to an arbitrary reference conformation. By choosing the state in which all *n* repeats are unfolded ($U_n$) as the reference, the partition function can be written as

$$q=\frac{1}{\left[{U}_{n}\right]}\sum _{i=0}^{n}\sum _{\mathit{configs}}[{F}_{i};{U}_{n-i}]$$

(1)

The inside sum in equation 1 is taken over all microscopic configurations that have *i* folded repeats ($F_i$). Because overall folding energies depend on interfacial interactions, these microscopic configurations can differ in energy even though they have the same number of folded repeats. The number of interfaces is maximized when folded repeats are clustered together, whereas gaps separating folded repeats decrease the number of interfaces. Thus, converting equation 1 to a sum of equilibrium constants *κ* and *τ* for intrinsic folding and interfacial interaction (or exponentials in energies) requires the number of gaps between folded segments to be stated explicitly:

$$q=1+\sum _{i=1}^{n}\sum _{g=0}^{i-1}{\Omega}_{i,g}{\kappa}^{i}{\tau}^{i-1-g}$$

(2)

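In this equation, $\Omega_{i,g}$ is the number of ways that *i* folded repeats can be arranged in an array of *n* repeats such that they are separated by *g* gaps, that is, split into *g* + 1 contiguous folded segments.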
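To make the degeneracy $\Omega_{i,g}$ in equation 2 concrete, the short Python sketch below (standard library only; all function names are hypothetical and not taken from the original work) enumerates every one of the $2^n$ configurations of a small array, tabulates how many have *i* folded repeats separated by *g* gaps, and checks that equation 2 assembled from that table matches a direct sum in which each folded repeat contributes *κ* and each folded-folded interface contributes *τ*. Brute-force enumeration is only practical for small *n*, which is also the regime where equation 2 is useful.

```python
from collections import Counter
from itertools import product


def count_folded_and_gaps(config):
    """Return (i, g) for a tuple of repeat states (1 = folded, 0 = unfolded).

    i is the number of folded repeats; g is the number of unfolded gaps
    separating folded segments (one fewer than the number of segments).
    """
    i = sum(config)
    # A folded segment starts at a folded repeat that is first in the chain
    # or that follows an unfolded repeat.
    segments = sum(1 for j, s in enumerate(config)
                   if s == 1 and (j == 0 or config[j - 1] == 0))
    return i, max(segments - 1, 0)


def degeneracies(n):
    """Tabulate Omega[(i, g)] by brute-force enumeration of all 2**n states."""
    omega = Counter()
    for config in product((0, 1), repeat=n):
        i, g = count_folded_and_gaps(config)
        if i > 0:
            omega[(i, g)] += 1
    return omega


def q_equation_2(n, kappa, tau):
    """Partition function assembled term by term from equation 2."""
    omega = degeneracies(n)
    return 1.0 + sum(count * kappa**i * tau**(i - 1 - g)
                     for (i, g), count in omega.items())


def q_direct(n, kappa, tau):
    """Reference: kappa per folded repeat, tau per folded-folded interface."""
    total = 0.0
    for config in product((0, 1), repeat=n):
        interfaces = sum(config[j] * config[j + 1] for j in range(n - 1))
        total += kappa**sum(config) * tau**interfaces
    return total
```

For *n* = 4, the table gives $\Omega_{2,0} = 3$ and $\Omega_{2,1} = 3$, and $\Omega_{i,0} = n - i + 1$ for every *i*, which is the zipper-limit degeneracy.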

Unfortunately, the degeneracy in equation 2 is rather complex even in open form, and is not particularly useful except for short arrays (low *n*), where each term in *q* can be given explicitly. However, in the limit of high interfacial stability, which eliminates gaps between folded repeats, the degeneracy ($\Omega_{i,g=0}$) and the partition function become particularly simple. When all *i* folded repeats are coalesced into one structured segment (*g* = 0), there are *n* − *i* + 1 ways to place that segment. This approximation is often referred to as the “zipper model” because structure (folded repeats in this case) zips up as a single block. The partition function for the zipper model can be written as

$$\begin{array}{cc}\hfill q& =1+\sum _{i=1}^{n}(n-i+1){\kappa}^{i}{\tau}^{i-1}\hfill \\ \hfill & =1+{\tau}^{-1}\sum _{i=1}^{n}(n-i+1){\left(\kappa \tau \right)}^{i}\hfill \\ \hfill & =1+{\tau}^{-1}(n+1)\sum _{i=1}^{n}{\left(\kappa \tau \right)}^{i}-{\tau}^{-1}\sum _{i=1}^{n}i{\left(\kappa \tau \right)}^{i}\hfill \\ \hfill & =1+{\tau}^{-1}(n+1)\sum _{i=1}^{n}{\left(\kappa \tau \right)}^{i}-\kappa \frac{d}{d\left(\kappa \tau \right)}\sum _{i=1}^{n}{\left(\kappa \tau \right)}^{i}\hfill \end{array}$$

(3)

Both sums in the last line of equation (3) are built on the partial geometric series in the variable *κτ*, which can be written in closed form as

$$\sum _{i=1}^{n}{\left(\kappa \tau \right)}^{i}=\frac{\kappa \tau ({\left\{\kappa \tau \right\}}^{n}-1)}{\kappa \tau -1}$$

Substituting this closed form expression into equation (3) gives

$$q=1+\frac{\kappa (n+1)({\left\{\kappa \tau \right\}}^{n}-1)}{\kappa \tau -1}-\kappa\frac{d}{d\left(\kappa \tau \right)}\left(\frac{\kappa \tau ({\left\{\kappa \tau \right\}}^{n}-1)}{\kappa \tau -1}\right)$$

(4)

Differentiating the second term and rearranging gives a closed form of the partition function:

$$q=1+\frac{\kappa \left({\left\{\kappa \tau \right\}}^{n+1}-\{n+1\}\kappa \tau +n\right)}{{(\kappa \tau -1)}^{2}}$$

(5)
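A quick numerical check of the algebra leading from equation 3 to equation 5 can be sketched in Python (hypothetical function names; a sketch, not the authors' code): the zipper partition function evaluated as the explicit sum should agree with the closed form. The closed form divides by $(\kappa\tau - 1)^2$, so it is undefined at $\kappa\tau = 1$ and loses precision close to that point; the explicit sum has no such problem and is cheap for the array lengths typical of repeat proteins.

```python
def q_zipper_sum(n, kappa, tau):
    """Zipper partition function as an explicit sum (first line of equation 3)."""
    return 1.0 + sum((n - i + 1) * kappa**i * tau**(i - 1)
                     for i in range(1, n + 1))


def q_zipper_closed(n, kappa, tau):
    """Closed form of the zipper partition function (equation 5).

    The expression is 0/0 at kappa*tau == 1 and loses precision near that
    point, where q_zipper_sum is the safer choice.
    """
    x = kappa * tau
    return 1.0 + kappa * (x**(n + 1) - (n + 1) * x + n) / (x - 1.0)**2
```

For *n* = 2, both functions reduce to $1 + 2\kappa + \kappa^2\tau$, the unfolded state plus two singly folded states plus the fully folded state.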

With this relatively simple expression for the partition function, populations and associated observable properties can be calculated. Of primary importance is the fraction of repeats that are folded, which is given as

$$\begin{array}{cc}\hfill \theta & =\frac{1}{n}\sum _{i=0}^{n}i{p}_{i}=\frac{1}{n}\sum _{i=0}^{n}i\frac{(n-i+1){\kappa}^{i}{\tau}^{i-1}}{q}=\frac{\kappa}{nq}\sum _{i=1}^{n}(n-i+1)i{\kappa}^{i-1}{\tau}^{i-1}\hfill \\ \hfill & =\frac{\kappa}{nq}\frac{d}{d\kappa}\left\{1+\sum _{i=1}^{n}(n-i+1){\kappa}^{i}{\tau}^{i-1}\right\}\hfill \\ \hfill & =\frac{\kappa}{nq}\frac{dq}{d\kappa}\hfill \\ \hfill & =\frac{1}{n}\frac{d\,\ln q}{d\,\ln \kappa}\hfill \end{array}$$

(6)

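where $p_i$ is the fractional population of species with *i* folded repeats. Evaluating the derivative in equation 6 with the closed-form partition function (equation 5) gives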

$$\theta =\frac{\kappa}{n\left(\kappa \tau -1\right)}\times \frac{n{\left\{\kappa \tau \right\}}^{n+2}-\left(n+2\right){\left(\kappa \tau \right)}^{n+1}+(n+2)\kappa \tau -n}{{\left(\kappa \tau -1\right)}^{2}+\kappa \left({\left\{\kappa \tau \right\}}^{n+1}-\{n+1\}\kappa \tau +n\right)}$$

(7)
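Equation 7 can be checked against equation 6 directly. The sketch below (hypothetical names, assuming the zipper closed form of equation 5) evaluates *θ* from equation 7 and compares it with a central finite-difference estimate of $(1/n)\,d\ln q/d\ln\kappa$; for a single repeat, equation 7 should reduce to $\kappa/(1+\kappa)$.

```python
import math


def q_zipper(n, kappa, tau):
    """Closed-form zipper partition function (equation 5)."""
    x = kappa * tau
    return 1.0 + kappa * (x**(n + 1) - (n + 1) * x + n) / (x - 1.0)**2


def theta_zipper(n, kappa, tau):
    """Fraction of folded repeats in the zipper model (equation 7)."""
    x = kappa * tau
    num = n * x**(n + 2) - (n + 2) * x**(n + 1) + (n + 2) * x - n
    den = (x - 1.0)**2 + kappa * (x**(n + 1) - (n + 1) * x + n)
    return kappa / (n * (x - 1.0)) * num / den


def theta_numeric(n, kappa, tau, h=1e-6):
    """Equation 6 by a central finite difference of ln q with respect to ln kappa."""
    up = math.log(q_zipper(n, kappa * math.exp(h), tau))
    down = math.log(q_zipper(n, kappa * math.exp(-h), tau))
    return (up - down) / (2.0 * h * n)
```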

Equilibrium unfolding transitions can be derived from (or fitted using) equation (7) by introducing an explicit dependence on an external variable (temperature, pressure, or denaturant) into *κ*, *τ*, or both. In this review we will primarily focus on denaturant-induced unfolding. In Ising analyses of repeat protein unfolding, statistical weights have been assumed to vary exponentially with denaturant concentration (linearly in terms of free energy):

$$\kappa\left([x]\right)=e^{-\Delta G_i/RT}=e^{-\left(\Delta G_{i,\mathrm{H_2O}}-m_i[x]\right)/RT}$$

(8A)

$$\tau\left([x]\right)=e^{-\Delta G_{i,i+1}/RT}=e^{-\left(\Delta G_{i,i+1,\mathrm{H_2O}}-m_{i,i+1}[x]\right)/RT}$$

(8B)

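Here, $[x]$ represents molar denaturant concentration, $m_i$ and $m_{i,i+1}$ describe the linear dependence of the intrinsic and interfacial free energies on denaturant, and $\Delta G_{i,\mathrm{H_2O}}$ and $\Delta G_{i,i+1,\mathrm{H_2O}}$ are the corresponding free energies in the absence of denaturant.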
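As an illustration of equations 8A and 8B, the sketch below computes statistical weights as functions of denaturant and traces a simulated zipper-model unfolding curve. It assumes, as in most of the studies discussed below, that only *κ* depends on denaturant while *τ* is fixed. It also takes $R = 1.987\times10^{-3}$ kcal mol⁻¹ K⁻¹ and *T* = 298.15 K, and computes *θ* from the zipper populations (first line of equation 6). Equation 8A is implemented exactly as written; under that form a positive $m_i$ would make folding more favorable as [x] increases, so the destabilizing example uses a negative $m_i$. Function names and parameter values are hypothetical, not fitted results.

```python
import math

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def statistical_weight(dG_water, m, x, T=298.15):
    """Equation 8A/8B as written: exp(-(dG_water - m * [x]) / RT).

    dG_water -- free energy (kcal/mol) in the absence of denaturant
    m        -- linear denaturant dependence (kcal mol^-1 M^-1), sign as in eq. 8
    x        -- molar denaturant concentration
    """
    return math.exp(-(dG_water - m * x) / (R * T))


def theta_zipper(n, kappa, tau):
    """Fraction folded from zipper-model populations (first line of equation 6)."""
    weights = [1.0] + [(n - i + 1) * kappa**i * tau**(i - 1)
                       for i in range(1, n + 1)]
    return sum(i * w for i, w in enumerate(weights)) / (n * sum(weights))


def zipper_curve(n, dG_i, m_i, dG_interface, concentrations, T=298.15):
    """Simulated melt in which only kappa depends on [x]; tau is held fixed."""
    tau = math.exp(-dG_interface / (R * T))
    return [theta_zipper(n, statistical_weight(dG_i, m_i, x, T), tau)
            for x in concentrations]


xs = [0.5 * k for k in range(17)]            # 0 to 8 M denaturant
curve = zipper_curve(4, 2.0, -1.0, -5.0, xs)  # hypothetical 4-repeat array
```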

Although in principle both the intrinsic and interfacial stabilities may be affected, most studies of repeat-protein denaturation have attributed the effect of denaturant solely to the intrinsic folding constant, *κ* (Mello and Barrick, 2004; Kajander *et al*., 2005; Wetzel *et al*., 2008). Assuming that intrinsic folding involves formation of secondary structure elements (Figure 2), whereas the nearest-neighbor interaction corresponds to packing of neighboring repeats, this partitioning is consistent with a growing body of evidence suggesting that denaturants destabilize proteins largely by acting on the backbone, and thus should primarily destabilize units of secondary structure rather than packing interactions between such structures (Scholtz *et al*., 1995; Auton *et al*., 2007; Bolen and Rose, 2008). Moreover, this partitioning is consistent with a recent global analysis from our laboratory of a large number of denaturant-induced unfolding transitions of consensus ankyrin repeat proteins (TA & DB, in preparation).

The first application of the 1D Ising model to repeat protein folding involved a series of constructs in which ankyrin repeats were deleted from one or both ends of the Notch ankyrin domain (Mello and Barrick, 2004). By analyzing the unfolding free energies of these constructs with a set of linear equations, a free energy contribution from each repeat was obtained. Because of the way the deletion series was constructed, analysis yielded an estimate of the intrinsic stability ($\Delta G_i$) of one of the repeats of +6.6 kcal/mol and an average interfacial stability (

The zipper model assumes that the folding of each repeat is highly coupled to its neighbors. High coupling allows conformations in which stretches of folded repeats are separated by unfolded repeats to be ignored. However, if cooperativity between adjacent repeats is low, or repeat arrays are long, these intermediates will be significantly populated and must be accounted for. In this section we will present a simple matrix-based derivation of the partition function for the folding reaction of “homopolymeric” repeat proteins (i.e. all repeats are the same) that accounts for all partly folded conformations in a very compact way. This “matrix method” has been widely used to study one-dimensional interacting biological systems (Zimm and Bragg, 1959; Poland and Scheraga, 1970). In addition to providing a full description of all partly folded states, this matrix-based form can be used to analyze experimental unfolding transitions to determine $\Delta G_i$ and $\Delta G_{i,i+1}$.

Before we show how the matrix representation of the partition function can be manipulated to analyze unfolding curves, we will use a recursion-based approach that justifies the matrix form of the partition function. Although the matrix-based form of the partition function can easily be used without a detailed understanding of its origin, and its form is often justified simply by the fact that the rules of matrix multiplication combine statistical weights in the appropriate way, we feel that an understanding of the origins of the matrix method will result in a deeper understanding of its application.

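In the homopolymer approximation, each repeat has the same intrinsic folding energy ($\Delta G_i$) and the same interaction energy with its neighbors ($\Delta G_{i,i+1}$). The free energy of any conformation, relative to the fully unfolded state, is then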

$$\Delta G^{\circ}=\sum _{j=1}^{n}{\delta}_{j}\Delta {G}_{i}+\sum _{j=1}^{n-1}{\delta}_{j}{\delta}_{j+1}\Delta {G}_{i,i+1}$$

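where $\delta_j = 1$ if repeat *j* is folded and $\delta_j = 0$ if it is unfolded. The partition function is the sum of Boltzmann factors over all $2^n$ conformations: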

$$q\left(n\right)=\sum _{\text{state}=1}^{{2}^{n}}{e}^{-\Delta G^{\circ}/RT}$$

(9)
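For small arrays, equation 9 can be evaluated literally. The following sketch (hypothetical function names; energies in kcal/mol and *T* = 298.15 K assumed) assigns each of the $2^n$ conformations its free energy from the $\delta_j$ expression above and sums the Boltzmann factors. Its cost doubles with every added repeat, which is exactly the problem the recursion and matrix methods address.

```python
import math
from itertools import product

R = 1.987e-3  # gas constant, kcal mol^-1 K^-1


def conformation_free_energy(deltas, dG_i, dG_interface):
    """Delta G (standard) of one conformation; deltas[j] = 1 if repeat j is folded."""
    folded = sum(deltas)
    interfaces = sum(deltas[j] * deltas[j + 1] for j in range(len(deltas) - 1))
    return folded * dG_i + interfaces * dG_interface


def q_brute_force(n, dG_i, dG_interface, T=298.15):
    """Equation 9: Boltzmann factors summed over all 2**n conformations."""
    RT = R * T
    return sum(math.exp(-conformation_free_energy(d, dG_i, dG_interface) / RT)
               for d in product((0, 1), repeat=n))
```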

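For long repeat proteins (large *n*), the sum contains a very large number ($2^n$) of terms, which makes it impractical for calculation and data analysis. Instead, a simpler, more compact form of $q(n)$ can be derived by recursion, dividing $q(n)$ into two parts, $q_f(n)$ and $q_u(n)$, which sum over the conformations in which the $n^{th}$ (C-terminal) repeat is folded or unfolded, respectively.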

When the $n^{th}$ repeat is added to the C-terminal end in a folded state, the partition function $q_f(n)$ is

$${q}_{f}\left(n\right)={q}_{f}(n-1){e}^{-(\Delta {G}_{i}+\Delta {G}_{i,i+1})/RT}+{q}_{u}(n-1){e}^{-\left(\Delta {G}_{i}\right)/RT}$$

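The equation above simply states that if repeat $n-1$ is folded (with partition function $q_f(n-1)$), adding a folded repeat $n$ contributes both intrinsic folding and a new interface (a factor of $e^{-(\Delta G_i+\Delta G_{i,i+1})/RT}$), whereas if repeat $n-1$ is unfolded (with partition function $q_u(n-1)$), only intrinsic folding contributes. When the $n^{th}$ repeat is added in an unfolded state, it contributes no free energy, regardless of the state of repeat $n-1$: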

$$\begin{array}{cc}\hfill {q}_{u}\left(n\right)& ={q}_{f}(n-1){e}^{0/RT}+{q}_{u}(n-1){e}^{0/RT}\hfill \\ \hfill & ={q}_{f}(n-1)+{q}_{u}(n-1)\hfill \end{array}$$

The expressions for $q_f(n)$ and $q_u(n)$ are

$$\begin{array}{cc}\hfill & {q}_{f}\left(n\right)={e}^{-(\Delta {G}_{i}+\Delta {G}_{i,i+1})/RT}{q}_{f}(n-1)+{e}^{-\Delta {G}_{i}/RT}{q}_{u}(n-1)\hfill \\ \hfill & {q}_{u}\left(n\right)={q}_{f}(n-1)+{q}_{u}(n-1)\hfill \end{array}$$

and can be consolidated with a simple matrix relationship:

$$\begin{array}{cc}\hfill \left[\begin{array}{c}\hfill {q}_{f}\left(n\right)\hfill \\ \hfill {q}_{u}\left(n\right)\hfill \end{array}\right]& =\left[\begin{array}{cc}\hfill {e}^{-(\Delta {G}_{i}+\Delta {G}_{i,i+1})/RT}\hfill & \hfill {e}^{-\Delta {G}_{i}/RT}\hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill {q}_{f}(n-1)\hfill \\ \hfill {q}_{u}(n-1)\hfill \end{array}\right]\hfill \\ \hfill & =\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill {q}_{f}(n-1)\hfill \\ \hfill {q}_{u}(n-1)\hfill \end{array}\right]\hfill \end{array}$$
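The recursion for $q_f(n)$ and $q_u(n)$ translates directly into a loop. In this sketch (hypothetical names; written in terms of *κ* and *τ*, and starting from the single-repeat values $q_f(1) = \kappa$ and $q_u(1) = 1$), each iteration applies one multiplication by the 2 × 2 statistical weight matrix, and the result is compared with an explicit enumeration of all conformations.

```python
from itertools import product


def q_recursive(n, kappa, tau):
    """Build q(n) one repeat at a time with the q_f / q_u recursion."""
    q_f, q_u = kappa, 1.0  # a single repeat, folded or unfolded
    for _ in range(n - 1):
        # New folded repeat: weight kappa*tau after a folded repeat, kappa after
        # an unfolded one. New unfolded repeat: weight 1 regardless of neighbor.
        q_f, q_u = kappa * tau * q_f + kappa * q_u, q_f + q_u
    return q_f + q_u


def q_enumerated(n, kappa, tau):
    """Reference: explicit sum over all 2**n conformations."""
    total = 0.0
    for s in product((0, 1), repeat=n):
        interfaces = sum(s[j] * s[j + 1] for j in range(n - 1))
        total += kappa**sum(s) * tau**interfaces
    return total
```

The loop does *n* − 1 constant-time updates, in contrast to the $2^n$ terms of the enumeration.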

The second line comes from substituting the statistical weights $\kappa = e^{-\Delta G_i/RT}$ and $\tau = e^{-\Delta G_{i,i+1}/RT}$ for the free energy terms.

Continuing the recursion to repeat *n-2* gives

$$\begin{array}{cc}\hfill \left[\begin{array}{c}\hfill {q}_{f}\left(n\right)\hfill \\ \hfill {q}_{u}\left(n\right)\hfill \end{array}\right]& =\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill {q}_{f}(n-2)\hfill \\ \hfill {q}_{u}(n-2)\hfill \end{array}\right]\hfill \\ \hfill & ={\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{2}\left[\begin{array}{c}\hfill {q}_{f}(n-2)\hfill \\ \hfill {q}_{u}(n-2)\hfill \end{array}\right]\hfill \end{array}$$

This recursion can be continued all the way to the first (N-terminal) repeat to give

$$\left[\begin{array}{c}\hfill {q}_{f}\left(n\right)\hfill \\ \hfill {q}_{u}\left(n\right)\hfill \end{array}\right]={\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n-1}\left[\begin{array}{c}\hfill {q}_{f}\left(1\right)\hfill \\ \hfill {q}_{u}\left(1\right)\hfill \end{array}\right]$$

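$q_f(1)$ and $q_u(1)$, the partition functions for a single repeat that is folded or unfolded, respectively, are simply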

$$\begin{array}{c}\hfill {q}_{f}\left(1\right)=\kappa \hfill \\ \hfill {q}_{u}\left(1\right)=1\hfill \end{array}$$

Thus

$$\left[\begin{array}{c}\hfill {q}_{f}\left(n\right)\hfill \\ \hfill {q}_{u}\left(n\right)\hfill \end{array}\right]={\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n-1}\left[\begin{array}{c}\hfill \kappa \hfill \\ \hfill 1\hfill \end{array}\right]$$

Multiplying both sides by the row vector [*1 1*] sums $q_f(n)$ and $q_u(n)$, giving the full partition function:

$$\begin{array}{cc}\hfill q\left(n\right)& =\left[1\phantom{\rule{1em}{0ex}}1\right]\left[\begin{array}{c}\hfill {q}_{f}\left(n\right)\hfill \\ \hfill {q}_{u}\left(n\right)\hfill \end{array}\right]\hfill \\ \hfill & =\left[1\phantom{\rule{1em}{0ex}}1\right]{\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n-1}\left[\begin{array}{c}\hfill \kappa \hfill \\ \hfill 1\hfill \end{array}\right]\hfill \end{array}$$

By expanding the column vector on the RHS in terms of the statistical weight matrix, $q(n)$ can be expressed in terms of the $n^{th}$ power of the matrix:

$$\begin{array}{cc}\hfill q\left(n\right)& =\left[1\phantom{\rule{1em}{0ex}}1\right]{\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n-1}\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]\left[\begin{array}{c}\hfill 0\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[1\phantom{\rule{1em}{0ex}}1\right]{\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n}\left[\begin{array}{c}\hfill 0\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \end{array}$$

One final rearrangement of *q*(*n*), which will be helpful for further calculations, is given by taking the transpose of the equation above (as *q*(*n*) is a scalar, it is unaffected by transposition):

$$\begin{array}{cc}\hfill q\left(n\right)& ={\left(\left[1\phantom{\rule{1em}{0ex}}1\right]{\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n}\left[\begin{array}{c}\hfill 0\hfill \\ \hfill 1\hfill \end{array}\right]\right)}^{T}\hfill \\ \hfill & ={\left[\begin{array}{c}\hfill 0\hfill \\ \hfill 1\hfill \end{array}\right]}^{T}{\left({\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill \kappa \hfill \\ \hfill 1\hfill & \hfill 1\hfill \end{array}\right]}^{n}\right)}^{T}{\left[1\phantom{\rule{1em}{0ex}}1\right]}^{T}\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]{\left[\begin{array}{cc}\hfill \kappa \tau \hfill & \hfill 1\hfill \\ \hfill \kappa \hfill & \hfill 1\hfill \end{array}\right]}^{n}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]{W}^{n}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \end{array}$$
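The transposed form $q(n) = [0\;\;1]\,W^n\,[1\;\;1]^T$ can be computed with nothing more than repeated 2 × 2 matrix multiplication. The sketch below uses plain Python lists (a library routine such as NumPy's `numpy.linalg.matrix_power` would serve equally well); the function names are hypothetical.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)]
            for r in range(2)]


def q_matrix(n, kappa, tau):
    """q(n) = [0 1] W^n [1 1]^T with W = [[kappa*tau, 1], [kappa, 1]]."""
    W = [[kappa * tau, 1.0], [kappa, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix
    for _ in range(n):
        P = mat_mul(P, W)
    # [0 1] picks the second row of W^n; [1 1]^T sums its two entries.
    return P[1][0] + P[1][1]
```

For *n* = 2 this reproduces $1 + 2\kappa + \kappa^2\tau$, and for *n* = 3 it gives $1 + 3\kappa + 2\kappa^2\tau + \kappa^2 + \kappa^3\tau^2$, which includes the gapped conformation (folded, unfolded, folded) that the zipper model omits.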

where the weight matrix is represented by $W$. The above equation allows $q(n)$ to be computed without enumerating all $2^n$ terms explicitly. Moreover, it can be simplified further by treating it as an eigenvalue problem, which reduces the product of *n* statistical weight matrices to a product of three. In this treatment, $W$ is diagonalized as

$$W=TD{T}^{-1}$$

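where $D$ is a diagonal matrix of the eigenvalues ($\lambda_1, \lambda_2$) of $W$, and $T$ is a matrix whose columns are the corresponding eigenvectors. Substituting into the expression for $q(n)$ gives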

$$\begin{array}{cc}\hfill q\left(n\right)& =\left[0\phantom{\rule{1em}{0ex}}1\right]{\left(TD{T}^{-1}\right)}^{n}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]\left(TD{T}^{-1}\right)\left(TD{T}^{-1}\right)\dots \left(TD{T}^{-1}\right)\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]TD{T}^{-1}TD{T}^{-1}\dots TD{T}^{-1}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]TDD\dots D{T}^{-1}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]T{D}^{n}{T}^{-1}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]T{\left[\begin{array}{cc}\hfill {\lambda}_{1}\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill {\lambda}_{2}\hfill \end{array}\right]}^{n}{T}^{-1}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill & =\left[0\phantom{\rule{1em}{0ex}}1\right]T\left[\begin{array}{cc}\hfill {\lambda}_{1}^{n}\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill {\lambda}_{2}^{n}\hfill \end{array}\right]{T}^{-1}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \end{array}$$

(10)

The eigenvalues of *W* are obtained by solving the characteristic equation det(*W* − *λI*) = 0, yielding the two roots:

$$\lambda_1=\frac{\kappa\tau+1+\sqrt{(\kappa\tau-1)^2+4\kappa}}{2};\qquad \frac{d\lambda_1}{d\kappa}=\frac{\tau}{2}+\frac{\kappa\tau^2-\tau+2}{2\sqrt{(\kappa\tau-1)^2+4\kappa}}$$

(11A)

$$\lambda_2=\frac{\kappa\tau+1-\sqrt{(\kappa\tau-1)^2+4\kappa}}{2};\qquad \frac{d\lambda_2}{d\kappa}=\frac{\tau}{2}-\frac{\kappa\tau^2-\tau+2}{2\sqrt{(\kappa\tau-1)^2+4\kappa}}$$

(11B)
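Equations 11A and 11B can be sanity-checked numerically: both roots should satisfy $\det(W - \lambda I) = (\kappa\tau - \lambda)(1 - \lambda) - \kappa = 0$, their sum and product should equal the trace $\kappa\tau + 1$ and determinant $\kappa\tau - \kappa$ of *W*, and the analytical derivatives should match finite differences. A minimal Python sketch (hypothetical function names):

```python
import math


def eigenvalues(kappa, tau):
    """lambda_1 and lambda_2 of W = [[kappa*tau, 1], [kappa, 1]] (eqs. 11A, 11B)."""
    root = math.sqrt((kappa * tau - 1.0)**2 + 4.0 * kappa)
    return (kappa * tau + 1.0 + root) / 2.0, (kappa * tau + 1.0 - root) / 2.0


def eigenvalue_derivatives(kappa, tau):
    """d(lambda_1)/d(kappa) and d(lambda_2)/d(kappa) (eqs. 11A, 11B)."""
    root = math.sqrt((kappa * tau - 1.0)**2 + 4.0 * kappa)
    shift = (kappa * tau**2 - tau + 2.0) / (2.0 * root)
    return tau / 2.0 + shift, tau / 2.0 - shift
```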

(the derivatives will be used below). Two corresponding eigenvectors of *W* are

$$\overrightarrow{{t}_{1}}=\left[\begin{array}{c}\hfill 1-{\lambda}_{1}\hfill \\ \hfill -\kappa \hfill \end{array}\right],\overrightarrow{{t}_{2}}=\left[\begin{array}{c}\hfill 1-{\lambda}_{2}\hfill \\ \hfill -\kappa \hfill \end{array}\right]$$

and combine to give

$$T=\left[\overrightarrow{{t}_{1}}\phantom{\rule{1em}{0ex}}\overrightarrow{{t}_{2}}\right]=\left[\begin{array}{cc}\hfill 1-{\lambda}_{1}\hfill & \hfill 1-{\lambda}_{2}\hfill \\ \hfill -\kappa \hfill & \hfill -\kappa \hfill \end{array}\right],\;\text{and}\;{T}^{-1}=\frac{1}{\kappa \left({\lambda}_{1}-{\lambda}_{2}\right)}\left[\begin{array}{cc}\hfill -\kappa \hfill & \hfill {\lambda}_{2}-1\hfill \\ \hfill \kappa \hfill & \hfill 1-{\lambda}_{1}\hfill \end{array}\right]$$

Substituting these eigenvalues and eigenvectors into equation 10 gives a relatively simple closed-form expression for *q(n)*:

$$q\left(n\right)=\frac{\kappa \left(1-\tau \right)\left({{\lambda}_{1}}^{n}-{{\lambda}_{2}}^{n}\right)+{{\lambda}_{1}}^{n+1}-{{\lambda}_{2}}^{n+1}}{{\lambda}_{1}-{\lambda}_{2}}$$
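The closed-form $q(n)$ can be compared with direct matrix multiplication, and the diagonalization itself can be checked by rebuilding *W* from $TDT^{-1}$. In the sketch below (hypothetical names), *T* is assembled from the eigenvectors $\vec{t}_1 = [1-\lambda_1,\,-\kappa]^T$ and $\vec{t}_2 = [1-\lambda_2,\,-\kappa]^T$ as listed above, together with the $T^{-1}$ given in the text. Because $\lambda_1 - \lambda_2 = \sqrt{(\kappa\tau-1)^2 + 4\kappa} > 0$ for any $\kappa > 0$, the division in the closed form is always defined.

```python
import math


def eigen(kappa, tau):
    """Eigenvalues lambda_1 > lambda_2 of W (equations 11A and 11B)."""
    root = math.sqrt((kappa * tau - 1.0)**2 + 4.0 * kappa)
    return (kappa * tau + 1.0 + root) / 2.0, (kappa * tau + 1.0 - root) / 2.0


def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[r][0] * b[0][c] + a[r][1] * b[1][c] for c in range(2)]
            for r in range(2)]


def q_closed_form(n, kappa, tau):
    """Closed-form homopolymer partition function in terms of the eigenvalues."""
    l1, l2 = eigen(kappa, tau)
    return (kappa * (1.0 - tau) * (l1**n - l2**n)
            + l1**(n + 1) - l2**(n + 1)) / (l1 - l2)


def q_transfer(n, kappa, tau):
    """Reference: [0 1] W^n [1 1]^T, propagating the row vector one matrix at a time."""
    row = [0.0, 1.0]
    for _ in range(n):
        row = [row[0] * kappa * tau + row[1] * kappa, row[0] + row[1]]
    return row[0] + row[1]


def rebuild_W(kappa, tau):
    """T D T^-1 with t_i = [1 - lambda_i, -kappa]; should reproduce W."""
    l1, l2 = eigen(kappa, tau)
    T = [[1.0 - l1, 1.0 - l2], [-kappa, -kappa]]
    s = 1.0 / (kappa * (l1 - l2))
    T_inv = [[-kappa * s, (l2 - 1.0) * s], [kappa * s, (1.0 - l1) * s]]
    D = [[l1, 0.0], [0.0, l2]]
    return mat_mul(mat_mul(T, D), T_inv)
```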

By differentiating *q(n)* with respect to *κ* as in equation (6) above, the fraction of folded repeats (*θ*) can be calculated as

$$\theta =\frac{\kappa}{n}\left[\frac{\frac{\partial {\lambda}_{2}}{\partial \kappa}-\frac{\partial {\lambda}_{1}}{\partial \kappa}}{{\lambda}_{1}-{\lambda}_{2}}+\frac{\left(1-\tau \right)\left[{\lambda}_{1}^{n}-{\lambda}_{2}^{n}+\kappa n\left({\lambda}_{1}^{n-1}\frac{\partial {\lambda}_{1}}{\partial \kappa}-{\lambda}_{2}^{n-1}\frac{\partial {\lambda}_{2}}{\partial \kappa}\right)\right]+\left(n+1\right)\left({\lambda}_{1}^{n}\frac{\partial {\lambda}_{1}}{\partial \kappa}-{\lambda}_{2}^{n}\frac{\partial {\lambda}_{2}}{\partial \kappa}\right)}{\kappa \left(1-\tau \right)\left({\lambda}_{1}^{n}-{\lambda}_{2}^{n}\right)+{\lambda}_{1}^{n+1}-{\lambda}_{2}^{n+1}}\right]$$

(12)
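Equation 12 is easy to mistype when coding a fit, so it is worth checking against an independent calculation. This sketch (hypothetical names) evaluates *θ* from equations 11A, 11B, and 12, and compares it with the population-weighted average number of folded repeats obtained by enumerating all $2^n$ conformations for small *n*.

```python
import math
from itertools import product


def eigen_with_derivatives(kappa, tau):
    """lambda_1, lambda_2 and their kappa-derivatives (equations 11A and 11B)."""
    root = math.sqrt((kappa * tau - 1.0)**2 + 4.0 * kappa)
    shift = (kappa * tau**2 - tau + 2.0) / (2.0 * root)
    l1 = (kappa * tau + 1.0 + root) / 2.0
    l2 = (kappa * tau + 1.0 - root) / 2.0
    return l1, l2, tau / 2.0 + shift, tau / 2.0 - shift


def theta_homopolymer(n, kappa, tau):
    """Fraction of folded repeats, equation 12."""
    l1, l2, d1, d2 = eigen_with_derivatives(kappa, tau)
    q_num = kappa * (1.0 - tau) * (l1**n - l2**n) + l1**(n + 1) - l2**(n + 1)
    dq_num = ((1.0 - tau) * (l1**n - l2**n
                             + kappa * n * (l1**(n - 1) * d1 - l2**(n - 1) * d2))
              + (n + 1) * (l1**n * d1 - l2**n * d2))
    return kappa / n * ((d2 - d1) / (l1 - l2) + dq_num / q_num)


def theta_enumerated(n, kappa, tau):
    """Reference: population-weighted folded repeats over all 2**n states, over n."""
    q = folded = 0.0
    for s in product((0, 1), repeat=n):
        w = kappa**sum(s) * tau**sum(s[j] * s[j + 1] for j in range(n - 1))
        q += w
        folded += sum(s) * w
    return folded / (n * q)
```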

Values of *λ*_{1} and *λ*_{2}, along with derivatives with respect to *κ*, can be inserted into equation 12 from equations 11A and 11B above. The denaturant dependence of the fraction of folded repeats can be obtained by combining equations 8A (and if necessary, 8B) into equation 12. Finally, the fraction of folded repeats can be used to analyze experimental equilibrium denaturation curves to determine the underlying thermodynamic parameters through the equation

$${Y}_{\mathit{obs}}\left(\left[x\right],n\right)=\left({A}_{f}\left[x\right]+{B}_{f}\right)\theta \left(\left[x\right],n\right)+\left({A}_{u}\left[x\right]+{B}_{u}\right)\left(1-\theta \left(\left[x\right],n\right)\right)$$

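where $Y_{obs}$ represents an observed signal (often far-UV circular dichroism or tryptophan fluorescence). The terms $A_f[x]+B_f$ and $A_u[x]+B_u$ are linear baselines describing the signals of folded and unfolded repeats, respectively, as functions of denaturant concentration.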
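The observation model above maps *θ* onto a measured signal through two linear baselines. A minimal sketch follows; `sum_squared_residuals` is a hypothetical helper showing the objective a least-squares routine could minimize, with `theta_of_x` standing in for $\theta([x], n)$ computed from equation 12 (or 7) and equation 8A at the current trial parameters. In a real fit, the thermodynamic parameters and the four baseline terms would be optimized together, for example with a general-purpose least-squares optimizer.

```python
def observed_signal(theta, x, A_f, B_f, A_u, B_u):
    """Y_obs: linear folded and unfolded baselines weighted by theta and 1 - theta."""
    folded_baseline = A_f * x + B_f
    unfolded_baseline = A_u * x + B_u
    return folded_baseline * theta + unfolded_baseline * (1.0 - theta)


def sum_squared_residuals(baselines, concentrations, y_data, theta_of_x):
    """Objective a fitting routine could minimize (hypothetical helper).

    baselines  -- (A_f, B_f, A_u, B_u)
    theta_of_x -- callable returning theta([x], n) for the current
                  thermodynamic parameters, e.g. built from eqs. 8A and 12
    """
    A_f, B_f, A_u, B_u = baselines
    return sum((y - observed_signal(theta_of_x(x), x, A_f, B_f, A_u, B_u))**2
               for x, y in zip(concentrations, y_data))
```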

A primary motivation for analyzing consensus repeat protein unfolding is that each repeat can be considered to have the same stability and the same interaction energy with its neighbors, greatly decreasing the number of unknown thermodynamic parameters. However, repeat protein arrays built of a single consensus sequence seem to have solubility problems, likely owing to large hydrophobic interfaces present at the ends of each array. In crystal structures of a fragment of the Notch ankyrin domain, a head-to-head crystallographic dimer is seen (Lubman *et al*., 2005), suggesting that the end repeats can indeed mediate association by such an interface. Such associations are also seen crystallographically in superhelical consensus TPR arrays, where they actually displace the C-terminal capping helix (Kajander *et al*., 2007). Capping one or both termini with repeats bearing polar or charged substitutions solves this problem, but introduces new thermodynamic parameters and, more importantly, requires more complex models for analysis.

In this section, we will describe how the partition function for a heterogeneous repeat protein can be manipulated to simulate populations and folding transitions and, more importantly, fitted to equilibrium folding transitions. As above, we will use a matrix representation of the partition function, which again can be simplified from an open sum that enumerates each conformation. For generality, our derivation will treat each repeat as different, having different intrinsic folding ($\Delta G_i$) and interaction ($\Delta G_{i-1,i}$) energies.

We will start with the same matrix formulation we presented for homopolymers, and define a unique weight matrix for each repeat:

$$\begin{array}{cc}\hfill q\left(n\right)& =\left[0\phantom{\rule{1em}{0ex}}1\right]{W}_{1}{W}_{2}\cdots {W}_{n}\left[\begin{array}{c}\hfill 1\hfill \\ \hfill 1\hfill \end{array}\right]\hfill \\ \hfill {W}_{i}& =\left[\begin{array}{cc}\hfill {\kappa}_{i}{\tau}_{i-1,i}\hfill & \hfill 1\hfill \\ \hfill {\kappa}_{i}\hfill & \hfill 1\hfill \end{array}\right]\hfill \\ \hfill {\kappa}_{i}& ={e}^{-\Delta {G}_{i}/RT}\hfill \\ \hfill {\tau}_{i-1,i}& ={e}^{-\Delta {G}_{i-1,i}/RT}\hfill \end{array}$$

(13)
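Equation 13 can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes Python with NumPy, uses hypothetical function names and hypothetical parameter values in kcal/mol, and takes RT = 0.593 kcal/mol (roughly 25 °C). It multiplies the position-specific weight matrices and compares the result with an explicit sum over all $2^{n}$ conformations.

```python
import itertools

import numpy as np

RT = 0.593  # kcal/mol at roughly 25 °C (assumed value)


def weight_matrix(dG_i, dG_int):
    """W_i of equation 13. Rows: state of repeat i-1 (folded, unfolded);
    columns: state of repeat i (folded, unfolded)."""
    kappa = np.exp(-dG_i / RT)
    tau = np.exp(-dG_int / RT)
    return np.array([[kappa * tau, 1.0],
                     [kappa, 1.0]])


def partition_function(dG, dG_int):
    """q(n) = [0 1] W_1 ... W_n [1 1]^T.

    dG[i] is the intrinsic folding energy of repeat i; dG_int[i] is the energy of
    the interface between repeats i-1 and i. dG_int[0] never contributes, because
    the row vector [0 1] starts from an unfolded "repeat 0"."""
    row = np.array([0.0, 1.0])
    for g, g_int in zip(dG, dG_int):
        row = row @ weight_matrix(g, g_int)
    return row.sum()


def partition_function_brute_force(dG, dG_int):
    """Open sum of Boltzmann factors over all 2^n conformations (1 = folded)."""
    n = len(dG)
    q = 0.0
    for states in itertools.product((0, 1), repeat=n):
        energy = sum(dG[i] for i in range(n) if states[i])
        energy += sum(dG_int[i] for i in range(1, n) if states[i] and states[i - 1])
        q += np.exp(-energy / RT)
    return q


# Hypothetical parameters (kcal/mol) for a four-repeat heteropolymer.
dG = [2.0, 3.5, 3.5, 1.0]
dG_int = [0.0, -6.0, -6.0, -5.0]
print(np.isclose(partition_function(dG, dG_int),
                 partition_function_brute_force(dG, dG_int)))  # True
```

The matrix product needs only $n$ small multiplications, whereas the brute-force sum grows as $2^{n}$; the latter is included only as a check.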

As demonstrated above for a homopolymer, the rules of matrix multiplication combine statistical weights so as to produce the appropriate Boltzmann factor for each conformation. That derivation, which expressed *q(n)* in terms of *q(n−1)*, *q(n−2)*, …, can easily accommodate unique, position-specific coefficients, rather than a single value of *κ* and of *τ*, to generate *q(n)* as in equation 13. The index on the interaction parameter in equation 13 represents the interaction between repeat *i* and the previous repeat (*i−1*), because the rows of the statistical weight matrix represent the folding status of the previous repeat. In the partition function for the homopolymer, diagonalization provides a huge simplification, converting a product of *n* identical matrices into a product of only three ($TD^{n}T^{-1}$). This is not possible for the heteropolymer partition function, because the weight matrices differ from repeat to repeat and, in general, cannot all be diagonalized by the same transformation.

As described above, the quantity of greatest interest for connecting with experiments is the fraction of repeats folded, *θ*. For homopolymeric systems, an expression for *θ* could be generated by differentiating the partition function with respect to *κ* and dividing by *q* (see equation 6). With the closed-form homopolymer partition function, this operation is mathematically quite simple. Here, not only is the partition function more complex, but there is no single value of *κ* that can serve as a counter of folded repeats. Moreover, calculating an open sum of populations over all possible conformations, each multiplied by its number of folded repeats, is cumbersome ($2^{n}$ terms), and for large arrays of repeats, fitting in this way requires significant computer memory.

Instead, we favor a summation over the *n* repeat positions, calculating the probability that each of the *n* repeats is folded rather than the probability of each of the $2^{n}$ conformations. The fraction of repeats that are folded is simply the average, over all positions, of the probability that a given repeat is folded:

$$\theta =\frac{1}{n}\sum _{i=1}^{n}{\theta}_{i}$$

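where $\theta_{i}$ is the probability of finding repeat *i* in the folded state. This probability is the ratio of $q_{i}$, the sum of the statistical weights of all conformations in which repeat *i* is folded, to the full partition function: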

$${\theta}_{i}=\frac{{q}_{i}}{q\left(n\right)}$$

giving

$$\theta =\frac{1}{nq\left(n\right)}\sum _{i=1}^{n}{q}_{i}$$

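This summation emphasizes the fact that *q(n)* only needs to be calculated once. In contrast, $q_{i}$ needs to be calculated separately for each of the *n* positions, by replacing the $i^{th}$ statistical weight matrix with one whose second column is set to zero: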

$$q_{i}=\begin{bmatrix}0 & 1\end{bmatrix} W_{1}W_{2}\cdots W_{i-1}\begin{bmatrix}\kappa_{i}\tau_{i-1,i} & 0\\ \kappa_{i} & 0\end{bmatrix}W_{i+1}W_{i+2}\cdots W_{n}\begin{bmatrix}1\\ 1\end{bmatrix}$$

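In the statistical weight matrix, the second column corresponds to all of the conformations in which the $i^{th}$ repeat is unfolded. Setting this column to zero in the $i^{th}$ matrix eliminates these conformations, so that $q_{i}$ sums only over conformations in which repeat *i* is folded.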
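The per-position summation can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it assumes Python with NumPy, hypothetical helper names, hypothetical parameters in kcal/mol, RT = 0.593 kcal/mol, and 0-based repeat indices in the code (`folded_at=0` is repeat 1). It computes $q(n)$ once, each $q_{i}$ with the second column of $W_{i}$ zeroed, and checks $\theta_{i}$ against a brute-force enumeration.

```python
import itertools

import numpy as np

RT = 0.593  # kcal/mol at roughly 25 °C (assumed value)


def chain_product(dG, dG_int, folded_at=None):
    """[0 1] W_1 ... W_n [1 1]^T. If folded_at (0-based) is given, the second
    column of that repeat's matrix is zeroed, giving q_i instead of q(n)."""
    row = np.array([0.0, 1.0])
    for i, (g, g_int) in enumerate(zip(dG, dG_int)):
        kappa, tau = np.exp(-g / RT), np.exp(-g_int / RT)
        W = np.array([[kappa * tau, 1.0],
                      [kappa, 1.0]])
        if i == folded_at:
            W[:, 1] = 0.0  # remove conformations in which repeat i is unfolded
        row = row @ W
    return row.sum()


def fraction_folded(dG, dG_int):
    """Per-repeat probabilities theta_i = q_i / q(n) and their mean, theta."""
    q = chain_product(dG, dG_int)  # q(n) is computed only once
    theta_i = np.array([chain_product(dG, dG_int, folded_at=i)
                        for i in range(len(dG))]) / q
    return theta_i, theta_i.mean()


def fraction_folded_brute_force(dG, dG_int):
    """Same quantities from an explicit sum over all 2^n conformations."""
    n = len(dG)
    states = np.array(list(itertools.product((0, 1), repeat=n)), dtype=float)
    energies = states @ np.asarray(dG, dtype=float)
    # Interface between repeats j and j+1 contributes only if both are folded.
    energies += (states[:, 1:] * states[:, :-1]) @ np.asarray(dG_int[1:], dtype=float)
    weights = np.exp(-energies / RT)
    theta_i = weights @ states / weights.sum()
    return theta_i, theta_i.mean()


dG = [2.0, 3.5, 3.5, 1.0]          # hypothetical intrinsic energies, kcal/mol
dG_int = [0.0, -6.0, -6.0, -5.0]   # hypothetical interfaces; dG_int[0] is unused
theta_i, theta = fraction_folded(dG, dG_int)
print(theta_i, theta)
```

The cost of this approach grows as roughly $n^{2}$ small matrix products, compared with the $2^{n}$ terms of the open sum.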

The above sections derive equations for nearest-neighbor partition functions for repeat protein folding. These partition functions can be used to evaluate populations of partly folded states and to generate folding curves, given a set of thermodynamic parameters ($\Delta G_{i}$, $\Delta G_{i-1,i}$).

Much has been written regarding criteria for testing different models and estimating uncertainties in parameter values, given a set of experimental data (see (Johnson, 2008) for a recent review). Models are typically rejected on the basis of non-random residuals and/or physically unreasonable fitted parameter values. Confidence intervals on parameter values can be estimated by resampling methods such as bootstrap analysis, jackknife analysis, or simple repetition of the experiment (which differ in their severity), by analysis of the parameter covariance matrix, by systematic exploration of how the variance of the fit increases as parameters are varied, and by Monte Carlo simulation (Johnson, 2008). Unfortunately, these critical tests usually come after data have been collected. Experimental analysis of repeat protein folding is a laborious undertaking: it involves cloning multiple genes, expressing and purifying multiple proteins of different lengths, and quantitatively analyzing each protein (preferably multiple times) by denaturant titration. It would therefore be good to know in advance whether such efforts are likely to yield significant thermodynamic insight.
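As a generic illustration of one of the resampling methods mentioned above, the sketch below applies a residual bootstrap to a deliberately simple linear model for uncapped consensus arrays, $\Delta G° = n\,\Delta G_{R} + (n-1)\,\Delta G_{i,i+1}$, fitted to synthetic noisy free energies. Everything here is an assumption for illustration (Python with NumPy, array lengths, parameter values, noise level, number of resamples); it is not necessarily the error analysis used in the studies discussed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical design: uncapped consensus arrays of 3 to 8 repeats.
n_repeats = np.arange(3, 9)
A = np.column_stack([n_repeats, n_repeats - 1]).astype(float)  # counts of [dG_R, dG_int]

true_params = np.array([4.0, -7.0])  # hypothetical [dG_R, dG_int], kcal/mol
dG_obs = A @ true_params + rng.normal(0.0, 0.3, size=len(n_repeats))  # synthetic data

fit = np.linalg.lstsq(A, dG_obs, rcond=None)[0]
residuals = dG_obs - A @ fit

# Residual bootstrap: add resampled residuals to the fitted values and refit.
n_boot = 2000
boot = np.empty((n_boot, 2))
for b in range(n_boot):
    synthetic = A @ fit + rng.choice(residuals, size=residuals.size, replace=True)
    boot[b] = np.linalg.lstsq(A, synthetic, rcond=None)[0]

low, high = np.percentile(boot, [2.5, 97.5], axis=0)  # 95% percentile intervals
for name, est, lo, hi in zip(["dG_R", "dG_int"], fit, low, high):
    print(f"{name}: {est:.2f} kcal/mol (95% CI {lo:.2f} to {hi:.2f})")
```

Because the two columns of this design (numbers of repeats and of interfaces) are strongly correlated, the bootstrap intervals can be wide even with six constructs, which is the kind of outcome it would be useful to anticipate before collecting data.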

Although acquiring data before testing parameters and models is largely unavoidable, experiments can often be designed *a priori* so that parameters of interest can be determined with confidence and alternative models can be compared and discriminated. This is particularly true for repeat proteins, given their simple linear architecture and the simple form of the linear free energy relationships implicit in the linear Ising model. Here we will describe how equilibrium folding studies on repeat proteins can be designed to maximize the information content of the results within the framework of a particular thermodynamic model. In addition to helping design future experiments, these ideas help in interpreting published studies of repeat-protein folding.

By considering the free energies of folding of a collection of repeat proteins of different length as a system of linear equations, simple ideas from linear algebra relating to solvability can be used to determine whether parameters are likely to be well-determined, and if not, what additional constructs would be required to improve the situation. For a set of repeat proteins of different length and composition, the free energy difference between the fully folded and fully unfolded states can be written as

$$\Delta G^{\circ}=\sum_{k\ \text{repeat types}} n_{k}\,\Delta G_{i;k}+\sum_{j\ \text{interface types}} n_{j}\,\Delta G_{i,i+1;j}$$

The first sum accounts for the different intrinsic energy terms, and the second accounts for the different interaction terms. Table 1 provides some examples, both for a homopolymeric repeat protein and for a heteropolymeric repeat protein with unique N- and C-terminal caps.
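The coefficients $n_{k}$ and $n_{j}$ are simple counts. A minimal sketch, assuming Python with NumPy, a hypothetical one-letter string encoding of constructs ("N" and "C" for caps, "R" for consensus repeats), hypothetical helper names, and a single interface type, builds rows of the coefficient matrix from a construct's composition. For uncapped arrays of three to five repeats and capped arrays with one to four internal repeats, it reproduces the coefficient matrices of the two systems of equations given below.

```python
import numpy as np


def coefficient_row(construct, repeat_types):
    """Counts of each intrinsic repeat type in a construct written as a string
    (e.g. "NRRRC"), followed by its number of nearest-neighbor interfaces.
    Assumes a single interface type shared by all neighboring pairs."""
    return [construct.count(t) for t in repeat_types] + [len(construct) - 1]


def coefficient_matrix(constructs, repeat_types):
    """One row per construct; columns follow repeat_types, then interfaces."""
    return np.array([coefficient_row(c, repeat_types) for c in constructs])


# Uncapped consensus arrays of three to five repeats.
uncapped = coefficient_matrix(["RRR", "RRRR", "RRRRR"], ["R"])
# N- and C-capped arrays with one to four consensus repeats.
capped = coefficient_matrix(["NRC", "NRRC", "NRRRC", "NRRRRC"], ["N", "R", "C"])
print(uncapped)
print(capped)
```

Allowing several interface types (for example, cap-repeat versus repeat-repeat) would require counting adjacent letter pairs instead of using `len(construct) - 1`.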

For a set of consensus repeats without caps (lines A-C, Table 1), the three free energy equations can be written as

$$\begin{bmatrix}3 & 2\\ 4 & 3\\ 5 & 4\end{bmatrix}\begin{bmatrix}\Delta G_{R}\\ \Delta G_{i,i+1}\end{bmatrix}=\begin{bmatrix}\Delta G^{\circ}_{A}\\ \Delta G^{\circ}_{B}\\ \Delta G^{\circ}_{C}\end{bmatrix}$$

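where $\Delta G^{\circ}_{A}$ is the free energy difference between the native and denatured states for the reaction defined on line A, and the other free energies are defined analogously. Because the two columns of the coefficient matrix are linearly independent, the matrix has full column rank (*r=2*), and both parameters can be determined uniquely.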
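Because the columns of this coefficient matrix are independent, $\Delta G_{R}$ and $\Delta G_{i,i+1}$ can be recovered by least squares, which also handles the overdetermined case of three equations in two unknowns (with noisy data the solution is approximate rather than exact). A minimal sketch, assuming Python with NumPy and hypothetical parameter values used only to generate noiseless test free energies:

```python
import numpy as np

# Coefficient matrix for lines A-C: [number of repeats, number of interfaces].
A = np.array([[3, 2],
              [4, 3],
              [5, 4]], dtype=float)

true_params = np.array([4.0, -7.0])  # hypothetical [dG_R, dG_i,i+1], kcal/mol
dG_obs = A @ true_params             # noiseless "measured" unfolding free energies

params, _, rank, _ = np.linalg.lstsq(A, dG_obs, rcond=None)
print(rank)    # 2 (full column rank)
print(params)  # recovers the hypothetical parameters, up to rounding
```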

For a set of consensus repeats with caps (lines D-G, Table 1), the free energy equations can be written as

$$\begin{bmatrix}1 & 1 & 1 & 2\\ 1 & 2 & 1 & 3\\ 1 & 3 & 1 & 4\\ 1 & 4 & 1 & 5\end{bmatrix}\begin{bmatrix}\Delta G_{N}\\ \Delta G_{R}\\ \Delta G_{C}\\ \Delta G_{i,i+1}\end{bmatrix}=\begin{bmatrix}\Delta G^{\circ}_{D}\\ \Delta G^{\circ}_{E}\\ \Delta G^{\circ}_{F}\\ \Delta G^{\circ}_{G}\end{bmatrix}$$

Although there are enough equations to solve for four unknowns (the column vector on the left-hand side), the columns of the coefficient matrix are not independent: the first and third columns are equal, and the sum of the first and second columns equals the fourth. Thus, the matrix lacks full column rank (again, *r=2*), and this set of linear equations has an infinite number of solutions; the parameters cannot be uniquely determined by elimination. This problem will not be rectified by including additional equations (constructs) that retain both N- and C-terminal capping repeats.
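The degeneracy described above can be made concrete. In this minimal sketch (Python with NumPy; the parameter values are hypothetical), the two column relations stated in the text define two null-space directions, and adding any combination of them to a parameter set leaves every predicted $\Delta G°$ unchanged.

```python
import numpy as np

# Coefficient matrix for lines D-G: columns [dG_N, dG_R, dG_C, dG_i,i+1].
A = np.array([[1, 1, 1, 2],
              [1, 2, 1, 3],
              [1, 3, 1, 4],
              [1, 4, 1, 5]], dtype=float)
print(np.linalg.matrix_rank(A))  # 2

# Null-space directions from the column relations:
# column 1 - column 3 = 0, and column 1 + column 2 - column 4 = 0.
null_1 = np.array([1.0, 0.0, -1.0, 0.0])
null_2 = np.array([1.0, 1.0, 0.0, -1.0])

p1 = np.array([1.0, 4.0, 1.0, -7.0])     # hypothetical [dG_N, dG_R, dG_C, dG_int]
p2 = p1 + 2.0 * null_1 + 1.5 * null_2    # a very different parameter set
print(np.allclose(A @ p1, A @ p2))        # True: identical predicted free energies
```

Any measured set of $\Delta G°$ values is therefore fitted equally well by the whole two-parameter family $p_{1} + a\,\mathrm{null}_{1} + b\,\mathrm{null}_{2}$.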

Instead, if a set of four (or more) constructs is considered in which the caps vary along with the length, unique intrinsic folding energies can be determined for both the N- and C-terminal caps. For example, lines B, F, H, and J of Table 1 define the system of equations

$$\begin{bmatrix}0 & 4 & 0 & 3\\ 1 & 3 & 1 & 4\\ 1 & 3 & 0 & 3\\ 0 & 3 & 1 & 3\end{bmatrix}\begin{bmatrix}\Delta G_{N}\\ \Delta G_{R}\\ \Delta G_{C}\\ \Delta G_{i,i+1}\end{bmatrix}=\begin{bmatrix}\Delta G^{\circ}_{B}\\ \Delta G^{\circ}_{F}\\ \Delta G^{\circ}_{H}\\ \Delta G^{\circ}_{J}\end{bmatrix}$$

The columns of this matrix are now independent, showing full column (and row) rank (*r=4*). Thus, the four thermodynamic parameters can be uniquely determined (although adding equations by including additional constructs will likely improve the robustness of the solution, given uncertainties in free energy measurements).
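A quick numerical check of this design, as a minimal sketch assuming Python with NumPy and hypothetical parameter values: the matrix has rank 4, so `numpy.linalg.solve` returns the single parameter set consistent with the four free energies. The row comments describe each construct only by the counts in its row.

```python
import numpy as np

A = np.array([[0, 4, 0, 3],   # line B: four consensus repeats, no caps
              [1, 3, 1, 4],   # line F: N-cap, three repeats, C-cap
              [1, 3, 0, 3],   # line H: N-cap, three repeats
              [0, 3, 1, 3]],  # line J: three repeats, C-cap
             dtype=float)
print(np.linalg.matrix_rank(A))  # 4

true_params = np.array([1.0, 4.0, 2.0, -7.0])  # hypothetical [dG_N, dG_R, dG_C, dG_int]
dG_obs = A @ true_params
print(np.linalg.solve(A, dG_obs))  # recovers true_params, up to rounding
```

With more than four constructs, `numpy.linalg.lstsq` would replace `solve`, and the extra equations would help average out measurement error.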

In principle, this type of analysis could be applied directly to experimental unfolding free energies determined by linear extrapolation from denaturant-induced unfolding transitions (Pace, 1986; Street *et al*., 2008) assuming a two-state (high cooperativity) model. However, if partly folded states are populated in the transition, either because of moderate values of *ΔG _{i,i+1}* or because stability is unevenly distributed along the repeat array, such free energy estimates will be incorrect. In such cases, globally fitting the denaturation transitions directly using an Ising model, which takes partly folded states into account, may improve estimates of free energy terms, in favorable cases providing access to parameters that could not be determined based on considerations of matrix rank above (see discussion of consensus ankyrin arrays below). Nonetheless, this simple analysis is extremely useful both for thinking about what constructs need to be studied to analyze a particular model, and for thinking about why certain parameters don’t appear to be well-determined, given a set of data. This type of rank analysis can also be applied to models that can accommodate differences between interfaces, models that include non-nearest-neighbor interactions, and by differentiation with respect to denaturant concentration, partitioning of
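Globally fitting denaturation transitions requires a way to compute a transition from Ising parameters. The sketch below is one hypothetical way to do that, not the procedure of any study cited here. It assumes Python with NumPy, RT = 0.593 kcal/mol, and, purely for illustration, a linear dependence on denaturant concentration $x$ confined to the intrinsic terms, $\Delta G_{i}(x) = \Delta G_{i}(0) + m x$, with a single hypothetical *m* and constant interface energies. All parameter values are hypothetical.

```python
import numpy as np

RT = 0.593  # kcal/mol at roughly 25 °C (assumed value)


def theta(dG, dG_int):
    """Fraction of repeats folded, theta = (1/n) sum_i q_i / q(n)."""
    def chain(folded_at=None):
        row = np.array([0.0, 1.0])
        for i, (g, g_int) in enumerate(zip(dG, dG_int)):
            kappa, tau = np.exp(-g / RT), np.exp(-g_int / RT)
            W = np.array([[kappa * tau, 1.0], [kappa, 1.0]])
            if i == folded_at:
                W[:, 1] = 0.0  # keep only conformations with repeat i folded
            row = row @ W
        return row.sum()

    q = chain()
    return np.mean([chain(i) for i in range(len(dG))]) / q


def unfolding_curve(dG0, dG_int, m, denaturant):
    """Assumed linear free energy relation for intrinsic terms only:
    dG_i(x) = dG_i(0) + m * x; interface energies are held constant."""
    return np.array([theta([g + m * x for g in dG0], dG_int) for x in denaturant])


denaturant = np.linspace(0.0, 8.0, 81)   # molar; hypothetical range
dG0 = [3.0, 4.0, 4.0, 4.0, 3.0]          # hypothetical intrinsic energies, kcal/mol
dG_int = [0.0, -8.0, -8.0, -8.0, -8.0]   # hypothetical interfaces; dG_int[0] unused
m = 1.0                                  # hypothetical m-value, kcal/mol/M
curve = unfolding_curve(dG0, dG_int, m, denaturant)
midpoint = denaturant[np.argmin(np.abs(curve - 0.5))]
print(f"theta at 0 M: {curve[0]:.3f}; approximate midpoint: {midpoint:.1f} M")
```

In a global fit, a function like `unfolding_curve` would be evaluated for every construct with shared parameters and compared with all transitions simultaneously; because partly folded states are included, no two-state assumption is needed.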

The first study in which a homopolymeric Ising model was used to analyze repeat protein folding involved a collection of consensus TPR arrays of different lengths (Kajander *et al*., 2005). As described above, TPR units are composed of two antiparallel α-helices (termed A and B) and are arranged in a linear array in which adjacent repeats twist along the long axis of the domain, like the steps of a spiral staircase (Figure 1C). Using TPR units of identical consensus sequence (termed CTPRa*n* by the authors, where *n* is the number of full 34-residue TPR units in a given construct), Regan and coworkers created a series of constructs of different lengths that were amenable to analysis using a homopolymeric Ising model (section III above). However, as with other consensus repeat arrays, to make their CTPR proteins soluble, the authors added a polar C-terminal capping helix (a variant of helix A with four polar substitutions).

By monitoring helical structure by CD spectroscopy as a function of guanidine hydrochloride concentration, Kajander et al. generated and analyzed unfolding transitions for constructs containing from two to ten full TPR repeats plus the C-terminal cap (CTPRa2 to CTPRa10; data reproduced from Fig. 2 of (Kajander *et al*., 2005)). The authors developed a homopolymer partition function in which each *helix*, rather than each repeat, is treated as the repeating unit. Applying the homopolymer approximation at the single-helix level treats the A and B helices (and the C-terminal capping helix) as energetically equivalent, both in intrinsic stability and in nearest-neighbor interaction. Using this model, Kajander et al. were able to globally fit all of these transitions (and, in a subsequent paper, even longer constructs (Kajander *et al*., 2007)) with a single intrinsic folding term and a single interfacial interaction term (Kajander *et al*., 2005), clearly demonstrating the applicability of the linear Ising model to repeat protein folding.

Several aspects of this seminal study warrant further discussion. First, Kajander et al. phrased the interaction energies in a way that is closer to the original magnetic spin-spin interactions than the formulation described above (Kajander *et al*., 2005). Although at first glance the two representations look different, they can be shown to be identical, and the CTPR unfolding data can be fitted equally well with either formulation of the homopolymer Ising model. The curves in Figure 3 were generated by fitting the model derived above to data from (Kajander *et al*., 2005); nearly identical fits and $\chi^{2}$ values are obtained with their representation of the model. Moreover, parameters from the two formulations are nearly identical when converted using relationships given previously (Kloss

Second, fitted parameter values (*ΔG _{i}*,

Third, although treatment of the A and B helices as identical is clearly consistent with the published data, it would be surprising if the two helices were thermodynamically identical. The A and B helices have virtually no sequence similarity in the consensus design (Main *et al*., 2003). Moreover, structural analysis shows that the packing interactions of helices A and B differ substantially. Whereas the B-helices interact mostly with A-helices and lack contacts with one another, the A-helices contact neighboring A-helices from adjacent TPRs as well as their flanking B-helices, as can be seen from the zig-zag patterns in CTPR contact maps (Kajander *et al*., 2007). Adjacent A-helices are separated by two units in a single-helix Ising model; thus, close contacts between adjacent A-helices would suggest a more complex model that has non-nearest-neighbor terms ($\Delta G_{i,i+2}$). In addition, the C-terminal polar cap may be expected to introduce further complexity, as its folding energy may differ significantly even from that of the A-helix from which it is derived.

Given all of these sequence complexities, why not use a more complicated model to describe CTPR folding? One answer to this question is that a simple model works just fine. But does that mean the simple model is right? Given the differences between the two types of helices, a more complex model in which the A and B helices are treated differently makes more physical sense. Unfortunately, all of the CTPR constructs in Kajander et al. have the same number of A and B helices, and thus it is not possible to separate the relative contributions of the two. Consideration of the free energy equations describing these constructs in terms of separate A and B helices makes this clear:

$$\begin{bmatrix}3 & 2 & 4\\ 4 & 3 & 6\\ 5 & 4 & 8\\ 7 & 6 & 12\\ 9 & 8 & 16\\ 11 & 10 & 20\end{bmatrix}\begin{bmatrix}\Delta G_{A}\\ \Delta G_{B}\\ \Delta G_{i,i+1}\end{bmatrix}=\begin{bmatrix}\Delta G_{\mathit{CTPRa2}}\\ \Delta G_{\mathit{CTPRa3}}\\ \Delta G_{\mathit{CTPRa4}}\\ \Delta G_{\mathit{CTPRa6}}\\ \Delta G_{\mathit{CTPRa8}}\\ \Delta G_{\mathit{CTPRa10}}\end{bmatrix}$$

The coefficient matrix on the left-hand side has a rank of only 2, and thus there are an infinite number of solutions to this set of equations. Treating each helix as identical simply adds columns 1 and 2, making the unknown corresponding to this column the sum of $\Delta G_{A}$ and $\Delta G_{B}$.
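The rank deficiency can be confirmed numerically. In this minimal sketch (Python with NumPy; the parameter values are hypothetical), the rows are the six CTPR constructs and the columns count A helices (including the A-derived C-terminal cap), B helices, and helix-helix interfaces. The right-singular vector with zero singular value identifies the combination of $\Delta G_{A}$, $\Delta G_{B}$ and $\Delta G_{i,i+1}$ that these constructs leave unconstrained.

```python
import numpy as np

# Rows: CTPRa2, a3, a4, a6, a8, a10.
# Columns: A helices (including the cap), B helices, interfaces.
A = np.array([[3, 2, 4],
              [4, 3, 6],
              [5, 4, 8],
              [7, 6, 12],
              [9, 8, 16],
              [11, 10, 20]], dtype=float)
print(np.linalg.matrix_rank(A))  # 2

# Singular values are sorted in decreasing order, so with rank 2 of 3 columns
# the last right-singular vector spans the one-dimensional null space.
_, s, Vt = np.linalg.svd(A)
null_vec = Vt[-1]
print(np.allclose(A @ null_vec, 0.0))  # True

p1 = np.array([2.0, 2.0, -3.0])   # hypothetical [dG_A, dG_B, dG_i,i+1]
p2 = p1 + 1.7 * null_vec          # a different parameter set
print(np.allclose(A @ p1, A @ p2))  # True: identical predicted free energies
```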

Consensus ankyrin repeats have been available for some time (Mosavi *et al*., 2002; Binz *et al*., 2003), and have been used successfully as a platform for protein design (Steiner *et al*., 2008). However, Ising analysis has been applied to the folding of consensus ankyrin repeats only relatively recently (Wetzel *et al*., 2008). To maintain solubility, Pluckthun and coworkers added capping repeats at both termini (called *N* and *C*, respectively). This modification is similar to the C-terminal TPR capping helix of Regan and coworkers, although the capping *N* and *C* ankyrin repeats designed by Pluckthun and coworkers differ significantly from the consensus sequence, with only 15/33 and 8/24 identities, respectively.

Using guanidine hydrochloride-induced unfolding, Pluckthun and coworkers obtained complete, reversible unfolding transitions suitable for Ising analysis for three constructs, $NI_{1}C$, $NI_{2}C$, and $NI_{3}C$.

As can be seen from the solid lines in Figure 4, this model describes the three fitted unfolding transitions reasonably well. Fitted parameters from Wetzel et al. (Wetzel *et al*., 2008) are listed in Table 2, along with confidence intervals provided by the authors. Again, there is no description of how these confidence intervals were determined. Using the heteropolymer partition function described above, and the same bootstrap method of error analysis used for the CTPR arrays, we obtain intrinsic and interfacial energies that agree within 1-2.5 kcal/mol, although we find significantly larger margins of uncertainty on the fitted parameters than the authors report; these margins are also larger than those obtained by the same error analysis of the CTPR data. One reason for the high level of parameter uncertainty may be that none of the three analyzed constructs has its caps removed, making it difficult to separate the caps' contribution to the free energy from the other parameters. Representing the constructs as a system of linear equations with a single cap free energy gives

$$\begin{bmatrix}2 & 1 & 2\\ 2 & 2 & 3\\ 2 & 3 & 4\end{bmatrix}\begin{bmatrix}\Delta G_{\mathrm{cap}}\\ \Delta G_{i}\\ \Delta G_{i,i+1}\end{bmatrix}=\begin{bmatrix}\Delta G^{\circ}_{NI_{1}C}\\ \Delta G^{\circ}_{NI_{2}C}\\ \Delta G^{\circ}_{NI_{3}C}\end{bmatrix}$$

In the coefficient matrix, half the first column plus the second column is equal to the third column, giving a rank of only 2, and again, an infinite number of solutions. Although at face value, this would severely compromise the accuracy of the fitted parameters, one feature of the unfolding transitions of Wetzel et al. may significantly narrow parameter confidence intervals: the appearance of a partial unfolding transition in the long native baseline of *NI _{3}C*. Interpreted as a separate unfolding event involving one or both caps, this pre-transition provides additional information about the stability of the caps relative to the internal repeats. It is as if, from this region of the unfolding transition, the authors have prepared the construct
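The rank argument for the ankyrin constructs, and the effect of adding a construct that lacks one cap, can be checked in a few lines. This is a minimal sketch assuming Python with NumPy; it keeps the single shared cap free energy of the system above, and the extra construct (one cap, three internal repeats, three interfaces) is hypothetical.

```python
import numpy as np

# Columns: number of caps, internal consensus repeats, interfaces.
# Rows: NI1C, NI2C, NI3C (one shared cap free energy, as in the system above).
A = np.array([[2, 1, 2],
              [2, 2, 3],
              [2, 3, 4]], dtype=float)
print(np.linalg.matrix_rank(A))  # 2

# (1/2) column 1 + column 2 - column 3 = 0, scaled by 2.
null_vec = np.array([1.0, 2.0, -2.0])
print(np.allclose(A @ null_vec, 0.0))  # True

# Hypothetical extra construct with one cap removed: 1 cap, 3 repeats, 3 interfaces.
A_ext = np.vstack([A, [1, 3, 3]])
print(np.linalg.matrix_rank(A_ext))  # 3: the degeneracy is lifted
```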

A more direct way to obtain information on the contribution of the caps would be to prepare constructs that lack the caps. Although ankyrin consensus arrays lacking both caps show poor solubility, we have been able to prepare arrays that lack either one cap or the other (we will refer to these as *NR _{n}* and

By independently removing the capping repeats, we have been able to test a number of different parameterizations of the Ising model to determine the relative intrinsic stabilities and contributions of the caps to denaturant-induced unfolding. The fits shown in Figure 5 are from an Ising model with separate intrinsic free energies for each cap and for the consensus sequence ($\Delta G_{N}$, $\Delta G_{R}$, $\Delta G_{C}$) and a single interfacial energy ($\Delta G_{i,i+1}$).

Overall, the two consensus ankyrin repeat studies show a similar view of cooperativity, in which the individual repeats are unstable and the interfacial interaction is highly stabilizing (Table 2). Again, this is consistent with the high degree of cooperativity seen in solution, because single folded repeats should be rare, and conformations with a large number of interfaces (blocks of consecutive folded repeats) should predominate. Although this is qualitatively similar to what was seen in the CTPR study, cooperativity is much higher for the consensus ankyrin arrays. This is especially clear when the fitted Ising parameters from the CTPR studies are converted to whole-repeat (rather than single-helix) parameters. The intrinsic folding energy of an entire CTPR (*ΔG _{i,helix}*+

The fitted Ising parameters for the two ankyrin consensus arrays are in reasonable agreement (Table 2). Fitted *ΔG _{i}* values for consensus repeats and

The studies featured in this article show quite clearly that a simple nearest-neighbor model that has been highly successful in describing a wide variety of cooperative phenomena can be used to study repeat-protein folding, and extract quantitative interaction energies from real data. Although Ising-like models have been applied to model globular protein folding (for example, see (Munoz, 2001)), the heterogeneity of globular proteins and their intrachain contacts makes such models overparameterized, requiring assumptions about energy terms that come from informatics or from native state structures, rather than from first principles or measurements. A recent retrospective from Harold Scheraga, one of the major contributors to the application of Ising analysis to biopolymers, states of his epic research trajectory “it was soon realized that the helix–coil transition is not a good model for conformational changes in globular proteins, because the one-dimensional Ising model does not capture the cooperative features, embodied in the interplay between short- and long-range interactions, of the folding/unfolding transition of globular proteins” (Scheraga, 2008). Although repeat proteins differ from globular proteins in that they have structural simplicity and are somewhat elongated, they are the same in many other key respects. They have large, continuous hydrophobic cores, they have significant medium and long-range electrostatic interactions (Kloss and Barrick, 2008; Merz *et al*., 2008), and they are highly cooperative (Kloss *et al*., 2008). Thus, repeat-proteins provide a unique experimental system to dissect protein folding using this elegant model.

One of the most exciting aspects of the work featured here is that it provides an opportunity to understand protein folding cooperativity in quantitative and structural detail. Determination of *ΔG _{i,i+1}* provides a direct measure of long-range coupling within a folded protein. Further analysis of repeat proteins using the 1D Ising model should reveal not only the structural origins of this cooperativity, but how such cooperativity influences the kinetics of folding.

This work was supported by NIH grant RO1GM068462 to DB.

^{1}If there are experimental errors associated with the column on the right hand side, the solution will be inexact, but can be found using least-squares.

^{2}To obtain a more rigorous measure of parameter uncertainties, we have measured each unfolding transition at least three times, allowing us to use resampling methods to fit separate transitions and compare results. This resampling approach, which employs more data, cannot be directly compared with the other studies analyzed here, but it gives similar confidence intervals to those from the bootstrap method.

- Applequist J, Damle V. Thermodynamics of the Helix-Coil Equilibrium in Oligoadenylic Acid from Hypochromicity Studies. J. Am. Chem. Soc. 1965;87(7):1450–1458.
- Auton M, Holthauzen LM, Bolen DW. Anatomy of energetic changes accompanying urea-induced protein denaturation. Proc Natl Acad Sci U S A. 2007;104(39):15317–15322.
- Binz HK, Stumpp MT, Forrer P, Amstutz P, Pluckthun A. Designing repeat proteins: well-expressed, soluble and stable proteins from combinatorial libraries of consensus ankyrin repeat proteins. J Mol Biol. 2003;332(2):489–503.
- Bolen DW, Rose GD. Structure and energetics of the hydrogen-bonded backbone in protein folding. Annu Rev Biochem. 2008;77:339–362.
- Brush SG. History of the Lenz-Ising Model. Rev. Mod. Phys. 1967;39(4):883–893.
- Courtemanche N, Barrick D. Folding thermodynamics and kinetics of the leucine-rich repeat domain of the virulence factor Internalin B. Protein Sci. 2008;17(1):43–53.
- Crothers DM, Kallenbach NR. On the Helix-Coil Transition in Heterogeneous Polymers. J. Chem. Phys. 1966;45(3):917–927.
- De La Cruz EM. Cofilin Binding to Muscle and Non-muscle Actin Filaments: Isoform-dependent Cooperative Interactions. J. Mol. Biol. 2005;346:557–564.
- DeLano WL. MacPyMOL: PyMOL Enhanced for Mac OS X. DeLano Scientific; Palo Alto: 2003.
- Groves MR, Barford D. Topological characteristics of helical repeat proteins. Curr Opin Struct Biol. 1999;9(3):383–389.
- Ising E. Z. Physik. 1925;31:253.
- Johnson ML. Nonlinear least-squares fitting methods. Methods Cell Biol. 2008;84:781–805.
- Kajander T, Cortajarena AL, Main ER, Mochrie SG, Regan L. A new folding paradigm for repeat proteins. J Am Chem Soc. 2005;127(29):10188–10190.
- Kajander T, Cortajarena AL, Mochrie S, Regan L. Structure and stability of designed TPR protein superhelices: unusual crystal packing and implications for natural TPR proteins. Acta Crystallogr D Biol Crystallogr. 2007;63(Pt 7):800–811.
- Kajava AV. Review: proteins with repeated sequence--structural prediction and modeling. J Struct Biol. 2001;134(2-3):132–144.
- Kloss E, Barrick D. Thermodynamics, kinetics, and salt dependence of folding of YopM, a large leucine-rich repeat protein. J Mol Biol. 2008;383(5):1195–1209.
- Kloss E, Courtemanche N, Barrick D. Repeat-protein folding: new insights into origins of cooperativity, stability, and topology. Arch Biochem Biophys. 2008;469(1):83–99.
- Kobe B, Kajava AV. When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. Trends Biochem Sci. 2000;25(10):509–515.
- Lenz W. Physik. Z. 1920;21:613.
- Lifson S, Roig A. On the Theory of Helix-Coil Transition in Polypeptides. J. Chem. Phys. 1961;34(6):1963–1974.
- Lubman OY, Kopan R, Waksman G, Korolev S. The crystal structure of a partial mouse Notch-1 ankyrin domain: repeats 4 through 7 preserve an ankyrin fold. Protein Sci. 2005;14(5):1274–1281.
- Main ER, Lowe AR, Mochrie SG, Jackson SE, Regan L. A recurring theme in protein engineering: the design, stability and folding of repeat proteins. Curr Opin Struct Biol. 2005;15(4):464–471.
- Main ER, Xiong Y, Cocco MJ, D’Andrea L, Regan L. Design of stable alpha-helical arrays from an idealized TPR motif. Structure. 2003;11(5):497–508.
- McGhee JD, von Hippel PH. Theoretical Aspects of DNA-protein interactions: co-operative and non-co-operative binding of large ligands to a one-dimensional homogeneous lattice. J. Mol. Biol. 1974;86:469–489.
- Mello CC, Barrick D. An experimentally determined protein folding energy landscape. Proc Natl Acad Sci U S A. 2004;101(39):14102–14107.
- Merz T, Wetzel SK, Firbank S, Pluckthun A, Grutter MG, Mittl PR. Stabilizing ionic interactions in a full-consensus ankyrin repeat protein. J Mol Biol. 2008;376(1):232–240.
- Mosavi LK, Cammett TJ, Desrosiers DC, Peng ZY. The ankyrin repeat as molecular architecture for protein recognition. Protein Sci. 2004;13(6):1435–1448.
- Mosavi LK, Minor DL, Jr., Peng ZY. Consensus-derived structural determinants of the ankyrin repeat motif. Proc Natl Acad Sci U S A. 2002;99(25):16029–16034.
- Munoz V. What can we learn about protein folding from Ising-like models? Curr Opin Struct Biol. 2001;11(2):212–216.
- Niss M. History of the Lenz-Ising Model 1920–1950: From Ferromagnetic to Cooperative Phenomena. Arch. Hist. Exact Sci. 2005;59(3):267–318.
- Pace CN. Determination and analysis of urea and guanidine hydrochloride denaturation curves. Methods Enzymol. 1986;131:266–280.
- Poland D, Scheraga HA. Theory of Helix-Coil Transitions in Biopolymers. Academic Press; New York: 1970.
- Poland D, Vournakis JN, Scheraga HA. Cooperative Interactions in Single-Strand Oligomers of Adenylic Acid. Biopolymers. 1966;4:223–235.
- Schellman JA. The Factors Affecting the Stability of Hydrogen-Bonded Polypeptide Structures in Solution. J. Phys. Chem. 1958;62(12):1485–1494.
- Scheraga HA. From helix-coil transitions to protein folding. Biopolymers. 2008;89(5):479–485.
- Scholtz JM, Barrick D, York EJ, Stewart JM, Baldwin RL. Urea unfolding of peptide helices as a model for interpreting protein unfolding. Proc Natl Acad Sci U S A. 1995;92(1):185–189.
- Steiner D, Forrer P, Pluckthun A. Efficient selection of DARPins with subnanomolar affinities using SRP phage display. J Mol Biol. 2008;382(5):1211–1227.
- Strang G. Introduction to Linear Algebra. Wellesley-Cambridge Press; Wellesley, MA: 2005.
- Street TO, Courtemanche N, Barrick D. Protein folding and stability using denaturants. Methods Cell Biol. 2008;84:295–325.
- Wetzel SK, Settanni G, Kenig M, Binz HK, Pluckthun A. Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins. J Mol Biol. 2008;376(1):241–257.
- Zimm BH, Bragg JK. Theory of the Phase Transition between Helix and Random Coil in Polypeptide Chains. J. Chem. Phys. 1959;31(2):526–535.
- Zimm BH. Theory of “Melting” of the Helical Form in Double Chains of the DNA Type. J. Chem. Phys. 1960;33(5):1349–1356.
