Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2922776

Formats

Article sections

- Abstract
- 1 Introduction
- 2 Methods
- 3 Case 1: George W. Bush’s Approval
- 4 Case 2: Congressional Ideology
- 5 Simulations
- 6 Extensions of Method
- 7 Conclusion
- References

Authors

Related links

Polit Anal. Author manuscript; available in PMC 2010 August 17.

Published in final edited form as:

Polit Anal. 2010 Winter; 18(1): 57–77.

doi: 10.1093/pan/mpp032PMCID: PMC2922776

NIHMSID: NIHMS188093

Marc T. Ratkovic, Departments of Political Science and Statistics, University of Wisconsin, Madison, 110 North Hall, 1050 Bascom Mall, Madison, WI 53706;

See other articles in PMC that cite the published article.

Many social processes are stable and smooth in general, with discrete jumps. We develop a sequential segmentation spline method that can identify both the location and the number of discontinuities in a series of observations with a time component, while fitting a smooth spline between jumps, using a modified Bayesian Information Criterion statistic as a stopping rule. We explore the method in a large-*n*, unbalanced panel setting with George W. Bush’s approval data, a small-*n* time series with median DW-NOMINATE scores for each Congress over time, and a series of simulations. We compare the method to several extant smoothers, and the method performs favorably in terms of visual inspection, residual properties, and event detection. Finally, we discuss extensions of the method.

Political processes are generally stable and smooth but occasionally explosive around critical events (Pierson 2004). Simultaneously identifying the jumps and estimating a smooth curve pose a particular statistical challenge. In this article, we adapt a smoothing spline in a manner that allows identification of both the location and the number of breaks in a time series, producing a fitted function that represents both critical and secular change.

Our interest in this problem began when we noticed the “9/11 problem” while trying to fit smoothers to George W. Bush’s approval ratings, as illustrated in Fig. 1. Smoothers, to some extent or another, show an uptick in Bush’s approval well before 9/11. The spline in Fig. 1 would lead to the conclusion that the increase in approval began in mid August, which is clearly incorrect. Bush’s approval was slowly trending downward until 9/10, and then by 9/12 it was up around 80% or even higher. Before 9/11, the spline estimate systematically overestimates Bush’s approval, while systematically underestimating Bush’s approval post-9/11.

Smoothing splines are dashed and our method is solid. The smoothing splines show an uptick in Bush’s popularity in mid-July, well before 9/11.

Similarly, when using a loess smoother, optimal choice of the tuning parameter resulted in a curve nearly identical to the generalized smoothing spline, also missing 9/11. Decreasing the span until it picks up the 9/11 jump creates too much variance elsewhere. From a time series standpoint, although presidential approval may be described well *on average* as an autoregressive process (Erikson, MacKuen, and Stimson 2002), it is not *everywhere* the same process. The “memory” in the process immediately post-9/11 was rather short, whereas during more mundane times, the memory may be longer. Nathaniel Beck and colleagues, using a Kalman filter, insert a jump at 9/11, which does leads to a far better fit (Beck, Jackman, and Rosenthal 2006). Inserting the jump at 9/11, though, assumes the answer to the question we are asking. While admitting 9/11 as a break seems sensical from even a casual glance at the data, an equally strong case could be made for a wide variety of events, such as the invasion of Iraq or Afghanistan, the events surrounding Hurricane Katrina, and so on. We simply do not know when to stop adding jumps.

This brought to the fore the two questions central to our method. Given data with a time component (or any natural ordering), we develop a method that identifies both the location and the number of jumps, while fitting a smooth curve elsewhere. The method fits a smoothing spline to the data, while using a binary segmentation algorithm to sequentially add “jumps” to the spline’s unpenalized space. We develop a modified Bayesian Information Criterion (BIC) statistic as a stopping rule.

We use the method here on two different sets of observed data. The first is George W. Bush’s approval ratings across multiple pollsters between January 25, 2001, and October 24, 2007. Our algorithm picks out two critical events in Bush’s term plus or minus a day or two: September 11, 2001, and the invasion of Iraq. We also analyze the median of the first dimension DW-NOMINATE, both by Congress and by party. With the DW-NOMINATE medians, we detect jumps in the Congressional median between 1910–1912 and 1874–1876, most likely corresponding to the rebellion against Joseph Cannon and the end of Reconstruction. We find a jump between 1930 and 1932 for the Democratic party, corresponding to the beginning of the New Deal. We find a jump for the non-Democratic party in 1818–1820, which is most likely a false positive.

The method provides several advances over existing methods. Although there are several standard methods for finding a structural break in a parameter (as in Calderia and Zorn 1998, e.g.), ranging from the simple Chow test to Bayesian methods that estimate the location of the change point (Western and Kleykamp 2004), most do not offer a technique for finding both the location and the *number* of breaks. Recent work by Arthur Spirling has addressed this issue explicitly, through the use of reversible jump Markov chain Monte Carlo, in finding “turning points” in civilian casualties through the Iraq war(Green 1995; Spirling 2007b). In contrast to Spirling, we embed our method within a nonparametric framework through the use of smoothing splines. The method employed by Spirling allows for identification of breaks in a given parameter in a structural model, whereas we search for a series break in the mean function itself, rather than any particular parameter. We are able to avoid heavy parametric assumptions about either the systematic or the random component of the model, which allows extension to a broad array of spline models and semiparametric models (Gu 2002).

A rather straightforward change in loss function could easily allow for modeling limited dependent variables, similar to methods that search for breaks in parameters in generalized linear models (Spirling 2007a). This would allow for the identification of change points within limited dependent variables. Finally, to ease interpretation, our method returns a serial order of breaks and a corresponding modified BIC statistic. This allows the researcher to present the jumps in terms of importance, as well as flexibility in the stopping decision.

Our interest with the presidential polling data is in summarizing its progression over time in a clear way while accounting for noisy measurements taken at irregular times. Any summarization is a trade-off between an estimate that is too variable and too complex to be readily interpretable and one that is too smooth and glosses over too much information. We search in between for a parsimonious, intelligible, de-noised estimate.

In the interest of producing an interpretable, data-driven estimate of the mean function, researchers familiar with polling data will be familiar with loess smoothing techniques for summarizing the data (Cleveland 1979; Beck and Jackman 1997). As a generalization of the weighted average, the loess balances the fit between a smooth curve and no smoothing by a bandwidth parameter.

Selection, or estimation, of this bandwidth parameter is a common characteristic of many smoothing methods. Recent work by Luke Keele (2006, 2008) has introduced a political science audience to the issues involved with semi- and nonparametric smoothing, and we refer the reader there for background.

We wish to highlight the use of smoothing splines for obtaining a good functional estimate. We rely here on the functions in the gss library in *R* (Gu 2002; R Development Core Team 2008). In this article, we use the restricted maximum likelihood (REML) algorithm to estimate the smoothing parameter throughout (Krivobokova and Kauermann 2007).

Given data (*y _{i}*,

$${y}_{i}=f({x}_{i})+{\mathrm{\epsilon}}_{i}.$$

(1)

$$E({\mathrm{\epsilon}}_{i})=0.$$

(2)

$$\text{var}({\mathrm{\epsilon}}_{i})={\sigma}_{\mathrm{\epsilon}}^{2}.$$

(3)

The cubic smoothing spline fits a function that has two components. The first, unpenalized component is the linear trend to the data. The second, penalized component allows for a nonlinear fit. Controlled by a smoothing parameter, λ, the estimator is a compromise between the least squares line (λ → ∞) and complete interpolation of the data (λ = 0). The cubic smoothing spline minimizes the penalized residual sum of squares:^{1}

$${\widehat{f}}_{\text{ss}}={\text{min}}_{f}{\displaystyle \sum _{i=1}^{n}{({y}_{i}-f({x}_{i}))}^{2}+\lambda}{\displaystyle \int {\{f\u2033\}}^{2}\mathrm{d}t.}$$

(4)

The second component is the penalty on the “nonlinear” part. If *f* is the sum of a linear part and a smooth (*C*^{∞}) nonlinear part, for example, by admitting a Taylor expansion, then the linear part of the function will have second derivative zero and hence is not considered in the penalty. The smoothing parameter λ controls the trade-off between smoothness and the least squares regression line. The smoothing spline is widely applied, and software packages implementing cubic smoothing splines are common (Wahba 1990; Gu 2002).

These methods are applicable when we believe that the true function underlying our data is a smooth function measured with some noise. The 9/11 problem, though, highlighted an aspect of Bush’s approval data that the smoothing spline could not handle. We are interested in a function that is everywhere smooth except for a small number of discontinuous jumps. We develop a method for identifying discontinuous shifts in the mean function, fitting a single smooth function but modeling breaks in the intercept. Discontinuities of this form represent a problem for smoothers. They can be viewed as a form of model misspecification or omitted variable bias. The smoother, when it smooths over the jumps, will leave a distinct residual pattern: residuals will be systematically above on one side of the jump and below on the other. Figure 2 illustrates the residual pattern around 9/11 for both the smoothing spline and a smoothing spline with a partial spline added at 9/11.

This misspecification creates a problem for selecting the smoothing parameter. Consider the jumps as sustained shifts in the mean from one part of the function to another. Residual variance estimates will be overstated, by adding part of the “jump-to-jump” variance to the true residual variance.

If we knew the location of jumps in a function, we might add them directly to the unpenalized space of the spline. This would augment the unpenalized space, resulting in a partial spline (Gu 2002). If (*x*) = *I*{*x* > “9/11”}, then the partial spline is the minimizer

$${\widehat{y}}_{\text{PS}}=\text{arg}\phantom{\rule{thickmathspace}{0ex}}\underset{f,\mathrm{\beta}}{\text{min}}{\displaystyle \sum _{i=1}^{n}{({y}_{i}-f({x}_{i})-\mathrm{\beta}\mathrm{\varphi}({x}_{i}))}^{2}+\mathrm{\lambda}}{\displaystyle \int {\{f\u2033(t)\}}^{2}\mathrm{d}t.}$$

(5)

That is, known functions are not penalized and are accounted for in the fit directly. One might think of them as being initially subtracted out, and the remainder smoothed.

“Breaks” in the residuals can be detected by eyeball and automatically. We suggest the application of binary segmentation procedures (Sen and Srivastava 1975), which look for breaks in a constant mean function.

Let *ê*_{1},…, *ê _{n}* be the estimated residuals from the smoothing spline fit, after removing the linear trend. Define the partial sums

$${Z}_{i}={\left(\frac{1}{i}+\frac{1}{n-i+1}\right)}^{-1/2}\left(\frac{{S}_{i}}{i}-\frac{{S}_{n}-{S}_{i+1}}{n-i+1}\right).$$

(6)

We propose a slightly adjusted statistic. Denote _{i} as the estimated standard error of the first *i* residuals and _{i−} as the standard error of the last *i* residuals. Let *Z** = max_{1 ≤ i ≤ n}|*Z*_{i}*| for

$${Z}_{i}^{*}={\left({\widehat{\mathrm{\sigma}}}_{i}^{2}+{\widehat{\mathrm{\sigma}}}_{{(n-i+1)}^{-}}^{2}\right)}^{-1/2}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{{S}_{i}-\frac{{S}_{i}}{i}}{i}-\frac{{S}_{n}-{S}_{i+1}-\frac{{S}_{n}-{S}_{i+1}}{n-i+1}}{n-i+1}\right).$$

(7)

*Z** greatly outperformed *Z* in both the simulations and the observed data. We developed *Z** to account for the “messiness” in data normally encountered by political scientists. The residuals to the left and right of a given break point may be neither mean zero nor homoscedastic, and *Z** accommodates these possibilities, whereas *Z* does not. This better captures the nature of observational data, but our test is no longer uniformly most powerful, due to the unbalancedness of the design and ambiguity of degrees of freedom in the variance estimate.

Since we are not as interested in the inference problem, we need only assume that the data are reasonably symmetric about the mean function. The statistic *Z** serves to locate the most likely position of the discontinuity. It considers, at each possible break point, the extent to which the residual immediately to the left is above the mean to the left and the residual immediately to the right is above the mean to the right. The more these two terms differ, the more likely a discontinuous break occurs at each point. The weight term in front comes from the variance estimate, and experimentation revealed it effective in regulating erratic behavior at the boundaries.

This is a rough approximation. The pattern of residuals about the jump is not constant, but in the interest of finding a single break, binary segmentation is a reasonable, computationally simple approximation. As our examples below show, it is also quite feasible. We had experimented with modeling the strength of the break through a weighted average that aligns with residual patterns (e.g., wavelets, exponentially damped sine curves) as well as more exotic variable selection methods (Efron et al. 2004, e.g.), but the additional complexity of method added little to the performance of the algorithm. When choosing jumps, the simple statistic above performed as well as its more complex competitors.

Since this is an automated process, we require a stopping rule. We consider several here. The first is a modified BIC statistic, with
${\widehat{\mathrm{\sigma}}}_{\mathrm{\epsilon}}^{2}$ an estimate of residual variance, *n* the sample size, *n** the number of randomly selected knots,^{2} and *k* the number of jumps:

$$\text{arg}\phantom{\rule{thickmathspace}{0ex}}\underset{k}{\text{min}}\frac{\widehat{\lambda}\phantom{\rule{thinmathspace}{0ex}}{\displaystyle \int {\phantom{\rule{thinmathspace}{0ex}}\{\widehat{f}\u2033\phantom{\rule{thinmathspace}{0ex}}(t)\phantom{\rule{thinmathspace}{0ex}}\}}^{2}\mathrm{d}t}}{{\widehat{\mathrm{\sigma}}}_{\mathrm{\epsilon}}^{2}}+k\phantom{\rule{thickmathspace}{0ex}}\text{log}(n)-\frac{k}{2}\text{log}({n}^{*})+\frac{k}{2}\text{log}(2\pi ).$$

(8)

Appendix A contains a derivation of the traditional BIC statistic and our modification of it. Note how, in the event that the spline is fit to every observed data point (*n* = *n**), this statistic reduces to the traditional BIC statistic. The term with the log(2π) is asymptotically negligible since it does not grow in *n* or *n**, but simulations revealed that the term was helpful in reducing the false-positive rate for our smallest simulation (*n* = 100, *n** = 30). It did not make a large impact on the power of the larger simulations (*n* = 250, 500).

The first term is increasing in *k* through both the numerator and the denominator. As more jumps are added, the spline has to bend less in order to accommodate the jumps, so the spline fit is more linear and the numerator increases. The penalty in the numerator is left unit-free by dividing through by the residual variance estimate. As well, the residual estimate decreases with the addition of more jumps, decreasing the denominator. The same logic motivating BIC then leads to a “cost” of
$\text{log}(n)-\frac{1}{2}\text{log}({n}^{*})$ for the addition of each additional jump, in order to ensure convergence to the true model as *n* → ∞. The spline penalty term serves the role of the log-likelihood, acting as a measure of the divergence between the “true” model and the fitted values.

As a means for balancing model fit and size, the BIC provides an estimate to the Bayes’s factor. The BIC approximates the posterior probability that a given model is the true model, given a uniform prior over all candidate models. The statistic is not interpretable on its own but is useful in model selection. The most preferred model with the sequential smoothing spline is the one with the smallest modified BIC statistic as given by equation (8).

A second stopping rule comes from the logic of hypothesis testing. Although we recommend our modified BIC statistic, we present the following below. The reference distribution for the maximal *Z* is usually determined by permutation. This creates problems, though, because the reference distribution of each subsequent break must be conditioned on the selection of all previous breaks. The permutation method will increase in complexity rather quickly. As a simplification, we suggest considering the models as a sequential series of nested models. This suggests an approximate χ^{2} statistic to consider among them:^{3}

$$\frac{\widehat{\mathrm{\lambda}}\phantom{\rule{thinmathspace}{0ex}}{\displaystyle \int {\phantom{\rule{thinmathspace}{0ex}}\{\widehat{f}\u2033\phantom{\rule{thinmathspace}{0ex}}(t)\phantom{\rule{thinmathspace}{0ex}}\}}^{2}\mathrm{d}t}}{{\widehat{\mathrm{\sigma}}}_{\mathrm{\epsilon}}^{2}}+{c}_{\alpha ,k}^{*},$$

where *c** is the critical value of the χ^{2} statistic, with family-wise error rate α and number of breaks *k*:

$${c}_{\alpha ,\phantom{\rule{thinmathspace}{0ex}}k}^{*}={\chi}_{1-\alpha /{k}^{,k.}}^{2}$$

Using Bonferroni’s adjustment of α, the term 1 − α/*k* keeps the probability of making at least one type I error among all breaks below α. This approach has a noticeably lower threshold even at small sample sizes of 100 for the first several jumps; for this reason, we remain wary.

The final suggestion is using a combination of expert evaluation and common sense. As heuristics, we suggest a few options. First, if a selection may be a false positive, explore the selections afterward. If several selections after it are known jumps, the jump under question is more plausible. If there are no known jumps afterward, then its selection is less plausible. If the researcher has a specific prior distribution proportional to weights *w _{i}* over all possible jumps, the statistic
${w}_{i}|{Z}_{i}^{*}|$ may of course be used instead. Similarly, known breaks can be incorporated into the unpenalized space directly, and our method consequently implemented.

Second, we noticed that sometimes the BIC may be quite flat around the minimum. This may limit the discussion to the range of acceptable jumps, such as 3–5, even if the strict minimum occurs at 4. Similarly, the first few jumps result in a large decrease in the penalty term, but the effect decreases dramatically. Using either rule above, some jumps are clearly reasonable, some are questionable, and some are unreasonable. We suggest reporting either the BIC statistic or χ^{2} *p*-values along with the data, and the researcher may decide to take the first of the “questionable” jumps as the stopping points. If computational power permits, resampling methods may be used, but we have been satisfied with the performance of choosing the modified BIC-minimizing number of partial splines. That is the rule we use throughout the remainder of the article.

Finally, with reasonably large data sets, the cubic splines are fit only using a random subset of the data as “knots.” This introduces variance into the stopping rule statistic, so we recommend exploring either adding to the number of data points selected or a resampling method to help get a sense of the variance of the test statistics described here. With large data sets, of over 1000, increasing the number of knots can aid in detection, making the acceptance level of our BIC statistic less stringent. Increasing the number of knots, though, is not computationally prohibitive. Bush’s approval data discussed below have an *n* of 1533, and ssanova selects only 52 knots. Computation took about 12 s on a Pentium D processor. Using our algorithm with 200 subsampled data points lengthened the time to only 2 min and 11 s.

These guidelines may sound arbitrary, but, as illustrated below, our method is never “too” wrong. It is conservative, in that it rarely overestimates the number of jumps, and even with reasonably small sample sizes and in the presence of autocorrelated noise, the method is powerful.

We suggest the following recursive procedure for finding these jumps in smooth functions:

- Fit a cubic smoothing spline using the REML algorithm to estimate the smoothing parameter.
- Remove the linear trend from the residuals.
- Search the residuals for a break point, using the binary segmentation algorithm.
- Add the corresponding break to the partial spline’s unpenalized space.
- Repeat from 1 until the BIC stopping rule ends the procedure.

This will result in breaks δ_{1},…, δ_{k}, and a mean function estimate of the form

$${\widehat{y}}_{i}=\widehat{f}({x}_{i})+{\displaystyle \sum _{j=1}^{k}{\widehat{\mathrm{\beta}}}_{j}I({x}_{i}>{\mathrm{\delta}}_{j})},$$

where is a smooth spline estimate. The estimate _{j} gives the estimated magnitude of the discontinuity and is reported when fitting partial splines. The standard errors on this coefficient will be wrong, though, since they do not account for the sequential nature of the selection and fitting. Were this magnitude of interest, and confidence intervals desired, we recommend a parametric bootstrap. Our algorithm sequentially augments the unpenalized space of the spline, fitting fixed jumps at specific points. Although estimating the number of jumps simultaneously may be preferred, and in more than one dimension is most certainly preferable, we are happy with the performance of the method in one dimension. We illustrate the procedure below on two observed data sets, Bush’s approval and DW-NOMINATE scores, as well as through a series of simulations.

The data in this section are 1533 estimates of George W. Bush’s approval, from 33 different houses, between January 25, 2001, and October 24, 2007. We observe polls on 1041 dates over the total 2463 dates. Although the polls are taken over multiple days, we consider the date range to be the end date on which the poll is conducted. The approval estimates range from 27% to 89%. The data are all publicly available.

Figure 3 shows the fit to each of the different methods, each in *R*. We use a moving average with a 2-week window, loess with spans .1 and .01 with command loess in library stats, the smoothing splines from command ssanova in library gss, a structural time series model using a Kalman filter from command StructTS in library stats, and our sequential segmentation method.

Results from each of the methods are shown in Fig. 3. Our sequential segmentation method outperforms, in terms of mean squared error (MSE), both the loess with span .1 and the smoothing splines. Our method performs worse, in terms of MSE, than the Kalman filter and the loess with span .01, but the improvement in MSE comes at the cost of a rather jagged fit. Figure 4 shows a larger graph, comparing splines to the sequential segmentation method. Notice how splines smooth over “nonsmooth” events. For example, the bump in Bush’s ratings after 9/11 could not have started before the event; people had no expectations that the events would occur on a certain date and so they did not start revising their opinions of Bush upward before then. Note also how the “rally around the flag” effect, whereby a President’s popularity increases upon starting a war, is immediate. During the run-up to the war, Bush’s approval dropped regularly, but immediately upon the invasion, he got a bump up that then proceeded to erode. Our algorithm selected the “rally around the flag” effect as instantaneous, even when the invasion of Iraq was clearly expected.

Just as important, note how our method follows the spline method in areas after the events. From about mid-2004 on, our method follows the smoothing spline almost exactly. Our sequential segmentation method acts like a smoothing spline when appropriate.

The real payoff for the sequential segmentation method comes from looking at autocorrelation and the quantile-quantile (QQ) plots. Figure 5 shows the QQ plots, compared to a normal distribution, for the residuals using each of the six methods earlier. The other methods tend to do passably well in either having normally distributed errors (the second loess fit) or having low autocorrelation (the moving average, Kalman filter, and splines). Only our algorithm does well on both counts and noticeably better than the other methods.

QQ plots, for a normal distribution, for each of the smoothing methods. “ρ” gives the correlation in the residuals with lag 1. For ease of interpretation, the last five plots are on the same *y* axis.

Next, we consider the events selected as “jumps” in approval. The jumps and their corresponding BIC statistics are listed in Table 1. The trouble with evaluating whether these jumps are valid is two-fold. First, we have no idea what the “true” curve is or even if the concept of such a curve is theoretically coherent. We conceive of this curve, instead, as the consensus of approval ratings among pollsters. Looking at the dates chosen and seeing if anything happened involving approval considerations of Bush lead to a problem, though: given any day, you can find *something* of import that might affect approval toward Bush. Unfortunately, there is no objective listing, serial or otherwise, to which we can compare these results.

That said, we are confident in the dates selected. We are counting the date of each poll as the last day it was administered, so we look for critical events a few days before the breaks selected by our method. The two dates selected, September 12, 2001, and March 20, 2003, clearly correspond with major events that would plausibly affect approval scores in the Bush presidency (9/11, the invasion of Iraq on 3/17). Since we are taking the date of the poll as the final day of its administration, the method selects dates a few days past the event of interest.

The primary substantive conclusion is that presidential approval and quite likely other public mood trends are not smooth. Whereas recent theoretical work on macropolitical evolution views presidential approval as a smooth, AR(1) time series in Gallup approval (Erikson, MacKuen, and Stimson 2002), we find that future empirical work should search for and model multiple structural breaks.

The data in this section are three time series: the median estimates of the first dimension of DW-NOMINATE ideology scores by Congress and party. We consider two parties, the Democrats and the non-Democrats, a pooling of Republicans and Whigs for the sake of this article. We have data on each Congress from 1788 to 2004,^{4} the non-Democratic Party from 1800 to 2004, and the Democratic Party from 1794 to 2004. Higher scores are commonly interpreted as being more “conservative” on economic issues, whereas lower scores mean that the Congressional median is more “liberal” on economic issues (Poole and Rosenthal 1997). The data are publicly available.

Unlike Bush’s approval data, the median DW-NOMINATE data are evenly spaced through time, although with only one data point per Congress. Relative to Bush’s approval, this is a relatively small-*n* test of the method, with only 109 data points, versus 1533 above.

A second key difference is that, in these data, no apparent discontinuities appear. The raw data for the entire Congress can be found in the upper left-hand corner of Fig. 6, with fits from various methods in the subsequent boxes. We notice nothing in these data comparable to the “9/11” problem above.

A comparison of fits to the median DW-NOMINATE score by Congress among three different smoothing methods.

Despite the lack of visually obvious jumps, a long literature has debated as to whether some elections may be “critical” (Key 1955, 1959; Mayhew 2002). Most Congressional elections are “secular,” reflecting slow gradual change, whereas some scholars consider a select few as “critical,” in that they result in rapid, sustained shifts in partisan composition of the public. This dynamic, as expressed through Congress members, captures the nature of the “smooth + jump” –type function that we are estimating here.

As suggested by the realignment literature, we rely rather heavily on our model’s assumptions: that median NOMINATE score is the sum of a smooth and discontinuous part. Comparing the fit of our method to the spline and time series method highlights how statistical findings are dependent on the assumptions under the statistical model. A researcher searching for smooth cycles in Congressional history may prefer either of the bottom two graphs. A researcher searching for a smooth curve with a few jumps would prefer our model. Our algorithm, though, is both consistent with the data and reduces to a spline in the limiting case.

Most of the discovered jumps, as shown in Table 2, are easily attributable to commonly accepted shifts in Congressional behavior. The two shifts selected for the entire Congress via BIC are 1910–1912 and 1874–1876. The first can be attributed to the rebellion against Speaker Joseph Cannon, which shifted power from the Speaker to the committee chairs. The second can be attributed to the end of Reconstruction. The next few dates, 1932, 1896, and 1860, though not selected by BIC, have long been considered the canonical dates for “realignment” (Mayhew 2002). The 1930–1932 break for Democratic median votes is clearly attributable to the New Deal. A single shift, in 1818–1820, was selected for the non-Democratic party. We worry this may be a false positive and may be a random artifact of pooling all non-Democratic parties.

The algorithm performed well, even in this noisy, imprecise setting. Plotting the data revealed no apparent, obvious jumps, and condensing the NOMINATE scores down to 109 points is a gross oversimplification of Congressional behavior and evolution. Given all this, our method yielded reasonable results, selecting only one false positive and a series of dates otherwise that correspond with well-known shifts in Congressional makeup and behavior.

We conduct 24 separate simulations in order to evaluate our method. We assume two different systematic components, where *f* is a Bessel function of the second type and *x* in the interval [0, 1000]. We ran each simulation 1000 times each, comparing our algorithm to both smoothing splines (function ssanova in *R* library gss) and a Kalman filter (function StructTS in *R* library stats). The characteristics of the simulations are below.

- Model specifications:$${\text{sim}}_{\text{jump}}({x}_{i})=-2f({x}_{i}/100)+8I({x}_{i}>200)-4I({x}_{i}>500)+2I({x}_{i}>800)+{u}_{i},{\text{sim}}_{\text{nojump}}({x}_{i})=-2f({x}_{i}/100)+{u}_{i}.$$
- Variance specifications:
- Gaussian noise : var(
*u*) {1, 4}; cor(_{i}*u*,_{i}*u*_{i−1}) = 0, - AR(1) noise : var(
*u*) {1, 4}; cor(_{i}*u*,_{i}*u*_{i−1}) = .4. - Sample sizes used:
*n*{100, 200, 500}.

Figure 7 contains an example of the simulation with jumps as well as one of our fits to it. The first jump is significant at the 5% level in all situations. The second jump is significant only in our less noisy simulations, and the third jump is never significant at 5%.

We vary the simulations by sample size, error variance, and variance structure. Table 3 contains a list of each of the simulations we conduct for both sim_{jump} and sim_{nojump}, as well as the percent of times each number of jumps was selected by our algorithm. The simulations all contain either independent, Gaussian noise or AR(1) noise, with correlation .4.

Our simulations demonstrate four desirable aspects of our algorithm. First, as shown in the bottom half of Table 3, the algorithm has a low false-positive rate when there are no jumps in the model. When there are no jumps, the algorithm will reduce to the spline, with an acceptable false-positive rate. For small sample sizes, and in our less noisy scenarios, the false-positive rate is approximately 10%, but the false-positive rate drops dramatically as the sample size gets to 500. False-positives rates are acceptably low in the other simulations, even for *n* = 100. The method is robust against error misspecification, and the false-positive rate drops to zero as the sample size increases. When there are no jumps in the true function, our method uncovers false positives at an acceptable rate.

Second, the method is powerful. Table 4 shows the percent of the time each of the three breaks was selected. The largest break, at *x _{i}* = 200, is selected the most. By

Third, the algorithm provides, on average, gains in squared error, and at best, the algorithm results in only modest losses versus the smoothing splines and time series method, as illustrated in Table 5. We calculated squared error as the sum of the squared difference between the fitted and true values. As shown in the top half of Table 5, our method performs well relative to the other methods. The Kalman filter proves marginally better, about 4%, in 2 of the 12 simulations with jumps. Improvements in squared error over the Kalman filter and splines in the remaining simulations range between 1% and 55%. Of the 12 simulations with jumps, our algorithm outperforms the Kalman filter by more than 20% 9 of the 12 times. Our algorithm similarly outperforms the splines by more than 20% 6 of the 12 times.

The bottom half of Table 5 illustrates the results when the systematic component does not contain jumps. In every instance, our algorithm outperforms the Kalman filter, ranging between 6% and 83% gains in squared error loss. Our method performs comparably to the smoothing spline. The only exception is in the smallest, least noisy simulation, where splines outperform our algorithm by 19%. Of the remaining 11 simulations, the spline never provides an improvement of more than 6%, and in 6 of the 12 simulations, our algorithm differs from the smoothing spline by less than 1%.

Finally, the method is robust with respect to correlated errors. While our algorithm is less powerful in the presence of AR(1) noise around the discontinuous, smooth function, by modest sample sizes of 200, the algorithm performs well. Even with AR(1) noise, our method maintains a small false-positive rate. In the presence of breaks and AR(1) noise, our algorithm outperforms splines in each case, and in the absence of breaks, splines never provide more than a 6% advantage in squared error over our algorithm. In 2 of the 12 simulations with AR(1) noise, our algorithm performs comparably to the Kalman filter; in the remaining 10, it provides substantial improvement, ranging between 25% and 86%.

Our method admits several interesting and useful extensions. First, we have presented it so far as a modeling, rather than inferential, tool, although resampling methods could provide confidence intervals. Since the search for breaks is sequential, the nature of the distribution under the null hypothesis is not clear; each subsequent break is conditioned on the occurrence of the earlier breaks. Entering unpenalized covariates of interest into the nonpenalized space also allows for inference in a semiparametric setting.

Second, our method can extend to a broad array of spline models.^{5} These include extensions to higher dimensions, through thin-plate splines (Pearce and Wand 2006). Our method can help find jumps in geographic data or any situation in which the context dictates estimating a functional form that is smooth with breaks. Since the spline is a limiting case, and our cases above show that the method can “act like a spline” when it needs to, we hope to generalize the algorithm to where it can be useful across a broad array of data sets and questions.

Third, we have so far characterized our jump covariates as a series of indicator functions. The result is a form of a sequential cumulative sum tests, with a spline fit in between each identification of a jump. The jumps, though, could instead contain covariates that are of interest to the researcher. For example, using our Bush data, we could look for jumps in approval as a function of economic news, Congressional approval, etc. This would require simply ordering the outcome by a different covariate than time and using our sequential segmentation spline along this dimension.

Finally, we have developed the method so far with few constraining distributional assumptions about either the systematic component or the error. Stronger assumptions can be made, as necessary, in order to characterize a likelihood that could handle choice models, count models, and other limited dependent variable models.

This project began with the goal of estimating house-level effects on presidential approval. In estimating these effects, it quickly became apparent that extant smoothing methods missed the underlying dynamic of presidential approval. In fact, within much political data of interest, “smoothing” appears to be presumptuous. Many processes are a mixture of both slow-moving change and immediate jumps due to rapid shocks. Smoothers miss the nature of the underlying function, whereas structural break tests offer no stopping criterion. This led us to consider first, how to think about these processes and, second, how to find the breaks and when to stop doing so.

Our method has performed well both in simulations and on observed data. Within a dense presidential approval data set, our method discovered breaks that correspond with clearly identifiable shocks. Within a relatively sparse Congressional data set, it was equally successful at highlighting important dates. The simulations further reenforce our faith in the method.

The “smooth + jump” function we describe here could apply to a wide variety of political processes that face critical events. The algorithm allows a flexible means to model other social and physical processes of interest. Examples include change point models or any model that must accommodate some smooth curve with the occasional persistent shock. In the face of a known exogenous shock, but uncertainty over the particular timing of the impact of the shock, our sequential segmentation spline will allow for estimation of the existence and most likely location of the discontinuity.

Testing for the existence of a structural break at a given point is straightforward. Here, we provide a method for solving a far harder problem: searching through all possible breaks, adding jumps sequentially, and stopping at a reasonable point. We plan to extend the method in the future to limited dependent variables and higher dimensional settings.

We thank Charles Franklin for providing the presidential approval data and for many useful conversations. We thank Grace Wahba, participants in the University of Wisconsin-Madison’s Models and Data Seminar, Sample Survey Seminar, and Fitting Curves to Data Seminar for many useful comments. Replication materials and relevant code are available on the *Political Analysis* Web site.

This appendix provides a derivation of the modified BIC statistic used as our stopping rule. We follow the presentation by Adrian Raftery (1995), whereas a more technical discussion can be found elsewhere (Tierney and Kadane 1986).

Given observed data *D*, candidate models *M _{i}*,

$$p(D|{M}_{i})={\displaystyle \int p(D|{\mathrm{\theta}}_{i},{M}_{i})}p({\mathrm{\theta}}_{i}|{M}_{i})\mathrm{d}{\mathrm{\theta}}_{i}.$$

(A1)

Letting *g*(θ_{i}) = log {*p*(*D*|θ)*p*(θ)}, we now expand *g*(θ) around the posterior mode, _{i}:

$$g(\mathrm{\theta})=g(\tilde{\mathrm{\theta}})+{(\mathrm{\theta}-\tilde{\mathrm{\theta}})}^{T}g\prime (\tilde{\mathrm{\theta}})+\frac{1}{2}{(\mathrm{\theta}-\tilde{\mathrm{\theta}})}^{T}g\u2033(\tilde{\mathrm{\theta}})(\mathrm{\theta}-\tilde{\mathrm{\theta}})+o({\Vert \mathrm{\theta}-\tilde{\mathrm{\theta}}\Vert}^{2}),$$

(A2)

*g*′ and *g*″ denote the gradient vector and Hessian matrix of *g* with respect to θ. Since is the posterior mode, we know that the linear term above is zero, since *g*′ () = 0. This leaves the approximation

$$g(\mathrm{\theta})\approx g(\tilde{\mathrm{\theta}})+\frac{1}{2}{(\mathrm{\theta}-\tilde{\mathrm{\theta}})}^{T}g\u2033(\tilde{\mathrm{\theta}})(\mathrm{\theta}-\tilde{\mathrm{\theta}}).$$

(A3)

Exponentiating both sides and noting that the integral is proportional to the integral of a multivariate normal give the result (MacKay 2003)

$$p(D)\approx \text{exp}\phantom{\rule{thinmathspace}{0ex}}(g(\tilde{\mathrm{\theta}})){\displaystyle \int \phantom{\rule{thinmathspace}{0ex}}\text{exp}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{1}{2}\xb7{(\mathrm{\theta}-\tilde{\mathrm{\theta}})}^{T}g\u2033\phantom{\rule{thinmathspace}{0ex}}(\tilde{\mathrm{\theta}})(\mathrm{\theta}-\tilde{\mathrm{\theta}})\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{d}\mathrm{\theta}}$$

(A4)

$$=\text{exp}\phantom{\rule{thinmathspace}{0ex}}\left(g\phantom{\rule{thinmathspace}{0ex}}\left(\tilde{\mathrm{\theta}}\right)\right){(2\pi )}^{\frac{d}{2}}|A{|}^{-\frac{1}{2}}.$$

(A5)

Assume that *A* approaches infinity as some *nA*_{0}, with *A*_{0} the asymptotic information matrix. Then log|*A*| approaches infinity as log|*nA*_{0}| = log(*n ^{d}*|

$$\underset{n\to \infty}{\text{lim}}\phantom{\rule{thinmathspace}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}p(D)\approx \underset{n\to \infty}{\text{lim}}\phantom{\rule{thinmathspace}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}p(D|{\widehat{\mathrm{\theta}}}_{i})-\frac{1}{2}\text{log}|A|$$

(A6)

$$=\underset{n\to \infty}{\text{lim}}\phantom{\rule{thinmathspace}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}p(D|{\widehat{\mathrm{\theta}}}_{i})-\frac{1}{2}\text{log}({n}^{d}|{A}_{0}|)$$

(A7)

$$=\text{log}\phantom{\rule{thinmathspace}{0ex}}p(D|{\widehat{\mathrm{\theta}}}_{i})-\frac{d}{2}\text{log}(n).$$

(A8)

The first term on the right-hand side above is the log-likelihood, whereas the second is a measure of the dimensionality of the model. BIC is an asymptotic result, balancing the trade-off between model likelihood and dimensionality, with the model corresponding to the highest BIC generally selected. Often, it is written as −2·log-likelihood + *d*·log(*n*), in which case the model with the smallest BIC would be selected. The BIC provides an estimate of the posterior probability given a uniform distribution across all candidate models. If the correct model is a candidate, it will be chosen with probability one as the sample size approaches infinity.

We modify the BIC to account for the fact that the generalized smoothing spline fits a spline to a random subset of knots, due to the difficulties involved in inverting large matrices. Assume, with sample size *n*, that the spline is fit to *n** knots, chosen at random. This leads to covariance matrices |*A*| and |*A**|, each of which approaches the same |*A*_{0}|, as *n* and *n** increase.

Taking the approximations,

$$\mathit{\text{nA}}\approx {n}^{*}{A}^{*},$$

(A9)

$$\frac{n}{{n}^{*}}A\approx {A}^{*},$$

(A10)

$$\text{log}|{A}^{*}|\approx \text{log}\phantom{\rule{thinmathspace}{0ex}}\left\{{\left(\frac{n}{{n}^{*}}\right)}^{d}|A|\right\}.$$

(A11)

Given that *A* grows as *nA*_{0}, this leaves

$$\text{log}|{A}^{*}|\approx \text{log}\phantom{\rule{thinmathspace}{0ex}}\left\{{\left(\frac{n}{{n}^{*}}\right)}^{d}|{\mathit{\text{nA}}}_{0}|\right\}$$

(A12)

$$=\text{log}\phantom{\rule{thinmathspace}{0ex}}\left\{{\left(\frac{{n}^{2}}{{n}^{*}}\right)}^{d}|{A}_{0}|\right\}$$

(A13)

$$=2d\text{log}(n)-d\text{log}({n}^{*})+\text{log}|{A}_{0}|.$$

(A14)

Substituting equation (A14) into equation (A6) leaves our modified BIC statistic, where the first term in the sum is the divergence of the sampled data, *y**, which is sampled from the complete set of observed outcomes, *y*:

$$\text{BIC}(n,{n}^{*},i)=\text{log}P({y}^{*}|{\widehat{\mathrm{\theta}}}_{i})-d\text{log}(n)+\frac{d}{2}\text{log}({n}^{*})+\frac{d}{2}\text{log}(2\pi ).$$

Rather than a likelihood, we use the penalty on the nonlinear part as the measure of divergence. The logic carries through identically upon noting the correspondence between the cubic smoothing splines used here and penalized regression (Pearce and Wand 2006). Note how, as *n** gets closer to *n*, our modified BIC approaches the actual BIC. Simulations with *n* = 100 and *n** = 30 indicated that the term with log(2π) was necessary to maintain a small false discovery rate in such a small sample. This asymptotically negligible term did not affect the power in the larger-*n* simulations, though.

We use this statistic as a stopping rule throughout our analysis and simulations.

^{1}We formulate the problem in a least squares framework, but distributional assumptions could be added so as to characterize the data-generating process (see Gu 2002 for an extensive discussion with examples). This would result in a penalized likelihood rather than penalized regression.

^{2}Splines select a random subset of knots due to computational difficulties in inverting large matrices. Even with relatively few knots, estimates are quite stable and accurate.

^{3}Since we are not making any distributional assumptions about the error term, the χ^{2} statistic is only an approximation.

^{4}For clarity, we refer to each Congress by the year in which it was elected rather than its number, that is, “2006” rather than “110th.”

^{5}Specifically, any function that lives in a Hilbert space and has a reproducing kernel. See Wahba (1990) for extensions.

Marc T. Ratkovic, Departments of Political Science and Statistics, University of Wisconsin, Madison, 110 North Hall, 1050 Bascom Mall, Madison, WI 53706.

Kevin H. Eng, Department of Statistics, University of Wisconsin, Madison, 1300 University Ave., Madison, WI 53706 ; Email: ude.csiw@civoktar.

- Beck Nathaniel, Jackman Simon. Beyond linearity by default: generalized additive models. American Journal of Political Science. 1997;42:596–627.
- Beck Nathaniel, Jackman Simon, Rosenthal Howard. Presidential approval: the case of George W. Bush. Working Paper; 2006.
- Calderia Gregory A, Zorn Christopher JW. Of time and consensual norms in the supreme court. American Journal of Political Science. 1998;42:874–902.
- Cleveland William S. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association. 1979;74:829–836.
- Efron Bradley, Hastie Trevor, Johnstone Iain, Tibshirani Robert. Least angle regression. Annals of Statistics. 2004;32:407–499.
- Erikson Robert S, MacKuen Michael, Stimson James A. The macro polity. Cambridge Studies in Public Opinion and Political Psychology. Cambridge, United Kingdom: Cambridge University Press; 2002.
- Green PJ. Reversible jump MCMC computation and Bayesian model determination. Biometrika. 1995;82:711–732.
- Gu Chong. Smoothing spline ANOVA models. Springer Series in Statistics. New York: Springer; 2002.
- Keele Luke. How to be smooth: automated smoothing in political science. Ohio State University; 2006. Unpublished Manuscript.
- Keele Luke. Semiparametric regression for the social sciences. London: Wiley and Sons; 2008.
- Key VO. A theory of critical elections. The Journal of Politics. 1955;17:3–18.
- Key VO. Secular realignment and the party system. The Journal of Politics. 1959;21:198–210.
- Krivobokova Tatyana, Kauermann Göran. A note on penalized spline smoothing with correlated errors. Journal of the American Statistical Association. 2007;102:1328–1337.
- MacKay David. Information theory, inference, and learning algorithms. Cambridge, United Kingdom: Cambridge University Press; 2003.
- Mayhew David. Electoral realignments: a critique of an American genre. New Haven, CT: Yale University Press; 2002.
- Pearce Nathan D, Wand Matthew P. Penalized splines and reproducing kernel methods. The American Statistician. 2006;60:233–240.
- Pierson Paul. Politics in time: history, institutions, and social analysis. Princeton, NJ: Princeton University Press; 2004.
- Poole Keith T, Rosenthal Howard. Congress: a political-economic history of roll call voting. New York: Oxford University Press; 1997.
- Raftery Adrian. Bayesian model selection in social research. Sociological Methodology. 1995;25:111–163.
- R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2008.
- Sen Ashish, Srivastava Muni. On tests for detecting the change in the mean. The Annals of Statistics. 1975;3:98–108.
- Spirling Arthur. Bayesian approaches for limited dependent variable change point problems. Political Analysis. 2007a;15:387–405.
- Spirling Arthur. Turning points in the Iraq conflict: reversible jump Markov Chain Monte Carlo in political science. The American Statistician. 2007b;61:1–6.
- Tierney Luke, Kadane Joseph. Accurate approximations for posterior moments and marginal densities. Journal of the American Statistical Association. 1986;81:82–86.
- Wahba Grace. Spline models for observation data. Philadelphia: Society for Industrial and Applied Mathematics; 1990.
- Western Bruce, Meredith Kleykamp. A Bayesian change point model for historical time series analysis. Political Analysis. 2004;12:354–374.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |