Social and physical scientists are often interested in how certain variables depend on one another. They often assume that a functional relationship exists between these two variables, and they run experiments or collect samples to learn about this relationship. They are aware, however, that the data does not follow a deterministic equation, but follows the general stochastic equation
where
Y is a dependent variable,
x is an independent or controlled variable, and ε is random error with mean 0. One of the simplest models relating a response,
Y, to a single predictor,
x, is the
pth–order polynomial regression model
While this model is appealing and can be very useful, it may be inadequate since it assumes that only one polynomial describes the average relationship between
Y and
x. A more complex model would allow several polynomials to do this. An example of such a model is a piecewise polynomial regression spline. A piecewise polynomial regression spline with
k knots divides the domain of
f(
x) into
k + 1 mutually exclusive regions, and to each region corresponds a unique polynomial describing the average relationship between
Y and
x. With different polynomials describing different parts of the function,
f(
x) is not restricted to have the same smoothness throughout its domain. A piecewise polynomial regression spline of order
p with
k knots at
tk = (
t1,
t2, …,
tk) is written as
where β = (β
0, β
1, …,
βp+k) is a fixed set of parameters and

.
A natural polynomial regression spline takes a form nearly identical to the piecewise regression spline seen above. It is different, however, in that it restricts the function to be linear at the boundaries. This sets the last term on the right-hand side of
(2) to 0. A natural polynomial regression spline of order
p thus takes the form
If
f (
x) were modeled as such,
(1) could then be re-written as
where

and β = (β
0, β
1, β
2, …, β
1+k), and for
m observations collected at (
x1,
x2, …,
xm), it would be written as
The natural polynomial regression spline written in
Equation (3) illustrates the concept behind splines, but the design matrix composed of the row vectors
c (
x1,
tk), …,
c (
xm,
tk) is unstable and thus rarely used. B-splines, which are computationally more stable, are preferred (
Hastie and Tibshirani, 1990;
Zhou and Shen, 2001). The functional form of a b-spline is more complex than that of the natural polynomial regression spline, but is readily available (
deBoor, 1978). We write it as
where
b(
x,
tk) is the design vector associated with a b-spline evaluated at
x with
k knots at
tk = (
t1,
t2, …,
tk), and α is a fixed set of parameters. For
m observations collected at (
x1, …,
xm), the model would be written as
With this model, estimating
f(
x) becomes a problem of estimating the number of knots,
k, the locations of those knots,
tk, and α.
A variety of methods have been developed in estimating
k and
tk.
Halpern (1973) approached this problem using Bayesian methods. Allowing knots to only be placed at the design points in the experiment, he considered all of the subsets of the design points. Halpern assigned prior probabilities to all of these subsets, and calculated the corresponding posterior probabilities of these subsets. His estimator of the function was based on these posterior probabilities. Denison, Mallick, and Smith (1998) placed priors on the number of knots,
k, and their locations,
tk. With these priors, they calculated a joint posterior distribution which included the variables
k and
tk and then sampled from this posterior distribution using reversible jump MCMC methods. They too restricted the knots to be located only at the design points of the experiment.
DiMatteo, Genovese, and Kass (2001) proposed a method similar to that of Denison, et al. They did not restrict the knots to be located only at the design points of the experiment and they penalized models with unnecessarily large values of
k by averaging over the fixed effect coefficients, α.
Friedman (1991) and
Stone et al. (1997) try to solve the problem of knot selection using backward and/or forward knot selection. This process continues until the “best” model has been identified.
Zhou and Shen (2001) used an alternative method in identifying the locations of the knots. They constructed an algorithm which favored adding knots in locations where several knots had already been added.
Lindstrom (1999) used similar methods when selecting knot locations.
Splines have also been used to model curves that vary within and between subjects (
Shi et al., 1996,
Rice and Wu, 2001,
Behseta et al., 2005,
Crainiceanu et al., 2005).
Brumback and Rice (1998) use splines to model nested functions which vary between subjects and between groups of subjects. Functional models which only account for variation between and within subjects take the general form
where
Yi (
xj) is the observation of the
ith individual at
xj,
f (
x) can be thought of as a population curve, and
Gi (
x) is a random curve specific to subject
i. While these functions can be modeled in a variety of ways, we model these two functions as free-knot b-splines with
k and
k′ knots located at
tk and
tk′, respectively. Setting
f(
x) =
b(
x;
tk)α and
Gi(
x) =
b(
x;
tk′)γ
i, where γ
i are independent random vectors such that γ
i ~
N(0, Σ
γ),
equation (4) becomes
Bigelow and Dunson (2007) consider a variation of this model and identify the location of the knots by sampling from the full posterior distribution. They lessen the computational burden of this sampling procedure by setting
tk′ =
tk and restricting Σ
γ to be diagonal. In this paper, we consider a more flexible model and explore the computational issues associated with it. To be more specific, we eliminate the restrictions imposed by Bigelow and Dunson and then try to identify the number and location of the knots by sampling from
p(
k,
tk,
k′,
tk′|
y). This posterior can not be sampled from exactly, however, because eliminating the diagonal restriction on Σ
γ makes the likelihood
p (
y|
k,
tk,
k′,
tk′) intractable. We thus sample from two approximations to this posterior; each approximation corresponds to a different approximation to the likelihood
p (
y|
k,
tk,
k′,
tk′). To assess the accuracy of these approximations, we compare the posterior distribution of knots in each case to the marginal distribution of knots when sampling from
p(
k,
tk,
k′,
tk′, ψ, σ
2|
y), where ψ is a set of covariance parameters and σ
2 is the within-subject variability. The marginal distribution of knots when sampling from this higher-dimensional posterior is exact as its corresponding likelihood is available in closed form. Sampling from this larger posterior, however, is less efficient because of the additional parameters that need to be sampled.
In Section 2, we discuss the linear model given in
(5) in more detail. In Section 3, the likelihood of interest,
p(
y|
k,
tk,
k′,
tk′), is discussed as are the two methods we use to approximate this likelihood. Section 4 shows the algorithm used to sample from the resulting approximate posterior distributions, and Section 5 shows the algorithm we use to sample from
p(
k,
tk,
k′,
tk′, ψ, σ
2|
y). Section 6 theoretically compares the approximations, and in Section 7, we describe a simulation study performed to explore the differences in these approximations. In Section 8, we apply the preferred approximation to a real data set.