I-splines are monotone splines whose most obvious application is in monotone nonlinear regression problems, as discussed by Ramsay.^{15} Although attractive for their simple expression and theoretical properties, very few articles have described real applications of I-spline techniques. Lu et al^{16} and Wu^{17} used I-splines to solve maximum likelihood estimation (MLE) problems. In this paper, we identify a new application of I-splines for solving calibration problems. Let us start with an introduction to basic I-spline concepts. The *l*-th order I-splines based on a knot sequence are defined by Ramsay^{15} as

with *L* ≤ *s* ≤ *U*, where *L* and *U* are the left and the right end knots of the knot sequence, respectively. The number of I-splines is *q* = *n*^{(knot)} + *l* + 1, where *n*^{(knot)} is the number of interior knots (i.e., the knots that are not end knots in the knot sequence). Note that

*i* corresponds to the index of the I-splines. Wu^{17} identified an interesting relationship: each I-spline is related to B-splines^{18} in that it can be written as a partial sum of B-splines of one order higher, and therefore B-splines can be used to construct I-splines. It is easy to compute I-splines using formula (2), since B-splines can be computed efficiently and are already available in statistical packages. De Boor^{19} showed that the B-splines sum to one and that each B-spline is nonnegative, which implies that the I-splines in (1a) and (1b) are monotone with function values between 0 and 1. Together, the nonparametric form, the monotonicity, and the [0, 1] range constraint make I-splines good candidates for modeling distribution functions.
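Because formula (2) builds each I-spline as a partial sum of B-splines of one order higher, the basis is easy to evaluate with standard tools. The sketch below uses SciPy's `BSpline.design_matrix`; the full-multiplicity boundary knots and the dropped constant first column are our implementation assumptions, not details given in the paper:

```python
import numpy as np
from scipy.interpolate import BSpline


def ispline_design(x, interior_knots, degree=3, L=0.0, U=1.0):
    """Evaluate an I-spline basis at points x as tail partial sums of
    B-splines of one order higher, per the relationship above."""
    # Knot vector with (degree + 1)-fold boundary knots at L and U.
    t = np.r_[[L] * (degree + 1), np.sort(interior_knots), [U] * (degree + 1)]
    # B-spline design matrix, shape (len(x), n_basis).
    B = BSpline.design_matrix(np.asarray(x, dtype=float), t, degree).toarray()
    # Column i of the reversed cumulative sum equals sum_{m >= i} B_m(x).
    tail = np.cumsum(B[:, ::-1], axis=1)[:, ::-1]
    # The first column is the constant function 1 (partition of unity);
    # drop it so each remaining column rises monotonically from 0 to 1.
    return tail[:, 1:]
```

Each returned column increases monotonically from 0 at *L* to 1 at *U*, matching the range and monotonicity properties cited above.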

Given pre-calibrated prediction probabilities *P* = {*p*_{1}, …, *p*_{n}} and class labels *C* = {*c*_{1}, …, *c*_{n}}, we now show how to use I-spline based smoothing techniques to calibrate predictive models by solving a nonlinear monotone least squares regression problem.

Define Ω = {*f* : *f* = Σ_{i=1}^{q} *α*_{i}*I*_{i}, *α*_{i} ≥ 0, Σ_{i=1}^{q} *α*_{i} ≤ 1} as the space of I-spline functions. The monotone least squares regression finds the *f*^{*} ∈ Ω that minimizes

Σ_{j=1}^{n} (*c*_{j} − *f*(*p*_{j}))^{2}. (3)

As mentioned before, I-splines are monotone and their values are between 0 and 1, so the constraints on the coefficients in Ω guarantee that each *f* ∈ Ω is monotone with function values between 0 and 1. Given a knot sequence, the I-splines are fixed; hence this monotone regression problem amounts to minimizing (3) with respect to the I-spline coefficients *α*_{1}, …, *α*_{q}, subject to the constraints *α*_{i} ≥ 0 for *i* = 1, …, *q* and Σ_{i=1}^{q} *α*_{i} ≤ 1. We can rewrite this problem as a maximization problem:

maximize *F*(*α*) = −Σ_{j=1}^{n} (*c*_{j} − Σ_{i=1}^{q} *α*_{i}*I*_{i}(*p*_{j}))^{2}, subject to *α*_{i} ≥ 0 for *i* = 1, …, *q*, and Σ_{i=1}^{q} *α*_{i} ≤ 1. (4)
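Before turning to the paper's dedicated algorithm, note that a problem of this shape can be checked against an off-the-shelf constrained solver. The sketch below uses SciPy's SLSQP (our choice of solver, not the paper's method) on a precomputed design matrix `I_design[j, i]` of I-spline values *I*_{i}(*p*_{j}):

```python
import numpy as np
from scipy.optimize import minimize


def fit_monotone_ls(I_design, c):
    """Least squares over alpha with alpha_i >= 0 and sum(alpha) <= 1.

    I_design: (n, q) matrix of I-spline basis values at p_1..p_n.
    c:        length-n vector of class labels.
    """
    n, q = I_design.shape
    obj = lambda a: np.sum((c - I_design @ a) ** 2)
    grad = lambda a: -2.0 * I_design.T @ (c - I_design @ a)
    # Inequality constraint: 1 - sum(alpha) >= 0.
    cons = [{"type": "ineq",
             "fun": lambda a: 1.0 - np.sum(a),
             "jac": lambda a: -np.ones(q)}]
    res = minimize(obj, np.full(q, 1.0 / (q + 1)), jac=grad,
                   bounds=[(0.0, None)] * q, constraints=cons,
                   method="SLSQP")
    return res.x
```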

The next question is how to pick interior knots, which is always critical to any spline-based technique. Intuitively, there are two general rules:

- More interior knots should be added to allow more flexibility;
- More interior knots should be added where samples are frequently observed.

The second rule is easy to follow: after deciding on the number of interior knots, we can position them according to sample percentiles. Deciding the number of interior knots, however, is really a case-by-case matter. Ramsay^{15} mentioned that very few interior knots, say 1 or 2, are necessary for I-spline based regression problems. However, both Lu et al^{16} and Wu^{17} chose the cube root of the sample size as the number of interior knots for their MLE-based spline estimations, and their experiments supported this choice. Given our sample size *n*, we used max{1, (*n*^{1/3} − 4)} as the number of interior knots, which works best for the estimation proposed in this paper when cubic splines are applied.
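The two rules above can be sketched as follows; the rounding convention for *n*^{1/3} − 4 is our assumption:

```python
import numpy as np


def interior_knots(p, n_knots=None):
    """Place interior knots at equally spaced sample percentiles of p.

    The default count follows the rule max{1, n^(1/3) - 4}; how to round
    the non-integer cube root is an implementation choice made here.
    """
    p = np.sort(np.asarray(p, dtype=float))
    n = len(p)
    if n_knots is None:
        n_knots = max(1, int(round(n ** (1.0 / 3.0) - 4)))
    # Percentiles strictly inside (0, 100) so the knots stay interior.
    qs = np.linspace(0.0, 100.0, n_knots + 2)[1:-1]
    return np.percentile(p, qs)
```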

The computation for the maximization problem (4) can be done by a generalized gradient projection algorithm.^{1} First we rewrite the constraints in (4) as *X**α* ≤ *y*, where *X* = (*x*_{1}, *x*_{2}, …, *x*_{q+1})^{T} with *x*_{1} = (−1, 0, …, 0)^{T}, *x*_{2} = (0, −1, 0, …, 0)^{T}, …, *x*_{q} = (0, …, 0, −1)^{T}, and *x*_{q+1} = (1, …, 1)^{T}; *α* = (*α*_{1}, …, *α*_{q})^{T}; and *y* = (0, …, 0, 1)^{T}. If some I-spline coefficients equal 0, or all coefficients sum to 1, then we say that the corresponding constraints are active, and we let *X*_{Λ}*α* = *y*_{Λ} represent all active constraints, where the rows of *X*_{Λ} and *y*_{Λ} are a subset of the rows of *X* and *y*; *X*_{Λ} is used to facilitate the computation.

Initially, we put the integers representing the active constraints in the vector Λ (including the indexes of the I-spline coefficients that equal 0, and the index (*q* + 1) when all coefficients sum to 1). A vector Λ with *r* scalars corresponds to an *r* × *q* matrix *X*_{Λ}. For example, if Λ = (2, 1, (*q* + 1)), then *X*_{Λ} = (*x*_{2}, *x*_{1}, *x*_{q+1})^{T}.
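In code, the full constraint system and the selection of active rows might look like the following sketch (the name `X_lam` for the active-row matrix mirrors our *X*_{Λ} notation):

```python
import numpy as np


def constraint_system(q):
    """Constraints of (4) in the form X @ alpha <= y: rows 1..q encode
    -alpha_i <= 0, and row q+1 encodes sum(alpha) <= 1."""
    X = np.vstack([-np.eye(q), np.ones((1, q))])
    y = np.append(np.zeros(q), 1.0)
    return X, y


def active_rows(X, y, Lam):
    """Rows of X and y selected by the 1-based active-index vector Lam."""
    idx = np.asarray(Lam) - 1
    return X[idx], y[idx]
```

For Λ = (2, 1, *q* + 1) this reproduces the example above: rows 2, 1, and *q* + 1 of *X*, in that order.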

We denote the target function in (4) as *F*(*α*). Let ∇*F*(*α*) and *H*(*α*) be the gradient and the Hessian matrix of *F*(*α*) with respect to *α*, respectively. Let *W* = −*H*(*α*) + *γI*, where *I* is an identity matrix and *γ* is set large enough to make *W* positive definite. With this notation introduced, the generalized gradient projection algorithm can be implemented directly.