The observed data consist of independent observations on the triple (*X, δ*, **z**), where *X* is the minimum of a failure and censoring time pair (*T, C*), *δ* = *I*(*T* ≤ *C*) is the indicator of the event that a failure has been observed and **z** = (*z*_{1}, … , *z*_{p})′ is a *p* vector of covariates. The random variables *T* and *C* denote the survival and censoring times respectively which are assumed to be independent.

The fundamental assumption of proportionality of hazards in the Cox PH model (

1.2) requires that the hazards ratio remains constant over time. The partial likelihood approach of

Cox (1972) is the standard inferential method for this model. The baseline hazard, λ

_{0}(

*t*), drops out of this partial likelihood and can be left completely unspecified (

Cox, 1972). On the other hand, it is evident from

equations (2.4) and (

2.7) that the generalized model (

2.2) allows crossing of hazard curves over time. Due to the non-constant hazards ratio, a factorization of the likelihood into a “partial” likelihood and a nuisance part is not possible. The baseline hazard must be specified in the likelihood and needs to be explicitly estimated. There are several methods available for estimation in this model. For example,

Bordes & Breuils (2006) discussed sequential estimation for semi-parametric models with application to the PH model, while

Wu & Hsieh (2009) adopted a piecewise constant approach to estimating the baseline hazard in model (

2.2). One could use a set of parametric basis functions, such as splines, to approximate the baseline hazard and then proceed with maximum likelihood (ML) methods to estimate the parameters of the model. Splines have been used to model the baseline hazard in the Cox PH model by

Durrelman & Simon (1989), Gray (1992) and

Rosenberg (1995).

Models based on spline approximations are intermediate in structure between completely parametric models and non-parametric methods, and provide flexibility in incorporating a variety of shapes. Under certain regularity conditions,

Devarajan (2000, pp.53-58) showed that the maximum penalized likelihood estimate of the baseline hazard and baseline cumulative hazard in model (

2.2) are exponential splines with knots at the unique failure times. This provides us with a theoretical justification for using spline functions to model the baseline hazard and cumulative hazard functions.

de Boor (2001, Chapter IX, p.87) describes in detail the computational properties of

*B-spline basis functions*. The

*B*-spline basis function has the properties of a probability density function and a linear combination of such functions is like a mixture of densities that allows for various distributions. In this paper, we use a set of cubic

*B*-spline basis functions to model the baseline hazard.

3.1 Modeling the Baseline Hazard using B-Splines

We assume that the baseline hazard function, λ

_{0}(

*t*), and its first two derivatives are continuous and model it as a linear combination of cubic

*B*-spline basis functions. This model consists of

*K* + 1 segments determined by

*K* knots

*t*_{0} <

*t*_{1} <

*t*_{2} < … <

*t*_{K} <

*t*_{K+1} over the interval [

*t*_{0},

*t*_{K+1}] subject to the constraints that adjacent functions and their first two derivatives agree at the knots. It requires total of 4(

*K* + 1) parameters and

*K* + 4 basis functions (

Devarajan (2000, p.72);

de Boor (2001)).

We also define six additional ‘slack’ knots (

Rosenberg (1995)) given by

*t*_{−3} =

*t*_{0} − 3,

*t*_{−2} =

*t*_{0} − 2,

*t*_{−1} =

*t*_{0} − 1,

*t*_{K+2} =

*t*_{K+1} + 1,

*t*_{K+3} =

*t*_{K+1} + 2 and

*t*_{K+4} =

*t* _{K+1} + 3 where

*t*_{0} = min(

*x*_{i}),

*t*_{K+1} = max(

*x*_{i}) and (

*x*_{i},

*δ*_{i}),

*i* = 1, … ,

*n* are right-censored observations. Using the result from

Schumaker (1981, p.46), the

*cubic B-Spline basis functions* are given by

where

*x*_{+} =

*x* if

*x* ≥ 0 and zero otherwise. Using (

2.2) and (

2.4), the log-likelihood for right-censored observations (

*x*_{i},

*δ*_{i}),

*i* = 1, … ,

*n* where

*δ*_{i} is the censoring indicator, is given by

Using a linear combination of

*B*-spline basis functions, the baseline hazard is given by

with

*B*_{k}(

*t*) defined as in (

3.1). With the number of knots fixed at

*K*, the order of approximation attainable is

*O*(

*K*^{−4}) (

Devarajan (2000, p.73);

de Boor (2001)).We take

*η*_{k} =

*e*^{αk} to ensure positivity of the hazard function λ

_{0}(

*t**α*). The cumulative baseline hazard function is

where

is a

*quartic B-spline basis function*. Since

*I*_{k}(

*t*) is monotone increasing and positive and

*e*^{αk} is positive for each

*k*, monotonicity and positivity of the cumulative hazard function is ensured. Now, the log-likelihood given by (

3.2) becomes,

This equation is maximized with respect to the model parameters

*α*,

*β* and

*γ* to obtain the maximum likelihood (ML) estimates. Using the second partial derivatives of the likelihood function with respect to the model parameters, we obtain the Fisher information matrix and hence the variance-covariance matrix of the parameter estimates. If

*n* → ∞ at a rate faster than the rate at which

*K* → ∞, then one can show that our proposed estimators are consistent estimators and are normally distributed (

Devarajan, 2000, pp.73-74).

3.2 Implementation

ML estimation of model parameters was implemented using the Broyden-Fletch-Goldfarb-Shanno (BFGS) method for the unconstrained optimization of a nonlinear multivariate function in double precision. Routines from the IMSL Fortran Library (DU2ING and DU2POL) were used for maximization of the likelihood function with respect to model parameters *α*, *β* and *γ* to obtain the ML estimates. Multiple restarts of the optimization procedure were performed with various starting values to check the consistency and validity of the results.

3.3 Knot Selection

An integral part of the estimation procedure using any spline approximations is the choice of the number and location of the knots. In general, it is desirable to have the knots reasonably spread out with roughly equal number of data points between successive knots. Knot selection has been discussed by many authors in the context of hazard estimation. We develop a semi-automatic knot selection procedure that involves initial knot placement based on quantiles equally spaced in percentile ranks of the uncensored observations followed by stepwise knot deletion and final model selection based on

*Akaike’s Information Criterion (AIC)*. During the stepwise deletion process, a sequence of models indexed by

*ν* is obtained such that the

*ν*^{th} model has

*n*_{ν} parameters. In each step, only the significant parameters are retained. The significance of any parameter associated with a covariate or a knot, say

*β*, is determined by its Wald

*z*-score. All non-significant knots are removed from the likelihood function at each stage as they do not provide a good approximation to the baseline hazard. If

*l*_{ν} denotes the log-likelihood of the

*ν*^{th} model, then the

*AIC* for this model, with penalty parameter

*a* = 2 is given by

The model *ν* corresponding to the minimum value of *AIC*(*a* = 2, *ν*,*K*) is chosen as the final model where *n*_{ν} is the number of significant parameters in the final model. This method provides a balance between computational complexity and model flexibility.