Bernoulli (Andover). Author manuscript; available in PMC 2010 May 1.
Published in final edited form as:
Bernoulli (Andover). 2009 November 1; 15(4): 1010–1035.
PMCID: PMC2850000
NIHMSID: NIHMS177730

# Nonparametric estimation of a convex bathtub-shaped hazard function

## Abstract

In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n2/5 at points x0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.

Keywords: antimode, bathtub, consistency, convex, failure rate, force of mortality, hazard rate, invelope process, limit distribution, nonparametric estimation, U-shaped

## 1. Introduction

Information on the behavior of time to a random event is of much interest in many fields. The random event could be failure of a material or machine, death, an earthquake or infection by a disease, to name but a few examples. Frequently, this type of data is called lifetime data, and it is natural to assume that it takes values in [0, ∞). If the lifetime distribution F has a density f , then a key quantity of interest is the hazard (or failure) rate h(t) = f(t)/(1 − F(t)). Heuristically, h(t) dt is the probability that, given survival until time t, the event will occur in the next duration of length dt. The hazard function is also known as the force of mortality in actuarial science or the intensity function in extreme value theory.

Certain shape restrictions arise quite naturally for hazard rates. In this work, we are particularly interested in the family of hazard functions which are convex. That is, we attach the additional smoothness constraint of convexity to the more traditional assumption of a bathtub-shaped failure rate (that is, first decreasing, then increasing). Heuristically, bathtub-shaped hazards correspond to lifetime distributions with high initial hazard (or infant mortality), lower and often rather constant hazard during the middle of life and then increasing hazard of failure (or wear-out) as aging proceeds; see [20,23].

Many other estimators of hazard functions (and solutions to the closely related problem of estimating the intensity of a Poisson process) with and without shape restrictions have been considered in the literature; see [24] for a partial review up to 2002. In recent years, the focus has shifted to construction of “adaptive” estimators over large scales of smoothness classes; see, for example, [5,6,25]. Virtually all of these other estimators require careful choice of penalty terms or tuning parameters, and computation of the adaptive estimators typically involves methods of combinatorial optimization. As far as we know, reliable algorithms for computing them are not yet available. Our estimators avoid the choices of tuning parameters or penalty terms by virtue of the shape constraint of convexity and are relatively straightforward to compute since the corresponding optimization problems are convex.

Recall the definition of a convex function. Let $C⊂R+=[0,∞)$ be convex. Then $h:C↦R$ is convex (on C) if it satisfies

$h(λx+(1−λ)y)≤λh(x)+(1−λ)h(y),0<λ<1,$

for all x, y C. Equivalently, a function is convex if its epigraph

${(x,μ):x∈C,μ∈R,μ≥f(x)}$

is a convex set in $R2$ (see, for example, [27], Section 4). Thus, a convex function on C may be extended to a convex function on $R+$ by setting h(x) = +∞ for $x∈R+∩Cc$.

Suppose, then, taht we observe i.i.d. variables X1, … , Xn from a distribution F0 with density f0 and hazard rate h0. We denote the true cumulative hazard function by $H0(t)=∫0th0(s)ds$, and the true survival function by S0 = 1 − F0. Also, 0 < X(1) < X(2) < < X(n) denote the order statistics corresponding to X1, … , Xn.

To define the MLE of h0, $h^n$, we first consider the likelihood in terms of the hazard,

$L(h)=∏i=1nh(Xi)exp{−H(Xi)}=∏i=1nh(X(i))exp{−H(X(i))},$

where $H(t)=∫0th(s)ds$. This can be made arbitrarily large by increasing the value of h(X(n)). We therefore find $h^n:[0,X(n))↦R+$ by maximizing the modified likelihood

$Lmod(h)=∏i=1n−1h(X(i))exp{−H(X(i))}×exp{−H(X(n))}$
(1.1)

over $K+$, the space of non-negative convex functions on [0, X(n)). The full MLE is then found by setting $h^n(x)=∞$ for all xX(n). This is the same approach as taken in [10]. Equivalently, one could first impose the constraint that hM, and then let M → ∞ (see, for example, [26], page 338).

To illustrate the proposed estimator, consider the distribution with density given by

$f(t)=1+2b2Ab2+(1+2b)t∕Aon0≤t≤A.$

This distribution was derived in [14] as a relatively simple model with bathtub-shaped hazards which also has an adequate ability to model lifetime behavior. We will call this the HS distribution, after the authors. It has convex hazards for all values of b in the parameter space (b > −1/2). Figure 1 shows the MLE for simulated data from this distribution with sample size n = 100.

Examples of the estimator. Left: Estimating the HS hazard with b = 0 and A = 1 for a sample size of 100 (bold = true hazard, solid = MLE). Right: Estimation of the earthquake hazard from CPTI04 data (solid = MLE).

We also applied our estimators to the earthquake data of the Appennino Abruzzese region of Italy (Region 923) recently considered by [19], where Bayesian estimation methods are studied. The data comes from the Gruppo di Lavoro CPTI (2004) catalog [9]. It consists of 46 inter-quake times for Region 923, occurring after the year 1650 and with moment magnitude greater than 5.1 (details on the justification of these criteria is available in [19], page 14). Figure 1 shows the resulting estimator.

The main results of this paper are the characterizations, consistency and asymptotic behavior of the nonparametric MLE of a convex hazard function. The estimator is continuous and piecewise linear on [0, X(n)). Although we give a characterization of the MLE, the final form of the estimator is not explicit. We therefore propose an algorithm (based on the support reduction algorithm of [13]). This algorithm is discussed in a separate report, [16], and is available as the R package convexHaz [17].

To describe the local asymptotics of the MLE, we introduce the following process.

### Definition 1.1

Let W(s) denote a standard two-sided Brownian motion, with W(0) = 0, and define $Y(t)=∫0tW(s)ds+t4$. The function ${I(t):t∈R}$, the invelope of the process ${Y(t):t∈R}$ is defined as fallows:

$the functionIis above the functionY:I(t)≥Y(t)for allt∈R;$
(1.2)

$the functionIhas a convex second derivative;$
(1.3)

$the functionIsatisfies∫R{I(t)−Y(t)}dI(3)(t)=0.$
(1.4)

It was shown in [11] that the process $I$ exists and is almost surely uniquely defined. Moreover, with probability one, $I$ is three times differentiable at t = 0. The asymptotic behavior of all of our estimators may be described in terms of the derivatives of the invelope $I$ at zero. The following theorem builds on the basic results in [12] concerning nonparametric estimation of a decreasing convex density.

### Theorem 1.2

Suppose that h0 is convex, x0 > 0 is a point which satisfies h0(x0) > 0 and $h0′′(x0)mak0$, and $h0′′(⋅)$ is continuous in a neighborhood of x0. Then

$(n2∕5(h^n(x0)−h0(x0))n1∕5(h^n′(x0)−h0′(x0)))→d(c1I(2)(0)c2I(3)(0)),$

where $I(2)(0)$ and $I(3)(0)$ are the second and third derivatives at 0, respectively, of the invelope of $Y(t)≡∫0tW(s)ds+t4$ and where

$c1=(h02(x0)h0′′(x0)24S02(x0))1∕5,c2=(h0(x0)h0′′(x0)3243S0(x0))1∕5.$

The key to this result lies in Lemma 5.3, where we establish that the “touchpoints” (defined carefully in Section 2) cluster around x0 at a local scale of n−1/5. The assumption that $h0′′$ is strictly convex and continuous near x0 is crucial in this step. If $h0′′(x0)=0$ (and is continuous in a neighborhood of x0), we conjecture that $h^n(x0)$ converges at the rate n1/2. Similar behavior has been noted for monotone density estimators in [8]. If $h0′$ is discontinuous at x0, unpublished work of Cai and Low [7] suggests that $h^n(x0)$ converges to h0(x0) at rate n1/3. The behavior of convex-constrained estimators in both of these situations remains unknown and is the subject of current research.

The limiting distributions of $h^n(x0)$ and $h^n′(x0)$ involve the constants c1 and c2, which depend on the (unknown) hazard function h0, as well as the random variables $(I(2)(0),I(3)(0))$, which have a universal distribution free of the parameters of the problem. Thus, Theorem 1.2 can be used, in principle, to form confidence intervals for h0(x0) and $h0′(x0)$. This would involve estimation of the constants c1 = c1(h0, x0) and c2 = c2(h0, x0), respectively, both of which depend on $h0′′(x0)$, and appropriate quantiles of the distributions of $I(2)(0)$ and $I(3)(0)$, respectively. Although virtually nothing is known about the distribution of the invelope and its derivatives analytically, the algorithms developed in [11] can easily be used to obtain simulated values of the needed quantiles. Other possible approaches to confidence intervals in this problem involve inversion of likelihood ratio tests (see [2-4] for this approach in the context of monotone or U-shaped function estimation) or resampling methods as in [22] and as discussed in [4] in the setting of nonparametric estimation of monotone functions. It should be noted that our Theorem 1.2 verifies one of the key hypotheses needed for validity of the general subsampling theory of [21,22], and therefore makes the subsampling approach to confidence intervals viable. The details and properties of all these approaches remain to be investigated.

The outline of this paper is as follows. Section 2 is dedicated to the proof of characterizations, existence and uniqueness of the MLE. Consistency is proved in Sections 3 and 4 establishes lower bounds for the pointwise minimax risk of $h^n$. Rates of convergence are established in Section 5, with Section 5.3 containing proofs of our main results concerning the limiting distribution at a fixed point. The companion technical report [15] also includes a detailed treatment of a least-squares estimator, as well as sketches of similar results for censored data and intensity functions of Poisson processes.

## 2. Characterizations, uniqueness and existence

### Proposition 2.1

The function $h^n$ which minimizes n over $K+$ is piecewise linear. It has at most one change of slope between observations, except perhaps in one such interval, where, if the estimator touches zero, it may have two changes of slope (it is zero between these two changes). Also, between zero and X(1), the minimizer may have at most one change of slope, but this happens only if it touches zero, and in this case the estimator is increasing and equal to zero before the first change of slope. Between X(n−1) and X(n), the minimizer will also have at most one change of slope, and this only in the case where it is decreasing on [X(n−1), X(n)) and equal to zero after the change.

#### Proof

Consider any h and choose a convex g such that h(Xi) = g(Xi) for i = 1, … , n − 1 and h(x) ≥ g(x) ≥ 0 on [0, X(n)). It follows that n(h) − n(g) ≥ 0 if and only if H(Xi) ≥ G(Xi) for i = 1, … , n. Hence, the smaller we make g on [0, X(n)), the smaller n(g) will become. It is not difficult to see that the smallest such g, with values of g(Xi) fixed, must have the prescribed form.

Since $h^n$ is piecewise linear, it may be expressed as

$h^n(t)=a^+∑j=1kν^j(τj−t)++∑j=1mμ^j(t−ηj)+,$
(2.1)

where $ν^j,a^,μ^j≥0$. We let τj denote the points of change of slope of $h^n$ where $h^n$ is decreasing and let ηj > 0 denote the points of change of slope where $h^n$ is increasing. For simplicity, we assume that these are ordered. Also, we have τkη1. As seen in the next lemma, the τj ’s and ηj’s correspond to “points of touch” or equality of processes defined on the one hand in terms of $h^n$ and the data, and on the other hand just in terms of the data. We therefore also refer to them as “touchpoints” repeatedly in the remainder of the paper.

It is convenient to define the MLE in terms of the minimization of the criterion function

$φn(h)=∫0∞{H(t)−logh(t)1t≠X(n)}dFn(t),$

where $Fn$ denotes the empirical distribution function of the data,

$Fn(t)=1n∑i=1n1[0,t](Xi).$

We will also use the notation $Sn=1−Fn(t)$ for the empirical survival function.

### Lemma 2.2

Let $F~n(t)=(1∕n)∑i=1n−11[0,t](X(i))$. A function $h^n$ minimizes n over $K+$ (and hence is the MLE) if and only if:

$∫0xx−th^n(t)dF~n(t)=∫0x∫0tSn(s)dsdt$
(2.2)

for all x ≥ 0with equality at τi for i = 1, … , k;

$∫x∞t−xh^n(t)dF~n(t)=∫x∞∫t∞Sn(s)dsdt$
(2.3)

for all x ≥ 0with equality at ηj for j = 1, … , m;

$∫0∞1h^n(t)dF~n(t)≤∫0∞Sn(t)dt,$
(2.4)

$∫0∞H^n(t)dFn(t)=1−1∕n.$
(2.5)

Moreover, the minimizer $h^n$ satisfies

$∫0xh^n(t)Sn(t)dt=Fn(x)$
(2.6)

for x {τ1, … , τk, η1, … , ηm}.

### Remark 2.3

As we assume a priori that $h^n(X(n))=∞$, we may rewrite the left-hand side terms in (2.2)–(2.4) via

$∫Ax−th^n(t)dF~n(t)=∫Ax−th^n(t)1t≠X(n)dFn(t)=∫Ax−th^n(t)dFn(t),∫A1h^n(t)dF~n(t)=∫A1h^n(t)1t≠X(n)dFn(t)=∫A1h^n(t)dFn(t)$

for any set A. We will hereafter use this latter formulation.

### Corollary 2.4

Let ${τi}i=1k$ and ${ηj}j=1m$ denote the change points of $h^n$ as in (2.1). It fallows that

$∫0τi1h^n(t)dFn(t)=∫0τiSn(u)du,$
(2.7)

$∫ηj∞1h^n(t)dFn(t)=∫ηj∞Sn(u)du$
(2.8)

for i = 1, … , k and j = 1, … , m.

#### Proof

The function

$ϕ(x)≡∫oxx−th^n(t)dFn(t)−∫0x∫0tSn(s)dsdt$

is maximized at τi for i = 1, … , k. Since it is also differentiable, (2.7) follows. A similar argument proves (2.8).

#### Proof of Lemma 2.2

Consider any non-negative convex function h. It follows that there exists a non-negative constant a and non-negative measures ν and μ (these measures have supports with intersection containing at most one point) such that

$h(t)=a+∫0∞(x−t)+dν(x)+∫0∞(t−x)+dμ(x).$

For any function ĥ in $K+$, we calculate

$φn(h)−φn(h^)≥∫0∞{H(t)−H^(t)+(1−h(t)h^(t))1t≠X(n)}dFn(t)$

since − log x ≥ 1 − x. Plugging in the explicit form of h from above, we find that the right-hand side is equal to

$a{∫[0,∞)(t−1h^(t)1t≠X(n))dFn(t)}+{n−1n−∫0∞H^(t)dFn(t)}+∫0∞{∫0x∫0tSn(s)dsdt−∫0xx−th^(t)1t≠X(n)dFn(t)}dν(x)+∫0∞{∫x∞∫t∞Sn(s)dsdt−∫0xt−xh^(t)1t≠X(n)dFn(t)}dμ(x).$

This is non-negative if $h^$ is a function which satisfies conditions (2.2)–(2.5). It follows that these conditions are sufficient to describe a minimizer of n.

We next show that these conditions are necessary. To do this, we first define the directional derivative

$∂γφn(h)≡limε→0φn(h+εγ)−φn(h)ε=∫0∞{Γ(t)−γ(t)h(t)1t≠X(n)}dFn(t).$
(2.9)

If $h^n$ minimizes n, then for any γ such that $h^n+εγ$ is in $K+$ for sufficiently small ε, we must have $∂γφn(h^n)≥0$. If however, $h^n±εγ$ is in $K+$ for sufficiently small ε, then $∂γφn(h^n)=0$.

If we choose, respectively, γ(t) 1, (t − y)+, (y − t)+, then $h^n+εγ$ is in $K+$ and we obtain the inequalities in conditions (2.2)–(2.4). Since $(1±ε)h^n$ is also in $K+$, for sufficiently small ε, we obtain (2.5). Choosing γ = (τi − t)+, (t − ηj)+ yields the equalities in (2.2) and (2.3), respectively, since for each of these functions, $h^n±εγ$ is in $K+$.

Lastly, we prove (2.6). For any τi, define

$γ(t)={h^n(t)−h^n(τi),fort∈[0,τi),0,otherwise.}$

Since (1 ± ε)γ is also in $K+$, It follows that $∂γφn(h^n)=0$ and hence

$0={∫0τiH^n(t)dFn(t)−Fn(τi)+H^n(τi)Sn(τi)}+h^n(τi){∫0τi1h^n(t)dFn(t)−∫0τitdFn(t)−τiSn(τi)}.$

Integration by parts and Corollary 2.4 yield (2.6) for x = τi. The case where x = ηj is obtained in a similar manner, but using $γ(t)=(h^n(t)−h^n(ηj))1(ηj,∞)(t)$ and (2.5).

The next corollary allows us to extend the equalities of the characterization of the MLE to some extra touchpoints. The significance of these equations will become clear in Section 5, where we consider asymptotics of the estimator.

### Corollary 2.5

Suppose that $h^n$ is strictly positive and recall the formulation given in (2.1). Then we also have that

$∫0η1η1−th^n(t)dFn(t)=∫0η1∫0sSn(u)duds,$
(2.10)

$∫τk∞t−τkh^n(t)dFn(t)=∫τk∞∫t∞Sn(s)dsdt,$
(2.11)

$∫0τi1h^n(t)dFn(t)=∫0τiSn(u)du,$
(2.12)

$∫ηj∞1h^n(t)dFn(t)=∫ηj∞Sn(u)du,$
(2.13)

#### Proof

The first two equalities follow by noting that if $h^n$ is strictly positive, then for ε sufficiently small, $h^n±εγ$ is in $K+$ for γ(t) = (t − τk)+, (η1 − t)+. Arguing as for Corollary 2.4 proves the remaining identities.

### Proposition 2.6

There exists a unique minimizer $h^n$ of n over $K+$.

#### Proof

We will show that a minimizer exists by reducing the search to bounded positive convex functions on a compact domain. As this is a compact set, under the topology of uniform convergence, a minimizer of n exists (see [27], Theorems 10.6, 10.8 and 27.3).

We must first handle the issue of a compact domain. As we assume a priori that $h^n(X(n))=∞$, we are really looking for the minimizer of the modified negative of the log-likelihood with domain [0, X(n)). However, we have also argued that the minimizer must have the specific functional form as described in Proposition 2.1. Therefore, it is sufficient to reduce the domain to [0, X(n−1) + δ], for any δ > 0, since $h^n$ is then extended linearly beyond X(n−1) + δ in a unique manner. It will therefore be sufficient to show that we may reduce the search to functions bounded on [0, X(n−1)], with a derivative at X(n−1) which is bounded above.

Recall that the minimizer must satisfy (2.5). We therefore reduce our search to the class of functions which satisfy this condition. For any such h, write h = h+ + h, where h+ is increasing and h is decreasing. It follows that for any x,

$1≥∫0∞H(t)dFn(t)=∫0∞h(t)Sn(t)dt≥h−(x)∫0xSn(t)dt.$

A similar bound for h+ yields

$h(x)≤(∫0xSn(t)dt)−1+(∫x∞Sn(t)dt)−1≡Mn(x)$
(2.14)

for all x in (0, X(n)). Thus we know that h(x) must be bounded for x (0, X(n−1)].

To show that h is also bounded at zero, we need to show that h’(X(1)) is bounded from below. Assuming that it is negative, we may write, for 0 < xX(1),

$h(X(1))+h′(X(1))(x−X(1))=h(x)≤Mn(x).$

Fixing x* > 0 and less than X(1), we then obtain that

$h′(X(1))≥(Mn(x∗)−h(X(1)))∕(x∗−X(1)),$

from which it follows that h must be bounded on the set [0, X(n−1)].

By (2.5), we also have that

$n≥H(X(n))≥∫X(n−1)X(n−1)+δh(t)dt=∫X(n−1)X(n−1)+δ{h(X(n−1))+h′(X(n−1))(t−X(n−1))}dt$

if h is increasing on [X(n−1), X(n)). This implies that h’(X(n−1)) is bounded above, completing the proof.

We now show uniqueness. Suppose that h1 and h2 both minimize n. Then, by (2.5), n(h1) and n(h2) differ only in the term $−∫0∞loghi(t)1(t≠X(n))dFn(t)$. However, this term is strictly convex and it follows that h1(X(i)) = h2(X(i)) for all i = 1, … , n − 1.

Let h‾ = (h1 + h2)/2. By linearity, we have that n(h1) = (h2) = (h‾), which implies that h‾ is also a minimizer. However, the only way that this is possible is if h‾ also satisfies the conditions of Proposition 2.1. This implies that one of the following holds:

1. Both h1 and h2 are increasing and h1(0) = h2(0) = 0. In this case, they must have the same locations for their changes of slope, as otherwise h‾ violates Proposition 2.1.
2. Point 1 above does not hold. Then, by the same argument as above, if h1 and h2 have at least one change of slope in an interval between observations (or between zero and X(1)), then these locations of change of slope must be equal.

If the first case holds, then it is not difficult to see that h1 h2 on [0, X(n)), as h1(t) = h1(t) = 0 on [0, τ1] and h1(Xi) = h2(Xi) for all observation points.

In the second case, we use a different argument. We know that neither h1 nor h2 have touchpoints before X(1). Let t* denote the first touchpoint of h1 and (without loss of generality) assume that the first touchpoint of h2 is greater than t*. Hence, by (2.6),

$h1(X(1))=h2(X(1)),∫0t∗h1(t)dt=Fn(t∗).$

Now, h‾ = (h1 + h2)/2 and h2 are also minimizers of the MLE criterion function n. Also, h‾ has a touchpoint at t* and h‾(X(1)) = h2(X(1)).

Averaging h‾ with h2 yields the functions h‾l = 2−1(h1h2) + h2, which satisfy

$h‒l(X(1))=h2(X(1)),∫0t∗h‒l(t)dt=Fn(t∗)for alll≥1.$

Since h‾lh2 pointwise, it follows from the dominated convergence theorem that $∫0t∗h2(t)dt=Fn(t∗)$. Therefore, since h1 and h2 are both linear on [0, t*] with h1(X(1)) = h2(X(1)) and $∫0t∗h1(t)dt=∫0t∗h2(t)dt$, they must have both the same value and slope at X(1). That is, both h1(X(1)) = h2(X(2)) and $h1′(X(1))=h2′(X(2))$ hold.

Now write,

$h1(t)=a1+b1t+∑i=1m1−1νi,1(t−ti,1)+,h2(t)=a2+b2t+∑i=1m2−1νi,2(t−ti,2)+,$

where X(1) < t1,j < t2,j < < tmj−1,j < X(n), j = 1, 2, and where h1(X(i)) = h2(X(i)) for i = 1, … , n. We also assume that νi,j > 0 for i = 1, … , mj − 1, j = 1, 2. This implies, in particular, that hj(t) = aj + bjt for tt1,j , j = 1, 2, and since X(1) < t1,j, j = 1, 2, h1(X(1)) = h2(X(1)). Thus a1 + b1X(1) = a2 + b2X(1). From the argument above, it follows that $b1=h1′(X(1))=h2′(X(1))=b2$. We conclude that a1 = a2 and b1 = b2 so that h1(t) = h2(t) for 0 ≤ tt*. It also follows that t1,1 = t1,2.

Repeating this argument on the interval [t*, t**] with t** = min{t2,1, t2,2 shows that ν1,1 = ν1,2 or t2,1 = t2,2. Proceeding by induction yields νj,1 = νj,2 and tj+1,1 = tj+1,2 for j = 1, … , m1 − 1 = m2 − 1, hence uniqueness.

## 3. Consistency

### Theorem 3.1

Suppose that X1, … , Xn are i.i.d. random variables with convex hazard function h0 and corresponding distribution function F0. Let T0 T0(F0) inf{t : F0(t) = 1}. The MLE $h^n(t)$ is then consistent for all t (0, T0). Also, for all δ > 0,

$supδ≤t≤T0−δ∣h^n(t)−h(t)∣→0,almost surely,$

if T0 < ∞. if T0 = ∞, the above statement holds with T0δ replaced by any K < ∞.

### Remark 3.2

If h0 is increasing at 0, then one can show that $h^n$ is not consistent at zero. This is a frequently occurring difficulty of shape constrained estimators; see, for example, [1,12,30].

#### Proof

We first show that $h^n$ is bounded appropriately so that we can select convergent subsequences. Decompose $h^n$ into its decreasing and increasing parts: $h^n=h^n,↓+h^n,↑$. Then, arguing as in (2.14), it follows from (2.5) that

$h^n,↓(x)≤1∫0xSn(t)dt,$
(3.1)

where the right-hand side is almost surely bounded and, in fact, converges almost surely to $1∕∫0xS0(t)dtphb∞$ for all x > 0. Also,

$h^n,↑(x)≤1∫xx+δSn(t)dt,$
(3.2)

where the right-hand side is almost surely bounded for x (supp(F0))° and converges almost surely to $1∕∫xx+δS0(t)dtphb∞$.

Now, take γ = h0 in the directional derivative (2.9). It follows that

$0≤limε↓0φn(h^n+εh0)−φn(h^n)ε=∫0∞{H0(t)−h0(t)h^n(t)}dFn(t),$

noting that $h^n(X(n))=∞$, and hence

$∫0∞h0(t)h^n(t)dFn(t)≤∫0∞H0(t)dFn(t)→a.s.∫0∞H0(t)dF0(t)=1.$

Fix any 0 < a < b < ∞ such that a, b (supp(F0))°. It follows that limn X(n) > b with probability one (this can be shown using the Borel–Cantelli theorem). Also, sup $∣Fn(t)−F0(t)∣→a.s.0$ by the Glivenko–Cantelli lemma. Both of these events occur on the set Ω, with P(Ω) = 1. Fix ω Ω. We will show that $h^n→h0$ for such an ω.

Let {n’} denote any subsequence of {n}. By the bounds in (3.1) and (3.2) (which are finite for our choice of ω), using a classical diagonalization argument and the continuity of convex functions, we may extract a further subsequence {n”} such that $h^n′′→h^$ pointwise on [a, b], where the limit $h^$ must be convex. We denote this subsequence {n} to simplify notation.

From Fatou’s lemma, it follows that

$∫abh02(t)h^n(t)S0(t)dt=∫abh0(t)h^n(t)f0(t)dt≤liminfn∫abh0(t)h^n(t)dFn(t)≤limsupn∫0∞h0(t)h^n(t)dFn(t)≤limn∫0∞H0(t)dFn(t)≤1.$

Note that this implies that if $h^(t)=0$, then h0(t) = 0. By (2.5) and integration by parts, we see that $1≥∫[0,X(n))h^n(t)Sn(t)dt$. Therefore, again applying Fatou’s lemma,

$1≥∫abh^(t)S0(t)dt.$

It also follows that

$0≤∫ab(h^(t)−h0(t))2h^(t)S0(t)dt=∫abh^(t)S0(t)dt−2∫abh0(t)S0(t)dt+∫abh02(t)h^(t)S0(t)dt≤2−2∫abh0(t)S0(t)dt.$

Define $h^=h0$ for t [a, b], which allows us to let both 1/a and b → ∞ in the above display. Since $∫0∞h0(t)S0(t)dt=1$, it follows that

$∫0∞(h^(t)−h0(t))2h^(t)S0(t)dt=0$

and this implies that $h^(t)=h0(t)$ for all t [a, b].

We have thus shown that every subsequence ${h^n(x)}$ has a further subsequence which converges to the true hazard function h0(x) pointwise, for all x (supp F0)°. It follows that ${h^n}$ converges to h0 pointwise. By Theorem 10.8, page 90, [27], this implies that the claimed uniform convergence on [a, b] also holds. As this happens for any ω Ω, and P(Ω) = 1, we have proven the result.

### Corollary 3.3

Suppose that $h0′′$ is continuous and strictly positive at x0 (supp F0)°. It follows that there exist touchpoints τnx0ηn such that τn, ηnx0 in probability.

#### Proof

Let ηn, τn be touchpoints such that τnx0ηn. If τn does not exist, then set τn = 0, and ηn = ∞ otherwise. Suppose that it is not the case that τn, ηnp x0. It then follows from Theorem 3.1 that there exists an interval I = [a, b] such that x0 I for |I| > 0, lim supn τna and lim infn ηnb almost surely, and, lastly, $limh^n(t)→a.s.h0(t)$ on I. However, this implies that h0(t) is linear on I, which is a contradiction.

From consistency of the estimator, we also obtain consistency of the derivatives.

### Corollary 3.4

Suppose that x (a, b) and $supa≤t≤b∣h^n(t)−h0(t)∣→a.s.0$. Then $h^n′(x)→a.s.h0′(x)$ at all continuity points x of $h0′$ on (a, b).

This follows from the following simple result for convex functions, proved in [12].

### Lemma 3.5

Suppose that h‾n is a sequence of convex functions satisfying supaxb |h‾n(t) − h0(t)| → 0 with probability one. Then (also with probability one), for all x (a, b),

$−∞

## 4. Asymptotic lower bounds for the minimax risk

Define the class of densities $C$ by

$C={f:[0,∞)→[0,∞):∫0∞f(x)dx=1},{h(x)=f(x)∕(1−F(x))is convex,h(x)>0for allx>0}.$

We want to derive asymptotic lower bounds for the local minimax risks for estimating the convex hazard function h and its derivative at a fixed point. The L1-minimax risk for estimating a functional T of f0 based on a sample X1, … , Xn of size n from f0 which is known to be in a subset $Cn$ of $C$ is defined by

$MMR1(n,T,Cn)=infTnsupf∈CnEf∣Tn−Tf∣,$
(4.1)

where the infimum ranges over all possible measurable functions Tn = tn(X1, … , Xn) mapping $Rn$ to $R$. The shrinking classes $Cn$ used here are Hellinger balls centered at f0,

$Cn,τ={f∈C:H2(f,f0)≡12∫0∞(f(z)−f0(z))2dz≤τ∕n}.$

Consider estimation of

$T1(f)=f(x0)1−F(x0)=h(x0),T2(f)=h′(x0).$
(4.2)

Let $f0∈C$ and x0 > 0 be fixed such that h0 is twice continuously differentiable at x0. Define, for ε > 0, the functions hε as follows:

$hε(z)={h0(x0−εcε)+(z−x0+εcε)h0′(x0−εcε),z∈[x0−εcε,x0−ε],h0(x0+ε)+(z−x0−ε)h0′(x0+ε),z∈[x0−ε,x0+ε],h0(z),otherwise.}$

Here, cε is chosen so that hε is continuous at x0ε. Using continuity of hε and a second order expansion of h0, it follows that cε = 3 + o(1) as ε → 0. Now, define fε by

$fε(z)=exp(−Hε(z))hε(z),$

where $Hε(z)≡∫0zhε(u)du$. It follows easily that

$T1(fε)−T1(f0)=12h0′′(x0)ε2+o(ε2),$
(4.3)

$T2(fε)−T2(f0)=h0′′(x0)ε+o(ε).$
(4.4)

Furthermore, the following lemma holds.

### Lemma 4.1

Under the above assumptions,

$H2(fε,f0)=25h0′′(x0)2(1−F(x0))h0(x0)ε5+o(ε5)≡ν0ε5+o(ε5).$

#### Proof

The lemma follows from Lemma 2 of [18] and

$∫(fε(x)−f0(x))2f0(x)dx=165h0′′(x0)2(1−F(x0))h0(x0)ε5+o(ε5).$

This is achieved by careful calculation.

Combining (4.3) and (4.4) with the lemma, it follows that

$∣T1(f(ε∕ν0)1∕5)−T1(f0)∣≥(h0(x0)h0′′(x0)S0(x0)82)2∕5ε2∕5(1+o(1)),∣T2(f(ε∕ν0)1∕5)−T2(f0)∣≥(5h0(x0)h0′′(x0)32S0(x0))1∕5ε1∕5(1+o(1)).$

From these calculations, together with Lemma 5.1 of [12], we have the following result. Along with Theorem 1.2, it indicates that $h^n(x0)$ and $h^n′(x0)$ achieve optimal rates and also have the correct dependence on the parameters h”(x0) and h(x0) (up to absolute constants).

### Theorem 4.2 (Minimax risk lower bound)

For the functionals T1 and T2 as defined in (4.2), and with $MMR1(n,T,Cn,τ)$ as defined in (4.1),

$supτ>0limsupn→∞n2∕5MMR1(n,T1,Cn,τ)≥14(h0(x0)h0′′(x0)S0(x0)e82)2∕5andsupτ>0limsupn→∞n1∕5MMR1(n,T2,Cn,τ)≥14(14eh0(x0)h0′′(x0)32S0(x0))1∕5.$

In particular, Theorem 1.2 shows that the MLE achieves the optimal pointwise rate of convergence, n2/5, at points x0 with h”(x0) > 0. Convergence rates over the larger class of bathtub-shaped functions would be slower: the MLE of a U-shaped hazard is known to converge locally at rate n1/3; see, for example, [2].

## 5. Rates of convergence

In this section, we identify the local rates of convergence of the MLE. Fix a point x0 (supp F0)°. To obtain the results, we assume that $h0′′(⋅)$ is continuous and strictly positive in a neighborhood of x0 and that h(x0) > 0.

### 5.1. Some useful estimates

For 0 < xy, define

$Un(x,y)=∫xy{z−(1∕2)(x+y)h^n(z)}d(Fn−F0)(z).$

#### Lemma 5.1

Let x0 (supp F0)°. Then, for each ε > 0, there exist constants δ, c0, n0 and (positive) random variables Mn (independent of x, y), of order Op(1), such that for each |xx0| < δ,

$∣Un(x,y)∣≤ε(y−x)4+n−4∕5Mn,0≤y−x≤c0,$
(5.1)

for all nn0.

##### Proof

Note that $Un=(Pn−P0)(gx,y,h^n)$, where

$gx,y,h(z)≡fx,y(z)h(z)1[x,y](z)$

and, in view of the consistency established in Theorem 3.1, $h^n$ is a convex function uniformly close to h0 on neighborhoods of x0. This leads to the consideration of the class of functions

$Fx,R≡{z↦gx,y,h(z):x≤y≤x+R,hconvex,∥h−h0∥x0−δx0+δ+c0≤γ}$

with γ infx0δxx0+δ+c0 h0(x)/2 and we define $Gn≡{∥h^n−h0∥x0−δx0+δ+c0≤γ}$. The class $Fx,R$ has an envelope function Fx,R(z) = γ−1{(zx)1[x,x+R](z)+2−1R1[x,x+R](z)} and hence the following second moment bound holds:

$E{[Fx,R]2}=1γ2∫[x,x+R][(z−x)+R∕2]2f0(z)dz≤1312γ2∥f0∥x0−δx0+δR3.$

Furthermore, $logN[](ε,Fx,R,L2(P0))≤K∕ε1∕2$ for some constant K by [29], Theorem 2.7.10, page 159, and a straightforward bracketing argument. It then follows from [29], Theorems 2.14.2 and 2.14.5, pages 240 and 244, that

$E{(supf∈Fx,R∣(Pn−P0)(f)∣)2}≤1nK′E{[Fx,R(X1)]2}=O(n−1R3).$
(5.2)

Define Mn(ω) as the infimum (possibly +∞) of those values such that (5.1) holds and define A(n, j) to be the set [(j − 1)n−1/5, jn−1/5). Then, for m constant,

$P(Mn>m)≤P([Mn>m]∩Gn)+P(Gnc)≤P([∃u:∣Un(x,x+u)∣>εu4+n−4∕5m]∩Gn)+P(Gnc)≤∑j≥1P([∃u∈A(n,j):n4∕5∣Un(x,x+u)∣>ε(j−1)4+m]∩Gn)+P(Gnc).$

The jth summand is hence bounded by

$n8∕5E[supu∈A(n,j)∣Un(x,x+u)m2∣1Gn]∕[m+ε(j−1)4]2≤Cj3[m+ε(j−1)4]2$

due to (5.2). Thus it follows, using Theorem 3.1 to conclude that $P(Gnc)→0$, that

$limsupn→∞P(Mn>m)≤C∑j=1∞j3[m+ε(j−1)4]2,$

where the sum in the bound is finite and converges to zero as m → ∞. This completes the proof of the claim.

A similar approach proves the following for the function

$Vn(x,y)=∫xy{z−(x+y)∕2}(Sn(z)−S0(z))dz.$

#### Lemma 5.2

Let x0 (supp F0)°. Then, for each ε > 0, there exist constants δ, c0 > 0 and (positive) random variables Mn (independent of x, y) of order Op(1) such that for each |x − x0| < δ,

$∣Vn(x,y)∣≤εn−1∕5(y−x)4+n−1Mn,0≤y−x≤c0.$
(5.3)

### 5.2. Asymptotic behavior of touchpoints and resulting bounds

#### Lemma 5.3

Let x0 > 0 be a point at which h0 has a continuous and strictly positive second derivative, and where h(x0) > 0. Let ξn be any sequence of numbers converging to x0 and define τn and ηn to be the largest touchpoint of $h^n$ smaller than ξn and the smallest touchpoint larger than ξn, respectively. Then

$ηn−τn=Op(n−1∕5).$

##### Proof

By Theorem 3.1, we know that $h^n$ is positive near x0 for large enough n. Also, it is either strictly increasing or strictly decreasing in a neighborhood of x0, or it is locally flat. If $h^n$ is decreasing between τn and ηn, then (2.7) and (2.2) with equality at both ηn and τn hold. If, instead, $h^n$ is increasing, then (2.8) and (2.3) with equality at both ηn and τn hold. There is only the potential for a problem in the locally flat case. However, since $h^n$ is strictly positive, by Corollary 2.5, we can extend the necessary equalities to this case as well. Therefore, we need only consider two cases, $h^n$ is either non-increasing or non-decreasing on [τn, ηn].

We first assume that $h^n$ is non-increasing on [τn, ηn]. Define

$H^n,↓(z)=∫0zz−th^n(t)dFn(t)andAn,↓(z)=∫0zSn(t)dt,$
(5.4)

and let mn be the midpoint of [τn, ηn], mn = (τn + ηn)/2. We may then calculate

$H^n,↓(mn)=∫mnηnx−mnh^n(x)dFn(x)+H^n,↓(ηn)−(ηn−mn)H^n,↓′(ηn)=∫τnmnmn−xh^n(x)dFn(x)+H^n,↓(τn)+(mn−τn)H^n,↓′(τn).$

From (2.2), we know that $2Hn,↓(mn)≤2∫0mnAn,↓(t)dt$. The equality in (2.2), together with (2.7), allows us to rewrite this as 0 ≥ L1,↓ + L2,↓, where L1,↓ is equal to

$∫mnηnx−mnh^n(x)dFn(x)+∫τnmnmn−xh^n(x)dFn(x)−ηn−τn4{H^n,↓′(ηn)−H^n,↓′(τn)}=∫mnηnx−(1∕2)(ηn+mn)h^n(x)dFn(x)+∫τnmn(1∕2)(τn+mn)−xh^n(x)dFn(x)$

and

$L2,↓=∫mnηnAn,↓(x)dx−∫τnmnAn,↓(x)dx−14(ηn−τn){An,↓(ηn)−An,↓(τn)}=−{∫mnηn{x−12(ηn+mn)}Sn(x)dx+∫τnmn{12(τn+mn)−x}Sn(x)dx},$

by integration by parts.

Now replace $Fn$ by the true F0 in the definition of L1,↓ to obtain

$L1,↓0≡∫mnηnx−(1∕2)(ηn+mn)h^n(x)dF0(x)+∫τnmn(1∕2)(τn+mn)−xh^n(x)dF0(x)=∫mnηn{x−12(ηn+mn)}{1h^n(x)−1h0(x)}dF0(x)+∫τnmn{12(τn+mn)−x}{1h^n(x)−1h0(x)}dF0(x)−L2,↓0,$

where

$L2,↓0=−∫mnηn{x−12(ηn+mn)}S0(x)dx−∫τnmn{12(τn+mn)−x}S0(x)dx.$

Next, using a Taylor expansion of order 2 on the function $1∕h^n(x)−1∕h0(x)$ and about the point mn, we obtain

$L1,↓0+L2,↓0=1192{h0′′(x0)h02(x0)f0(x0)}(ηn−τn)4+o((ηn−τn)4)$
(5.5)

since both $h^n$ and $h^n′$ are consistent by Theorem 3.1, $h^n′′(x)=0$ on (τn, ηn) and because τn − ηn = op(1) by Corollary 3.3. Therefore, by Lemmas 5.1 and 5.2, together with the above calculations, we may write

$0≥L1,↓+L2,↓=L1,↓0+L2,↓0+(L1,↓−L1,↓0)+(L2,↓−L2,↓0)≥L1,↓0+L2,↓0−ε(ηn−τn)4−Op(n−4∕5)−εn−1∕5(ηn−τn)4−Op(n−1)=1192{h0′′(x0)h02(x0)f(x0)−192ε}(ηn−τn)4+o((ηn−τn)4)−Op(n−4∕5).$

We choose ε sufficiently small (so that the leading term in the last line of the above display is positive) and hence conclude that (ηnτn) = Op(n−1/5). A similar approach proves the non-decreasing case.

#### Lemma 5.4

Let ξn be a sequence converging to x0. Then, for any ε > 0, there exists an M > 1 and a c > 0 such that, with probability greater than 1 − ε, we have that there exist change points τn < ξn < ηn of $h^n$ such that

$inft∈[τn,ηn]∣h^n(t)−h0(t)∣

for all n sufficiently large.

##### Proof

Fix ε > 0. From Lemma 5.3, it follows that there exist touchpoints ηn and τn, and an M > 1 such that ξnMn−1/5τnξnn−1/5ξn + n−1/5ηnξn + Mn−1/5.

Fix c > 0 and consider the event

$inft∈[τn,ηn]∣h^n(t)−h0(t)∣≥cn−2∕5.$
(5.6)

First, assume that $h^n$ is non-increasing on [τn, ηn]. On this set, we have that

$∣∫τnηn(ηn−t)h^n(t)−h0(t)h^n(t)S0(t)dt∣≥Bcn−2∕5(ηn−τn)2≥Bcn−4∕5,$

where B is some constant depending on x0. Using the definitions in (5.4), as well as the equality in condition (2.2) with (2.7), it follows that

$0=H^n,↓(ηn)−∫0ηnAn,↓(t)dt−H^n,↓(τn)+∫0τnAn,↓(t)dt−(H^n,↓′(τn)−An,↓(τn))(ηn−τn)=∫τnηnηn−th^n(t)dFn(t)−∫τnηn(ηn−t)Sn(t)dt=∫τnηn(ηn−t)h^n(t)−h0(t)h^n(t)S0(t)dt+∫τnηnηn−th^n(t)dF˘n(t)−∫τnηn(ηn−t)S˘n(t)dt,$

where $F˘n(t)=Fn(t)−F0(t)$ and $S˘n(t)=Sn(t)−S0(t)$. By the assumption on h0 and x0, and arguments similar to those used for Lemmas 5.1 and 5.2, we can show that

$∫τnηn(ηn−t)h^n(t)−h0(t)h^n(t)S0(t)dt=Op(n−4∕5),$

which is a contradiction to (5.6) if c is chosen large enough. A similar argument completes the proof for the non-decreasing case.

The next proposition is the key to proving tightness in the next section. The results follow from the previous lemmas and make extensive use of the underlying convexity.

#### Proposition 5.5

Under the assumptions of this section, we have that, for each M > 0,

$sup∣t∣≤M∣h^n(x0+n−1∕5t)−h0(x0)−n−1∕5th0′(x0)∣=Op(n−2∕5),$
(5.7)

$sup∣t∣≤M∣h^n′(x0+n−1∕5t)−h0′(x0)∣=Op(n−1∕5).$
(5.8)

##### Proof

For M, ε > 0 fixed, define ηn,1 to be the first point of touch after x0 + Mn−1/5 and ηn,i to be the first point of touch after ηn,i−1 + n−1/5 , i = 2, 3 Define the points ηn,−i for i = 1, 2, 3 similarly, but working to the left of x0. By Lemma 5.4, there exist points ξn,i (ηn,i, ηn,i+1), i = 1, 2, −2, −3, and a constant c > 0 such that with probability at least 1 − ε, we have that $∣h^n(ξn,i)−h0(ξn,i)∣≤cn−2∕5$.

As $h^n$ is convex, it follows that for any t [x0Mn−1/5, x0 + Mn−1/5],

$h^n′(t)≤h^n′(ξn,1)≤h^n(ξn,2)−h^n(ξn,1)ξn,2−ξn,1≤h0(ξn,2)−h0(ξn,1)+2cn−2∕5ξn,2−ξn,1≤h0′(ξn,2)+2cn−1∕5$

since ξn,2ξn,1n−1/5, where $h^n′(t)$ denotes the right derivative at t . Because of the continuity of $h0′′(⋅)$ near x0, we may replace $h0′(ξn,2)$ with $h0′(x0)+c~n−1∕5$ for some new constant $c~$. The result follows. A similar argument shows the lower bound.

We now consider (5.7). By Lemma 5.3, there exists a constant K > M such that there exist two touchpoints in [x0 + Mn−1/5, x0 + Kn−1/5], n−1/5 apart with probability 1 − ε. The same is the case in the interval [x0Mn−1/5, x0Kn−1/5]. From Lemma 5.4, it follows that there exist points ξn,1 [x0 + Mn−1/5, x0 + Kn−1/5] and ξn,2 [x0Mn−1/5, x0Kn−1/5] such that $∣h^n(ξn,i)−h0(ξn,i)∣≤cn−2∕5$, for i = 1, 2, with probability at least 1 − ε and sufficiently large n. Lastly, we have already shown that there exists a c’ such that with probability at least 1 − ε,

$supt∈(x0−Kn−1∕5,x0+Kn−1∕5)∣h^n′(t)−h0′(x0)∣≤c′n−1∕5.$

Therefore, with probability at least 1 − 3ε, we have that for any t [x0Mn−1/5, x0 + Mn−1/5] and sufficiently large n,

$h^n(t)≥h^n(ξn,1)+h^n′(ξn,1)(t−ξn,1)≥h0(ξn,1)−cn−2∕5+(h0′(x0)−c′n−1∕5)(t−ξn,1)=h0(x0)+h′(x0)(t−x0)+12h′′(x0∗)(ξn,1−x0)2−cn−2∕5−c′n−1∕5(t−ξn,1)≥h0(x0)+h′(x0)(t−x0)−Bn−2∕5$

for some constant B > 0. A similar argument proves the other direction.

### 5.3. Limit distribution theory at a fixed point

From Lemma 5.3, we know what rescaling is necessary to pick up a meaningful limit. The idea of the proof is now to write carefully a local version of the characterization of the MLE, Lemma 2.2, and to show that in the limit, these become the characterization of the invelope (Definition 1.1). The invelope $I(⋅)$ is described in terms of the “driving” process Y(·). Our goal will then be to identify the two processes, one which converges to the invelope and another which converges to the driving process Y .

Note that at x0 (where h”(x0) > 0), we have three possibilities:

1. $h0′(x0)mak0$: By continuity, $h0′(x)mak0$ in a neighborhood of x0. It follows from the consistency of the MLE derivatives that $h^n′mak0$ for sufficiently large n and hence all touchpoints to be considered are of the “increasing” kind.
2. $h0′(x0)phb0$: By the same argument, all touchpoints are decreasing.
3. $h0′(x0)=0$: Since h(x0) > 0, by Corollary 2.5, there is always at least one touchpoint which satisfies both the non-increasing and non-decreasing properties. The limiting process may then be “stitched” together in an appropriate manner.

Therefore, it will be sufficient to prove the asymptotic results for both types of touchpoints: non-increasing and non-decreasing. For the sake of brevity, we outline the argument only for the non-increasing setting.

For any interval $[a,b]⊂R$, let D[a, b] denote the space of cadlag functions from [a, b] into $R$ endowed with the Skorohod topology and C[a, b] the space of continuous functions endowed with the uniform topology.

#### Driving process

Define

$Bn(t)≡n(Hn(t)−H0(t)),$
(5.9)

where $Hn$ is the empirical cumulative hazard function, defined by $dHn(u)=(1−Fn(t−))−1dFn(t)$. From [28], Chapter 7, Theorem 7.4.1, page 307, we know that for t (0, T0) with T0 T0(F0) inf{x : F(x) = 1}, $Bn(t)⇒B(C(t))$ in D[0, M] for M < T0, where B denoes a standard Brownian motion on [0, ∞) and C(t) = F0(t)/S0(t). Let xn(t) = x0 + n−1/5t and define

$Y~nloc(t)≡n4∕5∫x0xn(t){Hn(v)−Hn(x0)−∫x0v(h0(x0)+(u−x0)h0′(x0))du}dv.$
(5.10)

It is not difficult to show that

$Y~nloc(t)⇒C′(x0)∫0tW(s)ds+124h0′′(x0)t4,(Y~nloc)′(t)⇒C′(x0)W(t)+13!h0′′(x0)t3$

in D[−M, M] for each fixed 0 < M < ∞, where W is a two-sided Brownian motion process starting at 0 and C’(t) = h0(t)/S0(t). Next, define

$Y^n,↓loc(t)=n4∕5h0(x0)S0(x0)∫x0xn(t)∫x0v{h0(u)−h0(x0)−(u−x0)h0′(x0)h^n(u)}Sn(u)dudv+n4∕5h0(x0)S0(x0)∫x0x0+n−1∕5t∫x0vSn(u)h^n(u)d{Hn∗(u)−H0(u)}dv,$

where $dHn∗(u)=Sn(u−)Sn(u)dHn(u)$. The derivative $(Y^n,↓loc)′(t)$ is not difficult to calculate. By consistency of $h^n$ and since $supt∣Sn(t)−S0(t)∣→0a.s.,$ for any M > 0,

$limnsup∣t∣≤M∣Y^n,↓loc(t)−Y~nloc(t)∣=limnsup∣t∣≤M∣(Y^n,↓loc)′(t)−(Y~nloc)′(t)∣=0a.s.$
(5.11)

$Y^n,↓loc$ is our driving process.

#### Invelope process

Recall definitions (5.4). Our initial candidate for the invelope is defined as

$I^n,↓loc(t)=n4∕5h0(x0)S0(x0)∫x0xn(t)∫x0v{h^n(u)−h0(x0)−(u−x0)h0′(x0)h^n(u)}Sn(u)dudv+A^n,↓t+B^n,↓,$

where

$A^n,↓=−n3∕5h0(x0)S0(x0){H^n,↓′(x0)−An,↓(x0)}andB^n,↓=−n4∕5h0(x0)S0(x0){H^n,↓(x0)−∫0x0An,↓(v)dv}.$

Notice that because of the presence of $Sn(v)$ in its definition, $I^n,↓loc(t)$ is not three times differentiable. We therefore define

$I^n,↓∗,loc(t)=n4∕5h0(x0)S0(x0)∫x0xn(t)∫x0v{h^n(u)−h0(x0)−(u−x0)h0′(x0)h^n(u)}S0(u)dudv+A^n,↓t+B^n,↓,$

From Proposition 5.5, we have that for any M > 0,

$limnsup∣t∣≤M∣I^n,↓loc(t)−I^n,↓∗,loc(t)∣=0.$
(5.12)

The derivatives of $I^n,↓∗,loc$ will describe the limiting behavior of our estimators. First, though, we must show that this process converges to the invelope. To do this, define the vector

$Z^n(t)=(Y^n,↓loc(t),(Y^n,↓loc)′(t),I^n,↓∗,loc(t),(I^n,↓∗,loc)′(t),(I^n,↓∗,loc)′′(t),(I^n,↓∗,loc)′′′(t))$
(5.13)

and fix M > 0. We will show that $Z^n$ is tight in the product space

$E[−M,M]≡C[−M,M]×D(−M,M)×C[−M,M]3×D[−M,M].$

This will be done last. We first assume that $Z^n$ has a weak limit and identify its unique limit. The two arguments together prove that $Z^n$, and hence $I^n,↓∗,loc$, have the appropriate limiting distribution.

#### Identifying the limit

It is sufficient to show that $I^n,↓∗,loc$ satisfies (1.2)–(1.4) in the limit.

For condition (1.2), calculate

$I^n,↓loc(t)−Y^n,↓loc(t)=n4∕5h0(x0)S0(x0){∫0x0+n−1∕5tAn,↓(v)dv−H^n,↓(x0+n−1∕5t)}≥0,$
(5.14)

with equality at the (non-increasing) touchpoints of $h^n$, using (2.2). By (5.12), it follows that $I^n,↓∗,loc$ satisfies (1.2) in the limit.

Next, the derivatives of $I^n,↓∗,loc(t)$ are calculated as follows:

$(I^n,↓∗,loc)′(t)=n3∕5h0(x0)S0(x0)∫x0xn(t){h^n(u)−h0(x0)−(u−x0)h0′(x0)h^n(x)}S0(u)du+A^n,↓,(I^n,↓∗,loc)′′(t)=n2∕5h0(x0)S0(x0){h^n(xn(t))−h0(x0)−n−1∕5th0′(x0)h^n(xn(t))}S0(x0+n−1∕5t).$

Due to Theorem 3.1 and Proposition 5.5, we have that

$limnsup∣t∣≤M∣(I^n,↓∗,loc)′′(t)−n2∕5[h^n(x0+n−1∕5t)−h0(x0)−n−1∕5th0′(x0)]∣=0,$
(5.15)

where $n2∕5[h^n(x0+n−1∕5t)−h0(x0)−n−1∕5th0′(x0)]$ is convex, and hence the limit of $(I^n,↓∗,loc)′′(t)$ will be convex. Thus, (1.3) is satisfied in the limit.

Let $Bn(t)=(h0(x0)∕S0(x0))×(S0(t)∕h^n(t))$. We may then write

$(I^n,↓∗,loc)′′′(t)=n1∕5(h^n′(xn(t))−h0′(x0)]Bn(xn(t))+n1∕5[h^n(xn(t))−h0(x0)−n−1∕5th0′(x0)]×Bn′(xn(t)).$

Notice that sup|t|≤M |1 − Bn(x0 + n−1/5t)| →a.s. 0, with $limnBn′(x0+n−1∕5t)$ bounded. Therefore, from Proposition 5.5, it follows that

$limnsup∣t∣≤M∣(I^n,↓∗,loc)′′′(t)−n1∕5[h^n′(x0+n−1∕5t)−h0′(x0)]∣=0,$
(5.16)

where $n1∕5[h^n′(x0+n−1∕5t)−h0′(x0)]$ is piecewise constant, with jumps at the touchpoints of $h^n$. By consistency of $h^n$, we have

$(I^n,↓∗,loc)′′′(t)=Bn(x0+n−1∕5t)dg^n(t)+2[h^n′(x0+n−1∕5t)−h0′(x0)]Bn′(x0+n−1∕5t)dt+n1∕5(h^n[x0+n−1∕5t)−h0(x0)−n−1∕5th0′(x0)]dBn′(x0+n−1∕5t)={Bn(x0+n−1∕5t)+Op∗(n−2∕5)}dg^n(t)+Op∗(n−1∕5)dt,$

where $g^n(t)=n1∕5[h^n′(x0+n−1∕5t)−h0′(x0)]$. We say that a process Xn(t) is $Op∗(1)$ if sup|t|≤M |Xn(t)| is Op(1).

Next, fix a c > 0. Since $h^n$ is piecewise linear, it follows that dĝn puts mass only at the locations of touchpoints of $h^n$ However, at these locations, by (5.14), the process $I^n,↓loc(t)−Y^n,↓loc(t)$ is equal to zero. It follows that

$∫−cc(I^n,↓loc(t)−Y^n,↓loc(t))dg^n(t)=0.$

Hence,

$∫−cc(I^n,↓∗,loc(t)−Y^n,↓loc(t))d(I^n,↓∗,loc)′′′(t)=∫−cc(I^n,↓∗,loc(t)−Y^n,↓loc(t))d[(I^n,↓∗,loc)′′′−g^n](t)+∫−cc(I^n,↓∗,loc(t)−I^n,↓loc(t))dg^n(t)=op(1),$

using Proposition 5.5, (5.12) and the fact that ĝn is increasing.

It remains to show that (1.4) is maintained under limits. This follows from the continuous mapping theorem since for any element z = {z1, z2, z3, z4, z5, z6} E[−M, M],

$ψ(z)=∫−MM(z3−z1)dz6$

is continuous in z for z6 increasing. We have thus shown that $I^n,↓∗,loc(t)$ satisfies the invelope conditions (1.2)–(1.4) asymptotically. This shows that the only possible limit of $I^n,↓∗,loc(t)$ is the process $I$.

#### Tightness

We already know that $Y^n,↓loc(t)$ and $(Y^n,↓loc)′(t)$ are tight in C[−M, M] and D[−M, M], respectively. To address tightness of the invelope processes, note that bounded] and increasing functions are compact in D[−M, M] and that bounded continuous functions with uniformly bounded derivatives are compact in C[−M, M]. These two facts allow us to address only stochastic boundedness of $(I^n,↓∗,loc)(i)(t)$, i = 0, … , 3, to obtain tightness. Thus, Proposition 5.5, along with (5.15) and (5.16), says that $(I^n,↓∗,loc)′′(t)$ and $(I^n,↓∗,loc)′′′(t)$ are tight. It remains to argue the same for $(I^n,↓∗,loc)′(t)$ and $I^n,↓∗,loc(t)$. However, this will follow by Proposition 5.5, and (5.12), if we can show that both $A^n,↓$ and $A^n,↓t+B^n,↓$ are tight.

Let τn be the largest touchpoint smaller than x0. By (2.7), and after careful calculations, we have

$−S0(x0)h0(x0)A^n,↓=−n3∕5{∫τnx0h^n(u)−h0(x0)−h0′(x0)(u−x0)h^n(u)S0(u)du}+n3∕5{∫τnx0h0(u)−h0(x0)−h0′(x0)(u−x0)h^n(u)S0(u)du}+n3∕5∫τnx01h^n(u)d{Fn−F0}(u)+n3∕5∫τnx0Sn(u)−S0(u)du.$

By Proposition 5.5, Lemma 5.3 and Theorem 3.1, the first two terms are tight in C[−M, M]. Arguments similar to those used in the proof of Lemma 5.1, along with Lemma 5.3, may be used to handle the remaining terms. Since $H^n,↓(τn)=∫0τnAn,↓(v)dv$ by Lemma 2.2, it follows that $B^n,↓$ is tight in C[−M, M], which, in turn, implies that $Z^n$ is tight in the space E[−M, M].

#### From the invelope to Theorem 1.2

By (5.15) and (5.16), the limiting behavior of $n2∕5(h^n(x0)−h0(x0))$ and $n1∕5(h^n′(x0)−h0′(x0))$ is the same as that of the second and third derivatives of $I^n,↓∗,loc$, which converge to the invelope of $Y^n,↓loc$. Define k1, k2 by

$limnY^n,↓loc(t)=C′(x0)∫0tW(s)ds+124h0′′(x0)t4≡k1∫0tW(s)ds+k2t4.$

For any a, b > 0, $bY(at)=da3∕2b∫0tW(s)ds+a4bt4$. Therefore, choose a, b so that a4b = k2 and a3/2b = k1. It follows that

$Y^n,↓loc(t)⇒bY(at).$

Applying this rescaling to all processes shows that

$(I^n,↓∗,loc)′′(0)⇒ba2I′′(0)and(I^n,↓∗,loc)′′′(0)⇒ba3I′′′(0).$

It is now straightforward to calculate the correct constants, c1 and c2, of Theorem 1.2.

## Acknowledgements

The research of Hanna Jankowski was supported by the NSERC; the research of Jon A. Wellner was supported in part by NSF Grant DMS-0503822 and NI-AID Grant 2R01 AI291968-04.

## References

[1] Balabdaoui F. Consistent estimation of a convex density at the origin. Math. Methods Statist. 2007;16:77–95.
[2] Banerjee M. Estimating monotone, unimodal and U-shaped failure rates using asymptotic pivots. Statist. Sinica. 2008;18:467–492.
[3] Banerjee M, Wellner JA. Likelihood ratio tests for monotone functions. Ann. Statist. 2001;29:1699–1731.
[4] Banerjee M, Wellner JA. Confidence intervals for current status data. Scand. J. Statist. 2005;32:405–424.
[5] Baraud Y, Birgé L. Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Related Fields. 2009;143:239–284.
[6] Brunel E, Comte F. Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Statist. 2006;15:233–255.
[7] Cai T, Low M. Technical report. Dept. Statistics, Univ. Pennsylvania; 2007. Adaptive estimation and confidence intervals for convex functions and monotone functions.
[8] Carolan C, Dykstra R. Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist. 1999;27:557–566.
[9] Gruppo di Lavoro MPS. Catalogo parametrico dei terremoti italiani, versione 2004 (cpti04) INGV; Bologna: 2004. Available at http://emidius.mi.ingv.it/CPTI. in Italian.
[10] Grenander U. On the theory of mortality measurement. II. Skand. Aktuarietidskr. 1956;39:125–153.
[11] Groeneboom P, Jongbloed G, Wellner JA. A canonical process for estimation of convex functions: The “invelope” of integrated Brownian motion +t4. Ann. Statist. 2001;29:1620–1652.
[12] Groeneboom P, Jongbloed G, Wellner JA. Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 2001;29:1653–1698.
[13] Groeneboom P, Jongbloed G, Wellner JA. The support reduction algorithm for computing non-parametric function estimates in mixture models. Scand. J. Statist. 2008;35:385–399. [PubMed]
[14] Haupt E, Schäbe H. The TTT transformation and a new bathtub distribution model. J. Statist. Plann. Inference. 1997;60:229–240.
[15] Jankowski H, Wellner JA. Nonparametric estimation of a convex bathtub-shaped hazard function. Univ. Washington, Department of Statistics; 2007. Technical Report 521. [PubMed]
[16] Jankowski H, Wellner JA. Computation of nonparametric convex hazard estimators via profile methods. J. Nonparametr. Stat. 2009;21:505–518. [PubMed]
[17] Jankowski H, Wang X, McCauge H, Wellner J. convexHaz: R functions for convex hazard rate estimation. R package version 0.2. 2008.
[18] Jongbloed G. Minimax lower bounds and moduli of continuity. Statist. Probab. Lett. 2000;50:279–284.
[19] La Rocca L. Bayesian non-parametric estimation of smooth hazard rates for seismic hazard assessment. Scand. J. Statist. 2008;35:524–539.
[20] Lai CD, Xie M, Murthy NP. Bathtub-shaped failure rate life distributions. In: Balakrishnan N, Rao CR, editors. Advances in Reliability. Vol. 20. North-Holland Publishing; Amsterdam: 2001. pp. 69–104. Handbook of Statistics.
[21] Politis DN, Romano JP. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 1994;22:2031–2050.
[22] Politis DN, Romano JP, Wolf M. Subsampling. Springer; New York: 1999.
[23] Rajarshi S, Rajarshi MB. Bathtub distributions: A review. Comm. Statist. Theory Methods. 1988;17:2597–2621.
[24] Reboul L. Estimation of a function under shape restrictions. Applications to reliability. Ann. Statist. 2005;33:1330–1356.
[25] Reynaud-Bouret P. Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields. 2003;126:103–153.
[26] Robertson T, Wright FT, Dykstra RL. Order Restricted Statistical Inference. Wiley; Chichester: 1988.
[27] Rockafellar RT. Convex Analysis. Princeton Mathematical Series. Vol. 28. Princeton Univ. Press; Princeton, NJ: 1970.
[28] Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. Wiley; New York: 1986.
[29] van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications in Statistics. Springer; New York: 1996.
[30] Woodroofe M, Sun J. A penalized maximum likelihood estimate of f(0+) when f is nonincreasing. Statist. Sinica. 1993;3:501–515.

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.