Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Bernoulli (Andover). Author manuscript; available in PMC 2010 May 1.
Published in final edited form as:
Bernoulli (Andover). 2009 November 1; 15(4): 1010–1035.
doi:  10.3150/09-BEJ202
PMCID: PMC2850000

Nonparametric estimation of a convex bathtub-shaped hazard function


In this paper, we study the nonparametric maximum likelihood estimator (MLE) of a convex hazard function. We show that the MLE is consistent and converges at a local rate of n2/5 at points x0 where the true hazard function is positive and strictly convex. Moreover, we establish the pointwise asymptotic distribution theory of our estimator under these same assumptions. One notable feature of the nonparametric MLE studied here is that no arbitrary choice of tuning parameter (or complicated data-adaptive selection of the tuning parameter) is required.

Keywords: antimode, bathtub, consistency, convex, failure rate, force of mortality, hazard rate, invelope process, limit distribution, nonparametric estimation, U-shaped

1. Introduction

Information on the behavior of time to a random event is of much interest in many fields. The random event could be failure of a material or machine, death, an earthquake or infection by a disease, to name but a few examples. Frequently, this type of data is called lifetime data, and it is natural to assume that it takes values in [0, ∞). If the lifetime distribution F has a density f , then a key quantity of interest is the hazard (or failure) rate h(t) = f(t)/(1 − F(t)). Heuristically, h(t) dt is the probability that, given survival until time t, the event will occur in the next duration of length dt. The hazard function is also known as the force of mortality in actuarial science or the intensity function in extreme value theory.

Certain shape restrictions arise quite naturally for hazard rates. In this work, we are particularly interested in the family of hazard functions which are convex. That is, we attach the additional smoothness constraint of convexity to the more traditional assumption of a bathtub-shaped failure rate (that is, first decreasing, then increasing). Heuristically, bathtub-shaped hazards correspond to lifetime distributions with high initial hazard (or infant mortality), lower and often rather constant hazard during the middle of life and then increasing hazard of failure (or wear-out) as aging proceeds; see [20,23].

Many other estimators of hazard functions (and solutions to the closely related problem of estimating the intensity of a Poisson process) with and without shape restrictions have been considered in the literature; see [24] for a partial review up to 2002. In recent years, the focus has shifted to construction of “adaptive” estimators over large scales of smoothness classes; see, for example, [5,6,25]. Virtually all of these other estimators require careful choice of penalty terms or tuning parameters, and computation of the adaptive estimators typically involves methods of combinatorial optimization. As far as we know, reliable algorithms for computing them are not yet available. Our estimators avoid the choices of tuning parameters or penalty terms by virtue of the shape constraint of convexity and are relatively straightforward to compute since the corresponding optimization problems are convex.

Recall the definition of a convex function. Let CR+=[0,) be convex. Then h:CR is convex (on C) if it satisfies


for all x, y [set membership] C. Equivalently, a function is convex if its epigraph


is a convex set in R2 (see, for example, [27], Section 4). Thus, a convex function on C may be extended to a convex function on R+ by setting h(x) = +∞ for xR+Cc.

Suppose, then, taht we observe i.i.d. variables X1, … , Xn from a distribution F0 with density f0 and hazard rate h0. We denote the true cumulative hazard function by H0(t)=0th0(s)ds, and the true survival function by S0 = 1 − F0. Also, 0 < X(1) < X(2) < (...) < X(n) denote the order statistics corresponding to X1, … , Xn.

To define the MLE of h0, h^n, we first consider the likelihood in terms of the hazard,


where H(t)=0th(s)ds. This can be made arbitrarily large by increasing the value of h(X(n)). We therefore find h^n:[0,X(n))R+ by maximizing the modified likelihood


over K+, the space of non-negative convex functions on [0, X(n)). The full MLE is then found by setting h^n(x)= for all xX(n). This is the same approach as taken in [10]. Equivalently, one could first impose the constraint that hM, and then let M → ∞ (see, for example, [26], page 338).

To illustrate the proposed estimator, consider the distribution with density given by


This distribution was derived in [14] as a relatively simple model with bathtub-shaped hazards which also has an adequate ability to model lifetime behavior. We will call this the HS distribution, after the authors. It has convex hazards for all values of b in the parameter space (b > −1/2). Figure 1 shows the MLE for simulated data from this distribution with sample size n = 100.

Figure 1
Examples of the estimator. Left: Estimating the HS hazard with b = 0 and A = 1 for a sample size of 100 (bold = true hazard, solid = MLE). Right: Estimation of the earthquake hazard from CPTI04 data (solid = MLE).

We also applied our estimators to the earthquake data of the Appennino Abruzzese region of Italy (Region 923) recently considered by [19], where Bayesian estimation methods are studied. The data comes from the Gruppo di Lavoro CPTI (2004) catalog [9]. It consists of 46 inter-quake times for Region 923, occurring after the year 1650 and with moment magnitude greater than 5.1 (details on the justification of these criteria is available in [19], page 14). Figure 1 shows the resulting estimator.

The main results of this paper are the characterizations, consistency and asymptotic behavior of the nonparametric MLE of a convex hazard function. The estimator is continuous and piecewise linear on [0, X(n)). Although we give a characterization of the MLE, the final form of the estimator is not explicit. We therefore propose an algorithm (based on the support reduction algorithm of [13]). This algorithm is discussed in a separate report, [16], and is available as the R package convexHaz [17].

To describe the local asymptotics of the MLE, we introduce the following process.

Definition 1.1

Let W(s) denote a standard two-sided Brownian motion, with W(0) = 0, and define Y(t)=0tW(s)ds+t4. The function {I(t):tR}, the invelope of the process {Y(t):tR} is defined as fallows:

the functionIis above the functionY:I(t)Y(t)for alltR;

the functionIhas a convex second derivative;

the functionIsatisfiesR{I(t)Y(t)}dI(3)(t)=0.

It was shown in [11] that the process I exists and is almost surely uniquely defined. Moreover, with probability one, I is three times differentiable at t = 0. The asymptotic behavior of all of our estimators may be described in terms of the derivatives of the invelope I at zero. The following theorem builds on the basic results in [12] concerning nonparametric estimation of a decreasing convex density.

Theorem 1.2

Suppose that h0 is convex, x0 > 0 is a point which satisfies h0(x0) > 0 and h0(x0)mak0, and h0() is continuous in a neighborhood of x0. Then


where I(2)(0) and I(3)(0) are the second and third derivatives at 0, respectively, of the invelope of Y(t)0tW(s)ds+t4 and where


The key to this result lies in Lemma 5.3, where we establish that the “touchpoints” (defined carefully in Section 2) cluster around x0 at a local scale of n−1/5. The assumption that h0 is strictly convex and continuous near x0 is crucial in this step. If h0(x0)=0 (and is continuous in a neighborhood of x0), we conjecture that h^n(x0) converges at the rate n1/2. Similar behavior has been noted for monotone density estimators in [8]. If h0 is discontinuous at x0, unpublished work of Cai and Low [7] suggests that h^n(x0) converges to h0(x0) at rate n1/3. The behavior of convex-constrained estimators in both of these situations remains unknown and is the subject of current research.

The limiting distributions of h^n(x0) and h^n(x0) involve the constants c1 and c2, which depend on the (unknown) hazard function h0, as well as the random variables (I(2)(0),I(3)(0)), which have a universal distribution free of the parameters of the problem. Thus, Theorem 1.2 can be used, in principle, to form confidence intervals for h0(x0) and h0(x0). This would involve estimation of the constants c1 = c1(h0, x0) and c2 = c2(h0, x0), respectively, both of which depend on h0(x0), and appropriate quantiles of the distributions of I(2)(0) and I(3)(0), respectively. Although virtually nothing is known about the distribution of the invelope and its derivatives analytically, the algorithms developed in [11] can easily be used to obtain simulated values of the needed quantiles. Other possible approaches to confidence intervals in this problem involve inversion of likelihood ratio tests (see [2-4] for this approach in the context of monotone or U-shaped function estimation) or resampling methods as in [22] and as discussed in [4] in the setting of nonparametric estimation of monotone functions. It should be noted that our Theorem 1.2 verifies one of the key hypotheses needed for validity of the general subsampling theory of [21,22], and therefore makes the subsampling approach to confidence intervals viable. The details and properties of all these approaches remain to be investigated.

The outline of this paper is as follows. Section 2 is dedicated to the proof of characterizations, existence and uniqueness of the MLE. Consistency is proved in Sections 3 and 4 establishes lower bounds for the pointwise minimax risk of h^n. Rates of convergence are established in Section 5, with Section 5.3 containing proofs of our main results concerning the limiting distribution at a fixed point. The companion technical report [15] also includes a detailed treatment of a least-squares estimator, as well as sketches of similar results for censored data and intensity functions of Poisson processes.

2. Characterizations, uniqueness and existence

Proposition 2.1

The function h^n which minimizes [var phi]n over K+ is piecewise linear. It has at most one change of slope between observations, except perhaps in one such interval, where, if the estimator touches zero, it may have two changes of slope (it is zero between these two changes). Also, between zero and X(1), the minimizer may have at most one change of slope, but this happens only if it touches zero, and in this case the estimator is increasing and equal to zero before the first change of slope. Between X(n−1) and X(n), the minimizer will also have at most one change of slope, and this only in the case where it is decreasing on [X(n−1), X(n)) and equal to zero after the change.


Consider any h and choose a convex g such that h(Xi) = g(Xi) for i = 1, … , n − 1 and h(x) ≥ g(x) ≥ 0 on [0, X(n)). It follows that [var phi]n(h) − [var phi]n(g) ≥ 0 if and only if H(Xi) ≥ G(Xi) for i = 1, … , n. Hence, the smaller we make g on [0, X(n)), the smaller [var phi]n(g) will become. It is not difficult to see that the smallest such g, with values of g(Xi) fixed, must have the prescribed form.

Since h^n is piecewise linear, it may be expressed as


where ν^j,a^,μ^j0. We let τj denote the points of change of slope of h^n where h^n is decreasing and let ηj > 0 denote the points of change of slope where h^n is increasing. For simplicity, we assume that these are ordered. Also, we have τkη1. As seen in the next lemma, the τj ’s and ηj’s correspond to “points of touch” or equality of processes defined on the one hand in terms of h^n and the data, and on the other hand just in terms of the data. We therefore also refer to them as “touchpoints” repeatedly in the remainder of the paper.

It is convenient to define the MLE in terms of the minimization of the criterion function


where Fn denotes the empirical distribution function of the data,


We will also use the notation Sn=1Fn(t) for the empirical survival function.

Lemma 2.2

Let F~n(t)=(1n)i=1n11[0,t](X(i)). A function h^n minimizes [var phi]n over K+ (and hence is the MLE) if and only if:


for all x ≥ 0with equality at τi for i = 1, … , k;


for all x ≥ 0with equality at ηj for j = 1, … , m;



Moreover, the minimizer h^n satisfies


for x [set membership] {τ1, … , τk, η1, … , ηm}.

Remark 2.3

As we assume a priori that h^n(X(n))=, we may rewrite the left-hand side terms in (2.2)–(2.4) via


for any set A. We will hereafter use this latter formulation.

Corollary 2.4

Let {τi}i=1k and {ηj}j=1m denote the change points of h^n as in (2.1). It fallows that



for i = 1, … , k and j = 1, … , m.


The function


is maximized at τi for i = 1, … , k. Since it is also differentiable, (2.7) follows. A similar argument proves (2.8).

Proof of Lemma 2.2

Consider any non-negative convex function h. It follows that there exists a non-negative constant a and non-negative measures ν and μ (these measures have supports with intersection containing at most one point) such that


For any function ĥ in K+, we calculate


since − log x ≥ 1 − x. Plugging in the explicit form of h from above, we find that the right-hand side is equal to


This is non-negative if h^ is a function which satisfies conditions (2.2)–(2.5). It follows that these conditions are sufficient to describe a minimizer of [var phi]n.

We next show that these conditions are necessary. To do this, we first define the directional derivative


If h^n minimizes [var phi]n, then for any γ such that h^n+εγ is in K+ for sufficiently small ε, we must have γφn(h^n)0. If however, h^n±εγ is in K+ for sufficiently small ε, then γφn(h^n)=0.

If we choose, respectively, γ(t) [equivalent] 1, (t − y)+, (y − t)+, then h^n+εγ is in K+ and we obtain the inequalities in conditions (2.2)–(2.4). Since (1±ε)h^n is also in K+, for sufficiently small ε, we obtain (2.5). Choosing γ = (τi − t)+, (t − ηj)+ yields the equalities in (2.2) and (2.3), respectively, since for each of these functions, h^n±εγ is in K+.

Lastly, we prove (2.6). For any τi, define


Since (1 ± ε)γ is also in K+, It follows that γφn(h^n)=0 and hence


Integration by parts and Corollary 2.4 yield (2.6) for x = τi. The case where x = ηj is obtained in a similar manner, but using γ(t)=(h^n(t)h^n(ηj))1(ηj,)(t) and (2.5).

The next corollary allows us to extend the equalities of the characterization of the MLE to some extra touchpoints. The significance of these equations will become clear in Section 5, where we consider asymptotics of the estimator.

Corollary 2.5

Suppose that h^n is strictly positive and recall the formulation given in (2.1). Then we also have that






The first two equalities follow by noting that if h^n is strictly positive, then for ε sufficiently small, h^n±εγ is in K+ for γ(t) = (t − τk)+, (η1 − t)+. Arguing as for Corollary 2.4 proves the remaining identities.

Proposition 2.6

There exists a unique minimizer h^n of [var phi]n over K+.


We will show that a minimizer exists by reducing the search to bounded positive convex functions on a compact domain. As this is a compact set, under the topology of uniform convergence, a minimizer of [var phi]n exists (see [27], Theorems 10.6, 10.8 and 27.3).

We must first handle the issue of a compact domain. As we assume a priori that h^n(X(n))=, we are really looking for the minimizer of the modified negative of the log-likelihood with domain [0, X(n)). However, we have also argued that the minimizer must have the specific functional form as described in Proposition 2.1. Therefore, it is sufficient to reduce the domain to [0, X(n−1) + δ], for any δ > 0, since h^n is then extended linearly beyond X(n−1) + δ in a unique manner. It will therefore be sufficient to show that we may reduce the search to functions bounded on [0, X(n−1)], with a derivative at X(n−1) which is bounded above.

Recall that the minimizer must satisfy (2.5). We therefore reduce our search to the class of functions which satisfy this condition. For any such h, write h = h+ + h, where h+ is increasing and h is decreasing. It follows that for any x,


A similar bound for h+ yields


for all x in (0, X(n)). Thus we know that h(x) must be bounded for x [set membership] (0, X(n−1)].

To show that h is also bounded at zero, we need to show that h’(X(1)) is bounded from below. Assuming that it is negative, we may write, for 0 < xX(1),


Fixing x* > 0 and less than X(1), we then obtain that


from which it follows that h must be bounded on the set [0, X(n−1)].

By (2.5), we also have that


if h is increasing on [X(n−1), X(n)). This implies that h’(X(n−1)) is bounded above, completing the proof.

We now show uniqueness. Suppose that h1 and h2 both minimize [var phi]n. Then, by (2.5), [var phi]n(h1) and [var phi]n(h2) differ only in the term 0loghi(t)1(tX(n))dFn(t). However, this term is strictly convex and it follows that h1(X(i)) = h2(X(i)) for all i = 1, … , n − 1.

Let h‾ = (h1 + h2)/2. By linearity, we have that [var phi]n(h1) = [var phi](h2) = [var phi](h‾), which implies that h‾ is also a minimizer. However, the only way that this is possible is if h‾ also satisfies the conditions of Proposition 2.1. This implies that one of the following holds:

  1. Both h1 and h2 are increasing and h1(0) = h2(0) = 0. In this case, they must have the same locations for their changes of slope, as otherwise h‾ violates Proposition 2.1.
  2. Point 1 above does not hold. Then, by the same argument as above, if h1 and h2 have at least one change of slope in an interval between observations (or between zero and X(1)), then these locations of change of slope must be equal.

If the first case holds, then it is not difficult to see that h1 [equivalent] h2 on [0, X(n)), as h1(t) = h1(t) = 0 on [0, τ1] and h1(Xi) = h2(Xi) for all observation points.

In the second case, we use a different argument. We know that neither h1 nor h2 have touchpoints before X(1). Let t* denote the first touchpoint of h1 and (without loss of generality) assume that the first touchpoint of h2 is greater than t*. Hence, by (2.6),


Now, h‾ = (h1 + h2)/2 and h2 are also minimizers of the MLE criterion function [var phi]n. Also, h‾ has a touchpoint at t* and h‾(X(1)) = h2(X(1)).

Averaging h‾ with h2 yields the functions h‾l = 2−1(h1h2) + h2, which satisfy

hl(X(1))=h2(X(1)),0thl(t)dt=Fn(t)for alll1.

Since h‾lh2 pointwise, it follows from the dominated convergence theorem that 0th2(t)dt=Fn(t). Therefore, since h1 and h2 are both linear on [0, t*] with h1(X(1)) = h2(X(1)) and 0th1(t)dt=0th2(t)dt, they must have both the same value and slope at X(1). That is, both h1(X(1)) = h2(X(2)) and h1(X(1))=h2(X(2)) hold.

Now write,


where X(1) < t1,j < t2,j < (...) < tmj−1,j < X(n), j = 1, 2, and where h1(X(i)) = h2(X(i)) for i = 1, … , n. We also assume that νi,j > 0 for i = 1, … , mj − 1, j = 1, 2. This implies, in particular, that hj(t) = aj + bjt for tt1,j , j = 1, 2, and since X(1) < t1,j, j = 1, 2, h1(X(1)) = h2(X(1)). Thus a1 + b1X(1) = a2 + b2X(1). From the argument above, it follows that b1=h1(X(1))=h2(X(1))=b2. We conclude that a1 = a2 and b1 = b2 so that h1(t) = h2(t) for 0 ≤ tt*. It also follows that t1,1 = t1,2.

Repeating this argument on the interval [t*, t**] with t** = min{t2,1, t2,2 shows that ν1,1 = ν1,2 or t2,1 = t2,2. Proceeding by induction yields νj,1 = νj,2 and tj+1,1 = tj+1,2 for j = 1, … , m1 − 1 = m2 − 1, hence uniqueness.

3. Consistency

Theorem 3.1

Suppose that X1, … , Xn are i.i.d. random variables with convex hazard function h0 and corresponding distribution function F0. Let T0 [equivalent] T0(F0) [equivalent] inf{t : F0(t) = 1}. The MLE h^n(t) is then consistent for all t [set membership] (0, T0). Also, for all δ > 0,

supδtT0δh^n(t)h(t)0,almost surely,

if T0 < ∞. if T0 = ∞, the above statement holds with T0δ replaced by any K < ∞.

Remark 3.2

If h0 is increasing at 0, then one can show that h^n is not consistent at zero. This is a frequently occurring difficulty of shape constrained estimators; see, for example, [1,12,30].


We first show that h^n is bounded appropriately so that we can select convergent subsequences. Decompose h^n into its decreasing and increasing parts: h^n=h^n,+h^n,. Then, arguing as in (2.14), it follows from (2.5) that


where the right-hand side is almost surely bounded and, in fact, converges almost surely to 10xS0(t)dtphb for all x > 0. Also,


where the right-hand side is almost surely bounded for x [set membership] (supp(F0))° and converges almost surely to 1xx+δS0(t)dtphb.

Now, take γ = h0 in the directional derivative (2.9). It follows that


noting that h^n(X(n))=, and hence


Fix any 0 < a < b < ∞ such that a, b [set membership] (supp(F0))°. It follows that limn X(n) > b with probability one (this can be shown using the Borel–Cantelli theorem). Also, sup Fn(t)F0(t)a.s.0 by the Glivenko–Cantelli lemma. Both of these events occur on the set Ω, with P(Ω) = 1. Fix ω [set membership] Ω. We will show that h^nh0 for such an ω.

Let {n’} denote any subsequence of {n}. By the bounds in (3.1) and (3.2) (which are finite for our choice of ω), using a classical diagonalization argument and the continuity of convex functions, we may extract a further subsequence {n”} such that h^nh^ pointwise on [a, b], where the limit h^ must be convex. We denote this subsequence {n} to simplify notation.

From Fatou’s lemma, it follows that


Note that this implies that if h^(t)=0, then h0(t) = 0. By (2.5) and integration by parts, we see that 1[0,X(n))h^n(t)Sn(t)dt. Therefore, again applying Fatou’s lemma,


It also follows that


Define h^=h0 for t [negated set membership] [a, b], which allows us to let both 1/a and b → ∞ in the above display. Since 0h0(t)S0(t)dt=1, it follows that


and this implies that h^(t)=h0(t) for all t [set membership] [a, b].

We have thus shown that every subsequence {h^n(x)} has a further subsequence which converges to the true hazard function h0(x) pointwise, for all x [set membership] (supp F0)°. It follows that {h^n} converges to h0 pointwise. By Theorem 10.8, page 90, [27], this implies that the claimed uniform convergence on [a, b] also holds. As this happens for any ω [set membership] Ω, and P(Ω) = 1, we have proven the result.

Corollary 3.3

Suppose that h0 is continuous and strictly positive at x0 [set membership] (supp F0)°. It follows that there exist touchpoints τnx0ηn such that τn, ηnx0 in probability.


Let ηn, τn be touchpoints such that τnx0ηn. If τn does not exist, then set τn = 0, and ηn = ∞ otherwise. Suppose that it is not the case that τn, ηnp x0. It then follows from Theorem 3.1 that there exists an interval I = [a, b] such that x0 [set membership] I for |I| > 0, lim supn τna and lim infn ηnb almost surely, and, lastly, limh^n(t)a.s.h0(t) on I. However, this implies that h0(t) is linear on I, which is a contradiction.

From consistency of the estimator, we also obtain consistency of the derivatives.

Corollary 3.4

Suppose that x [set membership] (a, b) and supatbh^n(t)h0(t)a.s.0. Then h^n(x)a.s.h0(x) at all continuity points x of h0 on (a, b).

This follows from the following simple result for convex functions, proved in [12].

Lemma 3.5

Suppose that h‾n is a sequence of convex functions satisfying supaxb |h‾n(t) − h0(t)| → 0 with probability one. Then (also with probability one), for all x [set membership] (a, b),


4. Asymptotic lower bounds for the minimax risk

Define the class of densities C by

C={f:[0,)[0,):0f(x)dx=1},{h(x)=f(x)(1F(x))is convex,h(x)>0for allx>0}.

We want to derive asymptotic lower bounds for the local minimax risks for estimating the convex hazard function h and its derivative at a fixed point. The L1-minimax risk for estimating a functional T of f0 based on a sample X1, … , Xn of size n from f0 which is known to be in a subset Cn of C is defined by


where the infimum ranges over all possible measurable functions Tn = tn(X1, … , Xn) mapping Rn to R. The shrinking classes Cn used here are Hellinger balls centered at f0,


Consider estimation of


Let f0C and x0 > 0 be fixed such that h0 is twice continuously differentiable at x0. Define, for ε > 0, the functions hε as follows:


Here, cε is chosen so that hε is continuous at x0ε. Using continuity of hε and a second order expansion of h0, it follows that cε = 3 + o(1) as ε → 0. Now, define fε by


where Hε(z)0zhε(u)du. It follows easily that



Furthermore, the following lemma holds.

Lemma 4.1

Under the above assumptions,



The lemma follows from Lemma 2 of [18] and


This is achieved by careful calculation.

Combining (4.3) and (4.4) with the lemma, it follows that


From these calculations, together with Lemma 5.1 of [12], we have the following result. Along with Theorem 1.2, it indicates that h^n(x0) and h^n(x0) achieve optimal rates and also have the correct dependence on the parameters h”(x0) and h(x0) (up to absolute constants).

Theorem 4.2 (Minimax risk lower bound)

For the functionals T1 and T2 as defined in (4.2), and with MMR1(n,T,Cn,τ) as defined in (4.1),


In particular, Theorem 1.2 shows that the MLE achieves the optimal pointwise rate of convergence, n2/5, at points x0 with h”(x0) > 0. Convergence rates over the larger class of bathtub-shaped functions would be slower: the MLE of a U-shaped hazard is known to converge locally at rate n1/3; see, for example, [2].

5. Rates of convergence

In this section, we identify the local rates of convergence of the MLE. Fix a point x0 [set membership] (supp F0)°. To obtain the results, we assume that h0() is continuous and strictly positive in a neighborhood of x0 and that h(x0) > 0.

5.1. Some useful estimates

For 0 < xy, define


Lemma 5.1

Let x0 [set membership] (supp F0)°. Then, for each ε > 0, there exist constants δ, c0, n0 and (positive) random variables Mn (independent of x, y), of order Op(1), such that for each |xx0| < δ,


for all nn0.


Note that Un=(PnP0)(gx,y,h^n), where


and, in view of the consistency established in Theorem 3.1, h^n is a convex function uniformly close to h0 on neighborhoods of x0. This leads to the consideration of the class of functions


with γ [equivalent] infx0δxx0+δ+c0 h0(x)/2 and we define Gn{h^nh0x0δx0+δ+c0γ}. The class Fx,R has an envelope function Fx,R(z) = γ−1{(zx)1[x,x+R](z)+2−1R1[x,x+R](z)} and hence the following second moment bound holds:


Furthermore, logN[](ε,Fx,R,L2(P0))Kε12 for some constant K by [29], Theorem 2.7.10, page 159, and a straightforward bracketing argument. It then follows from [29], Theorems 2.14.2 and 2.14.5, pages 240 and 244, that


Define Mn(ω) as the infimum (possibly +∞) of those values such that (5.1) holds and define A(n, j) to be the set [(j − 1)n−1/5, jn−1/5). Then, for m constant,


The jth summand is hence bounded by


due to (5.2). Thus it follows, using Theorem 3.1 to conclude that P(Gnc)0, that


where the sum in the bound is finite and converges to zero as m → ∞. This completes the proof of the claim.

A similar approach proves the following for the function


Lemma 5.2

Let x0 [set membership] (supp F0)°. Then, for each ε > 0, there exist constants δ, c0 > 0 and (positive) random variables Mn (independent of x, y) of order Op(1) such that for each |x − x0| < δ,


5.2. Asymptotic behavior of touchpoints and resulting bounds

Lemma 5.3

Let x0 > 0 be a point at which h0 has a continuous and strictly positive second derivative, and where h(x0) > 0. Let ξn be any sequence of numbers converging to x0 and define τn and ηn to be the largest touchpoint of h^n smaller than ξn and the smallest touchpoint larger than ξn, respectively. Then



By Theorem 3.1, we know that h^n is positive near x0 for large enough n. Also, it is either strictly increasing or strictly decreasing in a neighborhood of x0, or it is locally flat. If h^n is decreasing between τn and ηn, then (2.7) and (2.2) with equality at both ηn and τn hold. If, instead, h^n is increasing, then (2.8) and (2.3) with equality at both ηn and τn hold. There is only the potential for a problem in the locally flat case. However, since h^n is strictly positive, by Corollary 2.5, we can extend the necessary equalities to this case as well. Therefore, we need only consider two cases, h^n is either non-increasing or non-decreasing on [τn, ηn].

We first assume that h^n is non-increasing on [τn, ηn]. Define


and let mn be the midpoint of [τn, ηn], mn = (τn + ηn)/2. We may then calculate


From (2.2), we know that 2Hn,(mn)20mnAn,(t)dt. The equality in (2.2), together with (2.7), allows us to rewrite this as 0 ≥ L1,↓ + L2,↓, where L1,↓ is equal to




by integration by parts.

Now replace Fn by the true F0 in the definition of L1,↓ to obtain




Next, using a Taylor expansion of order 2 on the function 1h^n(x)1h0(x) and about the point mn, we obtain


since both h^n and h^n are consistent by Theorem 3.1, h^n(x)=0 on (τn, ηn) and because τn − ηn = op(1) by Corollary 3.3. Therefore, by Lemmas 5.1 and 5.2, together with the above calculations, we may write


We choose ε sufficiently small (so that the leading term in the last line of the above display is positive) and hence conclude that (ηnτn) = Op(n−1/5). A similar approach proves the non-decreasing case.

Lemma 5.4

Let ξn be a sequence converging to x0. Then, for any ε > 0, there exists an M > 1 and a c > 0 such that, with probability greater than 1 − ε, we have that there exist change points τn < ξn < ηn of h^n such that


for all n sufficiently large.


Fix ε > 0. From Lemma 5.3, it follows that there exist touchpoints ηn and τn, and an M > 1 such that ξnMn−1/5τnξnn−1/5ξn + n−1/5ηnξn + Mn−1/5.

Fix c > 0 and consider the event


First, assume that h^n is non-increasing on [τn, ηn]. On this set, we have that


where B is some constant depending on x0. Using the definitions in (5.4), as well as the equality in condition (2.2) with (2.7), it follows that


where F˘n(t)=Fn(t)F0(t) and S˘n(t)=Sn(t)S0(t). By the assumption on h0 and x0, and arguments similar to those used for Lemmas 5.1 and 5.2, we can show that


which is a contradiction to (5.6) if c is chosen large enough. A similar argument completes the proof for the non-decreasing case.

The next proposition is the key to proving tightness in the next section. The results follow from the previous lemmas and make extensive use of the underlying convexity.

Proposition 5.5

Under the assumptions of this section, we have that, for each M > 0,




For M, ε > 0 fixed, define ηn,1 to be the first point of touch after x0 + Mn−1/5 and ηn,i to be the first point of touch after ηn,i−1 + n−1/5 , i = 2, 3 Define the points ηn,−i for i = 1, 2, 3 similarly, but working to the left of x0. By Lemma 5.4, there exist points ξn,i [set membership] (ηn,i, ηn,i+1), i = 1, 2, −2, −3, and a constant c > 0 such that with probability at least 1 − ε, we have that h^n(ξn,i)h0(ξn,i)cn25.

As h^n is convex, it follows that for any t [set membership] [x0Mn−1/5, x0 + Mn−1/5],


since ξn,2ξn,1n−1/5, where h^n(t) denotes the right derivative at t . Because of the continuity of h0() near x0, we may replace h0(ξn,2) with h0(x0)+c~n15 for some new constant c~. The result follows. A similar argument shows the lower bound.

We now consider (5.7). By Lemma 5.3, there exists a constant K > M such that there exist two touchpoints in [x0 + Mn−1/5, x0 + Kn−1/5], n−1/5 apart with probability 1 − ε. The same is the case in the interval [x0Mn−1/5, x0Kn−1/5]. From Lemma 5.4, it follows that there exist points ξn,1 [set membership] [x0 + Mn−1/5, x0 + Kn−1/5] and ξn,2 [set membership] [x0Mn−1/5, x0Kn−1/5] such that h^n(ξn,i)h0(ξn,i)cn25, for i = 1, 2, with probability at least 1 − ε and sufficiently large n. Lastly, we have already shown that there exists a c’ such that with probability at least 1 − ε,


Therefore, with probability at least 1 − 3ε, we have that for any t [set membership] [x0Mn−1/5, x0 + Mn−1/5] and sufficiently large n,


for some constant B > 0. A similar argument proves the other direction.

5.3. Limit distribution theory at a fixed point

From Lemma 5.3, we know what rescaling is necessary to pick up a meaningful limit. The idea of the proof is now to write carefully a local version of the characterization of the MLE, Lemma 2.2, and to show that in the limit, these become the characterization of the invelope (Definition 1.1). The invelope I() is described in terms of the “driving” process Y(·). Our goal will then be to identify the two processes, one which converges to the invelope and another which converges to the driving process Y .

Note that at x0 (where h”(x0) > 0), we have three possibilities:

  1. h0(x0)mak0: By continuity, h0(x)mak0 in a neighborhood of x0. It follows from the consistency of the MLE derivatives that h^nmak0 for sufficiently large n and hence all touchpoints to be considered are of the “increasing” kind.
  2. h0(x0)phb0: By the same argument, all touchpoints are decreasing.
  3. h0(x0)=0: Since h(x0) > 0, by Corollary 2.5, there is always at least one touchpoint which satisfies both the non-increasing and non-decreasing properties. The limiting process may then be “stitched” together in an appropriate manner.

Therefore, it will be sufficient to prove the asymptotic results for both types of touchpoints: non-increasing and non-decreasing. For the sake of brevity, we outline the argument only for the non-increasing setting.

For any interval [a,b]R, let D[a, b] denote the space of cadlag functions from [a, b] into R endowed with the Skorohod topology and C[a, b] the space of continuous functions endowed with the uniform topology.

Driving process



where Hn is the empirical cumulative hazard function, defined by dHn(u)=(1Fn(t))1dFn(t). From [28], Chapter 7, Theorem 7.4.1, page 307, we know that for t [set membership] (0, T0) with T0 [equivalent] T0(F0) [equivalent] inf{x : F(x) = 1}, Bn(t)B(C(t)) in D[0, M] for M < T0, where B denoes a standard Brownian motion on [0, ∞) and C(t) = F0(t)/S0(t). Let xn(t) = x0 + n−1/5t and define


It is not difficult to show that


in D[−M, M] for each fixed 0 < M < ∞, where W is a two-sided Brownian motion process starting at 0 and C’(t) = h0(t)/S0(t). Next, define


where dHn(u)=Sn(u)Sn(u)dHn(u). The derivative (Y^n,loc)(t) is not difficult to calculate. By consistency of h^n and since suptSn(t)S0(t)0a.s., for any M > 0,


Y^n,loc is our driving process.

Invelope process

Recall definitions (5.4). Our initial candidate for the invelope is defined as




Notice that because of the presence of Sn(v) in its definition, I^n,loc(t) is not three times differentiable. We therefore define


From Proposition 5.5, we have that for any M > 0,


The derivatives of I^n,,loc will describe the limiting behavior of our estimators. First, though, we must show that this process converges to the invelope. To do this, define the vector


and fix M > 0. We will show that Z^n is tight in the product space


This will be done last. We first assume that Z^n has a weak limit and identify its unique limit. The two arguments together prove that Z^n, and hence I^n,,loc, have the appropriate limiting distribution.

Identifying the limit

It is sufficient to show that I^n,,loc satisfies (1.2)–(1.4) in the limit.

For condition (1.2), calculate


with equality at the (non-increasing) touchpoints of h^n, using (2.2). By (5.12), it follows that I^n,,loc satisfies (1.2) in the limit.

Next, the derivatives of I^n,,loc(t) are calculated as follows:


Due to Theorem 3.1 and Proposition 5.5, we have that


where n25[h^n(x0+n15t)h0(x0)n15th0(x0)] is convex, and hence the limit of (I^n,,loc)(t) will be convex. Thus, (1.3) is satisfied in the limit.

Let Bn(t)=(h0(x0)S0(x0))×(S0(t)h^n(t)). We may then write


Notice that sup|t|≤M |1 − Bn(x0 + n−1/5t)| →a.s. 0, with limnBn(x0+n15t) bounded. Therefore, from Proposition 5.5, it follows that


where n15[h^n(x0+n15t)h0(x0)] is piecewise constant, with jumps at the touchpoints of h^n. By consistency of h^n, we have


where g^n(t)=n15[h^n(x0+n15t)h0(x0)]. We say that a process Xn(t) is Op(1) if sup|t|≤M |Xn(t)| is Op(1).

Next, fix a c > 0. Since h^n is piecewise linear, it follows that dĝn puts mass only at the locations of touchpoints of h^n However, at these locations, by (5.14), the process I^n,loc(t)Y^n,loc(t) is equal to zero. It follows that




using Proposition 5.5, (5.12) and the fact that ĝn is increasing.

It remains to show that (1.4) is maintained under limits. This follows from the continuous mapping theorem since for any element z = {z1, z2, z3, z4, z5, z6} [set membership] E[−M, M],


is continuous in z for z6 increasing. We have thus shown that I^n,,loc(t) satisfies the invelope conditions (1.2)–(1.4) asymptotically. This shows that the only possible limit of I^n,,loc(t) is the process I.


We already know that Y^n,loc(t) and (Y^n,loc)(t) are tight in C[−M, M] and D[−M, M], respectively. To address tightness of the invelope processes, note that bounded] and increasing functions are compact in D[−M, M] and that bounded continuous functions with uniformly bounded derivatives are compact in C[−M, M]. These two facts allow us to address only stochastic boundedness of (I^n,,loc)(i)(t), i = 0, … , 3, to obtain tightness. Thus, Proposition 5.5, along with (5.15) and (5.16), says that (I^n,,loc)(t) and (I^n,,loc)(t) are tight. It remains to argue the same for (I^n,,loc)(t) and I^n,,loc(t). However, this will follow by Proposition 5.5, and (5.12), if we can show that both A^n, and A^n,t+B^n, are tight.

Let τn be the largest touchpoint smaller than x0. By (2.7), and after careful calculations, we have


By Proposition 5.5, Lemma 5.3 and Theorem 3.1, the first two terms are tight in C[−M, M]. Arguments similar to those used in the proof of Lemma 5.1, along with Lemma 5.3, may be used to handle the remaining terms. Since H^n,(τn)=0τnAn,(v)dv by Lemma 2.2, it follows that B^n, is tight in C[−M, M], which, in turn, implies that Z^n is tight in the space E[−M, M].

From the invelope to Theorem 1.2

By (5.15) and (5.16), the limiting behavior of n25(h^n(x0)h0(x0)) and n15(h^n(x0)h0(x0)) is the same as that of the second and third derivatives of I^n,,loc, which converge to the invelope of Y^n,loc. Define k1, k2 by


For any a, b > 0, bY(at)=da32b0tW(s)ds+a4bt4. Therefore, choose a, b so that a4b = k2 and a3/2b = k1. It follows that


Applying this rescaling to all processes shows that


It is now straightforward to calculate the correct constants, c1 and c2, of Theorem 1.2.


The research of Hanna Jankowski was supported by the NSERC; the research of Jon A. Wellner was supported in part by NSF Grant DMS-0503822 and NI-AID Grant 2R01 AI291968-04.


[1] Balabdaoui F. Consistent estimation of a convex density at the origin. Math. Methods Statist. 2007;16:77–95.
[2] Banerjee M. Estimating monotone, unimodal and U-shaped failure rates using asymptotic pivots. Statist. Sinica. 2008;18:467–492.
[3] Banerjee M, Wellner JA. Likelihood ratio tests for monotone functions. Ann. Statist. 2001;29:1699–1731.
[4] Banerjee M, Wellner JA. Confidence intervals for current status data. Scand. J. Statist. 2005;32:405–424.
[5] Baraud Y, Birgé L. Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Related Fields. 2009;143:239–284.
[6] Brunel E, Comte F. Adaptive nonparametric regression estimation in presence of right censoring. Math. Methods Statist. 2006;15:233–255.
[7] Cai T, Low M. Technical report. Dept. Statistics, Univ. Pennsylvania; 2007. Adaptive estimation and confidence intervals for convex functions and monotone functions.
[8] Carolan C, Dykstra R. Asymptotic behavior of the Grenander estimator at density flat regions. Canad. J. Statist. 1999;27:557–566.
[9] Gruppo di Lavoro MPS. Catalogo parametrico dei terremoti italiani, versione 2004 (cpti04) INGV; Bologna: 2004. Available at in Italian.
[10] Grenander U. On the theory of mortality measurement. II. Skand. Aktuarietidskr. 1956;39:125–153.
[11] Groeneboom P, Jongbloed G, Wellner JA. A canonical process for estimation of convex functions: The “invelope” of integrated Brownian motion +t4. Ann. Statist. 2001;29:1620–1652.
[12] Groeneboom P, Jongbloed G, Wellner JA. Estimation of a convex function: Characterizations and asymptotic theory. Ann. Statist. 2001;29:1653–1698.
[13] Groeneboom P, Jongbloed G, Wellner JA. The support reduction algorithm for computing non-parametric function estimates in mixture models. Scand. J. Statist. 2008;35:385–399. [PMC free article] [PubMed]
[14] Haupt E, Schäbe H. The TTT transformation and a new bathtub distribution model. J. Statist. Plann. Inference. 1997;60:229–240.
[15] Jankowski H, Wellner JA. Nonparametric estimation of a convex bathtub-shaped hazard function. Univ. Washington, Department of Statistics; 2007. Technical Report 521. [PMC free article] [PubMed]
[16] Jankowski H, Wellner JA. Computation of nonparametric convex hazard estimators via profile methods. J. Nonparametr. Stat. 2009;21:505–518. [PMC free article] [PubMed]
[17] Jankowski H, Wang X, McCauge H, Wellner J. convexHaz: R functions for convex hazard rate estimation. R package version 0.2. 2008.
[18] Jongbloed G. Minimax lower bounds and moduli of continuity. Statist. Probab. Lett. 2000;50:279–284.
[19] La Rocca L. Bayesian non-parametric estimation of smooth hazard rates for seismic hazard assessment. Scand. J. Statist. 2008;35:524–539.
[20] Lai CD, Xie M, Murthy NP. Bathtub-shaped failure rate life distributions. In: Balakrishnan N, Rao CR, editors. Advances in Reliability. Vol. 20. North-Holland Publishing; Amsterdam: 2001. pp. 69–104. Handbook of Statistics.
[21] Politis DN, Romano JP. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Statist. 1994;22:2031–2050.
[22] Politis DN, Romano JP, Wolf M. Subsampling. Springer; New York: 1999.
[23] Rajarshi S, Rajarshi MB. Bathtub distributions: A review. Comm. Statist. Theory Methods. 1988;17:2597–2621.
[24] Reboul L. Estimation of a function under shape restrictions. Applications to reliability. Ann. Statist. 2005;33:1330–1356.
[25] Reynaud-Bouret P. Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Related Fields. 2003;126:103–153.
[26] Robertson T, Wright FT, Dykstra RL. Order Restricted Statistical Inference. Wiley; Chichester: 1988.
[27] Rockafellar RT. Convex Analysis. Princeton Mathematical Series. Vol. 28. Princeton Univ. Press; Princeton, NJ: 1970.
[28] Shorack GR, Wellner JA. Empirical Processes with Applications to Statistics. Wiley; New York: 1986.
[29] van der Vaart AW, Wellner JA. Weak Convergence and Empirical Processes with Applications in Statistics. Springer; New York: 1996.
[30] Woodroofe M, Sun J. A penalized maximum likelihood estimate of f(0+) when f is nonincreasing. Statist. Sinica. 1993;3:501–515.