Sci China Ser A Math. 2009 July; 52(7): 1525–1538.
doi: 10.1007/s11425-009-0102-y

On the rate of convergence of the maximum likelihood estimator of a k-monotone density

Abstract

Bounds for the bracketing entropy of the classes of bounded k-monotone functions on [0, A] are obtained under both the Hellinger distance and the $L_p(Q)$ distance, where 1 ≤ p < ∞ and Q is a probability measure on [0, A]. The result is then applied to obtain the rate of convergence of the maximum likelihood estimator of a k-monotone density.

Keywords: bracketing metric entropy, maximum likelihood estimator, Hellinger distance

1 Introduction

A function f on (0, ∞) is called k-monotone if, for k ≥ 2, $(-1)^jf^{(j)}(x)$ is non-negative, non-increasing, and convex for 0 ≤ j ≤ k − 2, and, for k = 1, f is non-negative and non-increasing. These functions fill the gap between monotone functions and completely monotone functions. They arise naturally in nonparametric estimation, for example via renewal theory and mixtures of uniform distributions, where one studies the maximum likelihood estimator (MLE) of such a density. Indeed, k-monotone functions have been studied since at least the 1950s; for example, Williamson [1] gave a characterization of multiply monotone functions on (0, ∞) in 1956. In recent years, there has been some interest in statistics regarding this class of functions. We refer to [2] and the references therein for recent results and their statistical applications.
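For instance (a standard example, and the shape underlying the mixture characterization used in Section 4), the density $p(x) = k(1-x)_+^{k-1}$ on [0, 1] satisfies

$(-1)^jp^{(j)}(x) = \frac{k!}{(k-1-j)!}\,(1-x)_+^{k-1-j} \ge 0,$

which is non-increasing and convex in x for every 0 ≤ j ≤ k − 2, so p is a k-monotone density; more generally, any scale mixture $p(x) = \int_0^\infty k(y-x)_+^{k-1}\,y^{-k}\,dG(y)$ with G a distribution on (0, ∞) is a k-monotone density.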

Note that a k-monotone function may not be bounded near t = 0. In order to study the metric entropy, we restrict ourselves to the subclass that consists of only the functions that are continuous at t = 0. We refer to this subclass as the class of k-monotone functions on [0, ∞). We denote by $\mathcal{M}_k(I)$ the class of k-monotone functions on I, and by $\mathcal{F}_k(I)$ the class of probability densities on I that are k-monotone.

For statistical applications, we wish to estimate the bracketing entropy of $\mathcal{F}_k(\mathbb{R}^+)$ and $\mathcal{M}_k(\mathbb{R}^+)$ under all $L_p(Q)$ distances, where 1 ≤ p < ∞ and Q is any probability measure on $\mathbb{R}^+$, and under the Hellinger distance h, which is defined by

$h(f,g) = \left(\int \left[\sqrt{f(x)} - \sqrt{g(x)}\right]^2 dx\right)^{1/2}.$
(1)

Recall that the bracketing metric entropy of a function class $\mathcal{F}$ under a distance ρ is defined as $\log N_{[\,]}(\varepsilon, \mathcal{F}, \rho)$, where $N_{[\,]}(\varepsilon, \mathcal{F}, \rho)$ is defined by

$N_{[\,]}(\varepsilon, \mathcal{F}, \rho) := \min\Big\{n : \exists\, \underline{f}_1, \overline{f}_1, \ldots, \underline{f}_n, \overline{f}_n \ \text{s.t.}\ \rho(\underline{f}_k, \overline{f}_k) \le \varepsilon,\ \mathcal{F} \subset \bigcup_{k=1}^n [\underline{f}_k, \overline{f}_k]\Big\},$

where

$[\underline{f}_k, \overline{f}_k] = \{g \in \mathcal{F} : \underline{f}_k \le g \le \overline{f}_k\}.$

It is easy to see that both $\mathcal{F}_k(\mathbb{R}^+)$ and $\mathcal{M}_k(\mathbb{R}^+)$ are bounded under the Hellinger distance. However, they are not compact. Indeed, for any $\delta < \sqrt{2}$, we can find infinitely many functions in $\mathcal{F}_k(\mathbb{R}^+)$ with mutual Hellinger distance at least δ. In fact, for any α > 0, the functions $p_n(t) = 2^{n\alpha}e^{-2^{n\alpha}t}$ are clearly in $\mathcal{F}_k(\mathbb{R}^+)$. For m > n,

$\int_0^\infty\left[\sqrt{p_n(t)} - \sqrt{p_m(t)}\right]^2 dt = 2 - 2\int_0^\infty 2^{\frac{(n+m)\alpha}{2}}\, e^{-\frac{(2^{n\alpha}+2^{m\alpha})t}{2}}\,dt = 2 - \frac{4\cdot 2^{\frac{(n+m)\alpha}{2}}}{2^{n\alpha}+2^{m\alpha}} = 2 - \frac{4\cdot 2^{\frac{(m-n)\alpha}{2}}}{1+2^{(m-n)\alpha}} \ge 2 - \frac{4\cdot 2^{\frac{\alpha}{2}}}{1+2^{\alpha}} = \delta^2$

for

$\alpha = 2\log_2\left(\frac{2+\delta\sqrt{4-\delta^2}}{2-\delta^2}\right).$
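As a quick numerical sanity check of this computation (a sketch only: the value δ = 1 and the index pairs below are arbitrary choices; the code simply compares numerical quadrature against the closed form derived above):

    import numpy as np
    from scipy.integrate import quad

    delta = 1.0
    alpha = 2 * np.log2((2 + delta * np.sqrt(4 - delta**2)) / (2 - delta**2))

    def hellinger_sq(n, m):
        # squared Hellinger distance between p_n(t) = 2^{n*alpha} exp(-2^{n*alpha} t) and p_m
        rn, rm = 2.0**(n * alpha), 2.0**(m * alpha)
        f = lambda t: (np.sqrt(rn * np.exp(-rn * t)) - np.sqrt(rm * np.exp(-rm * t)))**2
        return quad(f, 0, np.inf)[0]

    def closed_form(n, m):
        # 2 - 4 * 2^{(m-n)alpha/2} / (1 + 2^{(m-n)alpha})
        r = 2.0**((m - n) * alpha)
        return 2 - 4 * np.sqrt(r) / (1 + r)

    for n, m in [(0, 1), (0, 2), (1, 2)]:
        print(n, m, hellinger_sq(n, m), closed_form(n, m))  # every value is >= delta**2 = 1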

Note that the sequence $\{p_n(t)\}_{n\ge1}$ is unbounded near the origin. This suggests that for the Hellinger distance we need to restrict ourselves to k-monotone functions whose values are bounded near the origin. On the other hand, the sequence $\{p_n(t)\}_{n<0}$ is uniformly bounded by 1. Thus, this example also indicates that the non-compactness of $\mathcal{F}_k(\mathbb{R}^+)$ under the Hellinger distance is partly due to the fact that the interval is unbounded. Hence, we should also restrict ourselves to bounded intervals. Therefore, in what follows, we consider the subclasses $\mathcal{F}_k^B([0,A])$ and $\mathcal{M}_k^B([0,A])$ instead, where $\mathcal{F}_k^B(I)$ and $\mathcal{M}_k^B(I)$ denote the classes of functions that are bounded by B and belong to $\mathcal{F}_k(I)$ and $\mathcal{M}_k(I)$ respectively.

By changing variables, it is easy to see that

$N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,A]), h) = N_{[\,]}(\varepsilon, \mathcal{F}_k^{AB}([0,1]), h), \qquad N_{[\,]}(\varepsilon, \mathcal{M}_k^B([0,A]), h) = N_{[\,]}(\varepsilon, \mathcal{M}_k^{AB}([0,1]), h).$

Hence, we only need to consider the case A = 1.
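The change of variables is elementary but worth recording. If $f_1, f_2 \in \mathcal{M}_k^B([0,A])$ and $g_i(u) \equiv Af_i(Au)$, then $g_i \in \mathcal{M}_k^{AB}([0,1])$ and

$\int_0^1\Big(\sqrt{g_1(u)}-\sqrt{g_2(u)}\Big)^2du = \int_0^1 A\Big(\sqrt{f_1(Au)}-\sqrt{f_2(Au)}\Big)^2du = \int_0^A\Big(\sqrt{f_1(x)}-\sqrt{f_2(x)}\Big)^2dx,$

so $f\mapsto Af(A\,\cdot)$ is a bijection that preserves Hellinger distances and maps brackets to brackets; the same map sends $\mathcal{F}_k^B([0,A])$ onto $\mathcal{F}_k^{AB}([0,1])$.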

Let us remark that when k = 1, the problem was studied by van de Geer [3], based on earlier work of Birman and Solomjak [4]. For example, it was proved (see also [5], Theorem 2.7.5) that

$\log N_{[\,]}(\varepsilon, \mathcal{M}_1^B([0,1]), h) \le C\,B^{1/2}\,\varepsilon^{-1}$
(2)

for some absolute constant C > 0, and

$\log N_{[\,]}(\varepsilon, \mathcal{M}_1^B(\mathbb{R}^+), \|\cdot\|_{p,Q}) \le C_p\,B\,\varepsilon^{-1}$
(3)

for some positive constant $C_p$ depending only on p, where 1 ≤ p < ∞ and Q is any probability measure on $\mathbb{R}^+$. For simpler proofs, see also [6, 7]. In particular, the iteration method used in [6] is useful in our argument in this paper.
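The order ε^{−1} in (2) and (3) can be made concrete by a crude counting illustration (this is only a heuristic, not the construction used in [3, 4, 6, 7]): discretizing the domain into m cells and the range into l levels, every non-decreasing step function corresponds to a monotone lattice path, of which there are $\binom{m+l}{l}$; these staircases generate brackets of $L_1$-width about 1/m + 1/l, so taking m = l = ⌈1/ε⌉ gives a log-count of order ε^{−1}:

    import math

    def log_staircase_count(eps):
        # monotone staircases on an m-by-l grid number C(m + l, l);
        # with m = l = ceil(1/eps) the log-count is of order 1/eps
        m = l = math.ceil(1.0 / eps)
        return math.lgamma(m + l + 1) - math.lgamma(m + 1) - math.lgamma(l + 1)

    for eps in [0.1, 0.05, 0.02, 0.01, 0.005]:
        print(eps, eps * log_staircase_count(eps))  # roughly constant, about 2*log(2)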

For k > 1, Gao [8] also established the following metric entropy bound for $\mathcal{M}_k^B([0,1])$:

$C_1B^{1/k}\varepsilon^{-1/k} \le \log N(\varepsilon, \mathcal{M}_k^B([0,1]), \|\cdot\|_2) \le C_2B^{1/k}\varepsilon^{-1/k}.$
(4)

The method revealed a nice connection between the metric entropy of these function classes and the small ball probability of k-times integrated Brownian motions. However, because for k > 1 the square root of a k-monotone function may not be k-monotone, the metric entropy estimate under L2 distance does not yield an estimate under the Hellinger distance. Furthermore, that method cannot produce any result on bracketing metric entropy. Thus, it cannot be readily used to determine the convergence rate of the MLE of a k-monotone density.

In this paper, we directly estimate the bracketing metric entropy of these function classes under the Hellinger distance and under all Lp(Q) distances, where 1 ≤ p < ∞, and apply these estimates to statistical settings.

Our main tool is the following lemma, which provides a useful method to estimate bracketing entropy. An extension to more general integral operators will appear elsewhere.

Lemma 1. Let $\mathcal{F}$ be a class of functions on [0, 1], and let $\mathcal{G}$ be the class of functions on [0, 1] defined by $\mathcal{G} = \{\int_0^{\alpha(x)} f(t)\,dt : f \in \mathcal{F}\}$, where 0 ≤ α(x) ≤ 1 is any increasing function on [0, 1]. If $\log N(\varepsilon, \mathcal{F}, \|\cdot\|_1) \le \phi(\varepsilon)$, where ‖·‖₁ stands for the $L_1$ distance under the Lebesgue measure on [0, 1], then

  1. There exists a constant C depending only on p, such that for any probability measure Q on [0, 1],
    $\log N_{[\,]}\big(\tfrac{\varepsilon}{\phi(\varepsilon)}, \mathcal{G}, \|\cdot\|_{p,Q}\big) \le C\phi(\varepsilon),$
    where ‖·‖_{p,Q} is defined by
    $\|f-g\|_{p,Q} = \big(\int_0^1 |f(x)-g(x)|^p\,dQ(x)\big)^{1/p}.$
  2. If we further assume that g(x) ≥ δ for all $g \in \mathcal{G}$, then there exists a constant C, such that
    $\log N_{[\,]}\big(\tfrac{\varepsilon}{2\sqrt{\delta}\,\phi(\varepsilon)}, \mathcal{G}, h\big) \le C\phi(\varepsilon),$
    where h(f, g) is defined by (1).

Proof. Let $\{f_i\}$, $1 \le i \le e^{\phi(\varepsilon)}$, be an ε-net for $\mathcal{F}$ in the $L_1$ distance under the Lebesgue measure on [0, 1]. For each i and each $f \in \mathcal{F}$, we can write

$\int_0^{\alpha(x)} f(t)\,dt = \int_0^{\alpha(x)} (f(t)-f_i(t))^+\,dt - \int_0^{\alpha(x)} (f_i(t)-f(t))^+\,dt + \int_0^{\alpha(x)} f_i(t)\,dt.$

Thus, if we define

$\mathcal{G}_i^+ = \Big\{g : g(x) = \int_0^{\alpha(x)}(f(t)-f_i(t))^+\,dt,\ f\in\mathcal{F},\ \|f-f_i\|_1\le\varepsilon\Big\}, \qquad \mathcal{G}_i^- = \Big\{g : g(x) = \int_0^{\alpha(x)}(f_i(t)-f(t))^+\,dt,\ f\in\mathcal{F},\ \|f-f_i\|_1\le\varepsilon\Big\},$

we have

$\mathcal{G} \subset \bigcup_i\Big(\mathcal{G}_i^+ - \mathcal{G}_i^- + \int_0^{\alpha(x)} f_i(t)\,dt\Big).$

Note that $\mathcal{G}_i^+$ and $\mathcal{G}_i^-$ both consist of non-negative increasing functions bounded by ε. Thus, by (3) (bracketing each of $\mathcal{G}_i^+$ and $\mathcal{G}_i^-$ separately and combining the resulting brackets), we can find $e^{C\phi(\varepsilon)}$ many ε/φ(ε)-brackets (with respect to ‖·‖_{p,Q}) that cover $\mathcal{G}_i^+ - \mathcal{G}_i^-$. Say these brackets are $[\underline{h}_{i,j}, \overline{h}_{i,j}]$, $1\le j\le e^{C\phi(\varepsilon)}$. Then clearly the brackets

$\Big[\int_0^{\alpha(x)} f_i(t)\,dt + \underline{h}_{i,j},\ \int_0^{\alpha(x)} f_i(t)\,dt + \overline{h}_{i,j}\Big], \qquad 1\le i\le e^{\phi(\varepsilon)},\ 1\le j\le e^{C\phi(\varepsilon)},$

cover $\mathcal{G}$. Statement (i) follows by noticing that these are ε/φ(ε)-brackets under the ‖·‖_{p,Q} distance.

To prove Statement (ii), we notice that with the additional assumption g ≥ δ, we have for any $g_1, g_2 \in \mathcal{G}$,

$h(g_1,g_2) = \|\sqrt{g_1}-\sqrt{g_2}\|_2 \le \frac{1}{2\sqrt{\delta}}\|g_1-g_2\|_2.$

Hence,

$N_{[\,]}\Big(\frac{\varepsilon}{2\sqrt{\delta}\,\phi(\varepsilon)}, \mathcal{G}, h\Big) \le N_{[\,]}\Big(\frac{\varepsilon}{\phi(\varepsilon)}, \mathcal{G}, \|\cdot\|_2\Big).$

Thus, Statement (ii) follows from Statement (i).

2 Under Hellinger distance

Note that by scaling arguments, we can easily show that

$N_{[\,]}(\varepsilon, \mathcal{M}_k^B([0,A]), h) = N_{[\,]}(\varepsilon, \mathcal{M}_k^{AB}([0,1]), h) = N_{[\,]}\Big(\frac{\varepsilon}{\sqrt{AB}}, \mathcal{M}_k^1([0,1]), h\Big), \qquad N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,A]), h) = N_{[\,]}(\varepsilon, \mathcal{F}_k^{AB}([0,1]), h).$

Hence, we only need to consider the classes $\mathcal{M}_k^1([0,1])$ and $\mathcal{F}_k^B([0,1])$.

Before we proceed with the detailed calculation, we make some observations that simplify the later arguments. Firstly, because k-monotone $C^\infty$ functions are dense in $\mathcal{M}_k^1([0,1])$ (cf. [8]), we can and will assume that all the densities in $\mathcal{M}_k^1([0,1])$ are continuously k-times differentiable. Secondly, if for I = [a, b] ⊂ [0, 1] we define

$\mathcal{H}_k^B(I) = \{f : f(u) = g(b-u),\ g\in\mathcal{M}_k^B(I)\},$

then for every $f\in\mathcal{H}_k^1([0,1])$,

$\frac{d^jf(u)}{du^j} = \frac{d^j}{du^j}\,g(1-u) = (-1)^jg^{(j)}(1-u) \ge 0$

for all 0 ≤ j ≤ k, and for all u ∈ [0, 1]. For $f\in\mathcal{H}_k^1([0,1])$, we can write

$f(u) = f(0) + f'(0)u + \cdots + \frac{f^{(k-2)}(0)}{(k-2)!}u^{k-2} + \int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1} f^{(k-1)}(s)\,ds\,dt_1\cdots dt_{k-2}.$
(5)

All the terms on the right-hand side are non-negative. The sum of the first k − 1 terms is a polynomial in u of degree k − 2 with non-negative coefficients.

2.1 Bounded k-monotone functions

For $f\in\mathcal{H}_k^1([0,1])$, because f(1) ≤ 1 and every term in (5) is non-negative at u = 1, we have

$\sum_{j=0}^{k-2}\frac{f^{(j)}(0)}{j!} \le f(1) \le 1.$

Denote

$\mathcal{P}_k = \Big\{a_0+a_1u+\cdots+a_{k-2}u^{k-2} : a_0,a_1,\ldots,a_{k-2}\ge0;\ \sum_{i=0}^{k-2}a_i\le1\Big\},$

and denote by $\tilde{\mathcal{H}}_k^B(I)$ the class of functions g on I ⊂ [0, 1] that satisfy 0 ≤ g ≤ B and are of the form

$g(u) = \int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1} f(s)\,ds\,dt_1\cdots dt_{k-2},$

where f is a non-negative increasing function on I. By (5), we have $\mathcal{H}_k^1([0,1])\subset\mathcal{P}_k+\tilde{\mathcal{H}}_k^1([0,1])$, and thus

$N_{[\,]}(\varepsilon, \mathcal{H}_k^1([0,1]), h) \le N_{[\,]}\big(\tfrac{\varepsilon}{2}, \mathcal{P}_k, h\big)\, N_{[\,]}\big(\tfrac{\varepsilon}{2}, \tilde{\mathcal{H}}_k^1([0,1]), h\big).$
(6)

Note that the set

$\Big\{a_0+a_1u+\cdots+a_{k-2}u^{k-2} : a_i\in\big\{\tfrac1N,\tfrac2N,\ldots,\tfrac NN\big\},\ 0\le i\le k-2\Big\}$

forms a $\sqrt{k/N}$-net for $\mathcal{P}_k$ under the Hellinger distance. Indeed, for any $p = a_0+a_1u+\cdots+a_{k-2}u^{k-2}\in\mathcal{P}_k$, choose

$\tilde{p} = \frac{\lceil a_0N\rceil}{N} + \frac{\lceil a_1N\rceil}{N}u + \cdots + \frac{\lceil a_{k-2}N\rceil}{N}u^{k-2}.$

Then

$\|p-\tilde p\|_2 \le \|p-\tilde p\|_\infty \le \frac{k}{N},$

which implies that $h(p,\tilde p)\le\sqrt{k/N}$. Note that there are no more than $N^{k-1}$ elements in this set. By choosing $N = \lceil 4k\varepsilon^{-2}\rceil$, we obtain

$\log N_{[\,]}\big(\tfrac{\varepsilon}{2}, \mathcal{P}_k, h\big) \le k\log\Big(1+\frac{4k}{\varepsilon^2}\Big).$
(7)

Of course, because $\|p-\tilde p\|_2 \le \frac{k}{N}$, we also have

$\log N_{[\,]}\big(\tfrac{\varepsilon}{2}, \mathcal{P}_k, \|\cdot\|_2\big) \le k\log\Big(1+\frac{2k}{\varepsilon}\Big).$
(8)
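A small simulation can confirm the bracketing property of the rounded polynomials (a sketch; the way the coefficients are sampled is an arbitrary choice): for every $p\in\mathcal{P}_k$, the rounded-up polynomial $\tilde p$ satisfies $0 \le \tilde p(u) - p(u) \le k/N$ on [0, 1], so $[\tilde p - k/N,\ \tilde p]$ is an honest bracket containing p.

    import numpy as np

    rng = np.random.default_rng(0)
    k, N = 5, 100
    u = np.linspace(0.0, 1.0, 201)
    for _ in range(1000):
        # random element of P_k: non-negative coefficients with sum at most 1
        a = rng.dirichlet(np.ones(k - 1)) * rng.uniform()
        a_up = np.ceil(a * N) / N          # coefficients rounded up to the 1/N grid
        p = sum(a[i] * u**i for i in range(k - 1))
        p_t = sum(a_up[i] * u**i for i in range(k - 1))
        d = p_t - p
        assert d.min() >= -1e-12 and d.max() <= k / N + 1e-12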

Our next goal is to estimate $N_{[\,]}(\tfrac{\varepsilon}{2}, \tilde{\mathcal{H}}_k^1([0,1]), h)$. To this end, we first consider

$N_{[\,]}\big(\tfrac{\varepsilon}{2}, \tilde{\mathcal{H}}_k^1([0,1]), \|\cdot\|_2\big).$

For $g\in\tilde{\mathcal{H}}_k^1([0,1])$, we have

$g(1) = \int_0^1\int_0^{t_{k-2}}\cdots\int_0^{t_1} f(s)\,ds\,dt_1\cdots dt_{k-2} \ge \int_{1/2}^1\int_{1/2}^{t_{k-2}}\cdots\int_{1/2}^{t_1} f(s)\,ds\,dt_1\cdots dt_{k-2} \ge f\big(\tfrac12\big)\,\frac{1}{2^{k-1}(k-1)!}.$

Because g(1) is bounded by 1, and f is increasing, for all 0 ≤ u ≤ 1/2 we have

$f(u) \le f\big(\tfrac12\big) \le 2^{k-1}(k-1)!.$

Define $f_1 = \min\{f, 2^{k-1}(k-1)!\}$ and $f_2 = f - f_1$. Then, by the above argument, $f_2$ is non-negative and increasing, and is supported on [1/2, 1]. For $g\in\tilde{\mathcal{H}}_k^1([0,1])$ and 0 ≤ u ≤ 1, we can write

$g(u) = \int_0^u\!\cdots\!\int_0^{t_1}[f_1(s)+f_2(s)]\,ds\,dt_1\cdots dt_{k-2} = \int_0^u\!\cdots\!\int_0^{t_1}f_1(s)\,ds\,dt_1\cdots dt_{k-2} + \mathbb{1}_{[\frac12,1]}(u)\int_0^{2u-1}\!\cdots\!\int_0^{t_1}\frac{1}{2^{k-1}}\,f_2\Big(\frac{1+s}{2}\Big)\,ds\,dt_1\cdots dt_{k-2}.$

We construct two function classes:

$\mathcal{U}_k = \Big\{\int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1} f(s)\,ds\,dt_1\cdots dt_{k-2} : 0\le f\le 2^{k-1}(k-1)!,\ f\ \text{increasing}\Big\}, \qquad \mathcal{V}_k = \Big\{l(u)\,\mathbb{1}_{[\frac12,1]}(u) : l(u) = h(2u-1),\ h\in\tilde{\mathcal{H}}_k^1([0,1])\Big\}.$

Then the decomposition above gives $\tilde{\mathcal{H}}_k^1([0,1])\subset\mathcal{U}_k+\mathcal{V}_k$, and thus we have for any 0 < θ < 1,

$N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), \|\cdot\|_2) \le N_{[\,]}\big((1-\theta)\varepsilon, \mathcal{U}_k, \|\cdot\|_2\big)\,N_{[\,]}\big(\theta\varepsilon, \mathcal{V}_k, \|\cdot\|_2\big).$
(9)

We first claim that

$N_{[\,]}(\varepsilon, \mathcal{U}_k, \|\cdot\|_2) \le \exp\big(C\varepsilon^{-1/k}\big)$
(10)

for some constant C depending only on k. Indeed, the claim is clearly true for k = 1, because in this case $\mathcal{U}_1$ consists of monotone functions that are bounded by 1. Thus, when k = 1, the inequality (10) is a special case of (3).

Suppose that (10) is true for k = r, that is, $N_{[\,]}(\varepsilon, \mathcal{U}_r, \|\cdot\|_2)\le\exp(C\varepsilon^{-1/r})$. Because

$N(\varepsilon, \mathcal{U}_r, \|\cdot\|_1) \le N_{[\,]}(\varepsilon, \mathcal{U}_r, \|\cdot\|_2),$

by applying Lemma 1 with the $L_2$ norm under Lebesgue measure, we have

$N_{[\,]}\big(\varepsilon^{1+\frac1r}, \mathcal{U}_{r+1}, \|\cdot\|_2\big) \le e^{C\varepsilon^{-1/r}},$
(11)

which implies that $N_{[\,]}(\varepsilon, \mathcal{U}_{r+1}, \|\cdot\|_2)\le\exp\big(C\varepsilon^{-\frac{1}{r+1}}\big)$ with a different constant C: writing $\delta=\varepsilon^{1+\frac1r}=\varepsilon^{\frac{r+1}{r}}$ for the bracket size in (11) turns the exponent $\varepsilon^{-\frac1r}$ into $\delta^{-\frac{1}{r+1}}$. Hence (10) holds for all k ≥ 1. Therefore,

$N_{[\,]}\big((1-\theta)\varepsilon, \mathcal{U}_k, \|\cdot\|_2\big) \le \exp\big(C[(1-\theta)\varepsilon]^{-1/k}\big).$
(12)

Next, we prove that

$N_{[\,]}(\theta\varepsilon, \mathcal{V}_k, \|\cdot\|_2) \le N_{[\,]}\big(\sqrt2\,\theta\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), \|\cdot\|_2\big).$
(13)

Indeed, if $[\underline{h}_i, \overline{h}_i]$, 1 ≤ i ≤ N, are $\sqrt2\,\theta\varepsilon$-brackets under the ‖·‖₂ distance that cover $\tilde{\mathcal{H}}_k^1([0,1])$, then the brackets

$[\underline{l}_i(x), \overline{l}_i(x)] \equiv \big[\mathbb{1}_{[\frac12,1]}(x)\,\underline{h}_i(2x-1),\ \mathbb{1}_{[\frac12,1]}(x)\,\overline{h}_i(2x-1)\big],$

1 ≤ i ≤ N, clearly cover $\mathcal{V}_k$. To see that they are θε-brackets, we notice that

$\|\overline{l}_i-\underline{l}_i\|_2 = \Big\{\int_{1/2}^1\big|\overline{h}_i(2x-1)-\underline{h}_i(2x-1)\big|^2dx\Big\}^{1/2} = \frac{1}{\sqrt2}\Big\{\int_0^1\big|\overline{h}_i(u)-\underline{h}_i(u)\big|^2du\Big\}^{1/2} \le \theta\varepsilon.$

Applying (12) and (13) to (9), we obtain

$N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), \|\cdot\|_2) \le \exp\big(C[(1-\theta)\varepsilon]^{-1/k}\big)\,N_{[\,]}\big(\sqrt2\,\theta\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), \|\cdot\|_2\big).$

Choosing $\frac{1}{\sqrt2}<\theta<1$, so that $\sqrt2\,\theta>1$, and iterating, we obtain

$N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), \|\cdot\|_2) \le \exp\big(C\varepsilon^{-1/k}\big)$
(14)

for a different constant C depending only on k. Plugging (14) and (8) into (6), we obtain

$N_{[\,]}(\varepsilon, \mathcal{M}_k^1([0,1]), \|\cdot\|_2) = N_{[\,]}(\varepsilon, \mathcal{H}_k^1([0,1]), \|\cdot\|_2) \le \exp\big(C\varepsilon^{-1/k}\big).$
(15)

If we let

$\tilde{\mathcal{M}}_k^\delta(I) = \{g\in\mathcal{M}_k^1(I) : g\ge\delta\},$

then, because $h(g_1,g_2)\le\frac{1}{2\sqrt\delta}\|g_1-g_2\|_2$ for all $g_1,g_2\in\tilde{\mathcal{M}}_k^\delta([0,1])$, we obtain

$N_{[\,]}(\varepsilon, \tilde{\mathcal{M}}_k^\delta([0,1]), h) \le N_{[\,]}\big(2\sqrt\delta\,\varepsilon, \mathcal{M}_k^1([0,1]), \|\cdot\|_2\big) \le \exp\big(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac1k}\big).$
(16)

We now return to our goal of estimating $N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h)$. We define

$A = \big\{g\in\tilde{\mathcal{H}}_k^1([0,1]) : g(\tfrac12)<\delta\big\} \quad\text{and}\quad B = \big\{g\in\tilde{\mathcal{H}}_k^1([0,1]) : g(\tfrac12)\ge\delta\big\}.$

Then

$A \subset \delta\tilde{\mathcal{H}}_k^1([0,\tfrac12]) + \mathcal{H}_k^1([\tfrac12,1]), \qquad B \subset \tilde{\mathcal{H}}_k^1([0,\tfrac12]) + \tilde{\mathcal{M}}_k^\delta([\tfrac12,1]).$

Therefore

$N_{[\,]}(\varepsilon, A, h) \le N_{[\,]}\big(\theta\varepsilon, \delta\tilde{\mathcal{H}}_k^1([0,\tfrac12]), h\big)\,N_{[\,]}\big((1-\theta)\varepsilon, \mathcal{H}_k^1([\tfrac12,1]), h\big),$
(17)

$N_{[\,]}(\varepsilon, B, h) \le N_{[\,]}\big((1-\eta)\varepsilon, \tilde{\mathcal{H}}_k^1([0,\tfrac12]), h\big)\,N_{[\,]}\big(\eta\varepsilon, \tilde{\mathcal{M}}_k^\delta([\tfrac12,1]), h\big).$
(18)

Note that

$N_{[\,]}\big(\theta\varepsilon, \delta\tilde{\mathcal{H}}_k^1([0,\tfrac12]), h\big) \le N_{[\,]}\Big(\frac{\theta\varepsilon}{\sqrt\delta}, \tilde{\mathcal{H}}_k^1([0,1]), h\Big).$

Also note that

$N_{[\,]}\big((1-\theta)\varepsilon, \mathcal{H}_k^1([\tfrac12,1]), h\big) = N_{[\,]}\big((1-\theta)\varepsilon, \mathcal{M}_k^1([0,\tfrac12]), h\big) = N_{[\,]}\big(\sqrt2(1-\theta)\varepsilon, \mathcal{M}_k^1([0,1]), h\big) \le N_{[\,]}\big(\sqrt2(1-\theta)^2\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h\big)\,N_{[\,]}\big(\sqrt2(1-\theta)\theta\varepsilon, \mathcal{P}_k, h\big) \le N_{[\,]}\big(\sqrt2(1-\theta)^2\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h\big)\Big(1+\frac{2k}{(1-\theta)^2\theta^2\varepsilon^2}\Big)^k,$

where the last inequality follows from (7). We choose θ and δ so that $\sqrt2(1-\theta)^2 = 1.4$ and $\theta/\sqrt\delta = L$, where L is a large number to be fixed later. Thus, by plugging the two bounds above into (17), we obtain

$N_{[\,]}(\varepsilon, A, h) \le N_{[\,]}\big(1.4\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h\big)\,N_{[\,]}\big(L\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h\big)\Big(1+\frac{C_1}{\varepsilon^2}\Big)^k$
(19)

for some constant C1.

On the other hand, by choosing η so that $\sqrt2(1-\eta) = 1.4$, we have

$N_{[\,]}\big((1-\eta)\varepsilon, \tilde{\mathcal{H}}_k^1([0,\tfrac12]), h\big) \le N_{[\,]}\big(1.4\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h\big).$

Recall that by (16) we have

$N_{[\,]}\big(\eta\varepsilon, \tilde{\mathcal{M}}_k^\delta([\tfrac12,1]), h\big) \le \exp\big(C_2\varepsilon^{-1/k}\big),$

with a constant C₂ depending on L. Plugging these two bounds into (18), we obtain

$N_{[\,]}(\varepsilon, B, h) \le N_{[\,]}\big(1.4\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h\big)\exp\big(C_2\varepsilon^{-1/k}\big).$
(20)

Because $\tilde{\mathcal{H}}_k^1([0,1]) \subset A\cup B$, the bounds (19) and (20) imply that

$\log N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h) \le \log N_{[\,]}(1.4\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h) + \log N_{[\,]}(L\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h) + C_2\varepsilon^{-1/k} + k\log\big(1+C_1\varepsilon^{-2}\big) + \log 2.$

Let $Z(\varepsilon) = \varepsilon^{1/k}\log N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h)$. Then the inequality above implies that

$Z(\varepsilon) \le 1.4^{-1/k}Z(1.4\varepsilon) + L^{-1/k}Z(L\varepsilon) + C$

for some constant C, which further implies that

$\sup_{\eta\ge\varepsilon} Z(\eta) \le \big(1.4^{-1/k}+L^{-1/k}\big)\sup_{\eta\ge\varepsilon} Z(\eta) + C.$

By choosing L large enough that $1.4^{-1/k}+L^{-1/k}<1$, and noting that the supremum is finite, we immediately obtain

$\log N_{[\,]}(\varepsilon, \tilde{\mathcal{H}}_k^1([0,1]), h) \le C\varepsilon^{-1/k},$
(21)

for some constant C. Together with (7) (via (6)), we have

$\log N_{[\,]}(\varepsilon, \mathcal{H}_k^1([0,1]), h) \le C\varepsilon^{-1/k}.$

Summarizing, we obtain

Theorem 2. There exists a constant C depending only on k, such that

$\log N_{[\,]}(\varepsilon, \mathcal{M}_k^B([0,A]), h) \le C(AB)^{\frac{1}{2k}}\varepsilon^{-\frac1k}.$
(22)
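For completeness, the scaling step that converts the bound for $\mathcal{M}_k^1([0,1])$ into (22) is

$N_{[\,]}(\varepsilon, \mathcal{M}_k^B([0,A]), h) = N_{[\,]}\Big(\frac{\varepsilon}{\sqrt{AB}}, \mathcal{M}_k^1([0,1]), h\Big) \le \exp\Big(C\Big(\frac{\varepsilon}{\sqrt{AB}}\Big)^{-\frac1k}\Big) = \exp\big(C(AB)^{\frac{1}{2k}}\varepsilon^{-\frac1k}\big),$

where the equality is the reduction recorded at the beginning of this section and the inequality is the bound just established for $\mathcal{M}_k^1([0,1])$, the reflection of $\mathcal{H}_k^1([0,1])$.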

2.2 Bounded k-monotone densities

Now we consider $N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,1]), h)$. Because $\mathcal{F}_k^B([0,1])\subset\mathcal{M}_k^B([0,1])$, the result of Theorem 2 applies to $\mathcal{F}_k^B([0,1])$. Our goal is to improve the factor $B^{\frac{1}{2k}}$ by using the extra fact that $\int_0^1g(u)\,du = 1$ for $g\in\mathcal{F}_k^B([0,1])$. For convenience, we will actually relax the condition $\int_0^1g(u)\,du = 1$ in the definition of $\mathcal{F}_k^B([0,1])$ to the condition $\int_0^1g(u)\,du \le 1$.

Assume B > 2. For 1/B < δ < 1, we define

$X_\delta = \Big\{f\in\mathcal{F}_k^B([0,1]) : \int_0^{1/2}f(u)\,du < \delta\Big\} \quad\text{and}\quad Y_\delta = \Big\{f\in\mathcal{F}_k^B([0,1]) : \int_0^{1/2}f(u)\,du \ge \delta\Big\}.$

Then $\mathcal{F}_k^B([0,1]) = X_\delta\cup Y_\delta$. As in the previous subsection, we may work with the reflected (increasing) version of each class, which leaves all bracketing numbers unchanged. For any $f\in\mathcal{F}_k^B([0,1])$, because f increases and $\int_{1/2}^1f(u)\,du\le1$, we have f(u) ≤ 2 for all u ∈ [0, 1/2]. Hence, we can decompose $X_\delta$ and $Y_\delta$ into two classes of functions with disjoint supports:

$X_\delta \subset \delta\,\mathcal{F}_k^{2/\delta}([0,\tfrac12]) + \mathcal{F}_k^B([\tfrac12,1]), \qquad Y_\delta \subset \mathcal{F}_k^2([0,\tfrac12]) + (1-\delta)\,\mathcal{F}_k^{B/(1-\delta)}([\tfrac12,1]).$

Therefore, we have

$N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,1]), h) \le N_{[\,]}\big(\sqrt{2\delta}\,\varepsilon, \delta\mathcal{F}_k^{2/\delta}([0,\tfrac12]), h\big)\,N_{[\,]}\big(\sqrt{1-4\delta}\,\varepsilon, \mathcal{F}_k^B([\tfrac12,1]), h\big) + N_{[\,]}\big(\sqrt{\delta}\,\varepsilon, \mathcal{F}_k^2([0,\tfrac12]), h\big)\,N_{[\,]}\big(\sqrt{1-\delta}\,\varepsilon, (1-\delta)\mathcal{F}_k^{B/(1-\delta)}([\tfrac12,1]), h\big)$
$= N_{[\,]}\big(\sqrt2\,\varepsilon, \mathcal{F}_k^{2/\delta}([0,\tfrac12]), h\big)\,N_{[\,]}\big(\sqrt{1-4\delta}\,\varepsilon, \mathcal{F}_k^B([\tfrac12,1]), h\big) + N_{[\,]}\big(\sqrt{\delta}\,\varepsilon, \mathcal{F}_k^2([0,\tfrac12]), h\big)\,N_{[\,]}\big(\varepsilon, \mathcal{F}_k^{B/(1-\delta)}([\tfrac12,1]), h\big)$
$\le N_{[\,]}\big(\sqrt2\,\varepsilon, \mathcal{F}_k^B([0,1]), h\big)\,N_{[\,]}\big(\sqrt{1-4\delta}\,\varepsilon, \mathcal{F}_k^{B/2}([0,1]), h\big) + \exp\big(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac1k}\big)\,N_{[\,]}\big(\varepsilon, \mathcal{F}_k^{B/(2-2\delta)}([0,1]), h\big)$
$\le \Big[N_{[\,]}\big(\sqrt2\,\varepsilon, \mathcal{F}_k^B([0,1]), h\big) + \exp\big(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac1k}\big)\Big]\,N_{[\,]}\big(\sqrt{1-4\delta}\,\varepsilon, \mathcal{F}_k^{B/(2-2\delta)}([0,1]), h\big).$

By iteration, we obtain

$N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,1]), h) \le \Big[N_{[\,]}\big(2^{m/2}\varepsilon, \mathcal{F}_k^B([0,1]), h\big) + \exp\big(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac1k}\big)\Big]\,N_{[\,]}\big((1-4\delta)^{m/2}\varepsilon, \mathcal{F}_k^{B/(2-2\delta)^m}([0,1]), h\big).$

Letting δ = 1/log₂ B and choosing m = ⌈log₂ B⌉, we obtain

$N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,1]), h) \le \exp\big(C(\log B)^{\frac{1}{2k}}\varepsilon^{-\frac1k}\big).$
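The bookkeeping behind this choice can be spelled out as follows (a sketch, for large B; constants are not optimized). Since $2^m \ge B$ and $m\delta = \lceil\log_2B\rceil/\log_2B \le 2$,

$\frac{B}{(2-2\delta)^m} = \frac{B}{2^m(1-\delta)^m} \le e^{2m\delta}\,\frac{B}{2^m} \le e^4, \qquad (1-4\delta)^{m/2} \ge e^{-4m\delta} \ge e^{-8},$

so the last factor in the iterated bound is the bracketing number of a class with a fixed uniform bound at a radius comparable to ε, which Theorem 2 controls by $\exp(C\varepsilon^{-1/k})$. Likewise, by Theorem 2, $N_{[\,]}(2^{m/2}\varepsilon, \mathcal{F}_k^B([0,1]), h) \le \exp\big(CB^{\frac{1}{2k}}(B^{1/2}\varepsilon)^{-\frac1k}\big) = \exp\big(C\varepsilon^{-\frac1k}\big)$, so the middle term $\exp\big(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac1k}\big) = \exp\big(C(\log_2B)^{\frac{1}{2k}}\varepsilon^{-\frac1k}\big)$ dominates.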

We summarize this bound in the following theorem:

Theorem 3. Let $\mathcal{F}_k^B([0,A])$ be the class of k-monotone densities on [0, A] that are bounded by B. Then

$\log N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,A]), h) \le C\big(\log(AB)\big)^{\frac{1}{2k}}\varepsilon^{-\frac1k},$

where C is a constant depending only on k.

3 Under Lp(Q) Distances

In this section we will consider the bracketing entropy of $\mathcal{M}_k^B([0,A])$ and $\mathcal{F}_k^B([0,A])$ under the $L_p(Q)$ distance, where 1 ≤ p < ∞ and Q is any probability measure on [0, A]. We will prove the following theorem:

Theorem 4. (i) There exists a constant C depending only on p and k, such that for any probability measure Q on [0, A] that is absolutely continuous with respect to Lebesgue measure with bounded density q,

$\log N_{[\,]}(\varepsilon, \mathcal{M}_k^B([0,A]), \|\cdot\|_{p,Q}) \le C\big(\|q\|_\infty AB^p\big)^{\frac{1}{pk}}\varepsilon^{-\frac1k},$
(23)

$\log N_{[\,]}(\varepsilon, \mathcal{F}_k^B([0,A]), \|\cdot\|_{p,Q}) \le C\big(\|q\|_\infty AB^p\big)^{\frac{1}{pk}}\varepsilon^{-\frac1k}.$
(24)

(ii) Let $\mathcal{G}_k^B([0,A])$ consist of k-monotone functions g on [0, A] such that g′(0) ≥ −B, and let $\tilde{\mathcal{G}}_k^B([0,A])$ consist of probability densities that belong to $\mathcal{G}_k^B([0,A])$. Then there exists a constant C depending only on p and k, such that for any probability measure Q on [0, A],

$\log N_{[\,]}(\varepsilon, \mathcal{G}_k^B([0,A]), \|\cdot\|_{p,Q}) \le C\,A^{\frac{1}{pk}}B^{\frac1k}\varepsilon^{-\frac1k},$
(25)

$\log N_{[\,]}(\varepsilon, \tilde{\mathcal{G}}_k^B([0,A]), \|\cdot\|_{p,Q}) \le C\big[\log(AB^p)\big]^{\frac{1}{pk}}\varepsilon^{-\frac1k}.$
(26)

Remark 5. In view of the result proved in [8], the rate $\varepsilon^{-1/k}$ is sharp when p = 2.

Proof. The result is known for the case k = 1. Thus, we only need to consider the case k > 1.

To prove the first inequality in the statement (i), we note that

$N_{[\,]}(\varepsilon, \mathcal{M}_k^B([0,A]), \|\cdot\|_{p,Q}) \le N_{[\,]}\Big(\frac{\varepsilon}{\|q\|_\infty^{1/p}A^{1/p}B}, \mathcal{M}_k^1([0,1]), \|\cdot\|_p\Big),$

where ‖·‖_p is the $L_p$ distance under Lebesgue measure. Thus, it suffices to prove that for any η > 0, $\log N_{[\,]}(\eta, \mathcal{M}_k^1([0,1]), \|\cdot\|_p) \le C\eta^{-1/k}$. However, this follows from the same argument as in the proof of Theorem 2. Indeed, the only change needed is to replace (11) by

$N_{[\,]}\big(\varepsilon^{1+\frac{1}{k-1}}, \mathcal{U}_k, \|\cdot\|_p\big) \le e^{C\varepsilon^{-\frac{1}{k-1}}},$

which leads to $N_{[\,]}(\varepsilon, \mathcal{U}_k, \|\cdot\|_p)\le e^{C\varepsilon^{-1/k}}$. The first inequality in statement (i) then follows by iteration. The proof of the second inequality is similar to that of Theorem 3.

To prove the first inequality in the statement (ii), we note that by the change of variables x = Au, we have

$N_{[\,]}(\varepsilon, \mathcal{G}_k^B([0,A]), \|\cdot\|_{p,Q}) = N_{[\,]}\big(\varepsilon, \mathcal{G}_k^{A^{1/p}B}([0,1]), \|\cdot\|_{p,P}\big),$

where P is the probability measure on [0, 1] defined by P([0, u]) = Q([0, Au]). Thus, it suffices to consider the case $AB^p = 1$. Furthermore, by approximation, we can assume that P is absolutely continuous with respect to the Lebesgue measure on [0, 1]. Let α be the inverse function of $x\mapsto P([0,x])$. By the change of variable u = P([0, x]), it is easy to see that

$N_{[\,]}(\varepsilon, \mathcal{G}_k^1([0,1]), \|\cdot\|_{p,P}) = N_{[\,]}(\varepsilon, \mathcal{G}_{k,\alpha}^1([0,1]), \|\cdot\|_p),$

where

$\mathcal{G}_{k,\alpha}^1([0,1]) = \{g(\alpha(u)) : g\in\mathcal{G}_k^1([0,1])\} = \Big\{\int_0^{\alpha(u)}f(t)\,dt : f\in\mathcal{M}_{k-1}^1([0,1])\Big\}.$

By (23) we have

$N_{[\,]}(\varepsilon, \mathcal{M}_{k-1}^1([0,1]), \|\cdot\|_1) \le \exp\big(C\varepsilon^{-\frac{1}{k-1}}\big).$

Applying Lemma 1, we obtain

$N_{[\,]}\big(\varepsilon^{1+\frac{1}{k-1}}, \mathcal{G}_{k,\alpha}^1([0,1]), \|\cdot\|_p\big) \le \exp\big(C\varepsilon^{-\frac{1}{k-1}}\big),$

which leads to inequality (25). The proof of (26) follows the same argument as Theorem 3, and is thus omitted.

4 Rates of convergence for the maximum likelihood estimator of a bounded k-monotone density

Let $\hat p_{n,k}$ be the MLE of a k-monotone density $p_0$ on [0, A] based on $X_1, \ldots, X_n$ i.i.d. with density $p_0\in\mathcal{F}_k^B([0,A])$ for some 0 < A, B < ∞. Thus $p_0$ is bounded and concentrated on [0, A].

From [2] (see also [9]) we know that $\hat p_{n,k}$ is characterized by

$1 \ge \int_0^y \frac{k(y-x)^{k-1}}{y^k\,\hat p_{n,k}(x)}\,dF_n(x) \ge \frac{1}{\hat p_{n,k}(0)}\int_0^y \frac{k(y-x)^{k-1}}{y^k}\,dF_n(x) = \frac{1}{\hat p_{n,k}(0)}\int_0^y \frac{k}{y}\Big(1-\frac{x}{y}\Big)^{k-1}dF_n(x) \quad\text{for all } y>0,$
(27)

with equality in the first inequality in (27) at points $\tau\in\mathrm{supp}(\hat G_{n,k}) = \{\tau_1, \tau_2, \ldots, \tau_m\}$, where we may assume that $0<\tau_1<\cdots<\tau_m$ (with m random) and where

$\hat p_{n,k}(x) = \int_0^\infty \frac{k(y-x)_+^{k-1}}{y^k}\,d\hat G_{n,k}(y).$

Therefore

$\hat p_{n,k}(0) \ge \int_0^\infty \frac{k}{y^k}(y-x)_+^{k-1}\,dF_n(x) \quad\text{for all } y>0.$
(28)

To apply our entropy bounds we need to show that $\hat p_{n,k}$ is bounded with (arbitrarily) high probability when $p_0$ is bounded. This is the content of the following proposition.

Proposition 6. Suppose that $p_0\in\mathcal{F}_k^B([0,A])$. Then the MLE $\hat p_{n,k}$ satisfies

$\hat p_{n,k}(0+) \le k\,\sup_{x>0}\frac{F_n(x)}{x} = O_p(1).$

Proof. The characterization of the MLE implies that

$1 \ge \frac{k}{y^k}\int_0^y \frac{(y-x)^{k-1}}{\hat p_{n,k}(x)}\,dF_n(x) \quad\text{for all } 0<y\le\tau_1$
(29)

$= \frac{k}{y}\int_0^y\Big(1-\frac{x}{y}\Big)^{k-1}\frac{1}{\hat p_{n,k}(x)}\,dF_n(x)$
(30)

with equality at y = τ1:

$1 = \frac{k}{\tau_1}\int_0^{\tau_1}\Big(1-\frac{x}{\tau_1}\Big)^{k-1}\frac{1}{\hat p_{n,k}(x)}\,dF_n(x).$
(31)

Now note that the support of $\hat G_{n,k}$ is contained in $[\tau_1,\infty)$, so $x/y \le x/\tau_1$, i.e., $(1-x/y) \ge (1-x/\tau_1)$, for $y\ge\tau_1$ and $0\le x\le\tau_1$. Thus it follows that

$\hat p_{n,k}(x) = \int_0^\infty \frac{k}{y}\Big(1-\frac{x}{y}\Big)_+^{k-1}d\hat G_{n,k}(y) \ge \Big(1-\frac{x}{\tau_1}\Big)_+^{k-1}\int_0^\infty\frac{k}{y}\,d\hat G_{n,k}(y) = \Big(1-\frac{x}{\tau_1}\Big)_+^{k-1}\hat p_{n,k}(0).$

Combining the last two displays we find that

$1 = \frac{k}{\tau_1}\int_0^{\tau_1}\Big(1-\frac{x}{\tau_1}\Big)^{k-1}\frac{1}{\hat p_{n,k}(x)}\,dF_n(x) \le \frac{k}{\tau_1}\int_0^{\tau_1}\Big(1-\frac{x}{\tau_1}\Big)^{k-1}\frac{1}{\hat p_{n,k}(0)\big(1-\frac{x}{\tau_1}\big)^{k-1}}\,dF_n(x) \le \frac{k}{\tau_1\,\hat p_{n,k}(0)}\,F_n(\tau_1),$

which yields

$\hat p_{n,k}(0) \le \frac{k\,F_n(\tau_1)}{\tau_1} \le k\sup_{t>0}\frac{F_n(t)}{t} = k\sup_{t>0}\frac{F_n(t)}{F_0(t)}\cdot\frac{F_0(t)}{t} \le k\sup_{t>0}\frac{F_n(t)}{F_0(t)}\cdot p_0(0+) = O_p(1)$

by Daniels’ inequality; see [10], Theorem 9.1.2, page 345.
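A quick Monte Carlo illustration of the bound just used (a sketch with arbitrary sample size and threshold): for Uniform(0, 1) data we have $F_0(t) = t$ and $p_0(0+) = 1$; the supremum of $F_n(t)/t$ is attained at the order statistics, so $\sup_{t>0}F_n(t)/t = \max_i (i/n)/X_{(i)}$, and Daniels' theorem gives $P(\sup_{t>0}F_n(t)/t \ge \lambda) = 1/\lambda$ for λ ≥ 1.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps, lam = 200, 20000, 4.0
    sups = np.empty(reps)
    for r in range(reps):
        x = np.sort(rng.uniform(size=n))
        # F_n(t)/t is maximized at a jump point of F_n
        sups[r] = np.max(np.arange(1, n + 1) / (n * x))
    print("empirical:", (sups >= lam).mean(), "  Daniels bound:", 1.0 / lam)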

Now suppose that $\mathcal{P}$ is a collection of densities with respect to a σ-finite measure μ. The following theorem is a simplified version of Theorem 3.4.4 of [5], page 327. Our rate theorem for the MLE $\hat p_{n,k}$ over the class $\mathcal{P}_k([0,A])$, the class of k-monotone densities on [0, A], will be proved by combining the upper bound of this theorem with (an easy modification of) the rate results given in [5], Theorem 3.2.5, page 289.

Theorem 7. Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_0$ with density $p_0\in\mathcal{P}$. Let h be the Hellinger distance between densities, and let $m_p$ be defined, for $p\in\mathcal{P}$, by

$m_p(x) = \log\Big(\frac{p(x)+p_0(x)}{2p_0(x)}\Big).$

Then

$M(p) - M(p_0) \equiv P_0(m_p - m_{p_0}) \lesssim -h^2(p, p_0).$

Furthermore, for any subclass $\mathcal{P}_0\subset\mathcal{P}$, with $\mathcal{M}_\delta = \{m_p - m_{p_0} : h(p,p_0)\le\delta,\ p\in\mathcal{P}_0\}$, we also have

$E_{P_0}\|\mathbb{G}_n\|_{\mathcal{M}_\delta} \lesssim \tilde J_{[\,]}(\delta, \mathcal{P}_0, h)\Big(1+\frac{\tilde J_{[\,]}(\delta, \mathcal{P}_0, h)}{\delta^2\sqrt n}\Big) \equiv \phi_n(\delta, \mathcal{P}_0),$
(32)

where

$\tilde J_{[\,]}(\delta, \mathcal{P}_0, h) = \int_0^\delta\sqrt{1+\log N_{[\,]}(\varepsilon, \mathcal{P}_0, h)}\,d\varepsilon.$

Here is our main result of this section:

Theorem 8. Suppose that $p_0\in\mathcal{F}_k^B([0,A])\subset\mathcal{P}_k([0,A])$ for some 0 < A, B < ∞. Then the maximum likelihood estimator $\hat p_{n,k}$ of $p_0$ in $\mathcal{P}_k$ satisfies

$h(\hat p_{n,k}, p_0) = O_p\big(n^{-\frac{k}{2k+1}}\big).$

Remark 9. This generalizes the rate result of [3] (with resulting rate of convergence $n^{-1/3}$) to k > 1. For the case k = 1, closely related results, with the Hellinger metric replaced by the $L_1$ metric, were obtained by [11–13]. The rate established in Theorem 8 is consistent with the local rate result of $n^{-k/(2k+1)}$ established (up to an envelope conjecture) by [2].

Proof. For simplicity, we write $\hat p_n \equiv \hat p_{n,k}$. Let M > 0 and K > 0. Then

$P\big(r_nh(\hat p_n,p_0)\ge2^M\big) \le P\big(r_nh(\hat p_n,p_0)\ge2^M,\ \hat p_n(0)\le K\big) + P\big(\hat p_n(0)>K\big) \equiv I_n + II_n,$

where, by Proposition 6,

$II_n \le P\Big(k\,\sup_{t>0}\frac{F_n(t)}{F_0(t)}\,p_0(0+) > K\Big) \le \frac{k\,p_0(0+)}{K}$

by Daniels' inequality, and hence $II_n$ can be made arbitrarily small (uniformly in n) by choosing K large. Now we essentially follow the proof of Theorem 3.2.5 of [5] (with θ there identified with p), but exploit the fact that $\hat p_n(0+)\le K$. Thus, letting $\mathbb{M}_n(p) \equiv \mathbb{P}_n m_p$, we have for any large η > 0,

$P\big(r_nh(\hat p_n,p_0)>2^M,\ \hat p_n(0+)\le K\big) \le \sum_{j=M}^{\log_2(\eta r_n)} P\Big(\sup_{p\in S_{j,n}}\big(\mathbb{M}_n(p)-\mathbb{M}_n(p_0)\big)\ge0\Big) + P\big(2h(\hat p_n,p_0)\ge\eta\big) \equiv I_{A,n} + I_{B,n},$

where the shells $S_{j,n}$ are now defined with the additional restriction that p(0+) ≤ K:

$S_{j,n} = \big\{p\in\mathcal{P}_k([0,A]) : 2^{j-1} < r_nh(p,p_0)\le2^j,\ p(0+)\le K\big\} = \big\{p\in\mathcal{F}_k^K([0,A]) : 2^{j-1} < r_nh(p,p_0)\le2^j\big\}.$

Here the term $I_{B,n}$ can be made arbitrarily small for all large n by the consistency of $\hat p_n$ established in [9]. Thus the same argument as in [9] yields, with $\mathcal{P}_0 = \mathcal{F}_k^K$ in (32) and $\phi_n(\delta, \mathcal{F}_k^K) \equiv \phi_n(\delta)$,

$I_{A,n} \lesssim \sum_{j\ge M}\frac{\phi_n(2^j/r_n)\,r_n^2}{\sqrt{n}\,2^{2j}}.$
(33)

By (32),

$\phi_n(\delta) \equiv \phi_n(\delta, \mathcal{F}_k^K) = \tilde J_{[\,]}(\delta, \mathcal{F}_k^K, h)\Big(1+\frac{\tilde J_{[\,]}(\delta, \mathcal{F}_k^K, h)}{\delta^2\sqrt n}\Big).$

A direct calculation using Theorem 3 gives

$\tilde J_{[\,]}(\delta, \mathcal{F}_k^K, h) \le C\,D^{1/2}\,\delta^{1-\frac{1}{2k}}$

for a constant C depending only on k and the constant in Theorem 3, where $D \equiv |\log(AK)|^{1/(2k)}$. This implies

$\phi_n(\delta) \le C\,D^{1/2}\,\delta^{1-\frac{1}{2k}}\Big(1+\frac{C\,D^{1/2}\,\delta^{1-\frac{1}{2k}}}{\delta^2\sqrt n}\Big).$

By taking

$r_n \equiv \Big(\frac{c_0}{C\,D^{1/2}}\Big)^{\frac{2k}{2k+1}}\,n^{\frac{k}{2k+1}},$

where $c_0 \equiv (\sqrt5-1)/2$, we have $r_n^2\,\phi_n(1/r_n) \le \sqrt n$. Note that the function $\delta\mapsto\phi_n(\delta)/\delta$ is decreasing; therefore, for any j > 0,

$\phi_n\Big(\frac{2^j}{r_n}\Big) \le 2^j\,\phi_n\Big(\frac{1}{r_n}\Big).$

Hence, (33) can be estimated by

$I_{A,n} \lesssim \sum_{j\ge M}\frac{\phi_n(2^j/r_n)\,r_n^2}{\sqrt{n}\,2^{2j}} \le \sum_{j\ge M}\frac{2^j\,\phi_n(1/r_n)\,r_n^2}{\sqrt{n}\,2^{2j}} \le \sum_{j\ge M}2^{-j} = 2^{-(M-1)},$

which can be made arbitrarily small by choosing M large. This completes the proof.
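The choice of $r_n$ can be checked directly: treating the bound on $\tilde J_{[\,]}$ as an equality and writing $J \equiv \tilde J_{[\,]}(1/r_n, \mathcal{F}_k^K, h)$ and $x \equiv Jr_n^2/\sqrt n$, we have

$r_n^2\,\phi_n(1/r_n) = r_n^2J\Big(1+\frac{Jr_n^2}{\sqrt n}\Big) = \sqrt n\,x(1+x),$

and x(1 + x) = 1 exactly at $x = c_0 = (\sqrt5-1)/2$; solving $C\,D^{1/2}\,r_n^{1+\frac{1}{2k}} = c_0\sqrt n$ for $r_n$ gives the displayed choice, and in particular $r_n \asymp n^{k/(2k+1)}$, the rate claimed in Theorem 8.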

Acknowledgments

This work was supported by the National Science Foundation of USA (Grant Nos. DMS-0405855 and DMS-0804587).

Footnotes

MSC(2000): 62G05, 46B50

References

1. Williamson RE. Multiply monotone functions and their Laplace transforms. Duke Math J. 1956;23:189–207.
2. Balabdaoui F, Wellner JA. Estimation of a k-monotone density: limit distribution theory and the spline connection. Ann Statist. 2007;35(6):2536–2564.
3. van de Geer S. Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann Statist. 1993;21(1):14–44.
4. Birman MŠ, Solomjak MZ. Piecewise polynomial approximations of functions of the classes $W_p^\alpha$. Mat Sb. 1967;73(115):331–355.
5. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer-Verlag; New York: 1996. (Springer Series in Statistics).
6. Blei R, Gao F, Li W. Metric entropy of high dimensional distributions. Proc Amer Math Soc. 2007;135(12):4009–4018.
7. Gao F, Wellner JA. Entropy estimate for high-dimensional monotonic functions. J Multivariate Anal. 2007;98(9):1751–1764.
8. Gao F. Entropy estimate for k-monotone functions via small ball probability of integrated Brownian motion. Electron Commun Probab. 2008;13:121–130.
9. Balabdaoui F, Wellner JA. Technical Report. University of Washington, Department of Statistics; 2004. Estimation of a k-monotone density, part 1: characterizations, consistency, and minimax lower bounds.
10. Shorack GR, Wellner JA. Empirical processes with applications to statistics. John Wiley & Sons Inc; New York: 1986. (Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics).
11. Groeneboom P. Some current developments in density estimation. In: Mathematics and Computer Science (Amsterdam, 1983). 1986;1:163–192.
12. Birgé L. On estimating a density using Hellinger distance and some other strange facts. Probab Theory Related Fields. 1986;71(2):271–291.
13. Birgé L. The Grenander estimator: a nonasymptotic approach. Ann Statist. 1989;17(4):1532–1549.