Sci China Ser A Math. Author manuscript; available in PMC 2010 August 30.
Published in final edited form as:
Sci China Ser A Math. 2009 July; 52(7): 1525–1538.
PMCID: PMC2929990
NIHMSID: NIHMS177734

# On the rate of convergence of the maximum likelihood estimator of a k-monotone density

## Abstract

Bounds for the bracketing entropy of the classes of bounded k-monotone functions on [0, A] are obtained under both the Hellinger distance and the Lp(Q) distance, where 1 ≤ p < ∞ and Q is a probability measure on [0, A]. The result is then applied to obtain the rate of convergence of the maximum likelihood estimator of a k-monotone density.

Keywords: bracketing metric entropy, maximum likelihood estimator, Hellinger distance

## 1 Introduction

A function f on (0, ∞) is called k-monotone if $(-1)^jf^{(j)}(x)$ is non-negative, non-increasing, and convex for $0\le j\le k-2$ when k ≥ 2, and if f is non-negative and non-increasing when k = 1. These functions fill the gap between monotone functions and completely monotone functions. They appear very commonly in nonparametric estimation, such as the Maximum Likelihood Estimator (MLE) in statistics via renewal theory and mixing of uniform distributions. Indeed, k-monotone functions have been studied since at least the 1950s; for example, Williamson[1] gave a characterization of m-monotone functions on (0, ∞) in 1956. In recent years, there has been some interest in statistics regarding this class of functions. We refer to [2] and the references therein for recent results and their statistical applications.
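As a quick sanity check (ours, not from the paper): for a smooth function, k-monotonicity amounts to $(-1)^jf^{(j)}(x)\ge0$ for $j=0,\dots,k$, which can be probed numerically with finite differences. The choice $f(x)=e^{-x}$ (completely monotone, hence k-monotone for every k), the value k = 4, the step size, and the grid are arbitrary assumptions.

```python
import math

# Sketch: probe the derivative-sign conditions of k-monotonicity for
# f(x) = exp(-x).  k, h and the grid of test points are arbitrary choices.

def forward_diff(f, x, j, h=1e-2):
    # j-th forward difference quotient; approximates f^(j)(x) up to O(h)
    return sum((-1) ** (j - i) * math.comb(j, i) * f(x + i * h)
               for i in range(j + 1)) / h ** j

f = lambda x: math.exp(-x)
k = 4
ok = all((-1) ** j * forward_diff(f, x, j) >= 0
         for j in range(k + 1)
         for x in [0.1 * m for m in range(1, 50)])
print(ok)
```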

Note that a k-monotone function may not be bounded near t = 0. In order to study the metric entropy, we restrict ourselves to the subclass consisting of the functions that are continuous at t = 0. We refer to this subclass as the class of k-monotone functions on [0, ∞). We denote by $\mathcal{M}_k(I)$ the class of k-monotone functions on I, and by $\mathcal{F}_k(I)$ the class of probability densities on I that are k-monotone.

For statistical applications, we wish to estimate the bracketing entropy of $\mathcal{F}_k(\mathbb{R}_+)$ and $\mathcal{M}_k(\mathbb{R}_+)$ under all $L_p(Q)$ distances, where 1 ≤ p < ∞ and Q is any probability measure on $\mathbb{R}_+$, and under the Hellinger distance h, which is defined by

$h(f,g)=\left(\int\left[\sqrt{f(x)}-\sqrt{g(x)}\right]^2dx\right)^{1/2}.$
(1)
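A small numerical sketch (ours) of the Hellinger distance (1): we compute h(f, g) for two exponential densities by the trapezoidal rule and compare with the closed form $h^2=2-2\int\sqrt{fg}$. The densities, the truncation point b = 30, and the grid size are arbitrary choices.

```python
import math

# Trapezoidal approximation of the Hellinger distance between two densities.
def hellinger(f, g, b=30.0, n=100_000):
    dx = b / n
    total = 0.0
    for i in range(n + 1):
        x = i * dx
        v = (math.sqrt(f(x)) - math.sqrt(g(x))) ** 2
        total += v / 2 if i in (0, n) else v
    return math.sqrt(total * dx)

f = lambda x: math.exp(-x)              # Exp(1) density
g = lambda x: 2.0 * math.exp(-2.0 * x)  # Exp(2) density

# closed form: int sqrt(f g) = sqrt(2) * int exp(-1.5 x) dx = 2*sqrt(2)/3
exact = math.sqrt(2.0 - 4.0 * math.sqrt(2.0) / 3.0)
print(abs(hellinger(f, g) - exact) < 1e-4)
```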

Recall that the bracketing metric entropy of a function class $\mathcal{F}$ under a distance ρ is defined as $\log N_{[\,]}(\varepsilon,\mathcal{F},\rho)$, where $N_{[\,]}(\varepsilon,\mathcal{F},\rho)$ is defined by

$N_{[\,]}(\varepsilon,\mathcal{F},\rho)\coloneqq\min\left\{n:\exists\,\underline{f}_1,\overline{f}_1,\dots,\underline{f}_n,\overline{f}_n\ \text{s.t.}\ \rho(\underline{f}_k,\overline{f}_k)\le\varepsilon,\ \mathcal{F}\subset\bigcup_{k=1}^n[\underline{f}_k,\overline{f}_k]\right\},$

where

$[\underline{f}_k,\overline{f}_k]=\{g\in\mathcal{F}:\underline{f}_k\le g\le\overline{f}_k\}.$

It is easy to see that both $\mathcal{F}_k(\mathbb{R}_+)$ and $\mathcal{M}_k(\mathbb{R}_+)$ are bounded under the Hellinger distance. However, they are not compact. Indeed, for any $\delta<\sqrt{2}$, we can find infinitely many functions in $\mathcal{F}_k(\mathbb{R}_+)$ with mutual Hellinger distance at least δ. In fact, for any α > 0, the functions $p_n(t)=2^{n\alpha}e^{-2^{n\alpha}t}$ are clearly in $\mathcal{F}_k(\mathbb{R}_+)$. For m > n,

$\int_0^\infty\left[\sqrt{p_n(t)}-\sqrt{p_m(t)}\right]^2dt=2-\int_0^\infty2\cdot2^{(n\alpha+m\alpha)/2}e^{-(2^{n\alpha}+2^{m\alpha})t/2}\,dt=2-\frac{4\cdot2^{(n\alpha+m\alpha)/2}}{2^{n\alpha}+2^{m\alpha}}=2-\frac{4\cdot2^{-(m-n)\alpha/2}}{1+2^{-(m-n)\alpha}}\ge2-\frac{4\cdot2^{-\alpha/2}}{1+2^{-\alpha}}=\delta^2$

for

$\alpha=2\log_2\left(\frac{2+\delta\sqrt{4-\delta^2}}{2-\delta^2}\right).$

Note that the sequence $\{p_n(t)\}_{n\ge1}$ is unbounded near the origin. This suggests that, for the Hellinger distance, we need to restrict ourselves to k-monotone functions whose values are bounded near the origin. On the other hand, the sequence $\{p_n(t)\}_{n<0}$ is uniformly bounded by 1; thus, this example also indicates that the non-compactness of $\mathcal{F}_k(\mathbb{R}_+)$ under the Hellinger distance is partly due to the fact that the interval is unbounded. Hence, we should also restrict ourselves to bounded intervals. Therefore, in what follows, we consider the subclasses $\mathcal{F}_k^B([0,A])$ and $\mathcal{M}_k^B([0,A])$ instead, where $\mathcal{F}_k^B(I)$ and $\mathcal{M}_k^B(I)$ denote the classes of functions that are bounded by B and belong to $\mathcal{F}_k(I)$ and $\mathcal{M}_k(I)$, respectively.
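The separation claim above can be verified numerically (this check is ours, not the paper's). For $p_n(t)=2^{n\alpha}e^{-2^{n\alpha}t}$ the squared Hellinger distance depends only on the gap m − n, namely $h^2=2-4r/(1+r^2)$ with $r=2^{-(m-n)\alpha/2}$, and with the stated α = α(δ) it equals δ² at gap 1 and grows with the gap. The value δ = 0.9 is an arbitrary choice below √2.

```python
import math

# h^2 between p_n and p_m as a function of the gap m - n
def h2(gap, a):
    r = 2.0 ** (-gap * a / 2.0)
    return 2.0 - 4.0 * r / (1.0 + r * r)

delta = 0.9
a = 2.0 * math.log2((2.0 + delta * math.sqrt(4.0 - delta ** 2)) / (2.0 - delta ** 2))

print(abs(h2(1, a) - delta ** 2) < 1e-9)                      # equality at gap 1
print(all(h2(g, a) >= delta ** 2 - 1e-12 for g in range(1, 25)))  # monotone in the gap
```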

By changing variables, it is easy to see that

$N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,A]),h)=N_{[\,]}(\varepsilon,\mathcal{F}_k^{AB}([0,1]),h),\qquad N_{[\,]}(\varepsilon,\mathcal{M}_k^B([0,A]),h)=N_{[\,]}(\varepsilon,\mathcal{M}_k^{AB}([0,1]),h).$

Hence, we only need to consider the case A = 1.

Let us remark that when k = 1, the problem has been studied by van de Geer[3], based on an earlier work of Birman and Solomjak[4]. For example, it was proved that (see also [5], Theorem 2.7.5)

$\log N_{[\,]}(\varepsilon,\mathcal{M}_1^B([0,1]),h)\le C\sqrt{B}\,\varepsilon^{-1}$
(2)

for some absolute constant C > 0, and

$\log N_{[\,]}(\varepsilon,\mathcal{M}_1^B(\mathbb{R}_+),\|\cdot\|_{p,Q})\le C_pB\varepsilon^{-1},$
(3)

for some positive constant Cp depending only on p, where 1 ≤ p < ∞, and Q is any probability measure on $R+$. For simpler proofs, see also [6, 7]. In particular, the iteration method used in [6] is useful in our argument in this paper.

For k > 1, Gao[8] also established the following metric entropy bound for $\mathcal{M}_k^B([0,1])$:

$C_1B^{1/k}\varepsilon^{-1/k}\le\log N(\varepsilon,\mathcal{M}_k^B([0,1]),\|\cdot\|_2)\le C_2B^{1/k}\varepsilon^{-1/k}.$
(4)

The method revealed a nice connection between the metric entropy of these function classes and the small ball probability of k-times integrated Brownian motions. However, because for k > 1 the square root of a k-monotone function may not be k-monotone, the metric entropy estimate under L2 distance does not yield an estimate under the Hellinger distance. Furthermore, that method cannot produce any result on bracketing metric entropy. Thus, it cannot be readily used to determine the convergence rate of the MLE of a k-monotone density.

In this paper, we directly estimate the bracketing metric entropy of these function classes under the Hellinger distance and under all Lp(Q) distances, where 1 ≤ p < ∞, and apply these estimates to statistical settings.

Our main tool is the following lemma, which provides a useful method to estimate bracketing entropy. An extension to more general integral operators will appear elsewhere.

Lemma 1. Let $\mathcal{F}$ be a class of functions on [0, 1], and let $\mathcal{G}$ be the class of functions on [0, 1] defined by $\mathcal{G}=\{\int_0^{\alpha(x)}f(t)\,dt:f\in\mathcal{F}\}$, where 0 ≤ α(x) ≤ 1 is any increasing function on [0, 1]. If $\log N(\varepsilon,\mathcal{F},\|\cdot\|_1)\le\phi(\varepsilon)$, where $\|\cdot\|_1$ stands for the $L_1$ distance under the Lebesgue measure on [0, 1], then

1. There exists a constant C depending only on p, such that for any probability measure Q on [0, 1],
$\log N_{[\,]}\left(\frac{\varepsilon}{\phi(\varepsilon)},\mathcal{G},\|\cdot\|_{p,Q}\right)\le C\phi(\varepsilon),$
where $\|\cdot\|_{p,Q}$ is defined by
$\|f-g\|_{p,Q}=\left(\int_0^1|f(x)-g(x)|^p\,dQ(x)\right)^{1/p}.$
2. If we further assume that $g(x)\ge\delta$ for all $g\in\mathcal{G}$, then there exists a constant C such that
$\log N_{[\,]}\left(\frac{\varepsilon}{2\sqrt{\delta}\,\phi(\varepsilon)},\mathcal{G},h\right)\le C\phi(\varepsilon),$
where h(f, g) is defined by (1).

Proof. Let $\{f_i\}$, $1\le i\le e^{\phi(\varepsilon)}$, be an ε-net for $\mathcal{F}$ in the $L_1$ distance under the Lebesgue measure on [0, 1]. For each i and each $f\in\mathcal{F}$, we can write

$\int_0^{\alpha(x)}f(t)\,dt=\int_0^{\alpha(x)}(f(t)-f_i(t))_+\,dt-\int_0^{\alpha(x)}(f_i(t)-f(t))_+\,dt+\int_0^{\alpha(x)}f_i(t)\,dt.$

Thus, if we define

$\mathcal{G}_i^+=\left\{g:g(x)=\int_0^{\alpha(x)}(f(t)-f_i(t))_+\,dt,\ f\in\mathcal{F},\ \|f-f_i\|_1\le\varepsilon\right\},\qquad\mathcal{G}_i^-=\left\{g:g(x)=\int_0^{\alpha(x)}(f_i(t)-f(t))_+\,dt,\ f\in\mathcal{F},\ \|f-f_i\|_1\le\varepsilon\right\},$

we have

$\mathcal{G}\subset\bigcup_i\left(\mathcal{G}_i^+-\mathcal{G}_i^-+\int_0^{\alpha(x)}f_i(t)\,dt\right).$

Note that $\mathcal{G}_i^+$ and $\mathcal{G}_i^-$ both consist of non-negative increasing functions bounded by ε. Thus, by (3) we can find $e^{c\phi(\varepsilon)}$ many ε/ϕ(ε)-brackets (with respect to $\|\cdot\|_{p,Q}$) that cover $\mathcal{G}_i^+-\mathcal{G}_i^-$. Say these brackets are $[\underline{h}_{i,j},\overline{h}_{i,j}]$, $1\le j\le e^{c\phi(\varepsilon)}$. Then clearly the brackets

$\left[\int_0^{\alpha(x)}f_i(t)\,dt+\underline{h}_{i,j},\ \int_0^{\alpha(x)}f_i(t)\,dt+\overline{h}_{i,j}\right],\qquad1\le i\le e^{\phi(\varepsilon)},\ 1\le j\le e^{c\phi(\varepsilon)}$

cover $\mathcal{G}$. Statement (i) follows by noticing that these are ε/ϕ(ε)-brackets under the $\|\cdot\|_{p,Q}$ distance.

To prove Statement (ii), we notice that with the additional assumption g ≥ δ, we have for any $g_1,g_2\in\mathcal{G}$,

$h(g_1,g_2)=\left\|\sqrt{g_1}-\sqrt{g_2}\right\|_2\le\frac{1}{2\sqrt{\delta}}\|g_1-g_2\|_2.$

Hence,

$N_{[\,]}\left(\frac{\varepsilon}{2\sqrt{\delta}\,\phi(\varepsilon)},\mathcal{G},h\right)\le N_{[\,]}\left(\frac{\varepsilon}{\phi(\varepsilon)},\mathcal{G},\|\cdot\|_2\right).$

Thus, Statement (ii) follows from Statement (i).
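The elementary inequality behind Statement (ii), namely $h(g_1,g_2)\le\|g_1-g_2\|_2/(2\sqrt\delta)$ whenever $g_1,g_2\ge\delta$, follows from $|\sqrt a-\sqrt b|=|a-b|/(\sqrt a+\sqrt b)\le|a-b|/(2\sqrt\delta)$ and can be spot-checked numerically. The piecewise-constant random test functions below are our own arbitrary choices.

```python
import math, random

# Discretized h and L2 distances for random functions bounded below by delta.
random.seed(0)
n, delta = 1000, 0.2
g1 = [delta + random.random() for _ in range(n)]
g2 = [delta + random.random() for _ in range(n)]

h = math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(g1, g2)) / n)
l2 = math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)) / n)
print(h <= l2 / (2.0 * math.sqrt(delta)))
```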

## 2 Under Hellinger distance

Note that by scaling arguments, we can easily show that

$N_{[\,]}(\varepsilon,\mathcal{M}_k^B([0,A]),h)=N_{[\,]}(\varepsilon,\mathcal{M}_k^{AB}([0,1]),h)=N_{[\,]}(\varepsilon/\sqrt{AB},\mathcal{M}_k^1([0,1]),h),\qquad N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,A]),h)=N_{[\,]}(\varepsilon,\mathcal{F}_k^{AB}([0,1]),h).$

Hence, we only need to consider the classes $\mathcal{M}_k^1([0,1])$ and $\mathcal{F}_k^B([0,1])$.

Before we proceed with the detailed calculation, we make some observations that simplify the later arguments. Firstly, because smooth k-monotone functions are dense in $\mathcal{M}_k^1([0,1])$ (cf. [8]), we can and will assume that all the densities in $\mathcal{M}_k^1([0,1])$ are k-times continuously differentiable. Secondly, if for $I=[a,b]\subset[0,1]$ we define

$\mathcal{H}_k^B(I)=\{f:f(u)=g(b-u),\ g\in\mathcal{M}_k^B(I)\},$

then for every $f\in\mathcal{H}_k^1([0,1])$,

$\frac{d^jf(u)}{du^j}=\frac{d^j}{du^j}g(1-u)=(-1)^jg^{(j)}(1-u)\ge0$

for all $0\le j\le k$ and all $u\in[0,1]$. For $f\in\mathcal{H}_k^1([0,1])$, we can write

$f(u)=f(0)+f'(0)u+\cdots+\frac{f^{(k-2)}(0)}{(k-2)!}u^{k-2}+\int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1}f^{(k-1)}(s)\,ds\,dt_1\cdots dt_{k-2}.$
(5)

All the terms on the right-hand side are non-negative. The sum of the first k − 1 terms is a polynomial in u of degree k − 2 with non-negative coefficients.

### 2.1 Bounded k-monotone functions

For $f\in\mathcal{H}_k^1([0,1])$, because f(1) ≤ 1, we have

$\sum_{j=0}^{k-2}\frac{f^{(j)}(0)}{j!}\le f(1)\le1.$

Denote

$\mathcal{P}_k=\left\{a_0+a_1u+\cdots+a_{k-2}u^{k-2}:a_0,a_1,\dots,a_{k-2}\ge0;\ \sum_{i=0}^{k-2}a_i\le1\right\},$

and denote by $\tilde{\mathcal{H}}_k^B(I)$ the class of functions g on $I\subset[0,1]$ that satisfy 0 ≤ g ≤ B and are of the form

$g(u)=\int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1}f(s)\,ds\,dt_1\cdots dt_{k-2},$

where f is a non-negative increasing function on I. By (5), we have $\mathcal{H}_k^1([0,1])\subset\mathcal{P}_k+\tilde{\mathcal{H}}_k^1([0,1])$, and thus

$N_{[\,]}(\varepsilon,\mathcal{H}_k^1([0,1]),h)\le N_{[\,]}(\varepsilon/2,\mathcal{P}_k,h)\cdot N_{[\,]}(\varepsilon/2,\tilde{\mathcal{H}}_k^1([0,1]),h).$
(6)

Note that the set

$\left\{a_0+a_1u+\cdots+a_{k-2}u^{k-2}:a_i\in\left\{\tfrac1N,\tfrac2N,\dots,\tfrac NN\right\},\ 0\le i\le k-2\right\}$

forms a $\sqrt{k/N}$-net for $\mathcal{P}_k$ under the Hellinger distance. Indeed, for any $p=a_0+a_1u+\cdots+a_{k-2}u^{k-2}\in\mathcal{P}_k$, choose

$\tilde p=\frac{\lfloor a_0N\rfloor}{N}+\frac{\lfloor a_1N\rfloor}{N}u+\cdots+\frac{\lfloor a_{k-2}N\rfloor}{N}u^{k-2}.$

Then

$\left|\sqrt{p}-\sqrt{\tilde p}\right|^2\le|p-\tilde p|\le k/N,$

which implies that $h(p,\tilde p)\le\sqrt{k/N}$. Note that there are no more than $N^{k-1}$ elements in this set. By choosing $N=\lceil4k\varepsilon^{-2}\rceil$, we obtain

$\log N_{[\,]}(\varepsilon/2,\mathcal{P}_k,h)\le k\log(1+4k\varepsilon^{-2}).$
(7)

Of course, because $\|p-\tilde p\|_2\le k/N$, by choosing $N=\lceil2k\varepsilon^{-1}\rceil$ instead we also have

$\log N_{[\,]}(\varepsilon/2,\mathcal{P}_k,\|\cdot\|_2)\le k\log(1+2k\varepsilon^{-1}).$
(8)
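The rounding construction above can be checked directly (our own illustration): for $p\in\mathcal{P}_k$ with coefficients rounded down to multiples of 1/N, the change in p is at most $(k-1)/N\le k/N$ uniformly on [0, 1]. The values k = 5, N = 100 and the random coefficient draws are arbitrary choices.

```python
import random

# Verify sup |p - p~| <= k/N for coefficients rounded down to the 1/N grid.
random.seed(1)
k, N = 5, 100

def eval_poly(coefs, u):
    return sum(c * u ** i for i, c in enumerate(coefs))

ok = True
for _ in range(200):
    raw = [random.random() for _ in range(k - 1)]
    s = sum(raw)
    a = [c / s for c in raw]               # a_i >= 0, sum a_i = 1, so p is in P_k
    a_tilde = [int(c * N) / N for c in a]  # coefficients rounded down to the grid
    sup = max(abs(eval_poly(a, u) - eval_poly(a_tilde, u))
              for u in [m / 500 for m in range(501)])
    ok = ok and sup <= k / N
print(ok)
```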

Our next goal is to estimate $N_{[\,]}(\varepsilon/2,\tilde{\mathcal{H}}_k^1([0,1]),h)$. To this end, we first consider

$N_{[\,]}(\varepsilon/2,\tilde{\mathcal{H}}_k^1([0,1]),\|\cdot\|_2).$

For $g\in\tilde{\mathcal{H}}_k^1([0,1])$, we have

$g(1)=\int_0^1\int_0^{t_{k-2}}\cdots\int_0^{t_1}f(s)\,ds\,dt_1\cdots dt_{k-2}\ge\int_{1/2}^1\int_{1/2}^{t_{k-2}}\cdots\int_{1/2}^{t_1}f(s)\,ds\,dt_1\cdots dt_{k-2}\ge f(1/2)\int_{1/2}^1\int_{1/2}^{t_{k-2}}\cdots\int_{1/2}^{t_1}ds\,dt_1\cdots dt_{k-2}=\frac{f(1/2)}{2^{k-1}(k-1)!}.$

Because g(1) is bounded by 1, and f is increasing, for all 0 ≤ u ≤ 1/2 we have

$f(u)\le f(1/2)\le2^{k-1}(k-1)!.$

Define $f_1=\min\{f,2^{k-1}(k-1)!\}$ and $f_2=f-f_1$. Then, by the above argument, $f_2$ is non-negative and increasing, and is supported on [1/2, 1]. For $g\in\tilde{\mathcal{H}}_k^1([0,1])$ and 0 ≤ u ≤ 1, we can write

$g(u)=\int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1}[f_1(s)+f_2(s)]\,ds\,dt_1\cdots dt_{k-2}=\int_0^u\cdots\int_0^{t_1}f_1(s)\,ds\,dt_1\cdots dt_{k-2}+\mathbf{1}_{[1/2,1]}(u)\int_0^{2u-1}\cdots\int_0^{t_1}\frac{1}{2^{k-1}}f_2\left(\frac{1+s}{2}\right)ds\,dt_1\cdots dt_{k-2}.$

We construct two function classes:

$\mathcal{U}_k=\left\{\int_0^u\int_0^{t_{k-2}}\cdots\int_0^{t_1}f(s)\,ds\,dt_1\cdots dt_{k-2}:0\le f\le2^{k-1}(k-1)!,\ f\ \text{increasing}\right\},\qquad\mathcal{V}_k=\left\{l(u)\mathbf{1}_{[1/2,1]}(u):l(u)=h(2u-1),\ h\in\tilde{\mathcal{H}}_k^1([0,1])\right\}.$

Then the decomposition above gives $\tilde{\mathcal{H}}_k^1([0,1])\subset\mathcal{U}_k+\mathcal{V}_k$, and thus we have for any 0 < θ < 1,

$N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),\|\cdot\|_2)\le N_{[\,]}((1-\theta)\varepsilon,\mathcal{U}_k,\|\cdot\|_2)\cdot N_{[\,]}(\theta\varepsilon,\mathcal{V}_k,\|\cdot\|_2).$
(9)

We first claim that

$N_{[\,]}(\varepsilon,\mathcal{U}_k,\|\cdot\|_2)\le\exp(C\varepsilon^{-1/k})$
(10)

for some constant C depending only on k. Indeed, the claim is clearly true for k = 1, because in this case $\mathcal{U}_k$ consists of monotone functions that are bounded by 1; thus, when k = 1, inequality (10) is a special case of (3).

Suppose that (10) is true for k = r, that is, $N_{[\,]}(\varepsilon,\mathcal{U}_r,\|\cdot\|_2)\le\exp(C\varepsilon^{-1/r})$. Because

$N(\varepsilon,\mathcal{U}_r,\|\cdot\|_1)\le N_{[\,]}(\varepsilon,\mathcal{U}_r,\|\cdot\|_2),$

by applying Lemma 1 for the $L_2$ norm under the Lebesgue measure, we have

$N_{[\,]}(\varepsilon^{1+\frac{1}{r}},\mathcal{U}_{r+1},\|\cdot\|_2)\le e^{C\varepsilon^{-1/r}},$
(11)

which implies that $N_{[\,]}(\varepsilon,\mathcal{U}_{r+1},\|\cdot\|_2)\le\exp(C\varepsilon^{-1/(r+1)})$ with a different constant C. Hence (10) holds for all k ≥ 1. Therefore,

$N_{[\,]}((1-\theta)\varepsilon,\mathcal{U}_k,\|\cdot\|_2)\le\exp\left(C[(1-\theta)\varepsilon]^{-1/k}\right).$
(12)

Next, we prove that

$N_{[\,]}(\theta\varepsilon,\mathcal{V}_k,\|\cdot\|_2)\le N_{[\,]}(\sqrt2\,\theta\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),\|\cdot\|_2).$
(13)

Indeed, if $[\underline h_i,\overline h_i]$, 1 ≤ i ≤ N, are $\sqrt2\,\theta\varepsilon$-brackets under the $\|\cdot\|_2$ distance that cover $\tilde{\mathcal{H}}_k^1([0,1])$, then the brackets

$[\underline l_i(x),\overline l_i(x)]\coloneqq[\mathbf{1}_{[1/2,1]}(x)\underline h_i(2x-1),\ \mathbf{1}_{[1/2,1]}(x)\overline h_i(2x-1)],$

1 ≤ i ≤ N, clearly cover $\mathcal{V}_k$. To see that they are θε-brackets, we notice that

$\|\overline l_i-\underline l_i\|_2=\left\{\int_{1/2}^1|\overline h_i(2x-1)-\underline h_i(2x-1)|^2dx\right\}^{1/2}=\frac{1}{\sqrt2}\left\{\int_0^1|\overline h_i(u)-\underline h_i(u)|^2du\right\}^{1/2}\le\theta\varepsilon.$

Applying (12) and (13) to (9), we obtain

$N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),\|\cdot\|_2)\le\exp\left(C[(1-\theta)\varepsilon]^{-1/k}\right)\cdot N_{[\,]}(\sqrt2\,\theta\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),\|\cdot\|_2).$

Choosing $1/\sqrt2<\theta<1$ (so that $\sqrt2\,\theta>1$) and iterating, we obtain

$N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),\|\cdot\|_2)\le\exp(C'\varepsilon^{-1/k})$
(14)

for a different constant C′ depending only on k. Plugging (14) and (8) into the analogue of (6) for the $\|\cdot\|_2$ distance, we obtain

$N_{[\,]}(\varepsilon,\mathcal{M}_k^1([0,1]),\|\cdot\|_2)=N_{[\,]}(\varepsilon,\mathcal{H}_k^1([0,1]),\|\cdot\|_2)\le\exp(C\varepsilon^{-1/k}).$
(15)

If we let

$\tilde{\mathcal{M}}_k^\delta(I)=\{g\in\mathcal{M}_k^1(I):g\ge\delta\},$

then because for all $g_1,g_2\in\tilde{\mathcal{M}}_k^\delta([0,1])$ we have $h(g_1,g_2)\le\frac{1}{2\sqrt\delta}\|g_1-g_2\|_2$, we obtain

$N_{[\,]}(\varepsilon,\tilde{\mathcal{M}}_k^\delta([0,1]),h)\le N_{[\,]}(2\sqrt\delta\,\varepsilon,\mathcal{M}_k^1([0,1]),\|\cdot\|_2)\le\exp\left(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac{1}{k}}\right).$
(16)

We now return to our goal of estimating $N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)$. We define

$\mathcal{A}=\{g\in\tilde{\mathcal{H}}_k^1([0,1]):g(1/2)<\delta\}\quad\text{and}\quad\mathcal{B}=\{g\in\tilde{\mathcal{H}}_k^1([0,1]):g(1/2)\ge\delta\}.$

Then

$\mathcal{A}\subset\delta\tilde{\mathcal{H}}_k^1([0,1/2])+\mathcal{H}_k^1([1/2,1]),\qquad\mathcal{B}\subset\tilde{\mathcal{H}}_k^1([0,1/2])+\tilde{\mathcal{M}}_k^\delta([1/2,1]).$

Therefore

$N_{[\,]}(\varepsilon,\mathcal{A},h)\le N_{[\,]}(\theta\varepsilon,\delta\tilde{\mathcal{H}}_k^1([0,1/2]),h)\cdot N_{[\,]}((1-\theta)\varepsilon,\mathcal{H}_k^1([1/2,1]),h),$
(17)

$N_{[\,]}(\varepsilon,\mathcal{B},h)\le N_{[\,]}((1-\eta)\varepsilon,\tilde{\mathcal{H}}_k^1([0,1/2]),h)\cdot N_{[\,]}(\eta\varepsilon,\tilde{\mathcal{M}}_k^\delta([1/2,1]),h).$
(18)

Note that

$N_{[\,]}(\theta\varepsilon,\delta\tilde{\mathcal{H}}_k^1([0,1/2]),h)\le N_{[\,]}(\theta\varepsilon/\sqrt\delta,\tilde{\mathcal{H}}_k^1([0,1]),h).$

Also note that

$N_{[\,]}((1-\theta)\varepsilon,\mathcal{H}_k^1([1/2,1]),h)=N_{[\,]}((1-\theta)\varepsilon,\mathcal{M}_k^1([0,1/2]),h)=N_{[\,]}(\sqrt2(1-\theta)\varepsilon,\mathcal{M}_k^1([0,1]),h)\le N_{[\,]}(\sqrt2(1-\theta)^2\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\cdot N_{[\,]}(\sqrt2(1-\theta)\theta\varepsilon,\mathcal{P}_k,h)\le N_{[\,]}(\sqrt2(1-\theta)^2\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\cdot\left(1+\frac{k}{2(1-\theta)^2\theta^2\varepsilon^2}\right)^k,$

where the last inequality follows from (7). We choose θ and δ so that $\sqrt2(1-\theta)^2=1.4$ and $\theta/\sqrt\delta=L$, where L is a large number to be fixed later. Thus, by plugging the two bounds above into (17), we obtain

$N_{[\,]}(\varepsilon,\mathcal{A},h)\le N_{[\,]}(1.4\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\cdot N_{[\,]}(L\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\cdot(1+C_1\varepsilon^{-2})^k$
(19)

for some constant C1.

On the other hand, by choosing η so that $\sqrt2(1-\eta)=1.4$, we have

$N_{[\,]}((1-\eta)\varepsilon,\tilde{\mathcal{H}}_k^1([0,1/2]),h)\le N_{[\,]}(1.4\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h).$

Recall that by (16) we have

$N_{[\,]}(\eta\varepsilon,\tilde{\mathcal{M}}_k^\delta([1/2,1]),h)\le\exp(C_2\varepsilon^{-1/k}),$

with a constant C2 depending on L. Plugging into (18), we obtain

$N_{[\,]}(\varepsilon,\mathcal{B},h)\le N_{[\,]}(1.4\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\cdot\exp(C_2\varepsilon^{-1/k}).$
(20)

Since $\tilde{\mathcal{H}}_k^1([0,1])\subset\mathcal{A}\cup\mathcal{B}$, (19) and (20) imply that

$\log N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\le\log N_{[\,]}(1.4\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)+\log N_{[\,]}(L\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)+C_2\varepsilon^{-1/k}+k\log(1+C_1\varepsilon^{-2})+2.$

Let $Z(\varepsilon)=\varepsilon^{1/k}\log N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)$. Then the inequality above implies that

$Z(\varepsilon)\le1.4^{-1/k}Z(1.4\varepsilon)+L^{-1/k}Z(L\varepsilon)+C$

for some constant C, which further implies that

$\sup_{\eta\ge\varepsilon}Z(\eta)\le\left(1.4^{-1/k}+L^{-1/k}\right)\sup_{\eta\ge\varepsilon}Z(\eta)+C.$

By choosing L large enough that $1.4^{-1/k}+L^{-1/k}<1$, we immediately obtain

$\log N_{[\,]}(\varepsilon,\tilde{\mathcal{H}}_k^1([0,1]),h)\le C\varepsilon^{-1/k},$
(21)

for some constant C. Together with (7), we have

$\log N_{[\,]}(\varepsilon,\mathcal{H}_k^1([0,1]),h)\le C\varepsilon^{-1/k}.$

Summarizing, we obtain

Theorem 2. There exists a constant C depending only on k, such that

$\log N_{[\,]}(\varepsilon,\mathcal{M}_k^B([0,A]),h)\le C(AB)^{\frac{1}{2k}}\varepsilon^{-\frac{1}{k}}.$
(22)

### 2.2 Bounded k-monotone densities

Now we consider $N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,1]),h)$. Because $\mathcal{F}_k^B([0,1])\subset\mathcal{M}_k^B([0,1])$, the result of Theorem 2 applies to $\mathcal{F}_k^B([0,1])$. Our goal is to improve the factor $CB^{\frac{1}{2k}}$ by using the extra fact that $\int_0^1g(u)\,du=1$ for $g\in\mathcal{F}_k^B([0,1])$. For convenience, we will actually relax the condition $\int_0^1g(u)\,du=1$ in the definition of $\mathcal{F}_k^B([0,1])$ to the condition $\int_0^1g(u)\,du\le1$.

Assume B > 2. For 1/B < δ < 1, we define

$\mathcal{X}_\delta=\left\{f\in\mathcal{F}_k^B([0,1]):\int_0^{1/2}f(u)\,du<\delta\right\}\quad\text{and}\quad\mathcal{Y}_\delta=\left\{f\in\mathcal{F}_k^B([0,1]):\int_0^{1/2}f(u)\,du\ge\delta\right\}.$

Then $\mathcal{F}_k^B([0,1])=\mathcal{X}_\delta\cup\mathcal{Y}_\delta$. For any $f\in\mathcal{F}_k^B([0,1])$, because f is increasing and $\int_{1/2}^1f(u)\,du\le1$, we have $f(u)\le2$ for all $u\in[0,1/2]$. Hence, we can decompose $\mathcal{X}_\delta$ and $\mathcal{Y}_\delta$ into two classes of functions with disjoint supports:

$\mathcal{X}_\delta\subset\delta\mathcal{F}_k^{2/\delta}([0,1/2])+\mathcal{F}_k^B([1/2,1]),\qquad\mathcal{Y}_\delta\subset\mathcal{F}_k^2([0,1/2])+(1-\delta)\mathcal{F}_k^{B/(1-\delta)}([1/2,1]).$

Therefore, we have

$N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,1]),h)\le N_{[\,]}(2\sqrt\delta\,\varepsilon,\delta\mathcal{F}_k^{2/\delta}([0,1/2]),h)\cdot N_{[\,]}(\sqrt{1-4\delta}\,\varepsilon,\mathcal{F}_k^B([1/2,1]),h)+N_{[\,]}(\sqrt\delta\,\varepsilon,\mathcal{F}_k^2([0,1/2]),h)\cdot N_{[\,]}(\sqrt{1-\delta}\,\varepsilon,(1-\delta)\mathcal{F}_k^{B/(1-\delta)}([1/2,1]),h)=N_{[\,]}(2\varepsilon,\mathcal{F}_k^{2/\delta}([0,1/2]),h)\cdot N_{[\,]}(\sqrt{1-4\delta}\,\varepsilon,\mathcal{F}_k^B([1/2,1]),h)+N_{[\,]}(\sqrt\delta\,\varepsilon,\mathcal{F}_k^2([0,1/2]),h)\cdot N_{[\,]}(\varepsilon,\mathcal{F}_k^{B/(1-\delta)}([1/2,1]),h)\le N_{[\,]}(2\varepsilon,\mathcal{F}_k^B([0,1]),h)\cdot N_{[\,]}(\sqrt{1-4\delta}\,\varepsilon,\mathcal{F}_k^{B/2}([0,1]),h)+\exp\left(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac{1}{k}}\right)\cdot N_{[\,]}(\varepsilon,\mathcal{F}_k^{B/(2-2\delta)}([0,1]),h)\le\left[N_{[\,]}(2\varepsilon,\mathcal{F}_k^B([0,1]),h)+\exp\left(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac{1}{k}}\right)\right]\cdot N_{[\,]}(\sqrt{1-4\delta}\,\varepsilon,\mathcal{F}_k^{B/(2-2\delta)}([0,1]),h).$

By iteration, we obtain

$N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,1]),h)\le\left[N_{[\,]}(2^m\varepsilon,\mathcal{F}_k^B([0,1]),h)+\exp\left(C\delta^{-\frac{1}{2k}}\varepsilon^{-\frac{1}{k}}\right)\right]\cdot N_{[\,]}\left((1-4\delta)^{m/2}\varepsilon,\mathcal{F}_k^{B/(2-2\delta)^m}([0,1]),h\right).$

Letting $\delta=1/\log_2B$ and choosing $m=\log_2B$, we have

$N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,1]),h)\le\exp\left(C(\log B)^{\frac{1}{2k}}\varepsilon^{-\frac{1}{k}}\right).$

We summarize this bound in the following theorem:

Theorem 3. Let $\mathcal{F}_k^B([0,A])$ be the class of k-monotone densities on [0, A] that are bounded by B. Then

$\log N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,A]),h)\le C|\log AB|^{\frac{1}{2k}}\varepsilon^{-\frac{1}{k}},$

where C is a constant depending only on k.

## 3 Under Lp(Q) Distances

In this section we consider the bracketing entropy of $\mathcal{M}_k^B([0,A])$ and $\mathcal{F}_k^B([0,A])$ under the $L_p(Q)$ distance, where 1 ≤ p < ∞ and Q is any probability measure on [0, A]. We will prove the following theorem:

Theorem 4. (i) There exists a constant C depending only on p and k, such that for any probability measure Q on [0, A] that is absolutely continuous with respect to Lebesgue measure with bounded density q,

$\log N_{[\,]}(\varepsilon,\mathcal{M}_k^B([0,A]),\|\cdot\|_{p,Q})\le C(\|q\|_\infty AB^p)^{1/(pk)}\varepsilon^{-1/k},$
(23)

$\log N_{[\,]}(\varepsilon,\mathcal{F}_k^B([0,A]),\|\cdot\|_{p,Q})\le C(\|q\|_\infty AB^p)^{1/(pk)}\varepsilon^{-1/k}.$
(24)

(ii) Let $\mathcal{G}_k^B([0,A])$ consist of k-monotone functions g on [0, A] such that $g'(0)\ge-B$, and let $\tilde{\mathcal{G}}_k^B([0,A])$ consist of the probability densities that belong to $\mathcal{G}_k^B([0,A])$. Then there exists a constant C depending only on p and k, such that for any probability measure Q on [0, A],

$\log N_{[\,]}(\varepsilon,\mathcal{G}_k^B([0,A]),\|\cdot\|_{p,Q})\le CA^{1/(pk)}B^{1/k}\varepsilon^{-1/k},$
(25)

$\log N_{[\,]}(\varepsilon,\tilde{\mathcal{G}}_k^B([0,A]),\|\cdot\|_{p,Q})\le C[\log(AB^p)]^{1/(pk)}\varepsilon^{-1/k}.$
(26)

Remark 5. In view of the result proved in [8], the rate $\varepsilon^{-1/k}$ is sharp when p = 2.

Proof. The result is known for the case k = 1. Thus, we only need to consider the case k > 1.

To prove the first inequality in the statement (i), we note that

$N_{[\,]}(\varepsilon,\mathcal{M}_k^B([0,A]),\|\cdot\|_{p,Q})\le N_{[\,]}\left(\frac{\varepsilon}{\|q\|_\infty^{1/p}A^{1/p}B},\mathcal{M}_k^1([0,1]),\|\cdot\|_p\right),$

where $\|\cdot\|_p$ is the $L_p$ distance under the Lebesgue measure. Thus, it suffices to prove that for any η > 0, $\log N_{[\,]}(\eta,\mathcal{M}_k^1([0,1]),\|\cdot\|_p)\le C\eta^{-1/k}$. However, this follows from the same argument as in the proof of Theorem 2. Indeed, the only change needed is to replace (11) by

$N_{[\,]}(\varepsilon^{1+\frac{1}{k-1}},\mathcal{U}_k,\|\cdot\|_p)\le e^{C\varepsilon^{-1/(k-1)}},$

which leads to $N_{[\,]}(\varepsilon,\mathcal{U}_k,\|\cdot\|_p)\le e^{C\varepsilon^{-1/k}}$. The first inequality in statement (i) then follows by iteration. The proof of the second inequality is similar to that of Theorem 3.

To prove the first inequality in the statement (ii), we note that by the change of variables x = Au, we have

$N_{[\,]}(\varepsilon,\mathcal{G}_k^B([0,A]),\|\cdot\|_{p,Q})=N_{[\,]}(\varepsilon,\mathcal{G}_k^{A^{1/p}B}([0,1]),\|\cdot\|_{p,P}),$

where P is the probability measure on [0, 1] defined by P([0, u]) = Q([0, Au]). Thus, it suffices to consider the case $AB^p=1$. Furthermore, by approximation, we can assume that P is absolutely continuous with respect to the Lebesgue measure on [0, 1]. Let α be the inverse function of $x\mapsto P([0,x])$. By the change of variable u = P([0, x]), it is easy to see that

$N_{[\,]}(\varepsilon,\mathcal{G}_k^1([0,1]),\|\cdot\|_{p,P})=N_{[\,]}(\varepsilon,\mathcal{G}_{k,\alpha}^1([0,1]),\|\cdot\|_p),$

where

$\mathcal{G}_{k,\alpha}^1([0,1])=\{g(\alpha(u)):g\in\mathcal{G}_k^1([0,1])\}=\left\{\int_0^{\alpha(u)}f(t)\,dt:f\in\mathcal{M}_{k-1}^1([0,1])\right\}.$

By (23) we have

$N_{[\,]}(\varepsilon,\mathcal{M}_{k-1}^1([0,1]),\|\cdot\|_1)\le\exp\left(C\varepsilon^{-\frac{1}{k-1}}\right).$

Applying Lemma 1, we obtain

$N_{[\,]}(\varepsilon^{1+\frac{1}{k-1}},\mathcal{G}_{k,\alpha}^1([0,1]),\|\cdot\|_p)\le\exp\left(C\varepsilon^{-\frac{1}{k-1}}\right),$

which leads to inequality (25). The proof of (26) follows the same argument as Theorem 3, and is thus omitted.

## 4 Rates of convergence for the maximum likelihood estimator of a bounded k-monotone density

Let $\hat p_{n,k}$ be the MLE of a k-monotone density $p_0$ on [0, A] based on $X_1,\dots,X_n$ i.i.d. with density $p_0\in\mathcal{F}_k^B([0,A])$ for some 0 < A, B < ∞. Thus $p_0$ is bounded and concentrated on [0, A].

From [2] (see also [9]) we know that $\hat p_{n,k}$ is characterized by

$1\ge\int_0^y\frac{k(y-x)^{k-1}}{y^k\,\hat p_{n,k}(x)}\,dF_n(x)\quad\text{for all }y>0\ \ \ge\frac{1}{\hat p_{n,k}(0)}\int_0^y\frac{k(y-x)^{k-1}}{y^k}\,dF_n(x)=\frac{1}{\hat p_{n,k}(0)}\int_0^y\frac ky\left(1-\frac xy\right)^{k-1}dF_n(x),$
(27)

with equality in (27) at the points $\tau\in\mathrm{supp}(\hat G_{n,k})=\{\tau_1,\tau_2,\dots,\tau_m\}$, where we may assume that $0<\tau_1<\cdots<\tau_m$ (with m random), and where

$\hat p_{n,k}(x)=\int_0^\infty\frac{k(y-x)_+^{k-1}}{y^k}\,d\hat G_{n,k}(y).$

Therefore

$\hat p_{n,k}(0)\ge\int_0^\infty\frac{k}{y^k}(y-x)_+^{k-1}\,dF_n(x)\quad\text{for all }y>0.$
(28)
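As an illustration of the mixture representation above (ours, not the paper's data): for a discrete mixing measure $\hat G=\sum_iw_i\delta_{y_i}$, the density $p(x)=\sum_iw_i\,k(y_i-x)_+^{k-1}/y_i^k$ integrates to 1, since each kernel does. The value k = 3 and the support points and weights below are hypothetical.

```python
# Check numerically that the k-monotone mixture density integrates to 1.
k = 3
ys, ws = [0.5, 1.0, 2.5], [0.2, 0.5, 0.3]  # hypothetical mixing measure

def p(x):
    return sum(w * k * max(y - x, 0.0) ** (k - 1) / y ** k
               for y, w in zip(ys, ws))

# trapezoidal integral of p over [0, max(ys)]
n, b = 100_000, max(ys)
dx = b / n
total = sum(p(i * dx) * (0.5 if i in (0, n) else 1.0) for i in range(n + 1)) * dx
print(abs(total - 1.0) < 1e-6)
```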

To apply our entropy bounds, we need to show that $\hat p_{n,k}$ is bounded with (arbitrarily) high probability when $p_0$ is bounded. This is the content of the following proposition.

Proposition 6. Suppose that $p_0\in\mathcal{F}_k^B([0,A])$. Then the MLE $\hat p_{n,k}$ satisfies

$\hat p_{n,k}(0+)\le k\sup_{x>0}\frac{F_n(x)}{x}=O_p(1).$

Proof. The characterization of the MLE implies that

$1\ge\frac{k}{y^k}\int_0^y\frac{(y-x)^{k-1}}{\hat p_{n,k}(x)}\,dF_n(x)\quad\text{for all }0<y<\infty$
(29)

$=\frac ky\int_0^y\frac{(1-x/y)^{k-1}}{\hat p_{n,k}(x)}\,dF_n(x)$
(30)

with equality at y = τ1:

$1=\frac{k}{\tau_1}\int_0^{\tau_1}\frac{(1-x/\tau_1)^{k-1}}{\hat p_{n,k}(x)}\,dF_n(x).$
(31)

Now note that $\hat G_{n,k}$ is concentrated on $y\ge\tau_1$, so $x/y\le x/\tau_1$, i.e., $(1-x/y)\ge(1-x/\tau_1)$, for $y\ge\tau_1$ and $0\le x\le\tau_1$. Thus it follows that

$\hat p_{n,k}(x)=\int_0^\infty\frac ky\left(1-\frac xy\right)_+^{k-1}d\hat G_{n,k}(y)\ge\left(1-\frac{x}{\tau_1}\right)_+^{k-1}\int_0^\infty\frac ky\,d\hat G_{n,k}(y)=\left(1-\frac{x}{\tau_1}\right)_+^{k-1}\hat p_{n,k}(0).$

Combining the last two displays we find that

$1=\frac{k}{\tau_1}\int_0^{\tau_1}\frac{(1-x/\tau_1)^{k-1}}{\hat p_{n,k}(x)}\,dF_n(x)\le\frac{k}{\tau_1}\int_0^{\tau_1}\frac{(1-x/\tau_1)^{k-1}}{\hat p_{n,k}(0)(1-x/\tau_1)^{k-1}}\,dF_n(x)\le\frac{k}{\tau_1\,\hat p_{n,k}(0)}F_n(\tau_1),$

which yields

$\hat p_{n,k}(0)\le\frac{kF_n(\tau_1)}{\tau_1}\le k\sup_{t>0}\frac{F_n(t)}{t}=k\sup_{t>0}\left(\frac{F_n(t)}{F_0(t)}\cdot\frac{F_0(t)}{t}\right)\le k\sup_{t>0}\frac{F_n(t)}{F_0(t)}\cdot p_0(0+)=O_p(1)$

by Daniels’ inequality; see [10], Theorem 9.1.2, page 345.
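The bound of Proposition 6 is easy to evaluate on data (our own illustration): since $F_n$ is a step function, $\sup_{x>0}F_n(x)/x=\max_i\,(i/n)/X_{(i)}$ over the order statistics $X_{(1)}\le\cdots\le X_{(n)}$. The Exp(1) sample and k = 2 below are arbitrary assumptions.

```python
import bisect, random

# Compute k * sup_{x>0} F_n(x)/x from a sample via the order-statistic formula,
# and confirm it dominates F_n(x)/x on a fine grid.
random.seed(2)
n, k = 1000, 2
xs = sorted(random.expovariate(1.0) for _ in range(n))

sup_ratio = max((i + 1) / n / x for i, x in enumerate(xs))
bound = k * sup_ratio  # the Op(1) bound on p_hat_{n,k}(0+)

def F_n(x):
    return bisect.bisect_right(xs, x) / n

grid_sup = max(F_n(x) / x for x in (j / 2000 * xs[-1] for j in range(1, 2001)))
print(grid_sup <= sup_ratio + 1e-12, bound > 0)
```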

Now suppose that $\mathcal{P}$ is a collection of densities with respect to a sigma-finite measure μ. The following theorem is a simplified version of Theorem 3.4.4 of [5], page 327. Our rate theorem for the MLE $\hat p_{n,k}$ over $\mathcal{P}_k([0,A])$, the class of k-monotone densities on [0, A], will be proved by combining the upper bound of this theorem with (an easy modification of) the rate results given in [5], Theorem 3.2.5, page 289.

Theorem 7. Suppose that $X_1,\dots,X_n$ are i.i.d. $P_0$ with density $p_0\in\mathcal{P}$. Let h be the Hellinger distance between densities, and let $m_p$ be defined, for $p\in\mathcal{P}$, by

$m_p(x)=\log\left(\frac{p(x)+p_0(x)}{2p_0(x)}\right).$

Then

$M(p)-M(p_0)\equiv P_0(m_p-m_{p_0})\lesssim-h^2(p,p_0).$

Furthermore, with $\mathcal{M}_\delta=\{m_p-m_{p_0}:h(p,p_0)\le\delta,\ p\in\mathcal{P}_0\}$, we also have

$E^*_{P_0}\|\mathbb{G}_n\|_{\mathcal{M}_\delta}\lesssim\tilde J_{[\,]}(\delta,\mathcal{P}_0,h)\left(1+\frac{\tilde J_{[\,]}(\delta,\mathcal{P}_0,h)}{\delta^2\sqrt n}\right)\equiv\phi_n(\delta,\mathcal{P}_0),$
(32)

where

$\tilde J_{[\,]}(\delta,\mathcal{P}_0,h)=\int_0^\delta\sqrt{1+\log N_{[\,]}(\varepsilon,\mathcal{P}_0,h)}\,d\varepsilon.$

Here is our main result of this section:

Theorem 8. Suppose that $p_0\in\mathcal{F}_k^B([0,A])\subset\mathcal{P}_k([0,A])$ for some 0 < A, B < ∞. Then the maximum likelihood estimator $\hat p_{n,k}$ of $p_0$ in $\mathcal{P}_k$ satisfies

$h(\hat p_{n,k},p_0)=O_p\left(n^{-\frac{k}{2k+1}}\right).$

Remark 9. This generalizes the rate result of [3] (with resulting rate of convergence $n^{-1/3}$) to k > 1. For the case k = 1, closely related results, with the Hellinger metric replaced by the $L_1$ metric, were obtained by [11–13]. The rate established in Theorem 8 is apparently consistent with the local rate result of $n^{-k/(2k+1)}$ established (up to an envelope conjecture) by [2].

Proof. For simplicity, we write $\hat p_n\equiv\hat p_{n,k}$. Let M > 0 and K > 0. Then

$P\left(r_nh(\hat p_n,p_0)\ge2^M\right)\le P\left(r_nh(\hat p_n,p_0)\ge2^M,\ \hat p_n(0)\le K\right)+P\left(\hat p_n(0)>K\right)\equiv I_n+II_n,$

where, by Proposition 6,

$II_n\le P\left(k\|F_n/F_0\|_\infty\,p_0(0+)>K\right)\le\frac{k\,p_0(0+)}{K}$

by Daniels' inequality, and hence $II_n$ can be made arbitrarily small (uniformly in n) by choosing K large. Now we essentially follow the proof of Theorem 3.2.5 of [5] (with θ there identified with p), but exploit the fact that $\hat p_n(0+)\le K$. Thus, letting $M_n(p)\equiv\mathbb{P}_nm_p$, we have for any large η > 0,

$P\left(r_nh(\hat p_n,p_0)>2^M,\ \hat p_n(0+)\le K\right)\le\sum_{j=M}^{\lfloor\log_2(\eta r_n)\rfloor}P\left(\sup_{p\in S_{j,n}}\left(M_n(p)-M_n(p_0)\right)\ge0\right)+P\left(2h(\hat p_n,p_0)\ge\eta\right)\equiv I_{A,n}+I_{B,n},$

where the shells Sj,n are now defined with the additional restriction that p(0+) ≤ K:

$S_{j,n}=\left\{p\in\mathcal{P}_k([0,A]):2^{j-1}<r_nh(p,p_0)\le2^j,\ p(0+)\le K\right\}.$

Here the term $I_{B,n}$ can be made arbitrarily small for all large n by the consistency of $\hat p_n$ established in [9]. Thus the same argument as in [9] yields, with $\mathcal{P}_0=\mathcal{F}_k^K$ in (32), and $\phi_n(\delta,\mathcal{F}_k^K)\equiv\phi_n(\delta)$,

$I_{A,n}\lesssim\sum_{j\ge M}\frac{\phi_n(2^j/r_n)\,r_n^2}{\sqrt n\,2^{2j}}.$
(33)

By (32)

$\phi_n(\delta)\equiv\phi_n(\delta,\mathcal{F}_k^K)=\tilde J_{[\,]}(\delta,\mathcal{F}_k^K,h)\left(1+\frac{\tilde J_{[\,]}(\delta,\mathcal{F}_k^K,h)}{\delta^2\sqrt n}\right).$

A direct calculation using Theorem 3 gives

$\tilde J_{[\,]}(\delta,\mathcal{F}_k^K,h)\le CD^{1-\frac{1}{2k}}\delta^{1-\frac{1}{2k}}$

for a constant C depending only on k, where $D\equiv|\log AK|^{\frac{1}{2k}}$. This implies

$\phi_n(\delta)=CD^{1-\frac{1}{2k}}\delta^{1-\frac{1}{2k}}\left(1+\frac{CD^{1-\frac{1}{2k}}\delta^{1-\frac{1}{2k}}}{\delta^2\sqrt n}\right).$

By taking

$r_n\equiv\left(\frac{c_0}{CD^{1-\frac{1}{2k}}}\right)^{\frac{2k}{2k+1}}n^{\frac{k}{2k+1}},$

where $c_0\equiv(\sqrt5-1)/2$ (so that $c_0+c_0^2=1$), we have $r_n^2\phi_n(1/r_n)=\sqrt n$. Note that the function $\delta\mapsto\phi_n(\delta)/\delta$ is decreasing; therefore, for any j > 0,

$\phi_n(2^j/r_n)\,r_n^2\le2^j\phi_n(1/r_n)\,r_n^2.$

Hence, (33) can be estimated by

$I_{A,n}\lesssim\sum_{j\ge M}\frac{\phi_n(2^j/r_n)\,r_n^2}{\sqrt n\,2^{2j}}\le\sum_{j\ge M}2^{-j}=2^{-(M-1)},$

which can be made arbitrarily small by choosing M large. This completes the proof.
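The algebra behind the choice of $r_n$ can be checked numerically (our own sketch): $c_0=(\sqrt5-1)/2$ satisfies $c_0+c_0^2=1$, and if $\tilde J(\delta)=E\delta^{1-1/(2k)}$ then $r_n=(c_0\sqrt n/E)^{2k/(2k+1)}$ gives $r_n^2\phi_n(1/r_n)=\sqrt n$ exactly, where $\phi_n(\delta)=\tilde J(\delta)(1+\tilde J(\delta)/(\delta^2\sqrt n))$. The values of E, k, n below are arbitrary test inputs.

```python
import math

# Verify c0 + c0^2 = 1 and the balance equation r_n^2 * phi_n(1/r_n) = sqrt(n).
c0 = (math.sqrt(5.0) - 1.0) / 2.0
E, k, n = 2.3, 3, 10_000   # arbitrary test values

r = (c0 * math.sqrt(n) / E) ** (2.0 * k / (2.0 * k + 1.0))
d = 1.0 / r
J = E * d ** (1.0 - 1.0 / (2.0 * k))
phi = J * (1.0 + J / (d ** 2 * math.sqrt(n)))

print(abs(c0 + c0 ** 2 - 1.0) < 1e-12)
print(abs(r ** 2 * phi - math.sqrt(n)) / math.sqrt(n) < 1e-9)
```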

## Acknowledgments

This work was supported by the National Science Foundation of USA (Grant Nos. DMS-0405855 and DMS-0804587).

## Footnotes

MSC(2000): 62G05, 46B50

## References

1. Williamson RE. Multiply monotone functions and their Laplace transforms. Duke Math J. 1956;23:189–207.
2. Balabdaoui F, Wellner JA. Estimation of a k-monotone density: limit distribution theory and the spline connection. Ann Statist. 2007;35(6):2536–2564.
3. van de Geer S. Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann Statist. 1993;21(1):14–44.
4. Birman MŠ, Solomjak MZ. Piecewise polynomial approximations of functions of the classes $W_p^\alpha$. Mat Sb. 1967;73(115):331–355.
5. van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer-Verlag; New York: 1996. (Springer Series in Statistics).
6. Blei R, Gao F, Li W. Metric entropy of high dimensional distributions. Proc Amer Math Soc. 2007;135(12):4009–4018.
7. Gao F, Wellner JA. Entropy estimate for high-dimensional monotonic functions. J Multivariate Anal. 2007;98(9):1751–1764.
8. Gao F. Entropy estimate for k-monotone functions via small ball probability of integrated Brownian motion. Electron Commun Probab. 2008;13:121–130.
9. Balabdaoui F, Wellner JA. Technical Report. University of Washington, Department of Statistics; 2004. Estimation of a k-monotone density, part 1: characterizations, consistency, and minimax lower bounds. [PubMed]
10. Shorack GR, Wellner JA. Empirical processes with applications to statistics. John Wiley & Sons Inc; New York: 1986. (Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics).
11. Groeneboom P. Some current developments in density estimation. Mathematics and Computer Science (Amsterdam, 1983) 1986;1:163–192.
12. Birgé L. On estimating a density using Hellinger distance and some other strange facts. Probab Theory Related Fields. 1986;71(2):271–291.
13. Birgé L. The Grenander estimator: a nonasymptotic approach. Ann Statist. 1989;17(4):1532–1549.
