PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Am Math Mon. Author manuscript; available in PMC 2010 July 1.
Published in final edited form as:
Am Math Mon. 2010; 117(2): 138–160.
doi:  10.4169/000298910X476059
PMCID: PMC2834376
NIHMSID: NIHMS177692

Nemirovski's Inequalities Revisited

1. Introduction

Our starting point is the following well-known theorem from probability: Let X1, …, Xn be independent random variables with finite second moments, and let Sn=i=1nXi. Then

Var(Sn)=i=1nVar(Xi).
(1)

If we suppose that each Xi has mean zero, An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0, then (1) becomes

ESn2=i=1nEXi2.
(2)

This equality generalizes easily to vectors in a Hilbert space H with inner product [left angle bracket]·, ·[right angle bracket]: If the Xi's are independent with values in H such that An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0 and An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi2 < ∞, then Sn2=Sn,Sn=i,j=1nXi,Xj, and since An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpg[left angle bracket]Xi, Xj[right angle bracket] = 0 for ij by independence,

ESn2=i,j=1nEXi,Xj=i=1nEXi2.
(3)

What happens if the Xi's take values in a (real) Banach space (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, ‖ · ‖)? In such cases, in particular when the square of the norm ‖ · ‖ is not given by an inner product, we are aiming at inequalities of the following type: Let X1, X2, …, Xn be independent random vectors with values in (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, ‖ · ‖) with An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0 and An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi2 < ∞. With Sni=1nXi we want to show that

ESn2Ki=1nEXi2
(4)

for some constant K depending only on (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, ‖ · ‖).

For statistical applications, the case (B,)=rd(d,r) for some r [set membership] [1, ∞] is of particular interest. Here the r-norm of a vector x [set membership] Rd is defined as

xr{(j=1d|xj|r)1/rif1r,max1jd|xj|ifr=.
(5)

An obvious question is how the exponent r and the dimension d enter an inequality of type (4). The influence of the dimension d is crucial, since current statistical research often involves small or moderate “sample size” n (the number of independent units), say on the order of 102 or 104, while the number d of items measured for each independent unit is large, say on the order of 106 or 107. The following two examples for the random vectors Xi provide lower bounds for the constant K in (4):

Example 1.1 (A lower bound in rd)

Let b1, b2, …, bd denote the standard basis of Rd, and let ε1, ε2, …, εd be independent Rademacher variables, i.e., random variables taking the values +1 and −1 each with probability 1/2. Define Xi:= εibi for 1 ≤ in := d. Then An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0, Xir2=1, and Snr2=d2/r=d2/r1i=1nXir2. Thus any candidate for K in (4) has to satisfy Kd2/r−1.

Example 1.2 (A lower bound in d)

Let X1, X2, X3,… be independent random vectors, each uniformly distributed on {−1, 1}d. Then An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0 and ‖Xi = 1. On the other hand, according to the Central Limit Theorem, n−1/2Sn converges in distribution as n → ∞ to a random vector Z=(Zj)j=1d with independent, standard Gaussian components, Zj ~ N(0, 1). Hence

supn1ESn2i=1nEXi2=supn1En1/2Sn2EZ2=Emax1jdZj2.

But it is well known that max1jd|Zj|2logdp0 as d → ∞. Thus candidates K (d) for the constant in (4) have to satisfy

liminfdK(d)2logd1.

At least three different methods have been developed to prove inequalities of the form given by (4). The three approaches known to us are:

  1. deterministic inequalities for norms;
  2. probabilistic methods for Banach spaces;
  3. empirical process methods.

Approach (a) was used by Nemirovski [14] to show that in the space rd with d ≥ 2, inequality (4) holds with K = C min(r, log(d)) for some universal (but unspecified) constant C. In view of Example 1.2, this constant has the correct order of magnitude if r = ∞. For statistical applications see Greenshtein and Ritov [7]. Approach (b) uses special moment inequalities from probability theory on Banach spaces which involve nonrandom vectors in An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg and Rademacher variables as introduced in Example 1.1. Empirical process theory (approach (c)) in general deals with sums of independent random elements in infinite-dimensional Banach spaces. By means of chaining arguments, metric entropies, and approximation arguments, “maximal inequalities” for such random sums are built from basic inequalities for sums of independent random variables or finite-dimensional random vectors, in particular, “exponential inequalities”; see, e.g., Dudley [4], van der Vaart and Wellner [26], Pollard [21], de la Pena and Gine [3], or van de Geer [25].

Our main goal in this paper is to compare the inequalities resulting from these different approaches and to refine or improve the constants K obtainable by each method. The remainder of this paper is organized as follows: In Section 2 we review several deterministic inequalities for norms and, in particular, key arguments of Nemirovski [14]. Our exposition includes explicit and improved constants. While finishing the present paper we became aware of yet unpublished work of Nemirovski [15] and Juditsky and Nemirovski [12] who also improved some inequalities of [14]. Rio [22] uses similar methods in a different context. In Section 3 we present inequalities of type (4) which follow from type and cotype inequalities developed in probability theory on Banach spaces. In addition, we provide and utilize a new type inequality for the normed space d. To do so we utilize, among other tools, exponential inequalities of Hoeffding [9] and Pinelis [17]. In Section 4 we follow approach (c) and treat d by means of a truncation argument and Bernstein's exponential inequality. Finally, in Section 5 we compare the inequalities resulting from these three approaches. In that section we relax the assumption that An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0 for a more thorough understanding of the differences between the three approaches. Most proofs are deferred to Section 6.

2. Nemirovski's Approach: Deterministic Inequalities for Norms

In this section we review and refine inequalities of type (4) based on deterministic inequalities for norms. The considerations for (B,)=rd follow closely the arguments of [14].

2.1. Some Inequalities for Rd and the Norms ‖ · ‖r

Throughout this subsection let An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg = Rd, equipped with one of the norms ‖ · ‖r defined in (5). For x [set membership] Rd we think of x as a column vector and write x[top top] for the corresponding row vector. Thus xx[top top] is a d × d matrix with entries xixj for i, j [set membership] {1, …, d}.

A first solution

Recall that for any x [set membership] Rd,

xrxqd1/q1/rxrfor1q<r.
(6)

Moreover, as mentioned before,

ESn22=i=1nEXi22.

Thus for 1 ≤ q < 2,

ESnq2(d1/q1/2)2ESn22=d2/q1i=1nEXi22d2/q1i=1nEXiq2,

whereas for 2 < r ≤ ∞,

ESnr2ESn22=i=1nEXi22d12/ri=1nEXir2.

Thus we may conclude that (4) holds with

K=K(d,r){d2/r1if1r2,d12/rif2r.

Example 1.1 shows that this constant K(d, r) is indeed optimal for 1 ≤ r ≤ 2.

A refinement for r > 2

In what follows we shall replace K(d, r) = d1−2/r with substantially smaller constants. The main ingredient is the following result:

Lemma 2.1

For arbitrary fixed r [set membership] [2, ∞) and x [set membership] Rd \ {0} let

h(x)2xr2r(|xi|r2xi)i=1d

while h (0) := 0. Then for arbitrary x, y [set membership] Rd,

xr2+h(x)Tyx+yr2xr2+h(x)Ty+(r1)yr2.

[16] and [14] stated Lemma 2.1 with the factor r − 1 on the right side replaced with Cr for some (absolute) constant C > 1. Lemma 2.1, which is a special case of the more general Lemma 2.4 in the next subsection, may be applied to the partial sums S0 := 0 and Ski=1kXi, 1 ≤ kn, to show that for 2 ≤ r < ∞,

ESkr2E(Sk1r2+h(Sk1)TXk+(r1)Xkr2)=ESk1r2+Eh(Sk1)TEXk+(r1)EXkr2=ESk1r2+(r1)EXkr2,

and inductively we obtain a second candidate for K in (4):

ESnr2(r1)i=1nEXir2for2r.

Finally, we apply (6) again: For 2 ≤ qr ≤ ∞ with q < ∞,

ESnr2ESnq2(q1)i=1nEXiq2(q1)d2/q2/ri=1nEXir2.

This inequality entails our first (q = 2) and second (q = r < ∞) preliminary result, and we arrive at the following refinement:

Theorem 2.2

For arbitrary r [set membership] [2, ∞],

ESnr2KNem(d,r)i=1nEXir2

with

KNem(d,r)infq[2,r](q1)d2/q2/r.

This constant KNem(d, r) satisfies the (in)equalities

KNem(d,r){=d12/rifd7r12elogdeifd3,

and

KNem(d,)2elogd3e.

Corollary 2.3

In the case (B,·)=d with d ≥ 3, inequality (4) holds with constant K = 2e log d − e. If the Xi 's are also identically distributed, then

En1/2Sn2(2elogde)EX12.

Note that

limdKNem(d,)2logd=limd2elogde2logd=e.

Thus Example 1.2 entails that for large dimension d, the constants KNem(d, ∞) and 2e log de are optimal up to a factor close to e [equals, single dot above] 2.7183.

2.2. Arbitrary Lr-spaces

Lemma 2.1 is a special case of a more general inequality: Let (T, Σ, μ) be a σ-finite measure space, and for 1 ≥ r < ∞ let Lr(μ) be the set of all measurable functions f : TR with finite norm

fr(|f|rdμ)1/r,

where two such functions are viewed as equivalent if they coincide almost everywhere with respect to μ. In what follows we investigate the functional

fV(f)fr2

on Lr(μ). Note that (Rd, ‖ · ‖r) corresponds to (Lr(μ), ‖ · ‖r) if we take T = {1, 2, …, d} equipped with counting measure μ.

Note that V(·) is convex; thus for fixed f, g [set membership] Lr(μ), the function

v(t)V(f+tg)=f+tgr2,t

is convex with derivative

v(t)=v1r/2(t)2|f+tg|r2(f+tg)gdμ.

By convexity of v it follows that

V(f+g)V(f)=v(1)v(0)v(0)DV(f,g).

This proves the lower bound in the following lemma. We will prove the upper bound in Section 6 by computation of v″ and application of Hölder's inequality.

Lemma 2.4

Let r ≥ 2. Then for arbitrary f, g [set membership] Lr(μ),

DV(f,g)=h(f)gdμwithh(f)2fr2r|f|r2fLq(μ),

where q := r/(r − 1). Moreover,

V(f)+DV(f,g)V(f+g)V(f)+DV(f,g)+(r1)V(g).

Remark 2.5

The upper bound for V(f + g) is sharp in the following sense: Suppose that μ(T) < ∞, and let f, go : TR be measurable such that | f | [equivalent] | go | [equivalent] 1 and ∫ f go = 0. Then our proof of Lemma 2.4 reveals that

V(f+tgo)V(f)DV(f,tgo)V(tgo)r1ast0.

Remark 2.6

If r = 2, Lemma 2.4 is well known and easily verified. Here the upper bound for V(f + g) is even an equality, i.e.,

V(f+g)=V(f)+DV(f,g)+V(g).

Remark 2.7

Lemma 2.4 improves on an inequality of [16]. After writing this paper we realized Lemma 2.4 is also proved by Pinelis [18]; see his (2.2) and Proposition 2.1, page 1680.

Lemma 2.4 leads directly to the following result:

Corollary 2.8

In the case An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg = Lr (μ) for r ≥ 2, inequality (4) is satisfied with K = r − 1.

2.3. A Connection to Geometrical Functional Analysis

For any Banach space (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, ‖ · ‖) and Hilbert space (H, [left angle bracket]·, ·[right angle bracket], ‖ · ‖), their Banach-Mazur distance D(An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, H) is defined to be the infimum of

T·T1

over all linear isomorphisms T : An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpgH, where ‖T‖ and ‖ T −1‖ denote the usual operator norms

Tsup{Tx:xBx1},T1sup{T1y:yy1}.

(If no such bijection exists, one defines D(An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, H) := ∞.) Given such a bijection T,

ESn2T12ETSn2=T12i=1nETXi2T12T2i=1nEXi2.

This leads to the following observation:

Corollary 2.9

For any Banach space (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, ‖ · ‖) and any Hilbert space (H, [left angle bracket], ·, ·,[right angle bracket], ‖ · ‖) with finite Banach-Mazur distance D(An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, H), inequality (4) is satisfied with K = D(An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg,H)2.

A famous result from geometrical functional analysis is John's theorem (see [24], [11]) for finite-dimensional normed spaces. It entails that D(B,2dimB)dimB. This entails the following fact:

Corollary 2.10

For any normed space (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg, ‖ · ‖) with finite dimension, inequality (4) is satisfied with K = dim(An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg).

Note that Example 1.1 with r = 1 provides an example where the constant K = dim(An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg) is optimal.

3. The probabilistic approach: type and cotype inequalities

3.1. Rademacher Type and Cotype Inequalities

Let {εi} denote a sequence of independent Rademacher random variables. Let 1 ≤ p < ∞. A Banach space An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg with norm ‖ · ‖ is said to be of (Rademacher) type p if there is a constant Tp such that for all finite sequences {xi} in An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg,

Ei=1nεixipTppi=1nxip.

Similarly, for 1 ≤ q < ∞, An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg is of (Rademacher) cotype q if there is a constant Cq such that for all finite sequences {xi} in An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg,

Ei=1nεixiqCqqi=1nxiq.

Ledoux and Talagrand [13, p. 247], note that type and cotype properties appear as dual notions: if a Banach space An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg is of type p, its dual space An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg′ is of cotype q = p/(p − 1).

One of the basic results concerning Banach spaces with type p and cotype q is the following proposition:

Proposition 3.1

[13, Proposition 9.11, p. 248]. If An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg is of type p ≥ 1 with constant Tp, then

ESnp(2Tp)pi=1nEXip.

If An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg is of cotype q ≥ 1 with constant Cq, then

ESnq(2Cq)qi=1nEXiq.

As shown in [13, p. 27], the Banach space Lr(μ) with 1 ≤ r < ∞ (cf. Section 2.2) is of type min(r, 2). Similarly, Lr (μ) is cotype max(r, 2). If r ≥ 2 = p, explicit values for the constant Tp in Proposition 3.1 can be obtained from the optimal constants in Khintchine's inequalities due to Haagerup [8].

Lemma 3.2

For 2 ≤ r < ∞, the space Lr (μ) is of type 2 with constant T2 = Br, where

Br21/2(Γ((r+1)/2)π)1/r.

Corollary 3.3

For (An external file that holds a picture, illustration, etc.
Object name is nihms177692ig2.jpg) = Lr (μ), 2 ≤ r < ∞, inequality (4) is satisfied with K=4Br2.

Note that B2 = 1 and

Brr1easr.

Thus for large values of r, the conclusion of Corollary 3.3 is weaker than that of Corollary 2.8.

3.2. The Space d

The preceding results apply only to r < ∞, so the special space d requires different arguments. First we deduce a new type inequality based on Hoeffding's [9] exponential inequality: if ε1, ε2, …, εn are independent Rademacher random variables, a1, a2, …,an are real numbers, and υ2i=1nai2, then the tail probabilities of the random variable |i=1naiεi| may be bounded as follows:

(|i=1naiεi|z)2exp(z22v2),z0.
(7)

At the heart of these tail bounds is the following exponential moment bound:

Eexp(ti=1naiεi)exp(t2v2/2),t.
(8)

From the latter bound we shall deduce the following type inequality in Section 6:

Lemma 3.4

The space d is of type 2 with constant 2log(2d).

Using this upper bound together with Proposition 3.1 yields another Nemirovski-type inequality:

Corollary 3.5

For (B,)=d, inequality (4) is satisfied with K = KType2(d, ∞) = 8 log(2d).

Refinements

Let T2(d) be the optimal type-2 constant for the space d. So far we know that T2(d)2log(2d). Moreover, by a modification of Example 1.2 one can show that

T2(d)cdEmax1jdZj2.
(9)

The constants cd can be expressed or bounded in terms of the distribution function Φ of N(0, 1), i.e., Φ(z)=zϕ(x)dx with ϕ(x)=exp(x2/2)/2π. Namely, with W := max1≤jd|Zj|,

cd2=E(W2)=E02t1[tW]dt=02t(Wt)dt,

and for any t > 0,

(Wt){=1(W<t)=1(|Z1|<t)d=1(2Φ(t)1)d,d(|Z1|t)=2d(1Φ(t)).

These considerations and various bounds for Φ will allow us to derive explicit bounds for cd.

On the other hand, Hoeffding's inequality (7) has been refined by Pinelis [17, 20] as follows:

(|i=1naiεi|z)2K(1Φ(z/v)),z>0,
(10)

where K satisfies 3.18 ≤ K ≤ 3.22. This will be the main ingredient for refined upper bounds for T2(d). The next lemma summarizes our findings:

Lemma 3.6

The constants cd and T2(d) satisfy the following inequalities:

2logd+h1(d)cd{T2(d)2logd+h2(d),d12logd,d32logd+h3(d),d1
(11)

where h2(d) ≤ 3, h2(d) becomes negative for d > 4.13795 × 1010, h3(d) becomes negative for d ≥ 14, and hj(d) ~ −log log d as d → ∞ for j = 1, 2, 3.

In particular, one could replace KType2(d, ∞) in Corollary 3.5 with 8 log d + 4h2(d).

4. The Empirical Process Approach: Truncation and Bernstein's Inequality

An alternative to Hoeffding's exponential tail inequality (7) is a classical exponential bound due to Bernstein (see, e.g., [2]): Let Y1, Y2, …,Yn be independent random variables with mean zero such that |Yi| ≤ κ. Then for υ2=i=1nVar(Yi),

(|i=1nYi|x)2exp(x22(v2+κx/3)),x>0.
(12)

We will not use this inequality itself but rather an exponential moment inequality underlying its proof:

Lemma 4.1

For L > 0 define

e(L)exp(1/L)11/L.

Let Y be a random variable with mean zero and variance σ2 such that |Y| < κ. Then for any L > 0,

Eexp(YκL)1+σ2e(L)κ2exp(σ2e(L)κ2).

With the latter exponential moment bound we can prove a moment inequality for random vectors in Rd with bounded components:

Lemma 4.2

Suppose that Xi=(Xi,j)j=1d satisfies ‖Xiκ, and let Γ be an upper bound for max1jdi=1nVar(Xi,j). Then for any L > 0,

ESn2κLlog(2d)+ΓLe(L)κ

Now we return to our general random vectors Xi [set membership] Rd with mean zero and EXi2<. They are split into two random vectors via truncation: Xi=Xi(a)+Xi(b) with

Xi(a)1[Xiκo]XiandXi(b)1[Xi>κo]Xi

for some constant κo > 0 to be specified later. Then we write Sn = An + Bn with the centered random sums

Ani=1n(Xi(a)EXi(a))andBni=1n(Xi(b)EXi(b))

The sum An involves centered random vectors in [−2κo, 2κo]d and will be treated by means of Lemma 4.2, while Bn will be bounded with elementary methods. Choosing the threshold κ and the parameter L carefully yields the following theorem.

Theorem 4.3

In the case (B,)=d for some d ≥ 1, inequality (4) holds with

K=KTrBern(d,)(1+3.46log(2d))2

If each of the random vectors Xi is symmetrically distributed around 0, one may even set

K=KTrBern(symm)(d,)=(1+2.9log(2d))2

5. Comparisons

In this section we compare the three approaches just described for the space d. As to the random vectors Xi, we broaden our point of view and consider three different cases:

  • General case: The random vectors Xi are independent with EXi2< for all i.
  • Centered case: In addition, An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0 for all i.
  • Symmetric case: In addition, Xi is symmetrically distributed around 0 for all i.

In view of the general case, we reformulate inequality (4) as follows:

ESnESn2Ki=1nEXi2
(13)

One reason for this extension is that in some applications, particularly in connection with empirical processes, it is easier and more natural to work with uncentered summands Xi. Let us discuss briefly the consequences of this extension in the three frameworks:

Nemirovski's approach

Between the centered and symmetric cases there is no difference. If (4) holds in the centered case for some K, then in the general case

ESnESn2Ki=1nEXiEXi24Ki=1nEXi2

The latter inequality follows from the general fact that

EYEY2E((Y+EY)2)2EY2+2EY24EY2

This looks rather crude at first glance, but in the case of the maximum norm and high dimension d, the factor 4 cannot be reduced. For let Y [set membership] Rd have independent components Y1, …,Yd [set membership] {−1, 1} with P(Yj = 1) = 1− P (Yj = −1) = p [set membership] [1/2,1). Then ‖Y [equivalent] 1, while EY=(2p1)j=1d and

YEY={2(1p)ifY1==Yd=1,2potherwise.

Hence

EYEY2EY2=4((1p)2pd+p2(1pd))

If we set p = 1 − d−1/2 for d ≥ 4, then this ratio converges to 4 as d → ∞.

The approach via Rademacher type-2 inequalities

The first part of Proposition 3.1, involving the Rademacher type constant Tp, remains valid if we drop the assumption that An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0 and replace Sn with SnAn external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgSn. Thus there is no difference between the general and centered cases. In the symmetric case, however, the factor 2p in Proposition 3.1 becomes superfluous. Thus, if (4) holds with a certain constant K in the general and centered cases, we may replace K with K/4 in the symmetric case.

The approach via truncation and Bernstein's inequality

Our proof for the centered case does not utilize that An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgXi = 0, so again there is no difference between the centered and general cases. However, in the symmetric case, the truncated random vectors 1{‖Xiκ}Xi and 1{‖Xi > κ}Xi are centered, too, which leads to the substantially smaller constant K in Theorem 4.3.

Summaries and comparisons

Table 1 summarizes the constants K = K(d, ∞) we have found so far by the three different methods and for the three different cases. Table 2 contains the corresponding limits

Table 1
The different constants K (d, ∞).
Table 2
The different limits K*.
KlimdK(d,)logd

Interestingly, there is no global winner among the three methods. But for the centered case, Nemirovski's approach yields asymptotically the smallest constants. In particular,

limdKTrBern(d,)KNem(d,)=3.4622e2.20205,limdKType2(d,)KNem(d,)=4e1.47152,limdKTrBern(d,)KType2(d,)=3.46281.49645.

The conclusion at this point seems to be that Nemirovski's approach and the type 2 inequalities yield better constants than Bernstein's inequality and truncation. Figure 1 shows the constants K(d, ∞) for the centered case over a certain range of dimensions d.

Figure 1
Comparison of K (d, ∞) obtained via the three proof methods: Medium dashing (bottom) = Nemirovski; Small and tiny dashing (middle) = type 2 inequalities; Large dashing (top) = truncation and Bernstein inequality.

6. Proofs

6.1. Proofs for Section 2

Proof of (6)

In the case r = ∞, the asserted inequalities read

xxqd1/qxfor1q

and are rather obvious. For 1 ≤ q < r < ∞, (6) is an easy consequence of Hölder's inequality.

Proof of Lemma 2.4

In the case r = 2, V(f + g) is equal to V(f) + DV(f, g) + V(g). If r ≥ 2 and ‖ fr = 0, both DV(f, g) and ∫ h(f)g d μ are equal to zero, and the asserted inequalities reduce to the trivial statement that V(g) ≤ (r – 1) V(g). Thus let us restrict our attention to the case r > 2 and ‖ fr > 0.

Note first that the mapping

tht|f+tg|r

is pointwise twice continuously differentiable with derivatives

h˙t=r|f+tg|r1sign(f+tg)g=r|f+tg|r2(f+tg)g,h¨t=r(r1)|f+tg|r2g2.

By means of the inequality |x + y|b ≤ 2b–1 (|x|b + |y|b) for real numbers x, y and b ≥ 1, a consequence of Jensen's inequality, we can conclude that for any bound to > 0,

max|t|t0|h˙t|r2r2(|f|r1|g|+t0r1|g|r),max|t|t0|h¨t|r(r1)2r3(|f|r2|g|2+t0r2|g|r).

The latter two envelope functions belong to L1(μ). This follows from Hölder's inequality which we rephrase for our purposes in the form

|f|(1λ)r|g|λrdμfr(1λ)rgrλrfor0λ1.
(14)

Hence we may conclude via dominated convergence that

tv(t)f+tgrr

is twice continuously differentiable with derivatives

v(t)=r|f+tg|r2(f+tg)gdμ,v(t)=r(r1)|f+tg|r2g2dμ.

This entails that

tv(t)V(f+tg)=v(t)2/r

is continuously differentiable with derivative

v(t)=(2/r)v(t)2/r1v(t)=v2/r1(t)h(f+tg)gdμ.

For t = 0 this entails the asserted expression for DV(f, g). Moreover, υ(t) is twice continuously differentiable on the set {t [set membership] R: ‖ f + tgr > 0} which equals either R or R \ {to} for some to ≠ 0. On this set the second derivative equals

v(t)=(2/r)v(t)2/r1v(t)+(2/r)(2/r1)v(t)2/r2v(t)2=2(r1)|f+tg|r2f+tgrr2g2dμ2(r2)(|f+tg|r2(f+tg)f+tgrr1gdμ)22(r1)|f+tgf+tgr|r2|g|2dμ2(r1)gr2=2(r1)V(g)

by virtue of Hölder's inequality (14) with λ = 2/r. Consequently, by using

v(t)v(0)=0tv(s)ds2(r1)V(g)t,

we find that

V(f+g)V(f)DV(f,g)=v(1)v(0)v(0)=01(v(t)v(0))dt2(r1)V(g)01t=(r1)V(g).

Proof of Theorem 2.2

The first part is an immediate consequence of the considerations preceding the theorem. It remains to prove the (in)equalities and expansion for KNem(d, r). Note that KNem(d, r) is the infimum of h(q)d−2/r over all real q [set membership] [2, r], where h(q) := (q – 1)d2/q satisfies the equation

h(q)=d2/qq2((qlogd)2(logd2)logd).

Since 7 < e2 < 8, this shows that h is strictly increasing on [2, ∞) if d ≤ 7. Hence

KNem(d,r)=h(2)d2/r=d12/rifd7.

For d ≥ 8, one can easily show that log d(logd2)logd<2, so that h is strictly decreasing on [2,rd] and strictly increasing on [rd, ∞), where

rdlogd+(logd2)logd{<2logd,>2logd2.

Thus for d ≥ 8,

KNem(d,r)={h(r)d2/r=r1<2logd1ifrrd,h(rd)d2/rh(2logd)=2elogdeifrrd.

Moreover, one can verify numerically that KNem(d, r) ≤ d ≤ 2e log de for 3 ≤ d ≤ 7.

Finally, for d ≥ 8, the inequalities rd'2logd2<rd<rd2logd Yield

kNem(d,)=h(rd)(rd'1)d2/rd=2elogd3e,

and for 1 ≤ d ≤ 7, the inequality d = KNem(d, ∞) ≥ 2e log(d) – 3e is easily verified.

6.2. Proofs for Section 3

Proof of Lemma 3.2

The following proof is standard; see, e.g., [1, p. 160], [13, p. 247]. Let x1,…, xn be fixed functions in Lr (μ). Then by [8], for any t [set membership] T,

{E|i=1nεixi(t)|r}1/rBr(i=1n|xi(t)|2)1/2.
(15)

To use inequality (15) for finding an upper bound for the type constant for Lr, rewrite it as

E|i=1nεixi(t)|rBrr(i=1n|xi(t)|2)r/2.

It follows from Fubini's theorem and the previous inequality that

Ei=1nεixirr=E|i=1nεixi(t)|rdμ(t)=E|i=1nεixi(t)|rdμ(t)Brr(i=1n|xi(t)|2)r/2dμ(t).

Using the triangle inequality (or Minkowski's inequality), we obtain

{Ei=1nεixirr}2/rBr2{(i=1n|xi(t)|2)r/2dμ(t)}2/rBr2i=1n(|xi(t)|rdμ(t))2/r=Br2i=1nxir2

Furthermore, since g(v) = v2/r is a concave function of v ≥ 0, the last display implies that

Ei=1nεixir2{Ei=1nεixirr}2/rBr2i=1nxir2.

Proof of Lemma 3.4

For 1 ≤ in let xi=(xim)m=1d be an arbitrary fixed vector in Rd, and set Si=1nεixi. Further let Sm be the mth component of S with variance vm2i=1nxim2, and define v2max1mdvm2, which is not greater than i=1nxi2. It suffices to show that

ES22log(2d)v2.

To this end note first that h : [0, ∞) → [1, ∞) with

h(t)cosh(t1/2)=k=0tk(2k)!

is bijective, increasing, and convex. Hence its inverse function h−1 : [1, ∞) → [0, ∞) is increasing and concave, and one easily verifies that

h1(s)=(log(s+(s21)1/2))2(log(2s))2.

Thus it follows from Jensen's inequality that for arbitrary t > 0,

ES2=t2Eh1(cosh(tS))t2h1(Ecosh(tS))t2(log(2Ecosh(tS)))2.

Moreover,

Ecosh(tS)=Emax1mdcosh(tSm)m=1dEcosh(tSm)dexp(t2v2/2),

according to (8), whence

ES2t2log(2dexp(t2v2/2))2=(log(2d)/t+tv2/2)2.

Now the assertion follows if we set t=2log(2d)/v2.

Proof of (9)

We may replace the random sequence {Xi} in Example 1.2 with the random sequence {εiXi}, where {εi} is a Rademacher sequence independent of {Xi}. Thereafter we condition on {Xi}, i.e., we view it as a deterministic sequence such that n1i=1nXiXiT converges to the identity matrix Id as n → ∞, by the strong law of large numbers. Now Lindeberg's version of the multivariate Central Limit Theorem shows that

supn1Ei=1nεiXi2i=1nXi2supn1En1/2i=1nεiXi2cd2.

Inequalities for Φ

The subsequent results will rely on (10) and several inequalities for 1 − Φ(z). The first of these is:

1Φ(z)z1φ(z),z>0,
(16)

which is known as Mills' ratio; see [6] and [19] for related results. The proof of this upper bound is easy: since ϕ′ (z) = −(z) it follows that

1Φ(z)=zφ(t)dtztzφ(t)dt=1zzφ(t)dt=φ(z)z.
(17)

A very useful pair of upper and lower bounds for 1 − Φ(z) is as follows:

2z+z2+4φ(z)1Φ(z)43z+z2+8φ(z),z>1;
(18)

the inequality on the left is due to Komatsu (see, e.g., [10, p. 17]), while the inequality on the right is an improvement of an earlier result of Komatsu due to Szarek and Werner [23].

Proof of Lemma 3.6

To prove the upper bound for T2(d), let (εi)i≥1 be a Rademacher sequence. With S and Sm as in the proof of Lemma 3.4, for any δ > 0 we may write

ES2=02t(sup1md|Sm|>t)dtδ2+δ2t(sup1md|Sm|>t)dtδ2+m=1dδ2t(|Sm|>t)dt.

Now by (10) with v2 and vm2 as in the proof of Lemma 3.4, followed by Mills' ratio (16),

δ2t(|Sm|>t)dtδ4Kvm2πttet2/(2vm2)dt=4Kvm2πδet2/(2vm2)dt=4Kvm2δet2/(2vm2)2πvmdt=4Kvm2(1Φ(δ/vm))4Kv2(1Φ(δ/v)).
(19)

Now instead of the Mills' ratio bound (16) for the tail of the normal distribution, we use the upper bound part of (18). This yields

δ2t(|Sm|>t)dt4Kv2(1Φ(δ/v))4cv23δ/v+δ2/v2+8eδ2/(2v2),

where we have defined c4K/2π=12.88/2π, and hence

ES2δ2+4cdv23δ/v+δ2/v2+8eδ2/(2v2).

Taking

δ2=v22log(cd/22log(cd/2))

gives

ES2v2{2logd+2log(c/2)log(2log(dc/2))+82log(cd/2)32log(cd22log(cd/2))+2log(cd22log(cd/2))+8}v2{2logd+h2(d)}

where it is easily checked that h2(d) ≤ 3 for all d ≥ 1. Moreover h2(d) is negative for d > 4.13795 × 1010. This completes the proof of the upper bound in (11).

To prove the lower bound for cd in (11), we use the lower bound of [13, Lemma 6.9, p. 157] (which is, in this form, due to Giné and Zinn [5]). This yields

cd2λ1+λto2+11+λdto4t(1Φ(t))dt
(20)

for any to > 0, where λ = 2d(1 − Φ(to)). By using Komatsu's lower bound (18), we find that

tot(1Φ(t))dtto2tt+t2+4φ(t)dt2toto+to2+4toφ(t)dt=21+1+4/to2(1Φ(to)).

Using this lower bound in (20) yields

cd2λ1+λto2+11+λd81+1+4/to2(1Φ(to))=2d(1Φ(to))1+2d(1Φ(to)){to2+41+1+4/to2}4dto+to2+4φ(to)1+4dto+to2+4φ(to){to2+41+1+4/to2}.
(21)

Now we let c2/π and δ > 0 and choose

to2=2log(cd(2log(cd))(1+δ)/2).

For this choice we see that to → ∞ as d → ∞,

4dφ(to)=2d2π(2log(cd))(1+δ)/2cd=2(2log(cd))(1+δ)/2,

and

4dφ(to)to=2(2log(cd))(1+δ)/2{2log(cd/(2log(cd))(1+δ)/2)}1/2

as d → ∞, so the first term on the right-hand side of (21) converges to 1 as d → ∞, and it can be rewritten as

Ad{to2+41+1+4/to2}=Ad{2log(cd(2log(cd))(1+δ)/2)+41+1+4/to2}1{2logd+2logc(1+δ)log(2log(cd))+2}.

To prove the upper bounds for cd, we will use the upper bound of [13, Lemma 6.9, p. 157] (which is, in this form, due to Giné and Zinn [5]). For every to > 0

cd2Emax1jd|Zj|2to2+dto2tP(|Z1|>t)dt=to2+4dtot(1Φ(t))dtto2+4dtoφ(t)dt(by Mills'ratio)=to2+4d(1Φ(to)).

Evaluating this bound at to=2log(d/2π) and then using Mills' ratio again yields

cd22log(d/2π)+4d(1Φ(2log(d/2π)))2logd212log(2π)+4dφ(2log(d/2π))2log(d/2π)=2logdlog(2π)+22log(d/2π)2logd,
(22)

where the last inequality holds if

22log(d/2π)log(2π),

or equivalently if

logd8(log(2π))2+log(2π)2=3.28735,

and hence if d ≥ 27 > e3.28735… [equals, single dot above] 26.77. The claimed inequality is easily verified numerically for d = 3, …, 26. (It fails for d = 2.) As can be seen from (22), 2 log d − log(2π) gives a reasonable approximation to Emax1jdZj2 for large d. Using the upper bound in (18) instead of the second application of Mills' ratio and choosing to2=2log(cd/2log(cd)) with c2/π yields the third bound for cd in (11) with

h3(d)=log(π)log(log(cd))+831log(2log(cd))2log(cd)+1log(2log(cd))2log(cd)+4log(cd).

6.3. Proofs for Section 4

Proof of Lemma 4.1

It follows from An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgY = 0, the Taylor expansion of the exponential function, and the inequality An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpg|Y|mσ2κm−2 for m ≥ 2 that

Eexp(YκL)=1+E{exp(YκL)1YκL}1+m=21m!E|Y|m(κL)m1+σ2κ2m=21m!1Lm=1+σ2e(L)κ2.

Proof of Lemma 4.2

Applying Lemma 4.1 to the jth components Xi,j of Xi and Sn, j of Sn yields for all L > 0,

Eexp(±Sn,jκL)=i=1nEexp(±Xi,jκL)i=1nexp(Var(Xi,j)e(L)κ2)exp(Γe(L)κ2).

Hence

Ecosh(||Sn||κL)=Emax1jdcosh(Sn,jκL)j=1dEcosh(Sn,jκL)dexp(Γe(L)κ2).

As in the proof of Lemma 3.4 we conclude that

ESn2(κL)2(log(2Ecosh(SnκL)))2(κL)2(log(2d)+Γe(L)κ2)2=(κLlog(2d)+ΓLe(L)κ)2,

which is equivalent to the inequality stated in the lemma.

Proof of Theorem 4.3

For fixed κo > 0 we split Sn into An + Bn as described before. Let us bound the sum Bn first: For this term we have

Bni=1n{1[Xi>κo]Xi+E(1[Xi>κo]Xi)}=i=1n{1[Xi>κo]XiE(1[Xi>κo]Xi)}+2i=1nE(1[Xi>κo]Xi)Bn1+Bn2.

Therefore, since An external file that holds a picture, illustration, etc.
Object name is nihms177692ig1.jpgBn1 = 0,

EBn2E(Bn1+Bn2)2=EBn12+Bn22=i=1nVar(1[Xi>κo]Xi)+4(i=1nE(Xi1[Xi>κo]))2i=1nEXi2+4(i=1nEXi2κo)2=Γ+4Γ2κo2,

where we define Γi=1nEXi2.

The first sum, An, may be bounded by means of Lemma 4.2 with κ = 2κo, utilizing the bound

Var(Xi,j(a))=Var(1[Xiκo]Xi,j)E(1[Xiκo]Xi,j2)EXi2.

Thus

EAn2(2κoLlog(2d)+ΓLe(L)2κo)2.

Combining the bounds we find that

ESn2EAn2+EBn22κoLlog(2d)+ΓLe(L)2κo+Γ+2Γκo=ακo+βκo+Γ,

where α := 2L log(2d) and β := Γ (L e(L) + 4)/2. This bound is minimized if κo=β/α with minimum value

2αβ+Γ=(1+2L2e(L)+4Llog(2d))Γ,

and for L = 0.407 the latter bound is not greater than

(1+3.46log(2d))Γ.

In the special case of symmetrically distributed random vectors Xi, our treatment of the sum Bn does not change, but in the bound for EAn2 one may replace 2κo with κo, because EXi(a)=0. Thus

ESn2κoLlog(2d)+ΓLe(L)κo+Γ+2Γκo=ακo+βκo+Γ(withαLlog(2d),βΓ(Le(L)+2))=(1+2L2e(L)+2Llog(2d))Γ(ifκo=β/α).

For L = 0.5 the latter bound is not greater than

(1+2.9log(2d))Γ.

Acknowledgments

The authors owe thanks to the referees for a number of suggestions which resulted in a considerable improvement in the article. The authors are also grateful to Ilya Molchanov for drawing their attention to Banach-Mazur distances, and to Stanislaw Kwapien and Vladimir Koltchinskii for pointers concerning type and cotype proofs and constants. This research was initiated during the opening week of the program on “Statistical Theory and Methods for Complex, High-Dimensional Data” held at the Isaac Newton Institute for Mathematical Sciences from 7 January to 27 June, 2008, and was made possible in part by the support of the Isaac Newton Institute for visits of various periods by Dümbgen, van de Geer, and Wellner. The research of Wellner was also supported in part by NSF grants DMS-0503822 and DMS-0804587. The research of Dümbgen and van de Geer was supported in part by the Swiss National Science Foundation.

Biographies

• 

LUTZ DÜMBGEN received his Ph.D. from Heidelberg University in 1990. From 1990-1992 he was a Miller research fellow at the University of California at Berkeley. Thereafter he worked at the Universities of Bielefeld, Heidelberg, and Lübeck. Since 2002 he has been professor of statistics at the University of Bern. His research interests are nonparametric, multivariate, and computational statistics.

• 

SARA A. VAN DE GEER obtained her Ph.D. at Leiden University in 1987. She worked at the Center for Mathematics and Computer Science in Amsterdam, at the Universities of Bristol, Utrecht, Leiden, and Toulouse, and at the Eidgenössische Technische Hochschule in Zürich (2005-present). Her research areas are empirical processes, statistical learning, and statistical theory for high-dimensional data.

• 

MARK C. VERAAR received his Ph.D. from Delft University of Technology in 2006. In the year 2007 he was a postdoctoral researcher in the European RTN project “Phenomena in High Dimensions” at the IMPAN institute in Warsaw (Poland). In 2008 he spent one year as an Alexander von Humboldt fellow at the University of Karlsruhe (Germany). Since 2009 he has been Assistant Professor at Delft University of Technology (the Netherlands). His main research areas are probability theory, partial differential equations, and functional analysis.

• 

JON A. WELLNER received his B.S. from the University of Idaho in 1968 and his Ph.D. from the University of Washington in 1975. He got hooked on research in probability and statistics during graduate school at the UW in the early 1970s, and has enjoyed both teaching and research at the University of Rochester (1975–1983) and the University of Washington (1983-present). If not for probability theory and statistics, he might be a ski bum.

Contributor Information

Lutz Dümbgen, Institute of Mathematical Statistics and Actuarial Science, University of Bern, Alpeneggstrasse 22, CH-3012 Bern, Switzerland.

Sara A. van de Geer, Seminar for Statistics, ETH Zurich, CH-8092 Zurich, Switzerland.

Mark C. Veraar, Delft Institute of Applied Mathematics, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands.

Jon A. Wellner, Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195-4322.

References

1. Araujo A, Giné E. Wiley Series in Probability and Mathematical Statistics. John Wiley; New York: 1980. The Central Limit Theorem for Real and Banach Valued Random Variables.
2. Bennett G. Probability inequalities for the sum of independent random variables. J Amer Statist Assoc. 1962;57:33–45. doi: 10.2307/2282438. [Cross Ref]
3. de la Peña VH, Giné E. Probability and its Applications. Springer-Verlag; New York: 1999. Decoupling: From Dependence to Independence.
4. Dudley RM. Cambridge Studies in Advanced Mathematics. Vol. 63. Cambridge University Press; Cambridge: 1999. Uniform Central Limit Theorems.
5. Giné E, Zinn J. Central limit theorems and weak laws of large numbers in certain Banach spaces. Z Wahrsch Verw Gebiete. 1983;62:323–354. doi: 10.1007/BF00535258. [Cross Ref]
6. Gordon RD. Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument. Ann Math Statistics. 1941;12:364–366. doi: 10.1214/aoms/1177731721. [Cross Ref]
7. Greenshtein E, Ritov Y. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli. 2004;10:971–988. doi: 10.3150/bj/1106314846. [Cross Ref]
8. Haagerup U. The best constants in the Khintchine inequality. Studia Math. 1981;70:231–283.
9. Hoeffding W. Probability inequalities for sums of bounded random variables. J Amer Statist Assoc. 1963;58:13–30. doi: 10.2307/2282952. [Cross Ref]
10. Itô K, McKean HP., Jr . Classics in Mathematics. Springer-Verlag; Berlin: 1974. Diffusion Processes and their Sample Paths.
11. Johnson WB, Lindenstrauss J. Handbook of the Geometry of Banach Spaces. I. North-Holland; Amsterdam: 2001. Basic concepts in the geometry of Banach spaces; pp. 1–84.
12. Juditsky A, Nemirovski AS. Tech report. Georgia Institute of Technology; Atlanta, GA: 2008. Large deviations of vector-valued martingales in 2-smooth normed spaces.
13. Ledoux M, Talagrand M. Ergebnisse der Mathematik und ihrer Grenzgebiete 3. Folge / A Series of Modern Surveys in Mathematics. Vol. 23. Springer-Verlag; Berlin: 1991. Probability in Banach Spaces: Isoperimetry and Processes.
14. Nemirovski AS. Lectures on Probability Theory and Statistics (Saint-Flour, 1998), Lecture Notes in Mathematics. Vol. 1738. Springer; Berlin: 2000. Topics in non-parametric statistics; pp. 85–277.
15. Nemirovski AS. Regular Banach spaces and large deviations of random sums. 2004. working paper.
16. Nemirovski AS, Yudin DB. Problem Complexity and Method Efficiency in Optimization. John Wiley; Chichester, UK: 1983.
17. Pinelis I. Extremal probabilistic problems and Hotelling's T2 test under a symmetry condition. Ann Statist. 1994;22:357–368. doi: 10.1214/aos/1176325373. [Cross Ref]
18. Pinelis I. Optimum bounds for the distributions of martingales in Banach spaces. Ann Probab. 1994;22:1679–1706. doi: 10.1214/aop/1176988477. [Cross Ref]
19. Pinelis I. Monotonicity properties of the relative error of a Padé approximation for Mills' ratio. J Inequal Pure Appl Math. 2002;3/2
20. Pinelis I. Toward the best constant factor for the Rademacher-Gaussian tail comparison. ESAIM Probab Stat. 2007;11:412–426. doi: 10.1051/ps:2007027. [Cross Ref]
21. Pollard D. NSF-CBMS Regional Conference Series in Probability and Statistics. Vol. 2. Institute of Mathematical Statistics; Hayward, CA: 1990. Empirical Processes: Theory and Applications.
22. Rio E. Moment inequalities for sums of dependent random variables under projective conditions. J Theoret Probab. 2009;22:146–163. doi: 10.1007/s10959-008-0155-9. [Cross Ref]
23. Szarek SJ, Werner E. A nonsymmetric correlation inequality for Gaussian measure. J Multivariate Anal. 1999;68:193–211. doi: 10.1006/jmva.1998.1784. [Cross Ref]
24. Tomczak-Jaegermann N. Pitman Monographs and Surveys in Pure and Applied Mathematics. Vol. 38. Longman Scientific & Technical; Harlow, UK: 1989. Banach-Mazur Distances and Finite-Dimensional Operator Ideals.
25. van de Geer SA. Cambridge Series in Statistical and Probabilistic Mathematics. Vol. 6. Cambridge University Press; Cambridge: 2000. Applications of Empirical Process Theory.
26. van der Vaart AW, Wellner JA. Springer Series in Statistics. Springer-Verlag; New York: 1996. Weak Convergence and Empirical Processes: With Applications to Statistics.