 
Bernoulli (Andover). Author manuscript; available in PMC 2017 March 21.
Published in final edited form as:
Bernoulli (Andover). 2017 February; 23(1): 23–57.
Published online 2016 September 27. doi:  10.3150/15-BEJ702
PMCID: PMC5360110
NIHMSID: NIHMS752266

Statistical analysis of latent generalized correlation matrix estimation in transelliptical distribution

Abstract

The correlation matrix plays a key role in many multivariate methods (e.g., graphical model estimation and factor analysis). The current state of the art in estimating large correlation matrices focuses on the use of Pearson’s sample correlation matrix. Although Pearson’s sample correlation matrix enjoys various good properties under Gaussian models, it is not an effective estimator when facing heavy-tailed distributions with possible outliers. As a robust alternative, Han and Liu (2013b) advocated the use of a transformed version of the Kendall’s tau sample correlation matrix for estimating the high dimensional latent generalized correlation matrix under the transelliptical distribution family (or elliptical copula). The transelliptical family assumes that, after unspecified marginal monotone transformations, the data follow an elliptical distribution. In this paper, we study the theoretical properties of the Kendall’s tau sample correlation matrix and its transformed version proposed in Han and Liu (2013b) for estimating the population Kendall’s tau correlation matrix and the latent Pearson’s correlation matrix under both the spectral and restricted spectral norms. With regard to the spectral norm, we highlight the role of the “effective rank” in quantifying the rate of convergence. With regard to the restricted spectral norm, we for the first time present a “sign subgaussian condition” that is sufficient to guarantee that the rank-based correlation matrix estimator attains the optimal rate of convergence. In neither case do we need any moment condition.

Keywords: Kendall’s tau correlation matrix, Optimal rates of convergence, Double asymptotics, Minimax lower bound, Transelliptical model, Elliptical copula

1 Introduction

Covariance and correlation matrices play a central role in multivariate analysis. Efficient estimation of the covariance/correlation matrix is a major step in many methods, including principal component analysis, scale-invariant principal component analysis, graphical model estimation, discriminant analysis, and factor analysis. Large covariance/correlation matrix estimation has received a great deal of attention in high dimensional statistics, partially because the sample covariance/correlation matrix can be an inconsistent estimator when d/n ↛ 0 (d and n represent the dimensionality and sample size).

Given n observations x1, …, xn of a d dimensional random vector X ∈ Rd with population covariance matrix Ω, let Ŝ be the Pearson’s sample covariance matrix calculated based on x1, …, xn. Under a double asymptotic framework where both the dimension d and the sample size n can increase to infinity, Johnstone (2001), Baik and Silverstein (2006), and Jung and Marron (2009) pointed out that, even when X follows a Gaussian distribution with identity covariance matrix, Ŝ can be an inconsistent estimator of Ω under the spectral norm. In other words, letting ||·||2 denote the spectral norm of a matrix, there exist cases such that when (n, d) → ∞,

\|\hat{S} - \Omega\|_2 \not\to 0.

This observation motivates different versions of sparse covariance/correlation matrix estimation methods; see, for example, the banding method (Bickel and Levina, 2008a), the tapering method (Cai et al., 2010; Cai and Zhou, 2012), and the thresholding method (Bickel and Levina, 2008b). However, although the regularization methods exploited are different, they all use the Pearson’s sample covariance/correlation matrix as a pilot estimator, and accordingly the performance of the estimators relies heavily on the existence of higher order moments of the data. For example, letting ||·||max and ||·||2,s denote the elementwise supremum norm and the restricted spectral norm (detailed definitions are provided later), in proving

\|\hat{S} - \Omega\|_{\max} = O_P\Big(\sqrt{\tfrac{\log d}{n}}\Big) \quad \text{or} \quad \|\hat{S} - \Omega\|_{2,s} = O_P\Big(\sqrt{\tfrac{s\log d}{n}}\Big),
(1.1)

it is commonly assumed that X = (X1, …, Xd)T satisfies

(\text{marginal subgaussian}) \quad \mathbb{E}\exp(tX_j) \le \exp\Big(\tfrac{\sigma^2 t^2}{2}\Big) \ \text{ for all } j \in \{1, \dots, d\}, \quad \text{or} \quad (\text{multivariate subgaussian}) \quad \mathbb{E}\exp(t v^T X) \le \exp\Big(\tfrac{\sigma^2 t^2}{2}\Big) \ \text{ for all } v \in \mathbb{S}^{d-1},
(1.2)

for some σ² > 0. Here Sd−1 is the d-dimensional unit sphere in Rd.

The moment conditions in (1.2) are not satisfied by many distributions. To illustrate how strong this condition can be, consider the Student’s t distribution. Assuming that T follows a Student’s t distribution with ν degrees of freedom, it is known (Hogg and Craig, 2012) that

\mathbb{E}\, T^{2k} = \infty \quad \text{for } k \ge \nu/2.
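To see this moment explosion numerically, the following Monte Carlo sketch (our illustration, not part of the original analysis; the sample size and seed are arbitrary) compares the empirical fourth moment of a Student's t sample with ν = 3, for which E T⁴ = ∞, against a standard Gaussian sample, for which E Z⁴ = 3:

```python
import numpy as np

rng = np.random.default_rng(0)
n, nu = 200_000, 3  # E T^{2k} is infinite for k >= nu/2, so E T^4 = infinity

t_sample = rng.standard_t(nu, size=n)
g_sample = rng.standard_normal(n)

# Empirical fourth moments: the Gaussian one stabilizes near 3, while the
# Student's t one is dominated by a few extreme observations and keeps
# growing with n instead of converging.
t_m4 = np.mean(t_sample ** 4)
g_m4 = np.mean(g_sample ** 4)
print(t_m4, g_m4)
```

Rerunning with larger n makes the contrast starker: the Gaussian fourth moment concentrates, whereas the Student's t fourth moment diverges.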

Recently, Han and Liu (2013b) advocated the transelliptical distribution for modeling and analyzing complex and noisy data. They exploited a transformed version of the Kendall’s tau sample correlation matrix, Σ̂, to estimate the latent Pearson’s correlation matrix Σ. The transelliptical family assumes that, after a set of unknown marginal transformations, the data follow an elliptical distribution. This family is closely related to the elliptical copula and contains many well known distributions, including the multivariate Gaussian, rank-deficient Gaussian, multivariate-t, Cauchy, Kotz, and logistic distributions. Under the transelliptical distribution, without any moment constraint, they showed that the transformed Kendall’s tau sample correlation matrix Σ̂ approximates the latent Pearson’s correlation matrix Σ at a parametric rate:

\|\hat{\Sigma} - \Sigma\|_{\max} = O_P\Big(\sqrt{\tfrac{\log d}{n}}\Big),
(1.3)

which attains the minimax rate of convergence.

Although (1.3) is inspiring, in the theoretical analysis of many multivariate methods, rates of convergence under the spectral and restricted spectral norms are more desirable. For example, Bickel and Levina (2008b) and Yuan and Zhang (2013) showed that the performances of principal component analysis and of a computationally tractable sparse PCA method are determined by the rates of convergence of the plug-in matrix estimators under the spectral and restricted spectral norms. A trivial extension of (1.3) gives

\|\hat{\Sigma} - \Sigma\|_2 = O_P\Big(d\sqrt{\tfrac{\log d}{n}}\Big) \quad \text{and} \quad \|\hat{\Sigma} - \Sigma\|_{2,s} = O_P\Big(s\sqrt{\tfrac{\log d}{n}}\Big),

neither of which is tight compared to the parametric rates (for more details, see Lounici (2013) and Bunea and Xiao (2012) for results under the spectral norm, and Vu and Lei (2012) for results under the restricted spectral norm).

In this paper we push the results in Han and Liu (2013b) forward, providing optimal rates for the transformed Kendall’s tau correlation matrix under both the spectral and restricted spectral norms. We consider the statistical properties of the Kendall’s tau sample correlation matrix T̂ in estimating the Kendall’s tau correlation matrix T, and of the transformed version Σ̂ in estimating Σ.

First, we consider estimating the Kendall’s tau correlation matrix T itself. Estimating Kendall’s tau is of independent interest compared to the Pearson’s correlation coefficient. For example, Embrechts et al. (2003) claimed that Pearson’s correlation coefficient “might prove very misleading” in many cases in modeling dependence, and advocated the Kendall’s tau correlation coefficient as the “perhaps best alternatives to the linear correlation coefficient as a measure of dependence for non-elliptical distributions”. In estimating T, we show that, without any condition, for any continuous random vector X,

\|\hat{T} - T\|_2 = O_P\Big(\|T\|_2\sqrt{\tfrac{r_e(T)\log d}{n}}\Big),

where re(T) := Tr(T)/||T||2 is called the effective rank. Moreover, we introduce a new condition, termed the “sign subgaussian condition”, under which we have

\|\hat{T} - T\|_{2,s} = O_P\Big(\|T\|_2\sqrt{\tfrac{s\log d}{n}}\Big).

Secondly, under the transelliptical family, we consider estimating the Pearson’s correlation matrix Σ of the latent elliptical distribution using the transformed Kendall’s tau sample correlation matrix Σ̂ = [sin((π/2)τ̂jk)]. Without any moment condition, we show that, as long as X belongs to the transelliptical family,

\|\hat{\Sigma} - \Sigma\|_2 = O_P\Big(\|\Sigma\|_2\Big\{\sqrt{\tfrac{r_e(\Sigma)\log d}{n}} + \tfrac{r_e(\Sigma)\log d}{n}\Big\}\Big),

which attains the nearly optimal rate of convergence obtained in Lounici (2013) and Bunea and Xiao (2012). Moreover, provided that the sign subgaussian condition is satisfied, we have

\|\hat{\Sigma} - \Sigma\|_{2,s} = O_P\Big(\|\Sigma\|_2\sqrt{\tfrac{s\log d}{n}} + \tfrac{s\log d}{n}\Big),

which attains the nearly optimal rate of convergence obtained in Vu and Lei (2012).

In independent work, Wegkamp and Zhao (2013) proposed using the same transformed Kendall’s tau correlation coefficient estimator to analyze the elliptical copula factor model and proved a spectral norm convergence result similar to Theorem 3.1 of this paper. The proofs are different, and the two papers were developed independently.

The rest of this paper is organized as follows. In the next section we briefly overview the transelliptical distribution family and the main concentration results for the transformed Kendall’s tau sample correlation matrix proposed by Han and Liu (2013b). In Section 3 we analyze the convergence rates of Kendall’s tau sample correlation matrix and its transformed version with regard to the spectral norm. In Section 4 we analyze the convergence rates of Kendall’s tau sample correlation matrix and its transformed version with regard to the restricted spectral norm. The technical proofs of these results are provided in Section 5. More discussions and conclusions are provided in Section 6.

2 Preliminaries and Background Overview

In this section, we briefly review the transelliptical distribution and the corresponding latent generalized correlation matrix estimator proposed by Han and Liu (2013b). We start with an introduction of notation. Let M = [Mij] ∈ Rd×d and v = (v1, …, vd)T ∈ Rd. We denote by vI the subvector of v whose entries are indexed by a set I. We also denote by MI,J the submatrix of M whose rows are indexed by I and whose columns are indexed by J. Let MI* and M*J be the submatrix of M with rows indexed by I, and the submatrix of M with columns indexed by J. Let supp(v) := {j : vj ≠ 0}. For 0 < q < ∞, we define the ℓ0, ℓq, and ℓ∞ vector (pseudo-)norms as

\|v\|_0 := \mathrm{card}(\mathrm{supp}(v)), \quad \|v\|_q := \Big(\sum_{i=1}^{d} |v_i|^q\Big)^{1/q}, \quad \text{and} \quad \|v\|_\infty := \max_{1 \le i \le d} |v_i|.

Let λj(M) be the j-th largest eigenvalue of M and Θj(M) be the corresponding eigenvector. In particular, we let λmax(M) := λ1(M). We define Sd−1 := {v ∈ Rd : ||v||2 = 1} to be the d-dimensional unit sphere. We define the matrix elementwise absolute supremum norm (ℓmax norm), spectral norm (ℓ2 norm), and restricted spectral norm (ℓ2,s norm) as

\|M\|_{\max} := \max_{i,j}|M_{ij}|, \quad \|M\|_2 := \sup_{v \in \mathbb{S}^{d-1}} \|Mv\|_2, \quad \text{and} \quad \|M\|_{2,s} := \sup_{v \in \mathbb{S}^{d-1}, \|v\|_0 \le s} \|Mv\|_2.

We define diag(M) to be the diagonal matrix with [diag(M)]jj = Mjj for j = 1, …, d. We also denote vec(M) := (M*1^T, …, M*d^T)^T. For any two vectors a, b ∈ Rd, we denote ⟨a, b⟩ := aTb and sign(a) := (sign(a1), …, sign(ad))T, where sign(x) = x/|x| with the convention 0/0 = 0.
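For concreteness, the three matrix norms just defined can be computed as follows (a small sketch of ours, not from the paper; the brute-force search for ||M||_{2,s} over all supports is feasible only for small d and s):

```python
import numpy as np
from itertools import combinations

def max_norm(M):
    """Elementwise absolute supremum norm ||M||_max."""
    return np.max(np.abs(M))

def spectral_norm(M):
    """Spectral norm ||M||_2, the largest singular value of M."""
    return np.linalg.norm(M, 2)

def restricted_spectral_norm(M, s):
    """Restricted spectral norm ||M||_{2,s} = sup ||Mv||_2 over unit
    vectors v with at most s nonzero entries.  Since Mv only involves the
    columns of M in supp(v), this equals the largest spectral norm of M
    restricted to s columns, so we enumerate all size-s supports."""
    d = M.shape[1]
    return max(np.linalg.norm(M[:, list(S)], 2)
               for S in combinations(range(d), s))
```

For example, for M = diag(3, 2, 1) all three quantities (with s = 1) equal 3, while for the all-ones 2 × 2 matrix ||M||2 = 2 but ||M||_{2,1} = √2.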

2.1 Transelliptical Distribution Family

The concept of transelliptical distribution builds upon the elliptical distribution. Accordingly, we first provide a definition of the elliptical distribution, using the stochastic representation as in Fang et al. (1990). In the sequel, for any two random vectors X and Y, we denote X=dY if they are identically distributed.

Definition 2.1 (Fang et al. (1990))

A random vector Z = (Z1, …, Zd)T follows an elliptical distribution if and only if Z has a stochastic representation Z =d μ + ξAU. Here μ ∈ Rd, q := rank(A), A ∈ Rd×q, ξ ≥ 0 is a random variable independent of U, and U ∈ Sq−1 is uniformly distributed on the unit sphere in Rq. In this setting, letting Σ := AAT, we denote Z ~ ECd(μ, Σ, ξ). Here Σ is called the scatter matrix.

The elliptical family can be viewed as a semiparametric generalization of the Gaussian family, maintaining the symmetry property of the Gaussian distribution while allowing heavy tails and richer structures. Moreover, it is a natural model for many multivariate methods such as principal component analysis (Boente et al., 2012). The transelliptical distribution family further relaxes the symmetry assumption of the elliptical distribution by assuming that, after unspecified strictly increasing marginal transformations, the data are elliptically distributed. A formal definition of the transelliptical distribution is as follows.

Definition 2.2 (Han and Liu (2013b))

A random vector X = (X1, …, Xd)T follows a transelliptical distribution, denoted by X ~ TEd(Σ, ξ; f1, …, fd), if there exist univariate strictly increasing functions f1, …, fd such that

(f_1(X_1), \dots, f_d(X_d))^T \sim EC_d(0, \Sigma, \xi), \quad \text{where } \mathrm{diag}(\Sigma) = \mathrm{I}_d \text{ and } \mathbb{P}(\xi = 0) = 0.

Here Id ∈ Rd×d is the d-dimensional identity matrix and Σ is called the latent generalized correlation matrix.

We note that the transelliptical distribution is closely related to the nonparanormal distribution (Liu et al., 2009, 2012a; Xue and Zou, 2012; Han and Liu, 2012; Han et al., 2013) and the meta-elliptical distribution (Fang et al., 2002). The nonparanormal distribution assumes that after unspecified strictly increasing marginal transformations the data are Gaussian distributed. It is easy to see that the transelliptical family contains the nonparanormal family. On the other hand, the difference between the transelliptical and meta-elliptical families is more subtle. In short, the transelliptical family contains the meta-elliptical family. Compared to the meta-elliptical family, the transelliptical family does not require the random vectors to have densities, and it brings new insight into both theoretical analysis and model interpretability. We refer to Liu et al. (2012b) for a more detailed comparison of the transelliptical, nonparanormal, and meta-elliptical families.

2.2 Latent Generalized Correlation Matrix Estimation

Following Han and Liu (2013b), we are interested in estimating the latent generalized correlation matrix Σ, i.e., the correlation matrix of the latent elliptically distributed random vector f(X) := (f1(X1), …, fd(Xd))T. Treating both the generating variable ξ and the marginal transformation functions f = {fj}, j = 1, …, d, as nuisance parameters, Han and Liu (2013b) proposed to use a transformed Kendall’s tau sample correlation matrix to estimate the latent generalized correlation matrix Σ. More specifically, letting x1, …, xn be n independent and identically distributed observations of a random vector X ~ TEd(Σ, ξ; f1, …, fd), the Kendall’s tau correlation coefficient between the variables Xj and Xk is defined as

\hat{\tau}_{jk} := \frac{2}{n(n-1)} \sum_{i < i'} \mathrm{sign}\big( (x_{ij} - x_{i'j})(x_{ik} - x_{i'k}) \big).

Its population quantity can be written as

\tau_{jk} := \mathbb{P}\big( (X_j - \tilde{X}_j)(X_k - \tilde{X}_k) > 0 \big) - \mathbb{P}\big( (X_j - \tilde{X}_j)(X_k - \tilde{X}_k) < 0 \big),
(2.1)

where X̃ = (X̃1, …, X̃d)T is an independent copy of X. We denote

T := [\tau_{jk}] \quad \text{and} \quad \hat{T} := [\hat{\tau}_{jk}]

to be the Kendall’s tau correlation matrix and the Kendall’s tau sample correlation matrix, respectively.

For the transelliptical family, it is known that Σjk = sin((π/2)τjk) (see, for example, Theorem 3.2 in Han and Liu (2013b)). A latent generalized correlation matrix estimator Σ̂ := [Σ̂jk], called the transformed Kendall’s tau sample correlation matrix, is accordingly defined by

\hat{\Sigma}_{jk} = \sin\Big(\frac{\pi}{2}\hat{\tau}_{jk}\Big).
(2.2)
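The estimator is straightforward to compute. Below is a minimal sketch of ours implementing the pairwise-sign definition of τ̂jk and the sine transform (2.2); the O(n²d) double loop is written for clarity, not speed:

```python
import numpy as np

def kendall_tau_matrix(X):
    """Kendall's tau sample correlation matrix T-hat computed from an
    n x d data matrix X via the pairwise-sign definition."""
    n, d = X.shape
    T = np.zeros((d, d))
    for i in range(n):
        for ip in range(i + 1, n):
            s = np.sign(X[i] - X[ip])  # d signs for one pair (i, i')
            T += np.outer(s, s)
    T *= 2.0 / (n * (n - 1))
    np.fill_diagonal(T, 1.0)  # tau_jj = 1 (guards against ties)
    return T

def transformed_kendall(X):
    """Transformed estimator Sigma-hat with entries sin(pi/2 * tau-hat_jk)."""
    return np.sin(np.pi / 2 * kendall_tau_matrix(X))
```

Because τ̂jk depends only on ranks, applying any strictly increasing transformation to a column of the data leaves T̂, and hence Σ̂, unchanged.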

Han and Liu (2013b) showed that, without any moment constraint,

\|\hat{\Sigma} - \Sigma\|_{\max} = O_P\Big(\sqrt{\tfrac{\log d}{n}}\Big),

and accordingly by simple algebra we have

\|\hat{\Sigma} - \Sigma\|_2 = O_P\Big(d\sqrt{\tfrac{\log d}{n}}\Big) \quad \text{and} \quad \|\hat{\Sigma} - \Sigma\|_{2,s} = O_P\Big(s\sqrt{\tfrac{\log d}{n}}\Big).
(2.3)

The rates of convergence in (2.3) are far from optimal (see Lounici (2013), Bunea and Xiao (2012), and Vu and Lei (2012) for the parametric rates). In the next two sections, we push the results in Han and Liu (2013b) forward, showing that better rates of convergence can be obtained in estimating the Kendall’s tau correlation matrix and the latent generalized correlation matrix.

3 Rate of Convergence under Spectral Norm

In this section we provide the rate of convergence of the Kendall’s tau sample correlation matrix T̂ to T, as well as of the transformed Kendall’s tau sample correlation matrix Σ̂ to Σ, under the spectral norm. The next theorem shows that, without any moment constraint or assumption on the data distribution (as long as it is continuous), the rate of convergence of T̂ to T under the spectral norm is ||T||2 √(re(T) log d / n), where for any positive semidefinite matrix M ∈ Rd×d,

r_e(M) := \frac{\mathrm{Tr}(M)}{\|M\|_2}

is called the effective rank of M and must be less than or equal to the dimension d. For notational simplicity, in the sequel we assume that the sample size n is even. When n is odd, we can always use n−1 data points without affecting the obtained rate of convergence.
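The following small sketch (ours, for illustration) shows the two extremes of the effective rank: an identity matrix has no spectral decay, so re equals the dimension d, while a correlation matrix dominated by a single spike has re close to 1.

```python
import numpy as np

def effective_rank(M):
    """Effective rank r_e(M) := Tr(M) / ||M||_2 of a PSD matrix M."""
    return np.trace(M) / np.linalg.norm(M, 2)

d, rho = 50, 0.9
identity = np.eye(d)
# Compound symmetric correlation matrix: eigenvalues 1 + (d-1)*rho (once)
# and 1 - rho (d - 1 times), so one spike dominates the spectrum.
compound = (1 - rho) * np.eye(d) + rho * np.ones((d, d))

print(effective_rank(identity))  # equals d, the maximal value
print(effective_rank(compound))  # d / (1 + (d-1)*rho), close to 1
```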

Theorem 3.1

Let x1, …, xn be n observations of a d dimensional continuous random vector X. Then for sufficiently large n and any 0 < α < 1, with probability larger than 1 − 2α, we have

\|\hat{T} - T\|_2 \le 4\|T\|_2\sqrt{\frac{\{r_e(T) + 1\}\log(d/\alpha)}{3n}}.
(3.1)

Theorem 3.1 shows that, when re(T) log d/n → 0, we have

\|\hat{T} - T\|_2 = O_P\Big(\|T\|_2\sqrt{\tfrac{r_e(T)\log d}{n}}\Big).

This is the same parametric rate as obtained in Vershynin (2010), Lounici (2013), and Bunea and Xiao (2012) when no additional structure is imposed.

In the next theorem, we show that, under the modeling assumption that X is transelliptically distributed, which is of particular interest in real applications as shown in Han and Liu (2013b), a transformed version of the Kendall’s tau sample correlation matrix can estimate the latent generalized correlation matrix at a nearly optimal rate.

Theorem 3.2

Let x1, …, xn be n observations of X ~ TEd(Σ, ξ; f1, …, fd). Let Σ̂ be the transformed Kendall’s tau sample correlation matrix defined in (2.2). Then, for n large enough and any 0 < α < 1, with probability larger than 1 − 2α − α², we have

\|\hat{\Sigma} - \Sigma\|_2 \le \pi^2\|\Sigma\|_2\Big(\sqrt{\frac{2\{r_e(\Sigma) + 1\}\log(d/\alpha)}{3n}} + \frac{r_e(\Sigma)\log(d/\alpha)}{n}\Big).
(3.2)

Theorem 3.2 indicates that, when re(Σ) log d/n → 0, we have

\|\hat{\Sigma} - \Sigma\|_2 = O_P\Big(\|\Sigma\|_2\sqrt{\tfrac{r_e(\Sigma)\log d}{n}}\Big).

By the discussion of Theorem 2 in Lounici (2013), the obtained rate of convergence is minimax optimal up to a logarithmic factor with respect to a suitable parameter space. However, compared to the conditions in Lounici (2013) and Bunea and Xiao (2012), which require a strong multivariate subgaussian assumption on X (implying the existence of moments of arbitrary order), Σ̂ attains this parametric rate in estimating the latent generalized correlation matrix without any moment constraint.

4 Rate of Convergence under Restricted Spectral Norm

In this section we analyze the rates of convergence of the Kendall’s tau sample correlation matrix and its transformed version under the restricted spectral norm. The main target is to improve the rate OP(s√(log d/n)) shown in (2.3) to the rate OP(√(s log d/n)). Such a rate has been shown to be minimax optimal under the Gaussian model. Obtaining this improved rate is technically challenging since the data could be very heavy-tailed and the transformed Kendall’s tau sample correlation matrix has a much more complex structure than the Pearson’s covariance/correlation matrix.

In the following we lay out an avenue for analyzing the statistical efficiency of T̂ and Σ̂ under the restricted spectral norm. In particular, we characterize a subset of the transelliptical distributions for which T̂ and Σ̂ can approximate T and Σ at an improved rate. More specifically, we provide a “sign subgaussian” condition which is sufficient for T̂ and Σ̂ to attain the nearly optimal rate. This condition is related to the subgaussian assumption in Vu and Lei (2012), Lounici (2013), and Bunea and Xiao (2012) (see Assumption 2.2 in Vu and Lei (2012), for example). Before proceeding to the formal definition of this condition, we first define an operator ψ, which maps a random variable (together with parameters α and t0) to R, as follows:

Definition 4.1

For any random variable Y ∈ R, the operator ψ is defined as

\psi(Y; \alpha, t_0) := \inf\Big\{ c > 0 : \mathbb{E}\exp\big\{ t(Y^\alpha - \mathbb{E}Y^\alpha) \big\} \le \exp(ct^2) \ \text{ for all } |t| < t_0 \Big\}.
(4.1)

The operator ψ(·) can be used to quantify the tail behavior of random variables. We recall that a zero-mean random variable X ∈ R is said to be subgaussian if there exists a constant c such that E exp(tX) ≤ exp(ct²) for all t ∈ R. A zero-mean random variable Y ∈ R with ψ(Y; 1, ∞) bounded is well known to be subgaussian, which implies the tail bound

\mathbb{P}\big( |Y - \mathbb{E}Y| > t \big) \le 2\exp\big( -t^2/(4c) \big),

where c is the constant defined in Equation (4.1). Moreover, ψ(Y; α, t0) is related to the Orlicz ψ2-norm. A formal definition of the Orlicz norm is provided as follows.

Definition 4.2

For any random variable Y ∈ R, its Orlicz ψ2-norm is defined as

\|Y\|_{\psi_2} := \inf\big\{ c > 0 : \mathbb{E}\exp(Y^2/c^2) \le 2 \big\}.

It is well known that a random variable Y has ψ(Y; 1, ∞) bounded if and only if ||Y||ψ2 in Definition 4.2 is bounded (van de Geer and Lederer, 2013). We refer to Lemma A.1 in the appendix for a more detailed description of this property.
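For a discrete random variable, the infimum in Definition 4.2 can be computed directly. The sketch below (our illustration, not from the paper) evaluates ||Y||ψ2 by bisection; for a Rademacher variable (Y = ±1 with probability 1/2 each) one has E exp(Y²/c²) = exp(1/c²), so the norm is exactly 1/√(log 2):

```python
import math

def orlicz_psi2(values, probs, tol=1e-10):
    """||Y||_psi2 = inf{c > 0 : E exp(Y^2/c^2) <= 2} for a discrete Y
    taking `values` with probabilities `probs`, found by bisection
    (the map c -> E exp(Y^2/c^2) is decreasing in c)."""
    def moment(c):
        total = 0.0
        for v, p in zip(values, probs):
            z = (v / c) ** 2
            if z > 700:  # exp would overflow; value certainly exceeds 2
                return float("inf")
            total += p * math.exp(z)
        return total

    lo, hi = 1e-6, 100.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if moment(mid) <= 2:
            hi = mid  # c is large enough; try smaller
        else:
            lo = mid
    return hi

rademacher_norm = orlicz_psi2([1.0, -1.0], [0.5, 0.5])
print(rademacher_norm)  # 1/sqrt(log 2), about 1.2011
```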

Another norm related to ψ(·) is the subgaussian norm ||·||ϕ2 used in, for example, Vershynin (2010). A formal definition of the subgaussian norm is as follows.

Definition 4.3

For any random variable X ∈ R, its subgaussian norm is defined as

\|X\|_{\phi_2} := \sup_{k \ge 1} k^{-1/2} \big( \mathbb{E}|X|^k \big)^{1/k}.

The subgaussian norm is also closely related to the subgaussian distribution. In particular, if EX = 0, then E exp(tX) ≤ exp(Ct²||X||ϕ2²) for some absolute constant C.

We say that a random vector X ∈ Rd is multivariate subgaussian if vTX is subgaussian for every vector v ∈ Rd. We refer to Vershynin (2010) for more details on the properties of the subgaussian norm.

Using the operator ψ(·), we now proceed to define the sign subgaussian condition. Here, for any vector v = (v1, …, vd)T ∈ Rd, recall that sign(v) := (sign(v1), …, sign(vd))T.

Definition 4.4 (Sign subgaussian condition)

For a random vector X = (X1, …, Xd)T ∈ Rd, letting X̃ = (X̃1, …, X̃d)T be an independent copy of X, we say that X satisfies the sign subgaussian condition if and only if

\sup_{v \in \mathbb{S}^{d-1}} \psi\big( \langle \mathrm{sign}(X - \tilde{X}), v \rangle ; 2, t_0 \big) \le K\|T\|_2^2,
(4.2)

where K is a fixed constant and t0 is a positive number such that t0K||T||2² is lower bounded by a fixed constant. Recall that here T can be written as

T = \mathbb{E}\big[ \mathrm{sign}(X - \tilde{X}) \cdot \big(\mathrm{sign}(X - \tilde{X})\big)^T \big].

To gain more insight into the sign subgaussian condition, in the following we point out two sufficient conditions under which it holds.

Proposition 4.5

For any random vector X ∈ Rd with X̃ an independent copy of X, if we have

\sup_{v \in \mathbb{S}^{d-1}} \big\| \langle \mathrm{sign}(X - \tilde{X}), v \rangle^2 - v^T T v \big\|_{\psi_2} \le L_1\|T\|_2,
(4.3)

where L1 is a fixed constant, then X satisfies the sign subgaussian condition with t0 = ∞ and K = 5L1²/2 in Equation (4.2).

Proposition 4.6

For any random vector X ∈ Rd with X̃ an independent copy of X, if there exists an absolute constant L2 such that sign(X − X̃) is subgaussian and

\big\| v^T \mathrm{sign}(X - \tilde{X}) \big\|_{\phi_2}^2 \le L_2\|T\|_2^2 \quad \text{for all } v \in \mathbb{S}^{d-1},
(4.4)

then X satisfies the sign subgaussian condition with t0 = c||T||2⁻¹ and K = C in Equation (4.2), where c and C are two fixed absolute constants.

Proposition 4.6 builds a bridge between the sign subgaussian condition and Assumption 1 in Bunea and Xiao (2012) and Lounici (2013). More specifically, saying that X satisfies Equation (4.4) is equivalent to saying that sign(X − X̃) satisfies the multivariate subgaussian condition defined in Bunea and Xiao (2012). Therefore, Proposition 4.6 can be treated as an explanation of why we call the condition in Equation (4.2) “sign subgaussian”. However, by Lemma 5.14 in Vershynin (2010), the sign subgaussian condition is weaker than the condition in Equation (4.4), i.e., a distribution satisfying the sign subgaussian condition does not necessarily satisfy the condition in Equation (4.4).

The sign subgaussian condition is intuitive due to its relationship with the Orlicz and subgaussian norms. However, it is extremely difficult to verify whether a given distribution satisfies this condition. The main difficulty lies in the fact that we must sharply characterize the tail behavior of the summation of a sequence of possibly correlated discrete Bernoulli random variables, which is much harder than analyzing the summation of Gaussian random variables as usually done in the literature.

In the following we provide several examples that satisfy the sign subgaussian condition. The next theorem shows that for any transelliptically distributed random vector X ~ TEd(Σ, ξ; f1, …, fd) with Σ = Id (i.e., the underlying distribution is spherical), the distribution of X satisfies the condition in Equation (4.2). The proof of Theorem 4.7 is in Section 5.4.

Theorem 4.7

Suppose that X ~ TEd(Id, ξ; f1, …, fd) is transelliptically distributed with a latent spherical distribution. Then X satisfies the sign subgaussian condition shown in Equation (4.2).

In the next theorem, we provide a stronger version of Theorem 4.7. We call a square matrix compound symmetric if its off-diagonal entries are all equal. The next theorem shows that any transelliptically distributed random vector X ~ TEd(Σ, ξ; f1, …, fd), with Σ a compound symmetric matrix, satisfies Equation (4.4), and therefore satisfies the sign subgaussian condition.

Theorem 4.8

Suppose that X ~ TEd(Σ, ξ; f1, …, fd) is transelliptically distributed such that Σ is a compound symmetric matrix (i.e., Σjk = ρ for all j ≠ k). If 0 ≤ ρ := Σ12 ≤ C0 < 1 for some absolute positive constant C0, then X satisfies Equation (4.4), and therefore satisfies the sign subgaussian condition in Equation (4.2).

Although Theorem 4.7 can be directly proved using the result in Theorem 4.8, the proof of Theorem 4.7 uses quite different techniques, which are more transparent and illustrate the main challenges of analyzing binary sequences even in the uncorrelated setting. Therefore, we still state this theorem separately and provide a separate proof in Section 5.4. Theorem 4.8 leads to the following corollary, which characterizes a subfamily of the transelliptical distributions satisfying the sign subgaussian condition.

Corollary 4.9

Suppose that X ~ TEd(Σ, ξ; f1, …, fd) is transelliptically distributed with Σ a block diagonal compound symmetric matrix, i.e.,

\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma_q \end{pmatrix},
(4.5)

where Σk ∈ Rdk×dk for k = 1, …, q is a compound symmetric matrix with ρk := [Σk]12 ≥ 0. If q is upper bounded by an absolute positive constant and 0 ≤ ρk ≤ C1 < 1 for some absolute positive constant C1, then X satisfies the sign subgaussian condition.

We call a matrix of the form in Equation (4.5) a block diagonal compound symmetric matrix. Corollary 4.9 indicates that any transelliptical distribution with a block diagonal compound symmetric latent generalized correlation matrix satisfies the sign subgaussian condition.
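A matrix of the form (4.5) is easy to construct and check numerically. The sketch below (ours; the block sizes and ρk values are arbitrary) builds one and verifies that it is a valid correlation matrix when every ρk < 1, using the fact that a dk × dk compound symmetric block has eigenvalues 1 + (dk − 1)ρk (once) and 1 − ρk (dk − 1 times):

```python
import numpy as np

def compound_symmetric(dk, rho):
    """dk x dk matrix with unit diagonal and constant off-diagonal rho."""
    return (1 - rho) * np.eye(dk) + rho * np.ones((dk, dk))

def block_diag_compound_symmetric(dims, rhos):
    """Block diagonal compound symmetric matrix, as in Equation (4.5)."""
    d = sum(dims)
    Sigma = np.zeros((d, d))
    start = 0
    for dk, rho in zip(dims, rhos):
        Sigma[start:start + dk, start:start + dk] = compound_symmetric(dk, rho)
        start += dk
    return Sigma

Sigma = block_diag_compound_symmetric([3, 4], [0.5, 0.2])
```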

Using the sign subgaussian condition, we have the following main result, which shows that, as long as the sign subgaussian condition holds, improved rates of convergence for both T̂ and Σ̂ under the restricted spectral norm can be attained.

Theorem 4.10

Let x1, …, xn be n observations of X, for which the sign subgaussian condition holds. We have, with probability larger than 1 − 2α,

\|\hat{T} - T\|_{2,s} \le 4(2K)^{1/2}\|T\|_2\sqrt{\frac{s(3 + \log(d/s)) + \log(1/\alpha)}{n}}.
(4.6)

Moreover, when we further have X ~ TEd(Σ, ξ; f1, …, fd), with probability larger than 1 − 2α − α²,

\|\hat{\Sigma} - \Sigma\|_{2,s} \le \pi^2\Big( 2(2K)^{1/2}\|\Sigma\|_2\sqrt{\frac{s(3 + \log(d/s)) + \log(1/\alpha)}{n}} + \frac{s\log(d/\alpha)}{n} \Big).
(4.7)

The results presented in Theorem 4.10 show that under various settings the rate of convergence of Σ̂ under the restricted spectral norm is OP(√(s log d/n)), which is the parametric and minimax optimal rate shown in Vu and Lei (2012) within the Gaussian family. However, the Kendall’s tau sample correlation matrix and its transformed version attain this rate with all moment constraints waived.

5 Technical Proofs

In this section we provide the technical proofs of the theorems shown in Sections 3 and 4.

5.1 Proof of Theorem 3.1

Proof

Recall that xi := (xi1, …, xid)T. For i ≠ i′, let

S_{i,i'} := \big( \mathrm{sign}(x_{i1} - x_{i'1}), \dots, \mathrm{sign}(x_{id} - x_{i'd}) \big)^T.

We denote by Δ̂i,i′ the n(n − 1) random matrices with

\hat{\Delta}_{i,i'} := \frac{1}{n(n-1)}\big( S_{i,i'}S_{i,i'}^T - T \big).

By a simple calculation, we have T̂ − T = Σ_{i≠i′} Δ̂i,i′, and T̂ − T is a U-statistic.

In the following we extend the standard decoupling trick of Hoeffding (1963) from U-statistics of random variables to the matrix setting. The extension relies on the matrix version of the Laplace transform method. For any square matrix M ∈ Rd×d, we define

\exp(M) := \mathrm{I}_d + \sum_{k=1}^{\infty} \frac{M^k}{k!},

where k! denotes the factorial of k. Using Proposition 3.1 in Tropp (2010), we have

\mathbb{P}\big[ \lambda_{\max}(\hat{T} - T) \ge t \big] \le \inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\big[ \mathrm{Tr}\, e^{\theta(\hat{T} - T)} \big],
(5.1)

and we bound E[Tr e^{θ(T̂ − T)}] as follows.
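As a numerical sanity check of the series definition of exp(M) above, the following sketch (ours, purely illustrative; a library routine such as scipy.linalg.expm would be preferred in practice) truncates the series and verifies that, for a symmetric matrix A, Tr e^A equals the sum of e^{λi} over the eigenvalues λi of A:

```python
import numpy as np

def mat_exp_series(M, terms=30):
    """Truncation of exp(M) = I + sum_{k>=1} M^k / k!  (adequate here
    because ||M||_2 is small; not a robust general-purpose method)."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k  # builds M^k / k! incrementally
        out += term
    return out

A = np.array([[1.0, 0.3], [0.3, -0.5]])  # symmetric test matrix
trace_exp = np.trace(mat_exp_series(A))
spectral_sum = np.exp(np.linalg.eigvalsh(A)).sum()
print(trace_exp, spectral_sum)
```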

The trace exponential function

\mathrm{Tr}\exp : A \mapsto \mathrm{Tr}\, e^A

is a convex mapping from the space of self-adjoint matrices to R+ (see Section 2.4 of Tropp (2010) and the references therein). Let m = n/2. For any permutation σ of {1, …, n}, let (i1, …, in) := σ(1, …, n). For r = 1, …, m, we define Srσ and Δ̂rσ as

S_r^{\sigma} := S_{i_{2r},\, i_{2r-1}} \quad \text{and} \quad \hat{\Delta}_r^{\sigma} := \frac{1}{m}\big( S_r^{\sigma}[S_r^{\sigma}]^T - T \big).

Moreover, for i = 1, …, m, let

S_i := S_{2i,\, 2i-1} \quad \text{and} \quad \hat{\Delta}_i := \frac{1}{m}\big( S_i S_i^T - T \big).

The convexity of the trace exponential function implies that

\mathrm{Tr}\, e^{\theta(\hat{T} - T)} = \mathrm{Tr}\, e^{\theta \sum_{i \ne i'} \hat{\Delta}_{i,i'}} = \mathrm{Tr}\exp\Big\{ \frac{1}{\mathrm{card}(\mathcal{S}_n)} \sum_{\sigma \in \mathcal{S}_n} \theta \sum_{r=1}^{m} \hat{\Delta}_r^{\sigma} \Big\} \le \frac{1}{\mathrm{card}(\mathcal{S}_n)} \sum_{\sigma \in \mathcal{S}_n} \mathrm{Tr}\, e^{\theta \sum_{r=1}^{m} \hat{\Delta}_r^{\sigma}},
(5.2)

where Sn is the permutation group of {1, …, n}. Taking expectation on both sides of Equation (5.2) gives that

\mathbb{E}\,\mathrm{Tr}\, e^{\theta(\hat{T} - T)} \le \mathbb{E}\,\mathrm{Tr}\, e^{\theta \sum_{i=1}^{m} \hat{\Delta}_i}.
(5.3)

According to the definition, Δ̂1, …, Δ̂m are m independent and identically distributed random matrices, and this finishes the decoupling step.

Combining Equations (5.1) and (5.3), we have

\mathbb{P}\big[ \lambda_{\max}(\hat{T} - T) \ge t \big] \le \inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\,\mathrm{Tr}\, e^{\theta \sum_{i=1}^{m} \hat{\Delta}_i}.
(5.4)

Recall that EΔ̂i = 0. Following the proof of Theorem 6.1 in Tropp (2010), if we can show that there are nonnegative numbers R1, R2, and R3 such that

\lambda_{\max}(\hat{\Delta}_i) \le R_1, \qquad R_3 \le \Big\| \sum_{i=1}^{m} \mathbb{E}\hat{\Delta}_i^2 \Big\|_2 \le R_2,

then the right hand side of Equation (5.4) can be bounded by

\inf_{\theta > 0} e^{-\theta t}\, \mathbb{E}\,\mathrm{Tr}\, e^{\theta \sum_{i=1}^{m} \hat{\Delta}_i} \le d \exp\Big\{ \frac{-t^2/2}{R_2 + R_1 t/3} \Big\}.

We first show that we can take R1 = 2d/m. Because ||Δ̂i||max ≤ 2/m, by a simple calculation, we have

\lambda_{\max}(\hat{\Delta}_i) \le \|\hat{\Delta}_i\|_1 \le d \cdot \|\hat{\Delta}_i\|_{\max} \le \frac{2d}{m}.

We then calculate R2. For this, we have, because X is continuous,

\sum_{i=1}^{m} \mathbb{E}\hat{\Delta}_i^2 = \frac{1}{m}\mathbb{E}\big( S_1 S_1^T - T \big)^2 = \frac{1}{m}\big( \mathbb{E}(d\, S_1 S_1^T) - T^2 \big) = \frac{1}{m}\big( dT - T^2 \big).

Accordingly,

\frac{1}{m}\big( d\|T\|_2 - \|T\|_2^2 \big) \le \Big\| \sum_{i=1}^{m} \mathbb{E}\hat{\Delta}_i^2 \Big\|_2 \le \frac{1}{m}\big( d\|T\|_2 + \|T\|_2^2 \big),

so we set R2 = (d||T||2 + ||T||2²)/m and R3 = (d||T||2 − ||T||2²)/m.

Thus, using Theorem 6.1 in Tropp (2010) and the fact that ||T||2 ≤ d (which implies d||T||2 − ||T||2² ≥ 0), for any

t \le R_3/R_1 = \frac{d\|T\|_2 - \|T\|_2^2}{2d},

we have

\mathbb{P}\big\{ \lambda_{\max}(\hat{T} - T) \ge t \big\} \le d \cdot \exp\Big( \frac{-3nt^2}{16( d\|T\|_2 + \|T\|_2^2 )} \Big).

A similar argument holds for λmax(T − T̂). Accordingly, we have

\mathbb{P}\big\{ \|\hat{T} - T\|_2 \ge t \big\} \le 2d \cdot \exp\Big( \frac{-3nt^2}{16( d\|T\|_2 + \|T\|_2^2 )} \Big).

Finally, when

n \ge \frac{64\, d^2 \log(d/\alpha)\,( d\|T\|_2 + \|T\|_2^2 )}{3( d\|T\|_2 - \|T\|_2^2 )^2},

we have

\sqrt{\frac{16( d\|T\|_2 + \|T\|_2^2 )\log(d/\alpha)}{3n}} \le \frac{d\|T\|_2 - \|T\|_2^2}{2d}.

This completes the proof.

5.2 Proof of Theorem 3.2

To prove Theorem 3.2, we first need the following lemma, which connects √(1 − Σjk²) to a Gaussian random vector (X, Y)T ∈ R² and plays a key role in bounding ||Σ̂ − Σ||2 by ||T̂ − T||2.

Lemma 5.1

Provided that

\begin{pmatrix} X \\ Y \end{pmatrix} \sim N_2\Big( 0, \begin{bmatrix} 1 & \sigma \\ \sigma & 1 \end{bmatrix} \Big),

we have

\mathbb{E}|XY| = \mathbb{E}[XY] \cdot \mathbb{E}[\mathrm{sign}(XY)] + \frac{2}{\pi}\sqrt{1 - \sigma^2}.

Proof

We recall that σ = sin((π/2)τ), with τ the Kendall’s tau correlation coefficient of (X, Y). Without loss of generality, assume that σ > 0 and τ > 0 (otherwise apply the argument to −Y instead of Y). Define

\beta_+ := \mathbb{E}\big[ XY\, I(XY > 0) \big], \qquad \beta_- := -\mathbb{E}\big[ XY\, I(XY < 0) \big],

where I(·) is the indicator function. We then have

\mathbb{E}|XY| = \beta_+ + \beta_-, \qquad \mathbb{E}[XY] = \sigma = \beta_+ - \beta_-.
(5.5)

To compute β+, we use the fact that

X \stackrel{d}{=} \sqrt{\tfrac{1+\sigma}{2}}\,Z_1 + \sqrt{\tfrac{1-\sigma}{2}}\,Z_2, \qquad Y \stackrel{d}{=} \sqrt{\tfrac{1+\sigma}{2}}\,Z_1 - \sqrt{\tfrac{1-\sigma}{2}}\,Z_2,

where Z1, Z2 ~ N1(0, 1) are independent, and the equality holds jointly in distribution.

Let FX,Y and FZ1,Z2 be joint distribution functions of (X, Y)T and (Z1, Z2)T. We have

\beta_+ = \int_{xy > 0} xy \, dF_{X,Y}(x,y) = \int_{xy > 0} \frac{(x+y)^2 - (x-y)^2}{4} \, dF_{X,Y}(x,y) = \int_{z_1^2 > \frac{1-\sigma}{1+\sigma} z_2^2} \Big( \frac{1+\sigma}{2} z_1^2 - \frac{1-\sigma}{2} z_2^2 \Big) dF_{Z_1,Z_2}(z_1,z_2) = \int_0^{\infty}\!\!\int_{-\alpha}^{\alpha} 2\Big\{ \frac{1+\sigma}{2} r^2\cos^2(\theta) - \frac{1-\sigma}{2} r^2\sin^2(\theta) \Big\} \cdot \frac{1}{2\pi} e^{-r^2/2}\, r \, d\theta\, dr,

where α := arcsin(√((1 + σ)/2)). By a simple calculation, we have

\int_0^{\infty} r^3 e^{-r^2/2}\, dr = \frac{1}{2}\int_0^{\infty} u\, e^{-u/2}\, du = 2.

Accordingly, we can proceed with the proof and show that

\beta_+ = \int_0^{\infty}\!\!\int_{-\alpha}^{\alpha} \big( \cos(2\theta) + \sigma \big) \cdot r^3 \frac{1}{2\pi} e^{-r^2/2}\, d\theta\, dr = \frac{1}{\pi}\big( \sin(2\alpha) + 2\alpha\sigma \big).
(5.6)

Since sin(2α) = √(1 − σ²) = cos(πτ/2) and α ≥ arcsin(√(1/2)) = π/4, we have 2α = (π/2)(1 + τ), and Equation (5.6) then gives

\beta_+ = \frac{\sigma}{2}(1 + \tau) + \frac{1}{\pi}\sqrt{1 - \sigma^2}.

Combining this with Equation (5.5) gives the claimed equality.
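As a quick Monte Carlo sanity check of Lemma 5.1 (our illustration; the correlation value, sample size, and seed are arbitrary), note that for a bivariate Gaussian pair E sign(XY) = (2/π) arcsin σ, which equals the Kendall's tau τ, so the identity reads E|XY| = στ + (2/π)√(1 − σ²):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n = 0.5, 500_000

# Bivariate Gaussian pair with unit variances and correlation sigma.
Z = rng.standard_normal((n, 2))
X = Z[:, 0]
Y = sigma * Z[:, 0] + np.sqrt(1 - sigma ** 2) * Z[:, 1]

lhs = np.mean(np.abs(X * Y))        # Monte Carlo estimate of E|XY|
tau = 2 / np.pi * np.arcsin(sigma)  # Kendall's tau = E sign(XY) here
rhs = sigma * tau + 2 / np.pi * np.sqrt(1 - sigma ** 2)
print(lhs, rhs)
```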

Using Theorem 3.1 and Lemma 5.1, we now prove Theorem 3.2.

Proof of Theorem 3.2

Using a Taylor expansion, for any $j \ne k$, we have

$$\sin\Big(\frac{\pi}{2}\hat\tau_{jk}\Big) - \sin\Big(\frac{\pi}{2}\tau_{jk}\Big) = \cos\Big(\frac{\pi}{2}\tau_{jk}\Big)\frac{\pi}{2}\big(\hat\tau_{jk}-\tau_{jk}\big) - \frac{1}{2}\sin(\theta_{jk})\Big(\frac{\pi}{2}\Big)^2\big(\hat\tau_{jk}-\tau_{jk}\big)^2,$$

where $\theta_{jk}$ lies between $\frac{\pi}{2}\tau_{jk}$ and $\frac{\pi}{2}\hat\tau_{jk}$. Thus

$$\hat\Sigma - \Sigma = E_1 + E_2,$$

where $E_1, E_2 \in \mathbb{R}^{d\times d}$ satisfy, for $j \ne k$,

$$[E_1]_{jk} = \cos\Big(\frac{\pi}{2}\tau_{jk}\Big)\frac{\pi}{2}\big(\hat\tau_{jk}-\tau_{jk}\big), \qquad [E_2]_{jk} = -\frac{1}{2}\sin(\theta_{jk})\Big(\frac{\pi}{2}\Big)^2\big(\hat\tau_{jk}-\tau_{jk}\big)^2,$$

and the diagonal entries of both E1 and E2 are all zero.

Using the results on U-statistics shown in Hoeffding (1963), we have, for any $j \ne k$ and $t > 0$,

$$P\big(|\hat\tau_{jk} - \tau_{jk}| > t\big) \le 2e^{-nt^2/4}.$$

For some constant α, let the set Ω2 be defined as

$$\Omega_2 := \Big\{\exists\, 1 \le j, k \le d,\ \big|[E_2]_{jk}\big| > \frac{\pi^2\log(d/\alpha)}{n}\Big\}.$$

Since $|[E_2]_{jk}| \le \frac{\pi^2}{8}(\hat\tau_{jk}-\tau_{jk})^2$, by the union bound, we have

$$P(\Omega_2) \le \frac{d^2}{2}\cdot 2e^{-2\log(d/\alpha)} = \alpha^2.$$

Conditioning on $\Omega_2^C$, for any $v \in \mathbb{S}^{d-1}$, we have

$$\big|v^T E_2 v\big| \le \sqrt{\sum_{j,k}[E_2]_{jk}^2}\cdot\|v\|_2^2 \le \sqrt{d^2\Big(\frac{\pi^2\log(d/\alpha)}{n}\Big)^2} = \frac{\pi^2 d\log(d/\alpha)}{n}.$$
(5.7)

We then analyze the term $E_1$. Let $W = [W_{jk}] \in \mathbb{R}^{d\times d}$ with $W_{jk} = \frac{\pi}{2}\cos(\frac{\pi}{2}\tau_{jk})$, let $\hat T = [\hat\tau_{jk}]$ be the Kendall's tau sample correlation matrix, and let $T = [\tau_{jk}]$ be its population counterpart. We can write

$$E_1 = W \circ (\hat T - T),$$

where $\circ$ represents the Hadamard product. Given the spectral norm bound on $\hat T - T$ shown in Theorem 3.1, we now focus on controlling $E_1$. Let $Y := (Y_1, \ldots, Y_d)^T \sim N_d(0, \Sigma)$ follow a Gaussian distribution with mean zero and covariance matrix $\Sigma$. Using the equality in Lemma 5.1, we have, for any $j \ne k$,

$$\mathbb{E}|Y_jY_k| = \tau_{jk}\Sigma_{jk} + \frac{2}{\pi}\sqrt{1-\Sigma_{jk}^2}.$$

Recalling that

$$\cos\Big(\frac{\pi}{2}\tau_{jk}\Big) = \sqrt{1-\sin^2\Big(\frac{\pi}{2}\tau_{jk}\Big)} = \sqrt{1-\Sigma_{jk}^2},$$

we have

$$W_{jk} = \frac{\pi}{2}\cos\Big(\frac{\pi}{2}\tau_{jk}\Big) = \frac{\pi^2}{4}\big(\mathbb{E}|Y_jY_k| - \tau_{jk}\Sigma_{jk}\big).$$

Now let $\tilde Y := (\tilde Y_1, \ldots, \tilde Y_d)^T \in \mathbb{R}^d$ be an independent copy of $Y$. We have, for any $v \in \mathbb{S}^{d-1}$ and symmetric matrix $M \in \mathbb{R}^{d\times d}$,

$$\big|v^T (M\circ W) v\big| = \Big|\sum_{j,k=1}^d v_jv_kM_{jk}W_{jk}\Big| = \Big|\mathbb{E}\frac{\pi^2}{4}\sum_{j,k} v_jv_kM_{jk}\big(|Y_jY_k| - Y_jY_k\,\mathrm{sign}(\tilde Y_j\tilde Y_k)\big)\Big| \le \frac{\pi^2}{4}\mathbb{E}\Big(\Big|\sum_{j,k} v_jv_kM_{jk}|Y_j||Y_k|\Big| + \Big|\sum_{j,k} v_jv_kM_{jk}Y_jY_k\,\mathrm{sign}(\tilde Y_j\tilde Y_k)\Big|\Big) \le \frac{\pi^2}{4}\|M\|_2\cdot\mathbb{E}\Big(2\sum_j v_j^2Y_j^2\Big) = \frac{\pi^2}{4}\|M\|_2\cdot\Big(2\sum_j v_j^2\Big) = \frac{\pi^2}{2}\|M\|_2.$$
(5.8)

Here the second inequality is due to the fact that, for any symmetric $M \in \mathbb{R}^{d\times d}$ and $v \in \mathbb{R}^d$, $|v^T M v| \le \|M\|_2\|v\|_2^2$, and the third equality is due to the fact that $\mathbb{E}Y_j^2 = \Sigma_{jj} = 1$ for any $j \in \{1, \ldots, d\}$. Accordingly, we have

$$\|E_1\|_2 = \|W\circ(\hat T - T)\|_2 \le \frac{\pi^2}{2}\|\hat T - T\|_2.$$
(5.9)

The bound in Theorem 3.2, with $\Sigma$ replaced by $T$, follows from the fact that

$$\|\hat\Sigma - \Sigma\|_2 = \|E_1 + E_2\|_2 \le \|E_1\|_2 + \|E_2\|_2$$

and by combining Equations (3.1), (5.7), and (5.9). Finally, we prove that $\|T\|_2 \le \|\Sigma\|_2$. We have $T_{jk} = \frac{2}{\pi}\arcsin(\Sigma_{jk})$. Using the Taylor expansion of $\arcsin$ and the fact that $|\Sigma_{jk}| \le 1$ for any $(j, k) \in \{1, \ldots, d\}^2$, we have

$$T = \frac{2}{\pi}\sum_{m=0}^{\infty} \frac{(2m)!}{4^m(m!)^2(2m+1)}\,\Sigma^{\circ(2m+1)},$$ where $\Sigma^{\circ k}$ denotes the $k$-fold Hadamard (entrywise) power of $\Sigma$.

By Schur's theorem (see, for example, page 95 in Johnson (1990)), for any two positive semidefinite matrices $A$ and $B$,

$$\|A\circ B\|_2 \le \big(\max_j A_{jj}\big)\|B\|_2.$$

Accordingly, using the fact that $\Sigma_{jj} = 1$ for all $1 \le j \le d$, we have

$$\big\|\Sigma^{\circ(2m+1)}\big\|_2 \le \|\Sigma\|_2,$$

implying that

$$\|T\|_2 \le \|\Sigma\|_2\cdot\frac{2}{\pi}\sum_{m=0}^{\infty}\frac{(2m)!}{4^m(m!)^2(2m+1)} = \|\Sigma\|_2\cdot\frac{2}{\pi}\arcsin 1 = \|\Sigma\|_2.$$
(5.10)

Accordingly, we can replace $T$ with $\Sigma$ in the upper bound, which yields the desired result.
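The series above is just the Maclaurin expansion of $\arcsin$ applied entrywise; as a quick check (ours, using the identity $(2m)!/(4^m(m!)^2) = \binom{2m}{m}/4^m$), the partial sums recover $\arcsin(x)$, and $\frac{2}{\pi}\arcsin(\Sigma_{jk})$ recovers $T_{jk}$ (e.g., $\Sigma_{jk} = 1/2$ gives $T_{jk} = 1/3$):

```python
import math

def arcsin_series(x, M=200):
    # partial sum of arcsin(x) = sum_{m>=0} (2m)! / (4^m (m!)^2 (2m+1)) x^(2m+1)
    return sum(math.comb(2 * m, m) / (4**m * (2 * m + 1)) * x ** (2 * m + 1)
               for m in range(M))

val = arcsin_series(0.5)              # should equal arcsin(1/2) = pi/6
tau_entry = (2 / math.pi) * val       # corresponding entry T_jk when Sigma_jk = 1/2
```

At $|x| < 1$ the series converges geometrically, so 200 terms are far more than enough here.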

5.3 Proofs of Propositions 4.5 and 4.6

Proposition 4.5 is a direct consequence of Lemma A.1. To prove Proposition 4.6, we first introduce the subexponential ($\psi_1$) norm. For any random variable $X \in \mathbb{R}$, $\|X\|_{\psi_1}$ is defined as

$$\|X\|_{\psi_1} := \sup_{k\ge 1} \frac{1}{k}\big(\mathbb{E}|X|^k\big)^{1/k}.$$

Let $S := \mathrm{sign}(X - \tilde X)$. Because $v^TS$ is subgaussian and $\mathbb{E}v^TS = 0$, using Lemma 5.14 in Vershynin (2010), we get

$$\big\|(v^TS)^2 - \mathbb{E}(v^TS)^2\big\|_{\psi_1} \le \big\|(v^TS)^2\big\|_{\psi_1} + v^TTv \le 2\|v^TS\|_{\psi_2}^2 + v^TTv \le (L^2+1)\|T\|_2.$$

Since $(v^TS)^2 - \mathbb{E}(v^TS)^2$ is a zero-mean random variable and $v^TS$ is subgaussian, using Lemma 5.15 in Vershynin (2010), there exist two fixed constants $C', c'$ such that if $|t| \le c'/\|(v^TS)^2 - \mathbb{E}(v^TS)^2\|_{\psi_1}$, we have

$$\mathbb{E}\exp\big(t\big((v^TS)^2 - \mathbb{E}(v^TS)^2\big)\big) \le \exp\big(C't^2\big\|(v^TS)^2 - \mathbb{E}(v^TS)^2\big\|_{\psi_1}^2\big).$$

Accordingly, choosing $t_0 = c'(L^2+1)^{-1}\|T\|_2^{-1}$ and $K = C'(L^2+1)^2$ in Equation (4.2), and noticing that $t_0K\|T\|_2^2 = c'C'(L^2+1)\|T\|_2 \ge c'C'(L^2+1)$, the sign subgaussian condition is satisfied.

5.4 Proof of Theorem 4.7

In this section, we provide the proof of Theorem 4.7. In detail, we show that for any transelliptically distributed random vector X such that f(X) ~ ECd(0, Id, ξ), we have that X satisfies the condition in Equation (4.2).

Proof

Because the sign function is invariant to the strictly increasing functions $f = \{f_j\}_{j=1}^d$, and the generating variable $\xi$ is nonnegative with $P(\xi = 0) = 0$, we only need to consider the random vector $X \sim N_d(0, I_d)$. For $X = (X_1, \ldots, X_d)^T \sim N_d(0, I_d)$ and $\tilde X$ an independent copy of $X$, we have $X - \tilde X \sim N_d(0, 2I_d)$. Recalling that the off-diagonal entries of $I_d$ are all zero, defining $X^0 = (X_1^0, \ldots, X_d^0)^T := X - \tilde X$ and

$$g(X^0, v) := \sum_{j,k} v_jv_k\,\mathrm{sign}\big(X_j^0X_k^0\big),$$

we have

$$\big\{v^T\mathrm{sign}(X-\tilde X)\big\}^2 - \mathbb{E}\big\{v^T\mathrm{sign}(X-\tilde X)\big\}^2 = g(X^0, v) - \mathbb{E}g(X^0, v).$$

Accordingly, to bound $\psi(\langle \mathrm{sign}(X-\tilde X), v\rangle; 2)$, we only need to focus on $g(X^0, v)$. Let $S := (S_1, \ldots, S_d)^T$ with $S_j := \mathrm{sign}(X_j^0)$ for $j = 1, \ldots, d$. By the properties of the Gaussian distribution, $S_1, \ldots, S_d$ are independent symmetric random signs taking values in $\{-1, 1\}$ almost surely. We then have

$$g(X^0, v) - \mathbb{E}g(X^0, v) = \sum_{j,k} v_jv_k\,\mathrm{sign}\big(X_j^0X_k^0\big) - 1 = (v^TS)^2 - 1.$$

Here the first equality is due to the facts that $\mathbb{E}\,\mathrm{sign}(X_j^0X_k^0) = 0$ for $j \ne k$ and $\|v\|_2^2 = \sum_{j=1}^d v_j^2 = 1$.

We then proceed to analyze the property of $(v^TS)^2 - 1$. By the Hubbard-Stratonovich transform (Hubbard, 1959), for any $\eta \in \mathbb{R}$,

$$\exp(\eta^2) = \frac{1}{\sqrt{4\pi}}\int_{-\infty}^{\infty} e^{-y^2/4 + y\eta}\, dy.$$
(5.11)

Using Equation (5.11), we have that, for any t > 0,

$$\mathbb{E}\exp\big[t\big\{(v^TS)^2 - 1\big\}\big] = e^{-t}\,\mathbb{E}e^{t(v^TS)^2} = \frac{e^{-t}}{\sqrt{4\pi t}}\int_{-\infty}^{+\infty} e^{-y^2/4t}\,\mathbb{E}e^{y\sum_{j=1}^d v_jS_j}\, dy = \frac{e^{-t}}{\sqrt{4\pi t}}\int_{-\infty}^{+\infty} e^{-y^2/4t}\prod_{j=1}^d \frac{1}{2}\big(e^{yv_j} + e^{-yv_j}\big)\, dy.$$

For any $z \in \mathbb{N}$, let $z!$ denote the factorial of $z$. For any $a \in \mathbb{R}$, by Taylor expansion, we have

$$\{\exp(a)+\exp(-a)\}/2 = \sum_{k=0}^\infty \frac{a^{2k}}{(2k)!}, \qquad \exp(a^2/2) = \sum_{k=0}^\infty \frac{a^{2k}}{2^k\cdot k!}.$$

Because $(2k)! \ge 2^k\cdot k!$, we have

$$\{\exp(a)+\exp(-a)\}/2 \le \exp(a^2/2).$$

Accordingly, we have for any 0 < t < 1/4,

$$\mathbb{E}\exp\big[t\big\{(v^TS)^2-1\big\}\big] = \frac{e^{-t}}{\sqrt{4\pi t}}\int_{-\infty}^{+\infty} e^{-y^2/4t}\prod_{j=1}^d\frac{1}{2}\big(e^{yv_j}+e^{-yv_j}\big)\,dy \le \frac{e^{-t}}{\sqrt{4\pi t}}\int_{-\infty}^{+\infty} e^{-y^2/4t}\,e^{\sum_{j=1}^d \frac{1}{2}y^2v_j^2}\,dy = \frac{e^{-t}}{\sqrt{4\pi t}}\int_{-\infty}^{+\infty} e^{-y^2/4t + y^2/2}\,dy = \frac{e^{-t}}{\sqrt{1-2t}}.$$

By the Taylor expansion of $\log(1-x)$, we have

$$\frac{1}{\sqrt{1-2t}} = \exp\Big\{\frac{1}{2}\sum_{k=1}^\infty \frac{(2t)^k}{k}\Big\},$$

which implies that for all 0 < t < 1/4,

$$\frac{e^{-t}}{\sqrt{1-2t}} = \exp\Big(t^2 + \frac{1}{2}\sum_{k=3}^\infty \frac{(2t)^k}{k}\Big) \le \exp(2t^2).$$

This concludes that for 0 < t < 1/4,

$$\mathbb{E}\exp\big[t\big\{(v^TS)^2 - 1\big\}\big] \le \exp(2t^2).$$
(5.12)

Because $(v^TS)^2 \ge 0$, we can apply Theorem 2.6 in Chung and Lu (2006) to control the term $\mathbb{E}\exp[t\{1 - (v^TS)^2\}]$. In detail, suppose that the random variable $Y$ satisfies $\mathbb{E}Y = 0$, $Y \le a_0$, and $\mathbb{E}Y^2 = b_0$ for some absolute constants $a_0$ and $b_0$. Then for any $0 < t < 2/a_0$, using the proof of Theorem 2.8 in Chung and Lu (2006), we have

$$\mathbb{E}e^{tY} \le \exp\big\{(3b_0/2)\cdot t^2\big\}.$$
(5.13)

For Y = 1 − (vTS)2, we have

$$a_0 = 1 \quad\text{and}\quad b_0 = \mathbb{E}(v^TS)^4 - 1 = 2 - 2\sum_{j=1}^d v_j^4 < 2.$$
(5.14)

Here we remind the reader that $\mathbb{E}(v^TS)^2 = \sum_j v_j^2 = 1$. Combining Equations (5.13) and (5.14) implies that, for any $t > 0$,

$$\mathbb{E}\exp\big[t\big\{1 - (v^TS)^2\big\}\big] \le \exp\{3t^2\}.$$
(5.15)

Combining Equations (5.12) and (5.15), we see that Equation (4.2) holds with $K = 3/4$ and $t_0 = 1/4$ (reminding that here $\|T\|_2 = 1$).
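Because the proof above only uses the fact that $S$ has i.i.d. symmetric $\pm 1$ coordinates, both moment-generating-function bounds can be verified exactly by enumerating all $2^d$ sign vectors for a small $d$ (our illustration, with an arbitrary unit vector $v$):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(2)
d = 10
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                    # arbitrary unit vector in S^{d-1}

def mgf(t):
    """Exact E exp(t((v^T S)^2 - 1)) over all 2^d equally likely sign vectors S."""
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=d):
        total += math.exp(t * (float(np.dot(v, signs)) ** 2 - 1.0))
    return total / 2**d

for t in (0.05, 0.1, 0.2, 0.24):
    assert mgf(t) <= math.exp(2 * t * t)    # the bound in Equation (5.12)
    assert mgf(-t) <= math.exp(3 * t * t)   # Equation (5.15): mgf(-t) = E exp(t(1 - (v^T S)^2))
```

The expectations here are exact (no sampling), so the assertions check the bounds themselves, not a Monte Carlo approximation.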

5.5 Proof of Theorem 4.8 and Corollary 4.9

In this section, we prove Theorem 4.8 and Corollary 4.9. Using the same argument as in the proof of Theorem 4.7 (the sign function is invariant to the marginal strictly increasing transformations, and the generating variable $\xi$ is nonnegative with $P(\xi = 0) = 0$), we only need to focus on random vectors that are Gaussian distributed.

Proof of Theorem 4.8

Assume that $\Sigma \in \mathbb{R}^{d\times d}$ is a compound symmetry matrix such that

$$\Sigma_{jj} = 1 \quad\text{and}\quad \Sigma_{jk} = \rho \ \text{ for } j \ne k.$$

By the discussion on page 11 of Vershynin (2010), to prove that Equation (4.4) holds, we only need to prove that, for $0 \le \rho \le C_0$ with $C_0$ some absolute constant, $X = (X_1, \ldots, X_d)^T \sim N_d(0, \Sigma)$, and $v \in \mathbb{S}^{d-1}$, we have

$$\mathbb{E}\exp\big(tv^T\mathrm{sign}(X - \tilde X)\big) \le \exp\big(c\|T\|_2t^2\big),$$

for some fixed constant $c$. This result can be proved as follows. Let $\eta_0, \eta_1, \ldots, \eta_d$ be i.i.d. standard Gaussian random variables, $\eta_i \sim N_1(0, 1)$ for $i = 0, 1, \ldots, d$. Then $Z := X - \tilde X$ can be expressed as $Z \stackrel{d}{=} (Z_1, \ldots, Z_d)^T$, where

$$Z_1 = \sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_1,\quad Z_2 = \sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_2,\ \ldots,\ Z_d = \sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_d.$$

Accordingly, we have

$$\mathbb{E}\exp\big(tv^T\mathrm{sign}(X-\tilde X)\big) = \mathbb{E}\Big(\exp\Big(t\sum_{j=1}^d v_j\,\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j\big)\Big)\Big) = \mathbb{E}\Big(\mathbb{E}\Big(\exp\Big(t\sum_{j=1}^d v_j\,\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j\big)\Big)\,\Big|\,\eta_0\Big)\Big).$$

Moreover, we have

$$\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j \,\Big|\, \eta_0 \;\sim\; N_1\big(\sqrt{2\rho}\,\eta_0,\ 2-2\rho\big).$$
(5.16)

Letting $\mu := \sqrt{2\rho}\,\eta_0$ and $\sigma := \sqrt{2-2\rho}$, Equation (5.16) implies that

$$P\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j > 0 \,\big|\, \eta_0\big) = \Phi\Big(\frac{\mu}{\sigma}\Big),$$

where Φ(·) is the CDF of the standard Gaussian. This further implies that

$$\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j\big)\,\Big|\,\eta_0 \;\sim\; \mathrm{Bern}\Big(\Phi\Big(\frac{\mu}{\sigma}\Big)\Big),$$

where we denote Y ~ Bern(p) if P(Y = 1) = p and P(Y = −1) = 1−p. Accordingly, letting α:= Φ(μ/σ), we have

$$\mathbb{E}\Big(\exp\big(tv_j\,\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j\big)\big)\,\Big|\,\eta_0\Big) = (1-\alpha)e^{-v_jt} + \alpha e^{v_jt}.$$

Letting β:= α − 1/2, we have

$$\mathbb{E}\Big(\exp\big(tv_j\,\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j\big)\big)\,\Big|\,\eta_0\Big) = \frac{1}{2}e^{-v_jt} + \frac{1}{2}e^{v_jt} + \beta\big(e^{v_jt} - e^{-v_jt}\big).$$

Using the fact that $\frac{1}{2}e^a + \frac{1}{2}e^{-a} \le e^{a^2/2}$, we have

$$\mathbb{E}\Big(\exp\big(tv_j\,\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j\big)\big)\,\Big|\,\eta_0\Big) \le \exp\big(v_j^2t^2/2\big) + \beta\big(e^{v_jt} - e^{-v_jt}\big).$$

Because, conditioning on $\eta_0$, the variables $\mathrm{sign}(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j)$, $j = 1, \ldots, d$, are independent of each other, we have

$$\mathbb{E}\Big(\exp\Big(t\sum_{j=1}^d v_j\,\mathrm{sign}\big(\sqrt{2\rho}\,\eta_0+\sqrt{2-2\rho}\,\eta_j\big)\Big)\Big|\eta_0\Big) \le \prod_{j=1}^d\Big\{\exp\big(v_j^2t^2/2\big) + \beta\big(e^{v_jt}-e^{-v_jt}\big)\Big\} = e^{t^2/2}\Big(1 + \sum_{k=1}^d \beta^k \sum_{j_1<j_2<\cdots<j_k}\ \prod_{j\in\{j_1,\ldots,j_k\}} \frac{e^{v_jt}-e^{-v_jt}}{e^{v_j^2t^2/2}}\Big).$$

Moreover, for any centered Gaussian random variable $Y \sim N_1(0, \kappa)$ and $t \in \mathbb{R}$, we have

$$P\big(\Phi(Y) > 1/2 + t\big) = P\big(Y > \Phi^{-1}(1/2+t)\big) = P\big(Y > -\Phi^{-1}(1/2-t)\big) = P\big(Y < \Phi^{-1}(1/2-t)\big) = P\big(\Phi(Y) < 1/2 - t\big).$$

Combined with the fact that $\Phi(Y) \in [0, 1]$, we have

$$\mathbb{E}\big(\Phi(Y) - 1/2\big)^k = 0 \quad\text{when } k \text{ is odd}.$$

This implies that, when $k$ is odd,

$$\mathbb{E}\beta^k = \mathbb{E}\Big(\Phi\big(\sqrt{\rho/(1-\rho)}\,\eta_0\big) - \frac{1}{2}\Big)^k = 0.$$

Accordingly, denoting $\varepsilon := \mathbb{E}\exp\big(t\sum_{j=1}^d v_j\,\mathrm{sign}(\sqrt{2\rho}\,\eta_0 + \sqrt{2-2\rho}\,\eta_j)\big)$, we have

$$\varepsilon \le e^{t^2/2}\Big(1 + \sum_{k\ \mathrm{even}} \mathbb{E}\beta^k \sum_{j_1<j_2<\cdots<j_k}\ \prod_{j\in\{j_1,\ldots,j_k\}} \frac{e^{v_jt}-e^{-v_jt}}{e^{v_j^2t^2/2}}\Big).$$

Using the fact that

$$e^a - e^{-a} = \Big|\sum_{j=1}^\infty \frac{a^j}{j!} - \sum_{j=1}^\infty \frac{(-a)^j}{j!}\Big| = 2\Big|\sum_{m=0}^\infty \frac{a^{2m+1}}{(2m+1)!}\Big| = 2a\cdot\Big|\sum_{m=0}^\infty \frac{a^{2m}}{(2m+1)!}\Big| \le 2a\exp(a^2/2),$$

we further have

$$\varepsilon \le e^{t^2/2}\Big(1 + \sum_{k\ \mathrm{even}} \mathbb{E}\beta^k \sum_{j_1<\cdots<j_k}\ \prod_{j\in\{j_1,\ldots,j_k\}} 2v_jt\Big) = e^{t^2/2}\Big(1 + \sum_{k\ \mathrm{even}} \mathbb{E}\beta^k(2t)^k \sum_{j_1<\cdots<j_k} v_{j_1}\cdots v_{j_k}\Big).$$

By Maclaurin's inequality, for any $x_1, \ldots, x_n \ge 0$, we have

$$\frac{x_1+\cdots+x_n}{n} \ge \Big(\frac{\sum_{1\le i<j\le n} x_ix_j}{\binom{n}{2}}\Big)^{1/2} \ge \cdots \ge \big(x_1\cdots x_n\big)^{1/n}.$$

Accordingly,

$$e^{t^2/2}\Big(1+\sum_{k\ \mathrm{even}}\mathbb{E}\beta^k(2t)^k\sum_{j_1<\cdots<j_k}v_{j_1}\cdots v_{j_k}\Big) \le e^{t^2/2}\Big(1+\sum_{k\ \mathrm{even}}\mathbb{E}\beta^k(2t)^k\Big\{\binom{d}{k}\cdot\big(\|v\|_1/d\big)^k\Big\}\Big) \le e^{t^2/2}\Big(1+\sum_{k\ \mathrm{even}}\mathbb{E}\beta^k(2t)^kd^{k/2}(e/k)^k\Big).$$
(5.17)

The last inequality is due to the facts that $\|v\|_1 \le \sqrt{d}\|v\|_2 = \sqrt{d}$ and $\binom{d}{k} \le (ed/k)^k$.
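Maclaurin's inequality chain can be spot-checked numerically (our sketch; the proof above uses the resulting bound $\sum_{j_1<\cdots<j_k} v_{j_1}\cdots v_{j_k} \le \binom{d}{k}(\|v\|_1/d)^k$, which for $k=2$ is the second assertion below):

```python
import itertools, math
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(6)                          # x_1, ..., x_n >= 0
n = len(x)
p1 = x.mean()                              # arithmetic mean
e2 = sum(a * b for a, b in itertools.combinations(x, 2))
p2 = math.sqrt(e2 / math.comb(n, 2))       # second Maclaurin mean
pn = float(np.prod(x)) ** (1 / n)          # geometric mean
```

Here `p1 >= p2 >= pn` always holds for nonnegative inputs, and `e2 <= comb(n, 2) * p1**2` is the elementary-symmetric-function bound used in the proof.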

Finally, we analyze $\mathbb{E}\beta^{2m}$ for $m = 1, 2, \ldots$. Reminding that

$$\beta := \Phi\Big(\sqrt{\frac{\rho}{1-\rho}}\,\eta_0\Big) - \frac{1}{2},$$

we consider the function $f(x) := \Phi\big(\sqrt{\rho/(1-\rho)}\,x\big)$, for which

$$f'(x) = \sqrt{\frac{\rho}{1-\rho}}\cdot\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{\rho}{2(1-\rho)}x^2\Big) \le \sqrt{\frac{\rho}{2\pi(1-\rho)}}.$$

Accordingly, $f(\cdot)$ is a Lipschitz function with Lipschitz constant $K_0 := \sqrt{\rho/(2\pi(1-\rho))}$, and $\mathbb{E}f(\eta_0) = 1/2$ by symmetry. By the concentration of Lipschitz functions of Gaussian random variables (Ledoux, 2001), we have

$$P\big(|\beta| > t\big) = P\big(|f(\eta_0) - \mathbb{E}f(\eta_0)| > t\big) \le 2\exp\big(-t^2/(2K_0^2)\big).$$

This implies that, for m = 1, 2, …,

$$\mathbb{E}\beta^{2m} = 2m\int_0^\infty t^{2m-1}P\big(|\beta|>t\big)\,dt \le 4m\int_0^\infty t^{2m-1}\exp\big(-t^2/(2K_0^2)\big)\,dt = 4m\big(\sqrt{2}K_0\big)^{2m}\int_0^\infty t^{2m-1}e^{-t^2}\,dt = 2m\big(2K_0^2\big)^m\int_0^\infty t^{m-1}e^{-t}\,dt.$$

Using the facts that $\int_0^\infty e^{-t}\,dt = 1$ and, by integration by parts, for any $m \ge 1$,

$$m\int_0^\infty t^{m-1}e^{-t}\,dt = \int_0^\infty t^m e^{-t}\,dt,$$

we have, for $m \in \mathbb{Z}^+$, $\int_0^\infty t^m e^{-t}\,dt = m!$. Accordingly,

$$\mathbb{E}\beta^{2m} \le 2m\big(2K_0^2\big)^m(m-1)! = 2\big(2K_0^2\big)^m m!.$$

Plugging the above result into Equation (5.17), we have

$$\varepsilon \le e^{t^2/2}\Big(1+\sum_{m=1}^\infty 2\big(2K_0^2\big)^m m!\,(2t)^{2m}d^m\big(e/(2m)\big)^{2m}\Big) = e^{t^2/2}\Big(1+\sum_{m=1}^\infty \big(K_0^2d\big)^m\cdot m!\cdot 2\big(2\sqrt{2}\,et\big)^{2m}\big/(2m)^{2m}\Big).$$

Reminding that $\rho \le C_0$ and $K_0 := \sqrt{\rho/(2\pi(1-\rho))} \le \sqrt{\rho/(2\pi(1-C_0))}$, we have

$$\varepsilon \le e^{t^2/2}\Big(1+\sum_{m=1}^\infty \big(K_0^2d\big)^m\cdot m!\cdot 2\big(2\sqrt{2}\,et\big)^{2m}\big/(2m)^{2m}\Big) \le e^{t^2/2}\Big(1+\sum_{m=1}^\infty m!\cdot 2\Big(2\sqrt{\tfrac{d\rho}{\pi(1-C_0)}}\,et\Big)^{2m}\Big/(2m)^{2m}\Big).$$

Finally, we have, for any $m \ge 1$,

$$2\cdot m!\cdot m! \le (2m)^{2m},$$

implying that

$$\varepsilon \le e^{t^2/2}\cdot\exp\Big(\frac{4d\rho e^2}{\pi(1-C_0)}\,t^2\Big) = \exp\Big\{\Big(\frac{1}{2} + \frac{4d\rho e^2}{\pi(1-C_0)}\Big)t^2\Big\},$$
(5.18)

where the term $\frac{1}{2} + \frac{4d\rho e^2}{\pi(1-C_0)}$ is of the same order as $\|T\|_2 = 1 + (d-1)\cdot\frac{2}{\pi}\arcsin(\rho)$. This finishes the proof.
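For the compound symmetry $\Sigma$ used here, $T = (1-\tau)I_d + \tau\mathbf{1}\mathbf{1}^T$ with $\tau = \frac{2}{\pi}\arcsin\rho$, so $\|T\|_2 = 1 + (d-1)\frac{2}{\pi}\arcsin\rho$ exactly; a quick numerical confirmation (our sketch, with arbitrary $d$ and $\rho$):

```python
import numpy as np

d, rho = 50, 0.3
tau = (2 / np.pi) * np.arcsin(rho)                 # off-diagonal entries of T
T = (1 - tau) * np.eye(d) + tau * np.ones((d, d))  # Kendall's tau matrix of compound-symmetry Sigma
lam_max = np.linalg.eigvalsh(T)[-1]                # spectral norm, since T is positive semidefinite
```

The eigenvalues of $T$ are $1-\tau$ (with multiplicity $d-1$) and $1+(d-1)\tau$, so `lam_max` matches the closed form.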

Corollary 4.9 can be proved similarly to Theorem 4.8.

Proof of Corollary 4.9

Let $J_k = \big\{1+\sum_{j=1}^{k-1}d_j, \ldots, \sum_{j=1}^k d_j\big\}$. By the independence structure of the Gaussian distribution, we have

$$\mathbb{E}\exp\big(tv^T\mathrm{sign}(X-\tilde X)\big) = \prod_{k=1}^q \mathbb{E}\exp\big(tv_{J_k}^T\,\mathrm{sign}(X-\tilde X)_{J_k}\big).$$

Here we note that the bound in Equation (5.18) also holds for each $\mathbb{E}\exp(tv_{J_k}^T\mathrm{sign}(X-\tilde X)_{J_k})$, by checking Equation (5.17). Accordingly,

$$\prod_{k=1}^q \mathbb{E}\exp\big(tv_{J_k}^T\mathrm{sign}(X-\tilde X)_{J_k}\big) \le \prod_{k=1}^q \exp\Big\{\Big(\frac{1}{2}+\frac{4d_k\rho_ke^2}{\pi(1-C_1)}\Big)t^2\Big\} \le \exp\Big\{t^2\Big(\frac{q}{2} + \frac{4e^2q}{\pi(1-C_1)}\max_k(d_k\rho_k)\Big)\Big\}.$$

Because $q$ is upper bounded by a fixed constant, $v^T\mathrm{sign}(X-\tilde X)$ is subgaussian. This finishes the proof.

5.6 Proof of Theorem 4.10

Proof

We first prove that (4.6) in Theorem 4.10 holds. Letting $\zeta := K\|T\|_2^2$, we aim to prove that, with probability at least $1 - 2\alpha$,

$$\sup_{b\in\mathbb{S}^{s-1}}\ \sup_{J_s\subseteq\{1,\ldots,d\}}\big|b^T[\hat T - T]_{J_s,J_s}b\big| \le 2(8\zeta)^{1/2}\sqrt{\frac{s\big(3+\log(d/s)\big)+\log(1/\alpha)}{n}}.$$
(5.19)

For the sphere $\mathbb{S}^{s-1}$ equipped with the Euclidean metric, we let $\mathcal{N}_\varepsilon$ be a subset of $\mathbb{S}^{s-1}$ such that for any $v \in \mathbb{S}^{s-1}$ there exists $u \in \mathcal{N}_\varepsilon$ with $\|u - v\|_2 \le \varepsilon$. The cardinality of $\mathcal{N}_\varepsilon$ can be chosen to satisfy

$$\mathrm{card}(\mathcal{N}_\varepsilon) \le \Big(1 + \frac{2}{\varepsilon}\Big)^s.$$

Let $\mathcal{N}_{1/4}$ be a $(1/4)$-net of $\mathbb{S}^{s-1}$. Then the cardinality of $\mathcal{N}_{1/4}$ is bounded by $9^s$. Moreover, for any symmetric matrix $M \in \mathbb{R}^{s\times s}$,

$$\sup_{v\in\mathbb{S}^{s-1}}\big|v^TMv\big| \le \frac{1}{1-2\varepsilon}\sup_{v\in\mathcal{N}_\varepsilon}\big|v^TMv\big|.$$

This implies that

$$\sup_{v\in\mathbb{S}^{s-1}}\big|v^TMv\big| \le 2\sup_{v\in\mathcal{N}_{1/4}}\big|v^TMv\big|.$$

Let β > 0 be a constant defined as

$$\beta := (8\zeta)^{1/2}\sqrt{\frac{s\big(3+\log(d/s)\big)+\log(1/\alpha)}{n}}.$$

We have

$$P\Big(\sup_{b\in\mathbb{S}^{s-1}}\sup_{J_s\subseteq\{1,\ldots,d\}}\big|b^T[\hat T-T]_{J_s,J_s}b\big| > 2\beta\Big) \le P\Big(\sup_{b\in\mathcal{N}_{1/4}}\sup_{J_s\subseteq\{1,\ldots,d\}}\big|b^T[\hat T-T]_{J_s,J_s}b\big| > \beta\Big) \le 9^s\binom{d}{s}\,P\Big(\big|b^T[\hat T-T]_{J_s,J_s}b\big| > \beta,\ \text{for fixed } b \text{ and } J_s\Big).$$

Thus, it suffices to show that for any fixed $b$ and $J_s$ we have

$$P\Big(\big|b^T[\hat T - T]_{J_s,J_s}b\big| > t\Big) \le 2e^{-nt^2/(8\zeta)},$$
(5.20)

with $\zeta := K\|T\|_2^2$. Then, using the bound $\binom{d}{s} \le (ed/s)^s$, we have

$$9^s\binom{d}{s}\,P\Big(\big|b^T[\hat T-T]_{J_s,J_s}b\big| > \beta,\ \text{for fixed } b \text{ and } J_s\Big) \le 2\exp\big\{s(1+\log 9-\log s)+s\log d - s\big(3+\log d-\log s\big) - \log(1/\alpha)\big\} \le 2\alpha.$$

It follows that with probability greater than $1 - 2\alpha$ the bound in Equation (5.19) holds.

Finally, we show that Equation (5.20) holds. For any t, we have

$$\mathbb{E}\exp\big\{t\cdot b^T[\hat T-T]_{J_s,J_s}b\big\} = \mathbb{E}\exp\Big\{t\sum_{j,k\in J_s}b_jb_k\big(\hat\tau_{jk}-\tau_{jk}\big)\Big\} = \mathbb{E}\exp\Big\{\frac{t}{\binom{n}{2}}\sum_{i<i'}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_i-x_{i'})_j(x_i-x_{i'})_k\big)-\tau_{jk}\Big)\Big\}.$$

Let Sn represent the permutation group of {1, …, n}. For any σ [set membership] Sn, let (i1, …, in) := σ(1, …, n) represent a permuted series of {1, …, n} and O(σ) := {(i1, i2), (i3, i4), …, (in−1, in)}. In particular, we denote O(σ0) := {(1, 2), (3, 4), …, (n − 1, n)}. By simple calculation,

$$\mathbb{E}\exp\Big\{\frac{t}{\binom{n}{2}}\sum_{i<i'}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_i-x_{i'})_j(x_i-x_{i'})_k\big)-\tau_{jk}\Big)\Big\} = \mathbb{E}\exp\Big\{\frac{t}{\mathrm{card}(S_n)}\sum_{\sigma\in S_n}\frac{2}{n}\sum_{(i,i')\in O(\sigma)}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_i-x_{i'})_j(x_i-x_{i'})_k\big)-\tau_{jk}\Big)\Big\} \le \frac{1}{\mathrm{card}(S_n)}\sum_{\sigma\in S_n}\mathbb{E}\exp\Big\{\frac{2t}{n}\sum_{(i,i')\in O(\sigma)}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_i-x_{i'})_j(x_i-x_{i'})_k\big)-\tau_{jk}\Big)\Big\} = \mathbb{E}\exp\Big\{\frac{2t}{n}\sum_{(i,i')\in O(\sigma_0)}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_i-x_{i'})_j(x_i-x_{i'})_k\big)-\tau_{jk}\Big)\Big\}.$$
(5.21)

The inequality is due to Jensen's inequality.

Let $m := n/2$ and recall that $X = (X_1, \ldots, X_d)^T \sim TE_d(\Sigma, \xi; f_1, \ldots, f_d)$. Let $\tilde X$ be an independent copy of $X$. By Equation (4.2), we have that for any $|t| < t_0$ and $v \in \mathbb{S}^{d-1}$,

$$\mathbb{E}\exp\Big[t\Big\{\big(v^T\mathrm{sign}(X-\tilde X)\big)^2 - \mathbb{E}\big(v^T\mathrm{sign}(X-\tilde X)\big)^2\Big\}\Big] \le e^{\zeta t^2}.$$

In particular, letting $v_{J_s} = b$ and $v_{J_s^C} = 0$, we have

$$\mathbb{E}\exp\Big\{t\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((X-\tilde X)_j(X-\tilde X)_k\big)-\tau_{jk}\Big)\Big\} \le e^{\zeta t^2}.$$
(5.22)

Then we are able to continue Equation (5.21) as

$$\mathbb{E}\exp\Big\{\frac{2t}{n}\sum_{(i,i')\in O(\sigma_0)}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_i-x_{i'})_j(x_i-x_{i'})_k\big)-\tau_{jk}\Big)\Big\} = \mathbb{E}\exp\Big\{\frac{t}{m}\sum_{i=1}^m\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((x_{2i}-x_{2i-1})_j(x_{2i}-x_{2i-1})_k\big)-\tau_{jk}\Big)\Big\} = \Big(\mathbb{E}\exp\Big\{\frac{t}{m}\sum_{j,k\in J_s}b_jb_k\Big(\mathrm{sign}\big((X-\tilde X)_j(X-\tilde X)_k\big)-\tau_{jk}\Big)\Big\}\Big)^m \le e^{\zeta t^2/m},$$
(5.23)

where, by Equation (4.2), the last inequality holds for any $|t/m| < t_0$. Accordingly, choosing $t = \beta m/(2\zeta)$, by the Markov inequality we have

$$P\big(b^T[\hat T-T]_{J_s,J_s}b > \beta\big) \le e^{-n\beta^2/(8\zeta)}, \quad\text{for all } \beta < 2\zeta t_0.$$
(5.24)

By symmetry, we have the same bound for $P(b^T[\hat T-T]_{J_s,J_s}b < -\beta)$ as in Equation (5.24). Together they give us Equation (5.20). This completes the proof of the first part.

Using (4.6), we can now proceed to quantify the term

$$\sup_{v\in\mathbb{S}^{d-1},\ \|v\|_0\le s}\big|v^T(\hat\Sigma - \Sigma)v\big|.$$

We aim to prove that, under the conditions of Theorem 4.10, with probability at least $1 - 2\alpha - \alpha^2$,

$$\sup_{b\in\mathbb{S}^{s-1}}\ \sup_{J_s\subseteq\{1,\ldots,d\}}\big|b^T[\hat\Sigma-\Sigma]_{J_s,J_s}b\big| \le \pi^2(8\zeta)^{1/2}\sqrt{\frac{s\big(3+\log(d/s)\big)+\log(1/\alpha)}{n}} + \frac{\pi^2 s\log(d/\alpha)}{n}.$$
(5.25)

Using a similar argument as in the proof of Theorem 3.2, we let $E_1, E_2 \in \mathbb{R}^{d\times d}$ satisfy, for $j \ne k$,

$$[E_1]_{jk} = \cos\Big(\frac{\pi}{2}\tau_{jk}\Big)\frac{\pi}{2}\big(\hat\tau_{jk}-\tau_{jk}\big), \qquad [E_2]_{jk} = -\frac{1}{2}\sin(\theta_{jk})\Big(\frac{\pi}{2}\Big)^2\big(\hat\tau_{jk}-\tau_{jk}\big)^2.$$

We then have

$$\hat\Sigma - \Sigma = E_1 + E_2.$$

Let the set Ω2 be defined as

$$\Omega_2 := \Big\{\exists\, 1 \le j, k \le d,\ \big|[E_2]_{jk}\big| > \frac{\pi^2\log(d/\alpha)}{n}\Big\}.$$

Using the result in the proof of Theorem 3.2, we have $P(\Omega_2) \le \alpha^2$. Moreover, conditioning on $\Omega_2^C$, for any $J_s \subseteq \{1, \ldots, d\}$ and $b \in \mathbb{S}^{s-1}$,

$$\big|b^T[E_2]_{J_s,J_s}b\big| \le \sqrt{\sum_{j,k\in J_s}[E_2]_{jk}^2}\cdot\|b\|_2^2 \le s\cdot\frac{\pi^2\log(d/\alpha)}{n} = \frac{\pi^2 s\log(d/\alpha)}{n}.$$
(5.26)

We then proceed to control the term $|b^T[E_1]_{J_s,J_s}b|$. Using a similar argument as in Equation (5.8), for $Y = (Y_1, \ldots, Y_d)^T \sim N_d(0, \Sigma)$ with independent copy $\tilde Y$, any symmetric matrix $M \in \mathbb{R}^{d\times d}$, $W$ with $W_{jk} = \frac{\pi}{2}\cos(\frac{\pi}{2}\tau_{jk})$, and $v \in \mathbb{S}^{d-1}$ with $\|v\|_0 \le q$, we have

$$\big|v^T(M\circ W)v\big| \le \frac{\pi^2}{4}\mathbb{E}\Big(\Big|\sum_{j,k}v_jv_kM_{jk}|Y_j||Y_k|\Big| + \Big|\sum_{j,k}v_jv_kM_{jk}Y_jY_k\,\mathrm{sign}(\tilde Y_j\tilde Y_k)\Big|\Big) \le \frac{\pi^2}{4}\sup_{b\in\mathbb{S}^{d-1},\|b\|_0\le q}\big|b^TMb\big|\cdot\mathbb{E}\Big(2\sum_j v_j^2Y_j^2\Big) = \frac{\pi^2}{4}\sup_{b\in\mathbb{S}^{d-1},\|b\|_0\le q}\big|b^TMb\big|\cdot\Big(2\sum_j v_j^2\Big) = \frac{\pi^2}{2}\sup_{b\in\mathbb{S}^{d-1},\|b\|_0\le q}\big|b^TMb\big|.$$

Accordingly, we have

$$\sup_{b\in\mathbb{S}^{s-1}}\ \sup_{J_s\subseteq\{1,\ldots,d\}}\big|b^T[E_1]_{J_s,J_s}b\big| \le \frac{\pi^2}{2}\sup_{b\in\mathbb{S}^{s-1}}\ \sup_{J_s\subseteq\{1,\ldots,d\}}\big|b^T[\hat T-T]_{J_s,J_s}b\big|.$$

Combined with Equations (4.6), (5.26), and (5.10), this gives the desired result in (4.7).

6 Discussions

This paper considers robust estimation of the correlation matrix using the rank-based correlation coefficient estimator Kendall's tau and its transformed version. We showed that Kendall's tau is a very robust estimator in high dimensions, in the sense that it can achieve the parametric rate of convergence under various norms without any assumption on the data distribution, and in particular without assuming any moment constraints. We further considered the transelliptical family proposed in Han and Liu (2013b), showing that a transformed version of Kendall's tau attains the parametric rate in estimating the latent Pearson's correlation matrix, again without assuming any moment constraints. Moreover, unlike in the Gaussian case, the theoretical analysis performed here motivates new understandings of rank-based estimators as well as new proof techniques. These new understandings and proof techniques are of independent interest.

Han and Liu (2013a) studied the performance of the latent generalized correlation matrix estimator on dependent data under certain mixing conditions and proved that $\hat\Sigma$ can attain a $\sqrt{s\log d/(n\gamma)}$ rate of convergence under the restricted spectral norm, where $\gamma \le 1$ reflects the impact of non-independence on the estimation accuracy. It is also interesting to consider extending the results in this paper to dependent data under similar mixing conditions, to see whether a similar $\sqrt{s\log d/(n\gamma)}$ rate of convergence can be attained. However, obtaining such results for dependent data is much more challenging: the current theoretical analysis based on U-statistics is not sufficient to achieve this goal.

A problem closely related to leading eigenvector estimation is principal component detection, initiated in the work of Berthet and Rigollet (2013a,b). It is interesting to extend the analysis here to this setting and conduct sparse principal component detection under the transelliptical family. It is worth pointing out that Theorems 3.2 and 4.10 in this paper can be exploited to quantify the statistical performance of the corresponding sparse principal component detection procedures.

Acknowledgments

We sincerely thank Marten Wegkamp for his very helpful discussions and for generously providing independent credit for our work. We also thank Xiuyuan Cheng, Luo Xiao, Ramon van Handel, and Philippe Rigollet for their many helpful comments, which significantly improved the quality of our work.

A Appendix

In this section we provide a lemma quantifying the relationship between the Orlicz $\psi_2$-norm and the subgaussian condition. Although this result is well known, in order to make the constants explicit, we include a proof here. We do not claim any original contribution in this section.

Lemma A.1

For any random variable $Y \in \mathbb{R}$, we say that $Y$ has a subgaussian distribution with factor $c > 0$ if for any $t \in \mathbb{R}$, $\mathbb{E}\exp(tY) \le \exp(ct^2)$. We then have that $Y$ is subgaussian if and only if $\|Y\|_{\psi_2}$ is finite. In particular, if $Y$ is subgaussian with factor $c$, then $\|Y\|_{\psi_2} \le \sqrt{12c}$; conversely, if $\|Y\|_{\psi_2} \le D < \infty$, then $Y$ is subgaussian with factor $c = 5D^2/2$.

Proof

If $Y$ has a subgaussian distribution, then for any $m > 0$ we have

$$\mathbb{E}\exp\big(|Y/m|^2\big) = 1 + \int_0^\infty P\Big(\frac{Y^2}{m^2} > t\Big)e^t\,dt = 1 + \int_0^\infty P\big(|Y| > m\sqrt{t}\big)e^t\,dt.$$

By the Markov inequality, we know that if $Y$ is subgaussian, then for any $t > 0$,

$$P\big(|Y| > t\big) \le 2\exp\big(-t^2/(4c)\big).$$

Accordingly, we can proceed with the proof:

$$\mathbb{E}\exp\big(|Y/m|^2\big) \le 1 + 2\int_0^\infty e^{-m^2t/(4c)}\cdot e^t\,dt = 1 + 2\int_0^\infty e^{-(m^2/(4c)-1)t}\,dt = 1 + \frac{2}{m^2/(4c)-1}.$$

Picking $m = \sqrt{12c}$, we have $\mathbb{E}\exp(|Y/m|^2) \le 2$; accordingly, $\|Y\|_{\psi_2} \le \sqrt{12c}$. On the other hand, if $\|Y\|_{\psi_2} < \infty$, then there exists some $m < \infty$ such that $\mathbb{E}\exp(|Y/m|^2) \le 2$. Using integration by parts, it is easy to check that

$$\exp(a) = 1 + a + a^2\int_0^1 (1-y)e^{ay}\,dy.$$

This implies that, for a centered random variable $X$,

$$\mathbb{E}\exp(tX) = 1 + t^2\int_0^1 (1-u)\mathbb{E}\big[X^2\exp(utX)\big]\,du \le 1 + t^2\,\mathbb{E}\big(X^2\exp(tX)\big)\int_0^1(1-u)\,du \le 1 + \frac{t^2}{2}\mathbb{E}\big(X^2e^{tX}\big).$$

Using the facts that for any $a, b \in \mathbb{R}$, $ab \le \frac{a^2+b^2}{2}$ and $a \le e^a$, we can further prove that

$$\mathbb{E}\exp(tX) \le 1 + \frac{t^2}{2}\mathbb{E}\big(X^2e^{tX}\big) \le 1 + m^2t^2e^{m^2t^2/2}\,\mathbb{E}\Big(\frac{X^2}{2m^2}e^{X^2/(2m^2)}\Big) \le 1 + m^2t^2e^{m^2t^2/2}\,\mathbb{E}e^{X^2/m^2} \le \big(1+2m^2t^2\big)e^{m^2t^2/2} \le e^{5m^2t^2/2}.$$

The last inequality is due to the fact that for any $a \in \mathbb{R}$, $1 + a \le e^a$. Accordingly, $X$ is subgaussian with factor $c = 5m^2/2$.
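For a concrete instance of Lemma A.1 (our check, not part of the proof), take $Y \sim N(0,1)$: then $\mathbb{E}\exp(tY) = \exp(t^2/2)$, so $c = 1/2$, and the Gaussian chi-square identity $\mathbb{E}\exp(\lambda Y^2) = (1-2\lambda)^{-1/2}$ for $\lambda < 1/2$ gives the Orlicz bound directly:

```python
import math

c = 0.5                              # N(0,1) is subgaussian with factor c = 1/2
m = math.sqrt(12 * c)                # the choice m = sqrt(12 c) from the proof
# For Y ~ N(0,1): E exp((Y/m)^2) = 1 / sqrt(1 - 2/m^2) whenever m^2 > 2
val = 1 / math.sqrt(1 - 2 / m**2)
```

Here `val` equals $\sqrt{3/2} \approx 1.22 \le 2$, confirming $\|Y\|_{\psi_2} \le \sqrt{12c} = \sqrt{6}$ with room to spare.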

References

  • Baik J, Silverstein JW. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis. 2006;97(6):1382–1408.
  • Berthet Q, Rigollet P. Computational lower bounds for sparse PCA. 2013a arXiv preprint arXiv:1304.0828.
  • Berthet Q, Rigollet P. Optimal detection of sparse principal components in high dimension. Annals of Statistics. 2013b;41(4):1780–1815.
  • Bickel P, Levina E. Regularized estimation of large covariance matrices. The Annals of Statistics. 2008a;36(1):199–227.
  • Bickel PJ, Levina E. Covariance regularization by thresholding. The Annals of Statistics. 2008b;36(6):2577–2604.
  • Boente G, Barrerab MS, Tylerc DE. A characterization of elliptical distributions and some optimality properties of principal components for functional data. Technical report, 2012. Available at http://www.stat.ubc.ca/~matias/PropertyFPCA.pdf.
  • Bunea F, Xiao L. On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fpca. 2012 arXiv preprint arXiv:1212.5321.
  • Cai TT, Zhang CH, Zhou HH. Optimal rates of convergence for covariance matrix estimation. The Annals of Statistics. 2010;38(4):2118–2144.
  • Cai TT, Zhou HH. Minimax estimation of large covariance matrices under ℓ1 norm. Statist Sinica. 2012;22:1319–1378.
  • Chung F, Lu L. Complex Graphs and Networks. American Mathematical Society; 2006.
  • Embrechts P, Lindskog F, McNeil A. Modelling dependence with copulas and applications to risk management. Handbook of Heavy Tailed Distributions in Finance. 2003;8(1):329–384.
  • Fang H, Fang K, Kotz S. The meta-elliptical distributions with given marginals. Journal of Multivariate Analysis. 2002;82(1):1–16.
  • Fang K, Kotz S, Ng K. Symmetric Multivariate and Related Distributions. Chapman & Hall; 1990.
  • Han F, Liu H. Semiparametric principal component analysis. Proceedings of the Twenty-Fifth Annual Conference on Neural Information Processing Systems; 2012. pp. 171–179.
  • Han F, Liu H. Principal component analysis on non-Gaussian dependent data. Proceedings of the Thirtieth International Conference on Machine Learning.2013a.
  • Han F, Liu H. Scale-invariant sparse PCA on high dimensional meta-elliptical data. Journal of the American Statistical Association. 2013b In press.
  • Han F, Zhao T, Liu H. CODA: High dimensional copula discriminant analysis. Journal of Machine Learning Research. 2013;14:629–671.
  • Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association. 1963;58(301):13–30.
  • Hogg RV, Craig A. Introduction to Mathematical Statistics. 7. Pearson; 2012.
  • Hubbard J. Calculation of partition functions. Physical Review Letters. 1959;3(2):77.
  • Johnson CR. Matrix Theory and Applications: Lecture Notes Prepared for the American Mathematical Society Short Course. Vol. 40. AMS Bookstore; 1990.
  • Johnstone IM. On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics. 2001;29(2):295–327.
  • Jung S, Marron J. PCA consistency in high dimension, low sample size context. The Annals of Statistics. 2009;37(6B):4104–4130.
  • Ledoux M. The Concentration of Measure Phenomenon. Vol. 89. American Mathematical Society; 2001.
  • Liu H, Han F, Yuan M, Lafferty J, Wasserman L. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics. 2012a;40(4):2293–2326.
  • Liu H, Han F, Zhang C-H. Transelliptical graphical models. Proceedings of the Twenty-Fifth Annual Conference on Neural Information Processing Systems; 2012b. pp. 809–817.
  • Liu H, Lafferty J, Wasserman L. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research. 2009;10:2295–2328.
  • Lounici K. High-dimensional covariance matrix estimation with missing observations. Bernoulli. 2013 In press.
  • Tropp JA. User-friendly tail bounds for sums of random matrices. 2010 arXiv preprint arXiv:1004.4389.
  • van de Geer S, Lederer J. The Bernstein-Orlicz norm and deviation inequalities. Probability Theory and Related Fields. 2013;157(1–2):225–250.
  • Vershynin R. Introduction to the non-asymptotic analysis of random matrices. 2010 arXiv preprint arXiv:1011.3027.
  • Vu V, Lei J. Minimax rates of estimation for sparse PCA in high dimensions. Journal of Machine Learning Research (AISTATS Track) 2012
  • Wegkamp M, Zhao Y. Analysis of elliptical copula correlation factor model with Kendall's tau. Personal communication. 2013.
  • Xue L, Zou H. Regularized rank-based estimation of high-dimensional nonparanormal graphical models. The Annals of Statistics. 2012;40(5):2541–2571.
  • Yuan X, Zhang T. Truncated power method for sparse eigenvalue problems. Journal of Machine Learning Research. 2013;14:899–925.